Data Engineering

Data Engineering taxonomy generated by the site skill importer.

8 skills
clickhouse-io

by affaan-m

clickhouse-io is a ClickHouse-focused skill for schema design, analytical SQL, ingestion patterns, and performance tuning. Use it to guide MergeTree choices, partitioning, materialized views, and workload-specific query optimization.

Database Engineering
Favorites 0 | GitHub 156.1k
airflow-dag-patterns

by wshobson

airflow-dag-patterns helps design production-ready Apache Airflow DAGs with stronger task patterns, dependencies, operators, sensors, testing, and deployment guidance for scheduled jobs.

Scheduled Jobs
Favorites 0 | GitHub 32.6k
data-quality-frameworks

by wshobson

The data-quality-frameworks skill helps teams plan production data validation with dbt tests, Great Expectations, and data contracts. Use it to choose the right checks, map them to a testing pyramid, and set up CI/CD-ready data quality workflows for data cleaning and pipeline reliability.

Data Cleaning
Favorites 0 | GitHub 32.6k
dbt-transformation-patterns

by wshobson

dbt-transformation-patterns helps agents structure dbt projects with staging, intermediate, and marts layers, plus testing, documentation, and incremental model guidance. Use it to plan installs, scaffold new repos, or refactor SQL into cleaner analytics engineering patterns for database engineering teams.

Database Engineering
Favorites 0 | GitHub 32.6k
spark-optimization

by wshobson

spark-optimization is a practical guide to diagnosing slow Apache Spark jobs and tuning partitioning, shuffle, skew, caching, and memory. Install the skill from wshobson/agents, read SKILL.md, and apply evidence-based fixes based on Spark UI symptoms, cluster settings, and query patterns.

Performance Optimization
Favorites 0 | GitHub 32.6k
data-analytics

by markdown-viewer

The data-analytics skill creates PlantUML diagrams for data analysis workflows, including ETL, ELT, data lakes, warehouses, streaming pipelines, log analytics, and BI dashboards. It is optimized for clear source-to-destination flow, AWS analytics and database stencils, and practical data-analytics diagrams, not generic software or cloud architecture diagrams.

Data Analysis
Favorites 0 | GitHub 1.1k
tinybird-python-sdk-guidelines

by tinybirdco

tinybird-python-sdk-guidelines helps you install and use tinybird-sdk for Python-based Tinybird projects. It covers datasources, endpoints, clients, connections, migration from legacy files, and backend development workflows with build and deploy guidance.

Backend Development
Favorites 0 | GitHub 16
lamindb

by K-Dense-AI

The lamindb skill helps you work with LaminDB, an open-source biology data framework for making data queryable, traceable, reproducible, and FAIR. Use it for data analysis, metadata curation, ontology-based annotation, schema validation, and lineage-aware workflows across notebooks and pipelines.

Data Analysis
Favorites 0 | GitHub 0