Data Pipelines

Data Pipelines taxonomy generated by the site skill importer.

6 skills
clickhouse-io

by affaan-m

clickhouse-io is a ClickHouse-focused skill for schema design, analytical SQL, ingestion patterns, and performance tuning. Use it to guide MergeTree choices, partitioning, materialized views, and workload-specific query optimization.

Database Engineering
GitHub 156.1k
airflow-dag-patterns

by wshobson

airflow-dag-patterns helps design production-ready Apache Airflow DAGs, with guidance on task patterns, dependencies, operators, sensors, testing, and deployment for scheduled jobs.

Scheduled Jobs
GitHub 32.6k
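Airflow runs a DAG's tasks so that each one starts only after its upstream dependencies finish. The following is a minimal sketch of that ordering rule. It uses Python's standard-library `graphlib` rather than Airflow's own API. The task names are hypothetical, and `wait_for_file` merely stands in for a sensor.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each task maps to the set of upstream tasks it waits for.
# Illustrates dependency ordering only; this is not Airflow's API.
dag = {
    "extract": set(),
    "wait_for_file": set(),  # stand-in for a sensor task
    "transform": {"extract", "wait_for_file"},
    "load": {"transform"},
    "notify": {"load"},
}

def run_order(graph):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())

order = run_order(dag)
print(order)  # every task appears after all of its upstream tasks
```

In a real DAG the same edges would be declared between operators (for example `extract >> transform`). Airflow also refuses cyclic graphs, which `TopologicalSorter` reports here as a `CycleError`.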
dbt-transformation-patterns

by wshobson

dbt-transformation-patterns helps agents structure dbt projects into staging, intermediate, and marts layers, with guidance on testing, documentation, and incremental models. Use it when planning a dbt installation, scaffolding a new project, or refactoring existing SQL into cleaner analytics-engineering patterns for Database Engineering teams.

Database Engineering
GitHub 32.6k
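An incremental model is easiest to reason about as a high-water-mark filter followed by an upsert. The sketch below mirrors that logic in plain Python, assuming each row carries an `id` key and an `updated_at` timestamp (both hypothetical). In dbt the filter would normally sit inside an `{% if is_incremental() %}` block in the model SQL, and the model's `unique_key` config would drive the upsert. This is only an illustration of the idea, not dbt's implementation.

```python
from datetime import datetime

# Hypothetical source rows and an already-built target table.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]
target = [{"id": 1, "updated_at": datetime(2024, 1, 1)}]

def incremental_merge(source, target, unique_key="id", ts="updated_at"):
    """Select rows newer than the target's high-water mark, then upsert them by key."""
    if target:
        high_water = max(row[ts] for row in target)
        new_rows = [row for row in source if row[ts] > high_water]
    else:  # first run: no target yet, so do a full build
        new_rows = list(source)
    merged = {row[unique_key]: row for row in target}
    merged.update({row[unique_key]: row for row in new_rows})  # newer rows win
    return sorted(merged.values(), key=lambda row: row[unique_key]), new_rows

result, processed = incremental_merge(source, target)
print([r["id"] for r in result], [r["id"] for r in processed])
```

Only rows 2 and 3 are processed on this run, which is the cost saving incremental models aim for.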
spark-optimization

by wshobson

spark-optimization is a practical guide to diagnosing slow Apache Spark jobs through partitioning, shuffle, skew, caching, and memory tuning. Install it from the wshobson/agents repository, read its SKILL.md, and use it to apply evidence-based fixes drawn from Spark UI symptoms, cluster settings, and query patterns.

Performance Optimization
GitHub 32.6k
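Key salting is one common fix for join skew. A hot key on the large side is split across several salted keys, and the small side is replicated once per salt so every salted key still finds its match. The pure-Python sketch below only illustrates the idea. It is not Spark code, and `NUM_SALTS` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical fan-out; tune it to the observed skew

def salt_large(rows, num_salts=NUM_SALTS):
    """Append a salt to each large-side key (round-robin here; Spark jobs often use a random salt)."""
    return [((key, i % num_salts), value) for i, (key, value) in enumerate(rows)]

def explode_small(rows, num_salts=NUM_SALTS):
    """Replicate every small-side row once per salt so each salted key still has a match."""
    return [((key, s), value) for key, value in rows for s in range(num_salts)]

def hash_join(left, right):
    """Plain in-memory equi-join on the (possibly salted) key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index[key]]

# Hypothetical skewed input: one hot key dominates the large side.
large = [("hot", i) for i in range(8)] + [("cold", 100)]
small = [("hot", "H"), ("cold", "C")]

salted = salt_large(large)
print(Counter(key for key, _ in salted))  # the hot key is now spread over 4 salted keys

joined = hash_join(salted, explode_small(small))
# Drop the salt to recover the same rows an unsalted join would produce.
result = sorted((key[0], lv, rv) for key, lv, rv in joined)
```

In Spark the same pattern is usually expressed with an extra salt column in both join keys. On Spark 3.x, adaptive query execution (`spark.sql.adaptive.skewJoin.enabled`) can also split skewed partitions without manual salting.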
data-analytics

by markdown-viewer

The data-analytics skill creates PlantUML diagrams for data analysis workflows, including ETL, ELT, data lakes, warehouses, streaming pipelines, log analytics, and BI dashboards. It emphasizes clear source-to-destination flow and AWS analytics/database stencils, and it targets practical data-analytics guides rather than generic software or cloud architecture diagrams.

Data Analysis
GitHub 1.1k
ml-pipeline-workflow

by wshobson

ml-pipeline-workflow is a practical guide to designing end-to-end MLOps pipelines for data prep, training, validation, deployment, and monitoring, with orchestration patterns for repeatable workflow automation.

Workflow Automation
GitHub 0
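The stages the ml-pipeline-workflow description lists (data prep, training, validation, deployment) are often connected by a validation gate. The gate blocks promotion of any model that misses a quality threshold. The following is a toy sketch under that assumption. The mean-predictor "model", the 80/20 split, and the `max_mae` threshold are all hypothetical stand-ins and are not part of the skill.

```python
from statistics import mean

def prepare(ctx):
    """Split raw (x, y) pairs 80/20 into train and holdout sets."""
    cut = len(ctx["raw"]) * 4 // 5
    ctx["train"], ctx["holdout"] = ctx["raw"][:cut], ctx["raw"][cut:]
    return ctx

def train(ctx):
    """Toy model: always predict the mean training label (stands in for real training)."""
    ctx["model"] = mean(y for _, y in ctx["train"])
    return ctx

def validate(ctx):
    """Score the model by mean absolute error on the holdout set."""
    ctx["mae"] = mean(abs(y - ctx["model"]) for _, y in ctx["holdout"])
    return ctx

def deploy(ctx):
    """Placeholder for pushing the model to serving."""
    ctx["deployed"] = True
    return ctx

def run_pipeline(raw, max_mae):
    ctx = {"raw": raw, "deployed": False}
    for stage in (prepare, train, validate):
        ctx = stage(ctx)
    if ctx["mae"] <= max_mae:  # validation gate: only promote a model that passes
        ctx = deploy(ctx)
    return ctx

print(run_pipeline([(i, 10) for i in range(10)], max_mae=0.5)["deployed"])
```

Keeping the gate outside the individual stages lets an orchestrator rerun training or validation independently while the deployment rule stays in one place.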