data-quality-frameworks
by wshobson

The data-quality-frameworks skill helps teams plan production data validation with dbt tests, Great Expectations, and data contracts. Use it to choose the right checks, map them to a testing pyramid, and guide CI/CD-ready data quality workflows for Data Cleaning and pipeline reliability.
This skill scores 68/100, which means it is acceptable to list for directory users who want a substantial reference on data quality patterns, but they should expect to translate the guidance into their own environment rather than follow a tightly operationalized workflow. The repository evidence shows real content and clear triggers around Great Expectations, dbt tests, and data contracts, yet lacks install/runtime specifics, support files, or linked examples that would reduce execution guesswork further.
- Clear triggerability from frontmatter and "When to Use" guidance covering validation pipelines, dbt tests, data contracts, monitoring, and CI/CD.
- Substantive documentation footprint: long SKILL.md with multiple sections, concepts, constraints, workflows, and code fences suggests real workflow content rather than a placeholder.
- Useful cross-framework coverage: combines Great Expectations, dbt testing, and data contract patterns, giving agents a stronger starting point than a generic one-off prompt.
- Operational clarity is limited by missing support files, references, and repo/file links, so agents must infer implementation details for a specific stack.
- No install command or executable assets are provided in the skill, which reduces confidence for quick adoption and reproducibility.
Overview of data-quality-frameworks skill
What the data-quality-frameworks skill does
The data-quality-frameworks skill helps an agent design practical data quality validation using three common approaches: dbt tests, Great Expectations, and data contracts. It is aimed at teams that need more than a vague “add data checks” prompt and want a structured way to decide what to test, where to test it, and how to operationalize those checks in pipelines and CI/CD.
Who should use data-quality-frameworks
This skill is best for data engineers, analytics engineers, platform teams, and technical leads building repeatable quality controls for tables, models, and pipeline interfaces. It is especially useful when you need data-quality-frameworks for Data Cleaning in a production context, not just one-off exploratory cleanup.
The real job to be done
Users rarely want a framework name by itself. They want to answer questions like:
- Which quality dimensions matter for this dataset?
- Should this check live in SQL, dbt, Great Expectations, or a contract?
- What is the minimum viable test suite before production?
- How do we prevent schema drift and bad upstream changes?
The data-quality-frameworks skill is most valuable when the goal is to translate business reliability needs into concrete validation patterns.
What differentiates this skill from a generic prompt
The repository content is stronger on decision structure than on automation. It gives a reusable mental model centered on:
- core data quality dimensions
- a testing pyramid for data
- framework selection across dbt, Great Expectations, and contracts
- production-oriented use cases such as CI/CD and monitoring
That makes it more useful than a generic “write some data checks” prompt, but it still expects you to provide your stack, schemas, and failure thresholds.
What to know before you install
This is a text-only skill with guidance in SKILL.md. There are no helper scripts, templates, or reference files in the skill folder. Adoption is easy because there is little setup, but output quality depends heavily on the inputs you provide. If you want copy-paste-ready configs without supplying table details, this skill will feel incomplete.
How to Use data-quality-frameworks skill
Install context for data-quality-frameworks
Install the skill from the wshobson/agents repository:
npx skills add https://github.com/wshobson/agents --skill data-quality-frameworks
Because the skill lives as a single SKILL.md, there is no extra local package setup inside the skill itself. The main setup work is in your own environment: dbt, Great Expectations, warehouse access, and any CI runner you use.
Read this file first
Start with:
plugins/data-engineering/skills/data-quality-frameworks/SKILL.md
Since there are no supporting README, resources, or scripts, the fastest reading path is:
- When to Use This Skill
- Core Concepts
- sections covering the testing pyramid and framework patterns
- any implementation examples in code blocks
This is a short skill to consume, so the main gain comes from using it with a precise prompt, not from deep repository spelunking.
What input the skill needs from you
For strong data-quality-frameworks usage, give the agent:
- dataset or model names
- column list with types
- expected grain or primary key
- freshness expectations
- allowed value ranges or enums
- nullable vs required fields
- known upstream/downstream dependencies
- where checks should run: ingestion, transform, publish, or contract boundary
- failure handling policy: warn, fail job, quarantine, alert
Without those details, the agent can only return generic examples like uniqueness, null, and range checks.
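Those generic fallbacks are easy to picture. A minimal Python sketch of what uniqueness, null, and range checks amount to (the `orders` rows and column names here are hypothetical examples, not output from the skill):

```python
# Illustrative only: generic uniqueness, null, and range checks
# over in-memory rows. Column names are hypothetical examples.

def check_unique(rows, key):
    """Return the duplicated key values, sorted."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

def check_not_null(rows, column):
    """Return the count of null (None) values in a column."""
    return sum(1 for row in rows if row[column] is None)

def check_range(rows, column, low):
    """Return rows where the column falls below an allowed minimum."""
    return [row for row in rows if row[column] is not None and row[column] < low]

orders = [
    {"order_id": 1, "order_total": 25.0},
    {"order_id": 2, "order_total": -5.0},
    {"order_id": 2, "order_total": None},
]

print(check_unique(orders, "order_id"))       # [2]
print(check_not_null(orders, "order_total"))  # 1
print(check_range(orders, "order_total", 0))  # the -5.0 row
```

Useful as far as it goes, but notice that nothing here encodes business meaning; that is exactly the gap your dataset details fill.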
Turn a rough goal into a strong prompt
Weak prompt:
Help me add data quality checks.
Better prompt:
Use the `data-quality-frameworks` skill to design a validation plan for our `orders` pipeline. Source is raw event data loaded to BigQuery, transformed with dbt. Key fields: `order_id`, `customer_id`, `order_status`, `order_total`, `created_at`, `updated_at`. `order_id` must be unique at the mart layer. `order_status` must be one of `pending`, `paid`, `shipped`, `cancelled`, `refunded`. `order_total` must be >= 0. Freshness target is under 2 hours. We want: 1) source-level checks, 2) dbt tests, 3) any checks that fit Great Expectations, 4) a simple data contract for upstream producers, and 5) CI/CD recommendations with fail-vs-warn guidance.
That prompt works because it gives the skill enough context to map requirements to the right framework.
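For a prompt like that, the dbt portion of the answer would typically land in a `schema.yml`. A sketch of what that might look like, using standard dbt test syntax (the model name and values come from the prompt; the `dbt_utils.accepted_range` test assumes the separate dbt-utils package is installed):

```
# models/marts/schema.yml -- illustrative dbt tests for the orders prompt
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              values: ['pending', 'paid', 'shipped', 'cancelled', 'refunded']
      - name: order_total
        tests:
          - dbt_utils.accepted_range:   # requires the dbt-utils package
              min_value: 0
```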
How to ask for the right output format
Ask the agent to produce outputs in layers:
- quality dimensions by dataset
- testing pyramid placement
- concrete framework mapping
- sample test definitions
- rollout order
Example:
Using the `data-quality-frameworks` guide, return a table with columns: `check`, `dimension`, `layer`, `framework`, `severity`, `reason`. Then generate sample dbt tests and Great Expectations expectations only for the highest-value checks.
This reduces overengineering and keeps the first pass implementation-focused.
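If you want to post-process that table yourself, each row maps naturally onto a small record. A hypothetical sketch showing how the requested columns support filtering down to a first-pass plan (check names, layers, and severities are invented for illustration):

```python
# Illustrative check records using the requested columns.
checks = [
    {"check": "order_id unique", "dimension": "uniqueness", "layer": "mart",
     "framework": "dbt", "severity": "fail", "reason": "primary key guarantee"},
    {"check": "order_status in enum", "dimension": "validity", "layer": "mart",
     "framework": "dbt", "severity": "fail", "reason": "application enum"},
    {"check": "row count vs yesterday", "dimension": "completeness", "layer": "source",
     "framework": "great_expectations", "severity": "warn", "reason": "volume anomaly"},
]

# Keep only the blocking checks for the first implementation pass.
first_pass = [c for c in checks if c["severity"] == "fail"]
print([c["check"] for c in first_pass])
```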
Practical workflow for data-quality-frameworks usage
A good workflow is:
- Inventory your critical datasets.
- Identify the grain and contract surface.
- Classify checks by quality dimension.
- Place each check in the testing pyramid.
- Assign each check to dbt, Great Expectations, or a data contract.
- Decide which checks block deployments and which only alert.
- Implement the smallest reliable set first.
This skill is better for system design and validation planning than for brute-force generation of every possible test.
When to use dbt, Great Expectations, or contracts
Use the skill to separate concerns:
- dbt fits model-level assertions like uniqueness, non-null, accepted values, and relationship tests.
- Great Expectations fits richer validation workflows, profiling-style expectations, and runtime validation around pipeline stages.
- Data contracts fit producer-consumer agreements such as schema shape, required fields, and semantic guarantees at boundaries.
A common mistake is forcing one tool to do everything. The data-quality-frameworks skill is most helpful when you use each framework for its natural layer.
What the testing pyramid means in practice
The skill’s testing pyramid is useful for prioritization. In practice:
- put many cheap structural checks at lower levels
- add fewer cross-table and business-rule checks at higher levels
- reserve expensive end-to-end validation for the most critical paths
If your first plan contains only complex business assertions and no basic null, uniqueness, schema, or freshness checks, you are likely skipping the highest ROI layer.
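The prioritization above can be sketched as a simple ordering: assign each check a pyramid level and always run the cheap structural layer first (the check names and level labels here are hypothetical):

```python
# Illustrative testing-pyramid ordering: cheaper, lower-level checks run first.
PYRAMID_LEVELS = {"structural": 0, "cross_table": 1, "end_to_end": 2}

checks = [
    {"name": "orders_vs_payments_reconciliation", "level": "end_to_end"},
    {"name": "order_id_not_null", "level": "structural"},
    {"name": "customer_fk_exists", "level": "cross_table"},
    {"name": "order_id_unique", "level": "structural"},
]

# Sort is stable, so checks within a level keep their original order.
run_order = sorted(checks, key=lambda c: PYRAMID_LEVELS[c["level"]])
print([c["name"] for c in run_order])
```

The point of the ordering is operational: if the structural layer fails, there is usually no value in paying for the expensive end-to-end layer on that run.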
What this skill does well for Data Cleaning
For data-quality-frameworks for Data Cleaning, the skill is best used to define ongoing validation after cleaning logic is introduced. It helps answer:
- which bad inputs should be blocked
- which values should be standardized
- which anomalies should trigger review instead of pipeline failure
- how to ensure cleaned outputs stay conformant over time
It is less about cleaning transformations themselves and more about proving those transformations produce trustworthy outputs.
Constraints and adoption tradeoffs
This skill has low installation friction but limited built-in implementation assets. Expect to do your own translation into project files such as:
- `models/*.yml` for dbt
- expectation suites or checkpoints for Great Expectations
- contract documents in your preferred schema format
If you need a repository with ready-made templates, this skill is lighter-weight than that. Its value is in helping an agent reason correctly, not in shipping a turnkey starter kit.
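As a sense of the translation work involved, a contract document ultimately boils down to something a validator can enforce. A minimal Python sketch of checking a producer record against required fields and types (field names and the contract shape are assumptions for illustration, not a format the skill prescribes):

```python
# Illustrative data contract check: required fields and expected types.
CONTRACT = {
    "order_id": int,
    "order_status": str,
    "order_total": float,
}

def violations(record, contract):
    """Return human-readable contract violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems

good = {"order_id": 1, "order_status": "paid", "order_total": 10.0}
bad = {"order_id": "1", "order_status": "paid"}
print(violations(good, CONTRACT))  # []
print(violations(bad, CONTRACT))   # type mismatch + missing field
```

In practice you would express this in a shared schema format (JSON Schema, protobuf, dbt model contracts) rather than hand-rolled Python, but the enforcement logic is the same shape.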
data-quality-frameworks skill FAQ
Is data-quality-frameworks good for beginners?
Yes, if you already understand basic tables, columns, and pipelines. The concepts are approachable: quality dimensions, test layering, and framework selection. Absolute beginners may still need separate documentation for dbt or Great Expectations syntax because the skill is not a full tutorial for either tool.
Is this better than an ordinary prompt?
Usually yes, when your problem is framework choice and test strategy. A normal prompt may generate random checks. The data-quality-frameworks skill gives the agent a more disciplined structure: dimensions, pyramid, and framework fit. That usually leads to fewer irrelevant tests.
What is the main limitation?
The skill does not include helper files, implementation templates, or project-specific adapters. It cannot infer your warehouse semantics, SLAs, or business rules unless you provide them. The quality of the result is tightly tied to the specificity of your prompt.
When should I not use data-quality-frameworks?
Skip it if you only need a one-line check for a single CSV or a quick ad hoc cleanup script. It is also a weak fit if your team has already standardized fully on one framework and only needs syntax snippets, not design guidance.
Can I use data-quality-frameworks with only dbt?
Yes. Even though the skill mentions multiple frameworks, you can ask it to constrain recommendations to dbt only. The same applies if your team prefers Great Expectations or wants to focus on data contracts first.
Does it help with CI/CD decisions?
Yes. One of the clearer use cases in the source skill is automating validation in CI/CD. Ask explicitly which checks should fail pull requests, which should run post-deploy, and which should produce alerts only. That distinction materially improves the usefulness of the output.
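One way to encode that fail-vs-warn distinction in CI is a small gate script that only exits nonzero for blocking failures (the tier numbers, check names, and results list are assumptions for illustration; in a real pipeline the results would come from your test runner):

```python
# Illustrative CI gate: tier 1 failures block the pipeline, lower tiers warn.
import sys

results = [
    {"check": "null_primary_keys", "tier": 1, "passed": True},
    {"check": "freshness_under_2h", "tier": 2, "passed": False},
]

blocking = [r for r in results if r["tier"] == 1 and not r["passed"]]
warnings = [r for r in results if r["tier"] > 1 and not r["passed"]]

for w in warnings:
    print(f"WARN: {w['check']} failed (non-blocking)")

if blocking:
    print(f"FAIL: {len(blocking)} blocking check(s) failed")
    sys.exit(1)
print("CI gate passed")
```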
How to Improve data-quality-frameworks skill
Give the agent dataset semantics, not just schema
The fastest way to improve data-quality-frameworks results is to include meaning, not just columns. For example:
- “`customer_id` can be null for guest checkout”
- “`revenue_amount` should never be negative except for refunds”
- “`status` values are controlled by the application enum”
These details let the agent recommend realistic validity and consistency checks instead of generic ones.
Separate critical checks from nice-to-have checks
Tell the agent which failures are production blockers. Example:
Tier 1: schema drift, null primary keys, duplicate business keys.
Tier 2: freshness breaches over 2 hours.
Tier 3: soft anomaly detection on distribution shifts.
This helps the skill produce a plan your team can actually adopt instead of a long backlog that never ships.
Ask for framework mapping, not just a flat list
A common failure mode is getting 30 checks with no implementation path. Improve the prompt by requiring every check to include:
- dimension
- layer
- framework
- severity
- owner
That turns the data-quality-frameworks guide into an execution plan rather than an idea dump.
Provide sample rows and known bad cases
If you want better data-quality-frameworks usage, include examples of both valid and invalid data. Known failure examples help the agent write sharper rules around:
- edge-case nullability
- date ordering
- enum drift
- duplicate logic
- impossible value combinations
Real bad cases are often more informative than a perfect schema.
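Known-bad rows also double as regression fixtures for the rules the agent writes. A hypothetical sketch using the orders example (the statuses, date-ordering rule, and refund rule are illustrative assumptions):

```python
# Illustrative rules derived from known bad cases seen in production.
from datetime import datetime

VALID_STATUSES = {"pending", "paid", "shipped", "cancelled", "refunded"}

def rule_violations(row):
    """Check one row against rules motivated by real failure examples."""
    problems = []
    if row["order_status"] not in VALID_STATUSES:
        problems.append("enum drift")          # unexpected status value
    if row["created_at"] > row["updated_at"]:
        problems.append("date ordering")       # updated before created
    if row["order_status"] == "refunded" and row["order_total"] > 0:
        problems.append("impossible value combination")
    return problems

known_bad = {
    "order_status": "PAID",  # wrong casing once seen upstream
    "created_at": datetime(2024, 5, 2),
    "updated_at": datetime(2024, 5, 1),
    "order_total": 10.0,
}
print(rule_violations(known_bad))  # ['enum drift', 'date ordering']
```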
Iterate after the first output
Do not stop at the first generated plan. Ask follow-ups like:
- “Which 5 tests give the highest reliability per hour of work?”
- “Which recommendations belong in dbt versus contracts?”
- “Which checks are likely too expensive for every run?”
- “Rewrite this for BigQuery and incremental models.”
The data-quality-frameworks skill improves noticeably when used as a narrowing tool over two or three iterations.
Watch for common overdesign mistakes
The most common mistakes are:
- starting with expensive end-to-end assertions
- treating profiling as a substitute for hard guarantees
- mixing data cleaning logic with validation logic
- failing jobs on every anomaly, causing alert fatigue
- writing tests with no clear owner or remediation path
If you ask the agent to rank checks by cost, confidence, and operational impact, the output usually becomes much more deployable.
Ask for a phased rollout plan
A strong improvement prompt is:
Using `data-quality-frameworks`, create a 30/60/90-day rollout: immediate checks, next-layer business assertions, and longer-term contract governance.
This keeps teams from trying to implement every framework at once. In most cases, the best path is basic dbt tests first, then targeted Great Expectations, then broader contract discipline at team boundaries.
