data-quality-frameworks
by wshobson

The data-quality-frameworks skill helps teams plan production data validation with dbt tests, Great Expectations, and data contracts. Use it to choose the right checks, map them to a testing pyramid, and guide CI/CD-ready data quality workflows for Data Cleaning and pipeline reliability.
This skill scores 68/100, which means it is acceptable to list for directory users who want a substantial reference on data quality patterns, but they should expect to translate the guidance into their own environment rather than follow a tightly operationalized workflow. The repository evidence shows real content and clear triggers around Great Expectations, dbt tests, and data contracts, yet lacks install/runtime specifics, support files, or linked examples that would reduce execution guesswork further.
- Clear triggerability from frontmatter and "When to Use" guidance covering validation pipelines, dbt tests, data contracts, monitoring, and CI/CD.
- Substantive documentation footprint: long SKILL.md with multiple sections, concepts, constraints, workflows, and code fences suggests real workflow content rather than a placeholder.
- Useful cross-framework coverage: combines Great Expectations, dbt testing, and data contract patterns, giving agents a stronger starting point than a generic one-off prompt.
- Operational clarity is limited by missing support files, references, and repo/file links, so agents must infer implementation details for a specific stack.
- No install command or executable assets are provided in the skill, which reduces confidence for quick adoption and reproducibility.
Overview of data-quality-frameworks skill
What the data-quality-frameworks skill does
The data-quality-frameworks skill helps an agent design practical data quality validation using three common approaches: dbt tests, Great Expectations, and data contracts. It is aimed at teams that need more than a vague “add data checks” prompt and want a structured way to decide what to test, where to test it, and how to operationalize those checks in pipelines and CI/CD.
Who should use data-quality-frameworks
This skill is best for data engineers, analytics engineers, platform teams, and technical leads building repeatable quality controls for tables, models, and pipeline interfaces. It is especially useful when you need data-quality-frameworks for Data Cleaning in a production context, not just one-off exploratory cleanup.
The real job to be done
Users rarely want a framework name by itself. They want to answer questions like:
- Which quality dimensions matter for this dataset?
- Should this check live in SQL, dbt, Great Expectations, or a contract?
- What is the minimum viable test suite before production?
- How do we prevent schema drift and bad upstream changes?
The data-quality-frameworks skill is most valuable when the goal is to translate business reliability needs into concrete validation patterns.
What differentiates this skill from a generic prompt
The repository content is stronger on decision structure than on automation. It gives a reusable mental model centered on:
- core data quality dimensions
- a testing pyramid for data
- framework selection across dbt, Great Expectations, and contracts
- production-oriented use cases such as CI/CD and monitoring
That makes it more useful than a generic “write some data checks” prompt, but it still expects you to provide your stack, schemas, and failure thresholds.
What to know before you install
This is a text-only skill with guidance in SKILL.md. There are no helper scripts, templates, or reference files in the skill folder. Adoption is easy because there is little setup, but output quality depends heavily on the inputs you provide. If you want copy-paste-ready configs without supplying table details, this skill will feel incomplete.
How to Use data-quality-frameworks skill
Install context for data-quality-frameworks
Install the skill from the wshobson/agents repository:
npx skills add https://github.com/wshobson/agents --skill data-quality-frameworks
Because the skill lives as a single SKILL.md, there is no extra local package setup inside the skill itself. The main setup work is in your own environment: dbt, Great Expectations, warehouse access, and any CI runner you use.
Read this file first
Start with:
plugins/data-engineering/skills/data-quality-frameworks/SKILL.md
Since there are no supporting README, resources, or scripts, the fastest reading path is:
- When to Use This Skill
- Core Concepts
- sections covering the testing pyramid and framework patterns
- any implementation examples in code blocks
This is a short skill to consume, so the main gain comes from using it with a precise prompt, not from deep repository spelunking.
What input the skill needs from you
For strong data-quality-frameworks usage, give the agent:
- dataset or model names
- column list with types
- expected grain or primary key
- freshness expectations
- allowed value ranges or enums
- nullable vs required fields
- known upstream/downstream dependencies
- where checks should run: ingestion, transform, publish, or contract boundary
- failure handling policy: warn, fail job, quarantine, alert
Without those details, the agent can only return generic examples like uniqueness, null, and range checks.
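Those generic fallbacks are easy to picture. A minimal Python sketch of what uniqueness, null, and range checks amount to (the `orders` rows and column names here are hypothetical examples, not output from the skill):

```python
# Illustrative only: generic uniqueness, null, and range checks
# over in-memory rows. Column names are hypothetical examples.

def check_unique(rows, key):
    """Return the duplicated key values, sorted."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

def check_not_null(rows, column):
    """Return the count of null (None) values in a column."""
    return sum(1 for row in rows if row[column] is None)

def check_range(rows, column, low):
    """Return rows where the column falls below an allowed minimum."""
    return [row for row in rows if row[column] is not None and row[column] < low]

orders = [
    {"order_id": 1, "order_total": 25.0},
    {"order_id": 2, "order_total": -5.0},
    {"order_id": 2, "order_total": None},
]

print(check_unique(orders, "order_id"))       # [2]
print(check_not_null(orders, "order_total"))  # 1
print(check_range(orders, "order_total", 0))  # the -5.0 row
```

Useful as far as it goes, but notice that nothing here encodes business meaning; that is exactly the gap your dataset details fill.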
Turn a rough goal into a strong prompt
Weak prompt:
Help me add data quality checks.
Better prompt:
Use the `data-quality-frameworks` skill to design a validation plan for our `orders` pipeline. Source is raw event data loaded to BigQuery, transformed with dbt. Key fields: `order_id`, `customer_id`, `order_status`, `order_total`, `created_at`, `updated_at`. `order_id` must be unique at the mart layer. `order_status` must be one of `pending`, `paid`, `shipped`, `cancelled`, `refunded`. `order_total` must be >= 0. Freshness target is under 2 hours. We want: 1) source-level checks, 2) dbt tests, 3) any checks that fit Great Expectations, 4) a simple data contract for upstream producers, and 5) CI/CD recommendations with fail-vs-warn guidance.
That prompt works because it gives the skill enough context to map requirements to the right framework.
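For a prompt like that, the dbt portion of the answer would typically land in a `schema.yml`. A sketch of what that might look like, using standard dbt test syntax (the model name and values come from the prompt; the `dbt_utils.accepted_range` test assumes the separate dbt-utils package is installed):

```
# models/marts/schema.yml -- illustrative dbt tests for the orders prompt
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              values: ['pending', 'paid', 'shipped', 'cancelled', 'refunded']
      - name: order_total
        tests:
          - dbt_utils.accepted_range:   # requires the dbt-utils package
              min_value: 0
```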
How to ask for the right output format
Ask the agent to produce outputs in layers:
- quality dimensions by dataset
- testing pyramid placement
- concrete framework mapping
- sample test definitions
- rollout order
Example:
Using the `data-quality-frameworks` guide, return a table with columns: `check`, `dimension`, `layer`, `framework`, `severity`, `reason`. Then generate sample dbt tests and Great Expectations expectations only for the highest-value checks.
This reduces overengineering and keeps the first pass implementation-focused.
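If you want to post-process that table yourself, each row maps naturally onto a small record. A hypothetical sketch showing how the requested columns support filtering down to a first-pass plan (check names, layers, and severities are invented for illustration):

```python
# Illustrative check records using the requested columns.
checks = [
    {"check": "order_id unique", "dimension": "uniqueness", "layer": "mart",
     "framework": "dbt", "severity": "fail", "reason": "primary key guarantee"},
    {"check": "order_status in enum", "dimension": "validity", "layer": "mart",
     "framework": "dbt", "severity": "fail", "reason": "application enum"},
    {"check": "row count vs yesterday", "dimension": "completeness", "layer": "source",
     "framework": "great_expectations", "severity": "warn", "reason": "volume anomaly"},
]

# Keep only the blocking checks for the first implementation pass.
first_pass = [c for c in checks if c["severity"] == "fail"]
print([c["check"] for c in first_pass])
```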
Practical workflow for data-quality-frameworks usage
A good workflow is:
- Inventory your critical datasets.
- Identify the grain and contract surface.
- Classify checks by quality dimension.
- Place each check in the testing pyramid.
- Assign each check to dbt, Great Expectations, or a data contract.
- Decide which checks block deployments and which only alert.
- Implement the smallest reliable set first.
This skill is better for system design and validation planning than for brute-force generation of every possible test.
When to use dbt, Great Expectations, or contracts
Use the skill to separate concerns:
- dbt fits model-level assertions like uniqueness, non-null, accepted values, and relationship tests.
- Great Expectations fits richer validation workflows, profiling-style expectations, and runtime validation around pipeline stages.
- Data contracts fit producer-consumer agreements such as schema shape, required fields, and semantic guarantees at boundaries.
A common mistake is forcing one tool to do everything. The data-quality-frameworks skill is most helpful when you use each framework for its natural layer.
What the testing pyramid means in practice
The skill’s testing pyramid is useful for prioritization. In practice:
- put many cheap structural checks at lower levels
- add fewer cross-table and business-rule checks at higher levels
- reserve expensive end-to-end validation for the most critical paths
If your first plan contains only complex business assertions and no basic null, uniqueness, schema, or freshness checks, you are likely skipping the highest ROI layer.
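The prioritization above can be sketched as a simple ordering: assign each check a pyramid level and always run the cheap structural layer first (the check names and level labels here are hypothetical):

```python
# Illustrative testing-pyramid ordering: cheaper, lower-level checks run first.
PYRAMID_LEVELS = {"structural": 0, "cross_table": 1, "end_to_end": 2}

checks = [
    {"name": "orders_vs_payments_reconciliation", "level": "end_to_end"},
    {"name": "order_id_not_null", "level": "structural"},
    {"name": "customer_fk_exists", "level": "cross_table"},
    {"name": "order_id_unique", "level": "structural"},
]

# Sort is stable, so checks within a level keep their original order.
run_order = sorted(checks, key=lambda c: PYRAMID_LEVELS[c["level"]])
print([c["name"] for c in run_order])
```

The point of the ordering is operational: if the structural layer fails, there is usually no value in paying for the expensive end-to-end layer on that run.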
What this skill does well for Data Cleaning
For data-quality-frameworks for Data Cleaning, the skill is best used to define ongoing validation after cleaning logic is introduced. It helps answer:
- which bad inputs should be blocked
- which values should be standardized
- which anomalies should trigger review instead of pipeline failure
- how to ensure cleaned outputs stay conformant over time
It is less about cleaning transformations themselves and more about proving those transformations produce trustworthy outputs.
Constraints and adoption tradeoffs
This skill has low installation friction but limited built-in implementation assets. Expect to do your own translation into project files such as:
- `models/*.yml` for dbt
- expectation suites or checkpoints for Great Expectations
- contract documents in your preferred schema format
If you need a repository with ready-made templates, this skill is lighter-weight than that. Its value is in helping an agent reason correctly, not in shipping a turnkey starter kit.
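As a sense of the translation work involved, a contract document ultimately boils down to something a validator can enforce. A minimal Python sketch of checking a producer record against required fields and types (field names and the contract shape are assumptions for illustration, not a format the skill prescribes):

```python
# Illustrative data contract check: required fields and expected types.
CONTRACT = {
    "order_id": int,
    "order_status": str,
    "order_total": float,
}

def violations(record, contract):
    """Return human-readable contract violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems

good = {"order_id": 1, "order_status": "paid", "order_total": 10.0}
bad = {"order_id": "1", "order_status": "paid"}
print(violations(good, CONTRACT))  # []
print(violations(bad, CONTRACT))   # type mismatch + missing field
```

In practice you would express this in a shared schema format (JSON Schema, protobuf, dbt model contracts) rather than hand-rolled Python, but the enforcement logic is the same shape.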
data-quality-frameworks skill FAQ
Is data-quality-frameworks good for beginners?
Yes, if you already understand basic tables, columns, and pipelines. The concepts are approachable: quality dimensions, test layering, and framework selection. Absolute beginners may still need separate documentation for dbt or Great Expectations syntax because the skill is not a full tutorial for either tool.
Is this better than an ordinary prompt?
Usually yes, when your problem is framework choice and test strategy. A normal prompt may generate random checks. The data-quality-frameworks skill gives the agent a more disciplined structure: dimensions, pyramid, and framework fit. That usually leads to fewer irrelevant tests.
What is the main limitation?
The skill does not include helper files, implementation templates, or project-specific adapters. It cannot infer your warehouse semantics, SLAs, or business rules unless you provide them. The quality of the result is tightly tied to the specificity of your prompt.
When should I not use data-quality-frameworks?
Skip it if you only need a one-line check for a single CSV or a quick ad hoc cleanup script. It is also a weak fit if your team has already standardized fully on one framework and only needs syntax snippets, not design guidance.
Can I use data-quality-frameworks with only dbt?
Yes. Even though the skill mentions multiple frameworks, you can ask it to constrain recommendations to dbt only. The same applies if your team prefers Great Expectations or wants to focus on data contracts first.
Does it help with CI/CD decisions?
Yes. One of the clearer use cases in the source skill is automating validation in CI/CD. Ask explicitly which checks should fail pull requests, which should run post-deploy, and which should produce alerts only. That distinction materially improves the usefulness of the output.
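One way to encode that fail-vs-warn distinction in CI is a small gate script that only exits nonzero for blocking failures (the tier numbers, check names, and results list are assumptions for illustration; in a real pipeline the results would come from your test runner):

```python
# Illustrative CI gate: tier 1 failures block the pipeline, lower tiers warn.
import sys

results = [
    {"check": "null_primary_keys", "tier": 1, "passed": True},
    {"check": "freshness_under_2h", "tier": 2, "passed": False},
]

blocking = [r for r in results if r["tier"] == 1 and not r["passed"]]
warnings = [r for r in results if r["tier"] > 1 and not r["passed"]]

for w in warnings:
    print(f"WARN: {w['check']} failed (non-blocking)")

if blocking:
    print(f"FAIL: {len(blocking)} blocking check(s) failed")
    sys.exit(1)
print("CI gate passed")
```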
How to Improve data-quality-frameworks skill
Give the agent dataset semantics, not just schema
The fastest way to improve data-quality-frameworks results is to include meaning, not just columns. For example:
- “`customer_id` can be null for guest checkout”
- “`revenue_amount` should never be negative except for refunds”
- “`status` values are controlled by the application enum”
These details let the agent recommend realistic validity and consistency checks instead of generic ones.
Separate critical checks from nice-to-have checks
Tell the agent which failures are production blockers. Example:
Tier 1: schema drift, null primary keys, duplicate business keys.
Tier 2: freshness breaches over 2 hours.
Tier 3: soft anomaly detection on distribution shifts.
This helps the skill produce a plan your team can actually adopt instead of a long backlog that never ships.
Ask for framework mapping, not just a flat list
A common failure mode is getting 30 checks with no implementation path. Improve the prompt by requiring every check to include:
- dimension
- layer
- framework
- severity
- owner
That turns the data-quality-frameworks guide into an execution plan rather than an idea dump.
Provide sample rows and known bad cases
If you want better data-quality-frameworks usage, include examples of both valid and invalid data. Known failure examples help the agent write sharper rules around:
- edge-case nullability
- date ordering
- enum drift
- duplicate logic
- impossible value combinations
Real bad cases are often more informative than a perfect schema.
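Known-bad rows also double as regression fixtures for the rules the agent writes. A hypothetical sketch using the orders example (the statuses, date-ordering rule, and refund rule are illustrative assumptions):

```python
# Illustrative rules derived from known bad cases seen in production.
from datetime import datetime

VALID_STATUSES = {"pending", "paid", "shipped", "cancelled", "refunded"}

def rule_violations(row):
    """Check one row against rules motivated by real failure examples."""
    problems = []
    if row["order_status"] not in VALID_STATUSES:
        problems.append("enum drift")          # unexpected status value
    if row["created_at"] > row["updated_at"]:
        problems.append("date ordering")       # updated before created
    if row["order_status"] == "refunded" and row["order_total"] > 0:
        problems.append("impossible value combination")
    return problems

known_bad = {
    "order_status": "PAID",  # wrong casing once seen upstream
    "created_at": datetime(2024, 5, 2),
    "updated_at": datetime(2024, 5, 1),
    "order_total": 10.0,
}
print(rule_violations(known_bad))  # ['enum drift', 'date ordering']
```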
Iterate after the first output
Do not stop at the first generated plan. Ask follow-ups like:
- “Which 5 tests give the highest reliability per hour of work?”
- “Which recommendations belong in dbt versus contracts?”
- “Which checks are likely too expensive for every run?”
- “Rewrite this for BigQuery and incremental models.”
The data-quality-frameworks skill improves noticeably when used as a narrowing tool over two or three iterations.
Watch for common overdesign mistakes
The most common mistakes are:
- starting with expensive end-to-end assertions
- treating profiling as a substitute for hard guarantees
- mixing data cleaning logic with validation logic
- failing jobs on every anomaly, causing alert fatigue
- writing tests with no clear owner or remediation path
If you ask the agent to rank checks by cost, confidence, and operational impact, the output usually becomes much more deployable.
Ask for a phased rollout plan
A strong improvement prompt is:
Using `data-quality-frameworks`, create a 30/60/90-day rollout: immediate checks, next-layer business assertions, and longer-term contract governance.
This keeps teams from trying to implement every framework at once. In most cases, the best path is basic dbt tests first, then targeted Great Expectations, then broader contract discipline at team boundaries.
