
data-quality-frameworks

by wshobson

The data-quality-frameworks skill helps teams plan production data validation with dbt tests, Great Expectations, and data contracts. Use it to choose the right checks, map them to a testing pyramid, and guide CI/CD-ready data quality workflows for Data Cleaning and pipeline reliability.

Stars: 32.6k
Favorites: 0
Comments: 0
Added: Mar 30, 2026
Category: Data Cleaning
Install Command
npx skills add wshobson/agents --skill data-quality-frameworks
Curation Score

This skill scores 68/100: acceptable to list for directory users who want a substantial reference on data quality patterns, though they should expect to translate the guidance into their own environment rather than follow a tightly operationalized workflow. The repository evidence shows real content and clear triggers around Great Expectations, dbt tests, and data contracts, but lacks the install/runtime specifics, support files, and linked examples that would reduce execution guesswork further.

68/100
Strengths
  • Clear triggerability from frontmatter and "When to Use" guidance covering validation pipelines, dbt tests, data contracts, monitoring, and CI/CD.
  • Substantive documentation footprint: long SKILL.md with multiple sections, concepts, constraints, workflows, and code fences suggests real workflow content rather than a placeholder.
  • Useful cross-framework coverage: combines Great Expectations, dbt testing, and data contract patterns, giving agents a stronger starting point than a generic one-off prompt.
Cautions
  • Operational clarity is limited by missing support files, references, and repo/file links, so agents must infer implementation details for a specific stack.
  • No install command or executable assets are provided in the skill, which reduces confidence for quick adoption and reproducibility.
Overview

What the data-quality-frameworks skill does

The data-quality-frameworks skill helps an agent design practical data quality validation using three common approaches: dbt tests, Great Expectations, and data contracts. It is aimed at teams that need more than a vague “add data checks” prompt and want a structured way to decide what to test, where to test it, and how to operationalize those checks in pipelines and CI/CD.

Who should use data-quality-frameworks

This skill is best for data engineers, analytics engineers, platform teams, and technical leads building repeatable quality controls for tables, models, and pipeline interfaces. It is especially useful when you need data-quality-frameworks for Data Cleaning in a production context, not just one-off exploratory cleanup.

The real job to be done

Users rarely want a framework name by itself. They want to answer questions like:

  • Which quality dimensions matter for this dataset?
  • Should this check live in SQL, dbt, Great Expectations, or a contract?
  • What is the minimum viable test suite before production?
  • How do we prevent schema drift and bad upstream changes?

The data-quality-frameworks skill is most valuable when the goal is to translate business reliability needs into concrete validation patterns.

What differentiates this skill from a generic prompt

The repository content is stronger on decision structure than on automation. It gives a reusable mental model centered on:

  • core data quality dimensions
  • a testing pyramid for data
  • framework selection across dbt, Great Expectations, and contracts
  • production-oriented use cases such as CI/CD and monitoring

That makes it more useful than a generic “write some data checks” prompt, but it still expects you to provide your stack, schemas, and failure thresholds.

What to know before you install

This is a text-only skill with guidance in SKILL.md. There are no helper scripts, templates, or reference files in the skill folder. Adoption is easy because there is little setup, but output quality depends heavily on the inputs you provide. If you want copy-paste-ready configs without supplying table details, this skill will feel incomplete.

How to Use data-quality-frameworks skill

Install context for data-quality-frameworks

Install the skill from the wshobson/agents repository:

npx skills add https://github.com/wshobson/agents --skill data-quality-frameworks

Because the skill lives as a single SKILL.md, there is no extra local package setup inside the skill itself. The main setup work is in your own environment: dbt, Great Expectations, warehouse access, and any CI runner you use.

Read this file first

Start with:

  • plugins/data-engineering/skills/data-quality-frameworks/SKILL.md

Since the skill ships no supporting README, resources, or scripts, the fastest reading path is:

  1. When to Use This Skill
  2. Core Concepts
  3. Sections covering the testing pyramid and framework patterns
  4. Any implementation examples in code blocks

This is a short skill to consume, so the main gain comes from using it with a precise prompt, not from deep repository spelunking.

What input the skill needs from you

For strong data-quality-frameworks usage, give the agent:

  • dataset or model names
  • column list with types
  • expected grain or primary key
  • freshness expectations
  • allowed value ranges or enums
  • nullable vs required fields
  • known upstream/downstream dependencies
  • where checks should run: ingestion, transform, publish, or contract boundary
  • failure handling policy: warn, fail job, quarantine, alert

Without those details, the agent can only return generic examples like uniqueness, null, and range checks.

Turn a rough goal into a strong prompt

Weak prompt:

Help me add data quality checks.

Better prompt:

Use the data-quality-frameworks skill to design a validation plan for our orders pipeline. Source is raw event data loaded to BigQuery, transformed with dbt. Key fields: order_id, customer_id, order_status, order_total, created_at, updated_at. order_id must be unique at the mart layer. order_status must be one of pending, paid, shipped, cancelled, refunded. order_total must be >= 0. Freshness target is under 2 hours. We want: 1) source-level checks, 2) dbt tests, 3) any checks that fit Great Expectations, 4) a simple data contract for upstream producers, and 5) CI/CD recommendations with fail-vs-warn guidance.

That prompt works because it gives the skill enough context to map requirements to the right framework.
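Before committing to a framework, the requirements in that prompt can be expressed as plain validation functions. A plain-Python sketch of the orders rules (not dbt or Great Expectations syntax):

```python
from datetime import datetime, timedelta, timezone

VALID_STATUSES = {"pending", "paid", "shipped", "cancelled", "refunded"}

def check_unique_order_id(rows):
    """order_id must be unique at the mart layer."""
    ids = [r["order_id"] for r in rows]
    return len(ids) == len(set(ids))

def check_status_values(rows):
    """order_status must come from the controlled enum."""
    return all(r["order_status"] in VALID_STATUSES for r in rows)

def check_non_negative_total(rows):
    """order_total must be >= 0."""
    return all(r["order_total"] >= 0 for r in rows)

def check_freshness(rows, max_age_hours=2):
    """Newest created_at must be within the freshness target."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    return max(r["created_at"] for r in rows) >= cutoff
```

Each function maps cleanly onto a dbt test, a Great Expectations expectation, or a contract clause, which is exactly the mapping exercise the skill is designed for.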

How to ask for the right output format

Ask the agent to produce outputs in layers:

  1. quality dimensions by dataset
  2. testing pyramid placement
  3. concrete framework mapping
  4. sample test definitions
  5. rollout order

Example:

Using the data-quality-frameworks guide, return a table with columns: check, dimension, layer, framework, severity, reason. Then generate sample dbt tests and Great Expectations expectations only for the highest-value checks.

This reduces overengineering and keeps the first pass implementation-focused.
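The requested table maps naturally onto structured rows, which also makes it easy to filter down to the highest-value checks. An illustrative sketch with made-up entries:

```python
# Each row mirrors the requested columns: check, dimension, layer, framework, severity, reason
checks = [
    {"check": "order_id unique", "dimension": "uniqueness", "layer": "model",
     "framework": "dbt", "severity": "blocker", "reason": "grain guarantee"},
    {"check": "order_status in enum", "dimension": "validity", "layer": "model",
     "framework": "dbt", "severity": "blocker", "reason": "controlled vocabulary"},
    {"check": "order_total distribution stable", "dimension": "consistency",
     "layer": "pipeline", "framework": "great_expectations",
     "severity": "warn", "reason": "anomaly signal only"},
]

# First-pass implementation: only generate test code for blockers
top_priority = [c for c in checks if c["severity"] == "blocker"]
```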

Practical workflow for data-quality-frameworks usage

A good workflow is:

  1. Inventory your critical datasets.
  2. Identify the grain and contract surface.
  3. Classify checks by quality dimension.
  4. Place each check in the testing pyramid.
  5. Assign each check to dbt, Great Expectations, or a data contract.
  6. Decide which checks block deployments and which only alert.
  7. Implement the smallest reliable set first.

This skill is better for system design and validation planning than for brute-force generation of every possible test.

When to use dbt, Great Expectations, or contracts

Use the skill to separate concerns:

  • dbt fits model-level assertions like uniqueness, non-null, accepted values, and relationship tests.
  • Great Expectations fits richer validation workflows, profiling-style expectations, and runtime validation around pipeline stages.
  • Data contracts fit producer-consumer agreements such as schema shape, required fields, and semantic guarantees at boundaries.

A common mistake is forcing one tool to do everything. The data-quality-frameworks skill is most helpful when you use each framework for its natural layer.
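That separation of concerns can be sketched as a simple routing rule. The categories below are illustrative assumptions, not taken from the skill itself:

```python
def choose_framework(check_kind: str, boundary: str) -> str:
    """Route a check to the layer where it is cheapest to enforce."""
    if boundary == "producer":
        # Cross-team interface: enforce at the contract, not downstream
        return "data_contract"
    if check_kind in {"unique", "not_null", "accepted_values", "relationship"}:
        # Model-level structural assertions are dbt's sweet spot
        return "dbt"
    # Richer runtime validation and profiling-style expectations
    return "great_expectations"
```

A routing function like this keeps each tool in its natural layer instead of forcing one tool to do everything.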

What the testing pyramid means in practice

The skill’s testing pyramid is useful for prioritization. In practice:

  • put many cheap structural checks at lower levels
  • add fewer cross-table and business-rule checks at higher levels
  • reserve expensive end-to-end validation for the most critical paths

If your first plan contains only complex business assertions and no basic null, uniqueness, schema, or freshness checks, you are likely skipping the highest ROI layer.
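One way to operationalize that prioritization is to sort candidate checks by pyramid level first and cost second. A sketch with assumed level names and cost units:

```python
# Lower level = cheaper, more numerous checks (illustrative labels)
PYRAMID_LEVEL = {"structural": 0, "cross_table": 1, "end_to_end": 2}

def rollout_order(checks):
    """Cheap structural checks first, expensive end-to-end checks last."""
    return sorted(checks, key=lambda c: (PYRAMID_LEVEL[c["level"]], c["cost"]))

plan = rollout_order([
    {"name": "orders e2e reconciliation", "level": "end_to_end", "cost": 10},
    {"name": "order_id not null", "level": "structural", "cost": 1},
    {"name": "orders->customers join integrity", "level": "cross_table", "cost": 3},
])
```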

What this skill does well for Data Cleaning

For data-quality-frameworks for Data Cleaning, the skill is best used to define ongoing validation after cleaning logic is introduced. It helps answer:

  • which bad inputs should be blocked
  • which values should be standardized
  • which anomalies should trigger review instead of pipeline failure
  • how to ensure cleaned outputs stay conformant over time

It is less about cleaning transformations themselves and more about proving those transformations produce trustworthy outputs.

Constraints and adoption tradeoffs

This skill has low installation friction but limited built-in implementation assets. Expect to do your own translation into project files such as:

  • models/*.yml for dbt
  • expectation suites or checkpoints for Great Expectations
  • contract documents in your preferred schema format

If you need ready-made templates, this skill is lighter-weight than that: its value is in helping an agent reason correctly, not in shipping a turnkey starter kit.
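As a sense of that translation work, a minimal contract document could be sketched as a dictionary checked at the boundary. The shape and field names here are assumptions, not any particular contract standard:

```python
# Hypothetical contract for the orders interface
ORDERS_CONTRACT = {
    "dataset": "orders",
    "fields": {
        "order_id": {"type": "string", "required": True},
        "order_status": {"type": "string", "required": True},
        "order_total": {"type": "number", "required": False},
    },
}

def conforms(row, contract):
    """Check one row against required fields and declared types."""
    for name, spec in contract["fields"].items():
        if spec["required"] and row.get(name) is None:
            return False
        if name in row and row[name] is not None:
            ok = (isinstance(row[name], str) if spec["type"] == "string"
                  else isinstance(row[name], (int, float)))
            if not ok:
                return False
    return True
```

In practice you would express this in your preferred schema format (JSON Schema, Avro, a dbt model contract); the point is that the agreement is checkable code, not prose.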

data-quality-frameworks skill FAQ

Is data-quality-frameworks good for beginners?

Yes, if you already understand basic tables, columns, and pipelines. The concepts are approachable: quality dimensions, test layering, and framework selection. Absolute beginners may still need separate documentation for dbt or Great Expectations syntax because the skill is not a full tutorial for either tool.

Is this better than an ordinary prompt?

Usually yes, when your problem is framework choice and test strategy. A normal prompt may generate random checks. The data-quality-frameworks skill gives the agent a more disciplined structure: dimensions, pyramid, and framework fit. That usually leads to fewer irrelevant tests.

What is the main limitation?

The skill does not include helper files, implementation templates, or project-specific adapters. It cannot infer your warehouse semantics, SLAs, or business rules unless you provide them. The quality of the result is tightly tied to the specificity of your prompt.

When should I not use data-quality-frameworks?

Skip it if you only need a one-line check for a single CSV or a quick ad hoc cleanup script. It is also a weak fit if your team has already standardized fully on one framework and only needs syntax snippets, not design guidance.

Can I use data-quality-frameworks with only dbt?

Yes. Even though the skill mentions multiple frameworks, you can ask it to constrain recommendations to dbt only. The same applies if your team prefers Great Expectations or wants to focus on data contracts first.

Does it help with CI/CD decisions?

Yes. One of the clearer use cases in the source skill is automating validation in CI/CD. Ask explicitly which checks should fail pull requests, which should run post-deploy, and which should produce alerts only. That distinction materially improves the usefulness of the output.

How to Improve data-quality-frameworks skill

Give the agent dataset semantics, not just schema

The fastest way to improve data-quality-frameworks results is to include meaning, not just columns. For example:

  • “customer_id can be null for guest checkout”
  • “revenue_amount should never be negative except for refunds”
  • “status values are controlled by the application enum”

These details let the agent recommend realistic validity and consistency checks instead of generic ones.

Separate critical checks from nice-to-have checks

Tell the agent which failures are production blockers. Example:

Tier 1: schema drift, null primary keys, duplicate business keys.
Tier 2: freshness breaches over 2 hours.
Tier 3: soft anomaly detection on distribution shifts.
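Those tiers can map directly onto CI/CD behavior. An illustrative mapping with assumed action names:

```python
# Tier -> what the pipeline does when the check fails (illustrative)
TIER_ACTION = {
    1: "fail_build",      # schema drift, null/duplicate keys block deploys
    2: "fail_job_alert",  # freshness breach fails the run and alerts
    3: "alert_only",      # distribution shifts trigger review, never block
}

def action_for(tier: int) -> str:
    # Unknown tiers default to the softest action to avoid surprise outages
    return TIER_ACTION.get(tier, "alert_only")
```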

This helps the skill produce a plan your team can actually adopt instead of a long backlog that never ships.

Ask for framework mapping, not just a flat list

A common failure mode is getting 30 checks with no implementation path. Improve the prompt by requiring every check to include:

  • dimension
  • layer
  • framework
  • severity
  • owner

That turns the data-quality-frameworks guide into an execution plan rather than an idea dump.

Provide sample rows and known bad cases

If you want better data-quality-frameworks usage, include examples of both valid and invalid data. Known failure examples help the agent write sharper rules around:

  • edge-case nullability
  • date ordering
  • enum drift
  • duplicate logic
  • impossible value combinations

Real bad cases are often more informative than a perfect schema.
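Known bad rows can double as regression tests for whatever rules the agent proposes. A minimal sketch; the rules and field names are illustrative:

```python
def validate_order(row):
    """Return the names of rules this row violates (illustrative rules)."""
    problems = []
    if row.get("order_id") is None:
        problems.append("null_primary_key")
    if row.get("order_total", 0) < 0 and row.get("order_status") != "refunded":
        problems.append("negative_total")
    created, updated = row.get("created_at"), row.get("updated_at")
    if created is not None and updated is not None and updated < created:
        problems.append("date_ordering")
    return problems

# A real bad row from production makes a sharper test than any schema
known_bad = {"order_id": None, "order_total": -5, "order_status": "paid",
             "created_at": 2, "updated_at": 1}
```

Running every proposed rule against a handful of known-bad rows quickly shows which checks actually catch the failures you care about.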

Iterate after the first output

Do not stop at the first generated plan. Ask follow-ups like:

  • “Which 5 tests give the highest reliability per hour of work?”
  • “Which recommendations belong in dbt versus contracts?”
  • “Which checks are likely too expensive for every run?”
  • “Rewrite this for BigQuery and incremental models.”

The data-quality-frameworks skill improves noticeably when used as a narrowing tool over two or three iterations.

Watch for common overdesign mistakes

The most common mistakes are:

  • starting with expensive end-to-end assertions
  • treating profiling as a substitute for hard guarantees
  • mixing data cleaning logic with validation logic
  • failing jobs on every anomaly, causing alert fatigue
  • writing tests with no clear owner or remediation path

If you ask the agent to rank checks by cost, confidence, and operational impact, the output usually becomes much more deployable.

Ask for a phased rollout plan

A strong improvement prompt is:

Using data-quality-frameworks, create a 30/60/90-day rollout: immediate checks, next-layer business assertions, and longer-term contract governance.

This keeps teams from trying to implement every framework at once. In most cases, the best path is basic dbt tests first, then targeted Great Expectations, then broader contract discipline at team boundaries.

Ratings & Reviews

No ratings yet