dummy-dataset
by phuryn

dummy-dataset generates realistic test data in CSV, JSON, SQL, or Python script form. It helps with mock datasets, demos, database seeding, QA, and data cleaning by letting you define columns, row counts, and constraints for believable sample records.
This skill scores 68/100, which means it is acceptable to list but should be presented with caveats. Directory users get a clearly stated purpose, usable arguments, and a step-by-step generation workflow that should help an agent trigger it with less guesswork than a generic prompt. However, it appears limited to a single SKILL.md with no supporting scripts or references, so adoption confidence is moderate rather than strong.
- Clear trigger and use case: generate realistic dummy datasets for testing, demos, and development.
- Operational structure is explicit, with named arguments for product, dataset type, rows, columns, format, and constraints.
- Step-by-step workflow plus output formats (CSV, JSON, SQL, Python script) gives agents a concrete execution path.
- Repository evidence shows no supporting scripts, references, or resources, so trust and depth are limited to the prompt text.
- Experimental/test-like signals suggest it is best suited for sample-data tasks, not production-grade data generation workflows.
Overview of dummy-dataset skill
What dummy-dataset does
The dummy-dataset skill helps you generate realistic test data fast: CSV, JSON, SQL, or a Python script that can produce the data later. It is best for people who need believable sample records for QA, demos, seed data, or a prototype pipeline—not just random filler. The real value of the dummy-dataset skill is that it lets you describe the domain, columns, row count, and constraints so the output is usable instead of obviously synthetic.
When this skill is the right fit
Use dummy-dataset for data cleaning, product testing, analytics mockups, form validation, and database seeding when you need data that looks coherent across fields. It is a strong fit if you care about relationships such as dates, categories, IDs, or realistic ranges. It is less useful if you only need one-off toy examples or if your task depends on a real schema already available from production.
What makes it different
Unlike a generic prompt, the dummy-dataset skill is oriented around output format and constraints from the start. That matters when you need data you can actually import or execute, not just read. The main decision point is whether you want directly usable files or a reproducible generation script; this skill supports both.
How to Use dummy-dataset skill
Install dummy-dataset
Install the dummy-dataset skill in your skills environment with:
npx skills add phuryn/pm-skills --skill dummy-dataset
After install, open the skill file first so you understand the expected inputs and output styles before you prompt it in a larger workflow.
Read the right files first
Start with SKILL.md, then check README.md, AGENTS.md, metadata.json, and any rules/, resources/, references/, or scripts/ folders if they exist in your environment. For this repository, SKILL.md is the main source of truth because the skill is compact and does not rely on support files. If you are using dummy-dataset for a real workflow, read the generation template and example sections before asking for final output.
Give a prompt the skill can execute
A good dummy-dataset usage request should include the dataset purpose, fields, row count, format, and constraints. For example: “Generate a 500-row dummy-dataset for a SaaS billing app with columns for customer_id, plan, signup_date, churned, and MRR in CSV format; keep IDs unique, dates within the last 18 months, and churned consistent with subscription status.” That is much better than “make sample data,” because it gives the skill enough structure to keep the dataset plausible.
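To illustrate the kind of script a prompt like that could yield, here is a minimal Python sketch (the plan names, plan prices, churn rate, and file name are illustrative assumptions, not the skill's actual output):

```python
import csv
import random
from datetime import date, timedelta

random.seed(42)               # fixed seed so the sample is reproducible
TODAY = date(2025, 1, 1)      # fixed "today" so date ranges are deterministic
PLAN_MRR = {"free": 0, "starter": 29, "pro": 99, "enterprise": 499}

rows = []
for i in range(500):
    plan = random.choice(list(PLAN_MRR))
    signup = TODAY - timedelta(days=random.randint(0, 547))  # within ~18 months
    churned = random.random() < 0.2
    rows.append({
        "customer_id": f"CUST-{i:05d}",  # zero-padded counter guarantees uniqueness
        "plan": plan,
        "signup_date": signup.isoformat(),
        "churned": churned,
        "mrr": 0 if churned else PLAN_MRR[plan],  # churned customers carry no MRR
    })

with open("billing_sample.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

Note how each constraint from the prompt maps to one line of code: the ID format enforces uniqueness, the `timedelta` bound enforces the 18-month window, and the MRR expression keeps churn and billing consistent.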
Best workflow for output quality
Use the skill in two passes: first define the dataset spec, then refine the output after checking whether the fields and constraints are realistic. If you need dummy-dataset for data cleaning, ask for edge cases intentionally, such as missing values, duplicates, malformed emails, or inconsistent date formats. If you need a script, ask for the language and execution context up front so the output matches your tooling.
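That controlled-defect pass can be sketched in Python; the `inject_defects` helper, field names, and defect rates below are illustrative assumptions rather than anything the skill prescribes:

```python
import random

random.seed(7)  # deterministic defect placement

def inject_defects(rows, missing_rate=0.05, duplicate_rate=0.02):
    """Copy a clean dataset and add the defects a cleaning pipeline
    should catch: missing values, malformed emails, duplicate rows."""
    dirty = [dict(r) for r in rows]
    for row in dirty:
        roll = random.random()
        if roll < missing_rate:
            row["email"] = None                               # missing value
        elif roll < missing_rate * 2:
            row["email"] = row["email"].replace("@", " at ")  # malformed address
    # append exact duplicates of a few rows
    n_dupes = max(1, int(len(dirty) * duplicate_rate))
    for row in random.sample(dirty, n_dupes):
        dirty.append(dict(row))
    return dirty

clean = [{"id": i, "email": f"user{i}@example.com"} for i in range(100)]
dirty = inject_defects(clean)
```

Keeping the defect rates as parameters means a second refinement pass only has to adjust numbers, not regenerate the whole dataset from scratch.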
dummy-dataset skill FAQ
Is dummy-dataset good for production-like test data?
Yes, if you need believable mock records with controlled structure. The dummy-dataset skill is useful when downstream tools depend on field consistency, but it is still synthetic data, so it should not be treated as real user data or as a statistical model of your business.
Do I need programming knowledge to use it?
No. Beginners can use dummy-dataset by describing the dataset in plain language and specifying the format they want. More precise inputs improve results, but you do not need to write code unless you want a Python script or SQL insert output.
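If you do ask for SQL insert output, the generation step often amounts to a small rendering loop. This hypothetical sketch (the `rows_to_sql` helper and table name are assumptions for illustration) shows the shape; real pipelines should prefer parameterized queries over string-built SQL:

```python
def rows_to_sql(table, rows):
    """Render generated rows as SQL INSERT statements.
    Strings are single-quoted with quotes doubled; None becomes NULL."""
    stmts = []
    for row in rows:
        cols = ", ".join(row)
        vals = ", ".join(
            "NULL" if v is None
            else str(v) if isinstance(v, (int, float))
            else "'" + str(v).replace("'", "''") + "'"
            for v in row.values()
        )
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return stmts

stmts = rows_to_sql("customers", [{"id": 1, "name": "Ada"}])
```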
When should I not use this skill?
Do not use dummy-dataset when you need anonymization of real records, legally compliant synthetic data generation, or an exact copy of a production schema with sensitive constraints. In those cases, a dedicated data pipeline or privacy-aware tooling may be a better fit than a prompt-driven dummy-dataset guide.
Is it better than a normal prompt?
Usually yes, because the dummy-dataset skill pushes you to define columns, business rules, and output format together. A normal prompt often misses one of those pieces, which leads to data that looks okay at a glance but fails during import, testing, or validation.
How to Improve dummy-dataset skill
Provide a tighter dataset spec
The biggest quality gain comes from specifying the domain in terms of fields and rules, not just a theme. Instead of “generate customer data,” ask for concrete fields like customer_id, segment, signup_date, lifetime_value, and status, plus rules such as “lifetime_value should vary by segment” or “signup_date cannot be in the future.” This makes the dummy-dataset skill much more reliable.
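A tighter spec of that kind translates directly into generation rules. As a sketch, assuming hypothetical segment names and value ranges (the `SEGMENT_LTV` table below is invented for illustration):

```python
import random
from datetime import date, timedelta

random.seed(0)
TODAY = date(2025, 1, 1)  # fixed reference date: signup_date can never be in the future

# Rule: lifetime_value varies by segment (illustrative ranges)
SEGMENT_LTV = {"smb": (100, 2_000), "mid": (2_000, 20_000), "enterprise": (20_000, 250_000)}

def make_customer(i):
    segment = random.choice(list(SEGMENT_LTV))
    low, high = SEGMENT_LTV[segment]
    return {
        "customer_id": f"C{i:06d}",
        "segment": segment,
        "signup_date": (TODAY - timedelta(days=random.randint(0, 1_000))).isoformat(),
        "lifetime_value": round(random.uniform(low, high), 2),
        "status": random.choice(["active", "churned", "trial"]),
    }

customers = [make_customer(i) for i in range(1_000)]
```

Because each business rule lives in one place (the segment table, the date offset bound), fixing a bad rule later means changing one line rather than re-describing the whole dataset.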
Add the constraints that matter downstream
If you plan to clean, validate, or import the data, say what must be true after generation. Mention uniqueness, null rates, date ranges, allowed enums, foreign-key style relationships, and format requirements. For data-cleaning work, request controlled errors on purpose so the dataset actually exercises your cleaning logic.
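One way to make those downstream constraints concrete is a small validation pass over the generated rows. The `validate` helper and its specific rules below are a hypothetical sketch, not part of the skill:

```python
def validate(rows, allowed_status=("active", "churned", "trial"), max_null_rate=0.1):
    """Check downstream constraints on a generated dataset and
    return a list of human-readable violations (empty list = pass)."""
    problems = []
    ids = [r["customer_id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("customer_id values are not unique")
    bad_status = {r["status"] for r in rows} - set(allowed_status)
    if bad_status:
        problems.append(f"unexpected status values: {sorted(bad_status)}")
    null_rate = sum(1 for r in rows if r.get("email") is None) / len(rows)
    if null_rate > max_null_rate:
        problems.append(f"email null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return problems

# A deliberately broken sample: duplicate ID, unknown status, missing email
sample = [
    {"customer_id": "C1", "status": "active", "email": "a@example.com"},
    {"customer_id": "C1", "status": "paused", "email": None},
]
issues = validate(sample)
```

Pasting the violation list back into your next prompt is an efficient way to request targeted fixes.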
Iterate from defects, not preferences
After the first output, focus your revision on what broke the workflow: bad column names, unrealistic ranges, missing edge cases, or a format that is hard to load. Then ask for a corrected dummy-dataset version with one or two specific changes instead of restating the whole request. That keeps the output practical and prevents overfitting to cosmetic details.
