dummy-dataset
by phuryn

dummy-dataset generates realistic test data in CSV, JSON, SQL, or Python script form. It helps with mock datasets, demos, database seeding, QA, and data cleaning by letting you define columns, row counts, and constraints for believable sample records.
This skill scores 68/100, which means it is acceptable to list but should be presented with caveats. Directory users get a clearly stated purpose, usable arguments, and a step-by-step generation workflow that should help an agent trigger it with less guesswork than a generic prompt. However, it appears limited to a single SKILL.md with no supporting scripts or references, so adoption confidence is moderate rather than strong.
- Clear trigger and use case: generate realistic dummy datasets for testing, demos, and development.
- Operational structure is explicit, with named arguments for product, dataset type, rows, columns, format, and constraints.
- Step-by-step workflow plus output formats (CSV, JSON, SQL, Python script) gives agents a concrete execution path.
- Repository evidence shows no supporting scripts, references, or resources, so trust and depth are limited to the prompt text.
- Experimental/test-like signals suggest it is best suited for sample-data tasks, not production-grade data generation workflows.
Overview of dummy-dataset skill
What dummy-dataset does
The dummy-dataset skill helps you generate realistic test data fast: CSV, JSON, SQL, or a Python script that can produce the data later. It is best for people who need believable sample records for QA, demos, seed data, or a prototype pipeline—not just random filler. The real value of the dummy-dataset skill is that it lets you describe the domain, columns, row count, and constraints so the output is usable instead of obviously synthetic.
When this skill is the right fit
Use dummy-dataset for data cleaning, product testing, analytics mockups, form validation, and database seeding when you need data that looks coherent across fields. It is a strong fit if you care about relationships such as dates, categories, IDs, or realistic ranges. It is less useful if you only need one-off toy examples or if your task depends on a real schema already available from production.
What makes it different
Unlike a generic prompt, the dummy-dataset skill is oriented around output format and constraints from the start. That matters when you need data you can actually import or execute, not just read. The main decision point is whether you want directly usable files or a reproducible generation script; this skill supports both.
How to Use dummy-dataset skill
Install dummy-dataset
Install the dummy-dataset skill in your skills environment with:
npx skills add phuryn/pm-skills --skill dummy-dataset
After install, open the skill file first so you understand the expected inputs and output styles before you prompt it in a larger workflow.
Read the right files first
Start with SKILL.md, then check README.md, AGENTS.md, metadata.json, and any rules/, resources/, references/, or scripts/ folders if they exist in your environment. For this repository, SKILL.md is the main source of truth because the skill is compact and does not rely on support files. If you are using dummy-dataset for a real workflow, read the generation template and example sections before asking for final output.
Give a prompt the skill can execute
A good dummy-dataset usage request should include the dataset purpose, fields, row count, format, and constraints. For example: “Generate a 500-row dummy-dataset for a SaaS billing app with columns for customer_id, plan, signup_date, churned, and MRR in CSV format; keep IDs unique, dates within the last 18 months, and churned consistent with subscription status.” That is much better than “make sample data,” because it gives the skill enough structure to keep the dataset plausible.
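To illustrate the kind of script a prompt like that could yield, here is a minimal Python sketch (the plan names, plan prices, churn rate, and file name are illustrative assumptions, not the skill's actual output):

```python
import csv
import random
from datetime import date, timedelta

random.seed(42)               # fixed seed so the sample is reproducible
TODAY = date(2025, 1, 1)      # fixed "today" so date ranges are deterministic
PLAN_MRR = {"free": 0, "starter": 29, "pro": 99, "enterprise": 499}

rows = []
for i in range(500):
    plan = random.choice(list(PLAN_MRR))
    signup = TODAY - timedelta(days=random.randint(0, 547))  # within ~18 months
    churned = random.random() < 0.2
    rows.append({
        "customer_id": f"CUST-{i:05d}",  # zero-padded counter guarantees uniqueness
        "plan": plan,
        "signup_date": signup.isoformat(),
        "churned": churned,
        "mrr": 0 if churned else PLAN_MRR[plan],  # churned customers carry no MRR
    })

with open("billing_sample.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

Note how each constraint from the prompt maps to one line of code: the ID format enforces uniqueness, the `timedelta` bound enforces the 18-month window, and the MRR expression keeps churn and billing consistent.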
Best workflow for output quality
Use the skill in two passes: first define the dataset spec, then refine the output after checking whether the fields and constraints are realistic. If you need dummy-dataset for data cleaning, ask for edge cases intentionally, such as missing values, duplicates, malformed emails, or inconsistent date formats. If you need a script, ask for the language and execution context up front so the output matches your tooling.
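That controlled-defect pass can be sketched in Python; the `inject_defects` helper, field names, and defect rates below are illustrative assumptions rather than anything the skill prescribes:

```python
import random

random.seed(7)  # deterministic defect placement

def inject_defects(rows, missing_rate=0.05, duplicate_rate=0.02):
    """Copy a clean dataset and add the defects a cleaning pipeline
    should catch: missing values, malformed emails, duplicate rows."""
    dirty = [dict(r) for r in rows]
    for row in dirty:
        roll = random.random()
        if roll < missing_rate:
            row["email"] = None                               # missing value
        elif roll < missing_rate * 2:
            row["email"] = row["email"].replace("@", " at ")  # malformed address
    # append exact duplicates of a few rows
    n_dupes = max(1, int(len(dirty) * duplicate_rate))
    for row in random.sample(dirty, n_dupes):
        dirty.append(dict(row))
    return dirty

clean = [{"id": i, "email": f"user{i}@example.com"} for i in range(100)]
dirty = inject_defects(clean)
```

Keeping the defect rates as parameters means a second refinement pass only has to adjust numbers, not regenerate the whole dataset from scratch.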
dummy-dataset skill FAQ
Is dummy-dataset good for production-like test data?
Yes, if you need believable mock records with controlled structure. The dummy-dataset skill is useful when downstream tools depend on field consistency, but it is still synthetic data, so it should not be treated as real user data or as a statistical model of your business.
Do I need programming knowledge to use it?
No. Beginners can use dummy-dataset by describing the dataset in plain language and specifying the format they want. More precise inputs improve results, but you do not need to write code unless you want a Python script or SQL insert output.
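If you do ask for SQL insert output, the generation step often amounts to a small rendering loop. This hypothetical sketch (the `rows_to_sql` helper and table name are assumptions for illustration) shows the shape; real pipelines should prefer parameterized queries over string-built SQL:

```python
def rows_to_sql(table, rows):
    """Render generated rows as SQL INSERT statements.
    Strings are single-quoted with quotes doubled; None becomes NULL."""
    stmts = []
    for row in rows:
        cols = ", ".join(row)
        vals = ", ".join(
            "NULL" if v is None
            else str(v) if isinstance(v, (int, float))
            else "'" + str(v).replace("'", "''") + "'"
            for v in row.values()
        )
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return stmts

stmts = rows_to_sql("customers", [{"id": 1, "name": "Ada"}])
```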
When should I not use this skill?
Do not use dummy-dataset when you need anonymization of real records, legally compliant synthetic data generation, or an exact copy of a production schema with sensitive constraints. In those cases, a dedicated data pipeline or privacy-aware tooling may be a better fit than a prompt-driven dummy-dataset guide.
Is it better than a normal prompt?
Usually yes, because the dummy-dataset skill pushes you to define columns, business rules, and output format together. A normal prompt often misses one of those pieces, which leads to data that looks okay at a glance but fails during import, testing, or validation.
How to Improve dummy-dataset skill
Provide a tighter dataset spec
The biggest quality gain comes from specifying the domain in terms of fields and rules, not just a theme. Instead of “generate customer data,” ask for concrete fields like customer_id, segment, signup_date, lifetime_value, and status, plus rules such as “lifetime_value should vary by segment” or “signup_date cannot be in the future.” This makes the dummy-dataset skill much more reliable.
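A tighter spec of that kind translates directly into generation rules. As a sketch, assuming hypothetical segment names and value ranges (the `SEGMENT_LTV` table below is invented for illustration):

```python
import random
from datetime import date, timedelta

random.seed(0)
TODAY = date(2025, 1, 1)  # fixed reference date: signup_date can never be in the future

# Rule: lifetime_value varies by segment (illustrative ranges)
SEGMENT_LTV = {"smb": (100, 2_000), "mid": (2_000, 20_000), "enterprise": (20_000, 250_000)}

def make_customer(i):
    segment = random.choice(list(SEGMENT_LTV))
    low, high = SEGMENT_LTV[segment]
    return {
        "customer_id": f"C{i:06d}",
        "segment": segment,
        "signup_date": (TODAY - timedelta(days=random.randint(0, 1_000))).isoformat(),
        "lifetime_value": round(random.uniform(low, high), 2),
        "status": random.choice(["active", "churned", "trial"]),
    }

customers = [make_customer(i) for i in range(1_000)]
```

Because each business rule lives in one place (the segment table, the date offset bound), fixing a bad rule later means changing one line rather than re-describing the whole dataset.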
Add the constraints that matter downstream
If you plan to clean, validate, or import the data, say what must be true after generation. Mention uniqueness, null rates, date ranges, allowed enums, foreign-key style relationships, and format requirements. For data-cleaning work, request controlled errors on purpose so the dataset actually exercises your cleaning logic.
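One way to make those downstream constraints concrete is a small validation pass over the generated rows. The `validate` helper and its specific rules below are a hypothetical sketch, not part of the skill:

```python
def validate(rows, allowed_status=("active", "churned", "trial"), max_null_rate=0.1):
    """Check downstream constraints on a generated dataset and
    return a list of human-readable violations (empty list = pass)."""
    problems = []
    ids = [r["customer_id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("customer_id values are not unique")
    bad_status = {r["status"] for r in rows} - set(allowed_status)
    if bad_status:
        problems.append(f"unexpected status values: {sorted(bad_status)}")
    null_rate = sum(1 for r in rows if r.get("email") is None) / len(rows)
    if null_rate > max_null_rate:
        problems.append(f"email null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return problems

# A deliberately broken sample: duplicate ID, unknown status, missing email
sample = [
    {"customer_id": "C1", "status": "active", "email": "a@example.com"},
    {"customer_id": "C1", "status": "paused", "email": None},
]
issues = validate(sample)
```

Pasting the violation list back into your next prompt is an efficient way to request targeted fixes.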
Iterate from defects, not preferences
After the first output, focus your revision on what broke the workflow: bad column names, unrealistic ranges, missing edge cases, or a format that is hard to load. Then ask for a corrected dummy-dataset version with one or two specific changes instead of restating the whole request. That keeps the output practical and prevents overfitting to cosmetic details.
