skill-creator
by anthropics
Create, refine, test, and benchmark agent skills with the skill-creator workflow, including eval review, grading, blind comparison, and description improvement.
Overview
What skill-creator is
skill-creator is a meta-skill for building and improving other agent skills. In the anthropics/skills repository, it is described as a workflow for creating a skill from scratch, revising an existing skill, testing it with eval prompts, reviewing results, and iterating until performance improves.
This makes skill-creator especially relevant for teams working with Anthropic and Claude workflows who want a more structured way to author skills, validate behavior, and improve triggering descriptions over time.
Who should use skill-creator
Use skill-creator if you are:
- Writing a new skill and need a repeatable authoring process
- Updating an existing skill that is underperforming or triggering inconsistently
- Running evals to compare changes before and after a rewrite
- Reviewing outputs qualitatively, not just by raw pass/fail counts
- Benchmarking skill variants and analyzing why one version performs better than another
It is best suited to skill authors, agent workflow designers, and anyone responsible for testing and validation in a skills library.
What problems it helps solve
The repository evidence shows that skill-creator covers more than drafting instructions. It supports a broader improvement loop:
- drafting or rewriting a skill
- creating and reviewing eval prompts
- grading expectations against transcripts and outputs
- comparing competing outputs in a blinded way
- analyzing why a winning version performed better
- improving the skill description for better triggering accuracy
That combination is why skill-creator fits skill authoring first, with strong overlap into skill testing and skill validation.
What is included in the repository
The file tree shows a practical workflow rather than a single text prompt:
- SKILL.md defines the high-level process for creating and iterating on skills
- agents/analyzer.md, agents/comparator.md, and agents/grader.md describe specialized evaluation roles
- scripts/run_eval.py, scripts/run_loop.py, scripts/quick_validate.py, and scripts/aggregate_benchmark.py support testing and benchmark workflows
- scripts/improve_description.py points to description optimization as a first-class task
- eval-viewer/generate_review.py, eval-viewer/viewer.html, and assets/eval_review.html support human review of eval runs
- references/schemas.md suggests supporting structure and reference material for skill packaging or validation work
When skill-creator is a good fit
skill-creator is a strong fit when you want a documented, repeatable process for improving a skill in cycles. It is particularly useful if your team values evidence-based iteration instead of one-off prompt edits.
Choose it when you need:
- a practical workflow for skill authoring
- evaluation support beyond ad hoc testing
- blind comparison to reduce bias between variants
- review tooling for transcripts and outputs
- structured iteration after user or evaluator feedback
When skill-creator may not be the best fit
This skill may be more than you need if you only want a tiny helper skill with no planned evaluation loop. It is also not primarily a general software development toolkit or a UI framework. Its center of gravity is authoring and measuring agent skills.
If your goal is simply to install a ready-made end-user skill and use it immediately, skill-creator is more process-oriented than task-oriented.
How to Use
Install skill-creator
Install skill-creator from the Anthropic skills repository with:
npx skills add https://github.com/anthropics/skills --skill skill-creator
After installation, open the installed files and start with SKILL.md. That file sets the overall workflow: identify the user's stage, draft or revise the skill, test it, review the results, and iterate.
Review the key files first
For installation and adoption decisions, these are the most useful files to inspect early:
- SKILL.md
- agents/analyzer.md
- agents/comparator.md
- agents/grader.md
- scripts/run_eval.py
- scripts/run_loop.py
- scripts/quick_validate.py
- scripts/improve_description.py
- scripts/aggregate_benchmark.py
- eval-viewer/generate_review.py
- eval-viewer/viewer.html
- assets/eval_review.html
- references/schemas.md
This mix shows that skill-creator includes both authoring guidance and validation support.
Understand the recommended workflow
Based on SKILL.md, the intended usage pattern is iterative:
- Decide what the target skill should do and how it should work.
- Draft the skill.
- Create a small set of test prompts.
- Run the skill on those prompts.
- Review outputs qualitatively and quantitatively.
- Rewrite the skill using the review findings.
- Expand the test set and repeat at larger scale.
This is helpful if you want to move from rough idea to validated skill without treating evaluation as an afterthought.
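As a rough illustration, the loop above can be sketched in Python. The callables here (run_prompt, grade, revise) are hypothetical stand-ins for your own harness, not the skill-creator API:

```python
# Illustrative sketch of the draft -> test -> review -> rewrite loop.
# run_prompt, grade, and revise are hypothetical callables you supply;
# they are not part of skill-creator itself.

def improve_skill(skill, prompts, run_prompt, grade, revise, max_rounds=3):
    """Iterate on a skill draft until all test prompts pass or rounds run out."""
    for round_num in range(max_rounds):
        # Run the current draft on each test prompt and grade the output.
        results = {p: grade(p, run_prompt(skill, p)) for p in prompts}
        failures = [p for p, ok in results.items() if not ok]
        if not failures:
            return skill, round_num  # every prompt passed this round
        # Rewrite the skill using the review findings, then repeat.
        skill = revise(skill, failures)
    return skill, max_rounds
```

The key design point mirrors the workflow above: revision is driven by the concrete failures from the last round, not by ad hoc edits.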
Use the evaluation agents for deeper review
The repository includes three specialized agent definitions that clarify how evaluation should work:
- agents/comparator.md: compares outputs as A vs. B without knowing which skill produced them, which helps reduce bias
- agents/analyzer.md: explains why the winning version won and surfaces actionable improvement ideas
- agents/grader.md: checks whether expectations truly passed and warns against weak assertions that create false confidence
Together, these files show that skill-creator is not just about generating a skill draft. It is also about disciplined review.
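The blind-comparison idea is simple enough to sketch. This is a hypothetical harness, not the repo's actual implementation: variant labels are shuffled before the judge sees them, so the judge cannot favor a known source.

```python
import random

# Hedged sketch of blind A/B comparison in the spirit of agents/comparator.md.
# `judge` stands in for whatever evaluation call you use; it sees only the
# anonymized texts and must answer "A" or "B".

def blind_compare(output_old, output_new, judge, rng=random):
    """Return "old" or "new" without the judge learning which was which."""
    pair = [("old", output_old), ("new", output_new)]
    rng.shuffle(pair)  # randomize which variant becomes label "A"
    labels = {"A": pair[0], "B": pair[1]}
    winner = judge(labels["A"][1], labels["B"][1])  # judge returns "A" or "B"
    return labels[winner][0]
```

Shuffling the label assignment per comparison is what removes positional and provenance bias from the judge's decision.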
Review eval results in a browser
A notable practical feature is eval-viewer/generate_review.py, which generates and serves a self-contained review page for eval results. The script usage in the source is:
python generate_review.py <workspace-path> [--port PORT] [--skill-name NAME]
It can also load prior feedback:
python generate_review.py <workspace-path> --previous-feedback /path/to/old/feedback.json
According to the source excerpt, it reads workspace runs, embeds output data into an HTML review page, serves it locally, and auto-saves feedback to feedback.json. If your workflow depends on human review of outputs, this is one of the strongest reasons to consider skill-creator.
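The auto-save behavior can be approximated as: load feedback.json if it exists, merge the new entry, and write the file back. This is a hypothetical sketch; the viewer's real feedback schema may differ, and the run_id/notes keys are illustrative.

```python
import json
from pathlib import Path

# Hypothetical approximation of auto-saving reviewer feedback to feedback.json.
# The actual viewer's file layout may differ; keys here are illustrative.

def save_feedback(workspace, run_id, notes):
    """Merge one reviewer note into <workspace>/feedback.json, creating it if absent."""
    path = Path(workspace) / "feedback.json"
    data = json.loads(path.read_text()) if path.exists() else {}
    data[run_id] = notes
    path.write_text(json.dumps(data, indent=2))
    return data
```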
Use the scripts folder as the operational toolbox
The scripts/ directory suggests the main operational tasks supported by skill-creator:
- run_eval.py for executing evaluations
- run_loop.py for iterative improvement loops
- quick_validate.py for faster validation checks
- aggregate_benchmark.py for benchmark aggregation and variance-oriented analysis
- generate_report.py for reporting
- improve_description.py for description tuning
- package_skill.py for packaging work
You should treat these files as implementation details to inspect and adapt to your own environment rather than assuming a one-size-fits-all setup.
Practical adoption advice
Before fully adopting skill-creator, check these points:
- Whether your team already has a workspace layout compatible with transcript and output review
- Whether you want qualitative review in addition to numeric scoring
- Whether blind comparison between skill variants matters for your process
- Whether you need description optimization to improve skill triggering
- Whether Python-based local review tooling fits your environment
If those needs match your workflow, skill-creator is likely a good installation candidate.
FAQ
What does skill-creator actually do after installation?
skill-creator gives you a structured process for creating and improving agent skills. It helps you move from draft to tested version by combining authoring guidance, eval execution support, result review, grading, blind comparison, and iteration.
Is skill-creator only for creating brand-new skills?
No. The repository description explicitly covers creating a skill from scratch, modifying or improving an existing skill, running evals, benchmarking performance, and optimizing a description for better triggering accuracy.
Does skill-creator include testing and validation support?
Yes. Repository evidence supports that strongly. The presence of agents/grader.md, agents/comparator.md, agents/analyzer.md, and scripts such as run_eval.py, quick_validate.py, and aggregate_benchmark.py shows that testing and validation are core parts of the workflow.
Does skill-creator help compare two skill versions fairly?
Yes. agents/comparator.md describes a blind comparison process where outputs are labeled A and B without revealing which skill produced them. That is useful when you want to compare variants with less bias.
Can skill-creator help improve a skill description?
Yes. The top-level description explicitly mentions optimizing a skill's description for better triggering accuracy, and the repository includes scripts/improve_description.py, which supports that claim.
Do I need to use every script and subfolder?
No. A practical approach is to start with SKILL.md, review the agent role files, and then inspect the scripts and viewer files that match your workflow. Some teams may only need the authoring loop and eval review, while others will want the broader benchmarking and reporting pieces.
Is skill-creator a good fit for simple one-off tasks?
Usually not. skill-creator is most valuable when you plan to iterate, test, compare, and improve a skill over time. For a one-off task with no evaluation plan, its workflow may be more structure than you need.
What should I inspect before deciding to install skill-creator in production workflows?
Check SKILL.md, the three agent files in agents/, the scripts in scripts/, and eval-viewer/generate_review.py. Those files give the clearest picture of how skill-creator approaches skill authoring, testing, and validation in real use.
