skill-creator
by anthropics
skill-creator is a Skill Authoring meta-skill for drafting new skills, revising existing SKILL.md files, running evals, comparing variants, and improving trigger descriptions with repository scripts and review tools.
This skill scores 84/100, which means it is a strong directory candidate for users who want a real workflow for creating, iterating on, and evaluating other skills. The repository shows substantial operational substance—multi-step guidance, evaluator agents, and runnable scripts—so an agent should get more leverage here than from a generic prompt, though adopters should still expect some setup interpretation because SKILL.md does not present a simple install or quick-start command.
- Strong triggerability: the description clearly covers creating new skills, editing existing ones, running evals, benchmarking variance, and improving descriptions for better triggering.
- High operational leverage: the repo includes concrete tooling for eval loops and review, including run_eval.py, run_loop.py, aggregate_benchmark.py, package_skill.py, and eval-viewer/generate_review.py.
- Good progressive disclosure: dedicated analyzer, comparator, and grader agent docs give explicit roles, inputs, and step-by-step evaluation procedures.
- Adoption is not fully turnkey: SKILL.md lacks an install command or compact quick-start path, so users may need to infer how to wire the scripts into their environment.
- The workflow appears comparatively heavy for simple use cases, with multiple scripts, agents, and evaluation steps that may be more than some users need.
Overview of skill-creator skill
What skill-creator does
skill-creator is a meta-skill for Skill Authoring: it helps you create a new skill, revise an existing one, and evaluate whether the changes actually improved behavior. Unlike a generic “write me a skill” prompt, it is built around an iterative loop: draft, test, review outputs, compare variants, and refine.
Who should use skill-creator
The best fit is anyone responsible for turning recurring agent behavior into a reusable skill:
- skill authors starting from a rough idea
- maintainers improving a weak SKILL.md
- teams adding evals before wider rollout
- people tuning descriptions so the right skill triggers more reliably
If you only need a one-off prompt, skill-creator is probably more process than you need.
The real job-to-be-done
Most users do not need help writing markdown alone. They need help reducing guesswork:
- what the skill should include
- how to collect enough context from the user
- how to test with realistic prompts
- how to review outputs qualitatively and quantitatively
- how to iterate without being fooled by a single good run
That workflow focus is the main differentiator of the skill-creator skill.
What stands out before install
The repository is stronger on evaluation and iteration than on “instant scaffolding.” It includes:
- evaluator-oriented helper agents in agents/
- benchmark and reporting scripts in scripts/
- an HTML review workflow in eval-viewer/ and assets/
- schema/reference material in references/schemas.md
That makes skill-creator especially useful when you care about measuring quality, not just generating a first draft.
What may block adoption
The main tradeoff is complexity. skill-creator expects you to think in stages and provide test prompts, expectations, and comparison targets. If your environment cannot run supporting Python scripts or you do not plan to evaluate outputs, you will use only part of the skill.
How to Use skill-creator skill
Install skill-creator in your skills environment
If you use the Anthropic skills CLI pattern, install from the upstream repo:
```
npx skills add https://github.com/anthropics/skills --skill skill-creator
```
The repository does not advertise a separate package installer inside SKILL.md, so most users should add it from the monorepo and then inspect the locally installed files.
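A quick way to confirm what landed on disk is to list the skill directory. The path below is an assumption about where your skills CLI places files; the expected contents match the files discussed throughout this guide.

```
# Assumed install location; adjust to wherever your skills CLI writes files.
ls skills/skill-creator
# expect: SKILL.md  agents/  scripts/  eval-viewer/  references/  assets/
```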
Read these files first
For fast orientation, read in this order:
1. skills/skill-creator/SKILL.md
2. skills/skill-creator/agents/grader.md
3. skills/skill-creator/agents/comparator.md
4. skills/skill-creator/agents/analyzer.md
5. skills/skill-creator/scripts/run_eval.py
6. skills/skill-creator/scripts/run_loop.py
7. skills/skill-creator/eval-viewer/generate_review.py
8. skills/skill-creator/references/schemas.md
This path tells you the real operating model: generate or revise a skill, run evals, compare outputs, and analyze why one version wins.
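If you want to skim the whole reading list from a shell, here is a minimal sketch; the paths are taken from the list above and assume you are in the directory that contains skills/.

```
# Print the first lines of each file in the recommended reading order.
for f in SKILL.md agents/grader.md agents/comparator.md agents/analyzer.md \
         scripts/run_eval.py scripts/run_loop.py \
         eval-viewer/generate_review.py references/schemas.md; do
  echo "== $f =="
  head -n 20 "skills/skill-creator/$f"
done
```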
Start with the stage you are actually in
The skill-creator skill is not just for brand-new skills. It works best when you explicitly tell the model which stage applies:
- idea capture: “I know the problem but not the workflow”
- first draft: “Turn these notes into a usable SKILL.md”
- repair: “This skill exists but fails on these prompts”
- optimization: “Improve triggering description and examples”
- evaluation: “Design test prompts and expectations”
- comparison: “Compare v1 vs v2 and explain the winner”
If you skip this, the model may spend too much effort on the wrong phase.
Give the input the skill actually needs
A strong skill-creator usage prompt usually includes:
- the target user job
- what inputs the future skill will receive
- expected outputs or deliverables
- tools/files the skill may read or run
- constraints such as latency, format, or safety
- examples of failure you already observed
- 3 to 10 realistic test prompts
The biggest quality jump usually comes from better examples and failure cases, not longer prose.
Turn a rough goal into a strong prompt
Weak prompt:
Help me create a research skill.
Stronger prompt:
Use skill-creator for Skill Authoring. I need a skill that turns a vague market question into a structured research brief with sources, assumptions, and open questions. Inputs are a user question and optional company context. Outputs should be a markdown brief. The skill may browse repository files but should not invent citations. Current failure modes: overlong answers, weak source framing, and missing assumptions. Please draft the skill, propose 6 eval prompts, and suggest measurable expectations for each.
This is better because it specifies task, I/O, constraints, and failure modes.
Use the built-in evaluation workflow
The repository evidence shows skill-creator is designed for iterative evaluation, not just drafting. In practice:
- draft or revise the skill
- create a small eval set
- run executions
- review transcripts and outputs
- grade expectations
- compare variants blindly when useful
- revise the skill again
The scripts under scripts/ are a clue to the intended workflow:
- run_eval.py for running evals
- aggregate_benchmark.py and generate_report.py for summarizing results
- run_loop.py for repeated improvement cycles
- quick_validate.py for faster checks
- improve_description.py for trigger-description tuning
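SKILL.md does not document these scripts' command-line arguments, so before wiring them into a workflow it is worth surfacing what each one expects. A minimal sketch, assuming the scripts are ordinary argparse-style Python files (an assumption about their implementation, not something the repository states):

```
cd skills/skill-creator        # path from the reading list above
ls scripts/                    # confirm which helpers shipped with your copy
# If the scripts use argparse, this surfaces their expected flags;
# otherwise read each file's header for usage notes.
grep -nE "add_argument|usage" scripts/run_eval.py scripts/run_loop.py
```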
Review outputs with the HTML viewer
A practical differentiator of the skill-creator install is the included review UI. eval-viewer/generate_review.py creates a self-contained HTML review page from a workspace of runs and can save feedback. That matters when multiple outputs need human review, especially for skills where transcript quality and output artifacts both matter.
If you are deciding whether to adopt this skill, this review tooling is one of the strongest reasons.
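Because the review page is described as self-contained HTML, you can open it directly from the filesystem once generate_review.py has produced it, or serve the folder locally. The output directory below is an assumption; the script itself defines its input arguments and where it writes.

```
# Serve the folder the script wrote to (eval-viewer/ here is an assumption),
# then open http://localhost:8000/ and locate the generated review page.
python -m http.server 8000 --directory eval-viewer
```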
Use comparator and grader agents for less biased iteration
Two support agents are especially valuable:
- agents/comparator.md compares outputs as A/B without knowing which skill produced them
- agents/grader.md checks expectations against transcripts and outputs, and also critiques weak assertions
That means skill-creator is not only asking “did this output look good?” but also “were our evals meaningful?” That is unusually useful for serious skill maintenance.
Tune the description, not just the body
Many skill authors over-focus on instruction content and under-focus on the top description used for triggering. The presence of scripts/improve_description.py signals that trigger quality is part of the intended workflow. If a good skill is not being invoked consistently, improve:
- problem framing in the description
- the situations when it should activate
- the boundary of what it should not handle
This is a high-leverage use of the skill-creator skill for existing skill libraries.
Know the practical limits
skill-creator helps structure authoring and evaluation, but it does not remove the need for:
- domain knowledge about the target task
- realistic eval cases
- human judgment when outputs are subjective
- runtime support for the included Python utilities
If you cannot supply realistic prompts or inspect outputs, the process becomes much weaker.
skill-creator skill FAQ
Is skill-creator good for beginners?
Yes, with one caveat: beginners can use skill-creator's guided workflows to avoid staring at a blank page, but the full repo assumes some comfort with iterative testing. If you are new, start with drafting and a tiny eval set before touching benchmarking scripts.
What makes skill-creator better than a normal prompt?
A normal prompt may give you a plausible first draft. skill-creator is better when you need a repeatable creation-and-improvement loop with evaluation support. Its real value is the surrounding method and helper files, not just the initial writing.
When should I not use skill-creator?
Skip it when:
- you only need a one-time prompt
- there is no plan to test outputs
- the task is too small to justify a skill
- your environment cannot use the repository’s supporting scripts or review flow
In those cases, a direct prompt is faster.
Does skill-creator only help with new skills?
No. The skill-creator skill is also suited to revising existing skills, benchmarking two versions, and improving descriptions for better triggering accuracy.
Do I need all the scripts to get value?
No. You can still use skill-creator for drafting and manual revision. But the evaluation scripts and viewer are where the repository gives the most information gain beyond ordinary prompting.
Is this only for Anthropic's skills ecosystem?
It is clearly designed around that ecosystem’s skill structure and terminology, so that is the best fit. Still, the workflow ideas—draft, eval, compare, revise—transfer well to other internal skill or agent frameworks.
How to Improve skill-creator skill
Give narrower task boundaries
The fastest way to improve skill-creator output quality is to define what the future skill should refuse or ignore. Without boundaries, drafts often become broad and trigger-happy. Include “use when” and “do not use when” examples in your prompt.
Supply realistic eval prompts early
Many users wait too long to create test cases. For skill-creator for Skill Authoring, early eval prompts force clarity about the real task. Good evals should reflect actual user inputs, not polished examples that make the skill look better than it is.
Write stronger expectations
Weak expectations create false confidence. Instead of:
- “Output is clear”
Use:
- “Output includes a prioritized recommendation”
- “Every cited claim links to a provided source”
- “Result contains assumptions and open questions sections”
This matches the philosophy seen in agents/grader.md, which explicitly warns against trivially satisfied assertions.
Compare versions blind when changes are subtle
If you are deciding between two similar drafts, use the blind-comparison pattern instead of eyeballing the markdown. Small wording changes can affect execution in ways that are hard to predict from the skill file alone.
Inspect transcripts, not just final outputs
A polished final answer can hide poor tool use, missed files, or weak reasoning. skill-creator becomes more valuable when you review transcripts alongside outputs and ask why a version succeeded, which aligns with the analyzer agent’s purpose.
Improve one dimension at a time
Do not rewrite description, instructions, examples, and tool guidance all at once if you want learnings you can trust. Change one dimension, rerun a stable eval set, then review the delta. This makes the skill-creator process much more informative.
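One low-tech way to enforce that discipline is to keep versioned copies of the skill and confirm the diff touches only one thing before rerunning the evals. A minimal sketch with hypothetical directory names:

```
cp -r my-skill my-skill-v2       # my-skill is a placeholder for your skill directory
# ...edit only the trigger description in my-skill-v2...
diff -ru my-skill my-skill-v2    # confirm the change is limited to one dimension
# then rerun the same eval set against both versions before adopting v2
```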
Use the repository files as operating instructions
If results feel vague, do not only reread SKILL.md. Read the support files that define evaluation behavior:
- agents/comparator.md for what “better” means in A/B reviews
- agents/grader.md for pass/fail rigor
- agents/analyzer.md for post-hoc improvement insights
- references/schemas.md for expected structures
These files often clarify how to use the skill more than the top-level description does.
Expand the eval set after the first win
A common failure mode is stopping after a few good runs. The skill-creator skill is explicitly built for iterative expansion: once the draft works on a small set, broaden the prompts to include edge cases, ambiguous requests, and failure-heavy examples. That is how you find whether the skill is robust or merely lucky.
