
skill-creator

by anthropics

skill-creator is a Skill Authoring meta-skill for drafting new skills, revising existing SKILL.md files, running evals, comparing variants, and improving trigger descriptions with repository scripts and review tools.

Stars: 105.1k
Favorites: 2
Comments: 0
Added: Mar 28, 2026
Category: Skill Authoring
Install Command
npx skills add anthropics/skills --skill skill-creator
Curation Score

This skill scores 84/100, which means it is a strong directory candidate for users who want a real workflow for creating, iterating on, and evaluating other skills. The repository shows substantial operational substance—multi-step guidance, evaluator agents, and runnable scripts—so an agent should get more leverage here than from a generic prompt, though adopters should still expect some setup interpretation because SKILL.md does not present a simple install or quick-start command.

Strengths
  • Strong triggerability: the description clearly covers creating new skills, editing existing ones, running evals, benchmarking variance, and improving descriptions for better triggering.
  • High operational leverage: the repo includes concrete tooling for eval loops and review, including run_eval.py, run_loop.py, aggregate_benchmark.py, package_skill.py, and eval-viewer/generate_review.py.
  • Good progressive disclosure: dedicated analyzer, comparator, and grader agent docs give explicit roles, inputs, and step-by-step evaluation procedures.
Cautions
  • Adoption is not fully turnkey: SKILL.md lacks an install command or compact quick-start path, so users may need to infer how to wire the scripts into their environment.
  • The workflow appears comparatively heavy for simple use cases, with multiple scripts, agents, and evaluation steps that may be more than some users need.
Overview

Overview of skill-creator skill

What skill-creator does

skill-creator is a meta-skill for Skill Authoring: it helps you create a new skill, revise an existing one, and evaluate whether the changes actually improved behavior. Unlike a generic “write me a skill” prompt, it is built around an iterative loop: draft, test, review outputs, compare variants, and refine.

Who should use skill-creator

The best fit is anyone responsible for turning recurring agent behavior into a reusable skill:

  • skill authors starting from a rough idea
  • maintainers improving a weak SKILL.md
  • teams adding evals before wider rollout
  • people tuning descriptions so the right skill triggers more reliably

If you only need a one-off prompt, skill-creator is probably more process than you need.

The real job-to-be-done

Most users do not need help writing markdown alone. They need help reducing guesswork:

  • what the skill should include
  • how to collect enough context from the user
  • how to test with realistic prompts
  • how to review outputs qualitatively and quantitatively
  • how to iterate without being fooled by a single good run

That workflow focus is the main differentiator of the skill-creator skill.

What stands out before install

The repository is stronger on evaluation and iteration than on “instant scaffolding.” It includes:

  • evaluator-oriented helper agents in agents/
  • benchmark and reporting scripts in scripts/
  • an HTML review workflow in eval-viewer/ and assets/
  • schema/reference material in references/schemas.md

That makes skill-creator especially useful when you care about measuring quality, not just generating a first draft.

What may block adoption

The main tradeoff is complexity. skill-creator expects you to think in stages and provide test prompts, expectations, and comparison targets. If your environment cannot run supporting Python scripts or you do not plan to evaluate outputs, you will use only part of the skill.

How to Use skill-creator skill

Install skill-creator in your skills environment

If you use the Anthropic skills CLI pattern, install from the upstream repo:

npx skills add https://github.com/anthropics/skills --skill skill-creator

The repository does not advertise a separate package installer inside SKILL.md, so most users should add it from the monorepo and then inspect the local installed files.

Read these files first

For fast orientation, read in this order:

  1. skills/skill-creator/SKILL.md
  2. skills/skill-creator/agents/grader.md
  3. skills/skill-creator/agents/comparator.md
  4. skills/skill-creator/agents/analyzer.md
  5. skills/skill-creator/scripts/run_eval.py
  6. skills/skill-creator/scripts/run_loop.py
  7. skills/skill-creator/eval-viewer/generate_review.py
  8. skills/skill-creator/references/schemas.md

This path tells you the real operating model: generate or revise a skill, run evals, compare outputs, and analyze why one version wins.

Start with the stage you are actually in

The skill-creator skill is not just for brand-new skills. It works best when you explicitly tell the model which stage applies:

  • idea capture: “I know the problem but not the workflow”
  • first draft: “Turn these notes into a usable SKILL.md”
  • repair: “This skill exists but fails on these prompts”
  • optimization: “Improve triggering description and examples”
  • evaluation: “Design test prompts and expectations”
  • comparison: “Compare v1 vs v2 and explain the winner”

If you skip this, the model may spend too much effort on the wrong phase.

Give the input the skill actually needs

A strong skill-creator usage prompt usually includes:

  • the target user job
  • what inputs the future skill will receive
  • expected outputs or deliverables
  • tools/files the skill may read or run
  • constraints such as latency, format, or safety
  • examples of failure you already observed
  • 3 to 10 realistic test prompts

The biggest quality jump usually comes from better examples and failure cases, not longer prose.
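
As a sketch, the input bundle described above can be captured as a small Python structure before you hand it to the skill. The field names here are illustrative assumptions, not the repository's actual schema; consult references/schemas.md for the real expected structures:

```python
# Illustrative eval-case bundle for a skill-authoring prompt.
# Field names ("prompt", "expectations") are hypothetical; see references/schemas.md.
eval_cases = [
    {
        "prompt": "Turn this vague market question into a structured research brief.",
        "expectations": [
            "Output includes an assumptions section",
            "Every cited claim links to a provided source",
        ],
    },
    {
        "prompt": "Handle a one-line question with no company context supplied.",
        "expectations": ["Output asks for missing context before concluding"],
    },
]

def validate_case(case: dict) -> bool:
    """A usable case needs a realistic prompt and at least one checkable expectation."""
    return bool(case.get("prompt")) and len(case.get("expectations", [])) > 0

assert all(validate_case(c) for c in eval_cases)
```

Writing the bundle down this way forces you to notice which of the seven inputs above you are still missing before the first draft.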

Turn a rough goal into a strong prompt

Weak prompt:

Help me create a research skill.

Stronger prompt:

Use skill-creator for Skill Authoring. I need a skill that turns a vague market question into a structured research brief with sources, assumptions, and open questions. Inputs are a user question and optional company context. Outputs should be a markdown brief. The skill may browse repository files but should not invent citations. Current failure modes: overlong answers, weak source framing, and missing assumptions. Please draft the skill, propose 6 eval prompts, and suggest measurable expectations for each.

This is better because it specifies task, I/O, constraints, and failure modes.

Use the built-in evaluation workflow

The repository evidence shows skill-creator is designed for iterative evaluation, not just drafting. In practice:

  1. draft or revise the skill
  2. create a small eval set
  3. run executions
  4. review transcripts and outputs
  5. grade expectations
  6. compare variants blindly when useful
  7. revise the skill again

The scripts under scripts/ are a clue to the intended workflow:

  • run_eval.py for running evals
  • aggregate_benchmark.py and generate_report.py for summarizing results
  • run_loop.py for repeated improvement cycles
  • quick_validate.py for faster checks
  • improve_description.py for trigger-description tuning
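
To give a feel for the aggregation step, here is a minimal sketch of summarizing graded runs into per-variant pass rates. The data shapes are hypothetical and do not reflect the actual output format of aggregate_benchmark.py:

```python
from collections import defaultdict

# Hypothetical graded results: (skill_variant, case_id, passed)
results = [
    ("v1", "case-1", True), ("v1", "case-2", False),
    ("v2", "case-1", True), ("v2", "case-2", True),
]

def pass_rates(rows):
    """Aggregate the pass rate per skill variant across all eval cases."""
    totals, passes = defaultdict(int), defaultdict(int)
    for variant, _case, ok in rows:
        totals[variant] += 1
        passes[variant] += ok
    return {v: passes[v] / totals[v] for v in totals}

print(pass_rates(results))  # e.g. {'v1': 0.5, 'v2': 1.0}
```

Even this toy version shows why aggregation matters: a single good run from v1 looks identical to a single good run from v2 until you count across the whole set.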

Review outputs with the HTML viewer

A practical differentiator of installing skill-creator is the included review UI. eval-viewer/generate_review.py creates a self-contained HTML review page from a workspace of runs and can save feedback. That matters when multiple outputs need human review, especially for skills where transcript quality and output artifacts both matter.

If you are deciding whether to adopt this skill, this review tooling is one of the strongest reasons.

Use comparator and grader agents for less biased iteration

Two support agents are especially valuable:

  • agents/comparator.md compares outputs as A/B without knowing which skill produced them
  • agents/grader.md checks expectations against transcripts and outputs, and also critiques weak assertions

That means skill-creator is not only asking “did this output look good?” but also “were our evals meaningful?” That is unusually useful for serious skill maintenance.

Tune the description, not just the body

Many skill authors over-focus on instruction content and under-focus on the top description used for triggering. The presence of scripts/improve_description.py signals that trigger quality is part of the intended workflow. If a good skill is not being invoked consistently, improve:

  • problem framing in the description
  • the situations when it should activate
  • the boundary of what it should not handle

This is a high-leverage use of the skill-creator skill for existing skill libraries.

Know the practical limits

skill-creator helps structure authoring and evaluation, but it does not remove the need for:

  • domain knowledge about the target task
  • realistic eval cases
  • human judgment when outputs are subjective
  • runtime support for the included Python utilities

If you cannot supply realistic prompts or inspect outputs, the process becomes much weaker.

skill-creator skill FAQ

Is skill-creator good for beginners?

Yes, with one caveat: beginners can use skill-creator's guided workflows to avoid staring at a blank page, but the full repo assumes some comfort with iterative testing. If you are new, start with drafting and a tiny eval set before touching the benchmarking scripts.

What makes skill-creator better than a normal prompt?

A normal prompt may give you a plausible first draft. skill-creator is better when you need a repeatable creation-and-improvement loop with evaluation support. Its real value is the surrounding method and helper files, not just the initial writing.

When should I not use skill-creator?

Skip it when:

  • you only need a one-time prompt
  • there is no plan to test outputs
  • the task is too small to justify a skill
  • your environment cannot use the repository’s supporting scripts or review flow

In those cases, a direct prompt is faster.

Does skill-creator only help with new skills?

No. The skill-creator skill is also suited to revising existing skills, benchmarking two versions, and improving descriptions for better triggering accuracy.

Do I need all the scripts to get value?

No. You can still use skill-creator for drafting and manual revision. But the evaluation scripts and viewer are where the repository gives the most information gain beyond ordinary prompting.

Is this only for Anthropic's skills ecosystem?

It is clearly designed around that ecosystem’s skill structure and terminology, so that is the best fit. Still, the workflow ideas—draft, eval, compare, revise—transfer well to other internal skill or agent frameworks.

How to Improve skill-creator skill

Give narrower task boundaries

The fastest way to improve skill-creator output quality is to define what the future skill should refuse or ignore. Without boundaries, drafts often become broad and trigger-happy. Include “use when” and “do not use when” examples in your prompt.

Supply realistic eval prompts early

Many users wait too long to create test cases. When using skill-creator for Skill Authoring, early eval prompts force clarity about the real task. Good evals should reflect actual user inputs, not polished examples that make the skill look better than it is.

Write stronger expectations

Weak expectations create false confidence. Instead of:

  • “Output is clear”

Use:

  • “Output includes a prioritized recommendation”
  • “Every cited claim links to a provided source”
  • “Result contains assumptions and open questions sections”

This matches the philosophy seen in agents/grader.md, which explicitly warns against trivially satisfied assertions.
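
The difference between weak and strong expectations can be made concrete: a strong expectation is one you could check mechanically. A hypothetical sketch (the helper name and checks are illustrative, not part of the repository):

```python
import re

def has_section(output: str, heading: str) -> bool:
    """Strong expectation: the named markdown section actually exists in the output."""
    return re.search(rf"^#+\s*{re.escape(heading)}", output, re.M | re.I) is not None

draft = (
    "## Recommendation\nShip v2.\n"
    "## Assumptions\n- traffic is stable\n"
    "## Open Questions\n- pricing?"
)

# "Output is clear" cannot be checked; these can, and a grader
# can report exactly which expectation failed.
checks = {
    "has prioritized recommendation": has_section(draft, "Recommendation"),
    "has assumptions section": has_section(draft, "Assumptions"),
    "has open questions section": has_section(draft, "Open Questions"),
}
assert all(checks.values())
```

If an expectation cannot be turned into something this concrete, it is a candidate for the grader to flag as a trivially satisfied assertion.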

Compare versions blind when changes are subtle

If you are deciding between two similar drafts, use the blind-comparison pattern instead of eyeballing the markdown. Small wording changes can affect execution in ways that are hard to predict from the skill file alone.
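
The blind-comparison pattern is simple to sketch: strip variant identity before review, then map the verdict back afterward. This is a hypothetical helper to illustrate the idea, not the comparator agent's actual protocol:

```python
import random

def blind_pair(output_a: str, output_b: str, rng: random.Random):
    """Present two outputs under neutral labels so the reviewer can't favor a variant."""
    pairs = [("v1", output_a), ("v2", output_b)]
    rng.shuffle(pairs)
    reveal = {"A": pairs[0][0], "B": pairs[1][0]}  # kept hidden from the reviewer
    blinded = {"A": pairs[0][1], "B": pairs[1][1]}
    return blinded, reveal

blinded, reveal = blind_pair("draft one", "draft two", random.Random(0))
winner_label = "A"           # the reviewer's blind verdict
print(reveal[winner_label])  # resolves back to the real variant name
```

The fixed seed here is only for reproducibility in the sketch; in practice you want the label assignment to vary so reviewers cannot learn a pattern.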

Inspect transcripts, not just final outputs

A polished final answer can hide poor tool use, missed files, or weak reasoning. skill-creator becomes more valuable when you review transcripts alongside outputs and ask why a version succeeded, which aligns with the analyzer agent’s purpose.

Improve one dimension at a time

Do not rewrite the description, instructions, examples, and tool guidance all at once if you want learnings you can trust. Change one dimension, rerun a stable eval set, then review the delta. This makes the skill-creator improvement process much more informative.

Use the repository files as operating instructions

If results feel vague, do not only reread SKILL.md. Read the support files that define evaluation behavior:

  • agents/comparator.md for what “better” means in A/B reviews
  • agents/grader.md for pass/fail rigor
  • agents/analyzer.md for post-hoc improvement insights
  • references/schemas.md for expected structures

These files often clarify how to use the skill more than the top-level description does.

Expand the eval set after the first win

A common failure mode is stopping after a few good runs. The skill-creator skill is explicitly built for iterative expansion: once the draft works on a small set, broaden the prompts to include edge cases, ambiguous requests, and failure-heavy examples. That is how you find whether the skill is robust or merely lucky.
