skill-creator

by anthropics

Create, refine, test, and benchmark agent skills with the skill-creator workflow, including eval review, grading, blind comparison, and description improvement.

Category: Skill Authoring
Install Command
npx skills add https://github.com/anthropics/skills --skill skill-creator
Overview

What skill-creator is

skill-creator is a meta-skill for building and improving other agent skills. In the anthropics/skills repository, it is described as a workflow for creating a skill from scratch, revising an existing skill, testing it with eval prompts, reviewing results, and iterating until performance improves.

This makes skill-creator especially relevant for teams working with Anthropic and Claude workflows who want a more structured way to author skills, validate behavior, and improve triggering descriptions over time.

Who should use skill-creator

Use skill-creator if you are:

  • Writing a new skill and need a repeatable authoring process
  • Updating an existing skill that is underperforming or triggering inconsistently
  • Running evals to compare changes before and after a rewrite
  • Reviewing outputs qualitatively, not just by raw pass/fail counts
  • Benchmarking skill variants and analyzing why one version performs better than another

It is best suited to skill authors, agent workflow designers, and anyone responsible for testing and validation in a skills library.

What problems it helps solve

The repository evidence shows that skill-creator covers more than drafting instructions. It supports a broader improvement loop:

  • drafting or rewriting a skill
  • creating and reviewing eval prompts
  • grading expectations against transcripts and outputs
  • comparing competing outputs in a blinded way
  • analyzing why a winning version performed better
  • improving the skill description for better triggering accuracy

That combination is why skill-creator fits skill authoring first, with strong overlap into skill testing and skill validation.

What is included in the repository

The file tree shows a practical workflow rather than a single text prompt:

  • SKILL.md defines the high-level process for creating and iterating on skills
  • agents/analyzer.md, agents/comparator.md, and agents/grader.md describe specialized evaluation roles
  • scripts/run_eval.py, scripts/run_loop.py, scripts/quick_validate.py, and scripts/aggregate_benchmark.py support testing and benchmark workflows
  • scripts/improve_description.py points to description optimization as a first-class task
  • eval-viewer/generate_review.py, eval-viewer/viewer.html, and assets/eval_review.html support human review of eval runs
  • references/schemas.md supplies schemas and reference material, suggesting structured support for skill packaging or validation work

When skill-creator is a good fit

skill-creator is a strong fit when you want a documented, repeatable process for improving a skill in cycles. It is particularly useful if your team values evidence-based iteration instead of one-off prompt edits.

Choose it when you need:

  • a practical workflow for skill authoring
  • evaluation support beyond ad hoc testing
  • blind comparison to reduce bias between variants
  • review tooling for transcripts and outputs
  • structured iteration after user or evaluator feedback

When skill-creator may not be the best fit

This skill may be more than you need if you only want a tiny helper skill with no planned evaluation loop. It is also not primarily a general software development toolkit or a UI framework. Its center of gravity is authoring and measuring agent skills.

If your goal is simply to install a ready-made end-user skill and use it immediately, skill-creator is more process-oriented than task-oriented.

How to Use

Install skill-creator

Install skill-creator from the Anthropic skills repository with:

npx skills add https://github.com/anthropics/skills --skill skill-creator

After installation, open the installed files and start with SKILL.md. That file sets the overall workflow: identify the user's stage, draft or revise the skill, test it, review the results, and iterate.

Review the key files first

For installation and adoption decisions, these are the most useful files to inspect early:

  • SKILL.md
  • agents/analyzer.md
  • agents/comparator.md
  • agents/grader.md
  • scripts/run_eval.py
  • scripts/run_loop.py
  • scripts/quick_validate.py
  • scripts/improve_description.py
  • scripts/aggregate_benchmark.py
  • eval-viewer/generate_review.py
  • eval-viewer/viewer.html
  • assets/eval_review.html
  • references/schemas.md

This mix shows that skill-creator includes both authoring guidance and validation support.

Follow the iterative workflow

Based on SKILL.md, the intended usage pattern is iterative:

  1. Decide what the target skill should do and how it should work.
  2. Draft the skill.
  3. Create a small set of test prompts.
  4. Run the skill on those prompts.
  5. Review outputs qualitatively and quantitatively.
  6. Rewrite the skill using the review findings.
  7. Expand the test set and repeat at larger scale.

This is helpful if you want to move from rough idea to validated skill without treating evaluation as an afterthought.
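The seven steps above can be sketched as a simple driver loop. Everything below is illustrative: run_skill, review_outputs, and revise_skill are hypothetical placeholders with toy stand-in bodies, not functions shipped with skill-creator.

```python
# Illustrative sketch of the draft -> run -> review -> rewrite loop.
# All names are hypothetical placeholders, not skill-creator APIs.

def run_skill(skill_text, prompt):
    # Stand-in for invoking the agent with the skill loaded.
    return f"{skill_text}:{prompt}"

def review_outputs(prompts, outputs):
    # Stand-in grader: pass if the output echoes the prompt at all.
    return [{"prompt": p, "passed": p in o} for p, o in zip(prompts, outputs)]

def revise_skill(skill_text, results):
    # Stand-in rewrite step: note each failing prompt for the next draft.
    failures = [r["prompt"] for r in results if not r["passed"]]
    return skill_text + "".join(f"\n# fix: {p}" for p in failures)

def iterate_on_skill(skill_text, prompts, max_rounds=3, target=1.0):
    pass_rate = 0.0
    for _ in range(max_rounds):
        outputs = [run_skill(skill_text, p) for p in prompts]   # step 4
        results = review_outputs(prompts, outputs)              # step 5
        pass_rate = sum(r["passed"] for r in results) / len(results)
        if pass_rate >= target:
            break
        skill_text = revise_skill(skill_text, results)          # step 6
    return skill_text, pass_rate
```

The point is the shape of the loop, not the stand-in bodies: evaluation sits inside the authoring cycle rather than after it.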

Use the evaluation agents for deeper review

The repository includes three specialized agent definitions that clarify how evaluation should work:

  • agents/comparator.md: compares outputs as A vs. B without knowing which skill produced them, which helps reduce bias
  • agents/analyzer.md: explains why the winning version won and surfaces actionable improvement ideas
  • agents/grader.md: checks whether expectations truly passed and warns against weak assertions that create false confidence

Together, these files show that skill-creator is not just about generating a skill draft. It is also about disciplined review.
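The comparator's blind setup is simple to sketch. This is an illustration of the idea, not the comparator.md implementation: the two outputs are shuffled under neutral A/B labels so the judge never learns which variant produced which.

```python
import random

def blind_pair(output_old, output_new, rng=None):
    # Shuffle the two outputs under neutral labels so a judge (human or
    # model) cannot tell which skill variant produced which output.
    rng = rng or random.Random()
    pair = [("old", output_old), ("new", output_new)]
    rng.shuffle(pair)
    blinded = {"A": pair[0][1], "B": pair[1][1]}   # shown to the judge
    key = {"A": pair[0][0], "B": pair[1][0]}       # kept aside
    return blinded, key

def unblind(verdict, key):
    # Map the judge's "A" or "B" verdict back to the real variant name.
    return key[verdict]
```

Collecting verdicts over many prompts and unblinding only at aggregation time is what makes the comparison resistant to label bias.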

Review eval results in a browser

A notable practical feature is eval-viewer/generate_review.py, which generates and serves a self-contained review page for eval results. The script usage in the source is:

python generate_review.py <workspace-path> [--port PORT] [--skill-name NAME]

It can also load prior feedback:

python generate_review.py <workspace-path> --previous-feedback /path/to/old/feedback.json

According to the source excerpt, it reads workspace runs, embeds output data into an HTML review page, serves it locally, and auto-saves feedback to feedback.json. If your workflow depends on human review of outputs, this is one of the strongest reasons to consider skill-creator.
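The --previous-feedback option implies that review feedback is persisted as JSON and can be carried into a later session. The exact schema of feedback.json is not documented in this excerpt; the sketch below assumes a flat run-to-comment mapping purely for illustration.

```python
import json
from pathlib import Path

def load_feedback(path):
    # Assumed schema: a flat {run_id: comment} mapping. The real
    # feedback.json layout may differ; this is illustrative only.
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

def merge_feedback(previous, current):
    # Notes from the current review session win over prior ones.
    merged = dict(previous)
    merged.update(current)
    return merged
```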

Use the scripts folder as the operational toolbox

The scripts/ directory suggests the main operational tasks supported by skill-creator:

  • run_eval.py for executing evaluations
  • run_loop.py for iterative improvement loops
  • quick_validate.py for faster validation checks
  • aggregate_benchmark.py for benchmark aggregation and variance-oriented analysis
  • generate_report.py for reporting
  • improve_description.py for description tuning
  • package_skill.py for packaging work

You should treat these files as implementation details to inspect and adapt to your own environment rather than assuming a one-size-fits-all setup.

Practical adoption advice

Before fully adopting skill-creator, check these points:

  • Whether your team already has a workspace layout compatible with transcript and output review
  • Whether you want qualitative review in addition to numeric scoring
  • Whether blind comparison between skill variants matters for your process
  • Whether you need description optimization to improve skill triggering
  • Whether Python-based local review tooling fits your environment

If those needs match your workflow, skill-creator is likely a good installation candidate.

FAQ

What does skill-creator actually do after installation?

skill-creator gives you a structured process for creating and improving agent skills. It helps you move from draft to tested version by combining authoring guidance, eval execution support, result review, grading, blind comparison, and iteration.

Is skill-creator only for creating brand-new skills?

No. The repository description explicitly supports creating a skill from scratch, modifying an existing skill, improving an existing skill, running evals, benchmarking performance, and optimizing a description for better triggering accuracy.

Does skill-creator include testing and validation support?

Yes. Repository evidence supports that strongly. The presence of agents/grader.md, agents/comparator.md, agents/analyzer.md, and scripts such as run_eval.py, quick_validate.py, and aggregate_benchmark.py shows that testing and validation are core parts of the workflow.
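The grader's warning about weak assertions is worth a concrete illustration. The two checks below are hypothetical examples, not rules from agents/grader.md: a bare substring assertion passes even a refusal, while a more specific expectation actually constrains the output.

```python
def weak_check(output):
    # Weak assertion: passes whenever the word "table" appears,
    # including in a refusal like "I cannot produce a table."
    return "table" in output.lower()

def stronger_check(output):
    # Stronger assertion: requires an actual markdown table with a
    # header separator row, which a refusal cannot satisfy.
    rows = [ln for ln in output.splitlines() if ln.strip().startswith("|")]
    return len(rows) >= 2 and set(rows[1]) <= set("|-: ")

refusal = "I cannot produce a table for this request."
table = "| name | score |\n|------|-------|\n| a | 1 |"
```

The weak check reports a pass on the refusal, which is exactly the false confidence the grader role exists to catch.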

Does skill-creator help compare two skill versions fairly?

Yes. agents/comparator.md describes a blind comparison process where outputs are labeled A and B without revealing which skill produced them. That is useful when you want to compare variants with less bias.

Can skill-creator help improve a skill description?

Yes. The top-level description explicitly mentions optimizing a skill's description for better triggering accuracy, and the repository includes scripts/improve_description.py, which supports that claim.
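One way to frame "triggering accuracy" is as a classification problem over prompts that should and should not activate the skill. The sketch below is an assumed metric, not what scripts/improve_description.py actually computes, and the keyword matcher is a toy stand-in for whatever routing mechanism decides to load a skill.

```python
def triggers(description, prompt):
    # Toy stand-in for the real routing decision: trigger when any
    # longer description keyword appears in the prompt.
    keywords = {w.lower() for w in description.split() if len(w) > 4}
    return any(k in prompt.lower() for k in keywords)

def triggering_accuracy(description, should_trigger, should_not):
    # Fraction of prompts routed correctly in both directions.
    hits = sum(triggers(description, p) for p in should_trigger)
    rejections = sum(not triggers(description, p) for p in should_not)
    return (hits + rejections) / (len(should_trigger) + len(should_not))
```

A description optimizer can then propose candidate descriptions and keep whichever scores highest on a held-out prompt set.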

Do I need to use every script and subfolder?

No. A practical approach is to start with SKILL.md, review the agent role files, and then inspect the scripts and viewer files that match your workflow. Some teams may only need the authoring loop and eval review, while others will want the broader benchmarking and reporting pieces.

Is skill-creator a good fit for simple one-off tasks?

Usually not. skill-creator is most valuable when you plan to iterate, test, compare, and improve a skill over time. For a one-off task with no evaluation plan, its workflow may be more structure than you need.

What should I inspect before deciding to install skill-creator in production workflows?

Check SKILL.md, the three agent files in agents/, the scripts in scripts/, and eval-viewer/generate_review.py. Those files give the clearest picture of how skill-creator approaches skill authoring, testing, and validation in real use.
