skill-creator
by anthropics
Create, refine, test, and benchmark agent skills with the skill-creator workflow, including eval review, grading, blind comparison, and description improvement.
Overview
What skill-creator is
skill-creator is a meta-skill for building and improving other agent skills. In the anthropics/skills repository, it is described as a workflow for creating a skill from scratch, revising an existing skill, testing it with eval prompts, reviewing results, and iterating until performance improves.
This makes skill-creator especially relevant for teams working with Anthropic and Claude workflows who want a more structured way to author skills, validate behavior, and improve triggering descriptions over time.
Who should use skill-creator
Use skill-creator if you are:
- Writing a new skill and need a repeatable authoring process
- Updating an existing skill that is underperforming or triggering inconsistently
- Running evals to compare changes before and after a rewrite
- Reviewing outputs qualitatively, not just by raw pass/fail counts
- Benchmarking skill variants and analyzing why one version performs better than another
It is best suited to skill authors, agent workflow designers, and anyone responsible for testing and validation in a skills library.
What problems it helps solve
The repository evidence shows that skill-creator covers more than drafting instructions. It supports a broader improvement loop:
- drafting or rewriting a skill
- creating and reviewing eval prompts
- grading expectations against transcripts and outputs
- comparing competing outputs in a blinded way
- analyzing why a winning version performed better
- improving the skill description for better triggering accuracy
That combination is why skill-creator fits skill authoring first, with strong overlap into skill testing and skill validation.
What is included in the repository
The file tree shows a practical workflow rather than a single text prompt:
- SKILL.md defines the high-level process for creating and iterating on skills
- agents/analyzer.md, agents/comparator.md, and agents/grader.md describe specialized evaluation roles
- scripts/run_eval.py, scripts/run_loop.py, scripts/quick_validate.py, and scripts/aggregate_benchmark.py support testing and benchmark workflows
- scripts/improve_description.py points to description optimization as a first-class task
- eval-viewer/generate_review.py, eval-viewer/viewer.html, and assets/eval_review.html support human review of eval runs
- references/schemas.md suggests supporting structure and reference material for skill packaging or validation work
When skill-creator is a good fit
skill-creator is a strong fit when you want a documented, repeatable process for improving a skill in cycles. It is particularly useful if your team values evidence-based iteration instead of one-off prompt edits.
Choose it when you need:
- a practical workflow for skill authoring
- evaluation support beyond ad hoc testing
- blind comparison to reduce bias between variants
- review tooling for transcripts and outputs
- structured iteration after user or evaluator feedback
When skill-creator may not be the best fit
This skill may be more than you need if you only want a tiny helper skill with no planned evaluation loop. It is also not primarily a general software development toolkit or a UI framework. Its center of gravity is authoring and measuring agent skills.
If your goal is simply to install a ready-made end-user skill and use it immediately, skill-creator is more process-oriented than task-oriented.
How to Use
Install skill-creator
Install skill-creator from the Anthropic skills repository with:
npx skills add https://github.com/anthropics/skills --skill skill-creator
After installation, open the installed files and start with SKILL.md. That file sets the overall workflow: identify the user's stage, draft or revise the skill, test it, review the results, and iterate.
Review the key files first
For installation and adoption decisions, these are the most useful files to inspect early:
- SKILL.md
- agents/analyzer.md
- agents/comparator.md
- agents/grader.md
- scripts/run_eval.py
- scripts/run_loop.py
- scripts/quick_validate.py
- scripts/improve_description.py
- scripts/aggregate_benchmark.py
- eval-viewer/generate_review.py
- eval-viewer/viewer.html
- assets/eval_review.html
- references/schemas.md
This mix shows that skill-creator includes both authoring guidance and validation support.
Understand the recommended workflow
Based on SKILL.md, the intended usage pattern is iterative:
- Decide what the target skill should do and how it should work.
- Draft the skill.
- Create a small set of test prompts.
- Run the skill on those prompts.
- Review outputs qualitatively and quantitatively.
- Rewrite the skill using the review findings.
- Expand the test set and repeat at larger scale.
This is helpful if you want to move from rough idea to validated skill without treating evaluation as an afterthought.
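As a rough illustration, the loop above can be sketched in Python. The callables here (run_prompt, grade, revise) are hypothetical stand-ins for your own harness, not the skill-creator API:

```python
# Illustrative sketch of the draft -> test -> review -> rewrite loop.
# run_prompt, grade, and revise are hypothetical callables you supply;
# they are not part of skill-creator itself.

def improve_skill(skill, prompts, run_prompt, grade, revise, max_rounds=3):
    """Iterate on a skill draft until all test prompts pass or rounds run out."""
    for round_num in range(max_rounds):
        # Run the current draft on each test prompt and grade the output.
        results = {p: grade(p, run_prompt(skill, p)) for p in prompts}
        failures = [p for p, ok in results.items() if not ok]
        if not failures:
            return skill, round_num  # every prompt passed this round
        # Rewrite the skill using the review findings, then repeat.
        skill = revise(skill, failures)
    return skill, max_rounds
```

The key design point mirrors the workflow above: revision is driven by the concrete failures from the last round, not by ad hoc edits.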
Use the evaluation agents for deeper review
The repository includes three specialized agent definitions that clarify how evaluation should work:
- agents/comparator.md: compares outputs as A vs. B without knowing which skill produced them, which helps reduce bias
- agents/analyzer.md: explains why the winning version won and surfaces actionable improvement ideas
- agents/grader.md: checks whether expectations truly passed and warns against weak assertions that create false confidence
Together, these files show that skill-creator is not just about generating a skill draft. It is also about disciplined review.
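The blind-comparison idea is simple enough to sketch. This is a hypothetical harness, not the repo's actual implementation: variant labels are shuffled before the judge sees them, so the judge cannot favor a known source.

```python
import random

# Hedged sketch of blind A/B comparison in the spirit of agents/comparator.md.
# `judge` stands in for whatever evaluation call you use; it sees only the
# anonymized texts and must answer "A" or "B".

def blind_compare(output_old, output_new, judge, rng=random):
    """Return "old" or "new" without the judge learning which was which."""
    pair = [("old", output_old), ("new", output_new)]
    rng.shuffle(pair)  # randomize which variant becomes label "A"
    labels = {"A": pair[0], "B": pair[1]}
    winner = judge(labels["A"][1], labels["B"][1])  # judge returns "A" or "B"
    return labels[winner][0]
```

Shuffling the label assignment per comparison is what removes positional and provenance bias from the judge's decision.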
Review eval results in a browser
A notable practical feature is eval-viewer/generate_review.py, which generates and serves a self-contained review page for eval results. The script usage in the source is:
python generate_review.py <workspace-path> [--port PORT] [--skill-name NAME]
It can also load prior feedback:
python generate_review.py <workspace-path> --previous-feedback /path/to/old/feedback.json
According to the source excerpt, it reads workspace runs, embeds output data into an HTML review page, serves it locally, and auto-saves feedback to feedback.json. If your workflow depends on human review of outputs, this is one of the strongest reasons to consider skill-creator.
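The auto-save behavior can be approximated as: load feedback.json if it exists, merge the new entry, and write the file back. This is a hypothetical sketch; the viewer's real feedback schema may differ, and the run_id/notes keys are illustrative.

```python
import json
from pathlib import Path

# Hypothetical approximation of auto-saving reviewer feedback to feedback.json.
# The actual viewer's file layout may differ; keys here are illustrative.

def save_feedback(workspace, run_id, notes):
    """Merge one reviewer note into <workspace>/feedback.json, creating it if absent."""
    path = Path(workspace) / "feedback.json"
    data = json.loads(path.read_text()) if path.exists() else {}
    data[run_id] = notes
    path.write_text(json.dumps(data, indent=2))
    return data
```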
Use the scripts folder as the operational toolbox
The scripts/ directory suggests the main operational tasks supported by skill-creator:
- run_eval.py for executing evaluations
- run_loop.py for iterative improvement loops
- quick_validate.py for faster validation checks
- aggregate_benchmark.py for benchmark aggregation and variance-oriented analysis
- generate_report.py for reporting
- improve_description.py for description tuning
- package_skill.py for packaging work
You should treat these files as implementation details to inspect and adapt to your own environment rather than assuming a one-size-fits-all setup.
Practical adoption advice
Before fully adopting skill-creator, check these points:
- Whether your team already has a workspace layout compatible with transcript and output review
- Whether you want qualitative review in addition to numeric scoring
- Whether blind comparison between skill variants matters for your process
- Whether you need description optimization to improve skill triggering
- Whether Python-based local review tooling fits your environment
If those needs match your workflow, skill-creator is likely a good installation candidate.
FAQ
What does skill-creator actually do after installation?
skill-creator gives you a structured process for creating and improving agent skills. It helps you move from draft to tested version by combining authoring guidance, eval execution support, result review, grading, blind comparison, and iteration.
Is skill-creator only for creating brand-new skills?
No. The repository description explicitly covers creating a skill from scratch, modifying or improving an existing skill, running evals, benchmarking performance, and optimizing a description for better triggering accuracy.
Does skill-creator include testing and validation support?
Yes. Repository evidence supports that strongly. The presence of agents/grader.md, agents/comparator.md, agents/analyzer.md, and scripts such as run_eval.py, quick_validate.py, and aggregate_benchmark.py shows that testing and validation are core parts of the workflow.
Does skill-creator help compare two skill versions fairly?
Yes. agents/comparator.md describes a blind comparison process where outputs are labeled A and B without revealing which skill produced them. That is useful when you want to compare variants with less bias.
Can skill-creator help improve a skill description?
Yes. The top-level description explicitly mentions optimizing a skill's description for better triggering accuracy, and the repository includes scripts/improve_description.py, which supports that claim.
Do I need to use every script and subfolder?
No. A practical approach is to start with SKILL.md, review the agent role files, and then inspect the scripts and viewer files that match your workflow. Some teams may only need the authoring loop and eval review, while others will want the broader benchmarking and reporting pieces.
Is skill-creator a good fit for simple one-off tasks?
Usually not. skill-creator is most valuable when you plan to iterate, test, compare, and improve a skill over time. For a one-off task with no evaluation plan, its workflow may be more structure than you need.
What should I inspect before deciding to install skill-creator in production workflows?
Check SKILL.md, the three agent files in agents/, the scripts in scripts/, and eval-viewer/generate_review.py. Those files give the clearest picture of how skill-creator approaches skill authoring, testing, and validation in real use.
