create-skill-test

by dotnet

create-skill-test scaffolds eval.yaml test files for agent skills in dotnet/skills. Use it to create skill tests, define scenarios, fixtures, assertions, and rubrics, and reduce overfitting in evaluation design. It is not for running existing tests, debugging validator errors, or authoring SKILL.md files.

Stars: 3k

Favorites: 0

Comments: 0

Added: May 25, 2026

Category: Skill Testing

Install Command

npx skills add dotnet/skills --skill create-skill-test

Curation Score

This skill scores 62/100, which means it is listable but should be approached with caution: it gives directory users a real, targeted workflow for scaffolding eval.yaml test files, yet it is narrower and more repository-specific than a broadly reusable skill.

Strengths

Clear triggerability: the frontmatter says to use it for creating eval.yaml test files, adding scenarios, setting up fixtures, and checking overfitting risk.
Operationally concrete workflow: the body includes explicit inputs, when-to-use / when-not-to-use guidance, and a multi-step process with constraints.
Good install decision value for dotnet/skills contributors: it references validator checks and repository conventions, which reduces guesswork versus a generic prompt.

Cautions

It is experimental/test-oriented and scoped to dotnet/skills conventions, so it may not transfer well outside that repository.
No scripts, references, or support files are included, so users must rely on the document alone for implementation details.

Test Template Docs Developer Audience Dotnet

Overview of create-skill-test skill

create-skill-test is a scaffold-and-validate helper for building eval.yaml test files for agent skills in the dotnet/skills repository. It is aimed at people who need a reliable starting point for skill testing, not a general prompt for “write a test.” The main job is to turn a target skill, plugin name, and scenario idea into a convention-safe test structure with fixtures, assertions, and rubrics that are less likely to overfit.

The create-skill-test skill is best for authors who already know which skill they want to evaluate and need a fast way to produce a test file that fits repository rules. It is less useful if you are only trying to run tests, debug validator failures, or write skill instructions from scratch.

What create-skill-test is for

Use the create-skill-test skill when you are creating a new eval file, extending an existing one with more scenarios, or checking whether your rubric is too specific to one exact output. It is especially useful in skill-testing workflows where the quality of the test design matters as much as the YAML shape.

What it helps you avoid

The biggest value is avoiding fragile evals: missing required fields, mismatched skill paths, poor fixture organization, and rubric language that accidentally rewards one phrasing instead of the real behavior. That matters if you want tests that stay useful as the target skill evolves.

What it does not replace

It does not replace the skill-validator, and it does not help with editing SKILL.md files. If your goal is to diagnose a broken test run or debug validator output, this is the wrong tool.

How to Use create-skill-test skill

Install and open the source skill

Install create-skill-test with npx skills add dotnet/skills --skill create-skill-test. Then read SKILL.md first, because it contains the workflow, input requirements, and the boundaries that determine whether your request is valid before you ask the model to generate anything.

Give the skill the right test brief

A strong create-skill-test request is not just “make a test.” Include the skill name, plugin name, the behavior you want to verify, and any scenario constraints. The skill expects inputs like the target skill under plugins/<plugin>/skills/, so naming precision matters.

A better brief looks like this:

Skill: foo-bar
Plugin: dotnet-msbuild
Goal: verify that the agent creates a valid summary and rejects unsupported paths
Scenario: first-time user with partial context
Fixture need: one minimal input file and one edge-case file

That gives create-skill-test enough structure to build a useful eval instead of a generic one.
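As a rough illustration, the brief above can be captured in a small structure and checked for gaps before it is handed to the skill. This is only a sketch: the SkillTestBrief class and its field names are hypothetical and are not part of create-skill-test, and placing the skill in a folder named after it under plugins/<plugin>/skills/ is an assumption.

```python
from dataclasses import dataclass, fields
from pathlib import PurePosixPath


@dataclass
class SkillTestBrief:
    """Hypothetical container for the brief fields listed above."""
    skill: str
    plugin: str
    goal: str
    scenario: str
    fixture_need: str

    def missing_fields(self) -> list[str]:
        # Report every field that is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def skill_path(self) -> PurePosixPath:
        # Follows the plugins/<plugin>/skills/ layout named in the text;
        # appending the skill name as a folder is an assumption.
        return PurePosixPath("plugins") / self.plugin / "skills" / self.skill


brief = SkillTestBrief(
    skill="foo-bar",
    plugin="dotnet-msbuild",
    goal="verify that the agent creates a valid summary and rejects unsupported paths",
    scenario="first-time user with partial context",
    fixture_need="one minimal input file and one edge-case file",
)
print(brief.missing_fields())
print(brief.skill_path())
```

An empty list from missing_fields() suggests the brief is complete enough to submit; any names it returns are the parts still to fill in.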

Read the repository sections that matter

Start with SKILL.md, then inspect any README.md, AGENTS.md, metadata.json, and nearby rules/, resources/, references/, or scripts/ folders if they exist. In this repository snapshot, SKILL.md is the only file surfaced, so the skill definition itself is the main source of truth.
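The reading order above can be turned into a quick check of what actually exists next to a skill. The sketch below assumes these files and folders sit directly inside the skill directory, which may not hold for every repository; reading_list is a hypothetical helper name, and the temporary directory only stands in for a real skill folder.

```python
import tempfile
from pathlib import Path

# File and folder names taken from the reading order described above.
FILES = ["SKILL.md", "README.md", "AGENTS.md", "metadata.json"]
FOLDERS = ["rules", "resources", "references", "scripts"]


def reading_list(skill_dir: Path) -> list[str]:
    """Return the files and folders that exist, in suggested reading order."""
    found = [name for name in FILES if (skill_dir / name).is_file()]
    found += [name + "/" for name in FOLDERS if (skill_dir / name).is_dir()]
    return found


# Demo against a throwaway directory standing in for a skill folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "SKILL.md").write_text("# demo skill\n")
    (demo_dir / "scripts").mkdir()
    demo_result = reading_list(demo_dir)

print(demo_result)
```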

Iterate on scenarios and rubrics

Use the first draft to check whether the test actually measures the intended behavior. If the rubric rewards wording instead of outcomes, tighten it. If the scenario is too broad, split it. If the skill only needs one happy path, keep the eval small rather than inventing extra cases.

create-skill-test skill FAQ

Is create-skill-test only for dotnet/skills?

Yes, it is designed around the dotnet/skills repository conventions and the plugins/<plugin>/skills/ layout. You can adapt the idea elsewhere, but create-skill-test is most valuable when your repo follows the same structure and validation expectations.

Should I use it instead of a normal prompt?

Use create-skill-test when you want a repeatable eval scaffold with fewer structural mistakes. A normal prompt can describe a test, but it will usually be weaker on repository-specific conventions, fixture placement, and overfitting checks.

Is it beginner-friendly?

Yes, if you can identify the target skill and explain the scenario in plain language. It is not beginner-friendly if you cannot name the plugin, the skill path, or the behavior being tested, because those inputs drive the generated output.

When should I not use it?

Do not use create-skill-test for running tests, debugging validator errors, or authoring a new skill. Those are adjacent workflows with different tools and different success criteria.

How to Improve create-skill-test skill

Provide narrower inputs

The best create-skill-test results come from specific scenarios, not broad intentions. “Test that the skill handles missing context and returns a safe fallback” is stronger than “make a comprehensive eval,” because it tells the skill what behavior matters and what to avoid over-crediting.

Ask for rubric quality, not just YAML

If you only ask for structure, you may get a technically valid file that still overfits. Say what should count as success, what should fail, and which details are incidental. That is the fastest way to get better tests out of create-skill-test.

Check for overfitting after generation

Review whether the assertions reward a single phrasing, a fixed order, or an exact example string unless that specificity is truly required. Good evals measure the behavior the skill should preserve, not the exact wording produced in one run.
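One way to make this review less ad hoc is a rough text heuristic over rubric items. The sketch below is an assumption-laden illustration, not something create-skill-test or the skill-validator provides: the warning words, the 20-character threshold, and the overfitting_flags name are all made up for the example, and a flag is a prompt for human review rather than a verdict.

```python
import re

# Hypothetical warning signs; tune both patterns for your own rubrics.
EXACT_WORDS = re.compile(
    r"\b(exactly|verbatim|word[- ]for[- ]word|in this order)\b", re.IGNORECASE
)
LONG_QUOTE = re.compile(r'"[^"]{20,}"')  # a quoted literal of 20+ characters


def overfitting_flags(rubric_item: str) -> list[str]:
    """Return reasons a rubric item may reward wording instead of behavior."""
    flags = []
    if EXACT_WORDS.search(rubric_item):
        flags.append("demands exact wording or order")
    if LONG_QUOTE.search(rubric_item):
        flags.append("quotes a long literal string")
    return flags


risky = 'Output must say exactly "The build failed because of a missing reference"'
robust = "Identifies the missing reference as the cause of the failure"

print(overfitting_flags(risky))
print(overfitting_flags(robust))
```

The first item trips both checks because it pins the output to one sentence; the second describes the behavior and passes cleanly.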

Refine by validator feedback

If the first output fails validation, feed back the exact error and the surrounding YAML fragment. That usually produces a better second pass than restating the whole request.
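A small helper can assemble that feedback so the follow-up request stays short. This is a minimal sketch: the YAML keys and the error text below are invented for the example and are not the real eval.yaml schema or real validator output, and validator_feedback is a hypothetical name.

```python
def validator_feedback(yaml_text: str, error: str, line_no: int, context: int = 2) -> str:
    """Build a follow-up message from an error and the YAML lines around it.

    line_no is the 1-based line the error points at; context is how many
    lines to include on each side.
    """
    lines = yaml_text.splitlines()
    start = max(line_no - 1 - context, 0)
    end = min(line_no + context, len(lines))
    fragment = "\n".join(f"{i + 1:>3} | {lines[i]}" for i in range(start, end))
    return f"Validator error:\n{error}\n\nSurrounding YAML:\n{fragment}"


# Invented sample; the keys are placeholders, not the actual eval.yaml schema.
sample_yaml = (
    "scenarios:\n"
    "  - name: missing-context\n"
    "    prompt: Summarize the build log\n"
    "    rubric:\n"
    "      - Returns a safe fallback\n"
)
message = validator_feedback(
    sample_yaml, "hypothetical error: prompt is too short", line_no=3, context=1
)
print(message)
```

Keeping the context window small is deliberate: the point of the advice above is to send the failing fragment, not the whole file.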

More skills

skill-creator

by anthropics

skill-creator is a Skill Authoring meta-skill for drafting new skills, revising existing SKILL.md files, running evals, comparing variants, and improving trigger descriptions with repository scripts and review tools.

Skill Authoring

Favorites: 2, GitHub: 105.1k

cpp-testing

by affaan-m

The cpp-testing skill helps you write, run, and debug C++ tests with GoogleTest, GoogleMock, CMake, and CTest. Use it for coverage, flaky-test fixes, sanitizer-backed diagnostics, and practical cpp-testing usage in modern C++ projects.

Test Automation

Favorites: 0, GitHub: 156.1k

test-driven-development

by addyosmani

The test-driven-development skill helps you change code by writing a failing test first, then making the smallest fix pass. Use it for logic changes, bug fixes, regressions, and edge cases where proof matters more than a plausible patch.

Skill Testing

Favorites: 0, GitHub: 18.8k

skill-optimizer

by mcollina

skill-optimizer helps authors improve AI skills for activation, clarity, and cross-model reliability. Use it for Skill Authoring when a skill is written but not reliably followed, when triggers are weak, regressions appear, or context cost needs trimming. It supports benchmark loops, release gates, and tighter usage fidelity.

Skill Authoring

Favorites: 0, GitHub: 1.8k

property-based-testing

by trailofbits

property-based-testing skill guide for writing, reviewing, and improving PBT across languages and smart contracts. Use this property-based-testing guide to spot roundtrip, idempotence, invariant, parser, validator, and normalization cases, choose generators, and decide when property-based-testing is stronger than example-based tests.

Skill Testing

Favorites: 0, GitHub: 5k

writing-skills

by obra

writing-skills is a Skill Authoring guide for creating, editing, and validating agent skills with a test-driven workflow. Learn the key files, prerequisites, and practical steps for pressure scenarios, baseline tests, and concise SKILL.md iteration.

Skill Authoring

Favorites: 0, GitHub: 121.9k

verification-loop

by affaan-m

verification-loop is a Claude Code verification workflow for checking builds, types, lint, tests, security, and diffs after code changes. This verification-loop skill is useful before PRs and after refactors when you want a structured post-change guide instead of a generic prompt.

Verification

Favorites: 0, GitHub: 156.3k

perl-testing

by affaan-m

perl-testing is a practical guide for writing, running, and improving Perl tests with Test2::V0, Test::More, prove, mocking, coverage, and TDD. Use the perl-testing skill for install guidance, usage patterns, migration help, and faster debugging of failing suites.

Skill Testing

Favorites: 0, GitHub: 156.2k

kotlin-testing

by affaan-m

kotlin-testing is a practical guide for Kotlin test automation with Kotest, MockK, coroutine testing, property-based tests, and Kover coverage. Use this kotlin-testing skill to follow a TDD-friendly workflow, write clearer unit and component tests, and reduce guesswork when mocking dependencies or testing suspending code.

Test Automation

Favorites: 0, GitHub: 156.2k

eval-harness

by affaan-m

The eval-harness skill is a formal evaluation framework for Claude Code sessions and eval-driven development. It helps you define pass/fail criteria, build capability and regression evals, and measure agent reliability before shipping prompt or workflow changes.

Model Evaluation

Favorites: 0, GitHub: 156.1k

context-budget

by affaan-m

The context-budget skill audits Claude Code context use across agents, skills, rules, and MCP servers. It helps identify bloat, duplicate content, and high-cost components, then returns prioritized cleanup actions. Use this context-budget guide for practical context-budget usage and for Skill Testing in larger setups.

Skill Testing

Favorites: 0, GitHub: 156.1k

skill-judge

by softaworks

skill-judge is a review and scoring skill for auditing AI skill packages and SKILL.md files. It helps authors and maintainers judge knowledge delta, activation clarity, workflow quality, and publish readiness with actionable improvement guidance.

Skill Validation

Favorites: 0, GitHub: 1.3k

playwright-testing

by alinaqi

playwright-testing skill for writing and debugging Playwright end-to-end tests with page objects, cross-browser runs, CI-friendly setup, auth handling, and stable test structure.

Skill Testing

Favorites: 0, GitHub: 607

darwin-skill

by alchaincyf

darwin-skill helps improve SKILL.md files with a repeatable loop: evaluate, revise, test, then keep or revert changes. Built for Skill Authoring, it combines rubric scoring with prompt-based validation and supports visual result outputs from repo templates and assets.

Skill Authoring

Favorites: 0, GitHub: 549

evaluation

by muratcankoylan

The evaluation skill helps you design and run agent evaluations for non-deterministic systems. Use it for evaluation install planning, rubrics, regression checks, quality gates, and evaluation for Skill Testing. It fits LLM-as-judge workflows, multi-dimensional scoring, and practical evaluation usage when you need repeatable results.

Skill Testing

Favorites: 0, GitHub: 0

tutor

by RoundTable02

tutor is a quiz-driven study skill for Obsidian StudyVault users who want diagnostic assessments, concept-level review, and progress tracking. It detects language, finds the vault, reads the dashboard, and drills weak areas through structured sessions. Use tutor when you need repeatable study checks instead of a generic chat tutor.

Skill Authoring

Favorites: 0, GitHub: 0