
create-skill-test

by dotnet

create-skill-test scaffolds eval.yaml test files for agent skills in dotnet/skills. Use it to create skill tests, define scenarios, fixtures, assertions, and rubrics, and reduce overfitting in evaluation design. It is not for running existing tests, debugging validator errors, or authoring SKILL.md files.

Stars: 3k
Favorites: 0
Comments: 0
Added: May 25, 2026
Category: Skill Testing
Install Command
npx skills add dotnet/skills --skill create-skill-test
Curation Score

This skill scores 62/100, which means it is listable but should be approached with caution: it gives directory users a real, targeted workflow for scaffolding eval.yaml test files, yet it is narrower and more repository-specific than a broadly reusable skill.

Strengths
  • Clear triggerability: the frontmatter says to use it for creating eval.yaml test files, adding scenarios, setting up fixtures, and checking overfitting risk.
  • Operationally concrete workflow: the body includes explicit inputs, when-to-use / when-not-to-use guidance, and a multi-step process with constraints.
  • Good install decision value for dotnet/skills contributors: it references validator checks and repository conventions, which reduces guesswork versus a generic prompt.
Cautions
  • It is experimental/test-oriented and scoped to dotnet/skills conventions, so it may not transfer well outside that repository.
  • No scripts, references, or support files are included, so users must rely on the document alone for implementation details.

Overview of the create-skill-test skill

create-skill-test is a scaffold-and-validate helper for building eval.yaml test files for agent skills in the dotnet/skills repository. It is aimed at people who need a reliable starting point for skill testing, not a general prompt for “write a test.” The main job is to turn a target skill, plugin name, and scenario idea into a convention-safe test structure with fixtures, assertions, and rubrics that are less likely to overfit.

The create-skill-test skill is best for authors who already know which skill they want to evaluate and need a fast way to produce a test file that fits repository rules. It is less useful if you are only trying to run tests, debug validator failures, or write skill instructions from scratch.

What create-skill-test is for

Use the create-skill-test skill when you are creating a new eval file, extending an existing one with more scenarios, or checking whether your rubric is too specific to one exact output. It is especially useful in skill-testing workflows where the quality of the test design matters as much as the YAML shape.

What it helps you avoid

The biggest value is avoiding fragile evals: missing required fields, mismatched skill paths, poor fixture organization, and rubric language that accidentally rewards one phrasing instead of the real behavior. That matters if you want tests that stay useful as the target skill evolves.

What it does not replace

It does not replace the skill-validator, and it does not help with editing SKILL.md files. If your goal is to diagnose a broken test run or debug validator output, this is the wrong tool.

How to Use the create-skill-test skill

Install and open the source skill

Install create-skill-test with npx skills add dotnet/skills --skill create-skill-test. Then read SKILL.md first, because it contains the workflow, input requirements, and the boundaries that determine whether your request is valid before you ask the model to generate anything.

Give the skill the right test brief

A strong request to create-skill-test is more than “make a test.” Include the skill name, plugin name, the behavior you want to verify, and any scenario constraints. The skill expects inputs such as the target skill under plugins/<plugin>/skills/, so precise naming matters.

A better brief looks like this:

  • Skill: foo-bar
  • Plugin: dotnet-msbuild
  • Goal: verify that the agent creates a valid summary and rejects unsupported paths
  • Scenario: first-time user with partial context
  • Fixture need: one minimal input file and one edge-case file

That gives create-skill-test enough structure to build a useful eval instead of a generic one.
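If you assemble briefs like this often, it can help to check them for gaps before handing them to the skill. The following is a minimal sketch, not part of create-skill-test itself: the class name SkillTestBrief, its fields, and its methods are all hypothetical, chosen only to mirror the bullets in the example brief.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SkillTestBrief:
    """Hypothetical container mirroring the bullets of the example brief."""

    skill: str
    plugin: str
    goal: str
    scenario: str = ""
    fixtures: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return the names of empty core fields; an empty list means none are missing."""
        return [
            f"missing {name}"
            for name in ("skill", "plugin", "goal")
            if not getattr(self, name).strip()
        ]

    def to_prompt(self) -> str:
        """Render the brief as plain 'Label: value' lines."""
        lines = [
            f"Skill: {self.skill}",
            f"Plugin: {self.plugin}",
            f"Goal: {self.goal}",
        ]
        if self.scenario:
            lines.append(f"Scenario: {self.scenario}")
        lines.extend(f"Fixture need: {item}" for item in self.fixtures)
        return "\n".join(lines)


brief = SkillTestBrief(
    skill="foo-bar",
    plugin="dotnet-msbuild",
    goal="verify that the agent creates a valid summary and rejects unsupported paths",
    scenario="first-time user with partial context",
    fixtures=["one minimal input file", "one edge-case file"],
)
print(brief.problems())
print(brief.to_prompt())
```

Treating the skill, plugin, and goal as the required fields is an assumption made for this sketch; adjust it to whatever SKILL.md actually lists as required inputs.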

Read the repository sections that matter

Start with SKILL.md, then inspect any README.md, AGENTS.md, metadata.json, and nearby rules/, resources/, references/, or scripts/ folders if they exist. In this repository snapshot, SKILL.md is the only file surfaced, so the skill definition itself is the main source of truth.
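A quick way to see which of those files and folders actually exist is to probe the skill directory. This is a rough sketch under the assumption that you have a local checkout; the function name supporting_material and the directory you pass in are illustrative, and only the file and folder names come from the paragraph above.

```python
from __future__ import annotations

from pathlib import Path

# Names taken from the paragraph above; whether any exist depends on the repository.
CANDIDATES = [
    "SKILL.md",
    "README.md",
    "AGENTS.md",
    "metadata.json",
    "rules",
    "resources",
    "references",
    "scripts",
]


def supporting_material(skill_dir: str) -> dict[str, bool]:
    """Map each candidate file or folder name to whether it exists in skill_dir."""
    root = Path(skill_dir)
    return {name: (root / name).exists() for name in CANDIDATES}


# Example: report what is present in the current directory.
for name, present in supporting_material(".").items():
    print(f"{name}: {'found' if present else 'absent'}")
```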

Iterate on scenarios and rubrics

Use the first draft to check whether the test actually measures the intended behavior. If the rubric rewards wording instead of outcomes, tighten it. If the scenario is too broad, split it. If the skill only needs one happy path, keep the eval small rather than inventing extra cases.

create-skill-test skill FAQ

Is create-skill-test only for dotnet/skills?

Yes. It is designed around the dotnet/skills repository conventions and the plugins/<plugin>/skills/ layout. You can adapt the idea elsewhere, but the skill is most valuable when your repo follows the same structure and validation expectations.
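To check whether another repository follows the same layout, you could look for plugin folders that contain a skills/ subfolder. The sketch below assumes nothing beyond the plugins/<plugin>/skills/ pattern mentioned above; the function name plugins_with_skills is hypothetical, and the check says nothing about validation expectations.

```python
from __future__ import annotations

from pathlib import Path


def plugins_with_skills(repo_root: str) -> list[str]:
    """Return the names of plugin folders that contain a skills/ directory."""
    plugins_dir = Path(repo_root) / "plugins"
    if not plugins_dir.is_dir():
        return []
    return sorted(
        entry.name for entry in plugins_dir.iterdir() if (entry / "skills").is_dir()
    )


# An empty list suggests the repository does not use this layout.
print(plugins_with_skills("."))
```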

Should I use it instead of a normal prompt?

Use create-skill-test when you want a repeatable eval scaffold with fewer structural mistakes. A normal prompt can describe a test, but it will usually be weaker on repository-specific conventions, fixture placement, and overfitting checks.

Is it beginner-friendly?

Yes, if you can identify the target skill and explain the scenario in plain language. It is not beginner-friendly if you cannot name the plugin, the skill path, or the behavior being tested, because those inputs drive the generated output.

When should I not use it?

Do not use create-skill-test for running tests, debugging validator errors, or authoring a new skill. Those are adjacent workflows with different tools and different success criteria.

How to Improve create-skill-test results

Provide narrower inputs

The best create-skill-test results come from specific scenarios, not broad intentions. “Test that the skill handles missing context and returns a safe fallback” is stronger than “make a comprehensive eval,” because it tells the skill what behavior matters and what to avoid over-crediting.

Ask for rubric quality, not just YAML

If you only ask for structure, you may get a technically valid file that still overfits. Say what should count as success, what should fail, and which details are incidental. That is the fastest way to improve the quality of the evals that create-skill-test produces.

Check for overfitting after generation

Review whether the assertions reward a single phrasing, a fixed order, or an exact example string unless that specificity is truly required. Good evals measure the behavior the skill should preserve, not the exact wording produced in one run.
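Part of that review can be approximated with a simple text heuristic. The sketch below is an illustration only, not something the skill or the validator provides: the cue words, the 25-character threshold, and the function name overfitting_flags are all assumptions, and a flagged item still needs human judgment.

```python
import re

# Hypothetical cue phrases that often accompany exact-wording requirements.
EXACT_CUES = re.compile(r"\b(exactly|verbatim|word for word|must say)\b", re.IGNORECASE)
# A double-quoted literal of 25 or more characters, which often signals an exact-string match.
LONG_QUOTE = re.compile(r'"[^"]{25,}"')


def overfitting_flags(rubric_items):
    """Return (item, reasons) pairs for rubric lines that look wording-specific."""
    flagged = []
    for item in rubric_items:
        reasons = []
        if EXACT_CUES.search(item):
            reasons.append("exact-wording cue")
        if LONG_QUOTE.search(item):
            reasons.append("long quoted literal")
        if reasons:
            flagged.append((item, reasons))
    return flagged


rubric = [
    'Output must say exactly "Build succeeded with 0 warnings and 0 errors"',
    "Identifies the unsupported path and declines to process it",
]
for item, reasons in overfitting_flags(rubric):
    print(f"{', '.join(reasons)}: {item}")
```

Here the first rubric line is flagged on both counts, while the second describes behavior and passes untouched.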

Refine by validator feedback

If the first output fails validation, feed back the exact error and the surrounding YAML fragment. That usually produces a better second pass than restating the whole request.
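One way to package that feedback is to cut out a few numbered lines around the reported location. This sketch assumes the validator message gives you a line number, which may not be the case; the function names and the sample error text are invented for illustration.

```python
def yaml_fragment(yaml_text: str, line_no: int, radius: int = 3) -> str:
    """Return numbered lines within `radius` of the 1-based line_no."""
    lines = yaml_text.splitlines()
    start = max(line_no - 1 - radius, 0)
    end = min(line_no + radius, len(lines))
    return "\n".join(f"{i + 1:>4}: {lines[i]}" for i in range(start, end))


def feedback_message(error: str, yaml_text: str, line_no: int) -> str:
    """Combine the exact error text with the surrounding YAML fragment."""
    return (
        f"Validator error:\n{error}\n\n"
        f"Surrounding YAML:\n{yaml_fragment(yaml_text, line_no)}"
    )


sample = "a: 1\nb: 2\nc: 3\nd: 4\ne: 5"
# The error string here is a made-up placeholder, not real validator output.
print(feedback_message("example error at line 3", sample, 3))
```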

Ratings & Reviews

No ratings yet