skill-comply
by affaan-m

skill-comply is a compliance-testing skill that checks whether an agent follows a skill, rule, or agent definition in real runs. It generates specs from markdown, runs three prompt strictness levels, classifies tool-call timelines, and reports compliance rates with evidence. Useful for compliance review of any markdown-defined behavior.
This skill scores 78/100, which means it is a solid listing candidate for directory users who want an agent to verify whether skills, rules, and agent definitions are actually being followed. The repository provides a concrete workflow, explicit activation cues, and supporting scripts/tests, so users can judge install value with reasonable confidence, though they should expect some operational setup effort.
- Explicitly describes a multi-step compliance workflow: spec generation, three-level scenario generation, trace capture, classification, and reporting.
- Strong triggerability and scope clarity: SKILL.md says when to activate it and which targets it supports (skills, rules, agent definitions).
- Real implementation evidence: multiple scripts, prompts, fixtures, and tests back the documented workflow.
- No install command in SKILL.md, so users must wire it up manually and may need to inspect scripts to run it correctly.
- The repo notes agent-definition workflow verification is not yet fully supported, which limits coverage compared with the broad title.
Overview of skill-comply skill
skill-comply is a compliance-testing skill for checking whether an agent actually follows a skill, rule, or agent definition in real runs. It fits users who need evidence, not assumptions: maintainers validating a workflow rule, authors testing a new skill, or teams asking whether a coding agent obeys TDD, review, or process constraints under different prompt conditions.
What the skill-comply skill does
The skill-comply skill generates an expected behavior spec from a markdown source, creates three prompts with decreasing support, runs the agent, then compares observed tool-call timelines against the spec. That makes it useful for Compliance Review when you care about both presence and order of actions, not just final output.
When skill-comply is a good fit
Use skill-comply when you need to verify that a rule is followed under pressure: supportive prompts, neutral prompts, and competing prompts. It is especially relevant for skills that depend on sequence, such as “test before implementation” or “read the rule before editing.”
What makes it different
Unlike a generic prompt asking “did it follow the rules?”, skill-comply operationalizes the check: it extracts steps, classifies tool calls with an LLM, and evaluates ordering deterministically. The value is in the trace, timeline, and compliance rate, which help you decide whether the skill is reliable enough to keep using.
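The deterministic ordering check can be pictured as a subsequence test over the classified tool-call timeline. The sketch below is illustrative only: the step labels, data shapes, and `ordered_compliance` helper are hypothetical, and the skill's real spec format and classifier output may differ.

```python
def ordered_compliance(expected_steps, observed_timeline):
    """Return the fraction of expected steps that appear, in order,
    within the observed tool-call timeline (hypothetical data shapes)."""
    idx = 0
    for call in observed_timeline:
        if idx < len(expected_steps) and call == expected_steps[idx]:
            idx += 1
    return idx / len(expected_steps)  # compliance rate in [0, 1]

# Example: a "test before implementation" spec checked against a trace.
spec = ["write_test", "run_test", "implement", "run_test"]
trace = ["read_rule", "write_test", "run_test", "implement", "run_test"]
print(ordered_compliance(spec, trace))  # → 1.0
```

The point of evaluating ordering this way is that it is deterministic: only the classification of each tool call involves an LLM, so the same classified trace always yields the same score.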
How to Use skill-comply skill
Install and activate skill-comply
Install the skill-comply skill with:
npx skills add affaan-m/everything-claude-code --skill skill-comply
Then run it against the markdown file you want to verify. The repository’s own usage pattern is centered on CLI execution, so the skill works best when you point it at a single target file and treat the output as a compliance report, not a prose summary.
Read these files first
For the skill-comply install and setup path, start with skills/skill-comply/SKILL.md, then inspect prompts/spec_generator.md, prompts/scenario_generator.md, and prompts/classifier.md. Those three prompts show the real workflow: spec extraction, scenario generation, and trace classification. If you want to understand implementation constraints, skim scripts/run.py, scripts/spec_generator.py, scripts/scenario_generator.py, and scripts/classifier.py.
How to shape a good input
A strong skill-comply usage prompt is a concrete compliance target, not a vague policy. Good inputs name the file and the behavior you want verified, for example: “Check whether rules/common/testing.md is followed during a coding task” or “Measure whether the agent writes tests before implementation in this skill.” Weak inputs like “is this good?” do not give the tool enough behavior to score.
Practical workflow for better results
Use this sequence: choose one rule or skill, generate the spec, review the extracted steps, then run the three scenario levels. The best way to use skill-comply for Compliance Review is to compare the supportive, neutral, and competing runs side by side, because that shows whether the behavior is robust or only appears when the prompt helps.
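One way to read the three runs side by side is to compare per-level compliance rates and look at the gap between the supportive and competing levels. The numbers and the `robustness_gap` helper below are hypothetical, not part of the skill's actual output format.

```python
# Hypothetical per-level compliance rates from three runs of one target.
runs = {"supportive": 1.0, "neutral": 0.75, "competing": 0.25}

def robustness_gap(runs):
    """A large gap means the behavior only holds when the prompt helps."""
    return runs["supportive"] - runs["competing"]

gap = robustness_gap(runs)
print(f"robustness gap: {gap:.2f}")  # → robustness gap: 0.75
if gap > 0.5:
    print("warning: compliance depends heavily on prompt support")
```

A skill that scores well only in the supportive run has not really been verified; the competing run is the one that tells you whether the rule survives pressure.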
skill-comply skill FAQ
Is skill-comply only for coding skills?
No. It is best for coding-agent workflows, but the repository explicitly supports skills, rules, and agent definitions. If your target is a markdown policy with observable actions, skill-comply is a strong fit.
How is this different from a normal prompt test?
A normal prompt test checks whether an answer looks right. skill-comply checks whether the agent’s actions match an expected sequence, including tool-use timing. That matters when compliance is about process, not just output.
Is skill-comply beginner-friendly?
Yes, if you can identify the file being tested and describe the behavior you expect. The harder part is choosing a target with clear observable steps. It is less useful when the policy is vague or mostly human judgment.
When should I not use it?
Do not use skill-comply when the target has no actionable sequence, no meaningful tool calls, or only subjective quality criteria. It is also a poor fit if you need full production observability beyond a single `claude -p` run and trace comparison.
How to Improve skill-comply skill
Give it sharper source material
skill-comply works best when the source markdown states concrete actions, ordering, and exceptions. If your rule says “prefer tests” instead of “write a test before implementation,” the extracted spec will be harder to score and less useful for Compliance Review.
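For example, a scoreable rule names the action and its position in the sequence. The snippet below is illustrative source material, not taken from the repository:

```markdown
<!-- vague: hard to extract an ordered spec from -->
- Prefer tests.

<!-- concrete: yields ordered, checkable steps -->
- Write a failing test before writing any implementation code.
- Run the test suite after each implementation change.
```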
Watch for the main failure modes
The biggest risk is over-trusting an extracted spec that is too broad or too narrow. Another common issue is confusing prompt support with real compliance: a skill may look good in the supportive scenario and fail once the prompt becomes neutral or competing. Use the results across all three levels to check robustness, not just one green run.
Strengthen the first run inputs
Provide a target path, a realistic task, and any setup commands needed to reproduce the behavior under test. If the skill depends on files, commands, or environment assumptions, include those explicitly so the generated scenarios reflect actual use rather than a toy example.
Iterate from trace to spec
After the first run, inspect the generated spec and the tool-call timeline before you change the prompt or skill text. If a step was missed, decide whether the issue is the skill wording, the scenario design, or the detector description. That loop is where skill-comply adds the most value: it turns “did it comply?” into specific edits you can make to the source rule.
