Evaluation

Evaluation taxonomy generated by the site skill importer.

3 skills
A
healthcare-eval-harness

by affaan-m

healthcare-eval-harness is a patient safety evaluation harness for healthcare app deployments. It helps teams verify clinical decision support system (CDSS) accuracy, protected health information (PHI) exposure, data integrity, clinical workflow behavior, and integration compliance before release. Critical failures block deployment, which makes it useful for model evaluation and as a CI safety gate.

Model Evaluation
Favorites 0 · GitHub 156.2k
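
The "critical failures block deployment" behavior described for healthcare-eval-harness can be pictured with a minimal sketch. This is not the skill's actual code: `Severity`, `CheckResult`, `deployment_allowed`, and the check names are hypothetical, chosen only to illustrate a gate in which warnings are reported but only critical failures stop a release.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # a failure here must block the release
    WARNING = "warning"    # a failure here is reported but does not block


@dataclass
class CheckResult:
    name: str            # hypothetical check name, e.g. "phi_exposure"
    severity: Severity
    passed: bool


def deployment_allowed(results):
    """Return False if any critical check failed; warnings never block."""
    return not any(
        r.severity is Severity.CRITICAL and not r.passed for r in results
    )


# Illustrative results only; these are not real measurements.
results = [
    CheckResult("cdss_accuracy", Severity.CRITICAL, True),
    CheckResult("phi_exposure", Severity.CRITICAL, False),
    CheckResult("workflow_behavior", Severity.WARNING, False),
]

print(deployment_allowed(results))  # prints False: a critical check failed
```

In a CI job, a gate like this would typically translate a `False` result into a non-zero exit code so the pipeline stops before release.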
A
eval-harness

by affaan-m

The eval-harness skill is a formal evaluation framework for Claude Code sessions and eval-driven development. It helps you define pass/fail criteria, build capability and regression evals, and measure agent reliability before shipping prompt or workflow changes.

Model Evaluation
Favorites 0 · GitHub 156.1k
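
As a rough illustration of the pass/fail criteria mentioned for eval-harness, the sketch below treats regression evals as "every run must pass" and capability evals as "the pass rate must reach a threshold". The function names and the 0.8 threshold are assumptions made for this example, not values taken from the skill.

```python
def pass_rate(outcomes):
    """Fraction of runs that passed; an empty run list counts as 0.0."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def eval_passes(kind, outcomes, capability_threshold=0.8):
    """Apply a pass/fail rule to a list of boolean run outcomes.

    The rules and the default threshold are hypothetical:
    - "regression": all runs must pass (and there must be at least one run)
    - "capability": the pass rate must be at least capability_threshold
    """
    if kind == "regression":
        return bool(outcomes) and all(outcomes)
    if kind == "capability":
        return pass_rate(outcomes) >= capability_threshold
    raise ValueError(f"unknown eval kind: {kind!r}")


# Illustrative outcomes only: nine passing runs and one failing run.
runs = [True] * 9 + [False]
print(eval_passes("capability", runs))  # prints True: 0.9 meets the 0.8 bar
print(eval_passes("regression", runs))  # prints False: one run failed
```

Running each eval several times and comparing the pass rate before and after a prompt or workflow change is one simple way to estimate reliability under these assumptions.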
A
continuous-agent-loop

by affaan-m

continuous-agent-loop helps agents run repeatable autonomous loops with quality gates, evals, recovery steps, and clear stop rules for reliable task completion.

Agent Orchestration
Favorites 0 · GitHub 156.1k
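
The loop described for continuous-agent-loop (quality gates, recovery steps, clear stop rules) can be sketched as follows. This is an assumption-laden illustration rather than the skill's implementation: `run_loop`, its parameters, and the two stop rules (an iteration budget and a failure limit) are hypothetical.

```python
def run_loop(step, quality_gate, recover, max_iterations=5, max_failures=2):
    """Repeat `step` until the quality gate passes or a stop rule fires.

    Hypothetical stop rules:
    - stop when more than `max_failures` gate failures have occurred
    - stop when `max_iterations` iterations have been used
    Returns a (status, iterations_used) tuple.
    """
    failures = 0
    for i in range(1, max_iterations + 1):
        result = step(i)
        if quality_gate(result):
            return ("done", i)
        failures += 1
        if failures > max_failures:
            return ("stopped: too many failures", i)
        recover(result)  # recovery step before the next attempt
    return ("stopped: iteration budget exhausted", max_iterations)


# Toy usage: the "work" is the iteration number, and the gate passes at 3.
status = run_loop(
    step=lambda i: i,
    quality_gate=lambda r: r >= 3,
    recover=lambda r: None,
)
print(status)  # prints ('done', 3)
```

Making the stop rules explicit parameters keeps the loop from running indefinitely when the gate never passes.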