healthcare-eval-harness

by affaan-m

healthcare-eval-harness is a patient safety evaluation harness for healthcare app deployments. It helps teams verify CDSS accuracy, PHI exposure, data integrity, clinical workflow behavior, and integration compliance before release. Critical failures block deployment, making the harness useful for model evaluation and CI safety gates.

Stars: 156.2k
Favorites: 0
Comments: 0
Added: Apr 15, 2026
Category: Model Evaluation
Install Command
npx skills add affaan-m/everything-claude-code --skill healthcare-eval-harness
Curation Score

This skill scores 78/100, which means it is a solid listing candidate for directory users who need a healthcare deployment safety harness. The repository shows a real, triggerable workflow for evaluating EMR/EHR changes, with explicit safety gates for CDSS accuracy, PHI exposure, data integrity, clinical workflow, and integration compliance. It is useful enough to install if you want a structured healthcare test harness rather than a generic prompt, though users should note that it is test-framework oriented and not bundled with helper scripts or references.

Strengths
  • Clear healthcare-specific trigger conditions: use before EMR/EHR deployments, CDSS changes, schema changes touching patient data, and auth changes.
  • Operationally meaningful gates: critical failures block deployment, with explicit pass thresholds for safety-focused categories.
  • Good workflow orientation: the body describes ordered test categories and framework-agnostic adaptation guidance, which helps an agent execute it with less guesswork.
Cautions
  • No install command, scripts, or supporting reference files are included, so adoption requires users to translate the harness into their own test framework.
  • The repository is labeled with experimental/test signals, so users should verify it fits their CI/CD and clinical validation standards before relying on it.
Overview

What healthcare-eval-harness is

healthcare-eval-harness is a deployment safety skill for healthcare software teams that need to verify patient-facing changes before release. It focuses on model- and rules-based evaluation for clinical decision support, PHI exposure, data integrity, workflow correctness, and integration behavior. The point is not generic QA; it is to stop unsafe healthcare changes from shipping.

Who should use it

This healthcare-eval-harness skill is a good fit for engineers, QA leads, MLOps teams, and clinical informatics teams working on EMR, EHR, CDSS, or adjacent healthcare apps. It is most useful when a failure could affect dosing, triage, access control, or regulated patient data handling. If you need a lightweight prompt for a non-clinical app, this is probably too strict.

What makes it different

The repository treats safety gates as hard release criteria: critical failures block deployment instead of being logged as warnings. That makes healthcare-eval-harness useful when you need an installable evaluation pattern, not just a checklist. It also expects you to adapt the harness to your test runner, which keeps it portable across Jest, Vitest, pytest, or PHPUnit.

How to Use healthcare-eval-harness skill

Install and inspect the skill

Install with npx skills add affaan-m/everything-claude-code --skill healthcare-eval-harness. Then read skills/healthcare-eval-harness/SKILL.md first, followed by any linked guidance in the repo root if you are using the broader package. For this skill, the main value is in the evaluation rules and thresholds, so do not skip the “When to Use” and “How It Works” sections.

Turn your task into a useful prompt

A strong healthcare-eval-harness usage prompt should name the system under test, the change type, the test runner, and the safety concern. For example: “Apply healthcare-eval-harness to our EHR medication order flow in pytest. We changed dose validation and role-based access, and I need the critical gates to block release on PHI leakage or unsafe dosing failures.” That is much better than “Run the healthcare skill.”

Use the skill when a change touches patient data, clinical logic, or deployment controls. First map your feature to the five evaluation categories, then decide which ones are critical versus high priority. Next, translate the rules into your existing framework and CI pipeline, and only then run the checks. The most important decision is whether your test suite actually reflects the clinical failure mode you want to prevent.
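
The mapping step above can be sketched in code. This is a hypothetical helper, not part of the skill itself: the five category names follow the skill description, but the function and severity labels are illustrative stand-ins for whatever structure your CI pipeline uses.

```python
# Hypothetical sketch: map a change under review to the harness's five
# evaluation categories, marking touched categories as critical gates
# (block release) and the rest as high-priority checks (warn only).
CATEGORIES = (
    "cdss_accuracy",
    "phi_exposure",
    "data_integrity",
    "clinical_workflow",
    "integration_compliance",
)

def plan_gates(change_touches: set[str]) -> dict[str, str]:
    """Return a severity per category for the change under review."""
    plan = {}
    for category in CATEGORIES:
        # Anything the change directly touches becomes a hard release gate.
        plan[category] = "critical" if category in change_touches else "high"
    return plan

# Example: a dose-validation change touches CDSS accuracy and data integrity.
plan = plan_gates({"cdss_accuracy", "data_integrity"})
```

The point of writing the plan down explicitly is that the critical/high split becomes reviewable before any test runs, which is where most clinical-risk disagreements surface.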

What to read first

Start with SKILL.md for the gate structure, pass thresholds, and usage boundaries. Note that the examples use Jest only as a reference; the skill is framework-agnostic, so you should adapt the file paths, commands, and assertions to your stack. If your repo has its own test organization, mirror that structure instead of forcing a generic layout.
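
As one way to adapt a Jest-style critical gate to another stack, here is a hedged pytest-flavored sketch. The `validate_dose` function and its threshold are placeholders for your real clinical logic, not anything shipped with the skill.

```python
# Hypothetical pytest translation of a Jest-style critical gate.
# Replace validate_dose with your real dose-validation entry point.
def validate_dose(dose_mg: float, max_safe_mg: float) -> bool:
    """Placeholder for real dose-validation logic."""
    return 0 < dose_mg <= max_safe_mg

def test_critical_unsafe_dose_is_rejected():
    # Critical gate: an over-limit dose must never validate.
    assert not validate_dose(dose_mg=500.0, max_safe_mg=100.0)

def test_critical_zero_dose_is_rejected():
    # Critical gate: a zero or missing dose must never validate.
    assert not validate_dose(dose_mg=0.0, max_safe_mg=100.0)
```

Whatever runner you use, the translation should preserve the gate's intent (unsafe input can never pass), not the Jest syntax.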

healthcare-eval-harness skill FAQ

Is healthcare-eval-harness only for Jest?

No. Jest is shown as an example, but healthcare-eval-harness is meant to work with any serious test runner. The important part is preserving the critical gate logic, category order, and pass thresholds in your own tooling.

Is this the same as a normal prompt for healthcare QA?

No. A normal prompt may generate tests, but the healthcare-eval-harness skill gives you an installable evaluation model with explicit blocking behavior. That matters when you need reliable deployment decisions for healthcare application changes.

When should I not use it?

Do not use healthcare-eval-harness for low-risk content changes, marketing pages, or features that do not touch patient safety, clinical workflows, or regulated data. It can be overkill if your team does not have the discipline to maintain tests that reflect real clinical risk.

Is it beginner-friendly?

Yes, if you already know basic testing and CI concepts. It is not a tutorial on healthcare compliance, so beginners will still need domain review for thresholds, edge cases, and what counts as a critical failure.

How to Improve healthcare-eval-harness skill

Give the skill sharper clinical context

The best healthcare-eval-harness results come from specific inputs: the patient workflow, the failure you fear, the data fields involved, and the expected safe behavior. “Test the app” is weak; “test that a medication order with an allergy match blocks submission and logs the reason” is actionable.
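
The allergy-match example above can be made concrete as a test. The order model and audit log here are hypothetical; the shape of the check (block the submission and record why) is the part worth keeping.

```python
# Illustrative check for the allergy-match example: an order matching a
# patient allergy must be blocked and the reason logged. submit_order
# and the audit log are stand-ins for your real order pipeline.
def submit_order(medication: str, patient_allergies: set[str],
                 audit_log: list[str]) -> bool:
    if medication in patient_allergies:
        audit_log.append(f"blocked: allergy match for {medication}")
        return False
    audit_log.append(f"submitted: {medication}")
    return True

log: list[str] = []
accepted = submit_order("penicillin", {"penicillin"}, log)
```

A prompt written at this level of specificity gives the harness both the behavior to assert and the evidence trail to verify.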

Make the failure gates explicit

State which failures must block deployment and which can be high-priority warnings. If you want the skill to evaluate healthcare AI for Model Evaluation, say whether you care more about hallucination risk, PHI leakage, guideline adherence, or workflow breakage. The more explicit the gate, the less guesswork in the output.
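
A minimal sketch of that blocking behavior, assuming a simple check-result shape of your own design (nothing here is the skill's actual output format): critical failures fail the release decision, high-priority failures only accumulate as warnings.

```python
# Hypothetical release gate: results maps check name -> (severity, passed).
# Critical failures block the release; anything else becomes a warning.
def release_decision(results: dict[str, tuple[str, bool]]) -> tuple[bool, list[str]]:
    warnings: list[str] = []
    blocked = False
    for name, (severity, passed) in results.items():
        if passed:
            continue
        if severity == "critical":
            blocked = True
        else:
            warnings.append(name)
    return (not blocked, warnings)

# A PHI leak (critical) blocks release; a latency miss (high) only warns.
ok, warns = release_decision({
    "phi_leak_scan": ("critical", False),
    "workflow_latency": ("high", False),
})
```

Encoding the severity split in code, rather than in reviewer judgment at merge time, is what turns the gates from a checklist into a deployment control.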

Iterate against real misses

After the first run, compare the harness output to actual incidents, near misses, or clinician feedback. Tighten the assertions where unsafe behavior slipped through, and relax only the checks that create noise without improving safety. That feedback loop is what makes healthcare-eval-harness useful beyond a one-time prompt.

Ratings & Reviews

No ratings yet