Judge is a two-phase evaluation skill: it launches a meta-judge to define the rubric, then a judge sub-agent that scores the work with isolated context, evidence, and clear criteria. Use it for report-only reviews of code, writing, analysis, or Skill Authoring when you need a defensible evaluation report instead of a casual opinion.

Stars: 982
Favorites: 0
Comments: 0
Added: May 9, 2026
Category: Skill Authoring
Install Command
npx skills add NeoLabHQ/context-engineering-kit --skill judge
Curation Score

This skill scores 66/100, which makes it listable, but only as a modest, caveated option for users who want a structured judging workflow. It has enough real operational content to justify installation, but directory users should expect to do some interpretation: the repo ships no supporting scripts or reference files, and the workflow lives almost entirely in a single SKILL.md file.

Strengths
  • Clear trigger and purpose: the frontmatter states it launches a meta-judge then a judge sub-agent for evaluation in the current conversation.
  • Substantial workflow content: the skill body is long, with multiple headings and defined phases, suggesting a non-placeholder judging process.
  • Evidence-oriented design: it explicitly asks for structured scoring and citations, which improves agent reliability over a generic prompt.
Cautions
  • No support files beyond SKILL.md, so adoption depends on reading and manually applying its embedded workflow.
  • Operational specifics are still somewhat hidden in prose; directory users may need to infer exact execution steps and edge-case handling.
Overview of judge skill

What judge does

The judge skill launches a two-phase evaluation workflow: a meta-judge first defines the right rubric for the task, then a judge sub-agent scores the work with isolated context and evidence. It is best for users who need a disciplined review of code, analysis, writing, or agent output rather than a casual opinion.

Who should use judge

Use the judge skill when you want a report-only assessment with clear criteria, citations, and actionable feedback. It is a strong fit for Skill Authoring reviews, repo change review, and any task where confirmation bias or session carryover could distort judgment.

Why it is different

Unlike a generic prompt asking for “feedback,” judge builds the evaluation criteria before scoring starts. That makes the judge skill better when the artifact type is uncertain, when you need multi-dimensional scoring, or when the review must be defensible to another human.

How to Use judge skill

Install judge and inspect the entry file

Install with npx skills add NeoLabHQ/context-engineering-kit --skill judge. Start with plugins/sadd/skills/judge/SKILL.md, since it contains the workflow, inputs, and evaluation constraints that govern how judge behaves once installed.
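A minimal install-and-inspect sequence looks like the sketch below; the install command and path come from this listing, and the pager is just one way to read the file:

```sh
# Install the judge skill from the context-engineering-kit repo
npx skills add NeoLabHQ/context-engineering-kit --skill judge

# Read the entry file that defines the whole workflow
# (path taken from this listing; adjust if your checkout differs)
less plugins/sadd/skills/judge/SKILL.md
```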

Give judge a concrete evaluation target

The skill works best when you name the work and the lens. A strong prompt looks like: Judge the last draft of the launch page for clarity, SEO fit, and factual accuracy. A weak prompt like Review this leaves the meta-judge with too much guesswork.
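As a sketch, a fully specified request might look like the following; the artifact name, criteria, and constraints are illustrative, not part of the skill:

```
Judge the attached launch-page draft (v3) for:
1. Clarity for first-time visitors
2. SEO fit for our target query
3. Factual accuracy against the README

Report only; do not edit the draft. Cite specific lines as evidence.
```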

Provide the right context for the judge pipeline

Include the artifact to evaluate, the success criteria, and any hard constraints such as tone, audience, rubric priorities, or forbidden changes. If you are using judge for Skill Authoring, say so explicitly and name the target skill, because the rubric should then shift toward installation clarity, discoverability, and instruction quality.
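For a Skill Authoring review specifically, a hedged template might look like this; the skill name, path, and priority order are placeholders to replace:

```
Use the judge skill on the "changelog-writer" skill (placeholder name).
Artifact: skills/changelog-writer/SKILL.md
Success criteria, in priority order:
  1. Installation clarity
  2. Discoverability: does the trigger description match real requests?
  3. Instruction quality for the executing agent
Hard constraints: report-only; cite line numbers as evidence.
```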

Read these files first

For installation and adaptation, read SKILL.md first, then any workflow or policy files the repo includes. In this repository, the skill body itself is the main source of truth, so the fastest path is to inspect the prompt structure, the workflow phases, and the evidence requirements before you copy the pattern into your own system.

judge skill FAQ

Is judge only for code review?

No. The judge skill is meant for evaluating any produced work that benefits from a rubric: prompts, docs, analysis, agent outputs, or design decisions. The key requirement is that the result can be judged against explicit criteria with evidence.

When should I not use judge?

Do not use judge when you only need a quick subjective reaction, when there is no completed artifact yet, or when the task cannot be assessed from evidence. In those cases, a simpler prompt is usually faster and less brittle.

Is judge suitable for beginners?

Yes, if the user can name the artifact and the success criteria. Beginners usually struggle only when they ask for a judgment without context. The skill reduces that problem by forcing a meta-judge step, but it still needs a clear target.

How is judge different from a normal prompt?

A normal prompt often asks one model to both invent criteria and score the result in a single pass. The judge skill separates those roles, which usually improves consistency, reduces bias, and makes the final report easier to trust.

How to Improve judge skill

Make the evaluation target explicit

The best inputs for judge name the exact artifact, the intended audience, and the decision you are trying to support. For example: Evaluate the new onboarding doc for first-time contributors, with emphasis on setup clarity and missing prerequisites. That is better than Check my doc, because the rubric can then align with real user risk.

Add constraints that affect the rubric

If you care about line-level evidence, citation requirements, or a specific scoring scale, say so up front. Judge performs better when it knows whether to prioritize correctness, completeness, UX clarity, or policy compliance, instead of averaging them implicitly.
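For instance, a constraint block placed at the top of the prompt might read like this; the scale and priority order are illustrative:

```
Scoring scale: 0-10 per dimension; any score below 7 needs line-level evidence.
Priorities: correctness > completeness > UX clarity. Ignore style nits.
Output: a report only. Do not propose edits.
```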

Iterate after the first report

Use the first judge report to tighten the next prompt: add missing context, clarify tradeoffs, and name any section that felt under-scored. For Skill Authoring, the most useful iteration is often to ask judge to re-evaluate installation clarity, usage realism, and boundary cases separately.

Watch for common failure modes

Judge can underperform when the source work is vague, when the artifact is incomplete, or when the evaluation focus is overloaded with too many goals. If that happens, split the task into narrower passes and feed judge only the material needed for the current decision.

Ratings & Reviews

No ratings yet