Judge is a two-phase evaluation skill: it launches a meta-judge to define the rubric, then a judge sub-agent that scores the work with isolated context, evidence, and clear criteria. Use it for report-only reviews of code, writing, analysis, or Skill Authoring when you need a defensible evaluation report instead of a casual opinion.

Stars: 982
Favorites: 0
Comments: 0
Added: May 9, 2026
Category: Skill Authoring
Install Command
npx skills add NeoLabHQ/context-engineering-kit --skill judge
Curation Score

This skill scores 66/100, which makes it listable, but only as a modest, caveated option for users who want a structured judging workflow. It has enough real operational content to justify installation, but directory users should expect to do some interpretation: the repo ships no supporting scripts or reference files, and the workflow lives almost entirely in a single SKILL.md file.

Strengths
  • Clear trigger and purpose: the frontmatter states it launches a meta-judge then a judge sub-agent for evaluation in the current conversation.
  • Substantial workflow content: the skill body is long, with multiple headings and defined phases, suggesting a non-placeholder judging process.
  • Evidence-oriented design: it explicitly asks for structured scoring and citations, which improves agent reliability over a generic prompt.
Cautions
  • No support files beyond SKILL.md, so adoption depends on reading and manually applying its embedded workflow.
  • Operational specifics are still somewhat hidden in prose; directory users may need to infer exact execution steps and edge-case handling.
Overview of judge skill

What judge does

The judge skill launches a two-phase evaluation workflow: a meta-judge first defines the right rubric for the task, then a judge sub-agent scores the work with isolated context and evidence. It is best for users who need a disciplined review of code, analysis, writing, or agent output rather than a casual opinion.

Who should use judge

Use the judge skill when you want a report-only assessment with clear criteria, citations, and actionable feedback. It is a strong fit for Skill Authoring reviews, repo change review, and any task where confirmation bias or session carryover could distort judgment.

Why it is different

Unlike a generic prompt asking for “feedback,” judge builds the evaluation criteria before scoring starts. That makes the judge skill better when the artifact type is uncertain, when you need multi-dimensional scoring, or when the review must be defensible to another human.

How to Use judge skill

Install judge and inspect the entry file

Install with npx skills add NeoLabHQ/context-engineering-kit --skill judge. Start with plugins/sadd/skills/judge/SKILL.md, since it contains the workflow, inputs, and evaluation constraints that govern how judge behaves once installed.
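A minimal install-and-inspect sequence looks like the sketch below; the install command and path come from this listing, and the pager is just one way to read the file:

```sh
# Install the judge skill from the context-engineering-kit repo
npx skills add NeoLabHQ/context-engineering-kit --skill judge

# Read the entry file that defines the whole workflow
# (path taken from this listing; adjust if your checkout differs)
less plugins/sadd/skills/judge/SKILL.md
```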

Give judge a concrete evaluation target

The skill works best when you name the work and the lens. A strong prompt looks like: Judge the last draft of the launch page for clarity, SEO fit, and factual accuracy. A weak prompt like Review this leaves the meta-judge with too much guesswork.
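As a sketch, a fully specified request might look like the following; the artifact name, criteria, and constraints are illustrative, not part of the skill:

```
Judge the attached launch-page draft (v3) for:
1. Clarity for first-time visitors
2. SEO fit for our target query
3. Factual accuracy against the README

Report only; do not edit the draft. Cite specific lines as evidence.
```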

Provide the right context for the judge pipeline

Include the artifact to evaluate, the success criteria, and any hard constraints such as tone, audience, rubric priorities, or forbidden changes. If you are using judge for Skill Authoring, say so explicitly and name the target skill, because the rubric should then shift toward installation clarity, discoverability, and instruction quality.
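For a Skill Authoring review specifically, a hedged template might look like this; the skill name, path, and priority order are placeholders to replace:

```
Use the judge skill on the "changelog-writer" skill (placeholder name).
Artifact: skills/changelog-writer/SKILL.md
Success criteria, in priority order:
  1. Installation clarity
  2. Discoverability: does the trigger description match real requests?
  3. Instruction quality for the executing agent
Hard constraints: report-only; cite line numbers as evidence.
```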

Read these files first

For installation and adaptation, read SKILL.md first, then any workflow or policy files the repo includes. In this repository, the skill body itself is the main source of truth, so the fastest path is to inspect the prompt structure, the workflow phases, and the evidence requirements before you copy the pattern into your own system.

judge skill FAQ

Is judge only for code review?

No. The judge skill is meant for evaluating any produced work that benefits from a rubric: prompts, docs, analysis, agent outputs, or design decisions. The key requirement is that the result can be judged against explicit criteria with evidence.

When should I not use judge?

Do not use judge when you only need a quick subjective reaction, when there is no completed artifact yet, or when the task cannot be assessed from evidence. In those cases, a simpler prompt is usually faster and less brittle.

Is judge suitable for beginners?

Yes, if the user can name the artifact and the success criteria. Beginners usually struggle only when they ask for a judgment without context. The skill reduces that problem by forcing a meta-judge step, but it still needs a clear target.

How is judge different from a normal prompt?

A normal prompt often asks one model to both invent criteria and score the result in a single pass. The judge skill separates those roles, which usually improves consistency, reduces bias, and makes the final report easier to trust.

How to Improve judge skill

Make the evaluation target explicit

The best inputs for judge name the exact artifact, the intended audience, and the decision you are trying to support. For example: Evaluate the new onboarding doc for first-time contributors, with emphasis on setup clarity and missing prerequisites. That is better than Check my doc, because the rubric can then align with real user risk.

Add constraints that affect the rubric

If you care about line-level evidence, citation requirements, or a specific scoring scale, say so up front. Judge performs better when it knows whether to prioritize correctness, completeness, UX clarity, or policy compliance, instead of averaging them implicitly.
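For instance, a constraint block placed at the top of the prompt might read like this; the scale and priority order are illustrative:

```
Scoring scale: 0-10 per dimension; any score below 7 needs line-level evidence.
Priorities: correctness > completeness > UX clarity. Ignore style nits.
Output: a report only. Do not propose edits.
```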

Iterate after the first report

Use the first judge report to tighten the next prompt: add missing context, clarify tradeoffs, and name any section that felt under-scored. For Skill Authoring, the most useful iteration is often to ask judge to re-evaluate installation clarity, usage realism, and boundary cases separately.

Watch for common failure modes

Judge can underperform when the source work is vague, when the artifact is incomplete, or when the evaluation focus is overloaded with too many goals. If that happens, split the task into narrower passes and feed judge only the material needed for the current decision.

Ratings & Reviews

No ratings yet