do-and-judge
by NeoLabHQ
The do-and-judge skill executes a single task with a sub-agent implementation step, an independent judge, and retry-based verification until the result passes or the retry limit is reached. Use do-and-judge for Workflow Automation when you need clear acceptance criteria, isolated execution, and less guesswork than a generic prompt.
This skill scores 78/100, which means it is a solid listing candidate for directory users who want a structured execute-and-verify workflow. The repository gives enough operational detail to understand when to use it and how it behaves, though it still lacks some adoption aids that would reduce setup and usage guesswork.
- Clear trigger and workflow: it is explicitly for a single task with an implementation step, independent judging, and retries until the result passes or the retry limit is reached.
- Strong agent leverage: the meta-judge plus judge loop, parallel dispatch, and feedback retry pattern should help agents execute with less self-check bias.
- Operational structure is substantial: valid frontmatter, long body, many headings, and multiple workflow/constraint signals suggest real procedural content rather than a placeholder.
- No install command, support files, or references are provided, so users must rely on the SKILL.md alone.
- The excerpted SKILL.md shows a hard orchestration constraint and is truncated, which may make the skill feel brittle or harder to adapt in broader agent setups.
Overview of do-and-judge skill
What do-and-judge does
The do-and-judge skill is a single-task execution pattern for workflow automation: it sends work to an implementation sub-agent, creates a separate judge rubric, then retries until the result passes or the retry limit is reached. It is best for jobs where quality depends on external verification, not just one-shot generation.
Who should use it
Use do-and-judge when you need an agent to complete a bounded task with measurable acceptance criteria, such as refactors, code edits, or structured content changes. It is a good fit if you want less self-critique and more independent checking before output is accepted.
Why it stands out
The main value of the do-and-judge skill is the separation of roles: the orchestrator does not do the task itself, the implementation agent works from fresh context, and the judge evaluates against a dedicated specification. That design reduces blind spots and makes installing do-and-judge worthwhile when correctness matters more than speed alone.
How to Use do-and-judge skill
do-and-judge install and setup
Install the do-and-judge skill in your skills workspace, then read SKILL.md first because it contains the operating rules and the control flow. There are no helper scripts or support folders to lean on here, so the skill file is the source of truth.
Turn a vague request into usable input
The do-and-judge usage pattern works best when the task is narrow, testable, and has a clear finish line. Instead of asking for “improve this module,” provide:
- the exact target file or component
- the desired outcome
- constraints that must not change
- a pass/fail condition or expected behavior
Strong prompt example: Refactor the UserService class to use dependency injection without changing public method names; verify that all existing tests still pass and that constructor wiring is explicit.
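The same request can also be captured as a small structured spec before dispatch. This is only an illustrative sketch; the field names are assumptions, not part of the skill's documented interface.

```python
# Illustrative task spec for the prompt above; field names are hypothetical,
# not part of do-and-judge's documented interface.
task_spec = {
    "target": "UserService class",
    "goal": "Refactor to use dependency injection",
    "constraints": [
        "Public method names must not change",
        "Constructor wiring must be explicit",
    ],
    "pass_condition": "All existing tests still pass",
}
```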
Suggested workflow
A practical do-and-judge workflow is: define the task, let the implementation agent work in isolation, generate a judge rubric, check the result against that rubric, then retry only on concrete failures. The pattern is designed for do-and-judge for Workflow Automation, where the goal is controlled execution, not open-ended brainstorming.
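As a rough mental model of that loop, here is a minimal Python sketch. It assumes you supply your own sub-agent dispatch and judge as callables; the names and verdict shape are hypothetical, and the skill's actual control flow lives in SKILL.md.

```python
from typing import Callable

# Minimal sketch of the execute-and-verify loop described above.
# run_implementation and run_judge are hypothetical stand-ins for your own
# sub-agent dispatch; SKILL.md defines the skill's real control flow.
def do_and_judge(
    task_spec: dict,
    rubric: list[str],
    run_implementation: Callable[[dict, list[str]], str],
    run_judge: Callable[[str, list[str]], dict],
    max_retries: int = 3,
) -> str:
    feedback: list[str] = []
    for attempt in range(1, max_retries + 1):
        result = run_implementation(task_spec, feedback)  # fresh-context sub-agent
        verdict = run_judge(result, rubric)               # independent rubric check
        if verdict.get("passed"):
            return result
        feedback = verdict.get("failures", [])            # carry only concrete failures
    raise RuntimeError(f"Task did not pass after {max_retries} attempts: {feedback}")
```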
What to watch in the repo
Read SKILL.md for the process, the critical constraints, and the retry threshold. Pay special attention to the sections on task scope, context handling, and red flags, because those determine whether the orchestrator behaves correctly. If you are adapting the skill to another stack, map those rules to your own tooling before using it on a real task.
do-and-judge skill FAQ
Is do-and-judge better than a normal prompt?
For simple requests, no. A normal prompt is faster. do-and-judge is better when you need a task to be implemented and independently verified, especially if the first answer is likely to miss edge cases or drift from requirements.
Is this skill beginner-friendly?
Yes, if you can describe the task clearly. The main learning curve is not the syntax; it is providing enough task context and acceptance criteria for the judge to evaluate output without guessing.
When should I not use do-and-judge?
Do not use do-and-judge for open-ended exploration, loose ideation, or tasks where success is hard to define. It is also a poor fit when you want the orchestrator to directly edit files or run tools, because the skill is built around role separation and verification.
How does it fit into Workflow Automation?
It fits best as a control layer for single, bounded jobs inside a larger automation system. If your workflow already has explicit checks, the skill adds value by structuring the agent loop; if your workflow has no acceptance criteria, the judge step will be too vague to help.
How to Improve do-and-judge skill
Give the judge better criteria
The biggest quality gain comes from stronger evaluation input. When using do-and-judge, specify what “good” means in observable terms: required behavior, forbidden changes, coverage targets, formatting constraints, or compatibility rules. The more concrete the criteria, the less likely the judge is to approve a weak result.
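As an example of observable criteria, a rubric might look like the following sketch. The entries are illustrative; the exact shape your judge expects depends on your own setup, not on this skill's docs.

```python
# Illustrative rubric: each entry is observable and checkable, not aspirational.
rubric = [
    "All public method names of UserService are unchanged",
    "Dependencies are injected via the constructor, not instantiated inside methods",
    "The existing test suite passes with no skipped tests",
    "No new runtime dependencies were added",
]
```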
Reduce common failure modes
The most common failure is underspecified scope. If the task is too broad, the implementation agent may optimize the wrong thing and the judge will only catch it late. Another failure mode is hidden constraints, such as backward compatibility, naming conventions, or environment limits, so include those up front instead of expecting the retry loop to infer them.
Iterate on the first output
If the first run misses the mark, do not restate the same task. Feed back the judge’s exact failures, tighten the acceptance criteria, and remove ambiguous language. For do-and-judge usage, the second attempt should be narrower and more testable than the first.
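A hypothetical second-attempt input might look like the sketch below; the field names are illustrative, and the point is that the judge's exact failures travel with the retry.

```python
# Hypothetical retry input: carry the judge's exact failures forward and
# tighten the pass condition instead of restating the original task verbatim.
retry_spec = {
    "target": "UserService class",
    "goal": "Refactor to use dependency injection",
    "judge_feedback": [
        "Constructor still instantiates EmailClient directly",
        "Two tests in test_user_service.py fail after the refactor",
    ],
    "pass_condition": (
        "All tests in test_user_service.py pass and UserService methods "
        "instantiate no collaborators directly"
    ),
}
```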
Improve fit before re-running
If you are adapting do-and-judge for another repository or agent stack, align the orchestration rules with your tooling first. Check whether your setup can actually support isolated implementation, independent judging, and bounded retries; if not, simplify the pattern rather than forcing it.
