systematic-debugging

by obra

systematic-debugging is a root-cause-first debugging skill for bugs, flaky tests, build failures, and unexpected behavior. Learn the four-phase workflow, companion files, and when to use it before proposing fixes.

Stars: 121.8k
Favorites: 0
Comments: 0
Added: Mar 29, 2026
Category: Debugging
Install Command
npx skills add obra/superpowers --skill systematic-debugging
Curation Score

This skill scores 84/100, which means it is a solid directory listing candidate for users who want a reusable debugging process agents can trigger reliably. The repository provides substantial workflow content, explicit decision rules, and practical companion files, so installers can expect more leverage and less guesswork than a generic "debug this" prompt, though packaging and onboarding are a bit rough around the edges.

Strengths
  • Very strong triggerability: the description and "When to Use" section clearly tell agents to invoke it for bugs, test failures, performance issues, build failures, and other unexpected behavior.
  • Operationally concrete: the skill defines a mandatory four-phase workflow with hard rules like "NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST," reducing guesswork versus a generic debugging prompt.
  • Includes reusable companion materials and examples such as root-cause tracing, condition-based waiting, defense-in-depth validation, and a polluter-finding shell script.
Cautions
  • Some support files look like internal examples/tests (for example CREATION-LOG and test-* docs), which may make the package feel less streamlined for first-time adopters.
  • There is no install command in SKILL.md, so users must infer adoption/setup from the parent repo rather than this skill page alone.

Overview of systematic-debugging skill

The systematic-debugging skill is a structured debugging workflow for agents and developers who want fewer guess-fixes and faster root-cause resolution. Its central rule is simple: investigate before you change code. That makes it a strong fit for test failures, flaky behavior, production bugs, build issues, and integration problems where a “quick fix” is likely to hide the real problem.

Who this skill is best for

Use systematic-debugging if you:

  • keep seeing failed fixes or partial fixes
  • are debugging under time pressure
  • need an AI assistant to slow down and investigate instead of patching symptoms
  • want a repeatable process for bugs, flaky tests, and unexpected behavior

It is especially useful when the issue looks obvious but you do not yet know why it happens.

What job the skill actually helps you do

The real job is not “produce a fix.” It is:

  1. reproduce the issue,
  2. trace the cause,
  3. test one explanation at a time,
  4. only then implement a targeted fix.

That sounds slower, but it usually reduces rework, failed patches, and new bugs introduced by guessing.

What makes systematic-debugging different

Most ordinary prompts jump from symptom to solution. The systematic-debugging skill is opinionated about not doing that. The repo emphasizes an “Iron Law”-style workflow: no fixes without root-cause investigation first.

The surrounding files also make the skill more practical than a generic debugging checklist:

  • root-cause-tracing.md helps when the visible error is far from the real source
  • condition-based-waiting.md helps with flaky async tests caused by arbitrary delays
  • defense-in-depth.md helps turn one-off validation fixes into structural prevention
  • find-polluter.sh helps isolate tests that leak files or state

Best-fit issue types

systematic-debugging is a strong fit for:

  • failing tests you cannot explain yet
  • flaky tests in CI
  • bugs where previous fixes did not stick
  • errors that appear deep in the stack
  • bad state, leaked files, wrong paths, race conditions, and timing issues

When this skill is a weak fit

Skip it only when the task is not really debugging, such as:

  • straightforward feature work
  • routine refactors with no failing behavior to explain
  • purely cosmetic changes

Even then, if you are reacting to “unexpected behavior,” systematic-debugging is usually the safer starting point.

How to Use systematic-debugging skill

Install context for systematic-debugging

If you use the Skills CLI pattern shown across this ecosystem, install with:

npx skills add https://github.com/obra/superpowers --skill systematic-debugging

Then invoke it from your agent environment, or follow the process manually by reading the source files in the skill folder.

Read these files first

For a fast, high-signal systematic-debugging guide, read in this order:

  1. skills/systematic-debugging/SKILL.md
  2. skills/systematic-debugging/root-cause-tracing.md
  3. skills/systematic-debugging/condition-based-waiting.md
  4. skills/systematic-debugging/defense-in-depth.md
  5. skills/systematic-debugging/find-polluter.sh

Why this order:

  • SKILL.md gives the mandatory four-phase workflow
  • root-cause-tracing.md helps when symptoms appear far downstream
  • condition-based-waiting.md gives a concrete fix pattern for flaky async tests
  • defense-in-depth.md helps harden the final fix
  • find-polluter.sh is a practical isolator for test pollution

What inputs the skill needs from you

The quality of systematic-debugging usage depends heavily on the inputs you provide. Give the skill:

  • exact error message
  • stack trace
  • reproduction steps
  • expected vs actual behavior
  • environment details like OS, runtime, test runner, CI-only or local-only
  • recent code changes
  • whether the issue is deterministic or flaky
  • what you already tried

Without these, the model is more likely to speculate.
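
If it helps to make that checklist concrete, here is one way to structure the evidence before prompting. The shape is purely illustrative; the skill itself does not define a schema:

// Illustrative only: a typed evidence bundle makes missing fields
// obvious before you hand the bug to an agent.
interface DebugEvidence {
  errorMessage: string;     // exact text, not a paraphrase
  stackTrace?: string;
  reproSteps: string[];     // exact commands or UI steps
  expected: string;
  actual: string;
  environment: string;      // OS, runtime, test runner, CI-only or local-only
  recentChanges: string[];  // commits, dependency bumps, config edits
  deterministic: boolean;   // false means flaky
  alreadyTried: string[];
}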

Turn a rough bug report into a strong prompt

Weak prompt:

Test is failing. Help fix it.

Stronger prompt:

Use systematic-debugging on this failing test. Do not propose a fix until root cause investigation is complete. Here is the exact error, stack trace, reproduction command, recent changes, and the one behavior difference between local and CI. Identify likely root causes, suggest the minimum investigation steps, and keep hypotheses separate.

That prompt works better because it asks for investigation output before implementation.

A practical prompt template

Use this structure for systematic-debugging usage:

  • Issue: what failed
  • Reproduction: exact command or steps
  • Evidence: logs, trace, screenshots, failing assertion
  • Scope: local, CI, one machine, all environments
  • Recent changes: commits, dependency bumps, config edits
  • Constraints: cannot change API, need minimal patch, deadline, etc.
  • Request: investigate root cause first, then propose one fix

Example:

Use systematic-debugging for this Jest failure. Repro: npm test src/foo.test.ts. Error: Timeout waiting for TOOL_RESULT event after 5000ms. It fails in CI and under parallel runs, not always locally. We recently changed thread event handling. First investigate root cause, then propose one focused fix and one validation plan.

Follow the four-phase workflow in order

The repo centers on four phases. In practice, use them like this:

  1. Root cause investigation
    Read the error carefully, reproduce reliably, inspect what changed, and gather evidence.
  2. Pattern analysis
    Look for timing, environment, input-shape, state-leak, or call-chain patterns.
  3. Hypothesis testing
    Form one explanation at a time and test it. Avoid changing multiple variables at once.
  4. Implementation
    Only after evidence supports a cause, make the fix and verify it.

If Phase 1 is weak, every later step gets worse.

How to use systematic-debugging on flaky tests

This repo gives unusually practical help here. If a test relies on sleep, setTimeout, or arbitrary waits, inspect condition-based-waiting.md and condition-based-waiting-example.ts.

The key shift:

  • bad pattern: guessing how long async work takes
  • better pattern: wait for the condition that proves completion

That matters because many “random” failures are actually race conditions hidden by timing guesses.
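
A minimal TypeScript sketch of that shift, using a hypothetical waitFor helper (the repo's condition-based-waiting-example.ts is the canonical version; nothing below is copied from it):

// Bad pattern: guess how long the async work takes.
// await new Promise((resolve) => setTimeout(resolve, 5000));

// Better pattern: poll for the condition that proves completion.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// In a test: wait for the event to have arrived, not for a guess.
// await waitFor(() => receivedEvents.includes("TOOL_RESULT"));

The test now encodes what “done” means, so a slower CI machine changes how long you wait, not whether you pass.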

How to use it when the symptom is downstream

If the error appears deep in a stack or far from where bad data originated, use root-cause-tracing.md. The workflow is:

  • identify the immediate failing line
  • trace one caller up
  • keep tracing until you find where the wrong state first appeared
  • fix at the source, not only at the crash site

This is one of the most valuable parts of the systematic-debugging skill, because many bugs are symptoms of earlier invalid state.
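
A toy TypeScript illustration of the difference between the crash site and the source (all names are hypothetical):

// Crash site: the visible error, deep in the call chain.
function formatUser(name: string): string {
  return name.trim(); // TypeError: Cannot read properties of undefined
}

// One caller up: just forwards the state it was given.
function greet(user: { name?: string }): string {
  return formatUser(user.name as string);
}

// Source: the wrong state first appears here. A response without a
// "name" field becomes undefined and flows downstream untouched.
// Fix or validate here, not only at the .trim() call that crashed.
function loadUser(response: unknown): { name?: string } {
  return response as { name?: string };
}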

How to use the polluter finder

For tests that leave behind files, directories, or state, find-polluter.sh is worth reading before improvising your own loop.

Usage pattern:
./find-polluter.sh <file_or_dir_to_check> <test_pattern>

Example from the script:
./find-polluter.sh '.git' 'src/**/*.test.ts'

This is useful when the failure is caused by hidden test pollution rather than the test that visibly fails.
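
If you cannot run the script directly, the underlying idea is simple enough to sketch. The following is a hypothetical TypeScript re-creation of the approach, not the actual find-polluter.sh:

// Hypothetical sketch of the find-polluter idea: run test files one
// at a time and report the first one after which the watched path
// appears. Pass the path and the test files as arguments.
import { execSync } from "node:child_process";
import { existsSync } from "node:fs";

const [watchedPath, ...testFiles] = process.argv.slice(2);

for (const file of testFiles) {
  const existedBefore = existsSync(watchedPath);
  try {
    execSync(`npx jest ${file}`, { stdio: "ignore" });
  } catch {
    // A failing test is not necessarily the polluter; keep scanning.
  }
  if (!existedBefore && existsSync(watchedPath)) {
    console.log(`Polluter: ${file} created ${watchedPath}`);
    process.exit(0);
  }
}
console.log("No polluter found in the given test files.");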

Common workflow that gets the best results

A reliable workflow for systematic-debugging install and first use:

  1. install the skill
  2. read SKILL.md
  3. collect exact failure evidence
  4. ask the agent to investigate without fixing
  5. choose the most evidence-backed hypothesis
  6. test only that hypothesis
  7. implement one focused fix
  8. add validation or defense-in-depth if the bug came from invalid data or multiple entry paths

This prevents the most common failure mode: changing code before understanding the failure.

What not to do with this skill

Do not ask systematic-debugging to:

  • brainstorm many fixes immediately
  • rewrite large areas before reproducing the bug
  • “just make the test pass” without explanation
  • patch several suspected causes at once

Those shortcuts directly conflict with the skill’s design and reduce output quality.

systematic-debugging skill FAQ

Is systematic-debugging only for complex bugs?

No. The repo’s stance is that even simple bugs still have root causes. The skill is most valuable when a problem looks simple enough to tempt a quick patch.

How is this different from a normal debugging prompt?

A normal prompt often rewards speed and speculative fixes. systematic-debugging forces the model to separate investigation, hypothesis, and implementation. That usually produces fewer incorrect patches and better explanations.

Is systematic-debugging beginner-friendly?

Yes, if you can provide concrete evidence. The process is strict, but the steps are understandable: reproduce, inspect, trace, test one idea, then fix. Beginners may actually benefit more because it prevents random trial-and-error.

When should I not use systematic-debugging?

Do not use it as your primary pattern for:

  • feature ideation
  • architecture brainstorming
  • code generation unrelated to a failure
  • purely visual tweaks with no broken behavior to explain

Use it when something is wrong and you need the cause, not just a patch.

Does systematic-debugging help with CI-only failures?

Yes. It is well suited to CI-only or load-sensitive failures because it pushes you to compare environments, reproduce conditions, and inspect timing and state assumptions instead of guessing.

Can it help with flaky async tests?

Yes, and this repo is stronger than average on that point. condition-based-waiting.md and the example TypeScript file give a concrete path away from arbitrary waits and toward condition-based synchronization.

Does the skill include tooling or only advice?

Mostly process guidance, plus a few concrete companion files. The most practical helper is find-polluter.sh, and the condition-based waiting example is directly reusable for some TypeScript test setups.

Can I use systematic-debugging with any stack?

Mostly yes. The core method is stack-agnostic. The examples lean toward TypeScript, shell, and test workflows, but the investigation process works across languages and frameworks.

How to Improve systematic-debugging skill

Give better evidence before asking for a fix

The biggest lever is input quality. For better systematic-debugging results, include:

  • one exact reproduction command
  • one exact error block
  • one minimal failing test or file
  • what changed recently
  • whether the issue is always reproducible

That helps the skill operate on evidence instead of inference.

Ask for investigation output before implementation

A high-performing prompt explicitly blocks early fixing. For example:

Use systematic-debugging. First produce root-cause investigation findings and the top 2 hypotheses with evidence for each. Do not suggest code changes yet.

This improves answer quality because it creates a checkpoint between symptom reading and code editing.

Force one hypothesis at a time

A common failure mode is mixing several possible causes into one patch. Ask for:

  • the leading hypothesis
  • the smallest test that would falsify it
  • what result would confirm it

That keeps the workflow aligned with the skill’s intent.
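
For example, if the leading hypothesis for the Jest timeout above is “the listener is registered after the event fires,” the smallest falsifying test isolates exactly that one variable (names are illustrative, not from the repo):

import { EventEmitter } from "node:events";

// One variable, one run: does registration order alone explain the miss?
test("a listener registered after emit misses TOOL_RESULT", async () => {
  const events = new EventEmitter();
  events.emit("TOOL_RESULT", { ok: true }); // fires before anyone listens

  const outcome = await Promise.race([
    new Promise((resolve) => events.once("TOOL_RESULT", () => resolve("received"))),
    new Promise((resolve) => setTimeout(() => resolve("missed"), 100)),
  ]);

  // "missed" supports the hypothesis; "received" falsifies it, and the
  // investigation moves to the next candidate instead of patching blind.
  expect(outcome).toBe("missed");
});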

Improve prompts for flaky-test scenarios

When using systematic-debugging on flaky tests, provide:

  • pass/fail frequency
  • whether failures correlate with parallelism or CI
  • any use of sleeps, waits, retries, or polling
  • the exact event or state the test is trying to observe

This makes it much easier for the model to recognize when condition-based-waiting.md is the relevant companion pattern.

Use source-adjacent files, not just SKILL.md

If the first output feels generic, point the model at the supporting docs:

  • root-cause-tracing.md for downstream symptoms
  • condition-based-waiting.md for timing/race issues
  • defense-in-depth.md for validation strategy
  • find-polluter.sh for test pollution

The skill gets better when the agent uses the specialized companion material, not just the headline workflow.

Tighten the scope after the first pass

If the first result is broad, refine with:

  • the exact subsystem to inspect
  • the suspected boundary where bad data enters
  • the first commit where the issue appeared
  • the smallest failing reproduction

Broad debugging prompts often produce broad debugging plans. Narrow scope produces better root-cause work.

Improve the final fix, not just the diagnosis

After root cause is found, ask for:

  • the minimal fix
  • one regression test
  • one validation layer that prevents recurrence

This is where defense-in-depth.md becomes useful. If the bug came from invalid inputs or bypassable assumptions, a single patch may not be enough.
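
As a minimal sketch of what that extra validation layer can look like, assuming the invalid data enters through a single parsing boundary (types and names are illustrative):

// Layer the fix: validate where the data enters, so the invalid state
// that caused the bug can never reach the code path that crashed.
type ToolResult = { id: string; payload: unknown };

function parseToolResult(raw: unknown): ToolResult {
  if (
    typeof raw !== "object" ||
    raw === null ||
    typeof (raw as { id?: unknown }).id !== "string"
  ) {
    throw new TypeError(`Invalid TOOL_RESULT payload: ${JSON.stringify(raw)}`);
  }
  return raw as ToolResult;
}

Pair the boundary check with the regression test so both the entry point and the original crash site stay covered.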

Watch for these common failure modes

Poor systematic-debugging usage usually comes from:

  • incomplete error text
  • no reliable reproduction
  • asking for fixes too early
  • changing multiple things between test runs
  • treating the first plausible explanation as proven

If you avoid those, the skill becomes much more valuable than a generic “debug this” prompt.
