systematic-debugging

by obra

systematic-debugging is a root-cause-first debugging skill for bugs, flaky tests, build failures, and unexpected behavior. Learn the four-phase workflow, companion files, and when to use it before proposing fixes.

Stars: 121.8k
Favorites: 0
Comments: 0
Added: Mar 29, 2026
Category: Debugging
Install Command
npx skills add obra/superpowers --skill systematic-debugging
Curation Score

This skill scores 84/100, which means it is a solid directory listing candidate for users who want a reusable debugging process agents can trigger reliably. The repository provides substantial workflow content, explicit decision rules, and practical companion files, so installers can expect more leverage and less guesswork than a generic "debug this" prompt, though packaging and onboarding are a bit rough around the edges.

Strengths
  • Very strong triggerability: the description and "When to Use" section clearly tell agents to invoke it for bugs, test failures, performance issues, build failures, and other unexpected behavior.
  • Operationally concrete: the skill defines a mandatory four-phase workflow with hard rules like "NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST," reducing guesswork versus a generic debugging prompt.
  • Includes reusable companion materials and examples such as root-cause tracing, condition-based waiting, defense-in-depth validation, and a polluter-finding shell script.
Cautions
  • Some support files look like internal examples/tests (for example CREATION-LOG and test-* docs), which may make the package feel less streamlined for first-time adopters.
  • There is no install command in SKILL.md, so users must infer adoption/setup from the parent repo rather than this skill page alone.

Overview of systematic-debugging skill

The systematic-debugging skill is a structured debugging workflow for agents and developers who want fewer guess-fixes and faster root-cause resolution. Its central rule is simple: investigate before you change code. That makes it a strong fit for test failures, flaky behavior, production bugs, build issues, and integration problems where a “quick fix” is likely to hide the real problem.

Who this skill is best for

Use systematic-debugging if you:

  • keep seeing failed fixes or partial fixes
  • are debugging under time pressure
  • need an AI assistant to slow down and investigate instead of patching symptoms
  • want a repeatable process for bugs, flaky tests, and unexpected behavior

It is especially useful when the issue looks obvious but you do not yet know why it happens.

What job the skill actually helps you do

The real job is not “produce a fix.” It is:

  1. reproduce the issue,
  2. trace the cause,
  3. test one explanation at a time,
  4. only then implement a targeted fix.

That sounds slower, but it usually reduces rework, failed patches, and new bugs introduced by guessing.

What makes systematic-debugging different

Most ordinary prompts jump from symptom to solution. The systematic-debugging skill is opinionated about not doing that. The repo emphasizes an “Iron Law”-style workflow: no fixes without root-cause investigation first.

The surrounding files also make the skill more practical than a generic debugging checklist:

  • root-cause-tracing.md helps when the visible error is far from the real source
  • condition-based-waiting.md helps with flaky async tests caused by arbitrary delays
  • defense-in-depth.md helps turn one-off validation fixes into structural prevention
  • find-polluter.sh helps isolate tests that leak files or state

Best-fit issue types

systematic-debugging is a strong fit for:

  • failing tests you cannot explain yet
  • flaky tests in CI
  • bugs where previous fixes did not stick
  • errors that appear deep in the stack
  • bad state, leaked files, wrong paths, race conditions, and timing issues

When this skill is a weak fit

Skip it only when the task is not really debugging, such as:

  • straightforward feature work
  • routine refactors with no failing behavior to explain
  • purely cosmetic changes

Even then, if you are reacting to “unexpected behavior,” systematic-debugging is usually the safer starting point.

How to Use systematic-debugging skill

Install context for systematic-debugging

If you use the Skills CLI pattern shown across this ecosystem, install with:

npx skills add https://github.com/obra/superpowers --skill systematic-debugging

Then invoke it from your agent environment, or follow the process manually by reading the source files in the skill folder.

Read these files first

For a fast, high-signal systematic-debugging guide, read in this order:

  1. skills/systematic-debugging/SKILL.md
  2. skills/systematic-debugging/root-cause-tracing.md
  3. skills/systematic-debugging/condition-based-waiting.md
  4. skills/systematic-debugging/defense-in-depth.md
  5. skills/systematic-debugging/find-polluter.sh

Why this order:

  • SKILL.md gives the mandatory four-phase workflow
  • root-cause-tracing.md helps when symptoms appear far downstream
  • condition-based-waiting.md gives a concrete fix pattern for flaky async tests
  • defense-in-depth.md helps harden the final fix
  • find-polluter.sh is a practical isolator for test pollution

What inputs the skill needs from you

The quality of systematic-debugging usage depends heavily on the inputs you provide. Give the skill:

  • exact error message
  • stack trace
  • reproduction steps
  • expected vs actual behavior
  • environment details like OS, runtime, test runner, CI-only or local-only
  • recent code changes
  • whether the issue is deterministic or flaky
  • what you already tried

Without these, the model is more likely to speculate.
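
If it helps to make that checklist concrete, here is one way to structure the evidence before prompting. The shape is purely illustrative; the skill itself does not define a schema:

// Illustrative only: a typed evidence bundle makes missing fields
// obvious before you hand the bug to an agent.
interface DebugEvidence {
  errorMessage: string;     // exact text, not a paraphrase
  stackTrace?: string;
  reproSteps: string[];     // exact commands or UI steps
  expected: string;
  actual: string;
  environment: string;      // OS, runtime, test runner, CI-only or local-only
  recentChanges: string[];  // commits, dependency bumps, config edits
  deterministic: boolean;   // false means flaky
  alreadyTried: string[];
}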

Turn a rough bug report into a strong prompt

Weak prompt:

Test is failing. Help fix it.

Stronger prompt:

Use systematic-debugging on this failing test. Do not propose a fix until root cause investigation is complete. Here is the exact error, stack trace, reproduction command, recent changes, and the one behavior difference between local and CI. Identify likely root causes, suggest the minimum investigation steps, and keep hypotheses separate.

That prompt works better because it asks for investigation output before implementation.

A practical prompt template

Use this structure for systematic-debugging usage:

  • Issue: what failed
  • Reproduction: exact command or steps
  • Evidence: logs, trace, screenshots, failing assertion
  • Scope: local, CI, one machine, all environments
  • Recent changes: commits, dependency bumps, config edits
  • Constraints: cannot change API, need minimal patch, deadline, etc.
  • Request: investigate root cause first, then propose one fix

Example:

Use systematic-debugging for this Jest failure. Repro: npm test src/foo.test.ts. Error: Timeout waiting for TOOL_RESULT event after 5000ms. It fails in CI and under parallel runs, not always locally. We recently changed thread event handling. First investigate root cause, then propose one focused fix and one validation plan.

Follow the four-phase workflow in order

The repo centers on four phases. In practice, use them like this:

  1. Root cause investigation
    Read the error carefully, reproduce reliably, inspect what changed, and gather evidence.
  2. Pattern analysis
    Look for timing, environment, input-shape, state-leak, or call-chain patterns.
  3. Hypothesis testing
    Form one explanation at a time and test it. Avoid changing multiple variables at once.
  4. Implementation
    Only after evidence supports a cause, make the fix and verify it.

If Phase 1 is weak, every later step gets worse.

How to use systematic-debugging on flaky tests

This repo gives unusually practical help here. If a test relies on sleep, setTimeout, or arbitrary waits, inspect condition-based-waiting.md and condition-based-waiting-example.ts.

The key shift:

  • bad pattern: guessing how long async work takes
  • better pattern: wait for the condition that proves completion

That matters because many “random” failures are actually race conditions hidden by timing guesses.
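
A minimal TypeScript sketch of that shift, using a hypothetical waitFor helper (the repo's condition-based-waiting-example.ts is the canonical version; nothing below is copied from it):

// Bad pattern: guess how long the async work takes.
// await new Promise((resolve) => setTimeout(resolve, 5000));

// Better pattern: poll for the condition that proves completion.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// In a test: wait for the event to have arrived, not for a guess.
// await waitFor(() => receivedEvents.includes("TOOL_RESULT"));

The test now encodes what “done” means, so a slower CI machine changes how long you wait, not whether you pass.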

How to use it when the symptom is downstream

If the error appears deep in a stack or far from where bad data originated, use root-cause-tracing.md. The workflow is:

  • identify the immediate failing line
  • trace one caller up
  • keep tracing until you find where the wrong state first appeared
  • fix at the source, not only at the crash site

This is one of the most valuable parts of the systematic-debugging skill, because many bugs are symptoms of earlier invalid state.
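
A toy TypeScript illustration of the difference between the crash site and the source (all names are hypothetical):

// Crash site: the visible error, deep in the call chain.
function formatUser(name: string): string {
  return name.trim(); // TypeError: Cannot read properties of undefined
}

// One caller up: just forwards the state it was given.
function greet(user: { name?: string }): string {
  return formatUser(user.name as string);
}

// Source: the wrong state first appears here. A response without a
// "name" field becomes undefined and flows downstream untouched.
// Fix or validate here, not only at the .trim() call that crashed.
function loadUser(response: unknown): { name?: string } {
  return response as { name?: string };
}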

How to use the polluter finder

For tests that leave behind files, directories, or state, find-polluter.sh is worth reading before improvising your own loop.

Usage pattern:
./find-polluter.sh <file_or_dir_to_check> <test_pattern>

Example from the script:
./find-polluter.sh '.git' 'src/**/*.test.ts'

This is useful when the failure is caused by hidden test pollution rather than the test that visibly fails.
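
If you cannot run the script directly, the underlying idea is simple enough to sketch. The following is a hypothetical TypeScript re-creation of the approach, not the actual find-polluter.sh:

// Hypothetical sketch of the find-polluter idea: run test files one
// at a time and report the first one after which the watched path
// appears. Pass the path and the test files as arguments.
import { execSync } from "node:child_process";
import { existsSync } from "node:fs";

const [watchedPath, ...testFiles] = process.argv.slice(2);

for (const file of testFiles) {
  const existedBefore = existsSync(watchedPath);
  try {
    execSync(`npx jest ${file}`, { stdio: "ignore" });
  } catch {
    // A failing test is not necessarily the polluter; keep scanning.
  }
  if (!existedBefore && existsSync(watchedPath)) {
    console.log(`Polluter: ${file} created ${watchedPath}`);
    process.exit(0);
  }
}
console.log("No polluter found in the given test files.");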

Common workflow that gets the best results

A reliable workflow for systematic-debugging install and first use:

  1. install the skill
  2. read SKILL.md
  3. collect exact failure evidence
  4. ask the agent to investigate without fixing
  5. choose the most evidence-backed hypothesis
  6. test only that hypothesis
  7. implement one focused fix
  8. add validation or defense-in-depth if the bug came from invalid data or multiple entry paths

This prevents the most common failure mode: changing code before understanding the failure.

What not to do with this skill

Do not ask systematic-debugging to:

  • brainstorm many fixes immediately
  • rewrite large areas before reproducing the bug
  • “just make the test pass” without explanation
  • patch several suspected causes at once

Those shortcuts directly conflict with the skill’s design and reduce output quality.

systematic-debugging skill FAQ

Is systematic-debugging only for complex bugs?

No. The repo’s stance is that even simple bugs still have root causes. The skill is most valuable when a problem looks simple enough to tempt a quick patch.

How is this different from a normal debugging prompt?

A normal prompt often rewards speed and speculative fixes. systematic-debugging forces the model to separate investigation, hypothesis, and implementation. That usually produces fewer incorrect patches and better explanations.

Is systematic-debugging beginner-friendly?

Yes, if you can provide concrete evidence. The process is strict, but the steps are understandable: reproduce, inspect, trace, test one idea, then fix. Beginners may actually benefit more because it prevents random trial-and-error.

When should I not use systematic-debugging?

Do not use it as your primary pattern for:

  • feature ideation
  • architecture brainstorming
  • code generation unrelated to a failure
  • purely visual tweaks with no broken behavior to explain

Use it when something is wrong and you need the cause, not just a patch.

Does systematic-debugging help with CI-only failures?

Yes. It is well suited to CI-only or load-sensitive failures because it pushes you to compare environments, reproduce conditions, and inspect timing and state assumptions instead of guessing.

Can it help with flaky async tests?

Yes, and this repo is stronger than average on that point. condition-based-waiting.md and the example TypeScript file give a concrete path away from arbitrary waits and toward condition-based synchronization.

Does the skill include tooling or only advice?

Mostly process guidance, plus a few concrete companion files. The most practical helper is find-polluter.sh, and the condition-based waiting example is directly reusable for some TypeScript test setups.

Can I use systematic-debugging with any stack?

Mostly yes. The core method is stack-agnostic. The examples lean toward TypeScript, shell, and test workflows, but the investigation process works across languages and frameworks.

How to Improve systematic-debugging skill

Give better evidence before asking for a fix

The biggest lever is input quality. For better systematic-debugging results, include:

  • one exact reproduction command
  • one exact error block
  • one minimal failing test or file
  • what changed recently
  • whether the issue is always reproducible

That helps the skill operate on evidence instead of inference.

Ask for investigation output before implementation

A high-performing prompt explicitly blocks early fixing. For example:

Use systematic-debugging. First produce root-cause investigation findings and the top 2 hypotheses with evidence for each. Do not suggest code changes yet.

This improves answer quality because it creates a checkpoint between symptom reading and code editing.

Force one hypothesis at a time

A common failure mode is mixing several possible causes into one patch. Ask for:

  • the leading hypothesis
  • the smallest test that would falsify it
  • what result would confirm it

That keeps the workflow aligned with the skill’s intent.
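
For example, if the leading hypothesis for the Jest timeout above is “the listener is registered after the event fires,” the smallest falsifying test isolates exactly that one variable (names are illustrative, not from the repo):

import { EventEmitter } from "node:events";

// One variable, one run: does registration order alone explain the miss?
test("a listener registered after emit misses TOOL_RESULT", async () => {
  const events = new EventEmitter();
  events.emit("TOOL_RESULT", { ok: true }); // fires before anyone listens

  const outcome = await Promise.race([
    new Promise((resolve) => events.once("TOOL_RESULT", () => resolve("received"))),
    new Promise((resolve) => setTimeout(() => resolve("missed"), 100)),
  ]);

  // "missed" supports the hypothesis; "received" falsifies it, and the
  // investigation moves to the next candidate instead of patching blind.
  expect(outcome).toBe("missed");
});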

Improve prompts for flaky-test scenarios

When using systematic-debugging on flaky tests, provide:

  • pass/fail frequency
  • whether failures correlate with parallelism or CI
  • any use of sleeps, waits, retries, or polling
  • the exact event or state the test is trying to observe

This makes it much easier for the model to recognize when condition-based-waiting.md is the relevant companion pattern.

Use source-adjacent files, not just SKILL.md

If the first output feels generic, point the model at the supporting docs:

  • root-cause-tracing.md for downstream symptoms
  • condition-based-waiting.md for timing/race issues
  • defense-in-depth.md for validation strategy
  • find-polluter.sh for test pollution

The skill gets better when the agent uses the specialized companion material, not just the headline workflow.

Tighten the scope after the first pass

If the first result is broad, refine with:

  • the exact subsystem to inspect
  • the suspected boundary where bad data enters
  • the first commit where the issue appeared
  • the smallest failing reproduction

Broad debugging prompts often produce broad debugging plans. Narrow scope produces better root-cause work.

Improve the final fix, not just the diagnosis

After root cause is found, ask for:

  • the minimal fix
  • one regression test
  • one validation layer that prevents recurrence

This is where defense-in-depth.md becomes useful. If the bug came from invalid inputs or bypassable assumptions, a single patch may not be enough.
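
As a minimal sketch of what that extra validation layer can look like, assuming the invalid data enters through a single parsing boundary (types and names are illustrative):

// Layer the fix: validate where the data enters, so the invalid state
// that caused the bug can never reach the code path that crashed.
type ToolResult = { id: string; payload: unknown };

function parseToolResult(raw: unknown): ToolResult {
  if (
    typeof raw !== "object" ||
    raw === null ||
    typeof (raw as { id?: unknown }).id !== "string"
  ) {
    throw new TypeError(`Invalid TOOL_RESULT payload: ${JSON.stringify(raw)}`);
  }
  return raw as ToolResult;
}

Pair the boundary check with the regression test so both the entry point and the original crash site stay covered.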

Watch for these common failure modes

Poor systematic-debugging usage usually comes from:

  • incomplete error text
  • no reliable reproduction
  • asking for fixes too early
  • changing multiple things between test runs
  • treating the first plausible explanation as proven

If you avoid those, the skill becomes much more valuable than a generic “debug this” prompt.
