incident-runbook-templates

by wshobson

incident-runbook-templates helps teams create structured incident response runbooks with clear triage, mitigation, escalation, communication, and recovery steps for outages and other operational incidents.

Stars: 32.5k

Added: Mar 30, 2026

Category: Playbooks

Install Command

npx skills add wshobson/agents --skill incident-runbook-templates

Curation Score

This skill scores 76/100, which makes it a solid directory listing: users get substantial, ready-to-use incident runbook structure and examples, but should expect a document-heavy template skill rather than an executable workflow with tooling or automation support.

Strengths

Strong triggerability from frontmatter and usage examples, including payment outages, database incidents, and on-call onboarding scenarios.
Substantial operational content: the skill provides production-oriented runbook structure, severity levels, and step-by-step incident response coverage across detection, triage, mitigation, resolution, and communication.
Real install-decision value because the body is extensive and non-placeholder, giving users enough evidence to judge fit for documenting service-specific incident procedures.

Cautions

Adoption is template-driven only: there are no scripts, reference files, resources, or automation helpers to reduce execution guesswork beyond the written guidance.
Repository signals show limited explicit workflow/constraint markers, so agents may still need interpretation when adapting the templates to a team's exact escalation rules and systems.

Overview of incident-runbook-templates skill

What incident-runbook-templates does

The incident-runbook-templates skill helps you generate structured incident response runbooks for outages, degradations, database issues, and other operational failures. Its value is not just “write me a runbook,” but producing a repeatable format that covers impact, detection, triage, mitigation, escalation, communication, and recovery in a way an on-call engineer can use under pressure.

Who should use this skill

This skill is best for SREs, platform teams, DevOps engineers, engineering managers, and service owners who need consistent playbooks across teams. It is especially useful if you already know the systems and failure modes but need faster, more standardized documentation.

The real job-to-be-done

Most teams do not struggle to name incidents; they struggle to turn tribal knowledge into clear, 3 AM-friendly procedures. incident-runbook-templates is aimed at that gap: converting rough operational knowledge into a practical runbook with severity framing, step order, and escalation logic.

What makes this different from a generic prompt

A generic prompt can produce incident prose. This skill is better when you want a predictable incident-response shape. The source material clearly emphasizes production-style sections such as severity levels and runbook structure, which reduces prompt design work and makes outputs easier to review, compare, and operationalize.

Best-fit outcomes

Use incident-runbook-templates when you want to:

draft a first version of a service outage runbook
standardize playbooks across multiple services
document known recovery paths for recurring incidents
onboard new on-call engineers with guided procedures
turn fragmented notes into a consistent incident document

Important limitations before you install

This skill appears to be template-centric. It does not ship with scripts, validation tooling, or service-specific references in the repository path provided. That means output quality depends heavily on the operational details you supply. If your environment lacks clear alerts, owners, thresholds, or recovery steps, the runbook may look complete while remaining operationally weak.

How to Use incident-runbook-templates skill

How to install incident-runbook-templates

Install from the parent repository path:

npx skills add https://github.com/wshobson/agents --skill incident-runbook-templates

If your environment uses a different skills loader, add the skill from the same repository and then confirm the installed skill name is exactly incident-runbook-templates.

What to read first in the repository

Start with plugins/incident-response/skills/incident-runbook-templates/SKILL.md.

That file is the main asset. Based on the repository evidence, there are no extra resources/, rules/, scripts/, or companion references for this skill, so nearly all implementation guidance lives in SKILL.md.

What input the skill needs to work well

The incident-runbook-templates skill performs best when you provide:

service or system name
incident type
user and business impact
symptoms and alert sources
severity model or expected priority
known triage checks
safe mitigation actions
escalation contacts or team roles
communication expectations
exit criteria and post-incident follow-up

If you only ask for “a runbook for database issues,” expect a generic result. If you specify “Postgres primary replication lag with customer write failures and PagerDuty alerts,” the output becomes much more actionable.

Turn a rough goal into a strong incident-runbook-templates prompt

Weak prompt:
Create a runbook for payment service incidents.

Stronger prompt:
Use incident-runbook-templates to draft a runbook for payment API partial outage incidents. Include SEV classification guidance, Datadog alert triggers, first 15-minute triage steps, rollback checks for the last deploy, database dependency validation, when to page the payments team lead, customer communication points, and clear criteria for recovery and incident closure.

The stronger version improves output because it supplies scope, signal sources, time-sensitive actions, dependencies, escalation, and completion rules.

Suggested workflow

A practical workflow for using incident-runbook-templates is:

Pick one incident pattern, not a whole domain.
Gather real alert names, dashboards, owners, and mitigation constraints.
Ask the skill for a first-pass runbook using your service context.
Review with an on-call engineer who has handled the issue before.
Add environment-specific commands, links, and safety notes outside the first draft if needed.
Test the runbook against a past incident timeline.
Store the final version where responders will actually find it.

This is a better adoption path than trying to generate a full runbook library in one pass.

How the built-in structure helps during incidents

The source excerpt shows a strong focus on severity levels and a standard runbook structure. That matters because responders need ordered information under stress. A good runbook generated with this skill should move from impact and detection into initial triage, mitigation, escalation, communication, and resolution without forcing the reader to infer the workflow.

Practical prompt fields that improve output quality

Include these fields directly in your prompt when possible:

Service: checkout-api
Incident type: elevated 5xx after deployment
Primary signals: Grafana error-rate alert, synthetic checkout failures
Customer impact: 40% of card payments failing
Dependencies: Postgres, Redis, payment gateway
Known safe actions: rollback app version, drain bad pods
Do not suggest: schema changes during incident
Escalate to: on-call SRE after 15 min, payments lead for SEV1/SEV2
Communications: status page update within 20 minutes for SEV1
Recovery criteria: error rate below 1%, queue backlog normal for 30 min

These details help the skill produce a runbook that is safer and more realistic.
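As a minimal sketch, the fields above could be kept as structured data and rendered into a prompt, so every request to the skill carries the same facts in the same order. The `build_prompt` helper and the header wording are hypothetical and not part of the skill; the field values are the sample values listed above.

```python
# Hypothetical helper (not shipped with the skill): renders incident facts
# into a prompt. Field labels and values mirror the sample fields above.
SAMPLE_FIELDS = {
    "Service": "checkout-api",
    "Incident type": "elevated 5xx after deployment",
    "Primary signals": "Grafana error-rate alert, synthetic checkout failures",
    "Customer impact": "40% of card payments failing",
    "Dependencies": "Postgres, Redis, payment gateway",
    "Known safe actions": "rollback app version, drain bad pods",
    "Do not suggest": "schema changes during incident",
    "Escalate to": "on-call SRE after 15 min, payments lead for SEV1/SEV2",
    "Communications": "status page update within 20 minutes for SEV1",
    "Recovery criteria": "error rate below 1%, queue backlog normal for 30 min",
}


def build_prompt(fields: dict[str, str]) -> str:
    """Return a prompt with one 'Label: value' line per field, in input order."""
    header = "Use incident-runbook-templates to draft a runbook with these facts:"
    lines = [f"{label}: {value}" for label, value in fields.items()]
    return "\n".join([header, *lines])


if __name__ == "__main__":
    print(build_prompt(SAMPLE_FIELDS))
```

Keeping the facts in one structure makes it easier to review what was actually supplied before reading the generated runbook.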

What good incident-runbook-templates usage looks like

Good incident-runbook-templates usage is specific, bounded, and role-aware. The output should tell a responder:

how to recognize the incident
what to check first
what actions are safe
when to escalate
how to communicate
when the incident is actually resolved

If the generated document cannot answer those six questions quickly, your prompt likely lacked operational detail.
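A rough way to spot-check a draft against those six questions is a keyword scan. This is only a heuristic sketch: the keyword lists and the `unanswered_questions` name are assumptions chosen for illustration, and a keyword hit does not prove the section is operationally sound.

```python
# Heuristic sketch: the keywords per question are illustrative assumptions.
QUESTION_KEYWORDS = {
    "how to recognize the incident": ("detect", "symptom", "alert"),
    "what to check first": ("triage", "first check"),
    "what actions are safe": ("mitigat", "rollback", "safe action"),
    "when to escalate": ("escalat",),
    "how to communicate": ("communicat", "status page"),
    "when the incident is resolved": ("recovery", "resolved", "resolution"),
}


def unanswered_questions(runbook_text: str) -> list[str]:
    """Return the questions for which no keyword appears in the runbook text."""
    text = runbook_text.lower()
    return [
        question
        for question, keywords in QUESTION_KEYWORDS.items()
        if not any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    draft = "Detection: Grafana alert. Triage: check pods. Mitigation: rollback."
    for question in unanswered_questions(draft):
        print("Missing:", question)
```

Anything the scan reports as missing is a prompt gap worth filling; anything it passes still needs human review.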

Where this skill is most useful in the documentation lifecycle

Use the skill early for first drafts and standardization. It is less valuable as the final authority unless you review and enrich it with real environment details. Think of it as a runbook scaffolding tool, not a substitute for production ownership.

Common adoption blocker: false confidence

The main risk in adopting incident-runbook-templates is not technical setup. It is assuming a well-formatted runbook is a tested runbook. Because the repository appears to provide templates rather than executable checks, you still need operational review, link validation, and possibly game-day testing before relying on outputs in live incidents.

incident-runbook-templates skill FAQ

Is incident-runbook-templates good for beginners?

Yes, if a beginner is working with a more experienced operator or existing system context. The structure can help newer engineers think through severity, escalation, and recovery. But beginners cannot supply the missing operational truth on their own, so review is essential.

Is this better than asking an AI for a runbook directly?

Usually yes, if you want consistency. The incident-runbook-templates skill gives a clearer response shape than an ordinary freeform prompt. That matters when multiple teams need similar playbooks or when documents will be reviewed by incident managers.

Does incident-runbook-templates include executable automation?

Not from the repository evidence shown here. There are no support scripts or extra operational assets listed for this skill path. Treat it as a documentation-generation aid, not an automated incident response system.

What kinds of incidents fit best?

Best-fit incidents are recurring, understandable, and operationally bounded:

service outages
dependency failures
replication lag
resource exhaustion
deployment-related regressions
alert-driven degradations

Novel failures with no known response pattern are less suited to template-led generation.

When should I not use incident-runbook-templates?

Skip it when:

you need deep vendor-specific remediation logic already covered elsewhere
your team has no agreed severity or escalation model
the incident type is too broad, like “all infrastructure failures”
you need a tested operational procedure immediately without review time

In those cases, gather system knowledge first or work from an existing internal runbook base.

Can I use incident-runbook-templates to standardize playbooks across many teams?

Yes, and that is one of the stronger use cases. The skill is well suited to creating a shared format for playbooks, provided each team fills in service-specific alerts, ownership, and approved actions rather than copying a generic template verbatim.

How to Improve incident-runbook-templates skill

Give the skill operational facts, not abstract intentions

To improve incident-runbook-templates, feed it concrete signals and constraints. “Handle downtime gracefully” is too vague. “If error rate exceeds 20% after deploy, validate pod health, rollback within 10 minutes if no recovery, and page platform on-call” leads to much stronger output.
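To show why the concrete version is stronger, the rule can be written down as a small decision function, which "Handle downtime gracefully" cannot. This sketch is one possible reading of the example rule: the function name, the returned action strings, and the choice to page on-call at the 10-minute mark are assumptions.

```python
# One possible reading of the example rule above; thresholds come from the
# example text, everything else is an illustrative assumption.
ERROR_RATE_THRESHOLD = 0.20   # "error rate exceeds 20%"
ROLLBACK_DEADLINE_MIN = 10    # "rollback within 10 minutes if no recovery"


def next_action(error_rate: float, minutes_since_alert: float) -> str:
    """Return the next responder action for the current error rate and elapsed time."""
    if error_rate <= ERROR_RATE_THRESHOLD:
        return "keep monitoring"
    if minutes_since_alert < ROLLBACK_DEADLINE_MIN:
        return "validate pod health"
    return "rollback and page platform on-call"


if __name__ == "__main__":
    print(next_action(0.35, 4))
    print(next_action(0.35, 12))
```

If a statement cannot be expressed this plainly, it is probably still an intention rather than an operational fact.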

Narrow the incident scope before generation

One runbook per failure mode usually works better than one giant service runbook. Ask for “Redis connection saturation” instead of “all cache incidents.”

Narrow scopes improve triage steps, mitigation safety, and escalation clarity.

Add safety boundaries explicitly

Many incident documents fail because they suggest risky actions too early. Tell the skill what responders must not do during mitigation, such as restarting a stateful cluster, changing schemas, or clearing queues without approval. This materially improves trustworthiness.

Include your severity and escalation model

The source text already emphasizes incident severity levels. Lean into that. If your organization uses custom thresholds, provide them in the prompt so the runbook maps to real paging and communication behavior instead of generic SEV labels.

Ask for decision points, not just sections

A stronger incident-runbook-templates request asks for branching logic:

when to rollback vs continue investigation
when to escalate to another team
when customer communication becomes mandatory
when to declare recovery

This turns a static template into a more usable response aid.

Validate against a real past incident

After the first draft, test the runbook on a completed incident. Check whether the generated sequence would have:

detected the issue fast enough
prioritized the right signals
avoided unsafe actions
escalated at the correct time
defined recovery clearly

This is the fastest way to improve both the runbook and your prompts.

Improve outputs by adding role-specific context

If the document is for primary on-call, say so. If it is for incident commanders or support teams, say that instead. Different roles need different detail levels. The skill will produce better playbooks when you specify the intended operator and decision authority.

Watch for common failure modes

Common weak outputs include:

generic detection steps with no real alerts
mitigation advice that lacks safety checks
escalation sections with no timing or owner
communication guidance with no trigger threshold
recovery criteria that are too vague to verify

When you see these, revise the prompt with missing operational data rather than asking for “more detail” generically.
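For the last failure mode, a useful test is whether the recovery criteria can be evaluated mechanically. The sketch below encodes the sample criteria used earlier in this guide ("error rate below 1%, queue backlog normal for 30 min"). It assumes one reading per minute and that both conditions must hold for the whole window; the function name and sample format are hypothetical.

```python
def is_recovered(
    samples: list[tuple[float, bool]],
    window_min: int = 30,
    max_error_rate: float = 0.01,
) -> bool:
    """Check recovery over the most recent window.

    samples: one (error_rate, backlog_normal) reading per minute, oldest first.
    Returns True only if every reading in the last `window_min` minutes has an
    error rate below `max_error_rate` and a normal queue backlog.
    """
    if len(samples) < window_min:
        return False  # not enough history to claim recovery
    recent = samples[-window_min:]
    return all(rate < max_error_rate and backlog_ok for rate, backlog_ok in recent)


if __name__ == "__main__":
    healthy = [(0.005, True)] * 30
    print(is_recovered(healthy))
```

Criteria such as "service looks stable" cannot be written this way, which is a good signal that the prompt needs a threshold and a time window.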

Iterate with a fill-the-gaps pass

A practical way to improve the first draft:

generate the runbook
mark every placeholder, assumption, or vague action
add missing service facts
rerun only the weak sections
merge into a final reviewed version

This produces cleaner results than repeatedly regenerating the whole document.
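The marking step can be partly automated with a simple scan. This is a sketch under assumptions: the placeholder conventions it looks for (TODO, TBD, `<ANGLE_BRACKETS>`, `[UPPERCASE LABELS]`) are guesses at what a draft might contain, and vague actions still need a human reader to catch.

```python
import re

# Assumed placeholder conventions; adjust to whatever your drafts really use.
PLACEHOLDER_PATTERN = re.compile(r"\b(?:TODO|TBD)\b|<[^<>\n]+>|\[[A-Z_ ]+\]")


def flag_placeholders(runbook_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line containing a placeholder."""
    flagged = []
    for number, line in enumerate(runbook_text.splitlines(), start=1):
        if PLACEHOLDER_PATTERN.search(line):
            flagged.append((number, line.strip()))
    return flagged


if __name__ == "__main__":
    draft = "Page the on-call SRE\nOpen dashboard <DASHBOARD_URL>\nRollback owner: TBD"
    for number, line in flag_placeholders(draft):
        print(f"line {number}: {line}")
```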

Improve incident-runbook-templates adoption in your team

If you want incident-runbook-templates to stick, standardize a prompt intake checklist: service, failure mode, alerts, dependencies, safe actions, escalation, communication, and recovery criteria. Teams that normalize these inputs get much better, more comparable runbooks with less rework.
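A minimal sketch of such an intake check follows, assuming the checklist is captured as a dictionary before anyone prompts the skill. The key names and the `missing_fields` helper are hypothetical; the list of required items follows the checklist above.

```python
# Key names are illustrative; the items follow the intake checklist above.
REQUIRED_FIELDS = (
    "service",
    "failure_mode",
    "alerts",
    "dependencies",
    "safe_actions",
    "escalation",
    "communication",
    "recovery_criteria",
)


def missing_fields(intake: dict[str, str]) -> list[str]:
    """Return required fields that are absent or blank in the intake form."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(intake.get(name, "")).strip()
    ]


if __name__ == "__main__":
    intake = {"service": "checkout-api", "alerts": "  "}
    print(missing_fields(intake))
```

Refusing to generate until this list is empty is one simple way to keep runbooks comparable across teams.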
