by wshobson
python-observability helps you instrument Python services with structured logging, metrics, traces, correlation IDs, and bounded-cardinality patterns for production debugging and safer observability rollouts.
by wshobson
Use the slo-implementation skill to define SLIs, SLOs, error budgets, and burn-rate alerts for reliability work. It helps teams turn service goals into measurable targets with PromQL-style examples and practical guidance from SKILL.md.
by wshobson
Use the distributed-tracing skill to design and explain request tracing across microservices with Jaeger and Tempo. It covers installation basics, trace and span concepts, Kubernetes setup patterns, context propagation, and practical usage for observability and latency debugging.
by wshobson
postmortem-writing helps teams create blameless incident postmortems with timelines, root cause analysis, contributing factors, impact, and actionable follow-up items for report writing after outages or near-misses.
by wshobson
Use the on-call-handoff-patterns skill for reliable shift transitions. It helps reliability teams structure incident handoffs and capture active issues, recent changes, escalation state, and next actions.
by wshobson
incident-runbook-templates helps teams create structured incident response runbooks and operational playbooks with clear triage, mitigation, escalation, communication, and recovery steps for outages.
by mukul975
The conducting-post-incident-lessons-learned skill helps incident response teams run structured after-action reviews, build factual timelines, identify root causes, capture what worked and what failed, and turn each incident into measurable improvements with owners, deadlines, and playbook updates.