python-resilience
by wshobson
python-resilience is a guidance skill for safer Python failure handling with retries, exponential backoff, jitter, timeouts, and bounded retry windows. Use it to add practical resilience patterns around external calls and to apply tenacity-style wrappers with clearer retry rules.
This skill scores 78/100, which makes it a solid listing candidate for directory users who need Python retry, timeout, and fault-tolerance patterns. The repository evidence shows real operational content with clear triggers, core concepts, and code examples, so an agent can likely apply it with less guesswork than a generic prompt; however, adoption confidence is moderated by the lack of companion files, install guidance, or executable reference assets.
- Clear triggerability: frontmatter and 'When to Use This Skill' explicitly cover retries, timeouts, transient failures, rate limiting, and circuit breakers.
- Good practical leverage: the skill includes a Quick Start with concrete Python code using tenacity and resilience concepts like exponential backoff, jitter, and bounded retries.
- Substantive standalone guidance: SKILL.md is long, structured, and non-placeholder, with multiple headings covering concepts and workflow-oriented advice.
- No support files, scripts, or references are included, so users must translate the guidance into their own project context without executable examples.
- SKILL.md has no install command or repository/file references, which limits confidence about dependencies, setup, and how the patterns should be integrated in real codebases.
Overview of the python-resilience skill
What python-resilience does
The python-resilience skill helps you design Python code that fails more safely when dependencies are unreliable. Its focus is practical resilience patterns: retries, exponential backoff, jitter, timeouts, bounded retry windows, and fault-tolerant wrappers around external calls.
Who should install this skill
This python-resilience skill is best for developers, platform teams, and agent users working on services that talk to APIs, databases, queues, or other networked systems. It is especially useful when you need code-generation help that goes beyond “add a retry” and instead chooses sane failure-handling boundaries.
The real job to be done
Most users do not need a theory page on reliability. They need working Python patterns for questions like:
- “Should this error be retried or fail fast?”
- “What backoff strategy is safe under load?”
- “Where should timeouts live?”
- “How do I avoid infinite retries and thundering herds?”
- “What decorator or helper should wrap this external call?”
The python-resilience skill is valuable because it frames those decisions explicitly instead of treating retry logic as a one-line patch.
What makes it different from a generic prompt
A generic coding prompt may add retries everywhere or ignore the difference between transient and permanent failures. The python-resilience skill is more opinionated: retry only retryable failures, add jitter, cap attempts and total time, and treat external boundaries as the main place to add resilience logic.
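Those rules can be sketched in a few lines of standard-library Python. This is a minimal illustration, not code from the skill: `TransientError`, `call_with_retry`, and all default values are hypothetical, and the upstream examples use tenacity rather than a hand-written loop.

```python
import random
import time


# Hypothetical marker for failures worth retrying; map your real exceptions onto it.
class TransientError(Exception):
    pass


def call_with_retry(func, *, max_attempts=3, max_total_seconds=8.0,
                    base_delay=0.5, max_delay=4.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call func(), retrying only TransientError within attempt and time caps."""
    deadline = clock() + max_total_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # attempt budget exhausted
            # Full jitter: random wait in [0, min(max_delay, base_delay * 2^(attempt-1))].
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            if clock() + delay > deadline:
                raise  # waiting would exceed the total time budget
            sleep(delay)
```

Any exception other than `TransientError` propagates on the first attempt, which is the fail-fast behavior expected for permanent failures.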
What the source actually covers
The upstream skill is a single SKILL.md file, but it is substantive. It centers on:
- transient vs permanent failures
- exponential backoff
- jitter
- bounded retries
- practical Python examples using tenacity
That makes it lightweight to inspect and fast to adopt, but it also means you should expect guidance rather than a packaged library or test harness.
When this skill is a strong fit
Use python-resilience when you are:
- calling third-party APIs
- wrapping flaky network or service interactions
- building microservices or workers
- adding reliability controls to shared client utilities
- handling rate limiting, temporary outages, or intermittent timeouts
If your code is mostly pure in-process logic, this skill is probably not the highest-leverage install.
How to Use the python-resilience skill
Install context for python-resilience
Install the skill from the wshobson/agents repository:
npx skills add https://github.com/wshobson/agents --skill python-resilience
After installation, open the skill file first:
plugins/python-development/skills/python-resilience/SKILL.md
This repository area appears to contain only the skill document, so adoption is straightforward: read the skill, then apply its patterns in your own codebase.
Read this file first
Start with SKILL.md from top to bottom. The highest-value sections to review first are:
- When to Use This Skill
- Core Concepts
- Quick Start
That reading order gives you fit, design rules, and implementation shape before you ask the model to modify your code.
What input the skill needs from you
The quality of python-resilience output depends heavily on the context you provide. Before invoking the skill, gather:
- the function or service boundary being protected
- the dependency type: HTTP API, DB, queue, cache, filesystem
- the exact exceptions or failure symptoms observed
- whether failures are transient or permanent
- timeout expectations
- idempotency constraints
- max acceptable latency
- retry budget: attempts or total duration
- whether many clients may retry at once
Without these inputs, the model will often produce overly broad retry logic.
Turn a rough goal into a strong prompt
Weak prompt:
Add resilience to this Python API client.
Better prompt:
Use the python-resilience skill to refactor this Python client method.
Context:
- Dependency: third-party HTTP API
- Library: httpx
- Traffic: moderate, bursty
- Common failures: read timeout, connect timeout, occasional 429 and 503
- Permanent failures: 400, 401, 403 should not be retried
- Idempotency: safe to retry GET requests only
- SLO: fail within 8 seconds total
- Requirement: use bounded retries, exponential backoff with jitter, and clear logging
Task:
- Propose a retry policy
- Implement the wrapper/decorator
- Explain which exceptions and status codes are retryable
- Show where timeout configuration should live
This works better because it gives the skill the decision boundaries it is designed to reason about.
Ask for policy before code
A strong python-resilience workflow, in order, is:
- ask for failure classification
- ask for a retry/timeout policy
- review tradeoffs
- then generate implementation code
This avoids jumping straight into decorators before deciding what should and should not be retried.
Use the skill at external boundaries
The skill is most effective when applied to code that crosses process or network boundaries, such as:
- httpx or requests calls
- message publishing or consumption
- database queries with known transient failure modes
- cloud SDK calls
- service client methods
Do not start by wrapping large business workflows end to end. Put resilience controls around the unstable dependency first.
What good python-resilience output should include
When the skill is working well, the output should usually include:
- explicit transient vs permanent failure rules
- finite retry limits
- exponential backoff
- jitter
- timeout placement
- examples using Python tooling such as tenacity
- notes on idempotency and side effects
If the result only says “retry 3 times,” ask for a more explicit retry policy.
Practical implementation pattern to request
The source skill includes a tenacity-based quick start. In practice, that means you can ask for patterns like:
- a decorator around a service client method
- a wrapper helper for all outbound HTTP calls
- separate read vs write retry policies
- retries filtered by exception type or status code
For mutation operations, ask the model to justify retry safety. Reliability patterns that ignore idempotency can create duplicate side effects.
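As a rough picture of what "a decorator with separate read vs write policies" might look like, here is a standard-library sketch. `RetryPolicy`, `READ_POLICY`, `WRITE_POLICY`, and `with_retry` are hypothetical names with illustrative values, and the skill's own quick start uses tenacity instead.

```python
import functools
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    retry_on: tuple          # exception classes treated as transient
    base_delay: float = 0.5  # seconds; upper bound of the first jittered wait


# Illustrative values only: reads retry, writes get a single attempt.
READ_POLICY = RetryPolicy(max_attempts=3, retry_on=(TimeoutError, ConnectionError))
WRITE_POLICY = RetryPolicy(max_attempts=1, retry_on=())


def with_retry(policy, sleep=time.sleep):
    """Decorator factory that applies `policy` to one client method."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, policy.max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except policy.retry_on:
                    if attempt == policy.max_attempts:
                        raise
                    # Exponential backoff with full jitter.
                    sleep(random.uniform(0, policy.base_delay * 2 ** (attempt - 1)))
        return wrapper
    return decorate
```

Keeping the write policy at one attempt makes retrying a mutation an explicit decision, for example after adding an idempotency key, rather than a default.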
Common mistakes during python-resilience usage
Watch for these issues in generated code:
- retrying authentication or validation failures
- no timeout, only retries
- retry loops with no total budget
- backoff without jitter
- wrapping too much code, hiding root cause
- retrying non-idempotent writes by default
These are the practical blockers that matter more than code style.
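The "no timeout, only retries" mistake is the easiest to show in code. The sketch below assumes an asyncio-based client; `call_with_timeout`, `make_call`, and the default values are hypothetical. It bounds each attempt with `asyncio.wait_for` and bounds the number of attempts with the loop.

```python
import asyncio
import random


async def call_with_timeout(make_call, *, per_attempt_timeout=2.0,
                            max_attempts=3, base_delay=0.5):
    """Await make_call() with a timeout on every attempt and a finite attempt cap."""
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the call and raises TimeoutError when time runs out.
            return await asyncio.wait_for(make_call(), timeout=per_attempt_timeout)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise
            # Jittered exponential wait before the next attempt.
            await asyncio.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Here `make_call` is a zero-argument function returning a fresh coroutine, because a coroutine object cannot be awaited twice.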
A useful workflow in your repo
For best results, provide the skill with:
- the current client function
- the exception classes you see in logs
- sample status codes
- your latency or retry budget
- one or two representative failure traces
Then ask for:
- policy summary
- code changes
- tests you should add
- monitoring fields to log
That sequence usually produces output that is closer to adoption-ready than asking for code alone.
python-resilience skill FAQ
Is python-resilience only for web APIs?
No. The python-resilience skill is broadly about unreliable dependencies. HTTP calls are the easiest example, but the same reasoning applies to queues, databases, caches, and cloud services where transient failures are common.
Is this a library or a guidance skill?
It is a guidance skill, not a standalone Python package. It teaches patterns and shows implementation style, including tenacity-based examples, but you still apply those patterns inside your own codebase.
When should I not use python-resilience?
Do not use python-resilience as a default layer over every function. It is a poor fit for:
- pure CPU-bound local logic
- errors that are clearly permanent
- workflows where retries would duplicate unsafe side effects
- systems where latency budgets are too tight for retry windows
In those cases, fail fast or redesign the integration instead.
Is python-resilience suitable for beginners?
Yes, if you already know basic Python and exception handling. The skill’s core ideas are accessible, but the user still needs to supply business context like retry safety, timeout budgets, and which failures are acceptable to retry.
How is this better than asking an LLM for retries?
The advantage of python-resilience is not just code generation. It helps the model reason about failure categories, bounded retries, and backoff behavior. Generic prompts often miss those boundaries and produce retry logic that is unsafe or noisy under load.
Does python-resilience choose the exact retry policy for me?
Not automatically. It provides a strong pattern vocabulary, but the best policy depends on your dependency behavior, latency requirements, and idempotency rules. You should expect to tune attempts, wait ranges, and retry filters to your environment.
How to Improve python-resilience skill results
Give the skill better failure classification
The fastest way to improve python-resilience results is to specify which failures are transient and which are permanent. For example:
- transient: ConnectTimeout, ReadTimeout, 503, some 429
- permanent: 400, 401, 403, schema errors, bad credentials
This single distinction usually determines whether the generated policy is safe.
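One way to make that classification concrete is a single function that every retry decision goes through. The sketch below uses built-in exceptions as stand-ins for client timeouts; `HTTPStatusError`, `FailureKind`, and `classify` are hypothetical names. It also treats every 429 as transient, which is a simplification of "some 429".

```python
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


# Hypothetical stand-in for an HTTP client's error-response exception.
class HTTPStatusError(Exception):
    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


TRANSIENT_STATUS = frozenset({429, 503})


def classify(exc: BaseException) -> FailureKind:
    """Classify a failure; anything unrecognized is permanent, so it fails fast."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return FailureKind.TRANSIENT
    if isinstance(exc, HTTPStatusError) and exc.status_code in TRANSIENT_STATUS:
        return FailureKind.TRANSIENT
    return FailureKind.PERMANENT
```

Defaulting unknown failures to permanent is a deliberate assumption here: an allow-list of retryable cases is usually easier to audit than a deny-list.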
Provide latency and retry budgets
If you do not provide a budget, the model may choose arbitrary retry counts. State limits like:
- max 3 attempts
- total retry window under 8 seconds
- single request timeout 2 seconds
- background job can tolerate 30 seconds total
These constraints produce more realistic code.
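Before handing those numbers to the model, it helps to check that they are mutually consistent. The helper below is a hypothetical sketch: it computes an upper bound on elapsed time when every attempt times out and every backoff wait reaches its maximum, assuming a backoff base of 0.5 seconds and a cap of 4 seconds (neither value comes from the source).

```python
def worst_case_seconds(max_attempts, per_call_timeout, base_delay, max_delay):
    """Upper bound on elapsed time: every attempt times out, every wait is maximal."""
    # There is one wait between consecutive attempts, so max_attempts - 1 waits.
    waits = sum(min(max_delay, base_delay * 2 ** i) for i in range(max_attempts - 1))
    return max_attempts * per_call_timeout + waits


total = worst_case_seconds(max_attempts=3, per_call_timeout=2.0,
                           base_delay=0.5, max_delay=4.0)
print(total)  # 7.5, which fits inside an 8 second window
```

With 3 attempts of 2 seconds each plus waits of at most 0.5 and 1.0 seconds, the bound is 6 + 1.5 = 7.5 seconds. Raising the attempt count to 4 under the same settings would exceed 8 seconds.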
Tell it whether operations are idempotent
Many resilience mistakes come from missing side-effect context. Improve python-resilience usage by labeling operations as:
- safe to retry
- safe only with idempotency key
- unsafe to retry automatically
That changes both the decorator design and the exception filters.
Ask for explicit non-retry rules
Do not only ask “what should be retried?” Also ask:
- what should fail fast?
- what should be surfaced to callers immediately?
- what should be logged but not retried?
This makes the output much more production-usable.
Request observability with the implementation
Good python-resilience output should not stop at decorators. Ask the model to add:
- attempt count in logs
- exception type
- elapsed time
- final failure reason
- retry exhaustion message
Without this, your resilience layer may hide why calls are failing.
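A minimal sketch of those log fields, using only the standard `logging` module, might look like this. The logger name `resilience`, the function `call_with_logging`, and the message format are assumptions for illustration.

```python
import logging
import random
import time

logger = logging.getLogger("resilience")  # hypothetical logger name


def call_with_logging(func, *, max_attempts=3, base_delay=0.5,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Retry func() and log attempt count, exception type, and elapsed time."""
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            elapsed = time.monotonic() - start
            fields = "attempt=%d/%d error=%s elapsed=%.3fs"
            args = (attempt, max_attempts, type(exc).__name__, elapsed)
            if attempt == max_attempts:
                # Final failure: record why the call gave up, then re-raise.
                logger.error("retries exhausted " + fields + " reason=%s", *args, exc)
                raise
            logger.warning("retrying " + fields, *args)
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Re-raising the original exception after the final log line keeps the root cause visible to callers instead of replacing it with a generic retry error.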
Iterate after the first draft
After the first output, refine with concrete feedback such as:
- “Do not retry POST requests.”
- “Cap total time, not just attempts.”
- “Handle 429 differently from 500.”
- “Use jitter to avoid synchronized retries.”
- “Separate timeout config from retry config.”
This kind of iteration materially improves the implementation.
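As an example of what "handle 429 differently from 500" could mean in code, the sketch below prefers the server's `Retry-After` header on a 429 and falls back to jittered exponential backoff otherwise. `retry_delay` and its defaults are hypothetical. It reads the header from a plain dict with that exact key and handles only the integer-seconds form of `Retry-After`, not the HTTP-date form.

```python
import random


def retry_delay(status_code, headers, attempt, base_delay=0.5, max_delay=30.0):
    """Pick a wait in seconds before retry number `attempt` (1-based)."""
    if status_code == 429:
        try:
            seconds = int(headers.get("Retry-After", ""))
        except ValueError:
            seconds = None  # header missing or not in integer-seconds form
        if seconds is not None and seconds >= 0:
            # Honor the server's hint, but never wait longer than max_delay.
            return min(max_delay, seconds)
    # Default: exponential backoff with full jitter.
    return random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
```

Capping the server-provided value keeps a very large `Retry-After` from silently exceeding your total retry window.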
Test the failure paths the skill proposes
Ask the model to generate tests for:
- transient exception retries
- permanent exception fast-fail behavior
- retry exhaustion
- timeout enforcement
- backoff policy boundaries
Resilience code that is not tested is easy to misconfigure and hard to trust.
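The first three items on that list can be tested without any network access by using a fake callable that fails a fixed number of times. This is an illustrative sketch: `retry_call` is a deliberately tiny hypothetical helper standing in for your real wrapper, and `Flaky` is a hypothetical test double.

```python
import unittest


def retry_call(func, max_attempts=3, retry_on=(TimeoutError,)):
    """Minimal retry helper under test (no sleeping, to keep tests fast)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise


class Flaky:
    """Callable that raises `exc` for the first `failures` calls, then returns 'ok'."""

    def __init__(self, failures, exc):
        self.failures, self.exc, self.calls = failures, exc, 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise self.exc
        return "ok"


class RetryTests(unittest.TestCase):
    def test_transient_failure_is_retried(self):
        op = Flaky(2, TimeoutError())
        self.assertEqual(retry_call(op), "ok")
        self.assertEqual(op.calls, 3)

    def test_permanent_failure_fails_fast(self):
        op = Flaky(5, ValueError())
        with self.assertRaises(ValueError):
            retry_call(op)
        self.assertEqual(op.calls, 1)

    def test_retry_exhaustion_reraises(self):
        op = Flaky(5, TimeoutError())
        with self.assertRaises(TimeoutError):
            retry_call(op)
        self.assertEqual(op.calls, 3)
```

Saved as a module, these run with `python -m unittest`. Asserting on the call count, not just the outcome, is what catches a policy that retries too often or not at all.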
Improve the skill output with real traces
If you have logs or sample stack traces, include them. Real failure evidence helps python-resilience recommend narrower exception filters and more believable timeout/backoff settings than abstract prompts do.
Keep the abstraction level modest
A common failure mode is asking the skill to design a full resilience framework when you only need a reliable client wrapper. Start smaller:
- one function
- one dependency
- one retry policy
Then expand after the pattern proves useful.
Use python-resilience as a review lens
Even if you already wrote the code, python-resilience is useful as a reviewer prompt. Ask it to inspect existing retry logic for:
- unbounded retries
- missing jitter
- bad timeout placement
- retrying permanent failures
- hidden side-effect risks
That review-oriented use case is often the highest-value way to apply the skill in mature codebases.
