
python-resilience

by wshobson

python-resilience is a guidance skill for safer Python failure handling with retries, exponential backoff, jitter, timeouts, and bounded retry windows. Use it to install practical resilience patterns for external calls and apply tenacity-style wrappers with clearer retry rules.

Stars: 32.6k
Favorites: 0
Comments: 0
Added: Mar 30, 2026
Category: Reliability
Install Command
npx skills add wshobson/agents --skill python-resilience
Curation Score

This skill scores 78/100, which makes it a solid listing candidate for directory users who need Python retry, timeout, and fault-tolerance patterns. The repository evidence shows real operational content with clear triggers, core concepts, and code examples, so an agent can likely apply it with less guesswork than a generic prompt; however, adoption confidence is moderated by the lack of companion files, install guidance, or executable reference assets.

Strengths
  • Clear triggerability: frontmatter and 'When to Use This Skill' explicitly cover retries, timeouts, transient failures, rate limiting, and circuit breakers.
  • Good practical leverage: the skill includes a Quick Start with concrete Python code using tenacity and resilience concepts like exponential backoff, jitter, and bounded retries.
  • Substantive standalone guidance: SKILL.md is long, structured, and non-placeholder, with multiple headings covering concepts and workflow-oriented advice.
Cautions
  • No support files, scripts, or references are included, so users must translate the guidance into their own project context without executable examples.
  • SKILL.md has no install command or repository/file references, which limits confidence about dependencies, setup, and how the patterns should be integrated in real codebases.
Overview of python-resilience skill

What python-resilience does

The python-resilience skill helps you design Python code that fails more safely when dependencies are unreliable. Its focus is practical resilience patterns: retries, exponential backoff, jitter, timeouts, bounded retry windows, and fault-tolerant wrappers around external calls.

Who should install this skill

This python-resilience skill is best for developers, platform teams, and agent users working on services that talk to APIs, databases, queues, or other networked systems. It is especially useful when you need code-generation help that goes beyond “add a retry” and instead chooses sane failure-handling boundaries.

The real job to be done

Most users do not need a theory page on reliability. They need working Python patterns for questions like:

  • “Should this error be retried or fail fast?”
  • “What backoff strategy is safe under load?”
  • “Where should timeouts live?”
  • “How do I avoid infinite retries and thundering herds?”
  • “What decorator or helper should wrap this external call?”

The python-resilience skill is valuable because it frames those decisions explicitly instead of treating retry logic as a one-line patch.

What makes it different from a generic prompt

A generic coding prompt may add retries everywhere or ignore the difference between transient and permanent failures. The python-resilience skill is more opinionated: retry only retryable failures, add jitter, cap both attempts and total time, and treat external boundaries as the main place to add resilience logic.

What the source actually covers

The upstream skill is a single SKILL.md file, but it is substantive. It centers on:

  • transient vs permanent failures
  • exponential backoff
  • jitter
  • bounded retries
  • practical Python examples using tenacity

That makes it lightweight to inspect and fast to adopt, but it also means you should expect guidance rather than a packaged library or test harness.
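
The concepts listed above can be combined in a few lines. The following is a minimal stdlib-only sketch, not the code from SKILL.md (whose examples use tenacity). The names TransientError, PermanentError, backoff_delay, and retry_call are hypothetical, and the base delay and cap are illustrative values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example a timeout)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that will not succeed on retry."""


def backoff_delay(attempt, base=0.5, cap=8.0, rng=random.random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def retry_call(func, max_attempts=3, sleep=time.sleep, rng=random.random):
    """Call func(), retrying only TransientError, at most max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # attempt budget exhausted: surface the last error
            sleep(backoff_delay(attempt, rng=rng))
        # PermanentError and any other exception propagate immediately.
```

The sleep and rng parameters are injectable only so the behavior can be checked without real waiting. In a project that already depends on tenacity, its decorator would normally replace a hand-written loop like this one.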

When this skill is a strong fit

Use python-resilience when you are:

  • calling third-party APIs
  • wrapping flaky network or service interactions
  • building microservices or workers
  • adding reliability controls to shared client utilities
  • handling rate limiting, temporary outages, or intermittent timeouts

If your code is mostly pure in-process logic, this skill is probably not the highest-leverage install.

How to Use python-resilience skill

Install context for python-resilience

Install the skill from the wshobson/agents repository:

npx skills add https://github.com/wshobson/agents --skill python-resilience

After installation, open the skill file first:

  • plugins/python-development/skills/python-resilience/SKILL.md

This repository area appears to contain only the skill document, so adoption is straightforward: read the skill, then apply its patterns in your own codebase.

Read this file first

Start with SKILL.md from top to bottom. The highest-value sections to review first are:

  1. When to Use This Skill
  2. Core Concepts
  3. Quick Start

That reading order gives you fit, design rules, and implementation shape before you ask the model to modify your code.

What input the skill needs from you

The quality of python-resilience output depends heavily on the context you provide. Before invoking the skill, gather:

  • the function or service boundary being protected
  • the dependency type: HTTP API, DB, queue, cache, filesystem
  • the exact exceptions or failure symptoms observed
  • whether failures are transient or permanent
  • timeout expectations
  • idempotency constraints
  • max acceptable latency
  • retry budget: attempts or total duration
  • whether many clients may retry at once

Without these inputs, the model will often produce overly broad retry logic.

Turn a rough goal into a strong prompt

Weak prompt:

Add resilience to this Python API client.

Better prompt:

Use the python-resilience skill to refactor this Python client method.

Context:
- Dependency: third-party HTTP API
- Library: httpx
- Traffic: moderate, bursty
- Common failures: read timeout, connect timeout, occasional 429 and 503
- Permanent failures: 400, 401, 403 should not be retried
- Idempotency: safe to retry GET requests only
- SLO: fail within 8 seconds total
- Requirement: use bounded retries, exponential backoff with jitter, and clear logging

Task:
- Propose a retry policy
- Implement the wrapper/decorator
- Explain which exceptions and status codes are retryable
- Show where timeout configuration should live

This works better because it gives the skill the decision boundaries it is designed to reason about.
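
As an illustration of the decision boundaries in that prompt, the retry rule could be expressed as a small predicate. This is a sketch under the prompt's stated assumptions (429 and 503 retryable, 400/401/403 permanent, GET only); the constant and function names are hypothetical, and timeouts raised as exceptions are outside its scope.

```python
RETRYABLE_STATUS = frozenset({429, 503})
PERMANENT_STATUS = frozenset({400, 401, 403})
IDEMPOTENT_METHODS = frozenset({"GET"})


def should_retry(method, status_code):
    """Return True only when both the method and the status allow a retry."""
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # non-idempotent calls are never retried automatically
    if status_code in PERMANENT_STATUS:
        return False  # the caller must fix the request or credentials
    return status_code in RETRYABLE_STATUS  # unknown codes default to no retry
```

Defaulting unknown status codes to "do not retry" is a deliberate choice in this sketch: it keeps the retry set an explicit allowlist.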

Ask for policy before code

A strong workflow with python-resilience is:

  1. ask for failure classification
  2. ask for a retry/timeout policy
  3. review tradeoffs
  4. then generate implementation code

This avoids jumping straight into decorators before deciding what should and should not be retried.

Use the skill at external boundaries

The skill is most effective when applied to code that crosses process or network boundaries, such as:

  • httpx or requests calls
  • message publishing or consumption
  • database queries with known transient failure modes
  • cloud SDK calls
  • service client methods

Do not start by wrapping large business workflows end to end. Put resilience controls around the unstable dependency first.

What good python-resilience output should include

When the skill is working well, the output should usually include:

  • explicit transient vs permanent failure rules
  • finite retry limits
  • exponential backoff
  • jitter
  • timeout placement
  • examples using Python tooling such as tenacity
  • notes on idempotency and side effects

If the result only says “retry 3 times,” ask for a more explicit retry policy.

Practical implementation pattern to request

The source skill includes a tenacity-based quick start. In practice, that means you can ask for patterns like:

  • a decorator around a service client method
  • a wrapper helper for all outbound HTTP calls
  • separate read vs write retry policies
  • retries filtered by exception type or status code

For mutation operations, ask the model to justify retry safety. Reliability patterns that ignore idempotency can create duplicate side effects.

Common mistakes during python-resilience usage

Watch for these issues in generated code:

  • retrying authentication or validation failures
  • no timeout, only retries
  • retry loops with no total budget
  • backoff without jitter
  • wrapping too much code, hiding root cause
  • retrying non-idempotent writes by default

These are the practical blockers that matter more than code style.
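
One of the mistakes above, a retry loop with no total budget, is easy to show in code. The sketch below bounds retries by wall-clock time rather than attempt count. The function name and parameters are hypothetical, the fixed delay is used only for brevity, and the helper does not enforce a per-attempt timeout; that is assumed to be configured on the client being called.

```python
import time


def call_with_deadline(func, budget_seconds, delay_seconds=0.5,
                       retry_on=(TimeoutError,), clock=time.monotonic,
                       sleep=time.sleep):
    """Retry func() on retry_on errors until the total time budget is used up."""
    deadline = clock() + budget_seconds
    while True:
        try:
            return func()
        except retry_on:
            remaining = deadline - clock()
            if remaining <= delay_seconds:
                raise  # no room left for another wait plus attempt
            sleep(delay_seconds)
```

Using a monotonic clock avoids surprises when the system clock is adjusted while the loop is running.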

A useful workflow in your repo

For best results, provide the skill with:

  • the current client function
  • the exception classes you see in logs
  • sample status codes
  • your latency or retry budget
  • one or two representative failure traces

Then ask for:

  1. policy summary
  2. code changes
  3. tests you should add
  4. monitoring fields to log

That sequence usually produces better adoption-ready output than asking for code alone.

python-resilience skill FAQ

Is python-resilience only for web APIs?

No. The python-resilience skill is broadly about unreliable dependencies. HTTP calls are the easiest example, but the same reasoning applies to queues, databases, caches, and cloud services where transient failures are common.

Is this a library or a guidance skill?

It is a guidance skill, not a standalone Python package. It teaches patterns and shows implementation style, including tenacity-based examples, but you still apply those patterns inside your own codebase.

When should I not use python-resilience?

Do not use python-resilience as a default layer over every function. It is a poor fit for:

  • pure CPU-bound local logic
  • errors that are clearly permanent
  • workflows where retries would duplicate unsafe side effects
  • systems where latency budgets are too tight for retry windows

In those cases, fail fast or redesign the integration instead.

Is python-resilience suitable for beginners?

Yes, if you already know basic Python and exception handling. The skill’s core ideas are accessible, but the user still needs to supply business context like retry safety, timeout budgets, and which failures are acceptable to retry.

How is this better than asking an LLM for retries?

The advantage of python-resilience is not just code generation. It helps the model reason about failure categories, bounded retries, and backoff behavior. Generic prompts often miss those boundaries and produce retry logic that is unsafe or noisy under load.

Does python-resilience choose the exact retry policy for me?

Not automatically. It provides a strong pattern vocabulary, but the best policy depends on your dependency behavior, latency requirements, and idempotency rules. You should expect to tune attempts, wait ranges, and retry filters to your environment.

How to Improve python-resilience skill

Give the skill better failure classification

The fastest way to improve python-resilience results is to specify which failures are transient and which are permanent. For example:

  • transient: ConnectTimeout, ReadTimeout, 503, some 429
  • permanent: 400, 401, 403, schema errors, bad credentials

This single distinction usually determines whether the generated policy is safe.

Provide latency and retry budgets

If you do not provide a budget, the model may choose arbitrary retry counts. State limits like:

  • max 3 attempts
  • total retry window under 8 seconds
  • single request timeout 2 seconds
  • background job can tolerate 30 seconds total

These constraints produce more realistic code.
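
It can help to check that the stated limits are consistent with each other. The following sketch computes a worst-case wall time, assuming every attempt runs to its timeout and every wait hits its exponential ceiling. The function name and the 0.5 s base and 2 s cap are illustrative assumptions, not values from the skill.

```python
def worst_case_seconds(max_attempts, per_attempt_timeout, base_delay, max_delay):
    """Upper bound on wall time: every attempt times out, every wait is maximal."""
    waits = sum(min(max_delay, base_delay * (2 ** i))
                for i in range(max_attempts - 1))
    return max_attempts * per_attempt_timeout + waits


# 3 attempts x 2 s timeout, plus waits of 0.5 s and 1.0 s.
total = worst_case_seconds(max_attempts=3, per_attempt_timeout=2.0,
                           base_delay=0.5, max_delay=2.0)
print(total)  # 7.5
```

With these numbers the policy fits inside an 8-second window; a fourth attempt would push the worst case to 11.5 seconds and break the budget.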

Tell it whether operations are idempotent

Many resilience mistakes come from missing side-effect context. Get better results from python-resilience by labeling operations as:

  • safe to retry
  • safe only with idempotency key
  • unsafe to retry automatically

That changes both the decorator design and the exception filters.
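
One way to carry those three labels into code is an enum that gates the attempt budget. This is a hypothetical sketch: RetrySafety and allowed_attempts are illustrative names, and how an idempotency key is actually sent depends on the API being called.

```python
from enum import Enum


class RetrySafety(Enum):
    SAFE = "safe to retry"
    NEEDS_IDEMPOTENCY_KEY = "safe only with idempotency key"
    UNSAFE = "unsafe to retry automatically"


def allowed_attempts(safety, idempotency_key=None, max_attempts=3):
    """Return how many attempts a call may make given its retry-safety label."""
    if safety is RetrySafety.SAFE:
        return max_attempts
    if safety is RetrySafety.NEEDS_IDEMPOTENCY_KEY and idempotency_key:
        return max_attempts
    return 1  # a single attempt: never retried automatically
```

A missing key degrades to a single attempt instead of raising, which keeps the unsafe path quiet but safe; a stricter variant could raise an error so the omission is noticed.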

Ask for explicit non-retry rules

Do not only ask “what should be retried?” Also ask:

  • what should fail fast?
  • what should be surfaced to callers immediately?
  • what should be logged but not retried?

This makes the output much more production-usable.

Request observability with the implementation

Good python-resilience output should not stop at decorators. Ask the model to add:

  • attempt count in logs
  • exception type
  • elapsed time
  • final failure reason
  • retry exhaustion message

Without this, your resilience layer may hide why calls are failing.
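
The fields above can be attached to each log record with the standard logging module. The sketch below is one possible shape, with a hypothetical helper name and logger name. Waiting between attempts is left out to keep the focus on the log fields; a real wrapper would add backoff.

```python
import logging
import time

logger = logging.getLogger("resilience")


def call_with_logging(func, max_attempts=3, retry_on=(TimeoutError,),
                      clock=time.monotonic):
    """Retry func() and log attempt count, exception type, and elapsed time."""
    start = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            fields = {
                "attempt": attempt,
                "max_attempts": max_attempts,
                "exception_type": type(exc).__name__,
                "elapsed_s": round(clock() - start, 3),
            }
            if attempt == max_attempts:
                # Final failure: state clearly that the retry budget is spent.
                logger.error("retries exhausted: %s", exc, extra=fields)
                raise
            logger.warning("retrying after failure: %s", exc, extra=fields)
```

Passing the fields through extra makes them attributes of the log record, so a structured formatter or handler can emit them as separate keys.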

Iterate after the first draft

After the first output, refine with concrete feedback such as:

  • “Do not retry POST requests.”
  • “Cap total time, not just attempts.”
  • “Handle 429 differently from 500.”
  • “Use jitter to avoid synchronized retries.”
  • “Separate timeout config from retry config.”

This kind of iteration materially improves the implementation.
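
As an example of what "handle 429 differently from 500" might produce, the sketch below honors a numeric Retry-After header on 429 responses and falls back to a capped exponential ceiling otherwise. It assumes headers is a plain dict with the exact key "Retry-After" (real HTTP clients usually offer case-insensitive lookup), it ignores the HTTP-date form of the header, and it returns the ceiling only, so jitter would be applied on top for the backoff case. The function name and defaults are hypothetical.

```python
def next_delay(status_code, headers, attempt, base=0.5, cap=8.0):
    """Pick the wait (in seconds) before the next attempt."""
    backoff = min(cap, base * (2 ** attempt))
    if status_code == 429:
        raw = headers.get("Retry-After", "")
        if raw.strip().isdigit():
            # Server-provided hint, still capped to protect the latency budget.
            return min(cap, float(raw))
    return backoff
```

Capping the server hint is a design choice: when the requested wait exceeds the cap, the caller may prefer to fail fast instead of retrying too early.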

Test the failure paths the skill proposes

Ask the model to generate tests for:

  • transient exception retries
  • permanent exception fast-fail behavior
  • retry exhaustion
  • timeout enforcement
  • backoff policy boundaries

Resilience code that is not tested is easy to misconfigure and hard to trust.
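
A sketch of what such tests might look like with unittest follows. The retry helper here is a tiny hypothetical stand-in for whatever wrapper your project uses, and FlakyCall is a hypothetical test double; only three of the listed cases are covered.

```python
import unittest


def retry(func, max_attempts=3, retry_on=(TimeoutError,)):
    """Tiny hypothetical retry helper used only as the system under test."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise


class FlakyCall:
    """Raises the given errors in order, then returns 'ok'."""

    def __init__(self, errors):
        self.errors = list(errors)
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.errors:
            raise self.errors.pop(0)
        return "ok"


class RetryBehaviorTests(unittest.TestCase):
    def test_transient_error_is_retried(self):
        call = FlakyCall([TimeoutError(), TimeoutError()])
        self.assertEqual(retry(call), "ok")
        self.assertEqual(call.calls, 3)

    def test_permanent_error_fails_fast(self):
        call = FlakyCall([ValueError("bad input")])
        with self.assertRaises(ValueError):
            retry(call)
        self.assertEqual(call.calls, 1)

    def test_retry_exhaustion_reraises(self):
        call = FlakyCall([TimeoutError()] * 5)
        with self.assertRaises(TimeoutError):
            retry(call, max_attempts=3)
        self.assertEqual(call.calls, 3)
```

Counting calls on the test double is what distinguishes "failed fast" from "retried and then failed", which an assertion on the raised exception alone cannot show.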

Improve the skill output with real traces

If you have logs or sample stack traces, include them. Real failure evidence helps python-resilience recommend narrower exception filters and more believable timeout/backoff settings than abstract prompts do.

Keep the abstraction level modest

A common failure mode is asking the skill to design a full resilience framework when you only need a reliable client wrapper. Start smaller:

  • one function
  • one dependency
  • one retry policy

Then expand after the pattern proves useful.

Use python-resilience as a review lens

Even if you already wrote the code, python-resilience is useful as a reviewer prompt. Ask it to inspect existing retry logic for:

  • unbounded retries
  • missing jitter
  • bad timeout placement
  • retrying permanent failures
  • hidden side-effect risks

That review-oriented use case is often the highest-value way to apply the skill in mature codebases.
