
benchmark

by affaan-m

Use the benchmark skill to measure performance baselines, detect regressions before and after PRs, and compare stack alternatives across pages, APIs, and builds for Performance Optimization.

Stars: 156.1k
Favorites: 0
Comments: 0
Added: Apr 15, 2026
Category: Performance Optimization
Install Command
npx skills add affaan-m/everything-claude-code --skill benchmark
Curation Score

This skill scores 67/100, which means it is acceptable to list for directory users but comes with meaningful execution gaps. The repository gives a clear enough picture of when to use benchmarking and what to measure across page, API, and build performance, so an agent can likely trigger it appropriately. However, users should expect to supply their own tooling choices, commands, and reporting workflow because the skill is mostly a measurement framework rather than a fully operational recipe.

Strengths
  • Strong triggerability: the "When to Use" section clearly frames before/after PR checks, baseline setup, slowdown investigation, launch readiness, and stack comparison.
  • Good benchmarking coverage: it lays out concrete metrics for page performance, APIs, and build/dev-loop performance, including Core Web Vitals and latency percentiles.
  • Useful agent leverage: the numbered measurement steps and target thresholds give more structure than a generic prompt for performance evaluation.
Cautions
  • Operational clarity is limited: the skill references browser MCP and benchmarking modes, but provides no install command, support files, or concrete command examples to run the tests.
  • Trust and adoption depth are modest: there are no scripts, references, resources, or companion assets showing a repeatable workflow or example outputs.
Overview

Overview of benchmark skill

What the benchmark skill does

The benchmark skill helps you measure performance baselines, spot regressions, and compare alternatives with a repeatable workflow instead of ad hoc checks. It is built for performance-optimization work across web pages, APIs, build pipelines, and before/after-change comparisons.

Who should install this benchmark skill

This benchmark skill is best for engineers, tech leads, and AI-assisted developers who need evidence for “is this slower?” or “did this PR improve performance?” It is especially useful when you need a shared measurement method before launch, after user complaints, or while evaluating stack changes.

What makes it useful versus a generic prompt

A normal prompt might tell an agent to “check performance.” This skill is better because it gives a concrete benchmarking frame: page metrics like Core Web Vitals and page weight, API latency percentiles and concurrency checks, and dev-loop metrics such as build and test timings. That structure reduces guesswork and makes outputs easier to compare over time.

How to Use benchmark skill

Install context and what to read first

To install, add the skill from the repository that contains skills/benchmark, then open SKILL.md first. The skill is self-contained, so most of the usable guidance is in that file. Read it in this order:

  1. SKILL.md
  2. The “When to Use” section
  3. The mode matching your task: page, API, build, or before/after comparison

Inputs the benchmark skill needs

Good benchmark usage depends on supplying a real target and success criteria. Useful inputs include:

  • Target URLs or API endpoints
  • Environment: local, staging, preview, production
  • Change under test: branch, PR, commit, or stack option
  • Expected targets: LCP, INP, p95 latency, build time, bundle size
  • Test constraints: auth, seed data, region, device assumptions

A weak request is: “Benchmark my app.”
A stronger request is: “Use the benchmark skill on these 3 staging URLs, collect LCP/CLS/INP, page weight, and request counts, then compare against production and flag regressions over 10%.”

Turn a rough goal into a strong benchmark prompt

Use a prompt template like this:

  • Scope: page, API, build, or before/after
  • Targets: exact URLs, endpoints, commands, or branches
  • Metrics: what to measure and target thresholds
  • Comparison: baseline vs candidate
  • Output: summary table, regressions, likely causes, next actions

Example:
“Use the benchmark skill to compare this PR branch against main. For page performance, test /, /pricing, and /checkout on the preview deployment. Report LCP, FCP, CLS, INP, TTFB, total page weight, JS weight, and request count. Call out any regressions above 5% and suggest the top 3 fixes.”

Practical workflow that improves output quality

A high-signal benchmark usage workflow is:

  1. Pick one mode only at first.
  2. Establish a baseline on a stable environment.
  3. Run the same benchmark against the changed version.
  4. Ask for a comparison table and regression summary.
  5. Only after that, ask for diagnosis and optimization ideas.

This order matters. If you skip the baseline, the agent may produce plausible but low-trust recommendations. If results vary a lot, narrow the scope to fewer targets and repeat under more controlled conditions.
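Steps 2-4 of the workflow above can be sketched in a few lines of code. This is an illustrative Python sketch, not part of the skill itself: the metric names and the 10% regression threshold are assumptions borrowed from the example request earlier, and the numbers are fabricated stand-ins for real measurements.

```python
# Compare a candidate run against a baseline and flag regressions
# over a chosen threshold. Assumes lower is better for every metric.
def find_regressions(baseline, candidate, threshold_pct=10.0):
    """Return {metric: percent_change} for metrics that regressed
    by more than threshold_pct relative to the baseline."""
    regressions = {}
    for metric, base_value in baseline.items():
        cand_value = candidate.get(metric)
        if cand_value is None:
            continue  # metric missing from the candidate run
        change_pct = (cand_value - base_value) / base_value * 100
        if change_pct > threshold_pct:
            regressions[metric] = round(change_pct, 1)
    return regressions

# Fabricated example numbers (hypothetical, lower is better).
baseline = {"lcp_ms": 2100, "inp_ms": 180, "page_weight_kb": 940}
candidate = {"lcp_ms": 2520, "inp_ms": 175, "page_weight_kb": 1010}

print(find_regressions(baseline, candidate))
```

With these sample numbers, only LCP crosses the 10% line (a 20% regression), which is exactly the kind of comparison table and regression summary step 4 asks for before any diagnosis happens.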

benchmark skill FAQ

Is this benchmark skill for pages, APIs, or builds?

All three. The skill explicitly covers page performance, API performance, and build/developer-loop performance. That makes it broader than a Lighthouse-only workflow and more practical when performance problems are spread across frontend, backend, and tooling.

When should I use benchmark instead of a normal performance prompt?

Use benchmark when you need repeatable measurements, before/after comparisons, or regression detection. A generic prompt is fine for brainstorming optimization ideas, but this skill is better when the real job is measurement, not opinion.

Is the benchmark skill beginner-friendly?

Yes, if you can provide clear targets. You do not need to know every metric in advance, but you should know what you are benchmarking and where. Beginners get the most value by starting with one page or one endpoint, then expanding once the first run is understandable.

When is this a poor fit?

Skip this benchmark skill if you only want general performance education, not measurement. It is also a weak fit if your environment is too unstable to compare runs, or if you cannot supply accessible URLs, callable endpoints, or runnable build commands.

How to Improve benchmark skill

Give cleaner inputs for better benchmark results

The best improvement is input quality. When benchmarking for performance optimization, specify:

  • exact targets
  • production or staging environment
  • baseline and candidate versions
  • thresholds that matter to your team
  • any auth/setup required

“Benchmark our API” is vague.
“Benchmark POST /search and GET /products/:id on staging with 100 requests, 10 concurrency, and report p50/p95/p99 against our 300ms p95 SLA” is actionable.
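To make the percentile targets in that request concrete, here is a minimal Python sketch of a nearest-rank percentile check against a 300 ms p95 SLA. The latency values are fabricated stand-ins; a real run would collect them from the staging requests described above.

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank - 1, 0)]

# Fabricated latencies standing in for real measured requests.
latencies_ms = [120, 140, 150, 155, 160, 180, 210, 260, 310, 420]

for pct in (50, 95, 99):
    print(f"p{pct} = {percentile(latencies_ms, pct)} ms")

sla_ms = 300  # the 300 ms p95 SLA from the example request
print("p95 SLA:", "pass" if percentile(latencies_ms, 95) <= sla_ms else "fail")
```

Reporting p50/p95/p99 together, as the stronger request does, matters because a healthy median can hide a tail: in this fabricated sample the p50 looks fine while the p95 blows past the SLA.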

Avoid common benchmark failure modes

Common problems:

  • comparing different environments
  • mixing multiple changes into one test
  • using unrealistic pages or endpoints
  • asking for diagnosis before measurement
  • not defining acceptable regression thresholds

These failures make benchmark outputs noisy and harder to trust. Control the setup first, then interpret the result.

Ask for comparisons, not isolated numbers

A single metric snapshot is less useful than relative change. Improve the benchmark skill output by requesting:

  • baseline vs candidate tables
  • percent change
  • pass/fail against thresholds
  • suspected causes for the top regressions only

That pushes the agent from data dumping into decision support.
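As a sketch of that decision-support framing, the following Python snippet renders a baseline-vs-candidate comparison with percent change and a per-metric pass/fail verdict. The metrics and per-metric thresholds are illustrative assumptions, not values defined by the skill.

```python
def comparison_report(baseline, candidate, allowed_regression_pct):
    """Build (metric, base, cand, pct_change, verdict) rows.
    Lower is better for every metric; FAIL means the candidate
    regressed by more than the metric's allowed percentage."""
    rows = []
    for metric, base in baseline.items():
        cand = candidate[metric]
        change = (cand - base) / base * 100
        allowed = allowed_regression_pct.get(metric, 0.0)
        verdict = "FAIL" if change > allowed else "PASS"
        rows.append((metric, base, cand, round(change, 1), verdict))
    return rows

baseline = {"lcp_ms": 2100, "build_s": 48}     # illustrative numbers
candidate = {"lcp_ms": 2400, "build_s": 49}
thresholds = {"lcp_ms": 5.0, "build_s": 10.0}  # allowed regression, percent

for metric, base, cand, change, verdict in comparison_report(
        baseline, candidate, thresholds):
    print(f"{metric}: {base} -> {cand} ({change:+.1f}%)  {verdict}")
```

A table like this makes the follow-up request natural: ask for suspected causes only on the rows marked FAIL, rather than a diagnosis of every number.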

Iterate after the first benchmark run

After the first pass, tighten the scope. Ask the agent to rerun only the slowest pages, the worst API percentile, or the heaviest build step. Then request targeted follow-up such as “focus on render-blocking assets” or “investigate why p99 is much worse than p50.” This iterative loop is where the skill becomes most useful, because it turns one broad measurement pass into a practical optimization plan.

Ratings & Reviews

No ratings yet