
benchmark

by affaan-m

Use the benchmark skill to measure performance baselines, detect regressions before and after PRs, and compare stack alternatives across pages, APIs, and builds for Performance Optimization.

Stars: 156.1k
Favorites: 0
Comments: 0
Added: Apr 15, 2026
Category: Performance Optimization
Install Command
npx skills add affaan-m/everything-claude-code --skill benchmark
Curation Score

This skill scores 67/100, which means it is acceptable to list for directory users but comes with meaningful execution gaps. The repository gives a clear enough picture of when to use benchmarking and what to measure across page, API, and build performance, so an agent can likely trigger it appropriately. However, users should expect to supply their own tooling choices, commands, and reporting workflow because the skill is mostly a measurement framework rather than a fully operational recipe.

Strengths
  • Strong triggerability: the "When to Use" section clearly frames before/after PR checks, baseline setup, slowdown investigation, launch readiness, and stack comparison.
  • Good benchmarking coverage: it lays out concrete metrics for page performance, APIs, and build/dev-loop performance, including Core Web Vitals and latency percentiles.
  • Useful agent leverage: the numbered measurement steps and target thresholds give more structure than a generic prompt for performance evaluation.
Cautions
  • Operational clarity is limited: the skill references browser MCP and benchmarking modes, but provides no install command, support files, or concrete command examples to run the tests.
  • Trust and adoption depth are modest: there are no scripts, references, resources, or companion assets showing a repeatable workflow or example outputs.
Overview

Overview of benchmark skill

What the benchmark skill does

The benchmark skill helps you measure performance baselines, spot regressions, and compare alternatives with a repeatable workflow instead of ad hoc checks. It is built for performance-optimization work across web pages, APIs, build pipelines, and before/after-change comparisons.

Who should install this benchmark skill

This benchmark skill is best for engineers, tech leads, and AI-assisted developers who need evidence for “is this slower?” or “did this PR improve performance?” It is especially useful when you need a shared measurement method before launch, after user complaints, or while evaluating stack changes.

What makes it useful versus a generic prompt

A normal prompt might tell an agent to “check performance.” This skill is better because it gives a concrete benchmarking frame: page metrics like Core Web Vitals and page weight, API latency percentiles and concurrency checks, and dev-loop metrics such as build and test timings. That structure reduces guesswork and makes outputs easier to compare over time.

How to Use benchmark skill

Install context and what to read first

To install, add the skill from the repository that contains skills/benchmark, then open SKILL.md first. The skill is self-contained, so most of the usable guidance is in that file. Read it in this order:

  1. SKILL.md
  2. The “When to Use” section
  3. The mode matching your task: page, API, build, or before/after comparison

Inputs the benchmark skill needs

Good benchmark usage depends on supplying a real target and success criteria. Useful inputs include:

  • Target URLs or API endpoints
  • Environment: local, staging, preview, production
  • Change under test: branch, PR, commit, or stack option
  • Expected targets: LCP, INP, p95 latency, build time, bundle size
  • Test constraints: auth, seed data, region, device assumptions

A weak request is: “Benchmark my app.”
A stronger request is: “Use the benchmark skill on these 3 staging URLs, collect LCP/CLS/INP, page weight, and request counts, then compare against production and flag regressions over 10%.”

Turn a rough goal into a strong benchmark prompt

Use a prompt template like this:

  • Scope: page, API, build, or before/after
  • Targets: exact URLs, endpoints, commands, or branches
  • Metrics: what to measure and target thresholds
  • Comparison: baseline vs candidate
  • Output: summary table, regressions, likely causes, next actions

Example:
“Use the benchmark skill to compare this PR branch against main. For page performance, test /, /pricing, and /checkout on the preview deployment. Report LCP, FCP, CLS, INP, TTFB, total page weight, JS weight, and request count. Call out any regressions above 5% and suggest the top 3 fixes.”

Practical workflow that improves output quality

A high-signal benchmark usage workflow is:

  1. Pick one mode only at first.
  2. Establish a baseline on a stable environment.
  3. Run the same benchmark against the changed version.
  4. Ask for a comparison table and regression summary.
  5. Only after that, ask for diagnosis and optimization ideas.

This order matters. If you skip the baseline, the agent may produce plausible but low-trust recommendations. If results vary a lot, narrow the scope to fewer targets and repeat under more controlled conditions.
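Steps 2-4 of the workflow above can be sketched in a few lines of code. This is an illustrative Python sketch, not part of the skill itself: the metric names and the 10% regression threshold are assumptions borrowed from the example request earlier, and the numbers are fabricated stand-ins for real measurements.

```python
# Compare a candidate run against a baseline and flag regressions
# over a chosen threshold. Assumes lower is better for every metric.
def find_regressions(baseline, candidate, threshold_pct=10.0):
    """Return {metric: percent_change} for metrics that regressed
    by more than threshold_pct relative to the baseline."""
    regressions = {}
    for metric, base_value in baseline.items():
        cand_value = candidate.get(metric)
        if cand_value is None:
            continue  # metric missing from the candidate run
        change_pct = (cand_value - base_value) / base_value * 100
        if change_pct > threshold_pct:
            regressions[metric] = round(change_pct, 1)
    return regressions

# Fabricated example numbers (hypothetical, lower is better).
baseline = {"lcp_ms": 2100, "inp_ms": 180, "page_weight_kb": 940}
candidate = {"lcp_ms": 2520, "inp_ms": 175, "page_weight_kb": 1010}

print(find_regressions(baseline, candidate))
```

With these sample numbers, only LCP crosses the 10% line (a 20% regression), which is exactly the kind of comparison table and regression summary step 4 asks for before any diagnosis happens.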

benchmark skill FAQ

Is this benchmark skill for pages, APIs, or builds?

All three. The skill explicitly covers page performance, API performance, and build/developer-loop performance. That makes it broader than a Lighthouse-only workflow and more practical when performance problems are spread across frontend, backend, and tooling.

When should I use benchmark instead of a normal performance prompt?

Use benchmark when you need repeatable measurements, before/after comparisons, or regression detection. A generic prompt is fine for brainstorming optimization ideas, but this skill is better when the real job is measurement, not opinion.

Is the benchmark skill beginner-friendly?

Yes, if you can provide clear targets. You do not need to know every metric in advance, but you should know what you are benchmarking and where. Beginners get the most value by starting with one page or one endpoint, then expanding once the first run is understandable.

When is this a poor fit?

Skip this benchmark skill if you only want general performance education, not measurement. It is also a weak fit if your environment is too unstable to compare runs, or if you cannot supply accessible URLs, callable endpoints, or runnable build commands.

How to Improve benchmark skill

Give cleaner inputs for better benchmark results

The best improvement is input quality. When benchmarking for performance optimization, specify:

  • exact targets
  • production or staging environment
  • baseline and candidate versions
  • thresholds that matter to your team
  • any auth/setup required

“Benchmark our API” is vague.
“Benchmark POST /search and GET /products/:id on staging with 100 requests, 10 concurrency, and report p50/p95/p99 against our 300ms p95 SLA” is actionable.
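To make the percentile targets in that request concrete, here is a minimal Python sketch of a nearest-rank percentile check against a 300 ms p95 SLA. The latency values are fabricated stand-ins; a real run would collect them from the staging requests described above.

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank - 1, 0)]

# Fabricated latencies standing in for real measured requests.
latencies_ms = [120, 140, 150, 155, 160, 180, 210, 260, 310, 420]

for pct in (50, 95, 99):
    print(f"p{pct} = {percentile(latencies_ms, pct)} ms")

sla_ms = 300  # the 300 ms p95 SLA from the example request
print("p95 SLA:", "pass" if percentile(latencies_ms, 95) <= sla_ms else "fail")
```

Reporting p50/p95/p99 together, as the stronger request does, matters because a healthy median can hide a tail: in this fabricated sample the p50 looks fine while the p95 blows past the SLA.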

Avoid common benchmark failure modes

Common problems:

  • comparing different environments
  • mixing multiple changes into one test
  • using unrealistic pages or endpoints
  • asking for diagnosis before measurement
  • not defining acceptable regression thresholds

These failures make benchmark outputs noisy and harder to trust. Control the setup first, then interpret the result.

Ask for comparisons, not isolated numbers

A single metric snapshot is less useful than relative change. Improve the benchmark skill output by requesting:

  • baseline vs candidate tables
  • percent change
  • pass/fail against thresholds
  • suspected causes for the top regressions only

That pushes the agent from data dumping into decision support.
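As a sketch of that decision-support framing, the following Python snippet renders a baseline-vs-candidate comparison with percent change and a per-metric pass/fail verdict. The metrics and per-metric thresholds are illustrative assumptions, not values defined by the skill.

```python
def comparison_report(baseline, candidate, allowed_regression_pct):
    """Build (metric, base, cand, pct_change, verdict) rows.
    Lower is better for every metric; FAIL means the candidate
    regressed by more than the metric's allowed percentage."""
    rows = []
    for metric, base in baseline.items():
        cand = candidate[metric]
        change = (cand - base) / base * 100
        allowed = allowed_regression_pct.get(metric, 0.0)
        verdict = "FAIL" if change > allowed else "PASS"
        rows.append((metric, base, cand, round(change, 1), verdict))
    return rows

baseline = {"lcp_ms": 2100, "build_s": 48}     # illustrative numbers
candidate = {"lcp_ms": 2400, "build_s": 49}
thresholds = {"lcp_ms": 5.0, "build_s": 10.0}  # allowed regression, percent

for metric, base, cand, change, verdict in comparison_report(
        baseline, candidate, thresholds):
    print(f"{metric}: {base} -> {cand} ({change:+.1f}%)  {verdict}")
```

A table like this makes the follow-up request natural: ask for suspected causes only on the rows marked FAIL, rather than a diagnosis of every number.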

Iterate after the first benchmark run

After the first pass, tighten the scope. Ask the agent to rerun only the slowest pages, the worst API percentile, or the heaviest build step. Then request targeted follow-up such as “focus on render-blocking assets” or “investigate why p99 is much worse than p50.” This iterative loop is where the skill becomes most useful, because it turns one broad measurement pass into a practical optimization plan.

Ratings & Reviews

No ratings yet