ab-test-setup
by coreyhaines31

ab-test-setup helps you plan and design statistically sound A/B and multivariate experiments, from hypothesis through sample size and metrics, before you implement tracking or code changes.
Overview
What is ab-test-setup?
ab-test-setup is a skill for designing rigorous A/B and multivariate experiments before anything goes live. It guides an AI assistant to act as an experimentation specialist: clarifying test goals, crafting strong hypotheses, choosing appropriate metrics, and planning sample size and duration using structured references.
Instead of jumping straight into running a split test, ab-test-setup helps you create a solid test plan so the results are statistically valid and actionable, not just noise.
Who is this skill for?
Use ab-test-setup if you are:
- Growth or product marketing teams planning experiments on landing pages, onboarding flows, or pricing pages.
- Performance marketers optimizing ads, campaign creatives, or funnels and needing statistically sound tests.
- SEO and content teams testing headlines, layouts, or calls to action on high-value pages.
- Developers and product managers who support experimentation and want a consistent, documented planning framework.
If you simply need ideas for copy or layout changes without testing them, this skill is overkill; use your content or CRO skill instead.
What problems does ab-test-setup solve?
This skill is designed for situations where a user says things like:
- "We want to A/B test our homepage headline."
- "Should we run a multivariate test on these elements?"
- "Which version is better, and how should we test it?"
- "How long should we run this experiment?"
- "Do we have enough traffic for this test?"
ab-test-setup focuses on:
- Clarifying context: what you’re trying to improve, baseline performance, and constraints.
- Building a strong hypothesis using a structured framework.
- Choosing test type (A/B vs. A/B/n vs. multivariate) based on traffic and goals.
- Planning sample size and duration, using the included sample-size guide.
- Defining metrics (primary, secondary, and guardrail) that match your business objectives.
- Avoiding common pitfalls like testing too many variants at low traffic or making decisions too early (“peeking”).
For tracking implementation, use the analytics-tracking skill. For page-level conversion optimization ideas, use page-cro alongside ab-test-setup.
When is ab-test-setup a good fit?
This skill is a good fit when:
- You are comparing two or more approaches and need to measure which performs better.
- You have or expect enough traffic to run a meaningful A/B test.
- You care about statistical significance and avoiding false wins.
- Multiple stakeholders need a clear, documented test plan.
It is not a great fit when:
- You have extremely low traffic where meaningful A/B testing is unrealistic.
- You are making one-off design changes without measurement.
- You only need analytics setup or event tracking (use `analytics-tracking` instead).
How to Use
Installation
Install ab-test-setup into your agent environment using the skills CLI:
npx skills add https://github.com/coreyhaines31/marketingskills --skill ab-test-setup
After installation:
- Open the `skills/ab-test-setup` directory in your editor or file viewer.
- Start with `SKILL.md` to understand how the assistant should approach A/B test planning.
- Review the `references/` and `evals/` folders to see the supporting material and expected behavior.
Key files and folders
To get value quickly, focus on these files:
- `SKILL.md` – Core instructions. Defines the experimentation mindset, initial assessment questions, and core principles like starting with a hypothesis and testing one thing at a time.
- `references/sample-size-guide.md` – Guidelines for calculating or estimating sample sizes, understanding minimum detectable effect (MDE), and planning test duration.
- `references/test-templates.md` – Ready-to-use templates for test plans, results documentation, and stakeholder updates.
- `evals/evals.json` – Example prompts and expected outputs that show how the skill should behave in real-world scenarios.
Use these as a reference when configuring your agent, or to align your internal experimentation documentation with the same structure.
Typical workflow with ab-test-setup
The skill is designed around a repeatable experimentation workflow.
1. Gather context
When a user asks for an A/B test, the agent should first understand:
- Test context – What page, feature, or channel is being tested? What change is being considered?
- Current state – Baseline conversion rate or key metric, current traffic volume.
- Constraints – Technical limitations, implementation complexity, timelines, and tools (e.g., Optimizely, Google Optimize alternatives, in-house framework).
If you have a shared product marketing context file (for example, product-marketing-context.md described in the repo), the agent should read it first and only ask for information that is missing or test-specific.
2. Define a strong hypothesis
ab-test-setup promotes a structured hypothesis format, as seen in evals/evals.json and references/test-templates.md:
Because [observation], we believe [change] will cause [outcome], which we'll measure by [metric].
In practice, the agent should:
- Turn vague ideas ("try a benefit headline") into specific predictions.
- Link each hypothesis to data or clear observations (analytics, research, user feedback).
- Tie the outcome directly to a primary business metric (e.g., signup rate, add-to-cart rate).
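As a small illustration of keeping hypotheses consistent, the template can be filled programmatically. The function name and example values below are hypothetical, not part of the skill itself:

```python
def build_hypothesis(observation: str, change: str, outcome: str, metric: str) -> str:
    """Fill the four-part hypothesis template with specific, testable parts."""
    return (
        f"Because {observation}, we believe {change} will cause "
        f"{outcome}, which we'll measure by {metric}."
    )

hypothesis = build_hypothesis(
    observation="62% of visitors bounce before scrolling past the headline",
    change="replacing the feature-focused headline with a benefit-focused one",
    outcome="more visitors to continue into the signup flow",
    metric="signup rate",
)
print(hypothesis)
```

Forcing every hypothesis through the same four slots makes it obvious when a test idea is missing an observation or a measurable outcome.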
3. Choose the right test design
Using the principles in SKILL.md and the examples in evals/evals.json, the agent helps decide:
- A/B vs. A/B/n vs. multivariate – For example, discouraging testing four button colors at tiny traffic levels if that would underpower the test.
- Single-variable focus – Encouraging testing one main change at a time, so results are interpretable.
- Traffic allocation – Typically 50/50 for simple A/B, but the templates support more complex setups.
This is particularly useful for marketing and SEO teams who might be tempted to test many elements at once.
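One common way to implement random assignment with a fixed traffic split is to hash a stable user ID into buckets. This is a sketch of that general technique, not code from the skill; the experiment name and weights are illustrative:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, weights: dict[str, float]) -> str:
    """Deterministically assign a user to a variant.

    Hashing (experiment + user_id) means the same user always sees the
    same variant, and different experiments split traffic independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform float in [0, 1]
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if bucket <= cumulative:
            return variant
    return variant  # guard against floating-point rounding at the boundary

v = assign_variant("user-123", "homepage-headline", {"control": 0.5, "challenger": 0.5})
```

Deterministic assignment also makes the test auditable: given the user ID and experiment name, you can always reproduce which variant a user saw.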
4. Plan sample size and duration
The references/sample-size-guide.md file gives the agent a framework to:
- Explain baseline conversion rate, MDE, significance, and power.
- Use quick reference tables or formulas to estimate sample size per variant.
- Translate that into an approximate test duration based on traffic.
- Highlight common mistakes, such as underpowered tests and ignoring multiple-variant adjustments.
For example, in an evaluation prompt, the agent is expected to estimate the required sample size for 15,000 visitors/month and a 3.2% baseline, and then recommend a realistic test duration.
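To make that arithmetic concrete, here is a sketch of the standard two-proportion sample size approximation applied to the eval scenario. The 20% relative MDE, 95% confidence, and 80% power are assumptions for illustration; the skill's guide may use different tables or defaults:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde_relative: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant for a two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# The eval scenario: 3.2% baseline, assuming a 20% relative MDE.
n = sample_size_per_variant(baseline=0.032, mde_relative=0.20)
total = 2 * n                                # control plus one challenger
weeks = total / 15_000 * (365.25 / 12 / 7)   # 15,000 visitors/month
print(n, round(weeks, 1))  # about 13,000 per variant, roughly 7.5 weeks
```

With those inputs the test needs around 26,000 total visitors, so at 15,000 visitors/month a realistic recommendation is roughly two months, or a larger MDE if that is too long.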
5. Define metrics and guardrails
Using the patterns in test-templates.md, the agent should help you:
- Pick a primary metric that represents the main outcome (e.g., signup rate).
- Add secondary metrics for deeper understanding (e.g., click-through rate, micro-conversions).
- Set guardrail metrics to avoid harmful impacts (e.g., bounce rate, error rate, revenue per visitor).
This is especially valuable for ad optimization and SEO content experiments, where local gains can hurt overall performance if guardrails are ignored.
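As an illustration of how guardrails might be applied when reading results, here is a minimal check. The thresholds and metric names are made up for the example, not taken from the templates:

```python
def check_guardrails(control: dict[str, float], variant: dict[str, float],
                     max_degradation: dict[str, float]) -> list[str]:
    """Return guardrail metrics the variant degraded beyond tolerance.

    `max_degradation` maps metric name -> allowed relative drop, e.g. 0.05
    means the variant may be at most 5% worse than control. For
    "lower is better" metrics (like bounce rate), invert the values upstream.
    """
    violations = []
    for metric, tolerance in max_degradation.items():
        relative_change = (variant[metric] - control[metric]) / control[metric]
        if relative_change < -tolerance:
            violations.append(metric)
    return violations

violations = check_guardrails(
    control={"revenue_per_visitor": 1.80, "pages_per_session": 3.1},
    variant={"revenue_per_visitor": 1.62, "pages_per_session": 3.0},
    max_degradation={"revenue_per_visitor": 0.05, "pages_per_session": 0.10},
)
print(violations)  # ['revenue_per_visitor'], a 10% revenue drop breaks the 5% guardrail
```

A variant that wins on the primary metric but trips a guardrail like this should be investigated, not shipped.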
6. Produce a structured test plan
With the information collected, the agent can output a plan using the templates from references/test-templates.md, including:
- Overview and owner details.
- Hypothesis and rationale.
- Test design and implementation notes.
- Variant descriptions (control and challenger(s)).
- Metrics definitions and segmentation plan.
You can paste this plan into your experimentation tool, internal docs, or JIRA ticket to keep tests consistent and reviewable.
How ab-test-setup works with other skills
- With `analytics-tracking`: ab-test-setup defines what and why you test; analytics-tracking defines how to capture events, goals, or conversions.
- With `page-cro`: page-cro helps generate ideas for what to change; ab-test-setup decides which ideas to test first and how.
Use them together for a full experimentation workflow: ideation → prioritization → test design → implementation → analysis.
FAQ
When should I use ab-test-setup instead of just changing the page?
Use ab-test-setup when:
- The change could have meaningful business impact (e.g., core funnel steps, high-traffic pages).
- Stakeholders will ask, "Did this actually work?" and you need credible evidence.
- You’re optimizing ongoing marketing or SEO efforts and want a repeatable process.
For trivial or cosmetic tweaks where you don’t plan to measure impact, a full A/B test plan isn’t needed.
Does ab-test-setup calculate exact sample sizes?
The skill does not contain a dedicated calculator library. Instead, it uses the logic and examples in references/sample-size-guide.md to:
- Explain what inputs you need.
- Estimate reasonable sample sizes or guide you to online calculators.
- Warn you when your traffic is likely too low for reliable tests.
For mission-critical or highly regulated contexts, you should still validate calculations with your analytics or data science team.
Can I use ab-test-setup for more than two variants?
Yes. While the core idea is A/B testing, the documentation and templates support A/B/n and multivariate experiments. The skill also emphasizes that adding more variants requires larger sample sizes and longer durations, which are covered in the sample-size guide.
How does ab-test-setup handle “peeking” and early stopping?
The evaluation prompts explicitly require the agent to:
- Warn about the peeking problem (checking results too frequently and stopping early).
- Recommend a fixed test duration or sample threshold before declaring a winner.
This helps maintain statistical validity, especially for high-stakes marketing and product decisions.
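A rough way to see why peeking is dangerous: if each look at the results were an independent chance to cross the significance threshold, k looks at alpha = 0.05 would inflate the false-positive rate toward 1 - (1 - alpha)^k. Real looks are correlated, so the true inflation is smaller than this bound, but it is still severe:

```python
def naive_peeking_bound(alpha: float, looks: int) -> float:
    """Upper bound on the chance that at least one of `looks` independent
    significance checks crosses alpha when there is no real effect."""
    return 1 - (1 - alpha) ** looks

for k in (1, 5, 10, 20):
    print(k, round(naive_peeking_bound(0.05, k), 3))
# 1  -> 0.05
# 10 -> ~0.4: checking daily for two weeks can make noise look like a winner
```

This is why the skill insists on committing to a sample size or duration up front and only reading the result once it is reached.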
Is ab-test-setup only for web pages?
No. The principles apply to:
- Website and landing page experiments.
- In-app product tests.
- Email and lifecycle journey tests.
- Ad creative and messaging experiments.
Anywhere you can randomly assign users to variants and track outcomes, ab-test-setup can help design the experiment.
How do I know if I have enough traffic for an A/B test?
Use the guidance in references/sample-size-guide.md:
- Start with your baseline conversion rate and monthly visitors.
- Decide on a minimum detectable effect — how big a change is worth detecting.
- Use the tables or formulas to estimate required sample size per variant.
- Compare that to your traffic to see if the test would take a reasonable time.
If the required duration is extremely long, the agent may recommend:
- Combining similar pages or campaigns to increase sample size.
- Testing bigger, more impactful changes (larger MDE).
- Using other research methods (qualitative feedback, user testing) instead of A/B testing.
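The trade-off in the second bullet (larger MDE means smaller required sample) can be made concrete with the standard two-proportion approximation. The 10% baseline and the MDE values below are illustrative assumptions:

```python
import math
from statistics import NormalDist

def per_variant_n(baseline: float, mde_relative: float) -> int:
    """Approximate sample size per variant at 95% confidence and 80% power."""
    z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
    p1, p2 = baseline, baseline * (1 + mde_relative)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p2 - p1) ** 2)

# Doubling the MDE roughly quarters the required sample size.
for mde in (0.05, 0.10, 0.20, 0.40):
    print(f"{mde:.0%} relative lift -> {per_variant_n(0.10, mde):,} per variant")
```

So if traffic is tight, testing a bold change you expect to move the metric by 20% can be feasible where detecting a 5% lift would take the better part of a year.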
What if I only want copy ideas or design suggestions?
ab-test-setup assumes you want to measure which version wins. If you just want copy or layout ideas without running a test:
- Use your content or CRO-focused skill (such as `page-cro`) to generate ideas.
- Optionally come back to ab-test-setup later if you decide to validate those ideas via testing.
Where can I see examples of good output from this skill?
Check evals/evals.json in the ab-test-setup folder. It includes realistic prompts (e.g., testing homepage headlines or button colors) and detailed expectations for how the agent should respond, including:
- Hypothesis structure.
- Sample size and duration reasoning.
- Metric selection.
- Warnings about common pitfalls.
You can use these as benchmarks when you integrate or customize the skill in your own environment.
