Hypothesis Generation and Experiments

From hypothesis to experiment: how to create testable hypotheses, prioritize with the ICE framework and design experiments that deliver insights.

hypothesis, experiment, ICE, prioritization

Hypothesis Generation and Experiments: From Idea to Insight with the ICE Framework

Hypothesis generation is the process of creating testable predictions about what changes will improve your key metrics. Combined with a structured experimentation process, it transforms guesswork into systematic growth. The ICE framework (Impact, Confidence, Ease) provides a practical way to prioritize which hypotheses to test first, ensuring your team focuses on the experiments most likely to deliver meaningful results.

What Makes a Good Hypothesis?

A well-formed hypothesis has three components: a specific change, a predicted outcome and a rationale. The format is: "If we [make this change], then [this metric will improve], because [this reasoning]." For example: "If we add customer testimonials to the pricing page, then the conversion rate will increase by 10 percent, because social proof reduces purchase anxiety."

Avoid vague hypotheses like "improving the design will increase sales." Specificity is critical because it determines how you design the experiment and how you measure success. A good hypothesis is also falsifiable, meaning you can clearly determine whether it was correct or incorrect based on the data.
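The "If / then / because" template above can be enforced programmatically so every backlog entry arrives in the same testable shape. This is a minimal sketch; the function name and arguments are illustrative, not a standard API.

```python
def format_hypothesis(change: str, outcome: str, rationale: str) -> str:
    """Render a hypothesis in the 'If we X, then Y, because Z' template."""
    return f"If we {change}, then {outcome}, because {rationale}."

print(format_hypothesis(
    "add customer testimonials to the pricing page",
    "the conversion rate will increase by 10 percent",
    "social proof reduces purchase anxiety",
))
```

Requiring all three arguments makes the vague form ("improving the design will increase sales") impossible to submit: there is no slot for an unspecified change or a missing rationale.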

Sources of Hypotheses

The best hypotheses come from combining multiple data sources. Quantitative data from dashboards and analytics reveals what is happening. Qualitative data from user research, customer feedback and support tickets explains why it is happening. Competitive analysis shows what others are doing differently. Industry research and best practices provide frameworks and benchmarks.

Maintain a hypothesis backlog where anyone on the team can add ideas at any time. Review and prioritize this backlog during your regular growth sessions. A healthy backlog contains 20-50 hypotheses at varying levels of detail.

The ICE Framework for Prioritization

ICE stands for Impact, Confidence and Ease. Each hypothesis is scored on a scale of 1-10 for each dimension, and the scores are averaged to produce an overall priority score.

  • Impact: How much will this experiment move the needle if it succeeds? Score based on the potential effect on your key metric and the number of users affected.
  • Confidence: How certain are you that this experiment will produce the predicted result? Base this on supporting data, prior experiments and industry evidence.
  • Ease: How quickly and cheaply can you run this experiment? Consider development time, design resources and any dependencies.

High ICE scores indicate experiments that are likely to have a big impact, are supported by evidence and can be implemented quickly. Start with these. Low-scoring experiments should be refined or deprioritized.
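The scoring-and-averaging step can be sketched in a few lines. The example hypotheses and their scores below are made up for illustration, assuming the simple average of the three dimensions described above.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    impact: int      # 1-10: potential effect on the key metric
    confidence: int  # 1-10: strength of supporting evidence
    ease: int        # 1-10: speed and cheapness of running the test

    @property
    def ice_score(self) -> float:
        # ICE averages the three dimensions into one priority score
        return (self.impact + self.confidence + self.ease) / 3

backlog = [
    Hypothesis("Testimonials on pricing page", impact=7, confidence=6, ease=8),
    Hypothesis("Redesign onboarding flow", impact=9, confidence=5, ease=3),
    Hypothesis("Shorten signup form", impact=5, confidence=7, ease=9),
]

# Review the backlog highest ICE score first
for h in sorted(backlog, key=lambda h: h.ice_score, reverse=True):
    print(f"{h.name}: {h.ice_score:.1f}")
```

Note how the big onboarding redesign scores high on Impact but sinks on Ease, so it ranks below two quicker wins; that is the framework doing its job.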

Designing Experiments

A well-designed experiment has a clear hypothesis, a defined success metric, a sufficient sample size, a control group and a planned duration. Before launching, calculate the minimum sample size needed to detect your expected effect with statistical significance. Use an A/B calculator to determine the right parameters.
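The minimum-sample-size calculation that an A/B calculator performs can be sketched with the standard normal approximation for two proportions. The baseline conversion rate and the significance/power defaults below are illustrative assumptions, not prescriptions from this article.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(baseline: float, relative_lift: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variation to detect `relative_lift` over a
    `baseline` conversion rate in a two-sided test at the given alpha/power."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    pooled = (p1 + p2) / 2
    n = ((z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# e.g. a 5% baseline and the 10% relative lift hypothesized earlier
print(min_sample_size(0.05, 0.10))
```

Small expected lifts on low baseline rates demand surprisingly large samples (tens of thousands of visitors per variation here), which is why this calculation belongs before launch, not after.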

Document the experiment design including what you are testing, who is included in the test, how long it will run and what you will measure. Share this with the team before launch so everyone agrees on the setup and success criteria.

Analyzing and Learning from Results

When an experiment concludes, analyze the results rigorously. Did the variation beat the control with statistical significance? Was the effect size meaningful from a business perspective? Were there unexpected secondary effects on other metrics?

Document every experiment's results, regardless of outcome. Failed experiments are just as valuable as successful ones because they teach you what does not work and update your understanding of your users. Build a knowledge base of experiment results that the entire team can search and reference.
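The significance check above can be sketched as a two-proportion z-test; the conversion counts below are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int,
                           conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Control: 500 of 10,000 converted; variation: 570 of 10,000
p = two_proportion_p_value(500, 10_000, 570, 10_000)
print(f"p-value: {p:.4f}",
      "-> significant at 0.05" if p < 0.05 else "-> not significant")
```

A significant p-value answers only the first question in the analysis; the effect size (here, a 5.0% vs. 5.7% conversion rate) and any secondary-metric movements still need a separate business judgment.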

Building an Experimentation Culture

The most successful growth teams run many experiments continuously. Aim for a minimum of 2-4 experiments per sprint. Volume matters because most experiments do not produce statistically significant results. The more experiments you run, the more likely you are to find significant wins. Embed experimentation into your growth process as a core practice.

Frequently Asked Questions

What if our experiment does not reach statistical significance?

An inconclusive result is not a failure. It means the effect, if any, is too small to detect with your current sample size. You can extend the test duration, increase the traffic allocation or accept that the change does not have a meaningful impact and move on to the next experiment.

How do we balance quick wins with big bets?

Use the ICE framework to maintain a mix. Include some high-Ease experiments for quick wins that build momentum, alongside high-Impact experiments that take longer but could deliver transformational results. A good ratio is roughly 70 percent quick experiments and 30 percent bigger bets.

Should we test one thing at a time?

For A/B tests, yes. Changing multiple variables simultaneously makes it impossible to attribute results to a specific change. If you want to test multiple changes together, use multivariate testing, but be aware that it requires significantly larger sample sizes to reach significance.
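To see why multivariate tests need larger samples: a full-factorial test crosses every option of every variable, so traffic is split across all the resulting cells. The option counts below are illustrative.

```python
# Three headline options x two images x two CTAs = 12 distinct variants
headline_options = 3
image_options = 2
cta_options = 2

cells = headline_options * image_options * cta_options
traffic_per_cell = 1 / cells

print(f"{cells} variants, {traffic_per_cell:.1%} of traffic each")
```

Where a simple A/B test gives each arm half the traffic, each cell here receives roughly an twelfth, so reaching the per-cell sample size takes many times longer at the same traffic level.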
