
docs: add Benjamini-Hochberg correction and configurable CUPED lookback #2729

Open
russell-loube-mixpanel wants to merge 1 commit into main from russell/experiments-bh-cuped-lookback


@russell-loube-mixpanel
Contributor

Summary

Updates the Advanced Statistical Methods section of the Experiments docs to cover two features that are now exposed in the product UI:

  • Benjamini-Hochberg correction — a new option in the Multiple Testing Correction dropdown, alongside the existing Bonferroni. Docs explain FDR vs. FWER, the rank-based procedure, a worked 5-metric example, when to prefer BH over Bonferroni, and the constraint that only one correction method can be applied at a time. Content drawn from Kaan's internal Benjamini-Hochberg writeup.
  • Configurable CUPED Pre-Exposure Period — previously the docs said "a pre-exposure period of your choosing" without enumerating options. Now lists the five options (1 Week, 2 Weeks default, 4 Weeks, 60 Days, 90 Days) and gives guidance on picking a window.

Also adds a new row in the "Methods at a Glance" table for Benjamini-Hochberg and a small clarifying tail to the Bonferroni row's "When to Use" cell to contrast the two methods.

No other sections were touched.

Test plan

  • Render preview on Vercel and confirm the new BH section, example table, and CUPED paragraph render correctly
  • Kaan to verify BH explanation matches the implementation
  • Confirm the listed CUPED lookback options match what ships in the UI dropdown

🤖 Generated with Claude Code

- Document Benjamini-Hochberg as a multiple-testing correction option
  alongside Bonferroni, with worked example and guidance on choosing
  between FDR and FWER methods.
- Document the configurable CUPED Pre-Exposure Period (1W, 2W default,
  4W, 60D, 90D) and tradeoffs for choosing a window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
russell-loube-mixpanel requested a review from a team as a code owner May 27, 2026 21:45
russell-loube-mixpanel requested review from kaan-barmore-genc-mixpanel and mherrman and removed request for a team May 27, 2026 21:45
@vercel

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | May 27, 2026 9:48pm |



### Benjamini-Hochberg Correction

Benjamini-Hochberg (BH) is a more balanced alternative to Bonferroni for handling multiple comparisons. Instead of controlling the probability that *any* significant result is a false positive (the family-wise error rate that Bonferroni targets), BH controls the **false discovery rate (FDR)** — the expected *proportion* of your "winners" that are actually false positives.

We never say "FDR" again, so probably don't need to include the abbreviation here.
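For reference while reviewing the BH wording, here is a minimal sketch of the textbook Benjamini-Hochberg step-up procedure next to Bonferroni. It is not taken from Mixpanel's implementation, which this PR does not show. The function names, the `alpha` default, and the five p-values are hypothetical and chosen only for illustration; they are not the numbers from the worked example in the docs.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Textbook BH step-up procedure; returns True where the null is rejected."""
    m = len(p_values)
    # Sort metric indices by ascending p-value; rank 1 is the smallest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Bonferroni: compare every p-value against alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values for five metrics (illustrative only).
p_values = [0.003, 0.012, 0.021, 0.038, 0.300]

bh_result = benjamini_hochberg(p_values)
bonf_result = bonferroni(p_values)
print(bh_result)    # [True, True, True, True, False]
print(bonf_result)  # [True, False, False, False, False]
```

With these made-up inputs, BH keeps four metrics as significant while Bonferroni keeps one, which is the practical contrast the new section describes: BH tolerates a controlled share of false positives among the winners in exchange for more power.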


**How it works:** For each user, Mixpanel looks at their metric value during a pre-exposure period of your choosing and their metric value during the experiment. If these values are strongly correlated (users with high pre-experiment values tend to have high post-experiment values), CUPED uses this relationship to reduce variance in the experiment results. The mean values remain unchanged—CUPED only tightens the confidence intervals. This is applied to all metric categories: primary, secondary, and guardrail metrics.
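The adjustment described above can be sketched as follows. This is the standard CUPED formula applied to a single group of simulated users, under the assumption that the pre-exposure value is used as the only covariate; it is not Mixpanel's implementation, and the helper names and simulated data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_adjust(pre, post):
    """Return CUPED-adjusted values: y - theta * (x - mean(x))."""
    # theta is the regression slope of the experiment value on the pre-exposure value.
    theta = covariance(pre, post) / variance(pre)
    pre_mean = mean(pre)
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


# Simulated users (illustrative): experiment-period values track pre-exposure values.
random.seed(0)
pre = [random.gauss(10, 2) for _ in range(2000)]
post = [x + random.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(pre, post)
print(mean(post), mean(adjusted))          # means agree up to floating-point error
print(variance(post), variance(adjusted))  # adjusted variance is much smaller
```

Because the correction term averages to zero, the mean is unchanged, and only the spread shrinks. The stronger the correlation between pre-exposure and experiment values, the larger the reduction.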

**Configuring the pre-exposure period:** When you enable CUPED, you can choose the lookback window under **Configuration → CUPED Pre-Exposure Period**. The available options are **1 Week**, **2 Weeks** (default), **4 Weeks**, **60 Days**, and **90 Days**. Longer windows give CUPED more historical data per user, which can improve variance reduction when behavior is stable over time, but they also exclude users whose history doesn't reach that far back. Shorter windows include more users but may capture less predictive signal. Two weeks is a good default for most experiments; consider a longer window if your metric has slow-moving or seasonal patterns.

This isn't correct. When you set the pre-exposure period, that's the length of time we'll query going back. The more you expand this pre-exposure period, the more users we'll actually pick up. We only filter out people who were active during this period but weren't exposed to the experiment.

I think the right point to make here is that you want to choose the pre-exposure period so that it doesn't overlap with other experiments on the same user population as this one; otherwise there is a loss of sensitivity and potential bias. It should also be long enough to capture user behavior (say, if some of your users are active only once a month, you want to choose a range long enough to see them).

I also asked Claude for feedback; here is a relevant point it gave:

A longer window is not always better — stale behavior from far in the past can be less predictive of current behavior, and flooding the population with X = 0 (new users, infrequent users) drags ρ toward zero.
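A small simulation can illustrate the ρ point. The fraction of variance CUPED removes is ρ², where ρ is the correlation between pre-exposure and experiment values. The sketch below assumes, hypothetically, that users with no pre-period activity (X = 0) behave like everyone else during the experiment; the population sizes and distributions are made up for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


random.seed(1)

# Users with real pre-period history: experiment values track pre-period values.
active_pre = [random.gauss(10, 2) for _ in range(2000)]
active_post = [x + random.gauss(0, 1) for x in active_pre]

# Hypothetical users with no pre-period activity (X = 0) whose experiment
# values come from a similar distribution to the active users' values.
new_pre = [0.0] * 2000
new_post = [random.gauss(10, 2.2) for _ in range(2000)]

rho_active = pearson(active_pre, active_post)
rho_mixed = pearson(active_pre + new_pre, active_post + new_post)

# Share of variance CUPED could remove in each population (rho squared).
print(round(rho_active ** 2, 2), round(rho_mixed ** 2, 2))
```

Under these assumptions the mixed population has a much weaker correlation, so CUPED would remove far less variance. If the zero-history users also had systematically different experiment values, the effect could look different, so this is only one scenario.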
