docs: add Benjamini-Hochberg correction and configurable CUPED lookback #2729
russell-loube-mixpanel wants to merge 1 commit into
- Document Benjamini-Hochberg as a multiple-testing correction option alongside Bonferroni, with worked example and guidance on choosing between FDR and FWER methods.
- Document the configurable CUPED Pre-Exposure Period (1W, 2W default, 4W, 60D, 90D) and tradeoffs for choosing a window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
### Benjamini-Hochberg Correction
Benjamini-Hochberg (BH) is a more balanced alternative to Bonferroni for handling multiple comparisons. Instead of controlling the probability that *any* significant result is a false positive (the family-wise error rate that Bonferroni targets), BH controls the **false discovery rate (FDR)**: the expected *proportion* of your "winners" that are actually false positives.
We never say "FDR" again, so probably don't need to include the abbreviation here.
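For reference while reviewing the paragraph above, here is a minimal sketch of the standard BH step-up procedure next to Bonferroni. It is the textbook procedure, not necessarily how Mixpanel implements it; the function names, the example p-values, and the 0.05 level are all illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Textbook BH step-up: True where a hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Bonferroni: reject only where p <= alpha / m (controls FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values for five metrics in one experiment.
p_values = [0.001, 0.012, 0.021, 0.035, 0.30]
print("BH:        ", benjamini_hochberg(p_values))
print("Bonferroni:", bonferroni(p_values))
```

On these made-up inputs BH flags four metrics while Bonferroni flags one, which is the "more balanced" tradeoff the paragraph describes: more discoveries, at the cost of tolerating a controlled share of false ones.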
**How it works:** For each user, Mixpanel looks at their metric value during a pre-exposure period of your choosing and their metric value during the experiment. If these values are strongly correlated (users with high pre-experiment values tend to have high post-experiment values), CUPED uses this relationship to reduce variance in the experiment results. The mean values remain unchanged; CUPED only tightens the confidence intervals. This is applied to all metric categories: primary, secondary, and guardrail metrics.
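The mechanism in this paragraph can be illustrated with the standard CUPED adjustment, `Y_adj = Y - theta * (X - mean(X))` with `theta = cov(X, Y) / var(X)`. The sketch below is a simplification under stated assumptions: a single group of simulated users, synthetic data, and a hypothetical `cuped_adjust` helper. It does not reflect Mixpanel's actual implementation.

```python
import random
import statistics


def cuped_adjust(pre, post):
    """Return CUPED-adjusted values: post - theta * (pre - mean(pre))."""
    mean_pre = statistics.fmean(pre)
    mean_post = statistics.fmean(post)
    # Sample covariance between pre-exposure and in-experiment values.
    cov = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post)) / (len(pre) - 1)
    theta = cov / statistics.variance(pre)
    return [y - theta * (x - mean_pre) for x, y in zip(pre, post)]


# Synthetic users whose in-experiment value tracks their pre-exposure value.
rng = random.Random(0)
pre = [rng.gauss(10, 3) for _ in range(2000)]
post = [x + rng.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(pre, post)
print("mean before/after:    ", statistics.fmean(post), statistics.fmean(adjusted))
print("variance before/after:", statistics.variance(post), statistics.variance(adjusted))
```

Because the correction term is centered on the pre-exposure mean, the group mean is preserved while the variance shrinks, which is what narrows the confidence interval.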
**Configuring the pre-exposure period:** When you enable CUPED, you can choose the lookback window under **Configuration → CUPED Pre-Exposure Period**. The available options are **1 Week**, **2 Weeks** (default), **4 Weeks**, **60 Days**, and **90 Days**. Longer windows give CUPED more historical data per user, which can improve variance reduction when behavior is stable over time, but they also exclude users whose history doesn't reach that far back. Shorter windows include more users but may capture less predictive signal. Two weeks is a good default for most experiments; consider a longer window if your metric has slow-moving or seasonal patterns.
This isn't correct. The pre-exposure period you set is the length of time we query going back, so the more you expand it, the more users we actually pick up. We only filter out people who were active during this period but weren't exposed to the experiment.
I think the right point to make here is that you should choose a pre-exposure period that doesn't overlap with other experiments on the same user population as this one; otherwise there is a loss of sensitivity and potential bias. It should also be long enough to capture user behavior (say, if some of your users are active only once a month, choose a range long enough to see them).
I also asked Claude for feedback; here is a relevant point it gave:
> A longer window is not always better: stale behavior from far in the past can be less predictive of current behavior, and flooding the population with X = 0 (new users, infrequent users) drags ρ toward zero.
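To make the ρ point concrete: CUPED's variance reduction factor is 1 − ρ², where ρ is the correlation between the pre-exposure value X and the in-experiment value Y. The rough simulation below shows how adding users with X = 0 can pull ρ down. All numbers are synthetic, the `pearson` helper is illustrative, and the assumption that new users behave like existing ones during the experiment is made up for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)

# Users with history: the in-experiment value closely tracks the pre-exposure value.
active_pre = [rng.gauss(10, 3) for _ in range(2000)]
active_post = [x + rng.gauss(0, 1) for x in active_pre]

# Users with no activity in the window: X = 0, so X says nothing about Y.
new_pre = [0.0] * 2000
new_post = [rng.gauss(10, 3.2) for _ in range(2000)]

rho_active = pearson(active_pre, active_post)
rho_mixed = pearson(active_pre + new_pre, active_post + new_post)

print("rho, users with history only:", rho_active)
print("rho, with X = 0 users added: ", rho_mixed)
print("remaining variance (1 - rho^2):", 1 - rho_active ** 2, "vs", 1 - rho_mixed ** 2)
```

In this setup the mixed population has a much weaker correlation, so CUPED removes far less variance even though more users are included.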
Summary
Updates the Advanced Statistical Methods section of the Experiments docs to cover two features that are now exposed in the product UI: Benjamini-Hochberg correction and the configurable CUPED Pre-Exposure Period.
Also adds a new row in the "Methods at a Glance" table for Benjamini-Hochberg and a small clarifying tail to the Bonferroni row's "When to Use" cell to contrast the two methods.
No other sections were touched.
Test plan
🤖 Generated with Claude Code