diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3332a47..50c4bc1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,37 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.0] - 2026-01-04
+
+### Added
+- **Goodman-Bacon decomposition** for TWFE diagnostics
+  - `BaconDecomposition` class for decomposing TWFE into weighted 2x2 comparisons
+  - `Comparison2x2` dataclass for individual comparisons (treated_vs_never, earlier_vs_later, later_vs_earlier)
+  - `BaconDecompositionResults` with weights and estimates by comparison type
+  - `bacon_decompose()` convenience function
+  - `plot_bacon()` visualization for decomposition results
+  - Integration via `TwoWayFixedEffects.decompose()` method
+- **Power analysis** for study design
+  - `PowerAnalysis` class for analytical power calculations
+  - `PowerResults` and `SimulationPowerResults` dataclasses
+  - `compute_mde()`, `compute_power()`, `compute_sample_size()` convenience functions
+  - `simulate_power()` for Monte Carlo simulation-based power analysis
+  - `plot_power_curve()` visualization for power analysis
+  - Tutorial notebook: `docs/tutorials/06_power_analysis.ipynb`
+- **Callaway-Sant'Anna multiplier bootstrap** for inference
+  - `CSBootstrapResults` with standard errors, confidence intervals, p-values
+  - Rademacher, Mammen, and Webb weight distributions
+  - Bootstrap inference for all aggregation methods
+- **Troubleshooting guide** in documentation
+- **Standard error computation guide** explaining SE differences across estimators
+
+### Changed
+- Updated package status to Production/Stable (was Alpha)
+- SyntheticDiD bootstrap now warns when >5% of iterations fail
+
+### Fixed
+- Silent bootstrap failures in SyntheticDiD now produce warnings
+
 ## [0.6.0]
 
 ### Added
@@ -136,6 +167,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `to_dict()` and `to_dataframe()` export methods
 - `is_significant` and `significance_stars` properties
 
+[1.0.0]: https://github.com/igerber/diff-diff/compare/v0.6.0...v1.0.0
 [0.6.0]: https://github.com/igerber/diff-diff/compare/v0.5.0...v0.6.0
 [0.5.0]: https://github.com/igerber/diff-diff/compare/v0.4.0...v0.5.0
 [0.4.0]: https://github.com/igerber/diff-diff/compare/v0.3.0...v0.4.0
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
index f029093..47d3238 100644
--- a/diff_diff/__init__.py
+++ b/diff_diff/__init__.py
@@ -85,7 +85,7 @@
     plot_sensitivity,
 )
-__version__ = "0.9.0"
+__version__ = "1.0.0"
 
 __all__ = [
     # Estimators
     "DifferenceInDifferences",
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
index a2df476..808dc3b 100644
--- a/diff_diff/estimators.py
+++ b/diff_diff/estimators.py
@@ -1774,6 +1774,20 @@ def _bootstrap_se(
                 continue
 
         bootstrap_estimates = np.array(bootstrap_estimates)
+
+        # Warn if too many bootstrap iterations failed
+        n_successful = len(bootstrap_estimates)
+        failure_rate = 1 - (n_successful / self.n_bootstrap)
+        if failure_rate > 0.05:
+            warnings.warn(
+                f"Only {n_successful}/{self.n_bootstrap} bootstrap iterations succeeded "
+                f"({failure_rate:.1%} failure rate). Standard errors may be unreliable. "
+                f"This can occur with small samples, near-singular weight matrices, "
+                f"or insufficient pre-treatment periods.",
+                UserWarning,
+                stacklevel=2,
+            )
+
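+        # SE is the sample standard deviation (ddof=1) over the successful
+        # draws only; the warning above fires when more than 5% of draws failed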
" + f"This can occur with small samples, near-singular weight matrices, " + f"or insufficient pre-treatment periods.", + UserWarning, + stacklevel=2, + ) + se = np.std(bootstrap_estimates, ddof=1) if len(bootstrap_estimates) > 1 else 0.0 return se, bootstrap_estimates diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index f0a1965..ceb48dd 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -205,6 +205,57 @@ Common Pitfalls *Solution*: Always specify ``cluster_col`` for panel data. +Standard Error Methods +---------------------- + +Different estimators compute standard errors differently. Understanding these +differences helps interpret results and choose appropriate inference. + +.. list-table:: + :header-rows: 1 + :widths: 20 25 55 + + * - Estimator + - Default SE Method + - Details + * - ``DifferenceInDifferences`` + - HC1 (heteroskedasticity-robust) + - Uses White's robust SEs by default. Specify ``cluster_col`` for cluster-robust SEs. Use ``inference='wild_bootstrap'`` for few clusters (<30). + * - ``TwoWayFixedEffects`` + - Cluster-robust (unit level) + - Always clusters at unit level after within-transformation. Specify ``cluster_col`` to override. Use ``inference='wild_bootstrap'`` for few clusters. + * - ``MultiPeriodDiD`` + - HC1 (heteroskedasticity-robust) + - Same as basic DiD. Cluster-robust available via ``cluster_col``. Wild bootstrap not yet supported for multi-coefficient inference. + * - ``CallawaySantAnna`` + - Analytical (simple difference) + - Uses simple variance of group-time means. Use ``bootstrap()`` method for multiplier bootstrap inference with proper SEs, CIs, and p-values. + * - ``SyntheticDiD`` + - Bootstrap or placebo-based + - Default uses bootstrap resampling. Set ``n_bootstrap=0`` for placebo-based inference using pre-treatment residuals. + +**Recommendations by sample size:** + +- **Large samples (N > 1000, clusters > 50)**: Default analytical SEs are reliable +- **Medium samples (clusters 30-50)**: Cluster-robust SEs recommended +- **Small samples (clusters < 30)**: Use wild cluster bootstrap (``inference='wild_bootstrap'``) +- **Very few clusters (< 10)**: Use Webb 6-point distribution (``weight_type='webb'``) + +**Common pitfall:** Forgetting to cluster when units are observed multiple times. +For panel data, always cluster at the unit level unless you have a strong reason not to. + +.. code-block:: python + + # Good: Cluster at unit level for panel data + did = DifferenceInDifferences() + results = did.fit(data, outcome='y', treated='treated', + post='post', cluster_col='unit_id') + + # Better for few clusters: Wild bootstrap + did = DifferenceInDifferences(inference='wild_bootstrap') + results = did.fit(data, outcome='y', treated='treated', + post='post', cluster_col='state') + When in Doubt ------------- diff --git a/docs/index.rst b/docs/index.rst index cad1203..b60a9d4 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -40,6 +40,7 @@ Quick Links - :doc:`quickstart` - Get started with basic examples - :doc:`choosing_estimator` - Which estimator should I use? 
+
 When in Doubt
 -------------
diff --git a/docs/index.rst b/docs/index.rst
index cad1203..b60a9d4 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -40,6 +40,7 @@ Quick Links
 
 - :doc:`quickstart` - Get started with basic examples
 - :doc:`choosing_estimator` - Which estimator should I use?
+- :doc:`troubleshooting` - Common issues and solutions
 - :doc:`r_comparison` - Comparison with R packages
 - :doc:`python_comparison` - Comparison with Python packages
 - :doc:`api/index` - Full API reference
@@ -51,6 +52,7 @@
 
    quickstart
    choosing_estimator
+   troubleshooting
    r_comparison
    python_comparison
diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.rst
new file mode 100644
index 0000000..19b86b9
--- /dev/null
+++ b/docs/troubleshooting.rst
@@ -0,0 +1,317 @@
+Troubleshooting
+===============
+
+This guide covers common issues and their solutions when using diff-diff.
+
+Data Issues
+-----------
+
+"No treated observations found"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** The estimator raises an error because no treated units were found.
+
+**Causes:**
+
+1. Treatment column contains wrong values (e.g., strings instead of 0/1)
+2. Treatment column has all zeros
+3. Column name is misspelled
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Check your treatment column
+    print(data['treated'].value_counts())
+
+    # Ensure binary 0/1 values
+    data['treated'] = (data['group'] == 'treatment').astype(int)
+
+    # Or use make_treatment_indicator
+    from diff_diff import make_treatment_indicator
+    data['treated'] = make_treatment_indicator(data, 'group', treated_value='treatment')
+
+"Panel is unbalanced"
+~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** TwoWayFixedEffects or CallawaySantAnna fails with an unbalanced panel.
+
+**Causes:**
+
+1. Some units are missing observations for certain time periods
+2. Units have different numbers of observations
+
+**Solutions:**
+
+.. code-block:: python
+
+    from diff_diff import balance_panel
+
+    # Balance the panel (keeps only units with all periods)
+    balanced = balance_panel(data, unit='unit_id', time='period')
+    print(f"Dropped {len(data) - len(balanced)} observations")
+
+    # Alternative: check balance first
+    from diff_diff import validate_did_data
+    issues = validate_did_data(data, outcome='y', treated='treated',
+                               unit='unit_id', time='period')
+    print(issues)
+
+Estimation Errors
+-----------------
+
+"Singular matrix" or "Matrix is singular"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** A linear algebra error occurs during estimation.
+
+**Causes:**
+
+1. Perfect collinearity in covariates
+2. Too few observations relative to parameters
+3. Fixed effects that absorb all variation
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Check for collinearity
+    import numpy as np
+    X = data[['x1', 'x2', 'x3']].values
+    print(f"Matrix rank: {np.linalg.matrix_rank(X)} vs {X.shape[1]} columns")
+
+    # Remove redundant covariates
+    # Or use fewer fixed effects
+
+    # For SyntheticDiD, increase regularization
+    sdid = SyntheticDiD(lambda_reg=1e-4)  # default is 1e-6
+
+"Bootstrap iterations failed" warning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** SyntheticDiD warns that many bootstrap iterations failed.
+
+**Causes:**
+
+1. Small sample size leads to singular matrices in resamples
+2. Insufficient pre-treatment periods for weight computation
+3. Near-singular weight matrices
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Increase regularization
+    sdid = SyntheticDiD(lambda_reg=1e-4, n_bootstrap=500)
+
+    # Or use placebo-based inference instead
+    sdid = SyntheticDiD(n_bootstrap=0)  # Uses placebo inference
+
+    # Ensure sufficient pre-treatment periods (recommend >= 4)
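+
+To check for this failure programmatically, capture the warning with the
+standard library. A minimal sketch: the matched text comes from the warning
+message ``SyntheticDiD`` emits, and the ``fit`` arguments are elided as
+elsewhere in this guide.
+
+.. code-block:: python
+
+    import warnings
+
+    with warnings.catch_warnings(record=True) as caught:
+        warnings.simplefilter("always")
+        results = sdid.fit(data, ...)
+
+    if any("bootstrap iterations succeeded" in str(w.message) for w in caught):
+        print("Bootstrap SEs may be unreliable; consider n_bootstrap=0.")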
+
+Standard Error Issues
+---------------------
+
+"Standard errors seem too small/large"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** SEs don't match expectations or other software.
+
+**Causes:**
+
+1. Wrong clustering level
+2. Not accounting for serial correlation
+3. Different SE formulas (HC0 vs HC1 vs cluster)
+
+**Solutions:**
+
+.. code-block:: python
+
+    # For panel data, always cluster at unit level
+    results = did.fit(data, outcome='y', treated='treated',
+                      post='post', cluster_col='unit_id')
+
+    # Compare SE methods
+    did_robust = DifferenceInDifferences()
+    did_cluster = DifferenceInDifferences()
+    did_wild = DifferenceInDifferences(inference='wild_bootstrap')
+
+    r1 = did_robust.fit(data, outcome='y', treated='treated', post='post')
+    r2 = did_cluster.fit(data, outcome='y', treated='treated',
+                         post='post', cluster_col='unit_id')
+    r3 = did_wild.fit(data, outcome='y', treated='treated',
+                      post='post', cluster_col='unit_id')
+
+    print(f"Robust SE: {r1.se:.4f}")
+    print(f"Cluster SE: {r2.se:.4f}")
+    print(f"Wild bootstrap SE: {r3.se:.4f}")
+
+"Wild bootstrap takes too long"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** Bootstrap inference is slow.
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Reduce the number of bootstrap iterations (default is 999)
+    did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=499)
+
+    # Note: Fewer iterations = less precise p-values
+    # 499 is the minimum recommended for publication
+
+Staggered Adoption Issues
+-------------------------
+
+"No never-treated units found"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** CallawaySantAnna fails when using ``control_group='never_treated'``.
+
+**Causes:**
+
+1. All units are eventually treated
+2. ``first_treat`` column has no never-treated indicator (typically 0 or inf)
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Check first_treat distribution
+    print(data['first_treat'].value_counts())
+
+    # Option 1: Use not-yet-treated as controls
+    cs = CallawaySantAnna(control_group='not_yet_treated')
+
+    # Option 2: Mark never-treated units correctly
+    # Never-treated should have first_treat = 0 or np.inf
+    data.loc[data['ever_treated'] == 0, 'first_treat'] = 0
+
+"Group-time effects have large standard errors"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** ATT(g,t) estimates are imprecise.
+
+**Causes:**
+
+1. Small cohort sizes
+2. Few comparison periods
+3. High variance in outcomes
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Check cohort sizes
+    print(data.groupby('first_treat')['unit_id'].nunique())
+
+    # Use bootstrap for better inference
+    results = cs.fit(data, ...)
+    bootstrap_results = results.bootstrap(n_bootstrap=999)
+
+    # Aggregate to get more precise estimates
+    event_study = results.aggregate('event_time')
+    overall_att = results.att  # Aggregated ATT
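+
+The multiplier bootstrap added in 1.0.0 supports Rademacher, Mammen, and Webb
+weight distributions; Webb weights may behave better when cohorts contain very
+few units. A sketch, assuming the distribution is chosen via a ``weight_type``
+argument that mirrors the wild-bootstrap option:
+
+.. code-block:: python
+
+    # Webb weights (assumed keyword; see the changelog's supported distributions)
+    bootstrap_results = results.bootstrap(n_bootstrap=999, weight_type='webb')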
+
+Visualization Issues
+--------------------
+
+"Event study plot looks wrong"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** Plot has unexpected gaps, wrong reference period, or missing periods.
+
+**Solutions:**
+
+.. code-block:: python
+
+    from diff_diff import plot_event_study
+
+    # Check your results first
+    print(results.period_effects)  # or results.event_study_effects
+
+    # Specify reference period explicitly
+    plot_event_study(results, reference_period=-1)
+
+    # For CallawaySantAnna, aggregate first
+    event_study = results.aggregate('event_time')
+    plot_event_study(event_study)
+
+"Plot doesn't show in Jupyter"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** Matplotlib figure doesn't display.
+
+**Solutions:**
+
+.. code-block:: python
+
+    import matplotlib.pyplot as plt
+
+    # Option 1: Use plt.show()
+    fig = plot_event_study(results)
+    plt.show()
+
+    # Option 2: Use inline magic (Jupyter)
+    %matplotlib inline
+
+    # Option 3: Return and display figure
+    fig = plot_event_study(results)
+    fig  # Display in Jupyter
+
+Performance Issues
+------------------
+
+"Estimation is slow"
+~~~~~~~~~~~~~~~~~~~~
+
+**Problem:** Fitting takes a long time.
+
+**Causes:**
+
+1. Large dataset with many fixed effects
+2. Bootstrap inference with many iterations
+3. CallawaySantAnna with many cohorts and time periods
+
+**Solutions:**
+
+.. code-block:: python
+
+    # Use absorb instead of fixed_effects for high-dimensional FE
+    twfe = TwoWayFixedEffects()
+    results = twfe.fit(data, outcome='y', treated='treated',
+                       unit='unit_id', time='period',
+                       absorb=['unit_id', 'period'])  # Faster than fixed_effects
+
+    # Reduce bootstrap iterations for initial exploration
+    did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=99)
+
+    # For CallawaySantAnna, start without bootstrap
+    cs = CallawaySantAnna()
+    results = cs.fit(data, ...)
+    # Only bootstrap for final results
+    bootstrap_results = results.bootstrap(n_bootstrap=999)
+
+Getting Help
+------------
+
+If you encounter issues not covered here:
+
+1. **Check the API documentation** for parameter details
+2. **Run validation** with ``validate_did_data()`` to catch data issues
+3. **Start simple** with basic DiD before adding complexity
+4. **Compare with known results** using ``generate_did_data()``
+
+.. code-block:: python
+
+    # Generate test data with known effect
+    from diff_diff import generate_did_data, DifferenceInDifferences
+
+    data = generate_did_data(n_units=100, n_periods=10, treatment_effect=2.0)
+    did = DifferenceInDifferences()
+    results = did.fit(data, outcome='y', treated='treated', post='post')
+    print(f"True effect: 2.0, Estimated: {results.att:.3f}")
+
+For bugs or feature requests, please open an issue on
+`GitHub <https://github.com/igerber/diff-diff/issues>`_.
diff --git a/pyproject.toml b/pyproject.toml
index 90187cb..d23b5b8 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "diff-diff"
-version = "0.6.0"
+version = "1.0.0"
 description = "A library for Difference-in-Differences causal inference analysis"
 readme = "README.md"
 license = "MIT"
@@ -20,7 +20,7 @@ keywords = [
     "treatment-effects",
 ]
 classifiers = [
-    "Development Status :: 3 - Alpha",
+    "Development Status :: 5 - Production/Stable",
     "Intended Audience :: Science/Research",
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3",