xtbreak

Fault Lines — Structural Break Testing & Estimation

Testing and estimating multiple structural breaks in time-series and panel regressions, based on Bai & Perron (1998, 2003) and Ditzen, Karavias & Westerlund (2022, 2025).

Overview

Regression parameters change over time. A policy shift, a financial crisis, or a technology adoption can move intercepts, slopes, or both. The analyst's problem is not just detecting that something changed; it is knowing when the change occurred, which parameters moved, and which stayed constant across regimes, while retaining the full search path that led to that conclusion.

xtbreak provides:

  • Formal hypothesis tests for structural breaks (supF for a known number of breaks, UDmax/WDmax for an unknown number, sequential l vs l+1).
  • Break-date estimation via dynamic programming over the global SSR surface.
  • Regime-specific coefficients with explicit common/breaking role separation.
  • Post-estimation accessors for the SSR path, segment decomposition, and break-date confidence intervals.

The underlying theory comes from Bai & Perron (1998) for the testing and estimation framework, Bai & Perron (2003) for the computational algorithm, and Ditzen, Karavias & Westerlund (2022, 2025) for the panel extension with common correlated effects.
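The dynamic-programming idea can be illustrated in a few lines of base R. This is a minimal sketch of the Bai and Perron (2003) recursion, not xtbreak's internal code. It assumes a pure structural change model fitted by OLS in every segment; the helper name `dp_breaks` and its arguments (`m` breaks, minimum segment length `h`) are hypothetical.

```r
# Hypothetical helper (not part of xtbreak): global SSR minimisation for
# m breaks by dynamic programming, with a separate OLS fit in each segment.
dp_breaks <- function(y, X, m, h) {
  n <- length(y)
  stopifnot(m >= 1, (m + 1) * h <= n)
  # ssr[i, j] = SSR of an OLS fit on observations i..j (Inf if shorter than h)
  ssr <- matrix(Inf, n, n)
  for (i in 1:(n - h + 1)) {
    for (j in (i + h - 1):n) {
      idx <- i:j
      ssr[i, j] <- sum(lm.fit(X[idx, , drop = FALSE], y[idx])$residuals^2)
    }
  }
  # cost[k, j] = minimal SSR when observations 1..j are split into k segments
  cost <- matrix(Inf, m + 1, n)
  last <- matrix(NA_integer_, m + 1, n)   # end of segment k - 1 at the optimum
  cost[1, ] <- ssr[1, ]
  for (k in 2:(m + 1)) {
    for (j in (k * h):n) {
      ends <- ((k - 1) * h):(j - h)       # admissible dates for the previous break
      cand <- cost[k - 1, ends] + ssr[ends + 1, j]
      best <- which.min(cand)
      cost[k, j] <- cand[best]
      last[k, j] <- ends[best]
    }
  }
  # Backtrack from the full sample to recover the break dates
  breaks <- integer(m)
  j <- n
  for (k in (m + 1):2) {
    breaks[k - 1] <- last[k, j]
    j <- last[k, j]
  }
  list(breaks = breaks, ssr = cost[m + 1, n])
}

# Illustrative data: intercept shifts after t = 30 and t = 60
set.seed(1)
n <- 90
x <- rnorm(n)
y <- rep(c(0, 4, -2), each = 30) + x + rnorm(n, sd = 0.5)
res <- dp_breaks(y, cbind(1, x), m = 2, h = 10)
res$breaks
```

The segment SSR matrix is computed once and reused for every break count, which is why the search stays cheap even though the number of possible partitions grows quickly with the number of breaks.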

Installation

remotes::install_github("gorgeousfish/xtbreak")

Requires R >= 4.1. No compiled code; no external dependencies beyond base R.

Quick Start

library(xtbreak)

# Simulate a time series with one structural break at t = 60
set.seed(42)
n <- 100
x <- rnorm(n)
y <- c(2 + 3*x[1:60] + rnorm(60), -1 + 0.5*x[61:100] + rnorm(40))
d <- data.frame(y = y, x = x, time = 1:n)

# Estimate the break location
fit <- xtbreak_estimate(y ~ x, data = d, index = "time",
                        breaks = 1, breakconstant = TRUE)
fit
#> xtbreak estimate result
#> -----------------------
#> Result type: fixed-break estimate
#> Observations: 100
#> Breaks: 1
#> SSR: 76.7217888831
#> Trimming: 0.15
#> Minimum segment: 15 periods
#> Model:
#>   breaking=(Intercept)/x, fitted_parameters=2
#>
#> Estimated breaks:
#>  break time_value
#>      1         60

coef(fit)
#>          (Intercept)         x
#> regime_1   1.8363314 3.1257254
#> regime_2  -0.9358977 0.2338294

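The true DGP has intercepts (2, -1) and slopes (3, 0.5). The break is located at its true position, t = 60, and the estimated coefficients are close to the generating parameters. The regime-2 slope (0.23 against a true 0.5) is the least precise, which is unsurprising for a segment of only 40 observations with unit error variance.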
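For a single break, the search can be reproduced by brute force in base R, which helps when sanity-checking a result. The sketch below regenerates the same simulated data and scans every admissible break date. The trimming rule (minimum segment of `floor(0.15 * n)` = 15 periods) is inferred from the printed output, and `one_break_ssr` is a hypothetical helper, not an xtbreak function.

```r
# Same simulated data as in the Quick Start
set.seed(42)
n <- 100
x <- rnorm(n)
y <- c(2 + 3*x[1:60] + rnorm(60), -1 + 0.5*x[61:100] + rnorm(40))

# Hypothetical helper: total SSR of two separate OLS fits, split after period tb
one_break_ssr <- function(tb) {
  left  <- 1:tb
  right <- (tb + 1):n
  sum(resid(lm(y[left] ~ x[left]))^2) + sum(resid(lm(y[right] ~ x[right]))^2)
}

h <- floor(0.15 * n)                  # minimum segment length (assumed rule)
candidates <- h:(n - h)               # admissible break dates
path <- vapply(candidates, one_break_ssr, numeric(1))
best <- candidates[which.min(path)]
c(break_date = best, ssr = min(path))
```

If xtbreak minimises the same objective over the same admissible region, the break date and SSR should agree with the values printed by `fit`; a mismatch would point to a different trimming or demeaning convention.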

Recommended Workflow

A complete structural break analysis has three stages: test whether breaks exist, estimate their locations, then inspect the fitted result.

# --- Step 1: Test ---
# supF test (H1): no break against the alternative of exactly 1 break
test <- xtbreak_test(y ~ x, data = d, index = "time",
                     breaks = 1, hypothesis = "H1",
                     breakconstant = TRUE)
test
#> xtbreak test result
#> -------------------
#> Result type: structural-break test result
#> Hypothesis: H1
#> Test: unknown_supf
#> Statistic: 244.885998738
#> Critical values: c90=4.905, c95=5.735, c99=7.685
#> Rejections: c90=yes, c95=yes, c99=yes

# --- Step 2: Estimate ---
fit <- xtbreak_estimate(y ~ x, data = d, index = "time",
                        breaks = 1, breakconstant = TRUE)
coef(fit)
#>          (Intercept)         x
#> regime_1   1.8363314 3.1257254
#> regime_2  -0.9358977 0.2338294

# --- Step 3: Diagnose ---
confint(fit, data = d)
#>   break index time_value lower_index upper_index lower_time_value
#> 1     1    60         60          59          61               59
#>   upper_time_value level
#> 1               61  0.95

regime_coefficients(fit)
#>   regime start_index end_index start_time_value end_time_value n_periods
#> 1      1           1        60                1             60        60
#> 2      1           1        60                1             60        60
#> 3      2          61       100               61            100        40
#> 4      2          61       100               61            100        40
#>          term   estimate
#> 1 (Intercept)  1.8363314
#> 2           x  3.1257254
#> 3 (Intercept) -0.9358977
#> 4           x  0.2338294

Or use xtbreak() for the full automatic path (UDmax pretest, sequential testing, then estimation):

result <- xtbreak(y ~ x, data = d, index = "time", breakconstant = TRUE)
result
#> xtbreak result
#> --------------
#> Result type: automatic break selection
#> Observations: 100
#> Breaks: 1
#> H2 pretest: run
#> H2 decision: test=supF, statistic=244.886, critical=5.85, level=0.95, reject=yes
#> Sequential decision: test=F(1|0), statistic=244.886, critical=5.735, level=0.95, reject=yes
#> Estimate SSR: 76.7217888831
#>
#> Estimated breaks:
#>  break time_value
#>      1         60
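To see what a supF statistic measures, it can help to compute a textbook version by hand. The sketch below takes the largest Chow-type F statistic over all admissible break dates, assuming homoskedastic errors and a minimum segment of 15 periods. It is not xtbreak's implementation: the package may scale or robustify the statistic differently, so the number need not match the value printed in Step 1. The helper name `chow_f` is hypothetical.

```r
# Same simulated data as in the Quick Start
set.seed(42)
n <- 100
x <- rnorm(n)
y <- c(2 + 3*x[1:60] + rnorm(60), -1 + 0.5*x[61:100] + rnorm(40))

q <- 2                                  # breaking parameters: intercept and slope
ssr0 <- sum(resid(lm(y ~ x))^2)         # restricted model: no break

# Hypothetical helper: Chow-type F statistic for a break after period tb
chow_f <- function(tb) {
  regime <- factor(seq_len(n) > tb)
  ssr1 <- sum(resid(lm(y ~ regime * x))^2)   # unrestricted: both coefficients break
  ((ssr0 - ssr1) / q) / (ssr1 / (n - 2 * q))
}

h <- floor(0.15 * n)                    # trimming rule assumed from the output above
dates <- h:(n - h)
f_path <- vapply(dates, chow_f, numeric(1))
sup_f <- max(f_path)
c(sup_f = sup_f, at = dates[which.max(f_path)])
```

Because the restricted SSR is fixed, the date that maximises F is the same date that minimises the one-break SSR, which is why testing and estimation share the same search.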

Formula Syntax

The pipe | separates breaking variables (left) from common variables (right). Variables on the left have regime-specific coefficients; variables on the right are constrained to be equal across all regimes.

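# d2 is assumed to be a data frame like d with an extra regressor z (construction not shown)
# The coefficient on x differs by regime; the coefficient on z is common to all regimes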
fit <- xtbreak_estimate(y ~ x | z, data = d2, index = "time",
                        breaks = 1, breakconstant = TRUE)
coef(fit)
#>          (Intercept)         x         z
#> regime_1   1.9344561 3.0903648 0.4548992
#> regime_2  -0.8086968 0.4323753 0.4548992

Equivalently, use the nobreak argument:

fit <- xtbreak_estimate(y ~ x + z, data = d2, index = "time",
                        breaks = 1, breakconstant = TRUE,
                        nobreak = "z")
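With the break date held fixed, the breaking/common split corresponds to an ordinary regression in which only the breaking variables are interacted with a regime indicator. The sketch below assumes a known break after period 60 and uses freshly simulated data; `d_sim` and `regime` are illustrative names, and the code does not call xtbreak.

```r
set.seed(7)
n <- 100
d_sim <- data.frame(time = 1:n, x = rnorm(n), z = rnorm(n))
d_sim$regime <- factor(ifelse(d_sim$time <= 60, "regime_1", "regime_2"))
d_sim$y <- ifelse(d_sim$time <= 60, 2 + 3 * d_sim$x, -1 + 0.5 * d_sim$x) +
  0.5 * d_sim$z + rnorm(n)

# Intercept and x get one coefficient per regime; z gets a single common coefficient
partial <- lm(y ~ 0 + regime + regime:x + z, data = d_sim)
coef(partial)
```

The fit has five coefficients (two intercepts, two slopes on x, one slope on z), which is the same structure as the `coef(fit)` table above, where the z column repeats the same value in both rows.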

Panel Data

Balanced panels with fixed effects (demeaned automatically):

set.seed(123)
N <- 5; TT <- 80
panel <- expand.grid(id = 1:N, time = 1:TT)
panel <- panel[order(panel$id, panel$time), ]
panel$x <- rnorm(N * TT)
panel$y <- ifelse(panel$time <= 50,
                  1 + 2*panel$x, -1 + 0.5*panel$x) + rnorm(N*TT, sd = 0.8)

fit_panel <- xtbreak_estimate(y ~ x, data = panel,
                              index = c("id", "time"),
                              breaks = 1, breakconstant = TRUE)
fit_panel
#> xtbreak estimate result
#> -----------------------
#> Result type: fixed-break estimate
#> Observations: 400
#> Groups: 5
#> Breaks: 1
#> SSR: 252.12994177
#>
#> Estimated breaks:
#>  break time_value
#>      1         50

coef(fit_panel)
#>          (Intercept)         x
#> regime_1   0.9907404 2.0175111
#> regime_2  -0.9743793 0.5310434

For cross-sectional averages (CCE), pass csa:

fit_cce <- xtbreak_estimate(y ~ x, data = panel,
                            index = c("id", "time"),
                            breaks = 1, csa = "x, lags(1)")
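Cross-sectional averages are simple to construct by hand. The sketch below reads `csa = "x, lags(1)"` as "the period-by-period average of x across units, plus its first lag"; that reading is an assumption based on the Stata-style syntax, and the column names `x_bar` and `x_bar_lag1` are illustrative.

```r
set.seed(123)
N <- 5; TT <- 80
panel <- expand.grid(id = 1:N, time = 1:TT)
panel <- panel[order(panel$id, panel$time), ]
panel$x <- rnorm(N * TT)

# Cross-sectional average of x in each period (same value for every unit)
panel$x_bar <- ave(panel$x, panel$time)

# One-period lag of the average, built within each unit (first period is NA);
# this relies on the rows being sorted by time within id
panel$x_bar_lag1 <- ave(panel$x_bar, panel$id,
                        FUN = function(v) c(NA, v[-length(v)]))
head(panel[panel$id == 1, ], 3)
```

In a CCE regression these averages enter as additional regressors that proxy for unobserved common factors.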

Use nofixedeffects = TRUE for the pooled panel path (no demeaning).

Core API

Entry Points

| Function | Purpose |
|---|---|
| xtbreak_estimate() | Estimate break dates for a given number of breaks (global SSR minimization via dynamic programming) |
| xtbreak_test() | Test structural break hypotheses: supF (H1), UDmax/WDmax (H2), sequential l vs l+1 (H3) |
| xtbreak() | Automatic workflow: UDmax pretest + sequential testing + estimation |

S3 Methods

summary, coef, confint, plot, predict, fitted, residuals dispatch on xtbreak_estimate objects.

Post-Estimation Accessors

| Function | Returns |
|---|---|
| regime_coefficients() | Coefficients by regime with common/breaking roles |
| ssr_path() | One-break SSR path with admissible-region flags |
| ssr_dp_path() | Dynamic-programming optimal partition path |
| ssr_frontier() | SSR frontier across break counts |
| ssr_segments() | Segment-level SSR decomposition |
| ssr_objective_audit() | Verify SSR against segment recomputation |
| confint() | Break-date confidence intervals |

Data Helpers

augment_regime(), augment_fit(), split_regimes(), split_break_variables(), scatter_regimes().

Stata Post-Estimation Mirrors

estat_indicator(), estat_split(), estat_ssr().

Current Boundaries

Supported:

  • Regular time series (single index).
  • Balanced panels with common break dates across units.
  • Fixed effects (within-group demeaning).
  • Common correlated effects (CCE) via cross-sectional averages.
  • Partial structural change (some coefficients constant across regimes).
  • HAC and nonparametric kernel variance estimators (vce argument).
  • Up to 10 breaks (limited by critical value tables).
  • Break-date confidence intervals for time-series estimates.

Not supported:

  • Unbalanced panels (parsed for interface compatibility, not numerically supported).
  • Unit-specific break dates (all units share a common break).
  • Breaks in trend specifications.
  • Row-level fitted values for fixed-effect panel estimates (available for time-series and pooled panel only).
  • Endogenous regressors or IV estimation.
  • Non-linear models.
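Since unbalanced panels are not numerically supported, it is worth checking balance before calling the estimator. The helper below is a hypothetical base-R check, not part of xtbreak; it treats a panel as balanced when every unit appears exactly once in every period.

```r
# Hypothetical helper: TRUE if every unit is observed exactly once in every period
is_balanced <- function(data, id, time) {
  counts <- table(data[[id]], data[[time]])
  all(counts == 1)
}

panel <- expand.grid(id = 1:3, time = 1:4)
is_balanced(panel, "id", "time")          # full grid
#> [1] TRUE
is_balanced(panel[-1, ], "id", "time")    # one unit-period removed
#> [1] FALSE
```

The same check also catches duplicated unit-period rows, since those produce counts greater than one.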

References

Bai, J. & Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica, 66(1), 47–78.

Bai, J. & Perron, P. (2003). Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18(1), 1–22.

Ditzen, J., Karavias, Y. & Westerlund, J. (2025). Testing and estimating structural breaks in time series and panel data in Stata. The Stata Journal, 25(3), 526–560.

Ditzen, J., Karavias, Y. & Westerlund, J. (2022). Multiple structural breaks in interactive effects panel data and the impact of quantitative easing on bank lending. arXiv preprint arXiv:2211.06707.

Authors

R Implementation:

  • Xuanyu Cai
  • Wenli Xu

Methodology:

  • Jan Ditzen, Free University of Bozen-Bolzano
  • Yiannis Karavias, University of Birmingham
  • Joakim Westerlund, Lund University

License

AGPL-3.0

Citation

If you use xtbreak in published work, please cite both the software and the methodology:

Cai, X., Xu, W., Ditzen, J., Karavias, Y. & Westerlund, J. (2026). xtbreak: Structural Break Testing and Estimation in R [Computer software]. https://github.com/gorgeousfish/xtbreak

Ditzen, J., Karavias, Y. & Westerlund, J. (2025). Testing and estimating structural breaks in time series and panel data in Stata. The Stata Journal, 25(3), 526–560.

@software{cai2026xtbreak,
  title   = {xtbreak: Structural Break Testing and Estimation in {R}},
  author  = {Cai, Xuanyu and Xu, Wenli and Ditzen, Jan and Karavias, Yiannis and Westerlund, Joakim},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/gorgeousfish/xtbreak}
}

@article{ditzen2025xtbreak,
  title   = {Testing and estimating structural breaks in time series and panel data in {Stata}},
  author  = {Ditzen, Jan and Karavias, Yiannis and Westerlund, Joakim},
  journal = {The Stata Journal},
  volume  = {25},
  number  = {3},
  pages   = {526--560},
  year    = {2025}
}
