{smcl} {* *! version 1.1.0 28mar2026}{...} {cmd:help xtcbc} {right:version 1.1.0} {hline} {title:Title} {p2colset 5 18 20 2}{...} {p2col:{hi:xtcbc} {hline 2}}Coefficient-by-Coefficient Breaks in Panel Data Models{p_end} {p2colreset}{...} {title:Version} {pstd} Version 1.1.0, 28 March 2026{p_end} {pstd} {bf:Author:} Dr Merwan Roudane ({browse "mailto:merwanroudane920@gmail.com":merwanroudane920@gmail.com}){p_end} {pstd} Implements the CBCL estimator of Kaddoura (2025, {it:Journal of Econometrics}).{p_end} {title:Syntax} {p 8 16 2}{cmd:xtcbc} {depvar} {indepvars} {ifin} [{cmd:,} {it:options}]{p_end} {synoptset 28 tabbed}{...} {synopthdr} {synoptline} {syntab:Penalty settings} {synopt:{opt kap:pa(#)}}weight exponent for adaptive weights; default is {cmd:kappa(2)}{p_end} {synopt:{opt ngr:id(#)}}number of lambda grid points; default is {cmd:ngrid(50)}{p_end} {synopt:{opt cons:tant(#)}}BIC-type penalty constant c; default is {cmd:constant(0.05)}{p_end} {syntab:Data transformation} {synopt:{opt csd:emean}}cross-section demean data to partial out interactive effects{p_end} {syntab:Reporting} {synopt:{opt gr:aph}}produce coefficient path, IC, and break timeline graphs{p_end} {synopt:{opt l:evel(#)}}set confidence level; default is {cmd:level(95)}{p_end} {synoptline} {p2colreset}{...} {p 4 6 2} You must {cmd:xtset} your data before using {cmd:xtcbc}; see {helpb xtset}.{p_end} {p 4 6 2} The panel must be strongly balanced (no gaps).{p_end} {title:Description} {pstd} {cmd:xtcbc} implements the {bf:Coefficient-by-Coefficient Lasso (CBCL)} break estimator proposed by Kaddoura (2025). Unlike traditional structural break methods that estimate {it:vector breaks} (where all parameters shift simultaneously), {cmd:xtcbc} allows each coefficient to have its {it:own} number of breaks and break dates.{p_end} {pstd} In a panel regression y_it = x'_it * beta_t + u_it, the vector-break approach assumes all p components of beta_t change at the same dates. The CBC approach relaxes this: the kth component beta_{k,t} can have m_k breaks, independently for each k = 1, ..., p.{p_end} {pstd} The method is designed for panels with {bf:large N} and {bf:fixed or small T}. Asymptotics rely on N -> infinity.{p_end} {pstd} {bf:Key features:}{p_end} {p 8 12 2}1. {bf:Automatic break detection:} Simultaneously determines the number and location of breaks for each coefficient.{p_end} {p 8 12 2}2. {bf:Adaptive penalization:} Uses L1 fused penalty with adaptive weights that are asymptotically oracle-equivalent.{p_end} {p 8 12 2}3. {bf:Post-selection estimation:} Re-estimates coefficients in each regime using sub-regime OLS (Appendix B).{p_end} {p 8 12 2}4. {bf:Information criterion:} Selects optimal lambda via a BIC-type IC (Theorem 1.4).{p_end} {p 8 12 2}5. {bf:Fixed-effects handling:} Eliminates individual effects via first-period deviation.{p_end} {title:Options} {dlgtab:Penalty settings} {phang} {opt kappa(#)} specifies the exponent for adaptive weights:{p_end} {p 8 12 2}w_{k,t} = |beta_dot_{k,t} - beta_dot_{k,t-1}|^(-kappa){p_end} {pstd} where beta_dot are the initial partialed-out OLS estimates. Larger kappa puts stronger penalty on small differences. Default is {cmd:kappa(2)}, following Qian and Su (2016) and Kaddoura (2025).{p_end} {phang} {opt ngrid(#)} specifies the number of log-spaced grid points for lambda. The algorithm evaluates the CBCL objective at each point and selects the lambda minimizing the information criterion. Default is {cmd:ngrid(50)}.{p_end} {phang} {opt constant(#)} specifies the constant c in the BIC-type penalty:{p_end} {p 8 12 2}phi = c * log(N) / sqrt(N){p_end} {pstd} Default is {cmd:constant(0.05)}. Results are not very sensitive to c within a reasonable range (0.01 to 0.10).{p_end} {dlgtab:Data transformation} {phang} {opt csdemean} cross-section demeans data before estimation. Removes cross-sectional means from both dependent and independent variables at each time period. Useful for models with interactive/common factor effects, following Kaddoura and Westerlund (2023).{p_end} {dlgtab:Reporting} {phang} {opt graph} produces three diagnostic graphs saved as PNG files:{p_end} {p 8 12 2}1. {bf:xtcbc_coefficients.png:} Multi-panel graph showing coefficient paths beta_{k,t} over time. Dashed red lines mark detected break dates.{p_end} {p 8 12 2}2. {bf:xtcbc_ic.png:} IC_1(lambda) vs log(lambda). Dashed red line marks optimal lambda*.{p_end} {p 8 12 2}3. {bf:xtcbc_timeline.png:} Break timeline with coefficients on y-axis, time on x-axis.{p_end} {phang} {opt level(#)} specifies confidence level. Default is {cmd:level(95)}.{p_end} {title:Methodology} {pstd} {bf:1. Data Generating Process}{p_end} {pstd} Consider the fixed-effects panel model:{p_end} {p 8 12 2}y_it = xi_i + x'_it * beta_t + epsilon_it, i = 1,...,N, t = 1,...,T{p_end} {pstd} where xi_i are individual fixed effects. The kth component follows:{p_end} {p 8 12 2}beta_{k,t} = alpha_{k,j} for T_{k,j-1} <= t < T_{k,j}, j = 1,...,m_k+1{p_end} {pstd} where m_k is the number of breaks for coefficient k. Each coefficient is free to have a different m_k and different break dates.{p_end} {pstd} {bf:2. Fixed Effects Elimination}{p_end} {pstd} Fixed effects are eliminated via first-period deviation:{p_end} {p 8 12 2}ytilde_it = y_it - y_i1 = x'_it * beta_t - x'_i1 * beta_1 + (eps_it - eps_i1){p_end} {pstd} {bf:3. Initial Estimates (Partialed-Out OLS, Eq 2.4)}{p_end} {pstd} For each k and t, the initial estimate beta_dot_{k,t} is obtained by partialing out all other regressors:{p_end} {p 8 12 2}beta_dot_{k,t} = [X'_{k,t} M_{-k,t} X_{k,t}]^(-1) * X'_{k,t} M_{-k,t} ytilde_t{p_end} {pstd} where M_A = I - A*(A'A)^(-1)*A' and X_{-k,t} includes all other regressors at time t plus all first-period regressors.{p_end} {pstd} {bf:4. CBCL Objective Function (Eq 2.3)}{p_end} {pstd} The penalized estimates minimize:{p_end} {p 8 12 2}L_lambda(beta) = (1/N) * SUM_i SUM_{t>=2} [ytilde_it - xbar'_it * beta]^2{p_end} {p 8 12 2}{space 14}+ lambda * SUM_{t>=2} SUM_k w_{k,t} * |beta_{k,t} - beta_{k,t-1}|{p_end} {pstd} The L1 fused penalty encourages consecutive coefficients to be equal (fusing them into regimes). Adaptive weights ensure true breaks are not over-penalized. The estimator uses block coordinate descent with soft-thresholding and bidirectional sweeps.{p_end} {pstd} {bf:5. Post-Selection Estimation (Eq 2.5, Appendix B)}{p_end} {pstd} After detecting breaks, the post-selection estimator computes regime-specific coefficients using sub-regime OLS:{p_end} {p 8 12 2}alpha_{k,j} = [X'_{k,r} M_{X_breve} X_{k,r}]^(-1) * X'_{k,r} M_{X_breve} ytilde_r{p_end} {pstd} where r = r_{k,j} is the jth regime and M_{X_breve} projects out block-diagonal matrices of other coefficients' sub-regimes.{p_end} {pstd} {bf:Sub-regime construction:} For coefficient k in regime j, the other coefficients ell != k may break at different dates. The estimator creates sub-regime intersection sets r^(k,j)_{ell,c} = r_{k,j} INTERSECT ell's cth regime. This builds block-diagonal regressor matrices X_breve_ell.{p_end} {pstd} {bf:6. Asymptotic Distribution (Theorem 1.3)}{p_end} {p 8 12 2}sqrt(N) * O^(1/2) * (alpha_hat - alpha_0) -> N(0, Theta_0^{-1} * Sigma_0 * Theta_0^{-1}){p_end} {pstd} Standard errors come from the post-selection residual variance and the cross-products of partialed-out regressors.{p_end} {pstd} {bf:7. Tuning Parameter Selection (Theorem 1.4)}{p_end} {pstd} Lambda is selected by minimizing a BIC-type information criterion:{p_end} {p 8 12 2}IC_1(lambda) = sigma^2_hat(lambda) + phi * SUM_k [mhat_k(lambda) + 1]{p_end} {pstd} where:{p_end} {p 8 12 2}- sigma^2_hat is the post-selection residual variance{p_end} {p 8 12 2}- mhat_k(lambda) is the detected number of breaks for coefficient k{p_end} {p 8 12 2}- phi = c * log(N) / sqrt(N) is the model complexity penalty{p_end} {pstd} By Theorem 1.4, minimizing IC_1 consistently selects the true number of breaks as N -> infinity.{p_end} {title:Output Tables} {pstd} {cmd:xtcbc} produces four publication-quality output tables:{p_end} {p 8 12 2}1. {bf:Header:} Model specifications: dependent variable, regressors, panel dimensions (N, T), penalty settings (kappa, c), and optimal lambda.{p_end} {p 8 12 2}2. {bf:Break Detection:} For each coefficient k, reports the number of breaks mhat_k and estimated break dates. Non-breaking coefficients marked {it:none}.{p_end} {p 8 12 2}3. {bf:Post-Selection Estimates:} Regime-specific coefficients with standard errors, t-statistics, and significance stars (* p<0.10, ** p<0.05, *** p<0.01).{p_end} {p 8 12 2}4. {bf:CBC Estimation Results (Table 6 format):} Paper-style table with "No breaks" column for stable coefficients and regime-specific columns for breaking coefficients.{p_end} {title:Stored Results} {pstd} {cmd:xtcbc} stores the following in {cmd:e()}:{p_end} {synoptset 24 tabbed}{...} {p2col 5 24 28 2: Scalars}{p_end} {synopt:{cmd:e(N)}}number of cross-sectional units{p_end} {synopt:{cmd:e(T)}}number of time periods{p_end} {synopt:{cmd:e(p)}}number of regressors{p_end} {synopt:{cmd:e(kappa)}}adaptive weight exponent{p_end} {synopt:{cmd:e(ngrid)}}number of lambda grid points{p_end} {synopt:{cmd:e(c_const)}}BIC penalty constant c{p_end} {synopt:{cmd:e(opt_lambda)}}optimal tuning parameter lambda*{p_end} {synopt:{cmd:e(total_breaks)}}total breaks across all coefficients{p_end} {synopt:{cmd:e(nbreaks_k)}}breaks for coefficient k (k=1,...,p){p_end} {p2col 5 24 28 2: Matrices}{p_end} {synopt:{cmd:e(nbreaks)}}1 x p vector of break counts{p_end} {synopt:{cmd:e(break_dates)}}max_breaks x p matrix of break dates{p_end} {synopt:{cmd:e(alpha_info)}}n_alpha x 6 matrix: col 1=k, col 2=regime j, col 3=start, col 4=end, col 5=coef, col 6=SE{p_end} {synopt:{cmd:e(beta_hat)}}T x p matrix of penalized estimates{p_end} {synopt:{cmd:e(ic_values)}}ngrid x 1 vector of IC values{p_end} {p2col 5 24 28 2: Macros}{p_end} {synopt:{cmd:e(cmd)}}{cmd:xtcbc}{p_end} {synopt:{cmd:e(cmdline)}}full command as typed{p_end} {synopt:{cmd:e(depvar)}}dependent variable name{p_end} {synopt:{cmd:e(indepvars)}}independent variable names{p_end} {synopt:{cmd:e(title)}}estimation title{p_end} {title:Examples} {pstd}{bf:Example 1: Basic usage}{p_end} {phang}{cmd:. xtset id year}{p_end} {phang}{cmd:. xtcbc gdp_growth invest trade inflation}{p_end} {pstd}{bf:Example 2: With graphs and custom penalty}{p_end} {phang}{cmd:. xtcbc gdp_growth invest trade inflation, graph kappa(2) ngrid(100) constant(0.05)}{p_end} {pstd}{bf:Example 3: With cross-section demeaning}{p_end} {phang}{cmd:. xtcbc gdp_growth invest trade inflation, csdemean graph}{p_end} {pstd}{bf:Example 4: Simulated data with known break structure}{p_end} {phang}{cmd:. set seed 12345}{p_end} {phang}{cmd:. set obs 500}{p_end} {phang}{cmd:. gen id = ceil(_n / 5)}{p_end} {phang}{cmd:. gen time = mod(_n-1, 5) + 1}{p_end} {phang}{cmd:. xtset id time}{p_end} {phang}{cmd:. gen xi = 0}{p_end} {phang}{cmd:. forvalues i = 1/100 {c -(}}{p_end} {phang}{cmd:. local xv = rnormal()}{p_end} {phang}{cmd:. qui replace xi = `xv' if id == `i'}{p_end} {phang}{cmd:. {c )-}}{p_end} {phang}{cmd:. gen x1 = 0.2*xi + rnormal()}{p_end} {phang}{cmd:. gen x2 = 0.2*xi + rnormal()}{p_end} {phang}{cmd:. gen y = xi + 1*x1 + cond(time<=3, 2, 5)*x2 + rnormal(0, 0.5)}{p_end} {phang}{cmd:. drop xi}{p_end} {phang}{cmd:. xtcbc y x1 x2, graph}{p_end} {pstd}{bf:Example 5: Accessing stored results}{p_end} {phang}{cmd:. xtcbc y x1 x2 x3}{p_end} {phang}{cmd:. mat list e(nbreaks)}{p_end} {phang}{cmd:. mat list e(alpha_info)}{p_end} {phang}{cmd:. display "Optimal lambda = " e(opt_lambda)}{p_end} {phang}{cmd:. display "Total breaks = " e(total_breaks)}{p_end} {phang}{cmd:. display "Breaks in x1 = " e(nbreaks_1)}{p_end} {pstd}{bf:Example 6: Monte Carlo DGP from Section 4.1}{p_end} {phang}{cmd:. * See xtcbc_demo.do for the full simulation}{p_end} {phang}{cmd:. * p=6, T=5, N=200, true breaks = [2, 3, 0, 0, 0, 1]}{p_end} {phang}{cmd:. do xtcbc_demo.do}{p_end} {title:Remarks} {pstd} {bf:Comparison with vector-break estimators.}{p_end} {pstd} In classic structural break estimation (Bai and Perron 1998, Qian and Su 2016), all coefficients share the same break dates ("vector break" assumption). When some coefficients do not actually change at a detected break date, this introduces unnecessary regime-splitting, reduces effective sample size, increases variance, and can mask the significance of truly time-varying coefficients.{p_end} {pstd} The CBCL estimator resolves this by allowing each coefficient to break independently. In the empirical application to U.S. county crime data (Cornwell and Trumbull 1994), the vector-break estimator finds 2 breaks in all sixteen coefficients, while the CBC estimator correctly finds breaks only in three control variables, leaving the deterrence coefficients unbroken (see paper Table 6).{p_end} {pstd} {bf:Choosing kappa.}{p_end} {pstd} The adaptive weight exponent kappa controls penalization: larger kappa means stronger shrinkage of small differences, making the estimator more conservative (fewer false breaks). The value kappa=2 is standard and is recommended by both Qian and Su (2016) and Kaddoura (2025).{p_end} {pstd} {bf:Computational considerations.}{p_end} {pstd} Complexity: O(G * iter * T * p * N), where G = ngrid, iter = coordinate descent iterations per lambda. For typical panels (N=100-500, T=5-20, p=3-8), this takes seconds. For N > 1000, consider reducing ngrid.{p_end} {pstd} {bf:Balanced panels required.}{p_end} {pstd} The current implementation requires strongly balanced panels. Unbalanced panels should be balanced before running {cmd:xtcbc}.{p_end} {pstd} {bf:Data requirements.}{p_end} {pstd} The algorithm requires T >= 3 and N >= p+1. Small T limits break detection. The paper's Monte Carlo shows good performance for T=5 and N >= 100.{p_end} {title:References} {phang} Kaddoura, Y. (2025). Estimating coefficient-by-coefficient breaks in panel data models. {it:Journal of Econometrics}, 249, 106005.{p_end} {phang} Qian, J. and Su, L. (2016). Shrinkage estimation of common breaks in panel data models via adaptive group fused lasso. {it:Journal of Econometrics}, 191, 86-109.{p_end} {phang} Kaddoura, Y. and Westerlund, J. (2023). Estimation of panel data models with random interactive effects and multiple structural breaks when T is fixed. {it:Journal of Business & Economic Statistics}, 41, 778-790.{p_end} {phang} Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. {it:Econometrica}, 66, 47-78.{p_end} {phang} Bonhomme, S. and Manresa, E. (2015). Grouped patterns of heterogeneity in panel data. {it:Econometrica}, 83, 1147-1184.{p_end} {phang} Cornwell, C. and Trumbull, W. N. (1994). Estimating the economic model of crime with panel data. {it:Review of Economics and Statistics}, 76, 360-366.{p_end} {title:Author} {pstd} Dr Merwan Roudane{p_end} {pstd} {browse "mailto:merwanroudane920@gmail.com":merwanroudane920@gmail.com}{p_end} {pstd} Please cite as:{p_end} {pstd} Roudane, M. (2026). xtcbc: Stata module for coefficient-by-coefficient breaks in panel data models.{p_end} {title:Also see} {psee} {helpb xtpmg}, {helpb xtlmbreak}, {helpb xtset}, {helpb regress} {p_end}