Title
Regressions with two-way fixed effects or match effects for large datasets
Syntax
twfe depvar [indepvars] [if] [in] [, options]
options                    Description
-------------------------------------------------------------------------
Required
  ids(varname1 varname2)   ID variables of the fixed effects; must contain exactly two ID variables

Optional
  matcheffect              include match fixed effects
  cluster(varlist)         specify cluster variables for one- or two-way clustering
  replace                  replace data in memory by estimates of the fixed effects
  maxit(#)                 maximum number of iterations of the conjugate gradient algorithm; default is 500
  tol(#)                   tolerance of the conjugate gradient algorithm; default is 1.0e-7
  verbose(#)               control how much detail the conjugate gradient algorithm displays

Reporting
  level(#)                 set confidence level; default is level(95)
  eform(string)            report exponentiated coefficients and label as string
  display_options          control column formats, row spacing, line width, and display of
                             omitted variables and base and empty cells
  noheader                 suppress table header
  notable                  suppress coefficient table
-------------------------------------------------------------------------

predict can be used after twfe with the options xb, xbu, e, and ue to obtain the linear prediction with fixed effects (xbu) and without them (xb), as well as residuals with fixed effects (ue) and without them (e). Other functionality of predict may work, but I haven't tested it.
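The four predict statistics are linked by simple identities. The Python sketch below spells them out, assuming the same conventions as predict after xtreg, fe: xbu adds the estimated fixed effects to xb, and each residual is y minus the corresponding prediction. The helper name predictions() is hypothetical; fe1, fe2, and matchef stand for the per-observation values stored in the variables of those names, and adding matchef after matcheffect is likewise an assumption.

```python
import numpy as np


def predictions(y, X, b, fe1, fe2, matchef=None):
    """Hypothetical helper: the four predict statistics from stored estimates."""
    y, X, b = np.asarray(y, float), np.asarray(X, float), np.asarray(b, float)
    u = np.asarray(fe1, float) + np.asarray(fe2, float)  # sum of the fixed effects
    if matchef is not None:
        u = u + np.asarray(matchef, float)               # match effect, if estimated (assumption)
    xb = X @ b        # linear prediction without fixed effects
    xbu = xb + u      # linear prediction with fixed effects
    e = y - xbu       # residual without fixed effects
    ue = y - xb       # residual with fixed effects, i.e. u + e
    return {"xb": xb, "xbu": xbu, "e": e, "ue": ue}
```

In particular, ue always equals the estimated fixed effects plus e, so ue and e differ only by the fixed-effect component.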
Description
twfe fits a linear regression model of depvar on indepvars including fixed effects for the two units defined by ids(varname1 varname2). If matcheffect is specified, fixed effects for the interaction of the two ID variables are included as well. The estimates of the fixed effects are saved in new variables called "fe1" (for the ID with more units) and "fe2" (for the ID with fewer units). If matcheffect is specified, additional variables for the match ID ("matchid"), the match fixed effect ("matchef"), and the match duration ("mlength") are created. If the dataset already contains variables with these names, the original variables are replaced.

twfe is intended for estimation in large datasets, where constraints on memory and matsize make standard estimation difficult and time consuming. Instead of solving (X'X)b = X'y by inverting X'X, it computes the slopes first, then uses the conjugate gradient algorithm to compute the smaller set of fixed effects, and finally solves for the other fixed effects recursively. See Remarks for further details.
Options
Required
ids(varname1 varname2) specifies the identifiers of the two sets of fixed effects. It must contain exactly two variables, and both must be numeric. The program always treats the ID variable with more units (e.g., individuals if there are more individuals than firms) as the first one, regardless of the order in which they are specified in ids(). Specifying the larger one first makes the program slightly faster.
Optional

matcheffect runs the match fixed effects model instead of the two-way fixed effects model; that is, in addition to the fixed effects specified by ids(varname1 varname2), a fixed effect for every unique combination of the two identifiers is included in the model.
cluster(varlist) calculates heteroskedasticity-robust or one- or two-way clustered standard errors using the method proposed in Cameron et al. (2006). varlist should contain the names of the variables that define the clusters, or "het" for heteroskedasticity-robust standard errors. If it contains one variable name, one-way clustering is used; for two-way clustering, specify two variables. Optionally, a third variable (order matters) can be specified that identifies unique combinations of the first two clustering variables (i.e., a match ID). If such a variable is available, it speeds up execution because the program does not need to create it. Specifying three variables does not do three-way clustering, and specifying a third variable that is not the interaction of the first two will lead to wrong results.
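Cameron et al. (2006) build the two-way clustered covariance from three one-way pieces: the "meat" clustered on the first variable, plus the meat clustered on the second, minus the meat clustered on their intersection. The Python sketch below shows that identity for a plain OLS design matrix X with residuals e; it omits any small-sample corrections, and the function names are hypothetical. It also shows why a supplied third (match) variable saves time: without it, the intersection clusters have to be constructed.

```python
import numpy as np


def cluster_meat(Xe, groups):
    """Sum over clusters g of (sum_{i in g} x_i e_i)(sum_{i in g} x_i e_i)'."""
    k = Xe.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        s = Xe[groups == g].sum(axis=0)
        meat += np.outer(s, s)
    return meat


def twoway_cluster_vcov(X, e, c1, c2, c12=None):
    """Two-way clustered covariance (Cameron-Gelbach-Miller), no small-sample correction."""
    if c12 is None:
        # Build the intersection (match) clusters; a supplied third variable skips this.
        _, c12 = np.unique(np.column_stack([c1, c2]), axis=0, return_inverse=True)
        c12 = np.ravel(c12)
    bread = np.linalg.inv(X.T @ X)
    Xe = X * e[:, None]
    meat = cluster_meat(Xe, c1) + cluster_meat(Xe, c2) - cluster_meat(Xe, c12)
    return bread @ meat @ bread
```

When the second clustering variable is unique per observation, its term and the intersection term cancel and the result reduces to the one-way clustered covariance on the first variable.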
replace By default, twfe saves the data in memory to disc as a temporary file in order to preserve memory. After estimation, it merges the original data with the estimates of the fixed effects. Specifying replace skips the save and merge, so that the data in memory is replaced by a dataset that contains only the ID variables and the variables created by twfe. If the data in memory is large, this can save time and disc space, but the data currently in memory is overwritten. Additionally, e(sample) is not returned and predict will probably return erratic results.
maxit(#) sets the maximum number of iterations. The program terminates unsuccessfully if the conjugate gradient algorithm has not converged within the specified number of iterations. The default is 500; see Convergence for further details.
tol(#) sets the tolerance. The conjugate gradient algorithm terminates successfully once the residual is smaller than the specified number. The default is 1.0e-7; see Convergence for further details.
verbose(#) controls how much detail the conjugate gradient algorithm displays. It can take the values 0 (none), 1 (summary), or 2 (size of the residual after every iteration).
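To make the roles of maxit(), tol(), and verbose() concrete, here is a minimal textbook conjugate gradient solver in Python for a symmetric positive definite system. It is a sketch under assumptions, not the Mata code in twfe: in particular, it measures the residual with the Euclidean norm, and whether twfe uses that norm or a scaled one is not stated here. The function name is hypothetical.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-7, maxit=500, verbose=0):
    """Solve A x = b for a symmetric positive definite matrix A.

    Returns (x, converged). tol, maxit and verbose mimic the tol(), maxit()
    and verbose() options; the Euclidean residual norm is an assumption.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x                      # residual of the starting guess x = 0
    p = r.copy()                       # first search direction
    rs = r @ r
    for it in range(maxit):
        if np.sqrt(rs) < tol:          # success: residual below tolerance
            if verbose >= 1:
                print(f"converged after {it} iterations")
            return x, True
        Ap = A @ p
        step = rs / (p @ Ap)           # exact line search along p
        x = x + step * p
        r = r - step * Ap
        rs_new = r @ r
        if verbose >= 2:
            print(f"iteration {it + 1}: residual {np.sqrt(rs_new):.3e}")
        p = r + (rs_new / rs) * p      # next direction, A-conjugate to the previous ones
        rs = rs_new
    converged = bool(np.sqrt(rs) < tol)
    if verbose >= 1:
        print("converged" if converged else f"not converged after {maxit} iterations")
    return x, converged
```

With maxit too small the solver returns converged = False even though x has improved, which corresponds to the unsuccessful termination described for maxit().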
Reporting
level(#) specifies the confidence level, as a percentage, for confidence intervals; the default is level(95). See [R] estimation options.
eform(string) specifies that the coefficient table be displayed in exponentiated form as defined in [R] maximize and that string be used to label the exponentiated coefficients in the table.
display_options: see [R] estimation options. They should all work correctly, but I haven't checked all of them.
noheader suppresses the display of the ANOVA table and summary statistics at the top of the output; only the coefficient table is displayed.
notable suppresses display of the coefficient table.
Remarks

Method

For a full description of the method, see my paper (Mittag 2012) in the References. The algorithms employed for estimating the two-way fixed effects model and the match effects model differ slightly. However, both essentially proceed in three steps:

1. Calculate the slope coefficients (by partial regression).
2. Set up the system of equations implied by the normal equations, which is solved by the OLS estimates of the smaller set of fixed effects, and solve it with the conjugate gradient algorithm.
3. Calculate all other fixed effects using the fact that the residuals sum to zero within each unit.

This procedure yields the exact OLS estimates of the slopes and all fixed effects if and only if the data matrix is of full rank. Unlike the regress command, the program does not verify this. A key advantage of the method is that it reduces the size of the problem solved in step 2: there are only as many equations as there are fixed effects for the second ID variable. This is why the program treats the ID variable with fewer units as the second one, and the gains in performance are larger the smaller the second set of fixed effects is compared to the first.
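The three steps can be illustrated on a toy two-way fixed effects model in Python. The sketch uses dense dummy matrices and computes the slopes in step 1 by brute-force partial regression, which is exactly what twfe avoids on large data; the point is the algebra of steps 2 and 3. With D and F the dummy matrices for id1 and id2 and r = y - Xb, eliminating the id1 effects alpha from the normal equations leaves [F'F - F'D(D'D)^(-1)D'F] psi = F'r - F'D(D'D)^(-1)D'r, one equation per id2 unit. In exact arithmetic, conjugate gradient started at zero on this singular but consistent system returns the solution whose id2 effects sum to zero within each connected group, which coincides with the normalization described under Identification. All function names are hypothetical.

```python
import numpy as np


def dummies(codes):
    """n x k indicator matrix for integer codes 0..k-1 (dense; toy sizes only)."""
    return np.eye(codes.max() + 1)[codes]


def cg(A, b, tol=1e-10, maxit=500):
    """Conjugate gradient for a symmetric positive semi-definite system A x = b."""
    x = np.zeros(len(b))
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        pAp = p @ Ap
        if pAp <= 0:                   # nothing left to reduce along p
            break
        step = rs / pAp
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def twfe_sketch(y, X, id1, id2):
    D, F = dummies(id1), dummies(id2)
    n1 = D.sum(axis=0)                 # observations per id1 unit (diagonal of D'D)
    # Step 1: slopes by partial regression (Frisch-Waugh-Lovell), done densely here.
    DF = np.hstack([D, F])

    def resid(v):
        return v - DF @ np.linalg.lstsq(DF, v, rcond=None)[0]

    b = np.linalg.lstsq(resid(X), resid(y), rcond=None)[0]
    r = y - X @ b
    # Step 2: reduced system with one equation per id2 unit, solved by CG.
    FD_scaled = (F.T @ D) / n1         # F'D (D'D)^(-1)
    A = F.T @ F - FD_scaled @ (D.T @ F)
    rhs = F.T @ r - FD_scaled @ (D.T @ r)
    psi = cg(A, rhs)
    # Step 3: residuals sum to zero within each id1 unit, so alpha is a unit mean.
    alpha = D.T @ (r - F @ psi) / n1
    return b, alpha, psi
```

The fitted values X b + alpha[id1] + psi[id2] match those of a full dummy-variable regression, while the conjugate gradient step only ever works with a system whose size is the number of id2 units.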
Identification

Identification conditions for the two-way fixed effects model are discussed in Abowd et al. (2002); for the match fixed effects model, see Woodcock (2008). The normalizations I impose to identify the two-way fixed effects model are that the intercept is set to zero and that the (unweighted) fixed effects for id2 sum to zero in every group. The additional normalizations imposed to identify the match fixed effects are that the (duration-weighted) match fixed effects sum to zero for every unit of the two ID variables.
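The group normalization can be imposed after the fact without changing any fitted value, because each observation's id1 and id2 units belong to the same connected group: subtracting a group's mean id2 effect from those effects and adding it to the id1 effects of that group leaves alpha_i + psi_j unchanged. A minimal Python sketch, assuming group labels for both ID variables are already available; the function name is hypothetical.

```python
import numpy as np


def normalize(alpha, psi, group1, group2):
    """Shift effects so that psi sums to zero within every group.

    group1[i] and group2[j] give the connected group of id1 unit i and id2 unit j.
    """
    alpha = np.asarray(alpha, dtype=float).copy()
    psi = np.asarray(psi, dtype=float).copy()
    for g in np.unique(group2):
        shift = psi[group2 == g].mean()   # unweighted group mean of the id2 effects
        psi[group2 == g] -= shift
        alpha[group1 == g] += shift       # compensate in the id1 effects of the group
    return alpha, psi
```

An id2 unit without movers forms a group of its own, so this normalization sets its effect to zero, in line with the Convergence remarks.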
Convergence

Occasionally, the conjugate gradient algorithm may fail to converge. If this happens, either the system of equations has no unique solution, or the algorithm has not found it within the given number of iterations and the specified tolerance. Most problems that have a solution should converge within 500 iterations, but the default tolerance may be quite low when dealing with really large problems. On the other hand, the problem may not have a solution if some of the regressors are perfectly collinear.

Generally, the algorithm has no problem with multiple groups (as defined in Abowd et al. 2002). In such cases, the group mean of the fixed effects for id2 is set to zero in each group. If some units of id2 have no movers, this implies that their fixed effects are zero as well. However, if there are small groups, moving patterns can be such that there is no unique solution even though people are moving between units. The algorithm does not examine the group structure, so such groups have to be excluded manually if convergence persistently fails. Amine Ouazad's a2group is a good program for examining the group structure of the data.
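Groups in the sense of Abowd et al. (2002) are the connected components of the bipartite graph whose nodes are the units of the two ID variables and whose edges are the observations. The Python union-find sketch below labels them. It is an independent illustration of the idea, not the algorithm used by a2group, and it only reports which units are linked; it does not detect the problematic small groups described above. The function name is hypothetical.

```python
def connected_groups(id1, id2):
    """Label connected groups of the bipartite graph linking id1 and id2 units.

    Returns two dicts mapping each id1 value and each id2 value to a group number.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in zip(id1, id2):
        ra, rb = find(("1", a)), find(("2", b))  # tag nodes so the two IDs never collide
        if ra != rb:
            parent[ra] = rb                      # each observation links its two units
    roots = {}
    g1 = {a: roots.setdefault(find(("1", a)), len(roots)) for a in sorted(set(id1))}
    g2 = {b: roots.setdefault(find(("2", b)), len(roots)) for b in sorted(set(id2))}
    return g1, g2
```

For example, with id1 = [0, 0, 1, 2] and id2 = [10, 11, 11, 12], units 10 and 11 are linked through the mover 0 and form one group, while unit 12 forms a second group on its own.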
Memory (Stata 11 or earlier)

Most of the computations are done in Mata and are thus not subject to Stata's memory limits. However, some calls to Stata are made, and these will not work if Stata has not been assigned enough memory to perform them. If possible, do not set Stata's memory limit very close to the size of the dataset. If you do not have enough main memory to estimate the model, you may be able to speed up the program by using "set virtual on".
Other

If you find any mistakes or have suggestions for improvements, please email me at mittag@uchicago.edu. Feel free to use, change, or mutilate this program for private purposes, but please don't steal it; give due credit. That being said, I learned a lot about writing code in Mata from Amine Ouazad's a2reg code and would like to thank her for making it publicly available. I would also like to thank Kit Baum for several useful suggestions.
Examples

    Two-way fixed effects model with two-way clustered standard errors
        . twfe wage age experience, ids(individual_id firm_id) cluster(individual_id firm_id)

    Match effects model with one-way clustered standard errors
        . twfe wage age experience, ids(individual_id firm_id) cluster(match_id) matcheffect
Saved results
twfe saves the following in e():
Scalars
  e(N)            number of observations
  e(mss)          model sum of squares
  e(df_m)         model degrees of freedom
  e(rss)          residual sum of squares
  e(df_e)         residual degrees of freedom
  e(r2)           R-squared
  e(ar2)          adjusted R-squared
  e(rmse)         root mean squared error
  e(ncov)         number of covariates
  e(n1)           number of fixed effects created by the larger ID
  e(n2)           number of fixed effects created by the smaller ID
  e(nm)           number of matches (only with matcheffect)
  e(F)            F statistic of H0: all slopes and FEs are 0
  e(pval)         p-value of H0: all slopes and FEs are 0
  e(F_fe)         F statistic of H0: all FEs are 0
  e(p_fe)         p-value of H0: all FEs are 0
  e(F_x)          F statistic of H0: all slopes are 0
  e(p_x)          p-value of H0: all slopes are 0
  e(F_fe1)        F statistic of H0: all FEs of the larger ID are 0 (not with matcheffect)
  e(p_fe1)        p-value of H0: all FEs of the larger ID are 0 (not with matcheffect)
  e(F_fe2)        F statistic of H0: all FEs of the smaller ID are 0 (not with matcheffect)
  e(p_fe2)        p-value of H0: all FEs of the smaller ID are 0 (not with matcheffect)
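For background, the textbook F statistic for testing that a set of coefficients is jointly zero compares the residual sum of squares of the model with and without them. Whether twfe computes e(F), e(F_fe), e(F_x), e(F_fe1), and e(F_fe2) exactly this way, with which degrees of freedom, and whether it adjusts them when cluster() is specified, is not documented here, so treat the formula as a reference point only. RSS_u is the residual sum of squares of the full model (e(rss)), df_e its residual degrees of freedom (e(df_e)), RSS_r the residual sum of squares after dropping the tested terms, and q the number of independent restrictions.

```latex
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/q}{\mathrm{RSS}_u/\mathrm{df}_e},
\qquad
p = \Pr\left(F_{q,\,\mathrm{df}_e} > F\right)
```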
Macros
  e(cmd)          twfe
  e(depvar)       name of dependent variable
  e(model)        twfe or match
  e(title)        title in estimation output
  e(clustvar)     name of cluster variable(s)
  e(unit1)        name of identifier for the larger set of fixed effects
  e(unit2)        name of identifier for the smaller set of fixed effects
  e(predict)      program used to implement predict
  e(properties)   b V

Matrices
  e(b)            coefficient vector
  e(V)            variance-covariance matrix of the estimators
  e(nomov)        identifies the units of the smaller ID that have no movers; gives their
                    positions when the unique values of the smaller ID are sorted
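A minimal Python sketch of what e(nomov) records, assuming that a mover is a unit of the first ID observed with more than one unit of the second ID. It returns 0-based positions in the sorted list of unique id2 values; Stata matrices are indexed from 1, so the stored positions may be offset by one. The function name is hypothetical.

```python
def no_mover_positions(id1, id2):
    """0-based positions (in sorted order of id2) of id2 units without movers."""
    units_per_person = {}
    for a, b in zip(id1, id2):
        units_per_person.setdefault(a, set()).add(b)
    # A mover is an id1 unit observed with more than one id2 unit (assumption).
    movers = {a for a, units in units_per_person.items() if len(units) > 1}
    has_mover = {b: False for b in id2}
    for a, b in zip(id1, id2):
        if a in movers:
            has_mover[b] = True
    return [pos for pos, b in enumerate(sorted(has_mover)) if not has_mover[b]]
```

For id1 = [0, 0, 1, 2] and id2 = [10, 11, 11, 12], only unit 12 lacks a mover, so the result is its position 2 among the sorted values 10, 11, 12.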
Functions
  e(sample)       marks estimation sample (not with replace)
References

Abowd, J. M., R. H. Creecy, and F. Kramarz. 2002. Computing person and firm effects using linked longitudinal employer-employee data. Census Bureau Technical Paper TP-2002-06.

Cameron, A. C., J. B. Gelbach, and D. L. Miller. 2006. Robust inference with multi-way clustering. Mimeo.

Mittag, N. 2012. New methods to estimate models with large sets of fixed effects with an application to matched employer-employee data from Germany. FDZ-Methodenreport 02/2012.

Woodcock, S. D. 2008. Match effects. Mimeo.
Author

Nikolas Mittag, University of Chicago
mittag@uchicago.edu