-------------------------------------------------------------------------------
help for xtabond2
-------------------------------------------------------------------------------

"Difference" and "system" GMM dynamic panel estimator

    xtabond2 depvar varlist [if exp] [in range] [weight] [, level(#) svmat
        twostep robust cluster(varname) noconstant small noleveleq orthogonal
        gmmopt [gmmopt ...] ivopt [ivopt ...] pca components(#) artests(#)
        arlevels h(#) nodiffsargan nomata]

where gmmopt is

    gmmstyle(varlist [, laglimits(# #) collapse orthogonal
        equation({diff|level|both}) passthru split])

and ivopt is

    ivstyle(varlist [, equation({diff|level|both}) passthru mz])

aweights, pweights, and fweights are allowed. fweights must be constant over time. See help weights.

xtabond2 is for use with cross-section time-series data. You must tsset your data before using xtabond2; see help tsset.

All varlists may contain time-series operators and, in Stata version 11 or later, factor variables. See help varlist.
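
As a minimal sketch of the setup just described, the commands below declare the panel structure and then fit a simple model. They assume that xtabond2 is installed and use the abdata.dta file from the Examples section of this help file, in which id and year are taken to be the panel and time variables. The choice of regressors and instruments is purely illustrative.

```stata
* Assumption: xtabond2 is installed, e.g. via "ssc install xtabond2"
use http://www.stata-press.com/data/r7/abdata.dta, clear

* Declare the panel (id) and time (year) variables before estimation
tsset id year

* Illustrative difference GMM model: lagged n instrumented GMM-style,
* w and k treated as strictly exogenous (an assumption for this sketch)
xtabond2 n L.n w k, gmm(L.n) iv(w k) noleveleq robust
```

Time-series operators such as L. can be used directly in the varlists once the data are tsset.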

by ...: may be used with xtabond2 if no time-series operators are used in the command line. The by clause will not restrict the sample from which lags are drawn in building instruments. See help by.

xtabond2 shares features of all estimation commands; see help estcom.

The syntax of predict following xtabond2 is

    predict [type] newvarname [if exp] [in range] [, statistic] [difference]

where statistic is

    xb           b x_it, fitted values (the default)
    residuals    e_it, the residuals
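
The predict syntax above can be sketched as follows. The model, the data set (abdata.dta from the Examples section), and the new variable names n_hat, e_hat, and dn_hat are assumptions chosen only for illustration.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year
xtabond2 n L.n w k, gmm(L.n) iv(w k) robust

* Fitted values in levels (xb is the default statistic)
predict double n_hat, xb

* Residuals in levels
predict double e_hat, residuals

* Fitted values for the first-differenced dependent variable
predict double dn_hat, xb difference

summarize n_hat e_hat dn_hat
```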

Description

xtabond2 can fit two closely related dynamic panel data models. The first is the Arellano-Bond (1991) estimator, which is also available with xtabond, though without the two-step standard error correction described below. It is sometimes called "difference GMM." The second is an augmented version outlined by Arellano and Bover (1995) and fully developed by Blundell and Bond (1998). It is known as "system GMM." Roodman (2009) provides a pedagogic introduction to linear GMM, these estimators, and xtabond2. The estimators are designed for dynamic "small-T, large-N" panels that may contain fixed effects and--separate from those fixed effects--idiosyncratic errors that are heteroskedastic and correlated within but not across individuals. Consider the model:

    y_it = x_it * b_1 + w_it * b_2 + u_it        i=1,...,N; t=1,...,T
    u_it = v_i + e_it,

where

v_i are unobserved individual-level effects;

e_it are the observation-specific errors;

x_it is a vector of strictly exogenous covariates (ones dependent on neither current nor past e_it);

w_it is a vector of predetermined covariates (which may include the lag of y) and endogenous covariates, all of which may be correlated with the v_i (Predetermined variables are potentially correlated with past errors. Endogenous ones are potentially correlated with past and present errors.);

b_1 and b_2 are vectors of parameters to be estimated;

and E[v_i]=E[e_it]=E[v_i*e_it]=0, and E[e_it*e_js]=0 for each i, j, t, s, i<>j.

First-differencing the equation removes the v_i, thus eliminating a potential source of omitted variable bias in estimation. However, differencing variables that are predetermined but not strictly exogenous makes them endogenous since the w_i,t-1 in some D.w_it = w_it - w_i,t-1 is correlated with the e_i,t-1 in D.e_it. Following Holtz-Eakin, Newey, and Rosen (1988), Arellano and Bond (1991) develop a Generalized Method of Moments estimator that instruments the differenced variables that are not strictly exogenous with all their available lags in levels. (Strictly exogenous variables are uncorrelated with current and past errors.) Arellano and Bond also develop an appropriate test for autocorrelation, which, if present, can render some lags invalid as instruments.

A problem with the original Arellano-Bond estimator is that lagged levels are poor instruments for first differences if the variables are close to a random walk. Arellano and Bover (1995) describe how, if the original equation in levels is added to the system, additional instruments can be brought to bear to increase efficiency. In this equation, variables in levels are instrumented with suitable lags of their own first differences. The assumption needed is that these differences are uncorrelated with the unobserved individual effects. Blundell and Bond (1998) show that this assumption in turn depends on a more precise one about initial conditions.
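
The contrast between the two estimators can be sketched with a pair of commands that differ only in the noleveleq option. The data set and the treatment of w and k as strictly exogenous are assumptions made for illustration, not a recommended specification.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year

* Difference GMM: only the transformed equation is estimated
xtabond2 n L.n w k, gmm(L.n) iv(w k) noleveleq robust
display e(esttype) ", number of instruments: " e(j)

* System GMM: the levels equation is added, instrumented with
* lagged first differences
xtabond2 n L.n w k, gmm(L.n) iv(w k) robust
display e(esttype) ", number of instruments: " e(j)
```

The second command should report a larger instrument count, since the levels equation brings additional moment conditions.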

xtabond2 implements both estimators--twice. The version in Stata's ado programming language is slow but compatible with Stata 7 and 8. The Mata version is usually faster, and runs in Stata 10.0 or later. The xtabond2 option nomata prevents the use of Mata even when it is available.

The Mata version also includes the option to use the forward orthogonal deviations transform instead of first differencing. Proposed by Arellano and Bover (1995), the orthogonal deviations transform, rather than subtracting the previous observation, subtracts the average of all available future observations. The result is then multiplied by a scale factor chosen to yield the nice but relatively unimportant property that if the original e_it are i.i.d., then so are the transformed ones (see Arellano and Bover (1995) and Roodman (2009)). Like differencing, taking orthogonal deviations removes fixed effects. Because lagged observations of a variable do not enter the formula for the transformation, they remain orthogonal to the transformed errors (assuming no serial correlation), and available as instruments. In fact, for consistency, the software stores the orthogonal deviation of an observation one period late, so that, as with differencing, observations for period 1 are missing and, for an instrumenting variable w, w_i,t-1 enters the formula for the transformed observation stored at i,t. With this move, exactly the same lags of variables are valid as instruments under the two transformations.

On balanced panels, GMM estimators based on the two transforms return numerically identical coefficient estimates, holding the instrument set fixed (Arellano and Bover 1995). But orthogonal deviations has the virtue of preserving sample size in panels with gaps. If some e_it is missing, for example, neither D.e_it nor D.e_i,t+1 can be computed. But the orthogonal deviation can be computed for every complete observation except the last for each individual. (First differencing can do no better since it must drop the first observation for each individual.) Note that "difference GMM" is still called that even when orthogonal deviations are used. We will refer to the equation in differences or orthogonal deviations as the transformed equation. In system GMM with orthogonal deviations, the levels or untransformed equation is still instrumented with differences as described above.
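
A minimal sketch of requesting the orthogonal deviations transform follows. It uses the same illustrative model and data set as assumed elsewhere in this file's examples, and reads back the stored result e(transform) to confirm which transform was applied.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year

* Default transform: first differences
xtabond2 n L.n w k, gmm(L.n) iv(w k) noleveleq robust
display e(transform) ", N = " e(N)

* Forward orthogonal deviations instead of differencing
xtabond2 n L.n w k, gmm(L.n) iv(w k) noleveleq robust orthogonal
display e(transform) ", N = " e(N)
```

Comparing e(N) across the two runs shows whether the transform changes the usable sample, which matters mainly in panels with gaps.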

xtabond2 reports the Arellano-Bond test for autocorrelation, which is applied to the differenced residuals in order to purge the unobserved and perfectly autocorrelated v_i. AR(1) is expected in first differences, because D.e_i,t = e_i,t - e_i,t-1 should correlate with D.e_i,t-1 = e_i,t-1 - e_i,t-2 since they share the e_i,t-1 term. So to check for AR(1) in levels, look for AR(2) in differences, on the idea that this will detect the relationship between the e_i,t-1 in D.e_i,t and the e_i,t-2 in D.e_i,t-2. This reasoning does not work for orthogonal deviations, in which the residuals for an individual are all mathematically interrelated, thus contaminated from the point of view of detecting AR in the e_it. So the test is run on differenced residuals even after estimation in deviations. Autocorrelation indicates that lags of the dependent variable (and any other variables used as instruments that are not strictly exogenous) are in fact endogenous, thus bad instruments. For example, if there is AR(s), then y_i,t-s would be correlated with e_i,t-s, which would be correlated with D.e_i,t-s, which would be correlated with D.e_i,t.
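
As a sketch of how the autocorrelation tests might be inspected after estimation, the commands below request tests up to order 3 and display their p values from the stored results e(ar1p), e(ar2p), and e(ar3p) (the e(arip) scalars listed under Return values). The model and data set are illustrative assumptions.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year

* Request AR tests of orders 1 through 3 on the differenced residuals
xtabond2 n L.n w k, gmm(L.n) iv(w k) noleveleq robust artests(3)

* AR(1) in differences is expected; AR(2) is the one that signals
* first-order serial correlation in the levels errors
display "AR(1) p value: " e(ar1p)
display "AR(2) p value: " e(ar2p)
display "AR(3) p value: " e(ar3p)
```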

xtabond2 also reports tests of over-identifying restrictions--of whether the instruments, as a group, appear exogenous. For one-step, non-robust estimation, it reports the Sargan statistic, which is the minimized value of the one-step GMM criterion function. The Sargan statistic is not robust to heteroskedasticity or autocorrelation. So for one-step, robust estimation (and for all two-step estimation), xtabond2 also reports the Hansen J statistic, which is the minimized value of the two-step GMM criterion function, and is robust. xtabond2 still reports the Sargan statistic in these cases because the J test has its own problem: it can be greatly weakened by instrument proliferation. The Mata version goes further, reporting difference-in-Sargan statistics (really, difference-in-Hansen statistics, except in one-step robust estimation), which test for whether subsets of instruments are valid. To be precise, it reports one test for each group of instruments defined by an ivstyle() or gmmstyle() option (explained below). So replacing gmmstyle(x y) in a command line with gmmstyle(x) gmmstyle(y) will yield the same estimate but distinct difference-in-Sargan/Hansen tests. In addition, including the split suboption in a gmmstyle() option in system GMM splits an instrument group in two for difference-in-Sargan/Hansen purposes, one each for the transformed and levels equations. This is especially useful for testing the instruments for the levels equation based on lagged differences of the dependent variable, which are the most suspect in system GMM and the subject of the "initial conditions" in the title of Blundell and Bond (1998). In the same vein, in system GMM, xtabond2 also tests all the GMM-type instruments for the levels equation as a group. All of these tests, however, are weak when the instrument count is high. Difference-in-Sargan/Hansen tests are computationally intensive since they involve re-estimating the model for each test; the nodiffsargan option is available to prevent them.

As linear GMM estimators, the Arellano-Bond and Blundell-Bond estimators have one- and two-step variants. But though two-step is asymptotically more efficient, the reported two-step standard errors tend to be severely downward biased (Arellano and Bond 1991; Blundell and Bond 1998). To compensate, xtabond2 makes available a finite-sample correction to the two-step covariance matrix derived by Windmeijer (2005). This can make two-step robust estimations more efficient than one-step robust, especially for system GMM.

Standard errors can also be "bootstrapped"--but not with the bootstrap command. That command builds temporary data sets by sampling the real one with replacement. And having multiple observations for a given observational unit and time period violates panel structure. Instead, use jackknife, perhaps with the cluster() option, clustering on the panel identifier variable, in order to drop each observational unit in turn.

The syntax of xtabond2 differs substantially from that of xtabond and xtdpdsys. xtabond2 almost completely decouples specification of regressors from specification of instruments. As a result, most variables used will appear twice in an xtabond2 command line. xtabond2 requires the initial varlist of the command line to include all regressors except for the optional constant term, be they strictly exogenous, predetermined, or endogenous. Variables used to form instruments then appear in gmmstyle() or ivstyle() options after the comma. The result is a loss of parsimony, but fuller control over the instrument matrix. Variables can be used as the basis for "GMM-style" instrument sets without being included as regressors, or vice versa. The gmmstyle() and ivstyle() options also have suboptions that allow further customization of the instrument matrix.
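
The grouping of instruments for difference-in-Sargan/Hansen testing, described above, can be sketched as follows. The three commands use the same instruments but define the groups differently; the specification itself (w as a GMM-style instrument base, k as strictly exogenous) is an assumption made only to illustrate the syntax.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year

* One GMM-style group: a single difference-in-Hansen test covers L.n and w
xtabond2 n L.n w k, gmm(L.n w) iv(k) robust twostep
scalar j_joint = e(j)

* Two GMM-style groups: same instruments, one test per group
xtabond2 n L.n w k, gmm(L.n) gmm(w) iv(k) robust twostep
scalar j_split = e(j)

* split: the L.n group is further divided into instruments for the
* transformed equation and instruments for the levels equation
xtabond2 n L.n w k, gmm(L.n, split) gmm(w) iv(k) robust twostep

display "instrument count, joint groups: " j_joint ", separate groups: " j_split
```

Only the test tables in the output should differ across the three runs; the instrument count is the same because the instrument matrix contains the same columns.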

Citation

xtabond2 is not an official Stata command. It is a free contribution to the research community. Please cite it as such: Roodman, D. 2009. How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal 9(1): 86-136.

Options

level(#) specifies the confidence level, in percent, for confidence intervals of the coefficients; see help level. The default is 95.

svmat tells xtabond2 to save the X, Y, Z, H, and weight matrices as e() return matrices. These are not included by default because the matrices can be larger than the data set itself. If the pca option is used, svmat will also save the eigenvectors matrix as xtabond2_eigenvectors. This option is available only when using the Mata implementation in Mata's speed-favoring mode. Data are stored in balanced matrices and sorted by individual, equation (for system GMM), then time. Rows and columns are labelled for clarity. The instrument matrix typically contains all-zero columns, which do not affect estimation. For compatibility with Stata column-labeling conventions, instruments subject to the backward orthogonal deviations transform (see below) are still denoted with a "D." operator.

twostep specifies that the two-step estimator is to be calculated instead of the one-step.

robust: For one-step estimation, robust specifies that the robust estimator of the covariance matrix of the parameter estimates be calculated. The resulting standard error estimates are consistent in the presence of any pattern of heteroskedasticity and autocorrelation within panels. In two-step estimation, the standard covariance matrix is already robust in theory--but typically yields standard errors that are downward biased. twostep robust requests Windmeijer's finite-sample correction for the two-step covariance matrix.

cluster(varname) overrides the default use of the panel identifier (as set by tsset) as the basis for defining groups. cluster(varname) implies robust in the senses just described. For example, in two-step estimation, it requests the Windmeijer correction. Changing the group identifier with this option affects one-step "robust" standard errors, all two-step results, the Hansen and difference-in-Hansen tests, and the Arellano-Bond serial correlation tests.

noconstant suppresses the constant term in the levels equation. By default, the term is included as a regressor and IV-style instrument. Unlike xtabond and DPD (the original implementation of these estimators), xtabond2 does not include the constant term in the transformed equation in difference GMM. Rather, the constant is transformed out.

small requests t statistics instead of z statistics and an F test instead of a Wald chi-squared test of overall model fit.

noleveleq specifies that the levels equation should be excluded from the estimation, yielding difference rather than system GMM.

nodiffsargan prevents difference-in-Sargan/Hansen tests, which are computationally intensive since they involve re-estimating the model for each test. The option has no effect on the ado version of xtabond2, which does not perform difference-in-Sargan/Hansen testing anyway.

nomata prevents the use of Mata code even when the language is available (in Stata 10.0 or later). It is not necessary in Stata 7-9. Ordinarily this switch does not affect results. However, if some variables are collinear or nearly so, the two versions of the program may drop different ones, which can affect the results. They can even differ in how many they drop, since the versions use different routines and tolerances for determining collinearity. In addition, the Mata version does not perfectly handle strange and unusual expressions like gmm(L.x, lag(-1 -1)). (Documentation for the gmmstyle() option is below.) This expression is the same as gmm(x, lag(0 0)) in principle. But the Mata code would interpret it by lagging x, thus losing the observations of x for t=T, then unlagging the remaining information. The slow, ado version would not lose data in this way.

orthogonal requests the forward orthogonal deviations transform instead of differencing.

ivstyle() specifies a set of variables to serve as standard instruments, with one column in the instrument matrix per variable. Normally, strictly exogenous regressors are included in ivstyle options, in order to enter the instrument matrix, as well as being listed before the main comma of the command line. The equation() suboption specifies which equation(s) should use the instruments: first-difference only (equation(diff)), levels only (equation(level)), or both (equation(both)), the default. Also by default, the instruments are transformed (into differences or orthogonal deviations) for use in the transformed equation and entered untransformed for the levels equation. The suboption passthru may be used after equation(diff), or when the option noleveleq is invoked, to prevent this transformation. equation() is useful for proper handling of predetermined variables used as IV-style instruments in system GMM. For example, if x is predetermined, it is a valid instrument for the levels equation since it is assumed to be uncorrelated with the contemporaneous error term. However, x becomes endogenous in first differences, so D.x is not a valid instrument for the transformed equation. ivstyle(x) would therefore be inappropriate. The use of x as an IV-style instrument in levels only could be specified by iv(x, eq(level)).

If the suboption mz is included in an ivstyle option, missing values in the instruments are converted to zeroes. mz does not change the precise moment conditions generated by ivstyle--they still apply only to the error terms of observations which have data for the instruments. Rather, mz allows observations that are missing data for the instruments in question to nonetheless stay in the regression if the instruments are not also regressors. (Observations missing values for regressors must still be dropped.)
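
The equation(level) case discussed above can be sketched as below. For the sake of the example only, w is assumed to be predetermined and k strictly exogenous; neither assumption is claimed to hold for the abdata.dta variables.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year

* k (assumed strictly exogenous): IV-style instrument in both equations.
* w (assumed predetermined): IV-style instrument in the levels equation
* only, because D.w would be endogenous in the transformed equation.
xtabond2 n L.n w k, gmm(L.n) iv(k) iv(w, equation(level)) robust
```

Because each ivstyle() option defines its own instrument group, the output should also show separate difference-in-Sargan/Hansen tests for the two groups.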

gmmstyle() specifies a set of variables to be used as bases for "GMM-style" instrument sets described in Holtz-Eakin, Newey, and Rosen (1988) and Arellano and Bond (1991). By default xtabond2 uses, for each time period, all available lags of the specified variables in levels dated t-1 or earlier as instruments for the transformed equation; and uses the contemporaneous first differences as instruments in the levels equation. These defaults are appropriate for predetermined variables that are not strictly exogenous (Bond 2002). Missing values are always replaced by zeros. The optional laglimits(a b) suboption can override these defaults: for the transformed equation, lagged levels dated t-a to t-b are used as instruments, while for the levels equation, the first-difference dated t-a+1 is normally used. a and b can each be missing ("."); a defaults to 1 and b to infinity. They can even be negative, implying "forward" lags. If a>b then xtabond2 swaps their values. (Note that if a<=b<0 then the first-difference dated t-b+1 is normally used as an instrument in the levels equation instead of that dated t-a+1, because it is more frequently in the range [1,T] of valid time indexes. Or, for the same reasons, if a<=0<=b or b<=0<=a, the first-difference dated t is used.) Since the gmmstyle() varlist allows time-series operators, there are many routes to the same specification. E.g., gmm(w, lag(2 .)), the standard treatment for an endogenous variable, is equivalent to gmm(L.w, lag(1 .)), thus gmm(L.w).

The equation() suboption of gmmstyle() works much like that of ivstyle() (see above), with one important exception. In response to equation(level), xtabond2 generates the full set of available instruments for the levels equation since it is no longer the case that most are made mathematically redundant by the presence of the full set of moment conditions for the transformed equation. To be precise, if the lag limits are a and b, then lags of the specified variables in differences dated t-b to t-a are used. equation(diff) has no effect in difference GMM.

The passthru suboption of gmmstyle() is meaningful only in system GMM, and only for variables for which equation(level) has also been specified. It directs xtabond2 to create instruments for the levels equation that use not the first-differences of the specified variables but the original levels of the same dates. For example, equation(level) passthru laglimits(1 .) requests that all lagged levels be used as instruments. Under the standard assumptions, these instruments are not valid.

The orthogonal suboption tells xtabond2 to apply the backward orthogonal deviations transform to the instruments for the transformed equation. Essentially, instruments are replaced with their deviations from past means. Since the resulting instruments depend on all past values of the underlying variables, the regressors in the transformed equation should not be similarly transformed. Otherwise the instruments may be correlated with the error. That is, if this suboption is used the orthogonal option should also be included (outside a gmmstyle() option). In simulations, Hayakawa (2009) finds that "Difference GMM" with this combination--backward orthogonal deviations for the instruments and forward for the regressors--is less biased and more stable than traditional Difference GMM for a standard AR(1) model when T>=10. (For an AR(p) model, he uses only the most recent p instrument lags, equivalent to gmm(L.y, orthog lag(1 p)).) This option does not affect the instruments for the levels equation.

The split suboption of gmmstyle() is also meaningful only in system GMM, and then only when neither eq(diff) nor eq(level) is specified. Its sole effect is to split the specified instrument group in two for purposes of difference-in-Sargan/Hansen testing--one instrument set for the transformed equation and one for the levels equation.

The collapse suboption of gmmstyle() specifies that xtabond2 should create one instrument for each variable and lag distance, rather than one for each time period, variable, and lag distance. In large samples, collapse reduces statistical efficiency. But in small samples it can avoid the bias that arises as the number of instruments climbs toward the number of observations. (When instruments are many, they tend to overfit the instrumented variables and bias the results toward those of OLS/GLS.) collapse also greatly curtails computational demands by reducing the width of the instrument matrix, and (relevant for the ado version of the program) helps keep the matrix within Stata's size limit.

For example, if a model assumes that E[w_is*D.e_it] = 0 for all s<t, this is expressed in standard Arellano-Bond estimation as:

sum_i (w_is * D.e_it) = 0 for each s and t, s<t.

This translates into columns in the instrument matrix of the form:

    w_i1  0     0     0     0     0     ...
    0     w_i1  w_i2  0     0     0     ...
    0     0     0     w_i1  w_i2  w_i3  ...
    .     .     .     .     .     .     ...
    .     .     .     .     .     .     ...

collapse divides the "GMM-style" moment conditions into groups and sums the conditions in each group to form a smaller set of conditions of the form:

    sum_i,t (w_i,t-j * D.e_it) = 0 for each j>0.

This is equivalent to combining columns of the instrument matrix by addition, yielding:

    w_i1  0     0     ...
    w_i2  w_i1  0     ...
    w_i3  w_i2  w_i1  ...
    .     .     .     ...
    .     .     .     ...

Similarly, the standard instruments for the levels equation (in system GMM) collapse from:

    D.w_i2  0       0       ...
    0       D.w_i3  0       ...
    0       0       D.w_i4  ...
    .       .       .       ...

to the single column:

    D.w_i2
    D.w_i3
    D.w_i4
    .
    .
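
The effect of the laglimits() and collapse suboptions on the width of the instrument matrix can be sketched by comparing the stored instrument count e(j) across three otherwise identical commands. The model and data set are illustrative assumptions.

```stata
* Assumption: xtabond2 is installed; abdata.dta has panel id and time year
use http://www.stata-press.com/data/r7/abdata.dta, clear
tsset id year

* Default: one instrument per time period and lag distance
xtabond2 n L.n w k, gmm(L.n) iv(w k) noleveleq robust
scalar j_full = e(j)

* Restrict the lags used as instruments to the first two
xtabond2 n L.n w k, gmm(L.n, laglimits(1 2)) iv(w k) noleveleq robust
scalar j_lag = e(j)

* Collapse: one instrument per lag distance
xtabond2 n L.n w k, gmm(L.n, collapse) iv(w k) noleveleq robust
scalar j_collapse = e(j)

display "all lags: " j_full "   lags 1-2: " j_lag "   collapsed: " j_collapse
```

Both restrictions should yield fewer instruments than the default; which is preferable depends on the application.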

pca tells xtabond2 to replace the "GMM-style" instruments with their principal components in order to reduce the instrument count in a minimally arbitrary way (Kapetanios and Marcellino 2010; Bai and Ng 2010; Mehrhoff 2009). Principal components analysis is run on the correlation, not covariance, matrix of the "GMM-style" instruments. By default xtabond2 will select all components with eigenvalues at least 1, and will select more if necessary to guarantee that instruments are at least as numerous as regressors, favoring those with largest eigenvalues.

components(#) allows the user to override the default number of components described just above.

artests(#) specifies the maximum order of the autocorrelation tests to be reported. The default is 2.

arlevels specifies that the autocorrelation tests should be applied to the residuals from the levels, not first-difference, equation. It cannot be specified along with noleveleq. If there are fixed effects, then autocorrelation in levels is expected and would not call the specification into question.

h(#) controls the form of H, the a priori estimate of the covariance matrix of the idiosyncratic errors. In one-step linear GMM, the inverse of Z'HZ, where Z is the instrument matrix, proxies for the covariance matrix of the moments, and is used to weight the sample moments whose magnitudes are jointly minimized. Since H merely controls the weights on instruments believed exogenous, for any non-degenerate choice of H, one-step estimates will be consistent. And two-step estimates will be asymptotically efficient (Baum, Schaffer, and Stillman 2003). So the priority in designing H is minimizing arbitrariness. H always has block diagonal form, with all blocks the same. Let * indicate variables transformed by orthogonal deviations or differencing and M be the (T-1)xT matrix that performs the chosen transform. We assume for the purposes of designing H that var[e]=I, the identity matrix. Then, for difference GMM, the (T-1)x(T-1) blocks of H by default are MM', which is var[u*] (= var[e*]) when var[e]=I (see Roodman 2009). For orthogonal deviations, MM'=I. For differencing, it is:

     2  -1   0  ...
    -1   2  -1  ...
     0  -1   2  ...
     .   .   .  ...
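
The two forms of MM' stated above can be checked numerically with a short Mata sketch for a balanced panel with T=4. The names D and F and the choice of T are hypothetical, and the scale factor sqrt((T-t)/(T-t+1)) used for forward orthogonal deviations is the one given by Arellano and Bover (1995); it is not spelled out in this help file.

```stata
mata:
T = 4

// First-difference transform: row t computes e_(t+1) - e_t
D = J(T-1, T, 0)
for (t = 1; t <= T-1; t++) {
    D[t, t]   = -1
    D[t, t+1] =  1
}
D * D'      // tridiagonal: 2 on the diagonal, -1 beside it

// Forward orthogonal deviations: row t subtracts the mean of the
// T-t later observations, then rescales
F = J(T-1, T, 0)
for (t = 1; t <= T-1; t++) {
    c = sqrt((T-t) / (T-t+1))
    F[t, t] = c
    for (s = t+1; s <= T; s++) {
        F[t, s] = -c / (T-t)
    }
}
F * F'      // identity matrix, up to rounding error
end
```

This only illustrates the algebra behind the default H; it does not reproduce the internal code of xtabond2.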

To perform system GMM, xtabond2 treats the transformed data as being for periods 2 to T and levels data as being for periods T+1 to 2T. The blocks of H are then (2T-1)x(2T-1) a priori estimates of the covariance of the compound vector [u*' u']'. If we assume, in addition to var[e]=I, that var[v]=0 (no fixed effects), then the blocks of H are

    MM'  M
    M'   I

However, more than one choice for H is present in the literature. In xtabond2, h(3), the default, specifies the matrices described above. h(2) differs in that for system GMM the upper right and lower left quadrants of the depicted H are zeroed out. This copies current versions of DPD for Gauss and Ox (Arellano and Bond 1998; Doornik, Arellano, and Bond 2002). h(1) specifies that H=I for both difference and system GMM. H took this value in the original implementation of the system GMM estimator, in Blundell and Bond (1998). In one-step GMM, setting H=I essentially gives 2SLS.

The Mata system parameter matafavor influences the behavior of the Mata version of xtabond2. Type mata: mata set matafavor speed or mata: mata set matafavor space before running xtabond2 to influence the tradeoff it makes between speed and memory use. Add the , perm option to these commands to make the change permanent.

Note: Increasing the amount of memory available for Stata data sets using the set memory command reduces that available to Mata. So if Mata xtabond2 is running out of memory, usually indicated by an "unable to allocate real" message, also try reducing Stata memory with set memory.

Options for predict

xb, the default, calculates the linear prediction.

residuals calculates the residual error of the dependent variable from the linear prediction.

difference requests that the first-differences of the dependent variable, rather than the levels, be predicted.

Return values

Scalars

    e(N)                Number of complete observations in untransformed data (system GMM) or transformed data (difference GMM)
    e(sargan)           Sargan statistic
    e(sar_df)           Degrees of freedom for Sargan statistic
    e(sarganp)          p value of Sargan statistic
    e(hansen)           Hansen J statistic
    e(hansen_df)        Degrees of freedom for Hansen statistic
    e(hansenp)          p value of Hansen statistic
    e(artests)          Number of AR tests requested
    e(ari)              AR(i) test statistic
    e(arip)             p value of AR(i) statistic
    e(df_m)             Model degrees of freedom
    e(df_r)             Residual degrees of freedom (if small specified)
    e(chi2)             Wald chi-squared statistic (if small not specified)
    e(chi2p)            p value of Wald statistic (if small not specified)
    e(sig2)             Estimated variance of the e_it
    e(sigma)            Square root thereof
    e(F)                F statistic (if small specified)
    e(F_p)              p value of F statistic (if small specified)
    e(g_min)            Lowest number of observations in an included individual
    e(g_max)            Highest number of observations in an included individual
    e(g_avg)            Average number of observations per included individual
    e(h)                Value of h() option (default is 3)
    e(j)                Number of instruments
    e(j0)               Number of instruments, including collinear ones
    e(N_g)              Number of included individuals
    e(N_clust)          Number of clusters
    e(components)       Number of components extracted if pca option invoked
    e(kmo)              Kaiser-Meyer-Olkin measure of sampling adequacy if pca option invoked
    e(pcaR2)            Sum of eigenvalues of included components divided by sum of all

Macros

    e(predict)          "xtab2_p"
    e(artype)           "first differences" or "levels"
    e(vcetype)          "Robust" for one-step robust, "Corrected" for twostep robust, empty otherwise
    e(twostep)          "twostep" for twostep
    e(small)            "small" for small
    e(esttype)          "system" or "difference"
    e(pca)              "pca" if pca option invoked
    e(gmminstsi)        Variables listed in gmmstyle group i
    e(ivinstsi)         Variables listed in ivstyle group i
    e(transform)        "first differences" or "orthogonal deviations"
    e(depvar)           Dependent variable
    e(clustvar)         Clustering group identifier
    e(tvar)             Time variable
    e(ivar)             Individual (panel) variable
    e(cmd)              "xtabond2"
    e(cmdline)          Full command line
    e(diffgroupi)       Variables in ith group subject to difference-Sargan/Hansen testing

Matrices

    e(b)                Coefficient vector
    e(V)                Variance-covariance matrix
    e(A1)               First-step GMM weighting matrix
    e(A2)               Second-step GMM weighting matrix (if twostep specified)
    e(Ze)               Z'E where E=2nd-step residuals, used in computing Hansen statistic
    e(eigenvalues)      Eigenvalues of principal components of GMM-style instruments (if pca specified)
    e(diffsargan)       Table of difference-in-Sargan/Hansen tests
    e(ivequation)       Value of equation() suboption for each ivstyle() option, in order (0=level, 1=diff, 2=both)
    e(ivpassthru)       Value of passthru option for each ivstyle() option
    e(ivmz)             Value of mz suboption for each ivstyle() option
    e(gmmequation)      Value of equation() suboption for each gmmstyle() option (0=level, 1=diff, 2=both)
    e(gmmpassthru)      Value of passthru option for each gmmstyle() option
    e(gmmpasscollapse)  Value of collapse option for each gmmstyle() option
    e(gmmlaglimits)     Lag limits for each gmmstyle() option
    e(gmmorthogonal)    Value of orthogonal option for each gmmstyle() option
    e(X)                Matrix of right-side variables used in estimation, if svmat invoked
    e(Y)                Column of dependent variable used in estimation, if svmat invoked
    e(Z)                Instrument matrix used in estimation, if svmat invoked
    e(H)                H matrix used in estimation, if svmat invoked
    e(wt)               Weight vector used in estimation, if svmat invoked and weights used
    e(eigenvectors)     Principal component scores, if svmat and pca invoked

Functions

    e(sample)           Marks estimation sample

Examples

    use http://www.stata-press.com/data/r7/abdata.dta
    xtabond2 n l.n l(0/1).(w k) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984, passthru) noleveleq small
    xtabond2 n l.n l(0/1).(w k) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984, mz) robust twostep small h(2)
    xtabond2 n l(1/2).n l(0/1).w l(0/2).(k ys) yr1980-yr1984, gmm(l.n w k) iv(yr1980-yr1984) robust twostep small

    * Next two are equivalent, assuming id is the panel identifier
    ivreg2 n cap (w = k ys rec) [pw=_n], cluster(ind) orthog(rec)
    xtabond2 n w cap [pw=_n], iv(cap k ys, eq(level)) iv(rec, eq(level)) cluster(ind) h(1)

    * Same for next two
    regress n w k
    xtabond2 n w k, iv(w k, eq(level)) small h(1)

    * And next two, assuming xtabond updated since May 2004 with update command.
    xtabond n yr*, lags(1) pre(w, lags(1,.)) pre(k, endog) robust small noconstant
    xtabond2 n L.n w L.w k yr*, gmm(L.(w n k)) iv(yr*) noleveleq robust small

    * And next two
    xtdpd n L.n L(0/1).(w k) yr1978-yr1984, dgmm(w k n) lgmm(w k n) liv(yr1978-yr1984) vce(robust) two hascons
    xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n)) iv(yr1978-yr1984, eq(level)) h(2) robust twostep

    * Three ways to reduce the instrument count
    xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n)) iv(yr1978-yr1984, eq(level)) h(2) robust twostep pca
    xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n), collapse) iv(yr1978-yr1984, eq(level)) h(2) robust twostep
    xtabond2 n L.n L(0/1).(w k) yr1978-yr1984, gmm(L.(w k n), lag(1 1)) iv(yr1978-yr1984, eq(level)) h(2) robust twostep

    * Estimation a la Hayakawa 2009
    xtabond2 n L.n L(0/1).(w k) yr1979-yr1984, gmm(L.(w k n), lag(1 1) orthog) iv(yr1979-yr1984, eq(level)) h(2) robust twostep orthog noleveleq

Three sample files are included with the package downloaded with this command. abest.do reproduces two sample files that come with DPD for Ox, which in turn generate most of the GMM results in Arellano and Bond (1991). bbest.do reproduces another sample file that comes with DPD for Ox, based on Blundell and Bond (1998). greene.do reproduces an example in Greene (2002). To download them, type the following command or click on it: ssc install xtabond2, all replace. This will save the files to your current directory, as set by the cd command.

References

Arellano, M., and S. Bond. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies 58: 277-97.
Arellano, M., and S. Bond. 1998. Dynamic Panel data estimation using DPD98 for Gauss: A guide for users.
Arellano, M., and O. Bover. 1995. Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68: 29-51.
Bai, J., and S. Ng. 2010. Instrumental Variables Estimation in a Data Rich Environment. Econometric Theory 26(6): 1577-1606.
Baum, C.F., M.E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing. Stata Journal 3: 1-31.
Blundell, R., and S. Bond. 1998. Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115-43.
Bond, S. 2002. Dynamic panel data models: A guide to micro data methods and practice. Working Paper 09/02. Institute for Fiscal Studies, London.
Doornik, J.A., M. Arellano, and S. Bond. 2002. Panel data estimation using DPD for Ox. http://www.nuff.ox.ac.uk/Users/Doornik.
Greene, W.H. 2002. Econometric Analysis, 5th ed. Prentice-Hall.
Hayakawa, K. 2009. A simple efficient instrumental variable estimator for panel AR(p) models when both N and T are large. Econometric Theory 25: 873-90.
Holtz-Eakin, D., W. Newey, and H.S. Rosen. 1988. Estimating vector autoregressions with panel data. Econometrica 56: 1371-95.
Kapetanios, G., and M. Marcellino. 2010. Factor-GMM estimation with large sets of possibly weak instruments. Computational Statistics & Data Analysis 54(11): 2655-75.
Mehrhoff, J. 2009. A solution to the problem of too many instruments in dynamic panel data GMM. Discussion Paper Series 1. No 31/2009.
Roodman, D. 2009. How to Do xtabond2: An Introduction to "Difference" and "System" GMM in Stata. Stata Journal 9(1): 86-136.
Windmeijer, F. 2005. A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of Econometrics 126: 25-51.

Author

    David Roodman
    Senior Fellow
    Center for Global Development
    Washington, DC
    droodman@cgdev.org

Also see

    Manual:  [U] 23 Estimation and post-estimation commands,
             [U] 29 Overview of Stata estimation commands,
             [XT] xtabond

    Online:  help for xtabond, ivreg, ivreg2, estcom, postest; xtgee,