.-
help for ^-b1x2-^
.-

Accounting for changes when X2 is added to a base model with X1
----------------------------------------------------------------------------

    ^b1x2^ depvar [^if^ exp] [^in^ range] [^aweight^ ^fweight^ exp],
              ^x1all^([vars included in both specifications])
              ^x2all^([vars included only in full specification])
              [^x1endog^([list of endogenous vars in x1all])
              ^iv^([instruments for x1])
              ^r^obust ^c^luster([cluster varname])
              ^noBase^ ^noFull^
              ^x1only^([list of x1 vars for which user wants decomposition computed])
              ^x2delta^([definition of groups for x2 vars])
              ^gamma0^ ^cov0^]

^-b1x2-^ is an eclass estimator and preserves sort order

To reset problem-size limits, see help @matsize@.

Note: This help file is long but cannot explain everything. An example
appears below, illustrating how the command works. Play around with it
yourself to get used to it. It is always possible that logfile snippets will
appear at http://gelbach.eller.arizona.edu/papers/b1x2/index.html, so feel
free to check there.

Description
-----------

^-b1x2-^ provides decompositions of cross-specification differences in
estimates of the coefficient on X1. Because the decomposition is conditional,
^-b1x2-^ is a superior alternative to the common practice of sequentially
adding covariates to a base model. The command also computes consistent
estimates of the standard error of the difference in base- and
full-specification coefficient estimates.

The command was written by Jonah B. Gelbach and is based on his paper, "When
Do Covariates Matter? And Which Ones, And How Much?", University of Arizona
Department of Economics Working Paper #09-07. This paper is currently
available at

    http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1425737

In its simplest form, ^-b1x2-^ does the following:

  A. Runs the "base" regression

        . regress depvar x1all

  B. Runs the "full" regression

        . regress depvar x1all x2all

  C. Computes both the difference in the coefficient estimates for x1all
     (including the constant) and a consistent estimate of the asymptotic
     covariance matrix for this difference.

The covariance matrix used in step C is discussed in Appendix B of the paper
cited above.

Options
-------

MANDATORY:

^x1all^               List of vars included in both base and full regs

^x2all^               List of vars included in only the full reg

OPTIONAL:

^robust^              Specifies heteroskedasticity-robust covariance matrix

^cluster^(^varname^)  Specifies covariance matrix that accounts for arbitrary
                      dependence within values of specified variable

^x1endog^(^varlist^)  Specifies a list of endogenous variables in x1all. This
                      list can be identical to x1all, or a subset. All
                      endogenous variables must also be listed in x1all.

^iv^                  Specifies instruments used for variables listed in
                      ^x1endog^ in both base and full regressions

^nofull^              Specifies that full regression results should be
                      suppressed from output (they are always computed).

^nobase^              Specifies that base regression results should be
                      suppressed from output (they are always computed).

^x1only^              Specifies a subset of x1all variables for which the
                      decomposition should be done. Use of this option is
                      strongly encouraged when x1all contains more than a few
                      variables and only a limited number of them (perhaps
                      just one) are of interest. Using x1only makes
                      computation quicker and also reduces system demands by
                      reducing the dimension of various internally defined
                      matrices.

^x2delta^             Specifies how variables in x2all should be split into
                      groups for purposes of decomposing the difference in x1
                      coefficient estimates across the base and full
                      specifications. See below.

Methods and Formulas
--------------------

1. The model
------------

Let the model be

    y = b0 + X1 b1 + X2 b2 + e,

where b1 and b2 are, respectively, k1- and k2-dimensional column vectors,
with at least one element of b2 being nonzero. Assume that
E[e|X1] = E[e|X1,X2] = 0.
The rest of the econometric discussion will assume that the matrix [y X1 X2]
has been demeaned, so that b0 can be taken as identically 0, though this is
for exposition only.

2. Estimating the impact on b1hat of including X2
-------------------------------------------------

Define the ``full estimator'' b1full by

    b1full = (X1' M2 X1)^-1 (X1' M2 y),

where M2 = I - P2 = I - X2 (X2'X2)^-1 X2' is the residual-maker matrix for
X2. This estimator has plim b1, so it is consistent.

Next, write

    X2 = X1 Gamma + W,

where Gamma is the (k1 x k2) matrix of coefficients from the linear
projection of X2 on X1. It follows definitionally that E[X1'W] = 0.

Consider the base estimator of b1:

    b1base = (X1'X1)^-1 X1'y

This estimator has plim b1 + d1, where d1 = Gamma b2, so b1base is consistent
for b1 if and only if either (i) Gamma=0 or (ii) Gamma and b2 are orthogonal,
that is, Gamma b2 = 0 (we ruled out b2=0 by assumption, since this can most
straightforwardly be tested by running an F/Chi2 test on the restriction
b2=0).

Define the difference between the estimates of b1 as

    dhat = b1base - b1full,

and note that its plim is d1.

3. Decomposing dhat into component parts
----------------------------------------

It is easy to show (see Gelbach paper cited above) that

    dhat = (X1'X1)^-1 X1'X2 b2hat,

where b2hat comes from the full model. Further,

    X2 b2hat = sum_k X2k b2hatk,

where k indexes variables in X2. Now define Hhat = X2 b2hat, so that

    dhat = (X1'X1)^-1 X1'Hhat.

Now consider G mutually exclusive groups of covariates in X2, indexed by g.
We can write

    Hhat = sum_g Hhatg,

where Hhatg = sum_{k in group g} X2k b2hatk. Also define

    dhatg = (X1'X1)^-1 X1'Hhatg,

and note that since the G Hhatg terms sum exactly to the overall Hhat, we
must have

    dhat = sum_g dhatg.

Since dhat exactly equals the difference in base- and full-specification
estimates of the coefficient on X1, it follows that the dhatg components
together account exactly for the difference between the base- and
full-specification estimates.
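
The exact additivity above can be checked by hand with ordinary regressions.
What follows is only a minimal sketch and is not part of b1x2 itself: the
simulated data and the names H1, H2, d1, d2, b1base and b1full are made up
for illustration, and each group holds a single x2 variable. Because regress
includes a constant, the coefficient on x1 from regressing Hhatg on x1 is the
x1 element of dhatg.

    . clear
    . set obs 1000
    . gen double x1  = invnorm(uniform())
    . gen double x21 = x1 + invnorm(uniform())
    . gen double x22 = 0.5*x1 + invnorm(uniform())
    . gen double y   = x1 + 2*x21 + 0.5*x22 + invnorm(uniform())

    *base and full estimates of the coefficient on x1
    . quietly regress y x1
    . scalar b1base = _b[x1]
    . quietly regress y x1 x21 x22
    . scalar b1full = _b[x1]

    *Hhatg for two hypothetical groups, using the full-model estimates
    . gen double H1 = _b[x21]*x21
    . gen double H2 = _b[x22]*x22

    *dhatg is the coefficient on x1 when Hhatg is regressed on x1
    . quietly regress H1 x1
    . scalar d1 = _b[x1]
    . quietly regress H2 x1
    . scalar d2 = _b[x1]

    *these two numbers should agree up to rounding error
    . di scalar(b1base) - scalar(b1full)
    . di scalar(d1) + scalar(d2)

This reproduces only the point estimates of the components. It says nothing
about their standard errors, which are the part that b1x2 computes.
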
Gelbach's paper shows that these components are also meaningful.

To define the groups, you use the option x2delta, as follows:

    x2delta( groupname1 = group1varlist : ... : groupnameG = groupGvarlist)

In other words, the syntax of x2delta is to include a series of G strings.
The gth string has the form groupname = varlist, and the strings must be
separated by ":", the colon character. Make sure that you do not include the
same variable in multiple strings (b1x2 should throw an error in such cases,
but it's better to get it right in the first place).

4. Variance options
-------------------

There are two variance options, gamma0 and cov0.

The option gamma0 amounts to telling b1x2 to impose the null hypothesis that
x1all and x2all variables are orthogonal when estimating the decomposition's
variance matrix. This option may lead to more powerful tests of the null,
though resulting standard errors are inconsistent when the null is false.

The option cov0 tells b1x2 to ignore covariance between estimated components
of b2 and estimated components of Gamma. This option is appropriate under
certain conditions discussed in Appendix B of Gelbach's paper.

5. Other notes
--------------

b1x2 has internal checks to ensure that no variable dropped from the full
specification will appear in the base specification. This requirement could
lead to problems if the full model involves variables that cause a variable
in x1all to be dropped due to perfect collinearity. For this reason, it's
important to make sure that all variables you expect to be included in the
base specification actually are.

b1x2 also imposes the requirement that the same sample be used in each
specification. This is a feature, not a bug.

Saved Results
-------------

^-b1x2-^ is an eclass command.
It saves results in the following places:

Scalars:

    e(N)          = number of observations
    e(k1)         = number of variables in x1all, including the constant
    e(k2)         = number of variables in x2all
    e(numiv)      = number of instrumental variables
    e(numx1endog) = number of endogenous variables
    e(N_clust)    = number of clusters

Local macros:

    e(groupnames)   string of groupnames (see discussion of x2delta)
    e(cmd)          "b1x2"
    e(depvar)       dependent variable name
    e(weight)       type of weights and varname
    e(if)           if condition
    e(in)           in condition
    e(robust)       flag for robust estimation
    e(cluster)      cluster varname, if any

Matrices:

    e(Delta)        vector of decomposition elements (reported in table of
                    results)
    e(Covdelta)     estimated covariance matrix for Delta
    e(b1base)       vector of base-specification coefficient estimates
    e(V1base)       estimated covariance matrix for b1base
    e(b1full)       vector of full-specification coefficient estimates for X1
    e(V1full)       estimated covariance matrix for b1full
    e(b2full)       vector of full-specification coefficient estimates for X2
    e(V2full)       estimated covariance matrix for b2full
    e(bfull)        vector of full-specification coefficient estimates for X1
                    and X2
    e(Vfull)        estimated covariance matrix for bfull

Functions:

    e(sample)       Variable =1 if included and =0 if not included

Examples
--------

Here's one you can mess around with on your own:

. set obs 1000
obs was 0, now 1000

. gen double x1 = invnorm(uniform())
. gen double x21 = x1*1 + invnorm(uniform())
. gen double x22 = x1*0.25 + x21*0.75 + invnorm(uniform())
. gen double x23 = x1*0.4 + x21*0.6 + x22*0.4 + invnorm(uniform())
. gen double y = x1*1 + x21*2 + x22*0.5 + x23*0.75 + invnorm(uniform())

*here's the base model
. reg y x1

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) = 1897.98
       Model |  21304.6529     1  21304.6529           Prob > F      =  0.0000
    Residual |  11202.4862   998  11.2249361           R-squared     =  0.6554
-------------+------------------------------           Adj R-squared =  0.6550
       Total |  32507.1391   999  32.5396788           Root MSE      =  3.3504

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   4.608666   .1057864    43.57   0.000     4.401077    4.816255
       _cons |   .0724791   .1059478     0.68   0.494     -.135427    .2803851
------------------------------------------------------------------------------

*let's save the estimated coefficient on x1:
. scalar bx1base = _b[x1]

*here's the full model
. reg y x*

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  4,   995) = 7542.43
       Model |  31469.2792     4   7867.3198           Prob > F      =  0.0000
    Residual |  1037.85989   995  1.04307527           R-squared     =  0.9681
-------------+------------------------------           Adj R-squared =  0.9679
       Total |  32507.1391   999  32.5396788           Root MSE      =  1.0213

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.072417   .0483803    22.17   0.000     .9774781    1.167356
         x21 |   1.887788   .0461923    40.87   0.000     1.797143    1.978434
         x22 |   .5403505    .035649    15.16   0.000     .4703945    .6103064
         x23 |   .7604411   .0336674    22.59   0.000     .6943737    .8265084
       _cons |   .0205694   .0323558     0.64   0.525     -.042924    .0840629
------------------------------------------------------------------------------

*let's save the estimated coefficient on x1:
. scalar bx1full = _b[x1]

*here's the difference in the estimated coefficient on x1:
. di bx1base-bx1full
3.5362488

*here's b1x2 at work. comments:
*
*   1. the base specification includes only x1 (and a constant)
*   2. the full specification includes three additional covariates
*   3. we ask b1x2 to put the first two covariates in group g1 and put x23 in g2
*
. b1x2 y, x1all(x1) x2all(x2*) x2delta(g1 = x21 x22 : g2=x23) x1only(x1)

Number of obs = 1000

Restricted regression:
. b1x2: reg y x1

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   4.608666   .1057864    43.57   0.000     4.401328    4.816004
       _cons |   .0724791   .1059478     0.68   0.494    -.1351748     .280133
------------------------------------------------------------------------------

Unrestricted regression:
. b1x2: reg y x1 x21 x22 x23

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.072417   .0483803    22.17   0.000     .9775936    1.167241
         x21 |   1.887788   .0461923    40.87   0.000     1.797253    1.978323
         x22 |   .5403505    .035649    15.16   0.000     .4704796    .6102213
         x23 |   .7604411   .0336674    22.59   0.000     .6944541    .8264281
       _cons |   .0205694   .0323558     0.64   0.525    -.0428468    .0839856
------------------------------------------------------------------------------

Decomposition of changes in coefficients on x1 vars:

        x1

into parts due to these groups:

        g1 = x21 x22
        g2 = x23

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          g1 |   2.468092     .08897    27.74   0.000     2.293714     2.64247
          g2 |   1.068156   .0579061    18.45   0.000     .9546626     1.18165
        __TC |   3.536249   .1071698    33.00   0.000       3.3262    3.746298
------------------------------------------------------------------------------

Note: The __TC... line is the sum of all x2 variables' impacts on each x1.
      The reported covariance between this coef and all others is zero.
      This is NOT correct!!
Done with -b1x2-.

. **This example shows that variables in group g1 account for 2.47 of
  **the difference of 3.54 between the base- and full-specification
  **estimates of the coefficient on x1. The other 1.07 is explained by
  **group g2, which is just the variable x23 in this case.

Also see
--------

 Manual:  ^[U] 26 Estimation and post-estimation commands^
          ^[U] 35 Overview of model estimation^
          ^[R] regress^
          ^[R] robust^
          ^[R] test^
          ^[R] testparm^

On-line:  help for @est@; @regress@; @robust@; @test@; @testparm@

NO WARRANTIES: TO THE EXTENT PERMITTED BY APPLICABLE LAW, NEITHER JONAH B.
GELBACH, NOR ANY OTHER PERSON, EITHER EXPRESSLY OR IMPLICITLY, WARRANTS ANY
ASPECT OF THIS SOFTWARE OR PROGRAM, INCLUDING ANY OUTPUT OR RESULTS OF THIS
SOFTWARE OR PROGRAM. THIS SOFTWARE AND PROGRAM IS BEING PROVIDED "AS IS",
WITHOUT ANY WARRANTY OF ANY TYPE OR NATURE, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE, AND ANY WARRANTY THAT THIS SOFTWARE OR
PROGRAM IS FREE FROM DEFECTS.