Title
gsa -- Generalized sensitivity analysis
Syntax
gsa depvar treatmentvar [indepvars] [if] [in] , tau(real) tstat(real) [options]
    options                 Description
    -------------------------------------------------------------------------
    Model
  * tau(real)               the target size of the coefficient of the
                              treatment variable.
  * tstat(real)             the target size of the t-value of the treatment
                              variable.
    maxc1(real)             the parameter that affects the partial
                              correlation between pseudo unobservables and
                              the treatment variable; default is maxc1(2).
    maxc2(real)             the parameter that affects the partial
                              correlation between pseudo unobservables and
                              the outcome variable; default is maxc2(.5).
    precision(real)         the percentage error from tau or tstat
                              acceptable for pseudo unobservables; default
                              is precision(5).
    resolution(int)         the maximum number of iterations of pseudo
                              unobservable generation for each fixed value
                              of c1 (step 5) and c2 (step 8); default is
                              resolution(100).
    observation(int)        the maximum number of values at which c1
                              (step 5) and c2 (step 8) are fixed; default
                              is observation(200).
    binu                    generates binary pseudo unobservables instead
                              of the default continuous ones.
    correlation             declares that the partial correlations are used
                              as the axes of the contour plot.
    ylogit                  declares that the outcome equation is estimated
                              with logit.
    yprobit                 declares that the outcome equation is estimated
                              with probit.
    ylpm                    declares that the outcome equation is estimated
                              with a linear probability model (see the
                              remark below the table).
    ycontinuous             declares that the outcome variable is
                              continuous and that the outcome equation is
                              estimated with regression.
    logit                   declares that the treatment assignment equation
                              is estimated with logit.
    probit                  declares that the treatment assignment equation
                              is estimated with probit.
    lpm                     declares that the treatment assignment equation
                              is estimated with a linear probability model
                              (see the remark below the table).
    continuous              declares that the treatment variable is
                              continuous and that the treatment assignment
                              equation is estimated with regression.
    nodots                  suppresses display of iteration dots.
    seed(int)               sets random-number seed; default is seed(1).

    SE/Robust
    vce(vcetype)            vcetype may be oim, robust, cluster clustvar,
                              or opg.

    Graph
    noprint                 suppresses the figure.
    nplots(int)             specifies the number of control variables
                              plotted on the graph; default is nplots(5).
                              The first n variables in indepvars are
                              selected.
    fractional              declares that the contour is estimated with a
                              fractional polynomial (the default).
    quadratic               declares that the contour is estimated with
                              quadratic prediction.
    lowess                  declares that the contour is estimated with
                              lowess smoothing.
    scatter                 adds the scatter plots to the figure.

    Advanced
    gsa_pu_precision        the accuracy of the orthogonality condition
                              between a pseudo unobservable and the other
                              control variables when the pseudo
                              unobservable is continuous; default is
                              gsa_pu_precision(.99).
    gsa_binpu_precision     the accuracy of the orthogonality condition
                              between a pseudo unobservable and the other
                              control variables when the pseudo
                              unobservable is binary; default is
                              gsa_binpu_precision(.99).
    gsa_range_res           the maximum number of iterations in determining
                              the maximum size of c1 (step 4) and c2
                              (step 7); default is gsa_range_res(2000).
    iter_tolerance          the number of failed iterations in step 5 and
                              step 8 tolerated before moving to the next
                              step; default is iter_tolerance(10).
    -------------------------------------------------------------------------
    * Either tau(real) or tstat(real) is required.
    At least one variable in indepvars is required.
    The treatment effect needs to be positive.
    The size of tau(real) or tstat(real) must be smaller than the original
      treatment effect.
    Weights are not allowed.
    When ylpm or lpm is selected, the user needs to choose robust or
      clustered standard errors.
Description
-gsa- produces a figure for sensitivity analysis similar to that of Imbens (2003). Observational studies cannot control for the bias due to the omission of unobservables. The sensitivity analysis provides a graphical benchmark for how strong an assumption about unobservables researchers need to make to maintain the causal interpretation of the result. Among various sensitivity analyses, -gsa- often serves as the most accessible option because it minimizes the changes that researchers need to make to their models to conduct a sensitivity analysis.
The difference between -gsa- and Imbens (2003) is that, while Imbens (2003) sets up a likelihood function to produce the contour plot, -gsa- produces the contour computationally by generating pseudo unobservables. As such, -gsa- is most helpful when a likelihood function in the style of Imbens (2003) is difficult to set up or fails to converge. Another advantage of -gsa- is its ability to draw the contour based on test statistics, which Imbens' sensitivity analysis cannot do (Harada 2012). Thus, -gsa- is particularly useful when the treatment variable is continuous, when the outcome variable is binary, or when the quantity of interest is defined in terms of test statistics.
Options
        +-------+
    ----+ Model +------------------------------------------------------------
tau(real) specifies the target size of the coefficient of the treatment variable. For example, if a researcher finds a treatment effect of 1.7 and wants to know the strength of confounding by an unobservable that halves the coefficient, s/he should set tau(0.85). The contour in the figure represents the set of partial effects of an unobservable that changes the coefficient to 0.85.
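The arithmetic behind the tau() example can be sketched in a few lines of Python. This is only an illustration: the function name target_tau and its argument names are hypothetical and are not part of -gsa-.

```python
def target_tau(baseline_effect, reduction_share):
    """Return the coefficient that remains after removing a share of the effect.

    baseline_effect: estimated treatment coefficient (positive, as -gsa- requires).
    reduction_share: share of the effect attributed to the unobservable,
                     strictly between 0 and 1 so that the target stays below
                     the original treatment effect.
    """
    if baseline_effect <= 0:
        raise ValueError("the treatment effect needs to be positive")
    if not 0 < reduction_share < 1:
        raise ValueError("reduction_share must lie strictly between 0 and 1")
    return baseline_effect * (1 - reduction_share)


# Halving an effect of 1.7 gives the value to pass to tau().
print(round(target_tau(1.7, 0.5), 3))
```

The same calculation applies to any other share, for example a 10% discount corresponds to reduction_share = 0.1.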
tstat(real) specifies the target size of the t-statistic of the treatment variable. For example, if a researcher finds a statistically significant and positive treatment effect and wants to know the strength of confounding by an unobservable that makes the treatment effect statistically insignificant at the 5% level, s/he should set tstat(1.96). The contour in the figure represents the set of partial effects of an unobservable that makes the treatment effect statistically insignificant.
maxc1(real) specifies the maximum value of c1, the parameter that affects the partial correlation between pseudo unobservables and the treatment variable. In Imbens' (2003) framework, c1 affects the size of alpha. In the iterations where c1 is fixed and c2 is varied, maxc1(real) defines the largest value at which c1 is fixed. In the iterations where c2 is fixed and c1 is varied, -gsa- resets the maximum of c1 to a value smaller than maxc1(real) for computational efficiency.
maxc2(real) specifies the maximum value of c2, the parameter that affects the partial correlation between pseudo unobservables and the outcome variable. In Imbens' (2003) framework, c2 affects the size of delta. The explanation of maxc1(real) above applies here with c1 and c2 exchanged.
precision(real) specifies the percentage error from tau or tstat acceptable for pseudo unobservables. The default is precision(5), which means that when a researcher sets tstat(1.96), -gsa- will accept a pseudo unobservable if it changes the t-statistic of the treatment effect to any value that falls in (1.862, 2.058). (1.862 = 1.96*0.95 = tstat(1.96)*(1-precision(5)/100) and 2.058 = 1.96*1.05 = tstat(1.96)*(1+precision(5)/100)). There is a trade-off between computational time and accuracy. In particular, with the binu option, a researcher should not set precision(real) too small.
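The acceptance rule described for precision(real) can be written out as a short Python sketch. The function names acceptance_interval and is_accepted are hypothetical, and the open interval follows the (1.862, 2.058) notation above; how -gsa- treats values exactly on the boundary is not stated here.

```python
def acceptance_interval(target, precision_pct=5.0):
    """Return (lower, upper) bounds around a tau or tstat target.

    The bounds are target*(1 - precision/100) and target*(1 + precision/100).
    """
    half_width = target * precision_pct / 100.0
    return target - half_width, target + half_width


def is_accepted(value, target, precision_pct=5.0):
    """True if a pseudo unobservable moved the statistic inside the interval."""
    lower, upper = acceptance_interval(target, precision_pct)
    return lower < value < upper


# Bounds for tstat(1.96) with the default precision(5).
lower, upper = acceptance_interval(1.96, 5)
print(round(lower, 3), round(upper, 3))
```

A smaller precision narrows the interval, so fewer generated pseudo unobservables qualify and more iterations are needed.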
resolution(int) specifies the maximum number of iterations for each fixed value of c1 (step 5) and c2 (step 8). The default is resolution(100), which means that -gsa- generates up to 100 pseudo unobservables until one of them changes the treatment effect to tau or tstat. In doing so, -gsa- gradually increases the size of c2 (step 5) or c1 (step 8) from 0 to the value found in the preceding step. Each dot in step 5 and step 8 indicates that -gsa- successfully generated a pseudo unobservable that satisfies the condition. Each red x indicates that -gsa- could not find a pseudo unobservable within resolution(int) iterations. Typically -gsa- shows a number of dots initially and an increasing number of x's later, which is normal. When you see only x's from the beginning (particularly with the binu option), you might want to increase resolution(int).
observation(int) specifies the number of values at which c1 (step 5) and c2 (step 8) are fixed. The default is observation(200), which means that when a researcher sets maxc1(2), -gsa- runs the iterations of generating pseudo unobservables at up to 200 different fixed values of c1. In this example, -gsa- starts the iteration by setting c1 at maxc1(2) and gradually decreases c1 by 0.01 (= maxc1(2)/observation(200)). When -gsa- cannot find an appropriate pseudo unobservable 10 consecutive times (by default), -gsa- proceeds to the next step, in which it fixes c2 and changes c1. A larger value of observation(int) might be useful when the scatter plots have large variations.
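The grid of fixed c1 values and the consecutive-failure rule described for observation(int) can be illustrated with the following Python sketch. The function names are hypothetical, and whether -gsa- includes the endpoints of the grid exactly as done here is an assumption.

```python
def fixed_c1_values(maxc1=2.0, observation=200):
    """Values at which c1 is fixed: start at maxc1 and step down by maxc1/observation."""
    step = maxc1 / observation
    return [maxc1 - k * step for k in range(observation)]


def values_tried(found_flags, iter_tolerance=10):
    """Count the fixed values tried before the scan stops.

    found_flags: one boolean per fixed value, True if a suitable pseudo
                 unobservable was found at that value.
    The scan stops once iter_tolerance consecutive values have failed.
    """
    consecutive_failures = 0
    for tried, found in enumerate(found_flags, start=1):
        consecutive_failures = 0 if found else consecutive_failures + 1
        if consecutive_failures >= iter_tolerance:
            return tried
    return len(found_flags)


grid = fixed_c1_values(2.0, 200)
print(len(grid), round(grid[0], 2), round(grid[1], 2))
```

With maxc1(2) and observation(200) the step is 0.01, matching the example above.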
binu generates binary pseudo unobservables instead of the default continuous ones. Assuming a binary unobservable mainly serves two purposes. First, if a researcher wants to compare the performance of -gsa- with that of -isa-, s/he must set binu because Imbens (2003) assumes a binary unobservable. Second, a researcher may consider a binary unobservable a reasonable assumption, which can be the case when a major unobserved confounder is, say, a gene. Otherwise, a researcher may not want to use this option because it takes more computational time.
correlation declares that the partial correlations are used as the axes of the contour plot. Specifically, the vertical axis is defined by the partial correlation between an unobservable and the outcome variable. The horizontal axis is defined by the partial correlation between an unobservable and the treatment variable. The unobservable, the outcome variable, and the treatment variable are all residualized on the other control variables before the partial correlation is calculated. If either yprobit or probit is specified, correlation is automatically selected.
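The residualize-then-correlate definition of partial correlation given for the correlation option can be sketched in Python as follows. This is an illustration under the assumption of linear (OLS) residualization with a constant; all function names are hypothetical, and the internal computation in -gsa- may differ.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualize(y, controls):
    """Residuals of y from an OLS regression on a constant and the controls."""
    rows = [[1.0] + list(x) for x in controls]  # design matrix with intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_correlation(u, v, controls):
    """Correlation between u and v after both are residualized on the controls."""
    return correlation(residualize(u, controls), residualize(v, controls))
```

Here u would play the role of the unobservable and v the outcome (vertical axis) or the treatment (horizontal axis), with controls holding one row of covariate values per observation.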
ylogit, yprobit, ylpm, ycontinuous, logit, probit, lpm, and continuous all specify how the model is estimated. The default for a binary outcome variable is ylogit. The default for a continuous outcome variable is ycontinuous. The default for a binary treatment variable is logit. The default for a continuous treatment variable is continuous. If either yprobit or probit is specified, correlation is automatically selected.
        +-----------+
    ----+ SE/Robust +--------------------------------------------------------
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory, that are robust to some kinds of misspecification, and that allow for intragroup correlation; see [R] vce_option.
Example
The first example below evaluates the effect of union membership on hourly wage, treating union membership as the treatment variable. Because this example is the same as the one that appears in -isa-, this manual focuses on the issues specific to -gsa-.
. sysuse nlsw88
. xi i.race
. rename _Irace_2 black
. rename _Irace_3 other
. xi i.industry
. rename _Iindustry_5 mnfctr
. gsa wage union age black other grade married south c_city mnfctr, tau(.314) logit binu scatter nplots(8)
In several minutes or so, -gsa- produces a figure similar to that of Imbens (2003), or -isa-. The blue hollow circles are the scatter plots that represent the estimates of partial R-squares for each pseudo unobservable. The contour curve is drawn from these plots with a fractional polynomial function. It would be interesting to run the following code from -isa- and see how closely -gsa- replicates the output of Imbens (2003).
. isa wage union age black other grade married south c_city mnfctr, tau(.314)
The second example uses a continuous treatment variable, namely the years of education. The following code asks how strong an assumption about an unobservable a researcher needs to make to discount 10% of the effect of education on income.
. gsa wage grade age black other south c_city, tau(.633) maxc1(.1) maxc2(.1) scatter nplots(5)
Some control variables are dropped from the right-hand side to avoid post-treatment bias. In this example, the unobservable is continuous, which produces a more conservative contour (i.e., it is harder to demonstrate robustness). Nevertheless, the figure shows that all covariates are plotted far below the contour. Suppose the only unobservable we need to worry about in this model is intelligence (IQ). Then this result shows that a researcher needs to assume that the effect of IQ on income is more than several times stronger than that of age, race, and location to discount the education effect by 10%. If a researcher prefers partial correlations to partial R-squares as axes, the following command does the job.
. gsagraph wage grade age black other south c_city, tau(.633) nplots(5) scatter cor
Tips
1. The treatment effect must be positive for the program to work properly.
2. The means of the variables should be neither too large nor too small. Ideally, each mean should be a single-digit number. This helps in finding appropriate values of maxc1 and maxc2.
3. Do not set the values of maxc1 and maxc2 too large. Usually, maxc1 and maxc2 are smaller than 5 and often do not exceed 2.
4. If the contour ends too short at the right edge, increase the value of maxc1. On the other hand, if the contour ends too short at the top edge, increase the value of maxc2. Also, if the contour is too far from the plots of covariates, decrease maxc1 and maxc2.
5. If the outcome variable, the treatment variable, or both are binary and the number of observations is small, the linear probability model (ylpm and lpm) tends to produce a nice contour.
Saved results
-gsa- saves the following variables. The graph can be reproduced using these saved variables. The results of the _n th successful generation of a pseudo unobservable are recorded in the _n th row:
    gsa_c1                the value of c1.
    gsa_c2                the value of c2.
    gsa_alpha             the value of alpha.
    gsa_delta             the value of delta.
    gsa_partial_rsq_y     the partial R-square of an unobservable in the
                            outcome equation.
    gsa_partial_rsq_t     the partial R-square of an unobservable in the
                            treatment assignment equation.
    gsa_rho_res_yu        the partial correlation between an unobservable
                            and the outcome variable. Not available with
                            yprobit and probit.
    gsa_rho_res_tu        the partial correlation between an unobservable
                            and the treatment variable. Not available with
                            yprobit and probit.
    gsa_partial_rsq_yx    the partial R-square of the k th covariate in the
                            outcome equation.
    gsa_partial_rsq_tx    the partial R-square of the k th covariate in the
                            treatment assignment equation.
    gsa_rho_res_yx        the partial correlation between the k th
                            covariate and the outcome variable.
    gsa_rho_res_tx        the partial correlation between the k th
                            covariate and the treatment variable.
Reference
Here is the link for the quick guide of -gsa- by the author of the program. This provides the idea about what alpha, delta, c1 and c2 mean.
If you use this program, please cite:
Harada, Masataka. "Generalized Sensitivity Analysis." Working paper.
Imbens, Guido W. 2003. "Sensitivity to Exogeneity Assumptions in Program Evaluation." The American Economic Review 93(2):126-132.
Contact
Please feel free to contact Masataka Harada(masatakaharada@nyu.edu) for