help gsa
-------------------------------------------------------------------------------

Title

gsa -- Generalized sensitivity analysis

Syntax

gsa depvar treatmentvar [indepvars] [if] [in], tau(real) tstat(real) [options]

  options               Description
-------------------------------------------------------------------------
Model
* tau(real)             the target size of the coefficient of the
                          treatment variable.
* tstat(real)           the target size of the t-value of the treatment
                          variable.
  maxc1(real)           the parameter that affects the partial
                          correlation between pseudo unobservables and
                          the treatment variable; default is maxc1(2).
  maxc2(real)           the parameter that affects the partial
                          correlation between pseudo unobservables and
                          the outcome variable; default is maxc2(.5).
  precision(real)       the percentage error from tau or tstat
                          acceptable for pseudo unobservables; default
                          is precision(5).
  resolution(int)       the maximum number of iterations of pseudo
                          unobservable generation for each fixed value
                          of c1 (step 5) and c2 (step 8); default is
                          resolution(100).
  observation(int)      the maximum number of values at which the
                          values of c1 (step 5) and c2 (step 8) are
                          fixed; default is observation(200).
  binu                  generates binary pseudo unobservables instead of
                          the default continuous ones.
  correlation           declares that the partial correlations are used
                          as the axes of the contour plot.
  ylogit                declares that the outcome equation is estimated
                          with logit.
  yprobit               declares that the outcome equation is estimated
                          with probit.
  ylpm                  declares that the outcome equation is estimated
                          with a linear probability model (see the
                          remark below the table).
  ycontinuous           declares that the outcome variable is
                          continuous, and the outcome equation is
                          estimated with regression.
  logit                 declares that the treatment assignment equation
                          is estimated with logit.
  probit                declares that the treatment assignment equation
                          is estimated with probit.
  lpm                   declares that the treatment assignment equation
                          is estimated with a linear probability model
                          (see the remark below the table).
  continuous            declares that the treatment variable is
                          continuous, and the treatment assignment
                          equation is estimated with regression.
  nodots                suppresses display of iteration dots.
  seed(int)             sets random-number seed; default is seed(1).

SE/Robust
  vce(vcetype)          vcetype may be oim, robust, cluster clustvar, or
                          opg

Graph
  noprint               suppresses the figure.
  nplots(int)           specifies the number of control variables
                          plotted on the graph; default is nplots(5).
                          The first n variables in indepvars are
                          selected.
  fractional            declares that the contour is estimated with a
                          fractional polynomial; this is the default.
  quadratic             declares that the contour is estimated with
                          quadratic prediction.
  lowess                declares that the contour is estimated with
                          lowess smoothing.
  scatter               adds the scatter plots to the figure.

Advanced
  gsa_pu_precision      the accuracy of the orthogonality condition
                          between a pseudo unobservable and other
                          control variables when a pseudo unobservable
                          is continuous; default is
                          gsa_pu_precision(.99).
  gsa_binpu_precision   the accuracy of the orthogonality condition
                          between a pseudo unobservable and other
                          control variables when a pseudo unobservable
                          is binary; default is
                          gsa_binpu_precision(.99).
  gsa_range_res         the maximum number of iterations in determining
                          the maximum size of c1 (step 4) and c2
                          (step 7); default is gsa_range_res(2000).
  iter_tolerance        the number of failed iterations in step 5 and
                          step 8 tolerated before moving to the next
                          step; default is iter_tolerance(10).
-------------------------------------------------------------------------
* Either tau(real) or tstat(real) is required.
At least one variable in indepvars is required.
The treatment effect needs to be positive.
The size of tau(real) or tstat(real) must be smaller than the original treatment effect.
Weights are not allowed.
When ylpm or lpm is selected, the user needs to choose robust or clustered standard errors.

Description

-gsa- produces a figure for sensitivity analysis similar to that of Imbens (2003). Observational studies cannot control for the bias due to the omission of unobservables. The sensitivity analysis provides a graphical benchmark for how strong an assumption about unobservables researchers need to make to maintain the causal interpretation of the result. Among various sensitivity analyses, -gsa- often serves as the most accessible option because it minimizes the changes that researchers need to make in their models to conduct a sensitivity analysis.

The difference between -gsa- and Imbens (2003) is that while Imbens (2003) sets up a likelihood function to produce the contour plot, -gsa- produces the contour computationally by generating pseudo unobservables. As such, -gsa- is most helpful when a likelihood function a la Imbens (2003) is difficult to set up or hard to bring to convergence. Another advantage of -gsa- is its ability to draw the contour based on test statistics, which Imbens' sensitivity analysis cannot do (Harada 2012). Thus, -gsa- is particularly useful when the treatment variable is continuous, when the outcome variable is binary, or when the quantity of interest is defined in terms of test statistics.

Options

    +-------+
----+ Model +------------------------------------------------------------

tau(real) specifies the target size of the coefficient of the treatment variable. For example, if a researcher finds a treatment effect of 1.7 and wants to know the strength of confounding by an unobservable that would halve the coefficient, s/he should set tau(0.85). The contour in the figure represents the set of partial effects of an unobservable that changes the coefficient to 0.85.
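As a rough illustration of how a target for tau() might be derived, the sketch below halves an estimated coefficient using only built-in Stata commands. The covariate list is an assumption chosen for brevity and is not the specification used elsewhere in this help file; the scalar names b_union and tau_half are hypothetical.

```stata
* Sketch only: derive a tau() target as half of the estimated coefficient.
* The covariates below are illustrative, not a recommended specification.
sysuse nlsw88, clear
regress wage union age grade south

scalar b_union  = _b[union]       // original treatment coefficient
scalar tau_half = b_union / 2     // target that halves the coefficient

display "original coefficient = " %6.3f b_union
display "target for tau()     = " %6.3f tau_half
```

The displayed target is the number one would then type into tau(); it must be smaller than the original coefficient, as required in the Syntax notes.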

tstat(real) specifies the target size of the t-statistic of the treatment variable. For example, if a researcher finds a statistically significant and positive treatment effect and wants to know the strength of confounding by an unobservable that would make the treatment effect statistically insignificant at the 5% level, s/he should set tstat(1.96). The contour in the figure represents the set of partial effects of an unobservable that makes the treatment effect statistically insignificant.
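The value 1.96 is the two-sided 5% critical value of the standard normal distribution. A minimal sketch, using only built-in Stata functions, shows where it comes from and how it compares with the t-based critical value and the observed t-statistic; the regression specification and the scalar names are assumptions made for illustration.

```stata
* Sketch only: compare the observed t-statistic with 5% critical values.
sysuse nlsw88, clear
regress wage union age grade south

scalar t_union = _b[union] / _se[union]       // observed t-statistic
scalar z_crit  = invnormal(0.975)             // normal critical value
scalar t_crit  = invttail(e(df_r), 0.025)     // t critical value

display "observed t      = " %6.3f t_union
display "normal critical = " %6.3f z_crit
display "t critical      = " %6.3f t_crit
```

With a large residual degrees of freedom the two critical values are nearly identical, which is why tstat(1.96) is a common choice.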

maxc1(real) specifies the maximum value of c1, the parameter that affects the partial correlation between pseudo unobservables and the treatment variable. In Imbens' (2003) framework, c1 affects the size of alpha. In the iterations where c1 is fixed and c2 is varied, maxc1(real) defines the maximum value at which c1 is fixed. When c2 is fixed and c1 is varied, -gsa- sets a new maximum value of c1 for computational efficiency. This new maximum is smaller than maxc1(real).

maxc2(real) specifies the maximum value of c2, the parameter that affects the partial correlation between pseudo unobservables and the outcome variable. In Imbens' (2003) framework, c2 affects the size of delta. For details, see the explanation of maxc1(real) above with c1 and c2 exchanged.

precision(real) specifies the percentage error from tau or tstat acceptable for pseudo unobservables. The default is precision(5), which means that when a researcher sets tstat(1.96), -gsa- will accept a pseudo unobservable if it changes the t-statistic of the treatment effect to any value that falls in (1.862, 2.058). (1.862 = 1.96*0.95 = tstat(1.96)*(1-precision(5)/100) and 2.058 = 1.96*1.05 = tstat(1.96)*(1+precision(5)/100)). There is a trade-off between computational time and accuracy. In particular, with the binu option, a researcher should not set precision(real) too small.
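The acceptance interval described above is simple arithmetic and can be checked with a few scalars. The names target, prec, lo_bound, and hi_bound are hypothetical and exist only in this sketch.

```stata
* Sketch only: acceptance interval implied by tstat(1.96) and precision(5).
scalar target   = 1.96
scalar prec     = 5
scalar lo_bound = target * (1 - prec/100)
scalar hi_bound = target * (1 + prec/100)

display "accept if t in (" %5.3f lo_bound ", " %5.3f hi_bound ")"
* prints: accept if t in (1.862, 2.058)
```

Halving prec halves the width of the interval, which is the source of the trade-off between accuracy and computational time.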

resolution(int) specifies the maximum number of iterations for each fixed value of c1 (step 5) and c2 (step 8). The default is resolution(100), which means that -gsa- generates up to 100 pseudo unobservables until a pseudo unobservable changes the treatment effect to tau or tstat. In doing so, -gsa- gradually increases the size of c2 (step 5) or c1 (step 8) from 0 to the value found in the preceding step. Each dot in step 5 and step 8 indicates that -gsa- successfully generated a pseudo unobservable that satisfies the condition. Each red x indicates that -gsa- could not find a pseudo unobservable within resolution(int) iterations. Typically -gsa- shows a number of dots initially and an increasing number of x's later, which is normal. If you see only x's from the beginning (particularly with the binu option), you might want to increase resolution(int).

observation(int) specifies the number of values at which c1 (step 5) and c2 (step 8) are fixed. The default is observation(200), which means that when a researcher sets maxc1(2), -gsa- runs the iterations of generating pseudo unobservables at up to 200 different fixed values of c1. In this example, -gsa- starts the iteration by setting c1 at maxc1(2) and gradually decreases c1 by 0.01 (= maxc1(2)/observation(200)). When -gsa- cannot find an appropriate pseudo unobservable 10 consecutive times (the default), it proceeds to the next step, in which it fixes c2 and changes c1. A larger value of observation(int) might be useful when the scatter plots show large variation.
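The grid of fixed c1 values implied by maxc1(2) and observation(200) can be sketched as follows. This only reproduces the arithmetic stated above and is not taken from the -gsa- source code; the scalar names are hypothetical.

```stata
* Sketch only: step size and first few fixed values of c1.
scalar maxc1_val = 2
scalar n_values  = 200
scalar c1_step   = maxc1_val / n_values      // 0.01

display "step size = " %5.3f c1_step
forvalues i = 0/4 {
    display "c1 fixed at " %4.2f (maxc1_val - `i' * c1_step)
}
```

The loop lists 2.00, 1.99, 1.98, 1.97, and 1.96, the first five values at which c1 would be fixed under these settings.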

binu generates binary pseudo unobservables instead of the default continuous ones. The assumption of a binary unobservable mainly serves two purposes. First, if a researcher wants to compare the performance of -gsa- with that of -isa-, s/he must set binu because Imbens (2003) assumes a binary unobservable. Second, a researcher may think that a binary unobservable is a reasonable assumption, which can be the case when a major unobserved confounder is, say, a gene. Otherwise, a researcher may not want to use this option because it takes more computational time.

correlation declares that the partial correlations are used as the axes of the contour plot. Specifically, the vertical axis is defined by the partial correlation between an unobservable and the outcome variable. The horizontal axis is defined by the partial correlation between an unobservable and the treatment variable. The unobservable, the outcome variable, and the treatment variable are all residualized by the other control variables before the partial correlation is calculated. If either yprobit or probit is specified, correlation is automatically selected.
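The residualize-then-correlate definition above can be sketched with built-in commands. Because a true unobservable is not in the data, the sketch uses tenure as a stand-in; that choice, the control variables, and the residual names are all assumptions, and the code is not necessarily how -gsa- computes these quantities internally.

```stata
* Sketch only: partial correlations after residualizing on controls.
* tenure stands in for the "unobservable"; controls are illustrative.
sysuse nlsw88, clear
local controls age grade south
keep if !missing(wage, union, tenure, age, grade, south)

foreach v in wage union tenure {
    quietly regress `v' `controls'
    predict double res_`v', residuals       // part not explained by controls
}

correlate res_wage res_tenure               // vertical-axis quantity
scalar rho_yu = r(rho)
correlate res_union res_tenure              // horizontal-axis quantity
scalar rho_tu = r(rho)

display "partial corr. with outcome   = " %6.3f rho_yu
display "partial corr. with treatment = " %6.3f rho_tu
```

Restricting the sample first keeps all three residuals on the same observations, so the two correlations are comparable.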

ylogit, yprobit, ylpm, ycontinuous, logit, probit, lpm, and continuous all specify the way the model is estimated. The default for a binary outcome variable is ylogit. The default for a continuous outcome variable is ycontinuous. The default for a binary treatment variable is logit. The default for a continuous treatment variable is continuous. If either yprobit or probit is specified, correlation is automatically selected.

    +-----------+
----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory, that are robust to some kinds of misspecification, and that allow for intragroup correlation; see [R] vce_option.

Example

The first example below evaluates the effect of union membership on hourly wage, treating union membership as the treatment variable. Because this example is the same as the one that appears in -isa-, this manual focuses on the issues specific to -gsa-.

. sysuse nlsw88
. xi i.race
. rename _Irace_2 black
. rename _Irace_3 other
. xi i.industry
. rename _Iindustry_5 mnfctr
. gsa wage union age black other grade married south c_city mnfctr, tau(.314) logit binu scatter nplots(8)

In several minutes or so, -gsa- produces a figure similar to that of Imbens (2003), or -isa-. The blue hollow circles are the scatter plots that represent the estimates of partial R-squares for each pseudo unobservable. The contour curve is drawn from these plots with a fractional polynomial function. It would be interesting to run the following code from -isa- and see how closely -gsa- replicates the output of Imbens (2003).

. isa wage union age black other grade married south c_city mnfctr, tau(.314)

The second example uses a continuous treatment variable, namely years of education. The following code asks how strong an assumption about an unobservable a researcher needs to make to discount 10% of the effect of education on income.

. gsa wage grade age black other south c_city, tau(.633) maxc1(.1) maxc2(.1) scatter nplots(5)

Some control variables are dropped from the right-hand side to avoid post-treatment bias. In this example, the unobservable is continuous, which produces a more conservative contour (i.e., it is harder to prove robustness). Nevertheless, the figure shows that all covariates are plotted far below the contour. Suppose the only unobservable we need to worry about in this model is intelligence (IQ). Then this result shows that a researcher needs to assume that the effect of IQ on income is several times stronger than that of age, race, and location to discount the education effect by 10%. If a researcher prefers partial correlations to partial R-squares as axes, the following command does the job.

. gsagraph wage grade age black other south c_city, tau(.633) nplots(5) scatter cor

Tips

1. The treatment effect must be positive for the program to work properly.

2. The mean of variables should not be too large or too small. Ideally, the mean of each variable should have one digit. This helps in finding appropriate values of maxc1 and maxc2.

3. Do not set the values of maxc1 and maxc2 too large. Usually, maxc1 and maxc2 are smaller than 5 and often do not exceed 2.

4. If the contour ends too short at the right edge, increase the value of maxc1. On the other hand, if the contour ends too short at the top edge, increase the value of maxc2. Also, if the contour is too far from the plots of covariates, decrease maxc1 and maxc2.

5. If the outcome and/or treatment variable is binary and the number of observations is small, the linear probability model (ylpm and lpm) tends to produce a nice contour.
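Tip 2 can be illustrated by rescaling a variable whose mean has several digits. In the sketch below, earnings is a hypothetical variable constructed from wage and hours purely to obtain a large mean; the scaling factor of 10,000 is likewise an assumption chosen for this example.

```stata
* Sketch only: rescale a variable so that its mean has one digit.
sysuse nlsw88, clear
generate double earnings = wage * hours * 52     // hypothetical annual earnings
summarize earnings

generate double earnings_10k = earnings / 10000  // units of 10,000
summarize earnings_10k
```

Rescaling changes the units of the coefficient but not the substantive result, so the value passed to tau() must be expressed in the rescaled units as well.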

Saved results

-gsa- saves the following variables. The graph can be reproduced using these saved variables. The results of the nth successful generation of a pseudo unobservable are recorded in the nth row:

gsa_c1                the value of c1.
gsa_c2                the value of c2.
gsa_alpha             the value of alpha.
gsa_delta             the value of delta.
gsa_partial_rsq_y     the partial R-square of an unobservable in the
                        outcome equation.
gsa_partial_rsq_t     the partial R-square of an unobservable in the
                        treatment assignment equation.
gsa_rho_res_yu        the partial correlation between an unobservable
                        and the outcome variable. Not available with
                        yprobit and probit.
gsa_rho_res_tu        the partial correlation between an unobservable
                        and the treatment variable. Not available with
                        yprobit and probit.
gsa_partial_rsq_yx    the partial R-square of the kth covariate in the
                        outcome equation.
gsa_partial_rsq_tx    the partial R-square of the kth covariate in the
                        treatment assignment equation.
gsa_rho_res_yx        the partial correlation between the kth covariate
                        and the outcome variable.
gsa_rho_res_tx        the partial correlation between the kth covariate
                        and the treatment variable.
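A minimal sketch of redrawing the figure from the saved variables, assuming -gsa- has just been run in the current session so that the gsa_* variables are in memory. It uses only built-in twoway plot types; the marker style and axis titles are assumptions and need not match the figure that -gsa- itself draws.

```stata
* Sketch only: scatter of pseudo unobservables plus a fractional
* polynomial fit, with the outcome equation on the vertical axis.
confirm variable gsa_partial_rsq_y gsa_partial_rsq_t

twoway (scatter gsa_partial_rsq_y gsa_partial_rsq_t, msymbol(Oh))  ///
       (fpfit   gsa_partial_rsq_y gsa_partial_rsq_t),              ///
       ytitle("Partial R-square, outcome equation")                ///
       xtitle("Partial R-square, treatment assignment equation")   ///
       legend(off)
```

Replacing fpfit with qfit or lowess gives the quadratic and lowess counterparts of the contour.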

Reference

The author of the program provides a quick guide to -gsa-, which explains what alpha, delta, c1, and c2 mean.

If you use this program, please cite:

Harada, Masataka. "Generalized Sensitivity Analysis." Working paper.

Imbens, Guido W. 2003. "Sensitivity to Exogeneity Assumptions in Program Evaluation." The American Economic Review 93(2): 126-132.

Contact

Please feel free to contact Masataka Harada (masatakaharada@nyu.edu) for