{smcl}
{* *! version 1.0.0 MLB 07Nov2010}{...}
{hline}
help for {hi:scenreg}
{hline}
{title:Title}
{phang}
{bf:scenreg} {hline 2} Scenarios for models with binary dependent variables
{title:Syntax}
{p 8 17 2}
{cmd:scenreg}
[{varlist}]
{ifin}
{weight}
{cmd:,}
{opt sd(exp)}
[{it:options}]
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Model}
{synopt:{opt link(link_name)}}specifies the link function{p_end}
{synopt:{opt dist(dist_name)}}specifies the distribution of the unobserved variable{p_end}
{synopt:{opt rho(varname #)}}specifies the correlation between {it:varname} and the unobserved variable{p_end}
{synopt:{opt nocons:tant}}suppresses the constant{p_end}
{syntab:SE/Robust}
{synopt :{opth vce(vcetype)}}{it:vcetype} may be {opt oim},
{opt r:obust}, {opt cl:uster} {it:clustvar}, {opt opg}, {opt boot:strap},
or {opt jack:knife}{p_end}
{syntab:Display}
{synopt:{opt or}}report odds ratios{p_end}
{synopt:{opt rr}}report risk ratios{p_end}
{synopt:{opt hr}}report hazard ratios{p_end}
{synopt:{opt l:evel(#)}}set confidence level{p_end}
{synopt :{opt coefl:egend}}display coefficients' legend instead of coefficient table{p_end}
{syntab:Maximization}
{synopt :{it:{help scenreg##maximize_options:maximize_options}}}control the maximization process; seldom used{p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}
{cmd:fweight}s and {cmd:pweight}s are allowed; see {help weight}.
{title:Description}
{pstd}
The results of many models for binary dependent variables can be influenced by
unobserved variables, even when these unobserved variables are uncorrelated with any
of the observed variables ({help scenreg##neuhaus_jewell:Neuhaus and Jewell (1993)},
{help scenreg##allison:Allison (1999)}, {help scenreg##williams:Williams (2009)},
and {help scenreg##mood:Mood (2010)}). With {cmd:scenreg} you can explore how serious
this potential problem is for your data and hypotheses by estimating
the results under a wide set of scenarios concerning the unobserved variable. In
that sense it is similar to {help scenreg##rosenbaum_rubin:Rosenbaum and Rubin (1983)},
{help scenreg##rosenbaum:Rosenbaum (2002)} and {help scenreg##becker_caliendo:Becker and Caliendo (2007)},
except that the method for estimating these scenarios used in {cmd:scenreg} allows
greater flexibility regarding the distribution of the unobserved variable and the
hypotheses and parameters that are investigated for sensitivity. The method is
similar to the method proposed by {help scenreg##buis:Buis (2010)}, who
applied it to a sequential logit model.
{title:Options}
{dlgtab:Model}
{phang}
{opt sd(exp)} specifies the standard deviation of the unobserved variable, which
can also be interpreted as the effect of the standardized unobserved variable.
The {it:exp} can either be a positive number or an expression that can be
interpreted by {manhelp generate R}. This option is required.
{phang}
{opt link(link_name)} specifies the link function. The default is the {cmdab:logi:t}
link function. Alternatives are {cmdab:prob:it}, {cmdab:iden:tity}, {cmdab:log},
{cmdab:clog:log}, and {cmdab:logl:og}.
{phang}
{opt dist(dist_name)} specifies the distribution of the unobserved variable. The
default is the {cmdab:norm:al} distribution. Alternatives are {cmdab:Gaus:sian} (a
synonym for {cmd:normal}), {cmdab:unif:orm}, and {cmdab:disc:rete} {it:# #}[{it:#} [...]].
The numbers after {cmd:discrete} represent the proportion of observations that
belong to each category of the discrete distribution. Since the numbers are
proportions, they all need to be larger than 0 and they need to add up to 1. The
location of these categories will be chosen such that the mean is 0, the standard
deviation is 1, and all categories are separated by the same distance.
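{pmore}
As an illustration of these three constraints, the Mata sketch below computes equally spaced
locations for the hypothetical proportions .5, .3, and .2. It is only a worked calculation under
the constraints stated above, not necessarily the code {cmd:scenreg} uses internally; the names
p, k, m, s, and x are made up for this example. The last line should return 0 and 1, up to
rounding error.
{cmd}
mata:
// hypothetical proportions of the three categories (assumption)
p = (.5, .3, .2)
// provisional equally spaced locations 0, 1, 2
k = 0..(cols(p) - 1)
// weighted mean and standard deviation of the provisional locations
m = sum(p :* k)
s = sqrt(sum(p :* (k :- m):^2))
// shift and rescale so that the mean is 0 and the standard deviation is 1
x = (k :- m) :/ s
x
// check: weighted mean and weighted variance of the final locations
sum(p :* x), sum(p :* x:^2)
end
{txt}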
{phang}
{opt rho(varname #)} specifies the correlation of the unobserved variable and the
variable {it:varname}. The default is 0.
{phang}
{opt noconstant} suppresses the constant term (intercept) in the model.
{dlgtab:SE/Robust}
INCLUDE help vce_asymptall
{dlgtab:Display}
{phang}
{opt or} | {opt rr} | {opt hr} report the estimated coefficients transformed
to odds ratios, risk ratios, or hazard ratios, i.e., exp(b) rather than b.
Standard errors and confidence intervals are similarly transformed. This
option affects how results are displayed, not how they are estimated.
Unfortunately, this option suppresses the display of the baseline odds, risk
or hazard, i.e. exp(_cons). See {help scenreg##newson:Newson (2003)} for a
way around this problem.
{pmore}
The {opt or} option is only possible in combination with the {cmd:logit} link
function. The {opt rr} option is only possible in combination with the
{cmd:log} link function. The {opt hr} option is only possible in combination
with the {cmd:cloglog} link function.
{phang}
{opt level(#)}; see
{helpb estimation options##level():[R] estimation options}.
{phang}
{opt coeflegend}; see
{helpb estimation options##coeflegend:[R] estimation options}.
{marker maximize_options}{...}
{dlgtab:Maximization}
{phang}
{opt draws(#)} specifies the number of pseudo random draws per observation used when calculating the simulated
likelihood; the default is 100. See {manhelp mata_halton() M-5} and {help scenreg##drukker_gates:Drukker & Gates (2006)}.
{phang}
{opt start(#)} specifies the index at which the Halton sequence starts; the default is 15. See {manhelp mata_halton() M-5}
and {help scenreg##drukker_gates:Drukker & Gates (2006)}.
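{pmore}
One way to judge whether the number of draws is large enough is to re-estimate the same scenario
with more draws and compare the results. The sketch below does that for a simple model on the
nlsw88 data; the value 500 and the names d100 and d500 are arbitrary choices for this
illustration, not recommendations.
{cmd}
sysuse nlsw88, clear
// same scenario with the default number of draws (100)
scenreg union married never_married grade, sd(1)
est store d100
// same scenario with more draws (500 is an arbitrary choice)
scenreg union married never_married grade, sd(1) draws(500)
est store d500
// compare coefficients and standard errors side by side
est table d100 d500, b(%9.4f) se(%9.4f)
{txt}
{pmore}
If the two columns differ by more than you find acceptable, increasing the number of draws
further is a reasonable next step, at the cost of longer estimation time.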
{phang}
{opt eclear} specifies that the Mata external global object S_unobserved_variable may be overwritten. This global
object is used to store pseudo-random draws from the unobserved variable, and is normally removed the moment {cmd:scenreg}
finishes successfully. However, it can be left behind when {cmd:scenreg} exits with an error, in which case
the {opt eclear} option must be specified in the subsequent call to {cmd:scenreg}; otherwise that call will exit with an error.
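{pmore}
For instance, if a previous call to {cmd:scenreg} was interrupted or ended with an error, the next
call could look like the sketch below. The model itself is only an illustration using the nlsw88
data.
{cmd}
sysuse nlsw88, clear
// allow scenreg to overwrite a leftover S_unobserved_variable
scenreg union married never_married grade, sd(1) eclear
{txt}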
{phang}
{it:maximize_options}:
{opt dif:ficult}, {opt tech:nique(algorithm_spec)},
{opt iter:ate(#)}, [{cmd:{ul:no}}]{opt lo:g}, {opt tr:ace},
{opt grad:ient}, {opt showstep},
{opt hess:ian},
{opt showtol:erance},
{opt tol:erance(#)},
{opt ltol:erance(#)},
{opt nrtol:erance(#)},
{opt nonrtol:erance}; see {manhelp maximize R}. These options are seldom
used.
{title:Remarks}
{pstd}
The aim of {cmd:scenreg} is to explore the sensitivity of results in a binary regression model to
the presence of unobserved variables. Many publications have now shown that these
unobserved variables {it:could} influence the results ({help scenreg##neuhaus_jewell:Neuhaus and Jewell (1993)},
{help scenreg##allison:Allison (1999)}, {help scenreg##williams:Williams (2009)},
and {help scenreg##mood:Mood (2010)}), but these publications cannot tell you how big the problem
is in your data and for your hypotheses or parameters of interest. That is the question that
a sensitivity analysis using {cmd:scenreg} is supposed to answer.
{pstd}
The scenarios consist of assumptions concerning the strength of the effect of the standardized
unobserved variable (specified in the {opt sd(exp)} option), the distribution of the
unobserved variable (specified in the {opt dist(dist_name)} option), and the correlation between
the unobserved variable and an observed variable (specified in the {opt rho(varname #)} option). The
strength of the effect of the unobserved variable can be constant or vary with the values of observed variables. The
latter possibility is useful as this "heteroscedasticity" plays an important part in the literature.
The effects are then estimated by integrating the unobserved variable out of the likelihood function
using maximum simulated likelihood; see {help scenreg##train:Train (2003)} and the special issue
on maximum simulated likelihood in the Stata Journal, {browse "http://www.stata-journal.com/sj6-2.html":volume 6, number 2}.
{pstd}
The hard part is to determine a set of scenarios that on the one hand push the model hard, but on the other
hand are still (somewhat) plausible. This is not a technical problem, but a substantive one. The best thing
one can do is look at the literature in your field, and see what kind of effects occur in real data.
Remember that the effect specified in the {opt sd()} option can be thought of as effects of a standardized
variable.
{pstd}
To keep the number of scenarios manageable (and estimable) you will typically want to break your
sensitivity analysis up into several sub-analyses: One that only changes the amount of unobserved
heterogeneity, one that allows the amount of unobserved heterogeneity to change in differing degrees over
a key variable, one that fixes the amount of unobserved heterogeneity to one number but allows the
correlation between the unobserved variable and an observed variable of interest to change, etc.
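{pstd}
As a rough sketch of what the second and third kind of sub-analysis could look like, the commands
below use the nlsw88 data. The numbers (1, .5, .2, .4) and the names h1, rho2, and rho4 are
hypothetical choices for illustration only; which values are plausible is a substantive question
for your own field. The expression in {opt sd()} relies on the documented rule that it may be any
expression that {cmd:generate} can interpret.
{cmd}
sysuse nlsw88, clear
gen byte black = race == 2 if race <= 2
// amount of unobserved heterogeneity differs between groups:
// sd is 1 when black == 0 and 1.5 when black == 1 (hypothetical values)
scenreg union married never_married black grade, sd(1 + .5*black)
est store h1
// amount of unobserved heterogeneity fixed at 1, while the correlation
// between the unobserved variable and grade changes (hypothetical values)
scenreg union married never_married black grade, sd(1) rho(grade .2)
est store rho2
scenreg union married never_married black grade, sd(1) rho(grade .4)
est store rho4
{txt}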
{title:Example}
{pstd}
As a general strategy, it is often useful to build a (sub-)sensitivity analysis in three steps:
{pmore}
1) prepare the data
{pmore}
2) estimate the scenarios, and store those models using {help estimates}
{pmore}
3) analyse the stored scenarios
{pstd}
The reason for separating the estimation and storage of the scenarios from their analysis
is that the estimation can take quite a bit of time, so you really want to do that only once, while
the analysis part consists of a lot of moving back and forth between scenarios and parameters that
might be of interest. By estimating and storing the models you avoid estimating the same scenario
multiple times and you can more easily keep an overview of which scenarios you estimated.
{pstd}
Below is an example of how I would organize such a sensitivity analysis. I start with a basic model
without unobserved heterogeneity. In this case I model union membership of women who were asked in
1988 about their union membership, marital status, and how many years of
schooling they attained. The variable of interest is the education of the respondent (grade).
{cmd}
sysuse nlsw88, clear
gen byte black = race == 2 if race <= 2
gen byte baseline = 1
scenreg union married never_married black grade baseline, ///
sd(0) link(logit) or nocons
est store s0
{txt}
{pstd}
Next I estimate the other scenarios. In this case I look at the influence of changing the
amount of unobserved heterogeneity. So here I estimate three scenarios, where in each subsequent
scenario the amount of unobserved heterogeneity increases by 1.
{cmd}
scenreg union married never_married black grade baseline, ///
sd(1) link(logit) or nocons
est store s1
scenreg union married never_married black grade baseline, ///
sd(2) link(logit) or nocons
est store s2
scenreg union married never_married black grade baseline, ///
sd(3) link(logit) or nocons
est store s3
{txt}
{pstd}
Next we can use these stored scenarios to see whether our conclusions are sensitive to the amount of
unobserved heterogeneity. Say we are interested in the effect of education for women,
so we are looking at the parameter of grade. I start by creating an empty matrix in which I
will later store the results from the different scenarios. I have 4 scenarios, so the matrix will
contain 4 rows. For each scenario I want to store the amount of unobserved heterogeneity, the
odds ratio for grade, and the p-value of the test whether this odds ratio equals 1 (which is
equivalent to the test that the coefficient equals 0), so the matrix will contain 3 columns.
{cmd}
matrix res = J(4,3,.)
{txt}
{pstd}
Next I loop over the scenarios, which I called s0 through s3. I start by using {cmd:estimates restore}
to retrieve the appropriate scenario. I then test whether the coefficient of grade in that
scenario equals zero. Then I create a new local macro j equal to `i' + 1, so it will run from
1 to 4. This macro indicates which row of the matrix res I want to fill. The final line
populates the `j'th row of matrix res with three numbers (see {help matrix substitution}):
{pmore}
The first number is the amount of unobserved heterogeneity used in that scenario. Here I used the fact
that I created my scenarios in such a way that the amount of unobserved heterogeneity equals `i'.
In general one creates the scenarios in such a way that they differ in some regular way, and you
can often use that regularity to populate the first column of such a results matrix.
{pmore}
The second number is the odds ratio for grade. Here I used the standard Stata way of retrieving
coefficients from models, for more see here: {help _variables}. The odds ratio is exp(coefficient).
{pmore}
The final number is the p-value of the test whether the odds ratio equals 1, i.e. the coefficient
equals 0. This p-value was left behind by the {help test} command as r(p).
{cmd}
forvalues i = 0/3 {
est restore s`i'
test _b[grade] = 0
local j = `i' + 1
matrix res[`j',1] = `i', exp(_b[grade]), r(p)
}
matrix colnames res = "sd" "or" "p"
{txt}
{pstd}
I can then tabulate the results using {help matlist}.
{cmd}
matlist res, names(columns) format(%9.3g)
{txt}
{pstd}
Or I can graph them. To do that I first turn the matrix into variables in my dataset using
{help svmat}. I can then use these variables to create my graphs.
{cmd}
svmat res, names(col)
twoway line or sd, ///
xtitle("effect of the standardized unobserved variable" ///
"(log odds ratio)") ///
ytitle("effect of grade (odds ratio)")
twoway line p sd, ///
xtitle("effect of the standardized unobserved variable" ///
"(log odds ratio)") ///
ytitle("p-value of test" "whether odds ratio for grade = 1")
{txt}
{pstd}
Putting this all together:
{cmd}
// start with preparing the data
sysuse nlsw88, clear
gen byte black = race == 2 if race <= 2
gen byte baseline = 1
// estimate the scenarios
scenreg union married never_married black grade baseline, ///
sd(0) link(logit) or nocons
est store s0
scenreg union married never_married black grade baseline, ///
sd(1) link(logit) or nocons
est store s1
scenreg union married never_married black grade baseline, ///
sd(2) link(logit) or nocons
est store s2
scenreg union married never_married black grade baseline, ///
sd(3) link(logit) or nocons
est store s3
// collect estimates from the scenarios
matrix res = J(4,3,.)
forvalues i = 0/3 {
est restore s`i'
test _b[grade] = 0
local j = `i' + 1
matrix res[`j',1] = `i', exp(_b[grade]), r(p)
}
matrix colnames res = "sd" "or" "p"
// tabulate the estimates
matlist res, names(columns) format(%9.3g)
// graph the estimates
// first turn the matrix into variables
svmat res, names(col)
// graph the variables
twoway line or sd, ///
xtitle("effect of the standardized unobserved variable" ///
"(log odds ratio)") ///
ytitle("effect of grade (odds ratio)")
twoway line p sd, ///
xtitle("effect of the standardized unobserved variable" ///
"(log odds ratio)") ///
ytitle("p-value of test" "whether odds ratio for grade = 1")
{txt}
{title:Author}
{p 4 4 2}Maarten L. Buis, Universitaet Tuebingen{break}maarten.buis@uni-tuebingen.de
{title:References}
{marker allison}{...}
{phang}
Allison, Paul D. (1999) "Comparing logit and probit coefficients across groups".
{it:Sociological Methods & Research}, 28(2): 186-208.
{marker becker_caliendo}{...}
{phang}
Becker, Sascha O. and Marco Caliendo (2007) "Sensitivity analysis for average treatment effects",
{it:The Stata Journal}, 7(1): 71-83.
{browse "http://www.stata-journal.com/article.html?article=st0121"}
{marker buis}{...}
{phang}
Buis, Maarten L. (2010) "Chapter 7, The consequences of unobserved heterogeneity in a sequential logit model",
In: Buis, Maarten L. {it:Inequality of Educational Outcome and Inequality of Educational Opportunity in the Netherlands during the 20th Century}.
PhD thesis.
{browse "http://www.maartenbuis.nl/dissertation/chap_7.pdf"}
{marker drukker_gates}{...}
{phang}
Drukker, David M. and Richard Gates (2006) "Generating Halton sequences using Mata",
{it:The Stata Journal}, 6(2): 214-228.
{browse "http://www.stata-journal.com/article.html?article=st0103"}
{marker mood}{...}
{phang}
Mood, Carina (2010) "Logistic regression: Why we cannot
do what we think we can do, and what we can do about
it." {it:European Sociological Review}, 26(1): 67-82.
{marker neuhaus_jewell}{...}
{phang}
Neuhaus, John M. and Nicholas P. Jewell (1993) "A
Geometric Approach to Assess Bias Due to Omitted
Covariates in Generalized Linear Models." {it:Biometrika},
80(4): 807–815.
{marker newson}{...}
{phang}
Newson, Roger (2003) "Stata tip 1: The eform() option of regress",
{it:The Stata Journal}, 3(4): 445.
{browse "http://www.stata-journal.com/article.html?article=st0054"}
{marker rosenbaum}{...}
{phang}
Rosenbaum, Paul R. (2002) {it:Observational Studies}. New York: Springer, 2nd edition.
{marker rosenbaum_rubin}{...}
{phang}
Rosenbaum, Paul R. and Donald B. Rubin (1983) "Assessing Sensitivity to an Unobserved
Binary Covariate in an Observational Study with Binary Outcome."
{it:Journal of the Royal Statistical Society. Series B}, 45(2): 212–218.
{marker train}{...}
{phang}
Train, Kenneth (2003) {it:Discrete Choice Methods with Simulation}. Cambridge: Cambridge
University Press.
{marker williams}{...}
{phang}
Williams, Richard (2009) "Using heterogenous choice models to compare logit and probit
coefficients across groups", {it:Sociological Methods & Research}, 37(4): 531–559.
{title:Also see:}
{pstd}
Online: {helpb glm}, {helpb mata_halton()}
{pstd}
If installed: {helpb seqlogit}, {helpb mhbounds}