.-
help for ^relogitq^
.-
Calculates quantities of interest after a corrected logit regression
--------------------------------------------------------------------
^relogitq^ [^, pr^ ^bayes^ ^mle^ ^unbi^ased ^listx^
^fd(pr)^ ^changex(^var1 val1 val2 [^&^ var2 val1 val2]^)^
^rr(^var1 val1 val2 [^&^ var2 val1 val2]^) sims(^#^) l^evel^(^#^)^]
Description
-----------
This procedure implements the suggestions of King and Zeng (1999a,b)
for improved methods of computing quantities of interest -- absolute
risks (probabilities), relative risks, and attributable risks (first
differences) -- from a logistic regression that is corrected for small
samples and rare events, as well as selection on the dependent
variable as in case-control designs. First run a corrected logit (see
help @relogit@) and set values for the explanatory variables (see help
@setx@). Then use ^relogitq^ to calculate the desired quantities of
interest.
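As a rough illustration of this three-step workflow, the sketch below
builds a small artificial rare-events dataset and runs the commands in
order. The variable names (y, x1, x2), the seed, and the data-generating
values are invented for illustration. The ^setx mean^ call assumes that
@setx@ accepts a bare statistic applied to all explanatory variables;
see help @setx@ for the exact syntax.

```stata
* Hypothetical data: a rare binary outcome y and two covariates
clear
set obs 2000
set seed 12345
generate x1 = uniform()
generate x2 = invnorm(uniform())
generate y  = uniform() < 1/(1 + exp(-(-4 + 2*x1 + 0.5*x2)))

* Step 1: run the corrected logit
relogit y x1 x2

* Step 2: choose values for the explanatory variables
* (assumed syntax; see help setx)
setx mean

* Step 3: compute quantities of interest; Pr(y==1|x) is the default
relogitq
```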
Note: The ^relogitq^ procedure is memory-intensive. If you receive an error
message such as "no room to add more variables," you may need to allocate
more memory to Stata by typing "clear" and then "set memory #m". See
[R] memory in the reference manual for more details about memory allocation.
Options That Affect Which Quantities are Calculated
---------------------------------------------------
^pr^ reports Pr(depvar==1|x), the probability (absolute risk) that the
dependent variable takes on a value of 1 when the explanatory
variables (x) are set at values that were chosen at the @setx@
stage. If no other options are specified, this is the default
output.
^fd(pr)^ is a "wrapper" that makes it easy to simulate first differences
(also called attributable risks). Simply wrap the fd() wrapper around
the ^pr^ option to estimate the change in Pr(Y=1) given some change in x,
holding other variables at the values that were set at the @setx@
stage. The ^fd()^ wrapper must be used in conjunction with the
^changex()^ option.
^changex(^var1 val1 val2^)^ specifies how the explanatory variables
(x) should change when evaluating a first difference (attributable
risk). ^changex()^ uses the same basic syntax as @setx@, except
that each explanatory variable has two values: a starting value and
an ending value. For instance, ^fd(pr)^ ^changex(x1 .2 .8)^
calculates the change in Pr(Y=1) caused by increasing x1 from its
starting value, 0.2, to its ending value, 0.8. You can specify
multiple changex scenarios by separating each scenario with an
ampersand. See the examples, below.
^rr(^var1 val1 val2^)^ specifies how the explanatory variables (x)
should change when calculating the relative risk,
Pr(Y=1|xend)/Pr(Y=1|xstart), where xstart represents the vector of
starting values for x and xend represents the vector of ending
values for x. ^rr()^ uses the same basic syntax as ^changex^. For
instance, ^rr(x1 mean p75)^ instructs ^relogitq^ to calculate the
relative risk of Pr(Y=1) caused by increasing x1 from its mean to
its 75th percentile, holding other variables at the levels chosen
at the @setx@ stage. In this example, x1 is set to its mean in
xstart and set to its 75th percentile in xend. If you are
interested in the percentage change in relative risk, compute
100*[rr - 1], where rr is the output from this command. You can
specify multiple rr() scenarios by separating each scenario with an
ampersand.
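The percentage-change arithmetic can be checked by hand with ^display^.
The relative risk below is a made-up number, not output from any model,
and the scalar name rr_hyp is hypothetical.

```stata
* Hypothetical relative risk, standing in for a value reported by relogitq
scalar rr_hyp = 1.25

* Percentage change implied by that relative risk: 100*[rr - 1]
display 100*(rr_hyp - 1)
```

With a relative risk of 1.25, the command displays 25, that is, a 25
percent increase in risk.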
^listx^ causes ^relogitq^ to list all x-values that were chosen at the @setx@
stage, which provide the basis for predicted probabilities, first
differences, and relative risks.
^l^evel^(^#^)^ specifies the confidence level, in percent, for confidence
intervals around quantities of interest. The default is ^level(95)^ or
the value set by ^set l^evel. For more information on ^set l^evel, see
the on-line help for @level@.
Options That Affect How the Quantities are Calculated
-----------------------------------------------------
By default, ^relogitq^ uses stochastic simulation to compute all
quantities of interest and the uncertainty surrounding those
quantities. The program reports the median of the simulated posterior
density, as well as confidence intervals around the median.
^relogitq^ also supports analytical methods for obtaining point
estimates of quantities of interest, but continues to use simulation
to measure the uncertainty. The following three analytical methods
are available:
^mle^ instructs ^relogitq^ to calculate point estimates for quantities of
interest based only on the maximum likelihood estimates (the
coefficients generated by @relogit@), without accounting for their
uncertainty. For instance, the ^mle^ option computes the probability
Pr(Y=1|x,b) using the formula 1/(1+exp(-x*b)), where x is the
vector of x's that was chosen at the @setx@ stage and b represents
the vector of logit (or relogit) coefficients. This approach is
consistent but has a higher mean squared error than the ^bayes^
option and so is not generally recommended.
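To make the formula concrete, the sketch below evaluates
1/(1+exp(-x*b)) by hand for a model with a constant and one explanatory
variable. The coefficient values, the x1 value, and the scalar names are
invented for illustration and do not come from any estimated model.

```stata
* Hypothetical coefficients and chosen x value (illustration only)
* b0 is the constant, b1 the coefficient on x1
scalar b0 = -3
scalar b1 = 0.5
* x1val plays the role of the value chosen at the setx stage
scalar x1val = 2

* Linear predictor x*b, then the logistic transformation
scalar xb = b0 + b1*x1val
display 1/(1 + exp(-xb))
```

Here x*b equals -2, so the displayed probability is roughly 0.119.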
^unbi^ased instructs ^relogitq^ to calculate approximately unbiased estimates
of all quantities of interest. This option has a higher mean squared error
than the Bayesian alternative, which is superior in most cases.
^bayes^ uses the entire probability distribution of b to approximate the
expected value of Pr(Y=1|x), without conditioning on the point
estimate b. This approach has the lowest mean squared error and is
recommended for users who prefer the analytical approach.
The program also contains an option to control the simulation process, which
produces all measures of uncertainty.
^sims(^m^)^ specifies the number of simulations, m, which must be a positive
integer. The default is 1000 simulations. Increase the number of
simulations to obtain more precise approximations to quantities of
interest; reduce the number of simulations for greater
computational speed. To check whether you have enough precision,
repeat a ^relogitq^ command with the same number of simulations
and see whether the results agree to a sufficient number of
digits. If you choose a large number of simulations, you
may need to allocate more memory to Stata. See [R] memory in the
reference manual for more details about memory allocation.
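One way to carry out this check is sketched below. It assumes that
@relogit@ and @setx@ have already been run, that successive runs draw
fresh random numbers, and that r(Pr) is saved as listed under Saved
Results. The scalar names pr_run1 and pr_run2 are hypothetical.

```stata
* Run the same command twice with the same number of simulations
quietly relogitq, sims(1000)
scalar pr_run1 = r(Pr)
quietly relogitq, sims(1000)
scalar pr_run2 = r(Pr)

* Compare the two point estimates; if they differ in digits that
* matter for your purposes, increase sims() and repeat
display pr_run1
display pr_run2
display abs(pr_run1 - pr_run2)
```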
Examples
--------
To display Pr(Y=1|x), where x represents the values that were chosen
at the @setx@ stage, type
. ^relogitq^
To obtain the same quantity of interest via analytical Bayesian
methods and list all x-values chosen at the @setx@ stage, type
. ^relogitq, bayes listx^
Use the ^fd()^ and ^changex()^ options to calculate changes in
probabilities caused by movements in x. For instance, the
following command will calculate the change in Pr(Y=1) caused by
increasing the explanatory variable x1 from its 20th to its 80th
percentile.
. ^relogitq, fd(pr) changex(x1 p20 p80)^
You can specify multiple changex() scenarios by separating each
scenario with an ampersand. The following expression will calculate
two first differences (attributable risks): the change in Pr(Y=1)
caused by increasing x1 from its mean to its maximum level, and the
change in Pr(Y=1) caused by simultaneously increasing x1 from 3 to the
square root of 15 and increasing x2 from its median to its 90th
percentile.
. ^relogitq, fd(pr) changex(x1 mean max & x1 3 sqrt(15) x2 median p90)^
A similar syntax applies to relative risks. Thus, the next command
gives the relative risk, Pr(Y=1|xend)/Pr(Y=1|xstart), for raising x1
from 10 to 15.
. ^relogitq, rr(x1 10 15)^
Saved Results
-------------
^relogitq^ saves the following scalars:
r(Pr) = Point estimate for Pr(Y=1|x), where x was set with @setx@
r(PrL) = Lower bound of confidence interval for Pr(Y=1|x)
r(PrU) = Upper bound of confidence interval for Pr(Y=1|x)
r(dPr_#) = Point estimate of change in Pr(Y=1) for 1st difference scenario #
r(dPrL_#) = Lower bound of confidence interval for first difference #
r(dPrU_#) = Upper bound of confidence interval for first difference #
r(rr_#) = Point estimate of relative risk for relative risk scenario #
r(rrL_#) = Lower bound of confidence interval for relative risk #
r(rrU_#) = Upper bound of confidence interval for relative risk #
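A brief sketch of retrieving these scalars follows. It assumes that
@relogit@ and @setx@ have already been run and that x1 is a
(hypothetical) explanatory variable in the model. ^return list^ is
Stata's standard command for listing r() results.

```stata
* Probability and its confidence bounds
relogitq, pr
display r(Pr)
display r(PrL)
display r(PrU)

* First difference for scenario 1
relogitq, fd(pr) changex(x1 p20 p80)
display r(dPr_1)
display r(dPrL_1)
display r(dPrU_1)

* List everything the last relogitq call saved in r()
return list
```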
Distribution
------------
^relogitq^ is (C) Copyright, 1999, Michael Tomz, Gary King and Langche
Zeng, All Rights Reserved. You may copy and distribute this program
provided no charge is made and the copy is identical to the original.
To request an exception, please contact:
Michael Tomz
Department of Government, Harvard University
Littauer Center North Yard
Cambridge, MA 02138
Please distribute the current version of this program, which is
available from http://GKing.Harvard.Edu.
References
----------
Gary King and Langche Zeng. 1999a. "Logistic Regression in Rare
Events Data," Department of Government, Harvard University,
available from http://GKing.Harvard.Edu.
Gary King and Langche Zeng. 1999b. "Estimating Absolute, Relative,
and Attributable Risks in Case-Control Studies," Department of
Government, Harvard University, available from
http://GKing.Harvard.Edu.