Population attributable and unattributable risks from binary regression models
regpar [if] [in] [weight] , [ atspec(atspec) atzero(atspec0) subpop(subspec) predict(pred_opt) vce(vcespec) noesample force iterate(#) level(#) post ]
where atspec and atspec0 are at-specifications recognized by the at() option of margins, subspec is a subpopulation specification of the form recognized by the subpop() option of margins, and vcespec is a variance-covariance specification of the form recognized by margins, and must have one of the values
delta | unconditional
fweights, aweights, iweights, pweights are allowed; see weight.
Description
regpar calculates confidence intervals for population attributable risks, and also for scenario proportions. regpar can be used after an estimation command whose predicted values are interpreted as conditional proportions, such as logit, logistic, probit, or glm. It estimates two scenario proportions, a baseline scenario ("Scenario 0") and a fantasy scenario ("Scenario 1"), in which one or more exposure variables are assumed to be set to particular values (typically zero), and any other predictor variables in the model are assumed to remain the same. It also estimates the difference between the Scenario 0 proportion and the Scenario 1 proportion. This difference is known as the population attributable risk (PAR), and represents the amount of risk attributable to living in Scenario 0 instead of Scenario 1.
Options for regpar
atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to values different from their value in the baseline scenario (denoted "Scenario 0" and equal to the real-life scenario unless atzero() is specified). regpar uses the margins command to estimate the proportions of outcome values positive under Scenarios 0 and 1, and then uses nlcom to estimate the logits of these 2 scenario proportions, and the hyperbolic arctangent (or Fishers's z-transform) of the difference between the Scenario 0 proportion and the Scenario 1 proportion, known as the population attributable risk (PAR). The proportion positive under Scenario 1 is known as the population unattributable risk (PUR). If atspec() is not specified, then its default value is atspec((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.
atzero(atspec0) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single baseline scenario ("Scenario 0"), defined as an alternative fantasy world in which a subset of predictors in the model are set to the values specified by atspec0. Scenario 0 will then be compared to the "Scenario 1" specified by the atspec() option. If atzero() is not specified, then its default value is atzero((asobserved) _all), implying that Scenario 0 is the real-life baseline scenario, represented by the predictor values actually present.
subpop(subspec), predict(pred_opt) and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation, the predict option(s), and the variance-covariance matrix formula, respectively, used to estimate the scenario proportions, and therefore to estimate the population attributable risk.
noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.
force has the same function as the option of the same name for margins.
iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivatives of the logits of the scenario proportions and the z-transform of their difference, with respect to the original scenario proportions calculated by margins.
level(#) specifies the percentage confidence level to be used in calculating the confidence intervals. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.
post specifies that regpar will post in e() the estimation results for estimating the logits of the scenario means and the z-transform of their difference, the PAR. If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the logits of the scenario proportions and the z-transform of their difference, and not for the proportions themselves and their difference. This is done because the estimation results are intended to define symmetric confidence intervals for the transformed parameters, which can be back-transformed to define asymmetric confidence intervals for the untransformed parameters.
Remarks
regpar estimates the population attributable risk, defined in its simplest unadjusted form by Gordis (2000). The general principles behind scenario comparisons for generalized linear models were introduced in Lane and Nelder (1982).
regpar starts by estimating the logits of the scenario proportions and the hyperbolic arctangent or Fisher's z-transform of their difference (the PAR), using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.
regpar assumes that the most recent estimation command estimates the parameters of a regression model, whose fitted values are conditional proportions, which must be bounded between 0 and 1. It is the user's responsibility to ensure that this is the case. However, it will be true if the conditional proportions are defined using a generalized linear model with a Bernoulli variance function (not a non-Bernoulli binomial variance function), and a logit, probit or complementary log-log link function.
Note that population attributable and unattributable risks (PARs and PURs) are not the same parameters as population attributable and unattributable fractions (PAFs and PUFs). A population attributable risk is a difference between scenario proportions, a population unattributable fraction is a ratio between scenario proportions (or means), and a population attributable fraction is the result of subtracting a population unattributable fraction from 1. Users who need to estimate population unattributable fractions should use either punaf (for cohort or cross-sectional study data) or punafcc (for case-control or survival study data). Users who need to estimate scenario prevalences (without differences) should use margprev. Users who need to estimate log-transformed scenario means (without ratios) should use marglmean. The packages punaf, punafcc, margprev and marglmean are downloadable from SSC.
Note, also, that regpar estimates population attributable risks (PARs) using an indirect-standardization method, based on a regression model. Alternatively, a user can estimate the PAR using a direct-standardization method, based on rank statistics. This can be done using the scsomersd, somersd and expgen packages, which also are downloadable from SSC.
Examples
The following examples use the dataset lbw.dta, provided by Hosmer and Lemeshow (1988) and used in [R] logistic and distributed by Stata Press. This dataset has 1 observation for each of a sample of pregnancies, and data on the birth weight of the baby and on a list of predictive variables, which might be assumed to be causal by some scientists.
Setup
.use http://www.stata-press.com/data/r11/lbw.dta, clear .describe
The following example estimates population unattributable and attributable risks for maternal smoking during pregnancy as a predictor of low birth weight. This is done by comparing "Scenario 1" (a fantasy world in which no pregnant women smoke) with "Scenario 0" (the real world in which the data were collected).
. logit low i.race i.smoke, or robust . regpar, at(smoke=0)
The following example estimates population unattributable and attributable risks for maternal smoking and non-white race. This is done by comparing "Scenario 1" (a fantasy world in which all pregnant women are white and no pregnant women smoke) with "Scenario 0" (the real world in which the data were collected).
.logit low i.race i.smoke, or robust .regpar, at(smoke=0 race=1)
The following example demonstrates the use of regpar with a univariate model of low birth weight with respect to maternal smoking status, to estimate total and exposed PARs. We logit to estimate the odds ratio of low birth weight with respect to maternal smoking, and use regpar to estimate the scenario proportions and PARs, first for the total population, then for the smoking-exposed subpopulation. Finally, we use regpar with the atzero() option to compare 2 alternative fantasy scenarios, a "Scenario 0" in which no mothers smoke and a "Scenario 1" in which all mothers smoke. Note that, in this comparison, the PAR is negative, because a world of non-smoking mothers would have fewer low birth weight babies than a world of smoking mothers.
.logit low i.smoke, or robust .regpar, at(smoke=0) .regpar if smoke==1, at(smoke=0) .regpar, at(smoke=1) atzero(smoke=0)
The following example demonstrates the use of regpar with the parmest package, downloadable from SSC. The population unattributable and attributable risks for smoking are estimated using regpar (with the post option), and saved (in their transformed versions), using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for each of the 3 transformed parameters, named "Scenario_0", "Scenario_1" and "PAR", and data on the estimates, confidence limits, P-values, and other parameter attributes. We then use replace to replace the symmetric confidence intervals for the transformed parameters with asymmetric confidence intervals for the untransformed parameters, and describe and list the new dataset.
. logit low i.race i.smoke, or robust . regpar, at(smoke=0) post . parmest, norestore . foreach Y of var estimate min* max* { . replace `Y'=invlogit(`Y') if parm!="PAR" . replace `Y'=tanh(`Y') if parm=="PAR" . } . describe . list
Saved results
regpar saves the following in r():
Scalars r(rank) rank of r(V) r(N) number of observations r(N_sub) subpopulation observations r(N_clust) number of clusters r(N_psu) number of samples PSUs, survey data only r(N_strata) number of strata, survey data only r(df_r) variance degrees of freedom, survey data only r(N_poststrata) number of post strata, survey data only r(k_margins) number of terms in marginlist r(k_by) number of subpopulations r(k_at) number of at() options (always 2) r(level) confidence level of confidence intervals
Macros r(atzero) atzero() option r(atspec) atspec() option
Matrices r(cimat) vector containing estimates and confidence limits for the scenario proportions and the PAR r(b) vector of logits of scenario proportions and their z-transformed difference r(V) estimated variance-covariance matrix of the logits of scenario proportions and their z-transformed difference
If post is specified, regpar also saves the following in e():
Scalars e(rank) rank of e(V) e(N) number of observations e(N_sub) subpopulation observations e(N_clust) number of clusters e(N_psu) number of samples PSUs, survey data only e(N_strata) number of strata, survey data only e(df_r) variance degrees of freedom, survey data only e(N_poststrata) number of post strata, survey data only e(k_margins) number of terms in marginlist e(k_by) number of subpopulations e(k_at) number of at() options (always 2)
Macros e(cmd) regpar e(predict) program used to implement predict e(atzero) atzero() option e(atspec) atspec() option e(properties) b V
Matrices e(b) vector of logits of scenario proportions and their z-transformed difference e(V) estimated variance-covariance matrix of the logits of scenario proportions and their z-transformed difference e(V_srs) simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy e(V_srswr) simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc() e(V_msp) misspecification (co)variance hat V_msp, if svy and available
Functions e(sample) marks estimation sample
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
References
Gordis, L. 2000. Epidemiology. 2nd ed. Philadelphia, PA: W. B. Saunders.
Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911–924.
Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613–621.
Also see
Manual: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] probit, [R] glm
Help: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] probit, [R] glm punaf, punafcc, margprev, marglmean, parmest, scsomersd, somersd, expgen if installed