------------------------------------------------------------------------------- help for punaf (Roger Newson) -------------------------------------------------------------------------------

Population attributable and unattributable fractions for cohort and cross-sectional studies

punaf [if] [in] [weight] , [ atspec(atspec) atzero(atspec0) subpop(subspec) predict(pred_opt) vce(vcespec) noesample force iterate(#) eform level(#) post ]

where atspec and atspec0 are at-specifications recognized by the at() option of margins, subspec is a subpopulation specification of the form recognized by the subpop() option of margins, and vcespec is a variance-covariance specification of the form recognized by margins, which must have one of the values

delta | unconditional

fweights, aweights, iweights, pweights are allowed; see weight.

Description

punaf calculates confidence intervals for population attributable fractions, and also for scenario means and their ratio, known as the population unattributable fraction. punaf can be used after an estimation command whose predicted values are interpreted as conditional arithmetic means, such as logit, logistic, poisson, or glm. It estimates the logs of two scenario means, a baseline scenario ("Scenario 0") and a fantasy scenario ("Scenario 1"), in which one or more exposure variables are assumed to be set to particular values (typically zero), and any other predictor variables in the model are assumed to remain the same. It also estimates the log of the ratio of the Scenario 1 mean to the Scenario 0 mean. This ratio is known as the population unattributable fraction, and is subtracted from 1 to derive the population attributable fraction, defined as the proportion of the mean of the outcome variable attributable to living in Scenario 0 instead of Scenario 1.
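The calculation described above can be approximated by hand. The following is only a rough sketch, not the internal code of punaf: it assumes the lbw.dta dataset used in the Examples section, and it assumes that margins with the post option labels the two scenario means 1._at and 2._at.

```stata
* Load the low-birth-weight data used in the Examples section
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.race i.smoke, or robust

* Scenario 0 (predictors as observed) and Scenario 1 (no mothers smoke)
margins, at((asobserved) _all) at(smoke=0) post

* Logs of the two scenario means and of their ratio (the PUF).
* The names 1._at and 2._at are assumed, not guaranteed.
nlcom (Scenario_0: log(_b[1._at])) (Scenario_1: log(_b[2._at])) (PUF: log(_b[2._at]/_b[1._at]))
```

Exponentiating the PUF row and subtracting the result from 1 gives the population attributable fraction.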

Options for punaf

atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to values different from their value in the baseline scenario (denoted "Scenario 0" and equal to the real-life scenario unless atzero() is specified). punaf uses the margins command to estimate the arithmetic mean values of the outcome under Scenarios 0 and 1, and then uses nlcom to estimate the logs of these 2 scenario means, and of the ratio of the Scenario 1 mean to the Scenario 0 mean, known as the population unattributable fraction (PUF). The PUF, and its confidence limits, are subtracted from 1 to calculate a confidence interval for the population attributable fraction (PAF). If atspec() is not specified, then its default value is atspec((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.

atzero(atspec0) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single baseline scenario ("Scenario 0"), defined as an alternative fantasy world in which a subset of predictors in the model are set to the values specified by atspec0. Scenario 0 will then be compared to the "Scenario 1" specified by the atspec() option. If atzero() is not specified, then its default value is atzero((asobserved) _all), implying that Scenario 0 is the real-life baseline scenario, represented by the predictor values actually present.

subpop(subspec), predict(pred_opt) and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation, the predict option(s), and the variance-covariance matrix formula, respectively, used to estimate the scenario means, and therefore to estimate the population unattributable and attributable fractions.

noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.

force has the same function as the option of the same name for margins.

iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivatives of the logs of the scenario means and of their ratio, with respect to the original scenario means calculated by margins.

eform specifies that punaf will display estimates, P-values and confidence limits for the scenario means and their ratio, instead of for their logs. If eform is not specified, then confidence intervals for the logs are displayed. In either case, punaf also displays a confidence interval for the population attributable fraction (PAF).

level(#) specifies the percentage confidence level to be used in calculating the confidence intervals. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.

post specifies that punaf will post in e() the estimation results for estimating the logs of the scenario means and of their ratio, the PUF. If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the logs of the scenario means and of their ratio (the PUF), whether or not eform is specified.
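As an illustration of how the display options combine, the following sketch (assuming the lbw.dta dataset and the model from the Examples section) runs punaf first on the log scale and then on the exponentiated scale with 90% confidence limits.

```stata
* Load the low-birth-weight data used in the Examples section
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.race i.smoke, or robust

* Default display: logs of the scenario means and of the PUF
punaf, at(smoke=0)

* Scenario means and PUF themselves, with 90% confidence limits
punaf, at(smoke=0) eform level(90)
```

Neither call specifies post, so the logit results stay in e() and punaf can be called repeatedly after a single model fit.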

Remarks

punaf essentially implements the method for estimating population attributable fractions (PAFs) recommended by Greenland and Drescher (1993) for cohort and cross-sectional studies. This source recommended the use of the Normalizing and variance-stabilizing transformation

log(PUF) = log(1-PAF)

to define confidence intervals for the PAF. punaf starts by estimating the logs of the scenario means and of their ratio (the PUF), using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.
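The back-transformation from log(PUF) to the PAF can be worked through with simple arithmetic. The numbers below are hypothetical and chosen only for illustration, and the sketch assumes Normal quantiles, which may differ from what nlcom uses when variance degrees of freedom are available.

```stata
* Hypothetical log(PUF) estimate and standard error (illustration only)
scalar lpuf = -0.25
scalar se   = 0.10
scalar z    = invnormal(0.975)

* PUF with 95% limits: exponentiate the log-scale estimate and limits
display exp(lpuf) "  " exp(lpuf-z*se) "  " exp(lpuf+z*se)

* PAF = 1 - PUF: the limits swap places, because subtracting from 1 reverses order
display 1-exp(lpuf) "  " 1-exp(lpuf+z*se) "  " 1-exp(lpuf-z*se)
```

The swap of lower and upper limits is why the parmest example in the Examples section calls creplace after subtracting the PUF limits from 1.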

punaf assumes that the most recent estimation command estimates the parameters of a regression model, whose fitted values are conditional arithmetic mean outcomes, which must be positive. It is the user's responsibility to ensure that this is the case. However, it will be true if the conditional means are defined using a generalized linear model with a log, logit, probit or complementary log-log link function.
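For instance, a Poisson regression uses a log link, so its fitted means are positive. The following sketch assumes the lbw.dta dataset from the Examples section, and fitting a Poisson model with robust variances to the binary outcome is an illustrative choice, not a recommendation from this help file.

```stata
* Load the low-birth-weight data used in the Examples section
use http://www.stata-press.com/data/r11/lbw.dta, clear

* Log link: fitted conditional means are exponentials, hence positive
poisson low i.race i.smoke, irr vce(robust)

* Compare the observed world with a world in which no mothers smoke
punaf, at(smoke=0) eform
```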

punaf is intended to replace some of the functions of the aflogit package (Brady, 1998). aflogit was written in Version 6 of Stata, and therefore will not work if the user uses long variable names and factor variables, which were introduced in later Stata versions.

Note that punaf (unlike aflogit) does not implement the formulas for estimating PAFs in case-control studies. The PUFs and PAFs in case-control studies represent a different kind of parameter from the PUFs and PAFs in cohort and cross-sectional studies, and can be estimated using the punafcc package. Note, also, that the PAF is a different parameter from the population attributable risk (PAR), which is a between-scenario difference (not a between-scenario ratio), and can be estimated using the regpar package. Users who need to estimate scenario prevalences (without differences) should use margprev. Users who need to estimate log-transformed scenario means (without ratios) should use marglmean. The punafcc, regpar, margprev and marglmean packages can be downloaded from SSC.

The general principles behind scenario comparisons in generalized linear models were introduced by Lane and Nelder (1982).

Examples

The following examples use the dataset lbw.dta, provided by Hosmer and Lemeshow (1988) and used in [R] logistic and distributed by Stata Press. This dataset has 1 observation for each of a sample of pregnancies, and data on the birth weight of the baby and on a list of predictive variables, which might be assumed to be causal by some scientists.

Setup

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . describe

The following example estimates population unattributable and attributable fractions for maternal smoking during pregnancy as a predictor of low birth weight. This is done by comparing "Scenario 1" (a fantasy world in which no pregnant women smoke) with "Scenario 0" (the real world in which the data were collected).

    . logit low i.race i.smoke, or robust
    . punaf, at(smoke=0) eform

The following example estimates population unattributable and attributable fractions for maternal smoking and non-white race. This is done by comparing "Scenario 1" (a fantasy world in which all pregnant women are white and no pregnant women smoke) with "Scenario 0" (the real world in which the data were collected).

    . logit low i.race i.smoke, or robust
    . punaf, at(smoke=0 race=1) eform

The following example demonstrates the use of punaf with a univariate model of low birth weight with respect to maternal smoking status, to estimate the total and exposed PAFs output by cs. We start by calling cs to calculate the total and exposed attributable fractions. We then use logit to estimate the odds ratio of low birth weight with respect to maternal smoking, and use punaf to estimate the scenario means and PAFs, first for the total population, then for the smoking-exposed subpopulation. Finally, we use punaf with the atzero() option to compare 2 alternative fantasy scenarios, a "Scenario 0" in which no mothers smoke and a "Scenario 1" in which all mothers smoke. Note that the PUF in the third scenario comparison is equal to the risk ratio output by cs, with very similar confidence limits. Note, also, that, in this comparison, the PAF is negative, because a world of non-smoking mothers would have fewer low birth weight babies than a world of smoking mothers.

    . cs low smoke
    . logit low i.smoke, or robust
    . punaf, eform at(smoke=0)
    . punaf if smoke==1, eform at(smoke=0)
    . punaf, eform at(smoke=1) atzero(smoke=0)

The following example demonstrates the use of punaf with the parmest and creplace packages, downloadable from SSC. The population unattributable and attributable fractions for smoking are estimated using punaf (with the post option), and saved, using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for each of the 3 original parameters, named "Scenario_0", "Scenario_1" and "PUF", and data on the estimates, confidence limits, P-values, and other parameter attributes. We then use replace and creplace to replace the confidence interval for the PUF with a confidence interval for the PAF, and describe and list the new dataset.

    . logit low i.race i.smoke, or robust
    . punaf, at(smoke=0) eform post
    . parmest, eform norestore
    . foreach Y of var estimate min* max* {
    .     replace `Y'=1-`Y' if parm=="PUF"
    . }
    . creplace min* max* if parm=="PUF"
    . replace parm="PAF" if parm=="PUF"
    . describe
    . list

Saved results

punaf saves the following in r():

Scalars
    r(rank)            rank of r(V)
    r(N)               number of observations
    r(N_sub)           subpopulation observations
    r(N_clust)         number of clusters
    r(N_psu)           number of sampled PSUs, survey data only
    r(N_strata)        number of strata, survey data only
    r(df_r)            variance degrees of freedom, survey data only
    r(N_poststrata)    number of post strata, survey data only
    r(k_margins)       number of terms in marginlist
    r(k_by)            number of subpopulations
    r(k_at)            number of at() options (always 2)
    r(level)           confidence level of confidence intervals

Macros
    r(atzero)          atzero() option
    r(atspec)          atspec() option

Matrices
    r(cimat)           vector containing estimates and confidence limits for the PAF
    r(b)               vector of logs of scenario means and their ratio
    r(V)               estimated variance-covariance matrix of the logs of scenario means and their ratio
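The r() results can be inspected directly after a call to punaf. The following sketch assumes the lbw.dta dataset and the model from the Examples section, and the matrix name pafci is hypothetical. The matrix is copied immediately, because a later r-class command would overwrite r().

```stata
* Load the low-birth-weight data used in the Examples section
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.race i.smoke, or robust
punaf, at(smoke=0) eform

* List everything punaf left in r()
return list

* Copy the PAF estimate and confidence limits before r() is overwritten
matrix pafci = r(cimat)
matrix list pafci
```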

If post is specified, punaf also saves the following in e():

Scalars
    e(rank)            rank of e(V)
    e(N)               number of observations
    e(N_sub)           subpopulation observations
    e(N_clust)         number of clusters
    e(N_psu)           number of sampled PSUs, survey data only
    e(N_strata)        number of strata, survey data only
    e(df_r)            variance degrees of freedom, survey data only
    e(N_poststrata)    number of post strata, survey data only
    e(k_margins)       number of terms in marginlist
    e(k_by)            number of subpopulations
    e(k_at)            number of at() options (always 2)

Macros
    e(cmd)             punaf
    e(predict)         program used to implement predict
    e(atzero)          atzero() option
    e(atspec)          atspec() option
    e(properties)      b V

Matrices
    e(b)               vector of logs of scenario means and their ratio
    e(V)               estimated variance-covariance matrix of the logs of scenario means and their ratio
    e(V_srs)           simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy
    e(V_srswr)         simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc()
    e(V_msp)           misspecification (co)variance hat V_msp, if svy and available

Functions
    e(sample)          marks estimation sample
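When post is specified, the posted estimates can be used like those of any other estimation command. The following sketch assumes the lbw.dta dataset and the model from the Examples section, and it assumes that the posted parameters carry the names Scenario_0, Scenario_1 and PUF given in the parmest example above.

```stata
* Load the low-birth-weight data used in the Examples section
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.race i.smoke, or robust

* post replaces the logit results in e() with the punaf results
punaf, at(smoke=0) post

* Log-scale estimates and their variance-covariance matrix
matrix list e(b)
matrix list e(V)

* Point estimate of the PAF, back-transformed from the posted log(PUF)
display 1-exp(_b[PUF])
```

Because post overwrites the logit results, the model must be refitted before punaf is called again with a different scenario.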

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk

References

Brady A. 1998. sbe21: Adjusted population attributable fractions from logistic regression. Stata Technical Bulletin STB-42: 8-12. Download from the Stata Technical Bulletin website.

Greenland S. and K. Drescher. 1993. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 49: 865-872.

Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911-924.

Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613-621.

Also see

Manual: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] poisson, [R] glm

Help: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] poisson, [R] glm
      punafcc, regpar, margprev, marglmean, parmest, creplace, aflogit if installed