-------------------------------------------------------------------------------
help for punaf                                                   (Roger Newson)
-------------------------------------------------------------------------------

Population attributable and unattributable fractions for cohort and cross-sectional studies

    punaf [if] [in] [weight] , [ atspec(atspec) atzero(atspec0) subpop(subspec) predict(pred_opt) vce(vcespec) noesample force iterate(#) eform level(#) post ]

where atspec and atspec0 are at-specifications recognized by the at() option of margins, subspec is a subpopulation specification of the form recognized by the subpop() option of margins, and vcespec is a variance-covariance specification of the form recognized by margins, which must have one of the values

delta|unconditional

fweights, aweights, iweights, and pweights are allowed; see weight.

Description

punaf calculates confidence intervals for population attributable fractions, and also for scenario means and their ratio, known as the population unattributable fraction. punaf can be used after an estimation command whose predicted values are interpreted as conditional arithmetic means, such as logit, logistic, poisson, or glm. It estimates the logs of two scenario means, a baseline scenario ("Scenario 0") and a fantasy scenario ("Scenario 1"), in which one or more exposure variables are assumed to be set to particular values (typically zero), and any other predictor variables in the model are assumed to remain the same. It also estimates the log of the ratio of the Scenario 1 mean to the Scenario 0 mean. This ratio is known as the population unattributable fraction, and is subtracted from 1 to derive the population attributable fraction, defined as the proportion of the mean of the outcome variable attributable to living in Scenario 0 instead of Scenario 1.

Options for punaf

atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to values different from their values in the baseline scenario (denoted "Scenario 0" and equal to the real-life scenario unless atzero() is specified). punaf uses the margins command to estimate the arithmetic mean values of the outcome under Scenarios 0 and 1, and then uses nlcom to estimate the logs of these 2 scenario means, and of the ratio of the Scenario 1 mean to the Scenario 0 mean, known as the population unattributable fraction (PUF). The PUF, and its confidence limits, are subtracted from 1 to calculate a confidence interval for the population attributable fraction (PAF). If atspec() is not specified, then its default value is atspec((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.

atzero(atspec0) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single baseline scenario ("Scenario 0"), defined as an alternative fantasy world in which a subset of predictors in the model are set to the values specified by atspec0. Scenario 0 will then be compared to the "Scenario 1" specified by the atspec() option. If atzero() is not specified, then its default value is atzero((asobserved) _all), implying that Scenario 0 is the real-life baseline scenario, represented by the predictor values actually present.

subpop(subspec), predict(pred_opt), and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation, the predict option(s), and the variance-covariance matrix formula, respectively, used to estimate the scenario means, and therefore to estimate the population unattributable and attributable fractions.

noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.

force has the same function as the option of the same name for margins.

iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivatives of the logs of the scenario means and of their ratio, with respect to the original scenario means calculated by margins.

eform specifies that punaf will display estimates, P-values, and confidence limits for the scenario means and their ratio, instead of for their logs. If eform is not specified, then confidence intervals for the logs are displayed. In either case, punaf also displays a confidence interval for the population attributable fraction (PAF).

level(#) specifies the percentage confidence level to be used in calculating the confidence intervals. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.

post specifies that punaf will post in e() the estimation results for estimating the logs of the scenario means and of their ratio, the PUF. If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the logs of the scenario means and of their ratio (the PUF), whether or not eform is specified.
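As a rough illustration of how eform, level(), and post might be combined, the do-file fragment below fits the model used in the Examples section and then inspects the posted results. It assumes that punaf is installed and that the PUF parameter is named "PUF" in e(b), as described in the parmest example in the Examples section. The 90% level is an arbitrary choice for illustration.

```stata
* Illustrative sketch only: assumes punaf is installed (e.g. from SSC)
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.race i.smoke, or robust

* Display scenario means and PUF (not their logs) with 90% limits,
* and post the log-scale estimates in e()
punaf, at(smoke=0) eform level(90) post

* The posted results are on the log scale, whether or not eform is given
ereturn list
matrix list e(b)

* Back-transform the posted log(PUF) to a PAF point estimate
* (assumes the parameter name PUF, as stated in the parmest example)
display "PAF = " 1 - exp(_b[PUF])
```

Because post overwrites the logit results in e(), the model would need to be refitted (or restored with estimates store and estimates restore) before calling punaf again.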

Remarks

punaf essentially implements the method for estimating population attributable fractions (PAFs) recommended by Greenland and Drescher (1993) for cohort and cross-sectional studies. This source recommended the use of the Normalizing and variance-stabilizing transformation

    log(PUF) = log(1-PAF)

to define confidence intervals for the PAF.
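The back-transformation implied by this formula can be sketched with a few scalars. The numbers below are hypothetical, chosen only to show the arithmetic, and a Normal critical value is assumed; punaf itself may use a different reference distribution in some settings (for example, with survey data).

```stata
* Hypothetical inputs for illustration only (not from any fitted model)
scalar lpuf = -0.20                 // estimated log(PUF)
scalar selpuf = 0.08                // standard error of log(PUF)
scalar zcrit = invnormal(0.975)     // assumed Normal critical value, 95% level

* Confidence limits on the log(PUF) scale
scalar lpuf_lo = lpuf - zcrit*selpuf
scalar lpuf_hi = lpuf + zcrit*selpuf

* PAF = 1 - exp(log(PUF)); this function is decreasing in log(PUF),
* so the upper log(PUF) limit gives the lower PAF limit and vice versa
display "PAF estimate:    " 1 - exp(lpuf)
display "PAF lower limit: " 1 - exp(lpuf_hi)
display "PAF upper limit: " 1 - exp(lpuf_lo)
```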

punaf starts by estimating the logs of the scenario means and of their ratio (the PUF), using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.
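The two-step calculation described above can be approximated by hand. The following is a minimal sketch, not the actual source code of punaf: it assumes that the two at() scenarios posted by margins can be referred to as _b[1._at] and _b[2._at] (the names can be checked by replaying margins with the coeflegend option), and the nlcom labels are chosen here only to mirror the parameter names mentioned in this help file.

```stata
* Sketch of the margins + nlcom steps (not punaf's own code)
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.race i.smoke, vce(robust)

* Step 1: scenario means. The first at() is Scenario 0 (as observed),
* the second is Scenario 1 (no mothers smoke). post stores them in e(b).
margins, at((asobserved) _all) at(smoke=0) post

* Step 2: logs of the scenario means and of their ratio (the PUF)
nlcom (Scenario_0: log(_b[1._at]))  ///
      (Scenario_1: log(_b[2._at]))  ///
      (PUF: log(_b[2._at]/_b[1._at]))
```

Exponentiating the PUF row gives the population unattributable fraction, and subtracting that value (and its confidence limits) from 1 gives the PAF and its confidence interval.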

punaf assumes that the most recent estimation command estimates the parameters of a regression model, whose fitted values are conditional arithmetic mean outcomes, which must be positive. It is the user's responsibility to ensure that this is the case. However, it will be true if the conditional means are defined using a generalized linear model with a log, logit, probit or complementary log-log link function.
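For instance, a model with a log link satisfies this requirement. The fragment below is an illustrative sketch, assuming punaf is installed; it fits a Poisson regression with robust variances to the binary outcome used in the Examples section, so that the predicted values are positive conditional means.

```stata
* Illustrative sketch: a log-link model whose fitted values are positive means
use http://www.stata-press.com/data/r11/lbw.dta, clear
poisson low i.race i.smoke, vce(robust)

* Compare the observed world with a scenario in which no mothers smoke
punaf, at(smoke=0) eform
```

With a log link, the PUF for a single exposure is a standardized ratio of predicted means, so the results need not match those from a logit model fitted to the same data.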

punaf is intended to replace some of the functions of the aflogit package (Brady, 1998). aflogit was written in Version 6 of Stata, and therefore will not work if the user uses long variable names and factor variables, which were introduced in later Stata versions.

Note that punaf (unlike aflogit) does not implement the formulas for estimating PAFs in case-control studies. The PUFs and PAFs in case-control studies represent a different kind of parameter from the PUFs and PAFs in cohort and cross-sectional studies, and can be estimated using the punafcc package. Note, also, that the PAF is a different parameter from the population attributable risk (PAR), which is a between-scenario difference (not a between-scenario ratio), and can be estimated using the regpar package. Users who need to estimate scenario prevalences (without differences) should use margprev. Users who need to estimate log-transformed scenario means (without ratios) should use marglmean. The punafcc, regpar, margprev, and marglmean packages can be downloaded from SSC.

The general principles behind scenario comparisons in generalized linear models were introduced by Lane and Nelder (1982).

Examples

The following examples use the dataset lbw.dta, provided by Hosmer and Lemeshow (1988) and used in [R] logistic and distributed by Stata Press. This dataset has 1 observation for each of a sample of pregnancies, and data on the birth weight of the baby and on a list of predictive variables, which might be assumed to be causal by some scientists.

Setup

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . describe

The following example estimates population unattributable and attributable fractions for maternal smoking during pregnancy as a predictor of low birth weight. This is done by comparing "Scenario 1" (a fantasy world in which no pregnant women smoke) with "Scenario 0" (the real world in which the data were collected).

    . logit low i.race i.smoke, or robust
    . punaf, at(smoke=0) eform

The following example estimates population unattributable and attributable fractions for maternal smoking and non-white race. This is done by comparing "Scenario 1" (a fantasy world in which all pregnant women are white and no pregnant women smoke) with "Scenario 0" (the real world in which the data were collected).

    . logit low i.race i.smoke, or robust
    . punaf, at(smoke=0 race=1) eform

The following example demonstrates the use of punaf with a univariate model of low birth weight with respect to maternal smoking status, to estimate the total and exposed PAFs output by cs. We start by calling cs to calculate the total and exposed attributable fractions. We then use logit to estimate the odds ratio of low birth weight with respect to maternal smoking, and use punaf to estimate the scenario means and PAFs, first for the total population, then for the smoking-exposed subpopulation. Finally, we use punaf with the atzero() option to compare 2 alternative fantasy scenarios, a "Scenario 0" in which no mothers smoke and a "Scenario 1" in which all mothers smoke. Note that the PUF in the third scenario comparison is equal to the risk ratio output by cs, with very similar confidence limits. Note, also, that, in this comparison, the PAF is negative, because a world of non-smoking mothers would have fewer low birth weight babies than a world of smoking mothers.

    . cs low smoke
    . logit low i.smoke, or robust
    . punaf, eform at(smoke=0)
    . punaf if smoke==1, eform at(smoke=0)
    . punaf, eform at(smoke=1) atzero(smoke=0)

The following example demonstrates the use of punaf with the parmest and creplace packages, downloadable from SSC. The population unattributable and attributable fractions for smoking are estimated using punaf (with the post option), and saved, using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for each of the 3 original parameters, named "Scenario_0", "Scenario_1", and "PUF", and data on the estimates, confidence limits, P-values, and other parameter attributes. We then use replace and creplace to replace the confidence interval for the PUF with a confidence interval for the PAF, and describe and list the new dataset.

    . logit low i.race i.smoke, or robust
    . punaf, at(smoke=0) eform post
    . parmest, eform norestore
    . foreach Y of var estimate min* max* {
    .     replace `Y'=1-`Y' if parm=="PUF"
    . }
    . creplace min* max* if parm=="PUF"
    . replace parm="PAF" if parm=="PUF"
    . describe
    . list

Saved results

punaf saves the following in r():

Scalars

    r(rank)            rank of r(V)
    r(N)               number of observations
    r(N_sub)           subpopulation observations
    r(N_clust)         number of clusters
    r(N_psu)           number of sampled PSUs, survey data only
    r(N_strata)        number of strata, survey data only
    r(df_r)            variance degrees of freedom, survey data only
    r(N_poststrata)    number of post strata, survey data only
    r(k_margins)       number of terms in marginlist
    r(k_by)            number of subpopulations
    r(k_at)            number of at() options (always 2)
    r(level)           confidence level of confidence intervals

Macros

    r(atzero)          atzero() option
    r(atspec)          atspec() option

Matrices

    r(cimat)           vector containing estimates and confidence limits for the PAF
    r(b)               vector of logs of scenario means and their ratio
    r(V)               estimated variance-covariance matrix of the logs of scenario means and their ratio

If post is specified, punaf also saves the following in e():

Scalars

    e(rank)            rank of e(V)
    e(N)               number of observations
    e(N_sub)           subpopulation observations
    e(N_clust)         number of clusters
    e(N_psu)           number of sampled PSUs, survey data only
    e(N_strata)        number of strata, survey data only
    e(df_r)            variance degrees of freedom, survey data only
    e(N_poststrata)    number of post strata, survey data only
    e(k_margins)       number of terms in marginlist
    e(k_by)            number of subpopulations
    e(k_at)            number of at() options (always 2)

Macros

    e(cmd)             punaf
    e(predict)         program used to implement predict
    e(atzero)          atzero() option
    e(atspec)          atspec() option
    e(properties)      b V

Matrices

    e(b)               vector of logs of scenario means and their ratio
    e(V)               estimated variance-covariance matrix of the logs of scenario means and their ratio
    e(V_srs)           simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy
    e(V_srswr)         simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc()
    e(V_msp)           misspecification (co)variance hat V_msp, if svy and available

Functions

    e(sample)          marks estimation sample
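The saved results in r() can be inspected directly after a call without post. The fragment below is a sketch under the assumption that punaf is installed; the layout of r(cimat) (which element holds the estimate and which hold the limits) is not documented here, so it is listed rather than indexed.

```stata
* Illustrative sketch: inspecting the results punaf leaves in r()
use http://www.stata-press.com/data/r11/lbw.dta, clear
logit low i.smoke, or robust
punaf, at(smoke=0) eform

* Scalars, macros, and matrices returned by punaf
return list

* PAF estimate and confidence limits, then the log-scale estimates
matrix list r(cimat)
matrix list r(b)
```

Since r() is overwritten by the next r-class command, the matrices would need to be copied (for example with matrix define) before running further commands.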

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk

References

Brady, A. 1998. sbe21: Adjusted population attributable fractions from logistic regression. Stata Technical Bulletin STB-42: 8-12. Download from the Stata Technical Bulletin website.

Greenland, S., and K. Drescher. 1993. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 49: 865-872.

Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911-924.

Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613-621.

Also see

Manual:  [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] poisson, [R] glm

Help:    [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] poisson, [R] glm
         punafcc, regpar, margprev, marglmean, parmest, creplace, aflogit if installed