-------------------------------------------------------------------------------
help for punafcc                                                 (Roger Newson)
-------------------------------------------------------------------------------

Population attributable and unattributable fractions for case-control and survival studies

punafcc [if] [in] [weight] [, atspec(atspec) subpop(subspec) vce(vcespec) noesample force iterate(#) eform level(#) post]

where atspec is an at-specification recognized by the at() option of margins, subspec is a subpopulation specification of the form recognized by the subpop() option of margins, and vcespec is a variance-covariance specification of the form recognized by margins, and must have one of the values

delta | unconditional

fweights, aweights, iweights, and pweights are allowed; see weight.

Description

punafcc calculates confidence intervals for population attributable and unattributable fractions in case-control or survival studies. punafcc can be used after an estimation command whose parameters are interpreted as log rate ratios, such as logit or logistic for case-control data, or stcox for survival data. It estimates the log of the mean rate ratio, in cases or deaths, between 2 scenarios, a baseline scenario ("Scenario 0") and a fantasy scenario ("Scenario 1"), in which one or more exposure variables are assumed to be set to particular values (typically zero), and any other predictor variables in the model are assumed to remain the same. This ratio is known as the population unattributable fraction (PUF), and is subtracted from 1 to derive the population attributable fraction (PAF), defined as the proportion of the cases or deaths attributable to living in Scenario 0 instead of Scenario 1.
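In symbols, one reading of the definition above is sketched below. The notation is introduced here for illustration only and is not part of the original help file: i indexes the cases (or deaths), w_i is the weight of observation i, and eta_i^(0) and eta_i^(1) are the fitted linear predictors for observation i under Scenario 0 and Scenario 1 respectively.

```latex
\[
\mathrm{PUF}
  \;=\;
  \frac{\sum_{i \in \mathrm{cases}} w_i \,
        \exp\!\bigl(\eta_i^{(1)} - \eta_i^{(0)}\bigr)}
       {\sum_{i \in \mathrm{cases}} w_i},
\qquad
\mathrm{PAF} \;=\; 1 - \mathrm{PUF}.
\]
```

Under this reading, each term exp(eta_i^(1) - eta_i^(0)) is the rate ratio for case i between the two scenarios, so the PUF is the share of cases or deaths that would be expected to remain in Scenario 1.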

Options for punafcc

atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to values different from their values in the baseline scenario (denoted "Scenario 0" and equal to the real-life scenario). The at-specification may set variables only to values (not to statistics). punafcc uses the margins command to estimate the mean rate ratio, in cases or deaths, between Scenarios 0 and 1, known as the population unattributable fraction (PUF), and then uses nlcom to estimate the log of this ratio. The PUF, and its confidence limits, are subtracted from 1 to calculate a confidence interval for the population attributable fraction (PAF). If atspec() is not specified, then its default value is atspec((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.

subpop(subspec) and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation and the variance-covariance matrix formula, respectively, used to estimate the mean Scenario 0/Scenario 1 rate ratio, and therefore to estimate the population unattributable and attributable fractions.

noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.

force has the same function as the option of the same name for margins.

iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivatives of the log of the mean Scenario 0/Scenario 1 rate ratio in cases or deaths, with respect to the rate ratio itself, calculated by margins.

eform specifies that punafcc will display an estimate, P-value and confidence limits for the population unattributable fraction, instead of for its log. If eform is not specified, then a confidence interval for the log is displayed. In either case, punafcc also displays a confidence interval for the population attributable fraction (PAF).

level(#) specifies the percentage confidence level to be used in calculating the confidence intervals. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.

post specifies that punafcc will post in e() the estimation results for estimating the log of the mean Scenario 0/Scenario 1 rate ratio in cases or deaths (that is, the log of the PUF). If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the log of the PUF, whether or not eform is specified.

Remarks

punafcc essentially implements the method for estimating population attributable fractions (PAFs) recommended by Greenland and Drescher (1993) for case-control studies. This source recommended the use of the Normalizing and variance-stabilizing transformation

    log(PUF) = log(1-PAF)

to define confidence intervals for the PAF.
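The back-transformation implied by this can be written out as follows. This is a sketch under the assumption of symmetric, Normal-based limits on the log scale; the symbols theta-hat (the estimated log PUF), SE (its standard error) and z (the Normal quantile for the chosen confidence level) are notation introduced here, not taken from the original help file.

```latex
\[
\hat\theta = \log\bigl(\widehat{\mathrm{PUF}}\bigr), \qquad
\mathrm{PUF} \in
  \bigl[\, e^{\hat\theta - z\,\mathrm{SE}},\; e^{\hat\theta + z\,\mathrm{SE}} \,\bigr],
\qquad
\mathrm{PAF} \in
  \bigl[\, 1 - e^{\hat\theta + z\,\mathrm{SE}},\; 1 - e^{\hat\theta - z\,\mathrm{SE}} \,\bigr].
\]
```

Note that the limits change places: the upper limit for the PUF gives the lower limit for the PAF, and vice versa.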

punafcc starts by estimating the log of the mean rate ratio in cases or deaths (the PUF), using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.
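As an illustration of what the posted results contain, the following sketch recomputes the PUF and PAF by hand from e(b) and e(V). It assumes that punafcc is installed, that e(b) is a 1x1 vector holding the log PUF (as the post option describes), and that Normal-based confidence limits are wanted; the matrix and scalar names (bhat, Vhat, logpuf and so on) are chosen here for illustration.

```stata
// Sketch only: assumes punafcc is installed (for example, from SSC)
webuse ccxmpl, clear
logit case exposed [fweight=pop], or robust
punafcc, at(exposed=0) vce(unconditional) post

// Copy the posted results: assumed to be the log PUF and its variance
matrix bhat = e(b)
matrix Vhat = e(V)
scalar logpuf = bhat[1,1]
scalar selog  = sqrt(Vhat[1,1])

// Normal quantile for the current default confidence level
scalar zcrit = invnormal(1 - (100 - c(level))/200)

// Back-transform; the limits change places when subtracting from 1
scalar puf    = exp(logpuf)
scalar paf    = 1 - puf
scalar paf_lo = 1 - exp(logpuf + zcrit*selog)
scalar paf_hi = 1 - exp(logpuf - zcrit*selog)

display "PUF = " puf
display "PAF = " paf " (" paf_lo ", " paf_hi ")"
```

The hand-computed limits may differ from those displayed by punafcc if the latter are based on a t distribution, as can happen with survey data.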

punafcc assumes that the most recent estimation command estimates the parameters of a single-equation regression model, whose parameters are interpreted as log rate ratios. It is the user's responsibility to ensure that this is the case. However, it will be true if the model is a logistic regression model on case-control data, fitted using logit, logistic or glm, or a Cox proportional hazards model on survival data, fitted using stcox. punafcc estimates the PUF as the Scenario 0/Scenario 1 mean rate ratio, restricted to observations representing deaths, if the previous command was stcox, and restricted to observations with a non-missing non-zero value of the dependent variable, after any other estimation command.

punafcc was written to replace some of the functions of the aflogit package (Brady, 1998). aflogit was written in Version 6 of Stata, and therefore will not work if the user uses long variable names and factor variables, which were introduced in later Stata versions.

Note that punafcc (unlike aflogit) does not implement the formulas for estimating PAFs in cross-sectional and cohort studies, which can be done using the punaf package. The logs of PUFs in case-control and survival studies represent a different kind of parameter from the logs of PUFs in cohort and cross-sectional studies, but both can be estimated using margins and nlcom. Note, also, that both kinds of PAF are a different parameter from the population attributable risk (PAR), which can be estimated using the regpar package. Users who need to estimate scenario prevalences (without differences) should use margprev. Users who need to estimate log-transformed scenario means (without ratios) should use marglmean. The packages punaf, regpar, margprev and marglmean are downloadable from SSC.

The general principles behind scenario comparisons in generalized linear models were introduced by Lane and Nelder (1982).

Examples

The following examples use the dataset ccxmpl.dta. This dataset has 1 observation for each combination of case status and exposure, and data on the number of subjects with that case and exposure status.

Setup

. webuse ccxmpl, clear
. describe
. list

The following example estimates population unattributable and attributable fractions for exposure as a predictor of case status, following a logistic regression model. This is done by comparing "Scenario 1" (a fantasy world in which no subjects are exposed) with "Scenario 0" (the real world in which the data were collected). This is done both for all subjects (to get the total-population attributable fraction) and for exposed subjects (to get the exposed-population attributable fraction). Note that the point estimators for both these PAFs are the same as those produced by cc on the same data. The option vce(unconditional), requiring robust variances in the model, is probably a good idea with case-control or survival studies, because we might expect covariate values in cases or deaths to be subject to sampling error. (However, vce(unconditional) should not be used when calculating out-of-sample PAFs for a second set of case-control or survival data from a model fitted to a first set of case-control or survival data, using the noesample option.)

. cc case exposed [fweight=pop]
. logit case exposed [fweight=pop], or robust
. punafcc, at(exposed=0) eform vce(unconditional)
. punafcc, at(exposed=0) eform vce(unconditional) subpop(exposed)

The following examples use the dataset downs.dta. This dataset has 1 observation for each combination of case status, exposure and age group, and data on the number of subjects with that case and exposure status and age group.

Setup

. webuse downs, clear
. describe
. label list age
. list, sepby(age)

The following examples estimate age-adjusted exposure effects using logistic regression, and then estimate the PAF. This is done by comparing "Scenario 1" (a fantasy world in which no subjects are exposed and the age distribution stays the same) with "Scenario 0" (the real world in which the data were collected).

. logit case i.age i.exposed [fweight=pop], or robust
. punafcc, at(exposed=0) eform vce(unconditional)

. logit case i.age exposed [fweight=pop], or robust
. punafcc, at(exposed=0) eform vce(unconditional)

The following example demonstrates the use of punafcc in the same dataset with a logistic model that includes an age-by-exposure interaction, so that exposure effects may vary with age.

. logit case i.age i.exposed i.age#i.exposed [fweight=pop], or robust
. punafcc, at(exposed=0) eform vce(unconditional)

The following example demonstrates the use of punafcc with the parmest and creplace packages, downloadable from SSC. The population unattributable and attributable fractions for case status are estimated using punafcc (with the post option), and saved, using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for 1 parameter, the log population unattributable fraction, named "PUF", and data on the estimate, confidence limits, P-value, and other parameter attributes. We then use replace and creplace to replace the confidence interval for the PUF with a confidence interval for the PAF, and describe and list the new dataset.

. logit case i.age i.exposed [fweight=pop], or robust
. punafcc, at(exposed=0) eform vce(unconditional) post
. parmest, eform norestore
. foreach Y of var estimate min* max* {
.     replace `Y'=1-`Y'
. }
. creplace min* max*
. replace parm="PAF"
. describe
. list

The following example demonstrates the estimation of the PUF and PAF in a Cox regression model on the drugtr data, used as an example for stcox. We estimate a PUF and a PAF comparing the real-world Scenario 0 with a fantasy Scenario 1, in which all subjects receive the drug, but the subjects' ages are the same as in Scenario 0.

Setup

. webuse drugtr, clear
. stset
. tab drug, m

Example

. stcox drug age, vce(robust)
. punafcc, eform at(drug=1) vce(unconditional)

Saved results

punafcc saves the following in r():

Scalars

    r(rank)           rank of r(V)
    r(N)              number of observations
    r(N_sub)          subpopulation observations
    r(N_clust)        number of clusters
    r(N_psu)          number of sampled PSUs, survey data only
    r(N_strata)       number of strata, survey data only
    r(df_r)           variance degrees of freedom, survey data only
    r(N_poststrata)   number of post strata, survey data only
    r(k_margins)      number of terms in marginlist
    r(k_by)           number of subpopulations
    r(k_at)           number of at() options (always 0)
    r(level)          confidence level of confidence intervals

Macros

    r(atzero)         at() option for Scenario 0
    r(atspec)         atspec() option
    r(atzero_exp)     expression() option for Scenario 0/Scenario 0 rate ratio
    r(atspec_exp)     expression() option for Scenario 0/Scenario 1 rate ratio

Matrices

    r(cimat)          vector containing estimates and confidence limits for the PAF
    r(b)              vector of log Scenario 0/Scenario 1 rate ratio
    r(V)              estimated variance-covariance matrix of log Scenario 0/Scenario 1 rate ratio

If post is specified, punafcc also saves the following in e():

Scalars

    e(rank)           rank of e(V)
    e(N)              number of observations
    e(N_sub)          subpopulation observations
    e(N_clust)        number of clusters
    e(N_psu)          number of sampled PSUs, survey data only
    e(N_strata)       number of strata, survey data only
    e(df_r)           variance degrees of freedom, survey data only
    e(N_poststrata)   number of post strata, survey data only
    e(k_margins)      number of terms in marginlist
    e(k_by)           number of subpopulations
    e(k_at)           number of at() options (always 0)

Macros

    e(cmd)            punafcc
    e(predict)        program used to implement predict
    e(atzero)         at() option for Scenario 0
    e(atspec)         atspec() option
    e(atzero_exp)     expression() option for Scenario 0/Scenario 0 rate ratio
    e(atspec_exp)     expression() option for Scenario 0/Scenario 1 rate ratio
    e(properties)     b V

Matrices

    e(b)              vector of log Scenario 0/Scenario 1 rate ratio
    e(V)              estimated variance-covariance matrix of log Scenario 0/Scenario 1 rate ratio
    e(V_srs)          simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy
    e(V_srswr)        simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc()
    e(V_msp)          misspecification (co)variance hat V_msp, if svy and available

Functions

    e(sample)         marks estimation sample
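The saved results can be inspected with the standard Stata commands return list, ereturn list and matrix list. The following is a minimal sketch, assuming punafcc is installed and reusing the first example dataset from this help file.

```stata
// Sketch only: assumes punafcc is installed (for example, from SSC)
webuse ccxmpl, clear
logit case exposed [fweight=pop], or robust

// Without post: results are left in r(), and e() still holds the logit fit
punafcc, at(exposed=0) eform vce(unconditional)
return list
matrix list r(cimat)    // estimate and confidence limits for the PAF

// With post: e() now holds the results for the log PUF
punafcc, at(exposed=0) eform vce(unconditional) post
ereturn list
matrix list e(b)
matrix list e(V)
```

Because post overwrites the logit results in e(), the model must be refitted before punafcc is run again with a different scenario.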

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk

References

Brady, A. 1998. sbe21: Adjusted population attributable fractions from logistic regression. Stata Technical Bulletin STB-42: 8-12. Download from the Stata Technical Bulletin website.

Greenland, S., and K. Drescher. 1993. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 49: 865-872.

Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911-924.

Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613-621.

Also see

Manual:  [R] margins, [R] nlcom, [R] logistic, [R] logit, [ST] stcox, [R] glm

Help:    [R] margins, [R] nlcom, [R] logistic, [R] logit, [ST] stcox, [R] glm
         punaf, regpar, margprev, marglmean, parmest, creplace, aflogit if installed