Population attributable and unattributable fractions for case-control and survi > val studies
punafcc [if] [in] [weight] , [ atspec(atspec) subpop(subspec) vce(vcespec) noesample force iterate(#) eform level(#) post ]
where atspec is an at-specifications recognized by the at() option of margins, subspec is a subpopulation specificarion of the form recognized by the subpop() option of margins, and vcespec is a variance-covariance specification of the form recognized by margins, and must have one of the values
delta | unconditional
fweights, aweights, iweights, pweights are allowed; see weight.
Description
punafcc calculates confidence intervals for population attributable and unattributable fractions in case-control or survival studies. punafcc can be used after an estimation command whose parameters are interpreted as log rate ratios, such as logit or logistic for case-control data, or stcox for survival data. It estimates the log of the mean rate ratio, in cases or deaths, between 2 scenarios, a baseline scenario ("Scenario 0") and a fantasy scenario ("Scenario 1"), in which one or more exposure variables are assumed to be set to particular values (typically zero), and any other predictor variables in the model are assumed to remain the same. This ratio is known as the population unattributable fraction (PUF), and is subtracted from 1 to derive the population attributable fraction (PAF), defined as the proportion of the cases or deaths attributable to living in Scenario 0 instead of Scenario 1.
Options for punafcc
atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to values different from their value in the baseline scenario (denoted "Scenario 0" and equal to the real-life scenario). The at-specification may set variables only to values (not to statistics). punafcc uses the margins command to estimate the mean rate ratio, in cases or deaths, between Scenarios 0 and 1, and then uses nlcom to estimate the log of this ratio, known as the population unattributable fraction (PUF). The PUF, and its confidence limits, are subtracted from 1 to calculate a confidence interval for the population attributable fraction (PAF). If atspec() is not specified, then its default value is atzero((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.
subpop(subspec) and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation and the variance-covariance matrix formula, respectively, used to estimate the mean Scenario 0/Scenario 1 rate ratio, and therefore to estimate the population unattributable and attributable fractions.
noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.
force has the same function as the option of the same name for margins.
iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivatives of the log of the mean Scenario 0/Scenario 1 rate ratio in cases or deaths, with respect to the rate ratio itself, calculated by margins.
eform specifies that punafcc will display an estimate, P-value and confidence limits for the population unattributable fraction, instead of for its log. If eform is not specified, then a confidence interval for the log is displayed. In either case, punafcc also displays a confidence interval for the population attributable fraction (PAF).
level(#) specifies the percentage confidence level to be used in calculating the confidence intervals. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.
post specifies that punafcc will post in e() the estimation results for estimating the log of the mean Scenario 0/Scenario 1 rate ratio in cases or deaths, the PUF. If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the log of the mean rate ratio in cases or deaths (the PUF), whether or not eform is specified.
Remarks
punafcc essentially implements the method for estimating population attributable fractions (PAFs) recommended by Greenland and Drescher (1993) for case-control studies. This source recommended the use of the Normalizing and variance-stabilizing transformation
log(PUF) = log(1-PAF)
to define confidence intervals for the PAF. punafcc starts by estimating the log of the mean rate ratio in cases or deaths (the PUF), using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.
punafcc assumes that the most recent estimation command estimates the parameters of a single-equation regression model, whose parameters are interpreted as log rate ratios. It is the user's responsibility to ensure that this is the case. However, it will be true if the model is a logistic regression model on case-control data, fitted using logit, logistic or glm, or a Cox proportional hazard model on survival data, fitted using stcox. punafcc estimates the PUF as the Scenario 0/Scenario 1 mean rate ratio, restricted to observations representing deaths, if the previous command was stcox, and restricted to observations with a non-missing non-zero value of the dependent variable, after any other estimation command.
punafcc was written to replace some of the functions of the aflogit package (Brady, 1998). aflogit was written in Version 6 of Stata, and therefore will not work if the user uses long variable names and factor variables, which were introduced in later Stata versions.
Note that punafcc (unlike aflogit) does not implement the formulas for estimating PAFs in cross-sectional and cohort studies, which can be done using the punaf package. The logs of PUFs in case-control and survival studies represent a different kind of parameter from the logs of PUFs in cohort and cross-sectional studies, but both can be estimated using margins and nlcom. Note, also, that both kinds of PAF are a different parameter from the population attributable risk (PAR), which can be estimated using the regpar package. Users who need to estimate scenario prevalences (without differences) should use margprev. Users who need to estimate log-transformed scenario means (without ratios) should use marglmean. The packages punaf, regpar, margprev and marglmean are downloadable from SSC.
The general principles behind scenario comparisons in generalized linear models were introduced by Lane and Nelder (1982).
Examples
The following examples use the dataset ccxmp.dta. This dataset has 1 observation for each combination of case status and exposure, and data on the number of subjects with that case and exposure status.
Setup
.webuse ccxmpl, clear .describe .list
The following example estimates population unattributable and attributable fractions for exposure as a predictor of case status, following a logistic regression model. This is done by comparing "Scenario 1" (a fantasy world in which no subjects are exposed) with "Scenario 0" (the real world in which the data were collected). This is done both for all subjects (to get the total-population attributable fraction) and for exposed subjects (to get the exposed-population attributable fraction). Note that the point estimators for both these PAFs are the same as those produced by cc on the same data. The option vce(unconditional), requiring robust variances in the model, is probably a good idea with case-control or survival studies, because we might expect covariate values in cases or deaths to be subject to sampling error. (However, vce(unconditional) should not be used when calculating out-of-sample PAFs for a second set of case-control or survival data from a model fitted to a first set of case-control or survival data, using the noesample option.)
. cc case exposed [fweight=pop] . logit case exposed [fweight=pop], or robust . punafcc, at(exposed=0) eform vce(unconditional) . punafcc, at(exposed=0) eform vce(unconditional) subpop(exposed)
The following examples use the dataset downs.dta. This dataset has 1 observation for each combination of case status, exposure and age group, and data on the number of subjects with that case and exposure status and age group.
Setup
.webuse downs, clear .describe .label list age .list, sepby(age)
The following examples estimate age-adjusted exposure effects using logistic regression, and then estimate the PAF. This is done by comparing "Scenario 1" (a fantasy world in which no subjects are exposed and the age distribution stays the same) with "Scenario 0" (the real world in which the data were collected).
.logit case i.age i.exposed [fweight=pop], or robust .punafcc, at(exposed=0) eform vce(unconditional)
.logit case i.age exposed [fweight=pop], or robust .punafcc, at(exposed=0) eform vce(unconditional)
The following example demonstrates the use of punafcc in the same dataset with an interactive logistic model, in which exposure effects may vary with age.
.logit case i.age i.exposed i.age#i.exposed [fweight=pop], or robust .punafcc, at(exposed=0) eform vce(unconditional)
The following example demonstrates the use of punafcc with the parmest and creplace packages, downloadable from SSC. The population unattributable and attributable fractions for case status are estimated using punafcc (with the post option), and saved, using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for 1 parameter, the log population unattributable fraction, named "PUF", and data on the estimate, confidence limits, P-value, and other parameter attributes. We then use replace and creplace to replace the confidence interval for the PUF with a confidence interval for the PAF, and describe and list the new dataset.
. logit case i.age i.exposed [fweight=pop], or robust . punafcc, at(exposed=0) eform vce(unconditional) post . parmest, eform norestore . foreach Y of var estimate min* max* { . replace `Y'=1-`Y' . } . creplace min* max* . replace parm="PAF" . describe . list
The following example demonstrates the estimation of the PUF and PAF in a Cox regression model on the drugtr data, used as an example for stcox. We estimate a PUF and a PAF comparing the real-world Scenario 0 with a fantasy Scenario 1, in which all subjects receive the drug, but the subjects' ages are the same as in Scenario 0.
Setup
. webuse drugtr, clear . stset . tab drug, m
Example
. stcox drug age, vce(robust) . punafcc, eform at(drug=1) vce(unconditional)
Saved results
punafcc saves the following in r():
Scalars r(rank) rank of r(V) r(N) number of observations r(N_sub) subpopulation observations r(N_clust) number of clusters r(N_psu) number of samples PSUs, survey data only r(N_strata) number of strata, survey data only r(df_r) variance degrees of freedom, survey data only r(N_poststrata) number of post strata, survey data only r(k_margins) number of terms in marginlist r(k_by) number of subpopulations r(k_at) number of at() options (always 0) r(level) confidence level of confidence intervals
Macros r(atzero) at() option for Scenario 0 r(atspec) atspec() option r(atzero_exp) expression() option for Scenario 0/Scenario 0 rate ratio r(atspec_exp) expression() option for Scenario 0/Scenario 1 rate ratio
Matrices r(cimat) vector containing estimates and confidence limits for the PAF r(b) vector of log Scenario 0/Scenario 1 rate ratio r(V) estimated variance-covariance matrix of log Scenario 0/Scenario 1 rate ratio
If post is specified, punafcc also saves the following in e():
Scalars e(rank) rank of e(V) e(N) number of observations e(N_sub) subpopulation observations e(N_clust) number of clusters e(N_psu) number of samples PSUs, survey data only e(N_strata) number of strata, survey data only e(df_r) variance degrees of freedom, survey data only e(N_poststrata) number of post strata, survey data only e(k_margins) number of terms in marginlist e(k_by) number of subpopulations e(k_at) number of at() options (always 0)
Macros e(cmd) punafcc e(predict) program used to implement predict e(atzero) at() option for Scenario 0 e(atspec) atspec() option e(atzero_exp) expression() option for Scenario 0/Scenario 0 rate ratio e(atspec_exp) expression() option for Scenario 0/Scenario 1 rate ratio e(properties) b V
Matrices e(b) vector of log Scenario 0/Scenario 1 rate ratio e(V) estimated variance-covariance matrix of log Scenario 0/Scenario 1 rate ratio e(V_srs) simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy e(V_srswr) simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc() e(V_msp) misspecification (co)variance hat V_msp, if svy and available
Functions e(sample) marks estimation sample
Author
Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk
References
Brady A. 1998. sbe21: Adjusted population attributable fractions from logistic regression. Stata Technical Bulletin STB-42: 8-12. Download from the Stata Technical Bulletin website.
Greenland S. and K. Drescher. 1993. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 49: 865-872.
Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911–924.
Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613–621.
Also see
Manual: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] stcox, [R] glm
Help: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] stcox, [R] glm punaf, regpar, margprev, marglmean, parmest, creplace, aflogit if installed