-------------------------------------------------------------------------------
help for margprev                                                (Roger Newson)
-------------------------------------------------------------------------------

Marginal prevalences from binary regression models

margprev [if] [in] [weight] , [ atspec(atspec) subpop(subspec) predict(pred_opt) vce(vcespec) noesample force iterate(#) eform level(#) post ]

where atspec is an at-specification recognized by the at() option of margins; subspec is a subpopulation specification of the form recognized by the subpop() option of margins; and vcespec is a variance-covariance specification of the form recognized by the vce() option of margins, which must take one of the values

delta | unconditional

fweights, aweights, iweights, and pweights are allowed; see weight.

Description

margprev calculates confidence intervals for marginal prevalences, also known as scenario proportions. margprev can be used after an estimation command whose predicted values are interpreted as conditional proportions, such as logit, logistic, probit, or glm. It estimates a marginal prevalence for a scenario ("Scenario 1"), in which one or more predictor variables may be assumed to be set to particular values, and any other predictor variables in the model are assumed to remain the same.

Options for margprev

atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to specified values. margprev uses the margins command to estimate the proportion of positive outcome values under Scenario 1 (the scenario proportion, also known as the marginal prevalence), and then uses nlcom to estimate the logit of this proportion. If atspec() is not specified, then its default value is atspec((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.

subpop(subspec), predict(pred_opt) and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation, the predict option(s), and the variance-covariance matrix formula, respectively, used to estimate the logit of the marginal prevalence.

noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.

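force has the same function as the option of the same name for margins. It instructs margins to proceed in some situations where it would otherwise issue an error message because of apparent violations of assumptions.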

iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivative of the logit of the marginal prevalence, with respect to the original marginal prevalence calculated by margins.

eform specifies that margprev will display an estimate, P-value and confidence limits for the marginal odds, instead of for the log marginal odds (the logit of the marginal prevalence). If eform is not specified, then a confidence interval for the log marginal odds is displayed. In either case, margprev also displays an asymmetric confidence interval for the untransformed marginal prevalence.

level(#) specifies the percentage confidence level to be used in calculating the confidence interval. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.

post specifies that margprev will post in e() the estimation results for estimating the logit of the marginal prevalence. If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the logit of the marginal prevalence, and not for the marginal prevalence itself. This is done because the estimation results are intended to define a symmetric confidence interval for the logit marginal prevalence, which can be back-transformed to define an asymmetric confidence interval for the untransformed marginal prevalence.
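
As a rough sketch of how these options might be combined, the following commands use lbw.dta (the dataset used in the Examples). The model and the choice of predictors, including the hypertension indicator ht, are illustrative assumptions only.

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . logit low i.race i.smoke i.ht, robust
    . * Scenario 1 sets two predictors at once
    . margprev, atspec(smoke=0 ht=0)
    . * Scenario 1 evaluated in the subpopulation of smokers only
    . margprev, atspec(smoke=0) subpop(if smoke==1)
    . * Unconditional variance, marginal odds displayed, 90% confidence level
    . margprev, atspec(smoke=0) vce(unconditional) eform level(90)

Note that vce(unconditional) is passed to margins, which is expected to require a robust or survey variance from the preceding estimation command, as supplied here by the robust option.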

Remarks

margprev estimates the marginal prevalence, a scenario proportion, which is in turn a special case of a scenario mean. The general principles behind scenario means for generalized linear models were introduced in Lane and Nelder (1982).

margprev starts by estimating the logit of the scenario proportion, using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.
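
The back-transformation from the logit scale can be sketched by hand from the posted results. This is a minimal illustration, not margprev's own code. It assumes that the posted parameter is named Scenario_1 and that Normal-based confidence limits are appropriate (with survey data and e(df_r) present, a t-based multiplier may apply instead). The scalar name zmult is arbitrary.

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . logit low i.race i.smoke, robust
    . margprev, post
    . * Normal multiplier for the current confidence level
    . scalar zmult = invnormal(1 - (100 - c(level))/200)
    . * Estimate, lower limit, and upper limit, back-transformed to proportions
    . display invlogit(_b[Scenario_1])
    . display invlogit(_b[Scenario_1] - zmult*_se[Scenario_1])
    . display invlogit(_b[Scenario_1] + zmult*_se[Scenario_1])

If these assumptions hold, the three displayed values should agree with the asymmetric confidence interval that margprev displays for the untransformed marginal prevalence.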

margprev assumes that the most recent estimation command estimates the parameters of a regression model, whose fitted values are conditional proportions, which must be bounded between 0 and 1. It is the user's responsibility to ensure that this is the case. However, it will be true if the conditional proportions are defined using a generalized linear model with a Bernoulli variance function (not a non-Bernoulli binomial variance function), and a logit, probit or complementary log-log link function.
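
For instance, models of the following kind should satisfy this requirement when the outcome is binary. This is a sketch using lbw.dta, and the choice of predictors is illustrative only.

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . * Bernoulli generalized linear model with a complementary log-log link
    . glm low i.race i.smoke, family(binomial) link(cloglog) vce(robust)
    . margprev, atspec(smoke=0)
    . * Probit model for the same binary outcome
    . probit low i.race i.smoke, vce(robust)
    . margprev, atspec(smoke=0)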

Note that margprev estimates a single marginal prevalence, and does not compare 2 marginal prevalences using differences or ratios. Users who need to estimate differences between scenario proportions (population attributable risks) should use regpar. Users who need to estimate ratios between scenario proportions (population unattributable fractions) should use either punaf (for cohort or cross-sectional study data) or punafcc (for case-control or survival study data). Users who need to estimate general marginal means for general non-negative outcomes, instead of marginal prevalences for outcomes bounded between 0 and 1, should probably use marglmean. The packages marglmean, regpar, punaf and punafcc are downloadable from SSC.

Examples

The following examples use the dataset lbw.dta, provided by Hosmer, Lemeshow, and Klar (1988), used in [R] logistic, and distributed by Stata Press. This dataset has 1 observation for each of a sample of pregnancies, and data on the birth weight of the baby and on a list of predictive variables, which might be assumed to be causal by some scientists.

Setup

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . describe

The following example estimates marginal prevalences of low birth weight under the existing scenario and under a fantasy scenario where no mothers smoke.

    . logit low i.race i.smoke, or robust
    . margprev
    . margprev, at(smoke=0)

The following example demonstrates the use of margprev with the parmest package, downloadable from SSC. The marginal prevalence of low birth weight is estimated using margprev (with the post option), and saved (in its logit-transformed version), using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for the 1 transformed parameter, named "Scenario_1", and data on the estimate, confidence limits, P-value, and other parameter attributes. We then use replace to replace the symmetric confidence interval for the transformed parameter with an asymmetric confidence interval for the untransformed parameter, and describe and list the new dataset.

    . logit low i.race i.smoke, or robust
    . margprev, eform post
    . parmest, norestore
    . foreach Y of var estimate min* max* {
    .     replace `Y'=invlogit(`Y')
    . }
    . describe
    . list

Saved results

margprev saves the following in r():

Scalars
    r(rank)           rank of r(V)
    r(N)              number of observations
    r(N_sub)          subpopulation observations
    r(N_clust)        number of clusters
    r(N_psu)          number of sampled PSUs, survey data only
    r(N_strata)       number of strata, survey data only
    r(df_r)           variance degrees of freedom, survey data only
    r(N_poststrata)   number of post strata, survey data only
    r(k_margins)      number of terms in marginlist
    r(k_by)           number of subpopulations
    r(k_at)           number of at() options (always 1)
    r(level)          confidence level of confidence intervals

Macros
    r(atspec)         atspec() option

Matrices
    r(cimat)          row vector containing estimates and confidence limits for the untransformed marginal prevalence
    r(b)              vector of the logit of the marginal prevalence
    r(V)              estimated variance-covariance matrix of the logit of the marginal prevalence
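
The results in r() can be inspected in the usual way immediately after a call to margprev. The following is a minimal sketch, assuming lbw.dta and the logit model used in the Examples. The matrix name ci is arbitrary.

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . logit low i.race i.smoke, robust
    . margprev, atspec(smoke=0)
    . * Copy the untransformed estimate and confidence limits before r() is overwritten
    . matrix ci = r(cimat)
    . matrix list ci
    . * List everything currently stored in r()
    . return list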

If post is specified, margprev also saves the following in e():

Scalars
    e(rank)           rank of e(V)
    e(N)              number of observations
    e(N_sub)          subpopulation observations
    e(N_clust)        number of clusters
    e(N_psu)          number of sampled PSUs, survey data only
    e(N_strata)       number of strata, survey data only
    e(df_r)           variance degrees of freedom, survey data only
    e(N_poststrata)   number of post strata, survey data only
    e(k_margins)      number of terms in marginlist
    e(k_by)           number of subpopulations
    e(k_at)           number of at() options (always 1)

Macros
    e(cmd)            margprev
    e(predict)        program used to implement predict
    e(atspec)         atspec() option
    e(properties)     b V

Matrices
    e(b)              vector of the logit of the marginal prevalence
    e(V)              estimated variance-covariance matrix of the logit of the marginal prevalence
    e(V_srs)          simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy
    e(V_srswr)        simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc()
    e(V_msp)          misspecification (co)variance hat V_msp, if svy and available

Functions
    e(sample)         marks estimation sample
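
The posted results can be listed in the same way. This sketch again assumes lbw.dta and the logit model used in the Examples.

    . use http://www.stata-press.com/data/r11/lbw.dta, clear
    . logit low i.race i.smoke, robust
    . margprev, atspec(smoke=0) post
    . * List everything posted in e(), then the logit-scale estimate and its variance
    . ereturn list
    . matrix list e(b)
    . matrix list e(V)

Because post replaces the logit model's results in e(), the model would presumably need to be refitted, or saved beforehand with estimates store and recovered with estimates restore, before margprev is called again for another scenario.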

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk

References

Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911–924.

Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613–621.

Also see

Manual: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] probit, [R] glm

Help: [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] probit, [R] glm
      marglmean, regpar, punaf, punafcc, parmest (if installed)