-------------------------------------------------------------------------------
help for margprev                                                (Roger Newson)
-------------------------------------------------------------------------------

Marginal prevalences from binary regression models

        margprev [if] [in] [weight] , [ atspec(atspec) subpop(subspec)
                 predict(pred_opt) vce(vcespec) noesample force iterate(#)
                 eform level(#) post ]

where atspec is an at-specification recognized by the at() option of margins, subspec is a subpopulation specification of the form recognized by the subpop() option of margins, and vcespec is a variance-covariance specification of the form recognized by margins, and must have one of the values

        delta | unconditional

fweights, aweights, iweights, and pweights are allowed; see weight.

Description

margprev calculates confidence intervals for marginal prevalences, also known as scenario proportions. margprev can be used after an estimation command whose predicted values are interpreted as conditional proportions, such as logit, logistic, probit, or glm. It estimates a marginal prevalence for a scenario ("Scenario 1"), in which one or more predictor variables may be assumed to be set to particular values, and any other predictor variables in the model are assumed to remain the same.

Options for margprev

atspec(atspec) is an at-specification, allowed as a value of the at() option of margins. This at-specification must specify a single scenario ("Scenario 1"), defined as a fantasy world in which a subset of the predictor variables in the model are set to specified values. margprev uses the margins command to estimate the proportion of outcome values positive under Scenario 1 (the marginal prevalence), and then uses nlcom to estimate the logit of this scenario proportion. If atspec() is not specified, then its default value is atspec((asobserved) _all), implying that Scenario 1 is the real-life baseline scenario, represented by the predictor values actually present.
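
As an illustrative sketch (not part of the original examples), the commands below contrast the default baseline scenario with a scenario that fixes two predictors at once. They assume the lbw.dta model used in the Examples section, and that race is coded 1, 2, 3 in that dataset.

        . use http://www.stata-press.com/data/r11/lbw.dta, clear
        . logit low i.race i.smoke, or robust
        . * Scenario 1 = predictor values as observed (the default atspec)
        . margprev
        . * Scenario 1 = no mother smokes and all mothers have race==1 (assumed coding)
        . margprev, atspec(smoke=0 race=1)

Predictors not named in atspec(), if any, keep their observed values in both calls.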

subpop(subspec), predict(pred_opt) and vce(vcespec) have the same form and function as the options of the same names for margins. They specify the subpopulation, the predict option(s), and the variance-covariance matrix formula, respectively, used to estimate the logit of the marginal prevalence.
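
A hedged sketch of how these options might be combined with atspec() follows. It assumes the lbw.dta model from the Examples section, and that the specifications are passed through to margins unchanged; the subpopulation (race==2) is chosen purely for illustration. The model is fitted with the robust option because margins generally expects a robust or cluster variance estimate before it will accept vce(unconditional).

        . logit low i.race i.smoke, or robust
        . * marginal prevalence among mothers with race==2, if none of them smoked
        . margprev, atspec(smoke=0) subpop(if race==2)
        . * treat the predictors that are not fixed as sampled rather than given
        . margprev, atspec(smoke=0) vce(unconditional)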

noesample has the same function as the option of the same name for margins. It specifies that computations will not be restricted to the estimation sample used by the previous estimation command.

force has the same function as the option of the same name for margins.

iterate(#) has the same form and function as the option of the same name for nlcom. It specifies the number of iterations used by nlcom to find the optimal step size to calculate the numerical derivative of the logit of the marginal prevalence, with respect to the original marginal prevalence calculated by margins.

eform specifies that margprev will display an estimate, P-value and confidence limits for the marginal odds, instead of for the log marginal odds (the logit of the marginal prevalence). If eform is not specified, then a confidence interval for the log marginal odds is displayed. In either case, margprev also displays an asymmetric confidence interval for the untransformed marginal prevalence.

level(#) specifies the percentage confidence level to be used in calculating the confidence interval. If it is not specified, then it is taken from the current value of the c-class value c(level), which is usually 95.

post specifies that margprev will post in e() the estimation results for estimating the logit of the marginal prevalence. If post is not specified, then any existing estimation results are left in e(). Note that the estimation results posted are for the logit of the marginal prevalence, and not for the marginal prevalence itself. This is done because the estimation results are intended to define a symmetric confidence interval for the logit marginal prevalence, which can be back-transformed to define an asymmetric confidence interval for the untransformed marginal prevalence.
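
The back-transformation can be reproduced by hand from the posted results. The sketch below is illustrative only: it assumes the lbw.dta model from the Examples section, a 95% confidence level, and a standard Normal critical value (a t critical value may be more appropriate when e(df_r) is present, as with survey data). The scalar and matrix names are arbitrary.

        . logit low i.race i.smoke, or robust
        . margprev, post
        . matrix b = e(b)
        . matrix V = e(V)
        . * estimate and standard error on the logit scale
        . scalar est = b[1,1]
        . scalar se = sqrt(V[1,1])
        . * assumed critical value for a 95% interval
        . scalar z = invnormal(0.975)
        . * marginal prevalence with asymmetric confidence limits
        . display invlogit(est) "  " invlogit(est - z*se) "  " invlogit(est + z*se)
        . * marginal odds, the quantity reported by eform
        . display exp(est)

Because invlogit() is monotonic, the limits of the symmetric interval on the logit scale map directly to the limits of the interval for the prevalence.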

Remarks

margprev estimates the marginal prevalence, which is a scenario proportion, which is a special case of a scenario mean. The general principles behind scenario means for generalized linear models were introduced in Lane and Nelder (1982).

margprev starts by estimating the logit of the scenario proportion, using margins and nlcom. The results of this estimation are stored in e(), if the option post is specified. These estimation results may be saved in an output dataset (or resultsset) by the parmest package, which can be downloaded from SSC.

margprev assumes that the most recent estimation command estimates the parameters of a regression model, whose fitted values are conditional proportions, which must be bounded between 0 and 1. It is the user's responsibility to ensure that this is the case. However, it will be true if the conditional proportions are defined using a generalized linear model with a Bernoulli variance function (not a non-Bernoulli binomial variance function), and a logit, probit or complementary log-log link function.

Note that margprev estimates a single marginal prevalence, and does not compare 2 marginal prevalences using differences or ratios. Users who need to estimate differences between scenario proportions (population attributable risks) should use regpar. Users who need to estimate ratios between scenario proportions (population unattributable fractions) should use either punaf (for cohort or cross-sectional study data) or punafcc (for case-control or survival study data). Users who need to estimate general marginal means for general non-negative outcomes, instead of marginal prevalences for outcomes bounded between 0 and 1, should probably use marglmean. The packages marglmean, regpar, punaf and punafcc are downloadable from SSC.

Examples

The following examples use the dataset lbw.dta, provided by Hosmer and Lemeshow (1988) and used in [R] logistic and distributed by Stata Press. This dataset has 1 observation for each of a sample of pregnancies, and data on the birth weight of the baby and on a list of predictive variables, which might be assumed to be causal by some scientists.

Setup

        . use http://www.stata-press.com/data/r11/lbw.dta, clear
        . describe

The following example estimates marginal prevalences of low birth weight under the existing scenario and under a fantasy scenario where no mothers smoke.

        . logit low i.race i.smoke, or robust
        . margprev
        . margprev, at(smoke=0)

The following example demonstrates the use of margprev with the parmest package, downloadable from SSC. The marginal prevalence of low birth weight is estimated using margprev (with the post option), and saved (in its logit-transformed version), using parmest, in a dataset in memory, overwriting the original dataset, with 1 observation for the 1 transformed parameter, named "Scenario_1", and data on the estimate, confidence limits, P-value, and other parameter attributes. We then use replace to replace the symmetric confidence interval for the transformed parameter with an asymmetric confidence interval for the untransformed parameter, and describe and list the new dataset.

        . logit low i.race i.smoke, or robust
        . margprev, eform post
        . parmest, norestore
        . foreach Y of var estimate min* max* {
        .     replace `Y'=invlogit(`Y')
        . }
        . describe
        . list

Saved results

margprev saves the following in r():

Scalars

        r(rank)           rank of r(V)
        r(N)              number of observations
        r(N_sub)          subpopulation observations
        r(N_clust)        number of clusters
        r(N_psu)          number of sampled PSUs, survey data only
        r(N_strata)       number of strata, survey data only
        r(df_r)           variance degrees of freedom, survey data only
        r(N_poststrata)   number of post strata, survey data only
        r(k_margins)      number of terms in marginlist
        r(k_by)           number of subpopulations
        r(k_at)           number of at() options (always 1)
        r(level)          confidence level of confidence intervals

Macros

        r(atspec)         atspec() option

Matrices

        r(cimat)          row vector containing estimates and confidence limits for the untransformed marginal prevalence
        r(b)              vector of the logit of the marginal prevalence
        r(V)              estimated variance-covariance matrix of the logit of the marginal prevalence

If post is specified, margprev also saves the following in e():

Scalars

        e(rank)           rank of e(V)
        e(N)              number of observations
        e(N_sub)          subpopulation observations
        e(N_clust)        number of clusters
        e(N_psu)          number of sampled PSUs, survey data only
        e(N_strata)       number of strata, survey data only
        e(df_r)           variance degrees of freedom, survey data only
        e(N_poststrata)   number of post strata, survey data only
        e(k_margins)      number of terms in marginlist
        e(k_by)           number of subpopulations
        e(k_at)           number of at() options (always 1)

Macros

        e(cmd)            margprev
        e(predict)        program used to implement predict
        e(atspec)         atspec() option
        e(properties)     b V

Matrices

        e(b)              vector of the logit of the marginal prevalence
        e(V)              estimated variance-covariance matrix of the logit of the marginal prevalence
        e(V_srs)          simple-random-sampling-without-replacement (co)variance hat V_srswor, if svy
        e(V_srswr)        simple-random-sampling-with-replacement (co)variance hat V_srswr, if svy and fpc()
        e(V_msp)          misspecification (co)variance hat V_msp, if svy and available

Functions

        e(sample)         marks estimation sample
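
The saved results can be inspected immediately after margprev has run. The following is a minimal sketch, assuming the lbw.dta model from the Examples section; the matrix name ci is arbitrary, and the column order of r(cimat) is not documented here, so the matrix is listed in full rather than indexed.

        . logit low i.race i.smoke, or robust
        . margprev, atspec(smoke=0)
        . * show all scalars, macros and matrices left in r()
        . return list
        . * copy the prevalence estimate and confidence limits before another command overwrites r()
        . matrix ci = r(cimat)
        . matrix list ci

Copying r(cimat) into a named matrix straight away is a precaution, because later r-class commands replace the contents of r().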

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk

References

Hosmer Jr., D. W., S. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model when the estimated probabilities are small. Biometrical Journal 30: 911–924.

Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics 38: 613–621.

Also see

Manual:  [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] probit, [R] glm

Help:    [R] margins, [R] nlcom, [R] logistic, [R] logit, [R] probit, [R] glm
         marglmean, regpar, punaf, punafcc, parmest if installed