------------------------------------------------------------------------------- help for oglm -------------------------------------------------------------------------------

Ordinal Generalized Linear Models

oglm depvar [indepvars] [weight] [if exp] [in range] [, link(logit/probit/cloglog/loglog/cauchit/log) force lrforce store(name) constraints(clist) robust cluster(varname) level(#) or irr rrr eform hr log hetero(varlist) scale(varlist) eq2(varlist) hc ls flip maximize_options ]

oglm shares the features of all estimation commands; see help est. oglm typed without arguments redisplays previous results. The following options may be given when redisplaying results:

store(name) or irr rrr hr eform level(#)

by, svy, nestreg, stepwise, xi and possibly other prefix commands are allowed; see help prefix.

fweights, iweights, and pweights are allowed; see help weights.

Syntax for predict

predict [type] newvars [if] [in] [, statistic outcome(outcome) ]

predict [type] {stub*|newvar_reg newvar_k1 ... newvar_kk-1} [if] [in] , scores

where k is the number of outcomes in the model.

statistic   Description
-------------------------------------------------------------------------
Main
  pr        predicted probabilities; the default
  xb        linear prediction
  sigma     the standard deviation
  stdp      standard error of the linear prediction
-------------------------------------------------------------------------

Note that with the pr option, you specify one or k new variables depending on whether the outcome() option is also specified (where k is the number of categories of depvar). With xb and stdp, one new variable is specified.

These statistics are available both in and out of sample; type "predict ... if e(sample) ..." if wanted only for the estimation sample.

Description

oglm estimates Ordinal Generalized Linear Models. When these models include equations for heteroskedasticity, they are also known as heterogeneous choice/location-scale/heteroskedastic ordinal regression models. oglm supports multiple link functions, including logit (the default), probit, complementary log-log, log-log and cauchit.

When an ordinal regression model incorrectly assumes that error variances are the same for all cases, the standard errors are wrong and (unlike OLS regression) the parameter estimates are biased. Heterogeneous choice/location-scale models explicitly specify the determinants of heteroskedasticity in an attempt to correct for it. Further, these models can be used when the variance/variability of underlying attitudes is itself of substantive interest. Alvarez and Brehm (1995), for example, argued that individuals whose core values are in conflict will have a harder time making a decision about abortion and will hence have greater variability/error variances in their responses.

Several special cases of ordinal generalized linear models can also be estimated by oglm, including the parallel lines models of ologit and oprobit (where error variances are assumed to be homoskedastic), the heteroskedastic probit model of hetprob (where the dependent variable must be a dichotomy and the only link allowed is probit), the binomial generalized linear models of logit, probit and cloglog (which also assume homoskedasticity), as well as similar models that are not otherwise estimated by Stata. This makes oglm particularly useful for testing whether constraints on a model (e.g. homoskedastic errors) are justified, or for determining whether one link function is more appropriate for the data than are others.
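The special cases described above can be checked directly by fitting the same model with oglm and with the corresponding official command. The following is a minimal sketch rather than a definitive test. It assumes the ordwarm2 data used in the Examples section is reachable, that hetprob is available under that name in your version of Stata, and that agree is a hypothetical dichotomized version of warm created only for illustration.

```stata
* Sketch: compare oglm with the official commands it generalizes
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear

* Parallel lines ordered logit: coefficients and log likelihoods should agree
ologit warm yr89 male white age ed prst
oglm   warm yr89 male white age ed prst

* Heteroskedastic probit requires a dichotomous dependent variable
recode warm (1 2 = 0) (3 4 = 1), gen(agree)
hetprob agree yr89 male white age ed prst, het(yr89)
oglm    agree yr89 male white age ed prst, het(yr89) link(probit)
```

Expect the slope and scale coefficients to match across each pair. oglm reports a cutpoint where hetprob reports a constant, so that one term may appear with the opposite sign, and the model chi-square statistics of the two heteroskedastic fits test different hypotheses.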

Other features of oglm include support for linear constraints, making it possible, for example, to impose and test the constraint that the effects of x1 and x2 are equal. oglm works with several prefix commands, including by, nestreg, xi, svy and sw. Its predict command includes the ability to compute estimated probabilities. The actual values taken on by the dependent variable are irrelevant except that larger values are assumed to correspond to "higher" outcomes. Up to 20 outcomes are allowed. oglm was inspired by the SPSS PLUM routine but differs somewhat in its terminology, labeling of links, and the variables that are allowed when modeling heteroskedasticity.

Options

link(logit/probit/cloglog/loglog/cauchit/log) specifies the link function to be used. The legal values are link(logit), link(probit), link(cloglog), link(loglog) and link(cauchit) which can be abbreviated as link(l), link(p), link(c), link(ll) and link(ca). link(logit) is the default if the option is omitted.

NOTE: link(log) is also available but is considered experimental (and possibly wrong) at this point. Stata's glm program successfully uses the log link with dichotomous dependent variables, but it is not clear how, or how well, it generalizes to the ordinal case.

The following advice is adapted from Norusis (2005, p. 84): Probit and logit models are reasonable choices when the changes in the cumulative probabilities are gradual. If there are abrupt changes, other link functions should be used. The log-log link may be a good model when the cumulative probabilities increase from 0 fairly slowly and then rapidly approach 1. If the opposite is true, namely that the cumulative probability for lower scores is high and the approach to 1 is slow, the complementary log-log link may describe the data.

WARNING: Programs differ in the names used for some links. Stata's loglog link corresponds to SPSS PLUM's cloglog link; and Stata's cloglog link is called nloglog in SPSS.
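One informal way to act on this advice is to fit the same model under several links and compare information criteria. The sketch below assumes the ordwarm2 data from the Examples section. The stored names (m_logit and so on) are arbitrary labels chosen for illustration, and AIC/BIC are only one possible yardstick, not a procedure prescribed by oglm.

```stata
* Sketch: fit one model under four links and compare fit statistics
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear

quietly oglm warm yr89 male white age ed prst, link(logit)   store(m_logit)
quietly oglm warm yr89 male white age ed prst, link(probit)  store(m_probit)
quietly oglm warm yr89 male white age ed prst, link(cloglog) store(m_cloglog)
quietly oglm warm yr89 male white age ed prst, link(loglog)  store(m_loglog)

* Log likelihood, AIC and BIC for each stored model
estimates stats m_logit m_probit m_cloglog m_loglog
```

Because the models have the same number of parameters, ranking them by AIC or BIC amounts to ranking them by log likelihood.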

hetero(varlist), scale(varlist) and eq2(varlist) are synonyms (use only one of them) and can be used to specify the variables believed to affect heteroskedasticity in heterogeneous choice/location-scale models. In such models the model chi-square statistic is a test of whether the choice/location parameters and the heteroskedasticity/scale parameters differ from zero; this differs from hetprob, where the model chi-square only tests the choice/location parameters. The more neutral-sounding eq2(varlist) alternative is provided because it may be less confusing when using the flip option.

WARNING: The default Wald tests conducted by the nestreg and sw prefix commands can give incorrect results when the same variable appears in both the location and scale equations. In such cases it is recommended that you use nestreg's and sw's likelihood ratio test options.
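Whether a scale equation is needed at all can be examined with a likelihood ratio test of the homoskedastic model against the heteroskedastic one. This is a minimal sketch, assuming the ordwarm2 data from the Examples section. The choice of yr89 and male as scale variables and the stored names homosk and heterosk are illustrative assumptions.

```stata
* Sketch: LR test of homoskedasticity (scale coefficients jointly zero)
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear

quietly oglm warm yr89 male white age ed prst, store(homosk)
quietly oglm warm yr89 male white age ed prst, hetero(yr89 male) store(heterosk)

* The homoskedastic model is nested in the heteroskedastic model
lrtest homosk heterosk
```

A small p-value suggests that the variables in hetero() are related to the error variance. The test is not valid if robust standard errors, pweights, or svy are used.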

flip causes the command-line placement of the location and scale variables to be reversed, i.e. what would normally be the location variables will instead be the scale variables, and vice versa. This is primarily useful if you want to use the sw or nestreg prefix commands to do stepwise selection or hierarchical entry of the heteroskedasticity/scale variables. (Just be sure to keep straight which set of variables is which!) Again, if you do this, remember to use the likelihood ratio test options of nestreg or sw, because the default Wald tests may be wrong otherwise.

hc and ls affect how the equations are labeled. If hc is used, then, consistent with the literature on heterogeneous choice, the equations are labeled "choice" and "variance". If ls is used, the equations are labeled "location" and "scale", which is consistent with SPSS PLUM and other published literature. If neither option is specified, then the scale/heteroskedasticity equation is labeled "lnsigma", which is consistent with other Stata programs such as hetprob.

force can be used to force oglm to issue only warning messages in some situations when it would normally give a fatal error. By default, the dependent variable can have a maximum of 20 categories. A variable with more categories than that is probably a mistaken entry by the user, e.g. a continuous variable has been specified rather than an ordinal one. But, if your dependent variable really is ordinal with more than 20 categories, force will let oglm analyze it (although other practical limitations, such as small sample sizes within categories, may keep it from coming up with a final solution). Obviously, you should only use force when you are confident that you are not making a mistake. trustme can be used as a synonym for force.

lrforce forces Stata to report a Likelihood Ratio Statistic under certain conditions when it ordinarily would not. Some types of constraints can make a Likelihood Ratio chi-square test invalid. Hence, to be safe, Stata reports a Wald statistic whenever constraints are used. But, for many common sorts of constraints (e.g. constraining the effects of two variables to be equal) an LR chi-square statistic is probably appropriate. Note that the lrforce option will be ignored when robust standard errors are specified either directly or indirectly, e.g. via use of the robust or svy options. Use this option with caution.

store(name) causes the command estimates store name to be executed when oglm finishes. This is useful for when you wish to estimate a series of models and want to save the results. See help estimates.

WARNING: The store option may not work correctly when the svy prefix is used.

log displays the iteration log. By default it is suppressed.

or reports the estimated coefficients transformed to odds ratios, i.e., exp(b) rather than b; see [R] ologit for a description of this concept. Options rrr, eform, irr and hr produce identical results (labeled differently) and can also be used. It is up to the user to decide whether the exp(b) transformation makes sense given the link function used, e.g. it probably doesn't make sense when using the probit link.
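Because or and level(#) are among the options accepted when redisplaying results, a model does not need to be refit to view it in exponentiated form. The following sketch assumes the ordwarm2 data from the Examples section and the default logit link, where the exp(b) transformation has its usual odds-ratio interpretation.

```stata
* Sketch: fit once, then redisplay the same results in different forms
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear

quietly oglm warm yr89 male white age ed prst

* Redisplay as exp(b) without re-estimating
oglm, or

* Redisplay with 90% confidence intervals
oglm, level(90)
```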

constraints(clist) specifies the linear constraints to be applied during estimation. The default is to perform unconstrained estimation. Constraints are defined with the constraint command. constraints(1) specifies that the model is to be constrained according to constraint 1; constraints(1-4) specifies constraints 1 through 4; constraints(1-4,8) specifies 1 through 4 and 8.

robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation. robust combined with cluster() allows for observations that are not independent within clusters (although they must be independent between clusters). If you specify pweights, robust is implied.

cluster(varname) specifies that the observations are independent across groups (clusters) but not necessarily within groups. varname specifies to which group each observation belongs; e.g., cluster(personid) in data with repeated observations on individuals. cluster() affects the estimated standard errors and variance-covariance matrix of the estimators (VCE), but not the estimated coefficients. cluster() can be used with pweights to produce estimates for unstratified cluster-sampled data.

level(#) specifies the confidence level in percent for the confidence intervals of the coefficients; see help level.

maximize_options control the maximization process; see help maximize. You should never have to specify most of these. However, the difficult option can sometimes be useful with models that are running very slowly or not converging at all.
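As an illustration of where such options go, the sketch below adds difficult, together with the log option so that the iterations can be watched. It assumes the ordwarm2 data from the Examples section. This particular model may well converge without difficult; it is used here only to show the syntax.

```stata
* Sketch: show the iteration log and request the "difficult" stepping method
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear

oglm warm yr89 male white age ed prst, hetero(yr89 male white) log difficult
```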

Options for predict

pr, the default, calculates the predicted probabilities. If you do not also specify the outcome() option, you must specify k new variables, where k is the number of categories of the dependent variable. Say that you fitted a model by typing oglm result x1 x2, and result takes on three values. Then you could type predict p1 p2 p3 to obtain all three predicted probabilities. If you specify the outcome() option, you must specify one new variable. Say that result takes on the values 1, 2, and 3. Typing predict p1, outcome(1) would produce the same p1.

xb calculates the linear prediction. You specify one new variable, for example, predict linear, xb. The linear prediction is defined, ignoring the contribution of the estimated cutpoints.

sigma calculates the standard deviation, also known as the scale. You specify one new variable, for example, predict sigma, s. If the model does not include an equation for heteroskedasticity then the predicted sigma value is missing for all cases.

stdp calculates the standard error of the linear prediction. You specify one new variable, for example, predict se, stdp.

outcome(outcome) specifies for which outcome the predicted probabilities are to be calculated. outcome() should contain either a single value of the dependent variable or one of #1, #2, ..., with #1 meaning the first category of the dependent variable, #2 the second category, etc.

scores calculates equation-level score variables.
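The predict options above can be combined after a single estimation. This is a minimal sketch, assuming the ordwarm2 data from the Examples section, in which warm has four categories. The scale variables and the new variable names (p1-p4, pfirst, xbhat, sdhat, sehat, psum) are arbitrary choices for illustration.

```stata
* Sketch: the main predict statistics after a heteroskedastic model
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
quietly oglm warm yr89 male white age ed prst, hetero(yr89 male)

* One probability per category of warm (k = 4 new variables)
predict p1 p2 p3 p4

* A single outcome: should reproduce p1
predict pfirst, outcome(#1)

* Linear prediction, predicted standard deviation, and s.e. of xb
predict xbhat, xb
predict sdhat, sigma
predict sehat, stdp

* Informal check: the four probabilities should sum to 1 for each case
egen psum = rowtotal(p1 p2 p3 p4)
summarize psum
```

Add if e(sample) to any of the predict commands to restrict the predictions to the estimation sample.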

Examples

Example 1. Basic models. By default, oglm will estimate the same models as ologit. The store option is convenient for saving results if you want to contrast different models.

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . oglm warm yr89 male white age ed prst
    . oglm warm yr89 male white age ed prst, store(m1)
    . oglm warm yr89 male white age ed prst, robust

Example 2. Survey data estimation.

    . use http://www.stata-press.com/data/r8/nhanes2f.dta, clear
    . svy: oglm health female black age age2
    . svy, subpop(female): oglm health black age age2

Example 3. The predict command.

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . quietly oglm warm yr89 male white age ed prst
    . predict p1 p2 p3 p4

Example 4. Constrained logistic regression. logit, ologit, probit and oprobit provide other and generally faster means for estimating non-heteroskedastic models with logit and probit links; but none of these commands currently supports the use of linear constraints, such as two variables having equal effects. oglm can be used for this purpose. For example,

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . recode warm (1 2 = 0)(3 4 = 1), gen(agree)
    . * Constrain the effects of male and white to be equal
    . constraint 1 male = white
    . oglm agree yr89 male white age ed prst, lrf store(constrained) c(1)
    . oglm agree yr89 male white age ed prst, store(unconstrained)
    . lrtest constrained unconstrained

Example 5. Other link functions. By default, oglm uses the logit link. If you prefer, however, you can specify probit, complementary log-log, log-log, cauchit or log links. In the following example, the same model is estimated using each of the links supported by oglm (note that link(log) is considered experimental and possibly wrong).

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . oglm warm yr89 male white age ed prst, link(l)
    . oglm warm yr89 male white age ed prst, link(p)
    . oglm warm yr89 male white age ed prst, link(c)
    . oglm warm yr89 male white age ed prst, link(ll)
    . oglm warm yr89 male white age ed prst, link(ca)
    . oglm warm yr89 male white age ed prst, link(log)

Example 6. Prefix commands. oglm supports many of Stata 9's prefix commands. For example,

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . sw, pe(.05): oglm warm yr89 male
    . xi: oglm warm yr89 i.male
    . nestreg: oglm warm (yr89 male white age) (ed prst)

Example 7. The hetero/scale/eq2 option. The hetero (abbreviated het below), scale and eq2 options are synonyms; use whichever one you prefer. ls and hc are optional and affect whether the equations are labeled consistently with the heterogeneous choice or location-scale literature. If also using the sw or nestreg prefix commands, you should use their likelihood ratio test options, since the default Wald tests can be wrong when the same variable appears in both the location and scale equations. Note that it is possible to estimate a heteroskedasticity-only model, and that the variables in the two equations do not need to be the same.

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . oglm warm yr89 male white age ed prst, het(yr89) hc
    . oglm warm yr89 male white age ed prst, scale(male white) ls link(p)
    . oglm warm, eq2(male)
    . sw, pe(.05) lr: oglm warm yr89 male white age ed prst, het(yr89 male white)
    . nestreg, lr: oglm warm yr89 male white age ed prst, het(yr89 male white)

Example 8. The flip option. In the last two examples, we did stepwise selection and hierarchical entry of the choice/location variables. Suppose we wanted to do stepwise selection or hierarchical entry of the heteroskedasticity/scale variables instead? We can use the flip option, which causes the command-line placement of the location and scale variables to be reversed. Just make sure you specify each variable list correctly; while the hetero, scale and eq2 options are all synonyms, you may find it less confusing if you use eq2 with flip. Also remember to use the likelihood ratio test options with nestreg or sw. In the following examples, because of the flip option, the choice variables are yr89, male, white, age, ed, and prst, while the hetero variables are yr89, male, and white.

    . use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
    . sw, pe(.05) lr: oglm warm yr89 male white, eq2(yr89 male white age ed prst) flip
    . nestreg, lr: oglm warm yr89 male white, eq2(yr89 male white age ed prst) flip

Author

Richard Williams
Notre Dame Department of Sociology
Richard.A.Williams.5@ND.Edu
http://www.nd.edu/~rwilliam/oglm/

Acknowledgements

The documentation and source code for several Stata commands (e.g. ologit_p) were major aids in developing the oglm documentation and in adding support for the predict command. Much of the code is adapted from Maximum Likelihood Estimation with Stata, Third Edition, by William Gould, Jeffrey Pitblado and William Sribney. SPSS's PLUM routine helped to inspire oglm and provided a means for double-checking the accuracy of the program.

Joseph Hilbe, Mike Lacy and Rory Wolfe provided stimulating and helpful comments. Jeff Pitblado helped me with several programming issues.

References

Alvarez, R. Michael and John Brehm. 1995. "American Ambivalence towards Abortion Policy: Development of a Heteroskedastic Probit Model of Competing Values." American Journal of Political Science 39(4):1055-82.

Hardin, James and Joseph Hilbe. 2001. "Generalized Linear Models and Extensions." College Station, TX: Stata Press.

Long, J. Scott and Jeremy Freese. 2006. "Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition." College Station, Texas: Stata Press.

Norusis, Marija. 2005. "SPSS 13.0 Advanced Statistical Procedures Companion." Upper Saddle River, New Jersey: Prentice Hall. See especially the chapter on SPSS PLUM, available on the web at http://www.norusis.com/pdf/ASPC_v13.pdf

Suggested citations if using oglm in published work

oglm is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such.

Williams, Richard. 2009. "Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups." Sociological Methods & Research 37(4): 531-559. A pre-publication version is available at http://www.nd.edu/~rwilliam/oglm/RW_Hetero_Choice.pdf.

Williams, Richard. 2008. "Estimating Heterogeneous Choice Models with Stata." Working paper. http://www.nd.edu/~rwilliam/oglm/oglm_Stata.pdf.

Williams, Richard. 2006. "Generalized Ordered Logit/Partial Proportional Odds Models for Ordinal Dependent Variables." The Stata Journal 6(1):58-82. A pre-publication version that includes information on updates to the program since the article was published is available at http://www.nd.edu/~rwilliam/gologit2/gologit2.pdf. The published article can be found at http://www.stata-journal.com/article.html?article=st0097

gologit2 is a related program and may be more appropriate than oglm for some purposes. The two programs can also be used together if you wish to contrast heterogeneous choice/location-scale models with gologit models.

I would appreciate an email notification if you use oglm in published work, as well as a citation of one or more of the sources listed above. Also feel free to email me if you have comments about the program or its documentation.

Also see

Online: help for estcom, postest, constraint, ologit, oprobit, hetprob,