-------------------------------------------------------------------------------
help for oglm
-------------------------------------------------------------------------------

Ordinal Generalized Linear Models

oglm depvar [indepvars] [weight] [if exp] [in range] [, link(logit/probit/cloglog/loglog/cauchit/log) force lrforce store(name) constraints(clist) robust cluster(varname) level(#) or irr rrr eform hr log hetero(varlist) scale(varlist) eq2(varlist) hc ls flip maximize_options]

oglm shares the features of all estimation commands; see help est. oglm typed without arguments redisplays previous results. The following options may be given when redisplaying results:

store or irr rrr hr eform level(#)

by, svy, nestreg, stepwise, xi and possibly other prefix commands are allowed; see help prefix.

fweights, iweights, and pweights are allowed; see help weights.

Syntax for predict

predict [type] newvars [if] [in] [, statistic outcome(outcome)]

predict [type] {stub*|newvar_reg newvar_k1 ... newvar_kk-1} [if] [in], scores

where k is the number of outcomes in the model.

statistic    Description
-------------------------------------------------------------------------
Main
  pr         predicted probabilities; the default
  xb         linear prediction
  sigma      the standard deviation
  stdp       standard error of the linear prediction
-------------------------------------------------------------------------

Note that with the pr option, you specify one or k new variables depending on whether the outcome() option is also specified (where k is the number of categories of depvar). With xb and stdp, one new variable is specified.

These statistics are available both in and out of sample; type "predict ... if e(sample) ..." if wanted only for the estimation sample.

Description

oglm estimates Ordinal Generalized Linear Models. When these models include equations for heteroskedasticity they are also known as heterogeneous choice / location-scale / heteroskedastic ordinal regression models. oglm supports multiple link functions, including logit (the default), probit, complementary log-log, log-log and cauchit.

When an ordinal regression model incorrectly assumes that error variances are the same for all cases, the standard errors are wrong and (unlike OLS regression) the parameter estimates are biased. Heterogeneous choice / location-scale models explicitly specify the determinants of heteroskedasticity in an attempt to correct for it. Further, these models can be used when the variance/variability of underlying attitudes is itself of substantive interest. Alvarez and Brehm (1995), for example, argued that individuals whose core values are in conflict will have a harder time making a decision about abortion and will hence have greater variability/error variances in their responses.

Several special cases of ordinal generalized linear models can also be estimated by oglm, including the parallel lines models of ologit and oprobit (where error variances are assumed to be homoskedastic), the heteroskedastic probit model of hetprob (where the dependent variable must be a dichotomy and the only link allowed is probit), the binomial generalized linear models of logit, probit and cloglog (which also assume homoskedasticity), as well as similar models that are not otherwise estimated by Stata. This makes oglm particularly useful for testing whether constraints on a model (e.g. homoskedastic errors) are justified, or for determining whether one link function is more appropriate for the data than are others.

Other features of oglm include support for linear constraints, making it possible, for example, to impose and test the constraint that the effects of x1 and x2 are equal. oglm works with several prefix commands, including by, nestreg, xi, svy and sw. Its predict command includes the ability to compute estimated probabilities. The actual values taken on by the dependent variable are irrelevant except that larger values are assumed to correspond to "higher" outcomes. Up to 20 outcomes are allowed. oglm was inspired by the SPSS PLUM routine but differs somewhat in its terminology, labeling of links, and the variables that are allowed when modeling heteroskedasticity.

Options

link(logit/probit/cloglog/loglog/cauchit/log) specifies the link function to be used. The legal values are link(logit), link(probit), link(cloglog), link(loglog) and link(cauchit), which can be abbreviated as link(l), link(p), link(c), link(ll) and link(ca). link(logit) is the default if the option is omitted.

NOTE: link(log) is also available but is considered experimental (and possibly wrong) at this point. Stata's glm program successfully uses the log link with dichotomous dependent variables but it is not clear how and how well it generalizes to the ordinal case.

The following advice is adapted from Norusis (2005, p. 84): Probit and logit models are reasonable choices when the changes in the cumulative probabilities are gradual. If there are abrupt changes, other link functions should be used. The log-log link may be a good model when the cumulative probabilities increase from 0 fairly slowly and then rapidly approach 1. If the opposite is true, namely that the cumulative probability for lower scores is high and the approach to 1 is slow, the complementary log-log link may describe the data.

WARNING: Programs differ in the names used for some links. Stata's loglog link corresponds to SPSS PLUM's cloglog link; and Stata's cloglog link is called nloglog in SPSS.
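
One informal way to compare links is to fit the same model under each candidate link and compare information criteria. The following is only a sketch: it assumes the ordwarm2 data used in the Examples section, and it assumes that Stata's estimates stats command (which is not part of oglm) reports AIC and BIC for stored oglm results. The names m_logit, m_probit and m_cloglog are arbitrary labels chosen for illustration.

```stata
* Fit the same model under three links, storing each set of results
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
oglm warm yr89 male white age ed prst, link(logit) store(m_logit)
oglm warm yr89 male white age ed prst, link(probit) store(m_probit)
oglm warm yr89 male white age ed prst, link(cloglog) store(m_cloglog)

* Tabulate log likelihoods, AIC and BIC for the stored models
* (assumption: estimates stats accepts results stored by oglm)
estimates stats m_logit m_probit m_cloglog
```

Models that differ only in their link are not nested, so a likelihood ratio test is not appropriate for this comparison; the information criteria are at best a rough guide.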

hetero(varlist), scale(varlist) and eq2(varlist) are synonyms (use only one of them) and can be used to specify the variables believed to affect heteroskedasticity in heterogeneous choice / location-scale models. In such models the model chi-square statistic is a test of whether the choice/location parameters and the heteroskedasticity/scale parameters differ from zero; this differs from hetprob, where the model chi-square only tests the choice/location parameters. The more neutral-sounding eq2(varlist) alternative is provided because it may be less confusing when using the flip option.

WARNING: The default Wald tests conducted by the nestreg and sw prefix commands can give incorrect results when the same variable appears in both the location and scale equations. In such cases it is recommended that you use nestreg's and sw's likelihood ratio test options.
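
A likelihood ratio test of whether the heteroskedasticity equation is needed might be sketched as follows. This assumes the ordwarm2 data used in the Examples section; the stored-result names hetmodel and hommodel are arbitrary labels chosen for illustration, and the choice of yr89 as the variance variable is taken from Example 7 rather than being a recommendation.

```stata
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear

* Heterogeneous choice model: yr89 is allowed to affect the error variance
oglm warm yr89 male white age ed prst, hetero(yr89) store(hetmodel)

* Same choice/location equation with homoskedastic errors
oglm warm yr89 male white age ed prst, store(hommodel)

* LR test of the null hypothesis that the variance equation is not needed
lrtest hetmodel hommodel
```

A significant test statistic suggests that the homoskedasticity constraint is not justified for these data.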

flip causes the command-line placement of the location and scale variables to be reversed, i.e. what would normally be the location variables will instead be the scale variables, and vice versa. This is primarily useful if you want to use the sw or nestreg prefix commands to do stepwise selection or hierarchical entry of the heteroskedasticity/scale variables. (Just be sure to keep straight which set of variables is which!) Again, if you do this, remember to use the likelihood ratio test options of nestreg or sw, because the default Wald tests may be wrong otherwise.

hc and ls affect how the equations are labeled. If hc is used, then, consistent with the literature on heterogeneous choice, the equations are labeled "choice" and "variance". If ls is used, the equations are labeled "location" and "scale", which is consistent with SPSS PLUM and other published literature. If neither option is specified, then the scale/heteroskedasticity equation is labeled "lnsigma", which is consistent with other Stata programs such as hetprob.

force can be used to force oglm to issue only warning messages in some situations when it would normally give a fatal error. By default, the dependent variable can have a maximum of 20 categories. A variable with more categories than that is probably a mistaken entry by the user, e.g. a continuous variable has been specified rather than an ordinal one. But, if your dependent variable really is ordinal with more than 20 categories, force will let oglm analyze it (although other practical limitations, such as small sample sizes within categories, may keep it from coming up with a final solution). Obviously, you should only use force when you are confident that you are not making a mistake. trustme can be used as a synonym for force.

lrforce forces Stata to report a Likelihood Ratio Statistic under certain conditions when it ordinarily would not. Some types of constraints can make a Likelihood Ratio chi-square test invalid. Hence, to be safe, Stata reports a Wald statistic whenever constraints are used. But, for many common sorts of constraints (e.g. constraining the effects of two variables to be equal) an LR chi-square statistic is probably appropriate. Note that the lrforce option will be ignored when robust standard errors are specified either directly or indirectly, e.g. via use of the robust or svy options. Use this option with caution.

store(name) causes the command estimates store name to be executed when oglm finishes. This is useful for when you wish to estimate a series of models and want to save the results. See help estimates.

WARNING: The store option may not work correctly when the svy prefix is used.

log displays the iteration log. By default it is suppressed.

or reports the estimated coefficients transformed to relative odds ratios, i.e., exp(b) rather than b; see [R] ologit for a description of this concept. Options rrr, eform, irr and hr produce identical results (labeled differently) and can also be used. It is up to the user to decide whether the exp(b) transformation makes sense given the link function used, e.g. it probably doesn't make sense when using the probit link.

constraints(clist) specifies the linear constraints to be applied during estimation. The default is to perform unconstrained estimation. Constraints are defined with the constraint command. constraints(1) specifies that the model is to be constrained according to constraint 1; constraints(1-4) specifies constraints 1 through 4; constraints(1-4,8) specifies 1 through 4 and 8.

robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation. robust combined with cluster() allows observations that are not independent within cluster (although they must be independent between clusters). If you specify pweights, robust is implied.

cluster(varname) specifies that the observations are independent across groups (clusters) but not necessarily within groups. varname specifies to which group each observation belongs; e.g., cluster(personid) in data with repeated observations on individuals. cluster() affects the estimated standard errors and variance-covariance matrix of the estimators (VCE), but not the estimated coefficients. cluster() can be used with pweights to produce estimates for unstratified cluster-sampled data.

level(#) specifies the confidence level in percent for the confidence intervals of the coefficients; see help level.

maximize_options control the maximization process; see help maximize. You should never have to specify most of these. However, the difficult option can sometimes be useful with models that are running very slowly or not converging at all.

Options for predict

pr, the default, calculates the predicted probabilities. If you do not also specify the outcome() option, you must specify k new variables, where k is the number of categories of the dependent variable. Say that you fitted a model by typing oglm result x1 x2, and result takes on three values. Then you could type predict p1 p2 p3 to obtain all three predicted probabilities. If you specify the outcome() option, you must specify one new variable. Say that result takes on the values 1, 2, and 3. Typing predict p1, outcome(1) would produce the same p1.

xb calculates the linear prediction. You specify one new variable, for example, predict linear, xb. The linear prediction is defined ignoring the contribution of the estimated cutpoints.

sigma calculates the standard deviation, also known as the scale. You specify one new variable, for example, predict sigma, s. If the model does not include an equation for heteroskedasticity then the predicted sigma value is missing for all cases.

stdp calculates the standard error of the linear prediction. You specify one new variable, for example, predict se, stdp.

outcome(outcome) specifies for which outcome the predicted probabilities are to be calculated. outcome() should contain either a single value of the dependent variable or one of #1, #2, ..., with #1 meaning the first category of the dependent variable, #2 the second category, etc.

scores calculates equation-level score variables.
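
The predict options above might be combined after a heteroskedastic model roughly as follows. This is a sketch only: it assumes the ordwarm2 data used in the Examples section, where warm has four categories, and the new variable names xbhat, sighat and p4 are arbitrary names chosen for illustration.

```stata
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
quietly oglm warm yr89 male white age ed prst, hetero(yr89)

* Linear prediction, ignoring the estimated cutpoints
predict xbhat, xb

* Predicted standard deviation (scale); missing if no hetero() equation
predict sighat, sigma

* Predicted probability of the fourth (highest) category of warm
predict p4, outcome(#4)

summarize xbhat sighat p4
```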

Examples

Example 1. Basic models. By default, oglm will estimate the same models as ologit. The store option is convenient for saving results if you want to contrast different models.

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. oglm warm yr89 male white age ed prst
. oglm warm yr89 male white age ed prst, store(m1)
. oglm warm yr89 male white age ed prst, robust

Example 2. Survey data estimation.

. use http://www.stata-press.com/data/r8/nhanes2f.dta, clear
. svy: oglm health female black age age2
. svy, subpop(female): oglm health black age age2

Example 3. The predict command.

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. quietly oglm warm yr89 male white age ed prst
. predict p1 p2 p3 p4

Example 4. Constrained logistic regression. logit, ologit, probit and oprobit provide other and generally faster means for estimating non-heteroskedastic models with logit and probit links; but none of these commands currently supports the use of linear constraints, such as two variables having equal effects. oglm can be used for this purpose. For example,

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. recode warm (1 2 = 0)(3 4 = 1), gen(agree)
. * Constrain the effects of male and white to be equal
. constraint 1 male = white
. oglm agree yr89 male white age ed prst, lrf store(constrained) c(1)
. oglm agree yr89 male white age ed prst, store(unconstrained)
. lrtest constrained unconstrained

Example 5. Other link functions. By default, oglm uses the logit link. If you prefer, however, you can specify probit, complementary log-log, log-log, cauchit or log links. In the following example, the same model is estimated using each of the links supported by oglm (note that link(log) is considered experimental and possibly wrong).

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. oglm warm yr89 male white age ed prst, link(l)
. oglm warm yr89 male white age ed prst, link(p)
. oglm warm yr89 male white age ed prst, link(c)
. oglm warm yr89 male white age ed prst, link(ll)
. oglm warm yr89 male white age ed prst, link(ca)
. oglm warm yr89 male white age ed prst, link(log)

Example 6. Prefix commands. oglm supports many of Stata 9's prefix commands. For example,

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. sw, pe(.05): oglm warm yr89 male
. xi: oglm warm yr89 i.male
. nestreg: oglm warm (yr89 male white age) (ed prst)

Example 7. The hetero/scale/eq2 option. The het, scale and eq2 options are synonyms; use whichever one you prefer. ls and hc are optional and affect whether the equations are labeled consistently with the heterogeneous choice or location-scale literature. If also using the sw or nestreg prefix commands, you should use their likelihood ratio test options since the default Wald tests can be wrong when the same variable appears in both the location and scale equations. Note that it is possible to estimate a heteroskedasticity-only model, and that the variables in the two equations do not need to be the same.

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. oglm warm yr89 male white age ed prst, het(yr89) hc
. oglm warm yr89 male white age ed prst, scale(male white) ls link(p)
. oglm warm, eq2(male)
. sw, pe(.05) lr: oglm warm yr89 male white age ed prst, het(yr89 male white)
. nestreg, lr: oglm warm yr89 male white age ed prst, het(yr89 male white)

Example 8. The flip option. In the last two examples, we did stepwise selection and hierarchical entry of the choice/location variables. Suppose we wanted to do stepwise selection or hierarchical entry of the heteroskedasticity/scale variables instead? We can use the flip option, which causes the command-line placement of the location and scale variables to be reversed. Just make sure you specify each variable list correctly - while the hetero, scale and eq2 options are all synonyms, you may find it less confusing if you use eq2 with flip. Also remember to use the likelihood ratio test options with nestreg or sw. In the following examples, because of the flip option, the choice variables are yr89, male, white, age, ed, and prst, while the hetero variables are yr89, male, and white.

. use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
. sw, pe(.05) lr: oglm warm yr89 male white, eq2(yr89 male white age ed prst) flip
. nestreg, lr: oglm warm yr89 male white, eq2(yr89 male white age ed prst) flip

Author

Richard Williams
Notre Dame Department of Sociology
Richard.A.Williams.5@ND.Edu
http://www.nd.edu/~rwilliam/oglm/

Acknowledgements

The documentation and source code for several Stata commands (e.g. ologit_p) were major aids in developing the oglm documentation and in adding support for the predict command. Much of the code is adapted from Maximum Likelihood Estimation with Stata, Third Edition, by William Gould, Jeffrey Pitblado and William Sribney. SPSS's PLUM routine helped to inspire oglm and provided a means for double-checking the accuracy of the program.

Joseph Hilbe, Mike Lacy and Rory Wolfe provided stimulating and helpful comments. Jeff Pitblado helped me with several programming issues.

References

Alvarez, R. Michael and John Brehm. 1995. "American Ambivalence towards Abortion Policy: Development of a Heteroskedastic Probit Model of Competing Values." American Journal of Political Science 39(4):1055-82.

Hardin, James and Joseph Hilbe. 2001. "Generalized Linear Models and Extensions." College Station, TX: Stata Press.

Long, J. Scott and Jeremy Freese. 2006. "Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition." College Station, Texas: Stata Press.

Norusis, Marija. 2005. "SPSS 13.0 Advanced Statistical Procedures Companion." Upper Saddle River, New Jersey: Prentice Hall. See especially the chapter on SPSS PLUM, available on the web at http://www.norusis.com/pdf/ASPC_v13.pdf

Suggested citations if using oglm in published work

oglm is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such.

Williams, Richard. 2009. "Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups." Sociological Methods & Research 37(4): 531-559. A pre-publication version is available at http://www.nd.edu/~rwilliam/oglm/RW_Hetero_Choice.pdf.

Williams, Richard. 2008. "Estimating Heterogeneous Choice Models with Stata." Working paper. http://www.nd.edu/~rwilliam/oglm/oglm_Stata.pdf.

Williams, Richard. 2006. "Generalized Ordered Logit/ Partial Proportional Odds Models for Ordinal Dependent Variables." The Stata Journal 6(1):58-82. A pre-publication version that includes information on updates to the program since the article was published is available at http://www.nd.edu/~rwilliam/gologit2/gologit2.pdf. The published article can be found at http://www.stata-journal.com/article.html?article=st0097

gologit2 is a related program and may be more appropriate than oglm for some purposes. The two programs can also be used together if you wish to contrast heterogeneous choice / location-scale models with gologit models.

I would appreciate an email notification if you use oglm in published work, as well as a citation of one or more of the sources listed above. Also feel free to email me if you have comments about the program or its documentation.

Also see

Online: help for estcom, postest, constraint, ologit, oprobit, hetprob,