-------------------------------------------------------------------------------
help for mclest                                                 John Hendrickx
-------------------------------------------------------------------------------

Stata macros for multinomial conditional logit models

MCL stands for Multinomial Conditional Logit model. A conditional logit program is used to estimate a multinomial logistic model. This produces the same coefficients and standard errors as a regular multinomial logit program, but has the advantage that it provides great flexibility for imposing constraints on the dependent variable. mclgen restructures the data so the model can be estimated by clogit; mclest estimates the model using clogit.

In addition, mclest can estimate two special models: stereotyped ordered regression (SOR) and Goodman's row and columns model 2 (RC2). Both models estimate a scaling metric for the dependent variable; the RC2 model estimates a scaling metric for a categorical independent variable as well.

Syntax:

    mclest varlist [if exp] [in range] [weight] [, sor(varlist) soriter(#) sortol(#) rc2(varname) eqrc2(varname) muby(varlist) nonorm debug]

varlist contains a model specification. The main effects of the response factor specified in mclgen correspond with the intercepts in a multinomial logit model. Interactions of the response factor with independent variables correspond with the effects of these variables. Because the response factor is on the right-hand side of the model specification, it is a simple matter to impose restrictions on it. The dichotomous dependent variable and the stratification variable are automatically passed on from mclgen to mclest and do not have to be specified.

Options

These options are used to request the special nonlinear models Stereotyped Ordered Regression (SOR) and/or the Row and Columns model 2 (RC2).

sor(varlist) specifies a list of variables for which the SOR model should be estimated. Note that at least two variables should be specified, unless either the rc2 or eqrc2 option is being used.

soriter(#) specifies the maximum number of iterations for estimating a SOR or RC2 model. The default value is 20.

sortol(#) specifies the convergence criterion for estimating a SOR or RC2 model. The default value is .0001.

rc2(varname) specifies a categorical independent variable for the RC2 model. The eqrc2 option will be ignored if the rc2 option is specified.

eqrc2(varname) specifies a categorical independent variable for the EQRC2 model. The rc2 option may not be used together with the eqrc2 option.

muby(varlist) specifies one or more variables which affect the association between the rc2 or eqrc2 variable and the dependent variable. It is ignored if not used in conjunction with the rc2 or eqrc2 option.

nonorm prevents mclest from estimating a normalized solution if a SOR and/or RC2 model has been requested.

debug prints intermediate results of clogit. This can be used to determine the source of error if something goes wrong. The default is nodebug.

mclest passes the following arguments on to clogit unaltered:

weight, if, in

See the Stata documentation on clogit for further details on these options.

Usage

For basic use, mclest will be specified with a list of dummy variables representing the main effects of the response factor and interactions of the response factor with independent variables. The main effects of the independent variables should not be specified; unfortunately, xi does not provide an option to leave them out. As a result, clogit will report that the main effects have been "omitted due to no within-group variance". This has no further consequences for the estimates.
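Why clogit drops these main effects can be illustrated with a small Python sketch (Python is used for illustration only; the function and values below are hypothetical and not part of mclest). Within one respondent's stratum, a main effect of an independent variable adds the same constant to every record, and the within-stratum probabilities, a softmax over the records, do not change:

```python
import math


def stratum_probs(utilities):
    """Choice probabilities within one stratum: a softmax over its records."""
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical linear predictors for one respondent's 5 person/choice records.
base = [0.0, 0.4, -0.2, 1.1, 0.3]

# A main effect of educ adds b_educ * educ to every record of the respondent,
# because educ is constant within the stratum.
b_educ, educ = 0.25, 12
shifted = [u + b_educ * educ for u in base]

p_base = stratum_probs(base)
p_shifted = stratum_probs(shifted)
print([round(p, 4) for p in p_base])
print([round(p, 4) for p in p_shifted])  # identical: the main effect cancels
```

Since the term cancels from every within-stratum probability, its coefficient is not identified, which is why clogit reports it as omitted.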

In the following example, the variable occ (respondent's occupation, 5 categories) is the response factor. It is specified in the mclgen command to transform the data into a person/choice file. mclgen reports "(3352 observations created)": each of the 838 cases has been duplicated 4 times, so that there are now 5 records for each respondent. The cases are indexed by occ and __strata: occ indicates response options 1 to 5, __strata indicates respondents 1 to 838. The variable __didep indicates which record corresponds with the respondent's choice.

The model is specified as a main effect of occ and interactions of occ with the independent variables educ and black. The model could be specified as xi: mclest i.occ*educ i.occ*black instead; this produces the same estimates, but in a different order.

    * Using mcl to estimate a multinomial logit model
    use logan
    mlogit occ educ black, base(1)
    mclgen occ
    xi: mclest i.occ i.occ|educ i.occ|black
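The restructuring that mclgen performs in this example can be sketched in Python. This is a hypothetical reimplementation of the behavior described above, not mclgen's own code; the function name, the toy data and the record order are assumptions:

```python
def person_choice_file(cases, response="occ", n_categories=5):
    """Expand each case into one record per response category.

    Adds __strata (respondent index) and __didep (1 on the record that
    matches the respondent's actual choice, 0 elsewhere).
    """
    records = []
    for stratum, case in enumerate(cases, start=1):
        chosen = case[response]
        for category in range(1, n_categories + 1):
            record = dict(case)              # keep the independent variables
            record[response] = category      # the response factor now indexes the option
            record["__strata"] = stratum
            record["__didep"] = int(category == chosen)
            records.append(record)
    return records


# 838 hypothetical respondents, matching the size of the logan example.
cases = [{"occ": i % 5 + 1, "educ": 12, "black": 0} for i in range(838)]
records = person_choice_file(cases)
print(len(records) - len(cases), "observations created")
```

Each respondent contributes exactly one record with __didep equal to 1, which is what a conditional logit stratified on __strata requires.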

The coefficients of this model are the same as those found using mlogit. For models like this, mclest basically just specifies the __didep and __strata variables for you in a clogit command. The model could also be estimated as:

    xi: clogit __didep i.occ i.occ|educ i.occ|black, strata(__strata)
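The equivalence with mlogit can be checked numerically. The Python sketch below uses hypothetical coefficients with category 1 as the reference. For each person/choice record it builds the regressors that the occ dummies and their interactions with educ and black contribute (leaving out the main effects of educ and black, which cancel within a stratum), computes the conditional logit probabilities within the stratum, and compares them with the multinomial logit probabilities:

```python
import math


def softmax(values):
    top = max(values)
    weights = [math.exp(v - top) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients; category 1 is the reference (all zeros).
a = [0.0, 0.5, -0.3, 0.8, 0.1]       # occ dummies (intercepts)
b = [0.0, 0.10, 0.20, -0.05, 0.15]   # occ x educ
c = [0.0, -0.4, 0.3, 0.2, -0.1]      # occ x black


def mlogit_probs(educ, black):
    """Multinomial logit: one linear predictor per response category."""
    return softmax([a[j] + b[j] * educ + c[j] * black for j in range(5)])


def clogit_probs(educ, black):
    """Conditional logit over the 5 person/choice records of one respondent."""
    coefs = a + b + c                    # stacked coefficient vector
    utilities = []
    for occ in range(1, 6):
        dummies = [1.0 if occ == k else 0.0 for k in range(1, 6)]
        row = dummies + [d * educ for d in dummies] + [d * black for d in dummies]
        utilities.append(sum(x * g for x, g in zip(row, coefs)))
    return softmax(utilities)


for educ, black in [(8, 0), (16, 1)]:
    print([round(p, 4) for p in mlogit_probs(educ, black)])
    print([round(p, 4) for p in clogit_probs(educ, black)])
```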

This model can be estimated equivalently (and more easily) with mlogit. The advantage of mcl models lies in the ability to easily specify different response functions for different independent variables. A response function refers to the type of parameterization applied to the response factor.

Using xi, the first category of occ is treated as the reference category. In that case, the model is equivalent to using mlogit ..., base(1). Using xi3 or desmat, other parameterizations, or contrasts, can be applied to the response factor, obtaining other response functions. Equality constraints can be imposed on two categories of the response factor by adding the dummies for those two categories. A parameter can be fixed to zero by dropping the dummy for that category. A linear constraint can be imposed by treating the response factor as a continuous variable. In an mcl model, such restrictions can be imposed on the independent variables on a variable-by-variable basis.

For example, to estimate an adjacent logit model (Agresti 1990: 318) with mclest, use the backward difference contrast with either xi3 or desmat. Both xi3 and desmat are available from the ssc archives. To estimate an adjacent logit model with xi3, use:

    xi3: mclest b.occ b.occ*educ b.occ*black

Or with desmat:

    char occ[pzat] dif(b)
    desmat: mclest occ occ.@educ occ.@black
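Why an adjacent-category parameterization of the response factor yields an adjacent logit model can be seen in a Python sketch. It uses one coding with that property, D_k = 1 if occ >= k for k = 2..J; whether xi3's b. prefix or desmat's dif(b) produce exactly these columns is an assumption here, and the coefficients are hypothetical. Interacted with educ, each coefficient becomes the effect of educ on the logit of category k versus category k-1:

```python
import math


def step_coding(category, n_categories):
    """Columns D_2..D_J with D_k = 1 if category >= k (hypothetical coding)."""
    return [1.0 if category >= k else 0.0 for k in range(2, n_categories + 1)]


J = 5
beta = [0.3, -0.2, 0.5, 0.1]   # hypothetical coefficients of D_2*educ .. D_5*educ
educ = 12

# Linear predictor of each response category j: sum over k of D_k(j) * beta_k * educ
eta = [sum(d * bk * educ for d, bk in zip(step_coding(j, J), beta))
       for j in range(1, J + 1)]
top = max(eta)
weights = [math.exp(e - top) for e in eta]
probs = [w / sum(weights) for w in weights]

# Adjacent logits log(P(j)/P(j-1)) for j = 2..J
adjacent = [math.log(probs[j] / probs[j - 1]) for j in range(1, J)]
print([round(x, 6) for x in adjacent])   # equals beta_k * educ for each k
```

The design choice is that consecutive categories differ in exactly one column, so each coefficient isolates one adjacent contrast.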

Another conceivable application might be to impose a linear constraint on the response factor for the effects of educ, but to use the usual constraints (first category is reference) for black. This could be done as follows:

    gen occ_ed=occ*educ
    xi: mclest i.occ occ_ed i.occ*black
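The effect of the occ_ed = occ*educ term can be worked out in a short Python sketch with hypothetical coefficients (the names alpha and gamma are illustrative; the black terms are left out because they do not involve educ). Because the response factor enters as a number, one extra year of educ changes log(P(q)/P(r)) by gamma*(q-r), so the effects of educ on all logits follow a linear pattern across the occupation codes:

```python
alpha = [0.0, 0.4, -0.1, 0.6, 0.2]   # hypothetical i.occ coefficients (category 1 = reference)
gamma = 0.07                          # hypothetical coefficient of occ_ed = occ*educ


def log_odds(q, r, educ):
    """log(P(occ=q)/P(occ=r)); the softmax denominator cancels in the ratio."""
    eta = [alpha[j - 1] + gamma * j * educ for j in range(1, 6)]
    return eta[q - 1] - eta[r - 1]


def educ_effect(q, r):
    """Change in log(P(q)/P(r)) for one more year of educ."""
    return log_odds(q, r, 13) - log_odds(q, r, 12)


print(round(educ_effect(2, 1), 6), round(educ_effect(5, 1), 6))  # 0.07 and 0.28
```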

Mobility models

With mclest, it is easy to impose any constraint you might wish on the response factor and to impose a different constraint for each independent (dummy) variable if necessary. One application of this is to specify loglinear models for square tables, also known as mobility models, as multinomial logistic models (cf. Logan 1983, Breen 1994).

Mobility models lie in the space between a model of independence and a saturated loglinear model. Special constraints are imposed on the second-degree loglinear parameters in order to test for a particular pattern of association and to enhance interpretability by reducing the number of parameters. Hout (1983) and Goodman (1984) contain overviews of commonly used mobility models.

When treated as multinomial logit models, mobility models utilize a different response function, i.e. a different set of restrictions on the response factor, for different levels of the "other" variable. For example, in a quasi-independence model for father's occupation by son's occupation, son's occupation would be specified as the response factor. For category i of father's occupation, the response function would be category i of son's occupation versus the other categories. The model would be specified as:

    mclgen occ
    gen d1=(focc==1)*(occ==1)
    gen d2=(focc==2)*(occ==2)
    gen d3=(focc==3)*(occ==3)
    gen d4=(focc==4)*(occ==4)
    gen d5=(focc==5)*(occ==5)
    xi: mclest i.occ d* i.occ|black i.occ|educ
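How a dummy such as d2 acts on the logits can be illustrated with a Python sketch. The utilities and coefficients are hypothetical, and for comparison the sketch also computes probabilities with the diagonal dummies switched off, which is not a model mclest estimates here. For a respondent with focc==2, only the record with occ==2 is affected, so the logits of category 2 versus any other category shift by the d2 coefficient, while logits between other categories stay the same:

```python
import math


def softmax(values):
    top = max(values)
    weights = [math.exp(v - top) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


base = [0.0, 0.2, -0.1, 0.4, 0.3]     # hypothetical utilities of occ = 1..5
gamma = [0.9, 0.5, 0.7, 1.2, 0.8]     # hypothetical coefficients of d1..d5


def probs(focc, use_diagonal=True):
    """Probabilities of occ = 1..5 for a respondent with father's occupation focc."""
    utilities = []
    for occ in range(1, 6):
        # d_i = (focc==i)*(occ==i): nonzero only when focc == occ == i
        boost = sum(gamma[i - 1] * (focc == i) * (occ == i) for i in range(1, 6))
        utilities.append(base[occ - 1] + (boost if use_diagonal else 0.0))
    return softmax(utilities)


def logit(p, q, r):
    return math.log(p[q - 1] / p[r - 1])


with_d = probs(2)
without_d = probs(2, use_diagonal=False)
print(round(logit(with_d, 2, 1) - logit(without_d, 2, 1), 6))  # the d2 coefficient, 0.5
print(round(logit(with_d, 4, 3) - logit(without_d, 4, 3), 6))  # numerically zero
```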

The specification (occ==i) represents the response function for a particular dummy variable. This dummy variable will produce the logit of landing in category i versus some other category. So d1 indicates the effect of father's occupation being equal to 1 on the logit for the son being in category 1 versus some other category.

As it turns out, once the data have been transformed into a person/choice file, mobility models can be specified in an mcl model in the same way they would be in a loglinear model. So the much more compact specification:

    gen diag=(focc==occ)*focc
    xi: mclest i.occ i.diag i.occ|black i.occ|educ

could be used as well. Examples for estimating a number of mobility models are in the file mobility.do in the desmat package. Use net get desmat to download these ancillary files.
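The equivalence of the compact diag specification and the five d dummies can be checked with a Python sketch that enumerates all combinations of focc and occ. It assumes that xi omits the lowest value of diag (0, the off-diagonal cells) as the reference category, which is xi's default; the function names are hypothetical:

```python
def quasi_dummies(focc, occ, n=5):
    """d1..dn as created by gen d1=(focc==1)*(occ==1) etc."""
    return [int(focc == i and occ == i) for i in range(1, n + 1)]


def diag_dummies(focc, occ, n=5):
    """Dummies i.diag would create from diag=(focc==occ)*focc, omitting diag==0."""
    diag = int(focc == occ) * focc
    return [int(diag == i) for i in range(1, n + 1)]


same = all(quasi_dummies(f, o) == diag_dummies(f, o)
           for f in range(1, 6) for o in range(1, 6))
print(same)  # True
```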

Stereotyped Ordered Regression

mclest is also able to estimate certain special designs incorporating both linear and multiplicative effects. One of these is the Stereotyped Ordered Regression model (Anderson 1984, DiPrete 1990). The SOR model is an alternative to the proportional odds model estimated by ologit. The SOR model estimates a scaling metric for the response factor based on the effects of independent variables. The model has J-1 intercept parameters for a response factor with J categories, just like an unordered multinomial logit model. However, it has a single beta parameter for each independent variable, together with J-2 independent scale values phi[j] for the response factor.

Two restrictions must be placed on the scaling metric in order to identify the model. mclest sets the value for the first category to 0 and the value for the last category to 1 while estimating the model. For the final estimates, the scaling metric is also normalized, with a mean of 0 and a sum of squares of 1. The SOR model can be specified as:

log(P(Y==q)/P(Y==r)) = a[q]-a[r] + (phi[q]-phi[r])(b[1]X[1]+b[2]X[2]+ ... +b[K]X[K])

where Y is the response factor with categories j=1 to J, q and r are any two categories of Y, a[j] represents the intercept parameters with suitable restrictions, phi[j] represents the scaling metric with suitable restrictions, X[k] represents independent variables with k=1 to K, and b[k] represents parameters of the independent variables.

Compare this to a standard multinomial logistic model:

log(P(Y==q)/P(Y==r)) = a[q]-a[r] + (b[q1]-b[r1])X[1]+(b[q2]-b[r2])X[2]+ ... +(b[qK]-b[rK])X[K]
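Comparing the two equations, a SOR model is a multinomial logit model whose coefficients are restricted to b[jk] = phi[j]*b[k]. A Python sketch with hypothetical parameter values checks this numerically and shows the resulting proportionality: for any pair of categories, the ratio of the effects of X[1] and X[2] is b[1]/b[2]. Lists are zero-based, so phi[0] in the code is phi[1] in the text:

```python
# Hypothetical SOR parameters for J = 5 categories and K = 2 variables.
phi = [0.0, 0.3, 0.55, 0.8, 1.0]   # scale with phi[1] = 0 and phi[J] = 1
a = [0.0, 0.2, -0.4, 0.1, 0.3]     # intercepts
b = [0.12, -0.5]                   # one b[k] per independent variable


def sor_logit(q, r, x):
    """log(P(Y==q)/P(Y==r)) under the SOR model."""
    linear = sum(bk * xk for bk, xk in zip(b, x))
    return a[q - 1] - a[r - 1] + (phi[q - 1] - phi[r - 1]) * linear


# The same model as a multinomial logit with b[jk] = phi[j] * b[k].
B = [[p * bk for bk in b] for p in phi]


def mnl_logit(q, r, x):
    """log(P(Y==q)/P(Y==r)) under the multinomial logit model."""
    return a[q - 1] - a[r - 1] + sum((B[q - 1][k] - B[r - 1][k]) * x[k]
                                     for k in range(len(x)))


x = [12, 1]
print(sor_logit(4, 2, x), mnl_logit(4, 2, x))  # equal up to rounding
# Effects of X[1] and X[2] on logit(4/2) are in the ratio b[1]/b[2]:
print((B[3][0] - B[1][0]) / (B[3][1] - B[1][1]), b[0] / b[1])
```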

In a multinomial model, the difference between b[qk] and b[rk] shows how logit(q/r) is affected by X[k]. In the SOR model, this effect equals (phi[q]-phi[r])b[k]. The SOR model thus forces the effects on the logit for any two outcomes to be proportional across all independent variables, with the magnitude of each variable's effect determined by its b[k] parameter.

A SOR model can be requested by specifying a varlist in the sor option. A SOR model with only one X[k] variable would be trivial and equivalent to a standard multinomial model, since it contains the same number of parameters. A simple SOR model with two variables could be specified as:

    use logan
    mclgen occ
    xi: mclest i.occ, sor(educ black)
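The parameter counts for this model follow from simple arithmetic, sketched here in Python (the function names are hypothetical). A multinomial logit model with J categories and K variables has J-1 intercepts and (J-1)K slopes; a SOR model has J-1 intercepts, J-2 free scale values and K b[k] parameters:

```python
def mnl_parameters(J, K):
    """Unrestricted multinomial logit: J-1 intercepts plus J-1 slopes per variable."""
    return (J - 1) + (J - 1) * K


def sor_parameters(J, K):
    """SOR: J-1 intercepts, J-2 free phi[j] values, one b[k] per variable."""
    return (J - 1) + (J - 2) + K


for K in (1, 2, 5):
    print(K, mnl_parameters(5, K), sor_parameters(5, K))
```

With one variable both models have 8 parameters, which is why a one-variable SOR model is trivial; with two variables the counts are 12 and 9, and with five variables 24 and 12.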

This model will contain 9 parameters: 4 intercept parameters, 3 independent phi[j] parameters, and 2 b[k] parameters. This is only 3 fewer than the 12 parameters of an unrestricted multinomial model. However, the parsimony of a SOR model does increase as the number of X[k] variables increases.

The SOR model contains both linear and multiplicative elements. To estimate it, mclest iteratively estimates MCL models, first taking the phi[j] scaling metric as given and estimating the b[k] parameters, then taking the b[k] parameters as given and estimating the phi[j] parameters. This continues until the change in log likelihood between successive MCL models is less than the value specified in the sortol option (default .0001) or the maximum number of iterations specified in the soriter option is exceeded (default 20).

As a result of this estimation procedure, no standard errors can be given for the phi[j] parameters, and standard errors for the remaining parameters are conditional, given the scaling metric. In addition, the model degrees of freedom reported by clogit are not correct, since the estimates for the scaling metric are not taken into account. See the Model fit information at the end of the output for the correct number of degrees of freedom. e(df_m) in the saved results also contains the correct degrees of freedom.
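The alternating procedure can be sketched generically in Python. fit_b and fit_phi are hypothetical stand-ins for the two clogit runs (each returns its estimates and a log likelihood), and the toy functions in the usage example are made up so the loop can be run; exactly how mclest compares successive log likelihoods is an assumption here:

```python
def alternate(fit_b, fit_phi, phi, soriter=20, sortol=1e-4):
    """Alternate two conditional fits until the log likelihood settles.

    Stops when the change in log likelihood between successive iterations is
    below sortol, or after soriter iterations. Returns
    (b, phi, loglik, iterations, converged).
    """
    previous = None
    for iteration in range(1, soriter + 1):
        b, _ = fit_b(phi)              # phi fixed: estimate the b[k]
        phi, loglik = fit_phi(b)       # b fixed: estimate the phi[j]
        if previous is not None and abs(loglik - previous) < sortol:
            return b, phi, loglik, iteration, True
        previous = loglik
    return b, phi, loglik, soriter, False


# Toy stand-ins (not real likelihoods) with a fixed point at b = 4/3, phi = 2/3.
def toy_fit_b(phi):
    return 0.5 * phi + 1.0, None


def toy_fit_phi(b):
    phi = 0.5 * b
    return phi, -abs(phi - 2.0 / 3.0)


result = alternate(toy_fit_b, toy_fit_phi, phi=0.0)
print(result[3], result[4])   # 8 True
```

The defaults mirror the documented soriter(20) and sortol(.0001) options; with soriter=3 the toy run stops before converging.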

Row and Columns model 2

A second special model that can be estimated by mclest is Goodman's (1979) Row and Columns model 2. Originally developed for frequency tables, the RC2 model estimates scaling metrics for both the dependent variable and one of the independent variables. The association between the two variables can then be expressed through a single parameter mu. The scaling metric for the dependent variable is phi[j], as in the SOR model; the scaling metric for the independent variable is sigma[v]. Two restrictions must be imposed on phi[j] and sigma[v] to identify the model. During estimation, mclest sets phi[1]=sigma[1]=0 and phi[J]=sigma[V]=1. The final estimates are also given for normalized scale values, where mean(phi[j])=mean(sigma[v])=0 and SS(phi[j])=SS(sigma[v])=1. A model containing an RC2 effect could be specified as:

logit(q/r) = a[q]-a[r] + (phi[q]-phi[r])mu*sigma[v]
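A useful consequence of this specification is that the intercepts cancel when logit(q/r) is compared across two categories v and w of the independent variable, leaving a log odds ratio of (phi[q]-phi[r])*mu*(sigma[v]-sigma[w]). A Python sketch with hypothetical scale values (zero-based lists) checks this:

```python
# Hypothetical RC2 parameters: 4 response categories, 3 categories of the independent.
phi = [0.0, 0.4, 0.7, 1.0]     # response scale, phi[1] = 0, phi[J] = 1
sigma = [0.0, 0.5, 1.0]        # scale of the categorical independent variable
mu = 1.8                       # association parameter
a = [0.0, 0.3, -0.2, 0.5]      # intercepts


def rc2_logit(q, r, v):
    """logit(q/r) for a case in category v of the independent variable."""
    return a[q - 1] - a[r - 1] + (phi[q - 1] - phi[r - 1]) * mu * sigma[v - 1]


def log_odds_ratio(q, r, v, w):
    """Difference in logit(q/r) between categories v and w."""
    return rc2_logit(q, r, v) - rc2_logit(q, r, w)


print(round(log_odds_ratio(4, 1, 3, 1), 6))   # 1.8 = mu * (1 - 0) * (1 - 0)
```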

This model can be extended with standard and/or SOR effects of independent variables. Indeed, the RC2 effects can be seen as the SOR effects of a categorical variable, scaled by mu*sigma[v].

A variation of the RC2 model is the EQual Row and Columns model 2 (EQRC2), which, as the name suggests, uses the same scale for the dependent variable and the categorical independent variable:

logit(q/r) = a[q]-a[r] + (phi[q]-phi[r])mu*phi[v]

Another variation implemented in mclest allows the association mu between the dependent and independent variable to vary by one or more other variables:

logit(q/r) = a[q]-a[r] + (phi[q]-phi[r])(mu[0]+mu[1]X[1]+ ... +mu[T]X[T])*phi[v]

An overall association parameter mu[0] is estimated, together with mu[t] parameters indicating how the association changes for each independent variable X[t], t=1 to T. In an RC2 rather than an EQRC2 model, phi[v] is replaced by sigma[v].
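How the muby option changes the association can be sketched in Python for the EQRC2 form above. The values are hypothetical, with a single muby variable X[1] (for instance a 0/1 indicator such as black): the multiplicative term is simply scaled by mu[0]+mu[1]X[1], so the log odds ratios for the group with X[1]=1 are (mu[0]+mu[1])/mu[0] times those for the group with X[1]=0:

```python
phi = [0.0, 0.2, 0.45, 0.7, 1.0]   # hypothetical shared scale (EQRC2)
mu0 = 1.5                           # overall association mu[0]
mus = [-0.6]                        # hypothetical mu[t] for one muby variable


def association(xs):
    """mu[0] + mu[1]X[1] + ... + mu[T]X[T]."""
    return mu0 + sum(m * x for m, x in zip(mus, xs))


def association_term(q, r, v, xs):
    """(phi[q]-phi[r]) * (mu[0] + sum of mu[t]X[t]) * phi[v]."""
    return (phi[q - 1] - phi[r - 1]) * association(xs) * phi[v - 1]


print(round(association([0]), 6), round(association([1]), 6))   # 1.5 and 0.9
print(round(association_term(5, 1, 5, [1]) / association_term(5, 1, 5, [0]), 6))  # 0.6
```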

An RC2 model is requested by specifying a varname in the rc2 option. At present, only one variable can be used for the RC2 effect. Similarly, an EQRC2 model can be requested by specifying a varname in the eqrc2 option. The rc2 and eqrc2 options are mutually exclusive. To let the overall association vary by one or more independent variables, specify a varlist in the muby option.

The following example estimates a quasi-RC2 model for father's occupation, including both effects for identical categories (diag) and an RC2 effect. The overall association mu between father's occupation and respondent's occupation is allowed to vary by race. Furthermore, race and education are included in the model as covariates using a SOR effect.

    use logan
    mclgen occ
    gen diag=(focc==occ)*focc
    xi: mclest i.occ i.diag, sor(educ black) rc2(focc) muby(black)

Models containing RC2 or EQRC2 effects are estimated by iteratively running MCL models, as is the case for SOR models. The convergence criterion and maximum number of iterations are determined by the sortol and soriter options. As with SOR models, no standard errors are available for the phi[j] and sigma[v] metrics, and other standard errors are conditional on the phi[j] and sigma[v] estimates. See the Model fit information at the end of the output for the correct number of model degrees of freedom.

For tabular data, a separate rc2 program is also available from the ssc archives. Note that mclest can estimate these models as well in many cases. The differences are that mclest will not estimate the main effects of the row variable, and that interactions between the row variable and a grouping variable must be included in rc2 for the models to be equivalent. An advantage of rc2 is that it does not require a restructuring of the data by mclgen.

Saved results

In addition to the results saved by clogit, mclest saves the following:

e(phi)      the phi scale with the first category fixed to 0, the last to 1

e(phi_n)    the phi scale with mean 0 and sum of squares 1

e(df_m)     the model degrees of freedom adjusted for the sigma and phi parameters

If the rc2 or eqrc2 option has been used, mclest also saves:

e(sig)      the sigma scale with the first category fixed to 0, the last to 1

e(sig_n)    the sigma scale with mean 0 and sum of squares 1
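The relation between e(phi) and e(phi_n) (and likewise e(sig) and e(sig_n)) is a centering and rescaling. A Python sketch with a hypothetical scale follows. Because the model only involves products such as (phi[q]-phi[r])*b[k], the fit is unchanged if b[k] is multiplied by the same factor the scale is divided by; that mclest rescales the b[k] parameters in exactly this way is an assumption:

```python
import math


def normalize(scale):
    """Center a scale at mean 0 and rescale it to a sum of squares of 1.

    Returns the normalized scale and the factor it was divided by.
    """
    mean = sum(scale) / len(scale)
    centered = [s - mean for s in scale]
    factor = math.sqrt(sum(c * c for c in centered))
    return [c / factor for c in centered], factor


phi = [0.0, 0.3, 0.55, 0.8, 1.0]    # hypothetical e(phi): first 0, last 1
phi_n, factor = normalize(phi)
b = 0.4                              # hypothetical b[k] on the e(phi) metric
b_n = b * factor                     # compensating rescale of b[k] (assumed)

print([round(p, 4) for p in phi_n])
print(round((phi[4] - phi[0]) * b, 6), round((phi_n[4] - phi_n[0]) * b_n, 6))  # same effect
```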

References

Anderson, J.A. (1984). Regression and Ordered Categorical Variables. Journal of the Royal Statistical Society, Series B 46: 1-30.

Breen, Richard (1994). Individual Level Models for Mobility Tables and Other Cross-Classifications. Sociological Methods & Research 33: 147-173.

DiPrete, Thomas A. (1990). Adding Covariates to Loglinear Models for the Study of Social Mobility. American Sociological Review 55: 757-773.

Goodman, Leo A. (1979). Multiplicative models for the analysis of occupational mobility tables and other kinds of cross-classification tables. American Journal of Sociology 84: 804-819.

Hendrickx, John and Harry B.G. Ganzeboom (1998). Occupational Status Attainment in the Netherlands, 1920-1990. A Multinomial Logistic Analysis. European Sociological Review 14: 387-403.

Hout, Michael (1983). Mobility Tables. Sage Publication 07-031.

Logan, John A. (1983). A Multivariate Model for Mobility Tables. American Journal of Sociology 89: 324-349.

Xie, Yu (2003). Association Model. In the Encyclopedia of Social Science Research Methods, edited by Michael Lewis-Beck, Alan Bryman and Tim Futing Liao. Thousand Oaks, CA: Sage. http://www-personal.umich.edu/~yuxie/Research/Assoc-program.html

Direct comments to: John Hendrickx

mclest is available at SSC-IDEAS. Use ssc install mcl to obtain the latest version. The packages rc2, desmat, and xi3 are also available from the ssc archives.

Also see

On-line: help for mclgen, clogit, mlogit, ologit, desmat, desrep, xi, xi3, rc2