{smcl} {.-} help for {cmd:mclest}{right: {browse "mailto:John_Hendrickx@yahoo.com":John Hendrickx}} {.-} {title:Stata macros for multinomial conditional logit models} {p} {it:MCL} stands for {it:{ul:M}ultinomial {ul:C}onditional {ul:L}ogit} model. A conditional logit program is used to estimate a multinomial logistic model. This produces the same coefficients and standard errors as a regular multinomial logit program but has the advantage that it provides great flexibility for imposing constraints on the dependent variable. {help mclgen} restructures the data so the model can be estimated by {help clogit}, {cmd:mclest} estimates the model using {cmd:clogit}. {p} In addition, {cmd:mclest} can estimate two special models: {it:stereotyped ordered regression} (SOR) and Goodman's {it:row and columns model} 2 (RC2). Both models estimate a scaling metric for the dependent variable; the RC2 model estimates a scaling metric for a categorical independent variable as well. {title:Syntax:} {p 8 27} {cmd:mclest} {it:varlist} [{cmd:if} {it:exp}] [{cmd:in} {it:range}] [{it:weight}] [, {cmd:sor(}{it:varlist}{cmd:)} {cmd:soriter(}{it:#}{cmd:)} {cmd:sortol(}{it:#}{cmd:)} {cmd:rc2(}{it:varname}{cmd:)} {cmd:eqrc2(}{it:varname}{cmd:)} {cmd:muby(}{it:varlist}{cmd:)} {cmd:nonorm} {cmd:debug} ] {p} {it:varlist} contains a model specification. The main effects of the {it:response factor} specified in {cmd:mclgen} correspond with the intercept in a multinomial logit model. Interactions of the {it:response factor} with independent variables correspond with the effects of these variables. Because the response factor is on the right hand side of the model specification, it is a simple matter to impose restrictions. The dichotomous dependent variable and the stratification variable are automically passed on from {help mclgen} to {cmd:mclest} and do not have to be specified. {title:Options} {p} These options are used to request the special nonlinear models {it:Sterotyped Ordered Regression} (SOR) and/or the {it:Row and Columns model 2} (RC2). {p 0 4} {cmd:sor(}{it:varlist}{cmd:)} specifies a list of variables for which the SOR model should be estimated. Note that at least two variables should be specified, unless either the {cmd:rc2} or {cmd:eqrc2} option is being used. {p 0 4} {cmd:soriter(}{it:#}{cmd:)} specifies the maximum number of iterations for estimating a SOR or RC2 model. The default value is 20. {p 0 4} {cmd:sortol(}{it:#}{cmd:)} specifies the convergence criterion for estimating a SOR or RC2 model. The default value is .0001. {p 0 4} {cmd:rc2(}{it:varname}{cmd:)} specifies a categorical independent variable for the RC2 model. The eqrc2 option will be ignored if the {cmd:rc2} option is specified. {p 0 4} {cmd:eqrc2(}{it:varname}{cmd:)} specifies a categorical independent variable for the EQRC2 model. The {cmd:rc2} option may not be used together with the eqrc2 option. {p 0 4} {cmd:muby(}{it:varlist}{cmd:)} specifies one or more variables which affect the association between the {cmd:rc2 }or {cmd:eqrc2} variable and the dependent variable. Ignored if not used in conjunction with the {cmd:rc2} or {cmd:eqrc2} option. {p 0 4} {cmd:nonorm} prevents the mclest program from estimating a normalized solution if a SOR and/or RC2 model has been requested. {p 0 4} {cmd:debug} prints intermediate results of {cmd:clogit}. This can be used to determine the source of error if something goes wrong. The default is {cmd:nodebug}. {p} {cmd:mclest} passes the following arguments on to {help clogit} unaltered: {p 4 4} {cmd:weight}, {cmd:if}, {cmd:in} {p} See the Stata documentation on {help clogit} for further details on these options. {title:Usage} {p} For basic use, {cmd:mclest} will be specified with a list of dummy variables representing the main effects of the {it:response factor} and interactions of the {it:response factor} with independent variables. {p} The main effects of the independent variables should not be specified; unfortunately {help xi} does not provide this option. As a result, clogit will report that the main effects have been "omitted due to no within-group variance". This has no further consquences for the estimates. {p} In the following example, the variable {cmd:occ} (respondent's occupation, 5 categories) is the {it:response factor}. It is specified in the {help mclgen} command to transform the data into a {it:person/choice} file. {cmd:mclgen }reports "(3352 observations created)", each of the 838 cases has been duplicated 4 times so that there are now 5 records for each respondent. The cases are indexed by {cmd:occ} and {cmd:__strata}. {cmd:occ} indicates response options 1 to 5, {cmd:__strata} indicates respondents 1 to 838. The variable {cmd:__didep} indicates which record corresponds with the respondent's choice. {p} The model is specified as a main effect of {cmd:occ} and interactions of {cmd:occ} with the independent variables {cmd:educ} and {cmd:black}. The model could be specified as {input:xi: mclest i.occ*educ i.occ*black} instead. This produces the same estimates but a different order of the estimates. {p} {input:* Using mcl to estimate a multinomial logit model} {input:use logan} {input:mlogit occ educ black, base(1)} {input:mclgen occ} {input:xi: mclest i.occ i.occ|educ i.occ|black} {p} The coefficients of this model are the same as those found using {help mlogit}. For models like this, {cmd:mclest} basicly just specifies the {cmd:__didep} and {cmd:__strata} variables for you in a {help clogit} command. The model could also be estimated as: {input:xi: clogit __didep i.occ i.occ|educ i.occ|black, strata(__strata)} {p} This model can be estimated equivalently (and more easily) with {cmd:mlogit}. The advantage of mcl models lies in the ability to easily specify different {it:response functions} for different independent variables. A response function refers to the type of parameterization applied to the response factor. {p} Using {cmd:xi}, the first category of {cmd:occ} is treated as the reference category. In that case, the model is equivalent to using {input:mlogit ..., base(1)}. Using {help xi3} or {help desmat}, other paramterizations, or contrasts, can be applied to the response factor, obtaining other response functions. Equality constraints can be imposed on two categories of the response factor by adding the dummies for those two categories. A parameter can be fixed to zero by dropping the dummy for that cateory. A linear constraint can be imposed by treating the response factor as a continuous variable. In an mcl model, such restrictions can be imposed on the indendent variables on a variable by variable basis. {p} For example, to estimate an {it:adjacent logit} (Agresti 1990: 318) model with {cmd:mclest}, use the {it:backward difference contrast} with either {net search:xi3} or {net search:desmat}. Both {cmd:xi3} and {cmd:desmat} are available from the {help ssc} archives. To estimate and adjacent logit model with {cmd:xi3}, use: {input:xi3: mclest b.occ b.occ*educ b.occ*black} {p} Or with {cmd:desmat}: {input:char occ[pzat] dif(b)} {input:desmat: mclest occ occ.@educ occ.@black} {p} Another conceivable application might be to impose a {it:linear constraint} on the response factor for the effects of {cmd:educ} but to use the usual constraints (first category is reference) for {cmd:black}. This could be done as follows: {input:gen occ_ed=occ*educ} {input:xi: mclest i.occ occ_ed i.occ*black} {title:Mobility models} {p} With {cmd:mclest}, it is easy to impose any constraint you might wish on the response factor and to impose a different constraint for each independent (dummy) variable if necessary. One application of this is to specify {it:loglinear model for square tables}, also known as {it:mobility models}, as multinomial logistic models (cf. Logan 1983, Breen 1994). {p} Mobility models lie in the space between a model of independence and a saturated loglinear model. Special constraints are imposed on the second degree loglinear parameters in order to test for a particular pattern of association and to enhance interpretablity by reducing the number of parameters. Hout (1983) and Goodman (1984) contain overviews of commonly used mobility models. {p} When treated as multinomial logit models, mobility models utilize a different {it:response function}, i.e. a different set of restrictions on the response factor, for different levels of the "other" variable. For example, in a quasi-independence model for father's occupation by son's occupation, son's occupation would be specified as the response factor. For category {it:i} of father's occupation, the response function would be category {it:i} of son's occupation versus the other categories. The model would be specified as: {input:mclgen occ} {input:gen d1=(focc==1)*(occ==1)} {input:gen d2=(focc==2)*(occ==2)} {input:gen d3=(focc==3)*(occ==3)} {input:gen d4=(focc==4)*(occ==4)} {input:gen d5=(focc==5)*(occ==5)} {input:xi: mclest i.occ d* i.occ|black i.occ|educ} {p} The specification {cmd:(occ==}{it:i}{cmd:)} represents the reponse function for a particular dummy variable. This dummy variable will produce the logit of landing in category {it:i} versus some other category. So {cmd:d1} indicates the effect of father's occupation being equal to 1 on the logit for the son being in category 1 versus some other category. {p} As it turns out, once the data have been transformed into a {it:person choice file}, mobility models can be specified in an {cmd:mcl} model in the same way they would be in a loglinear model. So the much more compact specification: {input:gen diag=(focc==occ)*focc} {input:xi: mclest i.occ i.diag i.occ|black i.occ|educ} {p} Could be used as well. Examples for estimating a number of mobility models are in the file {cmd:mobility.do} in the {cmd:desmat} package. Use {net get desmat:net get desmat} to download these ancillary files {title:Stereotyped Ordered Regression} {p} {cmd:mclest} is also able to estimate certain special designs incorporating both linear and multiplicative effects. One of these is the {it:Stereotyped Ordered Regression model} (Anderson 1984, DiPrete 1990). The SOR model is an alternative to the {it:proportional odds model} estimated by {help ologit}. The SOR model estimates a scaling metric for the response factor based on the effects of independent variables. The model has {it:J}-1 intercept parameters for a response factor with {it:J} categories, just like an unordered multinomial logit model. However, it has a single {hi:beta} parameter for each independent variable, together with {it:J}-2 independent {it:scale values} {hi:phi[j]} for the response factor. {p} Two restrictions must be placed on the scaling metric in order to identify the model. {cmd:mclest} sets the value for the first category to 0 and the value for the last category to 1 while estimating the model. For the final estimates, the scaling metric is also normalized, with a mean of 0 and a sum of squares of 1. {p} The SOR model can be specified as: {asis} log(P(Y==q)/P(Y==r)) = a[q]-a[r] + (phi[q]-phi[r])(b[1]X[1]+b[2]X[2]+ ... +b[K]X[K]) {smcl} {p 0 4} Where {break}{hi:Y} is the response factor with categories {it:j}=1 to {it:J}, {break}{hi:q} and {hi:r} are any two categories of {hi:Y}, {break}{hi:a[j]} represents the intercept parameters with suitable restrictions, {break}{hi:phi[j]} represents the scaling metric with suitable restrictions, {break}{hi:X[k]} represents independent variables with {it:k}=1 to {it:K}, and {break}{hi:b[k]} represents parameters of the independent variables. {p} Compare this to a standard multinomial logistic model: {asis} log(P(Y==q)/P(Y==r)) = a[q]-a[r] + (b[q1]-b[r1])X[1]+(b[q2]-b[r2])X[2]+ ... +(b[qK]-b[rK])X[K]) {smcl} {p} In a multinomial model, the difference between {hi:b[qk]} and {hi:b[rk]} show how the {cmd:logit(}{hi:q}{cmd:/}{hi:r}{cmd:)} is affected by {hi:X[k]}. In the SOR model, the degree of this effect equals {hi:(phi[q]-phi[r])b[k]X[k]}. The SOR model forces the effect on the logit for any two outcomes to be proportional for all independent variables, with the magnitude of the effect being determined by the {hi:b[k]} parameters. {p} A SOR model can be requested by specifying a varlist in the {cmd:sor} option. A SOR model with only one {hi:X[k]} variable would be trivial and equivalent to standard multinomial model since it contains the same number of parameters. A simple SOR model with two variables could be specified as: {input:use logan} {input:mclgen occ} {input:xi: mclest i.occ, sor(educ black)} {p} This model will contain 9 parameters: 4 intercept parameters, 3 independent {hi:phi[j]} parameters, and 2 {hi:b[k]} parameters. This is only slightly 3 less than for an unrestricted multinomial model. However, the parsimony of a SOR model does increase as the number of {hi:X[k]} variables increase. {p} The SOR model contains both linear and multiplicative elements. To estimate it, {cmd:mclest} iteratively estimates MCL models, first taking the {hi:phi[j]} scaling metric as given and estimating the {hi:b[k]} parameters, then taking the {hi:b[k]} parameters as given and estimating the {hi:phi[j]} parameters. This continues until the change in log likelihood between successive MCL models is less than the value specified in the {cmd:sortol} option (defalut .0001) or the maximum number of iterations specified in the {cmd:soriter} option is exceeded (default 20). {p} As a result of this estimation procedure, no standard errors can be given for the {hi:phi[j]} parameters and standard errors for the remaining parameters are conditional, given the scaling metric. In addition, the model degrees of freedom reported by {cmd:clogit} are not correct since the estimates for the scaling metric are not taken into account. See the {hi:Model fit information} at the end of the output for the correct number of degrees of freedom. {cmd:e(df_m)} in the saved results also contains the correct degrees of freedom. {title:Row and Columns model 2} {p} A second special model that can be estimated by {cmd:mclest} is Goodman's (1979) Row and Columns model 2. Originally developed for frequency tables, the RC2 model estimates scaling metrics for both the dependent variable and one of the independent variables. The association between the two variables can then be expressed through a single parameter {hi:mu}. The scaling metric for the dependent variable is {hi:phi[j]} as in the SOR model, the scaling metric for the independent variable is {hi:sigma[v]}. Two restrictions must be imposed on {hi:phi[j]} and {hi:sigma[v]} to identify the model. During estimation, {cmd:mclest} sets {hi:phi}[1]={hi:sigma}[1]=0 and {hi:phi}[{hi:J}]={hi:sigma}[{hi:V}]=1. The final estimates are also given for normalized scale values, where mean({hi:phi}[{hi:j}])={hi:mean}(sigma[{hi:v}])=0 and SS({hi:phi}[{hi:j}])=SS({hi:sigma}[{hi:v}])=1. {p} A model containing an RC2 effect could be specified as: {asis} logit(q/r) = a[q]-a[r] + (phi[q]-phi[r])mu*sigma[v] {smcl} {p} This model can be extended with standard and/or SOR effectsof independent variables. Indeed, the RC2 effects can be seen as the SOR effects of a categorical variable, scaled by {hi:mu}*{hi:sigma[v]}. {p} A variation of the RC2 model is the EQual Row and Columns model 2 (EQRC2), which as the name suggests uses the same scale for the dependent variable and the categorical independent. {asis} logit(q/r) = a[q]-a[r] + (phi[q]-phi[r])mu*phi[v] {smcl} {p} Another variation implemented in {cmd:mclest} allows the association mu between the dependent and independent variable to vary by one or more other variables. {asis} logit(q/r) = a[q]-a[r] + (phi[q]-phi[r])(mu[0]+mu[t]X[t])*phi[v] {smcl} {p} An overall association parameter mu[0] is estimated, together with mu[t] parameters indicating how the association changes for each independent variable X[t], t=1 to T. {p} An RC2 model is requested by specifying a varname in the {cmd:rc2} option. At present, only one variable can be used for the RC2 effect. Similarly, an EQRC2 model can be requested by specifying a varname in the {cmd:eqrc2} option. The {cmd:rc2} and {cmd:eqrc2} options are mutually exclusive. To let the overall association vary by one or more independent variables, specify a varlist in the {cmd:muby} option. {p} The following example estimates a quasi RC2 model for father's occupation, including both effects for identical categories (diag) and an rc2 effect. The overall association mu between father's occupation and respondent's occupation is allowed to vary by race. Further more, race and education are included in the model as covariates using a SOR effect. {input:use logan} {input:mclgen occ} {input:gen diag=(focc==occ)*focc} {input:xi: mclest i.occ i.diag, sor(educ black) rc2(focc) muby(black)} {p} Models containing RC2 or EQRC2 effects are estimated by iteratively running MCL models, as is the case for SOR models. Convergence criterion and maximum iterations are determined by the {cmd:sortol} and {cmd:soriter} options. As with SOR models, no standard errors are available for the sigma[v] metric and other standard errors are conditional on the {hi:phi[j]} and {hi:sigma[v]} estimates. See the {hi:Model fit information} at the end of the output for the correct number of model degrees of freedom. {p} For tabular data, a separate {net search rc2:rc2} program is also available from the {help ssc} archives. Note that {cmd:mclest} can estimate these models in many cases. The differences are that {cmd:mclest} will not estimate the main effects of the row variable and that interactions between the row variable and a grouping variable must be included in {cmd:rc2} for the models to be equivalent. An advantage of {cmd:rc2} is that it does not require a restructuring of the data by {cmd:mclgen}. {title:Saved results} {p} In addition to the results saved by {help clogit}, {cmd:mclest} saves the following matrices: {p 0 4} {cmd:e(phi)} {break}the phi scale with the first category fixed to 0, the last to 1 {p 0 4} {cmd:e(phi_n)} {break}the phi scale with mean 0 and sum of squares 1 {p 0 4} {cmd:e(df_m)} {break}the model degrees of freedom adjusted for the sigma and phi parameters {p} If the {cmd:rc2} or {cmd:eqrc2} option has been used, {cmd:mclest} also saves: {p 0 4} {cmd:e(sig)} {break}the sigma scale with the first category fixed to 0, the last to 1 {p 0 4} {cmd:e(sig_n)} {break}the sigma scale with mean 0 and sum of squares 1 {title:References} {p 0 4} Anderson, J.A. (1984). Regression and Ordered Categorical Variables. {it:Journal of the Royal Statistical Society}, Series B 46: 1-30. {p 0 4} Breen, Richard. (1994). Individual Level Models for Mobility Tables and Other Cross-Classifications. {it:Sociological Methods & Research} 33: 147-173. {p 0 4} DiPrete, Thomas A. (1990). Adding Covariates to Loglinear Models for the Study of Social Mobility. {it:American Sociological Review} 55: 757-773. {p 0 4} Goodman, Leo A. (1979). Multiplicative models for the analysis of occupational mobility tables and other kinds of cross-classification tables. {it:American Journal of Sociology} 84: 804-819. {p 0 4} Hendrickx, John, Ganzeboom, Harry B.G. (1998). Occupational Status Attainment in the Netherlands, 1920-1990. A Multinomial Logistic Analysis. {it:European Sociological Review} 14: 387-403. {p 0 4} Hout, Michael. (1983). {it:Mobility Tables}. Sage Publication 07-031. {p 0 4} Logan, John A. (1983). A Multivariate Model for Mobility Tables. {it:American Journal of Sociology} 89: 324-349. {p 0 4} Xie, Yu (2003). Association Model. In the {it:Encyclopedia of Social Science Research Methods}, edited by Michael Lewis-Beck, Alan Bryman and Tim Futing Liao. Thousand Oaks, Ca: Sage (2003). {p 0 4} {browse "http://www-personal.umich.edu/~yuxie/Research/Assoc-program.html":http://www-personal.umich.edu/~yuxie/Research/Assoc-program.html} {p} Direct comments to: {browse "mailto:John_Hendrickx@yahoo.com":John Hendrickx} {p} {cmd:mclest} is available at {browse "http://ideas.uqam.ca/ideas/data/bocbocode.html":SSC-IDEAS}. Use {help ssc} {cmd:install mcl} to obtain the latest version. {p} The packages {net search rc2:rc2}, {net search desmat:desmat}, {net search xi3:xi3}, are also available from the ssc archives. {title:Also see} {p 0 21} On-line: help for {help mclgen}, {help clogit}, {help mlogit}, {help ologit} {help desmat}, {help desrep}, {help xi}, {help xi3}, {help rc2} {p_end}