Sequential logit model for Stata 9 and 10
seqlogit10 depvar [indepvars] [if] [in] [weight] , tree(tree) [ ofinterest(varname) over(varlist) sd(numlist) deltasd(varname numlist) rho(#) { pr(numlist) | mn(# # , # # [, # #, etc.]) | uniform } draws(#) drawstart(#) or constraints(numlist) robust cluster(clustervar) nolog level(#) maximize_options ]
by ... : may be used with seqlogit10; see help by.
pweights, fweights and iweights are allowed; see help weights.
Description
seqlogit10 fits a sequential logit model in Stata 9.2 or 10. Users with more recent versions of Stata should use seqlogit. This model is known under a variety of other names: sequential response model (Maddala 1983), continuation ratio logit (Agresti 2002), model for nested dichotomies (Fox 1997), and the Mare model (Shavit and Blossfeld 1993) (after Mare 1981).
A sequential logit model can be estimated quite simply by estimating a series of logit models, one for each transition. The seqlogit10 package serves three additional purposes. First, it makes it easier to test hypotheses across transitions, since the entire model is estimated simultaneously. Second, it implements the decomposition proposed by Buis (2010a), which splits the effect of an explanatory variable on the outcome of the process described by the sequential logit into the contributions of each of the transitions. This implementation is discussed in seqlogit10 postestimation. Third, it implements and extends the strategy proposed by Buis (2011) of performing a sensitivity analysis to investigate the potential influence of unobserved variables.
For this last purpose, seqlogit10 allows one to estimate a sequential logit model under a scenario concerning the unobserved variables. Such a scenario is used only when the sd() option is specified; if sd() is not specified, a regular sequential logit model, which assumes that there is no unobserved heterogeneity, is estimated. The scenarios assume that these unobserved variables add up to a standardized variable that follows either a normal (Gaussian) distribution (the default), a discrete distribution (when the pr() option is specified), a mixture of normal distributions (when the mn() option is specified), or a uniform distribution (when the uniform option is specified). The effects of this aggregate unobserved variable during each transition are specified in the sd() option. The deltasd() option allows the effect of the unobserved variable to change over another variable. The correlation during the first transition between this unobserved variable and the variable specified in the ofinterest() option is specified in the rho() option. The scenarios are estimated using maximum simulated likelihood, while the regular sequential logit model is estimated using regular maximum likelihood. Advice on how to use these different scenarios in a sensitivity analysis is given in seqlogit10_sensitivity.
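The link between a sequential logit and a series of logit models can be sketched as follows. This Python sketch is an illustration only, not the code used by seqlogit10. For the tree tree(1 : 2 3 , 2 : 3), it assumes (for illustration) that each transition's linear predictor gives the log odds of moving on to the later group; the probability of ending at a level is then the product of the transition probabilities along the path leading to it.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: turns log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def level_probabilities(xb1: float, xb2: float) -> dict[int, float]:
    """Probabilities of levels 1, 2 and 3 for tree(1 : 2 3 , 2 : 3).

    Assumption for illustration: xb1 is the log odds of choosing 2 or 3
    rather than 1, and xb2 the log odds of choosing 3 rather than 2.
    """
    p1 = logistic(xb1)  # transition 1: everybody is at risk
    p2 = logistic(xb2)  # transition 2: only those who chose 2 or 3
    return {1: 1.0 - p1, 2: p1 * (1.0 - p2), 3: p1 * p2}


probs = level_probabilities(0.5, -0.2)
print(probs, sum(probs.values()))
```

Because each factor involves only the observations at risk of that transition, the log likelihood is a sum of separate logit log likelihoods, which is why the model can be estimated as a series of logit models.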
Options
Model
tree(tree) specifies the sequence of transitions that make up the model. Transitions are separated by commas, and the choices within a transition are separated by colons. The levels are represented by the values of depvar, so it is convenient to code depvar as a series of integers. For example, say there are three levels, 1, 2, and 3; the first transition consists of a choice between value 1 versus values 2 or 3, and the second transition consists (for those who did not choose value 1) of a choice between values 2 and 3. The tree option should then be: tree(1 : 2 3 , 2 : 3).
All values of depvar must be specified in the tree and all values in the tree must occur in depvar. Furthermore, all levels must be reachable through one and only one path through the tree.
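The rules above can be checked mechanically. The following Python sketch parses a tree() string and raises an error when a rule is violated. The function names parse_tree and check_tree are hypothetical, the sketch assumes that each transition is listed after the transition whose choice it splits, and it is not seqlogit10's own parser.

```python
def parse_tree(spec: str) -> list[list[frozenset[int]]]:
    """Split a tree() string into transitions (commas) and choices (colons)."""
    return [
        [frozenset(int(v) for v in choice.split()) for choice in part.split(":")]
        for part in spec.split(",")
    ]


def check_tree(spec: str, depvar_levels: set[int]) -> None:
    """Raise ValueError if the tree breaks one of the rules (hypothetical check)."""
    transitions = parse_tree(spec)
    split_groups: list[frozenset[int]] = []  # value set handled by each transition
    for k, choices in enumerate(transitions, start=1):
        values = frozenset().union(*choices)
        if len(choices) < 2 or any(len(c) == 0 for c in choices):
            raise ValueError(f"transition {k} needs at least two non-empty choices")
        if sum(len(c) for c in choices) != len(values):
            raise ValueError(f"transition {k} uses a value more than once")
        if k == 1:
            # the first transition covers every level, and only levels of depvar
            if values != frozenset(depvar_levels):
                raise ValueError("tree and depvar do not contain the same values")
        else:
            # a later transition must split a multi-value choice made earlier
            earlier = [c for t in transitions[: k - 1] for c in t if len(c) > 1]
            if values not in earlier:
                raise ValueError(f"transition {k} does not split an earlier choice")
        if values in split_groups:
            raise ValueError(f"transition {k} repeats an earlier split")
        split_groups.append(values)
    # every choice that still contains several levels must be split later
    for choices in transitions:
        for c in choices:
            if len(c) > 1 and c not in split_groups:
                raise ValueError(f"levels {sorted(c)} are never separated")


check_tree("1 : 2 3 4, 2 : 3 4, 3 : 4", {1, 2, 3, 4})  # passes silently
```

Requiring every multi-value choice to be split exactly once is what guarantees that each level is reached through one and only one path.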
ofinterest(varname) specifies the variable whose effect will be decomposed when using the seqlogitdecomp10 command. The variable specified is added to the list of explanatory variables.
over(varlist) specifies the variable(s) over which the effect of the variable specified in the ofinterest() option is allowed to change. These variable(s) and the interaction terms between the variable(s) specified in over() and ofinterest() are added to the list of explanatory variables. ofinterest() must be specified when over() is specified.
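Because interaction terms are added, the effect of the ofinterest() variable within a transition depends on the values of the over() variables. A minimal Python sketch of the added variables for one observation and of the resulting effect in log odds; the column names and function names are hypothetical, and seqlogit10 estimates such coefficients separately for each transition.

```python
def added_columns(x: float, over: dict[str, float]) -> dict[str, float]:
    """Variables added for one observation: ofinterest, over, and their products.

    The column names are hypothetical; seqlogit10 chooses its own names.
    """
    cols = {"ofinterest": x}
    for name, z in over.items():
        cols[name] = z
        cols[f"ofinterest_X_{name}"] = x * z
    return cols


def ofinterest_effect(b_main: float, b_inter: dict[str, float], over: dict[str, float]) -> float:
    """Log-odds effect of ofinterest within one transition at given over() values."""
    return b_main + sum(b_inter[name] * z for name, z in over.items())


print(added_columns(1.0, {"byr": -0.5}))
print(ofinterest_effect(0.8, {"byr": -0.2}, {"byr": 0.4}))
```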
constraints(numlist) specifies linear constraints to be applied during estimation, see constraint.
Scenarios
sd(numlist) specifies for each transition the effect of the standardized unobserved variable. If only one number is specified, this effect will be assumed constant over transitions, otherwise the first number will refer to the first transition, the second number to the second transition, etc. The default is 0.
deltasd(varname numlist) specifies how the effect of the unobserved variable changes over varname. If only one number is specified, this change will be assumed constant over transitions, otherwise the first number will refer to the first transition, the second number to the second transition, etc. The default is that the effect of the unobserved variable is constant over all variables.
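A minimal Python sketch of how the numbers in sd() and deltasd() could map onto transition-specific effects. Expanding a single number to all transitions follows the description above; the linear form sd + deltasd * varname and the error for a wrong count of numbers are assumptions made for illustration, and the function names are hypothetical.

```python
def expand_numlist(values: list[float], n_transitions: int) -> list[float]:
    """One number applies to all transitions; otherwise one number per transition."""
    if len(values) == 1:
        return values * n_transitions
    if len(values) != n_transitions:  # assumed to be an error
        raise ValueError("give one number or one number per transition")
    return list(values)


def unobserved_effects(sd: list[float], deltasd: list[float], z: float,
                       n_transitions: int) -> list[float]:
    """Effect of the standardized unobserved variable in each transition for an
    observation with value z on the deltasd() variable (linear change assumed)."""
    base = expand_numlist(sd, n_transitions)
    # without deltasd() the effect does not change, i.e. a change of 0
    change = expand_numlist(deltasd, n_transitions) if deltasd else [0.0] * n_transitions
    return [b + d * z for b, d in zip(base, change)]


print(unobserved_effects([1.0, 0.5, 0.5], [0.1], 2.0, 3))
```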
rho(#) specifies the correlation during the first transition between the unobserved variable and the variable specified in ofinterest(). The default is 0.
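One standard way to construct a standardized unobserved variable u with correlation rho with a standardized ofinterest() variable x is u = rho*x + sqrt(1 - rho^2)*e, where e is a standardized variable uncorrelated with x. The Python sketch below shows this construction; whether seqlogit10 uses exactly this construction is an assumption, and the function name is hypothetical.

```python
import math


def correlated_unobserved(x: list[float], e: list[float], rho: float) -> list[float]:
    """Combine a standardized ofinterest variable x with an uncorrelated standardized
    variable e so that the result is standardized and correlates rho with x."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1 and 1")
    weight = math.sqrt(1.0 - rho * rho)
    return [rho * xi + weight * ei for xi, ei in zip(x, e)]


x = [1.0, -1.0, 1.0, -1.0]  # mean 0, sd 1
e = [1.0, 1.0, -1.0, -1.0]  # mean 0, sd 1, uncorrelated with x
u = correlated_unobserved(x, e, 0.3)
print(sum(ui * xi for ui, xi in zip(u, x)) / len(u))  # correlation with x
```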
The pr(), mn(), and uniform options govern the distribution of the unobserved variable and are mutually exclusive. If none of them is specified but the sd() option is specified, then the unobserved variable will be represented by a standard normal distribution.
pr(numlist) specifies that the unobserved variable is to be represented by a discrete distribution. The numbers in pr() represent the proportion of observations that belong to each category. Since the numbers are proportions, they all need to be larger than 0 and they need to add up to 1. The locations of these categories will be chosen such that the mean is 0, the standard deviation is 1, and all categories are separated by the same distance. The pr() option may not be specified without specifying the sd() option.
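The locations of the categories follow from standardizing equally spaced scores. A minimal Python sketch, assuming the scores 0, 1, ..., K-1 as the starting point (any equally spaced scores give the same result after standardizing); the function name is hypothetical and this is not seqlogit10's code.

```python
import math


def discrete_locations(pr: list[float]) -> list[float]:
    """Equally spaced locations with mean 0 and standard deviation 1 under pr."""
    if any(p <= 0 for p in pr) or not math.isclose(sum(pr), 1.0):
        raise ValueError("proportions must be larger than 0 and add up to 1")
    scores = range(len(pr))  # equally spaced starting scores 0, 1, ..., K-1
    mean = sum(p * j for p, j in zip(pr, scores))
    var = sum(p * (j - mean) ** 2 for p, j in zip(pr, scores))
    if var == 0.0:  # a single category cannot have standard deviation 1
        raise ValueError("need at least two categories")
    sd = math.sqrt(var)
    return [(j - mean) / sd for j in scores]


print(discrete_locations([0.5, 0.5]))
```

Standardizing is a linear transformation, so the equal spacing of the starting scores is preserved.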
mn(# # , # # [, # #, etc.]) specifies that the unobserved variable is to be represented as a mixture of normal distributions. This option consists of multiple elements separated by commas, where each element consists of two numbers: the first is the proportion and the second the mean of one of the components of the mixture distribution. The proportions should add up to 1. The means of the components will be transformed such that the mean of the mixture distribution equals 0. The variances of the components are assumed to be equal and are chosen such that the overall variance equals 1. This means that not all proportion-mean combinations lead to valid mixture distributions; in those cases seqlogit10 will report an error and stop.
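Why some combinations are invalid follows from the variance decomposition of a mixture: the total variance equals the common component variance plus the variance of the component means, so the component variance is 1 minus the variance of the centred means, and that must be positive. A Python sketch of this calculation, based on the description above rather than on seqlogit10's code; it assumes the means are only shifted (not rescaled), and the function name and the requirement that proportions be positive are assumptions.

```python
import math


def mixture_parameters(components: list[tuple[float, float]]) -> tuple[list[float], float]:
    """Centred component means and common component sd for mn(p1 m1, p2 m2, ...)."""
    props = [p for p, _ in components]
    if any(p <= 0 for p in props) or not math.isclose(sum(props), 1.0):
        raise ValueError("proportions must be larger than 0 and add up to 1")
    overall = sum(p * m for p, m in components)
    means = [m - overall for _, m in components]  # mixture mean becomes 0
    between = sum(p * m * m for p, m in zip(props, means))  # variance of the means
    within = 1.0 - between  # common variance that makes the total variance 1
    if within <= 0.0:
        raise ValueError("no valid mixture: the means are too far apart")
    return means, math.sqrt(within)


print(mixture_parameters([(0.5, 0.0), (0.5, 1.0)]))
```

For example, mn(.5 -1, .5 1) has means whose variance is already 1, leaving no room for component variance, so it is rejected.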
uniform specifies that the unobserved variable is to be represented by a uniform distribution with mean 0 and standard deviation 1. This could make sense when we think that it is the rank order on the unobserved variables, rather than some metric value, that influences the outcome, as the distribution of rank scores is a uniform distribution. This assumption could be justified when we assume that the unobserved variables are also hard to observe for the actors themselves, and that the rank order is easier to observe (Jill is smarter than Jack; we just don't know how much smarter).
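A uniform distribution on (a, b) has variance (b - a)^2/12, so mean 0 and standard deviation 1 imply the interval (-sqrt(3), sqrt(3)). The following Python sketch, with a hypothetical function name, maps a rank among n actors onto that scale, which illustrates the rank interpretation above; it is not part of seqlogit10.

```python
import math


def rank_to_uniform_score(rank: int, n: int) -> float:
    """Map rank (1 = lowest, n = highest) onto a uniform scale with mean 0, sd 1."""
    q = (rank - 0.5) / n  # midpoint of this rank's share of the interval (0, 1)
    return math.sqrt(12.0) * (q - 0.5)  # stretch (0, 1) to (-sqrt(3), sqrt(3))


print([round(rank_to_uniform_score(r, 4), 3) for r in range(1, 5)])
```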
draws(#) specifies the number of pseudo-random draws per observation used when calculating the simulated likelihood. The default is 100. Because maximum simulated likelihood is only used when the sd() option is specified, the draws() option can only be specified together with the sd() option.
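Maximum simulated likelihood replaces the integral over the unobserved variable by an average over draws. A minimal Python sketch of the simulated probability of one observation's sequence of transition outcomes under the default normal scenario; it ignores rho() and deltasd(), and the argument names and the coding of transition outcomes are assumptions rather than seqlogit10's internals. The log simulated likelihood is the sum over observations of the log of this average.

```python
import math
from statistics import NormalDist


def logistic(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulated_probability(xb: list[float], passed: list[bool], sd: list[float],
                          draws: list[float]) -> float:
    """Average over draws of the probability of one observation's transition outcomes.

    xb     : linear predictor of each transition the observation was at risk of
    passed : True if the observation chose the later group in that transition
    sd     : effect of the standardized unobserved variable in each transition
    draws  : numbers strictly between 0 and 1 (for instance Halton draws)
    """
    standard_normal = NormalDist()
    total = 0.0
    for q in draws:
        u = standard_normal.inv_cdf(q)  # one value of the unobserved variable
        prob = 1.0
        for x, y, s in zip(xb, passed, sd):
            p = logistic(x + s * u)  # transition probability given u
            prob *= p if y else 1.0 - p
        total += prob
    return total / len(draws)


print(simulated_probability([0.4, -0.3], [True, False], [1.0, 1.0], [0.25, 0.5, 0.75]))
```

With sd() equal to 0 every draw gives the same probability, so the simulated likelihood reduces to the regular likelihood.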
drawstart(#) specifies the index at which the Halton sequence starts. The default is 15.
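A Halton sequence is built from the radical inverse of the index: the digits of the index written in a prime base are mirrored around the radix point. The Python sketch below shows how the elements are formed and what starting at index 15 means. Which prime base(s) seqlogit10 uses, and how the draws are assigned to observations, are not stated here; base 2 is an assumption, and the function names are hypothetical.

```python
def halton(index: int, base: int = 2) -> float:
    """Element `index` (1, 2, ...) of the Halton sequence in `base` (radical inverse)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction  # mirror each digit to the other side of the point
        fraction /= base
    return result


def halton_draws(n_draws: int, start: int = 15, base: int = 2) -> list[float]:
    """n_draws consecutive Halton elements beginning at position `start`."""
    return [halton(i, base) for i in range(start, start + n_draws)]


print(halton_draws(5))
```

Skipping the first elements of the sequence is a common practice in simulation-based estimation; drawstart() controls how many are skipped.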
Reporting
or reports the estimated coefficients as odds ratios.
robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation; see [U] 23.14 Obtaining robust variance estimates. robust combined with cluster() allows observations that are not independent within clusters (although they must be independent between clusters).
cluster(clustervar) specifies that the observations are independent across groups (clusters) but not necessarily within groups. clustervar specifies to which group each observation belongs; e.g., cluster(personid) in data with repeated observations on individuals. See [U] 23.14 Obtaining robust variance estimates. Specifying cluster() implies robust.
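The sandwich estimator places the inverse Hessian on both sides of the sum of outer products of the scores; with cluster(), the scores are first summed within each cluster. A one-parameter Python sketch of this idea, which omits any finite-sample adjustment and is not Stata's implementation; the function name is hypothetical.

```python
from typing import Hashable, Optional, Sequence


def sandwich_variance(scores: Sequence[float], hessian: float,
                      clusters: Optional[Sequence[Hashable]] = None) -> float:
    """Robust variance of a single parameter: H^-1 * B * H^-1 = B / H^2.

    scores   : derivative of each observation's log likelihood contribution
    hessian  : second derivative of the total log likelihood
    clusters : optional cluster identifier for each observation
    """
    if clusters is None:
        clusters = range(len(scores))  # every observation is its own cluster
    totals: dict = {}
    for s, g in zip(scores, clusters):
        totals[g] = totals.get(g, 0.0) + s  # sum the scores within each cluster
    meat = sum(t * t for t in totals.values())  # B: sum of squared cluster totals
    return meat / (hessian * hessian)


print(sandwich_variance([1.0, -1.0, 2.0, -2.0], -2.0))
print(sandwich_variance([1.0, -1.0, 2.0, -2.0], -2.0, clusters=[1, 2, 1, 2]))
```

Summing within clusters lets scores of observations in the same cluster offset or reinforce each other, which is how dependence within clusters enters the variance.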
level(#) specifies the confidence level, in percent, for the confidence intervals of the coefficients; see help level.
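With or, a coefficient b is reported as exp(b). A Python sketch, assuming the usual approach of exponentiating the bounds of a level(#) Wald confidence interval for b; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(b: float, se: float, level: float = 95.0) -> tuple[float, float, float]:
    """Odds ratio exp(b) and the exponentiated bounds of a Wald interval for b."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level / 100.0) / 2.0)  # about 1.96 for 95
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


print(odds_ratio_ci(0.5, 0.2, level=90))
```

Because the bounds are exponentiated, the interval for the odds ratio is not symmetric around exp(b).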
nolog suppresses the iteration log of the log likelihood.
maximize_options: difficult, technique(algorithm_spec), iterate(#), trace, gradient, showstep, hessian, shownrtolerance, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#), nonrtolerance(#); see maximize. These options are seldom used.
Example
    sysuse nlsw88, clear
    gen ed = cond(grade< 12, 1, ///
             cond(grade==12, 2, ///
             cond(grade<16,3,4))) if grade < .
    gen byr = (1988-age-1950)/10
    gen white = race == 1 if race < .

    seqlogit10 ed byr south, ///
        ofinterest(white) over(byr) ///
        tree(1 : 2 3 4, 2 : 3 4, 3 : 4) ///
        levels(1=6, 2=12, 3=14, 4= 16) ///
        or

    seqlogitdecomp10, ///
        overat(byr -.5, byr 0, byr .4) ///
        subtitle("1945" "1950" "1954") ///
        eqlabel(`""finish" "high school""' ///
                `""high school v" "some college""' ///
                `""some college v" "college""') ///
        xline(0) yline(0)

    seqlogit10 ed byr south, ///
        ofinterest(white) over(byr) ///
        tree(1 : 2 3 4, 2 : 3 4, 3 : 4) ///
        or sd(1)
    uhdesc10
Author
Maarten L. Buis
Wissenschaftszentrum Berlin für Sozialforschung, WZB
Research unit Skill Formation and Labor Markets
maarten.buis@wzb.eu
Suggested citation if using seqlogit10 in published work
seqlogit10 is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such.
Buis, Maarten L. 2007. "SEQLOGIT: Stata module to fit a sequential logit model" http://ideas.repec.org/c/boc/bocode/s456843.html
or:
Buis, Maarten L. 2010 ``Chapter 6, Not all transitions are equal: The relationship between inequality of educational opportunities and inequality of educational outcomes'', In: Buis, Maarten L. ``Inequality of Educational Outcome and Inequality of Educational Opportunity in the Netherlands during the 20th Century''. PhD thesis. http://www.maartenbuis.nl/dissertation/chap_6.pdf
Buis, Maarten L. 2011 ``The Consequences of Unobserved Heterogeneity in a Sequential Logit Model'', Research in Social Stratification and Mobility, 29(3), pp. 247-262.
Acknowledgements
I appreciate the useful comments I received at the 2009 Summer meeting of the RC28, and a bug report by Dominik Becker.
References
Agresti, Alan 2002. Categorical Data Analysis, 2nd edition. Hoboken, NJ: Wiley-Interscience.
Buis, Maarten L. 2010a ``Chapter 6, Not all transitions are equal: The relationship between inequality of educational opportunities and inequality of educational outcomes'', In: Buis, Maarten L. ``Inequality of Educational Outcome and Inequality of Educational Opportunity in the Netherlands during the 20th Century''. PhD thesis. http://www.maartenbuis.nl/dissertation/chap_6.pdf
Buis, Maarten L. 2011 ``The Consequences of Unobserved Heterogeneity in a Sequential Logit Model'', Research in Social Stratification and Mobility, 29(3), pp. 247-262. http://dx.doi.org/10.1016/j.rssm.2010.12.006
Fox, John 1997 Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks: Sage.
Maddala, G.S. 1983 Limited Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
Mare, Robert D. 1981 ``Change and Stability in Educational Stratification'', American Sociological Review, 46(1), pp. 72-87.
Shavit, Yossi and Hans-Peter Blossfeld 1993 Persistent Inequality: Changing Educational Attainment in Thirteen Countries. Boulder: Westview Press.
Also see
Online: help for seqlogit10 postestimation, seqlogit10_sensitivity; help for logit, mlogit