{smcl}
{* MLB 22Mar2012}{...}
{* MLB 10Apr2010}{...}
{* MLB 14Jul2009}{...}
{hline}
help for {hi:seqlogit}
{hline}

{title:Sequential logit model}

{p 8 17 2}
{cmd:seqlogit} {depvar} [{indepvars}] {ifin} {weight} 
{cmd:,} 
{opt tree(tree)}
[ 
{opt ofint:erest(varname)}
{opt over(varlist)}
{opt sd(numlist)}
{opt deltasd(varname numlist)}
{opt rho(#)}
{cmd:{c -(} }
{opt pr(numlist)} |
{opt mn(# # , # # [, # #, etc.])} |
{opt uniform}
{cmd:{c )-}} 
{opt draws(#)}
{opt drawstart(#)}
{cmd:or}
{opt c:onstraints(numlist)}
{cmdab:r:obust}
{opt cl:uster(clustervar)}
{cmd:nolog}
{opt l:evel(#)} 
{it:{help seqlogit##maximize_options:maximize_options}} 
]

{p 4 4 2}{cmd:by} {it:...} {cmd::} may be used with {cmd:seqlogit}; see help
{help by}. 

{p 4 4 2}{cmd:pweight}s, {cmd:fweight}s and {cmd:iweight}s are allowed; see 
help {help weights}.


{title:Description}

{p 4 4 2} {cmd:seqlogit} fits a sequential logit model. This model is known 
under a variety of other names: sequential response model (Maddala 1983), 
continuation ratio logit (Agresti 2002), model for nested dichotomies 
(Fox 1997), and the Mare model (Shavit and Blossfeld 1993), after Mare (1981). 

{p 4 4 2}A sequential logit model can be estimated quite simply by estimating 
a number of {help logit} models. The {cmd:seqlogit} package serves three 
additional purposes: First, it makes it easier to {help test} hypotheses across
transitions since the entire model is estimated simultaneously. Second, it
implements the decomposition proposed by Buis (2010a) of the effect of an 
explanatory variable on the outcome of the process described by the sequential 
logit into the contributions of each of the transitions. The implementation is 
discussed in {help seqlogit postestimation}. Third, it implements and extends
the strategy proposed by Buis (2011) of doing a sensitivity analysis to 
investigate the potential influence of unobserved variables.
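
{p 4 4 2}As a minimal sketch of the separate-{help logit} approach mentioned 
above: a three-level outcome can be split into two binary transitions, each 
fitted with its own {cmd:logit} model. The variable names {cmd:ed3}, 
{cmd:pass1}, and {cmd:pass2} are hypothetical and serve only this 
illustration, as does the choice of explanatory variables.

{cmd}
    sysuse nlsw88, clear
    // three levels: 1 = less than high school, 2 = high school, 3 = more
    gen ed3 = cond(grade < 12, 1, cond(grade == 12, 2, 3)) if grade < .
    gen byr = (1988-age-1950)/10

    // transition 1: level 2 or 3 versus level 1
    gen byte pass1 = ed3 > 1 if ed3 < .
    logit pass1 byr south

    // transition 2: level 3 versus level 2,
    // only for those who did not stop at level 1
    gen byte pass2 = ed3 == 3 if ed3 > 1 & ed3 < .
    logit pass2 byr south
{txt}

{p 4 4 2}Fitting the transitions separately like this does not allow a 
direct {help test} of hypotheses across transitions, which is one of the 
reasons for estimating them simultaneously.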

{p 4 4 2} For this last purpose, the {cmd:seqlogit} package allows one to 
estimate a sequential logit given a scenario concerning the unobserved 
variables. These effects will only be estimated when the {opt sd()} option is
specified. A regular sequential logit model, which assumes that there is no 
unobserved heterogeneity, is estimated if the {opt sd()} option is not 
specified. The scenarios assume that these unobserved variables add up either 
to a standardized normally (Gaussian) distributed variable (the default), 
to a standardized discrete variable (when the {opt pr()} option is specified), to
a mixture of normal distributions (when the {opt mn()} option is specified), or to
a uniform distribution (when the {opt uniform} option is specified). The 
effects of this aggregate unobserved variable during each transition are specified 
in the {opt sd()} option. One can allow the effect of the unobserved variable to
change over another variable using the {opt deltasd()} option. The correlation 
during the first transition between this unobserved variable and the variable 
specified in the {cmd:ofinterest()} option is specified in the {cmd:rho()} 
option. The scenarios are estimated using maximum simulated likelihood, 
while the regular sequential logit model is estimated using regular maximum 
likelihood. Advice on how to use these different scenarios in a sensitivity
analysis is given here: {help seqlogit_sensitivity}.


{title:Options}

{dlgtab:Model}

{phang}
{opt tree(tree)} specifies the sequence of transitions that make up the model.
The transitions are separated with commas and the choices within transitions
are separated with colons. The levels are represented by the levels of the
{it:depvar}. It is thus convenient to code {it:depvar} as a series of integers. For
example, say there are three levels, 1, 2, and 3, and the first transition 
consists of a choice between value 1 versus values 2 or 3, and the second 
transition consists (for those who didn't choose value 1) of a choice between 
values 2 and 3. The tree option should then be: tree(1 : 2 3 , 2 : 3).

{p 8 8 2}All values of {it:depvar} must be specified in the tree and all 
values in the tree must occur in {it:depvar}. Furthermore, all levels must be 
reachable through one and only one path through the tree. 
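
{p 8 8 2}The three-level example above can be sketched as follows. The 
variable {cmd:ed3} is a hypothetical recoding created only for this 
illustration, and the explanatory variables are arbitrary.

{cmd}
    sysuse nlsw88, clear
    // code depvar as the integers 1, 2, and 3
    gen ed3 = cond(grade < 12, 1, cond(grade == 12, 2, 3)) if grade < .
    gen byr = (1988-age-1950)/10

    // first transition: 1 versus 2 or 3; second transition: 2 versus 3
    seqlogit ed3 byr south, tree(1 : 2 3 , 2 : 3)
{txt}

{p 8 8 2}Every value of {cmd:ed3} appears in the tree, and each value is 
reached through exactly one path, so this specification satisfies the 
requirements stated above.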

{phang}
{opt ofint:erest(varname)} specifies the variable whose effect will be 
decomposed when using the 
{helpb seqlogit postestimation##seqlogitdecomp:seqlogitdecomp} command. 
The variable specified is added to the list of explanatory variables. 

{phang}
{opt over(varlist)} specifies the variable(s) over which the effect of 
the variable specified in the {opt ofinterest()} option is allowed to change.
This/these variable(s) and the interaction effect between the variable(s) 
specified in {opt over()} and {opt ofinterest()} are added to the list of 
explanatory variables. {opt ofinterest()} needs to be specified when
specifying {opt over()}.

{phang}
{opt c:onstraints(numlist)} specifies linear constraints to be applied during
estimation; see {helpb constraint}.


{dlgtab:Scenarios}

{phang}
{opt sd(numlist)} specifies for each transition the effect of the 
standardized unobserved variable. If only one number is specified, this
effect will be assumed constant over transitions, otherwise the first
number will refer to the first transition, the second number to the second
transition, etc. The default is 0.

{phang}
{opt deltasd(varname numlist)} specifies how the effect of the unobserved
variable changes over {it:varname}. If only one number is specified,
this change will be assumed constant over transitions, otherwise the first
number will refer to the first transition, the second number to the second
transition, etc. The default is that the effect of the unobserved variable
is constant over all variables.

{phang}
{opt rho(#)} specifies the correlation of the unobserved variable and the 
variable specified in {cmd:ofinterest()}. The default is 0.
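
{p 8 8 2}A minimal sketch of a scenario that combines {cmd:sd()} and 
{cmd:rho()}. The numbers are arbitrary illustrative values, not 
recommendations, and the variables {cmd:ed}, {cmd:byr}, and {cmd:white} are 
created here only for this illustration.

{cmd}
    sysuse nlsw88, clear
    gen ed = cond(grade< 12, 1, cond(grade==12, 2, cond(grade<16,3,4))) if grade < .
    gen byr = (1988-age-1950)/10
    gen white = race == 1 if race < .

    // effect of the unobserved variable: 1 at the first transition,
    // 0.5 at the second and third transitions;
    // assumed correlation of 0.2 between it and white
    seqlogit ed byr south,                   ///
             ofinterest(white)               ///
             tree(1 : 2 3 4, 2 : 3 4, 3 : 4) ///
             sd(1 .5 .5) rho(.2)
{txt}

{p 8 8 2}Because no distribution option is given, the unobserved variable in 
this sketch is represented by a standard normal distribution.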

{phang}
The {cmd:pr()}, {cmd:mn()}, and {cmd:uniform} options govern the distribution
of the unobserved variable. They are mutually exclusive. If none of the 
{cmd:pr()}, {cmd:mn()}, and {cmd:uniform} options is specified but the 
{cmd:sd()} option is specified, then the unobserved variable will be 
represented by a standard normal distribution.

{phang}
{opt pr(numlist)} specifies that the unobserved variable is to be represented
by a discrete distribution. The numbers in {cmd:pr()} represent the 
proportion of observations that belong to each category. Since the numbers are
proportions, they all need to be larger than 0 and they need to add up to 1. 
The location of these categories will be chosen such that the mean is 0, the 
standard deviation is 1, and all categories are separated by the same distance.
The {cmd:pr()} option may not be specified without specifying the {cmd:sd()} 
option. 

{phang}
{opt mn(# # , # # [, # #, etc.])} specifies that the unobserved variable is to
be represented as a mixture of normal distributions. This option should consist
of multiple elements separated by commas, whereby each element consists of 
two numbers: the first is the proportion and the second is the mean of one of 
the components of the mixture distribution. The proportions should add up to 1. 
The means of the components will be transformed such that the mean of the 
distribution equals 0. The variances of the components are assumed to be equal
and are chosen such that the overall variance equals 1. This means that not
all proportion-mean combinations will lead to valid mixture distributions, in
which case {cmd:seqlogit} will report an error and stop. 

{phang}
{opt uniform} specifies that the unobserved variable is to be represented by
a uniform distribution with mean 0 and standard deviation 1. This could make
sense when we think that it is the rank order on the unobserved variables that
influences the outcome rather than some metric value, as the distribution of
rank scores is a uniform distribution.  This assumption could be justified  
when we assume that the unobserved variables are also hard to observe for 
the actors themselves, and that the rank score is easier to observe (Jill is
smarter than Jack, we just don't know how much smarter). 

{phang}
{opt draws(#)} specifies the number of pseudo random draws per observation used 
when calculating the simulated likelihood. The default is 100. Because maximum 
simulated likelihood is only used when the {cmd:sd()} option is specified, the 
{opt draws()} option can only be specified when the {cmd:sd()} option is specified.

{phang}
{opt drawstart(#)} specifies the index at which the Halton sequence starts. The
default is 15.
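
{p 8 8 2}A minimal sketch of the three alternative distributions, each 
combined with {cmd:sd(1)}. The proportions, means, and number of draws are 
arbitrary illustrative values, and the local macro {cmd:spec} is only a 
convenience introduced for this illustration. With {cmd:pr(.25 .5 .25)}, the 
requirements of mean 0, standard deviation 1, and equal spacing imply 
categories located at about -1.41, 0, and 1.41. A uniform distribution with 
mean 0 and standard deviation 1 ranges from about -1.73 to 1.73.

{cmd}
    sysuse nlsw88, clear
    gen ed = cond(grade< 12, 1, cond(grade==12, 2, cond(grade<16,3,4))) if grade < .
    gen byr = (1988-age-1950)/10
    gen white = race == 1 if race < .

    // options shared by all three scenarios
    local spec "ofinterest(white) tree(1 : 2 3 4, 2 : 3 4, 3 : 4) sd(1)"

    // discrete unobserved variable with three categories
    seqlogit ed byr south, `spec' pr(.25 .5 .25)

    // mixture of two normal components: proportion .5 each,
    // means -.5 and .5
    seqlogit ed byr south, `spec' mn(.5 -.5, .5 .5)

    // uniform unobserved variable, using 200 draws instead of the default 100
    seqlogit ed byr south, `spec' uniform draws(200)
{txt}

{p 8 8 2}The three distribution options are mutually exclusive, so each 
scenario is fitted in a separate call. Run the lines together, for example 
from a do-file, so that the local macro is still defined when it is used.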


{dlgtab:Reporting}

{phang}
{opt or} reports odds ratios.

{phang}
{opt r:obust} specifies that the Huber/White/sandwich estimator
of variance is to be used in place of the traditional calculation; see
{hi:[U] 23.14 Obtaining robust variance estimates}.  {cmd:robust}
combined with {cmd:cluster()} allows observations which are not
independent within cluster (although they must be independent between
clusters). 

{phang}
{opt cl:uster(clustervar)} specifies that the observations
are independent across groups (clusters) but not necessarily within groups.
{it:clustervar} specifies to which group each observation belongs; e.g.,
{cmd:cluster(personid)} in data with repeated observations on individuals.  See
{hi:[U] 23.14 Obtaining robust variance estimates}.  Specifying {cmd:cluster()}
implies {cmd:robust}.

{phang}
{opt l:evel(#)} specifies the confidence level, in percent,
for the confidence intervals of the coefficients; see help {help level}.

{phang}
{opt nolog} suppresses the iteration log of the log likelihood.

{marker maximize_options}{...}
{phang}
{opt maximize_options}:
{opt diff:icult},
{opt tech:nique(algorithm_spec)},
{opt iter:ate(#)},
{opt tr:ace},
{opt grad:ient},
{opt showstep},
{opt hess:ian},
{opt shownr:tolerance},
{opt tol:erance(#)},
{opt ltol:erance(#)},
{opt gtol:erance(#)},
{opt nrtol:erance(#)},
{opt nonrtol:erance(#)};
see {help maximize}.  These options are seldom used.


{title:Example}

{cmd}
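    sysuse nlsw88, clear
    // ed: 1 = less than high school, 2 = high school,
    //     3 = some college, 4 = college
    gen ed = cond(grade< 12, 1, ///
             cond(grade==12, 2, ///
             cond(grade<16,3,4))) if grade < .
    // birth year in decades since 1950
    gen byr = (1988-age-1950)/10
    gen white = race == 1 if race < .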

    seqlogit ed byr south,                   ///   
             ofinterest(white) over(byr)     ///
             tree(1 : 2 3 4, 2 : 3 4, 3 : 4) ///
             levels(1=6, 2=12, 3=14, 4= 16)  ///
             or

    seqlogitdecomp,                                     ///
             overat(byr -.5, byr 0, byr .4)             ///
             subtitle("1945" "1950" "1954")             ///
             eqlabel(`""finish" "high school""'         ///
                     `""high school v" "some college""' ///
                     `""some college v" "college""')    ///
             xline(0) yline(0)  

    seqlogit ed byr south,                   ///   
             ofinterest(white) over(byr)     ///
             tree(1 : 2 3 4, 2 : 3 4, 3 : 4) ///
             or sd(1)
    
    uhdesc
{txt}


{title:Author}

{p 4 4 2}Maarten L. Buis, Universitaet Tuebingen{break}maarten.buis@uni-tuebingen.de


{title:Suggested citation if using seqlogit in published work}

{p 4 4 2}
{cmd:seqlogit} is not an official Stata command. It is a free contribution 
to the research community, like a paper. Please cite it as such.

{p 4 4 2}
Buis, Maarten L.  2007. "SEQLOGIT: Stata module to fit a sequential logit model" 
{browse "http://ideas.repec.org/c/boc/bocode/s456843.html"}

{p 4 4 2} or:

{p 4 4 2}
Buis, Maarten L. 2010
``Chapter 6, Not all transitions are equal: The relationship between inequality of 
educational opportunities and inequality of educational outcomes'', In:
Buis, Maarten L. ``Inequality of Educational Outcome and Inequality of 
Educational Opportunity in the Netherlands during the 20th Century''.
PhD thesis.
{browse "http://www.maartenbuis.nl/dissertation/chap_6.pdf"}

{p 4 4 2}
Buis, Maarten L. 2011 
``The Consequences of Unobserved Heterogeneity in a Sequential Logit Model'', 
Research in Social Stratification and Mobility, 29(3), pp. 247-262.


{title:Acknowledgements}
{pstd}
I appreciate the useful comments I received at the 2009 Summer meeting of the 
RC28, and a bug report by Dominik Becker.


{title:References}

{p 4 4 2}
Agresti, Alan 2002
{it:Categorical Data Analysis, 2nd edition.}
Hoboken, NJ: Wiley-Interscience. 

{p 4 4 2}
Buis, Maarten L. 2010a
``Chapter 6, Not all transitions are equal: The relationship between inequality of 
educational opportunities and inequality of educational outcomes'', In:
Buis, Maarten L. ``Inequality of Educational Outcome and Inequality of 
Educational Opportunity in the Netherlands during the 20th Century''.
PhD thesis.
{browse "http://www.maartenbuis.nl/dissertation/chap_6.pdf"}

{p 4 4 2}
Buis, Maarten L. 2011 
``The Consequences of Unobserved Heterogeneity in a Sequential Logit Model'', 
Research in Social Stratification and Mobility, 29(3), pp. 247-262.
{browse "http://dx.doi.org/10.1016/j.rssm.2010.12.006"}

{p 4 4 2}
Fox, John 1997
{it:Applied Regression Analysis, Linear Models, and Related Methods.}
Thousand Oaks: Sage.

{p 4 4 2}
Maddala, G.S. 1983
{it:Limited Dependent and Qualitative Variables in Econometrics}
Cambridge: Cambridge University Press.

{p 4 4 2}
Mare, Robert D. 1981
``Change and Stability in Educational Stratification''
{it:American Sociological Review}, 46(1), pp. 72-87.

{p 4 4 2}
Shavit, Yossi and Hans-Peter Blossfeld 1993
{it:Persistent Inequality: Changing Educational Attainment in Thirteen Countries}
Boulder: Westview Press.

{title:Also see}

{p 4 13 2}
Online: help for {help seqlogit postestimation}, {help seqlogit_sensitivity}; help for {help logit}, {help mlogit}