help lclogit -------------------------------------------------------------------------------

Title

lclogit -- Latent class logit model via EM algorithm

Syntax

lclogit depvar [indepvars] [if] [in], group(varname) id(varname) nclasses(#) [membership(varlist) convergence(#) iterate(#) constraints(Class#1 numlist: Class#2 numlist: ...) seed(#) nolog]

Description

lclogit fits latent class conditional logit models through an Expectation-Maximisation algorithm proposed in Bhat (1997) and Train (2008). The data setup is the same as for clogit.

Note: Maarten Buis's fmlogit (findit fmlogit) must be installed before membership(varlist) can be used to let the class shares depend on the choice makers' characteristics.

Options for lclogit

group(varname) is required and specifies a numeric identifier variable for the choice situations.

id(varname) is required and specifies a numeric identifier variable for the choice makers or agents. With cross-section data, users should specify the same variable for both id(varname) and group(varname).
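As a minimal sketch of the cross-section case, assume a hypothetical dataset in which caseid identifies each respondent's single choice situation, choice is the dependent variable, and cost and time are attributes. None of these variable names come from lclogit; they are placeholders. The same variable is then passed to both options:

```stata
* Hypothetical variable names: caseid, choice, cost, time.
* With one choice situation per agent, group() and id() coincide.
lclogit choice cost time, group(caseid) id(caseid) nclasses(2)
```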

nclasses(#) specifies the number of latent classes used in the estimation. A minimum of 2 latent classes is required.

membership(varlist) specifies independent variables to enter the fractional multinomial logit model of class membership. These variables are assumed to be constant across alternatives and choice occasions for the same agent, age and household income being typical examples.
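The sketch below shows one way membership() might be used. It assumes two hypothetical agent-level variables, age and hhinc, that are constant within pid; they are illustrative names only and are not part of the example dataset described in this help file. The first lines check that fmlogit is available before it is needed.

```stata
* Confirm that fmlogit is installed; if it is not, search for it.
capture which fmlogit
if _rc {
    findit fmlogit
}

* age and hhinc are hypothetical agent-level variables, constant within pid.
lclogit y price contract local wknown tod seasonal, group(gid) id(pid) nclasses(2) membership(age hhinc)
```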

convergence(#) specifies the tolerance for the log likelihood. When the proportional increase in the log likelihood over the last five iterations is less than the specified criterion, lclogit declares convergence. The default is 0.00001.

iterate(#) specifies the maximum number of iterations. The default is 150.

seed(#) sets the seed for pseudo uniform random numbers. The default is c(seed).

The starting values for the taste coefficients are obtained by splitting the sample into nclasses() different subsamples and estimating a clogit model for each of them. During this process, a pseudo uniform random number is generated for each agent to assign the agent into a particular subsample. As for the starting values for the class shares, lclogit uses equal shares, i.e. 1/nclasses().
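Because the starting values depend on a random assignment of agents to subsamples, fixing the seed makes a run reproducible, and trying several seeds may help to check whether the EM algorithm settles on the same solution. A minimal sketch, using the variable names of the example dataset below and an arbitrary seed value chosen only for illustration:

```stata
* The seed value 12345 is arbitrary and chosen only for illustration.
lclogit y price contract local wknown tod seasonal, group(gid) id(pid) nclasses(3) seed(12345)

* The seed that was used is saved in the macro e(seed).
display "`e(seed)'"
```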

constraints(Class#1 numlist: Class#2 numlist: ...) specifies constraints to be imposed on the taste coefficients of the designated classes. For instance, suppose that x1 and x2 are attributes included among indepvars and the user wishes to restrict the coefficient on x1 to zero for Class3 and Class4, and the coefficient on x2 to 2 for Class4. The relevant series of commands would look like:

constraint 1 x1 = 0

constraint 2 x2 = 2

lclogit depvar indepvars [if] [in] , group(varname) id(varname) constraints(Class3 1: Class4 1 2) nclasses(8)

nolog suppresses the display of an iteration log.

Example

Consider the following extract, which shows the first rows of the data used in Huber and Train (2001). pid identifies the agent, gid the choice situation, y is the dependent variable, and price, contract, local, wknown, tod and seasonal are alternative-specific attributes:

pid  gid  y  price  contract  local  wknown  tod  seasonal
  1    1  0      7         5      0       1    0         0
  1    1  0      9         1      1       0    0         0
  1    1  0      0         0      0       0    0         1
  1    1  1      0         5      0       1    1         0
  1    2  0      7         0      0       1    0         0
  1    2  0      9         5      0       1    0         0
  1    2  1      0         1      1       0    1         0
  1    2  0      0         5      0       0    0         1

lclogit can be particularly useful for the nonparametric estimation of mixing distributions. Indeed, when the number of latent classes increases, the true mixing distribution of the coefficients can be approximated nonparametrically.

Latent class models have been estimated via gradient-based algorithms, such as Newton-Raphson or BHHH. However, the estimation through standard optimization techniques becomes difficult when the number of parameters increases. In this case an EM procedure could help as it requires the repeated evaluation of a function that is far easier to maximize.

A first step when fitting latent class models is to choose the number of latent classes. Train (2008) bases this decision on goodness-of-fit measures such as the AIC or the BIC. The following commands fit models with 2 to 11 classes using lclogit and store the BIC of each:

. use http://fmwww.bc.edu/repec/bocode/t/traindata.dta, clear
. forvalues c = 2/11 {
      lclogit y price contract local wknown tod seasonal, id(pid) gr(gid) ncl(`c')
      scalar bic_`c' = e(bic)
  }
. forvalues c = 2/11 {
      display bic_`c'
  }
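Rather than comparing the displayed values by eye, the smallest BIC can be picked out in a short loop. This is only a sketch: it assumes that the scalars bic_2 to bic_11 created by the commands above still exist in memory, and the local macro name best is an illustrative choice.

```stata
* Assumes the scalars bic_2, ..., bic_11 from the loop above are in memory.
local best = 2
forvalues c = 3/11 {
    if bic_`c' < bic_`best' {
        local best = `c'
    }
}
display "Lowest BIC: " bic_`best' " with `best' classes"
```

The model with the selected number of classes can then be refitted by passing that number to nclasses().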

Saved results

lclogit saves the following in e():

Scalars
    e(N)            number of observations
    e(N_g)          number of choice situations identified by group()
    e(N_i)          number of agents identified by id()
    e(nclasses)     number of latent classes
    e(ll)           log likelihood
    e(bic)          Bayesian information criterion
    e(aic)          Akaike information criterion
    e(caic)         consistent Akaike information criterion

Macros
    e(cmd)          lclogit
    e(title)        Model estimated via EM algorithm
    e(group)        name of group() variable
    e(id)           name of id() variable
    e(indepvars)    names of independent variables in the choice model
    e(indepvars2)   names of independent variables in the class membership model
    e(seed)         pseudo random number seed

Matrices
    e(b)            vector of taste coefficients followed by class membership model coefficients
    e(B)            matrix of taste coefficients
    e(P)            vector of (estimation sample average) class shares
    e(PB)           vector of weighted average choice model coefficients, where weights = class shares
    e(CB)           (estimation sample average) covariance matrix of choice model coefficients
    e(Cns)          constraints matrix

Functions
    e(sample)       marks estimation sample
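After estimation, the saved results can be inspected with standard Stata commands. The lines below are a small sketch that assumes an lclogit model has just been fitted, so that the e() results listed above are in memory:

```stata
* Scalars and macros can be shown with display.
display "Classes: " e(nclasses) "   log likelihood: " e(ll)
display "`e(indepvars)'"

* Matrices can be listed directly from e().
matrix list e(B)
matrix list e(P)
matrix list e(PB)
```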

References

Bhat, C., 1997. An endogenous segmentation mode choice model with an application to intercity travel. Transportation Science 31, 34-48.

Train, K., 2008. EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling 1 (1), 40-69.

Huber, J. and K. Train, 2001. On the similarity of classical and Bayesian estimates of individual mean partworths. Marketing Letters 12, 259-269.

Authors

This command was written by Daniele Pacifico and Hong il Yoo. Comments and suggestions are welcome.

Daniele Pacifico (daniele.pacifico@tesoro.it): Italian Department of the Treasury, Italy.

Hong il Yoo (h.yoo@unsw.edu.au): School of Economics, University of New South Wales, Australia.

Also see

Online: [R] lclogit, lclogit postestimation, fmlogit