help mixlogit                                  (SJ7-4: st0133_1; SJ7-3: st0133)
-------------------------------------------------------------------------------

Title

mixlogit -- Mixed logit model

Syntax

mixlogit depvar [indepvars] [if] [in] [weight], group(varname) rand(varlist) [id(varname) ln(#) corr nrep(#) burn(#) numerical level(#) constraints(numlist) robust cluster(varname) maximize_options]

mixlpred newvar [if] [in] [, nrep(#) burn(#)]

mixlcov [, sd]

mixlbeta varlist [if] [in], saving(filename) [replace nrep(#) burn(#)]

fweights, iweights, and pweights are allowed (see weight), but they are interpreted to apply to decision-makers, not to individual observations.

Description

mixlogit fits mixed logit models by using maximum simulated likelihood (Train 2003). See Hole (2007) for a description of the module with examples.

The following postestimation commands are available after mixlogit:

mixlpred calculates predicted probabilities. The predictions are available both in and out of sample; type mixlpred ... if e(sample) ... if predictions are wanted for the estimation sample only.

mixlcov calculates the elements in the coefficient covariance matrix along with their standard errors. This command is relevant only when the coefficients are specified to be correlated; see the corr option below. mixlcov is a wrapper for nlcom (see [R] nlcom).

mixlbeta calculates individual-level parameters corresponding to the variables in the specified varlist using the method proposed by Revelt and Train (2000) (see also Train, 2003, ch. 11). The individual-level parameters are stored in a data file specified by the user. As with mixlpred, the predictions are available both in and out of sample; type mixlbeta ... if e(sample) ... if predictions are wanted for the estimation sample only.
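
As an illustration, the postestimation commands might be used as follows after fitting a model. This is a minimal sketch that assumes a dataset with the variables choice, speed, cost, group, and id laid out as in the Examples section; the new variable name p and the file name speed_beta are hypothetical.

. * fit a model with a fixed coefficient for cost and a random coefficient for speed
. mixlogit choice cost, group(group) id(id) rand(speed)
. * predicted probabilities for the estimation sample only (p is a hypothetical name)
. mixlpred p if e(sample)
. * individual-level parameters for speed, saved to the hypothetical file speed_beta
. mixlbeta speed if e(sample), saving(speed_beta) replace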

Options for mixlogit

group(varname) is required and specifies a numeric identifier variable for the choice occasions.

rand(varlist) is required and specifies the independent variables whose coefficients are random. The random coefficients can be specified to be normally or lognormally distributed (see the ln() option). The variables immediately following the dependent variable in the syntax are specified to have fixed coefficients (see the examples below).

id(varname) specifies a numeric identifier variable for the decision-makers. This option should be specified only when each individual makes several choices, that is, when the dataset is a panel.

ln(#) specifies that the last # variables in rand() have lognormally rather than normally distributed coefficients. The default is ln(0).

corr specifies that the random coefficients are correlated. The default is that they are independent. When the corr option is specified, the estimated parameters are the means of the (fixed and random) coefficients plus the elements of the lower-triangular matrix L, where the covariance matrix for the random coefficients is given by V = LL'. The estimated parameters are reported in the following order: the means of the fixed coefficients, the means of the random coefficients, and the elements of the L matrix. The mixlcov command can be used postestimation to obtain the elements in the V matrix along with their standard errors.

If the corr option is not specified, the estimated parameters are the means of the fixed coefficients and the means and standard deviations of the random coefficients, reported in that order. The sign of the estimated standard deviations is irrelevant. Although in practice the estimates may be negative, interpret them as being positive.

The sequence of the parameters is important to bear in mind when specifying starting values.
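
To make the parameterization concrete, consider two random coefficients. Writing the lower-triangular matrix L with elements l11, l21, and l22, the relation V = LL' expands to

V11 = l11^2
V21 = l11*l21
V22 = l21^2 + l22^2

so the variances and the covariance are nonlinear functions of the estimated parameters. A minimal sketch of the workflow, assuming a dataset with the variables used in the Examples section, could look like this:

. * two correlated, normally distributed random coefficients (illustrative specification)
. mixlogit choice, group(group) id(id) rand(speed cost) corr
. * elements of V = LL' with standard errors
. mixlcov
. * standard deviations of the random coefficients instead of the covariance matrix
. mixlcov, sd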

nrep(#) specifies the number of Halton draws used for the simulation. The default is nrep(50).

burn(#) specifies the number of initial sequence elements to drop when creating the Halton sequences. The default is burn(15). Specifying this option helps reduce the correlation between the sequences in each dimension. Train (2003, 230) recommends that # should be at least as large as the prime number used to generate the sequences. If there are K random coefficients, mixlogit uses the first K primes to generate the Halton draws.
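
One way to judge whether the number of draws is adequate is to refit the model with more draws and compare the estimates. The following is a sketch only: the variable names are those of the Examples section, the stored-estimate names r50 and r500 are hypothetical, and nrep(500) is an illustrative value rather than a recommendation from this help file. With one random coefficient, only the first prime (2) is used, so the default burn(15) already satisfies the guideline above.

. * default number of Halton draws
. mixlogit choice cost, group(group) id(id) rand(speed) nrep(50)
. estimates store r50
. * more draws; burn(15) is the default and is written out only for clarity
. mixlogit choice cost, group(group) id(id) rand(speed) nrep(500) burn(15)
. estimates store r500
. * compare coefficients and standard errors side by side
. estimates table r50 r500, se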

numerical specifies that numerical gradients should be used instead of analytical gradients (the default). This option is useful for replicating the results from earlier versions of mixlogit but should otherwise not be used.

level(#); see estimation options.

constraints(numlist); see estimation options.

robust, cluster(varname); see estimation options. The cluster variable must be numeric.

maximize_options: difficult, technique(algorithm_spec), iterate(#), trace, gradient, showstep, hessian, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#), from(init_specs); see maximize. technique(bhhh) is not allowed.

Options for mixlpred

nrep(#) specifies the number of Halton draws used for the simulation. The default is nrep(50).

burn(#) specifies the number of initial sequence elements to drop when creating the Halton sequences. The default is burn(15). Specifying this option helps reduce the correlation between the sequences in each dimension. Train (2003, 230) recommends that # should be at least as large as the prime number used to generate the sequences. If there are K random coefficients, mixlogit uses the first K primes to generate the Halton draws.

Option for mixlcov

sd reports the standard deviations of the correlated coefficients instead of the covariance matrix.

Options for mixlbeta

saving(filename) is required and saves the individual-level parameters to filename.

replace overwrites filename if it already exists.

nrep(#) specifies the number of Halton draws used for the simulation. The default is nrep(50).

burn(#) specifies the number of initial sequence elements to drop when creating the Halton sequences. The default is burn(15). Specifying this option helps reduce the correlation between the sequences in each dimension. Train (2003, 230) recommends that # should be at least as large as the prime number used to generate the sequences. If there are K random coefficients, mixlogit uses the first K primes to generate the Halton draws.

Examples

Consider the following example dataset, which contains the data for one individual who makes two choices. On the first choice occasion, the individual has three alternatives, and on the second, four. choice is the dependent variable, and speed and cost are the independent variables (the alternative attributes):

choice  speed  cost  group  id
     0      5     3      1   1
     1      8     4      1   1
     0      6     3      1   1
     0      3     2      2   1
     0      2     2      2   1
     1      5     4      2   1
     0      6     4      2   1

A mixed logit model where speed has a normally distributed coefficient and cost has a fixed coefficient can be specified as follows:

. mixlogit choice cost, group(group) id(id) rand(speed)

A model where speed has a normally distributed coefficient and cost has a lognormally distributed coefficient can be specified as follows. Because the lognormal distribution implies a positive coefficient, and the coefficient for cost is expected to be negative, we first generate the variable mcost = -cost:

. gen mcost = -cost
. mixlogit choice, group(group) id(id) rand(speed mcost) ln(1)
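
When interpreting a lognormally distributed coefficient, it can help to convert the estimates to the scale of the coefficient itself. The sketch below assumes, as described in Hole (2007), that the parameters reported for a lognormal coefficient are the mean and standard deviation of the natural logarithm of the coefficient. If ln(beta) is normal with mean m and standard deviation s, then

median(beta) = exp(m)
mean(beta) = exp(m + s^2/2)
sd(beta) = exp(m + s^2/2)*sqrt(exp(s^2) - 1)

Because mcost = -cost, the implied coefficient on cost is -beta. The values -1 and 0.5 below are hypothetical numbers chosen for illustration, not estimates from any dataset, and the scalar names lnmean and lnsd are likewise made up.

. * hypothetical estimates of the mean and SD of the log coefficient
. scalar lnmean = -1
. scalar lnsd = 0.5
. * median, mean, and SD of the coefficient on mcost
. display exp(lnmean)
. display exp(lnmean + lnsd^2/2)
. display exp(lnmean + lnsd^2/2)*sqrt(exp(lnsd^2) - 1)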

mixlogit automatically generates starting values unless they are specified using the from() option. The starting values for the means are the estimated coefficients from a model where all coefficients are fixed (i.e., clogit), and the starting values for the standard deviations/elements in the L matrix are set to 0.1.
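
Starting values can also be supplied by hand, in which case the parameter order described under the corr option matters. The following sketch, which assumes the variables from the first example above and uses the hypothetical matrix name b0, builds a starting vector in the required order: the mean of the fixed coefficient (cost), the mean of the random coefficient (speed), and the standard deviation of the random coefficient.

. * conditional logit with the variables in the same order as the mixlogit parameters
. clogit choice cost speed, group(group)
. * append a starting value for the standard deviation of the speed coefficient
. matrix b0 = e(b), 0.1
. * copy tells the maximizer to match values by position rather than by name
. mixlogit choice cost, group(group) id(id) rand(speed) from(b0, copy)

Replacing 0.1 with a different value is a simple way to check whether the estimates are sensitive to the starting point.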

References

Hole AR. 2007. Fitting mixed logit models by using maximum simulated likelihood. The Stata Journal 7: 388-401.

Revelt D, Train K. 2000. Customer-specific taste parameters and mixed logit: Households' choice of electricity supplier. Working Paper, Department of Economics, University of California, Berkeley.

Train KE. 2003. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press.

Author

This command was written by Arne Risa Hole (a.r.hole@sheffield.ac.uk), Department of Economics, University of Sheffield. Comments and suggestions are welcome.

Also see

Manual: [R] clogit

Online: [R] clogit