-------------------------------------------------------------------------------
help for mvprobit                          Cappellari and Jenkins (15jan2003)
-------------------------------------------------------------------------------

Multivariate probit models, estimated by Simulated Maximum Likelihood

mvprobit equation1 equation2 ... equationM [weight] [if exp] [in range]
    [, draws(#) seed(#) beta0 atrho0(matrix_name) robust cluster(varname)
    constraints(numlist) level(#) maximize_options]

where each equation is specified as

( [eqname:] depvar [=] [varlist] [, noconstant] )

by ...: may be used with mvprobit; see help by.

pweights, aweights, fweights, and iweights are allowed; see help weights.

mvprobit shares the features of all estimation commands; see help est.

mvprobit typed without arguments redisplays the last estimates. The level option may be used.

Predictions based on mvprobit estimates, including predicted joint and marginal probabilities, can be derived using mvppred.

Description

mvprobit estimates M-equation probit models by the method of simulated maximum likelihood (SML). (Cf. probit and biprobit, which estimate 1-equation and 2-equation probit models by maximum likelihood.) The variance-covariance matrix of the cross-equation error terms has values of 1 on the leading diagonal, and the off-diagonal elements are correlations to be estimated (rhoji = rhoij, and rhoii = 1, for all i = 1,...,M, where rhoji denotes the correlation between the error terms of equations j and i).

mvprobit uses the Geweke-Hajivassiliou-Keane (GHK) simulator to evaluate the M-dimensional Normal integrals in the likelihood function. For each observation, a likelihood contribution is calculated for each replication, and the simulated likelihood contribution is the average of the values derived from all the replications. The simulated likelihood function for the sample as a whole is then maximized using standard methods (ml in this case). For a brief description of the GHK smooth recursive simulator, see Greene (2000: 183-185), who also provides references to the literature.

Under standard conditions, the SML estimator is consistent as the number of observations and the number of draws tend to infinity, and is asymptotically equivalent to the true maximum likelihood estimator as the ratio of the square root of the sample size to the number of draws tends to zero. Thus, other things equal, the more draws, the better. In practice, however, it has been observed that a relatively small number of draws may work well for `smooth' likelihoods. An integer number corresponding to the square root of the number of observations is often used for the number of random draws. For small sample sizes, a larger number of draws may be required.
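As a minimal sketch of the square-root rule of thumb mentioned above, the fragment below derives the number of draws from the sample size. It assumes the school.dta data and the two-equation specification used in the Examples section. The local macro name R is illustrative only, and the count covers all observations in memory rather than the estimation sample.

```stata
* Illustrative only: choose draws() as roughly the square root of N
use http://www.stata-press.com/data/r7/school.dta, clear
quietly count
local R = ceil(sqrt(r(N)))
display "N = " r(N) ", draws = `R'"
mvprobit (private = years logptax loginc) (vote = years logptax loginc), draws(`R')
```

If if or in qualifiers or missing values restrict the estimation sample, the count would need to be restricted in the same way.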

Estimation is numerically intensive, and may be very slow if the data set is large, if the number of draws is large, or (especially) if the number of equations is large. Users may also need to set matsize and set memory to values above the default ones. (See help for matsize and memory.) Use of the atrho0 option may speed up convergence.

Models for which the matrix of rhos is close to not being positive definite are likely to be difficult to maximize. (The Cholesky factorization used by SML requires positive definiteness.) This is more likely if |rhoji| is close to one. In these cases, ml may report difficulties calculating numerical derivatives and a non-concave log-likelihood. In difficult maximization problems, the message "Warning: cannot do Cholesky factorization of rho matrix" may appear between iterations. It may be safely ignored if the maximization proceeds to a satisfactory conclusion.

Results may differ depending on the sort order of the data, because the sort order affects which values of the random variable(s) get allocated to which observation. (Be assured, however, that mvprobit does not change the sort order of the data.) The larger the number of random draws used, the smaller this potential problem.
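One way to gauge how sensitive the estimates are to the simulation settings is to re-estimate the model under different seeds and numbers of draws and compare the results. The loop below is a sketch under stated assumptions: it uses the school.dta data from the Examples section, the seed and draw values are arbitrary choices, and it assumes a Stata version that supports foreach and /// line continuation.

```stata
* Illustrative check: compare rho21 across seeds and numbers of draws
use http://www.stata-press.com/data/r7/school.dta, clear
foreach s in 123456789 987654321 {
    foreach d in 5 50 {
        quietly mvprobit (private = years logptax loginc) ///
            (vote = years logptax loginc), draws(`d') seed(`s')
        display "seed = `s'  draws = `d'  rho21 = " %6.4f e(rho21)
    }
}
```

If the reported correlations differ noticeably across seeds at the smaller number of draws but agree at the larger one, that suggests the smaller number is too low for the data at hand.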

Options

draws(#) specifies the number of random variates drawn when calculating the simulated likelihood. The default is 5. (See the discussion above concerning the number of draws.)

seed(#) specifies the initial value of the (pseudo-)random-number seed used by the uniform() function in the simulation process. The value should be an integer (the default value is 123456789). Warning: if the number of draws is `small', changes in the seed value may lead to surprisingly large changes in estimates.

beta0 specifies that the estimates of the marginal probit regressions (used to provide starting values) are reported.

atrho0(matrix_name) allows users to specify starting values for the off-diagonal elements of the rho matrix that are different from the default values (which are all zero). More precisely, the matrix matrix_name contains values of the incidental parameter in each /atrhoji equation, i.e. atanh(rhoji) = .5*ln((1+rhoji)/(1-rhoji)). Matrix matrix_name must have properly named column names. E.g. if a starting value in /atrho21 is being set, one would first use the command matrix matrix_name = (value), followed by matrix colnames matrix_name = atrho21:_cons. Between 1 and M(M-1)/2 /atrhoji starting values may be specified, where j = 2,...,M, and i < j. One likely source for a non-default starting value for atrhoji is the /athrho parameter estimate from the biprobit model corresponding to equations j and i of the full mvprobit model.

robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation; see [U] 23.11 Obtaining robust variance estimates. robust combined with cluster() allows observations which are not independent within cluster (although they must be independent between clusters). If you specify pweights, robust is implied.

cluster(varname) specifies that the observations are independent across groups (clusters) but not necessarily within groups. varname specifies to which group each observation belongs; e.g., cluster(personid) in data with repeated observations on individuals. See [U] 23.11 Obtaining robust variance estimates. cluster() can be used with pweights to produce estimates for unstratified cluster-sampled data. Specifying cluster() implies robust.

noconstant suppresses the constant term (intercept) in the relevant regression.

constraints(numlist) specifies the linear constraints to be applied during estimation. Constraints are defined using the constraint command and are numbered; see help constraint. The default is to perform unconstrained estimation.

level(#) specifies the confidence level, in percent, for the confidence intervals of the coefficients; see help level.

maximize_options control the maximization process; see help maximize. Use of them is likely to be rare.
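The atrho0() recipe described above can be sketched for a two-equation model as follows. This is an illustration, not a prescribed workflow: the matrix name A0 is hypothetical, the data are those of the Examples section, and the expression _b[athrho:_cons] assumes that biprobit stores its ancillary parameter under the equation name athrho (in some Stata versions it may need to be addressed differently; matrix list e(b) after biprobit shows the actual names).

```stata
* Illustrative only: start /atrho21 at the biprobit estimate of /athrho
use http://www.stata-press.com/data/r7/school.dta, clear
quietly biprobit (private = years logptax loginc) (vote = years logptax loginc)
* Copy the atanh(rho) estimate into a 1 x 1 matrix
matrix A0 = (_b[athrho:_cons])
* Name the column as mvprobit expects for equations 2 and 1
matrix colnames A0 = atrho21:_cons
matrix list A0
mvprobit (private = years logptax loginc) (vote = years logptax loginc), atrho0(A0)
```

With more than two equations, the same steps would be repeated for each pair of equations and the values collected as columns of one matrix, each named atrhoji:_cons.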

Saved results

In addition to the usual results saved after ml, mvprobit also saves the following:

e(draws) is the number of random draws used when simulating probabilities.

e(seed) is the initial seed value used by the random-number generator.

e(neqs) is the number of equations in the M-equation model.

e(ll0) is the log-likelihood for the comparison model (the sum of the log-likelihoods from the marginal univariate probit models corresponding to each equation).

e(chi2_c) is the chi-square test statistic for the likelihood ratio test of the multivariate probit model against the comparison model.

e(nrho) is the number of estimated rhos (the degrees of freedom for the likelihood ratio test against the comparison model).

e(rhoji) is the estimate of correlation ji in the variance-covariance matrix of cross-equation error terms.

e(serhoji) is the estimated standard error of correlation ji.

e(rhsi) is the list of explanatory variables used in equation i. This list does not include the constant term, regardless of whether one is implied by equation i.

e(nrhsi) is the number of explanatory variables in equation i. This number includes the constant term if one is implied by equation i.
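The saved results listed above can be inspected after estimation along the following lines. This is a sketch only: it assumes the school.dta data from the Examples section, that e(rhs1) is stored as a macro, and that the atanh parameter for equations 2 and 1 can be addressed as _b[atrho21:_cons], in line with the column naming described under atrho0().

```stata
* Illustrative only: inspect results saved by mvprobit
use http://www.stata-press.com/data/r7/school.dta, clear
quietly mvprobit (private = years logptax loginc) (vote = years logptax loginc), draws(15)
display "draws = " e(draws) ", seed = " e(seed) ", equations = " e(neqs)
display "rho21 = " e(rho21) " (s.e. " e(serho21) ")"
* LR test of the multivariate model against the separate univariate probits
display "chi2(" e(nrho) ") = " e(chi2_c) ", p = " chi2tail(e(nrho), e(chi2_c))
* rho21 recovered from the estimated atanh parameter
display "tanh(atrho21) = " tanh(_b[atrho21:_cons])
display "Explanatory variables in equation 1: `e(rhs1)'"
```

The value of tanh(atrho21) should match e(rho21), since the correlation is estimated on the atanh scale.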

Examples

. use http://www.stata-press.com/data/r7/school.dta, clear

. biprobit (private = years logptax loginc) (vote=years logptax loginc)

. mvprobit (private = years logptax loginc) (vote = years logptax loginc), dr(15)

. mvprobit (private = years logptax loginc) (vote = years logptax, nocons), nolog

. mvprobit (private years logptax loginc) (vote years logptax, nocons), beta0

. constraint define 1 [private]loginc = 0.4

. mvprobit (private = years logptax loginc) (vote = years logptax loginc, nocons), constraint(1)

. mvprobit (private = years logptax loginc) (vote = years logptax loginc) (pub12 = years loginc)

. mvprobit (private = loginc logptax) (vote = loginc logptax)(school = logptax)(pub5 = ), dr(10)

Authors

Lorenzo Cappellari, Universita del Piemonte-Orientale, Italy <Lorenzo.Cappellari@eco.unipmn.it>

Stephen P. Jenkins, ISER, University of Essex, U.K. <stephenj@essex.ac.uk>

Acknowledgements

Thanks to Nick Cox and Weihua Guan for comments and suggestions. Much of our code for syntax handling and display of results was inspired by code used in biprobit.

References

Greene, W.H. (2000), Econometric Analysis, Fourth edition, Prentice-Hall International, Upper Saddle River NJ.

Also see

Manual:  [U] 23 Estimation and post-estimation commands,
         [U] 29 Overview of model estimation in Stata,
         [R] biprobit

On-line: help for constraint, est, postest, ml, biprobit, probit, and (if installed) triprobit.