```-------------------------------------------------------------------------------
help for mvprobit                            Cappellari and Jenkins (15jan2003)
-------------------------------------------------------------------------------

Multivariate probit models, estimated by Simulated Maximum Likelihood

mvprobit equation1 equation2 ... equationM [weight] [if exp] [in range] [,
draws(#) seed(#) beta0 atrho0(matrix_name) robust cluster(varname)
constraints(numlist) level(#) maximize_options ]

where each equation is specified as

( [eqname:] depvar [=] [varlist] [, noconstant] )

by ... : may be used with mvprobit; see help by.

pweights, aweights, fweights, and iweights are allowed; see help weights.

mvprobit shares the features of all estimation commands; see help est.

mvprobit typed without arguments redisplays the last estimates.  The level
option may be used.

Predictions based on mvprobit estimates, including predicted joint and marginal
probabilities, can be derived using mvppred.

Description

mvprobit estimates M-equation probit models, by the method of simulated maximum
likelihood (SML). (Cf. probit and biprobit which estimate 1-equation and
2-equation probit models by maximum likelihood.) The variance-covariance matrix
of the cross-equation error terms has values of 1 on the leading diagonal, and
the off-diagonal elements are correlations to be estimated (rhoji = rhoij, and
rhoii = 1, for all i = 1,...,M).

mvprobit uses the Geweke-Hajivassiliou-Keane (GHK) simulator to evaluate the
M-dimensional Normal integrals in the likelihood function. For each
observation, a likelihood contribution is calculated for each replication, and
the simulated likelihood contribution is the average of the values derived from
all the replications. The simulated likelihood function for the sample as a
whole is then maximized using standard methods (ml in this case). For a brief
description of the GHK smooth recursive simulator, see Greene (2000: 183-185),
who also provides references to the literature.

Under standard conditions, the SML estimator is consistent as the number of
observations and the number of draws tend to infinity, and is asymptotically
equivalent to the true maximum likelihood estimator as the ratio of the square
root of the sample size to the number of draws tends to zero. Thus, other
things equal, the more draws, the better. In practice, however, a relatively
small number of draws has been observed to work well for `smooth'
likelihoods. A common rule of thumb is to set the number of draws to an
integer close to the square root of the number of observations. For small
sample sizes, more draws than this rule suggests may be required.
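
As an illustration of the square-root rule, the number of draws could be set
from the sample size as follows. This is a sketch only: the local macro name
ndraws is arbitrary, the model is one of those used in the Examples below, and
count reports all observations in memory, which may exceed the estimation
sample if there are missing values.

. count

. local ndraws = round(sqrt(r(N)), 1)

. mvprobit (private = years logptax loginc) (vote = years logptax, nocons),
draws(`ndraws')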

Estimation is numerically intensive, and may be very slow if the data set is
large, if the number of draws is large, or (especially) if the number of
equations is large. Users may also need to set matsize and set memory to values
above the default ones. (See help for matsize and memory.) Use of the atrho0
option may speed up convergence.

Models for which the matrix of rhos is close to not being positive definite are
likely to be difficult to maximize. (The Cholesky factorization used by SML
requires positive definiteness.) This is more likely if |rhoji| is close to
one. In these cases, ml may report difficulties calculating numerical
derivatives and a non-concave log-likelihood. In difficult maximization
problems, the message "Warning: cannot do Cholesky factorization of rho matrix"
may appear between iterations. It may be safely ignored if the maximization
proceeds to a satisfactory conclusion.

Results may differ depending on the sort order of the data, because the sort
order affects which values of the random variable(s) get allocated to which
observation. (Be assured, however, that mvprobit does not change the sort
order of the data.) This potential problem diminishes as the number of random
draws increases.

Options

draws(#) specifies the number of random variates drawn when calculating the
simulated likelihood. The default is 5. (See the discussion above
concerning the number of draws.)

seed(#) specifies the initial value of the (pseudo-)random-number seed used by
the uniform() function in the simulation process. The value should be an
integer (the default value is 123456789). Warning: if the number of draws
is 'small', changes in the seed value may lead to surprisingly large
changes in estimates.
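
One way to check this sensitivity (a sketch, with arbitrary seed values and
one of the models used in the Examples below) is to re-estimate with the same
number of draws but different seeds and compare the results. If the estimates
differ materially, increasing draws() is the likely remedy.

. mvprobit (private = years logptax loginc) (vote = years logptax, nocons),
draws(20) seed(1001)

. mvprobit (private = years logptax loginc) (vote = years logptax, nocons),
draws(20) seed(2002)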

beta0 specifies that the estimates of the marginal probit regressions (used to
provide starting values) are reported.

atrho0(matrix_name) allows users to specify starting values for the
off-diagonal elements of the rho matrix that are different from the default
values (which are all zero).  More precisely, the matrix matrix_name
contains values of the incidental parameter in each /atrhoji equation, i.e.
atanh(rhoji) = .5*ln((1+rhoji)/(1-rhoji)). Matrix matrix_name must have
properly named column names. E.g. if a starting value in /atrho21 is being
set, one would first use the command matrix matrix_name = (value), followed
by matrix colnames matrix_name = atrho21:_cons. Between 1 and M(M-1)/2
/atrhoji starting values may be specified, where j = 2,...,M, and i < j.
One likely source for a non-default starting value for atrhoji is the
/athrho parameter estimate from the biprobit model corresponding to
equations j and i of the full mvprobit model.
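
For example, to start /atrho21 from a preliminary biprobit fit (a sketch: the
names a21 and A0 are arbitrary, the two-equation model is illustrative, and
e(rho) is assumed to be the correlation saved by biprobit, transformed here
with the formula above):

. biprobit (private = years logptax loginc) (vote = years logptax)

. scalar a21 = .5*ln((1+e(rho))/(1-e(rho)))

. matrix A0 = (a21)

. matrix colnames A0 = atrho21:_cons

. mvprobit (private = years logptax loginc) (vote = years logptax),
atrho0(A0)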

robust specifies that the Huber/White/sandwich estimator of variance is to be
used in place of the traditional calculation; see [U] 23.11 Obtaining
robust variance estimates.  robust combined with cluster() allows
observations which are not independent within cluster (although they must
be independent between clusters).  If you specify pweights, robust is
implied.

cluster(varname) specifies that the observations are independent across groups
(clusters) but not necessarily within groups.  varname specifies to which
group each observation belongs; e.g., cluster(personid) in data with
repeated observations on individuals.  See [U] 23.11 Obtaining robust
variance estimates. cluster() can be used with pweights to produce
estimates for unstratified cluster-sampled data.  Specifying cluster()
implies robust.

noconstant suppresses the constant term (intercept) in the relevant regression.

constraints(numlist) specifies the linear constraints to be applied during
estimation.  Constraints are defined using the constraint command and are
numbered; see help constraint. The default is to perform unconstrained
estimation.

level(#) specifies the confidence level, in percent, for the confidence
intervals of the coefficients; see help level.

maximize_options control the maximization process; see help maximize.  They
are seldom needed.

Saved results

In addition to the usual results saved after ml, mvprobit also saves the
following:

e(draws) is the number of random draws used when simulating probabilities.

e(seed) is the initial seed value used by the random-number generator.

e(neqs) is the number of equations in the M-equation model.

e(ll0) is the log-likelihood for the comparison model (the sum of the
log-likelihoods from the marginal univariate probit models corresponding to
each equation).

e(chi2_c) is the chi-square test statistic for the likelihood ratio test of
the multivariate probit model against the comparison model.

e(nrho) is the number of estimated rhos (the degrees of freedom for the
likelihood ratio test against the comparison model).

e(rhoji) is the estimate of correlation ji in the variance-covariance matrix of
cross-equation error terms.

e(serhoji) is the estimated standard error of correlation ji.

e(rhsi) is the list of explanatory variables used in equation i. This list does
not include the constant term, regardless of whether one is implied by
equation i.

e(nrhsi) is the number of explanatory variables in equation i.  This number
includes the constant term if there is one implied by equation i.
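
These results can be retrieved after estimation in the usual way. A sketch
for a two-equation model, assuming the naming pattern above yields e(rho21)
and e(serho21), and using chi2tail() to obtain the p-value of the likelihood
ratio test against the comparison model:

. display e(draws)

. display e(rho21)

. display e(serho21)

. display chi2tail(e(nrho), e(chi2_c))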

Examples

. use http://www.stata-press.com/data/r7/school.dta, clear

. mvprobit (private = years logptax loginc) (vote = years logptax

. mvprobit (private = years logptax loginc) (vote = years logptax,
nocons), nolog

. mvprobit (private years logptax loginc) (vote years logptax, nocons),
beta0

. constraint define 1 [private]loginc = 0.4

. mvprobit (private = years logptax loginc) (vote = years logptax

. mvprobit (private = years logptax loginc) (vote = years logptax

logptax)(pub5 = ), dr(10)

Authors

Lorenzo Cappellari, Universita del Piemonte-Orientale, Italy
<Lorenzo.Cappellari@eco.unipmn.it>

Stephen P. Jenkins, ISER, University of Essex, U.K.
<stephenj@essex.ac.uk>

Acknowledgements

Thanks to Nick Cox and Weihua Guan for comments and suggestions.  Much of
our code for syntax handling and display of results was inspired by code
used in biprobit.

References

Greene, W.H. (2000), Econometric Analysis, Fourth edition, Prentice-Hall.

Also see

Manual:  [U] 23 Estimation and post-estimation commands,
[U] 29 Overview of model estimation in Stata,
[R] biprobit

On-line:  help for constraint, est, postest, ml, biprobit, probit, and (if
installed) triprobit.

```