help cmp
-------------------------------------------------------------------------------

Title

    Conditional (recursive) mixed process estimator with multilevel random effects and coefficients

Syntax

    cmp setup

        - or -

    cmp eq [eq ...] [if] [in] [weight], indicators(exp [exp ...]) [level(#) quietly nolrtest
        redraws(# [# ...] , [type(halton|hammersley|ghalton|random) antithetics scramble steps(#)]) lf
        ghkdraws([#], [type(halton|hammersley|ghalton|random) antithetics scramble]) nodrop interactive
        init(vector) noestimate covariance(covopt [covopt ...]) structural reverse psampling(# #)
        ml_opts svy svy_opts]

where

covopt is

    unstructured | exchangeable | independent

and

eq is

    [fe_equation] [|| re_equation] [|| re_equation ...]

Each

fe_equation is an equation to be estimated, defined largely according to the ml model eq syntax. That is, fe_equation is enclosed in parentheses, optionally prefixed with a name for the equation:

    ([eqname:] varname_y [varname_y2] = [indepvars] [, noconstant offset(varname_o) exposure(varname_e) truncpoints(exp exp) iia])

indepvars may include factor variables; see fvvarlist.

varname_y2 is included only for interval-censored data, in a syntax analogous to that of intreg. truncpoints(exp exp) is allowed for equations involving all model types except multinomial and rank-ordered probit. iia is meaningful only for multinomial probit models without alternative-specific regressors.

Each

exp in the required indicators() option is an expression that evaluates to a cmp indicator variable, which communicates required observation-level information about the dependent variable(s) in the corresponding equation, and can be a constant, a variable name, or a complex mathematical expression. It can contain spaces or parentheses if it is double-quoted. Each exp must evaluate to 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9, potentially varying by observation. For a multinomial probit equation group with alternative-specific regressors, the corresponding indicator expressions should all evaluate to 0's and 6's, and should be enclosed in an additional pair of parentheses. The same goes for rank-ordered probit groups, except with 9's instead of 6's.

In random effect/coefficient models, each

re_equation specifies the effects, potentially at multiple levels, according to the syntax

    levelvar: [varlist] [, noconstant covariance(covopt)] [weight]

where

covopt is as defined above. In these models, the redraws() option is mandatory.

cmp may be prefixed with svy ... :.

pweights, fweights, aweights, and iweights are allowed at all levels; see help weights. Group-level weights, specified in the re_equation syntax above, should be constant within groups. Weights for a given level need be specified in just one equation.

The syntax of predict following

cmp is

    predict [type] {newvarname|stub*|newvarlist} [if exp] [in range] [, statistic equation(eqno [,eqno]) outcome(outcome) nooffset]

where

statistic is xb, pr, stdp, stddp, lnl, scores, residuals, e(a b), or ystar(a b); and a and b may be numbers or variables; a missing (a>.) means minus infinity, and b missing (b>.) means plus infinity; see missing.
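For instance, after fitting a model whose second equation is a probit, one could compute predicted probabilities for that equation (a sketch; the variable name phat is invented):

    . predict phat, pr equation(#2)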

cmp shares features of all estimation commands; see help estcom.

Description

cmp fits a large family of multi-equation, multi-level, conditional recursive mixed-process estimators. Those adjectives are defined as follows:

* "Multi-equation" means that it can fit seemingly unrelated regression (SUR) and instrumental variables (IV) systems.

* "Multi-level" means that random coefficients and effects (intercepts) can be modeled at various levels in hierarchical fashion, the classic example being a model of education outcomes with unobserved school and class effects. Since the models can also be multi-equation, random effects at a given level are allowed by default to be correlated across equations. E.g., school and class effects may be correlated across outcomes such as math and reading scores. Effects at different levels, however, are assumed uncorrelated with each other and with the observation-level errors.

* "Mixed process" means that different equations can have different kinds of dependent variables (response types). The choices, all generalized linear models with a Gaussian error distribution, are: continuous and unbounded (the classical linear regression model), tobit (left-, right-, or bi-censored), interval-censored, probit, ordered probit, multinomial probit, and rank-ordered probit. Pre-censoring truncation can be modeled for most response types. A dependent variable in one equation can appear on the right side of another equation.

* "Recursive" means, however, that cmp can only fit sets of equations with clearly defined stages, not ones with simultaneous causation. A and B can be modeled determinants of C, and C a determinant of D--but D cannot be a modeled determinant of A, B, or C.

* "Conditional" means that the model can vary by observation. An equation can be dropped for observations for which it is not relevant--if, say, a worker retraining program is not offered in a city, then the determinants of uptake cannot be modeled there. The type of a dependent variable can even vary by observation.

Broadly, cmp is appropriate for two classes of models: 1) those in which a truly recursive data-generating process is posited; and 2) those in which there is simultaneity, but instruments allow the construction of a recursive set of equations, as in two-stage least squares, that can be used to consistently estimate structural parameters in the final stage. In the first case, cmp is a full-information maximum likelihood (FIML) estimator, and all estimated parameters are structural. In the second, it is a limited-information maximum likelihood (LIML) estimator, and only the final stage's coefficients are structural, the rest being reduced-form parameters. What matters for the validity of cmp is that the system of equations is recursive, whether or not the model is.

cmp's modeling framework embraces those of the official Stata commands probit, ivprobit, treatreg, biprobit, tetrachoric, oprobit, mprobit, asmprobit, asroprobit, tobit, ivtobit, cnreg, intreg, truncreg, heckman, heckprob, xtreg, xtprobit, xttobit, xtintreg, and in principle even regress and sureg, as well as the user-written ssm, polychoric, triprobit, mvprobit, bitobit, mvtobit, oheckman, switch_probit, and (in its "non-endogenous" mode) bioprobit. It goes beyond them in several ways. Thanks to the flexibility of ml, on which it is built, it accepts coefficient constraints as well as all weight types, vce types (robust, cluster, linearized, etc.), and svy settings. And it offers more flexibility in model construction. For example, one can regress a continuous variable on two endogenous variables, one binary and the other sometimes left-censored, instrumenting each with additional variables. And cmp usually allows the model to vary by observation. Equations can have different samples, overlapping or non-overlapping. Heckman selection modeling can be incorporated into a wide variety of models through auxiliary probit equations. In some cases, the gain is consistent estimation where it was difficult before. Sometimes the gain is in efficiency. For example, if C is continuous, B is a sometimes-left-censored determinant of C, and A is an instrument, then the effect of B on C can be consistently estimated with 2SLS (Kelejian 1971). However, a cmp estimate that uses the information that B is censored will be more efficient, based as it is on a more accurate model.

As a matter of algorithm,

cmp is an SUR (seemingly unrelated regressions) estimator. It treats the equations as independent from each other except for modeling their underlying errors as jointly normally distributed. Mathematically, the likelihood it computes is conditioned on observing all right-side variables, including those that also appear on the left side of equations. However, it can actually fit a much larger class of models. Maximum likelihood (ML) SUR estimators, including cmp, are appropriate for many simultaneous-equation models, in which endogenous variables appear on the right side of structural equations as well as the left. Models of this kind for which ML SUR is consistent must satisfy two criteria:

1) They are recursive. In other words, the equations can be arranged so that the matrix of coefficients of the dependent variables in each others' equations is triangular. As emphasized above, this means the models have clearly defined stages, though there can be more than one equation per stage.

2) They are "fully observed." Dependent variables in one stage enter subsequent stages only as observed. Returning to the example in the first paragraph, if C is a categorical variable modeled as ordered probit, then C, not the latent variable underlying it, call it C*, must figure in the model for D.

As an illustration of the ideas here, some Stata estimation commands have wider applicability than many realize.

sureg (X=Y) (Y=Z), isure typically matches ivregress 2sls X (Y=Z) exactly, even though the documentation does not describe sureg as an instrumental variables (IV) estimator. (Iterated SUR is not a true ML estimator, but it converges to the same solution as ML-based SUR, as implemented, for example, in the demonstration command mysureg. See Pagan (1979) on the LIML-iterated SUR connection.) And biprobit (X=Y) (Y=Z) will consistently estimate an IV model in which X and Y are binary.

To inform

cmp about the natures of the dependent variables and about which equations apply to which observations, the user must include the indicators() option after the comma in the cmp command line. This must contain one expression for each equation. The expression can be a constant, a variable name, or a more complicated mathematical formula. Formulas that contain spaces or parentheses should be enclosed in quotes. For each observation, each expression must evaluate to one of the following codes, with the meanings shown:

    0 = observation is not in this equation's sample
    1 = equation is "continuous" for this observation, i.e., has the OLS likelihood or is an uncensored observation in a tobit equation
    2 = observation is left-censored for this (tobit) equation at the value stored in the dependent variable
    3 = observation is right-censored at the value stored in the dependent variable
    4 = equation is probit for this observation
    5 = equation is ordered probit for this observation
    6 = equation is multinomial probit for this observation
    7 = equation is interval-censored for this observation

    8 = equation is truncated on the left and/or right (obsolete because truncation is now a general modeling feature)
    9 = equation is rank-ordered probit for this observation

For clarity, users can execute the

cmp setup subcommand, which defines global macros that can then be used in cmp command lines:

    $cmp_out = 0
    $cmp_cont = 1
    $cmp_left = 2
    $cmp_right = 3
    $cmp_probit = 4
    $cmp_oprobit = 5
    $cmp_mprobit = 6
    $cmp_int = 7
    $cmp_trunc = 8
    $cmp_roprobit = 9
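As a hypothetical illustration (variable names invented), these macros and a conditional indicator expression specify a two-equation model whose first, continuous equation applies only to observations with insample==1, while the second equation is probit throughout:

    . cmp setup
    . cmp (y = x1 x2) (d = x1 z), indicators("insample*$cmp_cont" $cmp_probit)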

Since

cmp is a Maximum Likelihood estimator built on ml, equations are specified according to the ml model syntax. This means that for instrumented regressions, cmp differs from ivregress, ivprobit, ivtobit, and similar commands in not automatically including exogenous regressors (included instruments) from the second stage in the first stage. So you must arrange for this yourself. For example, ivreg y x1 (x2=z) corresponds to cmp (y=x1 x2) (x2=x1 z), ind($cmp_cont $cmp_cont).

In order to model random coefficients and effects,

cmp borrows syntax from xtmixed. It is best explained with examples. This eq specifies an equation with two levels of random effects corresponding to groups defined by the variables school and class:

    (math = age || school: || class:)

(school, coming first, is understood to be "above" class in the hierarchy.) At a given level, random effects can be assumed present in some equations and not others. Those in more than one equation at a given level are assumed to be (potentially) correlated across equations (an assumption that can be overridden through constraints or the covariance() option). This specifies a school effect for math but not for reading scores, and potentially correlated class effects for both:

    (math = age || school: || class:) (reading = age || class:)

This adds random coefficients on age at the class level in both equations. The two new coefficients are potentially correlated with each other and with the random intercepts at the same level:

(math = age || school: || class: age) (reading = age || class: age)

Weights of various types may be specified at each level. These should be defined by variables or expressions that are constant within each group of the given level. Within a given group, aweights and pweights are rescaled to sum to the number of groups in the next level down (or the number of observations if it is the bottom level). pweights imply vce(cluster groupvar), where groupvar defines the highest level in the hierarchy having pweights. iweights and fweights are not rescaled; the latter affect the reported sample size. Since weights must be the same across equations, they need be specified only once for each level. So these are equivalent:

    (math = age || school: || class: [pw=weightvar1]) (reading = age || class:)
    (math = age || school: || class: [pw=weightvar1]) (reading = age || class: [pw=weightvar1])

and the contradiction here would cause an error:

(math = age || school: || class: [pw=weightvar1]) (reading = age || class: [pw=weightvar2])

Unlike xtreg, xtprobit, xttobit, xtintreg, xtmixed, and gllamm, which use quadrature to integrate likelihoods over the unobserved random effects (see [R] xtmixed), cmp uses simulation. This involves making many draws from the hypothesized normal distributions of the effects, computing the implied likelihood for each set of draws, then averaging (Train 2009; Greene 2011, chap. 15). To govern the simulation, random effects models in cmp must include the redraws() option. This sets the number of draws per observation at each level, the type of sequence (Halton, Hammersley, generalized Halton, pseudorandom), whether antithetics are also drawn, and whether, in the Halton and Hammersley cases, sequences should be scrambled using the square-root scrambler. (See Gates 2006 for more on all these concepts except scrambling, for which see Kolenikov 2012.) For (generalized) Halton and Hammersley sequences, it is preferable to make the number of draws prime, to ensure more variable coverage of the distribution from observation to observation, making coverage more uniform overall. Increasing the number of draws increases precision at the expense of time. In a bid for speed, cmp can begin by estimating with just 1 draw per observation and random effect (plus the antithetics if specified). It can then use the result of this rough search as the starting point for an estimate with more draws, then repeat, multiplying by a fixed amount each time until reaching the specified number of draws. The steps(#) suboption of redraws() can override the default number of steps, which is just 1. redraws(50 50, steps(1)) would specify immediate estimation with the full 50 draws per observation in a three-level model (with two levels of random effects). See options below for more.

Estimation problems with observations that are censored in three or more equations, such as a trivariate probit, require calculation of cumulative joint normal distributions of dimension three or higher. This is a non-trivial problem. The preferred technique again involves simulation: the method of Geweke, Hajivassiliou, and Keane (GHK) (Greene 2011; Cappellari and Jenkins 2003; Gates 2006).

cmp accesses the algorithm through the separately available Mata function ghk2(), which must be installed for cmp to work. Modeled on the built-in ghk() and ghkfast(), ghk2() gives users choices about the length and nature of the sequences generated for the simulation, choices that cmp largely passes on through the optional ghkdraws() option, which includes type(), antithetics, and scramble suboptions. See options below for more.

For both domains of simulation--random coefficients/effects and cumulative normal distributions estimated with the GHK algorithm--each observation or group gets its own sequence of draws. Thus changing the order of the observations in the data set will change the estimation results--one hopes only slightly. If using pseudorandom sequences (type(random)) or generalized Halton ones (type(ghalton)), the state of the Stata random number generator also slightly influences the results. For exact reproducibility of your results with these sequences, initialize the seed to some chosen value with the set seed command each time before running cmp. Estimations that require simulation can run much more slowly than those that do not.
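For example, to make a run with pseudorandom GHK draws reproducible (a sketch with invented variable names; the seed value is arbitrary):

    . set seed 8675309
    . cmp (y1 = x) (y2 = x y1) (y3 = x y2), indicators($cmp_probit $cmp_probit $cmp_probit) ghkdraws(100, type(random))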

cmp starts by fitting each equation separately in order to obtain a good starting point for the full model fit. Sometimes in this preparatory step, convergence difficulties make a reported parameter covariance matrix singular, yielding missing standard errors for some regressors. Or variables are found to be collinear. In order to maximize the chance of convergence, cmp ordinarily drops such regressors from the equations in which they cause trouble, reruns the single-equation fit, and then leaves them out of the full model too. The nodrop option prevents this behavior.

On estimation with interval-censored or truncated equations

For equations with interval-censored observations, list two variables before the =, somewhat following the syntax of intreg. For example, cmp (y1 y2 = x1 x2), ind($cmp_int) indicates that the dependent variable is censored to intervals whose lower bounds are in y1 and upper bounds are in y2. Missing values in y1 are treated as -infinity and those in y2 as +infinity. For observations in which y1 and y2 coincide, there is no censoring, and the likelihood is the same as for $cmp_cont.

For equations with truncated distributions--which can be of any model type besides multinomial and rank-ordered probit--use the

truncpoints(exp exp) option within the specification for the given equation to provide truncation points. Like indicator expressions, the truncation points can be constants, expressions, or variable names, with missing values interpreted as above. Observations in which the observed value lies outside the truncation range are automatically dropped for that equation. An example is below.
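As a minimal sketch (hypothetical variable names), a regression truncated on the left at 0, analogous to truncreg y x, ll(0):

    . cmp (y = x, truncpoints(0 .)), indicators($cmp_cont)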

On estimation with multinomial probit equations

Multinomial probits can be specified with two different syntaxes, roughly corresponding to the Stata commands mprobit and asmprobit. In the first syntax, the user lists a single equation, just as for other dependent variable types, and puts a 6 ($cmp_mprobit) in the indicators() list. The dependent variable holds the choice made in each case. Like mprobit, cmp treats all regressors as determinants of choice for all alternatives. In particular, it expands the specified equation into a group with one "utility" equation for each possible choice. All equations in the group include all regressors, except for the first, which has none. This one corresponds to the lowest value of the dependent variable, which is taken as the base alternative. The next, corresponding to the second-lowest value, is the "scale alternative," meaning that to normalize results, the variance of its error term is fixed. (The value it is fixed at depends on whether the structural option is invoked, on which see below.) In the first syntax, the single eq can contain an iia option after the comma, so that cmp, like mprobit, will impose the Independence of Irrelevant Alternatives assumption; i.e., cmp will assume that errors in the utility equations are uncorrelated and have unit variance.

Such models, ones without exclusion restrictions and without the IIA assumption, are formally identified as long as at least one regressor varies over alternatives (Keane 1992). However, Keane emphasizes that fits for such models tend to be unstable if there are no exclusion restrictions. There are two ways to impose exclusion restrictions with

cmp. First, as with mprobit, you can use constraints.

Second, you can use

cmp's other multinomial probit syntax. In this "alternative-specific" syntax, you list one equation in the cmp command line for each alternative, including the base alternative. Different equations may include different regressors. Unlike asmprobit, cmp does not force regressors that appear in more than one equation to have the same coefficient across alternatives, although again this restriction can be imposed through constraints. When using the alternative-specific syntax, the dependent variables listed should be a set of dummies, as can be generated with xi, noomit from the underlying choice variable. The first equation is always treated as the base alternative, so here you can control which alternative is the base by reordering the equations. In general, regressors that appear in all other equations should be excluded from the base alternative. Otherwise, unless a constraint is imposed to reduce the degrees of freedom, the model will not be identified. (cmp automatically excludes the constant from the base alternative equation.) Variables that are specific to the base alternative, however, or to a strict subset of alternatives, can be included in the base alternative equation. In the second syntax, the IIA is not assumed, nor available through an option; it can still be imposed through constraints.

To specify an alternative-specific multinomial probit group, include expressions in the

indicators() option that evaluate to 0 or 6 ($cmp_out or $cmp_mprobit) for each equation (0 indicating that the choice is not available for given observations). You must enclose the whole list in an additional set of parentheses. Note that unlike with asmprobit, there should still be one row in the data set per case, not per case and alternative. So instead of variables that vary by alternative, there must be a version of each such variable for each alternative--not a single travel time variable that varies by mode of travel, but an air travel time variable, a bus travel time variable, and so on. An alternative-specific multinomial example is also below.

In a multinomial probit model with J choices, each possible choice has its own structural equation, including an error term. These error terms have some covariance structure. It is impossible, however, to estimate all the entries of the JxJ covariance matrix (Train 2003; Long and Freese 2006). What is used in the computation of the likelihood is the (J-1)x(J-1) covariance matrix of the differences of the non-base-alternative errors from the base-alternative error. So by default,

cmp, much like asmprobit, interprets the sigma and rho parameters relating to these equations as characterizing these differenced errors. To eliminate an excessive degree of scaling freedom, it constrains the error variance of the first non-base-alternative equation (the "scaling alternative") to 2, which it would be anyway if the errors for the first two structural equations were i.i.d. standard normal (so that their difference has variance 2).

The disadvantage of this parameterization is that it is hard to think about if you want to impose additional constraints on it. As an alternative,

cmp, like asmprobit, offers a structural option. When this is included, cmp creates a full set of parameters to describe the covariance of the J structural errors. To remove the excessive degrees of freedom, it then constrains the base alternative error to have variance 1 and no correlation with the other errors, and constrains the error for the second, scaling alternative to also have variance 1. To impose the IIA under this option, one would then constrain various "atanhrho" and "lnsig" parameters to 0. An example below shows how to estimate the same IIA model with and without the structural parameterization.

The intuitiveness of the structural parameterization comes at a real cost, however (Bunch 1991; Long and Freese 2006, pp. 325-29). Though the particular set of constraints imposed seems innocent, it actually results in a mapping from the space of allowed structural covariances to the space of possible covariance matrices for the relative-differenced errors that is not

onto. That is, there are positive-definite (J-1)x(J-1) matrices, valid candidates for the covariance of the relative-differenced errors, that are not compatible with the assumptions that the first two alternatives' structural errors have variance one and that the first, base alternative's error is uncorrelated with all other structural errors. So the structural option can prevent cmp from reaching the maximum-likelihood fit. Long and Freese (2006) describe how changing which alternatives are the base and scaling alternatives, by reordering the equations, can sometimes free an estimator to find the true maximum within the structural parameterization.
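As a hypothetical sketch of the alternative-specific syntax (variable names invented), with three travel modes, mode-specific time variables, air serving as the base alternative, and the shared regressor x excluded from the base equation:

    . cmp (air = time_air) (bus = time_bus x) (train = time_train x), indicators(($cmp_mprobit $cmp_mprobit $cmp_mprobit))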

On estimation with rank-ordered probit equations

Specification and treatment of rank-ordered probit equations are nearly identical to those in the second syntax for multinomial probits described just above. Equations must be listed for every alternative. Indicators for these equations must evaluate to 0 or 9 ($cmp_out or $cmp_roprobit) for each observation, and be grouped in an extra set of parentheses. Constraints defining the base and scale alternatives are automatically imposed in the same way. The structural option too works identically. One option relating specifically to rank-ordered probit is reverse. It instructs cmp to interpret lower-numbered rankings as higher instead of lower.

Tips for achieving and speeding convergence

If you are having trouble achieving (or waiting for) convergence with cmp, these techniques might help:

1. Change the search technique using ml's technique() option, or perhaps the search parameters, through its maximization options. cmp accepts all of these and passes them on to ml. The default Newton-Raphson search method usually works very well once ml has found a concave region. The DFP algorithm (tech(dfp)) often works better before then, and the two can be mixed, as with tech(dfp nr). See the details of the technique() option at ml.

2. If the estimation problem requires the GHK algorithm (see above), change the number of draws per observation in the simulation sequence using the ghkdraws() option. By default, cmp uses twice the square root of the number of observations for which the GHK algorithm is needed, i.e., the number of observations that are censored in at least three equations. Raising simulation accuracy by increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision. On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision, with remarkably few draws per observation--as few as 5 when the sample size is 10,000 (Cappellari and Jenkins 2003). And taking more draws can greatly extend execution time.

3. If getting many "(not concave)" messages, try the difficult option, which instructs ml to use a different search algorithm in non-concave regions.

4. If the search appears to be converging in likelihood--if the log likelihood is hardly changing in each iteration--and yet fails to converge, try adding an nrtolerance(#) or nonrtolerance option to the command line after the comma. These are ml options that control when convergence is declared. (See ml_opts, below.) By default, ml declares convergence when the log likelihood is changing very little with successive iterations (within tolerances adjustable with the tolerance(#) and ltolerance(#) options) and when the calculated gradient vector is close enough to zero. In some difficult problems, such as ones with nearly collinear regressors, the imprecision of floating point numbers prevents ml from quite satisfying the second criterion. It can be loosened by using nrtolerance(#) to set the scaled gradient tolerance to a value larger than its default of 1e-5, or eliminated altogether with nonrtolerance. Because of the risks of collinearity, cmp warns when the condition number of an equation's regressor matrix exceeds 20 (Greene 2000, p. 40).

5. Try cmp's interactive mode, via the interactive option. This allows the user to interrupt maximization by hitting Ctrl-Break or its equivalent, investigate and adjust the current solution, and then restart maximization by typing ml max. Techniques for exploring and changing the current solution include displaying the current coefficient and gradient vectors (with mat list $ML_b or mat list $ML_g) and running ml plot. See help ml, [R] ml, and Gould, Pitblado, and Sribney (2006) for details. cmp is slower in interactive mode.
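For instance (a sketch with invented variable names), mixing search techniques and loosening the gradient criterion in one command:

    . cmp (y1 = x1 x2) (y2 = x1 z y1), indicators($cmp_probit $cmp_cont) tech(dfp nr) difficult nrtolerance(1e-4)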

Options

indicators(exp [exp ...]) is required. It should pass a list of expressions that evaluate to 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 for every observation, with one expression for each equation, and in the same order. Expressions can be constants, variable names, or formulas. Individual formulas that contain spaces or parentheses should be enclosed in quotes.

level(#) specifies the confidence level, in percent, for confidence intervals of the coefficients; see help level. The default is 95.

quietly suppresses most output: the results from any single-equation initial fits and the iteration log during the full model fit.

nolrtest suppresses calculation and reporting of the likelihood-ratio (LR) test of overall model fit, relative to a constant(s)-only model. This has no effect if data are pweighted or errors are robust or clustered. In those cases, the likelihood function does not reflect the non-sphericity of the errors, and so is a pseudolikelihood. The LR test is then invalid and is not run anyway.

redraws(# [# ...] , [type(halton|hammersley|ghalton|random) antithetics scramble steps(#)]) is required for random coefficient/effects models; it governs the simulation-based estimation of them. The option should begin with one number (#) for each level of the model above the base (e.g., two numbers in a three-level model); these specify the number of draws per observation from the simulated distributions of the random effects. The optional type() suboption sets the sequence type; the default is halton. The optional antithetics suboption doubles the number of draws at all levels by including antithetics. For more on these concepts, see Drukker and Gates (2006). The optional scramble suboption invokes the square-root scrambler for Halton and Hammersley sequences. (See Kolenikov 2012. This scrambler has no effect for the primes 2 and 3.) The optional steps(#) suboption sets the number of times to fit the model as the number of draws at each level is geometrically increased to the specified final values. The preliminary runs all use the Newton-Raphson search algorithm and ml's nonrtolerance and tolerance(0.1) options in order to accept coarse fits. This stepping is meant only to increase speed by using fewer draws until the search is close to the maximum. It can be disabled with steps(1).
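A hypothetical illustration (invented names; assumes cmp setup has been run), stepping up to a prime number of antithetic draws per observation at each of two random-effects levels:

    . cmp (math = age || school: || class:), indicators($cmp_cont) redraws(31 31, antithetics steps(3))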

lf makes cmp use its lf-method evaluator instead of its d2-method one (for Stata 10 or earlier) or lf1-method one (Stata 11 or later). This forces cmp to compute first derivatives of the likelihood numerically instead of analytically, which substantially slows estimation but occasionally improves convergence.

ghkdraws([#], [type(halton|hammersley|ghalton|random) antithetics scramble]) governs the draws used in GHK simulation of higher-dimensional cumulative multivariate normal distributions. It is similar to the redraws() option. However, it takes at most one number; if it, or the entire option, is omitted, the number of draws is set rather arbitrarily to twice the square root of the number of observations for which the simulation is needed. (Simulated maximum likelihood is consistent as long as the number of draws rises with the square root of the number of observations. In practice, a higher number of observations often reduces the number of draws per observation needed for reliable results. See Train 2009, p. 252.) As in the redraws() option, the optional type() suboption specifies the type of sequence in the GHK simulation, halton being the default; antithetics requests antithetic draws; and scramble applies the square-root scrambler.

nodrop prevents the dropping of regressors from equations in which they receive missing standard errors in initial single-equation fits. It also prevents the removal of collinear variables.

covariance(covopt [covopt...]) offers shorthand ways to constrain the cross-equation correlation structure of the errors at each level--shorthand, that is, compared to using constraint. There should be one covopt for each level in the model, and each can be unstructured, exchangeable, or independent. unstructured, the default, imposes no constraint. exchangeable specifies that within a given level, all correlations between random effects, coefficients, or residual errors in different equations are the same, and likewise all their variances. independent sets all cross-equation correlations at a given level to zero. Separately, the re_equation syntax documented above also includes a covariance() option. With the same choices and definitions, it controls variances and covariances within a given equation at a given level.
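To make this concrete, a hedged sketch of a two-level, two-equation model (equation and variable names are hypothetical): one covopt is given per level, here imposing an exchangeable structure at one level and independence at the other:

```stata
* Hypothetical two-equation model with random effects at the school level.
* covariance() takes one covopt for each level of the model.
. cmp (y1 = x || school:) (y2 = x || school:), ind($cmp_cont $cmp_cont) ///
      covariance(exchangeable independent) redraws(50) nolr
```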

interactive makes cmp fit the full model in ml's interactive mode. This allows the user to interrupt the model fit with Ctrl-Break or its equivalent, view and adjust the trial solution with such commands as ml plot, then restart optimization by typing ml max. See help ml, [R] ml, and Gould, Pitblado, and Sribney (2006) for details. cmp runs more slowly in interactive mode.

init(vector) passes a row vector of user-chosen starting values for the full model fit, in the manner of the ml init, copy command. The vector must contain exactly one element for each parameter cmp will estimate, in the same order as cmp reports the parameter estimates in the output. Thus, at the end come the initial guesses for the lnsig_i parameters, then those for the atanhrho_ij, then those for any ordered-probit cuts. (cmp normally also reports sig_i's and rho_ij's, but these are not additional parameters, merely transformed versions of underlying ones, and should be ignored in building the vector of starting values.) The row and column names of the vector do not matter.

noestimate simplifies the job of constructing an initial vector for the init() option. It instructs cmp to stop before fitting the full model and leave behind an e(b) return vector with one labeled entry for each free parameter. To view this vector, type mat list e(b). You can copy or edit this vector, such as with "mat b = e(b)", then pass it back to cmp with the init() option.
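The two options are designed to be used together. A hedged sketch of the workflow (equation and variable names are hypothetical):

```stata
* Obtain a labeled template of the free parameters without fitting the model.
. cmp (y = x) (z = y w), ind($cmp_cont $cmp_cont) noestimate
. mat b = e(b)
. mat list b
* Edit selected entries of b, e.g., a starting guess for the coefficient on x,
* then pass the vector back as starting values for the full fit.
. mat b[1, colnumb(b, "y:x")] = 0.5
. cmp (y = x) (z = y w), ind($cmp_cont $cmp_cont) init(b)
```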

structural forces the structural covariance parameterization for all multinomial and rank-ordered equation groups. See above for more.

reverse instructs cmp to interpret lower-numbered ranks in rank-ordered probit equations as being higher.

psampling(# #) makes cmp perform "progressive sampling," which can speed estimation on large data sets. First it estimates on a small subsample, then a larger one, etc., until reaching the full sample. Each iteration uses the previous one's estimates as a starting point. The first argument sets the initial sample size, either in absolute terms (if it is at least 1) or as a fraction of the full sample (if it is less than 1). The second argument is the factor by which the sample should grow in each iteration. This process is analogous to, but distinct from, the stepping that occurs by default in simulating random effects.
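A hedged sketch (names hypothetical): start with a tenth of the sample and quadruple it each iteration until the full sample is reached:

```stata
* 0.1 < 1, so the first fit uses 10% of the sample; each later fit uses 4x more.
. cmp (y = x) (z = y w), ind($cmp_cont $cmp_cont) psampling(0.1 4)
```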

ml_opts: cmp accepts the following standard ml options: trace gradient hessian showstep technique(algorithm_specs) vce(oim|opg|robust|cluster) iterate(#) tolerance(#) ltolerance(#) gtolerance(#) nrtolerance(#) nonrtolerance shownrtolerance difficult constraints(clist) score(newvarlist|stub*)

svy indicates that ml is to pick up the svy settings set by svyset and use the robust variance estimator. This option requires the data to be svyset. svy may not be specified with vce() or weights. See help svy estat.
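A hedged sketch of the pattern (the survey design variables psu, strat, and wt, and the model variables, are hypothetical):

```stata
* Declare the survey design first; cmp then picks it up via the svy option.
. svyset psu [pweight=wt], strata(strat)
. cmp (y = x), ind($cmp_probit) svy
```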

svy_opts: Along with svy, users may also specify any of these related ml options, which affect how the svy-based variance is estimated: nosvyadjust subpop(subpop_spec) srssubpop. And users may specify any of these ml options, which affect output display: deff deft meff meft eform prob ci. See help svy estat.

On predict and margins after cmp

Options for predict after cmp are:

equation(eqno[,eqno])   specify equation(s)
xb                      linear prediction
stdp                    standard error of linear prediction
stddp                   standard error of difference in linear predictions
lnl                     observation-level log likelihood (in hierarchical models, averaged over groups)
scores                  derivative of the log likelihood with respect to xb or parameter
residuals               calculate the residuals
pr                      probability of a positive outcome (meant for probit equations)
e(# #)                  censored expected value (see help regress postestimation)
ystar(# #)              truncated expected value (see help regress postestimation)
outcome(outcome)        specify outcome(s), for ordered probit only
nooffset                ignore any offset() or exposure() variable

Note that the e(# #) and ystar(# #) options should not include a comma between the two bounds.

eqno can be an equation name (if not set explicitly, an equation's name is that of its dependent variable), or an equation number preceded by a #. The default equation is #1, unless the provided variable list has one entry for each equation or takes the form stub*; these request prediction variables for all equations, with names as given or as automatically generated beginning with stub.

In contrast, for ordered probit equations, if pr is specified, predict will by default compute probability variables for all outcomes. The names for these variables are automatically generated using a provided variable name as a stub. This stub may be given directly in the command line--in which case it should not include a *--or may itself be automatically generated by a cross-equation stub*. Thus it is possible to generate probabilities for all outcomes in all ordered probit equations with a single, terse command. Alternatively, the outcome(outcome) option can be used to request probabilities for just one outcome. outcome can be a value of the dependent variable, or a category number preceded by a #. For example, if the categorical dependent variable takes the values 0, 3, and 4, then outcome(4) and outcome(#3) are synonyms. (outcome() also implies pr.)

In explaining the multi-equation and -outcome behavior of predict after cmp, examples are worth a thousand words.

The flexibility of cmp affects the use of predict and margins after estimation. Because the censoring type (probit, tobit, etc.) can technically vary by observation, the default statistic for predict is always xb, the linear fitted value. This is unlike probit and oprobit, after which the default is pr, predicted probabilities of outcomes. So to obtain probabilities predicted by (ordered) probit equations, remember to include the pr option in the predict command line or predict(pr) in the margins command line. (For ordered probit equations, an outcome() option will also imply pr.)

When using margins to estimate marginal effects with respect to one equation in a multi-equation cmp model, users may need to include the force option in the margins command line.

Examples of predict and margins after cmp are below.

Citation

cmp is not an official Stata command. It is a free contribution to the research community. Please cite it as such: Roodman, D. 2011. Estimating fully observed recursive mixed-process models with cmp. Stata Journal 11(2): 159-206.

Published examples

See Google Scholar.

Introductory examples

The purpose of cmp is not to match standard commands, but to fit models otherwise beyond easy estimation in Stata. Still, replications illustrate how cmp works (colored text is clickable):

* Define indicator macros for clarity.
. cmp setup
. webuse laborsup

* Make censoring level 0 for fem_inc, since pre-Oct '07 ivtobit assumes it is, because of a bug.
. replace fem_inc = fem_inc - 10
. reg kids fem_inc male_educ
. cmp (kids = fem_inc male_educ), ind($cmp_cont) quietly

. sureg (kids = fem_inc male_educ) (fem_work = male_educ), isure
. cmp (kids = fem_inc male_educ) (fem_work = male_educ), ind($cmp_cont $cmp_cont) quietly

. mvreg fem_educ male_educ = kids other_inc fem_inc
. cmp (fem_educ = kids other_inc fem_inc) (male_educ = kids other_inc fem_inc), ind(1 1) qui

. ivreg fem_work fem_inc (kids = male_educ), first
. cmp (kids = fem_inc male_educ) (fem_work = kids fem_inc), ind($cmp_cont $cmp_cont) qui

. ivregress liml fem_work fem_inc (kids = male_educ other_inc)
. cmp (kids = fem_inc male_educ other_inc) (fem_work = kids fem_inc), ind($cmp_cont $cmp_cont) qui

. probit kids fem_inc male_educ
. predict p
. margins, dydx(*)
. cmp (kids = fem_inc male_educ), ind($cmp_probit) qui
. predict p2, pr
. margins, dydx(*) predict(pr)

. oprobit kids fem_inc male_educ
. margins, dydx(*) predict(outcome(#2))
. cmp (kids = fem_inc male_educ), ind($cmp_oprobit) qui
. margins, dydx(*) predict(eq(#1) outcome(#2) pr)

. gen byte anykids = kids > 0
. biprobit (anykids = fem_inc male_educ) (fem_work = male_educ)
. cmp (anykids = fem_inc male_educ) (fem_work = male_educ), ind($cmp_probit $cmp_probit)

. tetrachoric anykids fem_work
. cmp (anykids = ) (fem_work = ), ind($cmp_probit $cmp_probit) nolr qui

. ivprobit fem_work fem_educ kids (other_inc = male_educ), first
. margins, predict(pr) dydx(*)
. cmp (fem_work = other_inc fem_educ kids) (other_inc = fem_educ kids male_educ), ind($cmp_probit $cmp_cont)
. margins, predict(pr eq(#1)) dydx(*) force

. treatreg other_inc fem_educ kids, treat(fem_work = male_educ)
. cmp (other_inc = fem_educ kids fem_work) (fem_work = male_educ), ind($cmp_cont $cmp_probit) qui

. tobit fem_inc kids male_educ, ll
. cmp (fem_inc = kids male_educ), ind("cond(fem_inc, $cmp_cont, $cmp_left)") qui

. ivtobit fem_inc kids (male_educ = other_inc), ll first
. cmp (fem_inc = kids male_educ) (male_educ = kids other_inc), ind("cond(fem_inc, $cmp_cont, $cmp_left)" $cmp_cont)

. preserve
. webuse intregxmpl, clear
. intreg wage1 wage2 age age2 nev_mar rural school tenure
. cmp (wage1 wage2 = age age2 nev_mar rural school tenure), ind($cmp_int) qui
. restore

. preserve
. webuse laborsub, clear
. truncreg whrs kl6 k618 wa we, ll(0)
. cmp (whrs = kl6 k618 wa we, trunc(0 .)), ind($cmp_cont) qui
. restore

. preserve
. webuse sysdsn3, clear
. mprobit insure age male nonwhite site2 site3
. cmp (insure = age male nonwhite site2 site3, iia), nolr ind($cmp_mprobit) qui
. restore

. preserve
. webuse travel, clear
. asmprobit choice travelcost termtime, casevars(income) case(id) alternatives(mode) struct
. drop invehiclecost traveltime partysize
. reshape wide choice termtime travelcost, i(id) j(mode)
. constraint 1 [air]termtime1 = [train]termtime2
. constraint 2 [train]termtime2 = [bus]termtime3
. constraint 3 [bus]termtime3 = [car]termtime4
. constraint 4 [air]travelcost1 = [train]travelcost2
. constraint 5 [train]travelcost2 = [bus]travelcost3
. constraint 6 [bus]travelcost3 = [car]travelcost4
. cmp (air: choice1 = t*1) (train: choice2 = income t*2) (bus: choice3 = income t*3) (car: choice4 = income t*4), ind((6 6 6 6)) constr(1/6) nodrop struct tech(dfp)
. restore

. preserve
. webuse wlsrank, clear
. asroprobit rank high low if noties, casevars(female score) case(id) alternatives(jobchar) reverse
. reshape wide rank high low, i(id) j(jobchar)
. constraint 1 [esteem]high1 = [variety]high2
. constraint 2 [esteem]high1 = [autonomy]high3
. constraint 3 [esteem]high1 = [security]high4
. constraint 4 [esteem]low1 = [variety]low2
. constraint 5 [esteem]low1 = [autonomy]low3
. constraint 6 [esteem]low1 = [security]low4
. cmp (esteem: rank1 = high1 low1) (variety: rank2 = female score high2 low2) (autonomy: rank3 = female score high3 low3) (security: rank4 = female score high4 low4) if noties, ind((9 9 9 9)) tech(dfp) ghkd(200, type(hammersley)) rev constr(1/6)
. restore

* Heckman selection models.
. preserve

. webuse womenwk, clear
. heckman wage education age, select(married children education age) mills(heckman_mills)
. gen selectvar = wage < .
. cmp (wage = education age) (selectvar = married children education age), ind(selectvar $cmp_probit) nolr qui
. predict cmp_mills, eq(selectvar)
. replace cmp_mills = normalden(cmp_mills)/normal(cmp_mills)

. gen wage2 = wage > 20 if wage < .
. heckprob wage2 education age, select(married children education age)
. cmp (wage2 = education age) (selectvar = married children education age), ind(selectvar*$cmp_probit $cmp_probit) qui

. restore

* Hierarchical/random effects models
. preserve
. webuse union, clear
. gen double south_year = south * year
. xtprobit union age grade not_smsa south year south_year
. cmp (union = age grade not_smsa south year south_year || idcode:), ind($cmp_probit) nolr redraws(101, anti) tech(dfp)
. restore

. preserve
. webuse nlswork3, clear
. gen double south_year = south * year
. xttobit ln_wage union age grade not_smsa south year south_year, ul(1.9)
. replace ln_wage = 1.9 if ln_wage > 1.9
. cmp (ln_wage = union age grade not_smsa south year south_year || idcode:), ind("cond(ln_wage<1.899999, $cmp_cont, $cmp_right)") nolr redraws(101) tech(dfp)
. restore

. preserve
. webuse nlswork5, clear
. gen double south_year = south * year
. xtintreg ln_wage1 ln_wage2 union age grade south year south_year occ_code
. cmp (ln_wage1 ln_wage2 = union age grade south year south_year occ_code || idcode:), ind($cmp_int) nolr redraws(101, type(hammersley)) tech(dfp)
. restore

These examples go beyond standard commands:

. webuse laborsup

* Regress an unbounded, continuous variable on an instrumented, binary one. 2SLS is consistent but less efficient.
. cmp (other_inc = fem_work) (fem_work = kids), ind($cmp_cont $cmp_probit) qui robust
. ivreg other_inc (fem_work = kids), robust

* Now regress it on a left-censored one, female income, which is only modeled for observations in which the woman works.
. gen byte ind2 = cond(fem_work, cond(fem_inc, $cmp_cont, $cmp_left), $cmp_out)
. cmp (other_inc = fem_inc kids) (fem_inc = fem_educ), ind($cmp_cont ind2)

* "IV-oprobit". cmp (kids = fem_educ) (fem_educ = fem_work), ind($cmp_oprobit $cmp_cont) tech(dfp) nolr . margins, dydx(*) predict(eq(#1) pr outcome(#2)) force

* Ordered probit with Heckman selection modeling
. preserve
. webuse womenwk, clear
. gen selectvar = wage < .
. gen wage3 = (wage > 10) + (wage > 30) if wage < .
. cmp (wage3 = education age) (selectvar = married children education age), ind(selectvar*$cmp_oprobit $cmp_probit) qui
. restore

* Correlated random coefficient and random effect.
. preserve
. use http://www.stata-press.com/data/mlmus3/gcse, clear
. cmp (gcse = lrt || school: lrt), ind($cmp_cont) nolr redraws(101, anti) tech(dfp)
. restore

* Multinomial probit with heterogeneous preferences (random effects by individual)
. preserve
. use http://fmwww.bc.edu/repec/bocode/j/jspmix.dta, clear
. cmp (tby = sex, iia || scy3:), ind($cmp_mprobit) nolr redraws(47, anti) tech(dfp)
. restore

These illustrate subtleties of predict after cmp:

. webuse laborsup

* Bivariate seemingly unrelated ordered probit
. gen byte kids2 = kids + int(uniform()*3)
. cmp (kids=fem_educ) (kids2=fem_educ), ind($cmp_oprobit $cmp_oprobit) nolr tech(dfp) qui

* Predict fitted values. Fitted values are always the default, as is equation #1.
. predict xbA

* Two ways to predict fitted values for all equations
. predict xbB*
. predict xbC xbD

* Get scores for all equations and parameters
. predict sc*, score

* Get observation-level log-likelihoods
. predict lnl, lnl

* Two ways to predict kids=0, using (default) first equation
. predict prA, pr outcome(0)
. predict prB, outcome(#1)

* Predict kids2=4, using second equation
. predict prC, outcome(4) eq(kids2)

* Predict all outcomes, all equations.
. predict prD*, pr

* Same, but result variable names for the two equations start with prE and prF respectively.
. predict prE prF, pr

* Predict all outcomes, equation 2. Generates variables prG_Y where Y is outcome number (not outcome value).
. predict prG, eq(#2) pr

References

Bunch, D.S. 1991. Estimability in the multinomial probit model. Transportation Research 25B(1): 1-12.
Cappellari, L., and S. Jenkins. 2003. Multivariate probit regression using simulated maximum likelihood. Stata Journal 3(3): 278-94.
Drukker, D.M., and R. Gates. 2006. Generating Halton sequences using Mata. Stata Journal 6(2): 214-28. http://www.stata-journal.com/article.html?article=st0103
Gates, R. 2006. A Mata Geweke-Hajivassiliou-Keane multivariate normal simulator. Stata Journal 6(2): 190-213. http://www.stata-journal.com/article.html?article=st0102
Gould, W., J. Pitblado, and W. Sribney. 2006. Maximum Likelihood Estimation with Stata. 3rd ed. College Station, TX: Stata Press.
Greene, W.H. 2002. Econometric Analysis. 5th ed. Prentice-Hall.
Greene, W.H. 2011. Econometric Analysis. 7th ed. Prentice-Hall. Chapter 15.
Keane, M.P. 1992. A note on identification in the multinomial probit model. Journal of Business and Economic Statistics 10(2): 193-200.
Kelejian, H.H. 1971. Two-stage least squares and econometric systems linear in parameters but nonlinear in the endogenous variables. Journal of the American Statistical Association 66(334): 373-74.
Long, J.S., and J. Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata. 2nd ed. College Station, TX: Stata Press.
Pagan, A. 1979. Some consequences of viewing LIML as an iterated Aitken estimator. Economics Letters 3: 369-72.
Pitt, M.M., and S.R. Khandker. 1998. The impact of group-based credit programs on poor households in Bangladesh: Does the gender of participants matter? Journal of Political Economy 106(5): 958-96.
Rivers, D., and Q. Vuong. 1988. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics 39: 347-66.
Roodman, D. 2011. Estimating fully observed recursive mixed-process models with cmp. Stata Journal 11(2): 159-206.
Smith, R.J., and R.W. Blundell. 1986. An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica 54(3): 679-85.
Train, K. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge University Press. http://elsa.berkeley.edu/books/choice2.html

Author

David Roodman
Senior Fellow
Center for Global Development
Washington, DC
droodman@cgdev.org

Acknowledgements

Thanks to Kit Baum, David Drukker, Arne Hole, Stanislaw Kolenikov, and Mead Over for comments.

Also see

[R] ml, [R] biprobit, [R] probit, [R] oprobit, [R] sureg, [R] ivreg, [R] tobit, [R] cnreg, [R] intreg, [R] truncreg, [R] ivtobit, [R] ivprobit, [R] heckman, [R] heckprob, [SVY] svy estimation.