help cmp

Conditional (recursive) mixed process estimator with multilevel random effects and coefficients


cmp setup

- or -

cmp eq [eq ...] [if] [in] [weight] , indicators(exp [exp ...]) [level(#) quietly nolrtest redraws(# [# ...] , [type(halton | hammersley | ghalton | random) antithetics scramble steps(#)]) lf ghkdraws([#] , [type(halton | hammersley | ghalton | random) antithetics scramble]) nodrop interactive init(vector) noestimate covariance(covopt [covopt ...]) structural reverse psampling(# #) ml_opts svy svy_opts]

where covopt is

unstructured | exchangeable | independent

and eq is

[fe_equation] [|| re_equation] [|| re_equation ...]

Each fe_equation is an equation to be estimated, defined largely according to the ml model eq syntax. That is, fe_equation is enclosed in parentheses, optionally prefixed with a name for the equation:

( [eqname:] varname_y [varname_y2] = [indepvars] [, noconstant offset(varname_o) exposure(varname_e) truncpoints(exp exp) iia] )

indepvars may include factor variables; see fvvarlist.

varname_y2 is included only for interval-censored data, in a syntax analogous to that of intreg. truncpoints(exp exp) is allowed for equations involving all model types except mulitnomial and rank-ordered probit. iia is meaningful only for multinomial probit models without alternative-specific regressors.

Each exp in the required indicators() option is an expression that evaluates to a cmp indicator variable, which communicates required observation-level information about the dependent variable(s) in the corresponding equation, and can be a constant, a variable name, or a complex mathematical expression. It can contain spaces or parentheses if it is double-quoted. Each exp must evaluate to 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9, potentially varying by observation. For a multinomial probit equation group with alternative-specific regressors, the corresponding indicator expressions should all evaluate to 0's and 6's, and should be enclosed in an additional pair of parentheses. The same goes for rank-ordered probit groups, except with 9's instead of 6's.

In random effect/coefficent models, each re_equation specifies the effects, potentially at multiple levels, according to the syntax

levelvar: [varlist] [, noconstant covariance(covopt)] [weight]

where covopt is as defined above. In these models, the redraws() option is mandatory.

cmp may be prefixed with svy ... :.

pweights, fweights, aweights, and iweights are allowed at all levels; see help weights. Group-level, weights, specificed in the re_equation syntax above, should be constant within groups. Weights for a given level need be specified in just one equation.

The syntax of predict following cmp is

predict [type] {newvarname|stub*|newvarlist} [if exp] [in range] [, statistic equation(eqno[,eqno]) outcome(outcome) nooffset]

where statistic is xb, pr, stdp, stddp, lnl, scores, residuals, e(a b), or ystar(a b); and a and b may be numbers or variables; a missing (a > .) means minus infinity, and b missing (b > .) means plus infinity; see missing.

cmp shares features of all estimation commands; see help estcom.


cmp fits a large family of multi-equation, multi-level, conditional recursive mixed-process estimators. Those adjectives are defined as follows:

* "Multi-equation" means that it can fit Seemingly Unrelated (SUR) and instrumental variables (IV) systems.

* "Multi-level" means that random coefficients and effects (intercepts) can be modelled at various levels in hierarchical fashion, the classic example being a model of education outcomes with unobserved school and class effects. Since the models can also be multi-equation, random effects at a given level are allowed by default to be correlated across equations. E.g., school and class effects may be correlated across outcomes such as math and readings scores. Effects at different levels, however, are assumed uncorrelated with each other and with the observation-level errors.

* "Mixed process" means that different equations can have different kinds of dependent variables (response types). The choices, all generalized linear models with a Gaussian error distribution, are: continuous and unbounded (the classical linear regression model), tobit (left-, right-, or bi-censored), interval-censored, probit, ordered probit, multinomial probit, and rank-ordered probit. Pre-censoring truncation can be modeled for most response types. A dependent variable in one equation can appear on the right side of another equation.

* "Recursive" means, however, that cmp can only fit sets of equations with clearly defined stages, not ones with simultaneous causation. A and B can be modeled determinants of C and C a determinant of D--but D cannot be a modeled determinant of A, B, or C.

* "Conditional" means that the model can vary by observation. An equation can be dropped for observations for which it is not relevant--if, say, a worker retraining program is not offered in a city then the determinants of uptake cannot be modeled there. The type of a dependent variable can even vary by observation.

Broadly, cmp is appropriate for two classes of models: 1) those in which a truly recursive data-generating process is posited; and 2) those in which there is simultaneity, but instruments allow the construction of a recursive set of equations, as in two-stage least squares, that can be used to consistently estimate structural parameters in the final stage. In the first case, cmp is a full-information maximum likelihood (FIML) estimator, and all estimated parameters are structural. In the latter, it is a limited-information (LIML) estimator, and only the final stage's coefficients are structural, the rest being reduced-form parameters. What matters for the validity of cmp is that the system of equations is recursive, whether or not the model is.

cmp's modeling framework embraces those of the official Stata commands probit, ivprobit, treatreg, biprobit, tetrachoric, oprobit, mprobit, asmprobit, asroprobit, tobit, ivtobit, cnreg, intreg, truncreg, heckman, heckprob, xtreg, xtprobit, xttobit, xtintreg, in principle even regress, and sureg, as well as the user-written ssm, polychoric, triprobit, mvprobit, bitobit, mvtobit, oheckman, switch_probit, and (in its "non-endogenous" mode) bioprobit. It goes beyond them in several ways. Thanks to the flexibility of ml, on which it is built, it accepts coefficient constraints as well as all weight types, vce types (robust, cluster, linearized, etc.), and svy settings. And it offers more flexibility in model construction. For example, one can regress a continuous variable on two endogenous variables, one binary and the other sometimes left-censored, instrumenting each with additional variables. And cmp usually allows the model to vary by observations. Equations can have different samples, overlapping or non-overlapping. Heckman selection modeling can be incorporated into a wide variety of models through auxilliary probit equations. In some cases, the gain is consistent estimation where it was difficult before. Sometimes the gain is in efficiency. For example if C is continuous, B is a sometimes-left-censored determinant of C, and A is an instrument, then the effect of B on C can be consistently estimated with 2SLS (Kelejian 1971). However, a cmp estimate that uses the information that B is censored will be more efficient, based as it is on a more accurate model.

As a matter of algorithm, cmp is an SUR (seemingly unrelated regressions) estimator. It treats the equations as independent from each other except for modeling their underlying errors as jointly normally distributed. Mathematically, the likelihood it computes is conditioned on observing all right-side variables, including those that also appear on the left side of equations. However, it can actually fit a much larger class of models. Maximum likelihood (ML) SUR estimators, including cmp, are appropriate for many simultaneous equation models, in which endogenous variables appear on the right side of structural equations as well as the left. Models of this kind for which ML SUR is consistent must satisfy two criteria:

1) They are recursive. In other words, the equations can be arranged so that the matrix of coefficients of the dependent variables in each others' equations is triangular. As emphasized above, this means the models have clearly defined stages, though there can be more than one equation per stage.

2) They are "fully oberved." Dependent variables in one stage enter subsequent stages only as observed. Returning to the example in the first paragraph, if C is a categorical variable modeled as ordered probit, then C, not the latent variable underlying it, call it C*, must figure in the model for D.

As an illustration of the ideas here, some Stata estimation commands have wider applicability than many realize. sureg (X=Y) (Y=Z), isure typically matches ivregress 2sls X (Y=Z) exactly even though the documentation does not describe sureg as an instrumental variables (IV) estimator. (Iterated SUR is not a true ML estimator, but it converges to the same solution as ML-based SUR, as implemented, for example, in the demonstration command mysureg. See Pagan (1979) on the LIML-iterated SUR connection.) And biprobit (X=Y) (Y=Z) will consistently estimate an IV model in which X and Y are binary.

To inform cmp about the natures of the dependent variables and about which equations apply to which observations, the user must include the indicators() option after the comma in the cmp command line. This must contain one expression for each equation. The expression can be a constant, a variable name, or a more complicated mathematical formula. Formulas that contain spaces or parentheses should be enclosed in quotes. For each observation, each expression must evaluate to one of the following codes, with the meanings shown:

0 = observation is not in this equation's sample 1 = equation is "continuous" for this observation, i.e., has the OLS likelihood or is an uncensored observation in a tobit equation 2 = observation is left-censored for this (tobit) equation at the value stored in the dependent variable 3 = observation is right-censored at the value stored in the dependent variable 4 = equation is probit for this observation 5 = equation is ordered probit for this observation 6 = equation is multinomial probit for this observation 7 = equation is interval-censored for this observation 8 = equation is truncated on the left and/or right (obsolete because truncation is now a general modeling feature) 9 = equation is rank-ordered probit for this observation

For clarity, users can execute the cmp setup subcommand, which defines global macros that can then be used in cmp command lines:

$cmp_out = 0 $cmp_cont = 1 $cmp_left = 2 $cmp_right = 3 $cmp_probit = 4 $cmp_oprobit = 5 $cmp_mprobit = 6 $cmp_int = 7 $cmp_trunc = 8 $cmp_roprobit = 9

Since cmp is a Maximum Likelihood estimator built on ml, equations are specified according to the ml model syntax. This means that for instrumented regressions, cmp differs from ivregress, ivprobit, ivtobit, and similar commands in not automatically including exogenous regressors (included instruments) from the second stage in the first stage. So you must arrange for this yourself. For example, ivreg y x1 (x2=z) corresponds to cmp (y=x1 x2) (x2=x1 z), ind($cmp_cont $cmp_cont).

In order to model random coefficients and effects, cmp borrows syntax from xtmixed. It is best explained with examples. This eq specifies an equation with two levels of random effects corresponding to groups defined by the variables school and class:

(math = age || school: || class:)

(school, coming first, is understood to be "above" class in the hierarchy.) At a given level, random effects can be assumed present in some equations and not others. Those in more than one equation at a given level are assumed to be (potentially) correlated across equations (an assumption that can be overridden through constraints or the covariance() option). This specifies a school effect only for math but not reading scores, and potentially correlated class effects for both:

(math = age || school: || class:) (reading = age || class:)

This adds random coefficients on age at the class level in both equations. The two new coefficients are potentially correlated with each other and with the random intercepts at the same level:

(math = age || school: || class: age) (reading = age || class: age)

Weights of various types may be specified at each level. These should be defined by variables or expressions that are constant within each group of the given level. Within a given group, aweights and pweights are rescaled to sum to the number of groups in the next level down (or number of observations if it is the bottom level). pweights imply vce(cluster groupvar) where groupvar defines the highest level in the hierarchy having pweights. iweights and fweightss are not rescaled; the latter affect the reported sample size. Since weights must be the same across equations, they need be specified only once for each level. So these are equivalent:

(math = age || school: || class: [pw=weightvar1]) (reading = age || class:) (math = age || school: || class: [pw=weightvar1]) (reading = age || class: [pw=weightvar1])

and the contradiction here would cause an error:

(math = age || school: || class: [pw=weightvar1]) (reading = age || class: [pw=weightvar2])

Unlike xtreg, xtprobit, xttobit, xtintreg, xtmixed, and gllamm, which use quadrature to integrate likelihoods over the unobserved random effects (see [R] xtmixed), cmp uses simulation. This involves making many draws from the hypothesized normal distributions of the effects, computing the implied likelihood for each set of draws, then averaging (Train 2009; Greene 2011, chap 15). To govern the simulation, random effects models in cmp must include the redraws() option. This sets the number of draws per observation at each level, the type of sequence (Halton, Hammersley, generalized Halton, pseudorandom), whether antithetics are also drawn, and whether, in the Halton and Hammersley caes, sequences should be scrambled using the square-root scrambler. (See Gates 2006 for more on all these concepts except scrambling, for which see Kolenikov 2012.) For (generalized) Halton and Hammersley sequences, it is preferable to make the number of draws prime, to insure more variable coverage of the distribution from observation to observation, making coverage more uniform overall. Increasing the number of draws increases precision at the expense of time. In a bid for speed cmp can begin by estimating with just 1 draw per observation and random effect (plus the antithetics if specified). It can then use the result of this rough search as the starting point for an estimate with more draws, then repeat, multiplying by a fixed amount each time until reaching the specified number of draws. The steps(#) suboption of redraws() can override the default number of steps, which is just 1. redraws(50 50, steps(1)) would specify immediate estimation with the full 50 draws per observation in a three-level model (with two levels of random effects). See options below for more.

Estimation problems with observations that are censored in three or more equations, such as a trivariate probit, require calculation of cumulative joint normal distributions of dimension three or higher. This is a non-trivial problem. The preferred technique again involves simulation: the method of Geweke, Hajivassiliou, and Keane (GHK). (Greene 2011; Cappellari and Jenkins 2003; Gates 2006). cmp accesses the algorithm through the separately available Mata function ghk2(), which must be installed for cmp to work. Modeled on the built-in ghk() and ghkfast(), ghk2() gives users choices about the length and nature of the sequences generated for the simulation, which choices cmp largely passes on through the optional ghkdraws() option, which includes type(), antithetics, scramble suboptions. See options below for more.

For both domains of simulation--random coefficients/effects and cumulative normal distributions estimated with the GHK algorithm---each observation or group gets its own sequence of draws. Thus changing the order of the observations in the data set will change the estimation results--one hopes only slightly. If using pseudorandom sequences (ghktype(random)) or generalized Halton ones (ghktye(ghalton)), the state of the Stata random number generator also slightly influences the results. For exact reproducibility of your results with these sequences, initialize the seed to some chosen value with the set seed command each time before running cmp. Estimations that require simulation can run much more slowly than those that do not.

cmp starts by fitting each equation separately in order to obtain a good starting point for the full model fit. Sometimes in this preparatory step, convergence difficulties make a reported parameter covariance matrix singular, yielding missing standard errors for some regressors. Or variables are found to be collinear. In order to maximize the chance of convergence, cmp ordinarily drops such regressors from the equations in which they cause trouble, reruns the single-equation fit, and then leaves them out for the full model too. The nodrop option prevents this behavior.

On estimation with interval-censored or truncated equations

For equations with interval-censored observations, list two variables before the =, somewhat following the syntax of intreg. For example, cmp (y1 y2 = x1 x2), ind($cmp_int) indicates that the dependent variable is censored to intervals whose lower bounds are in y1 and upper bounds are in y2. Missing values in y1 are treated as -infinity and those in y2 are treated as +infinity. For observations in which y1 and y2 coincide, there is no censoring, and the likelihood is the same as for $cmp_cont.

For equations with truncated distributions--which can be any model type besides multinomial and rank-ordered probit--use the truncpoints(exp exp) option within the specification for the given equation to provide truncation points. Like indicator expressions, the truncation points can be constants, expressions, or variable names, with missing values interpreted as above. Observations in which the observed value lies outside the truncation range are automatically dropped for that equation. An example is below.

On estimation with multinomial probit equations

Multinomial probits can be specified with two different syntaxes, roughly corresponding to the Stata commands mprobit and asmprobit. In the first syntax, the user lists a single equation, just as for other dependent variable types, and puts a 6 ($cmp_mprobit) in the indicators() list. The dependent variable holds the choice made in each case. Like mprobit, cmp treats all regressors as determinants of choice for all alternatives. In particular, it expands the specified equation to a group with one "utility" equation for each possible choice. All equations in the group include all regressors, except for the first, which has none. This one corresponds to the lowest value of the dependent variable, which is taken as the base alternative. The next, corresponding to the second-lowest value, is the "scale alternative," meaning that to normalize results, the variance of its error term is fixed. (The value it is fixed at depends on whether the structural option is invoked, on which see below.) In the first syntax, the single eq can contain an iia option after the comma so that cmp, like mprobit, will impose the Independence of Irrelevant Alternatives assumption. I.e., cmp will assume that errors in the utility equations are uncorrelated and have unit variance.

Such models, ones without exclusion restrictions and without the IIA assumption, are formally identified as long as at least one regressor varies over alternatives (Keane 1992). However, Keane emphasizes that fits for such models tend to be unstable if there are no exclusion restrictions. There are two ways to impose exclusion restrictions with cmp. First, as with mprobit, you can use constraints.

Second, you can use cmp's other multinomial probit syntax. In this "alternative-specific" syntax, you list one equation in the cmp command line for each alternative, including the base alternative. Different equations may include different regressors. Unlike asmprobit, cmp does not force regressors that appear in more than one equation to have the same coefficient across alternatives, although again this restriction can be imposed through constraints. When using the alternative-specific syntax, the dependent variables listed should be a set of dummies, as can be generated with xi, noomit from the underlying choice variable. The first equation is always treated as the base alternative, so here you can control which alternative is the base alternative, by reordering the equations. In general, regressors that appear in all other equations should be excluded from the base alternative. Otherwise, unless a constraint is imposed to reduce the degrees of freedom, the model will not be identified. (cmp automatically excludes the constant from the base alternative equation.) Variables that are specific to the base alternative, however, or to a strict subset of alternatives, can be included in the base alternative equation. In the second syntax, the IIA is not assumed, nor available through an option. It can still be imposed through constraints.

To specify an alternative-specific multinomial probit group, include expressions in the indicators() that evaluate to 0 or 6 ($cmp_out or $cmp_mprobit) for each equation (0 indicating that the choice is not available for given observations). You must enclose the whole list in an additional set of parentheses. Note that unlike with asmprobit, there should be still be one row in the data set per case, not per case and alternative. So instead of variables that vary by alternative, there must be a version of that variable for each alternative--not a single travel time variable that varies by mode of travel, but an air travel time variable, a bus travel time variable, and so on. An alternative-specific multinomial example is also below.

In a multinomial probit model with J choices, each possible choice has its own structural equation, including an error term. These error terms have some covariance structure. It is impossible, however, to estimate all the entries of the JxJ covariance matrix (Train 2003; Long and Freese (2006)). What is used in the computation of the likelihood is the (J-1)x(J-1) covariance matrix of the differences of the non-base-alternative errors from the base-alternative error. So by default, cmp, much like asmprobit, interprets the sigma and rho parameters relating to these equations as characterizing these differenced errors. To eliminate an excessive degree of scaling freedom, it constrains the error variance of the first non-base-alternative equation (the "scaling alternative") to 2, which it would be anyway if the errors for the first two structural equations were i.i.d. standard normal (so that their difference has variance 2).

The disadvantage of this parameterization is that it is hard to think about if you want to impose additional constraints on it. As an alternative, cmp, like asmprobit, offers a structural option. When this is included, cmp creates a full set of parameters to describe the covariance of the J structural errors. To remove the excessive degrees of freedom, it then constrains the base alternative error to have variance 1 and no correlation with the other errors; and constrains the error for the second, scaling alternative to also have variance 1. To impose the IIA under this option, one would then constrain various "atanhrho" and "lnsig" parameters to 0. An example below shows how to estimate the same IIA model with and without the structural parameterization.

The intuitiveness of the structural parameterization comes at a real cost, however (Bunch (1991); Long and Freese (2006), pp. 325-29). Though the particular set of constraints imposed seems innocent, it actually results in a mapping from the space of allowed structural covariances to the space of possible covariance matrices for the relative-differenced errors that is not onto. That is, there are positive definite (J-1)x(J-1) matrices, valid candidates for the covariance of the relative-differenced errors, which are not compatible with the assumptions that the first two alternatives' structural errors have variance one and that the first, base alternative's error is uncorrelated with all other structural errors. So the structural option can prevent cmp from reaching the maximum-likelihood fit. Long and Freese (2006) describe how changing which alternatives are the base and scaling alternatives, by reording the equations, can sometimes free an estimator to find the true maximum within the structural parametrization.

On estimation with rank-ordered probit equations

Specification and treatment of rank-ordered probit equations is nearly identical to that in the second syntax for multinomial probits described just above. Equations must be listed for every alternative. Indicators for these equations must be either 0 or 9 ($cmp_out or $cmp_roprobit) for each observation, and grouped in an extra set of parentheses. Constraints defining the base and scale alternatives are automatically imposed in the same way. The structural option too works identically. One option relating specifically to rank-ordered probit is reverse. It instructs cmp to interpret lower-numbered rankings as higher instead of lower.

Tips for achieving and speeding convergence

If you are having trouble achieving (or waiting for) convergence with cmp, these techniques might help:

1. Changing the search techniques using ml's technique() option, or perhaps the search parameters, through its maximization options. cmp accepts all these and passes them on to ml. The default Newton-Raphson search method usually works very well once ml has found a concave region. The DFP algorithm (tech(dfp)) often works better before then, and the two can be mixed, as with tech(dfp nr). See the details of the technique() option at ml. 2. If the estimation problem requires the GHK algorithm (see above), change the number of draws per observation in the simulation sequence using the ghkdraws() option. By default, cmp uses twice the square root of the number of observations for which the GHK algorithm is needed, i.e., the number of observations that are censored in at least three equations. Raising simulation accuracy by increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision. On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision, with remarkably few draws per observations--as few as 5 when the sample size is 10,000 (Cappellari and Jenkins 2003). And taking more draws can also greatly extend execution time. 3. If getting many "(not concave)" messages, try the difficult option, which instructs ml to use a different search algorithm in non-concave regions. 4. If the search appears to be converging in likelihood--if the log likelihood is hardly changing in each iteration--and yet fails to converge, try adding a nrtolerance(#) or nonrtolerance option to the command line after the comma. These are ml options that control when convergence is declared. (See ml_opts, below.) By default, ml declares convergence when the log likelihood is changing very little with successive iterations (within tolerances adjustable with the tolerance(#) and ltolerance(#) options) and when the calculated gradient vector is close enough to zero. In some difficult problems, such as ones with nearly collinear regressors, the imprecision of floating point numbers prevents ml from quite satisfying the second criterion. It can be loosened by using the nrtolerance(#) to set the scaled gradient tolerance to a value larger than its default of 1e-5, or eliminated altogether with nonrtolerance. Because of the risks of collinearity, cmp warns when the condition number of an equation's regressor matrix exceeds 20 (Greene 2000, p. 40). 5. Try cmp's interactive mode, via the interactive option. This allows the user to interrupt maximization by hitting Ctrl-Break or its equivalent, investigate and adjust the current solution, and then restart maximization by typing ml max. Techniques for exploring and changing the current solution include displaying the current coefficient and gradient vectors (with mat list $ML_b or mat list $ML_g) and running ml plot. See help ml, [R] ml, and Gould, Pitblado, and Sribney (2006) for details. cmp is slower in interactive mode.


indicators(exp [exp ...]) is required. It should pass a list of expressions that evaluate to 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 for every observation, with one expression for each equation and in the same order. Expressions can be constants, variable names, or formulas. Individual formulas that contain spaces or parentheses should be enclosed in quotes.

level(#) specifies the confidence level, in percent, for confidence intervals of the coefficients; see help level. The default is 95.

quietly suppresses most output: the results from any single-equation initial fits and the iteration log during the full model fit.

nolrtest suppresses calculation and reporting of the likelihood ratio (LR) test of overall model fit, relative to a constant(s)-only model. This has no effect if data are pweighted or errors are robust or clustered. In those cases, the likelihood function does not reflect the non-sphericity of the errors, and so is a pseudolikelihood. The LR test is then invalid and is not run anyway.

redraws(# [# ...] , [type(halton | hammersley | ghalton | random) antithetics scramble steps(#)]) is required for random coefficient/effects models. It governs the simulation-based estimation of them. The option should begin with one number (#) for each level of the model above the base (e.g., two numbers in a three-level model); these specify the number of draws per observation from the simulated distributions of the random effects. The optional type() suboption sets the sequence type; the default is halton. The optional antithetics suboption doubles the number of draws at all levels by including antithetics. For more on these concepts, see See Drukker and Gates (2006). The optional scramble option invokes the square-root scrambler for Halton and Hammersley sequences. (See Kolenikov 2012. This scrambler has no effect for primes 2 and 3.) The optional steps(#) suboption sets the number of times to fit the model as the number of draws at each level is geometrically increased to the specified final values. The preliminary runs all use the Newton-Raphson search algorithm and ml's nonrtolerance tolerance(0.1) options in order to accept coarse fits. This stepping is meant only to increase speed by using fewer draws until the search is close to the maximum. It can be disabled with steps(1).

lf makes cmp use its lf-method evaluator instead of its d2-method one (for Stata 10 or earlier) or lf1-method one (Stata 11 or later). This forces cmp to compute first derivatives of the likelihood numerically instead of analytically, which substantially slows estimation but occassionally improves convergence.

ghkdraws([#] , [type(halton | hammersley | ghalton | random) antithetics scramble ]) governs the draws used in GHK simulation of higher-dimensional cumulative multivariate normal distributions. It is similar to the redraws option. However, it takes at most one number; if it, or the entire option, is omitted, the number of draws is set rather arbitrarily to twice the square root of the number of observations for which the simulation is needed. (Simulated maximum likelihood is consistent as long as the number of draws rises with the square root of the number of observations. In practice, a higher number of observations often reduces the number of draws per observation needed for reliable results. Train 2009, p. 252.) As in the redraws() option, the optional type(string) suboption specifies the type of sequence in the GHK simulation, halton being the default; antithetics requests antithetic draws; and scramble applies the square-root scrambler.

nodrop prevents the dropping of regressors from equations in which they receive missing standard errors in initial single-equation fits. It also prevents the removal of collinear variables.

covariance(covopt [covopt ...]) offers shorthand ways to constrain the cross-equation correlation structure of the errors at each level--shorthand, that is, compared to using constraint. There should be one covopt for each level in the model, and each can be unstructured, exchangeable, or independent. unstructured, the default, imposes no constraint. exchangeable specifies that all correlations between random effects, coefficients, or residual errors in different equations, within a given level are the same; and likewise for all the variances of all these areas. independent sets all cross-equation correlations at a given level to zero. Separately, the re_equation syntax documented above also includes a covariance() option. With the same choices and defintitions, it controls variances and covariances within a given equation at a given level.

interactive makes cmp fit the full model in ml's interactive mode. This allows the user to interrupt the model fit with Ctrl-Break or its equivalent, view and adjust the trial solution with such commands as ml plot, then restart optimization by typing ml max. See help ml, [R] ml, and Gould, Pitblado, and Sribney (2006) for details. cmp runs more slowly in interactive mode.

init(vector) passes a row vector of user-chosen starting values for the full model fit, in the manner of the ml init, copy command. The vector must contain exactly one element for each parameter cmp will estimate, and in the same order as cmp reports the parameter estimates in the output. Thus, at the end will be the initial guesses for the lnsig_i parameters, then those for the atanhrho_ij, then those for any ordered-probit cuts. (cmp normally also reports sig_i's and rho_ij's, but these are not additional parameters, merely transformed versions of underlying ones, and should be ignored in building the vector of starting values.) The names of the row and columns of the vector do not matter.

noestimate simplifies the job of constructing an initial vector for the init() option. It instructs cmp to stop before fitting the full model and leave behind an e(b) return vector with one labeled entry for each free parameter. To view this vector, type mat list e(b). You can copy or edit this vector, such as with "mat b=e(b)", then pass it back to cmp with the init() option.

structural forces the structural covariance parameterization for all multinomial and rank-ordered equation groups. See above for more.

reverse instructs cmp to interpret lower-numbered ranks in rank-ordered probit equations as being higher.

psampling(# #) makes cmp perform "progressive sampling," which can speed estimation on large data sets. First it estimates on a small subsample, then a larger one, etc., until reaching the full sample. Each iteration uses the previous one's estimates as a starting point. The first argument in the option sets the initial sample size, either in absolute terms (if it is at least 1) or as a fraction of the full sample (if it is less than 1). The second argument is the factor by which the sample should grow in each iteration. This process is analogous to but distinct from the stepping that occurs by default in simulating random effects.

ml_opts: cmp accepts the following standard ml options: trace gradient hessian showstep technique(algorithm_specs) vce(oim|opg|robust|cluster) iterate(#) tolerance(#) ltolerance(#) gtolerance(#) nrtolerance(#) nonrtolerance shownrtolerance difficult constraints(clist) score(newvarlist|stub*)

svy indicates that ml is to pick up the svy settings set by svyset and use the robust variance estimator. This option requires the data to be svyset. svy may not be specified with vce() or weights. See help svy estat.

svy_opts: Along with svy, users may also specify any of these related ml options, which affect how the svy-based variance is estimated: nosvyadjust subpop(subpop_spec) srssubpop. And users may specify any of these ml options, which affect output display: deff deft meff meft eform prob ci. See help svy estat.

On predict and margins after cmp

Options for predict after cmp are:

equation(eqno[,eqno]) specify equation(s) xb linear prediction stdp standard error of linear prediction stddp standard error of difference in linear predictions lnl observation-level log likelihood (in hierarchical models, averaged over groups) scores derivative of the log likelihood with respect to xb or parameter residuals calculate the residuals pr probability of a positive outcome (meant for probit equations) e(# #) censored expected value (see help regress postestimation) ystar(# #) truncated expected value (see help regress postestimation) outcome(outcome) specify outcome(s), for ordered probit only nooffset ignore any offset() or exposure() variable

Note that the e(# #) and ystar(# #) options should not include a comma between the two bounds.

eqno can be an equation name (if not set explicitly, an equation's name is that of its dependent variable). Or it can be an equation number preceded by a #. The default equation is #1, unless the provided variable list has one entry for each equation, or takes the form stub*. These request prediction variables for all equations, with names as given or as automatically generated beginning with stub.

In contrast, for ordered probit equations, if pr is specified, predict will by default compute probability variables for all outcomes. The names for these variables will be automatically generated using a provided variable name as a stub. This stub may be directly provided in the command line--in which case it should not include a *--or may itself be automatically generated by a cross-equation stub*. Thus it is possible to generate probabilities for all outcomes in all ordered probit equations with a single, terse command. Alternatively, the outcome(outcome) option can be used to request probabilities for just one outcome. outcome can be a value for the dependent variable, or a category number preceded by a #. For example, if the categorical dependent variable takes the values 0, 3, and 4, then outcome(4) and outcome(#3) are synonyms. (outcome() also implies pr.)

In explaining the multi-equation and -outcome behavior of predict after cmp, examples are worth a thousand words.

The flexibility of cmp affects the use of predict and margins after estimation. Because the censoring type (probit, tobit, etc.) can technically vary by observation, the default statistic for predict is always xb, linear fitted values. This is unlike for probit and oprobit, after which the default is pr, predicted probabilities of outcomes. So to obtain probilities predicted by (ordered) probit equations, remember to include the pr option in the predict command line or predict(pr) in the margins command line. (For ordered probit equations, an outcome() option will also imply pr.)

When using margins to estimate marginal effects with respect to one equation in a multi-equation cmp model, users may need to include the force option in the cmd:margins} command line.

Examples of predict and margins after cmp are below.


cmp is not an official Stata command. It is a free contribution to the research community. Please cite it as such: Roodman, D. 2011. Estimating fully observed recursive mixed-process models with cmp. Stata Journal 11(2): 159-206.

Published examples See Google Scholar.

Introductory examples

The purpose of cmp is not to match standard commands, but to fit models otherwise beyond easy estimation in Stata. But replications illustrate how cmp works (colored text is clickable):

* Define indicator macros for clarity. . cmp setup

. webuse laborsup

* Make censoring level 0 for fem_inc since pre-Oct '07 ivtobit assumes it is because of bug. . replace fem_inc = fem_inc - 10

. reg kids fem_inc male_educ . cmp (kids = fem_inc male_educ), ind($cmp_cont) quietly

. sureg (kids = fem_inc male_educ) (fem_work = male_educ), isure . cmp (kids = fem_inc male_educ) (fem_work = male_educ), ind($cmp_cont $cmp_cont) quietly

. mvreg fem_educ male_educ = kids other_inc fem_inc . cmp (fem_educ = kids other_inc fem_inc) (male_educ = kids other_inc fem_inc), ind(1 1) qui

. ivreg fem_work fem_inc (kids = male_educ), first . cmp (kids = fem_inc male_educ) (fem_work = kids fem_inc), ind($cmp_cont $cmp_cont) qui

. ivregress liml fem_work fem_inc (kids = male_educ other_inc) . cmp (kids = fem_inc male_educ other_inc) (fem_work = kids fem_inc), ind($cmp_cont $cmp_cont) qui}

. probit kids fem_inc male_educ . predict p . margins, dydx(*) . cmp (kids = fem_inc male_educ), ind($cmp_probit) qui . predict p2, pr . margins, dydx(*) predict(pr)

. oprobit kids fem_inc male_educ . margins, dydx(*) predict(outcome(#2)) . cmp (kids = fem_inc male_educ), ind($cmp_oprobit) qui . margins, dydx(*) predict(eq(#1) outcome(#2) pr)

. gen byte anykids = kids > 0 . biprobit (anykids = fem_inc male_educ) (fem_work = male_educ) . cmp (anykids = fem_inc male_educ) (fem_work = male_educ), ind($cmp_probit $cmp_probit)

. tetrachoric anykids fem_work . cmp (anykids = ) (fem_work = ), ind($cmp_probit $cmp_probit) nolr qui

. ivprobit fem_work fem_educ kids (other_inc = male_educ), first . margins, predict(pr) dydx(*) . cmp (fem_work = other_inc fem_educ kids) (other_inc = fem_educ kids male_educ), ind($cmp_probit $cmp_cont) . margins, predict(pr eq(#1)) dydx(*) force

. treatreg other_inc fem_educ kids, treat(fem_work = male_educ) . cmp (other_inc = fem_educ kids fem_work) (fem_work = male_educ), ind($cmp_cont $cmp_probit) qui

. tobit fem_inc kids male_educ, ll . cmp (fem_inc = kids male_educ), ind("cond(fem_inc, $cmp_cont, $cmp_left)") qui

. ivtobit fem_inc kids (male_educ = other_inc), ll first . cmp (fem_inc=kids male_educ) (male_educ=kids other_inc), ind("cond(fem_inc,$cmp_cont,$cmp_left)" $cmp_cont)

. preserve . webuse intregxmpl, clear . intreg wage1 wage2 age age2 nev_mar rural school tenure . cmp (wage1 wage2 = age age2 nev_mar rural school tenure), ind($cmp_int) qui . restore

. preserve . webuse laborsub, clear . truncreg whrs kl6 k618 wa we, ll(0) . cmp (whrs = kl6 k618 wa we, trunc(0 .)), ind($cmp_cont) qui . restore

. preserve . webuse sysdsn3, clear . mprobit insure age male nonwhite site2 site3 . cmp (insure = age male nonwhite site2 site3, iia), nolr ind($cmp_mprobit) qui . restore

. preserve . webuse travel, clear . asmprobit choice travelcost termtime, casevars(income) case(id) alternatives(mode) struct . drop invehiclecost traveltime partysize . reshape wide choice termtime travelcost, i(id) j(mode) . constraint 1 [air]termtime1 = [train]termtime2 . constraint 2 [train]termtime2 = [bus]termtime3 . constraint 3 [bus]termtime3 = [car]termtime4 . constraint 4 [air]travelcost1 = [train]travelcost2 . constraint 5 [train]travelcost2 = [bus]travelcost3 . constraint 6 [bus]travelcost3 = [car]travelcost4 . cmp (air:choice1=t*1) (train: choice2=income t*2) (bus: choice3=income t*3) (car: choice4=income t*4), ind((6 6 6 6)) constr(1/6) nodrop struct tech(dfp) . restore

. preserve . webuse wlsrank, clear . asroprobit rank high low if noties, casevars(female score) case(id) alternatives(jobchar) reverse . reshape wide rank high low, i(id) j(jobchar) . constraint 1 [esteem]high1=[variety]high2 . constraint 2 [esteem]high1=[autonomy]high3 . constraint 3 [esteem]high1=[security]high4 . constraint 4 [esteem]low1=[variety]low2 . constraint 5 [esteem]low1=[autonomy]low3 . constraint 6 [esteem]low1=[security]low4 . cmp (esteem:rank1=high1 low1)(variety:rank2=female score high2 low2)(autonomy:rank3=female score high3 low3)(security:rank4=female score high4 low4) if noties,ind((9 9 9 9)) tech(dfp) ghkd(200, type(hammersley)) rev constr(1/6) . restore

* Heckman selection models.

. preserve

. webuse womenwk, clear . heckman wage education age, select(married children education age) mills(heckman_mills) . gen selectvar = wage<. . cmp (wage = education age) (selectvar = married children education age), ind(selectvar $cmp_probit) nolr qui . predict cmp_mills, eq(selectvar) . replace cmp_mills = normalden(cmp_mills)/normal(cmp_mills)

. gen wage2 = wage > 20 if wage < . . heckprob wage2 education age, select(married children education age) . cmp (wage2 = education age) (selectvar = married children education age), ind(selectvar*$cmp_probit $cmp_probit) qui

. restore

* Hierarchical/random effects models

. preserve . webuse union, clear . gen double south_year = south * year . xtprobit union age grade not_smsa south year south_year . cmp (union = age grade not_smsa south year south_year || idcode:), ind($cmp_probit) nolr redraws(101, anti) tech(dfp) . restore

. preserve . webuse nlswork3, clear . gen double south_year = south * year . xttobit ln_wage union age grade not_smsa south year south_year, ul(1.9) . replace ln_wage = 1.9 if ln_wage > 1.9 . cmp (ln_wage = union age grade not_smsa south year south_year || idcode:), ind("cond(ln_wage<1.899999, $cmp_cont, $cmp_right)") nolr redraws(101) tech(dfp) . restore

. preserve . webuse nlswork5, clear . gen double south_year = south * year . xtintreg ln_wage1 ln_wage2 union age grade south year south_year occ_code . cmp (ln_wage1 ln_wage2 = union age grade south year south_year occ_code || idcode:), ind($cmp_int) nolr redraws(101, type(hammersley)) tech(dfp) . restore

These examples go beyond standard commands:

. webuse laborsup

* Regress an unbounded, continuous variable on an instrumented, binary one. 2SLS is consistent but less efficient. . cmp (other_inc = fem_work) (fem_work = kids), ind($cmp_cont $cmp_probit) qui robust . ivreg other_inc (fem_work = kids), robust

* Now regress it on a left-censored one, female income, which is only modeled for observations in which the woman works. . gen byte ind2 = cond(fem_work, cond(fem_inc, $cmp_cont, $cmp_left), $cmp_out) . cmp (other_inc=fem_inc kids) (fem_inc=fem_edu), ind($cmp_cont ind2)

* "IV-oprobit" . cmp (kids = fem_educ) (fem_educ = fem_work), ind($cmp_oprobit $cmp_cont) tech(dfp) nolr . margins, dydx(*) predict(eq(#1) pr outcome(#2)) force

* Ordered probit with Heckman selection modeling . preserve . webuse womenwk, clear . gen selectvar = wage<. . gen wage3 = (wage > 10)+(wage > 30) if wage < . . cmp (wage3 = education age) (selectvar = married children education age), ind(selectvar*$cmp_oprobit $cmp_probit) qui . restore

* Correlated random coefficient and random effect. . preserve . use, clear . cmp (gcse = lrt || school: lrt), ind($cmp_cont) nolr redraws(101, anti) tech(dfp) . restore * Multinomial probit with heterogeneous preferences (random effects by individual) . preserve . use, clear . cmp (tby = sex, iia || scy3:), ind($cmp_mprobit) nolr redraws(47, anti) tech(dfp) . restore

These illustrate subtleties of predict after cmp:

. webuse laborsup

* Bivariate seemingly unrelated ordered probit . gen byte kids2 = kids + int(uniform()*3) . cmp (kids=fem_educ) (kids2=fem_educ), ind($cmp_oprobit $cmp_oprobit) nolr tech(dfp) qui * Predict fitted values. Fitted values are always the default, as is equation #1 . predict xbA * Two ways to predict fitted values for all equations . predict xbB* . predict xbC xbD * Get scores for all equations and parameters . predict sc*, score * Get observation-level log-likelihoods . predict lnl, lnl * Two ways to predict kids=0, using (default) first equation . predict prA, pr outcome(0) . predict prB, outcome(#1) * Predict kids2=4, using second equation . predict prC, outcome(4) eq(kids2) * Predict all outcomes, all equations. . predict prD*, pr * Same but result variable names for the two equations start with prE and prF respectively. . predict prE prF, pr * Predict all outcomes, equation 2. Generates variables prG_Y where Y is outcome number (not outcome value). . predict prG, eq(#2) pr


Bunch, D.S. 1991. Estimability in the multinomial probit model. Transportation Research. 25B(1): 1-12. Cappellari, L., and S. Jenkins. 2003. Multivariate probit regression using simulated maximum likelihood. Stata Journal 3(3): 278-94. Drukker, D.M., and R. Gates. 2006. Generating Halton sequences using Mata. Stata Journal 6(2): 214-28. Gates, R. 2006. A Mata Geweke-Hajivassiliou-Keane multivariate normal simulator. Stata Journal 6(2): 190-213. Gould, W., J. Pitblado, and W. Sribney. 2006. Maximum Likelihood Estimation with Stata. 3rd ed. College Station: Stata Press. Greene, W.H. 2002. Econometric Analysis, 5th ed. Prentice-Hall. Greene, W.H. 2011. Econometric Analysis, 7th ed. Prentice-Hall. Chapter 15 Keane, M.P. 1992. A note on identification in the multinomial probit model. Journal of Business and Economics Statistics 10(2), pp. 193-200. Kelejian, H.H. 1971. Two-stage least squares and econometric systems linear in parameters but nonlinear in the endogenous variables. Journal of the American Statistical Association 66(334): 373-74. Long, J. S., and J. Freese. 2006. Regression models for categorical dependent variables using Stata. 2nd ed. College Station, TX: Stata Press. Pagan. A. 1979. Some consequences of viewing LIML as an iterated Aiken estimator. Economics Letters 3:369-372. Pitt, M.M., and S. R. Khandker. 1998. The impact of group-based credit programs on poor households in Bangladesh: Does the gender of participants matter? Journal of Political Economy 106(5): 958-96. Rivers, D., and Q. Vuong. 1988. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics 39: 347-66. Roodman, D. 2011. Estimating fully observed recursive mixed-process models with cmp. Stata Journal 11(2): 159-206. Smith, R.J., and R.W. Blundell. 1986. An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica 54(3): 679-85. Train, K. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge University Press.


David Roodman Senior Fellow Center for Global Development Washington, DC


Thanks to Kit Baum, David Drukker, Arne Hole, Stanislaw Kolenikov, and Mead Over for comments.

Also see

[R] ml, [R] biprobit, [R] probit, [R] oprobit, [R] sureg, [R] ivreg, [R] tobit, [R] cnreg, [R] intreg, [R] truncreg, [R] ivtobit, [R] ivprobit, [R] heckman, [R] heckprob, [SVY] svy estimation.