-------------------------------------------------------------------------------
help for mlboolean, mlboolfirst
-------------------------------------------------------------------------------

Boolean Logit and Probit

mlboolean link_function n (calculus) (depvar) (indepvars1) (indepvars2) ... [, ystar(varname) ml_options]

The syntax of mlboolfirst after mlboolean is

mlboolfirst indepvar

Description

mlboolean conducts a maximum-likelihood Boolean logit or probit estimation (see Braumoeller 2003) in which there are multiple causal "paths" to a given outcome or nonoutcome. Each of the n paths is modeled as a standard logit or probit curve, and the predicted values for all of the curves cumulate in a manner described by a Boolean probability calculus (for example, a and (b or c) cause y) to produce the observed binary dependent variable.

Example. Imagine that in a given firm there are three ways in which an employee can be fired: as a result of embezzlement, as a result of poor performance, or as a result of a combination of company-wide budget cutbacks and a low position in the company's hierarchy. We know whether each employee was fired, how much he or she embezzled, some general indicators of job performance (on-time percentage, recent customer service ratings, and recent overall performance ratings), changes in the company's earnings and stock value, and where the employee's position is in the company hierarchy. We do not know the combination of reasons that resulted in termination. This gives a system with four unobserved variables, modeled here as standard probit equations

y1* = norm(b1 + b2(embezzle))

y2* = norm(b3 + b4(ontime) + b5(custserv) + b6(performance))

y3* = norm(b7 + b8(earnings) + b9(stock))

y4* = norm(b10 + b11(hierarchy)),

and an observed dependent variable that is the binary realization of an underlying process of the form

p(fired) = 1 - (1-y1*) x (1-y2*) x (1-(y3* x y4*))

- that is, y1 or y2 or (y3 and y4) cause termination. We might then estimate an equation of the following form:

. mlboolean probit 4 (a or b or (c and d)) (fired) (embezzle) (ontime custserv performance) (earnings stock) (hierarchy)

The predicted values for "fired" and for the various yn* are estimated automatically (see "Saved Variables," below).
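The Boolean cumulation just described can be sketched in a few lines of Python (an illustrative sketch of the probability model only, not of mlboolean's estimation code; the function names are hypothetical): each path probability is a standard normal CDF applied to its linear index, and the calculus a or b or (c and d) combines the path probabilities.

```python
from math import erf, sqrt

def probit(xb):
    # Standard normal CDF: the probability that one causal path is "on",
    # given its linear index xb (e.g., b1 + b2*embezzle)
    return 0.5 * (1 + erf(xb / sqrt(2)))

def p_fired(y1, y2, y3, y4):
    # Boolean cumulation for the calculus "a or b or (c and d)":
    # the outcome fails to occur only if every path fails
    return 1 - (1 - y1) * (1 - y2) * (1 - y3 * y4)
```

Note the complement form: "or" across paths is computed as one minus the product of the failure probabilities, while "and" multiplies the path probabilities directly.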

mlboolfirst calculates and graphs predicted values for a given independent variable. All other variables are set to their means, except for variables of type int, which are set to their modal values. Similarly, predictions are plotted as curves for continuous variables and as points for integers. To change whether or not mlboolfirst flags a variable as an integer, simply use the recast command.

Required Input

mlboolean requires the following, in order:

- A link function (either logit or probit).
- The number of causal paths n (n <= 5).
- The probability calculus that describes how the paths cumulate, using "a" to denote y1*, "b" to denote y2*, and so on, connected by "and" or "or". Examples: (a or b or c), ((a or b) and c and d).
- The binary dependent variable, in parentheses.
- n sets of independent variables, in parentheses. The independent variables can overlap partially - for example, one set might consist of x1-x4 and another of x1, x2, x5 and x6 - if the probabilities of the antecedent events in question are thought to be correlated because each is influenced by x1 and x2.

Options

ystar(varname) specifies the names of the variables that will contain the predicted values of the latent variables associated with each of the n "paths" (see "Saved Variables"). The default is ystar(ystar).

ml_options: All other options are passed directly to ml, so any options that work with the latter should work with the former. See the ml documentation for further details. It is worth noting, however, that some of these options are particularly relevant in the context of mlboolean:

gtolerance(#) specifies an optional tolerance for the gradient relative to the coefficients. When |b_i*g_i| <= gtolerance() for all parameters b_i and the corresponding elements of the gradient g_i, the gradient tolerance criterion is met. Unlike tolerance() and ltolerance(), the gtolerance() criterion must be met in addition to any other tolerance; that is, convergence is declared when gtolerance() is met and tolerance() or ltolerance() is also met. The gtolerance() option is provided for particularly deceptive likelihood functions that may trigger premature declarations of convergence. The option must be specified for gradient checking to be activated; by default the gradient is not checked.
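The criterion just described amounts to an element-wise check; a minimal sketch in Python (the vectors b and g are hypothetical inputs, and this is not ml's internal code):

```python
def gtolerance_met(b, g, gtol):
    # Gradient criterion: |b_i * g_i| <= gtol for every coefficient b_i
    # and its corresponding gradient element g_i
    return all(abs(bi * gi) <= gtol for bi, gi in zip(b, g))
```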

lf0(#k #ll) specifies the number of parameters and log-likelihood value of the "constant-only" model so that mlboolean can report a likelihood-ratio test rather than a Wald test.

difficult specifies that the likelihood function is likely to be difficult to maximize. In particular, difficult states that there may be regions where -H is not invertible and that, in those regions, Stata's standard fixup may not work well. difficult specifies that a different fixup requiring substantially more computer time is to be used. difficult can be of some help in obtaining "normal" parameter estimates when plateaus in profile likelihoods produce absurdly large standard errors; it can also make things worse. Such situations are typically indicative of a dangerous lack of information and should be treated with caution.

init(ml_init_args) sets the initial parameter values. Because mlboolean can produce convoluted likelihood functions, the wise investigator will try an array of different starting values before reporting final results.

nrtolerance(#) specifies an optional tolerance that is based on the gradient g and Hessian H. The tolerance is met when g*inv(H)*g' < nrtolerance(). Like gtolerance(), the nrtolerance() criterion must be met in addition to any other tolerance. This option must be specified for g*inv(H)*g' to be checked; by default it is not.

technique()specifies how the likelihood function is to be maximized. The following algorithms are currently implemented in ml.

technique(nr)specifies Stata's modified Newton-Raphson (NR) algorithm.

technique(bhhh)specifies the Berndt-Hall-Hall-Hausman (BHHH) algorithm.

technique(dfp) specifies the Davidon-Fletcher-Powell (DFP) algorithm.

technique(bfgs) specifies the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.

It is possible to switch between algorithms by specifying more than one in the technique() option. For example, specifying technique(bhhh dfp) will cause ml to switch between the BHHH and DFP algorithms. ml will use an algorithm for 5 iterations, by default, before switching to the next algorithm. Thus technique(bhhh dfp) will cause ml to switch between BHHH and DFP every 5 iterations. You may specify a different number of iterations for each algorithm by including a count after it. For example, technique(bhhh 10 nr 1000) will cause ml to optimize the likelihood using BHHH for 10 iterations before switching to the modified Newton-Raphson algorithm, then switch back to BHHH after ml spends 1000 iterations using NR.

search(on|quietly|off) specifies whether ml search is to be used to improve the initial values. Note that search(on) is the default.

nowarning is allowed only with iterate(0). nowarning suppresses the "convergence not achieved" message. Not remotely recommended.

Saved Variables

boolpred: Predicted probability of occurrence of dependent variable in a given case.

ystar_n: Predicted values of the latent variables associated with each of the n "paths." Variable names can be changed with the ystar(varname) option, above; the default is ystar(ystar).

Example. In the job-termination example above, we might estimate for a given employee that the probability of termination due to embezzlement (ystar_a) is 0.1, the probability of termination due to poor performance (ystar_b) is 0.5, the probability that cutbacks will occur given the company's recent performance (ystar_c) is 0.7, and the probability that the employee will be vulnerable to cutbacks should they occur (ystar_d) is 0.6. In that case, the prediction would be that the employee will be fired with probability 1-((1-0.1)x(1-0.5)x(1-(0.6x0.7))), or 73.9%.
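The arithmetic in this example can be checked directly; the snippet below (illustrative only, with values taken from the text) evaluates the stated cumulation:

```python
# Path probabilities from the job-termination example
y_a, y_b, y_c, y_d = 0.1, 0.5, 0.7, 0.6

# "a or b or (c and d)": fired unless every path fails
p = 1 - ((1 - y_a) * (1 - y_b) * (1 - y_c * y_d))
print(round(p, 3))  # 0.739
```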

Examples

. mlboolean logit 2 (a and b) (y) (x1 x2) (x3)

. mlboolean probit 4 ((a and b) or (c and d)) (y) (x1 x2) (x3 x4 x5) (x6 x7) (x8), difficult

. mlboolean logit 2 (a and b) (plantliv) (H2O) (sunlite lamplite), robust

. mlboolean probit 3 (a or b or c) (nonvoter) (apathy) (alienation) (indifference) [pweight=weight]

. mlboolean probit 3 (a and (b or c)) (y) (x1 x2) (x1 x3 x4) (x5 x6), init(Path1:x1=0 Path1:x2=0 Path2:x1=0 Path2:x3=0 Path2:x4=0 Path3:x5=0 Path3:x6=0)

. probit dv

. mlboolean probit 2 (a and b) (dv) (x1 x2) (x1 x3 x4), lf0(1 -2517.1859)

Known Issues

The maximum number of paths is five.

Partial observability routines are generically starved for information, and this one is no different. The routine requires substantial variation for each independent variable at different levels of all of the others. The warning sign of the absence of such variation is exploding standard errors, which typically correspond to plateaus in the relevant profile likelihood. This is an indicator that more data are required.

The probability calculus must contain as few parentheses as logically possible; otherwise it will not be recognized. For example, ((a or b) and c and d) will be recognized, but ((a and b) and (c and d)) will not.

Version

Version 1.3. Contact bfbraum@fas.harvard.edu with comments or questions.

Required Files

mlboolean.ado, mlboolean.hlp, mlboolfirst.ado, mlboollog.ado, mlboolpred.ado, mlboolprep.ado, mlboolprob.ado

References

Braumoeller, Bear F. (2003) "Causal Complexity and the Study of Politics." Political Analysis 11(3): 209-233.