------------------------------------------------------------------------------
help for mlboolean,mlboolfirst
-------------------------------------------------------------------------------

Boolean Logit and Probit

    mlboolean link_function n (calculus) (depvar) (indepvars1) (indepvars2)
            ...  [, ystar(varname) ml_options]


The syntax of mlboolfirst after mlboolean is

    mlboolfirst indepvar


Description

mlboolean conducts a maximum-likelihood Boolean logit or probit estimation
(see Braumoeller 2003) in which there are multiple causal "paths" to a given
outcome or nonoutcome.  Each of the n paths is modeled as a standard logit or
probit curve, and the predicted values for all of the curves cumulate in a
manner described by a Boolean probability calculus (for example, a and (b or
c) cause y) to produce the observed binary dependent variable.

    Example. Imagine that in a given firm there are three ways in which an
    employee can be fired: as a result of embezzlement, as a result of poor
    performance, or as a result of a combination of company-wide budget
    cutbacks and a low position in the company's hierarchy.  We know whether
    each employee was fired, how much he or she embezzled, some general
    indicators of job performance (on-time percentage, recent customer
    service ratings, and recent overall performance ratings), changes in the
    company's earnings and stock value, and where the employee's position is
    in the company hierarchy.  We do not know the combination of reasons that
    resulted in termination.  This gives a system with four unobserved
    variables, modeled here as standard probit equations

        y1* = norm(b1 + b2(embezzle))

        y2* = norm(b3 + b4(ontime) + b5(custserv) + b6(performance))

        y3* = norm(b7 + b8(earnings) + b9(stock))

        y4* = norm(b10 + b11(hierarchy)),

    and an observed dependent variable that is the binary realization of an
    underlying process of the form

        p(fired) = 1 - (1-y1*) x (1-y2*) x (1-(y3* x y4*))

    - that is, y1 or y2 or (y3 and y4) cause termination.  We might then
    estimate an equation of the following form:

        . mlboolean probit 4 (aorbor(candd)) (fired) (embezzle) (ontime
        custserv performance) (earnings stock) (hierarchy)

    The predicted values for "fired" and for the various yn* are estimated
    automatically (see "Saved Variables," below).
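
A minimal simulation sketch of the data-generating process in this example
may help fix ideas.  The coefficient values, sample size, and seed are all
assumptions chosen for illustration, and normal() is used as the current name
of the function written norm() above.

```stata
* Hypothetical simulation of the job-termination example.
* Every number below is an illustrative assumption, not an estimate.
clear
set seed 12345
set obs 2000

* Standard-normal covariates, one per variable named in the example
foreach v in embezzle ontime custserv performance earnings stock hierarchy {
    generate double `v' = rnormal()
}

* Four latent path probabilities, each a standard probit curve
generate double y1 = normal(-1.5 + 0.8*embezzle)
generate double y2 = normal(-1.0 - 0.5*ontime - 0.5*custserv - 0.5*performance)
generate double y3 = normal( 0.2 - 0.6*earnings - 0.4*stock)
generate double y4 = normal( 0.1 - 0.7*hierarchy)

* Boolean calculus: y1 or y2 or (y3 and y4)
generate double p_fired = 1 - (1 - y1)*(1 - y2)*(1 - y3*y4)

* Observed outcome is the binary realization of p_fired
generate byte fired = runiform() < p_fired

summarize p_fired fired
```

With mlboolean installed, the command shown in the example could then be run
on these simulated data to see how closely the assumed coefficients are
recovered.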


mlboolfirst calculates and graphs predicted values for a given independent
variable.  All other variables are set to their means, except for variables of
type int, which are set to their modal values.  Predictions are plotted as a
curve if the chosen variable is continuous and as points if it is of type int.
To change whether mlboolfirst treats a variable as an integer, change its
storage type with the recast command.
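
As an illustration of the recast step, the sketch below uses Stata's bundled
auto dataset (an assumption made only so the example is self-contained).
rep78 is stored as int there, so by the description above mlboolfirst would
be expected to plot it as points; recasting it to float should make it be
treated as continuous instead.

```stata
* Illustrative only: switching a variable's storage type with recast
sysuse auto, clear

describe rep78              // stored as int in this dataset
recast float rep78          // now a candidate for a continuous curve
local t_after : type rep78
display "storage type after recast: `t_after'"

* Going back to int is allowed here because every value is a whole number
recast int rep78
```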


Required Input

mlboolean requires the following, in order:

    A link function (either logit or probit).

    The number of causal paths n (n <= 5).

    The probability calculus that describes how they cumulate, using "a" to
    denote y1*, "b" to denote y2*, and so on, connected by "and" or "or".

            Examples: (aorborc), ((aorb)andcandd).

    The binary dependent variable, in parentheses.

    n sets of independent variables, in parentheses.  The independent
    variables can overlap partially - for example, one set might consist of
    x1-x4 and another of x1, x2, x5 and x6 if the probabilities of the
    antecedent events in question are thought to be correlated because each
    is influenced by x1 and x2.
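
The sketch below shows how the two calculus strings given as examples above
would translate into probabilities, assuming the same rules used in the
job-termination formula: "and" multiplies the path probabilities, and "or" is
one minus the product of their complements.  The four path probabilities are
made-up values.

```stata
* Hypothetical path probabilities for a single observation
clear
set obs 1
generate double a = 0.2
generate double b = 0.5
generate double c = 0.9
generate double d = 0.4

* (aorborc): at least one of a, b, c occurs
generate double p_aorborc = 1 - (1 - a)*(1 - b)*(1 - c)

* ((aorb)andcandd): (a or b) occurs, and c occurs, and d occurs
generate double p_mixed = (1 - (1 - a)*(1 - b))*c*d

list p_aorborc p_mixed

* Hand-computed values: 1 - 0.8*0.5*0.1 = 0.96 and 0.6*0.9*0.4 = 0.216
assert abs(p_aorborc - 0.96) < 1e-12
assert abs(p_mixed - 0.216) < 1e-12
```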


Options

ystar(varname) specifies the names of the variables that will contain the
    predicted values of the latent variables associated with each of the n
    "paths" (see "Saved Variables").  The default is ystar(ystar).

ml_options: All other options are passed directly to ml, so any options that
work with the latter should work with the former.  See the ml documentation
for further details.  It is worth noting, however, that some of these options
are particularly relevant in the context of mlboolean:

gtolerance(#) specifies an optional tolerance for the gradient relative to
    the coefficients. When |g_i*b_i| <= gtolerance() for all parameters b_i and
    the corresponding elements of the gradient g_i, then the gradient
    tolerance criterion is met.  Unlike tolerance() and ltolerance(), the
    gtolerance() criterion must be met in addition to any other tolerance.
    That is, convergence is declared when gtolerance() is met and tolerance()
    or ltolerance() is also met.  The gtolerance() option is provided for
    particularly deceptive likelihood functions that may trigger premature
    declarations of convergence.  The option must be specified for gradient
    checking to be activated; by default the gradient is not checked.
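
The gradient criterion can be illustrated numerically.  The Mata sketch below
is not how ml performs the check internally; it simply evaluates
|g_i*b_i| <= tolerance element by element for made-up coefficient and
gradient vectors and an assumed tolerance of 1e-6.

```stata
mata:
// Hypothetical values, for illustration only
b    = (0.50, -1.25, 2.00)      // coefficient vector
g    = (1e-8, -4e-9, 2e-9)      // gradient at those coefficients
gtol = 1e-6                     // value that would be given in gtolerance()

scaled = abs(g :* b)            // |g_i*b_i| for each parameter
scaled

met = all(scaled :<= gtol)      // 1 only if every element passes
met
end
```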

lf0(#k #ll) specifies the number of parameters and log-likelihood value of
    the "constant-only" model so that mlboolean can report a likelihood-ratio
    test rather than a Wald test.

difficult specifies that the likelihood function is likely to be difficult to
    maximize.  In particular, difficult states that there may be regions
    where -H is not invertible and that, in those regions, Stata's standard
    fixup may not work well.  difficult specifies that a different fixup
    requiring substantially more computer time is to be used.  difficult can
    be of some help in obtaining "normal" parameter estimates when plateaus
    in profile likelihoods produce absurdly large standard errors; it can
    also make things worse.  Such situations are typically indicative of a
    dangerous lack of information and should be treated with caution.

init(ml_init_args) sets the initial parameter values. Because mlboolean can
    produce convoluted likelihood functions, the wise investigator will try
    an array of different starting values before reporting final results.

nrtolerance(#) specifies an optional tolerance that is based on the gradient
    g and Hessian H.  The tolerance is met when g*inv(H)*g' < nrtolerance().
    Like gtolerance(), the nrtolerance() criterion must be met in addition to
    any other tolerance.  This option must be specified for g*inv(H)*g' to be
    checked; by default it is not.

technique() specifies how the likelihood function is to be maximized.  The
    following algorithms are currently implemented in ml.

    technique(nr) specifies Stata's modified Newton-Raphson (NR) algorithm.

    technique(bhhh) specifies the Berndt-Hall-Hall-Hausman (BHHH) algorithm.

    technique(dfp) specifies the Davidon-Fletcher-Powell (DFP) algorithm.

    technique(bfgs) specifies the Broyden-Fletcher-Goldfarb-Shanno (BFGS)
        algorithm.

    It is possible to switch between algorithms by specifying more than one
    in the technique() option.  For example, specifying technique(bhhh dfp)
    will cause ml to switch between the BHHH and DFP algorithms. ml will use
    an algorithm for 5 iterations, by default, before switching to the next
    algorithm.  Thus technique(bhhh dfp) will cause ml to switch between BHHH
    and DFP every 5 iterations.  You may specify a different number of
    iterations for each algorithm by including a count after it.  For example,
    technique(bhhh 10 nr 1000) will cause ml to optimize the likelihood using
    BHHH for 10 iterations before switching to the modified Newton-Raphson
    algorithm, then switch back to BHHH after ml spends 1000 iterations using
    NR.


search(on|quietly|off) specifies whether ml search is to be used to improve
    the initial values.  Note that search(on) is the default.

nowarning is allowed only with iterate(0).  nowarning suppresses the
    "convergence not achieved" message.  Not remotely recommended.


Saved Variables

boolpred: Predicted probability of occurrence of dependent variable in a
    given case.

ystar_n: Predicted values of latent variables associated with each of the n
    "paths." Variable names can be changed with the ystar(varname) option,
    above; default is ystar(ystar).

    Example. In the job-termination example above, we might estimate for a
    given employee that the probability of termination due to embezzlement
    (ystar_a) is 0.1, the probability of termination due to poor performance
    (ystar_b) is 0.5, the probability that cutbacks will occur given the
    company's recent performance (ystar_c) is 0.7, and the probability that
    the employee will be vulnerable to cutbacks should they occur (ystar_d)
    is 0.6.  In that case, the prediction would be that the employee will be
    fired with probability 1-((1-0.1)x(1-0.5)x(1-(0.6x0.7))), or 73.9%.
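
The arithmetic in this example can be checked directly in Stata.  The scalar
names below are arbitrary; the four probabilities are the ones given in the
example.

```stata
* Path probabilities from the example
scalar ya = 0.1     // embezzlement
scalar yb = 0.5     // poor performance
scalar yc = 0.7     // cutbacks occur
scalar yd = 0.6     // vulnerable to cutbacks

* a or b or (c and d)
scalar p_fired = 1 - (1 - ya)*(1 - yb)*(1 - yc*yd)
display p_fired

* Agrees with the 73.9% reported above
assert abs(p_fired - 0.739) < 1e-12
```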


Examples

        . mlboolean logit 2 (aandb) (y) (x1 x2) (x3)

        . mlboolean probit 4 ((aandb)or(candd)) (y) (x1 x2) (x3 x4 x5) (x6
            x7) (x8), difficult

        . mlboolean logit 2 (aandb) (plantliv) (H2O) (sunlite lamplite),
            robust

        . mlboolean probit 3 (aorborc) (nonvoter) (apathy) (alienation)
            (indifference) [pweight=weight]

        . mlboolean probit 3 (aand(borc)) (y) (x1 x2) (x1 x3 x4) (x5 x6),
            init(Path1:x1=0 Path1:x2=0 Path2:x1=0 Path2:x3=0 Path2:x4=0
            Path3:x5=0 Path3:x6=0)

        . probit dv

        . mlboolean probit 2 (aandb) (dv) (x1 x2) (x1 x3 x4), lf0(1
            -2517.1859)
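
The two numbers passed to lf0() in the last example come from the
constant-only probit that precedes it.  The sketch below shows one way to
retrieve them from the stored results rather than copying them by hand.  It
uses Stata's bundled auto dataset and the variable foreign purely as
stand-ins, and the mlboolean call in the final comment is hypothetical.

```stata
* Constant-only probit on a stand-in dependent variable
sysuse auto, clear
probit foreign

* Number of parameters and log likelihood of the constant-only model
local k0  = colsof(e(b))
local ll0 = e(ll)
display "parameters: `k0'   log likelihood: `ll0'"

* These could then be supplied to mlboolean, for example:
*   mlboolean probit 2 (aandb) (foreign) (mpg) (weight), lf0(`k0' `ll0')
```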



Known Issues

The maximum number of paths is five.

Partial observability routines are generically starved for information, and
    this one is no different.  The routine requires substantial variation for
    each independent variable at different levels of all of the others.  The
    warning sign of the absence of such variation is exploding standard
    errors, which typically correspond to plateaus in the relevant profile
    likelihood.  This is an indicator that more data are required.

The probability calculus must contain as few parentheses as logically
    possible; otherwise it will not be recognized.  For example,
    ((aorb)andcandd) will be recognized, but ((aandb)and(candd)) will not.


Version

Version 1.3.  Contact bfbraum@fas.harvard.edu with comments or questions.


Required Files

mlboolean.ado, mlboolean.hlp, mlboolfirst.ado, mlboollog.ado, mlboolpred.ado,
    mlboolprep.ado, mlboolprob.ado


References

    Braumoeller, Bear F. (2003) "Causal Complexity and the Study of