```
-------------------------------------------------------------------------------
help for zoib
-------------------------------------------------------------------------------

Fitting a zero one inflated beta distribution by maximum likelihood

zoib depvar [indepvars] [weight] [if] [in] [, oneinflate(varlist_o)
zeroinflate(varlist_z) nozero noone phivar(varlist_p) robust
cluster(clustervar) level(#) nolog maximize_options ]

by ... : may be used with zoib; see help by.

fweights, pweights, and aweights are allowed; see help weights.

When using Stata version 11 or higher, indepvars, oneinflate(),
zeroinflate(), and phivar() may contain factor variables; see help fvvarlist.

Description

zoib fits a zero one inflated beta distribution to depvar by maximum
likelihood.  depvar must lie between 0 and 1, inclusive; for example, it may
be a proportion.  The probabilities of the values 0 and 1 are estimated as
separate processes.  The logic is that proportions of exactly 0 or 1 can
often be thought of as qualitatively different from the other proportions,
and as generated by a different process.

The zero one inflated beta distribution consists of three parts:

a probability that depvar = 0
a probability that depvar = 1
the distribution of depvar given that 0 < depvar < 1

This means that the likelihood is:

[1 - Pr(depvar = 0)] * [1 - Pr(depvar = 1)] * Beta(depvar | mu, phi)
                                                   if 0 < depvar < 1
Pr(depvar = 0)                                     if depvar = 0
Pr(depvar = 1)                                     if depvar = 1
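
* A minimal sketch (not zoib's own code) of the log-likelihood contributions
* implied by the likelihood above, using made-up values Pr(depvar = 0) = .2,
* Pr(depvar = 1) = .1, mu = .6, phi = 5, and depvar = .4.  It assumes the
* mean/precision parameterization of the beta density, with shape
* parameters a = mu*phi and b = (1-mu)*phi.  Locals are used so that the
* names cannot clash with variables in memory.
local p0  = .2
local p1  = .1
local mu  = .6
local phi = 5
local y   = .4
* log of the beta density Beta(y | mu, phi); here a = 3 and b = 2, so the
* density equals 12*y^2*(1-y) = 1.152
local a = `mu'*`phi'
local b = (1-`mu')*`phi'
local lnbeta = lngamma(`a'+`b') - lngamma(`a') - lngamma(`b')
local lnbeta = `lnbeta' + (`a'-1)*ln(`y') + (`b'-1)*ln(1-`y')
* contribution of an observation with 0 < depvar < 1
local ll_mid = ln(1-`p0') + ln(1-`p1') + `lnbeta'
* contributions of observations with depvar = 0 and depvar = 1
local ll_0 = ln(`p0')
local ll_1 = ln(`p1')
display `ll_mid'
display `ll_0'
display `ll_1'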

The zero inflation and one inflation parts of this model are included by
default whenever depvar contains the value 0 or 1, respectively, and are
excluded otherwise.  The user can force the exclusion of either of these
parts by specifying the nozero or noone option.

The effects on the log odds of having the value 0 or 1 on depvar are
represented in the zeroinflate and oneinflate equations, respectively.
The remaining proportions are modelled using a beta distribution with the
parameterization discussed in, e.g., Ferrari and Cribari-Neto (2004),
Paolino (2001), or Smithson and Verkuilen (2006).  These effects are also
reported on the logit scale.
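
* A sketch, with made-up values for the linear predictors, of how the
* equations map onto the parts of the distribution.  The zeroinflate and
* oneinflate equations are on the log odds scale and the effects on the
* mean mu of the beta part are on the logit scale, as stated above.
* Modelling phi on the log scale is an assumption of this sketch.
local xb_zero = -1.5
local xb_one  = -2
local xb_mu   = .4
local xb_phi  = 1.6
local p0  = invlogit(`xb_zero')
local p1  = invlogit(`xb_one')
local mu  = invlogit(`xb_mu')
* assumption: log link for phi
local phi = exp(`xb_phi')
* shape parameters of the beta part in the mean/precision parameterization
local a = `mu'*`phi'
local b = (1-`mu')*`phi'
* the beta part then has mean a/(a+b) = mu and variance mu*(1-mu)/(1+phi)
display `a'/(`a'+`b')
display `mu'*(1-`mu')/(1+`phi')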

An alternative to zoib is to assume that the proportions represent rare
events which did not have the time to occur even once, so that the 0s and
1s are created by the same process as all the other proportions.  In this
case one can use the fractional logit model proposed by Papke and Wooldridge
(1996), which can be estimated using glm; see
http://www.stata.com/support/faqs/stat/logit.html.
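
* A sketch of the fractional logit alternative, using the data from the
* example below.  It is fitted with glm using a binomial family, a logit
* link, and robust standard errors, as is usual for this quasi-likelihood
* estimator.  Unlike zoib, it treats the 0s and 1s in prate as generated by
* the same process as the other proportions.
use k401.dta, clear
replace totemp = totemp/100
glm prate mrate totemp age sole, family(binomial) link(logit) vce(robust)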

Options

zeroinflate() specifies the variables that influence the log odds of
having the value 0 on depvar.  This option can only be specified if
the value 0 exists in depvar.

oneinflate() specifies the variables that influence the log odds of having
the value 1 on depvar.  This option can only be specified if the value
1 exists in depvar.
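
* Before specifying zeroinflate() or oneinflate(), one can check whether
* depvar contains the values 0 and 1 at all; shown here for prate in the
* data from the example below.
use k401.dta, clear
count if prate == 0
count if prate == 1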

nozero specifies that no zero inflation equation is to be estimated. This
implies that all observations with the value 0 on depvar will be
ignored.

noone specifies that no one inflation equation is to be estimated. This
implies that all observations with the value 1 on depvar will be
ignored.
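
* A sketch with the data from the example below: noone excludes the
* observations with prate = 1, so the beta part is fitted to the remaining
* observations.  A zero inflation equation is still included if prate
* contains 0s, as described above.
use k401.dta, clear
replace totemp = totemp/100
zoib prate mrate totemp age sole, noone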

phivar() allows the user to specify the scale parameter of the beta part
of the zero one inflated beta distribution as a function of the covariates
in varlist_p.  A constant term is always included in each equation.
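
* A sketch with the data from the example below, letting the scale
* parameter of the beta part depend on mrate and totemp; this choice of
* covariates is purely illustrative.
use k401.dta, clear
replace totemp = totemp/100
zoib prate mrate totemp age sole, oneinflate(mrate totemp age sole) ///
    phivar(mrate totemp)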

robust specifies that the Huber/White/sandwich estimator of variance is
to be used in place of the traditional calculation; see [U] 23.14
Obtaining robust variance estimates.  robust combined with cluster()
allows observations which are not independent within cluster
(although they must be independent between clusters).

cluster(clustervar) specifies that the observations are independent
across groups (clusters) but not necessarily within groups.
clustervar specifies to which group each observation belongs; e.g.,
cluster(personid) in data with repeated observations on individuals.
See [U] 23.14 Obtaining robust variance estimates.  Specifying
cluster() implies robust.
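
* A sketch with the data from the example below, requesting robust
* standard errors.  With a group identifier, for instance a hypothetical
* variable firmid, one would instead specify cluster(firmid), which
* implies robust.
use k401.dta, clear
replace totemp = totemp/100
zoib prate mrate totemp age sole, oneinflate(mrate totemp age sole) robust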

level(#) specifies the confidence level, in percent, for the confidence
intervals of the coefficients; see help level.

nolog suppresses the iteration log.

maximize_options control the maximization process; see help maximize. If
you are seeing many "(not concave)" messages in the log, using the
difficult option may help convergence.
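
* A sketch combining level(), nolog, and the difficult maximize option,
* using the data from the example below.
use k401.dta, clear
replace totemp = totemp/100
zoib prate mrate totemp age sole, oneinflate(mrate totemp age sole) ///
    level(90) nolog difficult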

Examples

use k401.dta, clear

replace totemp = totemp/100

zoib prate mrate totemp age sole, ///
    oneinflate(mrate totemp age sole)

mfx
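
* A sketch of a postestimation step after the example above: predict with
* no options stores zoib's default prediction (see help zoib postestimation
* for the statistics available), which can then be compared with prate.
predict prate_hat
summarize prate prate_hat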

Author

Maarten L. Buis, WZB
maarten.buis@wzb.eu

References

Cook, D.O., Kieschnick, R. and McCullough, B.D. 2008.  Regression analysis
of proportions in finance with self selection.  Journal of Empirical
Finance 15(5): 860-867.

Evans, M., Hastings, N. and Peacock, B. 2000. Statistical distributions.
New York: John Wiley.

Ferrari, S.L.P. and Cribari-Neto, F. 2004.  Beta regression for modelling
rates and proportions.  Journal of Applied Statistics 31(7): 799-815.

Johnson, N.L., Kotz, S. and Balakrishnan, N. 1995.  Continuous univariate
distributions: Volume 2. New York: John Wiley.

MacKay, D.J.C. 2003.  Information theory, inference, and learning
algorithms.  Cambridge: Cambridge University Press (see p.316).
http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Papke, L.E. and Wooldridge, J.M. 1996.  Econometric methods for
fractional response variables with an application to 401(k) plan
participation rates.  Journal of Applied Econometrics 11(6): 619-632.

Paolino, P. 2001.  Maximum likelihood estimation of models with
beta-distributed dependent variables. Political Analysis 9(4): 325-346.
http://polmeth.wustl.edu/polanalysis/vol/9/WV008-Paolino.pdf

Smithson, M. and Verkuilen, J. 2006.  A better lemon squeezer? Maximum
likelihood regression with beta-distributed dependent variables.
Psychological Methods 11(1): 54-71.

Acknowledgement

Jeroan Allison helpfully identified a bug in a previous version of the
predict function of zoib.

Also see

Online: help for zoib postestimation, glm

```