-------------------------------------------------------------------------------
help for zoib
-------------------------------------------------------------------------------

Fitting a zero one inflated beta distribution by maximum likelihood

zoib depvar [indepvars] [weight] [if] [in] [, oneinflate(varlist_o) zeroinflate(varlist_z) nozero noone phivar(varlist_p) robust cluster(clustervar) level(#) maximize_options]

by ...: may be used with zoib; see help by.

fweights, pweights, and aweights are allowed; see help weights.

When using Stata version 11 or higher, indepvars, oneinflate(), zeroinflate(), and phivar() may contain factor variables; see fvvarlist.

Description

zoib fits by maximum likelihood a zero one inflated beta distribution to the distribution of a variable depvar. depvar ranges between 0 and 1: for example, it may be a proportion. zoib estimates the probabilities of having the value 0 and/or 1 as separate processes. The logic is that we can often think of proportions of 0 or 1 as being qualitatively different from the other proportions and as generated through a different process.

The zero one inflated beta distribution consists of three parts:

    a probability that depvar = 0
    a probability that depvar = 1
    the distribution of depvar given that 0 < depvar < 1

This means that the likelihood is:

    [1 - Pr(depvar = 0)] * [1 - Pr(depvar = 1)] * Beta(depvar | mu, phi)   if 0 < depvar < 1
    Pr(depvar = 0)                                                          if depvar = 0
    Pr(depvar = 1)                                                          if depvar = 1

The zero inflation and one inflation parts of this model are by default included whenever the dependent variable contains the value 0 and 1 respectively, and excluded otherwise. The user can force the exclusion of these parts by specifying the nozero and noone options.

The effects on the log odds of having the value 0 or 1 on the variable depvar are represented in the zeroinflate and oneinflate equations respectively. The remaining proportions are modelled using a beta distribution, using the parameterization discussed in, e.g., Ferrari and Cribari-Neto (2004), Paolino (2001), or Smithson and Verkuilen (2006). These effects are also reported on the logit scale.

An alternative to zoib is to assume that the proportions represent rare events that have not had the time to get a single realization, so that the 0s and 1s are created via the same process as all the other proportions. In this case one can use a fractional logit model as proposed by Papke and Wooldridge (1996), which can be estimated using glm; see http://www.stata.com/support/faqs/stat/logit.html.
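The fractional logit alternative can be sketched with glm as follows. This is a minimal illustration, not part of zoib: the data are simulated, and the variable names x and y, the seed, and the parameter values are assumptions made only so that the commands run on their own.

```stata
* Simulated data purely for illustration; all names and values are hypothetical
clear
set seed 12345
set obs 500
generate x = rnormal()
* a proportion out of 20 binomial trials, so exact 0s and 1s can occur
generate y = rbinomial(20, invlogit(0.5*x)) / 20
* fractional logit: binomial family, logit link, robust standard errors
glm y x, family(binomial) link(logit) vce(robust)
```

Here the 0s and 1s enter the same equation as all other proportions, so a single set of coefficients describes the whole range of y. glm notes that y has noninteger values; that note is expected for a fractional response.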

Options

zeroinflate() specifies the variables that influence the log odds of having the value 0 on depvar. This option can only be specified if the value 0 exists in depvar.

oneinflate() specifies the variables that influence the log odds of having the value 1 on depvar. This option can only be specified if the value 1 exists in depvar.

nozero specifies that no zero inflation equation is to be estimated. This implies that all observations with the value 0 on depvar will be ignored.

noone specifies that no one inflation equation is to be estimated. This implies that all observations with the value 1 on depvar will be ignored.

phivar() allows the user to specify the scale parameter for the beta part of the zero one inflated beta distribution as a function of the covariates specified in the variable list. A constant term is always included in each equation.

robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation; see [U] 23.14 Obtaining robust variance estimates. robust combined with cluster() allows observations that are not independent within clusters (although they must be independent between clusters).

cluster(clustervar) specifies that the observations are independent across groups (clusters) but not necessarily within groups. clustervar specifies to which group each observation belongs; e.g., cluster(personid) in data with repeated observations on individuals. See [U] 23.14 Obtaining robust variance estimates. Specifying cluster() implies robust.

level(#) specifies the confidence level, in percent, for the confidence intervals of the coefficients; see help level.

nolog suppresses the iteration log.

maximize_options control the maximization process; see help maximize. If you are seeing many "(not concave)" messages in the log, using the difficult option may help convergence.
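As a rough sketch of how several of these options combine, the commands below simulate a proportion with exact 0s and 1s and then fit all three auxiliary equations with cluster-robust standard errors. It assumes zoib is already installed. The variable names (id, x, mu, u, y), the seed, and all parameter values are hypothetical, and the beta draws assume a mean/precision parameterization with shape parameters mu*phi and (1-mu)*phi.

```stata
* Sketch only: assumes zoib is installed; all names and values are hypothetical
clear
set seed 2024
set obs 1000
* hypothetical cluster identifier: 100 groups of 10 observations
generate id = ceil(_n/10)
generate x = rnormal()
* interior values: beta draws with mean mu and precision 5 (assumed parameterization)
generate mu = invlogit(0.3*x)
generate y = rbeta(5*mu, 5*(1 - mu))
* add exact 0s and 1s so that both inflation equations can be estimated
generate u = runiform()
replace y = 0 if u < 0.10
replace y = 1 if u > 0.85
* covariates in all three auxiliary equations, cluster-robust standard errors
zoib y x, zeroinflate(x) oneinflate(x) phivar(x) cluster(id)
```

Specifying cluster(id) implies robust, so robust does not need to be listed separately.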

Examples

use k401.dta, clear

replace totemp = totemp/100

zoib prate mrate totemp age sole, ///
    oneinflate(mrate totemp age sole)

mfx

Author

Maarten L. Buis, WZB
maarten.buis@wzb.eu

References

Cook, D.O., Kieschnick, R. and McCullough, B.D. 2008. Regression analysis of proportions in finance with self selection. Journal of Empirical Finance 15(5): 860-867.

Evans, M., Hastings, N. and Peacock, B. 2000. Statistical distributions. New York: John Wiley.

Ferrari, S.L.P. and Cribari-Neto, F. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31(7): 799-815.

Johnson, N.L., Kotz, S. and Balakrishnan, N. 1995. Continuous univariate distributions: Volume 2. New York: John Wiley.

MacKay, D.J.C. 2003. Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press (see p. 316). http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Papke, L.E. and Wooldridge, J.M. 1996. Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates. Journal of Applied Econometrics 11(6): 619-632.

Paolino, P. 2001. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis 9(4): 325-346. http://polmeth.wustl.edu/polanalysis/vol/9/WV008-Paolino.pdf

Smithson, M. and Verkuilen, J. 2006. A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods 11(1): 54-71.

Acknowledgement

Jeroan Allison helpfully identified a bug in a previous version of the predict function of zoib.

Also see

Online: help for zoib postestimation, glm