```-------------------------------------------------------------------------------
help for betafit
-------------------------------------------------------------------------------

Fitting a two-parameter beta distribution by maximum likelihood

betafit depvar [weight] [if] [in] [, { alphavar(varlist_a)
betavar(varlist_b) } | { muvar(varlist_m [, noconstant ])
phivar(varlist_p [, noconstant eform ]) alternative } rpr
robust cluster(clustervar) level(#) maximize_options ]

by ... : may be used with betafit; see help by.

fweights and aweights are allowed; see help weights.

When using Stata version 11 or higher, alphavar, betavar, muvar, and
phivar may contain factor variables; see fvvarlist.

Description

betafit fits a two-parameter beta distribution by maximum likelihood to
the distribution of a variable depvar. depvar must lie strictly between 0
and 1: for example, it may be a proportion.

Note that cases will be ignored if the dependent variable has a value
less than or equal to zero or greater than or equal to one.  betafit can
still be used to fit a variable with a range beyond (0, 1) by rescaling
this variable. Several examples are shown in Smithson and Verkuilen
(2006).
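
A minimal sketch of such a rescaling, assuming a hypothetical variable
score measured on a known 0-100 scale (the variable names and the bounds
are illustrative and not part of betafit). The first step maps score to
[0, 1]; the second step is the compression suggested by Smithson and
Verkuilen (2006), which moves exact 0s and 1s into the open interval:

* score, score01 and scorefit are hypothetical names
gen double score01 = (score - 0)/(100 - 0)
quietly count if !missing(score01)
gen double scorefit = (score01*(r(N) - 1) + 0.5)/r(N)
betafit scorefit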

betafit uses one of two parameterizations:

A conventional parameterization with shape parameters alpha > 0 and
beta > 0 (e.g. Forbes et al. 2011 or Johnson et al. 1995) will be
used if only depvar is specified or if one or both of alphavar() and
betavar() are specified. The conventional parameterization is
especially useful when no covariates are present.

An alternative parameterization with location parameter mu and scale
parameter phi (e.g. Ferrari and Cribari-Neto 2004, Paolino 2001, or
Smithson and Verkuilen 2006) will be used if one or both of muvar() and
phivar() are specified or if the alternative option is specified.  The
alternative parameterization is especially useful when covariates are
present. mu is modeled on the logit scale so that it stays between 0
and 1, i.e. logit mu = muvar * e(b_mu).  To help interpretation,
various types of marginal effects can be calculated with dbetafit.
phi is modeled on the logarithmic scale to ensure that it remains
positive, i.e.  ln phi = phivar * e(b_phi).
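
For orientation, Ferrari and Cribari-Neto (2004) link the two
parameterizations by mu = alpha/(alpha + beta) and phi = alpha + beta.
A minimal sketch, assuming betafit follows that convention and using
the e(alpha) and e(beta) results that are saved when no covariates are
specified:

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
betafit governing
* implied location and scale under the assumed convention
display "mu  = " e(alpha)/(e(alpha) + e(beta))
display "phi = " e(alpha) + e(beta)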

Options

alphavar() and betavar() allow the user to specify each parameter in the
conventional parameterization as a function of the covariates
specified in the respective variable list. A constant term is always
included in each equation.

muvar() and phivar() allow the user to specify each parameter in the
alternative parameterization as a function of the covariates
specified in the respective variable list. A constant term can be
suppressed in each equation by specifying the noconstant suboption.
To display exponentiated coefficients for the phi equation, specify
the eform suboption.

As implied above, just one parameterization should be chosen.

alternative ensures that the alternative parameterization is used instead
of the conventional parameterization if only depvar is specified.
This option cannot be used with alphavar() or betavar().

rpr reports the estimated coefficients transformed to relative proportion
ratios, i.e., exp(b) rather than b.  Standard errors and confidence
intervals are similarly transformed.  This option affects how results
are displayed, not how they are estimated.  The interpretation of
these relative proportion ratios is discussed in detail in the
examples below.

Relative proportion ratios can be useful when the model contains
interaction terms, as in that case marginal effects as computed by
dbetafit will no longer be appropriate. Relative proportion ratios
for the interaction terms can still be interpreted as the factor by
which the relative proportion ratio changes, as is discussed in Buis
(2010).

robust specifies that the Huber/White/sandwich estimator of variance is
to be used in place of the traditional calculation; see [U] 23.14
Obtaining robust variance estimates.  robust combined with cluster()
allows observations which are not independent within cluster
(although they must be independent between clusters).

cluster(clustervar) specifies that the observations are independent
across groups (clusters) but not necessarily within groups.
clustervar specifies to which group each observation belongs, for
example, cluster(personid) in data with repeated observations on
individuals.  See [U] 23.14 Obtaining robust variance estimates.
Specifying cluster() implies robust.

level(#) specifies the confidence level, in percent, for the confidence
intervals of the coefficients; see help level.

nolog suppresses the iteration log.

maximize_options control the maximization process; see help maximize. If
you are seeing many "(not concave)" messages in the log, using the
difficult option may help convergence.
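
As a sketch, the city budget model from the examples below could be
refit with that option added (whether it helps depends on the data):

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
betafit governing, mu(minorityleft noleft houseval popdens) difficult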

Saved results

In addition to the usual results saved after ml, betafit also saves the
following as appropriate if no covariates have been specified:

e(alpha) and e(beta) are the estimated parameters in the conventional
parameterization.

e(mu) and e(phi) are the estimated parameters in the alternative
parameterization.

The following results are saved regardless of whether covariates have
been specified, as appropriate:

e(b_alpha) and e(b_beta) are row vectors containing the parameter
estimates from each equation in the conventional parameterization.

e(b_mu) and e(b_phi) are row vectors containing the parameter
estimates from each equation in the alternative parameterization.

e(length_b_alpha) and e(length_b_beta) or e(length_b_mu) and
e(length_b_phi) contain the lengths of these vectors. If no
covariates are specified in an equation, the corresponding vector has
length equal to 1 (the constant term); otherwise, the length is one
plus the number of covariates.
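
A short sketch of how these results might be inspected after fitting
the city budget model from the examples below. With four covariates in
the mu equation, the rule above implies a length of five for e(b_mu):

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
betafit governing, mu(minorityleft noleft houseval popdens)
* coefficient vectors for each equation
matrix list e(b_mu)
matrix list e(b_phi)
* number of elements in each vector
display e(length_b_mu)
display e(length_b_phi)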

Examples and interpretation of results

Marginal effects

To help with the interpretation of the results, use dbetafit to compute a
set of marginal effects. Alternatively, it is also possible to use mfx or
margins (for Stata versions 11 and higher).

These marginal effects depend on the values of the
explanatory/independent/x variables. So each observation will have its
own marginal effects. Those displayed by dbetafit are for a (fictional)
observation whose explanatory variables are fixed at the mean or at
values specified in the at() option. So in the example below the marginal
effects refer to a city governed by a left-wing majority (the left is
neither a minority nor absent from the city government, so it must be the
majority) whose house value and population density are average.

For this fictional city the proportion spent on governing is 9.5%
[E(governing|x)].  If that city is governed by a minority left
government, that proportion will decrease by 0.8 percentage points; and if
it is governed only by parties on the right of the political spectrum,
the proportion will increase by 0.9 percentage points (first table,
column Min --> Max).

A 100,000 euro increase in average house value will lead to a 2.5
percentage point increase in the proportion, and an extra 1000 persons
per square kilometre will lead to a 1.1 percentage point decrease in
the proportion (second table, column MFX at x).

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear

betafit governing, mu(minorityleft noleft houseval popdens)

dbetafit, at(minorityleft 0 noleft 0)


Relative proportion ratios

Alternatively, betafit also allows the display of relative proportion
ratios. This can be useful when the dependent variable is a proportion.
Consider the example below. This models the proportion of a city budget
spent on each city's own organization.  In that case the relative
proportion is the proportion spent on governing divided by 1 - the
proportion spent on governing. That is, in other words, the proportion
spent on governing divided by the proportion spent on useful stuff.  As
the total budget size drops out of this ratio, we can also say that this
is the number of euros spent on governing per euro spent on productive
stuff.

It is useful to see the baseline relative proportion, that is, the
relative proportion when all covariates are equal to zero. This is the
exponentiated constant. Since Stata by default suppresses the display of
the exponentiated constant, we need to use a trick. We first create a
variable baseline that contains all 1s, and add that to our list of
variables in the muvar() option, and at the same time add the noconstant
sub-option. The coefficient of baseline is now the baseline relative
proportion.

In the example below, a city with a left-wing majority in its city
government and average population density and house value can expect to
spend 10 cents on governing per euro spent on productive stuff. This
ratio decreases by 10% (i.e. [.90-1]*100% = -10%) if it is governed by a
minority left government, and it increases by 11% when no left parties
are represented in the city government. A 100,000 euro increase in
average house value will lead to a 35% increase in the relative
proportion, and an extra 1000 persons per square kilometre will lead to an
11% decrease in the relative proportion.

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
gen byte baseline = 1

sum popdens if !missing(minorityleft, noleft, houseval, popdens), meanonly
gen cpopdens = popdens - r(mean)

sum houseval if !missing(minorityleft, noleft, houseval, popdens), meanonly
gen chouseval = houseval - r(mean)

betafit governing, ///
mu(minorityleft noleft chouseval cpopdens baseline, nocons) rpr


Note

Notice the difference between percentage point changes (in the section on
marginal effects) and percentage changes (in the section on relative
proportion ratios). If we start with a baseline value of 1% and change by
1 percentage point, then the result will be 1 + 1 = 2%.  If we change the
baseline value by 1%, the result will be 1 * 1.01 = 1.01%.
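
The same arithmetic can be checked with display. This is only an
illustration of the two kinds of change, using the 1% baseline from the
paragraph above:

* a change of 1 percentage point: 1% becomes 2%
display 1 + 1
* a change of 1 percent: 1% becomes 1.01%
display 1*1.01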

Authors

Maarten L. Buis, Universitaet Tuebingen
maarten.buis@uni-tuebingen.de

Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk

Stephen P. Jenkins, The London School of Economics and Political Science
S.Jenkins@lse.ac.uk

References

Buis, M.L. 2010.  Stata tip 87: Interpretation of interactions in
non-linear models.  The Stata Journal 10(2): 305-308.

Ferrari, S.L.P. and Cribari-Neto, F. 2004.  Beta regression for modelling
rates and proportions.  Journal of Applied Statistics 31(7): 799-815.

Forbes, C., Evans, M., Hastings, N. and Peacock, B. 2011. Statistical
distributions.  Hoboken, NJ: John Wiley.

Johnson, N.L., Kotz, S. and Balakrishnan, N. 1995.  Continuous univariate
distributions: Volume 2. New York: John Wiley.

MacKay, D.J.C. 2003.  Information theory, inference, and learning
algorithms.  Cambridge: Cambridge University Press (see p.316).
http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Paolino, P. 2001.  Maximum likelihood estimation of models with
beta-distributed dependent variables. Political Analysis 9(4): 325-346.
http://polmeth.wustl.edu/polanalysis/vol/9/WV008-Paolino.pdf

Smithson, M. and Verkuilen, J. 2006.  A better lemon squeezer? Maximum
likelihood regression with beta-distributed dependent variables.
Psychological Methods 11(1): 54-71.

Also see

Online: help for betafit postestimation,

```