-------------------------------------------------------------------------------
help for dirifit
-------------------------------------------------------------------------------

Fitting a Dirichlet distribution by maximum likelihood

        dirifit depvarlist [weight] [if exp] [in range] [, {
                 alphavar(varlist_a) alpha1|2|3|...|k(varlist_a_j) } | {
                 muvar(varlist_m) phivar(varlist_p)
                 mu1|2|3|...|k(varlist_m_j) baseoutcome(var) alternative }
                 robust cluster(clustervar) level(#) maximize_options ]

    by ... : may be used with dirifit; see help by.

    fweights and aweights are allowed; see help weights.


Description

    dirifit fits by maximum likelihood a Dirichlet distribution to a set of
    variables depvarlist.  Each variable in depvarlist ranges between 0 and 1
    and all variables in depvarlist must, for each observation, add up to 1:
    for example, they may be proportions.

    Note that observations are ignored if one or more of the dependent
    variables is less than or equal to zero or greater than or equal to one,
    or if the dependent variables do not add up to one.
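
    One way to see in advance which observations would be ignored is to
    flag them by hand. The following is only a sketch: it uses the
    citybudget data from the Examples section, the names chk_sum and chk_bad
    are made up for illustration, and the tolerance of 1e-6 is an assumption
    (the tolerance that dirifit itself applies is not documented here).

        use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
        local shares governing safety education recreation social urbanplanning

        * row sum of the shares (rowtotal() counts missing values as zero)
        egen double chk_sum = rowtotal(`shares')

        * flag values outside the open interval (0,1); missing values are
        * also flagged because Stata treats missing as larger than any number
        generate byte chk_bad = 0
        foreach v of local shares {
            replace chk_bad = 1 if `v' <= 0 | `v' >= 1
        }

        * number of observations that would presumably be ignored
        count if chk_bad | abs(chk_sum - 1) > 1e-6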

    dirifit uses one of two parameterizations:

        A conventional parameterization with shape parameters alpha_j > 0,
        one for each variable in depvarlist (see, e.g., Evans et al. 2000 or
        Kotz et al. 2000), is used if only depvarlist is specified or if one
        or more of alphavar() and alpha1|2|3|...|k() is specified.  alpha_j
        is reported on the logarithmic scale to ensure that it remains
        positive. The conventional parameterization is especially useful when
        no covariates are present.

        An alternative parameterization with location parameters mu_j (one
        for each variable in depvarlist except the baseoutcome) and a scale
        parameter phi is used if one or more of muvar(), mu1|2|3|...|k(),
        baseoutcome(), and phivar() is specified or if the alternative
        option is specified.  The alternative parameterization is especially
        useful when covariates are present. The mu_j are reported on the
        multinomial logit scale so that they stay between 0 and 1 and add up
        to one. To aid interpretation, various types of marginal effects can
        be calculated with ddirifit. phi is reported on the logarithmic scale
        to ensure that it remains positive. This parameterization is
        analogous to the one proposed by Paolino (2001), Ferrari and
        Cribari-Neto (2004), and Smithson and Verkuilen (2006) for the beta
        distribution.
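
    Because the estimates are reported on transformed scales, they have to
    be transformed back to obtain alpha_j, mu_j, and phi. The sketch below
    does this for models without covariates, where each saved vector holds
    only the constant. It rests on several assumptions: the citybudget data
    from the Examples section, the default baseoutcome (the first variable,
    so that vectors exist only for j = 2 to 6), the usual multinomial logit
    convention that the baseoutcome has a linear predictor of zero, and the
    illustrative names ba, bm, bp, and mu_denom.

        * conventional parameterization: alpha_j = exp(constant_j)
        dirifit governing safety education recreation social urbanplanning
        forvalues j = 1/6 {
            matrix ba = e(b_alpha`j')
            display "alpha_`j' = " exp(ba[1,1])
        }

        * alternative parameterization: inverse multinomial logit for mu_j
        dirifit governing safety education recreation social urbanplanning, ///
            alternative
        scalar mu_denom = 1
        forvalues j = 2/6 {
            matrix bm = e(b_mu`j')
            scalar mu_denom = mu_denom + exp(bm[1,1])
        }
        display "mu_1 (baseoutcome) = " 1/mu_denom
        forvalues j = 2/6 {
            matrix bm = e(b_mu`j')
            display "mu_`j' = " exp(bm[1,1])/mu_denom
        }

        * phi = exp(constant of the phi equation)
        matrix bp = e(b_phi)
        display "phi = " exp(bp[1,1])

    With covariates, the same transformations apply to the linear predictor
    of each equation rather than to the constant alone.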


Options

    alphavar() and alpha1|2|3|...|k() allow the user to specify each
        parameter in the conventional parameterization as a function of the
        covariates specified in the variable list. The covariates in
        alphavar() are common to all parameters, while alpha1|2|3|...|k()
        allow the user to specify (additional) covariates for the first,
        second, third, ..., kth parameter. The order of the parameters is
        determined by the order of depvarlist.  A constant term is always
        included in each equation.

    muvar(), mu1|2|3|...|k(), and phivar() allow the user to specify each
        parameter in the alternative parameterization as a function of the
        covariates specified in the respective variable list. The covariates
        in muvar() are common to all mu parameters, while mu1|2|3|...|k()
        allow the user to specify (additional) covariates for the first,
        second, third, ..., kth mu parameter. The order of the parameters is
        determined by the order of depvarlist. A constant term is always
        included in each equation.

    As implied above, just one parameterization should be chosen.
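
    The difference between common and equation-specific covariates can be
    illustrated as follows. This is a sketch using the citybudget data from
    the Examples section; the choice of covariates is arbitrary and only
    meant to show the syntax, and these particular models are not guaranteed
    to converge.

        * conventional: the same covariates in every alpha equation
        dirifit governing safety education recreation social urbanplanning, ///
            alphavar(minorityleft noleft)

        * alternative: common covariates for all mu equations, and popdens
        * as a covariate for phi
        dirifit governing safety education recreation social urbanplanning, ///
            muvar(minorityleft noleft) phivar(popdens)

        * alternative: houseval enters only the equation of the second
        * variable in depvarlist (safety), next to the common covariates
        dirifit governing safety education recreation social urbanplanning, ///
            muvar(minorityleft noleft) mu2(houseval)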

    alternative ensures that the alternative parameterization is used instead
        of the conventional parameterization if only depvarlist is specified.
        This option cannot be used with alphavar() or alpha1|2|3|...|k().

    baseoutcome(var) specifies the variable in depvarlist that will be the
        baseoutcome. The default is the first variable of depvarlist. This
        option cannot be used with alphavar() or alpha1|2|3|...|k().

    robust specifies that the Huber/White/sandwich estimator of variance is
        to be used in place of the traditional calculation; see [U] 20.14
        Obtaining robust variance estimates ([U] 23.14 in version 8).  robust
        combined with cluster() allows observations that are not independent
        within clusters (although they must be independent between clusters).

    cluster(clustervar) specifies that the observations are independent
        across groups (clusters) but not necessarily within groups.
        clustervar specifies to which group each observation belongs; e.g.,
        cluster(personid) in data with repeated observations on individuals.
        See [U] 20.14 Obtaining robust variance estimates ([U] 23.14 in
        version 8).  Specifying cluster() implies robust.

    level(#) specifies the confidence level, in percent, for the confidence
        intervals of the coefficients; see help level.

    nolog suppresses the iteration log.

    maximize_options control the maximization process; see help maximize. If
        you are seeing many "(not concave)" messages in the log, using the
        difficult option may help convergence.
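
    For example, the options above can be combined in a single call. This
    is a sketch based on the model from the Examples section; the value 90
    for level() is arbitrary, and whether difficult is actually needed
    depends on the data.

        * robust standard errors, 90% confidence intervals, no iteration
        * log, and the difficult maximize option
        dirifit governing safety education recreation social urbanplanning, ///
            muvar(minorityleft noleft houseval popdens)                      ///
            robust level(90) nolog difficult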


Saved results

    In addition to the usual results saved after ml, dirifit also saves the
    following, as appropriate:

        e(b_alpha1) to e(b_alphak) (where k is the number of variables in
        depvarlist) are row vectors containing the parameter estimates from
        each equation in the conventional parameterization.

        e(b_phi) and e(b_mu1) to e(b_muk) (where k is the number of variables
        in depvarlist) are row vectors containing the parameter estimates
        from each equation in the alternative parameterization. There is no
        e(b_muj) for the baseoutcome.

        e(length_b_alpha1) to e(length_b_alphak) or e(length_b_mu1) to
        e(length_b_muk) and e(length_b_phi) contain the lengths of these
        vectors. If no covariates are specified in an equation, the
        corresponding vector has length equal to 1 (the constant term);
        otherwise, the length is one plus the number of covariates.
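
    These results can be inspected after estimation in the usual way. The
    sketch below assumes the model from the Examples section with the
    default baseoutcome, so that the second variable (safety) has a saved
    mu vector.

        dirifit governing safety education recreation social urbanplanning, ///
            muvar(minorityleft noleft houseval popdens)

        * estimates of the phi equation and of the second mu equation
        matrix list e(b_phi)
        matrix list e(b_mu2)

        * lengths of those vectors
        display e(length_b_phi)
        display e(length_b_mu2)

    By the rule above, e(length_b_mu2) is expected to equal one plus the
    number of covariates in muvar(), and e(length_b_phi) to equal 1.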

        
Examples

    use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear

    dirifit governing safety education recreation social urbanplanning, ///
        muvar(minorityleft noleft houseval popdens)

    ddirifit, at(minorityleft 0 noleft 0 )


Authors

    Maarten L. Buis, Universitaet Tuebingen
    maarten.buis@uni-tuebingen.de

    Nicholas J. Cox, Durham University
    n.j.cox@durham.ac.uk

    Stephen P. Jenkins, University of Essex
    stephenj@essex.ac.uk


Acknowledgement

    Philipp Rehm provided a bug report.


References

    Evans, M., Hastings, N. and Peacock, B. 2000. Statistical distributions.
    New York: John Wiley.

    Ferrari, S.L.P. and Cribari-Neto, F. 2004.  Beta regression for modelling
    rates and proportions.  Journal of Applied Statistics 31(7): 799-815.

    Kotz, S., Balakrishnan, N. and Johnson, N.L. 2000.  Continuous multivariate
    distributions: Volume 1. New York: John Wiley.

    MacKay, D.J.C. 2003.  Information theory, inference, and learning
    algorithms.  Cambridge: Cambridge University Press (see pp.316-318).
    http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

    Paolino, P. 2001.  Maximum likelihood estimation of models with
    beta-distributed dependent variables. Political Analysis 9(4): 325-346.
    http://polmeth.wustl.edu/polanalysis/vol/9/WV008-Paolino.pdf

    Smithson, M. and Verkuilen, J. 2006.  A better lemon squeezer? Maximum
    likelihood regression with beta-distributed dependent variables.
    Psychological Methods 11(1): 54-71.

Also see

    Online: help for dirifit_postestimation, betafit, fmlogit (if installed)