-------------------------------------------------------------------------------
help for fiskfit                                            Maarten L. Buis and
                                             Stephen P. Jenkins (December 2007)
-------------------------------------------------------------------------------

Fitting a Fisk distribution by ML to unit record data

        fiskfit varname [weight] [if] [in] [, avar(varlist1) bvar(varlist2)
                 ab(varlist) stats from(string) poorfrac(#) cdf(cdfname)
                 pdf(pdfname) robust cluster(varname) svy level(#)
                 maximize_options svy_options ]

    by ... : may be used with fiskfit; see help by.

    pweights, aweights, fweights, and iweights are allowed; see help weights.
    To use pweights, you must first svyset your data and then use the svy
    option.


Description

    fiskfit fits by ML the 2 parameter Fisk (1961) or log-logistic
    distribution to sample observations on a random variable var. Unit record
    data are assumed (rather than grouped data). It is closely related to the
    Singh-Maddala (Burr Type 12) distribution (Singh and Maddala, 1976) and
    the Dagum (Dagum, 1977,1980) distribution. All are special cases of the
    Generalized Beta of the Second Kind distribution (see gb2fit). For a
    comprehensive review of these and other related distributions, see
    Kleiber and Kotz (2003).


Options

    avar(varlist1), and bvar(varlist2), allow the user to specify each
        parameter as a function of the covariates specified in the respective
        variable list. A constant term is always included in each equation.

    ab(varlist) can be used instead of the previous option if the same
        covariates are to appear in each parameter equation.

    from(string) specifies initial values for the Fisk parameters, and is
        likely to be used only rarely. You can specify the initial values in
        one of three ways: the name of a vector containing the initial values
        (e.g., from(b0) where b0 is a properly labeled vector); by specifying
        coefficient names with the values (e.g., from(a:_cons=1 b:_cons=5
        p:_cons = 0); or by specifying an ordered list of values (e.g.,
        from(1 5 0 .16, copy)). Poor values in from() may lead to convergence
        problems. For more details, including the use of copy and skip, see
        {help:maximize}.

    If covariates are specified, the next four options are not available.

    stats displays selected distributional statistics implied by the Fisk
        parameter estimates:  quantiles, cumulative shares of total var at
        quantiles (i.e. the Lorenz curve ordinates), the mode, mean, standard
        deviation, variance, half the coefficient of variation squared, Gini
        coefficient, and quantile ratios p90/p10, p75/p25.

    poorfrac(#) displays the estimated proportion with values of var less
        than the cut-off specified by #. This option may be specified when
        replaying results.

    cdf(cdfname) creates a new variable cdfname containing the estimated Fisk
        c.d.f. value F(x) for each x.

    pdf(pdfname) creates a new variable pdfname containing the estimated Fisk
        p.d.f. value f(x) for each x.

    robust specifies that the Huber/White/sandwich estimator of variance is
        to be used in place of the traditional calculation; see [U] 23.14
        Obtaining robust variance estimates.  robust combined with cluster()
        allows observations which are not independent within cluster
        (although they must be independent between clusters).  If you specify
        pweights, robust is implied.

    cluster(varname) specifies that the observations are independent across
        groups (clusters) but not necessarily within groups.  varname
        specifies to which group each observation belongs; e.g.,
        cluster(personid) in data with repeated observations on individuals.
        See [U] 23.14 Obtaining robust variance estimates. cluster() can be
        used with pweights to produce estimates for unstratified
        cluster-sampled data.  Specifying cluster() implies robust.

    svy indicates that ml is to pick up the svy settings set by svyset and
        use the robust variance estimator. Thus, this option requires the
        data to be svyset; see help svyset. svy may not be combined with
        weights or the strata(), psu(), fpc(), or cluster() options.

    level(#) specifies the confidence level, in percent, for the confidence
        intervals of the coefficients; see help level.

    nolog suppresses the iteration log.

    maximize_options control the maximization process. The options available
        are those shown by maximize, with the exception of from().  If you
        are seeing many "(not concave)" messages in the iteration log, using
        the difficult or technique options may help convergence.

    svy_options specify the options used together with the svy option.


Saved results

    In addition to the usual results saved after ml, fiskfit also saves the
    following, if no covariates have been specified and the relevant options
    used:

    e(ba), and e(bb) are the estimated Fisk parameters.

    e(cdfvar) and e(pdfvar) are the variable names specified for the c.d.f.
    and the p.d.f.

    e(mode), e(mean), e(var), e(sd), e(i2), and e(gini) are the estimated
    mode, mean, variance, standard deviation, half coefficient of variation
    squared, Gini coefficient. e(pX), and e(LpX) are the quantiles, and
    Lorenz ordinates, where X = {1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 75,
    80, 90, 95, 99}.

    The following results are saved regardless of whether covariates have
    been specified or not.

    e(b_a), and e(b_b) are row vectors containing the parameter estimates
    from each equation.

    e(length_b_a),and e(length_b_b) contain the lengths of these vectors. If
    no covariates have been specified in an equation, the corresponding
    vector has length equal to 1 (the constant term); otherwise, the length
    is one plus the number of covariates.


Formulae

    The Fisk distribution has distribution function (c.d.f.)

        F(x) = 1/[ 1 + (b/x)^a ]

    where a, b, are parameters, each positive, for random variable x > 0.
    Parameters a is the key distributional 'shape' parameters; b is a scale
    parameter.

    Letting z = 1 + (b/x)^a, then F(x) = 1/z, and the probability density
    function (p.d.f.) is

        f(x) = a(b/x)^a*(1/x)/z^2

    The likelihood function for a sample of observations on var is specified
    as the product of the densities for each observation (weighted where
    relevant), and is maximized using ml model lf.

    The formulae used to derive the distributional summary statistics
    presented (optionally) are as follows. The r-th moment about the origin
    is given by

        b^r*B(1+r/a,1-r/a)

    where B(u,v) is the Beta function = G(u).G(v)/G(u+v) and G(.) is the
    gamma function [exp(lngamma(.)], which by substitution and using G(1) =
    1, implies the moments can be written

        b^r*G(1-r/a)*G(1+r/a)

    and hence

        mean = b*G(1-1/a)*G(1+1/a)

        variance = (b^2)*G(1-2/a)*G(1+2/a) - mean^2

    from which the standard deviation and half the squared coefficient of
    variation can be derived. The mode is

    mode = b*((a-1)/(a+1))^(1/a) if a > 1, and 0 otherwise.

    The quantiles are derived by inverting the distribution function:

        x_s = b*( 1/s - 1)^(1/a) for each s = F(x_s).

    The Gini coefficient of inequality is given by

        Gini = -1 + G(2 + 1/a) / { G(1+1/a)*G(2) }.

    The Lorenz curve ordinates at each s = F(x_s) use the incomplete Beta
    function:

        L(s) = ibeta(1+1/a, 1-1/a, s ).

        
Examples

    . fiskfit x [w=wgt]

    . fiskfit

    . fiskfitfit, stats poorfrac(100)

    . fiskfitmfit x, a(age sex) b(age sex)

    . fiskfit x, ab(age sex)

    . predict double a_i, eq(a) xb

    . predict double b_i, eq(b) xb

    . predict double p_i, eq(p) xb


Authors

    Maarten L. Buis <m.buis@fsw.vu.nl>, Department of Social Research
    Methodology, Vrije Universiteit Amsterdam, Boelelaan 1081, 1081 HV
    Amsterdam, The Netherlands.

    Stephen P. Jenkins <stephenj@essex.ac.uk>, Institute for Social and
    Economic Research, University of Essex, Colchester CO4 3SQ, U.K.


References

    Dagum, C. (1977). A new model of personal income distribution:
        specification and estimation. Economie Appliquée 30: 413-437.

    Dagum, C. (1980). The generation and distribution of income, the Lorenz
        curve and the Gini ratio. Economie Appliquée 33: 327-367.

    Fisk, P.R. (1961). The Graduation of Income Distributions.  Econometrica
        29: 171-185.

    Jenkins, S.P. (2004). Fitting functional forms to distributions, using
        ml. Presentation at Second German Stata Users Group Meeting, Berlin. 
        http://www.stata.com/meeting/2german/Jenkins.pdf

    Kleiber, C. (1996). Dagum vs. Singh-Maddala income distributions.
        Economics Letters 53: 265-268.

    Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in
        Economics and Actuarial Sciences.  Hoboken, NJ: John Wiley.

    McDonald, J.B. (1984). Some generalized functions for the size
        distribution of income. Econometrica 52: 647-663.

    Singh, S.K. and G.S. Maddala (1976). A function for the size distribution
        of income. Econometrica 44: 963-970.

Also see

    Online: help for hangroot dagumfit, smfit, gb2fit, lognfit, if installed.