-------------------------------------------------------------------------------
help for sumdist                            Stephen P. Jenkins (September 2006)
-------------------------------------------------------------------------------


Distribution summary statistics, by quantile group

        sumdist varname [if exp] [in range] [, ngp(#) qgp(newvarname)
               pvar(newvarname) lvar(newvarname) glvar(newvarname) ]

    fweights and aweights are allowed; see weights.

    by may be used with sumdist; see by.


Description

    sumdist estimates distributional summary statistics commonly used by
        income distribution analysts, complementing those available via
        pctile, xtile, and summarize, detail. Calculations are based on all
        non-missing values of varname. Use the if qualifier if you wish to
        exclude values less than or equal to zero.
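
    For instance, a minimal sketch that restricts the calculations to
        strictly positive, non-missing values (x and wgtvar are placeholder
        variable names, not part of any particular data set):

    . sumdist x if x > 0 & x < . [aw = wgtvar]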

    For variable x and distribution function F(x), the statistics are:

    (1) quantiles k = 1,2,...,K-1, for K = # quantile groups;

    (2) the quantiles expressed as a percentage of median(x);

    (3) the quantile group shares of x in total x (expressed as a %);

    (4) the cumulative quantile group shares of total x (with cumulation in
        ascending order of x), i.e. the Lorenz ordinates L(p_k) at each p_k
        = F(x_k) for quantile points x_k (expressed as a %);

    (5) the generalized Lorenz ordinates at each p_k = F(x_k), i.e.
        GL(p_k) = mean(x)*L(p_k).
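
    As an illustration of these definitions (hand arithmetic on made-up
        data, not output from sumdist), take four unweighted observations
        x = 1, 2, 3, 4 with K = 2 quantile groups, and take the median to be
        2.5, so that mean(x) = 2.5 and total x = 10:

        quantile 1                  = 2.5 (100% of the median)
        share of group 1            = (1 + 2)/10 = 30%
        share of group 2            = (3 + 4)/10 = 70%
        L(0.5), L(1)                = 30%, 100%
        GL(0.5) = 2.5*0.30          = 0.75
        GL(1)   = 2.5*1.00          = 2.5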

    Bootstrapped standard errors for the estimates can be derived using
    bootstrap. Standard errors derived using linearization methods can be
    calculated for Lorenz and generalized Lorenz ordinates using svylorenz.


Options

    ngp(#) specifies the number of quantile groups, and must be an integer
        between 1 and 100. The default is 10.

    qgp(newvarname) creates a new variable in the current data set that
        identifies the quantile group membership of each observation. If this
        option (or any of the three below) is combined with by, the variables
        refer to the last bygroup only.

    The following options create variables that may be used to graph Lorenz
        curves and generalized Lorenz curves (see also glcurve, which is a
        more general program for this task):

    pvar(newvarname) creates a new variable in the current data set
        containing the values of p_k = F(x_k) corresponding to each k, plus
        0.

    lvar(newvarname) creates a new variable in the current data set
        containing the cumulative shares L(p_k), plus L(0) = 0.

    glvar(newvarname) creates a new variable in the current data set
        containing the generalized Lorenz ordinates GL(p_k), plus GL(0) = 0.


Examples

    . sumdist x [aw = wgtvar]

    . sumdist x [aw = wgtvar], ngp(20)

    . sumdist x [aw = wgtvar], ngp(5) qgp(quintilegroup)

    . bysort famtype: sumdist x [aw = wgtvar]

    . // bootstrapped standard errors for share of poorest fifth (Stata 8)

    . preserve

    . keep if x > 0 & x < .

    . version 8: bootstrap "sumdist x, ngp(5)" cush1 = r(cush1), reps(100)

    . restore

    . // bootstrapped standard errors for share of poorest fifth (Stata 9)

    . preserve

    . keep if x > 0 & x < .

    . bootstrap cush1 = r(cush1), reps(100): sumdist x, ngp(5)

    . restore

    . // draw basic Lorenz curve

    . sumdist x, ngp(20) pvar(p) lvar(l)

    . twoway (connect p p) (connect l p, sort)
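
    . // draw basic generalized Lorenz curve (a sketch along the same lines;

    . // p2 and gl are hypothetical names for variables that do not yet exist)

    . sumdist x, ngp(20) pvar(p2) glvar(gl)

    . twoway (connect gl p2, sort)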


Saved Results

    Scalars:

    r(mean)           mean of varname
    r(median)         median of varname
    r(N)              number of observations
    r(sum_w)          sum of weights
    r(ngps)           number of quantile groups
    r(qk)             quantile k, where k = 1, ..., ngps-1
    r(shk)            share of varname held by quantile group k
    r(cushk)          cumulative share of varname held by quantile groups 1
                 to k (e.g. r(cush1) for the poorest group)
    r(glk)            generalized Lorenz ordinate of varname at quantile
                 group k

    Matrices:

    r(quantiles)      1 x (ngps-1) vector of quantiles
    r(relquantiles)   1 x (ngps-1) vector of quantiles relative to median
    r(shares)         1 x (ngps) vector of shares
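
    A brief sketch of how these results might be inspected after running
        sumdist (x is a placeholder variable and S is a hypothetical matrix
        name chosen for illustration):

    . sumdist x, ngp(5)

    . return list

    . display r(cush1)

    . matrix S = r(shares)

    . matrix list S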


Author

    Stephen P. Jenkins, Institute for Social and Economic Research,
    University of Essex. Email: stephenj@essex.ac.uk


References

Cowell, F.A. 1995. Measuring Inequality, second edition.
Hemel Hempstead: Prentice-Hall/Harvester-Wheatsheaf.


Also see

    xtile pctile summarize

    svylorenz, glcurve, if installed.