```-------------------------------------------------------------------------------
help for sumdist                            Stephen P. Jenkins (September 2006)
-------------------------------------------------------------------------------

Distribution summary statistics, by quantile group

sumdist varname [if exp] [in range] [, ngp(#) qgp(newvarname)
pvar(newvarname) lvar(newvarname) glvar(newvarname) ]

fweights and aweights are allowed; see weights.

by may be used with sumdist; see by.

Description

sumdist estimates distributional summary statistics commonly used by
income distribution analysts, complementing those available via
pctile, xtile, and summarize, detail. Calculations are based on all
non-missing values of varname. Use the if qualifier to exclude values
less than or equal to zero.

For variable x and distribution function F(x), the statistics are:

(1) the quantiles x_k, k = 1,2,...,K-1, where K is the number of
quantile groups;

(2) the quantiles expressed as a percentage of median(x);

(3) the quantile group shares of x in total x (expressed as a %);

(4) the cumulative quantile group shares of total x (with cumulation in
ascending order of x), i.e. the Lorenz ordinates L(p_k) at each p_k
= F(x_k) for quantile points x_k (expressed as a %);

(5) the generalized Lorenz ordinates at each p_k = F(x_k), i.e.
GL(p_k) = mean(x)*L(p_k).
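
For illustration only (a hand-worked sketch, not sumdist output): suppose
x takes the five unweighted values 2, 4, 6, 8, 20, so that total x = 40
and mean(x) = 8, and assume K = 5 groups containing one observation each.
Then, following definitions (3) to (5), with L(p_k) taken as a proportion
in the last row:

    (3) group shares:           5%,  10%,  15%,  20%,  50%
    (4) L(p_k):                 5%,  15%,  30%,  50%, 100%
    (5) GL(p_k) = 8*L(p_k):    0.4,  1.2,  2.4,  4.0,  8.0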

Bootstrapped standard errors for the estimates can be derived using
bootstrap. Standard errors derived using linearization methods can be
calculated for Lorenz and generalized Lorenz ordinates using svylorenz.

Options

ngp(#) specifies the number of quantile groups, and must be an integer
between 1 and 100. The default is 10.

qgp(newvarname) creates a new variable in the current data set that
identifies the quantile group membership of each observation. If this
option (or any of the three below) is combined with by, the new variables
refer to the last by-group only.

The following options may be used to graph Lorenz curves and generalized
Lorenz curves (see also glcurve, which is a more general program for
graphing these curves):

pvar(newvarname) creates a new variable in the current data set
containing the values of p_k = F(x_k) corresponding to each k, plus
p = 0.

lvar(newvarname) creates a new variable in the current data set
containing the cumulative shares L(p_k), plus L(0) = 0.

glvar(newvarname) creates a new variable in the current data set
containing the generalized Lorenz ordinates GL(p_k), plus GL(0) = 0.

Examples

. sumdist x [aw = wgtvar]

. sumdist x [aw = wgtvar], ngp(20)

. sumdist x [aw = wgtvar], ngp(5) qgp(quintilegroup)
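
. // summarize x within each quintile group created by the previous
. // command (a sketch; the choice of statistics is only illustrative)

. tabstat x [aw = wgtvar], by(quintilegroup) statistics(n mean min max)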

. bysort famtype: sumdist x [aw = wgtvar]
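
. // inspect saved results after a run (a sketch; r(sh1) and r(cush1)
. // assume the r(shk) and r(cushk) naming listed under Saved Results,
. // with k = 1; the matrix name S is arbitrary)

. sumdist x, ngp(5)

. return list

. display r(sh1)

. display r(cush1)

. matrix S = r(shares)

. matrix list S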

. // bootstrapped standard errors for share of poorest fifth (Stata version 8)

. preserve

. keep if x > 0 & x < .

. version 8: bootstrap "sumdist x, ngp(5)" cush1 = r(cush1), reps(100)

. restore

. // bootstrapped standard errors for share of poorest fifth (Stata version 9)

. preserve

. keep if x > 0 & x < .

. bootstrap cush1 = r(cush1), reps(100): sumdist x, ngp(5)

. restore

. // draw basic Lorenz curve

. sumdist x, ngp(20) pvar(p) lvar(l)

. twoway (connect p p) (connect l p, sort)
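
. // draw basic generalized Lorenz curve (a sketch by analogy with the
. // Lorenz curve example; the new variable names p2 and gl are arbitrary
. // and must not already exist in the data set)

. sumdist x, ngp(20) pvar(p2) glvar(gl)

. twoway (connect gl p2, sort)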

Saved Results

Scalars:

r(mean)           mean of varname
r(median)         median of varname
r(N)              number of observations
r(sum_w)          sum of weights
r(ngps)           number of quantile groups
r(qk)             quantile k, where k = 1, ..., ngps-1
r(shk)            share of varname held by quantile group k
r(cushk)          cumulative share of varname held by quantile groups 1 to k
r(glk)            generalized Lorenz ordinate of varname at quantile group k

Matrices:

r(quantiles)      1 x (ngps-1) vector of quantiles
r(relquantiles)   1 x (ngps-1) vector of quantiles relative to median
r(shares)         1 x (ngps) vector of shares

Author

Stephen P. Jenkins, Institute for Social and Economic Research,
University of Essex. Email: stephenj@essex.ac.uk

References

Cowell, F.A. 1995. Measuring Inequality, second edition.