-------------------------------------------------------------------------------
help for sumdist                            Stephen P. Jenkins (September 2006)
-------------------------------------------------------------------------------

Distribution summary statistics, by quantile group

sumdist varname [if exp] [in range] [, ngp(#) qgp(newvarname) pvar(newvarname) lvar(newvarname) glvar(newvarname) ]

fweights and aweights are allowed; see weights.

by may be used with sumdist; see by.

Description

sumdist estimates distributional summary statistics commonly used by income distribution analysts, complementing those available via pctile, xtile, and summarize, detail. Calculations are based on all non-missing values of varname. Use the if qualifier to exclude values less than or equal to zero.

For variable x and distribution function F(x), the statistics are:

(1) the quantiles x_k, for k = 1, 2, ..., K-1, where K is the number of quantile groups;

(2) the quantiles expressed as a percentage of median(x);

(3) the quantile group shares of x in total x (expressed as a %);

(4) the cumulative quantile group shares of total x (with cumulation in ascending order of x), i.e. the Lorenz ordinates L(p_k) at each p_k = F(x_k) for quantile points x_k (expressed as a %);

(5) the generalized Lorenz ordinates at each p_k = F(x_k), i.e. GL(p_k) = mean(x)*L(p_k).
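
As a rough cross-check on statistic (3), the quantile group shares can be approximated by hand with official Stata commands. This is only a sketch for unweighted data: the variable names grp, xtot, gtot, and share are hypothetical, and xtile's treatment of ties and of quantile boundaries is not guaranteed to match the internal calculations of sumdist, so small differences are possible.

. xtile grp = x, nq(5)

. egen double xtot = total(x)

. egen double gtot = total(x), by(grp)

. generate double share = 100 * gtot / xtot

. tabstat share, by(grp)

Each row of the tabstat output shows the percentage of total x held by that fifth of the distribution; cumulating these shares in ascending order of grp gives the corresponding Lorenz ordinates in (4).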

Bootstrapped standard errors for the estimates can be derived using bootstrap. Standard errors derived using linearization methods can be calculated for Lorenz and generalized Lorenz ordinates using svylorenz.

Options

ngp(#) specifies the number of quantile groups, and must be an integer between 1 and 100. The default is 10.

qgp(newvarname) creates a new variable in the current data set that identifies the quantile group membership of each observation. If this option (or any of the three options below) is combined with by, the new variables contain results for the last by-group only.
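
One possible way to obtain group membership within a particular subgroup is to run sumdist separately with an if qualifier instead of by. The sketch below assumes that famtype is a numeric variable taking the values 1 and 2; the new variable names qgrp1 and qgrp2 are hypothetical.

. sumdist x if famtype == 1, ngp(5) qgp(qgrp1)

. sumdist x if famtype == 2, ngp(5) qgp(qgrp2)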

The following options may be used to graph Lorenz curves and generalized Lorenz curves (see also glcurve which is a more general program for this task):

pvar(newvarname) creates a new variable in the current data set containing the values of p_k = F(x_k) corresponding to each k, plus 0.

lvar(newvarname) creates a new variable in the current data set containing the cumulative shares L(p_k), plus L(0) = 0.

glvar(newvarname) creates a new variable in the current data set containing the generalized Lorenz ordinates GL(p_k), plus GL(0) = 0.
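
For instance, a basic generalized Lorenz curve might be drawn by combining pvar() and glvar(). This is a minimal sketch; the new variable names p and gl are arbitrary and must not already exist in the data set.

. sumdist x, ngp(20) pvar(p) glvar(gl)

. twoway connected gl p, sort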

Examples

. sumdist x [aw = wgtvar]

. sumdist x [aw = wgtvar], ngp(20)

. sumdist x [aw = wgtvar], ngp(5) qgp(quintilegroup)

. bysort famtype: sumdist x [aw = wgtvar]

. // bootstrapped standard errors for share of poorest fifth (Stata version 8)

. preserve

. keep if x > 0 & x < .

. version 8: bootstrap "sumdist x, ngp(5)" cush1 = r(cush1), reps(100)

. restore

. // bootstrapped standard errors for share of poorest fifth (Stata version 9)

. preserve

. keep if x > 0 & x < .

. bootstrap cush1 = r(cush1), reps(100): sumdist x, ngp(5)

. restore

. // draw basic Lorenz curve

. sumdist x, ngp(20) pvar(p) lvar(l)

. twoway (connect p p) (connect l p, sort)

Saved Results

Scalars:

r(mean)      mean of varname
r(median)    median of varname
r(N)         number of observations
r(sum_w)     sum of weights
r(ngps)      number of quantile groups
r(qk)        quantile k, where k = 1, ..., ngps-1
r(shk)       share of varname held by each quantile group k
r(cushk)     cumulative share of varname held by each quantile group k
r(glk)       generalized Lorenz ordinate of varname held by each quantile group k

Matrices:

r(quantiles)     1 x (ngps-1) vector of quantiles
r(relquantiles)  1 x (ngps-1) vector of quantiles relative to median
r(shares)        1 x (ngps) vector of shares
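
The saved results can be inspected or reused immediately after estimation. The following sketch assumes five quantile groups, so that r(sh1), following the r(shk) naming pattern above, refers to the share of the poorest fifth.

. sumdist x, ngp(5)

. return list

. display "Share of poorest fifth (%): " r(sh1)

. matrix list r(shares)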

Author

Stephen P. Jenkins, Institute for Social and Economic Research, University of Essex. Email: stephenj@essex.ac.uk

References

Cowell, F.A. 1995. Measuring Inequality, second edition. Hemel Hempstead: Prentice-Hall/Harvester-Wheatsheaf.

Also see

xtile, pctile, summarize

svylorenz, glcurve, if installed.