Distribution summary statistics, by quantile group

sumdist varname [if exp] [in range] [, ngp(#) qgp(newvarname) pvar(newvarname) lvar(newvarname) glvar(newvarname)]

fweights and aweights are allowed; see weights.

by may be used with sumdist; see by.

Description

sumdist estimates distributional summary statistics commonly used by income distribution analysts, complementing those available via pctile, xtile, and summarize, detail. Calculations are based on all non-missing values of varname; use an if qualifier if you wish to exclude values less than or equal to zero. For variable x with distribution function F(x), the statistics are:

(1) the quantiles x_k, k = 1, 2, ..., K-1, where K is the number of quantile groups;

(2) the quantiles expressed as a percentage of median(x);

(3) the quantile group shares of x in total x (expressed as a %);

(4) the cumulative quantile group shares of total x (with cumulation in ascending order of x), i.e. the Lorenz ordinates L(p_k) at each p_k = F(x_k) for quantile points x_k (expressed as a %);

(5) the generalized Lorenz ordinates at each p_k = F(x_k), i.e. GL(p_k) = mean(x)*L(p_k).
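The five statistics can be illustrated with a short Python sketch for unweighted data. This is not sumdist's implementation. Several details are assumptions made for illustration: the quantile rule (average the two adjacent order statistics when n*k/K is a whole number, modelled on Stata's default percentile definition), the tie rule (an observation equal to a cutpoint x_k goes to the lower group), and the helper names order_stat_quantile and sumdist_sketch. GL is computed with L(p_k) as a proportion rather than a percentage.

```python
import math
from typing import Dict, List


def order_stat_quantile(xs: List[float], k: int, K: int) -> float:
    """k/K quantile of already-sorted data xs (hypothetical helper).

    Assumed rule: with P = n*k/K, average the P-th and (P+1)-th order
    statistics when P is a whole number, otherwise take the ceil(P)-th.
    """
    n = len(xs)
    num = n * k                      # P = num / K, kept in integers to avoid rounding
    if num % K == 0:
        j = num // K                 # 1-based position of the P-th order statistic
        return (xs[j - 1] + xs[j]) / 2
    return xs[num // K]              # 0-based floor(P) is the 1-based ceil(P)-th value


def sumdist_sketch(x: List[float], K: int = 10) -> Dict[str, List[float]]:
    """Quantiles, relative quantiles, shares, cumulative shares and GL ordinates."""
    xs = sorted(v for v in x if not math.isnan(v))   # non-missing values only
    n = len(xs)
    total = sum(xs)
    mean = total / n
    median = order_stat_quantile(xs, 1, 2)

    # (1) quantiles x_k, k = 1..K-1, and (2) the same as a % of the median
    q = [order_stat_quantile(xs, k, K) for k in range(1, K)]
    relq = [100 * qk / median for qk in q]

    # Group membership (assumed): group k holds q_{k-1} < x <= q_k
    group_sums = [0.0] * K
    for v in xs:
        g = next((i for i, qk in enumerate(q) if v <= qk), K - 1)
        group_sums[g] += v

    # (3) group shares and (4) cumulative shares L(p_k), both in %
    shares = [100 * s / total for s in group_sums]
    cushares = []
    running = 0.0
    for s in group_sums:
        running += s
        cushares.append(100 * running / total)

    # (5) GL(p_k) = mean(x) * L(p_k), with L taken as a proportion
    gl = [mean * c / 100 for c in cushares]
    return {"quantiles": q, "relquantiles": relq, "shares": shares,
            "cushares": cushares, "gl": gl}


res = sumdist_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], K=5)
print(res["quantiles"])   # [2.5, 4.5, 6.5, 8.5]
```

With quintiles of the integers 1 to 10, the poorest group holds {1, 2}. Its share is therefore 3/55 of the total, and the cumulative shares reach 100% in the last group.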

Bootstrapped standard errors for the estimates can be derived using bootstrap. Standard errors based on linearization methods can be calculated for the Lorenz and generalized Lorenz ordinates using svylorenz (if installed).
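The Python sketch below shows what the bootstrap does in this setting. It draws samples from the data with replacement, recomputes the share of the poorest fifth on each replicate, and reports the standard deviation of the replicates as the standard error. The helpers bottom_share and bootstrap_se, the seed, and the cutpoint rule are hypothetical choices, not the internals of sumdist or bootstrap. As in the Stata examples, the input is assumed to contain only positive, non-missing values.

```python
import random
import statistics
from typing import Callable, List


def bottom_share(x: List[float], K: int = 5) -> float:
    """Percentage of total x held by observations at or below the 1/K quantile
    (hypothetical helper; the cutpoint rule is an assumption)."""
    xs = sorted(x)
    n = len(xs)
    if n % K == 0:                   # n/K whole: average the adjacent order statistics
        j = n // K
        cut = (xs[j - 1] + xs[j]) / 2
    else:                            # otherwise take the ceil(n/K)-th order statistic
        cut = xs[n // K]
    return 100 * sum(v for v in xs if v <= cut) / sum(xs)


def bootstrap_se(x: List[float], stat: Callable[[List[float]], float],
                 reps: int = 100, seed: int = 12345) -> float:
    """Sample standard deviation of `stat` over `reps` resamples with replacement."""
    rng = random.Random(seed)
    n = len(x)
    replicates = [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(replicates)   # divides by reps - 1


print(bootstrap_se([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bottom_share, reps=100))
```

Fixing the seed makes the standard error reproducible. Different seeds or numbers of replications will give somewhat different values, just as repeated runs of bootstrap do.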

Options

ngp(#) specifies the number of quantile groups, and must be an integer between 1 and 100. The default is 10.

qgp(newvarname) creates a new variable in the current data set that identifies the quantile group membership of each observation. If this option (or any of the three below) is combined with by, the new variables refer to the last by-group only.

The following options may be used to graph Lorenz curves and generalized Lorenz curves (see also glcurve, which is a more general program for this task):

pvar(newvarname) creates a new variable in the current data set containing the values of p_k = F(x_k) corresponding to each k, plus the point p = 0.

lvar(newvarname) creates a new variable in the current data set containing the cumulative shares L(p_k), plus L(0) = 0.

glvar(newvarname) creates a new variable in the current data set containing the generalized Lorenz ordinates GL(p_k), plus GL(0) = 0.
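The variables created by pvar(), lvar() and glvar() together form one point per quantile, plus a point at 0. The Python sketch below builds these points for unweighted data. It takes p_k = F(x_k) as the empirical distribution function evaluated at each quantile x_k, that is, the fraction of observations at or below x_k. The helper name lorenz_points and the quantile rule are assumptions. The sketch also appends the end point (1, 1, mean(x)), which every Lorenz and generalized Lorenz curve passes through; whether sumdist stores this point is not stated here.

```python
from typing import List, Tuple


def lorenz_points(x: List[float], K: int = 10) -> Tuple[List[float], List[float], List[float]]:
    """Return (p, L, GL) for quantiles k = 1..K-1, with a leading point at 0
    and a trailing end point (hypothetical helper, unweighted data)."""
    xs = sorted(x)
    n, total = len(xs), sum(xs)
    mean = total / n
    p, L, GL = [0.0], [0.0], [0.0]           # the extra point at 0
    for k in range(1, K):
        num = n * k                           # n * k/K scaled by K, kept as an integer
        if num % K == 0:                      # assumed quantile rule
            xk = (xs[num // K - 1] + xs[num // K]) / 2
        else:
            xk = xs[num // K]
        below = [v for v in xs if v <= xk]
        p.append(len(below) / n)              # p_k = F(x_k), empirical CDF at x_k
        L.append(sum(below) / total)          # L(p_k) as a proportion
        GL.append(mean * L[-1])               # GL(p_k) = mean(x) * L(p_k)
    p.append(1.0)                             # end point (assumption, see lead-in)
    L.append(1.0)
    GL.append(mean)
    return p, L, GL


p, L, GL = lorenz_points([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], K=5)
print(p)
```

Plotting L against p, together with the 45-degree line p against p, gives the same picture as the twoway example in the Examples section.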

Examples

. sumdist x [aw = wgtvar]

. sumdist x [aw = wgtvar], ngp(20)

. sumdist x [aw = wgtvar], ngp(5) qgp(quintilegroup)

. bysort famtype: sumdist x [aw = wgtvar]

. // bootstrapped standard errors for share of poorest fifth (Stata version 8)

. preserve

. keep if x > 0 & x < .

. version 8: bootstrap "sumdist x, ngp(5)" cush1 = r(cush1), reps(100)

. restore

. // bootstrapped standard errors for share of poorest fifth (Stata version 9)

. preserve

. keep if x > 0 & x < .

. bootstrap cush1 = r(cush1), reps(100): sumdist x, ngp(5)

. restore

. // draw basic Lorenz curve

. sumdist x, ngp(20) pvar(p) lvar(l)

. twoway (connect p p) (connect l p, sort)

Saved results

Scalars:

r(mean)          mean of varname
r(median)        median of varname
r(N)             number of observations
r(sum_w)         sum of weights
r(ngps)          number of quantile groups
r(qk)            quantile k, where k = 1, ..., ngps-1
r(shk)           share of varname held by quantile group k
r(cushk)         cumulative share of varname held by quantile groups 1 to k
r(glk)           generalized Lorenz ordinate of varname at quantile group k

Matrices:

r(quantiles)     1 x (ngps-1) vector of quantiles
r(relquantiles)  1 x (ngps-1) vector of quantiles relative to the median
r(shares)        1 x ngps vector of shares

Author

Stephen P. Jenkins, Institute for Social and Economic Research, University of Essex. Email: stephenj@essex.ac.uk


Also see

xtile, pctile, summarize; svylorenz, glcurve (if installed)