Quantile group shares, cumulative shares (Lorenz ordinates), generalized Lorenz ordinates, and Gini coefficient
svylorenz varname [if exp] [in range] [, ngp(#) qgp(newvarname) subpop(varname) pvar(newvarname) lvar(newvarname) selvar(newvarname) glvar(newvarname) seglvar(newvarname) level(#) ]
Data must be svyset before using this command; see svyset.
svylorenz computes distribution-free variance estimates for quantile group shares of total varname, cumulative quantile group shares (cumulation is in ascending order of varname), generalized Lorenz ordinates, and the Gini coefficient. The Lorenz curve, L(p), is the graph of cumulative quantile group shares at each cumulative population share p = F(varname). The generalized Lorenz curve, GL(p), is the Lorenz curve scaled up at each p by mean varname. The Gini coefficient, G, is twice the area between the Lorenz curve and the line of perfect equality (the 45-degree line), and ranges between 0 and 1. Higher values indicate greater inequality. Note that the Gini coefficient is calculated using all valid observations; it is not derived by approximation from the income shares.
Beach and Davidson (1983) provide formulae for variance estimation of shares, cumulative shares, and generalized Lorenz ordinates, but only for unweighted data with no complex survey design features. Beach and Kaliski (1986) extend these results to the case with sample weights that are fixed and non-stochastic. Kovacevic and Binder (1997), using the estimating equations approach, provide formulae for variance estimation of cumulative shares allowing for probability weights and for complex survey design more generally. They also provide formulae for variance estimation of G. All these linearization methods rely on asymptotic approximations, and their small-sample properties are not well known.
svylorenz derives variance estimates using the methods of Kovacevic and Binder (1997) for cumulative shares and G, and derives estimates for quantile group shares from those for cumulative shares using a result of Beach and Kaliski (1986). Variance estimates for generalized Lorenz ordinates are derived by an application of the estimating equations approach of Binder and Kovacevic (1995) and Kovacevic and Binder (1997). For an alternative derivation, see Zheng (2002).
The point estimates computed by svylorenz are the same as the estimates that can be calculated using sumdist, ineqdeco and ineqdec0. By default, however, svylorenz uses observations with non-negative values of varname, ineqdeco uses observations with strictly positive values of varname, and ineqdec0 and sumdist use observations with negative, zero, or positive values of varname.
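As a rough cross-check of the point estimates, one might restrict ineqdec0 to the same non-negative observations that svylorenz uses by default. This is only a sketch: income and wgt are illustrative variable names, ineqdec0 must be installed separately, and passing the sampling weight as an aweight is an assumption about how the weights were declared with svyset. Only the point estimates are expected to agree.

. svylorenz income
. ineqdec0 income [aw = wgt] if income >= 0 & !missing(income)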
ngp(#) specifies the number of quantile groups, and must be an integer between 1 and 100. The default is 10.
qgp(newvarname) creates a new variable in the current data set that identifies the quantile group membership of each observation.
subpop(varname) specifies that estimates be computed for the single subpopulation defined by the observations for which varname!=0. Typically, varname=1 defines the subpopulation and varname=0 indicates observations not belonging to the subpopulation. For observations whose subpopulation status is uncertain, varname should be set to missing.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [U] 20.6 Specifying the width of confidence intervals.
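A minimal sketch of subpop() and level() used together, assuming a hypothetical variable sex coded 2 for women; female and income are likewise illustrative names. Observations with missing sex receive a missing indicator, so their subpopulation status is treated as uncertain.

. generate byte female = (sex == 2) if !missing(sex)
. svylorenz income, subpop(female) level(90)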
The following options may be used to graph Lorenz curves and generalized Lorenz curves (see also glcurve, which is a more general program for this task):
pvar(newvarname) creates a new variable in the current data set containing the cumulative population shares p_j = F(x_j) for each quantile group j = 1,...,J, plus p = 0.
lvar(newvarname) creates a new variable in the current data set containing the cumulative shares L(p_j), plus L(0) = 0.
selvar(newvarname) creates a new variable in the current data set containing the estimated standard errors of the cumulative shares.
glvar(newvarname) creates a new variable in the current data set containing the generalized Lorenz ordinates GL(p_j), plus GL(0) = 0.
seglvar(newvarname) creates a new variable in the current data set containing the estimated standard errors of the generalized Lorenz ordinates.
. svyset psu_name [pw = wgt], strata(strata_name)
. svylorenz income
. svylorenz cYbhcg, ngp(20) pvar(p) lvar(l) selvar(sel)
. twoway (connect p p) (connect l p, sort)
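The same approach can be sketched for the generalized Lorenz curve, with approximate pointwise 95% bands built from the estimated standard errors. The variable names p2, gl, segl, gl_lo, and gl_hi are illustrative, and the bands use a normal approximation with the asymptotic standard errors, which is an assumption of this sketch.

. svylorenz income, ngp(20) pvar(p2) glvar(gl) seglvar(segl)
. generate gl_lo = gl - invnormal(0.975)*segl
. generate gl_hi = gl + invnormal(0.975)*segl
. twoway (rarea gl_lo gl_hi p2, sort) (connected gl p2, sort)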
Further examples are provided in the downloadable materials accompanying the presentation by Jenkins (2006).
e(gini)        G
e(se_gini)     asymptotic SE of G
e(mean)        mean of varname
e(se_mean)     asymptotic SE of the mean
e(total)       total of varname
e(ngps)        number of quantile groups
e(qj)          quantile j, where j = 1, ..., ngps
e(shj)         share of varname held by each quantile group j
e(se_shj)      asymptotic SE of each group j's share of varname
e(cushj)       cumulative share of varname held by each quantile group j
e(se_cushj)    asymptotic SE of each group j's cumulative share of varname
e(glj)         generalized Lorenz ordinate of varname held by each quantile group j
e(se_glj)      asymptotic SE of each group j's generalized Lorenz ordinate of varname
e(quantiles)   1 x (ngps-1) vector of quantiles
e(shares)      1 x (ngps) vector of quantile group shares
e(V_cush)      (ngps-1) x (ngps-1) variance-covariance matrix of cumulative shares
e(V_gl)        (ngps) x (ngps) variance-covariance matrix of generalized Lorenz ordinates
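A minimal sketch of how the saved results might be used after estimation, assuming an illustrative variable income and the default of 10 quantile groups. It also assumes that the indexed scalars are named by replacing j with the group number (for example, e(cush5) and e(gl5)), and the normal-approximation interval for G is an assumption of this sketch. The last two displayed values should agree if cumulative shares are stored as proportions, since the generalized Lorenz ordinate is the cumulative share scaled by the mean.

. svylorenz income
. display "Gini = " e(gini) ", SE = " e(se_gini)
. display "95% CI: " e(gini)-invnormal(0.975)*e(se_gini) " to " e(gini)+invnormal(0.975)*e(se_gini)
. matrix list e(shares)
. matrix V = e(V_cush)
. matrix list V
. display e(mean)*e(cush5)
. display e(gl5)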
Philippe Van Kerm provided helpful comments on early drafts of this program.
Beach, C.M. and R. Davidson. 1983. Distribution-free statistical inference with Lorenz curves and income shares. Review of Economic Studies 50: 723-735.
Beach, C.M. and S.F. Kaliski. 1986. Lorenz curve inference with sample weights: an application to the distribution of unemployment experience. Applied Statistics 35(1): 38-45.
Binder, D.A. and M.S. Kovacevic. 1995. Estimating some measures of income inequality from survey data: an application of the estimating equations approach. Survey Methodology 21: 137-145.
Jenkins, S.P. 2006. Estimation and interpretation of measures of inequality, poverty, and social welfare using Stata. Presentation at North American Stata Users' Group Meetings 2006, Boston MA. http://econpapers.repec.org/paper/bocasug06/16.htm.
Kovacevic, M.S. and D.A. Binder. 1997. Variance estimation for measures of income inequality and polarization. Journal of Official Statistics 13(1): 41-58. Full text downloadable from http://www.jos.nu/Articles/abstract.asp?article=13141.
Zheng, B. 2002. Testing Lorenz curves with non-simple random samples. Econometrica 70: 1235-1243.
Stephen P. Jenkins, Institute for Social and Economic Research, University of Essex. Email: firstname.lastname@example.org
svy, svyset, xtile