Derivation of generalised Lorenz curve ordinates with unit record data
glcurve varname [weight] [if exp] [in range] [, pvar(newvarname) glvar(newvarname) sortvar(varname) by(varname) split nograph replace lorenz atip(string) rtip(string) plot(plot) graph_options ]
aweights and fweights are allowed; see help weights.
Description
Given a variable varname, call it x, with c.d.f. F(x), glcurve draws its Generalised Lorenz curve and/or generates two new variables containing the Generalised Lorenz ordinates for x, i.e. GL(p) at each p = F(x). For a population ordered in ascending order of x, a graph of GL(p) against p plots the cumulative total of x divided by population size against cumulative population share, so that GL(1) = mean(x). glcurve can also be used to derive many other related concepts such as Lorenz curves, concentration curves and 'Three Is of Poverty' (TIP) curves, with appropriate definition of varname, order of cumulation (set with the sortvar() option), and normalisation (e.g. by the mean of varname). Alternatively, glcurve with the lorenz, atip() or rtip() option can be used directly to draw the related Lorenz, concentration and TIP curves.
Comparisons of pairs of distributions (and dominance checks) can be undertaken by using the by() option, with or without split. They can also be made manually by 'stacking' the data (see help stack).
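The definition above can be checked by hand on a toy dataset. The following is a minimal sketch using only built-in Stata commands, for the unweighted case without ties; the variable names p, gl and lor are illustrative, and glcurve's own handling of weights and ties may differ from this simplified version.

```stata
* Toy data: four unit records of x (illustrative values)
clear
input x
4
1
3
2
end
* Order the population in ascending order of x
sort x
* Cumulative population share p = F(x)
generate double p = _n / _N
* Cumulative total of x divided by population size: GL(p)
generate double gl = sum(x) / _N
* Lorenz ordinates, obtained by normalising GL(p) by the mean of x
quietly summarize x, meanonly
generate double lor = gl / r(mean)
list x p gl lor
* The last GL ordinate equals the mean of x
assert abs(gl[_N] - r(mean)) < 1e-12
```

The final assert confirms GL(1) = mean(x), and the last Lorenz ordinate is 1 by construction.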
The graphs drawn by glcurve are relatively basic. For graphs with full user control over formatting and labelling, users are recommended to use glcurve to generate the ordinates of the graph required using the pvar(newvarname) and glvar(newvarname) options, and then to draw the graph using graph twoway.
Options
pvar(pvarname) generates the variable pvarname containing the x coordinates of the created curve.
glvar(glvarname) generates the variable glvarname containing the y coordinates of the created curve.
sortvar(sname) specifies the sort variable. By default, the data are sorted (and cumulated) in ascending order of varname. If the sortvar option is specified, sorting and cumulation is in ascending order of variable sname. Within tied values of sname, data are sorted in ascending order of varname.
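The ordering rule described for sortvar() can be illustrated with a hand computation. This is a sketch under simplifying assumptions (no weights, toy values); x, s, p and cc are illustrative names, and the result is the kind of generalised concentration curve ordinate that cumulating x in order of s produces.

```stata
* Toy data: x is cumulated, s determines the order of cumulation
clear
input x s
10 2
30 1
20 2
40 3
end
* Sort by s first; within tied values of s, sort by x
sort s x
generate double p = _n / _N
* Cumulate x in that order and divide by population size
generate double cc = sum(x) / _N
list s x p cc
```

Because the first observation in s order has x = 30, the curve starts above the one obtained by sorting on x itself; the final ordinate is still mean(x).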
by(groupvar) specifies that the coordinates are to be computed separately for each subgroup defined by groupvar. groupvar must be an integer variable.
split specifies that a series of new variables is created containing the coordinates for each subgroup specified by by(groupvar). split cannot be used without by(). If split is specified, then the string glname in glvar(glname) is used as a prefix to create new variables glname_X1, glname_X2,... (where X1, X2, ... are the values taken by groupvar).
nograph suppresses the automatic display of the basic graph drawn from the created variables. nograph is assumed if by() is specified without split.
replace allows the variables specified in glvar(glvarname) and pvar(pvarname) to be overwritten if they already exist. Otherwise glvarname and pvarname must be new variable names.
lorenz requests that the ordinates of the Lorenz curve be computed instead of generalised Lorenz ordinates. The Lorenz ordinates of variable x, L(p), are GL(p)/mean(x).
rtip(povline) and atip(povline) request that the ordinates of TIP curves be computed instead of generalised Lorenz ordinates. povline specifies the value of the poverty line: it can be either a numeric value taken as the poverty line for all observations or a variable name containing the value of the poverty line for each observation. atip() draws 'absolute' TIP curves (by cumulating max(z-x,0)) and rtip() draws 'relative' TIP curves (by cumulating max(1-(x/z),0)), where z denotes the poverty line.
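The two cumulation rules can be sketched by hand as follows. This assumes a single poverty line z common to all observations, no weights and toy values; agap, rgap, atip and rtip are illustrative variable names rather than anything created by glcurve.

```stata
* Toy data and a common poverty line z = 3 (illustrative values)
clear
input x
5
1
4
2
end
scalar z = 3
* Absolute and relative poverty gaps, zero for the non-poor
generate double agap = max(scalar(z) - x, 0)
generate double rgap = max(1 - x / scalar(z), 0)
* With a common poverty line, ascending x puts the largest gaps first
sort x
generate double p = _n / _N
generate double atip = sum(agap) / _N
generate double rtip = sum(rgap) / _N
list x agap rgap p atip rtip
```

Both curves become flat once p passes the proportion of poor observations (here 0.5), and their final heights are the mean absolute and mean relative poverty gaps.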
plot(plot) provides a way to add other plots to the generated graph; see plot option.
graph_options are standard twoway scatter options. Note that modifications to the legend labels should be made with the legend(order(...)) option instead of legend(label(...)) (see help legend_option).
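As an illustration of the legend note, the following sketch relabels the legend of a by-group graph. It assumes that glcurve is installed, uses Stata's bundled auto dataset purely for convenience, and assumes that plots 1 and 2 correspond to the two groups in ascending order of the by() variable; that ordering has not been verified here.

```stata
* Assumption: glcurve is installed; auto.dta ships with Stata
sysuse auto, clear
* foreign is an integer 0/1 variable, as by() requires
* Legend keys are relabelled with order() rather than label()
glcurve price, gl(gl) p(p) by(foreign) split lorenz ///
    legend(order(1 "Domestic" 2 "Foreign"))
```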
Examples
Many glcurve examples are provided in the downloadable materials provided by Jenkins (2006).
. * Generalised Lorenz curve ordinates; plot using -graph twoway-
. glcurve x, gl(gl1) p(p1) nograph
. twoway line gl1 p1
. * Lorenz curve ordinates; plot using -glcurve-
. glcurve x, lorenz plot(function equality = x)
. * Lorenz curve ordinates; plot using -glcurve-; options
. glcurve x [fw=wgt] if x > 0, gl(gl2) p(p2) lorenz
. * Generalised Lorenz curve ordinates and graphs, by state
. glcurve x, gl(gl2) p(p2) replace sort(y) by(state) split
. * TIP curve ordinates with graph
. glcurve x, gl(gl3) p(p3) atip(10000)
. glcurve x, gl(gl3) p(p3) atip(plinevar)
. * Lorenz curve ordinates; plot using -graph twoway-
. glcurve x, gl(gl) p(p) lorenz nograph
. twoway line gl p , sort || line p p ,  ///
        xlabel(0(.1)1) ylabel(0(.1)1)  ///
        xline(0(.2)1) yline(0(.2)1)  ///
        title("Lorenz curve") subtitle("Example with custom formatting")  ///
        legend(label(1 "Lorenz curve") label(2 "Line of perfect equality"))  ///
        plotregion(margin(zero)) aspectratio(1) scheme(economist)
Notes
glcurve is designed to be used with individual-level, unit-record data. Although glcurve can also be applied mechanically to grouped ('banded') income data using fweights, be aware that the resulting curve is a potentially poor estimate, because within-income-band inequality is not taken into account. On the estimation of Lorenz curves and inequality indices with grouped data, see e.g. Gastwirth and Glauberman (1976) or Cowell and Mehta (1982).
One must also be careful in using the ordinates returned from the option pvar() for subsequent computation of the Gini or concentration coefficient using the 'convenient covariance' formulae described by e.g. Lerman and Yitzhaki (1984, 1989) or Jenkins (1988). The ordinates returned in pvar() are the curve ordinates (and are equal to estimates obtained from cumul) and these are not necessarily the fractional ranks required in the covariance formula. The difference is generally negligible with continuous unit-record data, but is larger if there are many ties in the ranking variable (as is the case, e.g., for the concentration coefficient based on an ordinal categorical variable, or when dealing with grouped data).
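To make the distinction concrete, the sketch below computes a Gini coefficient from the covariance formula G = 2 cov(x, F)/mean(x), with fractional ranks F built from average ranks so that tied observations share the same value. It covers the unweighted case only, uses toy data with ties, and the names rk and frank are illustrative; it is one way of obtaining fractional ranks, not a description of glcurve's internals.

```stata
* Toy data with ties in the ranking variable
clear
input x
1
3
1
3
end
* Average ranks for ties, converted to fractional ranks in (0,1)
egen double rk = rank(x)
generate double frank = (rk - 0.5) / _N
* correlate reports the sample covariance (divisor N-1);
* rescale to the population covariance (divisor N)
quietly correlate x frank, covariance
scalar cov_pop = r(cov_12) * (r(N) - 1) / r(N)
quietly summarize x, meanonly
scalar gini = 2 * scalar(cov_pop) / r(mean)
display "Gini from covariance formula with fractional ranks = " scalar(gini)
```

For these four observations the result, 0.25, matches the Gini obtained from the mean absolute difference between all pairs, which is why tied observations need a common fractional rank.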
Acknowledgements
Nicholas J. Cox helped with updating the code for our program from Stata 7 (glcurve7) to Stata 8. David Demery, Owen O'Donnell and Shehzad Ali made useful bug reports. Comments by Zhuo (Adam) Chen led to the introduction of 'sort stable' estimation for concentration curves.
Authors
Philippe Van Kerm, CEPS/INSTEAD, Differdange, G.-D. Luxembourg philippe.vankerm@ceps.lu
Stephen P. Jenkins, ISER, University of Essex stephenj@essex.ac.uk
References
Cowell, F.A. 1995. Measuring Inequality (second edition). Hemel Hempstead: Prentice-Hall/Harvester-Wheatsheaf.
Cowell, F.A. and Mehta, F. 1982. The Estimation and Interpolation of Inequality Measures. Review of Economic Studies 49(2): 273-290.
Gastwirth, J.L. and Glauberman, M. 1976. The Interpolation of the Lorenz Curve and Gini Index from Grouped Data. Econometrica 44(3): 479-483.
Jenkins, S.P. 1988. Calculating income distribution indices from microdata. National Tax Journal 41: 139-142.
Jenkins, S.P. 2006. Estimation and interpretation of measures of inequality, poverty, and social welfare using Stata. Presentation at North American Stata Users' Group Meetings 2006, Boston MA. http://econpapers.repec.org/paper/bocasug06/16.htm.
Jenkins, S.P. and Lambert, P.J. 1997. Three 'I's of poverty curves, with an analysis of UK poverty trends. Oxford Economic Papers 49: 317-327.
Lambert, P.J. 2001. The Distribution and Redistribution of Income (third edition). Manchester: Manchester University Press.
Lerman, R.I. and Yitzhaki, S. 1984. A note on the calculation and interpretation of the Gini index. Economics Letters 15(3-4): 363-368.
Lerman, R.I. and Yitzhaki, S. 1989. Improving the Accuracy of Estimates of Gini Coefficients. Journal of Econometrics 42(1): 43-47.
Shorrocks, A.F. 1983. Ranking income distributions. Economica 197: 3-17.
Also see
Manual: [R] lorenz
STB: STB-48 sg107, STB-49 sg107.1, SJ 1(1) gr0001
On-line: help for sumdist, svylorenz (if installed)