-------------------------------------------------------------------------------
help for ineqdeco, ineqdec0                       Stephen P. Jenkins (May 2008)
-------------------------------------------------------------------------------

Inequality indices, with optional decomposition by subgroup

        ineqdeco varname [weights] [if exp] [in range] [, bygroup(groupvar)
                 welfare summarize ]

        ineqdec0 varname [weights] [if exp] [in range] [, bygroup(groupvar)
                 welfare summarize ]

    fweights and aweights are allowed; see help weights.

Description

    ineqdeco and ineqdec0 estimate a range of inequality and related indices
    commonly used by economists, plus decompositions of a subset of these
    indices by population subgroup.  Inequality decompositions by subgroup
    are useful for providing inequality profiles at a point in time, and also
    for analyzing secular trends using shift-share analysis. Unit record
    (`micro' level) data are required. Observations with values of varname
    less than or equal to zero are excluded from calculations using ineqdeco.
    By contrast, calculations using ineqdec0 do not exclude these
    observations: values of varname less than or equal to zero are valid
    (unless otherwise excluded using the if or in qualifiers). As a
    consequence, the portfolio of indices that is estimated by ineqdec0 is
    restricted. See below for details.

    Inequality indices estimated by ineqdeco are: members of the
    single-parameter Generalized Entropy class GE(a) for a = -1, 0, 1, 2; the
    Atkinson class A(e) for e = 0.5, 1, 2; the Gini coefficient; and the
    percentile ratios p90/p10 and p75/p25.  Also presented are related
    summary statistics such as subgroup means and population shares.
    Optionally presented are indices related to the Atkinson inequality
    indices, namely equally-distributed-equivalent income Yede(e), social
    welfare indices W(e), and the Sen welfare index: see below for details.
    The indices estimated by ineqdec0 are the percentile ratios p90/p10 and
    p75/p25, GE(2) = half the squared coefficient of variation, the Gini
    coefficient, and Sen's welfare index.

    The inequality indices differ in their sensitivities to income
    differences in different parts of the distribution. The more positive a
    is, the more sensitive GE(a) is to income differences at the top of the
    distribution; the more negative a is, the more sensitive it is to
    differences at the bottom of the distribution. GE(0) is the mean
    logarithmic deviation, GE(1) is the Theil index, and GE(2) is half the
    square of the coefficient of variation. The more positive e > 0 (the
    'inequality aversion parameter') is, the more sensitive A(e) is to income
    differences at the bottom of the distribution. The Gini coefficient is
    most sensitive to income differences about the middle (more precisely,
    the mode).

    For textbook reviews of inequality measurement from the perspective of
    economists, see Cowell (1995) or Jenkins (1991). See also Cowell (2000).
    On the characterization of Generalized Entropy indices, and their
    subgroup decomposability properties, see e.g.  Shorrocks (1984) and
    references therein. On the Atkinson indices, see Atkinson (1970).  The
    decomposition of Atkinson indices is discussed by Blackorby et al.
    (1981).  For extensive empirical illustrations of inequality index
    decomposition, see inter alia Jenkins (1995) who also applies the
    decomposition of inequality trends proposed by Mookherjee and Shorrocks
    (1982). Cowell and Jenkins (1995) compare decompositions based on
    Generalized Entropy and Atkinson indices.  The welfare indices estimated
    here are discussed by Sen (1976) and by Jenkins (1997), who also provides
    empirical illustrations.

    groupvar must take non-negative integer values only. To create such a
    variable from an existing variable, use the egen function group.  By
    default, observations with missing values on groupvar are excluded from
    calculations when the bygroup option is specified. If you wish to include
    them, create a new variable with the egen function group and use its
    missing option. The egen function group is also useful for multi-way
    decompositions. E.g. for a decomposition by sex and region, create a new
    groupvar defining sex-region combinations by specifying sex and region in
    group(varlist).
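
    As an illustrative sketch only (region is a hypothetical variable that
    may contain missing values, and regionM is a hypothetical new name),
    observations with missing region can be kept as a subgroup of their own:

    . egen regionM = group(region), missing

    . ineqdeco x, by(regionM)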

    Bootstrapped standard errors for the estimates of the indices can be
    derived using bootstrap. Standard errors derived using linearization
    methods can be calculated for GE(a) using svygei, for A(e) using svyatk,
    and for the Gini using svylorenz.


Technical details

    Consider a population of persons (or households ...), i = 1,...,n, with
    income y_i, and weight w_i. Let f_i = w_i / N, where N = SUM w_i.  (In
    what follows all sums are over all values of whatever is subscripted.)
    When the data are unweighted, w_i = 1 and N = n.

    Arithmetic mean income is m = SUM f_i y_i. Suppose there is an exhaustive
    partition of the population into mutually exclusive subgroups k = 1,...,K.

    The Generalized Entropy class of inequality indices is given by

        GE(a) = [1 / (a (a - 1))] { [SUM f_i (y_i / m)^a] - 1 },
        a != 0 and a != 1,

        GE(1) = SUM f_i (y_i / m) log(y_i / m),

        GE(0) = SUM f_i log(m / y_i).
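
    As a rough check on these formulas, the sketch below computes GE(0),
    GE(1) and GE(2) by hand for a small artificial data set. It assumes
    unweighted data, so that f_i = 1/n and each SUM f_i (.) is a sample mean.
    The names y, t0, t1, t2 and the scalar m are illustrative only. The
    final call to ineqdeco should report the same three values.

    . clear
    . set obs 5
    . generate double y = _n
    . quietly summarize y
    . scalar m = r(mean)
    . generate double t0 = ln(scalar(m) / y)
    . generate double t1 = (y / scalar(m)) * ln(y / scalar(m))
    . generate double t2 = (y / scalar(m))^2
    . quietly summarize t0
    . display "GE(0) = " r(mean)
    . quietly summarize t1
    . display "GE(1) = " r(mean)
    . quietly summarize t2
    . display "GE(2) = " (r(mean) - 1) / 2
    . ineqdeco y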

    Each GE(a) index can be additively decomposed as

        GE(a) = GE_W(a) + GE_B(a)

    where GE_W(a) is within-group inequality, GE_B(a) is between-group
    inequality, and

        GE_W(a) = SUM [v_k^(1-a)] . [s_k^a] . GE_k(a)

    where v_k = N_k / N is the number of persons in subgroup k divided by the
    total number of persons (subgroup population share), and s_k is the share
    of total income held by k's members (subgroup income share).  (Strictly
    speaking, v_k is the sum of the weights in subgroup k divided by the sum
    of the weights for the full estimation sample.)

    GE_k(a), inequality for subgroup k, is calculated as if the subgroup were
    a separate population, and GE_B(a) is derived assuming every person
    within a given subgroup k received k's mean income, m_k.
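
    The decomposition can be checked against the saved results. The sketch
    below is illustrative and assumes unweighted data; x and famtype are the
    variables used in the Examples, while mk and the scalar names are
    hypothetical. The first two display lines should show the same number,
    and GE(0) of the subgroup-mean variable mk should match the saved GE_B(0).

    . ineqdeco x, by(famtype)
    . scalar ge0_all = r(ge0)
    . scalar ge0_w = r(within_ge0)
    . scalar ge0_b = r(between_ge0)
    . display "GE(0)             = " scalar(ge0_all)
    . display "GE_W(0) + GE_B(0) = " scalar(ge0_w) + scalar(ge0_b)
    . // give each person the mean income of his or her subgroup
    . egen double mk = mean(x) if x > 0 & x < . & famtype < ., by(famtype)
    . ineqdeco mk
    . display "GE(0) of mk = " r(ge0) ", saved GE_B(0) = " scalar(ge0_b)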

    Define the equally-distributed-equivalent income

        Yede(e) = [SUM f_i (y_i)^(1-e)]^(1 / (1 - e)), e > 0 and  e != 1,

                = exp( SUM f_i . log y_i ), e = 1.

    The Atkinson indices are defined by

        A(e) = 1 - Yede(e) / m.
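
    A hand calculation along the following lines may help fix ideas. It is a
    sketch for unweighted data (f_i = 1/n) using an artificial variable y;
    the variable and scalar names are illustrative only. For e = 0.5,
    Yede is the square of the mean of sqrt(y); for e = 1, it is the
    geometric mean. The final call to ineqdeco should report the same
    values of A(0.5) and A(1).

    . clear
    . set obs 5
    . generate double y = _n
    . quietly summarize y
    . scalar m = r(mean)
    . generate double rooty = sqrt(y)
    . quietly summarize rooty
    . scalar edehalf = r(mean)^2
    . generate double lny = ln(y)
    . quietly summarize lny
    . scalar ede1 = exp(r(mean))
    . display "A(0.5) = " 1 - scalar(edehalf) / scalar(m)
    . display "A(1)   = " 1 - scalar(ede1) / scalar(m)
    . ineqdeco y, welfare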

    These indices are decomposable (but not additively decomposable):

        A(e) = A_W(e) + A_B(e) - A_W(e) . A_B(e)

    where

        A_W(e) = 1 - [SUM v_k . Yede_k(e)] / m and

        A_B(e) = 1 - Yede(e) / [SUM v_k . Yede_k(e)].

    Social welfare indices are defined by

        W(e) = [Yede(e)^(1-e)] / (1 - e), e > 0, e != 1;

        W(1) = log Yede(1).

    Each of these welfare indices is an increasing function of a `generalized
    mean of order (1 - e)', Yede(e).  All the welfare indices are additively
    decomposable:

        W(e) = SUM v_k W_k(e).

    The Gini coefficient is given by

        G = 1 + (1 / N) - [2/(m . N^2)] [SUM (N - i + 1) y_i]

    where persons are ranked in ascending order of y_i.  The Gini coefficient
    (and the percentile ratios) cannot be written as the sum of a term
    summarizing within-group inequality and a term summarizing between-group
    inequality.
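
    The Gini formula can be illustrated by hand for unweighted data. In the
    sketch below (names y, t, S and m are illustrative only), y takes the
    values 1,...,5, so that N = 5, m = 3 and SUM (N - i + 1) y_i = 35, giving
    G = 1 + 1/5 - 70/75 = 4/15, or about 0.267.

    . clear
    . set obs 5
    . generate double y = _n
    . sort y
    . generate double t = (_N - _n + 1) * y
    . quietly summarize t
    . scalar S = r(sum)
    . quietly summarize y
    . scalar m = r(mean)
    . display "G = " 1 + 1/_N - 2 * scalar(S) / (scalar(m) * _N^2)
    . ineqdeco y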

    Sen's (1976) welfare index is given by

        S = m(1 - G).


Options

    bygroup(groupvar) requests inequality decompositions by population
        subgroup, with subgroup membership summarized by groupvar.

    welfare requests calculation of equally-distributed-equivalent incomes
        and welfare indices in addition to the inequality index calculations.

    summarize requests presentation of summarize, detail output for varname.


Saved results 

    r(p5), r(p10), r(p25)       Percentiles p5, p10, p25,
    r(p50), r(p75), r(p90)      p50, p75, p90,
    r(p95)                      p95

    r(p90p10), r(p75p25)        Percentile ratios p90/p10, p75/p25,
    r(p25p50), r(p10p50)        p25/p50, p10/p50,
    r(p90p50), r(p75p50)        p90/p50, p75/p50

    r(gem1), r(ge0),            GE(a), for a = -1, 0, 1, 2 
    r(ge1), r(ge2)
     
    r(gini)                     Gini coefficient

    r(ahalf), r(a1), r(a2)      A(e), for e = 0.5, 1, 2 

    r(mean), r(sd), r(Var)      mean, standard deviation, variance
    r(min), r(max)              minimum, maximum
    r(N), r(sumw)               Number of observations, sum of weights

    If the welfare option is specified, also saved are:

    r(edehalf), r(ede1)         Yede(e) for e = 0.5, 1, 2
    r(ede2)

    r(whalf), r(w1)             W(e) for e = 0.5, 1, 2, and
    r(w2), r(wgini)             Sen's welfare measure

    If the bygroup option is specified, also saved are:

    r(within_gem1)              GE_W(a), for a = -1, 0, 1, 2 
    r(within_ge0)
    r(within_ge1)
    r(within_ge2) 
    
    r(between_gem1)             GE_B(a), for a = -1, 0, 1, 2 
    r(between_ge0)
    r(between_ge1)
    r(between_ge2) 

    r(within_ahalf)             A_W(e), for e = 0.5, 1, 2
    r(within_a1)
    r(within_a2) 
    
    r(between_ahalf)            A_B(e), for e = 0.5, 1, 2
    r(between_a1)
    r(between_a2) 

    r(gem1_k), r(ge0_k)         GE_k(a), for a = -1, 0, 1, 2, and
    r(ge1_k), r(ge2_k)          each subgroup k, where the values of k
                                correspond to the values of groupvar
                                in the estimation sample. See r(levels) below.

    r(ahalf_k), r(a1_k)         A_k(e), for e = 0.5, 1, 2, and
    r(a2_k)                     each subgroup k

    r(gini_k)                   Gini for each subgroup k

    r(mean_k), r(lambda_k)      subgroup mean (m_k), and relative mean (m_k/m)
    r(lgmean_k)                 subgroup log mean, log(m_k) 
    r(theta_k)                  subgroup income share, s_k
    r(sumw_k)                   subgroup sum of weights
    r(v_k)                      subgroup population share, v_k 

    r(levels)                   macro containing the set of values of groupvar
                                (the number of unique values = K)


    If both the bygroup and welfare options are specified, also saved are:

    r(whalf_k), r(w1_k)         W_k(e), for e = 0.5, 1, 2, and
    r(w2_k)                     each subgroup k

    r(edehalf_k), r(ede1_k)     Yede_k(e), for e = 0.5, 1, 2, and
    r(ede2_k), r(wgini_k)       Sen's welfare measure, for each subgroup k
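
    As a sketch of how the subgroup results might be retrieved, the commands
    below list everything saved in r() and then loop over the values held in
    r(levels). This assumes that the k in each result name is replaced by
    the corresponding value of groupvar (for example, r(mean_1) for the
    subgroup with groupvar equal to 1); x and famtype are the variables used
    in the Examples, and levels is a hypothetical local macro name.

    . ineqdeco x, by(famtype)
    . return list
    . local levels "`r(levels)'"
    . foreach k of local levels {
          display "famtype `k': m_k = " r(mean_`k') ", GE_k(0) = " r(ge0_`k')
      }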

    For the convenience of users of earlier versions of these programs, a
    selected set of estimates is also saved in global macros, as follows.

    S_9010, S_7525              Percentile ratios p90/p10, p75/p25

    S_im1, S_i0, S_i1, S_i2     GE(a), for a = -1, 0, 1, 2 

    S_gini                      Gini coefficient

    S_ahalf, S_a1, S_a2         A(e), for e = 0.5, 1, 2 

        
Examples

    . ineqdeco x [aw = wgtvar]

    . ineqdec0 x [aw = wgtvar]

    . ineqdeco x, by(famtype) w

    . ineqdeco x if sex==1, w s

    . // bootstrapped standard errors for Gini in Stata version 8

    . preserve

    . keep if x > 0 & x < .

    . version 8: bootstrap "ineqdeco x" gini = r(gini), reps(100)

    . restore

    . // bootstrapped standard errors for Gini in Stata version 9

    . preserve

    . keep if x > 0 & x < .

    . bootstrap gini = r(gini), reps(100): ineqdeco x

    . restore

    . // multi-way decomposition

    . egen sexXregion = group(sex region)

    . ineqdeco x, by(sexXregion)

    Further examples are provided in the downloadable materials accompanying
    the presentation by Jenkins (2006).


Author

    Stephen P. Jenkins <stephenj@essex.ac.uk>
    Institute for Social and Economic Research
    University of Essex, Colchester CO4 3SQ, U.K.

Acknowledgements

    For comments and suggestions, I am grateful to Philippe Van Kerm and Nick
    Cox.  Thanks also to Johannes Giesecke and Austin Nichols who drew
    attention to a bug in the subgroup decomposition calculations that arose
    when if/in qualifiers led to no observations in one or more of the by
    groups. Austin provided code to fix the bug.


References

    Atkinson, A.B. 1970.  On the measurement of inequality.  Journal of
        Economic Theory 2: 244-63.

    Blackorby, C., Donaldson, D., and Auersperg, M. 1981.  A new procedure
        for the measurement of inequality within and between population
        subgroups.  Canadian Journal of Economics 14: 665-85.

    Cowell, F.A. 1995.  Measuring Inequality.  Hemel Hempstead:
        Prentice-Hall/Harvester-Wheatsheaf.

    Cowell, F.A. 2000.  Measurement of inequality.  In Handbook of Income
        Distribution Volume 1, eds A.B. Atkinson and F. Bourguignon.
        Amsterdam: Elsevier Science, 59-85.

    Cowell, F.A. and Jenkins, S.P. 1995.  How much inequality can we explain?
        A methodology and an application to the USA.  Economic Journal 105:
        421-430.

    Jenkins, S.P. 1991.  The measurement of income inequality.  In L. Osberg
        (ed.) Economic Inequality and Poverty: International Perspectives.
        Armonk, NY: M.E. Sharpe.

    Jenkins, S.P. 1995.  Accounting for inequality trends: decomposition
        analyses for the UK, 1971-86.  Economica 62: 29-63.

    Jenkins, S.P. 1997.  Trends in real income in Britain: a microeconomic
        analysis.  Empirical Economics 22: 483-500.

    Jenkins, S.P. 2006. Estimation and interpretation of measures of
        inequality, poverty, and social welfare using Stata. Presentation at
        North American Stata Users' Group Meetings 2006, Boston MA.  
        http://econpapers.repec.org/paper/bocasug06/16.htm.

    Mookherjee, D. and Shorrocks, A. 1982.  A decomposition analysis of the
        trend in UK inequality.  Economic Journal 92: 886-902.

    Sen, A.K. 1976.  Real national income.  Review of Economic Studies 43:
        19-39.

    Shorrocks, A.F. 1984.  Inequality decomposition by population subgroups.
        Econometrica 52: 1369-85.


Also see

    inequal7 if installed; sumdist if installed; svylorenz if installed; 
             svygei if installed; svyatk if installed; povdeco if installed.