-------------------------------------------------------------------------------
help for hutchens
-------------------------------------------------------------------------------

Hutchens `square root' segregation index, with decompositions by subgroup

hutchens unitvar segvar [weight] [if exp] [in range] [, bygroup(groupvar) missing format(%fmt) ]

fweights and aweights are allowed; see help weights.

Description

hutchens computes the `square root' segregation index proposed by Hutchens (2004) from individual-level data. Hutchens shows that this index, call it S, satisfies seven desirable properties for a good numerical measure of segregation. In particular, S is additively decomposable by population subgroup: total segregation may be expressed as the sum of within-group segregation (a weighted sum of S across subgroups) plus between-group segregation. S lies on the unit interval, with zero representing the complete absence of segregation, and one representing complete segregation. If two distributions are unambiguously ordered according to a pair of (non-intersecting) segregation curves, then S will also order the distributions in the same way.
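
The two endpoints of the unit interval can be checked by hand on toy data. The following is a minimal sketch in Python, run outside Stata; the function name hutchens_s and the counts are hypothetical and are not part of the hutchens package. It applies the formula given under Methods and Formulae to (group A count, group B count) pairs, one pair per social unit.

```python
from math import sqrt

def hutchens_s(counts):
    """Square root index from a list of (N(A_j), N(B_j)) pairs, one per unit."""
    n_a = sum(a for a, _ in counts)
    n_b = sum(b for _, b in counts)
    return 1 - sum(sqrt((a / n_a) * (b / n_b)) for a, b in counts)

# Every unit holds the same share of group A as of group B: no segregation.
even = [(10, 20), (30, 60)]

# Each unit contains members of only one group: complete segregation.
separate = [(40, 0), (0, 80)]

s_even = hutchens_s(even)
s_separate = hutchens_s(separate)
print(f"no segregation:       S = {s_even:.4f}")
print(f"complete segregation: S = {s_separate:.4f}")
```

The first case should print a value of zero and the second a value of one, up to floating-point rounding.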

unitvar is the categorical variable summarising social units and segvar is the categorical variable defining the social groups who are segregated. For example, in a study of occupational sex segregation, unitvar would represent occupations and segvar would represent sex. In a study of educational segregation by family background, unitvar would represent schools (say) and segvar would be a measure of family background. Note that segvar must be a binary (0/1) variable. For decompositions of S by population subgroup, groupvar is the categorical variable defining the subgroups.

S is the sum, over all social units, of each unit's shortfall from distributional evenness. For each value of unitvar, this shortfall is the difference between the geometric mean of the shares of individuals with different backgrounds characterized by segvar were there to be no segregation, and the geometric mean of the actual shares. See Jenkins et al. (2006).

Options

bygroup(groupvar) specifies the decomposition by population subgroups defined by groupvar. If the bygroup option is not specified, calculations are based on the subset of observations with valid values on unitvar and segvar.

missing requests that missing values on groupvar be treated like other values. (Cases with missing values form a separate subgroup when decompositions are done.) missing may only be specified if the bygroup option is also specified. If the bygroup option is specified and the missing option is not specified, then all calculations (including aggregate statistics) are based on the subset of observations with valid values on unitvar, segvar, and groupvar.

format(%fmt) specifies the format to be used to display the results. The default is format(%10.0g).

Examples

Occupational sex segregation:

. hutchens isco88 sex

Sex segregation in schools, with a decomposition by school type (e.g. public/private):

. hutchens schoolid sex, by(stype)

Sex segregation in schools, with a decomposition by school type and region:

. egen stypeXregion = group(stype region)

. hutchens schoolid sex, by(stypeXregion)

Saved Results

r(S) value of S for total estimation sample

r(Ncat) number of distinct categories in unitvar

r(Nobs) total number of raw (unweighted) observations

r(pr_1) fraction of sample with segvar = 1.

If the bygroup option is specified:

r(SW) within-group segregation value

r(SWpc) within-group segregation value, expressed as percentage of S

r(SB) between-group segregation value

r(SBpc) between-group segregation value, expressed as percentage of S

Methods and Formulae

Let N(A_j) be the number from social group A in unit j (e.g. the number of men who are bankers) and N(B_j) be the number from social group B in unit j (e.g. the number of women who are bankers). The square root segregation index S is defined as

S = 1 - SUM_j sqrt[ N(A_j)/N(A) * N(B_j)/N(B) ],    j = 1,...,J

or, equivalently,

S = SUM_j C_j

where the `contribution' of each social unit j is C_j = N(B_j)/N(B) - sqrt[ N(A_j)/N(A) * N(B_j)/N(B) ], and N(A) and N(B) are the total numbers of observations in social groups A and B. The two expressions agree because the shares N(B_j)/N(B) sum to one over j. The C_j term for a given social unit is the shortfall from distributional evenness for that unit (see the earlier discussion).
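
As a numerical cross-check of the two expressions for S, here is a minimal sketch in Python, run outside Stata. The unit labels and counts are hypothetical; each pair is (number from group A, number from group B) in a social unit.

```python
from math import sqrt

# Hypothetical counts per social unit: (N(A_j), N(B_j)).
counts = {
    "unit1": (30, 10),
    "unit2": (20, 20),
    "unit3": (40, 10),
    "unit4": (10, 60),
}

N_A = sum(a for a, _ in counts.values())  # total in group A
N_B = sum(b for _, b in counts.values())  # total in group B

# First form: S = 1 - SUM_j sqrt[ N(A_j)/N(A) * N(B_j)/N(B) ]
S = 1 - sum(sqrt((a / N_A) * (b / N_B)) for a, b in counts.values())

# Second form: S = SUM_j C_j, with C_j as defined in the text.
C = {j: b / N_B - sqrt((a / N_A) * (b / N_B)) for j, (a, b) in counts.items()}

print(f"S (first form)  = {S:.4f}")
print(f"S (sum of C_j)  = {sum(C.values()):.4f}")
for j, c in C.items():
    print(f"C[{j}] = {c:+.4f}")
```

Both forms should print the same value of S for these counts.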

For decompositions by population subgroup, suppose that the sample can be exhaustively partitioned into G non-overlapping subgroups. Then,

S = SUM_g C_g,    g = 1,...,G

where C_g is the `sectoral contribution' of subgroup g, i.e. C_j summed over the social units within subgroup g.
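
The sectoral contributions can be illustrated with a small sketch in Python, run outside Stata. The data are hypothetical, and the sketch assumes that each social unit belongs to exactly one subgroup; it accumulates C_j by subgroup and confirms that the C_g sum to S.

```python
from math import sqrt

# Hypothetical data: (subgroup, N(A_j), N(B_j)) for each social unit j.
units = [
    ("g1", 30, 10),
    ("g1", 20, 20),
    ("g2", 40, 10),
    ("g2", 10, 60),
]

N_A = sum(a for _, a, _ in units)
N_B = sum(b for _, _, b in units)

# Accumulate each unit's contribution C_j into its subgroup's total C_g.
C_g = {}
for g, a, b in units:
    c_j = b / N_B - sqrt((a / N_A) * (b / N_B))
    C_g[g] = C_g.get(g, 0.0) + c_j

S = sum(C_g.values())
for g, c in C_g.items():
    print(f"C_g[{g}] = {c:+.4f}")
print(f"S = {S:.4f}")
```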

For the additive decomposition of S into within- and between-group segregation components, Hutchens (2004) shows that:

S = SW + SB = [ SUM_g w_g*S_g ] + SB

where SW is total within-group segregation, S_g is the value of S for subgroup g, and the `subgroup weight', w_g, is defined as:

w_g = sqrt[ N(A_g)/N(A) * N(B_g)/N(B) ]

where N(A_g) is the number from social group A in subgroup g and N(B_g) is the number from social group B in subgroup g.

SB is total between-group segregation, defined as

SB = 1 - SUM_g w_g.

Between-group segregation may be interpreted as the amount of segregation that there would be if the observations in social groups (defined by segvar) were redistributed across social units (defined by unitvar) within each subgroup such that within-group segregation were zero, i.e. S_g = 0 for every subgroup g (Hutchens, 2004).
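
The within/between decomposition can be verified on toy data. The following is a minimal sketch in Python, run outside Stata; the function name hutchens_s, the subgroup labels, and the counts are hypothetical. The last step recomputes the index after collapsing each subgroup into a single unit, which is one way to read the interpretation of SB given above.

```python
from math import sqrt

def hutchens_s(counts):
    """Square root index from a list of (group A count, group B count) pairs."""
    n_a = sum(a for a, _ in counts)
    n_b = sum(b for _, b in counts)
    return 1 - sum(sqrt((a / n_a) * (b / n_b)) for a, b in counts)

# Hypothetical data: subgroup -> list of (N(A_j), N(B_j)) per social unit.
subgroups = {
    "g1": [(30, 10), (20, 20)],
    "g2": [(40, 10), (10, 60)],
}

all_units = [pair for units in subgroups.values() for pair in units]
N_A = sum(a for a, _ in all_units)
N_B = sum(b for _, b in all_units)
S = hutchens_s(all_units)

# Subgroup totals (N(A_g), N(B_g)) and subgroup weights w_g.
totals = {g: (sum(a for a, _ in u), sum(b for _, b in u))
          for g, u in subgroups.items()}
w = {g: sqrt((a_g / N_A) * (b_g / N_B)) for g, (a_g, b_g) in totals.items()}

# Within-subgroup indices S_g, then SW and SB as defined in the text.
S_g = {g: hutchens_s(u) for g, u in subgroups.items()}
SW = sum(w[g] * S_g[g] for g in subgroups)
SB = 1 - sum(w.values())

# Index with each subgroup collapsed to one unit (no within-group segregation).
S_collapsed = hutchens_s(list(totals.values()))

print(f"S = {S:.4f}, SW = {SW:.4f}, SB = {SB:.4f}, SW + SB = {SW + SB:.4f}")
print(f"SW as % of S = {100 * SW / S:.1f}, SB as % of S = {100 * SB / S:.1f}")
print(f"S on collapsed subgroups = {S_collapsed:.4f}")
```

The percentages printed here are computed in the same spirit as r(SWpc) and r(SBpc), but the sketch does not reproduce the command's handling of weights or missing values.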

References

Hutchens, R. 2004. One measure of segregation. International Economic Review 45(2): 555-578.

Jenkins, S.P., Micklewright, J. and Schnepf, S.V. 2006. Social segregation in secondary schools: how does England compare with other countries? Working Paper 2006-02, Institute for Social and Economic Research, University of Essex. http://www.iser.essex.ac.uk/pubs/workpaps/pdf/2006-02.pdf

Author

Stephen P. Jenkins, Institute for Social and Economic Research, University of Essex. Email: stephenj@essex.ac.uk

Acknowledgements

Much of the code for hutchens is based on duncan2 written by Ben Jann (ETH Zurich). hutchens was developed as part of a project on `Social Segregation in UK Schools: Benchmarking with International Comparisons', undertaken jointly with John Micklewright and Sylke Schnepf (University of Southampton), and supported by grant RES-000-22-0995 from the UK Economic and Social Research Council. Jenkins also acknowledges core funding support for ISER from the ESRC and the University of Essex.

Also see