{smcl} {* 17aug2005}{...} {hline} help for {hi:hutchens} {hline} {title:Hutchens `square root' segregation index, with decompositions by subgroup} {p 8 15 2} {cmd:hutchens} {it:unitvar} {it:segvar} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] [{cmd:,} {cmdab:by:group:(}{it:groupvar}{cmd:)} {cmdab:m:issing} {cmdab:f:ormat:(%}{it:fmt}{cmd:)} ] {p 4 4 2} {cmd:fweight}s and {cmd:aweight}s are allowed; see help {help weights}. {title:Description} {p 4 4 2}{cmd:hutchens} computes the `square root' segregation index proposed by Hutchens (2004) from individual-level data. Hutchens shows that this index, call it {it:S}, satisfies seven desirable properties for a good numerical measure of segregation. In particular, {it:S} is additively decomposable by population subgroup: total segregation may be expressed as the sum of within-group segregation (a weighted sum of {it:S} across subgroups) plus between-group segregation. {it:S} lies on the unit interval, with zero representing the complete absence of segregation, and one representing complete segregation. If two distributions are unambiguously ordered according to a pair of (non-intersecting) segregation curves, then {it:S} will also order the distributions in the same way. {p 4 4 2}{it:unitvar} is the categorical variable summarising social units and {it:segvar} is the categorical variable defining the social groups who are segregated. For example, in a study of occupational sex segregation, {it:unitvar} would represent occupations and {it:segvar} would represent sex. In a study of the educational segregation by family background, {it:unitvar} would represent schools (say) and {it:segvar} would be a measure of family background. Note that {it:segvar} must be a binary (0/1) variable. For decompositions of {it:S} by population subgroup, {it:groupvar} is the categorical variable defining the subgroups. {p 4 4 2}{it:S} is the sum, over all social units, of each unit's shortfall from distributional evenness. For each value of {it:unitvar}, this shortfall is the difference between the geometric mean of the shares of individuals with different backgrounds characterized by {it:segvar} were there to be no segregation, and the geometric mean of the actual shares. See Jenkins et al. (2006). {title:Options} {p 4 8 2}{cmd:bygroup(}{it:groupvar}{cmd:)} specifies the decomposition by population subgroups defined by {it:groupvar}. If the {cmd:bygroup} option is not specified, calculations are based on the subset of observations with valid values on {it: unitvar} and {it: segvar}. {p 4 8 2}{cmd:missing} requests that missing values on {it:groupvar} be treated like other values. (Cases with missing values form a separate subgroup when decompositions are done.) {cmd:missing} may only specified if the {cmd:bygroup} option is also specified. If the {cmd:bygroup} option is specified and the {cmd:missing} option is not specified, then all calculations (including aggregate statistics) are based on the subset of observations with valid values on {it: unitvar}, {it: segvar}, and {it: groupvar}. {p 4 8 2}{cmd:format(%}{it:fmt}{cmd:)} specifies the format to be used to display the results. The default is {cmd:format(%10.0g)}. {title:Examples} {p 4 8 2} Occupational sex segregation: {p 8 12 2}{inp:. hutchens isco88 sex} {p 4 8 2} Sex segregation in schools, with a decomposition by school type (e.g. public/private): {p 8 12 2}{inp:. hutchens schoolid sex, by(stype)} {p 4 8 2} Sex segregation in schools, with a decomposition by school type and region: {p 8 12 2}{inp:. egen stypeXregion = group(stype region)} {p 8 12 2}{inp:. hutchens schoolid sex, by(stypeXregion)} {title:Saved Results} {p 4 17 2}{cmd:r(S)}{space 7}value of {it:S} for total estimation sample {p 4 17 2}{cmd:r(Ncat)}{space 4}number of distinct categories in {it:unitvar} {p 4 17 2}{cmd:r(Nobs)}{space 4}total number of raw (unweighted) observations {p 4 17 2}{cmd:r(pr_1)}{space 4}fraction of sample with {it:segvar} = 1. {p 4 4 2} If the {cmd:bygroup} option is specified: {p 4 17 2}{cmd:r(SW)}{space 6}within-group segregation value {p 4 17 2}{cmd:r(SWpc)}{space 4}within-group segregation value, expressed as percentage of {it:S} {p 4 17 2}{cmd:r(SB)}{space 6}between-group segregation value {p 4 17 2}{cmd:r(SBpc)}{space 4}between-group segregation value, expressed as percentage of {it:S} {title:Methods and Formulae} {p 4 4 2} Let {it:N}({it:A_j}) be the number from social group {it:A} in unit {it:j} (e.g. the number of men who are bankers) and {it:N}({it:A_j}) be the number from social group {it:B} in unit {it:j} (e.g. the number of women who are bankers). The square root segregation index {it:S} is defined as {p 8 12 2}{it:S} = 1 {c -} SUM_{it:j} sqrt[ {it:N}({it:A_j})/{it:N}({it:A})} * {it:N}({it:B_j})/{it:N}({it:B}) ] {space 4} {it:j} = 1,...,{it:J} {p 4 4 2} or, equivalently, {p 8 12 2}{it:S} = SUM_{it:j} {it:C_j} {p 4 4 2} where the `contribution' of each obs {it:C_j} = {it:N}({it:B_j})/{it:N}({it:B}) {c -} sqrt[ {it:N}({it:A_j})/{it:N}({it:A}) * {it:N}({it:B_j})/{it:N}({it:B}) ], and {it:N}({it:A}) and {it:N}({it:B}) are the total number of obs in groups {it:A} and {it:B}. The {it:C_j} term for a given social unit is the shortfall from distributional evenness for that unit (see the earlier discussion). {p 4 4 2} For decompositions by population subgroup, suppose that the sample can be exhaustively partitioned into {it:G} non-overlapping subgroups. Then, {p 8 12 2}{it:S} = SUM_{it:g} {it:C_g} {space 4} {it:g} = 1,...,{it:G} {p 4 4 2}where {it:C_g} is the `sectoral contribution' of group {it:g}, i.e. {it:C_j} summed over every obs within group {it:g}. {p 4 4 2}For the additive decomposition of {it:S} into within- and between-group segregation components, Hutchens (2004) shows that: {p 8 12 2}{it:S} = {it:SW} + {it:SB} = [ SUM_{it:g w_g*S_g }] + {it:SB} {p 4 4 2}where {it:SW} is total within-group segregation, {it:S_g} is the value of {it:S} for subgroup {it:g}, and `subgroup weight', {it:w_g}, is defined as: {p 8 12 2}{it:w_g} = sqrt[ {it:N}({it:A_g})/{it:N}({it:A}) * {it:N}({it:B_g})/{it:N}({it:B}) ] {p 4 4 2}where {it:N}({it:A_g}) is the number from group {it:A} in group {it:g} and {it:N}({it:B_g}) is the number from group {it:B} in group {it:g}. {p 4 4 2}{it:SB} is total between-group segregation, defined as {p 8 12 2}{it:SB} = 1 {c -} SUM_{it:g w_g}. {p 4 4 2}Between-group segregation may be interpreted as the amount of segregation that there would be if the observations in social groups (defined by {it:segvar}) were redistributed across social units (defined by {it:unitvar}) such that the within-unit measure were zero (Hutchens, 2004). {title:Reference} {p 4 8 2}Hutchens, R. 2004. One measure of segregation. {it:International Economic Review} 45(2): 555{c -}578. {p 4 8 2}Jenkins, S.P., Micklewright, J. and Schnepf, S.V. 2006. Social segregation in secondary schools: how does England compare with other countries? Working Paper 2006-02, Institute for Social and Economic Research, University of Essex. {browse "http://www.iser.essex.ac.uk/pubs/workpaps/pdf/2006-02.pdf"} {title:Author} {p 4 4 2} Stephen P. Jenkins, Institute for Social and Economic Research, University of Essex. Email: stephenj@essex.ac.uk {title:Acknowledgements} {p 4 4 2}Much of the code for {cmd:hutchens} is based on {cmd:duncan2} written by Ben Jann (ETH Zurich). {cmd:hutchens} was developed as part of a project on `Social Segregation in UK Schools: Benchmarking with International Comparisons', undertaken jointly with John Micklewright and Syke Schnepf (University of Southampton), and supported by grant RES-000-22-0995 from the UK Economic and Social Research Council. Jenkins also acknowledges core funding support for ISER from the ESRC and the University of Essex. {title:Also see} {p 4 13 2} {cmd:duncan}, {cmd:duncan2}, {cmd:seg} if installed.