.-
help for ^seg^                                               (sean f. reardon)
.-

Segregation Indices Calculation
-------------------------------

^seg^ varlist [^if^ exp] [^in^ range], [^d^ ^g^ ^h^ ^c^ ^r^ ^p^ ^x^ ^s^ ^n^ ^by(^varlist^)^ ^u^nit^(^varname^)^ ^ba^se^(^num^)^ ^nodis^play ^gen^erate^(^genlist^)^ ^f^ile^(^filename^)^ ^replace^]

Note: at least one of the index options (^d g h c r p x s n^) must be specified.

genlist must have the form: (^index^ ^newvar^ [^index^ ^newvar^ [^index^ ^newvar^ [...]]])

where ^index^ indicates the index to be output, and ^newvar^ is the name of the variable to be created. ^index^ must be one of the following:

    ^t^  for total by-group counts
    ^u^  for the number of units within each by-group
    ^i^  for Normalized Simpson Interaction Diversity Index
    ^e^  for Entropy Diversity Index
    ^d^  for Dissimilarity Segregation Index
    ^g^  for Gini Segregation Index
    ^h^  for Information Theory Segregation Index
    ^c^  for Squared Coefficient of Variation Segregation Index
    ^r^  for Relative Diversity Segregation Index
    ^p^  for (multi-group) Normalized Exposure Segregation Index
    ^x^  for Exposure Index
    ^s^  for Isolation Index
    ^n^  for two-group Normalized Exposure Index

Description
-----------

^seg^ calculates multiple-group diversity and segregation indices for the variables in varlist. The ^by^ and ^unit^ options allow specification of the organization level at which segregation is to be calculated. The ^generate^ and ^file^ options allow index values for each value of the by-group variable(s) to be output to either the current file or a new file.

Options
-------

^d^ specifies that the Dissimilarity Index is to be calculated. The Simpson Diversity Indices are also calculated if this option is specified.

^g^ specifies that the Gini Index is to be calculated. The Simpson Diversity Indices are also calculated if this option is specified.

^h^ specifies that the Theil Information Theory Index is to be calculated. The Theil Entropy Diversity Index is also calculated if this option is specified.

^c^ specifies that the Squared Coefficient of Variation Index is to be calculated. The Simpson Diversity Indices are also calculated if this option is specified.

^r^ specifies that the Relative Diversity Index is to be calculated. The Simpson Diversity Indices are also calculated if this option is specified.

^p^ specifies that the (multi-group) Normalized Exposure Index is to be calculated. The Simpson Diversity Indices are also calculated if this option is specified.

^x^ specifies that the (two-group) Exposure Index is to be calculated. The calculated exposure is the exposure of the group given by the first variable in varlist (var1) to the group given by the second variable (var2). Other groups listed in varlist are included in the calculation of the exposure index.

^s^ specifies that the Isolation Index is to be calculated for the group given by the first variable in varlist (var1). Other groups listed in varlist are included in the calculation of the isolation index.

^n^ specifies that the (two-group) Normalized Exposure Index is to be calculated. The calculated exposure is the exposure of the group given by var1 to the group given by var2. Other groups listed in varlist are included in the calculation of the normalized exposure index.

^by(^varlist^)^ specifies that the indices are to be calculated separately within each group defined by varlist. If the ^by^ option is not specified, then segregation is calculated over the entire set of observations.

^unit(^varname^)^ specifies that segregation is to be calculated between observations with distinct values of ^varname^. Effectively, observations are grouped on the unit variable, and segregation is calculated between these units. This is used, for example, if each observation is a census block group and one wants to calculate segregation between tracts. If the ^unit^ variable option is not specified, then each observation is treated as a separate unit.

^base(^num^)^ specifies that the entropy index of diversity should be calculated using logarithms of base num, where num is an integer greater than 1. The default is to use base M, where M is the number of groups specified in varlist.

^nodisplay^ specifies that output should be suppressed. If two or more variables are listed in the ^by^ option, ^nodisplay^ is the default.

^generate(^genlist^)^ specifies that the values of the indices indicated in genlist are to be written to the current file, with variable names as indicated in genlist. If the ^file^ option is also specified, ^generate^ causes the variables listed in genlist to be written to the new file rather than to the current file.

^file(^filename^)^ specifies that the values of the indices requested are to be written to a separate file. [Note: ^seg^ reserves several variable names as defaults if none are specified in ^generate^: ^Total^, ^nunits^, ^Dseg^, ^Gseg^, ^Hseg^, ^Cseg^, ^Rseg^, ^Pseg^, ^Xseg^, ^Sseg^, ^Nseg^, ^Idiv^, and ^Ediv^. This can cause a conflict if a variable specified in the ^by^ option uses one of these reserved names. Conflicts can be avoided in this case by using the ^generate^ option to specify new names for the variables written to the new file.]

^replace^, when specified with the ^file^ option, forces ^seg^ to overwrite the file specified in the ^file^ option if it already exists. If ^replace^ is not specified and the file already exists, it will not be overwritten.
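The ^base()^ option only changes the logarithm base of the entropy index. As an illustration, and not ^seg^'s own Stata code, here is a minimal Python sketch of the Entropy Diversity Index (defined in the Remarks) with a selectable base. The function name entropy_diversity is hypothetical. With the default base M, a unit split equally among all M groups has an entropy of 1.

```python
import math

# Hypothetical helper for illustration; not part of seg.
def entropy_diversity(counts, base=None):
    """Entropy Diversity Index E = SUM[q * LOG(1/q)] for one unit.

    base: logarithm base (an integer > 1); defaults to M, the number of groups.
    Groups with a zero count contribute 0 (the usual q*log(1/q) -> 0 convention).
    """
    M = len(counts)
    base = M if base is None else base
    t = sum(counts)
    return sum((c / t) * math.log(t / c, base) for c in counts if c > 0)

print(entropy_diversity([5, 5, 5]))          # approx. 1.0 (default base M = 3)
print(entropy_diversity([5, 5, 5], base=2))  # approx. log2(3) = 1.585
```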

Remarks
-------

The varlist variables should be non-negative counts of mutually exclusive categories (e.g. counts by race, sex, etc.). The by option is used to specify the level of organization within which segregation is to be calculated, and the unit option is used to specify the level of organization between which segregation is to be calculated.

Observations with missing values on any of the variables in ^varlist^ are dropped, as are observations with missing values on ^by^ or ^unit^, if these are specified.
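To make the roles of ^by^ and ^unit^ concrete, the Python sketch below (not ^seg^'s Stata implementation) groups observations into by-groups, sums the counts of observations that share a unit value, and drops observations with a missing value, represented here by None, on any relevant variable. The function aggregate_units and the one-dictionary-per-observation data layout are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical helper for illustration; not part of seg.
def aggregate_units(rows, varlist, by=(), unit=None):
    """Return {by-values tuple: [count vector of each unit]}.

    rows:    list of dicts, one per observation (e.g. a census block group)
    varlist: names of the group-count variables
    by:      names of the by-variables (empty: one by-group for all data)
    unit:    name of the unit variable (None: each observation is a unit)
    """
    needed = list(varlist) + list(by) + ([unit] if unit is not None else [])
    cells = defaultdict(lambda: [0] * len(varlist))
    for i, row in enumerate(rows):
        # Drop observations missing (None) on any count, by, or unit variable
        if any(row.get(name) is None for name in needed):
            continue
        by_val = tuple(row[name] for name in by)
        unit_val = row[unit] if unit is not None else i
        cell = cells[(by_val, unit_val)]
        for k, name in enumerate(varlist):
            cell[k] += row[name]  # sum counts of observations in the same unit
    groups = defaultdict(list)
    for (by_val, _unit_val), counts in cells.items():
        groups[by_val].append(counts)
    return dict(groups)

rows = [
    {"tract": 1, "white": 10, "black": 0},
    {"tract": 1, "white": 5, "black": 5},
    {"tract": 2, "white": 0, "black": 8},
    {"tract": 2, "white": None, "black": 3},  # missing count: dropped
]
print(aggregate_units(rows, ["white", "black"], unit="tract"))  # {(): [[15, 5], [0, 8]]}
```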

^seg^ calculates the indices as follows. For each unit, calculate the total count within the unit as

t = SUM(varlist)

and the proportion of the unit within category ^n^ as

q^n^ = var^n^/t.

The Simpson Interaction Diversity Index of each unit is then

I^u^ = SUM[q^n^ * (1 - q^n^)]

The Normalized Interaction Index of each unit is then

NI^u^ = [M/(M-1)] * I^u^

where M is the number of groups specified in varlist.

The Entropy Diversity Index of each unit is then

E^u^ = SUM[q^n^ * LOG(1/q^n^)].
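A minimal Python sketch of the three unit-level diversity formulas, using logarithms of base M (the default for ^base()^) and taking the normalizing factor as M/(M-1) with M the number of groups. The function name unit_diversity is hypothetical, and the sketch assumes at least two groups and a positive unit total; a group with a zero count contributes 0 to E.

```python
import math

# Hypothetical helper for illustration; not part of seg.
def unit_diversity(counts):
    """Return (I, NI, E) for one unit from its group counts (the varlist values)."""
    M = len(counts)                          # number of groups in varlist
    t = sum(counts)                          # t = SUM(varlist)
    q = [c / t for c in counts]              # proportion of the unit in each group
    I = sum(p * (1 - p) for p in q)          # Simpson Interaction Diversity Index
    NI = M / (M - 1) * I                     # normalized so its maximum is 1
    # Entropy with base-M logarithms (seg's default base); empty groups add 0
    E = sum(p * math.log(1 / p, M) for p in q if p > 0)
    return I, NI, E

print(unit_diversity([5, 5]))      # (0.5, 1.0, 1.0)
print(unit_diversity([10, 0, 0]))  # (0.0, 0.0, 0.0)
```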

The corresponding diversity indices of each by-group (I^g^, NI^g^, and E^g^) are calculated similarly:

I^g^ = SUM[Q^n^ * (1 - Q^n^)]

NI^g^ = [M/(M-1)] * I^g^

E^g^ = SUM[Q^n^ * LOG(1/Q^n^)]

where the total count T and the proportion Q^n^ in category ^n^ are calculated over each by-group rather than each unit, and M is the number of groups specified in varlist.
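The by-group indices apply the same formulas to the by-group proportions, which a hypothetical Python helper, group_diversity (not part of ^seg^), can obtain by pooling the unit counts. The example shows why the two levels differ: two units that each contain only one group have no within-unit diversity, yet the by-group they form is maximally diverse.

```python
import math

# Hypothetical helper for illustration; not part of seg.
def group_diversity(units):
    """Return (I, NI, E) for one by-group from its units' count vectors."""
    M = len(units[0])                                        # number of groups
    pooled = [sum(u[n] for u in units) for n in range(M)]    # by-group counts
    T = sum(pooled)                                          # by-group total T
    Q = [c / T for c in pooled]                              # by-group proportions
    I = sum(p * (1 - p) for p in Q)
    NI = M / (M - 1) * I
    E = sum(p * math.log(1 / p, M) for p in Q if p > 0)      # base-M logarithms
    return I, NI, E

# Each unit holds a single group, but together they are maximally diverse
print(group_diversity([[10, 0], [0, 10]]))  # (0.5, 1.0, 1.0)
```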

The multiple-group segregation indices of each by-group `g' are then defined as follows:

D^g^ = SUMn[SUMu[t * |Q^n^ - q^n^|]] / (2 * T * I^g^)

G^g^ = SUMn[SUMui[SUMuj[t^i^ * t^j^ * |q^ni^ - q^nj^|]]] / (2 * T * T * I^g^)

H^g^ = 1 - [SUMu((t/T) * E^u^) / E^g^].

C^g^ = SUMn[SUMu[t * (Q^n^ - q^n^) * (Q^n^ - q^n^)] / [T * Q^n^ * (M - 1)]].

R^g^ = SUMn[SUMu[t * (Q^n^ - q^n^) * (Q^n^ - q^n^)] / (T * I^g^)].

P^g^ = SUMn[SUMu[t * (Q^n^ - q^n^) * (Q^n^ - q^n^)] / [T * (1 - Q^n^)]].

X^g^ = SUMu[(t * q^1^ * q^2^) / (T * Q^1^)].

S^g^ = SUMu[(t * q^1^ * q^1^) / (T * Q^1^)].

N^g^ = 1 - X^g^/Q^2^.

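where M is the number of groups specified in varlist; where q^1^ and q^2^ (Q^1^ and Q^2^) are the proportions of the unit (by-group) in the first and second groups in varlist (var1 and var2); where SUMn indicates a sum over all groups in varlist; and where SUMu (and SUMui or SUMuj) indicates a sum over all units.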
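The sketch below transcribes the D, G, and H formulas into Python for a single by-group, given the list of unit count vectors. It is an illustration under stated assumptions, not ^seg^'s implementation: the function names are hypothetical, units with a zero total are skipped (they carry zero weight t), and the by-group is assumed to contain at least two groups with positive counts so that I^g^ and E^g^ are nonzero.

```python
import math

# Hypothetical helpers for illustration; not part of seg.
def _shares(units):
    """Number of groups M, by-group total T, proportions Q, and Simpson I."""
    M = len(units[0])
    T = sum(sum(u) for u in units)
    Q = [sum(u[n] for u in units) / T for n in range(M)]
    I = sum(q * (1 - q) for q in Q)
    return M, T, Q, I

def dissimilarity(units):
    M, T, Q, I = _shares(units)
    num = 0.0
    for u in units:
        t = sum(u)
        if t > 0:  # empty units contribute nothing
            num += sum(t * abs(Q[n] - u[n] / t) for n in range(M))
    return num / (2 * T * I)

def gini(units):
    M, T, Q, I = _shares(units)
    occupied = [u for u in units if sum(u) > 0]
    num = 0.0
    for ui in occupied:          # double sum over all ordered pairs of units
        ti = sum(ui)
        for uj in occupied:
            tj = sum(uj)
            num += sum(ti * tj * abs(ui[n] / ti - uj[n] / tj) for n in range(M))
    return num / (2 * T * T * I)

def _entropy(props, base):
    return sum(p * math.log(1 / p, base) for p in props if p > 0)

def theil_h(units, base=None):
    M, T, Q, _ = _shares(units)
    base = M if base is None else base
    within = sum((sum(u) / T) * _entropy([c / sum(u) for c in u], base)
                 for u in units if sum(u) > 0)
    return 1 - within / _entropy(Q, base)

units = [[8, 2], [2, 8]]  # two equal-sized units, 80/20 and 20/80
print(round(dissimilarity(units), 3), round(gini(units), 3))  # 0.6 0.6
```

Because H is a ratio of two entropies computed in the same base, the ^base()^ option does not change H; the sketch reflects this, since theil_h gives the same value for any base.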

Seven of the segregation indices (D G H C R P N) have a minimum of 0 (no segregation) and a maximum of 1 (complete segregation). X has a minimum of 0 (no exposure) and an upper bound of 1 (complete exposure). S has a lower bound of 0 (no isolation) and a maximum of 1 (complete isolation).
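The bounds for C, R, and P can be checked numerically. The Python sketch below (a hypothetical helper, not ^seg^'s code) computes all three for one by-group from the formulas above. Each equals 1 when every unit contains a single group and 0 when every unit has the by-group's composition; with two groups the three indices coincide. The sketch assumes every group is present in the by-group but not alone (0 < Q^n^ < 1).

```python
# Hypothetical helper for illustration; not part of seg.
def squared_deviation_indices(units):
    """Return (C, R, P) for one by-group; units with a zero total are skipped."""
    M = len(units[0])
    T = sum(sum(u) for u in units)
    Q = [sum(u[n] for u in units) / T for n in range(M)]
    I = sum(q * (1 - q) for q in Q)
    # V[n] = SUMu[t * (Q[n] - q[n])**2] / T: weighted variance of q[n] across units
    V = [0.0] * M
    for u in units:
        t = sum(u)
        if t > 0:
            for n in range(M):
                V[n] += t * (Q[n] - u[n] / t) ** 2 / T
    C = sum(V[n] / (Q[n] * (M - 1)) for n in range(M))
    R = sum(V) / I
    P = sum(V[n] / (1 - Q[n]) for n in range(M))
    return C, R, P

segregated = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # each unit holds one group
even = [[2, 3, 5], [4, 6, 10]]                     # every unit mirrors the by-group
print(squared_deviation_indices(segregated))       # each approx. 1
print(squared_deviation_indices(even))             # each 0
```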

Examples
--------

Suppose the data contain racial enrollment counts by school, with variables ^sch^, ^dst^, and ^msa^ identifying the school, district, and metropolitan area of each school, and with ^white^, ^black^, ^hisp^, ^asian^, and ^natam^ variables containing within-school enrollment counts for 5 racial/ethnic groups. Then

. ^seg white black, d^

calculates the between-school dissimilarity index between White and Black students among all schools in the data set.

. ^seg white black hisp asian, g by(msa) u(dst) gen(g gwbha i iwbha)^

calculates for each metropolitan area the between-district Gini index among White, Black, Hispanic, and Asian students, and outputs the Gini index and the normalized Simpson interaction diversity index of each metropolitan area to the variables ^gwbha^ and ^iwbha^.

. ^seg white black, d g c h by(msa dst) file(c:\outfile.dta) replace^

calculates and writes to file "c:\outfile.dta" the dissimilarity, Gini, squared coefficient of variation (which, with two groups, equals the variance ratio index), and entropy indices (and the relevant diversity indices) between White and Black students within each district in each metropolitan area. Because two variables are listed in the ^by^ option, the results will not be displayed to the screen.

. ^seg white black hisp asian natam, x s n^

calculates the white-black exposure index, the white isolation index, and the normalized white-black exposure index among all schools in the data. Note that this will give different results than

. ^seg white black, x s n^

which will calculate the exposure and isolation indices ignoring all students other than black and white students.
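The difference between the last two examples can be reproduced with a Python sketch of the X, S, and N formulas. The function exposure_indices is hypothetical, not ^seg^'s code. Each unit's count vector lists var1 first and var2 second, and any further entries are the other groups in varlist, which enter only through the unit and by-group totals t and T. The toy counts for two schools are made up for illustration.

```python
# Hypothetical helper for illustration; not part of seg.
def exposure_indices(units):
    """Return (X, S, N) for one by-group: exposure of var1 to var2,
    isolation of var1, and two-group normalized exposure."""
    T = sum(sum(u) for u in units)
    Q1 = sum(u[0] for u in units) / T
    Q2 = sum(u[1] for u in units) / T
    X = S = 0.0
    for u in units:
        t = sum(u)
        if t == 0:
            continue  # empty units carry zero weight
        q1, q2 = u[0] / t, u[1] / t
        X += t * q1 * q2 / (T * Q1)   # exposure of var1 to var2
        S += t * q1 * q1 / (T * Q1)   # isolation of var1
    N = 1 - X / Q2                    # two-group normalized exposure
    return X, S, N

# Two schools with white, black, and hisp counts
print(exposure_indices([[6, 2, 2], [2, 2, 6]]))
# The same schools with hisp left out of varlist
print(exposure_indices([[6, 2], [2, 2]]))
```

Up to floating-point rounding, the first call gives X = 0.2, S = 0.5, and N = 0, while the second gives X = 0.3125, S = 0.6875, and N = 0.0625.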

Author
------

sean f. reardon sean@@pop.psu.edu

References
----------

James, David R. and Karl E. Taeuber. 1985. "Measures of segregation." Sociological Methodology 14:1-32.
Massey, Douglas S. and Nancy A. Denton. 1988. "The dimensions of racial segregation." Social Forces 67:281-315.
Reardon, Sean F. and Glenn Firebaugh. 2002. "Measures of multigroup segregation." Sociological Methodology 32:33-67.
White, Michael J. 1986. "Segregation and diversity measures in population distribution." Population Index 52:198-221.
Zoloth, Barbara S. 1976. "Alternative measures of school segregation." Land Economics 52:278-298.