.-
help for ^seg^                                               (sean f. reardon)
.-

Segregation Indices Calculation
-------------------------------

    ^seg^ varlist [^if^ exp] [^in^ range], [^d^ ^g^ ^h^ ^c^ ^r^ ^p^ ^x^ ^s^ ^n^ 
          ^by(^varlist^)^ ^u^nit^(^var^)^ ^ba^se^(^num^)^ ^nodis^play 
          ^gen^erate^(^genlist^)^ ^f^ile^(^filename^)^ ^replace^

Note: at least one of the index options (^d g h c r p x s n^) must be specified.

genlist must have the form: 
    (^index^ ^newvar^ [^index^ ^newvar^ [^index^ ^newvar^ [...]]])

    where ^index^ indicates the index to be output, and ^newvar^
    is the name of the variable to be created.  ^index^ must be 
    one of the following:

        ^t^  for total by-group counts
        ^u^  for the number of units within each by-group
        ^i^  for Normalized Simpson Interaction Diversity Index
        ^e^  for Entropy Diversity Index
        ^d^  for Dissimilarity Segregation Index
        ^g^  for Gini Segregation Index
        ^h^  for Information Theory Segregation Index
        ^c^  for Squared Coefficient of Variation Segregation Index
        ^r^  for Relative Diversity Segregation Index
        ^p^  for (multi-group) Normalized Exposure Segregation Index
        ^x^  for Exposure Index
        ^s^  for Isolation Index
        ^n^  for 2-group Normalized Exposure Index


Description
-----------

^seg^ calculates multiple-group diversity and segregation indices 
from the group counts in varlist.  The ^by^ and ^unit^ options specify 
the organizational level at which segregation is calculated.
The ^generate^ and ^file^ options allow the index values for each value of 
the by-group variable(s) to be saved either to the current file or to a new file.


Options
-------

^d^ specifies that the Dissimilarity Index is to be calculated.  The Simpson 
    Diversity Indices are also calculated if this option is specified.

^g^ specifies that the Gini Index is to be calculated.  The Simpson 
    Diversity Indices are also calculated if this option is specified.

^h^ specifies that the Theil Information Theory Index is to be calculated.
    The Theil Entropy Diversity Index is also calculated if this option is 
    specified.

^c^ specifies that the Squared Coefficient of Variation Index is to be calculated.
    The Simpson Diversity Indices are also calculated if this option is 
    specified.

^r^ specifies that the Relative Diversity Index is to be calculated.  The Simpson 
    Diversity Indices are also calculated if this option is specified.

^p^ specifies that the (multi-group) Normalized Exposure Index is to be 
    calculated. The Simpson Diversity Indices are also calculated if this option
    is specified.

^x^ specifies that the (two-group) Exposure Index is to be calculated.  The 
    calculated exposure is the exposure of the group specified in var1 to the
    group specified in var2.  Any other groups listed in varlist are counted 
    in each unit's total when the exposure index is calculated.

^s^ specifies that the Isolation Index is to be calculated for the group specified
    in var1.  Any other groups listed in varlist are counted in each unit's 
    total when the isolation index is calculated.

^n^ specifies that the (two-group) Normalized Exposure Index is to be calculated.
    The calculated exposure is the exposure of the group specified in var1 to 
    the group specified in var2.  Any other groups listed in varlist are 
    counted in each unit's total when the normalized exposure index is 
    calculated.

^by(^varlist^)^ specifies that the indices are to be calculated separately for
    each group of observations defined by the values of varlist.  If ^by()^ is
    not specified, segregation is calculated over the entire set of observations.

^unit(^varname^)^ specifies that segregation is to be calculated between
    observations with distinct values of ^varname^.  Effectively, observations
    are grouped on the unit variable, and segregation is calculated between 
    these units.  This is used, for example, if each observation is a census
    block group and one wants to calculate segregation between tracts.  If the
    ^unit^ variable option is not specified, then each observation is treated
    as a separate unit.

^base(^num^)^ specifies that the entropy index of diversity should be calculated
    using logarithms of base num, where num is an integer greater than 1.  The 
    default is to use base M, where M is the number of groups specified in 
    varlist.

^nodisplay^ specifies that output is to be suppressed.  If two or more variables
    are listed in the ^by^ option, ^nodisplay^ is the default.

^generate(^genlist^)^ specifies that the values of the indices indicated in 
    ^genlist^ are to be written to the current file, using the variable names 
    given in ^genlist^.  If the ^file^ option is also specified, ^generate^ 
    causes the variables listed in ^genlist^ to be written to the new file 
    rather than to the current file.

^file(^filename^)^ specifies that the values of the indices requested are to be
    written to a separate file.  [Note: ^seg^ reserves several variable names as
    defaults if none are specified in ^generate^: ^Total^, ^nunits^, ^Dseg^, ^Gseg^, 
    ^Hseg^, ^Cseg^, ^Rseg^, ^Pseg^, ^Xseg^, ^Sseg^, ^Nseg^, ^Idiv^, and ^Ediv^.  
    This can cause a conflict if a variable specified in the ^by^ option uses 
    one of these reserved names.  Conflicts can be avoided in this case by 
    using the ^generate^ option to specify new names for the variables written 
    to the new file.]

^replace^, when specified with the ^file^ option, forces ^seg^ to overwrite the 
    file specified in the ^file^ option, if it already exists.  If ^replace^ is
    not specified and the file already exists, it will not be overwritten.
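Because every entropy in the formulas in the Remarks section uses the same
logarithm base, changing ^base()^ should rescale E^u^ and E^g^ by a common
factor while leaving H^g^, a ratio of entropies, unchanged. The Python sketch
below checks this for made-up unit proportions; it illustrates the formulas
only, is not ^seg^'s own code, and its function names are hypothetical.

```python
import math


def entropy(props, base):
    """SUM[q * LOG(1/q)] in the given base; zero proportions contribute 0."""
    return sum(p * math.log(1 / p, base) for p in props if p > 0)


def theil_h(unit_props, unit_totals, base):
    """H = 1 - SUM((t/T) * E_u) / E_g for one by-group, using `base` throughout."""
    T = sum(unit_totals)
    m = len(unit_props[0])
    # Pooled by-group proportions Q^n, weighting each unit by its total t.
    Q = [sum(t * q[n] for q, t in zip(unit_props, unit_totals)) / T for n in range(m)]
    within = sum((t / T) * entropy(q, base) for q, t in zip(unit_props, unit_totals))
    return 1 - within / entropy(Q, base)


# Hypothetical three-group units with totals 100 and 50.
props = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
totals = [100, 50]
for b in (3, 2, 10):  # base 3 = M, the default described for base()
    print(b, round(entropy(props[0], b), 4), round(theil_h(props, totals, b), 6))
```

The first printed column (a unit's entropy) changes with the base; the second
(H) does not.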


Remarks
-------

The varlist variables should be non-negative counts of mutually exclusive 
categories (e.g., counts by race, sex, etc.).  The ^by^ option specifies the
level of organization within which segregation is calculated, and the ^unit^
option specifies the organizational units between which segregation is 
calculated.

Observations with missing values on any of the variables in ^varlist^ are 
dropped, as are observations with missing values on ^by^ or ^unit^, if
these are specified.
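A rough Python sketch of how ^by^, ^unit^, and missing values shape the data
the indices are computed on, assuming each observation is a dict with None
marking a missing value. The function and field names (units_by_group, county,
tract) are hypothetical, and the grouping logic illustrates the description
above rather than ^seg^'s implementation.

```python
from collections import defaultdict


def units_by_group(rows, count_keys, by_key=None, unit_key=None):
    """Group observations into {by-value: {unit-value: [summed counts]}}.

    rows: one dict per observation; None marks a missing value.
    count_keys: names of the count variables (the varlist).
    by_key: by-group variable; None puts all observations in one group.
    unit_key: unit variable; None makes each observation its own unit.
    """
    groups = defaultdict(lambda: defaultdict(lambda: [0] * len(count_keys)))
    for i, row in enumerate(rows):
        counts = [row.get(k) for k in count_keys]
        by_val = row.get(by_key) if by_key else "all"
        unit_val = row.get(unit_key) if unit_key else i
        # Drop observations missing any count, by, or unit value.
        if any(c is None for c in counts) or by_val is None or unit_val is None:
            continue
        acc = groups[by_val][unit_val]
        for n, c in enumerate(counts):
            acc[n] += c  # observations in the same unit are pooled
    return {g: dict(u) for g, u in groups.items()}


# Hypothetical block-group records, analogous to by(county) unit(tract)
rows = [
    {"county": 1, "tract": "A", "white": 10, "black": 0},
    {"county": 1, "tract": "A", "white": 5, "black": 5},
    {"county": 1, "tract": "B", "white": 0, "black": 20},
    {"county": 2, "tract": "C", "white": 3, "black": None},
]
print(units_by_group(rows, ["white", "black"], by_key="county", unit_key="tract"))
```

Here the county-2 record is dropped because its black count is missing, and
the two tract-A block groups are pooled into a single unit.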

^seg^ calculates the indices as follows.
For each unit, it first calculates the total count within the unit as 

    t = SUM(varlist)

and the proportion of the unit within category ^n^ as

    q^n^ = var^n^/t.

The Simpson Interaction Diversity Index of each unit is then

    I^u^ = SUM[q^n^ * (1 - q^n^)]

The Normalized Simpson Interaction Diversity Index of each unit, where M is
the number of groups in varlist, is then

    NI^u^ = [M/(M-1)] * I^u^

The Entropy Diversity Index of each unit is then

    E^u^ = SUM[q^n^ * LOG(1/q^n^)].
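The unit-level calculations above can be sketched in Python as follows. The
function name is hypothetical, the unit total t is assumed to be positive, and
a group with q^n^ = 0 is assumed to contribute 0 to E^u^ (the usual convention
for q*LOG(1/q) as q approaches 0).

```python
import math


def unit_diversity(counts, base=None):
    """Return t, I^u, NI^u and E^u for one unit, following the formulas above.

    counts: non-negative counts for the M groups in varlist (t must be > 0).
    base: logarithm base for E^u; defaults to M, as described for base().
    """
    m = len(counts)
    t = sum(counts)
    q = [c / t for c in counts]                # q^n = var^n / t
    interaction = sum(p * (1 - p) for p in q)  # I^u = SUM[q^n * (1 - q^n)]
    normalized = m / (m - 1) * interaction     # NI^u = [M/(M-1)] * I^u
    b = m if base is None else base
    # Groups with q^n = 0 are skipped, i.e. they contribute 0 to E^u.
    entropy = sum(p * math.log(1 / p, b) for p in q if p > 0)
    return t, interaction, normalized, entropy


print(unit_diversity([50, 50]))  # an evenly mixed two-group unit
print(unit_diversity([30, 0]))   # a unit containing only one group
```

For the evenly mixed unit this gives I^u^ = 0.5, NI^u^ = 1, and E^u^ = 1
(base M = 2); for the one-group unit all three indices are 0.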

The corresponding diversity indices of each by-group (I^g^, NI^g^, and E^g^)
are calculated similarly:

    I^g^ = SUM[Q^n^ * (1 - Q^n^)]

    NI^g^ = [M/(M-1)] * I^g^

    E^g^ = SUM[Q^n^ * LOG(1/Q^n^)]

where T (the total count, i.e. the sum of t over all units in the by-group)
and Q^n^ (the proportion in category ^n^) are calculated over each by-group
rather than each unit.
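The by-group diversity indices pool the counts of all units before computing
proportions. A minimal Python sketch under the same assumptions as the
unit-level calculation (hypothetical function name, at least one positive
count, zero proportions contributing 0 to the entropy):

```python
import math


def group_diversity(units, base=None):
    """Return T, Q^n, I^g, NI^g and E^g for one by-group.

    units: list of per-unit count lists, one entry per group in varlist.
    """
    m = len(units[0])
    group_counts = [sum(u[n] for u in units) for n in range(m)]
    T = sum(group_counts)              # total count over the by-group
    Q = [x / T for x in group_counts]  # Q^n: by-group proportion in group n
    I_g = sum(p * (1 - p) for p in Q)
    NI_g = m / (m - 1) * I_g
    b = m if base is None else base
    E_g = sum(p * math.log(1 / p, b) for p in Q if p > 0)
    return T, Q, I_g, NI_g, E_g


# Two single-group units: each unit has zero diversity on its own.
print(group_diversity([[10, 0], [0, 10]]))
```

Each unit here contains only one group, yet the pooled by-group has I^g^ = 0.5,
NI^g^ = 1, and E^g^ = 1. That gap between unit and by-group diversity is the
kind of difference the segregation indices summarize.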

The multiple-group segregation indices of each by-group `g' are then 
defined as follows:

    D^g^ = SUMn[SUMu[t * |Q^n^ - q^n^|]] / (2 * T * I^g^)

    G^g^ = SUMn[SUMui[SUMuj[t^i^ * t^j^ * |q^ni^ - q^nj^|]]] / (2 * T * T * I^g^)

    H^g^ = 1 - [SUM((t/T)*E^u^) / E^g^].

    C^g^ = SUMn[SUMu[t * (Q^n^ - q^n^) * (Q^n^ - q^n^)] / [T * Q^n^ * (M - 1)]].

    R^g^ = SUMn[SUMu[t * (Q^n^ - q^n^) * (Q^n^ - q^n^)] / (T * I^g^)].

    P^g^ = SUMn[SUMu[t * (Q^n^ - q^n^) * (Q^n^ - q^n^)] / [T * (1 - Q^n^)]].

    X^g^ = SUMu[(t * q^1^ * q^2^) / (T * Q^1^)].

    S^g^ = SUMu[(t * q^1^ * q^1^) / (T * Q^1^)].

    N^g^ = 1 - X^g^/Q^2^.

where M is the number of groups specified in varlist; 
where SUMn indicates a sum over all n groups in varlist; and 
where SUMu (and SUMui or SUMuj), as well as the unsubscripted SUM in H^g^,
indicates a sum over all units in the by-group.
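The six multigroup indices above can be transcribed directly into Python as a
check on the formulas. This is a sketch, not ^seg^'s code: the function name
is hypothetical, and it assumes every unit has t > 0, every group in varlist
has a positive by-group total, and at least two groups are present (so I^g^
and E^g^ are positive).

```python
import math


def evenness_indices(units, base=None):
    """Return D, G, H, C, R and P for one by-group.

    units: list of per-unit count lists, one column per group in varlist.
    base: logarithm base for the entropies; defaults to M.
    """
    m = len(units[0])
    b = m if base is None else base

    def entropy(props):
        # SUM[q * LOG(1/q)], treating 0 * LOG(1/0) as 0
        return sum(p * math.log(1 / p, b) for p in props if p > 0)

    t = [sum(u) for u in units]                           # unit totals
    q = [[c / tu for c in u] for u, tu in zip(units, t)]  # unit proportions q^n
    T = sum(t)
    Q = [sum(u[n] for u in units) / T for n in range(m)]  # by-group proportions Q^n
    I_g = sum(p * (1 - p) for p in Q)
    E_g = entropy(Q)

    D = sum(
        tu * abs(Q[n] - qu[n]) for n in range(m) for tu, qu in zip(t, q)
    ) / (2 * T * I_g)
    G = sum(
        ti * tj * abs(qi[n] - qj[n])
        for n in range(m)
        for ti, qi in zip(t, q)
        for tj, qj in zip(t, q)
    ) / (2 * T * T * I_g)
    H = 1 - sum((tu / T) * entropy(qu) for tu, qu in zip(t, q)) / E_g
    # sq[n] = SUMu[t * (Q^n - q^n)^2], the common core of C, R and P
    sq = [sum(tu * (Q[n] - qu[n]) ** 2 for tu, qu in zip(t, q)) for n in range(m)]
    C = sum(sq[n] / (T * Q[n] * (m - 1)) for n in range(m))
    R = sum(sq) / (T * I_g)
    P = sum(sq[n] / (T * (1 - Q[n])) for n in range(m))
    return {"D": D, "G": G, "H": H, "C": C, "R": R, "P": P}


print(evenness_indices([[10, 0], [0, 10]]))  # complete segregation
print(evenness_indices([[5, 5], [10, 10]]))  # no segregation
```

With two single-group units every index equals 1; when every unit has the
by-group's proportions every index equals 0, up to floating-point rounding.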

Seven of the segregation indices (D G H C R P N) have a minimum of 0 
(no segregation) and a maximum of 1 (complete segregation).  X has a minimum of 
0 (no exposure) and an upper bound of 1 (complete exposure).  S has a lower 
bound of 0 (no isolation) and a maximum of 1 (complete isolation).
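A Python transcription of X, S, and N, assuming column 0 holds var1, column 1
holds var2, and any remaining columns are other groups in varlist. The function
name is hypothetical and every unit is assumed to have t > 0.

```python
def exposure_indices(units):
    """Return X, S and N for one by-group, following the formulas above.

    units: per-unit count lists; column 0 is var1, column 1 is var2, and any
    further columns are other groups in varlist that count toward each t.
    """
    t = [sum(u) for u in units]
    T = sum(t)
    Q1 = sum(u[0] for u in units) / T
    Q2 = sum(u[1] for u in units) / T
    # X = SUMu[(t * q^1 * q^2) / (T * Q^1)]
    X = sum(tu * (u[0] / tu) * (u[1] / tu) for u, tu in zip(units, t)) / (T * Q1)
    # S = SUMu[(t * q^1 * q^1) / (T * Q^1)]
    S = sum(tu * (u[0] / tu) ** 2 for u, tu in zip(units, t)) / (T * Q1)
    N = 1 - X / Q2
    return {"X": X, "S": S, "N": N}


print(exposure_indices([[10, 0], [0, 10]]))  # complete segregation
print(exposure_indices([[5, 5], [5, 5]]))    # even distribution
```

The first call gives X = 0, S = 1, and N = 1; the second gives X = 0.5,
S = 0.5, and N = 0. In the evenly mixed case S equals Q^1^ (here 0.5) rather
than 0.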


Examples
--------
        Suppose the data contain racial enrollment counts by school, with
        variables ^sch^, ^dst^, and ^msa^ identifying the school, district, and
        metropolitan area of each school, and with ^white^, ^black^, ^hisp^, ^asian^,
        and ^natam^ variables containing within-school enrollment counts for 5
        racial/ethnic groups.  Then

        . ^seg white black, d^

            calculates the between-school dissimilarity index between White
            and Black students among all schools in the data set.  

        . ^seg white black hisp asian, g by(msa) u(dst) gen(g gwbha i iwbha)^ 

            calculates for each metropolitan area the between-district
            Gini index among White, Black, Hispanic, and Asian students,
            and outputs the Gini index and the normalized interaction
            diversity of each metropolitan area to the variables ^gwbha^
            and ^iwbha^.

        . ^seg white black, d g c h by(msa dst) file(c:\outfile.dta) replace^

            calculates and writes to file "c:\outfile.dta" the 
            dissimilarity, Gini, squared coefficient of variation (which, 
            with two groups, equals the variance ratio index), and 
            information theory indices (and the relevant diversity indices)
            between White and Black students within each district in each
            metropolitan area.  Because two variables are listed in the ^by^
            option, the results will not be displayed on the screen.

        . ^seg white black hisp asian natam, x s n^ 

            calculates the white-black exposure index, the white isolation 
            index, and the normalized white-black exposure index among all 
            schools in the data.  Note that this will give different results 
            than

        . ^seg white black, x s n^ 

            which will calculate the exposure and isolation indices ignoring
            all students other than black and white students.
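The difference between these two commands can be reproduced with the X and S
formulas from the Remarks section. In the hypothetical two-school data below
(columns: white, black, hisp), dropping the Hispanic column shrinks the first
school's total, which raises both the White-Black exposure and the White
isolation. The function name and counts are made up for illustration.

```python
def exposure_and_isolation(units):
    """Return (X, S) for var1 (column 0) and var2 (column 1).

    Any further columns are other groups; they only enlarge each unit's total t.
    Uses t * q^1 * q^2 = var1 * var2 / t and T * Q^1 = total of var1.
    """
    t = [sum(u) for u in units]
    total1 = sum(u[0] for u in units)  # T * Q^1
    X = sum(u[0] * u[1] / tu for u, tu in zip(units, t)) / total1
    S = sum(u[0] * u[0] / tu for u, tu in zip(units, t)) / total1
    return X, S


# Hypothetical schools; columns are white, black, hisp
schools = [[10, 10, 10], [10, 0, 0]]
print(exposure_and_isolation(schools))                   # all three groups
print(exposure_and_isolation([s[:2] for s in schools]))  # White and Black only
```

With all three groups X = 1/6 and S = 2/3; with only White and Black students
X = 1/4 and S = 3/4.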


Author
------

         sean f. reardon
         sean@@pop.psu.edu 

References
----------

James, David R. and Karl E. Taeuber. 1985. "Measures of segregation." 
      Sociological Methodology 14:1-32.
Massey, Douglas S. and Nancy A. Denton. 1988. "The dimensions of racial 
      segregation." Social Forces 67:281-315.
Reardon, Sean F., and Glenn Firebaugh. 2002. "Measures of multigroup 
      segregation." Sociological Methodology 32:33-67.
White, Michael J. 1986. "Segregation and diversity measures in population 
      distribution." Population Index 52:198-221.
Zoloth, Barbara S. 1976. "Alternative measures of school segregation." Land 
      Economics 52:278-298.