-------------------------------------------------------------------------------
help for duncan
-------------------------------------------------------------------------------

Duncan & Duncan dissimilarity index

        duncan depvar groupvar [weight] [if exp] [in range] [, frequencies
               missing nolabel format(%fmt) ]

        duncan2 depvar groupvar [weight] [if exp] [in range] [, missing
               format(%fmt) d(newvar) ncat(newvar) nobs(newvar) dj(newvar) ]


    by ... : may be used with duncan and duncan2; see help by.

    fweights and aweights are allowed; see help weights.


Description

    duncan computes the segregation statistic known as the dissimilarity
    index D (Duncan and Duncan 1955). depvar is the categorical
    characteristic of interest (e.g. occupation) and groupvar defines the
    groups (e.g. sex). D is displayed for all pairwise comparisons of groups.
    The number of distinct categories in depvar is limited to 300 in
    Intercooled Stata and 1200 in Stata/SE.

    duncan2 also computes D, but places no limit on the number of categories
    in depvar. Note, however, that groupvar must be coded 0/1 with duncan2.
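
    If the grouping variable is not coded 0/1, an indicator can be created
    first. A minimal sketch, assuming a hypothetical variable sex coded
    1 = male and 2 = female (female is an illustrative new variable name):

        . generate byte female = (sex == 2) if !missing(sex)
        . duncan2 isco88 female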

    duncan and duncan2 also differ in how they handle the by prefix. duncan
    computes and displays D separately for each by-group in turn, whereas
    duncan2 does all computations in one call and displays all results in
    one table.

    duncan and duncan2 compute D from individual level data. To calculate D
    from aggregate data, see the dissim package by Nicholas J. Cox. Also
    consider the seg package by Sean F. Reardon, which may be used to compute
    a variety of segregation indices.


Options

    frequencies specifies that a two-way table of frequency counts be
        displayed (duncan only).

    missing requests that missing values be treated like other values.

    nolabel causes the numeric codes of the groups to be displayed rather
        than the value labels (duncan only).

    format(%fmt) specifies the format to be used to display the results. The
        default is format(%10.0g).

    d(newvar), ncat(newvar), and nobs(newvar) may be used to save the results
        (D, the number of categories, and the number of observations,
        respectively) as variables (duncan2 only).

    dj(newvar) may be used to save the dissimilarity values of the individual
        categories as a variable (the sum over these values results in D)
        (duncan2 only).


Examples

    Occupational sex segregation:

        . duncan isco88 sex

    Sex segregation in schools by country:

        . sort country
        . by country: duncan2 schoolid sex
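
    The options documented above can be combined with these calls. The
    following lines are a sketch only: they reuse the variables from the
    examples above, and Dc, ncat, and nobs are hypothetical names for the
    new variables.

    Occupational sex segregation, with the frequency table displayed:

        . duncan isco88 sex, frequencies format(%6.4f)

    Sex segregation in schools by country, saving the results as variables:

        . sort country
        . by country: duncan2 schoolid sex, d(Dc) ncat(ncat) nobs(nobs)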


Saved Results

    duncan saves in r():

    Scalars:

    r(c)    number of distinct categories in depvar
    r(N)    number of observations

    Matrices:

    r(D)    pairwise dissimilarity indices
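
    The saved results can be inspected with standard Stata commands once
    duncan has run. A brief sketch, reusing the variables from the first
    example (the matrix name D is arbitrary):

        . duncan isco88 sex
        . return list
        . display r(N)
        . matrix D = r(D)
        . matrix list D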


Methods and Formulas

    Let N(A_j) be the frequency of category j in group A (e.g. the frequency
    of male janitors) and N(B_j) be the frequency of category j in group B
    (e.g. the frequency of female janitors). The dissimilarity index D is
    defined as

        D = 0.5 * sum_j | N(A_j)/N(A) - N(B_j)/N(B) |      j = 1,...,J

    where N(A) and N(B) are the overall group sizes. D may be interpreted as
    the proportion of subjects in group B that would have to change category
    in order to get the same relative distribution as in group A (or vice
    versa).
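
    As a worked illustration with made-up numbers (not taken from any
    dataset), suppose there are J = 3 categories with the frequencies

                     j = 1   j = 2   j = 3   total
        group A        50      30      20     100
        group B        10      30      60     100

    Then

        D = 0.5 * ( |0.50 - 0.10| + |0.30 - 0.30| + |0.20 - 0.60| )
          = 0.5 * ( 0.40 + 0.00 + 0.40 )
          = 0.40

    That is, 40% of group B (here, 40 subjects moving from category 3 to
    category 1) would have to change category to match the relative
    distribution of group A.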


References

    Duncan, O.D., Duncan, B., 1955: A Methodological Analysis of Segregation
        Indexes.  American Sociological Review 20: 210-217.


Author

    Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


Also see