Duncan & Duncan dissimilarity index
duncan depvar groupvar [weight] [if exp] [in range] [, frequencies missing nolabel format(%fmt) ]
duncan2 depvar groupvar [weight] [if exp] [in range] [, missing format(%fmt) d(newvar) ncat(newvar) nobs(newvar) dj(newvar) ]
by ... : may be used with duncan and duncan2; see help by.
fweights and aweights are allowed; see help weights.
Description
duncan computes the segregation statistic known as the dissimilarity index D (Duncan and Duncan 1955). depvar is the categorical characteristic of interest (e.g., occupation) and groupvar defines the groups (e.g., sex). D is displayed for all pairwise comparisons of groups. depvar may contain at most 300 distinct categories in Intercooled Stata and 1200 in Stata/SE.
duncan2 also computes D, but places no limit on the number of categories in depvar. Note, however, that groupvar must be a 0/1 variable with duncan2.
duncan and duncan2 also differ in how they handle the by prefix. duncan computes and displays D separately for each by-group, one after another, whereas duncan2 performs all computations in one call and displays all results in a single table.
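To make the by-group behaviour concrete, here is a minimal sketch in Python (not Stata, and not the code used by duncan2) that computes one D per by-group and collects the results in a single table. The function names, the variable names, and the data are hypothetical; groupvar is assumed to be coded 0/1, as duncan2 requires.

```python
from collections import Counter, defaultdict

def dissimilarity01(categories, groups):
    """D for a 0/1 grouping variable, from individual-level records."""
    count0 = Counter(c for c, g in zip(categories, groups) if g == 0)
    count1 = Counter(c for c, g in zip(categories, groups) if g == 1)
    # Assumes both groups are non-empty; otherwise the shares are undefined.
    n0, n1 = sum(count0.values()), sum(count1.values())
    return 0.5 * sum(abs(count0[c] / n0 - count1[c] / n1)
                     for c in set(count0) | set(count1))

def dissimilarity_by(by, categories, groups):
    """One D per distinct value of `by`, returned as a single table (dict)."""
    cells = defaultdict(lambda: ([], []))
    for b, c, g in zip(by, categories, groups):
        cells[b][0].append(c)
        cells[b][1].append(g)
    return {b: dissimilarity01(cats, grps)
            for b, (cats, grps) in sorted(cells.items())}

# Hypothetical data: two countries, two schools each, sex coded 0/1.
country  = ["CH", "CH", "CH", "CH", "DE", "DE", "DE", "DE"]
schoolid = [1, 1, 2, 2, 1, 1, 2, 2]
sex      = [0, 1, 0, 1, 0, 0, 1, 1]

result = dissimilarity_by(country, schoolid, sex)
print(result)  # {'CH': 0.0, 'DE': 1.0}
```

In the hypothetical data, both sexes are spread identically over the schools in CH (D = 0), while in DE each school is attended by one sex only (D = 1).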
duncan and duncan2 compute D from individual level data. To calculate D from aggregate data, see the dissim package by Nicholas J. Cox. Also consider the seg package by Sean F. Reardon, which may be used to compute a variety of segregation indices.
Options
frequencies specifies that a two-way table of frequency counts be displayed (duncan only).
missing requests that missing values be treated like other values.
nolabel causes the numeric codes of the groups to be displayed rather than the value labels (duncan only).
format(%fmt) specifies the format to be used to display the results. The default is format(%10.0g).
d(newvar), ncat(newvar), nobs(newvar) may be used to save the results (D, the number of categories, the number of observations) as variables (duncan2 only).
dj(newvar) may be used to save the dissimilarity values of the individual categories as a variable (the sum over these values results in D) (duncan2 only).
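The relationship between the per-category values and D can be illustrated with a short Python sketch (not Stata, and not the package's own code). It assumes that each category's value is its term of the sum in the formula for D, including the factor 0.5, which is what makes the values add up to D. The function name and the data are hypothetical.

```python
from collections import Counter

def category_contributions(categories, groups):
    """Per-category terms 0.5 * |share in group 0 - share in group 1|.

    Assumes a 0/1 grouping variable with both groups non-empty.
    """
    count0 = Counter(c for c, g in zip(categories, groups) if g == 0)
    count1 = Counter(c for c, g in zip(categories, groups) if g == 1)
    n0, n1 = sum(count0.values()), sum(count1.values())
    return {c: 0.5 * abs(count0[c] / n0 - count1[c] / n1)
            for c in sorted(set(count0) | set(count1))}

# Hypothetical data: eight pupils in three schools, girl coded 0/1.
school = [1, 1, 1, 2, 2, 2, 3, 3]
girl   = [0, 0, 1, 0, 1, 1, 0, 1]

dj = category_contributions(school, girl)
d = sum(dj.values())   # the contributions add up to D
print(dj)  # {1: 0.125, 2: 0.125, 3: 0.0}
print(d)   # 0.25
```

Categories with a large value are those in which the two groups' shares differ most; school 3 contributes nothing because it holds the same share of both groups.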
Examples
Occupational sex segregation:
. duncan isco88 sex
Sex segregation in schools by country:
. sort country
. by country: duncan2 schoolid sex
Saved Results
duncan saves in r():
Scalars:
r(c) number of distinct categories in depvar
r(N) number of observations
Matrices:
r(D) pairwise dissimilarity indices
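When groupvar has more than two groups, there is one D for every pair of groups. The following Python sketch (not Stata) shows what such a set of pairwise indices contains. It returns a dictionary keyed by group pair and makes no claim about the actual row and column layout of r(D); the function name and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_dissimilarity(categories, groups):
    """D for every unordered pair of groups, from individual-level records."""
    counts = {}
    for c, g in zip(categories, groups):
        counts.setdefault(g, Counter())[c] += 1
    cats = set(categories)
    # Relative distribution of each group over all categories.
    shares = {g: {c: cnt[c] / sum(cnt.values()) for c in cats}
              for g, cnt in counts.items()}
    return {(a, b): 0.5 * sum(abs(shares[a][c] - shares[b][c]) for c in cats)
            for a, b in combinations(sorted(shares), 2)}

# Hypothetical data: three groups distributed over two categories.
category = ["x", "x", "x", "y", "y", "y"]
group    = ["A", "A", "B", "B", "C", "C"]

pairs = pairwise_dissimilarity(category, group)
print(pairs)  # {('A', 'B'): 0.5, ('A', 'C'): 1.0, ('B', 'C'): 0.5}
```

Because D is symmetric in the two groups, each pair needs to be computed only once.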
Methods and Formulas
Let N(A_j) be the frequency of category j in group A (e.g. the frequency of male janitors) and N(B_j) be the frequency of category j in group B (e.g. the frequency of female janitors). The dissimilarity index D is defined as
D = 0.5 * sum_j | N(A_j)/N(A) - N(B_j)/N(B) |,    j = 1,...,J
where N(A) and N(B) are the overall group sizes. D may be interpreted as the proportion of subjects in group B that would have to change category in order to get the same relative distribution as in group A (or vice versa).
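As a worked illustration of the formula, the following Python sketch (not Stata, and not the package's implementation) computes D from individual-level records. The function name and the data are hypothetical.

```python
from collections import Counter

def dissimilarity(categories, groups, a, b):
    """D between groups a and b, from individual-level records."""
    count_a = Counter(c for c, g in zip(categories, groups) if g == a)  # N(A_j)
    count_b = Counter(c for c, g in zip(categories, groups) if g == b)  # N(B_j)
    n_a = sum(count_a.values())                                         # N(A)
    n_b = sum(count_b.values())                                         # N(B)
    cats = set(count_a) | set(count_b)
    return 0.5 * sum(abs(count_a[c] / n_a - count_b[c] / n_b) for c in cats)

# Hypothetical data: four men and four women in three occupations.
occupation = ["janitor", "janitor", "janitor", "clerk",
              "clerk", "nurse", "nurse", "nurse"]
sex        = ["m", "m", "m", "m", "f", "f", "f", "f"]

d = dissimilarity(occupation, sex, "m", "f")
print(d)  # 0.75
```

Here the men are distributed 3/4 janitor and 1/4 clerk, the women 1/4 clerk and 3/4 nurse, so D = 0.5 * (0.75 + 0 + 0.75) = 0.75: three of the four women would have to move from nurse to janitor to match the men's distribution.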
References
Duncan, O.D., Duncan, B., 1955: A Methodological Analysis of Segregation Indexes. American Sociological Review 20: 210-217.
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see