{smcl}
{* 02jul2005}{...}
{hline}
{cmd:help supclust}
{hline}

{title:Build superordinate categories from classification variables}

{p 8 17 2}
{cmd:supclust} {vars} {ifin} {cmd:,}
{cmdab:g:enerate(}{newvar}{cmd:)}
[ {opt alt:ernating} {opt m:issing} ]


{title:Description}

{pstd} {cmd:supclust} may be used to build superordinate categories based
on the values of two or more classification variables. This would be a
useful procedure if, for example, you want to identify distinct clusters in
a trading network based on the identification codes of sellers and buyers.
Another application would be the identification of related entries in a
telephone register, based on a common telephone number or address.

{pstd} Note that {vars} must specify at least two classification variables.
The variables may be numeric or string. However, if the
{opt alternating} option is specified, all variables must be numeric.

{pstd} {cmd:supclust} works by repeated iterating and sorting. How much is
needed depends on the maximum length of the paths by which the observations
are connected within the clusters, so {cmd:supclust} may take a while to
finish on a large and complex dataset.
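
{pstd} The dependence on path length can be pictured with a minimal do-file
sketch of label propagation. This is an illustration only and not necessarily
how {cmd:supclust} is implemented. The variable names {cmd:lab},
{cmd:newlab}, and {cmd:cluster} are hypothetical, and the data are assumed to
contain two classification variables {cmd:id1} and {cmd:id2} without missing
values. Each pass gives every observation the smallest label found among the
observations that share one of its values, so longer connecting paths require
more passes.

```stata
* Illustration only: assumes id1 and id2 exist and have no missing values
generate long lab = _n                  // start with one label per observation
local changed = 1
while `changed' {
    local changed = 0
    foreach v of varlist id1 id2 {
        * within each value of `v', take the smallest current label
        bysort `v' (lab): generate long newlab = lab[1]
        quietly count if newlab != lab
        if r(N) > 0 local changed = 1
        quietly replace lab = newlab
        drop newlab
    }
}
egen long cluster = group(lab)          // relabel as consecutive integers from 1
drop lab
```

{pstd} The loop stops after the first full pass in which no label changes.
Note that the sketch leaves the data sorted by the last classification
variable rather than in their original order.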


{title:Options}

{phang} {cmd:generate(}{newvar}{cmd:)} is required and stores unique
identifiers for the superordinate clusters in {it:newvar}. {it:newvar} will
identify the clusters using consecutive integers starting at 1.

{phang} {opt alternating} causes {cmd:supclust} to match values across
classification variables. The default is to treat the classification
variables as representing independent classifications. If the
{opt alternating} option is specified, all variables in {vars}
must be numeric.

{pmore} Suppose, for example, you have a dataset in which each observation
represents an economic transaction between a seller and a buyer. If the
sellers and the buyers are from two distinct populations, then use the default
algorithm to identify the clusters. If, however, sellers
and buyers are drawn from the same population, that is, if specific actors can
appear both as sellers {it:and} buyers, then the {opt alternating} option
should be specified. Note that in this case it is important to use unique
identification numbers for the actors, independent of their appearance as
sellers or as buyers.
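
{pmore} If the actor identifiers are stored as strings, or were coded
separately for sellers and buyers, a common numeric coding has to be built
before {opt alternating} can be used. The following is a minimal sketch under
stated assumptions: {cmd:seller} and {cmd:buyer} are hypothetical string
variables holding actor names, and the dataset has no existing variables named
{cmd:obs}, {cmd:role}, {cmd:cl}, or beginning with {cmd:actor} or {cmd:code}.

```stata
* Illustration only: seller and buyer are hypothetical string variables
generate long obs = _n                  // observation key for reshape
rename seller actor1
rename buyer  actor2
reshape long actor, i(obs) j(role)      // one row per observation and role
egen long code = group(actor)           // one code per distinct actor name
reshape wide actor code, i(obs) j(role) // back to one row per transaction
supclust code1 code2, generate(cl) alternating
```

{pmore} Because the codes are assigned on the stacked names, an actor receives
the same number in {cmd:code1} and {cmd:code2} regardless of role.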

{phang} {opt missing} specifies that observations with missing values
be included in the computations. The default is to exclude such cases. If
included, missing values are treated as being different from one another, that
is, cases with missing values are not necessarily interpreted as belonging
to the same cluster.


{title:Examples}

        {com}. input id1 id2

             {txt}      id1        id2
          1{com}. 1  1
        {txt}  2{com}. 2  1
        {txt}  3{com}. 2  2
        {txt}  4{com}. 3  2
        {txt}  5{com}. 3  4
        {txt}  6{com}. 4  5
        {txt}  7{com}. 5  3
        {txt}  8{com}. 6  6
        {txt}  9{com}. 6  .
        {txt} 10{com}. .  .
        {txt} 11{com}. end
        {txt}
        {com}. supclust id1 id2, generate(a)
        {txt}4 clusters in 8 observations

        {com}. list id1 id2 a, clean
        {txt}
               {res}id1   id2   a {txt}
          1.   {res}  1     1   1 {txt}
          2.   {res}  2     1   1 {txt}
          3.   {res}  2     2   1 {txt}
          4.   {res}  3     2   1 {txt}
          5.   {res}  3     4   1 {txt}
          6.   {res}  4     5   2 {txt}
          7.   {res}  5     3   3 {txt}
          8.   {res}  6     6   4 {txt}
          9.   {res}  6     .   . {txt}
         10.   {res}  .     .   . {txt}

        {com}. supclust id1 id2, generate(b) alternating
        {txt}2 clusters in 8 observations

        {com}. list id1 id2 b, clean
        {txt}
               {res}id1   id2   b {txt}
          1.   {res}  1     1   1 {txt}
          2.   {res}  2     1   1 {txt}
          3.   {res}  2     2   1 {txt}
          4.   {res}  3     2   1 {txt}
          5.   {res}  3     4   1 {txt}
          6.   {res}  4     5   1 {txt}
          7.   {res}  5     3   1 {txt}
          8.   {res}  6     6   2 {txt}
          9.   {res}  6     .   . {txt}
         10.   {res}  .     .   . {txt}

        {com}. supclust id1 id2, generate(c) missing
        {txt}5 clusters in 10 observations

        {com}. list id1 id2 c, clean
        {txt}
               {res}id1   id2   c {txt}
          1.   {res}  1     1   1 {txt}
          2.   {res}  2     1   1 {txt}
          3.   {res}  2     2   1 {txt}
          4.   {res}  3     2   1 {txt}
          5.   {res}  3     4   1 {txt}
          6.   {res}  4     5   2 {txt}
          7.   {res}  5     3   3 {txt}
          8.   {res}  6     6   4 {txt}
          9.   {res}  6     .   4 {txt}
         10.   {res}  .     .   5 {txt}

        {com}. clear
        {txt}
        {com}. input id1 id2 id3

             {txt}      id1        id2        id3
          1{com}. 1  1  1
        {txt}  2{com}. 2  1  2
        {txt}  3{com}. 3  2  2
        {txt}  4{com}. 4  3  3
        {txt}  5{com}. end
        {txt}
        {com}. supclust id1 id2, generate(a)
        {txt}3 clusters in 4 observations

        {com}. list id1 id2 id3 a, clean
        {txt}
               {res}id1   id2   id3   a {txt}
          1.   {res}  1     1     1   1 {txt}
          2.   {res}  2     1     2   1 {txt}
          3.   {res}  3     2     2   2 {txt}
          4.   {res}  4     3     3   3 {txt}

        {com}. supclust id1 id2 id3, generate(b)
        {txt}2 clusters in 4 observations

        {com}. list id1 id2 id3 b, clean
        {txt}
               {res}id1   id2   id3   b {txt}
          1.   {res}  1     1     1   1 {txt}
          2.   {res}  2     1     2   1 {txt}
          3.   {res}  3     2     2   1 {txt}
          4.   {res}  4     3     3   2 {txt}


{title:Saved Results}

{pstd} Scalars:

    {cmd:r(N)}        number of observations
    {cmd:r(N_clust)}  number of clusters
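
{pstd} A brief sketch of how the saved results might be picked up after a
call, assuming the classification variables {cmd:id1} and {cmd:id2} from the
examples above. The new variable name {cmd:cl} and the local macro name
{cmd:k} are arbitrary.

```stata
supclust id1 id2, generate(cl)
return list                             // shows r(N) and r(N_clust)
display "clusters: " r(N_clust) "  observations: " r(N)
local k = r(N_clust)                    // keep the count for later use
```

{pstd} Copying {cmd:r(N_clust)} into a local macro is advisable because the
next r-class command overwrites the saved results.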


{title:Author}

{pstd} Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch


{title:Also see}

{psee} Online:  help for {help egen}, {help sort}