Estimate population cluster group assignments
.-
^clustpop^ varlist [^if^ exp] [^in^ range] [, reps(100) k(3) seed(string) alpha(0.05) type("kmeans") distopt("L2") dots ]
Description
-----------

^clustpop^ is a routine to estimate population group assignments using the @cluster@ command.
The cluster command groups cases based on the values of a variable, or the mean/median of a group of variables. However, the group assignments will vary depending on the random seed that starts off the process. So if the -cluster- command is executed many times, it will produce different group assignments.
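The run-to-run variation described above can be seen directly. The lines below are a minimal sketch and not part of clustpop itself: they run -cluster kmeans- twice on the auto data with two different random starts and cross-tabulate the results. The names run1, run2, g1 and g2 and the two seeds are arbitrary choices made for illustration.

```stata
sysuse auto, clear
* two k-means runs that differ only in the seed used to pick the starting centers
cluster kmeans mpg displacement, k(3) measure(L2) start(krandom(1)) name(run1) generate(g1)
cluster kmeans mpg displacement, k(3) measure(L2) start(krandom(2)) name(run2) generate(g2)
* compare the two sets of group assignments
tabulate g1 g2
```

Group numbers are arbitrary labels, so the same grouping may appear under different numbers in the two runs. Whether the two runs actually disagree depends on the data and on the seeds chosen.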
In other words, there is a population of group assignments from which the -cluster- command samples a single possibility. The results from -cluster- are therefore like taking a sample (N=1) from a population and using that result as an estimate of the group assignment population.
^clustpop^ runs the -cluster- command many times in order to create a larger sample. For each case, the most frequently occurring group assignment is taken as an estimate of the most common group assignment in the population. The case is assigned to this group only if the lower bound (at a given alpha) of the population estimate is greater than one half. In other words, it must be probable that the most frequently occurring group assignment is the group assignment more than half the time in the population. If this is not so, the group assignment is set to missing.
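The decision rule above can be illustrated with a short calculation. This is a sketch under stated assumptions, not clustpop's own code: the counts (62 of 100 replications) are hypothetical, and the one-sided normal-approximation lower bound is an assumption, since this help file does not say which formula clustpop uses.

```stata
* hypothetical values: 62 of 100 replications gave the most frequent group
scalar reps  = 100
scalar share = 62/reps
scalar alpha = 0.05
* ASSUMPTION: one-sided normal-approximation lower bound; clustpop may use another formula
scalar lower = share - invnormal(1 - alpha)*sqrt(share*(1 - share)/reps)
display "lower bound = " %5.3f lower
* keep the assignment only if the lower bound exceeds one half
display cond(lower > 0.5, "keep the most frequent group", "set assignment to missing")
```

With these values the lower bound is about 0.54, so the assignment is kept. Under the same assumed formula, a share of 0.55 over 100 replications gives a lower bound of about 0.47, and the assignment would be set to missing.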
Output:
-------

Three variables are produced as output:
    1. The estimate of the group assignment
    2. The proportion of replications in which the case received the assignment in variable 1 above
    3. The lower bound of the proportion in variable 2 above
Options:
--------

^reps(^#^)^ specifies the number of times the cluster command is repeated. The default is 30.

^k(^#^)^ specifies the number of groups (see help for the @cluster@ command). The default is 3.

^seed(^"string"^)^ specifies the random number seed to use at the start. The default is "123456789" (you can just -set seed- instead).

^type(^"string"^)^ indicates the type of averaging (see help for the @cluster@ command). The default is "kmeans".

^distopt(^"string"^)^ specifies how the distance between the cases is calculated (see help for the @cluster@ command). The default is "L2".

^alpha(^0.00-0.99^)^ gives the alpha for the statistical test. The default is 0.05.

^dots^ prints a dot for each replication (to show that the command is working).
Other routines called
---------------------

@matsort@ must be installed for clustpop to function.
Examples
--------

    . sysuse auto, clear
    . clustpop mpg rep78 displacement, k(4)
Author: Paul Millar
        www.paulmillar.ca
        paulmi@@nipissingu.ca

See also:
---------

Online: help for @cluster@