help rel_clust
-------------------------------------------------------------------------------

Title

rel_clust -- relative clusterability and weighted variables for cluster analysis

Syntax

rel_clust varlist [if] [in] [, options]

    options            Description
    -------------------------------------------------------------------------
    Main
      suffix(suffix)   generate transformed variables of varlist, adding
                         suffix to their variable names (default
                         transformation is vr_ratio)
      transf(arg)      type of variable transformation (requires option
                         suffix)
      norc             suppress output of "relative clusterability" indices
      replace          replace existing variables requested by suffix

    Sub (arguments of transf)
      vr_ratio         variance-to-range ratio weighting (see Steinley &
                         Brusco, 2008, p. 83f.) (default when using option
                         suffix)
      range            range transformation [xij/range(xj)]
      z_score          z-score transformation [(xij-mean(xj))/sd(xj)]
    -------------------------------------------------------------------------

Description

rel_clust computes indices of relative clusterability of varlist according to Steinley and Brusco (2008) and optionally generates transformed or weighted variables for use in cluster analysis.

rel_clust can transform the variables of varlist by z-standardization, by range standardization, or by variance-to-range (VR) ratio weighting. The VR ratio weighting procedure was designed specifically for cluster analysis: it reflects the degree of "clusterability" of the set of variables used. According to Steinley and Brusco (2008), VR ratio weighting of variables, combined with their proposed variable selection procedure, clearly increases the ability of K-means cluster analysis to recover the true cluster structure.

Options

        +------+
    ----+ Main +-------------------------------------------------------------

suffix(suffix) requests the generation of transformed variables. The names of the new variables will be the original variable names with suffix added. Transformation to variance-to-range ratio weighted variables is the default.

transf(arg) specifies the type of transformation (only necessary if other than variance-to-range ratio weighting is requested). Three transformations are possible according to arg (see below).

norc suppresses the table of "relative clusterability" per variable in the results window.

replace permits the variables generated by option suffix to overwrite existing variables of the same name.

        +---------------------+
    ----+ Sub (option transf) +----------------------------------------------

vr_ratio requests a transformation by variance-to-range ratio weighting of the original variables (see Steinley & Brusco, 2008, p. 83f.) (default when using option suffix).

range requests a range transformation [xij/range(xj)] of the original variables.

z_score requests a z-score transformation [(xij-mean(xj))/sd(xj)] of the original variables.
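
To make the range and z_score formulas concrete, the following sketch reproduces them by hand for a single variable using summarize and generate. It is an illustration only: the variable names seplen_r and seplen_z are hypothetical, the data are assumed to have no missing values, and the standard deviation is the one reported by summarize (n-1 denominator), which may or may not match what rel_clust uses internally.

```stata
* Manual sketch of the range and z-score transformations for one variable
* (run as a do-file; seplen_r and seplen_z are hypothetical names)
webuse iris, clear
quietly summarize seplen

* keep the summary statistics in locals before generating anything
local rng  = r(max) - r(min)
local mean = r(mean)
local sd   = r(sd)

* range transformation: xij / range(xj)
generate double seplen_r = seplen / `rng'

* z-score transformation: (xij - mean(xj)) / sd(xj)
generate double seplen_z = (seplen - `mean') / `sd'

* variance and range of the transformed variables
tabstat seplen_r seplen_z, s(v r) f(%5.2f)
```

By construction, the range-transformed variable has a range of 1 and the z-scored variable has a variance of 1, which can serve as a quick check against the output of rel_clust with tr(ra) or tr(z).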

Example

The following commands replicate results shown in Steinley & Brusco (2008). For the "relative clusterability" indices, see Table 7, p. 102 (x1 to x4); for the variance and range of the original and transformed variables, see Table 6, p. 99:

    . webuse iris
    . tabstat seplen-petwid, s(v r) f(%5.2f)
    . rel_clust seplen-petwid, tr(z) su(_z1)
    . tabstat *z1, s(v r) f(%5.2f)
    . rel_clust seplen-petwid, tr(ra) su(_z2) norc
    . tabstat *z2, s(v r) f(%5.2f)
    . rel_clust seplen-petwid, su(_z3) norc
    . tabstat *z3, s(v r) f(%5.2f)

Saved Results

rel_clust saves the following in r():

    Scalars
      r(N)        number of valid cases (listwise)

    Macros
      r(trans)    type of transformation used (if requested by option suffix)
      r(vars)     list of variables used

    Matrices
      r(RC)       matrix of relative clusterability index per variable
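
As a minimal sketch of how these results might be picked up after a run, the commands below list everything in r() and copy the matrix into a named matrix for later use. The matrix name RC is a hypothetical choice, and the sketch assumes rel_clust is installed; r() is overwritten by the next r-class command, so anything needed later should be copied first.

```stata
* Inspect the results left behind by rel_clust (assumes rel_clust is installed)
webuse iris, clear
rel_clust seplen-petwid

* show all saved results
return list

* scalar and macro results
display "valid cases: " r(N)
display "variables:   `r(vars)'"

* copy the relative clusterability matrix before r() is overwritten
matrix RC = r(RC)
matrix list RC
```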

References

Steinley, D. & Brusco, M. J. (2008). A new variable weighting and selection procedure for K-means cluster analysis. Multivariate Behavioral Research, 43, 77-108.

Also see

Help: cluster kmeans

Author

    Dirk Enzmann
    Institute of Criminal Sciences, Hamburg
    email: dirk.enzmann@uni-hamburg.de