help clstop_lbt
-------------------------------------------------------------------------------

Title

clstop_lbt -- Steinley & Brusco's lower bound technique (LBT) to determine the number of k-means clusters

Syntax

cluster stop [clname], rule(lbt)

Description

clstop_lbt adds the rule lbt to the post-estimation command cluster stop. This rule determines the number of k-means clusters using Steinley & Brusco's (2011) lower bound technique (LBT).

clstop_lbt creates the normalized index LBT. This index measures how close the observed within-cluster sum of squares (SSE) is to the minimum (lower-bound) value of SSE, relative to the total sum of squares (SST): LBT = (SSE - SSE(min))/SST. The method for determining the lower bound of SSE is given in Steinley & Brusco (2011, p. 289). If the number of variables is equal to or less than the number of clusters k, LBT equals the ratio SSE/SST, and in that case the LBT cannot be used. Under the LBT, the partition into k clusters is chosen for which LBT(k) is minimal.
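The selection rule can be sketched outside Stata as follows. This is a minimal illustration, not the clstop_lbt implementation. The lower bound SSE(min) is passed in as an input; computing it as in Steinley & Brusco (2011, p. 289) is not shown. The function names lbt_index and choose_k are hypothetical, and the SSE values in the example are made up.

```python
from typing import Dict, Tuple


def lbt_index(sse: float, sse_min: float, sst: float,
              n_vars: int, k: int) -> Tuple[float, bool]:
    """Return (LBT, usable) where LBT = (SSE - SSE_min) / SST.

    sse_min is an input here (assumption: computed elsewhere).
    When n_vars <= k, LBT reduces to SSE/SST and is flagged unusable.
    """
    if sst <= 0:
        raise ValueError("SST must be positive")
    if n_vars <= k:
        return sse / sst, False
    return (sse - sse_min) / sst, True


def choose_k(results: Dict[int, Tuple[float, bool]]) -> int:
    """Return the k whose usable LBT value is smallest."""
    usable = {k: value for k, (value, ok) in results.items() if ok}
    if not usable:
        raise ValueError("LBT not usable for any k; consider the CH index")
    return min(usable, key=usable.get)


# Hypothetical values for a 4-variable analysis (made-up numbers).
results = {
    2: lbt_index(sse=120.0, sse_min=100.0, sst=600.0, n_vars=4, k=2),
    3: lbt_index(sse=80.0, sse_min=75.0, sst=600.0, n_vars=4, k=3),
    4: lbt_index(sse=70.0, sse_min=0.0, sst=600.0, n_vars=4, k=4),
}
print(choose_k(results))  # 3
```

The k = 4 solution is excluded because the number of variables does not exceed the number of clusters. Among the remaining solutions, k = 3 has the smallest LBT.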

clstop_lbt can also be used to determine whether there is more than one cluster. For this test, the ratio SSE(2)/SST of a two-cluster solution should be less than the lower bound ratio (LBR) obtainable when there is only one cluster. Assuming a (multivariate) normal distribution, LBR(normal) = 1 - 2/pi = .3634. Assuming a uniform distribution, LBR(uniform) = .25.

A simulation study by Steinley & Brusco (2011) shows that the LBT index outperforms the CH (Calinski-Harabasz) index in accuracy and precision. However, the LBT requires that the number of variables exceed the number of clusters. When the number of clusters is equal to or greater than the number of variables, Steinley & Brusco recommend using the CH index. clstop_lbt also calculates the CH index (see Saved results), and CH is the default rule of cluster stop.
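The one-cluster check and the two LBR constants can be sketched as follows. The code evaluates SSE/SST for a median cut of evenly spaced normal and uniform quantiles, which stand in for a large sample drawn from a single cluster. For these symmetric distributions, the median cut is the optimal two-cluster split. The names LBR, more_than_one_cluster and median_split_ratio are hypothetical helpers, not part of clstop_lbt.

```python
import math
from statistics import NormalDist

# One-cluster lower bound ratios quoted in the help text.
LBR = {"normal": 1 - 2 / math.pi, "uniform": 0.25}


def more_than_one_cluster(sse2: float, sst: float, dist: str = "normal") -> bool:
    """True if SSE(2)/SST falls below the one-cluster lower bound ratio."""
    return sse2 / sst < LBR[dist]


def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def median_split_ratio(xs):
    """SSE/SST when sorted 1-D data are cut into two halves at the median."""
    xs = sorted(xs)
    half = len(xs) // 2
    return (sum_sq(xs[:half]) + sum_sq(xs[half:])) / sum_sq(xs)


n = 20000
# Evenly spaced quantiles approximate one large, unclustered sample.
normal_pts = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
uniform_pts = [(i + 0.5) / n for i in range(n)]

print(round(LBR["normal"], 4))                     # 0.3634
print(round(median_split_ratio(normal_pts), 3))    # close to 1 - 2/pi
print(round(median_split_ratio(uniform_pts), 3))   # 0.25
print(more_than_one_cluster(sse2=30.0, sst=100.0))  # True
```

For the standard normal, each half of the median cut has mean plus or minus sqrt(2/pi). The between-cluster share of the variance is therefore 2/pi, which is where 1 - 2/pi comes from.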

Example

    . webuse iris
    . cluster kmeans seplen-petwid, k(2) s(pr(1))
    . cluster stop, rule(lbt)
    . cluster kmeans seplen-petwid, k(3) s(pr(1))
    . cluster stop, rule(lbt)
    . cluster kmeans seplen-petwid, k(4) s(pr(1))
    . cluster stop, rule(lbt)

Saved results

cluster stop with rule(lbt) saves the following in r():

Scalars
    r(N)              number of valid cases (listwise)
    r(k)              number of partitions (clusters)
    r(SSE_#)          within-cluster (error) sum of squares for # partitions
    r(SSB_#)          between-cluster sum of squares for # partitions
    r(SSE_SST_#)      ratio SSE/SST for # partitions
    r(calinski_#)     Calinski & Harabasz pseudo-F for # partitions
    r(LBT_#)          index LBT for # partitions

Macros
    r(clname)         name of the cluster analysis
    r(vars)           list of variables used
    r(rule)           lbt
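The sketch below shows how the sums of squares behind these saved results relate to one another for a given partition. It assumes that the sums of squares are pooled over the raw variables, without standardization. Missing values are handled by listwise deletion, as r(N) suggests. The pseudo-F follows the usual Calinski-Harabasz definition, [SSB/(k-1)] / [SSE/(N-k)]. The function cluster_stats is a hypothetical helper, not the routine used by clstop_lbt. It does not compute r(LBT_#), which also needs SSE(min).

```python
from typing import Dict, Hashable, List, Sequence


def _sum_sq(rows: List[Sequence[float]]) -> float:
    """Sum of squared deviations from the centroid, pooled over variables."""
    n, p = len(rows), len(rows[0])
    centroid = [sum(r[j] for r in rows) / n for j in range(p)]
    return sum((r[j] - centroid[j]) ** 2 for r in rows for j in range(p))


def cluster_stats(data, labels) -> Dict[str, float]:
    """N, k, SSE, SSB, SSE/SST and the CH pseudo-F for one partition."""
    # Listwise deletion: keep cases with no missing value and a label.
    cases = [(row, g) for row, g in zip(data, labels)
             if g is not None and all(v is not None for v in row)]
    rows = [row for row, _ in cases]
    groups: Dict[Hashable, list] = {}
    for row, g in cases:
        groups.setdefault(g, []).append(row)
    n, k = len(rows), len(groups)
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < N")
    sst = _sum_sq(rows)
    sse = sum(_sum_sq(members) for members in groups.values())
    ssb = sst - sse  # SST = SSE + SSB
    return {
        "N": n, "k": k, "SSE": sse, "SSB": ssb,
        "SSE_SST": sse / sst,
        "calinski": (ssb / (k - 1)) / (sse / (n - k)),
    }


data = [[0, 0], [0, 2], [10, 0], [10, 2], [None, 1]]
labels = [1, 1, 2, 2, 1]
stats = cluster_stats(data, labels)
print(stats["N"], stats["calinski"])  # 4 50.0
```

A partition with SSE = 0 would make the pseudo-F undefined, so that case would need separate handling.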

References

Steinley, D., & Brusco, M. J. (2011). Choosing the number of clusters in K-means clustering. Psychological Methods, 16, 285-297.

Also see

Manual: [MV] cluster programming subroutines

Author

Dirk Enzmann
Institute of Criminal Sciences, Hamburg
email: dirk.enzmann@uni-hamburg.de