help clstop_lbt
-------------------------------------------------------------------------------

Title

    clstop_lbt -- Steinley & Brusco's lower bound technique (LBT) to
                  determine the number of k-means clusters

Syntax

    cluster stop [clname], rule(lbt)

Description

    clstop_lbt adds the rule lbt to the post-estimation command cluster stop
    to determine the number of k-means clusters using Steinley & Brusco's
    (2011) lower bound technique (LBT).

    clstop_lbt computes the normalized index LBT, which measures how close
    the observed within-cluster sum of squares (SSE) is to its lower bound
    SSE(min), expressed as a fraction of the total sum of squares (SST):
    LBT = (SSE - SSE(min))/SST. The method for determining the lower bound of
    the SSE is given in Steinley & Brusco (2011, p. 289). If the number of
    variables is less than or equal to the number of clusters k, LBT equals
    the ratio SSE/SST and cannot be used to choose k. Otherwise, the
    partition into k clusters with the smallest LBT(k) is chosen.

    clstop_lbt can also be used to determine whether there is more than one
    cluster. In this case, the ratio SSE(2)/SST of a two-cluster solution
    should be less than the lower bound ratio (LBR) obtainable when there is
    only one cluster: assuming a (multivariate) normal distribution,
    LBR(normal) is 1 - 2/pi = .3634; assuming a uniform distribution,
    LBR(uniform) is .25.
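
    As a minimal sketch of this check (assuming the iris data used in the
    Example below and the normal reference value, and assuming the ratio is
    returned as r(SSE_SST_2) as listed under Saved Results), the two-cluster
    ratio can be compared with the LBR directly; the last command displays 1
    if the ratio is below the LBR and 0 otherwise:

    . webuse iris
    . cluster kmeans seplen-petwid, k(2) s(pr(1))
    . cluster stop, rule(lbt)
    . display r(SSE_SST_2) < 1 - 2/_pi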

    A simulation study by Steinley & Brusco (2011) shows that the LBT index
    outperforms the CH (Calinski-Harabasz) index in accuracy and precision.
    However, the LBT requires that the number of variables exceed the number
    of clusters. When the number of clusters equals or exceeds the number of
    variables, Steinley & Brusco recommend using the CH index, which is also
    calculated by clstop_lbt (see Saved Results) and which is the default
    rule of cluster stop.

Example

    . webuse iris
    . cluster kmeans seplen-petwid, k(2) s(pr(1))
    . cluster stop, rule(lbt)
    . cluster kmeans seplen-petwid, k(3) s(pr(1))
    . cluster stop, rule(lbt)
    . cluster kmeans seplen-petwid, k(4) s(pr(1))
    . cluster stop, rule(lbt)
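
    The three runs above can also be written as a loop. This is a sketch
    only: the names km2 and km3 are arbitrary, and it assumes that the index
    for a k-cluster solution is returned as r(LBT_k), following the naming
    under Saved Results. The loop stops at k = 3 because the four iris
    variables must outnumber the clusters for the LBT to be usable:

    . webuse iris, clear
    . forvalues k = 2/3 {
          cluster kmeans seplen-petwid, k(`k') s(pr(1)) name(km`k')
          cluster stop km`k', rule(lbt)
          display "k = `k': LBT = " r(LBT_`k')
      }

    The solution with the smaller displayed LBT would be preferred.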

Saved Results

    cluster stop with rule(lbt) saves the following in r():

    Scalars   
      r(N)           number of valid cases (listwise)
      r(k)           number of partitions (clusters)
      r(SSE_#)       within-cluster (error) sum of squares for # partitions
      r(SSB_#)       between-cluster sum of squares for # partitions
      r(SSE_SST_#)   ratio SSE/SST for # partitions
      r(calinski_#)  Calinski & Harabasz pseudo-F for # partitions
      r(LBT_#)       LBT index for # partitions

    Macros    
      r(clname)      name of the cluster analysis
      r(vars)        list of variables used
      r(rule)        lbt

References

    Steinley, D. & Brusco, M. J. (2011). Choosing the number of clusters in
        K-means clustering. Psychological Methods, 16, 285-297.

Also see

    Manual: [MV] cluster programming subroutines

Author

    Dirk Enzmann
    Institute of Criminal Sciences, Hamburg
    email: dirk.enzmann@uni-hamburg.de