{smcl}
{* *! version 1.1.2; February 09, 2013 @ 13:44:00 DE}{...}
{hi:help clstop_lbt}
{hline}
{title:Title}
{p 4 18 2}
{hi:clstop_lbt} -- Steinley & Brusco's lower bound technique (LBT) to determine the number of k-means clusters
{title:Syntax}
{phang}
{cmd:cluster stop} [{it:clname}]{cmd:, rule(lbt)}
{title:Description}
{pstd}{cmd:clstop_lbt} adds the stopping rule {hi:lbt} to the command {help cluster stop}, so that the
number of k-means clusters can be determined with Steinley & Brusco's (2011) lower bound technique
(LBT).
{pstd}{hi:clstop_lbt} computes the normalized index LBT, which measures how close the observed
within-cluster sum of squares (SSE) is to its lower bound SSE(min), expressed as a proportion of the
total sum of squares (SST): LBT = (SSE - SSE(min))/SST. The method for computing the lower
bound SSE(min) is given in Steinley & Brusco (2011, p. 289). If the number of variables is equal to or
less than the number of clusters {it:k}, the lower bound is 0 and LBT equals the ratio SSE/SST; in this
case the LBT cannot be used. Among the candidate solutions, the number of clusters {it:k} with the
smallest LBT({it:k}) is chosen.
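{pstd}A minimal sketch of this selection rule, assuming {cmd:clstop_lbt} is installed and that
{cmd:r(LBT_#)} is indexed by the number of clusters as listed under
{help clstop_lbt##results:Saved Results}. The loop fits k-means solutions for {it:k} = 2 and 3 on the
iris data (4 variables, so the LBT is usable only for {it:k} < 4) and displays LBT({it:k}); the {it:k}
with the smallest value would be chosen. The cluster names {cmd:km2} and {cmd:km3} are illustrative
only.{p_end}

{phang2}{cmd:webuse iris, clear}{p_end}
{phang2}{cmd:forvalues k = 2/3 {c -(}}{p_end}
{phang3}{cmd:cluster kmeans seplen-petwid, k(`k') start(prandom(1)) name(km`k')}{p_end}
{phang3}{cmd:cluster stop km`k', rule(lbt)}{p_end}
{phang3}{cmd:display "k = `k': LBT = " r(LBT_`k')}{p_end}
{phang2}{cmd:{c )-}}{p_end}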
{pstd}{hi:clstop_lbt} can also be used to determine whether there is more than one cluster at all. In
this case the ratio SSE(2)/SST of the two-cluster solution should be smaller than the lower bound ratio
(LBR) that is obtainable when the data contain only one cluster. Assuming a (multivariate) normal
distribution, LBR(normal) = 1 - 2/pi = .3634; assuming a uniform distribution, LBR(uniform) = .25.
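{pstd}The normal value follows from splitting a standard normal variable at its mean: each half has
mean +/-sqrt(2/pi), so the between-half share of the variance is 2/pi and the within-half share is
1 - 2/pi. A sketch of the comparison for the iris data, assuming {cmd:clstop_lbt} is installed and
saves the two-cluster ratio as {cmd:r(SSE_SST_2)}; each {cmd:display} of a comparison returns 1 if
SSE(2)/SST falls below the respective LBR:{p_end}

{phang}. {stata display 1 - 2/_pi}{p_end}
{phang}. {stata webuse iris, clear}{p_end}
{phang}. {stata cluster kmeans seplen-petwid, k(2) s(pr(1))}{p_end}
{phang}. {stata cluster stop, rule(lbt)}{p_end}
{phang}. {stata display r(SSE_SST_2) < 1 - 2/_pi}{p_end}
{phang}. {stata display r(SSE_SST_2) < .25}{p_end}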
{pstd}In a simulation study, Steinley & Brusco (2011) found that the LBT index outperforms the CH
(Calinski-Harabasz) index in accuracy and precision. However, the LBT requires that the number of
variables exceed the number of clusters. When the number of clusters is equal to or greater than the
number of variables, Steinley & Brusco recommend using the CH index, which is also calculated by
{cmd:clstop_lbt} (see {help clstop_lbt##results:Saved Results}) and which is the default rule
of {cmd:cluster stop}.
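{pstd}For the iris data (4 variables), a four-cluster solution falls into this case. A sketch of
falling back to the CH index, assuming the pseudo-F is saved as {cmd:r(calinski_4)} as listed under
{help clstop_lbt##results:Saved Results}; the last command requests the default rule of
{cmd:cluster stop} for comparison:{p_end}

{phang}. {stata webuse iris, clear}{p_end}
{phang}. {stata cluster kmeans seplen-petwid, k(4) s(pr(1))}{p_end}
{phang}. {stata cluster stop, rule(lbt)}{p_end}
{phang}. {stata display r(calinski_4)}{p_end}
{phang}. {stata cluster stop}{p_end}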
{title:Example}
{phang}. {stata webuse iris}{p_end}
{phang}. {stata cluster kmeans seplen-petwid, k(2) s(pr(1))}{p_end}
{phang}. {stata cluster stop, rule(lbt)}{p_end}
{phang}. {stata cluster kmeans seplen-petwid, k(3) s(pr(1))}{p_end}
{phang}. {stata cluster stop, rule(lbt)}{p_end}
{phang}. {stata cluster kmeans seplen-petwid, k(4) s(pr(1))}{p_end}
{phang}. {stata cluster stop, rule(lbt)}{p_end}
{marker results}{...}
{title:Saved Results}
{pstd} {cmd:cluster stop} with {cmd:rule(lbt)} saves the following in {cmd:r()}: {p_end}
{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Scalars}{p_end}
{synopt:{cmd:r(N)}}number of valid cases (listwise deletion){p_end}
{synopt:{cmd:r(k)}}number of clusters (partitions){p_end}
{synopt:{cmd:r(SSE_#)}}within-cluster (error) sum of squares for # clusters{p_end}
{synopt:{cmd:r(SSB_#)}}between-cluster sum of squares for # clusters{p_end}
{synopt:{cmd:r(SSE_SST_#)}}ratio SSE/SST for # clusters{p_end}
{synopt:{cmd:r(calinski_#)}}Calinski-Harabasz pseudo-F for # clusters{p_end}
{synopt:{cmd:r(LBT_#)}}LBT index for # clusters{p_end}
{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Macros}{p_end}
{synopt:{cmd:r(clname)}}name of the cluster analysis{p_end}
{synopt:{cmd:r(vars)}}list of variables used{p_end}
{synopt:{cmd:r(rule)}}{cmd:lbt}{p_end}
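{pstd}The saved scalars are related to each other. A sketch, assuming SST = SSE + SSB (the usual
decomposition of the total sum of squares for a partition) and the standard Calinski-Harabasz
definition CH = [SSB/(k-1)]/[SSE/(N-k)]; if {cmd:clstop_lbt} uses these definitions, both
displayed differences should be close to 0:{p_end}

{phang}. {stata webuse iris, clear}{p_end}
{phang}. {stata cluster kmeans seplen-petwid, k(3) s(pr(1))}{p_end}
{phang}. {stata cluster stop, rule(lbt)}{p_end}
{phang}. {stata display r(SSE_SST_3) - r(SSE_3)/(r(SSE_3) + r(SSB_3))}{p_end}
{phang}. {stata display r(calinski_3) - (r(SSB_3)/(r(k)-1))/(r(SSE_3)/(r(N)-r(k)))}{p_end}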
{title:References}
{phang}Steinley, D., & Brusco, M. J. (2011). {browse "http://psycnet.apa.org/journals/met/16/3/285/":Choosing the number of clusters in K-means clustering}. {it:Psychological Methods}, {it:16}(3), 285-297.{p_end}
{title:Also see}
{psee}
Manual: {manhelp cluster_subroutines MV:cluster programming subroutines}{p_end}
{title:Author}
{phang}Dirk Enzmann{p_end}
{phang}Institute of Criminal Sciences, Hamburg{p_end}
{phang}email: {browse "mailto:dirk.enzmann@uni-hamburg.de"}{p_end}