-------------------------------------------------------------------------------
help for clv                                               Jean-Benoit Hardouin
-------------------------------------------------------------------------------

Clustering of variables around latent components

clv [varlist] [if exp] [in range] [weight] [, nostandardized bar consolidation(#) nodendro savedendro(filename[,replace]) cutnumber(#) showcount textsize(string) deltaT horizontal abbrev(#) title(string) caption(string) kernel(numlist) method(keyword) nobiplot addvar genlv(string) replace std dim(string)]

Description

clv clusters variables around latent components. The variables are clustered stepwise, seeking at each step to minimize the decrease in the T criterion, which is computed as the sum of the first eigenvalues of the data matrices of all the clusters. A hierarchical cluster analysis based on this criterion is performed. A consolidation procedure can then be run, which assigns each variable to the latent component with which it is most correlated.
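The T criterion described above can be illustrated outside Stata with a small sketch. This is not the code used by clv; it assumes standardized variables (so that the matrix of each cluster is a Pearson correlation matrix), and the function names correlation_matrix, first_eigenvalue and t_criterion are hypothetical. The first eigenvalue is approximated by power iteration to keep the sketch free of external libraries.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length, non-constant columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(u, v)) for v in scaled] for u in scaled]


def first_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix,
    approximated by power iteration (illustrative, not optimized)."""
    p = len(matrix)
    # Asymmetric start vector, to avoid starting in a null direction
    # in simple cases such as two perfectly opposed variables.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the normalized vector.
    mv = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    return sum(vec[i] * mv[i] for i in range(p))


def t_criterion(data, partition):
    """Sum, over the clusters, of the first eigenvalue of each cluster's
    correlation matrix. `partition` is a list of lists of column indices."""
    return sum(
        first_eigenvalue(correlation_matrix([data[i] for i in cluster]))
        for cluster in partition
    )


# Three variables observed on four individuals (made-up values).
data = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
t_singletons = t_criterion(data, [[0], [1], [2]])
t_merge_01 = t_criterion(data, [[0, 1], [2]])
t_merge_02 = t_criterion(data, [[0, 2], [1]])
```

With one variable per cluster, T equals the number of variables. Merging the two perfectly correlated variables leaves T unchanged, whereas merging weakly correlated variables lowers it, so the first merge is the one a stepwise procedure of this kind would prefer.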

Options

nostandardized uses centered variables instead of standardized variables.

bar displays a chart of the decrease in the T criterion at each step.

consolidation performs a consolidation procedure with the obtained partition into the specified number of clusters (by default, no consolidation procedure is performed).
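The reassignment idea behind the consolidation procedure can be sketched as follows. This is only an illustration under stated assumptions, not the code used by clv: the latent components are taken as given vectors, a single reassignment pass is shown (an actual consolidation would typically recompute the components and repeat), and the comparison uses the squared Pearson correlation, which is an assumption made here so that strongly negatively correlated variables also join a cluster. A signed comparison may be preferable when the sign of the correlation matters. The names pearson and reassign are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    dev_u = [a - mean_u for a in u]
    dev_v = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(dev_u, dev_v))
    den = math.sqrt(sum(a * a for a in dev_u) * sum(b * b for b in dev_v))
    return num / den


def reassign(variables, components):
    """For each variable, return the index of the latent component with
    which its squared correlation is largest (assumption: squared)."""
    assignment = []
    for var in variables:
        scores = [pearson(var, comp) ** 2 for comp in components]
        assignment.append(scores.index(max(scores)))
    return assignment


# Two latent components and two variables (made-up values).
components = [[1, 2, 3, 4], [1, -1, 1, -1]]
variables = [[2, 4, 6, 9], [-1, 1, -1, 1.5]]
assignment = reassign(variables, components)
```

Here the first variable goes to the first component, and the second variable goes to the second component even though their correlation is negative, because only the squared correlation is compared.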

nodendro suppresses the display of the dendrogram.

savedendro saves the dendrogram in the file defined by this option. If this file already exists, it is possible to replace it with the replace option.

cutnumber defines the number of clusters presented in the dendrogram (40 by default).

showcount displays the number of variables in each cluster (useful with the cutnumber option).

textsize defines the size of the labels of the variables on the dendrogram (see textsizestyle).

deltaT uses the variation of the T criterion as height variable for the dendrogram.

horizontal displays a horizontal (instead of vertical) dendrogram.

abbrev defines the length of the variable labels on the dendrogram (15 characters by default).

title defines the title of the dendrogram.

caption defines the caption of the axis of the dendrogram which indicates the names of the variables.

kernel defines one or several kernels of variables (variables which are clustered together in an initial step). The first number indicates how many of the first variables of the varlist are clustered together, the second number indicates how many of the following variables are clustered together, and so on.
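The way a list of kernel sizes maps onto the varlist can be sketched as below. This is an illustration of the rule stated above, not the code used by clv; the function name kernels_to_clusters is hypothetical, variables are identified by their 0-based position in the varlist, and treating the variables not covered by any kernel as single-variable clusters is an assumption.

```python
def kernels_to_clusters(n_vars, kernel_sizes):
    """Translate kernel sizes into initial clusters of variable positions.

    The first `kernel_sizes[0]` variables form the first kernel, the next
    `kernel_sizes[1]` variables form the second kernel, and so on.
    Assumption: variables left over start as clusters of their own.
    """
    clusters = []
    start = 0
    for size in kernel_sizes:
        if size < 1 or start + size > n_vars:
            raise ValueError("kernel sizes do not fit the variable list")
        clusters.append(list(range(start, start + size)))
        start += size
    # Remaining variables: one cluster per variable.
    clusters.extend([i] for i in range(start, n_vars))
    return clusters


# Ten variables with kernel sizes 3 and 4, as a kernel(3 4) option would give.
initial = kernels_to_clusters(10, [3, 4])
```

In this example the first three variables form one kernel, the next four form a second kernel, and the last three variables each start alone.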

method specifies the method used to cluster the variables: classical (the default) is the method described by Vigneau and Qannari; polychoric uses the matrix of polychoric correlation coefficients instead of Pearson correlation coefficients; v2 is a modified algorithm which seeks to minimize the maximum second eigenvalue among the clusters of two or more variables; polychoricv2 corresponds to v2 with the matrix of polychoric correlation coefficients; and centroid is defined by Vigneau and Qannari as an adaptation of CLV for cases where the sign of the correlation coefficients between the variables is important.

nobiplot suppresses the display of the biplot of the latent variables produced with the consolidation option.

genlv saves the latent variables in new variables with the string as prefix (followed by a number). This option must be used in conjunction with the consolidation option.

replace allows replacing the variables created with the genlv option if they already exist.

std allows standardizing the latent variables for the graphical representation on the biplot.

dim(string) allows choosing the axes represented on the biplot.

If no varlist is specified, the procedure reuses the varlist from the last clv procedure and does not perform the hierarchical cluster analysis again.

Notes

Clustering around latent variables (CLV) is defined by its authors (Vigneau and Qannari, 2003) only for continuous variables. Results with binary or ordinal variables must be interpreted with caution.

Only fweights are allowed. The biplots are disabled if weights are used.

In this procedure, all the individuals with at least one missing value are omitted.

With the polychoric and polychoricv2 methods, the nostandardized option is disabled.

This module uses the following modules, which can be downloaded from SSC: polychoric (ssc describe polychoric), biplotvlab (ssc describe biplotvlab) and genscore (ssc describe genscore).

Example

. clv var1-var15 /* performs the HCA procedure */

. clv var1-var15, cons(6) bar nodendro meth(centroid) /* performs the HCA procedure based on the centroid method, followed by a consolidation procedure with 6 clusters */

. clv, cons(3) addvar /* performs only the consolidation procedure with 3 clusters, based on the preceding HCA procedure */

Acknowledgements

The author thanks Ronan Conroy for all his suggestions for improvement.

Reference

Vigneau E. and Qannari E. M. Clustering of variables around latent components. Communications in Statistics - Simulation and Computation. 32(4): 1131-1150, 2003.

Author

Jean-Benoit Hardouin, PhD, assistant professor
EA 4275 SPHERE "Team of Biostatistics, Clinical Research and Subjective Measures in Health Sciences"
University of Nantes - Faculty of Pharmaceutical Sciences
1, rue Gaston Veil - BP 53508
44035 Nantes Cedex 1 - FRANCE
Email: jean-benoit.hardouin@univ-nantes.fr