-------------------------------------------------------------------------------
help for biplot
-------------------------------------------------------------------------------

Draw JK-, SQ-, and GH-biplots

biplot varlist [weight] [if exp] [in range] [, [jk|sq|gh|mixed(jk|sq|gh jk|sq|gh)]
    dimensions(# #) [obsonly|varonly] covariance rv mahalanobis subpop(varname)
    flip(x|y|xy) stretch(#) jitter(relativesize) generate(varname1 varname2)
    scatter_options line_options twoway_options]

aweights and fweights are allowed; see help weights. However, no weights are allowed with option rv, and aweights are not allowed with option sq or option gh.

Description

biplot draws biplots of the data matrix defined by varlist. By default, a JK-biplot with standardized values is drawn. Biplots are useful for visual inspection of data matrices, allowing the eye to identify patterns, regularities, and outliers. In a biplot, variables (columns) are shown as arrows from the origin and observations (rows) are shown as points. The configuration of arrows reflects the relations among the variables. The cosine of the angle between two arrows reflects the correlation between the variables they represent. If the variables are not standardized, the length of each arrow reflects the standard deviation of the variable it represents.

The scatter of observations shows relations among observations. The distance between two points approximates the Euclidean distance between the two corresponding observations of the data matrix. The point where a perpendicular from an observation meets an arrow indicates that observation's value on the variable the arrow represents.

Options

jk|sq|gh specifies the type of biplot. jk specifies the default, a JK-biplot. The JK-biplot approximates the Euclidean distances between observations more closely than the other types. gh specifies a GH-biplot. The GH-biplot represents the relations between variables more closely than the other types. sq specifies an SQ-biplot (symmetric biplot).

mixed() can be used instead of a single biplot type to combine the relative advantages of the different biplot types. Inside the parentheses, first state a biplot type for the observations and then a type for the variables. The plot positions of observations and variables are then calculated accordingly. Gabriel (2002), for example, proposes a "correspondence analysis" that uses a JK-biplot for the observations and a GH-biplot for the variables. This can be achieved with mixed(jk gh).
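
As an illustration, the combination described above might be requested as follows. This is only a sketch: it assumes the auto dataset shipped with Stata, and the space-separated form mixed(jk gh) is a reading of the syntax diagram, not a tested call.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* JK coordinates for the observations, GH coordinates for the variables
biplot mpg weight length turn, mixed(jk gh)
```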

Note: In Intercooled Stata, "matsize too small" is a likely error message with type gh or sq, even with small sample sizes. matsize has to be at least the number of observations + 1. With Intercooled Stata, SQ- and GH-biplots are recommended only for data with few observations and are possible only up to 799 observations.
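
A possible way to avoid the error is to raise matsize before drawing the biplot. The sketch below assumes the auto dataset shipped with Stata (74 observations); the value 100 is an arbitrary choice that satisfies the "observations + 1" rule stated above.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* Check the number of observations
count

* matsize must be at least the number of observations + 1
set matsize 100

biplot mpg weight length turn, gh
```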

dimensions(# #) specifies which dimensions are plotted on the graph axes. The default is to use the coordinates that correspond to the two largest eigenvalues. For JK-biplots these are the first two principal components. dimensions() lets you choose other dimensions. A JK-biplot with dim(3 4), for example, plots all values in the space of the third and fourth principal components.
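
A brief sketch of the difference, assuming the auto dataset shipped with Stata. With four variables there are four principal components, so dimensions(3 4) selects the last two.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* Default: the two dimensions with the largest eigenvalues
biplot mpg weight length turn

* Third and fourth principal components instead
biplot mpg weight length turn, dimensions(3 4)
```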

obsonly|varonly suppress the plotting of either the variables or the observations. A JK-biplot with obsonly is a component score plot, and a JK-biplot with options varonly and stretch(1) is a plot of the PCA coefficients.
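
The two special cases mentioned above could be drawn like this; the sketch assumes the auto dataset shipped with Stata.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* Observations only: a component score plot
biplot mpg weight length turn, obsonly

* Variables only at their original length: a plot of the PCA coefficients
biplot mpg weight length turn, varonly stretch(1)
```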

covariance uses original instead of standardized values.

rv produces relative variation diagrams. Relative variation diagrams are biplots for compositional data, that is, data sets with constant row sums and only positive values (for example, the row percentages of two-way frequency tables). To get a relative variation diagram, the data matrix needs to be transformed before the biplot is produced; the option rv does this transformation for you.

mahalanobis can be used for GH-biplots to rescale the graph so that the distances between the observations approximate the Mahalanobis distances.

generate(varname1 varname2) stores the coordinates of the observations and the variables as variables in the dataset. The y-axis coordinates of the observations are stored in varname1_y and the x-axis coordinates of the observations are stored in varname1_x. Accordingly, the coordinates of the variables are stored in varname2_y and varname2_x.
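
A minimal sketch of storing and inspecting the coordinates, assuming the auto dataset shipped with Stata. The stub names bpobs and bpvar are hypothetical; the _x and _y suffixes follow the naming rule described above.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* bpobs and bpvar are hypothetical stub names chosen for this example
biplot mpg weight length turn, generate(bpobs bpvar)

* Inspect the stored coordinate variables
describe bpobs_x bpobs_y bpvar_x bpvar_y
```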

subpop(varname) highlights observations from different subpopulations with different plot symbols. Note that by default a legend is drawn to identify the subpopulations. The legend, however, changes the aspect ratio of the biplot. If you do not want this, you can turn the legend off or adjust the aspect ratio with xsize(). Another way to highlight subpopulations is the option mlabel(), which is described below.
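
For example, subpopulations might be marked with and without the legend as sketched here. The sketch assumes the auto dataset shipped with Stata and uses the standard twoway option legend(off).

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* Different plot symbols for domestic and foreign cars, with a legend
biplot mpg weight length turn, subpop(foreign)

* Same plot without the legend, so the aspect ratio is not affected
biplot mpg weight length turn, subpop(foreign) legend(off)
```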

flip(x|y|xy) reverses the signs of the axes. flip(x) and flip(y) reverse the sign of the indicated axis. flip(xy) flips both axes. flip() is seldom used but might be useful if you want to compare your results with the results of other software packages.

stretch(#) draws longer (or, if needed, shorter) lines for the variables. By default, stretch() is set to a value that improves readability. You can set the value to any positive real number. With stretch(1) you get the original length, and with stretch(2) the lines are drawn twice as long as the original values. stretch() is seldom used.

jitter(relativesize) adds spherical random noise to the plot symbols of observations. This is useful when plotting data that would otherwise result in points plotted on top of each other. Commonly specified are jitter(5) or jitter(6); jitter(0) is the default. See help relativesize for a description of relative sizes.

scatter_options are the following set of the options allowed with scatter:

----------------------------------------------------------------------
msymbol(symbolstylelist)        shape of marker
mcolor(colorstylelist)          color of marker, inside and out
msize(markersizestylelist)      size of marker
mlabel(varlist)                 specify marker variables
mlabposition(clockposlist)      where to locate label
mlabvposition(varlist)          where to locate label 2
mlabgap(relativesizelist)       gap between marker and label
mlabsize(textsizestylelist)     size of label
mlabcolor(colorstylelist)       color of label
----------------------------------------------------------------------

You can specify up to two elements within each option. The first element refers to the display of the observations; the second element refers to the variables. Note that the default plot symbol for the position of the variables is invisible; that is, the default value for msymbol is msymbol(oh i). The lines for the variables are, however, changed with the line_options.
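
The two-element convention might be used as sketched below, assuming the auto dataset shipped with Stata. The color names are standard Stata colorstyles; the particular choices are only illustrative.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* First element: observations; second element: variables.
* msymbol(oh o) makes the end points of the variable arrows visible.
biplot mpg weight length turn, msymbol(oh o) mcolor(blue red)
```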

line_options are the following set of the options allowed with line:

----------------------------------------------------------------------
clpattern(linepatternstylelist)   whether line solid, dashed, etc.
clwidth(linewidthstylelist)       thickness of line
clcolor(colorstylelist)           color of line
----------------------------------------------------------------------

Note that the line_options refer only to the display of the variable vectors.
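
A short sketch of restyling the variable vectors, assuming the auto dataset shipped with Stata. dash, thick, and red are standard Stata style names; the choices are arbitrary.

```stata
* Assumption: the auto dataset shipped with Stata
sysuse auto, clear

* Draw the variable vectors as thick, dashed, red lines
biplot mpg weight length turn, clpattern(dash) clwidth(thick) clcolor(red)
```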

twoway_options are those allowed with graph twoway; see help twoway_options.

Examples

. biplot mpg weight length turn
. biplot mpg weight length turn, gh mlabel(make)
. biplot mpg weight length turn, gh mlabel(make) msymbol(oh o)

Also see

Online: help for twoway, graph, scatter

Author

Ulrich Kohler, WZB, kohler@wz-berlin.de

References

Gabriel, K.R. 1971. The biplot graphical display of matrices with application to principal component analysis. Biometrika 58, 453-467.

Gower, J.C. and Hand, D.J. 1996. Biplots. London: Chapman and Hall.

Gabriel, K.R. 2002. Goodness of fit of biplots and correspondence analysis. Biometrika 89, 423-436.