{smcl}
{* 18feb2004}{...}
{hline}
help for {hi:biplot}{right:manual: }
{right:dialog: }
{hline}
{title:Draw JK-, SQ-, and GH-biplots}
{p 8 18 2}{cmd:biplot} {it:varlist} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}]
[{cmd:,}
[{cmd:jk}|{cmd:sq}|{cmd:gh}|{cmd:mixed(}{it:jk}|{it:sq}|{it:gh} {it:jk}|{it:sq}|{it:gh}{cmd:)}]
{cmdab:dim:ensions(}{it:# #}{cmd:)}
[{cmd:obsonly|varonly}]
{cmdab:cov:ariance}
{cmd:rv}
{cmdab:mahal:anobis}
{cmdab:sub:pop(}{it:varname}{cmd:)}
{cmdab:f:lip(}{it:x}|{it:y}|{it:xy}{cmd:)}
{cmdab:st:retch(}{it:#}{cmd:)}
{cmd:jitter(}{it:relativesize}{cmd:)}
{cmdab:gen:erate(}{it:varname1 varname2}{cmd:)}
{it:scatter_options}
{it:line_options}
{it:twoway_options} ]
{p 4 4 2}
{cmd:aweight}s and {cmd:fweight}s are allowed; see help
{help weights}. However, no weights are allowed with option {cmd:rv},
and {cmd:aweight}s are not allowed with option {cmd:sq} or
option {cmd:gh}.
{title:Description}
{p 4 4 2} {cmd:biplot} draws biplots of the data matrix defined by
{it:varlist}. By default, a JK-biplot of the standardized values is
drawn. Biplots are useful for visual inspection of data matrices,
allowing the eye to identify patterns, regularities, and outliers. In a
biplot, variables (columns) are shown as arrows from the origin and
observations (rows) are shown as points.
{p 4 4 2} The configuration of arrows reflects the relations among the
variables. The cosine of the angle between two arrows approximates the
correlation between the variables they represent. If the variables are
not standardized, the length of each arrow reflects the standard
deviation of the variable it represents.
{p 4 4 2} The scatter of observations shows relations among
observations. The distance between two points approximates the
Euclidean distance between the two corresponding observations in the
data matrix. Dropping a perpendicular from a point onto an arrow
gives a foot point whose position along the arrow approximates the
value of that observation on the variable the arrow represents.
{title:Options}
{p 4 8 2} {cmd:jk}|{cmd:sq}|{cmd:gh} specifies the type of
biplot. {cmd:jk} specifies the default, a JK-biplot. The JK-biplot
approximates the Euclidean distances between observations more closely
than the other types. {cmd:gh} specifies a GH-biplot. The GH-biplot
represents the relations between variables more closely than the other
types. {cmd:sq} specifies an SQ-biplot ({it:symmetric biplot}).
{p 4 8 2} {cmd:mixed()} can be used instead of the biplot types to
combine the relative advantages of the different biplot types. Inside
the parentheses, first state a biplot type for the observations
and then a type for the variables. The plot positions of observations
and variables are then calculated accordingly. Gabriel (2002), for
example, proposes a "correspondence analysis" that uses a JK-biplot
for the observations and a GH-biplot for the variables. This can be
achieved with {cmd:mixed(jk gh)}.
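{p 8 8 2} As a minimal sketch, assuming the auto data shipped with
Stata have been loaded with {cmd:sysuse auto}, the combination
described above could be requested as follows:

{p 8 12 2}{cmd:. sysuse auto}{p_end}
{p 8 12 2}{cmd:. biplot mpg weight length turn, mixed(jk gh)}{p_end}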
{p 8 8 2} {it:Note:} In Intercooled Stata, "matsize too small" is a
likely error message with type {cmd:gh} or {cmd:sq}, even with small
sample sizes. {help matsize} has to be at least the number of
observations plus 1. With Intercooled Stata, SQ- and GH-biplots are
only recommended for data with few observations and are only possible
with up to 799 observations.
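{p 8 8 2} For illustration, and assuming the auto data (74
observations) are in memory, matsize needs to be at least 75 before a
GH-biplot is requested. The value 100 below is an arbitrary choice
that satisfies this:

{p 8 12 2}{cmd:. set matsize 100}{p_end}
{p 8 12 2}{cmd:. biplot mpg weight length turn, gh}{p_end}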
{p 4 8 2} {cmdab:dim:ensions(}{it:# #}{cmd:)} specifies which
dimensions are plotted on the graph axes. The default is to use the
coordinates that correspond to the two largest eigenvalues. For
JK-biplots these are the first two principal components.
{cmd:dimensions()} lets you choose arbitrary axes. A JK-biplot with
{cmd:dim(3 4)}, for example, would plot all values in the space of the
3rd and 4th principal components.
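{p 8 8 2} A possible call, assuming the auto data are in memory. With
four variables in {it:varlist} there are four principal components, so
the 3rd and 4th are available:

{p 8 12 2}{cmd:. biplot mpg weight length turn, dim(3 4)}{p_end}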
{p 4 8 2} {cmd:obsonly|varonly} are used to suppress the plotting of
either the variables or the observations. A JK-biplot with
{cmd:obsonly} is a component score plot, and a JK-biplot with options
{cmd:varonly} and {cmd:stretch(1)} is a plot of the PCA coefficients.
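{p 8 8 2} The two special cases just mentioned might be obtained like
this, assuming the auto data are in memory:

{p 8 12 2}{cmd:. biplot mpg weight length turn, obsonly}{p_end}
{p 8 12 2}{cmd:. biplot mpg weight length turn, varonly stretch(1)}{p_end}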
{p 4 8 2} {cmd:covariance} uses original instead of standardized
values.
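{p 8 8 2} For example, assuming the auto data are in memory, the
following sketch draws the default JK-biplot from the unstandardized
values, so that arrow lengths reflect the standard deviations of the
variables:

{p 8 12 2}{cmd:. biplot mpg weight length turn, covariance}{p_end}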
{p 4 8 2} {cmd:rv} is used to produce relative variation
diagrams. Relative variation diagrams are biplots for compositional
data, that is, data sets with constant row sums and only positive
values (for example, the row percentages of two-way frequency
tables). To get a relative variation diagram, the data matrix needs
to be transformed before producing the biplot; the option {cmd:rv}
does this transformation for you.
{p 4 8 2} {cmd:mahalanobis} can be used for GH-biplots to rescale the
graph so that the distances between the observations approximate the
Mahalanobis distances.
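{p 8 8 2} A possible call, assuming the auto data are in memory and
matsize is large enough for a GH-biplot:

{p 8 12 2}{cmd:. biplot mpg weight length turn, gh mahalanobis}{p_end}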
{p 4 8 2} {cmd:generate(}{it:varname1 varname2}{cmd:)} stores the
coordinates of the observations and of the variables as new variables
in the dataset. The x-axis coordinates of the observations are stored
in {it:varname1}_x and the y-axis coordinates in {it:varname1}_y.
Accordingly, the coordinates of the variables are stored in
{it:varname2}_x and {it:varname2}_y.
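{p 8 8 2} As a sketch, assuming the auto data are in memory; the
stubs {cmd:obs} and {cmd:vec} are arbitrary names chosen for this
illustration. Following the naming rule above, the observation
coordinates should then be found in obs_x and obs_y:

{p 8 12 2}{cmd:. biplot mpg weight length turn, generate(obs vec)}{p_end}
{p 8 12 2}{cmd:. list make obs_x obs_y in 1/5}{p_end}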
{p 4 8 2} {cmd:subpop(}{it:varname}{cmd:)} is used to highlight
observations from different subpopulations with different plot
symbols. Note that by default a legend is drawn to identify the
subpopulations. The legend, however, changes the aspect ratio of the
biplot. If you don't like this, you can turn the legend off or you can
adjust the aspect ratio with {cmd:xsize()}. Another way to highlight
subpopulations is the option {cmd:mlabel()}, which is described
below.
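{p 8 8 2} For instance, assuming the auto data are in memory, domestic
and foreign cars could be marked with different symbols. The second
call turns the legend off with the twoway option {cmd:legend(off)} to
avoid the change in aspect ratio:

{p 8 12 2}{cmd:. biplot mpg weight length turn, subpop(foreign)}{p_end}
{p 8 12 2}{cmd:. biplot mpg weight length turn, subpop(foreign) legend(off)}{p_end}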
{p 4 8 2} {cmd:flip(x|y|xy)} reverses the signs of the
axes. {cmd:flip(x)} and {cmd:flip(y)} reverse the sign of the indicated
axis. {cmd:flip(xy)} flips both axes. {cmd:flip()} is seldom used, but
might be useful if you want to compare your results with the results
of other software packages.
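{p 8 8 2} A possible call that mirrors the plot along the x axis,
assuming the auto data are in memory:

{p 8 12 2}{cmd:. biplot mpg weight length turn, flip(x)}{p_end}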
{p 4 8 2} {cmd:stretch(#)} draws longer (or, if needed, shorter) lines
for the variables. By default {cmd:stretch()} is set to a value that
improves readability. You can set the value to any positive real
number. With {cmd:stretch(1)} you will get the original length, and
with {cmd:stretch(2)} the lines will be drawn twice as long as the
original values. {cmd:stretch()} is seldom used.
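{p 8 8 2} For example, assuming the auto data are in memory, the
variable lines could be drawn at twice their original length:

{p 8 12 2}{cmd:. biplot mpg weight length turn, stretch(2)}{p_end}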
{p 4 8 2} {cmd:jitter(}{it:relativesize}{cmd:)} adds spherical random
noise to the plot symbols of observations. This is useful when
plotting data which otherwise would result in points plotted on top of
each other. Commonly specified are {cmd:jitter(5)} or {cmd:jitter(6)};
{cmd:jitter(0)} is the default. See help {it:{help relativesize}} for
a description of relative sizes.
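{p 8 8 2} A possible call using one of the commonly specified values,
assuming the auto data are in memory:

{p 8 12 2}{cmd:. biplot mpg weight length turn, jitter(5)}{p_end}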
{p 4 8 2} {cmd:scatter_options} are the following subset of the options
allowed with {help scatter}:
{col 8}{hline 70}
{col 8}{cmdab:m:symbol:(}{it:{help symbolstyle}list}{cmd:)}{col 42}shape of marker
{col 8}{cmdab:mc:olor:(}{it:{help colorstyle}list}{cmd:)}{col 42}color of marker, inside and out
{col 8}{cmdab:msiz:e:(}{it:{help markersizestyle}list}{cmd:)}{col 42}size of marker
{col 8}{cmd:mlabel(}{it:varlist}{cmd:)}{col 42}specify marker variables
{col 8}{cmdab:mlabp:osition:(}{it:{help clockpos}list}{cmd:)}{col 42}where to locate label
{col 8}{cmdab:mlabv:position:(}{it:varlist}{cmd:)}{col 42}where to locate label 2
{col 8}{cmdab:mlabg:ap:(}{it:{help relativesize}list}{cmd:)}{col 42}gap between marker and label
{col 8}{cmdab:mlabs:ize:(}{it:{help textsizestyle}list}{cmd:)}{col 42}size of label
{col 8}{cmdab:mlabc:olor:(}{it:{help colorstyle}list}{cmd:)}{col 42}color of label
{col 8}{hline 70}
{p 8 8 2} You can specify up to two elements within each option. The
first element refers to the display of the observations, the second
element refers to the variables. Note that the default plot symbol
for the position of the variables is invisible; that is, the default
value for msymbol is {cmd:msymbol(oh i)}. The lines for the variables
are, however, changed with the {it:line_options}.
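{p 8 8 2} As a sketch of the two-element form, assuming the auto data
are in memory: the first element of each option applies to the
observations and the second to the variables. The symbols and colors
used here ({cmd:oh}, {cmd:d}, {cmd:navy}, {cmd:maroon}) are arbitrary
choices for illustration:

{p 8 12 2}{cmd:. biplot mpg weight length turn, msymbol(oh d) mcolor(navy maroon)}{p_end}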
{p 4 8 2}
{cmd:line_options} are the following subset of the options allowed with {help line}:
{col 8}{hline 70}
{col 8}{cmdab:cl:pattern:(}{it:{help linepatternstyle}list}{cmd:)}{col 42}whether line solid, dashed, etc.
{col 8}{cmdab:clw:idth:(}{it:{help linewidthstyle}list}{cmd:)}{col 42}thickness of line
{col 8}{cmdab:clc:olor:(}{it:{help colorstyle}list}{cmd:)}{col 42}color of line
{col 8}{hline 70}
{p 8 8 2} Note that the line_options only refer to the display of the
variable vectors.
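{p 8 8 2} For example, assuming the auto data are in memory, the
variable vectors could be drawn as thick dashed lines. The styles
{cmd:dash} and {cmd:thick} are arbitrary choices for illustration:

{p 8 12 2}{cmd:. biplot mpg weight length turn, clpattern(dash) clwidth(thick)}{p_end}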
{p 4 8 2} {cmd:twoway_options} are those allowed with {cmd:graph twoway};
see help {help twoway_options}.
{title:Examples}
{p 4 8 2}{cmd:. biplot mpg weight length turn}{p_end}
{p 4 8 2}{cmd:. biplot mpg weight length turn, gh mlabel(make)}{p_end}
{p 4 8 2}{cmd:. biplot mpg weight length turn, gh mlabel(make) msymbol(oh o)}{p_end}
{title:Also see}
{p 4 13 2}
Online: help for {help twoway}, {help graph}, {help scatter}
{title:Author}
{p 4 13 2}
Ulrich Kohler, WZB, kohler@wz-berlin.de
{title:References}
{p 4 4 2}
Gabriel, K.R. 1971. The biplot graphical display of matrices with
application to principal component analysis. Biometrika 58, 453-467.
{p 4 4 2}
Gower, J.C. and Hand, D.J. 1996. Biplots. London: Chapman and Hall.
{p 4 4 2}
Gabriel, K.R. 2002. Goodness of fit of biplots and correspondence
analysis. Biometrika 89, 423-436.