```-------------------------------------------------------------------------------
help for biplot                                                       manual:
dialog:
-------------------------------------------------------------------------------

Draw JK-, SQ-, and GH-biplots

biplot varlist [weight] [if exp] [in range] [,
[jk|sq|gh|mixed(jk|sq|gh jk|sq|gh)] dimensions(# #)
[obsonly|varonly] covariance rv mahalanobis subpop(varname)
flip(x|y|xy) stretch(#) jitter(relativesize)
generate(varname1 varname2) scatter_options line_options
twoway_options ]

aweights and fweights are allowed; see help weights.  However, no
weights are allowed with option rv, and aweights are not allowed with
options sq or gh.

Description

biplot draws biplots of the data matrix defined by varlist. By default, a
JK-biplot with standardized values will be drawn. Biplots are useful for
visual inspection of data matrices, allowing the eye to identify
patterns, regularities and outliers. In a biplot, variables (columns) are
shown as arrows from the origin and observations (rows) are shown as
points.

The configuration of arrows reflects the relations of the variables. The
cosine of the angle between the arrows reflects the correlation between
the variables they represent. If the variables are not standardized, the
length of each arrow reflects the standard deviation of the variable it
represents.

The scatter of observations shows relations among observations. The
distance between two points approximates the Euclidean distance between
two observations of the data matrix. The foot of the perpendicular
dropped from a point onto an arrow approximates the observation's value
on the variable the arrow represents.

Options

jk|sq|gh specifies the type of biplot. jk specifies the default, a
JK-biplot. The JK-biplot approximates the Euclidean distances between
observations more closely than the other types. gh specifies a
GH-biplot.  The GH-biplot represents the relations between variables
more closely than the other types. sq specifies an SQ-biplot
(symmetric biplot).

mixed() can be used instead of a single biplot type to combine the
relative advantages of the different biplot types. Inside the
parentheses, first state a biplot type for the observations and then
a type for the variables. The plot positions of the observations and
the variables are then calculated according to the respective type.
Gabriel (2002), for example, proposes a "correspondence analysis"
that uses a JK-biplot for the observations and a GH-biplot for the
variables. This can be achieved with mixed(jk gh).

Note: In Intercooled Stata, "matsize too small" is a likely error
message with type gh or sq, even with small sample sizes. matsize must
be at least the number of observations plus 1. With Intercooled Stata,
SQ- and GH-biplots are recommended only for data with few
observations and are possible only up to 799 observations.

dimensions(# #) specifies which dimensions are plotted on the graph
axes. The default is to use the coordinates that correspond to the
two largest eigenvalues. For JK-biplots, these are the first two
principal components.  dimensions() lets you use arbitrary axes. A
JK-biplot with dim(3 4), for example, would plot all values in the
space of the 3rd and 4th principal components.

obsonly|varonly suppress the plotting of the variables (obsonly) or of
the observations (varonly). A JK-biplot with obsonly is a component
score plot, and a JK-biplot with options varonly and stretch(1) is a
plot of the PCA coefficients.

covariance uses original instead of standardized values.

rv produces relative variation diagrams. Relative variation diagrams
are biplots for compositional data, that is, data sets with constant
row sums and only positive values (for example, the row percentages
of two-way frequency tables). To get a relative variation diagram,
the data matrix must be transformed before the biplot is produced;
the option rv does this transformation for you.

mahalanobis can be used with GH-biplots to rescale the graph so that
the distances between the observations approximate the Mahalanobis
distances.

generate(varname1 varname2) stores the coordinates of the
observations and of the variables as variables in the dataset. The
y-axis coordinates of the observations are stored in varname1_y and
the x-axis coordinates of the observations in varname1_x.
Accordingly, the coordinates of the variables are stored in
varname2_y and varname2_x.

subpop(varname) highlights observations from different
subpopulations with different plot symbols. Note that by default a
legend is drawn to identify the subpopulations.  The legend, however,
changes the aspect ratio of the biplot. If you don't like this, you
can turn the legend off or adjust the aspect ratio with xsize().
Another way to highlight subpopulations is the option mlabel(), which
is described below.

flip(x|y|xy) reverses the signs of the axes. flip(x) and flip(y)
reverse the signs of the indicated axis; flip(xy) flips both axes.
flip() is seldom used but might be useful if you want to compare
your results with the results of other software packages.

stretch(#) draws longer (or, if needed, shorter) lines for the
variables. By default, stretch() is set to a value that improves
readability. You can set the value to any positive real number. With
stretch(1) you get the original length, and with stretch(2) the lines
are drawn twice as long as the original values.  stretch() is seldom
used.

jitter(relativesize) adds spherical random noise to the plot symbols of
observations. This is useful when plotting data which otherwise would
result in points plotted on top of each other. Commonly specified are
jitter(5) or jitter(6); jitter(0) is the default.  See help
relativesize for a description of relative sizes.

scatter_options are the following subset of the options allowed with
scatter:

----------------------------------------------------------------------
msymbol(symbolstylelist)          shape of marker
mcolor(colorstylelist)            color of marker, inside and out
msize(markersizestylelist)        size of marker
mlabel(varlist)                   specify marker variables
mlabposition(clockposlist)        where to locate label
mlabvposition(varlist)            where to locate label 2
mlabgap(relativesizelist)         gap between marker and label
mlabsize(textsizestylelist)       size of label
mlabcolor(colorstylelist)         color of label
----------------------------------------------------------------------

You can specify up to two elements within each option. The first
element refers to the display of the observations, the second element
to the display of the variables. Note that the default plot symbol
for the position of the variables is invisible; that is, the default
value for msymbol is msymbol(oh i). The lines for the variables,
however, are changed with the line_options.

line_options are the following subset of the options allowed with line:

----------------------------------------------------------------------
clpattern(linepatternstylelist)   whether line solid, dashed, etc.
clwidth(linewidthstylelist)       thickness of line
clcolor(colorstylelist)           color of line
----------------------------------------------------------------------

Note that the line_options only refer to the display of the variable
vectors.

twoway_options are those allowed with graph twoway; see help
twoway_options.

Examples

. biplot mpg weight length turn
. biplot mpg weight length turn, gh mlabel(make)
. biplot mpg weight length turn, gh mlabel(make) msymbol(oh o)

Also see

Online:  help for twoway, graph, scatter

Author

Ulrich Kohler, WZB, kohler@wz-berlin.de

References

Gabriel, K.R. 1971. The biplot graphical display of matrices with
application to principal component analysis. Biometrika 58, 453-467.

Gower, J.C. and Hand, D.J. 1996. Biplots. London: Chapman and Hall.

Gabriel, K.R. 2002. Goodness of fit of biplots and correspondence
analysis. Biometrika 89, 423-436.

```
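
The Description states that the cosine of the angle between two arrows reflects the correlation between the variables they represent. A quick numerical check of that statement for the simplest case might look like the Python sketch below. It is not part of biplot and does not reproduce its code. It assumes a GH-type scaling (arrows are the eigenvectors of the correlation matrix multiplied by the square roots of the eigenvalues), handles only two standardized variables so that the eigendecomposition has a closed form, and uses made-up data. The function names are hypothetical, and the exact scaling biplot applies internally is not stated in this help text.

```python
import math


def gh_arrows_two_vars(x, y):
    """Return (r, arrow_x, arrow_y) for two standardized variables.

    Assumption: GH-type arrows are H = V * sqrt(Lambda), where V and Lambda
    are the eigenvectors and eigenvalues of the 2x2 correlation matrix.
    Both input variables must be non-constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    zx = [(v - mean_x) / sd_x for v in x]
    zy = [(v - mean_y) / sd_y for v in y]
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

    # The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r
    # with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
    # max(0, ...) guards against tiny negative values from rounding.
    root1 = math.sqrt(max(0.0, 1 + r))
    root2 = math.sqrt(max(0.0, 1 - r))
    s = 1 / math.sqrt(2)
    arrow_x = (s * root1, s * root2)
    arrow_y = (s * root1, -s * root2)
    return r, arrow_x, arrow_y


def cosine(a, b):
    """Cosine of the angle between two 2-D vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))


# Made-up illustrative data, not taken from any real dataset.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
r, arrow_x, arrow_y = gh_arrows_two_vars(x, y)
print("correlation:        ", r)
print("cosine of the angle:", cosine(arrow_x, arrow_y))
```

With only two variables the two-dimensional display is exact, so the cosine matches the correlation up to rounding. With more variables the plot is a rank-2 approximation, and the match is only approximate.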