-------------------------------------------------------------------------------
help for parplot
-------------------------------------------------------------------------------

Parallel coordinates plots

parplot varlist [if exp] [in range] [ , by(byvar [, suboptions]) horizontal over(varname) transform(transform) variablelabels plot(plot) addplot(plot) graph_options ]

Description

parplot produces a parallel coordinates plot of varlist. Each variable is plotted on a separate vertical or horizontal scale and the values for each observation are shown by connected line segments. An observation will be ignored if it has missing values for any variable in varlist (and by default, if by() is specified, for byvar).
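
The casewise exclusion described above can be checked before plotting. The following is a minimal sketch using the auto data shipped with Stata, in which rep78 has some missing values. The variable name nmiss is hypothetical, and whether parplot marks its sample in exactly this way internally is an assumption; the commands only count the observations that are complete on a given varlist.

. sysuse auto, clear
. * nmiss holds the number of missing values across the listed variables
. egen nmiss = rowmiss(price rep78 weight)
. * observations with nmiss == 0 are complete on all three variables
. count if nmiss == 0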

Such plots have a long history under various guises. Wegman (1990) gave a definitive account for a statistical readership. Cooke and van Noortwijk (2000) discuss their use, under the name cobweb plots, in sensitivity analysis. Robbins (2005) gives examples in an introductory text. Andrienko and Andrienko (2005) give further examples and extensions.

Options

by() specifies that a separate plot should be drawn for each value of byvar. See help on by_option and note, among other possibilities, the suboptions total and missing.

horizontal draws variable scales horizontally. The default is vertical.

over() specifies a variable to be used to identify different categories. Different pens will be used for different categories, and it is possible to specify different marker symbols, line patterns, and so forth.

transform() specifies a transformation to be applied to each variable before plotting. The choices are maxmin, centered (or centred), standardized (or standardised) and raw. Each may be abbreviated to as little as one letter: m, c, s or r.

maxmin specifies transforming to (value - minimum) / (maximum - minimum), which is the default. Values shown thus vary from 0 to 1.

centered or centred specifies transforming to (value - median) / max(maximum - median, median - minimum). Each median thus is shown at 0 and values shown vary from (possibly) -1 to (possibly) 1. Note that transformed values for any given variable will attain both -1 and 1 if and only if maximum - median = median - minimum. This transform was used by Gleason (1996).

standardized or standardised specifies transforming to (value - mean) / SD. Each mean thus is shown at 0.

raw specifies no transform, i.e. data are shown as supplied. This may be a good choice for variables expressed in the same units.

variablelabels specifies that multiple variables be labelled by their variable labels. The default is to use variable names.

plot(plot) provides a way to add other plots to the generated graph; see help on plot_option. (Stata 8 only.)

addplot(plot) provides a way to add other plots to the generated graph; see help on addplot_option. (Stata 9 up.)

graph_options are options of twoway connected.
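
The formulas given under transform() can be reproduced by hand for a single variable, which may help in reading the plotted scales. The sketch below uses mpg from the auto data. The new variable names are hypothetical, and it assumes that the median is the 50th percentile and the SD is the sample standard deviation, both as returned by summarize; parplot's internal calculation may differ in detail.

. sysuse auto, clear
. * summarize, detail leaves r(min), r(max), r(p50), r(mean) and r(sd) behind
. summarize mpg, detail
. * maxmin: (value - minimum) / (maximum - minimum)
. gen mpg_m = (mpg - r(min)) / (r(max) - r(min))
. * centred: (value - median) / max(maximum - median, median - minimum)
. gen mpg_c = (mpg - r(p50)) / max(r(max) - r(p50), r(p50) - r(min))
. * standardised: (value - mean) / SD
. gen mpg_s = (mpg - r(mean)) / r(sd)
. summarize mpg_m mpg_c mpg_s

In the final summary, mpg_m should run from 0 to 1, and mpg_c should attain only one of -1 and 1 unless the median lies exactly midway between the extremes.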

Examples

. sysuse census, clear

. foreach v in death divorce marriage {
.     gen r_`v' = log10(`v' / pop)
. }

. foreach t in maxmin centred standardised raw {
.     parplot r_* , tr(`t') by(region, caption(logarithmic scales) title(US states 1980) t1(`t' scaling)) hor yla(1 "deaths" 2 "divorces" 3 "marriages", ang(h))
.     more
. }

. sysuse auto, clear
. gen gpm = 1 / mpg
. parplot gpm weight disp, xsc(r(0.8 3.2)) yla(, ang(h)) over(foreign) ms(oh dh) clp(_ ".#")
. parplot gpm weight disp, hor ysc(r(0.8 3.2)) yla(, ang(h)) over(foreign) ms(oh dh) clp(_ ".#")
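
A further sketch, not part of the original examples, shows two options described earlier but not otherwise illustrated: the total suboption of by(), which adds a panel for all observations, and variablelabels, which labels the scales by variable labels rather than names. The variable label given to gpm is hypothetical, and the data are reloaded so that the commands stand alone.

. sysuse auto, clear
. gen gpm = 1 / mpg
. * give the new variable a label so that variablelabels has something to show
. label var gpm "Gallons per mile"
. parplot gpm weight displacement, by(foreign, total) variablelabels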

Acknowledgments

The parcoord program written by John R. Gleason for Stata 4 (Gleason 1996) was a most valuable start for this program.

Vince Wiggins made very helpful comments. Ian S. Evans supplied the Andrienko reference. Scott Merryman found a bug. Garry Anderson provoked an update of the help.

Author

Nicholas J. Cox, Durham University, U.K. n.j.cox@durham.ac.uk

References

Andrienko, G. and N. Andrienko. 2005. Blending aggregation and selection: adapting parallel coordinates for the visualization of large datasets. Cartographic Journal 42: 49-60.

Cooke, R.M. and J.M. van Noortwijk. 2000. Graphical methods. In Saltelli, A., K. Chan and E.M. Scott (eds) Sensitivity analysis. Chichester: John Wiley, 245-264.

Gleason, J.R. 1996. Graphing high-dimensional data using parallel coordinates. Stata Technical Bulletin 29: 10-14 (STB Reprints 5: 53-60).

Robbins, N.B. 2005. Creating More Effective Graphs. Hoboken, NJ: Wiley.

Wegman, E.J. 1990. Hyperdimensional data analysis using parallel coordinates. Journal, American Statistical Association 85: 664-675.