-------------------------------------------------------------------------------
help for devnplot
-------------------------------------------------------------------------------

Deviation plots

devnplot yvar [x1var [x2var]] [if exp] [in range] [, overall level(exp) sort(varlist) missing
    separate(true_or_false_condition) separateopts(scatter_options)
    plines[(added_line_options)] superplines[(added_line_options)] pgap(#) superpgap(#)
    lineopts(line_options) rspikeopts(rspike_options) clean scatter_options ]

Description

devnplot by default plots the values of numeric variable yvar as deviations from the mean in increasing order. That is, each deviation is represented as a vertical spike with base given by the mean and with a marker symbol showing the value relative to a vertical scale.

If one or both of x1var and x2var is also specified, observations are grouped by values of x1var (and x2var when specified). Deviations are plotted from the means of yvar for each distinct group so defined, unless the option overall is also specified. Such distinct groups are considered to define distinct "panels". If both x1var and x2var are specified, distinct groups defined by values of x1var are also considered to define distinct "superpanels".

Further variations from the basic design may be obtained by particular option choices.
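The quantities described above can be reproduced by hand, which may help in reading the plot. The following is a minimal sketch, not devnplot's own code; the variable names groupmean, deviation and deviation_overall are hypothetical and chosen only for illustration.

```stata
sysuse auto, clear

* group means of mpg within categories of rep78, skipping missing rep78
egen double groupmean = mean(mpg) if !missing(rep78), by(rep78)

* deviation of each value from its own group mean (the default base)
generate double deviation = mpg - groupmean

* with the overall option the base would instead be the overall mean
summarize mpg, meanonly
generate double deviation_overall = mpg - r(mean)
```

Each spike in the default plot then runs from groupmean to mpg, so its signed length is deviation.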

Remarks

The immediate stimulus for this program was provided by Whitlock and Schluter (2009, pp.396, 519). Further similar examples are given by Grafen and Hails (2002, pp.4-7). Among antecedents, note various graphs in Pearson (1956) and the graph of Fisher (1925, Figure 3 and p.35) combining a quantile plot of rainfall and a plot of wheat yield versus the rank order of the corresponding rainfall. Not every graph needs a distinct name, but every Stata command does. "Deviation plot" is the author's suggestion.

x1var and x2var may be numeric or string. In either case, missing values are ignored unless the missing option is specified. In either case, variables are treated as categorical.

Note that the values of yvar are plotted as separate variables if any other variable is specified. This allows the use of (e.g.) different marker symbols and colours if so desired. The default is to use the same marker symbol and colour, and where specified the same line colour, but those choices can be overridden. If you wish to show a particular group distinctively, that may be easiest to achieve using the Graph Editor.

Some may find this a helpful plot for thinking about one-way or two-way analysis of variance.

This plot is intended to work well with very different group numbers.

devnplot is not designed to show scatter plots with regression lines for two measured variables with data points represented as deviations. For that problem, try code such as

. regress y x
. predict predict
. scatter y x || rspike y predict x || line predict x, sort ytitle("`:var label y'") legend(off)

Options

What is to be plotted

overall specifies that deviations are shown from the overall mean, regardless of any specification of x1var or x2var.

level(exp) allows the use of any expression to define reference levels, rather than means. Commonly, but not necessarily, the expression will be either a numeric constant or a variable name. It need not be constant in value, even within groups of x1var and/or x2var.
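As an illustration of the two common forms of level(), the sketch below uses first a numeric constant and then a variable. The constant 20 is an arbitrary choice made for this example, and grouping the medians by rep78 is likewise just one possibility.

```stata
sysuse auto, clear

* reference level as a numeric constant (20 is arbitrary)
devnplot mpg rep78, level(20)

* reference level as a variable: group medians instead of group means
egen median = median(mpg), by(rep78)
devnplot mpg rep78, level(median)
```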

sort(varlist) specifies that values are to be sorted on varlist rather than yvar. Usually, but not necessarily, varlist is a single varname. As a special case, _n may be specified to insist on respecting current sort order. This option does not override any sorting on x1var (and x2var when specified).

missing specifies that missing values of x1var and x2var are to be included as distinct categories. The default is to omit such values.

separate(true_or_false_condition) specifies that observations satisfying a true_or_false_condition should be shown differently.

separateopts(scatter_options) are used in conjunction with separate(), described above, to indicate how such observations should be shown.
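A small sketch of the two options used together, assuming the auto data: foreign cars are picked out within panels defined by rep78. The condition and the colour are arbitrary choices made for this illustration.

```stata
sysuse auto, clear

* show foreign cars with red markers; other observations keep the defaults
devnplot mpg rep78, separate(foreign == 1) separateopts(mcolor(red ..))
```

The ".." repeats the colour across the separate variables that are plotted when a grouping variable is specified.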

Panels and superpanels

plines is a convenience option specifying that lines should be drawn between panels using xline(). plines may also be specified with added_line_options. The default is lc(gs8).

superplines is a convenience option specifying that lines should be drawn between superpanels using xline(). superplines may also be specified with added_line_options. The default is lc(gs4) lw(*1.2).

pgap(#) tunes the space between panels. The default is 2.

superpgap(#) tunes the space between superpanels. The default is 4.

Other graph options

lineopts(line_options) are options of twoway line, which may be used to tune the appearance of the horizontal line segments representing the mean(s).

rspikeopts(rspike_options) are options of twoway rspike, which may be used to tune the appearance of the vertical line segments representing deviations.

clean is a convenient shorthand for lineopts(lc(none ..)) rspikeopts(lc(none)) and removes the scaffolding emphasising that the values are plotted as deviations.

scatter_options are options of scatter and may be used to tune the appearance of markers or the graph in general.

Examples

. set scheme s1color

. sysuse auto, clear

. devnplot mpg

. devnplot mpg foreign
. devnplot mpg rep78
. devnplot mpg rep78, pgap(5)
. devnplot mpg rep78, overall
. devnplot mpg rep78, overall pgap(3)
. devnplot mpg rep78, overall plines
. devnplot mpg rep78, overall plines pgap(3)
. devnplot price foreign
. devnplot price foreign, sort(weight)
. devnplot price rep78, clean
. devnplot price rep78, clean plines
. devnplot mpg rep78, clean plines recast(connected)
. devnplot mpg foreign, pgap(3) plines(lstyle(major_grid) lc(bg) lw(*8)) plotregion(color(gs15))

. devnplot mpg foreign rep78
. devnplot mpg foreign rep78, superplines(lstyle(yxline)) plines
. egen median = median(mpg), by(foreign)
. devnplot mpg foreign rep78, superplines(lstyle(yxline)) level(median)

. webuse systolic, clear

. version 9: anova systolic drug disease drug*disease
. predict predict
. predict residual, residual
. devnplot systolic drug disease, level(predict) superplines
. devnplot residual drug disease, level(0) superplines

. webuse grunfeld, clear

. devnplot invest company, sort(time) clean ysc(log) yla(1000 300 100 30 10 3 1) recast(line) subtitle(Grunfeld data)

. u smoking_oecd, clear

. devnplot percent gender period, xla(, labsize(*.8) axis(2)) recast(line) xti("", axis(2)) xti("", axis(1)) yla(, ang(h)) superplines(lc(gs14))
. egen nation = group(country)
. devnplot percent gender period, xla(, labsize(*.8) axis(2)) recast(line) xti("", axis(2)) xti("", axis(1)) yla(, ang(h)) superplines(lc(gs14)) separate(nation == 24) separateopts(mcolor(blue ..) msize(*1.2 ..)) note("USA highlighted")

Author

Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk

Acknowledgments

Vince Wiggins and David Airey gave helpful and encouraging suggestions.

References

Fisher, R.A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Grafen, A. and Hails, R. 2002. Modern Statistics for the Life Sciences. Oxford: Oxford University Press.

Pearson, E.S. 1956. Some aspects of the geometry of statistics: the use of visual presentation in understanding the theory and application of mathematical statistics. Journal of the Royal Statistical Society A 119: 125-146.

Whitlock, M.C. and Schluter, D. 2009. The Analysis of Biological Data. Greenwood Village, CO: Roberts and Company.

Also see

qplot (if installed); distplot (if installed); stripplot (if installed); dotplot