Deviation plots
devnplot yvar [x1var [x2var]] [if exp] [in range]
[, overall level(exp) sort(varlist) missing
separate(true_or_false_condition) separateopts(scatter_options)
plines[(added_line_options)] superplines[(added_line_options)] pgap(#) superpgap(#)
lineopts(line_options) rspikeopts(rspike_options) clean
scatter_options ]
Description
devnplot by default plots the values of numeric variable yvar as deviations from the mean in increasing order. That is, each deviation is represented as a vertical spike with base given by the mean and with a marker symbol showing the value relative to a vertical scale.
If x1var is also specified, with or without x2var, observations are grouped by values of x1var (and x2var when specified). Deviations are then plotted from the mean of yvar within each distinct group so defined, unless the option overall is also specified. Each such distinct group defines a "panel". If both x1var and x2var are specified, the distinct values of x1var also define "superpanels".
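The grouping rule can be made concrete with built-in commands. What follows is a minimal sketch, not devnplot's own code. It assumes the auto dataset shipped with Stata, and the variable names groupmean and deviation are hypothetical. It computes the quantities that the plot displays when x1var is foreign: the mean of mpg within each panel and the deviation of each value from that mean.

```stata
sysuse auto, clear
* mean of yvar within each group of x1var (one group per panel)
egen groupmean = mean(mpg), by(foreign)
* signed deviation of each value from its own group mean (hypothetical name)
generate deviation = mpg - groupmean
* order values within each panel as the default plot does
sort foreign mpg
list foreign mpg groupmean deviation in 1/5
```

Replacing by(foreign) with no by() option would give the single reference level that corresponds to the overall option.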
Further variations from the basic design may be obtained by particular option choices.
Remarks
The immediate stimulus for this program was provided by Whitlock and Schluter (2009, pp.396, 519). Further similar examples are given by Grafen and Hails (2002, pp.4-7). Among antecedents, note various graphs in Pearson (1956) and the graph of Fisher (1925, Figure 3 and p.35) combining a quantile plot of rainfall and a plot of wheat yield versus the rank order of the corresponding rainfall. Not every graph needs a distinct name, but every Stata command does. "Deviation plot" is the author's suggestion.
x1var and x2var may be numeric or string. In either case, missing values are ignored unless the missing option is specified. In either case, variables are treated as categorical.
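As an illustration of how missing values change the set of categories, the sketch below uses egen's group() function with and without its missing option. This is an analogy using built-in commands, not a description of devnplot's internals. The auto dataset is assumed, and panel and panel_m are hypothetical variable names.

```stata
sysuse auto, clear
* rep78 is missing for some observations in this dataset
count if missing(rep78)
* analogue of the default: observations with missing rep78 get no group
egen panel = group(rep78)
* analogue of the missing option: missing forms its own category
egen panel_m = group(rep78), missing
tabulate panel_m, missing
```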
Note that the values of yvar are plotted as separate variables if any other variable is specified. This allows the use of (e.g.) different marker symbols and colours if so desired. The default is to use the same marker symbol and colour, and where specified the same line colour, but those choices can be overridden. If you wish to show a particular group distinctively, that may be easiest to achieve using the Graph Editor.
Some may find this a helpful plot for thinking about one-way or two-way analysis of variance.
This plot is intended to work well with very different group numbers.
devnplot is not designed to show scatter plots with regression lines for two measured variables with data points represented as deviations. For that problem, try code such as
. regress y x
. predict predict
. scatter y x || rspike y predict x || line predict x, sort ytitle("`: var label y'") legend(off)
Options
What is to be plotted
overall specifies that deviations are shown from the overall mean, regardless of any specification of x1var or x2var.
level(exp) allows the use of any expression to define reference levels, rather than means. Commonly, but not necessarily, the expression will be either a numeric constant or a variable name. It need not be constant in value, even within groups of x1var and/or x2var.
sort(varlist) specifies that values are to be sorted on varlist rather than yvar. Usually, but not necessarily, varlist is a single varname. As a special case, _n may be specified to insist on respecting current sort order. This option does not override any sorting on x1var (and x2var when specified).
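The ordering described for sort() can be pictured with built-in commands. The sketch below is an assumption-laden analogue rather than devnplot's own code: it uses the auto dataset, treats foreign as x1var and weight as the sort() variable, and the name order is hypothetical. Observations are arranged by panel first and by the sort variable within each panel.

```stata
sysuse auto, clear
* panels (x1var) take precedence; the sort() variable orders values within each panel
sort foreign weight
* position of each observation within its panel (hypothetical name)
by foreign: generate order = _n
list foreign weight price order in 1/5
```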
missing specifies that missing values of x1var and x2var are to be included as distinct categories. The default is to omit such values.
separate(true_or_false_condition) specifies that observations satisfying a true_or_false_condition should be shown differently.
separateopts(scatter_options) are used in conjunction with separate(), described above, to indicate how such observations should be shown.
Panels and superpanels
plines is a convenience option specifying that lines should be drawn between panels using xline(). plines may also be specified with added_line_options. The default is lc(gs8).
superplines is a convenience option specifying that lines should be drawn between superpanels using xline(). superplines may also be specified with added_line_options. The default is lc(gs4) lw(*1.2).
pgap(#) tunes the space between panels. The default is 2.
superpgap(#) tunes the space between superpanels. The default is 4.
Other graph options
lineopts(line_options) are options of twoway line, which may be used to tune the appearance of the horizontal line segments representing the mean(s).
rspikeopts(rspike_options) are options of twoway rspike, which may be used to tune the appearance of the vertical line segments representing deviations.
clean is a convenient shorthand for lineopts(lc(none ..)) rspikeopts(lc(none)) and removes the scaffolding emphasising that the values are plotted as deviations.
scatter_options are options of scatter and may be used to tune the appearance of markers or the graph in general.
Examples
. set scheme s1color
. sysuse auto, clear
. devnplot mpg
. devnplot mpg foreign
. devnplot mpg rep78
. devnplot mpg rep78, pgap(5)
. devnplot mpg rep78, overall
. devnplot mpg rep78, overall pgap(3)
. devnplot mpg rep78, overall plines
. devnplot mpg rep78, overall plines pgap(3)
. devnplot price foreign
. devnplot price foreign, sort(weight)
. devnplot price rep78, clean
. devnplot price rep78, clean plines
. devnplot mpg rep78, clean plines recast(connected)
. devnplot mpg foreign, pgap(3) plines(lstyle(major_grid) lc(bg) lw(*8)) plotregion(color(gs15))
. devnplot mpg foreign rep78
. devnplot mpg foreign rep78, superplines(lstyle(yxline)) plines
. egen median = median(mpg), by(foreign)
. devnplot mpg foreign rep78, superplines(lstyle(yxline)) level(median)
. webuse systolic, clear
. version 9: anova systolic drug disease drug*disease
. predict predict
. predict residual, residual
. devnplot systolic drug disease, level(predict) superplines
. devnplot residual drug disease, level(0) superplines
. webuse grunfeld, clear
. devnplot invest company, sort(time) clean ysc(log) yla(1000 300 100 30 10 3 1) recast(line) subtitle(Grunfeld data)
. u smoking_oecd, clear
. devnplot percent gender period, xla(, labsize(*.8) axis(2)) recast(line) xti("", axis(2)) xti("", axis(1)) yla(, ang(h)) superplines(lc(gs14))
. egen nation = group(country)
. devnplot percent gender period, xla(, labsize(*.8) axis(2)) recast(line) xti("", axis(2)) xti("", axis(1)) yla(, ang(h)) superplines(lc(gs14)) separate(nation == 24) separateopts(mcolor(blue ..) msize(*1.2 ..)) note("USA highlighted")
Author
Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk
Acknowledgments
Vince Wiggins and David Airey gave helpful and encouraging suggestions.
References
Fisher, R.A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Grafen, A. and Hails, R. 2002. Modern Statistics for the Life Sciences. Oxford: Oxford University Press.
Pearson, E.S. 1956. Some aspects of the geometry of statistics: the use of visual presentation in understanding the theory and application of mathematical statistics. Journal of the Royal Statistical Society A 119: 125-146.
Whitlock, M.C. and Schluter, D. 2009. The Analysis of Biological Data. Greenwood Village, CO: Roberts and Company.
Also see
qplot (if installed); distplot (if installed); stripplot (if installed); dotplot