{smcl} {* 1february2011/10aug2012/3sep2013}{...} {hline} help for {hi:devnplot} {hline} {title:Deviation plots} {p 8 17 2} {cmd:devnplot} {it: yvar} [{it:x1var} [{it:x2var}]] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] {p 17 17 2} [{cmd:,} {cmd:overall} {cmd:level(}{it:exp}{cmd:)} {cmd:sort(}{it:varlist}{cmd:)} {cmdab:desc:ending} {cmdab:miss:ing} {p 17 17 2} {cmd:separate(}{it:true_or_false_condition}{cmd:)} {cmd:separateopts(}{it:scatter_options}{cmd:)} {p 17 17 2} {cmd:plines}[{cmd:(}{it:added_line_options}{cmd:)}] {cmd:superplines}[{cmd:(}{it:added_line_options}{cmd:)}] {cmd:pgap(}{it:#}{cmd:)} {cmd:superpgap(}{it:#}{cmd:)} {p 17 17 2} {cmd:lineopts(}{it:line_options}{cmd:)} {cmd:rspikeopts(}{it:rspike_options}{cmd:)} {cmd:clean} {p 17 17 2} {it:scatter_options} ] {title:Description} {p 4 4 2} {cmd:devnplot} by default plots the values of numeric variable {it:yvar} as deviations from the mean in increasing order. That is, each deviation is represented as a vertical spike with base given by the mean and with a marker symbol showing the value relative to a vertical scale. {p 4 4 2} If one or both of {it:x1var} and {it:x2var} is also specified, observations are grouped by values of {it:x1var} (and {it:x2var} when specified). Deviations are plotted from the means of {it:yvar} for each distinct group so defined, unless the option {cmd:overall} is also specified. Such distinct groups are considered to define distinct "panels". If both {it:x1var} and {it:x2var} are specified, distinct groups defined by values of {it:x1var} are also considered to define distinct "superpanels". {p 4 4 2} Further variations from the basic design may be obtained by particular option choices. {title:Remarks} {p 4 4 2} The immediate stimulus for this program was provided by Whitlock and Schluter (2009, pp.396, 519). Further similar examples are given by Grafen and Hails (2002, pp.4{c -}7). Among antecedents, note various graphs in Pearson (1956) and the graph of Fisher (1925, Figure 3 and p.35) combining a quantile plot of rainfall and a plot of wheat yield versus the rank order of the corresponding rainfall. Not every graph needs a distinct name, but every Stata command does. "Deviation plot" is the author's suggestion. {p 4 4 2} A related plot, which seems especially popular in clinical oncology, is the waterfall plot (or waterfall chart). Common examples show variations in change in tumour dimensions during clinical trials. {cmd:desc rspikeopts(recast(bar) base(0)) ms(none ..)} would be typical option choices. See (e.g.) Gilder (2012). Note, however, that waterfall plots or charts also refer to at least two quite different plots in business and in the analysis of spectra. {p 4 4 2} {it:x1var} and {it:x2var} may be numeric or string. In either case, missing values are ignored unless the {cmd:missing} option is specified. In either case, variables are treated as categorical. {p 4 4 2} Note that the values of {it:yvar} are plotted as separate variables if any other variable is specified. This allows the use of (e.g.) different marker symbols and colours if so desired. The default is to use the same marker symbol and colour, and where specified the same line colour, but those choices can be overridden. If you wish to show a particular group distinctively, that may be easiest to achieve using the Graph Editor. {p 4 4 2} Some may find this a helpful plot for thinking about one-way or two-way analysis of variance. {p 4 4 2} This plot is intended to work well with very different group numbers. {p 4 4 2} {cmd:devnplot} is not designed to show scatter plots with regression lines for two measured variables with data points represented as deviations. For that problem, try code such as {p 4 8 2}{cmd:. regress y x}{p_end} {p 4 8 2}{cmd:. predict predict}{p_end} {p 4 8 2}{cmd:. scatter y x || rspike y predict x || line predict x, sort ytitle("`: var label y'") legend(off)} {title:Options} {p 4 4 2}{it:What is to be plotted} {p 4 8 2} {cmd:overall} specifies that deviations are shown from the overall mean, regardless of any specification of {it:x1var} or {it:x2var}. {p 4 8 2} {cmd:level(}{it:exp}{cmd:)} allows the use of any expression to define reference levels, rather than means. Commonly, but not necessarily, the expression will be either a numeric constant or a variable name. It need not be constant in value, even within groups of {it:x1var} and/or {it:x2var}. {p 4 8 2} {cmd:sort(}{it:varlist}{cmd:)} specifies that values are to be sorted on {it:varlist} rather than {it:yvar}. Usually, but not necessarily, {it:varlist} is a single {it:varname}. As a special case, {cmd:_n} may be specified to insist on respecting current sort order. This option does not override any sorting on {it:x1var} (and {it:x2var} when specified). {p 4 8 2} {cmd:descending} specifies that, other instructions aside, values descend from left to right rather than ascend. {p 4 8 2} {cmd:missing} specifies that missing values of {it:x1var} and {it:x2var} are to be included as distinct categories. The default is to omit such values. {p 4 8 2} {cmd:separate(}{it:true_or_false_condition}{cmd:)} specifies that observations satisfying a {it:true_or_false_condition} should be shown differently. {p 4 8 2} {cmd:separateopts(}{it:scatter_options}{cmd:)} are used in conjunction with {cmd:separate()}, described above, to indicate how such observations should be shown. {p 4 4 2}{it:Panels and superpanels} {p 4 8 2} {cmd:plines} is a convenience option specifying that lines should be drawn between panels using {cmd:xline()}. {cmd:plines} may also be specified with {help added_line_options}. The default is {cmd:lc(gs8)}. {p 4 8 2} {cmd:superplines} is a convenience option specifying that lines should be drawn between superpanels using {cmd:xline()}. {cmd:superplines} may also be specified with {help added_line_options}. The default is {cmd:lc(gs4) lw(*1.2)}. {p 4 8 2} {cmd:pgap(}{it:#}{cmd:)} tunes the space between panels. The default is 2. {p 4 8 2} {cmd:superpgap(}{it:#}{cmd:)} tunes the space between superpanels. The default is 4. {p 4 4 2}{it:Other graph options} {p 4 8 2} {cmd:lineopts(}{it:line_options}{cmd:)} are options of {help twoway line}, which may be used to tune the appearance of the horizontal line segments representing the mean(s). {p 4 8 2} {cmd:rspikeopts(}{it:rspike_options}{cmd:)} are options of {help twoway rspike}, which may be used to tune the appearance of the vertical line segments representing deviations. {p 8 8 2} {cmd:clean} is a convenient shorthand for {cmd:lineopts(lc(none ..)) rspikeopts(lc(none))} and removes the scaffolding emphasising that the values are plotted as deviations. {p 4 8 2} {it:scatter_options} are options of {help scatter} and may be used to tune the appearance of markers or the graph in general. {title:Examples} {p 4 8 2}{cmd:. set scheme s1color} {p 4 8 2}{cmd:. sysuse auto, clear} {p 4 8 2}{cmd:. devnplot mpg} {p 4 8 2}{cmd:. devnplot mpg foreign}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78, pgap(5)}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78, overall}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78, overall pgap(3)}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78, overall plines}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78, overall plines pgap(3)}{p_end} {p 4 8 2}{cmd:. devnplot price foreign}{p_end} {p 4 8 2}{cmd:. devnplot price foreign, sort(weight)}{p_end} {p 4 8 2}{cmd:. devnplot price rep78, clean}{p_end} {p 4 8 2}{cmd:. devnplot price rep78, clean plines}{p_end} {p 4 8 2}{cmd:. devnplot mpg rep78, clean plines recast(connected)}{p_end} {p 4 8 2}{cmd:. devnplot mpg foreign, pgap(3) plines(lstyle(major_grid) lc(bg) lw(*8)) plotregion(color(gs15))} {p 4 8 2}{cmd:. devnplot mpg foreign rep78}{p_end} {p 4 8 2}{cmd:. devnplot mpg foreign rep78, superplines(lstyle(yxline)) plines}{p_end} {p 4 8 2}{cmd:. egen median = median(mpg), by(foreign)}{p_end} {p 4 8 2}{cmd:. devnplot mpg foreign rep78, superplines(lstyle(yxline)) level(median)} {p 4 8 2}{cmd:. webuse systolic, clear} {p 4 8 2}{cmd:. version 9: anova systolic drug disease drug*disease}{p_end} {p 4 8 2}{cmd:. predict predict}{p_end} {p 4 8 2}{cmd:. predict residual, residual}{p_end} {p 4 8 2}{cmd:. devnplot systolic drug disease, level(predict) superplines}{p_end} {p 4 8 2}{cmd:. devnplot residual drug disease, level(0) superplines}{p_end} {p 4 8 2}{cmd:. webuse grunfeld, clear} {p 4 8 2}{cmd:. devnplot invest company, sort(time) clean ysc(log) yla(1000 300 100 30 10 3 1) recast(line) subtitle(Grunfeld data)}{p_end} {p 4 8 2}{cmd:. u smoking_oecd, clear} {p 4 8 2}{cmd:. devnplot percent gender period, xla(, labsize(*.8) axis(2)) recast(line) xti("", axis(2)) xti("", axis(1)) yla(, ang(h)) superplines(lc(gs14))}{p_end} {p 4 8 2}{cmd:. egen nation = group(country)}{p_end} {p 4 8 2}{cmd:. devnplot percent gender period, xla(, labsize(*.8) axis(2)) recast(line) xti("", axis(2)) xti("", axis(1)) yla(, ang(h)) superplines(lc(gs14)) separate(nation == 24) separateopts(mcolor(blue ..)} {cmd: msize(*1.2 ..)) note("USA highlighted")} {title:Author} {p 4 4 2}Nicholas J. Cox, Durham University{break} n.j.cox@durham.ac.uk {title:Acknowledgments} {p 4 4 2}Vince Wiggins and David Airey gave helpful and encouraging suggestions. {title:References} {p 4 8 2} Fisher, R.A. 1925. {it:Statistical Methods for Research Workers}. Edinburgh: Oliver and Boyd. {p 4 8 2}Gilder, K. 2012. Statistical graphics in clinical oncology. In Krause, A. and O'Connell, M. (Eds) {it:A picture is worth a thousand tables: Graphics in life sciences.} New York: Springer, 173{c -}198. {p 4 8 2} Grafen, A. and Hails, R. 2002. {it:Modern Statistics for the Life Sciences.} Oxford: Oxford University Press. {p 4 8 2} Pearson, E.S. 1956. Some aspects of the geometry of statistics: the use of visual presentation in understanding the theory and application of mathematical statistics. {it:Journal of the Royal Statistical Society} A 119: 125{c -}146. {p 4 8 2} Whitlock, M.C. and Schluter, D. 2009. {it:The Analysis of Biological Data.} Greenwood Village, CO: Roberts and Company. {title:Also see} {p 4 4 2} {help qplot} (if installed); {help distplot} (if installed); {help stripplot} (if installed); {help dotplot}