{smcl} {* 13may2014}{...} {hline} help for {hi:designplot} {hline} {title:Design plot: graphical summary of response given one or more factors} {p 8 17 2} {cmd:designplot} {it:yvar} {it:xvarlist} [{it:weight}] [{help if}] [{help in}] [ {cmd:,} {cmdab:stat:istics(}{it:statistics}{cmd:)} {cmd:prefix(}{it:prefix}{cmd:)} {cmd:saveresults(}{it:filename} [{cmd:,} {it:save_options}]{cmd:)} {cmdab:max:way(}{it:#}{cmd:)} {cmdab:min:way(}{it:#}{cmd:)} {cmd:recast(}{c -(}{cmd:bar}{c |}{cmd:hbar}{c )-}{cmd:)} {c -(} {cmd:variablelabels} {c |} {cmd:variablenames} {c )-} {cmd:alllabel(}{it:text}{cmd:)} {cmd:entryopts(}{it:over_subopts}{cmd:)} {cmd:groupopts(}{it:over_subopts}{cmd:)} {it:graph_options} ] {p 8 17 2} aweights and fweights are supported. {title:Description} {p 4 4 2} {cmd:designplot} produces a graphical summary of a numeric response variable {it:yvar} given one or more "factors" {it:xvarlist}. The term "factor" in this context means that any (numeric or string) variable concerned will be treated in terms of its distinct values or levels as they occur in the data. Use of Stata's factor variable syntax is neither explicit nor implicit. {p 4 4 2} For concreteness, consider the example {p 8 8 2}{cmd:. sysuse auto, clear}{p_end} {p 8 8 2}{cmd:. designplot mpg foreign rep78} {p 4 4 2} This produces a plot showing the mean of {cmd:mpg} for all observations; for all the classes defined by the values of {cmd:foreign} and also those of {cmd:rep78}; and for all the classes defined by the cross-combinations of values of {cmd:foreign} and {cmd:rep78} occurring in the data. {p 4 4 2} Options give scope for showing other summary statistics as calculated by {help summarize} and for restricting the results shown in the plot. {p 4 4 2} By default the graph is produced by {help graph dot}. Optionally {help graph hbar} or {help graph bar} may be used instead. 
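{p 4 4 2} As a minimal sketch of those two choices, using only options documented under Options and the same dataset, the following calls would show standard deviations rather than means and then redraw the default plot with horizontal bars. The particular statistic and graph type are illustrative choices only, not recommendations. {p 8 8 2}{cmd:. designplot mpg foreign rep78, statistics(sd)}{p_end} {p 8 8 2}{cmd:. designplot mpg foreign rep78, recast(hbar)}{p_end}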
{p 4 4 2} Design plots offer a diversity of uses, ranging from simple exploratory overviews to multiscale breakdowns deserving and demanding detailed scrutiny. {title:Remarks} {p 4 4 2} {cmd:designplot} is an eclectic combination of ideas. Readers are warmly invited to inform the author of other similar or related work. {p 4 4 2} 1. The existing Stata command {help grmeanby} shows means (or optionally medians) of a response variable given one or more other variables. The scope of {cmd:grmeanby} is identical to that of {cmd:designplot} insofar as the other variables could be string variables as well as numeric variables. As recorded by Gould (1993) and in the manual entry, {cmd:grmeanby} was inspired by examples in Chambers and Hastie (1992). {cmd:grmeanby} is based on direct use of {cmd:summarize}. {p 4 4 2} 2. Freeny and Landwehr (1992) gave the name "design plot" to plots similar to those in Chambers and Hastie (1992) and that name is associated with software implementations outside Stata, notably in S, S-Plus and R. The name is also consistent with S syntax detailed in Chambers and Hastie (1992, pp.546{c -}547). In these implementations plots show results from fitting linear models, specifically analyses of variance. The name evokes the idea of an underlying experimental design, but the command here clearly may be applied to any data, including observational data in any sense of that term. The graph shown by Zuur et al. (2007, p.37) is an example from the applied literature. {p 4 4 2} 3. Various plots given in Hoaglin, Mosteller, and Tukey (1991) show displays "side-by-side" of main effects, interactions and residuals as fitted in analysis of variance. Roberts (1993, p.310) cites an earlier instance of the same idea in Tukey (1977, p.451). Yandell (1997) calls these "effect plots" or "effects plots". {p 4 4 2} 4. Broadly similar plots for "graphical ANOVA" appear in Box, Hunter and Hunter (2005). See also the earlier work in Box (1993). 
van Belle (2008, p.201) called them "BHH plots". {p 4 4 2} Graphs of types 3 and 4 commonly show effects and residuals scaled to be comparable in terms of variability. {p 4 4 2} 5. Graphically, these displays share a possible problem: points may need to be plotted close to each other, creating difficulties especially if any text labels occlude each other or need to be abbreviated. Three of the four examples in Chambers and Hastie (1992) show this, as does the example in [R] grmeanby. Several examples in Hoaglin et al. (1991) avoid the problem only by jittering points apart. Harrell (2001) used a different display based on dot charts or dot plots (in the sense of Cleveland 1984, 1985, 1994) that avoids this problem. Conversely, a dot chart representation will work well with, say, 10 entries, but not with 100 or more. {p 4 4 2} 6. On a simpler level, tables or graphs reporting survey results often show two or more separate breakdowns of some sample. Examples are shown by Tufte (1983/2001, p.179) and (more trivially) Cox (2008), among many others. {p 4 4 2} 7. The {cmd:statsby} command with its {cmd:subsets} option provides an easy framework for calculation and assembly of summary statistics for zero-, one-, two-way and higher breakdowns of a dataset. Cox (2010) provided an illustration of its exploitation for graphics. {p 4 4 2} The name "design plot" is adopted here as a simple, memorable name and given its earlier and widespread use to show similar information. These are positive features. On the other hand, the connotation of experimental design will often be inappropriate. The use of dot chart (or optionally bar chart) form also distinguishes the results of this command from others published as design plots. People who like the plots and dislike the name are naturally free to use other terminology, or none at all. Not every kind of graph needs a distinct name, but every graph program does. {p 4 4 2} Naturally, this lack of standardization is not new. 
"Most or all features of statistical computation{c -}computer hardware, software systems, coding, languages, symbols, terminology, procedures{c -}have much to gain from elimination of pointless variations, redundancies and confusion. Yet pointlessness is not always easy to judge. The only quite satisfying rule of standardization is that you adopt my standards." (Anscombe 1981, p.3) {p 4 4 2} {cmd:designplot} creates a new dataset of {cmd:summarize} results with default variable names {cmd:_stat1} and so forth for each statistic and {cmd:_way}, {cmd:_group} and {cmd:_entry} describing the results. If the number of observations is not one of the statistics requested, a variable with default name {cmd:_nobs} is added anyway, on the grounds that it will often be interesting or useful. The original dataset will be restored after the graph is drawn, but the results set may be {cmd:save}d for other use with the {cmd:saveresults()} option. {p 4 4 2} How therefore does {cmd:designplot} differ from what is readily available through (e.g.) {cmd:graph dot}? There are two main differences. First, {cmd:graph dot} and its siblings are more restricted in offering only one-way or two-way or three-way breakdowns given, respectively, one or two or three "factors" as arguments to {cmd:over()} or {cmd:by()} options. Second, they do not give scope for saving results for separate graphing or tabulation. {p 4 4 2} For concreteness, consider again the example {p 8 8 2}{cmd:. sysuse auto, clear}{p_end} {p 8 8 2}{cmd:. 
designplot mpg foreign rep78} {p 4 4 2} This produces a plot showing {p 8 8 2} the mean of {cmd:mpg} for all observations, which may be called a "zero-way" breakdown {p 8 8 2} the means for all the classes defined by the values of {cmd:foreign} and also of {cmd:rep78}, which may be called "one-way" breakdowns, as often done in statistical literature {p 8 8 2} and the means for all the classes defined by the cross-combinations of values of {cmd:foreign} and {cmd:rep78} occurring in the data, which similarly may be called a "two-way" breakdown, again as often done. {p 4 4 2} In general, specifying one or more factors gives scope for various breakdowns, but the number of (cross-)combinations may grow rapidly, so that the resulting graph might be too complicated to be readable or useful. Thus {cmd:designplot} offers options to restrict the scope of what is plotted. {title:Options} {p 4 8 2} {cmd:statistics(}{cmd:)} specifies statistics calculated by {help summarize} to be calculated. The default is the mean (only). One or more statistics may be specified. Note that no allowance is made in graphics for different statistics being on quite different scales, so that the user may need to exercise discretion over what is specified. The names allowed include the names of the r-class results as visible after {cmd:summarize, detail} or as documented in [R] summarize. Thus {cmd:p50} specifies the median available as {cmd:r(p50)}. {p 4 8 2} Allowed synonyms also include the following. Any synonym specified will be echoed literally to the {cmd:ytitle()}. {p 8 8 2} {cmd:n} or {cmd:count} or any abbreviation of {cmd:frequency} for {cmd:N}. {p 8 8 2} {cmd:minimum} for {cmd:min} and {cmd:maximum} for {cmd:max}. {p 8 8 2} {cmd:total} for {cmd:sum}. {p 8 8 2} {cmd:median} for {cmd:p50}. {p 8 8 2} {cmd:SD} for {cmd:sd}. {p 8 8 2} any abbreviation of {cmd:variance} or {cmd:Variance} for {cmd:Var}. {p 8 8 2} {cmd:skew} for {cmd:skewness} and {cmd:kurt} for {cmd:kurtosis}. 
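{p 8 8 2} As an illustrative sketch using the auto data from the examples elsewhere in this help, the next two calls should plot the same values, since {cmd:median} is a synonym for {cmd:p50}. On the description above, the first would show {cmd:median} in the axis title; that the second shows {cmd:p50} there is an assumption, not something stated in this help.{p_end} {p 8 8 2}{cmd:. designplot mpg foreign rep78, statistics(median)}{p_end} {p 8 8 2}{cmd:. designplot mpg foreign rep78, statistics(p50)}{p_end}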
{p 8 8 2}Note that if just {cmd:statistics(N)} is specified, which {it:yvar} is specified is immaterial so long as it is non-missing whenever {it:xvarlist} are non-missing. {p 4 8 2} {cmd:prefix()} is an occasionally used option. {cmd:designplot} creates a dataset of results with variable names such as {cmd:_stat1} and so forth. If these names clash with existing variable names, this option may be used to add a prefix to all such names to remove the clash. {p 4 8 2} {cmd:saveresults()} saves the results as a Stata dataset. Options of {help save} may be specified, most usefully {cmd:replace}. The dataset will include {help notes} on the {cmd:designplot} command issued and (if defined) the filename and its date for the ({cmd:save}d) dataset. {p 4 8 2} {cmd:maxway()} specifies the maximum "way" to be plotted. See explanation in Remarks on breakdowns that are called zero-way, one-way, two-way and so forth. Thus {cmd:maxway(1)} by itself specifies that zero-way and one-way breakdowns only are to be shown. {p 4 8 2} {cmd:minway()} specifies the minimum "way" to be plotted. See explanation in Remarks on breakdowns that are called zero-way, one-way, two-way and so forth. Thus {cmd:minway(1)} by itself specifies that the zero-way breakdown should not be shown. {p 4 8 2} {cmd:recast(}{c -(}{cmd:hbar}{c |}{cmd:bar}{c )-}{cmd:)} specifies that the graph should be drawn using {help graph hbar} or {help graph bar}. The default is {help graph dot}. People fond of bar charts are advised to try {cmd:graph hbar} for greater readability of axis information. Note for experienced users: although the option name is suggested by another {help advanced_options:recast()} option, this is not a back door to recasting to a {cmd:twoway} plot. {p 4 8 2} {cmd:variablelabels} specifies that one-way breakdowns should be labelled by the corresponding variable labels, or the corresponding variable names if no variable label is defined. 
The default is, or should be, an invisible label (precisely, an instance of {cmd:char(160)}). {p 4 8 2} {cmd:variablenames} specifies that one-way breakdowns should be labelled by the corresponding variable names. The default is, or should be, an invisible label (precisely, an instance of {cmd:char(160)}). The reason for using this option rather than {cmd:variablelabels} is likely to be that variable labels would take up too much space. {p 8 8 2} Only one of {cmd:variablelabels} and {cmd:variablenames} may be specified. {p 4 8 2} {cmd:alllabel(}{it:text}{cmd:)} specifies text to label results for all observations used. The default is {cmd:(all)}. {p 4 8 2} {cmd:entryopts(}{it:over_subopts}{cmd:)} specifies {it:over_subopts} of {cmd:graph dot}, {cmd:graph hbar} or {cmd:graph bar}, used to tune the corresponding call to an {cmd:over()} option that affects the display of individual entries in the graph. Users unsure of what this means may find inspection of the source code helpful or alternatively just modify a graph by use of the Graph Editor. {p 4 8 2} {cmd:groupopts(}{it:over_subopts}{cmd:)} specifies {it:over_subopts} of {cmd:graph dot}, {cmd:graph hbar} or {cmd:graph bar}, used to tune the corresponding call to an {cmd:over()} option that affects the display of groups of entries in the graph. Users unsure of what this means may find inspection of the source code helpful or alternatively just modify a graph by use of the Graph Editor. {p 4 8 2} {it:graph_options} are other options allowed with {help graph dot}, {help graph hbar} or {help graph bar} (whichever command is being used). Note that among other defaults {cmd:t1title()} is used to display information on {it:yvar}. {title:Examples} {p 4 8 2}{cmd:. set scheme s1color}{p_end} {p 4 8 2}{cmd:. sysuse auto, clear}{p_end} {p 4 8 2}{cmd:. designplot mpg foreign rep78}{p_end} {p 4 8 2}{cmd:. 
designplot mpg foreign rep78 if !missing(foreign,rep78), stat(count) recast(hbar) blabel(total) yla(none) t1title("frequencies") variablelabels ytitle("") ysc(r(0 72))}{p_end} {p 4 8 2}{cmd:. designplot mpg foreign rep78, stat(min p25 median mean p75 max) maxway(1) legend(row(1))}{p_end} {p 4 8 2}{cmd:. infix class 1-9 adult 10-18 male 19-27 survived 28-36 using http://www.amstat.org/publications/jse/datasets/titanic.dat.txt, clear}{p_end} {p 4 8 2}{cmd:. label def class 0 crew 1 first 2 second 3 third}{p_end} {p 4 8 2}{cmd:. label def adult 1 adult 0 child}{p_end} {p 4 8 2}{cmd:. label def male 1 male 0 female}{p_end} {p 4 8 2}{cmd:. label def survived 1 yes 0 no}{p_end} {p 4 8 2}{cmd:. foreach v in class adult male survived {c -(}}{p_end} {p 4 8 2}{cmd:. }{space 4}{cmd:label val `v' `v'}{p_end} {p 4 8 2}{cmd:. {c )-}}{p_end} {p 4 8 2}{cmd:. designplot survived class adult male, max(2) ysize(7)}{p_end} {title:Author} {p 4 4 2}Nicholas J. Cox, Durham University, U.K.{break} n.j.cox@durham.ac.uk {title:References} {p 4 8 2} Anscombe, F.J. 1981. {it:Computing in Statistical Science through APL.} New York: Springer. {p 4 8 2} Box, G.E.P. 1993. How to get lucky. {it:Quality Engineering} 5: 517{c -}524. {p 4 8 2} Box, G.E.P., Hunter, J.S. and Hunter, W.G. 2005. {it:Statistics for Experimenters: Design, Innovation, and Discovery.} Hoboken, NJ: John Wiley. {p 4 8 2} Chambers, J.M. and Hastie, T.J. (Eds.) 1992. {it:Statistical Models in S.} Pacific Grove, CA: Wadsworth and Brooks/Cole. See pp.3, 9, 148, 164. {p 4 8 2} Cleveland, W.S. 1984. Graphical methods for data presentation: full scale breaks, dot charts, and multibased logging. {it:American Statistician} 38: 270{c -}280. {p 4 8 2} Cleveland, W.S. 1985. {it:Elements of graphing data.} Monterey, CA: Wadsworth. {p 4 8 2} Cleveland, W.S. 1994. {it:Elements of graphing data.} Summit, NJ: Hobart Press. {p 4 8 2} Cox, N.J. 2008. Between tables and graphs. {it:Stata Journal} 8: 269{c -}289. {p 4 8 2} Cox, N.J. 2010. 
The statsby strategy. {it:Stata Journal} 10: 143{c -}151. {p 4 8 2} Dawson, R.J.M. 1995. The "unusual episode" data revisited. {it:Journal of Statistics Education} 3(3). {browse "http://www.amstat.org/publications/jse/v3n3/datasets.dawson.html":http://www.amstat.org/publications/jse/v3n3/datasets.dawson.html} {p 4 8 2} Freeny, A.E. and Landwehr, J.M. 1992. Displays for data from large designed experiments. In Page, C. and LePage, R. (Eds) {it:Computer Science and Statistics: Proceedings of the 22nd Symposium on the Interface. Statistics of Many Parameters: Curves, Images, Spatial Models}. New York: Springer, 117{c -}126. {p 4 8 2} Gould, W.W. 1993. gr12: Graphs of means and medians by categorical variables. {it:Stata Technical Bulletin} 12: 13. {p 4 8 2} Harrell, F.E. 2001. {it:Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis.} New York: Springer. See pp.126, 303, 304, 314, 315, 336. {p 4 8 2} Hoaglin, D.C., Mosteller, F. and Tukey, J.W. (Eds) 1991. {it:Fundamentals of Exploratory Analysis of Variance.} New York: John Wiley. See pp.84, 97, 103, 120, 125, 133, 140, 174, 181, 182, 382, 385. {p 4 8 2} Roberts, S. 1993. Fundamentals of Exploratory Analysis of Variance. Edited by David C. Hoaglin, Frederick Mosteller, and John W. Tukey. {it:American Journal of Psychology} 106: 308{c -}320. {p 4 8 2} Tufte, E.R. 1983/2001. {it:The Visual Display of Quantitative Information.} Cheshire, CT: Graphics Press. {p 4 8 2} Tukey, J.W. 1977. {it:Exploratory Data Analysis.} Reading, MA: Addison-Wesley. {p 4 8 2} van Belle, G. 2008. {it:Statistical Rules of Thumb.} Hoboken, NJ: John Wiley. {p 4 8 2} Yandell, B.S. 1997. {it:Practical Data Analysis for Designed Experiments.} London: Chapman & Hall. See pp.138, 173, 174, etc. for examples of effects plots. {p 4 8 2} Zuur, A.F., Ieno, E.N. and Smith, G.M. 2007. {it:Analysing ecological data.} New York: Springer. 
{title:Also see} {p 4 13 2} On-line: help for {help graph dot}, help for {help graph hbar}, help for {help graph bar}; help for {help statsby}