-------------------------------------------------------------------------------
help for stripplot
-------------------------------------------------------------------------------

Strip plots: oneway dot plots

stripplot varlist [if exp] [in range] [ , vertical width(#) { floor | ceiling } stack height(#) { centre | center } separate(varname) { bar[(bar_options)] | box[(box_options)] } iqr[(#)] pctile(#) whiskers(rspike_options) boffset(#) variablelabels plot(plot) addplot(plot) graph_options ]

stripplot varname [if exp] [in range] [ , vertical width(#) { floor | ceiling } stack height(#) { centre | center } over(groupvar) separate(varname) { bar[(bar_options)] | box[(box_options)] } iqr[(#)] pctile(#) whiskers(rspike_options) boffset(#) plot(plot) addplot(plot) graph_options ]

Description

stripplot plots data as a series of marks against a single magnitude axis. By default this axis is horizontal. With the option vertical it is vertical. Optionally, data points may be jittered or stacked into histogram- or dotplot-like displays, and either bars showing means and confidence intervals, or boxes showing medians and quartiles, may be added.

Remarks

General and bibliographic remarks

There is not a sharp distinction in the literature or in software implementations between dot plots and strip plots. Commonly, but with many exceptions, a dot plot is drawn as a pointillist analogue of a histogram. Sometimes, dot plot is used as the name when data points are plotted in a line, or at most a narrow strip, against a magnitude axis. Strip plot implementations, as here, usually allow stacking options, so that dot plots may be drawn as one choice.

Such plots under these and yet other names go back at least as far as Langren (1644): see Tufte (1997, p.15) and in much more detail Friendly et al. (2010). Sasieni and Royston (1996) and Wilkinson (1999) give general discussions and several further references of historical interest. Monkhouse and Wilkinson (1952) used the term dispersion diagrams. Pearson (1956) gives several examples. Dickinson (1963) used the term dispersal graphs. Box et al. (1978) used the term dot diagrams. Chambers et al. (1983), Becker et al. (1988) and Cleveland (1994) used the term one-dimensional scatter plots, as did Lee and Tu (1997) and Reimann et al. (2008). Ryan et al. (1985) discuss their Minitab implementation as dotplots. Cleveland (1985) used the term point graphs. The term oneway plots appears to have been introduced by Computing Resource Center (1985). Feinstein (2002, p.67) uses the term one-way graphs. The term strip plots (or strip charts) (e.g. Dalgaard 2002; Venables and Ripley 2002; Robbins 2005; Faraway 2005; Maindonald and Braun 2007) appears traceable to work by J.W. and P.A. Tukey (1990). The term dit plots appears in Ellison (1993, 2001). The term linear plots appears in Hay (1996) and that of line plots in Klemelä (2009) and Schuenemeyer and Drew (2011).

Tufte (1974), Berry (1996), Cobb (1998), Griffiths et al. (1998), Bland (2000), Wild and Seber (2000), Robbins (2005), Young et al. (2006), Morgenthaler (2007), Warton (2008) and Keen (2010) show many interesting examples of strip plots.

Hybrid dot-box plots were used by Monkhouse and Wilkinson (1952), Gregory (1963), Matthews (1981), Wilkinson (1992, 2005), Wild and Seber (2000), Ellison (2001), Quinn and Keough (2002) and Young et al. (2006). Box plots in widely current forms are best known through the work of Tukey (1972, 1977). Similar ideas go back much further. Cox (2009) gives various references. Bibby (1986, pp.56, 59) gave even earlier references to their use by A.L. Bowley in his lectures about 1897 and to his recommendation (Bowley, 1910, p.62; 1952, p.73) to use minimum and maximum and 10, 25, 50, 75 and 90% points as a basis for graphical summary. Keen (2010) also discusses several variants of box plots.

Dot charts (also sometimes called dot plots) in the sense of Cleveland (1984, 1994), as implemented in graph dot, are quite distinct.

See also Cox (2004) for a general discussion of graphing distributions in Stata; Cox (2007) for an implementation of stem-and-leaf plots that bears some resemblance to what is possible with stripplot; and Cox (2009) on how to draw box plots using twoway.

A note for experimental design people

There is no connection between stripplot and the strip plots discussed in design of experiments.

A comparison between stripplot, gr7, oneway and dotplot

stripplot may have either horizontal or vertical magnitude axis. With gr7, oneway the magnitude axis is always horizontal. With dotplot the magnitude axis is always vertical.

stripplot and dotplot put descriptive text on the axes. gr7, oneway puts descriptive text under each line of marks.

stripplot and dotplot allow any marker symbol to be used for the data marks. gr7, oneway always shows data marks as short vertical bars, unless jitter() is specified.

stripplot and dotplot interpret jitter() in the same way as does scatter. gr7, oneway interprets jitter() as replacing short vertical bars by sets of dots.

stripplot and dotplot allow tuning of xlabel(). gr7, oneway does not allow such tuning: the minimum and maximum are always shown. Similarly, stripplot and dotplot allow the use of xline() and yline().

dotplot uses only one colour in the body of the graph. stripplot allows several colours in the body of the graph with its separate() option. gr7, oneway uses several colours with several variables.

There is no equivalent with stripplot or dotplot to gr7, oneway rescale, which stretches each set of data marks to extend over the whole horizontal range of the graph. Naturally, users could standardise a bunch of variables in some way before calling stripplot or dotplot.

stripplot and dotplot with option over(groupvar) do not require data to be sorted by groupvar. The equivalent gr7, oneway by(groupvar) does require this.

stripplot allows the option by(byvar), producing separate graph panels according to the groups of byvar. dotplot does not allow the option by(). gr7, oneway allows the option by(byvar), producing separate displays within a single panel. It does not take the values of byvar literally: displays for values 1, 2 and 4 will appear equally spaced.

stripplot with the stack option produces a variant on dotplot. There is by default no binning of data: compare dotplot, nogroup. Binning may be accomplished with the width() option so that classes are defined by round(varname, width) or optionally by width * floor(varname/width) or width * ceil(varname/width): contrast dotplot, ny(). Conversely, stacking may in effect be suppressed in dotplot by setting nx() sufficiently large.

stripplot has options for showing bars as confidence intervals and boxes showing medians and quartiles. gr7, oneway box shows Tukey-style box plots. dotplot allows the showing of mean +/- SD or median and quartiles by horizontal lines.

Options

vertical specifies that the magnitude axis should be vertical.

width(#) specifies that values are to be rounded in classes of specified width. Classes are defined by default by round(varname,width). See also the floor and ceiling options just below.

floor or ceiling in conjunction with width() specifies rounding by width * floor(varname/width) or width * ceil(varname/width) respectively. Only one may be specified. (These options are included to give some users the minute control they may desire, but if either option produces a marked difference in your plot, you may be rounding too much.)
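As a rough illustration of the three rounding rules, the following sketch reproduces them by hand for a width of 5 using Stata's auto data. The variable names mpg_round, mpg_floor and mpg_ceil are hypothetical and are not created by stripplot itself; the sketch only shows the arithmetic named above.

```stata
* Illustrative only: the three binning rules for width(5), computed by hand
sysuse auto, clear
* default rule: round to the nearest multiple of 5
generate mpg_round = round(mpg, 5)
* floor rule: round down to a multiple of 5
generate mpg_floor = 5 * floor(mpg / 5)
* ceiling rule: round up to a multiple of 5
generate mpg_ceil = 5 * ceil(mpg / 5)
list mpg mpg_round mpg_floor mpg_ceil in 1/10
```

Comparing the three new variables side by side shows where the rules disagree, which is a quick check on whether the chosen width is too coarse.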

stack specifies that data points with identical values are to be stacked, as in dotplot, except that by default there is no binning of data.

height(#) controls the amount of graph space taken up by stacked data points under the stack option above. The default is 0.8. This option will not by itself change the appearance of a plot for a single variable. Note that the height may need to be much smaller or much larger than 1 with over(), given that the latter takes values literally. For example, if your classes are 0(45)360, 36 might be a suitable height.

centre or center centres or centers markers for each variable or group on a hidden line.

over(groupvar) specifies that values of varname are to be shown separately by groups defined by groupvar. This option may only be specified with a single variable. If stack is also specified, then note that distinct values of any numeric groupvar are assumed to differ by at least 1. Tuning height() or the prior use of egen, group() label will fix any problems. See help on egen if desired.

Note that by() is also available as an alternative or complement to over(). See the examples for detail on how over() and by() could be used to show data subdivided by a cross-combination of categories.
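To illustrate the remark on over() with stack, here is a minimal sketch that assumes a grouping variable whose distinct values differ by less than 1. The variables wclass and wgroup are hypothetical and are constructed only for this example.

```stata
* Illustrative only: a grouping variable with values 0.5 apart
sysuse auto, clear
generate wclass = round(weight / 1000, 0.5)
* stacks may overlap here, because groups are only 0.5 apart
stripplot mpg, over(wclass) stack
* either shrink the stacks ...
stripplot mpg, over(wclass) stack height(0.4)
* ... or map the groups to integers 1, 2, ... that carry the original values as labels
egen wgroup = group(wclass), label
stripplot mpg, over(wgroup) stack
```

The egen route keeps the default height() usable, at the cost of no longer taking the values of the grouping variable literally.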

separate() specifies that data points be shown separately according to the distinct classes of the variable specified. Commonly, but not necessarily, this option will be specified together with stack. Note that this option has no effect on any error bar or box plot calculations.

bar specifies that bars be added showing means and confidence intervals. Bar information is calculated using ci. bar(bar_options) may be used to specify details of the means and confidence intervals. bar_options are

Various options of ci: level(), poisson, binomial, exact, wald, agresti, wilson, jeffreys and exposure(). For example, bar(binomial jeffreys) specifies those options of ci.

mean(scatter_options) may be used to control the rendering of the symbol for the mean. For example, bar(mean(mcolor(red) ms(sh))) specifies the use of red small hollow squares.

Options of twoway rcap may be used to control the appearance of the bar. For example, bar(lcolor(red)) specifies red as the bar colour.

These kinds of options may be combined.

box specifies that boxes be added showing medians and quartiles. Box information is calculated using egen, median() and egen, pctile(). box(box_options) may be used to specify options of twoway rbar to control the appearance of the box. For example, box(bfcolor(eltgreen)) specifies eltgreen as the box fill colour. The defaults are bcolor(none) barwidth(0.4). Note that the length of each box is the interquartile range or IQR.

iqr[(#)] specifies that spikes are to be added to boxes that extend as far as the largest or smallest value within # IQR of the upper or lower quartile. Plain iqr without argument yields a default of 1.5 for #.

pctile(#) specifies that spikes are to be added to boxes that extend as far as the # and 100 - # percentiles.

whiskers() specifies options of twoway rspike that may be used to modify the appearance of spikes added to boxes.

iqr, iqr(), pctile() and whiskers() have no effect without box or box(). iqr or iqr() may not be combined with pctile().
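The whisker rule for plain iqr can be checked by hand. The sketch below is an assumption about how one might reproduce the description above with egen, not a statement of how stripplot computes the spikes internally; all new variable names are hypothetical.

```stata
* Illustrative only: whisker ends under the 1.5 IQR rule, by group
sysuse auto, clear
egen p25 = pctile(mpg), p(25) by(rep78)
egen p75 = pctile(mpg), p(75) by(rep78)
generate iqr = p75 - p25
* largest value no more than 1.5 IQR above the upper quartile
egen upper = max(cond(mpg <= p75 + 1.5 * iqr, mpg, .)), by(rep78)
* smallest value no more than 1.5 IQR below the lower quartile
egen lower = min(cond(mpg >= p25 - 1.5 * iqr, mpg, .)), by(rep78)
* show one row per group
egen tag = tag(rep78)
sort rep78
list rep78 lower p25 p75 upper if tag, noobs
```

For pctile(#), the corresponding ends would instead be the # and 100 - # percentiles, which egen, pctile() can supply with p(#) and p(100 - #).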

bar[()] and box[()] may not be combined.

boffset() may be used to control the position of bars or boxes. By default, bars are positioned 0.2 unit to the left of (or below) the base line for strips, and boxes are positioned under the base line for strips. Negative arguments specify positions to the left of or below the base line and positive arguments specify positions to the right of or above it.

variablelabels specifies that multiple variables be labelled by their variable labels. The default is to use variable names.

plot(plot) provides a way to add other plots to the generated graph; see help plot_option (Stata 8 only).

addplot(plot) provides a way to add other plots to the generated graph; see help addplot_option (Stata 9 up).

graph_options are options of scatter, including by(), on which see by_option. Note that by(, total) is not supported with bars or boxes. jitter() is often helpful.

Examples

(Stata's auto data)
. sysuse auto, clear
. stripplot mpg
. stripplot mpg, aspect(0.05)
. stripplot mpg, over(rep78)
. stripplot mpg, over(rep78) by(foreign)
. stripplot mpg, over(rep78) vertical
. stripplot mpg, over(rep78) vertical stack
. stripplot mpg, over(rep78) vertical stack h(0.4)

. gen pipe = "|"
. stripplot mpg, ms(none) mlabpos(0) mlabel(pipe) mlabsize(*2) stack
. stripplot price, over(rep78) ms(none) mla(pipe) mlabpos(0)
. stripplot price, over(rep78) w(200) stack h(0.4)

(5 here is empirical: adjust for your variable)
. gen price1 = price - 5
. gen price2 = price + 5
. stripplot price, over(rep78) box ms(none) addplot(rbar price1 price2 rep78, horizontal barw(0.2) bcolor(gs6))

. stripplot mpg, over(rep78) stack h(0.5) bar(lcolor(red))
. stripplot mpg, over(rep78) box
. stripplot mpg, over(rep78) box(bfcolor(eltgreen)) boffset(-0.3)
. stripplot mpg, over(rep78) box boffset(-0.3)
. stripplot mpg, over(rep78) box(bfcolor(eltgreen) barw(0.2)) boffset(-0.2) stack h(0.5)
. stripplot mpg, over(rep78) box(bfcolor(black) blcolor(white) barw(0.2)) boffset(-0.2) stack h(0.5)
. stripplot mpg, over(rep78) box(bfcolor(black) blcolor(white) barw(0.2)) iqr boffset(-0.2) stack h(0.5)
. stripplot mpg, over(rep78) box(bfcolor(black) blcolor(white) barw(0.2)) pctile(10) whiskers(recast(rbar) bcolor(black) barw(0.02)) boffset(-0.2) stack h(0.5)

. gen digit = mod(mpg, 10)
. stripplot mpg, stack vertical mla(digit) mlabpos(0) ms(i) over(foreign) height(0.2) yla(, ang(h)) xla(, noticks)
. stripplot mpg, stack vertical mla(digit) mlabpos(0) ms(i) by(foreign) yla(, ang(h))

. stripplot mpg, over(rep78) separate(foreign) stack
. stripplot mpg, by(rep78) separate(foreign) stack

. gen rep78_1 = rep78 - 0.1
. egen mean = mean(mpg), by(foreign rep78)
. stripplot mpg, over(rep78) by(foreign, compact) addplot(scatter rep78_1 mean, ms(T)) stack

. clonevar rep78_2 = rep78
. replace rep78_2 = cond(foreign, rep78 + 0.15, rep78 - 0.15)
. stripplot mpg, over(rep78_2) separate(foreign) yla(1/5) jitter(1 1)

(Challenger shuttle O-ring damage)
. logit damage temperature
. predict pre
. stripplot damage, over(temperature) stack ms(sh) height(0.4) addplot(mspline pre temperature, bands(20))

(Stata's blood pressure data)
. sysuse bplong, clear
. egen group = group(age sex), label
. stripplot bp*, bar over(when) by(group, compact col(1) note("")) ysc(reverse) subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) ytitle("") xtitle(Blood pressure (mm Hg))

Acknowledgments

Philip Ender helpfully identified a bug. William Dupont offered encouragement. Kit Baum nudged me into implementing separate(). Maarten Buis made a useful suggestion about this help. Ronán Conroy suggested adding whiskers. He also found two bugs. Marc Kaulisch asked a question which led to more emphasis on the use of by() and the blood pressure example. David Airey found another bug. Oliver Jones asked a question which led to an example of the use of twoway rbar to mimic pipe or barcode symbols. Fredrik Norström found yet another bug.

Author

Nicholas J. Cox, Durham University, U.K.
n.j.cox@durham.ac.uk

References

Becker, R.A., J.M. Chambers and A.R. Wilks. 1988. The new S language: A programming environment for data analysis and graphics. Pacific Grove, CA: Wadsworth and Brooks/Cole.

Berry, D.A. 1996. Statistics: a Bayesian perspective. Belmont, CA: Duxbury.

Bibby, J. 1986. Notes towards a history of teaching statistics. Edinburgh: John Bibby (Books).

Bland, M. 2000. An introduction to medical statistics. Oxford: Oxford University Press.

Bowley, A.L. 1910. An elementary manual of statistics. London: Macdonald and Evans. (seventh edition 1952)

Box, G.E.P., W.G. Hunter and J.S. Hunter. 1978. Statistics for experimenters: an introduction to design, data analysis, and model building. New York: John Wiley. (second edition 2005)

Chambers, J.M., W.S. Cleveland, B. Kleiner and P.A. Tukey. 1983. Graphical methods for data analysis. Belmont, CA: Wadsworth.

Cleveland, W.S. 1984. Graphical methods for data presentation: full scale breaks, dot charts, and multibased logging. American Statistician 38: 270-80.

Cleveland, W.S. 1985. Elements of graphing data. Monterey, CA: Wadsworth.

Cleveland, W.S. 1994. Elements of graphing data. Summit, NJ: Hobart Press.

Cobb, G.W. 1998. Introduction to design and analysis of experiments. New York: Springer.

Computing Resource Center. 1985. STATA/Graphics user's guide. Los Angeles, CA: Computing Resource Center.

Cox, N.J. 2004. Speaking Stata: Graphing distributions. Stata Journal 4(1): 66-88.

Cox, N.J. 2007. Speaking Stata: Turning over a new leaf. Stata Journal 7(3): 413-433.

Cox, N.J. 2009. Speaking Stata: Creating and varying box plots. Stata Journal 9(3): 478-496.

Dalgaard, P. 2002. Introductory statistics with R. New York: Springer.

Dickinson, G.C. 1963. Statistical mapping and the presentation of statistics. London: Edward Arnold. (second edition 1973)

Ellison, A.M. 1993. Exploratory data analysis and graphic display. In Scheiner, S.M. and J. Gurevitch (eds) Design and analysis of ecological experiments. New York: Chapman & Hall, 14-45.

Ellison, A.M. 2001. Exploratory data analysis and graphic display. In Scheiner, S.M. and J. Gurevitch (eds) Design and analysis of ecological experiments. New York: Oxford University Press, 37-62.

Faraway, J.J. 2005. Linear models with R. Boca Raton, FL: Chapman and Hall/CRC.

Feinstein, A.R. 2002. Principles of medical statistics. Boca Raton, FL: Chapman and Hall/CRC.

Friendly, M., P. Valero-Mora and J.I. Ulargui. 2010. The first (known) statistical graph: Michael Florent van Langren and the "secret" of longitude. American Statistician 64: 174-184. (supplementary materials online)

Gregory, S. 1963. Statistical methods and the geographer. London: Longmans. (later editions 1968, 1973, 1978; publisher later Longman)

Griffiths, D., W.D. Stirling and K.L. Weldon. 1998. Understanding data: principles and practice of statistics. Brisbane: John Wiley.

Hay, I. 1996. Communicating in geography and the environmental sciences. Melbourne: Oxford University Press. (later editions 2002, 2006)

Keen, K.J. 2010. Graphics for statistics and data analysis with R. Boca Raton, FL: CRC Press.

Klemelä, J. 2009. Smoothing of multivariate data: Density estimation and visualization. Hoboken, NJ: John Wiley.

Langren, Michael Florent van. 1644. La verdadera longitud por mar y tierra. Antwerp.

Lee, J.J. and Z.N. Tu. 1997. A versatile one-dimensional distribution plot: the BLiP plot. American Statistician 51: 353-358.

Maindonald, J.H. and W.J. Braun. 2007. Data analysis and graphics using R - an example-based approach. Cambridge: Cambridge University Press.

Matthews, J.A. 1981. Quantitative and statistical approaches to geography: A practical manual. Oxford: Pergamon.

Monkhouse, F.J. and H.R. Wilkinson. 1952. Maps and diagrams: Their compilation and construction. London: Methuen. (later editions 1963, 1971)

Morgenthaler, S. 2007. Introduction à la statistique. Lausanne: Presses polytechniques et universitaires romandes.

Pearson, E.S. 1956. Some aspects of the geometry of statistics: the use of visual presentation in understanding the theory and application of mathematical statistics. Journal of the Royal Statistical Society A 119: 125-146.

Quinn, G.P. and M.J. Keough. 2002. Experimental design and data analysis for biologists. Cambridge: Cambridge University Press.

Reimann, C., P. Filzmoser, R.G. Garrett and R. Dutter. 2008. Statistical data analysis explained: applied environmental statistics with R. Chichester: John Wiley.

Robbins, N.B. 2005. Creating more effective graphs. Hoboken, NJ: John Wiley.

Ryan, B.F., B.L. Joiner and T.A. Ryan. 1985. Minitab handbook. Boston, MA: Duxbury.

Sasieni, P.D. and P. Royston. 1996. Dotplots. Applied Statistics 45: 219-234.

Schuenemeyer, J.H. and L.J. Drew. 2011. Statistics for earth and environmental scientists. Hoboken, NJ: John Wiley.

Tufte, E.R. 1974. Data analysis for politics and policy. Englewood Cliffs, NJ: Prentice-Hall.

Tufte, E.R. 1997. Visual explanations: images and quantities, evidence and narrative. Cheshire, CT: Graphics Press.

Tukey, J.W. 1972. Some graphic and semi-graphic displays. In Bancroft, T.A. and S.A. Brown (eds) Statistical papers in honor of George W. Snedecor. Ames, IA: Iowa State University Press, 293-316. (also accessible at http://www.edwardtufte.com/tufte/tukey)

Tukey, J.W. 1977. Exploratory data analysis. Reading, MA: Addison-Wesley.

Tukey, J.W. and P.A. Tukey. 1990. Strips displaying empirical distributions: I. Textured dot strips. Bellcore Technical Memorandum.

Venables, W.N. and B.D. Ripley. 2002. Modern applied statistics with S. New York: Springer.

Warton, D.I. 2008. Raw data graphing: an informative but under-utilized tool for the analysis of multivariate abundances. Austral Ecology 33: 290-300.

Wild, C.J. and G.A.F. Seber. 2000. Chance encounters: a first course in data analysis and inference. New York: John Wiley.

Wilkinson, L. 1992. Graphical displays. Statistical Methods in Medical Research 1: 3-25.

Wilkinson, L. 1999. Dot plots. American Statistician 53: 276-281.

Wilkinson, L. 2005. The language of graphics. New York: Springer.

Young, F.W., P.M. Valero-Mora and M. Friendly. 2006. Visual statistics: Seeing data with interactive graphics. Hoboken, NJ: John Wiley.

Also see

On-line: help for dotplot, gr7oneway, histogram, beamplot (if installed)