-------------------------------------------------------------------------------
help for stripplot
-------------------------------------------------------------------------------

Strip plots: oneway dot plots

stripplot varlist [if exp] [in range] [, vertical width(#) {floor|ceiling} stack height(#) {centre|center} separate(varname) {bar[(bar_options)] | box[(box_options)]} iqr[(#)] pctile(#) whiskers(rspike_options) boffset(#) variablelabels plot(plot) addplot(plot) graph_options ]

stripplot varname [if exp] [in range] [, vertical width(#) {floor|ceiling} stack height(#) {centre|center} over(groupvar) separate(varname) {bar[(bar_options)] | box[(box_options)]} iqr[(#)] pctile(#) whiskers(rspike_options) boffset(#) plot(plot) addplot(plot) graph_options ]

Description

stripplot plots data as a series of marks against a single magnitude axis. By default this axis is horizontal. With the option vertical it is vertical. Optionally, data points may be jittered or stacked into histogram- or dotplot-like displays, and either bars showing means and confidence intervals, or boxes showing medians and quartiles, may be added.

Remarks

General and bibliographic remarks

There is not a sharp distinction in the literature or in software implementations between dot plots and strip plots. Commonly, but with many exceptions, a dot plot is drawn as a pointillist analogue of a histogram. Sometimes, dot plot is the name used when data points are plotted in a line, or at most a narrow strip, against a magnitude axis. Strip plot implementations, as here, usually allow stacking options, so that dot plots may be drawn as one choice.

Such plots under these and yet other names go back at least as far as Langren (1644): see Tufte (1997, p.15) and, in much more detail, Friendly et al. (2010). Sasieni and Royston (1996) and Wilkinson (1999) give general discussions and several further references of historical interest. Monkhouse and Wilkinson (1952) used the term dispersion diagrams. Pearson (1956) gives several examples. Dickinson (1963) used the term dispersal graphs. Box et al. (1978) used the term dot diagrams. Chambers et al. (1983), Becker et al. (1988) and Cleveland (1994) used the term one-dimensional scatter plots, as did Lee and Tu (1997) and Reimann et al. (2008). Ryan et al. (1985) discuss their Minitab implementation as dotplots. Cleveland (1985) used the term point graphs. The term oneway plots appears to have been introduced by Computing Resource Center (1985). Feinstein (2002, p.67) uses the term one-way graphs. The term strip plots, or strip charts (e.g. Dalgaard 2002; Venables and Ripley 2002; Robbins 2005; Faraway 2005; Maindonald and Braun 2007), appears traceable to work by J.W. and P.A. Tukey (1990). The term dit plots appears in Ellison (1993, 2001). The term linear plots appears in Hay (1996) and that of line plots in Klemelä (2009) and Schenemeyer and Drew (2011).

Tufte (1974), Berry (1996), Cobb (1998), Griffiths et al. (1998), Bland (2000), Wild and Seber (2000), Robbins (2005), Young et al. (2006), Morgenthaler (2007), Warton (2008) and Keen (2010) show many interesting examples of strip plots.

Hybrid dot-box plots were used by Monkhouse and Wilkinson (1952), Gregory (1963), Matthews (1981), Wilkinson (1992, 2005), Wild and Seber (2000), Ellison (2001), Quinn and Keough (2002) and Young et al. (2006).

Box plots in widely current forms are best known through the work of Tukey (1972, 1977). Similar ideas go back much further. Cox (2009) gives various references. Bibby (1986, pp.56, 59) gave even earlier references to their use by A.L. Bowley in his lectures about 1897 and to his recommendation (Bowley, 1910, p.62; 1952, p.73) to use the minimum, the maximum and the 10, 25, 50, 75 and 90% points as a basis for graphical summary. Keen (2010) also discusses several variants of box plots.

Dot charts (also sometimes called dot plots) in the sense of Cleveland (1984, 1994), as implemented in graph dot, are quite distinct.

See also Cox (2004) for a general discussion of graphing distributions in Stata; Cox (2007) for an implementation of stem-and-leaf plots that bears some resemblance to what is possible with stripplot; and Cox (2009) on how to draw box plots using twoway.

A note for experimental design people

There is no connection between stripplot and the strip plots discussed in design of experiments.

A comparison between stripplot, gr7, oneway and dotplot

stripplot may have either a horizontal or a vertical magnitude axis. With gr7, oneway the magnitude axis is always horizontal. With dotplot the magnitude axis is always vertical.

stripplot and dotplot put descriptive text on the axes. gr7, oneway puts descriptive text under each line of marks.

stripplot and dotplot allow any marker symbol to be used for the data marks. gr7, oneway always shows data marks as short vertical bars, unless jitter() is specified.

stripplot and dotplot interpret jitter() in the same way as does scatter. gr7, oneway interprets jitter() as replacing short vertical bars by sets of dots.

stripplot and dotplot allow tuning of xlabel(). gr7, oneway does not allow such tuning: the minimum and maximum are always shown. Similarly, stripplot and dotplot allow the use of xline() and yline().

dotplot uses only one colour in the body of the graph. stripplot allows several colours in the body of the graph with its separate() option. gr7, oneway uses several colours with several variables.

Neither stripplot nor dotplot has an equivalent of gr7, oneway rescale, which stretches each set of data marks to extend over the whole horizontal range of the graph. Naturally, users could standardise a set of variables in some way before calling stripplot or dotplot.
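The do-file below is a minimal sketch of that pre-processing step. It assumes stripplot is installed. It rescales each of three auto variables to the interval [0, 1] using (x - min) / (max - min), which is close in spirit to gr7, oneway rescale, and then plots the rescaled variables together. The choice of variables and the s_ prefix for the new variables are illustrative only. Standardising to z-scores, for example with egen, std(), would be another reasonable choice.

```stata
* Sketch only: min-max rescale each variable to [0, 1], roughly
* mimicking gr7, oneway rescale. The s_ names are hypothetical.
sysuse auto, clear
foreach v of varlist price mpg weight {
    summarize `v', meanonly
    generate double s_`v' = (`v' - r(min)) / (r(max) - r(min))
    label variable s_`v' "`v'"        // label each strip with the old name
}
stripplot s_price s_mpg s_weight, variablelabels
```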

stripplot and dotplot with option over(groupvar) do not require data to be sorted by groupvar. The equivalent gr7, oneway by(groupvar) does require this.

stripplot allows the option by(byvar), producing separate graph panels according to the groups of byvar. dotplot does not allow the option by(). gr7, oneway allows the option by(byvar), producing separate displays within a single panel. It does not take the values of byvar literally: displays for values 1, 2 and 4 will appear equally spaced.

stripplot with the stack option produces a variant on dotplot. There is by default no binning of data: compare dotplot, nogroup. Binning may be accomplished with the width() option, so that classes are defined by round(varname, width) or, optionally, by width * floor(varname/width) or width * ceil(varname/width): contrast dotplot, ny(). Conversely, stacking may in effect be suppressed in dotplot by setting nx() sufficiently large.

stripplot has options for adding bars showing means and confidence intervals, or boxes showing medians and quartiles. gr7, oneway box shows Tukey-style box plots. dotplot can show mean +/- SD, or median and quartiles, as horizontal lines.

Options

vertical specifies that the magnitude axis should be vertical.

width(#) specifies that values are to be rounded in classes of specified width. By default, classes are defined by round(varname, width). See also the floor and ceiling options just below.

floor or ceiling, in conjunction with width(), specifies rounding by width * floor(varname/width) or width * ceil(varname/width) respectively. Only one may be specified. (These options are included to give some users the minute control they may desire, but if either option produces a marked difference in your plot, you may be rounding too much.)
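The three class definitions can be checked by hand. This sketch assumes stripplot is installed and computes the definitions for mpg with an illustrative width of 5. The names r_mpg, f_mpg and c_mpg are hypothetical. The hand calculation follows the formulas stated above and does not reproduce stripplot's internal code.

```stata
* Sketch: the class definitions for width(), floor and ceiling,
* computed by hand. Width 5 and the new variable names are illustrative.
sysuse auto, clear
local w = 5
generate r_mpg = round(mpg, `w')          // default: nearest multiple of w
generate f_mpg = `w' * floor(mpg / `w')   // floor: multiple of w at or below
generate c_mpg = `w' * ceil(mpg / `w')    // ceiling: multiple of w at or above
list mpg r_mpg f_mpg c_mpg in 1/5
* The corresponding plots
stripplot mpg, width(5) stack
stripplot mpg, width(5) floor stack
```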

stack specifies that data points with identical values are to be stacked, as in dotplot, except that by default there is no binning of data.

height(#) controls the amount of graph space taken up by stacked data points under the stack option above. The default is 0.8. This option will not by itself change the appearance of a plot for a single variable. Note that the height may need to be much smaller or much larger than 1 with over(), given that the latter takes values literally. For example, if your classes are 0(45)360, 36 might be a suitable height.
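The following sketch scales height() to the spacing of the over() values. The hypothetical variable rep10 is 10 times rep78, so its distinct values are 10 apart. height(8) is the default 0.8 multiplied by that spacing. This rule of thumb is an assumption, not a documented rule, although it agrees with the example above, since 36 = 0.8 * 45.

```stata
* Sketch: groups 10 apart, so scale the default height by the spacing.
* rep10 is a hypothetical variable; 0.8 * 10 = 8 is only a starting guess.
sysuse auto, clear
generate rep10 = 10 * rep78
stripplot mpg, over(rep10) stack height(8)
```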

centre or center centres or centers markers for each variable or group on a hidden line.

over(groupvar) specifies that values of varname are to be shown separately by groups defined by groupvar. This option may be specified only with a single variable. If stack is also specified, note that distinct values of any numeric groupvar are assumed to differ by at least 1. Tuning height(), or the prior use of egen, group() label, should fix any problems. See help on egen if desired.

Note that by() is also available as an alternative or complement to over(). See the examples for detail on how over() and by() could be used to show data subdivided by a cross-combination of categories.
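One way to show a cross-combination of categories with over() alone is to map each combination to consecutive integers first. This also satisfies the spacing that stack assumes. The sketch below uses fr as a hypothetical name for the new variable. The by() alternative follows it for comparison.

```stata
* Sketch: egen, group() maps each observed combination of foreign and
* rep78 to 1, 2, ..., with value labels. fr is a hypothetical name.
sysuse auto, clear
egen fr = group(foreign rep78), label
stripplot mpg, over(fr) stack
* The same subdivision using over() together with by()
stripplot mpg, over(rep78) by(foreign) stack
```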

separate() specifies that data points be shown separately according to the distinct classes of the variable specified. Commonly, but not necessarily, this option will be specified together with stack. Note that this option has no effect on any error bar or box plot calculations.

bar specifies that bars be added showing means and confidence intervals. Bar information is calculated using ci. bar(bar_options) may be used to specify details of the means and confidence intervals. bar_options are

Various options of ci: level(), poisson, binomial, exact, wald, agresti, wilson, jeffreys and exposure(). For example, bar(binomial jeffreys) specifies those options of ci.

mean(scatter_options) may be used to control the rendering of the symbol for the mean. For example, bar(mean(mcolor(red) ms(sh))) specifies the use of red small hollow squares.

Options of twoway rcap may be used to control the appearance of the bar. For example, bar(lcolor(red)) specifies red as the bar colour.

These kinds of options may be combined.
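This sketch combines one option of each kind within bar(): a ci option (level()), mean() for the symbol for the mean, and a twoway rcap option (lcolor()). It assumes stripplot is installed. The 90% level and the colours are arbitrary choices for illustration.

```stata
* Sketch: a ci option, a mean() option and an rcap option combined in bar()
sysuse auto, clear
stripplot mpg, over(rep78) stack height(0.5) ///
    bar(level(90) mean(mcolor(red) ms(sh)) lcolor(red))
```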

box specifies that boxes be added showing medians and quartiles. Box information is calculated using egen, median() and egen, pctile(). box(box_options) may be used to specify options of twoway rbar to control the appearance of the box. For example, box(bfcolor(eltgreen)) specifies eltgreen as the box fill colour. The defaults are bcolor(none) barwidth(0.4). Note that the length of each box is the interquartile range or IQR.

iqr[(#)] specifies that spikes be added to boxes, extending as far as the largest or smallest value within # IQR of the upper or lower quartile respectively. Plain iqr without argument yields a default of 1.5 for #.

pctile(#) specifies that spikes be added to boxes, extending as far as the # and 100 - # percentiles.

whiskers() specifies options of twoway rspike that may be used to modify the appearance of spikes added to boxes.

iqr, iqr(), pctile() and whiskers() have no effect without box or box(). iqr or iqr() may not be combined with pctile().
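The quantities behind the box and whisker options can be computed by hand to see what they display. This sketch follows the rules stated above, using egen within groups of rep78. It is not stripplot's own code, and small differences, for example in how percentiles are defined, are possible. All new variable names are hypothetical.

```stata
* Sketch: box and whisker quantities by hand, within groups of rep78.
* Variable names med, lq, uq, iqr, lo_iqr, hi_iqr, p10, p90 are hypothetical.
sysuse auto, clear
egen med = median(mpg), by(rep78)
egen lq  = pctile(mpg), p(25) by(rep78)
egen uq  = pctile(mpg), p(75) by(rep78)
generate iqr = uq - lq                       // box length
* iqr (default # = 1.5): most extreme values within 1.5 IQR of the quartiles
egen lo_iqr = min(cond(mpg >= lq - 1.5 * iqr, mpg, .)), by(rep78)
egen hi_iqr = max(cond(mpg <= uq + 1.5 * iqr, mpg, .)), by(rep78)
* pctile(10): spikes run to the 10th and 90th percentiles
egen p10 = pctile(mpg), p(10) by(rep78)
egen p90 = pctile(mpg), p(90) by(rep78)
tabdisp rep78, cellvar(lq med uq lo_iqr hi_iqr)
* Compare with the plots
stripplot mpg, over(rep78) box iqr
stripplot mpg, over(rep78) box pctile(10)
```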

bar[()] and box[()] may not be combined.

boffset() may be used to control the position of bars or boxes. By default, bars are positioned 0.2 unit to the left of (or below) the base line for strips, and boxes are positioned under the base line for strips. Negative arguments specify positions to the left of or below the base line, and positive arguments specify positions to the right of or above it.

variablelabels specifies that multiple variables be labelled by their variable labels. The default is to use variable names.

plot(plot) provides a way to add other plots to the generated graph; see help plot_option (Stata 8 only).

addplot(plot) provides a way to add other plots to the generated graph; see help addplot_option (Stata 9 up).

graph_options are options of scatter, including by(), on which see by_option. Note that by(, total) is not supported with bars or boxes. jitter() is often helpful.

Examples

(Stata's auto data)

. sysuse auto, clear
. stripplot mpg
. stripplot mpg, aspect(0.05)
. stripplot mpg, over(rep78)
. stripplot mpg, over(rep78) by(foreign)
. stripplot mpg, over(rep78) vertical
. stripplot mpg, over(rep78) vertical stack
. stripplot mpg, over(rep78) vertical stack h(0.4)

. gen pipe = "|"
. stripplot mpg, ms(none) mlabpos(0) mlabel(pipe) mlabsize(*2) stack
. stripplot price, over(rep78) ms(none) mla(pipe) mlabpos(0)
. stripplot price, over(rep78) w(200) stack h(0.4)

(5 here is empirical: adjust for your variable)

. gen price1 = price - 5
. gen price2 = price + 5
. stripplot price, over(rep78) box ms(none) addplot(rbar price1 price2 rep78, horizontal barw(0.2) bcolor(gs6))

. stripplot mpg, over(rep78) stack h(0.5) bar(lcolor(red))
. stripplot mpg, over(rep78) box
. stripplot mpg, over(rep78) box(bfcolor(eltgreen)) boffset(-0.3)
. stripplot mpg, over(rep78) box boffset(-0.3)
. stripplot mpg, over(rep78) box(bfcolor(eltgreen) barw(0.2)) boffset(-0.2) stack h(0.5)
. stripplot mpg, over(rep78) box(bfcolor(black) blcolor(white) barw(0.2)) boffset(-0.2) stack h(0.5)
. stripplot mpg, over(rep78) box(bfcolor(black) blcolor(white) barw(0.2)) iqr boffset(-0.2) stack h(0.5)
. stripplot mpg, over(rep78) box(bfcolor(black) blcolor(white) barw(0.2)) pctile(10) whiskers(recast(rbar) bcolor(black) barw(0.02)) boffset(-0.2) stack h(0.5)

. gen digit = mod(mpg, 10)
. stripplot mpg, stack vertical mla(digit) mlabpos(0) ms(i) over(foreign) height(0.2) yla(, ang(h)) xla(, noticks)
. stripplot mpg, stack vertical mla(digit) mlabpos(0) ms(i) by(foreign) yla(, ang(h))

. stripplot mpg, over(rep78) separate(foreign) stack
. stripplot mpg, by(rep78) separate(foreign) stack

. gen rep78_1 = rep78 - 0.1
. egen mean = mean(mpg), by(foreign rep78)
. stripplot mpg, over(rep78) by(foreign, compact) addplot(scatter rep78_1 mean, ms(T)) stack

. clonevar rep78_2 = rep78
. replace rep78_2 = cond(foreign, rep78 + 0.15, rep78 - 0.15)
. stripplot mpg, over(rep78_2) separate(foreign) yla(1/5) jitter(1 1)

(Challenger shuttle O-ring damage)

. logit damage temperature
. predict pre
. stripplot damage, over(temperature) stack ms(sh) height(0.4) addplot(mspline pre temperature, bands(20))

(Stata's blood pressure data)

. sysuse bplong, clear
. egen group = group(age sex), label
. stripplot bp*, bar over(when) by(group, compact col(1) note("")) ysc(reverse) subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) ytitle("") xtitle(Blood pressure (mm Hg))

Acknowledgments

Philip Ender helpfully identified a bug. William Dupont offered encouragement. Kit Baum nudged me into implementing separate(). Maarten Buis made a useful suggestion about this help. Ronán Conroy suggested adding whiskers. He also found two bugs. Marc Kaulisch asked a question which led to more emphasis on the use of by() and the blood pressure example. David Airey found another bug. Oliver Jones asked a question which led to an example of the use of twoway rbar to mimic pipe or barcode symbols. Fredrik Norström found yet another bug.

Author

Nicholas J. Cox, Durham University, U.K.
n.j.cox@durham.ac.uk

References

Becker, R.A., J.M. Chambers and A.R. Wilks. 1988. The new S language: A programming environment for data analysis and graphics. Pacific Grove, CA: Wadsworth and Brooks/Cole.

Berry, D.A. 1996. Statistics: a Bayesian perspective. Belmont, CA: Duxbury.

Bibby, J. 1986. Notes towards a history of teaching statistics. Edinburgh: John Bibby (Books).

Bland, M. 2000. An introduction to medical statistics. Oxford: Oxford University Press.

Bowley, A.L. 1910. An elementary manual of statistics. London: Macdonald and Evans. (seventh edition 1952)

Box, G.E.P., W.G. Hunter and J.S. Hunter. 1978. Statistics for experimenters: an introduction to design, data analysis, and model building. New York: John Wiley. (second edition 2005)

Chambers, J.M., W.S. Cleveland, B. Kleiner and P.A. Tukey. 1983. Graphical methods for data analysis. Belmont, CA: Wadsworth.

Cleveland, W.S. 1984. Graphical methods for data presentation: full scale breaks, dot charts, and multibased logging. American Statistician 38: 270-280.

Cleveland, W.S. 1985. Elements of graphing data. Monterey, CA: Wadsworth.

Cleveland, W.S. 1994. Elements of graphing data. Summit, NJ: Hobart Press.

Cobb, G.W. 1998. Introduction to design and analysis of experiments. New York: Springer.

Computing Resource Center. 1985. STATA/Graphics user's guide. Los Angeles, CA: Computing Resource Center.

Cox, N.J. 2004. Speaking Stata: Graphing distributions. Stata Journal 4(1): 66-88.

Cox, N.J. 2007. Speaking Stata: Turning over a new leaf. Stata Journal 7(3): 413-433.

Cox, N.J. 2009. Speaking Stata: Creating and varying box plots. Stata Journal 9(3): 478-496.

Dalgaard, P. 2002. Introductory statistics with R. New York: Springer.

Dickinson, G.C. 1963. Statistical mapping and the presentation of statistics. London: Edward Arnold. (second edition 1973)

Ellison, A.M. 1993. Exploratory data analysis and graphic display. In Scheiner, S.M. and J. Gurevitch (eds) Design and analysis of ecological experiments. New York: Chapman & Hall, 14-45.

Ellison, A.M. 2001. Exploratory data analysis and graphic display. In Scheiner, S.M. and J. Gurevitch (eds) Design and analysis of ecological experiments. New York: Oxford University Press, 37-62.

Faraway, J.J. 2005. Linear models with R. Boca Raton, FL: Chapman and Hall/CRC.

Feinstein, A.R. 2002. Principles of medical statistics. Boca Raton, FL: Chapman and Hall/CRC.

Friendly, M., P. Valero-Mora and J.I. Ulargui. 2010. The first (known) statistical graph: Michael Florent van Langren and the "secret" of longitude. American Statistician 64: 174-184. (supplementary materials online)

Gregory, S. 1963. Statistical methods and the geographer. London: Longmans. (later editions 1968, 1973, 1978; publisher later Longman)

Griffiths, D., W.D. Stirling and K.L. Weldon. 1998. Understanding data: principles and practice of statistics. Brisbane: John Wiley.

Hay, I. 1996. Communicating in geography and the environmental sciences. Melbourne: Oxford University Press. (later editions 2002, 2006)

Keen, K.J. 2010. Graphics for statistics and data analysis with R. Boca Raton, FL: CRC Press.

Klemelä, J. 2009. Smoothing of multivariate data: Density estimation and visualization. Hoboken, NJ: John Wiley.

Langren, Michael Florent van. 1644. La verdadera longitud por mar y tierra. Antwerp.

Lee, J.J. and Z.N. Tu. 1997. A versatile one-dimensional distribution plot: the BLiP plot. American Statistician 51: 353-358.

Maindonald, J.H. and W.J. Braun. 2007. Data analysis and graphics using R: an example-based approach. Cambridge: Cambridge University Press.

Matthews, J.A. 1981. Quantitative and statistical approaches to geography: A practical manual. Oxford: Pergamon.

Monkhouse, F.J. and H.R. Wilkinson. 1952. Maps and diagrams: Their compilation and construction. London: Methuen. (later editions 1963, 1971)

Morgenthaler, S. 2007. Introduction à la statistique. Lausanne: Presses polytechniques et universitaires romandes.

Pearson, E.S. 1956. Some aspects of the geometry of statistics: the use of visual presentation in understanding the theory and application of mathematical statistics. Journal of the Royal Statistical Society A 119: 125-146.

Quinn, G.P. and M.J. Keough. 2002. Experimental design and data analysis for biologists. Cambridge: Cambridge University Press.

Reimann, C., P. Filzmoser, R.G. Garrett and R. Dutter. 2008. Statistical data analysis explained: applied environmental statistics with R. Chichester: John Wiley.

Robbins, N.B. 2005. Creating more effective graphs. Hoboken, NJ: John Wiley.

Ryan, B.F., B.L. Joiner and T.A. Ryan. 1985. Minitab handbook. Boston, MA: Duxbury.

Sasieni, P.D. and P. Royston. 1996. Dotplots. Applied Statistics 45: 219-234.

Schenemeyer, J.H. and L.J. Drew. 2011. Statistics for earth and environmental scientists. Hoboken, NJ: John Wiley.

Tufte, E.R. 1974. Data analysis for politics and policy. Englewood Cliffs, NJ: Prentice-Hall.

Tufte, E.R. 1997. Visual explanations: images and quantities, evidence and narrative. Cheshire, CT: Graphics Press.

Tukey, J.W. 1972. Some graphic and semi-graphic displays. In Bancroft, T.A. and S.A. Brown (eds) Statistical papers in honor of George W. Snedecor. Ames, IA: Iowa State University Press, 293-316. (also accessible at http://www.edwardtufte.com/tufte/tukey)

Tukey, J.W. 1977. Exploratory data analysis. Reading, MA: Addison-Wesley.

Tukey, J.W. and P.A. Tukey. 1990. Strips displaying empirical distributions: I. Textured dot strips. Bellcore Technical Memorandum.

Venables, W.N. and B.D. Ripley. 2002. Modern applied statistics with S. New York: Springer.

Warton, D.I. 2008. Raw data graphing: an informative but under-utilized tool for the analysis of multivariate abundances. Austral Ecology 33: 290-300.

Wild, C.J. and G.A.F. Seber. 2000. Chance encounters: a first course in data analysis and inference. New York: John Wiley.

Wilkinson, L. 1992. Graphical displays. Statistical Methods in Medical Research 1: 3-25.

Wilkinson, L. 1999. Dot plots. American Statistician 53: 276-281.

Wilkinson, L. 2005. The grammar of graphics. New York: Springer.

Young, F.W., P.M. Valero-Mora and M. Friendly. 2006. Visual statistics: Seeing data with interactive graphics. Hoboken, NJ: John Wiley.

Also see

On-line: help for dotplot, gr7 oneway, histogram, beamplot (if installed)