Sparkline-type plots
sparkline yvarlist xvar [if] [in] [ , by(varname [, byopts]) separate(varname) height(#) limits(min max) format(fmt) extremes[(scatter_options)] extremeslabel[(marker_label_options)] flipy variablelabels line_options ]
sparkline yvar xvar [if] [in] [ , over(varname) by(varname [, byopts]) separate(varname) height(#) limits(min max) format(fmt) extremes[(scatter_options)] extremeslabel[(marker_label_options)] flipy variablelabels line_options ]
Description
sparkline graphs sparkline-type plots for one or more y variables against a single x variable. Typically, plots for different y variables or for different subsets of one y variable are stacked vertically into one image. Commonly, but not necessarily, such plots are multiple time series, so that the x variable is a time variable.
Slogans
Graphics can be shrunk way down. E.R. Tufte (1983/2001, p.169)
Wider-than-tall shapes usually make it easier for the eye to follow from left to right. J.W. Tukey (1977, p.129)
Remarks
sparkline takes its name from the discussion in Tufte (2006, pp.44-63). Sparklines are typically simple in design, sparing of space and rich in data, but they include several quite different kinds of graph otherwise. The most common kind of example, however, shows several wider-than-tall time series stacked vertically. By any reasonably broad definition, sparklines have long been standard in several fields, including climatology, ecology (e.g. pollen diagrams) and physiology (notably electroencephalography and electrocardiography). Tufte provided a memorable and evocative new name and an excellent provocative discussion.
sparkline is intended to support certain line plots and related graphs presented in a sparkline style. The implementation is indicative, not definitive. It is not the intention that sparkline provides Stata support for every kind of graph discussed elsewhere under this heading. Conversely, sparkline supports some kinds of graphs that might not be considered sparklines in Tufte's sense.
There are two leading situations in which sparkline may be useful. In both there is a single x variable. As mentioned, x is commonly but not necessarily a time variable.
One y variable and a subdivision by a third variable into subsets (e.g. panel data). The subdivision is usually indicated to sparkline by the over() option.
Two or more y variables.
In both situations, subsets or variables are plotted separately and stacked vertically. Values are scaled to (value - minimum) / (maximum - minimum) so that they lie in [0,1] and each subset or variable is assigned the same vertical space. Axis labelling is, however, in terms of the observed or optionally specified minimum and maximum.
Note that multiple subsets or variables on the y axis are plotted centred on y = 1, 2, 3, etc. Otherwise data are plotted as supplied.
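The scaling and stacking just described can be reproduced by hand. The following is a minimal sketch, not sparkline's own code: it assumes the default height() of 0.7, assumes the first y variable is the one centred on y = 1, and uses the hypothetical variable names scaled_* and plot_*, which sparkline itself does not create.

```stata
* Sketch only: reproduce the documented scaling for two series by hand.
* scaled_* and plot_* are hypothetical names chosen for illustration.
sysuse auto, clear
local h = 0.7
local k = 0
foreach v of varlist mpg weight {
    local ++k
    quietly summarize `v'
    * (value - minimum) / (maximum - minimum), so that values lie in [0,1]
    generate double scaled_`v' = (`v' - r(min)) / (r(max) - r(min))
    * centre series k on y = k, spanning k - h/2 to k + h/2
    generate double plot_`v' = `k' + `h' * (scaled_`v' - 0.5)
}
summarize scaled_* plot_*
```

With these settings the summary should show plot_mpg running from 0.65 to 1.35 and plot_weight from 1.65 to 2.35, which is where the minimum and maximum of each series would be drawn.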
Note that it is not an error to specify a single variable y without the over() option. The plot so produced would have the same style as other plots produced by sparkline but contain only a single set of values. More usefully, subplots could be produced by also using the by() option. Note that specifying by() does not itself trigger scaling to [0,1].
Some sparkline displays show elaborate mixes of text and graphical displays and would require more complex Stata code or more work integrating text and graphics than is supported here. The Examples indicate that previously prepared value labels may be used to carry further text. The same device could be used with variable labels. See also Cox (2008, 2009).
For other broadly similar plots, see the following, in addition to Tufte's references and the page on his website cited below:
climatology examples: Lamb (1972)
aligned bar charts: Mackinlay (1986)
survey plots: Lohninger (1994), Hoffman and Grinstein (2002, p.57) and Grinstein et al. (2002, pp.143, 152, 155, 158, 162, 163, 166)
table lens: Rao and Card (1994), Pirolli and Rao (1996)
multiline graphs: Hoffman and Grinstein (2002, p.52)
Note 21 Jan 2013: more references will be added.
Options
over() indicates a third variable (e.g. a panel identifier) to subdivide data. This option may only be used when there is a single y variable.
by() is the usual by_option provided for completeness. Typically it is less useful than the over() option for producing sparklines, but it may be combined with two or more y variables. See Remarks above and Examples below.
separate() indicates that data are to be shown differently for different values of its argument. Otherwise it has no effect on values shown.
height() specifies the height of each vertical zone within which each of the multiple variables or subsets is shown. The default is 0.7, so that each series takes up 70% of the available space.
limits() specifies minimum and maximum to use in scaling all subsets or variables. The default is to use the observed minimum and maximum in each case. Typically this option is useful when values are broadly similar in each case and it is desired to specify exactly similar vertical scales or just to ensure that axis labels are simple. Placement of labels can be based on knowing that series are centred at y = 1, 2, 3, ... and that the minimum and maximum are plotted at half the height below and half the height above those levels, respectively. See also height() option above.
Note that it is not possible to specify a different minimum and maximum for different subsets or variables. Note also that there is no check on whether the specified minimum and maximum are exceeded by any of the data. The latter imparts both some risk (of overlapping series) and some flexibility.
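Because limits() makes no check against the data, a check can be made by hand beforehand. The following is a minimal sketch using the stocks data from the Examples; the limits -0.2 and 0.2 are those used there, and the local macro names lo and hi are arbitrary.

```stata
* Sketch only: report any series whose range falls outside intended limits().
webuse stocks, clear
local lo -0.2
local hi 0.2
foreach v of varlist toyota nissan honda {
    quietly summarize `v'
    if r(min) < `lo' | r(max) > `hi' {
        display as text "`v' ranges from " r(min) " to " r(max) ", outside limits(`lo' `hi')"
    }
}
```

Any series reported here would extend beyond its own vertical zone if plotted with those limits.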
format() specifies a numeric format controlling the display of axis labels showing the maximum and minimum of each series. The default is the display format of the first y variable specified. See help for format.
extremes specifies that the minimum and maximum of each subset or variable be flagged by marker symbols. The default symbol is O. scatter_options may be specified to tune the display. See help for scatter.
extremeslabel specifies that the minimum and maximum of each subset or variable be shown as marker labels. By default minima are shown at clock position 4 and maxima at clock position 10. marker_label_options may be specified to tune the display. See help for marker_label_options. That said, fine tuning with the Graph Editor may also be desired if this option is found useful.
flipy specifies that what is shown on the left-hand y axis (axis 1) be shown on the right-hand axis (axis 2), and vice versa.
variablelabels specifies that variable labels be shown to describe two or more y variables. The default is to use variable names.
line_options are options of line.
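When height() and limits() are both specified, positions for axis labels follow from the rule that series k is centred on y = k, with its minimum at k - height/2 and its maximum at k + height/2. The following is a minimal sketch of building such labels in a loop rather than typing them out. It assumes sparkline is installed and that the stocks data are available (Stata 12 up); the local macro names are arbitrary.

```stata
* Sketch only: build yla() positions and text for three series.
webuse stocks, clear
local h = 0.8
local lo -0.2
local hi 0.2
local mid = (`lo' + `hi') / 2
local labels
forvalues k = 1/3 {
    * minimum is drawn at k - h/2 and maximum at k + h/2
    local ymin = `k' - `h'/2
    local ymax = `k' + `h'/2
    local labels `labels' `ymin' "`lo'" `k' "`mid'" `ymax' "`hi'"
}
sparkline toyota nissan honda t, limits(`lo' `hi') height(`h') yla(`labels', axis(2)) yli(1.5 2.5, lstyle(grid))
```

The loop is intended to generate the same label positions as those typed by hand in the last of the Examples for the stocks data.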
Examples
. set scheme s1color
. webuse grunfeld, clear
. sparkline invest mvalue kstock year if company == 1
. sparkline invest year, over(company)
. sparkline invest year, over(company) extremes

. sparkline invest year, by(company) extremes
. sparkline invest year, by(company, col(2) compact) subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) extremes ysc(log)
. sparkline invest mvalue kstock year, by(company) xtitle("") extremes
. sparkline invest mvalue kstock year, by(company, note("")) xtitle("") extremes extremeslabel ysc(r(0.3 3.7))

. bysort company (year) : gen clabel = string(invest[_N], "%9.0g") + " " + string(company)
. * for labmask: net describe gr0034, from(http://www.stata-journal.com/software/sj8-2)
. labmask company, values(clabel)
. sparkline invest year, over(company) flipy xtick(1935/1954) xla(1935(5)1950 1954, tlength(*1.6)) extremes

. sysuse auto, clear
. gen gpm = 1/mpg
. sort rep78 gpm weight
. gen observation = _n
. sparkline rep78 gpm weight observation, separate(rep78) recast(connect) xla(1 10(10)70 74)

. * iris data in Stata 11 up
. webuse iris, clear
. pca sep* pet*
. predict PC1
. sparkline sep* pet* PC1, separate(iris) recast(scatter) legend(on row(1)) lc(none ..) ms(Oh Dh Th) variablelabels format(%3.1f)
. sparkline sep* pet* PC1, separate(iris) recast(scatter) legend(on row(1)) lc(none ..) ms(Oh Dh Th) yla(1 "sepal length" 2 "sepal width" 3 "petal length" 4 "petal width", axis(2)) subtitle(all measurements in cm, place(w) size(*0.8)) format(%3.1f) yli(1.5 2.5 3.5, lstyle(grid)) flipy

. * stocks data in Stata 12 up
. webuse stocks, clear
. sparkline toyota nissan honda t
. sparkline toyota nissan honda t, limits(-0.2 0.2)
. sparkline toyota nissan honda t, limits(-0.2 0.2) height(0.8) yla(0.6 "-0.2" 1 "0" 1.4 "0.2" 1.6 "-0.2" 2 "0" 2.4 "0.2" 2.6 "-0.2" 3 "0" 3.4 "0.2", axis(2) labgap(*1) ticks) yli(1.5 2.5, lstyle(grid))
Author
Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk
Acknowledgments
Ariel Linden and Vince Wiggins provided encouragement.
References
Cox, N.J. 2008. Speaking Stata: Between tables and graphs. Stata Journal 8: 269-289. http://www.stata-journal.com/sjpdf.html?articlenum=gr0034
Cox, N.J. 2009. Speaking Stata: Paired, parallel, or profile plots for changes, correlations, and other comparisons. Stata Journal 9: 621-639. http://www.stata-journal.com/article.html?article=gr0041
Grinstein, G.G., Hoffman, P.E., Pickett, R.M. and Laskowski, S.J. 2002. Benchmark development for the evaluation of visualization for data mining. In Fayyad, U., Grinstein, G.G. and Wierse, A. (Eds). Information visualization in data mining and knowledge discovery. San Francisco: Morgan Kaufmann, 129-176.
Hoffman, P.E. and Grinstein, G.G. 2002. A survey of visualizations for high-dimensional data mining. In Fayyad, U., Grinstein, G.G. and Wierse, A. (Eds). Information visualization in data mining and knowledge discovery. San Francisco: Morgan Kaufmann, 47-82.
Lamb, H.H. 1972. Climate: present, past and future. Volume 1: Fundamentals and climate now. London: Methuen.
Lohninger, H. 1994. INSPECT: A program system to visualize and interpret chemical data. Chemometrics and Intelligent Laboratory Systems 22: 147-153.
Mackinlay, J.D. 1986. Automating the design of graphical presentations of relational information. ACM Transactions on Graphics 5: 111-141.
Pirolli, P. and Rao, R. 1996. Table lens as a tool for making sense of data. In Catarci, T., Costabile, M.F., Levialdi, S. and Santucci, G. (Eds). Workshop on Advanced Visual Interfaces: AVI-96. New York: Association for Computing Machinery, 67-80.
Rao, R. and Card, S.K. 1994. The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information. Proceedings of CHI '94, ACM Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery, 318-322 and 481-482.
Tufte, E.R. 1983, 2nd edition 2001. The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tufte, E.R. 2006. Beautiful evidence. Cheshire, CT: Graphics Press.
Sparkline theory and practice. http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR (accessed 15 January 2013)
Tukey, J.W. 1977. Exploratory data analysis. Reading, MA: Addison-Wesley.
Also see
Online: line, tsline, xtline, dotplot, stripplot (if installed), tabplot (if installed)