Density distribution sunflower plots
sunflower yvar xvar [weight] [if exp] [in range] [, binwidth(#d) xcenter(#x) ycenter(#y) light(#a) dark(#b) petalweight(#w) pointsize(#) lightsize(#) darksize(#) dotsize(#) background(#) saving(filename [, replace]) xsize(#) ysize(#) notable nokey graph_options ]
sunflower draws density distribution sunflower plots (Dupont and Plummer 2002). These plots are useful for displaying bivariate data whose density is too great for conventional scatter plots to be effective. A sunflower is a number of line segments of equal length, called petals, that radiate from a central point. There are two varieties of sunflowers: light and dark. Each petal of a light sunflower represents one observation. Each petal of a dark sunflower represents a fixed number of observations, which the user can set. The program uses dark and light sunflowers to represent high and medium density regions of the data, respectively, and dots or circles to represent individual observations in low density regions.
The program first divides the plane defined by the variables yvar and xvar into contiguous regular hexagonal bins of equal size. The width of each bin is given by #d and is specified in the same units as xvar. The program then counts the number of data points that fall within each bin. Three values, #a, #b, and #w, determine how these points are represented:
1. When there are fewer than #a points in a bin, they are displayed as individual dots or circles as in a conventional scatter plot.
2. When there are at least #a but fewer than #b points in a bin, they are represented by a light sunflower.
3. When there are #b or more points in a bin they are represented by a dark sunflower. #w specifies the number of observations represented by each petal of a dark sunflower. More precisely, if a dark sunflower bin contains n points then the number of petals on its sunflower will equal n / #w rounded to the nearest integer. A sunflower with one petal is represented by a dot in the center of its bin.
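The three rules above can be sketched in a few lines of Stata. This is an illustration of the classification only, not the code that sunflower itself runs; the local macros n, a, b, and w and the values assigned to them are hypothetical.

```stata
* Illustration only: classify one bin by its count of points.
* n = points in the bin, a = light(#a), b = dark(#b), w = petalweight(#w).
* All four values are hypothetical.
local n = 47
local a = 3
local b = 13
local w = 4

if `n' < `a' {
    display "`n' individual points"
}
else if `n' < `b' {
    display "light sunflower with `n' petals"
}
else {
    * n / w rounded to the nearest integer
    local petals = round(`n'/`w', 1)
    display "dark sunflower with `petals' petals"
}
```

With 47 points and a petal weight of 4, the bin is dark and 47 / 4 = 11.75 rounds to 12 petals.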
Running sunflower Under Stata Versions 7 and 8
This program was written to run under Stata version 7. We recommend that it only be used by people running version 7. Stata Corp has released a new version of this program that runs under version 8. Their program includes numerous enhancements over our original program including hexagonal bin shading, the ability to overlay additional curves on the sunflower plot, and other graph options that have been added to version 8 programs. To use their program you must be running Stata Version 8.2 or later. Type
. help sunflower
in the version 8 Stata Command window for further details.
sunflower uses pens 1 through 6. Pen 1 draws and labels the axes of the graph. Pen 2 draws the circles or dots for individual data points. Pens 3 and 5 draw the light and dark sunflowers, respectively. Pens 4 and 6 draw circular colored backgrounds behind the light and dark sunflowers, respectively. The user should choose the color and thickness of pens 3 through 6 to distinguish between light and dark sunflowers. Pens 4 and 6 give additional weight and contrast to the light and dark sunflowers. Note that pens 3 and 4 must have different colors in order for petals on light sunflowers to be visible. Similarly, pens 5 and 6 must have different colors. We recommend that pens 3 through 6 be chosen so that the color darkness increases with increasing data density. The example fig2.do, given below, makes reasonable choices for these pen colors.
Frequency weights may be specified.
All sunflowers should lie entirely above the x axis and to the right of the y axis. If you choose a small bin width, or use the default bin width, this should occur automatically. With a large bin width and a cluster of points near an axis, it is possible for a sunflower to intersect the edge of the graph window. When this happens, the program issues an error message. The problem can be fixed by using the xlabel and ylabel options to extend the axes so that no sunflower touches an axis.
Options
binwidth(#d) sets the horizontal width of each bin to #d. This width is specified in the same units as xvar. The default bin width is (max of xvar - min of xvar) / 40. The bin height in units of yvar is determined by the program and depends on the bin width, the aspect ratio of the x and y axes, the range of values observed for xvar and yvar, and the aspect ratio of the graph window. Note that the shape of the bins is always that of a regular hexagon.
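To see what the default bin width would be before choosing your own, the formula above can be evaluated by hand. The following is a minimal sketch that assumes a dataset containing the variable displ (as in the examples below) is already in memory; it only reproduces the documented formula and does not call sunflower.

```stata
* Illustration only: evaluate (max of xvar - min of xvar) / 40 for xvar = displ.
* Assumes displ exists in the dataset currently in memory.
summarize displ
display "default bin width = " (r(max) - r(min)) / 40
```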
xcenter(#x) and ycenter(#y) specify the center of a bin to be at (#x, #y). The default values of #x and #y are the median values of xvar and yvar, respectively. The centers of the other bins are implicitly defined by (#x, #y) together with the bin width #d.
light(#a) specifies #a to be the minimum number of points needed in a bin to generate a light sunflower. The default value of #a is 3.
dark(#b) specifies #b to be the minimum number of points needed in a bin to generate a dark sunflower. The default value of #b is 13.
petalweight(#w) specifies #w to be the number of observations represented by each petal of a dark sunflower. The default value of #w is chosen so that the maximum number of petals on a dark sunflower equals 14.
pointsize(#) specifies the size of the circle representing individual points as a percent of the default size. The default is 100 percent.
lightsize(#) specifies the size of the light sunflowers as a percent of the maximum permitted size, which is #d / 2. The default is 80 percent, which produces light sunflower petals of length 0.8 * #d / 2.
darksize(#) specifies the size of the dark sunflowers as a percent of the maximum permitted size, which is #d / 2. The default is 95 percent, which produces dark sunflower petals of length 0.95 * #d / 2.
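The petal length implied by lightsize() and darksize() is the stated percentage of #d / 2. As a small worked example, with hypothetical (non-default) values binwidth(10), lightsize(60), and darksize(90):

```stata
* Illustration only: petal length = (percent / 100) * d / 2.
* d, lightpct, and darkpct are hypothetical values chosen for this example.
local d = 10
local lightpct = 60
local darkpct = 90
display "light petal length = " (`lightpct'/100) * `d' / 2
display "dark petal length  = " (`darkpct'/100) * `d' / 2
```

Here the light petals are 3 units long and the dark petals 4.5 units long, measured in the units of xvar.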
dotsize(#) specifies the size of the dot drawn at the center of each sunflower as a percent of the default size. The default is 100 percent.
background(#r) sets the size of the colored circular background behind sunflowers as a number between 0 and 1. The default is 1, in which case the edge of each background passes through the vertices of its bin. When #r = 0 the edge of each background kisses the edges of its bin. Let r0 denote the background radius when #r = 0 and let r1 denote this radius when #r = 1. Then r0 = #d / 2 and r1 = r0 / cos(pi / 6). In general, this radius equals r0 + (#r) * (r1 - r0).
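The background radius formula can be checked numerically. The sketch below uses a hypothetical bin width of 10 and background(0.5); the macro names are illustrative and are not part of the program.

```stata
* Illustration only: radius = r0 + r * (r1 - r0), with
* r0 = d / 2 (circle touching the bin's edges) and
* r1 = r0 / cos(pi / 6) (circle through the bin's vertices).
* d and r are hypothetical values.
local d = 10
local r = 0.5
local r0 = `d' / 2
local r1 = `r0' / cos(_pi / 6)
display "background radius = " `r0' + `r' * (`r1' - `r0')
```

For these values r0 = 5 and r1 is roughly 5.77, so the background radius is about 5.39, halfway between the two circles.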
saving(filename [, replace]) saves the graph in a file. If you do not specify an extension, .gph will be assumed.
xsize(#) specifies the width, in inches, of the graph image. The default is the current Stata default size in effect when sunflower is called.
ysize(#) specifies the height, in inches, of the graph image. The default is the current Stata default size in effect when sunflower is called.
notable specifies that the summary table produced by the sunflower command be omitted.
nokey causes the sunflower key at the top of the graph to be omitted.
graph_options are the same as those allowed for scatter plots with the graph command, except that the connect(), symbol(), pen(), trim(), psize(), bands(), and jitter() options may not be used.
. sunflower mpg displ
. sunflower mpg displ, xc(100) yc(100) binwid(10)
. sunflower mpg weight, binwid(300) xlab ylab petal(2) light(3) dark(15)
---- fig2.do ---------------------------
log using "fig2.log", replace
*
* fig2.do
*
* N.B. The gprefs statements given below will
*      override your Custom 1 color settings!
*
* This is a Stata program that produces figure 2 from
*
* Dupont WD and Plummer WD Jr. (2002) "Density Distribution Sunflower Plots"
* Submitted for publication. URL pending.
*
*
use "http://www.mc.vanderbilt.edu/prevmed/wddtext/data/2.20.Framingham.dta", clear
*
* Drop ten extreme outliers.
*
drop if bmi>55
*
* Color choices for custom1
*
* The following color choices attempt to make an analogy between data density
* and a topographical map of an island in the sea. Hence individual data points
* are blue (the sea), light sunflowers are brown on green (low altitude verdant
* regions), and dark flowers are black on brown (high altitude rocky areas).
*
*  Pen   Color         hue sat lum:  red green blue
*  -----------------------------------------------
*   1    black:              0   0:    0    0    0
*   2    true blue:    160 240 120:    0    0  255
*   3    dark brown:    20 240  60:  128   64    0
*   4    light green:   50 240 200:  234  255  170
*   5    black:              0   0:    0    0    0
*   6    light brown:   20 240 120:  255  128    0
*
* All pen widths are 9
* Background color is white
* Graph size is 6 x 4 inches
*
* Define color scheme custom1 as specified above
*
gprefs set window scheme custom1
gprefs set custom1 background_color 255 255 255
gprefs set custom1 pen1_color 0 0 0
gprefs set custom1 pen2_color 0 0 255
gprefs set custom1 pen3_color 128 64 0
gprefs set custom1 pen4_color 234 255 170
gprefs set custom1 pen5_color 0 0 0
gprefs set custom1 pen6_color 255 128 0
gprefs set custom1 pen1_thick 9
gprefs set custom1 pen2_thick 9
gprefs set custom1 pen3_thick 9
gprefs set custom1 pen4_thick 9
gprefs set custom1 pen5_thick 9
gprefs set custom1 pen6_thick 9
gprefs set window xsize 6
gprefs set window ysize 4
*
* Draw sunflower plot of diastolic blood pressure by body mass index
* using the Framingham data (Levy 1999).
*
sunflower dbp bmi, bin(.85) xlabel(20 25 to 50) ylabel(50 70 to 150) gap(3)
log off
---- end fig2.do -----------------------
This program was designed by William D. Dupont and W. Dale Plummer Jr. It was written by W. Dale Plummer Jr.
It may be downloaded together with documentation from http://ideas.repec.org/c/boc/bocode/s430201.html.
Address: Division of Biostatistics S-2323 Medical Center North Vanderbilt University School of Medicine Nashville TN, 37232-2158
E-mail: email@example.com firstname.lastname@example.org
The sunflower program is based, in part, on public domain code found in the program flower (Steichen and Cox 1999). We thank these authors for making their code available.
We also thank Nicholas J. Cox for converting our help file to SMCL and for some helpful edits.
We are most grateful to Jeff Pitblado and Stata Corp for releasing the version 8.2 edition of this program. Jeff has completely rewritten the graphics part of the program to run under version 8.
Carr, D.B., Littlefield, R.J., Nicholson, W.L., and Littlefield, J.S. (1987) Scatterplot matrix techniques for large N. Journal of the American Statistical Association, 82: 424-436.
Cleveland, W.S. and McGill, R. (1984) The many faces of a scatterplot. Journal of the American Statistical Association, 79: 807-822.
Dupont, W.D. and Plummer, W.D. Jr. (2003) Density distribution sunflower plots. Journal of Statistical Software, 8(3): 1-11. Downloadable from http://www.jstatsoft.org/index.php?vol=8. Accessed January 23, 2003.
Dupont, W.D. and Plummer, W.D. Jr. (2002) sunflower: Stata module to draw density distribution sunflower plots. Stata program and help file downloadable from http://ideas.repec.org/c/boc/bocode/s430201.html. Accessed December 12, 2002.
Huang, C., McDonald, J.A., and Stuetzle, W. (1997) Variable resolution bivariate plots. Journal of Computational and Graphical Statistics, 6: 383-396.
Levy, D. (1999) 50 years of discovery: medical milestones from the National Heart, Lung, and Blood Institute's Framingham Heart Study. Hackensack, NJ: Center for Bio-Medical Communication Inc.
Steichen, T.J. and Cox, N.J. (1999) flower: Stata module to draw sunflower plots. Stata program and help file downloadable from http://ideas.repec.org/c/boc/bocode/s393001.html. Accessed December 6, 2002.
Manual: [G] graphics
On-line: help for graph, functions (for round())