-------------------------------------------------------------------------------
help for zmap
-------------------------------------------------------------------------------

Binned scatter map

zmap zvar xvar yvar [if exp] [in range] [ , { pctiles(numlist) | breaks(numlist) } multiples graph_options ]

Description

zmap graphs (or maps) binned values of a variable z with respect to two variables x and y treated as Cartesian coordinates. In geographical or cartographical terms x defines distance east and y defines distance north. The range of z is divided into two or more bins or classes and points in each bin are shown distinctly. The resulting plot is thus a composite scatter plot.

By default binning is into 8 classes with 7 breaks determined by the 5, 10, 25, 50, 75, 90 and 95% points or percentiles of the distribution of z. Alternatively, the user may specify other percentile breaks, or a set of breaks on the scale of z. In general, the number of classes is one more than the number of breaks.

By default with between 1 and 8 breaks, points falling into different bins are shown with different gray scale colours, darker meaning higher values. If more than 8 breaks are specified, default colours are just those of the prevailing graph scheme. In either case users may specify their own colour choices to override defaults.

Lower limits are inclusive: each bin contains points >= its lower limit and < its upper limit. The lowest class has no lower limit and the highest class has no upper limit.
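
The binning rule can be sketched outside zmap. The lines below are an illustration only, not zmap's own code: they use the auto dataset shipped with Stata, take quartiles from _pctile as breaks, and build a class variable whose lower limits are inclusive. The variable name class and the locals b1, b2 and b3 are made up for the purpose.

```stata
* sketch only: 3 breaks define 4 classes, lower limits inclusive
sysuse auto, clear
_pctile mpg, percentiles(25 50 75)
local b1 = r(r1)
local b2 = r(r2)
local b3 = r(r3)
* each break at or below the value moves the point up one class;
* the if qualifier stops missing values being put in the top class
generate byte class = 1 + (mpg >= `b1') + (mpg >= `b2') + (mpg >= `b3') if !missing(mpg)
tabulate class
```

A value exactly equal to a break falls in the class above that break, which is what inclusive lower limits imply.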

Remarks

The main intended application is that z is a spatial series measured at numerous points or for numerous small areas with respect to map coordinates x and y. However, nothing ties this command to spatial data. Users may wish to use the command on other trivariate data. In that case a marker symbol larger than the default of points, such as ms(S ..), may work better.

If the y variable follows a table or matrix row or Southern latitude convention so that it increases downwards, then use the ysc(reverse) option.

If you already have a classificatory variable, you do not need any special machinery. The schematic examples below show the basic technique. See also Cox (2005).

    . separate y, by(classvar) veryshortlabel
    . scatter `r(varlist)' x

or

    . scatter y x, by(classvar)
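
As a concrete instance of the schematic commands above, the following sketch uses the auto dataset shipped with Stata, with foreign standing in for classvar. The choice of dataset and variables is an assumption made purely for illustration.

```stata
* one plot, classes shown as distinct series
sysuse auto, clear
separate mpg, by(foreign) veryshortlabel
scatter `r(varlist)' weight

* one panel per class
scatter mpg weight, by(foreign)
```

The first form corresponds roughly to the default zmap display, and the second to what the multiples option produces.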

The following limitations may be noted:

1. zmap is not smart about tied values. Higher values of z for the same x and y values will just overplot lower values. If this is important, consider averaging z for each distinct combination of x and y in some way. An example appears below.

2. zmap does not apply any special intelligence to ensure appropriate aspect ratios to maintain equal scales on both x and y axes. The presumption is that most uses will be exploratory or that, if this is important, xsize(), ysize() or aspect() options may be used according to taste.

3. zmap can do nothing about the limitations of your monitor, or indeed any other monitor.
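
On limitation 2, one way to get approximately equal scales on both axes is to set aspect() to the ratio of the y range to the x range. This is a sketch only. It assumes that zmap passes aspect() through to scatter as it does other graph options, and it ignores plot region margins. B5, X and Y are the variables used in the Examples section, and the local macro names are made up.

```stata
* range of each coordinate
summarize Y, meanonly
local yrange = r(max) - r(min)
summarize X, meanonly
local xrange = r(max) - r(min)

* height/width of the plot region, so that one unit of X is
* about as long on the graph as one unit of Y
local ratio = `yrange' / `xrange'
zmap B5 X Y, pct(25 50 75) aspect(`ratio')
```

If zmap is called with if or in, the same restriction should be applied to both summarize commands.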

Options

pctiles(numlist) specifies that breaks between categories are defined by the percent points or percentiles indicated. For example, pct(25 50 75) specifies that the 25%, 50% and 75% points (lower quartile, median, upper quartile) be used as breaks, thus defining 4 classes. Percent points must be greater than 0 and less than 100.

breaks(numlist) specifies that breaks between categories are defined by the values indicated. For example, br(100 200 400) specifies that values of 100, 200 and 400 be used as breaks, thus defining 4 classes.

Only one of pctiles() and breaks() may be specified. If neither is specified, the default is pctiles(5 10 25 50 75 90 95).

multiples specifies that each bin or class be shown as a separate graph. Thus each plot is just a plot of the x and y values of points in that bin of z. multiples is likely to be most useful with a small number of bins.

graph_options refers to any of the options of scatter. Defaults include, but are not limited to,

if multiples is not specified:
    ysc(off) xsc(off) ms(p ..) legend(col(1) pos(3)) plotregion(style(none))

if multiples is specified:
    by(, compact) yla(none) xla(none) ms(p) mcolor(gs4) legend(off) plotregion(style(none))

Examples

(spatial data, all x, y pairs distinct)
    . zmap B5 X Y, pct(25 50 75)
    . zmap B5 X Y, pct(25 50 75) mcolor(blue blue*0.5 orange*0.5 orange)
    . zmap B5 X Y, pct(25 50 75) multiples

(non-spatial data, average z given x, y first)
    . webuse nlswork
    . egen mean = mean(ln_wage), by(age grade)
    . egen tag = tag(age grade)
    . label var mean "mean ln wage"
    . su ln_wage if !missing(age, grade), detail
    . zmap mean age grade if tag, breaks(.993 1.166 1.361 1.641 1.964 2.275 2.456) ms(S ..) ysc(on) xsc(on) yla(0/18, ang(h)) ytitle(`: var label grade') xla(15(5)45) note("")

Acknowledgments

Vince Wiggins made a very helpful suggestion.

Author

Nicholas J. Cox, Durham University, U.K.
n.j.cox@durham.ac.uk

References

Cox, N.J. 2005. Classifying data points on scatter plots. Stata Journal 5: 604-606.

Also see

On-line: help for spmap (if installed)