-------------------------------------------------------------------------------
help for catplot
-------------------------------------------------------------------------------

Plots of frequencies, fractions or percents of categorical data

catplot catvar1 [catvar2 [catvar3]] [weight] [if exp] [in range] [ , {fraction|fraction(varlist)|percent|percent(varlist)} var1opts(over_options) var2opts(over_options) var3opts(over_options) recast(plottype) graph_options ]

Description

catplot shows frequencies (or optionally fractions or percents) of the categories of one, two or three categorical variables. The first named variable is innermost on the display; that is, its categories vary fastest. Often, but not necessarily, it will be the response or outcome of interest. By default catplot is a wrapper for graph hbar. Optionally catplot may be recast as a wrapper for graph bar or graph dot. The choice is a matter of personal taste, although in general horizontal displays make it easier to identify names or labels of categories.

fweights, aweights and iweights may be specified. This opens a door to use of catplot for plotting any set of values for each of several different categories.
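
As a minimal sketch of that idea, assuming the auto data shipped with Stata: reduce the data to one observation per combination of categories, then pass the value to be shown as a weight. Mean mpg is used here purely for illustration, and the ytitle() text is an assumption, not a default.

```stata
* Sketch only: show mean mpg (not a frequency) for each category combination
sysuse auto, clear
drop if missing(rep78)

* one observation per combination of foreign and rep78, holding mean mpg
collapse (mean) mpg, by(foreign rep78)

* each bar now represents a single value, so its length is that value
catplot rep78 foreign [aw=mpg], ytitle("Mean mpg")
```

The same pattern should apply to any summary that can be reduced to one value per category combination.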

Remarks

This version of catplot (2.0.0 or up) is not compatible with previous versions.

The default display using graph hbar or graph bar is graphically conservative, reflecting the view that height or length of bars and text indicating categories are good ways of conveying information. If you wish also to have bars in different colours, specify the option asyvars, which differentiates the categories of the first named variable catvar1. If you wish also to stack bars of different colours, specify the further option stack.

The default display with graph dot is similarly conservative. If you wish to have point symbols in different colours, specify the option asyvars, which differentiates the categories of the first named variable catvar1. If you wish also to use different point symbols, use the further option marker().
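
The options just described might be combined as follows. This is an illustrative sketch using the auto data; the particular marker symbols are arbitrary choices, and only the first two categories of rep78 are given explicit symbols.

```stata
sysuse auto, clear

* bars for the categories of rep78 (catvar1) in different colours
catplot rep78 foreign, asyvars

* the same bars stacked within each category of foreign
catplot rep78 foreign, asyvars stack

* dot chart with different point symbols for categories of rep78
catplot rep78 foreign, recast(dot) asyvars marker(1, msymbol(Oh)) marker(2, msymbol(Th))
```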

Note some simple principles in this territory:

It is difficult to create a great graph, but easy to improve a bad one.

Comparisons must be easy. That could mean in one dimension, across a row or down a column, or it could mean using a table structure.

Ordering by magnitude may be even more useful than ordering by category.

Bars are better than pie slices as length is easier to judge than angle. Dots on a scale are a good way to include magnitudes.

Text is better read as horizontal than as vertical.

Showing numbers as text as well as by graphical elements can be helpful.

Lose the legend if you can. A great advantage of graph hbar | bar | dot is strong support for category labels, which can be nested too.

The sum of one value is just that value, so weights allow showing any values, not just frequencies or percents.

by() allows table structures to be shown with graph hbar | bar | dot.

by() can look like another over().

Options

fraction indicates that all frequencies should be shown as fractions (with sum 1) of the total frequency of all values being represented in the graph.

fraction(varlist) indicates that all frequencies should be shown as fractions (with sum 1) of the total frequency for each distinct category defined by the combinations of varlist. For example, given a variable sex with two categories male and female, the fractions shown for male would have sum 1 and those for female would have sum 1.

percent indicates that all frequencies should be shown as percents (with sum 100) of the total frequency of all values being represented in the graph.

percent(varlist) indicates that all frequencies should be shown as percents (with sum 100) of the total frequency for each distinct category defined by the combinations of varlist. For example, given a variable sex with two categories male and female, the percents shown for male would have sum 100 and those for female would have sum 100.

Only one of these fraction[()] and percent[()] options may be specified.
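
A short sketch of the difference between the bare and varlist forms, using the auto data. The tabulate call is included only as a cross-check: its row percentages should agree with the second graph.

```stata
sysuse auto, clear

* percents sum to 100 over the whole graph
catplot rep78 foreign, percent

* percents sum to 100 within each category of foreign
catplot rep78 foreign, percent(foreign)

* row percentages for comparison with the second graph
tabulate foreign rep78, row nofreq
```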

recast() recasts the graph to another plottype, one of hbar, bar, dot.

Note for users of Stata 10 or later: using the Graph Editor is another way to produce these and many other changes.

Note for experienced users: although the name is suggested by another recast() option, this is not a back door to recasting to a twoway plot.

var1opts(), var2opts() and var3opts() contain calls to an over() option of graph bar, graph hbar or graph dot as appropriate controlling the display of elements for catvar1, catvar2 and catvar3 respectively. For example, var1opts(sort(1) descending) specifies that values of catvar1 should be sorted on frequency or percent and displayed decreasing downwards or from left to right.
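
A sketch of how these options might be used with the auto data. The suboptions shown (sort(), descending, label() and gap()) are standard over() suboptions; the particular sizes and gaps are arbitrary.

```stata
sysuse auto, clear

* sort categories of rep78 on frequency, largest first, with smaller label text
catplot rep78, var1opts(sort(1) descending label(labsize(small)))

* with two variables, each set of options goes to the corresponding over() call
catplot rep78 foreign, var1opts(gap(*0.5)) var2opts(label(labsize(large)))
```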

graph_options refers to options of graph bar, graph hbar or graph dot as appropriate. by() is one useful example. Note: any categorical axis title that appears by default is produced by l1title() with hbar or dot or by b1title() with bar or the (otherwise undocumented) vertical option.
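
A sketch of changing the categorical axis title, assuming (as the note above suggests) that a user-supplied l1title() or b1title() takes the place of the default. The title text is made up for illustration.

```stata
sysuse auto, clear

* hbar is the default plottype, so the title is set by l1title()
catplot rep78, l1title("Repair record")

* with recast(bar) the corresponding option is b1title()
catplot rep78, recast(bar) b1title("Repair record")
```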

Examples

. set scheme s1color

(Stata's auto data)
. sysuse auto, clear

. catplot rep78
. catplot rep78, blabel(bar, pos(base) size(4)) bar(1, bfcolor(none)) ysc(off)
. catplot rep78 foreign
. catplot rep78 foreign, nofill
. catplot rep78, by(foreign) percent(foreign)
. catplot rep78, by(foreign) percent(foreign) recast(bar)
. catplot rep78 foreign, percent(foreign) bar(1, bcolor(blue)) blabel(bar, position(outside) format(%3.1f)) ylabel(none) yscale(r(0,60))

. gen himpg = mpg > 25
. label def himpg 1 "mpg > 25" 0 "mpg <= 25"
. label val himpg himpg
. catplot himpg rep78 foreign
. catplot rep78 foreign, by(himpg, col(1) note("")) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))
. catplot rep78 foreign, recast(dot) by(himpg, col(1) note("")) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))
. catplot rep78 foreign, recast(bar) by(himpg, row(1) note("")) subtitle(, pos(6) ring(1) bcolor(none) nobexpand)

. catplot rep78, var1opts(sort(1))
. catplot rep78, var1opts(sort(1) descending)

(Titanic data)
. use titanic, clear
. collapse survived, by(age sex class)

. catplot age sex [aw=100*survived], by(class, compact note("") col(1)) bar(1, blcolor(gs8) bfcolor(gs14)) blabel(bar, format(%4.1f) pos(base)) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e)) ytitle(% survived from Titanic, place(e)) var1opts(gap(0)) var2opts(gap(*.2)) outergap(*.2) ysize(5) yla(0(25)100, glcolor(gs14) glw(*.5))

. catplot age sex [aw=100*survived], by(class, compact note("") col(1) ) bar(1, blcolor(gs8) bfcolor(pink*.2)) blabel(bar, format(%4.1f) pos(base)) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e)) ytitle(% survived from Titanic) var1opts(gap(*0.1) axis(noline)) var2opts(gap(*.2)) ysize(5) yla(none) ysc(noline) plotregion(lcolor(none))

Author

Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk

Acknowledgments

The first version of catplot was written and revised in 2003 and 2004. At that time, Vince Wiggins provided very helpful comments, Fred Wolfe asked for sorting and David Schwappach provided feedback on limitations. During revision in 2010, Vince Wiggins and Ronán Conroy made encouraging noises.

Also see

On-line: help for graph hbar; graph bar; graph dot; histogram; tabplot (if installed)