{smcl}
{* 16may2004/7jun2010}{...}
{hline}
help for {hi:catplot}
{hline}
{title:Plots of frequencies, fractions or percents of categorical data}
{p 8 17 2}
{cmd:catplot}
{it:catvar1} [{it:catvar2} [{it:catvar3}]]
[{it:weight}]
[{cmd:if} {it:exp}]
[{cmd:in} {it:range}]
[
{cmd:,}
{c -(}{cmdab:fr:action}{c |}{cmdab:fr:action(}{it:varlist}{cmd:)}{c |}{cmdab:perc:ent}{c |}{cmdab:perc:ent(}{it:varlist}{cmd:)}{c )-} {break}
{cmd:var1opts(}{it:over_options}{cmd:)}
{cmd:var2opts(}{it:over_options}{cmd:)}
{cmd:var3opts(}{it:over_options}{cmd:)} {break}
{cmd:recast(}{it:plottype}{cmd:)}
{it:graph_options}
]
{title:Description}
{p 4 4 2}
{cmd:catplot} shows frequencies (or optionally fractions or percents) of
the categories of one, two or three categorical variables. The first
named variable is innermost on the display; that is, its categories vary
fastest. Often, but not necessarily, it will be the response or outcome
of interest. By default {cmd:catplot} is a wrapper for
{help graph_bar:graph hbar}. Optionally {cmd:catplot} may be recast as a
wrapper for {help graph_bar:graph bar} or {help graph_dot:graph dot}.
The choice is a matter of personal taste, although in general horizontal
displays make it easier to identify names or labels of categories.
{p 4 4 2}{cmd:fweight}s, {cmd:aweight}s and {cmd:iweight}s may be
specified. Weights make it possible to use {cmd:catplot} to plot any
set of values, one for each of several different categories.
{title:Remarks}
{p 4 4 2}This version of {cmd:catplot} (2.0.0 or later) is not compatible
with previous versions.
{p 4 4 2}The default display using {cmd:graph hbar} or {cmd:graph bar}
is graphically conservative, reflecting the view that height or length of bars and
text indicating categories are good ways of conveying information.
If you wish also to have bars in different colours, specify the option
{cmd:asyvars}, which differentiates the categories of the {it:first}
named variable {it:catvar1}. If you wish also to stack bars of different
colours, specify the further option {cmd:stack}.
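{p 4 4 2}For example, the following sketch, which assumes Stata's auto
data, should show the categories of {cmd:rep78} as bars of different
colours, stacked within each category of {cmd:foreign}, with a legend
identifying the colours. Here {cmd:asyvars} and {cmd:stack} are simply
passed through as options of {cmd:graph hbar}; the colours used depend
on the current scheme.
{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, asyvars stack}{p_end}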
{p 4 4 2}The default display with {cmd:graph dot} is similarly
conservative. If you wish to have point symbols in different colours,
specify the option {cmd:asyvars}, which differentiates the categories of
the {it:first} named variable {it:catvar1}. If you wish also to use
different point symbols, use the further option {cmd:marker()}.
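{p 4 4 2}A sketch, again assuming Stata's auto data: with {cmd:asyvars},
each category of {cmd:rep78} gets its own colour on each line for
{cmd:foreign}, and {cmd:marker(}{it:#}{cmd:, ...)} changes the symbol
used for the {it:#}th category. The choice of hollow circles and hollow
diamonds for the first two categories is arbitrary.
{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, recast(dot) asyvars marker(1, msymbol(Oh)) marker(2, msymbol(Dh))}{p_end}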
{p 4 4 2}Note some simple principles in this territory:
{p 8 8 2}It is difficult to create a great graph, but easy to improve a
bad one.
{p 8 8 2}Comparisons must be easy. That could mean in one dimension,
across a row or down a column, or it could mean using a table structure.
{p 8 8 2}Ordering by magnitude may be even more useful than ordering
by category.
{p 8 8 2}Bars are better than pie slices, as length is easier to judge
than angle. Dots on a scale are also a good way to show magnitudes.
{p 8 8 2}Text is easier to read horizontally than vertically.
{p 8 8 2}Showing numbers as text as well as by graphical elements can be
helpful.
{p 8 8 2}Lose the legend if you can. A great advantage of
{cmd:graph hbar {c |} bar {c |} dot} is strong support for category labels,
which can be nested too.
{p 8 8 2}The sum of one value is just that value, so weights allow showing
any values, not just frequencies or percents.
{p 8 8 2}{cmd:by()} allows table structures to be shown with
{cmd:graph hbar {c |} bar {c |} dot}.
{p 8 8 2}{cmd:by()} can look like another {cmd:over()}.
{title:Options}
{p 4 8 2}{cmd:fraction} indicates that all frequencies should be shown
as fractions (with sum 1) of the total frequency of all values being
represented in the graph.
{p 4 8 2}{cmd:fraction(}{it:varlist}{cmd:)} indicates that all
frequencies should be shown as fractions (with sum 1) of the total
frequency for each distinct category defined by the combinations of
{it:varlist}. For example, given a variable {cmd:sex} with two
categories male and female, the fractions shown for male would have sum
1 and those for female would have sum 1.
{p 4 8 2}{cmd:percent} indicates that all frequencies should be shown as
percents (with sum 100) of the total frequency of all values being
represented in the graph.
{p 4 8 2}{cmd:percent(}{it:varlist}{cmd:)} indicates that all
frequencies should be shown as percents (with sum 100) of the total
frequency for each distinct category defined by the combinations of
{it:varlist}. For example, given a variable {cmd:sex} with two
categories male and female, the percents shown for male would have sum
100 and those for female would have sum 100.
{p 4 8 2}Only one of these {cmd:fraction}[{cmd:()}] and
{cmd:percent}[{cmd:()}] options may be specified.
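{p 8 8 2}To see the difference, compare these two graphs, a sketch
assuming Stata's auto data. With {cmd:percent}, the percents for all
bars together sum to 100; with {cmd:percent(foreign)}, those for
Domestic cars sum to 100 and those for Foreign cars sum to 100.
{p 8 12 2}{cmd:. sysuse auto, clear}{p_end}
{p 8 12 2}{cmd:. catplot rep78 foreign, percent}{p_end}
{p 8 12 2}{cmd:. catplot rep78 foreign, percent(foreign)}{p_end}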
{p 4 8 2}{cmd:recast()} recasts the graph to another {it:plottype}, one
of {cmd:hbar}, {cmd:bar} or {cmd:dot}.
{p 8 8 2}Note for users of Stata 10 or later: using the {help graph_editor:Graph Editor} is another
way to produce these and many other changes.
{p 8 8 2}Note for experienced users: although the name is suggested by another
{help advanced_options:recast()} option, this is not a back door to recasting
to a {cmd:twoway} plot.
{p 4 8 2}{cmd:var1opts()}, {cmd:var2opts()} and {cmd:var3opts()} contain
options passed to the {cmd:over()} option of
{help graph_bar:graph bar}, {help graph_bar:graph hbar} or
{help graph_dot:graph dot}, as appropriate, that control the display of
{it:catvar1}, {it:catvar2} and {it:catvar3} respectively. For example,
{cmd:var1opts(sort(1) descending)} specifies that the categories of
{it:catvar1} should be sorted on frequency (or fraction or percent) and
displayed in decreasing order from top to bottom ({cmd:hbar}, {cmd:dot})
or from left to right ({cmd:bar}).
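{p 8 8 2}As a further sketch with Stata's auto data, the following
narrows the gaps between the categories of {cmd:rep78} and uses smaller
labels for the categories of {cmd:foreign}; {cmd:gap()} and
{cmd:label()} are standard suboptions of {cmd:over()}, and the
particular values chosen are arbitrary.
{p 8 12 2}{cmd:. sysuse auto, clear}{p_end}
{p 8 12 2}{cmd:. catplot rep78 foreign, var1opts(gap(*0.5)) var2opts(label(labsize(small)))}{p_end}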
{p 4 8 2}{it:graph_options} refers to options of
{help graph_bar:graph bar}, {help graph_bar:graph hbar} or
{help graph_dot:graph dot}, as appropriate.
{cmd:by()} is one useful
example. Note: any categorical axis title that appears by default is
produced by {cmd:l1title()} with {cmd:hbar} or {cmd:dot}, or by
{cmd:b1title()} with {cmd:bar} or with the (otherwise undocumented)
{cmd:vertical} option.
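{p 8 8 2}So, in a sketch assuming Stata's auto data, the default
categorical axis title may be replaced with {cmd:l1title()} for the
default {cmd:hbar} display but with {cmd:b1title()} after
{cmd:recast(bar)}. The title text here is just an illustration.
{p 8 12 2}{cmd:. sysuse auto, clear}{p_end}
{p 8 12 2}{cmd:. catplot rep78, l1title("Repair rating")}{p_end}
{p 8 12 2}{cmd:. catplot rep78, recast(bar) b1title("Repair rating")}{p_end}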
{title:Examples}
{p 4 8 2}{cmd:. set scheme s1color}{p_end}
{p 4 4 2}(Stata's auto data){p_end}
{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. catplot rep78}{p_end}
{p 4 8 2}{cmd:. catplot rep78, blabel(bar, pos(base) size(4)) bar(1, bfcolor(none)) ysc(off)}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, nofill}{p_end}
{p 4 8 2}{cmd:. catplot rep78, by(foreign) percent(foreign)}{p_end}
{p 4 8 2}{cmd:. catplot rep78, by(foreign) percent(foreign) recast(bar)}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, percent(foreign) bar(1, bcolor(blue)) blabel(bar, position(outside) format(%3.1f)) ylabel(none) yscale(r(0,60))}{p_end}
{p 4 8 2}{cmd:. gen himpg = mpg > 25}{p_end}
{p 4 8 2}{cmd:. label def himpg 1 "mpg > 25" 0 "mpg <= 25"}{p_end}
{p 4 8 2}{cmd:. label val himpg himpg}{p_end}
{p 4 8 2}{cmd:. catplot himpg rep78 foreign}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, by(himpg, col(1) note("")) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, recast(dot) by(himpg, col(1) note("")) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))}{p_end}
{p 4 8 2}{cmd:. catplot rep78 foreign, recast(bar) by(himpg, row(1) note("")) subtitle(, pos(6) ring(1) bcolor(none) nobexpand)}{p_end}
{p 4 8 2}{cmd:. catplot rep78, var1opts(sort(1))}{p_end}
{p 4 8 2}{cmd:. catplot rep78, var1opts(sort(1) descending)}{p_end}
{p 4 4 2}(Titanic data){p_end}
{p 4 8 2}{cmd:. use titanic, clear}{p_end}
{p 4 8 2}{cmd:. collapse survived, by(age sex class)}{p_end}
{p 4 8 2}{cmd:. catplot age sex [aw=100*survived], by(class, compact note("") col(1)) bar(1, blcolor(gs8) bfcolor(gs14)) blabel(bar, format(%4.1f) pos(base))}
{cmd: subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e)) ytitle(% survived from Titanic, place(e)) var1opts(gap(0)) var2opts(gap(*.2)) outergap(*.2) ysize(5) yla(0(25)100, glcolor(gs14) glw(*.5))}{p_end}
{p 4 8 2}{cmd:. catplot age sex [aw=100*survived], by(class, compact note("") col(1) ) bar(1, blcolor(gs8) bfcolor(pink*.2)) blabel(bar, format(%4.1f) pos(base))}
{cmd: subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e)) ytitle(% survived from Titanic) var1opts(gap(*0.1) axis(noline)) var2opts(gap(*.2)) ysize(5) yla(none) ysc(noline) plotregion(lcolor(none))}{p_end}
{title:Author}
{p 4 4 2}Nicholas J. Cox, Durham University{break}
n.j.cox@durham.ac.uk
{title:Acknowledgments}
{p 4 4 2}The first version of {cmd:catplot} was written and revised in
2003 and 2004. At that time, Vince Wiggins provided very helpful
comments, Fred Wolfe asked for sorting and David Schwappach provided
feedback on limitations. During revision in 2010, Vince Wiggins and
Ron{c a'}n Conroy made encouraging noises.
{title:Also see}
{p 4 8 2}On-line: help for {help graph_bar:graph hbar};
{help graph_bar:graph bar}; {help graph_dot:graph dot}; {help histogram};
{help tabplot} (if installed)