{smcl}
{* 16may2004/7jun2010/23sep2024}{...}
{hline}
help for {hi:catplot}
{hline}

{title:Title}

{p 4 4 2}Plots of frequencies, fractions or percents of categorical data{p_end}

{title:Syntax}

{p 8 17 2}
{cmd:catplot}
[{it:weight}]
[{cmd:if} {it:exp}]
[{cmd:in} {it:range}]
{cmd:,}
{cmd:over(}{it:firstvar}{cmd:, }{it:over_options}{cmd:)}{break}
[
{cmd:over(}{it:secondvar}{cmd:, }{it:over_options}{cmd:)}
{cmd:by(}{it:thirdvar}{cmd:, }{it:by_options}{cmd:)}
{break}
{c -(}{cmdab:fr:action}{c |}{cmdab:fr:action(}{it:varlist}{cmd:)}{c |}{cmdab:perc:ent}{c |}{cmdab:perc:ent(}{it:varlist}{cmd:)}{c )-}
{break}
{cmd:recast(}{it:plottype}{cmd:)}
{it:graph_options}
]
{p_end}

{title:Description}

{p 4 4 2}
{cmd:catplot} shows frequencies (or optionally fractions or percents) of the categories of one, two or three categorical variables. The first-named variable {it:firstvar}, specified in the first {cmd:over()} option, is innermost on the display; that is, its categories vary fastest. Often, but not necessarily, it will be the substantive response or outcome of interest. One or two other variables, perhaps with predictor roles, may be specified using a second {cmd:over()} option, a {cmd:by()} option, or both.

{p 4 4 2}
By default {cmd:catplot} is a wrapper for {help graph_bar:graph hbar}. Optionally, {cmd:catplot} may be recast as a wrapper for {help graph_bar:graph bar} or {help graph_dot:graph dot}. The choice is a matter of personal taste, although horizontal displays generally make the names or labels of categories easier to read.

{p 4 4 2}
{cmd:fweight}s, {cmd:aweight}s and {cmd:iweight}s may be specified. This makes it possible to use {cmd:catplot} to plot any set of values for each of several different categories (see the {it:Titanic} example below).

{title:Quick start}

{p 4 4 2}Read in data{p_end}
{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}

{p 4 4 2}Horizontal bar chart showing category frequencies{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78)}{p_end}

{p 4 4 2}Horizontal bar chart showing category percents{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) percent}{p_end}

{p 4 4 2}For foreign and for domestic cars separately, what is the percent breakdown of repair record?{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) over(foreign) percent(foreign)}{p_end}

{p 4 4 2}For each repair record, what is the percent breakdown of foreign and domestic cars?{p_end}
{p 4 8 2}{cmd:. catplot, over(foreign) over(rep78) percent(rep78)}{p_end}

{p 4 4 2}Show the percents as numeric text too (you may need to extend the axis range to make room){p_end}
{p 4 8 2}{cmd:. catplot, over(foreign) over(rep78) percent(rep78) blabel(bar, format(%02.0f)) ysc(r(0 105))}{p_end}

{p 4 4 2}Show that as a side-by-side display, and add an axis title too{p_end}
{p 4 8 2}{cmd:. catplot, by(foreign, l1title(Repair record 1978)) over(rep78) percent(rep78) blabel(bar, format(%02.0f)) ysc(r(0 105))}{p_end}

{title:Options}

{p 4 8 2}
{cmd:over()} and {cmd:by()} options are intended to work as in {help graph_bar:graph bar}. Note, however, that {cmd:by(, total)} does not work properly with the {cmd:fraction} and {cmd:percent} options.

{p 4 8 2}
{cmd:fraction} indicates that all frequencies should be shown as fractions (summing to 1) of the total frequency of all values represented in the graph.

{p 4 8 2}
{cmd:fraction(}{it:varlist}{cmd:)} indicates that all frequencies should be shown as fractions (summing to 1) of the total frequency for each distinct category defined by the combinations of values of {it:varlist}. For example, given a variable {cmd:male} with two categories, male and female, the fractions shown for males would sum to 1, as would those shown for females.

{p 4 8 2}
{cmd:percent} indicates that all frequencies should be shown as percents (summing to 100) of the total frequency of all values represented in the graph.

{p 4 8 2}
{cmd:percent(}{it:varlist}{cmd:)} indicates that all frequencies should be shown as percents (summing to 100) of the total frequency for each distinct category defined by the combinations of values of {it:varlist}.
For example, given a variable {cmd:male} with two categories, male and female, the percents shown for males would sum to 100, as would those shown for females.

{p 8 8 2}
Only one of {cmd:fraction}, {cmd:fraction()}, {cmd:percent} and {cmd:percent()} may be specified.

{p 4 8 2}
{cmd:recast()} recasts the graph to another {it:plottype}, one of {cmd:hbar} (the default), {cmd:bar} or {cmd:dot}.

{p 8 8 2}
Note for users of Stata 10 and later: the {help graph_editor:Graph Editor} is another way to make these and many other changes.

{p 8 8 2}
Note for experienced users: although the name is suggested by another {help advanced_options:recast()} option, this is not a back door to recasting to a {cmd:twoway} plot.

{p 4 8 2}
{it:graph_options} refers to other options of {help graph_bar:graph bar}, {help graph_bar:graph hbar} or {help graph_dot:graph dot} as appropriate.

{p 8 8 2}
Note: you may find it helpful to describe variables using {cmd:l1title()} with {cmd:hbar} or {cmd:dot}; using {cmd:b1title()} with {cmd:bar}, or with {cmd:dot} and its (undocumented) {cmd:vertical} option; or using {cmd:subtitle()} in general. See also the Remarks below on axis titles.

{title:Remarks}

{it:Why and how this command was written}

{p 4 4 2}
This version of {cmd:catplot} is a moderate rewrite of the previous version on SSC, which has been renamed there as {cmd:catplot2010}. The rewrite reflects personal experience and judgement. The revised syntax is intended to be less awkward, and the command is now perhaps better explained and exemplified.

{p 4 4 2}
The original posting about {cmd:catplot} on Statalist (Cox 2003) explained the main idea. I wanted a one-line command to plot counts, fractions or percents of observations of one or more categorical variables. {cmd:graph hbar} and its kin, as released in Stata 8, would already do this if you first fed one of those commands a variable to be summed over observations.
Suppose you have 10 observations and you can see from listings that you have 7 frogs and 3 toads. Stata will reach the same conclusion once you create a variable that is identically 1 in each observation and then ask for sums: evidently 1 + 1 + 1 + 1 + 1 + 1 + 1 = 7 and 1 + 1 + 1 = 3 are the counts you need. So counting is just summation. So also is working out fractions and percents: in this example, you instead feed Stata a variable equal to 1/10 or to 100/10 = 10 in each observation, so that the sums are 0.7 and 0.3 as fractions or 70 and 30 as percents. More complicated set-ups, such as wanting percent breakdowns of the categories of categorical variable {it:C} given cross-combinations of categorical variables {it:A} and {it:B}, are trivial extensions of the main idea once you have worked out the details: the variable to be summed is then 100 divided by the number of observations in each cross-combination of {it:A} and {it:B}.

{p 4 4 2}
The easy part is thus writing a wrapper for {cmd:graph hbar} and its kin in which such a variable is created on the fly before the main command is called. The more challenging part is combining that code with the official {cmd:graph} code that does the hard work, principally through the {cmd:over()} and {cmd:by()} options but also through other {cmd:graph} options. In addition, the user needs to be able to choose between {cmd:hbar}, {cmd:bar} and {cmd:dot} as the engine. The original version (Cox 2003) did that one way and the second version (Cox 2010) did it another way. The first version was discussed in Cox (2004), while the second version has been discussed mainly on Statalist. This new version of {cmd:catplot} is closer to the official Stata implementation of {cmd:graph hbar (count)} and {cmd:graph hbar (percent)}, first released on 9 October 2014 during the lifetime of Stata 13 (see {help whatsnew13}).

{it:catplot principles and practice}

{p 4 4 2}
The default display of {cmd:catplot} using {cmd:graph hbar} or {cmd:graph bar} is graphically conservative, reflecting the view that the length or height of bars and text naming categories are good ways of conveying information.
If you also want bars in different colours, specify the option {cmd:asyvars}, which distinguishes the categories of the first-named variable. If you also want to stack the bars of different colours, specify the further option {cmd:stack}.

{p 4 4 2}
The default display of {cmd:catplot} using {cmd:graph dot} is similarly conservative. If you want point symbols in different colours, specify the option {cmd:asyvars}, which distinguishes the categories of the first-named variable. If you also want different point symbols, use the further option {cmd:marker()}.

{p 8 8 2}
Such choices may or may not improve the graph. Personal suggestions: avoid legends if possible; multiple colours can confuse as much as they clarify; stacking can make it harder to compare categories with rare or zero frequencies, or to show annotations legibly.

{p 4 4 2}
There is much scope for personal judgement over what is presented as {it:firstvar}, {it:secondvar} and {it:thirdvar}. Nor need the outcome of interest be one of those variables: in the {it:Titanic} example below, the survival proportion for each combination of categories, supplied (times 100) as weights, is the outcome of interest. A simple comparison such as

{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. catplot, percent(foreign) over(rep78) over(foreign) name(G1, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, percent(foreign) over(foreign) over(rep78) name(G2, replace)}{p_end}

{p 4 4 2}
shows how the same results can be displayed in different ways. The moral is that the order of {cmd:over()} options matters.

{p 4 4 2}
As usual, running the examples should give a good sense of what the command can do. Some examples below using {cmd:collgrad} (college graduate?) from the {cmd:nlsw88} data are redundant: {cmd:collgrad} is a (0, 1) indicator variable, so it would be simpler to plot its means and so avoid showing two complementary fractions or percents. Even so, they may help to underline how the command works.
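{p 4 4 2}
The idea that counting is just summation can be checked with official commands alone. What follows is a minimal sketch, not the code that {cmd:catplot} itself runs: the variable names {cmd:one}, {cmd:n} and {cmd:pc} and the graph names {cmd:SUM1} and {cmd:SUM2} are hypothetical, and the sketch assumes that observations with missing {cmd:rep78} should be left out, as {cmd:graph hbar} does by default for missing categories of an {cmd:over()} variable. On that assumption, the first graph should show the same frequencies as {cmd:catplot, over(rep78)}, and the second the same percents as {cmd:catplot, over(rep78) over(foreign) percent(foreign)}.

{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. * counting is summation: a variable that is 1 in every observation}{p_end}
{p 4 8 2}{cmd:. generate byte one = 1}{p_end}
{p 4 8 2}{cmd:. graph hbar (sum) one, over(rep78) name(SUM1, replace)}{p_end}
{p 4 8 2}{cmd:. * number of observations with known rep78 in each category of foreign}{p_end}
{p 4 8 2}{cmd:. egen n = total(!missing(rep78)), by(foreign)}{p_end}
{p 4 8 2}{cmd:. * each observation contributes 100/n, so percents sum to 100 within foreign}{p_end}
{p 4 8 2}{cmd:. generate double pc = 100 / n if !missing(rep78)}{p_end}
{p 4 8 2}{cmd:. graph hbar (sum) pc, over(rep78) over(foreign) name(SUM2, replace)}{p_end}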
{p 4 4 2}
All that said, a personal suggestion is that many one-, two- or three-way breakdowns of categorical data are better served by {cmd:tabplot}. See Cox (2016) and {cmd:search tabplot, sj} for updates.

{it:Axis titles}

{p 4 4 2}
It is clearly documented that

{p 8 8 2}
the axis of {cmd:graph bar}, {cmd:graph hbar} and {cmd:graph dot} showing magnitudes is always regarded as the {it:y} axis, even if it is horizontal, while

{p 8 8 2}
the other axis in those commands is always regarded as a categorical axis, and {it:not} as the {it:x} axis, regardless of whether it is horizontal or vertical.

{p 4 4 2}
For any programmer coding with these commands, and any user working with them directly or indirectly, these choices are hard rules. In {cmd:catplot}, {cmd:ytitle()} defaults to simple text such as "frequency", "fraction" or "percent" (unless you are using weights), but you are encouraged to override that default with text closer to your purpose. That title is displayed horizontally if you are using {cmd:graph hbar} or {cmd:graph dot} and vertically if you are using {cmd:graph bar} or {cmd:graph dot, vertical}.

{p 4 4 2}
While attempts to set an {cmd:xtitle()} will fail, a rich variety of other {help title_options:title options} is available. The examples show some of the possibilities.

{it:Homespun and other wisdom}

{p 4 4 2}
Note some simple principles in this territory:

{p 8 8 2}
It is difficult to create a great graph, but easy to improve a bad one.

{p 8 8 2}
Comparisons should be easy. That could mean comparisons in one dimension, across a row or down a column, or it could mean using a table structure.

{p 8 8 2}
Ordering by magnitude may be even more useful than ordering by category.

{p 8 8 2}
Bars are better than pie slices, as length is easier to judge than angle. Dots on a scale are also a good way to show magnitudes.

{p 8 8 2}
Horizontal text is easier to read than vertical text.

{p 8 8 2}
Showing numbers as text as well as by graphical elements can be helpful.
{p 8 8 2}
Lose the legend if you can. A great advantage of {cmd:graph hbar {c |} bar {c |} dot} is strong support for category labels, which can be nested too.

{p 8 8 2}
The sum of one value is just that value, so weights allow showing any values, not just frequencies or percents.

{p 8 8 2}
{cmd:by()} allows table structures to be shown with {cmd:graph hbar {c |} bar {c |} dot}.

{p 8 8 2}
{cmd:by()} can look like another {cmd:over()}.

{title:Examples}

{p 4 8 2}The {cmd:stcolor} scheme requires Stata 18 or later. Choose a different scheme according to taste or if you are using Stata 17 or earlier.{p_end}
{p 4 8 2}{cmd:. set scheme stcolor}{p_end}

{p 4 4 2}(Stata's auto data){p_end}
{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) l1title(Repair record 1978) name(CAT1, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78, sort(1)) l1title(Repair record 1978) name(CAT2, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78, sort(1) descending) l1title(Repair record 1978) name(CAT3, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) l1title(Repair record 1978) blabel(bar, pos(base) size(4)) bar(1, bfcolor(none)) ysc(off) name(CAT4, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) over(foreign) subtitle(Car origin and repair record 1978) name(CAT5, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) over(foreign) subtitle(Car origin and repair record 1978) nofill name(CAT6, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) by(foreign, note("") l1title(Repair record 1978)) percent(foreign) name(CAT7, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) by(foreign, note("")) b2title(Repair record 1978) percent(foreign) recast(bar) name(CAT8, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(rep78) by(foreign, note("") l1title(Repair record 1978)) percent(foreign) blabel(bar, position(outside) format(%3.1f)) ylabel(none) yscale(r(0,60)) name(CAT9, replace)}{p_end}

{p 4 8 2}(Stata's nlsw88 data){p_end}
{p 4 8 2}{cmd:. sysuse nlsw88, clear}{p_end}
{p 4 8 2}{cmd:. catplot, over(collgrad) over(race) by(married, note("")) name(CAT10, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(collgrad) over(race) by(married, note("")) recast(dot) name(CAT11, replace)}{p_end}
{p 4 8 2}{cmd:. local opts percent(race married) blabel(bar, format(%02.1f))}{p_end}
{p 4 8 2}{cmd:. catplot, over(collgrad) over(race) by(married, note("")) `opts' name(CAT12, replace)}{p_end}
{p 4 8 2}{cmd:. local trick subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))}{p_end}
{p 4 8 2}{cmd:. local opts `opts' `trick'}{p_end}
{p 4 8 2}{cmd:. catplot, over(married) over(race) by(collgrad, col(1) note("")) `opts' name(CAT13, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(married) over(race) by(collgrad, col(1) note("")) recast(bar) `opts' name(CAT14, replace)}{p_end}
{p 4 8 2}{cmd:. catplot, over(married) over(race) by(collgrad, col(1) note("")) recast(dot) `opts' name(CAT15, replace)}{p_end}

{p 4 8 2}({it:Titanic} data: Dawson 1995){p_end}
{p 4 8 2}{cmd:. clear}{p_end}
{p 4 8 2}{cmd:. input byte(class adult male) float survived}{p_end}
{p 4 8 2}{cmd:1 0 0 1}{p_end}
{p 4 8 2}{cmd:2 0 0 1}{p_end}
{p 4 8 2}{cmd:3 0 0 .4516129}{p_end}
{p 4 8 2}{cmd:1 0 1 1}{p_end}
{p 4 8 2}{cmd:2 0 1 1}{p_end}
{p 4 8 2}{cmd:3 0 1 .27083334}{p_end}
{p 4 8 2}{cmd:1 1 0 .9722222}{p_end}
{p 4 8 2}{cmd:2 1 0 .8602151}{p_end}
{p 4 8 2}{cmd:3 1 0 .4606061}{p_end}
{p 4 8 2}{cmd:4 1 0 .8695652}{p_end}
{p 4 8 2}{cmd:1 1 1 .3257143}{p_end}
{p 4 8 2}{cmd:2 1 1 .08333334}{p_end}
{p 4 8 2}{cmd:3 1 1 .16233766}{p_end}
{p 4 8 2}{cmd:4 1 1 .2227378}{p_end}
{p 4 8 2}{cmd:end}{p_end}
{p 4 8 2}{cmd:. label values class class}{p_end}
{p 4 8 2}{cmd:. label def class 1 "first" 2 "second" 3 "third" 4 "crew" }{p_end}
{p 4 8 2}{cmd:. label values adult adult}{p_end}
{p 4 8 2}{cmd:. label def adult 0 "child" 1 "adult"}{p_end}
{p 4 8 2}{cmd:. label values male male}{p_end}
{p 4 8 2}{cmd:. label def male 0 "female" 1 "male"}{p_end}
{p 4 8 2}{cmd:. catplot [aw=100*survived], over(adult, gap(*0.3) axis(noline)) over(male, gap(*0.8)) outergap(*.2) ///}{p_end}
{p 4 8 2}{cmd:by(class, compact note("") col(1) subtitle(% survived from Titanic)) ///}{p_end}
{p 4 8 2}{cmd:bar(1, blcolor(gs8) bfcolor(pink*.1)) blabel(bar, format(%4.1f) pos(base)) `trick' ///}{p_end}
{p 4 8 2}{cmd:ysize(7) yla(none) ytitle("") ysc(noline) name(CAT16, replace)}{p_end}

{title:Author}

{p 4 4 2}Nicholas J. Cox, Durham University{break}
n.j.cox@durham.ac.uk{p_end}

{title:Acknowledgments}

{p 4 4 2}The first version of {cmd:catplot} was written and revised in 2003 and 2004. At that time, Vince Wiggins provided very helpful comments, Fred Wolfe asked for sorting and David Schwappach provided feedback on limitations. During revision in 2010, Vince Wiggins and Ron{c a'}n Conroy made encouraging noises.{p_end}

{title:References}

{p 4 8 2}
Cox, N.J. 2003. st: -catplot- available for download from SSC. Statalist post 21 February.
{browse "https://www.stata.com/statalist/archive/2003-02/msg00608.html":https://www.stata.com/statalist/archive/2003-02/msg00608.html}

{p 4 8 2}
Cox, N.J. 2004. Speaking Stata: Graphing categorical and compositional data. {it:Stata Journal} 4: 190{c -}215.
{browse "https://journals.sagepub.com/doi/pdf/10.1177/1536867X0400400209":https://journals.sagepub.com/doi/pdf/10.1177/1536867X0400400209}

{p 4 8 2}
Cox, N.J. 2010. st: -catplot- revised on SSC. Statalist post 8 June.
{browse "https://www.stata.com/statalist/archive/2010-06/msg00431.html":https://www.stata.com/statalist/archive/2010-06/msg00431.html}

{p 4 8 2}
Cox, N.J. 2016. Multiple bar charts in table form. {it:Stata Journal} 16: 491{c -}510.
{browse "https://journals.sagepub.com/doi/pdf/10.1177/1536867X1601600214":https://journals.sagepub.com/doi/pdf/10.1177/1536867X1601600214}

{p 4 8 2}
Dawson, R.J.MacG. 1995. The "unusual episode" data revisited. {it:Journal of Statistics Education} 3(3).
[{it:Titanic} data]
{browse "https://jse.amstat.org/v3n3/datasets.dawson.html":https://jse.amstat.org/v3n3/datasets.dawson.html}

{title:Also see}

{p 4 8 2}Online: help for {help graph_bar:graph hbar}; {help graph_bar:graph bar}; {help graph_dot:graph dot}; {help histogram}; {help tabplot} (if installed){p_end}