-------------------------------------------------------------------------------
help for bandplot
-------------------------------------------------------------------------------

Plot summary statistics of responses for bands of predictors

bandplot yvar xvars [weight] [if exp] [in range] [ , categorical(varlist) continuous(varlist) dta(filename [, save_options]) missing nquantiles(#) statistics(stat [stat ... ]) xweighted

bandopts(over_subopts) number recast(hbar | bar) xopts(over_subopts) xvarlabels yvarlabels graph_options ]

bandplot (yvars) xvars [weight] [if exp] [in range] [ , categorical(varlist) continuous(varlist) dta(filename [, save_options]) missing nquantiles(#) statistics(stat) xweighted

bandopts(over_subopts) number recast(hbar | bar) xopts(over_subopts) xvarlabels yvarlabels graph_options ]

aweights and fweights may be specified.

Description

bandplot produces plots showing summary statistics of one or more response variables for bands of one or more predictor variables.

By default, bandplot is a wrapper for graph dot. Optionally, bandplot can be specified to be a wrapper for graph hbar or graph bar.

There are two syntaxes. In the first, bandplot takes the first variable in a varlist to be a response variable yvar, which is summarised for observations in each of various bands of the other predictor variables xvars. In the second, bandplot takes two or more variables specified first within parentheses () as being response variables yvars; all subsequent variables are then taken to be predictors xvars.

By default, bandplot shows means. Any other statistics produced by summarize may be specified. Note that with two or more yvars only one statistic may be shown.

Bands are to be interpreted as follows. By default, numeric variables are divided into quantile-based bands, and specifically into quartile-based bands unless another number of quantiles is requested. Alternatively, variables can be declared explicitly or implicitly as categorical, in which case the distinct values of each such variable are used as bands. Any string variables specified as xvars are treated as categorical, regardless of any other specifications. No string variables may be specified as yvars.

Remarks

bandplot does not draw plots based on coloured bands. If your search for those or similar plots has led you here, check out twoway rarea. The name used here is not standard, but nor apparently is any other name used routinely for what is plotted by this command.

The idea of showing summaries of responses for bands of one or more predictors evidently has a long history, which is difficult to trace. Plots summarizing polls or elections in terms of votes for major parties or candidates broken down separately by categorical variables such as sex, age, race or region are common. The particular choices here were inspired largely by examples given by Harrell (2001). See his pp. 126, 303f, 314f, 336.

What bandplot offers is perhaps best explained by a direct comparison with graph dot. There are three major differences and several minor differences. (Similar comments apply to graph bar or graph hbar if either is invoked.)

First, consider an example with the auto data. Compare

. graph dot (mean) mpg, over(foreign) over(rep78)

and

. bandplot mpg foreign rep78, cat(foreign rep78)

The graph dot command shows means of mpg for the cross-combinations of foreign and rep78 occurring in the data, i.e. one variable's classes are nested inside the other's. The bandplot command shows means of mpg separately for classes of each variable.

Second, bandplot supports quantile-based bands on the fly. You could show those with graph dot, but you would need to create any variables classed into bands first, say by using xtile.
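
As a rough illustration of that contrast (a sketch only; the variable name weight4 is made up for this example), the graph dot route needs the banded variable to be created first, whereas bandplot bands weight itself:

. sysuse auto, clear
. xtile weight4 = weight, nquantiles(4)
. graph dot (mean) mpg, over(weight4)
. bandplot mpg weight

The two plots need not agree exactly, as the binning convention of bandplot differs from that of xtile: see the nquantiles() option.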

Third, graph dot typically carries out a temporary reduction of the dataset, but bandplot carries out its own reduction and passes the results to graph dot for plotting asis. Various options of graph dot are thus irrelevant or inappropriate so far as bandplot is concerned. Further, variables in the dataset are not accessible to the graph dot command.

bandplot does not offer any rounding or coarsening option such as might be used to bin numeric variables into equal intervals. You would need to do that first. Advice is to use clonevar to create a copy of a variable (notably, keeping the variable label) and then to replace that with a binned version using a function such as round(), floor() or ceil(). Then declare such variables to bandplot as categorical [sic].
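
For instance, a minimal sketch that bins weight into 500 lb intervals before calling bandplot (the name weight500 and the bin width are arbitrary choices made for this illustration):

. sysuse auto, clear
. clonevar weight500 = weight
. replace weight500 = 500 * floor(weight/500)
. bandplot mpg weight500, cat(weight500)

Each value of weight500 is then the lower limit of its interval.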

Although bandplot ignores missing values on the yvars, the structure of such missing values may be explored by creating an indicator for missingness using missing().
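
A possible sketch with the auto data, in which rep78 has some missing values (the indicator name rep78miss is invented here). As the default statistic is the mean, the plot should show the fraction of observations with rep78 missing in each band:

. sysuse auto, clear
. generate byte rep78miss = missing(rep78)
. bandplot rep78miss foreign weight, cat(foreign)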

Options

Statistics options

categorical() specifies the names of variables to be treated as categorical, so that the bands of each are the distinct values of each. All string variables are treated as categorical, regardless of any explicit or implicit specification.

continuous() specifies the names of variables to be treated as continuous, meaning here only that the bands of each are based on quantiles calculated from the data.

If categorical() is specified but not continuous(), then continuous variables are those not declared as categorical; and conversely. If both categorical() and continuous() are specified, note that all variables must be classified one way or the other. If neither categorical() nor continuous() is specified, all numeric variables are treated as continuous. It will typically be easiest to specify which kind of variable is in the minority. The convention here thus resembles that used by anova.
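
Under those rules, and assuming that the auto data are in memory, the following three calls should all classify the variables identically, with foreign and rep78 categorical and weight continuous:

. bandplot mpg foreign rep78 weight, cat(foreign rep78)
. bandplot mpg foreign rep78 weight, cont(weight)
. bandplot mpg foreign rep78 weight, cat(foreign rep78) cont(weight)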

dta(filename [, save_options]) specifies that the dataset used on the fly for the graph is to be saved as a Stata data file filename using save. save_options are options of save. The non-standard option name reflects the fact that users may wish to use the graph, saving() option to save their graph to a file.
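
As a sketch, the saved results can be inspected after plotting. The filename results is arbitrary, and the contents of the saved file are not documented here, hence describe and list are used rather than any assumption about variable names. Note that use replaces the data in memory.

. sysuse auto, clear
. bandplot mpg foreign rep78 weight, cont(weight) dta(results, replace)
. use results, clear
. describe
. list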

missing specifies that missing values of the xvars be used as separate bands for summary. Missing values of the yvars are ignored, come what may. Note that specifying this option has implications for which observations are included in the plot, as observations with missing values on any of the xvars are by default excluded from the summarized and plotted data. The missing option overrides that default.

nquantiles() specifies the number of quantile bands to be used for continuous variables. The default is 4. Quantiles are calculated using _pctile. Brackets and parentheses are used on the plot to indicate bands precisely. Quantile bands take the form [,) [,) ... [,] i.e. values equal to calculated quantiles are allocated to the higher band; values equal to the minimum and maximum are necessarily allocated to the lowest and highest bands respectively. Users unfamiliar with this notation should note that [a, b) means a <= values < b and [a, b] means a <= values <= b. Note that the binning convention here differs from that applied by xtile.
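
The convention can be mimicked by hand, which may help in checking band membership. This is only an illustration of the rule just stated, not the code used inside bandplot; the variable name band is invented here.

. sysuse auto, clear
. _pctile weight, nquantiles(4)
. generate byte band = 1 + (weight >= r(r1)) + (weight >= r(r2)) + (weight >= r(r3)) if !missing(weight)
. tabulate band

Values equal to r(r1), r(r2) or r(r3) thus fall in the higher of the two adjacent bands.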

statistics() indicates one or more statistics as calculated by summarize. Names must be used exactly as listed in the help for summarize indicating its saved results. For example, use sd not SD and N not n. If not specified, the default is to show means. If two or more yvars are specified, only one statistic may be specified.
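
For example, with the auto data in memory, and using names saved by summarize with its detail option, the following should show medians alone and then quartiles together:

. bandplot mpg foreign rep78 weight, cat(foreign rep78) statistics(p50)
. bandplot mpg foreign rep78 weight, cat(foreign rep78) statistics(p25 p50 p75)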

xweighted indicates that any weights specified are to be used in determining the quantiles of continuous xvars. By default, weights are used only in summarizing the yvars. This is a rarely used option.

Graphics options

bandopts() are over_subopts of graph dot (or graph hbar or graph bar, as the case may be) used to specify the appearance of the band information (the inside categorical axis).

number specifies that the (unweighted) number of observations in each band be shown on the graph. The number will appear as part of the band label.

recast(hbar | bar) specifies that either graph hbar or graph bar be used to plot results rather than graph dot. The name is inspired by an option of graph twoway: see advanced_options. Otherwise there is no resemblance, so that in particular this option cannot be used to recast the graph to a twoway type.

xopts() are over_subopts of graph dot (or graph hbar or graph bar, as the case may be) used to specify the appearance of the xvars information (the outside categorical axis).

xvarlabels specifies that the variable labels of the xvars are to be shown on the outer categorical axis. By default, variable names are shown, to save space.

yvarlabels specifies that the variable labels of two or more yvars specified are to be shown in the legend. By default, variable names are shown, to save space.

graph_options are other options of graph dot (or graph hbar or graph bar, as the case may be). See also Remarks above.

Examples

. sysuse auto, clear
. bandplot mpg foreign rep78 weight, cat(for rep)
. bandplot mpg foreign rep78 weight, cont(weight)
. bandplot mpg foreign rep78 weight, cont(weight) missing
. bandplot mpg foreign rep78 weight, cont(weight) missing dta(results)
. bandplot mpg foreign rep78 weight, cont(weight) missing nq(8) dta(results, replace)
. bandplot mpg foreign rep78 weight, cont(weight) missing nq(8) s(mean p50) legend(order(1 "mean" 2 "median")) marker(1, ms(Sh)) marker(2, ms(Dh))

. bandplot (trunk turn) foreign rep78 weight, cont(weight) yvarlabels
. bandplot (trunk turn) foreign rep78 weight, cont(weight) yvarlabels xvarlabels marker(1, ms(Sh)) marker(2, ms(Dh))
. bandplot (trunk turn) foreign rep78 weight, cont(weight) number yvarlabels xopts(relabel(1 `" "Car" "type" "' 2 `" "Repair" "record" "1978" "' 3 `" "Weight" "(lb)" "'))
. bandplot (trunk turn) foreign rep78 weight, cont(weight) number yvarlabels xopts(relabel(1 `" "Car" "type" "' 2 `" "Repair" "record" "1978" "' 3 `" "Weight" "(lb)" "')) recast(hbar)
. bandplot (trunk turn) foreign rep78 weight, cont(weight) number yvarlabels xopts(relabel(1 `" "Car" "type" "' 2 `" "Repair" "record" "1978" "' 3 `" "Weight" "(lb)" "')) bandopts(label(labsize(*0.8)))

Author

Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk

Acknowledgments

Marcello Pagano and Fred Wolfe kindly notified me of small bugs.

References

Harrell, F.E. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer.

Also see

On-line: help for graph dot, help for graph hbar, help for graph bar, help for summarize, help for _pctile