-------------------------------------------------------------------------------
help for bandplot
-------------------------------------------------------------------------------

Plot summary statistics of responses for bands of predictors

    bandplot yvar xvars [weight] [if exp] [in range] [, categorical(varlist)
        continuous(varlist) dta(filename[, save_options]) missing nquantiles(#)
        statistics(stat [stat ...]) xweighted bandopts(over_subopts) number
        recast(hbar|bar) xopts(over_subopts) xvarlabels yvarlabels graph_options]

    bandplot (yvars) xvars [weight] [if exp] [in range] [, categorical(varlist)
        continuous(varlist) dta(filename[, save_options]) missing nquantiles(#)
        statistics(stat) xweighted bandopts(over_subopts) number
        recast(hbar|bar) xopts(over_subopts) xvarlabels yvarlabels graph_options]

aweights and fweights may be specified.

Description

bandplot produces plots showing summary statistics of one or more response variables for bands of one or more predictor variables. By default, bandplot is a wrapper for graph dot. Optionally, bandplot can be specified to be a wrapper for graph hbar or graph bar.

There are two syntaxes. In the first, bandplot takes the first variable in a varlist to be a response variable yvar, which is summarised for observations in each of various bands of the other predictor variables xvars. In the second, bandplot takes two or more variables specified first within parentheses () as being response variables yvars; all subsequent variables are then taken to be predictors xvars.

By default, bandplot shows means. Any other statistics produced by summarize may be specified. Note that with two or more yvars only one statistic may be shown.

Bands are to be interpreted as follows. By default numeric variables are divided into quantile-based bands. (By default in turn quartile-based bands are used.) Alternatively, variables can be declared explicitly or implicitly as categorical, in which case the distinct values of each such variable are used as bands. Any string variables specified as xvars are treated as categorical, regardless of any other specifications. No string variables may be specified as yvars.
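
As a minimal sketch of the two syntaxes, using the auto data shipped with Stata (the particular choice of variables is only illustrative):

    . sysuse auto, clear
    . * first syntax: one response (mpg), summarized within bands of each predictor
    . bandplot mpg foreign rep78 weight, categorical(foreign rep78)
    . * second syntax: two responses in parentheses, then the predictors
    . bandplot (trunk turn) foreign rep78 weight, categorical(foreign rep78)

In both calls weight is left undeclared and so should be divided into quartile-based bands, while foreign and rep78 contribute one band per distinct value.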

Remarks

bandplot does not draw plots based on coloured bands. If your search for those or similar plots has led you here, check out twoway rarea. The name used here is not standard, but nor apparently is any other name used routinely for what is plotted by this command.

The idea of showing summaries of responses for bands of one or more predictors evidently has a long history, which is difficult to trace. Plots summarizing polls or elections in terms of votes for major parties or candidates broken down separately by categorical variables such as sex, age, race or region are common. The particular choices here were inspired largely by examples given by Harrell (2001). See his pp. 126, 303f, 314f, 336.

What bandplot offers is perhaps best explained by a direct comparison with graph dot. There are three major differences and several minor differences. (Similar comments apply to graph bar or graph hbar if either is invoked.)

First, consider an example with the auto data. Compare

    . graph dot (mean) mpg, over(foreign) over(rep78)

and

    . bandplot mpg foreign rep78, cat(foreign rep78)

The graph dot command shows means of mpg for the cross-combinations of foreign and rep78 occurring in the data, i.e. one variable's classes are nested inside the other's. The bandplot command shows means of mpg separately for classes of each variable.

Second, bandplot supports quantile-based bands on the fly. You could show those with graph dot, but you would need to create any variables classed into bands first, say by using xtile.

Third, graph dot typically carries out a temporary reduction of the dataset, but bandplot carries out its own reduction and passes the results to graph dot for plotting as is. Various options of graph dot are thus irrelevant or inappropriate so far as bandplot is concerned. Further, variables in the dataset are not accessible to the graph dot command.
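
To illustrate the second difference, here is a rough sketch of what would be needed with graph dot alone, next to the bandplot equivalent. The variable name weight4 is hypothetical, chosen for this example only.

    . sysuse auto, clear
    . * by hand: class weight into quartile bands first, then plot
    . xtile weight4 = weight, nquantiles(4)
    . graph dot (mean) mpg, over(weight4)
    . * on the fly: bandplot forms the quartile bands itself
    . bandplot mpg weight

The two graphs need not agree exactly wherever values are tied with a quantile, because xtile and bandplot follow different conventions for such values.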

bandplot does not offer any rounding or coarsening option such as might be used to bin numeric variables into equal intervals. You would need to do that first. Advice is to use clonevar to create a copy of a variable (notably, keeping the variable label) and then to replace that with a binned version using a function such as round(), floor() or ceil(). Then declare such variables to bandplot as categorical [sic].

Although bandplot ignores missing values on the yvars, the structure of such missing values may be explored by creating an indicator for missingness using missing().
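
A sketch of both suggestions with the auto data follows. The variable names weight500 and rep78miss and the bin width of 500 are assumptions made purely for illustration.

    . sysuse auto, clear
    . * copy weight (keeping its variable label), then bin into intervals of width 500
    . clonevar weight500 = weight
    . replace weight500 = 500 * floor(weight / 500)
    . bandplot mpg weight500, categorical(weight500)
    . * indicator that is 1 if rep78 is missing and 0 otherwise
    . generate byte rep78miss = missing(rep78)
    . bandplot rep78miss foreign weight, categorical(foreign)

Since the default statistic is the mean, the second plot would show the proportion of observations with missing rep78 within each band of foreign and weight.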

Options

Statistics options

categorical() specifies the names of variables to be treated as categorical, so that the bands of each are the distinct values of each. All string variables are treated as categorical, regardless of any explicit or implicit specification.

continuous() specifies the names of variables to be treated as continuous, meaning here only that the bands of each are based on quantiles calculated from the data.

If categorical() is specified but not continuous(), then continuous variables are those not declared as categorical; and conversely. If both categorical() and continuous() are specified, note that all variables must be classified one way or the other. If neither categorical() nor continuous() is specified, all numeric variables are treated as continuous. It will typically be easiest to specify which kind of variable is in the minority. The convention here thus resembles that used by anova.
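
Under the rules just stated, the following three calls should classify the predictors identically, with foreign and rep78 categorical and weight continuous. This is a sketch using the auto data, not output from a tested run.

    . sysuse auto, clear
    . * declare the categorical variables; weight is continuous by implication
    . bandplot mpg foreign rep78 weight, categorical(foreign rep78)
    . * declare the continuous variable; the others are categorical by implication
    . bandplot mpg foreign rep78 weight, continuous(weight)
    . * declare both kinds, in which case every xvar must be listed
    . bandplot mpg foreign rep78 weight, categorical(foreign rep78) continuous(weight)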

dta(filename[, save_options]) specifies that the dataset used on the fly for the graph is to be saved as a Stata data file filename using save. save_options are options of save. The non-standard option name reflects the fact that users may wish to use the graph, saving() option to save their graph to a file.
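
For instance, the reduced dataset might be saved and then inspected as follows. The filename results is only an example, and the structure of the saved file is not documented here, hence the use of describe and list to look at it.

    . sysuse auto, clear
    . bandplot mpg foreign rep78 weight, continuous(weight) dta(results, replace)
    . * caution: this replaces the data in memory with the saved results
    . use results, clear
    . describe
    . list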

missing specifies that missing values of the xvars be used as separate bands for summary. Missing values of the yvars are ignored, come what may. Note that specifying this option has implications for which observations are included in the plot, as observations with missing values on any of the xvars are by default excluded from the summarized and plotted data. The missing option overrides that default.

nquantiles() specifies the number of quantile bands to be used for continuous variables. The default is 4. Quantiles are calculated using _pctile. Brackets and parentheses are used on the plot to indicate bands precisely. Quantile bands take the form [,) [,) ... [,] i.e. values equal to calculated quantiles are allocated to the higher band; values equal to the minimum and maximum are necessarily allocated to the lowest and highest bands respectively. Users unfamiliar with this notation should note that [a, b) means a <= values < b and [a, b] means a <= values <= b. Note that the binning convention here differs from that applied by xtile.
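
The convention can be mimicked by hand as in the sketch below. This illustrates the rule as stated above and is not taken from the code of bandplot; the variable names band and band_x are hypothetical.

    . sysuse auto, clear
    . * quartiles of weight; _pctile leaves them in r(r1), r(r2) and r(r3)
    . _pctile weight, nquantiles(4)
    . * >= allocates values equal to a quantile to the higher band
    . generate byte band = 1 + (weight >= r(r1)) + (weight >= r(r2)) + (weight >= r(r3)) if !missing(weight)
    . * xtile, for comparison, applies its own convention at the cutpoints
    . xtile band_x = weight, nquantiles(4)
    . tabulate band band_x

Any off-diagonal cells in the table would arise from values tied with a calculated quantile.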

statistics() indicates one or more statistics as calculated by summarize. Names must be used exactly as listed in the help for summarize indicating its saved results. For example, use sd not SD and N not n. If not specified, the default is to show means. If two or more yvars are specified, only one statistic may be specified.
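
A brief sketch with the auto data, using names as saved by summarize:

    . sysuse auto, clear
    . * one yvar: two statistics may be shown together
    . bandplot mpg foreign rep78 weight, continuous(weight) statistics(mean p50)
    . * two yvars: only one statistic is allowed
    . bandplot (trunk turn) foreign rep78 weight, continuous(weight) statistics(sd)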

xweighted indicates that any weights specified are to be used in determining the quantiles of continuous xvars. By default, weights are used only in summarizing the yvars. This is a rarely used option.

Graphics options

bandopts() are over_subopts of graph dot (or graph hbar or graph bar, as the case may be) used to specify the appearance of the band information (the inside categorical axis).

number specifies that the (unweighted) number of observations in each band be shown on the graph. The number will appear as part of the band label.

recast(hbar|bar) specifies that either graph hbar or graph bar be used to plot results rather than graph dot. The name is inspired by an option of graph twoway: see advanced_options. Otherwise there is no resemblance, so that in particular this option can not be used to recast the graph to a twoway type.

xopts() are over_subopts of graph dot (or graph hbar or graph bar, as the case may be) used to specify the appearance of the xvars information (the outside categorical axis).

xvarlabels specifies that the variable labels of the xvars are to be shown on the outer categorical axis. By default, variable names are shown, to save space.

yvarlabels specifies that the variable labels of two or more yvars specified are to be shown in the legend. By default, variable names are shown, to save space.

graph_options are other options of graph dot (or graph hbar or graph bar, as the case may be). See also Remarks above.

Examples

    . sysuse auto, clear
    . bandplot mpg foreign rep78 weight, cat(for rep)
    . bandplot mpg foreign rep78 weight, cont(weight)
    . bandplot mpg foreign rep78 weight, cont(weight) missing
    . bandplot mpg foreign rep78 weight, cont(weight) missing dta(results)
    . bandplot mpg foreign rep78 weight, cont(weight) missing nq(8) dta(results, replace)
    . bandplot mpg foreign rep78 weight, cont(weight) missing nq(8) s(mean p50) legend(order(1 "mean" 2 "median")) marker(1, ms(Sh)) marker(2, ms(Dh))
    . bandplot (trunk turn) foreign rep78 weight, cont(weight) yvarlabels
    . bandplot (trunk turn) foreign rep78 weight, cont(weight) yvarlabels xvarlabels marker(1, ms(Sh)) marker(2, ms(Dh))
    . bandplot (trunk turn) foreign rep78 weight, cont(weight) number yvarlabels xopts(relabel(1 `" "Car" "type" "' 2 `" "Repair" "record" "1978" "' 3 `" "Weight" "(lb)" "'))
    . bandplot (trunk turn) foreign rep78 weight, cont(weight) number yvarlabels xopts(relabel(1 `" "Car" "type" "' 2 `" "Repair" "record" "1978" "' 3 `" "Weight" "(lb)" "')) recast(hbar)
    . bandplot (trunk turn) foreign rep78 weight, cont(weight) number yvarlabels xopts(relabel(1 `" "Car" "type" "' 2 `" "Repair" "record" "1978" "' 3 `" "Weight" "(lb)" "')) bandopts(label(labsize(*0.8)))

Author

Nicholas J. Cox, Durham University
n.j.cox@durham.ac.uk

Acknowledgments

Marcello Pagano and Fred Wolfe kindly notified me of small bugs.

References

Harrell, F.E. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer.

Also see

On-line: help for graph dot, help for graph hbar, help for graph bar, help for summarize, help for _pctile