Plot estimates with confidence limits
eclplot estimate_varname clmin_varname clmax_varname parmid_varname [if exp] [in range] [, horizontal eplottype(eplot_type) rplottype(rplot_type) nociforeground estopts([eplot_options] [, weight]) ciopts([rplot_options] [, weight]) supby(supby_varname [, supby_suboptions]) estopts1([eplot_options] [, weight]) ... estopts15([eplot_options] [, weight]) ciopts1([rplot_options] [, weight]) ... ciopts15([rplot_options] [, weight]) plot(plot) nograph twoway_options ]
where estimate_varname, clmin_varname and clmax_varname are numeric variables containing parameter estimates, lower confidence limits and upper confidence limits, respectively, to be plotted on one axis, and parmid_varname is a parameter identity variable to determine the position of each confidence interval on the other axis. The twoway_options are as specified for twoway; see help for twoway_options.
Description
eclplot creates a plot of estimates with lower and upper confidence limits on one axis against another variable on the other axis. The estimates and lower and upper confidence limits are stored in three variables, with one observation per confidence interval plotted. Data sets with such variables may be created by the parmest package (downloadable from SSC), or by statsby or postfile in official Stata. The user has a choice of plotting the confidence intervals horizontally or vertically, a choice of estimate plot types for the estimates, and a choice of range plot types for the confidence intervals, and may also overlay the confidence interval plot with other plots using the plot option. In default, eclplot does not print a legend unless multiple superimposed confidence interval plots are requested, and has "sensible" settings for axis titles and labels (see help for title_option and axis_options). However, these defaults may be overridden, using the twoway_options.
Options
horizontal specifies that the confidence intervals must be plotted horizontally, with the estimates and confidence limits on the horizontal axis and the other variable on the vertical axis. In default, if horizontal is not specified, the confidence intervals are plotted vertically, with the estimates and confidence limits on the vertical axis and the other variable on the horizontal axis.
eplottype(eplot_type) specifies the estimate plot type used to plot the estimates. The value of this option may be any one of the twoway plot types scatter, connected, line, area, bar, spike, and dropline. If the eplottype() option is not specified, then it is set to scatter, and the estimates are drawn as symbols.
rplottype(rplot_type) specifies the range plot type used to plot the confidence intervals. The value of this option may be any one of the range plot types allowed by twoway, namely rarea, rbar, rspike, rcap, rcapsym, rscatter, rline, and rconnected. If the rplottype option is not specified, then it is set to rcap, and the confidence limits are drawn with capped spikes.
nociforeground specifies whether the confidence intervals are in the foreground (where they can overwrite the estimates) or in the background (where the estimates can overwrite them). If neither ciforeground nor nociforeground is specified, then a sensible default is decided as follows. First, the eplottype() option is assigned a group rank, which is 1 for scatter, connected and line, 2 for dropline and spike, 3 for bar, and 4 for area. Then, the rplottype() option is assigned a group rank, which is 1 for rscatter, rconnected and rline, 2 for rcapsym, rcap and rspike, 3 for rbar, and 4 for rarea. Then, the nociforeground option is set to ciforeground if the rplottype() group rank is equal to or less than the eplottype() group rank, and is set to nociforeground otherwise. The default rule can therefore be described as "symbols and connecting lines in front of spikes in front of bars in front of areas", and was chosen to minimize the probability of important information being hidden.
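The default rule above can be sketched in a few lines of Stata. This is only an illustration of the ranking logic as described in this help file, not the code used inside eclplot; the local macro names erank, rrank and cifore are hypothetical.

```stata
* Illustrative plot types; erank, rrank and cifore are hypothetical local names
local eplottype "bar"
local rplottype "rcap"

* Group rank of the estimate plot type
if inlist("`eplottype'", "scatter", "connected", "line") {
    local erank 1
}
else if inlist("`eplottype'", "dropline", "spike") {
    local erank 2
}
else if "`eplottype'" == "bar" {
    local erank 3
}
else {
    local erank 4
}

* Group rank of the range plot type
if inlist("`rplottype'", "rscatter", "rconnected", "rline") {
    local rrank 1
}
else if inlist("`rplottype'", "rcapsym", "rcap", "rspike") {
    local rrank 2
}
else if "`rplottype'" == "rbar" {
    local rrank 3
}
else {
    local rrank 4
}

* Confidence intervals go in front if their rank is not greater
if `rrank' <= `erank' {
    local cifore "ciforeground"
}
else {
    local cifore "nociforeground"
}
display "`cifore'"
```

With eplottype(bar) and rplottype(rcap), the ranks are 3 and 2, so the capped spikes are drawn in front of the bars. With the defaults scatter and rcap, the ranks are 1 and 2, so the estimate symbols are drawn in front of the confidence intervals.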
estopts([eplot_options] [, weight]) specifies any plot options for the plotting of the estimates. These options may be any of the options allowed for the estimate plot type specified by the eplottype() option. To find more about the options allowed by each estimate plot type, see help for twoway and for the individual plot types scatter, connected, line, area, bar, spike, and dropline. The optional weight is a weight specification, of the general form [weighttype=expression], where weighttype may be aweight, fweight or pweight, and expression is a Stata expression or variable name. If it is present, and if the user has also specified the eplottype() option as dropline, scatter or connected, then it specifies that the marker symbol sizes will be weighted by the value of the expression, which must be non-negative. The weight can be useful for creating Cochrane forest plots for meta-analyses, in which the marker symbol size is often proportional to the study size.
ciopts([rplot_options] [, weight]) specifies any plot options for drawing the confidence limits. These options may be any of the options allowed for the range plot type specified by the rplottype() option, which may be any of the range plot types allowed by twoway, and defaults to rcap. For instance, the user may specify the width of the caps on each confidence limit. To find more about the options allowed by each range plot type, see help for twoway, for scatter, and for the individual range plot types rarea, rbar, rspike, rcap, rcapsym, rscatter, rline, and rconnected. The optional weight is a weight specification, of the general form [weighttype=expression], where weighttype may be aweight, fweight or pweight, and expression is a Stata expression or variable name. If it is present, and if the user has also specified the rplottype() option as rcapsym, rscatter or rconnected, then it specifies that the cap symbol sizes will be weighted by the value of the expression, which must be non-negative.
supby(supby_varname [, supby_suboptions]) specifies that multiple superimposed plots of estimates and confidence limits will be created, one for each value of the variable supby_varname, with distinct styles. There can be up to 15 superimposed plots. Unless the user specifies otherwise, a legend will be created, identifying each plot with a value of the variable supby_varname. The suboptions of the supby() option are listed below under Suboptions of the supby() option.
estopts1([eplot_options] [, weight]) ... estopts15([eplot_options] [, weight]) are only used if a supby() option is specified. They specify plot options specific to the individual superimposed estimates plots, additional to the plot options specified for all estimate plots by the estopts() option. If the weight is specified, then it overrides any weight specified by the estopts() option.
ciopts1([rplot_options] [, weight]) ... ciopts15([rplot_options] [, weight]) are only used if a supby() option is specified. They specify plot options specific to the individual superimposed confidence limit plots, additional to the plot options specified for all confidence limit plots by the ciopts() option. If the weight is specified, then it overrides any weight specified by the ciopts() option.
plot(plot) provides a way to add other plots to the generated graph; see help for plot_option.
nograph specifies that no graph will be drawn. This option is useful if the user is building a twoway command from subcommands returned in r() by eclplot (see below).
twoway_options are any of the options documented in help for twoway_options. These include options for titling the graph (see help for title_options), options for saving the graph to disk (see help for saving_option), the legend() option (see help for legend_option), and the by() option (see help for by_option). In default, eclplot sets the legend() option to legend(off) (implying no legend) if the supby() option is not specified, and sets the contents of the legend to contain a key for each value of the supby() variable if supby() is specified. If the user specifies a by() option without a legend() suboption, then the legend() suboption is set by default to legend(off) if supby() is not specified, and to legend(on) if supby() is specified. Therefore, in default, eclplot draws a legend if and only if the user specifies the supby() option. These defaults add to and/or override any defaults set by the graphics scheme currently in use, and can in turn be added to and/or overridden using options set by the user.
Suboptions of the supby() option
The supby() option has the syntax
supby( supby_varname [ , missing truncate(num) spaceby(num) offset(num) ] )
The suboptions are as follows:
missing specifies that superimposed plots will be produced for missing values of the variable supby_varname.
truncate(num) specifies that, in the legend, the values of the variable supby_varname will be truncated to the length num.
spaceby(num) specifies a number, in units of the parameter identification variable parmid_varname, by which the superimposed plots corresponding to successive values of the variable supby_varname will be spaced on the axis corresponding to the parameter identification variable. This option is used to prevent multiple superimposed plots from obscuring each other. If spaceby() is not specified, then it is set to zero, implying no spacing.
offset(num) specifies a number, in units of the parameter identification variable parmid_varname, by which the superimposed plot corresponding to the first value of the variable supby_varname will be displaced from the value implied by the variable parmid_varname. This number may be positive or negative. If offset() is not specified, then it is set to zero, implying that the plot corresponding to the first value of the variable supby_varname will not be displaced from its true value. In general, the positions of the plots on the axis corresponding to the parameter identification variable parmid_varname are given by the formula
parmpos = parmid_varname + offset + spaceby*(supby_seqnum-1)
where parmpos is the position of the plot on the axis, offset is the value of the offset() suboption, spaceby is the value of the spaceby() suboption, and supby_seqnum is the ascending sequential order of the value of the variable supby_varname corresponding to the plot.
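As a worked illustration of the formula, the following sketch computes parmpos by hand for two parameters and two supby() groups, using only official Stata. The data values and the choices offset(-0.125) and spaceby(0.25) are assumptions made for the example; they place each pair of plots symmetrically around the value of the parameter identification variable.

```stata
* Illustrative data: two parameters, each with two supby() groups
clear
input parmid supby_seqnum
1 1
1 2
2 1
2 2
end

* Assumed suboption values for this example
local offset = -0.125
local spaceby = 0.25

* The position formula given above
generate parmpos = parmid + `offset' + `spaceby'*(supby_seqnum - 1)
list parmid supby_seqnum parmpos
```

The four positions are 0.875, 1.125, 1.875 and 2.125, so the two plots for each parameter sit 0.125 units either side of parmid.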
Saved results
eclplot returns the following macro results in r():
r(plot)        Contents of plot() option
r(allplots)    Sequence of twoway plot subcommands generated by eclplot
r(ifin)        if and/or in qualifiers
r(twowayopts)  twoway options generated by eclplot
r(cmd)         twoway command generated by eclplot
eclplot works by constructing a twoway command, which it then executes, unless nograph is specified. Users can use the saved twoway plot subcommands, qualifiers and options to build twoway commands of their own. The result r(allplots) contains a sequence of twoway plot subcommands separated by ||. The result r(cmd) contains a command, which can be specified by the macro expression
twoway `r(allplots)' || `r(ifin)' , `r(twowayopts)'
and which is executed by eclplot to produce the plot. Note that, if the supby() option is specified, then r(allplots) will contain temporary variable names, belonging to temporary variables used within eclplot, and therefore cannot be used to build new twoway plot commands.
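A minimal sketch of this workflow follows. It assumes that eclplot is installed, uses made-up estimates entered with input, and does not use supby(), so that no temporary variable names appear in r(allplots). The local macro names and the added title are illustrative.

```stata
* Illustrative data: three estimates with confidence limits (made-up values)
clear
input estimate min95 max95 parmid
21.3 19.1 23.5 1
24.8 22.0 27.6 2
19.6 17.9 21.3 3
end

* Build the twoway command without drawing the graph
eclplot estimate min95 max95 parmid, nograph

* Copy the saved results into locals before another r-class command runs
local allplots `"`r(allplots)'"'
local ifin `"`r(ifin)'"'
local twowayopts `"`r(twowayopts)'"'

* Reassemble the command in the documented form, adding a title of our own
twoway `allplots' || `ifin' , `twowayopts' title("Estimates with 95% confidence limits")
```

Copying the results into local macros first is a precaution, because r() is overwritten by the next r-class command.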
Remarks
eclplot plots confidence intervals against another variable. More information about eclplot, and about the creation of datasets for input to eclplot, can be found in Newson (2003), Newson (2004) and Newson (2005).
Data sets used by eclplot may be created manually using a spreadsheet. However, they may also be created by the parmest package, downloadable from SSC. The parmest package stores results from an estimation command as a data set. (See also help for _estimates or ereturn.) It creates a data set with one observation per model parameter, or one observation per parameter per by-group, and data on parameter names, estimates, confidence limits, and other parameter attributes. The other variable, against which the confidence intervals are plotted, may be any numeric variable, but is often a categorical factor included as a predictor in the model fitted by the estimation command using the xi utility. To reconstruct such a categorical factor in a parmest output data set, the user may use the factext and descsave packages, also downloadable from SSC. Alternatively, the user may use the parmest package, possibly with the label option, and then use the sencode package (also downloadable from SSC) to encode the parm or label string variable in the output data set to a numeric variable, which may be plotted by eclplot against the estimates and confidence limits.
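For users without parmest, a data set of the required form can also be built with statsby in official Stata. The following is a minimal sketch, assuming eclplot is installed. It estimates the mean of mpg within each level of rep78 by fitting a constant-only regression, then computes t-based 95% confidence limits by hand. The variable names estimate, min95 and max95 are chosen to match the examples in this help file; se and df are illustrative names.

```stata
sysuse auto, clear
drop if missing(rep78)

* One observation per level of rep78, with the mean, its standard error
* and the residual degrees of freedom
statsby estimate=_b[_cons] se=_se[_cons] df=e(df_r), by(rep78) clear: regress mpg

* t-based 95% confidence limits
generate min95 = estimate - invttail(df, 0.025)*se
generate max95 = estimate + invttail(df, 0.025)*se

* Plot the confidence intervals against repair record
eclplot estimate min95 max95 rep78
```

Here rep78 itself serves as the parameter identity variable, so no encoding step is needed.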
Under Windows 98/ME, the default eclplot dialog should not be used, as it requires too much memory. Windows 98 users who want to use dialogs with eclplot should therefore use the small eclplot dialog for Windows 98/ME users. (See help for smalldlg for technical details on small dialogs for Windows 98/ME users.)
This version of eclplot is written in Stata Version 9. However, Stata 8 users can download the Stata 8 version of eclplot from Roger Newson's website at http://www.imperial.ac.uk/nhli/r.newson.
Under Stata 7, the present author usually plotted confidence intervals using either the Stata 7 graph command (with the connect() option) or Nicholas J. Cox's hplot package, downloadable from SSC. The hplot package is a very comprehensive package for general horizontal plots. The eclplot package, on its own, cannot entirely supersede hplot, but the two packages perform overlapping sets of functions, and may possibly be viewed as being complementary.
Examples
The following examples use the auto data, shipped with official Stata (see help for sysuse). A regression model is fitted for the Y-variable mpg (miles per gallon), predicted by the categorical variables rep78 (repair record) and foreign. The parmby command of the parmest package is used to create an output data set with one observation per parameter and data on estimates and confidence limits. The sencode package is used to create a numeric variable (with value labels) encoding the model parameter corresponding to each observation. Finally, eclplot is used to display the confidence intervals. The first example uses parameter names to label a vertical confidence interval plot. The second example uses parameter labels to label a horizontal confidence interval plot. The third example uses parameter labels to label a horizontal "detonator plot".
. sysuse auto,clear
. parmby "xi:regress mpg i.foreign i.rep78", label norestore
. sencode parm,gene(parmid)
. eclplot estimate min95 max95 parmid

. sysuse auto,clear
. parmby "xi:regress mpg i.foreign i.rep78", label norestore
. sencode label,gene(parmlab)
. eclplot estimate min95 max95 parmlab, hori

. sysuse auto,clear
. parmby "xi:regress mpg i.foreign i.rep78", label norestore
. sencode label, gene(parmlab)
. eclplot estimate min95 max95 parmlab, hori eplot(bar)
The following advanced example fits the same model to the same data with a different parameterization, and uses the descsave and factext packages as well as parmby. It creates two confidence interval plots. The first plot displays two confidence intervals for the mean mileage levels expected for cars from the USA and from elsewhere with rep78==0. The second plot displays confidence intervals for the difference in mileage expected for each non-zero level of rep78, with a dotted reference line on the horizontal axis, indicating the difference of zero expected if rep78 has no independent effect on mpg. The plots demonstrate the use of the options estopts and ciopts and the use of the twoway_options.
. sysuse auto,clear
. tab foreign,gene(orig_) nolabel
. tempfile tf0
. descsave foreign rep78,do(`tf0')
. parmby "xi:regress mpg orig_* i.rep78,noconst",label norestore
. factext,do(`tf0')
. eclplot estimate min95 max95 foreign,hori estopts(msize(vlarge)) ciopts(msize(vlarge)) yscale(range(-1 2)) ylab(0 1) xtitle("Mean mileage per gallon")
. eclplot estimate min95 max95 rep78,hori estopts(msize(vlarge)) ciopts(msize(vlarge)) yscale(range(1 6)) xline(0,lpattern(dot)) xtitle("Mean difference (miles per gallon)")
The following example also uses parmby and sencode. It demonstrates the use of the supby() option of eclplot to produce multiple superimposed detonator plots.
. sysuse auto, clear
. tabulate rep78, gene(rep78_)
. parmby "regress mpg rep78_*, noconst", by(foreign) label norestore
. sencode label if parm!="_cons", gene(parmlab)
. lab var parmlab "Repair record 1978"
. lab var estimate "Mean mileage (mpg)"
. eclplot estimate min95 max95 parmlab, eplot(bar) estopts(barwidth(0.25)) supby(foreign, spaceby(0.25)) xscale(range(0 6)) xlabel(1(1)5, angle(30))
. more
. eclplot estimate min95 max95 parmlab, eplot(bar) ciopts(blcolor(black)) estopts(barwidth(0.25)) estopts1(bcolor(red)) estopts2(bcolor(blue)) supby(foreign, spaceby(0.25)) xscale(range(0 6)) xlabel(1(1)5, angle(30))
. more
Acknowledgements
I would like to thank Jean Marie Linhart and James Hassell of StataCorp for their very helpful advice on writing the dialogs for eclplot, and also Vince Wiggins of StataCorp for his very helpful advice on writing eclplot.
Author
Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk
References
Newson, R. 2003. Confidence intervals and p-values for delivery to the end user. The Stata Journal 3(3): 245-269. Pre-publication draft downloadable from Roger Newson's website at http://www.imperial.ac.uk/nhli/r.newson.
Newson, R. 2004. From datasets to resultssets in Stata. Presented at the 10th United Kingdom Stata Users' Group Meeting, London, 29 June, 2004. Also downloadable from Roger Newson's website at http://www.imperial.ac.uk/nhli/r.newson.
Newson, R. 2005. Generalized confidence interval plots using commands or dialogs. Presented at the 11th United Kingdom Stata Users' Group Meeting, London, 17 May, 2005. Also downloadable from Roger Newson's website at http://www.imperial.ac.uk/nhli/r.newson.
Also see
Manual: [G] graph intro, [G] graph twoway
On-line: help for twoway, graph, graph_intro, graph7
help for parmest, sencode, factext, descsave, hplot if installed