help sixplot                                                  22 January 2010
-------------------------------------------------------------------------------
Title

sixplot -- Six diagnostic and descriptive graphs for a single variable

Syntax

sixplot varlist [if] [in]

Description

sixplot displays six diagnostic and descriptive graphs for a single variable, arranged as a 2-row, 3-column array. The arguments are varname and an optional sequence variable. If no sequence variable is named, sixplot plots varname against the order in which the observations are stored.

The plot in the (1,1) position is a sequence plot of varname versus the sequence.

The plot in the (1,2) position is a residual-versus-fitted plot from the regression of varname on the sequence.

The plot in the (1,3) position is a boxplot of varname.

The plot in the (2,1) position is a first difference plot of varname versus sequence.

The plot in the (2,2) position is a histogram of varname.

The plot in the (2,3) position is a normal quantile plot of varname.

By default, these analyses use all observations in the data set, in the order in which they are stored. If you sort the data, the analysis follows the sorted order.
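
As a small illustration of the ordering rule, the commands below sort the life expectancy data by year before calling sixplot, so that the stored order is the time order. The variable year belongs to the uslifeexp data set shipped with Stata; nothing else is assumed.

    . sysuse uslifeexp, clear
    . sort year
    . sixplot le_male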

The sequence plot allows you to examine the data for drift over the sequence (presumably time). This graph also displays the linear fit line and a 95% forecast interval, drawn as a shaded band. Observations outside the shaded band are candidates for inspection as outliers. If you plot more than 300 observations, the plot becomes blurred; I suggest plotting batches of 300.
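
A comparable sequence plot can be sketched with built-in commands. This is only an approximation under stated assumptions, not the code in the ado file: seq is a hypothetical variable holding the stored order, and the stdf option of twoway lfitci requests an interval based on the standard error of the forecast (95% by default).

    . sysuse uslifeexp, clear
    . * seq is a hypothetical name for the observation order
    . generate seq = _n
    . twoway (lfitci le_male seq, stdf) (scatter le_male seq)

Points that fall outside the shaded band in this graph are the ones worth inspecting as possible outliers.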

The rvfplot displays the residuals versus the fitted values, with limits of 2*rmse as a guide. It allows you to check for outliers and for patterns such as unequal variance over the fitted values. Clear patterns suggest you should look closely at your model.
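
A rough equivalent using official commands is shown below. It assumes the guide lines sit at plus and minus 2*rmse around zero, which is one reading of the description above; seq and the local macro lim are hypothetical names.

    . sysuse uslifeexp, clear
    . generate seq = _n
    . regress le_male seq
    . * assumption: guide lines at -2*rmse, 0, and +2*rmse
    . local lim = 2*e(rmse)
    . rvfplot, yline(-`lim' 0 `lim')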

The boxplot shows quartiles and outliers.

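The first difference plot shows the change in varname from one observation to the next, which helps you check for shifts or changes in the data along the sequence.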
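
The first difference can be computed by hand to see what this panel shows. In this sketch, seq and d_le_male are hypothetical names, and the first observation has a missing difference because it has no predecessor.

    . sysuse uslifeexp, clear
    . generate seq = _n
    . * change from the previous observation; missing for the first one
    . generate d_le_male = le_male - le_male[_n-1]
    . twoway line d_le_male seq, yline(0)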

The histogram provides a picture of the distribution of varname. It has 10 bins, which you may wish to change in further analysis. It should be roughly symmetric if the data are normal. Do not get overly concerned with apparent departures from symmetry if your data set is small.
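
For further analysis with a different number of bins, the official histogram command can be run directly. The first line matches the 10 bins described above; the second uses 20 bins (an arbitrary choice for illustration) and overlays a normal density with the normal option.

    . sysuse uslifeexp, clear
    . histogram le_male, bin(10)
    . histogram le_male, bin(20) normal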

The normal quantile plot gives a graphical diagnostic of normality. If the plot suggests non-normality, there may be concern about the validity of procedures such as confidence intervals.
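
The same kind of display is available on its own from the official qnorm command, which may be easier to read at full size. Whether sixplot calls qnorm internally is not stated here, so treat this as a stand-alone check.

    . sysuse uslifeexp, clear
    . qnorm le_male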

Caution: If the data set is large, the sequence plot and the first difference plot may be blurred and difficult to interpret. I suggest examining the data in batches of 300 or so using the in qualifier, for example in 1/300. sixplot does not superimpose a normal curve on the histogram.

Examples

-----------------------------------------------------------------------
Setup
    . sysuse uslifeexp.dta

    . sixplot le_male
    . sixplot le_female

The data set gives life expectancy by sex and race from 1900 to 1999. The above commands provide a sixplot for these years.

Setup
    . sysuse nlsw88.dta

    . sixplot wage

This data set has over 2,000 observations, which blurs the information on the plots. There is no obvious time relation here.

. sixplot wage in 1/300

The "in" restriction can be repeated as 301/600, etc.
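
Stepping through the batches can be automated in a do-file. The loop below is a sketch under two assumptions: that sixplot leaves its combined graph as the current graph, so that graph export saves it, and that the hypothetical file names sixplot_wage_1.png, sixplot_wage_301.png, and so on are acceptable.

    sysuse nlsw88, clear
    local n = _N
    forvalues first = 1(300)`n' {
        * the last batch may hold fewer than 300 observations
        local last = min(`first' + 299, `n')
        sixplot wage in `first'/`last'
        * assumption: the sixplot display is the current graph
        graph export "sixplot_wage_`first'.png", replace
    }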

Notes

Good and Hardin (2009) describe these plots as the fourplot; I have added the rvfplot and boxplot. The fourplot seems to have originated in section 4.4.5.3 of the NIST Engineering Statistics Handbook (available online from www.itl.nist.gov/div898/handbook/).

Author

Peter A. Lachenbruch, Oregon State University, Corvallis, peter.lachenbruch@oregonstate.edu

Acknowledgement

I thank Nick Cox and Vince Wiggins, who made several useful and important suggestions to improve the ado file.

References

Good, P. I. and Hardin, J. W. (2009) Common Errors in Statistics (and How to Avoid Them). New York: Wiley.

NIST Engineering Statistics Handbook, downloaded Jan 4, 2010; see section 4.4.5.3.