```
-------------------------------------------------------------------------------
help for skewplot
-------------------------------------------------------------------------------

Skewness plots

skewplot varname [if exp] [in range] [, skew by(byvar) missing
scatter_options]

skewplot varlist [if exp] [in range] [, skew scatter_options]

Description

By default, skewplot produces a plot of the midsummary versus the spread
for each variable in varlist, known as the mid versus spread plot.
With the skew option, it produces a plot of the skewness function versus
the spread function. Such plots convey both the general character and the
fine structure of the symmetry or skewness of data sets, and can be used
to compare distributions or to assess whether transformations are
necessary or effective.

Remarks

Order n data values for a variable x and label them such that x_(1) <=
... <= x_(n). In a perfectly symmetric set of data, the midsummaries

(x_(1) + x_(n)) / 2,
(x_(2) + x_(n - 1)) / 2,
etc.

would all be identical, and equal to the median. A plot of each
midsummary

(x_(i) + x_(n - i + 1)) / 2

versus each difference or spread or quasi-range

x_(n - i + 1) - x_(i)

would yield a horizontal straight line. Conversely, skewness in sets of
data will be reflected by departures from horizontality.

Apart from the divisor of 2, this plot was suggested by J.W. Tukey (Wilk
and Gnanadesikan 1968; Fisher 1983). The form used here and the name `mid
versus spread plot' are found in Hoaglin (1985). It is usual to plot only
that half of the sample results for which spread is >= 0.

The skew option produces an alternative form promoted by Benjamini and
Krieger (1996, 1999). The identity

x_(n - i + 1) = median

+ (x_(n - i + 1) - x_(i)) / 2

+ (x_(i) + x_(n - i + 1) - 2 * median) / 2

= median + spread function + skewness function

for x_(i) in the lower half of the sample leads to a plot of the skewness
function versus the spread function. Note that the skewness function is
midsummary - median, and will be constant and zero for a perfectly
symmetric distribution, and that the ratio of the skewness function to
the spread function,

x_(i) + x_(n - i + 1) - 2 * median
----------------------------------
x_(n - i + 1) - x_(i)

is a measure of skewness (in the traditional sense) originally suggested
for quartiles by Bowley (1902) and generalised to this form by David and
Johnson (1956). It varies between -1 and 1. A similar general measure was
used by Parzen (1979). Graphically this measure is the slope of the line
connecting (0,0) and each data point.

See Benjamini and Krieger (1996, 1999) and Groeneveld (1998) for concise
reviews tracing such ideas from late 19th century antecedents to recent
work and further details on the interpretation of the skewness versus
spread plot.

Options

skew specifies the skewness versus spread plot, not the default mid
versus spread plot.

by(byvar) specifies that calculations are to be carried out separately
for each group defined by byvar. by() is allowed only with a single
varname.

missing, used only with by(), permits the use of non-missing values of
varname corresponding to missing values for the variable named by
by(). The default is to ignore such values.

scatter_options refers to options of graph twoway scatter.

Examples

. webuse citytemp
. describe
. skewplot *dd
. skewplot *dd, skew
. skewplot cooldd, by(region)
. skewplot cooldd, by(region) ms(i i i i) c(l l l l)
. skewplot temp*

References

Benjamini, Y. and Krieger, A.M. 1996. Concepts and measures for skewness
with data-analytic implications. Canadian Journal of Statistics 24:
131-140.

Benjamini, Y. and Krieger, A.M. 1999. Skewness - concepts and measures.
In Kotz, S., Read, C.B. and Banks, D.L. (eds) Encyclopedia of
Statistical Sciences Update Volume 3. New York: John Wiley, 663-670.

Bowley, A.L. 1902. Elements of statistics. London: P.S. King. (2nd
edition: see p.331.)

David, F.N. and Johnson, N.L. 1956. Some tests of significance with
ordered variables. Journal, Royal Statistical Society B 18: 1-20.

Fisher, N.I. 1983. Graphical methods in nonparametric statistics: a
review and annotated bibliography. International Statistical Review
51: 25-58.

Gnanadesikan, R. 1977 (2nd edition 1997). Methods for statistical data
analysis of multivariate observations. New York: John Wiley.

Groeneveld, R. 1998. Skewness, Bowley's measures of. In Kotz, S., Read,
C.B. and Banks, D.L. (eds) Encyclopedia of Statistical Sciences
Update Volume 2. New York: John Wiley, 619-621.

Hoaglin, D.C. 1985. Using quantiles to study shape. In Hoaglin, D.C.,
Mosteller, F. and Tukey, J.W. (eds) Exploring data tables, trends,
and shapes. New York: John Wiley, 417-460.

Parzen, E. 1979. Nonparametric statistical data modeling. Journal,
American Statistical Association 74: 105-131.

Wilk, M.B. and Gnanadesikan, R. 1968. Probability plotting methods for
the analysis of data. Biometrika 55: 1-17.

Author

Nicholas J. Cox, University of Durham
n.j.cox@durham.ac.uk

Acknowledgments

Richard Groeneveld tracked down the Bowley reference.

Also see

On-line: graph, symplot
Manual: [G] graph, [R] diagnostic plots

```
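
The quantities defined under Remarks can be computed by hand to check one's reading of the formulas. The following is a minimal sketch, not skewplot's own code: the auto data set, the variable price, and the new variable names (mid, spread, skewfn, spreadfn, ratio) are assumptions chosen for illustration, and the median is taken from summarize, which may differ in detail from what skewplot does internally.

```stata
* Sketch only, not skewplot's own code. The data set, the variable price and
* the generated variable names are assumptions made for illustration.
sysuse auto, clear
keep if !missing(price)
sort price

* After sorting, x_(i) is price[_n] and x_(n - i + 1) is price[_N - _n + 1]
generate double mid    = (price + price[_N - _n + 1]) / 2
generate double spread = price[_N - _n + 1] - price

* skewness function = midsummary - median; spread function = spread / 2
quietly summarize price, detail
generate double skewfn   = mid - r(p50)
generate double spreadfn = spread / 2

* Ratio of the two functions (generalised Bowley measure), within [-1, 1];
* left missing where the spread function is zero
generate double ratio = skewfn / spreadfn if spreadfn > 0

* Plot only the half of the sample for which spread is >= 0
scatter mid spread if spread >= 0
scatter skewfn spreadfn if spread >= 0
```

If the reading above is right, the first graph should resemble `skewplot price` and the second `skewplot price, skew`. The variable ratio gives, for each point on the second graph, the slope of the line joining it to (0,0).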