-------------------------------------------------------------------------------
help for skewplot
-------------------------------------------------------------------------------

Skewness plots

skewplot varname [if exp] [in range] [, skew by(byvar) missing scatter_options]

skewplot varlist [if exp] [in range] [, skew scatter_options]

Description

skewplot produces by default a plot of the midsummary versus the spread for the variables in varlist, also known as the mid versus spread plot. With the skew option, it produces a plot of the skewness function versus the spread function. Such plots convey both the general character and the fine structure of the symmetry or skewness of data sets, and can be used to compare distributions or to assess whether transformations are necessary or effective.

Remarks

Order n data values for a variable x and label them such that x_(1) <= ... <= x_(n). In a perfectly symmetric set of data, the midsummaries (x_(1) + x_(n)) / 2, (x_(2) + x_(n-1)) / 2, etc. would all be identical, and equal to the median. A plot of each midsummary

    (x_(i) + x_(n-i+1)) / 2

versus each difference or spread or quasi-range

    x_(n-i+1) - x_(i)

would yield a horizontal straight line. Conversely, skewness in sets of data will be reflected by departures from horizontality.
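The quantities plotted by the mid versus spread plot can be computed directly from the sorted data. The following is a minimal Python sketch, not skewplot's own code; the function name mid_vs_spread and the use of None for missing values are assumptions made for illustration. It keeps only the half of the results with spread >= 0, as is usual for this plot.

```python
def mid_vs_spread(values):
    """Return (spread, midsummary) pairs for the lower half of the sorted data.

    Illustrative sketch only (hypothetical helper, not skewplot's code).
    Missing values are assumed to be represented as None and are dropped.
    """
    x = sorted(v for v in values if v is not None)
    n = len(x)
    pairs = []
    # i runs 1, 2, ... while i <= (n + 1) / 2, so that the spread
    # x_(n-i+1) - x_(i) is never negative.
    for i in range(1, (n + 1) // 2 + 1):
        lo, hi = x[i - 1], x[n - i]   # x_(i) and x_(n-i+1), 0-based indexing
        pairs.append((hi - lo, (lo + hi) / 2))
    return pairs


print(mid_vs_spread([1, 2, 3, 4, 5]))    # [(4, 3.0), (2, 3.0), (0, 3.0)]
print(mid_vs_spread([1, 2, 3, 5, 10]))   # midsummaries rise with the spread
```

For the symmetric data every midsummary equals the median, 3, giving a horizontal line; for the right-skewed data the midsummaries (5.5, 3.5, 3.0) increase with the spread.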

Apart from the divisor of 2, this plot was suggested by J.W. Tukey (Wilk and Gnanadesikan 1968). See also Gnanadesikan (1977 or 1997, Ch. 6.2) or Fisher (1983). The form used here and the name `mid versus spread plot' are found in Hoaglin (1985). It is usual to plot only the half of the sample results for which the spread is >= 0, that is, i <= (n + 1) / 2; the other half merely repeats the same midsummaries with the sign of the spread reversed.

The skew option produces an alternative form promoted by Benjamini and Krieger (1996, 1999). The identity

    x_(n-i+1) = median + (x_(n-i+1) - x_(i)) / 2 + (x_(i) + x_(n-i+1) - 2 * median) / 2
              = median + spread function + skewness function

for x_(i) in the lower half of the sample leads to a plot of the skewness function versus the spread function, known as the skewness versus spread plot. Note that the skewness function is midsummary - median, and will be constant and zero for a perfectly symmetric distribution, and that the spread function is half the spread of the mid versus spread plot.

In addition, the ratio of the skewness and spread functions, or

    (x_(i) + x_(n-i+1) - 2 * median) / (x_(n-i+1) - x_(i)),

is a measure of skewness (in the traditional sense) originally suggested for quartiles by Bowley (1902) and generalised to this form by David and Johnson (1956). It varies between -1 and 1. A similar general measure was used by Parzen (1979). Graphically, on the skewness versus spread plot, this measure is the slope of the line connecting (0,0) and each data point.
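The decomposition above and the Bowley-type ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions, not skewplot's implementation: the name skew_vs_spread is hypothetical, missing values are assumed to be None, and the ratio is returned as None where the spread is zero (at the median for odd n), since it is undefined there.

```python
import statistics


def skew_vs_spread(values):
    """Return (spread function, skewness function, ratio) for the lower half.

    Hypothetical helper following the identity
        x_(n-i+1) = median + spread function + skewness function.
    Requires at least one non-missing value (statistics.median raises
    StatisticsError on empty data).
    """
    x = sorted(v for v in values if v is not None)
    n = len(x)
    med = statistics.median(x)
    rows = []
    for i in range(1, (n + 1) // 2 + 1):
        lo, hi = x[i - 1], x[n - i]          # x_(i) and x_(n-i+1)
        spread_fn = (hi - lo) / 2            # half the quasi-range
        skew_fn = (lo + hi - 2 * med) / 2    # midsummary - median
        # Ratio of the two functions: slope of the line from (0,0)
        # to the point (spread_fn, skew_fn); undefined when spread is 0.
        ratio = None if hi == lo else (lo + hi - 2 * med) / (hi - lo)
        rows.append((spread_fn, skew_fn, ratio))
    return rows


print(skew_vs_spread([1, 2, 3, 5, 10]))  # ratios 5/9, 1/3, then undefined
```

Because x_(i) <= median <= x_(n-i+1) for i in the lower half, the numerator of the ratio can never exceed the spread in absolute value, which is why the measure lies between -1 and 1.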

See Benjamini and Krieger (1996, 1999) and Groeneveld (1998) for concise reviews that trace such ideas from late 19th century antecedents to recent work, and for further details on the interpretation of the skewness versus spread plot.

Options

skew specifies the skewness versus spread plot, not the default mid versus spread plot.

by(byvar) specifies that calculations are to be carried out separately for each group defined by byvar. by() is allowed only with a single varname.

missing, used only with by(), permits the use of non-missing values of varname corresponding to missing values for the variable named by by(). The default is to ignore such values.
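The effect of by() and missing on which observations enter each calculation can be illustrated with a small Python sketch. Everything here is hypothetical: by_group and its include_missing flag only mimic the option descriptions above, None stands for a missing value, and treating observations with a missing byvar as a group of their own when missing is specified is an assumption (it follows the usual Stata convention but is not stated in this help).

```python
from collections import defaultdict


def midsummaries(values):
    """(spread, midsummary) pairs for the lower half of the sorted values."""
    x = sorted(values)
    n = len(x)
    return [(x[n - i] - x[i - 1], (x[i - 1] + x[n - i]) / 2)
            for i in range(1, (n + 1) // 2 + 1)]


def by_group(values, groups, include_missing=False):
    """Hypothetical analogue of by(byvar) [missing]: one result per group.

    Observations with a missing value (None) of the variable are always
    dropped. Observations with a missing group are dropped by default; with
    include_missing=True they are assumed to form a group of their own.
    """
    split = defaultdict(list)
    for v, g in zip(values, groups):
        if v is None:
            continue                 # missing values of varname are never used
        if g is None and not include_missing:
            continue                 # default: ignore missing byvar
        split[g].append(v)
    return {g: midsummaries(vs) for g, vs in split.items()}


vals = [1, 2, 3, 10, 20, 30, 5, None]
grps = ["a", "a", "a", "b", "b", "b", None, "a"]
print(by_group(vals, grps))                        # groups "a" and "b" only
print(by_group(vals, grps, include_missing=True))  # adds a group keyed None
```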

scatter_options refers to options of graph twoway scatter.

Examples

. webuse citytemp
. describe
. skewplot *dd
. skewplot *dd, skew
. skewplot cooldd, by(region)
. skewplot cooldd, by(region) ms(i i i i) c(l l l l)
. skewplot temp*

References

Benjamini, Y. and Krieger, A.M. 1996. Concepts and measures for skewness with data-analytic implications. Canadian Journal of Statistics 24: 131-140.

Benjamini, Y. and Krieger, A.M. 1999. Skewness - concepts and measures. In Kotz, S., Read, C.B. and Banks, D.L. (eds) Encyclopedia of Statistical Sciences Update Volume 3. New York: John Wiley, 663-670.

Bowley, A.L. 1902. Elements of statistics. London: P.S. King. (2nd edition: see p.331.)

David, F.N. and Johnson, N.L. 1956. Some tests of significance with ordered variables. Journal, Royal Statistical Society B 18: 1-20.

Fisher, N.I. 1983. Graphical methods in nonparametric statistics: a review and annotated bibliography. International Statistical Review 51: 25-58.

Gnanadesikan, R. 1977 (2nd edition 1997). Methods for statistical data analysis of multivariate observations. New York: John Wiley.

Groeneveld, R. 1998. Skewness, Bowley's measures of. In Kotz, S., Read, C.B. and Banks, D.L. (eds) Encyclopedia of Statistical Sciences Update Volume 2. New York: John Wiley, 619-621.

Hoaglin, D.C. 1985. Using quantiles to study shape. In Hoaglin, D.C., Mosteller, F. and Tukey, J.W. (eds) Exploring data tables, trends, and shapes. New York: John Wiley, 417-460.

Parzen, E. 1979. Nonparametric statistical data modeling. Journal, American Statistical Association 74: 105-131.

Wilk, M.B. and Gnanadesikan, R. 1968. Probability plotting methods for the analysis of data. Biometrika 55: 1-17.

Author

Nicholas J. Cox, University of Durham
n.j.cox@durham.ac.uk

Acknowledgments

Richard Groeneveld tracked down the Bowley reference.

Also see

On-line: graph, symplot
Manual: [G] graph, [R] diagnostic plots