-------------------------------------------------------------------------------
help for winsor
-------------------------------------------------------------------------------

Winsorizing a variable

    winsor varname [if exp] [in range], generate(newvar) {p(#)|h(#)} [{highonly|lowonly}]

Description

winsor takes the non-missing values of a variable x ordered such that

    x_1 <= ... <= x_n

and generates a new variable y identical to x except that the h highest and h lowest values are replaced by the next value counting inwards from the extremes:

    y_1, ... , y_h = y_(h + 1)

    y_n, ... , y_(n - h + 1) = y_(n - h)
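The definition above can be illustrated outside Stata. The following Python sketch is not the code of winsor itself; the function name winsorize_two_sided and the use of None for missing values are assumptions made only for this illustration. It clamps every non-missing value to the interval from x_(h + 1) to x_(n - h), which gives the same result as replacing the h lowest and h highest values.

```python
def winsorize_two_sided(values, h):
    """Replace the h lowest and h highest non-missing values (illustrative sketch)."""
    # Assumption: missing values are represented by None and are left untouched.
    present = sorted(v for v in values if v is not None)
    n = len(present)
    if not (1 <= h < n / 2):
        raise ValueError("h must be at least 1 and less than n/2")
    low = present[h]           # x_(h + 1) in the 1-based notation above
    high = present[n - h - 1]  # x_(n - h)
    # Clamping to [low, high] replaces exactly the values beyond those bounds.
    return [None if v is None else min(max(v, low), high) for v in values]


x = [9, 1, 5, None, 3, 7, 2, 8, 4, 10, 6]
print(winsorize_two_sided(x, 2))
# [8, 3, 5, None, 3, 7, 3, 8, 4, 8, 6]
```

With n = 10 non-missing values and h = 2, the two lowest values (1 and 2) become 3 and the two highest (9 and 10) become 8. The original order of observations is kept.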

h can be specified directly or indirectly by specifying a fraction p of the number of observations n:

    h = [p n]

where [ ] denotes integer part.

This transformation is named after the biostatistician Charles P. Winsor (1895-1951): see, for example, Tukey (1962). For more discussion and references, see Barnett and Lewis (1994).

Charles (Charlie) Winsor was educated at Harvard as an engineer and then worked for the New England Telephone and Telegraph Company, but his interests shifted to biological research and biostatistics. After further study at Johns Hopkins and Harvard, he held posts at Iowa State College and Johns Hopkins; in between, in the Second World War, he did government work at Princeton.

Options

generate(newvar) specifies the name of the new variable. It is a required option.

p(#) specifies the fraction of the observations to be modified in each tail. p should be greater than 0 and less than 0.5, and should imply a value of h that satisfies the condition given under h() just below.

h(#) specifies the number of the observations to be modified in each tail. h should be at least 1 and less than half the number of non-missing observations.

Just one of p() and h() should be specified.

highonly and lowonly specify that Winsorizing should be one-sided, referring only to the tail with the highest values or only to the tail with the lowest values, respectively. These options should not be specified together.
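How the options interact can be sketched in Python. This is an illustration under stated assumptions, not the implementation of winsor: the names resolve_h and winsorize, the tail argument (standing in for highonly and lowonly) and the use of None for missing values are all hypothetical. The sketch also applies the same bounds on h to one-sided use, and how the command itself reports invalid input is not modelled here.

```python
import math


def resolve_h(n, p=None, h=None):
    """Return the per-tail count h from exactly one of p or h (illustrative sketch)."""
    if (p is None) == (h is None):
        raise ValueError("specify just one of p and h")
    if p is not None:
        if not (0 < p < 0.5):
            raise ValueError("p must be greater than 0 and less than 0.5")
        h = math.floor(p * n)  # h = [p n], the integer part
    if not (1 <= h < n / 2):
        raise ValueError("h must be at least 1 and less than n/2")
    return h


def winsorize(values, p=None, h=None, tail="both"):
    """Winsorize both tails, or only one; tail is 'both', 'high' or 'low'."""
    if tail not in ("both", "high", "low"):
        raise ValueError("tail must be 'both', 'high' or 'low'")
    # Assumption: missing values are represented by None and are left untouched.
    present = sorted(v for v in values if v is not None)
    n = len(present)
    h = resolve_h(n, p=p, h=h)
    # A tail that is not Winsorized keeps its own extreme value as the bound.
    low = present[h] if tail in ("both", "low") else present[0]
    high = present[n - h - 1] if tail in ("both", "high") else present[-1]
    return [None if v is None else min(max(v, low), high) for v in values]


x = list(range(1, 11))
print(winsorize(x, h=2))                 # both tails
print(winsorize(x, p=0.25, tail="high")) # h = [0.25 * 10] = 2, high tail only
print(winsorize(x, h=2, tail="low"))     # low tail only
```

For example, with 74 non-missing observations, p = 0.1 gives h = [7.4] = 7, so seven values are modified in each Winsorized tail.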

Examples

. winsor mpg, gen(Wmpg) h(3)

. winsor mpg, gen(Wmpg2) p(0.1)

References

Anonymous. 1951. In memoriam: Charles P. Winsor. Biometrics 7: 221.

Barnett, V. and Lewis, T. 1994. Outliers in statistical data. Chichester: John Wiley. [Previous editions 1978, 1984.]

Tukey, J.W. 1962. The future of data analysis. Annals of Mathematical Statistics 33: 1-67.

Author

Nicholas J. Cox, Durham University, U.K.
n.j.cox@durham.ac.uk