{smcl} {* *! version 2.0 2020/11/24}{...} {hline} {cmd:help for {hi:winsor2}}{right: ({browse "https://www.lianxh.cn":blog})} {hline} {title:Winsorizing or Trimming variables} {title:Syntax} {p 4 19 2} {cmdab:winsor2} {varlist} {ifin}, [ {cmdab:s:uffix:}{cmd:(}string{cmd:)} {opt replace} {opt t:rim} {cmdab:c:uts(}{it:#} {it:#}{cmd:)} {opth by:(varlist:groupvar)} {opt l:abel} ] {title:Description} {p 4 4 2} {cmd:winsor2} winsorize or trim (if {cmd:trim} option is specified) the variables in {varlist} at particular percentiles specified by option {bf: cuts(#1 #2)}. In defult, new variables will be generated with a suffix "_w" or "_tr", which can be changed by specifying {bf:suffix()} option. The {bf:replace} option replaces the variables with their winsorized or trimmed ones. {text}{dlgtab:Difference between winsorizing and trimming}{text} {p 4 4 2} Winsorizing is not equivalent to simply excluding data, which is a simpler procedure, called trimming or truncation. In a trimmed estimator, the extreme values are discarded; in a Winsorized estimator, the extreme values are instead replaced by certain percentiles, specified by option cuts(# #). For details, see {help winsor} (if installed), {help trimmean} (if installed). {p 4 4 2} For example, you type the following commands to get the 1th and 99th percentiles of variable wage, 1.930993 and 38.70926, respectively. {phang2} {bf: . sysuse nlsw88, clear} {p_end} {phang2} {bf: . sum wage, detail} {p_end} {p 4 4 2} In defult, {cmd:winsor2} winsorize wage at 1th and 99th percentiles, {phang2} {bf: . winsor2 wage, replace cuts(1 99)} {p_end} {p 4 4 2} which can be done by hands: {phang2} {bf: . replace wage=1.930993 if wage<1.930993} {p_end} {phang2} {bf: . replace wage=38.70926 if wage>38.70926} {p_end} {p 4 4 2} Note that, values smaller than the 1th percentile is repalce by the 1th percentile, and the similar thing is done with the 99th percentile. {p 4 4 2} Things change when -{bf:trim}- option is specified: {phang2} {bf: . winsor2 wage, replace cuts(1 99) trim} {p_end} {p 4 4 2} which can also be done by hands: {phang2} {bf: . replace wage=. if wage<1.930993} {p_end} {phang2} {bf: . replace wage=. if wage>38.70926} {p_end} {p 4 4 2} In this case, we discard values smaller than 1th percentile or greater than 99th percentile. This is trimming. {title:Options} {p 4 8 2}{cmd:suffix(}{it:string}{cmd:)} specifies the suffix of the new variables. The defult is "_w" or "_tr" (when {bf:trim} specified). {p 4 8 2}{cmd:replace} replaces the variables with their winsorized or trimmed counterpart. Can not be specified with {cmd:suffix(}{it:string}{cmd:)}. {p 4 8 2}{cmd:trim} trims the variables. {p 4 8 2}{cmd:cuts(}{it:#} {it:#}{cmd:)} specifies the percentiles at which the data is winsorized or trimmed. {bf: cuts(1 99)} (the default) means winsor (trim) at 1th and 99th percentile. Specify {bf: cuts(1 99)} or {bf: cuts(99 1)} makes no difference. {p 4 8 2}{opth by:(varlist:groupvar)} the winsor or trim is done within each group specified by {it:groupvar}. {title:Examples} {phang2} *- winsor at (p1 p99), get new variable "wage_w" {p_end} {phang2}{inp:.} {stata "sysuse nlsw88, clear": sysuse nlsw88, clear}{p_end} {phang2}{inp:.} {stata "winsor2 wage": winsor2 wage}{p_end} {phang2} *- winsor 3 variables at 0.5th and 99.5th percentiles, and overwrite the old variables {p_end} {phang2}{inp:.} {stata "winsor2 wage age hours, cuts(0.5 99.5) replace": winsor2 wage age hours, cuts(0.5 99.5) replace}{p_end} {phang2} *- winsor 3 variables at (p1 p99), gen new variables with suffix _win, and add variable labels {p_end} {phang2}{inp:.} {stata "winsor2 wage age hours, suffix(_win) label": winsor2 wage age hours, suffix(_win) label}{p_end} {phang2} *- left-winsorizing only, at 1th percentile {p_end} {phang2}{inp:.} {stata "winsor2 age, cuts(1 100)": winsor2 wage, cuts(1 100)}{p_end} {phang2} *- right-trimming only, at 99th percentile {p_end} {phang2}{inp:.} {stata "winsor2 wage, cuts(0 99) trim": winsor2 wage, cuts(0 99) trim}{p_end} {phang2} *- winsor variables at (p1 p99) by (industry), overwrite the old variables {p_end} {phang2}{inp:.} {stata "winsor2 wage age hours, replace by(industry)": winsor2 wage hours, replace by(industry)}{p_end} {title:References} {p 4 8 2}Anonymous. 1951. In memoriam: Charles P. Winsor. {it:Biometrics} 7: 221. {p 4 8 2}Barnett, V. and Lewis, T. 1994. {it:Outliers in statistical data.} Chichester: John Wiley. [Previous editions 1978, 1984.] {p 4 8 2}Tukey, J.W. 1962. The future of data analysis. {it:Annals of Mathematical Statistics} 33: 1{c -}67. {title:Acknowledgements} {p 4 8 2} Codes from {help winsor} by Nicholas J. Cox and -winsorizeJ.ado- by Judson Caskey have been incorporated. {title:Author} {phang} {cmd:Yujun,Lian (arlionn)} Department of Finance, Lingnan College, Sun Yat-Sen University.{break} E-mail: {browse "mailto:arlionn@163.com":arlionn@163.com}. {break} Blog: {browse "https://www.lianxh.cn":https://www.lianxh.cn} {break} {p_end} {title:Other Commands I have written} {pstd} {synoptset 30 }{...} {synopt:{help lianxh} (if installed)} {stata ssc install lianxh} (to install){p_end} {synopt:{help bdiff} (if installed)} {stata ssc install bdiff} (to install){p_end} {synopt:{help hhi5} (if installed)} {stata ssc install hhi5} (to install){p_end} {synopt:{help uall} (if installed)} {stata ssc install uall} (to install){p_end} {synopt:{help xtbalance} (if installed)} {stata ssc install xtbalance} (to install){p_end} {p2colreset}{...} {title:Also see} {p 4 13 2} Online: {help summarize}, {help means}, {help winsor} (if installed), {help trimplot} (if installed), {help trimmean} (if installed), {help iqr} (if installed), {help robmean} (if installed)