{smcl}
{* *! version 1.1.0 30nov2012}{...}
{cmd:help pctrim}
{hline}

{title:Title}

{phang}
{bf:pctrim} {hline 2} Mark or recode observations outside of a percentile range


{title:Syntax}

{p 8 17 2}
{cmd:pctrim} {varlist} {ifin} {cmd:,} [{it:options}]

{phang}
{synoptset 30}{...}
{synopthdr}
{synoptline}
{synopt:{opt p:ercentiles(lb ub)}}Lower and upper bounds for percentile trimming. Defaults are 1 and 99.{p_end}
{synopt:{opth by(varlist)}}Group variables within which to identify outliers and calculate replacement values.{p_end}
{synopt:{opt mark(newvarname)}}Generate an indicator variable marking observations with an outlier in any variable.{p_end}
{synopt:{opt rec:ode(mean|median|miss|bound)}}Recode outliers for each variable.{p_end}
{synopt:{opt replace}}Replace outliers in the existing variables with the recode value; used with {opt rec:ode}.{p_end}
{synopt:{opt gen:erate(stubname)}}Prefix for new, recoded variables; used with {opt rec:ode}.{p_end}
{synopt:{opt copy:rest}}Copy out-of-sample values from old variables to new variables; used with {opt rec:ode} and {opt gen:erate}.{p_end}
{synopt:{opt miss:ok}}Do not exclude observations with missing values for some or all variables.{p_end}
{synoptline}
{p2colreset}{...}


{title:Description}

{pstd}
{cmd:pctrim} trims outlying observations based on percentile bounds and can operate on a {varlist}.
The user can create an indicator variable marking outliers or recode them.
Recode values are the mean, the median, the relevant (upper or lower) percentile bound, or system missing.
Outliers may be recoded in place, or new variables may be generated with the trimmed data.

{pstd}
The {opt mark} option operates case-wise, marking observations that have an outlier in any variable in {varlist}.
The {opt rec:ode} option operates on each variable independently: only values that are outliers with respect to the current variable are recoded.
Replacement statistics are computed with outliers included.

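{pstd}
As a rough illustration of what marking and bound-recoding mean for a single variable, the sketch below uses only built-in Stata commands.
It is not the code that {cmd:pctrim} runs.
It assumes that an outlier is a value strictly below the lower bound or strictly above the upper bound, and it uses the default percentile definition of {cmd:_pctile}, which may differ from the one used by {cmd:pctrim}.
The names {cmd:p1}, {cmd:p99}, {cmd:out_weight}, and {cmd:weight_b} are hypothetical.

{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. _pctile weight, percentiles(1 99)}{p_end}
{phang2}{cmd:. scalar p1 = r(r1)}{p_end}
{phang2}{cmd:. scalar p99 = r(r2)}{p_end}
{phang2}{cmd:. generate byte out_weight = (weight < scalar(p1) | weight > scalar(p99)) if !missing(weight)}{p_end}
{phang2}{cmd:. generate weight_b = min(max(weight, scalar(p1)), scalar(p99)) if !missing(weight)}{p_end}

{pstd}
Here {cmd:out_weight} plays the role of a marker for one variable, and {cmd:weight_b} holds the values pulled in to the nearer percentile bound.
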
{pstd}
{cmd:pctrim} works best if the variables in {varlist} have no missing values.
By default, {cmd:pctrim} works on a single analysis sample after excluding any observations with missing data for any variable in {varlist}.
Observations with missing data are not considered when identifying outliers or when computing replacement statistics.
To operate on each variable independently instead, specify the option {opt miss:ok}.
Given this option, {cmd:pctrim} operates on a single variable at a time and does not consider whether other variables in {varlist} have missing data.


{title:Options}

{dlgtab:Main}

{phang}
{opt p:ercentiles(lb ub)} allows the user to choose the lower and upper percentile bounds that designate outliers.
The defaults are 1 and 99.
Specify 0 as the lower bound to trim only the top of the distribution; similarly, specify 100 as the upper bound to trim only the bottom.{p_end}

{phang}
{opth by(varlist)} designates group variables.
If specified, percentile bounds and replacement statistics are calculated within each combination of the group variables.
By-variables must be numeric.
Observations with missing data for the by-variables are excluded, even when the {opt miss:ok} option is specified.{p_end}

{phang}
{opt mark(newvarname)} creates a new indicator variable marking observations with an outlier in any of the variables in {varlist}.
{opt mark(newvarname)} operates case-wise.{p_end}

{phang}
{opt rec:ode(mean|median|miss|bound)} replaces outlying values for each variable.
{opt rec:ode(mean|median|miss|bound)} operates on each variable independently: only observations that are outliers with respect to that variable are recoded.
Recode values are system missing, the mean, the median, or the relevant (upper or lower) percentile bound.{p_end}

{phang}
{opt replace} replaces outlying values in the existing variables.{p_end}

{phang}
{opt gen:erate(stubname)} creates a new, trimmed variable for each variable in {varlist}.
Each new variable is prefixed with {it:stubname}.{p_end}

{phang}
{opt copy:rest} specifies that out-of-sample values be copied from the original variables.
In line with other data-management commands, {cmd:pctrim} defaults to setting the new variable to missing (.) outside the observations selected by {cmd:if} {it:exp} and {cmd:in} {it:range}.
The {opt copy:rest} option can be used only with the {opt rec:ode} and {opt gen:erate} options.{p_end}

{phang}
{opt miss:ok} specifies that observations should not be excluded from the analysis sample if they have missing values for some or all variables in {varlist}.
Missing values are ignored when computing percentile bounds and marking outlying observations for each variable.{p_end}


{title:Examples}

{phang}{cmd:. sysuse auto}

{phang}{cmd:. pctrim weight mpg length , by(foreign) p(0 99) mark(outlier)}

{phang}{cmd:. pctrim weight mpg length , p(0 99) gen(tr_)}


{title:Author}

{pstd}
Michael Barker {p_end}
{pstd}
Georgetown University {p_end}
{pstd}
mdb96@georgetown.edu {p_end}


{title:Also see}

{psee}
{space 2}Help: {help pctile}, {help centile}, {help egen##pctile():egen pctile()}

{psee}
{space 2}Net: {net `"describe winsor , from(http://fmwww.bc.edu/repec/bocode/w)"' : winsor } , 
{net `"describe winsor2 , from(http://fmwww.bc.edu/repec/bocode/w)"' : winsor2 } , 
{net `"describe trimmean , from(http://fmwww.bc.edu/repec/bocode/t)"' : trimmean } , 
{net `"describe trimplot , from(http://fmwww.bc.edu/repec/bocode/t)"' : trimplot }
{p_end}