help robreg
-------------------------------------------------------------------------------

Title

robreg -- Robust regression

Syntax

MM-estimator

robreg mm depvar varlist [if] [in] [, mm_options ]

M-estimator

robreg m depvar [varlist] [if] [in] [, m_options ]

S-estimator

robreg s depvar varlist [if] [in] [, s_options ]

LMS/LQS/LTS-estimator

robreg lms depvar varlist [if] [in] [, lqs_options ]
robreg lqs depvar varlist [if] [in] [, lqs_options ]
robreg lts depvar varlist [if] [in] [, lqs_options ]

Replay syntax

robreg [, level(#) ]

mm_options               description
-------------------------------------------------------------------------
Main
  efficiency(#)          gaussian efficiency; # in 70(5)95; default is efficiency(85)
  bp(#)                  breakdown point; # in .10(.05).50; default is bp(0.5)

Biweight M-estimate
  k(#)                   tuning constant; not allowed with efficiency()
  tolerance(#)           tolerance for IRWLS weights; default is tolerance(1e-6)
  iterate(#)             maximum number of iterations; default is iterate(16000)
  relax                  continue even if convergence not reached
  generate(newvar)       store IRWLS weights
  replace                overwrite existing variable

Initial S-estimate
  nsamp(#)               number of trial samples
  sopts(s_options)       additional options passed through to S-algorithm
  save(name)             save S-estimate

Standard errors
  vce(norobust)          traditional standard errors
  norobust               synonym for vce(norobust)

Reporting
  level(#)               set confidence level; default is level(95)
  first                  display initial S-estimate
  nodots                 suppress progress dots of S-estimate
  log                    display RWLS iteration log
-------------------------------------------------------------------------

m_options                description
-------------------------------------------------------------------------
Main
  huber                  use Huber objective function; the default
  biweight               use biweight objective function; bisquare is a synonym
  efficiency(#)          gaussian efficiency; # in 70(5)95; default is efficiency(95)
  k(#)                   tuning constant; not allowed with efficiency()

IRWLS algorithm
  tolerance(#)           tolerance for IRWLS weights; default is tolerance(1e-6)
  iterate(#)             maximum number of iterations; default is iterate(16000)
  relax                  continue even if convergence not reached
  generate(newvar)       store IRWLS weights
  replace                overwrite existing variable

Initial estimate
  init(arg)              initial estimate; arg may be lav, ols, name, or .; default is init(lav)
  save(name)             save initial estimate

Scale estimate
  scale(#)               provide preliminary scale estimate
  updatescale            update scale estimate in each iteration
  center                 center residuals when computing scale

Standard errors
  vce(norobust)          traditional standard errors
  vce(pv)                traditional standard errors using pseudo-values approach
  norobust               synonym for vce(norobust)
  nose                   skip computation of standard errors

Reporting
  level(#)               set confidence level; default is level(95)
  first                  display initial estimate
  log                    display RWLS iteration log
-------------------------------------------------------------------------

s_options                description
-------------------------------------------------------------------------
Main
  bp(#)                  breakdown point; # in .10(.05).50; default is bp(0.5)
  k(#)                   tuning constant; not allowed with bp()

Resampling algorithm
  nsamp(#)               number of trial samples
  alpha(#)               maximum risk of bad solution; default is alpha(0.01)
  epsilon(#)             maximum contamination fraction; default is epsilon(0.2)
  nkeep(#)               number of candidates to keep; default is nkeep(2)
  rsteps(#)              number of local improvement steps; default is rsteps(1)
  stolerance(#)          tolerance for scale estimate; default is stolerance(1e-6)
  siterate(#)            maximum number of iterations for scale estimate; default is siterate(16000)
  tolerance(#)           tolerance for coefficient vector; default is tolerance(1e-6)
  iterate(#)             maximum number of RWLS iterations; default is iterate(16000)
  ssteps(#)              number of scale approximation steps; default is ssteps(1)
  generate(newvar)       store IRWLS weights
  replace                overwrite existing variable

Standard errors
  vce(norobust)          traditional standard errors
  norobust               synonym for vce(norobust)
  nose                   skip computation of standard errors

Reporting
  level(#)               set confidence level; default is level(95)
  nodots                 suppress progress dots
-------------------------------------------------------------------------

lqs_options              description
-------------------------------------------------------------------------
Main
* bp(#)                  breakdown point; # in (0,0.5]; default is bp(0.5)

Resampling algorithm
  nsamp(#)               number of trial samples
  alpha(#)               maximum risk of bad solution; default is alpha(0.01)
  epsilon(#)             maximum contamination fraction; default is epsilon(0.2)
  generate(newvar)       store minimizing sample
  replace                overwrite existing variable

Reporting
  nodots                 suppress progress dots
-------------------------------------------------------------------------
* bp() is not allowed with robreg lms

Description

robreg provides a number of robust estimators for linear regression models. The command accompanies Jann (2010), a survey paper on robust regression in a German handbook on social science data analysis.

robreg mm fits the efficient high breakdown MM-estimator proposed by Yohai (1987). In the first stage, a high breakdown S-estimator is applied to estimate the residual scale and derive starting values for the coefficient vector. In the second stage, an efficient bisquare M-estimator is applied to obtain the final coefficient estimates.

robreg m fits regression M-estimators (Huber 1973) using iteratively reweighted least squares (IRWLS).

robreg s fits the high breakdown S-estimator introduced by Rousseeuw and Yohai (1984) using the fast algorithm proposed by Salibian-Barrera and Yohai (2006).

robreg lms, robreg lqs, and robreg lts fit the least median of squares (LMS), least quantile of squares (LQS; a generalization of LMS), and the least trimmed squares (LTS) estimators (Rousseeuw and Leroy 1987). Estimation is carried out using simple resampling without local improvement (e.g. Rousseeuw and Leroy 1987:197). Computation of standard errors is not supported for LMS, LQS, and LTS.

For a recent implementation of similar estimators in Stata, see also Verardi and Croux (2009).

Dependencies

robreg requires moremata. See ssc describe moremata.
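A minimal sketch of checking and installing the dependency, assuming an internet connection and that moremata is obtained from the SSC archive as the pointer above suggests:

```stata
* Show the SSC package description for moremata
ssc describe moremata

* Install it (add the replace option to update an existing copy)
ssc install moremata
```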

Options for robreg mm

    +------+
----+ Main +-------------------------------------------------------------

efficiency(#) sets the gaussian efficiency of the MM-estimator (i.e. the asymptotic relative efficiency compared to the OLS or ML estimator in case of i.i.d. normal errors). The efficiency is determined by appropriate choice of the tuning constant for the bisquare M-estimator in the second stage of the MM-algorithm. # may be a number between 70 and 95 in steps of 5. The default for the MM-estimator is efficiency(85), as suggested by Maronna et al. (2006: 144).

bp(#) sets the breakdown point of the MM-estimator. The breakdown point is determined by appropriate choice of the tuning constant for the S-estimator in the first stage of the MM-algorithm. # may be a number between 0.1 and 0.5 in steps of 0.05. The default is bp(0.5).

    +---------------------+
----+ Biweight M-estimate +----------------------------------------------

k(#) specifies the tuning constant for the bisquare M-estimator in the second stage of the MM-algorithm. k() is not allowed if efficiency() is specified.

tolerance(#) specifies the tolerance for the weights of the IRWLS algorithm used to fit the bisquare M-estimator. When the maximum absolute change in the weights from one iteration to the next is less than or equal to tolerance(), the convergence criterion is satisfied. The default is tolerance(1e-6).

iterate(#) specifies the maximum number of iterations for the IRWLS algorithm used to fit the bisquare M-estimator. If convergence is not reached within iterate() iterations, the algorithm stops and returns an error. The default is iterate(16000) or as set by set maxiter.

relax causes the IRWLS algorithm to return the current results instead of returning an error if convergence is not reached.

generate(newvar) stores the final weights of the IRWLS algorithm in variable newvar.

replace permits robreg to overwrite existing variables.

    +--------------------+
----+ Initial S-estimate +-----------------------------------------------

nsamp(#) specifies the number of trial samples for the search algorithm of the S-estimator in the first stage of the MM-algorithm. The default value is determined according to the formula

ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))

within a range of 50 to 10000, where p is the number of coefficients in the model and alpha = 0.01 and epsilon = 0.2 (see Salibian-Barrera and Yohai 2006 for a justification of the formula). The default values for alpha and epsilon can be changed via sopts() (see below).

sopts(s_options) specifies additional options to be passed through to the S-estimator. See the section on options for robreg s.

save(name) saves the results of the S-estimator under name using estimates store.

    +-----------------+
----+ Standard errors +--------------------------------------------------

vce(norobust) causes standard errors to be computed using traditional formulas assuming constant error variance. The default is to compute robust standard errors as suggested by Croux et al. (2003; using formula Avar_1; the traditional formula is equivalent to Avar_2s).

norobust is a synonym for vce(norobust).

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#) specifies the level for confidence intervals. The default is level(95) or as set by set level.

first causes the first stage S-estimate to be displayed.

nodots suppresses the progress dots of the S-estimator search algorithm.

log displays the iteration log of the second stage IRWLS algorithm.
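As a sketch of how several of these options combine, the following fits the MM-estimator with higher efficiency, displays the first-stage S-estimate, and stores the final IRWLS weights so that heavily downweighted observations can be inspected. The variable name w_mm and the 0.25 cutoff are illustrative choices, not part of robreg.

```stata
sysuse auto, clear

* MM-estimate with 95% gaussian efficiency; show the initial S-estimate
* and store the final IRWLS weights in a new variable (name is arbitrary)
robreg mm price mpg weight headroom foreign, efficiency(95) first generate(w_mm)

* Observations with small weights had little influence on the final fit
summarize w_mm
list make price w_mm if w_mm < 0.25
```

If w_mm already exists from an earlier run, add replace so that robreg may overwrite it.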

Options for robreg m

    +------+
----+ Main +-------------------------------------------------------------

huber causes the Huber objective function to be used (monotone M-estimator). This is the default.

biweight causes the biweight or bisquare objective function to be used (redescending M-estimator). bisquare is a synonym for biweight. The solution of a redescending M-estimator may depend on the starting values.

efficiency(#) sets the gaussian efficiency (i.e. the asymptotic relative efficiency compared to the OLS or ML estimator in case of i.i.d. normal errors) by appropriate choice of the tuning constant. # may be a number between 70 and 95 in steps of 5. The default is efficiency(95).

k(#) specifies the tuning constant. k() is not allowed if efficiency() is specified.

    +-----------------+
----+ IRWLS algorithm +--------------------------------------------------

tolerance(#) specifies the tolerance for the weights of the IRWLS algorithm. When the maximum absolute change in the weights from one iteration to the next is less than or equal to tolerance(), the convergence criterion is satisfied. The default is tolerance(1e-6).

iterate(#) specifies the maximum number of iterations for the IRWLS algorithm. If convergence is not reached within iterate() iterations, the algorithm stops and returns an error. The default is iterate(16000) or as set by set maxiter.

relax causes the IRWLS algorithm to return the current results instead of returning an error if convergence is not reached. For example, to fit a one-step M-estimate, specify relax together with iterate(1).

generate(newvar) stores the final weights of the IRWLS algorithm in variable newvar.

replace permits robreg to overwrite existing variables.

    +------------------+
----+ Initial estimate +-------------------------------------------------

init(arg) determines the choice of the initial estimate that provides the starting values for the IRWLS algorithm. arg may be lav for the LAV-estimator (a.k.a. median regression; fitted using qreg), ols for the least squares estimator (fitted using regress), name for an estimation set stored under name, or . for the currently active estimation results. The default is init(lav).

save(name) saves the initial lav or ols estimate under name using estimates store.

    +----------------+
----+ Scale estimate +---------------------------------------------------

scale(#) provides a preliminary value for the residual scale that will be held constant. The default is to use the normalized median of the (N - number of coefficients) largest absolute residuals from the initial fit as an estimate of the residual scale (MADN).

updatescale causes the MADN scale estimate to be updated in each iteration of the IRWLS algorithm. updatescale has no effect if scale() is specified.

center causes the MADN scale estimate to be computed based on median centered residuals. center has no effect if scale() is specified.

    +-----------------+
----+ Standard errors +--------------------------------------------------

vce(norobust) causes standard errors to be computed using traditional formulas assuming constant error variance. The default is to compute robust standard errors as suggested by Croux et al. (2003; using formula Avar_1s; the traditional formula is equivalent to Avar_2s).

vce(pv) causes traditional standard errors to be computed using the pseudo-values approach (Street et al. 1988). vce(pv) is equivalent to vce(norobust) but includes some small sample correction.

norobust is a synonym for vce(norobust).

nose skips the computation of standard errors.

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#) specifies the level for confidence intervals. The default is level(95) or as set by set level.

first causes the initial estimate to be displayed.

log displays the iteration log of the IRWLS algorithm.
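The following sketch illustrates the one-step M-estimate mentioned under relax, along with a redescending fit whose initial LAV estimate is saved and then reused. The stored-results name lav0 is an arbitrary illustrative choice.

```stata
sysuse auto, clear

* One-step Huber M-estimate starting from OLS: stop after a single
* IRWLS iteration and return the result instead of an error
robreg m price mpg weight headroom foreign, init(ols) iterate(1) relax

* Biweight M-estimate from the default LAV start; store the initial
* estimate under the name lav0 and display it
robreg m price mpg weight headroom foreign, biweight save(lav0) first

* Reuse the stored initial estimate as starting values for a Huber fit
robreg m price mpg weight headroom foreign, init(lav0)
```

Because a redescending M-estimator may depend on its starting values, comparing fits from init(ols) and init(lav) is a simple sensitivity check.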

Options for robreg s

    +------+
----+ Main +-------------------------------------------------------------

bp(#) sets the breakdown point by appropriate choice of the tuning constant (this also determines the gaussian efficiency). # may be a number between 0.1 and 0.5 in steps of 0.05. The default is bp(0.5).

k(#) specifies the tuning constant. k() is not allowed if bp() is specified.

    +----------------------+
----+ Resampling algorithm +---------------------------------------------

nsamp(#) specifies the number of trial samples for the search algorithm. The default value is determined according to the formula

ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))

within a range of 50 to 10000, where p is the number of coefficients in the model and alpha and epsilon are set by alpha() and epsilon() (see Salibian-Barrera and Yohai 2006 for a justification of the formula).

alpha(#) specifies the maximum admissible risk of drawing a set of samples of which none is free of outliers. This is a parameter in the formula for the computation of the required number of samples (see above). The default is alpha(0.01) (i.e. 1 percent). alpha() has no effect if nsamp() is specified.

epsilon(#) specifies the assumed maximum fraction of contaminated data. This is a parameter in the formula for the computation of the required number of samples (see above). The default is epsilon(0.2) (i.e. 20 percent). epsilon() has no effect if nsamp() is specified.

nkeep(#) specifies the number of best candidates to be kept for final refinement. The default is nkeep(2).

rsteps(#) specifies the number of local improvement steps applied to the candidates. The default is rsteps(1).

stolerance(#) specifies the tolerance for the scale estimate of the candidates. When the absolute relative change in the scale from one iteration to the next is less than or equal to stolerance(), the convergence criterion is satisfied. The default is stolerance(1e-6).

siterate(#) specifies the maximum number of iterations for the scale estimate of the candidates. If convergence is not reached within siterate() iterations, the algorithm stops and returns an error. The default is siterate(16000) or as set by set maxiter.

tolerance(#) specifies the tolerance for the coefficients in the refinement IRWLS algorithm. When the maximum relative change in the coefficient vector from one iteration to the next is less than or equal to tolerance(), the convergence criterion is satisfied. The default is tolerance(1e-6).

iterate(#) specifies the maximum number of iterations for the refinement IRWLS algorithm. If convergence is not reached within iterate() iterations, the algorithm stops and returns an error. The default is iterate(16000) or as set by set maxiter.

ssteps(#) specifies the number of approximation steps for the scale estimate within each RWLS iteration. The default is ssteps(1).

generate(newvar) stores the final IRWLS weights from the best solution in variable newvar.

replace permits robreg to overwrite existing variables.

    +-----------------+
----+ Standard errors +--------------------------------------------------

vce(norobust) causes standard errors to be computed using traditional formulas assuming constant error variance. The default is to compute robust standard errors as suggested by Croux et al. (2003; using formula Avar_1; the traditional formula is equivalent to Avar_2s).

norobust is a synonym for vce(norobust).

nose skips the computation of standard errors.

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#) specifies the level for confidence intervals. The default is level(95) or as set by set level.

nodots suppresses the progress dots of the search algorithm.
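To see how the default for nsamp() comes about, the formula given under nsamp() can be evaluated by hand. This sketch assumes p = 5 (four regressors plus a constant, as in the auto examples) and reads "within a range of 50 to 10000" as simple clamping; both are assumptions made for illustration, not statements about robreg's internals.

```stata
* Default alpha() and epsilon(); p is assumed to count the constant
local alpha   = 0.01
local epsilon = 0.2
local p       = 5

* Raw value of ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))
local raw = ceil(ln(`alpha') / ln(1 - (1 - `epsilon')^`p'))
display "raw number of samples:  " `raw'

* Restrict to the documented range of 50 to 10000 (assumed clamping)
display "after applying range:   " min(max(`raw', 50), 10000)

* Setting the number of trial samples explicitly instead
sysuse auto, clear
robreg s price mpg weight headroom foreign, nsamp(200) nodots
```

With these values the raw formula gives 12, so the lower bound of the range is what matters. Larger values of epsilon() or p raise the raw value quickly.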

Options for robreg lms/lqs/lts

    +------+
----+ Main +-------------------------------------------------------------

bp(#) sets the breakdown point, where # may be in (0,0.5]. bp() determines the h parameter for the LQS and LTS estimators as follows:

h = floor((1-bp())*N) + floor(bp()*(p + 1))

where N is the sample size and p is the number of coefficients. The default is bp(0.5). bp() is not allowed with robreg lms.

    +----------------------+
----+ Resampling algorithm +---------------------------------------------

nsamp(#) specifies the number of trial samples for the search algorithm. The default value is determined according to the formula

ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))

within a range of 500 to 10000, where p is the number of coefficients in the model and alpha and epsilon are set by alpha() and epsilon().

alpha(#) specifies the maximum admissible risk of drawing a set of samples of which none is free of outliers. This is a parameter in the formula for the computation of the required number of samples (see above). The default is alpha(0.01) (i.e. 1 percent). alpha() has no effect if nsamp() is specified.

epsilon(#) specifies the assumed maximum fraction of contaminated data. This is a parameter in the formula for the computation of the required number of samples (see above). The default is epsilon(0.2) (i.e. 20 percent). epsilon() has no effect if nsamp() is specified.

generate(newvar) stores a variable newvar that marks the minimizing trial sample.

replace permits robreg to overwrite existing variables.

    +-----------+
----+ Reporting +--------------------------------------------------------

nodots suppresses the progress dots of the search algorithm.
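The h parameter implied by bp() can be worked out directly from the formula given under bp(). The sketch below uses the auto data and assumes p = 5 coefficients (four regressors plus a constant); the variable name best is an illustrative choice.

```stata
sysuse auto, clear

* Number of complete observations on the model variables
quietly count if !missing(price, mpg, weight, headroom, foreign)
local N = r(N)
local p = 5

* h = floor((1 - bp)*N) + floor(bp*(p + 1)) for two breakdown points
foreach bp in 0.5 0.25 {
    local h = floor((1 - `bp')*`N') + floor(`bp'*(`p' + 1))
    display "bp = `bp':  h = `h'"
}

* LTS fit with a 25 percent breakdown point; mark the minimizing sample
robreg lts price mpg weight headroom foreign, bp(0.25) generate(best) nodots
```

With the 74 observations in the auto data, this gives h = 40 for bp(0.5) and h = 56 for bp(0.25): a lower breakdown point lets more observations enter the criterion.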

Examples

. sysuse auto

. robreg mm price mpg weight headroom foreign

. robreg m price mpg weight headroom foreign

. robreg m price mpg weight headroom foreign, biweight

. robreg s price mpg weight headroom foreign

. robreg lqs price mpg weight headroom foreign

. robreg lts price mpg weight headroom foreign

Saved results

robreg saves its results in e(). Type ereturn list to list the results after estimation.
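A short sketch of working with the stored results: replaying the last estimates at a different confidence level using the replay syntax, then listing what is held in e(). No particular e() names are assumed here.

```stata
sysuse auto, clear
robreg s price mpg weight headroom foreign, nodots

* Replay the results with 90% confidence intervals
robreg, level(90)

* Show everything robreg stored in e()
ereturn list
```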

References

Croux, C., G. Dhaene, D. Hoorelbeke (2003). Robust Standard Errors for Robust Estimators. Discussion Paper Series (DPS) 03.16. Center for Economic Studies.

Huber, P. J. (1973). Robust Regression: Asymptotics, Conjectures and Monte Carlo. The Annals of Statistics 1: 799-821.

Jann, B. (2010). Robuste Regression. In: Henning Best, Christof Wolf (eds.). Handbuch der sozialwissenschaftlichen Datenanalyse. Wiesbaden: VS-Verlag.

Salibian-Barrera, M., V. J. Yohai (2006). A Fast Algorithm for S-Regression Estimates. Journal of Computational and Graphical Statistics 15: 414-427.

Street, J. O., R. J. Carroll, D. Ruppert (1988). A Note on Computing Robust Regression Estimates Via Iteratively Reweighted Least Squares. The American Statistician 42: 152-154.

Rousseeuw, P., V. Yohai (1984). Robust Regression by Means of S-Estimators. Pp. 256-272 in: Jürgen Franke, Wolfgang Härdle, and Douglas Martin (eds.). Robust and Nonlinear Time Series Analysis. Lecture Notes in Statistics Vol. 26. Berlin: Springer.

Yohai, V. J. (1987). High Breakdown-Point and High Efficiency Robust Estimates for Regression. The Annals of Statistics 15: 642-656.

Verardi, V., C. Croux (2009). Robust regression in Stata. The Stata Journal 9: 439-453.

Author

Ben Jann, ETH Zurich, jannb@ethz.ch

Thanks for citing this software as follows:

Jann, B. (2010). robreg: Stata module providing robust regression estimators. Available from http://ideas.repec.org/c/boc/bocode/s457114.html.

Also see