help robreg
-------------------------------------------------------------------------------

Title

    robreg -- Robust regression

Syntax

MM-estimator

    robreg mm depvar varlist [if] [in] [, mm_options]

M-estimator

    robreg m depvar [varlist] [if] [in] [, m_options]

S-estimator

    robreg s depvar varlist [if] [in] [, s_options]

LMS/LQS/LTS-estimator

    robreg lms depvar varlist [if] [in] [, lqs_options]
    robreg lqs depvar varlist [if] [in] [, lqs_options]
    robreg lts depvar varlist [if] [in] [, lqs_options]

Replay syntax

    robreg [, level(#)]

    mm_options            description
    -------------------------------------------------------------------------
    Main
      efficiency(#)       gaussian efficiency; # in 70(5)95; default is
                            efficiency(85)
      bp(#)               breakdown point; # in .10(.05).50; default is
                            bp(0.5)

    Biweight M-estimate
      k(#)                tuning constant; not allowed with efficiency()
      tolerance(#)        tolerance for IRWLS weights; default is
                            tolerance(1e-6)
      iterate(#)          maximum number of iterations; default is
                            iterate(16000)
      relax               continue even if convergence not reached
      generate(newvar)    store IRWLS weights
      replace             overwrite existing variable

    Initial S-estimate
      nsamp(#)            number of trial samples
      sopts(s_options)    additional options passed through to S-algorithm
      save(name)          save S-estimate

    Standard errors
      vce(norobust)       traditional standard errors
      norobust            synonym for vce(norobust)

    Reporting
      level(#)            set confidence level; default is level(95)
      first               display initial S-estimate
      nodots              suppress progress dots of S-estimate
      log                 display RWLS iteration log
    -------------------------------------------------------------------------

    m_options             description
    -------------------------------------------------------------------------
    Main
      huber               use Huber objective function; the default
      biweight            use biweight objective function; bisquare is a
                            synonym
      efficiency(#)       gaussian efficiency; # in 70(5)95; default is
                            efficiency(95)
      k(#)                tuning constant; not allowed with efficiency()

    IRWLS algorithm
      tolerance(#)        tolerance for IRWLS weights; default is
                            tolerance(1e-6)
      iterate(#)          maximum number of iterations; default is
                            iterate(16000)
      relax               continue even if convergence not reached
      generate(newvar)    store IRWLS weights
      replace             overwrite existing variable

    Initial estimate
      init(arg)           initial estimate; arg may be lav, ols, name, or .;
                            default is init(lav)
      save(name)          save initial estimate

    Scale estimate
      scale(#)            provide preliminary scale estimate
      updatescale         update scale estimate in each iteration
      center              center residuals when computing scale

    Standard errors
      vce(norobust)       traditional standard errors
      vce(pv)             traditional standard errors using pseudo-values
                            approach
      norobust            synonym for vce(norobust)
      nose                skip computation of standard errors

    Reporting
      level(#)            set confidence level; default is level(95)
      first               display initial estimate
      log                 display RWLS iteration log
    -------------------------------------------------------------------------

    s_options             description
    -------------------------------------------------------------------------
    Main
      bp(#)               breakdown point; # in .10(.05).50; default is
                            bp(0.5)
      k(#)                tuning constant; not allowed with bp()

    Resampling algorithm
      nsamp(#)            number of trial samples
      alpha(#)            maximum risk of bad solution; default is
                            alpha(0.01)
      epsilon(#)          maximum contamination fraction; default is
                            epsilon(0.2)
      nkeep(#)            number of candidates to keep; default is nkeep(2)
      rsteps(#)           number of local improvement steps; default is
                            rsteps(1)
      stolerance(#)       tolerance for scale estimate; default is
                            stolerance(1e-6)
      siterate(#)         maximum number of iterations for scale estimate;
                            default is siterate(16000)
      tolerance(#)        tolerance for coefficient vector; default is
                            tolerance(1e-6)
      iterate(#)          maximum number of RWLS iterations; default is
                            iterate(16000)
      ssteps(#)           number of scale approximation steps; default is
                            ssteps(1)
      generate(newvar)    store IRWLS weights
      replace             overwrite existing variable

    Standard errors
      vce(norobust)       traditional standard errors
      norobust            synonym for vce(norobust)
      nose                skip computation of standard errors

    Reporting
      level(#)            set confidence level; default is level(95)
      nodots              suppress progress dots
    -------------------------------------------------------------------------

    lqs_options           description
    -------------------------------------------------------------------------
    Main
    * bp(#)               breakdown point; # in (0,0.5]; default is bp(0.5)

    Resampling algorithm
      nsamp(#)            number of trial samples
      alpha(#)            maximum risk of bad solution; default is
                            alpha(0.01)
      epsilon(#)          maximum contamination fraction; default is
                            epsilon(0.2)
      generate(newvar)    store minimizing sample
      replace             overwrite existing variable

    Reporting
      nodots              suppress progress dots
    -------------------------------------------------------------------------
    * bp() is not allowed with robreg lms

Description

robreg provides a number of robust estimators for linear regression models. The command accompanies Jann (2010), a survey paper on robust regression in a German handbook on social science data analysis.

robreg mm fits the efficient high breakdown MM-estimator proposed by Yohai (1987). In the first stage, a high breakdown S-estimator is applied to estimate the residual scale and derive starting values for the coefficient vector. In the second stage, an efficient bisquare M-estimator is applied to obtain the final coefficient estimates.

robreg m fits regression M-estimators (Huber 1973) using iteratively reweighted least squares (IRWLS).

robreg s fits the high breakdown S-estimator introduced by Rousseeuw and Yohai (1984) using the fast algorithm proposed by Salibian-Barrera and Yohai (2006).

robreg lms, robreg lqs, and robreg lts fit the least median of squares (LMS), least quantile of squares (LQS; a generalization of LMS), and least trimmed squares (LTS) estimators (Rousseeuw and Leroy 1987). Estimation is carried out using simple resampling without local improvement (e.g. Rousseeuw and Leroy 1987: 197). Computation of standard errors is not supported for LMS, LQS, and LTS.

For a recent implementation of similar estimators in Stata, see also Verardi and Croux (2009).

Dependencies

robreg requires moremata. See ssc describe moremata.

Options for robreg mm

        +------+
    ----+ Main +-------------------------------------------------------------

efficiency(#) sets the gaussian efficiency of the MM-estimator (i.e. the asymptotic relative efficiency compared to the OLS or ML estimator in case of i.i.d. normal errors). The efficiency is determined by appropriate choice of the tuning constant for the bisquare M-estimator in the second stage of the MM-algorithm. # may be a number between 70 and 95 in steps of 5. The default for the MM-estimator is efficiency(85), as suggested by Maronna et al. (2006: 144).

bp(#) sets the breakdown point of the MM-estimator. The breakdown point is determined by appropriate choice of the tuning constant for the S-estimator in the first stage of the MM-algorithm. # may be a number between 0.1 and 0.5 in steps of 0.05. The default is bp(0.5).

        +---------------------+
    ----+ Biweight M-estimate +----------------------------------------------

k(#) specifies the tuning constant for the bisquare M-estimator in the second stage of the MM-algorithm. k() is not allowed if efficiency() is specified.

tolerance(#) specifies the tolerance for the weights of the IRWLS algorithm used to fit the bisquare M-estimator. When the maximum absolute change in the weights from one iteration to the next is less than or equal to tolerance(), the convergence criterion is satisfied. The default is tolerance(1e-6).

iterate(#) specifies the maximum number of iterations for the IRWLS algorithm used to fit the bisquare M-estimator. If convergence is not reached within iterate() iterations, the algorithm stops and returns an error. The default is iterate(16000) or as set by set maxiter.

relax causes the IRWLS algorithm to return the current results instead of returning an error if convergence is not reached.

generate(newvar) stores the final weights of the IRWLS algorithm in variable newvar.

replace permits robreg to overwrite existing variables.

        +--------------------+
    ----+ Initial S-estimate +-----------------------------------------------

nsamp(#) specifies the number of trial samples for the search algorithm of the S-estimator in the first stage of the MM-algorithm. The default value is determined according to the formula

    ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))

within a range of 50 to 10000, where p is the number of coefficients in the model, alpha = 0.01, and epsilon = 0.2 (see Salibian-Barrera and Yohai 2006 for a justification of the formula). The default values for alpha and epsilon can be changed via sopts() (see below).

sopts(s_options) specifies additional options to be passed through to the S-estimator. See the section on options for robreg s.

save(name) saves the results of the S-estimator under name using estimates store.

        +-----------------+
    ----+ Standard errors +--------------------------------------------------

vce(norobust) causes standard errors to be computed using traditional formulas assuming constant error variance. The default is to compute robust standard errors as suggested by Croux et al. (2003; using formula Avar_1; the traditional formula is equivalent to Avar_2s).

norobust is a synonym for vce(norobust).

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#) specifies the level for confidence intervals. The default is level(95) or as set by set level.

first causes the first stage S-estimate to be displayed.

nodots suppresses the progress dots of the S-estimator search algorithm.

log displays the iteration log of the second stage IRWLS algorithm.
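
The weights produced by the second-stage bisquare M-estimator (and stored by generate()) can be illustrated with a short Python sketch. It is not part of robreg and does not reproduce its internals; it uses the textbook bisquare weight function applied to residuals that have already been divided by the scale estimate. The function name and the example constant k = 4.685 (a commonly cited bisquare value for 95 percent gaussian efficiency) are assumptions chosen for illustration.

```python
def biweight_weights(scaled_residuals, k):
    """Textbook bisquare IRWLS weights for residuals already divided by the scale."""
    weights = []
    for u in scaled_residuals:
        if abs(u) <= k:
            # smooth downweighting inside [-k, k]
            weights.append((1.0 - (u / k) ** 2) ** 2)
        else:
            # redescending: observations beyond k receive zero weight
            weights.append(0.0)
    return weights


# Hypothetical scaled residuals: an exact fit, a typical point,
# a point exactly at the cutoff, and a gross outlier.
w = biweight_weights([0.0, 1.0, 4.685, 10.0], k=4.685)
```

A smaller k rejects outliers more aggressively but lowers the efficiency under normal errors, which is the trade-off that efficiency() controls.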

Options for robreg m

        +------+
    ----+ Main +-------------------------------------------------------------

huber causes the Huber objective function to be used (monotone M-estimator). This is the default.

biweight causes the biweight or bisquare objective function to be used (redescending M-estimator). bisquare is a synonym for biweight. The solution of a redescending M-estimator may depend on the starting values.

efficiency(#) sets the gaussian efficiency (i.e. the asymptotic relative efficiency compared to the OLS or ML estimator in case of i.i.d. normal errors) by appropriate choice of the tuning constant. # may be a number between 70 and 95 in steps of 5. The default is efficiency(95).

k(#) specifies the tuning constant. k() is not allowed if efficiency() is specified.

        +-----------------+
    ----+ IRWLS algorithm +--------------------------------------------------

tolerance(#) specifies the tolerance for the weights of the IRWLS algorithm. When the maximum absolute change in the weights from one iteration to the next is less than or equal to tolerance(), the convergence criterion is satisfied. The default is tolerance(1e-6).

iterate(#) specifies the maximum number of iterations for the IRWLS algorithm. If convergence is not reached within iterate() iterations, the algorithm stops and returns an error. The default is iterate(16000) or as set by set maxiter.

relax causes the IRWLS algorithm to return the current results instead of returning an error if convergence is not reached. For example, to fit a one-step M-estimate, specify relax together with iterate(1).

generate(newvar) stores the final weights of the IRWLS algorithm in variable newvar.

replace permits robreg to overwrite existing variables.

        +------------------+
    ----+ Initial estimate +-------------------------------------------------

init(arg) determines the choice of the initial estimate that provides the starting values for the IRWLS algorithm. arg may be lav for the LAV-estimator (a.k.a. median regression; fitted using qreg), ols for the least squares estimator (fitted using regress), name for a set of estimation results stored under name, or . for the currently active estimation results. The default is init(lav).

save(name) saves the initial lav or ols estimate under name using estimates store.

        +----------------+
    ----+ Scale estimate +---------------------------------------------------

scale(#) provides a preliminary value for the residual scale that will be held constant. The default is to use the normalized median of the (N - number of coefficients) largest absolute residuals from the initial fit as an estimate of the residual scale (MADN).

updatescale causes the MADN scale estimate to be updated in each iteration of the IRWLS algorithm. updatescale has no effect if scale() is specified.

center causes the MADN scale estimate to be computed based on median centered residuals. center has no effect if scale() is specified.

        +-----------------+
    ----+ Standard errors +--------------------------------------------------

vce(norobust) causes standard errors to be computed using traditional formulas assuming constant error variance. The default is to compute robust standard errors as suggested by Croux et al. (2003; using formula Avar_1s; the traditional formula is equivalent to Avar_2s).

vce(pv) causes traditional standard errors to be computed using the pseudo-values approach (Street et al. 1988). vce(pv) is equivalent to vce(norobust) but includes a small-sample correction.

norobust is a synonym for vce(norobust).

nose skips the computation of standard errors.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#) specifies the level for confidence intervals. The default is level(95) or as set by set level.

first causes the initial estimate to be displayed.

log displays the iteration log of the IRWLS algorithm.
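
How the options above interact can be sketched for the simplest case, an intercept-only model (robreg m depvar with no varlist). The Python code below is an illustrative assumption, not robreg's implementation: it starts from the median (the LAV solution for an intercept-only model), computes a MADN-type scale from the N - 1 largest absolute residuals, and iterates Huber weights until the maximum absolute change in the weights is at most the tolerance. The function names, the Huber constant 1.345 (a commonly cited value for 95 percent gaussian efficiency), and the normalization by the 0.75 normal quantile are assumptions; the sketch also assumes the scale is strictly positive.

```python
import statistics


def huber_weights(y, b, scale, k):
    """Huber IRWLS weights: 1 inside [-k, k], k/|u| outside."""
    weights = []
    for v in y:
        u = abs(v - b) / scale
        weights.append(1.0 if u <= k else k / u)
    return weights


def huber_location(y, k=1.345, tol=1e-6, max_iter=16000, relax=False):
    """Intercept-only Huber M-estimate via IRWLS with a fixed scale."""
    b = statistics.median(y)  # LAV start for an intercept-only model
    # MADN-type scale: median of the N - p largest absolute residuals
    # (p = 1 here), divided by the 0.75 quantile of the standard normal.
    abs_res = sorted(abs(v - b) for v in y)
    scale = statistics.median(abs_res[1:]) / statistics.NormalDist().inv_cdf(0.75)
    w = huber_weights(y, b, scale, k)
    for _ in range(max_iter):
        # weighted least squares step (a weighted mean in this model)
        b = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        w_new = huber_weights(y, b, scale, k)
        converged = max(abs(a - c) for a, c in zip(w_new, w)) <= tol
        w = w_new
        if converged:
            return b, w
    if relax:
        return b, w  # mimic relax: return the current results
    raise RuntimeError("convergence not achieved")


b, w = huber_location([1, 2, 3, 4, 5, 100])
b_one_step, _ = huber_location([1, 2, 3, 4, 5, 100], max_iter=1, relax=True)
```

The last line mirrors the one-step estimate described under relax: a single iteration whose result is returned whether or not the weights have converged.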

Options for robreg s

        +------+
    ----+ Main +-------------------------------------------------------------

bp(#) sets the breakdown point by appropriate choice of the tuning constant (this also determines the gaussian efficiency). # may be a number between 0.1 and 0.5 in steps of 0.05. The default is bp(0.5).

k(#) specifies the tuning constant. k() is not allowed if bp() is specified.

        +----------------------+
    ----+ Resampling algorithm +---------------------------------------------

nsamp(#) specifies the number of trial samples for the search algorithm. The default value is determined according to the formula

    ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))

within a range of 50 to 10000, where p is the number of coefficients in the model and alpha and epsilon are set by alpha() and epsilon() (see Salibian-Barrera and Yohai 2006 for a justification of the formula).

alpha(#) specifies the maximum admissible risk of drawing a set of samples of which none is free of outliers. This is a parameter in the formula for the computation of the required number of samples (see above). The default is alpha(0.01) (i.e. 1 percent). alpha() has no effect if nsamp() is specified.

epsilon(#) specifies the assumed maximum fraction of contaminated data. This is a parameter in the formula for the computation of the required number of samples (see above). The default is epsilon(0.2) (i.e. 20 percent). epsilon() has no effect if nsamp() is specified.

nkeep(#) specifies the number of best candidates to be kept for final refinement. The default is nkeep(2).

rsteps(#) specifies the number of local improvement steps applied to the candidates. The default is rsteps(1).

stolerance(#) specifies the tolerance for the scale estimate of the candidates. When the absolute relative change in the scale from one iteration to the next is less than or equal to stolerance(), the convergence criterion is satisfied. The default is stolerance(1e-6).

siterate(#) specifies the maximum number of iterations for the scale estimate of the candidates. If convergence is not reached within siterate() iterations, the algorithm stops and returns an error. The default is siterate(16000) or as set by set maxiter.

tolerance(#) specifies the tolerance for the coefficients in the refinement IRWLS algorithm. When the maximum relative change in the coefficient vector from one iteration to the next is less than or equal to tolerance(), the convergence criterion is satisfied. The default is tolerance(1e-6).

iterate(#) specifies the maximum number of iterations for the refinement IRWLS algorithm. If convergence is not reached within iterate() iterations, the algorithm stops and returns an error. The default is iterate(16000) or as set by set maxiter.

ssteps(#) specifies the number of approximation steps for the scale estimate within each RWLS iteration. The default is ssteps(1).

generate(newvar) stores the final IRWLS weights from the best solution in variable newvar.

replace permits robreg to overwrite existing variables.

        +-----------------+
    ----+ Standard errors +--------------------------------------------------

vce(norobust) causes standard errors to be computed using traditional formulas assuming constant error variance. The default is to compute robust standard errors as suggested by Croux et al. (2003; using formula Avar_1; the traditional formula is equivalent to Avar_2s).

norobust is a synonym for vce(norobust).

nose skips the computation of standard errors.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#) specifies the level for confidence intervals. The default is level(95) or as set by set level.

nodots suppresses the progress dots of the search algorithm.
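
The default for nsamp() can be worked through numerically. The Python sketch below evaluates the formula given under nsamp() and restricts the result to the stated range; the function name is hypothetical, and treating "within a range of 50 to 10000" as simple clamping is an assumption about how the bounds are applied. The lower bound is a parameter because robreg lms, lqs, and lts use a range of 500 to 10000.

```python
import math


def default_nsamp(p, alpha=0.01, epsilon=0.2, lower=50, upper=10000):
    """Default number of trial samples, clamped to [lower, upper]."""
    # ln(1 - (1 - epsilon)^p), computed with log1p for numerical stability
    denom = math.log1p(-((1.0 - epsilon) ** p))
    if denom == 0.0:
        # (1 - epsilon)^p underflowed to zero; the formula diverges
        return upper
    raw = math.ceil(math.log(alpha) / denom)
    return min(max(raw, lower), upper)


small_model = default_nsamp(5)    # formula gives a value below 50
medium_model = default_nsamp(20)  # formula value lies inside the range
large_model = default_nsamp(40)   # formula gives a value above 10000
```

For small models the lower bound is binding, so alpha() and epsilon() only start to matter once the number of coefficients is large enough for the formula to exceed it.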

Options for robreg lms/lqs/lts

        +------+
    ----+ Main +-------------------------------------------------------------

bp(#) sets the breakdown point, where # may be in (0,0.5]. bp() determines the h parameter for the LQS and LTS estimators as follows:

    h = floor((1 - bp())*N) + floor(bp()*(p + 1))

where N is the sample size and p is the number of coefficients. The default is bp(0.5). bp() is not allowed with robreg lms.

        +----------------------+
    ----+ Resampling algorithm +---------------------------------------------

nsamp(#) specifies the number of trial samples for the search algorithm. The default value is determined according to the formula

    ceil(ln(alpha) / ln(1 - (1 - epsilon)^p))

within a range of 500 to 10000, where p is the number of coefficients in the model and alpha and epsilon are set by alpha() and epsilon().

alpha(#) specifies the maximum admissible risk of drawing a set of samples of which none is free of outliers. This is a parameter in the formula for the computation of the required number of samples (see above). The default is alpha(0.01) (i.e. 1 percent). alpha() has no effect if nsamp() is specified.

epsilon(#) specifies the assumed maximum fraction of contaminated data. This is a parameter in the formula for the computation of the required number of samples (see above). The default is epsilon(0.2) (i.e. 20 percent). epsilon() has no effect if nsamp() is specified.

generate(newvar) stores a variable newvar that marks the minimizing trial sample.

replace permits robreg to overwrite existing variables.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

nodots suppresses the progress dots of the search algorithm.
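
The role of the h parameter set by bp() can be made concrete with a small Python sketch. The first function evaluates the formula for h given under bp(); the other two evaluate the textbook LQS and LTS criteria for a given residual vector. All function names are hypothetical, the criteria follow the standard definitions rather than robreg's source code, and p is assumed to count the constant.

```python
import math


def h_from_bp(n, p, bp=0.5):
    """h = floor((1 - bp)*N) + floor(bp*(p + 1))."""
    return math.floor((1 - bp) * n) + math.floor(bp * (p + 1))


def lqs_criterion(residuals, h):
    """LQS criterion: the h-th smallest squared residual."""
    return sorted(r * r for r in residuals)[h - 1]


def lts_criterion(residuals, h):
    """LTS criterion: the sum of the h smallest squared residuals."""
    return sum(sorted(r * r for r in residuals)[:h])


# Sample size of the auto data (74) with four regressors plus a constant.
h_default = h_from_bp(74, 5)           # bp(0.5)
h_quarter = h_from_bp(74, 5, bp=0.25)  # lower breakdown point, larger h

# Hypothetical residuals from one trial fit; the outlier (10) is ignored
# by both criteria when h = 3.
res = [3, -1, 0.5, 10, -2]
lqs_value = lqs_criterion(res, 3)
lts_value = lts_criterion(res, 3)
```

In the resampling search, each trial sample yields a candidate fit, and the candidate with the smallest criterion value is kept; a smaller bp() raises h, so more observations enter the criterion.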

Examples

. sysuse auto

. robreg mm price mpg weight headroom foreign

. robreg m price mpg weight headroom foreign

. robreg m price mpg weight headroom foreign, biweight

. robreg s price mpg weight headroom foreign

. robreg lqs price mpg weight headroom foreign

. robreg lts price mpg weight headroom foreign

Saved results

robreg saves its results in e(). Type ereturn list to list the results after estimation.

References

Croux, C., G. Dhaene, D. Hoorelbeke (2003). Robust Standard Errors for Robust Estimators. Discussions Paper Series (DPS) 03.16. Center for Economic Studies.

Huber, P. J. (1973). Robust Regression: Asymptotics, Conjectures and Monte Carlo. The Annals of Statistics 1: 799-821.

Jann, B. (2010). Robuste Regression. In: Henning Best, Christof Wolf (eds.). Handbuch der sozialwissenschaftlichen Datenanalyse. Wiesbaden: VS-Verlag.

Rousseeuw, P., V. Yohai (1984). Robust Regression by Means of S-Estimators. Pp. 256-272 in: Jürgen Franke, Wolfgang Härdle, and Douglas Martin (eds.). Robust and Nonlinear Time Series Analysis. Lecture Notes in Statistics Vol. 26. Berlin: Springer.

Salibian-Barrera, M., V. J. Yohai (2006). A Fast Algorithm for S-Regression Estimates. Journal of Computational and Graphical Statistics 15: 414-427.

Street, J. O., R. J. Carroll, D. Ruppert (1988). A Note on Computing Robust Regression Estimates Via Iteratively Reweighted Least Squares. The American Statistician 42: 152-154.

Verardi, V., C. Croux (2009). Robust regression in Stata. The Stata Journal 9: 439-453.

Yohai, V. J. (1987). High Breakdown-Point and High Efficiency Robust Estimates for Regression. The Annals of Statistics 15: 642-656.

Author

Ben Jann, ETH Zurich, jannb@ethz.ch

Thanks for citing this software as follows:

Jann, B. (2010). robreg: Stata module providing robust regression estimators. Available from http://ideas.repec.org/c/boc/bocode/s457114.html.

Also see