{smcl} {* 06feb2021}{...} {hi:help robmv}{...} {right:{browse "http://github.com/benjann/robmv/"}} {hline} {title:Title} {pstd}{hi:robmv} {hline 2} Robust multivariate estimation of location and covariance {title:Syntax} {pstd}Classical (non-robust) estimator {p 8 15 2} {cmd:robmv} {opt cl:assic} {varlist} {ifin} {weight} [{cmd:,} {help robmv##cl_opt:{it:classic_options}} {help robmv##opt:{it:general_options}} ] {pstd}M estimator {p 8 15 2} {cmd:robmv} {opt m} {varlist} {ifin} {weight} [{cmd:,} {help robmv##m_opt:{it:m_options}} {help robmv##opt:{it:general_options}} ] {pstd}S estimator {p 8 15 2} {cmd:robmv} {opt s} {varlist} {ifin} {weight} [{cmd:,} {help robmv##s_opt:{it:s_options}} {help robmv##opt:{it:general_options}} ] {pstd}MM estimator {p 8 15 2} {cmd:robmv} {opt mm} {varlist} {ifin} {weight} [{cmd:,} {help robmv##mm_opt:{it:mm_options}} {help robmv##opt:{it:general_options}} ] {pstd}Minimum Volume Ellipsoid (MVE) estimator {p 8 15 2} {cmd:robmv} {opt mve} {varlist} {ifin} {weight} [{cmd:,} {help robmv##mve_opt:{it:mve_options}} {help robmv##opt:{it:general_options}} ] {pstd}Minimum Covariance Determinant (MCD) estimator {p 8 15 2} {cmd:robmv} {opt mcd} {varlist} {ifin} {weight} [{cmd:,} {help robmv##mcd_opt:{it:mcd_options}} {help robmv##opt:{it:general_options}} ] {pstd}Stahel-Donoho estimator {p 8 15 2} {cmd:robmv} {opt sd} {varlist} {ifin} {weight} [{cmd:,} {help robmv##sd_opt:{it:sd_options}} {help robmv##opt:{it:general_options}} ] {pstd}Generate robust distances, outliers, etc., after estimation {p 8 15 2} {cmd:predict} {dtype} {newvar} {ifin} [{cmd:,} {help robmv##predict_opt:{it:predict_options}} ] {p 4 6 2} {it:varlist} may contain factor variables; see {help fvvarlist}.{p_end} {p 4 6 2} {opt pweight}s, {opt aweight}s, {opt iweight}s, and {opt fweight}s are allowed; see {help weight}{p_end} {p 4 6 2}(exception: {cmd:robmv mcd} and {cmd:robmv mve} do not allow {opt fweight}s) {synoptset 21 tabbed}{...} {marker opt}{col 5}{it:{help 
robmv##options:general_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt corr:elation}}report correlations instead of covariances {p_end} {syntab :Standard errors/CIs} {synopt :{cmd:vce(}{help robmv##vcetype:{it:vcetype}}{cmd:)}}{it:vcetype} may be {cmdab:a:nalytic} (the default), {cmdab:cl:uster} {it:clustvar}, {cmdab:boot:strap} or {cmdab:jack:knife} {p_end} {synopt :{cmd:svy}[{cmd:(}{help robstat##svy:{it:subpop}}{cmd:)}]}take account of survey design as set by {helpb svyset}, optionally restricting computations to {it:subpop} {p_end} {synopt :{opt nose}}suppress computation of standard errors and confidence intervals {p_end} {synopt :{cmdab:if:generate(}{help robmv##ifgen:{it:names}}{cmd:)}}stores the values of the influence functions {p_end} {synopt :{opt r:eplace}}allows overwriting existing variables {p_end} {syntab :Reporting} {synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)} {p_end} {synopt :{opt nohe:ader}}suppress output header {p_end} {synopt :{opt notab:le}}suppress output table {p_end} {synopt :{help robmv##displayopts:{it:display_options}}}standard reporting options as described in {helpb estimation options:[R] estimation options} {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker cl_opt}{col 5}{it:{help robmv##cl_options:classic_options}}{col 28}Description {synoptline} {synopt :{opt normc:oll}}do not remove collinear variables {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker m_opt}{col 5}{it:{help robmv##m_options:m_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt k(#)}}set custom tuning constant {p_end} {synopt :{opt ptrim(#)}}set winsorizing percentage {p_end} {syntab :Consistency correction} {synopt :{opt c(#)}}set custom consistency correction factor {p_end} {synopt :{opt cemp}}use alternative approach to compute consistency correction factor {p_end} {syntab :Algorithm} {synopt :{opt tol:erance(#)}}tolerance for reweighting algorithm; default is 
{cmd:tolerance(1e-10)} {p_end} {synopt :{opt iter:ate(#)}}maximum number of iterations; default is as set by {helpb set maxiter} {p_end} {synopt :{opt relax}}do not return error if convergence is not reached {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker s_opt}{col 5}{it:{help robmv##s_options:s_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt bp(#)}}breakdown point, in percent; default is {cmd:bp(50)} {p_end} {synopt :{opt wh:ilferty}}obtain tuning constant using Wilson-Hilferty transformation {p_end} {synopt :{opt k(#)}}custom tuning constant {p_end} {syntab :Algorithm} {synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(20)} {p_end} {synopt :{opt cstep:s(#)}}improvement steps applied to each trial candidate; default is {cmd:csteps(2)} {p_end} {synopt :{opt nk:eep(#)}}number of candidates kept for final refinement; default is {cmd:nkeep(5)} {p_end} {synopt :{opt tol:erance(#)}}tolerance for refinements; default is {cmd:tolerance(1e-10)} {p_end} {synopt :{opt iter:ate(#)}}maximum number of iterations; default is as set by {helpb set maxiter} {p_end} {synopt :{opt relax}}do not return error if convergence is not reached {p_end} {synopt :{opt noee}}do not use exact enumeration even if feasible {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker mm_opt}{col 5}{it:{help robmv##mm_options:mm_options}}{col 28}Description {synoptline} {synopt :{opt eff:iciency(#)}}desired efficiency, in percent; default is {cmd:efficiency(95)} {p_end} {synopt :{opt loc:ation}}set location efficiency rather than shape efficiency {p_end} {synopt : {help robmv##s_opt:{it:s_options}}}options as for {cmd:robmv s} {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker mve_opt}{col 5}{it:{help robmv##mve_options:mve_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt nore:weight}}report raw MVE estimate without reweighting step {p_end} {synopt :{opt bp(#)}}breakdown point, in percent; default is {cmd:bp(50)} {p_end} 
{synopt :{opt alpha(#)}}reweighting cutoff, in percent; default is {cmd:alpha(2.5)} {p_end} {syntab :Consistency correction} {synopt :{opt calpha(#)}}set custom consistency factor for raw MVE estimate {p_end} {synopt :{opt cdelta(#)}}set custom consistency factor for reweighted estimate {p_end} {syntab :Algorithm} {synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(500)} {p_end} {synopt :{opt noee}}do not use exact enumeration even if feasible {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker mcd_opt}{col 5}{it:{help robmv##mcd_options:mcd_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt nore:weight}}report raw MCD estimate without reweighting step {p_end} {synopt :{opt bp(#)}}breakdown point, in percent; default is {cmd:bp(50)} {p_end} {synopt :{opt alpha(#)}}reweighting cutoff, in percent; default is {cmd:alpha(2.5)} {p_end} {syntab :Consistency correction} {synopt :{opt calpha(#)}}set custom consistency factor for raw MCD estimate {p_end} {synopt :{opt cdelta(#)}}set custom consistency factor for reweighted estimate {p_end} {synopt :{opt nosmall}}omit additional small sample correction {p_end} {syntab :Algorithm} {synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(500)} {p_end} {synopt :{opt cstep:s(#)}}concentration steps applied to each trial candidate; default is {cmd:csteps(2)} {p_end} {synopt :{opt nk:eep(#)}}number of candidates kept for final refinement; default is {cmd:nkeep(10)} {p_end} {synopt :{opt nsub(#)}}minimum subsample size; default is max(p*50, 300); type {cmd:nsub(.)} to omit subsampling {p_end} {synopt :{opt ksub(#)}}maximum number of subsamples; default is {cmd:ksub(5)} {p_end} {synopt :{opt tol:erance(#)}}tolerance for final refinement; default is {cmd:tolerance(1e-10)} {p_end} {synopt :{opt iter:ate(#)}}maximum number of iterations; default is as set by {helpb set maxiter} {p_end} {synopt :{opt relax}}do not return error if convergence is not reached {p_end} {synopt 
:{opt noee}}do not use exact enumeration even if feasible {p_end} {synopt :{opt nouni:var}}use standard algorithm even if p=1 {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker sd_opt}{col 5}{it:{help robmv##sd_options:sd_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt h:uber}}use a Huber-type rather than rectangular function to down-weight outliers {p_end} {synopt :{opt alpha(#)}}outlier percentage under normality; default is {cmd:alpha(2.5)} {p_end} {synopt :{opt asym:metric}[{cmd:(}{it:#}{cmd:)}]}compute generalized SD distances {p_end} {synopt :{opt cut:off(#)}}set custom cutoff value for outlier identification {p_end} {synopt :{opt nofit}}do not compute the location and covariance estimate {p_end} {syntab :Generate} {synopt :{cmdab:gen:erate(}{it:names}{cmd:)}}store SD distances, outlier indicator, and weights {p_end} {synopt :{opt r:eplace}}allow overwriting existing variables {p_end} {syntab :Algorithm} {synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(500)} {p_end} {synopt :{opt nmax(#)}}maximum number of invalid candidates before aborting; default is max(1000,{cmd:nsamp()}) {p_end} {synopt :{opt expand}}expand invalid candidates by adding observations (not recommended) {p_end} {synopt :{opt noee}}do not use exact enumeration even if feasible {p_end} {synopt :{opt nostd}}omit standardization (not recommended) {p_end} {synopt :{opt control:s(spec)}}partial out effects of covariates {p_end} {synoptline} {synoptset 21 tabbed}{...} {marker predict_opt}{col 5}{it:{help robmv##predict_options:predict_options}}{col 28}Description {synoptline} {syntab :Main} {synopt :{opt d:istance}}generate robust distances; the default {p_end} {synopt :{opt r:d}}synonym for {cmd:distance} {p_end} {synopt :{opt o:utlier}[{cmd:(}{it:#}{cmd:)}]}generate outlier indicator {p_end} {synopt :{opt i:nlier}[{cmd:(}{it:#}{cmd:)}]}generate inlier indicator {p_end} {syntab :Additional M options} {synopt :{opt w:eights}}generate W1 weights 
{p_end} {syntab :Additional MVE/MCD options} {synopt :{opt s:ubset}}generate best H-subset indicator {p_end} {synopt :{opt nore:weight}}use raw MVE/MCD estimate {p_end} {synopt :{opt noscale}}use unscaled raw MVE/MCD estimate {p_end} {synoptline} {title:Description} {pstd} {cmd:robmv} provides a number of robust multivariate estimators of location and covariance. {pstd} {cmd:robmv classic} computes the classical (non-robust) estimate of location and covariance. Results are the same as computed by standard commands such as {helpb correlate}. {pstd} {cmd:robmv m} computes an M estimate of location and covariance using a Huber weighting function as suggested by Lopuha{c a:} (1989). Singular solutions are handled as suggested by Maronna et al. (2006, pp. 184-185). {pstd} {cmd:robmv s} computes an S estimate of location and covariance (Lopuha{c a:} 1989) using the FastS algorithm as described in Hubert et al. (2013). {pstd} {cmd:robmv mm} computes an MM estimate of location and covariance (Salibian-Barrera et al. 2006). {pstd} {cmd:robmv mve} computes the Minimum Volume Ellipsoid (MVE) estimator of location and covariance. By default, the one-step reweighted estimate is reported instead of the raw MVE estimate. The estimation algorithm employs an improvement step as suggested by Maronna et al. (2006, p. 198). In case of an exact-fit situation (that is, when the variance matrix in the best H-subset is singular due to local collinearity among the variables) the means and covariances are based on all observations that lie on the hyperplane and the corresponding hyperplane equation is reported. {pstd} {cmd:robmv mcd} computes the Minimum Covariance Determinant (MCD) estimator of location and covariance. By default, the one-step reweighted estimate is reported instead of the raw MCD estimate. A fast algorithm as suggested by Rousseeuw and Van Driessen (1999) is used for computation of the MCD estimate. 
Consistency correction as given in Croux and Haesbroeck (1999) is applied. Furthermore, by default, small sample bias is corrected as suggested by Pison et al. (2002). In case of an exact-fit situation (that is, when the variance matrix in the best H-subset is singular due to local collinearity among the variables) the means and covariances are based on all observations that lie on the hyperplane and the corresponding hyperplane equation is reported. {pstd} {cmd:robmv sd} computes the Stahel-Donoho estimator of location and covariance as discussed, for example, by Maronna and Yohai (1995). It also supports the modified Stahel-Donoho estimator for skewed and/or heavy-tailed distributions suggested by Verardi and Vermandele (2016). {pstd} {cmd:predict} can be used after {cmd:robmv} to generate variables identifying outliers, containing robust distances, etc. {title:Dependencies} {pstd} {cmd:robmv} requires {cmd:moremata}; see {net "describe moremata, from(http://fmwww.bc.edu/repec/bocode/m/)":ssc describe moremata}. In addition, the {cmd:asymmetric} option of {cmd:robmv sd} requires {cmd:robbox}; see {net "describe robbox, from(http://fmwww.bc.edu/repec/bocode/r/)":ssc describe robbox}. {marker options}{...} {title:General options} {dlgtab:Main} {phang} {opt correlation} specifies that correlations be reported instead of variances and covariances. {dlgtab:Standard errors/CIs} {marker vcetype}{...} {phang} {opth vce(vcetype)} determines how standard errors and confidence intervals are computed. {it:vcetype} may be {cmd:analytic} {cmd:cluster} {it:clustvar} {cmd:bootstrap} [{cmd:,} {help bootstrap:{it:bootstrap_options}}] {cmd:jackknife} [{cmd:,} {help jackknife:{it:jackknife_options}}] {pmore} {cmd:vce(analytic)}, the default, computes standard errors based on influence functions. 
Likewise, {cmd:vce(cluster} {it:clustvar}{cmd:)} computes standard errors based on influence functions, allowing for intragroup correlation, where {it:clustvar} specifies to which group each observation belongs. For bootstrap and jackknife estimation, see help {it:{help vce_option}}. {pmore} {cmd:vce(analytic)} and {cmd:vce(cluster)} are currently not supported by {cmd:robmv mcd}, {cmd:robmv mve}, and {cmd:robmv sd}. No standard errors will be estimated by these subcommands. {phang} {cmd:svy}[{cmd:(}{it:subpop}{cmd:)}] causes the survey design to be taken into account for variance estimation. The data need to be set up for survey estimation; see help {helpb svyset}. Only one of {cmd:svy()} and {cmd:vce()} is allowed. Specify {it:subpop} to restrict survey estimation to a subpopulation, where {it:subpop} is [{varname}] [{it:{help if}}] {pmore} The subpopulation is defined by observations for which {it:varname}!=0 and for which the {cmd:if} condition is met. See help {helpb svy} and {manlink SVY subpopulation estimation} for more information on subpopulation estimation. {pmore} The {cmd:svy} option of {cmd:robmv} only works if the variance estimation method is set to Taylor linearization by {helpb svyset} (the default). For other variance estimation methods you can use the usual {helpb svy} prefix command. {pmore} {cmd:svy()} is currently not supported by {cmd:robmv mcd}, {cmd:robmv mve}, and {cmd:robmv sd}. No standard errors will be estimated by these subcommands. {phang} {opt nose} suppresses the computation of standard errors and confidence intervals. {marker ifgen}{...} {phang} {opt ifgenerate(names)} stores the influence functions that were used to compute the standard errors, where {it:names} is either a list of (new) variable names or {help newvarlist##stub*:{it:stub}}{cmd:*} to create names {it:stub}{cmd:1}, {it:stub}{cmd:2}, etc. {cmd:ifgenerate()} has no effect if specified together with {cmd:nose}, {cmd:vce(bootstrap)}, or {cmd:vce(jackknife)}. 
{phang} {opt replace} allows {cmd:ifgenerate()} to overwrite existing variables. {dlgtab:Reporting} {phang} {opt level(#)} specifies the confidence level, as a percentage, for confidence intervals. The default is {cmd:level(95)} or as set by {helpb set level}. {phang} {opt noheader} suppresses the output header; only the coefficient table is displayed. {phang} {opt notable} suppresses the coefficient table. {marker displayopts}{...} {phang} {it:display_options} are standard reporting options such as {cmd:cformat()}, {cmd:pformat()}, {cmd:sformat()}, or {cmd:coeflegend}. See {helpb estimation options:[R] estimation options}. {marker cl_options}{...} {title:Additional options for robmv classic} {phang} {opt normcoll} requests that collinear variables be included in the estimation. The default is to remove collinear variables. {marker m_options}{...} {title:Additional options for robmv m} {dlgtab:Main} {phang} {opt k(#)} sets the tuning constant for the Huber objective function. Unless {cmd:ptrim()} is specified (see below), the default is to set the tuning constant to k = sqrt(p+1), where p is the number of variables, so that the maximum asymptotic breakdown point of bp = min(1/k^2, 1-p/k^2) is reached (see Lopuha{c a:} 1989). Note that {cmd:k()} must be larger than sqrt(p) for the M estimate to exist. {phang} {opt ptrim(#)} sets the percentage of winsorizing. If {opt ptrim()} is specified, the tuning constant is set to k = sqrt(invchi2tail(p, ptrim/100)), where p is the number of variables. Setting {cmd:ptrim(0)} will return the classical location and covariance estimate (no winsorizing). Note that {cmd:ptrim()} must be smaller than chi2tail(p,p)*100, where p is the number of variables, for the M estimate to exist. Only one of {cmd:ptrim()} and {cmd:k()} is allowed. {dlgtab:Consistency correction} {phang} {opt c(#)} specifies a custom consistency correction factor by which the raw estimate is rescaled. 
The default is to rescale the estimate so that it provides a consistent estimate of the mean and covariance matrix for normally distributed data (see option {cmd:cemp} below). {phang} {opt cemp} specifies that the normal consistency correction factor is estimated empirically (equation 6.22 in Maronna et al. 2006). The default is to derive the normal consistency correction factor numerically (equation 6.21 in Maronna et al. 2006). Use the {cmd:cemp} option should the default algorithm fail (unlikely to happen). {dlgtab:Algorithm} {phang} {opt tolerance(#)} sets the tolerance for the reweighting algorithm. When the maximum relative change in the location and covariance estimate is less than or equal to {cmd:tolerance()}, convergence is achieved. The default is {cmd:tolerance(1e-10)}. {phang} {opt iterate(#)} specifies the maximum number of iterations for the reweighting algorithm. If convergence is not reached within {cmd:iterate()} iterations, the algorithm stops and returns an error. The default is as set by {helpb set maxiter}. {phang} {opt relax} causes the algorithm to return the current results instead of returning an error if convergence is not reached within {cmd:iterate()} iterations. {marker s_options}{...} {title:Additional options for robmv s} {dlgtab:Main} {phang} {opt bp(#)} sets the breakdown point (in percent) with # in [1,50]. The default is {cmd:bp(50)}. {phang}{opt whilferty} obtains the tuning constant corresponding to the desired breakdown point by applying the Wilson-Hilferty transformation to the tuning constant of the univariate biweight function. The default is to obtain the tuning constant by finding the value {it:k} that solves {it:bp} = {it:b} / ({it:k}^2/6), where {it:bp} is the desired breakdown point and {it:b} is the Gaussian consistency parameter of the scale optimization problem. {phang} {opt k(#)} sets the tuning constant to a custom value. Only one of {cmd:k()} and {cmd:bp()} is allowed. 
The procedure used to compute the breakdown point corresponding to {cmd:k()} depends on whether {cmd:whilferty} is specified or not. {dlgtab:Algorithm} {phang} {opt nsamp(#)} specifies the number of trial candidates to be evaluated in the search algorithm. The default is {cmd:nsamp(20)}. {phang} {opt csteps(#)} sets the number of improvement steps (C-steps) applied when evaluating the trial candidates. The default is {cmd:csteps(2)}. {phang} {opt nkeep(#)} sets the number of best trial candidates kept for final refinement. The default is {cmd:nkeep(5)}. {phang} {opt tolerance(#)} sets the tolerance for the candidate scale refinements and the final refinement of the best candidates. When the relative change in the scale from one iteration to the next is less than or equal to {cmd:tolerance()}, convergence is achieved. The default is {cmd:tolerance(1e-10)}. {phang} {opt iterate(#)} specifies the maximum number of iterations for the candidate scale refinements and the final refinement of the best candidates. If convergence is not reached within {cmd:iterate()} iterations, the algorithm stops and returns an error. The default is as set by {helpb set maxiter}. {phang} {opt relax} causes the algorithm to use the current results instead of returning an error if convergence is not reached within {cmd:iterate()} iterations. {phang} {opt noee} specifies that enumeration of random trials is used even if exact enumeration of all possible candidates would be feasible. The algorithm uses exact enumeration of all possible (p+1)-subsets if comb(N, p+1) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random (p+1)-subsets are enumerated. Given the low default value of {cmd:nsamp()}, exact enumeration will only be used in very small samples. Set {cmd:nsamp()} to comb(N, p+1) to enforce exact enumeration of (p+1)-subsets. {marker mm_options}{...} {title:Additional options for robmv mm} {phang} {opt efficiency(#)} sets the desired Gaussian efficiency (in percent) with # in [70,100). 
The default is {cmd:efficiency(95)}. {phang} {opt location} requests that {cmd:efficiency()} sets the location efficiency. The default is to set the shape efficiency. {phang} {help robmv##s_options:{it:s_options}} are additional options as for {cmd:robmv s}. {marker mve_options}{...} {title:Additional options for robmv mve} {dlgtab:Main} {phang} {opt noreweight} causes the raw MVE estimate to be reported instead of the one-step reweighted estimate. The one-step reweighted estimate is computed from the observations whose robust distances based on the raw MVE fit are smaller than invchi2(p, 0.975), where p is the number of variables. The one-step reweighted estimate has better efficiency properties than the raw MVE estimate. {p_end} {phang} {opt bp(#)} sets the approximate breakdown point (in percent) with # being an integer number between 0 and 50. The default is {cmd:bp(50)}. The breakdown point determines the size of the H-subset, that is, the number of observations in the subset that identifies the MVE fit. The size of the H-subset is computed as h = floor((N - p - 1)*(1 - bp/100) + p + 1) {pmore} where N is the sample size, p is the number of variables, and bp is the specified breakdown point (in percent). In case of a breakdown point of 50% this simplifies to h = floor((N + p + 1)/2). Note that in case of weights, the H-subset will be constructed from the raw observations ignoring weights. That is, the breakdown point should be interpreted in terms of a percentage of the raw observations and not in terms of a percentage of the sum of weights. {phang} {opt alpha(#)} sets the cutoff, in percent, used to determine the weights for the reweighted estimate. The weights are set to one for observations with squared distances smaller than invchi2(p, 1-alpha/100) and zero otherwise, where p is the number of variables. The default is {cmd:alpha(2.5)}. 
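{pstd}
As a rough illustration of the H-subset formula and the reweighting cutoff described above, the quantities can be reproduced with {helpb display}. The values N = 100, p = 3, and bp = 25 are assumptions chosen for this sketch; they are not taken from any dataset.

        {cmd:. display floor((100 - 3 - 1)*(1 - 25/100) + 3 + 1)}
        {cmd:. display floor((100 - 3 - 1)*(1 - 50/100) + 3 + 1)}
        {cmd:. display floor((100 + 3 + 1)/2)}
        {cmd:. display invchi2(3, 1 - 2.5/100)}

{pstd}
The first command returns h = 76. The second and third commands both return 52, in line with the simplification for a breakdown point of 50%. The last command returns the squared-distance cutoff implied by the default {cmd:alpha(2.5)} for p = 3 (approximately 9.35).
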
{dlgtab:Consistency correction} {phang} {opt calpha(#)} specifies a custom consistency factor by which the initial MVE estimate is rescaled. The default is to rescale the estimate so that it provides a consistent estimate of the mean and covariance matrix for normally distributed data. The default consistency factor is computed as median(d2)/invchi2(p, 0.5), where d2 are the squared robust distances based on the initial MVE fit and p is the number of variables. {phang} {opt cdelta(#)} specifies a custom consistency factor by which the one-step reweighted estimate is rescaled. The default is to rescale the estimate so that it provides a consistent estimate of the mean and covariance matrix for normally distributed data. The default consistency factor is computed as (W/N)/chi2(p+2, invchi2(p, W/N)), where p is the number of variables, N is the sample size, and W is the number of observations on which the reweighted estimate is based (see option {cmd:noreweight} above). {dlgtab:Algorithm} {phang} {opt nsamp(#)} specifies the number of trial candidates to be evaluated in the search algorithm. The default is {cmd:nsamp(500)}. {phang} {opt noee} specifies that enumeration of random trials is used even if exact enumeration of all possible candidates would be feasible. The algorithm uses exact enumeration of all possible (p+1)-subsets if comb(N, p+1) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random (p+1)-subsets are enumerated. Given the low default value of {cmd:nsamp()}, exact enumeration will only be used in very small samples. Set {cmd:nsamp()} to comb(N, p+1) to enforce exact enumeration of (p+1)-subsets. {marker mcd_options}{...} {title:Additional options for robmv mcd} {dlgtab:Main} {phang} {opt noreweight} causes the raw MCD estimate to be reported instead of the one-step reweighted estimate. 
The one-step reweighted estimate is computed from the observations whose robust distances based on the raw MCD fit are smaller than invchi2(p, 0.975), where p is the number of variables. The one-step reweighted estimate has better efficiency properties than the raw MCD estimate. {p_end} {phang} {opt bp(#)} sets the approximate breakdown point (in percent) with # being an integer number between 0 and 50. The default is {cmd:bp(50)}. The breakdown point determines the size of the H-subset, that is, the number of observations in the subset that identifies the MCD fit. The size of the H-subset is computed as h = floor((N - p - 1)*(1 - bp/100) + p + 1) {pmore} where N is the sample size, p is the number of variables, and bp is the specified breakdown point (in percent). In case of a breakdown point of 50% this simplifies to h = floor((N + p + 1)/2). The relationship between h and the desired breakdown point is only approximate, as the breakdown point of the MCD estimate is given as (N - h + 1)/N. The larger the sample size, the better h realizes the desired breakdown point. Note that in case of weights, the H-subset will be constructed from the raw observations ignoring weights. That is, the breakdown point should be interpreted in terms of a percentage of the raw observations and not in terms of a percentage of the sum of weights. {phang} {opt alpha(#)} sets the cutoff, in percent, used to determine the weights for the reweighted estimate. The weights are set to one for observations with squared distances smaller than invchi2(p, 1-alpha/100) and zero otherwise, where p is the number of variables. The default is {cmd:alpha(2.5)}. {dlgtab:Consistency correction} {phang} {opt calpha(#)} specifies a custom consistency factor by which the initial MCD estimate is rescaled. The default is to rescale the estimate so that it provides a consistent estimate of the mean and covariance matrix for normally distributed data. 
The default consistency factor is computed as (h/N)/chi2(p+2, invchi2(p, h/N)), where p is the number of variables, N is the sample size, and h is the size of the H-subset (see option {cmd:bp()} above). {phang} {opt cdelta(#)} specifies a custom consistency factor by which the one-step reweighted estimate is rescaled. The default is to rescale the estimate so that it provides a consistent estimate of the mean and covariance matrix for normally distributed data. The default consistency factor is computed as (W/N)/chi2(p+2, invchi2(p, W/N)), where p is the number of variables, N is the sample size, and W is the number of observations on which the reweighted estimate is based (see option {cmd:noreweight} above). {phang} {opt nosmall} specifies that the additional small sample correction suggested by Pison et al. (2002) be omitted. {p_end} {dlgtab:Algorithm} {phang} {opt nsamp(#)} specifies the number of trial candidates to be evaluated in the search algorithm for the best H-subset. The default is {cmd:nsamp(500)}. {phang} {opt csteps(#)} sets the number of concentration steps (C-steps) applied when evaluating the trial candidates. The default is {cmd:csteps(2)}. {phang} {opt nkeep(#)} sets the number of best trial candidates kept for final refinement. The default is {cmd:nkeep(10)}. {phang} {opt nsub(#)} specifies the subsample size used by the search algorithm in case of a large sample size N. The default is max(p*50, 300), where p is the number of variables. If N >= 2*{cmd:nsub()} the algorithm splits the sample into subsamples for the enumeration of the trial candidates. Up to {cmd:ksub()} subsamples are constructed (see below). For example, if {cmd:nsub()} is set to 300, {cmd:ksub()} is set to 5, and N = 10000, then 5 subsamples of size 300 are drawn (without replacement) from the 10000 observations and in each of these subsamples {cmd:nsamp()}/5 trial candidates are enumerated. From each subsample the {cmd:nkeep()} best candidates are kept for further evaluation. 
The 5 subsamples are then merged together to a subsample of size 5*300 = 1500 observations and the 5*{cmd:nkeep()} candidates are evaluated. The {cmd:nkeep()} best candidates from the merged sample are then refined until convergence in the full sample to identify the best solution. If N < 2*{cmd:nsub()}, no subsampling is applied, that is, all {cmd:nsamp()} candidates are enumerated in the full sample. If 2*{cmd:nsub()} <= N <= {cmd:ksub()}*{cmd:nsub()}, the data is split into as many subsamples as possible using a minimum subsample size of {cmd:nsub()} observations. For example, if N = 800 and {cmd:nsub()} is set to 300, then the data is split into two subsamples with 400 observations each, and in each subsample {cmd:nsamp()}/2 trial candidates are enumerated. See Rousseeuw and Van Driessen (1999) for a more detailed description of the algorithm. {pmore} Specify {cmd:nsub(.)} to omit subsampling and evaluate all {cmd:nsamp()} trial candidates in the full sample irrespective of the sample size. {phang} {opt ksub(#)} sets the maximum number of subsamples used by the large-N algorithm; see the {cmd:nsub()} option above. The default is {cmd:ksub(5)}. {cmd:ksub()} must be equal to 2 or larger. {phang} {opt tolerance(#)} sets the tolerance for the final refinement of the best candidates. When the relative change in the determinant from one iteration to the next is less than or equal to {cmd:tolerance()}, convergence is achieved. The default is {cmd:tolerance(1e-10)}. {phang} {opt iterate(#)} specifies the maximum number of iterations for the final refinement of the best candidates. If convergence is not reached within {cmd:iterate()} iterations, the algorithm stops and returns an error. The default is as set by {helpb set maxiter}. {phang} {opt relax} causes the algorithm to return the current results instead of returning an error if convergence is not reached within {cmd:iterate()} iterations. 
Use this option together with {cmd:iterate()} if you want to restrict the number of C-steps in the final refinement. {phang} {opt noee} specifies that enumeration of random trials is used even if exact enumeration of all possible candidates would be feasible. The algorithm uses exact enumeration of all possible H-subsets if comb(N, h) <= {cmd:nsamp()} and uses exact enumeration of all possible (p+1)-subsets if comb(N, p+1) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random (p+1)-subsets are enumerated (while expanding singular subsets until nonsingular or until reaching size h). Given the low default value of {cmd:nsamp()}, exact enumeration will only be used in very small samples. Set {cmd:nsamp()} to comb(N, h) or comb(N, p+1) to enforce exact enumeration of H-subsets or (p+1)-subsets, respectively. {phang} {opt nounivar} specifies that the standard search algorithm is used even if only one variable is analyzed. The default is to use an exact enumeration algorithm for univariate data in this case. {marker sd_options}{...} {title:Additional options for robmv sd} {dlgtab:Main} {phang} {opt huber} requests that a Huber-type function is used to down-weight the outliers when computing the location and covariance estimate. By default, a rectangular function is used, which is equivalent to excluding the outliers. {phang} {opt alpha(#)} sets the expected percentage of observations that will be classified as outliers under normal conditions. The default is {cmd:alpha(2.5)}. {cmd:alpha()} has no effect if {cmd:cutoff()} is specified. {phang} {opt asymmetric}[{cmd:(}{it:#}{cmd:)}] computes generalized SD distances and determines the cutoff point for outlier identification based on Tukey's g-and-h distribution (employing command {helpb robbox}) as suggested by Verardi and Vermandele (2016). {it:#} sets the breakdown point, in percent, that is used when fitting the g-and-h distribution; the default is {cmd:10}. {it:#} has no effect if {cmd:cutoff()} is specified. 
{phang} {opt cutoff(#)} specifies a custom cutoff value for outlier identification. By default, if {cmd:asymmetric} is omitted, the cutoff value is set to sqrt(invchi2({it:p}, 1 - {cmd:alpha()}/100)), where {it:p} is the number of variables. If {cmd:asymmetric} is specified, the default is to determine the cutoff value corresponding to {cmd:alpha()} based on Tukey's g-and-h distribution. {phang} {opt nofit} omits the computation of the location and covariance estimate. Use this option if you are only interested in the Stahel-Donoho distances, but not in the location and covariance estimate. The Stahel-Donoho distances can be stored by the {cmd:generate()} option. {dlgtab:Generate} {phang} {cmd:generate(}{it:names}{cmd:)} stores a variable containing the SD distances, an outlier indicator, and a variable containing the weights used to compute the location and covariance estimate. {it:names} may contain one to three names, depending on whether you want to store only the distances, the distances and the outlier indicator, or the distances, outlier indicator, and weights. {phang} {opt replace} allows {cmd:generate()} to overwrite existing variables. {dlgtab:Algorithm} {phang} {opt nsamp(#)} specifies the number of candidates (p-subsets) to be evaluated in the search algorithm. The default is {cmd:nsamp(500)}. {phang} {opt nsmax(#)} specifies the maximum number of candidates that are allowed to be infeasible (within each single search for a trial candidate) before aborting with an error. The default is max(1000,{cmd:nsamp()}). {phang} {opt expand} expands infeasible candidates by adding observations to the subset until the candidate becomes feasible. Specifying {cmd:expand} is not recommended. {phang} {opt noee} specifies that enumeration of random subsets is used even if exact enumeration of all possible subsets would be feasible. The algorithm uses exact enumeration of all possible subsets if comb(N, p) <= {cmd:nsamp()}. 
Otherwise, {cmd:nsamp()} random subsets are enumerated. {phang} {opt nostd} omits standardization of the data for the enumeration algorithm. Specifying {cmd:nostd} is not recommended. {phang} {cmd:controls(}{it:varlist}[{cmd:,} {it:options}]{cmd:)} partials out the effects of {it:varlist} from the SD distances within each projection; {it:varlist} may contain factor variables; see {help fvvarlist}. {it:options} determine whether to include the controls in the final location and covariance estimate and set the details of the Huber M estimator used to partial out the effects of the controls; the options are as follows: {phang2} {cmd:include} includes the variables specified in {cmd:controls()} in the final location and covariance estimate. The default is to include only the main variables. {phang2} {opt eff:iciency(#)} sets the Gaussian efficiency of the M estimator, in percent. {it:#} must be within [63.7,99.9]. The default is 100 - {cmd:alpha()}. {phang2} {opt k(#)} sets the tuning constant of the M estimator. Only one of {opt efficiency()} and {cmd:k()} is allowed. {phang2} {opt tol:erance(#)} sets the tolerance of the M estimator. The default is {cmd:tolerance(1e-10)}. {phang2} {opt iter:ate(#)} sets the maximum number of iterations of the M estimator. The default is as set by {helpb set maxiter}. {marker predict_options}{...} {title:Options for predict} {dlgtab:Main} {phang} {opt distance} generates a variable containing robust distances. This is the default. {phang} {opt rd} is a synonym for {cmd:distance}. {phang} {opt outlier}[{cmd:(}{it:#}{cmd:)}] generates a 0/1 variable identifying outliers (1 = outlier, 0 = inlier). Optional argument {it:#} specifies the percentage of observations classified as outliers in normal data. That is, observations with squared distances greater than or equal to invchi2(p, 1-{it:#}/100), where p is the number of variables, are classified as outliers. Argument {it:#} must be in [0,50]; the default is 2.5. 
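{pmore}
As a rough illustration of the cutoff rule for {cmd:outlier()}, the following sketch evaluates the threshold by hand using Stata's built-in {cmd:invchi2()} function. The values of p (four variables, as in the auto-data examples in this help file) and of the percentage (the default 2.5) are illustrative assumptions, and the snippet is not part of {cmd:robmv} itself.

```stata
* Threshold on the squared robust distance: invchi2(p, 1 - #/100)
local p   = 4      // number of variables (illustrative assumption)
local pct = 2.5    // outlier percentage; 2.5 is the documented default
display invchi2(`p', 1 - `pct'/100)
* The same threshold expressed on the (unsquared) distance scale
display sqrt(invchi2(`p', 1 - `pct'/100))
```

{pmore}
With these inputs the squared-distance threshold is about 11.14, so observations with a robust distance of roughly 3.34 or more would be flagged.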
{phang} {opt inlier}[{cmd:(}{it:#}{cmd:)}] generates a 0/1 variable identifying inliers (1 = inlier, 0 = outlier). Optional argument {it:#} specifies the percentage of observations classified as inliers in normal data. That is, observations with squared distances smaller than invchi2(p, {it:#}/100), where p is the number of variables, are classified as inliers; all other observations are classified as outliers. Argument {it:#} must be in [50,100]; the default is 97.5. {dlgtab:Additional M options} {phang} {opt weights} generates a variable containing the W1 weights of the M fit. {dlgtab:Additional MVE/MCD options} {phang} {opt subset} generates a 0/1 variable identifying the best H-subset in the estimation sample. Observations outside {cmd:e(sample)} will be set to missing. {phang} {opt noreweight} specifies that the raw MVE/MCD estimate be used for determining robust distances, outliers, and inliers. If available (that is, unless option {cmd:noreweight} was specified during estimation), the default is to base computations on the one-step reweighted estimate. {phang} {opt noscale} specifies that the unscaled raw MVE/MCD estimate be used for determining robust distances, outliers, and inliers. The unscaled raw MVE/MCD estimate is equal to the raw MVE/MCD estimate before applying consistency or small-sample correction factors. {title:Examples} . {stata sysuse auto} . {stata robmv classic price mpg weight length} . {stata robmv m price mpg weight length} . {stata robmv s price mpg weight length} . {stata robmv mm price mpg weight length} . {stata robmv mve price mpg weight length} . {stata robmv mcd price mpg weight length} . 
{stata robmv sd price mpg weight length} {title:Saved results} {pstd} {cmd:robmv} stores the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Scalars}{p_end} {synopt:{cmd:e(N)}}number of observations{p_end} {synopt:{cmd:e(nvars)}}number of variables included in the location and covariance estimate{p_end} {synopt:{cmd:e(rnk)}}rank of covariance matrix{p_end} {synopt:{cmd:e(N_clust)}}number of clusters (only if {cmd:vce(cluster)} is specified){p_end} {synopt:{cmd:e(df_r)}}sample degrees of freedom (only if {cmd:e(V)} is stored){p_end} {synopt:{cmd:e(rank)}}rank of {cmd:e(V)}{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Macros}{p_end} {synopt:{cmd:e(cmdline)}}command as typed{p_end} {synopt:{cmd:e(cmd)}}{cmd:robmv}{p_end} {synopt:{cmd:e(subcmd)}}name of subcommand{p_end} {synopt:{cmd:e(predict)}}{cmd:robmv_p}{p_end} {synopt:{cmd:e(depvar)}}{cmd:Cov} or {cmd:Corr}{p_end} {synopt:{cmd:e(valist)}}names of variables included in the location and covariance estimate{p_end} {synopt:{cmd:e(valist0)}}names of variables including base levels and omitted terms{p_end} {synopt:{cmd:e(correlation)}}{cmd:correlation} or empty{p_end} {synopt:{cmd:e(wtype)}}weight type{p_end} {synopt:{cmd:e(wexp)}}weight expression{p_end} {synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}{p_end} {synopt:{cmd:e(vcetype)}}title used to label Std. 
Err.{p_end} {synopt:{cmd:e(clustvar)}}name of cluster variable{p_end} {synopt:{cmd:e(title)}}title in estimation output{p_end} {synopt:{cmd:e(properties)}}{cmd:b V} or {cmd:b}{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Matrices}{p_end} {synopt:{cmd:e(b)}}estimates{p_end} {synopt:{cmd:e(V)}}sampling variance of estimates (only if supported){p_end} {synopt:{cmd:e(mu)}}location estimates{p_end} {synopt:{cmd:e(Cov)}}covariance estimates{p_end} {synopt:{cmd:e(Corr)}}correlation estimates{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Functions}{p_end} {synopt:{cmd:e(sample)}}marks estimation sample{p_end} {p2colreset}{...} {pstd} {cmd:robmv m} additionally stores the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Scalars}{p_end} {synopt:{cmd:e(bp)}}limiting value of breakdown point{p_end} {synopt:{cmd:e(ptrim)}}winsorizing percentage{p_end} {synopt:{cmd:e(k)}}tuning constant of the Huber objective function{p_end} {synopt:{cmd:e(c)}}consistency correction factor{p_end} {synopt:{cmd:e(tolerance)}}tolerance for reweighting algorithm{p_end} {synopt:{cmd:e(iterate)}}maximum number of iterations for reweighting algorithm{p_end} {synopt:{cmd:e(niter)}}executed number of iterations{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Macros}{p_end} {synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end} {pstd} {cmd:robmv s} and {cmd:robmv mm} additionally store the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Scalars}{p_end} {synopt:{cmd:e(bp)}}breakdown point{p_end} {synopt:{cmd:e(k)}}tuning constant{p_end} {synopt:{cmd:e(delta)}}normal consistency parameter{p_end} {synopt:{cmd:e(nsamp)}}number of trial candidates{p_end} {synopt:{cmd:e(csteps)}}number of C-steps for trial candidates{p_end} {synopt:{cmd:e(nkeep)}}number of best candidates for final refinement{p_end} {synopt:{cmd:e(tolerance)}}tolerance for refinements{p_end} {synopt:{cmd:e(iterate)}}maximum number of iterations for refinements{p_end} 
{synopt:{cmd:e(scale)}}scale estimate{p_end} {synopt:{cmd:e(efficiency)}}efficiency, in percent ({cmd:robmv mm} only){p_end} {synopt:{cmd:e(k_m)}}tuning constant of M step ({cmd:robmv mm} only){p_end} {synopt:{cmd:e(niter)}}executed number of M step iterations ({cmd:robmv mm} only){p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Macros}{p_end} {synopt:{cmd:e(whilferty)}}{cmd:whilferty} or empty{p_end} {synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end} {synopt:{cmd:e(method)}}{cmd:random} or {cmd:exact}{p_end} {synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end} {synopt:{cmd:e(efftype)}}{cmd:shape} or {cmd:location} ({cmd:robmv mm} only){p_end} {pstd} {cmd:robmv mve} additionally stores the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Scalars}{p_end} {synopt:{cmd:e(h)}}size of H-subset{p_end} {synopt:{cmd:e(bp)}}requested breakdown point{p_end} {synopt:{cmd:e(calpha)}}consistency factor for raw MVE estimate{p_end} {synopt:{cmd:e(cdelta)}}consistency factor for reweighted estimate (unless {cmd:noreweight} was specified){p_end} {synopt:{cmd:e(nsamp)}}number of trial candidates{p_end} {synopt:{cmd:e(nhyper)}}number of observations on hyperplane if H-subset is collinear; {cmd:0} else {p_end} {synopt:{cmd:e(MVE)}}(normalized) scale of initial MVE estimate{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Macros}{p_end} {synopt:{cmd:e(noreweight)}}{cmd:noreweight} or empty{p_end} {synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end} {synopt:{cmd:e(method)}}{cmd:classical}, {cmd:random}, or {cmd:exact}{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Matrices}{p_end} {synopt:{cmd:e(mu0)}}unscaled raw MVE location estimate{p_end} {synopt:{cmd:e(Cov0)}}unscaled raw MVE covariance estimate{p_end} {synopt:{cmd:e(Corr0)}}unscaled raw MVE correlation estimate{p_end} {synopt:{cmd:e(gamma)}}coefficients of hyperplane equation (if H-subset is collinear){p_end} {pstd} {cmd:robmv mcd} additionally stores the following in {cmd:e()}: {synoptset 20 
tabbed}{...} {p2col 7 20 24 2: Scalars}{p_end} {synopt:{cmd:e(h)}}size of H-subset{p_end} {synopt:{cmd:e(bp)}}requested breakdown point{p_end} {synopt:{cmd:e(calpha)}}consistency factor for raw MCD estimate{p_end} {synopt:{cmd:e(salpha)}}small-sample correction factor for raw MCD estimate {p_end} {synopt:{cmd:e(cdelta)}}consistency factor for reweighted estimate (unless {cmd:noreweight} was specified){p_end} {synopt:{cmd:e(sdelta)}}small-sample correction factor for reweighted estimate (unless {cmd:noreweight} was specified){p_end} {synopt:{cmd:e(nsamp)}}number of trial candidates{p_end} {synopt:{cmd:e(nsub)}}(minimum) size of subsamples for large-N algorithm{p_end} {synopt:{cmd:e(ksub)}}number of subsamples used by large-N algorithm; {cmd:0} else{p_end} {synopt:{cmd:e(nmerged)}}size of merged subsamples{p_end} {synopt:{cmd:e(csteps)}}(maximum) number of C-steps for trial candidates{p_end} {synopt:{cmd:e(nkeep)}}number of best candidates for final refinement{p_end} {synopt:{cmd:e(tolerance)}}tolerance for final refinement{p_end} {synopt:{cmd:e(iterate)}}maximum number of iterations for final refinement{p_end} {synopt:{cmd:e(nhyper)}}number of observations on hyperplane if H-subset is collinear; {cmd:0} else {p_end} {synopt:{cmd:e(MCD)}}determinant of unscaled raw MCD estimate{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Macros}{p_end} {synopt:{cmd:e(noreweight)}}{cmd:noreweight} or empty{p_end} {synopt:{cmd:e(nosmall)}}{cmd:nosmall} or empty{p_end} {synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end} {synopt:{cmd:e(method)}}{cmd:classical}, {cmd:univar}, {cmd:random}, {cmd:exact-h}, or {cmd:exact-p}{p_end} {synopt:{cmd:e(nounivar)}}{cmd:nounivar} or empty{p_end} {synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Matrices}{p_end} {synopt:{cmd:e(mu0)}}unscaled raw MCD location estimate{p_end} {synopt:{cmd:e(Cov0)}}unscaled raw MCD covariance estimate{p_end} {synopt:{cmd:e(Corr0)}}unscaled raw MCD correlation 
estimate{p_end} {synopt:{cmd:e(gamma)}}coefficients of hyperplane equation (if H-subset is collinear){p_end} {pstd} {cmd:robmv sd} additionally stores the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Scalars}{p_end} {synopt:{cmd:e(nsamp)}}number of trial candidates{p_end} {synopt:{cmd:e(nmax)}}setting of {cmd:nmax()}{p_end} {synopt:{cmd:e(nskip)}}number of discarded candidates{p_end} {synopt:{cmd:e(alpha)}}outlier percentage under normality{p_end} {synopt:{cmd:e(cutoff)}}cutoff value for outlier identification{p_end} {synopt:{cmd:e(Nout)}}number of observations classified as outliers{p_end} {synoptset 20 tabbed}{...} {p2col 7 20 24 2: Macros}{p_end} {synopt:{cmd:e(method)}}{cmd:uniform}, {cmd:random} or {cmd:exact}{p_end} {synopt:{cmd:e(asymmetric)}}{cmd:asymmetric} or empty{p_end} {synopt:{cmd:e(xvars)}}names of main variables{p_end} {synopt:{cmd:e(controls)}}names of control variables{p_end} {synopt:{cmd:e(include)}}{cmd:include} or empty{p_end} {synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end} {synopt:{cmd:e(expand)}}{cmd:expand} or empty{p_end} {synopt:{cmd:e(nostd)}}{cmd:nostd} or empty{p_end} {synopt:{cmd:e(wftype)}}{cmd:huber} or {cmd:rectangle}{p_end} {synopt:{cmd:e(nofit)}}{cmd:nofit} or empty{p_end} {synopt:{cmd:e(generate)}}names of generated variables{p_end} {pstd} If {cmd:robmv sd} is specified with option {cmd:nofit}, only a reduced set of results is stored. {pstd} If the {cmd:svy} option is specified, various additional results as described in help {helpb svy} are stored. {title:References} {phang} Croux, C., G. Haesbroeck (1999). Influence Function and Efficiency of the Minimum Covariance Determinant Scatter Matrix Estimator. Journal of Multivariate Analysis 71: 161-190. {phang} Hubert, M., P.J. Rousseeuw, D. Vanpaemel, T. Verdonck (2013). A deterministic algorithm for S-estimators and MM-estimators of multivariate location and scatter. KU Leuven. 
Available from {browse "http://wis.kuleuven.be/stat/robust/papers/2013/dets-technicalreport.pdf"}. {phang} Lopuha{c a:}, H.P. (1989). On the Relation Between S-Estimators and M-Estimators of Multivariate Location and Covariance. The Annals of Statistics 17: 1662-1683. {phang} Maronna, R.A., D.R. Martin, V.J. Yohai (2006). Robust Statistics. Theory and Methods. Chichester: John Wiley & Sons. {phang} Maronna, R.A., V.J. Yohai (1995). The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90(429): 330-341. {phang} Pison, G., S. Van Aelst, G. Willems (2002). Small sample corrections for LTS and MCD. Metrika 55: 111-123. {phang} Rousseeuw, P.J., K. Van Driessen (1999). A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics 41(3): 212-223. {phang} Salibian-Barrera, M., S. Van Aelst, G. Willems (2006). Principal Components Analysis Based on Multivariate MM Estimators With Fast and Robust Bootstrap. Journal of the American Statistical Association 101(475): 1198-1211. {phang} Verardi, V., C. Vermandele (2016). Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions. Journal de la Société Française de Statistique 157(2): 90-114. {title:Authors} {pstd} Ben Jann (University of Bern), Vincenzo Verardi (University of Namur and Universite libre de Bruxelles), Catherine Vermandele (Universite libre de Bruxelles) {pstd} Support: ben.jann@soz.unibe.ch {pstd} Thanks for citing this software as follows: {pmore} Jann, B., V. Verardi, C. Vermandele (2021). robmv: Stata module for robust multivariate estimation of location and covariance. Available from {browse "http://ideas.repec.org/c/boc/bocode/s458895.html"}. {title:Also see} {psee} Online: help for {helpb correlate}, {helpb robreg}, {helpb robstat}, {helpb robbox}