{smcl}
{* 06feb2021}{...}
{hi:help robmv}{...}
{right:{browse "http://github.com/benjann/robmv/"}}
{hline}

{title:Title}

{pstd}{hi:robmv} {hline 2} Robust multivariate estimation of location and covariance


{title:Syntax}

{pstd}Classical (non-robust) estimator

{p 8 15 2}
    {cmd:robmv} {opt cl:assic} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##cl_opt:{it:classic_options}} {help robmv##opt:{it:general_options}} ]

{pstd}M estimator

{p 8 15 2}
    {cmd:robmv} {opt m} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##m_opt:{it:m_options}} {help robmv##opt:{it:general_options}} ]

{pstd}S estimator

{p 8 15 2}
    {cmd:robmv} {opt s} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##s_opt:{it:s_options}} {help robmv##opt:{it:general_options}} ]

{pstd}MM estimator

{p 8 15 2}
    {cmd:robmv} {opt mm} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##mm_opt:{it:mm_options}} {help robmv##opt:{it:general_options}} ]

{pstd}Minimum Volume Ellipsoid (MVE) estimator

{p 8 15 2}
    {cmd:robmv} {opt mve} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##mve_opt:{it:mve_options}} {help robmv##opt:{it:general_options}} ]

{pstd}Minimum Covariance Determinant (MCD) estimator

{p 8 15 2}
    {cmd:robmv} {opt mcd} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##mcd_opt:{it:mcd_options}} {help robmv##opt:{it:general_options}} ]

{pstd}Stahel-Donoho estimator

{p 8 15 2}
    {cmd:robmv} {opt sd} {varlist} {ifin} {weight}
    [{cmd:,} {help robmv##sd_opt:{it:sd_options}} {help robmv##opt:{it:general_options}} ]

{pstd}Generate robust distances, outliers, etc., after estimation

{p 8 15 2}
    {cmd:predict} {dtype} {newvar} {ifin} [{cmd:,}
    {help robmv##predict_opt:{it:predict_options}} ]

{p 4 6 2}
{it:varlist} may contain factor variables; see {help fvvarlist}.{p_end}
{p 4 6 2}
{opt pweight}s, {opt aweight}s, {opt iweight}s, and {opt fweight}s are
allowed; see {help weight}{p_end}
{p 4 6 2}(exception: {cmd:robmv mcd} and {cmd:robmv mve} do
not allow {opt fweight}s)

{synoptset 21 tabbed}{...}
{marker opt}{col 5}{it:{help robmv##options:general_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt corr:elation}}report correlations instead of covariances
    {p_end}

{syntab :Standard errors/CIs}
{synopt :{cmd:vce(}{help robmv##vcetype:{it:vcetype}}{cmd:)}}{it:vcetype} may
    be {cmdab:a:nalytic} (the default), {cmdab:cl:uster} {it:clustvar},
    {cmdab:boot:strap} or {cmdab:jack:knife}
    {p_end}
{synopt :{cmd:svy}[{cmd:(}{help robstat##svy:{it:subpop}}{cmd:)}]}take account
    of survey design as set by {helpb svyset}, optionally restricting
    computations to {it:subpop}
    {p_end}
{synopt :{opt nose}}suppress computation of standard errors and confidence
    intervals
    {p_end}
{synopt :{cmdab:if:generate(}{help robmv##ifgen:{it:names}}{cmd:)}}stores the values of the
    influence functions
    {p_end}
{synopt :{opt r:eplace}}allows overwriting existing variables
    {p_end}

{syntab :Reporting}
{synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)}
    {p_end}
{synopt :{opt nohe:ader}}suppress output header
    {p_end}
{synopt :{opt notab:le}}suppress output table
    {p_end}
{synopt :{help robmv##displayopts:{it:display_options}}}standard
    reporting options as described in
    {helpb estimation options:[R] estimation options}
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker cl_opt}{col 5}{it:{help robmv##cl_options:classic_options}}{col 28}Description
{synoptline}
{synopt :{opt normc:oll}}do not remove collinear variables
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker m_opt}{col 5}{it:{help robmv##m_options:m_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt k(#)}}set custom tuning constant
    {p_end}
{synopt :{opt ptrim(#)}}set winsorizing percentage
    {p_end}

{syntab :Consistency correction}
{synopt :{opt c(#)}}set custom consistency correction factor
    {p_end}
{synopt :{opt cemp}}use alternative approach to compute consistency correction factor
    {p_end}

{syntab :Algorithm}
{synopt :{opt tol:erance(#)}}tolerance for reweighting algorithm; default is
    {cmd:tolerance(1e-10)}
    {p_end}
{synopt :{opt iter:ate(#)}}maximum number of iterations;
    default is as set by {helpb set maxiter}
    {p_end}
{synopt :{opt relax}}do not return error if convergence is not reached
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker s_opt}{col 5}{it:{help robmv##s_options:s_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt bp(#)}}breakdown point, in percent; default is {cmd:bp(50)}
    {p_end}
{synopt :{opt wh:ilferty}}obtain tuning constant using Wilson-Hilferty transformation
    {p_end}
{synopt :{opt k(#)}}custom tuning constant
    {p_end}

{syntab :Algorithm}
{synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(20)}
    {p_end}
{synopt :{opt cstep:s(#)}}improvement steps applied to each trial
    candidate; default is {cmd:csteps(2)}
    {p_end}
{synopt :{opt nk:eep(#)}}number of candidates kept for final refinement;
    default is {cmd:nkeep(5)}
    {p_end}
{synopt :{opt tol:erance(#)}}tolerance for refinements; default is
    {cmd:tolerance(1e-10)}
    {p_end}
{synopt :{opt iter:ate(#)}}maximum number of iterations; default is as set by {helpb set maxiter}
    {p_end}
{synopt :{opt relax}}do not return error if convergence is not reached
    {p_end}
{synopt :{opt noee}}do not use exact enumeration even if feasible
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker mm_opt}{col 5}{it:{help robmv##mm_options:mm_options}}{col 28}Description
{synoptline}
{synopt :{opt eff:iciency(#)}}desired efficiency, in percent; default is {cmd:efficiency(95)}
    {p_end}
{synopt :{opt loc:ation}}set location efficiency rather than shape efficiency
    {p_end}
{synopt : {help robmv##s_opt:{it:s_options}}}options as for {cmd:robreg s}
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker mve_opt}{col 5}{it:{help robmv##mve_options:mve_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt nore:weight}}report raw MVE estimate without reweighting step
    {p_end}
{synopt :{opt bp(#)}}breakdown point, in percent; default is {cmd:bp(50)}
    {p_end}
{synopt :{opt alpha(#)}}reweighting cutoff, in percent; default is {cmd:alpha(2.5)}
    {p_end}

{syntab :Consistency correction}
{synopt :{opt calpha(#)}}set custom consistency factor for raw MVE estimate
    {p_end}
{synopt :{opt cdelta(#)}}set custom consistency factor for reweighted estimate
    {p_end}

{syntab :Algorithm}
{synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(500)}
    {p_end}
{synopt :{opt noee}}do not use exact enumeration even if feasible
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker mcd_opt}{col 5}{it:{help robmv##mcd_options:mcd_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt nore:weight}}report raw MCD estimate without reweighting step
    {p_end}
{synopt :{opt bp(#)}}breakdown point, in percent; default is {cmd:bp(50)}
    {p_end}
{synopt :{opt alpha(#)}}reweighting cutoff, in percent; default is {cmd:alpha(2.5)}
    {p_end}

{syntab :Consistency correction}
{synopt :{opt calpha(#)}}set custom consistency factor for raw MCD estimate
    {p_end}
{synopt :{opt cdelta(#)}}set custom consistency factor for reweighted estimate
    {p_end}
{synopt :{opt nosmall}}omit additional small sample correction
    {p_end}

{syntab :Algorithm}
{synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(500)}
    {p_end}
{synopt :{opt cstep:s(#)}}concentration steps applied to each trial
    candidate; default is {cmd:csteps(2)}
    {p_end}
{synopt :{opt nk:eep(#)}}number of candidates kept for final refinement;
    default is {cmd:nkeep(10)}
    {p_end}
{synopt :{opt nsub(#)}}minimum subsample size; default is max(p*50, 300); type {cmd:nsub(.)} to omit subsampling
    {p_end}
{synopt :{opt ksub(#)}}maximum number of subsamples; default is {cmd:ksub(5)}
    {p_end}
{synopt :{opt tol:erance(#)}}tolerance for final refinement; default is
    {cmd:tolerance(1e-10)}
    {p_end}
{synopt :{opt iter:ate(#)}}maximum number of iterations;
    default is as set by {helpb set maxiter}
    {p_end}
{synopt :{opt relax}}do not return error if convergence is not reached
    {p_end}
{synopt :{opt noee}}do not use exact enumeration even if feasible
    {p_end}
{synopt :{opt nouni:var}}use standard algorithm even if p=1
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker sd_opt}{col 5}{it:{help robmv##sd_options:sd_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt h:uber}}use a Huber-type rather than rectangular function
    to down-weight outliers
    {p_end}
{synopt :{opt alpha(#)}}outlier percentage under normality; default is {cmd:alpha(2.5)}
    {p_end}
{synopt :{opt asym:metric}[{cmd:(}{it:#}{cmd:)}]}compute generalized SD distances
    {p_end}
{synopt :{opt cut:off(#)}}set custom cutoff value for outlier identification
    {p_end}
{synopt :{opt nofit}}do not compute the location and covariance estimate
    {p_end}

{syntab :Generate}
{synopt :{cmdab:gen:erate(}{it:names}{cmd:)}}store SD distances, outlier
    indicator, and weights
    {p_end}
{synopt :{opt r:eplace}}allow overwriting existing variables
    {p_end}

{syntab :Algorithm}
{synopt :{opt n:samp(#)}}number of trial candidates; default is {cmd:nsamp(500)}
    {p_end}
{synopt :{opt nmax(#)}}maximum number of invalid candidates before aborting; default is max(1000,{cmd:nsamp()})
    {p_end}
{synopt :{opt expand}}expand invalid candidates by adding observations (not recommended)
    {p_end}
{synopt :{opt noee}}do not use exact enumeration even if feasible
    {p_end}
{synopt :{opt nostd}}omit standardization (not recommended)
    {p_end}
{synopt :{opt control:s(spec)}}partial out effects of covariates
    {p_end}
{synoptline}

{synoptset 21 tabbed}{...}
{marker predict_opt}{col 5}{it:{help robmv##predict_options:predict_options}}{col 28}Description
{synoptline}
{syntab :Main}
{synopt :{opt d:istance}}generate robust distances; the default
    {p_end}
{synopt :{opt r:d}}synonym for {cmd:distance}
    {p_end}
{synopt :{opt o:utlier}[{cmd:(}{it:#}{cmd:)}]}generate outlier indicator
    {p_end}
{synopt :{opt i:nlier}[{cmd:(}{it:#}{cmd:)}]}generate inlier indicator
    {p_end}

{syntab :Additional M options}
{synopt :{opt w:eights}}generate W1 weights
    {p_end}

{syntab :Additional MVE/MCD options}
{synopt :{opt s:ubset}}generate best H-subset indicator
    {p_end}
{synopt :{opt nore:weight}}use raw MVE/MCD estimate
    {p_end}
{synopt :{opt noscale}}use unscaled raw MVE/MCD estimate
    {p_end}
{synoptline}


{title:Description}

{pstd}
    {cmd:robmv} provides a number of robust multivariate estimators of location
    and covariance.

{pstd}
    {cmd:robmv classic} computes the classical (non-robust) estimate of
    location and covariance. Results are the same as computed by standard
    commands such as {helpb correlate}.

{pstd}
    {cmd:robmv m} computes an M estimate of location and covariance using a
    Huber weighting function as suggested by Lopuha{c a:} (1989). Singular
    solutions are handled as suggested by Maronna et al. (2006, p. 184-185).

{pstd}
    {cmd:robmv s} computes an S estimate of
    location and covariance (Lopuha{c a:} 1989) using the FastS algorithm
    as described in Hubert et al. (2013).

{pstd}
    {cmd:robmv mm} computes an MM estimate of location and covariance
    (Salibian-Barrera et al. 2006).

{pstd}
    {cmd:robmv mve} computes the Minimum Volume Ellipsoid (MVE) estimator
    of location and covariance. By default, the one-step reweighted estimate
    is reported instead of the raw MVE estimate. The estimation algorithm employs
    and improvement step as suggested by Maronna et al. (2006, p. 198). In case
    of an exact-fit situation (that is, when the variance matrix in the best
    H-subset is singular due to local collinearity among the variables) the
    means and covariances are based on all observations that lie on the
    hyperplane and the corresponding hyperplane equation is reported.

{pstd}
    {cmd:robmv mcd} computes the Minimum Covariance Determinant (MCD) estimator
    of location and covariance. By default, the one-step reweighted estimate
    is reported instead of the raw MCD estimate. A fast algorithm as suggested
    by Rousseeuw and Van Driessen (1999) is used for computation of the MCD
    estimate. Consistency correction as given in Croux and Haesbroeck (1999) is
    applied. Furthermore, by default, small sample bias is corrected as
    suggested by Pison et al. (2002). In case of an exact-fit situation
    (that is, when the variance matrix in the best H-subset is singular
    due to local collinearity among the variables) the means and covariances
    are based on all observations that lie on the hyperplane and
    the corresponding hyperplane equation is reported.

{pstd}
    {cmd:robmv sd} computes the Stahel-Donoho estimator of
    location and covariance as discussed, for example, by Maronna and
    Yohai (1995). It also supports the modified Stahel-Donoho estimator for
    skewed and/or heavy-tailed distributions suggested by Verardi and
    Vermandele (2016).

{pstd}
    {cmd:predict} can be used after {cmd:robmv} to generate variables
    identifying outliers, containing robust distances, etc.


{title:Dependencies}

{pstd}
    {cmd:robmv} requires {cmd:moremata}; see
    {net "describe moremata, from(http://fmwww.bc.edu/repec/bocode/m/)":ssc describe moremata}.
    In addition, the {cmd:asymmetric} option of {cmd:robmv sd} requires {cmd:robbox}; see
    {net "describe robbox, from(http://fmwww.bc.edu/repec/bocode/r/)":ssc describe robbox}.


{marker options}{...}
{title:General options}

{dlgtab:Main}

{phang}
    {opt correlation} specifies that correlations be reported instead of
    variances and covariances.

{dlgtab:Standard errors/CIs}

{marker vcetype}{...}
{phang}
    {opth vce(vcetype)} determines how standard errors and confidence intervals
    are computed. {it:vcetype} may be

            {cmd:analytic}
            {cmd:cluster} {it:clustvar}
            {cmd:bootstrap} [{cmd:,} {help bootstrap:{it:bootstrap_options}}]
            {cmd:jackknife} [{cmd:,} {help jackknife:{it:jackknife_options}}]

{pmore}
    {cmd:vce(analytic)}, the default, computes standard errors based on influence
    functions. Likewise, {cmd:vce(cluster} {it:clustvar}{cmd:)} computes standard
    errors based on influence function allowing for intragroup correlation,
    where {it:clustvar} specifies to which group each observation belongs. For
    bootstrap and jackknife estimation, see help {it:{help vce_option}}.

{pmore}
    {cmd:vce(analytic)} and {cmd:vce(cluster)} are currently not supported by
    {cmd:robmv mcd}, {cmd:robmv mve}, and {cmd:robmv sd}. No standard errors will
    be estimated by these subcommands.

{phang}
    {cmd:svy}[{cmd:(}{it:subpop}{cmd:)}] causes the survey design to be taken
    into account for variance estimation. The data need to be set up for survey
    estimation; see help {helpb svyset}. Only one of {cmd:svy()} and {cmd:vce()}
    is allowed. Specify {it:subpop} to restrict survey
    estimation to a subpopulation, where {it:subpop} is

            [{varname}] [{it:{help if}}]

{pmore}
    The subpopulation is defined by observations for which {it:varname}!=0 and
    for which the {cmd:if} condition is met. See help {helpb svy} and
    {manlink SVY subpopulation estimation} for more information on subpopulation
    estimation.

{pmore}
    The {cmd:svy} option of {cmd:robmv} only works if the variance
    estimation method is set to Taylor linearization by {helpb svyset} (the
    default). For other variance estimation methods you can use the usual {helpb svy}
    prefix command.

{pmore}
    {cmd:svy()} is currently not supported by
    {cmd:robmv mcd}, {cmd:robmv mve}, and {cmd:robmv sd}. No standard errors will
    be estimated by these subcommands.

{phang}
    {opt nose} suppresses the computation of standard errors and confidence
    intervals.

{marker ifgen}{...}
{phang}
    {opt ifgenerate(names)} stores the influence functions that were used
    to compute the standard errors, where {it:names} is either a list of (new) variable names
    or {help newvarlist##stub*:{it:stub}}{cmd:*} to create names {it:stub}{cmd:1},
    {it:stub}{cmd:2}, etc. {cmd:ifgenerate()} has no effect if specified together
    with {cmd:nose}, {cmd:vce(bootstrap)}, or {cmd:vce(jackknife)}.

{phang}
    {opt replace} allows {cmd:ifgenerate()} to overwrite existing variables.

{dlgtab:Reporting}

{phang}
    {opt level(#)} specifies the confidence level, as a percentage, for
    confidence intervals. The default is {cmd:level(95)} or as set by
    {helpb set level}.

{phang}
    {opt noheader} suppresses the output header; only the coefficient table is
    displayed.

{phang}
    {opt notable} suppresses the coefficient table.

{marker displayopts}{...}
{phang}
    {it:display_options} are standard reporting options such as {cmd:cformat()},
    {cmd:pformat()}, {cmd:sformat()}, or {cmd:coeflegend}. See
    {helpb estimation options:[R] estimation options}.


{marker cl_options}{...}
{title:Additional options for robmv classic}

{phang}
    {opt normcoll} requests that collinear variables will be included in the
    estimation. The default is to remove collinear variables.


{marker m_options}{...}
{title:Additional options for robmv m}

{dlgtab:Main}

{phang}
    {opt k(#)} sets the tuning constant for the Huber objective function.
    Unless {cmd:ptrim()} is specified (see below), the default is to set the
    tuning constant to k = sqrt(p+1), where p is the number of variables, so
    that the maximum asymptotic breakdown point of bp = min(1/k^2, 1-p/k^2) is
    reached (see Lopuha{c a:} 1989). Note that {cmd:k()} must be larger than
    sqrt(p) for the M estimate to exist.

{phang}
    {opt ptrim(#)} sets the percentage of winsorizing. If {opt ptrim()} is
    specified, the tuning constant is set to k = sqrt(invchi2tail(p,
    ptrim/100)), where p is the number of variables. Setting {cmd:ptrim(0)}
    will return the classical location and covariance estimate (no
    winsorizing). Note that {cmd:ptrim()} must be smaller than
    chi2tail(p,p)*100, where p is the number of variables, for the M estimate
    to exist. Only one of {cmd:ptrim()} and {cmd:k()} is allowed.

{dlgtab:Consistency correction}

{phang}
    {opt c(#)} specifies a custom consistency correction factor by which
    the raw estimate be rescaled. The default is to rescale the
    estimate so that it provides a consistent estimate of the mean and
    covariance matrix for normally distributed data (see option {cmd:cemp}
    below).

{phang}
    {opt cemp} specifies that the normal consistency correction factor
    is estimated empirically (equation 6.22 in Maronna et al. 2006). The
    default is to derive the normal consistency correction factor
    numerically (equation 6.21 in Maronna et al. 2006). Use the {cmd:cemp}
    option should the default algorithm fail (unlikely to happen).

{dlgtab:Algorithm}

{phang}
    {opt tolerance(#)} sets the tolerance for the reweighting algorithm.
    When the maximum relative change in the location and covariance estimate
    is less than or equal to {cmd:tolerance()}, convergence is
    achieved. The default is {cmd:tolerance(1e-10)}.

{phang}
    {opt iterate(#)} specifies the maximum number of iterations for the
    reweighting algorithm. If convergence is not reached within
    {cmd:iterate()} iterations, the algorithm stops and returns error. The
    default is as set by {helpb set maxiter}.

{phang}
    {opt relax} causes the algorithm to return the current results
    instead of returning error if convergence is not reached within
    {cmd:iterate()} iterations.


{marker s_options}{...}
{title:Additional options for robmv s}

{dlgtab:Main}

{phang}
    {opt bp(#)} sets the breakdown point (in percent) with # in [1,50]. The
    default is {cmd:bp(50)}.

{phang}{opt whilferty} obtains the tuning constant corresponding to the desired
    breakdown point by applying the Wilson-Hilferty transformation to the tuning
    constant of the univariate biweight function. The default is to obtain the
    tuning constant by finding value {it:k} that solves {it:bp} = {it:b} /
    ({it:k}^2/6), where {it:bp} is the desired breakdown point and {it:b} is
    the Gaussian consistency parameter of the scale optimization problem.

{phang}
    {opt k(#)} sets the tuning constant to a custom value. Only one of
    {cmd:k()} and {cmd:bp()} is allowed. The procedure used to compute the
    breakdown point corresponding to {cmd:k()} depends on whether {cmd:whilferty}
    is specified or not.

{dlgtab:Algorithm}

{phang}
    {opt nsamp(#)} specifies the number of trial candidates to be evaluated in the
    search algorithm. The default is {cmd:nsamp(20)}.

{phang}
    {opt csteps(#)} sets the number of improvement steps (C-steps) applied when
    evaluating the trial candidates. The default is {cmd:csteps(2)}.

{phang}
    {opt nkeep(#)} sets the number of best trial candidates kept for final
    refinement. The default is {cmd:nkeep(5)}.

{phang}
    {opt tolerance(#)} sets the tolerance for the candidate scale refinements
    and the final refinement of the best candidates. When the relative
    change in the scale from one iteration to the next is less than or equal
    to {cmd:tolerance()}, convergence is achieved. The default is {cmd:tolerance(1e-10)}.

{phang}
    {opt iterate(#)} specifies the maximum number of iterations for the candidate scale refinements
    and the final refinement of the best candidates. If convergence is not reached within
    {cmd:iterate()} iterations, the algorithm stops and returns error. The
    default is as set by {helpb set maxiter}.

{phang}
    {opt relax} causes the algorithm to use the current results
    instead of returning error if convergence is not reached within
    {cmd:iterate()} iterations.

{phang}
    {opt noee} specifies that enumeration of random trials is used even if
    exact enumeration of all possible candidates would be feasible. The
    algorithms uses exact enumeration of all possible (p+1)-subsets if comb(N,
    p+1) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random (p+1)-subsets are
    enumerated. Given the low default value of {cmd:nsamp()}, exact enumeration
    will only be used in very small samples. Set {cmd:nsamp()} to comb(N, p+1),
    to enforce exact enumeration of (p+1)-subsets.


{marker mm_options}{...}
{title:Additional options for robmv mm}

{phang}
    {opt efficiency(#)} sets the desired gaussian efficiency (in percent) with #
    in [70,100). The default is {cmd:efficiency(95)}.

{phang}
    {opt location} requests that {cmd:efficiency()} sets the location
    efficiency. The default is to set the shape efficiency.

{phang}
    {help robmv##s_options:{it:s_options}} are additional options as for {cmd:robreg s}.


{marker mve_options}{...}
{title:Additional options for robmv mve}

{dlgtab:Main}

{phang}
    {opt noreweight} causes the raw MVE estimate to be reported instead of
    the one-step reweighted estimate. The one-step reweighted estimate
    is computed from the observations whose robust distances based on the
    raw MVE fit are smaller than invchi2(p, 0.975), where p is the number
    of variables. The one-step reweighted estimate has better efficiency
    properties than the raw MVE estimate.
    {p_end}

{phang}
    {opt bp(#)} sets the approximate breakdown point (in percent) with # being
    an integer number between 0 and 50. The default is {cmd:bp(50)}. The breakdown
    point determines the size of the H-subset, that is, the number of
    observations in the subset that identifies the MVE fit. The size of
    the H-subset is computed as

            h = floor((N - p - 1)*(1 - bp/100) + p + 1)

{pmore}
    where N is the sample size, p is the number of variable and bp is the
    specified breakdown point (in percent). In case of a breakdown point of 50%
    this simplifies to h = floor((N + p + 1)/2). Note that in case of
    weights, the H-subset will be constructed from the raw observations
    ignoring weights. That is, the breakdown point should be interpreted in
    terms of a percentage of the raw observations and not in terms of a
    percentage of the sum of weights.

{phang}
    {opt alpha(#)} sets the cutoff, in percent, used to determine the weights for the
    reweighted estimate. The weights are set to one for observations with squared
    distanced smaller than invchi2(p, 1-alpha/100) and zero else, where p is the
    number of variables. The default is {cmd:alpha(2.5)}.

{dlgtab:Consistency correction}

{phang}
    {opt calpha(#)} specifies a custom consistency factor by which the initial
    MVE estimate be rescaled. The default is to rescale the estimate so that it
    provides a consistent estimate of the mean and covariance matrix for
    normally distributed data. The default consistency factor is computed as
    median(d2)/invchi2(p, 0.5), where d2 are the squared robust distances based
    on the initial MVE fit an p is the number of variables.

{phang}
    {opt cdelta(#)} specifies a custom consistency factor by which
    the one-step reweighted estimate be rescaled. The default is to rescale the
    estimate so that it provides a consistent estimate of the mean and
    covariance matrix for normally distributed data. The default consistency
    factor is computed as (W/N)/chi2(p+2, invchi2(p, W/N)), where p
    is the number of variables, N is the sample size, and W is the number of
    observations on which the reweighted estimate is based
    (see option {cmd:noreweight} above).

{dlgtab:Algorithm}

{phang}
    {opt nsamp(#)} specifies the number of trial candidates to be evaluated in the
    search algorithm. The default is {cmd:nsamp(500)}.

{phang}
    {opt noee} specifies that enumeration of random trials is used even if
    exact enumeration of all possible candidates would be feasible. The
    algorithms uses exact enumeration of all possible (p+1)-subsets if comb(N,
    p+1) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random (p+1)-subsets are
    enumerated. Given the low default value of {cmd:nsamp()}, exact enumeration
    will only be used in very small samples. Set {cmd:nsamp()} to comb(N, p+1),
    to enforce exact enumeration of (p+1)-subsets.


{marker mcd_options}{...}
{title:Additional options for robmv mcd}

{dlgtab:Main}

{phang}
    {opt noreweight} causes the raw MCD estimate to be reported instead of
    the one-step reweighted estimate. The one-step reweighted estimate
    is computed from the observations whose robust distances based on the
    raw MCD fit are smaller than invchi2(p, 0.975), where p is the number
    of variables. The one-step reweighted estimate has better efficiency
    properties than the raw MCD estimate.
    {p_end}

{phang}
    {opt bp(#)} sets the approximate breakdown point (in percent) with # being
    an integer number between 0 and 50. The default is {cmd:bp(50)}. The breakdown
    point determines the size of the H-subset, that is, the number of
    observations in the subset that identifies the MCD fit. The size of
    the H-subset is computed as

            h = floor((N - p - 1)*(1 - bp/100) + p + 1)

{pmore}
    where N is the sample size, p is the number of variable and bp is the
    specified breakdown point (in percent). In case of a breakdown point of 50%
    this simplifies to h = floor((N + p + 1)/2). The relationship between h and
    the desired breakdown point is only approximate, as the breakdown point of
    the MCD estimate is given as (N - h + 1)/N. The larger the sample size, the
    better h realizes the desired breakdown point. Note that in case of
    weights, the H-subset will be constructed from the raw observations
    ignoring weights. That is, the breakdown point should be interpreted in
    terms of a percentage of the raw observations and not in terms of a
    percentage of the sum of weights.

{phang}
    {opt alpha(#)} sets the cutoff, in percent, used to determine the weights for the
    reweighted estimate. The weights are set to one for observations with squared
    distanced smaller than invchi2(p, 1-alpha/100) and zero else, where p is the
    number of variables. The default is {cmd:alpha(2.5)}.

{dlgtab:Consistency correction}

{phang}
    {opt calpha(#)} specifies a custom consistency factor by which
    the initial MCD estimate be rescaled. The default is to rescale the
    estimate so that it provides a consistent estimate of the mean and
    covariance matrix for normally distributed data. The default consistency
    factor is computed as (h/N)/chi2(p+2, invchi2(p, h/N)), where p
    is the number of variables, N is the sample size, and h is the size of the
    H-subset (see option {cmd:bp()} above).

{phang}
    {opt cdelta(#)} specifies a custom consistency factor by which
    the one-step reweighted estimate be rescaled. The default is to rescale the
    estimate so that it provides a consistent estimate of the mean and
    covariance matrix for normally distributed data. The default consistency
    factor is computed as (W/N)/chi2(p+2, invchi2(p, W/N)), where p
    is the number of variables, N is the sample size, and W is the number of
    observations on which the reweighted estimate is based
    (see option {cmd:noreweight} above).

{phang}
    {opt nosmall} specifies that the additional small sample correction suggested
    by Pison et al. (2002) be omitted.
    {p_end}

{dlgtab:Algorithm}

{phang}
    {opt nsamp(#)} specifies the number of trial candidates to be evaluated in the
    search algorithm for the best H-subset. The default is {cmd:nsamp(500)}.

{phang}
    {opt csteps(#)} sets the number of concentration steps (C-steps) applied when
    evaluating the trial candidates. The default is {cmd:csteps(2)}.

{phang}
    {opt nkeep(#)} sets the number of best trial candidates kept for final
    refinement. The default is {cmd:nkeep(10)}.

{phang}
    {opt nsub(#)} specifies the subsample size used by the search algorithm in
    case of a large sample size N. The default is max(p*50, 300), where p is
    the number of variables. If N >= 2*{cmd:nsub()} the algorithm splits the
    sample into subsamples for the enumeration of the trial candidates. Up to
    {cmd:ksub()} subsamples are constructed (see below). For example, if
    {cmd:nsub()} is set to 300, {cmd:ksub()} is set to 5, and N = 10000, then 5
    subsamples of size 300 are drawn (without replacement) from the 10000
    observations and in each of these subsample {cmd:nsamp()}/5 trial
    candidates are enumerated. From each subsample the {cmd:nkeep()} best
    candidates are kept for further evaluation. The 5 subsamples are then
    merged together to a subsample of size 5*300 = 1500 observations and the
    5*{cmd:nkeep()} candidates are evaluated. The {cmd:nkeep()} best candidates
    from the merged sample are then refined util convergence in the full sample
    to identify the best solution. If N < 2*{cmd:nsub()}, no subsampling is
    applied, that is, all {cmd:nsamp()} candidates are enumerated in the full
    sample. If 2*{cmd:nsub()} <= N <= {cmd:ksub()}*{cmd:nsub()}, the data is
    split into as many subsamples as possible using a minimum subsample size of
    {cmd:nsub()} observations. For example, if N = 800 and {cmd:nsub()} is set
    to 300, then the data is split into two subsamples with 400 observations
    and in each of the subsample {cmd:nsamp()}/2 trial candidates are
    enumerated. See Rousseeuw and Van Driessen (1999) for a more detailed
    description of the algorithm.

{pmore}
    Specify {cmd:nsub(.)} to omit subsampling and evaluate all {cmd:nsamp()}
    trial candidates in the full sample irrespective of the sample size.

{phang}
    {opt ksub(#)} sets the maximum number of subsamples used by the
    large-N algorithm; see the {cmd:nsub()} option above. The default is
    {cmd:ksub(5)}. {cmd:ksub()} must be equal to 2 or larger.

{phang}
    {opt tolerance(#)} sets the tolerance for the final refinement of the best
    candidates. When the relative change in the determinant from one iteration
    to the next is less than or equal to {cmd:tolerance()}, convergence is
    achieved. The default is {cmd:tolerance(1e-10)}.

{phang}
    {opt iterate(#)} specifies the maximum number of iterations for the final
    refinement of the best candidates. If convergence is not reached within
    {cmd:iterate()} iterations, the algorithm stops and returns error. The
    default is as set by {helpb set maxiter}.

{phang}
    {opt relax} causes the algorithm to return the current results
    instead of returning error if convergence is not reached within
    {cmd:iterate()} iterations. Use this option together with
    {cmd:iterate()} if you want to restrict the number of C-steps in the
    final refinement.

{phang}
    {opt noee} specifies that enumeration of random trials is used even if
    exact enumeration of all possible candidates would be feasible. The
    algorithms uses exact enumeration of all possible H-subsets if comb(N, h)
    <= {cmd:nsamp()} and uses exact enumeration of all possible (p+1)-subsets
    if comb(N, p+1) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random
    (p+1)-subsets are enumerated (while expanding singular subsets until
    nonsingular or until reaching size h). Given the low default value of
    {cmd:nsamp()}, exact enumeration will only be used in very small samples.
    Set {cmd:nsamp()} to comb(N, h) or comb(N, p+1), to enforce exact
    enumeration of H-subsets or (p+1)-subsets, respectively.

{phang}
    {opt nounivar} specifies that the standard search algorithm is used even if
    only one variable is analyzed. The default is to use an exact enumeration
    algorithm for univariate data in this case.


{marker sd_options}{...}
{title:Additional options for robmv sd}

{dlgtab:Main}

{phang}
    {opt huber} requests that a Huber-type function is used to down-weight
    the outliers when computing the location and covariance estimate. By default,
    a rectangular function is used, which is equivalent to excluding the outliers.

{phang}
    {opt alpha(#)} sets the expected percentage of observations that will be classified
    as outliers under normal conditions. The default is {cmd:alpha(2.5)}. {cmd:alpha()}
    has no effect if {cmd:cutoff()} is specified.

{phang}
    {opt asymmetric}[{cmd:(}{it:#}{cmd:)}] computes generalized SD distances
    and determines the cutoff point for outlier identification based on
    Tukey's g-and-h distribution (employing command {helpb robbox}) as suggested by
    Verardi and Vermandele (2016). {it:#}
    sets the breakdown point, in percent, that is used when fitting the
    g-and-h distribution; the default is {cmd:10}. {it:#} has no effect if
    {cmd:cutoff()} is specified.

{phang}
    {opt cutoff(#)} specifies a custom cutoff value for outlier
    identification. By default, if {cmd:asymmetric} is omitted, the cutoff value
    is set to sqrt(invchi2({it:p}, 1 - {cmd:alpha()}/100)), where {it:p} is the
    number of variables. If {cmd:asymmetric} is specified, the default is to
    determine the cutoff value corresponding to {cmd:alpha()} based on
    Tukey's g-and-h distribution.

{phang}
    {opt nofit} omits the computation of the location and covariance estimate. Use
    this option if you are only interested in the Stahel-Donoho distances, but not
    in the location and covariance estimate. The Stahel-Donoho distances can be
    stored by the {cmd:generate()} option.

{dlgtab:Generate}

{phang}
    {cmd:generate(}{it:names}{cmd:)} store a variable containing the
    SD distances, an outlier indicator, and a variable containing the weights
    used to compute the location and covariance estimate. {it:names} may contain
    one to three names, depending on whether you only want to store the distances,
    the distances and the outlier indicators, or the distances, outlier indicator,
    and weights.

{phang}
    {opt replace} allows {cmd:generate()} to overwrite existing variables.

{dlgtab:Algorithm}

{phang}
    {opt nsamp(#)} specifies the number of candidates (p-subsets) to be evaluated in the
    search algorithm. The default is {cmd:nsamp(500)}.

{phang}
    {opt nsmax(#)} specifies the maximum number of candidates that are allowed to
    be infeasible (within each single search for a trial candidate) before aborting
    with error. The default is max(1000,{cmd:nsamp()}).

{phang}
    {opt expand} expands infeasible candidates by adding observations to the subset
    until the candidate becomes feasible. Specifying {cmd:expand} is not recommended.

{phang}
    {opt noee} specifies that enumeration of random subsets is used even if
    exact enumeration of all possible subsets would be feasible. The
    algorithms uses exact enumeration of all possible subsets if comb(N,
    p) <= {cmd:nsamp()}. Otherwise, {cmd:nsamp()} random subsets are
    enumerated.

{phang}
    {opt nostd} omits standardization of the data for the enumeration
    algorithm. Specifying {cmd:nostd} is not recommended.

{phang}
    {cmd:controls(}{it:varlist}[{cmd:,} {it:options}]{cmd:)} partials out the
    effects of {it:varlist} from the SD distances within each projection;
    {it:varlist} may contain factor variables; see {help fvvarlist}.
    {it:options} determine whether to include the controls in the final
    location and covariance estimate and set the details of the Huber M
    estimator used to partial out the effects of the controls; the options are
    as follows:

{phang2}
    {cmd:include} includes the variables specified in {cmd:controls()} in the
    final location and covariance estimate. The default is to include
    only the main variables.

{phang2}
    {opt eff:iciency(#)} sets the gaussian efficiency of the M estimator, in percent. {it:#}
    must be within [63.7,99.9]. The default is 100 - {cmd:alpha()}.

{phang2}
    {opt k(#)} sets the tuning constant of the M estimator. Only one of
    {opt efficiency()} and {cmd:k()} is allowed.

{phang2}
    {opt tol:erance(#)} set the tolerance of the M estimator. The default is
    {cmd:tolerance(1e-10)}.

{phang2}
    {opt iter:ate(#)} set the maximum number of iterations of the M
    estimator. The default is as set by {helpb set maxiter}


{marker predict_options}{...}
{title:Options for predict}

{dlgtab:Main}

{phang}
    {opt distance} generates a variable containing robust distances. This is
    the default.

{phang}
    {opt rd} is a synonym for {cmd:distance}.

{phang}
    {opt outlier}[{cmd:(}{it:#}{cmd:)}] generates a 0/1 variable identifying
    outliers (1 = outlier, 0 = inlier). Optional argument {it:#} specifies the
    percentage of observations classified as outliers in normal data. That is,
    observations with squared distances greater than or equal to
    invchi2(p, 1-{it:#}/100), were p is the number of variables, are classified
    as outliers. Argument {it:#} must be in [0,50]; the default is 2.5.

{phang}
    {opt inlier}[{cmd:(}{it:#}{cmd:)}] generates a 0/1 variable identifying
    inliers (1 = inlier, 0 = outlier). Optional argument {it:#} specifies the
    percentage of observations classified as inliers in normal data. That is,
    observations with squared distances smaller than invchi2(p, {it:#}/100),
    were p is the number of variables, are classified as inliers; all other
    observations are classified as outliers. Argument {it:#} must be in
    [50,100]; default is 97.5.

{dlgtab:Additional M options}

{phang}
    {opt weights} generates a variable containing the W1 weights of the
    M fit.

{dlgtab:Additional MVE/MCD options}

{phang}
    {opt subset} generates a 0/1 variable identifying the best H-subset in the
    estimation sample. Observations outside {cmd:e(sample)}
    will be set to missing.

{phang}
    {opt noreweight} specifies that the raw MVE/MCD estimate be used for
    determining robust distances, ouliers, and inliers. If available (that is,
    unless option {cmd:noreweight} was specified during estimation), the
    default is to base computations on the one-step reweighted estimate.

{phang}
    {opt noscale} specifies that the unscaled raw MVE/MCD estimate be used for
    determining robust distances, ouliers, and inliers. The unscaled
    raw MVE/MCD estimate is equal to the raw MVE/MCD estimate before applying
    consistency or small-sample correction factors.


{title:Examples}

        . {stata sysuse auto}
        . {stata robmv classic price mpg weight length}
        . {stata robmv m price mpg weight length}
        . {stata robmv s price mpg weight length}
        . {stata robmv mm price mpg weight length}
        . {stata robmv mve price mpg weight length}
        . {stata robmv mcd price mpg weight length}
        . {stata robmv sd price mpg weight length}


{title:Saved results}

{pstd}
{cmd:robmv} stores the following in {cmd:e()}:

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(N)}}number of observations{p_end}
{synopt:{cmd:e(nvars)}}number of variables included in the location and covariance estimate{p_end}
{synopt:{cmd:e(rnk)}}rank of covariance matrix{p_end}
{synopt:{cmd:e(N_clust)}}number of clusters (only if {cmd:vce(cluster)} is specified){p_end}
{synopt:{cmd:e(df_r)}}sample degrees of freedom (only if {cmd:e(V)} is stored){p_end}
{synopt:{cmd:e(rank)}}rank of {cmd:e(V)}{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Macros}{p_end}
{synopt:{cmd:e(cmdline)}}command as typed{p_end}
{synopt:{cmd:e(cmd)}}{cmd:robmv}{p_end}
{synopt:{cmd:e(subcmd)}}name of subcommand{p_end}
{synopt:{cmd:e(predict)}}{cmd:robmv_p}{p_end}
{synopt:{cmd:e(depvar)}}{cmd:Cov} or {cmd:Corr}{p_end}
{synopt:{cmd:e(valist)}}names of variables included in the location and covariance estimate{p_end}
{synopt:{cmd:e(valist0)}}names of variables including base levels and omitted terms{p_end}
{synopt:{cmd:e(correlation)}}{cmd:correlation} or empty{p_end}
{synopt:{cmd:e(wtype)}}weight type{p_end}
{synopt:{cmd:e(wexp)}}weight expression{p_end}
{synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}{p_end}
{synopt:{cmd:e(vcetype)}}title used to label Std. Err.{p_end}
{synopt:{cmd:e(clustvar)}}name of cluster variable{p_end}
{synopt:{cmd:e(title)}}title in estimation output{p_end}
{synopt:{cmd:e(properties)}}{cmd:b V} or {cmd:b}{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Matrices}{p_end}
{synopt:{cmd:e(b)}}estimates{p_end}
{synopt:{cmd:e(V)}}sampling variance of estimates (only if supported){p_end}
{synopt:{cmd:e(mu)}}location estimates{p_end}
{synopt:{cmd:e(Cov)}}covariance estimates{p_end}
{synopt:{cmd:e(Corr)}}correlation estimates{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Functions}{p_end}
{synopt:{cmd:e(sample)}}marks estimation sample{p_end}
{p2colreset}{...}

{pstd}
{cmd:robmv m} additionally stores the following in {cmd:e()}:

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(bp)}}limiting value of breakdown point{p_end}
{synopt:{cmd:e(ptrim)}}winsorizing percentage{p_end}
{synopt:{cmd:e(k)}}tuning constant of the Huber objective function{p_end}
{synopt:{cmd:e(c)}}consistency correction factor{p_end}
{synopt:{cmd:e(tolerance)}}tolerance for reweighting algorithm{p_end}
{synopt:{cmd:e(iterate)}}maximum number of iterations for reweighting algorithm{p_end}
{synopt:{cmd:e(niter)}}executed number of iterations{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Macros}{p_end}
{synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end}

{pstd}
{cmd:robmv s} and {cmd:robmv mm} additionally store the following in {cmd:e()}:

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(bp)}}breakdown point{p_end}
{synopt:{cmd:e(k)}}tuning constant{p_end}
{synopt:{cmd:e(delta)}}normal consistency parameter{p_end}
{synopt:{cmd:e(nsamp)}}number of trial candidates{p_end}
{synopt:{cmd:e(csteps)}}number of C-steps for trial candidates{p_end}
{synopt:{cmd:e(nkeep)}}number of best candidates for final refinement{p_end}
{synopt:{cmd:e(tolerance)}}tolerance for refinements{p_end}
{synopt:{cmd:e(iterate)}}maximum number of iterations for refinements{p_end}
{synopt:{cmd:e(scale)}}scale estimate{p_end}
{synopt:{cmd:e(efficiency)}}efficiency, in percent ({cmd:robreg mm} only){p_end}
{synopt:{cmd:e(k_m)}}tuning constant of M step ({cmd:robreg mm} only){p_end}
{synopt:{cmd:e(niter)}}executed number of M step iterations ({cmd:robreg mm} only){p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Macros}{p_end}
{synopt:{cmd:e(whilferty)}}{cmd:whilferty} or empty{p_end}
{synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end}
{synopt:{cmd:e(method)}}{cmd:random} or {cmd:exact}{p_end}
{synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end}
{synopt:{cmd:e(efftype)}}{cmd:shape} or {cmd:location} ({cmd:robreg mm} only){p_end}

{pstd}
{cmd:robmv mve} additionally stores the following in {cmd:e()}:

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(h)}}size of H-subset{p_end}
{synopt:{cmd:e(bp)}}requested breakdown point{p_end}
{synopt:{cmd:e(calpha)}}consistency factor for raw MVE estimate{p_end}
{synopt:{cmd:e(cdelta)}}consistency factor for reweighted estimate (unless {cmd:noreweight} was specified){p_end}
{synopt:{cmd:e(nsamp)}}number of trial candidates{p_end}
{synopt:{cmd:e(nhyper)}}number of observations on hyperplane if H-subset is collinear; {cmd:0} else {p_end}
{synopt:{cmd:e(MVE)}}(normalized) scale of initial MVE estimate{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Macros}{p_end}
{synopt:{cmd:e(noreweight)}}{cmd:noreweight} or empty{p_end}
{synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end}
{synopt:{cmd:e(method)}}{cmd:classical}, {cmd:random}, or {cmd:exact}{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Matrices}{p_end}
{synopt:{cmd:e(mu0)}}unscaled raw MVE location estimate{p_end}
{synopt:{cmd:e(Cov0)}}unscaled raw MVE covariance estimate{p_end}
{synopt:{cmd:e(Corr0)}}unscaled raw MVE correlation estimate{p_end}
{synopt:{cmd:e(gamma)}}coefficients of hyperplane equation (if H-subset is collinear){p_end}

{pstd}
{cmd:robmv mcd} additionally stores the following in {cmd:e()}:

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(h)}}size of H-subset{p_end}
{synopt:{cmd:e(bp)}}requested breakdown point{p_end}
{synopt:{cmd:e(calpha)}}consistency factor for raw MCD estimate{p_end}
{synopt:{cmd:e(salpha)}}small sample correction factor for raw MCD estimate {p_end}
{synopt:{cmd:e(cdelta)}}consistency factor for reweighted estimate (unless {cmd:noreweight} was specified){p_end}
{synopt:{cmd:e(sdelta)}}small sample correction factor for reweighted estimate (unless {cmd:noreweight} was specified){p_end}
{synopt:{cmd:e(nsamp)}}number of trial candidates{p_end}
{synopt:{cmd:e(nsub)}}(minimum) size of subsamples for large-N algorithm{p_end}
{synopt:{cmd:e(ksub)}}number of subsamples uses by large-N algorithm; {cmd:0} else{p_end}
{synopt:{cmd:e(nmerged)}}size of merged subsamples{p_end}
{synopt:{cmd:e(csteps)}}(maximum) number of C-steps for trial candidates{p_end}
{synopt:{cmd:e(nkeep)}}number of best candidates for final refinement{p_end}
{synopt:{cmd:e(tolerance)}}tolerance for final refinement{p_end}
{synopt:{cmd:e(iterate)}}maximum number of iterations for final refinement{p_end}
{synopt:{cmd:e(nhyper)}}number of observations on hyperplane if H-subset is collinear; {cmd:0} else {p_end}
{synopt:{cmd:e(MCD)}}determinant of unscaled raw MCD estimate{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Macros}{p_end}
{synopt:{cmd:e(noreweight)}}{cmd:noreweight} or empty{p_end}
{synopt:{cmd:e(nosmall)}}{cmd:nosmall} or empty{p_end}
{synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end}
{synopt:{cmd:e(method)}}{cmd:classical}, {cmd:univar}, {cmd:random}, {cmd:exact-h}, or {cmd:exact-p}{p_end}
{synopt:{cmd:e(nounivar)}}{cmd:nounivar} or empty{p_end}
{synopt:{cmd:e(relax)}}{cmd:relax} or empty{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Matrices}{p_end}
{synopt:{cmd:e(mu0)}}unscaled raw MCD location estimate{p_end}
{synopt:{cmd:e(Cov0)}}unscaled raw MCD covariance estimate{p_end}
{synopt:{cmd:e(Corr0)}}unscaled raw MCD correlation estimate{p_end}
{synopt:{cmd:e(gamma)}}coefficients of hyperplane equation (if H-subset is collinear){p_end}

{pstd}
{cmd:robmv sd} additionally stores the following in {cmd:e()}:

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Scalars}{p_end}
{synopt:{cmd:e(nsamp)}}number of trial candidates{p_end}
{synopt:{cmd:e(nmax)}}setting of {cmd:nmax()}{p_end}
{synopt:{cmd:e(nskip)}}number of discarded candidates{p_end}
{synopt:{cmd:e(alpha)}}outlier percentage under normality{p_end}
{synopt:{cmd:e(cutoff)}}cutoff value for outlier identification{p_end}
{synopt:{cmd:e(Nout)}}number of observations classified as outliers{p_end}

{synoptset 20 tabbed}{...}
{p2col 7 20 24 2: Macros}{p_end}
{synopt:{cmd:e(method)}}{cmd:uniform}, {cmd:random} or {cmd:exact}{p_end}
{synopt:{cmd:e(asymmetric)}}{cmd:asymmetric} or empty{p_end}
{synopt:{cmd:e(xvars)}}names of main variables{p_end}
{synopt:{cmd:e(controls)}}names of control variables{p_end}
{synopt:{cmd:e(include)}}{cmd:include} or empty{p_end}
{synopt:{cmd:e(noee)}}{cmd:noee} or empty{p_end}
{synopt:{cmd:e(expand)}}{cmd:expand} or empty{p_end}
{synopt:{cmd:e(nostd)}}{cmd:nostd} or empty{p_end}
{synopt:{cmd:e(wftype)}}{cmd:huber} or {cmd:rectangle}{p_end}
{synopt:{cmd:e(nofit)}}{cmd:nofit} or empty{p_end}
{synopt:{cmd:e(generate)}}names of generated variables{p_end}

{pstd}
    If {cmd:robmv sd} is specified with option {cmd:nofit}, only a reduced set
    of results is stored.

{pstd}
    If the {cmd:svy} option is specified, various additional results as described in
    help {helpb svy} are stored.


{title:References}

{phang}
    Croux, C., G. Haesbroeck (1999). Influence Function
    and Efficiency of the Minimum Covariance Determinant Scatter Matrix
    Estimator. Journal of Multivariate Analysis 71: 161 190.

{phang}
    Hubert, M., P.J. Rousseeuw, D. Vanpaemel, T. Verdonck (2013). A deterministic algorithm
    for S-estimators and MM-estimators of multivariate location and scatter. KU Leuven. Available
    from {browse "http://wis.kuleuven.be/stat/robust/papers/2013/dets-technicalreport.pdf"}.

{phang}
    Lopuha{c a:}, H. P. (1989). On the Relation Between S-Estimators and
    M-Estimators of Multivariate Location and Covariance. The Annals of
    Statistics 17: 1662-1683.

{phang}
    Maronna, R.A., D.R. Martin, V.J. Yohai (2006). Robust Statistics.
    Theory and Methods. Chichester: John Wiley & Sons.

{phang}
    Maronna, R.A., V.J. Yohai (1995). The Behavior of the Stahel-Donoho Robust
    Multivariate Estimator. Journal of the American Statistical Association 90(429): 330-341.

{phang}
    Pison, G., S. Van Aelst, G. Willems (2002). Small sample corrections for
    LTS and MCD. Metrika 55: 111-123.

{phang}
    Rousseeuw, P.J., K. Van Driessen (1999). A Fast Algorithm for the
    Minimum Covariance Determinant Estimator. Technometrics 41(3): 212-223.

{phang}
    Salibian-Barrera, M., S. Van Aelst, G. Willems (2006). Principal Components Analysis
    Based on Multivariate MM Estimators With Fast and Robust Bootstrap. Journal of
    the American Statistical Association 101(475):1198-1211.

{phang}
    Verardi, V., C. Vermandele (2016). Outlier identification
    for skewed and/or heavy-tailed unimodal multivariate distributions. Journal de
    la Société Française de Statistique 157(2): 90-114.


{title:Authors}

{pstd}
    Ben Jann (University of Bern),
    Vincenzo Verardi (University of Namur and Universite libre de Bruxelles),
    Catherine Vermandele (Universite libre de Bruxelles)

{pstd}
    Support: ben.jann@soz.unibe.ch

{pstd}
    Thanks for citing this software as follows:

{pmore}
    Jann, B., V. Verardi, C. Vermandele (2021). robmv: Stata module for robust
    multivariate estimation of location and covariance. Available from
    {browse "http://ideas.repec.org/c/boc/bocode/s458895.html"}.


{title:Also see}

{psee}
    Online:  help for
    {helpb correlate},
    {helpb robreg},
    {helpb robstat},
    {helpb robbox}