{smcl} {* 10aug2020}{...} {hi:help robstat}{...} {right:{browse "http://github.com/benjann/robstat/"}} {hline} {title:Title} {pstd}{hi:robstat} {hline 2} Robust univariate statistics {title:Syntax} {p 8 15 2} {cmd:robstat} {varlist} {ifin} {weight} [{cmd:,} {help robstat##opts:{it:options}} ] {synoptset 22 tabbed}{...} {marker opts}{col 5}{help robstat##options:{it:options}}{col 29}Description {synoptline} {syntab :Main} {synopt :{cmdab:s:tatistics(}{help robstat##stats:{it:stats}}{cmd:)}}statistics to be computed; default is {cmd:statistics(mean)} {p_end} {synopt :{opth over(varname)}}compute results for subpopulations defined by the values of {it:varname} {p_end} {synopt :{opt t:otal}}include overall results across all subpopulations; only allowed with {cmd:over()} {p_end} {synopt :{opt swap}}swap coefficients and equations {p_end} {syntab :SE/SVY} {synopt :{cmd:vce(}{help robstat##vcetype:{it:vcetype}}{cmd:)}}{it:vcetype} may be {cmd:analytic} (the default), {cmdab:cl:uster} {it:clustvar}, {cmdab:boot:strap} or {cmdab:jack:knife} {p_end} {synopt :{opt cl:uster(clustvar)}}synonym for {cmd:vce(cluster} {it:clustvar}{cmd:)} {p_end} {synopt :{cmd:svy}[{cmd:(}{help robstat##svy:{it:subpop}}{cmd:)}]}take account of survey design as set by {helpb svyset}, optionally restricting computations to {it:subpop} {p_end} {synopt :{opt nose}}suppress computation of standard errors and confidence intervals {p_end} {synopt :{opt gen:erate}[{cmd:(}{it:prefix}{cmd:)}]}store the values of the influence functions {p_end} {synopt :{opt replace}}allow overwriting existing variables {p_end} {syntab :Normality tests} {synopt :{cmdab:jb:test}[{cmd:(}{help robstat##tests:{it:tests}}{cmd:)}]}compute generalized Jarque-Bera tests for normality {p_end} {synopt :{opt wald}}employ Wald tests based on the estimated variance-covariance matrix {p_end} {syntab :Reporting} {synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)} {p_end} {synopt :{opt cil:og}}use log-transformed 
confidence intervals for scale statistics {p_end} {synopt :{opt nohe:ader}}suppress output header {p_end} {synopt :{opt notab:le}}suppress output table {p_end} {synopt :{help robstat##displayopts:{it:display_options}}}standard reporting options as described in {helpb estimation options:[R] estimation options} {p_end} {syntab :Technical options} {synopt :{opt tol:erance(#)}}tolerance for the M estimates; default is {cmd:tolerance(1e-10)} {p_end} {synopt :{opt iter:ate(#)}}maximum number of iterations for the M estimates; default is {cmd:iterate(16000)} or as set by {helpb set maxiter} {p_end} {synopt :{opt k:ernel(kernel)}}type of kernel function for density estimation; default is {cmd:kernel(epan2)} {p_end} {synopt :{opt bw(type)}}type of bandwidth selector for density estimation; default is {cmd:bw(dpi)} {p_end} {synopt :{opt a:daptive(#)}}number of repetitions for adaptive density estimation; default is {cmd:adaptive(2)} {p_end} {synopt :{opt n(#)}}number of evaluation points for density estimation; default is {cmd:n(512)} {p_end} {synoptline} {p 4 6 2} {opt pweight}s, {opt iweight}s, and {opt fweight}s are allowed; see {help weight}. {title:Description} {pstd} {cmd:robstat} estimates various classic and robust measures of location, scale, skewness, and kurtosis, and, optionally, performs robust tests for normality. For methodological details see Jann, Verardi and Vermandele (forthcoming). {pstd} {cmd:robstat} without {it:varlist} replays the previous results. Reporting options may be applied. {pstd} {cmd:robstat} requires {cmd:moremata}. See {net "describe moremata, from(http://fmwww.bc.edu/repec/bocode/m/)":ssc describe moremata}. {marker options}{...} {title:Options} {dlgtab:Main} {marker stats}{...} {phang} {opt statistics(stats)} specifies the statistics to be computed. {cmd:statistics()} is not allowed if the {cmd:jbtest} option is specified (see below). 
{it:stats} is a space-separated list of the following statistics (we use upper case letters in some of the statistics for clarity, but note that {it:stats} is not case sensitive). Default is {cmd:statistics(mean)}. {p2colset 13 25 27 2}{...} {p2col 11 25 27 2:Location}{p_end} {p2col :{opt m:ean}}arithmetic mean {p_end} {p2col :{cmdab:a:lpha}[{it:#}]}alpha-trimmed mean, where {it:#} specifies the trimming percentage; default is {cmd:alpha5} {p_end} {p2col :{opt med:ian}}median {p_end} {p2col :{opt HL}}Hodges-Lehmann estimator (Hodges and Lehmann 1963) {p_end} {p2col :{cmdab:H:uber}[{it:#}]}Huber M estimate of location, where {it:#} specifies the desired Gaussian efficiency; default is {cmd:Huber95} {p_end} {p2col :{cmdab:bi:weight}[{it:#}]}biweight M estimate of location, where {it:#} specifies the desired Gaussian efficiency; default is {cmd:biweight95} {p_end} {p2col 11 25 27 2:Scale}{p_end} {p2col :{opt sd}}standard deviation {p_end} {p2col :{opt IQR}}interquartile range {p_end} {p2col :{opt IQRc}}rescaled interquartile range {p_end} {p2col :{opt MAD}}median absolute deviation {p_end} {p2col :{opt MADN}}rescaled median absolute deviation {p_end} {p2col :{opt Q:n}}Qn coefficient (Rousseeuw and Croux 1993) {p_end} {p2col :{cmd:S}[{it:#}]}M estimate of scale, where {it:#} specifies the desired breakdown point; default is {cmd:S50} {p_end} {p2col 11 25 27 2:Skewness}{p_end} {p2col :{opt ske:wness}}classic skewness measure (Fisher coefficient) {p_end} {p2col :{cmd:SK}[{it:#}]}Hinkley (1975) skewness measure, where {it:#} specifies the desired percentage; default is {cmd:SK25}, which is equal to the Yule and Kendall (1968) skewness measure {p_end} {p2col :{opt MC}}medcouple (Brys et al. 
2004) {p_end} {p2col 11 25 27 2:Kurtosis}{p_end} {p2col :{opt k:urtosis}}classic kurtosis measure {p_end} {p2col :{cmd:QW}[{it:#}]}quantile tail weight measure, where {it:#} specifies the desired percentage; default is {cmd:QW25} {p_end} {p2col :{cmd:LQW}[{it:#}]}left quantile tail weight measure, where {it:#} specifies the desired percentage; default is {cmd:LQW25} {p_end} {p2col :{cmd:RQW}[{it:#}]}right quantile tail weight measure, where {it:#} specifies the desired percentage; default is {cmd:RQW25} {p_end} {p2col :{opt LMC}}left medcouple tail weight measure (Brys et al. 2006) {p_end} {p2col :{opt RMC}}right medcouple tail weight measure (Brys et al. 2006) {p_end} {phang} {opth over(varname)} reports results for each subpopulation defined by the values of {it:varname}. {phang} {opt total} causes additional overall results across all subpopulations to be reported. {cmd:total} is only allowed if {cmd:over()} is specified. {phang} {opt swap} affects the layout of the results. Depending on specified options, {cmd:robstat} groups results into several "equations". Specify {cmd:swap} in such a case if you want to flip equations and coefficients. {dlgtab:SE/SVY} {marker vcetype}{...} {phang} {opth vce(vcetype)} determines how standard errors and confidence intervals are computed. {it:vcetype} may be: {cmd:analytic} {cmd:cluster} {it:clustvar} {cmd:bootstrap} [{cmd:,} {help bootstrap:{it:bootstrap_options}}] {cmd:jackknife} [{cmd:,} {help jackknife:{it:jackknife_options}}] {pmore} The default is {cmd:vce(analytic)}, using approximate formulas for variance estimation (based on influence functions) assuming independent data. For clustered data, specify {cmd:vce(cluster} {it:clustvar}{cmd:)}, where {it:clustvar} is the variable identifying the clusters. For bootstrap and jackknife estimation, see help {it:{help vce_option}}. Variance estimation is not supported if {cmd:iweights} or {cmd:fweights} are specified. 
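{pmore}
As a hedged illustration of the {it:vcetype} choices described above (the dataset, the clustering variable, and the bootstrap settings are assumptions chosen only for demonstration, not recommendations), the following commands request the same statistics under different variance estimators:

{p 12 16 2}. {stata sysuse nlsw88, clear}{p_end}
{p 12 16 2}. {stata robstat wage, statistics(median MADN)}{p_end}
{p 12 16 2}. {stata robstat wage, statistics(median MADN) vce(cluster industry)}{p_end}
{p 12 16 2}. {stata robstat wage, statistics(median MADN) vce(bootstrap, reps(100) seed(12345))}{p_end}
{p 12 16 2}. {stata robstat wage, statistics(median MADN) vce(jackknife)}{p_end}

{pmore}
The point estimates are expected to agree across these calls, apart from observations that may be excluded because the clustering variable is missing; the standard errors and confidence intervals will generally differ. The jackknife call may take noticeably longer than the others.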
{phang} {opt cluster(clustvar)} is a synonym for {cmd:vce(cluster} {it:clustvar}{cmd:)}. {marker svy}{...} {phang} {cmd:svy}[{cmd:(}{it:subpop}{cmd:)}] causes the survey design to be taken into account for variance estimation. The data need to be set up for survey estimation; see help {helpb svyset}. Specify {it:subpop} to restrict survey estimation to a subpopulation, where {it:subpop} is [{varname}] [{it:{help if}}] {pmore} The subpopulation is defined by observations for which {it:varname}!=0 and for which the {cmd:if} condition is met. See help {helpb svy} and {manlink SVY subpopulation estimation} for more information on subpopulation estimation. {pmore} The {cmd:svy} option of {cmd:robstat} only works if the variance estimation method is set to Taylor linearization by {helpb svyset} (the default). For other variance estimation methods you can use the usual {helpb svy} prefix command. For example, you could type {cmd:svy brr: robstat ...} to use BRR variance estimation. {cmd:robstat} does not allow the {helpb svy} prefix for Taylor linearization for technical reasons. This is why the {cmd:svy} option is provided. {phang} {opt nose} suppresses the computation of standard errors and confidence intervals. The {cmd:nose} option may be useful to speed up computations with prefix commands that use replication techniques for variance estimation, such as {helpb svy jackknife}. Options {cmd:vce(bootstrap)} and {cmd:vce(jackknife)} imply {cmd:nose}. {phang} {opt generate}[{cmd:(}{it:prefix}{cmd:)}] stores the values of the influence functions used for the computation of standard errors and confidence intervals. One influence function variable is stored for each statistic per outcome variable in each over-group. The variable names are prefixed by {it:prefix}, where the default prefix is "{cmd:_IF_}". {cmd:generate()} has no effect if {cmd:nose}, {cmd:vce(bootstrap)}, or {cmd:vce(jackknife)} is specified. 
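{pmore}
A minimal sketch of how the stored influence functions might be inspected. The prefix {cmd:myIF_} is a hypothetical name, and a wildcard is used because the exact naming of the generated variables is not described here:

{p 12 16 2}. {stata sysuse auto, clear}{p_end}
{p 12 16 2}. {stata robstat price, statistics(median MADN) generate(myIF_) replace}{p_end}
{p 12 16 2}. {stata describe myIF_*}{p_end}
{p 12 16 2}. {stata mean myIF_*}{p_end}

{pmore}
Because the analytic standard errors are derived from these variables, the standard errors reported by {helpb mean} in the last step should be close to those in the {cmd:robstat} output. Treat this as a plausibility check under the assumptions above rather than an exact identity.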
{phang} {opt replace} allows overwriting existing variables. This is only relevant if {cmd:generate()} has been specified. {dlgtab:Normality tests} {marker tests}{...} {phang} {cmd:jbtest}[{cmd:(}{it:tests}{cmd:)}] computes generalized Jarque-Bera tests for normality as suggested by Brys et al. (2008). The {cmd:statistics()} option (see above) is not allowed if {cmd:jbtest} is specified. {it:tests} is a space-separated list of the following tests. {p2colset 13 32 34 2}{...} {p2col 11 25 27 2:Test{space 6}Synonym}{p_end} {p2col :{opt JB}{space 8}{opt jb:era}}classic Jarque-Bera skewness and kurtosis test {p_end} {p2col :{opt MOORS}{space 5}{opt moors}}robust skewness and tail-weight test based on {cmd:SK25} and {cmd:QW25} {p_end} {p2col :{opt MC-LR}{space 5}{opt mclr}}robust skewness and tail-weight test based on {cmd:MC}, {cmd:LMC} and {cmd:RMC} {p_end} {p2col :{opt MC-L}{space 6}{opt mcl}}robust skewness and left tail-weight test based on {cmd:MC} and {cmd:LMC} {p_end} {p2col :{opt MC-R}{space 6}{opt mcr}}robust skewness and right tail-weight test based on {cmd:MC} and {cmd:RMC} {p_end} {p2col :{opt MC}{space 8}{opt mc}}robust skewness test based on {cmd:MC} {p_end} {p2col :{opt LR}{space 8}{opt lr}}robust tail-weight test based on {cmd:LMC} and {cmd:RMC} {p_end} {p2col :{opt L:MC}{space 7}{opt l:mc}}robust left tail-weight test based on {cmd:LMC} {p_end} {p2col :{opt R:MC}{space 7}{opt r:mc}}robust right tail-weight test based on {cmd:RMC} {p_end} {pmore} Specifying option {cmd:jbtest} without argument is equivalent to {cmd:jbtest(jbera moors mclr)}. Furthermore, {cmd:jbtest(all)} reports all available tests. {phang} {opt wald} specifies that the normality tests are to be based on the estimated variance-covariance matrix of the involved skewness and kurtosis or tail-weight parameters. The default is to base the tests on the theoretical variance for normally distributed data. 
Specifying the {cmd:svy} option, a {it:vcetype} other than {cmd:analytic}, or weights other than {cmd:fweight}s implies {cmd:wald}. {cmd:wald} is not allowed with {cmd:fweight}s. {dlgtab:Reporting} {phang} {opt level(#)} specifies the confidence level, as a percentage, for confidence intervals. The default is {cmd:level(95)} or as set by {helpb set level}. {phang} {opt cilog} causes log-transformed confidence intervals to be used for scale statistics. This ensures that the confidence intervals do not include zero. Log-transformed confidence intervals are computed as exp(ln({it:S}) +/- z * SE/{it:S}) {pmore} where {it:S} is the estimated scale statistic, z is the critical value for the given confidence level, and SE is the standard error of {it:S}. {phang} {opt noheader} suppresses the output header; only the coefficient table is displayed. {phang} {opt notable} suppresses the coefficient table. {marker displayopts}{...} {phang} {it:display_options} are standard reporting options such as {cmd:cformat()}, {cmd:pformat()}, {cmd:sformat()}, or {cmd:coeflegend}. See {helpb estimation options:[R] estimation options}. {dlgtab:Technical options} {phang} {opt tolerance(#)} specifies the tolerance for the M estimates. The default is {cmd:tolerance(1e-10)}. {phang} {opt iterate(#)} specifies the maximum number of iterations for the M estimates. If convergence is not reached within {cmd:iterate()} iterations, the algorithm stops and returns an error. The default is {cmd:iterate(16000)} or as set by {helpb set maxiter}. {phang} {opt kernel(kernel)} specifies the kernel function for density estimation (computation of standard errors involves density estimation for some of the statistics). The default is {cmd:kernel(epan2)}. See help {helpb mf_mm_density:mm_density()} for available kernels. {phang} {opt bw(type)} specifies the type of automatic bandwidth selector for density estimation. The default is {cmd:bw(dpi)}. 
See help {helpb mf_mm_density:mm_density()} for available bandwidth selectors. {phang} {opt adaptive(#)} specifies the number of repetitions for adaptive kernel density estimation. The default is {cmd:adaptive(2)}. Specify {cmd:adaptive(0)} for non-adaptive kernel density estimation. {phang} {opt n(#)} specifies the size of the approximation grid for density estimation. The default is {cmd:n(512)}. {title:Examples} . {stata sysuse auto} . {stata robstat price, statistics(mean median alpha5 alpha25 HL Huber)} . {stata robstat price, jbtest} {title:Stored results} {pstd} {cmd:robstat} stores the following in {cmd:e()}: {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Scalars}{p_end} {synopt:{cmd:e(N)}}number of observations{p_end} {synopt:{cmd:e(N_over)}}number of subpopulations{p_end} {synopt:{cmd:e(N_clust)}}number of clusters{p_end} {synopt:{cmd:e(k_eq)}}number of equations in {cmd:e(b)}{p_end} {synopt:{cmd:e(N_stats)}}number of statistics{p_end} {synopt:{cmd:e(N_vars)}}number of variables{p_end} {synopt:{cmd:e(df_r)}}sample degrees of freedom{p_end} {synopt:{cmd:e(rank)}}rank of {cmd:e(V)}{p_end} {synopt:{cmd:e(level)}}confidence level for CIs{p_end} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Macros}{p_end} {synopt:{cmd:e(cmd)}}{cmd:robstat}{p_end} {synopt:{cmd:e(cmdline)}}command as typed{p_end} {synopt:{cmd:e(title)}}title in estimation output{p_end} {synopt:{cmd:e(depvar)}}variable names{p_end} {synopt:{cmd:e(statistics)}}requested statistics{p_end} {synopt:{cmd:e(over)}}name of {cmd:over()} variable{p_end} {synopt:{cmd:e(over_namelist)}}values from {cmd:over()} variable{p_end} {synopt:{cmd:e(over_labels)}}labels from {cmd:over()} variable{p_end} {synopt:{cmd:e(total)}}{cmd:total} or empty{p_end} {synopt:{cmd:e(wtype)}}weight type{p_end} {synopt:{cmd:e(wexp)}}weight expression{p_end} {synopt:{cmd:e(clustvar)}}name of cluster variable{p_end} {synopt:{cmd:e(vce)}}{it:vcetype} specified in {cmd:vce()}{p_end} {synopt:{cmd:e(vcetype)}}title used to label Std. 
Err.{p_end} {synopt:{cmd:e(jbtitle)}}title in normality test output{p_end} {synopt:{cmd:e(jbtype)}}{cmd:chi2} or {cmd:F} or empty{p_end} {synopt:{cmd:e(jbwald)}}{cmd:wald} or empty{p_end} {synopt:{cmd:e(properties)}}{cmd:b V} or {cmd:b}{p_end} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Matrices}{p_end} {synopt:{cmd:e(b)}}estimates{p_end} {synopt:{cmd:e(V)}}variance-covariance matrix of estimates{p_end} {synopt:{cmd:e(aux)}}tuning constants of M estimators{p_end} {synopt:{cmd:e(class)}}class of statistic (1 location, 2 scale, 3 skewness, 4 kurtosis){p_end} {synopt:{cmd:e(_N)}}numbers of observations in subpopulations{p_end} {synopt:{cmd:e(jbtest)}}normality test results (if requested){p_end} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Functions}{p_end} {synopt:{cmd:e(sample)}}marks estimation sample{p_end} {p2colreset}{...} {pstd} If the {cmd:svy} option is specified, various additional results as described in help {helpb svy} are stored in {cmd:e()}. {title:Methods and Formulas} {pstd} Point estimates for {cmd:mean}, {cmd:alpha}, {cmd:median}, {cmd:sd}, {cmd:IQR}, {cmd:MAD}, {cmd:skewness}, {cmd:SK}, {cmd:kurtosis}, {cmd:QW}, {cmd:LQW}, and {cmd:RQW} are computed using {helpb summarize} and {helpb _pctile}. {pstd} Point estimates for {cmd:Huber}, {cmd:biweight}, and {cmd:S} are computed by the iteratively reweighted least squares (IRWLS) procedure. {pstd} Point estimates for {cmd:HL}, {cmd:Qn}, {cmd:MC}, {cmd:LMC}, and {cmd:RMC} are computed using variants of the fast algorithm proposed by Johnson and Mizoguchi (1978) (also see Croux and Rousseeuw 1992, Brys et al. 2004). {pstd} The normality tests requested by the {cmd:jbtest} option are defined and computed as suggested by Brys et al. (2008). In case of nonstandard VCE or if the {cmd:wald} option is specified, Wald tests based on the estimated variance-covariance matrix are performed using the {helpb test} command. 
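{pstd}
As an illustrative sketch (the dataset and the selection of tests are assumptions made for demonstration), the tests can be requested once with the default theoretical variance and once with the {cmd:wald} option, and the stored results inspected afterwards:

{p 8 12 2}. {stata sysuse auto, clear}{p_end}
{p 8 12 2}. {stata robstat price, jbtest(jbera moors mclr)}{p_end}
{p 8 12 2}. {stata robstat price, jbtest(jbera moors mclr) wald}{p_end}
{p 8 12 2}. {stata matrix list e(jbtest)}{p_end}
{p 8 12 2}. {stata ereturn list}{p_end}

{pstd}
After the second call, {cmd:e(jbwald)} is expected to contain {cmd:wald}, and {cmd:e(jbtest)} holds the test results shown in the output table.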
{pstd} Approximate standard errors and confidence intervals are computed using an influence-function approach. For each statistic a variable containing the values of the influence function evaluated at the values of the outcome variable is computed, while imputing the unknown quantities of the influence function by their empirical counterparts. For some of the influence functions density estimation is required, which is performed using the binned approximation estimator implemented in {helpb mf_mm_density:mm_density()}. The sampling variance of the statistics is then estimated applying the {helpb mean} command to the variables containing the influence-function values. {pstd} For more information on methods and formulas see Jann, Verardi and Vermandele (forthcoming). {title:References} {phang} Brys, G., M. Hubert, A. Struyf (2004). A Robust Measure of Skewness. Journal of Computational and Graphical Statistics 13(4): 996-1017. {p_end} {phang} Brys, G., M. Hubert, A. Struyf (2006). Robust measures of tail weight. Computational Statistics & Data Analysis 50: 733-759. {p_end} {phang} Brys, G., M. Hubert, A. Struyf (2008). Goodness-of-fit tests based on a robust measure of skewness. Computational Statistics 23: 429-442. {p_end} {phang} Croux, C., P. J. Rousseeuw (1992). Time-efficient algorithms for two highly robust estimators of scale. P. 411-428 in: Y. Dodge and J. Whittaker (eds.). Computational Statistics. Heidelberg: Physica-Verlag. {p_end} {phang} Jann, B., V. Verardi, C. Vermandele (forthcoming). Applied Robust Regression in Stata. College Station, Texas: The Stata Press. {p_end} {phang} Johnson, D. B., T. Mizoguchi (1978). Selecting the {it:K}th element in {it:X} + {it:Y} and {it:X}_1 + {it:X}_2 + ... + {it:X}_{it:m}. SIAM Journal on Computing 7(2): 147-153. {p_end} {phang} Hinkley, D. V. (1975). On power transformations to symmetry. Biometrika 62(1): 101-111. {p_end} {phang} Hodges, Jr., J. L., E. L. Lehmann (1963). 
Estimates of location based on rank tests. Annals of Mathematical Statistics 34(2): 598-611. {p_end} {phang} Rousseeuw, P. J., C. Croux (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association 88(424): 1273-1283. {p_end} {phang} Yule, G. U., M. G. Kendall (1968). An Introduction to the Theory of Statistics. 14th ed. London: Griffin. {p_end} {title:Authors} {pstd} Ben Jann (University of Bern), Vincenzo Verardi (University of Namur and Universite libre de Bruxelles), Catherine Vermandele (Universite libre de Bruxelles) {pstd} Support: ben.jann@soz.unibe.ch {pstd} Thanks for citing this software as follows: {pmore} Jann, B., V. Verardi, C. Vermandele (2018). robstat: Stata module to estimate robust univariate statistics. Available from {browse "http://ideas.repec.org/c/boc/bocode/s458524.html"}. {title:Also see} {psee} Online: help for {helpb summarize}, {helpb mean}, {helpb tabstat}, {helpb centile}, {helpb ci}, {helpb pctile} {psee} From the SSC Archive: {stata ssc describe robreg:{bf:robreg}}