{smcl} {* 1.0.3 MLB 14May2013}{...} {* 1.0.2 MLB 27Jan2013}{...} help for {hi:als_norm} {hline} {title:Title} {p2colset 5 17 19 2}{...} {p2col :{hi: asl_norm} {hline 2}}Bootstrap normality tests.{p_end} {p2colreset}{...} {title:Syntax} {p 8 15 2} {cmd:asl_norm} {varname} {ifin} [ {cmd:,} {opt dh:ansen} {opt hz:irkler} {opt mku:rtosis} {opt mskew:ness} {opt jb:test} {opt sk:test} {opt skew:ness} {opt ku:rtosis} {opt sw:ilk} {opt sf:rancia} {opt all} {opt fast} {opt reps(#)} {opt nodots} {opt mcci(#)} {opt sa:ving(filename [, replace double every(#)])} ] {title:Description} {pstd} {cmd:asl_norm} computes bootstrap estimates of the Achieved Significance Levels (ASL) for a number of Gaussianity (normality) tests. The idea behind a boostrap test is to take the logic of a p-value literaly: In order to find the probability of observing data with a test statistic at least as extreme as the one found in the data one can: {pmore} 1) repeatedly draw a new variable from a Gaussian distribution with the mean and standard deviation as observed in the data, {pmore} 2) compute in each replication the test statistic for that simulated variable, and {pmore} 3) compute the ASL as the number of times that that test statistic is as extreme or more than the one found in the real data plus 1 divided by the number of replications plus 1. {pstd} This ASL can be interpreted as a more robust estimate of the p-value and/or as a simulation check of whether the chosen statistic is working well in your sample, i.e. whether the ASL corresponds with the asymptotic p-value. {pstd} A key aspect of ASLs is that they are computed with aid of a Monte Carlo experiment and are thus uncertain. If one were to call {cmd: asl_norm} twice (and not set the {help set_seed:seed}), one will find slightly different ASLs. {cmd:asl_norm} quantifies this uncertainty by displaying a Monte Carlo confidence interval for the ASL. If you were to call {cmd:asl_norm} a 100 times you would in 95 of these calls expect to find an ASL within this interval. If you are unhappy with the width of this interval, than you can decrease it by specifying more replications in the {cmd:reps()}. Typically, I find 19,999 a number that works well for this type of procedure; if the "true" p-value is 0.05 than the ASL will on average be computed based on a 999 replications where the simulated statistic is larger than the observed statistic. {title:An aside} {pstd} The estimate of the ASL used in this program is discussed in Chapter 4 of ({help asl_norm##davison_hinkley:Davison and Hinkley 1997}) and deviates slightly from the estimate discussed in Chapter 16 of ({help asl_norm##efron_tibshirani:Efron and Tibshirani 1993}) in that the latter estimate the ASL using just the proportion of times that that test statistic is more extreme than the one found in the real data. With a large number of replications this difference is not going to matter, but striclty speaking the former is slightly preferable. {pstd} If we find an ASL of {it:a}, than the probability of drawing a dataset with an ASL less than or equal to {it:a} if the null hypothesis is true should itself be {it:a}, and this should be true for all possible values of {it:a}. So the sampling distribution of the ASL if the null hypothesis is true should be a standard uniform distribution. In other words, if the null hypothesis is true all possible values of {it:a} should be equally likely. {pstd} If we compute our ASL using B replications, than the number of possible values for {it:a} is limited to only B+1 values: a value based on 0, 1, 2, ..., or B replications with a test statistic as extreme or more than the one found in the real data. Based on the condition discussed above, each of these possible values should have a probability of 1/(B+1) if the null hypothesis is true. This means that if the null hypothesis is true, the probability of finding 0 replications with a test statistic as extreme or more as the one found in the real data is 1/(B+1). Similarly, the probability of finding 1 or less replications with a test statistic at least as extreme as the one found in the real data is 2/(B+1). Generally, the probability of finding k or less replications with a test statistic as extreme or more as the one found in the real data if the null hypothesis is true is (k+1)/(B+1). {title:Options} {dlgtab:Statistics} {phang} {opt dh:ansen} specifies that thee Doornik-Hansen statistic as computed by {help mvtest_normality} should be used. {phang} {opt hz:irkler} specifies that the Henze-Zirkler statistic as computed by {help mvtest_normality} should be used. {phang} {opt mku:rtosis} specifies that Mardia's kurtosis test statistic as computed by {help mvtest_normality} should be used. {phang} {opt mskew:ness} specifies that Mardia's skewness test statistic as computed by {help mvtest_normality} should be used. {phang} {opt jb:test} specifies that the Jarque-Bera test statistic should be used. {phang} {opt sk:test} specifies that the skewness-kurtosis test statistic as computed by {help sktest} should be used. The default if N > 5000. {phang} {opt skew:ness} specifies that the skewness test statistic as computed by {help sktest} should be used. {phang} {opt ku:rtosis} specifies that the kurtosis test statistic as computed by {help sktest} should be used. {phang} {opt sw:ilk} specifies that the Shapiro-Wilk test statistic as computed by {help swilk} should be used. Only allowed if 4<=N<=2000. The default if N <= 2000. {phang} {opt sf:rancia} specifies that the Shapiro-Francia test statistic as computed by {help sfrancia} should be used. Only allowed if 5<=N<=5000. The default if 2000 < N <= 5000. {phang} {opt all} specifies that all the above statistics (if allowed) should be used. {phang} {opt fast} specifies that all the above statistics (if allowed), except he Henze-Zirkler statistic and Mardia's skewness test statistic, which tend to be slow in moderate to large samples. {dlgtab:Other options} {phang} {opt reps(#)} specifies the number of replications used to compute the ASL. The default is {cmd:reps(1000)}. {phang} {opt nodots} suppresses display of the replication dots. By default, one dot character is displayed for each successful replication. A red "x" is displayed if there was an error while computing one of the test statistics or if one of the test statistics is missing. {phang} {opt mcci(#)} specifies the confidence level, as a percentage, for the Monte Carlo confidence interval. The default is {cmd:mcci(95)} or as set by {helpb set level}. {phang} {opt saving(filename [, replace double every(#)])} creates a Stata data file containing for each replication the test statistics. See: {help prefix_saving_option}. {title:Examples} {pstd} In a small sample most tests have at least some degree of trouble. In this case (74 observations) the Doornik-Hansen test leads to a p-value that is a bit too large. {cmd}{...} sysuse auto, clear asl_norm trunk, dh reps(19999) {txt}{...} {p 4 4 2}({stata "asl_norm_ex 1":click to run}){p_end} {pstd} In larger samples the asymptotic p-values tend to be less problematic {cmd}{...} sysuse nlsw88, clear gen lnwage = ln(wage) asl_norm lnwage if union < ., dh reps(19999) {txt}{...} {p 4 4 2}({stata "asl_norm_ex 2":click to run}){p_end} {pstd} You can use the {cmd:saving()} option to perform a simulation of how well the tests perform for the number of observations in your dataset. This creates a dataset of test statistics on datasets where the null hypothesis is true, so the distribution of these statistics should correspond to the theoretical sampling distribution and the p-values should follow a continuous standard uniform distribution. {cmd}{...} set seed 12345 sysuse auto, clear tempfile res asl_norm trunk, jb sktest reps(1999) saving("`res'", replace) use "`res'" // check sampling distribution of test statistic qenvchi2 sktest, gen(lb ub) df(2) overall reps(5000) tempname qplot qplot sktest jbtest lb ub, trscale(invchi2(2,@)) /// ms(o o none ..) c(. . l l) lc(gs10 ..) /// scheme(s2color) ylab(,angle(horizontal)) /// legend(order( 1 "sktest" /// 2 "Jarque-Bera" /// 3 "95% simultaneaous" /// "Monte Carlo CI")) /// name(`qplot') // test sampling distribution of p-values gen p_sk = chi2tail(2,sktest) label var p_sk "sktest" gen p_jb = chi2tail(2,jbtest) label var p_jb "Jarque-Bera" simpplot p_sk p_jb, overall reps(19999) /// scheme(s2color) ylab(,angle(horizontal)) {txt}{...} {p 4 4 2}({stata "asl_norm_ex 3":click to run}){p_end} {pstd} If you do find a significant difference between the variable and a Gaussian distribution, you will need to follow this up with a graphical inspection of that variable in order to determine the source of the problem. In this case, the problem is in all likelihood that wages were top-coded. {cmd}{...} sysuse nlsw88, clear gen lnwage = ln(wage) asl_norm lnwage qnorm lnwage {txt}{...} {p 4 4 2}({stata "asl_norm_ex 4":click to run}){p_end} {pstd} Better yet, you can combine the testing logic with the graphing logic by adding confidence bands using the {cmd:qenv} package. {cmd}{...} sysuse nlsw88, clear gen lnwage = ln(wage) qenvnormal lnwage, gen(lb ub) overall reps(19999) sum lnwage qplot lnwage lb ub, ms(oh none ..) c(. l l) lc(gs10 ..) /// legend(off) ytitle("ln(wage)") xtitle(Normal quantiles) /// trscale(`r(mean)' + `r(sd)' * invnormal(@)) {txt}{...} {p 4 4 2}({stata "asl_norm_ex 5":click to run}){p_end} {title:Author} {p 4 4} Maarten L. Buis{break} Wissenschaftszentrum Berlin für Sozialforschung, WZB{break} Research unit Skill Formation and Labor Markets {break} maarten.buis@wzb.eu {p_end} {title:References} {marker davison_hinkley}{...} {phang} A.C. Davison & D.V. Hinkley (1997) {it:Bootstrap Methods and their Application}. Cambridge: Cambridge University Press. {marker efron_tibshirani}{...} {phang} B. Efron & R.J. Tibshirani (1993) {it:An Introduction to the Bootstrap}. Boca Raton: Chapman & Hall/CRC. {title:Also see} {psee} Online: {helpb qnorm}, {helpb pnorm}, {helpb sktest}, {helpb mvtest normality}, {helpb swilk} {psee} If installed: {helpb hangroot}, {helpb qenvnormal}, {helpb qplot} {p_end}