{smcl}
{* *! version 1.5.0 25Mar2013}{...}
{* *! version 1.2.1 02Aug2012}{...}
{* *! version 1.1.0 19Jun2012}{...}
{* *! version 1.0.0 18Jun2012}{...}
{cmd:help simpplot}
{hline}

{title:Title}

{p2colset 5 17 19 2}{...}
{p2col :{hi:simpplot} {hline 2}}Plot describing p-values from a simulation by comparing nominal significance levels with the coverages{p_end}
{p2colreset}{...}


{title:Syntax}

{p 8 12 2}
{cmd:simpplot} {varlist} {ifin}
[{cmd:,}
{it:{help simpplot##options:options}} ]

{marker options}{...}
    {it:options}{col 33}description
    {hline 67}
    {cmd:main#opt(}{it:{help twoway_scatter:graph_opts}}{cmd:)}{...}
{col 33}options governing the look of the #th variable
    {cmd:ra(}{it:off} | {it:{help twoway_rarea:graph_opts}}{cmd:)}{...}
{col 33}options governing the look of the Monte
{col 33}Carlo region of acceptance
    {opt l:evel(#)}{...}
{col 33}set {help level:confidence level} for the Monte Carlo
{col 33}region of acceptance; default is {cmd:level(95)}
    {cmd:overall}{...}
{col 33}displays a region of acceptance with an
{col 33}overall error rate of 100 - {cmd:level}. The
{col 33}default is to display a region of acceptance
{col 33}with a point-wise error rate of 100 - {cmd:level}
    {opt reps(#)}{...}
{col 33}number of replications used for computing
{col 33}the overall error rate. The default is 1000.
    {cmd:ref0(}{it:off} | {it:{help twoway_line:graph_opts}}{cmd:)}{...}
{col 33}options governing the look of a reference
{col 33}line
    {opt nodev:iations}{...}
{col 33}displays the observed coverage against
{col 33}nominal significance levels; default is to
{col 33}display the deviations from the nominal
{col 33}significance level against the nominal
{col 33}significance level
    {cmd:by(}{it:{varlist}} [{cmd:,} {it:{help by_option:byopts}}]{cmd:)}{...}
{col 33}option for repeating simpplot command
    {opt gen:erate(newvars)}{...}
{col 33}specifies that the deviations or observed
{col 33}coverages and optionally the x-coordinate,
{col 33}upper and lower bound of the Monte Carlo
{col 33}region are to be stored
    {opt addplot(plot)}{...}
{col 33}add other plots to the graph
    {help twoway_scatter:other graph options}
    {hline 67}


{title:Description}

{pstd}
{cmd:simpplot} describes the results of a simulation that inspects the
coverage of a statistical test. If a null hypothesis is true, then a
statistical test at a 5% significance level should reject that null
hypothesis in 5% of the replications. In a simulation we can make sure that
the null hypothesis is true in the "population", draw many times from that
"population", perform the test in each of these "samples", and count how many
times the null hypothesis is rejected. The proportion of replications in which
the hypothesis was rejected is our estimate of the coverage of the test, and
this coverage should be close to the nominal significance level. There is
nothing special about the 5% significance level other than that it is the
conventional level in many disciplines. We could just as easily look at the
coverage of tests at a 1%, 2%, 10%, or any other significance level.
{cmd:simpplot} displays by default the deviations from the nominal
significance level against the entire range of possible nominal significance
levels.
It also displays the range (the Monte Carlo region of acceptance) within which
we can reasonably expect these deviations to remain if the test is well
behaved.

{pstd}
If a test is well behaved, then the p-values of that test of a true null
hypothesis should follow a standard uniform distribution. That way 1% of the
replications will have a p-value less than 0.01, 5% of the replications will
have a p-value less than 0.05, 10% will have a p-value less than 0.10, etc. So
whichever significance level we choose, we will on average reject the null
hypothesis the right number of times. One could see the graph produced by
{cmd:simpplot} as a graphical test of whether the distribution of p-values
follows a standard uniform distribution.

{pstd}
Even if the test we are checking with our simulation works perfectly, we would
still expect slight deviations in the distribution of the p-values from a
standard uniform distribution because of the randomness in a Monte Carlo
experiment. To quantify the amount of this uncertainty, {cmd:simpplot} by
default also displays a point-wise Monte Carlo region of acceptance. For each
nominal significance level, this gives the range in which we would expect most
simulations (by default 95%) to fall if we were repeating our simulation
experiment many times and we were simulating a well-behaved test. If we did a
simulation with {it:n} replications, then for a given nominal significance
level {it:a} we would expect the number of rejections to follow a
binomial({it:n},{it:a}) distribution. This is what is used to compute the end
points of the point-wise region of acceptance.

{pstd}
A 95% point-wise region of acceptance means that, at any single nominal
significance level, 5% of repeated Monte Carlo experiments are expected to
fall outside the point-wise region of acceptance. So the chance that an
entire curve lies within the point-wise region of acceptance at every nominal
significance level is a lot less than the nominal 95%.
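
{pstd}
As a rough illustration of the size of the point-wise region, the sketch below
approximates its half-width at a single nominal significance level. It is not
the computation {cmd:simpplot} performs, which is based on the binomial
distribution as described above. The sketch assumes a normal approximation to
the binomial({it:n},{it:a}) distribution, and the values {it:n} = 5000 and
{it:a} = 0.05 and the local macro names are chosen for illustration only.

{cmd}
    local n = 5000    // assumed number of replications
    local a = 0.05    // assumed nominal significance level

    // normal approximation to the binomial (an assumption of this sketch)
    local hw = invnormal(0.975)*sqrt(`a'*(1 - `a')/`n')
    display "approximate 95% half-width at a = 0.05: " %6.4f `hw'

    // check against the binomial: the lower tail should be roughly 0.025
    display binomial(`n', floor(`n'*(`a' - `hw')), `a')
{txt}

{pstd}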
{cmd:simpplot} can also display an approximate overall region of acceptance,
that is, the region within which we expect 95% of the curves to remain if we
were repeating the Monte Carlo experiment many times. This region is computed
using an algorithm discussed by Davison and Hinkley (1997, Chapter 4):

{pmore}
1. Define a grid of nominal significance levels consisting of 300 equally
spaced values between .001 and .999, i.e. .001, .0043, .0077, ... .992, .996,
.999. Take {cmd:reps} samples of the same size from a standard uniform
distribution. For each sample from the uniform distribution and each value on
the grid, compute the proportion of that sample that is less than or equal to
that grid value.

{pmore}
2. Order each sample from smallest to largest.

{pmore}
3. Set L to ceil((100 - {cmd:level})/200 * {cmd:reps}).

{pmore}
4. For each sample, create an envelope using the remaining samples by storing
for each grid value the Lth and ({cmd:reps} - L)th smallest value, and
determine whether the entire sample falls within this envelope. The proportion
of samples for which this is not true is an estimate of the overall error rate
of that envelope.

{pmore}
5. Decrease L until the overall error rate is less than 100 - {cmd:level}. If
the nominal level has not been reached when L = 1, then the entire range is
returned and a warning is displayed reporting the approximate overall error
rate for that envelope.

{pmore}
6. For each value on the grid, calculate reference intervals using the Lth and
(1-L)th sample value.


{title:Options}

{phang}
{opt main#opt(graph_opts)} options governing the look of the #th variable
specified in {it:varlist}. The relevant options are listed in
{help twoway_scatter}.

{phang}
{opt ra(off | graph_opts)} options governing the look of the Monte Carlo
region of acceptance. One can suppress the display (and computation) of the
Monte Carlo region of acceptance by specifying {cmd:ra(off)}. Alternatively,
one can specify options that change the look of the region of acceptance.
The relevant options are listed in {help twoway_rarea}.

{phang}
{opt level(#)} set {help level:confidence level} for the Monte Carlo region of
acceptance; default is {cmd:level(95)}.

{phang}
{opt overall} specifies that the approximate overall Monte Carlo region of
acceptance is displayed instead of the default point-wise Monte Carlo region
of acceptance.

{phang}
{opt reps(#)} specifies the number of samples used to compute the overall
Monte Carlo region of acceptance. The default is {opt reps(1000)}, which is
often not enough. The {opt overall} option needs to be specified when
specifying the {opt reps(#)} option.

{phang}
{opt ref0(off | graph_opts)} options governing the look of a reference line.
By default this is a horizontal line at 0. If the {cmd:nodeviations} option is
specified, then the reference line is a diagonal line from 0,0 to 1,1. One can
suppress the reference line by specifying {cmd:ref0(off)}. Alternatively, one
can specify options that change the look of the reference line. The relevant
options are listed in {help twoway_line}.

{phang}
{opt nodeviations} displays the observed coverage against nominal significance
levels; default is to display the deviations from the nominal significance
level against the nominal significance level.

{phang}
{cmd:by(}{it:varlist} [{cmd:,} {it:byopts}]{cmd:)} draws separate plots within
one graph. The {it:byopts} are documented in {help by_option}.

{phang}
{opt gen:erate(newvars)} specifies that the deviations (the default) or
coverages (when the {cmd:nodeviations} option has been specified) for each
variable in {it:varlist} are to be stored. If the number of {it:newvars} is
the number of variables in {it:varlist} + 3, then the x-coordinate and the
lower and upper bound of the Monte Carlo region of acceptance are also stored.

{pmore}
So if one uses {cmd:simpplot} to display {it:k} p-values, then one can specify
either {it:k} or {it:k} + 3 {it:newvars}.
The first {it:k} {it:newvars} will contain the deviations or coverages of the
corresponding p-value in {it:varlist}. The {it:k}+1th newvar will contain the
x-coordinate for the Monte Carlo region of acceptance, and the {it:k}+2th and
{it:k}+3th will contain the lower and upper bound of that region, respectively.

{phang}
{opt addplot(plot)} allows adding more {help graph twoway} plots to the graph.

{phang}
{help twoway_scatter:other graph options}


{title:Examples}

{pstd}
In this example we test how well a t-test performs when the data is from a
non-Gaussian (some prefer to say non-normal) distribution, in this case a
chi-square distribution with 2 degrees of freedom. We know that the mean of
that distribution is 2, so a t-test of whether the mean equals 2 is a test of
a true null hypothesis. We also look at how well this test performs in
different sample sizes. In a sample of size 50 the test does not perform too
well: the true null hypothesis is rejected too often. A sample of size 500
already seems big enough for this test to work.

{cmd}
    program drop _all
    program define sim, rclass
        // draw one "sample" from a chi-square(2) "population"
        drop _all
        set obs 500
        gen x = rchi2(2)
        // test the true null hypothesis using the first 50 observations
        ttest x=2 in 1/50
        return scalar p50 = r(p)
        // and using all 500 observations
        ttest x=2
        return scalar p500 = r(p)
    end

    set seed 12345
    simulate p50=r(p50) p500=r(p500), ///
             reps(5000) : sim

    label var p50  "N=50"
    label var p500 "N=500"
    simpplot p50 p500, main1opt(mcolor(red)) ///
                       main2opt(mcolor(blue))
{txt}


{title:Author}

{p 4 4}
Maarten L. Buis{break}
Wissenschaftszentrum Berlin für Sozialforschung, WZB{break}
Research unit Skill Formation and Labor Markets {break}
maarten.buis@wzb.eu
{p_end}


{title:Reference}

{phang}
Davison, A.C. and Hinkley, D.V. 1997.
{it:Bootstrap methods and their application.}
Cambridge: Cambridge University Press.


{title:Also see:}

{p 4 4}
if installed: {help simsum}, {help qenvnormal}, {help qenvchi2}, {help qenvF}