Title
simpplot -- Plot describing p-values from a simulation by comparing nominal significance levels with the coverages
Syntax
simpplot varlist [if] [in] [, options ]
options                   description
-------------------------------------------------------------------
main#opt(graph_opts)      options governing the look of the #th variable
ra(off | graph_opts)      options governing the look of the Monte Carlo
                            region of acceptance
level(#)                  set confidence level for the Monte Carlo region
                            of acceptance; default is level(95)
ref0(off | graph_opts)    options governing the look of a reference line
noresid                   displays the observed coverage against nominal
                            significance levels; default is to display
                            the deviations from the nominal significance
                            level against the nominal significance level
by(varlist [, byopts])    option for repeating simpplot command
generate(newvars)         specifies that the residuals or observed
                            coverages and optionally the x-coordinate,
                            upper and lower bound of the Monte Carlo
                            region of acceptance are to be stored
addplot(plot)             add other plots to the graph
other graph options
-------------------------------------------------------------------
Description
simpplot describes the results of a simulation that inspects the coverage of a statistical test. If a null hypothesis is true, then a statistical test at a 5% significance level should reject that true null hypothesis in 5% of the replications. In a simulation we can make sure that the null hypothesis is true in the "population", draw many times from that "population", perform the test in each of these "samples", and count how many times the null hypothesis is rejected. The proportion of replications in which the hypothesis was rejected is our estimate of the coverage of the test, and this coverage should be close to the nominal significance level. There is nothing special about the 5% significance level other than that it is the conventional level in many disciplines. We could just as easily look at the coverage of tests at a 1%, 2%, 10%, or any other significance level. By default simpplot displays the deviations from the nominal significance level against the entire range of possible nominal significance levels. It also displays the range (the Monte Carlo region of acceptance) within which we can reasonably expect these deviations to remain if the test is well behaved.
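The quantity on the vertical axis can be approximated by hand, which may help when reading the graph. The sketch below assumes that the variable p50 from the example in this help file is in memory. The names cov50 and res50 are hypothetical, and this is not necessarily how simpplot computes its residuals internally.

```stata
* observed coverage at each nominal level: share of p-values at or below it
cumul p50, generate(cov50) equal
* deviation of the observed coverage from the nominal level
generate res50 = cov50 - p50
* plot the deviations against the nominal level, with a reference line at 0
twoway line res50 p50, sort yline(0)
```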
If a test is well behaved, then the p-values of that test of a true null hypothesis should follow a standard uniform distribution. That way 1% of the replications will have a p-value less than 0.01, 5% of the replications will have a p-value less than 0.05, 10% will have a p-value less than 0.10, etc. So whichever significance level we choose, we will on average reject the null hypothesis the right number of times. One could see the graph produced by simpplot as a graphical test of whether the distribution of p-values follows a standard uniform distribution.
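As a possible numerical companion to the graph, one could compare the simulated p-values with the standard uniform distribution using official Stata's one-sample Kolmogorov-Smirnov test. This is only a sketch and is not part of simpplot. It assumes the variable p50 from the example in this help file, and relies on the fact that the standard uniform distribution function evaluated at p is p itself.

```stata
* one-sample Kolmogorov-Smirnov test of p50 against uniform(0,1);
* the expression after the equals sign is the hypothesized CDF
ksmirnov p50 = p50
```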
The Monte Carlo region of acceptance is the area of the graph where we would expect most simulations (by default 95%) to be if we were repeating our simulation experiment many times and we were simulating a well behaved test. If we did a simulation with n replications, then for a given nominal significance level a we would expect the number of rejections to follow a binomial(n,a) distribution. This is what is used to compute the end points of the region of acceptance.
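To get a feel for how wide this region is, one can use the normal approximation to the binomial distribution. This approximation is an assumption made here for illustration only; as described above, simpplot itself uses the binomial(n,a) distribution. With n = 5000 replications and a = 0.05, the half-width of an approximate 95% region for the deviations comes out at roughly 0.006.

```stata
* approximate half-width of the 95% region of acceptance at a = 0.05
* with n = 5000 replications, using the normal approximation
display invnormal(0.975)*sqrt(0.05*(1-0.05)/5000)
```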
Options
main#opt(graph_opts) options governing the look of the #th variable specified in varlist. The relevant options are listed in twoway_scatter.
ra(off | graph_opts) options governing the look of the Monte Carlo region of acceptance. One can suppress the display (and computation) of the Monte Carlo region of acceptance by specifying ra(off). Alternatively, one can specify options that change the look of the region of acceptance. The relevant options are listed in twoway_rarea.
level(#) set confidence level for the Monte Carlo region of acceptance; default is level(95)
ref0(off | graph_opts) options governing the look of a reference line. By default this is a horizontal line at 0. If the noresid option is specified, then the reference line is a diagonal line from 0,0 to 1,1. One can suppress the reference line by specifying ref0(off). Alternatively, one can specify options that change the look of the reference line. The relevant options are listed in twoway_line.
noresid displays the observed coverage against nominal significance levels; default is to display the deviations from the nominal significance level against the nominal significance level.
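The options above can be combined. The following sketch assumes the variables p50 and p500 from the example in this help file; the particular colors and line pattern are arbitrary choices made for illustration.

```stata
* observed coverage instead of deviations, a 99% region of acceptance
* drawn in light gray, and a dashed diagonal reference line
simpplot p50 p500, noresid level(99) ///
    ra(fcolor(gs14) lcolor(gs14)) ref0(lpattern(dash))

* deviations only, without the region of acceptance or reference line
simpplot p50 p500, ra(off) ref0(off)
```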
by(varlist [, byopts]) Option by() draws separate plots within one graph. The byopts are documented in by_option.
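Because by() needs a grouping variable, p-values stored in separate variables first have to be stacked into one. The sketch below is one possible way of doing so with official Stata's stack command, assuming the variables p50 and p500 from the example in this help file. The names p and size are hypothetical, and stack with the clear option replaces the data in memory.

```stata
* stack the two sets of p-values into one variable p;
* stack creates _stack, which is 1 for p50 and 2 for p500
stack p50 p500, into(p) clear
label define size 1 "N=50" 2 "N=500"
label values _stack size

* one panel per sample size
simpplot p, by(_stack)
```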
generate(newvars) specifies that the residuals (the default) or coverages (when the noresid option has been specified) for each variable in varlist are to be stored. If the number of newvars is the number of variables in varlist + 3, then the x-coordinate and the lower and upper bound of the Monte Carlo region of acceptance are also stored.
So if one uses simpplot to display k p-values, then one can specify either k or k + 3 newvars. The first k newvars will contain the residuals or coverages of the corresponding p-value in varlist. The (k+1)th newvar will contain the x-coordinate for the Monte Carlo region of acceptance, and the (k+2)th and (k+3)th will contain the lower and upper bound of that area, respectively.
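For instance, with the two variables p50 and p500 from the example in this help file, k = 2, so one would specify either 2 or 5 new variables. The names of the new variables below are hypothetical.

```stata
* store the residuals (res50, res500) plus the x-coordinate (xra) and
* the lower (lbra) and upper (ubra) bound of the region of acceptance
simpplot p50 p500, generate(res50 res500 xra lbra ubra)

* inspect the stored residuals and redraw the region of acceptance
summarize res50 res500
twoway rarea lbra ubra xra, sort
```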
addplot(plot) allows adding more graph twoway plots to the graph
other graph options
Examples
In this example we test how well a t-test performs when the data are from a non-Gaussian (some prefer to say non-normal) distribution, in this case a chi-square distribution with 2 degrees of freedom. We know that the mean of that distribution is 2, so a t-test of whether the mean equals 2 is a test of a true null hypothesis. We also look at how well this test performs in different sample sizes. With a sample size of 50 the test does not perform too well: the true null hypothesis is rejected too often. A sample size of 500, however, already seems big enough for this test to work.
    program drop _all
    program define sim, rclass
        drop _all
        set obs 500
        gen x = rchi2(2)            // chi-square(2) draws, true mean is 2
        ttest x=2 in 1/50           // test on the first 50 observations
        return scalar p50 = r(p)
        ttest x=2                   // test on all 500 observations
        return scalar p500 = r(p)
    end

    set seed 12345
    simulate p50=r(p50) p500=r(p500), ///
        reps(5000) : sim

    label var p50 "N=50"
    label var p500 "N=500"
    simpplot p50 p500, main1opt(mcolor(red)) ///
        main2opt(mcolor(blue))
Author
Maarten L. Buis
Wissenschaftszentrum Berlin für Sozialforschung, WZB
Research unit Skill Formation and Labor Markets
maarten.buis@wzb.eu
Also see: