help simpplot
-------------------------------------------------------------------------------

Title

    simpplot -- Plot describing p-values from a simulation by comparing nominal significance levels with the coverages

Syntax

    simpplot varlist [if] [in] [, options]

    options                  description
    -------------------------------------------------------------------
    main#opt(graph_opts)     options governing the look of the #th variable
    ra(off|graph_opts)       options governing the look of the Monte Carlo
                               region of acceptance
    level(#)                 set confidence level for the Monte Carlo region
                               of acceptance; default is level(95)
    ref0(off|graph_opts)     options governing the look of a reference line
    noresid                  displays the observed coverage against nominal
                               significance levels; default is to display
                               the deviations from the nominal significance
                               level against the nominal significance level
    by(varlist[, byopts])    option for repeating simpplot command
    generate(newvars)        specifies that the residuals or observed
                               coverages and optionally the x-coordinate,
                               lower and upper bound of the Monte Carlo
                               region of acceptance are to be stored
    addplot(plot)            add other plots to the graph
    other graph options
    -------------------------------------------------------------------

Description

simpplot describes the results of a simulation that inspects the coverage of a statistical test. If a null hypothesis is true, then a statistical test at a 5% significance level should reject the true null hypothesis in 5% of the replications. In a simulation we can make sure that the null hypothesis is true in the "population", draw many times from that "population", perform the test in each of these "samples", and count how many times the null hypothesis is rejected. The proportion of replications in which the hypothesis was rejected is our estimate of the coverage of the test, and this coverage should be close to the nominal significance level.

There is nothing special about the 5% significance level other than that it is the conventional level in many disciplines. We could just as easily look at the coverage of tests at a 1%, 2%, 10%, or any other significance level. simpplot displays by default the deviations from the nominal significance level against the entire range of possible nominal significance levels. It also displays the range (Monte Carlo region of acceptance) within which we can reasonably expect these deviations to remain if the test is well behaved.

If a test is well behaved, then the p-values of that test of a true null hypothesis should follow a standard uniform distribution. That way 1% of the replications will have a p-value less than 0.01, 5% of the replications will have a p-value less than 0.05, 10% will have a p-value less than 0.10, etc. So whichever significance level we choose, we will on average reject the null hypothesis the right number of times. One could see the graph produced by simpplot as a graphical test of whether the distribution of p-values follows a standard uniform distribution.

The Monte Carlo region of acceptance is the area of the graph where we would expect most simulations (by default 95%) to be if we were repeating our simulation experiment many times and we were simulating a well-behaved test. If we did a simulation with n replications, then for a given nominal significance level a we would expect the number of rejections to follow a binomial(n, a) distribution. This is what is used to compute the end points of the region of acceptance.
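The binomial argument above can be illustrated with a short calculation. This is only a rough sketch, not simpplot's own code: it replaces the exact binomial(n, a) quantiles with a normal approximation, and the scalar names and the values of nreps and alpha are chosen purely for illustration.

```stata
* Approximate 95% region of acceptance for the deviation (observed
* rejection proportion minus nominal level) at a single nominal level.
* Assumption: normal approximation to the binomial(n, a) distribution.
scalar nreps = 5000                       // hypothetical number of replications
scalar alpha = 0.05                       // nominal significance level
scalar se    = sqrt(alpha*(1 - alpha)/nreps)
scalar lb    = -invnormal(0.975)*se       // approximate lower bound
scalar ub    =  invnormal(0.975)*se       // approximate upper bound
display "approximate region of acceptance: [" %7.5f lb ", " %7.5f ub "]"
```

The width of this interval shrinks with the square root of the number of replications, which is why more replications give a narrower region of acceptance.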

Options

main#opt(graph_opts) options governing the look of the #th variable specified in varlist. The relevant options are listed in twoway_scatter.

ra(off|graph_opts) options governing the look of the Monte Carlo region of acceptance. One can suppress the display (and computation) of the Monte Carlo region of acceptance by specifying ra(off). Alternatively, one can specify options that change the look of the region of acceptance. The relevant options are listed in twoway_rarea.

level(#) sets the confidence level for the Monte Carlo region of acceptance; default is level(95).

ref0(off|graph_opts) options governing the look of a reference line. By default this is a horizontal line at 0. If the noresid option is specified, then the reference line is a diagonal line from 0,0 to 1,1. One can suppress the reference line by specifying ref0(off). Alternatively, one can specify options that change the look of the reference line. The relevant options are listed in twoway_line.

noresid displays the observed coverage against nominal significance levels; default is to display the deviations from the nominal significance level against the nominal significance level.

by(varlist[, byopts]) draws separate plots within one graph. The byopts are documented in by_option.

generate(newvars) specifies that the residuals (the default) or coverages (when the noresid option has been specified) for each variable in varlist are to be stored. If the number of newvars is the number of variables in varlist + 3, then the x-coordinate and the lower and upper bound of the Monte Carlo region of acceptance are also stored.

So if one uses simpplot to display k p-values, then one can specify either k or k + 3 newvars. The first k newvars will contain the residuals or coverages of the corresponding p-value in varlist. The k+1th newvar will contain the x-coordinate for the Monte Carlo region of acceptance; the k+2th and k+3th will contain the lower and upper bound of that area, respectively.

addplot(plot) allows adding more graph twoway plots to the graph.

other graph options

Examples

In this example we test how well a t-test performs when the data are from a non-Gaussian (some prefer to say non-normal) distribution, in this case a chi-square distribution with 2 degrees of freedom. We know that the mean of that distribution is 2, so a t-test of whether the mean equals 2 is a test of a true null hypothesis. We also look at how well this test performs in different sample sizes. With a sample size of 50 the test does not perform too well: the true null hypothesis is rejected too often. A sample size of 500, however, already seems big enough for this test to work.

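    program drop _all
    program define sim, rclass
        * draw 500 observations from a chi-square(2) distribution
        drop _all
        set obs 500
        gen x = rchi2(2)
        * test the true null hypothesis (mean = 2) on the first 50 observations
        ttest x=2 in 1/50
        return scalar p50 = r(p)
        * and on all 500 observations
        ttest x=2
        return scalar p500 = r(p)
    end

    set seed 12345
    simulate p50=r(p50) p500=r(p500), ///
        reps(5000) : sim

    label var p50 "N=50"
    label var p500 "N=500"
    simpplot p50 p500, main1opt(mcolor(red)) ///
        main2opt(mcolor(blue))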
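As a further sketch, the simulated p-values can be inspected by hand and passed to simpplot with some of the options described above. It assumes the variables p50 and p500 from the example are still in memory. The new variable names (res50, res500, xval, lb, ub) and the colors are arbitrary choices for illustration.

```stata
* estimated coverage at the nominal 5% level: the proportion of
* replications in which the true null hypothesis was rejected
count if p50 < 0.05
display "coverage at 5%, N=50:  " r(N)/_N
count if p500 < 0.05
display "coverage at 5%, N=500: " r(N)/_N

* store the residuals plus the x-coordinate and the lower and upper
* bound of the region of acceptance (k + 3 = 5 new variables)
simpplot p50 p500, generate(res50 res500 xval lb ub)
summarize res50 res500

* plot observed coverages instead of deviations, with a 99% region
simpplot p50 p500, noresid level(99) ///
    ref0(lcolor(gs8)) ra(color(gs14))
```

With noresid the reference line is the diagonal, so a well-behaved test should stay close to that line rather than close to zero.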

Author

Maarten L. Buis
Wissenschaftszentrum Berlin für Sozialforschung, WZB
Research unit Skill Formation and Labor Markets
maarten.buis@wzb.eu

Also see: