-------------------------------------------------------------------------------
help for simsum                                                       Ian White
-------------------------------------------------------------------------------
Analyses of simulation studies including Monte Carlo error
Introduction
The program simsum analyses simulation studies in which each simulated data set yields point estimates by one or more analysis methods. Bias, empirical standard error and precision relative to a reference method can be computed for each method. If, in addition, model-based standard errors are available then simsum can compute the average model-based standard error, the relative error in the model-based standard error, the coverage of nominal confidence intervals, and the power to reject a null hypothesis. Monte Carlo errors are available for all estimated quantities.
Syntax
Data may be in a wide or long format.
In the wide format, the data contain one record per simulated data set. The appropriate syntax is:
simsum estvarlist [if] [in] [, true(expression) options]
where estvarlist is a varlist containing point estimates from one or more analysis methods.
In the long format, the data contain one record per method per simulated data set, and the appropriate syntax is:
simsum estvarname [if] [in] [, true(expression) methodvar(varname) id(varlist) options]
where estvarname is a variable containing the point estimates, methodvar(varname) identifies the method and id(varlist) identifies the simulated data set. The options are described below.
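To try both layouts, a small artificial data set can be built as in the sketch below. The variable names (b, se, method, dataset), the seed and the simulated distributions are assumptions chosen purely for illustration; they are not part of simsum.

```stata
* Build a hypothetical long-format data set: 1000 simulated data sets,
* each analysed by 2 methods (all names and values are illustrative)
clear
set seed 12345
set obs 1000
generate dataset = _n
expand 2
bysort dataset: generate method = _n
generate b  = rnormal(0.5, 0.1)        // point estimates, true value 0.5
generate se = 0.09 + 0.02*runiform()   // model-based standard errors

* Long format: one record per method per simulated data set
simsum b, se(se) methodvar(method) id(dataset) true(0.5)

* Wide format: one record per simulated data set
reshape wide b se, i(dataset) j(method)
simsum b1 b2, se(se1 se2) true(0.5)
```

After the reshape, the estimates for the two methods are held in b1 and b2, so they are passed as a varlist and methodvar() and id() are no longer needed.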
Main options
true(expression) gives the true value of the parameter. This is used in calculations of bias and coverage and is required whenever these statistics are requested.
methodvar(varname) specifies that the data are in long format, with each record representing one analysis of one simulated data set using the method identified by varname. Option id(varlist) must be specified. If methodvar() is not specified, the data must be in wide format, with each record representing all analyses of one simulated data set.
id(varlist) is required with option methodvar(). varlist must uniquely identify the data set used for each record, within levels of the by-variables.
se(varlist) lists the names of the variables containing the standard errors of the point estimates. For data in long format, this is a single variable.
seprefix(string) specifies that the names of the variables containing the standard errors of the point estimates are formed by adding the given prefix to the names of the variables containing the point estimates. It may be combined with sesuffix() but not with se().
sesuffix(string) specifies that the names of the variables containing the standard errors of the point estimates are formed by adding the given suffix to the names of the variables containing the point estimates. It may be combined with seprefix() but not with se().
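The naming rules for se(), seprefix() and sesuffix() can be illustrated as follows. The variable names b_cc, b_mi and their standard-error counterparts are hypothetical, and the data are assumed to be in wide format.

```stata
* Hypothetical wide data: estimates b_cc, b_mi; standard errors se_b_cc, se_b_mi
simsum b_cc b_mi, true(0.5) seprefix(se_)

* The same call with the standard errors listed explicitly
simsum b_cc b_mi, true(0.5) se(se_b_cc se_b_mi)

* With a suffix instead: standard errors named b_cc_se, b_mi_se
simsum b_cc b_mi, true(0.5) sesuffix(_se)
```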
Data checking options
graph requests a descriptive graph of standard errors against point estimates.
nomemcheck turns off checking that adequate memory is free. This check aims to avoid spending calculation time when simsum is likely to fail due to lack of memory.
max(#) specifies the maximum acceptable absolute value of the point estimates, standardised to mean 0 and SD 1. The default value is 10.
semax(#) specifies the maximum acceptable value of the standard error, as a multiple of the mean standard error. The default value is 100.
dropbig specifies that point estimates or standard errors beyond the maximum acceptable values should be dropped. Otherwise the program halts with an error. (Missing values are always dropped.)
nolistbig suppresses listing of point estimates and standard errors that lie outside the acceptable limits.
listmiss lists observations with missing point estimates and/or standard errors.
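The limits set by max() and semax() can be approximated by hand to inspect suspect values before running simsum. The sketch below assumes a single method with hypothetical variables b, se and dataset, and assumes that standardisation is done over all records for that method; simsum's internal calculation may differ in detail.

```stata
* Standardise the point estimates to mean 0 and SD 1 (hypothetical variable b)
quietly summarize b
generate double zb = (b - r(mean)) / r(sd)

* List estimates that would exceed the default limit max(10)
list dataset b zb if abs(zb) > 10 & !missing(zb)

* List standard errors more than 100 times their mean (default semax(100))
quietly summarize se
scalar s_mean_se = r(mean)
list dataset se if se > 100*s_mean_se & !missing(se)

* Tighter limits, dropping offending values instead of halting
simsum b, se(se) true(0.5) max(5) semax(20) dropbig
```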
Calculation options
level(#) specifies the confidence level for coverages and powers. Default is $level.
by(varlist) summarises the results by varlist.
mcse reports Monte Carlo standard errors for all summaries.
robust is only useful if mcse is also specified. It requests robust Monte Carlo standard errors for the statistics empse, relprec and relerror, instead of those based on an assumption of normally distributed point estimates.
modelsemethod(rmse|mean) specifies whether the model standard error should be summarised as the root mean squared value (the default) or as the arithmetic mean.
ref(string) specifies the reference method against which relative precisions will be calculated. With data in wide format, string must be a variable name. With data in long format, string must be a value of the method variable; if the value is labelled then the label must be used.
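A call combining several calculation options might look like the sketch below. It assumes long-format data with a hypothetical by-variable scenario and a method variable whose reference value carries the label CC; none of these names come from simsum itself.

```stata
* Long format, hypothetical by-variable "scenario" and labelled method "CC"
simsum b, se(se) methodvar(method) id(dataset) true(0.5) ///
    by(scenario) ref(CC) mcse robust modelsemethod(mean) level(90)
```

Here id(dataset) must identify the simulated data set uniquely within each level of scenario.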
Options specifying degrees of freedom
Degrees of freedom are used in calculating coverages and powers.
df(string) specifies the degrees of freedom. It may contain a variable name or a number (to apply to all estimators), or a list of variables containing the degrees of freedom for each estimator.
dfprefix(string) specifies that the names of the variables containing the degrees of freedom are formed by adding the given prefix to the names of the variables containing the point estimates. It may be combined with dfsuffix() but not with df().
dfsuffix(string) specifies that the names of the variables containing the degrees of freedom are formed by adding the given suffix to the names of the variables containing the point estimates. It may be combined with dfprefix() but not with df().
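The effect of the degrees of freedom on the interval width can be seen by comparing critical values. The sketch below assumes that a t-based critical value is used when degrees of freedom are supplied and a normal one otherwise; the variable names b_cc, b_mi, se_* and df_* are hypothetical.

```stata
* Critical values for a 95% interval (illustrative numbers)
display invnormal(0.975)        // normal-based, no degrees of freedom
display invttail(10, 0.025)     // t-based with 10 degrees of freedom

* The same degrees of freedom for all estimators
simsum b_cc b_mi, true(0.5) seprefix(se_) df(10)

* Degrees of freedom held in variables df_b_cc and df_b_mi
simsum b_cc b_mi, true(0.5) seprefix(se_) dfprefix(df_)
```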
Statistic options
If none of the following options is specified, then all available statistics are computed.
bsims reports the number of simulations with non-missing point estimates.
sesims reports the number of simulations with non-missing standard errors.
bias estimates the bias in the point estimates.
empse estimates the empirical standard error -- the standard deviation of the point estimates.
relprec estimates the relative precision -- the inverse squared ratio of the empirical standard error of this method to the empirical standard error of the reference method. This calculation is slow: omitting it can reduce run time by up to 90%.
modelse estimates the model-based standard error. See modelsemethod() above.
relerror estimates the proportional error in the model-based standard error, using the empirical standard error as gold standard.
cover estimates the coverage of nominal confidence intervals at the specified level.
power estimates the power to reject the null hypothesis that the true parameter is zero, at the specified level.
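As a rough check on what these statistics measure, several of them can be computed by hand for a single method. This is a minimal sketch under stated assumptions, not simsum's own code: the variables b and se are hypothetical, the true value is taken to be 0.5, a normal critical value at the 95% level is used, and relerror is expressed as a percentage.

```stata
* Hypothetical variables for one method: b (estimate), se (model-based SE)
quietly summarize b
scalar s_bias  = r(mean) - 0.5             // bias against true value 0.5
scalar s_empse = r(sd)                     // empirical standard error

generate double se2 = se^2
quietly summarize se2
scalar s_modelse  = sqrt(r(mean))          // root mean squared model SE
scalar s_relerror = 100*(s_modelse/s_empse - 1)

scalar s_crit = invnormal(0.975)           // assumed normal critical value
generate byte covered  = abs(b - 0.5) <= s_crit*se if !missing(b, se)
generate byte rejected = abs(b) > s_crit*se        if !missing(b, se)
quietly summarize covered
scalar s_cover = 100*r(mean)
quietly summarize rejected
scalar s_power = 100*r(mean)

display "bias = " s_bias "  empse = " s_empse "  modelse = " s_modelse
display "relerror (%) = " s_relerror
display "cover (%) = " s_cover "  power (%) = " s_power
```

The scalars carry an s_ prefix so that their names cannot be mistaken for abbreviations of the variables created along the way.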
Output options
clear loads the summary data into memory.
saving(filename) saves the summary data into filename.
nolist suppresses listing of the results, and is only allowed when clear or saving() is specified.
listsep lists the results using one table per statistic, giving narrower and better-formatted output. The default is to list the results as a single table.
format(string) specifies the format for printing the results and saving the summary data. If listsep is also specified then up to three formats may be specified: (1) for results on the scale of the original estimates (bias, empse, modelse); (2) for percentages (relprec, relerror, cover, power); (3) for integers (bsims, sesims). Defaults are the existing format of the [first] estimate variable for (1) and (2), and %7.0f for (3).
sepby(varlist) passes the sepby() option to list when printing the results.
abbreviate(#) passes the abbreviate() option to list when printing the results.
gen(string) specifies the prefix for new variables identifying the different statistics in the output data set (only useful with clear or saving()).
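The summary data can be kept for further work with clear or saving(). The sketch below assumes long-format data with the hypothetical variables b, se, method and dataset, and a hypothetical output file name simsum_results.

```stata
* Save the summaries to a file without listing them
simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse ///
    saving(simsum_results) nolist

* Or replace the data in memory with the summaries and inspect them
simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse clear
describe
list
```

Because clear replaces the simulation results in memory, the original data should be saved first if they are still needed.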
Example
This example uses data in long format stored in MIsim.dta:
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse format(%7.0g)
Alternatively, the data could first be reshaped to wide format:
. reshape wide b se, i(dataset) j(method) string
. simsum b*, se(se*) true(0.5) mcse format(%7.0g)
Author
Ian White, MRC Biostatistics Unit, Cambridge, UK.