-------------------------------------------------------------------------------
help for simsum                                                       Ian White
-------------------------------------------------------------------------------

Analyses of simulation studies including Monte Carlo error

Introduction

The program simsum analyses simulation studies in which each simulated data set yields point estimates by one or more analysis methods. Bias, empirical standard error and precision relative to a reference method can be computed for each method. If, in addition, model-based standard errors are available, then simsum can compute the average model-based standard error, the relative error in the model-based standard error, the coverage of nominal confidence intervals, and the power to reject a null hypothesis. Monte Carlo errors are available for all estimated quantities.

Syntax

Data may be in wide or long format.

In the wide format, the data contain one record per simulated data set. The appropriate syntax is:

simsum estvarlist [if] [in], [true(expression) options]

where estvarlist is a varlist containing point estimates from one or more analysis methods.

In the long format, the data contain one record per method per simulated data set, and the appropriate syntax is:

simsum estvarname [if] [in], [true(expression) methodvar(varname) id(varlist) options]

where estvarname is a variable containing the point estimates, methodvar(varname) identifies the method and id(varlist) identifies the simulated data set. The options are described below.

Main options

true(expression) gives the true value of the parameter. This is used in calculations of bias and coverage and is required whenever these statistics are requested.

methodvar(varname) specifies that the data are in long format, with each record representing one analysis of one simulated data set using the method identified by varname. Option id(varlist) must be specified. If methodvar() is not specified, the data must be in wide format, with each record representing all analyses of one simulated data set.

id(varlist) is required with option methodvar(). varlist must uniquely identify the data set used for each record, within levels of the by-variables.

se(varlist) lists the names of the variables containing the standard errors of the point estimates. For data in long format, this is a single variable.

seprefix(string) specifies that the names of the variables containing the standard errors of the point estimates are formed by adding the given prefix to the names of the variables containing the point estimates. It may be combined with sesuffix() but not with se().

sesuffix(string) specifies that the names of the variables containing the standard errors of the point estimates are formed by adding the given suffix to the names of the variables containing the point estimates. It may be combined with seprefix() but not with se().
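
As an illustration of the three ways of naming the standard-error variables, the following sketch assumes wide-format data with point estimates in b1 and b2 and a true value of 0.5. All variable names here (b1, b2, se1, se2, se_b1, se_b2, b1_se, b2_se) are hypothetical and are not created by simsum.

. * standard errors listed explicitly
. simsum b1 b2, true(0.5) se(se1 se2)
. * standard errors assumed to be stored in se_b1 and se_b2
. simsum b1 b2, true(0.5) seprefix(se_)
. * standard errors assumed to be stored in b1_se and b2_se
. simsum b1 b2, true(0.5) sesuffix(_se)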

Data checking options

graph requests a descriptive graph of standard errors against point estimates.

nomemcheck turns off checking that adequate memory is free. This check aims to avoid spending calculation time when simsum is likely to fail due to lack of memory.

max(#) specifies the maximum acceptable absolute value of the point estimates, standardised to mean 0 and SD 1. The default value is 10.

semax(#) specifies the maximum acceptable value of the standard error, as a multiple of the mean standard error. The default value is 100.

dropbig specifies that point estimates or standard errors beyond the maximum acceptable values should be dropped. Otherwise the program halts with an error. (Missing values are always dropped.)

nolistbig suppresses listing of point estimates and standard errors that lie outside the acceptable limits.

listmiss lists observations with missing point estimates and/or standard errors.
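
A call that tightens the acceptance limits and drops outlying values instead of halting might look like the sketch below. It assumes the long-format variables b, se, method and dataset used in the Example section; the limits 5 and 20 are arbitrary illustrative values, not recommendations.

. * treat standardised estimates beyond 5 in absolute value, or standard errors
. * more than 20 times the mean standard error, as unacceptable, and drop them
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) max(5) semax(20) dropbig
. * additionally list observations with missing estimates or standard errors
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) max(5) semax(20) dropbig listmiss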

Calculation options

level(#) specifies the confidence level for coverages and powers. The default is $level.

by(varlist) summarises the results by varlist.

mcse reports Monte Carlo standard errors for all summaries.

robust is only useful if mcse is also specified. It requests robust Monte Carlo standard errors for the statistics empse, relprec and relerror, instead of those based on an assumption of normally distributed point estimates.

modelsemethod(rmse|mean) specifies whether the model standard error should be summarised as the root mean squared value (the default) or as the arithmetic mean.

ref(string) specifies the reference method against which relative precisions will be calculated. With data in wide format, string must be a variable name. With data in long format, string must be a value of the method variable; if the value is labelled then the label must be used.
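
The calculation options can be combined, as in the following sketch. It assumes long-format data as in the Example section, plus a hypothetical by-variable scenario and a hypothetical method value CC; neither name comes from this help file. Note that id(dataset) would then need to identify the data set uniquely within each level of scenario.

. * summarise separately by scenario, with robust Monte Carlo standard errors
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) by(scenario) mcse robust
. * 90% level, relative precision against method CC, model SE summarised as a mean
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) level(90) ref(CC) modelsemethod(mean)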

Options specifying degrees of freedom

Degrees of freedom are used in calculating coverages and powers.

df(string) specifies the degrees of freedom. It may contain a variable name or a number (to apply to all estimators), or a list of variables containing the degrees of freedom for each estimator.

dfprefix(string) specifies that the names of the variables containing the degrees of freedom are formed by adding the given prefix to the names of the variables containing the point estimates. It may be combined with dfsuffix() but not with df().

dfsuffix(string) specifies that the names of the variables containing the degrees of freedom are formed by adding the given suffix to the names of the variables containing the point estimates. It may be combined with dfprefix() but not with df().
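
For example, degrees of freedom might be supplied in either of the following ways. This is a sketch that assumes wide-format data with hypothetical variables b1 and b2 (point estimates), se_b1 and se_b2 (standard errors) and df_b1 and df_b2 (degrees of freedom); the value 30 is arbitrary.

. * the same degrees of freedom, 30, applied to every estimator
. simsum b1 b2, true(0.5) seprefix(se_) df(30) cover power
. * estimator-specific degrees of freedom, assumed stored in df_b1 and df_b2
. simsum b1 b2, true(0.5) seprefix(se_) dfprefix(df_) cover power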

Statistic options

If none of the following options is specified, then all available statistics are computed.

bsims reports the number of simulations with non-missing point estimates.

sesims reports the number of simulations with non-missing standard errors.

bias estimates the bias in the point estimates.

empse estimates the empirical standard error -- the standard deviation of the point estimates.

relprec estimates the relative precision -- the inverse squared ratio of the empirical standard error of this method to the empirical standard error of the reference method. This calculation is slow: omitting it can reduce run time by up to 90%.

modelse estimates the model-based standard error. See modelsemethod() above.

relerror estimates the proportional error in the model-based standard error, using the empirical standard error as gold standard.

cover estimates the coverage of nominal confidence intervals at the specified level.

power estimates the power to reject the null hypothesis that the true parameter is zero, at the specified level.
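
A subset of statistics can be requested by naming them, as in this sketch, which assumes the long-format variables used in the Example section. The second command assumes, from the description above, that naming every statistic except relprec omits the slow relative-precision calculation.

. * report only bias, empirical standard error and coverage, with Monte Carlo errors
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) bias empse cover mcse
. * request every statistic except relprec
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) bsims sesims bias empse modelse relerror cover power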

Output options

clear loads the summary data into memory.

saving(filename) saves the summary data into filename.

nolist suppresses listing of the results, and is only allowed when clear or saving() is specified.

listsep lists the results using one table per statistic, giving narrower and better-formatted output. The default is to list the results as a single table.

format(string) specifies the format for printing the results and saving the summary data. If listsep is also specified then up to three formats may be specified: (1) for results on the scale of the original estimates (bias, empse, modelse); (2) for percentages (relprec, relerror, cover, power); (3) for integers (bsims, sesims). Defaults are the existing format of the [first] estimate variable for (1) and (2), and %7.0f for (3).

sepby(varlist) invokes this list option when printing the results.

abbreviate(#) invokes this list option when printing the results.

gen(string) specifies the prefix for new variables identifying the different statistics in the output data set (only useful with clear or saving()).
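
The output options might be used as in the following sketch, which again assumes the long-format variables from the Example section. The file name simsum_results and the prefix stat are hypothetical. The command using clear comes last because it replaces the simulation data in memory with the summary data.

. * one table per statistic, with a fixed format for results on the estimate scale
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse listsep format(%6.3f)
. * save the summaries to a file without listing them
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse saving(simsum_results) nolist
. * load the summaries into memory, with statistic identifiers prefixed by stat
. simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse clear gen(stat)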

Example

This example uses data in long format stored in MIsim.dta:

. simsum b, se(se) methodvar(method) id(dataset) true(0.5) mcse format(%7.0g)

Alternatively, the data could first be reshaped to wide format:

. reshape wide b se, i(dataset) j(method) string

. simsum b*, se(se*) true(0.5) mcse format(%7.0g)

Author

Ian White, MRC Biostatistics Unit, Cambridge, UK