{smcl}
{* *! Help for -powersim- version 1.0.0 26July2013}{...}
{hline}
help {hi: powersim}
{hline}
{title:Title}
{p2colset 5 16 18 2}{...}
Simulation-based power analysis for linear and generalized linear models
{p2colreset}{...}
{title:Syntax}
{p 8 12 2}
{cmd:powersim}
[{cmd:using} {it:filename}] {it:options}
[{cmd:,} {it:options}] : {it:model_cmd}
{synoptset 30 tabbed}{...}
{synopthdr}
{synoptline}
{syntab :Power}
{synopt :{opth b(numlist)}} List of effect sizes, required{p_end}
{synopt :{opt null(#)}} Value of the null hypothesis, defaults to zero{p_end}
{synopt :{opth sample:size(numlist)}} List of sample sizes, required{p_end}
{synopt :{opt a:lpha(#)}} alpha value, defaults to alpha = 0.05{p_end}
{synopt :{opt pos:ition(#)}} column position of b in matrix e(b) of the analysis model, required{p_end}
{syntab :Data generating model}
{synopt :{opth f:amily(powersim##familyname:familyname)}} error distribution, required{p_end}
{synopt :{opth l:ink(powersim##linkname:linkname)}} link function, required* {p_end}
{synopt :{opt cov1(n b d # [#])}} covariate specifications, covariate 1 {p_end}
{synopt :{opt cov2(n b d # [#])}} covariate specifications, covariate 2 {p_end}
{synopt :{opt cov3(n b d # [#])}} covariate specifications, covariate 3 {p_end}
{synopt :{opt block22(b1 n1 b2 n2 [ b3 ])}} binary variables for 2x2 block design {p_end}
{synopt :{opt cons(b)}} constant term {p_end}
{synopt :{opt corr12(#)}} Pearson correlation between cov1 and cov2 {p_end}
{synopt :{opt corr13(#)}} Pearson correlation between cov1 and cov3 {p_end}
{synopt :{opt corr23(#)}} Pearson correlation between cov2 and cov3 {p_end}
{synopt :{opt inter1(# n*n)}} Interaction effect specification {p_end}
{synopt :{opt inter2(# n*n)}} Interaction effect specification {p_end}
{synopt :{opt inter3(# n*n)}} Interaction effect specification {p_end}
{syntab :Simulation}
{synopt :{opt nreps(#)}} number of Monte Carlo replications {p_end}
{synopt :{opt inside}} Generate predictor data inside simulation loop {p_end}
{synopt :{opt dry:run}} Dry run {p_end}
{synopt :{opt det:ail}} Detailed output for dry run {p_end}
{synopt :{opt gen:data}} create a single dataset {p_end}
{synopt :{opt nobs(#)}} Number of observations for created dataset {p_end}
{synopt :{opt seed(#)}} Set random-number seed {p_end}
{synopt :{opt expb}} Interpret effect sizes b as exp(b) {p_end}
{synopt :{opt nodots}} Suppress replication dots {p_end}
{synopt :{opt sil:ent}} Suppress any intermediate output (implies {opt nodots}) {p_end}
{synopt :{opt add:scalar(string)}} Fetch scalar stored in e() of analysis model {p_end}
{synopt :{opt maxiter(#)}} Set maximum number of iterations for the analysis model {p_end}
{synopt :{opt force}} Allow analysis model command other then {cmd: regress} or {cmd: glm} {p_end}
{synopt :{opt df(string)}} Use Students t-distribution for significance tests with df stored in {opt e(string)} of the analysis model {p_end}
{syntab :Saving}
{synopt :{opt do:file(filename[,replace])}} Save do-file to {it:filename} {p_end}
{synopt :{opt sav:ing(filename[,replace])}} Save simulation results to {it:filename} {p_end}
{synoptline}
{p 4 6 2}
* specification of a link function is only required if the do-file is generated via command options;
if an existing do-file is provided, the link function must be defined in that do-file.
{synoptline}
{p2colreset}{...}
{marker familyname}{...}
{synoptset 23}{...}
{synopthdr :familyname}
{synoptline}
{synopt :{opt gaussian} [{it:#}]}Gaussian (normal), with SD of Gaussian error {it:#}; defaults to 1{p_end}
{synopt :{opt igaussian} [{it:#}]}inverse Gaussian, with scale parameter {it:#}; defaults to 0.25{p_end}
{synopt :{opt binomial} [{it:#}]}Binomial, with number of trials {it:#}; defaults to 1 (i.e., Bernoulli){p_end}
{synopt :{opt poisson}}Poisson{p_end}
{synopt :{opt nbinomial} [{it:#}]}negative binomial, with overdispersion {it:#}; defaults to 1{p_end}
{synopt :{opt gamma} [{it:#}]}gamma, with scale parameter {it:#}; defaults to 1 {p_end}
{synoptline}
{p2colreset}{...}
{marker linkname}{...}
{synoptset 23}{...}
{synopthdr :linkname}
{synoptline}
{synopt :{opt identity}}identity{p_end}
{synopt :{opt log}}log{p_end}
{synopt :{opt logit}}logit{p_end}
{synopt :{opt probit}}probit{p_end}
{synopt :{opt cloglog}}cloglog{p_end}
{synopt :{opt power} {it:#}}power{p_end}
{synopt :{opt opower} {it:#}}odds power{p_end}
{synopt :{opt nbinomial}}negative binomial{p_end}
{synopt :{opt loglog}}log-log{p_end}
{synopt :{opt logc}}log-complement{p_end}
{synoptline}
{p2colreset}{...}
{marker covariate_distributions}{...}
{synoptset 23}{...}
{synopthdr :covariate_distributions}
{synoptline}
{synopt :{opt normal} {it:a#} {it:b#}}Gaussian, with M {it:a#} and SD {it:b#} {p_end}
{synopt :{opt poisson} {it:a#}}Poisson, with mean {it:a#}{p_end}
{synopt :{opt uniform} {it:a#} {it:b#}}uniform, ranging from {it:a#} to {it:b#}{p_end}
{synopt :{opt binomial} {it:a#}}bernoulli, i.e. binomial(1, {it:a#}) {p_end}
{synopt :{opt chi2} {it:a#}}chi-square, with {it:a#} df{p_end}
{synopt :{opt studentt} {it:a#}}Student's t, with {it:a#} df{p_end}
{synopt :{opt beta} {it:a#} {it:b#}}beta, with shape parameters {it:a#} and {it:b#}{p_end}
{synopt :{opt gamma} {it:a#} {it:b#}}gamma, with shape parameter {it:a#} and scale parameter {it:b#}{p_end}
{synopt :{opt nbinomial} {it:a#} {it:b#}}negative binomial, with number of failures {it:a#} and success probability {it:b#}{p_end}
{synopt :{opt block} {it:a#}}equally sized groups, with number of groups {it:a#} (2-4 groups) {p_end}
{synoptline}
{p2colreset}{...}
{title:Description}
{pstd}
{opt powersim} performs power analysis Monte Carlo simulations for linear and
generalized linear models. Data will be generated according to a model that
is specified via command options. Predictor data can be generated by either
using command options or by providing an existing do-file. Stata's {help regress}
and {help glm} commands can be used for specifying the analysis model. See
the accompanying tutorial for a more detailed introduction.
{title:Options}
{dlgtab:Power}
{phang}
{opth b(numlist)} List of the true population parameters.
{phang}
{opt null(#)} Value of the null hypothesis; if the {opt expb} option is specified,
the value will be interpreted as exponentiated value.
{phang}
{opth sample:size(numlist)} List of sample sizes.
{phang}
{opt a:lpha(#)} alpha level, default is alpha = 0.05.
{phang}
{opt pos:ition(#)} column position of the coefficient in matrix e(b) (see {help ereturn list})
of the analysis model for which power wants to be estimated.
{dlgtab:Data generating model}
{phang}
{opt f:amily(familyname)} specifies the (conditional) distribution of the outcome.
{phang}
{opt l:ink(linkname)} specifies the link function.
{phang}
{opt cov1(n b d # [#])} covariate specifications for covariate 1, where
n is the name of the covariate, b its effect size, and d its distribution, followed
by one or two parameter values, depending on the distribution
(see {help powersim##covariate_distributions:covariate_distributions)}.
Type {cmd: _bp} instead of an effect size value in case of the covariate for
which the power wants to be simulated.
{phang}
{opt cov2(n b d # [#])} covariate specifications as described above, covariate 2.
{phang}
{opt cov3(n b d # [#])} covariate specifications as described above, covariate 3.
{phang}
{opt block22(b1 n1 b2 n2 [b3])} creates two binary variables for a 2x2 block design
with balanced group sizes; n1 and n2 are the names of the two variables and b1 and b2 are
its coefficients; a coefficient for an interaction effect of the two variables can be specified
with b3.
{phang}
{opt constant(b)} b is the coefficient for the constant term; if this option is omitted,
the constant is assumed to equal zero.
{phang}
{opt corr12(#)} specifies Pearson correlation coefficient for cov1 and cov2, applicable only to
normally distributed covariates; if this option is omitted, cov1 and cov2 are assumed to be
uncorrelated (orthogonal).
{phang}
{opt corr13(#)} specifies Pearson correlation coefficient for cov1 and cov3, applicable only to
normally distributed covariates; if this option is omitted, cov1 and cov3 are assumed to be
uncorrelated (orthogonal).
{phang}
{opt corr23(#)} specifies Pearson correlation coefficient for cov2 and cov3, applicable only to
normally distributed covariates; if this option is omitted, cov2 and cov3 are assumed to be
uncorrelated (orthogonal).
{phang}
{opt inter1(# n*n)} specifies an interaction effect; for example, to generate data
with an interaction effect of 0.5 for cov2 and cov3 with names foo and bar,
type {cmd:inter1(0.5 foo*bar)}.
{phang}
{opt inter2(# n*n)} specifies an additional interaction effect.
{phang}
{opt inter3(# n*n)} specifies an additional interaction effect.
{dlgtab:Simulation}
{phang}
{opt nreps(#)} number of Monte Carlo replications.
{phang}
{opt inside} Generates predictor data inside the simulation loop; should be
specified if the predictor data itself is regarded as a stochastic component.
{phang}
{opt dry:run} Performs a dry run in order to check command inputs before running the simulations.
{phang}
{opt det:ail} Additionally prints the contents of the data generating do-file on the screen
if the {opt dryrun} option is specified.
{phang}
{opt gen:data} creates a single realization of the data.
{phang}
{opt nobs(#)} Determines the number of observations for the dataset requested with
the {opt gendata} option; defaults to N = 10,000.
{phang}
{opt seed(#)} Sets the initial random-number seed.
{phang}
{opt expb} Effect sizes specified in option {opt b()} are interpreted as exp(b); for example, this could be
useful if the power wants to be determined in the context of a logistic regression and the user wants
to specify effect sizes as odds ratios, rather than as log odds ratios.
{phang}
{opt nodots} Suppresses the replication dots during the simulations. By default, one dot character is
displayed for each successful replication. A red `x' is displayed if {it:model_cmd} returns
an error (for example, if an analysis model does not converge within the specified maximum
number of iterations).
{phang}
{opt sil:ent} Suppresses any intermediate output during simulations (implies {opt nodots}).
{phang}
{opt add:scalar} Specifies to fetch a scalar that is available in {opt e()} of the analysis
model from each MC replication; for example, if you would like to record the value of the
log-likelihood from each replication of a {opt glm}, the option would be specified as addscalar(ll);
the scalar will then be appended to the resulting dataset which contains results from each replication.
{phang}
{opt maxiter(#)} Specifies the number of maximum iterations for the analysis model, defaults to maxiter = 200;
will be ignored for model commands other than {cmd: glm}.
{phang}
{opt force} Allow analysis model command other then {cmd: regress} or {cmd: glm}.
{phang}
{opt df(string)} Use Students t-distribution for significance tests with df
stored in {opt e(string)} of the analysis model.
{dlgtab:Saving}
{phang}
{opt do:file(filename[,replace])} Save do-file to {it:filename}; if no existing do file is provided via
{opt using} then this option is required. {p_end}
{phang}
{opt sav:ing(filename[,replace])} Save simulation results from each replication to {it:filename}; {p_end}
{title:Remarks}
{pstd}
Stata's {help regress} and {help glm} commands can be used to specify an analysis model under
{opt model_cmd} after the colon of the {opt powersim} command. The outcome variable in
model_cmd must be named y. With {opt regress},
a t-distribution is used for siginificance testing and a normal distribution is used with
{opt glm}. Note that {opt powersim} removes the current data in memory and replaces
it with simulation results; make sure to save your data before running powersim. Also note
that {opt powersim} is creating a scalar with the name _bp. In case that a scalar with
the same name already exists, it will be overwritten. If multivariate
normal data is requested, Stata matrices with the names __M, __SD, and
__C are created which would replace any existing matrices with these names.
{pstd}
Note that both, distributional family and link function, need to be specified
for the data generating model. The available link functions are
{center:Link function {cmd:powersim} option}
{center:{hline 40}}
{center:identity {cmd:link(identity)} }
{center:log {cmd:link(log)} }
{center:logit {cmd:link(logit)} }
{center:probit {cmd:link(probit)} }
{center:complementary log-log {cmd:link(cloglog)} }
{center:odds power {cmd:link(opower} {it:#}{cmd:)} }
{center:power {cmd:link(power} {it:#}{cmd:)} }
{center:negative binomial {cmd:link(nbinomial)}}
{center:log-log {cmd:link(loglog)} }
{center:log-complement {cmd:link(logc)} }
{pstd}
If an existing do-file is provided, the link function
must be defined in that do-file and only the distributional family
has to be specified. Available distributional families are
{center:Family {cmd:powersim} option}
{center:{hline 40}}
{center:Gaussian(normal) {cmd:family(gaussian} {it:[#]}{cmd:)} }
{center:inverse Gaussian {cmd:family(igaussian} {it:[#]}{cmd:)}}
{center:Binomial {cmd:family(binomial} {it:[#]}{cmd:)} }
{center:Poisson {cmd:family(poisson)} }
{center:negative binomial {cmd:family(nbinomial} {it:[#]}{cmd:)}}
{center:gamma {cmd:family(gamma} {it:[#]}{cmd:)} }
{pstd}
As it is also stated in {help glm}, not all family/link
combinations make sense. You may choose from the following combinations:
{c |} id log logit probit clog pow opower nbinomial loglog logc
{hline 10}{c +}{hline 67}
Gaussian {c |} x x x
inv. Gau. {c |} x x x
binomial {c |} x x x x x x x x x
Poisson {c |} x x x
neg. bin. {c |} x x x x
gamma {c |} x x x
{title:Examples}
{hline}
{pstd}Estimating power for various effect and sample sizes for a linear effect of
a normally ditributed predictor variable which is correlated
with another predictor variable in a linear model: {p_end}
{phang2}{cmd: powersim , ///} {p_end}
{phang2}{cmd: b(0.1(0.1)0.3) ///} {p_end}
{phang2}{cmd: alpha(0.05) ///} {p_end}
{phang2}{cmd: pos(1) ///} {p_end}
{phang2}{cmd: sample(100(20)300) ///} {p_end}
{phang2}{cmd: nreps(500) ///} {p_end}
{phang2}{cmd: family(gaussian) ///} {p_end}
{phang2}{cmd: link(identity) ///} {p_end}
{phang2}{cmd: cov1(x1 _bp normal 0 1) ///} {p_end}
{phang2}{cmd: cov2(x2 0.3 normal 0 1) ///} {p_end}
{phang2}{cmd: corr12(0.5) ///} {p_end}
{phang2}{cmd: inside ///} {p_end}
{phang2}{cmd: saving(psim_results) ///} {p_end}
{phang2}{cmd: dofile(psim_dofile) : reg y x1 x2 } {p_end}
{pstd}The assumed model in the above call to {cmd:powersim} can be expressed as: {p_end}
{pstd}y = _bp*x1 + 0.3*x2 + e, e ~ N(0,1) {p_end}
{pstd}where _bp is a placeholder for the effect sizes specified in {opt b()};
x1 and x2 are drawn from a multivariate normal distribution with rho = 0.5. {p_end}
{hline}
{pstd}Using an existing do-file (here, the one that was created with the above example):{p_end}
{phang2}{cmd: powersim using psim_dofile, ///} {p_end}
{phang2}{cmd: b(0.1(0.1)0.3) ///} {p_end}
{phang2}{cmd: alpha(0.05) ///} {p_end}
{phang2}{cmd: pos(1) ///} {p_end}
{phang2}{cmd: sample(100(20)300) ///} {p_end}
{phang2}{cmd: nreps(500) ///} {p_end}
{phang2}{cmd: family(gaussian) ///} {p_end}
{phang2}{cmd: inside ///} {p_end}
{phang2}{cmd: saving(psim_results, replace) : reg y x1 x2 } {p_end}
{hline}
{pstd}Poisson model with two correlated predictor variables and power estimated
for the interaction effect:{p_end}
{phang2}{cmd: powersim , ///} {p_end}
{phang2}{cmd: b(0.1) ///} {p_end}
{phang2}{cmd: alpha(0.05) ///} {p_end}
{phang2}{cmd: pos(3) ///} {p_end}
{phang2}{cmd: sample(300) ///} {p_end}
{phang2}{cmd: nreps(500) ///} {p_end}
{phang2}{cmd: family(poisson) ///} {p_end}
{phang2}{cmd: link(log) ///} {p_end}
{phang2}{cmd: cov1(x1 -0.25 normal 0 1) ///} {p_end}
{phang2}{cmd: cov2(x2 0.4 normal 0 1) ///} {p_end}
{phang2}{cmd: inter1(_bp x1*x2) ///} {p_end}
{phang2}{cmd: cons(0.5) ///} {p_end}
{phang2}{cmd: corr12(0.5) ///} {p_end}
{phang2}{cmd: inside ///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y c.x1##c.x2, family(poisson) link(log) } {p_end}
{hline}
{pstd}Logistic regression model for a 2x2 balanced block design with an interaction effect,
for which the statistical power is simulated:{p_end}
{phang2}{cmd: powersim , ///} {p_end}
{phang2}{cmd: b(0.3) ///} {p_end}
{phang2}{cmd: alpha(0.05) ///} {p_end}
{phang2}{cmd: pos(8) ///} {p_end}
{phang2}{cmd: sample(2000) ///} {p_end}
{phang2}{cmd: nreps(500) ///} {p_end}
{phang2}{cmd: family(binomial) ///} {p_end}
{phang2}{cmd: link(logit) ///} {p_end}
{phang2}{cmd: block22(0.5 x1 0.5 x2 _bp) ///} {p_end}
{phang2}{cmd: cons(0.5) ///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y i.x1##i.x2, family(binomial) link(logit)} {p_end}
{hline}
{pstd}If power is estimated for several effect and/or sample sizes, power
can be plotted as a function of sample size (by effect sizes) after
simulations are completed (see {help powersimplot}): {p_end}
{phang2}{cmd: powersimplot} {p_end}
{hline}
{pstd}Power can also be plotted as a function of effect size if power is simulated
for varying effect sizes (see {help powersimplot}). The following simulation
assumes a negative binomial model with overdispersion parameter alpha = 0.5
(note that if alpha does not equal 1, {opt ml} must be specified in the {opt family()}
option of {cmd: glm} in the analysis model command), log link function,
and a single binomial predictor variable, distributed Bi(1,0.5), for a fixed sample
size of N = 500; effect sizes here are specified as incidence rate ratios
via usage of the {opt expb} option: {p_end}
{phang2}{cmd: powersim , ///} {p_end}
{phang2}{cmd: b(1(.05)1.5) ///} {p_end}
{phang2}{cmd: expb ///} {p_end}
{phang2}{cmd: alpha(0.05) ///} {p_end}
{phang2}{cmd: pos(1) ///} {p_end}
{phang2}{cmd: sample(500) ///} {p_end}
{phang2}{cmd: nreps(500) ///} {p_end}
{phang2}{cmd: family(nbinomial 0.5) ///} {p_end}
{phang2}{cmd: link(log) ///} {p_end}
{phang2}{cmd: cov1(x1 _bp binomial 0.5) ///} {p_end}
{phang2}{cmd: inside ///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y x1, family(nbinomial ml) link(log) } {p_end}
{pstd}Now plot the power curve as power vs. effect size: {p_end}
{phang2}{cmd: powersimplot, e } {p_end}
{hline}
{pstd}Simulating power for various effect and sample sizes for
the difference between two equally sized groups in an inverse
Gaussian model with scale parameter sigma = 0.5 and canonical
link function: {p_end}
{phang2}{cmd: powersim , ///} {p_end}
{phang2}{cmd: b(0.2(0.1)0.5) ///} {p_end}
{phang2}{cmd: alpha(0.05) ///} {p_end}
{phang2}{cmd: pos(2) ///} {p_end}
{phang2}{cmd: sample(300(100)500) ///} {p_end}
{phang2}{cmd: nreps(250) ///} {p_end}
{phang2}{cmd: family(igaussian 0.5) ///} {p_end}
{phang2}{cmd: link(power -2) ///} {p_end}
{phang2}{cmd: cov1(x1 _bp block 2) ///} {p_end}
{phang2}{cmd: cons(1) ///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y i.x1, fam(igaussian) link(power -2)} {p_end}
{pstd}Note that the random number generator for the inverse Gaussian distribution
may become slow, depending on specified predictor and scale parameters.{p_end}
{hline}
{title:Saved results}
{pstd}
{cmd:powersim} saves the following in {cmd:r()}:
{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Scalars}{p_end}
{synopt:{cmd:r(niter)}}number of MC replications (requested, per sample/effect size combination){p_end}
{synopt:{cmd:r(alpha)}}alpha level{p_end}
{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:r(model)}}analysis model command line{p_end}
{synopt:{cmd:r(cmd)}}{cmd:powersim}{p_end}
{synopt:{cmd:r(effects)}}effect sizes{p_end}
{synopt:{cmd:r(samples)}}sample sizes{p_end}
{synopt:{cmd:r(iseed)}}initial random number seed{p_end}
{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Matrices}{p_end}
{synopt:{cmd:r(power)}}results matrix{p_end}
{p2colreset}{...}
{title:Acknowledgment}
{p 4 4 2}Ariel Linden suggested the {opt silent} option.
{title:Author}
{p 4 4 2}Joerg Luedicke{break}
Yale University and University of Florida{break}
United States{break}
email: joerg.luedicke@ufl.edu
{title:Also see}
{psee}
Manual: {manlink R glm}
{p_end}
{psee}
Manual: {manlink R regress}
{p_end}
{psee}
{space 2}Help: {help powersimplot}
{p_end}