{smcl}
{* *! Help for -powersim- version 1.0.0  26July2013}{...}
{hline}
help {hi: powersim}
{hline}

{title:Title}

{p2colset 5 16 18 2}{...}
Simulation-based power analysis for linear and generalized linear models
{p2colreset}{...}


{title:Syntax}

{p 8 12 2}
{cmd:powersim}
[{cmd:using} {it:filename}] {it:options}
[{cmd:,} {it:options}] : {it:model_cmd}

{synoptset 30 tabbed}{...}
{synopthdr}
{synoptline}

{syntab :Power}
{synopt :{opth b(numlist)}} List of effect sizes, required{p_end}
{synopt :{opt null(#)}} Value of the null hypothesis, defaults to zero{p_end}
{synopt :{opth sample:size(numlist)}} List of sample sizes, required{p_end}
{synopt :{opt a:lpha(#)}} alpha value, defaults to alpha = 0.05{p_end}
{synopt :{opt pos:ition(#)}} column position of b in matrix e(b) of the analysis model, required{p_end}

{syntab :Data generating model}
{synopt :{opth f:amily(powersim##familyname:familyname)}} error distribution, required{p_end}
{synopt :{opth l:ink(powersim##linkname:linkname)}} link function, required* {p_end}
{synopt :{opt cov1(n b d # [#])}} covariate specifications, covariate 1 {p_end}
{synopt :{opt cov2(n b d # [#])}} covariate specifications, covariate 2 {p_end}
{synopt :{opt cov3(n b d # [#])}} covariate specifications, covariate 3 {p_end}
{synopt :{opt block22(b1 n1 b2 n2 [ b3 ])}} binary variables for 2x2 block design {p_end}
{synopt :{opt cons(b)}} constant term {p_end}
{synopt :{opt corr12(#)}} Pearson correlation between cov1 and cov2 {p_end}
{synopt :{opt corr13(#)}} Pearson correlation between cov1 and cov3 {p_end}
{synopt :{opt corr23(#)}} Pearson correlation between cov2 and cov3 {p_end}
{synopt :{opt inter1(# n*n)}} Interaction effect specification {p_end}
{synopt :{opt inter2(# n*n)}} Interaction effect specification {p_end}
{synopt :{opt inter3(# n*n)}} Interaction effect specification {p_end}

{syntab :Simulation}
{synopt :{opt nreps(#)}} number of Monte Carlo replications {p_end}
{synopt :{opt inside}} Generate predictor data inside simulation loop  {p_end}
{synopt :{opt dry:run}} Dry run {p_end}
{synopt :{opt det:ail}} Detailed output for dry run {p_end}
{synopt :{opt gen:data}} create a single dataset {p_end}
{synopt :{opt nobs(#)}} Number of observations for created dataset {p_end}
{synopt :{opt seed(#)}} Set random-number seed {p_end}
{synopt :{opt expb}} Interpret effect sizes b as exp(b) {p_end}
{synopt :{opt nodots}} Suppress replication dots {p_end}
{synopt :{opt sil:ent}} Suppress any intermediate output (implies {opt nodots}) {p_end}
{synopt :{opt add:scalar(string)}} Fetch scalar stored in e() of analysis model {p_end}
{synopt :{opt maxiter(#)}} Set maximum number of iterations for the analysis model {p_end}
{synopt :{opt force}} Allow analysis model command other then {cmd: regress} or {cmd: glm} {p_end}
{synopt :{opt df(string)}} Use Students t-distribution for significance tests with df stored in {opt e(string)} of the analysis model {p_end}

{syntab :Saving}
{synopt :{opt do:file(filename[,replace])}} Save do-file to {it:filename} {p_end}
{synopt :{opt sav:ing(filename[,replace])}} Save simulation results to {it:filename} {p_end}

{synoptline}
{p 4 6 2}
* specification of a link function is only required if the do-file is generated via command options;
if an existing do-file is provided, the link function must be defined in that do-file.

{synoptline}
{p2colreset}{...}

{marker familyname}{...}
{synoptset 23}{...}
{synopthdr :familyname}
{synoptline}
{synopt :{opt gaussian} [{it:#}]}Gaussian (normal), with SD of Gaussian error {it:#}; defaults to 1{p_end}
{synopt :{opt igaussian} [{it:#}]}inverse Gaussian, with scale parameter {it:#}; defaults to 0.25{p_end}
{synopt :{opt binomial} [{it:#}]}Binomial, with number of trials {it:#}; defaults to 1 (i.e., Bernoulli){p_end}
{synopt :{opt poisson}}Poisson{p_end}
{synopt :{opt nbinomial} [{it:#}]}negative binomial, with overdispersion {it:#}; defaults to 1{p_end}
{synopt :{opt gamma} [{it:#}]}gamma, with scale parameter {it:#}; defaults to 1 {p_end}
{synoptline}
{p2colreset}{...}

{marker linkname}{...}
{synoptset 23}{...}
{synopthdr :linkname}
{synoptline}
{synopt :{opt identity}}identity{p_end}
{synopt :{opt log}}log{p_end}
{synopt :{opt logit}}logit{p_end}
{synopt :{opt probit}}probit{p_end}
{synopt :{opt cloglog}}cloglog{p_end}
{synopt :{opt power} {it:#}}power{p_end}
{synopt :{opt opower} {it:#}}odds power{p_end}
{synopt :{opt nbinomial}}negative binomial{p_end}
{synopt :{opt loglog}}log-log{p_end}
{synopt :{opt logc}}log-complement{p_end}
{synoptline}
{p2colreset}{...}

{marker covariate_distributions}{...}
{synoptset 23}{...}
{synopthdr :covariate_distributions}
{synoptline}
{synopt :{opt normal} {it:a#} {it:b#}}Gaussian, with M {it:a#} and SD {it:b#} {p_end}
{synopt :{opt poisson} {it:a#}}Poisson, with mean {it:a#}{p_end}
{synopt :{opt uniform} {it:a#} {it:b#}}uniform, ranging from {it:a#} to {it:b#}{p_end}
{synopt :{opt binomial} {it:a#}}bernoulli, i.e. binomial(1, {it:a#})  {p_end}
{synopt :{opt chi2} {it:a#}}chi-square, with {it:a#} df{p_end}
{synopt :{opt studentt} {it:a#}}Student's t, with {it:a#} df{p_end}
{synopt :{opt beta} {it:a#} {it:b#}}beta, with shape parameters {it:a#} and {it:b#}{p_end}
{synopt :{opt gamma} {it:a#} {it:b#}}gamma, with shape parameter {it:a#} and scale parameter {it:b#}{p_end}
{synopt :{opt nbinomial} {it:a#} {it:b#}}negative binomial, with number of failures {it:a#} and success probability {it:b#}{p_end}
{synopt :{opt block} {it:a#}}equally sized groups, with number of groups {it:a#} (2-4 groups) {p_end}

{synoptline}
{p2colreset}{...}


{title:Description}

{pstd}
{opt powersim} performs power analysis Monte Carlo simulations for linear and 
generalized linear models. Data will be generated according to a model that 
is specified via command options. Predictor data can be generated by either 
using command options or by providing an existing do-file. Stata's {help regress}
and {help glm} commands can be used for specifying the analysis model. See
the accompanying tutorial for a more detailed introduction.


{title:Options}

{dlgtab:Power}

{phang}
{opth b(numlist)} List of the true population parameters.

{phang}
{opt null(#)} Value of the null hypothesis; if the {opt expb} option is specified,
the value will be interpreted as exponentiated value.

{phang}
{opth sample:size(numlist)} List of sample sizes.

{phang}
{opt a:lpha(#)} alpha level, default is alpha = 0.05.

{phang}
{opt pos:ition(#)} column position of the coefficient in matrix e(b) (see {help ereturn list})
of the analysis model for which power wants to be estimated.


{dlgtab:Data generating model}

{phang}
{opt f:amily(familyname)} specifies the (conditional) distribution of the outcome.

{phang}
{opt l:ink(linkname)} specifies the link function.

{phang}
{opt cov1(n b d # [#])} covariate specifications for covariate 1, where 
n is the name of the covariate, b its effect size, and d its distribution, followed 
by one or two parameter values, depending on the distribution 
(see {help powersim##covariate_distributions:covariate_distributions)}. 
Type {cmd: _bp} instead of an effect size value in case of the covariate for 
which the power wants to be simulated.

{phang}
{opt cov2(n b d # [#])} covariate specifications as described above, covariate 2.

{phang}
{opt cov3(n b d # [#])} covariate specifications as described above, covariate 3.

{phang}
{opt block22(b1 n1 b2 n2 [b3])} creates two binary variables for a 2x2 block design
with balanced group sizes; n1 and n2 are the names of the two variables and b1 and b2 are 
its coefficients; a coefficient for an interaction effect of the two variables can be specified
with b3.

{phang}
{opt constant(b)} b is the coefficient for the constant term; if this option is omitted,
the constant is assumed to equal zero.

{phang}
{opt corr12(#)} specifies Pearson correlation coefficient for cov1 and cov2, applicable only to
normally distributed covariates; if this option is omitted, cov1 and cov2 are assumed to be 
uncorrelated (orthogonal).

{phang}
{opt corr13(#)} specifies Pearson correlation coefficient for cov1 and cov3, applicable only to
normally distributed covariates; if this option is omitted, cov1 and cov3 are assumed to be 
uncorrelated (orthogonal).

{phang}
{opt corr23(#)} specifies Pearson correlation coefficient for cov2 and cov3, applicable only to
normally distributed covariates; if this option is omitted, cov2 and cov3 are assumed to be 
uncorrelated (orthogonal).

{phang}
{opt inter1(# n*n)} specifies an interaction effect; for example, to generate data 
with an interaction effect of 0.5 for cov2 and cov3 with names foo and bar, 
type {cmd:inter1(0.5 foo*bar)}.

{phang}
{opt inter2(# n*n)} specifies an additional interaction effect. 

{phang}
{opt inter3(# n*n)} specifies an additional interaction effect.


{dlgtab:Simulation}

{phang}
{opt nreps(#)} number of Monte Carlo replications. 

{phang}
{opt inside} Generates predictor data inside the simulation loop; should be
specified if the predictor data itself is regarded as a stochastic component.

{phang} 
{opt dry:run} Performs a dry run in order to check command inputs before running the simulations.

{phang}
{opt det:ail} Additionally prints the contents of the data generating do-file on the screen
if the {opt dryrun} option is specified.

{phang}
{opt gen:data} creates a single realization of the data. 

{phang}
{opt nobs(#)} Determines the number of observations for the dataset requested with 
the {opt gendata} option; defaults to N = 10,000. 

{phang}
{opt seed(#)} Sets the initial random-number seed.

{phang}
{opt expb} Effect sizes specified in option {opt b()} are interpreted as exp(b); for example, this could be
useful if the power wants to be determined in the context of a logistic regression and the user wants 
to specify effect sizes as odds ratios, rather than as log odds ratios.

{phang}
{opt nodots} Suppresses the replication dots during the simulations. By default, one dot character is
displayed for each successful replication.  A red `x' is displayed if {it:model_cmd} returns
an error (for example, if an analysis model does not converge within the specified maximum
number of iterations).

{phang}
{opt sil:ent} Suppresses any intermediate output during simulations (implies {opt nodots}).

{phang}
{opt add:scalar} Specifies to fetch a scalar that is available in {opt e()} of the analysis 
model from each MC replication; for example, if you would like to record the value of the 
log-likelihood from each replication of a {opt glm}, the option would be specified as addscalar(ll); 
the scalar will then be appended to the resulting dataset which contains results from each replication.

{phang}
{opt maxiter(#)} Specifies the number of maximum iterations for the analysis model, defaults to maxiter = 200;
 will be ignored for model commands other than {cmd: glm}. 

{phang}
{opt force} Allow analysis model command other then {cmd: regress} or {cmd: glm}. 

{phang}
{opt df(string)} Use Students t-distribution for significance tests with df 
stored in {opt e(string)} of the analysis model.


{dlgtab:Saving}

{phang}
{opt do:file(filename[,replace])} Save do-file to {it:filename}; if no existing do file is provided via 
{opt using} then this option is required. {p_end}

{phang}
{opt sav:ing(filename[,replace])} Save simulation results from each replication to {it:filename};  {p_end}


{title:Remarks}

{pstd}
Stata's {help regress} and {help glm} commands can be used to specify an analysis model under 
{opt model_cmd} after the colon of the {opt powersim} command. The outcome variable in
model_cmd must be named y. With {opt regress},
a t-distribution is used for siginificance testing and a normal distribution is used with 
{opt glm}. Note that {opt powersim} removes the current data in memory and replaces 
it with simulation results; make sure to save your data before running powersim. Also note
that {opt powersim} is creating a scalar with the name _bp. In case that a scalar with
the same name already exists, it will be overwritten. If multivariate
normal data is requested, Stata matrices with the names __M, __SD, and
__C are created which would replace any existing matrices with these names.


{pstd}
Note that both, distributional family and link function, need to be specified
for the data generating model. The available link functions are

{center:Link function            {cmd:powersim} option}
{center:{hline 40}}
{center:identity                 {cmd:link(identity)} }
{center:log                      {cmd:link(log)}      }
{center:logit                    {cmd:link(logit)}    }
{center:probit                   {cmd:link(probit)}   }
{center:complementary log-log    {cmd:link(cloglog)}  }
{center:odds power               {cmd:link(opower} {it:#}{cmd:)} }
{center:power                    {cmd:link(power} {it:#}{cmd:)}  }
{center:negative binomial        {cmd:link(nbinomial)}}
{center:log-log                  {cmd:link(loglog)}   }
{center:log-complement           {cmd:link(logc)}     }

{pstd}
If an existing do-file is provided, the link function
must be defined in that do-file and only the distributional family
has to be specified. Available distributional families are

{center:Family                   {cmd:powersim} option}
{center:{hline 40}}
{center:Gaussian(normal)   {cmd:family(gaussian} {it:[#]}{cmd:)} } 
{center:inverse Gaussian   {cmd:family(igaussian} {it:[#]}{cmd:)}}
{center:Binomial           {cmd:family(binomial} {it:[#]}{cmd:)} }
{center:Poisson            {cmd:family(poisson)}      }
{center:negative binomial  {cmd:family(nbinomial} {it:[#]}{cmd:)}}
{center:gamma              {cmd:family(gamma} {it:[#]}{cmd:)}    }


{pstd}
As it is also stated in {help glm}, not all family/link
combinations make sense.  You may choose from the following combinations:

          {c |} id  log  logit  probit  clog  pow  opower  nbinomial  loglog  logc
{hline 10}{c +}{hline 67}
Gaussian  {c |}  x   x                         x
inv. Gau. {c |}  x   x                         x
binomial  {c |}  x   x     x      x       x    x     x                  x      x
Poisson   {c |}  x   x                         x
neg. bin. {c |}  x   x                         x              x
gamma     {c |}  x   x                         x


{title:Examples}

{hline}

{pstd}Estimating power for various effect and sample sizes for a linear effect of
a normally ditributed predictor variable which is correlated
with another predictor variable in a linear model: {p_end}

{phang2}{cmd: powersim , 				///} {p_end}
{phang2}{cmd: b(0.1(0.1)0.3) 			///} {p_end}
{phang2}{cmd: alpha(0.05) 				///} {p_end}
{phang2}{cmd: pos(1)					///} {p_end}
{phang2}{cmd: sample(100(20)300)		///} {p_end}
{phang2}{cmd: nreps(500) 				///} {p_end}
{phang2}{cmd: family(gaussian) 			///} {p_end}
{phang2}{cmd: link(identity) 			///} {p_end}
{phang2}{cmd: cov1(x1 _bp normal 0 1)  	///} {p_end}
{phang2}{cmd: cov2(x2 0.3 normal 0 1)  	///} {p_end}
{phang2}{cmd: corr12(0.5)				///} {p_end}
{phang2}{cmd: inside 					///} {p_end}
{phang2}{cmd: saving(psim_results)		///} {p_end}
{phang2}{cmd: dofile(psim_dofile) : reg y x1 x2 } {p_end}

{pstd}The assumed model in the above call to {cmd:powersim} can be expressed as: {p_end}
{pstd}y = _bp*x1 + 0.3*x2 + e, e ~ N(0,1) {p_end}
{pstd}where _bp is a placeholder for the effect sizes specified in {opt b()};  
x1 and x2 are drawn from a multivariate normal distribution with rho = 0.5. {p_end}

{hline}

{pstd}Using an existing do-file (here, the one that was created with the above example):{p_end}

{phang2}{cmd: powersim using psim_dofile, 	///} {p_end}
{phang2}{cmd: b(0.1(0.1)0.3) 				///} {p_end}
{phang2}{cmd: alpha(0.05) 					///} {p_end}
{phang2}{cmd: pos(1)						///} {p_end}
{phang2}{cmd: sample(100(20)300)			///} {p_end}
{phang2}{cmd: nreps(500) 					///} {p_end}
{phang2}{cmd: family(gaussian) 				///} {p_end}
{phang2}{cmd: inside 						///} {p_end}
{phang2}{cmd: saving(psim_results, replace) : reg y x1 x2 } {p_end}

{hline}

{pstd}Poisson model with two correlated predictor variables and power estimated  
for the interaction effect:{p_end}

{phang2}{cmd: powersim , 					///} {p_end}
{phang2}{cmd: b(0.1) 						///} {p_end}
{phang2}{cmd: alpha(0.05) 					///} {p_end}
{phang2}{cmd: pos(3)						///} {p_end}
{phang2}{cmd: sample(300)					///} {p_end}
{phang2}{cmd: nreps(500) 					///} {p_end}
{phang2}{cmd: family(poisson) 				///} {p_end}
{phang2}{cmd: link(log) 					///} {p_end}
{phang2}{cmd: cov1(x1 -0.25 normal 0 1)  	///} {p_end}
{phang2}{cmd: cov2(x2 0.4 normal 0 1)  		///} {p_end}
{phang2}{cmd: inter1(_bp x1*x2)				///} {p_end}
{phang2}{cmd: cons(0.5)						///} {p_end}
{phang2}{cmd: corr12(0.5)					///} {p_end}
{phang2}{cmd: inside 						///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y c.x1##c.x2, family(poisson) link(log) } {p_end}

{hline}

{pstd}Logistic regression model for a 2x2 balanced block design with an interaction effect, 
for which the statistical power is simulated:{p_end}

{phang2}{cmd: powersim , 					///} {p_end}
{phang2}{cmd: b(0.3) 						///} {p_end}
{phang2}{cmd: alpha(0.05) 					///} {p_end}
{phang2}{cmd: pos(8)						///} {p_end}
{phang2}{cmd: sample(2000)					///} {p_end}
{phang2}{cmd: nreps(500) 					///} {p_end}
{phang2}{cmd: family(binomial) 				///} {p_end}
{phang2}{cmd: link(logit) 					///} {p_end}
{phang2}{cmd: block22(0.5 x1 0.5 x2 _bp)	///} {p_end}
{phang2}{cmd: cons(0.5)						///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y i.x1##i.x2, family(binomial) link(logit)} {p_end}

{hline}

{pstd}If power is estimated for several effect and/or sample sizes, power 
can be plotted as a function of sample size (by effect sizes) after
simulations are completed (see {help powersimplot}): {p_end}

{phang2}{cmd: powersimplot} {p_end}

{hline}

{pstd}Power can also be plotted as a function of effect size if power is simulated
for varying effect sizes (see {help powersimplot}). The following simulation
assumes a negative binomial model with overdispersion parameter alpha = 0.5 
(note that if alpha does not equal 1, {opt ml} must be specified in the {opt family()} 
option of {cmd: glm} in the analysis model command), log link function,
and a single binomial predictor variable, distributed Bi(1,0.5), for a fixed sample
size of N = 500; effect sizes here are specified as incidence rate ratios
via usage of the {opt expb} option: {p_end}

{phang2}{cmd: powersim , 				///} {p_end}
{phang2}{cmd: b(1(.05)1.5) 				///} {p_end}
{phang2}{cmd: expb 						///} {p_end}
{phang2}{cmd: alpha(0.05) 				///} {p_end}
{phang2}{cmd: pos(1)					///} {p_end}
{phang2}{cmd: sample(500)				///} {p_end}
{phang2}{cmd: nreps(500) 				///} {p_end}
{phang2}{cmd: family(nbinomial 0.5) 	///} {p_end}
{phang2}{cmd: link(log) 				///} {p_end}
{phang2}{cmd: cov1(x1 _bp binomial 0.5)	///} {p_end}
{phang2}{cmd: inside 					///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y x1, family(nbinomial ml) link(log) } {p_end}

{pstd}Now plot the power curve as power vs. effect size: {p_end}

{phang2}{cmd: powersimplot, e } {p_end}

{hline}

{pstd}Simulating power for various effect and sample sizes for
the difference between two equally sized groups in an inverse
Gaussian model with scale parameter sigma = 0.5 and canonical
link function: {p_end}

{phang2}{cmd: powersim , 				///} {p_end}
{phang2}{cmd: b(0.2(0.1)0.5) 			///} {p_end}
{phang2}{cmd: alpha(0.05) 				///} {p_end}
{phang2}{cmd: pos(2) 					///} {p_end}
{phang2}{cmd: sample(300(100)500) 		///} {p_end}
{phang2}{cmd: nreps(250) 				///} {p_end}
{phang2}{cmd: family(igaussian 0.5) 	///} {p_end}
{phang2}{cmd: link(power -2) 			///} {p_end}
{phang2}{cmd: cov1(x1 _bp block 2) 		///} {p_end}
{phang2}{cmd: cons(1) 					///} {p_end}
{phang2}{cmd: dofile(psim_dofile, replace) : ///} {p_end}
{phang2}{cmd: glm y i.x1, fam(igaussian) link(power -2)} {p_end}

{pstd}Note that the random number generator for the inverse Gaussian distribution 
may become slow, depending on specified predictor and scale parameters.{p_end}

{hline}


{title:Saved results}

{pstd}
{cmd:powersim} saves the following in {cmd:r()}:

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Scalars}{p_end}
{synopt:{cmd:r(niter)}}number of MC replications (requested, per sample/effect size combination){p_end}
{synopt:{cmd:r(alpha)}}alpha level{p_end}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:r(model)}}analysis model command line{p_end}
{synopt:{cmd:r(cmd)}}{cmd:powersim}{p_end}
{synopt:{cmd:r(effects)}}effect sizes{p_end}
{synopt:{cmd:r(samples)}}sample sizes{p_end}
{synopt:{cmd:r(iseed)}}initial random number seed{p_end}

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Matrices}{p_end}
{synopt:{cmd:r(power)}}results matrix{p_end}

{p2colreset}{...}


{title:Acknowledgment}

{p 4 4 2}Ariel Linden suggested the {opt silent} option.


{title:Author}

{p 4 4 2}Joerg Luedicke{break}
Yale University and University of Florida{break}
United States{break} 
email: joerg.luedicke@ufl.edu


{title:Also see}

{psee}
Manual:  {manlink R glm}
{p_end}

{psee}
Manual:  {manlink R regress}
{p_end}

{psee}
{space 2}Help:  {help powersimplot}
{p_end}