{smcl}
{.-}
help for {cmd:powercal} {right:(Roger Newson)}
{.-}

{title:Generalized power calculations saving results in variables}

{p 8 27}
{cmd:powercal} {it:newvarname} [{cmd:if} {it:exp}] [{cmd:in} {it:range}] , [
{cmdab:n:unit}{cmd:(}{it:expression_1}{cmd:)}
{cmdab:p:ower}{cmd:(}{it:expression_2}{cmd:)}
{cmdab:a:lpha}{cmd:(}{it:expression_3}{cmd:)}
{cmdab:d:elta}{cmd:(}{it:expression_4}{cmd:)}
{cmdab:s:dinf}{cmd:(}{it:expression_5}{cmd:)}
{cmdab:t:df}{cmd:(}{it:expression_6}{cmd:)}
{cmd:no}{cmdab:ce:iling} {cmd:float} ]

{pstd}
where {it:expression} is a numeric expression. The numeric expression for each option must be in the form required by the {cmd:generate} command. That is to say, each expression must be specified so that the command

{pstd}
{cmd:gene double }{it:newvarname}{cmd:=(}{it:expression}{cmd:)}

{pstd}
will work.

{title:Description}

{pstd}
{cmd:powercal} performs generalized power calculations, storing the result in a new variable with a name specified by {it:newvarname}. All except one of the options {cmd:nunit}, {cmd:power}, {cmd:alpha}, {cmd:delta} and {cmd:sdinf} must be specified. The single unspecified option in this list specifies whether the output variable is the number of sampling units, power, alpha (significance level), delta (difference in parameter value to be detected), or the standard deviation (SD) of the influence function. Any of these 5 quantities can be calculated from the other 4. {cmd:powercal} can be used to calculate any of these quantities, assuming that we are testing a hypothesis that a parameter is zero, and that the true value is given by {cmd:delta}, and that the sample statistic is distributed around the population parameter in such a way that the pivotal quantity

{pstd}
{hi:PQ = sqrt(nunits) {c 42} (delta/sdinf)}

{pstd}
has a standard Normal distribution (if {cmd:tdf()} is not specified) or a t-distribution with {cmd:tdf()} degrees of freedom (if {cmd:tdf()} is specified).
The formulas used by {cmd:powercal} define power as the probability of detecting a difference in the right direction, using a two-tailed test.

{title:Options}

{p 0 4}{cmd:nunit(}{it:expression_1}{cmd:)} gives an expression whose value is the number of independent sampling units. Sampling units are defined very generally. For instance, in an experiment involving equal-sized samples of individuals from Population A and Population B, a sampling unit might be a pair of sampled individuals, one from each population. Similarly, in a case-control study with 4 controls per case, a sampling unit might be a case together with 4 controls.

{p 0 4}{cmd:power(}{it:expression_2}{cmd:)} gives an expression whose value is the power to detect a difference specified by the {cmd:delta()} option (see below). The power is defined as the probability that the sample difference is in the correct direction, and also large enough to be significant, using a 2-tailed test, at the level specified by the {cmd:alpha()} option (see below).

{p 0 4}{cmd:alpha(}{it:expression_3}{cmd:)} gives an expression whose value is the size, or significance level, of the statistical test (in units of probability, not percentage).

{p 0 4}{cmd:delta(}{it:expression_4}{cmd:)} gives an expression whose value is the true population difference to be detected. This difference is assumed to be positive. Therefore, if the user wishes to detect a negative difference, then s/he should specify an expression equal to minus that difference. The difference may be the log of a ratio parameter, such as an odds ratio, rate ratio, risk ratio or ratio of geometric means.

{p 0 4}{cmd:sdinf(}{it:expression_5}{cmd:)} gives an expression whose value is the standard deviation of the influence function.
That is to say, it is an expression equal to the expected standard error of the sample difference multiplied by the square root of the number of sampling units, where sampling units are defined generally, as specified in the option {cmd:nunit()}. In the simple case of a paired t-test, {cmd:sdinf()} is the standard deviation of the paired differences. More generally, {cmd:sdinf()} can be defined by calculating a standard error for a particular number of units, from a pilot study, from a simulation or from a formula, and multiplying this standard error by the square root of the number of units in the pilot study, simulation or formula.

{p 0 4}{cmd:tdf(}{it:expression_6}{cmd:)} gives an expression whose value is the degrees of freedom of the t-distribution to be assumed for the pivotal quantity {hi:PQ} specified above. The degrees of freedom expression is not necessarily integer-valued. If {cmd:tdf()} is absent, then {hi:PQ} is assumed to follow a standard Normal distribution.

{p 0 4}{cmd:noceiling} specifies that, if the output variable specified by {it:newvarname} is a number of units, then it will not be rounded up to the lowest integer no less than itself (as calculated by the Stata {cmd:ceil()} function). This option can be useful if the output variable is intended to specify an amount of exposure, such as a number of person-years, and the input {cmd:sdinf()} expression specifies a standard deviation of the influence function per unit exposure. If {cmd:noceiling} is not specified, and {cmd:power()}, {cmd:alpha()}, {cmd:delta()} and {cmd:sdinf()} are specified, then {cmd:powercal} rounds up the output variable, so that it contains a whole number of units.

{p 0 4}{cmd:float} specifies that the output variable will have a {help datatypes:storage type} no higher than {hi:float}. If {cmd:float} is not specified, then {cmd:powercal} creates the output variable with storage type {hi:double}.
Whether or not {cmd:float} is specified, {cmd:powercal} compresses the output variable as much as possible without loss of precision. (See help for {help compress}.)

{title:Remarks}

{pstd}
{cmd:powercal} carries out sample size calculations for a more general range of possible experimental designs than {help sampsi}, and stores the result in a new variable, instead of reporting the result in the log. The new variable may be input to further calculations and/or plotted and/or listed. {cmd:powercal} is intended as a low-level programming tool for users intending to carry out sample size calculations for a given experimental design. It is the responsibility of the user to ensure that the expressions are correct, and to choose a parameter scale on which the parameter is expected to be Normally distributed (or t-distributed), with a variance that does not vary excessively with the size of the measured difference.

{pstd}
The formulas used by {cmd:powercal} define power as the probability of detecting a difference in the right direction, using a two-tailed test. It follows that, in the limit, as the difference {hi:delta} tends to zero, the power to detect a difference of {hi:delta} with a P-value of {hi:alpha} tends to a minimum of {hi:alpha/2}, and not to a minimum of {hi:alpha}. {cmd:powercal} converts to missing the results of all input expressions for {cmd:power()} and {cmd:alpha()} which evaluate to a number outside the open interval {hi:(0,1)}, and the results of all input expressions for {cmd:delta()}, {cmd:sdinf()} and {cmd:nunit()} which evaluate to a non-positive number. {cmd:powercal} also converts to missing all values in the output variable for which there is not a unique maximum or minimum value of the output quantity. See Newson (2004), or the manual {hi:powercal.pdf} (distributed as an ancillary file with the {cmd:powercal} package), for details of the Methods and Formulas.
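
{pstd}
The limiting value of {hi:alpha/2} can be checked by hand. The following is a minimal sketch, not the code used by {cmd:powercal}: it assumes that {cmd:tdf()} is not specified, and that power, as defined above, is then the standard Normal probability of falling below {hi:PQ} minus the upper two-tailed critical value. The scalar names and input values are hypothetical, and {cmd:normal()} and {cmd:invnormal()} are the names used in recent Stata versions for the functions called {cmd:norm()} and {cmd:invnorm()} in older versions.
{p_end}

{p 8 16}{inp:. * Hypothetical inputs: alpha=0.05, delta=0.5, sdinf=1, nunit=50 *}{p_end}
{p 8 16}{inp:. scal alpha=0.05}{p_end}
{p 8 16}{inp:. scal zcrit=invnormal(1-alpha/2)}{p_end}
{p 8 16}{inp:. scal pq=sqrt(50)*(0.5/1)}{p_end}
{p 8 16}{inp:. * Power under the assumed Normal formula *}{p_end}
{p 8 16}{inp:. disp as text "Power: " as result normal(pq-zcrit)}{p_end}
{p 8 16}{inp:. * With delta=0, PQ=0, and the same formula gives alpha/2 *}{p_end}
{p 8 16}{inp:. disp as text "Limit: " as result normal(-zcrit)}{p_end}

{pstd}
Under these assumptions, the second {cmd:disp} command evaluates {hi:normal(-invnormal(1-alpha/2))}, which is {hi:alpha/2} by the symmetry of the Normal distribution.
{p_end}
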
{title:Examples}

{pstd}
The following examples are explained in detail in the manual {hi:powercal.pdf}, which is distributed with the {cmd:powercal} package.

{pstd}
This example creates Figure 1, displaying power as a function of the geometric mean ratio between 2 treatment groups:

{p 8 16}{inp:. clear}{p_end}
{p 8 16}{inp:. scal cv=0.5}{p_end}
{p 8 16}{inp:. scal sdlog=sqrt(log(cv*cv + 1))}{p_end}
{p 8 16}{inp:. scal r20=exp(-2*sdlog*invnorm(0.2))}{p_end}
{p 8 16}{inp:. disp _n as text "Coefficient of variation: " as result cv _n as text "SD of logs: " as result sdlog _n as text "20% tail ratio: " as result r20}{p_end}
{p 8 16}{inp:. set obs 100}{p_end}
{p 8 16}{inp:. gene logratio=log(2)*(_n/_N)}{p_end}
{p 8 16}{inp:. lab var logratio "Log GM ratio"}{p_end}
{p 8 16}{inp:. gene gmratio=exp(logratio)}{p_end}
{p 8 16}{inp:. lab var gmratio "GM ratio"}{p_end}
{p 8 16}{inp:. powercal power, alpha(0.01) delta(logratio) sdinf(sdlog*sqrt(2)) nunit(50) tdf(98)}{p_end}
{p 8 16}{inp:. line power gmratio, sort ylab(0(0.05)1) yline(0.8 0.9) xlab(1(0.1)2) xscale(log range(1 2))}{p_end}

{pstd}
This example creates Figure 2, displaying detectable geometric mean ratios between 2 groups as a function of number per group:

{p 8 16}{inp:. clear}{p_end}
{p 8 16}{inp:. scal cv=0.5}{p_end}
{p 8 16}{inp:. scal sdlog=sqrt(log(cv*cv + 1))}{p_end}
{p 8 16}{inp:. scal r20=exp(-2*sdlog*invnorm(0.2))}{p_end}
{p 8 16}{inp:. disp _n as text "Coefficient of variation: " as result cv _n as text "SD of logs: " as result sdlog _n as text "20% tail ratio: " as result r20}{p_end}
{p 8 16}{inp:. set obs 100}{p_end}
{p 8 16}{inp:. gene npergp=_n}{p_end}
{p 8 16}{inp:. lab var npergp "Number per group"}{p_end}
{p 8 16}{inp:. powercal logratio, power(0.9) alpha(0.01) sdinf(sdlog*sqrt(2)) nunit(npergp) tdf(2*(npergp-1))}{p_end}
{p 8 16}{inp:. gene hiratio=exp(logratio)}{p_end}
{p 8 16}{inp:. gene loratio=exp(-logratio)}{p_end}
{p 8 16}{inp:. lab var hiratio "Detectable GM ratio >1"}{p_end}
{p 8 16}{inp:. lab var loratio "Detectable GM ratio <1"}{p_end}
{p 8 16}{inp:. line hiratio loratio npergp if _n>=5, xlab(0(10)100)}{p_end}

{pstd}
This example creates Figures 3 and 4, displaying, respectively, detectable odds ratios in a case-control study as a function of number of cases and attainable significance levels as a function of odds ratio:

{p 8 16}{inp:. clear}{p_end}
{p 8 16}{inp:. scal conprev=0.25}{p_end}
{p 8 16}{inp:. scal conodds=conprev/(1-conprev)}{p_end}
{p 8 16}{inp:. disp _n as text "Expected control prevalence: " as result conprev _n as text "Expected control odds: " as result conodds}{p_end}
{p 8 16}{inp:. set obs 101}{p_end}
{p 8 16}{inp:. gene logor=log(1.25)+(log(5)-log(1.25))*(_n-1)/(_N-1)}{p_end}
{p 8 16}{inp:. gene or=exp(logor)}{p_end}
{p 8 16}{inp:. gene caseodds=conodds*or}{p_end}
{p 8 16}{inp:. gene caseprev=caseodds/(1+caseodds)}{p_end}
{p 8 16}{inp:. gene sdinflor=sqrt( 1/caseprev + 1/(1-caseprev) + (1/2)*( 1/conprev + 1/(1-conprev) ) )}{p_end}
{p 8 16}{inp:. lab var logor "Log odds ratio"}{p_end}
{p 8 16}{inp:. lab var or "Odds ratio"}{p_end}
{p 8 16}{inp:. lab var caseodds "Case exposure odds"}{p_end}
{p 8 16}{inp:. lab var caseprev "Case exposure prevalence"}{p_end}
{p 8 16}{inp:. lab var sdinflor "SD of influence for log OR"}{p_end}
{p 8 16}{inp:. desc}{p_end}
{p 8 16}{inp:. * Detectable OR by number of cases *}{p_end}
{p 8 16}{inp:. powercal ncases, power(0.9) alpha(0.01) delta(logor) sdinf(sdinflor)}{p_end}
{p 8 16}{inp:. line or ncases if ncases<=2000, yscale(log)}{p_end}
{p 8 16}{inp:. more}{p_end}
{p 8 16}{inp:. * Significance level by odds ratio *}{p_end}
{p 8 16}{inp:. powercal alphamin, power(0.9) delta(logor) sdinf(sdinflor) nunit(100)}{p_end}
{p 8 16}{inp:. line alphamin or, yscale(log reverse) ylab(1 0.05 1e-1 1e-2 1e-3 1e-4 1e-5 1e-6 1e-7) xscale(log) xlab(1 1.25 1.5 2(1)5)}{p_end}
{p 8 16}{inp:. more}{p_end}

{title:Author}

{pstd}
Roger Newson, National Heart and Lung Institute, Imperial College London, UK.
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}

{title:References}

{pstd}
Newson R. 2004. Generalized power calculations for generalized linear models and more. {it:The Stata Journal} 4(4): 379-401. Download from {browse "http://www.stata-journal.com/article.html?article=st0074":The Stata Journal website}.

{title:Also see}

{p 0 21}
{bind: }Manual: {hi:[R] sampsi}
{p_end}
{p 0 21}
On-line: help for {helpb sampsi}
{p_end}