-------------------------------------------------------------------------------
help for powercal                                                (Roger Newson)
-------------------------------------------------------------------------------

Generalized power calculations saving results in variables

powercal newvarname [if exp] [in range] , [ nunit(expression_1) power(expression_2) alpha(expression_3) delta(expression_4) sdinf(expression_5) tdf(expression_6) noceiling float ]

where expression is a numeric expression. The numeric expression for each option must be in the form required by the generate command. That is to say, each expression must be specified so that the command

        gene double newvarname=(expression)

will work.
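A minimal sketch of the syntax, assuming the powercal package is installed. The inputs (delta 0.4, sdinf 1.2) are hypothetical illustration values. The first call solves for the number of units without rounding. The second call passes that output variable back in as the nunit() expression, which should recover the original power, since any of the 5 quantities can be calculated from the other 4.

```stata
* Sketch: solve for nunit, then recover power from that nunit
* (hypothetical inputs; assumes the powercal package is installed)
clear
set obs 1
* Number of units needed, not rounded up (noceiling)
powercal n_exact, power(0.8) alpha(0.05) delta(0.4) sdinf(1.2) noceiling
* Feed the result back in as a variable expression in nunit()
powercal pw_back, nunit(n_exact) alpha(0.05) delta(0.4) sdinf(1.2)
list n_exact pw_back
```

Constants, scalars and variables can be mixed freely in the option expressions, as in the examples further below.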

Description

powercal performs generalized power calculations, storing the result in a new variable with the name specified by newvarname. All except one of the options nunit(), power(), alpha(), delta() and sdinf() must be specified. The single unspecified option in this list determines whether the output variable is the number of sampling units, the power, alpha (the significance level), delta (the difference in parameter value to be detected), or the standard deviation (SD) of the influence function. Any of these 5 quantities can be calculated from the other 4. powercal calculates the chosen quantity assuming that we are testing the hypothesis that a parameter is zero, that the true value of the parameter is given by delta, and that the sample statistic is distributed around the population parameter in such a way that the pivotal quantity

        PQ = sqrt(nunits) * (delta/sdinf)

has a standard Normal distribution (if tdf() is not specified) or a t-distribution with tdf() degrees of freedom (if tdf() is specified). The formulas used by powercal define power as the probability of detecting a difference in the right direction, using a two-tailed test.
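As a sketch only (an assumption to be checked against the Methods and Formulas in powercal.pdf, which is the authoritative source), one reading of these assumptions gives the following relationship. Here n stands for nunit, delta for the difference, sigma for sdinf, and G for the cumulative distribution function of the reference distribution of PQ (standard Normal, or t with tdf() degrees of freedom):

$$ \mathrm{power} = G\left( \frac{\sqrt{n}\,\delta}{\sigma} - G^{-1}\left(1 - \frac{\alpha}{2}\right) \right) $$

Solving for n, when G does not itself depend on n (always the case for the standard Normal):

$$ n = \left[ \frac{\sigma \left( G^{-1}(\mathrm{power}) + G^{-1}(1 - \alpha/2) \right)}{\delta} \right]^2 $$

As delta tends to zero, this power tends to G(-G^{-1}(1 - alpha/2)) = alpha/2, by the symmetry of G.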

Options

nunit(expression_1) gives an expression whose value is the number of independent sampling units. Sampling units are defined very generally. For instance, in an experiment involving equal-sized samples of individuals from Population A and Population B, a sampling unit might be a pair of sampled individuals, one from each population. Similarly, in a case-control study with 4 controls per case, a sampling unit might be a case together with 4 controls.
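The sketch below shows how a result expressed in sampling units can be converted to numbers of individuals for the 1:4 case-control design just described. The values delta(log(2)) and sdinf(2.5) are hypothetical, and the sketch assumes the powercal package is installed.

```stata
* Sketch: convert sampling units to individuals for a 1:4 case-control design
* (delta = log(2) and sdinf(2.5) are hypothetical illustration values)
clear
set obs 1
powercal ncases, power(0.9) alpha(0.05) delta(log(2)) sdinf(2.5)
* Each sampling unit is one case plus 4 controls
generate long nsubjects = 5*ncases
list ncases nsubjects
```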

power(expression_2) gives an expression whose value is the power to detect a difference specified by the delta() option (see below). The power is defined as the probability that the sample difference is in the correct direction, and also large enough to be significant, using a 2-tailed test, at the level specified by the alpha() option (see below).

alpha(expression_3) gives an expression whose value is the size, or significance level, of the statistical test (in units of probability, not percentage).

delta(expression_4) gives an expression whose value is the true population difference to be detected. This difference is assumed to be positive. Therefore, if the user wishes to detect a negative difference, then s/he should specify an expression equal to minus that difference. The difference may be the log of a ratio parameter, such as an odds ratio, rate ratio, risk ratio or ratio of geometric means.
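A minimal sketch of detecting odds ratios below 1: the log odds ratio is negative, so delta() is given minus the log odds ratio. The odds ratios and sdinf(2) are hypothetical values, and the sketch assumes the powercal package is installed.

```stata
* Sketch: detecting odds ratios below 1 (hypothetical values)
clear
set obs 3
* Odds ratios 0.8, 0.7 and 0.6
generate double or_target = 0.8 - 0.1*(_n - 1)
* delta() must be positive, so use minus the (negative) log odds ratio
generate double delta_lor = -log(or_target)
powercal nunits_needed, power(0.9) alpha(0.05) delta(delta_lor) sdinf(2)
list or_target delta_lor nunits_needed
```

The odds ratio closest to 1 (0.8) gives the smallest delta, and so should need the most sampling units.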

sdinf(expression_5) gives an expression whose value is the standard deviation of the influence function. That is to say, it is an expression equal to the expected standard error of the sample difference multiplied by the square root of the number of sampling units, where sampling units are defined generally, as for the option nunit(). In the simple case of a paired t-test, sdinf() is the standard deviation of the paired differences. More generally, sdinf() can be derived by calculating a standard error for a particular number of units (from a pilot study, a simulation or a formula) and multiplying this standard error by the square root of the number of units in that pilot study, simulation or formula.
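The arithmetic described above can be sketched with scalars. All numbers here are hypothetical. The second part assumes two groups with a common SD s_grp, with each sampling unit being one individual from each group. The SE of the difference with n units is then s_grp*sqrt(2/n), so sdinf() equals s_grp*sqrt(2), which is the form used in the geometric mean ratio examples below.

```stata
* Sketch: deriving sdinf() from a standard error (hypothetical numbers)
clear
* From a pilot study: SE of the difference and number of units in the pilot
scalar pilot_se = 0.25
scalar pilot_n  = 40
scalar sdinf_pilot = pilot_se*sqrt(pilot_n)
display "sdinf from pilot: " sdinf_pilot

* From a formula: two groups with common SD s_grp, one individual per group
* per unit, so SE of the difference with n_units units is s_grp*sqrt(2/n_units)
scalar s_grp   = 1.5
scalar n_units = 30
scalar se_diff = s_grp*sqrt(2/n_units)
scalar sdinf_2gp = se_diff*sqrt(n_units)
display "sdinf for two groups: " sdinf_2gp "   s_grp*sqrt(2): " s_grp*sqrt(2)
```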

tdf(expression_6) gives an expression whose value is the degrees of freedom of the t-distribution to be assumed for the pivotal quantity PQ specified above. The degrees of freedom expression is not necessarily integer-valued. If tdf() is absent, then PQ is assumed to follow a standard Normal distribution.
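To see how much tdf() can matter, the sketch below compares two-sided 1% critical values under the standard Normal and t reference distributions. It assumes Stata 9 or later (for invnormal()) and a Stata version whose invttail() accepts non-integer degrees of freedom. The value 97.5 is a hypothetical non-integer df, such as an approximate Welch-type value.

```stata
* Sketch: two-sided 1% critical values (alpha = 0.01)
* (assumes Stata 9 or later for invnormal())
clear
display "Normal:     " invnormal(1 - 0.01/2)
display "t, 98 df:   " invttail(98, 0.01/2)
* tdf() need not be an integer
display "t, 97.5 df: " invttail(97.5, 0.01/2)
```

With 98 degrees of freedom, as in the first example below, the t critical value is only slightly larger than the Normal one, so tdf() matters mainly for small numbers of units.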

noceiling specifies that, if the output variable specified by newvarname is a number of units, then it will not be rounded up to the smallest integer not less than itself (as calculated by the Stata 8 ceil() function). This option can be useful if the output variable is intended to specify an amount of exposure, such as a number of person-years, and the input sdinf() expression specifies a standard deviation of the influence function per unit exposure. If noceiling is not specified, and power(), alpha(), delta() and sdinf() are specified, then powercal rounds up the output variable, so that it contains a whole number of units.
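A minimal sketch comparing the default rounding with noceiling. The inputs are hypothetical, and the sketch assumes the powercal package is installed. The two outputs should differ only by the rounding up described above.

```stata
* Sketch: effect of noceiling on a computed number of units
* (hypothetical inputs; assumes the powercal package is installed)
clear
set obs 1
* Default: rounded up to a whole number of units
powercal nwhole, power(0.9) alpha(0.05) delta(0.5) sdinf(1)
* noceiling: left unrounded, e.g. for person-years of exposure
powercal nexact, power(0.9) alpha(0.05) delta(0.5) sdinf(1) noceiling
list nwhole nexact
```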

float specifies that the output variable will have a storage type no higher than float. If float is not specified, then powercal creates the output variable with storage type double. Whether or not float is specified, powercal compresses the output variable as much as possible without loss of precision. (See help for compress.)
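The storage types can be checked with describe. In this sketch, the inputs are hypothetical and the powercal package is assumed to be installed. The powers computed here are non-integer, so compression is not expected to change either type.

```stata
* Sketch: storage type with and without the float option
* (hypothetical inputs; assumes the powercal package is installed)
clear
set obs 5
generate nper = 10*_n
powercal pw_double, nunit(nper) alpha(0.05) delta(0.5) sdinf(1)
powercal pw_float, nunit(nper) alpha(0.05) delta(0.5) sdinf(1) float
describe pw_double pw_float
```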

Remarks

powercal carries out sample size calculations for a more general range of experimental designs than sampsi, and stores the result in a new variable instead of reporting it in the log. The new variable may be input to further calculations, plotted, or listed. powercal is intended as a low-level programming tool for users who wish to carry out sample size calculations for a given experimental design. It is the responsibility of the user to ensure that the expressions are correct, and to choose a parameter scale on which the parameter estimate is expected to be Normally distributed (or t-distributed), with a variance that does not vary excessively with the size of the measured difference.

The formulas used by powercal define power as the probability of detecting a difference in the right direction, using a two-tailed test. It follows that, as the difference delta tends to zero, the power to detect a difference of delta at significance level alpha tends to a minimum of alpha/2, not to a minimum of alpha.

powercal converts to missing the results of all input expressions for power() and alpha() that evaluate to a number outside the open interval (0,1), and the results of all input expressions for delta(), sdinf() and nunit() that evaluate to a non-positive number. powercal also converts to missing all values in the output variable for which there is not a unique maximum or minimum value of the output quantity. See the manual powercal.pdf (distributed as an ancillary file with the powercal package) for details of the Methods and Formulas.
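The limiting value alpha/2 follows from the symmetry of the reference distribution: with critical value G^{-1}(1 - alpha/2) and zero difference, the probability of a significant result in the right direction is G(-G^{-1}(1 - alpha/2)) = alpha/2. The sketch below checks this numerically for both reference distributions, assuming Stata 9 or later for normal() and invnormal(). The t case uses 1 - ttail() as the t cumulative distribution function. The values of a_level and t_df are arbitrary.

```stata
* Sketch: numerical check that the limiting power is alpha/2
* (assumes Stata 9 or later for normal() and invnormal())
clear
scalar a_level = 0.05
scalar t_df    = 98
* Standard Normal: G(-Ginv(1 - alpha/2))
scalar lim_norm = normal(-invnormal(1 - a_level/2))
* t: G(x) = 1 - ttail(df, x) and Ginv(1 - alpha/2) = invttail(df, alpha/2)
scalar lim_t = 1 - ttail(t_df, -invttail(t_df, a_level/2))
display "Normal limit: " lim_norm "   t limit: " lim_t "   alpha/2: " a_level/2
```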

Examples

The following examples are explained in detail in the manual powercal.pdf, which is distributed with the powercal package. They are designed to work under both Stata 7 and Stata 8.

This example creates Figure 1, displaying power as a function of the geometric mean ratio between 2 treatment groups:

        . clear
        . scal cv=0.5
        . scal sdlog=sqrt(log(cv*cv + 1))
        . scal r20=exp(-2*sdlog*invnorm(0.2))
        . disp _n as text "Coefficient of variation: " as result cv _n as text "SD of logs: " as result sdlog _n as text "20% tail ratio: " as result r20
        . set obs 100
        . gene logratio=log(2)*(_n/_N)
        . lab var logratio "Log GM ratio"
        . gene gmratio=exp(logratio)
        . lab var gmratio "GM ratio"
        . powercal power, alpha(0.01) delta(logratio) sdinf(sdlog*sqrt(2)) nunit(50) tdf(98)
        . version 7: graph power gmratio, s(.) c(L) ylab(0(0.05)1) yline(0.8 0.9) xlab(1(0.1)2) xlog

This example creates Figure 2, displaying detectable geometric mean ratios between 2 groups as a function of number per group:

        . clear
        . scal cv=0.5
        . scal sdlog=sqrt(log(cv*cv + 1))
        . scal r20=exp(-2*sdlog*invnorm(0.2))
        . disp _n as text "Coefficient of variation: " as result cv _n as text "SD of logs: " as result sdlog _n as text "20% tail ratio: " as result r20
        . set obs 100
        . gene npergp=_n
        . lab var npergp "Number per group"
        . powercal logratio, power(0.9) alpha(0.01) sdinf(sdlog*sqrt(2)) nunit(npergp) tdf(2*(npergp-1))
        . gene hiratio=exp(logratio)
        . gene loratio=exp(-logratio)
        . lab var hiratio "Detectable GM ratio >1"
        . lab var loratio "Detectable GM ratio <1"
        . version 7: graph hiratio loratio npergp if _n>=5, s(..) c(LL) yline(1) ylab xlab(0(10)100)

This example creates Figures 3 and 4, displaying, respectively, detectable odds ratios in a case-control study as a function of number of cases and attainable significance levels as a function of odds ratio:

        . clear
        . scal conprev=0.25
        . scal conodds=conprev/(1-conprev)
        . disp _n as text "Expected control prevalence: " as result conprev _n as text "Expected control odds: " as result conodds
        . set obs 101
        . gene logor=log(1.25)+(log(5)-log(1.25))*(_n-1)/(_N-1)
        . gene or=exp(logor)
        . gene caseodds=conodds*or
        . gene caseprev=caseodds/(1+caseodds)
        . gene sdinflor=sqrt( 1/caseprev + 1/(1-caseprev) + (1/2)*( 1/conprev + 1/(1-conprev) ) )
        . lab var logor "Log odds ratio"
        . lab var or "Odds ratio"
        . lab var caseodds "Case exposure odds"
        . lab var caseprev "Case exposure prevalence"
        . lab var sdinflor "SD of influence for log OR"
        . desc
        . * Detectable OR by number of cases *
        . powercal ncases, power(0.9) alpha(0.01) delta(logor) sdinf(sdinflor)
        . version 7: graph or ncases if ncases<=2000, s(.) c(l) ylog ylab xlab
        . more
        . * Significance level by odds ratio *
        . powercal alphamin, power(0.9) delta(logor) sdinf(sdinflor) nunit(100)
        . version 7: graph alphamin or, s(.) c(l) ylog yreverse ylab(1 0.05 1e-1 1e-2 1e-3 1e-4 1e-5 1e-6 1e-7) yline(0.05 0.01) xlog xlab(1 1.25 1.5 2(1)5)
        . more

Author

Roger Newson, King's College, London, UK.
Email: roger.newson@kcl.ac.uk

Also see

Manual:   [R] sampsi
On-line:  help for sampsi