-------------------------------------------------------------------------------
help for powercal                                                (Roger Newson)
-------------------------------------------------------------------------------

Generalized power calculations saving results in variables

        powercal newvarname [if exp] [in range] , [ nunit(expression_1)
                           power(expression_2) alpha(expression_3)
                           delta(expression_4) sdinf(expression_5)
                           tdf(expression_6) noceiling float ]

where expression is a numeric expression. The numeric expression for each
option must be in the form required by the generate command. That is to say,
each expression must be specified so that the command

gene double newvarname=(expression)

will work.


Description

powercal performs generalized power calculations, storing the result in a new
variable with a name specified by newvarname. All except one of the options
nunit, power, alpha, delta and sdinf must be specified. The single unspecified
option in this list specifies whether the output variable is the number of
sampling units, power, alpha (significance level), delta (difference in
parameter value to be detected), or the standard deviation (SD) of the
influence function. Any of these 5 quantities can be calculated from the other
4. powercal can be used to calculate any of these quantities, assuming that we
are testing a hypothesis that a parameter is zero, and that the true value is
given by delta, and that the sample statistic is distributed around the
population parameter in such a way that the pivotal quantity

PQ = sqrt(nunits) * (delta/sdinf)

has a standard Normal distribution (if tdf() is not specified) or a
t-distribution with tdf() degrees of freedom (if tdf() is specified). The
formulas used by powercal define power as the probability of detecting a
difference in the right direction, using a two-tailed test.


Options

nunit(expression_1) gives an expression whose value is the number of
    independent sampling units. Sampling units are defined very generally. For
    instance, in an experiment involving equal-sized samples of individuals
    from Population A and Population B, a sampling unit might be a pair of
    sampled individuals, one from each population. Similarly, in a case-control
    study with 4 controls per case, a sampling unit might be a case together
    with 4 controls.

power(expression_2) gives an expression whose value is the power to detect a
    difference specified by the delta() option (see below).  The power is
    defined as the probability that the sample difference is in the correct
    direction, and also large enough to be significant, using a 2-tailed test,
    at the level specified by the alpha() option (see below).

alpha(expression_3) gives an expression whose value is the size, or
    significance level, of the statistical test (in units of probability, not
    percentage).

delta(expression_4) gives an expression whose value is the true population
    difference to be detected. This difference is assumed to be positive.
    Therefore, if the user wishes to detect a negative difference, then s/he
    should specify an expression equal to minus that difference. The difference
    may be the log of a ratio parameter, such as an odds ratio, rate ratio,
    risk ratio or ratio of geometric means.

sdinf(expression_5) gives an expression whose value is the standard deviation
    of the influence function. That is to say, it is an expression equal to the
    expected standard error of the sample difference multiplied by the square
    root of the number of sampling units, where sampling units are defined
    generally, as specified in the option nunit(). In the simple case of a
    paired t-test, sdinf() is the standard deviation of the paired differences.
    More generally, sdinf() can be defined by calculating a standard error for
    a particular number of units, from a pilot study, from a simulation or from
    a formula, and multiplying this standard error by the square root of the
    number of units in the pilot study, simulation or formula.

tdf(expression_6) gives an expression whose value is the degrees of freedom of
    the t-distribution to be assumed for the pivotal quantity PQ specified
    above. The degrees of freedom expression is not necessarily integer-valued.
    If tdf() is absent, then PQ is assumed to follow a standard Normal
    distribution.

noceiling specifies that, if the output variable specified by newvarname is a
    number of units, then it will not be rounded up to the lowest integer no
    less than itself (as calculated by the Stata 8 ceil() function). This
    option can be useful if the output variable is intended to specify an
    amount of exposure, such as a number of person-years, and the input sdinf()
    expression specifies a standard deviation of the influence function per
    unit exposure. If noceiling is not specified, and power(), alpha(), delta()
    and sdinf() are specified, then powercal rounds up the output variable, so
    that it contains a whole number of units

float specifies that the output variable will have a storage type no higher
    than float.  If float is not specified, then powercal creates the output
    variable with storage type double.  Whether or not float is specified,
    powercal compresses the output variable as much as possible without loss of
    precision. (See help for compress.)


Remarks

powercal carries out sample size calculations for a more general range of
possible experimental designs than sampsi, and stores the result in a new
variable, instead of reporting the result in the log. The new variable may be
input to further calculations and/or plotted and/or listed. powercal is
intended as a low-level programming tool for users intending to carry out
sample size calculations for a given experimental design. It is the
responsibility of the user to ensure that the expressions are correct, and to
choose a parameter scale on which the parameter is expected to be Normally
distributed (or t-distributed), with a variance that does not vary excessively
with the size of the measured difference.

The formulas used by powercal define power as the probability of detecting a
difference in the right direction, using a two-tailed test. It follows that, in
the limit, as the difference delta tends to zero, the power to detect a
difference of delta with a P-value of alpha tends to a minimum of alpha/2, and
not to a minimum of alpha. powercal converts to missing the results of all
input expressions for power() and alpha() which evaluate to a number outside
the open interval (0,1), and the results of all input expressions for delta(),
sdinf() and nunit() which evaluate to a non-positive number.  powercal also
converts to missing all values in the output variable for which there is not a
unique maximum or minimum value of the output quantity. See the manual
powercal.pdf (distributed as an ancillary file with the powercal package) for
details of the Methods and Formulas.


Examples

The following examples are explained in detail in the manual powercal.pdf,
which is distributed with the powercal package. They are designed to work both
under Stata 7 and under Stata 8.

This example creates Figure 1, displaying power as a function of the geometric
mean ratio between 2 treatment groups:

        . clear
        . scal cv=0.5
        . scal sdlog=sqrt(log(cv*cv + 1))
        . scal r20=exp(-2*sdlog*invnorm(0.2))
        . disp _n as text "Coefficient of variation: " as result cv _n as text
                "SD of logs: " as result sdlog _n as text "20% tail ratio: " as
                result r20
        . set obs 100
        . gene logratio=log(2)*(_n/_N)
        . lab var logratio "Log GM ratio"
        . gene gmratio=exp(logratio)
        . lab var gmratio "GM ratio"
        . powercal power, alpha(0.01) delta(logratio) sdinf(sdlog*sqrt(2))
                nunit(50) tdf(98)
        . version 7:graph power gmratio, s(.) c(L) ylab(0(0.05)1) yline(0.8
                0.9) xlab(1(0.1)2) xlog

This example creates Figure 2, displaying detectable geometric mean ratios
between 2 groups as a function of number per group:

        . clear
        . scal cv=0.5
        . scal sdlog=sqrt(log(cv*cv + 1))
        . scal r20=exp(-2*sdlog*invnorm(0.2))
        . disp _n as text "Coefficient of variation: " as result cv _n as text
                "SD of logs: " as result sdlog _n as text "20% tail ratio: " as
                result r20
        . set obs 100
        . gene npergp=_n
        . lab var npergp "Number per group"
        . powercal logratio, power(0.9) alpha(0.01) sdinf(sdlog*sqrt(2))
                nunit(npergp) tdf(2*(npergp-1))
        . gene hiratio=exp(logratio)
        . gene loratio=exp(-logratio)
        . lab var hiratio "Detectable GM ratio >1"
        . lab var loratio "Detectable GM ratio <1"
        . version 7:graph hiratio loratio npergp if _n>=5, s(..) c(LL) yline(1)
                ylab xlab(0(10)100)


This example creates Figures 3 and 4, displaying, respectively, detectable odds
ratios in a case-control study as a function of number of cases and attainable
significance levels as a function of odds ratio:

        . clear
        . scal conprev=0.25
        . scal conodds=conprev/(1-conprev)
        . disp _n as text "Expected control prevalence: " as result conprev _n
                as text "Expected control odds: " as result conodds
        . set obs 101
        . gene logor=log(1.25)+(log(5)-log(1.25))*(_n-1)/(_N-1)
        . gene or=exp(logor)
        . gene caseodds=conodds*or
        . gene caseprev=caseodds/(1+caseodds)
        . gene sdinflor=sqrt( 1/caseprev + 1/(1-caseprev) + (1/2)*( 1/conprev +
                1/(1-conprev) ) )
        . lab var logor "Log odds ratio"
        . lab var or "Odds ratio"
        . lab var caseodds "Case exposure odds"
        . lab var caseprev "Case exposure prevalence"
        . lab var sdinflor "SD of influence for log OR"
        . desc
        . * Detectable OR by number of cases *
        . powercal ncases, power(0.9) alpha(0.01) delta(logor) sdinf(sdinflor)
        . version 7:graph or ncases if ncases<=2000, s(.) c(l) ylog ylab xlab
        . more
        . * Significance level by odds ratio *
        . powercal alphamin, power(0.9) delta(logor) sdinf(sdinflor) nunit(100)
        . version 7:graph alphamin or, s(.) c(l) ylog yreverse ylab(1 0.05 1e-1
                1e-2 1e-3 1e-4 1e-5 1e-6 1e-7) yline(0.05 0.01) xlog xlab(1
                1.25 1.5 2(1)5)
        . more


Author

Roger Newson, King's College, London, UK.  Email: roger.newson@kcl.ac.uk


Also see

 Manual:  [R] sampsi
On-line:  help for sampsi