help ssi -------------------------------------------------------------------------------

Title

ssi -- Sample size and power calculation for balanced two-group randomized controlled trials (including non-inferiority and equivalence trials)

Syntax

ssi #1 #2 [, options]

options            Description
-------------------------------------------------------------------------
Main
  sd1(#)           standard deviation of sample 1
  sd2(#)           standard deviation of sample 2

Options
  alpha(#)         significance level of test; default is alpha(0.05)
  power(#)         power of test; default is power(0.80)
  n(#)             number of patients in each group (used when calculating power)
  loss(#)          percentage of total sample expected to be lost to follow-up
  c1(#)            percentage of sample 1 expected to crossover to sample 2
  c2(#)            percentage of sample 2 expected to crossover to sample 1
  noninferiority   use if trial design is non-inferiority
  equivalence      use if trial design is equivalence
-------------------------------------------------------------------------

Description

ssi is meant to be used by investigators needing sample size estimation for prospective randomized controlled trials (RCTs). Although Stata ships with a sample size estimation command (sampsi), it is not designed to estimate trial sizes for non-inferiority or equivalence trials.

ssi estimates the sample size for two balanced groups (i.e. groups of equal size) in trials whose primary outcomes are either proportions or means.

If sd1(#) (with or without sd2(#)) is specified, ssi assumes a comparison of means; otherwise, it assumes a comparison of proportions. ssi is an immediate command, and all of its arguments are numbers; see immed.

Details on Syntax

Proportions

1. Two-sample comparison of proportions

   For sample size calculation: The postulated values of the two proportions are #1 and #2.

   For power calculation: The observed values of the two proportions are #1 and #2, and the per-group sample size is entered as n(#).

2. Non-inferiority or equivalence trial design (see Remarks)

   For sample size calculation: The overall percentage of successes to be expected if the treatments are equivalent (or non-inferior) is #1, and the "delta" (minimal important difference to detect) is #2. Option noninferiority or equivalence must also be used.

   For power calculation: The observed proportion of successes in the comparator group is #1, the desired minimal important clinical difference (delta) is #2, and the per-group sample size is entered as n(#). Option noninferiority or equivalence must also be used.

Means

1. Two-sample comparison of means

   For sample size calculation: The postulated values of the means are #1 and #2, and the postulated standard deviations are sd1(#) and, optionally, sd2(#).

   For power calculation: The observed values of the means are #1 and #2, the actual standard deviations are sd1(#) and sd2(#), and the per-group sample size is entered as n(#).

2. Non-inferiority or equivalence trial design (see Remarks)

   For sample size calculation: The postulated values of the means in the two groups are #1 and #2 (i.e. the minimal important detectable difference [delta] is the difference between the two postulated means), and the postulated standard deviation (assumed to be the same for each group) is entered as sd1(#). Option noninferiority or equivalence must also be used.

   For power calculation: The value of the observed mean in the comparator group is entered as #1, the mean of the other group that reflects the desired minimal important clinical difference (delta) is entered as #2 (i.e. the difference between #1 and #2 is the desired delta), the standard deviation of the groups is entered as sd1(#) (enter an average value if the two SDs are not the same), and the per-group sample size is entered as n(#). Option noninferiority or equivalence must also be used.

Options

    +------+
----+ Main +-------------------------------------------------------------

sd1(#) and sd2(#) are the standard deviations of population 1 and population 2, respectively. One or both must be specified when doing a comparison of means. If only sd1(#) is specified, ssi assumes that sd2(#) = sd1(#). If neither sd1(#) nor sd2(#) is specified, ssi assumes a test of proportions.

    +---------+
----+ Options +----------------------------------------------------------

alpha(#) is the significance level of the test. The default is alpha(0.05).

power(#) is the power of the test (= 1 - beta). The default is power(0.80). (Note that this default differs from that of the Stata sampsi command, whose default is power(0.90).)

n(#) is used to calculate power (i.e. if any n is specified, sample size will not be calculated, and the power(#) parameter is ignored). n is the number of patients in each group. n in each group must be equal. (If they are not equal, enter the average size of each group as n.)

loss(#) is the percentage of patients in the total sample who are expected to be lost to follow-up. For example, if 15% of patients in a trial are lost to follow-up, the total sample size needs to be increased to compensate for the resulting reduction in power; e.g. loss(15).

c1(#) is the percentage of patients in sample 1 expected to cross over to sample 2 during the trial. For example, if 5% of patients in one arm of a trial cross over to the other arm, the total sample size needs to be increased to compensate for the resulting reduction in power; e.g. c1(5).

c2(#) is the percentage of patients in sample 2 expected to cross over to sample 1 during the trial. See c1(#) above for details.

noninferiority indicates that you wish to calculate a sample size or power for a non-inferiority trial (see Remarks).

equivalence indicates that you wish to calculate a sample size or power for an equivalence trial (see Remarks).
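
To illustrate why loss(#), c1(#), and c2(#) enlarge a trial, the commands below apply two commonly used inflation factors to a hypothetical unadjusted size of 100 patients per group. Both formulas are assumptions made for illustration only; the adjustment that ssi itself applies may differ.

. * hypothetical unadjusted per-group size: 100
. * assumed loss adjustment for loss(15): n / (1 - 0.15)
. display ceil(100 / (1 - 0.15))
. * assumed crossover adjustment for c1(5) and c2(0): n / (1 - 0.05 - 0)^2
. display ceil(100 / (1 - 0.05 - 0)^2)

Under these assumptions the two results are 118 and 111 patients per group, respectively.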

Remarks and Background Information

    +--------------------+
----+ Equivalence Trials +-----------------------------------------------

Normally, the null hypothesis for a test statistic is that the two groups tested represent different samples from the same population (i.e. there is no significant difference between the groups). However, with equivalence and non-inferiority trial designs, the null hypothesis is the opposite: that the two groups are in fact different.

The graph below, adapted from Jones B (see References), demonstrates the confidence interval approach to equivalence testing. Once a delta has been pre-selected, the CI of the difference between treatments is visualized along the horizontal axis. In an infinitely large trial with infinitely narrow CIs, if the two treatments are really equivalent, then the difference will be 0.

If the CI of the difference between treatments lies completely within [-delta to +delta], then the proposed treatment can be considered to be equivalent to the comparator. However, as stated by Jones B,

"It is important to emphasise that absolute equivalence can never be demonstrated: it is possible only to assert that the true difference is unlikely to be outside a range which depends on the size of the trial, the results of the trial, and the specified probabilities of error. If we have predefined a range of equivalence as an interval from -delta to +delta we can then simply check whether the confidence interval centred on the observed difference lies entirely between -delta and +delta. If it does, equivalence is demonstrated; if it does not, there is still room for doubt." (see References).

(Note: if the graph below looks strange, re-size your Viewer window so it is wider.)

                                 |              |              |
  Not equivalent    <<------->>  |              |              |
  Uncertain                <<----+---->>        |              |
  Equivalent                     | <<-------->> |              |
  Equivalent                     |        <<----+---->>        |
  Equivalent                     |              | <<-------->> |
  Uncertain                      |              |        <<----+---->>
  Not equivalent                 |              |              |  <<------->>
  Uncertain            <<--------+--------------+--------------+-------->>
                                 |              |              |
                  ---------------+--------------+--------------+---------------
                              -delta            0           +delta
                                  Observed treatment difference

    +------------------------+
----+ Non-Inferiority Trials +-------------------------------------------

If we are only interested in ensuring that a proposed treatment (say, a new antibiotic) is not worse than a certain comparator, we can use a non-inferiority approach, where we are only interested in a one-sided difference. The approach is to first pre-define the smallest level of inferiority of the proposed treatment (delta), which, if surpassed, would be a clinically unacceptable difference. If the CI of the difference between treatments lies completely on one side of delta (in the more favourable direction for the new treatment), then the proposed treatment can be considered to be non-inferior to the comparator. Examples are found in the graph below.

(Note: if the graph below looks strange, re-size your Viewer window so it is wider.)

Note: the text on the graph describes how the new treatment fares compared with the standard treatment. Example 1 is clearly worse, as its CI lies completely on the "bad" side of delta. Examples 2 and 8 are uncertain, since their CIs cross delta (the CI of example 8 also crosses zero). Examples 3 and 4 meet the criteria for non-inferiority. Examples 5, 6, and 7 meet the criteria and are also superior to the comparator, since their CIs lie entirely on the favourable side of zero (no treatment effect).

                                             |                   |
  1  Worse                                   |                   |  <<------->>
  2  Uncertain                               |             <<----+---->>
  3  Non-inferior                            |   <<--------->>   |
  4  Non-inferior                      <<----+---->>             |
  5  Non-inferior               <<------->>  |                   |
  6  Non-inferior       <<--------------->>  |                   |
  7  Non-inferior   <<--------->>            |                   |
  8  Uncertain              <<---------------+-------------------+----------->>
                                             |                   |
                    -------------------------+-------------------+---------------
                                             0                 delta
                      New treatment better                  New treatment worse
                                    Observed treatment difference

Examples

    +-------------+
----+ Proportions +------------------------------------------------------

1. Two-sample comparison of proportions. The baseline post-operative myocardial infarction rate is 5.4% (0.054 as a proportion), and a new treatment is hypothesized to halve this rate to 2.7% (0.027). Calculate the required sample size with a power of 80% and an alpha of 5%.

. ssi 0.054 0.027

2. Non-inferiority trial design. A new antibiotic was tested against penicillin for erysipelas. The primary outcome was the clinical cure rate. For the sample size calculation, the investigators assumed the proportion cured in both arms would be 85% (pi = 0.85). They considered that a difference in cure rate as large as 10% in favour of penicillin would still allow the new antibiotic to be non-inferior (delta = 0.1). Calculate the sample size based on 90% power to confirm non-inferiority and a one-sided confidence level of 97.5%.

. ssi 0.85 0.10, alpha(0.025) power(0.9) noninferiority
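
As a rough cross-check of the non-inferiority example, a widely used normal-approximation formula for the per-group size is n = (z_(1-alpha) + z_(1-beta))^2 * 2*pi*(1-pi) / delta^2. The command below evaluates it with display; the formula is an assumption made for illustration and is not necessarily the one ssi implements, so the two results need not agree exactly.

. * assumed formula: pi = 0.85, delta = 0.10, one-sided alpha = 0.025, power = 0.90
. display ceil((invnormal(1 - 0.025) + invnormal(0.90))^2 * 2*0.85*(1 - 0.85) / 0.10^2)

Under this approximation the result is 268 patients per group.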

3. Equivalence trial design. Using the same example as above, the investigators thought that the new antibiotic might also be better than penicillin but were not sure, so they wanted to allow for a two-sided test. An equivalence trial lets them work with two-sided confidence intervals. They relax their significance level to the more conventional alpha of 0.05.

. ssi 0.85 0.1, p(0.9) equivalence

4. Power Analysis. You read a paper comparing candesartan to ramipril for hypertensive patients that compared mortality rates at one year. In the candesartan group 2% of people died by one year, and in the ramipril group 4% died by one year. 400 patients were studied in each group. What power did the authors have to state that there was no difference between groups?

. ssi 0.04 0.02, n(400)
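
For orientation only, the power of a two-sided comparison of two proportions can be approximated as Phi(|p1 - p2| / sqrt((p1*(1-p1) + p2*(1-p2))/n) - z_(1-alpha/2)), where Phi is the standard normal distribution function. The command below is a simplified sketch of that calculation; ssi may use a different formula (for example, a pooled variance or a continuity correction), so the power it reports need not match.

. * assumed simplified formula, not necessarily the one ssi implements
. display normal(abs(0.04 - 0.02) / sqrt((0.04*0.96 + 0.02*0.98)/400) - invnormal(1 - 0.05/2))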

    +-------+
----+ Means +------------------------------------------------------------

1. Two-sample comparison of means. Serum troponin T in the medication arm was 2.55 (SD=2.12), while in the cardiac surgery arm it was 3.94 (SD=2.8). Calculate the required sample size with a power of 80%, an alpha of 5%, and an expected loss to follow-up of 5% of the total sample.

. ssi 2.55 3.94, sd1(2.12) sd2(2.8) loss(5)
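
As a rough cross-check, the unadjusted per-group size for a two-sided comparison of means is often approximated as n = (z_(1-alpha/2) + z_(1-beta))^2 * (sd1^2 + sd2^2) / (#1 - #2)^2. The command below evaluates this with display. It ignores loss(5), and the formula is an assumption made for illustration rather than a description of how ssi works internally.

. * assumed normal-approximation formula; no adjustment for loss to follow-up
. display ceil((invnormal(1 - 0.05/2) + invnormal(0.80))^2 * (2.12^2 + 2.8^2) / (2.55 - 3.94)^2)

Under this approximation the result is 51 patients per group before any allowance for loss to follow-up.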

2. Same as #1, except that no loss to follow-up occurred, but 2% of each group crossed over into the other group, thereby reducing power.

. ssi 2.55 3.94, sd1(2.12) sd2(2.8) c1(2) c2(2)

3. Non-inferiority trial design. A new inhaler for asthma will be considered non-inferior to the standard treatment if it does not reduce the morning peak expiratory flow rate by more than 25 l/min (from a baseline of 450 l/min to 425 l/min). Previous data suggest that the SD in the trial population will be approximately 40 l/min. Calculate the required sample size with a power of 80% using a significance level of 2.5% (alpha = 0.025). (Keep in mind that the absolute values of the baseline and changed measurements do not matter: only the change and the SD are important. This example could just as easily have used the range 250 l/min to 225 l/min and obtained exactly the same result.)

. ssi 450 425, sd1(40) a(0.025) non

4. Equivalence trial design. Using the same values as the previous example, we now want to know whether the new inhaler is equivalent. The new inhaler will be deemed equivalent if the morning peak expiratory flow rate is within +/- 25 l/min of that of the standard inhaler. Since the absolute values do not matter, we use 25 and 0 as the values for #1 and #2 (their order also does not matter). The expected SD is the same (40 l/min). Use a significance level of 5% and a power of 80%.

. ssi 25 0, sd(40) equ

5. Power Analysis. A new anaesthetic is thought to have pharmacologic preconditioning efficacy superior to that of the old anaesthetic. The average peak postoperative serum troponin T was 2.32 (SD=2.01) in the group anaesthetized with the old anaesthetic, while patients anaesthetized with the new anaesthetic had an average peak postoperative troponin T of 1.75 (SD=1.52). 100 patients were studied in each group. What power did the researchers have to state that there was no difference between the two anaesthetics in terms of peak postoperative troponin T levels?

. ssi 2.32 1.75, sd1(2.01) sd2(1.52) n(100)

References:

1) Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ 1996;313:36-9.

Saved results

ssi saves the following in r() (some values are not presented when power is calculated):

Scalars

  r(power)            entered or calculated power
  r(adj_ss)           adjusted sample size (for loss of follow-up and/or crossovers)
  r(per_group_size)   sample size for each of the two groups
  r(ss)               total sample size
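
A minimal sketch of how the saved results might be used after a run, relying only on the r() names listed above (return list is the standard Stata command for displaying r() results). Copying the values into local macros straight away guards against a later r-class command overwriting them; the macro names n_group and n_total are arbitrary.

. ssi 0.054 0.027
. return list
. * copy the saved results before another command replaces r()
. local n_group = r(per_group_size)
. local n_total = r(ss)
. display "Enrol `n_group' patients per group (`n_total' in total)"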

Author Information:

Philip M Jones, MD FRCPC
Department of Anesthesiology & Perioperative Medicine
Faculty of Medicine & Dentistry
University of Western Ontario
London, Ontario, Canada
pjones8@uwo.ca

Change Log:

14 May 2010 - Version 1.0.0

Initial version published.

Also see

Manual: [R] sampsi

Help: sampsi