help ssi
-------------------------------------------------------------------------------

Title

ssi -- Sample size and power calculation for balanced two-group randomized controlled trials (including non-inferiority and equivalence trials)

Syntax

ssi #1 #2 [, options]

options               Description
-------------------------------------------------------------------------
Main
  sd1(#)              standard deviation of sample 1
  sd2(#)              standard deviation of sample 2

Options
  alpha(#)            significance level of test; default is alpha(0.05)
  power(#)            power of test; default is power(0.80)
  n(#)                number of patients in each group (used when calculating power)
  loss(#)             percentage of total sample expected to be lost to follow-up
  c1(#)               percentage of sample 1 expected to cross over to sample 2
  c2(#)               percentage of sample 2 expected to cross over to sample 1
  noninferiority      use if trial design is non-inferiority
  equivalence         use if trial design is equivalence
-------------------------------------------------------------------------

Description

ssi is intended for investigators who need sample size estimates for prospective randomized controlled trials (RCTs). Although Stata ships with a sample size estimation command (sampsi), that command is not designed to estimate sample sizes for non-inferiority or equivalence trials.

ssi estimates the sample size (or, when n(#) is specified, the power) for two balanced groups (i.e., groups of equal size) in trials whose primary outcome is either a proportion or a mean. If sd1(#) (with or without sd2(#)) is specified, ssi assumes a comparison of means; otherwise, it assumes a comparison of proportions. ssi is an immediate command, and all of its arguments are numbers; see immed.

Details on Syntax

Proportions

1. Two-sample comparison of proportions

For sample size calculation: The postulated values of the two proportions are #1 and #2.

For power calculation: The observed values of the two proportions are #1 and #2, and the per-group sample size is entered as n(#).

2. Non-inferiority & equivalence trial design (see remarks)

For sample size calculation: The overall proportion of successes expected if the treatments are equivalent (or non-inferior) is #1, and the "delta" (minimal important difference to detect) is #2. Option noninferiority or equivalence must also be used.

For power calculation: The observed proportion of successes in the comparator group is #1, the desired minimal important clinical difference (delta) is #2, and the per-group sample size is entered as n(#). Option noninferiority or equivalence must also be used.
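To sanity-check the proportion designs described above by hand, here is a minimal Python sketch based on textbook normal-approximation formulas. It assumes an unpooled variance for the two-sample comparison and the form n = 2 * pi * (1 - pi) * (z(1 - alpha) + z(1 - beta))^2 / delta^2 for non-inferiority, with alpha treated as a one-sided level. These formulas and the function names are illustrative assumptions, not ssi's documented internals, so ssi's own output may differ slightly.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function


def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of p1 vs p2 (unpooled normal approximation)."""
    z_sum = z(1 - alpha / 2) + z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_sum ** 2 * variance / (p1 - p2) ** 2)


def n_noninferiority_proportion(pi, delta, alpha=0.05, power=0.80):
    """Per-group n for a one-sided non-inferiority test.

    pi is the success proportion expected in both arms; delta is the margin.
    alpha is treated as a one-sided significance level (an assumption).
    """
    z_sum = z(1 - alpha) + z(power)
    return ceil(z_sum ** 2 * 2 * pi * (1 - pi) / delta ** 2)


# Inputs taken from proportions examples 1 and 2 in the Examples section.
print(n_two_proportions(0.054, 0.027))                                   # 833
print(n_noninferiority_proportion(0.85, 0.10, alpha=0.025, power=0.90))  # 268
```

Results are rounded up with ceil, following the usual convention that a sample size is never rounded down.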

Means

1. Two-sample comparison of means

For sample size calculation: The postulated values of the means are #1 and #2, and the postulated standard deviations are sd1(#) and, optionally, sd2(#).

For power calculation: The observed values of the means are #1 and #2, the actual standard deviations are sd1(#) and sd2(#), and the per-group sample size is entered as n(#).

2. Non-inferiority or equivalence trial design (see remarks)

For sample size calculation: The postulated values of the means in the two groups are #1 and #2 (i.e., the minimal important detectable difference [delta] is the difference between the two postulated means), and the postulated standard deviation (assumed to be the same in each group) is entered as sd1(#). Option noninferiority or equivalence must also be used.

For power calculation: The observed mean in the comparator group is entered as #1, and the mean of the other group that reflects the desired minimal important clinical difference (delta) is entered as #2 (i.e., the difference between #1 and #2 is the desired delta). The standard deviation of the groups is entered as sd1(#) (enter an average value if the two SDs differ), and the per-group sample size is entered as n(#). Option noninferiority or equivalence must also be used.
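The means designs described above can be cross-checked with a similar sketch. It assumes normal-approximation formulas: two-sided alpha for a superiority comparison, one-sided alpha for non-inferiority, and, for equivalence, the common two one-sided tests approximation that uses z(1 - beta/2) in place of z(1 - beta), assuming a true difference of zero. The function name and these formulas are assumptions for illustration; ssi may use different formulas or corrections.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function


def n_two_means(m1, m2, sd1, sd2=None, alpha=0.05, power=0.80, design="superiority"):
    """Per-group n for comparing two means (normal approximation).

    design is "superiority" (two-sided alpha), "noninferiority" (one-sided
    alpha) or "equivalence" (two one-sided tests, assuming a true difference
    of zero). These formulas are illustrative assumptions, not ssi's internals.
    """
    sd2 = sd1 if sd2 is None else sd2  # like ssi: sd2 defaults to sd1
    delta = abs(m1 - m2)  # only the difference matters, not the absolute means
    if design == "superiority":
        z_sum = z(1 - alpha / 2) + z(power)
    elif design == "noninferiority":
        z_sum = z(1 - alpha) + z(power)
    elif design == "equivalence":
        z_sum = z(1 - alpha) + z(1 - (1 - power) / 2)
    else:
        raise ValueError(f"unknown design: {design!r}")
    return ceil(z_sum ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2)


# Inputs taken from means examples 1, 3 and 4 in the Examples section.
print(n_two_means(2.55, 3.94, 2.12, 2.8))                                # 51
print(n_two_means(450, 425, 40, alpha=0.025, design="noninferiority"))  # 41
print(n_two_means(25, 0, 40, design="equivalence"))                     # 44
```

Because only delta enters the formula, shifting both means by the same amount (for example, 250 and 225 instead of 450 and 425) gives the same result.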

Options

    +------+
----+ Main +-------------------------------------------------------------

sd1(#) and sd2(#) are the standard deviations of population 1 and population 2, respectively. One or both must be specified when doing a comparison of means. If only sd1(#) is specified, ssi assumes that sd2(#) = sd1(#). If neither sd1(#) nor sd2(#) is specified, ssi assumes a test of proportions.

    +---------+
----+ Options +----------------------------------------------------------

alpha(#) is the significance level of the test. The default is alpha(0.05).

power(#) is the power of the test (= 1 - beta). The default is power(0.80). (Note that this default differs from that of Stata's sampsi command, which uses power(0.90).)

n(#) requests a power calculation: if n(#) is specified, sample size is not calculated and power(#) is ignored. n is the number of patients in each group; both groups are assumed to be the same size. (If they are not, enter the average group size as n.)

loss(#) is the percentage of patients in the total sample who are expected to be lost to follow-up. For example, if 15% of patients are expected to be lost to follow-up, specify loss(15); the total sample size is then increased to offset the loss of power that follow-up losses cause.

c1(#) is the percentage of patients in sample 1 expected to cross over to sample 2 during the trial. For example, if 5% of patients in one arm are expected to cross over to the other arm, specify c1(5); the total sample size is then increased to offset the loss of power that crossover causes.

c2(#) is the percentage of patients in sample 2 expected to cross over to sample 1 during the trial. See c1(#) above for details.

noninferiority requests a sample size or power calculation for a non-inferiority trial (see remarks).

equivalence requests a sample size or power calculation for an equivalence trial (see remarks).
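The loss(#), c1(#) and c2(#) options inflate the sample size, but this help file does not give the adjustment formulas. The sketch below assumes two common conventions: dividing by (1 - loss) for loss to follow-up, and dividing by (1 - c1 - c2)^2 for crossover, because crossover dilutes the treatment effect by the factor (1 - c1 - c2) and sample size scales with 1/delta^2. ssi's exact formulas, and the order in which it combines them, may differ; the function names are hypothetical.

```python
from math import ceil


def adjust_for_loss(total_n, loss_pct):
    """Inflate total_n so that about total_n patients remain after loss_pct% are lost."""
    if not 0 <= loss_pct < 100:
        raise ValueError("loss_pct must be in [0, 100)")
    return ceil(total_n / (1 - loss_pct / 100))


def adjust_for_crossover(total_n, c1_pct, c2_pct):
    """Inflate total_n for crossover using the dilution factor 1 / (1 - c1 - c2)**2."""
    retained = 1 - (c1_pct + c2_pct) / 100  # fraction of the effect that survives
    if retained <= 0:
        raise ValueError("combined crossover must be below 100%")
    return ceil(total_n / retained ** 2)


# Hypothetical unadjusted total of 102 patients (51 per group).
print(adjust_for_loss(102, 5))          # 108
print(adjust_for_crossover(102, 2, 2))  # 111
```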

Remarks and Background Information

    +--------------------+
----+ Equivalence Trials +-----------------------------------------------

Normally, the null hypothesis of a statistical test is that the two groups tested are samples from the same population (i.e., there is no true difference between the groups). With equivalence and non-inferiority trial designs, however, the null hypothesis is the opposite: that the two groups are in fact different (by at least the pre-specified margin, delta).

The graph below, adapted from Jones et al. (see References), illustrates the confidence interval approach to equivalence testing. Once delta has been pre-selected, the CI of the difference between treatments is plotted along the horizontal axis. In an infinitely large trial with infinitely narrow CIs, if the two treatments were truly identical, the observed difference would be exactly 0. If the CI of the difference between treatments lies completely within [-delta, +delta], the proposed treatment can be considered equivalent to the comparator. However, as Jones et al. state,

"It is important to emphasise that absolute equivalence can never be demonstrated: it is possible only to assert that the true difference is unlikely to be outside a range which depends on the size of the trial, the results of the trial, and the specified probabilities of error. If we have predefined a range of equivalence as an interval from -delta to +delta we can then simply check whether the confidence interval centred on the observed difference lies entirely between -delta and +delta. If it does, equivalence is demonstrated; if it does not, there is still room for doubt." (see References).

(Note: if the graph below looks strange, resize your Viewer window so that it is wider.)

[Graph: eight confidence intervals for the observed treatment difference, plotted against vertical reference lines at -delta, 0 and +delta (horizontal axis: observed treatment difference). From top to bottom the intervals are labelled: Not equivalent, Uncertain, Equivalent, Equivalent, Equivalent, Uncertain, Not equivalent, Uncertain.]
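The decision rule illustrated by the equivalence graph can be written as a small function. This is a minimal sketch: the function name is hypothetical, and treating a CI endpoint that falls exactly on -delta or +delta as "Uncertain" is an assumption.

```python
def classify_equivalence(lower, upper, delta):
    """Classify a CI (lower, upper) for the treatment difference against [-delta, +delta]."""
    if lower > upper or delta <= 0:
        raise ValueError("need lower <= upper and delta > 0")
    if -delta < lower and upper < delta:
        return "Equivalent"      # CI lies entirely inside the equivalence range
    if upper < -delta or lower > delta:
        return "Not equivalent"  # CI lies entirely outside the range
    return "Uncertain"           # CI crosses -delta or +delta


for ci in [(-0.5, 0.5), (1.2, 2.0), (0.5, 1.5), (-2.0, 2.0)]:
    print(ci, classify_equivalence(*ci, delta=1.0))
```

A narrower CI (i.e., a larger trial) is what moves a result from "Uncertain" to one of the definite categories, which is why the sample size depends so strongly on delta.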

    +------------------------+
----+ Non-Inferiority Trials +-------------------------------------------

If we are only interested in ensuring that a proposed treatment (say, a new antibiotic) is not worse than a comparator, we can use a non-inferiority approach, which considers only a one-sided difference. The approach is first to pre-define the non-inferiority margin (delta): the degree of inferiority of the proposed treatment beyond which the difference would be clinically unacceptable. If the CI of the difference between treatments lies completely on one side of delta (the side more favourable to the new treatment), the proposed treatment can be considered non-inferior to the comparator. Examples are shown in the graph below.

(Note: if the graph below looks strange, resize your Viewer window so that it is wider.)

Note: the text on the graph describes how the new treatment fares compared with the standard treatment. Example 1 is clearly worse, as its CI lies completely on the "bad" side of delta. Examples 2 and 8 are uncertain, since their CIs cross both delta and zero. Examples 3 and 4 meet the criteria for non-inferiority. Examples 5, 6, and 7 also meet the criteria and are, in addition, superior to the comparator, since their CIs lie entirely on the favourable side of zero (i.e., they exclude a difference of zero).

[Graph: eight numbered confidence intervals for the observed treatment difference, plotted against vertical reference lines at 0 and delta. The horizontal axis (observed treatment difference) runs from "New treatment better" on the left to "New treatment worse" on the right. Labels: 1 Worse; 2 Uncertain; 3, 4, 5, 6 and 7 Non-inferior; 8 Uncertain.]
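The non-inferiority rules from the graph and its note can be sketched the same way. The sign convention is an assumption matching the graph: positive differences mean the new treatment is worse than the standard treatment, so if your difference is defined the other way round, flip the signs first. The function name is hypothetical.

```python
def classify_noninferiority(lower, upper, delta):
    """Classify a CI (lower, upper) for the treatment difference against the margin delta.

    Assumed sign convention: positive differences mean the new treatment is worse.
    """
    if lower > upper or delta <= 0:
        raise ValueError("need lower <= upper and delta > 0")
    if lower > delta:
        return "Worse"                      # CI entirely on the bad side of delta
    if upper < 0:
        return "Non-inferior and superior"  # CI entirely on the favourable side of 0
    if upper < delta:
        return "Non-inferior"               # CI entirely on the favourable side of delta
    return "Uncertain"                      # CI includes delta


for ci in [(1.5, 2.5), (-0.5, 1.5), (-0.5, 0.5), (-2.0, -0.5)]:
    print(ci, classify_noninferiority(*ci, delta=1.0))
```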

Examples

    +-------------+
----+ Proportions +------------------------------------------------------

1.

Two-sample comparison of proportions. The baseline post-operative myocardial infarction rate is 5.4% (0.054 as a proportion), and a new treatment is hypothesized to halve this rate to 2.7% (0.027). Calculate the required sample size with a power of 80% and an alpha of 5%.

. ssi 0.054 0.027

2.

Non-inferiority trial design. A new antibiotic was tested against penicillin for erysipelas. The primary outcome was the clinical cure rate. For the sample size calculation, the investigators assumed that the proportion cured in both arms would be 85% (pi = 0.85). They considered that a difference in cure rate of up to 10 percentage points in favour of penicillin would still allow the new antibiotic to be considered non-inferior (delta = 0.1). Calculate the sample size based on 90% power to confirm non-inferiority and a one-sided confidence level of 97.5%.

. ssi 0.85 0.10, alpha(0.025) power(0.9) noninferiority

3.

Equivalence trial design. Using the same example as above, the investigators thought that the new antibiotic could also be better than penicillin, but they were not sure, so they wanted to allow for a two-sided test. An equivalence design lets them calculate two-sided confidence intervals. They also relax the significance level to a more conventional alpha of 0.05.

. ssi 0.85 0.1, p(0.9) equivalence

4.

Power Analysis. You read a paper comparing candesartan with ramipril in hypertensive patients, with mortality at one year as the outcome. By one year, 2% of the candesartan group and 4% of the ramipril group had died. 400 patients were studied in each group. What power did the study have to detect this difference, and therefore how confidently could the authors state that there was no difference between groups?

. ssi 0.04 0.02, n(400)

    +-------+
----+ Means +------------------------------------------------------------

1.

Two-sample comparison of means. Serum troponin T in the medication arm was 2.55 (SD = 2.12), while in the cardiac surgery arm it was 3.94 (SD = 2.8). Calculate the required sample size with a power of 80%, an alpha of 5%, and an expected loss to follow-up of 5% of the total sample.

. ssi 2.55 3.94, sd1(2.12) sd2(2.8) loss(5)

2.

Same as example 1, except that no loss to follow-up is expected; instead, 2% of each group is expected to cross over into the other group, reducing power.

. ssi 2.55 3.94, sd1(2.12) sd2(2.8) c1(2) c2(2)

3.

Non-inferiority trial design. A new inhaler for asthma will be considered non-inferior to the standard treatment if it does not reduce the morning peak expiratory flow rate by more than 25 l/min (from a baseline of 450 l/min to 425 l/min). Previous data suggest that the SD in the trial population will be approximately 40 l/min. Calculate the required sample size with a power of 80% and a significance level of 2.5% (alpha = 0.025). (Keep in mind that the absolute values of the baseline and changed measurements do not matter: only the change and the SD are important. This example could just as easily have used 250 l/min and 225 l/min and obtained exactly the same result.)

. ssi 450 425, sd1(40) a(0.025) non

4.

Equivalence trial design. Using the same values as in the previous example, we now want to know whether the new inhaler is equivalent. The new inhaler will be deemed equivalent if its morning peak expiratory flow rate is within +/- 25 l/min of that of the standard inhaler. Since the absolute values do not matter, we use 25 and 0 as the values of #1 and #2 (their order also does not matter). The expected SD is the same (40 l/min). Use a significance level of 5% and a power of 80%.

. ssi 25 0, sd1(40) equ

5.

Power Analysis. A new anaesthetic is thought to have greater pharmacologic preconditioning efficacy than the old anaesthetic. The average peak postoperative serum troponin T was 2.32 (SD = 2.01) in patients anaesthetized with the old anaesthetic and 1.75 (SD = 1.52) in patients anaesthetized with the new one. 100 patients were studied in each group. What power did the researchers have to detect this difference, and therefore how confidently could they state that there was no difference between the two anaesthetics in peak postoperative troponin T?

. ssi 2.32 1.75, sd1(2.01) sd2(1.52) n(100)
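The two Power Analysis examples can be cross-checked by hand. This sketch assumes a two-sided z-test with unpooled variances and no continuity correction, which may not be exactly what ssi does; the function name is hypothetical. Under these assumptions it gives a power of about 0.38 for the candesartan/ramipril example and about 0.62 for the troponin example, both well below the conventional 80%.

```python
from statistics import NormalDist

std_normal = NormalDist()


def power_two_groups(diff, var_sum, n, alpha=0.05):
    """Approximate power of a two-sided z-test with n patients per group.

    var_sum is p1*(1-p1) + p2*(1-p2) for proportions,
    or sd1**2 + sd2**2 for means.
    """
    se = (var_sum / n) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / se
    # Probability of rejecting H0 in either tail when the true difference is diff.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Proportions example: 4% vs 2% mortality, 400 patients per group.
p_prop = power_two_groups(0.04 - 0.02, 0.04 * 0.96 + 0.02 * 0.98, 400)
# Means example: troponin T 2.32 (SD 2.01) vs 1.75 (SD 1.52), 100 per group.
p_mean = power_two_groups(2.32 - 1.75, 2.01 ** 2 + 1.52 ** 2, 100)
print(round(p_prop, 2), round(p_mean, 2))  # 0.38 0.62
```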

References:

1) Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ 1996;313:36-9.

Saved results

ssi saves the following in r() (some of these values are not returned when power is calculated):

Scalars

  r(power)            entered or calculated power
  r(adj_ss)           adjusted sample size (for loss to follow-up and/or crossovers)
  r(per_group_size)   sample size for each of the two groups
  r(ss)               total sample size

Author Information:

Philip M Jones, MD FRCPC
Department of Anesthesiology & Perioperative Medicine
Faculty of Medicine & Dentistry
University of Western Ontario
London, Ontario, Canada
pjones8@uwo.ca

Change Log:

14 May 2010 - Version 1.0.0: Initial version published.

Also see

Manual: [R] sampsi
Help: sampsi