.-
help for ^stcascoh^                                          [ec]  Nov 21, 2008
.-

Create a dataset suitable for case-cohort analysis
--------------------------------------------------

        ^stcascoh^ [varlist] [^if^ exp] [^in^ range] ^, a^lpha^(^#^)^
                 [ ^gr^oup^(^varnames^)^ ^gen^erate^(^varlist^)^ ^ep^s^(^#^)^ ^se^ed^(^#^)^ ^nosh^ow ]

^stcascoh^ is for use with survival-time data; see help @st@. You must ^stset^
your data with an ^id()^ variable before using this command; see help @stset@.
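
A minimal sketch of the required declaration follows. The variable names
time, died, and id are hypothetical and stand for your own follow-up time,
failure indicator, and subject identifier.

```stata
* Hypothetical variable names: time, died, id
* id() is the part stcascoh requires
stset time, failure(died) id(id)
```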

Description
-----------

^stcascoh^ creates the dataset needed for a case-cohort analysis by sampling
the cohort at the time of entry (the subcohort) and including all failures,
whether or not they occur in the random sample (non-subcohort cases). To this
end, ^stcascoh^ splits each observation that fails into two records: (1) the
time interval (t0,t-eps] and (2) the time interval (t-eps,t]. The following
variables are added to the dataset:

    ^_subco^     coded 0 for subcohort members with no failure
                       1 for subcohort members who failed
                       2 for non-subcohort cases
    ^_wBarlow^   log-weights of records as in the Barlow scheme
    ^_wSelPre^   log-weights of records as in the Self and Prentice scheme

The names of the new variables can be changed by specifying the ^gen()^ option.
varlist defines the variables to be retained in the final dataset. If varlist
is not specified, all variables are carried over into the resulting dataset.

^stcascoh^ shows two summary tables. The first describes subcohort membership
versus failure in the cohort. The second displays the risk sets with three or
fewer controls, so that you can check whether the subcohort becomes small
because of many failures or censorings.

In the new dataset, non-subcohort cases cannot rely on the original ^stset^
declaration. Before finishing, ^stcascoh^ therefore invokes ^stset^ to set
entry and exit times to the current _t0 and _t variables.

Random sampling uses Stata's uniform() function. The seed can be specified
with the ^seed()^ option. Observations not meeting the ^if^ and ^in^ criteria
are dropped even if they fail.

The resulting dataset can be analyzed using ^stselpre^ or ^stcox^.
^stselpre^ fits proportional hazards models according to the Prentice and
Self-Prentice methods and estimates the Self-Prentice model-based variance.
When using ^stcox^, the ^robust^ option is needed to estimate the approximate
variance as proposed by Lin and Ying and by Barlow. In this case, the analysis
can be performed using three methods:

    1- Prentice:           ^stcox varlist, robust^
    2- Self and Prentice:  ^stcox varlist, offset(_wSelPre) robust^
    3- Barlow:             ^stcox varlist, offset(_wBarlow) robust^
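
The three fits can be run on the same ^stcascoh^ dataset and set side by side.
The sketch below assumes the data have already been processed by ^stcascoh^
with the default variable names; age and sex are hypothetical covariates, and
the stored-estimate names are arbitrary.

```stata
* Assumes the dataset in memory was produced by stcascoh
* age and sex are hypothetical covariates

* 1- Prentice
stcox age sex, robust
estimates store prentice

* 2- Self and Prentice
stcox age sex, offset(_wSelPre) robust
estimates store selfprentice

* 3- Barlow
stcox age sex, offset(_wBarlow) robust
estimates store barlow

* Coefficients and robust standard errors side by side
estimates table prentice selfprentice barlow, se
```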

Options
-------

^alpha(^#^)^ specifies the sampling fraction. The sampling fraction can be
    expressed as a real number or as an integer.

^group(^varlist^)^ specifies that the ^alpha^ sample is to be drawn within each
    set of values of varlist, thus maintaining the proportion of each group.

^generate(^varlist^)^ specifies variable names for the three generated variables.

^eps(^#^)^ specifies a typically small number so that a case that is in the
    risk set at time t is represented in the expanded data by an
    "infinitesimal" episode (t-eps,t]. ^eps^ should be small compared with the
    measurement unit of time. ^eps^ defaults to 1E-3.
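
As an illustration of choosing ^eps()^, suppose (this is an assumption, not
part of the command) that analysis time is in years and event dates are
recorded to the day. The smallest recorded step is then one day expressed in
years, and ^eps()^ should be below that value. The sampling fraction and the
eps value shown are arbitrary.

```stata
* Assumption: analysis time is in years, dates recorded to the day
* One day expressed in years; eps() should be smaller than this
display 1/365.25

* An eps well below one day (the value 0.0001 is illustrative)
stcascoh, alpha(0.2) eps(0.0001)
```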

^seed(^#^)^ specifies the seed for random sampling.

^noshow^ prevents ^stcascoh^ from showing the names of the key ^st^ variables.

Tip: if the cohort has already been sampled
-------------------------------------------

^stcascoh^ can be used to prepare the dataset for case-cohort analysis when the
cohort has been previously sampled. The steps are as follows:

1) Divide the dataset into two files: the first for subcohort observations,
   the second for additional cases.

2) Appropriately -stset- the new files.

3) Process the subcohort file with -stcascoh-, setting alpha just below 1
   (e.g., .999), and save this file. Use -assert- to verify that _subco and
   _wSelPre are coded as you expect.

4) Process the additional-cases file with -stcascoh-, setting alpha = 0, and
   save this second file. Here too, use -assert- to verify that _subco and
   _wSelPre are coded as you expect.

5) Append the two files you generated.

6) Adapt the Barlow weights. Suppose you sampled 1000 out of 16000 subjects.
   You should type

        ^replace _wBarlow=ln(16000/1000) if _wBarlow>0^
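
The six steps can be sketched as a do-file. Everything named here is
hypothetical: the file full.dta, the indicator insub (1 = member of the
previously sampled subcohort), and the variables time, died, and id. The
sketch also assumes one record per subject, and the values tested by -assert-
follow the coding of _subco given under Description; adjust them to what you
expect in your data.

```stata
* Steps 1-3: subcohort file
use full, clear
keep if insub == 1
stset time, failure(died) id(id)
stcascoh, alpha(.999)
* Everyone here should be a subcohort member (codes 0 or 1)
assert inlist(_subco, 0, 1)
summarize _wSelPre _wBarlow
save subco_part, replace

* Steps 1, 2 and 4: additional (non-subcohort) cases
use full, clear
keep if insub == 0 & died == 1
stset time, failure(died) id(id)
stcascoh, alpha(0)
* Everyone here should be a non-subcohort case (code 2)
assert _subco == 2
summarize _wSelPre _wBarlow
save cases_part, replace

* Step 5: append the two files
use subco_part, clear
append using cases_part

* Step 6: Barlow weights for 1000 sampled out of 16000 subjects
replace _wBarlow = ln(16000/1000) if _wBarlow > 0
```

If the first -assert- fails, some subcohort members were left out by the
random draw at alpha just below 1, which is exactly what the check in step 3
is meant to catch.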

Examples
--------

    . ^stcascoh, alpha(20)^
    . ^stcascoh afe yfe ln_exp, alpha(0.3) gen(mycohort) group(race) seed(987654321)^

Also see
--------

 Manual:  ^[R] stcox  [R] sttocc^
On-line:  help for @stcox@ @stselpre@ @sttocc@

Reference
---------

Barlow WE, Ichikawa L, Rosner D, and Izumi S: Analysis of Case-Cohort Designs.
Journal of Clinical Epidemiology 1999; 52: 1165-1172.

Author
------

Enzo Coviello
Unita' di Epidemiologia e Statistica
Az. USL BA/1
70053 Andria (Bari)
Italy
enzo.coviello@@alice.it