-------------------------------------------------------------------------------
help for stkerhaz
-------------------------------------------------------------------------------

Baseline Hazard Estimates via Kernel Smoother, and Plots

        stkerhaz [if exp] [in range], bwidth(# [# # #]) [ kerkode(#) npoint(#) basecha(varname) strata(varname) tmax ci level(#) outfile(filename[, replace]) per(#) graph_options ]

stkerhaz is for use with survival-time data; see help st. You must have stset your data before using this command; see help stset. stkerhaz requires levels7, which you can download from http://www.stata.com/stb/stb60/dm90. See help findit and help ssc for installing user-written additions from the net.

Examples:

Baseline Hazard Plot with 95% confidence bounds

        . stcox, estimate basech(H)
        . stkerhaz, bwidth(10) ci

Baseline Hazard Plot with two curves, for bandwidths of 5 and 10 time units

        . stcox, estimate basech(H)
        . stkerhaz, bwidth(5 10)

Baseline Hazard Plot with one curve for each value of hormon; the estimates are saved in \currentfolder\myhaz

        . stcox, estimate basech(H) strata(hormon)
      or
        . sts gen H = na, by(hormon)

        . stkerhaz, bwidth(10) strata(hormon) out(myhaz)

Adjusted Baseline SMR Plot

        . xi: stcox i.yfecat i.afecat, basech(H) offset(lograte)
        . stkerhaz, b(5)

Description

stkerhaz computes nonparametric estimates of the baseline hazard or baseline SMR and graphs the results. It can be used after stcox, in which case it requires that you previously specified stcox's basech() option; see help stcox. Alternatively, you can use stkerhaz's basecha() option to specify a variable that stores the cumulative baseline hazard. stkerhaz can also smooth a cumulative excess mortality function, yielding a smoothed estimate of the excess mortality function.

Options

bwidth(# [# # #]) specifies the window half-width(s) to be used. bwidth() is required: specify at least one and at most four bandwidths. One curve is drawn for each bandwidth.
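The role of the half-width can be seen in the Ramlau-Hansen estimator as written by Klein and Moeschberger (2003). The notation here is ours, not stkerhaz's: b is the bandwidth, t_i are the times at which the cumulative baseline hazard increases, and \Delta\hat H(t_i) are those increments. Because the kernel K is zero outside [-1, +1], only increments observed within b time units of t contribute to the estimate at t.

```latex
\hat h(t) \;=\; \frac{1}{b} \sum_{i} K\!\left(\frac{t - t_i}{b}\right) \Delta \hat H(t_i),
\qquad K(z) = 0 \ \text{for } |z| \ge 1
```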

kerkode(#) specifies the weight function (kernel) to be used, according to the following numerical codes:

        1 = Uniform
        2 = Epanechnikov (default)
        3 = Biweight

Asymmetric kernels (see Remarks below) are computed where appropriate.
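As an illustration only, here is a minimal Python sketch of the three symmetric kernels selected by kerkode(#), written from the kernel definitions in Remarks and using the standard biweight normalizing constant 15/16. The function names are hypothetical, and the asymmetric boundary kernels stkerhaz uses in the tails are not reproduced. The numerical check shows that each kernel integrates to 1 over [-1, +1].

```python
# Hypothetical sketch: the three symmetric kernels selected by kerkode(#).
# Boundary (asymmetric) kernels used by stkerhaz in the tails are NOT included.


def kernel(z: float, code: int = 2) -> float:
    """Return the kernel weight K(z); zero outside the interval (-1, 1)."""
    if code not in (1, 2, 3):
        raise ValueError(f"unknown kernel code: {code}")
    if abs(z) >= 1.0:
        return 0.0
    if code == 1:  # Uniform
        return 0.5
    if code == 2:  # Epanechnikov (the default)
        return 0.75 * (1.0 - z * z)
    return 15.0 / 16.0 * (1.0 - z * z) ** 2  # Biweight


def integral(code: int, n: int = 100_000) -> float:
    """Midpoint-rule integral of K over [-1, 1]; should be close to 1."""
    h = 2.0 / n
    return sum(kernel(-1.0 + (k + 0.5) * h, code) for k in range(n)) * h


if __name__ == "__main__":
    for code, name in [(1, "Uniform"), (2, "Epanechnikov"), (3, "Biweight")]:
        print(f"{name:13s} K(0) = {kernel(0.0, code):.4f}  integral = {integral(code):.6f}")
```

The integral check is a quick way to spot a wrong normalizing constant: a kernel that does not integrate to 1 scales every smoothed hazard by the same wrong factor.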

npoint(#) specifies the number of equally spaced points in the range used for the estimation. The default is 150. Unless tmax is specified, the range starts at the lowest _t0 and ends at the last death time.
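The range rule above can be sketched in Python as follows. This assumes that both endpoints are included and that the points are evenly spaced; the exact grid stkerhaz builds is not documented here, and grid_points is a hypothetical name.

```python
# Hypothetical sketch of the npoint() estimation grid. Assumption: both
# endpoints are included and points are equally spaced, as the text describes;
# the exact grid stkerhaz builds may differ.


def grid_points(t0, death_times, npoint=150, tmax=False):
    """Equally spaced estimation points.

    t0          -- entry times (_t0) of all observations
    death_times -- analysis times of observed deaths
    tmax        -- if True, start at the earliest death instead of the lowest _t0
    """
    if npoint < 2:
        raise ValueError("npoint must be at least 2")
    if not death_times:
        raise ValueError("at least one death time is required")
    start = min(death_times) if tmax else min(t0)
    stop = max(death_times)
    return [start + (stop - start) * k / (npoint - 1) for k in range(npoint)]


if __name__ == "__main__":
    grid = grid_points(t0=[0.0, 0.0, 2.0], death_times=[1.5, 4.0, 10.0])
    print(len(grid), grid[0], grid[-1])  # 150 points from 0.0 to 10.0
```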

basecha(varname) specifies the variable storing the baseline cumulative hazard.

strata(varname) is intended for use in conjunction with the strata() option of stcox or the by() option of sts generate. It calculates and graphs up to four baseline hazard curves, one for each numeric value of the strata or by() variable. If that variable has more than four values, save the estimates to a file with outfile() and then graph them as desired.

tmax sets the start of the estimation range at the time of the earliest death.

ci plots confidence bounds around the smoothed baseline hazard. Note that the bounds are correct only when the baseline cumulative hazard comes from an unadjusted model. ci cannot be used when multiple curves are plotted.

level(#) specifies the confidence level, in percent, for the pointwise confidence bounds. The default is 95.

outfile(filename[, replace]) saves in filename the values used for plotting. The variable containing the baseline hazard estimates is prefixed by KS_. If confidence bounds are plotted, the standard error and the pointwise upper and lower confidence limits, based on a log transformation, are also saved, in variables prefixed by KS_SE_, KS_HI_, and KS_LO_. The rest of each variable name indicates the bandwidth or the value of the strata variable to which it refers. A variable named Gridpoint stores the equally spaced points at which the estimates are calculated.
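The log-transformed limits stored in the KS_LO_ and KS_HI_ variables can be illustrated with the following Python sketch. It assumes the usual form h·exp(±z·SE/h) given by Klein and Moeschberger, with z taken from level(#); that stkerhaz computes exactly this expression is an assumption, and log_ci is a hypothetical name.

```python
# Hypothetical sketch of log-transformed pointwise limits (the KS_LO_/KS_HI_
# variables). Assumption: limits of the form h * exp(+/- z * SE / h), as given
# by Klein and Moeschberger; stkerhaz's exact computation is not shown here.
import math
from statistics import NormalDist


def log_ci(hazard: float, se: float, level: float = 95.0) -> tuple[float, float]:
    """Return (low, high) pointwise confidence limits for a smoothed hazard."""
    if hazard <= 0:
        raise ValueError("the log transformation needs a positive hazard")
    if not 0 < level < 100:
        raise ValueError("level must be between 0 and 100")
    z = NormalDist().inv_cdf(0.5 + level / 200.0)  # about 1.96 for level(95)
    factor = math.exp(z * se / hazard)
    return hazard / factor, hazard * factor


if __name__ == "__main__":
    low, high = log_ci(hazard=0.02, se=0.005)
    print(f"95% CI: {low:.4f} to {high:.4f}")
```

Working on the log scale keeps the lower limit positive and makes the interval symmetric around the estimate in ratio terms (low * high equals the squared estimate), which is why such limits are usually preferred for hazards.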

per(#) defines the units in which the estimated hazards are reported. If analysis time is measured in years, specifying per(1000) graphs the results as rates per 1,000 person-years; for example, a hazard of 0.0042 per person-year is plotted as 4.2.

Remarks

The kernel-smoothed hazard estimated by stkerhaz uses the method described by Breslow and Day (1987, pp 178-229) and by Klein and Moeschberger (2003, pp 166-177), first introduced by Ramlau-Hansen (see formula 5.18). The estimate is a kernel-weighted sum of the increments in the cumulative baseline hazard, divided by the bandwidth, where the weights are a kernel function of (t_target - t_obs)/bandwidth defined on the interval [-1, +1]. Here t_target are the time points at which the baseline hazard is estimated, and t_obs are the times at which an increment in the cumulative baseline hazard is actually observed. The kernel weights are defined as:

        Uniform        K(z) = 0.5                 if |z| < 1, and K(z) = 0 otherwise
        Epanechnikov   K(z) = 0.75(1 - z^2)       if |z| < 1, and K(z) = 0 otherwise
        Biweight       K(z) = 0.9375(1 - z^2)^2   if |z| < 1, and K(z) = 0 otherwise

When t_target, or in the right-hand tail (last_death_time - t_target), is smaller than the bandwidth, asymmetric kernels are computed according to the formulas in Klein and Moeschberger (2003, pp 167-168).

As pointed out by Breslow and Day, an estimate of the cumulative baseline SMR can be obtained by incorporating the logarithm of time-dependent standard rates into a Cox model as an offset term. The cumulative SMR can then be smoothed by the same method to yield nonparametric estimates of the SMR at various points along the analysis-time axis.
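The following minimal Python sketch illustrates the Ramlau-Hansen estimator described in Remarks. The cumulative baseline hazard (for example, the values stored by stcox's basech() option, reduced to one value per distinct death time) is turned into increments, which are kernel-weighted and divided by the bandwidth. It is not stkerhaz's code: it uses only the symmetric Epanechnikov kernel and omits the asymmetric boundary kernels, so estimates within one bandwidth of either end are biased downward. All function names are hypothetical.

```python
# Hypothetical sketch of the Ramlau-Hansen kernel-smoothed hazard described in
# Remarks. Only the symmetric Epanechnikov kernel is used: the asymmetric
# boundary kernels that stkerhaz applies near the start and the last death
# time are omitted, so estimates within one bandwidth of either end are biased.


def epanechnikov(z: float) -> float:
    return 0.75 * (1.0 - z * z) if abs(z) < 1.0 else 0.0


def increments(cumhaz: list[float]) -> list[float]:
    """Jumps of a cumulative hazard recorded at sorted, distinct death times."""
    previous, jumps = 0.0, []
    for value in cumhaz:
        jumps.append(value - previous)
        previous = value
    return jumps


def smooth_hazard(times, cumhaz, targets, bandwidth, kernel=epanechnikov):
    """h(t) = (1/b) * sum_i K((t - t_i) / b) * dH(t_i) for each target t."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    jumps = increments(cumhaz)
    return [
        sum(kernel((t - t_obs) / bandwidth) * dh for t_obs, dh in zip(times, jumps))
        / bandwidth
        for t in targets
    ]


if __name__ == "__main__":
    # Constant true hazard 0.2 on a fine grid of death times: H(t) = 0.2 * t.
    times = [k / 100 for k in range(1, 1001)]
    cumhaz = [0.2 * t for t in times]
    print(smooth_hazard(times, cumhaz, targets=[0.5, 5.0], bandwidth=1.0))
```

With a constant true hazard, the interior estimate at t = 5 recovers roughly 0.2, while the estimate at t = 0.5 (closer to the start than one bandwidth) falls short. That shortfall is the bias the asymmetric kernels in stkerhaz are meant to correct.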

They note: "Cumulative baseline mortality or incidence rates are not as informative as they might appear at first sight. They tend to overemphasize the jumps at very high times at which the estimate is least stable. Also, time-specific rates are usually of greater intrinsic interest than the cumulative rate." Furthermore, graphing the baseline hazard or SMR reveals its trend along the time axis, showing when a rise or a decline appears and whether it is sharp or gradual.

stcox or sts generate can create a new variable containing the cumulative baseline hazard, either unadjusted, adjusted for covariates, or estimated in separate groups. stkerhaz is designed to derive an estimate of the baseline hazard easily from this stored result. Klein and Moeschberger also advise using the kernel smoothing technique to compute a smoothed estimate of the excess mortality function from a cumulative excess mortality function.

Bandwidths should be chosen with the understanding that a small bandwidth (little smoothing) yields hazard (or SMR) estimates affected by random noise, while a large bandwidth (heavy smoothing) blurs the structure of the data. The visual appearance of the graph can guide the choice, although objective criteria exist, such as minimizing an estimate of the mean integrated squared error, as discussed by Klein and Moeschberger.

Also see

 Manual:  [R] kdensity, [R] st stcox, [R] st sts generate, [R] st sts graph
On-line:  help for kernreg, bhcalc, sthaz (if installed)

Note

A previous version of stkerhaz computed more kinds of kernel weights, but it did not compute asymmetric kernels in the tails of analysis time. I can provide this old version on request.

References

Breslow, N. E. and Day, N. E. Statistical Methods in Cancer Research. Volume II: The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer, 1987.

Klein, J. P. and Moeschberger, M. L. Survival Analysis: Techniques for Censored and Truncated Data (2nd Edition). New York: Springer-Verlag, 2003.

Author

Enzo Coviello, Azienda U.S.L. BA/1, Italy
enzo.coviello@tin.it