help for stkerhaz

Baseline Hazard Estimates via Kernel Smoother and plots

stkerhaz [if exp] [in range], bwidth(# [# # #]) [ kerkode(#) npoint(#) basecha(varname) strata(varname) tmax ci level(#) outfile(filename[,replace]) per(#) graph_options ]

stkerhaz is for use with survival-time data; see help st. You must stset your data before using this command; see help stset. stkerhaz requires levels7, which you can download from http://www.stata.com/stb/stb60/dm90 (type net describe followed by that URL). See help findit and help ssc for installing user-written additions from the net.


Baseline hazard plot with 95% confidence bounds

    . stcox, estimate basech(H)
    . stkerhaz, bwidth(10) ci

Baseline hazard plot with two curves, for bandwidths of 5 and 10 time units

    . stcox, estimate basech(H)
    . stkerhaz, bwidth(5 10)

Baseline hazard plot with as many curves as there are values of hormon. The estimates are saved in the file myhaz in the current folder.

    . stcox, estimate basech(H) strata(hormon)
  or
    . sts gen H = na, by(hormon)

    . stkerhaz, bwidth(10) strata(hormon) out(myhaz)

Adjusted baseline SMR plot

    . xi: stcox i.yfecat i.afecat, basech(H) offset(lograte)
    . stkerhaz, b(5)


stkerhaz computes nonparametric estimates of the baseline hazard or baseline SMR and graphs the results. The command can be used after stcox, provided that you previously specified stcox's basech() option; see help stcox. Alternatively, stkerhaz's basecha() option lets you name a variable that stores the cumulative baseline hazard. stkerhaz can also smooth a cumulative excess mortality function, yielding a smoothed estimate of the excess mortality function.


bwidth(# [# # #]) specifies the window half-width to be used. bwidth() is required: at least one bandwidth must be specified, and up to four may be given. One curve is drawn for each bandwidth.

kerkode(#) specifies the weight function (kernel) to be used, according to the following numerical codes:

    1 = Uniform
    2 = Epanechnikov (default)
    3 = Biweight

An asymmetric kernel (see Remarks below) is computed where appropriate.

npoint(#) specifies the number of equally spaced points in the range used for the estimation. Default is 150. Unless tmax is used, the range of points starts at the lowest _t0 and stops at the last death time.
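The grid of estimation points can be pictured with a short Python sketch. It is only an illustration, not stkerhaz's own code: the function name grid_points and the example range are hypothetical, and it assumes that both endpoints of the range belong to the grid.

```python
def grid_points(start, stop, npoint=150):
    """Return npoint equally spaced values from start to stop.

    Assumption: both endpoints are part of the grid.
    """
    if npoint < 2:
        raise ValueError("npoint must be at least 2")
    step = (stop - start) / (npoint - 1)
    return [start + i * step for i in range(npoint)]

# Hypothetical range: lowest entry time 0, last death time 74.5
grid = grid_points(0.0, 74.5)
print(len(grid), grid[0], grid[-1])
```

A larger npoint() gives a finer curve at the cost of more computation; it does not change the amount of smoothing, which depends only on the bandwidth.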

basecha(varname) specifies the variable storing baseline cumulative hazard.

strata(varname) is intended for use in conjunction with the strata() option of stcox or the by() option of sts gen. It calculates and graphs up to four baseline hazard curves, one for each numeric value of the strata or by variable. If the strata variable has more than four values, save the estimates in a file with outfile() and then graph them as you wish.

tmax sets the starting point of the range at the time of the earliest death.

ci plots pointwise confidence bounds around the smoothed baseline hazard. Note that the bounds are correct only when the cumulative baseline hazard derives from an unadjusted model. Multiple curves and confidence bounds cannot be plotted together.

level(#) specifies the confidence level, in percent, for the pointwise confidence bounds. Default is 95.

outfile(filename[,replace]) saves the plotted values in filename. Variables containing baseline hazard estimates are prefixed by KS_. If confidence bounds are plotted, the standard error and the pointwise upper and lower confidence limits, based on a log transformation, are also saved in variables prefixed by KS_SE_, KS_HI_ and KS_LO_. The rest of each name identifies the bandwidth or the value of the strata variable to which the variable refers. A variable named Gridpoint stores the equally spaced points at which the estimates are calculated.
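As an illustration of confidence limits "based on a log transformation", the Python sketch below uses the usual form hazard * exp(+/- z * se / hazard). That this is exactly the formula stkerhaz applies is an assumption, and the function name log_ci and the input values are hypothetical.

```python
from math import exp
from statistics import NormalDist

def log_ci(hazard, se, level=95):
    """Pointwise limits computed on the log scale (assumed formula).

    lower = hazard / exp(z * se / hazard), upper = hazard * exp(z * se / hazard)
    """
    z = NormalDist().inv_cdf(0.5 + level / 200)  # 1.96 when level is 95
    factor = exp(z * se / hazard)
    return hazard / factor, hazard * factor

# Hypothetical smoothed hazard and standard error at one grid point
lo, hi = log_ci(0.02, 0.005)
print(lo, hi)
```

Limits of this form are always positive and are asymmetric around the estimate, which is why they are usually preferred to hazard +/- z * se for a quantity that cannot be negative.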

per(#) defines the units in which the estimated hazards are reported. For example, if analysis time is in years, specifying per(1000) graphs rates per 1000 person-years.
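The rescaling is a plain multiplication, as this small Python sketch shows. The function name rescale and the hazard values are hypothetical.

```python
def rescale(hazards, per=1):
    """Express hazards per `per` units of person-time."""
    return [h * per for h in hazards]

# Hypothetical hazards per person-year, reported per 1000 person-years
rates = rescale([0.0021, 0.0035], per=1000)
print(rates)
```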


The kernel-smoothed hazard estimated by stkerhaz uses the method described by Breslow and Day (1987, pp 178-229) and by Klein and Moeschberger (2003, pp 166-177), first introduced by Ramlau-Hansen (see formula 5.18). The estimate is simply a weighted average of the increments in the cumulative baseline hazard, where the weights are a kernel function of z = (t_target - t_obs) / bandwidth, defined on the interval [-1,+1]. t_target are the time points at which the baseline hazard is estimated, and t_obs are the times at which an increment of the cumulative baseline hazard is actually observed. The kernel weights are defined as:

    Uniform         K(z) = 0.5                    if |z| < 1, and K(z) = 0 otherwise
    Epanechnikov    K(z) = 0.75(1 - z^2)          if |z| < 1, and K(z) = 0 otherwise
    Biweight        K(z) = 0.9375(1 - z^2)^2      if |z| < 1, and K(z) = 0 otherwise
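The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration with symmetric kernels only, not the code stkerhaz runs: the function names and the toy data are hypothetical, and the biweight constant is written as 15/16 = 0.9375, the value for which that kernel integrates to 1.

```python
def uniform(z):
    return 0.5 if abs(z) < 1 else 0.0

def epanechnikov(z):
    return 0.75 * (1 - z * z) if abs(z) < 1 else 0.0

def biweight(z):
    return 0.9375 * (1 - z * z) ** 2 if abs(z) < 1 else 0.0

def smooth_hazard(t_target, times, cumhaz, bwidth, kernel=epanechnikov):
    """Kernel-weighted sum of cumulative-hazard increments, divided by bwidth.

    times must be sorted; cumhaz[i] is the cumulative hazard at times[i].
    """
    total = 0.0
    previous = 0.0
    for t_obs, H in zip(times, cumhaz):
        increment = H - previous  # jump of the cumulative hazard at t_obs
        previous = H
        total += kernel((t_target - t_obs) / bwidth) * increment
    return total / bwidth

# Toy data: one event per time unit, each adding 0.1 to the cumulative hazard
times = list(range(1, 11))
cumhaz = [0.1 * t for t in times]
estimate = smooth_hazard(5.0, times, cumhaz, bwidth=3.0)
print(estimate)
```

With a constant increment of 0.1 per time unit, the estimate in the middle of the range comes out close to 0.1, as expected for a constant hazard. Near the ends of the range the symmetric kernel loses part of its window, which is the motivation for the asymmetric kernels.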

When t_target or, in the right-hand tail, (last_death_time - t_target) is smaller than the bandwidth, asymmetric kernels are computed according to the formulas in Klein and Moeschberger's book (2003, pp 167-168).
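For illustration, here is a Python sketch of the asymmetric Epanechnikov kernel for the left-hand tail. The coefficients are reproduced from memory of Klein and Moeschberger and should be checked against the book; the function names are hypothetical, and whether stkerhaz codes the correction in exactly this way is an assumption. Here q = t_target / bandwidth < 1, and the kernel is supported on [-1, q].

```python
def epanechnikov(z):
    return 0.75 * (1 - z * z) if abs(z) <= 1 else 0.0

def asymmetric_epanechnikov(z, q):
    """Left-tail boundary kernel K_q(z) = K(z) * (alpha + beta * z) on [-1, q].

    Coefficients as recalled from Klein and Moeschberger (assumption).
    """
    if z < -1 or z > q:
        return 0.0
    denom = (1 + q) ** 4 * (19 - 18 * q + 3 * q * q)
    alpha = 64 * (2 - 4 * q + 6 * q * q - 3 * q ** 3) / denom
    beta = 240 * (1 - q) ** 2 / denom
    return epanechnikov(z) * (alpha + beta * z)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule, accurate enough for a smooth polynomial integrand."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

q = 0.5
area = integrate(lambda z: asymmetric_epanechnikov(z, q), -1.0, q)
print(area)
```

The rescaled kernel still integrates to 1 over its truncated support, so the estimate near time zero is not biased downward by the missing part of the window. At q = 1 the coefficients reduce to alpha = 1 and beta = 0, giving back the symmetric kernel. For the right-hand tail the same kernel is applied to -z, with q = (last_death_time - t_target) / bandwidth.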

As pointed out by Breslow and Day, an estimate of the cumulative baseline SMR can be obtained by including the logarithm of the time-dependent standard rates as an offset term in a Cox model. The cumulative SMR can be smoothed by the same method to yield nonparametric estimates of the SMR at various points along the analysis-time axis.

Breslow and Day note: "Cumulative baseline mortality or incidence rates are not as informative as they might appear at first sight. They tend to overemphasize the jumps at very high times at which the estimate is least stable. Also, time-specific rates are usually of greater intrinsic interest than the cumulative rate." Furthermore, a graph of the baseline hazard or SMR shows the trend along the time axis: when a rise or a decline occurs and whether it is sharp or gradual. stcox or sts gen can create a new variable containing the cumulative baseline hazard, unadjusted, adjusted for covariates, or estimated in separate groups. This command is intended to make it easy to derive an estimate of the baseline hazard from that stored result.

Klein and Moeschberger also advise using the kernel smoothing technique to compute a smoothed estimate of the excess mortality function, starting from a cumulative excess mortality function.

Bandwidths must be chosen with care: a small bandwidth (little smoothing) yields hazard (or SMR) estimates affected by random noise, while a large bandwidth (heavy smoothing) blurs the structure of the data. The visual appearance of the graph can guide the selection, although objective criteria exist.
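The trade-off can be seen on artificial data with a short Python sketch. Everything here is hypothetical: the increments alternate between 0.05 and 0.15 to mimic noise around a constant hazard of 0.1, and the spread (maximum minus minimum) of the smoothed curve is used as a crude measure of roughness.

```python
def epanechnikov(z):
    return 0.75 * (1 - z * z) if abs(z) < 1 else 0.0

def smooth(t_target, times, increments, bwidth):
    """Kernel-weighted sum of hazard increments, divided by the bandwidth."""
    return sum(epanechnikov((t_target - t) / bwidth) * dh
               for t, dh in zip(times, increments)) / bwidth

# Artificial increments: noise around a constant hazard of 0.1 per time unit
times = list(range(1, 41))
increments = [0.15 if t % 2 == 0 else 0.05 for t in times]

# Interior grid from 10 to 30, far enough from both tails for either bandwidth
grid = [10 + 0.5 * i for i in range(41)]

def spread(bwidth):
    estimates = [smooth(t, times, increments, bwidth) for t in grid]
    return max(estimates) - min(estimates)

spread_small = spread(1.5)
spread_large = spread(5.0)
print(spread_small, spread_large)
```

The curve obtained with the small bandwidth follows the alternation in the increments, while the larger bandwidth averages it out and stays close to 0.1. With real data the same larger bandwidth would also flatten genuine peaks, which is the blurring mentioned above.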

Also see

    Manual:  [R] kdensity, [R] st stcox, [R] st sts generate, [R] st sts graph
    On-line: help for kernreg, bhcalc, sthaz if installed


A previous version of stkerhaz computed more kinds of kernel weights, but it did not compute asymmetric kernels in the tails of the analysis time. I can provide this old version on request.


Breslow, N. E. and Day, N. E. Statistical Methods in Cancer Research. Volume II - The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer, 1987.

Klein, J. P. and Moeschberger, M. L. Survival Analysis: Techniques for Censored and Truncated Data (2nd Edition). New York: Springer-Verlag, 2003.


Enzo Coviello, Azienda U.S.L. BA/1, Italy enzo.coviello@tin.it