Flexible parametric models for survival-time data
stpm [varlist] [if exp] [in range] , model_complexity scale(hazard|normal|odds) [all cluster(cluster_var) left(leftvar)) mlmaxopts(ml_maximize_options) noconstant nolog noorthog offset(offsetvar) q(orthog_matrix) spline(splinevar derivativevar) stratify(strat_varlist) technique(algorithm_spec) theta(est|#)]
where
model_complexity is df(#) or stpmdf(#) or knots([l|%]knotlist) or knots(u#1 #2 #3).
stpm is for use with survival-time data; see st. You must have stset your data before using this command; see help st. Weights are to be set by stset. All weight-types are supported.
The syntax of predict following stpm is
predict [type] newvarname [if exp] [in range] [, statistic at_stuff noconstant nooffset stdp store(global_macro_name [#]) time(#|vn) zero]
where at_stuff is varname # |vn [varname #|vn ...]) or at(@varname #|string) or at(+#)
and statistic is
xb index (linear predictor for equation [xb]); the default cumodds log cumulative odds function cumhazard log cumulative hazard function normal Normal deviate function spline fitted spline function dzdy derivative of fitted spline function w.r.t. ln(_t) density density function deviance deviance residuals hazard hazard function martingale martingale residuals survival survival function failure failure (cumulative incidence) function centile(#|varname) # th centile of survival time distribution (or centiles stored in varname) tvc(varname) time-varying coefficient for varname (see Options for predict)
All statistics are available both in and out of sample; type "predict ... if e(sample) ..." if wanted only for the estimation sample. The default is linear prediction of the covariate part of the model, i.e. for statistic xb.
Not all of the options for predict are available with every statistic. The SE (stdp option of predict) is not available for every statistic.
Description
stpm fits spline-based distributional models to right- or interval-censored data. varlist is a set of covariates. Multistate models are also supported.
Options for stpm
Note that the complexity of the spline part of the model is defined by df() or stpmdf() or knots(). Just one of these options must be specified.
all includes out-of-sample observations in generated variables which contain the spline basis functions.
cluster(cluster_var)) adjusts standard errors for intra-group correlation, where the grouping variable is cluster_var; implies robust
df(#) specifies the degrees of freedom for the natural spline function. # must be between 1 and 6. The knots() option is not applicable and the knots are placed at the following centiles of the distribution of the uncensored log times:
--------------------------- df Centile positions --------------------------- 1 (no knots) 2 50 3 33 67 4 25 50 75 5 20 40 60 80 6 17 33 50 67 83 >6 (not allowed) ---------------------------
knots([l|%]knotlist) (syntax 1) defines the internal knot positions for the spline. If you specify knots(knotlist) then knotlist should consist of values of log time. If you specify knots(lknotlist) then the values in knotlist are taken to be times and are automatically log transformed by stpm. (This is a convenience feature; it is easier to enter times than log times). If you specify knots(%knotlist) then the values in knotlist are taken to be centile positions in the distribution of the uncensored log times.
knots(u#1 #2 #3) (syntax 2) also defines the internal knots for the spline. #1 knots are assigned at random uniformly distributed positions between #2 and #3, where #2 is the lowest centile position of the uncensored log times you wish to entertain and #3 is the highest. A suggested choice is #2=10, #3=90; knots are to be placed at random positions between the 10th and 90th centiles of the uncensored log times.
left(leftvar) specifies that some or all of the survival-time observations are interval-censored. The rules for specifying values of leftvar and their meanings in terms of interval censoring are as follows:
------------------------------------------------------------------- Value of leftvar _d Meaning ------------------------------------------------------------------- . or _t 0 Right censored at _t . or _t 1 Event at _t 0 0 Right censored at _t 0 1 Interval censored, event in (0,_t] <_t 0 Late entry at leftvar, right censored at _t <_t 1 Interval censored, event in [leftvar,_t] -------------------------------------------------------------------
Note that stpm support datasets with late entry and interval censoring only when late entry is specified through leftvar and the value of _d, as given in the above table.
mlmaxopts(ml_maximize_options) control the likelihood maximization process; seldom used. The difficult option is probably the most important. See ml##ml_maxopts for details.
noconstant suppresses the constant term in the [xb] equation.
nolog suppresses the iteration log while the model is fit.
noorthog suppresses orthogonalisation of the spline basis functions. You should rarely need this option, which is provided for compatibility with early versions of stpm, and for pedagngic reasons.
offset(offsetvar) defines the offset for the [xb] equation. offsetvar is added to the linear predictor.
q(orthog_matrix) is required with the spline() option, unless the noorthog option was used when the model was fitted. The Q-matrix orthog_matrix is a linear transformation between the regression coefficients on the orthogonalised and original spline basis functions. It may be stored following a run of stpm by extracting it from e(Q):
stpm ... matrix q = e(Q)
scale(hazard|normal|odds) is not optional and specifies the scale of the model. The hazard, normal and odds options fit models on the scale of the log cumulative hazard, normal equivalent deviate or log cumulative odds of failure respectively.
spline(splinevar derivativevar) allows you to specify the baseline spline function and its derivative with respect to ln(_t). For a given model where the spline function has been estimated, splinevar can be created by using for example
predict splinevar, spline zero
derivativevar may be created by using for example
predict derivativevar, dzdy
Note also that the noconstant option of predict subtracts the constant term from the spline function. This will ensure that if the model is refitted using splinevar and derivativevar, the constant will be reported as (close to) zero.
stpmdf(#) is identical to df(). stpmdf() is for inputting the df required for stpm to mfp, which has its own df() option.
stratify(strat_varlist) stratifies the spline functions according to the variables in strat_varlist. It will rarely make sense for the variables in strat_varlist not to be among those in varlist, but this is not checked.
technique() specifies how the likelihood function is to be maximized by ml. The following algorithms are currently implemented in ml: nr (the default), bhhh, dfp, bfgs. For further details, see ml or the book Maximum Likelihood Estimation with Stata, 2d Edition (Gould, Pitblado, and Sribney 2003).
theta(est|#) applies only with scale(odds) and estimates the transformation parameter (theta) or performs estimation with theta set to #. The transformation of the (baseline) survival function S_0(t) is then
g_theta(S_0(t)) = ln(S_0(t)^(-theta)-1)/theta)
theta = 0 corresponds to the cumulative hazards model. With theta(est), theta is estimated and presented on a log scale, i.e. ln(theta). With theta(#), # must be positive.
Options for predict
at(varname #|vn [varname #|vn ...]) (syntax 1) computes the various statistics at value(s) (# or vn)... of model covariates varname..., where vn means "variable name". The at() option is a convenient way of specifying out-of-sample prediction for some or all of the covariates in the model. Covariates in stpm's varlist but NOT listed in at() are used in computing predicted values, unless the zero option is specified, in which case adjustment is to value 0 of such predictors.
at(@varname #|string)(syntax 2) computes the various statistics at the mean of the covariate values for observations in which varname equals # (if varname is numeric) or in which varname equals string (if varname is a string variable). varname may typically be an identification number, e.g. a row number of the dataset. Note that varname is not usually one of the covariates in the model (i.e. it is not a member of varlist).
at(+#)(syntax 3) computes the various statistics after adding # to the linear predictor from the [xb] equation. This can be used to examine the effects of applying specific offsets to the predicted values on other statisics of interest.
deviance creates deviance-like residuals. The same formula is used as in streg.
martingale creates martingale-like residuals. The same formula is used as in streg.
noconstant removes the constant (if any) in equation [xb] when computing a statistic of interest. This includes the fitted spline function using the spline option.
nooffset is relevant only if you specified offset() with stpm. It modifies the calculations made by predict, xb so that they ignore the offset variable. The linear prediction for equation [xb] is treated as x*b rather than x*b + offset.
stdp computes the standard error of statistics xb, cumhazard, cumodds, normal, dzdy, or of the log (note: log) survival time for centile(). stdp is also implemented for hazard with hazard-scaled models (scale(hazard)) only, when it calculates the S.E. of the log (note: log) hazard function. stdp is not implemented for other statistics, but note that confidence intervals for the survival function may be found by back-transformation of confidence intervals for the cumulative hazard or odds or normal function, and those for the hazard function similarly, through the formula exp(ln(hazard) +/- z * SE(log hazard)).
spline computes the fitted spline function. This incorporates all covariate effects and by default is evaluated at the observed failure times. Use of the zero option evaluates the spline with all covariate values set to zero (i.e. the baseline spline). If there are variables in the model with time-dependent effects, specified through the stratify() option, and the time(#) option is also used, the result is effectively the prognostic index evaluated at time #.
store(global_macro_name [#]) stores the contents of newvarname in a global macro called global_macro_name. Successive non-missing values are stored, separated by spaces. Optionally, rounding to # decimal places is allowed, to save space and make the results more readable. The store() option is not often used, but may be helpful when passing the output to another program.
time(#|vn) predicts at time # or at the time values in vn. If time() is not specified, prediction is made for time _t.
tvc(varname) stands for "time-varying coefficient" and computes the estimated coefficient for varname, a covariate in stpm's varlist. If varname is "time-fixed", then newvarname will be a constant, namely [xb]_b[varname]. If varname is included in strat_varlist, then newvarname will depend on _t and may be interpreted as the time-varying effect of varname on the chosen scale of the model (proportional hazards, proportional odds or probit). For example, in a hazards-scale model (scale(hazard)), newvarname multiplied by varname will give an estimate of the time-varying log cumulative hazard ratio for varname (compared with varname = 0) at every observed value of varname. newvarname alone will give the log cumulative hazard ratio for a one-unit change in varname. Note that the time-varying log cumulative hazard ratio for varname will NOT be identical to the time- varying log hazard ratio for varname.
zero predicts at zero values of covariates in varlist and similarly for strat_varlist, if stratify() is specified. See also option at().
Remarks
Let t denote time. stpm works by first calculating the survival function S(t) non-parametrically by using the Kaplan-Meier estimator. The procedure is illustrated for proportional hazards models, specified by option scale(hazard). S(t) is converted to an estimate of the log cumulative hazard function Z(t) by the formula
Z(t) = ln(-ln S(t))
This Z(t) is then smoothed on ln(t) using regression splines with knots placed at certain quantiles of the distribution of t. The knot positions are chosen automatically if the spline complexity is specified by the df() option, or manually by way of the knots() option. (Note that the knots are values of ln(t), not t.) Denote the predicted values of the log cumulative hazard function by Zhat(t). The density function f(t) is
f(t) = -dS(t)/dt = dS/dZhat dZhat/dt = S(t) exp(Zhat) dZhat(t)/dt
dZhat(t)/dt is computed from the regression coefficients of the fitted spline function. A smoothed survival function is calculated as
Shat(t) = exp(-exp Zhat(t)).
The hazard function is calculated as f(t)/Shat(t).
If varlist is specified, the baseline survival function (i.e. at zero values of the covariates) is used instead of the survival function of the raw observations. With df(1) a Weibull model is fitted.
With scale(normal), smoothing is of the Normal quantile function, invnorm(1-S(t)), instead of the log cumulative hazard function. With df(1) a lognormal model is fitted.
With scale(odds), smoothing is of the log odds of failure function, ln((1-S(t))/S(t)), instead of the log cumulative hazard function. With df(1) a log-logistic model is fitted.
Estimation is performed by maximum likelihood. Optimisation uses the default technique (nr, meaning Stata's version of Newton-Raphson iteration.
As an example of output from stpm, consider the following. A stratifying variable nn has two levels, 0 and 1. There are two covariates, x5a and x5b, with effects assumed to act proportionately on the cumulative hazard function:
. stpm x5a x5b nn, stratify(nn) scale(odds) df(2)
initial: log likelihood = -1124.6562 rescale: log likelihood = -1124.6562 rescale eq: log likelihood = -1124.6562 Iteration 0: log likelihood = -1124.6562 Iteration 1: log likelihood = -1112.096 Iteration 2: log likelihood = -1111.5188 Iteration 3: log likelihood = -1111.5173 Iteration 4: log likelihood = -1111.5173 Number of obs = 1 > 289 Wald chi2(1) = 10 > .48 Log likelihood = -1111.5173 Prob > chi2 = 0.0 > 012 --------------------------------------------------------------------------- > --- _t | Coef. Std. Err. z P>|z| [95% Conf. Interv > al] ---------+----------------------------------------------------------------- > --- s0 | nn | -1.735629 .5360622 -3.238 0.001 -2.786292 -.6849 > 664 _cons | 3.416165 .4057033 8.420 0.000 2.621001 4.211 > 329 ---------+----------------------------------------------------------------- > --- s1 | nn | -.1167243 .046288 -2.522 0.012 -.2074471 -.0260 > 015 _cons | .1590071 .0349978 4.543 0.000 .0904127 .2276 > 016 ---------+----------------------------------------------------------------- > --- xb | x5a | .9971947 .1749101 5.701 0.000 .6543773 1.340 > 012 x5b | 1.844591 .2142359 8.610 0.000 1.424696 2.264 > 486 nn | -.5830707 .3702606 -1.575 0.115 -1.308768 .1426 > 267 _cons | -2.039426 .2662178 -7.661 0.000 -2.561203 -1.517 > 649 --------------------------------------------------------------------------- > --- Deviance = 2223.035 (1289 observations.)
Notes on the output
1. The equations: s0 and s1 represent the first and second spline basis functions, with [s0]_cons and [s1]_cons being the coefficients for value 0 of the stratifying variable nn, and [s0]_cons+[s0]_b[nn] and [s1]_cons+[s1]_b[nn] the coefficients for value 1 of nn.
2. If you wanted to test for the need to stratify by nn you could type test [s0]nn [s1]nn. This would give a 2 degree of freedom Wald test of any difference between the spline functions for nn.
3. The value of "Wald chi2(1)" should be ignored.
Reproducing the model
Sometimes one wishes to reproduce a model, that is, to incorporate the linear predictor in [xb] and the spline function representing the baseline distribution function without re-fitting any parameters. This can be done by using the spline, xb and dzdy options of predict to create the necessary variables, and storing the matrix e(Q) for later use. The model is reproduced by using the spline(), offset(), noconstant and q() options of stpm. The reason for the use of noconstant with predict, xb is that the constant of the [xb] equation is already included in the spline function by predict, spline and should not be introduced again. A specific example:
. stpm model_specifications . predict sp, spline zero . predict dz, dzdy . predict xb, xb noconstant . matrix Q = e(Q)
To refit the model in the subset of the data for which z equals 1 without re-estimating any parameters (only the deviance is displayed):
. stpm if z==1, offset(xb) spline(sp dz) q(Q) noconstant
To re-estimate the constant in the [xb] equation in a subset of the data:
. stpm if z==1, offset(xb) spline(sp dz) q(Q)
At present, the approach cannot be used to reproduce time-varying models, i.e. models fitted with the stratify() option.
Stored results
stpm returns the deviance (minus twice the maximised log likelihood) and the Akaike Information Criterion (AIC) in scalars e(dev) and e(aic) respectively. Additional quantities are stored in e() functions and may be inspected by using ereturn list.
Examples
. stpm, scale(hazard) df(1) . stpm, df(3) scale(normal) . stpm, df(3) scale(odds) . stpm, df(3) scale(odds) theta(2) . stpm, df(3) scale(odds) theta(est) . stpm treatmnt, scale(odds) knots(%10 90) . xi: stpm x1 x2 x3 i.pop, df(2) scale(o) stratify(i.pop) . stpm x1 x2 x3, df(3) scale(h) left(left)
. predict s0, survival zero [baseline survival function] . predict surv3, survival time(3) [survival probs at time 3] . predict median, centile(50) [median survival, given covars] . predict semedian, centile(50) stdp . predict h0, hazard zero [baseline hazard function] . predict h1, hazard zero at(z 1) . predict lnH, cumhazard at(@@id 25) [at covars for obs #id[25]]
. gen hr = h1/h0 [hr is defined for all obs.] . predict bz, tvc(z) [time-varying coeff for z] . predict sbz, tvc(z) stdp [SE of bz]
. stpm, scale(hazard) df(2) . gen times = _n in 1/10 . predict surv, survival time(times) store(survlist 4) . display "$survlist"
. stpm x1 x2 x3, df(3) scale(h) . matrix q = e(Q) . predict spline, spline zero noconstant . predict dspline, dzdy . stpm x1 x2 x3, spline(spline dspline) scale(h) q(q)
Auxiliary ado-files
frac_spl, frac_s3b, mlsurvlf, stpm_p.
Author
Patrick Royston, MRC Clinical Trials Unit, London. patrick.royston@ctu.mrc.ac.uk
References
P. Royston. 2001. Flexible alternatives to the Cox model, and more. The Stata Journal 1:1-28.
P. Royston and M. K. B. Parmar. 2002. Flexible proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197.
Also see