help stpm2
also see: stpm, stpm2 postestimation
-------------------------------------------------------------------------------
Title
stpm2 -- Flexible parametric survival models
Syntax
stpm2 [varlist] [if] [in] [, options]
options Description
-------------------------------------------------------------------------
Model
bhazard(varname) invokes relative survival models where
varname holds the expected mortality
rate (hazard) at the time of death
bknots(numlist) boundary knots for baseline
bknotstvc(knotslist) boundary knots for time-dependent effects
cure fit a cure model
df(#) degrees of freedom for baseline hazard
function
dftvc(df_list) degrees of freedom for each time-dependent
effect
failconvlininit automatically try lininit option if
convergence fails
knots(numlist) knot locations for baseline hazard
knotstvc(numlist) knot locations for time-dependent effects
knscale(scale) scale for user-defined knots (default
scale is time)
noconstant suppress constant term
rcsbaseoff do not include baseline spline variables
noorthog do not use orthogonal transformation of
spline variables
scale(scalename) specifies the scale on which the survival
model is to be fitted
stratify(varlist) for backward compatibility with stpm
theta(est|#) for backward compatibility with stpm
tvc(varlist) varlist of time varying effects
Reporting
alleq report all equations
eform exponentiate coefficients
keepcons do not drop constraints used in ml routine
level(#) set confidence level; default is level(95)
showcons list constraints in output
Max options
constheta(#) constrain value of theta when using
Aranda-Ordaz family of link functions
inittheta(#) initial value of theta (default 1: log
cumulative odds scale)
lininit obtain initial values by first fitting a
linear function of ln(time)
maximize_options control the maximization process; seldom
used
-------------------------------------------------------------------------
You must stset your data before using stpm2; see [ST] stset.
fweights, iweights, and pweights may be specified using stset; see [ST]
stset.
Description
stpm2 fits flexible parametric survival models (Royston-Parmar models).
stpm2 can be used with single- or multiple-record or single- or
multiple-failure st data. Survival models can be fitted on the log
cumulative hazard scale, the log cumulative odds scale, the standard
normal deviate (probit) scale, or on a scale defined by the value of
theta using the Aranda-Ordaz family of link functions.
stpm2 can fit the same models as stpm, but is more flexible in that it
does not force the knots for time-dependent effects to be the same as
those used for the baseline distribution function. In addition, stpm2 can
fit relative survival models by use of the bhazard() option.
Post-estimation commands have been extended over what is available in
stpm. stpm2 is noticeably faster than stpm.
See [ST] streg for other (standard) parametric survival models.
Options
+-------+
----+ Model +------------------------------------------------------------
bhazard(varname) is used when fitting relative survival models. varname
gives the expected mortality rate at the time of death/censoring.
stpm2 gives an error message when there are missing values of
varname, since this usually indicates that an error has occurred when
merging the expected mortality rates.
bknots(knotslist) knotslist is a two-element numlist giving the boundary
knots. By default these are located at the minimum and maximum of the
uncensored survival times. They are specified on the scale defined by
knscale().
bknotstvc(knotslist) knotslist gives the boundary knots for any
time-dependent effects. By default these are the same as for the
bknots option. They are specified on the scale defined by knscale().
For example,
bknotstvc(x1 0.01 10 x2 0.01 8)
cure is used when fitting cure models. It forces the cumulative hazard to
be constant after the last knot. When the df() option is used
together with the cure option the internal knots are placed evenly
according to centiles of the distribution of the uncensored log
survival times except one that is placed at the 95th centile. Cure
models can only be used when modelling on the log cumulative hazard
scale (scale(hazard)).
df(#) specifies the degrees of freedom for the restricted cubic spline
function used for the baseline function. # must be between 1 and 10,
but usually a value between 1 and 4 is sufficient, with 3 being the
default. The knots() option is not applicable if the df() option is
specified. The knots are placed at the following centiles of the
distribution of the uncensored log survival times:
------------------------------------------------------------
df knots Centile positions
------------------------------------------------------------
1 0 (no knots)
2 1 50
3 2 33 67
4 3 25 50 75
5 4 20 40 60 80
6 5 17 33 50 67 83
7 6 14 29 43 57 71 86
8 7 12.5 25 37.5 50 62.5 75 87.5
9 8 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9
10 9 10 20 30 40 50 60 70 80 90
------------------------------------------------------------
Note that these are interior knots and there are also boundary knots
placed at the minimum and maximum of the distribution of uncensored
survival times.
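The centile rule above can be checked by hand. The following is a minimal
sketch, not stpm2's internal code: it assumes the brcancer data used in the
Examples section and uses the official centile command, whose centile
definition may differ slightly from the one stpm2 applies internally. The
variable name lnt is made up for the illustration.

```stata
* Sketch: approximate positions of the default df(4) knots.
* WARNING: the clear option replaces the data currently in memory.
webuse brcancer, clear
stset rectime, failure(censrec = 1)

* Log survival times for uncensored observations only
* (lnt is a hypothetical helper variable)
generate double lnt = ln(_t) if _d == 1

* df(4) places interior knots at the 25th, 50th and 75th centiles
centile lnt, centile(25 50 75)

* Back-transform the three centiles to the time scale
display "interior knots: " exp(r(c_1)) "  " exp(r(c_2)) "  " exp(r(c_3))

* Boundary knots: minimum and maximum uncensored survival time
summarize _t if _d == 1
display "boundary knots: " r(min) "  " r(max)
```

Knots obtained this way could be passed to knots() with knscale(time) to
compare against a model fitted with df(4).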
When the cure option is used, df() must be between 3 and 11 and the
default locations of the knots are as follows.
------------------------------------------------------------
df knots Centile positions
------------------------------------------------------------
3 2 50 95
4 3 33 67 95
5 4 25 50 75 95
6 5 20 40 60 80 95
7 6 17 33 50 67 83 95
8 7 14 29 43 57 71 86 95
9 8 12.5 25 37.5 50 62.5 75 87.5 95
10 9 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 95
11 10 10 20 30 40 50 60 70 80 90 95
------------------------------------------------------------
dftvc(df_list) gives the degrees of freedom for time-dependent effects in
df_list. The potential degrees of freedom are listed under the df()
option. With 1 degree of freedom a linear effect of log time is
fitted. If there is more than one time-dependent effect and
different degrees of freedom are requested for each time-dependent
effect then the following syntax applies:
dftvc(x1:3 x2:2 1)
This will use 3 degrees of freedom for x1, 2 degrees of freedom for
x2 and 1 degree of freedom for all remaining time-dependent effects.
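As a hedged illustration of this syntax, the sketch below fits a model with
three time-dependent effects. It assumes the brcancer data from the Examples
section and that the variables x1 and x5 exist in that dataset; the choice of
covariates and degrees of freedom is arbitrary and not a modelling
recommendation.

```stata
* Sketch: different degrees of freedom for each time-dependent effect.
* WARNING: the clear option replaces the data currently in memory.
webuse brcancer, clear
stset rectime, failure(censrec = 1)

* hormon gets 3 df, x1 gets 2 df, and the trailing 1 applies to every
* remaining time-dependent effect (here only x5)
stpm2 hormon x1 x5, scale(hazard) df(4) tvc(hormon x1 x5) dftvc(hormon:3 x1:2 1)
```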
failconvlininit automatically tries the lininit option if the model fails
to converge.
knots(# [# ...]) specifies knot locations for the baseline distribution
function, as opposed to the default locations set by df(). Note that
the knot locations are specified on the scale defined by knscale().
However, the scale used by the restricted cubic spline function is
always log time. If knots() is not specified, the knot positions are
determined by the df() option.
knotstvc(knotslist) defines numlist knotslist as the location of the
interior knots for time-dependent effects. If different knots are
required for different time-dependent effects the option is
specified, for example, as follows:
knotstvc(x1 1 2 3 x2 1.5 3.5)
knscale(scale) sets the scale on which user-defined knots are specified.
knscale(time) denotes the original time scale, knscale(log) the log
time scale and knscale(centile) specifies that the knots are taken to
be centile positions in the distribution of the uncensored log
survival times. The default is knscale(time).
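The sketch below shows the same interior knots supplied on two different
scales. It assumes the brcancer data from the Examples section, that rectime
is recorded in days, and that 500, 1000 and 1500 days lie inside the range of
the uncensored survival times; the knot values themselves are arbitrary.

```stata
* Sketch: identical interior knots given on the time and log time scales.
* WARNING: the clear option replaces the data currently in memory.
webuse brcancer, clear
stset rectime, failure(censrec = 1)

* Knots at 500, 1000 and 1500 days on the original time scale
stpm2 hormon, scale(hazard) knots(500 1000 1500) knscale(time)

* The same knots on the log time scale; `=exp' is Stata's inline
* expression expansion, so each knot is passed as ln(days)
stpm2 hormon, scale(hazard) knots(`=ln(500)' `=ln(1000)' `=ln(1500)') knscale(log)
```

Because the spline is always built on log time, both calls would be expected
to describe the same model.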
noconstant; see [ST] estimation options.
noorthog suppresses orthogonal transformation of spline variables.
rcsbaseoff drops baseline spline variables from the model. With this
option you will generally want to specify your baseline separately in
two or more strata. For example, the following code will fit a
separate baseline hazard for males and females.
stpm2 males females, scale(hazard) tvc(males females) dftvc(3) nocons rcsbaseoff
Note that identical fitted values would be obtained if using the
following.
stpm2 females, df(3) scale(hazard) tvc(females) dftvc(3)
scale(scalename) specifies on which scale the survival model is to be
fitted.
scale(hazard) fits a model on the log cumulative hazard scale, i.e.
the scale of ln(-ln S(t)). If no time-dependent effects are
specified, the resulting model has proportional hazards.
scale(odds) fits a model on the log cumulative odds scale, i.e. ln((1
- S(t))/S(t)). If no time-dependent effects are specified then this
gives a proportional odds model.
scale(normal) fits a model on the normal equivalent deviate scale
(i.e. a probit link for the survival function, invnorm(1 - S(t))).
scale(theta) fits a model on a scale defined by the value of theta
for the Aranda-Ordaz family of link functions, i.e.
ln((S(t)^(-theta) - 1)/theta). Note that theta = 1 corresponds to a
proportional odds model and theta = 0 to a proportional cumulative
hazards model.
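The four link functions can be evaluated directly for a single survival
probability. This is only an arithmetic sketch using official Stata
functions; the value S = 0.8 and the macro names are arbitrary, and
invnormal() is the current name of the function written as invnorm() above.
Run it from a do-file so that the local macros stay defined.

```stata
* Sketch: the transformations behind each scale(), at S(t) = 0.8
local S = 0.8

display "scale(hazard): " ln(-ln(`S'))
display "scale(odds):   " ln((1 - `S')/`S')
display "scale(normal): " invnormal(1 - `S')

* Aranda-Ordaz link for an arbitrary theta
local theta = 0.5
display "scale(theta), theta = 0.5: " ln((`S'^(-`theta') - 1)/`theta')

* theta = 1 reproduces the log cumulative odds
assert reldif(ln((`S'^(-1) - 1)/1), ln((1 - `S')/`S')) < 1e-12

* as theta approaches 0 the link approaches the log cumulative hazard
assert abs(ln((`S'^(-1e-6) - 1)/1e-6) - ln(-ln(`S'))) < 1e-5
```

The last assertion uses a small positive theta because the formula is
undefined at theta = 0 exactly; the correspondence holds in the limit.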
stratify(varlist) is provided for compatibility with stpm. Members of
varlist are modelled with time-dependent effects. See the tvc() and
dftvc() options for stpm2's way of specifying time-dependent effects.
theta(est|#) is provided for compatibility with stpm. est requests that
theta be estimated, whereas # fixes theta to #. See constheta() and
inittheta() for stpm2's way of specifying theta.
tvc(varlist) gives the names of the variables with time-dependent effects.
Time-dependent effects are fitted using restricted cubic splines.
The degrees of freedom are specified using the dftvc() option.
+-----------+
----+ Reporting +--------------------------------------------------------
alleq reports all equations used by ml. The models are fitted by using
various constraints for parameters associated with the derivatives of
the spline functions. These parameters are generally not of interest
and thus are not shown by default. In addition, an extra equation is
used when fitting delayed entry models, and again this is not shown
by default.
eform reports the exponentiated coefficients. For models on the log
cumulative hazard scale, scale(hazard), this gives hazard ratios if
the covariate is not time-dependent. Similarly, for models on the log
cumulative odds scale, scale(odds), this option will give odds ratios
for effects that are not time-dependent.
keepcons prevents the constraints that stpm2 imposes on the derivatives
of the spline function, and when fitting delayed entry models, from
being dropped. By default, the constraints are dropped.
level(#) specifies the confidence level, as a percentage, for confidence
intervals. The default is level(95) or as set by set level.
showcons lists the constraints in the output. By default, the
constraints used by stpm2 for the derivatives of the spline function
and when fitting delayed entry models are not listed.
+-------------+
----+ Max options +------------------------------------------------------
constheta(#) constrains the value of theta, i.e. it is treated as a known
constant.
inittheta(#) gives an initial value for theta in the Aranda-Ordaz family
of link functions.
lininit obtains initial values by fitting only the first spline basis
function (i.e. a linear function of log survival time). This option
is seldom needed.
maximize_options: difficult, technique(algorithm_spec), iterate(#),
[no]log, trace, gradient, showstep, hessian, shownrtolerance,
tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#),
nonrtolerance, from(init_specs); see [R] maximize. These options are
seldom used, but the difficult option may be useful if there are
convergence problems when fitting models that use the Aranda-Ordaz
family of link functions.
Remarks
Let t denote time. stpm2 works by first calculating the survival function
after fitting a Cox proportional hazards model. The procedure is
illustrated for proportional hazards models, specified by option
scale(hazard). S(t) is converted to an estimate of the log cumulative
hazard function Z(t) by the formula
Z(t) = ln(-ln S(t))
This estimate of Z(t) is then smoothed on ln(t) using regression splines
with knots placed at certain quantiles of the distribution of t. The knot
positions are chosen automatically if the spline complexity is specified
by the df() option, or manually by way of the knots() option. (Note that
the knots are placed on values of ln(t), not t.) Denote the predicted
values of the log cumulative hazard function by Z_hat(t). The density
function f(t) is
f(t) = -dS(t)/dt = -(dS/dZ_hat)(dZ_hat/dt) = S(t) exp(Z_hat) dZ_hat(t)/dt
dZ_hat(t)/dt is computed from the regression coefficients of the fitted
spline function. The estimated survival function is calculated as
S_hat(t) = exp(-exp(Z_hat(t))).
The hazard function is calculated as f(t)/S_hat(t).
If varlist is specified, the baseline survival function (i.e. at zero
values of the covariates) is used instead of the survival function of the
raw observations. With df(1) a Weibull model is fitted.
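The formulas above can be traced numerically for the df(1) case, where Z_hat
is a linear function of ln(t). The sketch below is pure arithmetic and does
not call stpm2; the coefficient values g0 and g1 and the time point are
invented for the illustration. It checks that the resulting hazard equals
the Weibull hazard lambda*p*t^(p-1) with lambda = exp(g0) and p = g1. Run it
from a do-file so that the local macros stay defined.

```stata
* Sketch: from Z_hat(t) to survival, density and hazard when df(1) is used
* (hypothetical coefficients; Z_hat(t) = g0 + g1*ln(t))
local g0 = -9
local g1 = 1.3
local t  = 365

local Z    = `g0' + `g1'*ln(`t')      // log cumulative hazard at t
local S    = exp(-exp(`Z'))           // S_hat(t) = exp(-exp(Z_hat(t)))
local dZdt = `g1'/`t'                 // derivative of g1*ln(t) w.r.t. t
local f    = `S'*exp(`Z')*`dZdt'      // density f(t)
local h    = `f'/`S'                  // hazard f(t)/S_hat(t)

display "Z_hat = " `Z' "   S_hat = " `S' "   hazard = " `h'

* Same hazard written in Weibull form: lambda*p*t^(p-1)
assert reldif(`h', exp(`g0')*`g1'*`t'^(`g1' - 1)) < 1e-10
```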
With scale(normal), smoothing is of the Normal quantile function,
invnorm(1 - S(t)), instead of the log cumulative hazard function. With
df(1) a lognormal model is fitted.
With scale(odds), smoothing is of the log odds of failure function, ln((1
- S(t))/S(t)), instead of the log cumulative hazard function. With df(1)
a log-logistic model is fitted.
Estimation is performed by maximum likelihood. Optimisation uses the
default technique (nr, meaning Stata's version of Newton-Raphson
iteration).
Examples
---------------------------------------------------------------------------
Setup
webuse brcancer
stset rectime, failure(censrec = 1)
Proportional hazards model
stpm2 hormon, scale(hazard) df(4) eform
Proportional odds model
stpm2 hormon, scale(odds) df(4) eform
Time-dependent effects on cumulative hazard scale
stpm2 hormon, scale(hazard) df(4) tvc(hormon) dftvc(3)
User defined knots at centiles of uncensored event times
stpm2 hormon, scale(hazard) knots(20 50 80) knscale(centile)
Author
Paul Lambert, University of Leicester, UK
(paul.lambert@leicester.ac.uk)
The option to fit cure models was implemented by Therese Andersson,
Karolinska Institutet, Stockholm, Sweden (therese.m-l.andersson@ki.se)
Various other additions and suggestions by Patrick Royston, MRC Clinical
Trials Unit, London, UK. (pr@ctu.mrc.ac.uk)
References
P. C. Lambert and P. Royston. 2009. Further development of flexible
parametric models for survival analysis. Stata Journal, in press.
C. P. Nelson, P. C. Lambert, I. B. Squire and D. R. Jones. 2007.
Flexible parametric models for relative survival, with application in
coronary heart disease. Statistics in Medicine 26:5486-5498.
P. Royston. 2001. Flexible alternatives to the Cox model, and more. The
Stata Journal 1:1-28.
P. Royston and M. K. B. Parmar. 2002. Flexible proportional-hazards and
proportional-odds models for censored survival data, with application
to prognostic modelling and estimation of treatment effects.
Statistics in Medicine 21:2175-2197.
Also see
Online: [ST] stpm2 postestimation; [ST] stset, stpm