help stpm2                                  also see: stpm, stpm2 postestimation
-------------------------------------------------------------------------------

Title

stpm2 -- Flexible parametric survival models

Syntax

stpm2 [varlist] [if] [in] [, options]

options                   Description
-------------------------------------------------------------------------------
Model
  bhazard(varname)        invokes relative survival models, where varname holds
                            the expected mortality rate (hazard) at the time of
                            death
  bknots(numlist)         boundary knots for baseline
  bknotstvc(knotslist)    boundary knots for time-dependent effects
  cure                    fit a cure model
  df(#)                   degrees of freedom for baseline hazard function
  dftvc(df_list)          degrees of freedom for each time-dependent effect
  failconvlininit         automatically try lininit option if convergence fails
  knots(numlist)          knot locations for baseline hazard
  knotstvc(numlist)       knot locations for time-dependent effects
  knscale(scale)          scale for user-defined knots (default scale is time)
  noconstant              suppress constant term
  rcsbaseoff              do not include baseline spline variables
  noorthog                do not use orthogonal transformation of spline
                            variables
  scale(scalename)        specifies the scale on which the survival model is to
                            be fitted
  stratify(varlist)       for backward compatibility with stpm
  theta(est|#)            for backward compatibility with stpm
  tvc(varlist)            varlist of time-varying effects

Reporting
  alleq                   report all equations
  eform                   exponentiate coefficients
  keepcons                do not drop constraints used in ml routine
  level(#)                set confidence level; default is level(95)
  showcons                list constraints in output

Max options
  constheta(#)            constrain value of theta when using Aranda-Ordaz
                            family of link functions
  inittheta(#)            initial value of theta (default 1: log cumulative
                            odds scale)
  lininit                 obtain initial values by first fitting a linear
                            function of ln(time)
  maximize_options        control the maximization process; seldom used
-------------------------------------------------------------------------------
You must stset your data before using stpm2; see [ST] stset.
fweights, iweights, and pweights may be specified using stset; see [ST] stset.

Description

stpm2 fits flexible parametric survival models (Royston-Parmar models). stpm2 can be used with single- or multiple-record or single- or multiple-failure st data. Survival models can be fitted on the log cumulative hazard scale, the log cumulative odds scale, the standard normal deviate (probit) scale, or on a scale defined by the value of theta using the Aranda-Ordaz family of link functions.

stpm2 can fit the same models as stpm, but it is more flexible because it does not force the knots for time-dependent effects to be the same as those used for the baseline distribution function. In addition, stpm2 can fit relative survival models by using the bhazard() option. The postestimation commands have been extended beyond those available for stpm, and stpm2 is noticeably faster than stpm.

See [ST] streg for other (standard) parametric survival models.

Options

        +-------+
    ----+ Model +------------------------------------------------------------

bhazard(varname) is used when fitting relative survival models. varname gives the expected mortality rate at the time of death/censoring. stpm2 gives an error message when there are missing values of varname, since this usually indicates that an error has occurred when merging the expected mortality rates.
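Relative survival models of this kind are usually written with an additive hazard, h(t) = h*(t) + lambda(t). Here h*(t) is the expected mortality rate supplied through bhazard(), and lambda(t) is the excess hazard modelled by the spline (see Nelson et al. 2007). The minimal sketch below assumes that additive formulation. The function and argument names are hypothetical, and this is not stpm2's code.

The sketch shows why only the expected rate at each individual's exit time matters. It enters the log-likelihood only through the event term. The expected survival component does not depend on the model parameters and can therefore be dropped.

```python
import math


def relative_survival_loglik(died, expected_rate, excess_hazard, excess_cumhaz):
    """Log-likelihood contribution of one individual (hypothetical helper).

    Assumes the additive model h(t) = h*(t) + lambda(t), evaluated at the
    individual's exit time t (delayed entry is ignored here):
      died:          True if the exit was a death, False if censored
      expected_rate: h*(t), the value a bhazard() variable would hold
      excess_hazard: lambda(t), the modelled excess hazard at t
      excess_cumhaz: Lambda(t), the excess cumulative hazard up to t
    The expected survival term ln S*(t) is omitted because it does not
    depend on the model parameters.
    """
    if expected_rate is None or math.isnan(expected_rate):
        # stpm2 stops with an error on missing expected rates; do the same here
        raise ValueError("missing expected mortality rate")
    loglik = -excess_cumhaz
    if died:
        # the all-cause hazard at the event time is expected + excess
        loglik += math.log(expected_rate + excess_hazard)
    return loglik


print(relative_survival_loglik(True, 0.02, 0.05, 0.3))
```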

bknots(knotslist) knotslist is a two-element numlist giving the boundary knots. By default these are located at the minimum and maximum of the uncensored survival times. They are specified on the scale defined by knscale().

bknotstvc(knotslist) knotslist gives the boundary knots for any time-dependent effects. By default these are the same as for the bknots() option. They are specified on the scale defined by knscale(). For example,

bknotstvc(x1 0.01 10 x2 0.01 8)

cure is used when fitting cure models. It forces the cumulative hazard to be constant after the last knot. When the df() option is used together with the cure option, the internal knots are placed evenly according to centiles of the distribution of the uncensored log survival times, except for one knot that is placed at the 95th centile. Cure models can only be used when modelling on the log cumulative hazard scale (scale(hazard)).

df(#) specifies the degrees of freedom for the restricted cubic spline function used for the baseline function. # must be between 1 and 10, but usually a value between 1 and 4 is sufficient, with 3 being the default. The knots() option is not applicable if the df() option is specified. The knots are placed at the following centiles of the distribution of the uncensored log survival times:

        ------------------------------------------------------------
         df   knots   Centile positions
        ------------------------------------------------------------
          1     0     (no knots)
          2     1     50
          3     2     33 67
          4     3     25 50 75
          5     4     20 40 60 80
          6     5     17 33 50 67 83
          7     6     14 29 43 57 71 86
          8     7     12.5 25 37.5 50 62.5 75 87.5
          9     8     11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9
         10     9     10 20 30 40 50 60 70 80 90
        ------------------------------------------------------------

Note that these are interior knots. There are also boundary knots placed at the minimum and maximum of the distribution of uncensored survival times.

When the cure option is used, df must be between 3 and 11, and the default knot locations are as follows:

        ------------------------------------------------------------
         df   knots   Centile positions
        ------------------------------------------------------------
          3     2     50 95
          4     3     33 67 95
          5     4     25 50 75 95
          6     5     20 40 60 80 95
          7     6     17 33 50 67 83 95
          8     7     14 29 43 57 71 86 95
          9     8     12.5 25 37.5 50 62.5 75 87.5 95
         10     9     11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 95
         11    10     10 20 30 40 50 60 70 80 90 95
        ------------------------------------------------------------
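The centile positions in both df() tables follow a simple pattern:

- Without cure, the df - 1 interior knots sit at equally spaced centiles.
- With cure, the df - 2 interior knots sit at equally spaced centiles, plus one knot at the 95th centile.

The sketch below reproduces the tables (which round some values). The rule is inferred from the tables rather than taken from stpm2's source, and the function name is hypothetical.

```python
def default_knot_centiles(df, cure=False):
    """Interior-knot centile positions for a given df (hypothetical helper).

    Without cure: df - 1 knots at the equally spaced centiles 100*k/df.
    With cure: df - 2 equally spaced knots at 100*k/(df - 1), plus one at 95.
    """
    if cure:
        if not 3 <= df <= 11:
            raise ValueError("with cure, df must be between 3 and 11")
        return [100 * k / (df - 1) for k in range(1, df - 1)] + [95.0]
    if not 1 <= df <= 10:
        raise ValueError("df must be between 1 and 10")
    return [100 * k / df for k in range(1, df)]


for df in (1, 3, 6):
    print(df, [round(c, 1) for c in default_knot_centiles(df)])
print("cure, df=4:", [round(c, 1) for c in default_knot_centiles(4, cure=True)])
```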

dftvc(df_list) gives the degrees of freedom for time-dependent effects in df_list. The potential degrees of freedom are listed under the df() option. With 1 degree of freedom, a linear effect of log time is fitted. If there is more than one time-dependent effect and different degrees of freedom are requested for each time-dependent effect, then the following syntax applies:

dftvc(x1:3 x2:2 1)

This will use 3 degrees of freedom for x1, 2 degrees of freedom for x2 and 1 degree of freedom for all remaining time-dependent effects.
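The dftvc() rule described above can be expressed as a small lookup: "var:#" sets the degrees of freedom for one variable, and a bare "#" is the default for the rest. The sketch below is a hypothetical helper, not stpm2's parser. Raising an error when a variable has neither an explicit value nor a default is an assumption.

```python
def parse_dftvc(spec, tvc_vars):
    """Map each time-dependent variable to its degrees of freedom.

    spec follows the dftvc() syntax, e.g. "x1:3 x2:2 1"; tvc_vars lists the
    variables given in tvc(). (Hypothetical helper for illustration.)
    """
    per_var, default = {}, None
    for token in spec.split():
        if ":" in token:
            name, df = token.split(":", 1)
            per_var[name] = int(df)
        else:
            default = int(token)          # applies to all remaining variables
    result = {}
    for var in tvc_vars:
        if var in per_var:
            result[var] = per_var[var]
        elif default is not None:
            result[var] = default
        else:
            raise ValueError(f"no degrees of freedom given for {var}")
    return result


print(parse_dftvc("x1:3 x2:2 1", ["x1", "x2", "x3"]))
```

With a single number, as in dftvc(3) together with tvc(males females), every time-dependent effect receives the same degrees of freedom.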

failconvlininit automatically tries the lininit option if the model fails to converge.

knots(# [# ...]) specifies knot locations for the baseline distribution function, as opposed to the default locations set by df(). Note that the knot locations are specified on the scale defined by knscale(). However, the scale used by the restricted cubic spline function is always log time. Default knot positions are determined by the df() option.

knotstvc(knotslist) defines numlist knotslist as the location of the interior knots for time-dependent effects. If different knots are required for different time-dependent effects, the option is specified, for example, as follows:

knotstvc(x1 1 2 3 x2 1.5 3.5)
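The per-variable knot syntax shared by knotstvc() and bknotstvc() can be read as follows: a variable name starts a new group, and the numbers after it are that variable's knots. The sketch below is a hypothetical helper, not stpm2's parser. It assumes that numbers given before any variable name (a plain numlist) apply to every time-dependent effect and stores them under the made-up key "_all".

```python
import re

# A token that looks like a number is a knot location; anything else is
# treated as a variable name.
NUMBER = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?")


def parse_knotstvc(spec):
    """Split a knotstvc()/bknotstvc()-style list into per-variable knots.

    Numbers that appear before any variable name are stored under the
    hypothetical key "_all", meaning they apply to every time-dependent effect.
    """
    knots = {}
    current = "_all"
    for token in spec.split():
        if NUMBER.fullmatch(token):
            knots.setdefault(current, []).append(float(token))
        else:
            current = token
            knots[current] = []
    return knots


print(parse_knotstvc("x1 1 2 3 x2 1.5 3.5"))
print(parse_knotstvc("x1 0.01 10 x2 0.01 8"))
```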

knscale(scale) sets the scale on which user-defined knots are specified. knscale(time) denotes the original time scale, knscale(log) the log time scale, and knscale(centile) specifies that the knots are taken to be centile positions in the distribution of the uncensored log survival times. The default is knscale(time).
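Whatever knscale() is used, the spline itself always works on log time, so user-supplied knots must end up on that scale. The minimal sketch below converts knots from each knscale() setting to log time. It assumes linear-interpolation percentiles, which may not match the centile definition Stata uses, so values can differ slightly. The function names are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics.

    This is one common definition; Stata's centile routines may use a
    different one.
    """
    n = len(sorted_values)
    position = (n - 1) * p / 100
    lower = math.floor(position)
    upper = min(lower + 1, n - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def knots_to_log_time(knots, knscale, uncensored_times):
    """Convert user-supplied knots to the log-time scale used by the splines."""
    if knscale == "time":
        return [math.log(k) for k in knots]
    if knscale == "log":
        return list(knots)
    if knscale == "centile":
        log_times = sorted(math.log(t) for t in uncensored_times)
        return [percentile(log_times, p) for p in knots]
    raise ValueError(f"unknown knscale: {knscale}")


times = [1, 2, 4, 8, 16]  # illustrative uncensored survival times
print(knots_to_log_time([20, 50, 80], "centile", times))
```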

noconstant; see [ST] estimation options.

noorthog suppresses orthogonal transformation of spline variables.

rcsbaseoff drops the baseline spline variables from the model. With this option you will generally want to specify your baseline separately in two or more strata. For example, the following code will fit a separate baseline hazard for males and females:

stpm2 males females, scale(hazard) tvc(males females) dftvc(3) nocons rcsbaseoff

Note that identical fitted values would be obtained using the following:

stpm2 females, df(3) scale(hazard) tvc(females) dftvc(3)

scale(scalename) specifies the scale on which the survival model is to be fitted.

scale(hazard) fits a model on the log cumulative hazard scale, i.e. the scale of ln(-ln S(t)). If no time-dependent effects are specified, the resulting model has proportional hazards.

scale(odds) fits a model on the log cumulative odds scale, i.e. ln((1 - S(t))/S(t)). If no time-dependent effects are specified, this gives a proportional odds model.

scale(normal) fits a model on the normal equivalent deviate scale (i.e. a probit link for the survival function, invnorm(1 - S(t))).

scale(theta) fits a model on a scale defined by the value of theta for the Aranda-Ordaz family of link functions, i.e. ln((S(t)^(-theta) - 1)/theta). Note that theta = 1 corresponds to a proportional odds model, and theta = 0 (understood as the limit as theta tends to 0) corresponds to a proportional cumulative hazards model.
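The four scale() options differ only in the link function applied to S(t). The minimal sketch below writes each link and its inverse directly from the formulas above; the function names are hypothetical. The checks afterwards illustrate the theta = 1 and theta -> 0 special cases numerically.

```python
import math
from statistics import NormalDist


def link(scale, s, theta=None):
    """Transform a survival probability 0 < s < 1 onto the chosen scale."""
    if scale == "hazard":
        return math.log(-math.log(s))             # ln(-ln S)
    if scale == "odds":
        return math.log((1 - s) / s)              # ln((1 - S)/S)
    if scale == "normal":
        return NormalDist().inv_cdf(1 - s)        # invnorm(1 - S)
    if scale == "theta":
        return math.log((s ** (-theta) - 1) / theta)  # Aranda-Ordaz
    raise ValueError(f"unknown scale: {scale}")


def inverse_link(scale, z, theta=None):
    """Recover S from a value z on the chosen scale."""
    if scale == "hazard":
        return math.exp(-math.exp(z))
    if scale == "odds":
        return 1 / (1 + math.exp(z))
    if scale == "normal":
        return 1 - NormalDist().cdf(z)
    if scale == "theta":
        return (1 + theta * math.exp(z)) ** (-1 / theta)
    raise ValueError(f"unknown scale: {scale}")


s = 0.3
print(link("theta", s, theta=1.0), link("odds", s))      # theta = 1: odds scale
print(link("theta", s, theta=1e-8), link("hazard", s))   # theta -> 0: hazard scale
```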

stratify(varlist) is provided for compatibility with stpm. Members of varlist are modelled with time-dependent effects. See the tvc() and dftvc() options for stpm2's way of specifying time-dependent effects.

theta(est|#) is provided for compatibility with stpm. est requests that theta be estimated, whereas # fixes theta to #. See constheta() and inittheta() for stpm2's way of specifying theta.

tvc(varlist) gives the names of the variables that have time-dependent effects. Time-dependent effects are fitted using restricted cubic splines. The degrees of freedom are specified using the dftvc() option.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

alleq reports all equations used by ml. The models are fitted by using various constraints for parameters associated with the derivatives of the spline functions. These parameters are generally not of interest and thus are not shown by default. In addition, an extra equation is used when fitting delayed entry models, and again this is not shown by default.

eform reports the exponentiated coefficients. For models on the log cumulative hazard scale, scale(hazard), this gives hazard ratios for covariates that do not have time-dependent effects. Similarly, for models on the log cumulative odds scale, scale(odds), this option gives odds ratios for covariates that do not have time-dependent effects.

keepcons prevents stpm2 from dropping the constraints it imposes on the derivatives of the spline function when fitting delayed entry models. By default, the constraints are dropped.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

showcons lists, in the output, the constraints that stpm2 uses for the derivatives of the spline function and when fitting delayed entry models. By default, these constraints are not listed.

        +-------------+
    ----+ Max options +------------------------------------------------------

constheta(#) constrains the value of theta, i.e. it is treated as a known constant.

inittheta(#) gives an initial value for theta in the Aranda-Ordaz family of link functions.

lininit obtains initial values by fitting only the first spline basis function (i.e. a linear function of log survival time). This option is seldom needed.

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, shownrtolerance, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#), nonrtolerance, from(init_specs); see [R] maximize. These options are seldom used, but the difficult option may be useful if there are convergence problems when fitting models that use the Aranda-Ordaz family of link functions.

Remarks

Let t denote time.

stpm2 works by first calculating the survival function after fitting a Cox proportional hazards model. The procedure is illustrated for proportional hazards models, specified by option scale(hazard). S(t) is converted to an estimate of the log cumulative hazard function Z(t) by the formula

Z(t) = ln(-ln S(t))

This estimate of Z(t) is then smoothed on ln(t) using regression splines with knots placed at certain quantiles of the distribution of t. The knot positions are chosen automatically if the spline complexity is specified by the df() option, or manually by way of the knots() option. (Note that the knots are placed on values of ln(t), not t.) Denote the predicted values of the log cumulative hazard function by Z_hat(t). The density function f(t) is

f(t) = -dS(t)/dt = -(dS/dZ_hat)(dZ_hat/dt) = S(t) exp(Z_hat(t)) dZ_hat(t)/dt

dZ_hat(t)/dt is computed from the regression coefficients of the fitted spline function. The estimated survival function is calculated as

S_hat(t) = exp(-exp Z_hat(t)).

The hazard function is calculated as f(t)/S_hat(t).
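The chain from spline to hazard can be traced numerically. A fitted spline gives Z_hat(t) on ln t; then S_hat(t) = exp(-exp Z_hat(t)), and the hazard is f(t)/S_hat(t) = exp(Z_hat(t)) dZ_hat(t)/dt.

The sketch below uses the restricted cubic spline basis of Royston and Parmar (2002) on x = ln t, together with its analytic derivative. It only evaluates a curve for given coefficients: stpm2 estimates them by maximum likelihood and, by default, orthogonalises the basis (see noorthog). The function names, knots and coefficients are illustrative assumptions.

```python
import math


def _pos(u, power):
    """Truncated power function (u)_+^power."""
    return max(u, 0.0) ** power


def rcs_basis(x, knots):
    """Restricted cubic spline basis at x = ln(t), and its derivative in x.

    knots: sorted knots on the log-time scale, boundary knots first and last.
    Returns (basis, dbasis), each with len(knots) - 1 entries.
    """
    kmin, kmax = knots[0], knots[-1]
    basis, dbasis = [x], [1.0]
    for kj in knots[1:-1]:                      # one extra term per interior knot
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(_pos(x - kj, 3) - lam * _pos(x - kmin, 3)
                     - (1 - lam) * _pos(x - kmax, 3))
        dbasis.append(3 * (_pos(x - kj, 2) - lam * _pos(x - kmin, 2)
                           - (1 - lam) * _pos(x - kmax, 2)))
    return basis, dbasis


def survival_and_hazard(t, gammas, knots, xb=0.0):
    """Return (Z_hat(t), S_hat(t), h(t)) on the log cumulative hazard scale.

    gammas[0] is the constant and gammas[1:] multiply the spline basis;
    xb is the linear predictor of covariates without time-dependent effects.
    """
    x = math.log(t)
    basis, dbasis = rcs_basis(x, knots)
    z = gammas[0] + sum(g * b for g, b in zip(gammas[1:], basis)) + xb
    # chain rule: dZ/dt = (dZ/d ln t) * (1/t)
    dz_dt = sum(g * d for g, d in zip(gammas[1:], dbasis)) / t
    s = math.exp(-math.exp(z))                  # S_hat(t) = exp(-exp Z_hat(t))
    f = s * math.exp(z) * dz_dt                 # f(t) = S(t) exp(Z_hat) dZ_hat/dt
    return z, s, f / s                          # h(t) = f(t) / S_hat(t)


# Illustrative values only: knots at ln(0.5), ln(2), ln(8) and made-up gammas
knots = [math.log(0.5), math.log(2), math.log(8)]
print(survival_and_hazard(3.0, [-1.0, 1.2, 0.05], knots))
```

With only the two boundary knots, the basis reduces to ln t, and the hazard is that of a Weibull model.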

If varlist is specified, the baseline survival function (i.e. at zero values of the covariates) is used instead of the survival function of the raw observations. With df(1), a Weibull model is fitted.

With scale(normal), smoothing is of the normal quantile function, invnorm(1 - S(t)), instead of the log cumulative hazard function. With df(1), a lognormal model is fitted.

With scale(odds), smoothing is of the log odds of failure function, ln((1 - S(t))/S(t)), instead of the log cumulative hazard function. With df(1), a log-logistic model is fitted.

Estimation is performed by maximum likelihood. Optimisation uses the default technique (nr, meaning Stata's version of Newton-Raphson iteration).
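With df(1), the spline reduces to a straight line in ln t, so the transformed survival is z = g0 + g1 ln t. Inverting each link then gives the Weibull, log-logistic and lognormal survival functions named above, and the sketch below checks this numerically. The coefficient names g0 and g1 (with g1 > 0) and all function names are illustrative; they are not stpm2's parameter labels.

```python
import math
from statistics import NormalDist


def df1_survival(scale, t, g0, g1):
    """S(t) when the transformed survival is linear in ln t, as with df(1)."""
    z = g0 + g1 * math.log(t)
    if scale == "hazard":
        return math.exp(-math.exp(z))
    if scale == "odds":
        return 1 / (1 + math.exp(z))
    if scale == "normal":
        return 1 - NormalDist().cdf(z)
    raise ValueError(f"unknown scale: {scale}")


# Closed forms of the named distributions, written in terms of g0 and g1
def weibull_s(t, g0, g1):
    return math.exp(-math.exp(g0) * t ** g1)        # rate exp(g0), shape g1


def loglogistic_s(t, g0, g1):
    return 1 / (1 + math.exp(g0) * t ** g1)


def lognormal_s(t, g0, g1):
    mu, sigma = -g0 / g1, 1 / g1                    # ln T ~ Normal(mu, sigma^2)
    return 1 - NormalDist(mu, sigma).cdf(math.log(t))


for t in (0.5, 1.0, 3.0):
    print(t, df1_survival("hazard", t, -0.5, 1.3), weibull_s(t, -0.5, 1.3))
```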

Examples
---------------------------------------------------------------------------
Setup
        webuse brcancer
        stset rectime, failure(censrec = 1)

Proportional hazards model
        stpm2 hormon, scale(hazard) df(4) eform

Proportional odds model
        stpm2 hormon, scale(odds) df(4) eform

Time-dependent effects on cumulative hazard scale
        stpm2 hormon, scale(hazard) df(4) tvc(hormon) dftvc(3)

User-defined knots at centiles of uncensored event times
        stpm2 hormon, scale(hazard) knots(20 50 80) knscale(centile)

Author

Paul Lambert, University of Leicester, UK (paul.lambert@leicester.ac.uk).

The option to fit cure models was implemented by Therese Andersson, Karolinska Institutet, Stockholm, Sweden (therese.m-l.andersson@ki.se).

Various other additions and suggestions by Patrick Royston, MRC Clinical Trials Unit, London, UK (pr@ctu.mrc.ac.uk).

References

P. C. Lambert and P. Royston. 2009. Further development of flexible parametric models for survival analysis. Stata Journal, in press.

C. P. Nelson, P. C. Lambert, I. B. Squire and D. R. Jones. 2007. Flexible parametric models for relative survival, with application in coronary heart disease. Statistics in Medicine 26: 5486-5498.

P. Royston. 2001. Flexible alternatives to the Cox model, and more. Stata Journal 1: 1-28.

P. Royston and M. K. B. Parmar. 2002. Flexible proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197.

Also see

Online: [ST] stpm2 postestimation; [ST] stset, stpm