help for stsurvimpute                                           Patrick Royston
-------------------------------------------------------------------------------

Impute censored survival times

Syntax

stsurvimpute [varlist] [if] [in] [, options]

options                         Description
-------------------------------------------------------------------------
df(#)                           baseline degrees of freedom for stpm2 (Royston-Parmar models)
generate(newvar [, replace])    creates new variable containing observed and imputed survival times
llogistic                       imputes using log-logistic distribution
lnormal                         imputes using lognormal distribution
scale(scalename)                specifies the scale on which the Royston-Parmar model is to be fitted
seed(#)                         sets random number seed
truncate(#)                     truncate survival distribution at #
uniform(varname)                uses varname to supply uniformly distributed random numbers
weibull                         imputes using Weibull distribution
stpm2_options                   additional options for stpm2
-------------------------------------------------------------------------

Weights are not allowed. Note that stsurvimpute requires stpm2 to be installed - it can be downloaded from the SSC archive (see help on ssc).

Description

stsurvimpute singly imputes censored observation-times using a parametric survival model. Available basic parametric models are Weibull, log-logistic and lognormal; also supported are Royston-Parmar flexible parametric models, implemented by stpm2. Variables in varlist constitute a prognostic model that is used for predicting individual imputed survival times. Such a model should make the imputations more accurate than using only the overall distribution.

Options

df(#) specifies the degrees of freedom for Royston-Parmar models. When # > 1 Royston-Parmar models are used to impute censored survival times, and the baseline distribution function is approximated by a restricted cubic spline with # degrees of freedom. If # = 1 then a Weibull, log-logistic or lognormal distribution is assumed. Further options controlling Royston-Parmar models are available (see stpm2).

generate(newvar [, replace]) creates a new variable newvar containing observed and imputed survival times. replace allows newvar to be replaced with new data. replace may not be abbreviated.

llogistic imputes assuming a log-logistic distribution. llogistic is equivalent to df(1) scale(odds).

lnormal imputes assuming a lognormal distribution. lnormal is equivalent to df(1) scale(normal).

scale(scalename) specifies on which scale the survival model is to be fitted. scalename can be hazard, odds or normal.

seed(#) sets the random number seed before random uniform values are generated.

truncate(#) truncates the survival distribution at #. The default # is 1, meaning that the entire survival distribution is used for imputation. With, for example, truncate(0.8), the longest (and least probable) survival times, those for which the survival probability lies between 0 and 0.2, are not imputed; only the part of the survival distribution with survival probability between 0.2 and 1 is used. The result is shorter and perhaps more realistic extreme imputed survival times.
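To see the effect of truncate() in practice, the following sketch imputes twice with the same seed, once using the full distribution and once truncated at 0.8, and then compares the imputed times among censored observations. The dataset (webuse brcancer), the covariates hormon, x1 and x5, and the new variable names t_full and t_trunc are assumptions chosen for illustration; stpm2 and stsurvimpute are assumed to be installed.

```stata
* Illustrative data; any stset dataset would do
webuse brcancer, clear
stset rectime, failure(censrec = 1) scale(365.24)

* Same model and seed, with and without truncation
stsurvimpute hormon x1 x5, generate(t_full) lnormal seed(101)
stsurvimpute hormon x1 x5, generate(t_trunc) lnormal truncate(0.8) seed(101)

* Compare imputed times for the censored observations only (_d == 0)
summarize t_full t_trunc if _d == 0, detail
```

The upper percentiles reported by summarize are where any difference between the two sets of imputations would be expected to show.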

uniform(varname) uses varname to supply uniformly distributed random numbers. By default the numbers are created internally. If the uniform() option is used, the seed() option is ignored.
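One possible use of uniform() is to hold the random numbers fixed while changing the model, so that differences between two sets of imputed times reflect the model rather than the random draw. The sketch below assumes the brcancer data and the covariates hormon, x1 and x5; the names u, t_wei and t_ln are hypothetical.

```stata
webuse brcancer, clear
stset rectime, failure(censrec = 1) scale(365.24)

* Draw the uniform random numbers once, under a known seed
set seed 101
generate double u = runiform()

* Reuse the same draws under two different distributions
stsurvimpute hormon x1 x5, generate(t_wei) weibull uniform(u)
stsurvimpute hormon x1 x5, generate(t_ln) lnormal uniform(u)

* Imputed times for censored observations under each model
summarize t_wei t_ln if _d == 0
```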

weibull imputes assuming a Weibull distribution. weibull is equivalent to df(1) scale(hazard).

stpm2_options are options for stpm2.

Remarks

Royston, Parmar & Altman (2008) present an example of imputing right-censored survival times in kidney cancer. The aim is to be able to use familiar graphs such as scatter plots and dotplots to illustrate, in an informative manner, the relationship between the outcome and treatments, covariates, prognostic scores, etc. Note that because the imputation of censored observations is heavily model-dependent, one cannot validly perform linear regression or similar analyses of the observed and imputed times on covariates.
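As a minimal sketch of the kind of graph described above, the following commands impute censored times and plot observed and imputed times against a covariate, using different marker symbols for the two. The dataset, the covariate x1 and the variable name t_imputed are assumptions for illustration only, and the axis title assumes time was stset in years.

```stata
webuse brcancer, clear
stset rectime, failure(censrec = 1) scale(365.24)

* Impute censored times using x1 as the prognostic model
stsurvimpute x1, generate(t_imputed) lnormal seed(101)

* Solid circles: observed events; hollow circles: imputed (censored) times
twoway (scatter t_imputed x1 if _d == 1, msymbol(O))   ///
       (scatter t_imputed x1 if _d == 0, msymbol(Oh)), ///
       yscale(log) ytitle("Survival time (years)")     ///
       legend(order(1 "Observed" 2 "Imputed"))
```

Distinguishing imputed from observed points keeps the model-dependent part of the display visible to the reader.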

Royston et al. (2008) use the lognormal distribution in their analysis, and that is the default provided by stsurvimpute. However, stsurvimpute supports several other distributional models which may fit the data better than the lognormal, and so give more appropriate imputations. The various models can be compared using the Akaike Information Criterion, as discussed by Royston & Parmar (2002); see also Royston (2001) and help on stpm2.
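One way to carry out such a comparison is to fit the candidate models directly with stpm2, store the results, and tabulate the AIC with estimates stats. This is only a sketch: the dataset, the covariates and the stored-estimate names (lnorm, weib, rp3) are hypothetical, and the model passed to stsurvimpute at the end is a placeholder for whichever candidate has the smallest AIC.

```stata
webuse brcancer, clear
stset rectime, failure(censrec = 1) scale(365.24)

* Candidate models: lognormal, Weibull, and a 3-df Royston-Parmar model
stpm2 hormon x1 x5, scale(normal) df(1)
estimates store lnorm
stpm2 hormon x1 x5, scale(hazard) df(1)
estimates store weib
stpm2 hormon x1 x5, scale(hazard) df(3)
estimates store rp3

* Tabulate log likelihood, AIC and BIC for the stored models
estimates stats lnorm weib rp3

* Impute with the preferred specification (here assumed to be rp3)
stsurvimpute hormon x1 x5, scale(hazard) df(3) generate(t_imputed) seed(101)
```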

An important point to remember when inspecting imputed survival times is that their distribution is based on extrapolation of the modelled survival distribution into the future, on the assumption that all individuals will eventually experience the event of interest. In many cases the assumption is false - there is a 'cured fraction' who will never experience the event. The consequence is that in many instances unrealistic survival times will be imputed, particularly when a large proportion of the times are censored. The higher the censoring proportion, the less information is present on the right-hand tail of the survival distribution and the more 'wild' the imputed times are likely to be.

Examples

. stsurvimpute x1 x2 x3, gen(t_imputed) weibull

. stsurvimpute age sex stage, gen(t_imputed) llogistic

. stsurvimpute, scale(normal) df(3) gen(t_imputed) truncate(0.8) seed(101)

Author

Patrick Royston, MRC Clinical Trials Unit, London. pr@ctu.mrc.ac.uk

References

P. C. Lambert and P. Royston. 2009. Further development of flexible parametric models for survival analysis. Stata Journal, in press.

P. Royston. 2001. Flexible alternatives to the Cox model, and more. Stata Journal 1: 1-28.

P. Royston and M. K. B. Parmar. 2002. Flexible proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197.

P. Royston, M. K. B. Parmar and D. G. Altman. 2008. Visualizing length of survival in time-to-event studies: a complement to Kaplan–Meier plots. Journal of the National Cancer Institute 100: 1-6.

Also see