help for stsurvimpute                                           Patrick Royston
-------------------------------------------------------------------------------

Impute censored survival times

Syntax

stsurvimpute [varlist] [if] [in] [, options]

options                        Description
-------------------------------------------------------------------------
df(#)                          baseline degrees of freedom for stpm2
                                 (Royston-Parmar models)
generate(newvar[, replace])    creates new variable containing observed
                                 and imputed survival times
llogistic                      imputes using log-logistic distribution
lnormal                        imputes using lognormal distribution
scale(scalename)               specifies the scale on which the
                                 Royston-Parmar model is to be fitted
seed(#)                        sets random number seed
truncate(#)                    truncates survival distribution at #
uniform(varname)               uses varname to supply uniformly
                                 distributed random numbers
weibull                        imputes using Weibull distribution
stpm2_options                  additional options for stpm2
-------------------------------------------------------------------------
Weights are not allowed.

Note that stsurvimpute requires stpm2 to be installed; it can be downloaded from the SSC archive (see help ssc).
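A minimal sketch of installing the stpm2 dependency, assuming an internet connection. stpm2 may itself rely on further SSC packages (for example rcsgen); if stpm2 reports a missing command, install that package from SSC in the same way.

```stata
* Install (or update) stpm2 from the SSC archive
ssc install stpm2, replace

* Check that Stata can now find the command
which stpm2
```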

Description

stsurvimpute singly imputes censored observation times using a parametric survival model. Available basic parametric models are Weibull, log-logistic and lognormal; also supported are Royston-Parmar flexible parametric models, implemented by stpm2. Variables in varlist constitute a prognostic model that is used to predict individual imputed survival times. Such a model should make the imputations more accurate than using only the overall survival distribution.

Options

df(#) specifies the degrees of freedom for Royston-Parmar models. When # > 1, Royston-Parmar models are used to impute censored survival times, and the baseline distribution function is approximated by a restricted cubic spline with # degrees of freedom. If # = 1, a Weibull, log-logistic or lognormal distribution is assumed. Further options controlling Royston-Parmar models are available (see help stpm2).
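The sketch below contrasts df(1) with df(3) on the hazard scale. It uses Stata's shipped cancer.dta purely for illustration and assumes, as with other st commands, that the data must first be stset; the variable names t_df1 and t_df3 are hypothetical.

```stata
* Illustrative survival data shipped with Stata
sysuse cancer, clear
stset studytime, failure(died)

* df(1) on the hazard scale: Weibull baseline (equivalent to the weibull option)
stsurvimpute age, df(1) scale(hazard) gen(t_df1) seed(101)

* df(3): Royston-Parmar model, baseline approximated by a spline with 3 df
stsurvimpute age, df(3) scale(hazard) gen(t_df3) seed(101)

* Compare the two sets of imputed values among the censored observations
summarize t_df1 t_df3 if _d == 0
```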

generate(newvar[, replace])creates a new variablenewvarcontaining observed and imputed survival times.replaceallowsnewvarto be replaced with new data.replacemay not be abbreviated.

llogistic imputes assuming a log-logistic distribution. llogistic is equivalent to df(1) scale(odds).

lnormal imputes assuming a lognormal distribution. lnormal is equivalent to df(1) scale(normal).

scale(scalename) specifies the scale on which the survival model is to be fitted. scalename can be hazard, odds or normal.

seed(#) sets the random number seed before random uniform values are generated.
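Because seed() is set before the uniform values are generated, two runs with the same seed and model should reproduce the same imputations. The sketch checks this on the illustrative cancer.dta; t_run1 and t_run2 are hypothetical names.

```stata
sysuse cancer, clear
stset studytime, failure(died)

* Two runs with the same model and seed
stsurvimpute age, lnormal gen(t_run1) seed(101)
stsurvimpute age, lnormal gen(t_run2) seed(101)

* Tabulate any differences between the two runs
compare t_run1 t_run2
```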

truncate(#) truncates the survival distribution at #. The default is # = 1, meaning that the entire survival distribution is used for imputation. With, for example, truncate(0.8), the longest (but least probable) survival times, for which the survival probability is in the range 0 to 0.2, are not imputed; only the range 0.2 to 1 of the survival distribution is used for imputation. The result is shorter, and perhaps more realistic, extreme imputed survival times.
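As a worked example, consider a hypothetical Weibull survival function S(t) = exp(-lam*t^shape) with lam = 0.1 and shape = 1.5 (values chosen only for illustration). Reading the option description literally, and ignoring any conditioning on an individual's censoring time, truncate(0.8) keeps only S(t) >= 0.2, so the largest time that can be imputed solves exp(-lam*t^shape) = 0.2.

```stata
* Hypothetical Weibull parameters, for illustration only
scalar lam   = 0.1
scalar shape = 1.5
scalar tprop = 0.8      // the value given to truncate()

* Largest imputable time: solve S(t) = 1 - tprop for t
scalar tmax = (-ln(1 - tprop)/lam)^(1/shape)
display "largest imputable time = " %6.3f tmax
```

Without truncation (# = 1), there is no such upper bound, which is why the extreme imputed times can become very long.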

uniform(varname) uses varname to supply uniformly distributed random numbers. By default the numbers are created internally. If the uniform() option is used, the seed() option is ignored.
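A sketch of supplying your own uniform random numbers, for example to reuse the same draws across several models. The data set and the names u and t_u are illustrative assumptions; since seed() is ignored here, the seed is set with set seed before generating u.

```stata
sysuse cancer, clear
stset studytime, failure(died)

* Generate the uniforms yourself; seed() would be ignored with uniform()
set seed 101
generate double u = runiform()

stsurvimpute age, weibull gen(t_u) uniform(u)
```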

weibull imputes assuming a Weibull distribution. weibull is equivalent to df(1) scale(hazard).
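The documented equivalences can be checked empirically. This sketch assumes that, given the same seed, weibull and df(1) scale(hazard) fit the same model and therefore yield the same imputations up to rounding. The data set and variable names are illustrative.

```stata
sysuse cancer, clear
stset studytime, failure(died)

* Shorthand option versus its documented df()/scale() equivalent
stsurvimpute age, weibull gen(t_w) seed(101)
stsurvimpute age, df(1) scale(hazard) gen(t_h) seed(101)

* Relative difference between the two sets of imputations
generate double rd = reldif(t_w, t_h)
summarize rd
```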

stpm2_optionsare options for stpm2.

Remarks

Royston, Parmar & Altman (2008) present an example of imputing right-censored survival times in kidney cancer. The aim is to use familiar graphs, such as scatter plots and dotplots, to illustrate the relationship between the outcome and treatments, covariates, prognostic scores, etc. Note that because the imputation of censored observations is heavily model-dependent, one cannot validly perform linear regression or similar analyses of the complete and imputed times on covariates.
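A sketch of the kind of graphs meant here, using the illustrative cancer.dta and assuming stset data are required. _d is the failure indicator created by stset. It separates observed event times from imputed ones so the reader can see which points depend on the model. The /// continuation requires running the code from a do-file.

```stata
sysuse cancer, clear
stset studytime, failure(died)

* Default distribution (lognormal, per the Remarks)
stsurvimpute age, gen(t_imp) seed(101)

* Dotplot of observed and imputed times by treatment group
dotplot t_imp, over(drug)

* Scatter against age, distinguishing observed events from imputed times
twoway (scatter t_imp age if _d == 1) (scatter t_imp age if _d == 0), ///
    legend(order(1 "Observed event time" 2 "Imputed (censored)"))
```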

Royston et al. (2008) use the lognormal distribution in their analysis, and that is the default provided by stsurvimpute. However, stsurvimpute provides several other distributional models that may fit the data better than the lognormal and so give more appropriate imputations. Choosing between the various models can be done using the Akaike Information Criterion (AIC), as discussed by Royston & Parmar (2002); see also Royston (2001) and help stpm2.

An important point to remember when inspecting imputed survival times is that their distribution is based on extrapolation of the modelled survival distribution into the future, on the assumption that all individuals will eventually experience the event of interest. In many cases the assumption is false: there is a 'cured fraction' who will never experience the event. The consequence is that unrealistic survival times will often be imputed, particularly when a large proportion of the times are censored. The higher the censoring proportion, the less information there is on the right-hand tail of the survival distribution, and the more 'wild' the imputed times are likely to be.
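A sketch of both checks on the illustrative cancer.dta. It first reports the censoring proportion, which indicates how much the imputations rely on extrapolation. It then fits candidate models directly with stpm2 and compares their AIC with estimates stats. The df(1) fits correspond to the weibull, llogistic and lnormal options; the stored-estimate names are hypothetical.

```stata
sysuse cancer, clear
stset studytime, failure(died)

* Proportion of censored observations
quietly summarize _d
display "proportion censored = " %5.3f (1 - r(mean))

* Candidate models fitted with stpm2
quietly stpm2 age, df(1) scale(hazard)
estimates store weib
quietly stpm2 age, df(1) scale(odds)
estimates store llog
quietly stpm2 age, df(1) scale(normal)
estimates store lnorm
quietly stpm2 age, df(3) scale(hazard)
estimates store rp3

* Lower AIC suggests a better balance between fit and complexity
estimates stats weib llog lnorm rp3
```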

Examples

. stsurvimpute x1 x2 x3, gen(t_imputed) weibull

. stsurvimpute age sex stage, gen(t_imputed) llogistic

. stsurvimpute, scale(normal) df(3) gen(t_imputed) truncate(0.8) seed(101)

Author

Patrick Royston, MRC Clinical Trials Unit, London.
pr@ctu.mrc.ac.uk

References

P. C. Lambert and P. Royston. 2009. Further development of flexible parametric models for survival analysis. Stata Journal, in press.

P. Royston. 2001. Flexible alternatives to the Cox model, and more. Stata Journal 1: 1-28.

P. Royston and M. K. B. Parmar. 2002. Flexible proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197.

P. Royston, M. K. B. Parmar and D. G. Altman. 2008. Visualizing length of survival in time-to-event studies: a complement to Kaplan–Meier plots. Journal of the National Cancer Institute 100: 1-6.

Also see