{smcl}
{hline}
help for {cmd:predsurv}{right:(Roger Newson)}
{hline}
{title:Compute predicted or baseline survival after {helpb streg} or {helpb stcox}}
{p 8 21 2}
{cmd:predsurv} {ifin} ,
{opt t:ime(#)} {opth g:enerate(newvarname)} [
{opt ty:pe(type)} {opt cum:inc} {opt fast}
]
{p 8 21 2}
{cmd:predbasesurv} {ifin} ,
{opt t:ime(#)} [ {opth g:enerate(newvarname)} {opth gs:urv(newvarname)}
{opt ty:pe(type)}
]
{pstd}
where {it:type} specifies a {help datatypes:numeric storage type}.
{title:Description}
{pstd}
{cmd:predsurv} and {cmd:predbasesurv} are intended for use in a survival time dataset set up by {helpb stset}.
{cmd:predsurv} is used after {helpb streg} has been used to fit a survival time regression model.
It computes a survival probability or cumulative incidence for a user-specified survival time.
{cmd:predbasesurv} is used after {helpb stcox} has been used to fit a Cox regression model.
It computes a survival probability and/or a baseline survival probability for a user-specified survival time.
User-specified survival times are expressed in the units specified by the {cmd:scale()} option of {helpb stset}.
Note that {cmd:predsurv} and {cmd:predbasesurv} can do out-of-sample prediction,
if {helpb streg} or {helpb stcox} has been used to fit a model on a subset of the data,
but {cmd:predbasesurv} will only give the correct answers in the test set
if the training set and the test set are combined in the same dataset.
{title:Options for {cmd:predsurv} and {cmd:predbasesurv}}
{phang}
{opt time(#)} is required.
It specifies a survival time, for which survival probabilities are estimated.
Note that the times are assumed to be given in the units specified by the {cmd:scale()} option of {helpb stset},
as stored in the variable {cmd:_t} generated by {helpb stset}.
{phang}
{opth generate(newvarname)} is required for {cmd:predsurv}.
For {cmd:predbasesurv}, at least one of {cmd:generate()} and {cmd:gsurv()} must be specified.
It specifies the name of a new variable to be generated,
containing the survival probabilities or cumulative incidences (in the case of {cmd:predsurv})
or the baseline survival probabilities expected if all covariates are zero (in the case of {cmd:predbasesurv}),
at the time specified by the {cmd:time()} option.
{phang}
{opt type(type)} specifies a {help datatypes:numeric storage type} for the generated variable(s).
If absent, then {cmd:float} is assumed.
{title:Options for {cmd:predsurv} only}
{phang}
{opt cuminc} specifies that the generated variable will be a cumulative incidence probability variable.
By default, it is a survival probability variable.
{phang}
{opt fast} is an option for programmers.
It specifies that {cmd:predsurv} will do no extra work to restore the original dataset
if the user presses {cmd:Break}.
This extra work is normally necessary because {cmd:predsurv} makes temporary changes to the variable {cmd:_t}
generated by the {helpb stset} command.
Skipping it can save time when working with very large datasets.
{title:Options for {cmd:predbasesurv} only}
{phang}
{opth gsurv(newvarname)} specifies the name of a new variable to be generated,
containing the individual survival probabilities,
equal to the baseline survival probability if all covariates are zero
raised to the power of the hazard ratio for the individual observation.
Note that at least one of the options {cmd:generate()} and {cmd:gsurv()},
specifying generated variables,
must be present for {cmd:predbasesurv} to work.
{title:Remarks}
{pstd}
{cmd:predsurv} is useful for out-of-sample predictions after {helpb streg}.
The predicted survival probability can be used as a positive ordinal predictor of survival,
and the predicted cumulative incidence can be used as a negative ordinal predictor of survival,
much as the hazard ratio generated using {helpb stcox_postestimation:predict} after {helpb stcox}
can be used as a negative ordinal predictor of survival.
Note that, for {cmd:predsurv},
the training set can be separate from the test set,
because the model may be fitted in the training set,
and the {help estimates:estimation results} can be re-used for prediction in the test set,
without any training set observations being present.
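{pstd}
As a minimal sketch of this workflow, the model may be fitted to a training subset
and {cmd:predsurv} restricted to the remaining observations.
The indicator variable {cmd:train} and the alternating split are hypothetical, chosen only for illustration,
and here both subsets are kept in the same dataset for simplicity.
{phang2}{cmd:. webuse cancer, clear}{p_end}
{phang2}{cmd:. stset studytime, failure(died)}{p_end}
{phang2}{cmd:. generate byte train = mod(_n, 2)}{p_end}
{phang2}{cmd:. streg age if train==1, dist(weibull)}{p_end}
{phang2}{cmd:. predsurv if train==0, time(12) generate(surv12test)}{p_end}
{phang2}{cmd:. summ surv12test, de}{p_end}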
{pstd}
{cmd:predbasesurv} is useful for out-of-sample prediction after {helpb stcox}.
It generates the baseline survival probability expected if all covariates are zero,
and/or the individual survival probability for each observation given the covariates in that observation.
Note that, for {cmd:predbasesurv} to work correctly,
the training set and the test set must be combined in the same dataset in memory,
and the Cox model must be fitted to the training set only.
The baseline survival variable for a given {cmd:time()} option
will have the same value within each stratum specified by the {cmd:strata()} option of {helpb stcox},
but may vary from stratum to stratum.
If the {helpb stcox} command did not have a {cmd:strata()} option,
then the baseline survival probability will be constant.
Either way, the survival probability for an individual,
at the time specified by the {cmd:time()} option,
is equal, for each lifetime specified by an observation,
to the baseline survival probability raised to the power of the hazard ratio
for the covariate values in that observation.
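{pstd}
A minimal sketch of this out-of-sample workflow, using the {cmd:stan3} dataset,
keeps the training set and the test set together in memory and fits the Cox model to the training set only.
The indicator variable {cmd:train}, and the split of patients by the parity of {cmd:id}, are hypothetical and chosen only for illustration.
{phang2}{cmd:. webuse stan3, clear}{p_end}
{phang2}{cmd:. generate byte train = mod(id, 2)}{p_end}
{phang2}{cmd:. generate agem50 = age - 50}{p_end}
{phang2}{cmd:. stcox agem50 posttran surgery if train==1}{p_end}
{phang2}{cmd:. predbasesurv, time(40) generate(bsurv40) gsurv(surv40)}{p_end}
{phang2}{cmd:. summ surv40 if train==0, de}{p_end}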
{pstd}
Note that it is a good idea for covariates in a Cox regression model to be centered,
so that they are equal to zero in a sensible and plausible scenario.
For instance, the user may enter covariates into a Cox model
after subtracting their mean or median values.
If this is done, then the baseline survival probability computed by {cmd:predbasesurv}
will be the survival probability for the lifetime of a typical individual,
and will not be very close to zero or one.
If variables are not centered, then the baseline survival probability
may correspond to an extreme scenario,
like the case of a heart failure patient of age zero.
This is important if we are estimating survival probabilities for individual subjects
by raising the baseline survival probability to the power of the hazard ratio,
which probably will not be done very precisely
if the baseline survival probability is close to zero or one.
{pstd}
The fitting of survival models to a training set,
and the testing of their ordinal predictive power using out-of-sample prediction in a test set,
is discussed in {browse "http://www.stata-journal.com/article.html?article=st0198":Newson (2010)}.
{title:Examples}
{pstd}
These examples use the {cmd:cancer} dataset,
which the user can download using the {helpb webuse} command.
{pstd}
Set-up
{phang2}{cmd:. webuse cancer, clear}{p_end}
{phang2}{cmd:. stset studytime died}{p_end}
{phang2}{cmd:. describe, full}{p_end}
{phang2}{cmd:. tab died, miss}{p_end}
{phang2}{cmd:. summ studytime, de}{p_end}
{pstd}
Fit model and compute median survival time and 12-month survival probability
{phang2}{cmd:. streg age, dist(weibull) strata(drug)}{p_end}
{phang2}{cmd:. predict medsurv}{p_end}
{phang2}{cmd:. summ medsurv, de}{p_end}
{phang2}{cmd:. predsurv, time(12) gene(surv12)}{p_end}
{phang2}{cmd:. summ surv12, de}{p_end}
{pstd}
Plot 12-month survival probability against median survival time
{phang2}{cmd:. scatter surv12 medsurv, yline(0.5) ylab(0(0.1)1) xline(12) xlab(0(6)60)}{p_end}
{pstd}
Note that the 12-month survival probability {cmd:surv12} is above (or below) 0.5
if and only if the median survival time {cmd:medsurv} is above (or below) 12 months.
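{pstd}
Compute 12-month cumulative incidence with the {cmd:cuminc} option.
This sketch assumes that the cumulative incidence is one minus the survival probability
(the usual definition in the absence of competing risks, which this help file does not state explicitly).
If so, the illustrative variable {cmd:total12} should equal 1 up to rounding error.
{phang2}{cmd:. predsurv, time(12) generate(cinc12) cuminc}{p_end}
{phang2}{cmd:. generate double total12 = surv12 + cinc12}{p_end}
{phang2}{cmd:. summ total12, de}{p_end}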
{pstd}
The following example estimates 40-day survival probabilities in the {cmd:stan3} dataset,
after grouping and centering covariates.
{pstd}
Input and describe dataset
{phang2}{cmd:. webuse stan3, clear}{p_end}
{phang2}{cmd:. describe, full}{p_end}
{phang2}{cmd:. stset}{p_end}
{pstd}
Group years into strata to reflect changes in treatment in 1970 and 1973
{phang2}{cmd:. generate pgroup = year}{p_end}
{phang2}{cmd:. recode pgroup min/69=1 70/72=2 73/max=3}{p_end}
{pstd}
Center age at 50 and year at 70
{phang2}{cmd:. summ age, de}{p_end}
{phang2}{cmd:. generate agem50=age-50}{p_end}
{phang2}{cmd:. summ year, de}{p_end}
{phang2}{cmd:. generate yearm70=year-70}{p_end}
{pstd}
Fit stratified Cox regression model
{phang2}{cmd:. stcox agem50 posttran surg yearm70, strata(pgroup) vce(robust)}{p_end}
{pstd}
Baseline and individual 40-day survival probabilities
{phang2}{cmd:. predbasesurv, time(40) generate(bsurv40) gsurv(surv40)}{p_end}
{pstd}
Show that baseline 40-day survival probability is a function of stratum
{phang2}{cmd:. tab bsurv40, m}{p_end}
{phang2}{cmd:. tab pgroup bsurv40, m}{p_end}
{pstd}
Distribution of 40-day survival probabilities for individual patients
{phang2}{cmd:. summ surv40, de}{p_end}
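{pstd}
Check that the individual survival probability is the baseline survival probability raised to the power of the hazard ratio.
This is a sketch, assuming that {cmd:gsurv()} is computed as described under its option entry.
The variable names {cmd:hr40} and {cmd:diff40} are illustrative,
and the differences should be zero up to rounding error.
{phang2}{cmd:. predict hr40, hr}{p_end}
{phang2}{cmd:. generate double diff40 = surv40 - bsurv40^hr40}{p_end}
{phang2}{cmd:. summ diff40, de}{p_end}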
{title:Author}
{pstd}
Roger Newson, Imperial College London, UK.{break}
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}
{title:References}
{phang}
Newson, R. B. 2010.
Comparing the predictive powers of survival models using Harrell's {it:C} or Somers' {it:D}.
{it:Stata Journal} 10: 339-358.
Download from
{browse "http://www.stata-journal.com/article.html?article=st0198":the {it:Stata Journal} website}.
{title:Also see}
{p 4 13 2}
{bind: }Manual: {hi:[ST] stset}, {hi:[ST] streg}, {hi:[ST] stcox}
{p_end}
{p 4 13 2}
On-line: help for {helpb stset}, {helpb streg}, {helpb stcox}
{p_end}