help for str2ph, str2d                                          Patrick Royston

Explained variation in survival analysis

str2ph survival_cmd [varlist] [if] [in] [, adjust bootreps(#) calibrate denominator nodots offset(varname) randomness validate(varname) survival_cmd_options ]

str2d survival_cmd varlist [if] [in] [, adjust bootreps(#) randomness validate(varname) survival_cmd_options ]

where survival_cmd may be stcox, streg, or stpm (if installed).

You must have stset your data before using str2ph or str2d.
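As a minimal sketch of the required setup, the lines below declare survival-time data with stset before calling str2ph. The cancer example dataset shipped with Stata and its variables (studytime, died, age) are assumptions made for illustration and are not part of this help file; str2ph must already be installed.

```stata
* Illustrative setup only; the dataset and variable names are assumptions
sysuse cancer, clear
stset studytime, failure(died)   // declare survival time and event indicator
str2ph stcox age                 // explained variation for a Cox model on age
```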


str2ph computes Royston's (2006) modification of O'Quigley, Xu & Stare's (2005) modification of Nagelkerke's (1991) R-squared (R²) statistic (also known as the coefficient of determination or the proportion of explained variation) for proportional hazards (PH) models for censored survival data. str2ph also gives sensible results in non-PH survival models supported by streg and stpm; see Royston (2006) for further information.

str2d computes Royston & Sauerbrei's (2004) R² statistic based on their index of discrimination (D) for proportional hazards, proportional odds and probit models for censored survival data. The D measure is available for all survival_cmds except streg, distribution(gamma).
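The link between D and R² can be sketched with a few lines of arithmetic. The sketch assumes the form given in Royston & Sauerbrei (2004), R² = (D²/kappa²)/(sigma² + D²/kappa²) with kappa = sqrt(8/pi) and, for a PH model, sigma² = pi²/6. The value of D is made up for illustration and is not output from str2d; consult the paper for the definitive expressions and for other model types.

```stata
* Hypothetical D, for illustration only
scalar D      = 1.2
scalar kappa2 = 8/_pi            // kappa^2, the scaling applied to the normal scores
scalar sigma2 = _pi^2/6          // assumed error variance for a PH model
scalar R2D    = (D^2/kappa2) / (sigma2 + D^2/kappa2)
display "R2_D = " %6.4f R2D
```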

The model is defined by

. survival_cmd varlist [ , survival_cmd_options ]

See the validate() option for comments on out-of-sample prediction and assessment of R² in a "validation" or test sample.


adjust computes adjusted R², taking into account the dimension (i.e. number of covariates) of the model. This may be helpful when R² is low and/or the model is very complex, since the expected value of R² under the null hypothesis (that the outcome is unrelated to the covariates) is greater than zero and depends on the model dimension. Adjustment attempts to eliminate this bias in R² under the null hypothesis. Since R² calculated by out-of-sample prediction in a "validation" sample does not require adjustment, the validate() option is not permitted with adjust.

bootreps(#) with # > 0 computes a bootstrap confidence interval for R², using # bootstrap replications. A reasonable minimum value of # is 1000, but 5000 is better. Note that with # = 5000, the computation may take quite some time. The default value of # is 0, meaning no bootstrap CI is computed. With # = 0 in str2d, an analytic estimate of the SE of R² is displayed, derived by the delta method from the SE of D (see Royston & Sauerbrei (2004) for details of the SE of D).
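The delta-method step for str2d can be sketched as follows. The sketch assumes a PH model in which R² = D²/(c + D²) with c = (8/pi)*(pi²/6), so that dR²/dD = 2cD/(c + D²)² and SE(R²) is approximately |dR²/dD| * SE(D). The values of D and its SE are hypothetical, and the calculation is not claimed to reproduce str2d's internal code.

```stata
* Hypothetical D and SE(D), for illustration only
scalar D    = 1.2
scalar seD  = 0.15
scalar c    = (8/_pi)*(_pi^2/6)            // kappa^2 * sigma^2 under the PH assumption
scalar R2   = D^2/(c + D^2)
scalar seR2 = abs(2*c*D/(c + D^2)^2)*seD   // first-order delta method
display "R2 = " %6.4f R2 "   SE(R2) = " %6.4f seR2
```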

calibrate (for use only with str2ph ... , validate()) forces the survival regression to be re-estimated in the test sample on the index predicted from varlist in the training sample. The default is to offset the predicted index and calculate R² via the likelihood of that model. Regression on the index amounts to calibration of the model in the test sample and may noticeably increase the R² value. See also the validate() option.

denominator changes the denominator for the model chi-square statistic from k (the number of events) to n*(k/n)^alpha, where n is the sample size and alpha is approximately 5/6. A better value of alpha is required; this is work in progress. The effect of this option is to reduce the variation explained, particularly when the number of events is small.
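The effect of the alternative denominator can be seen with a little arithmetic. In this sketch the sample size, number of events and model chi-square are all hypothetical, and alpha is taken as exactly 5/6. Because k < n, the new denominator exceeds k, so the chi-square per unit of denominator shrinks.

```stata
* Hypothetical values, for illustration only
scalar n     = 500                    // sample size
scalar k     = 60                     // number of events
scalar alpha = 5/6
scalar lr    = 30                     // model chi-square statistic
scalar denom = n*(k/n)^alpha          // roughly 85.4 here, versus k = 60
display "chi-square / k               = " %6.4f (lr/k)
display "chi-square / n*(k/n)^alpha   = " %6.4f (lr/denom)
```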

nodots suppresses display of the replication dots with bootstrap confidence interval estimation. By default, a single dot character is displayed after every 100 replications.

offset(varname) offsets varname from the linear predictor. Note that offset(varname) without a main varlist is permitted. This allows the evaluation of a predictor 'from outside'.

randomness prevents conversion of the modified Nagelkerke index of determination from explained randomness to explained variation. The reported R² is then interpretable, at least in PH models, as explained randomness.
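The conversion that randomness switches off can be sketched as below. The sketch assumes the explained-randomness index takes the form rho² = 1 - exp(-X/k), with X the model chi-square and k the number of events, and that explained variation is obtained as R² = rho²/(rho² + (pi²/6)*(1 - rho²)). Both expressions are stated here as assumptions about the cited papers, and the input values are hypothetical; see Royston (2006) for the definitive formulas.

```stata
* Hypothetical chi-square and event count, for illustration only
scalar lr   = 30
scalar k    = 60
scalar rho2 = 1 - exp(-lr/k)                        // explained randomness (assumed form)
scalar R2   = rho2/(rho2 + (_pi^2/6)*(1 - rho2))    // explained variation (assumed form)
display "explained randomness = " %6.4f rho2
display "explained variation  = " %6.4f R2
```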

validate(varname) estimates the model in the subsample defined by the lower value of varname and computes R² in the subsample defined by the higher value of varname. These subsamples may be thought of as a training set and a test set. varname must have exactly two distinct values in the estimation sample defined by varlist, if, and in. These two values are arbitrary. varname may be a string variable, in which case lexicographic ordering is assumed. R² is computed according to the index (xb) predicted from the training sample (lower value of varname) into the test sample (higher value of varname). With str2ph, there is a choice between refitting the index in the test sample and offsetting the index there (see the calibrate option). With str2d, the index predicted on the test sample is transformed to scaled normal scores and regression on the scores is performed. The slope of this regression is Royston & Sauerbrei's (2004) D statistic. This step is required to compute D and hence R². The calibrate option is not relevant to the D method and hence is not available with str2d.
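The training/test procedure described for str2d can be imitated by hand, as in the rough sketch below. The cancer dataset, the random 50/50 split, the variable names tt, index, rk and z, and the use of Blom-type scores (rank - 3/8)/(n + 1/4) scaled by sqrt(8/pi) are all assumptions made for illustration. Ties are handled here by mean ranks, which may differ from what str2d does, so the result need not match str2d's output exactly.

```stata
* Illustrative only: dataset, variable names and the split are assumptions
sysuse cancer, clear
stset studytime, failure(died)
set seed 12345
generate byte tt = runiform() < 0.5       // 0 = training sample, 1 = test sample

stcox age if tt == 0                      // fit the model in the training sample
predict double index, xb                  // index (xb) predicted for all observations

* Scaled normal scores of the index within the test sample
egen double rk = rank(index) if tt == 1
quietly count if tt == 1
generate double z = invnormal((rk - 3/8)/(r(N) + 1/4))/sqrt(8/_pi) if tt == 1

stcox z if tt == 1, nohr                  // the slope on z plays the role of D
display "D = " %6.4f _b[z]
```

For comparison, the packaged equivalent would be str2d stcox age, validate(tt).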

survival_cmd_options are options of survival_cmd. Examples include distribution(weibull) for streg, df(2) scale(hazard) for stpm, and strata(x1 x2) for stcox.


. str2ph stcox x1 x2 x3
. str2ph stcox x1-x20, adjust bootreps(1000)
. str2ph stcox x1-x20, validate(tt) bootreps(1000)
. str2ph stcox x1-x20, validate(tt) calibrate bootreps(1000)
. str2ph streg x1 x2 x3, distribution(weibull)
. str2ph stpm x1 x2 x3, scale(hazard) df(2)
. str2ph stcox, offset(index)

. str2d stcox x1 x2 x3
. str2d stcox x1 x2 x3, validate(tt)
. str2d streg x1 x2 x3, distribution(llogistic)
. str2d stpm x1 x2 x3 if a==1, scale(odds) df(2) validate(tt)


Patrick Royston, MRC Clinical Trials Unit, London. patrick.royston@ctu.mrc.ac.uk


N. J. D. Nagelkerke. 1991. A note on a general definition of the coefficient of determination. Biometrika 78: 691-692.

J. O'Quigley, R. Xu and J. Stare. 2005. Explained randomness in proportional hazards models. Statistics in Medicine 24: 479-489.

P. Royston. 2006. Explained variation for survival models. Stata Journal.

P. Royston and W. Sauerbrei. 2004. A new measure of prognostic separation in survival data. Statistics in Medicine 23: 723-748.

Also see

Online: help for stcox, streg; stpm if installed.