{smcl}
{* *! version 2.0.0 13nov2014}{...}
{cmd:help str2d}
{hline}

{title:Title}

{p2colset 5 14 16 2}{...}
{p2col :{hi:str2d} {hline 2}}Explained variation in survival analysis - Royston-Sauerbrei D measure{p_end}
{p2colreset}{...}


{title:Syntax}

{phang2}
{cmd:str2d} [, {it:str2d_options}] {cmd::} {it:survival_cmd} {it:xvarlist} {ifin} [{cmd:,} {it:survival_cmd_options}]

{synoptset 24}{...}
{synopthdr :str2d_options}
{synoptline}
{synopt :{opt adj:ust}}computes adjusted R2, taking into account model dimension{p_end}
{synopt :{opt bo:otreps(#)}}computes a bootstrap confidence interval using {it:#} replications{p_end}
{synopt :{opt exc:lude(varlist)}}excludes {it:varlist} from the linear predictor when assessing R2{p_end}
{synopt :{opt mod:eldim(#)}}sets the dimension of the fitted model to {it:#}{p_end}
{synopt :{opt nodot:s}}suppresses display of the bootstrap replication dots{p_end}
{synopt :{opt rand:omness}}reports explained randomness (default is explained variation){p_end}
{synopt :{opt val:idate(varname)}}estimates the model in the subsample defined by the lower value of {it:varname} and computes R2 in the subsample defined by the higher value of {it:varname}{p_end}
{synopt :{it:survival_cmd_options}}options of {it:survival_cmd}{p_end}
{synoptline}

{pstd}
{marker syntax}where

{phang}
{it:survival_cmd} may be {help stcox}, {help streg}, {help stpm} (if installed),
or {help stpm2} (if installed).

{pstd}
You must have {cmd:stset} your data before using {cmd:str2d}.


{title:Description}

{pstd}
{cmd:str2d} computes the R2 statistic of Royston and Sauerbrei (2004), which
is based on their D measure of discrimination, for proportional hazards,
proportional odds, and probit models for censored survival data. The D measure
is available for all the above {it:survival_cmd} commands except
{cmd:streg, distribution(gamma)}.
{p_end}
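
{pstd}
For orientation, the relation between D and R2 can be written out explicitly.
This is a sketch based on Royston and Sauerbrei (2004), not a transcription of
the code inside {cmd:str2d}. The symbols kappa and sigma^2 are notation
introduced here for illustration, the value of sigma^2 shown is the one for
proportional hazards models on the explained-variation scale, and D = 1 is a
hypothetical value chosen only to make the arithmetic concrete.
{p_end}

```latex
% Assumed notation: \kappa and \sigma^2 are introduced for this sketch only.
R^2_D = \frac{D^2/\kappa^2}{\sigma^2 + D^2/\kappa^2},
\qquad \kappa = \sqrt{8/\pi} \approx 1.596,
\qquad \sigma^2 = \frac{\pi^2}{6} \approx 1.645 .

% Worked example with a hypothetical D = 1:
\frac{D^2}{\kappa^2} = \frac{\pi}{8} \approx 0.393,
\qquad
R^2_D \approx \frac{0.393}{1.645 + 0.393} \approx 0.193 .
```

{pstd}
Other model classes use a different sigma^2, so the numbers above should be
read as an illustration for the proportional hazards case only.
{p_end}
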
{pstd}
The model is defined by

{pmore}
{cmd:.} {it:survival_cmd} {it:xvarlist} [ {cmd:,} {it:survival_cmd_options} ]

{pstd}
See the {cmd:validate()} option for comments on out-of-sample prediction
and assessment of R2 in a "validation" or test sample.

{pstd}
IMPORTANT NOTE: From version 2.0.0 onwards, {cmd:str2d} has "colon command"
syntax. The older syntax (up to and including version 1.2.3) is no longer
available. Also, {cmd:str2ph} has been withdrawn and is no longer included in
this package. It has been shown to be prone to bias as censoring increases.


{title:Options}

{phang}
{opt adjust} computes adjusted R2, taking into account the dimension
(i.e. number of covariates) of the model. This may be helpful when R2 is low
and/or the model is complex, because the expected value of R2 under the null
hypothesis that the outcome is unrelated to the covariates is greater than
zero and depends on the model dimension. Adjustment attempts to eliminate this
bias in R2 under the null hypothesis. Because R2 calculated by out-of-sample
prediction in a "validation" sample does not require adjustment, the
{opt validate()} option is not permitted with {opt adjust}. See also the
{opt modeldim()} option.

{phang}
{opt bootreps(#)} with {it:#} > 0 computes a bootstrap confidence interval
for R2, using {it:#} bootstrap replications. A reasonable minimum value of
{it:#} is 1000, but 5000 is better. Note that with {it:#} = 5000, the
computation may take a long time. The default value of {it:#} is 0, meaning
that no bootstrap CI is computed; in that case, an analytic estimate of the SE
of R2 is displayed, derived by the delta method from the SE of D. See
Royston & Sauerbrei (2004) for details of the SE of D.

{phang}
{opt exclude(varlist)} deletes predictors in {it:varlist} from the linear
predictor before computing R2 and D.
The point of this option is to remove the effect of irrelevant or
uninteresting structural or adjustment variables, such as centre or region,
from the discrimination of the model of interest. Note that the model is NOT
re-fitted without {it:varlist}; the values of the regression coefficients are
retained.

{phang}
{opt modeldim(#)} sets the dimension of the fitted model to {it:#}. The
default is for the dimension to equal the number of terms in {it:xvarlist}.
Some people believe that with stepwise selection of variables, the correct
figure to use for the model dimension is the number of candidate predictors
(or more generally, with multiparameter predictors such as fractional
polynomial functions, the degrees of freedom). {opt modeldim()} has an effect
only when the {opt adjust} option is also specified.

{phang}
{cmd:nodots} suppresses display of the replication dots during bootstrap
confidence interval estimation. By default, a single dot character is
displayed after every 100 replications.

{phang}
{cmd:randomness} expresses R2 results as "explained randomness". The default
is "explained variation".

{phang}
{cmd:validate(}{it:varname}{cmd:)} estimates the model in the subsample
defined by the low value of {it:varname} and computes R2 in the subsample
defined by the high value of {it:varname}. These subsamples may be thought of
as a training set and a test set. {it:varname} must have exactly two distinct
values in the estimation sample defined by {it:xvarlist}, {cmd:if} and
{cmd:in}. These two values are arbitrary. {it:varname} may be a string
variable, in which case lexicographic ordering is assumed. R2 is computed from
the index ({cmd:xb}) predicted from the training sample (low value of
{it:varname}) into the test sample (high value of {it:varname}). The index
predicted on the test sample is transformed to scaled normal scores, and a
regression on the scores is performed. The slope of this regression is the D
statistic of Royston & Sauerbrei (2004).
This step is required to compute D and hence R2.

{phang}
{it:survival_cmd_options} are options of {it:survival_cmd}. Examples include
{cmd:distribution(weibull)} for {cmd:streg}, {cmd:df(2) scale(hazard)} for
{cmd:stpm} and {cmd:stpm2}, and {cmd:strata(x1 x2)} for {cmd:stcox}.


{title:Examples}

{phang}{cmd:. }{stata webuse brcancer, clear}{p_end}
{phang}{cmd:. }{stata stset rectime, failure(censrec) scale(365.24)}{p_end}

{phang}{cmd:. }{stata "str2d: stcox x4a x5e x6 hormon"}{p_end}
{phang}{cmd:. }{stata "str2d, bootreps(500): stcox x4a x5e x6 hormon"}{p_end}
{phang}{cmd:. }{stata "str2d: stpm2 x4a x5e x6 hormon, df(2) scale(hazard)"}{p_end}
{phang}{cmd:. }{stata "str2d: streg x4a x5e x6 hormon, distribution(weibull)"}{p_end}

{phang}{cmd:. }{stata set seed 10101}{p_end}
{phang}{cmd:. }{stata gen byte val = (runiform() < 0.5)}{p_end}
{phang}{cmd:. }{stata "str2d, validate(val): stcox x4a x5e x6 hormon"}{p_end}


{title:Author}

{pstd}
Patrick Royston, MRC Clinical Trials Unit at UCL, London.{break}
j.royston@ucl.ac.uk


{title:References}

{phang}
B. Choodari-Oskooei, P. Royston and M. K. B. Parmar. 2012. A simulation study
of predictive ability measures in a survival model I: Explained variation
measures. {it:Statistics in Medicine} {bf:31}: 2627-2643.

{phang}
J. O’Quigley, R. Xu and J. Stare. 2005. Explained randomness in proportional
hazards models. {it:Statistics in Medicine} {bf:24}: 479-489.

{phang}
P. Royston. 2006. Explained variation for survival models.
{it:Stata Journal} {bf:6(1)}: 83-96.

{phang}
P. Royston and W. Sauerbrei. 2004. A new measure of prognostic separation in
survival data. {it:Statistics in Medicine} {bf:23}: 723-748.


{title:Also see}

{psee}
Online: help for {help stcox}, {help streg}; {help stpm}, {help stpm2} (if installed).
{p_end}