Title
stcoxgof -- Goodness-of-fit test and plot after a Cox model
Syntax
stcoxgof [, group(#) mol(#) molat(numlist) mom(#) momat(numlist) poidis arjas(#) separate lineopts(cline_options) twoway_options ]
    options                    Description
    -------------------------------------------------------------------------
    Main
      group(#)                 specify the number of quantiles of risk for the
                                 Gronnesby and Borgan test
      mol(#)                   specify the number of time intervals for the
                                 Moreau, O'Quigley and Lellouch test
      molat(numlist)           specify the time intervals for the Moreau,
                                 O'Quigley and Lellouch test
      mom(#)                   specify the number of time intervals for the
                                 Moreau, O'Quigley and Mesbah test
      momat(numlist)           specify the time intervals for the Moreau,
                                 O'Quigley and Mesbah test
      poidis                   report the probability of the observed counts
                                 within each decile of risk according to the
                                 Poisson distribution

    Arjas-like plots
      arjas(#)                 specify the number of quantiles of risk for
                                 Arjas-like plots
      separate                 show a separate plot for each quantile of risk
      lineopts(cline_options)  affect rendition of the line

    Y-Axis, X-Axis, Saving, Scheme
      twoway_options           some of the options documented in
                                 [G] twoway_options
    -------------------------------------------------------------------------
Description
stcoxgof is a postestimation command that tests goodness of fit after a Cox model, so it must be run after stcox. To compute the Gronnesby and Borgan test and to obtain Arjas-like plots, martingale residuals must also be saved by specifying stcox's mgale() option; see stcox.
stcoxgof calls scoretest_cox to compute score test statistics, so scoretest_cox must also be installed.
When used without options or with the group(#) option, stcoxgof computes the added-variable version of the Gronnesby and Borgan test, reporting the score statistic and the likelihood-ratio statistic for the inclusion of design variables based on the risk score. A table of the observed and expected numbers of events in each quantile of risk is then shown. Following May and Hosmer, a z-score and a two-tailed p-value from the standard normal distribution are also tabulated.
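As a rough illustration of the tabulated z-score and p-value, the sketch below works through one risk group by hand. The counts are hypothetical, and the form z = (O - E)/sqrt(E) is an assumption about how the z-score is built, not something stated in this help file; check May and Hosmer for the exact definition.

```stata
* Hypothetical observed and expected numbers of events for one risk group
scalar o_k = 9
scalar e_k = 12.4
* Assumed form of the z-score: (observed - expected)/sqrt(expected)
scalar z = (o_k - e_k)/sqrt(e_k)
* Two-tailed p-value from the standard normal distribution
scalar p = 2*normal(-abs(z))
display "z = " %6.3f z "   p = " %6.4f p
```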
An added-variable version of the Moreau, O'Quigley and Lellouch test can be computed by specifying the mol(#) or the molat(numlist) option. In this case the score statistic and the likelihood-ratio statistic refer to the inclusion of design variables based on cross-products of indicator variables for risk score groups and time intervals.
An added-variable version of the Moreau, O'Quigley and Mesbah test can be computed by specifying the mom(#) or the momat(numlist) option. In this case the score statistic and the likelihood-ratio statistic refer to the inclusion of design variables based on cross-products of the covariates included in the model and time intervals. Because many interaction terms can be created, this test is not advisable when the model includes more than a few categorical covariates.
If the arjas(#) option is specified, Arjas-like plots by quantiles of risk are displayed.
Options
        +------+
    ----+ Main +-------------------------------------------------------------
group(#) specifies the number of groups, based on the risk score, used to group the observed and expected numbers of events. If not specified, the number of quantiles is computed according to the formula int(max(2,min(10,`e(N_fail)'/40))). Indicator variables for each risk score group are then added to the model to compute the Gronnesby and Borgan test. Values allowed are from 2 to 10.
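To see what the default formula gives, the sketch below evaluates it for a few hypothetical numbers of failures; after stcox the actual value would come from e(N_fail).

```stata
* Hypothetical numbers of failures, for illustration only
foreach nfail in 50 120 200 1000 {
    display "failures = `nfail'   groups = " int(max(2, min(10, `nfail'/40)))
}
```

These values yield 2, 3, 5 and 10 groups respectively: the formula allows roughly 40 failures per group, with a floor of 2 and a ceiling of 10 groups.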
mol(#) specifies the number of intervals into which the analysis time is partitioned. Cross-products of indicator variables for quantiles of risk and time intervals are then formed and the Moreau, O'Quigley and Lellouch test is computed. Values allowed are from 2 to 10. The number of quantiles of risk can be set with the group(#) option; otherwise it is determined by the formula given under group(#).
molat(numlist) is an alternative way to partition the analysis time, at the times specified in the numlist. As with mol(#), cross-products of indicator variables for quantiles of risk and time intervals are then formed and the Moreau, O'Quigley and Lellouch test is computed.
mom(#) specifies the number of intervals into which the analysis time is partitioned. The time axis is divided so that each interval contains approximately the same number of events. Cross-products of the covariates included in the model and time intervals are then formed and the Moreau, O'Quigley and Mesbah test is computed. Values allowed are from 2 to 10.
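One way to picture an equal-events partition of the time axis is to take quantiles of the observed event times. The sketch below does this for four intervals; it assumes the data are already stset, and whether stcoxgof derives its cutpoints in exactly this way is an assumption.

```stata
* Sketch only: assumes the data are already stset, so _t and _d exist.
* Quartiles of the event times give 3 cutpoints, that is, 4 intervals
* holding roughly equal numbers of events.
_pctile _t if _d == 1, nquantiles(4)
display "cutpoints: " r(r1) "  " r(r2) "  " r(r3)
```

Cutpoints chosen by hand, for instance clinically meaningful times, can be supplied instead through momat(numlist).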
momat(numlist) is an alternative way to partition the analysis time, at the times specified in the numlist. As with mom(#), cross-products of the covariates included in the model and time intervals are then formed and the Moreau, O'Quigley and Mesbah test is computed.
poidis estimates the probability of the observed counts within each decile of risk according to a Poisson distribution with mean equal to the estimated expected number of counts.
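For a single risk group, a probability of this kind can be evaluated with Stata's poissonp() function. The counts below are hypothetical, and reading "probability of observed counts" as the point probability P(X = observed) is an assumption about what poidis reports.

```stata
* Hypothetical values for one risk group, for illustration only
scalar e_k = 12.4
scalar o_k = 9
* poissonp(m, k): probability of exactly k events under a Poisson with mean m
display "P(X = " o_k ") = " %6.4f poissonp(e_k, o_k)
```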
        +-----------------+
    ----+ Arjas-like plot +--------------------------------------------------
arjas(#) specifies the number of quantiles of risk used to group the data for Arjas-like plots. Values allowed are from 2 to 10.
separate requests a separate graph for each quantile of risk.
Examples
    . use "C:\Data\uis_gof", clear
    . stset time, failure(cens)
    . stcox age beck ndru_1 ndru_2 ivh_3 race treat site agesite racesite, mgale(m)
    . stcoxgof
    . stcoxgof, gr(5)
    . stcoxgof, mol(4)
    . stcoxgof, gr(5) mol(4)
    . stcoxgof, gr(5) molat(84 170 376)
    . stcoxgof, arjas(4) scheme(sj)
    . stcox ivh_3 race treat
    . stcoxgof, mom(4)
    . stcoxgof, momat(170 354 535)
After downloading the ancillary files into one of the directories on your adopath, you can run this example.
Remarks
Based on ideas similar to the Hosmer-Lemeshow test for logistic regression, three goodness-of-fit tests for the Cox proportional hazards model can be derived by adding group indicator variables to the model and testing the hypothesis that their coefficients are zero via a score, likelihood-ratio or Wald test.
The first is the Moreau, O'Quigley, and Lellouch (MOL) test, obtained by partitioning the time axis into intervals and grouping the individuals based on their risk score. Indicator variables are then generated as cross-products of time intervals with risk score groups and included in the model. The MOL test is an omnibus test and should detect any violation of the proportional hazards (PH) model.
The second is the added-variable version of the test proposed by Moreau, O'Quigley, and Mesbah (MOM). The time axis is partitioned into intervals and indicator variables are generated as cross-products of time intervals with each level of the covariates in the model. The MOM test is designed specifically to detect violations of the proportional hazards assumption. Because a large number of added variables may be needed, the MOM test is practical only for Cox models with just a few categorical covariates.
The third test was proposed by Gronnesby and Borgan. The idea is to divide the observations into groups based on their estimated risk score. Indicator variables for the risk score groups are then added to the model and the hypothesis that their coefficients are zero is tested. This test, like the MOL test, is an omnibus test, but it is not appropriate when time-varying covariates are included in the model.
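The Gronnesby and Borgan idea can be traced by hand, as in the sketch below. It is not the code run by stcoxgof: it uses a Wald test rather than the score and likelihood-ratio statistics the command reports, the variable names mg, xb, riskgrp and expected are hypothetical, the choice of 5 groups is arbitrary, and it assumes Stata 11 or later with the Examples data loaded, stset, and holding one record per subject.

```stata
* Fit the model, then recover martingale residuals and the risk score
stcox age beck treat
predict double mg, mgale
predict double xb, xb
* Group subjects into 5 quantiles of the risk score
xtile riskgrp = xb, nquantiles(5)
* A martingale residual is observed minus expected events, so
* expected = observed - residual
generate double expected = _d - mg
tabstat _d expected, by(riskgrp) statistics(sum)
* Add the group indicators and test jointly that their coefficients are zero
stcox age beck treat i.riskgrp
testparm i.riskgrp
```

The tabstat step mirrors the observed and expected table, which is why martingale residuals are needed for this test.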
Arjas-like plots, as proposed by Hosmer and Lemeshow, compare observed and expected events in groups based on the risk score.
Also see
Manual: [S] st stcox
Online: stcox postestimation, stcox diagnostics
References
S. May and D. W. Hosmer. Hosmer and Lemeshow type goodness-of-fit statistics for the Cox proportional hazards model. In: Advances in Survival Analysis: Handbook of Statistics Vol 23, edited by N. Balakrishnan and C. R. Rao, Amsterdam: Elsevier, North-Holland, 2004, p. 383-394.
D. W. Hosmer and S. Lemeshow. Applied Survival Analysis: Regression Modeling of Time to Event Data. New York: Wiley, 1999, p. 225-230.
Authors
Enzo Coviello (enzo.coviello@alice.it)
John Moran (john.moran@adelaide.edu.au)
Acknowledgments
We are grateful to Isabel Canette for writing scoretest_cox and to Phil Ryan and colleagues for their cooperation in checking the results of the tests.