Improving fitting and predictions for flexible parametric survival models

Northern European Stata Conference 2022, Oslo

Paul Lambert

University of Leicester / Karolinska Institutet

October 12, 2022

Introduction

  • The first article in The Stata Journal introduced stpm
  • I wrote stpm2 in 2007
  • Here I will introduce stpm3

Flexible Parametric Survival Models

  • Flexible parametric survival models are used with time-to-event (survival) outcomes

  • They are “flexible” in that they use spline functions to model the effect of time

  • They make it easy to relax proportionality assumptions through interactions between covariates and the effect of time (time-dependent effects)
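
As a sketch of what such an interaction looks like, assume a linear predictor \(\eta_i(t)\) made up of a baseline spline in \(\ln(t)\) with knots \(\mathbf{k}_0\), covariate effects \(\mathbf{x}_i \boldsymbol{\beta}\), and \(D\) covariates with time-dependent effects. The symbols \(D\), \(\boldsymbol{\delta}_j\) and \(\mathbf{k}_j\) are notation introduced only for this illustration:

\[ \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} + \sum_{j=1}^{D} s\left(\ln(t)|\boldsymbol{\delta}_j, \mathbf{k}_{j}\right) x_{ij} \]

With \(D = 0\) the effect of each covariate on \(\eta_i(t)\) is constant over time. Each added term lets the effect of covariate \(j\) vary smoothly with \(\ln(t)\), usually with fewer knots than the baseline spline.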

Choice of Scale

Log cumulative hazard scale (stpm2)

\[ \ln[H(t|\mathbf{x}_i)] = \ln[-\ln(S(t|\mathbf{x}_i))] = \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} \]

Log odds scale (stpm2)

\[ \ln\left[\frac{1-S(t|\mathbf{x}_i)}{S(t|\mathbf{x}_i)}\right] = \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} \]

Log hazard scale (strcs), splines on ln(time)

\[ \ln\left[h(t|\mathbf{x}_i)\right] = \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} \]
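
Inverting each link shows how the survival function is recovered from the linear predictor. This is a worked rearrangement of the three definitions above, not additional model structure:

\[ \text{Log cumulative hazard:}\quad S(t|\mathbf{x}_i) = \exp\left[-\exp\left(\eta_i(t)\right)\right] \]

\[ \text{Log odds:}\quad S(t|\mathbf{x}_i) = \frac{1}{1 + \exp\left(\eta_i(t)\right)} \]

\[ \text{Log hazard:}\quad S(t|\mathbf{x}_i) = \exp\left[-\int_0^t \exp\left(\eta_i(u)\right)\,du\right] \]

The first two are available in closed form. On the log hazard scale the integral generally has no closed form and has to be evaluated numerically.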

stcox vs stpm2

. stcox hormon, nolog noshow 
------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      hormon |   1.540262    .132659     5.02   0.000     1.301016    1.823503
------------------------------------------------------------------------------

. stpm2 hormon, scale(hazard) df(4) eform nolog
------------------------------------------------------------------------------
             |     exp(b)   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
      hormon |   1.540779   .1326967     5.02   0.000     1.301464    1.824099
       _rcs1 |    2.50398    .069006    33.31   0.000     2.372319    2.642949
       _rcs2 |   1.198509   .0330973     6.56   0.000     1.135364    1.265166
       _rcs3 |   1.018274   .0145595     1.27   0.205     .9901346    1.047214
       _rcs4 |   .9961938   .0067963    -0.56   0.576     .9829618    1.009604
       _cons |   .2935573   .0097629   -36.85   0.000     .2750327    .3133296
------------------------------------------------------------------------------
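
One way to see why the two estimates agree, as a sketch assuming the proportional model on the log cumulative hazard scale with no time-dependent effects, is to differentiate the cumulative hazard \(H(t|\mathbf{x}_i) = \exp\left[\eta_i(t)\right]\) with respect to \(t\). Here \(s'(\cdot)\) denotes the derivative of the spline with respect to \(\ln(t)\), notation introduced only for this illustration:

\[ h(t|\mathbf{x}_i) = \frac{d}{dt}\exp\left[\eta_i(t)\right] = \frac{1}{t}\, s'\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) \exp\left[s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta}\right] \]

\[ \frac{h(t|\text{hormon}=1)}{h(t|\text{hormon}=0)} = \exp\left(\beta_{\text{hormon}}\right) \]

The spline terms cancel in the ratio, so exp(b) for hormon from stpm2 (1.5408) is a hazard ratio and is directly comparable with the stcox estimate (1.5403).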

Predictions

Why a new command?

  • stpm2 was written before Stata supported factor variables

  • Use better basis functions for splines

  • Make (conditional) predictions easier

  • Use frames for predictions

  • Include splines on log hazard scale

  • Include functional forms of covariates in linear predictor

  • Make marginal/standardized predictions much easier.

    • This is the main reason

Change of spline basis functions

  • stpm2 uses restricted cubic splines to model the effect of follow-up time.

    • These are data dependent, which causes problems with out-of-sample prediction.

  • New gensplines command used by stpm3

    • Natural spline basis functions by default.

    • Also possible to use B-splines

Restricted Cubic Splines in stpm2
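
For reference, a sketch of the usual restricted cubic spline basis, assuming \(K\) knots \(k_1 < \dots < k_K\) on the \(x = \ln(t)\) scale, with \(k_1\) and \(k_K\) the boundary knots. The indexing and the symbols \(z_j\) and \(\lambda_j\) are notation introduced for this illustration. stpm2 may also orthogonalise these functions before fitting, which is not shown here:

\[ s(x) = \gamma_0 + \gamma_1 z_1 + \gamma_2 z_2 + \dots + \gamma_{K-1} z_{K-1} \]

\[ z_1 = x, \qquad z_j = (x - k_j)_+^3 - \lambda_j (x - k_1)_+^3 - (1 - \lambda_j)(x - k_K)_+^3, \quad j = 2, \dots, K-1 \]

\[ \lambda_j = \frac{k_K - k_j}{k_K - k_1}, \qquad (u)_+ = \max(u, 0) \]

The weights \(\lambda_j\) make the cubic and quadratic terms cancel beyond \(k_K\), so the function is linear outside the boundary knots. With df(4) there are five knots and four basis functions, matching _rcs1 to _rcs4 in the stpm2 output.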

Natural Splines in stpm3