Title

stjm -- Joint modelling of longitudinal and survival data

Syntax

stjm longdepvar [varlist] [if] [in] , panel(varname) survmodel(survsubmodel) [options]

    options                       Description
    -------------------------------------------------------------------------
      panel(varname)              panel identification variable
      survmodel(survsubmodel)     survival submodel

    Longitudinal sub-model
      ffp(numlist)                fixed powers of time
      rfp(numlist)                fixed and random powers of time
      frcs(#)                     degrees of freedom for fixed splines of time
      rrcs(#)                     degrees of freedom for random splines of time
      timeinteraction(varlist)    covariates to interact with fixed time variables
      covariance(vartype)         variance-covariance structure of the random effects

    Survival sub-model
      survcov(varlist)            fixed baseline covariates to be included in the survival submodel
      df(#)                       degrees of freedom for baseline hazard function
      knots(numlist)              knot locations for baseline hazard function
      noorthog                    do not use orthogonal transformation of spline variables

    Association
      nocurrent                   association not based on the current value of longitudinal response
      derivassociation            association based on the first derivative (slope) of the longitudinal submodel
      intassociation              allow association to be based on the random intercept
      association(numlist)        allow association to be based on the random coefficient of a time variable
      assoccovariates(varlist)    adjust the association parameter(s) by covariates
      nocoefficient               do not include fixed coefficient when using intassociation or association()

    Maximisation
      gh(#)                       number of Gauss-Hermite quadrature points
      gk(#)                       number of Gauss-Kronrod quadrature points
      gl(#)                       number of Gauss-Legendre quadrature points
      adaptit(#)                  number of adaptive Gauss-Hermite quadrature iterations; default is 5
      noshowadapt                 suppress display of adaptive quadrature sub-iterations
      atol(#)                     tolerance for the log-likelihood under the adaptive quadrature
                                    sub-iterations; default is 1.0E-05
      nonadapt                    use non-adaptive Gauss-Hermite quadrature
      fulldata                    use all data in survival component maximisation; see details
      nullassoc                   set the initial values for association parameters to zero
      noxtem                      suppress the default emonly option used in the [XT] xtmixed call
                                    for initial values
      maximize_options            control the maximization process; seldom used

    Reporting
      showinitial                 display output from initial value model fits
      variance                    show random-effects parameter estimates as variances-covariances
      showcons                    list constraints in output
      keepcons                    do not drop constraints used in ml routine
      level(#)                    set confidence level; default is level(95)
    -------------------------------------------------------------------------

    survsubmodel       Description
    -------------------------------------------------------------------------
      exponential      exponential survival submodel
      weibull          Weibull survival submodel
      gompertz         Gompertz survival submodel
      fpm              flexible parametric survival submodel
      weibweib         mixture Weibull-Weibull survival submodel
      weibexp          mixture Weibull-exponential survival submodel
    -------------------------------------------------------------------------

    vartype            Description
    -------------------------------------------------------------------------
      independent      one variance parameter per random effect, all covariances zero
      exchangeable     equal variances for random effects, and one common pairwise covariance
      identity         equal variances for random effects, all covariances zero; the default
                         for factor variables
      unstructured     all variances and covariances distinctly estimated; the default
    -------------------------------------------------------------------------

longdepvar specifies the longitudinal continuous response variable. You must stset your data before using stjm; see [ST] stset. See stjm postestimation for features available after estimation.

Description

stjm fits shared parameter joint models for longitudinal and survival data using maximum likelihood. A single continuous longitudinal response and a single survival outcome are allowed. A linear mixed effects model is used for the longitudinal submodel, which allows time to be modelled using fixed and/or random fractional polynomials or restricted cubic splines. Six survival submodels are currently available: the exponential, Weibull, Gompertz, 2-component mixture Weibull-Weibull and 2-component mixture Weibull-exponential proportional hazards models, and the flexible parametric survival model (see stpm2), which is modelled on the log cumulative hazard scale. The association between the two processes can be induced via the default current value parameterisation, the first derivative of the longitudinal submodel, and/or a random coefficient such as the intercept. Adaptive or non-adaptive Gauss-Hermite quadrature, coded in Mata, can be used to evaluate the joint likelihood. Under all survival submodels except the flexible parametric model, Gauss-Kronrod quadrature is used by default to evaluate the cumulative hazard.

The dataset must be stset correctly into enter and exit times, using the enter() option; see [ST] stset. stjm uses _t0 to denote measurement times. For example, below we have 3 patients with 2, 5 and 3 measurements, respectively.

        ---------------------------------------
          id    _t0     _t    _d    long_resp
        ---------------------------------------
           1      0    0.2     0         0.93
           1    0.2    0.7     0         1.32
           2      0    0.5     0         1.15
           2    0.5    1.2     0         1.67
           2    1.2    1.6     0         1.92
           2    1.6    1.9     0         2.65
           2    1.9    2.6     1         3.15
           3      0      2     0         0.25
           3      2    2.3     0         0.21
           3    2.3    2.4     1         0.31
        ---------------------------------------
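As an illustration of this layout, the following Python sketch checks data of the shape shown above. It is not part of stjm, and the function name check_layout is hypothetical. Within each panel it verifies that each row's _t0 equals the previous row's _t, that _t0 < _t on every row, and that an event is flagged only on the last row. It then counts the measurements per panel. It does not require the first _t0 to be 0, since delayed entry is allowed.

```python
# Rows from the example listing: (id, _t0, _t, _d, long_resp)
rows = [
    (1, 0.0, 0.2, 0, 0.93),
    (1, 0.2, 0.7, 0, 1.32),
    (2, 0.0, 0.5, 0, 1.15),
    (2, 0.5, 1.2, 0, 1.67),
    (2, 1.2, 1.6, 0, 1.92),
    (2, 1.6, 1.9, 0, 2.65),
    (2, 1.9, 2.6, 1, 3.15),
    (3, 0.0, 2.0, 0, 0.25),
    (3, 2.0, 2.3, 0, 0.21),
    (3, 2.3, 2.4, 1, 0.31),
]


def check_layout(rows):
    """Return measurements per panel; raise ValueError if the start/stop layout is broken."""
    by_id = {}
    for r in rows:
        by_id.setdefault(r[0], []).append(r)
    counts = {}
    for pid, recs in by_id.items():
        recs = sorted(recs, key=lambda r: r[1])  # order by measurement time _t0
        for r in recs:
            if r[1] >= r[2]:
                raise ValueError(f"panel {pid}: _t0={r[1]} is not before _t={r[2]}")
        for prev, cur in zip(recs, recs[1:]):
            if cur[1] != prev[2]:
                raise ValueError(f"panel {pid}: _t0={cur[1]} != previous _t={prev[2]}")
            if prev[3] != 0:
                raise ValueError(f"panel {pid}: event flagged before the last row")
        counts[pid] = len(recs)
    return counts


print(check_layout(rows))  # {1: 2, 2: 5, 3: 3}
```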

Delayed entry joint models can be fitted, allowing age to be used as the timescale.

Options

panel(varname) defines the panel identification variable. Each panel should be identified by a unique integer.

survmodel(survsubmodel) specifies the survival submodel to be fit.

survmodel(fpm) fits a flexible parametric survival submodel. This is a highly flexible fully parametric alternative to the Cox model, modelled on the log cumulative hazard scale using restricted cubic splines. For more details see stpm2.

survmodel(exponential) fits an exponential survival submodel.

survmodel(weibull) fits a Weibull survival submodel.

survmodel(gompertz) fits a Gompertz survival submodel.

survmodel(weibweib) fits a 2-component mixture Weibull-Weibull survival submodel.

survmodel(weibexp) fits a 2-component mixture Weibull-exponential survival submodel.

    +------------------------+
----+ Longitudinal sub-model +-------------------------------------------

ffp(numlist) specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed covariates. _t0 is used as the time of measurements. Values must be in {-5, -4, -3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4, 5}.

rfp(numlist) specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed and random covariates. _t0 is used as the time of measurements. Values must be in {-5, -4, -3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4, 5}.
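The Python sketch below shows one way a list of powers passed to ffp() or rfp() could be turned into time covariates. It assumes the usual fractional polynomial convention that power 0 denotes log(t). fp_terms is a hypothetical helper, not stjm's code. The sketch rejects repeated powers and non-positive times. This help file does not say how stjm handles a measurement at _t0 = 0 under power 0 or negative powers, so no behaviour is assumed for that case.

```python
import math

# Powers accepted by ffp() and rfp(), as listed in the option descriptions
ALLOWED_POWERS = {-5, -4, -3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4, 5}


def fp_terms(t, powers):
    """Return one fractional polynomial term of measurement time t per power.

    Assumes the usual FP convention that power 0 means log(t). Repeated powers
    and non-positive times are rejected rather than guessed at.
    """
    if t <= 0:
        raise ValueError("this sketch requires t > 0")
    if len(set(powers)) != len(powers):
        raise ValueError("repeated powers are not handled in this sketch")
    terms = []
    for p in powers:
        if p not in ALLOWED_POWERS:
            raise ValueError(f"power {p} is not one of {sorted(ALLOWED_POWERS)}")
        terms.append(math.log(t) if p == 0 else t ** p)
    return terms


# ffp(0 1) evaluated at _t0 = 2 gives log(2) and 2
print(fp_terms(2.0, [0, 1]))
```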

frcs(#) specifies the degrees of freedom of the time variable, expanding time into restricted cubic splines as fixed covariates. _t0 is used as the time of measurements. The default knot locations are described under df().

rrcs(#) specifies the degrees of freedom of the time variable, expanding time into restricted cubic splines as random covariates. _t0 is used as the time of measurements. The default knot locations are described under df().

timeinteraction(varlist) specifies covariates to interact with the fixed time components specified in ffp() or frcs().

covariance(vartype) specifies the variance-covariance structure of the random effects.

covariance(independent) specifies a distinct variance for each random effect, with all covariances zero.

covariance(exchangeable) specifies equal variances for all random effects, and one common pairwise covariance.

covariance(identity) specifies equal variances for all random effects, with all covariances zero.

covariance(unstructured) specifies that all variances and covariances are distinctly estimated.

    +--------------------+
----+ Survival sub-model +-----------------------------------------------

survcov(varlist) specifies covariates to be included in the survival submodel.

df(#) specifies the degrees of freedom for the restricted cubic spline function used for the baseline function under a flexible parametric survival submodel. # must be between 1 and 10, but usually a value between 1 and 4 is sufficient, with 3 being the default. The knots() option is not applicable if the df() option is specified. The knots are placed at the following centiles of the distribution of the uncensored log survival times:

        -----------------------------------------------------------
         df   knots   Centile positions
        -----------------------------------------------------------
          1       0   (no knots)
          2       1   50
          3       2   33 67
          4       3   25 50 75
          5       4   20 40 60 80
          6       5   17 33 50 67 83
          7       6   14 29 43 57 71 86
          8       7   12.5 25 37.5 50 62.5 75 87.5
          9       8   11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9
         10       9   10 20 30 40 50 60 70 80 90
        -----------------------------------------------------------

Note that these are interior knots; there are also boundary knots placed at the minimum and maximum of the distribution of uncensored survival times.
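The centile positions in the table follow the pattern 100k/df for k = 1, ..., df-1, rounded in the table. The Python sketch below reproduces them. As an assumption, it also computes the corresponding knots on the log time scale from the uncensored times, using linear-interpolation quantiles (statistics.quantiles with method="inclusive"). Stata's centile definition may differ slightly, so treat those knot values as approximate. Both function names are hypothetical.

```python
import math
import statistics


def default_knot_centiles(df):
    """Interior-knot centile positions for a given df, as in the table above."""
    if not 1 <= df <= 10:
        raise ValueError("df must be between 1 and 10")
    return [100 * k / df for k in range(1, df)]


def default_log_knots(times, events, df):
    """Boundary and interior knots on the log time scale from uncensored times.

    Uses linear-interpolation quantiles, which may differ slightly from
    Stata's centile definition.
    """
    logt = sorted(math.log(t) for t, d in zip(times, events) if d == 1)
    if len(logt) < 2:
        raise ValueError("need at least two uncensored times")
    interior = statistics.quantiles(logt, n=df, method="inclusive") if df > 1 else []
    return [logt[0], *interior, logt[-1]]


print([round(c, 1) for c in default_knot_centiles(6)])  # [16.7, 33.3, 50.0, 66.7, 83.3]
```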

knots(numlist) specifies knot locations for the baseline distribution function under a flexible parametric survival submodel, in place of the default locations set by df(). The knot locations are given on the standard time scale, although the restricted cubic spline function always uses log time.

noorthog suppresses orthogonal transformation of spline variables under a flexible parametric survival submodel.
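The Python sketch below builds a restricted cubic spline basis in log time using the standard truncated-power construction. The first variable is log time itself, and each interior knot adds one variable that is constrained to be linear beyond the boundary knots. Knots are supplied on the standard time scale, as with knots(), and are logged before use. This is an illustrative assumption about how such a basis can be built, not stjm's or stpm2's source. In particular it performs no orthogonal transformation, which corresponds to what noorthog requests.

```python
import math


def _pos_cube(u):
    """Truncated cube: u**3 if u > 0, else 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_log_time_basis(t, knots_time):
    """Restricted cubic spline basis of log(t), not orthogonalised.

    knots_time holds all knots (boundary knots included) on the standard time
    scale. They are logged before use, since the spline is in log time.
    Returns len(knots_time) - 1 basis values, i.e. df variables.
    """
    k = [math.log(v) for v in sorted(knots_time)]
    kmin, kmax = k[0], k[-1]
    x = math.log(t)
    basis = [x]
    for kj in k[1:-1]:  # one extra variable per interior knot
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(_pos_cube(x - kj) - lam * _pos_cube(x - kmin)
                     - (1 - lam) * _pos_cube(x - kmax))
    return basis


# df = 3: boundary knots at times 1 and 8, interior knots at 2 and 4
print(rcs_log_time_basis(3.0, [1, 2, 4, 8]))
```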

    +-------------+
----+ Association +------------------------------------------------------

nocurrent specifies that the association between the survival and longitudinal submodels is not based on the current value. The default association is based on the current value of the longitudinal response.

derivassociation specifies that the association between the survival and longitudinal submodels is based on the first derivative of the longitudinal submodel.

intassociation specifies that the association between the survival and longitudinal submodels is based on the random intercept of the longitudinal submodel. By default this includes the fixed intercept coefficient.

association(numlist) specifies that the association between the survival and longitudinal submodels is based on a random coefficient of the time fractional polynomials specified in rfp(). By default this includes the fixed coefficient.

assoccovariates(varlist) specifies covariates to include in the linear predictor of the association parameter(s). Under the default current value association, this corresponds to interacting the longitudinal submodel with covariates.

nocoefficient do not include the fixed coefficient when using intassociation or association().
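To make the association options concrete, the Python sketch below evaluates, for one subject, the log of the factor that multiplies the baseline hazard. It assumes a longitudinal trajectory with an intercept and a linear slope, as in ffp(1) or rfp(1).

- The current value is the fitted trajectory m(t), excluding measurement error.
- The derivative association uses m'(t).
- The intercept association uses the random intercept, with or without its fixed coefficient (nocoefficient).

The names LinearTrajectory and log_relative_hazard are hypothetical. Applying assoccovariates() only to the current value association is a simplification made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class LinearTrajectory:
    """Longitudinal submodel m(t) = (beta0 + b0) + (beta1 + b1) * t for one subject."""
    beta0: float       # fixed intercept
    beta1: float       # fixed slope of time
    b0: float = 0.0    # random intercept (always included by stjm)
    b1: float = 0.0    # random slope; stays 0 under ffp(1), estimated under rfp(1)

    def value(self, t):
        """Current value m(t), excluding measurement error."""
        return (self.beta0 + self.b0) + (self.beta1 + self.b1) * t

    def slope(self, t):
        """First derivative m'(t); constant in t for a linear trajectory."""
        return self.beta1 + self.b1


def log_relative_hazard(traj, t, phi_x, alpha_cv=0.0, alpha_deriv=0.0,
                        alpha_int=0.0, coefficient=True, alpha_cv_z=0.0, z=0.0):
    """Log of the factor multiplying the baseline hazard h0(t) (hypothetical).

    phi_x:       linear predictor of the survcov() covariates
    alpha_cv:    current value association (set to 0 to mimic nocurrent)
    alpha_deriv: derivative association (derivassociation)
    alpha_int:   intercept association (intassociation); coefficient=False
                 drops the fixed intercept, as nocoefficient does
    alpha_cv_z:  change in alpha_cv per unit of covariate z (assoccovariates(),
                 applied here to the current value association only)
    """
    eta = phi_x
    eta += (alpha_cv + alpha_cv_z * z) * traj.value(t)
    eta += alpha_deriv * traj.slope(t)
    intercept = traj.b0 + (traj.beta0 if coefficient else 0.0)
    eta += alpha_int * intercept
    return eta


traj = LinearTrajectory(beta0=1.0, beta1=0.5, b0=0.2, b1=-0.1)
print(log_relative_hazard(traj, 2.0, phi_x=-0.3, alpha_cv=0.8))
```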

    +--------------+
----+ Maximisation +-----------------------------------------------------

gh(#) specifies the number of Gauss-Hermite quadrature nodes used to evaluate the integrals over the random effects. The defaults are 5 under adaptive and 15 under non-adaptive quadrature. The minimum number of quadrature nodes is 2.

gk(#) specifies the number of Gauss-Kronrod quadrature nodes used to evaluate the cumulative hazard under an exponential/Weibull/Gompertz survival submodel. Two choices are available, namely 7 or 15, with the default 15.

gl(#) specifies the number of Gauss-Legendre quadrature nodes used to evaluate the cumulative hazard under an exponential/Weibull/Gompertz survival submodel. This is an alternative to Gauss-Kronrod quadrature, where the user can specify any number of nodes >=5.
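The cumulative hazard H(t) is the integral of the hazard over follow-up. With a current value association, the hazard depends on the time-varying trajectory, so H(t) generally has to be computed numerically. The Python sketch below applies Gauss-Legendre quadrature, as requested by gl(), using numpy.polynomial.legendre.leggauss. NumPy provides no Gauss-Kronrod rule, so gk() is not illustrated. The exponential baseline, association and trajectory values are illustrative assumptions, chosen because H(t) then has a closed form to check against.

```python
import math

import numpy as np


def cumulative_hazard_gl(hazard, t0, t, nodes=15):
    """Approximate the integral of hazard(u) from t0 to t by Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(nodes)  # nodes and weights on [-1, 1]
    half = 0.5 * (t - t0)
    u = t0 + half * (x + 1.0)                      # map [-1, 1] onto [t0, t]
    return half * float(np.sum(w * hazard(u)))


# Illustrative values: exponential baseline rate lam, current value association
# alpha, and longitudinal trajectory m(u) = beta0 + beta1 * u
lam, alpha, beta0, beta1 = 0.1, 0.5, 1.0, 0.3


def hazard(u):
    return lam * np.exp(alpha * (beta0 + beta1 * u))


approx = cumulative_hazard_gl(hazard, 0.0, 4.0)
exact = lam * math.exp(alpha * beta0) * (math.exp(alpha * beta1 * 4.0) - 1.0) / (alpha * beta1)
print(approx, exact)
```

The lower limit t0 is kept as an argument because, with delayed entry, the integral starts at the entry time rather than at 0.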

adaptit(#) defines the number of iterations of adaptive Gauss-Hermite quadrature to use in the maximisation process, with the default 5. Adaptive quadrature is implemented at the beginning of each full Newton-Raphson iteration.

noshowadapt suppresses the display of the log-likelihood values under the sub-iterations used to assess convergence of the adaptive quadrature implemented at the beginning of each full Newton-Raphson iteration.

atol(#) specifies the tolerance for the log-likelihood under the adaptive quadrature sub-iterations; the default is 1.0E-05.

nonadapt specifies that non-adaptive Gauss-Hermite quadrature be used to evaluate the joint likelihood. This generally requires a much larger number of nodes, gh(), to ensure accurate estimates and standard errors, resulting in much longer computation time.
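The following Python sketch illustrates why adaptive quadrature needs far fewer nodes. It integrates a normal measurement likelihood over a normal random intercept. This toy case is an assumption chosen because the exact marginal likelihood is known. Non-adaptive Gauss-Hermite places its nodes around the prior mean of the random effect. The adaptive version re-centres and rescales them using the posterior mean and standard deviation of the random effect, which here are available in closed form. In stjm they are instead updated during the adaptive sub-iterations controlled by adaptit(). The sketch uses numpy.polynomial.hermite.hermgauss and is not stjm's Mata code.

```python
import math

import numpy as np


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (math.sqrt(2.0 * math.pi) * sd)


def gh_nonadaptive(y, s, sigma, nodes):
    """Integral of N(y; b, s^2) * N(b; 0, sigma^2) db, nodes centred on the prior."""
    x, w = np.polynomial.hermite.hermgauss(nodes)  # weight function exp(-x^2)
    b = math.sqrt(2.0) * sigma * x
    return float(np.sum(w * normal_pdf(y, b, s)) / math.sqrt(math.pi))


def gh_adaptive(y, s, sigma, nodes):
    """Same integral, nodes centred and scaled by the posterior of b."""
    tau = 1.0 / math.sqrt(1.0 / sigma**2 + 1.0 / s**2)  # posterior sd of b
    mu = tau**2 * y / s**2                              # posterior mean of b
    x, w = np.polynomial.hermite.hermgauss(nodes)
    b = mu + math.sqrt(2.0) * tau * x
    integrand = normal_pdf(y, b, s) * normal_pdf(b, 0.0, sigma)
    return float(np.sum(math.sqrt(2.0) * tau * w * np.exp(x**2) * integrand))


y, s, sigma = 3.0, 0.2, 1.0  # an observation far from the prior mean of b
exact = float(normal_pdf(y, 0.0, math.sqrt(sigma**2 + s**2)))
for n in (5, 15):
    print(n, gh_nonadaptive(y, s, sigma, n), gh_adaptive(y, s, sigma, n), exact)
```

In this conjugate case the adaptive rule is exact for any number of nodes. With 5 nodes, the non-adaptive rule is far off because almost none of its nodes fall where the integrand has mass.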

fulldata forces stjm to use all rows of data in the survival component of the likelihood. By default, stjm assesses whether all covariates specified in survcov() are constant within panels. If they are, it needs to use only the first row of _t0 and the final row of _t in the maximisation process, which provides considerable speed advantages.
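The Python sketch below implements the default check described above. It assumes rows laid out as (id, _t0, _t, _d, covariates...) and that survcov_cols gives the positions of the survcov() covariates. Both this layout and the function name survival_rows are hypothetical. When every survcov() covariate is constant within each panel, each panel collapses to a single record running from its first _t0 to its final _t, keeping the final _d. Otherwise all rows are kept, as with fulldata.

```python
def survival_rows(rows, survcov_cols):
    """Rows used in the survival likelihood, following the rule described above."""
    by_id = {}
    for r in rows:
        by_id.setdefault(r[0], []).append(r)
    constant = all(
        len({r[c] for r in recs}) == 1
        for recs in by_id.values()
        for c in survcov_cols
    )
    if not constant:
        return list(rows)  # same behaviour as the fulldata option
    collapsed = []
    for pid, recs in by_id.items():
        recs = sorted(recs, key=lambda r: r[1])  # order by _t0
        first, last = recs[0], recs[-1]
        # first _t0, final _t and _d; covariates are constant, so any row will do
        collapsed.append((pid, first[1], last[2], last[3], *first[4:]))
    return collapsed


# (id, _t0, _t, _d, trt): trt is constant within each panel
rows = [
    (1, 0.0, 0.2, 0, 1), (1, 0.2, 0.7, 0, 1),
    (2, 0.0, 0.5, 0, 0), (2, 0.5, 1.2, 0, 0), (2, 1.2, 2.6, 1, 0),
]
print(survival_rows(rows, survcov_cols=[4]))  # [(1, 0.0, 0.7, 0, 1), (2, 0.0, 2.6, 1, 0)]
```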

nullassoc sets the initial values for the association parameters to zero. In rare situations the default initial values may cause stjm to report that the initial values are not feasible; using this option avoids this, although convergence generally takes longer.

noxtem suppresses the use of the emonly option in the [XT] xtmixed call used to obtain initial values. By default, emonly is used, which is often quicker and provides adequate starting values for the longitudinal component.

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, shownrtolerance, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#), nonrtolerance and from(init_specs); see [R] maximize. These options are seldom used, but the difficult option may be useful if there are convergence problems.

    +-----------+
----+ Reporting +--------------------------------------------------------

showinitial displays the output from the [XT] xtmixed and [ST] streg, stpm2 or stmix models fitted to obtain initial values for stjm.

variance displays the random-effects parameter estimates as variances and covariances.

showcons displays the constraints used by stpm2 and stjm for the derivatives of the spline function. This option is only valid under a flexible parametric survival submodel.

keepcons prevents the constraints that stjm imposes on the derivatives of the spline function when fitting delayed entry models from being dropped. By default, the constraints are dropped. This option is only valid under a flexible parametric survival submodel.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

Remarks

1. A random intercept is always assumed in each stjm call.

2. Measurement time must be exclusively controlled through the options ffp(), rfp(), frcs(), rrcs() and timeinteraction(), and not included as fixed covariates in either submodel through [varlist] or survcov(varlist).

3. As with all survival models with multiple-record st data, time-varying covariates can be included in each submodel.

4. Estimation is performed by maximum likelihood. Optimisation uses the default technique nr, meaning Stata's version of Newton-Raphson iterations.

5. If convergence issues arise, try specifying the nullassoc option and/or increasing the number of Gauss-Hermite nodes, gh(). Users may also wish to vary the number of adaptive quadrature iterations using adaptit().

6. As with all models which use numerical integration, the number of quadrature nodes should be increased to establish the stability of the estimates.

7. Note that under a flexible parametric survival sub-model, if more than one random effect is specified then survival sub-model coefficients are interpreted as proportional cumulative hazard ratios. In this case the equivalency between proportional cumulative hazard ratios and proportional hazards ratios does not hold.

    +---------------------------+
----+ Intermittent missing data +----------------------------------------

If intermittent missing data are present in the longitudinal response or any covariates, for example:

        ---------------------------------------
          id    _t0     _t    _d    long_resp
        ---------------------------------------
           1      0    0.2     0         0.93
           1    0.2    0.7     0         1.32
           2      0    0.5     0         1.15
           2    0.5    1.2     0            .
           2    1.2    1.6     0            .
           2    1.6    1.9     0         2.65
           2    1.9    2.6     1         3.15
           3      0      2     0         0.25
           3      2    2.3     0         0.21
           3    2.3    2.4     1         0.31
        ---------------------------------------

then care must be taken to ensure that the appropriate rows of data are included in the survival component of the joint likelihood. By default, stjm assesses whether all covariates included in survcov() are constant within panels. If they are, it needs to use only the first row of _t0 and the final row of _t for the survival likelihood component. However, if they are not constant, or the fulldata option is used, then all rows are included, as with multiple-record st data.

For example, if we were to use the fulldata option when analysing the data in the table above, then patient 2's survival contribution would be missing between _t0 = 0.5 and _t = 1.6. The correct contribution is obtained by using only the first row of _t0 and the final row of _t.

stjm displays a warning when this situation is detected. The simplest way of avoiding this is to remove any missing data before using stset.
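The short Python sketch below reproduces the problem for patient 2 in the table above. If rows with a missing response are excluded and the remaining rows are used as they stand (as under fulldata), the interval from _t0 = 0.5 to _t = 1.6 drops out of the survival likelihood. The collapsed record from the first _t0 to the final _t covers the whole follow-up. find_gaps is a hypothetical helper.

```python
# Patient 2 from the table above: (_t0, _t, _d, long_resp); None marks "."
patient2 = [
    (0.0, 0.5, 0, 1.15),
    (0.5, 1.2, 0, None),
    (1.2, 1.6, 0, None),
    (1.6, 1.9, 0, 2.65),
    (1.9, 2.6, 1, 3.15),
]


def find_gaps(intervals):
    """Return the (start, end) spans not covered by consecutive (t0, t) intervals."""
    gaps = []
    for (_, prev_t), (t0, _) in zip(intervals, intervals[1:]):
        if t0 > prev_t:
            gaps.append((prev_t, t0))
    return gaps


complete = [(t0, t) for t0, t, _, y in patient2 if y is not None]
print(find_gaps(complete))                # [(0.5, 1.6)]
print((patient2[0][0], patient2[-1][1]))  # collapsed record: (0.0, 2.6)
```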

Example 1: Simulated dataset

Load simulated example dataset:
        . use http://fmwww.bc.edu/repec/bocode/s/stjm_example

stset the data:
        . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot:
        . stjmgraph long_response, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a flexible parametric survival submodel with 3 degrees of freedom, and association based on the current value. No covariates in either submodel.
        . stjm long_response, panel(id) survmodel(fpm) df(3) ffp(1)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, adjusting for treatment in the survival submodel and the interaction between treatment and measurement time in the longitudinal submodel. Current value association.
        . stjm long_response trt, panel(id) survmodel(weibull) ffp(1) survcov(trt) timeinterac(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and adjusting for treatment in both submodels. Risk of event dependent on the current value and the first derivative of the longitudinal submodel.
        . stjm long_response trt, panel(id) survmodel(weibull) rfp(1) survcov(trt) derivassoc

Example 2: Primary Biliary Cirrhosis dataset

This example dataset contains 1945 repeated measurements of serum bilirubin from 312 patients with Primary Biliary Cirrhosis (PBC). Patients received either D-penicillamine or placebo. In all analyses we use the log of serum bilirubin.

Load PBC dataset:
        . use http://fmwww.bc.edu/repec/bocode/s/stjm_pbc_example_data

stset the data:
        . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot:
        . stjmgraph logb, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel.
        . stjm logb, panel(id) survmodel(w) ffp(1) timeinterac(trt) survcov(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. We also allow the association parameter to vary by treatment, that is, an interaction between the current value of log serum bilirubin and treatment.
        . stjm logb, panel(id) survmodel(w) rfp(1) timeinterac(trt) survcov(trt) assoccov(trt)

Example 3: Liver cirrhosis with repeated measures of prothrombin index

This example dataset contains 2968 repeated measurements of prothrombin index from 488 patients with liver cirrhosis. Patients received either prednisone or placebo.

Load dataset:
        . use http://fmwww.bc.edu/repec/bocode/s/stjm_prothro

stset the data:
        . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot:
        . stjmgraph pro, panel(id) lowess

Joint model with a random intercept and random slope in the longitudinal submodel, a mixture Weibull-exponential survival submodel, and association based on the current value. We adjust for treatment in the survival submodel.
        . stjm pro, panel(id) survmodel(weibexp) rfp(1) survcov(trt)

Joint model with a random intercept and fixed splines of time with 2 degrees of freedom in the longitudinal submodel, a mixture Weibull-Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed splines of time in the longitudinal submodel, and treatment in the survival submodel.
        . stjm pro, panel(id) survmodel(weibweib) frcs(2) timeinterac(trt) survcov(trt)

Author

Michael J. Crowther
Department of Health Sciences
University of Leicester
E-mail: michael.crowther@le.ac.uk

Part of this work was conducted when MJC was on an internship at StataCorp. In particular, he would like to thank Yulia Marchenko, Jeff Pitblado, Alan Riley and Vince Wiggins.

Please report any errors you may find.

References

Crowther MJ, Abrams KR and Lambert PC. Flexible parametric joint modelling of longitudinal and survival data. Statistics in Medicine 2012; (In Press).

Crowther MJ, Abrams KR and Lambert PC. Joint modelling of longitudinal and survival data in Stata. The Stata Journal 2012; (In Press).

Lambert PC and Royston P. Further development of flexible parametric models for survival analysis. The Stata Journal 2009;9:265-290.

Rabe-Hesketh S, Skrondal A and Pickles A. Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal 2002;2:1-21.

Wulfsohn MS and Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics 1997;53:330-339.