-------------------------------------------------------------------------------
help stjm11                          also see: stjm11 postestimation, stjmgraph
-------------------------------------------------------------------------------

Title

stjm11 -- Joint modelling of longitudinal and survival data

Syntax

stjm11 longdepvar [varlist] [if] [in] , panel(varname) survmodel(survsubmodel) [options]

options Description ------------------------------------------------------------------------- panel(varname) panel identification variable survmodel(survsubmodel) survival submodel

Longitudinal sub-model ffracpoly(numlist) fixed powers of time rfracpoly(numlist) fixed and random powers of time timeinteraction(varlist) covariates to interact with fixed time variables covariance(vartype) variance-covariance structure of the random effects

Survival sub-model survcov(varlist) fixed baseline covariates to be included in the survival submodel df(#) degrees of freedom for baseline hazard function knots(numlist) knot locations for baseline hazard function noorthog do not use orthogonal transformation of spline variables

Association nocurrent association not based on the current value of longitudinal response derivassociation association based on the first derivative (slope) of the longitudinal submodel intassociation allow association to be based on the random intercept association(numlist) allow association to be based on the random coefficient of a time variable assoccovariates(varlist) adjust the association parameter(s) by covariates

Maximisation gh(#) number of Gauss-Hermite quadrature points gk(#) number of Gauss-Kronrod quadrature points adaptit(#) the number of adaptive Gauss-Hermite quadrature iterations; default is 3 nonadapt use non-adaptive Gauss-Hermite quadrature fulldata use all data in survival component maximisation, see details nullassoc sets the initial values for association parameters to be zero maximize_options control the maximization process; seldom used

Reporting showinitial display output from initial value model fits variance show random-effects parameter estimates as variances-covariances showcons list constraints in output keepcons do not drop constraints used in ml routine level(#) set confidence level; default is level(95)

survsubmodel Description ------------------------------------------------------------------------- exponential exponential survival submodel weibull Weibull survival submodel gompertz Gompertz survival submodel fpm flexible parametric survival submodel -------------------------------------------------------------------------

vartype Description ------------------------------------------------------------------------- independent one variance parameter per random effect, all covariances zero exchangeable equal variances for random effects, and one common pairwise covariance identity equal variances for random effects, all covariances zero; the default for factor variables unstructured all variances and covariances distinctly estimated; the default -------------------------------------------------------------------------

longdepvar specifies the longitudinal continuous response variable. You must stset your data before using stjm11; see [ST] stset. See stjm11 postestimation for features available after estimation.

Description

stjm11 fits shared parameter joint models for longitudinal and survival data using maximum likelihood. A single continuous longitudinal response and a single survival outcome are allowed. A linear mixed effects model is used for the longitudinal submodel, which lets time be modelled using fixed and/or random fractional polynomials. Four choices are currently available for the survival submodel; the first being the flexible parametric survival model (see stpm2), modelled on the log cumulative hazard scale. The remaining choices include the exponential, Weibull and Gompertz proportional hazard models. The association between the two processes can be induced via the default current value parameterisation, the first derivative of the longitudinal submodel, and/or a random coefficient such as the intercept. Adaptive Gauss-Hermite quadrature, coded in Mata, is used to evaluate the joint likelihood. Under an exponential/Weibull/Gompertz survival submodel, Gauss-Kronrod quadrature is used to evaluate the cumulative hazard. The dataset must be stset correctly into enter and exit times, using the enter option; see [ST] stset. stjm11 uses _t0 to denote measurement times. For example, below we have 3 patients with 2, 5 and 3 measurements each, respectively.

--------------------------------- id _t0 _t _d long_resp --------------------------------- 1 0 0.2 0 0.93 1 0.2 0.7 0 1.32 2 0 0.5 0 1.15 2 0.5 1.2 0 1.67 2 1.2 1.6 0 1.92 2 1.6 1.9 0 2.65 2 1.9 2.6 1 3.15 3 0 2 0 0.25 3 2 2.3 0 0.21 3 2.3 2.4 1 0.31 ---------------------------------

Delayed entry joint models can be fitted, allowing age to be used as the timescale. Delayed entry models may also be executed when the timescale is adjusted to allow fractional polynomials of time when _t0 = 0.

Options

panel(varname) defines the panel identification variable. Each panel should be identified by a unique integer.

survmodel(survsubmodel) specifies the survival submodel to be fit.

survmodel(fpm) fits a flexible parametric survival submodel. This is a highly flexible fully parametric alternative to the Cox model, modelled on the log cumulative hazard scale using restricted cubic splines. For more details see stpm2.

survmodel(exponential) fits an exponential survival submodel. This is modelled on the hazard scale.

survmodel(weibull) fits a Weibull survival submodel. This is modelled on the hazard scale.

survmodel(gompertz) fits a Gompertz survival submodel. This is modelled on the hazard scale.

+------------------------+ ----+ Longitudinal sub-model +-------------------------------------------

ffracpoly(numlist) specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed covariates. _t0 is used as the time of measurements.

rfracpoly(numlist) specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed and random covariates. _t0 is used as the time of measurements.

timeinteraction(varlist) covariates to interact with the fixed fractional polynomials of measurement time.

covariance(vartype) specifies the variance-covariance structure of the random effects.

covariance(independent) specifies a distinct variance for each random effect, with all covariances zero.

covariance(exchangeable) specifies equal variances for all random effects, and one common pairwise covariance.

covariance(identity) specifies equal variances for all random effects, with all covariances zero.

covariance(unstructured) specifies that all variances and covariances are distinctly estimated.

+--------------------+ ----+ Survival sub-model +-----------------------------------------------

survcov(varlist) specifies covariates to be included in the survival submodel.

df(#) specifies the degrees of freedom for the restricted cubic spline function used for the baseline function under a flexible parametric survival submodel. # must be between 1 and 10, but usually a value between 1 and 4 is sufficient, with 3 being the default. The knots() option is not applicable if the df() option is specified. The knots are placed at the following centiles of the distribution of the uncensored log survival times:

------------------------------------------------------------ df knots Centile positions ------------------------------------------------------------ 1 0 (no knots) 2 1 50 3 2 33 67 4 3 25 50 75 5 4 20 40 60 80 6 5 17 33 50 67 83 7 6 14 29 43 57 71 86 8 7 12.5 25 37.5 50 62.5 75 87.5 9 8 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 10 9 10 20 30 40 50 60 70 80 90 ------------------------------------------------------------ Note that these are interior knots and there are also boundary knots placed at the minimum and maximum of the distribution of uncensored survival times.

knots(numlist) specifies knot locations for the baseline distribution function under a flexible parametric survival submodel, as opposed to the default locations set by df(). Note that the locations of the knots are placed on the standard time scale. However, the scale used by the restricted cubic spline function is always log time. Default knot positions are determined by the df() option.

noorthog suppresses orthogonal transformation of spline variables under a flexible parametric survival submodel.

+-------------+ ----+ Association +------------------------------------------------------

nocurrent specifies that the association between the survival and longitudinal submodels is not based on the current value. The default association is based on the current value of the longitudinal response.

derivassociation specifies that the association between the survival and longitudinal submodels is based on the first derivative of the longitudinal submodel.

intassociation specifies that the association between the survival and longitudinal submodels is based on the random intercept of the longitudinal submodel.

association(numlist) specifies that the association between the survival and longitudinal submodels is based on a random coefficient of time fractional polynomials specified in rfracpoly.

assoccovariates(varlist) covariates to include in the linear predictor of the association parameter(s). Under the default current value association, this corresponds to interacting the longitudinal submodel with covariates.

+--------------+ ----+ Maximisation +-----------------------------------------------------

gh(#) specifies the number of Gauss-Hermite quadrature nodes used to evaluate the integrals over the random effects. The defaults are 5 and 15 under adaptive and non-adaptive, respectively. Minimum number of quadrature points is 2.

gk(#) specifies the number of Gauss-Kronrod quadrature nodes used to evaluate the cumulative hazard under an exponential/Weibull/Gompertz survival submodel. Two choices are available, namely 7 or 15, with the default 15.

adaptit(#) defines the number of iterations of adaptive Gauss-Hermite quadrature to use in the maximisation process, with the default 3. Adaptive quadrature is implemented at the beginning of each full Newton-Raphson iteration.

nonadapt use non-adaptive Gauss-Hermite quadrature to evaluate the joint likelihood. This will generally require a much higher number of nodes, gh, to ensure accurate estimates and standard errors, resulting in much greater computation time.

fulldata forces stjm11 to use all rows of data in the survival component of the likelihood. By default, stjm11 assesses whether all covariates specified in survcov() are constant within panels, and if they are, only needs to use the first row of _t0 and the final row of _t in the maximisation process providing considerable speed advantages.

nullassoc sets the initial value for association parameters to be zero. Use of the default initial values may in rare situations cause stjm11 to display initial values not feasible; using this option solves this, however, convergence time is generally longer.

maximize_options; difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, shownrtolerance, tolerance(#), ltolerance(#) gtolerance(#), nrtolerance(#), nonrtolerance, from(init_specs); see [R] maximize. These options are seldom used, but the difficult option may be useful if there are convergence problems.

+-----------+ ----+ Reporting +--------------------------------------------------------

showinitial displays the output from the xtmixed and stpm2 or streg models fitted to obtain initial values for stjm11.

variance show random-effects parameter estimates as variances-covariances

showcons displays the constraints used by stpm2 and stjm11 for the derivatives of the spline function. This option is only valid under a flexible parametric survival submodel.

keepcons prevents the constraints imposed by stjm11 on the derivatives of the spline function when fitting delayed entry models being dropped. By default, the constraints are dropped. This option is only valid under a flexible parametric survival submodel.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

Remarks

1. A random intercept is always assumed in each stjm11 call.

2. Measurement time must be exclusively controlled through the options ffracpoly, rfracpoly and timeinteraction, and not included as fixed covariates in either submodel through [varlist] or survcov(varlist).

3. As with all survival models with multiple-record st data, time-varying covariates can be included in each submodel.

4. Estimation is performed by maximum likelihood. Optimisation uses the default technique nr, meaning Stata's version of Newton-Raphson iterations.

5. If convergence issues arise, try specifying the nullassoc option and/or increasing the number of Gauss-Hermite nodes, gh. Users may also wish to vary the number of adaptive quadrature iterations.

6. As with all models which use numerical integration, the number of quadrature nodes should be increased to establish the stability of the estimates.

7. Note that under a flexible parametric survival sub-model, if more than one random effect is specified then survival sub-model coefficients are interpreted as proportional cumulative hazard ratios. In this case the equivalency between proportional cumulative hazard ratios and proportional hazards ratios does not hold.

+---------------------------+ ----+ Intermittant missing data +----------------------------------------

If intermittent missing data is present in any covariates, for example:

--------------------------------- id _t0 _t _d long_resp --------------------------------- 1 0 0.2 0 0.93 1 0.2 0.7 0 1.32 2 0 0.5 0 1.15 2 0.5 1.2 0 . 2 1.2 1.6 0 . 2 1.6 1.9 0 2.65 2 1.9 2.6 1 3.15 3 0 2 0 0.25 3 2 2.3 0 0.21 3 2.3 2.4 1 0.31 ---------------------------------

then care must be taken to ensure the appropriate rows of data are included in the survival component of the joint likelihood. By default, stjm assesses whether all covariates included in survcov() are constant within panels, and if they are, only has to use the first row of _t0 and the final row of _t for the survival likelihood component. However, if they are not, or the fulldata option is used, then all rows are included, as with multiple-record st data.

For example, if we were to use the fulldata option when analysing the data in the above Table, then patient 2's survival contribution would be missing between times _t0 = 0.5 and _t = 1.6. However, the correct contribution would be made by using only the first row of _t0 and final row of _t.

stjm displays a warning when this situation is detected. The simplest way of avoiding this is to remove any missing data before using stset.

Example 1: Simulated dataset

Load simulated example dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_example

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph long_response, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a flexible parametric survival submodel with 3 degrees of freedom, and association based on the current value. No covariates in either submodel. . stjm11 long_response, panel(id) survmodel(fpm) df(3) ffracpoly(1)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, adjusting for treatment in the survival submodel and the interaction between treatment and measurement time in the longitudinal submodel. Current value association. . stjm11 long_response trt, panel(id) survmodel(weibull) ffracpoly(1) survcov(trt) timeinterac(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and adjusting for treatment in both submodels. Risk of event dependent on the current value and the first derivative of the longitudinal submodel. . stjm11 long_response trt, panel(id) survmodel(weibull) rfracpoly(1) survcov(trt) derivassoc

Example 2: Primary Biliary Cirrhosis dataset

This example dataset contains 1945 repeated measurements of serum bilirubin, from 312 patients with Primary Biliary Cirrhosis (PBC). Patients received treatment of D-penicillamine or placebo. In all analyses we use the log of serum bilirubin.

Load PBC dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_pbc_example_data

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph logb, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. . stjm11 logb, panel(id) survmodel(w) ffracpoly(1) timeinterac(trt) survcov(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. We also adjust for the interaction between log serum bilirubin and treatment. . stjm11 logb, panel(id) survmodel(w) rfracpoly(1) timeinterac(trt) survcov(trt) assoccov(trt)

Author

Michael J. Crowther Department of Health Sciences University of Leicester E-mail: michael.crowther@le.ac.uk

Please report any errors you may find.

References

Crowther, M. J. & Abrams, K. R. & Lambert, P. C. (2012). Flexible parametric joint modelling of longitudinal and survival data. (Under review).

Crowther, M. J. & Abrams, K. R. & Lambert, P. C. (2012). Joint modelling of longitudinal and survival data. (Under review).

Lambert, P. C. & Royston, P. Further development of flexible parametric models for survival analysis. The Stata Journal, 2009, 9, 265-290.

Royston, P. & Altman, D. G. Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling. JRSS Series C (Applied Statistics), 1994, 43, 429-467.

Wulfsohn, M. S. & Tsiatis, A. A. A joint model for survival and longitudinal data measured with error. Biometrics, 1997, 53, 330-339.

Also see

Online: stjm11 postestimation, stjmgraph, xtmixed, streg, stpm2 (if installed)