Title

stjm-- Joint modelling of longitudinal and survival data

stjmlongdepvar[varlist] [if] [in],panel(varname)survmodel(survsubmodel)[options]

optionsDescription -------------------------------------------------------------------------panel(varname)panel identification variablesurvmodel(survsubmodel)survival submodelLongitudinal sub-model

ffp(numlist)fixed powers of timerfp(numlist)fixed and random powers of timefrcs(#)degrees of freedom for fixed splines of timerrcs(#)degrees of freedom for random splines of timetimeinteraction(varlist)covariates to interact with fixed time variablescovariance(vartype)variance-covariance structure of the random effectsSurvival sub-model

survcov(varlist)fixed baseline covariates to be included in the survival submodeldf(#)degrees of freedom for baseline hazard functionknots(numlist)knot locations for baseline hazard functionnoorthogdo not use orthogonal transformation of spline variablesAssociation

nocurrentassociation not based on the current value of longitudinal responsederivassociationassociation based on the first derivative (slope) of the longitudinal submodelintassociationallow association to be based on the random interceptassociation(numlist)allow association to be based on the random coefficient of a time variableassoccovariates(varlist)adjust the association parameter(s) by covariatesnocoefficientdo not include fixed coefficient when usingintassociationorassociation()Maximisation

gh(#)number of Gauss-Hermite quadrature pointsgk(#)number of Gauss-Kronrod quadrature pointsgl(#)number of Gauss-Legendre quadrature pointsadaptit(#)the number of adaptive Gauss-Hermite quadrature iterations; default is 5noshowadaptsuppress display of adaptive quadrature sub-iterationsatol(#)tolerance for the log-likelihood under the adaptive quadrature sub-iterations, default is 1.0E-05nonadaptuse non-adaptive Gauss-Hermite quadraturefulldatause all data in survival component maximisation, see detailsnullassocsets the initial values for association parameters to be zeronoxtemsuppresses the defaultemonlyoption used in the[XT] xtmixedcall for initial valuesmaximize_optionscontrol the maximization process; seldom usedReporting

showinitialdisplay output from initial value model fitsvarianceshow random-effects parameter estimates as variances-covariancesshowconslist constraints in outputkeepconsdo not drop constraints used in ml routinelevel(#)set confidence level; default islevel(95)

survsubmodelDescription -------------------------------------------------------------------------exponentialexponential survival submodelweibullWeibull survival submodelgompertzGompertz survival submodelfpmflexible parametric survival submodelweibweibMixture Weibull-Weibull survival submodelweibexpMixture Weibull-exponential survival submodel -------------------------------------------------------------------------

vartypeDescription -------------------------------------------------------------------------independentone variance parameter per random effect, all covariances zeroexchangeableequal variances for random effects, and one common pairwise covarianceidentityequal variances for random effects, all covariances zero; the default for factor variablesunstructuredall variances and covariances distinctly estimated; the default -------------------------------------------------------------------------

longdepvarspecifies the longitudinal continuous response variable. You muststsetyour data before usingstjm; see[ST] stset. Seestjm postestimationfor features available after estimation.

stjmfits shared parameter joint models for longitudinal and survival data using maximum likelihood. A single continuous longitudinal response and a single survival outcome are allowed. A linear mixed effects model is used for the longitudinal submodel, which lets time be modelled using fixed and/or random fractional polynomials or resticted cubic splines. Six choices are currently available for the survival submodel, including the exponential, Weibull, Gompertz, 2-component mixture Weibull-Weibull and 2-component mixture Weibull-exponential proportional hazards models. Furthermore, the flexible parametric survival model (see stpm2), modelled on the log cumulative hazard scale is also available. The association between the two processes can be induced via the default current value parameterisation, the first derivative of the longitudinal submodel, and/or a random coefficient such as the intercept. Adaptive or non-adaptive Gauss-Hermite quadrature, coded in Mata, can be used to evaluate the joint likelihood. Under all survival submodels except the flexible parametric model, Gauss-Kronrod quadrature is used to evaluate the cumulative hazard. The dataset must bestsetcorrectly into enter and exit times, using the enter option; see[ST] stset.stjmuses_t0to denote measurement times. For example, below we have 3 patients with 2, 5 and 3 measurements each, respectively.--------------------------------- id _t0 _t _d long_resp --------------------------------- 1 0 0.2 0 0.93 1 0.2 0.7 0 1.32 2 0 0.5 0 1.15 2 0.5 1.2 0 1.67 2 1.2 1.6 0 1.92 2 1.6 1.9 0 2.65 2 1.9 2.6 1 3.15 3 0 2 0 0.25 3 2 2.3 0 0.21 3 2.3 2.4 1 0.31 ---------------------------------

Delayed entry joint models can be fitted, allowing age to be used as the timescale.

panel(varname)defines the panel identification variable. Each panel should be identified by a unique integer.

survmodel(survsubmodel)specifies the survival submodel to be fit.

survmodel(fpm)fits a flexible parametric survival submodel. This is a highly flexible fully parametric alternative to the Cox model, modelled on the log cumulative hazard scale using restricted cubic splines. For more details see stpm2.

survmodel(exponential)fits an exponential survival submodel.

survmodel(weibull)fits a Weibull survival submodel.

survmodel(gompertz)fits a Gompertz survival submodel.

survmodel(weibweib)fits a 2-component mixture Weibull-Weibull survival submodel.

survmodel(weibexp)fits a 2-component mixture Weibull-exponential survival submodel.+------------------------+ ----+ Longitudinal sub-model +-------------------------------------------

ffp(numlist)specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed covariates. _t0 is used as the time of measurements. Values must be in {-5, -4, -3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4, 5}.

rfp(numlist)specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed and random covariates. _t0 is used as the time of measurements. Values must be in {-5, -4, -3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4, 5}.

frcs(#)specifies the degrees of freedom of the time variable, expanding time into restricted cubic splines as fixed covariates. _t0 is used as the time of measurements. The default knot locations are described underdf().

rrcs(#)specifies the degrees of freedom of the time variable, expanding time into restricted cubic splines as random covariates. _t0 is used as the time of measurements. The default knot locations are described underdf().

timeinteraction(varlist)covariates to interact with the fixed time components specified inffp/frcs.

covariance(vartype)specifies the variance-covariance structure of the random effects.

covariance(independent)specifies a distinct variance for each random effect, with all covariances zero.

covariance(exchangeable)specifies equal variances for all random effects, and one common pairwise covariance.

covariance(identity)specifies equal variances for all random effects, with all covariances zero.

covariance(unstructured)specifies that all variances and covariances are distinctly estimated.+--------------------+ ----+ Survival sub-model +-----------------------------------------------

survcov(varlist)specifies covariates to be included in the survival submodel.

df(#)specifies the degrees of freedom for the restricted cubic spline function used for the baseline function under a flexible parametric survival submodel.#must be between 1 and 10, but usually a value between 1 and 4 is sufficient, with 3 being the default. Theknots()option is not applicable if thedf()option is specified. The knots are placed at the following centiles of the distribution of the uncensored log survival times:------------------------------------------------------------ df knots Centile positions ------------------------------------------------------------ 1 0 (no knots) 2 1 50 3 2 33 67 4 3 25 50 75 5 4 20 40 60 80 6 5 17 33 50 67 83 7 6 14 29 43 57 71 86 8 7 12.5 25 37.5 50 62.5 75 87.5 9 8 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 10 9 10 20 30 40 50 60 70 80 90 ------------------------------------------------------------ Note that these are

interior knotsand there are also boundary knots placed at the minimum and maximum of the distribution of uncensored survival times.

knots(numlist)specifies knot locations for the baseline distribution function under a flexible parametric survival submodel, as opposed to the default locations set by df(). Note that the locations of the knots are placed on the standard time scale. However, the scale used by the restricted cubic spline function is always log time. Default knot positions are determined by the df() option.

noorthogsuppresses orthogonal transformation of spline variables under a flexible parametric survival submodel.+-------------+ ----+ Association +------------------------------------------------------

nocurrentspecifies that the association between the survival and longitudinal submodels is not based on the current value. The default association is based on the current value of the longitudinal response.

derivassociationspecifies that the association between the survival and longitudinal submodels is based on the first derivative of the longitudinal submodel.

intassociationspecifies that the association between the survival and longitudinal submodels is based on the random intercept of the longitudinal submodel. By default this includes the fixed intercept coefficient.

association(numlist)specifies that the association between the survival and longitudinal submodels is based on a random coefficient of time fractional polynomials specified inrfracpoly. By default this includes the fixed coefficient.

assoccovariates(varlist)covariates to include in the linear predictor of the association parameter(s). Under the default current value association, this corresponds to interacting the longitudinal submodel with covariates.

nocoefficientdo not include the fixed coefficient when usingintassociationorassociation().+--------------+ ----+ Maximisation +-----------------------------------------------------

gh(#)specifies the number of Gauss-Hermite quadrature nodes used to evaluate the integrals over the random effects. The defaults are 5 and 15 under adaptive and non-adaptive, respectively. Minimum number of quadrature points is 2.

gk(#)specifies the number of Gauss-Kronrod quadrature nodes used to evaluate the cumulative hazard under an exponential/Weibull/Gompertz survival submodel. Two choices are available, namely 7 or 15, with the default 15.

gl(#)specifies the number of Gauss-Legendre quadrature nodes used to evaluate the cumulative hazard under an exponential/Weibull/Gompertz survival submodel. This is an alternative to Gauss-Kronrod quadrature, where the user can specify any number of nodes >=5.

adaptit(#)defines the number of iterations of adaptive Gauss-Hermite quadrature to use in the maximisation process, with the default 5. Adaptive quadrature is implemented at the beginning of each full Newton-Raphson iteration.

noshowadaptsuppresses the display of the log-likelihood values under the sub-iterations used to assess convergence of the adaptive quadrature implemented at the beginning of each full Newton-Raphson iteration.

atol(#)tolerance for the log-likelihood under adaptive quadrature sub-iterations, default is 1.0E-05

nonadaptuse non-adaptive Gauss-Hermite quadrature to evaluate the joint likelihood. This will generally require a much higher number of nodes,gh, to ensure accurate estimates and standard errors, resulting in much greater computation time.

fulldataforcesstjmto use all rows of data in the survival component of the likelihood. By default,stjmassesses whether all covariates specified insurvcov()are constant within panels, and if they are, only needs to use the first row of_t0and the final row of_tin the maximisation process providing considerable speed advantages.

nullassocsets the initial value for association parameters to be zero. Use of the default initial values may in rare situations causestjmto displayinitial values not feasible; using this option solves this, however, convergence time is generally longer.

noxtemsuppresses the use of theemonlyoption in the[XT] xtmixedcall uses to obtain initial values. By default,emonlyis used which is often quicker and provides adequate starting values for the longitudinal component.

maximize_options;difficult,technique(algorithm_spec),iterate(#), [no]log,trace,gradient,showstep,hessian,shownrtolerance,tolerance(#),ltolerance(#)gtolerance(#),nrtolerance(#),nonrtolerance,from(init_specs); see[R] maximize. These options are seldom used, but thedifficultoption may be useful if there are convergence problems.+-----------+ ----+ Reporting +--------------------------------------------------------

showinitialdisplays the output from the[XT] xtmixedand[ST] streg,stpm2orstmixmodels fitted to obtain initial values forstjm.

varianceshow random-effects parameter estimates as variances-covariances

showconsdisplays the constraints used bystpm2andstjmfor the derivatives of the spline function. This option is only valid under a flexible parametric survival submodel.

keepconsprevents the constraints imposed bystjmon the derivatives of the spline function when fitting delayed entry models being dropped. By default, the constraints are dropped. This option is only valid under a flexible parametric survival submodel.

level(#)specifies the confidence level, as a percentage, for confidence intervals. The default islevel(95)or as set byset level.

Remarks1. A random intercept is always assumed in each

stjmcall.2. Measurement time must be exclusively controlled through the options

ffp(),rfp(),frcs(),rrcs()andtimeinteraction(), and not included as fixed covariates in either submodel through [varlist] orsurvcov(varlist).3. As with all survival models with multiple-record

stdata, time-varying covariates can be included in each submodel.4. Estimation is performed by maximum likelihood. Optimisation uses the default technique

nr, meaning Stata's version of Newton-Raphson iterations.5. If convergence issues arise, try specifying the

nullassocoption and/or increasing the number of Gauss-Hermite nodes,gh(). Users may also wish to vary the number of adaptive quadrature iterations usingadaptit().6. As with all models which use numerical integration, the number of quadrature nodes should be increased to establish the stability of the estimates.

7. Note that under a flexible parametric survival sub-model, if more than one random effect is specified then survival sub-model coefficients are interpreted as proportional cumulative hazard ratios. In this case the equivalency between proportional cumulative hazard ratios and proportional hazards ratios does not hold.

+---------------------------+ ----+ Intermittant missing data +----------------------------------------

If intermittent missing data is present in any covariates, for example:

--------------------------------- id _t0 _t _d long_resp --------------------------------- 1 0 0.2 0 0.93 1 0.2 0.7 0 1.32 2 0 0.5 0 1.15 2 0.5 1.2 0 . 2 1.2 1.6 0 . 2 1.6 1.9 0 2.65 2 1.9 2.6 1 3.15 3 0 2 0 0.25 3 2 2.3 0 0.21 3 2.3 2.4 1 0.31 ---------------------------------

then care must be taken to ensure the appropriate rows of data are included in the survival component of the joint likelihood. By default,

stjmassesses whether all covariates included insurvcov()are constant within panels, and if they are, only has to use the first row of_t0and the final row of_tfor the survival likelihood component. However, if they are not, or thefulldataoption is used, then all rows are included, as with multiple-recordstdata.For example, if we were to use the

fulldataoption when analysing the data in the above Table, then patient 2's survival contribution would be missing between times_t0 = 0.5and_t = 1.6. However, the correct contribution would be made by using only the first row of_t0and final row of_t.

stjmdisplays a warning when this situation is detected. The simplest way of avoiding this is to remove any missing data before usingstset.

Load simulated example dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_example

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph long_response, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a flexible parametric survival submodel with 3 degrees of freedom, and association based on the current value. No covariates in either submodel. . stjm long_response, panel(id) survmodel(fpm) df(3) ffp(1)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, adjusting for treatment in the survival submodel and the interaction between treatment and measurement time in the longitudinal submodel. Current value association. . stjm long_response trt, panel(id) survmodel(weibull) ffp(1) survcov(trt) timeinterac(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and adjusting for treatment in both submodels. Risk of event dependent on the current value and the first derivative of the longitudinal submodel. . stjm long_response trt, panel(id) survmodel(weibull) rfp(1) survcov(trt) derivassoc

Example 2: Primary Biliary Cirrhosis datasetThis example dataset contains 1945 repeated measurements of serum bilirubin, from 312 patients with Primary Biliary Cirrhosis (PBC). Patients received treatment of D-penicillamine or placebo. In all analyses we use the log of serum bilirubin.

Load PBC dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_pbc_example_data

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph logb, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. . stjm logb, panel(id) survmodel(w) ffp(1) timeinterac(trt) survcov(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. We also adjust for the interaction between log serum bilirubin and treatment. . stjm logb, panel(id) survmodel(w) rfp(1) timeinterac(trt) survcov(trt) assoccov(trt)

Example 3: Liver cirrhosis with repeated measures of prothrombin indexThis example dataset contains 2968 repeated measurements of prothrombin index, from 488 patients with liver cirrhosis. Patients received treatment of prednisone or placebo.

Load dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_prothro

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph pro, panel(id) lowess

Joint model with a random intercept and random slope in the longitudinal submodel, a mixture Weibull-exponential survival submodel, and association based on the current value. We adjust for treatment in the survival submodel. . stjm pro, panel(id) survmodel(weibexp) rfp(1) survcov(trt)

Joint model with a random intercept and fixed splines of time with 2 degrees of freedom in the longitudinal submodel, a mixture Weibull-Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed splines of time in the longitudinal submodel, and treatment in the survival submodel. . stjm pro, panel(id) survmodel(weibweib) frcs(2) timeinterac(trt) survcov(trt)

AuthorMichael J. Crowther Department of Health Sciences University of Leicester E-mail: michael.crowther@le.ac.uk

Part of this work was conducted when MJC was on an internship at StataCorp. In particular, he would like to thank Yulia Marchenko, Jeff Pitblado, Alan Riley and Vince Wiggins.

Please report any errors you may find.

ReferencesCrowther MJ, Abrams KR and Lambert PC. Flexible parametric joint modelling of longitudinal and survival data.

Statistics in Medicine2012; (In Press).Crowther MJ, Abrams KR and Lambert PC. Joint modelling of longitudinal and survival data in Stata.

The Stata Journal2012; (In Press).Lambert PC and Royston P. Further development of flexible parametric models for survival analysis.

The Stata Journal2009;9:265-290.Rabe-Hesketh S, Skrondal A and Pickles A. Reliable estimation of generalized linear mixed models using adaptive quadrature.

The StataJournal2002;2:1-21.Wulfsohn MS and Tsiatis AA. A joint model for survival and longitudinal data measured with error.

Biometrics1997;53:330-339.