-------------------------------------------------------------------------------help stjm11also see:stjm11 postestimation,stjmgraph-------------------------------------------------------------------------------

Title

stjm11-- Joint modelling of longitudinal and survival data

Syntax

stjm11longdepvar[varlist] [if] [in],panel(varname)survmodel(survsubmodel)[options]

optionsDescription -------------------------------------------------------------------------panel(varname)panel identification variablesurvmodel(survsubmodel)survival submodelLongitudinal sub-model

ffracpoly(numlist)fixed powers of timerfracpoly(numlist)fixed and random powers of timetimeinteraction(varlist)covariates to interact with fixed time variablescovariance(vartype)variance-covariance structure of the random effectsSurvival sub-model

survcov(varlist)fixed baseline covariates to be included in the survival submodeldf(#)degrees of freedom for baseline hazard functionknots(numlist)knot locations for baseline hazard functionnoorthogdo not use orthogonal transformation of spline variablesAssociation

nocurrentassociation not based on the current value of longitudinal responsederivassociationassociation based on the first derivative (slope) of the longitudinal submodelintassociationallow association to be based on the random interceptassociation(numlist)allow association to be based on the random coefficient of a time variableassoccovariates(varlist)adjust the association parameter(s) by covariatesMaximisation

gh(#)number of Gauss-Hermite quadrature pointsgk(#)number of Gauss-Kronrod quadrature pointsadaptit(#)the number of adaptive Gauss-Hermite quadrature iterations; default is 3nonadaptuse non-adaptive Gauss-Hermite quadraturefulldatause all data in survival component maximisation, see detailsnullassocsets the initial values for association parameters to be zeromaximize_optionscontrol the maximization process; seldom usedReporting

showinitialdisplay output from initial value model fitsvarianceshow random-effects parameter estimates as variances-covariancesshowconslist constraints in outputkeepconsdo not drop constraints used in ml routinelevel(#)set confidence level; default islevel(95)

survsubmodelDescription -------------------------------------------------------------------------exponentialexponential survival submodelweibullWeibull survival submodelgompertzGompertz survival submodelfpmflexible parametric survival submodel -------------------------------------------------------------------------

vartypeDescription -------------------------------------------------------------------------independentone variance parameter per random effect, all covariances zeroexchangeableequal variances for random effects, and one common pairwise covarianceidentityequal variances for random effects, all covariances zero; the default for factor variablesunstructuredall variances and covariances distinctly estimated; the default -------------------------------------------------------------------------

longdepvarspecifies the longitudinal continuous response variable. You muststsetyour data before usingstjm11; see[ST] stset. Seestjm11 postestimationfor features available after estimation.

Description

stjm11fits shared parameter joint models for longitudinal and survival data using maximum likelihood. A single continuous longitudinal response and a single survival outcome are allowed. A linear mixed effects model is used for the longitudinal submodel, which lets time be modelled using fixed and/or random fractional polynomials. Four choices are currently available for the survival submodel; the first being the flexible parametric survival model (see stpm2), modelled on the log cumulative hazard scale. The remaining choices include the exponential, Weibull and Gompertz proportional hazard models. The association between the two processes can be induced via the default current value parameterisation, the first derivative of the longitudinal submodel, and/or a random coefficient such as the intercept. Adaptive Gauss-Hermite quadrature, coded in Mata, is used to evaluate the joint likelihood. Under an exponential/Weibull/Gompertz survival submodel, Gauss-Kronrod quadrature is used to evaluate the cumulative hazard. The dataset must bestsetcorrectly into enter and exit times, using the enter option; see[ST]stset.stjm11uses_t0to denote measurement times. For example, below we have 3 patients with 2, 5 and 3 measurements each, respectively.--------------------------------- id _t0 _t _d long_resp --------------------------------- 1 0 0.2 0 0.93 1 0.2 0.7 0 1.32 2 0 0.5 0 1.15 2 0.5 1.2 0 1.67 2 1.2 1.6 0 1.92 2 1.6 1.9 0 2.65 2 1.9 2.6 1 3.15 3 0 2 0 0.25 3 2 2.3 0 0.21 3 2.3 2.4 1 0.31 ---------------------------------

Delayed entry joint models can be fitted, allowing age to be used as the timescale. Delayed entry models may also be executed when the timescale is adjusted to allow fractional polynomials of time when

_t0= 0.

Options

panel(varname)defines the panel identification variable. Each panel should be identified by a unique integer.

survmodel(survsubmodel)specifies the survival submodel to be fit.

survmodel(fpm)fits a flexible parametric survival submodel. This is a highly flexible fully parametric alternative to the Cox model, modelled on the log cumulative hazard scale using restricted cubic splines. For more details see stpm2.

survmodel(exponential)fits an exponential survival submodel. This is modelled on the hazard scale.

survmodel(weibull)fits a Weibull survival submodel. This is modelled on the hazard scale.

survmodel(gompertz)fits a Gompertz survival submodel. This is modelled on the hazard scale.+------------------------+ ----+ Longitudinal sub-model +-------------------------------------------

ffracpoly(numlist)specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed covariates. _t0 is used as the time of measurements.

rfracpoly(numlist)specifies power transformations of the time variable, to be included in the longitudinal submodel as fixed and random covariates. _t0 is used as the time of measurements.

timeinteraction(varlist)covariates to interact with the fixed fractional polynomials of measurement time.

covariance(vartype)specifies the variance-covariance structure of the random effects.

covariance(independent)specifies a distinct variance for each random effect, with all covariances zero.

covariance(exchangeable)specifies equal variances for all random effects, and one common pairwise covariance.

covariance(identity)specifies equal variances for all random effects, with all covariances zero.

covariance(unstructured)specifies that all variances and covariances are distinctly estimated.+--------------------+ ----+ Survival sub-model +-----------------------------------------------

survcov(varlist)specifies covariates to be included in the survival submodel.

df(#)specifies the degrees of freedom for the restricted cubic spline function used for the baseline function under a flexible parametric survival submodel.#must be between 1 and 10, but usually a value between 1 and 4 is sufficient, with 3 being the default. Theknots()option is not applicable if thedf()option is specified. The knots are placed at the following centiles of the distribution of the uncensored log survival times:------------------------------------------------------------ df knots Centile positions ------------------------------------------------------------ 1 0 (no knots) 2 1 50 3 2 33 67 4 3 25 50 75 5 4 20 40 60 80 6 5 17 33 50 67 83 7 6 14 29 43 57 71 86 8 7 12.5 25 37.5 50 62.5 75 87.5 9 8 11.1 22.2 33.3 44.4 55.6 66.7 77.8 88.9 10 9 10 20 30 40 50 60 70 80 90 ------------------------------------------------------------ Note that these are

interior knotsand there are also boundary knots placed at the minimum and maximum of the distribution of uncensored survival times.

knots(numlist)specifies knot locations for the baseline distribution function under a flexible parametric survival submodel, as opposed to the default locations set by df(). Note that the locations of the knots are placed on the standard time scale. However, the scale used by the restricted cubic spline function is always log time. Default knot positions are determined by the df() option.

noorthogsuppresses orthogonal transformation of spline variables under a flexible parametric survival submodel.+-------------+ ----+ Association +------------------------------------------------------

nocurrentspecifies that the association between the survival and longitudinal submodels is not based on the current value. The default association is based on the current value of the longitudinal response.

derivassociationspecifies that the association between the survival and longitudinal submodels is based on the first derivative of the longitudinal submodel.

intassociationspecifies that the association between the survival and longitudinal submodels is based on the random intercept of the longitudinal submodel.

association(numlist)specifies that the association between the survival and longitudinal submodels is based on a random coefficient of time fractional polynomials specified inrfracpoly.

assoccovariates(varlist)covariates to include in the linear predictor of the association parameter(s). Under the default current value association, this corresponds to interacting the longitudinal submodel with covariates.+--------------+ ----+ Maximisation +-----------------------------------------------------

gh(#)specifies the number of Gauss-Hermite quadrature nodes used to evaluate the integrals over the random effects. The defaults are 5 and 15 under adaptive and non-adaptive, respectively. Minimum number of quadrature points is 2.

gk(#)specifies the number of Gauss-Kronrod quadrature nodes used to evaluate the cumulative hazard under an exponential/Weibull/Gompertz survival submodel. Two choices are available, namely 7 or 15, with the default 15.

adaptit(#)defines the number of iterations of adaptive Gauss-Hermite quadrature to use in the maximisation process, with the default 3. Adaptive quadrature is implemented at the beginning of each full Newton-Raphson iteration.

nonadaptuse non-adaptive Gauss-Hermite quadrature to evaluate the joint likelihood. This will generally require a much higher number of nodes,gh, to ensure accurate estimates and standard errors, resulting in much greater computation time.

fulldataforcesstjm11to use all rows of data in the survival component of the likelihood. By default,stjm11assesses whether all covariates specified insurvcov()are constant within panels, and if they are, only needs to use the first row of_t0and the final row of_tin the maximisation process providing considerable speed advantages.

nullassocsets the initial value for association parameters to be zero. Use of the default initial values may in rare situations causestjm11to displayinitial values not feasible; using this option solves this, however, convergence time is generally longer.

maximize_options;difficult,technique(algorithm_spec),iterate(#), [no]log,trace,gradient,showstep,hessian,shownrtolerance,tolerance(#),ltolerance(#)gtolerance(#),nrtolerance(#),nonrtolerance,from(init_specs); see[R] maximize. These options are seldom used, but thedifficultoption may be useful if there are convergence problems.+-----------+ ----+ Reporting +--------------------------------------------------------

showinitialdisplays the output from the xtmixed and stpm2 or streg models fitted to obtain initial values forstjm11.

varianceshow random-effects parameter estimates as variances-covariances

showconsdisplays the constraints used by stpm2 andstjm11for the derivatives of the spline function. This option is only valid under a flexible parametric survival submodel.

keepconsprevents the constraints imposed bystjm11on the derivatives of the spline function when fitting delayed entry models being dropped. By default, the constraints are dropped. This option is only valid under a flexible parametric survival submodel.

level(#)specifies the confidence level, as a percentage, for confidence intervals. The default islevel(95)or as set byset level.

Remarks1. A random intercept is always assumed in each

stjm11call.2. Measurement time must be exclusively controlled through the options

ffracpoly,rfracpolyandtimeinteraction, and not included as fixed covariates in either submodel through [varlist] orsurvcov(varlist).3. As with all survival models with multiple-record

stdata, time-varying covariates can be included in each submodel.4. Estimation is performed by maximum likelihood. Optimisation uses the default technique

nr, meaning Stata's version of Newton-Raphson iterations.5. If convergence issues arise, try specifying the

nullassocoption and/or increasing the number of Gauss-Hermite nodes,gh. Users may also wish to vary the number of adaptive quadrature iterations.6. As with all models which use numerical integration, the number of quadrature nodes should be increased to establish the stability of the estimates.

7. Note that under a flexible parametric survival sub-model, if more than one random effect is specified then survival sub-model coefficients are interpreted as proportional cumulative hazard ratios. In this case the equivalency between proportional cumulative hazard ratios and proportional hazards ratios does not hold.

+---------------------------+ ----+ Intermittant missing data +----------------------------------------

If intermittent missing data is present in any covariates, for example:

--------------------------------- id _t0 _t _d long_resp --------------------------------- 1 0 0.2 0 0.93 1 0.2 0.7 0 1.32 2 0 0.5 0 1.15 2 0.5 1.2 0 . 2 1.2 1.6 0 . 2 1.6 1.9 0 2.65 2 1.9 2.6 1 3.15 3 0 2 0 0.25 3 2 2.3 0 0.21 3 2.3 2.4 1 0.31 ---------------------------------

then care must be taken to ensure the appropriate rows of data are included in the survival component of the joint likelihood. By default,

stjmassesses whether all covariates included insurvcov()are constant within panels, and if they are, only has to use the first row of_t0and the final row of_tfor the survival likelihood component. However, if they are not, or thefulldataoption is used, then all rows are included, as with multiple-recordstdata.For example, if we were to use the

fulldataoption when analysing the data in the above Table, then patient 2's survival contribution would be missing between times_t0 = 0.5and_t = 1.6. However, the correct contribution would be made by using only the first row of_t0and final row of_t.

stjmdisplays a warning when this situation is detected. The simplest way of avoiding this is to remove any missing data before usingstset.

Example 1: Simulated datasetLoad simulated example dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_example

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph long_response, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a flexible parametric survival submodel with 3 degrees of freedom, and association based on the current value. No covariates in either submodel. . stjm11 long_response, panel(id) survmodel(fpm) df(3) ffracpoly(1)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, adjusting for treatment in the survival submodel and the interaction between treatment and measurement time in the longitudinal submodel. Current value association. . stjm11 long_response trt, panel(id) survmodel(weibull) ffracpoly(1) survcov(trt) timeinterac(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and adjusting for treatment in both submodels. Risk of event dependent on the current value and the first derivative of the longitudinal submodel. . stjm11 long_response trt, panel(id) survmodel(weibull) rfracpoly(1) survcov(trt) derivassoc

Example 2: Primary Biliary Cirrhosis datasetThis example dataset contains 1945 repeated measurements of serum bilirubin, from 312 patients with Primary Biliary Cirrhosis (PBC). Patients received treatment of D-penicillamine or placebo. In all analyses we use the log of serum bilirubin.

Load PBC dataset: . use http://fmwww.bc.edu/repec/bocode/s/stjm_pbc_example_data

stset the data: . stset stop, enter(start) f(event=1) id(id)

Explore the joint data with a joint plot: . stjmgraph logb, panel(id)

Joint model with a random intercept and fixed slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. . stjm11 logb, panel(id) survmodel(w) ffracpoly(1) timeinterac(trt) survcov(trt)

Joint model with a random intercept and random slope in the longitudinal submodel, a Weibull survival submodel, and association based on the current value. We adjust for the interaction between treatment and fixed time in the longitudinal submodel, and treatment in the survival submodel. We also adjust for the interaction between log serum bilirubin and treatment. . stjm11 logb, panel(id) survmodel(w) rfracpoly(1) timeinterac(trt) survcov(trt) assoccov(trt)

AuthorMichael J. Crowther Department of Health Sciences University of Leicester E-mail: michael.crowther@le.ac.uk

Please report any errors you may find.

ReferencesCrowther, M. J. & Abrams, K. R. & Lambert, P. C. (2012). Flexible parametric joint modelling of longitudinal and survival data. (Under review).

Crowther, M. J. & Abrams, K. R. & Lambert, P. C. (2012). Joint modelling of longitudinal and survival data. (Under review).

Lambert, P. C. & Royston, P. Further development of flexible parametric models for survival analysis.

The Stata Journal, 2009, 9, 265-290.Royston, P. & Altman, D. G. Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling.

JRSS SeriesC (Applied Statistics), 1994, 43, 429-467.Wulfsohn, M. S. & Tsiatis, A. A. A joint model for survival and longitudinal data measured with error.

Biometrics, 1997, 53, 330-339.

Also seeOnline:

stjm11 postestimation,stjmgraph,xtmixed,streg,stpm2(if installed)