-------------------------------------------------------------------------------
help for stpm                                                   Patrick Royston
-------------------------------------------------------------------------------

Flexible parametric models for survival-time data

        stpm [varlist] [if exp] [in range] , model_complexity
                 scale(hazard|normal|odds) [all cluster(cluster_var)
                 left(leftvar)) mlmaxopts(ml_maximize_options) noconstant
                 nolog noorthog offset(offsetvar) q(orthog_matrix)
                 spline(splinevar derivativevar) stratify(strat_varlist)
                 technique(algorithm_spec) theta(est|#)]

    where

        model_complexity is df(#) or stpmdf(#) or knots([l|%]knotlist) or
        knots(u#1 #2 #3).


    stpm is for use with survival-time data; see st.  You must have stset
    your data before using this command; see help st. Weights are to be set
    by stset. All weight-types are supported.

    The syntax of predict following stpm is

        predict [type] newvarname [if exp] [in range] [, statistic at_stuff
                noconstant nooffset stdp store(global_macro_name [#])
                time(#|vn) zero]


    where at_stuff is varname # |vn [varname #|vn ...]) or at(@varname
    #|string) or at(+#)

    and statistic is

        xb                    index (linear predictor for equation [xb]); the
                              default
        cumodds               log cumulative odds function
        cumhazard             log cumulative hazard function
        normal                Normal deviate function
        spline                fitted spline function
        dzdy                  derivative of fitted spline function w.r.t.
                              ln(_t)
        density               density function
        deviance              deviance residuals
        hazard                hazard function
        martingale            martingale residuals
        survival              survival function
        failure               failure (cumulative incidence) function
        centile(#|varname)    # th centile of survival time distribution (or
                              centiles stored in varname)
        tvc(varname)          time-varying coefficient for varname (see
                              Options for predict)

    All statistics are available both in and out of sample; type "predict ...
    if e(sample) ..." if wanted only for the estimation sample. The default
    is linear prediction of the covariate part of the model, i.e. for
    statistic xb.

    Not all of the options for predict are available with every statistic.
    The SE (stdp option of predict) is not available for every statistic.


Description

    stpm fits spline-based distributional models to right- or
    interval-censored data. varlist is a set of covariates. Multistate models
    are also supported.


Options for stpm

    Note that the complexity of the spline part of the model is defined by
    df() or stpmdf() or knots(). Just one of these options must be specified.

    all includes out-of-sample observations in generated variables which
        contain the spline basis functions.

    cluster(cluster_var)) adjusts standard errors for intra-group
        correlation, where the grouping variable is cluster_var; implies
        robust

    df(#) specifies the degrees of freedom for the natural spline function. #
        must be between 1 and 6. The knots() option is not applicable and the
        knots are placed at the following centiles of the distribution of the
        uncensored log times:

        ---------------------------
        df        Centile positions
        ---------------------------
         1        (no knots)
         2        50
         3        33 67
         4        25 50 75
         5        20 40 60 80
         6        17 33 50 67 83
        >6        (not allowed)
        ---------------------------

    knots([l|%]knotlist) (syntax 1) defines the internal knot positions for
        the spline.  If you specify knots(knotlist) then knotlist should
        consist of values of log time. If you specify knots(lknotlist) then
        the values in knotlist are taken to be times and are automatically
        log transformed by stpm. (This is a convenience feature; it is easier
        to enter times than log times). If you specify knots(%knotlist) then
        the values in knotlist are taken to be centile positions in the
        distribution of the uncensored log times.

    knots(u#1 #2 #3) (syntax 2) also defines the internal knots for the
        spline. #1 knots are assigned at random uniformly distributed
        positions between #2 and #3, where #2 is the lowest centile position
        of the uncensored log times you wish to entertain and #3 is the
        highest. A suggested choice is #2=10, #3=90; knots are to be placed
        at random positions between the 10th and 90th centiles of the
        uncensored log times.

    left(leftvar) specifies that some or all of the survival-time
        observations are interval-censored. The rules for specifying values
        of leftvar and their meanings in terms of interval censoring are as
        follows:

        -------------------------------------------------------------------
        Value of leftvar   _d   Meaning
        -------------------------------------------------------------------
        . or _t             0   Right censored at _t
        . or _t             1   Event at _t
        0                   0   Right censored at _t
        0                   1   Interval censored, event in (0,_t]
        <_t                 0   Late entry at leftvar, right censored at _t
        <_t                 1   Interval censored, event in [leftvar,_t]
        -------------------------------------------------------------------

        Note that stpm support datasets with late entry and interval
        censoring only when late entry is specified through leftvar and the
        value of _d, as given in the above table.

    mlmaxopts(ml_maximize_options) control the likelihood maximization
        process; seldom used. The difficult option is probably the most
        important. See ml##ml_maxopts for details.

    noconstant suppresses the constant term in the [xb] equation.

    nolog suppresses the iteration log while the model is fit.

    noorthog suppresses orthogonalisation of the spline basis functions. You
        should rarely need this option, which is provided for compatibility
        with early versions of stpm, and for pedagngic reasons.

    offset(offsetvar) defines the offset for the [xb] equation.  offsetvar is
        added to the linear predictor.

    q(orthog_matrix) is required with the spline() option, unless the
        noorthog option was used when the model was fitted. The Q-matrix
        orthog_matrix is a linear transformation between the regression
        coefficients on the orthogonalised and original spline basis
        functions. It may be stored following a run of stpm by extracting it
        from e(Q):

        stpm ...
        matrix q = e(Q)

    scale(hazard|normal|odds) is not optional and specifies the scale of the
        model. The hazard, normal and odds options fit models on the scale of
        the log cumulative hazard, normal equivalent deviate or log
        cumulative odds of failure respectively.

    spline(splinevar derivativevar) allows you to specify the baseline spline
        function and its derivative with respect to ln(_t). For a given model
        where the spline function has been estimated, splinevar can be
        created by using for example

        predict splinevar, spline zero

        derivativevar may be created by using for example

        predict derivativevar, dzdy

        Note also that the noconstant option of predict subtracts the
        constant term from the spline function. This will ensure that if the
        model is refitted using splinevar and derivativevar, the constant
        will be reported as (close to) zero.

    stpmdf(#) is identical to df(). stpmdf() is for inputting the df required
        for stpm to mfp, which has its own df() option.

    stratify(strat_varlist) stratifies the spline functions according to the
        variables in strat_varlist. It will rarely make sense for the
        variables in strat_varlist not to be among those in varlist, but this
        is not checked.

    technique() specifies how the likelihood function is to be maximized by
        ml.  The following algorithms are currently implemented in ml: nr
        (the default), bhhh, dfp, bfgs.  For further details, see ml or the
        book Maximum Likelihood Estimation with Stata, 2d Edition (Gould,
        Pitblado, and Sribney 2003).

    theta(est|#) applies only with scale(odds) and estimates the
        transformation parameter (theta) or performs estimation with theta
        set to #. The transformation of the (baseline) survival function
        S_0(t) is then

        g_theta(S_0(t)) = ln(S_0(t)^(-theta)-1)/theta)

        theta = 0 corresponds to the cumulative hazards model. With
        theta(est), theta is estimated and presented on a log scale, i.e.
        ln(theta). With theta(#), # must be positive.


Options for predict

    at(varname #|vn [varname #|vn ...]) (syntax 1) computes the various
        statistics at value(s) (# or vn)... of model covariates varname...,
        where vn means "variable name". The at() option is a convenient way
        of specifying out-of-sample prediction for some or all of the
        covariates in the model.  Covariates in stpm's varlist but NOT listed
        in at() are used in computing predicted values, unless the zero
        option is specified, in which case adjustment is to value 0 of such
        predictors.

    at(@varname #|string)(syntax 2) computes the various statistics at the
        mean of the covariate values for observations in which varname equals
        # (if varname is numeric) or in which varname equals string (if
        varname is a string variable). varname may typically be an
        identification number, e.g.  a row number of the dataset. Note that
        varname is not usually one of the covariates in the model (i.e. it is
        not a member of varlist).

    at(+#)(syntax 3) computes the various statistics after adding # to the
        linear predictor from the [xb] equation. This can be used to examine
        the effects of applying specific offsets to the predicted values on
        other statisics of interest.

    deviance creates deviance-like residuals. The same formula is used as in 
        streg.

    martingale creates martingale-like residuals. The same formula is used as
        in streg.

    noconstant removes the constant (if any) in equation [xb] when computing
        a statistic of interest. This includes the fitted spline function
        using the spline option.

    nooffset is relevant only if you specified offset() with stpm.  It
        modifies the calculations made by predict, xb so that they ignore the
        offset variable.  The linear prediction for equation [xb] is treated
        as x*b rather than x*b + offset.

    stdp computes the standard error of statistics xb, cumhazard, cumodds,
        normal, dzdy, or of the log (note: log) survival time for centile().
        stdp is also implemented for hazard with hazard-scaled models
        (scale(hazard)) only, when it calculates the S.E. of the log (note:
        log) hazard function. stdp is not implemented for other statistics,
        but note that confidence intervals for the survival function may be
        found by back-transformation of confidence intervals for the
        cumulative hazard or odds or normal function, and those for the
        hazard function similarly, through the formula exp(ln(hazard) +/- z *
        SE(log hazard)).

    spline computes the fitted spline function. This incorporates all
        covariate effects and by default is evaluated at the observed failure
        times. Use of the zero option evaluates the spline with all covariate
        values set to zero (i.e. the baseline spline). If there are variables
        in the model with time-dependent effects, specified through the
        stratify() option, and the time(#) option is also used, the result is
        effectively the prognostic index evaluated at time #.

    store(global_macro_name [#]) stores the contents of newvarname in a
        global macro called global_macro_name.  Successive non-missing values
        are stored, separated by spaces.  Optionally, rounding to # decimal
        places is allowed, to save space and make the results more readable.
        The store() option is not often used, but may be helpful when passing
        the output to another program.

    time(#|vn) predicts at time # or at the time values in vn. If time() is
        not specified, prediction is made for time _t.

    tvc(varname) stands for "time-varying coefficient" and computes the
        estimated coefficient for varname, a covariate in stpm's varlist. If
        varname is "time-fixed", then newvarname will be a constant, namely
        [xb]_b[varname]. If varname is included in strat_varlist, then
        newvarname will depend on _t and may be interpreted as the
        time-varying effect of varname on the chosen scale of the model
        (proportional hazards, proportional odds or probit).  For example, in
        a hazards-scale model (scale(hazard)), newvarname multiplied by
        varname will give an estimate of the time-varying log cumulative
        hazard ratio for varname (compared with varname = 0) at every
        observed value of varname. newvarname alone will give the log
        cumulative hazard ratio for a one-unit change in varname. Note that
        the time-varying log cumulative hazard ratio for varname will NOT be
        identical to the time- varying log hazard ratio for varname.

    zero predicts at zero values of covariates in varlist and similarly for
        strat_varlist, if stratify() is specified. See also option at().


Remarks

    Let t denote time. stpm works by first calculating the survival function
    S(t) non-parametrically by using the Kaplan-Meier estimator. The
    procedure is illustrated for proportional hazards models, specified by
    option scale(hazard). S(t) is converted to an estimate of the log
    cumulative hazard function Z(t) by the formula

        Z(t) = ln(-ln S(t))

    This Z(t) is then smoothed on ln(t) using regression splines with knots
    placed at certain quantiles of the distribution of t. The knot positions
    are chosen automatically if the spline complexity is specified by the
    df() option, or manually by way of the knots() option. (Note that the
    knots are values of ln(t), not t.) Denote the predicted values of the log
    cumulative hazard function by Zhat(t). The density function f(t) is

        f(t) = -dS(t)/dt = dS/dZhat dZhat/dt = S(t) exp(Zhat) dZhat(t)/dt

    dZhat(t)/dt is computed from the regression coefficients of the fitted
    spline function. A smoothed survival function is calculated as

        Shat(t) = exp(-exp Zhat(t)).

    The hazard function is calculated as f(t)/Shat(t).

    If varlist is specified, the baseline survival function (i.e. at zero
    values of the covariates) is used instead of the survival function of the
    raw observations. With df(1) a Weibull model is fitted.

    With scale(normal), smoothing is of the Normal quantile function,
    invnorm(1-S(t)), instead of the log cumulative hazard function. With
    df(1) a lognormal model is fitted.

    With scale(odds), smoothing is of the log odds of failure function,
    ln((1-S(t))/S(t)), instead of the log cumulative hazard function. With
    df(1) a log-logistic model is fitted.

    Estimation is performed by maximum likelihood. Optimisation uses the
    default technique (nr, meaning Stata's version of Newton-Raphson
    iteration.

    As an example of output from stpm, consider the following. A stratifying
    variable nn has two levels, 0 and 1. There are two covariates, x5a and
    x5b, with effects assumed to act proportionately on the cumulative hazard
    function:

    . stpm x5a x5b nn, stratify(nn) scale(odds) df(2)

    initial:       log likelihood = -1124.6562
    rescale:       log likelihood = -1124.6562
    rescale eq:    log likelihood = -1124.6562
    Iteration 0:   log likelihood = -1124.6562  
    Iteration 1:   log likelihood =  -1112.096  
    Iteration 2:   log likelihood = -1111.5188  
    Iteration 3:   log likelihood = -1111.5173  
    Iteration 4:   log likelihood = -1111.5173  
    
                                                      Number of obs   =       1
> 289
                                                      Wald chi2(1)    =      10
> .48
    Log likelihood = -1111.5173                       Prob > chi2     =     0.0
> 012
    
    ---------------------------------------------------------------------------
> ---
          _t |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interv
> al]
    ---------+-----------------------------------------------------------------
> ---
    s0       |
          nn |  -1.735629   .5360622     -3.238   0.001      -2.786292   -.6849
> 664
       _cons |   3.416165   .4057033      8.420   0.000       2.621001    4.211
> 329
    ---------+-----------------------------------------------------------------
> ---
    s1       |
          nn |  -.1167243    .046288     -2.522   0.012      -.2074471   -.0260
> 015
       _cons |   .1590071   .0349978      4.543   0.000       .0904127    .2276
> 016
    ---------+-----------------------------------------------------------------
> ---
    xb       |
         x5a |   .9971947   .1749101      5.701   0.000       .6543773    1.340
> 012
         x5b |   1.844591   .2142359      8.610   0.000       1.424696    2.264
> 486
          nn |  -.5830707   .3702606     -1.575   0.115      -1.308768    .1426
> 267
       _cons |  -2.039426   .2662178     -7.661   0.000      -2.561203   -1.517
> 649
    ---------------------------------------------------------------------------
> ---
    Deviance =  2223.035 (1289 observations.)

Notes on the output

    1. The equations: s0 and s1 represent the first and second spline basis
    functions, with [s0]_cons and [s1]_cons being the coefficients for value
    0 of the stratifying variable nn, and [s0]_cons+[s0]_b[nn] and
    [s1]_cons+[s1]_b[nn] the coefficients for value 1 of nn.

    2. If you wanted to test for the need to stratify by nn you could type
    test [s0]nn [s1]nn. This would give a 2 degree of freedom Wald test of
    any difference between the spline functions for nn.

    3. The value of "Wald chi2(1)" should be ignored.

Reproducing the model

    Sometimes one wishes to reproduce a model, that is, to incorporate the
    linear predictor in [xb] and the spline function representing the
    baseline distribution function without re-fitting any parameters. This
    can be done by using the spline, xb and dzdy options of predict to create
    the necessary variables, and storing the matrix e(Q) for later use. The
    model is reproduced by using the spline(), offset(), noconstant and q()
    options of stpm. The reason for the use of noconstant with predict, xb is
    that the constant of the [xb] equation is already included in the spline
    function by predict, spline and should not be introduced again. A
    specific example:

        . stpm model_specifications
        . predict sp, spline zero
        . predict dz, dzdy
        . predict xb, xb noconstant
        . matrix Q = e(Q)

    To refit the model in the subset of the data for which z equals 1 without
    re-estimating any parameters (only the deviance is displayed):

        . stpm if z==1, offset(xb) spline(sp dz) q(Q) noconstant

    To re-estimate the constant in the [xb] equation in a subset of the data:

        . stpm if z==1, offset(xb) spline(sp dz) q(Q)

    At present, the approach cannot be used to reproduce time-varying models,
    i.e. models fitted with the stratify() option.


Stored results

    stpm returns the deviance (minus twice the maximised log likelihood) and
    the Akaike Information Criterion (AIC) in scalars e(dev) and e(aic)
    respectively. Additional quantities are stored in e() functions and may
    be inspected by using ereturn list.


Examples

    . stpm, scale(hazard) df(1)
    . stpm, df(3) scale(normal)
    . stpm, df(3) scale(odds)
    . stpm, df(3) scale(odds) theta(2)
    . stpm, df(3) scale(odds) theta(est)
    . stpm treatmnt, scale(odds) knots(%10 90)
    . xi: stpm x1 x2 x3 i.pop, df(2) scale(o) stratify(i.pop)
    . stpm x1 x2 x3, df(3) scale(h) left(left)

    . predict s0, survival zero
        [baseline survival function]
    . predict surv3, survival time(3)
        [survival probs at time 3]
    . predict median, centile(50)
        [median survival, given covars]
    . predict semedian, centile(50) stdp
    . predict h0, hazard zero
        [baseline hazard function]
    . predict h1, hazard zero at(z 1)
    . predict lnH, cumhazard at(@@id 25)
        [at covars for obs #id[25]]

    . gen hr = h1/h0
        [hr is defined for all obs.]
    . predict bz, tvc(z)
        [time-varying coeff for z]
    . predict sbz, tvc(z) stdp
        [SE of bz]

    . stpm, scale(hazard) df(2)
    . gen times = _n in 1/10
    . predict surv, survival time(times) store(survlist 4)
    . display "$survlist"

    . stpm x1 x2 x3, df(3) scale(h)
    . matrix q = e(Q)
    . predict spline, spline zero noconstant
    . predict dspline, dzdy
    . stpm x1 x2 x3, spline(spline dspline) scale(h) q(q)


Auxiliary ado-files

    frac_spl, frac_s3b, mlsurvlf, stpm_p.


Author

    Patrick Royston, MRC Clinical Trials Unit, London.
    patrick.royston@ctu.mrc.ac.uk


References

    P. Royston. 2001. Flexible alternatives to the Cox model, and more.  The
    Stata Journal 1:1-28.

    P. Royston and M. K. B. Parmar. 2002. Flexible proportional-hazards and
    proportional-odds models for censored survival data, with application to
    prognostic modelling and estimation of treatment effects.  Statistics in
    Medicine 21: 2175-2197.


Also see