Title
survsim -- Simulate complex survival data
Syntax
survsim newvarname1 [newvarname2] [, options]
options Description ------------------------------------------------------------------------- lambdas(numlist) scale parameters gammas(numlist) shape parameters distribution(exponential) exponential survival distribution distribution(gompertz) Gompertz survival distribution distribution(weibull) Weibull survival distribution (default) maxtime(#) administrative censoring time covariates(vn # [# ...] ...) baseline covariates tde(vn # [# ...] ...) time-dependent effects centol(real) set the tolerance for the root finding scheme, default is 1E-08
2-component mixture mixture simulate survival times from a 2-component mixture model pmix(real) mixture parameter, default is 0.5
Competing risks cr simulate survival times from the all-cause distribution of cause-specific hazards ncr(int) specifies the number of competing risks showdiff display the maximum difference in estimates between iterations under the Newton-Raphson scheme
User-defined [log] [cumulative] hazard function loghazard(string) user-defined log baseline hazard function, see details hazard(string) user-defined baseline hazard function, see details logcumhazard(string) user-defined log cumulative baseline hazard function, see details cumhazard(string) user-defined baseline cumulative hazard function, see details nodes(#) number of Gauss-Legendre quadrature nodes, default 30 tdefunction(string) function of time to interact with covariates specified in tde(), see details iterations(#) maximum number of iterations to pass to Mata function mm_root() mintime(#) minimum time for use in root finding, default 1E-08 ------------------------------------------------------------------------- Abbreviation: vn = varname
Description
survsim simulates survival times from parametric distributions and user-defined hazard functions. Distributions include the exponential, Gompertz and Weibull. Newton-Raphson iterations are used to generate survival times under cause-specific hazard models for competing risks, using standard parametric distributions. Non-proportional hazards can be included with all models; under an exponential or Weibull model covariates are interacted with log time, under a Gompertz model covariates are interacted with time. Baseline covariates can be included. newvarname1 specifies the new variable name to contain the generated survival times. newvarname2 is required when generating competing risks data to create the status indicator or when the maxtime() option is specified which defines the time of administrative censoring. Finally, a user-defined [log] [cumulative] baseline hazard function can be specified, in Mata code using colon operators, with survival times generated using a combination of Gaussian quadrature and root finding techniques. Complex time-dependent effects can also be specified.
Options
lambdas(numlist) defines the scale parameters in the Weibull/Gompertz distribution(s). The number of values required depends on the model choice. Default is a single number corresponding to a standard parametric distribution. Under a mixture model 2 values are required. Under a competing risks model, cr, the number of values are defined by ncr().
gammas(numlist) defines the shape parameters of the parametric distribution(s). Number of entries must be equal to that of lambdas.
distribution(string) specifies the parametric survival distrubution to use. exponential, gompertz or weibull can be used, with weibull the default.
maxtime(#) specifies an administrative censoring time. Two new varnames syntax must be specified, the second to contain the event indicator.
covariates(varname # [# ...] ...) defines baseline covariates to be included in the linear predictor of the survival model, along with the value of the corresponding coefficient. For example a treatent variable coded 0/1 can be included, with a log hazard ratio of 0.5, by covariates(treat 0.5). Variable treat must be in the dataset before survsim is run. If cr is used with ncr(4), then a value for each coefficient must be inputted for each competing risk, e.g. covariates(treat 0.5 -0.2 0.1 0.25). If cumhazard() or logcumhazard() are used, then covariates() effects are additive on the log cumulative hazard scale.
tde(varname # [# ...] ...) creates non-proportional hazards by interacting covariates with either log time or time under a Weibull or Gompertz model, respectively. Under a user-defined [log] [cumulative] hazard function, covariates are interacted with tdefunction(). Values should be entered as tde(trt 0.5), for example. If cumhazard() or logcumhazard() are used, then tde() effects are additive on the log cumulative hazard scale.
+---------------+ ----+ Mixture model +----------------------------------------------------
mixture specifies that survival times are simulated from a 2-component mixture model, with mixture component distributions defined by distribution(). lambdas() and gammas() must be of length 2.
pmix(#) defines the mixture parameter. Default is 0.5.
+-----------------+ ----+ Competing risks +--------------------------------------------------
cr specifies that survival times are simulated from the all-cause distribution from ncr() cause-specific hazards, with distributions defined by distribution(). In this case, Newton-Raphson iterations are used to simulate the survival times.
ncr(#) defines the number of competing risks. lambdas() and gammas() must be of length ncr().
+--------------------+ ----+ Convergence scheme +-----------------------------------------------
centol(real) specifies the tolerance of Brent's univariate root finder or the Newton-Raphson scheme. Default is 1E-08.
showdiff display the maximum difference in estimates between iterations when using the Newton-Raphson scheme. This can be used to monitor convergence.
+----------------------------------------------------------+ ----+ User defined [log] [cumulative] baseline hazard function +---------
loghazard(string) is the user-defined log baseline hazard function. This must be written in Mata code using colon operators. Time must be entered as #t. See examples below.
hazard(string) is the user-defined baseline hazard function. This must be written in Mata code using colon operators. Time must be entered as #t. See examples below.
logcumhazard(string) is the user-defined log cumulative baseline hazard function. This must be written in Mata code using colon operators. Time must be entered as #t.
cumhazard(string) is the user-defined baseline cumulative hazard function. This must be written in Mata code using colon operators. Time must be entered as #t.
nodes(#) defines the number of Gauss-Legendre quadrature points used to evaluate the cumulative hazard function when loghazard() or hazard(). Default is 30. This should be increased to assess the stability of the simulation process.
tdefunction(string) defines the function of time to which covariates specified in tde() are interacted with to create time-dependent effects. The default is #t, i.e. linear time. This must be written in Mata code with #t used to represent time, for example #t:^2.
iterations(#) defines the maximum number of iterations passed to mm_root(). Default is 1000. See moremata for more details.
mintime(#) defines the minimum possible simulated survival time passed to mm_root(). Default is 1E-08. See moremata for more details.
Remarks
When simulating from a user-defined loghazard() or hazard() function, numerical quadrature is used to evaluate the cumulative hazard function, within iterations of Brent's univariate root finder. As with all model frameworks which use numerical integration, it is important to assess the stability of the simulated survival times with an increasing number of quadrature nodes.
Examples
Generate times from a Weibull model including a binary treatment variable, with log(hazard ratio) = -0.5, and censoring after 5 years: . set obs 1000 . gen trt = rbinomial(1,0.5) . survsim stime1 died, lambdas(0.1) gammas(1.5) cov(trt -0.5) maxt(5) . stset stime1, f(died = 1) . streg trt, dist(weibull) nohr
Generate times from a Gompertz model: . survsim stime2, lambdas(0.1) gammas(0.05) dist(gompertz)
Generate times from a 2-component mixture Weibull model: . survsim stime3, mixture lambdas(0.1 0.05) gammas(1 1.5) pmix(0.5) maxtime(5)
Generate times from a competing risks model with 4 cause-specific Weibull hazards and 4 cause-specific treatment effects: . survsim stime4 status, cr ncr(4) lambdas(0.1 0.05 0.1 0.05) gammas(0.5 1.5 1 1.2) cov(trt 0.2 0.1 -0.1 0.4)
Generate times from a Weibull model with diminishing treatment effect: . survsim stime5, lambdas(0.1) gammas(1.5) cov(trt -0.5) tde(trt 0.05)
Generate times from user-defined log baseline hazard function: . survsim stime6, loghazard(-1 :+ 0.02:*#t :- 0.03:*#t:^2 :+ 0.005:*#t:^3) maxt(1.5)
Generate times from user-defined log baseline hazard function with diminishing treatment effect: . survsim stime7 died2, loghazard(-1 :+ 0.02:*#t :- 0.03:*#t:^2 :+ 0.005:*#t:^3) cov(trt -0.5) tde(trt 0.03) maxt(1.5)
Author
Michael J. Crowther Department of Health Sciences University of Leicester E-mail: michael.crowther@le.ac.uk
Please report any errors you may find.
References
Bender R, Augustin T and Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine 2005;24:1713-1723.
Beyersmann J, Latouche A, Buchholz A and Schumacher M. Simulating competing risks data in survival analysis. Statistics in Medicine 2009;28:956-971.
Crowther MJ and Lambert PC (2013). Simulating biologically plausible complex survival data. (Under revision).
Crowther MJ and Lambert PC. Simulating complex survival data. The Stata Journal 2012;12(4):674-687.
Jann, B. 2005. moremata: Stata module (Mata) to provide various functions. Available from http://ideas.repec.org/c/boc/bocode/s455001.html.