Title
dthaz -- Discrete-time hazard and survival probability estimates
Syntax
dthaz [varlist] [if] [weight] [, specify(numlist) tpar(#) truncate(#) pretrunc(#) cloglog cluster(varname) display(#) level(#) model suppress graph(#) graph_twoway_options copyleft]
options                Description
-------------------------------------------------------------------------
Model
  specify(numlist)     specify values for predicted population values
  tpar(#)              select alternative parameterizations of time
  truncate(#)          truncate the maximum time of length to event
  pretrunc(#)          ignore some initial time periods in the model
  cloglog              use a complementary log-log link (see cloglog)

SE/Robust
  cluster(varname)     adjust standard errors for intragroup correlation

Reporting
  display(#)           limit the maximum displayed period
  level(#)             set confidence level; default is level(95)
  model                output model estimate
  suppress             switch off dthaz output

Graph options
  graph(#)             conditional hazard, survival, or cumulative incidence curves
  twoway_options       graph twoway options

Miscellaneous
  copyleft             display license information
-------------------------------------------------------------------------
fweights, iweights, and pweights are allowed; see weight.
Description
dthaz estimates the hazard and survival probabilities of the population, given the specified model by means of a logit link (default) or by a complementary log-log link. This program requires data in person-period format, and person-period variables may be created using prsnperd.
Typed with no varlist and with no tpar() option, dthaz estimates baseline conditional hazard (h) and survival probabilities (S) for the sample. These estimates correspond exactly with actuarial estimates of sample hazard and sample survival functions. Specifying numeric predictors in varlist, together with the required set of associated values in the specify() option, adds them to the model as follows (for logit hazard):
h_i = 1/(1+e^-(a_i*d_i + BX_i))
Where:
a_i is the effect of the ith time period, d_i,
B is a vector of effects for a vector of predictors X_i during the ith time period, and
S_i = (1-h_1)*(1-h_2)* ... *(1-h_i).
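As an illustration only (this is not dthaz's code), the two formulas above can be sketched in Python. The coefficient values are hypothetical, d_i is taken to be the indicator for period i so that a_i*d_i reduces to a_i, and the predictor values are assumed constant across periods.

```python
import math

def logit_hazard(alphas, beta, x):
    """h_i = 1/(1+e^-(a_i + B.X)) for each period i (d_i = 1 in period i)."""
    bx = sum(b * v for b, v in zip(beta, x))
    return [1.0 / (1.0 + math.exp(-(a + bx))) for a in alphas]

def survival(hazards):
    """S_i = (1-h_1)*(1-h_2)* ... *(1-h_i), accumulated period by period."""
    out, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        out.append(s)
    return out

# Hypothetical values: three periods, one predictor set to 1
alphas = [-2.0, -1.5, -1.0]
beta = [0.5]
h = logit_hazard(alphas, beta, x=[1])
S = survival(h)
```

Here h holds one conditional hazard per period and S the matching survival probabilities, which can only decrease from one period to the next.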
The reported conditional hazard and survival probabilities are accompanied by standard errors approximated using a first-order application of the delta method (Dinno and Kim, 2011). The normally approximated confidence intervals drawn using the graph() option are obtained by applying these standard errors with the alpha specified by level().
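The general idea of a first-order delta-method standard error can be sketched for a single logit hazard. This is a generic textbook approximation, not necessarily the computation described in Dinno and Kim (2011); the function name hazard_ci and the inputs eta (linear predictor) and se_eta (its standard error) are hypothetical.

```python
import math
from statistics import NormalDist

def hazard_ci(eta, se_eta, level=95):
    """Delta-method SE and normal-approximation CI for h = 1/(1+e^-eta)."""
    h = 1.0 / (1.0 + math.exp(-eta))
    # First-order delta method: SE(h) ~ |dh/d eta| * SE(eta), with dh/d eta = h(1-h)
    se_h = h * (1.0 - h) * se_eta
    # Two-sided critical value for the requested confidence level
    z = NormalDist().inv_cdf(1.0 - (1.0 - level / 100.0) / 2.0)
    return h, se_h, (h - z * se_h, h + z * se_h)

# Hypothetical linear predictor and standard error
h, se_h, (lo, hi) = hazard_ci(eta=-1.5, se_eta=0.2)
```

Standard errors for survival probabilities additionally involve the covariances between periods, which this sketch does not attempt.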
Options
    +-------+
----+ Model +------------------------------------------------------------
specify(numlist) The user must specify the category of population members for which the hazard and survival estimates are to be calculated. Currently, if specifications are made with this option, they must be made for each of the variables specified in varlist. Specifications may be separated by spaces, commas, or both.
tpar(#) The user may select alternative parameterizations of time. Such time parameterizations allow a parsimonious smoothing of the effects of time, and are as follows:
-1 Fully discrete time parameterization. This setting is the default, and reflects unique effects of time for each period.
0 Constant time parameterization. This model constrains the effect of time to be constant across all periods. The model includes a prespecified constant term, which is also used in the parameterizations below and so permits model nesting.
N Polynomial time parameterization. This model constrains the effect of time as a polynomial function of order N. If the representation of time is over-specified (i.e. has more predictors than the number of periods in the dataset, or than the number the analysis has been truncated to) then the user will be warned and the parameterization will be reset to its maximum. Lower order models nest within higher order ones. N > 0.
-2 Root time parameterization. This model constrains the effect of time as a square-root function of period (plus constant plus linear terms).
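To make the four settings concrete, the following Python sketch builds the time terms that each parameterization implies for one person-period record. The function time_columns and the ordering of its columns are assumptions for illustration; the variables dthaz actually constructs may differ, and the reset applied to over-specified polynomials is not modeled.

```python
import math

def time_columns(period, n_periods, tpar=-1):
    """Return the time terms for one record under a given tpar() setting."""
    if tpar == -1:
        # Fully discrete: one indicator per period
        return [1.0 if period == j else 0.0 for j in range(1, n_periods + 1)]
    if tpar == 0:
        # Constant: a single constant term shared by all periods
        return [1.0]
    if tpar == -2:
        # Root: constant, linear, and square-root terms
        return [1.0, float(period), math.sqrt(period)]
    if tpar > 0:
        # Polynomial of order N: constant, t, t^2, ..., t^N
        return [float(period) ** k for k in range(tpar + 1)]
    raise ValueError("unsupported tpar value")

row_discrete = time_columns(2, 4)        # period 2 of 4, fully discrete
row_quadratic = time_columns(3, 4, 2)    # period 3, polynomial of order 2
```

Because each polynomial order only appends a term to the previous one, lower-order models nest within higher-order ones, as described above.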
truncate(#) The user may truncate the maximum length of time to event at this number. Data for time periods beyond this point are censored in estimation. Negative values and values greater than the maximum period value are ignored.
Note: Specifying this option for the baseline model will produce exactly the same estimates as for the untruncated model for the given periods, since baseline estimates are always equal to the sample hazard and sample survival functions.
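The note above can be checked with a small Python sketch, offered as an illustration rather than as dthaz's implementation. It assumes person-period records stored as hypothetical (id, period, event) tuples and treats truncation as dropping records beyond the chosen period.

```python
from collections import Counter

def sample_hazard(rows):
    """Sample hazard per period: events in the period / records at risk in it."""
    at_risk = Counter(period for _, period, _ in rows)
    events = Counter(period for _, period, event in rows if event == 1)
    return {p: events[p] / at_risk[p] for p in sorted(at_risk)}

# Hypothetical person-period data: (id, period, event)
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1),
    (2, 1, 1),
    (3, 1, 0), (3, 2, 1),
    (4, 1, 0), (4, 2, 0), (4, 3, 0),
]

full = sample_hazard(rows)
# Truncating at period 2 drops only the records from later periods
truncated = sample_hazard([r for r in rows if r[1] <= 2])
```

The hazards in truncated equal those in full for periods 1 and 2, because each period's sample hazard depends only on that period's own records.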
pretrunc(#) The user may discard early time periods from the new dataset. For example, when pre-truncating with a value of 2, the period that would be indicated by _d3 becomes _d1 instead, and the value of _period is decreased by 2. The dataset is preserved when using this option.
Note: Specifying values of truncate greater than the maximum value of length-to-event minus one (or specifying negative values) produces the same dataset as one with no value of truncate specified. Also, truncate and pretrunc cannot be combined when their values would result in fewer than two periods. Discrete time survival analyses conducted upon pre-truncated datasets are, in effect, analyses conducted upon separate populations from the not pre-truncated datasets if the conditional hazard during the pre-truncated periods is greater than zero. The author suggests that an analyst may desire to perform a pre-truncated analysis either because there are no events during initial periods, or because she is interested in analyzing a surviving sub-population at a later starting period. However, in cases where events occurred during the pre-truncated periods, a survival analysis cannot be said to generalize to the population of the not pre-truncated dataset. In cases where events occur in initial periods, but too rarely to provide reliable estimates for these periods, the analyst should both employ a sensitivity analysis to describe differences between models on pre-truncated and not pre-truncated datasets, and examine the characteristics of anomalous individuals--qualitative data may particularly help illuminate how these persons differ from the majority of individuals who remain in the pre-truncated dataset.
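The renumbering performed by pre-truncation, and the reason it changes the population under study, can be sketched in Python. The function pretruncate and the (id, period, event) record layout are hypothetical stand-ins for the person-period dataset, not dthaz internals.

```python
def pretruncate(rows, k):
    """Drop the first k periods and shift the remaining periods down by k."""
    return [(pid, period - k, event)
            for pid, period, event in rows
            if period > k]

# Hypothetical person-period data: (id, period, event)
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1),
    (2, 1, 1),
    (3, 1, 0), (3, 2, 0), (3, 3, 0), (3, 4, 1),
]

shifted = pretruncate(rows, 2)
remaining_ids = sorted({pid for pid, _, _ in shifted})
```

With a value of 2, the old period 3 becomes period 1. Person 2, whose event occurred in a discarded period, contributes no records at all, so the analysis describes only those who survived the first two periods.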
cloglog This option switches the estimate of the hazard function to a complementary log-log link. This produces estimates under an assumption of proportional hazards, rather than an assumption of proportional odds. The general discrete time complementary log-log hazard model is:
h_i = 1-exp(-exp(a_i*d_i + B*X_i))
Where the parameters follow the same conventions described for the logit hazard model above.
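The proportional hazards property of the complementary log-log link can be illustrated with a short Python sketch. The coefficients are hypothetical, and d_i is again taken to be the indicator for period i.

```python
import math

def cloglog_hazard(eta):
    """h = 1-exp(-exp(eta)) for linear predictor eta."""
    return 1.0 - math.exp(-math.exp(eta))

# Hypothetical period effects and a single predictor effect
alphas = [-2.0, -1.5, -1.0]
b = 0.5

h_x0 = [cloglog_hazard(a) for a in alphas]        # predictor = 0
h_x1 = [cloglog_hazard(a + b) for a in alphas]    # predictor = 1

# -log(1-h) is the cumulative hazard within a period; its ratio between
# the two groups equals exp(b) in every period.
ratios = [math.log(1.0 - h1) / math.log(1.0 - h0)
          for h0, h1 in zip(h_x0, h_x1)]
```

Every entry of ratios equals exp(0.5), about 1.65, whatever the period. Under the logit link it is instead the odds h/(1-h) that scale by a constant factor.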
    +-----------+
----+ SE/Robust +--------------------------------------------------------
cluster(varname) The user may adjust the standard errors of the estimates for person-level (between person) variance in repeated measures designs by specifying the id variable used to construct the person-period dataset.
    +-----------+
----+ Reporting +--------------------------------------------------------
display(#) The user may limit the maximum period for which hazard and survival probabilities are displayed to this number. This option affects only which values are displayed: the estimates and the values returned in r(Hazard) still cover every period up to the maximum period of the person-period dataset. Negative values and values greater than the maximum period value are ignored.
level(#); see [R] estimation options.
model This option includes the estimated model in the output.
suppress Switches off dthaz output. Graphs still display if selected. The estimated model is displayed if the model option is turned on.
    +---------------+
----+ Graph options +----------------------------------------------------
graph(#) Users may opt to graph conditional hazard probabilities (1), survival probabilities (2), both (3), or cumulative incidence probabilities, i.e. 1 - survival (4), against discrete time periods. The options available to graph twoway may also be specified. The default setting is no graph.
Note: the graph() option does not yet plot confidence intervals in Stata 7.
    +---------------+
----+ Miscellaneous +----------------------------------------------------
copyleft dthaz is free software, licensed under the GPL. The copyleft option displays the copying permission statement for dthaz. The full license can be obtained by typing:
. net describe dthaz, from (http://www.doyenne.com/stata)
and clicking on the "click here to get" link for the ancillary file.
Examples
. dthaz
. dthaz sex region, specify(0 6) truncate(6)
. dthaz sex educate class, sp(1, 12, 0) gr(3)
. dthaz party age, sp(0 1) model cloglog
. dthaz, tp(3)
Author
Alexis Dinno
Portland State University
alexis dot dinno at pdx dot edu
Please contact me with any questions, bug reports or suggestions for improvement.
My thanks to Dr. Suzanne Graham.
References
Dinno A and Kim JS. 2011. "Approximating Confidence Intervals About Discrete-Time Survival/Cumulative Incidence Estimates Using the Delta Method." Unpublished (manuscript available on request).
Singer JD and Willett JB. 2003. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford, UK: Oxford University Press. 672 pages.
Willett JB and Singer JD. 1991. "From Whether to When: New Methods for Studying Student Dropout and Teacher Attrition." Review of Educational Research. 61: 407-450.
Singer JD and Willett JB. 1991. "Modeling the Days of Our Lives: Using Survival Analysis When Designing and Analyzing Longitudinal Studies of Duration and Timing of Events." Psychological Bulletin. 110: 268-290.
Saved results
In addition to the results returned by the estimation commands logistic or cloglog, dthaz saves the following in e():
Matrices
  e(Hazard)       Conditional hazard vector for the specified group
  e(HazardSE)     Standard error vector for the conditional hazards
  e(Survival)     Survival probability vector for the specified group
  e(SurvivalSE)   Standard error vector for the survival probabilities
Also See