help dthaz
-------------------------------------------------------------------------------

Title

    dthaz -- Discrete-time hazard and survival probability estimates

Syntax

    dthaz [varlist] [if] [weight] [, specify(numlist) tpar(#) truncate(#) pretrunc(#) cloglog cluster(varname) display(#) level(#) model suppress graph(#) graph_twoway_options copyleft]

    options               Description
    -------------------------------------------------------------------------
    Model
      specify(numlist)    specify values for predicted population values
      tpar(#)             select alternative parameterizations of time
      truncate(#)         truncate the maximum time of length to event
      pretrunc(#)         ignore some initial time periods in the model
      cloglog             use a complementary log-log link (see cloglog)

    SE/Robust
      cluster(varname)    adjust standard errors for intragroup correlation

    Reporting
      display(#)          limit the maximum displayed period
      level(#)            set confidence level; default is level(95)
      model               output model estimate
      suppress            switch off dthaz output

    Graph options
      graph(#)            conditional hazard, survival, or cumulative incidence curves
      twoway_options      graph twoway options

    Miscellaneous
      copyleft            display license information
    -------------------------------------------------------------------------
    fweights, iweights, and pweights are allowed; see weight.

Description

dthaz estimates the hazard and survival probabilities of the population, given the specified model, by means of a logit link (default) or a complementary log-log link. This program requires data in person-period format; person-period variables may be created using prsnperd.

Typed with no varlist and with no tpar() option, dthaz estimates baseline conditional hazard (h) and survival probabilities (S) for the sample. These estimates correspond exactly with actuarial estimates of sample hazard and sample survival functions. Specifying numeric predictors in varlist and the required set of associated values with the specify() option adds them to the model as follows (for logit hazard):

    h_i = 1/(1 + e^-(a_i*d_i + B*X_i))

Where:

a_i is the effect of the ith time period, d_i; B is a vector of effects for a vector of predictors X_i during the ith time period; and

    S_i = (1-h_1)*(1-h_2)* ... *(1-h_i).
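The two formulas above can be checked numerically. The following is a minimal Python sketch, not part of dthaz: the function names and the three linear-predictor values are hypothetical and chosen only for illustration.

```python
import math

def logit_hazard(linear_predictor):
    """Conditional hazard under the logit link: h = 1 / (1 + e^-(xb))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def survival_from_hazards(hazards):
    """Return S_i = (1 - h_1) * (1 - h_2) * ... * (1 - h_i) for each period i."""
    survival = []
    running = 1.0
    for h in hazards:
        running *= (1.0 - h)
        survival.append(running)
    return survival

# Hypothetical values of a_i*d_i + B*X_i for three periods (illustration only).
linear_predictors = [-2.0, -1.5, -1.0]
hazards = [logit_hazard(xb) for xb in linear_predictors]
survival = survival_from_hazards(hazards)

for period, (h, s) in enumerate(zip(hazards, survival), start=1):
    print(f"period {period}: hazard = {h:.4f}, survival = {s:.4f}")
```

Because each factor (1 - h_i) lies between 0 and 1, the survival values can only stay level or fall from one period to the next.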

The reported conditional hazard and survival probabilities are accompanied by standard errors approximated using a first order application of the delta method (Dinno and Kim, 2011). The normally approximated confidence intervals drawn using the graph() option are obtained by application of these standard errors with the alpha specified by level().
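As a rough illustration of how a normally approximated interval follows from an estimate, its standard error, and a confidence level, here is a minimal Python sketch. It is not dthaz's code: the function name and the example estimate and standard error are hypothetical, the delta-method standard errors themselves are not reproduced, and any handling of limits outside the range 0 to 1 is not modeled.

```python
from statistics import NormalDist

def normal_ci(estimate, std_error, level=95.0):
    """Two-sided normal-approximation interval: estimate +/- z * SE."""
    alpha = 1.0 - level / 100.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical hazard estimate and standard error (illustration only).
lower, upper = normal_ci(0.20, 0.03, level=95)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```

Raising the level (for example, level=99) widens the interval because the normal quantile z grows.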

Options

        +-------+
    ----+ Model +------------------------------------------------------------

specify(numlist) The user must specify the category of population members for which the hazard and survival estimates are to be calculated. Currently, if specifications are made with this option, they must be made for each of the variables specified in varlist. Specifications may be separated by spaces, commas, or both.

tpar(#) The user may select alternative parameterizations of time. Such time parameterizations allow a parsimonious smoothing of the effects of time, and are as follows:

-1 Fully discrete time parameterization. This setting is the default, and reflects unique effects of time for each period.

0 Constant time parameterization. This model constrains the effect of time to be constant across all periods. The model includes a prespecified constant term, is used in the following models, and permits model nesting.

N Polynomial time parameterization. This model constrains the effect of time as a polynomial function of order N. If the representation of time is over-specified (i.e. has more predictors than the number of periods in the dataset, or than the number the analysis has been truncated to) then the user will be warned and the parameterization will be reset to its maximum. Lower order models nest within higher order ones. N > 0.

-2 Root time parameterization. This model constrains the effect of time as a square-root function of period (plus constant plus linear terms).
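To make the four parameterizations concrete, the sketch below builds the time terms that each tpar() value implies for a single period. It is an illustrative Python reading of the descriptions above, not dthaz's internal coding: the function name, the ordering of the terms, and the inclusion of a constant in the polynomial case are assumptions.

```python
import math

def time_terms(period, tpar, n_periods):
    """Return the time-related predictor values for one period (hypothetical helper)."""
    if tpar == -1:
        # Fully discrete: one indicator per period, so each period has its own effect.
        return [1.0 if period == j else 0.0 for j in range(1, n_periods + 1)]
    if tpar == 0:
        # Constant: a single constant term shared by all periods.
        return [1.0]
    if tpar == -2:
        # Root: constant, linear, and square-root terms.
        return [1.0, float(period), math.sqrt(period)]
    if tpar > 0:
        # Polynomial of order N: constant, period, period^2, ..., period^N.
        return [float(period) ** k for k in range(0, tpar + 1)]
    raise ValueError("unsupported tpar value")

print(time_terms(2, -1, 4))   # indicator coding for period 2 of 4
print(time_terms(3, 2, 5))    # quadratic terms for period 3
print(time_terms(4, -2, 5))   # root terms for period 4
```

Under this reading, a lower order polynomial uses a subset of the terms of a higher order one, which is what allows the models to nest.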

truncate(#) The user may truncate the maximum time of length to event to this number. The estimate will censor data for time periods beyond this point. Negative values and values greater than the maximum period value are ignored.

Note: Specifying this option for the baseline model will produce exactly the same estimates as for the untruncated model for the given periods, since baseline estimates are always equal to the sample hazard and sample survival functions.

pretrunc(#) The user may discard early time periods from the new dataset. For example, when pre-truncating with a value of 2, the period that would be indicated by _d3 becomes _d1 instead, and the value of _period would be decreased by 2. The dataset is preserved when using this option.

Note: Specifying values of truncate greater than one minus the maximum value of length-to-event (or specifying negative values) produces the same dataset as one with no value of truncate specified. Also, truncate and pretrunc cannot be combined when their values would result in fewer than two periods.

Discrete time survival analyses conducted upon pre-truncated datasets are, in effect, analyses conducted upon separate populations from the not pre-truncated datasets if the conditional hazard during the pre-truncated periods is greater than zero. The author suggests that an analyst may desire to perform a pre-truncated analysis either because there are no events during initial periods, or because she is interested in analyzing a surviving sub-population at a later starting period. However, in cases where events occurred during the pre-truncated periods, a survival analysis cannot be said to generalize to the population of the not pre-truncated dataset.

In cases where events occur in initial periods, but are too few to provide reliable estimates for these periods, the analyst should both employ a sensitivity analysis to describe differences between models on pre-truncated and not pre-truncated datasets and examine the characteristics of anomalous individuals; qualitative data may particularly help illuminate how these persons differ from the majority of individuals who remain in the pre-truncated dataset.

cloglog This option switches the estimate of the hazard function to a complementary log-log link. This produces estimates under an assumption of proportional hazards, rather than an assumption of proportional odds. The general discrete time complementary log-log hazard model is:

    h_i = 1 - exp(-exp(a_i*d_i + B*X_i))

Where the parameters follow the same conventions described for the logit hazard model above.
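For comparison, the two links can be evaluated side by side on the same linear predictor. This is a minimal Python sketch rather than dthaz's code: the function names and the chosen predictor values are hypothetical.

```python
import math

def cloglog_hazard(linear_predictor):
    """Complementary log-log link: h = 1 - exp(-exp(xb))."""
    return 1.0 - math.exp(-math.exp(linear_predictor))

def logit_hazard(linear_predictor):
    """Logit link: h = 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical values of a_i*d_i + B*X_i (illustration only).
for xb in (-5.0, -2.0, 0.0, 1.0):
    print(f"xb = {xb:5.1f}: cloglog = {cloglog_hazard(xb):.4f}, "
          f"logit = {logit_hazard(xb):.4f}")
```

When the hazard is small, both links return nearly the same value; they diverge as the hazard grows.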

        +-----------+
    ----+ SE/Robust +--------------------------------------------------------

cluster(varname) The user may adjust the standard errors of the estimates for person-level (between person) variance in repeated measures designs by specifying the id variable used to construct the person-period dataset.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

display(#) The user may limit the maximum period for which hazard and survival probabilities are displayed to this number. This option only affects which values are displayed. The estimates and the values returned in r(Hazard) remain as for the maximum period of the person-period dataset. Negative values and values greater than the maximum period value are ignored.

level(#); see [R] estimation options.

model This option includes the estimated model in the output.

suppress Switches off dthaz output. Graphs still display if selected. The estimated model is displayed if the model option is turned on.

        +---------------+
    ----+ Graph options +----------------------------------------------------

graph(#) Users may opt to graph conditional hazard probabilities (1), survival probabilities (2), both (3), or cumulative incidence probabilities, i.e. 1 - survival, (4) against discrete time periods. Options available to graph twoway may also be used. The default setting is no graph.

Note: the graph() option does not yet plot confidence intervals in Stata 7.

        +---------------+
    ----+ Miscellaneous +----------------------------------------------------

copyleft dthaz is free software, licensed under the GPL. The copyleft option displays the copying permission statement for dthaz. The full license can be obtained by typing:

    . net describe dthaz, from(http://www.doyenne.com/stata)

and clicking on the "click here to get" link for the ancillary file.

Examples

. dthaz

. dthaz sex region, specify(0 6) truncate(6)

. dthaz sex educate class, sp(1, 12, 0) gr(3)

. dthaz party age, sp(0 1) model cloglog

. dthaz, tp(3)

Author

Alexis Dinno
Portland State University
alexis dot dinno at pdx dot edu

Please contact me with any questions, bug reports or suggestions for improvement.

My thanks to Dr. Suzanne Graham.

References

Dinno A and Kim JS. 2011. "Approximating Confidence Intervals About Discrete-Time Survival/Cumulative Incidence Estimates Using the Delta Method." Unpublished (manuscript available on request).

Singer JD and Willett JB. 2003. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford, UK: Oxford University Press. 672 pages.

Willett JB and Singer JD. 1991. "From Whether to When: New Methods for Studying Student Dropout and Teacher Attrition." Review of Educational Research. 61: 407-450.

Singer JD and Willett JB. 1991. "Modeling the Days of Our Lives: Using Survival Analysis When Designing and Analyzing Longitudinal Studies of Duration and Timing of Events." Psychological Bulletin. 110: 268-290.

Saved results

In addition to the results returned by the estimation commands logistic or cloglog, dthaz saves the following in e():

Matrices
    e(Hazard)        Conditional hazard vector for the specified group
    e(HazardSE)      Standard error vector for the conditional hazards
    e(Survival)      Survival probability vector for the specified group
    e(SurvivalSE)    Standard error vector for the survival probabilities

Also See