Title
prsnperd -- A utility for creating person-period datasets for discrete time longitudinal analyses
Syntax
prsnperd id length-to-event [censor] [, truncate(#) pretrunc(#) cswitch tvp(names) fev(name) copyleft]
options          Description
-------------------------------------------------------------------------
Miscellaneous
  truncate(#)    truncate the maximum time of length-to-event
  pretrunc(#)    ignore some initial time periods in the model
  cswitch        invert censor coding
  tvp(names)     provide root names of flat-encoded time varying predictors
  fev(name)      provide root name of flat-encoded time varying event occurrence
  copyleft       display license information
-------------------------------------------------------------------------
Description
prsnperd transforms a person-time dataset into a person-period dataset for discrete-time longitudinal analyses, for example, using dthaz. Input variables are id: the unique id number of each observed individual in the person-time dataset; length-to-event: the duration to event occurrence (in number of discrete time intervals since the study's Beginning of Time); and censor, which indicates censoring status of the observed individual (where 0 = not censored; and 1 = censored, unless the cswitch option is used). Given an input data set of this form, an output dataset is created with expanded observations and several new variables.
NOTE: individuals who were never observed to have experienced an event should be coded as having a length-to-event equal to their total time in the study, and should be censored.
Each individual observation within the person-time dataset is replaced with a number of new observations equal to length-to-event for that id. If there is no event occurrence for a given time period, the user is so notified. Within these new observations, either one or several new variables are created, depending on whether the survival-analysis or growth-modeling syntax is used. If the application is growth modeling, only the _period variable is created; otherwise all of the following variables are produced.
_period Specific time interval of this observation. Each id will have at least one observation with _period = 1. The maximum value for _period is equal to the maximum length-to-event of the person-time dataset (or to truncate if specified).
_d1-_dX (Where X is the maximum value for period) These are indicator variables (i.e. "dummy variables") for the current period.
_Y _Y indicates event occurrence for the given period (where 0 = event did not happen and 1 = event happened). _Y serves as the outcome in event history models. For example, a simple logit hazard model:
. logit _Y _d1-_d8, nocons
produces an estimate of the baseline hazard that corresponds exactly to the sample hazard, where ^H(t_j) = 1/(1 + e^-(B_j)) and B_j is the coefficient on _dj. The estimate becomes more interesting when additional predictors are added, thus:
. logit _Y _d1-_d8 age, nocons or
Differences in ^H(t_j) can therefore be explored using standard nested models with multiple predictors. The or option reports odds ratios for each predictor, that is, the multiplicative change in the odds of event occurrence versus non-occurrence.
_status A categorical status variable for producing life-tables (where 1 = event occurred; 2 = event did not occur; and 3 = censored). Life tables with sample hazard can be created by using the following:
. tabulate _period _status, row
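The expansion described above can be illustrated outside Stata. The following is a minimal Python sketch, not prsnperd's own code; the function names person_period, sample_hazard and inv_logit are hypothetical, and the rule that _status = 3 is assigned only to the final period of a censored individual is an assumption made for illustration.

```python
import math

def person_period(rows):
    """Expand (id, length, censored) tuples into person-period records.

    censored uses the default coding: 0 = not censored, 1 = censored.
    """
    rows = list(rows)
    max_len = max(length for _, length, _ in rows)
    out = []
    for pid, length, censored in rows:
        for period in range(1, length + 1):
            last = period == length
            event = 1 if (last and censored == 0) else 0
            if event:
                status = 1          # event occurred
            elif last:
                status = 3          # censored (assumed: final period only)
            else:
                status = 2          # event did not occur
            rec = {"id": pid, "_period": period, "_Y": event, "_status": status}
            # one indicator per period, _d1 ... _dX
            for j in range(1, max_len + 1):
                rec[f"_d{j}"] = 1 if j == period else 0
            out.append(rec)
    return out

def sample_hazard(records):
    """Events in period j divided by the number at risk in period j."""
    at_risk, events = {}, {}
    for r in records:
        j = r["_period"]
        at_risk[j] = at_risk.get(j, 0) + 1
        events[j] = events.get(j, 0) + r["_Y"]
    return {j: events[j] / at_risk[j] for j in sorted(at_risk)}

def inv_logit(b):
    """Hazard implied by a logit coefficient: 1 / (1 + e^-b)."""
    return 1.0 / (1.0 + math.exp(-b))
```

For a period with sample hazard h strictly between 0 and 1, inv_logit(log(h / (1 - h))) returns h, which is the sense in which the dummy-only logit model reproduces the sample hazard.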
Options
        +---------------+
    ----+ Miscellaneous +----------------------------------------------------
truncate(#) restricts the maximum value for length-to-event, censoring those observations with integer values greater than truncate.
NOTE: Specifying values of truncate greater than the maximum value of length-to-event (or specifying negative values) produces the same dataset as one with no value of truncate specified.
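The effect of truncate on the input data can be sketched as follows. This is an illustration in Python, not prsnperd's implementation; the name truncate_rows is hypothetical, and treating a value of zero like a negative value (no truncation) is an assumption.

```python
def truncate_rows(rows, truncate):
    """Cap length-to-event at `truncate`, censoring anyone beyond it.

    rows: (id, length, censored) tuples with 1 = censored.
    """
    rows = list(rows)
    if truncate <= 0:               # negative per the NOTE; zero is an assumption
        return rows
    out = []
    for pid, length, censored in rows:
        if length > truncate:
            out.append((pid, truncate, 1))   # shortened and marked censored
        else:
            out.append((pid, length, censored))
    return out
```

A value of truncate above the largest length-to-event leaves every row unchanged, which matches the NOTE above.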
pretrunc(#) discards early time periods from the new dataset. For example, when pre-truncating with a value of 2, the period that would be indicated by _d3 becomes _d1 instead, and the value of _period would be decreased by 2.
NOTE: Specifying values of pretrunc greater than one minus the maximum value of length-to-event (or specifying negative values) produces the same dataset as one with no value of pretrunc specified. Also, truncate and pretrunc cannot be combined when their values would result in fewer than two periods.

Discrete-time survival analyses conducted upon pre-truncated datasets are, in effect, analyses of a different population from that of the not pre-truncated dataset if the conditional hazard during the pre-truncated periods is greater than zero. The author suggests that an analyst may desire to perform a pre-truncated analysis either because there are no events during initial periods, or because she is interested in analyzing a surviving sub-population at a later starting period. However, in cases where events occurred during the pre-truncated periods, a survival analysis cannot be said to generalize to the population of the not pre-truncated dataset.

In cases where events occur in initial periods, but too rarely to provide reliable estimates for these periods, the analyst should both employ a sensitivity analysis to describe differences between models on pre-truncated and not pre-truncated datasets, and examine the characteristics of the anomalous individuals. Qualitative data may particularly help illuminate how these persons differ from the majority of individuals who remain in the pre-truncated dataset.
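As a rough illustration of pre-truncation, the Python sketch below drops the early periods from a set of person-period records and renumbers what remains. It is not prsnperd's code: the name pretruncate is hypothetical, the records are plain dictionaries, and the period indicators (which would be renumbered in the same way) are omitted for brevity.

```python
def pretruncate(records, pretrunc):
    """Discard periods 1..pretrunc and shift later periods down by pretrunc.

    records: dicts with at least "id", "_period" and "_Y".
    Individuals whose last period falls within the discarded span
    disappear from the result entirely.
    """
    if pretrunc <= 0:               # non-positive values: return an unchanged copy
        return [dict(r) for r in records]
    out = []
    for r in records:
        if r["_period"] > pretrunc:
            out.append({**r, "_period": r["_period"] - pretrunc})
    return out
```

Note that an individual whose event occurred during the discarded periods contributes nothing to the result, which is why the pre-truncated analysis describes only the surviving sub-population.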
cswitch tells prsnperd to expect that censored data are coded with 0 = censored, and 1 = event/failure.
tvp(names) generates variable(s) with the supplied name(s) if the names correspond precisely to prefixed portions of flat coded time varying predictors. Person-time data sets are often constructed with time-varying predictors encoded in such a format (for example, predictor1, predictor2, predictor3, predictor4, where the numeric suffix indicates which time-period the observation was made in). Missing values will not be imputed. The time-designation in the suffix must be ordered in the same manner as the periods of observation.
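The reshaping that tvp performs can be pictured with a small Python sketch. This is an illustration only, under the assumption that the suffixes run 1, 2, 3, ... in period order; the name expand_tvp is hypothetical, and missing values are represented by None and passed through untouched.

```python
def expand_tvp(row, root, length):
    """Turn flat columns root1, root2, ... into one value per period.

    row:    dict for one individual, e.g. {"id": 7, "predictor1": 0.5, ...}
    root:   common prefix of the flat-encoded predictor
    length: number of periods this individual contributes
    """
    return [
        {"id": row["id"], "_period": t, root: row.get(f"{root}{t}")}
        for t in range(1, length + 1)
    ]
```

A missing value in period 2 stays missing in the period-2 record; nothing is carried forward or interpolated.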
fev(name) constructs variables named length_to_event and censored with appropriate values when event data are in a flat indicator format (for example, event1 event2 event3 event4) rather than in a single length-to-event variable. Specify the common portion of the event variables' names (for example, "event"). This option assumes that all event variables share a common prefix (name), that each event variable takes the values 0 (no event), 1 (event, or first event), or . (censored), and that there is no left-censoring of observations. The created variables will override supplied length-to-event and censored variables. prsnperd will exit with an error if it encounters left-censored data with the fev option. fev also expects that no data are middle-censored (i.e. all time periods have been observed for each individual between the study's beginning of time and either the first occurrence of the event, or right-censoring).
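The rules above for deriving length-to-event and censoring status from flat event indicators can be sketched in Python. This is an illustration, not prsnperd's implementation: the name length_and_censor is hypothetical, missing (.) is represented by None, and raising an error for middle-censored data is an assumption (the text above says only that such data are not expected).

```python
def length_and_censor(events):
    """Return (length_to_event, censored) from per-period indicators.

    events: list ordered by period; 0 = no event, 1 = event, None = missing.
    censored uses the default coding: 0 = not censored, 1 = censored.
    """
    if not events or events[0] is None:
        raise ValueError("left-censored data are not supported")
    for t, e in enumerate(events, start=1):
        if e == 1:
            return t, 0                      # first event in period t
        if e is None:
            # every later period must also be missing (right-censoring)
            if any(x is not None for x in events[t:]):
                raise ValueError("middle-censored data are not supported")
            return t - 1, 1                  # last observed period is t - 1
    return len(events), 1                    # never experienced the event
```

An individual observed in all periods without an event gets a length equal to the total number of periods and is marked censored, in line with the NOTE in the Description.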
copyleft prsnperd is free software, licensed under the GPL. The copyleft option displays the copying permission statement for prsnperd which is a part of the dthaz package. The full license can be obtained by typing:
. net describe dthaz, from (http://www.doyenne.com/stata)
and clicking on the "click here to get" link for the ancillary file.
Examples
. prsnperd id length censored
. prsnperd id length censored, truncate(8)
. prsnperd id, tvp(predictor) fev(event)
Author
Alexis Dinno Portland State University alexis dot dinno at pdx dot edu
Please contact me with any questions, bug reports or suggestions for improvement.
My thanks to Dr. Suzanne Graham, Dr. Jim Stiles, and Dr. Anna Song.
References
Singer JD and Willett JB. 2003. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford, UK: Oxford University Press. 672 pages.
Willett JB and Singer JD. 1991. "From Whether to When: New Methods for Studying Student Dropout and Teacher Attrition." Review of Educational Research. 61: 407-450.
Singer JD and Willett JB. 1991. "Modeling the Days of Our Lives: Using Survival Analysis When Designing and Analyzing Longitudinal Studies of Duration and Timing of Events." Psychological Bulletin. 110: 268-290.
Also See
Help: dthaz, msdthaz