help sqset (SJ6-4: st0111) -------------------------------------------------------------------------------

Title

sqset -- Declare a dataset to be sequence data

Syntax

Declare data to be sequence data and specify the element variable, the sequence identifier, and the sequence order (positions)

sqset elementvar idvar ordervar [, trim rtrim ltrim keeplongest ]

where elementvar is the variable that contains the elements of the sequences, idvar is a variable that identifies the sequences, and ordervar is a variable that defines the order (position) of the elements within each sequence.

Display how dataset is currently sqset

sqset

Clear sequence data settings

sqset, clear

Description

sqset declares the data to be sequence data. elementvar is the variable that holds the elements of the sequences, idvar is the variable that identifies the sequences, and ordervar is the variable that defines the order of the elements within each sequence.

sqset performs various checks on the data and reports the results to the user.

sqset without arguments displays whether and how the dataset is currently set.

sqset, clear erases the settings from the data. It is rarely used.

To use sqset, sequence data has to be in long format. Use reshape to change sequence data in wide format to sequence data in long format.
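
As a minimal sketch of the wide-to-long step, assume a small made-up dataset with one row per sequence and the elements stored in st1, st2, and st3. The data values and variable names are hypothetical and serve only as illustration:

. clear
. input id st1 st2 st3
  1 1 2 .
  2 1 1 1
  end
. reshape long st, i(id) j(order)
. sqset st id order

After reshape, each row holds one element (st) of one sequence (id) at one position (order), which is the layout sqset expects.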

Options

trim combines ltrim and rtrim. Generally, we recommend using this option.

rtrim erases empty elements at the end of the sequences. Sequence data that stem from data in wide format often contain missing values at the end of sequences. Generally, these observations are not needed, so they can be erased without loss of information.

ltrim strips all empty elements at the beginning of the sequences. Generally, all sequences should start with the element at the first position. In practice, sequence data sometimes have one or more empty elements at the beginning, which we call "gaps at the beginning". Gaps at the beginning are very common in sequence data that originate from unbalanced cross-sectional time-series data (see xt). The SQ-Ados differ slightly in how they deal with gaps at the beginning. Such gaps are, however, somewhat awkward, because they mean that a sequence does not start at the first position. ltrim changes the sequence data so that all sequences start at position one. Note that the dataset is changed by the option ltrim.

keeplongest keeps only the longest section of each sequence that has unknown (missing) elements at specific positions. Generally, a missing value at a certain position in a sequence is just another element, so there is no specific technical problem. For some commands (e.g., sqom), however, it is necessary to consider how similar the missing value is to each of the other elements of a sequence. The SQ-Ados themselves cannot answer this question. To alert the user to such problems, sqset checks for missing elements and displays a note that points to the option keeplongest. sqset with keeplongest forces Stata to keep only the longest available section of a sequence that contains missings. If several sections of a sequence have the same length, Stata will randomly select one of them. Note that the dataset is changed by the option keeplongest, and that keeplongest is only one of several ways to deal with missing elements in a sequence (see sq).
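
The options above might be used as in the following sketch, which assumes long-format data with the variables st, id, and order as in the example in this help file. The recommended trim option would be specified as

. sqset st id order, trim

Alternatively, starting again from data that have not yet been sqset, keeplongest could be tried out without permanently losing observations. Wrapping the call in preserve and restore is an assumption about a cautious workflow, not something the SQ-Ados require:

. count if missing(st)
. preserve
. sqset st id order, keeplongest
. count
. restore

The first count shows how many elements are missing before any observations are removed. The second count shows how many observations remain after keeplongest. restore brings back the original data.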

Examples

. use http://www.wz-berlin.de/~kohler/ado/youthemp, clear
. reshape long st, i(id) j(order)
. sqset st id order
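
Continuing the example above, the two remaining syntax forms could be used as follows. The first command displays how the dataset is currently set, and the second erases the settings:

. sqset
. sqset, clear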

Author

Ulrich Kohler, WZB, kohler@wz-berlin.de

Also see

Online: sq, sqdemo, sqset, sqdes, sqegen, sqstat, sqindexplot, sqparcoord, sqom, sqclusterdat, sqclustermat