{smcl}
{* 10april2006}{...}
{vieweralsosee "sqclusterdat" "help sqclusterdat "}{...}
{vieweralsosee "sqdes" "help sqdes "}{...}
{vieweralsosee "sqegen" "help sqegen "}{...}
{vieweralsosee "sqindexplot" "help sqindexplot "}{...}
{vieweralsosee "sqmdsadd" "help sqmdsadd "}{...}
{vieweralsosee "sqmodalplot" "help sqmodalplot "}{...}
{vieweralsosee "sqom" "help sqom "}{...}
{vieweralsosee "sqpercentageplot" "help sqpercentageplot "}{...}
{vieweralsosee "sqset" "help sqset "}{...}
{vieweralsosee "sqstat" "help sqstat "}{...}
{vieweralsosee "sqstrlev" "help sqstrlev "}{...}
{vieweralsosee "sqstrmerge" "help sqstrmerge "}{...}
{vieweralsosee "sqtab" "help sqtab "}{...}

{cmd:help sqset}{right:(SJ6-4: st0111)}
{hline}

{title:Title}

{p2colset 5 14 16 2}{...}
{p2col :{hi: sqset} {hline 2}}Declare a dataset to be sequence data{p_end}
{p2colreset}{...}


{title:Syntax}

{pstd}Declare data to be sequence data and specify element variable,
the sequence identifier and sequence order (positions)

{p 8 15 2}
{cmd:sqset} {it:elementvar idvar ordervar}
                [{cmd:, trim rtrim ltrim keeplongest} ]

{pstd} where {it:elementvar} is the variable that contains the
    elements of sequences, {it:idvar} is a variable that
    identifies the sequences, and {it:ordervar} is a variable that
    defines the order of the sequences.

{pstd}Display how dataset is currently sqset

{p 8 15 2}
{cmd:sqset}

{pstd}Clear sequence data settings

{p 8 15 2}
{cmd:sqset, clear}


{title:Description}

{pstd}
{cmd:sqset} declares the data to be sequence data and designates that
{it:elementvar} represents the variable that represents the elements
of the sequences, {it:idvar} should be an identifier of the
sequences, and {it:ordervar} should be a variable that defines the
order of each sequence.

{pstd} When using {cmd:sqset} various checks on the data are
performed, and reported back to the user.

{pstd} {cmd:sqset} without arguments displays whether and how the
 dataset is currently set.

{pstd} {cmd:sqset, clear} is a rarely used to erase the settings from the data. 

{pstd} To use {cmd:sqset}, sequence data has to be in long format. Use
{helpb reshape} to change sequence data in wide format to sequence data
in long format.


{title:Options}

{phang} {cmd:trim} means both, {cmd:ltrim} and {cmd:rtrim}. Generally,
we recommend using this option.

{phang} {cmd:rtrim} erases empty elements at the end of the
sequences. Sequence data that stem from data in wide format often
contain missing values at the end of sequences. Generally, there is no
need for these observations, so that they can simply be erased without
loss of information. 

{phang} {cmd:ltrim} strips all empty elements at the beginning of the
sequences. Generally, all sequences should start with the element at
the first position. In practice, sequence data sometimes have one or
more empty elements at the beginning, which we call "gaps at the
beginning". Gaps at the beginning are very common for sequence data
that has its origins in unbalanced cross-sectional time-series data
(see {help xt}). The SQ-Ados slightly differ in how they deal
with gaps at the beginning. Gaps at the beginning are, however,
somewhat ill-treated, as this means that a sequence does not start at
the first position. {cmd:ltrim} changes the sequence data in such a
way that all sequences starts at position one. The dataset is changed
by the option {cmd:ltrim}.

{phang} {cmd:keeplongest} keeps only the longest section of a sequence
with unknown elements at specific positions. Generally, a missing
value at a certain position in a sequence is just another element, so
there is no specific technical problem. For some commands
(i.e., {helpb sqom}), it is however necessary to consider how similar the
missing value is with each of the other elements of a sequence. The
answer to this cannot be given by the SQ-Ados themselves. In order to
point the user to such problems, {cmd:sqset} checks for missing
elements and provides a note. The note points the user to the option
{cmd:keeplongest}. {cmd:sqset} with {cmd:keeplongest} will force Stata
to keep only the longest available section of a sequence that contains
missings. If several sections of a sequence have the same, Stata will
randomly select one of them. The dataset is changed by the option
{cmd:keeplongest}, and that {cmd:keeplongest} is only one of several
ways to deal with missing elements in a sequence
(see {help sq##3:sq}). 


{title:Examples}

{phang}{cmd:. use http://www.wz-berlin.de/~kohler/ado/youthemp, clear}{p_end}
{phang}{cmd:. reshape long st, i(id) j(order)}{p_end}
{phang}{cmd:. sqset st id order}


{title:Author}

{pstd}Ulrich Kohler, University of Potsdam, ulrich.kohler@uni-potsdam.de{p_end}


{title:Also see}

{psee} Online: {helpb sq}, {helpb sqdemo}, {helpb sqset},
                {helpb sqdes}, {helpb sqegen}, {helpb sqstat},
		{helpb sqindexplot}, {helpb sqparcoord}, {helpb sqom},
		{helpb sqclusterdat}, {helpb sqclustermat}
{p_end}