{smcl}
{* 27mar2006}{...}
{hline}
help for {cmd:sqegen}{right:(SJ6-4: st0111)}
{hline}

{title:Extensions to generate (for sequence data)}

{p 8 17 2}{cmd:egen} [{it:type}] {it:newvar} {cmd:=} {it:sqfcn}{cmd:()} {ifin} [{cmd:,} {it:options}]

{phang}{cmd:Note:} All functions described here allow the option
{cmd:subsequence(a,b)}. It restricts the function to the part of the
sequence between positions a and b, where a and b refer to positions as
defined by the order variable. {p_end}

{title:Description}

{pstd} {helpb egen} creates {it:newvar} of the optionally specified
storage type equal to {it:sqfcn}{cmd:()}. Unlike in standard {cmd:egen}
syntax, the argument of {it:sqfcn}{cmd:()} is generally left empty.

{title:Functions}

{phang} {cmd:sqallpos()} {cmd:,} {opt pat:tern(string)} [ {cmd:gapinclude}
{opt subseq:uence(range)} ] generates a variable holding the number of
occurrences of the given pattern in the sequence. To specify the pattern,
use element[:repetitions] [element:repetitions]. For example, with
{cmd: pattern(3:20 5 1:20 3:20)} you specify a pattern of length 61,
starting with element 3 over 20 positions, followed by one position of
element 5, 20 positions of element 1, and finally another 20 positions of
element 3. Note: The program counts only independent occurrences of the
pattern; that is, an occurrence that starts at a position within an already
counted occurrence is skipped. For example, consider the sequence
"A A A B A A", in which you want to count the number of occurrences of the
pattern "A A". The program will count the pattern "A A" starting at
positions 1 and 5. It will skip "A A" starting at position 2 because its
first element is part of the first instance. {p_end}

{p 8 8 0} See also the egen function {cmd:sqfirstpos()} below for the
position of the first occurrence of a pattern.{p_end}

{phang} {cmd:sqelemcount()} [{cmd:,} {opt e:lement(#)} {cmd:gapinclude}]
generates a variable holding the number of different elements in each
sequence.
If {cmd:gapinclude} is specified, the variable is defined even for
sequences containing gaps. Missing values are generally counted as an
element of their own. You might consider using {cmd:sqset} with option
{cmd:trim} to get rid of superfluous missing values.

{phang} {cmd:sqepicount()} [{cmd:,} {opt e:lement(#)} {cmd:gapinclude}]
separates a sequence into sections of equal elements (called "episodes")
and generates a variable holding the number of episodes for each sequence.
With option {cmd:element()}, only the episodes of the specified element are
counted. If {cmd:gapinclude} is specified, the variable is defined even for
sequences containing gaps. Episodes of missing values are generally treated
as an element of their own. You might consider using {cmd:sqset} with
option {cmd:trim} to get rid of superfluous missing values.

{phang} {cmd:sqfirstpos()} {cmd:,} {opt pat:tern(string)} [ {cmd:gapinclude}
{opt subseq:uence(range)} ] generates a variable holding the position of
the first occurrence of the given pattern. To specify the pattern, use
element[:repetitions] [element:repetitions]. For example, with
{cmd: pattern(3:20 5 1:20 3:20)} you specify a pattern of length 61,
starting with element 3 over 20 positions, followed by one position of
element 5, 20 positions of element 1, and finally another 20 positions of
element 3. {p_end}

{p 8 8 0} See also the egen function {cmd:sqallpos()} above for the number
of occurrences of a pattern.{p_end}

{phang} {cmd:sqfreq()} [{cmd:,} {cmd:gapinclude so se}
{opt subseq:uence(range)} ] generates a variable holding the frequency of
each sequence type. These are the numbers given in the output of
{help sqtab}, stored as a variable. The options {cmd: so} and {cmd: se} are
described in detail under {help sqtab}. If {cmd:gapinclude} is specified,
the variable is defined even for sequences containing gaps. Missing values
are treated as yet another element. You might consider using {cmd:sqset}
with option {cmd:trim} to get rid of superfluous missing values.{p_end}
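{pstd}A minimal sketch of how the two pattern functions {cmd:sqallpos()}
and {cmd:sqfirstpos()} might be used together. It assumes that the data
have already been declared with {helpb sqset} and that one of the elements
is coded 1; the variable names {cmd:npat} and {cmd:firstpat} are
hypothetical.{p_end}

{phang}{cmd:. egen npat = sqallpos(), pattern(1:2)}{p_end}
{phang}{cmd:. egen firstpat = sqfirstpos(), pattern(1:2)}{p_end}

{pstd}Here {cmd:pattern(1:2)} stands for two consecutive positions of
element 1. By the counting rule described under {cmd:sqallpos()}, a
sequence "1 1 1 2 1 1" would be expected to yield {cmd:npat} equal to 2,
because the overlapping occurrence starting at position 2 is skipped.{p_end}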
{phang} {cmd:sqgapcount()} generates a variable holding the number of gap
episodes in each sequence. Only gaps within a sequence are counted as gaps
(see {help sq##3:sq}). You might consider using {cmd:sqset} with option
{cmd:trim} to get rid of "gaps" at the beginning or the end of sequences.

{phang} {cmd:sqgaplength()} generates a variable holding the overall length
of gap episodes in each sequence. Only gaps within a sequence are counted
as gaps (see {help sq##3:sq}). You might consider using {cmd:sqset} with
option {cmd:trim} to get rid of "gaps" at the beginning or the end of
sequences.

{phang} {cmd:sqlength()} [{cmd:,} {opt e:lement(#)} {cmd:gapinclude}]
generates a variable holding the length -- the number of positions -- of
each observed sequence. With option {cmd:element()}, the variable instead
holds the total length of all episodes of the specified element. If
{cmd:gapinclude} is specified, the variable is defined even for sequences
containing gaps. Episodes of missing values add to the length of the
sequence. You might consider using {cmd:sqset} with option {cmd:trim} to
get rid of superfluous missing values.

{phang} {cmd:sqranks()} [{cmd:,} {cmd:gapinclude so se}
{opt subseq:uence(range)} ] generates a variable holding the rank of each
sequence type in the frequency "league table" of sequence types. These are
the numbers that define the order of frequencies in the output of
{help sqtab}, stored as a variable. The options {cmd: so} and {cmd: se} are
described in detail under {help sqtab}. If {cmd:gapinclude} is specified,
the variable is defined even for sequences containing gaps. Missing values
are treated as yet another element. You might consider using {cmd:sqset}
with option {cmd:trim} to get rid of superfluous missing values.

{title:Author}

{pstd}Ulrich Kohler, WZB, kohler@wz-berlin.de{p_end}

{title:Examples}

{phang}{cmd:. egen length = sqlength()}

{phang}{cmd:. egen length1 = sqlength(), element(1) gapinclude}

{phang}{cmd:. egen elemnum = sqelemcount()}

{phang}{cmd:. egen epinum = sqepicount()}

{title:Also see}

{psee}Manual: {bf:[D] egen}

{psee}Online: {helpb egenmore} (if installed), {helpb sq}, {helpb sqdemo},
{helpb sqset}, {helpb sqdes}, {helpb sqegen}, {helpb sqstat},
{helpb sqindexplot}, {helpb sqparcoord}, {helpb sqom},
{helpb sqclusterdat}, {helpb sqclustermat} {p_end}