Extensions to generate (for sequence data)
egen [type] newvar = sqfcn() [if] [in] [, options]
Note: All functions described here allow the option subsequence(a,b). It restricts the computation to the part of the sequence between positions a and b, where a and b refer to positions as defined by the order variable.
Description
egen creates newvar of the optionally specified storage type equal to sqfcn(). Unlike in standard egen syntax, the argument of sqfcn() is generally left empty.
Functions
sqallpos() , pattern(string) [ gapinclude subsequence(range) ] generates a variable holding the number of occurrences of the given pattern in the sequence. To specify the pattern, use element[:repetitions] [element:repetitions]. For example, pattern(3:20 5 1:20 3:20) specifies a pattern of length 61, starting with element 3 over 20 positions, followed by one position of element 5, 20 positions of element 1, and finally another 20 positions of element 3. Note: The program only considers independent occurrences of the pattern, i.e., a pattern that starts at a position within an already counted pattern is skipped. For example, consider the sequence "A A A B A A", in which you want to count the number of occurrences of the pattern "A A". The program will count the pattern "A A" starting at positions 1 and 5. It will skip "A A" starting at position 2 because its first element is part of the first instance. Also see the egen function sqfirstpos() below for the position of the first occurrence of a pattern.
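The counting rule in the note can be tried on the "A A A B A A" example. The following is a minimal sketch, assuming the SQ package is installed, that sqset takes the element, identifier, and order variables in that sequence, and that A is coded as 1 and B as 2. The data and the variable names id, order, st, and n_aa are made up for illustration.

```stata
* Toy data (hypothetical): one sequence "A A A B A A", with A = 1 and B = 2
clear
input id order st
1 1 1
1 2 1
1 3 1
1 4 2
1 5 1
1 6 1
end

* Declare the sequence data (assumed order: element, identifier, order)
sqset st id order

* Count independent occurrences of "A A" (element 1 over 2 positions)
egen n_aa = sqallpos(), pattern(1:2)

list id order st n_aa, sepby(id)
```

By the rule described in the note, the occurrences starting at positions 1 and 5 are counted, while the one starting at position 2 is skipped.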
sqelemcount() [, element(#) gapinclude] generates a variable holding the number of different elements in each sequence. If gapinclude is specified, variables get defined even for sequences containing gaps. Missing values are generally counted as an element of their own. You might consider using sqset with option trim to get rid of superfluous missings.
sqepicount() [, element(#) gapinclude] separates a sequence into sections of equal elements (called "episodes") and generates a variable holding the number of episodes in each sequence. With option element(), only the number of episodes of the specified element is generated. If gapinclude is specified, variables get defined even for sequences containing gaps. Episodes of missing values are generally counted as episodes of an element of their own. You might consider using sqset with option trim to get rid of superfluous missings.
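To make the notion of an episode concrete, here is a small sketch on made-up data. It assumes the SQ package is installed and that sqset takes the element, identifier, and order variables in that sequence. The variable names are hypothetical.

```stata
* Toy data (hypothetical): one sequence "1 1 2 2 1"
clear
input id order st
1 1 1
1 2 1
1 3 2
1 4 2
1 5 1
end

* Declare the sequence data (assumed order: element, identifier, order)
sqset st id order

* Number of episodes of any element
egen epinum = sqepicount()

* Number of episodes of element 1 only
egen epinum1 = sqepicount(), element(1)

list id order st epinum epinum1, sepby(id)
```

Under the definition above, the sequence splits into the episodes "1 1", "2 2", and "1", so there are three episodes overall and two of them belong to element 1.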
sqfirstpos() , pattern(string) [ gapinclude subsequence(range) ] generates a variable holding the position of the first occurrence of the given pattern. To specify the pattern, use element[:repetitions] [element:repetitions]. For example, pattern(3:20 5 1:20 3:20) specifies a pattern of length 61, starting with element 3 over 20 positions, followed by one position of element 5, 20 positions of element 1, and finally another 20 positions of element 3.
Also see the egen function sqallpos() above for the number of occurrences of a pattern.
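A brief sketch of how pattern() and subsequence() might be combined. It assumes the data have already been declared with sqset and that element 1 occurs in the sequences. The new variable names are made up.

```stata
* Assumes sequence data already declared with sqset

* Position of the first run of element 1 lasting 3 positions
egen first111 = sqfirstpos(), pattern(1:3)

* Same search, restricted to positions 1 through 12 of the order variable
egen first111_12 = sqfirstpos(), pattern(1:3) subsequence(1,12)
```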
sqfreq() [, gapinclude so se subsequence(range) ] generates a variable holding the frequencies of each sequence-type. These are the numbers given in the output of sqtab stored as a variable. The options so and se are described in detail under sqtab. If gapinclude is specified, variables get defined even for sequences containing gaps. Missing values are used as yet another element. You might consider using sqset with option trim to get rid of superfluous missings.
sqgapcount() generates a variable holding the number of gap episodes in each sequence. Only gaps within a sequence are counted as gaps (see sq). You might consider using sqset with option trim to get rid of "gaps" at the beginning or the end of sequences.
sqgaplength() generates a variable holding the overall length of gap episodes in each sequence. Only gaps within a sequence are counted as gaps (see sq). You might consider using sqset with option trim to get rid of "gaps" at the beginning or the end of sequences.
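The two gap functions are often useful together. The sketch below assumes that id, order, and st are the identifier, order, and element variables of the data in memory (hypothetical names) and that sqset takes them in the order element, identifier, order. The derived variable gapmean is purely illustrative.

```stata
* Declare the sequence data; option trim (see sqset) drops missings
* at the beginning and the end of sequences
sqset st id order, trim

* Number and overall length of gap episodes per sequence
egen gapnum = sqgapcount()
egen gaplen = sqgaplength()

* Illustrative follow-up: average length of a gap episode
generate gapmean = gaplen / gapnum if gapnum > 0 & !missing(gapnum)
```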
sqlength() [, element(#) gapinclude] generates a variable holding the length -- the number of positions -- of each observed sequence. With option element(), the overall length of all episodes of the specified element is generated. If gapinclude is specified, variables get defined even for sequences containing gaps. Episodes with missing values add to the length of the sequences. You might consider using sqset with option trim to get rid of superfluous missings.
sqranks() [, gapinclude so se subsequence(range) ] generates a variable holding the rank of each sequence type in the frequency "league table" of sequence types. These are the numbers that define the order of frequencies in the output of sqtab, stored as a variable. The options so and se are described in detail under sqtab. If gapinclude is specified, variables get defined even for sequences containing gaps. Missing values are used as yet another element. You might consider using sqset with option trim to get rid of superfluous missings.
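Since sqfreq() and sqranks() store what sqtab displays, they can be generated side by side. A minimal sketch, assuming the data have already been declared with sqset. The new variable names are made up.

```stata
* Assumes sequence data already declared with sqset

* Frequency of each sequence type and its rank in the frequency table
egen seqfreq = sqfreq()
egen seqrank = sqranks()

* The same rank, computed with option so (see sqtab for its meaning)
egen seqrank_so = sqranks(), so

* Quick check of the generated variables
summarize seqfreq seqrank seqrank_so
```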
Author
Ulrich Kohler, WZB, kohler@wz-berlin.de
Examples
. egen length = sqlength()
. egen length1 = sqlength(), element(1) gapinclude
. egen elemnum = sqelemcount()
. egen epinum = sqepicount()
Also see
Manual: [D] egen
Online: egenmore (if installed), sq, sqdemo, sqset, sqdes, sqegen, sqstat, sqindexplot, sqparcoord, sqom, sqclusterdat, sqclustermat