Split All Observations of a Variable into Tokens
vtokenize varname [if exp] [in range] [, stub(stubname) parse(delimiters) nospace nodelimiters]
Description
Splits varname into its component tokens, generating as many new variables as needed. The original variable is left untouched. Intended for working with truly nasty text files.
Options
stub(stubname) specifies the prefix for the names of the generated variables. If omitted, the stub is simply the name of the variable being split. The resulting variables have suffixes _1, _2, ... or _01, _02, ... or _001, _002, ..., depending on the number of variables generated.
parse(delimiters) gives the list of delimiters used to separate tokens. If omitted, the only delimiter is whitespace (one or more spaces). There is no need to specify space as a delimiter, though specifying it explicitly causes no problems.
nospace is used to prevent spaces from being used as delimiters.
nodelimiters is used to prevent delimiters from being stored as tokens. Note that just as with gettoken, spaces are never kept as tokens.
Example(s)
. vtokenize foo
    Splits foo into words by breaking on space(s), storing the first word for each observation in foo_1, the second word in foo_2, etc. foo itself is not altered.
. vtokenize foo, stub(bar)
    Splits foo into words by breaking on space(s), storing the first word for each observation in bar_1, the second word in bar_2, etc.
. vtokenize foo, stub(bar) parse(":") nospace
    Splits foo into tokens by breaking on colons (:), storing the first token for each observation in bar_1, the second token in bar_2, etc. The colons themselves are treated as tokens.
. vtokenize foo, stub(bar) parse(":") nospace nodelimiters
    Splits foo into tokens by breaking on colons (:), storing the first token for each observation in bar_1, the second token in bar_2, etc. The colons themselves are not treated as tokens.
. vtokenize foo, stub(bar) parse(":")
    Splits foo into tokens by breaking on colons (:) and spaces, storing the first token for each observation in bar_1, the second token in bar_2, etc. The colons themselves are treated as tokens, but the spaces are not.
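To try the last form on something concrete, a do-file along the following lines might help. This is only a sketch: the data values are invented for illustration, and it assumes vtokenize is already installed. Going by the description above, the first observation should break into alpha, :, beta and gamma, but the listing is the thing to trust.

```stata
* Toy data, made up for this illustration only.
clear
input str30 foo
"alpha:beta gamma"
"one:two"
end

* Break on colons and spaces; per the help text, colons are kept
* as tokens and spaces are not.
vtokenize foo, stub(bar) parse(":")

* Inspect the original variable next to the generated ones.
list
```

Adding nodelimiters to the vtokenize line is the quickest way to see the difference between keeping and dropping the colons.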
Notes
vtokenize checks only whether the variable stub_*1 exists when doing error checking. Thus, it will die ungracefully if, say, stub_3 already exists but it needs to generate its own stub_3.
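One way to sidestep this is to check for any existing variables with the intended stub before calling vtokenize. The following do-file fragment is a sketch of that idea, not part of vtokenize itself; foo and bar are placeholder names, and it relies on confirm variable returning a nonzero code when nothing matches bar_*.

```stata
* Placeholder names: foo is the variable to split, bar the intended stub.
* confirm variable succeeds (_rc == 0) only if bar_* matches something.
capture confirm variable bar_*
if _rc == 0 {
    display as error "variables named bar_* already exist; choose another stub"
    exit 110
}

* Nothing matched, so the generated names should be free.
vtokenize foo, stub(bar)
```

Stopping with an error rather than dropping the existing variables is deliberate here, since the existing bar_* variables may hold data worth keeping.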
Also see tokenize, gettoken, vgettoken
Author
Bill Rising
email: brising@louisville.edu
web: http://www.louisville.edu/~wrrisi01
snailmail: Dept. of Family and Community Medicine, University of Louisville, MedCenter One, Suite 270, 501 E. Broadway, Louisville, KY 40202
Last Updated: December 9, 2003 @ 21:51:25