{smcl}
{* 18 May 2004}{...}
{hline}
help for {hi:tabsplit}
{hline}

{title:Tabulate string variables split into parts}

{p 8 17 2}
{cmd:tabsplit} 
{it:strvar} 
[{cmd:if} {it:exp}] 
[{cmd:in} {it:range}] 
[
{cmd:,}
{cmdab:char:acters} 
{cmdab:p:arse:(}{it:parse_strings}{cmd:)}
[{cmdab:no:}]{cmdab:t:rim}
{it:tabulate_options}
]  


{title:Description}

{p 4 4 2}
{cmd:tabsplit} tabulates frequencies of occurrence of the parts of a string
variable. By default, the parts of a string are separated by spaces. The parts
of {cmd:"A B C"} are thus {cmd:"A"}, {cmd:"B"} and {cmd:"C"}. Optionally,
alternative parsing strings may be specified. The parts of {cmd:"A,B,C"}
with {cmd:parse(,)} are, again, {cmd:"A"}, {cmd:"B"} and {cmd:"C"}. The parts
of {cmd:"A B C"} with {cmd:parse(,)} are just the single part {cmd:"A B C"}.
The idea of a part thus generalises Stata's concept of a word. 


{title:Remarks} 

{p 4 4 2} 
Suppose data are gathered on modes of transport used in the journey to work. In
addition to values of {cmd:"car"}, {cmd:"cycle"}, {cmd:"foot"}, and so forth,
there may be multiple values such as {cmd:"car train tube foot"} for people who
use two or more modes. Within the {help limits} in your version of Stata, such
single or multiple values may be stored in a string variable. You may then
want, for example, to count the individual modes used. {cmd:tabsplit} is
designed for this special problem.
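
{p 4 4 2}
As a minimal sketch of this situation, the commands below build a small
illustrative dataset and then count the individual modes. The variable name
{cmd:modes} and its four values are invented for this example, and
{cmd:clear} drops whatever data are in memory, so save your data first. With
the default parsing on spaces, {cmd:"car"} and {cmd:"foot"} should each be
counted twice, and {cmd:"cycle"}, {cmd:"train"} and {cmd:"tube"} once each.

{p 4 4 2}
{inp:. clear}{break}
{inp:. set obs 4}{break}
{inp:. generate str20 modes = "car"}{break}
{inp:. replace modes = "cycle" in 2}{break}
{inp:. replace modes = "car train tube foot" in 3}{break}
{inp:. replace modes = "foot" in 4}{break}
{inp:. tabsplit modes}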

{p 4 4 2}
By default, leading and trailing spaces are ignored. Thus string values that
consist only of one or more spaces are treated just as if they were missing.
Similarly, with {cmd:{bind:" 1,  2,   3"}} and {cmd:parse(,)} the parts are
{cmd:"1"}, {cmd:"2"} and {cmd:"3"}.


{title:Options}

{p 4 8 2}{cmd:characters} specifies that strings are to be split into separate
characters. Thus strings such as {cmd:"ABCDE"} and {cmd:"ABC"} will be split so
that the frequencies of {cmd:"A"}, {cmd:"B"}, etc. will be tabulated.
{cmd:parse()} is ignored if {cmd:characters} is specified. 

{p 4 8 2}
{cmd:parse(}{it:parse_strings}{cmd:)} specifies that, instead of spaces,
parsing should be done using one or more {it:parse_strings}. Most commonly,
one string which is a single punctuation character will be specified.  For
example, if {cmd:parse(,)} is specified, then {cmd:{bind:"1,2,3"}} is split
into {cmd:"1"}, {cmd:"2"} and {cmd:"3"}.

{p 8 8 2}
It is also possible to specify (1) two or more strings which are alternative
separators of parts and/or (2) strings which consist of two or more
characters.  Alternative strings should be separated by spaces, and strings
which include spaces should be enclosed in double quotes. Thus if
{cmd:{bind:parse(, " ")}} is specified, then {cmd:{bind:"1,2 3"}} is also
split into {cmd:"1"}, {cmd:"2"} and {cmd:"3"}.  Note particularly the
difference between (say) {cmd:{bind:parse(a b)}} and {cmd:parse(ab)}: with the
first, {cmd:"a"} and {cmd:"b"} are both acceptable as separators, while with
the second, only the string {cmd:"ab"} is acceptable.

{p 4 8 2}
{cmd:notrim} specifies that the original string variable should not be trimmed
of leading and trailing spaces before being parsed, and that the parts
should not be trimmed similarly before being tabulated. {cmd:notrim} is not
considered compatible with parsing on spaces, as the latter implies that spaces
in a string are to be discarded: either specify parse strings with
{cmd:parse()} or omit {cmd:notrim} and accept the default trimming.
 
{p 4 8 2}{it:tabulate_options} are options of {help tabulate} with one
variable. The most useful in practice is {cmd:sort}. Note that the table 
is based on a temporary dataset which does not remain in memory after
{cmd:tabsplit} has finished. 


{title:Examples}

{p 4 8 2}{inp:. tabsplit authors, parse(,) sort}
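
{p 4 4 2}
The further sketches below illustrate, in turn, the default parsing on spaces
with a sorted table, an {cmd:if} restriction, splitting into characters,
alternative separators, and {cmd:notrim} with an explicit parse string. The
variable names {cmd:modes}, {cmd:region}, {cmd:codes} and {cmd:items} are
hypothetical and stand for variables in your own dataset.

{p 4 8 2}{inp:. tabsplit modes, sort}

{p 4 8 2}{inp:. tabsplit modes if region == 1, sort}

{p 4 8 2}{inp:. tabsplit codes, characters}

{p 4 8 2}{inp:. tabsplit items, parse(, " ")}

{p 4 8 2}{inp:. tabsplit authors, parse(,) notrim}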


{title:Author} 

{p 4 4 2}Nicholas J. Cox, University of Durham, U.K.{break} 
         n.j.cox@durham.ac.uk


{title:Also see}

{p 4 13 2}On-line:  help for {help split}, {help tabulate}