{smcl} {* revised 22apr2016}{...} {cmd:help moss} {hline} {title:Title} {phang} {bf:moss} {hline 2} Find multiple occurrences of substrings {title:Syntax} {p 8 17 2}{cmd:moss} {it:strvar} {ifin} {cmd:,} {cmdab:m:atch(}[{cmd:"}]{it:pattern}[{cmd:"}]{cmd:)} [ {cmdab:r:egex} {cmdab:p:refix(}{it:prefix}{cmd:)} {cmdab:s:uffix(}{it:suffix}{cmd:)} {cmdab:max:imum(}{it:#}{cmd:)} {cmdab:u:nicode} ] {title:Description} {pstd} {cmd:moss} finds occurrences of substrings matching a pattern in a given string variable. Depending on what is sought and what is found, variables are created giving the count of occurrences (always); the positions of occurrences (whenever any are found); and the exact substrings found (when a regular expression defines a subexpression to be returned). The default names are respectively {cmd:_count}, {cmd:_pos1} up, and {cmd:_match1} up. {title:Remarks} {pstd} By default, {cmd:moss} finds repeated occurrences of the string specified in {cmd:match()} using Stata's {help strpos()} string function (in older versions of Stata, {help strpos()} was named {help index()}). A {cmd:_count} variable is created to indicate the number of occurrences per observation. The position, per observation, of the first instance will be recorded in {cmd:_pos1}, the second in {cmd:_pos2}, and so on. {pstd} With the {cmd:regex} option, {cmd:moss} can be used to repeatedly find more complex patterns of text. The specification of the search pattern must follow {help regexm()} syntax and include one and only one subexpression to be matched. When using regular expressions, subexpressions are identified using parentheses. For example, {cmd:match("AMC ([A-Za-z]+)")} will match {cmd:"AMC Concord"}, {cmd:"AMC Pacer"}, and {cmd:"AMC AMC Spirit"} but {cmd:moss} will put in {cmd:_match1} the matched subexpressions {cmd:"Concord"}, {cmd:"Pacer"}, and {cmd:"AMC Spirit"}. {pstd} {cmd:moss} follows the principle that occurrences must be disjoint and may not overlap. That is, it finds just one occurrence of {cmd:"ana"} in {cmd:"banana"}, not two. {title:Options} {phang}{cmd:match()} is required and the pattern can be either literal text or a regular expression. {phang}{cmd:regex} specifies that the pattern is to be interpreted as a regular expression. Such a pattern must contain precisely one subexpression to be extracted. See Examples. {phang}{cmd:prefix()} specifies an alternative prefix for new variable names to be created by {cmd:moss}. Such a prefix must start either with a letter or with an underscore. {phang}{cmd:suffix()} specifies a suffix for new variable names to be created. {phang}{cmd:prefix()} and {cmd:suffix()} may not be combined. {phang}{cmd:maximum()} specifies an upper limit to the number of position and match variables to be created. That is, specify {cmd:max(3)} if you want to see details of at most the first 3 occurrences of your pattern. {phang}{cmd:unicode} specifies that the Unicode versions of Stata's string functions are to be used. This requires Stata version 14 or higher. {title:Examples} {phang}{cmd:. moss make, match(",")}{p_end} {phang}{cmd:. moss make, match("([0-9]+)") regex}{p_end} {phang}{cmd:. moss history, match("(X+)") regex}{p_end} {phang}{cmd:. moss s, match("([^ ]+)") prefix(s_) regex}{p_end} {title:Authors} {pstd}Robert Picard{p_end} {pstd}picard@netbox.com{p_end} {pstd}Nicholas J. Cox, Durham University{p_end} {pstd}n.j.cox@durham.ac.uk{p_end} {title:Acknowledgments} {pstd}A question on Statalist from Rebecca A. Pope was the stimulus for writing this program. {title:Also see} {psee} Help: {manhelp strpos() D}, {manhelp regexm() D}, {manhelp split D} {p_end} {psee} FAQs: {browse "http://www.stata.com/support/faqs/data/regex.html":What are regular expressions and how can I use them in Stata?} {p_end}