help strdist


strdist -- Calculate the Levenshtein distance, or edit distance, between strings.


strdist {varname1|"string1"} {varname2|"string2"} [if] [in] [, generate(newvar) ]


strdist calculates the distance between strings and/or string variables using the Levenshtein distance metric. Levenshtein distance, or edit distance, is the smallest number of edits required to transform one string into another. An edit is an insertion, deletion, or substitution of a single character.
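To make the definition concrete, here is a minimal sketch of the metric in Python. It is an illustration only, not strdist's own implementation; the function name levenshtein is hypothetical, and the row-by-row dynamic-programming approach is one common way to compute the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b.

    Illustrative sketch only; not the code strdist runs.
    """
    # prev[j] is the distance between the characters of a processed so far
    # (initially none) and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("cat", "hat"))         # 1: substitute c -> h
print(levenshtein("kitten", "sitting"))  # 3: two substitutions, one insertion
```

Under this definition, the pair "cat" and "hat" has distance 1, since a single substitution suffices.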

strdist accepts two arguments, which may be string variables or string scalars in any combination. String scalars must be enclosed in quotes.

Edit distances are returned in a scalar or a new variable, depending on the type of arguments supplied. If at least one argument is a string variable, edit distances are returned in a new variable, named strdist by default. If both arguments are string scalars, the edit distance is returned in r(d).


        +------+
    ----+ Main +-------------------------------------------------------------

generate(newvar) creates a new variable named newvar containing the edit distance(s). If either argument is a string variable and generate() is not specified, the new variable is created with the default name strdist.


. strdist "cat" "hat"

. sysuse census

. strdist state "west virginia" , gen(wvdist)

Saved results

strdist saves the following in r():

Scalars
    r(d)          edit distance, if both arguments are string scalars

Macros
    r(strdist)    name of the new edit distance variable, if created


Michael Barker
Georgetown University

Also see

soundex, strgroup