Title
strdist -- Calculate the Levenshtein distance, or edit distance, between strings.
Syntax
strdist {varname1|"string1"} {varname2|"string2"} [if] [in] [, generate(newvar)]
Description
strdist calculates the distance between strings and/or string variables using the Levenshtein distance metric. Levenshtein distance, or edit distance, is the smallest number of edits required to transform one string into another. An edit is an insertion, deletion, or substitution of a single character.
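To make the metric concrete, here is a minimal Python sketch of the standard dynamic-programming computation of Levenshtein distance. It is an illustration only: the function name levenshtein is hypothetical, and nothing here is taken from strdist's own implementation, which may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Illustrative edit distance; not strdist's actual implementation."""
    # prev is the previous row of the DP table: prev[j] is the distance
    # between the prefix of a processed so far and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("cat", "hat"))         # 1
print(levenshtein("kitten", "sitting"))  # 3
```

The first call uses the same pair of strings as the "cat"/"hat" example under Examples: a single substitution (c to h) is enough, so the distance is 1.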
strdist accepts two arguments, which may be string variables or string scalars in any combination. String scalars must be enclosed in quotes.
Edit distances are returned in a scalar or a new variable, depending on the type of arguments supplied. If at least one argument is a string variable, edit distances are returned in a new variable, named strdist by default. If both arguments are string scalars, the edit distance is returned in r(d).
Options
    +------+
----+ Main +-------------------------------------------------------------
generate(newvar) creates a new variable named newvar containing the edit distance(s). If either argument is a string variable and generate() is not specified, the new variable is named strdist.
Examples
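Edit distance between two string scalars; the result is returned in r(d)
. strdist "cat" "hat"
Edit distance between each value of state and "west virginia", stored in the new variable wvdist
. sysuse census
. strdist state "west virginia", gen(wvdist)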
Saved results
strdist saves the following in r():
Scalars
    r(d)          edit distance, if both arguments are string scalars
Macros
    r(strdist)    name of the new edit distance variable, if created
Author
Michael Barker
Georgetown University
mdb96@georgetown.edu
Also see
soundex, strgroup