help distmatch



distmatch -- distance matching based on latitudes and longitudes


distmatch [varlist], id( ) latitude( ) longitude( ) [options]


distmatch provides a fast and easy way to perform distance matching based on latitudes and longitudes. It matches each location to its neighbors and returns their identifiers, attributes, or distances. It also counts the number of neighboring locations within a given area, either in circles or in bands of concentric circles.

Outputs produced by distmatch are typically used in studies of geocoded data. Similar outputs can also be produced by non-Stata software such as ArcGIS and ArcView.

distmatch computes distances with the haversine (great-circle) formula.
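The haversine distance between two points can be sketched in Stata as a reference point. This is a minimal illustration, not distmatch's internal code. The mean Earth radius of 3,958.8 miles (6,371 km) is an assumption, because the help file does not say which radius distmatch uses. The local macro names are hypothetical.

```stata
* Minimal haversine sketch (illustrative only; not distmatch's internal code)
* Assumption: mean Earth radius of 3958.8 miles (use 6371 for kilometers)
local R    = 3958.8
local lat1 = 0
local lon1 = 0
local lat2 = 1
local lon2 = 0

* convert degrees to radians
local phi1 = `lat1' * _pi / 180
local phi2 = `lat2' * _pi / 180
local dphi = (`lat2' - `lat1') * _pi / 180
local dlam = (`lon2' - `lon1') * _pi / 180

* haversine: a = sin^2(dphi/2) + cos(phi1)*cos(phi2)*sin^2(dlam/2)
local a = sin(`dphi'/2)^2 + cos(`phi1')*cos(`phi2')*sin(`dlam'/2)^2
local d = 2 * `R' * asin(sqrt(`a'))
display "distance = " `d'
```

The two points lie on the same meridian, one degree apart, so the result is about 69.09 miles. Setting R to 6371 gives the distance in kilometers.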

Distance matching with full ranking of nearby neighbors can be computationally intensive. A dataset of 3,000 observations may take several minutes. With 2,000,000 observations, finding each additional nearest neighbor for every observation takes about one day.
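A rough count of distance evaluations helps explain these run times. The help file does not describe distmatch's search strategy, so the count below rests on an assumption. It assumes a brute-force search in which every observation is both donor and recipient, so each location's distance to every other location is computed:

```latex
% Assumption: brute-force search, one distance evaluation per ordered pair
\begin{align*}
N(n) &= n(n-1) \approx n^{2} \\
N(3{,}000) &= 3{,}000 \times 2{,}999 = 8{,}997{,}000 \approx 9 \times 10^{6} \\
N(2{,}000{,}000) &= 2{,}000{,}000 \times 1{,}999{,}999 \approx 4 \times 10^{12} \\
\frac{N(2{,}000{,}000)}{N(3{,}000)} &\approx \left(\frac{2{,}000{,}000}{3{,}000}\right)^{2} \approx 4.4 \times 10^{5}
\end{align*}
```

Under that assumption, going from 3,000 to 2,000,000 observations multiplies the work by several hundred thousand.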

By default, distances are calculated in miles.


    +------+
----+ Main +-------------------------------------------------------------

id(varname) variable containing unique identifiers. It cannot contain missing values.

latitude(varname) decimal-degree latitude.

longitude(varname) decimal-degree longitude.

donor(varname) an indicator for donors, the pool of potential neighbors. The default is 1 for everyone.

recipient(varname) an indicator for recipients, the locations for which neighbors will be found. The default is 1 for everyone.

nearest(#) the number of nearest neighbors to find. nearest(.) keeps searching until it runs out of potential neighbors. The default is 5.

minimum(#) the minimum distance (inclusive).

maximum(#) the maximum distance (inclusive).

count counts the number of neighbors within the area, up to nearest(#).

km reports distances in kilometers instead of miles.

noisily displays progress on the screen.
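A small brute-force sketch for a single location shows what nearest( ), minimum( ), maximum( ), and count do. It is illustrative only and is not distmatch's code. The toy coordinates are hypothetical, and 3,958.8 miles is an assumed mean Earth radius. nearest(2), minimum(60), and maximum(140) are mimicked with local macros. Stata's inrange() is inclusive, which matches the >= and <= behavior of minimum( ) and maximum( ).

```stata
* Illustrative brute-force sketch (not distmatch's internal code).
* Toy coordinates below are hypothetical.
clear
input id lat lon
1 0 0
2 0 1
3 2 0
4 0 3
5 5 0
end

local R    = 3958.8   // assumed mean Earth radius in miles
local k    = 2        // mimics nearest(2)
local dmin = 60       // mimics minimum(60)
local dmax = 140      // mimics maximum(140)

* haversine distance from observation 1 to every observation
gen double d = 2*`R'*asin(sqrt(sin((lat - lat[1])*_pi/360)^2 + ///
    cos(lat[1]*_pi/180)*cos(lat*_pi/180)*sin((lon - lon[1])*_pi/360)^2))
replace d = . in 1    // a location is not its own neighbor

* rank by distance (missing sorts last) and list the k nearest
sort d
list id d in 1/`k'

* inclusive band [dmin, dmax]; inrange() returns 0 when d is missing
count if inrange(d, `dmin', `dmax')
```

Run this as a do-file, because the /// line continuation is do-file syntax. Observations 2 and 3 (about 69 and 138 miles away) are the two nearest neighbors. They are also the two locations counted inside the band.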

    +--------------+
----+ Nomenclature +-----------------------------------------------------

dist( ) name of the variables to contain distances. The default is "dist".

count( ) name of the variable to contain the number of neighbors. The default is "count".

prefix( ) prefix for the names of the new variables. The default is "_".

suffix( ) a suffix, other than the neighbor index, placed before the index. The default is none.


If you have more than one dataset, append them into one dataset. The latitude and longitude variables must have the same names in every dataset so that they end up in the same columns.
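The sketch below shows one way to combine two sources before running distmatch. The datasets, variable names, and coordinates are all hypothetical. The first source stores its coordinates as latitude/longitude, and the second uses lat/lon. The group form of rename needs Stata 12 or later.

```stata
* Hypothetical first dataset: coordinates named latitude/longitude
clear
input stid latitude longitude
1 38.9 -77.0
2 39.3 -76.6
end
rename (latitude longitude) (lat lon)   // match the second dataset's names
gen byte source = 1                     // 1 = first dataset
tempfile first
save `first'

* Hypothetical second dataset: coordinates already named lat/lon
clear
input schid lat lon
1 40.7 -74.0
end
gen byte source = 2                     // 2 = second dataset
append using `first'

* one identifier that is unique across the combined data
gen long id = _n
list id source lat lon
```

The source variable is not required by distmatch. It is kept here so that donor and recipient indicators can be built from it later.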

Use the donor( ) option to indicate the pool of potential neighbors. See an example below.

Use the recipient( ) option to indicate the locations for which neighbors will be found. See an example below.

Use the noisily option to see progress on the screen.
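In this brute-force sketch, only donors are candidates when a neighbor is found for one recipient. It is illustrative only and is not distmatch's code. The toy coordinates and flags are hypothetical, and 3,958.8 miles is an assumed mean Earth radius.

```stata
* Illustrative sketch (not distmatch's code): nearest donor for one recipient.
* Toy coordinates and donor/recipient flags are hypothetical.
clear
input id lat lon donor recipient
1 0 0 0 1
2 0 1 1 0
3 3 0 1 0
4 0 2 0 1
end

local R   = 3958.8          // assumed mean Earth radius in miles
local r   = 1               // row of the recipient to match
local rid = id[`r']         // remember its id before sorting

gen double d = 2*`R'*asin(sqrt(sin((lat - lat[`r'])*_pi/360)^2 + ///
    cos(lat[`r']*_pi/180)*cos(lat*_pi/180)*sin((lon - lon[`r'])*_pi/360)^2))
replace d = . if donor != 1 // only donors are candidate neighbors

sort d
display "nearest donor to id `rid': id " id[1] ", " %6.2f d[1] " miles"
```

Location 4 is closer than location 3, but it is a recipient and not a donor, so it is skipped. The match is donor 2, about 69 miles away.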


* two ways to create a unique identifier (use one or the other)
    gen id = _n
    egen id = group(state county)

* You should download the ancillary file containing population centroids of U.S. counties
    net describe distmatch, from(
    use distmatch_county.dta, clear

* produces the ids of and distances to the 5 nearest neighbors (the default)
    distmatch, id(id) lat(lat) lon(lon)
    browse

* extracts values (attributes) associated with the nearest neighbors
    drop _*
    distmatch name lat longi, id(id) lat(lat) lon(lon)
    browse

* draws potential neighbors from a pool of donors and matches them to the recipients
    drop _*
    gen donor = 1 in 1/1500
    gen recipient = 1 in 1501/3232
    distmatch, id(id) lat(lat) lon(lon) don(donor) rec(recipient) dist(one_to_the_other)
    browse

* searches a band of concentric circles between 10 and 11 miles inclusive (for obs 1-500)
    clear
    set maxvar 12000
    use distmatch_county.dta, clear
    keep in 1/500
    distmatch, id(id) lat(lat) lon(lon) min(10) max(11) nearest(.) count(conc_1_2)
    browse
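id( ) must be unique and nonmissing, so a quick check before a long distmatch run can save time. The sketch below uses the egen approach suggested above. The state and county codes and coordinates are hypothetical. The help file does not list nonmissing coordinates as a requirement, so checking them is an extra precaution.

```stata
* Hypothetical state/county codes and coordinates
clear
input state county lat lon
42 1 39.9 -77.2
42 3 40.5 -80.0
24 1 39.6 -78.7
end

egen id = group(state county)   // unique identifier from state and county
isid id                         // errors if id is not unique or has missing values
assert !missing(lat, lon)       // extra precaution: every location has coordinates
```

egen group() numbers the groups in sorted order of state and county but leaves the data in its original order, so the first observation here gets id 2.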


Austin Nichols provided suggestions on how to improve the package.


The word "long" is reserved in Stata, so use "lon" or another name for the longitude variable. minimum( ) and maximum( ) are inclusive, i.e., >= and <=. Please report conflicts between options, with detailed examples, to the address below.


Roy Wada