------------------------------------------------------------------------------
help for nearmrg
-------------------------------------------------------------------------------

Nearest match merging of datasets

nearmrg [varlist] using filename, nearvar(varname) [limit(real) genmatch(newvarname) lower upper roundup type(mergetype) mergeoptions]

Description

nearmrg performs nearest match merging of two datasets on the values of the numeric variable nearvar. nearmrg was designed as a way to use lookup tables that have binned or rounded values on the variable of interest.

The user can specify whether each observation in the master dataset should be matched to the using-dataset observation whose nearvar value is closest to and higher than the master value (upper), or closest to and lower than the master value (lower).

Since nearvar must be a numeric variable, be sure to convert any date-time string variables to their numeric equivalents (see datetime). Variables may be specified in an optional varlist; these are treated as standard merge variables that must match exactly. This allows nearest matching within subsets defined by the varlist. nearmrg requires Stata 11+ since it uses the newer merge command syntax.
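The following is a minimal sketch of the two points above: converting a string date to a numeric daily date, and using a varlist so that nearest matching happens only within exact-match groups. The file name labs_tmp.dta, the variables patient_id, lab_str, visit_str, visit_date, and result, and all data values are hypothetical and chosen only for illustration.

```stata
* Using dataset: lab results by patient and date (hypothetical data)
clear
input patient_id str10 lab_str result
1 "01/10/2020" 5.1
1 "03/15/2020" 5.6
2 "02/01/2020" 4.8
end
* Convert the string date to a numeric daily date so it can serve as nearvar
generate visit_date = date(lab_str, "MDY")
format visit_date %td
drop lab_str
save "labs_tmp.dta", replace

* Master dataset: patient visits, with the date again stored as a string
clear
input patient_id str10 visit_str
1 "01/20/2020"
1 "03/01/2020"
2 "02/10/2020"
end
generate visit_date = date(visit_str, "MDY")
format visit_date %td

* patient_id must match exactly; visit_date is matched to the nearest value
nearmrg patient_id using "labs_tmp.dta", nearvar(visit_date)
list
```

Note that the near variable carries the same name (visit_date) in both datasets, since nearvar() names a single variable present in the master and the using data.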

Options

nearvar() is required and specifies the variable in the master and using datasets that is to be matched as closely as possible. Its values must be unique in the using dataset, but not necessarily in the master dataset.

limit() is optional and specifies a limit to how far away from the master dataset value the matched using dataset value can be. For a nearvar() that represents days or date-time, you can specify "limit(90)" to limit matches to within 90 days of the matching date.
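As an illustration of limit(), the sketch below uses a study-day counter as the near variable and accepts only matches within 90 days. The file name doses_tmp.dta, the variables day, dose, dose_day, and days_apart, and the data values are hypothetical. How unmatched master observations are flagged is best checked by inspecting _merge in the listing rather than assumed.

```stata
* Using dataset: doses recorded on study days 0, 100, and 400 (hypothetical values)
clear
input day dose
0 10
100 20
400 30
end
save "doses_tmp.dta", replace

* Master dataset: visit days; day 260 is 140 days from its nearest dose day (400)
clear
input day
30
180
260
390
end

* Accept only matches within 90 days, and record the matched using value
nearmrg using "doses_tmp.dta", nearvar(day) limit(90) genmatch(dose_day)

* Distance between each visit and its matched dose day
generate days_apart = abs(dose_day - day)
list day dose_day days_apart dose _merge
```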

lower, upper, roundup are mutually exclusive options that alter the default approach to defining the nearest match for nearvar. lower matches to the closest value of nearvar in the using dataset that is less than or equal to nearvar in the master dataset. upper matches to the closest value that is greater than or equal to nearvar. roundup breaks distance ties by always selecting the higher value instead of the default lower value. If none of these options are specified, nearmrg matches to the closest observation defined as minimizing the absolute difference between nearvar in the master and using datasets.
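To compare the matching rules side by side, the sketch below runs the default rule and then each of lower, upper, and roundup on the same small master dataset. The file names, the variables x, rate, and x_matched, and the data values are hypothetical. The master value 15 is equidistant from the lookup values 10 and 20, so by the definitions above it should show the difference between the default tie-break and roundup.

```stata
* Using dataset: a lookup table binned on x (hypothetical values)
clear
input x rate
10 0.1
20 0.2
30 0.3
end
save "lookup_tmp.dta", replace

* Master dataset: x = 15 is equidistant from the lookup values 10 and 20
clear
input x
12
15
27
end
save "master_tmp.dta", replace

* Default rule: minimize the absolute difference
use "master_tmp.dta", clear
nearmrg using "lookup_tmp.dta", nearvar(x) genmatch(x_matched)
display "rule: default"
list x x_matched rate

* Each alternative rule, run on a fresh copy of the master data
foreach rule in lower upper roundup {
    use "master_tmp.dta", clear
    nearmrg using "lookup_tmp.dta", nearvar(x) genmatch(x_matched) `rule'
    display "rule: `rule'"
    list x x_matched rate
}
```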

type() is an advanced option that overrides the default mergetype, m:1. See the help merge documentation for information on the available mergetypes (1:1, m:1, 1:m, m:m).

genmatch() is optional and specifies that a new variable should be created in the master dataset that identifies the specific value of nearvar in the using dataset that was matched.

mergeoptions allows the user to specify any of the standard Stata merge options (such as update or replace). See merge for more on these options.

Example

    //Find car prices in "autoexpense.dta" within $50 of "auto.dta"//

    **1: create 'using' data**
    webuse autoexpense.dta, clear
    rename make make2
    save "using.dta", replace

    **2: merge to auto.dta by price**
    sysuse auto.dta, clear
    nearmrg using "using.dta", upper nearvar(price) genmatch(usingmatch) limit(50)
    list make* price usingmatch _merge if inrange(_merge, 3, 5)

Authors

Current version of nearmrg (updated for Stata 11+ merge syntax) is written and maintained by:

    Eric A. Booth
    Public Policy Research Institute
    Texas A&M University
    ebooth@tamu.edu
    http://www.eric-a-booth.com

The original nearmrg package appeared in 2003 and was co-authored by:

    Michael Blasnik
    M Blasnik & Associates
    michael.blasnik@verizon.net

    Katherine Smith
    Clinical Epidemiology and Biostatistics Unit
    Murdoch Childrens Research Institute
    katherine.smith@mcri.edu.au

Also See

On-line: help for merge