help for dmerge                                              manual:  [R] merge
                                                            dialog:  dmerge    

Merge datasets using a modification of Stata's merge

dmerge [varlist] using filename [, ukeep(varlist) unique uniqmaster uniqusing nolabel update replace nokeep _merge(varname) table loudly ]


dmerge joins corresponding observations from the dataset currently in memory (called the master dataset) with those from the Stata-format dataset stored as filename (called the using dataset) into single observations. If filename is specified without an extension, .dta is assumed. dmerge is a modification of Stata's official merge.

In contrast to Stata's merge, dmerge automatically drops the _merge() variable if it already exists, sorts the master dataset by the merging variables, sorts the using dataset if it is not already sorted by the merging variables, and suppresses Stata's listing of variable labels used in both datasets.

dmerge does not have the full range of options that mmerge has, but it is considerably faster than mmerge when the using dataset is already sorted. Unlike dmerge, mmerge always requires the master dataset to be preserved and the using dataset to be read into memory and saved.

dmerge can perform both one-to-one and match merges. In either case, the variable _merge (or the variable specified in _merge() if provided) is added to the data, containing:

    _merge==1    obs. from master data
    _merge==2    obs. from using data
    _merge==3    obs. from both master and using data

If update is specified, the codes for _merge are:

    _merge==1    obs. from master data
    _merge==2    obs. from using data
    _merge==3    obs. from both, master agrees with using
    _merge==4    obs. from both, missing in master updated
    _merge==5    obs. from both, master disagrees with using


ukeep(varlist) specifies the variables to be kept from the using data. If ukeep() is not specified, all variables are kept.

The ukeep() varlist differs from standard Stata varlists in two ways. First, if you type a simple name, it is assumed to be the exact name of the variable to be kept; it cannot be an abbreviation. Second, you may not refer to a range of variables; specifying ukeep(age-income) is an error. You may, however, use other standard Stata varlist features such as the * and ~ characters to match one or more characters in a variable name; see help varlist.
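As a minimal sketch of ukeep(), assuming dmerge is installed, the following keeps age (an exact name) plus every using variable whose name begins with inc. The data and the file name demo_using are made up for illustration.

```stata
* Build a small using dataset (hypothetical file name: demo_using.dta)
clear
input id age income inc_tax educ
1 34 52000  9000 16
2 51 61000 11000 12
end
save demo_using, replace

* Master data in memory: only the match variable
clear
input id
1
2
end

* Keep age (exact name) and all using variables matching inc*
dmerge id using demo_using, unique ukeep(age inc*)
describe
```

If ukeep() behaves as described above, educ should be absent from the merged data, while age, income, and inc_tax are present.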

unique, uniqmaster, and uniqusing specify that the match variable(s) in a match merge uniquely identify the observations.

unique specifies that the match variable(s) uniquely identify the observations in the master data and in the using data. For most match merges, you should specify unique. dmerge behaves no differently when you specify this option unless the uniqueness assumption turns out to be false; in that case, rather than merging the data, dmerge issues an error message.

uniqmaster specifies that the match variable(s) uniquely identify observations in memory (the master data) but not necessarily in the using data.

uniqusing specifies that the match variable(s) uniquely identify observations in the using data but not necessarily in the master data.

unique is equivalent to specifying uniqmaster and uniqusing. If none of the three options is specified, observations in the using and master data are not required to be unique. In that case, records that have the same values of the match variables are joined observationwise until all the records on one side or the other are matched; then the final record on the shorter side is duplicated repeatedly to match the remaining records with the same match value on the longer side.
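The following sketch illustrates a merge with non-unique match values, assuming dmerge is installed. The data and the file name demo_using are hypothetical: the master holds two records with id==1 and the using data holds three.

```stata
* Using data: three records share id==1 (hypothetical file name)
clear
input id b
1 10
1 20
1 30
end
save demo_using, replace

* Master data: two records share id==1
clear
input id a
1 1
1 2
end

* No uniqueness option, so duplicate match values are allowed
dmerge id using demo_using
list id a b _merge
```

Under the rule above, the result should contain three observations, all with _merge==3, with one master record appearing twice. Which records pair up depends on their order within id after sorting, so the listing is not annotated here. Adding unique to this call would be expected to stop with an error instead of merging.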

nolabel prevents Stata from copying the value label definitions from the disk dataset. Even if you do not specify this option, in no event do label definitions from disk replace those already in memory.

update varies the action dmerge takes when an observation is matched. By default, the master data is held inviolate: values from the master data are retained when the same variables are found in both datasets. If update is specified, however, the values from the using data are retained in cases where the master data contains missing.

replace, allowed with update only, specifies that even in the case when the master data contains nonmissing values, they are to be replaced with corresponding values from the using data when corresponding values are not equal. A nonmissing value, however, will never be replaced with a missing value.
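A small sketch of update, assuming dmerge is installed; the data and the file name demo_using are invented for illustration. The variable x appears in both datasets, is missing in the master for id 2, and disagrees between the two datasets for id 3.

```stata
* Using data (hypothetical file name: demo_using.dta)
clear
input id x
2 20
3 35
4 40
end
save demo_using, replace

* Master data held in memory
clear
input id x
1 10
2 .
3 30
end

* update fills missing master values from the using data
dmerge id using demo_using, unique update
list id x _merge

* Codes one would expect from the update table:
*   id 1: _merge==1, x stays 10 (master only)
*   id 2: _merge==4, x becomes 20 (missing in master updated)
*   id 3: _merge==5, x stays 30 (master disagrees with using)
*   id 4: _merge==2, x is 40 (using only)
```

Running the same call with update replace should instead set x to 35 for id 3, since replace lets nonmissing using values overwrite differing master values.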

nokeep causes dmerge to ignore observations in the using data that have no corresponding observation in the master. The default is to add these observations to the merged result and mark such observations with _merge==2.

_merge(varname) specifies the name of the variable that will mark the source of the resulting observation. The default is _merge(_merge).

table forces dmerge to tabulate _merge at the completion of the merging process.
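The options nokeep, _merge(), and table can be combined, as in the sketch below. It assumes dmerge is installed; the data, the file name demo_using, and the marker name src are hypothetical.

```stata
* Using data (hypothetical file name: demo_using.dta)
clear
input id y
2 200
3 300
4 400
end
save demo_using, replace

* Master data held in memory
clear
input id x
1 10
2 20
3 30
end

* Ignore using-only observations, name the marker src, and tabulate it
dmerge id using demo_using, unique nokeep _merge(src) table
```

With nokeep, id 4 should not be added, so src is expected to take only the values 1 (id 1) and 3 (ids 2 and 3).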

loudly turns off dmerge's use of quietly so that the usual merge output is displayed.

Example: one-to-one merge

    . use ds1
    . dmerge using ds2, unique

Example: match merge

    . use ds2
    . sort recid
    . save ds2, replace
    . use ds1
    . sort recid
    . dmerge recid using ds2
    . tabulate _merge

Example: update match merge

    . use original, clear
    . dmerge make using updata, update
    . tabulate _merge

Also see

Manual: [U] 25 Commands for combining data, [R] merge