-------------------------------------------------------------------------------
help for addinby                                                 (Roger Newson)
-------------------------------------------------------------------------------

Add in data from a disk dataset using a foreign key

addinby keyvarlist using filename [ , missing unmatched(action_spec) nocomplete fast keep(varlist) generate(newvar) sorted nolabel nonotes update replace ]

where keyvarlist is a varlist specifying a list of key variables, and action_spec may be drop, keep or fail.

Description

addinby is a "cleaner" alternative to merge m:1, designed to reduce the number of lines of code in Stata do-files. It adds variables and/or values to existing observations in the dataset currently in memory (the master dataset) from a Stata-format dataset stored in the file filename (the using dataset), using a foreign key of variables specified by the keyvarlist to identify observations in the using dataset. These foreign key variables must identify observations in the using dataset uniquely. Unlike merge m:1, addinby always preserves the observations in the master dataset in their original sorting order, never adds observations, and generates a match results variable only if requested to do so. By default, addinby also checks that there are no unmatched observations in the master dataset and that there are no missing values in the foreign key variables in the master dataset; these checks can be relaxed with the unmatched() and missing options.

Options

missing specifies that missing values are allowed in the variables in the keyvarlist in the master dataset. If missing is not specified, then missing values in the variables in the keyvarlist cause addinby to fail.

unmatched(action_spec) specifies the action to be taken with observations in the master dataset which are unmatched by the foreign key in the using dataset. The possible values of action_spec are drop, keep and fail. If drop is specified, then unmatched observations are dropped from the master dataset, and addinby completes execution without error. If keep is specified, then unmatched observations are kept in the master dataset (with missing values in any variables added from the using dataset), and addinby completes execution without error. If fail is specified, then addinby fails with an error message if there are any unmatched observations in the master dataset. If the unmatched() option is not specified, then unmatched(fail) is assumed.

nocomplete is a shorthand for unmatched(keep). It is ignored if the unmatched() option is specified.

fast is an option for programmers. It specifies that addinby will take no action to restore the existing master dataset in memory in the event of failure. If fast is not specified, then addinby will take this action, which uses an amount of time depending on the size of the dataset in memory.

keep(varlist) specifies the variables to be kept from the using dataset. If keep() is not specified, then all variables are kept.

generate(newvar) specifies the name of a new variable to be generated, containing match results information and coded in the same way as the variable generated by the generate() option of merge. If generate() is not specified, then no match results variable is generated.
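
As a hedged illustration of how unmatched() and generate() combine, the following sketch reuses the datasets from the Examples section. It assumes that addinby is installed and that the Stata Press URL is reachable; the variable name matchres is a hypothetical choice made for this sketch only.

```stata
* Sketch only: assumes addinby is installed and the using dataset URL is reachable
webuse autotech, clear
* Keep unmatched master observations instead of failing,
* and record match results in a new variable (matchres is a hypothetical name)
addinby make using http://www.stata-press.com/data/r10/autocost, unmatched(keep) generate(matchres)
* Inspect the match results, which are coded as for merge
tabulate matchres, missing
```

With unmatched(keep), any master observation without a match keeps missing values in the added variables, so tabulating the match results variable is one way to see how many such observations there are.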

sorted specifies that the using dataset is already sorted by the keyvarlist. If sorted is not specified, then addinby creates a temporary copy of the using dataset in memory, and sorts it before merging it into the master dataset. If the using dataset is very large, then this may use a lot of time and/or space, which may be saved by specifying sorted.

nolabel, nonotes, update and replace function as the options of the same names for merge.

Remarks

addinby was designed to be used with master datasets and using datasets keyed using the keyby package, which can be downloaded from SSC. Both keyby and addinby are designed to enforce the relational database model, in which a dataset is viewed as a mathematical function, whose domain is the set of existing value combinations of the primary key variables, and whose range is the set of all possible value combinations of the non-key variables. A dataset therefore has one observation per thing (identified by the primary key variables), and data on attributes of things (specified by the non-key variables). A foreign key is defined as a list of variables in a dataset (typically the master dataset) which is also the primary key of a second dataset (typically the using dataset). keyby ensures that a dataset is sorted, and its observations uniquely identified, by a primary key. addinby then adds variables and/or values to existing observations in a master dataset, using a foreign key specified by the keyvarlist to identify observations from the using dataset in which the values for the variables are to be found. Therefore, keyby is a "clean" version of sort, which ensures that the observations in a dataset are identified, as well as sorted, by the key variables. And addinby is a "clean" version of merge m:1, which ensures that these observations stay identified, and sorted, after the additional data have been merged in.
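
To show the kind of do-file code that addinby is meant to shorten, here is a rough sketch of a comparable merge m:1 sequence. It is not addinby's actual implementation, only an approximation under stated assumptions: it uses the merge syntax of Stata 11 or later, the variable name seqnum is hypothetical, and it does not reproduce the check for missing values in the key variables.

```stata
* Approximate merge m:1 counterpart of a default addinby call (illustration only)
webuse autotech, clear
* Record the original order of the master dataset (seqnum is a hypothetical name)
generate long seqnum = _n
* assert(match using) fails if any master observation is unmatched;
* keep(match) then discards observations found only in the using dataset
merge m:1 make using http://www.stata-press.com/data/r10/autocost, assert(match using) keep(match) nogenerate
* Restore the original order and remove the helper variable
sort seqnum
drop seqnum
```

The single command addinby make using http://www.stata-press.com/data/r10/autocost, shown in the Examples section, is intended to cover these steps in one line.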

Examples

. webuse autotech, clear
. describe
. describe using http://www.stata-press.com/data/r10/autocost
. addinby make using http://www.stata-press.com/data/r10/autocost
. describe

. webuse dollars, clear
. describe
. list
. webuse sforce, clear
. describe
. list
. addinby region using http://www.stata-press.com/data/r10/dollars
. describe
. list

Author

Roger Newson, National Heart and Lung Institute, Imperial College London, UK. Email: r.newson@imperial.ac.uk

Also see

Manual: [D] sort, [D] gsort, [D] merge, [D] order, [U] 12.2.1 Missing values

Help: [D] sort, [D] gsort, [D] merge, [D] order, [U] 12.2.1 Missing values; keyby, keybygen if installed