Title
mergemany -- A flexible command to merge many files
Syntax
One-to-one merge of files where user lists full file names
mergemany 1:1 filename1 filename2..., match(varlist) [options]
One-to-one merge of files where user takes advantage of numerical regularity in file name
mergemany 1:1 fileprefix, match(varlist) numerical(numlist) [options]
One-to-one merge of all files in the current working directory
mergemany 1:1 all, match(varlist) all [options]
This syntax also generalises to one-to-many, many-to-one, and many-to-many matches as per merge
    options               Description
    -------------------------------------------------------------------------
    Options
      match(varlist)      lists the variable(s) upon which the match is
                          performed; this is a required option.
      numerical(numlist)  used when specifying a merge based upon the
                          numerical suffix of a file name; cannot be used
                          with all.
      all                 merges all files in the current working directory;
                          cannot be used with numerical(numlist).
      keep                conserves the dataset currently in memory while
                          performing the merge between all filenames; in
                          this case the option saving(filename) is
                          recommended.
      saving(filename)    saves the resulting parent file from all merges as
                          filename.dta; recommended when conserving the
                          dataset in memory via keep.
      verbose             creates a variable to mark merge results for each
                          separate merge; by default this is
                          _merge_filename.
      import(filetype)    allows non .dta files to be imported and merged
                          directly. filetype must state the data type being
                          imported (e.g. .csv, .raw). This option should not
                          be used with .dta files.
      inoption(options)   allows insheet options to be specified when
                          importing data. Any option available in insheet
                          can be used. This option should only be used when
                          importing via import(filetype).
    -------------------------------------------------------------------------
Description
mergemany is an extension to the command merge, providing a flexible way for many 'using' datasets to be merged into one final dataset. mergemany is able to perform the standard merges defined in merge (one-to-one, one-to-many, many-to-one, many-to-many); one of these matches must be specified.
mergemany provides a number of ways to specify the files to be merged. File names may be listed in full, allowing for merges of files in separate directories or with no obvious naming scheme. A numerical suffix can be used when files share a common prefix but differ in their suffix (such as file1, file2, file3...). In this case the common prefix is listed as the argument and the suffixes are supplied via the option numerical(numlist), which must be specified. Finally, all files of a given type in the current working directory can be merged into one file (see cd for help in navigating to a required directory). When merging all files from a directory, the argument all should be included in place of file names and the option all must be specified.
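As a rough illustration of what the numerical syntax saves the user from typing, the following sketch performs the same kind of sequential merge with the standard merge command. It is not the internal implementation of mergemany. It assumes three hypothetical files auto1.dta, auto2.dta and auto3.dta in the working directory, that make uniquely identifies observations in each, and that the first file acts as the master, which this help file does not state.

```stata
* Sketch only: auto1.dta, auto2.dta and auto3.dta are assumed to exist in
* the current working directory, each uniquely identified by make.
use auto1, clear
forvalues i = 2/3 {
    * nogenerate suppresses _merge so that the next merge does not stop
    * because _merge already exists
    merge 1:1 make using auto`i', nogenerate
}
```

With many files or irregular names the loop becomes harder to maintain, which is the case the three file specifications above are meant to cover.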
The match rate for each using file merged into the parent file is displayed as program output. To also create a variable recording the source and contents of each observation (as the variable _merge does in merge), the option verbose must be specified. For more details regarding these outputs and the values taken by these variables (if specified), see the match results table in merge.
mergemany allows non .dta files to be imported directly and merged in one step. In this case the option import(filetype) should be specified, where filetype refers to the type of data being imported. This supports any data type which can be imported via the insheet command. In the case that further options of insheet are necessary when importing the data (such as case), the option inoption(options) can be used.
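For comparison, the manual route that import(filetype) is meant to shorten might look like the sketch below, written with the standard insheet and merge commands only. The file names survey1.csv and survey2.csv and the identifier id are hypothetical, and the files are assumed to be comma-delimited with variable names in the first row.

```stata
* Sketch only: survey1.csv and survey2.csv are hypothetical comma-delimited
* files that share a unique identifier called id.
tempfile part1
insheet using survey1.csv, comma case clear   // case keeps variable-name capitalisation
save `part1'
insheet using survey2.csv, comma case clear
merge 1:1 id using `part1', nogenerate
```

Each additional file needs its own import and save step before it can be merged, so the amount of code grows with the number of files.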
---------------------------------------------------------------------------
Perform 1:1 match merge listing full file names

    Setup
    . webuse autosize
    . list
    . webuse autoexpense
    . list
    . webuse auto
    . list

    . mergemany 1:1 http://www.stata-press.com/data/r12/autoexpense http://www.stata-press.com/data/r12/autosize http://www.stata-press.com/data/r12/auto, match(make)
    . list
---------------------------------------------------------------------------
Perform 1:1 match merge, using all files in a folder called auto

    Setup
    . mkdir auto
    . cd auto
    . webuse autosize
    . save auto1
    . webuse autoexpense
    . save auto2
    . webuse auto
    . save auto3

    . mergemany 1:1 all, match(make) all
---------------------------------------------------------------------------
Perform 1:1 match merge, using numerical regularity of all files in the auto folder (created above)

    . mergemany 1:1 auto, match(make) numerical(1(1)3)
---------------------------------------------------------------------------
Also see
Online: [D] merge, [D] insheet, [D] cross, [D] append, [D] joinby, [D] sort
Author
Damian C. Clarke, University of Oxford and ComunidadMujer. damian.clarke@economics.ox.ac.uk