help mergeall -------------------------------------------------------------------------------

Title

mergeall -- A safe way to merge many files

Syntax

mergeall varlist using folder [, options]

varlist specifies the match variables that uniquely identify observations. It is required.

options             Description
-------------------------------------------------------------------------
csv                 files to be merged are .csv (default)
txt                 files to be merged are .txt
dta                 files to be merged are .dta
tab                 insheet tab delimited data
comma               insheet comma delimited data
double              insheet all numeric variables as double. See format
format              specify a format to be used in the event that a
                    numeric variable must be converted to string. See
                    tostring and format
do(filename)        runs the specified do file on each individual file
                    before merging
strings(varlist)    force the varlist to string format and all others to
                    numeric
force               forces conversion to string or numeric. Required with
                    the strings option.
showsource          generates a new string variable containing the name of
                    the file each observation was drawn from.
-------------------------------------------------------------------------
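
Example

The calls below are a sketch based only on the syntax diagram and the option list. The identifier id and the folder paths are hypothetical, and it is an assumption that the txt and comma options can be combined as shown.

```stata
* Merge every .csv file in the folder on the identifier id (csv is the default)
mergeall id using C:/data/raw

* Merge comma-delimited .txt files and record each observation's source file
mergeall id using C:/data/raw, txt comma showsource

* Merge Stata datasets
mergeall id using C:/data/dta, dta
```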

Description

mergeall merges all of the files in a folder without loss of data due to variable storage types or duplicate unique identifiers.

Remarks

mergeall loops through all of the files in the folder you specify, checking variable types before merging. It sets all variables that are string in any file to be string in every file to prevent loss of data. By default, Stata forces the variable type of the master file on the using file, which can result in lost data.
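
The following self-contained sketch shows why string is the safe common type when files disagree. It does not reproduce mergeall's internal code; it only uses the official destring command on a made-up variable, code, to show what is lost when string data are forced to numeric.

```stata
clear
input str5 code
"12"
"A7"
end

* Forcing a string variable to numeric turns non-numeric values into missing
destring code, generate(code_num) force

* "A7" has no numeric equivalent, so code_num is missing in observation 2
list code code_num
```

Going the other way, from numeric to string, keeps every value, which is why a variable that is string in any file is best held as string in all of them.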

mergeall requires a unique identifier to be specified and exits with error if the identifier is not unique within files. A unique identifier is required because Stata can sometimes merge in unexpected ways when there is no unique identifier, and the goal of mergeall is to make merging many files super-safe.
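
To check an identifier yourself before running mergeall, the official isid command performs a similar test on the data in memory. The sketch below uses made-up data in which id is deliberately not unique; whether mergeall uses isid internally is not stated here and should not be assumed.

```stata
clear
input id x
1 10
2 20
2 30
end

* isid exits with an error when id does not uniquely identify observations;
* capture traps the error so the return code can be inspected
capture isid id
display "return code from isid: " _rc

* duplicates report summarizes how many copies of each id value exist
duplicates report id
```

A nonzero return code means the identifier needs cleaning before the file can be merged safely.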

mergeall performs one-to-one merges using Stata 10-style merge syntax and creates a new variable, _disagreement, which equals 1 if an observation exists in two or more datasets and the datasets disagree on its value. If _disagreement equals 1 for an observation, you have lost information for that observation.
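
After mergeall finishes, the flagged observations can be inspected with standard commands. This is a minimal sketch that assumes the merged data are in memory and that the identifier is a hypothetical variable named id.

```stata
* How many observations were flagged?
count if _disagreement == 1

* Which identifiers are affected?
list id if _disagreement == 1
```

Combining this with the showsource option may make it easier to trace each flagged observation back to a raw file.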

If you are so inclined, you can run a cleaning .do file on each dataset before merging using do(filename) (this is useful for fixing errors in unique identifiers, for example).
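
As an illustration, a cleaning file might look like the sketch below. The file name clean.do, the identifier id, and the folder path are hypothetical, and the sketch assumes that id is read in as a string and that each file's data are in memory when the do file runs.

```stata
* clean.do: standardize the identifier before merging
* (assumes id is a string variable with stray spaces and mixed case)
replace id = upper(trim(id))
```

It would then be passed to mergeall through the do() option:

```stata
mergeall id using C:/data/raw, csv do(clean.do)
```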

The strings() option is probably a bad idea because it can result in the loss of data, but if you are very sure you won't lose data, it runs a little faster.

The showsource option is useful for troubleshooting when you want to return to the raw data to check values, but you don't know which raw file contains the observation you are looking for.

Also see

Online: [D] merge, [D] append, [D] cross, [D] joinby, [D] save, [D] sort

Author