-------------------------------------------------------------------------------
help for dsconcat                                                (Roger Newson)
-------------------------------------------------------------------------------

Concatenate a list of Stata data files into memory

dsconcat filename_list [ , subset([varlist] [if] [in]) dsid(newvarname) dsname(newvarname) obsseq(newvarname) nolabel noldsid ]

where filename_list is a list of filenames separated by spaces. If any filename in the list is specified without an extension, then .dta is assumed.

Description

dsconcat is a multiple-file version of use. It takes, as input, a list of filenames, assumed to belong to Stata data files, and creates a new dataset in memory, containing a concatenation of the input data files. The new dataset contains all variables in all the input datasets (or a subset of variables specified by the subset option), and all observations in all the input datasets (or a subset of observations specified by the subset option), ordered primarily by source dataset and secondarily by order of observations within source dataset. For any one variable in the output dataset, values of the variable are set to missing in any observation from an input dataset not containing that variable. Optionally, dsconcat creates new variables specifying, for each observation, the input file of origin and/or the sequential order of the observation in its input file of origin.
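
The handling of variables that are absent from some input datasets can be illustrated with a small sketch. It assumes that dsconcat is installed, that the auto dataset shipped with Stata is available through sysuse, and that the current directory is writable; the filenames dsc_demo1 and dsc_demo2 are hypothetical and chosen only for this illustration.

```stata
* Sketch only: assumes dsconcat is installed and sysuse auto is available.
* dsc_demo1 and dsc_demo2 are hypothetical filenames written to the
* current directory.
sysuse auto, clear
keep make mpg
save dsc_demo1, replace

sysuse auto, clear
keep make weight
save dsc_demo2, replace

* The output dataset holds make, mpg and weight.
dsconcat dsc_demo1 dsc_demo2

* Observations from dsc_demo1 carry no weight, and observations from
* dsc_demo2 carry no mpg, so these values are missing.
count if missing(weight)
count if missing(mpg)
```

Because neither mpg nor weight has missing values in the auto data, each count should equal the number of observations in the input dataset that lacks the variable.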

Options

subset([varlist] [if] [in]) specifies a subset of variables and/or observations in each of the input datasets, to be included in the concatenated output dataset in memory. The value of the subset() option is a combination of a varlist and/or an if qualifier and/or an in qualifier. Each of these is optional. However, they must be valid for all input datasets, according to the rules used by the use command.

dsid(newvarname) specifies a new integer variable to be created, containing, for each observation in the new dataset, the sequential order, in the filename_list, of the input dataset of origin of the observation. If noldsid is not specified, then dsconcat creates a value label for the newvarname with the same name, assigning, to each positive integer i from 1 to the number of input filenames in the filename_list, a label equal to the filename of the ith input dataset. If a value label of that name already exists in one of the input datasets, and nolabel is not specified, then dsconcat adds new labels, but does not replace existing labels.

dsname(newvarname) specifies a new string variable containing, for each observation in the new dataset, the name of the input dataset of origin of that observation, truncated if necessary to the maximum string variable length in the version of Stata being used.

obsseq(newvarname) specifies a new integer variable containing, for each observation in the new dataset, the sequential order of that observation in its input dataset of origin. If the subset() option is specified, then the sequential order of each observation is its position among the observations selected by the subset() option in the original dataset, not its position in the full original dataset.
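
A brief sketch of how obsseq() interacts with subset(), under the same assumptions as any dsconcat session (dsconcat installed, sysuse auto available). The filename dsc_demo_a and the variable names dsseq and obsnum are hypothetical. The same file is listed twice, which the filename_list permits.

```stata
* Sketch only: dsc_demo_a, dsseq and obsnum are hypothetical names.
sysuse auto, clear
save dsc_demo_a, replace

* Keep only the foreign cars from each copy of the input file.
dsconcat dsc_demo_a dsc_demo_a, subset(make mpg foreign if foreign == 1) dsid(dsseq) obsseq(obsnum)

* obsnum should restart at 1 for each input file and count only the
* observations retained by subset().
bysort dsseq (obsnum): assert obsnum == _n
summarize obsnum
```

If the numbering followed positions in the full input file instead, the assert would fail, because the foreign cars are not the first observations of the auto data.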

nolabel prevents dsconcat from copying value label definitions from the input datasets.

noldsid specifies that the new variable generated by the dsid() option will have no value label. This implies that the values of the new variable specified by the dsid() option will be listed as dataset sequence numbers, not as dataset names. This option is useful if the input datasets are very numerous and/or are repeated and/or are temporary files with uninformative names. It is ignored if no dsid() option is specified.
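
The effect of noldsid on the dsid() variable can be checked with a sketch such as the following. It assumes dsconcat is installed and sysuse auto is available; the filenames dsc_demo_dom and dsc_demo_for and the variable name dsseq are hypothetical.

```stata
* Sketch only: hypothetical filenames written to the current directory.
sysuse auto, clear
keep if foreign == 0
save dsc_demo_dom, replace

sysuse auto, clear
keep if foreign == 1
save dsc_demo_for, replace

* Default: dsseq receives a value label built from the input filenames.
dsconcat dsc_demo_dom dsc_demo_for, dsid(dsseq)
local lbl1 : value label dsseq
display "value label attached to dsseq: `lbl1'"
tabulate dsseq

* With noldsid: dsseq is displayed as plain sequence numbers.
dsconcat dsc_demo_dom dsc_demo_for, dsid(dsseq) noldsid
local lbl2 : value label dsseq
display "value label attached to dsseq: `lbl2'"
tabulate dsseq
```

The second display line should show an empty label name, since no value label is attached when noldsid is given.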

Remarks

dsconcat is a multi-file version of use. However, it differs in that it overwrites any existing dataset in memory automatically (as collapse and contract do), instead of requiring a clear option (as use does).
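
Because the dataset in memory is replaced without warning, it may be prudent to save unsaved work before calling dsconcat. The following sketch contrasts the behavior of use and dsconcat; it assumes dsconcat is installed and sysuse auto is available, and the filenames dsc_demo_a and dsc_demo_work are hypothetical.

```stata
* Sketch only: hypothetical filenames written to the current directory.
sysuse auto, clear
save dsc_demo_a, replace
replace mpg = . in 1            // data in memory now differ from the saved file

* use without clear refuses to discard the changed data.
capture use dsc_demo_a
display "return code from use: " _rc

* Keep a copy of the edited data, since dsconcat will not ask.
save dsc_demo_work, replace

* dsconcat replaces the data in memory without requiring clear.
dsconcat dsc_demo_a dsc_demo_a
count
```

The edited data can be recovered afterwards with use dsc_demo_work, clear.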

Examples

. dsconcat auto1 auto2 auto3 auto4, dsid(dsseq) obs(obsnum)
. sort dsseq obsnum

. dsconcat "Microsoft is inferior" Unix_is_superior IdontknowaboutMacOS

The following example creates a dataset containing variables make, foreign, mpg and weight in the first 53 observations of each of the datasets auto1, auto2, auto3 and auto4, with the input dataset name stored in the new string variable dslab:

. dsconcat auto1 auto2 auto3 auto4, subset(make foreign mpg weight in 1/53) dsn(dslab)

Author

Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk

Also see

Manual: [D] append, [D] save, [D] use