{smcl}
{.-}
help for {cmd:dsconcat} {right:(Roger Newson)}
{.-}

{title:Concatenate a list of Stata data files into the memory}

{p 8 21 2}
{cmd:dsconcat} {it:filename_list} [ {cmd:,}
{cmdab:sub:set}{cmd:(}[{varlist}] {ifin}{cmd:)}
{break}
{cmdab:dsi:d}{cmd:(}{it:{help newvarname}}{cmd:)}
{cmdab:dsn:ame}{cmd:(}{it:{help newvarname}}{cmd:)}
{cmdab:obs:seq}{cmd:(}{it:{help newvarname}}{cmd:)}
{cmdab:nol:abel} {cmdab:nold:sid} ]

{pstd}
where {it:filename_list} is a list of filenames separated by spaces.
If any filename in the list is specified without an extension, then {hi:.dta} is assumed.


{title:Description}

{pstd}
{cmd:dsconcat} is a multiple-file version of {helpb use}.
It takes, as input, a list of filenames, assumed to belong to Stata data files,
and creates a new dataset in memory, containing a concatenation of the input data files.
The new dataset contains all variables in all the input datasets
(or a subset of variables specified by the {cmd:subset} option),
and all observations in all the input datasets
(or a subset of observations specified by the {cmd:subset} option),
ordered primarily by source dataset and secondarily by order of observations within source dataset.
For any one variable in the output dataset, values of the variable are set to missing
in any observation from an input dataset not containing that variable.
Optionally, {cmd:dsconcat} creates new variables specifying, for each observation,
the input file of origin and/or the sequential order of the observation in its input file of origin.


{title:Options}

{p 4 8 2}
{cmd:subset(}[{varlist}] {ifin}{cmd:)} specifies a subset of variables and/or observations
in each of the input datasets, to be included in the concatenated output dataset in memory.
The value of the {cmd:subset()} option is a combination of a {varlist}
and/or an {helpb if} qualifier and/or an {helpb in} qualifier.
Each of these is optional.
However, they must be valid for all input datasets,
according to the rules used by the {helpb use} command.

{p 4 8 2}
{cmd:dsid(}{it:{help newvarname}}{cmd:)} specifies a new integer variable to be created,
containing, for each observation in the new dataset,
the sequential order, in the {it:filename_list}, of the input dataset of origin of the observation.
If {cmd:noldsid} is not specified, then {cmd:dsconcat} creates a value label
for the {it:{help newvarname}} with the same name,
assigning, to each positive integer {hi:i} from 1 to the number of input filenames in the {it:filename_list},
a label equal to the filename of the {hi:i}th input dataset.
If a value label of that name already exists in one of the input datasets,
and {cmd:nolabel} is not specified,
then {cmd:dsconcat} adds new labels, but does not replace existing labels.

{p 4 8 2}
{cmd:dsname(}{it:{help newvarname}}{cmd:)} specifies a new string variable containing,
for each observation in the new dataset,
the name of the input dataset of origin of that observation,
truncated if necessary to the {help limits:maximum string variable length}
in the version of Stata being used.

{p 4 8 2}
{cmd:obsseq(}{it:{help newvarname}}{cmd:)} specifies a new integer variable containing,
for each observation in the new dataset,
the sequential order of that observation in its input dataset of origin.
If the {cmd:subset()} option is specified,
then the sequential order of each observation is defined as its sequential order
amongst the subset of observations in the original dataset specified by the {cmd:subset()} option,
excluding observations in the original dataset excluded by the {cmd:subset()} option.

{p 4 8 2}
{cmd:nolabel} prevents {cmd:dsconcat} from copying {help label:value label} definitions
from the input datasets.

{p 4 8 2}
{cmd:noldsid} specifies that the new variable generated by the {cmd:dsid()} option
will have no {help label:value label}.
This implies that the values of the new variable specified by the {cmd:dsid()} option
will be listed as dataset sequence numbers, not as dataset names.
This option is useful if the input datasets are very numerous
and/or are repeated
and/or are {help tempfile:temporary files} with uninformative names.
It is ignored if no {cmd:dsid()} option is specified.


{title:Remarks}

{pstd}
{cmd:dsconcat} is a multi-file version of {helpb use}.
However, it is different in that it overwrites existing datasets automatically
(as {helpb collapse} and {helpb contract} do),
instead of requiring a {cmd:clear} option (as {helpb use} does).


{title:Examples}

{p 8 12 2}{cmd:. dsconcat auto1 auto2 auto3 auto4,dsid(dsseq) obs(obsnum)}{p_end}
{p 8 12 2}{cmd:. sort dsseq obsnum}{p_end}

{p 8 12 2}{cmd:. dsconcat "Microsoft is inferior" Unix_is_superior IdontknowaboutMacOS}{p_end}

{pstd}
The following example creates a dataset containing variables
{hi:make}, {hi:foreign}, {hi:mpg} and {hi:weight}
in the first 53 observations of each of the datasets
{hi:auto1}, {hi:auto2}, {hi:auto3} and {hi:auto4},
with the input dataset name stored in the new string variable {hi:dslab}:

{p 8 12 2}{cmd:. dsconcat auto1 auto2 auto3 auto4, subset(make foreign mpg weight in 1/53) dsn(dslab)}{p_end}


{title:Author}

{pstd}
Roger Newson, Imperial College London, UK.
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}


{title:Also see}

{p 4 13 2}
{bind: }Manual: {hi:[D] append}, {hi:[D] save}, {hi:[D] use}
{p_end}
{p 4 13 2}
On-line: help for {helpb append}, {helpb save}, {helpb use}