{smcl} {* 18feb2011}{...} {* @@ Written by Elliott Lowy, mostly on the US government's dime (17 US Code ยง 105).}{...} {vieweralsosee "recent" "recent"}{...} INCLUDE help also_vlowy {title:Title} {pstd}{bf:collect} {hline 2} Concatenate multiple files {title:Syntax} {pmore}{cmdab:collect} {it:{help path_el}} [{cmd:;} {it:{help path_el}} ...] [{cmd:,} {opt ap:pend} {cmdab:k:eep(}{it:{help varelist}}{cmd:)} {opt pass(options)} {opt t:est}] {pstd}Wildcards in {it:{help path_el}} can be used to specify multiple files in a directory. {title:Description} {pstd}{cmd:collect} concatenates multiple files into a single dataset in memory. The files may be in a mix of formats (though some formats would require StatTransfer). Variables can be combined and renamed. {pstd}Before collecting any data, {cmd:collect} gives a fairly detailed report of what it is about to do. It shows, for each data file that will be collected: {phang2}o-{space 2}The paths and assigned id numbers{p_end} {phang2}o-{space 2}The number of variables that will be kept{break} (as a link showing all variables, and highlighting the ones to be kept){p_end} {phang2}o-{space 2}Any naming/renaming errors {pstd}and then: {phang2}o-{space 2}Anything specified in {opt keep()} that was not found in any dataset. {pstd}If there are errors, all of them will be reported before the command aborts. {pstd}{cmd:collect} adds a variable, {cmd:_file}, identifying the file each observation came from. {cmd:_file} contains id numbers, and is labeled with the file names. {title:Options} {phang}{opt ap:pend} causes the specified files to be {bf:appended} to the data in memory, rather than {bf:replacing} it. {phang}{cmdab:k:eep(}{it:{help varelist}}{cmd:)} specifies the variables to be kept from any of the files collected. The variables in {cmdab:k:eep()} do not need to be present in every (or indeed any) file. If they are present in any of the collected files, they will be kept in the final data file. {pmore}{it:{help varelist}} allows {help varelist##mods:modifiers} for combining/renaming variables: {p2col 13 29 29 2:{ul:{help varelist##mods:Modifier}}}{ul:Description}{p_end} {p2col 13 29 29 2:{cmd:(->} {it:varname}{cmd:)}}Rename any modified variables to {it:varname} {pmore}For example: {pmore2}{cmd:collect f*, keep(Rob*(-> Bobby) A-Z )} {pmore}would collect all files starting with {cmd:f}. From each of those files, all variables starting with {cmd:Rob}, and from {cmd:A} to {cmd:Z} would be kept. All variables starting with {cmd:Rob} (eg, {cmd:Rob}, {cmd:Robby}, {cmd:Robert}) would be renamed to {cmd:Bobby}. {pmore}If multiple variables from the same dataset would end up with the same name, an error will be generated. {pmore} {phang}{opt t:est} causes {cmd:collect} to report on what it would do (ie, which files it would use, variables used or not found, any errors, estimate of observations), without actually collecting the data. {phang}{opt pass()} passes import/export options along to the appropriate handler. {phang2}o-{space 2}For file extensions {cmd:.txt} or {cmd:.csv}, the options are those for {help import delimited}.{p_end} {phang2}o-{space 2}For file extensions {cmd:.xl}, {cmd:.xls} or {cmd:.xlsx}, the options are described under {help portel xl}.{p_end} {phang2}o-{space 2}For other file extensions (besides {cmd:.dta}), the options are those for {help callst}.{p_end} {title:Examples} {pstd}Using unrelated semicolon-separated paths: {pstd}{cmd:. collect a:/one/path.dta; b:/another/path.dta} {pstd}Using wildcards to select multiple files from a single directory: {pstd}{cmd:. collect a:/single*/dir*/allofthese*.dta}