{smcl}
{* revised 17may2019}{...}
{cmd:help filelist}
{hline}

{title:Title}

{phang}
{bf:filelist} {hline 2} Recursively search directories for files


{title:Syntax}

{p 4 16 2}
{cmd:filelist} 
	{cmd:,}
	[
	{opt d:irectory(dirpath)} 
	{opt p:attern(search_pattern)} 
	{opt s:ave(stata_dataset)} 
	{opt replace}
	{opt l:ist} 
	{opt nor:ecursive}
	{opt max:deep(#)}
	]
	

{marker Description}{...}
{title:Description}

{pstd}
{cmd:filelist} searches {it:dirpath} for files that match
{it:search_pattern} and continues searching recursively in all
its subdirectories. If {opt d:irectory(dirpath)} is omitted,
the search starts from the current directory 
(see {help pwd} and {help cd}). 

{pstd}
By default, {cmd:filelist} will pick up all files, including
system files that are usually hidden. To target a specific
type of file using a pattern in the file name, use the 
{opt p:attern(search_pattern)} option.
The {it:search_pattern} string must conform to the rules of the {help strmatch()}
function. For example, with {opt pattern("*.csv")}, {cmd:filelist} will return only
file names that end with ".csv".

{pstd}
{cmd:filelist} creates a dataset with 3 variables and as
many observations as there are matching files.
The {cmd:dirname} variable stores the file path to the file,
starting from the initial {it:dirpath}. The {cmd:filename} variable 
stores the file name and {cmd:fsize} stores the file size in bytes.

{pstd}
You can use the {opt s:ave(stata_dataset)} option to save the
results to disk instead of replacing the data in memory. Use
the {opt replace} option if {it:stata_dataset} already exists
and you want to overwrite it.

{pstd}
The {opt l:ist} option is used to print the matched files
in Stata's Results window.

{pstd}
The {opt max:deep(#)} can be used to control how many levels
deep {cmd:filelist} will search for files. {opt max:deep(1)}
is equivalent to using the {opt nor:ecursive} option and will
limit the search to the initial directory specified by {it:dirpath}.
The default is to search all subdirectories recursively.


{marker Limitations}{...}
{title:Limitations}

{pstd}
{cmd:filelist} can recursively scan a directory and return an unlimited 
number of files (it will happily scan a whole hard disk if you ask for it).
Note however that {cmd:filelist} is written in Mata and unfortunately the
{cmd:dir()} function can only return 10,000 filenames from a single directory.
As of May 17, 2019, this hard coded limit is still present in the all versions
of Stata. 

{pstd}
If the directory structure is deep enough,
the file path may exceed the maximum string length for variables of 244
characters in Stata version 9 to 12. When used with Stata 13
or higher, {cmd:filelist} can handle any length.

{pstd}
The Stata routines used by {cmd:filelist} 
to discover the file size work with files
up to 2,147,483,647 bytes (2GB). For files
larger than 2GB, you need a 64-bit version
of Stata 13.1 (revision 03 Jun 2015) or higher.
Otherwise, {cmd:fsize} will
contain a missing value for that observation.


{marker Examples}{...}
{title:Examples}

{pstd}
To find all files in the current directory and its subdirectories

        {cmd:.} {stata filelist}

{pstd}
If there is a "main" directory within the current directory, you can
search for all Stata datasets in "main" using

        {cmd:.} {stata filelist, dir("main") pat("*.dta")}
        
{pstd}
To search for all comma-separated data files in the "main" directory 
within the current directory and save the results to disk
        
        {cmd:.} {stata filelist, dir("main") pat("*.csv") save("csv_datasets.dta")}
        
{pstd}
You can run the following code if you want to use the saved search 
results to append all csv data files

        {cmd:} use "csv_datasets.dta", clear
        {cmd:} local obs = _N
        {cmd:} forvalues i=1/`obs' {
        {cmd:}   use "csv_datasets.dta" in `i', clear
        {cmd:}   local f = dirname + "/" + filename
        {cmd:}   insheet using "`f'", clear
        {cmd:}   gen source = "`f'"
        {cmd:}   tempfile save`i'
        {cmd:}   save "`save`i''"
        {cmd:} }

        {cmd:} use "`save1'", clear
        {cmd:} forvalues i=2/`obs' {
        {cmd:}   append using "`save`i''"
        {cmd:} }


{marker Acknowledgments}{...}
{title:Acknowledgments}

{pstd}A question on Statalist from {browse "http://www.stata.com/statalist/archive/2013-10/msg01014.html":Tim Evans} was the stimulus for
writing this program. 


{marker Author}{...}
{title:Author}

{pstd}Robert Picard{p_end}
{pstd}picard@netbox.com{p_end}


{marker also}{...}
{title:Also see}

{psee}
Stata:  {help pwd}, {help cd:[D] cd}, {help dir:[D] dir}, {help extended_fcn:[P] macro -- Extended macro functions}, {help mf_dir: [M-5] dir()}
{p_end}

{psee}
SSC:  {stata "ssc desc dirlist":dirlist}, {stata "ssc desc dirtools":dirtools}, {stata "ssc desc fs":fs}
{p_end}