{smcl} {* *! version 0.1 07jul2018}{...} {viewerdialog excelclean "dialog misc"}{...} {viewerjumpto "Syntax" "misc##syntax"}{...} {viewerjumpto "Contact" "misc##contact"}{...} {title:Title} {p2colset 5 17 19 2}{...} {p2col :{hi:excelclean} {hline 2} clean and integrate excel files} {p_end} {p2colreset}{...} {marker description}{...} {title:Description} {p 4 4 2}{cmd:excelclean} automatically loads all excel files in a specified directory, organizes variable names and labels, reshapes the dataset if necessary, and integrates all files into a cleaned dataset. {marker syntax}{...} {title:Syntax} {p 4 4 2} {cmd:excelclean} datadir(string) sheet(string) cellrange(string) [{cmd:,} {it:options}] {phang2} {opt datadir(string)} directory where excel files are stored. Please close and save all excel files under this directory before executing the command. {p_end} {pmore2} e.g., datadir("c:/myplace/") {phang2} {opt sheet(string)} the excel sheet in each excel file to be loaded into Stata. {p_end} {pmore2} e.g., sheet("sheet1") or sheet("Results") {phang2} {opt cellrange(string)} the range of data cells on each excel sheet to be loaded into Stata. {p_end} {pmore2} e.g., cellrange("A1") to extract all cells; cellrange("B3") to extract cells starting from the second column and the third row. {title:Options} {phang2} {opt integrate} integrate all datasets into a single dta file. The default is to save each dta file separately using the name of the corresponding excel file. {p_end} {phang2} {opt pivot} reshape variables from a wide format to a long format. By default, the program recogonizes the last word from the formulated variable names as the time indicator. {p_end} {pmore2} The program will detect the variables that need to be reshaped into a long format, e.g. Var2000, Var2001, Var2002 -> Var {phang2} {opt droplist(string)} drop redundant variables from the dataset in the data integration process. It helps to reduce the file size and processing time. Separate variable names by a space " ". {p_end} {pmore2} e.g., droplist("Var1 Var2 "). Remember to leave a space at the end of the last variable. {phang2} {opt resultdir(string)} specify the directory where the results are saved. The default is the directory "datadir" where the excel files are stored. {p_end} {pmore2} e.g., resultdir("C:/myresultdir/"). Always use the {hi:full path} of the directory to aviod possible conflicts. {phang2} {opt extension(string)} specify the extension of files to be included. The deault is "xlsx". {p_end} {pmore2} e.g., extension("xls") {phang2} {opt namerange(integer)} specify the rows that record variables names. The default is the first row. {p_end} {pmore2} e.g., namelines(4) to indicate that the first four rows contain information about variable names. Note that the first four lines will be deleted from the dataset after creating the variable names. {phang2} {opt namelines(string)} select the rows to formulate variables names. The default is the first row. {p_end} {pmore2} e.g., namelines("1 3") to specify the first and the third rows to be used to formulate variable names {phang2} {opt wordfilter(string)} specify specific characters to be excluded from variable names. {p_end} {pmore2} e.g., wordfilter("year quarter the"); to exclude the space before any word use wordfilter(`"" word1" " word2""') {marker contact}{...} {title:Author} {phang} Lu Han {phang} Faculty of Economics, Universtiy of Cambridge {phang} Please report issues on Github {browse "https://github.com/hanlulong/excelclean":https://github.com/hanlulong/excelclean} {p_end}