help cfout


cfout -- Compare two files, outsheeting a list of differences


cfout [varlist] using filename , id(varname) [options]

    options            Description
    -------------------------------------------------------------------------
    nopunct            ignore differences in punctuation and capitalization
    altid(varname)     display an additional identifying variable
    name(filename)     name of the resulting .csv file
    format(%fmt)       display format to use for numeric variables
    nomatch            suppress warnings about missing observations
    upper              convert all string variables to upper case before comparing
    lower              convert all string variables to lower case before comparing
    nostring           do not compare any string variables
    replace            overwrite existing filename
    -------------------------------------------------------------------------


cfout compares the variables in varlist from the dataset in memory to the variables in varlist from the using dataset and saves a list of differences to a .csv file. It is useful if you are doing data entry and want to get an easy-to-work-with list of discrepancies between the first and second entries of a dataset.


id(varname) is required. varname is the variable that matches observations in the master dataset to observations in the using dataset. It must uniquely identify observations in both the master and using datasets.

nopunct deletes the following characters before comparing: ! ? ' and replaces each of the following characters with a space: . , - / ;

altid(varname) displays varname in the resulting .csv file. Displaying a second id is useful when you suspect there may be errors in the primary id. altid is not used for matching; it is purely cosmetic.

name(filename) specifies the name and path of the resulting .csv file. The default is "discrepancies report.csv".

format(%fmt) specifies the display format to be used for all numeric variables, including id if it is numeric. The default is %9.0g. See format for help with formatting.

nomatch is specified if the number of observations in the master and using datasets does not need to match. The default is to assume 1:1 matching between the datasets and to list any observations that exist in only one dataset.
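
As a rough illustration of the character handling described for nopunct, the same cleaning can be applied by hand to a single string variable. This is only a sketch and not cfout's actual code; the variable names respondent and clean are hypothetical.

    * hypothetical string variable; work on a copy
    generate clean = respondent
    * delete exclamation marks, question marks, and apostrophes
    replace clean = subinstr(clean, "!", "", .)
    replace clean = subinstr(clean, "?", "", .)
    replace clean = subinstr(clean, "'", "", .)
    * replace periods, commas, hyphens, slashes, and semicolons with a space
    foreach c in "." "," "-" "/" ";" {
        replace clean = subinstr(clean, "`c'", " ", .)
    }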


cfout is intended to be used as part of the data entry process when data is entered twice for accuracy. After the second entry, the datasets need to be reconciled. cfout will compare the first and second entries and generate a list of discrepancies in a format that is useful for the data entry teams. cfout assumes that the variable specified in the id option uniquely identifies observations in both datasets. cfout does not compare variables whose string/numeric type differs between the two datasets. cfout also does not compare variables that differ in every observation.
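
Because cfout relies on the id variable being unique in both files, it can help to verify this before comparing. A minimal sketch using Stata's isid command, which exits with an error if the variable does not uniquely identify observations; the file and variable names are taken from the example in this help file.

    * check the second entry, then the first entry
    use "second entry.dta", clear
    isid uniqueid
    use "first entry.dta", clear
    isid uniqueid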


use "first entry.dta"

cfout region-no_good_at_all using "second entry.dta" , id(uniqueid)
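
The options can be combined. The following sketch assumes a hypothetical second identifier called respondent and a hypothetical output file name, entrycheck.csv; everything else comes from the example above.

    use "first entry.dta"

    cfout region-no_good_at_all using "second entry.dta" , id(uniqueid) altid(respondent) name(entrycheck.csv) nopunct replace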

Saved Results

cfout saves the following in r():

Scalars
    r(discrep)    number of discrepancies
    r(N)          number of data points compared
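
Since the results are left in r(), they can be inspected or used right after cfout runs. A short sketch, reusing the file and variable names from the example in this help file; the discrepancy rate shown is just one possible use of the two scalars.

    use "first entry.dta", clear
    cfout region-no_good_at_all using "second entry.dta" , id(uniqueid)

    * share of compared data points that differ
    display "discrepancy rate: " r(discrep)/r(N)

    * list everything cfout left in r()
    return list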


Ryan Knight, rknight at

Also see

Online: cf, compare