help zerouse
-------------------------------------------------------------------------------

Title

zerouse -- Tabulate the pattern of zero digits in a variable

Syntax

zerouse varname [, pattern(#.#) digit(#) by(varname) generate(name) tabulate_options]

Description

The zerouse function displays the patterns of zero and non-zero digits in a variable. The range of possible formats can be defined using an option, as can the 'zero' digit, and results can be tabulated against a different variable or stored in a new variable for later use.

Options

+---------+ ----+ General +----------------------------------------------------------

pattern(#.#) indicates the number of digits to the left and to the right of the decimal point that should be examined. The default is 9.9 and often this is a reasonable starting value although it may capture rounding errors. Otherwise, select a value that will look at one digit more to the left and to the right than you expect to find, and check that these extreme digits contain zeros rather than non-zero digits in the output. You can also select a pattern that is smaller than the digits used. For example, to tabulate records that do and do not have a zero in the first decimal place, use pattern(0.1).

by(varname) displays results using a two way rather than one way frequency table, giving you digit pattern cross-classified against your variable of choice.

digit(#) is provided so you can use this program to investigate the pattern of use of a digit other than zero. For example, if you were to select the digit 2 then use of the digit 2 would be compared with the use of any digit other than 2 (including zero).

generate(name) allows you to provide a name for a new variable into which the patterns will be stored. This means you can save the zero digit use pattern for a number of variables and then cross tabulate them to identify consistent patterns of (for example) mistranscription.

+------------------+ ----+ Tabulate Options +-------------------------------------------------

Additional options can be passed to the tabulate command used to format output. If the by(varname) option is not specified the output will be displayed in a one way table and the tabulate oneway options are available. If the by(varname) option has been used, output will be displayed in a two way table and the tabulate twoway options can be used.

Remarks

This function is intended to assist with the identification of digit preferance problems when cleaning data, and to help identify any systematic relationship between digit use and other variables, typically the laboratory or instrument that generated the data or the clerk who transcribing the data. It's use is most easily explained using an example:

. zerouse y1, pattern(2.1) sort

Digit | pattern | Freq. Percent Cum. ------------+-------------------------------------- 0#.0 | 11,514 86.21 86.21 ##.0 | 851 6.37 92.59 #0.0 | 532 3.98 96.57 0#.# | 424 3.17 99.75 00.0 | 34 0.25 100.00 ------------+-------------------------------------- Total | 13,355 100.00

The example above shows zero digit use for the variable y1, from which the following can be gathered:

Most values are single digit integers (0#.0). Some values have a tens digit of which about a third have a zero in the units position (#0.0) and two thirds have a digit in the units position (##.0). This suggests the values are mainly below ten with a few values that are ten or above.

Notice that only a few values are not integers and these are all below ten (0#.#). These values may have been measured using greater precision but if this was the case we would expect at least a few values of ten or above (##.#). Alternatively, it may be that some two digit values were incorrectly transcribed. Perhaps a decimal point was inserted and this reduced these values by an order of magnitude. Values with this digit pattern should probably be checked for transcription errors. If there were multiple paths to the database (different data entry clerks, labs or instruments) and this is stored in the database it might also be useful to cross tabulate these digit formats with this information, to see if all the anomalous values originated in the same place.

Examples

. zerouse y1 . zerouse y1, pattern(3.2) sort . zerouse y1, pattern(3.2) by(group) . zerouse y1, pattern(3.2) by(group) nofreq row

. zerouse y1, pattern(3.2) digit(2)

Author

Richard J. Atkins London School of Hygiene and Tropical Medicine e-mail: richard.atkins@lshtm.ac.uk