A plot to show patterns of missing values in a dataset
missingplot [varlist] [if exp] [in range] [, all labels variablenames scatter_options]
Description
missingplot gives a plot showing the incidence of missing values in one or more variables in the current dataset. The horizontal axis shows observation numbers; the vertical axis shows one or more lines, one for each variable shown. Marker symbols show which values are missing.
missingplot treats numeric and string variables alike: what matters in both cases is whether the missing() function returns true. For numeric variables, no distinction is made between system missing (.) and the extended missing values .a through .z. See missing for a tutorial if desired. Users wishing to select classes of variables, for example all numeric or all string variables, may wish to use either ds or findname (if installed) first.
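As a minimal sketch of that selection step, assuming the nlsw88 dataset used in the Examples, ds can collect the numeric variables and leave their names in r(varlist), which is then passed to missingplot. The count line is only a check on one variable (union is chosen here for illustration); it counts system and extended missing values together, which is the same test missingplot applies.

    . webuse nlsw88, clear
    . count if missing(union)
    . ds, has(type numeric)
    . missingplot `r(varlist)'

For string variables, ds, has(type string) would be used in the same way.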
missingplot may be useful for seeing broad patterns in the incidence of missing values, for example blocks of observations with many or all missing values or variables with many or all missing values. It may also be useful for quickly identifying fine structure or notable detail in some instances. See also misstable (Stata 11 up) and nmissing (if installed).
Remarks
For a loosely similar plot, see Wilkinson (2005, p. 487). Users of this program who know of interesting earlier or similar work are encouraged to send references to the program author.
Mechanically, each variable shown in the plot is represented by a single variable inside the program. There is currently a limit of 20 variables shown in any one graph.
Options
all specifies that all variables implied by varlist should be plotted, regardless of whether they contain missing values. The default of missingplot is to omit variables from the plot if they have no missing values (in the observations selected, if either if or in has been specified). Specifying all is more likely to trigger the limit of 20 variables shown.
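A minimal sketch of the difference, assuming the nlsw88 dataset used in the Examples and assuming that age has no missing values there while union and tenure have some. Under those assumptions the first command would plot only union and tenure, and the second would plot all three variables.

    . webuse nlsw88, clear
    . missingplot age union tenure
    . missingplot age union tenure, all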
labels specifies that marker labels be shown identifying the observation number of each missing value. In practice this will work best with a small number of missing values or a small dataset or both. Note that, because each variable in the plot is represented separately (see Remarks), marker label options are applied to each variable in turn; thus if you wish to change away from the default for every variable you would need to specify (e.g.) mlabcolor(blue ..).
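One way to keep the labels readable is to restrict the plot to a block of observations with in, as the syntax allows. This is a sketch only, assuming the nlsw88 dataset used in the Examples; the range 1/200 is an arbitrary choice for illustration.

    . webuse nlsw88, clear
    . missingplot in 1/200, labels
    . missingplot in 1/200, labels mlabcolor(blue ..)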
variablenames specifies that variable names only be shown to identify variables. The default is to show variable labels if they exist, and variable names otherwise. The value of this option is usually to increase the space devoted to the graph itself.
scatter_options are options of scatter.
Examples
    . webuse nlsw88, clear
    . missingplot
    . missingplot, var labels
    . missingplot, var labels mlabcolor(blue ..)
Author
Nicholas J. Cox, Durham University, U.K. n.j.cox@durham.ac.uk
References
Cox, N.J. 1999. Numbers of missing and present values. Stata Technical Bulletin 49: 7-8. (Software updates Stata Technical Bulletin 60: 2-3; Stata Journal 3:449 and 5:607)
Cox, N.J. 2010. Finding variables. Stata Journal 10: 281-296. (Software updates Stata Journal 10:691 and 12:167)
Wilkinson, L. 2005. The Grammar of Graphics. New York: Springer.
Also see