-------------------------------------------------------------------------------
help distinct                                                  (SJ12-2: dm0042)
-------------------------------------------------------------------------------

Title

distinct -- Report number(s) of distinct observations or values

Syntax

distinct [varlist] [if] [in] [, missing abbrev(#) joint minimum(#) maximum(#)]

by is allowed; see [D] by.
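As a minimal sketch of the by prefix, assuming distinct is installed and using the auto dataset shipped with Stata: bysort is used here only because by requires the data to be sorted on the by-variable.

```stata
sysuse auto, clear

* count distinct values of rep78 separately for domestic and foreign cars
bysort foreign: distinct rep78
```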

Description

The distinct command displays the number of distinct observations with respect to the variables in varlist. By default, each variable is considered separately so that the number of distinct observations for each variable is reported; the number of distinct observations is the same as the number of distinct values. Optionally, variables can be considered jointly so that the number of distinct groups defined by the values of variables in varlist is reported.

By default, missing values are not counted.

varlist may contain both numeric and string variables.

Options

missing specifies that missing values are to be included in counting distinct observations.

abbrev(#) specifies that variable names are to be displayed abbreviated to at most # characters. This option has no effect with joint.

joint specifies that distinctness is to be determined jointly for the variables in varlist.

minimum(#) specifies that numbers of distinct values are to be displayed only if they are equal to or greater than a specified minimum.

maximum(#) specifies that numbers of distinct values are to be displayed only if they are less than or equal to a specified maximum.
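The following sketch illustrates how some of these options might be used, assuming distinct is installed. The auto dataset and the thresholds 10 and 50 are chosen purely for illustration, as are the local macro names; combining minimum() and maximum() in one call is what the syntax diagram permits, not something this help file demonstrates.

```stata
sysuse auto, clear

* rep78 has missing values in this dataset; compare the counts
* obtained without and with the missing option
distinct rep78
local n_nomiss = r(ndistinct)
distinct rep78, missing
local n_miss = r(ndistinct)
display "excluding missing: `n_nomiss'   including missing: `n_miss'"

* report only variables with between 10 and 50 distinct values
distinct, minimum(10) maximum(50)
```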

Remarks

Distinctness, duplication, and uniqueness are different aspects of the similarity and difference of observations. Suppose the values of some variable are 1, 2, 2, 3, 3, 3, 4, 4, 4, 4. Then there are four distinct values: 1, 2, 3, and 4. Alternatively, there are, so far as this variable is concerned, four distinct observations because, for example, the second and third observations both containing the value 2 are identical in respect to this variable. Of these values, 2, 3, and 4 are duplicated in the data, meaning that each occurs twice or more. Some people refer to the distinct values as unique values, even though in general distinct values could all be repeated in the data. One logic behind that terminology is that if you remove all duplicates from these data then you are left with four distinct values, each of which occurs once.
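The single-variable example above can be reproduced with a short sketch. The variable names x and first are arbitrary choices for illustration; egen, count, and duplicates are official Stata commands, used here as an independent cross-check on distinct, which is assumed to be installed.

```stata
clear
input x
1
2
2
3
3
3
4
4
4
4
end

* tag exactly one observation for each distinct value of x
egen byte first = tag(x)

* the number of tagged observations is the number of distinct values (4)
count if first

* summarize how many copies of each value there are
duplicates report x

* the same distinct count from distinct, also left behind in r(ndistinct)
distinct x
display r(ndistinct)
```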

Now consider distinctness determined jointly for two variables. Suppose observations are 1 and "a", 2 and "b", 2 and "b", 3 and "c", 3 and "c", 3 and "d", 4 and "c", 4 and "c", 4 and "d", 4 and "d". Then, as far as these two variables are concerned, there are six distinct observations: 1 and "a", 2 and "b", 3 and "c", 3 and "d", 4 and "c", 4 and "d". Considering the variables individually, there are four distinct values for the first variable and four for the second. Clearly, the same principles of considering variables individually and jointly extend to three or more variables.

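The two-variable example in the Remarks can be checked in the same way. In this sketch the variable names x, s, and pair are hypothetical, and egen's group() function is used as an independent check on the joint count; distinct is assumed to be installed.

```stata
clear
input x str1 s
1 "a"
2 "b"
2 "b"
3 "c"
3 "c"
3 "d"
4 "c"
4 "c"
4 "d"
4 "d"
end

* number the jointly distinct (x, s) pairs 1, 2, ...
egen pair = group(x s)

* the largest group number is the number of distinct pairs (6)
summarize pair, meanonly
display r(max)

* each variable considered separately, then both considered jointly
distinct x s
distinct x s, joint
```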
Saved results

distinct saves the following in r():

Scalars

    r(ndistinct)   distinct count (for last variable, or jointly considered group of variables, and, if specified, last by group)
    r(N)           number of observations (for last variable, or jointly considered group of variables, and, if specified, last by group)
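A brief sketch of retrieving these results, assuming distinct is installed and using the auto dataset for illustration. Because only the results for the last variable are kept, the second call leaves behind the values for foreign.

```stata
sysuse auto, clear

* one variable: r() refers to that variable
distinct rep78
return list
display "distinct: " r(ndistinct) "   observations: " r(N)

* several variables: r() holds results for the last one listed (foreign)
distinct make rep78 foreign
display r(ndistinct)
```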

Examples

. sysuse auto
. distinct
. distinct, max(10)
. distinct make-headroom
. distinct make-headroom, missing abbrev(6)
. distinct foreign rep78, joint
. distinct foreign rep78, joint missing

Authors

Gary Longton, Fred Hutchinson Cancer Research Center, USA glongton@fhcrc.org

Nicholas J. Cox, Durham University, UK n.j.cox@durham.ac.uk

Acknowledgment

This program grew out of one originally posted to Statalist by Patrick Royston for Stata 4.

Also see

Article: Stata Journal, volume 8, number 4: dm0042

Online: [D] codebook, [D] contract, [D] duplicates, [D] egen, [D] inspect, [D] isid, [P] levelsof, tabulate, groups (if installed)