.-
help for ^dups^  <version 1.0.3, Stata v6>           (v1.0.1 in STB-41: dm53)
.-


Detection and deletion of duplicate observations
------------------------------------------------

   ^dups^ [varlist] [ ^, drop^ ^e^xpand^(^varname^)^ ^k^ey^(^varlist2^)^
                      ^u^nique ^t^erse ^v^erbose ]


Description
-----------

^dups^ provides information about unique and duplicate observations in the 
dataset and, optionally, drops all duplicate observations. 

^varlist^ is an optional variable list that determines which observations are 
duplicates: observations must match exactly on all variables in the list to 
be duplicates.  If no ^varlist^ is given, then all variables in the dataset 
are used to determine duplicates.


Options
-------

^drop^ causes duplicate observations to be dropped from the dataset.  ^drop^ 
   must be spelled out completely.

   ^drop^ creates an expand variable (the default name is ^_expand^) to allow 
   dropped data to be recreated.  If ^_expand^ exists, an error message is 
   reported and no data are dropped. The expand variable will contain the 
   number of duplicate copies of the observations in the original dataset. 
   A subsequent ^expand^ command will completely resurrect the original data 
   only if ^varlist^ was not specified in the ^dups^ command (or, equivalently,
>  
   if ^varlist^ contains all variables in the dataset), or if the unspecified 
   variables are constant within the subgroups formed by the specified 
   variables.  The data can be partially, but not fully, resurrected if a 
   limited ^varlist^ was used (unique information from the variables not in 
   ^varlist^ cannot be recovered).

^expand(varname)^ specifies a ^varname^ to be used as the expand variable in 
   place of the default name, ^_expand^.  (This option has no effect unless 
   option ^drop^ is also included.)  If the specified ^varname^ exists, an 
   error message is given and no data are dropped.

^key(varlist2)^ causes the value of the variables in ^varlist2^ to be added to 
   the displayed output for each group.  If ^varlist2^ is  assigned value ^*^ 
   then ^varlist2^ will be set the same as ^varlist^.  When option ^verbose^ is
>  
   specified, ^varlist2^ may include one or more variables which identify an 
   individual observation uniquely.  Otherwise, ^varlist2^ should contain 
   variables which assist in identifying the groups of duplicates.  Normally 
   these would be some or all of the variables in ^varlist^.  
   ^key()^ is required if ^verbose^ is requested.

^unique^ causes the default display and option ^key()^ to list information for 
   unique observations also.

^terse^ limits the default display output.  When specified, only the number of 
   duplicate groups, total observations, number of observations in duplicates,
   and number of unique observations are shown.

   Without ^terse^, ^dups^ will number the duplicate groups and provide the 
   observation count in each group, and will do the same for unique 
   observations, if any, when ^unique^ is specified. 

   Specifying ^terse^ cancels ^key()^, ^unique^ and ^verbose^.

^verbose^ displays the values of ^key()^ for every duplicate in each group of
   duplicates.  ^key()^ is required if ^verbose^ is requested.  A typical usage
>  
   would be to place id variables in ^key()^ (see last example below).  


Authors
-------

   Thomas J. Steichen; RJRT; steicht@@rjrt.com
   Nicholas J. Cox; University of Durham, UK; n.j.cox@@durham.ac.uk


Examples
--------

 . ^dups^
 . ^dups, drop^
 . ^dups foreign, key(*) unique^
 . ^dups foreign, drop expand(ex) terse^
 . ^dups foreign, key(make) verbose^


Also see
--------

    STB:  STB-41 dm53
On-line:  help for @expand@, @fillin@, and @chkdup@ (if installed)