/*
collapseunique.ado  6-17-2003

David Kantor, Institute for Policy Studies, Johns Hopkins University

This does a collapsing operation akin to -collapse-, but it simply takes the
unique values of variables.  Specifically, it does the following:
 1. Confirm that, within by-groups (formed according to the `by' variables),
    the values in the main varlist are unique, i.e., constant.  (In other
    words, `varlist' is functionally dependent on `by'.)
 2. If the first test passes, collapse to one observation per by-group,
    taking the unique values in `varlist'.
 3. Finally, keep only the variables mentioned in `by' and `varlist'.
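
Example (a sketch with made-up data; the variables id, x, and y are
hypothetical):
 clear
 input id x y
 1 10 5
 1 10 6
 2 20 7
 end
 collapseunique x, by(id)
Here x is constant within each id, so the data reduce to two observations,
(id=1, x=10) and (id=2, x=20), and y is dropped.  By contrast,
 collapseunique y, by(id)
fails, since y takes two values (5 and 6) within id==1; funcdep (called with
-assert-) stops the program with an error and, unless -fast- was specified,
the original data are restored.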

This uses funcdep (a test for functional dependence), another of my ados.

7-8-2003: adapting to changes in funcdep.ado; the -testsep- option is now
handled THERE, so the present program is simpler.

10-9-2003: making varlist optional.  With an empty varlist, this just
collapses to the unique values of `by'.

3-11-2004: making by(varlist) optional (that's varlist2 in the help).  The
same was done for funcdep (basis()).
If it is absent, then this treats the whole dataset as one by-group,
as if you had done...
 gen byte one=1
 collapseunique varlist, by(one)

At least one of varlist or by() is required.


3-19-2004: funcdep is now sortpreserve; we must do our own sorting herein.
(That's a bit wasteful, as we re-sort here.)
[That changed as of 6-19-2008.]


3-22-2004: edited comments.

Note: why not implement this as an additional feature of -collapse- (another
stat item)?  The reason is that there is a fundamental difference between
this and -collapse-: this will sometimes refuse to act at all, depending on
the content of the data, whereas -collapse- always does some kind of
collapsing.

One possibility would be to add this to -collapse-, but have it yield missing
values where the var is not functionally dependent on the by-vars.

More on 3-22-2004: adding the emptyvarlist option.

Also, at the outset, we remove from varlist any vars in common with by.
There is no intrinsic need for this, but it closes a loophole in the
requirement for emptyvarlist.  Without it, you could do...
 collapseunique a, by(a b c)
which has the same effect as
 collapseunique, by(a b c)
but there would be no check for emptyvarlist.

Thus, emptyvarlist is required if varlist is empty -- or if it is a subset
of the by-vars.  The latter part of this is a consequence of the editing
of varlist.
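
For example,
 collapseunique, by(a b c) emptyvarlist
keeps one observation for each distinct combination of a, b, and c, and drops
all other variables.  The result should resemble that of
 duplicates drop a b c, force
 keep a b c
though the sort order may differ.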

3-23-2004: just added comments.

6-19-2008: funcdep is, by default, NOT sortpreserve. So we can skip the
-sort- command (once again).

*/


*! version 1.2.3 19jun2008

program define collapseunique
version 8  // it may work fine in 7 (?)

syntax [varlist(default=none)] [if] [in] , [by(varlist) testsep fast ///
 EMPTYVarlist]

marksample touse, novarlist

if "`varlist'" =="" & "`by'" =="" {
 disp as error "varlist or by() required; they may not both be absent"
 exit 198
}



/* remove from varlist any vars in common with by. */

local commonvars : list varlist & by
if trim("`commonvars'") ~= "" {  // trim probably not needed
 disp as text "Note: {it:varlist} and {it:by} have common elements: `commonvars'"
 disp as text "They are being removed from {it:varlist}."
 local varlist : list uniq varlist
 local varlist : list varlist - by
}

/* The use of -list uniq varlist- removes repeated items.  It closes another
loophole:
 collapseunique a a, by(a b c)
would otherwise pass.
*/




if "`varlist'" =="" & "`emptyvarlist'" =="" {
 disp as error "option emptyvarlist required if {it:varlist} is empty or contains only {it:by} variables"
 exit 198
}




if "`fast'" == "" {
        preserve
}

tempname N1 N2
quietly {
        count
        scalar `N1' = r(N)
        keep if `touse'
        count
        scalar `N2' = `N1' - r(N)
        }

if `N2' >0 {
        disp as text "(" as res `N2' as text plural(`N2', " observation") ///
         " deleted due to " ///
         as input "if" as text " or " as input "in" as text " conditions)"
}


funcdep `varlist', basis(`by') assert `testsep'


/* At this point, we have confirmed that all the vars are functionally
dependent on `by'.  So we can proceed to reduce.
*/

if "`by'" ~= "" {
 /* no -sort- needed: funcdep (not sortpreserve) leaves the data sorted by `by' */
 local byby "by `by':"
}

quietly {
        count
        scalar `N1' = r(N)
        `byby' keep if _n==1  // reduce to one observation per by-group
        count
        scalar `N2' = `N1' - r(N)
        }
disp as text "(" as res `N2' as text plural(`N2', " observation") ///
 " deleted in the collapsing)"


keep `by' `varlist'

/* That last -keep- retains only the variables of interest.  Other variables
are dropped because they might not be functionally dependent on `by'; for
such a variable, the value retained would be an arbitrary representative of
its by-group.
*/



if "`fast'" == "" {
        restore, not
}


end