------------------------------------------------------------------------------- help for assertky -------------------------------------------------------------------------------

Sort the data and assert that the given varlist is a key for the dataset

assertky varlist [if exp] [in range] [, stable gen_n(varname1) gen_N(varname2)]

Alternative syntax:

assertky [if exp] [in range] , basis(varlist) [stable gen_n(varname1) gen_N(varname2)]

Description

assertky sorts the data on varlist and tests whether varlist is a key for the dataset, that is, whether the values of varlist uniquely identify the observations. This is useful when you wish to sort the data and test whether varlist is a key in one step, such as in preparation for a merge or certain by operations. If the test fails, assertky displays "varlist is not a key" and exits with return code 459. In either case, assertky leaves the data sorted on varlist.
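The test itself can be sketched with built-in commands. The block below is a minimal illustration, not assertky's own code: it clears memory, loads hypothetical toy data using the variable names from this help file's examples, sorts on the candidate key, and asserts that every by-group holds a single observation. Note that a failing assert exits with r(9), not assertky's r(459).

```stata
* Hypothetical toy data; clears whatever is in memory.
clear
input emplid effdate
1 100
1 200
2 100
end
* Sort on the candidate key, as assertky does.
sort emplid effdate
* Each combination of key values must occur exactly once;
* assert exits with r(9) if any by-group has more than one observation.
by emplid effdate: assert _N == 1
```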

Remarks

You must use one of the two syntaxes shown; you may not combine them. (The first syntax may be easier to use; the second is retained for backward compatibility.)

The if and in qualifiers are probably rarely needed. They are useful when the key test fails on the entire dataset but might succeed on a specific subset. If you use them, the dataset is left sorted on varlist, with the excluded cases appearing first within each set of observations that share the same values of varlist.
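One way to obtain the subset behavior described above is to add a marker variable to the sort. This is a sketch under assumptions: touse is a hypothetical variable name, the data are invented (the block clears memory), and assertky's internal method may differ. Sorting on the marker after varlist is what places the excluded cases (touse==0) ahead of the included ones within each set of constant key values.

```stata
* Hypothetical toy data: cust_no prod_serial_no is not a key overall,
* but it is a key among observations with status=="A".
clear
input cust_no prod_serial_no str1 status
1 10 "A"
1 10 "X"
2 20 "A"
end
* touse (hypothetical name) marks the cases selected by the if condition.
gen byte touse = (status == "A")
* Excluded cases (touse==0) sort ahead of included ones within each key value.
sort cust_no prod_serial_no touse
* Test only the included cases; each included by-group must be a singleton.
by cust_no prod_serial_no touse: assert _N == 1 if touse
drop touse
```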

Options

stable specifies that you want a stable sort; cases that have the same values in varlist (i.e., those that violate the key condition) will appear, within sets of constant values of varlist, in the same order as they were prior to the sort. See sort. stable makes no difference if the key test succeeds.

gen_n(varname1) specifies that, if the key test fails, a variable named varname1 be generated that enumerates the cases (1, 2, ...) within each set of cases sharing the same values of varlist. Because the order of such cases is otherwise arbitrary, you should also specify stable if you want this numbering to be consistent.

gen_N(varname2) specifies that, if the key test fails, a variable named varname2 be generated that reports the number of cases sharing the same values of varlist.

Note that gen_n and gen_N generate their variables only if the key test fails. These options can be used to identify the cases that cause the test to fail (the "key violations"); gen_N may be the more useful of the two. See the examples below. Any cases excluded by an if or in qualifier receive a missing value in the generated variables.
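The variables that gen_n and gen_N produce correspond to _n and _N within by-groups of varlist. The sketch below builds comparable variables with built-in commands on hypothetical toy data (it clears memory). Unlike assertky, it creates them unconditionally, not only when the key test fails.

```stata
* Hypothetical toy data with one key violation (emplid 1, effdate 100 twice).
clear
input emplid effdate
1 100
1 100
1 200
end
* stable keeps tied records in their prior order, so n is reproducible.
sort emplid effdate, stable
* n enumerates records within each key value; N counts them.
by emplid effdate: gen long n = _n
by emplid effdate: gen long N = _N
* All key violations, then one representative from each violating set.
list if N > 1, sepby(emplid effdate)
list if N > 1 & n == 1
```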

Examples

. assertky familyid person_no year

. assertky emplid effdate
. merge emplid effdate using otherdataset

. assertky cust_no prod_serial_no
varlist is not a key
r(459);
. assertky cust_no prod_serial_no if status=="A"

. assertky emplid effdate
varlist is not a key
r(459);
. assertky emplid effdate, gen_N(N)
varlist is not a key
r(459);
. list if N>1, sepby(emplid effdate) /* shows all sets of key violations */

. assertky emplid effdate, gen_n(n) gen_N(N)
varlist is not a key
r(459);
. list if N>1 & n==1 /* shows one example from each set of key violations */

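Note that if an if or in qualifier is used in combination with gen_n or gen_N, you should allow for missing values in the generated variables. Because Stata treats a missing value as larger than any number, a condition such as N>1 is true for the excluded cases: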

. assertky emplid effdate if status=="A", gen_N(N)
varlist is not a key
r(459);
. list if N>1 & ~mi(N), sepby(emplid effdate)

Alternatively, you could code

. list if N>1 & status=="A", sepby(emplid effdate)

Further Remarks

assertky is useful prior to a merge, though sort is just as good, provided that merge is used with the uniq or uniqm option. (assertky was developed before merge had these options; one motivation for it was to make it easy to ensure the key condition in a merge.)
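A sketch of the sort-then-merge approach mentioned above, using the older merge syntax to which the uniq and uniqm options belong (uniq is an abbreviation of unique). The datasets are hypothetical and built in memory (the block clears memory), and the using data are saved to a temporary file. In Stata 11 and later, the newer syntax merge 1:1 emplid effdate using `other' performs a comparable key check on both datasets.

```stata
* Hypothetical using data, sorted on the key and saved to a temporary file.
clear
input emplid effdate wage
1 100 10
1 200 12
end
sort emplid effdate
tempfile other
save `other'

* Hypothetical master data, also sorted on the key.
clear
input emplid effdate dept
1 100 5
1 200 6
end
sort emplid effdate
* Old-style syntax: unique requires emplid effdate to be a key in both the
* master and using data; merge exits with an error otherwise.
merge emplid effdate using `other', unique
```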

Another useful application is before a by: prefix command that uses a secondary sort varlist. (The "secondary" variables are those that appear in parentheses in a by: prefix command; see help by.) In that case, you will often want to be sure that the variables, including the secondaries, put the data into a unique sort order. (In these instances, the primary by: variables serve mainly to group the observations, and the order of the groups is unimportant; but the uniqueness of the sort on the secondary variable(s) may be necessary for the subsequent command to work correctly.) Example:

. assertky emplid effdate
. by emplid (effdate): gen int spellno = _n

In this situation, assertky is useful because sort (or bysort) alone does not detect ties in the sort variables, and so cannot ensure a unique order.
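The toy example below (hypothetical data; it clears memory) illustrates the point: bysort runs without complaint even though two records share emplid and effdate, so which of them receives spellno 1 is not determined. Checking the key first exposes the tie; capture noisily displays the expected failure and lets a do-file continue.

```stata
* Hypothetical toy data: employee 1 has two records on effdate 100.
clear
input emplid effdate salary
1 100 500
1 100 600
1 200 700
end
* Runs without error, but the order of the two tied records (and hence
* which salary row gets spellno 1) is arbitrary.
bysort emplid (effdate): gen int spellno = _n
* A key check on emplid effdate reveals the tie; assert fails with r(9).
capture noisily by emplid effdate: assert _N == 1
```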

assertky (used without if, in, or options) is similar to isid, but users may find it easier to use. It appears to be equivalent to

isid varlist, sort missok

though the author has not verified that the two are exactly equivalent.
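For comparison, a sketch of the isid form on hypothetical toy data that includes a missing key value (it clears memory). Without missok, isid would reject the missing value; with sort, the data are left sorted on the key, as assertky leaves them.

```stata
* Hypothetical toy data; effdate is missing in one record.
clear
input emplid effdate
1 100
1 .
2 100
end
* sort leaves the data sorted on the key; missok permits missing key values.
isid emplid effdate, sort missok
```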

Author

David Kantor. Email kantor.d@att.net if you observe any problems.

Also See