-------------------------------------------------------------------------------
help for funcdep
-------------------------------------------------------------------------------
Test whether a set of variables is functionally dependent on a specified set of basis variables
funcdep [varlist] [if exp] [in range] [, basis(basisvars) assert key testsep gen(genvars) sortpreserve]
Description
funcdep will test whether varlist is functionally dependent on basisvars, which means that observations with equal values of basisvars also have equal values of varlist. If we use the term "basis groups" to mean the sets of observations having the same values in basisvars, then functional dependency is the property that varlist has constant values within basis groups. See more on this subject under Additional Remarks, below.
funcdep first tests whether basisvars is a key for the dataset - whether every observation has a unique value (or tuple of values) in basisvars. If this condition is satisfied, then no further testing is performed, as this condition implies functional dependency, regardless of what varlist is composed of. If basisvars is not a key, then a functional dependency test is performed on varlist with respect to basisvars.
The following r() values will be returned:
    r(funcdep)
    r(key)
These are numeric 0/1 indicators of, respectively, whether functional dependency holds and whether basisvars is a key. (If the functional dependency test is skipped because basisvars is a key, then r(funcdep) is still set appropriately, to 1.) These return values are provided to enable you to write programs or do-files whose actions depend on the results of a functional dependency test. But if your intent is to proceed only if the test is successful, it may be more convenient to just use the assert option.
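As a rough sketch of how these return values might be used in a do-file: the variable names id and name are borrowed from the Examples section, and the local macro names fd and iskey are hypothetical. The results are copied into locals first so that they survive any later r-class command.
. funcdep name, basis(id)
. local fd = r(funcdep)
. local iskey = r(key)
. if `fd' == 0 display as error "name is not constant within id"
. if `iskey' == 1 display "id uniquely identifies the observations"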
Options
basis(basisvars) specifies the set of basis variables on which you are testing for functional dependency. This can be absent, in which case funcdep tests whether varlist is constant within the entire dataset.
assert asserts that the functional dependency condition holds. That is, it invokes an error condition if the test fails.
key additionally asserts that basisvars is a key; it only takes effect if assert is also specified.
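For illustration only, assuming a dataset in which id is meant to identify observations uniquely, the two options might be combined as follows. Going by the descriptions above, this should stop with an error unless id is a key (in which case functional dependency holds automatically).
. funcdep name, basis(id) assert key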
testsep causes the functional dependency test to be done separately for each variable in varlist. The default is to test the whole of varlist all at once. The utility of this option is that, if the test fails, and if varlist is composed of several variables, then you will be able to tell which of the variables is causing the failure. (Note that varlist is functionally dependent on basisvars if and only if each variable in varlist is, separately, functionally dependent on basisvars.)
gen(genvars) specifies one or more new variables to be generated that indicate where functional dependency fails; they are constant within basis groups. This may be useful for finding cases where functional dependency fails, or for just determining whether a variable varies within basis groups.
If testsep is specified along with gen(genvars), then there must be as many names in genvars as in varlist, and the names in genvars will correspond to those in varlist in the order given. If testsep is not specified, then there must be just one name in genvars, and it will represent all variables in varlist collectively.
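A sketch of the two forms, using the variables from the Examples section. The generated names vname, vyear and vany are hypothetical, and the list commands assume that the generated variables are nonzero where functional dependency fails, as the Examples section's use of "if k1" suggests.
. funcdep name yearborn, basis(id) testsep gen(vname vyear)
. list id name yearborn if vname | vyear, sepby(id)
. funcdep name yearborn, basis(id) gen(vany)
. list id name yearborn if vany, sepby(id)
In the first call, vname corresponds to name and vyear to yearborn; in the second, vany represents both variables collectively.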
If the functional dependency test is skipped due to basisvars being a key, and gen(genvars) is specified, then genvars are set to 0.
sortpreserve turns on the sortpreserve feature; the sort order of the dataset will not be affected. By default, the data will be sorted by basisvars.
Remarks
Both varlist and basis(basisvars) are optional. If varlist is absent then functional dependency is necessarily true. That is, an empty varlist is functionally dependent on any basisvars. If basis(basisvars) is absent, then funcdep tests whether varlist is constant over the entire dataset. (Note that an empty basisvars is a key only if there are <=1 observations.)
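For instance, with the basis() option omitted, a call such as the following (year is an illustrative variable name) would test whether year takes a single value across all observations:
. funcdep year
. display r(funcdep)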
The if and in qualifiers would presumably be rarely used. Usually you would want to test functional dependency on an entire dataset, but there may be occasions where it fails on the entire set, but you would want to see if it passes on a specific subset.
Examples
. funcdep state county, basis(family year)
. funcdep msa if ~newengland, basis(state county) assert
. funcdep name yearborn, basis(id) assert
. funcdep name, basis(id) gen(k1)
. list id name if k1, sepby(id) /* shows id values that have multiple names */
. funcdep jobcode, basis(id) gen(k2) /* k2 indicates that the person had more than one jobcode */
Additional Remarks
To say that varlist is functionally dependent on basisvars means that values in varlist are uniquely determined by those of basisvars. Equivalently, within basis groups, the values of varlist are constant. Yet another characterization is that the relationship of basisvars values to varlist values is either one-to-one or many-to-one. An example may help to illustrate:
. list, noobs
  +-----------+
  | a   b   c |
  |-----------|
  | 6   4   7 |
  | 9   4   7 |
  | 1   6   5 |
  | 2   6   5 |
  | 4   6   5 |
  | 2   8   5 |
  +-----------+
In this example, c is functionally dependent on b, but not vice-versa.
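To try this on the dataset listed above, the test could be run in each direction. Going by the description of r(funcdep), the first call should leave 1 in r(funcdep) and the second 0:
. funcdep c, basis(b)
. display r(funcdep)
. funcdep b, basis(c)
. display r(funcdep)
The same property can also be checked by hand with built-in commands. This is only a sketch of the idea, not necessarily how funcdep works internally, and varies is a hypothetical variable name. Within each group of b, the data are sorted by c, so c is constant in the group exactly when its first and last values are equal:
. bysort b (c): gen byte varies = c[1] != c[_N]
. assert varies == 0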
Functional dependency is a condition that is usually considered an intrinsic property of the construction of a dataset. That is, there is a logical reason why one variable must be functionally dependent on another. One should be careful, however, as there are situations where functional dependency arises as an accident of the content of the data (usually in a small sample), rather than from the logic behind the construction of the dataset. The same can be said of whether a set of variables forms a key.
It is generally accepted wisdom that tables that serve as primary storage of information should not have functional dependency between non-key variables. This is the idea behind database normalization; see collapseunique for more on this subject. (You can use collapseunique to "factor out" such internal functional dependencies.) However, it is often acceptable and convenient to allow functional dependency between non-key variables in datasets for analysis. One use of funcdep is to test whether these expected functional dependencies do indeed hold.
Author
David Kantor. Email kantor.d@att.net if you observe any problems.
Also see
varlist, isid, duplicates