{smcl} {* 19jun2008; prior: 20oct2006} {hline} help for {hi:funcdep} {hline} {title:Test whether a set of variables is functionally dependent on a specified set of basis variables} {p 8 17 2} {cmd:funcdep} [{it:varlist}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] [, {cmd:basis(}{it:basisvars}{cmd:)} {cmd:assert key testsep} {cmd:gen(}{it:genvars}{cmd:)} {cmd:sortpreserve}] {title:Description} {p 4 4 2} {cmd:funcdep} will test whether {it:varlist} is functionally dependent on {it:basisvars}, which means that observations with equal values of {it:basisvars} also have equal values of {it:varlist}. If we use the term "basis groups" to mean the sets of observations having the same values in {it:basisvars}, then functional dependency is the property that {it:varlist} has constant values within basis groups. See more on this subject under {ul:{bf:Additional Remarks}}, below. {p 4 4 2} {cmd:funcdep} first tests whether {it:basisvars} is a key for the dataset {c -} whether every observation has a unique value (or tuple of values) in {it:basisvars}. If this condition is satisfied, then no further testing is performed, as this condition implies functional dependency, regardless of what {it:varlist} is composed of. If {it:basisvars} is not a key, then a functional dependency test is performed on {it:varlist} with respect to {it:basisvars}. {p 4 4 2} The following r() values will be returned:{p_end} {col 9}r(funcdep) {col 9}r(key) {p 4 4 2} These are numeric 0/1 indicators of, respectively, whether functional dependency holds and whether {it:basisvars} is a key. (If the functional dependency test is skipped due to {it:basisvars} being a key, then r(funcdep) is still set appropriately, to 1). These return values are provided to enable you write programs or do-files whose actions depend on the results of a functional dependency test. But if your intent is to proceed only if the test is successful, it may be more convenient to just use the {cmd:assert} option. {title:Options} {p 4 8 2} {cmd:basis(}{it:basisvars}{cmd:)} specifies the set of basis variables on which you are testing for functional dependency. This can be absent, in which case {cmd:funcdep} tests whether {it:varlist} is constant within the entire dataset. {p 4 8 2} {cmd:assert} {help assert}s that the functional dependency condition holds. That is, it invokes an error condition if the test fails. {p 4 8 2} {cmd:key} additionally asserts that {it:basisvars} is a key; it only takes effect if {cmd:assert} is also specified. {p 4 8 2} {cmd:testsep} causes the functional dependency test to be done separately for each variable in {it:varlist}. The default is to test the whole of {it:varlist} all at once. The utility of this option is that, if the test fails, and if {it:varlist} is composed of several variables, then you will be able to tell which of the variables is causing the failure. (Note that {it:varlist} is functionally dependent on {it:basisvars} if and only if each variable in {it:varlist} is, separately, functionally dependent on {it:basisvars}.) {p 4 8 2} {cmd:gen(}{it:genvars}{cmd:)} specifies one or more new variables to be generated that indicate where functional dependency fails; they are constant within basis groups. This may be useful for finding cases where functional dependency fails, or for just determining whether a variable varies within basis groups. {p 8 8 2} If {cmd:testsep} is specified along with {cmd:gen(}{it:genvars}{cmd:)}, then there must be as many names in {it:genvars} as in {it:varlist}, and the names in {it:genvars} will correspond to those in {it:varlist} in the order given. If {cmd:testsep} is not specified, then there must be just one name in {it:genvars}, and it will represent all variables in {it:varlist} collectively. {p 8 8 2} If the functional dependency test is skipped due to {it:basisvars} being a key, and {cmd:gen(}{it:genvars}{cmd:)} is specified, then {it:genvars} are set to 0. {p 4 8 2} {cmd:sortpreserve} turns on the sortpreserve feature; the sort order of the dataset will not be affected. By default, the data will be sorted by {it:basisvars}. {title:Remarks} {p 4 4 2} Both {it:varlist} and {cmd:basis(}{it:basisvars}{cmd:)} are optional. If {it:varlist} is absent then functional dependency is necesarily true. That is, an empty {it:varlist} is functionally dependent on any {it:basisvars}. If {cmd:basis(}{it:basisvars}{cmd:)} is absent, then {cmd:funcdep} tests whether {it:varlist} is constant over the entire dataset. (Note that an empty {it:basisvars} is a key only if there are <=1 observations.) {p 4 4 2} The {cmd:if} and {cmd:in} qualifiers would presumably be rarely used. Usually you would want to test functional dependency on an entire dataset, but there may be occasions where it fails on the entire set, but you would want to see if it passes on a specific subset. {title:Examples} {p 6 6 2}{cmd:. funcdep state county, basis(family year)}{p_end} {p 6 6 2}{cmd:. funcdep msa if ~newengland, basis(state county) assert}{p_end} {p 6 6 2}{cmd:. funcdep name yearborn, basis(id) assert}{p_end} {p 6 6 2}{cmd:. funcdep name, basis(id) gen(k1)}{p_end} {p 6 6 2}{cmd:. list id name if k1, sepby(id)}{p_end} {p 8 8 2}/* shows id values that have multiple names */{p_end} {p 6 6 2}{cmd:. funcdep jobcode, basis(id) gen(k2)}{p_end} {p 8 8 2}/* k2 indicates that the person had more than one jobcode */{p_end} {title:Additional Remarks} {p 4 4 2} To say that {it:varlist} is functionally dependent on {it:basisvars} means that values in {it:varlist} are uniquely determined by those of {it:basisvars}. Equivalently, within basis groups, the values of {it:varlist} are constant. Yet another charaterization is that the relationship of {it:basisvars} values to {it:varlist} values is either one-to-one or many-to-one. An example may help to illustrate: {com}. list, noobs {txt} {c TLC}{hline 3}{c -}{hline 3}{c -}{hline 3}{c TRC} {c |} {res}a b c {txt}{c |} {c LT}{hline 3}{c -}{hline 3}{c -}{hline 3}{c RT} {c |} {res}6 4 7 {txt}{c |} {c |} {res}9 4 7 {txt}{c |} {c |} {res}1 6 5 {txt}{c |} {c |} {res}2 6 5 {txt}{c |} {c |} {res}4 6 5 {txt}{c |} {c |} {res}2 8 5 {txt}{c |} {c BLC}{hline 3}{c -}{hline 3}{c -}{hline 3}{c BRC} {p 4 4 2} In this example, c is functionally dependent on b, but not vice-versa. {p 4 4 2} Functional dependency is a condition that is usually considered as an intrinsic property of the construction of a dataset. That is, there is a logical reason why one variable must be functionally dependent on another. One should be careful however, as there are situations where functional dependency arises as an accident of the content of the data (usually in a small sample), rather than the logic behind the construction of the dataset. The same can be said of whether a set of variables forms a key. {p 4 4 2} It is generally accepted wisdom that tables that serve as primamry storage of information should not have functional dependency between non-key variables. This is the idea behind {it:database normalization}; see {help collapseunique} for more on this subject. (You can use {cmd:collapseunique} to "factor out" such internal functional dependencies.) However, it is often acceptable and convenient to allow functional dependency between non-key variables in datasets for analysis. One use of {cmd:funcdep} is to test whether these expected functional dependencies do indeed hold. {title:Author} {p 4 4 2} David Kantor. Email {browse "mailto:kantor.d@att.net":kantor.d@att.net} if you observe any problems. {title:Also see} {p 4 4 2} {help varlist}, {help isid}, {help duplicates}; {p 4 4 2} {help collapseunique}, {help assertky}, (by the same author).