{smcl}
{* September 19, 2007 @ 14:17:56}{...}
{hi:help ckvar}
{hline}

{title:Title}

{phang}
{cmd:ckvar} - Data Validation (or Scoring) using Rules
{p_end}

{title:Syntax}
{* put the syntax in what follows. Don't forget to use [ ] around optional items}
{p 8 16 2}
{cmd:ckvar} [{varlist}] [{cmd:,}
{c -(} {opt val:id} | {opt sc:ore} {c )-}
{opt key(varlist)}
{opt markd:up(newvar)}
{opt nov:ars}
{opt keepgoing}
{opt brief}
{opt progress}
{opt stub(str)}
{opt droplabels}
{opt nopreserve}
{opt loud}
]
{p_end}

{title:Description}

{pstd}
{cmd:ckvar} is a utility command for validating or scoring the values of
variables. It reads the validation or scoring rules from
{help char:characteristics} attached to the variables themselves, instead of
relying on external do or ado files. {cmd:ckvar} can also check for duplicate
observations based on a key, and can mark groups of duplicated keys.
{p_end}

{pstd}
This help file explains the syntax of the {cmd:ckvar} command itself.
{p_end}

{pstd}To see how to set or edit the rules used by {cmd:ckvar}, look at the help
for {help ckvaredit}. (No knowledge of {help char:characteristics} is needed.)
{p_end}

{pstd}To see an overview of the purpose of {cmd:ckvar}, please look
{help ckvar_overview:here}.
{p_end}

{pstd}To see the details of how {cmd:ckvar} is implemented, and what
{help char:characteristics} it uses to validate or score a dataset, please look
{help ckchar:here}.
{p_end}

{title:Options}

{phang}
{cmd:valid} and {cmd:score} specify whether to run the validation (i.e.
error-checking) routines or the scoring routines associated with the specified
variables. Only one can be specified. The {cmd:valid} option is the default.
When validating, a new variable is produced for each variable which has at
least one error; every observation is either valid or invalid. When scoring, a
new score variable is produced for every variable scored, and there is no
assumption of just two possible outcomes.
{p_end}

{phang}
{cmd:key} allows checking for observations with duplicate keys. The
{it:varlist} here defines the variable(s) which together are supposed to
uniquely identify the observations of the dataset (in database terms: the
fields which define the key). These variables must already exist.
{p_end}

{phang}
{cmd:markdup} marks each group of duplicate observations with its own number so
that the groups can be investigated more easily. The variable name given here
must be that of a new variable. After the duplicate check has been run, this
variable contains 0 for observations which are not part of a group of
duplicates and a positive integer for each observation which is part of a group
of duplicates. Each group has its own number to make comparisons easier.
{p_end}

{phang}
{cmd:novars} may be specified together with {cmd:key} if all that is desired is
a check for duplicates. Specifying this option causes {cmd:ckvar} to skip all
validation routines.
{p_end}

{phang}
{cmd:keepgoing} tells {cmd:ckvar} to keep running, even if programming errors
are encountered. This can be used to find all problematic characteristics at
once. All variables that can be checked are checked. All variables with fatal
errors are noted.
{p_end}

{phang}
{cmd:brief} shortens the validation table produced after the variables are
checked by eliminating rows for variables which either are completely valid or
which do not get validated. This is intended for those concentrating on
tracking down errors rather than documenting their existence.
{p_end}

{phang}
{cmd:progress} echoes the name of each variable as it is being validated or
scored. This is useful for detecting runaway processes, though it clutters the
screen when checking datasets with many variables.
{p_end}

{phang}
{cmd:stub} overrides the usual stub for the {help char:characteristics} used by
{cmd:ckvar}.
By default, validation routines use characteristics starting with {cmd:valid},
while scoring routines use characteristics starting with {cmd:score}. The
{cmd:stub} option is intended to allow multiple possible scoring routines on
the same dataset.
{p_end}

{phang}
{cmd:droplabels} instructs {cmd:ckvar} to drop the value labels associated with
the variables generated when checking errors. This option is rarely needed
except when debugging validation or scoring routines.
{p_end}

{phang}
{cmd:nopreserve} prevents the dataset from being {help preserve}d before
running {cmd:ckvar}. By default, the dataset is preserved so that if there are
problems with the validation or scoring, it is returned to its pristine state,
without any extra variables. If the dataset is large, {cmd:nopreserve} can save
time.
{p_end}

{phang}
{cmd:loud} causes output from the underlying {help dochar} program to be echoed
to the screen. Its only use is for debugging.
{p_end}

{title:Examples}

{phang}{cmd:. ckvar}{p_end}
{phang2}
checks all the variables which have validation routines, generating indicator
variables for variables which have bad data. If there are any errors, the total
number of errors will be stored in a variable called {cmd:error__total}.
{p_end}

{phang}{cmd:. ckvar, score}{p_end}
{phang2}
does the same, but scores all the variables, generating one score variable for
every variable which has a scoring routine. In this case the total will be
stored in a variable called {cmd:score__total}.
{p_end}

{phang}{cmd:. ckvar this that theOther}{p_end}
{phang2}
checks the three variables {cmd:this}, {cmd:that}, and {cmd:theOther} for
errors. If there are errors, the total count of errors for each observation is
put into the new variable {cmd:error__total}.
{p_end}

{phang}{cmd:. ckvar, key(ssn date_of_visit) markdup(duplicates)}{p_end}
{phang2}
runs all the validation routines and checks to see if there are any
observations which have the same combination of {cmd:ssn} and
{cmd:date_of_visit}.
If there are any duplicates, the variable {cmd:duplicates} will mark groups of
duplicates with different numbers. Finally, if there are any errors, the total
number of errors found in each observation will be stored in the (new) variable
{cmd:error__total}.
{p_end}

{phang}{cmd:. ckvar, key(ssn date_of_visit) markdup(duplicates) novars}{p_end}
{phang2}
checks only to see if there are any observations which have both the same
{cmd:ssn} and the same {cmd:date_of_visit}. Once again, if there are any
duplicates, the variable {cmd:duplicates} will mark groups of duplicates with
different numbers. Finally, the {cmd:novars} option states that no error
checking or scoring is to be done in this case.
{p_end}

{title:Notes}

{pstd}
You do not need any understanding of {help char:characteristics} to use
{cmd:ckvar}, even if you need very complicated rules. {helpb ckvaredit}
provides a dialog box which allows the rules to be entered and edited in a
natural fashion. {helpb ckvardo} allows the rules to be dumped into a do file
for application to another dataset. If you are, however, interested in
understanding the naming conventions for the characteristics, look at
{helpb ckchar}. If you are truly masochistic, and would like to see how to
program complicated rules by hand, first look at {helpb dochar}, and then at
{helpb docharprog}.
{p_end}

{title:Author}

{pstd}
Bill Rising, StataCorp{break}
email: brising@stata.com{break}
web: {browse "http://homepage.mac.com/brising":http://homepage.mac.com/brising}
{p_end}