Valid Validation Rules

There are four ways to create valid validation rules: give a bound, give a range or set of allowable values (the in syntax), give the name of another variable which is checked in the same way (the like syntax), or use a series of Stata commands (the complex rule syntax). These are explained below.

The bound syntax

{>= | > | == | < | <=} number

For variables whose values should be compared against a single number (larger than, smaller than, or equal to it), one of the above will suffice. Only one such bound can be given.

Examples of bounds

>0 checks to see if the values are all positive.

<= 100 checks to see if the values are less than or equal to 100. It does not check anything else, such as whether the values are also non-negative.

>= 0 & <= 100 results in an error. Only one bounding criterion can be specified. If the numbers are restricted to a range, use the in syntax below.
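To make the meaning of a single bound concrete, here is a small sketch in plain Stata that mimics what the bounds >0 and <= 100 test. It does not call the checker itself, and the names x, ok_pos and ok_le100 are made up for illustration. Keep in mind that in ordinary Stata expressions a missing value compares as larger than any number; whether the validation rules treat missing values the same way is not stated here.

```stata
* Toy data: x is a hypothetical variable used only for this illustration
clear
input x
-5
0
50
100
150
end

* Plain-Stata counterpart of the bound  >0
generate byte ok_pos = x > 0

* Plain-Stata counterpart of the bound  <= 100
generate byte ok_le100 = x <= 100

* Each flag is 1 where the value passes and 0 where it fails
list x ok_pos ok_le100
```

Each flag reflects exactly one comparison, which is why a two-sided restriction calls for the in syntax instead.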

The in syntax

in {Stata numlist | set notation}

Stata numlist is any type of list of numbers which Stata can understand. See numlist for the available syntaxes.

set notation allows using set notation to specify the list. To specify a set of discrete numbers, enclose them in curly braces ({ and }) and separate them with commas. To specify ranges of numbers, use square brackets [ and ] to include the endpoints and parentheses ( and ) to exclude endpoints. The missing value symbol . is used for infinity.

Examples of in

in {3,4,5,6} checks to see if the values are among the given values. Note that this is identical to using any of the following Stata numlists:

3 4 5 6

3(1)6

3 4 to 6

in [1,5] checks to see if the values are between 1 and 5 inclusive.

in [1,5) checks to see if the values are >=1 and <5.

in [0,.) checks to see if the values are non-negative (i.e. 0 is a valid value). This is the same as >=0, but looks more impressive.

in (-.,0) checks to see if the values are negative (i.e. 0 is not a valid value). This is the same as <0.
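For comparison, the set and range examples above can be written as ordinary Stata expressions. This is only a sketch of equivalent logic using the built-in inlist() and inrange() functions, not a description of how the in syntax is implemented; the variable names are hypothetical.

```stata
* Toy data: x is a hypothetical variable used only for this illustration
clear
input x
0
1
3
5
6
end

* in {3,4,5,6}: membership in a discrete set
generate byte in_set = inlist(x, 3, 4, 5, 6)

* in [1,5]: both endpoints included
generate byte in_closed = inrange(x, 1, 5)

* in [1,5): left endpoint included, right endpoint excluded
generate byte in_halfopen = x >= 1 & x < 5

list
```

The value 5 passes the closed range but fails the half-open one, which is the only difference between the two bracket styles.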

The like syntax

like varname

like simply says that the rule given for varname should also be used for this variable. This allows having just one copy of a rule for a series of similar variables, such as in a wide dataset, making the checking more reliable and easier to alter.

Example of like

like wow will use the same validation rule as is used for the variable wow.
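The benefit of keeping a single copy of a rule can be pictured with a plain-Stata analogue: one condition, written once, applied to several similar variables in a loop. This is an illustration of the idea only and says nothing about how like is implemented; score1 to score3 and the ok_ prefix are hypothetical names.

```stata
* Toy wide dataset with three similar (hypothetical) variables
clear
input score1 score2 score3
10 20 30
-1 50 101
100 0 99
end

* One condition, written once: 0 <= value <= 100 (what  in [0,100]  expresses)
foreach v of varlist score1 score2 score3 {
    generate byte ok_`v' = inrange(`v', 0, 100)
}

list
```

Changing the limits in the one place they appear changes the check for every variable, which is the same maintenance advantage that like offers.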

The complex syntax

There really is no syntax for this, since a complex rule can be any series of Stata commands. It is best to edit these commands by using the ckvaredit dialog box and pushing the Edit Complex Command button. This will invoke the docharedit command, allowing the use of Stata's doedit do-file editor. There are a couple of things to keep in mind when writing a complex rule:

The variable for which the rule is being written is referred to as `self' (note the open and close quotes!). This will ensure that the rule works properly when called using a like syntax or if the variable itself is renamed.

If the rule is being used as a validation rule, so that valid values will generate non-zero results and invalid values will generate zeros, the variable holding the results is called `valid' (note the quotes, again!). If the rule will flag errors, so that non-zero results are errors, and zeros correspond to valid values, use the variable name `error' (quotes required!). Finally, if the routine is a scoring routine, then the new variable must be called `score' (quotes!).

Examples using the complex syntax

gen byte `valid' = `self'>=0 could be used to check if the values are non-negative (i.e. >= 0).

gen byte `error' = `self'<0 could also be used to check if the values are non-negative (i.e. >= 0).
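To try such rules outside the checker, the two local macros can be defined by hand. The sketch below assumes that the checker normally supplies `self' and a temporary variable name for `valid' or `error'; that mechanism is an assumption, and age is a hypothetical variable. Run it as a single do-file, since local macros do not survive between separate runs.

```stata
* Toy data: age is a hypothetical variable used only for this illustration
clear
input age
25
-3
40
end

* Assumption: the checker would normally define these; set by hand here
local self age
tempvar valid error

* The two rules, typed exactly as in the examples above
generate byte `valid' = `self' >= 0
generate byte `error' = `self' < 0

* The two flags are complements: a row is valid exactly when it is not an error
assert `valid' == !`error'

* Show the offending rows
list `self' if `error'
```

Because the rule only ever mentions `self', the same lines work unchanged if age is renamed or if another variable reuses the rule.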

gen byte `valid' = `self' >= someOtherVar could be used to check if the values of the checked variable are at least as big as that of another variable. Note that if the second variable is renamed, the validation rule will break.

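gen float `score' = (`self'=="a") + .5*(`self'=="b") would give 1 full point (mark) for the answer "a", and 1/2 point (mark) for the answer "b" when grading a questionnaire or multiple-choice exam. The parentheses around each comparison are needed because Stata evaluates + before ==, and the storage type is float because a byte cannot hold the value .5.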
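A scoring rule can be tried on its own in the same hand-rolled way. The sketch below assumes the checker would normally supply `self' and `score'; here they are set manually, and answer is a hypothetical string variable. Each comparison is wrapped in parentheses because Stata evaluates + and * before ==, and the result is stored as a float so that half points are kept.

```stata
* Toy data: answer is a hypothetical string variable
clear
input str1 answer
"a"
"b"
"c"
end

* Assumption: the checker would normally define these; set by hand here
local self answer
tempvar score

* 1 point for "a", half a point for "b", nothing otherwise
generate float `score' = (`self' == "a") + .5*(`self' == "b")

list `self' `score'
summarize `score'
```

Run it as a single do-file so that the local macros remain defined for every line.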