help ckvar_overview
Help for the ckvar system


This help file gives an overall description of how ckvar and its associated commands can be used to keep information about data validation and scoring connected to the variables themselves.

Remarks

Remarks are presented under the headings

1. Introduction
   1.1. Background
   1.2. Overview

2. Quick Start

1. Introduction

1.1. Background

Data validation is an important part of working with any external dataset. Unfortunately, validation rules are rarely passed along when data are shared. When they are, they typically arrive in a form which forces the receiver of the data to write validation routines from scratch, wasting precious time. To make matters worse, the validation sometimes depends on other variables which are not passed along. In the worst-case (and most common) scenario, no validation rules are passed along at all. This, of course, impedes reproducibility and lowers the value of the data.

Using characteristics to store the data validation routines (or error-checking or scoring routines) allows the validation rules to be passed along with the data to the data user, saving time and frustration. Because each rule can also record which other variables it depends on, the variables needed for validation can be passed along as well.

1.2. Overview

The ckvar command uses characteristics to store the information it needs to validate variables. Thus, passing along data also passes along the ability to validate the data, even if new observations are added to existing variables. There are also some helper commands which can keep things running smoothly: ckkeep, ckdrop, and ckrename.
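As a minimal sketch of the storage mechanism described above, the following uses Stata's built-in char command to attach a rule to a variable and shows that it survives saving and reloading the dataset. The characteristic name myrule and the rule text are hypothetical and chosen only for illustration; the names ckvar itself reads are described in the help for ckvar. The sketch is meant to be run as a do-file.

```stata
* Sketch only: "myrule" is a hypothetical characteristic name,
* not necessarily one that ckvar looks for.
sysuse auto, clear

* Attach a rule (stored as plain text) to the variable mpg
char define mpg[myrule] "inrange(mpg, 5, 60)"

* List every characteristic attached to mpg
char list mpg[]

* Characteristics are saved with the dataset, so the rule travels with the data
tempfile shared
save `shared'
use `shared', clear

* The stored text can be read back by macro expansion
display "`mpg[myrule]'"
```

Because the rule lives in the dataset rather than in a separate do-file, anyone who receives the saved file also receives the rule.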

2. Quick Start

To add validation rules to a dataset which is already in use, run the ckvaredit command (or see the help for ckvaredit first). It opens a dialog box in which you can add or edit validation rules for as many variables as you would like. At its simplest, for each variable you may optionally give a validation rule, optionally state whether the variable is required to be non-missing, and optionally list any other variables needed to validate it. When you are finished, typing ckvar will validate your data. The help file for ckvar gives more details about what else your validation can involve.

You may also use this system for scoring variables as one might score an exam or a survey instrument. The distinction between validation (or error-checking) and scoring is this: validation and error-checking result in something which is two-valued (either 0 or 1), whereas scoring can result in any value. The idea is that in validation, one would normally like to create a new error variable containing 1's for errors and 0's for valid values, whereas for scoring, there could be multiple possible values depending on the value of the original variable (think: partial credit). The distinction is not large; it has been made because most people think of validation as right or wrong and of errors as existing or not, but think of scores as multivalued. The help files for ckvar will primarily talk of validation, since the great majority of users will use ckvar for validation. Scoring simply involves writing more complicated `validation' rules.
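The difference between the two kinds of result can be sketched in plain Stata, without ckvar. The variable names error_mpg and score_rep, the valid range for mpg, and the partial-credit scheme are all assumptions made up for this illustration; they do not show how ckvar names or computes its results.

```stata
* Sketch only: plain Stata, not ckvar; names and rules are hypothetical.
sysuse auto, clear

* Validation: a two-valued error indicator (1 = error, 0 = valid).
* A missing mpg is flagged as an error because inrange() returns 0 for it.
generate byte error_mpg = !inrange(mpg, 5, 60)

* Scoring: a multivalued result with partial credit
* (2 points for rep78 of 5, 1 point for 4, 0 otherwise, including missing)
generate byte score_rep = cond(rep78 == 5, 2, cond(rep78 == 4, 1, 0))

tabulate error_mpg
tabulate score_rep
```

The first variable can only ever hold 0 or 1, while the second can hold as many distinct values as the scoring rule defines.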

Also see

Online: ckvaredit for ways to set up validation rules using a dialog box, along with how validation rules work.


Bill Rising, StataCorp