help ckchar                                                   dialog: ckvaredit


Details About Using Characteristics to Do Validation, Error Checking, or Scoring


This help file explains the characteristics (or chars in Stata parlance) used by ckvar to validate, error-check or score values of variables in a dataset. These are the details about where ckvar finds the information it needs. Most of what is needed to use ckvar can be found in ckvaredit.

Remarks

Remarks are presented under the headings

    1. Introduction

    2. Naming Conventions for Characteristics

        2.1 Stubs
            2.1.1 The valid stub
            2.1.2 The score stub

        2.2 Suffixes
            2.2.1 Common Suffixes
                The rule suffix
                The required suffix
                The missing_value suffix
                The other_vars_needed suffix

            2.2.2 Esoteric Suffixes
                The varname suffix
                The vlabel_name suffix
                The vlabel suffix
                The wt suffix

1. Introduction

ckvar works by reading information from characteristics attached to each variable, and then using the information to generate new variables (such as variables which mark observations containing errors) and attach value labels to the new variables. It looks for the information it needs in specifically named characteristics, substituting default values if the characteristics are blank. This file explains the naming.

2. Naming Conventions for Characteristics

All the names of characteristics used by ckvar consist of a prefix, or stub, an underscore (_), and then a suffix. In all cases, the stub gives the general purpose, such as valid for validation/error-checking or score for scoring, and the suffix names the kind of information the characteristic holds, such as required for stating whether non-missing values are required. For example, the characteristic valid_rule has a stub of valid, meaning it is used for validation, and a suffix of rule because it holds a rule which can be evaluated.
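
As an illustration of the naming pattern, the sketch below attaches one characteristic to a variable with Stata's built-in char command and reads it back. The auto dataset and the variable rep78 are stand-ins chosen for the example; they do not come from the ckvar documentation.

```stata
* Illustration only: auto and rep78 are stand-ins, not part of ckvar
sysuse auto, clear

* stub "valid" + underscore + suffix "required"
char rep78[valid_required] "yes"

* read the contents back into a local macro and show them
local req : char rep78[valid_required]
display "valid_required for rep78 is: `req'"
```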

Each of the stubs and suffixes is explained below.

2.1 Stubs

Stubs denote the purpose of the characteristic. The reason for using a stub is that a char list command will then display all characteristics with a similar purpose together.
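
For instance, once a variable carries several characteristics, they can all be listed at once. A small sketch, again using rep78 from the auto dataset as a stand-in and -9 as an arbitrary value:

```stata
sysuse auto, clear

* two characteristics sharing the "valid" stub
char rep78[valid_required] "yes"
char rep78[valid_missing_value] "-9"

* list every characteristic attached to rep78
char list rep78[]
```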

There are two common stubs used by ckvar: valid and score. These are not restrictive, because the user can specify any other stub to use with ckvar. Other stubs should be used only when truly necessary. (One rare situation would be when several different scoring rules need to be distinguished from one another.) In any other case, using the default stub names is better, because they are what other users of the dataset will expect to see.
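
A sketch of the multiple-scoring situation follows. The stubs scoreA and scoreB are hypothetical names chosen for illustration, as are the variable q1 and the weights; the wt suffix itself is described under the esoteric suffixes. How ckvar is told which stub to read is not covered in this file.

```stata
* Hypothetical data: one item answered by four respondents
clear
set obs 4
generate byte q1 = mod(_n, 2)

* two scoring schemes kept apart by using two different stubs
char q1[scoreA_wt] "2"
char q1[scoreB_wt] "5"

char list q1[]
```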

2.1.1 The valid stub

The valid stub is used for saving characteristics which correspond to data validation or error checking.

2.1.2 The score stub

The score stub is used for saving characteristics which correspond to computing a score from values of the variable. For example, if the variables were responses to test or instrument questions, and such questions were combined to get a score (or scores), the score stub would be used for all characteristics.

2.2 Suffixes

Suffixes denote particular tasks, or rules, which are associated with checking the variable. Only a fixed set of suffixes is recognized; each is explained below. The list of suffixes is split into two groups: those that are commonly used, and which currently can be set by using the dialog box created by ckvaredit, and those that are a bit more esoteric.

2.2.1 Common Suffixes

The rule suffix

The rule suffix is used to hold rules with which the data contained in a variable are validated or scored. (Note: This suffix was _check in versions of ckvar earlier than 3.2.0. See ckvarupdate for updating instructions.)

The required suffix

The required suffix states whether missing values are errors or not. If the required suffix is "yes", "true", or "1" (or any abbreviation of these), then missing values are considered to be errors. Otherwise, missing values are not errors.

The missing_value suffix

By default, if missing values are considered errors, they are marked with a value of -1, so that errors of commission and errors of omission are kept separate. If another value is desired, it goes with the suffix missing_value.

The other_vars_needed suffix

If other variables are needed for validating or scoring the variable in question, their names go in the suffix other_vars_needed. This suffix is used by ckvar as well as ckdrop, ckkeep, and ckrename.
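
The common suffixes can also be set by hand with char. In the sketch below, the auto dataset, the choice of foreign as an extra variable, and the value -9 are illustrative assumptions. The last lines only mimic the fallback to -1 described above; they are not ckvar's own code.

```stata
sysuse auto, clear

* missing values of rep78 count as errors
char rep78[valid_required] "yes"

* mark those errors with -9 instead of the default -1
char rep78[valid_missing_value] "-9"

* variables other than rep78 that its rule would refer to
char rep78[valid_other_vars_needed] "foreign"

* reading a suffix with a fallback when the characteristic is blank
local mv : char mpg[valid_missing_value]
if "`mv'" == "" local mv -1
display "value used to mark missing mpg: `mv'"
```

Because mpg has no valid_missing_value characteristic here, the local macro falls back to -1.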

2.2.2 Esoteric Suffixes

The varname suffix

By default, all variables which hold indicators for errors start with error, and all variables which the scoring routines generate start with score. If another prefix for these variables is desired, it can be put in the suffix varname.

The vlabel_name suffix

The vlabel_name suffix contains the name of a value label that should be used for labeling the values of the generated variable. Note that this value label name will typically get overwritten when doing validation checks.

The vlabel suffix

The vlabel suffix contains a list of value and label pairs (as is used in label define). It would typically be used for scoring, when the computed scores take on known integer values.

The wt suffix

The wt suffix is for weighting scores. It can be a number only, and it is applied to all the scores from that variable. Think of a point value for a question on an exam, or a weighting for a question on an instrument.
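
A sketch of the esoteric suffixes set by hand on a scored item follows. The variable q1, the prefix pts, the label name rightwrong, and the values are all hypothetical; only the suffix names come from the text above.

```stata
* Hypothetical data: one item answered by four respondents
clear
set obs 4
generate byte q1 = mod(_n, 2)

* prefix for the generated variable, replacing the default "score"
char q1[score_varname] "pts"

* name of the value label, and its value/label pairs as in label define
char q1[score_vlabel_name] "rightwrong"
char q1[score_vlabel] "0 wrong 1 right"

* this item counts for 3 points
char q1[score_wt] "3"

char list q1[]
```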


If you become interested in writing more complicated error checkers, the general rules for programming with characteristics and the dochar command are given in docharprog.


Bill Rising, StataCorp