help filtertrace -------------------------------------------------------------------------------

Title

filtertrace -- Trace filter or contingency questions

Syntax

Define filters

filtertrace define [varlist :] exp [; exp ...] [if] [in] [, options]

Check filters

filtertrace check vfe [; vfe ...] [, options]

where

vfe is [list :] varlist ([T] filter#) [exp]

in vfe

(filter#) is a numlist indicating the filters for which exp is checked. Note that the parentheses must be used. The optional T indicates that exp is to be checked only where the filters are true. The optional list must have as many elements as there are filters specified. It indicates that the first filter is used when checking exp for the variables formed with the first list element, the second filter for the variables formed with the second element, and so on.

See filtertrace define and filtertrace check for detailed explanations. Also see Workflow examples.

List defined filters

filtertrace

Drop or clear filters and flags

filtertrace {drop|clear} [all]

Reimport filters from variables or create variables

filtertrace {import|export} name

options               Description
-------------------------------------------------------------------------
main
  generate(stub)      create variables stub#

define options
  add                 add filter/s
  replace(numlist)    replace filter/s

check options
  fullexp             use complete expression as (flag) variable labels
  noflag              do not create variables tagging contradictions
-------------------------------------------------------------------------

Description

filtertrace is used to trace filter or contingency questions. In social research, questionnaires often contain contingency questions, i.e. questions that respondents are asked contingent on their answer to a previous filter question. filtertrace lets you detect (coding) errors in contingency questions using a two-step approach.

In a first step, filter questions are reconstructed (see define). In a second step, contingency questions are checked and errors are tagged (see check).

To learn about the other subcommands, see Remarks.

Options

generate(stub), when used with filtertrace define, creates filter variables stub1, ..., stubk tagging observations for which exp is true. The default is not to create filter variables. When used with filtertrace check, it creates indicator variables stub# tagging contradictions. If not specified with filtertrace check, stub defaults to _con.

add adds new filters.

replace(numlist) replaces all filters addressed by numlist.

fullexp uses full expressions as variable labels for flags. Default is to use filter number and expression. Any if and in qualifiers used when defining the filters are omitted.

noflag suppresses the creation of flag variables tagging contradictions. The summary report is also suppressed.

Remarks

filtertrace define is used to reconstruct filter questions in the questionnaire. It saves exp in s(_fltrflg#_). Here exp refers to an expression as used in if statements, so exp usually contains at least one variable (e.g. var1 <= 42 & var1 >= 27). One expression represents one filter question. Use ; to separate multiple expressions. You do not specify names for filters; once defined, filters are addressed by their number. If option generate() is specified, variables indicating observations for which exp is true are created.

Helpful hint

Note that filtertrace define allows an optional varlist. Placeholders @ are used to refer to variables in varlist. Specifying

filtertrace define var1-var3 : @ == 1

is equal to

filtertrace define var1 == 1 ; var2 == 1 ; var3 == 1

Both lines will define three filters (1, 2 and 3). You may omit placeholder @ in the first command, since a basic expression is used (see Placeholders).
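To see which observations the three filters above select, the following sketch builds equivalent indicator variables with base Stata commands only. The toy data and the variable names f1-f3 are made up for illustration. The sketch only mimics the effect described for option generate(); it is not how filtertrace works internally.

```stata
* toy data, invented for illustration
clear
input var1 var2 var3
1 0 1
0 1 1
1 1 0
end

* one indicator per variable in var1-var3; each pass of the loop plays
* the role of one substitution of the placeholder @ in "@ == 1"
local i = 0
foreach v of varlist var1-var3 {
    local ++i
    generate byte f`i' = (`v' == 1)
}

list var1-var3 f1-f3
```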

filtertrace check is used to detect errors in contingency questions. Each vfe specified (and separated by ;) is checked separately. A vfe consists of a varlist, a list of filter numbers (filter#), and an expression (exp). For each variable specified in varlist, exp is checked for observations identified by the filters in filter#. Each filter in filter# is checked separately. A flag variable is created, tagging observations for which the filter is true but exp is not true (and, unless T is specified, observations for which the filter is not true but exp is true). Use placeholders to refer to variables in varlist (see Placeholders). If exp is not specified, it defaults to != . (i.e. not system missing).

Helpful hints

Note that

filtertrace check var1 (1) != .

does not only check whether all observations, for which filter 1 is true, do not have system missing values in var1, but also checks the condition the other way round. That is, it also checks

filtertrace check var1 !(1) == .

To suppress the second check, type

filtertrace check var1 (T 1) != .

Also note the use of the optional list. Typing

filtertrace check 1/3 : var@ (1/3) != .

is equal to

filtertrace check var1 (1) != . ; var2 (2) != . ; var3 (3) != .
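The pairing of variables and filters in the lines above can be spelled out with a base Stata loop. This is a minimal sketch under assumptions: the toy data, the variable group, and the names f1-f3 and bad1-bad3 are hypothetical, and the indicators f1-f3 merely stand in for filters 1 to 3.

```stata
* toy data, invented for illustration
clear
input group var1 var2 var3
1 5 . .
2 . 7 .
3 . . .
end

* stand-ins for filters 1 to 3 (hypothetical definitions)
forvalues i = 1/3 {
    generate byte f`i' = (group == `i')
}

* variable i is checked against filter i, in both directions:
* filter true but value missing, or filter false but value present
forvalues i = 1/3 {
    generate byte bad`i' = (f`i' & !(var`i' != .)) | (!f`i' & (var`i' != .))
    count if bad`i'
}
```

In this toy data only the third observation is tagged (in bad3): filter 3 is true for it, but var3 is missing.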

filtertrace typed without subcommand lists and describes all defined filters.

filtertrace drop drops created filter variables. If all is specified, all flag variables indicating observations with errors in contingency questions are also dropped. Note that it is not necessary to drop flag variables, since they are replaced each time filtertrace check is used.

filtertrace clear clears defined filters. If all is also specified, any user defined stub is cleared from s(). This is similar to specifying sreturn clear.

filtertrace import reconstructs filters from variables previously created using filtertrace define with option generate(). Filters may only be imported if there are no filters defined yet.

filtertrace export creates flag variables name# from defined filters.

Placeholders

Placeholders in expressions

filtertrace knows two types of expressions. An expression is considered basic if it has only one variable and one relational operator. Expressions with at least one of the logical operators & or | are considered complex. Therefore var1 <= 42 is a basic expression, while var1 <= 42 & var1 >= 27 is a complex expression. Note that the latter expression can be rewritten using the inrange() function, e.g. inrange(var1, 27, 42). Functions are also considered complex expressions.

In complex expressions placeholder/s @ must be used to refer to variables in varlist. The use of placeholders is not required in basic expressions. Thus typing

filtertrace check var1 var2 (1) > 42

is ok, as is

filtertrace check var1 var2 (1) !inlist(@, 1, 2, 4)

Placeholders in (var)list (filtertrace check only)

filtertrace check allows an optional list. This list indicates that exp is to be checked for all variables, but contingent on different filters. Use placeholders to indicate which filter is to be used with which variable. The line

filtertrace check a b : var1@ var2@ var3@ (1/2) inlist(@, 1, 2, 4)

is equal to

filtertrace check var1a var2a var3a (1) inlist(@, 1, 2, 4) ; ///
    var1b var2b var3b (2) inlist(@, 1, 2, 4)

Note that list must have as many elements as filter numbers are specified. Also note that the placeholder in the inlist() function (complex expression) does not have anything to do with the specified list.

Examples

. sysuse nlsw88 ,clear

. filtertrace define age > 40 ; inrange(wage, 10, 25)

defines two filters. Filter one is true for all observations older than 40, filter two is true for all observations with wages between 10 and 25.

. filtertrace

lists the (two) defined filters.

. filtertrace export filter

creates dummy variables filter1 and filter2, indicating observations older than 40 and with wages between 10 and 25.

. filtertrace check married (1 2) == 1

checks whether all observations older than 40 are married and all observations age 40 and younger are not married. It also checks whether all observations with wages between 10 and 25 are married and all observations with wages outside this range are single. There will be contradictions.
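For comparison, the checks described above can be approximated by hand with base Stata commands. This is a sketch, not the command's implementation: the indicator names f1 and f2 are made up, and each count corresponds to one direction of one check.

```stata
sysuse nlsw88, clear

* stand-ins for filters 1 and 2
generate byte f1 = age > 40
generate byte f2 = inrange(wage, 10, 25)

* filter 1: older than 40 but not married, then 40 or younger but married
count if f1 & !(married == 1)
count if !f1 & married == 1

* filter 2: the same two directions for the wage filter
count if f2 & !(married == 1)
count if !f2 & married == 1
```

Any nonzero count corresponds to contradictions of the kind filtertrace check would tag.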

Workflow examples

I will start with a simple example. Suppose a questionnaire contains the question Are you pregnant?. Obviously this (contingency) question should only be asked if the respondent is female. Therefore it is preceded by a question about the respondent's gender. In our dataset we have two variables, gender (with value 1 for women) and pregnancy. All male respondents are expected to have a (system) missing value in pregnancy. To check, we first reconstruct the filter.

filtertrace define gender == 0

The syntax should be self-explanatory. It saves one condition (that is, one filter) in s(_fltrflg1_). Next we check whether pregnancy is missing for all male respondents.

filtertrace check pregnancy (1) == .

Here we specified three arguments. The first is the variable we want to check: pregnancy. The second is the number of the filter for which we want to check the (basic) expression, which is specified as the third argument. The output will be something like

variable: pregnancy

(1)   checking (pregnancy == .)       no contradictions
!(1)  checking !((pregnancy == .))    no contradictions

Contradictions

no contradictions

It tells us that there are no male respondents who answered the pregnancy question. The output also tells us, that there are no females who did not answer the question.
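To make the two directions of this check concrete, here is a minimal base Stata sketch on invented data that does contain contradictions. The data and the names f1, con_fwd and con_rev are assumptions for illustration only; filtertrace names its flag variables differently (see option generate()).

```stata
* toy data, invented for illustration (gender: 1 = female)
clear
input gender pregnancy
0 .
0 .
0 1
1 0
1 1
1 .
end

* stand-in for filter 1: respondent is male
generate byte f1 = (gender == 0)

* filter true, expression false: a male respondent answered the question
generate byte con_fwd = f1 & !(pregnancy == .)

* filter false, expression true: a female respondent did not answer
generate byte con_rev = !f1 & (pregnancy == .)

count if con_fwd
count if con_rev
list if con_fwd | con_rev
```

Here the third observation contradicts the check in the first direction and the sixth in the second direction.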

The above example is very simple, and we would probably be faster using tabulate and looking at the cross tabulation of gender and pregnancy (however, we would not be able to identify observations contradicting the condition, if there were any). We will see more complex examples below.

For the moment, pretend we have not checked filter 1 yet. In our questionnaire we asked respondents of all ages. However, we only asked the pregnancy question if respondents were female and aged 14-55. Women younger than 14 or older than 55 are not supposed to have answered this question. To check, we first add a second filter.

filtertrace define age : @ < 14 | @ > 55 ///
    if gender == 1 ,generate(filter) add

We used a varlist (containing only the one variable age) so we can refer to this variable using placeholders in the complex expression. It is also ok to code

filtertrace define age < 14 | age > 55 ///
    if gender == 1 ,generate(filter) add

Note that age is not considered a varlist or variable here, but is part of exp. Also note the if qualifier (used in both cases) to restrict exp to women. All male respondents will have system missing values in variable filter2. Coding

filtertrace define age : (@ < 14 | @ > 55) ///
    & gender == 1 ,generate(filter) add

will assign value 0 to all male respondents, as well as to all females aged 14-55, but we do not want that. We can now check for (coding) errors in the pregnancy question.

filtertrace check pregnancy (T 1) == . ; pregnancy (2) == .

In this example we used the optional T with filter 1 to suppress checking exp the other way round. We did so because we no longer expect all females to have answered the pregnancy question. We check filter 2 both ways because all females younger than 14 or older than 55 are expected to have missing values, while all females aged 14 to 55 must have answered the pregnancy question.
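The difference described above between restricting a filter with an if qualifier and folding the restriction into the expression can be reproduced in base Stata. This is a sketch on invented data; the variable names filter_if and filter_and are hypothetical.

```stata
* toy data, invented for illustration (gender: 1 = female)
clear
input gender age
0 30
1 12
1 30
1 60
end

* restriction via if: male respondents get system missing
generate byte filter_if = (age < 14 | age > 55) if gender == 1

* restriction inside the expression: male respondents get 0
generate byte filter_and = (age < 14 | age > 55) & gender == 1

list
```

With the if qualifier, male respondents are left out of the filter altogether, so they are not treated as observations for which the filter is false.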

In a last example, suppose we used a position generator (Lin and Dumin 1986) in our questionnaire. We presented respondents a list of 10 occupations, asking them to indicate whether they knew anyone having these occupations. If so, we asked them about this person's gender, age and education. The position generator leaves us with 40 variables in our dataset (occ1-occ10, gender1-gender10, age1-age10, edu1-edu10). Respondents who do not know anyone having occupation 1 are expected to have missing values in gender1, age1 and edu1. Respondents knowing someone with occupation x must not have (system) missings in the corresponding variables. We check this by defining ten (more) filters -- one for each occupation.

filtertrace define occ1-occ10 : == 1 ,add

This line creates 10 filters (filters 3 to 12). Next we check the contingency questions.

filtertrace check gender1 age1 edu1 (3) != . ; ///
    gender2 age2 edu2 (4) != . ; [...] ; ///
    gender10 age10 edu10 (12) != .

These lines will work fine, but require a lot of typing. It might be more convenient to type

filtertrace check 1/10 : gender@ age@ edu@ (3/12) != .

You may also create a simple forvalues loop. The advantage of the single line given above is that there will be only one summary report, instead of 10 with the forvalues loop.

Acknowledgments

The idea for filtertrace is partly inspired by Krishnan Bhaskaran's datacheck.

See Bill Rising's ckvar for a more sophisticated approach to data validation.

References

Lin, N. and Dumin, M. (1986). Access to occupations through social ties. Social Networks, 8, 365-385.

Author

Daniel Klein, University of Bamberg, klein.daniel.81@gmail.com

Also see

Online: assert, mi(), inlist(), inrange(), forvalues

if installed: datacheck, ckvar