{smcl}
{* version 1.0.0 19apr2011}
{cmd:help filtertrace}
{hline}

{title:Title}

{p 5}
{cmd:filtertrace} {hline 2} Trace filter or contingency questions


{title:Syntax}

{p 5}
Define filters

{p 8}
{cmd:filtertrace} {opt d:efine} [{varlist} {cmd::}]
{help exp:{it:exp}} [{cmd:;} {it:exp ...}] {ifin}
[{cmd:, }{it:options}]

{p 5}
Check filters

{p 8}
{cmd:filtertrace} {opt c:heck} {it:vfe} [{cmd:;} {it:vfe ...}]
[{cmd:, }{it:options}]

{p 5}
where

{p 8}
{it:vfe} is [{it:list} {cmd::}] {help varlist:{bf:v}arlist}
{cmd:(}[T] {bf:f}ilter#{cmd:)} [{help exp:{bf:e}xp}]

{p 5}
in {it:vfe}

{p 8 11}
(filter#) is a {help numlist} indicating the filters for which
{it:exp} is checked. Note that the parentheses must be typed. The
optional {it:T} indicates that {it:exp} is to be checked only where
the filters are true. The optional {it:list} is a list with as many
elements as filters are specified. It indicates that the first filter
is used when checking {it:exp} for the first variable in varlist, the
second filter for the second variable, and so on.

{p 5 8 8}
See {help filtertrace##def:filtertrace define} and
{help filtertrace##chk:filtertrace check} for detailed explanations.
Also see {help filtertrace##ex:Workflow examples}.
{p 5}
List defined filters

{p 8}
{cmd:filtertrace}

{p 5}
Drop or clear filters and flags

{p 8}
{cmd:filtertrace} {{cmd:drop}|{cmd:clear}} [{cmd:all}]

{p 5}
Reimport filters from variables or create variables

{p 8}
{cmd:filtertrace} {{opt i:mport}|{opt e:xport}} {it:name}


{synoptset 21 tabbed}{...}
{synopthdr}
{synoptline}
{syntab :{it:main}}
{synopt:{opt g:enerate(stub)}}create variables {it:stub#}{p_end}

{syntab :{it:define options}}
{synopt:{opt a:dd}}add filter/s{p_end}
{synopt:{opt replace(numlist)}}replace filter/s{p_end}

{syntab :{it:check options}}
{synopt:{opt full:exp}}use complete expression as (flag) variable
labels{p_end}
{synopt:{opt nof:lag}}do not create variables tagging
contradictions{p_end}
{synoptline}


{title:Description}

{pstd}
{cmd:filtertrace} is used to trace filter or contingency questions.
In social research, questionnaires often contain {it:contingency}
questions, i.e. questions that respondents are asked contingent on
their answer to a previous {it:filter} question. {cmd:filtertrace}
detects (coding) errors in contingency questions in a two-step
approach.

{pstd}
In the first step, filter questions are reconstructed (see
{help filtertrace##def:define}). In the second step, contingency
questions are checked and errors are tagged (see
{help filtertrace##chk:check}).

{pstd}
To learn about other subcommands see {help filtertrace##oth:here}.


{title:Options}

{dlgtab:Options}

{phang}
{opt generate(stub)}, when used with {cmd:filtertrace define}, creates
filter variables {it:stub1, ..., stubk} tagging observations for which
{it:exp} is true. The default is not to create filter variables. When
used with {cmd:filtertrace check}, it creates indicator variables
{it:stub#} tagging contradictions. If not specified, {it:stub}
defaults to {it:_con}.

{phang}
{opt add} adds new filters.

{phang}
{opt replace(numlist)} replaces all filters addressed by
{it:numlist}.

{phang}
{opt fullexp} uses full expressions as variable labels for flags.
The default is to use the filter number and the expression. Any
{it:if} and {it:in} qualifiers used when defining the filters are
omitted.

{phang}
{opt noflag} suppresses the creation of flag variables tagging
contradictions. The summary report is also suppressed.


{title:Remarks}

{marker def}
{pstd}
{cmd:filtertrace define} is used to reconstruct filter questions in
the questionnaire. It saves {it:exp} in
{cmd:s(}{it:_fltrflg#_}{cmd:)}. Here {it:exp} refers to an expression
as used in {cmd:if} statements. Thus {it:exp} usually contains at
least one variable (e.g. var1 <= 42 & var1 >= 27). One expression
represents one filter question. Use {bf:;} to separate multiple
expressions. Filters are not named; once defined, they are addressed
by their number. If option {opt generate()} is specified, variables
indicating the observations for which {it:exp} is true are created.

{pstd}
{ul:Helpful hint}

{pstd}
Note that {cmd:filtertrace define} allows an optional {it:varlist}.
Placeholders {it:@} are used to refer to variables in {it:varlist}.
Specifying

        {cmd:filtertrace define var1-var3 : @ == 1}

{pstd}
is equal to

        {cmd:filtertrace define var1 == 1 ; var2 == 1 ; var3 == 1}

{pstd}
Both lines define three filters (1, 2 and 3). You may omit the
placeholder {it:@} in the first command, since a {it:basic}
expression is used (see {help filtertrace##plc:Placeholders}).

{marker chk}
{pstd}
{cmd:filtertrace check} is used to detect errors in contingency
questions. Each {it:vfe} specified (and separated by {bf:;}) is
checked separately. A {it:vfe} consists of a {it:varlist}, a list of
filter numbers ({it:filter#}), and an expression ({it:exp}). For each
variable specified in {it:varlist}, {it:exp} is checked for the
observations identified by the filters in {it:filter#}. Each
{it:filter#} is checked separately. A flag variable is created,
tagging observations for which {it:filter#} is true but {it:exp} is
{bf:not} true.
Use placeholders to refer to variables in {it:varlist} (see
{help filtertrace##plc:Placeholders}). If {it:exp} is not specified,
it defaults to {bf:!= .} (i.e. not system missing).

{pstd}
{ul:Helpful hints}

{pstd}
Note that

        {cmd:filtertrace check var1 (1) != .}

{pstd}
checks not only that no observation for which filter 1 is true has a
system missing value in var1, but also the condition the other way
round. That is, it also checks

        {cmd:filtertrace check var1 !(1) == .}

{pstd}
To suppress the second check, type

        {cmd:filtertrace check var1 (T 1) != .}

{pstd}
Also note the use of the optional {it:list}

        {cmd:filtertrace check 1/3 : var@ (1/3) != .}

{pstd}
is equal to

        {cmd:filtertrace check var1 (1) != . ; var2 (2) != . ; var3 (3) != .}

{marker oth}
{pstd}
{cmd:filtertrace} typed without a subcommand lists and describes all
defined filters.

{pstd}
{cmd:filtertrace drop} drops created filter variables. If {cmd:all}
is specified, all flag variables indicating observations with errors
in contingency questions are also dropped. Note that it is not
necessary to drop flag variables, since they are replaced each time
{cmd:filtertrace check} is used.

{pstd}
{cmd:filtertrace clear} clears defined filters. If {cmd:all} is also
specified, any user-defined {it:stub} is cleared from {cmd:s()}. This
is similar to specifying {help sreturn clear}.

{pstd}
{cmd:filtertrace import} reconstructs filters from variables
previously created using {cmd:filtertrace define} with option
{opt generate()}. Filters may only be imported if no filters are
defined yet.

{pstd}
{cmd:filtertrace export} creates flag variables {it:name#} from
defined filters.

{marker plc}
{pstd}
{ul:{hi:Placeholders}}

{pstd}
{ul:Placeholders in expressions}

{pstd}
{cmd:filtertrace} knows two types of expressions. An expression is
considered {it:basic} if it has only one variable and one
{help operators:relational operator}.
Expressions with at least one of the {it:logical operators} {bf:&} or
{bf:|} are considered {it:complex}. Therefore {it:var1 <= 42} is a
{it:basic} expression, while {it:var1 <= 42 & var1 >= 27} is a
{it:complex} expression. Note that the latter expression can be
rewritten using the {help inrange()} function, e.g.
{it:inrange(var1, 27, 42)}. Functions are also considered
{it:complex expressions}.

{pstd}
In {it:complex expressions} the placeholder {bf:@} must be used to
refer to variables in {it:varlist}. Placeholders are not required in
{it:basic} expressions. Thus typing

        {cmd:filtertrace check var1 var2 (1) > 42}

{pstd}
is ok, as is

        {cmd:filtertrace check var1 var2 (1) !inlist(@, 1, 2, 4)}

{pstd}
{ul:Placeholders in ({it:var}){it:list} ({cmd:filtertrace check} only)}

{pstd}
{cmd:filtertrace check} allows an optional {it:list}. This {it:list}
indicates that {it:exp} is to be checked for all variables, but
contingent on different filters. Use placeholders to indicate which
filter is to be used with which variable. The line

        {cmd:filtertrace check a b : var1@ var2@ var3@ (1/2) inlist(@, 1, 2, 4)}

{pstd}
is equal to

        {cmd:filtertrace check var1a var2a var3a (1) inlist(@, 1, 2, 4) ; ///}
        {cmd:var1b var2b var3b (2) inlist(@, 1, 2, 4)}

{pstd}
Note that {it:list} must have as many elements as filter numbers are
specified. Also note that the placeholder in the {cmd:inlist()}
function ({it:complex} expression) has nothing to do with the
specified {it:list}.


{title:Examples}

        {cmd:. sysuse nlsw88, clear}

        {cmd:. filtertrace define age > 40 ; inrange(wage, 10, 25)}

{pstd}
defines two filters. Filter one is true for all observations older
than 40, filter two is true for all observations with wages between
10 and 25.

        {cmd:. filtertrace}

{pstd}
lists the (two) defined filters.

        {cmd:. filtertrace export filter}

{pstd}
creates dummy variables {it:filter1} and {it:filter2}, indicating
observations older than 40 and observations with wages between 10 and
25.

        {cmd:. 
filtertrace check married (1 2) == 1}

{pstd}
checks whether all observations older than 40 are married and all
observations aged 40 and younger are not married. It also checks
whether all observations with wages between 10 and 25 are married and
all observations with wages outside this range are single. There will
be contradictions.


{marker ex}
{title:Workflow examples}

{pstd}
I will start with a simple example. Suppose a questionnaire contains
the question {it:Are you pregnant?}. Obviously this (contingency)
question should only be asked if the respondent is female. Therefore
it is preceded by a question about the respondent's gender. In our
dataset we have two variables, {it:gender} (with value 1 for women)
and {it:pregnancy}. All male respondents are expected to have a
(system) missing value in {it:pregnancy}. To check, we first
reconstruct the filter.

        {cmd:filtertrace define gender == 0}

{pstd}
The syntax should be self-explanatory. It saves one condition (that
is: one filter) in {cmd:s(}{it:_fltrflg1_}{cmd:)}. Next we check
whether {it:pregnancy} is missing for all male respondents.

        {cmd:filtertrace check pregnancy (1) == .}

{pstd}
Here we specified three arguments. The first is the variable we want
to check: {it:pregnancy}. The second is the number of the filter for
which we want to check the (basic) expression specified as the third
argument. The output we get will be something like

        variable: pregnancy
        (1) checking (pregnancy == .)
        no contradictions
        !(1) checking !((pregnancy == .))
        no contradictions

        Contradictions
        no contradictions

{pstd}
It tells us that there are no male respondents who answered the
pregnancy question. The output also tells us that there are no
females who did not answer the question.
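
{pstd}
To see what the two directions of this check amount to, they can be
reproduced by hand with built-in commands. The lines below are only a
sketch and are not part of {cmd:filtertrace}. They assume that the
check works as described above, and the four observations are made-up
toy data, not taken from any survey.

        {cmd:. clear}
        {cmd:. * toy data: gender is 1 for women, 0 for men}
        {cmd:. input gender pregnancy}
        {cmd:  0 .}
        {cmd:  0 .}
        {cmd:  1 0}
        {cmd:  1 1}
        {cmd:. end}
        {cmd:. * filter true (men) but pregnancy not system missing}
        {cmd:. count if gender == 0 & pregnancy != .}
        {cmd:. * filter not true (women) but pregnancy system missing}
        {cmd:. count if !(gender == 0) & pregnancy == .}

{pstd}
Both counts are zero for these toy data, which matches a report of no
contradictions in either direction. Replacing {cmd:count} with
{help list} would show the offending observations, if there were any.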

{pstd}
The pregnancy example above is very simple, and we would probably be
faster using {help tabulate_twoway:tabulate} and looking at the cross
tabulation of {it:gender} and {it:pregnancy} (however, we would not be
able to identify observations contradicting the condition, if there
were any). We will see more complex examples below.

{pstd}
For the moment, pretend we had not checked filter 1 yet. In our
questionnaire we asked respondents of all ages. However, we only
asked the pregnancy question if respondents were female and aged
14-55. Women younger than 14 or older than 55 are not supposed to
have answered this question. To check, we first add a second filter.

        {cmd:filtertrace define age : @ < 14 | @ > 55 ///}
        {cmd:if gender == 1, generate(filter) add}

{pstd}
We used a varlist (containing only the one variable {it:age}) so we
can refer to this variable using placeholders in the {it:complex}
expression. It is also ok to code

        {cmd:filtertrace define age < 14 | age > 55 ///}
        {cmd:if gender == 1, generate(filter) add}

{pstd}
Note that {it:age} is not considered a {it:varlist} or {it:variable}
here, but is part of {help exp:{it:exp}}. Also note the {it:if}
qualifier (used in both cases) to restrict {it:exp} to women. All
male respondents will have system missing values in variable
{it:filter2}. Coding

        {cmd:filtertrace define age : (@ < 14 | @ > 55) ///}
        {cmd:& gender == 1, generate(filter) add}

{pstd}
would assign value 0 to all male respondents, as well as to all
females aged 14-55, which is not what we want. We can now check for
(coding) errors in the pregnancy question.

        {cmd:filtertrace check pregnancy (T 1) == . ; pregnancy (2) == .}

{pstd}
In this example we used the optional {it:T} with {it:filter1} to
suppress checking {it:exp} the other way round. We did so because we
no longer expect {it:all} females to have answered the pregnancy
question.
We check {it:filter2} both ways because all females younger than 14
or older than 55 are expected to have missing values, while {it:all}
females between age 14 and 55 must have answered the pregnancy
question.

{pstd}
In a last example, suppose we used a position generator (Lin and
Dumin 1986) in our questionnaire. We presented respondents with a
list of 10 occupations, asking them to indicate whether they knew
anyone having these occupations. If so, we asked them about this
person's gender, age and education. The position generator leaves us
with 40 variables in our dataset (occ1-occ10, gender1-gender10,
age1-age10, edu1-edu10). Respondents who do not know anyone having
occupation 1 are expected to have missing values in gender1, age1 and
edu1. Respondents knowing someone with occupation {it:x} must not
have (system) missing values in the corresponding variables. We check
this by defining ten (more) filters, one for each occupation.

        {cmd:filtertrace define occ1-occ10 : == 1, add}

{pstd}
This line creates 10 filters (filters 3 to 12). Next we check the
contingency questions.

        {cmd:filtertrace check gender1 age1 edu1 (3) != . ; ///}
        {cmd:gender2 age2 edu2 (4) != . ; [...] ; ///}
        {cmd:gender10 age10 edu10 (12) != .}

{pstd}
These lines will work fine, but they require a lot of typing. It
might be more convenient to type

        {cmd:filtertrace check 1/10 : gender@ age@ edu@ (3/12) != .}

{pstd}
You may also create a simple {help forvalues} loop. The advantage of
the line given is that there will be only one summary report instead
of the 10 produced by the {cmd:forvalues} loop.


{title:Acknowledgments}

{pstd}
The idea for {cmd:filtertrace} is partly inspired by Krishnan
Bhaskaran's {cmd:datacheck}.

{pstd}
See Bill Rising's {cmd:ckvar} for a more sophisticated approach to
data validation.


{title:References}

{pstd}
Lin, N. and Dumin, M. (1986). Access to occupations through social
ties. Social Networks, 8,
365-385.


{title:Author}

{pstd}Daniel Klein, University of Bamberg, klein.daniel.81@gmail.com


{title:Also see}

{psee}
Online: {help assert}, {help mi()}, {help inlist()},
{help inrange()}, {help forvalues}{p_end}

{psee}
if installed: {help datacheck}, {help ckvar}{p_end}