/* ado\screenmatches.ado 1-7-2004 This is part of a suite of programs for use with mahapick.ado. This is to be used at the analysis phase, after all the matching and data- assembly is completed. This allows you to have a large pool of matches (controls), but to do analyses on a smaller set. For example, you may have 8 controls per treated case, but you want to use only 3. It is not enough to simply cut down the set to the first n control per treated case. What we want is, for a given set of variables, find the best n controls per treated case that have no missing data on all the variables. The whole point of this is that you can tweak the set of variables in the analysis, and you don't need to redo the matching due to resulting shifts in the "active" set of observations. Without this capability, you may be compelled, in the matching process, to consider what variables are to be analyzed -- limiting the control pool to cases that are nonmissing on a given set of variables. Subsequent tweaking of this set would force you to either redo the matching, or to accept a diminished set of observations in the analysis. (Also, in the latter situation, you might, say, end up with 3 controls for some treated cases, and 2 controls for others.) The present program helps avoid these problems. It allows you to ignore the set of analysis variables at the time you create the matching. The idea is to get a large set of controls per treated case (call it n0) in the matching process. Then at analysis time, you pick a smaller number of controls per treated case (call it n1), but for the given analysis, it is the best n1 cases per treated cases that are possible. If n0 is sufficiently large, then all treated cases will get the full n1 controls cases (not fewer). IMPORTANT: this is based on the structure you get out of mahapick, using the genfile option -- or the pickids option with appropriate stacking/reshaping. That is, we assume a "matchnum" variable which is 0 for the treated case, and 1, 2, 3, etc., for the controls -- where 1 is for the best match, 2 is the next best match, and so on. This is adapted from code in psid012\tab060.do, with certain items converted to (required) options, rather than being hard-coded names. */ prog def screenmatches, sortpreserve /* screen a set of match cases (along with the "treated") to have at most `nummatches' matches. Take the first `nummatches' cases (in terms of `matchnum') that have no missings on varlist. Not that typically, this `nummatches' is smaller than the `nummatches' used in mahapick. */ version 8.2 *! version 0.0.4 2-9-2006 /* prior versions 0.0.0 1-8-2004 0.0.1 1-16-2004 0.0.2 1-22-2004 0.0.3 2-5-2006 History: 1-7-2004: began coding, adapted from part of psid012\tab060.do (identical program found in several related psid012 do files.). 1-8-2004: finished first working version. 1-16-2004: added Verbose option. 1-22-2004: implementing summ and tab options. summ will calculate and summarize the min and max matchnum values for control cases screened in. tab applies only if summ is specified; will also do tabs of these min and max. 2-5-2006: Just fixed comments. 2-9-2006: Just fixed version. */ syntax varlist [if], gen(string) nummatches(integer) /// matchnum(varname) prime_id(varname) [Verbose summ tab] confirm new var `gen' confirm numeric var `matchnum' confirm numeric var `prime_id' marksample m tempvar out control gen byte `out' = ~`m' gen byte `control' = `matchnum' >0 sort `prime_id' `control' `out' `matchnum' by `prime_id' `control': gen byte `gen' = _n<= `nummatches' & `m' // The `m' limits this to cases that are themselves okay. quietly by `prime_id': replace `gen' = 0 if ~`m'[1] /* This limits this to cases where the prime case (the treated) is okay. This feature added 11-11-2003 in tab060.do. */ /* Debug stuff: tempvar q egen byte `q' = max(`out'), by(`prime_id') list `prime_id' `control' `matchnum' `m' `gen' if `q' , sepby(id_prime) */ if "`verbose'" ~= "" { qui count if ~`control' & `gen' local n_treated "`r(N)'" qui count if `control' & `gen' local n_control "`r(N)'" disp "screenmatches; num treated = `n_treated'; num control = `n_control'" } if "`summ'" ~= "" { /* Summarize the ranges of the matchnums taken (among controls). */ tempvar matchnum02 gen long `matchnum02' = `matchnum' if `control' & `gen' tempvar min02 max02 n1 egen long `min02' = min(`matchnum02'), by(`prime_id') egen long `max02' = max(`matchnum02'), by(`prime_id') bysort `prime_id': gen int `n1' = _n disp "summ of the min and max of control matchnums screened" summ `min02' `max02' if `n1' == 1 if "`tab'" ~= "" { disp "tabs of the min and max of control matchnums screened" tab `min02' if `n1' == 1 tab `max02' if `n1' == 1 } } end // screenmatches