{smcl}
{.-}
help for {cmd:reclass} {right: {browse "mailto:John_Hendrickx@yahoo.com":John Hendrickx}}
{.-}

{title:reclass}

{p 8 27}
{cmd:reclass} {it:varname} , {cmd:[} {cmdab:p:cnttab(}{it:string}{cmd:)} {cmdab:ad:just} {cmdab:b:estmod} {cmd:q(}{it:real}{cmd:)} {cmd:u(}{it:real}{cmd:)} {cmdab:d:ist(}{it:real}{cmd:)} {cmdab:as:soc(}{it:string}{cmd:)} {cmdab:m:isclass(}{it:real}{cmd:)} {cmdab:f:ormat(}{it:string}{cmd:)} {cmdab:v:erbose} {cmd:]}

{title:Description}

{p}
{cmd:reclass} is called by {help perturb} to create a table of reclassification probabilities for use in a perturbation analysis. It can also be used on its own to experiment with different association patterns. {it:varname} should refer to a categorical variable. {cmd:reclass} creates a table of reclassification probabilities such that the expected frequencies of the reclassified variable equal the frequency distribution of {it:varname}. In addition, an appropriate pattern of association is imposed between the reclassified variable and {it:varname}.

{title:Options}

{p 0 4}
{cmd:pcnttab} can be a single value, a row or column matrix, or a square matrix. Usually a single value between 0 and 100 is specified, giving the percentage of cases to be reclassified into the same category.

{p 4 4}
If a row or column matrix is specified, its dimension must equal the number of categories of {it:varname}. Values should be between 0 and 100 and give the percentage of cases to be reclassified into the same category, separately for each category.

{p 4 4}
If a square matrix is specified, its dimensions must equal the number of categories of {it:varname}. The matrix gives the reclassification probabilities, with the original variable in the rows and the reclassified variable in the columns. Either percentages or probabilities may be used. They need not add to 100 or to 1, respectively, because {cmd:reclass} transforms them into columnwise proportions.
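{p 4 4}
To see why the {cmd:adjust} step is needed, the Mata sketch below builds the initial reclassification probabilities implied by a single {cmd:pcnttab(90)} for a hypothetical three-category variable with frequencies 10, 30 and 60. It assumes that the initial expected table is obtained by multiplying each row of probabilities by the frequency of that original category. This is an illustration under those assumptions, not the code used by {cmd:reclass}. The column totals come to 13.5, 30.5 and 56: the small category grows and the large one shrinks, which is the first problem described under Remarks.

{input:mata:}
{input:n = (10, 30, 60)'                 // hypothetical category frequencies}
{input:p = .90                           // pcnttab(90) as a proportion}
{input:K = rows(n)}
{input:rest = (1 - p)/(K - 1)            // spread evenly over the other categories}
{input:P = J(K, K, rest) + I(K)*(p - rest)   // rows: original, columns: reclassified}
{input:E = diag(n)*P                     // initial expected table (assumed n_i * p_ij)}
{input:colsum(E)                         // 13.5  30.5  56}
{input:end}
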
{p}
In most cases, the {cmd:pcnttab} option will suffice. The options below are useful for users familiar with loglinear models for square tables (mobility models).

{p 0 4}
{cmd:adjust} By default, when the {cmd:pcnttab} option is used, {cmd:reclass} defines reclassification probabilities such that the expected frequencies of the reclassified variable are the same as those of {it:varname}. Use {cmd:noadjust} to suppress this and use the percentages specified in {cmd:pcnttab} unmodified. {cmd:noadjust} implies {cmd:nobestmod}.

{p 0 4}
{cmd:bestmod} By default, when the {cmd:pcnttab} option is used, {cmd:reclass} imposes an appropriate pattern of association between {it:varname} and its reclassified counterpart. Use {cmd:nobestmod} to avoid this. The reclassification probabilities will still be adjusted to make the expected frequencies of the reclassified variable equal to those of {it:varname}, but they will otherwise closely approximate the values specified in {cmd:pcnttab}.

{p 0 4}
{cmd:misclass} is maintained for compatibility with the original version of {cmd:misclass} and {help perturb}. {cmd:reclass} translates it into {cmd:pcnttab(100-}{it:misclass}{cmd:) noadjust}.

{p}
The options below can be used to specify the parameters of a pattern of association. They are ignored if {cmd:pcnttab} was specified.

{p 0 4}
{cmd:q} the multiplicative parameter of a quasi-independence (constrained) model.

{p 0 4}
{cmd:u} the multiplicative parameter of a uniform association model.

{p 0 4}
{cmd:dist} the multiplicative parameter of a distance model.

{p 0 4}
{cmd:assoc} allows users familiar with loglinear mobility models to specify an association pattern of their own choice. Its argument should be the name of a Stata program that defines the variable {cmd:paras} as a function of the row variable {cmd:orig} and the column variable {cmd:dest}, producing a loglinear pattern of association.
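{p 4 4}
Before passing a program to {cmd:assoc}, it can help to inspect the pattern it produces. The do-file sketch below defines a hypothetical program {cmd:qd10} (its name and pattern are illustrative only) that puts ln(10) on the diagonal and halves the odds for each step away from it, then runs it on a 3 x 3 grid of {cmd:orig} and {cmd:dest}. It makes no assumption about how {cmd:reclass} itself parameterizes the {cmd:dist} option. Note that the sketch clears the data in memory.

{input:capture program drop qd10}
{input:program define qd10}
{input:    * hypothetical quasi-distance pattern: ln(10) on the diagonal,}
{input:    * odds halved for each step away from the diagonal}
{input:    gen paras = (orig==dest)*ln(10) + abs(orig-dest)*ln(.5)}
{input:end}
{input:* build a 3 x 3 grid: orig = 1 1 1 2 2 2 3 3 3, dest = 1 2 3 1 2 3 1 2 3}
{input:clear}
{input:set obs 9}
{input:gen orig = ceil(_n/3)}
{input:gen dest = mod(_n-1, 3) + 1}
{input:qd10}
{input:list orig dest paras, sepby(orig)}
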
{p 0 4}
{cmd:format} specifies a valid format for printing results. The default is %8.3f.

{p 0 4}
{cmd:verbose} displays debugging information.

{title:Remarks}

{p}
The basic idea of reclassifying cases in a perturbation analysis is that each case has a high probability, say 95%, of being reclassified into the same category. The remaining cases could then be distributed evenly among the other categories. This approach has two problems. First, after reclassification, smaller categories will tend to grow and larger categories will tend to shrink. Second, the association between the original and the reclassified variable will be arbitrary, with some reclassification categories being more likely than others. Both problems become more severe the more unevenly the variable is distributed.

{p}
{cmd:reclass} solves these problems by first creating an initial table of expected frequencies for the original by the reclassified variable, given the initial reclassification probabilities specified in the {cmd:pcnttab} option. The parameters of an {it:appropriate} pattern of association between the original and the reclassified variable are derived from this table. An adjusted table of expected frequencies is then created with this pattern of association, such that the expected frequency distribution of the reclassified variable is identical to that of the original.

{p}
When a single percentage is specified in {cmd:pcnttab}, this percentage is used for the diagonal cells of the initial reclassification probabilities, and the remaining percentage is distributed evenly among the other categories. The {it:appropriate} pattern of association in this case is a "quasi-independence (constrained)" (QI-C) loglinear mobility model (Hout 1983; Goodman 1984). The QI-C model makes the odds of reclassification into the same versus a different category equal for all categories.
In addition, the reclassified category is independent of the original category, given that the two differ. This model is fitted to the initial table of expected frequencies and a single {cmd:ln(q)} parameter is reported. This {cmd:ln(q)} parameter is used to create the adjusted table.

{p}
If the argument to {cmd:pcnttab} is a row or column vector of percentages, {cmd:reclass} uses separate odds of reclassification into the same versus a different category for each category. The {cmd:pcnttab} argument forms the diagonal of the initial reclassification percentages; the remaining percentages are distributed evenly among the other categories. A quasi-independence model is fitted to the initial table of expected frequencies, with separate odds per category for reclassification into the same versus a different category. An {cmd:ln(q)} parameter is reported for each category of {it:varname}. These parameters are then used to create the adjusted table.

{p}
If the argument to {cmd:pcnttab} is a square matrix of percentages, {cmd:reclass} assumes that {it:varname} is an ordered categorical variable. The percentages should therefore be constructed so that short-distance reclassification is more likely than long-distance reclassification. {cmd:reclass} fits two models to the initial table of expected frequencies: a "quasi-distance" model and a "quasi-uniform association" model.

{p}
Both models make short-distance reclassification more likely than long-distance reclassification, but this is more pronounced for the quasi-uniform association model than for the quasi-distance model. A quasi-distance model makes reclassification proportionately less likely for each step away from the main diagonal. A quasi-uniform model, on the other hand, is equivalent to a squared distance model: reclassification becomes proportionately less likely with the squared number of steps away from the main diagonal.
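{p}
The equivalence between the quasi-uniform and a squared distance model can be checked numerically. The uniform association term {cmd:orig*dest} differs from minus half the squared distance, -(orig-dest)^2/2, only by terms that depend on the row alone or on the column alone, and the main effects of a loglinear model absorb such terms. A parameter ln(u) on {cmd:orig*dest} therefore acts like a penalty of ln(u)/2 per unit of squared distance. The Mata sketch below verifies this for a hypothetical variable with five categories scored 1 to 5; it is an illustration, not part of {cmd:reclass}.

{input:mata:}
{input:K = 5}
{input:i = (1::K)*J(1, K, 1)             // orig score in each cell}
{input:j = J(K, 1, 1)*(1..K)             // dest score in each cell}
{input:U = i:*j                          // uniform association term}
{input:S = -0.5*((i - j):^2)             // minus half the squared distance}
{input:U - S                             // equals i^2/2 + j^2/2: row and column terms only}
{input:end}
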
{p}
Both models include an {cmd:ln(q)} parameter that increases the likelihood of reclassification into the same category without affecting short- or long-distance reclassification. The quasi-uniform model also reports an {cmd:ln(u)} parameter; the quasi-distance model reports an {cmd:ln(dist)} parameter. The best-fitting model is chosen by {cmd:reclass}, and the deviance and df are reported.

{p}
If the patterns of association used by {cmd:reclass} are not appropriate for the problem at hand, the {cmd:nobestmod} option can be used. The final reclassification percentages will then be as close as possible to those in the {cmd:pcnttab} option. Because the reclassification probabilities are still adjusted to make the expected frequencies of the reclassified variable equal to those of the original, some discrepancies will remain.

{p}
Alternatively, {cmd:reclass} can be run with the {cmd:noadjust} option. The returned result {cmd:r(gentab)} is then equal to the initial table of expected frequencies. {cmd:r(gentab)} can be used in a loglinear analysis to derive an appropriate model of association, which can then be specified in the {cmd:assoc} option.

{p}
The adjusted table is created using a loglinear model with equal main effects for rows and columns (a halfway model) and the appropriate pattern of association as an offset variable. This model is fitted to an arbitrary table that has the frequency distribution of {it:varname} as both its row and its column marginals. The predicted frequencies of this model form a symmetric table with the prespecified marginals and pattern of association (Hendrickx 2004; Kaufman & Schervish 1986).

{p}
If the {cmd:q}, {cmd:u}, or {cmd:dist} parameters are known, they can be specified directly in the corresponding options. For other patterns of association, the {cmd:assoc} option can be used. {cmd:assoc} should refer to a small program that defines the variable {cmd:paras} in terms of {cmd:orig} and {cmd:dest} to produce a loglinear pattern of association.
For example:

{input:program define q25u5}
{input:    gen paras=(orig==dest)*ln(25) + orig*dest*ln(5)}
{input:end}

{p}
The command {cmd:reclass} {it:myvar}{cmd:, assoc(q25u5)} will produce reclassification probabilities for the variable {it:myvar} using the program {cmd:q25u5}. This is equivalent to using the options {cmd:q(25)} and {cmd:u(5)}. Other loglinear mobility models can be defined in a similar fashion.

{title:Saved results}

{p 0 4}
{cmd:r(gentab)} {break}the initial expected table for the original by reclassified variable

{p 0 4}
{cmd:r(margin)} {break}the frequency distribution of {it:varname}

{p 0 4}
{cmd:r(classprob)} {break}the cumulative reclassification probabilities. These are used by {help perturb} to randomly reclassify {it:varname}.

{title:References}

{p 0 4}
Goodman, L.A. (1984). {it:The analysis of cross-classified data having ordered categories}. Cambridge, Mass.: Harvard University Press.

{p 0 4}
Hendrickx, J. (2004). Using standardised tables for interpreting loglinear models. Submitted to {it:Quality & Quantity}.

{p 0 4}
Hout, M. (1983). {it:Mobility tables}. Beverly Hills: Sage Publications.

{p 0 4}
Kaufman, R.L., & Schervish, P.G. (1986). Using adjusted crosstabulations to interpret log-linear relationships. {it:American Sociological Review} 51: 717-733.

{p 0 4}
{browse "http://www.xs4all.nl/~jhckx/perturb/":http://www.xs4all.nl/~jhckx/perturb/}

{p}
Direct comments to: {browse "mailto:John_Hendrickx@yahoo.com":John Hendrickx}

{p}
{cmd:reclass} is available at {browse "http://ideas.uqam.ca/ideas/data/bocbocode.html":SSC-IDEAS}. Use {help ssc} {cmd:install perturb} to obtain the latest version.

{title:Also see}

{p 0 21}
On-line: help for {help vif}, {help collin}, {help coldiag}, {help coldiag2}, {help perturb}
{p_end}