-------------------------------------------------------------------------------
help for reclass                                                John Hendrickx
-------------------------------------------------------------------------------

reclass

reclass varname, [pcnttab(string) adjust bestmod q(real) u(real) dist(real) assoc(string) misclass(real) format(string) verbose]

Description

reclass is called by perturb to create a table of reclassification probabilities for use in a perturbation analysis. It can be used separately to experiment with different association patterns. varname should refer to a categorical variable. reclass creates a table of reclassification probabilities such that the expected frequencies of the reclassified variable will be equal to the frequency distribution of varname. In addition, an appropriate association is imposed between the reclassified variable and varname.

Options

pcnttab can be either a single value, a row or column matrix, or a square matrix. Usually, a single value between 0 and 100 will be specified, indicating the percentage of cases to be reclassified to the same category.

If a row or column matrix is specified, its dimensions must correspond with the number of categories of varname. Values should be between 0 and 100 and indicate the percentage of cases to be reclassified to the same category for each category separately.

If a square matrix is specified, its dimensions must correspond with the number of categories of varname. The matrix should indicate the reclassification probabilities with the original variable in the rows and the reclassified variable in the columns. Either percentages or probabilities may be used. It is not necessary for these to add to 100 or to 1 respectively, as reclass transforms them into columnwise proportions.

In most cases, the pcnttab option will suffice. The options below are useful for users familiar with loglinear models for square tables (mobility models).

adjust: By default, reclass defines reclassification probabilities such that the expected frequencies of the reclassified variable are the same as those of varname when the pcnttab option is used. Use noadjust to suppress this and use the percentages specified in the pcnttab option unmodified. noadjust implies nobestmod.

bestmod: By default, reclass imposes an appropriate pattern of association between varname and its reclassified counterpart when the pcnttab option is used. Use nobestmod to avoid this. The reclassification probabilities will be adjusted to make the expected frequencies of the reclassified variable equal to those of varname, but they will otherwise be close approximations of the values specified in the pcnttab option.

misclass: Maintained for compatibility with the original version of misclass and perturb. Translated by reclass into pcnttab(100-misclass) noadjust.

The options below can be used to specify the parameters of a pattern of association. They will be ignored if pcnttab was specified.

q: the multiplicative parameter of a quasi-independence (constrained) model.

u: the multiplicative parameter of a uniform association model.

dist: the multiplicative parameter of a distance model.

assoc: This allows users familiar with loglinear mobility models to specify an association pattern of their own choice. The argument for assoc should refer to a Stata program in which the variable paras is defined as a function of the row variable orig and the column variable dest to produce a loglinear pattern of association.

format: Specify a valid format for printing results. The default is %8.3f.

verbose: Print debugging information.

Remarks

The basic idea of reclassifying cases in a perturbation analysis is that each case will have a high probability, say 95%, of being reclassified into the same category. The remaining cases could then be distributed evenly among the remaining categories. There are two problems with this approach. First, smaller categories will tend to grow and larger categories will shrink after reclassification. Second, the association between the original and the reclassified variable will be arbitrary, with some reclassification categories being more likely than others. Both problems become more severe the more unevenly the variable in question is distributed.
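The first problem can be checked with a few lines of arithmetic. The following Python sketch is not part of reclass; the function name and the frequencies 70/20/10 are hypothetical and chosen only to show how a naive "95% stay, rest spread evenly" rule shifts the marginal distribution.

```python
def naive_expected_frequencies(margin, same_pct):
    """Expected frequencies after naive reclassification.

    margin   : observed frequency per category (hypothetical example data)
    same_pct : percentage of cases kept in their own category
    """
    k = len(margin)
    stay = same_pct / 100.0
    move = (1.0 - stay) / (k - 1)  # remainder spread evenly over the other categories
    expected = []
    for dest in range(k):
        total = 0.0
        for orig in range(k):
            prob = stay if orig == dest else move
            total += margin[orig] * prob
        expected.append(total)
    return expected


margin = [70, 20, 10]
expected = naive_expected_frequencies(margin, 95)
for m, e in zip(margin, expected):
    print(f"original {m:5.1f}  reclassified {e:6.2f}")
```

With these numbers the largest category shrinks from 70 to about 67.25 and the smallest grows from 10 to about 11.75, which is the drift that the adjustment in reclass is meant to remove.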

reclass solves these problems by creating an initial table of expected frequencies for the original by the reclassified variable, given the initial reclassification probabilities as specified by the pcnttab option. The parameters for an appropriate pattern of association between the original and the reclassified variable are derived from this table. Then an adjusted table of expected frequencies is created with the pattern of association found, such that the expected frequency distribution of the reclassified variable is identical to that of the original.

When a single percentage is specified in pcnttab, this percentage is used for the diagonal cells of the initial reclassification probabilities, with the remaining percentages distributed evenly among the other categories. The appropriate pattern of association in this case is a "quasi-independence (constrained)" (QI-C) loglinear mobility model (Hout 1983, Goodman 1984). The QI-C model makes the odds of reclassification to the same versus a different category the same for all categories. In addition, the reclassified category is independent of the original category, given that they are not the same. This model is fitted to the initial table of expected frequencies and a single ln(q) parameter is reported. This ln(q) parameter is used to create the adjusted table.

If the argument to pcnttab consists of a row or column vector of percentages, reclass uses different odds of reclassification to the same versus a different category for each category. The pcnttab argument forms the diagonal of the initial reclassification percentages; the remaining percentages are distributed evenly among the other categories. A quasi-independence model is fitted to the initial table of expected frequencies, with separate odds per category for reclassification to the same versus a different category. An ln(q) parameter is reported for each category of varname. These parameters are then used to create the adjusted table.

If the argument to pcnttab consists of a square matrix of percentages, reclass assumes that varname is an ordered categorical variable. Consequently, the percentages should be constructed so that short distance reclassification is more likely than long distance reclassification. reclass fits two models to the initial table of expected frequencies, a "quasi-distance" model and a "quasi-uniform association" model.

Both make short distance reclassification more likely than long distance, but this is even more pronounced for a quasi-uniform association model than for a quasi-distance model. A quasi-distance model makes the likelihood of reclassification proportionately lower for each step away from the main diagonal. A quasi-uniform model, on the other hand, is equivalent to a squared distance model: reclassification is proportionately less likely for the squared number of steps from the main diagonal.
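The equivalence between uniform association and squared distance follows from the identity orig*dest = (orig^2 + dest^2 - (orig-dest)^2)/2, where the first two terms are absorbed by the row and column main effects. The Python sketch below checks this numerically by comparing local odds ratios, which are unaffected by main effects. It is an illustration only: the uniform term uses the u^(orig*dest) form from the q25u5 example in this help file, while the distance parameterization dist^(-|orig-dest|), the function names, and the parameter value 5 are assumptions that may differ from what reclass uses internally.

```python
import math


def uniform_term(orig, dest, u):
    """Uniform association written as u**(orig*dest)."""
    return u ** (orig * dest)


def squared_distance_term(orig, dest, u):
    """Squared-distance form: u**(-(orig-dest)**2 / 2)."""
    return u ** (-((orig - dest) ** 2) / 2.0)


def distance_term(orig, dest, dist):
    """Assumed distance form: one factor 1/dist per step off the diagonal."""
    return dist ** (-abs(orig - dest))


def local_odds_ratio(term, i, j, param):
    """Odds ratio of the 2x2 subtable formed by rows i, i+1 and columns j, j+1."""
    return (term(i, j, param) * term(i + 1, j + 1, param)) / (
        term(i, j + 1, param) * term(i + 1, j, param)
    )


u = 5.0  # hypothetical parameter value
ratios_match = all(
    math.isclose(
        local_odds_ratio(uniform_term, i, j, u),
        local_odds_ratio(squared_distance_term, i, j, u),
    )
    for i in range(1, 4)
    for j in range(1, 4)
)
print("uniform and squared-distance odds ratios agree:", ratios_match)

# Relative weight of moving 0, 1, 2, 3 steps away from the main diagonal.
for steps in range(4):
    print(steps, distance_term(1, 1 + steps, u), squared_distance_term(1, 1 + steps, u))
```

Every local odds ratio equals u under both forms, so the two describe the same association once main effects are taken into account.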

Both models include an ln(q) parameter that increases the likelihood of reclassification to the same category without affecting short or long distance reclassification. The quasi-uniform model also reports an ln(u) parameter; the quasi-distance model reports an ln(dist) parameter. The best fitting model is chosen by reclass and the deviance and df are reported.

If the patterns of association used by reclass are not in fact appropriate to the problem at hand, the nobestmod option could be used. The final reclassification percentages will then be as close as possible to those in the pcnttab option. The reclassification probabilities will still be adjusted to make the expected frequencies of the reclassified variable equal to those of the original, leading to some discrepancies.

Alternatively, reclass could be run using the noadjust option. The returned result e(gentab) is then equal to the initial table of expected frequencies. e(gentab) could be used in a loglinear analysis to derive an appropriate model of association, which could then be specified in the assoc option.

The adjusted table is created using a loglinear model of equal main effects (a halfway model) and the appropriate pattern of association as an offset variable. This is fitted to an arbitrary table with the frequency distribution of varname as both its row and column marginals. The predicted frequencies of this model form a symmetric table with the pre-specified marginals and pattern of association (Hendrickx, 2004; Kaufman & Schervish, 1986).

If the q, u, or dist parameters are known, these can be specified directly in the corresponding options. For other patterns of association, the assoc option could be used. assoc should refer to a small program that defines the variable paras in terms of orig and dest to produce a loglinear pattern of association. For example:

    program define q25u5
        gen paras=(orig==dest)*ln(25) + orig*dest*ln(5)
    end

The command

    reclass myvar, assoc(q25u5)

will produce reclassification probabilities for the variable myvar using the program q25u5. This is equivalent to using the options q(25) and u(5). Other loglinear mobility models can be defined in a similar fashion.
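To make the adjusted table concrete, the Python sketch below builds a symmetric table with given marginals and the q25u5 pattern of association. It is not the reclass implementation: reclass fits a loglinear model with the association as an offset, whereas this sketch swaps in iterative proportional scaling, which should arrive at a table with the same properties (symmetric, pre-specified marginals, pre-specified association). The frequencies 50/30/20, the category codes 1..k, and the function names are assumptions made for the example.

```python
import math


def paras(orig, dest):
    """Log-scale association pattern, mirroring the q25u5 program."""
    return (orig == dest) * math.log(25) + orig * dest * math.log(5)


def adjusted_table(margin, paras, max_iter=20000, tol=1e-12):
    """Symmetric table scale[i]*scale[j]*exp(paras) with row and column sums = margin."""
    k = len(margin)
    codes = range(1, k + 1)  # assumes categories are coded 1..k
    assoc = [[math.exp(paras(i, j)) for j in codes] for i in codes]
    row = [1.0] * k
    col = [1.0] * k
    for _ in range(max_iter):
        # Alternately rescale rows and columns to match the marginals.
        new_row = [margin[i] / sum(assoc[i][j] * col[j] for j in range(k))
                   for i in range(k)]
        col = [margin[j] / sum(assoc[i][j] * new_row[i] for i in range(k))
               for j in range(k)]
        change = max(abs(n - o) / n for n, o in zip(new_row, row))
        row = new_row
        if change < tol:
            break
    # Equal main effects: one scale per category, shared by rows and columns.
    scale = [math.sqrt(r * c) for r, c in zip(row, col)]
    return [[scale[i] * scale[j] * assoc[i][j] for j in range(k)] for i in range(k)]


margin = [50.0, 30.0, 20.0]  # hypothetical frequency distribution of the variable
table = adjusted_table(margin, paras)
# Probability of each reclassified category (columns) given the original (rows).
probs = [[table[i][j] / margin[i] for j in range(3)] for i in range(3)]
for line in probs:
    print(" ".join(f"{p:8.3f}" for p in line))
```

Because the row sums of the table equal the original frequencies and the table is symmetric, the column sums do too, so the expected distribution of the reclassified variable matches that of the original.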

Saved results

r(gentab)      the initial expected table for the original by reclassified variable

r(margin)      the frequency distribution of varname

r(classprob)   the cumulative reclassification probabilities. These will be used by perturb to randomly reclassify varname.
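Cumulative probabilities are convenient because a single uniform random draw then selects the new category. The Python sketch below shows this generic technique; how perturb actually reads r(classprob) is not documented here, so the row-per-original-category layout, the function names, and the example probabilities are all assumptions.

```python
import bisect
import itertools
import random


def cumulative_rows(prob_rows):
    """Turn each row of reclassification probabilities into cumulative probabilities."""
    return [list(itertools.accumulate(row)) for row in prob_rows]


def reclassify(orig, draw, cum_rows):
    """Pick the new category for one case from a uniform draw in [0, 1)."""
    cum = cum_rows[orig]
    dest = bisect.bisect_right(cum, draw)  # first category whose cumulative value exceeds draw
    return min(dest, len(cum) - 1)  # guard against rounding in the last cumulative value


# Hypothetical probabilities: row = original category, column = new category.
prob_rows = [
    [0.90, 0.06, 0.04],
    [0.10, 0.85, 0.05],
    [0.05, 0.10, 0.85],
]
cum_rows = cumulative_rows(prob_rows)

random.seed(1)
original = [0, 0, 1, 2, 1, 0, 2]
perturbed = [reclassify(value, random.random(), cum_rows) for value in original]
print(original)
print(perturbed)
```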

References

Goodman, Leo A. (1984). The analysis of cross-classified data having ordered categories. Cambridge, Mass.: Harvard University Press.

Hendrickx, J. (2004). Using standardised tables for interpreting loglinear models. Submitted to Quality & Quantity.

Hout, M. (1983). Mobility tables. Beverly Hills: Sage Publications.

Kaufman, R.L., & Schervish, P.G. (1986). Using adjusted crosstabulations to interpret log-linear relationships. American Sociological Review 51:717-733.

http://www.xs4all.nl/~jhckx/perturb/

Direct comments to: John Hendrickx

reclass is available at SSC-IDEAS. Use ssc install perturb to obtain the latest version.

Also see

On-line: help for vif, collin, coldiag, coldiag2, perturb