Deviation contrast transformation for estimation results
devcon [ , groups(glist) equations(numlist) check[(#)] nonoise level(#) ]
where glist is
varlist1 [(varname1)] [, varlist2 [(varname2)] [, ...] ]
Description
A categorical regressor is usually included in a regression model using a set of 0/1 dummies differentiating the effects of the separate categories of the variable. A coefficient associated with such a dummy variable reflects the expected outcome difference between the represented category and some reference category. Since one of the categories serves as the reference category, only k-1 dummy variables are used for a k-category variable.
devcon may be used to transform the coefficients of such 0/1 dummy variables so that they reflect deviations from the "grand mean" (in other words, the modified coefficients will sum to zero over all categories) rather than deviations from the reference category. The transformed coefficients are equivalent to those obtained by using the so-called "effects coding" for the dummy variables (see the e prefix in xi3 or the dev() contrast in desmat; both packages are available from the SSC Archive). However, devcon reports coefficients for all categories (including the category that was used as the reference category in the original model) and modifies the model's constant accordingly (with the effects coding, the coefficient of one of the categories is "hidden" in the constant). Furthermore, the coding of the underlying dummy variables is still 0/1 with devcon.
The deviation contrast transformation is applied to the last (i.e. currently active) estimates. Use the groups() option to define the group(s) of dummy variables. devcon specified without the groups() option may be used to redisplay estimates that have already been transformed by devcon. The devcon routine will work after most estimation commands (see help estcom). Multiple-equation models are supported. Use the equations() option to specify the equation(s) to be transformed. Note that devcon also transforms the variance-covariance matrix of the estimates, so the usual post-estimation commands such as predict or test may be used with the transformed estimates.
The devcon command has two main benefits. First, it may be very convenient to use devcon to quickly display the deviation contrasts without having to change the coding of the variables and without having to take further action to make the reference category's coefficient visible. Second, the transformed estimates may be valuable for use with some post-estimation procedures. In fact, devcon was originally developed for use with the Oaxaca-Blinder decomposition (see help oaxaca if installed; the package is available from the SSC Archive, type ssc describe oaxaca). In this decomposition, the results for categorical variables depend on the choice of the reference category (see, e.g., Oaxaca and Ransom 1999). Applying the deviation contrast transformation to the estimates before conducting the decomposition is one solution to this problem (see Yun 2003).
Technical note
The deviation contrast transform can also be applied to the variables used to model an interaction between a categorical and a continuous variable. The relevant continuous variable must be provided in parentheses within the groups() option in such a case.
Options
groups(glist) defines the dummy-variable groups. If more than one group is specified, use commas to separate the groups. Note that in each of the groups a variable reflecting the reference category must be specified (i.e. the variable must exist in the data). If the variables in a group represent interactions with a continuous variable, specify the continuous variable in parentheses at the end of the group. The usual shorthand conventions apply to the varlists specified in glist (see help varlist).
equations(numlist) is relevant only for multiple-equation models. It specifies the equation(s) to be transformed. Use numbers to refer to the equations' positions in the model (1 for the first equation, 2 for the second, and so on). The usual shorthand conventions apply to numlist (see help numlist). The default is equations(1).
check[(#)] checks the integrity of the normalized estimates by verifying that the linear predictions from the original estimates and the normalized estimates are equal for all observations in the estimation sample. If the results do not pass the check, an error message is issued and no results are returned. By default, the check is performed using the model's first equation. To use another equation, specify its number in parentheses. A failed check indicates that the dummy variables used are not well defined (i.e. that the indicated groups overlap or that at least one group has been omitted). In rare cases, however, the results might fail the check even though the dummy variables have been correctly defined (devcon uses the information in e(sample) and, if available, e(subpop) to determine the sample of relevant cases; situations may arise in which the sample would have to be narrowed further).
nonoise suppresses the display of the transformed estimates.
level(#) specifies the confidence level, in percent terms, for the confidence intervals of the coefficients; see help level.
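The idea behind the check option can be illustrated outside Stata. The following is a minimal Python sketch, not devcon's actual implementation: the coefficient values are hypothetical, and the helper linear_prediction is introduced only for illustration. It compares the linear predictions before and after the transformation for well-defined dummies and for a row that belongs to no category of the group.

```python
def linear_prediction(constant, coefs, dummies):
    """Linear prediction for one observation with 0/1 dummies."""
    return constant + sum(b * d for b, d in zip(coefs, dummies))

# Hypothetical original estimates; b_3 = 0 for the reference category.
a = 1.0
b = [0.6, -0.3, 0.0]

# Deviation contrast transformation (see Methods and Formulas).
c = sum(b) / len(b)
a_t = a + c
b_t = [bj - c for bj in b]

# Well-defined dummies: each observation is in exactly one category.
well_defined = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
max_gap = max(
    abs(linear_prediction(a, b, d) - linear_prediction(a_t, b_t, d))
    for d in well_defined
)

# Ill-defined row (assumed for illustration): no category is set.
bad_row = (0, 0, 0)
bad_gap = abs(
    linear_prediction(a, b, bad_row) - linear_prediction(a_t, b_t, bad_row)
)

print(max_gap, bad_gap)
```

In this sketch the predictions agree for the well-defined rows, while the ill-defined row produces a gap equal to c, which is the kind of discrepancy a failed check would signal.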
Example
Standard application with one categorical variable ...
. sysuse auto
. generate rep1 = rep78 <= 3 if rep78 < .
. generate rep2 = rep78 == 4 if rep78 < .
. generate rep3 = rep78 == 5 if rep78 < .
. logit foreign mpg rep2 rep3, nolog
. devcon, groups(rep1 rep2 rep3)
... and interactions with a continuous variable:
. generate mpgrep1 = mpg * rep1
. generate mpgrep2 = mpg * rep2
. generate mpgrep3 = mpg * rep3
. logit foreign mpg rep2 rep3 mpgrep2 mpgrep3, nolog
. devcon, groups(rep1 rep2 rep3, mpgr* (mpg))
Transforming OLS estimates for use with the Blinder-Oaxaca decomposition (oaxaca is available from the SSC Archive):
. reg lnwage educ expr expr2 single divorced if female==0
. devcon , groups(married single divorced)
. estimates store male
. reg lnwage educ expr expr2 single divorced if female==1
. devcon , groups(married single divorced)
. estimates store female
. oaxaca male female, detail
Methods and Formulas
Consider the model
y = a + b_1*D_1 + b_2*D_2 + e
where "a" is the constant and "e" is the error. D_1 and D_2 are two 0/1 dummy variables representing a polytomous variable with three categories. Alternatively, the above equation can be formulated as
y = a + b_1*D_1 + b_2*D_2 + b_3*D_3 + e
with b_3 constrained to zero and D_3 being the indicator for the (omitted) reference category. Now define c as
c = (b_1 + b_2)/3
and let
a'   = a + c
b_1' = b_1 - c
b_2' = b_2 - c
b_3' = b_3 - c = -c
devcon then reports the equation
y = a' + b_1'*D_1 + b_2'*D_2 + b_3'*D_3 + e
More generally,
c = (b_1 + b_2 + ... + b_{k-1}) / k
for a k-category variable.
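The general formula can be written as a small function. This is a minimal Python sketch under stated assumptions, not devcon's code: the function name deviation_contrast and the numeric values are hypothetical, and the reference category is assumed to come last.

```python
def deviation_contrast(constant, coefs):
    """Apply the deviation contrast transformation.

    coefs holds the k-1 dummy coefficients b_1..b_{k-1}; the
    reference category implicitly has b_k = 0 (assumed to be last).
    Returns the new constant a' and the k coefficients b_1'..b_k'.
    """
    k = len(coefs) + 1
    c = sum(coefs) / k
    new_constant = constant + c
    new_coefs = [b - c for b in coefs] + [-c]
    return new_constant, new_coefs

# Hypothetical three-category example: a = 1.0, b_1 = 0.6, b_2 = -0.3.
a_new, b_new = deviation_contrast(1.0, [0.6, -0.3])
print(a_new, b_new, sum(b_new))
```

The transformed coefficients sum to zero (up to floating-point error), and the constant absorbs the shift c.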
The transformation can also be applied to interaction terms. Consider the model
y = a + b_1*DX_1 + b_2*DX_2 + d*X + e
where X is a continuous variable and DX_1 and DX_2 are the interaction terms, i.e. DX_1 = D_1*X and DX_2 = D_2*X. The deviation contrast transformation is then
y = a + b_1'*DX_1 + b_2'*DX_2 + b_3'*DX_3 + d'*X + e
where
b_1' = b_1 - c
b_2' = b_2 - c
b_3' = b_3 - c = -c
d'   = d + c
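The interaction case can be sketched in the same way. The following Python example is an illustration only: the function name and coefficient values are hypothetical. It shifts the interaction coefficients by c, adds c to the coefficient of X, and confirms that the category-specific slopes on X are unchanged.

```python
def deviation_contrast_interaction(main_coef, interaction_coefs):
    """Transform interaction coefficients b_1..b_{k-1} and the
    coefficient d of the continuous variable X.

    The constant is not affected; c is added to d instead.
    """
    k = len(interaction_coefs) + 1
    c = sum(interaction_coefs) / k
    new_interactions = [b - c for b in interaction_coefs] + [-c]
    new_main = main_coef + c
    return new_main, new_interactions

# Hypothetical values: d = 0.5, b_1 = 0.2, b_2 = 0.4 (b_3 = 0).
d, b = 0.5, [0.2, 0.4]
d_new, b_int_new = deviation_contrast_interaction(d, b)

# Slope on X within each category, before and after the transformation.
slopes_before = [d + bj for bj in b + [0.0]]
slopes_after = [d_new + bj for bj in b_int_new]
print(slopes_before, slopes_after)
```

Because d' + b_j' = d + b_j for every category, the fitted slopes are the same under both parameterizations.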
devcon also transforms the variance-covariance matrix of the coefficients, applying the general formula for the variances and covariances of weighted sums of random variables (see Mood et al. 1974:179).
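Because the transformed coefficients are weighted sums of the original ones, their covariance matrix can be obtained as T*V*T', where T holds the weights. The Python sketch below illustrates this for one dummy group; it is not devcon's implementation. The covariance values are hypothetical, and the coefficient order (dummy coefficients first, constant last) is an assumption.

```python
def transform_matrix(k):
    """Weights mapping (b_1..b_{k-1}, a) to (b_1'..b_k', a')."""
    rows = []
    for i in range(k):
        # b_i' = b_i - c, with c = (b_1 + ... + b_{k-1}) / k
        row = [-1.0 / k] * (k - 1) + [0.0]
        if i < k - 1:
            row[i] += 1.0
        rows.append(row)
    # a' = a + c
    rows.append([1.0 / k] * (k - 1) + [1.0])
    return rows

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical covariance matrix of (b_1, b_2, a).
V = [[0.04, 0.01, -0.02],
     [0.01, 0.09, -0.03],
     [-0.02, -0.03, 0.05]]

T = transform_matrix(3)
V_new = matmul(matmul(T, V), transpose(T))  # covariance of (b_1', b_2', b_3', a')

for row in V_new:
    print([round(v, 4) for v in row])
```

Since b_1' + b_2' + b_3' is zero by construction, the entries of the corresponding 3x3 block of V_new sum to zero, so the transformed matrix is singular.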
References
Mood, A.M., Graybill, F.A., Boes, D.C. (1974). Introduction to the Theory of Statistics, 3rd edn. New York: McGraw-Hill.
Oaxaca, R.L., Ransom, M.R. (1999). Identification in Detailed Wage Decompositions. The Review of Economics and Statistics 81: 154-157.
Yun, M.-S. (2003). A Simple Solution to the Identification Problem in Detailed Wage Decompositions. IZA Discussion Paper No. 836.
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see