------------------------------------------------------------------------------- help for shapley2 Version 1.0 -------------------------------------------------------------------------------

Computing Shapley values after a regression command

shapley2 , stat(str) [command(str) depvar(depvarlist) indepvars(indepvars) groups(special) force memory noisily]

Description

This command performs a Shorrocks-Shapley decomposition of estimation statistics such as the R squared of an OLS regression. It provides an additive decomposition of the statistic, allowing you to see the relative contribution of each regressor. The command is intended as a post-estimation command, so you should run it right after the estimation.
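To make the idea of an additive decomposition concrete, here is a minimal Python sketch of the exact Shapley value for a statistic defined on subsets of regressors. It is an illustration outside Stata, not the code shapley2 runs: the function name shapley_values and the R-squared numbers in R2 are hypothetical, and shapley2 itself uses an abbreviated computation, so its output can differ.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for the set function `value`."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical R-squared of the model fitted on each subset of regressors.
# These numbers are made up for illustration; they are not Stata output.
R2 = {
    frozenset(): 0.00,
    frozenset({"x1"}): 0.20,
    frozenset({"x2"}): 0.10,
    frozenset({"x3"}): 0.05,
    frozenset({"x1", "x2"}): 0.26,
    frozenset({"x1", "x3"}): 0.23,
    frozenset({"x2", "x3"}): 0.12,
    frozenset({"x1", "x2", "x3"}): 0.30,
}

contributions = shapley_values(["x1", "x2", "x3"], lambda s: R2[s])
for name, share in contributions.items():
    print(f"{name}: {share:.4f}")
print(f"sum: {sum(contributions.values()):.4f}")
```

In this exact version the contributions sum to the statistic of the full model (0.30 in the made-up data), which is what makes the decomposition additive.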

Comparison shapley vs. shapley2

- shapley2 is faster than shapley but yields the same results (small numerical differences are possible). However, the computation still takes some time, and the maximum number of right-hand-side (RHS) variables is 20.

- shapley2 allows you to combine several regressors into groups and to compute the relative importance of each group as a whole. Grouping also speeds up the computation.

- shapley2 is designed as a post-estimation routine. The model specification is extracted from the previous estimation (for example OLS or probit).

All four columns of the output table are stored as matrices in the estimation results (type ereturn list to see all stored values), so you can use them, for instance, with estout to publish the results in LaTeX.

Options

stat(str) Indicate here the e-class statistic for which you want to perform the decomposition. For instance, to decompose the R squared of an OLS estimation use r2; for the pseudo R squared of a probit use r2_p. To find out which statistics are available, type ereturn list after the estimation.

command(str) Normally the command takes the model to estimate from the previous estimation. If you want to override this, use this option, for instance command(probit) to estimate a probit instead of a dprobit.

depvar(varlist) Normally the command takes the dependent variable from the previous estimation. If you want to change it, or if the command identifies it incorrectly, you can specify it here.

indepvars(varlist) Normally the independent variables are taken directly from the previous estimation. You can change them by specifying this option.

groups(special) Instead of computing the Shapley value for each variable, you can compute it for groups of variables. This also makes the computation feasible when the model has many variables. Write all variables you want to analyze as one string and separate the groups with commas. Make sure every variable of the regression appears in the list. For instance, with four variables x1, x2, x3 and x4, the option 'group("x1 x2,x3 x4")' treats the first two variables as one group and the last two as a second group.

force The command is limited to 20 RHS variables, because otherwise the number of runs becomes too large. If you still want to run it with more variables, use the option force (this might take a long time, though).

memory Use this option if you want to allow shapley2 to change the memory used by Stata. This is no longer needed for Stata 12, since memory adapts on the fly.

noisily Use this option to see some of the intermediate estimations.
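The following Python sketch illustrates why grouping and the 20-variable limit matter. It assumes that an exact decomposition needs one model per subset of variables (or of groups), so the count doubles with each added variable. The helper names parse_groups and subset_count are hypothetical, and the parsing only mimics the documented string format; it is not how shapley2 reads the option internally, nor necessarily how many runs its abbreviated method performs.

```python
def parse_groups(spec):
    """Split a groups() string such as "x1 x2,x3 x4" into lists of variable names."""
    return [part.split() for part in spec.split(",") if part.strip()]


def subset_count(n_players):
    """Number of subsets of n players, i.e. candidate models in an exact decomposition."""
    return 2 ** n_players


groups = parse_groups("x1 x2,x3 x4")
print(groups)                       # [['x1', 'x2'], ['x3', 'x4']]
print(subset_count(4))              # 16 subsets for four separate variables
print(subset_count(len(groups)))    # 4 subsets for two groups
print(subset_count(20))             # 1048576 subsets at the 20-variable limit
```

Under this assumption, treating four variables as two groups cuts the number of candidate models from 16 to 4, and each variable beyond 20 doubles a count that already exceeds one million.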

Example

. webuse auto,clear
(1978 Automobile Data)

. reg price mpg headroom trunk

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =    7.46
       Model |   153861671     3  51287223.7           Prob > F      =  0.0002
    Residual |   481203725    70  6874338.93           R-squared     =  0.2423
-------------+------------------------------           Adj R-squared =  0.2098
       Total |   635065396    73  8699525.97           Root MSE      =  2621.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -224.3597   65.27511    -3.44   0.001    -354.5468   -94.17263
    headroom |   -659.463   484.5101    -1.36   0.178    -1625.788    306.8619
       trunk |   126.6049   107.2399     1.18   0.242    -87.27846    340.4882
       _cons |   11175.77   2431.134     4.60   0.000     6327.029    16024.52
------------------------------------------------------------------------------

. shapley2, stat(r2)

     Factor | Shapley value |  Per cent  | Shapley value|  Per cent
            |  (estimate)   | (estimate) | (normalized) | (normalized)
 -----------+---------------+------------+--------------+-------------
        mpg |    0.17124    |   70.68 %  |    0.17302   |   71.41 %
   headroom |    0.01409    |    5.82 %  |    0.01424   |    5.88 %
      trunk |    0.05445    |   22.48 %  |    0.05502   |   22.71 %
 -----------+---------------+------------+--------------+-------------
   Residual |    0.00249    |    1.03 %  |              |
 -----------+---------------+------------+--------------+-------------
      TOTAL |    0.24228    |  100.00 %  |    0.24228   |  100.00 %
 -----------+---------------+------------+--------------+-------------

. shapley2, stat(r2) group(mpg,headroom trunk)

     Factor | Shapley value |  Per cent  | Shapley value|  Per cent
            |  (estimate)   | (estimate) | (normalized) | (normalized)
 -----------+---------------+------------+--------------+-------------
    Group 1 |    0.17373    |   71.71 %  |    0.17373   |   71.71 %
    Group 2 |    0.06854    |   28.29 %  |    0.06854   |   28.29 %
 -----------+---------------+------------+--------------+-------------
   Residual |    0.00000    |    0.00 %  |              |
 -----------+---------------+------------+--------------+-------------
      TOTAL |    0.24228    |  100.00 %  |    0.24228   |  100.00 %
 -----------+---------------+------------+--------------+-------------

Groups are:
Group 1: mpg
Group 2: headroom trunk

This very simple example computes the additive decomposition of the R squared for a simple OLS estimation. The results shown above indicate that mpg accounts for about 70% of the R squared, while the contribution of headroom is less than 6%. In the second shapley2 command the latter two regressors are combined into one group. Note that there are some small numerical differences between the two runs, since shapley2 uses an abbreviated method to compute the Shapley value. These differences are generally very small.

Known issues

- The program has been tested on OLS, probit, logit and ordered logit models. It is supposed to work for other models as well, as long as there is only one parameter per regressor (e.g. for mlogit the method is not defined and the program will not work).

- The Shapley values in the first column do not always add up to the true value of the statistic. This happens whenever the regressors are correlated. In this case, a residual is reported, which corresponds to the part of the analyzed statistic that could not be attributed to any regressor. The last two columns ignore this residual and normalize the values so that they add up to the true value (or 100%).

- If you find another issue, please send me an email indicating the problem.
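The relation between the estimate and normalized columns can be checked by hand with the numbers from the first shapley2 table in the example. The Python sketch below assumes the normalization is a proportional rescaling of the first-column estimates. That is an assumption about the method, not taken from the shapley2 source, although it reproduces the displayed values to about the fourth decimal place.

```python
# First-column estimates and the full-model R squared, copied from the
# first shapley2 table in the example (values are rounded as displayed).
estimates = {"mpg": 0.17124, "headroom": 0.01409, "trunk": 0.05445}
total = 0.24228

attributed = sum(estimates.values())   # part of the statistic that was attributed
residual = total - attributed          # part that could not be attributed

# Assumed normalization: rescale proportionally so the values sum to the total.
normalized = {name: value / attributed * total for name, value in estimates.items()}

for name, value in normalized.items():
    print(f"{name:>9}: {value:.5f}  {100 * value / total:6.2f} %")
print(f" residual: {residual:.5f}")
```

Because the inputs are the rounded values from the table, the last printed digit can differ from the Stata output.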

Author

Florian Wendelspiess Chávez Juárez. University of Geneva, Department of Economics: florian@chavezjuarez.com

References

Shorrocks, Anthony. "Inequality Decomposition by Factor Components", Econometrica, Vol. 50, No. 1 (Jan., 1982), pp. 193-211. Available at: