{smcl}
{hline}
help for {hi:shapley2} {right:Version 1.5}
{hline}
{title:Computing Shapley values after a regression command}
{p 8 17 2}{cmd:shapley2}
, {cmdab:s:tat}{cmd:(}{it:str}{cmd:)} [{cmdab:c:ommand}{cmd:(}{it:str}{cmd:)}
{cmdab:d:epvar}{cmd:(}{it:{depvars}}{cmd:)}
{cmdab:i:ndepvars}{cmd:(}{it:{indepvars}}{cmd:)}
{cmdab:gr:oups}{cmd:(}{it:special}{cmd:)}
{cmd:force}
{cmdab:mem:ory}
{cmdab:n:oisily}]
{title:Description}
{p 4 4 2} This command performs a Shorrocks-Shapley decomposition of many estimation statistics, such as the R squared of an OLS regression. It provides an additive decomposition of the statistic, allowing
you to see the relative contribution of each regressor. The command is designed as a post-estimation command, so you should run it right after the estimation.
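{p 4 4 2} To illustrate what an additive decomposition means, the following minimal sketch computes the Shapley value of the R squared by hand for a model with only two regressors. It is not the internal code of {cmd:shapley2}; it only assumes the standard definition, in which the value of a regressor is its marginal contribution to the R squared averaged over the two possible orders of entry. The scalar names {it:r2_mpg}, {it:r2_trunk}, {it:r2_both}, {it:phi_mpg} and {it:phi_trunk} are chosen for illustration only.{p_end}
. {stata webuse auto, clear}
. {stata quietly regress price mpg}
. {stata scalar r2_mpg = e(r2)} //R squared with mpg only
. {stata quietly regress price trunk}
. {stata scalar r2_trunk = e(r2)} //R squared with trunk only
. {stata quietly regress price mpg trunk}
. {stata scalar r2_both = e(r2)} //R squared of the full model
. {stata scalar phi_mpg = 0.5*r2_mpg + 0.5*(r2_both - r2_trunk)} //mpg entering first, then second
. {stata scalar phi_trunk = 0.5*r2_trunk + 0.5*(r2_both - r2_mpg)} //trunk entering first, then second
. {stata display phi_mpg + phi_trunk}
. {stata display r2_both}
{p 4 4 2} The two {cmd:display} commands should return the same number up to rounding: the individual contributions sum to the R squared of the full model.{p_end}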
{p 4 4 2} {cmd:Comparison shapley vs. shapley2}
{p 8 10 2} - {cmd:shapley2} is faster than {stata findit shapley:shapley} but provides the same results (numerical differences are possible). However, the computation still takes some time, and the maximum number of right-hand-side (RHS) variables is 20.
{p 8 10 2} - {cmd:shapley2} allows you to combine several regressors into groups and to compute the relative importance of each group as a whole. Grouping also speeds up the computation.
{p 8 10 2} - {cmd:shapley2} is designed as a post-estimation routine. The model specification is extracted from the previous estimation (e.g. OLS, probit).
{p 4 4 2} All four columns are stored as matrices in the estimation results (type {stata ereturn list} to see all stored values), so you can use them, for instance, with {it:estout} to publish the results in LaTeX.
{title:Options}
{p 4 8 2}{cmdab:s:tat}{cmd:(}{it:str}{cmd:)} Indicate here the e-class statistic you want to decompose. For instance, to decompose the R squared of an OLS estimation,
use {it:r2}; for the pseudo R squared of a probit, use {it:r2_p}. To find out which statistics are available, type {it:{stata ereturn list}} after the estimation.
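{p 8 8 2} As a sketch of how the statistic is chosen, the following lines decompose the pseudo R squared of a probit. The choice of variables from the auto dataset is arbitrary and serves only as an illustration.{p_end}
. {stata webuse auto, clear}
. {stata probit foreign mpg weight}
. {stata ereturn list} //e(r2_p) is listed among the stored scalars
. {stata shapley2, stat(r2_p)}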
{p 4 8 2}{cmdab:c:ommand}{cmd:(}{it:str}{cmd:)} Normally the command extracts the model to estimate from the previous estimation. If you want to override this, you can use this option, for instance
{it:command(probit)} to estimate a probit instead of a dprobit.
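{p 8 8 2} A minimal sketch of this case, assuming that {cmd:dprobit} is still available in your version of Stata and that the variables below (picked arbitrarily from the auto dataset) suit your purpose:{p_end}
. {stata webuse auto, clear}
. {stata dprobit foreign mpg weight}
. {stata shapley2, stat(r2_p) command(probit)} //re-estimate with probit instead of dprobit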
{p 4 8 2}{cmdab:d:epvar}{cmd:(}{it:varlist}{cmd:)} Normally the command extracts the dependent variable from the previous estimation. If you want to change it, or if the command
extracts it incorrectly, you can specify it here.
{p 4 8 2}{cmdab:i:ndepvars}{cmd:(}{it:varlist}{cmd:)} Normally the independent variables are extracted directly from the previous estimation; you can change them by specifying this option.
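{p 8 8 2} As a hedged illustration of the two options above, the following lines state the dependent variable explicitly and restrict the decomposition to a subset of the regressors of the previous estimation. The variable choice is arbitrary, and the result then refers to the model with the listed regressors only.{p_end}
. {stata webuse auto, clear}
. {stata regress price mpg headroom trunk}
. {stata shapley2, stat(r2) depvar(price) indepvars(mpg headroom)}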
{p 4 8 2}{cmdab:gr:oups}{cmd:(}{it:special}{cmd:)} Instead of computing the Shapley value for each variable, it can be useful to compute it for groups of variables. This also makes the computation feasible
when there are many variables. Write all variables you want to analyze as a string and separate the groups by commas. Make sure all variables of the regression are in the list. For instance, if you have
four variables x1, x2, x3 and x4, you can use the option 'group(x1 x2,x3 x4)' to treat the first two variables as one group and the latter two as a second group.
{p 4 8 2}{cmd:force} The command is limited to 20 RHS variables, because otherwise the number of runs becomes too large. If you still want to perform the decomposition, use the option {cmd:force}
(this might take a long time, though).
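{p 8 8 2} To get a feeling for the cost, assume that one estimation is needed for every subset of the regressors. This is an assumption based on the general definition of the Shapley value, not a statement about the exact internals of {cmd:shapley2}. Under that assumption, k regressors require 2^k estimations:{p_end}
. {stata display 2^10} //1024 estimations for 10 regressors
. {stata display 2^20} //1048576 estimations for 20 regressors
. {stata display 2^25} //33554432 estimations for 25 regressors
{p 8 8 2} Every additional regressor doubles the number of runs, which is why grouping variables with {cmd:groups()} is usually preferable to {cmd:force}.{p_end}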
{p 4 8 2}{cmdab:mem:ory} Use this option if you want to allow {cmd:shapley2} to change the memory used by Stata. This is no longer needed for Stata 12, since memory adapts on the fly.
{p 4 8 2}{cmdab:n:oisily} Use this option to see some of the intermediate estimations.
{title:Example}
. {stata webuse auto,clear}
(1978 Automobile Data)
. {stata reg price mpg headroom trunk}
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =    7.46
       Model |   153861671     3  51287223.7           Prob > F      =  0.0002
    Residual |   481203725    70  6874338.93           R-squared     =  0.2423
-------------+------------------------------           Adj R-squared =  0.2098
       Total |   635065396    73  8699525.97           Root MSE      =  2621.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -224.3597   65.27511    -3.44   0.001    -354.5468   -94.17263
    headroom |   -659.463   484.5101    -1.36   0.178    -1625.788    306.8619
       trunk |   126.6049   107.2399     1.18   0.242    -87.27846    340.4882
       _cons |   11175.77   2431.134     4.60   0.000     6327.029    16024.52
------------------------------------------------------------------------------
. {stata estimates store reg1} //Only needed to run 'shapley2' twice
. {stata shapley2, stat(r2)}
Factor     | Shapley value | Per cent
           |  (estimate)   | (estimate)
-----------+---------------+-----------+
mpg        |    0.17207    |   71.02 % |
headroom   |    0.01492    |    6.16 % |
trunk      |    0.05528    |   22.82 % |
-----------+---------------+-----------+
TOTAL      |    0.24228    |  100.00 % |
-----------+---------------+-----------+
. {stata estimates restore reg1} //Only needed to run 'shapley2' twice
. {stata shapley2, stat(r2) group(mpg,headroom trunk)}
Factor     | Shapley value | Per cent
           |  (estimate)   | (estimate)
-----------+---------------+-----------+
Group 1    |    0.17373    |   71.71 % |
Group 2    |    0.06854    |   28.29 % |
-----------+---------------+-----------+
TOTAL      |    0.24228    |  100.00 % |
-----------+---------------+-----------+
Groups are:
Group 1: mpg
Group 2: headroom trunk
{p 4 4 2}This very simple example computes the additive decomposition of the R squared for a simple {help "regress":OLS} estimation. The results shown above indicate that {it:mpg} accounts
for about 71% of the R squared, while the contribution of {it:headroom} is only about 6%. In the second shapley2 command, the latter two regressors are combined into one group. Note that there are some small numerical differences
(for example, 0.17207 versus 0.17373 for {it:mpg}), since this is an abbreviated way of computing the Shapley value. These numerical differences are generally very small.
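{p 4 4 2} The additivity and the percentage column can be checked by hand with the values printed in the first table. The numbers below are copied from that output, so small deviations are due to rounding to five decimals.{p_end}
. {stata display 0.17207 + 0.01492 + 0.05528} //0.24227, equal to the reported total of 0.24228 up to rounding
. {stata display 100 * 0.17207 / 0.24228} //share of mpg in per cent, about 71.02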
{title:Known issues}
{p 4 6 2} - The program has been tested on OLS, probit, logit and ordered logit models. It is supposed to work for other models as well, as long as there is only one parameter per regressor (e.g. for mlogit the
method is not defined and the program will not work).
{p 4 6 2} - Factor variables ({help fvvarlist}) such as i.var are not yet supported. The time-series operators F. and L. on the dependent variable are supported, but be careful and double-check the results
against a manually created variable.
{p 4 6 2} - If you find another issue, please send me an email indicating the problem.
{title:Acknowledgements}
{p 4 6 2} I would like to thank Philippe Jacquart, Lian Yujun, Sultan Orazbayev, Marcos Robles and Ahmed Abdalla for indicating small bugs in the program and the documentation.
{title:Alternatives}
{p 4 6 2} shapley2 is not the only package that computes the Shapley value. Alternatives include
'shapley', 'rego' (available at {browse "http://www.uni-leipzig.de/~rego/"}), 'shapleyx' and 'adecompt'.
{title:Author}
{p 4 4 2} Florian Wendelspiess Chavez Juarez. CIDE, Mexico City. {browse "mailto:florian@chavezjuarez.com":florian@chavezjuarez.com}
{title:References}
{p 4 4 2} Shorrocks, Anthony. "Inequality Decomposition by Factor Components", {it:Econometrica}, Vol. 50, No. 1 (Jan., 1982), pp. 193-211. Available at: {browse "http://www.jstor.org/stable/1912537":http://www.jstor.org/stable/1912537}