EC 761                                                               Fall 1998

Available from the class homepage:
http://fmwww.bc.edu/ec-c/f98/ec761.f98.html

This handout reproduces the Stata documentation for the 'colldiag' command
and then illustrates, with a constructed regressor, how increasing correlation
among regressors increases the maximal condition index and how the diagnostics
identify the regressors involved.

-------------------------------------------------------------------------------
help for colldiag                                             (STB-25: sg32.1)
-------------------------------------------------------------------------------

Collinearity Diagnostics
------------------------

        colldiag [nocons]


Description
-----------

colldiag calculates and displays the matrix of variance decomposition
proportions for the independent variables in a linear regression model.
This command must follow a call to fit.


Remarks
-------

In the case of orthogonal predictors, the variance decomposition proportion
matrix would be an identity matrix.  One should examine the dependencies of
the variances on the principal components by focusing on the decomposition
of the variables associated with high condition numbers.

The condition number is a measure of the dependence of the independent
variables.  Typical threshold values are n* = 10, 15, or even 30.  As you look
at the rows associated with the high condition numbers, you should note the
variance decomposition proportions that are higher than some threshold value
(like p*=.50).

You should note the following:

1) An independent variable will have a degraded coefficient estimate because
   of a near dependency if it is one of two or more variates with
   variance-decomposition proportions in excess of some threshold value p*,
   such as .50.  The number of near dependencies is the number of condition
   numbers greater than the threshold value n*.

2) Those variates whose aggregate variance-decomposition proportion exceeds
   the threshold value p* are involved in at least one of the dependencies.
   The aggregate is formed over the competing condition numbers (condition
   numbers of the same order of magnitude that exceed the threshold value n*).

3) A dominating dependency occurs when one condition number is an order of
   magnitude larger than the other condition numbers.  This can obscure
   information about a variate's simultaneous involvement in a weaker
   dependency.  In this case, additional analysis is warranted to investigate
   the relationships of all potentially involved variates.


Example
-------

We have data on men involved in a physical fitness course.  The purpose of the
study is to model the oxygen uptake rate as a function of age, weight, time to
run one and a half miles, heart rate while resting, heart rate while running,
and maximum heart rate while running.

        de

Contains data from fitness.dta
  Obs:    31 (max= 50172)                     Fitness data
 Vars:     7 (max=    99)                     16 Nov 1994 15:47
Width:    28 (max=   200)
  1. age        float  %9.0g
  2. weight     float  %9.0g
  3. oxy        float  %9.0g
  4. runtime    float  %9.0g
  5. rstpulse   float  %9.0g
  6. runpulse   float  %9.0g
  7. maxpulse   float  %9.0g
Sorted by:

        fit oxy age weight runtime rstpulse runpulse maxpulse

  Source |       SS       df       MS                  Number of obs =      31
---------+------------------------------               F(  6,    24) =   22.43
   Model |  722.543528     6  120.423921               Prob > F      =  0.0000
Residual |  128.837947    24   5.3682478               R-squared     =  0.8487
---------+------------------------------               Adj R-squared =  0.8108
   Total |  851.381475    30  28.3793825               Root MSE      =  2.3169

------------------------------------------------------------------------------
     oxy |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |  -.2269738   .0998375     -2.273   0.032      -.4330282   -.0209194
  weight |  -.0741774   .0545932     -1.359   0.187      -.1868521    .0384974
 runtime |  -2.628653   .3845622     -6.835   0.000       -3.42235   -1.834955
rstpulse |  -.0215336   .0660543     -0.326   0.747      -.1578629    .1147957
runpulse |  -.3696278   .1198529     -3.084   0.005      -.6169921   -.1222634
maxpulse |   .3032171   .1364952      2.221   0.036       .0215049    .5849294
   _cons |   102.9345   12.40326      8.299   0.000       77.33541    128.5335
------------------------------------------------------------------------------

We are somewhat concerned that there may be a dependency among the pulse
variables and investigate this with the new diagnostic tool.

        colldiag

Proportion of variance associated with the decomposition

    Cond |
  Number |      age   weight  runtime rstpulse runpulse maxpulse    _cons
---------+--------------------------------------------------------------------
       1 |   0.0002   0.0002   0.0002   0.0003   0.0000   0.0000   0.0000
 19.2909 |   0.1463   0.0104   0.0252   0.3906   0.0000   0.0000   0.0022
 21.5007 |   0.1501   0.2357   0.1286   0.0281   0.0012   0.0012   0.0006
 27.6212 |   0.0319   0.1831   0.6090   0.1903   0.0015   0.0012   0.0064
 33.8292 |   0.1128   0.4444   0.1250   0.3648   0.0151   0.0083   0.0013
 82.6376 |   0.4966   0.1033   0.0975   0.0203   0.0695   0.0056   0.7997
 196.786 |   0.0621   0.0228   0.0146   0.0057   0.9128   0.9836   0.1898

If we use 30 as our value for n* and .50 as our threshold for p*, then we see
that points 2 and 3 from above are exhibited in our output.  The competing
dependency is for the condition numbers 33.8292 and 82.6376, which are of the
same order of magnitude and both exceed our threshold value of 30.
Aggregating the variance-decomposition proportions, we note that age
(.1128+.4966=.6094), weight (.4444+.1033=.5477), and the constant
(.0013+.7997=.8010) are involved in a competing dependency.  We also note that
we have a dominating dependency with a condition number greater than 196 and
involving the runpulse and maxpulse variables.
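
The aggregation step just described can be checked mechanically. The following
is a minimal sketch in Python (not Stata) that takes the proportions printed in
the colldiag output for the fitness data and sums them over the competing
condition numbers. The names (PROPORTIONS, aggregate, exceeding, and so on) are
hypothetical helpers introduced for illustration, and the grouping into
"competing" and "dominating" rows is copied from the reading given in the text
rather than derived by any rule. Summed this way, the age aggregate comes to
.1128+.4966=.6094.

```python
# Variance-decomposition proportions copied from the colldiag output for the
# fitness data. Keys are condition numbers; values follow the column order.
VARIABLES = ["age", "weight", "runtime", "rstpulse",
             "runpulse", "maxpulse", "_cons"]
PROPORTIONS = {
    1.0:     [0.0002, 0.0002, 0.0002, 0.0003, 0.0000, 0.0000, 0.0000],
    19.2909: [0.1463, 0.0104, 0.0252, 0.3906, 0.0000, 0.0000, 0.0022],
    21.5007: [0.1501, 0.2357, 0.1286, 0.0281, 0.0012, 0.0012, 0.0006],
    27.6212: [0.0319, 0.1831, 0.6090, 0.1903, 0.0015, 0.0012, 0.0064],
    33.8292: [0.1128, 0.4444, 0.1250, 0.3648, 0.0151, 0.0083, 0.0013],
    82.6376: [0.4966, 0.1033, 0.0975, 0.0203, 0.0695, 0.0056, 0.7997],
    196.786: [0.0621, 0.0228, 0.0146, 0.0057, 0.9128, 0.9836, 0.1898],
}

N_STAR = 30.0   # threshold for condition numbers (n* in the text)
P_STAR = 0.50   # threshold for proportions (p* in the text)

# Assumption: this grouping is taken from the handout's interpretation of the
# output; it is not computed here.
COMPETING = [33.8292, 82.6376]
DOMINATING = [196.786]


def near_dependencies(table, n_star):
    """Condition numbers that exceed the threshold n*."""
    return sorted(cond for cond in table if cond > n_star)


def aggregate(table, cond_numbers):
    """Sum each variable's proportions over the given condition numbers."""
    totals = {name: 0.0 for name in VARIABLES}
    for cond in cond_numbers:
        for name, share in zip(VARIABLES, table[cond]):
            totals[name] += share
    return totals


def exceeding(totals, p_star):
    """Variables (in column order) whose aggregate exceeds p*."""
    return [name for name in VARIABLES if totals[name] > p_star]


if __name__ == "__main__":
    print("near dependencies:", near_dependencies(PROPORTIONS, N_STAR))
    competing = aggregate(PROPORTIONS, COMPETING)
    for name in exceeding(competing, P_STAR):
        print(f"competing: {name:8s} {competing[name]:.4f}")
    dominating = aggregate(PROPORTIONS, DOMINATING)
    print("dominating:", exceeding(dominating, P_STAR))
```

With n*=30 and p*=.50 this flags age, weight, and the constant in the competing
dependency, and runpulse and maxpulse in the dominating one.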

Since we have 3 near dependencies (3 condition numbers greater than n*=30), we
should be able to express 3 of our independent variables in terms of the
remaining 4.  How do we choose the variates for which to solve?  Beginning
with the largest condition number, we see that we should choose either
runpulse or maxpulse.  Since maxpulse has the remainder of its variance
determined in a more removed dependency, we can choose it as our first
dependent variable in the auxiliary regression.  Now, since we are not as
interested in the constant term, we may choose weight and age as our
remaining pivots.


Author
------

        James W. Hardin, Stata Corporation
        stata@stata.com


See Also
--------

    STB:      STB-25 sg32.1, STB-24 sg32
    Manual:   [5s] fit
    On-line:  fit, vif if installed


EXAMPLE OF COLLINEARITY INCREASING WITH CORRELATION OF REGRESSORS

do ":Keewaydin:Desktop Folder:colldiag.do"

. * colldiag EC761 cfb 8902
. set matsize 800
. use ":Keewaydin:Stata:auto.dta"
(1978 Automobile Data)

. gen pr=price/1000
. gen e = 100*invnorm(uniform())
. gen mpga=mpg+e
. corr mpg mpga
(obs=74)

        |      mpg     mpga
--------+------------------
     mpg|   1.0000
    mpga|   0.1570   1.0000

. fit pr displ mpg

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  2,    71) =   13.35
   Model |  173.587101     2  86.7935503               Prob > F      =  0.0000
Residual |  461.478281    71   6.4996941               R-squared     =  0.2733
---------+------------------------------               Adj R-squared =  0.2529
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5494

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0105088   .0045855      2.292   0.025       .0013657     .019652
     mpg |  -.1211832   .0727884     -1.665   0.100      -.2663193    .0239528
   _cons |   6.672765    2.29972      2.902   0.005       2.087254    11.25828
------------------------------------------------------------------------------

. fit pr displ mpg mpga

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159644     3  58.7198815               Prob > F      =  0.0000
Residual |  458.905737    70  6.55579625               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |  -.1177836   .0733031     -1.607   0.113      -.2639819    .0284148
    mpga |   -.001717   .0027409     -0.626   0.533      -.0071835    .0037496
   _cons |   6.648921   2.309938      2.878   0.005       2.041896    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpga

 ( 1)  mpg + mpga = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195005   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition

    Cond |
  Number |    displ      mpg     mpga    _cons
---------+--------------------------------------
       1 |   0.0098   0.0040   0.0021   0.0021
 1.67485 |   0.0012   0.0000   0.9532   0.0000
 3.74772 |   0.2611   0.0653   0.0439   0.0015
 16.3562 |   0.7278   0.9307   0.0007   0.9964

. gen mpgb=mpg+e/10
. corr mpg mpgb
(obs=74)

        |      mpg     mpgb
--------+------------------
     mpg|   1.0000
    mpgb|   0.5358   1.0000

. fit pr displ mpg mpgb

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159645     3  58.7198816               Prob > F      =  0.0000
Residual |  458.905737    70  6.55579624               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |  -.1023308   .0790545     -1.294   0.200           -.26    .0553384
    mpgb |  -.0171697   .0274091     -0.626   0.533      -.0718354     .037496
   _cons |   6.648921   2.309938      2.878   0.005       2.041896    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpgb

 ( 1)  mpg + mpgb = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195005   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition

    Cond |
  Number |    displ      mpg     mpgb    _cons
---------+--------------------------------------
       1 |   0.0056   0.0022   0.0134   0.0013
 3.19432 |   0.1292   0.0033   0.2548   0.0010
 6.02763 |   0.1645   0.1147   0.7129   0.0163
 18.5765 |   0.7006   0.8799   0.0189   0.9814

. gen mpgc=mpg+e/100
. corr mpg mpgc
(obs=74)

        |      mpg     mpgc
--------+------------------
     mpg|   1.0000
    mpgc|   0.9832   1.0000

. fit pr displ mpg mpgc

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159648     3  58.7198828               Prob > F      =  0.0000
Residual |  458.905734    70  6.55579619               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |   .0521968   .2862681      0.182   0.856      -.5187469    .6231405
    mpgc |  -.1716973   .2740909     -0.626   0.533      -.7183543    .3749596
   _cons |   6.648921   2.309938      2.878   0.005       2.041896    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpgc

 ( 1)  mpg + mpgc = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195006   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition

    Cond |
  Number |    displ      mpg     mpgc    _cons
---------+--------------------------------------
       1 |   0.0052   0.0002   0.0002   0.0012
 3.72551 |   0.2208   0.0013   0.0015   0.0001
 17.2937 |   0.7714   0.0110   0.0195   0.9743
 56.1544 |   0.0026   0.9875   0.9787   0.0244

. gen mpgd=mpg+e/1000
. corr mpg mpgd
(obs=74)

        |      mpg     mpgd
--------+------------------
     mpg|   1.0000
    mpgd|   0.9998   1.0000

. fit pr displ mpg mpgd

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159594     3  58.7198647               Prob > F      =  0.0000
Residual |  458.905788    70  6.55579697               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |   1.597456   2.744571      0.582   0.562      -3.876417    7.071329
    mpgd |  -1.716956   2.740911     -0.626   0.533       -7.18353    3.749617
   _cons |    6.64892   2.309938      2.878   0.005       2.041895    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpgd

 ( 1)  mpg + mpgd = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195005   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition

    Cond |
  Number |    displ      mpg     mpgd    _cons
---------+--------------------------------------
       1 |   0.0052   0.0000   0.0000   0.0012
 3.74763 |   0.2240   0.0000   0.0000   0.0001
 17.4837 |   0.7688   0.0002   0.0002   0.9978
 554.506 |   0.0020   0.9998   0.9998   0.0009

. gen mpge=mpg+e/10000
. corr mpg mpge
(obs=74)

        |      mpg     mpge
--------+------------------
     mpg|   1.0000
    mpge|   1.0000   1.0000

. fit pr displ mpg mpge

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159898     3   58.719966               Prob > F      =  0.0000
Residual |  458.905484    70  6.55579262               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103512   .0046121      2.244   0.028       .0011526    .0195497
     mpg |   17.05096    27.4117      0.622   0.536      -37.61994    71.72186
    mpge |  -17.17046   27.40891     -0.626   0.533      -71.83581    37.49488
   _cons |   6.648902   2.309937      2.878   0.005       2.041877    11.25593
------------------------------------------------------------------------------

. lincom mpg+mpge

 ( 1)  mpg + mpge = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1194998   .0731513     -1.634   0.107      -.2653954    .0263957
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition

    Cond |
  Number |    displ      mpg     mpge    _cons
---------+--------------------------------------
       1 |   0.0052   0.0000   0.0000   0.0012
 3.74956 |   0.2243   0.0000   0.0000   0.0001
 17.4885 |   0.7677   0.0000   0.0000   0.9984
 5543.13 |   0.0029   1.0000   1.0000   0.0003

.
.
.
.
.
end of do-file

.
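
In the log above, the largest condition number grows roughly tenfold (56.1544,
554.506, 5543.13) each time the noise added to mpg is divided by ten. The
following Python sketch suggests why, under loudly stated assumptions: it is a
two-column simplification, not a reimplementation of colldiag; it assumes
columns are scaled to unit length without centering; and it uses a simulated
stand-in for mpg (normal draws with hypothetical mean 21 and standard deviation
6) rather than the auto.dta values, so its numbers will not reproduce the log.
For two unit-length columns with cosine c, the cross-product matrix is
[[1, c], [c, 1]] with eigenvalues 1+|c| and 1-|c|, so the condition index is
sqrt((1+|c|)/(1-|c|)). All function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Uncentered cosine of the angle between two data columns."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def two_column_condition_index(u, v):
    """Condition index of the two-column matrix [u v] after each column is
    scaled to unit length: sqrt((1 + |c|) / (1 - |c|))."""
    c = abs(cosine(u, v))
    return math.sqrt((1.0 + c) / (1.0 - c))


def simulate(n=74, seed=761):
    """Mimic the do-file's construction x + e/divisor for shrinking noise.

    Assumption: x is a simulated stand-in for mpg, not the auto.dta data.
    """
    rng = random.Random(seed)
    x = [rng.gauss(21.0, 6.0) for _ in range(n)]
    e = [100.0 * rng.gauss(0.0, 1.0) for _ in range(n)]
    results = []
    for divisor in (1, 10, 100, 1000, 10000):
        noisy = [xi + ei / divisor for xi, ei in zip(x, e)]
        results.append((divisor, two_column_condition_index(x, noisy)))
    return results


if __name__ == "__main__":
    for divisor, cond in simulate():
        print(f"x + e/{divisor:<6d} condition index = {cond:10.2f}")
```

Once the noise is small, the angle between the two columns shrinks in
proportion to the noise, so 1-|c| falls about a hundredfold and the condition
index rises about tenfold at each step, which is the pattern seen in the last
three colldiag tables.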