EC 761 Fall 1998

Available from the class homepage:

http://fmwww.bc.edu/ec-c/f98/ec761.f98.html

This handout reproduces the Stata documentation for the 'colldiag' command and
then illustrates, with a constructed regressor, how increasing correlation
among regressors increases the maximal condition index and how the
diagnostics identify the variables involved.



-------------------------------------------------------------------------------
help for colldiag                                          (STB-25: sg32.1)
-------------------------------------------------------------------------------

Collinearity Diagnostics
------------------------

        colldiag [nocons]


Description
-----------

colldiag calculates and displays the matrix of variance decomposition
proportions for the independent variables in a linear regression model. This
command must follow a call to fit.


Remarks
-------

In the case of orthogonal predictors, the variance decomposition proportion
matrix would be an identity matrix. One should examine how the coefficient
variances depend on the principal components by focusing on the decomposition
in the rows associated with high condition numbers. The condition number
is a measure of the dependence among the independent variables. Typical
threshold values (n*) are 10, 15, or even 30. As you look at the rows
associated with the high condition numbers, you should note the variance
decomposition proportions that are higher than some threshold value (such as
p* = .50).

You should note the following:

1)      An independent variable will have a degraded coefficient estimate
        because of a near dependency if it is one of two or more variates
        with variance-decomposition proportions in excess of some threshold
        value p*, such as .50. The number of near dependencies is the number
        of condition numbers greater than the threshold value n*.

2)      Those variates whose aggregate variance-decomposition proportion
        exceeds the threshold value p* are involved in at least one of
        the dependencies. The aggregate is formed over the competing condition
        numbers (condition numbers of the same order of magnitude that exceed
        the threshold value n*).

3)      A dominating dependency occurs when the condition number is an
        order of magnitude larger than the other condition numbers. This can
        obscure information about the variate's simultaneous involvement in a
        weaker dependency. In this case, additional analysis is warranted to
        investigate the relationships of all potentially involved variates.
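
The quantities described in these remarks can be sketched in Mata. This is a
minimal illustration, not colldiag's own code. It assumes the Belsley, Kuh,
and Welsch convention of scaling each column of X (including the constant) to
unit length without centering, and it uses the auto data shipped with current
Stata (sysuse auto, with the variable name displacement). Neither assumption
is stated in the help file, and the names X, V, L, eta, phi, and prop are
hypothetical.

```stata
sysuse auto, clear

mata:
// Regressor matrix with a trailing column of ones for the constant
X = (st_data(., ("displacement", "mpg")), J(st_nobs(), 1, 1))

// Scale every column to unit length (no centering); assumed convention
X = X :/ sqrt(colsum(X:^2))

// Eigen-decomposition of X'X; eigenvalues are squared singular values of X
symeigensystem(cross(X, X), V = ., L = .)

// Condition indices: largest singular value over each singular value
eta = sqrt(max(L) :/ L)

// phi[k,j] = V[k,j]^2 / L[j] is the part of var(b_k) tied to component j;
// dividing each row by its sum gives the variance-decomposition proportions
phi  = (V:^2) :/ L
prop = phi :/ rowsum(phi)

// One row per component: condition index, then the proportions for
// displacement, mpg, and the constant
(eta', prop')
end
```

Each column of proportions sums to one over the components, so a variable
whose variance is concentrated in a row with a large condition index is the
kind of variable the remarks above ask the reader to look for.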


Example
-------

We have data on men involved in a physical fitness course. The purpose of
the study is to model the oxygen uptake rate as a function of age, weight,
time to run one and a half miles, resting heart rate, heart rate while
running, and maximum heart rate while running.

  de

Contains data from fitness.dta
  Obs:    31 (max= 50172)                     Fitness data
 Vars:     7 (max=    99)                     16 Nov 1994 15:47
Width:    28 (max=   200)
  1. age          float  %9.0g
  2. weight       float  %9.0g
  3. oxy          float  %9.0g
  4. runtime      float  %9.0g
  5. rstpulse     float  %9.0g
  6. runpulse     float  %9.0g
  7. maxpulse     float  %9.0g
Sorted by:

  fit oxy age weight runtime rstpulse runpulse maxpulse

  Source |       SS       df       MS                  Number of obs =      31
---------+------------------------------               F(  6,    24) =   22.43
   Model |  722.543528     6  120.423921               Prob > F      =  0.0000
Residual |  128.837947    24   5.3682478               R-squared     =  0.8487
---------+------------------------------               Adj R-squared =  0.8108
   Total |  851.381475    30  28.3793825               Root MSE      =  2.3169

------------------------------------------------------------------------------
     oxy |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |  -.2269738   .0998375     -2.273   0.032      -.4330282   -.0209194
  weight |  -.0741774   .0545932     -1.359   0.187      -.1868521    .0384974
 runtime |  -2.628653   .3845622     -6.835   0.000       -3.42235   -1.834955
rstpulse |  -.0215336   .0660543     -0.326   0.747      -.1578629    .1147957
runpulse |  -.3696278   .1198529     -3.084   0.005      -.6169921   -.1222634
maxpulse |   .3032171   .1364952      2.221   0.036       .0215049    .5849294
   _cons |   102.9345   12.40326      8.299   0.000       77.33541    128.5335
------------------------------------------------------------------------------


We are somewhat concerned that there may be a dependency among the pulse
variables and investigate this with the new diagnostic tool.

  colldiag

Proportion of variance associated with the decomposition
  Cond   |
 Number  |     age    weight   runtime  rstpulse  runpulse  maxpulse     _cons
---------+--------------------------------------------------------------------
       1 |  0.0002    0.0002    0.0002    0.0003    0.0000    0.0000    0.0000
 19.2909 |  0.1463    0.0104    0.0252    0.3906    0.0000    0.0000    0.0022
 21.5007 |  0.1501    0.2357    0.1286    0.0281    0.0012    0.0012    0.0006
 27.6212 |  0.0319    0.1831    0.6090    0.1903    0.0015    0.0012    0.0064
 33.8292 |  0.1128    0.4444    0.1250    0.3648    0.0151    0.0083    0.0013
 82.6376 |  0.4966    0.1033    0.0975    0.0203    0.0695    0.0056    0.7997
 196.786 |  0.0621    0.0228    0.0146    0.0057    0.9128    0.9836    0.1898

If we use 30 as our value for n* and .50 as our threshold for p*, then we see
that points 2 and 3 from above are exhibited in our output. The competing
dependency is for the condition numbers 33.8292 and 82.6376 which are of the
same order of magnitude and both exceed our threshold value of 30. Aggregating
the variance-decomposition proportions, we note that age (.1128+.4966=.6094),
weight (.4444+.1033=.5477), and the constant (.0013+.7997=.8010) are involved
in a competing dependency. We also note that we have a dominating dependency
with a condition number greater than 196 and involving the runpulse and
maxpulse variables.
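
The aggregation over the competing condition numbers can be checked directly
with display, using the proportions printed in the rows for 33.8292 and
82.6376 of the table above. This is only a worked arithmetic check; the
thresholds n*=30 and p*=.50 are the ones chosen in the text, and the runtime
line is added here purely for contrast.

```stata
* Sum the proportions over the two competing condition numbers
* (rows 33.8292 and 82.6376) and compare each sum with p* = .50
display "age     " %6.4f (.1128 + .4966)   // .6094, exceeds .50
display "weight  " %6.4f (.4444 + .1033)   // .5477, exceeds .50
display "_cons   " %6.4f (.0013 + .7997)   // .8010, exceeds .50

* A variable that is not involved in the competing dependency
display "runtime " %6.4f (.1250 + .0975)   // .2225, below .50
```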

Since we have 3 near dependencies (3 condition numbers greater than n*=30), we
should be able to express 3 of our independent variables in terms of the
remaining 4. How do we choose the variates for which to solve? Beginning with
the largest condition number, we see that we should choose either runpulse or
maxpulse. Since maxpulse has the remainder of its variance determined in a more
removed dependency, we can choose it as our first dependent variable in the
auxiliary regressions.  Now, since we are not as interested in the constant
term, we may choose weight and age as our remaining pivots.
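
The auxiliary regressions suggested above could be run as follows. This is a
sketch under two assumptions: that fitness.dta is available in the working
directory, and that regress, the linear regression command in current
versions of Stata, is used where this help file uses fit. Each pivot
(maxpulse, weight, age) is regressed on the remaining variables (runtime,
rstpulse, runpulse, and the constant).

```stata
use fitness, clear

* One auxiliary regression per pivot; the coefficients show how each
* pivot depends on the remaining regressors
foreach pivot in maxpulse weight age {
    regress `pivot' runtime rstpulse runpulse
}
```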


Author
------
        James W. Hardin, Stata Corporation
        stata@stata.com


See Also
--------

    STB: STB-25 sg32.1, STB-24 sg32
 Manual: [5s] fit
On-line: fit, vif if installed




EXAMPLE OF COLLINEARITY INCREASING WITH CORRELATION OF REGRESSORS

do ":Keewaydin:Desktop Folder:colldiag.do"

. * colldiag   EC761 cfb 8902
. set matsize 800

. use  ":Keewaydin:Stata:auto.dta"
(1978 Automobile Data)

. gen pr=price/1000

. gen e = 100*invnorm(uniform())

. gen mpga=mpg+e

. corr mpg mpga
(obs=74)

        |      mpg     mpga
--------+------------------
     mpg|   1.0000
    mpga|   0.1570   1.0000


. fit pr displ  mpg

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  2,    71) =   13.35
   Model |  173.587101     2  86.7935503               Prob > F      =  0.0000
Residual |  461.478281    71   6.4996941               R-squared     =  0.2733
---------+------------------------------               Adj R-squared =  0.2529
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5494

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0105088   .0045855      2.292   0.025       .0013657     .019652
     mpg |  -.1211832   .0727884     -1.665   0.100      -.2663193    .0239528
   _cons |   6.672765    2.29972      2.902   0.005       2.087254    11.25828
------------------------------------------------------------------------------

. fit pr displ  mpg mpga

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159644     3  58.7198815               Prob > F      =  0.0000
Residual |  458.905737    70  6.55579625               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |  -.1177836   .0733031     -1.607   0.113      -.2639819    .0284148
    mpga |   -.001717   .0027409     -0.626   0.533      -.0071835    .0037496
   _cons |   6.648921   2.309938      2.878   0.005       2.041896    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpga

 ( 1)  mpg + mpga = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195005   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition
  Cond   |
 Number  |   displ       mpg      mpga     _cons
---------+--------------------------------------
       1 |  0.0098    0.0040    0.0021    0.0021
 1.67485 |  0.0012    0.0000    0.9532    0.0000
 3.74772 |  0.2611    0.0653    0.0439    0.0015
 16.3562 |  0.7278    0.9307    0.0007    0.9964

. gen mpgb=mpg+e/10

. corr mpg mpgb
(obs=74)

        |      mpg     mpgb
--------+------------------
     mpg|   1.0000
    mpgb|   0.5358   1.0000


. fit pr displ  mpg mpgb

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159645     3  58.7198816               Prob > F      =  0.0000
Residual |  458.905737    70  6.55579624               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |  -.1023308   .0790545     -1.294   0.200           -.26    .0553384
    mpgb |  -.0171697   .0274091     -0.626   0.533      -.0718354     .037496
   _cons |   6.648921   2.309938      2.878   0.005       2.041896    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpgb

 ( 1)  mpg + mpgb = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195005   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition
  Cond   |
 Number  |   displ       mpg      mpgb     _cons
---------+--------------------------------------
       1 |  0.0056    0.0022    0.0134    0.0013
 3.19432 |  0.1292    0.0033    0.2548    0.0010
 6.02763 |  0.1645    0.1147    0.7129    0.0163
 18.5765 |  0.7006    0.8799    0.0189    0.9814

. gen mpgc=mpg+e/100

. corr mpg mpgc
(obs=74)

        |      mpg     mpgc
--------+------------------
     mpg|   1.0000
    mpgc|   0.9832   1.0000


. fit pr displ  mpg mpgc

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159648     3  58.7198828               Prob > F      =  0.0000
Residual |  458.905734    70  6.55579619               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |   .0521968   .2862681      0.182   0.856      -.5187469    .6231405
    mpgc |  -.1716973   .2740909     -0.626   0.533      -.7183543    .3749596
   _cons |   6.648921   2.309938      2.878   0.005       2.041896    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpgc

 ( 1)  mpg + mpgc = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195006   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition
  Cond   |
 Number  |   displ       mpg      mpgc     _cons
---------+--------------------------------------
       1 |  0.0052    0.0002    0.0002    0.0012
 3.72551 |  0.2208    0.0013    0.0015    0.0001
 17.2937 |  0.7714    0.0110    0.0195    0.9743
 56.1544 |  0.0026    0.9875    0.9787    0.0244

. gen mpgd=mpg+e/1000

. corr mpg mpgd
(obs=74)

        |      mpg     mpgd
--------+------------------
     mpg|   1.0000
    mpgd|   0.9998   1.0000


. fit pr displ  mpg mpgd

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159594     3  58.7198647               Prob > F      =  0.0000
Residual |  458.905788    70  6.55579697               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103511   .0046121      2.244   0.028       .0011526    .0195497
     mpg |   1.597456   2.744571      0.582   0.562      -3.876417    7.071329
    mpgd |  -1.716956   2.740911     -0.626   0.533       -7.18353    3.749617
   _cons |    6.64892   2.309938      2.878   0.005       2.041895    11.25595
------------------------------------------------------------------------------

. lincom mpg+mpgd

 ( 1)  mpg + mpgd = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1195005   .0731512     -1.634   0.107      -.2653961     .026395
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition
  Cond   |
 Number  |   displ       mpg      mpgd     _cons
---------+--------------------------------------
       1 |  0.0052    0.0000    0.0000    0.0012
 3.74763 |  0.2240    0.0000    0.0000    0.0001
 17.4837 |  0.7688    0.0002    0.0002    0.9978
 554.506 |  0.0020    0.9998    0.9998    0.0009

. gen mpge=mpg+e/10000

. corr mpg mpge
(obs=74)

        |      mpg     mpge
--------+------------------
     mpg|   1.0000
    mpge|   1.0000   1.0000


. fit pr displ  mpg mpge

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  3,    70) =    8.96
   Model |  176.159898     3   58.719966               Prob > F      =  0.0000
Residual |  458.905484    70  6.55579262               R-squared     =  0.2774
---------+------------------------------               Adj R-squared =  0.2464
   Total |  635.065382    73  8.69952578               Root MSE      =  2.5604

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   displ |   .0103512   .0046121      2.244   0.028       .0011526    .0195497
     mpg |   17.05096    27.4117      0.622   0.536      -37.61994    71.72186
    mpge |  -17.17046   27.40891     -0.626   0.533      -71.83581    37.49488
   _cons |   6.648902   2.309937      2.878   0.005       2.041877    11.25593
------------------------------------------------------------------------------

. lincom mpg+mpge

 ( 1)  mpg + mpge = 0.0

------------------------------------------------------------------------------
      pr |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |  -.1194998   .0731513     -1.634   0.107      -.2653954    .0263957
------------------------------------------------------------------------------

. colldiag

Proportion of variance associated with the decomposition
  Cond   |
 Number  |   displ       mpg      mpge     _cons
---------+--------------------------------------
       1 |  0.0052    0.0000    0.0000    0.0012
 3.74956 |  0.2243    0.0000    0.0000    0.0001
 17.4885 |  0.7677    0.0000    0.0000    0.9984
 5543.13 |  0.0029    1.0000    1.0000    0.0003

end of do-file

.
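
The session above repeats the same commands for each divisor. A condensed
version for current Stata might look like the sketch below. It makes several
assumptions that are not in the original log: sysuse auto replaces the local
file path, the variable is named displacement rather than displ, regress
replaces fit, rnormal() replaces invnorm(uniform()), the seed is arbitrary,
and the generated variables are named mpg1 through mpg5 instead of mpga
through mpge. Because the random draws differ, the numbers will not match the
log. colldiag is omitted because it is a user-written command that may not be
installed.

```stata
sysuse auto, clear
set seed 761                       // arbitrary seed, not from the handout

generate pr = price/1000
generate e  = 100*rnormal()        // noise with standard deviation 100

local k = 0
foreach div in 1 10 100 1000 10000 {
    local ++k
    * Noisy copy of mpg; a larger divisor means higher correlation with mpg
    generate mpg`k' = mpg + e/`div'

    quietly correlate mpg mpg`k'
    display "divisor `div': corr(mpg, mpg`k') = " %6.4f r(rho)

    quietly regress pr displacement mpg mpg`k'
    * The sum of the two coefficients stays well determined
    lincom mpg + mpg`k'
}
```

In every run the regressors span the same space as displacement, mpg, and e.
That is why the model fit and the lincom result are unchanged across the runs
in the log, while the coefficient and standard error on the noisy copy grow
tenfold with each step.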