^coldiag^

Direct comments to: 

Joseph Harkness <joe.harkness@@jhu.edu>
Institute for Policy Studies
The Johns Hopkins University
3400 N. Charles Street
Baltimore MD 21218

Introduction
------------

Coldiag is an implementation of the regression collinearity diagnostic
procedures found in Belsley, Kuh, and Welsch (1980).  These procedures
examine the "conditioning" of the matrix of independent variables.

Coldiag first computes the condition number of the matrix.  If this
number is "large" (Belsley et al suggest 30 or higher), then there 
may be collinearity problems.  
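The following is a minimal NumPy sketch of this first step. It assumes X is an n-by-p array of regressors with no intercept column and no centering. That assumption is consistent with the example below, which reports ten singular values for ten variables. The function names are hypothetical and are not part of coldiag.

```python
import numpy as np


def condition_number(X):
    """Largest singular value of X divided by its smallest.

    Each column is first scaled to unit Euclidean length, as Belsley,
    Kuh, and Welsch recommend (coldiag's default behaviour).
    """
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)          # unit-length columns
    sv = np.linalg.svd(X, compute_uv=False)    # returned in descending order
    return sv[0] / sv[-1]


def possibly_collinear(X, cutoff=30.0):
    """Rule of thumb from Belsley et al: 30 or higher counts as 'large'."""
    return condition_number(X) >= cutoff


# Columns a and b are nearly identical; column c is orthogonal to both.
X = [[1.0, 1.00, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.00, 1.0]]
print(round(condition_number(X)))   # roughly 200
print(possibly_collinear(X))        # True
```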

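The condition number is the ratio of the largest singular value to the
smallest.  Coldiag with the FULL option lists all the singular values, each
expressed as the largest singular value divided by that value (Belsley et al
call these condition indexes), so the largest entry equals the condition
number.  Any "large" value may be worth investigating.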
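The scaled values in this list can be sketched in NumPy as the largest singular value divided by each singular value, sorted in ascending order. The first entry is then 1.00 and the last is the condition number, which appears to match the layout of coldiag's output. This sketch assumes X is an n-by-p array of regressors with no intercept column. The function names are hypothetical.

```python
import numpy as np


def condition_indexes(X):
    """Largest singular value divided by each singular value, ascending."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)          # unit-length columns
    sv = np.linalg.svd(X, compute_uv=False)
    return np.sort(sv[0] / sv)


def large_indexes(X, cutoff=30.0):
    """The condition indexes at or above the 'large' cutoff."""
    idx = condition_indexes(X)
    return idx[idx >= cutoff]


X = [[1.0, 1.00, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.00, 1.0]]
print(np.round(condition_indexes(X), 2))   # smallest entry is 1.0
print(large_indexes(X))                    # a single value near 200
```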

Coldiag with the FULL option also reports information that may help
identify the source of these problems: the variance-decomposition
proportions associated with each singular value.  If a large singular value
is associated with two or more variables that have "large" variance-
decomposition proportions, these variables may be causing collinearity
problems.  Belsley et al suggest that a proportion of 50 percent (.5) or
more is "large".
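The proportions come from the singular value decomposition X = UDV' of the scaled data matrix. The variance of the k-th coefficient is proportional to the sum over j of v_kj^2 / mu_j^2, and the proportion for singular value j is that term's share of the sum. The sketch below assumes X is an n-by-p array (with n >= p) of regressors and no intercept column. The function names are hypothetical. The cutoffs (30 for the index, .5 for a proportion) are the rules of thumb quoted above, not values taken from coldiag's code.

```python
import numpy as np


def variance_decomposition(X):
    """Condition indexes and variance-decomposition proportions.

    Row j of the proportion matrix belongs to condition index j;
    column k belongs to variable k, and each column sums to 1.
    """
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)                 # unit-length columns
    _, sv, vt = np.linalg.svd(X, full_matrices=False)
    phi = (vt.T / sv) ** 2                            # phi[k, j] = v_kj^2 / mu_j^2
    pi = (phi / phi.sum(axis=1, keepdims=True)).T     # pi[j, k], shares per variable
    index = sv[0] / sv                                # condition indexes
    order = np.argsort(index)                         # ascending, like coldiag
    return index[order], pi[order]


def flag_dependencies(index, pi, names, index_cut=30.0, prop_cut=0.5):
    """Rows with a large index and two or more large proportions."""
    flagged = []
    for eta, row in zip(index, pi):
        involved = [name for name, p in zip(names, row) if p >= prop_cut]
        if eta >= index_cut and len(involved) >= 2:
            flagged.append((float(eta), involved))
    return flagged


X = [[1.0, 1.00, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.00, 1.0]]
index, pi = variance_decomposition(X)
print(flag_dependencies(index, pi, ["a", "b", "c"]))   # a and b, index near 200
```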

Syntax
------

^coldiag^ [varlist] [^if^ exp] [^in^ range] [^,^ ^f^ull ^nos^cale]

Options
-------

^full^ also displays the singular values and the full variance-decomposition
matrix.  By default, coldiag reports only the condition number.

^noscale^ suppresses the default scaling of each column vector to unit
length.  (Belsley et al recommend this scaling.)
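One reason for the recommendation is that unit-length scaling makes the condition number independent of the units each variable is measured in. The NumPy sketch below illustrates this. The `scale` flag is a hypothetical stand-in for the absence of ^noscale^, and the data are made up for the illustration.

```python
import numpy as np


def condition_number(X, scale=True):
    """Ratio of largest to smallest singular value; scale=False mimics noscale."""
    X = np.asarray(X, dtype=float)
    if scale:
        X = X / np.linalg.norm(X, axis=0)      # unit-length columns
    sv = np.linalg.svd(X, compute_uv=False)
    return sv[0] / sv[-1]


X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0]])
X_rescaled = X * np.array([1.0, 1000.0])       # second variable in new units

# Scaled: identical condition numbers despite the change of units.
print(condition_number(X), condition_number(X_rescaled))
# Unscaled: the rescaled matrix looks far worse conditioned.
print(condition_number(X, scale=False), condition_number(X_rescaled, scale=False))
```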

Example
-------

 . ^use auto^

 . ^coldiag price mpg hdroom trunk weight length turn displ gratio foreign, full^

 condition number =        93.44
  
Singular values:
     1:        1.00
     2:        3.14
     3:        6.34
     4:       10.07
     5:       15.56
     6:       17.97
     7:       20.69
     8:       41.77
     9:       58.60
    10:       93.44

          SV    price      mpg   hdroom    trunk   weight   length     turn
 r1        1     .001        0        0        0        0        0        0
 r2     3.14        0     .001        0     .001        0        0        0
 r3     6.34     .157     .037     .003        0        0        0        0
 r4    10.07     .205     .063     .119      .08        0        0        0
 r5    15.56     .446     .007     .105     .116     .004     .001     .002
 r6    17.97     .033     .091     .576      .27     .006     .002     .005
 r7    20.69     .003     .336     .182     .434     .011     .003     .008
 r8    41.77     .018     .409     .008     .036     .184     .006     .006
 r9     58.6     .102     .032     .007        0     .366        0     .695
r10    93.44     .036     .024        0     .062     .429     .988     .284

       displ   gratio  foreign
 r1        0        0     .001
 r2     .003        0     .199
 r3     .008     .003     .078
 r4     .001     .002     .151
 r5     .282        0     .301
 r6     .086     .008     .001
 r7     .148     .029     .025
 r8      .47     .628     .223
 r9     .002     .211     .011
r10        0     .118     .011

The condition number of 93.44 is fairly large.  Examination of the
variance-decomposition proportions in the corresponding row (r10) shows that
the proportions associated with length (.988) and weight (.429) are fairly
high.

In addition, the singular values of 41.77 and 58.60 are high.  Associated
with the former are fairly large variance-decomposition proportions of .47
for displ and .628 for gratio.

In this case, coldiag provides no insight that simple pairwise correlations
would not: the correlation between weight and length is .95, and between
displ and gratio it is -.83.

Reference
---------

D. Belsley, E. Kuh, and R. Welsch (1980).  Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity.  New York: Wiley.