```
^coldiag^

Joseph Harkness <joe.harkness@@jhu.edu>
Institute for Policy Studies
The Johns Hopkins University
3400 N. Charles Street
Baltimore MD 21218

Introduction
------------

Coldiag is an implementation of the regression collinearity diagnostic
procedures found in Belsley, Kuh, and Welsch (1980).  These procedures
examine the "conditioning" of the matrix of independent variables.

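Coldiag first computes the condition number of the matrix.  If this
number is "large" (Belsley et al. suggest 30 or higher), then there
may be collinearity problems.

The condition number is the largest of the values that coldiag lists as
singular values.  (As the example output shows, these values are scaled so
that the smallest is 1; in the terminology of Belsley et al. they are
condition indexes, the ratio of the largest singular value to each singular
value in turn.)  Coldiag with the FULL option lists all of these values.
Any "large" value may be worth investigating.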

Coldiag with the FULL option also provides further information that may
help to identify the source of these problems: the variance decomposition
portions associated with each singular value.  If a large singular value
is associated with two or more variables that have "large" variance
decomposition portions, these variables may be causing collinearity
problems.  Belsley et al. suggest that a "large" portion is 50 percent or
more.

Syntax
------

^coldiag^ [varlist] [^if^ exp] [^in^ range] [^,^ Full NOScale]

Options
-------

^full^ produces the full variance decomposition matrix.  By default,
coldiag only generates the condition number.

^noscale^ suppresses the scaling of each column vector to unit length.
(Belsley et al. recommend this scaling, so it is applied by default.)

Example
-------

. ^use auto^

. ^coldiag price mpg hdroom trunk weight length turn displ gratio foreign, ful
> l^

condition number =        93.44

Singular values:
1:        1.00
2:        3.14
3:        6.34
4:       10.07
5:       15.56
6:       17.97
7:       20.69
8:       41.77
9:       58.60
10:       93.44

          SV    price      mpg   hdroom    trunk   weight   length     turn
r1         1     .001        0        0        0        0        0        0
r2      3.14        0     .001        0     .001        0        0        0
r3      6.34     .157     .037     .003        0        0        0        0
r4     10.07     .205     .063     .119      .08        0        0        0
r5     15.56     .446     .007     .105     .116     .004     .001     .002
r6     17.97     .033     .091     .576      .27     .006     .002     .005
r7     20.69     .003     .336     .182     .434     .011     .003     .008
r8     41.77     .018     .409     .008     .036     .184     .006     .006
r9      58.6     .102     .032     .007        0     .366        0     .695
r10    93.44     .036     .024        0     .062     .429     .988     .284

       displ   gratio  foreign
r1         0        0     .001
r2      .003        0     .199
r3      .008     .003     .078
r4      .001     .002     .151
r5      .282        0     .301
r6      .086     .008     .001
r7      .148     .029     .025
r8       .47     .628     .223
r9      .002     .211     .011
r10        0     .118     .011

The condition number of 93.44 is fairly large.  Examination of the variance
decomposition portions in that row (r10) shows that the portions associated
with length (.988) and weight (.429) are fairly high.

In addition, the singular values of 41.77 and 58.60 are high.  Associated
with the former are the fairly large decomposition portions of .47 for
displ and .628 for gratio.

In this case, coldiag does not provide any insight that a simple correlation
would not.  The correlation between weight and length is .95, and between
displ and gratio it is -.83.

Reference
---------

D. Belsley, E. Kuh, and R. Welsch (1980).  Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity.  Wiley.

```
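
The quantities described in the help text can be sketched outside Stata. The following is a minimal illustration in plain Python, not coldiag's actual code. It assumes the standard Belsley, Kuh, and Welsch definitions: columns scaled to unit length, condition indexes taken as the ratio of the largest singular value to each singular value, and variance decomposition portions computed from the squared eigenvector entries of X'X divided by the squared singular values. The function names (`scale_columns`, `jacobi_eigen`, `coldiag_sketch`) are hypothetical. Whether coldiag adds a constant column is not stated in the help text, so none is added here.

```python
import math


def scale_columns(X):
    """Scale each column of X (a list of rows) to unit Euclidean length."""
    ncol = len(X[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(ncol)]
    return [[row[j] / norms[j] for j in range(ncol)] for row in X]


def jacobi_eigen(A, sweeps=50, tol=1e-24):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (eigenvalues, V), where column j of V is the eigenvector
    belonging to eigenvalue j.
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle that zeroes A[p][q]; keep it within [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J' A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p] = c * vkp - s * vkq
                    V[k][q] = s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def coldiag_sketch(X, scale=True):
    """Condition indexes and variance decomposition portions for X.

    Assumes no exact collinearity (every singular value is nonzero).
    Row i of the returned portions belongs to condition index i;
    column k belongs to variable k.
    """
    if scale:
        X = scale_columns(X)
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    eigvals, V = jacobi_eigen(XtX)
    # Eigenvalues of X'X are the squared singular values of X.
    order = sorted(range(p), key=lambda j: -eigvals[j])
    mu_max = math.sqrt(eigvals[order[0]])
    cond_index = [mu_max / math.sqrt(eigvals[j]) for j in order]
    portions = []
    for j in order:
        row = []
        for k in range(p):
            phi = [V[k][m] ** 2 / eigvals[m] for m in range(p)]
            row.append(phi[j] / sum(phi))
        portions.append(row)
    return cond_index, portions


# Toy data: two unit-length columns whose cosine is 0.6.
idx, port = coldiag_sketch([[1.0, 0.6], [0.0, 0.8]])
print([round(v, 4) for v in idx])       # [1.0, 2.0]
print([round(v, 4) for v in port[-1]])  # [0.8, 0.8]
```

In the toy data the largest condition index is 2 and both variables have a portion of .8 in that row. This is the pattern the help text describes: two or more variables loading heavily on the same large value. The Jacobi routine is used only to keep the sketch free of external libraries; any symmetric eigenvalue or SVD routine would serve the same purpose.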