^coldiag^
Direct comments to:
Joseph Harkness
Institute for Policy Studies
The Johns Hopkins University
3400 N. Charles Street
Baltimore MD 21218
Introduction
------------
Coldiag is an implementation of the regression collinearity diagnostic
procedures found in Belsley, Kuh, and Welsch (1980). These procedures
examine the "conditioning" of the matrix of independent variables.
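Coldiag first computes the condition number of the matrix. If this
number is "large" (Belsley et al. suggest 30 or higher), then there
may be collinearity problems.
The condition number is the ratio of the largest singular value of the
matrix to the smallest. Coldiag with the FULL option lists a value for
every singular value, expressed as the ratio of the largest singular
value to that one (Belsley et al. call these condition indexes), so the
list runs from 1 up to the condition number. All "large" values may be
worth investigating.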
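The calculation described above can be sketched in a few lines of Python
with NumPy. This is an illustration only, not coldiag's own code: the
function name and the simulated data are hypothetical, and the columns are
scaled to unit length as Belsley et al. recommend.

```python
import numpy as np

def condition_indexes(X):
    """Return largest singular value / each singular value, in ascending order."""
    X = np.asarray(X, dtype=float)
    # Scale each column to unit length before the decomposition.
    X = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
    return s[0] / s                         # first entry is 1, last is the condition number

# Hypothetical data: the third column is almost the sum of the first two.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, b, a + b + 1e-3 * rng.normal(size=50)])

idx = condition_indexes(X)
print("condition number =", round(float(idx[-1]), 2))
print("all ratios:", np.round(idx, 2))
```

The last ratio is the condition number; for data this close to collinear
it should lie well above the suggested cutoff of 30.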
Coldiag with the FULL option also provides further information that may
help to identify the source of these problems: the variance decomposition
proportions associated with each singular value. If a large singular value
in coldiag's listing is associated with two or more variables that have
"large" variance decomposition proportions, these variables may be causing
collinearity problems. Belsley et al. suggest that a "large" proportion is
50 percent or more.
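A minimal sketch of how such a variance decomposition can be computed from
the singular value decomposition, following the definition in Belsley et
al., is given here in Python with NumPy. The function name and the
simulated data are hypothetical, and coldiag's internal computation may
differ in detail.

```python
import numpy as np

def variance_decomposition(X):
    """Return (ratios, props); props[j, k] is the share of the variance of
    coefficient k that is associated with singular value j."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=0)            # unit-length columns
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    # vt[j, k] is the k-th element of the j-th right singular vector.
    phi = vt ** 2 / s[:, None] ** 2
    props = phi / phi.sum(axis=0)                # each column sums to 1
    return s[0] / s, props

# Hypothetical data: the third column is almost the sum of the first two.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, b, a + b + 1e-3 * rng.normal(size=50)])

ratios, props = variance_decomposition(X)
for r, row in zip(ratios, props):
    print(f"{r:10.2f}", np.round(row, 3))
```

Each printed row corresponds to one singular value and each column to one
variable. In this constructed case the last row should carry most of the
variance of all three coefficients, which is the pattern that points to a
collinear set of variables.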
Syntax
------
^coldiag^ [varlist] [^if^ exp] [^in^ range] [, Full NOScale]
Options
-------
^full^ lists all the singular values and produces the full variance
decomposition matrix. By default, coldiag only generates the condition
number.
^noscale^ suppresses the scaling of each column vector to unit length. (This
scaling is recommended by Belsley et al.)
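To see why the scaling matters, consider the following sketch in Python
with NumPy. The function and its scale argument are hypothetical stand-ins
for the behavior with and without ^noscale^, and the data are made up: two
columns that are exactly orthogonal but measured in very different units.

```python
import numpy as np

def condition_number(X, scale=True):
    """Ratio of the largest to the smallest singular value of X."""
    X = np.asarray(X, dtype=float)
    if scale:
        # Divide each column by its Euclidean length.
        X = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] / s[-1]

# Two orthogonal columns on very different scales (hypothetical data).
X = np.array([[1.0,    0.0],
              [0.0, 1000.0],
              [1.0,    0.0]])

scaled = condition_number(X, scale=True)
unscaled = condition_number(X, scale=False)
print("scaled:  ", round(float(scaled), 2))
print("unscaled:", round(float(unscaled), 2))
```

Without scaling, the condition number reflects the choice of units rather
than any near dependency among the columns, so it is hard to compare with
the cutoff of 30.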
Example
-------
. ^use auto^
. ^coldiag price mpg hdroom trunk weight length turn displ gratio foreign, full^
condition number = 93.44
Singular values:
1: 1.00
2: 3.14
3: 6.34
4: 10.07
5: 15.56
6: 17.97
7: 20.69
8: 41.77
9: 58.60
10: 93.44
         SV  price    mpg hdroom  trunk weight length   turn
r1        1   .001      0      0      0      0      0      0
r2     3.14      0   .001      0   .001      0      0      0
r3     6.34   .157   .037   .003      0      0      0      0
r4    10.07   .205   .063   .119    .08      0      0      0
r5    15.56   .446   .007   .105   .116   .004   .001   .002
r6    17.97   .033   .091   .576    .27   .006   .002   .005
r7    20.69   .003   .336   .182   .434   .011   .003   .008
r8    41.77   .018   .409   .008   .036   .184   .006   .006
r9     58.6   .102   .032   .007      0   .366      0   .695
r10   93.44   .036   .024      0   .062   .429   .988   .284
       displ  gratio foreign
r1         0       0    .001
r2      .003       0    .199
r3      .008    .003    .078
r4      .001    .002    .151
r5      .282       0    .301
r6      .086    .008    .001
r7      .148    .029    .025
r8       .47    .628    .223
r9      .002    .211    .011
r10        0    .118    .011
The condition number of 93.44 is fairly large. Examination of the variance
decomposition proportions for this singular value (row r10) shows that the
proportions associated with length (.988) and weight (.429) are fairly high.
In addition, the singular values of 41.77 and 58.60 are high. Associated
with the former are the fairly large decomposition proportions of .47 for
displ and .628 for gratio.
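The reading of the table above can also be done mechanically. The sketch
below, in plain Python, copies rows r8 to r10 of the output and reports the
variables whose proportion reaches a cutoff. The function name and its two
cutoff arguments are hypothetical; the defaults follow the values suggested
by Belsley et al.

```python
# Rows r8-r10 of the output above: listed value -> proportion by variable.
rows = {
    41.77: {"price": .018, "mpg": .409, "hdroom": .008, "trunk": .036,
            "weight": .184, "length": .006, "turn": .006,
            "displ": .47, "gratio": .628, "foreign": .223},
    58.60: {"price": .102, "mpg": .032, "hdroom": .007, "trunk": 0,
            "weight": .366, "length": 0, "turn": .695,
            "displ": .002, "gratio": .211, "foreign": .011},
    93.44: {"price": .036, "mpg": .024, "hdroom": 0, "trunk": .062,
            "weight": .429, "length": .988, "turn": .284,
            "displ": 0, "gratio": .118, "foreign": .011},
}

def flag_rows(rows, value_cutoff=30.0, prop_cutoff=0.5):
    """Keep rows at or above value_cutoff that have at least two
    variables with a proportion at or above prop_cutoff."""
    flagged = {}
    for value, props in rows.items():
        if value < value_cutoff:
            continue
        big = [name for name, p in props.items() if p >= prop_cutoff]
        if len(big) >= 2:
            flagged[value] = big
    return flagged

strict = flag_rows(rows)                   # 50 percent cutoff
loose = flag_rows(rows, prop_cutoff=0.4)   # looser cutoff, an assumption
print(strict)
print(loose)
```

With the strict 50 percent cutoff no row has two qualifying variables. The
looser cutoff of 0.4, chosen here only for illustration, picks out weight
and length for 93.44, and mpg, displ, and gratio for 41.77.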
In this case, coldiag does not provide any insight that a simple correlation
would not. The correlation between weight and length is .95, and between
displ and gratio it is -.83.
Reference
---------
D. Belsley, E. Kuh, and R. Welsch (1980). Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity. Wiley.