^coldiag^

Direct comments to:

    Joseph Harkness <joe.harkness@@jhu.edu>
    Institute for Policy Studies
    The Johns Hopkins University
    3400 N. Charles Street
    Baltimore MD 21218

Introduction
------------

Coldiag is an implementation of the regression collinearity diagnostic procedures found in Belsley, Kuh, and Welsch (1980). These procedures examine the "conditioning" of the matrix of independent variables.

Coldiag first computes the condition number of the matrix. If this number is "large" (Belsley et al. suggest 30 or higher), then there may be collinearity problems.

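The condition number is the largest of the "singular values" that coldiag reports. (These are what Belsley et al. call condition indexes: the largest singular value of the matrix divided by each singular value in turn, so the smallest reported value is 1.) Coldiag with the FULL option lists all of them. All "large" values may be worth investigating.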

Coldiag with the FULL option also provides further information that may help to identify the source of these problems: the variance decomposition portions associated with each singular value. If a large singular value is associated with two or more variables with "large" variance decomposition portions, these variables may be causing collinearity problems. Belsley et al. suggest that a "large" portion is 50 percent or more.
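The computation described above can be sketched with Stata's built-in matrix commands. This is an assumed reconstruction for illustration, not coldiag's own source code: it takes a short list of regressors from the auto data, omits the constant (an assumption about how coldiag treats it), and uses corr() to scale the columns to unit length. The matrix names XX, S, V, L, Phi, rs and Pi are arbitrary.

```stata
* Sketch only: an assumed reconstruction, not coldiag's actual source code
use auto, clear

* Cross-product matrix X'X of the regressors (constant omitted: an assumption)
matrix accum XX = price mpg weight length, noconstant

* corr() rescales X'X to a unit diagonal, i.e. columns of X scaled to unit length
matrix S = corr(XX)

* Eigenvalues of S (largest first) are the squared singular values of scaled X
matrix symeigen V L = S
local k = colsof(L)

* Condition number: ratio of the largest to the smallest singular value
display "condition number = " sqrt(L[1,1]/L[1,`k'])

* Phi[i,j] = V[i,j]^2 / L[1,j]; dividing each row by its sum gives the portions
matrix Phi = hadamard(V, V) * invsym(diag(L))
matrix rs  = Phi * J(`k', 1, 1)

* Transpose so that rows are singular values and columns are variables
matrix Pi  = (invsym(diag(rs)) * Phi)'
matrix list Pi
```

Each column of Pi should sum to 1, since it splits one coefficient's variance across the singular values. A row with two or more entries above .5 points to the variables involved in a near dependency.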

Syntax
------

^coldiag^ [varlist] [^if^ exp] [^in^ range] [, Full NOScale]

Options
-------

^full^ produces the full variance decomposition matrix. By default, coldiag only generates the condition number.

^noscale^ suppresses the scaling of each column vector to unit length. (This scaling is recommended by Belsley et al.)

Example
-------

. ^use auto^

. ^coldiag price mpg hdroom trunk weight length turn displ gratio foreign, full^

condition number =  93.44

Singular values:
     1:     1.00
     2:     3.14
     3:     6.34
     4:    10.07
     5:    15.56
     6:    17.97
     7:    20.69
     8:    41.77
     9:    58.60
    10:    93.44

           SV   price     mpg  hdroom   trunk  weight  length    turn
 r1         1    .001       0       0       0       0       0       0
 r2      3.14       0    .001       0    .001       0       0       0
 r3      6.34    .157    .037    .003       0       0       0       0
 r4     10.07    .205    .063    .119     .08       0       0       0
 r5     15.56    .446    .007    .105    .116    .004    .001    .002
 r6     17.97    .033    .091    .576     .27    .006    .002    .005
 r7     20.69    .003    .336    .182    .434    .011    .003    .008
 r8     41.77    .018    .409    .008    .036    .184    .006    .006
 r9      58.6    .102    .032    .007       0    .366       0    .695
r10     93.44    .036    .024       0    .062    .429    .988    .284

        displ  gratio foreign
 r1         0       0    .001
 r2      .003       0    .199
 r3      .008    .003    .078
 r4      .001    .002    .151
 r5      .282       0    .301
 r6      .086    .008    .001
 r7      .148    .029    .025
 r8       .47    .628    .223
 r9      .002    .211    .011
r10         0    .118    .011

The condition number of 93.44 is fairly large. Examination of the variance decomposition portions for this singular value shows that the portions associated with length (.988) and weight (.429) are fairly high.

In addition, the singular values of 41.77 and 58.60 are high. Associated with the former are the fairly large decomposition portions of .47 for displ and .628 for gratio.

In this case, coldiag does not provide any insight that a simple correlation would not. The correlation between weight and length is .95, and between displ and gratio it is -.83.
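The pairwise correlations quoted above can be checked directly with Stata's correlate command. This assumes the same auto data and variable names as the example.

```stata
* Pairwise correlations among the variables flagged by coldiag
use auto, clear
correlate weight length displ gratio
```

A correlation matrix only reveals dependencies between pairs of variables. Near dependencies involving three or more variables may show up in the coldiag output without any single large pairwise correlation.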

Reference
---------

D. Belsley, E. Kuh, and R. Welsch (1980). Regression Diagnostics. Wiley.