-------------------------------------------------------------------------------
help for coldiag2 and prnt_cx                                    John Hendrickx
-------------------------------------------------------------------------------

coldiag2

coldiag2 [, full noscale corr eigenval w(integer) d(integer) force fuzz(real) char(string) space(integer)]

or

coldiag2 varlist [if exp] [in range] [, nofull noscale corr noconstant eigenval w(integer) d(integer) force fuzz(real) char(string) space(integer) ]

prnt_cx

prnt_cx [, matname(matrix) w(integer) d(integer) force fuzz(real) char(string) space(integer)]

Description

Note: coldiag2 is an updated version of coldiag by Joseph Harkness. The latest version has slightly different defaults:

* a constant term is added to the varlist
* the full output with variance decomposition proportions is printed

Use "coldiag2 varlist, nofull noconstant" for backward compatibility with coldiag.

coldiag2 is an implementation of the regression collinearity diagnostic procedures found in Belsley, Kuh, and Welsch (1980). These procedures examine the "conditioning" of the matrix of independent variables.

In the first syntax form, coldiag2 is used as a post-estimation command after an estimation procedure such as regress. Collinearity diagnostics are based on the interrelationships among the independent variables so they are appropriate for models other than linear regression. coldiag2 uses the sample selected by any if or in options in that command and will include an intercept according to the model that was estimated.
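The post-estimation form can be sketched as follows. This is only an illustration: the choice of regressors and the if condition are assumptions, and sysuse auto is assumed to be available for loading the example dataset.

```stata
* Load the example dataset shipped with Stata (assumes sysuse is available)
sysuse auto, clear

* Fit a model on a subsample; regressors and the if condition are illustrative
regress price mpg weight length if foreign == 0

* Post-estimation form: no varlist is given, so the regressors,
* the estimation sample, and the intercept are taken from regress
coldiag2
```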

In the second syntax form, a varlist is specified. if or in options can be used to restrict the sample. Missing values will be deleted listwise. By default, an intercept term will be added to the varlist; this can be suppressed using the noconstant option.
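A minimal sketch of the varlist form, again with illustrative variables and sample restrictions that are not part of the original help text:

```stata
sysuse auto, clear

* Varlist form restricted with if; an intercept is added by default
coldiag2 mpg weight length if foreign == 0

* Same variables restricted with in, this time without the intercept
coldiag2 mpg weight length in 1/50, noconstant
```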

prnt_cx is called by coldiag2 to print the condition indexes and variance-decomposition proportions. It can also be run after coldiag2 to print the variance-decomposition proportions using different options. Note that coldiag2 will usually be fast enough to simply rerun the command with different options, unless the dataset is very large and a large number of independent variables are used.

Options

nofull By default, coldiag2 prints the condition number and variance decomposition proportions. Use nofull to produce only the condition number.

noscale does not scale each column vector to unit length. (This scaling is recommended by Belsley et al.)

corr calculates collinearity diagnostics based on a correlation matrix.

noconstant does not add an intercept term to the varlist.

eigenval prints the eigenvalues of the SSCP matrix. Default is noeigenval.

w specifies the width for printing the result. Default is 12.

d specifies the decimal places for printing the result. Default is 2.

force By default, the widest variable name determines column printing and the value of w is ignored. Use the force option to abbreviate the column labels and obtain compact columns. Note that values of w less than 5 revert to 5.

fuzz If set, variance-decomposition proportions less than fuzz are printed as a "." or optionally by the character specified in the char option. The default is 0 for coldiag2, i.e. all values are printed. The default is .3 for prnt_cx.

char Used in conjunction with the fuzz option. Specify an alternative character to be printed. Default is "."

space Specify the number of spaces between columns. Default is 2.

matname (prnt_cx only) If prnt_cx is run after coldiag2 it will automatically obtain the matrix of condition indexes and variance decomposition proportions. In rare instances, this matrix can be specified manually using the matname option. The first column of matrix should contain the condition indexes, the other columns should contain variance-decomposition proportions.
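As a rough illustration of how the display options combine, the following sketch uses only options documented above; the variable list and option values are arbitrary choices for the example.

```stata
sysuse auto, clear

* Print only the condition number
coldiag2 mpg weight length turn, nofull

* Full output plus eigenvalues, 3 decimals, compact columns of
* width 8 with abbreviated labels and 1 space between columns
coldiag2 mpg weight length turn, eigenval d(3) force w(8) space(1)
```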

Remarks

coldiag2 first computes the condition number of the matrix. If this number is "large" (Belsley et al suggest 30 or higher), then there may be collinearity problems.

The condition number is the largest condition index. Unless the nofull option is specified, coldiag2 lists the condition indexes in the first column of the table under the heading "Condition Indexes and Variance-Decomposition Proportions". All "large" condition indexes may be worth investigating.

The variance-decomposition proportions can be used to identify the source of collinearity problems indicated by large condition indexes. If a large condition index is associated with two or more variables with "large" variance-decomposition proportions, these variables may be causing collinearity problems. Belsley et al suggest that a "large" proportion is 50 percent or more.
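One way to apply the 50 percent guideline is to reprint the table with fuzz set to .5, so that only "large" proportions remain visible. This is a sketch using the documented options; the replacement character "-" is an arbitrary choice.

```stata
sysuse auto, clear
coldiag2 price mpg headroom trunk weight length turn displacement gear_ratio foreign

* Reprint the table, showing proportions below .5 as "-"
prnt_cx, fuzz(.5) char(-) force w(8)
```

Rows with a large condition index and two or more visible proportions are then the ones to inspect.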

Example

. use auto
(1978 Automobile Data)

. coldiag2 price mpg headroom trunk weight length turn displacement gear_ratio foreign

Condition number using scaled variables = 118.78

Condition Indexes and Variance-Decomposition Proportions

      condition index     _cons     price       mpg  headroom     trunk
   1             1.00      0.00      0.00      0.00      0.00      0.00
   2             3.32      0.00      0.00      0.00      0.00      0.00
   3             6.56      0.00      0.15      0.02      0.00      0.00
   4            10.53      0.00      0.20      0.04      0.12      0.08
   5            16.38      0.00      0.44      0.01      0.11      0.11
   6            18.88      0.00      0.03      0.09      0.53      0.25
   7            21.64      0.00      0.00      0.26      0.21      0.43
   8            43.87      0.00      0.02      0.33      0.01      0.04
   9            59.24      0.05      0.08      0.06      0.00      0.00
  10            82.48      0.36      0.00      0.11      0.01      0.00
  11           118.78      0.58      0.08      0.08      0.01      0.09

      condition index        weight        length          turn  displacement    gear_ratio
   1             1.00          0.00          0.00          0.00          0.00          0.00
   2             3.32          0.00          0.00          0.00          0.00          0.00
   3             6.56          0.00          0.00          0.00          0.01          0.00
   4            10.53          0.00          0.00          0.00          0.00          0.00
   5            16.38          0.00          0.00          0.00          0.26          0.00
   6            18.88          0.00          0.00          0.00          0.10          0.01
   7            21.64          0.01          0.00          0.00          0.14          0.02
   8            43.87          0.19          0.01          0.01          0.45          0.47
   9            59.24          0.33          0.00          0.35          0.01          0.34
  10            82.48          0.03          0.13          0.63          0.01          0.16
  11           118.78          0.44          0.86          0.00          0.01          0.01

      condition index   foreign
   1             1.00      0.00
   2             3.32      0.20
   3             6.56      0.08
   4            10.53      0.14
   5            16.38      0.30
   6            18.88      0.00
   7            21.64      0.03
   8            43.87      0.22
   9            59.24      0.01
  10            82.48      0.01
  11           118.78      0.00

. prnt_cx, force w(8)

Condition Indexes and Variance-Decomposition Proportions

      condition index     _cons     price       mpg  headroom     trunk    weight    length      turn
   1             1.00         .         .         .         .         .         .         .         .
   2             3.32         .         .         .         .         .         .         .         .
   3             6.56         .         .         .         .         .         .         .         .
   4            10.53         .         .         .         .         .         .         .         .
   5            16.38         .      0.44         .         .         .         .         .         .
   6            18.88         .         .         .      0.53         .         .         .         .
   7            21.64         .         .         .         .      0.43         .         .         .
   8            43.87         .         .      0.33         .         .         .         .         .
   9            59.24         .         .         .         .         .      0.33         .      0.35
  10            82.48      0.36         .         .         .         .         .         .      0.63
  11           118.78      0.58         .         .         .         .      0.44      0.86         .

      condition index  displa~t  gear_r~o   foreign
   1             1.00         .         .         .
   2             3.32         .         .         .
   3             6.56         .         .         .
   4            10.53         .         .         .
   5            16.38         .         .      0.30
   6            18.88         .         .         .
   7            21.64         .         .         .
   8            43.87      0.45      0.47         .
   9            59.24         .      0.34         .
  10            82.48         .         .         .
  11           118.78         .         .         .

Variance Decomposition Proportions less than .3 have been printed as "."

The condition number of 118.78 is fairly large. Examination of the last row of the table of variance-decomposition proportions shows that the values associated with length (.86), the intercept (.58), and weight (.44) are fairly high. The strong linear relationship among these three variables is the major source of collinearity in these data.

In addition, condition indexes 8, 9 and 10 should be considered high (43.87, 59.24, and 82.48 respectively). Condition index number 10 (82.48) can be attributed to strong inter-relationships between the intercept (.36) and turn (.63). Condition index number 9 (59.24) is attributable to weight (.33), turn (.35), and gear_ratio (.34). Condition index number 8 (43.87) is attributable to mpg (.33), displacement (.45) and gear_ratio (.47).

Remarks

This program is an updated version of coldiag by Joseph Harkness. The main difference is that coldiag calculates the singular value decomposition of X, the matrix of scaled variables in varlist, whereas coldiag2 calculates the eigenvectors and eigenvalues of X'X. Because coldiag reads X into memory, the number of cases it can handle is limited by matsize (maximum 800 for Intercooled Stata, 11,000 for Stata/SE). coldiag2 uses matrix accum to calculate X'X and can therefore handle larger datasets.
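The eigenvalue approach can be sketched with built-in matrix commands. This is not the coldiag2 source code, only an illustration of the idea under stated assumptions: the variable list is arbitrary, corr() is used here as a shortcut to rescale X'X to a unit diagonal (equivalent to scaling each column of X to unit length), and the matrix names A, S, V, and lambda are hypothetical. Because the singular values of X are the square roots of the eigenvalues of X'X, each condition index is the square root of the largest eigenvalue divided by the eigenvalue in question.

```stata
sysuse auto, clear

* X'X for the chosen variables; matrix accum adds a constant column by default
matrix accum A = mpg weight length

* Rescale so that each column of X has unit length (unit diagonal in X'X)
matrix S = corr(A)

* Eigenvectors V and eigenvalues lambda (row vector, largest first)
matrix symeigen V lambda = S

* Condition indexes: sqrt(largest eigenvalue / each eigenvalue)
local k = colsof(lambda)
forvalues j = 1/`k' {
    display %9.2f sqrt(lambda[1,1] / lambda[1,`j'])
}
```

The last value displayed is the condition number, since it uses the smallest eigenvalue.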

Belsley argues strongly against mean-centering the data. Use of the corr option is equivalent to a conditioning analysis of standardized variables with mean 0 and sd 1 and thus goes against this advice.

coldiag2 uses the Stata command _getrhs to obtain a list of independent variables when used as a post-estimation command. This should work with any estimation procedure, and the resulting list will usually be appropriate. If in doubt, an alternative is to try perturb, available from SSC. perturb evaluates collinearity by adding random noise to selected variables and assessing the impact on parameter stability. perturb is also suitable for use with categorical variables.

Saved results

r(pi) A matrix of variance-decomposition proportions

r(cx) A matrix of condition indexes.

r(v) A matrix containing the eigenvalues of the SSCP matrix.
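A short sketch of retrieving the saved results. The matrix names cx and pi are arbitrary, and the variable list is illustrative. Copying the r() matrices right away is a precaution, since a later r-class command may replace them.

```stata
sysuse auto, clear
coldiag2 mpg weight length

* Copy the saved results before another command overwrites r()
matrix cx = r(cx)
matrix pi = r(pi)

* Inspect the condition indexes and variance-decomposition proportions
matrix list cx
matrix list pi
```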

References

Belsley, D.A., Kuh, E., and Welsch, R.E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: John Wiley & Sons.

Belsley, D.A. (1991). Conditioning diagnostics, collinearity and weak data in regression. New York: John Wiley & Sons.

Direct comments to: John Hendrickx

coldiag2 is available at SSC-IDEAS. Use findit coldiag2 to locate the latest version.

collin, coldiag, and perturb are also available from SSC. To install one of them, use ssc install followed by the package name.

Also see On-line: help for vif, collin, coldiag, perturb