-------------------------------------------------------------------------------
help for coldiag2 and prnt_cx                                    John Hendrickx
-------------------------------------------------------------------------------

coldiag2

    coldiag2 [, full noscale corr eigenval w(integer) d(integer) force
             fuzz(real) char(string) space(integer) ]

or

    coldiag2 varlist [if exp] [in range] [, nofull noscale corr noconstant
             eigenval w(integer) d(integer) force fuzz(real) char(string)
             space(integer) ]

prnt_cx

    prnt_cx [, matname(matrix) w(integer) d(integer) force fuzz(real)
            char(string) space(integer) ]

Description

Note: coldiag2 is an updated version of coldiag by Joseph Harkness. The latest version has slightly different defaults:

* a constant term is added to the varlist
* the full output with variance decomposition proportions is printed

Use "coldiag2 varlist, nofull noconstant" for backward compatibility with coldiag.

coldiag2 is an implementation of the regression collinearity diagnostic procedures found in Belsley, Kuh, and Welsch (1980). These procedures examine the "conditioning" of the matrix of independent variables.

In the first syntax form, coldiag2 is used as a post-estimation command after an estimation procedure such as regress. Collinearity diagnostics are based on the interrelationships among the independent variables, so they are also appropriate for models other than linear regression. coldiag2 uses the sample selected by any if or in options in that command and will include an intercept according to the model that was estimated.

In the second syntax form, a varlist is specified. if or in options can be used to restrict the sample. Missing values will be deleted listwise. By default, an intercept term will be added to the varlist; this can be suppressed using the noconstant option.
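As a rough illustration of how the second syntax form prepares its data, the Python sketch below drops every observation that has a missing value (listwise deletion) and then appends a constant column unless it is suppressed. The function name build_design_matrix and the use of None for a missing value are assumptions made for this example; this is not the code coldiag2 itself runs.

```python
def build_design_matrix(rows, noconstant=False):
    """Listwise-delete incomplete rows, then optionally append a constant.

    rows: list of observations, each a list of numbers or None (missing).
    Hypothetical helper for illustration only; not part of coldiag2.
    """
    # Keep only observations where every variable is present.
    complete = [list(r) for r in rows if all(v is not None for v in r)]
    if not noconstant:
        # Default behaviour described above: add an intercept column of ones.
        complete = [r + [1.0] for r in complete]
    return complete


data = [[1.0, 2.0], [None, 3.0], [4.0, 5.0]]
X = build_design_matrix(data)                        # constant added
X_nocons = build_design_matrix(data, noconstant=True)  # constant suppressed
```

The second observation is dropped in both cases because one of its values is missing.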

prnt_cx is called by coldiag2 to print the condition indexes and variance-decomposition proportions. It can also be run after coldiag2 to print the variance-decomposition proportions using different options. Note that coldiag2 will usually be fast enough to simply rerun the command with different options, unless the dataset is very large and a large number of independent variables are used.

Options

nofull  By default, coldiag2 prints the condition number and variance decomposition proportions. Use nofull to produce only the condition number.

noscale  does not scale each column vector to unit length. (This scaling is recommended by Belsley et al.)

corr  calculates collinearity diagnostics based on a correlation matrix.

noconstant  does not add an intercept term to the varlist.

eigenval  prints the eigenvalues of the SSCP matrix. Default is noeigenval.

w  specifies the width for printing the result. Default is 12.

d  specifies the decimal places for printing the result. Default is 2.

force  By default, the widest variable name determines column printing and the value of w is ignored. Use the force option to abbreviate the column labels and obtain compact columns. Note that values of w less than 5 revert to 5.

fuzz  If set, variance-decomposition proportions less than fuzz are printed as a "." or optionally as the character specified in the char option. The default is 0 for coldiag2, i.e. all values are printed. The default is .3 for prnt_cx.

char  Used in conjunction with the fuzz option. Specify an alternative character to be printed. Default is "."

space  Specify the number of spaces between columns. Default is 2.

matname  (prnt_cx only) If prnt_cx is run after coldiag2 it will automatically obtain the matrix of condition indexes and variance decomposition proportions. In rare instances, this matrix can be specified manually using the matname option. The first column of matrix should contain the condition indexes; the other columns should contain variance-decomposition proportions.
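The combined effect of the fuzz, char, w and d options on one row of proportions can be sketched in Python as follows. The function format_row is hypothetical and only mimics the behaviour described in the option list above; it ignores the space and force options, and the exact layout produced by prnt_cx may differ.

```python
def format_row(props, fuzz=0.3, char=".", w=8, d=2):
    """Format variance-decomposition proportions, masking small values.

    Values below `fuzz` are replaced by `char`; the others are printed
    with `d` decimals. Each cell is right-aligned in a field of width `w`.
    Hypothetical helper for illustration; not the actual prnt_cx code.
    """
    w = max(w, 5)  # widths below 5 revert to 5, as noted under force
    cells = []
    for p in props:
        text = char if p < fuzz else f"{p:.{d}f}"
        cells.append(text.rjust(w))
    return "".join(cells)


# Illustrative proportions for five variables.
row = [0.58, 0.08, 0.08, 0.01, 0.09]
fuzzy = format_row(row)          # prnt_cx default: fuzz = .3
full = format_row(row, fuzz=0)   # coldiag2 default: print everything
```

With fuzz set to 0 no value is masked, which corresponds to the coldiag2 default.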

Remarks

coldiag2 first computes the condition number of the matrix. If this number is "large" (Belsley et al. suggest 30 or higher), then there may be collinearity problems.

The condition number is the largest condition index. Unless the nofull option is specified, coldiag2 lists the condition indexes in the first column of the table under the heading "Condition Indexes and Variance-Decomposition Proportions". All "large" condition indexes may be worth investigating.

The variance-decomposition proportions can be used to identify the source of collinearity problems indicated by large condition indexes. If a large condition index is associated with two or more variables with "large" variance-decomposition proportions, these variables may be causing collinearity problems. Belsley et al. suggest that a "large" proportion is 50 percent or more.
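The quantities discussed above can be sketched in plain Python. The code below scales each column to unit length, forms the cross-product matrix, and derives condition indexes and variance-decomposition proportions from its eigenvalues and eigenvectors, following the standard Belsley, Kuh, and Welsch formulas. It is a minimal sketch and not the coldiag2 implementation: the function names, the hand-written Jacobi eigenvalue routine, and the toy data are assumptions made for this example, and the matrix is assumed to have full column rank.

```python
import math


def jacobi_eigen(a, sweeps=50, tol=1e-22):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (values, vectors); column k of vectors belongs to values[k].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p, q of A and of V
                    for m in (a, v):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # rotate rows p, q of A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def collinearity_diagnostics(X):
    """Return (condition indexes, proportions) for a full-rank matrix X.

    proportions[k][j] is the share of the variance of coefficient j
    associated with the k-th condition index.
    """
    n_var = len(X[0])
    # Scale each column to unit length, as recommended by Belsley et al.
    norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(n_var)]
    Z = [[row[j] / norms[j] for j in range(n_var)] for row in X]
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(n_var)] for i in range(n_var)]
    eigvals, eigvecs = jacobi_eigen(ZtZ)
    order = sorted(range(n_var), key=lambda k: -eigvals[k])  # largest first
    lam = [eigvals[k] for k in order]
    # Singular values are square roots of the eigenvalues of Z'Z.
    cond_index = [math.sqrt(lam[0] / l) for l in lam]
    phi = [[eigvecs[j][k] ** 2 / eigvals[k] for j in range(n_var)] for k in order]
    totals = [sum(phi[k][j] for k in range(n_var)) for j in range(n_var)]
    props = [[phi[k][j] / totals[j] for j in range(n_var)] for k in range(n_var)]
    return cond_index, props


# Toy data (made up): a constant, x1, and x2 that is nearly equal to x1.
X = [[1.0, 1.0, 1.1], [1.0, 2.0, 1.9], [1.0, 3.0, 3.2],
     [1.0, 4.0, 3.9], [1.0, 5.0, 5.1]]
cond_index, props = collinearity_diagnostics(X)
```

For this near-collinear toy data the largest condition index exceeds 30, and the proportions in the last row are concentrated on x1 and x2. Each column of proportions sums to 1.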

Example

. use auto
(1978 Automobile Data)

. coldiag2 price mpg headroom trunk weight length turn displacement gear_ratio
> foreign

Condition number using scaled variables = 118.78

Condition Indexes and Variance-Decomposition Proportions

      condition
          index        _cons        price          mpg     headroom        trunk
 1         1.00         0.00         0.00         0.00         0.00         0.00
 2         3.32         0.00         0.00         0.00         0.00         0.00
 3         6.56         0.00         0.15         0.02         0.00         0.00
 4        10.53         0.00         0.20         0.04         0.12         0.08
 5        16.38         0.00         0.44         0.01         0.11         0.11
 6        18.88         0.00         0.03         0.09         0.53         0.25
 7        21.64         0.00         0.00         0.26         0.21         0.43
 8        43.87         0.00         0.02         0.33         0.01         0.04
 9        59.24         0.05         0.08         0.06         0.00         0.00
10        82.48         0.36         0.00         0.11         0.01         0.00
11       118.78         0.58         0.08         0.08         0.01         0.09

      condition
          index       weight       length         turn displacement   gear_ratio
 1         1.00         0.00         0.00         0.00         0.00         0.00
 2         3.32         0.00         0.00         0.00         0.00         0.00
 3         6.56         0.00         0.00         0.00         0.01         0.00
 4        10.53         0.00         0.00         0.00         0.00         0.00
 5        16.38         0.00         0.00         0.00         0.26         0.00
 6        18.88         0.00         0.00         0.00         0.10         0.01
 7        21.64         0.01         0.00         0.00         0.14         0.02
 8        43.87         0.19         0.01         0.01         0.45         0.47
 9        59.24         0.33         0.00         0.35         0.01         0.34
10        82.48         0.03         0.13         0.63         0.01         0.16
11       118.78         0.44         0.86         0.00         0.01         0.01

      condition
          index      foreign
 1         1.00         0.00
 2         3.32         0.20
 3         6.56         0.08
 4        10.53         0.14
 5        16.38         0.30
 6        18.88         0.00
 7        21.64         0.03
 8        43.87         0.22
 9        59.24         0.01
10        82.48         0.01
11       118.78         0.00

. prnt_cx, force w(8)

Condition Indexes and Variance-Decomposition Proportions

   condition
       index    _cons    price      mpg headroom    trunk   weight   length     turn
 1      1.00        .        .        .        .        .        .        .        .
 2      3.32        .        .        .        .        .        .        .        .
 3      6.56        .        .        .        .        .        .        .        .
 4     10.53        .        .        .        .        .        .        .        .
 5     16.38        .     0.44        .        .        .        .        .        .
 6     18.88        .        .        .     0.53        .        .        .        .
 7     21.64        .        .        .        .     0.43        .        .        .
 8     43.87        .        .     0.33        .        .        .        .        .
 9     59.24        .        .        .        .        .     0.33        .     0.35
10     82.48     0.36        .        .        .        .        .        .     0.63
11    118.78     0.58        .        .        .        .     0.44     0.86        .

   condition
       index displa~t gear_r~o  foreign
 1      1.00        .        .        .
 2      3.32        .        .        .
 3      6.56        .        .        .
 4     10.53        .        .        .
 5     16.38        .        .     0.30
 6     18.88        .        .        .
 7     21.64        .        .        .
 8     43.87     0.45     0.47        .
 9     59.24        .     0.34        .
10     82.48        .        .        .
11    118.78        .        .        .

Variance Decomposition Proportions less than .3 have been printed as "."

The condition number of 118.78 is fairly large. Examination of the last row of the table of variance decomposition proportions shows that the values associated with length (.86), the intercept (.58), and weight (.44) are fairly high. The strong linear relationships among these three variables are the major source of collinearity in this data.

In addition, condition indexes 8, 9 and 10 should be considered high (43.87, 59.24, and 82.48 respectively). Condition index number 10 (82.48) can be attributed to strong inter-relationships between the intercept (.36) and turn (.63). Condition index number 9 (59.24) is attributable to weight (.33), turn (.35), and gear_ratio (.34). Condition index number 8 (43.87) is attributable to mpg (.33), displacement (.45) and gear_ratio (.47).
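The reading of the table given above can be mimicked with a small Python sketch that keeps the rows with a high condition index and lists the variables whose proportions reach a cutoff. The function flag_sources is hypothetical; its defaults (30 and 0.5) are the thresholds attributed to Belsley et al. in the Remarks, while the call below uses 0.3 to match the values picked out in this example. The numbers are copied from rows 8 to 11 of the output above.

```python
def flag_sources(cond_indexes, props, names, ci_cut=30.0, prop_cut=0.5):
    """Map each high condition index to the variables with large proportions.

    A row is reported only if at least two variables reach prop_cut.
    Hypothetical helper for illustration; not part of coldiag2.
    """
    flagged = {}
    for ci, row in zip(cond_indexes, props):
        if ci < ci_cut:
            continue
        hits = [name for name, p in zip(names, row) if p >= prop_cut]
        if len(hits) >= 2:
            flagged[ci] = hits
    return flagged


names = ["_cons", "price", "mpg", "headroom", "trunk", "weight",
         "length", "turn", "displacement", "gear_ratio", "foreign"]
cond_indexes = [43.87, 59.24, 82.48, 118.78]
props = [
    [0.00, 0.02, 0.33, 0.01, 0.04, 0.19, 0.01, 0.01, 0.45, 0.47, 0.22],
    [0.05, 0.08, 0.06, 0.00, 0.00, 0.33, 0.00, 0.35, 0.01, 0.34, 0.01],
    [0.36, 0.00, 0.11, 0.01, 0.00, 0.03, 0.13, 0.63, 0.01, 0.16, 0.01],
    [0.58, 0.08, 0.08, 0.01, 0.09, 0.44, 0.86, 0.00, 0.01, 0.01, 0.00],
]
loose = flag_sources(cond_indexes, props, names, prop_cut=0.3)
strict = flag_sources(cond_indexes, props, names)  # 0.5 cutoff
```

With the 0.3 cutoff the sketch reproduces the groupings named in the text; with the stricter 0.5 cutoff only the last row, with the intercept and length, is reported.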

Remarks

This program is an updated version of coldiag by Joseph Harkness. The main difference is that coldiag calculates the singular value decomposition of X, the matrix of scaled variables in varlist, whereas coldiag2 calculates the eigenvectors and eigenvalues of X'X. Because coldiag reads X into memory, the number of cases it can handle is limited by matsize (maximum 800 for Intercooled Stata, 11,000 for Stata/SE). coldiag2 uses matrix accum to calculate X'X and can therefore handle larger datasets.

Belsley argues strongly against mean-centering the data. Use of the corr option is equivalent to a conditioning analysis of standardized variables with mean 0 and sd 1 and thus goes against this advice.

coldiag2 uses the Stata command _getrhs to obtain a list of independent variables when used as a post-estimation command. This should work with any estimation procedure and will usually be appropriate. If in doubt, an alternative could be to try perturb, available from SSC. perturb evaluates collinearity by adding random noise to selected variables and assessing the impact on parameter stability. perturb is also suitable for use with categorical variables.

Saved results

r(pi)  A matrix of variance-decomposition proportions.

r(cx)  A matrix of condition indexes.

r(v)  A matrix containing the eigenvalues of the SSCP matrix.

References

Belsley, D., E. Kuh, and R. Welsch (1980). Regression Diagnostics. Wiley.

Belsley, D.A. (1991). Conditioning diagnostics, collinearity and weak data in regression. New York: John Wiley & Sons.

Direct comments to: John Hendrickx

coldiag2 is available at SSC-IDEAS. Use findit coldiag2 to locate the latest version. collin, coldiag, and perturb are also available from SSC and can be installed with ssc install.

Also see

On-line: help for vif, collin, coldiag, perturb