log using SampleProgramFor_subsetByVIF.log, replace
* SampleProgramFor_subsetByVIF.log

* subsetByVIF is intended for data sets in which the number of covariates is
* large in comparison to the number of observations and the variance-covariance
* matrix is severely ill-conditioned. Below we illustrate the use of this program
* on the auto data set that is loaded with sysuse.

sysuse auto, clear

* collin is a user-contributed program written by Philip Ender that can be used
* in conjunction with subsetByVIF. It can be downloaded from
* https://stats.idre.ucla.edu/stat/stata/ado/analysis

collin price mpg weight length displacement gear_ratio foreign
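
* The VIF of a covariate is 1/(1 - R^2), where R^2 comes from regressing that
* covariate on all of the other covariates. The loop below is a minimal sketch,
* using only built-in commands, of how the VIFs reported by collin could be
* checked by hand. It is not part of subsetByVIF or collin, and the local macro
* names covs and others are illustrative only.

local covs price mpg weight length displacement gear_ratio foreign
foreach v of local covs {
    * Regress the current covariate on all of the remaining ones
    local others : list covs - v
    quietly regress `v' `others'
    display "`v'" _col(16) "VIF = " %8.2f (1/(1 - e(r2)))
}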

* By default, subsetByVIF uses a maximum VIF of 10.

subsetByVIF price mpg weight length displacement gear_ratio foreign

display "Value of vifmax = " r(vifmax1)
display "Number of variables in subset = " r(n1)
display "Subset of covariates = `r(covlist1)'"

* Each covariate in the preceding list has a VIF <= 10 when this group of
* covariates is analyzed together.

collin `r(covlist1)'

* The following command gives two lists of covariates. In the first list, each
* covariate has a VIF <= 15 when the list is analyzed as a group. In the second
* list, each covariate has a VIF <= 5.

subsetByVIF price mpg weight length displacement gear_ratio foreign, viflist(15 5)
display "Number of maximum VIFs specified = " r(n_vif)
display "Largest value of vifmax = " r(vifmax1)
display "Number of variables in subset = " r(n1)
display "Subset of covariates = " r(covlist1)

display "Second largest value of vifmax = " r(vifmax2)
display "Number of variables in subset = " r(n2)
display "Subset of covariates = " r(covlist2)
local covlist2 = r(covlist2)
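
* When several maximum VIFs are requested, the numbered results can also be
* processed in a loop. This is a sketch that assumes, as the displays above
* suggest, that the stored results are numbered 1 through r(n_vif). The local
* macro names nlists and sub1, sub2, ... are illustrative only.

local nlists = r(n_vif)
forvalues i = 1/`nlists' {
    * Copy each list into a local so that it survives later r-class commands
    local sub`i' `r(covlist`i')'
    display "vifmax = " r(vifmax`i') ", n = " r(n`i') ": `sub`i''"
}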

collin `r(covlist1)'
collin `covlist2'

* Create one list of covariates with VIFs <= 4

subsetByVIF price mpg weight length displacement gear_ratio foreign, viflist(4)

display "Value of vifmax = " r(vifmax1)
display "Number of variables in subset = " r(n1)
display "Subset of covariates = " r(covlist1)
collin `r(covlist1)'

* Close the log file opened at the top of this program
log close