{smcl} {* *! version 1.0 15 Apr 2019}{...} {vieweralsosee "" "--"}{...} {vieweralsosee "Help collin (if installed)" "help collin"}{...} {viewerjumpto "Syntax" "subsetByVIF##syntax"}{...} {viewerjumpto "Description" "subsetByVIF##description"}{...} {viewerjumpto "Options" "subsetByVIF##options"}{...} {viewerjumpto "Remarks" "subsetByVIF##remarks"}{...} {viewerjumpto "Examples" "subsetByVIF##examples"}{...} {viewerjumpto "Alsosee" "table##video"}{...}{title:Title} {phang} {bf:subsetByVIF} {hline 2} Select a subset of covariates constrained by VIF {marker syntax}{...} {title:Syntax} {p 8 17 2} {cmdab:subsetByVIF} [{varlist}] [{help if}] [{help in}] [{cmd:,} {it:options}] {synoptset 20 tabbed}{...} {synopthdr} {synoptline} {syntab:Main} {synopt:{opt vifl:ist(numlist descending min=1)}} list of maximum variance inflation factors (VIFs) used to subset {it:varlist} {p_end} {synoptline} {p2colreset}{...} {p 4 6 2} {marker description}{...} {title:Description} {pstd} {pstd} {cmd:subsetByVIF} selects subsets of the covariates listed in {it:varlist} such that each covariate in a given subset has a VIF that is less than or equal to a specified value given by viflist. {marker options}{...} {title:Options} {dlgtab:Main} {phang} {opt vifl:ist(numlist descending min=1)} specifies one or more maximum VIF values. These values must be in descending order, and must be greater than one. For each these maximum VIF values, the program identifies the largest possible subset of covariates such that each covariate in this subset has a VIF that is less than or equal to this maximum value. The default maximum VIF is 10. {p_end} {marker remarks}{...} {title:Remarks} {pstd} We are frequently faced with analyzing data sets in which the ratio of covariates to patients is high. There are several approaches to analyzing such data including penalized regression methods, k-fold cross-validation techniques, and bagging. A problem with any of these approaches is that, even after the elimination of variables causing multi-collinearity, the variance-covariance matrix of the remaining covariates is often highly ill-conditioned. The subsetByVIF program reduces the number of covariates to the largest subsample such that the maximum VIF for each variable in the subsample is less than some value specified by the user. These variables are selected without regard to the dependent variable of interest, which should mitigate problems due to overfitting. The use of this program should improve the convergence properties of many methods of exploratory data analysis. {marker examples}{...} {title:Examples} {pstd}Setup{p_end} {phang}{cmd:. webuse auto}{p_end} {pstd}subsetByVIF{p_end} {phang}{cmd:. subsetByVIF price mpg weight length displacement gear_ratio foreign}{p_end} {phang}{cmd:. subsetByVIF price mpg weight length displacement gear_ratio foreign, viflist(15 5)}{p_end} {title:Stored results} {synoptset 15 tabbed}{...} {p2col 5 15 19 2: Locals}{p_end} {synopt:{cmd:r(n_vif)}} number of maximum VIF values specified {p_end} {synopt:{cmd:r(vifmax1)}} largest value of vifmax specified by the viflist option {p_end} {synopt:{cmd:r(n1)}} number of variables in the subset of covariates with VIFs <= vifmax1 {p_end} {synopt:{cmd:r(covlist1)}} local macro consisting of the names of the variables in the subset of covariates with VIFs <= vifmax1 {p_end} {synopt:{cmd:r(vifmax2)}} second largest value of vifmax specified by the viflist option {p_end} {synopt:{cmd:r(n2)}} number of variables in the subset of covariates with VIFs <= vifmax2 {p_end} {synopt:{cmd:r(covlist2)}} local macro consisting of the names of the variables in the subset of covariates with VIFs <= vifmax2 {p_end} {synopt:{cmd:.}} {p_end} {synopt:{cmd:.}} {p_end} {synopt:{cmd:.}} {p_end} {title:Author} {pstd}Dale Plummer{p_end} {pstd}William D. Dupont{p_end} {pstd}Department of Biostatistics{p_end} {pstd}Vanderbilt University School of Medicine{p_end} {pstd}Email {browse "mailto:william.dupont@vumc.org":william.dupont@vumc.org}{p_end} {pstd}Email {browse "mailto:dale.plummer@vumc.org":dale.plummer@vumc.org}{p_end} {marker Alsosee}{...} {title:Also see} {phang}collin.ado: A contributed program by Philip B. Ender that calculates the VIF for each variable in a set of covariates.{p_end} {phang}Manual: {manhelp regress_postestimation R:regress_postestimation}{p_end} {phang}On-line: help for vif{p_end}