{smcl}
{* *! version 1.1  March 2018}{...}
 
{title:Title}

{phang}
{bf:cv_regress} {hline 2} Reproduces the Leave-one-out cross-validation statistics using the shortcut for Linear models.


{marker syntax}{...}
{title:Syntax}

{p 8 17 2}
{cmdab:cv_regress} [{cmd:,} {it:options}] [cvwgt(varname) gen(new varname)] 

{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Main}
{synopt:{opth cvwgt(varname)}} Weights to be used for the error evaluation purpose {p_end}
{synopt:{opth generr(new varname)}} If specified, creates a new variable containing the predicted Leave-one-out error.  y-E(y_-i|X) {p_end}
{synopt:{opth genhat(new varname)}} If specified, creates a new variable containing the Leave-one-out prediction E(y_-i|X)   {p_end}
{synopt:{opth genlev(new varname)}} If specified, creates a new variable containing the leverage statistic h(X). Accounts for the use of weights {p_end}

{synoptline}

{marker description}{...}
{title:Description}

{pstd}
{cmd:cv_regress}  This command uses the shortcut that relies on the leverage statistics to estimate the leave-one-out error, which is typically used 
in the estimation of Cross-Validation Statistics.{p_end} 

{pstd}
For the correct implementation, the OLS model needs to be estimated using -regress-, before this program is executed.{p_end}

{pstd}
cv_regress reports four goodness-of-fit measures: the root mean squared error (RMSE), Log Mean Squared error (LMSE),
 the mean absolute error (MAE), and the pseudo-R2 
 (the square of the correlation coefficient of the predicted and observed values of the dependent variable). {p_end}
 
{pstd} 
It also gives you the option to save the predicted Leave-one-out error,
 leave one out prediction and the leverage statistic from the model.{p_end}
 
{title:Saved Results}

{pstd}
cv_regress returns the root mean squared error r(rmse), the log mean squared error
 r(lmse), the mean absolute deviation r(mae), and the pseudo R squared r(pr2). {p_end}
{pstd}
The program may also create the predicted Leave-one-out error, LOO prediction and 
leverage statistic for the same sample used in the estimated model.

{marker Acknowledgments}
{title: Acknowledgments}

{pstd}
The program is based on the shortcut evaluation of the Leave-one-out CV strategy in linear models described in "An Introduction to Statistical Learning" by James, G. et al (2013). 
The reported statistics are the same as the ones provided in Manuel Barron's -loocv- module. 
The program relies on Stata program -regress-.

I want to thank Scott Susin for his valuable feedback that helped finding a bug in the command. 

All errors are my own.


{marker examples}{...}
{title:Examples}
. sysuse auto,clear
. set seed 1
. gen wgt=runiform()
. loocv reg price weight i.foreign 


 Leave-One-Out Cross-Validation Results 
-----------------------------------------
         Method          |    Value
-------------------------+---------------
Root Mean Squared Errors |   2172.9124
Mean Absolute Errors     |   1690.7928
Pseudo-R2                |   .45133264
-----------------------------------------

. qui:reg price weight i.foreign 

. cv_regress
 
Leave-One-Out Cross-Validation Results 
-----------------------------------------
         Method          |    Value
-------------------------+---------------
Root Mean Squared Errors |    2172.9125
Log Mean Squared Errors  |      15.3676
Mean Absolute Errors     |    1690.7928
Pseudo-R2                |      0.45133
-----------------------------------------

. 
. loocv reg price weight i.foreign [aw=wgt]


 Leave-One-Out Cross-Validation Results 
-----------------------------------------
         Method          |    Value
-------------------------+---------------
Root Mean Squared Errors |   2196.2179
Mean Absolute Errors     |   1671.0773
Pseudo-R2                |   .44172641
-----------------------------------------

. qui:reg price weight i.foreign [aw=wgt]

. cv_regress


Leave-One-Out Cross-Validation Results 

-----------------------------------------
         Method          |    Value
-------------------------+---------------
Root Mean Squared Errors |    2196.2179
Log Mean Squared Errors  |      15.3890
Mean Absolute Errors     |    1671.0773
Pseudo-R2                |      0.44173
-----------------------------------------


* Using different weights for CV evaluation

. qui:reg price weight i.foreign  

. cv_regress, cvwgt(wgt)

Leave-One-Out Cross-Validation Results 
Statistics are estimated using -wgt- as weights

-----------------------------------------
         Method          |    Value
-------------------------+---------------
Root Mean Squared Errors |    2212.4447
Log Mean Squared Errors  |      15.4037
Mean Absolute Errors     |    1703.7026
Pseudo-R2                |      0.37805
-----------------------------------------


** this would be the same as:
. cv_regress, generr(e_i)

. gen double e_ia=abs(e_i)

. gen double e_ib=e_i^2

. sum e_ia [w=wgt]
(analytic weights assumed)

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        e_ia |      74  32.6092222    1703.703   1421.127   25.81131    6967.48

. sum e_ib [w=wgt]	
(analytic weights assumed)
    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
	    e_ib |      74  32.6092222     4894912    9496328   666.2237   4.85e+07

.display r(mean)^.5
2212.4447
 

{marker Author}{...}
{title:Author}

{pstd}
Fernando Rios-Avila{break}
Levy Economics Institute of Bard College{break}
Blithewood-Bard College{break}
Annandale-on-Hudson, NY{break}
friosavi@levy.org