{smcl}
{* *! version 1.0.0  28Oct2020}{...}
{viewerjumpto "Syntax" "crossvalidate##syntax"}{...}
{viewerjumpto "Description" "crossvalidate##description"}{...}
{viewerjumpto "Options" "crossvalidate##options"}{...}
{viewerjumpto "Examples" "crossvalidate##examples"}{...}
{viewerjumpto "Authors" "crossvalidate##authors"}{...}

{...}{* NB: these hide the newlines }
{...}
{...}
{title:Title}

{p2colset 5 23 25 2}{...}
{p2col :{cmd:crossvalidate} {hline 2}} k-fold cross-validation{p_end}
{p2colreset}{...}


{marker syntax}{...}
{title:Syntax}

{p 8 16 2}
{cmd:crossvalidate} {newvar} {it:estimation_command} {depvar} {indepvars} {ifin} [{cmd:,} {opt folds(#)} {opt gen(newvar)} {opt shuffle} {it:options}]

{pstd}
{cmd:crossv} can be used as a synonym for {cmd:crossvalidate}.

{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{synopt :{opth folds:(crossvalidate##folds:#)}}Number of cross-validation folds; default is {cmd:folds(5)}.{p_end}
{synopt :{opt gen(newvar)}}Save each observation's fold number in {it:newvar}.{p_end}
{synopt :{opt shuffle}}Randomly shuffle the observations before assigning folds.{p_end}
{synopt :{opt options}}Additional options passed to the estimation command.{p_end}
{synoptline}


{marker description}{...}
{title:Description}

{pstd}
{cmd:crossvalidate} computes k-fold cross-validated predictions from Stata estimation commands that support {cmd:predict}.
It splits the dataset into {it:k} subsets ("folds"). For each fold, it fits
{it:estimation_command} on the observations in all other folds and then predicts
values for the held-out fold.
{cmd:crossvalidate} stores the predicted values in the new variable {it:newvar}.
Predictions are obtained by issuing {cmd:predict} {it:newvar} (without options) for each
fold, so, depending on the estimation command, they may be probabilities, class predictions, or continuous
values.
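
{pstd}
As an illustration only, the folding described above is roughly equivalent to the
following loop for {cmd:logistic} with five folds. This is a simplified sketch, not the
code {cmd:crossvalidate} actually runs. The variables {cmd:fold}, {cmd:Phat}, and {cmd:tmp}
are hypothetical, and the folds are assumed to be consecutive blocks of observations in the
current sort order. After {cmd:logistic}, {cmd:predict} returns predicted probabilities by default.

{phang}{cmd:. sysuse auto, clear}{p_end}
{phang}{cmd:. generate fold = ceil(5*_n/_N)}{p_end}
{phang}{cmd:. generate Phat = .}{p_end}
{phang}{cmd:. forvalues k = 1/5 {c -(}}{p_end}
{phang2}{cmd:quietly logistic foreign headroom gear_ratio weight if fold != `k'}{p_end}
{phang2}{cmd:quietly predict tmp if fold == `k'}{p_end}
{phang2}{cmd:quietly replace Phat = tmp if fold == `k'}{p_end}
{phang2}{cmd:drop tmp}{p_end}
{phang}{cmd:{c )-}}{p_end}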

{pstd}
{cmd:crossvalidate} handles only the folding. Apart from its own options, it passes every option you give it directly to the estimation command.
Examples of {it:estimation_command} include {cmd:svmachines} and {cmd:logistic}.

{title:Remarks}
{pstd}
Only estimation commands that support {cmd:predict} can be used.
{cmd:crossvalidate} does not currently support predictions that span multiple variables, as would be needed, for example,
to obtain the outcome probabilities of a multinomial logistic regression.
 
 
{marker options}{...}
{title:Options}

{phang}
{marker folds}{...} 
{opt folds:(#)} specifies the number of folds. Common choices are 5 and 10. The default is {cmd:folds(5)}.{p_end}

{phang}
{marker shuffle}{...} 
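{opt shuffle} randomly shuffles the observations before they are assigned to folds.
By default, folds follow the current sort order of the data, which can produce unbalanced folds when, for example, the data are sorted by the outcome variable.
Because this option uses random numbers, use {cmd:set seed} beforehand if reproducible folds are required.{p_end}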
  
{phang}
{opt gen(newvar)} saves each observation's fold number in the new variable {it:newvar}.
Folds are numbered 1, 2, ..., {it:k}, where {it:k} is the number of folds.
This is useful for computing an evaluation criterion, such as the error rate, separately for each fold afterward.
{p_end}
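
{pstd}
For example, the following sketch combines {opt shuffle} with {opt gen()} and then computes the
misclassification rate within each fold. The seed value, the variable names {cmd:P}, {cmd:fold}, and
{cmd:err}, and the 0.5 probability cutoff are illustrative assumptions. Because {cmd:predict} after
{cmd:logistic} returns probabilities by default, {cmd:P} is converted to a class prediction before it is compared with {cmd:foreign}.

{phang}{cmd:. sysuse auto, clear}{p_end}
{phang}{cmd:. set seed 12345}{p_end}
{phang}{cmd:. crossvalidate P logistic foreign headroom gear_ratio weight, folds(5) shuffle gen(fold)}{p_end}
{phang}{cmd:. generate err = foreign != (P > 0.5)}{p_end}
{phang}{cmd:. tabstat err, by(fold) statistics(mean)}{p_end}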
  
{phang}
{marker estimation_options}{...} 
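{opt options} are passed unchanged to the estimation command. For example, {cmd:type(svc)}, {cmd:gamma(0.4)}, and {cmd:c(51)} in the {cmd:svmachines} example below are {cmd:svmachines} options.{p_end}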


{marker examples}{...}
{title:Examples}

{pstd} Classification with support vector machines ({cmd:svmachines} is a community-contributed command and must be installed separately):

{phang}{cmd:. sysuse auto, clear}

{phang}{cmd:. crossvalidate  P svmachines foreign headroom gear_ratio weight, folds(5) type(svc) gamma(0.4) c(51) }

{phang}{cmd:. gen err = foreign != P}

{phang}{cmd:. qui sum err }

{phang}{cmd:. di "Cross-validated error rate: `r(mean)'" }


{pstd} Nearest-neighbor classification with {cmd:discrim knn}:

{phang}{cmd:. sysuse auto, clear}

{phang}{cmd:. crossvalidate P discrim knn headroom gear_ratio weight, k(3) group(foreign)}
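
{pstd} Linear regression with a cross-validated root mean squared error. This is a sketch under
stated assumptions: {cmd:predict} after {cmd:regress} returns the linear prediction by default, so
{cmd:Yhat} holds continuous out-of-fold predictions. The seed value and the names {cmd:Yhat} and
{cmd:sqerr} are illustrative.

{phang}{cmd:. sysuse auto, clear}

{phang}{cmd:. set seed 2020}

{phang}{cmd:. crossvalidate Yhat regress price mpg weight, folds(10) shuffle}

{phang}{cmd:. generate sqerr = (price - Yhat)^2}

{phang}{cmd:. quietly summarize sqerr}

{phang}{cmd:. display "Cross-validated RMSE: " sqrt(r(mean))}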


{marker authors}{...}
{title:Authors}

{pmore} Matthias Schonlau <schonlau@uwaterloo.ca>{p_end}

{pmore} Nick Guenther <nguenthe@uwaterloo.ca>{p_end}