{smcl}
{* *! version 0.0.8 22mar2024}{...}
{vieweralsosee "[R] estat classification" "mansection R estat_classification"}{...}
{vieweralsosee "" "--"}{...}
{viewerjumpto "Syntax" "fitit##syntax"}{...}
{viewerjumpto "Description" "fitit##description"}{...}
{viewerjumpto "Options" "fitit##options"}{...}
{viewerjumpto "Examples" "fitit##examples"}{...}
{viewerjumpto "Returned Values" "fitit##retvals"}{...}
{viewerjumpto "Additional Information" "fitit##additional"}{...}
{viewerjumpto "Contact" "fitit##contact"}{...}
{title:Model Fitting for Cross-Validation in Stata}

{marker syntax}{...}
{title:Syntax}

{p 8 32 2}
{cmd:fitit} {it:"estimation command"} {cmd:,} {cmdab:spl:it(}{it:passthru}{cmd:)} 
{cmdab:res:ults(}{it:string asis}{cmd:)} [ {cmdab:kf:old(}{it:integer}{cmd:)} 
{cmd:noall} {cmdab:dis:play} {cmdab:na:me(}{it:string asis}{cmd:)}]{p_end}

{synoptset 25 tabbed}{...}
{synoptline}
{synopthdr}
{synoptline}
{syntab:Required}
{synopt :{opt spl:it}}name of the variable that identifies the training split(s){p_end}
{synopt :{opt res:ults}}a stub for storing estimation results{p_end}
{syntab:Optional}
{synopt :{opt kf:old}}specifies the number of folds in the training set; default is {cmd:kfold(1)}.{p_end}
{synopt :{opt noall}}suppresses prediction on entire training set for K-Fold cases{p_end}
{synopt :{opt dis:play}}display results in window; default is {cmd:off}{p_end}
{synopt :{opt na:me}}is used to name the collection storing the results; default is {cmd:name(xvfit)}.{p_end}
{synoptline}


{marker description}{...}
{title:Description}

INCLUDE help xvphase-fit

{marker options}{...}
{title:Options}

{dlgtab:Required}

{phang}
{opt spl:it} must contain the name of the variable that stores the test, 
validation, and test splits.  There will only be a single variable if the splits 
were created using {help splitit}.  

{phang}
{opt res:ults} is used to {help estimates_store:estimates store} the estimation 
results from each of the {opt k:fold} folds in the dataset.  When used with 
K-Fold cross-validation, the estimation results returned by {help ereturn} will 
be based on fitting the model to the entire training set.  Results from each of 
the training folds can be easily recovered using the appropriate reference 
passed to the {opt res:ults} option.  In this case, you will need to add the 
fold number as a suffix to the name you pass to the {opt res:ults} option to 
recover the estimation results for that fold.  The fold number identifies the 
held-out fold.  So, the number 1 will recover the model that was fitted to all 
of the training folds except number 1.

{dlgtab:Optional}

{phang}
{opt kf:old} defines the number of K-Folds used for the training set.  In other 
places, we reference using K-Fold cross-validation in the more common form, 
where the training set consists of multiple subsets of data.  However, standard 
train/test and train/validation/test splits are simply a special case of K-Fold 
cross-validation where there is only a single fold.  

{phang}
{opt no:all} is an option to prevent predicting the outcome for a model fitted 
to the entire training set when using K-Fold cross-validation.  If this option 
is used, {opt kfi:fin} will have no effect since the relevant predictions will 
not be generated.

{phang}
{opt dis:play} an option to display the model fitting results in the result 
window.  If using a large number of K-Folds, it may be useful to not print all 
of the model fitting results to the result window.

{phang}
{opt na:me} is an option to pass a name to the collection created to store the 
results.  When {cmd fitit} is executed, it will initialize a new collection 
or replace the existing collection with the same name.  If you want to retain 
the validation results from multiple executions, pass an argument to this 
option.  {it:Note:} this only affects users using Stata 17 or later.

{marker examples}{...}
{title:Examples}

{p 4 4 2}Load example data{p_end}
{p 8 4 2}{stata sysuse auto.dta, clear}{p_end}
{p 4 4 2}Expand the data to create identical K-Folds{p_end}
{p 8 4 2}{stata expand 6}{p_end}
{p 4 4 2}Create a "split" identifier{p_end}
{p 8 4 2}{stata "bys make: g byte spvar = _n"}{p_end}
{p 4 4 2}Fit a model to each of the K-Folds and all of the training set{p_end}
{p 8 4 2}{stata fitit "reg price mpg headroom", spl(spvar) res(tst) kf(5)}{p_end}
{p 4 4 2}Fit the model only the the individual K-Folds{p_end}
{p 8 4 2}{stata fitit "reg price mpg headroom", spl(spvar) res(tst) kf(5) noall}{p_end}


{marker retvals}{...}
{title:Returned Values}
{p 4 4 8}The following lists the names of the e-macros and their contents.{p_end}

{synoptset 25 tabbed}{...}
{synoptline}
{synopthdr:Name}
{synoptline}
{synopt :{cmd:e(estres#)}}the name to store the estimation results on the #th fold.{p_end}
{synopt :{cmd:e(estresnames)}}the names of all the estimation results{p_end}
{synopt :{cmd:e(estresall)}}the name used to store the estimation results for the entire training set when K-Fold cross-validation is used.{p_end}
{synopt :{cmd:e(predifin)}}the if expression to use when predicting on validation/test split.{p_end}
{synopt :{cmd:e(kfpredifin)}}the if expression to use when predicting on the K-Fold hold out set.{p_end}
{synoptline}

{p 4 4 8}{cmd:fitit} also reposts the e-return values from the model fitted to the entire training set.  As a reminder, when used with {opt k:fold} > 1, the results returned by {help ereturn} will come from fitting the model to the entire training set (e.g., all of the K-Folds simultaneously).  The results from individual folds can be recovered by appending the held-out fold number to the value passed to {opt res:ults} and calling {help estimates_restore:estimates restore}.{p_end}

{marker additional}{...}
{title:Additional Information}
{pstd}
The {cmdab dis:play} option in validateit is enabled by 
{help collect_preview:collect preview}.  In addition to providing a convient way 
for us to structure the display in a useful way, it also makes it easy for you - 
the user - to export the validation results into any of several formats.  The 
results from {help fitit} are all stored in the collection named 
{cmd:xvfit}.  For more information on how to export these results into the 
format of your choosing, please see {help collect_export:collect export}.

{pstd}
If you have questions, comments, or find bugs, please submit an issue in the 
{browse "https://github.com/wbuchanan/crossvalidate":crossvalidate GitHub repository}.


{marker contact}{...}
{title:Contact}
{p 4 4 8}William R. Buchanan, Ph.D.{p_end}
{p 4 4 8}Sr. Research Scientist, SAG Corporation{p_end}
{p 4 4 8}{browse "https://www.sagcorp.com":SAG Corporation}{p_end}
{p 4 4 8}wbuchanan at sagcorp [dot] com{p_end}

{p 4 4 8}Steven D. Brownell, Ph.D.{p_end}
{p 4 4 8}Economist, SAG Corporation{p_end}
{p 4 4 8}{browse "https://www.sagcorp.com":SAG Corporation}{p_end}
{p 4 4 8}sbrownell at sagcorp [dot] com{p_end}