Billy Buchanan, Sr Research Scientist, SAG Corporation
Steven Brownell, Sr Economist, SAG Corporation
Slides available at: https://411steven.github.io/stataConference2024
crossvalidate package

xv: prefix for the majority of use cases
xvloo: prefix for leave-one-out cross-validation use cases

Syntax:

    xv|xvloo # [#], MEtric(string asis) [seed(integer) Uid(varlist)
        TPoint(string asis) SPLit(string asis) KFold(integer)
        RESults(string asis) fitnm(string asis) Classes(integer)
        PStub(string asis) noall MOnitors(string asis) DISplay RETain
        valnm(string asis) PMethod(string asis) POpts(string asis)] :
        estimation command ...
    sysuse auto.dta, clear

    // 80/20 TT Split using MSE for the evaluation metric
    xv 0.8, me(mse): reg price mpg

    // 60/20/20 TVT split using MSE for the validation metric
    xv 0.6 .2, me(mse): reg price mpg, vce(rob)

    // 60/20/20 TVT 5-fold split using MSE for the validation metric
    // and also reporting out Mean Avg % Error (MAPE)
    xv 0.6 .2, me(mse) kf(5) mo(mape): reg price mpg, vce(rob)
    real scalar metricName(string scalar pred, string scalar obs,
                           string scalar touse, | transmorphic matrix opts) {
        real colvector y, yhat
        real scalar metricValue
        yhat = st_data(., pred, touse)
        y = st_data(., obs, touse)
        ...
        return(metricValue)
    }
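As an illustration of the template above, here is a minimal sketch of one concrete metric: mean absolute error. The function name myMae and the variables yhat and touse are hypothetical names chosen for this example, not part of the package, and the optional opts argument is accepted but unused. The function is called directly from Mata here; how a user-defined metric is registered with xv is not shown.

```stata
mata:
// Hypothetical custom metric following the template signature:
// mean absolute error between predicted and observed values
real scalar myMae(string scalar pred, string scalar obs,
                  string scalar touse, | transmorphic matrix opts) {
    real colvector y, yhat
    real scalar metricValue
    // Read predictions and outcomes for the flagged observations only
    yhat = st_data(., pred, touse)
    y = st_data(., obs, touse)
    // Average of the absolute prediction errors
    metricValue = mean(abs(y - yhat))
    return(metricValue)
}
end

// Call the function directly on a fitted model's predictions
sysuse auto.dta, clear
regress price mpg
predict double yhat
generate byte touse = e(sample)
mata: myMae("yhat", "price", "touse")
```

The three string arguments are variable names rather than data, so the function itself pulls the values from the dataset with st_data() and restricts them to observations where touse is nonzero.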
    // Load an example dataset
    sysuse auto.dta, clear

    // Use an 80/20 TT split using MSE for the metric and also reporting out
    // on several other model fit statistics for this simple regression model
    xv .8, metric(mse) pstub(pred) display monitors(rmse((1, 2)) mae ///
        mape smape(("y"))): reg price mpg i.foreign
    // Load an example dataset
    webuse lbw.dta, clear

    // Use a 60/20/20 TVT split with 4-folds using ACC for the metric for logit model
    xv 0.6 0.2, metric(acc) pstub(p) kfold(4) display split(logsplit2) ///
        monitors(sens spec prev ppv npv bacc jindex) classes(2) seed(7779311): ///
        logit low age lwt i.race smoke ptl ht ui
    // Load an example dataset
    webuse fullauto.dta, clear

    // Drop observations
    set seed 111
    drop if runiform() > 0.22

    // Use an 80/20 TT split with n-folds for ordinal logit model
    xvloo 0.8, metric(mcacc) pstub(p) display classes(5) retain ///
        split(mcsplit) monitors(mcsens mcprec mcspec mcppv mcnpv mcbacc mcmcc mcf1 ///
        mcjindex mcdetect mckappa): ologit rep77 foreign length mpg