{smcl}
{* *! Version 6, G. Cerulli, September 30, 2025}{...}

{title:Title}

{phang}
{bf:opl_ma_fb} {hline 2} Optimal Policy Learning for Multi-Action Treatment using First-Best Policy and Risk Preference


{title:Syntax}

{p 8 8 2}
{cmd:opl_ma_fb} {it:depvar indepvars}
{cmd:,}
{cmd:policy_train(}{it:varname}{cmd:)}
{cmd:model(}{it:string}{cmd:)}
{cmd:name_opt_policy(}{it:name}{cmd:)}
[{cmd:match_name(}{it:name}{cmd:)}
{cmd:new_data(}{it:name}{cmd:)}
{cmd:policy_non_optimal_train(}{it:varname}{cmd:)}
{cmd:policy_non_optimal_new(}{it:varname}{cmd:)}
{cmd:save_preds_vars(}{it:name}{cmd:)}
{cmd:gr_action_train(}{it:name}{cmd:)}
{cmd:gr_reward_train(}{it:name}{cmd:)}
{cmd:gr_reward_new(}{it:name}{cmd:)}]


{title:Description}

{pstd}
{cmd:opl_ma_fb} implements a first-best Optimal Policy Learning (OPL) algorithm that estimates the best treatment assignment given an outcome, a set of observed covariates, and estimated treatment effects. It allows for different risk preferences in decision-making (risk-neutral, risk-averse linear, and risk-averse quadratic). The command uses linear regression to estimate the nuisance conditional means.


{title:Options}

{dlgtab:Required}

{phang}
{opt policy_train(varname)} specifies the treatment variable, which must contain consecutive integers starting from 0 (e.g., 0, 1, 2, ..., M).

{phang}
{opt model(string)} specifies the decision model (see the illustrative call at the end of this section):

{pmore}
{it:risk_neutral}: considers only the expected reward (no variance or risk is accounted for).

{pmore}
{it:risk_averse_linear}: adjusts the reward by a linear function of its variance.

{pmore}
{it:risk_averse_quadratic}: adjusts the reward by a quadratic function of its variance.

{phang}
{opt name_opt_policy(name)} specifies the name of the generated variable containing the estimated optimal policy.

{dlgtab:Optional}

{phang}
{opt match_name(name)} specifies the name of the variable that stores whether the actual treatment matches the optimal one.

{phang}
{opt new_data(name)} provides a second dataset in which optimal actions are predicted for new units. This dataset must contain the same features as the training dataset.

{phang}
{opt policy_non_optimal_train(varname)} specifies an alternative (non-optimal) policy to compare against the actual or the optimal policy within the training data.

{phang}
{opt policy_non_optimal_new(varname)} specifies an alternative (non-optimal) policy to compare against the optimal policy within the new data.

{phang}
{opt save_preds_vars(name)} saves the estimated conditional expectations and variances.

{phang}
{opt gr_action_train(name)} saves a graph comparing the actual vs. optimal action allocation in the training dataset.

{phang}
{opt gr_reward_train(name)} saves a graph comparing the actual vs. maximal expected reward in the training dataset.

{phang}
{opt gr_reward_new(name)} saves a graph showing the maximal expected reward for the new policy observations.
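{pstd}
As an illustration of how the risk-preference and comparison options combine, a minimal, hypothetical call with risk-averse quadratic preferences and a benchmark non-optimal policy might look as follows; the outcome {cmd:y}, covariates {cmd:x1 x2}, treatment {cmd:D}, and benchmark policy {cmd:D_alt} are placeholder names, not variables created by the command:{p_end}

{phang2}{cmd:. opl_ma_fb y x1 x2, policy_train(D) model(risk_averse_quadratic) name_opt_policy(opt_pol) policy_non_optimal_train(D_alt) save_preds_vars(pred) gr_action_train(act_gr) gr_reward_train(rew_gr)}{p_end}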
{dlgtab:Returns}

{synoptset 24 tabbed}{...}
{syntab:Scalars}
{synopt:{cmd:e(N_train)}}Number of observations in the training dataset{p_end}
{synopt:{cmd:e(N_new)}}Number of observations in the new (unlabeled) dataset{p_end}
{synopt:{cmd:e(N_train_opt_pol)}}Number of observations used to compute the optimal policy in the training dataset{p_end}
{synopt:{cmd:e(V_train)}}Value function in the training dataset{p_end}
{synopt:{cmd:e(N_V_train)}}Number of observations used to compute the value function in the training dataset{p_end}
{synopt:{cmd:e(V_non_opt_train)}}Value function of the non-optimal policy in the training dataset{p_end}
{synopt:{cmd:e(N_V_non_opt_train)}}Number of observations used to compute the value function of the non-optimal policy in the training dataset{p_end}
{synopt:{cmd:e(V_opt_train)}}Value function of the optimal policy in the training dataset{p_end}
{synopt:{cmd:e(N_V_opt_train)}}Number of observations used to compute the value function of the optimal policy in the training dataset{p_end}
{synopt:{cmd:e(V_opt_new)}}Value function of the optimal policy in the new dataset{p_end}
{synopt:{cmd:e(N_V_opt_new)}}Number of observations used to compute the value function of the optimal policy in the new dataset{p_end}
{synopt:{cmd:e(rate_opt_match)}}Rate of matches between the optimal and the current training policy{p_end}

{dlgtab:Generated variables}

{phang}
{opt _index}: indicator variable specifying the dataset source of each observation (0 = training data; 1 = new data).

{phang}
{opt _opt_policy}: the estimated optimal policy rule, assigning to each unit the treatment that maximizes expected welfare.

{phang}
{opt _Y_hat_policy_train}: the predicted outcome under the actual (observed) training policy, i.e., the historical assignment rule applied in the data.

{phang}
{opt _Y_hat_policy_train_non_optimal}: the predicted outcome under a given non-optimal policy provided in the training set, used as a benchmark for comparison.

{phang}
{opt _Y_hat_policy_optimal}: the predicted outcome under the estimated optimal policy, i.e., the counterfactual outcome distribution if all units had followed the first-best policy.

{phang}
{opt _match_var}: an indicator variable equal to 1 if the actual treatment coincides with the estimated optimal treatment, and 0 otherwise. It measures the rate of alignment between historical and optimal assignments.

{dlgtab:Examples}

{pstd}{bf:Example}: Basic usage with a risk-neutral model{p_end}

{phang2} Generate the initial dataset by simulation:{p_end}
{phang3} {stata clear all}{p_end}
{phang3} {stata set obs 100}{p_end}
{phang3} {stata set seed 1010}{p_end}
{phang3} {stata generate A = floor(runiform()*3)}{p_end}
{phang3} {stata gen x1 = rnormal()}{p_end}
{phang3} {stata gen x2 = rnormal()}{p_end}
{phang3} {stata gen y = 100*runiform()}{p_end}

{phang2} Split the dataset into training and testing (i.e., new) data:{p_end}
{phang3} {stata get_train_test , dataname(mydata) split(0.60 0.40) split_var(svar) rseed(101)}{p_end}

{phang2} Run {cmd:opl_ma_fb} with risk-neutral preferences:{p_end}
{phang3} {stata opl_ma_fb y x1 x2 , policy_train(A) model(risk_neutral) name_opt_policy(opt_policy) new_data(mydata_test) match_name(match_var) gr_action_train(action_graph) gr_reward_train(reward_graph)}{p_end}
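{pstd}
A possible follow-up to the example above is sketched below; it is illustrative only and assumes the call completed, stored the scalars listed under Returns, and that {cmd:name_opt_policy(opt_policy)} and {cmd:match_name(match_var)} generated variables with those names:{p_end}

{phang3}{cmd:. ereturn list}{p_end}
{phang3}{cmd:. display "Value of the optimal policy (training): " e(V_opt_train)}{p_end}
{phang3}{cmd:. display "Rate of optimal matches: " e(rate_opt_match)}{p_end}
{phang3}{cmd:. tabulate A opt_policy if _index==0}{p_end}
{phang3}{cmd:. tabulate match_var if _index==0}{p_end}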
{dlgtab:References}

{phang}
Athey, S., and Wager, S. 2021. Policy Learning with Observational Data. {it:Econometrica}, 89, 1, 133–161.

{phang}
Cerulli, G. 2021. Improving econometric prediction by machine learning. {it:Applied Economics Letters}, 28, 16, 1419–1425.

{phang}
Cerulli, G. 2022. Optimal treatment assignment of a threshold-based policy: empirical protocol and related issues. {it:Applied Economics Letters}, 30, 8, 1010–1017.

{phang}
Cerulli, G. 2023. {it:Fundamentals of Supervised Machine Learning: With Applications in Python, R, and Stata}. Springer.

{phang}
Cerulli, G. 2024. Optimal policy learning with observational data in multi-action scenarios: Estimation, risk preference, and potential failures. {it:arXiv preprint}, arXiv:2403.20250. {browse "https://arxiv.org/abs/2403.20250"}.

{phang}
Cerulli, G. 2025. Optimal policy learning using Stata. {it:The Stata Journal}, 25, 2, 309–343.

{phang}
James, G., Witten, D., Hastie, T., and Tibshirani, R. 2013. {it:An Introduction to Statistical Learning: With Applications in R}. New York: Springer.

{phang}
Kennedy, E. H. 2023. Towards optimal doubly robust estimation of heterogeneous causal effects. {it:Electronic Journal of Statistics}, 17, 2, 3008–3049.

{phang}
Kitagawa, T., and Tetenov, A. 2018. Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice. {it:Econometrica}, 86, 2, 591–616.

{phang}
Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. 2019. Metalearners for estimating heterogeneous treatment effects using machine learning. {it:Proceedings of the National Academy of Sciences of the United States of America}, 116, 10, 4156–4165.

{dlgtab:Acknowledgment}

{pstd}
The development of this software was supported by: FOSSR (Fostering Open Science in Social Science Research), a project funded by the European Union - NextGenerationEU under the NPRR Grant agreement n. MURIR0000008; and the PRIN Project RECIPE (Linking Research Evidence to Policy Impact and Learning: Increasing the Effectiveness of Rural Development Programmes Towards Green Deal Goals), MUR code: 20224ZHNXE.

{dlgtab:Author}

{phang}Giovanni Cerulli{p_end}
{phang}IRCrES-CNR{p_end}
{phang}Research Institute for Sustainable Economic Growth, National Research Council of Italy{p_end}
{phang}E-mail: {browse "mailto:giovanni.cerulli@cnr.it":giovanni.cerulli@cnr.it}{p_end}

{dlgtab:Also see}

{psee}
Online: {helpb make_cate}, {helpb opl_tb}, {helpb opl_lc}, {helpb opl_lc_c}, {helpb opl_dt}, {helpb opl_dt_c}{p_end}