{smcl}
{* *! version 30aug2024}{...}
{viewerjumpto "Syntax" "ddml_init##syntax"}{...}
{viewerjumpto "Options" "ddml_init##options"}{...}
{viewerjumpto "Installation" "ddml_init##installation"}{...}
{viewerjumpto "References" "ddml_init##references"}{...}
{viewerjumpto "Authors" "ddml_init##authors"}{...}
{vieweralsosee "ddml main page" "ddml"}{...}
{vieweralsosee "Other" "ddml_init##also_see"}{...}
{hline}
{cmd:help ddml init, ddml eq, ddml sample}{right: v1.4.4}
{hline}

{title:ddml init, eq and sample commands for Double Debiased Machine Learning}

{pstd}
{opt ddml} implements algorithms for causal inference aided by supervised
machine learning as proposed in 
{it:Double/debiased machine learning for treatment and structural parameters}
(Econometrics Journal, 2018). Five different models are supported, allowing for 
binary or continuous treatment variables and endogeneity, high-dimensional 
controls and/or instrumental variables. 

{pstd}
{opt ddml init} {it:model} initializes the model,
where {it:model} is either {it:partial}, {it:iv}, {it:interactive}, {it:fiv}, or {it:interactiveiv}.

{pstd}
{cmd: ddml eq: command} adds supervised ML programs for estimating conditional expectations,
where {it:eq} is the conditional expectation to be estimated (e.g., {it:E[Y|X]})
and {it:command} is a supported supervised ML program.

{pstd}
{opt ddml sample} adds cross-fitting repetitions to an existing and possibly already-estimated model.


{marker syntax}{...}
{title:Syntax}

{p 8 14}{cmd:ddml init}
{it:model} [if] [in]
[ , {opt mname(name)}
{opt prefix}
{opt kfolds(integer)}
{opt fcluster(varname)}
{opt foldvar(varlist)}
{opt reps(integer)} 
{opt norandom}
{opt tabfold}
{opt vars(varlist)}{bind: ]}

{pstd}
where {it:model} is either {it:partial}, {it:iv}, {it:interactive}, {it:fiv}, {it:interactiveiv}.

{p 8 14}{cmd:ddml} {it:eq} 
[ , {opt mname(name)}
{opt vname(varname)}
{opt l:earner(varname)}
{opt vtype(string)}
{opt predopt(string)}{bind: ] :}
{it:command} {it:depvar} {it:vars} [ , {it:cmdopt}{bind: ]}

{pstd}
where, depending on model chosen in Step 1,
{it:eq} is either 
{it:E[Y|X]} {it:E[Y|D,X]} {it:E[Y|X,Z]} {it:E[D|X]} {it:E[D|X,Z]} {it:E[Z|X]}.
{it:command} is a supported supervised ML program (e.g. {helpb pystacked} or {helpb cvlasso}).

{pstd}
Note: Options before ":" and after the first comma refer to {cmd:ddml}. 
Options that come after ":" and the final comma refer to the estimation command. 
{p_end}

{p 8 14}{cmd:ddml sample} [ , {opt append}[{cmd:(}{it:integer}{cmd:)}] {opt foldvar(varlist)} {bind: ]}

{pstd}
adds cross-fitting repetitions to an existing and possibly already-estimated model,
where the additional repetitions is indicated either by {opt append(#)}
or by {opt append} and the cross-fit fold identifiers in {opt foldvar(varlist)}.


{marker options}{...}
{synoptset 20}{...}
{synopthdr:init options}
{synoptline}
{synopt:{opt mname(name)}}
name of the DDML model. Allows to run multiple DDML
models simultaneously. Defaults to {it:m0}.
{p_end}
{synopt:{opt prefix}}
tells {opt ddml} to prefix the names of all created variables
with name of the DDML model.
Default is to prefix only the created sample and fold ID variables.
{p_end}
{synopt:{opt kfolds(integer)}}
number of cross-fitting folds. The default is 5.
{p_end}
{synopt:{opt fcluster(varname)}}
cluster identifiers for cluster randomization of random folds.
{p_end}
{synopt:{opt foldvar(varlist)}}
integer variable with user-specified cross-fitting folds (one per cross-fitting repetition).
{p_end}
{synopt:{opt norandom}}
use observations in existing order instead of randomizing before splitting into folds;
if multiple resamples, applies to first resample only;
ignored if user-defined fold variables are provided in {opt foldvar(varlist)}.
{p_end}
{synopt:{opt reps(integer)}}
cross-fitting repetitions, i.e., how often the cross-fitting procedure is
repeated on randomly generated folds. 
{p_end}
{synopt:{opt tabfold}}
prints a table with frequency of observations by fold.
{p_end}
{synopt:{opt vars(varlist)}}
tells {opt ddml} that the variables in {it:varlist} are used in the estimation.
Useful if you want the fold split to take account of
observations dropped because of missing values.
{p_end}
{synoptline}
{p2colreset}{...}
{pstd}

{synoptset 20}{...}
{synopthdr:equation options}
{synoptline}
{synopt:{opt mname(name)}}
name of the DDML model. Defaults to {it:m0}.
{p_end}
{synopt:{opt vname(varname)}}
name of the dependent variable in the reduced form estimation. 
This is usually inferred from the command line but is mandatory
for the {it:fiv} model.
{p_end}
{synopt:{opt l:earner(varname)}}
optional name of the variable to be created. 
{p_end}
{synopt:{opt vtype(string)}}
(rarely used) optional variable type of the variable to be created. Defaults to {it:double}. 
{it:none} can be used to leave the type field blank 
(required when using {cmd:ddml} with {helpb rforest}.)
{p_end}
{synopt:{opt predopt(string)}}
(rarely used) {cmd:predict} option to be used to get predicted values. 
Typical values could be {opt xb} or {opt pr}. Default is 
blank. 
{p_end}
{synoptline}
{p2colreset}{...}
{pstd}

{synoptset 20}{...}
{synopthdr:sample options}
{synoptline}
{synopt:{opt mname(name)}}
name of the DDML model. Defaults to {it:m0}.
{p_end}
{synopt:{opt append(#)}}
number of additional resamples to cross-fit.
{p_end}
{synopt:{opt append}}
when no number of resamples to append is provided, this is based on the list fold IDs in {opt foldvar(varlist)}.
{p_end}
{synopt:{opt foldvar(varlist)}}
integer variable with user-specified cross-fitting folds (one per cross-fitting repetition).
{p_end}
{synoptline}
{p2colreset}{...}
{pstd}


{smcl}
INCLUDE help ddml_install_ref_auth