{smcl}
{* *! version 1.1.0  29apr2020}{...}

{cmd:help epimodels}
{hline}

{title:Title}

{p2colset 5 16 18 2}{...}
{p2col:{hi: epimodels fit} {hline 2} Estimation of parameters of epidemiological models from data.}
{p2colreset}{...}


{title:Syntax}

{p 8 12 2}
{cmd: epimodels fit modelname,}
{it:options}

{pstd}
{cmd:Epimodels fit} allows to estimate the parameters of 
epidemiological models by fitting the model curves to the 
actual data. In the description below {opt modelname} is name 
of the model: SIR or SEIR; {opt param} is any of {it:beta} or 
{it:gamma} for SIR model, and {it:beta}, {it:gamma}, {it:sigma}, 
{it:mu}, or {it:nu} for SEIR model{p_end}


{synoptset 25 tabbed}{...}
{synopthdr}
{synoptline}
{syntab :Model parameters (optional)}
{synopt :{opt param(# | .)}} Known or unknown value of the corresponding 
parameter of the model{break}(default is missing if not specified).

{synopt :{opt param0(#)}} Starting value for the search{break}
(default is 0.5 if not specified).{break}

{syntab :Initial conditions (optional, assumed to be zero if not specified)}
{synopt :{opt susceptible(#)}}Number of susceptible individuals at t0{p_end}
{synopt :{opt exposed(#)}}Number of exposed individuals at t0 
(for SEIR model only){p_end}
{synopt :{opt infected(#)}}Number of infected individuals at t0{p_end}
{synopt :{opt recovered(#)}}Number of recovered individuals at t0{p_end}

{syntab :Levels (optional, but at least one must be specified)}
{synopt :{opt vsusceptible(var)}}Numeric variable indicating number of 
susceptible individuals at t{p_end}
{synopt :{opt vexposed(var)}}Numeric variable indicating number of 
exposed individuals at t (for SEIR model only){p_end}
{synopt :{opt vinfected(var)}}Numeric variable indicating number of 
infected individuals at t{p_end}
{synopt :{opt vrecovered(var)}}Numeric variable indicating number of 
recovered individuals at t{p_end}

{syntab :Other options}
{synopt :{opt format(%fmt)}}Specifies, which format should be used 
  to format the parameter estimates in the output{break} 
  (default is "%10.5f" if nothing is specified){p_end}


{synoptline}
{p2colreset}{...}

{title:Description}

{pstd}
{cmd: epimodels fit} estimates the parameters of epidemiological models 
searching for ones that provide the best fit of the model-based curves 
to the observed data. {p_end}

{pstd}
The supported models are: SIR and SEIR. Not every kind of disease/epidemics 
can be adequately described by these models. For example, the SIR model
implies that the susceptible population is declining (for any non-zero
value of beta and non-zero number of infected individuals) until it
reaches zero. This may or may not be true in your data. Sometimes it is
better to plot the time series that you have for observed levels before
performing the calibration. SEIR model is more flexible due to more
flexibility in parameters, but may be more complex to interpret or
formulate policy recommendations.{p_end}

{pstd}
More than one parameter of the model may be unknown. Indicate values 
of known parameters by specifying them like so: {it:beta(0.90)}. If the 
corresponding option is not specified or is specified as a 
missing value (.) that parameter will be estimated from data. 
For example, {it:gamma(.)}. A common situation is to exclude 
certain model effects, for example, if you know that there is no 
vaccination available or utilized for the disease, you can exclude 
the effect of vaccination from the SEIR model by specifying 
{it:nu(0.00)}. {p_end}

{pstd}
It is possible and common to specify multiple parameters as unknown, 
for example: {it:beta(.) gamma(.)}, in which case all of
them will be estimated from data.{p_end}

{pstd}
For every unknown parameter specify a starting value for the search, 
which is your best guess about this parameter. If such a starting 
value is not specified, the value 0.5 is taken as default. Specifying
starting values close or in the vicinity of the true ones helps the
search to converge.{p_end}

{pstd}
Initial conditions for the population must be specified in the same 
way how they are specified for the corresponding model, and default 
to zero if not specified. Total population may not be zero, hence 
at least one component must be non-zero in the initial conditions.{p_end}

{pstd}
Levels variables indicate the size of each group at a given time.
For example, {it:vinfected(zx75)} indicates that the number of infected 
individuals at time {it:t} is contained in variable {it:zx75}.{p_end}

{pstd}
Depending on the data availability you may have one or more 
variables reflecting the observed evolution of the corresponding 
levels. At least one is required, specifying more helps the model to
converge.{p_end}

{pstd}
Specifying initial conditions is important, since in the case 
where not all the level variables are known this allows establishing 
the total population size (which all these models maintain as constant).
{p_end}

{pstd}
The optimization is done by searching the values of the parameters of 
the model that minimize the deviation of the modelled data from the 
observed data (in the sense of sum of squared differences). Stata's 
built-in optimizer (in Mata) is used, which produces output during the
search indicating the progress made to improve the initial guess about
the values of the parameters. It may or may not converge, depending on
whether there is indeed a unique solution, or several, resulting in
similar quality of fit, and also on how appropriate is the selected
model for your data.{p_end}

{pstd}
Specify option {opt format(%fmt)} to provide a custom formatting of
estimates in the output report. The default format is %10.5f showing
5 decimal digits after comma.{p_end}


{title:Output}

    {hline}
{pstd}epimodels fit produces the report on the estimated values of the
model parameters, which may look like this for the SEIR model:{p_end}

                 SEIR model estimation
     ---------------------------------------------
            Parameter |    Value         Source   
     -----------------+---------------------------
             beta (β) |  0.903009       Estimated
            gamma (γ) |  0.200000       Estimated
            sigma (Ï‚) |  0.102000       Estimated
               mu (μ) |  0.000000        Supplied
               nu (ν) |  0.000000        Supplied
     ---------------------------------------------

{pstd}The first column shows the parameter name, followed by the value. If the
value was estimated, the third column will indicate this with the word,
'Estimated'. If the value of the model parameter was supplied by the user, this
will be indicated with the word 'Supplied' in the third column.{p_end}

{pstd}Correspondingly, the output for the SIR model will have fewer parameters,
and similar interpretation.{p_end}
  
{pstd}As the optimizer searches for the improvement over the starting values
of the parameters, it reports its progress over the iterations. This may take
some time, and it usually takes more time for estimating a larger number of 
parameters.{p_end}


{title:Saved results}

    {hline}

{pstd}{cmd:epimodels fit} is an r-class command which saves the following 
into r():{p_end}	

{phang2}{it:r(param)} - numeric scalars with the final values of the 
parameter estimates. Final values is a combination of the estimated 
and supplied values (if any).{p_end}

{phang2}{it:r(modelcmd)} - name of the command for model simulation that 
corresponds to the estimated model.{p_end}

{phang2}{it:r(modelvars)} - names of the variables for population group sizes
if modelled with the abovementioned command.{p_end}

{phang2}{it:r(datavars)} - names of the variables for population group sizes
as present in the data.{p_end}

{phang2}{it:r(estimated)} - names of the model parameters that were estimated
from the data.{p_end}

{phang2}{it:r(iniconditions)} - all of the options corresponding to the 
initial conditions (including the implied zeroes) as a single string macro 
(useful if you need to graph the estimated model curves immediately after 
the estimation of the parameters).{p_end}

{phang2}{it:r(finparameters)} - all the final parameters of the model 
(estimated and supplied) as a single string macro (useful if you need 
to graph the estimated model curves immediately after the estimation 
of the parameters).{p_end}

{phang2}{it:r(finparamstr)} - all the final parameters of the model 
(estimated and supplied) as a single string macro with greek letters 
used for corresponding parameter names (useful if you need to display
the estimated parameter values on the graph immediately after the 
estimation).{p_end}

{pstd}
Note that both {it:r(modelvars)} and {it:r(datavars)} are specified in the 
same sequence. If a variable is not present in the data, its place is 
occupied by a missing value (.) in the {it:r(datavars)}. A variable 
name will never be a missing in the {it:r(modelvars)}.{p_end}

{title:Examples}

    {hline}
{pstd}Estimation{p_end}

{phang2}{cmd:. epimodels fit SIR , beta(0.9) gamma(.) susceptible(1000)} {break}
  {cmd:infected(50) vinfected(infpop) }{p_end}

{pstd}Perform an estimation of the parameter {it:gamma} of the SIR model for the 
population of 1050 individuals (=1000+50+0) of which 50 were infected and 
none were recovered on day zero, given that the infected population evolved
according to variable {it:infpop}.{p_end}

{phang2}{cmd:. epimodels fit SIR , beta(0.9) gamma(.) gamma0(0.65)} {break}
  {cmd:susceptible(1000) infected(50) vinfected(infpop) }{p_end}

{pstd}Same as above, but indicating that the search for best value of 
{it:gamma} should start from the value 0.65 instead of the default 0.5.{p_end}

{phang2}{cmd:. epimodels fit SIR , beta(.) gamma(.) gamma0(0.65)}{break}
  {cmd:susceptible(1000) infected(50) vinfected(infpop) }{p_end}

{pstd}Same as above, but both parameters {it:beta} and {it:gamma} must be 
estimated from the data.{p_end}

{phang2}{cmd:. epimodels fit SIR , gamma0(0.65) susceptible(1000)}{break}
  {cmd:infected(50) vinfected(infpop) vrecovered(recpop) }{p_end}

{pstd}Same as above, but search for the parameters that fit two series from 
the data: the infected population as contained in the variable {it: infpop} 
and recovered population as contained in the variable {it: recpop}.{p_end}

{phang2}{cmd:. epimodels fit SEIR , mu(0.00) nu(0.00) susceptible(1000)}{break}
  {cmd:infected(50) vinfected(infpop) vrecovered(recpop) }{p_end}

{pstd}Same as above, but estimate the three parameters of the SEIR model, 
namely: {it:beta}, {it:gamma}, {it:sigma}, by improving on the starting 
value of (0.5, 0.5, 0.5) by fitting two series (infected and recovered 
populations).{p_end}


{title:Authors}

{phang}
{it:Sergiy Radyakin}, The World Bank
{p_end}

{phang}
{it:Paolo Verme}, The World Bank
{p_end}

{title:Also see}

{psee}
Online: {browse "http://www.radyakin.org/stata/epimodels/": epimodels homepage}
{p_end}