{smcl}
{* 2021-04-25}{...}
{cmd:help firthlogit} {right:Version 1.2 2021-04-24}
{hline}
{title:Title}
{p2colset 9 23 25 2}{...}
{p2col: {hi:firthlogit} {hline 2}}Penalized maximum likelihood logistic regression{p_end}
{p2colreset}{...}
{title:Syntax}
{p 8 17 2}
{cmdab:firthlogit}
{depvar} [{indepvars}]
{ifin}
{weight}
[{cmd:,} {it:options}]
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Main}
{synopt:{opt l:evel(#)}}set confidence level; default is prevailing setting (see {help creturn}){p_end}
{synopt:{opt or}}report odds ratios{p_end}
{synopt:{it:maximize_options}}maximization options{p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}
{it:indepvars} may contain factor variables; see {help fvvarlist}.{p_end}
{p 4 6 2}
{cmd:by} may be used with {cmd:firthlogit}; see {helpb by}.{p_end}
{p 4 6 2}
{cmd:fweight}s are allowed with {cmd:firthlogit}; see {help weight}.
{title:Description}
{pstd}
{cmd:firthlogit} fits logistic regression models by penalized maximum likelihood
estimation. The method was originally proposed to reduce the bias of maximum likelihood
estimates in generalized linear models. It is also useful in logistic regression
when "separation" is problematic.
{title:Options}
{dlgtab:Main}
{phang}
{opt level(#)} sets the confidence level; the default is the prevailing Stata {help level} setting.
{phang}
{opt or} reports odds ratios, that is, exponentiated coefficients, in place of the coefficients.
{phang}
{it:maximize_options}: many of the conventional {cmd:ml} maximization options are available. The most
important is {cmd:constraint()}, which is used in penalized likelihood ratio tests. Options
that {cmd:ml} does not support for {it:d0} evaluators are not available; see {help ml}.
{title:Remarks}
{pstd}
Firth (1993) suggested a modification of the score equations in order to reduce
the bias of maximum likelihood estimates in generalized linear models. Heinze and Schemper (2002)
suggested using Firth's method to overcome the problem of "separation" in logistic regression,
a condition in the data in which one or more maximum likelihood estimates tend to infinity
(become inestimable). With the penalty, the estimates converge to finite values even when
separation is present.
{pstd}
The method penalizes the log likelihood by adding one-half of the logarithm of the determinant of the
information matrix. {cmd:firthlogit} uses {cmd:ml} with a {it:d0} log-likelihood evaluator
program. {it:d0} evaluators rely on numerical derivatives, and so are slower and slightly less
accurate than linear-form ({it:lf}), {it:d1}, or {it:d2} evaluators. Nevertheless,
differences in the standard errors of the estimates between {cmd:firthlogit} and other
software packages are very minor. At least one of those packages uses the {it:unpenalized}
Hessian in its Newton-Raphson algorithm in order to avoid resorting to numerical derivatives.
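{pstd}
The penalty just described can be written compactly. As a sketch, in notation introduced
here only for illustration, let {it:l}({it:b}) be the ordinary log likelihood at the coefficient
vector {it:b}, and let {it:I}({it:b}) be the information matrix. The penalized log likelihood
that is maximized is then{p_end}
{p 8 8 2}
{it:l}*({it:b}) = {it:l}({it:b}) + (1/2) ln |{it:I}({it:b})|{p_end}
{pstd}
where |.| denotes the determinant.{p_end}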
{pstd}
When the method is used to fit logistic models to datasets that give rise to separation, the
affected estimate typically approaches a boundary condition. As a result, the likelihood
profile is often asymmetric, and Wald tests and confidence intervals are
liable to be inaccurate. In these circumstances, Heinze and coworkers recommend using
likelihood ratio tests and profile likelihood confidence intervals in lieu of Wald-based
statistics. Heinze and coworkers calculate the likelihood ratio test statistic differently
from what is conventionally done: instead of omitting the variable of
interest and refitting the reduced model, they constrain the coefficient of interest to zero
and leave the variable in the model so that it still contributes to the penalty. The test statistic
is then twice the difference between the penalized log-likelihood values of the unconstrained
and constrained models, which {cmd:lrtest} computes in a manner directly analogous to a conventional
likelihood ratio test.
{pstd}
The penalization that allows for convergence to finite estimates in conditions of separation
also allows convergence to finite estimates with very sparse data. In these circumstances,
the penalization tends to over-correct for bias.
{pstd}
Users considering this command should also consider the alternatives. These include
another user-written command, {cmd:penlogit} (available for installation by typing
{cmd:search penlogit} at Stata's command line), as well as official Stata's Bayesian
modeling procedures (see {helpb bayes_logistic}) and its command for so-called exact
logistic regression (see {helpb exlogistic}).
{title:Examples}
{phang}{cmd:. webuse hiv1}
{phang}{cmd:. firthlogit hiv cd4 cd8}
{phang}{cmd:. firthlogit, or}
{phang}{cmd:. estimates store Full}
{phang}{cmd:. constraint define 1 cd4}
{phang}{cmd:. firthlogit hiv cd4 cd8, constraint(1)}
{phang}{cmd:. lrtest Full .}
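{pstd}
The statistic reported by {cmd:lrtest} can also be reproduced by hand, continuing from the commands
above. This is only a sketch: it assumes that {cmd:firthlogit}, being built on {cmd:ml}, leaves the
penalized log likelihood in {cmd:e(ll)}. The scalar names are arbitrary.{p_end}
{phang}{cmd:. scalar ll_constrained = e(ll)}{p_end}
{phang}{cmd:. estimates restore Full}{p_end}
{phang}{cmd:. scalar ll_full = e(ll)}{p_end}
{phang}{cmd:. display 2*(ll_full - ll_constrained)}{p_end}
{phang}{cmd:. display chi2tail(1, 2*(ll_full - ll_constrained))}{p_end}
{pstd}
The first {cmd:display} gives twice the difference in penalized log likelihoods, and the second gives
its p-value on one degree of freedom, because a single coefficient is constrained.{p_end}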
{title:References}
{pstd}
Firth, D. 1993. Bias reduction of maximum likelihood estimates. {it:Biometrika} {bf:80}:27{c -}38.
{pstd}
Heinze, G. and Schemper, M. 2002. A solution to the problem of separation in
logistic regression. {it:Statistics in Medicine} {bf:21}:2409{c -}19.
{title:Acknowledgements}
{pstd}
Jeff Pitblado provided a valuable pointer in displaying the results correctly. The command is named
so as to acknowledge David Firth as the source of the method. Note that Professor Firth
is not otherwise associated with or responsible for this command: contact the author (below)
to report bugs or other problems with the command.
{title:Author}
{pstd}
Joseph Coveney jcoveney@bigplanet.com
{title:Also see}
{psee}
Manual: {bf:[BAYES] bayes: logistic}, {bf:[R] exlogistic}
{psee}
Online: {helpb penlogit} (if installed), {helpb bayes_logistic}, {helpb exlogistic}