help firthlogit                                          Version 1.0 2008-07-11

-------------------------------------------------------------------------------

Title

firthlogit -- Penalized maximum likelihood logistic regression

Syntax

firthlogit depvar [indepvars] [if] [in] [weight] [, options]

options               Description
-------------------------------------------------------------------------
Main
  level(#)            set confidence level; default is prevailing
                        setting (see creturn)
  or                  report odds ratios
  maximize_options    maximization options
-------------------------------------------------------------------------
by may be used with firthlogit; see by.
fweights are allowed with firthlogit; see weight.

Description

firthlogit fits logistic models by penalized maximum likelihood. The method was originally proposed to reduce bias in maximum likelihood estimates in generalized linear models. It is also useful in logistic regression when "separation" is problematic.

Options

Main

level(#) sets the confidence level; the default is the prevailing Stata level setting.

or reports the estimated coefficients as odds ratios, that is, exp(b) rather than b.

maximize_options many of the conventional ml options are available. The most important is constraint(), which is used in penalized likelihood-ratio tests. Options that are not available with d0 estimators are not available with firthlogit; see ml.

Remarks

Firth (1993) suggested a modification of the score equations to reduce the bias of maximum likelihood estimates in generalized linear models. Heinze and Schemper (2002) suggested using Firth's method to overcome the problem of "separation" in logistic regression. Separation is a condition in the data in which a covariate, or a linear combination of covariates, perfectly predicts the outcome, so that at least one maximum likelihood estimate tends to infinity (becomes inestimable). The method allows convergence to finite estimates in cases of separation in logistic regression.
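
The contrast can be seen on a toy dataset. The sketch below is only illustrative: the eight observations are made up for this purpose, and it assumes firthlogit is already installed. The variables y and x are hypothetical names.

```stata
* Made-up data with separation: y is always 1 whenever x == 1.
clear
input y x
0 0
0 0
0 0
1 0
1 0
1 1
1 1
1 1
end

* Ordinary maximum likelihood: the coefficient on x has no finite
* estimate, so logit is expected to note the perfect prediction
* rather than report a coefficient for x.
capture noisily logit y x

* Penalized maximum likelihood: the fit is expected to converge
* to a finite coefficient on x.
firthlogit y x
```

Comparing the two sets of output shows where the unpenalized fit gives up on x and the penalized fit still reports an estimate and a standard error.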

The method penalizes the log likelihood with one-half of the logarithm of the determinant of the information matrix. firthlogit uses ml with a d0 log-likelihood evaluator program. d0 evaluators use numerical derivatives, and so are slower and slightly less accurate than the linear-form (lf), d1, or d2 evaluator types. Nevertheless, differences in the standard errors of the estimates between firthlogit and other software packages are very minor. At least one of those packages uses the unpenalized Hessian in the Newton-Raphson algorithm to avoid resorting to numerical derivatives there.
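
Written out for the logistic model, the penalized log likelihood takes the form below. The notation is introduced here for illustration and does not come from the command's source: X is the design matrix with rows x_i, beta is the coefficient vector, l(beta) is the ordinary log likelihood, and I(beta) is the Fisher information matrix.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\det I(\beta),
\qquad
I(\beta) \;=\; X^{\top} W(\beta)\, X,
\qquad
W(\beta) \;=\; \operatorname{diag}\bigl\{\pi_i\,(1-\pi_i)\bigr\},
\qquad
\pi_i \;=\; \frac{1}{1+\exp\!\left(-x_i^{\top}\beta\right)}
```

Because the determinant of I(beta) shrinks toward zero as any fitted probability approaches 0 or 1, the penalty term pulls the maximum away from infinite coefficient values.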

When the method is used to fit logistic models in datasets giving rise to separation, the affected estimate is typically approaching a boundary condition. As a result, the likelihood profile is often asymmetric, and Wald tests and confidence intervals are liable to be inaccurate. In these circumstances, Heinze and coworkers recommend using likelihood-ratio tests and profile-likelihood confidence intervals in lieu of Wald-based statistics. Heinze and coworkers calculate the likelihood-ratio test statistic differently from what is conventionally done: instead of omitting the variable of interest and refitting the reduced model, the coefficient of interest is constrained to zero and the variable is left in the model so that it still contributes to the penalization. The test statistic is then computed by lrtest as twice the difference between the penalized log-likelihood values of the unconstrained and constrained models, in a manner directly analogous to that of conventional likelihood-ratio tests.
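
The arithmetic behind that test can be sketched by hand. This assumes that firthlogit, being built on ml, leaves the penalized log likelihood in e(ll), which is the value lrtest reads; the scalar names ll_full, ll_cons, and lr are hypothetical.

```stata
webuse hiv1, clear

* Unconstrained model: keep its penalized log likelihood.
firthlogit hiv cd4 cd8
scalar ll_full = e(ll)

* Constrained model: cd4 stays in the model with its coefficient
* fixed at zero, so it still contributes to the penalty.
constraint define 1 cd4
firthlogit hiv cd4 cd8, constraint(1)
scalar ll_cons = e(ll)

* Twice the difference, referred to chi-squared with 1 df
* (one constraint).
scalar lr = 2*(ll_full - ll_cons)
display "LR chi2(1) = " %6.3f lr "   Prob > chi2 = " %6.4f chi2tail(1, lr)
```

In practice lrtest does this bookkeeping itself; the hand calculation is only meant to show which two log-likelihood values enter the statistic.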

The penalization that allows for convergence to finite estimates in conditions of separation also allows convergence to finite estimates with very sparse data. In these circumstances, the penalization tends to over-correct for bias.

Examples

. webuse hiv1

. firthlogit hiv cd4 cd8

. firthlogit, or

. estimates store Full

. constraint define 1 cd4

. firthlogit hiv cd4 cd8, constraint(1)

. lrtest Full .

References

Firth, D. 1993. Bias reduction of maximum likelihood estimates. Biometrika 80:27-38.

Heinze, G., and M. Schemper. 2002. A solution to the problem of separation in logistic regression. Statistics in Medicine 21:2409-2419.

Acknowledgements

Jeff Pitblado provided a valuable pointer in displaying the results correctly. The command is named so as to acknowledge David Firth as the source of the method. Note that Professor Firth is not otherwise associated with or responsible for this command: contact the author (below) to report bugs or other problems with the command.

Author

Joseph Coveney jcoveney@bigplanet.com

Also see

Manual: [R] exlogistic