{smcl}
{* *! version 1.0 18 Oct 2020}{...}
{viewerjumpto "Syntax" "reg2logit##syntax"}{...}
{viewerjumpto "Description" "reg2logit##description"}{...}
{viewerjumpto "Examples" "reg2logit##examples"}{...}
{title:Title}
{phang}
{bf:reg2logit} {hline 2} Approximate logistic regression parameters using OLS linear regression
{marker syntax}{...}
{title:Syntax}
{p 8 17 2}
{cmdab:reg2logit}
{it:yvar} [{it:xvars}] {ifin}
[{cmd:,} {it:options}]
{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Optional}
{synopt:{opt iter:ate(#)}}number of maximum-likelihood iterations to perform after transforming the OLS parameter estimates; default is {cmd:iterate(0)}{p_end}
{synoptline}
{p2colreset}{...}
{marker description}{...}
{title:Description}
{pstd}
{cmd:reg2logit} estimates the parameters of a logistic regression of {it:yvar} on {it:xvars} by transforming OLS estimates of the linear regression of {it:yvar} on {it:xvars}.
Factor {it:xvars} are allowed.
The transformation formula, first derived by Haggstrom (1983), is discussed by Allison (2020). {p_end}
{pstd}The transformed OLS estimates are fully efficient estimates of the logistic regression under the assumption that the {it:xvars} are multivariate normal conditionally on the value of the {it:yvar}.
If the {it:xvars} are in fact conditionally multivariate normal, then the estimates produced by {cmd:reg2logit} are more efficient than the "distribution-free" estimates produced by the {cmd:logit} command,
which assume nothing about the distribution of the {it:xvars}.
If the {it:xvars} are not conditionally multivariate normal, then {cmd:reg2logit} may be more or less efficient than {cmd:logit}, depending on how much the {it:xvars} depart from conditional multivariate normality.
Even when they are less efficient than the {cmd:logit} estimates, the {cmd:reg2logit} estimates are often similar, and they have the advantage of being computed quickly, without iteration.
In simulations across a variety of conditions, {cmd:reg2logit} produced predicted probabilities very similar to those produced by {cmd:logit}, except when the model included strong interactions (Allison 2020).{p_end}
{pstd}By default, {cmd:reg2logit} returns the transformed OLS coefficients.
If you set the {cmd:iter()} option to a value greater than zero, {cmd:reg2logit} iterates toward the same distribution-free maximum likelihood estimates produced by {cmd:logit}.{p_end}
{pstd}After {cmd:reg2logit}, you can run {cmd:predict} to get predicted probabilities, just as you can after {cmd:logit}.{p_end}
{marker examples}{...}
{title:Examples}
{phang2}{cmd:sysuse auto, clear}{p_end}
{pstd}Approximate logistic regression coefficients from the OLS estimates:{p_end}
{phang2}{cmd:reg2logit foreign weight price}{p_end}
{pstd}Iterate toward maximum likelihood estimates of the logistic regression coefficients...{p_end}
{phang2}{cmd:reg2logit foreign weight price, iter(200)}{p_end}
{pstd}...which are not terribly different in this example.{p_end}
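{pstd}As described above, after either command you can use {cmd:predict} to obtain predicted probabilities, just as after {cmd:logit}. A minimal sketch (the variable name {cmd:phat} is arbitrary):{p_end}
{phang2}{cmd:predict phat}{p_end}
{phang2}{cmd:summarize phat}{p_end}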
{title:Applications}
{pstd}{cmd:reg2logit} has several applications.{p_end}
{pstd}1. In some settings -- e.g., with big data or many correlated {it:xvars} -- iterative commands like {cmd:logit} can be slow to reach maximum likelihood estimates (Minka 2003; Ji & Telgarsky 2018).
{cmd:reg2logit} produces estimates quickly, without iteration. The estimates are often serviceable, often quite close to the maximum likelihood estimates,
and they are fully efficient if the {it:xvars} are conditionally multivariate normal.{p_end}
{pstd}2. If you set the {cmd:iter()} option to a value greater than zero, the transformed OLS estimates provide a plausible starting point for iteration toward maximum likelihood estimates.
Yet surprisingly, convergence is not necessarily faster than it is with the {cmd:logit} command, which starts with slope estimates of zero.
{p_end}
{pstd}3. In some settings, predicted probabilities must be obtained from a linear probability model. {cmd:reg2logit} followed by {cmd:predict} provides the best way of doing this (Allison 2020).
{p_end}
{pstd}4. One application for these predicted probabilities is to impute dummy variables after fitting a multivariate normal imputation model. This is implemented by our {cmd:mi_impute_genmod} command,
which you can install using {cmd:ssc install mi_impute_genmod}.
{p_end}
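{pstd}Regarding application 1: one rough way to compare speed on your own data is to time {cmd:reg2logit} against {cmd:logit} using Stata's {cmd:timer} commands. This sketch reuses the {cmd:auto} example above:{p_end}
{phang2}{cmd:timer clear}{p_end}
{phang2}{cmd:timer on 1}{p_end}
{phang2}{cmd:reg2logit foreign weight price}{p_end}
{phang2}{cmd:timer off 1}{p_end}
{phang2}{cmd:timer on 2}{p_end}
{phang2}{cmd:logit foreign weight price}{p_end}
{phang2}{cmd:timer off 2}{p_end}
{phang2}{cmd:timer list}{p_end}
{pstd}On a dataset this small both commands are nearly instantaneous; the difference matters mainly with big data or many correlated {it:xvars}.{p_end}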
{title:Technical notes}
{pstd}When run with {cmd:iter()} at the default of 0, {cmd:reg2logit} displays the message "Warning: Convergence not achieved."
You can generally ignore this warning; it merely indicates that no iterations were performed.{p_end}
{pstd}When run with {cmd:iter()} at the default of 0, {cmd:reg2logit} does not require {it:yvar} to be binary 0/1.
This laxity can be helpful in some problems, such as imputation problems where values of {it:yvar} other than 0 and 1 have previously been imputed by a normal model.
{p_end}
{title:References}
{pstd}
Allison, P.D. (2020, April 24). Better predicted probabilities from linear probability models. Statistical Horizons blog. https://statisticalhorizons.com/better-predicted-probabilities
{p_end}
{pstd}
Haggstrom, G. W. (1983). Logistic regression and discriminant analysis by ordinary least squares. Journal of Business & Economic Statistics, 1(3), 229-238.
{p_end}
{pstd}Ji, Z., & Telgarsky, M. (2018). Risk and parameter convergence of logistic regression. arXiv preprint arXiv:1803.07300.{p_end}
{pstd}Minka, T. P. (2003). A comparison of numerical optimizers for logistic regression. Unpublished manuscript, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.7017&rep=rep1&type=pdf.{p_end}
{title:Authors}
{pstd}Paul von Hippel{p_end}
{pstd}Rich Williams{p_end}
{pstd}Paul Allison{p_end}