{smcl}
{* 14sep2021}{...}
{cmd:help mata mm_areg()}
{hline}

{title:Title}

{p 4 17 2}
{bf:mm_areg() -- Linear (least-squares) regression with absorbing factor}


{title:Syntax}

{pstd}
Simple syntax

{p 8 24 2}
{it:b} =
{cmd:mm_aregfit(}{it:y}{cmd:,} {it:id} [{cmd:,} {it:X}{cmd:,} {it:w}{cmd:,} {it:sort}{cmd:,} {it:quad}]{cmd:)}

{p 7 20 2}{bind:       }{it:y}:  {it:real colvector} containing dependent variable{p_end}
{p 7 20 2}{bind:      }{it:id}:  {it:real colvector} containing categorical factor to be absorbed{p_end}
{p 7 20 2}{bind:       }{it:X}:  {it:real matrix} containing predictors; may specify {cmd:.} to omit predictors{p_end}
{p 7 20 2}{bind:       }{it:w}:  {it:real colvector} containing weights; specify {cmd:1} for unweighted results{p_end}
{p 7 20 2}{bind:    }{it:sort}:  whether to sort the data; specify {cmd:0} if the data is already sorted{p_end}
{p 7 20 2}{bind:    }{it:quad}:  whether to use quad precision when computing cross products; specify {cmd:0} to use double precision{p_end}

{pstd}
Advanced syntax

{pmore}
Setup

{p 12 24 2}
{it:S} =
{cmd:mm_areg(}{it:y}{cmd:,} {it:id} [{cmd:,} {it:X}{cmd:,} {it:w}{cmd:,} {it:sort}{cmd:,} {it:quad}]{cmd:)}

{pmore}
Retrieve results

{p2colset 9 41 43 2}{...}
{p2col:{bind:       }{it:b} = {cmd:mm_areg_b(}{it:S}{cmd:)}}coefficient vector ({it:beta} \ {it:alpha}){p_end}
{p2col:{bind:    }{it:beta} = {cmd:mm_areg_beta(}{it:S}{cmd:)}}slope coefficients (column vector){p_end}
{p2col:{bind:   }{it:alpha} = {cmd:mm_areg_alpha(}{it:S}{cmd:)}}global intercept{p_end}
{p2col:{bind:      }{it:xb} = {cmd:mm_areg_xb(}{it:S} [{cmd:,} {it:X}]{cmd:)}}fitted values ({it:alpha} + {it:X}*{it:beta}){p_end}
{p2col:{bind:      }{it:ue} = {cmd:mm_areg_ue(}{it:S}{cmd:)}}combined residual ({it:u} + {it:e}){p_end}
{p2col:{bind:     }{it:xbu} = {cmd:mm_areg_xbu(}{it:S}{cmd:)}}prediction including fixed effect ({it:alpha} + {it:X}*{it:beta} + {it:u}){p_end}
{p2col:{bind:       }{it:u} = {cmd:mm_areg_u(}{it:S}{cmd:)}}fixed effect{p_end}
{p2col:{bind:       }{it:e} = {cmd:mm_areg_e(}{it:S}{cmd:)}}idiosyncratic error{p_end}
{p2col:{bind:       }{it:s} = {cmd:mm_areg_s(}{it:S}{cmd:)}}scale (root mean squared error){p_end}
{p2col:{bind:      }{it:r2} = {cmd:mm_areg_r2(}{it:S}{cmd:)}}R-squared{p_end}
{p2col:{bind:      }{it:se} = {cmd:mm_areg_se(}{it:S}{cmd:)}}(non-robust) standard errors{p_end}
{p2col:{bind:       }{it:V} = {cmd:mm_areg_V(}{it:S}{cmd:)}}(non-robust) variance matrix{p_end}
{p2col:{bind:   }{it:XXinv} = {cmd:mm_areg_XXinv(}{it:S}{cmd:)}}inverse of X'X{p_end}
{p2col:{bind:     }{it:RSS} = {cmd:mm_areg_rss(}{it:S}{cmd:)}}residual sum of squares{p_end}
{p2col:{bind:   }{it:ymean} = {cmd:mm_areg_ymean(}{it:S}{cmd:)}}global mean of y{p_end}
{p2col:{bind:   }{it:means} = {cmd:mm_areg_means(}{it:S}{cmd:)}}global means of X (row vector){p_end}
{p2col:{bind:      }{it:yd} = {cmd:mm_areg_yd(}{it:S}{cmd:)}}group-demeaned y{p_end}
{p2col:{bind:      }{it:Xd} = {cmd:mm_areg_Xd(}{it:S}{cmd:)}}group-demeaned X (row vector){p_end}
{p2col:{bind:    }{it:omit} = {cmd:mm_areg_omit(}{it:S}{cmd:)}}column vector flagging omitted terms{p_end}
{p2col:{bind:  }{it:k_omit} = {cmd:mm_areg_k_omit(}{it:S}{cmd:)}}number of omitted terms{p_end}
{p2col:{bind:       }{it:N} = {cmd:mm_areg_N(}{it:S}{cmd:)}}number of observations (sum of weights){p_end}
{p2col:{bind:  }{it:levels} = {cmd:mm_areg_levels(}{it:S}{cmd:)}}levels (values of groups) in {it:id} {p_end}
{p2col:{bind:}{it:k_levels} = {cmd:mm_areg_k_levels(}{it:S}{cmd:)}}number of levels (groups) in {it:id}{p_end}
{p2col:{bind:       }{it:n} = {cmd:mm_areg_n(}{it:S}{cmd:)}}number of observations per group (unweighted){p_end}

{pmore}
{it:S} is a structure holding results and settings; declare {it:S} as {it:transmorphic}.


{title:Description}

{pstd}
{cmd:mm_areg()} fits a linear regression model with an absorbing categorical
factor using the least-squares technique. Results are equivalent to Stata's 
{helpb areg} or {helpb xtreg:xtreg,fe}.

{pstd}
{cmd:mm_areg()} uses quad precision when computing X'X and X'y. Specifying {it:quad}=0
will make {cmd:mm_areg()} faster, but less precise. Use {it:quad}=0 only if your data 
is well-behaved (reasonable means, not much collinearity).


{title:Examples}

{pstd}
If you are only interested in the coefficients, you can use
{cmd:mm_aregfit()} (simple syntax) to obtain a fit without much typing:

        . {stata sysuse auto, clear}
        . {stata areg price weight length, absorb(headroom)}
        . {stata "mata:"}
        : {stata y = st_data(., "price")}
        : {stata X = st_data(., "weight length")}
        : {stata id = st_data(., "headroom")}
        : {stata mm_aregfit(y, id, X)}
        : {stata end}

{pstd}
For more sophisticated applications, use the advanced syntax. Function
{cmd:mm_areg()} defines the problem and performs the main calculations. You can then
use functions such as {cmd:mm_areg_b()} or {cmd:mm_areg_r2()} to obtain results. The following
example illustrates how to obtain coefficients, standard errors, t values,
and the R-squared:

        . {stata "mata:"}
        : {stata S = mm_areg(y, id, X)}
        : {stata "mm_areg_b(S), mm_areg_se(S), mm_areg_b(S):/mm_areg_se(S)"}
        : {stata mm_areg_r2(S)}
        : {stata end}


{title:Two-step syntax}

{pstd}
    If several absorbing regressions are to be estimated using the same
    sample, computer time can be saved by analyzing {it:id} upfront
    and then pass the collected information through to the different regressions. The
    syntax for such a two-step procedure is

        {it:G} = {cmd:_mm_areg_g(}{it:id}{cmd:,} {it:sort}{cmd:)}
        {it:S} = {cmd:_mm_areg(}{it:G}{cmd:,} {it:y} [{cmd:,} {it:X}{cmd:,} {it:w}{cmd:,} {it:quad}]{cmd:)}

{pstd}
    where {it:G} is a structure holding the information collected from {it:id}
    (declare {it:G} as {it:transmorphic}). Multiple calls to {cmd:_mm_areg()}
    can follow, with different {it:y}, {it:X}, or {it:w} (the number of observations
    must stay the same).


{title:Diagnostics}

{pstd}
The functions return invalid results if {it:y}, {it:X}, or {it:w} contain
missing values.

{pstd}
Coefficients corresponding to omitted (collinear) terms will be set to zero.


{title:Source code}

{pstd}
{help moremata_source##mm_areg:mm_areg.mata}


{title:Author}

{pstd}
Ben Jann, University of Bern, ben.jann@unibe.ch


{title:Also see}

{p 4 13 2}
Online:  help for
{helpb moremata}, {helpb mf_mm_ls:mm_ls()}, {helpb areg}, {helpb xtreg}, {helpb regress}