{smcl}
{* 23apr2021}{...}
{cmd:help mata mm_ls()}
{hline}

{title:Title}

{p 4 17 2}
{bf:mm_ls() -- Linear (least-squares) regression}


{title:Syntax}

{pstd}
Simple syntax

{p 8 24 2}
{it:b} = {cmd:mm_lsfit(}{it:y} [{cmd:,} {it:X}{cmd:,} {it:w}{cmd:,} {it:cons}{cmd:,} {it:quad}{cmd:,} {it:demean}]{cmd:)}

{p 7 20 2}{bind: }{it:y}: {it:real colvector} containing dependent variable{p_end}
{p 7 20 2}{bind: }{it:X}: {it:real matrix} containing predictors{p_end}
{p 7 20 2}{bind: }{it:w}: {it:real colvector} containing weights; specify {cmd:1} for unweighted results{p_end}
{p 7 20 2}{bind: }{it:cons}: whether to include a constant; specify {cmd:0} to omit the constant{p_end}
{p 7 20 2}{bind: }{it:quad}: whether to use quad precision when computing cross products; specify {cmd:0} to use double precision{p_end}
{p 7 20 2}{bind: }{it:demean}: whether to compute cross products based on mean-deviated data; specify {cmd:0} to omit demeaning{p_end}

{pstd}
Advanced syntax

{pmore}
Setup

{p 12 24 2}
{it:S} = {cmd:mm_ls(}{it:y} [{cmd:,} {it:X}{cmd:,} {it:w}{cmd:,} {it:cons}{cmd:,} {it:quad}{cmd:,} {it:demean}]{cmd:)}

{pmore}
Retrieve results

{p2colset 9 37 39 2}{...}
{p2col:{bind: }{it:b} = {cmd:mm_ls_b(}{it:S}{cmd:)}}coefficient vector (column vector){p_end}
{p2col:{bind: }{it:xb} = {cmd:mm_ls_xb(}{it:S} [{cmd:,} {it:X}]{cmd:)}}linear predictions{p_end}
{p2col:{bind: }{it:s} = {cmd:mm_ls_s(}{it:S}{cmd:)}}scale (root mean squared error){p_end}
{p2col:{bind: }{it:r2} = {cmd:mm_ls_r2(}{it:S}{cmd:)}}R-squared{p_end}
{p2col:{bind: }{it:se} = {cmd:mm_ls_se(}{it:S}{cmd:)}}(non-robust) standard errors{p_end}
{p2col:{bind: }{it:V} = {cmd:mm_ls_V(}{it:S}{cmd:)}}(non-robust) variance matrix{p_end}
{p2col:{bind: }{it:XXinv} = {cmd:mm_ls_XXinv(}{it:S}{cmd:)}}inverse of X'X{p_end}
{p2col:{bind: }{it:RSS} = {cmd:mm_ls_rss(}{it:S}{cmd:)}}residual sum of squares{p_end}
{p2col:{bind: }{it:ymean} = {cmd:mm_ls_ymean(}{it:S}{cmd:)}}mean of y{p_end}
{p2col:{bind: }{it:means} = {cmd:mm_ls_means(}{it:S}{cmd:)}}means of X (row vector){p_end}
{p2col:{bind: }{it:omit} = {cmd:mm_ls_omit(}{it:S}{cmd:)}}column vector flagging omitted terms{p_end}
{p2col:{bind:}{it:k_omit} = {cmd:mm_ls_k_omit(}{it:S}{cmd:)}}number of omitted terms{p_end}
{p2col:{bind: }{it:N} = {cmd:mm_ls_N(}{it:S}{cmd:)}}number of observations (sum of weights){p_end}

{pmore}
{it:S} is a structure holding results and settings; declare {it:S} as {it:transmorphic}.


{title:Description}

{pstd}
{cmd:mm_ls()} fits a linear regression model using the least-squares technique.
Results are equivalent to Stata's {helpb regress}. Speed of {cmd:mm_ls()} is
comparable to {helpb regress} under default settings.

{pstd}
{cmd:mm_ls()} uses quad precision and demeaning (unless the constant is excluded)
when computing X'X and X'y. Specifying {it:quad}=0 and/or {it:demean}=0 will make
{cmd:mm_ls()} faster, but less precise. {it:demean}=0 is potentially more harmful
than {it:quad}=0 (and typically less effective in terms of speed gains). Use
{it:quad}=0 and {it:demean}=0 only if your data is well-behaved (reasonable means,
not much collinearity).

{pstd}
For models without constant ({it:cons}=0), argument {it:demean} has no effect.
This is because mean-deviation formulas are not applicable in this case (meaning
that models without constant will generally be affected by precision issues if
the data is not well-behaved).


{title:Examples}

{pstd}
If you are only interested in the coefficients, you can use {cmd:mm_lsfit()}
(simple syntax) to obtain a quick least-squares fit without much typing:

        . {stata sysuse auto}
        . {stata regress weight length foreign}
        . {stata "mata:"}
        : {stata y = st_data(., "weight")}
        : {stata X = st_data(., "length foreign")}
        : {stata mm_lsfit(y, X)}
        : {stata end}

{pstd}
For more sophisticated applications, use the advanced syntax. Function
{cmd:mm_ls()} defines the problem and performs the main calculations.
You can then use functions such as {cmd:mm_ls_b()} or {cmd:mm_ls_r2()} to obtain
results. The following example illustrates how to obtain coefficients, standard
errors, t values, and the R-squared:

        . {stata "mata:"}
        : {stata S = mm_ls(y, X)}
        : {stata "mm_ls_b(S), mm_ls_se(S), mm_ls_b(S):/mm_ls_se(S)"}
        : {stata mm_ls_r2(S)}
        : {stata end}

{pstd}
The R-squared returned by {cmd:mm_ls_r2()} will always be computed with respect
to a constant-only model, even if {it:cons}=0 has been specified. This is
equivalent to specifying option {cmd:hascons} in {helpb regress} (or option
{cmd:noconstant} together with {cmd:tsscons}):

        . {stata sysuse auto}
        . {stata regress weight length ibn.foreign, hascons}
        . {stata "mata:"}
        : {stata y = st_data(., "weight")}
        : {stata X = st_data(., "length ibn.foreign")}
        : {stata S = mm_ls(y, X, 1, 0)}
        : {stata "mm_ls_b(S), mm_ls_se(S), mm_ls_b(S):/mm_ls_se(S)"}
        : {stata mm_ls_r2(S)}
        : {stata end}


{title:Diagnostics}

{pstd}
The functions return invalid results if {it:y}, {it:X}, or {it:w} contain
missing values.

{pstd}
Coefficients corresponding to omitted (collinear) terms will be set to zero.


{title:Source code}

{pstd}
{help moremata_source##mm_ls:mm_ls.mata}


{title:Author}

{pstd}
Ben Jann, University of Bern, ben.jann@soz.unibe.ch

{pstd}
Thanks to Bill Gould for helpful advice.


{title:Also see}

{p 4 13 2}
Online: help for {helpb moremata}, {helpb regress}