{smcl}
{* 16jun2025}{...}
{cmd:help mata mm_loclin()}
{hline}
{title:Title}
{p 4 17 2}
{bf:mm_loclin() -- Kernel-weighted local linear smoothing}
{title:Syntax}
{p 8 24 2}
{it:fit} =
{cmd:mm_loclin(}{it:y}{cmd:,} {it:x}{cmd:,} {it:w}{cmd:,} {it:at}{cmd:,} {it:bw}
[{cmd:,} {it:kernel}]{cmd:)}
{p 7 20 2}{bind: }{it:fit}: {it:real vector} containing fitted values{p_end}
{p 7 20 2}{bind: }{it:y}: {it:real colvector} containing dependent variable{p_end}
{p 7 20 2}{bind: }{it:x}: {it:real colvector} containing predictor{p_end}
{p 7 20 2}{bind: }{it:w}: {it:real colvector} containing weights; specify {cmd:1} for unweighted results{p_end}
{p 7 20 2}{bind: }{it:at}: {it:real vector} containing evaluation points{p_end}
{p 7 20 2}{bind: }{it:bw}: {it:real vector} containing bandwidth(s){p_end}
{p 7 20 2}{bind: }{it:kernel}: {it:string scalar} specifying the kernel function; {it:kernel} may be
{cmd:"{ul:e}panechnikov"}, {cmd:"epan2"} (the default), {cmd:"{ul:b}iweight"},
{cmd:"{ul:t}riweight"}, {cmd:"{ul:c}osine"}, {cmd:"{ul:g}aussian"},
{cmd:"{ul:p}arzen"}, {cmd:"{ul:r}ectangle"} or {cmd:"{ul:tria}ngle"};
omitting {it:kernel} or specifying {cmd:""} is equivalent to {cmd:"epan2"}{p_end}
{p 9 24 2}
{it:bw} =
{cmd:mm_loclin_bw(}{it:y}{cmd:,} {it:x}
[{cmd:,} {it:w}{cmd:,} {it:kernel}{cmd:,} {it:fw}]{cmd:)}
{p_end}
{p 9 24 2}
{it:bw} =
{cmd:mm_loclin_bw2(}{it:y}{cmd:,} {it:x}
[{cmd:,} {it:w}{cmd:,} {it:kernel}{cmd:,} {it:fw}]{cmd:)}
{p 7 20 2}{bind: }{it:bw}: {it:real scalar} containing the computed rule-of-thumb bandwidth{p_end}
{p 7 20 2}{bind: }... arguments {it:y}, {it:x}, {it:w}, and {it:kernel} as above {p_end}
{p 7 20 2}{bind: }{it:fw}: {it:real scalar} setting the type of weights; specify
{it:fw}!=0 if {it:w} contains frequency weights{p_end}
{title:Description}
{pstd}
{cmd:mm_loclin()} returns the fitted values of a kernel-weighted local
linear regression of {it:y} on {it:x} evaluated at the points provided by
{it:at}. {cmd:mm_loclin()} uses naive (direct) estimation, that is, a separate
local regression is computed for each evaluation point; for given data,
computing time is therefore proportional to the number of evaluation points.
See the examples below for how to save time by applying {cmd:mm_loclin()} to
data aggregated by {helpb mf_mm_linbin2:mm_linbin2()}.
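{pstd}
The following sketch illustrates what {cmd:mm_loclin()} computes at a single
evaluation point {it:a}: a weighted least-squares regression of {it:y} on
{it:x}-{it:a}, whose intercept is the fitted value at {it:a}. It assumes that
the kernel weights are K(({it:x}-{it:a})/{it:bw}) with the {cmd:"epan2"} kernel
K(u) = 0.75*(1-u^2) for |u| < 1 (zero otherwise); the names {cmd:a}, {cmd:h},
{cmd:u}, {cmd:k}, {cmd:X}, and {cmd:b} are for illustration only. If these
assumptions hold, the two displayed numbers should agree up to rounding error.
. {stata sysuse auto, clear}
. {stata "mata:"}
: {stata y = st_data(., "price")}
: {stata x = st_data(., "turn")}
: {stata a = 40} // evaluation point (arbitrary choice)
: {stata h = 5} // bandwidth (arbitrary choice)
: {stata u = (x :- a) / h}
: {stata k = 0.75 * (1 :- u:^2) :* (abs(u) :< 1)} // assumed epan2 kernel weights
: {stata X = (J(rows(x), 1, 1), (x :- a))} // constant and centered predictor
: {stata b = invsym(cross(X, k, X)) * cross(X, k, y)} // weighted least squares
: {stata b[1]} // intercept = local linear fit at a
: {stata mm_loclin(y, x, 1, a, h)}
: {stata end}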
{pstd}
Argument {it:bw} in {cmd:mm_loclin()} sets the bandwidth (half-width
of the kernel). The larger the bandwidth, the higher the degree of
smoothing. Specify {it:bw} as a scalar to use the same (global) bandwidth
for all evaluation points. Alternatively, specify {it:bw} as a vector
to use a separate bandwidth for each evaluation point; in this case
{it:bw} must have the same length as {it:at}.
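{pstd}
A short sketch of the two ways of specifying {it:bw}, using the auto data; the
evaluation grid and the bandwidth values are arbitrary choices for
illustration. A vector of identical bandwidths should reproduce the result
obtained with the corresponding scalar. The last call uses wider bandwidths at
the ends of the range of {cmd:turn}, where there are few observations.
. {stata sysuse auto, clear}
. {stata "mata:"}
: {stata y = st_data(., "price")}
: {stata x = st_data(., "turn")}
: {stata at = rangen(31, 51, 5)} // 5 evaluation points
: {stata fit1 = mm_loclin(y, x, 1, at, 4)} // global bandwidth
: {stata fit2 = mm_loclin(y, x, 1, at, J(rows(at), 1, 4))} // same bandwidth as a vector
: {stata mreldif(fit1, fit2)} // should be 0
: {stata fit3 = mm_loclin(y, x, 1, at, (8 \ 5 \ 4 \ 5 \ 8))} // one bandwidth per point
: {stata fit1, fit3}
: {stata end}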
{pstd}
{cmd:mm_loclin_bw()} computes a rule-of-thumb bandwidth from the data
in the same way as Stata's {helpb lpoly} command. {cmd:mm_loclin_bw2()} computes
the rule-of-thumb bandwidth using a slightly different procedure that follows
more closely the approach suggested by Fan and Gijbels (1996, page 111).
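{pstd}
For example, the two rule-of-thumb bandwidths can be compared on the auto data
as sketched below. Weights and kernel are passed explicitly so that both
bandwidths refer to unweighted data and the {cmd:"epan2"} kernel; the
evaluation grid is an arbitrary choice for illustration.
. {stata sysuse auto, clear}
. {stata "mata:"}
: {stata y = st_data(., "price")}
: {stata x = st_data(., "turn")}
: {stata bw1 = mm_loclin_bw(y, x, 1, "epan2")} // lpoly-type rule of thumb
: {stata bw2 = mm_loclin_bw2(y, x, 1, "epan2")} // Fan and Gijbels-type rule of thumb
: {stata bw1, bw2}
: {stata at = rangen(31, 51, 11)}
: {stata fit = mm_loclin(y, x, 1, at, bw2, "epan2")}
: {stata at, fit}
: {stata end}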
{title:Examples}
{pstd}
Comparison between {helpb lpoly} and {cmd:mm_loclin()}
. {stata sysuse auto, clear}
. {stata lpoly price turn, degree(1) n(10) generate(at fit) nograph}
. {stata display r(bwidth)}
. {stata list fit at in 1/10}
. {stata "mata:"}
: {stata y = st_data(., "price")}
: {stata x = st_data(., "turn")}
: {stata at = st_data((1,10), "at")}
: {stata bw = mm_loclin_bw(y, x, 1, "epanechnikov")}
: {stata bw}
: {stata fit = mm_loclin(y, x, 1, at, bw, "epanechnikov")}
: {stata fit, at}
: {stata end}
{pstd}
{helpb lpoly} returns a missing value for the
last evaluation point because only a single observation lies within
the range of the kernel at this point. Unlike {helpb lpoly},
{cmd:mm_loclin()} returns a nonmissing result in this case.
{pstd}
Aggregating data using {helpb mf_mm_linbin2:mm_linbin2()}
. {stata "mata:"}
: {stata n = 1e6} // 1 million observations
: {stata y = rnormal(n,1,0,1)}
: {stata x = rnormal(n,1,0,1)}
: {stata at = rangen(-3,3,100)} // 100 evaluation points
: {stata fit = mm_loclin(y, x, 1, at, 0.5)} // slow
: {stata S = mm_linbin2(y, x, 1, 1000)} // aggregate data
: {stata fit2 = mm_loclin(S.y, S.x, S.w, at, 0.5)} // fast
: {stata corr(variance((fit,fit2)))[1,2]} // results are almost identical
: {stata end}
{pstd}
Even though applying {cmd:mm_loclin()} to the aggregated data is much faster,
the overall reduction in computing time is less dramatic (about 70% in the
example above) because aggregating the data also takes time. The savings grow
with the number of evaluation points: the aggregation cost is incurred only
once, whereas each local fit is based on the much smaller aggregated data
instead of all observations.
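{pstd}
The time savings can be checked with Mata's timer functions. The sketch below
repeats the simulation above and times the direct fit (timer 1), the
aggregation step (timer 2), and the fit on the aggregated data (timer 3)
separately; each timed step is placed on a single line so that the time between
clicks is not counted. Timings depend on the machine, so no results are shown.
Rerunning the sketch with more evaluation points, e.g.
{cmd:at = rangen(-3,3,1000)}, should show larger relative savings.
. {stata "mata:"}
: {stata n = 1e6}
: {stata y = rnormal(n,1,0,1)}
: {stata x = rnormal(n,1,0,1)}
: {stata at = rangen(-3,3,100)}
: {stata timer_clear()}
: {stata timer_on(1); fit = mm_loclin(y, x, 1, at, 0.5); timer_off(1)} // direct fit
: {stata timer_on(2); S = mm_linbin2(y, x, 1, 1000); timer_off(2)} // aggregation
: {stata timer_on(3); fit2 = mm_loclin(S.y, S.x, S.w, at, 0.5); timer_off(3)} // fit on aggregated data
: {stata timer()} // report of elapsed times
: {stata end}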
{title:Diagnostics}
{pstd}
The functions return invalid results if {it:y}, {it:x}, or {it:w} contain
missing values.
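{pstd}
A minimal way to guard against this is to drop incomplete observations before
calling the functions, for example using {cmd:rowmissing()} and
{cmd:select()}. The sketch below uses variable {cmd:rep78} from the auto data,
which contains missing values; the choice of variables and evaluation points
is for illustration only.
. {stata sysuse auto, clear}
. {stata "mata:"}
: {stata D = st_data(., "price rep78")}
: {stata D = select(D, rowmissing(D) :== 0)} // keep complete observations only
: {stata y = D[., 1]}
: {stata x = D[., 2]}
: {stata bw = mm_loclin_bw(y, x, 1, "epan2")}
: {stata fit = mm_loclin(y, x, 1, (1::5), bw)}
: {stata fit}
: {stata end}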
{title:References}
{phang} Fan, J., and I. Gijbels (1996). {it:Local Polynomial Modelling and Its
Applications}. Chapman & Hall/CRC.
{title:Source code}
{pstd}
{help moremata14_source##mm_loclin:mm_loclin.mata}
{title:Author}
{pstd}
Ben Jann, University of Bern, ben.jann@unibe.ch
{title:Also see}
{p 4 13 2}
Online: help for
{helpb moremata}, {helpb mf_mm_linbin2:mm_linbin2()}, {helpb lpoly}