{smcl}
{* 16jun2025}{...}
{cmd:help mata mm_loclin()}
{hline}

{title:Title}

{p 4 17 2}
{bf:mm_loclin() -- Kernel-weighted local linear smoothing}


{title:Syntax}

{p 8 24 2}
{it:fit} =
{cmd:mm_loclin(}{it:y}{cmd:,} {it:x}{cmd:,} {it:w}{cmd:,} {it:at}{cmd:,} {it:bw} [{cmd:,} {it:kernel}]{cmd:)}

{p 7 20 2}{bind: }{it:fit}: {it:real vector} containing fitted values{p_end}
{p 7 20 2}{bind: }{it:y}: {it:real colvector} containing dependent variable{p_end}
{p 7 20 2}{bind: }{it:x}: {it:real colvector} containing predictor{p_end}
{p 7 20 2}{bind: }{it:w}: {it:real colvector} containing weights; specify {cmd:1} for unweighted results{p_end}
{p 7 20 2}{bind: }{it:at}: {it:real vector} containing evaluation points{p_end}
{p 7 20 2}{bind: }{it:bw}: {it:real vector} containing bandwidth(s){p_end}
{p 7 20 2}{bind: }{it:kernel}: {it:string scalar} specifying the kernel function;
{it:kernel} may be
{cmd:"{ul:e}panechnikov"}, {cmd:"epan2"} (the default), {cmd:"{ul:b}iweight"},
{cmd:"{ul:t}riweight"}, {cmd:"{ul:c}osine"}, {cmd:"{ul:g}aussian"},
{cmd:"{ul:p}arzen"}, {cmd:"{ul:r}ectangle"} or {cmd:"{ul:tria}ngle"};
omitting {it:kernel} or specifying {cmd:""} is equivalent to {cmd:"epan2"}{p_end}

{p 9 24 2}
{it:bw} =
{cmd:mm_loclin_bw(}{it:y}{cmd:,} {it:x} [{cmd:,} {it:w} {cmd:,} {it:kernel}{cmd:,} {it:fw}]{cmd:)}
{p_end}

{p 9 24 2}
{it:bw} =
{cmd:mm_loclin_bw2(}{it:y}{cmd:,} {it:x} [{cmd:,} {it:w} {cmd:,} {it:kernel}{cmd:,} {it:fw}]{cmd:)}

{p 7 20 2}{bind: }{it:bw}: {it:real scalar} containing the computed rule-of-thumb bandwidth{p_end}
{p 7 20 2}{bind: }... arguments {it:y}, {it:x}, {it:w}, and {it:kernel} as above {p_end}
{p 7 20 2}{bind: }{it:fw}: {it:real scalar} setting the type of weights; specify {it:fw}!=0 for frequency weights{p_end}


{title:Description}

{pstd}
{cmd:mm_loclin()} returns the fitted values of a kernel-weighted local linear
regression of {it:y} on {it:x} evaluated at the points provided by {it:at}.
{cmd:mm_loclin()} uses naive (direct) estimation; this means that computing
time is proportional to the number of evaluation points. See the examples below
for how to save time by applying {cmd:mm_loclin()} to aggregated data from
{helpb mf_mm_linbin2:mm_linbin2()}.

{pstd}
Argument {it:bw} in {cmd:mm_loclin()} sets the bandwidth (half-width of the
kernel). The larger the bandwidth, the higher the degree of smoothing. Specify
{it:bw} as a scalar to use the same (global) bandwidth for all evaluation
points. Alternatively, specify {it:bw} as a vector to use a separate bandwidth
for each evaluation point; in this case {it:bw} must have the same length as
{it:at}.

{pstd}
{cmd:mm_loclin_bw()} computes a rule-of-thumb bandwidth from the data in the
same way as Stata's {helpb lpoly} command. {cmd:mm_loclin_bw2()} computes the
rule-of-thumb bandwidth in a somewhat different way that is more in line with
the procedure suggested by Fan and Gijbels (1996, page 111).


{title:Examples}

{pstd}
Comparison between {helpb lpoly} and {cmd:mm_loclin()}

        . {stata sysuse auto, clear}
        . {stata lpoly price turn, degree(1) n(10) generate(at fit) nograph}
        . {stata display r(bwidth)}
        . {stata list fit at in 1/10}
        . {stata "mata:"}
        : {stata y = st_data(., "price")}
        : {stata x = st_data(., "turn")}
        : {stata at = st_data((1,10), "at")}
        : {stata bw = mm_loclin_bw(y, x, 1, "epanechnikov")}
        : {stata bw}
        : {stata fit = mm_loclin(y, x, 1, at, bw, "epanechnikov")}
        : {stata fit, at}
        : {stata end}

{pstd}
{helpb lpoly} returns missing for the last evaluation point because there is
only a single observation within the range of the kernel at this point. Unlike
{helpb lpoly}, {cmd:mm_loclin()} returns a valid result in this case.

{pstd}
Aggregating data using {helpb mf_mm_linbin2:mm_linbin2()}

        .
          {stata "mata:"}
        : {stata n = 1e6} // 1 million observations
        : {stata y = rnormal(n,1,0,1)}
        : {stata x = rnormal(n,1,0,1)}
        : {stata at = rangen(-3,3,100)} // 100 evaluation points
        : {stata fit = mm_loclin(y, x, 1, at, 0.5)} // slow
        : {stata S = mm_linbin2(y, x, 1, 1000)} // aggregate data
        : {stata fit2 = mm_loclin(S.y, S.x, S.w, at, 0.5)} // fast
        : {stata corr(variance((fit,fit2)))[1,2]} // result is almost the same
        : {stata end}

{pstd}
Although applying {cmd:mm_loclin()} to aggregated data is much faster, the
overall reduction in computing time is not dramatic (about 70% in the above
example) because aggregating the data also takes time. How much time can be
saved depends on the number of evaluation points: the more evaluation points,
the larger the saving.


{title:Diagnostics}

{pstd}
The functions return invalid results if {it:y}, {it:x}, or {it:w} contain
missing values.


{title:References}

{phang}
Fan, J., I. Gijbels (1996). Local Polynomial Modelling and Its
Applications. Chapman & Hall/CRC.


{title:Source code}

{pstd}
{help moremata14_source##mm_loclin:mm_loclin.mata}


{title:Author}

{pstd}
Ben Jann, University of Bern, ben.jann@unibe.ch


{title:Also see}

{p 4 13 2}
Online: help for
{helpb moremata},
{helpb mf_mm_linbin2:mm_linbin2()},
{helpb lpoly}