{smcl}
{* 27may2025}{...}
{cmd:help mata mm_linbin2()}
{hline}
{title:Title}
{p 4 4 2}
{bf:mm_linbin2() -- Aggregate data by linear binning}
{title:Syntax}
{p 8 24 2}
{it:S} =
{cmd:mm_linbin2(}{it:y}{cmd:,} {it:x}{cmd:,} {it:w}{cmd:,} {it:n} [{cmd:,} {it:e}]{cmd:)}
{p 7 18 2}{bind: }{it:y}: {it:real matrix} containing data to be aggregated{p_end}
{p 7 18 2}{bind: }{it:x}: {it:real colvector} containing binning variable{p_end}
{p 7 18 2}{bind: }{it:w}: {it:real colvector} containing weights; specify {cmd:1} for unweighted results{p_end}
{p 7 18 2}{bind: }{it:n}: {it:real scalar} specifying the grid size (number of grid points){p_end}
{p 7 18 2}{bind: }{it:e}: {it:real scalar} specifying amount of padding to be added to the grid
range or {it:real vector} specifying the minimum and maximum of the grid range;
if {it:e} is a scalar, the grid range is determined as
[{cmd:min(}{it:x}{cmd:)}-{it:e}, {cmd:max(}{it:x}{cmd:)}+{it:e}];
if {it:e} is a vector, the grid range is determined as
[{it:e}{cmd:[1]}, {it:e}{cmd:[2]}];
the default is {it:e} = {cmd:0}
{p 8 8 2}
{it:S} is a structure containing the following elements:
{p 7 18 2}{bind: }{it:S}{cmd:.y}: {it:real matrix} containing aggregated data (average of {it:y} at each grid point){p_end}
{p 7 18 2}{bind: }{it:S}{cmd:.x}: {it:real colvector} containing grid points (discretized {it:x}){p_end}
{p 7 18 2}{bind: }{it:S}{cmd:.w}: {it:real colvector} containing aggregated counts (sum of weights at each grid point){p_end}
{p 7 18 2}{bind: }{it:S}{cmd:.n}: {it:real scalar} containing grid size (number of grid points){p_end}
{p 7 18 2}{bind: }{it:S}{cmd:.d}: {it:real scalar} containing step size between grid points{p_end}
{p 8 8 2}
Within functions, declare {it:S} as
{p 12 12 2}
{cmd:struct mm_linbin2_struct scalar} {it:S}
{title:Description}
{pstd}
{cmd:mm_linbin2()} aggregates data using linear binning by a regular
grid. Let {it:d} be the step size between grid points and let {it:a} and {it:b} be the
two nearest grid points below and above an observation {it:x}. Then
{it:h_a} = {it:w} * ({it:b}-{it:x})/{it:d} is added to the count of point {it:a}
and {it:h_b} = {it:w} * ({it:x}-{it:a})/{it:d} is added to the count of point
{it:b}, where {it:w} is the weight associated with {it:x}. Likewise, {it:h_a}/{it:H_a} * {it:y}
is added to the aggregated data at point {it:a} and {it:h_b}/{it:H_b} * {it:y}
is added to the aggregated data at point {it:b}, where {it:H_a} ({it:H_b})
is the total count of point {it:a} ({it:b}). Note that the aggregated data
will be set to zero (rather than missing) if the total count of a point is
zero.
{title:Example}
. {stata "mata:"}
: {stata y = rnormal(100,2,0,1)}
: {stata x = runiform(100,1)}
: {stata S = mm_linbin2(y, x, 1, 10)} // grid from min(x) to max(x)
: {stata S.y, S.x, S.w}
: {stata S = mm_linbin2(y, x, 1, 10, (0,1))} // grid from 0 to 1
: {stata S.y, S.x, S.w}
: {stata S = mm_linbin2(J(0,0,.), x, 1, 10, (0,1))} // obtain count only
: {stata S.y, S.x, S.w}
: {stata end}
{title:Source code}
{pstd}
{help moremata_source##mm_linbin2:mm_linbin2.mata}
{title:Author}
{pstd}
Ben Jann, University of Bern, ben.jann@unibe.ch
{title:Also see}
{p 4 13 2}
Online: help for
{helpb moremata}, {helpb mf_mm_linbin:mm_linbin()}