{smcl}
{* 27may2025}{...}
{cmd:help mata mm_linbin2()}
{hline}

{title:Title}

{p 4 4 2}
{bf:mm_linbin2() -- Aggregate data by linear binning}

{title:Syntax}

{p 8 24 2}
{it:S} =
{cmd:mm_linbin2(}{it:y}{cmd:,} {it:x}{cmd:,} {it:w}{cmd:,} {it:n} [{cmd:,} {it:e}]{cmd:)}

{p 7 18 2}{bind:     }{it:y}:  {it:real matrix} containing data to be aggregated{p_end}
{p 7 18 2}{bind:     }{it:x}:  {it:real colvector} containing binning variable{p_end}
{p 7 18 2}{bind:     }{it:w}:  {it:real colvector} containing weights; specify {cmd:1} for unweighted results{p_end}
{p 7 18 2}{bind:     }{it:n}:  {it:real scalar} specifying the grid size (number of grid points){p_end}
{p 7 18 2}{bind:     }{it:e}:  {it:real scalar} specifying the amount of padding to be added to the grid
    range or {it:real vector} specifying the minimum and maximum of the grid range;
    if {it:e} is a scalar, the grid range is determined as
    [{cmd:min(}{it:x}{cmd:)}-{it:e}, {cmd:max(}{it:x}{cmd:)}+{it:e}];
    if {it:e} is a vector, the grid range is determined as
    [{it:e}{cmd:[1]}, {it:e}{cmd:[2]}];
    the default is {it:e} = {cmd:0}{p_end}

{p 8 8 2}
{it:S} is a structure containing the following elements:

{p 7 18 2}{bind:   }{it:S}{cmd:.y}:  {it:real matrix} containing aggregated data (average of {it:y} at each grid point){p_end}
{p 7 18 2}{bind:   }{it:S}{cmd:.x}:  {it:real colvector} containing grid points (discretized {it:x}){p_end}
{p 7 18 2}{bind:   }{it:S}{cmd:.w}:  {it:real colvector} containing aggregated counts (sum of weights at each grid point){p_end}
{p 7 18 2}{bind:   }{it:S}{cmd:.n}:  {it:real scalar} containing grid size (number of grid points){p_end}
{p 7 18 2}{bind:   }{it:S}{cmd:.d}:  {it:real scalar} containing step size between grid points{p_end}

{p 8 8 2}
Within functions, declare {it:S} as

{p 12 12 2}
    {cmd:struct mm_linbin2_struct scalar} {it:S}
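
{p 8 8 2}
As a minimal sketch of such a declaration, the following function returns
the binned counts only. The function name {cmd:mybincount()} is hypothetical
and not part of moremata; the call with an empty {it:y} follows the
count-only usage shown in the example below.

            real colvector mybincount(real colvector x, real scalar n)
            {c -(}
                struct mm_linbin2_struct scalar S   // holds the binning results

                S = mm_linbin2(J(0,0,.), x, 1, n)   // empty y: obtain counts only
                return(S.w)                         // sum of weights at each grid point
            {c )-}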


{title:Description}

{pstd}
    {cmd:mm_linbin2()} aggregates data by linear binning on a regular
    grid. Let {it:d} be the step size between grid points and let {it:a} and {it:b} be the
    two nearest grid points below and above an observation {it:x}. Then
    {it:h_a} = {it:w} * ({it:b}-{it:x})/{it:d} is added to the count of point {it:a}
    and {it:h_b} = {it:w} * ({it:x}-{it:a})/{it:d} is added to the count of point
    {it:b}, where {it:w} is the weight associated with {it:x}. Likewise, {it:h_a}/{it:H_a} * {it:y}
    is added to the aggregated data at point {it:a} and {it:h_b}/{it:H_b} * {it:y}
    is added to the aggregated data at point {it:b}, where {it:H_a} ({it:H_b})
    is the total count of point {it:a} ({it:b}). The aggregated data
    are set to zero (rather than missing) at grid points whose total count is
    zero.
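
{pstd}
    To illustrate with assumed numbers: for grid points 0, 0.5, and 1
    ({it:d} = 0.5), an observation {it:x} = 0.2 with {it:w} = 1 lies between
    {it:a} = 0 and {it:b} = 0.5, so that {it:h_a} = (0.5-0.2)/0.5 = 0.6 and
    {it:h_b} = (0.2-0)/0.5 = 0.4. The following Mata lines are a minimal sketch
    of these count updates, not the code used by {cmd:mm_linbin2()}. All
    variable names are illustrative, and the sketch assumes an equally spaced
    grid with every observation inside the grid range.

        x  = (0.2 \ 0.5 \ 0.9)               // data
        w  = J(3, 1, 1)                      // unit weights
        g  = (0 \ 0.5 \ 1)                   // assumed grid points
        n  = rows(g)                         // grid size
        d  = g[2] - g[1]                     // step size
        a  = floor((x :- g[1]) / d) :+ 1     // index of the lower grid point a
        a  = a :- (a :== n)                  // x at the upper bound: use the last interval
        ha = w :* (g[a :+ 1] - x) / d        // part of the weight added to point a
        hb = w :* (x - g[a]) / d             // part of the weight added to point b
        H  = (J(1, n, a) :== (1..n))' * ha + (J(1, n, a :+ 1) :== (1..n))' * hb
        H                                    // total count at each grid point

{pstd}
    With these numbers the counts come out as 0.6, 1.6, and 0.8, which sum to
    the total weight of 3 because each observation's weight is split between
    its two neighboring grid points.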


{title:Example}

    . {stata "mata:"}
    : {stata y = rnormal(100,2,0,1)}
    : {stata x = runiform(100,1)}
    : {stata S = mm_linbin2(y, x, 1, 10)}  // grid from min(x) to max(x)
    : {stata S.y, S.x, S.w}
    : {stata S = mm_linbin2(y, x, 1, 10, (0,1))}  // grid from 0 to 1
    : {stata S.y, S.x, S.w}
    : {stata S = mm_linbin2(J(0,0,.), x, 1, 10, (0,1))}  // obtain count only
    : {stata S.y, S.x, S.w}
    : {stata end}
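
{pstd}
    As a plausibility check that follows from the description above (each
    observation's weight is split between its two neighboring grid points),
    the counts should add up to the total weight, here 100, up to rounding
    error. This assumes that all observations lie within the grid range, as
    is the case in the calls above.

    . {stata "mata: sum(S.w)"}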


{title:Source code}

{pstd}
    {help moremata_source##mm_linbin2:mm_linbin2.mata}


{title:Author}

{pstd}
    Ben Jann, University of Bern, ben.jann@unibe.ch


{title:Also see}

{p 4 13 2}
Online:  help for
{helpb moremata}, {helpb mf_mm_linbin:mm_linbin()}