help mata mm_linbin()
-------------------------------------------------------------------------------

Title

mm_linbin() -- Linear binning

Syntax

real colvector mm_linbin(x, w, g)

real colvector mm_fastlinbin(x, w, g)

real colvector mm_exactbin(x, w, g [, dir, include])

real colvector mm_makegrid(x [, m, e, min, max])

where

x: real colvector containing data points

w: real colvector containing weights

g: real colvector containing grid points

dir: real scalar indicating the direction of the intervals (default: right open)

include: real scalar indicating that data outside the grid range be included in the first and last bins

m: real scalar specifying the number of equally spaced grid points (default: 512)

e: real scalar specifying the amount by which the grid range is extended at each end

min: real scalar specifying the minimum grid value (default: min(x) - e)

max: real scalar specifying the maximum grid value (default: max(x) + e)

Description

mm_linbin() returns linearly binned counts of x at the grid points g (g must be sorted).

mm_fastlinbin() also performs linear binning but assumes g to be a (sorted) regular grid containing equidistant grid points.
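A minimal usage sketch, assuming the moremata package is installed and the code is run from a do-file. The data and grid values are illustrative only. Because g is a regular grid here, both functions are expected to return the same counts.

```mata
mata:
x = (0.5 \ 1.5 \ 2.25)      // illustrative data points
g = (0 \ 1 \ 2 \ 3)         // sorted grid with equidistant points
mm_linbin(x, 1, g)          // w = 1 requests unweighted counts
mm_fastlinbin(x, 1, g)      // should agree with mm_linbin() on a regular grid
end
```

Applying the rule given under Remarks by hand, the counts at the four grid points should be .5, 1, 1.25, and .25.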

mm_exactbin() returns counts of x within the intervals defined by the grid points g (g must be sorted). The default is to use right open intervals (with the last interval closed). However, dir!=0 specifies that left open intervals be used (with the first interval closed). mm_exactbin() does not allow x to contain data outside the grid range, unless include!=0 is specified, in which case such data is included in the first and last bin, respectively.
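The interval conventions can be illustrated with a small sketch, assuming moremata is installed and the code is run from a do-file. The values are made up for illustration; the comments state what the description above implies, not verified output.

```mata
mata:
x = (0 \ 1 \ 1 \ 2.5 \ 3)       // illustrative data inside the grid range
g = (0 \ 1 \ 2 \ 3)             // three intervals
mm_exactbin(x, 1, g)            // right open: [0,1), [1,2), [2,3]; expect 1, 2, 2
mm_exactbin(x, 1, g, 1)         // left open:  [0,1], (1,2], (2,3]; expect 3, 0, 2
x2 = x \ 4                      // 4 lies above the grid range
mm_exactbin(x2, 1, g, 0, 1)     // include!=0 puts it in the last bin; expect 1, 2, 3
end
```

Without the include argument, the last call would abort because x2 contains a value outside the grid range.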

w in mm_linbin(), mm_fastlinbin(), and mm_exactbin() specifies weights associated with the observations in x. Specify w as 1 to obtain unweighted results. The sum of returned counts is equal to the sum of weights.
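The following sketch shows weighted binning and checks the stated property that the counts sum to the sum of weights. It assumes moremata is installed and the code is run from a do-file; the data and weights are illustrative.

```mata
mata:
x = (0.5 \ 1.5 \ 2.25)      // illustrative data points
w = (2 \ 1 \ 4)             // one weight per observation
g = (0 \ 1 \ 2 \ 3)         // sorted grid
c = mm_linbin(x, w, g)      // weighted linearly binned counts
c
sum(c), sum(w)              // the two totals should be equal (7 here)
end
```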

mm_makegrid() returns a grid of m equally spaced points over x. The default range of the grid is [min(x),max(x)]. If e is specified, the range is set to [min(x)-e,max(x)+e]. Alternatively, specify min and/or max to determine the limits of the grid range.
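A short sketch of the three ways to set the grid range, assuming moremata is installed and the code is run from a do-file. The data are illustrative, and the ranges in the comments follow from the description above.

```mata
mata:
x = (2 \ 4 \ 10)                // illustrative data: min(x) = 2, max(x) = 10
mm_makegrid(x, 5)               // 5 points over [2, 10]
mm_makegrid(x, 5, 2)            // e = 2 extends the range to [0, 12]
mm_makegrid(x, 5, 0, 0, 20)     // explicit min and max: 5 points over [0, 20]
end
```

The result can be passed as g to mm_fastlinbin(), since the grid points are equally spaced.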

Remarks

Linear binning: Let g(j) and g(j+1) be the two nearest grid points below and above observation x. Then w*(g(j+1)-x)/(g(j+1)-g(j)) is added to the count at g(j) and w*(x-g(j))/(g(j+1)-g(j)) is added to the count at g(j+1), where w is the weight associated with x. Data below (above) the grid range is added to the count of the first (last) grid point.
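As a worked example of the formula, consider a single observation x = 2.25 with weight w = 4 that lies between the grid points 2 and 3. The snippet below only evaluates the two expressions above in plain Mata; the variable names lo and hi are chosen for illustration and stand for g(j) and g(j+1).

```mata
mata:
x = 2.25; w = 4             // illustrative observation and weight
lo = 2; hi = 3              // neighboring grid points g(j) and g(j+1)
w*(hi - x)/(hi - lo)        // share added to the count at g(j):   3
w*(x - lo)/(hi - lo)        // share added to the count at g(j+1): 1
end
```

The nearer grid point receives the larger share, and the two shares add up to w.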

Conformability

mm_linbin(x, w, g), mm_fastlinbin(x, w, g):
x: n x 1
w: n x 1 or 1 x 1
g: m x 1, m>=1
result: m x 1

mm_exactbin(x, w, g, dir, include):
x: n x 1
w: n x 1 or 1 x 1
g: m x 1, m>=2
dir: 1 x 1
include: 1 x 1
result: m-1 x 1

mm_makegrid(x, m, e, min, max):
x: n x 1
m: 1 x 1
e: 1 x 1
min: 1 x 1
max: 1 x 1
result: m x 1

Diagnostics

mm_exactbin() aborts with error if min(x) < g[1] or max(x) > g[rows(g)] (unless include!=0 is specified).

mm_linbin(), mm_fastlinbin(), and mm_exactbin() produce erroneous results if g is not sorted or if x, w, or g contain missing values.

mm_fastlinbin() produces erroneous results if the values in g are not equidistant.
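Since these conditions are not checked by the functions themselves, a caller may want to guard against them. The wrapper below is a hypothetical helper, not part of moremata; it is a sketch that assumes moremata is installed and uses only the built-in Mata functions hasmissing(), sort(), and _error().

```mata
mata:
// checked_linbin() is a hypothetical wrapper, not a moremata function
real colvector checked_linbin(real colvector x, real colvector w, real colvector g)
{
    // abort if any input contains missing values
    if (hasmissing(x) | hasmissing(w) | hasmissing(g)) _error("missing values in input")
    // abort if g is not in ascending order
    if (g != sort(g, 1)) _error("g is not sorted")
    return(mm_linbin(x, w, g))
}
end
```

A similar check on the spacing of g could be added before calling mm_fastlinbin().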

Source code

mm_linbin.mata, mm_fastlinbin.mata, mm_exactbin.mata, mm_makegrid.mata

Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also see

Online: help for [M-5] range(), [M-4] utility, moremata