Title
mm_linbin() -- Linear binning
Syntax
real colvector mm_linbin(x, w, g)
real colvector mm_fastlinbin(x, w, g)
real colvector mm_exactbin(x, w, g [, dir, include])
real colvector mm_makegrid(x [, m, e, min, max])
where
x: real colvector containing data points
w: real colvector containing weights
g: real colvector containing grid points
dir: real scalar indicating the direction of the intervals (default: right open)
include: real scalar indicating that data outside the grid be included in the first and last bins
m: real scalar specifying the number of equally spaced grid points (default is 512)
e: real scalar specifying the amount by which the grid range is extended on each side
min: real scalar specifying the minimum grid value (default: min(x) - e)
max: real scalar specifying the maximum grid value (default: max(x) + e)
Description
mm_linbin() returns linearly binned counts of x at the grid points g (g must be sorted).
mm_fastlinbin() also performs linear binning but assumes g to be a (sorted) regular grid containing equidistant grid points.
mm_exactbin() returns counts of x within the intervals defined by the grid points g (g must be sorted). By default, intervals are right-open, [g(j), g(j+1)), with the last interval closed. If dir!=0 is specified, intervals are left-open, (g(j), g(j+1)], with the first interval closed. mm_exactbin() does not allow x to contain data outside the grid range unless include!=0 is specified, in which case data below the grid range are counted in the first bin and data above it in the last bin.
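The interval conventions can be illustrated with a short sketch. This is a Python illustration of the rule described above, not the Mata source of mm_exactbin(); the function name exactbin and its argument names are hypothetical, and g is assumed to be strictly increasing with at least two points.

```python
from bisect import bisect_left, bisect_right

def exactbin(x, w, g, direction=0, include=0):
    """Weighted counts of x in the m-1 intervals defined by the sorted grid g.

    Hypothetical helper: mirrors the documented behavior, not the Mata code.
    w is either a scalar or a sequence with one weight per observation.
    """
    m = len(g)
    if m < 2:
        raise ValueError("g must contain at least two grid points")
    counts = [0.0] * (m - 1)
    for i, xi in enumerate(x):
        wi = w[i] if isinstance(w, (list, tuple)) else w
        if xi < g[0] or xi > g[-1]:
            if not include:
                raise ValueError("x contains data outside the grid range")
            # Outside data go to the first or the last bin.
            j = 0 if xi < g[0] else m - 2
        elif direction == 0:
            # Right-open intervals [g[j], g[j+1]); the last interval is closed.
            j = min(bisect_right(g, xi) - 1, m - 2)
        else:
            # Left-open intervals (g[j], g[j+1]]; the first interval is closed.
            j = max(bisect_left(g, xi) - 1, 0)
        counts[j] += wi
    return counts

grid = [1.0, 2.0, 3.0]
data = [1.0, 2.0, 2.5, 3.0]
right_open = exactbin(data, 1, grid)                # 2.0 falls into [2, 3]
left_open = exactbin(data, 1, grid, direction=1)    # 2.0 falls into [1, 2]
outside = exactbin([0.0, 4.0], 1, grid, include=1)  # one observation per end bin
```

The only difference between the two directions is where an observation that lies exactly on an interior grid point is counted.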
w in mm_linbin(), mm_fastlinbin(), and mm_exactbin() specifies weights associated with the observations in x. Specify w as 1 to obtain unweighted results. The sum of returned counts is equal to the sum of weights.
mm_makegrid() returns a grid of m equally spaced points over x. The default range of the grid is [min(x),max(x)]. If e is specified, the range is set to [min(x)-e,max(x)+e]. Alternatively, specify min and/or max to determine the limits of the grid range.
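The grid construction can be sketched as follows. This is a Python illustration of the description above rather than the Mata source; the function name makegrid and the argument names lo and hi (standing in for min and max) are hypothetical, and the sketch assumes m >= 2.

```python
def makegrid(x, m=512, e=0.0, lo=None, hi=None):
    """Return m equally spaced points covering x.

    Hypothetical helper: lo and hi play the role of the min and max arguments.
    The extension e only affects limits that are not given explicitly.
    """
    if m < 2:
        raise ValueError("this sketch assumes at least two grid points")
    lo = min(x) - e if lo is None else lo
    hi = max(x) + e if hi is None else hi
    step = (hi - lo) / (m - 1)
    return [lo + k * step for k in range(m)]

extended = makegrid([2, 5, 3], m=6, e=1)       # range [1, 6], step 1
explicit = makegrid([2, 5, 3], m=3, lo=0, hi=10)  # range [0, 10], step 5
```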
Remarks
Linear binning: Let g(j) and g(j+1) be the two nearest grid points below and above observation x. Then w*(g(j+1)-x)/(g(j+1)-g(j)) is added to the count at g(j) and w*(x-g(j))/(g(j+1)-g(j)) is added to the count at g(j+1), where w is the weight associated with x. Data below (above) the grid range is added to the count of the first (last) grid point.
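The rule above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the Mata source of mm_linbin(); the function name linbin is hypothetical, g is assumed to be strictly increasing, and no missing values are allowed.

```python
from bisect import bisect_right

def linbin(x, w, g):
    """Linearly binned counts of x at the sorted grid points g.

    Hypothetical helper: w is a scalar or a sequence with one weight
    per observation.
    """
    counts = [0.0] * len(g)
    for i, xi in enumerate(x):
        wi = w[i] if isinstance(w, (list, tuple)) else w
        if xi <= g[0]:
            # At or below the grid range: all weight goes to the first point.
            counts[0] += wi
        elif xi >= g[-1]:
            # At or above the grid range: all weight goes to the last point.
            counts[-1] += wi
        else:
            j = bisect_right(g, xi) - 1      # g[j] <= xi < g[j+1]
            width = g[j + 1] - g[j]
            counts[j] += wi * (g[j + 1] - xi) / width
            counts[j + 1] += wi * (xi - g[j]) / width
    return counts

counts = linbin([1.25, 2.5, 5.0], 1, [1.0, 2.0, 3.0, 4.0])
```

In this example the observation 1.25 contributes 0.75 to the first grid point and 0.25 to the second, 2.5 is split evenly between the second and third, and 5.0 lies above the grid and is added to the last point. The counts sum to 3, the sum of the weights.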
Conformability
    mm_linbin(x, w, g), mm_fastlinbin(x, w, g):
              x:  n x 1
              w:  n x 1 or 1 x 1
              g:  m x 1, m>=1
         result:  m x 1.

    mm_exactbin(x, w, g, dir, include):
              x:  n x 1
              w:  n x 1 or 1 x 1
              g:  m x 1, m>=2
            dir:  1 x 1
        include:  1 x 1
         result:  m-1 x 1.

    mm_makegrid(x, m, e, min, max):
              x:  n x 1
              m:  1 x 1
              e:  1 x 1
            min:  1 x 1
            max:  1 x 1
         result:  m x 1.
Diagnostics
mm_exactbin() aborts with error if min(x) < g[1] or max(x) > g[rows(g)] (unless include!=0 is specified).
mm_linbin(), mm_fastlinbin(), and mm_exactbin() produce erroneous results if g is not sorted or if x, w, or g contain missing values.
mm_fastlinbin() produces erroneous results if the values in g are not equidistant.
Source code
mm_linbin.mata, mm_fastlinbin.mata, mm_exactbin.mata, mm_makegrid.mata
Author
Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch
Also see
Online: help for [M-5] range(), [M-4] utility, moremata