{smcl}
{* NJC 16jul2015/26jul2015/2sep2015}
{viewerjumpto "Syntax" "mipolate##syntax"}{...}
{viewerjumpto "Description" "mipolate##description"}{...}
{viewerjumpto "Remarks" "mipolate##remarks"}{...}
{viewerjumpto "Options" "mipolate##options"}{...}
{viewerjumpto "Examples" "mipolate##examples"}{...}
{title:Title}
{p2colset 5 18 18 2}{...}
{p2col :mipolate{space 2}{hline 2}} Interpolate (extrapolate) values{p_end}
{p2colreset}{...}
{marker syntax}{...}
{title:Syntax}
{p 8 16 2}
{cmd:mipolate}
{it:yvar}
{it:xvar}
{ifin}
{cmd:,}
{opth gen:erate(newvar)}
[
{opt l:inear}
{opt c:ubic}
{opt s:pline}
{opt p:chip}
{opt i:dw}[{cmd:(}{it:power}{cmd:)}]
{opt f:orward}
{opt b:ackward}
{opt n:earest}
{opt g:roupwise}
{opt ties(ties_rule)}
{opt e:polate}
]
{phang}
{opt by} is allowed; see {manhelp by D}.
{marker description}{...}
{title:Description}
{pstd}
{cmd:mipolate} creates {newvar} containing an interpolation of {it:yvar} on
{it:xvar} for missing values of {it:yvar}.
{pstd}
{cmd:mipolate} uses one of the following methods: linear, cubic, cubic
spline, pchip (piecewise cubic Hermite interpolation), idw (inverse
distance weighted), forward, backward, nearest neighbour, and groupwise.
The default method is linear.
{pstd}
Interpolation requires that {it:yvar} be a function of {it:xvar}, that is,
that each distinct value of {it:xvar} correspond to a single value of
{it:yvar}. Hence {it:yvar} is also interpolated for tied values of
{it:xvar}. When {it:yvar} is not missing and {it:xvar} is neither missing
nor repeated, the value of {it:newvar} is just {it:yvar}.
{marker remarks}{...}
{title:Remarks}
{pstd}
{cmd:mipolate} does not require {help tsset} or {help xtset} data and makes no
check for, or use of, any such settings. With panel data, it is therefore
essential to specify the panel identifier with {cmd:by:}.
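{pstd}The example below uses two hypothetical panels identified by {cmd:id}. With {cmd:by:}, the gap for {cmd:id} 1 is filled using only that panel's own values, giving 20, halfway between 10 and 30. Without {cmd:by:}, the observations of both panels would be pooled in a single interpolation. The sketch assumes {cmd:mipolate} is installed. The variable names are illustrative only.{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 6}{p_end}
{phang2}{cmd:. gen id = cond(_n <= 3, 1, 2)}{p_end}
{phang2}{cmd:. gen year = mod(_n - 1, 3) + 1}{p_end}
{phang2}{cmd:. gen y = 10 * _n}{p_end}
{phang2}{cmd:. replace y = . in 2}{p_end}
{phang2}{cmd:. bysort id (year): mipolate y year, gen(y1)}{p_end}
{phang2}{cmd:. list, sepby(id)}{p_end}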
{pstd}
{cmd:mipolate} pays no special attention to extended missing values {cmd:.a}
to {cmd:.z}.
{marker options}{...}
{title:Options}
{phang}{opth generate(newvar)} is required and specifies the name of the
new variable to be created.
{phang}{opt linear} specifies linear interpolation using the known
values of {it:yvar} before and after any missing values. This is the
default method.
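{pstd}Here is a worked check with hypothetical data. The linear interpolant at x = 2 between the known points (1, 10) and (3, 20) is 10 + (20 - 10)(2 - 1)/(3 - 1) = 15. The official {help ipolate} command also interpolates linearly and can be used to confirm that arithmetic:{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 3}{p_end}
{phang2}{cmd:. gen x = _n}{p_end}
{phang2}{cmd:. gen y = 10 in 1}{p_end}
{phang2}{cmd:. replace y = 20 in 3}{p_end}
{phang2}{cmd:. display 10 + (20 - 10) * (2 - 1) / (3 - 1)}{p_end}
{phang2}{cmd:. ipolate y x, gen(ly)}{p_end}
{phang2}{cmd:. list}{p_end}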
{phang}{opt cubic} specifies cubic interpolation, using exact fitting of
a cubic curve to two data points before and two data points after each
observation for which {it:yvar} is missing. Missing values are thus
produced whenever fewer than two data points are present on either side.
Note that this is not a spline method.
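{pstd}The idea of exact fitting can be sketched in Mata with hypothetical data. Two known points lie before x = 3 and two after it. Four points determine a cubic, whose coefficients solve a small Vandermonde system. Because the {it:y} values here are x^3, the fitted cubic evaluated at x = 3 should return 27, up to rounding error. This illustrates the principle only and is not the code used by {cmd:mipolate}.{p_end}
{phang2}{cmd:. mata: x = (1, 2, 4, 5)'}{p_end}
{phang2}{cmd:. mata: y = x:^3}{p_end}
{phang2}{cmd:. mata: X = (J(4, 1, 1), x, x:^2, x:^3)}{p_end}
{phang2}{cmd:. mata: b = lusolve(X, y)}{p_end}
{phang2}{cmd:. mata: (1, 3, 3^2, 3^3) * b}{p_end}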
{phang}{opt spline} specifies natural cubic spline interpolation. The
method uses the Mata functions {cmd:spline3()} and {cmd:spline3eval()};
for details, see the help for {help mf_spline3:spline3()}, which in turn
leads to the Mata source code. That code is a translation of code
originally given by Herriot and Reinsch (1973).
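{pstd}Here is a minimal sketch of those two Mata functions, using hypothetical data. {cmd:spline3()} returns the spline coefficients for known {it:x} and {it:y}, and {cmd:spline3eval()} evaluates the spline at new points. At a knot such as x = 3, the spline returns the known value 9; between knots it gives a smoothly interpolated value.{p_end}
{phang2}{cmd:. mata: x = (1, 2, 3, 4, 5)'}{p_end}
{phang2}{cmd:. mata: y = (1, 4, 9, 16, 25)'}{p_end}
{phang2}{cmd:. mata: s = spline3(x, y)}{p_end}
{phang2}{cmd:. mata: spline3eval(s, (2.5 \ 3))}{p_end}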
{phang}{opt pchip} specifies piecewise cubic Hermite interpolation. For
a lucid account, see Moler (2004, Ch. 3). This method uses piecewise
cubics that join smoothly, so that both the interpolated function and
its first derivative are continuous. In addition, the interpolant is
shape-preserving in the sense that it cannot overshoot locally; sections
in which observed {it:yvar} is increasing, decreasing or constant with
{it:xvar} remain so after interpolation, and local extremes (minima,
maxima) also remain so. This interpolation method also extrapolates.
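{pstd}In Moler's (2004, Ch. 3) account, the interior slopes of the interpolant are chosen to preserve shape. The slope is set to zero where the secant slopes on either side differ in sign (or either is zero). Otherwise it is a weighted harmonic mean of those secant slopes. The sketch below applies that rule to the data of Moler's example under Examples. It assumes equally spaced {it:xvar}, in which case the weights are equal. The variable names {cmd:delta} and {cmd:d} are hypothetical, and this is not {cmd:mipolate}'s own code.{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 6}{p_end}
{phang2}{cmd:. matrix Y = (16, 18, 21, 17, 15, 12)'}{p_end}
{phang2}{cmd:. gen x = _n}{p_end}
{phang2}{cmd:. gen y = Y[_n, 1]}{p_end}
{phang2}{cmd:. gen double delta = (y[_n+1] - y) / (x[_n+1] - x)}{p_end}
{phang2}{cmd:. gen double d = cond(delta[_n-1] * delta <= 0, 0, 2 / (1/delta[_n-1] + 1/delta)) in 2/5}{p_end}
{phang2}{cmd:. list}{p_end}
{pstd}The zero slope at x = 3, where {it:y} peaks at 21, keeps the interpolant from overshooting that local maximum.{p_end}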
{phang}{opt idw}[{cmd:(}{it:power}{cmd:)}] specifies inverse distance
weighted interpolation. This method uses a weighted average of
non-missing values of {it:yvar}, the weights being reciprocals of the
distances in {it:xvar} raised to a power, which must be zero or positive.
The default power is 2; any other choice must be specified. Thus with
power 2, known values at distance 1 from a point with an unknown value
have weight 1, those at distance 2 have weight 1/4, those at distance 3
have weight 1/9, and so forth. If the power is 0, all known points have
equal weight and the interpolant reduces to the mean of all known values.
As the power becomes large, only the nearest known values have
appreciable weight. This interpolation method also extrapolates.
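{pstd}The following computes the weighted average by hand for hypothetical data, with power 2 and a missing value at x = 3. The known points lie at distances 2, 1 and 2, so their weights are 1/4, 1 and 1/4. The estimate is therefore (12/4 + 6 + 24/4)/(1/4 + 1 + 1/4) = 15/1.5 = 10. With power 0, all three weights would be 1 and the estimate would be the plain mean, 14.{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 4}{p_end}
{phang2}{cmd:. gen x = cond(_n == 4, 5, _n)}{p_end}
{phang2}{cmd:. gen y = 12 in 1}{p_end}
{phang2}{cmd:. replace y = 6 in 2}{p_end}
{phang2}{cmd:. replace y = 24 in 4}{p_end}
{phang2}{cmd:. gen double w = 1 / abs(x - 3)^2 if !missing(y)}{p_end}
{phang2}{cmd:. gen double wy = w * y}{p_end}
{phang2}{cmd:. summarize wy, meanonly}{p_end}
{phang2}{cmd:. scalar num = r(sum)}{p_end}
{phang2}{cmd:. summarize w, meanonly}{p_end}
{phang2}{cmd:. display scalar(num) / r(sum)}{p_end}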
{phang}{opt forward} specifies forward interpolation, so that any known
value just before one or more missing values is copied in cascade to
provide interpolated values, constant within any such block.
{phang}{opt backward} specifies backward interpolation, so that any known
value just after one or more missing values is copied in cascade to
provide interpolated values, constant within any such block.
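{pstd}The cascade can be seen in a common Stata idiom, sketched below with hypothetical data. This shows the idea, not {cmd:mipolate}'s own code. Because {cmd:replace} works through the observations in order, a value copied into one observation is available to the next. Forward filling gives 5, 5, 5, 8, 8. Backward filling, done by reversing the sort order, gives 5, 8, 8, 8 and leaves the last observation missing, because no known value follows it.{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 5}{p_end}
{phang2}{cmd:. gen x = _n}{p_end}
{phang2}{cmd:. gen y = 5 in 1}{p_end}
{phang2}{cmd:. replace y = 8 in 4}{p_end}
{phang2}{cmd:. gen yf = y}{p_end}
{phang2}{cmd:. replace yf = yf[_n-1] if missing(yf)}{p_end}
{phang2}{cmd:. gen yb = y}{p_end}
{phang2}{cmd:. gsort -x}{p_end}
{phang2}{cmd:. replace yb = yb[_n-1] if missing(yb)}{p_end}
{phang2}{cmd:. sort x}{p_end}
{phang2}{cmd:. list}{p_end}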
{phang}{opt nearest} specifies nearest neighbour interpolation, which
means using known values of {it:yvar} either before or after missing
values, depending on which is nearer in terms of {it:xvar}.
When the known values before and after are equally distant from a point
with a missing value, there is a choice of rules that may be applied. The
default rule uses the mean of the two values; the {opt ties()} option
provides alternative rules. This method also extrapolates, as unknown
values before the first known value and unknown values after the last
known value are replaced by those respective known values.
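{pstd}A minimal Mata sketch, with hypothetical data, of the nearest neighbour rule and its tie rules. At x = 2, the known values 10 (at x = 1) and 20 (at x = 3) are equally near. The default rule therefore gives 15. The rules before, after, minimum and maximum of {opt ties()} give 10, 20, 10 and 20. The sketch assumes {it:x} is sorted in ascending order, so the first of the tied values is the one before. The names {cmd:known}, {cmd:dist} and {cmd:ynear} are hypothetical.{p_end}
{phang2}{cmd:. mata: x = (1 \ 2 \ 3)}{p_end}
{phang2}{cmd:. mata: y = (10 \ . \ 20)}{p_end}
{phang2}{cmd:. mata: known = y :< .}{p_end}
{phang2}{cmd:. mata: dist = abs(select(x, known) :- 2)}{p_end}
{phang2}{cmd:. mata: ynear = select(select(y, known), dist :== min(dist))}{p_end}
{phang2}{cmd:. mata: (mean(ynear), ynear[1], ynear[rows(ynear)], min(ynear), max(ynear))}{p_end}
{pstd}Replacing 2 by 0 in the fourth line selects only the value at x = 1, which illustrates the extrapolation just described.{p_end}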
{phang}{opt groupwise} specifies that non-missing values be copied to
missing values if, and only if, just one distinct non-missing value
occurs in each group. Thus a group of values ., 42, ., . qualifies as 42
is not missing and is the only non-missing value in the group. Hence the
missing values in the group will be replaced with 42 in the new
variable. By the same rules 42, ., 42, . qualifies but 42, ., 43, .
does not. Normally, but not necessarily, this option is used in
conjunction with {cmd:by:}, which is how groups are specified; otherwise
the (single) group is the entire set of observations being used.
{pmore}Note that {it:xvar} is strictly irrelevant for this method, as the
order of values is immaterial. To keep the syntax consistent, it must be
specified anyway.
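{pstd}The qualifying rule can be sketched with standard Stata commands. The sketch uses as hypothetical groups the three patterns given above: ., 42, ., .; 42, ., 42, .; and 42, ., 43, . Sorting on {it:y} within each group puts the non-missing values first, in ascending order. A group therefore has just one distinct non-missing value when the first and last of them are equal. This sketch merely flags the group that does not qualify, whereas the sandbox for {cmd:groupwise} under Examples indicates that {cmd:mipolate} fails in that case.{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 12}{p_end}
{phang2}{cmd:. gen group = ceil(_n / 4)}{p_end}
{phang2}{cmd:. gen y = .}{p_end}
{phang2}{cmd:. replace y = 42 if inlist(_n, 2, 5, 7, 9)}{p_end}
{phang2}{cmd:. replace y = 43 in 11}{p_end}
{phang2}{cmd:. sort group y}{p_end}
{phang2}{cmd:. by group: egen nnm = count(y)}{p_end}
{phang2}{cmd:. by group: gen byte onevalue = nnm > 0 & y[1] == y[nnm]}{p_end}
{phang2}{cmd:. by group: gen y1 = y[1] if onevalue}{p_end}
{phang2}{cmd:. list, sepby(group)}{p_end}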
{phang}{opt ties()} specifies an alternative to the default rule for the
{opt nearest} option, whereby previous and next values equally distant
from a given point are averaged. You may choose one of {cmdab:a:fter}
(value after is used), {cmdab:b:efore} (value before is used),
{cmdab:mi:nimum} (smaller value is used), or {cmdab:ma:ximum} (larger
value is used). As indicated, any unambiguous abbreviation is allowed.
{phang}{opt epolate} specifies that values be both interpolated and
linearly extrapolated. Interpolation only is the default with
{opt linear}, {opt cubic} and {opt spline}. This option is ignored
with {opt pchip}, {opt forward}, {opt backward}, {opt nearest} and
{opt groupwise}, which apply their own kinds of extrapolation.
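{pstd}In the hypothetical data below, only x = 2 and x = 3 have known values, which lie on the line y = 2x. The default linear method leaves x = 1 and x = 4 missing. Adding {opt epolate} should extend the line to give 2 and 8. The sketch assumes {cmd:mipolate} is installed.{p_end}
{phang2}{cmd:. clear}{p_end}
{phang2}{cmd:. set obs 4}{p_end}
{phang2}{cmd:. gen x = _n}{p_end}
{phang2}{cmd:. gen y = 4 in 2}{p_end}
{phang2}{cmd:. replace y = 6 in 3}{p_end}
{phang2}{cmd:. mipolate y x, gen(yi)}{p_end}
{phang2}{cmd:. mipolate y x, epolate gen(ye)}{p_end}
{phang2}{cmd:. list}{p_end}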
{marker examples}{...}
{title:Examples}
{hline}
{pstd}Setup{p_end}
{phang2}{cmd:. webuse ipolxmpl1}{p_end}
{pstd}List the data{p_end}
{phang2}{cmd:. list, sep(0)}{p_end}
{pstd}Create {cmd:ly1}, {cmd:cy1}, {cmd:sy1}, {cmd:py1}, {cmd:ny1}
containing interpolations of {cmd:y} on {cmd:x} for missing values of
{cmd:y}{p_end}
{phang2}{cmd:. mipolate y x, gen(ly1)}{p_end}
{phang2}{cmd:. mipolate y x, cubic gen(cy1)}{p_end}
{phang2}{cmd:. mipolate y x, spline gen(sy1)}{p_end}
{phang2}{cmd:. mipolate y x, pchip gen(py1)}{p_end}
{phang2}{cmd:. mipolate y x, nearest gen(ny1)}{p_end}
{pstd}Use alternative rules for handling ties:{p_end}
{phang2}{cmd:. foreach r in after before max min {c -(}}{p_end}
{phang2}{cmd:. mipolate y x, nearest ties(`r') gen(ny`r')}{p_end}
{phang2}{cmd:. {c )-}}{p_end}
{pstd}List the results{p_end}
{phang2}{cmd:. list, sep(0)}{p_end}
{hline}
{pstd}Setup{p_end}
{phang2}{cmd:. webuse ipolxmpl2}{p_end}
{pstd}Show years for which the circulation data are missing{p_end}
{phang2}{cmd:. tabulate circ year if circ == ., missing}{p_end}
{pstd}Create {cmd:pchipcirc} containing a pchip interpolation of {cmd:circ} on
{cmd:year} for missing values of {cmd:circ} and perform this calculation
separately for each {cmd:magazine}{p_end}
{phang2}{cmd:. by magazine: mipolate circ year, pchip gen(pchipcirc)}{p_end}
{hline}
{pstd}Moler's example{p_end}
{phang2}{cmd:. clear }{p_end}
{phang2}{cmd:. set obs 6 }{p_end}
{phang2}{cmd:. matrix y = (16, 18, 21, 17, 15, 12)' }{p_end}
{phang2}{cmd:. gen y = y[_n, 1] }{p_end}
{phang2}{cmd:. gen x = _n }{p_end}
{phang2}{cmd:. set obs 61}{p_end}
{phang2}{cmd:. replace x = (_n + 1)/10 in 7/L}{p_end}
{phang2}{cmd:. mipolate y x, pchip gen(pchip)}{p_end}
{phang2}{cmd:. line pchip x, sort || scatter y x }{p_end}
{hline}
{pstd}Sandbox for {cmd:groupwise}:{p_end}
{phang2}{cmd:. clear }{p_end}
{phang2}{cmd:. set obs 10 }{p_end}
{phang2}{cmd:. gen x = _n }{p_end}
{phang2}{cmd:. gen group = 1 in 1 }{p_end}
{phang2}{cmd:. replace group = 2 in 2/3}{p_end}
{phang2}{cmd:. replace group = 3 in 4/6}{p_end}
{phang2}{cmd:. replace group = 4 in 7/10 }{p_end}
{phang2}{cmd:. gen y = . }{p_end}
{phang2}{cmd:. replace y = 2 in 2 }{p_end}
{phang2}{cmd:. replace y = 5 in 5 }{p_end}
{phang2}{cmd:. replace y = 10 in 10 }{p_end}
{phang2}{cmd:. bysort group: mipolate y x, gen(y1) groupwise }{p_end}
{phang2}{cmd:. list, sepby(group) }{p_end}
{phang2}{cmd:. replace y = 9 in 9 }{p_end}
{phang2}{cmd:. * should fail: }{p_end}
{phang2}{cmd:. bysort group: mipolate y x, gen(y2) groupwise }{p_end}
{hline}
{title:Author}
{pstd}Nicholas J. Cox, Durham University, U.K.{break}
n.j.cox@durham.ac.uk
{title:Acknowledgments}
{pstd}Most of the Mata code for the {opt pchip} option is a translation
of MATLAB code given by Moler (2004).
{title:References}
{phang}
Hamming, R.W. 1973. {it:Numerical methods for scientists and engineers.}
New York: McGraw-Hill.
{phang}
Herriot, J.G. and C.H. Reinsch. 1973.
Algorithm 472: procedures for natural spline interpolation.
{it:Communications of the Association for Computing Machinery}
16: 763{c -}768.
{phang}
Lancaster, P. and K. Šalkauskas. 1986.
{it:Curve and surface fitting: an introduction.}
London: Academic Press.
{phang}
Moler, C. 2004. {it:Numerical Computing with MATLAB.}
Philadelphia: SIAM. Chapter 3.
(also available in slightly different form at
{browse "http://www.mathworks.com/moler/interp.pdf":http://www.mathworks.com/moler/interp.pdf})
{phang}
Morton, B.R. 1964. {it:Numerical approximation}.
London: Routledge and Kegan Paul.
{phang}
Press, W.H., S.A. Teukolsky, W.T. Vetterling and B.P. Flannery. 2007.
{it:Numerical recipes: the art of scientific computing.}
Cambridge: Cambridge University Press.
{vieweralsosee "[D] ipolate" "mansection D ipolate"}{...}
{vieweralsosee "[MI] mi impute" "help mi_impute"}{...}