.-
help for ^mdensity^ (manual: ^[R] kdensity^)
.-
Univariate kernel density estimation, for one or more variables or groups
-------------------------------------------------------------------------
^mdensity^ varname [weight] [^if^ exp] [^in^ range]
[ , ^at(^varx^) r^ange^(^min^,^max^) n(^#^) w^idth^(^numlist^)^
^by(^byvar^) miss^ing ^z^ero
{ ^log^ | ^logit^ } ^a(^#^) b(^#^)^
{ ^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle }
^i^ntensity
graph_options ]
^mdensity^ varlist [weight] [^if^ exp] [^in^ range]
[, ^at(^varx^) r^ange^(^min^,^max^) n(^#^) w^idth^(^numlist^) z^ero
{ ^log^ | ^logit^ } ^a(^#^) b(^#^)^
{ ^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle }
^i^ntensity
graph_options ]
^fweight^s and ^aweight^s are allowed; see help @weights@.
Description
-----------
^mdensity^ produces kernel density estimates for one or more variables or
groups and graphs the result.
^mdensity^ is a wrapper for ^kdensity^, which is called in turn for each
variable or group specified. See help for @kdensity@.
Remarks
-------
The probability density f(x) of a continuous variable x has the units and
dimensions of the reciprocal of x. If x is measured in metres, f(x) has
units 1 / metre. The density is thus not measured on a probability scale
and, for example, it is possible for f(x) to exceed 1.
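The point about units can be checked with a small numeric sketch, written
here in Python outside Stata. The uniform distribution and the function name
are assumptions chosen for illustration; they are not part of ^mdensity^.

```python
def uniform_density(lower, upper):
    """Density of a uniform distribution on [lower, upper]: 1 / (upper - lower)."""
    return 1.0 / (upper - lower)

# x uniform on [0, 0.5] metres: the density is 2 per metre, which exceeds 1.
f_metres = uniform_density(0.0, 0.5)

# The same variable in centimetres, on [0, 50]: the density is 0.02 per cm.
f_centimetres = uniform_density(0.0, 50.0)

# Either way the total probability (density times interval length) is 1.
total_metres = f_metres * 0.5
total_centimetres = f_centimetres * 50.0
```

Changing the unit of x rescales the density by the reciprocal factor, while
the area under the density stays equal to 1.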
The transformation and back transformation procedure obtained by the ^log^
and ^logit^ options is mentioned briefly by Silverman (1986, pp.27-30),
although his worked example (p.28) is not very encouraging. Good expositions
are given by Wand and Jones (1995, pp.43-45), Simonoff (1996, pp.61-64) and
Bowman and Azzalini (1997, pp.14-16). The underlying principle is that for
a differentiable monotone transformation t(x), the density of x, written
f(x), and the density of t(x), written f(t(x)), are related by
f(x) = f(t(x)) |dt/dx|. See Plackett (1971, pp.71-72) for further
mathematical details.
Options
-------
^at(^varx^)^ specifies a variable that contains the values at which the
density should be estimated.
^n(^#^)^ specifies the number of points at which the density estimate is to
be evaluated. ^n( )^ only applies if ^at( )^ is not specified. The default
is min(_N,50). The evaluation points are held in a constructed variable. If
^range(^min^,^max^)^ is specified, that variable ranges from min to max;
otherwise, it ranges from the minimum of the data to the maximum of the
data. This behaviour differs from that of ^kdensity^.
^range(^min^,^max^)^ specifies the minimum and maximum values for which
density is estimated. ^range( )^ only applies if ^at( )^ is not specified.
See ^n( )^ above.
^width(^numlist^)^ specifies the halfwidth(s) of the kernel, the width(s) of
the density window(s) around each point.
If ^width( )^ is not specified, or if ^width(0)^ is specified, then
the "optimal" width is used; see ^[R] kdensity^. That optimal width is
determined separately for each group or variable. Note that for
multimodal and highly skewed densities, the "optimal" width is usually
too wide and oversmooths the density.
If ^width( )^ is specified, any single number is used for all variables
or groups; several numbers will be used in turn for variables or groups,
any excess of numbers being ignored and any deficiency of numbers being
made up by ^0^ repeated. If ^log^ or ^logit^ is also specified, widths
should be on a natural logarithm or logit scale.
Thus ^mdensity mpg mpg mpg, width(1/3)^, in which the numlist ^1/3^
expands to ^1 2 3^, shows the effects of using widths 1, 2 and 3 on the
same data.
See help for @numlist@ for further details on numlists.
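The rule for matching widths to variables or groups can be sketched as
follows, in Python outside Stata. The function ^assign_widths^ is a
hypothetical name used only for illustration; it mirrors the description
above and is not the code used by ^mdensity^. A width of 0 stands for
"use the optimal width".

```python
def assign_widths(widths, n_series):
    """Return one width per series, following the rule described above.

    widths is a list of numbers (possibly empty); 0 means "optimal width".
    """
    if len(widths) == 0:
        return [0] * n_series              # width() not specified
    if len(widths) == 1:
        return widths * n_series           # one number serves all series
    padded = widths + [0] * n_series       # make up any deficiency with 0
    return padded[:n_series]               # ignore any excess


same_for_all = assign_widths([2], 3)         # [2, 2, 2]
one_each = assign_widths([1, 2, 3], 3)       # [1, 2, 3]
excess = assign_widths([1, 2, 3, 4], 2)      # [1, 2]
deficient = assign_widths([1, 2], 4)         # [1, 2, 0, 0]
```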
^by(^byvar^)^ specifies that calculations are to be carried out separately
for each class defined by a single variable byvar. The graph will,
however, show the functions for all classes. ^by( )^ is only allowed
with a single varname.
^missing^, used only with ^by( )^, permits the use of non-missing values
of varname corresponding to missing values for the variable named by
^by( )^. The default is to ignore such values.
^zero^ specifies that densities estimated as zero are to be shown as such.
The default is not to show such values.
^log^ specifies that estimation is to be carried out on logarithms of the data
and inverted. That is, for a density f(x),
estimate of f(x) = estimate of f(log x) * (1 / x),
given that 1 / x = d/dx (log x). See Remarks above. This method is
appropriate only for data that are all positive. In particular, if data
are right skewed, it smooths more in the tail and less near the main
part of the distribution than the default method.
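The transform and back-transform steps can be sketched as follows, in
Python outside Stata. This is a minimal illustration under stated
assumptions, not the code used by ^mdensity^: the data are hypothetical,
and the Epanechnikov kernel is taken in its textbook form on [-1, 1],
which may be scaled differently from the kernel used by ^kdensity^.

```python
import math

def epanechnikov(u):
    """Textbook Epanechnikov kernel on [-1, 1]; it integrates to 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kde(points, data, width):
    """Plain kernel density estimate of data at each point."""
    n = len(data)
    return [sum(epanechnikov((p - d) / width) for d in data) / (n * width)
            for p in points]

def kde_via_log(points, data, width):
    """Estimate f(x) by smoothing log x, then multiplying by 1 / x."""
    log_data = [math.log(d) for d in data]
    log_points = [math.log(p) for p in points]
    on_log_scale = kde(log_points, log_data, width)
    return [g / p for g, p in zip(on_log_scale, points)]

# Hypothetical right-skewed, strictly positive data.
data = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 10.0, 20.0]
step = 0.05
grid = [step * i for i in range(1, 1201)]      # 0.05, 0.10, ..., 60.0
density = kde_via_log(grid, data, width=0.5)   # width is on the log scale

# Riemann-sum check that the back-transformed estimate integrates to about 1.
area = sum(density) * step
```

Because the width is fixed on the log scale, the corresponding window on
the original scale widens as x grows, which is why smoothing is heavier
in the right tail.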
^logit^ specifies that estimation is to be carried out on logits of the data
and inverted. That is, for a density f(x),
estimate of f(x) = estimate of f(logit x) * (b - a) / ((x - a)(b - x)),
where logit x = log ((x - a) / (b - x)), a slight generalisation of the
usual definition. Note that (b - a) / ((x - a)(b - x)) =
d/dx (logit x). See Remarks above. This method is appropriate only for
data that are between a and b. See also just below.
^a(^#^)^ and ^b(^#^)^ set the constants a and b in the logit function defined
above. By default ^a^ is 0 and ^b^ is 1, giving the usual logit, i.e.
log (x / (1 - x)).
^log^ and ^logit^ may not be specified together.
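The generalised logit and its derivative can be checked numerically with
a short sketch, in Python outside Stata. The function names and the values
x = 30, a = 0 and b = 100 (as for percentages) are assumptions made for
illustration only.

```python
import math

def logit(x, a=0.0, b=1.0):
    """Generalised logit log((x - a) / (b - x)), defined for a < x < b."""
    return math.log((x - a) / (b - x))

def logit_derivative(x, a=0.0, b=1.0):
    """d/dx of the generalised logit: (b - a) / ((x - a)(b - x))."""
    return (b - a) / ((x - a) * (b - x))

# Compare the formula with a central finite difference at a hypothetical
# point x = 30 for data bounded by a = 0 and b = 100.
x, a, b, h = 30.0, 0.0, 100.0, 1e-4
numerical = (logit(x + h, a, b) - logit(x - h, a, b)) / (2 * h)
analytical = logit_derivative(x, a, b)   # 100 / (30 * 70)
```

With the defaults a = 0 and b = 1, ^logit(x)^ in this sketch reduces to the
usual log (x / (1 - x)).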
^biweight^, ^cosine^, ..., ^triangle^ specify the kernel. (Actually, ^cosine^
specifies the cosine trace as there is no such thing as a cosine kernel.)
By default, ^epan^, meaning the Epanechnikov kernel, is used.
^intensity^ specifies that the data are from a point process in one
dimension (e.g. time or space) and that the intensity function
(e.g. frequency per unit time or space) is being estimated. Results
will be shown on an intensity scale, as `density' multiplied by
number of observed data points.
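The rescaling can be illustrated with a short sketch, in Python outside
Stata. The density values, the grid spacing and the count of 40 events are
hypothetical, and ^to_intensity^ is a name invented for this illustration.

```python
def to_intensity(density, n_points):
    """Rescale density values to an intensity by multiplying by n_points."""
    return [n_points * d for d in density]

# Hypothetical density values on a grid with spacing 0.5 that integrate
# to 1, taken to come from 40 observed events.
density = [0.0, 0.2, 0.6, 0.8, 0.4, 0.0]
spacing = 0.5
intensity = to_intensity(density, 40)

# The area under the intensity curve recovers the number of events.
events = sum(intensity) * spacing
```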
graph_options are any options allowed with ^graph, twoway^; see help
@grtwoway@. With ^by( )^ and several groups, each group will be
shown graphically as if it were a distinct variable.
Saved results
-------------
^r(widths)^ contains the widths used for smoothing each density.
^r(scales)^ contains the bin widths used.
^r(ns)^ contains the numbers of points at which the density was
evaluated.
Examples
--------
. ^mdensity price, by(foreign)^
. ^mdensity mpg mpg mpg, w(1/3) l1(Density with width 1 2 3)^
^b2(Mileage (mpg)) sy(iii) c(sss)^
. ^range size 0 2000^
. ^label var size "Length and width (m)"^
. ^mdensity length width, at(size) xla yla^
OR
. ^mdensity length width, r(0,2000) b2(Length and width (m)) xla yla^
References
----------
Bowman, A.W. and Azzalini, A. 1997. Applied smoothing techniques for
data analysis: the kernel approach with S-Plus applications. Oxford:
Oxford University Press.
Plackett, R.L. 1971. An introduction to the theory of statistics.
Edinburgh: Oliver and Boyd.
Silverman, B.W. 1986. Density estimation for statistics and data analysis.
London: Chapman and Hall.
Simonoff, J.S. 1996. Smoothing methods in statistics. New York: Springer.
Wand, M.P. and Jones, M.C. 1995. Kernel smoothing. London: Chapman and Hall.
Author
------
Nicholas J. Cox, University of Durham, U.K.
n.j.cox@@durham.ac.uk
Acknowledgment
--------------
Patrick Royston, Tom Steichen and Fred Wolfe made helpful comments.
Also see
--------
Manual: ^[R] kdensity^
On-line: help for @kdensity@, @graph@, @numlist@