Univariate kernel density estimation, for one or more variables or groups -------------------------------------------------------------------------
^mdensity^ varname [weight] [^if^ exp] [^in^ range] [ , ^at(^varx^) r^ange^(^min^,^max^) n(^#^) w^idth^(^numlist^)^ ^by(^byvar^) miss^ing ^z^ero { ^log^ | ^logit^ } ^a(^#^) b(^#^)^ { ^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle } ^i^ntensity graph_options ]
^mdensity^ varlist [weight] [^if^ exp] [^in^ range] [, ^at(^varx^) r^ange^(^min^,^max^) n(^#^) w^idth^(^numlist^) z^ero { ^log^ | ^logit^ } ^a(^#^) b(^#^)^ { ^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle } ^i^ntensity graph_options ]
^fweight^s and ^aweight^s are allowed; see help @weights@.
Description -----------
^mdensity^ produces kernel density estimates for one or more variables or groups and graphs the result.
^mdensity^ is a wrapper for ^kdensity^, which is called in turn for each variable or group specified. See help for @kdensity@.
Remarks -------
The probability density f(x) of a continuous variable x has the units and dimensions of the reciprocal of x. If x is measured in metres, f(x) has units 1 / metre. The density is thus not measured on a probability scale and, for example, it is possible for f(x) to exceed 1.
The transformation and back transformation procedure obtained by the ^log^ and ^logit^ options is mentioned briefly by Silverman (1986, pp.27-30), although his worked example (p.28) is not very encouraging. Good expositions are given by Wand and Jones (1995, pp.43-45), Simonoff (1996, pp.61-64) and Bowman and Azzalini (1997, pp.14-16). The underlying principle is that for a continuous monotone transformation t(x), the densities f(x) and f(t(x)) are related by f(x) = f(t(x)) |dt/dx|. See Plackett (1971, pp.71-72) for further mathematical details.
Options -------
^at(^varx^)^ specifies a variable that contains the values at which the density should be estimated. ^n(^#^)^ specifies the number of points at which the density estimate is to be evaluated. ^n( )^ only applies if ^at( )^ is not specified. The default is min(_N,50). ^n( )^ applies to a constructed variable. If ^range(^min^,^m > ax^)^ is specified, it ranges from min to max. Otherwise, it ranges from the minimum of the data to the maximum of the data. This behaviour differs from that of ^kdensity^.
^range(^min^,^max^)^ specifies the minimum and maximum values for which density is estimated. ^range( )^ only applies if ^at( )^ is not specified. See ^n( )^ above. ^width(^numlist^)^ specifies the halfwidth(s) of the kernel, the width(s) of the density window(s) around each point. If ^width( )^ is not specified, or if ^width(0)^ is specified, then the "optimal" width is used; see ^[R] kdensity^. In addition, that optimal width will be determined separately for each group or variable. In fact, for multimodal and highly skewed densities, the "optimal" is usually too wide and oversmooths the density.
If ^width( )^ is specified, any single number is used for all variables or groups; several numbers will be used in turn for variables or groups, any excess of numbers being ignored and any deficiency of numbers being made up by ^0^ repeated. If ^log^ or ^logit^ is also specified, widths should be on a natural logarithm or logit scale.
Thus ^mdensity mpg mpg mpg, width(1/3)^ shows the effects of using different widths ^1 2 3^ on the same data.
See help for @numlist@ for further details on numlists. ^by(^byvar^)^ specifies that calculations are to be carried out separately for each class defined by a single variable byvar. The graph will, however, show the functions for all classes. ^by( )^ is only allowed with a single varname.
^missing^, used only with ^by( )^, permits the use of non-missing values of varname corresponding to missing values for the variable named by ^by( )^. The default is to ignore such values.
^zero^ specifies that densities estimated as zero are to be shown as such. The default is not to show such values.
^log^ specifies that estimation is be carried out on logarithms of the data and inverted. That is, for a density f(x),
estimate of f(x) = estimate of f(log x) * (1 / x),
given that 1 / x = d/dx (log x). See Remarks above. This method is appropriate only for data that are all positive. In particular, if data are right skewed, it smooths more in the tail and less near the main part of the distribution than the default method. ^logit^ specifies that estimation is be carried out on logits of the data and inverted. That is, for a density f(x),
estimate of f(x) = estimate of f(logit x) * (b - a) / ((x - a)(b - x)),
where logit x = log ((x - a) / (b - x)), a slight generalisation of the usual definition. Note that (b - a) / ((x - a)(b - x)) = d/dx (logit x). See Remarks above. This method is appropriate only for data that are between a and b. See also just below.
^a(^#^)^ and ^b(^#^)^ tune constants in the definition of the logit function above. By default ^a^ is 0 and ^b^ is 1, giving the usual logit, i.e. log (x / (1 - x)).
^log^ and ^logit^ may not be specified together. ^biweight^, ^cosine^, ..., ^triangle^ specify the kernel. (Actually, ^cosine^ specifies the cosine trace as there is no such thing as a cosine kernel.) By default, ^epan^, meaning the Epanechnikov kernel, is used.
^intensity^ specifies that the data are from a point process in one dimension (e.g. time or space) and that the intensity function (e.g. frequency per unit time or space) is being estimated. Results will be shown on an intensity scale, as `density' multiplied by number of observed data points. graph_options are any options allowed with ^graph, twoway^; see help @grtwoway@. With ^by( )^ and several groups, each group will be shown graphically as if it were a distinct variable.
Saved results -------------
^r(widths)^ contains the widths used for smoothing each density.
^r(scales)^ contains the bin widths used.
^r(ns)^ contains the numbers of points at which the density was evaluated.
Examples --------
. ^mdensity price, by(foreign)^
. ^mdensity mpg mpg mpg, w(1/3) l1(Density with width 1 2 3)^ ^b2(Mileage (mpg)) sy(iii) c(sss)^
. ^range size 0 2000^ . ^label var size "Length and width (m)"^ . ^mdensity length width, at(size) xla yla^
OR
. ^mdensity length width, r(0,2000) b2(Length and width (m)) xla yla^
References ----------
Bowman, A.W. and Azzalini, A. 1997. Applied smoothing techniques for data analysis: the kernel approach with S-Plus applications. Oxford: Oxford University Press.
Plackett, R.L. 1971. An introduction to the theory of statistics. Edinburgh: Oliver and Boyd.
Silverman, B.W. 1986. Density estimation for statistics and data analysis. London: Chapman and Hall.
Simonoff, J.S. 1996. Smoothing methods in statistics. New York: Springer.
Wand, M.P. and Jones, M.C. 1995. Kernel smoothing. London: Chapman and Hall.
Author ------
Nicholas J. Cox, University of Durham, U.K. n.j.cox@@durham.ac.uk
Acknowledgment --------------
Patrick Royston, Tom Steichen and Fred Wolfe made helpful comments.
Also see --------
Manual: ^[R] kdensity^ On-line: help for @kdensity@, @graph@, @numlist@