.-
help for ^mdensity^                                (manual:  ^[R] kdensity^)
.-

Univariate kernel density estimation, for one or more variables or groups 
-------------------------------------------------------------------------

        ^mdensity^ varname [weight] [^if^ exp] [^in^ range] 
        [ , ^at(^varx^) r^ange^(^min^,^max^) n(^#^) w^idth^(^numlist^)^  
        ^by(^byvar^) miss^ing ^z^ero 
        { ^log^ | ^logit^ } ^a(^#^) b(^#^)^   
        { ^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle }
        ^i^ntensity 
        graph_options ]

        ^mdensity^ varlist [weight] [^if^ exp] [^in^ range] 
        [, ^at(^varx^) r^ange^(^min^,^max^) n(^#^) w^idth^(^numlist^) z^ero 
        { ^log^ | ^logit^ } ^a(^#^) b(^#^)^   
        { ^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle }
        ^i^ntensity
        graph_options ]


^fweight^s and ^aweight^s are allowed; see help @weights@.


Description
-----------

^mdensity^ produces kernel density estimates for one or more variables or 
groups and graphs the result.

^mdensity^ is a wrapper for ^kdensity^, which is called in turn for each 
variable or group specified. See help for @kdensity@. 


Remarks
-------

The probability density f(x) of a continuous variable x has the units and 
dimensions of the reciprocal of x. If x is measured in metres, f(x) has 
units 1 / metre. The density is thus not measured on a probability scale
and, for example, it is possible for f(x) to exceed 1. 
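
As a worked example, suppose that x is uniformly distributed between 0 and 
0.5 metres. Then f(x) = 1 / 0.5 = 2 per metre on that interval, which exceeds 
1, while the total probability remains 2 * 0.5 = 1. Measuring the same 
lengths in centimetres gives f(x) = 1 / 50 = 0.02 per centimetre. 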

The transformation and back transformation procedure obtained by the ^log^ 
and ^logit^ options is mentioned briefly by Silverman (1986, pp.27-30), 
although his worked example (p.28) is not very encouraging. Good expositions 
are given by Wand and Jones (1995, pp.43-45), Simonoff (1996, pp.61-64) and 
Bowman and Azzalini (1997, pp.14-16). The underlying principle is that for 
a continuous monotone transformation t(x), the densities f(x) and f(t(x)) 
are related by f(x) = f(t(x)) |dt/dx|. See Plackett (1971, pp.71-72) for 
further mathematical details.
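
The principle can be tried by hand with ^kdensity^. What follows is a minimal 
sketch for t(x) = log x, for which |dt/dx| = 1 / x. It is not a description 
of how ^mdensity^ is coded. It assumes that the auto data are in memory; the 
names logprice, lx, flx, x and fx are arbitrary; and the ^graph^ call follows 
the syntax used in the Examples below. 

        . ^gen logprice = log(price)^
        . ^kdensity logprice, nograph gen(lx flx)^
        . ^gen x = exp(lx)^
        . ^gen fx = flx / x^
        . ^graph fx x, c(l) s(i) sort^

The result should be comparable with that of ^mdensity price, log^, although 
the evaluation points and the width used may differ. 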


Options
-------

^at(^varx^)^ specifies a variable that contains the values at which the 
    density should be estimated.   
    
^n(^#^)^ specifies the number of points at which the density estimate is to
    be evaluated. ^n( )^ only applies if ^at( )^ is not specified. The default 
    is min(_N,50). The points are values of a constructed variable. If 
    ^range(^min^,^max^)^ is specified, it ranges from min to max. Otherwise, 
    it ranges from the minimum of the data to the maximum of the data. This 
    behaviour differs from that of ^kdensity^. 

^range(^min^,^max^)^ specifies the minimum and maximum values for which 
    density is estimated. ^range( )^ only applies if ^at( )^ is not specified. 
    See ^n( )^ above. 
    
^width(^numlist^)^ specifies the halfwidth(s) of the kernel, the width(s) of 
    the density window(s) around each point.  
    
    If ^width( )^ is not specified, or if ^width(0)^ is specified, then 
    the "optimal" width is used; see ^[R] kdensity^. That optimal width 
    is determined separately for each group or variable. Note that for 
    multimodal and highly skewed densities, the "optimal" width is 
    usually too wide and oversmooths the density. 

    If ^width( )^ is specified, any single number is used for all variables 
    or groups; several numbers will be used in turn for variables or groups, 
    any excess of numbers being ignored and any deficiency of numbers being 
    made up by ^0^ repeated. If ^log^ or ^logit^ is also specified, widths 
    should be on a natural logarithm or logit scale.

    Thus ^mdensity mpg mpg mpg, width(1/3)^ shows the effects of 
    using different widths ^1 2 3^ on the same data. 
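
    As a further sketch, assuming that the auto data are in memory, the 
    rules above imply that the first command below uses width ^500^ for 
    every class of ^foreign^, while the second uses ^500^ for the first 
    class taken and the "optimal" width, signalled by ^0^, for the next. 

        . ^mdensity price, by(foreign) width(500)^
        . ^mdensity price, by(foreign) width(500 0)^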

    See help for @numlist@ for further details on numlists. 
   
^by(^byvar^)^ specifies that calculations are to be carried out separately 
    for each class defined by a single variable byvar. The graph will, 
    however, show the functions for all classes. ^by( )^ is only allowed 
    with a single varname.

^missing^, used only with ^by( )^, permits the use of non-missing values
    of varname corresponding to missing values for the variable named by
    ^by( )^. The default is to ignore such values.

^zero^ specifies that densities estimated as zero are to be shown as such. 
    The default is not to show such values.

^log^ specifies that estimation is to be carried out on logarithms of the data 
    and inverted. That is, for a density f(x),   

    estimate of f(x) = estimate of f(log x) * (1 / x),  

    given that 1 / x = d/dx (log x). See Remarks above. This method is 
    appropriate only for data that are all positive. In particular, if data 
    are right skewed, it smooths more in the tail and less near the main 
    part of the distribution than the default method.
    
^logit^ specifies that estimation is to be carried out on logits of the data 
    and inverted. That is, for a density f(x),   

    estimate of f(x) = estimate of f(logit x) * (b - a) / ((x - a)(b - x)),  

    where logit x = log ((x - a) / (b - x)), a slight generalisation of the 
    usual definition. Note that (b - a) / ((x - a)(b - x)) = 
    d/dx (logit x). See Remarks above. This method is appropriate only for 
    data that are between a and b. See also just below. 

^a(^#^)^ and ^b(^#^)^ set the constants a and b in the definition of the logit 
    function above. By default ^a^ is 0 and ^b^ is 1, giving the usual logit, 
    i.e. log (x / (1 - x)). 
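
    For example, a variable recorded as a percentage lies between 0 and 
    100. As a sketch, assuming a hypothetical variable pct whose values 
    all lie strictly between those limits, the call might be 

        . ^mdensity pct, logit a(0) b(100)^

    so that logit x = log (x / (100 - x)). For a hypothetical proportion 
    propn strictly between 0 and 1, the defaults suffice: 

        . ^mdensity propn, logit^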

^log^ and ^logit^ may not be specified together.     
  
^biweight^, ^cosine^, ..., ^triangle^ specify the kernel.  (Actually, ^cosine^
    specifies the cosine trace as there is no such thing as a cosine kernel.)
    By default, ^epan^, meaning the Epanechnikov kernel, is used. 

^intensity^ specifies that the data are from a point process in one 
    dimension (e.g. time or space) and that the intensity function 
    (e.g. frequency per unit time or space) is being estimated. Results 
    will be shown on an intensity scale, as `density' multiplied by 
    the number of observed data points. 
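
    As a worked example, if 200 events were observed and the estimated 
    density at some time is 0.01 per day, the intensity shown at that time 
    is 200 * 0.01 = 2 events per day. A sketch of a call, assuming a 
    hypothetical variable etime holding the times of events: 

        . ^mdensity etime, intensity^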
    
graph_options are any options allowed with ^graph, twoway^; see help
    @grtwoway@. With ^by( )^ and several groups, each group will be 
    shown graphically as if it were a distinct variable. 


Saved results
-------------

^r(widths)^ contains the widths used for smoothing each density.

^r(scales)^ contains the bin widths used. 

^r(ns)^ contains the numbers of points at which the density was 
evaluated. 
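
The saved results can be inspected with ^return list^ immediately after 
^mdensity^. A minimal sketch, assuming that the auto data are in memory: 

        . ^mdensity price, by(foreign)^
        . ^return list^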


Examples
--------

        . ^mdensity price, by(foreign)^ 

        . ^mdensity mpg mpg mpg, w(1/3) l1(Density with width 1 2 3)^
          ^b2(Mileage (mpg)) sy(iii) c(sss)^ 

        . ^range size 0 2000^
        . ^label var size "Length and width (m)"^ 
        . ^mdensity length width, at(size) xla yla^

        OR 

        . ^mdensity length width, r(0,2000) b2(Length and width (m)) xla yla^ 


References
----------

Bowman, A.W. and Azzalini, A. 1997. Applied smoothing techniques for 
data analysis: the kernel approach with S-Plus applications. Oxford: 
Oxford University Press.

Plackett, R.L. 1971. An introduction to the theory of statistics. 
Edinburgh: Oliver and Boyd.

Silverman, B.W. 1986. Density estimation for statistics and data analysis. 
London: Chapman and Hall. 

Simonoff, J.S. 1996. Smoothing methods in statistics. New York: Springer. 

Wand, M.P. and Jones, M.C. 1995. Kernel smoothing. London: Chapman and Hall. 


Author
------

         Nicholas J. Cox, University of Durham, U.K.
         n.j.cox@@durham.ac.uk


Acknowledgment
--------------

         Patrick Royston, Tom Steichen and Fred Wolfe made helpful comments.


Also see
--------

 Manual:  ^[R] kdensity^
On-line:  help for @kdensity@, @graph@, @numlist@