{smcl}
{* NJC 10apr2025}{...}

{cmd:help lmomentsets}
{hline}

{title:Title}

{p 4 4 2}{bf:lmomentsets} {hline 2} L-moment-based measures collected as datasets{p_end}


{title:Syntax}

{phang}{it:Variables syntax}

{p 8 11 2}
{cmd:lmomentsets} {varlist} 
{ifin}
[
{cmd:,} 
{opt inclusive} 
{opt lmax(#)}
{opt saving(filespec)}
{it:list_options}
]

{phang}{it:Groups syntax}

{p 8 11 2}
{cmd:lmomentsets} {varname} 
{ifin}
{cmd:,} {opt over(groupvar)} 
[
{cmdab:t:otal}
{opt lmax(#)}
{opt saving(filespec)}
{it:list_options}
]


{title:Description}

{pstd}
{cmd:lmomentsets} computes L-moment-based measures and collects them into
datasets.  Compare {help lmoments} which is separate.          

{pstd}
There are two syntaxes. 

{p 8 8 2}{cmd:lmomentsets} {it:varlist} calculates results
for one or more variables {it:varlist}. This is called the 
{it:variables syntax}.  

{p 8 8 2}{cmd:lmomentsets} {it:varname}{cmd:,} {opt over(groupvar)} 
calculates results for one variable {it:varname} for each distinct value of
{it:groupvar}. This is called the {it:groups syntax}.  

{pstd}
An L-moments-based measures set consists of a temporary dataset consisting of some 
or occasionally all of the following variables. 

{p 4 4 2}
* {cmd:varname} is a string variable holding the name or names of the
variable(s) being summarized. 

{p 4 4 2}
* {cmd:varlabel} is a string variable holding the variable label of each
variable being summarized. If no variable label has been defined, 
its value is instead the variable name. 

{p 4 4 2}
* (Groups syntax only) {cmd:origgvar} is a numeric or string variable 
as specified in the {cmd:over()} option.

{p 4 4 2}
* (Groups syntax only) {cmd:groupvar} is a string variable holding the
name of the group variable specified in the {cmd:over()}
option. 

{p 4 4 2}
* (Groups syntax only) {cmd:gvarlabel} is a string variable holding the
variable label of the group variable {it:groupvar} specified in the 
{cmd:over()} option. If no variable label has been defined, 
the value is instead the variable name.

{p 4 4 2}
* (Groups syntax only) {cmd:group} is a numeric variable with value labels 
describing each distinct value of {it:groupvar}. Each such variable 
has integer values 1 up and value labels derived from the variable specified. 

{p 4 4 2}
* {cmd:n} is a numeric variable holding the number of observations used 
in the estimate. 

{p 4 4 2}
* Any or all of {cmd:l_1} upward, {cmd:t}, {cmd:t_3} upward. The number 
of such variables is determined by the {cmd:lmax()} option, which 
defaults to 4. Thus by default the variables produced are L-moments 1 2 3 4, 
{cmd:t} (= {cmd:l_2/l_1}), 
{cmd:t_3} (= {cmd:l_3/l_2}), 
{cmd:t_4} (= {cmd:l_4/l_2}). 


{title:Remarks}

{title:{it:Mainly motivational}} 

{pstd}Definitions here are by way of example and phrased using slightly 
idiosyncratic notation, as conventional subscripts, summation and integral 
signs are not possible in a help file. 

{pstd}Here A[], following Whittle (1970, 1992, 2000), denotes averaging 
and {bind:C[n, k]} ("n choose k"), following (e.g.) Hamming (1985) or Allenby 
and Slomson (2011), denotes the binomial coefficient or choice number 
{bind:n! / (n - k)! k!}. {bind:C[n, k]} corresponds to the Stata or Mata function 
{cmd:comb(n, k)}. As usual k! is k factorial, namely the product
{bind:k*(k-1)*...*2*1}. 

{pstd}Use of upper case L or lower case l for L-moments is a little
capricious, but may imply emphasis on general definition or specific
calculation from data respectively. 

{pstd}Order statistics x(j:k) from a subsample of size k from a variable 
x are just data values ordered such that
 
{pstd}x(1:k) <= x(2:k) <= ... <= x(k-1:k) <= x(k:k). 

{pstd}To give some flavour of L-moments, we consider three takes. _1
for example indicates that subscript 1 would be shown whenever possible. 

{pstd}{it:Take 1}

{pstd}The first L-moment L_1 is just the usual mean or average as a measure of 
level or location, in this notation L_1 = A[x(1:1)] where the average is taken 
over all C[n, 1] = n subsamples of size 1. 

{pstd}The second L-moment L_2 is half the average difference between the 
larger and smaller order statistics over subsamples of size 2, so half
that average difference over C[n, 2] such subsamples, or 
{bind:(1/2) A[x(2:2) - x(1:2)]}. The second moment is, apart from  halving, the 
measure often known as Gini's mean difference, which however predates Gini. 
It is a measure of spread or scale.

{pstd}The third L-moment L_3 is based on largest, middle, and smallest order 
statistics in subsamples of size 3 and is 
{bind:(1/3) A[x(3:3) - 2x(2:3) + x(1:3)]}. 
The averaging is over {bind:C[n, 3]} such subsamples. 
It is a measure of asymmetry or skewness. 

{pstd}As we proceed, verbal paraphrases become more awkward and no
easier to think about than direct notation. The fourth L-moment L_4 is 
based on the order statistics of subsamples of size 4: there are
{bind:C[n, 4]} such subsamples. It is 
{bind:(1/4) A[x(4:4) - 3x(3:4) + 3x(2:4) - x(1:4)]} and is a measure of tail 
weight or kurtosis. 

{pstd}L-moments all have the same units of measurement and dimensions as the 
original data. As already mentioned, it is often convenient to calculate 
dimensionless versions, most importantly L_2/L_1 =: t, L_3/L_2 =: t_3,
L_4/L_2 =: t_4. 

{pstd}The first four L-moments are the most useful and many projects use no 
others. But L-moments of any order k may be defined as 

{pstd}(1) linear combinations of the order statistics of a subsample of 
size k, with coefficients extending the pattern 

{space 4}1 
{space 4}1 -1 
{space 4}1 -2 1 
{space 4}1 -3 3 1 

{pstd}{c -} namely binomial coefficients alternately assigned positive and negative signs {c -}

{pstd}(2) averaged over all C[n, k] subsamples from a sample of size n and 

{pstd}(3) multiplied by prefactor (1/k). 

{pstd}This approach may help to give much flavour. So the L-moments are 
a series of measures of a sample, giving in turn indicators of level, 
spread, asymmetry, tail weight and yet further properties. However, it is 
utterly hopeless as a practical recipe for calculation, as the number of 
combinations to deal with explodes with even modest n and r. 

{pstd}{it:Take 2}

{pstd}A more abstract but still helpful view is that each L-moment is a 
weighted average over the quantile function x = Q(p) for probability p 
from 0 to 1. The weighting functions are W[p, k]

{pstd}1 =: W[p, 1]

{pstd}2p - 1 =: W[p, 2]

{pstd}6p^2 - 6p + 1 =: W[p, 3]

{pstd}20p^3 - 30p^2 + 12p - 1 =: W[p, 4]

{pstd}so that the kth L-moment is A[ Q(p) W[p, k] ]. 

{pstd}{it:Take 3}

{pstd}Take 2 has a practical equivalent in terms of each L-moment being 
calculated as a L-statistic, a weighted linear combination of the order 
statistics. Without delving into the precise recipe, each L-moment is an
L-statistic with form 

{pstd}L_k = A [ weight(k, j, n) x(j:n) ]

{pstd}where the weights depend on the L-moment being calculated, the
ranks j and the sample size n. 

{pstd}{cmd:lmoments_explain.do}, distributed with this package, contains
code for three explanatory graphs: (1) motivating the idea that
subsamples of size 1, 2, 3 and 4 contain information on level, spread,
(a)symmetry and tail weight; (2) showing weight functions continuous in
probability p; (3) showing weights used in calculating from a sample of
size 19, a size both small enough and large enough to make the idea
concrete. 

{title:{it:Comments on using the command}} 

{pstd}{cmd:lmomentsets} by default lists its results. Although saving to 
a permanent dataset is optional, that is the intended key to many  
useful applications. Either the results dataset is what 
is needed or it may be combined using {help append} or {help merge} 
with other such sets for further analysis. 

{pstd}The approach is thus one of providing a building block that may 
be useful directly or if combined with other building blocks. 
Flexibility is needed because so many different problems may be of interest, 
not just comparison of measures for different variables, or of measures 
for one variable for different groups, but also of measures for several 
variables and several groups; and so forth. 

{pstd}At first sight, such a results set may seem repetitious. 
With a little experience, you will see that such repetition is often 
helpful when combining such sets. In any case, you can always ignore what you 
do not need. Similarly, you can use {help rename} and {help replace}
as you wish downstream of this command.

{pstd}The {cmd:l_*} and {cmd:t_*} variables created by this command do not 
have defined variable labels. As always, you may wish to define your 
own variable labels, particularly for graphical purposes. Note that 
italic font and literal subscripts are available using {help smcl}. 

{pstd}Graphical and other applications lie downstream of this command,
although some suggestions are included in the examples. A plot of 
l_2 against l_1 (mean) could be used as a guide
to the structure of variability. A plot of t_4 against t_3 
is a standard for considering distribution shape. 

{pstd}Helper commands include {cmd:myaxis} to sort on some criterion 
(Cox 2021b) and {cmd:nicelabels} (Cox 2022) and {cmd:niceloglabels} (Cox 2018) 
for automating axis labels. 

{pstd}The approach to correlation confidence intervals of Cox (2008) is 
broadly similar. See also {help cisets}, {help momentsets} or {help pctilesets} if installed. 

{title:{it:Leads to the literature}} 

{pstd}Hosking (1990) is the definitive paper. Although Hosking and Wallis (1997) is largely 
focused on hydrological applications, it contains much material applicable very broadly. 
Hosking (1992), Royston (1992), Vogel and Fennessey (1993) and Wang (1996) are short and/or 
non-technical papers that in various ways explain why you should find L-moments interesting and useful. 


{title:Options}

{it:Options allowed with either syntax}

{phang}
{cmd:lmax()} indicates the highest L-moment to be calculated. The default is 4.

{phang}
{it:list_options} are any options of {help list} other than {cmd:noobs} that 
may be specified to tune listing of the results set. 

{phang}
{opt saving(filespec)} specifies saving the results set to a file as a 
Stata dataset. The suboption {cmd:, replace} must be specified to overwrite
an existing dataset. 


{it:Option allowed with the variables syntax}

{phang}
{opt inclusive} may be specified if you wish to work with several variables 
together. By default, calculations are only made with observations that
have non-missing values for all variables specified. This option
overrides that default selection: hence for several variables which
observations with non-missing values are used will be determined
separately for each variable. In other jargon, this option triggers
casewise deletion, not listwise deletion or complete case analysis. 
As a convenience for people familiar with that term, or with other 
syntax used to this effect, {cmd:cw} and {cmd:allobs} are allowed 
as synonyms. 


{it:Option compulsory with the groups syntax}

{phang}
{opt over(groupvar)} must be specified to name the group variable. 
Distinct groups of observations on {it:groupvar} will be used 
to produce separate results for the main variable specified. 


{it:Option allowed with the groups syntax}

{phang}
{opt total} may be used with {opt over(groupvar)}. It specifies that in addition
to output for each group, output be added for all groups combined.


{title:Examples}

{phang}{cmd:. sysuse auto, clear}{p_end}
{phang}{cmd:. lmomentsets mpg, over(foreign)}{p_end}

{phang}* For this dataset, see Mardia {it:et al.} (1979, 2024){p_end}
{phang}{cmd:. use https://www.stata-journal.com/software/sj20-2/pr0046_1/mathsmarks, clear}{p_end}
{phang}{cmd:. lmomentsets *}{p_end}

{phang}* For this dataset and some Stata uses: see Hosking and Wallis (1997) and Cox (2010, 2021a){p_end}
{phang}{cmd:. use https://www.stata-journal.com/software/sj10-4/gr0046/windspeed.dta, clear}{p_end}
{phang}{cmd:. lmomentsets windspeed, over(place) saving(foo, replace)}{p_end}
{phang}{cmd:. use foo, clear}{p_end}
{phang}{cmd:. gen where = cond(l_1 < 51, 3, 9)}{p_end}
{phang}{cmd:. scatter l_2 l_1, mla(group) mlabvpos(where) name(LMO1, replace)}{p_end}
{phang}{cmd:. replace where = cond(t_3 < 0.25, 3, 9)}{p_end}
{phang}{cmd:. scatter t_4 t_3, mla(group) mlabvpos(where) name(LMO2, replace)}{p_end}
{phang}{cmd:. graph combine LMO1 LMO2}{p_end}


{title:Author}

{p 4 4 2}Nicholas J. Cox, Durham University{break}
n.j.cox@durham.ac.uk


{title:References}

{phang}
Allenby, R. B. J. T., and A. Slomson. 2011.
{it:How to Count: An Introduction to Combinatorics}. 
Boca Raton, FL: CRC Press.

{phang}Cox, N. J. 2008. 
Speaking Stata: Correlation with confidence, or Fisher's z revisited.
{it:Stata Journal} 8: 413{c -}439. 

{phang}Cox, N. J. 2010.
Speaking Stata: Graphing subsets. 
{it:Stata Journal} 10: 670{c -}681.

{phang}Cox, N. J. 2018. 
Speaking Stata: Logarithmic binning and labeling.
{it:Stata Journal} 18: 262{c -}286. 

{phang}Cox, N. J. 2021a. 
Speaking Stata: Front-and-back plots to ease spaghetti and paella problems. 
{it:Stata Journal} 21: 539{c -}554.

{phang}Cox, N. J. 2021b. 
Speaking Stata: Ordering or ranking groups of observations.
{it:Stata Journal} 21: 818{c -}837. 

{phang}Cox, N. J. 2022.       
Speaking Stata: Automating axis labels: Nice numbers and transformed scales. 
{it:Stata Journal} 22: 975{c -}995. 

{phang}
Hamming, R. W. 1985.
{it:Methods of Mathematics Applied to Calculus, Probability, and Statistics}.
Englewood Cliffs, NJ: Prentice-Hall.

{phang}Hosking, J. R. M. 1990. L-moments: Analysis and estimation of
distributions using linear combinations of order statistics.
{it:Journal of the Royal Statistical Society} Series B 52: 105{c -}124.

{phang}Hosking, J. R. M. 1992. Moments or L-moments? 
An example comparing two measures of distributional shape.
{it:American Statistician} 46: 186{c -}189. 

{phang}Hosking, J. R. M. 2006.  
On the characterization of distributions by their L-moments. 
{it:Journal of Statistical Planning and Inference}
136: 193{c -}198.  

{phang}Hosking, J. R. M. and N. Balakrishnan. 2015. 
A uniqueness result for L-estimators, with applications to L-moments. 
{it:Statistical Methodology} 24: 69{c -}80. 

{phang}Hosking, J. R. M. and J. R. Wallis. 1997. 
{it:Regional Frequency Analysis: An Approach Based on L-Moments.} 
Cambridge: Cambridge University Press. 

{phang}Jones, M. C. 2004. 
On some expressions for variance, covariance, skewness and L-moments. 
{it:Journal of Statistical Planning and Inference} 
126: 97{c -}106. 

{phang}Mardia, K. V., J. T. Kent and J. M. Bibby. 1979. 
{it:Multivariate Analysis.} London: Academic Press. 

{phang}Mardia, K. V., J. T. Kent and C. C. Taylor. 2024. 
{it:Multivariate Analysis.} Hoboken, NJ: John Wiley. 

{phang}Royston, P. 1992. 
Which measures of skewness and kurtosis are best? 
{it:Statistics in Medicine} 11: 333{c -}343. 

{phang}Serfling, R. and P. Xiao. 2007. 
A contribution to multivariate L-moments: L-comoment matrices. 
{it:Journal of Multivariate Analysis} 98: 1765{c -}1781.  

{phang}Vogel, R. M. and N. M. Fennessey. 1993. 
L-moment diagrams should replace product moment diagrams. 
{it:Water Resources Research} 29: 1745{c -}1752. 
 
{phang}Vogel, R. M., S. M. Papalexiou, J. R. Lamontagne and F. C. Dolan. 
2024. When heavy tails disrupt statistical inference. 
{it:The American Statistician} 1{c -}15. 
{browse "https://doi.org/10.1080/00031305.2024.2402898":https://doi.org/10.1080/00031305.2024.2402898}

{phang}Wang, Q. J. 1996. Direct sample estimators of L-moments. 
{it:Water Resources Research} 32: 2617{c -}2619. 

{phang}
Whittle, P. 1970. 
{it:Probability.} Harmondsworth: Penguin. 

{phang}
Whittle, P. 1992. 
{it:Probability via Expectation.} 3rd ed. 
New York: Springer. 

{phang}
Whittle, P. 2000.
{it:Probability via Expectation}. 4th ed.
New York: Springer. 


{title:Also see}

{p 4 4 2}help for {help lmoments}