```
-------------------------------------------------------------------------------
help for paretofit          Stephen P. Jenkins & Philippe Van Kerm (April 2007)
-------------------------------------------------------------------------------

Fitting a Pareto (Type I) distribution by ML to unit record data

paretofit var [weight] [if exp] [in range] [, x0(scalar) avar(varlist)
stats poorfrac(#) cdf(cdfname) pdf(pdfname) robust cluster(varname)
from(#) level(#) nolog maximize_options ]

by and svy prefixes are allowed (but not jointly); see prefix.

fweights, aweights, pweights, and iweights are allowed; see weight.

Description

paretofit fits by ML a Pareto (Type I) distribution to sample
observations on a random variable var. Unit record data are assumed
(rather than grouped data).

The Pareto distribution is named after the Italian economist Vilfredo
Pareto (1848-1923). It is one of the most famous and widely studied
statistical size distributions.  It is well known for approximating
wealth distributions, but has applications in many different fields (e.g.
for size distributions of human settlements, sand particles, word
frequencies, or for assessing portfolio risk). See Kleiber and Kotz
(2003) for a comprehensive review of the Pareto (and other)
distributions.

The likelihood function for a sample of observations on var is specified
as the product of the densities for each observation (weighted where
relevant), and is maximized using ml model lf. A closed-form expression
for the ML estimator of the Pareto Type I shape parameter is readily
available, but estimation with ml model allows us to accommodate various
sample designs easily, as well as the inclusion of covariates.

Options

x0(scalar) specifies the scale parameter of the Pareto distribution (see
formula below). The Pareto distribution is fitted only to sample
observations where var>=x0.  By default, x0 is set to the minimum
value of var (within the sub-sample identified by if and in clauses).

avar(varlist) allows the user to specify the shape parameter of the
distribution as a function of the covariates specified in varlist. A
constant term is always included.

stats displays selected distributional statistics implied by the Pareto
parameter estimate:  quantiles, cumulative shares of total var at
quantiles (i.e. the Lorenz curve ordinates), the mode, mean, standard
deviation, variance, half the coefficient of variation squared, Gini
coefficient, and quantile ratios p90/p10, p75/p25. This option is not
available together with avar(varlist).

poorfrac(#) displays the estimated proportion with values of var less
than the cut-off specified by #. This option may be specified when
replaying results.

cdf(cdfname) creates a new variable cdfname containing the estimated
Pareto c.d.f. value F(x) for each x.

pdf(pdfname) creates a new variable pdfname containing the estimated
Pareto p.d.f. value f(x) for each x.

robust specifies that the Huber/White/sandwich estimator of variance is
to be used in place of the traditional calculation; see [U] 23.14
Obtaining robust variance estimates.  robust combined with cluster()
allows observations which are not independent within cluster
(although they must be independent between clusters).  pweights imply
robust.

cluster(varname) specifies that the observations are independent across
groups (clusters) but not necessarily within groups.  varname
specifies to which group each observation belongs; e.g.,
cluster(personid) in data with repeated observations on individuals.
See [U] 23.14 Obtaining robust variance estimates. cluster() can be
used with pweights to produce estimates for unstratified
cluster-sampled data. Use the svy prefix for full complex survey
design support. Specifying cluster() implies robust.

from(#) specifies a starting value for the maximum likelihood estimation.

level(#) specifies the confidence level, in percent, for the confidence
intervals of the coefficients; see help level.

nolog suppresses the iteration log.

maximize_options control the maximization process. The options available
are those shown by maximize. If you are seeing many "(not concave)"
messages in the iteration log, using the difficult or technique
options may help convergence.

Saved results

In addition to the usual results saved after ml, paretofit saves the
following, if no covariates have been specified and the relevant options
are used:

e(ba) is the estimated Pareto Type I shape parameter.

e(cdfvar) and e(pdfvar) are the variable names specified for the c.d.f.
and the p.d.f.

e(mode), e(mean), e(var), e(sd), e(i2), and e(gini) are the estimated
mode, mean, variance, standard deviation, half the squared coefficient of
variation, and Gini coefficient. e(pX) and e(LpX) are the quantiles and
Lorenz ordinates, where X = {1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 75,
80, 90, 95, 99}.

Formulae

The Pareto (Type I) distribution has cumulative distribution function
(c.d.f.)

F(x) = 1 - { x0 / x }^a

where a>0 is a shape parameter (estimated by paretofit), x0 is a scale
parameter, and x >= x0 > 0 is a random variable.  The smaller a is, the
heavier the right tail of the distribution.

The probability density function (p.d.f.) is

f(x) = a*(x0^a) / x^(a+1).

The formulae used to derive the distributional summary statistics
presented (optionally) are as follows. The r-th moment about the origin
is given by

a*(x0^r) / (a-r)

which exists only if r<a (Kleiber and Kotz, 2003, p. 70).  It follows
that

mean = a*x0 / (a-1)

variance = a*(x0^2) / [ (a-2)*(a-1)^2 ]

from which the standard deviation and half the squared coefficient of
variation can be derived. These three statistics are defined only where
a>2; the mean is defined only where a>1. The density is decreasing, so
the mode is simply

mode = x0.

The quantiles are derived by inverting the distribution function:

x_s = x0*(1-s)^(-1/a), for each 0 < s = F(x_s) < 1.

The median is therefore

median = x0*(2^(1/a)).

The Gini coefficient of inequality is given by

Gini = 1 / (2a - 1).

The Lorenz curve ordinates at each s = F(x_s) are given by

L(s) = 1 - (1 - s)^(1 - 1/a).

Examples

. paretofit x

. paretofit x [fw=wgt]

. paretofit

. paretofit x [aw=wgt] , x0(20)

. paretofit, stats poorfrac(100) x0(50)

. paretofit x, avar(age sex) x0(50)

Authors

Stephen P. Jenkins <stephenj@essex.ac.uk>, Institute for Social and
Economic Research, University of Essex, Colchester CO4 3SQ, U.K.

Philippe Van Kerm <philippe.vankerm@ceps.lu>, CEPS/INSTEAD, Differdange,
Luxembourg.

Reference

Kleiber, C. and Kotz, S. (2003).  Statistical Size Distributions in
Economics and Actuarial Sciences.  Hoboken, NJ: John Wiley.

Also see

Online: help for smfit, dagumfit, gb2fit, lognfit, hillp if installed.

```
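
The closed-form estimator mentioned in the Description, and the summary statistics listed under Formulae, can be checked numerically outside Stata. The following is a minimal Python sketch, not part of paretofit and not the authors' implementation. It assumes unweighted unit record data, no covariates, and the standard closed-form ML estimator a = n / sum(ln(x_i/x0)). The function names `pareto_shape_ml`, `pareto_stats`, `pareto_quantile`, and `pareto_lorenz` are hypothetical.

```python
import math


def pareto_shape_ml(xs, x0=None):
    """Closed-form ML estimate of the Pareto Type I shape parameter a.

    Assumption: unweighted data, no covariates. Observations below x0 are
    dropped, mirroring the var>=x0 rule described for the x0() option.
    Returns the pair (a, x0).
    """
    if x0 is None:
        x0 = min(xs)  # default described for x0(): the sample minimum
    kept = [x for x in xs if x >= x0]
    log_sum = sum(math.log(x / x0) for x in kept)
    if log_sum <= 0:
        raise ValueError("need at least one observation strictly above x0")
    return len(kept) / log_sum, x0


def pareto_quantile(s, a, x0):
    """Quantile x_s = x0*(1-s)^(-1/a) for 0 < s < 1."""
    return x0 * (1 - s) ** (-1 / a)


def pareto_lorenz(s, a):
    """Lorenz ordinate L(s) = 1 - (1-s)^(1 - 1/a); meaningful for a > 1."""
    return 1 - (1 - s) ** (1 - 1 / a)


def pareto_stats(a, x0):
    """Selected summary statistics implied by a and x0."""
    stats = {
        "mode": x0,
        "median": pareto_quantile(0.5, a, x0),
        # x0 cancels in quantile ratios: x_0.9 / x_0.1 = 9^(1/a)
        "p90_p10": pareto_quantile(0.9, a, x0) / pareto_quantile(0.1, a, x0),
    }
    if a > 1:  # first moment exists only if a > 1
        stats["mean"] = a * x0 / (a - 1)
        stats["gini"] = 1 / (2 * a - 1)
    if a > 2:  # second moment exists only if a > 2
        stats["var"] = a * x0 ** 2 / ((a - 2) * (a - 1) ** 2)
        stats["sd"] = math.sqrt(stats["var"])
        # half the squared coefficient of variation
        stats["i2"] = 0.5 * stats["var"] / stats["mean"] ** 2
    return stats
```

For example, `a, x0 = pareto_shape_ml(values)` followed by `pareto_stats(a, x0)` gives figures that could be compared with the output of `paretofit x, stats` on the same unweighted data. Statistics whose moments do not exist for the estimated a are simply omitted from the returned dictionary.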