Title
fastgini -- Fast algorithm for calculating the Gini coefficient and its jackknife standard errors
Syntax
fastgini varname [if] [in] [weight] [, bin(#) jk Level(#) nocheck]
pweights and fweights are allowed; see weight.
Description
fastgini calculates the Gini coefficient for either unit-level or aggregated data. Optionally, it returns jackknife estimates of the standard error. fastgini uses a fast, optimized algorithm that can be especially useful when calculating the Gini coefficient and its standard errors for large samples. The command implements algorithms for both exact and approximate calculation of the Gini coefficient.
Options
Main
bin(#) set the number of bins. Specifying this option can dramatically reduce computation time on large datasets (1M+ observations). When bin(#) is specified, fastgini uses an approximation algorithm to calculate the Gini coefficient. With a sufficient number of bins, the approximation can reach any desired level of precision. For example, on a dataset of 1,000,000 observations, bin(100000) will in most cases reproduce the exact Gini value to machine precision, while requiring significantly less computing time than exact estimation on the whole sample.
jk estimate the jackknife (leave-one-out) standard error of the Gini coefficient. The estimates are computed efficiently in only two passes through the data: one for the Gini coefficient itself and one for the standard errors.
level(#) set confidence level for the reported jackknife confidence intervals; default is level(95).
nocheck by default, non-positive values of varname are excluded from the Gini calculation. Specifying nocheck skips this value check and also ignores any if and in conditions. The option can speed up execution when fastgini is called inside a loop.
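To make concrete what the jk option estimates, here is a minimal language-neutral sketch in Python of a leave-one-out jackknife standard error for unweighted data. It is a naive version that recomputes the Gini once per dropped observation; it is not the two-pass method fastgini uses, and the function names gini and jackknife_se are hypothetical.

```python
import math


def gini(values):
    """Unweighted Gini coefficient from sorted running sums."""
    xs = sorted(values)
    cum = 0.0    # running sum of x up to the current observation
    numer = 0.0  # sum of (running sum - x/2)
    for x in xs:
        cum += x
        numer += cum - x / 2.0
    return 1.0 - 2.0 * numer / (cum * len(xs))


def jackknife_se(values):
    """Naive leave-one-out jackknife standard error of the Gini."""
    values = list(values)
    n = len(values)
    # One replicate per observation: the Gini with that observation dropped.
    reps = [gini(values[:i] + values[i + 1:]) for i in range(n)]
    mean_rep = sum(reps) / n
    # Standard jackknife variance: (n-1)/n * sum of squared deviations
    # of the replicates from their mean.
    var = (n - 1) / n * sum((g - mean_rep) ** 2 for g in reps)
    return math.sqrt(var)
```

This naive version does one sort per replicate, which is why a method that needs only two passes through the data matters on large samples.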
Saved Results
fastgini saves in r():
r(gini) calculated Gini coefficient;
If the jk option is specified:
r(se) jackknife estimate for the standard error of the Gini;
r(mse) jackknife estimate for the mean standard error of the Gini;
r(gini_jk) jackknife estimate for the Gini.
Remarks
fastgini uses the formula:
                    i=N        j=i
                    SUM W_i * (SUM W_j*X_j - W_i*X_i/2)
                    i=1        j=1
    G = 1 - 2 * -------------------------------------------
                        i=N             i=N
                        SUM W_i*X_i  *  SUM W_i
                        i=1             i=1
where observations are sorted in ascending order of X.
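As an illustration, the formula can be evaluated in a single pass over the sorted data by keeping a running sum of W_j*X_j. The Python sketch below is language-neutral and is not fastgini's own code; the function name gini and its arguments are hypothetical.

```python
def gini(values, weights=None):
    """Weighted Gini coefficient using the formula G = 1 - 2*num/den."""
    if weights is None:
        weights = [1.0] * len(values)
    # Sort observations in ascending order of X, keeping weights paired.
    pairs = sorted(zip(values, weights))
    cum_wx = 0.0  # running SUM W_j*X_j for j = 1..i
    numer = 0.0   # SUM W_i*(cum_wx_i - W_i*X_i/2)
    sum_w = 0.0   # SUM W_i
    for x, w in pairs:
        cum_wx += w * x
        numer += w * (cum_wx - w * x / 2.0)
        sum_w += w
    # After the loop, cum_wx equals the total SUM W_i*X_i.
    return 1.0 - 2.0 * numer / (cum_wx * sum_w)
```

For example, gini([1, 2, 3, 4]) evaluates to 0.25, and a frequency weight of 2 on one observation gives the same result as listing that observation twice.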
If bin(M) is specified, the data are aggregated into M equal-width bins, i.e.
    ~
    X_i = X_min + i*binsize,     binsize = (X_max - X_min)/M

    ~                    ~                ~
    W_i = SUM W_j    (if X_(i-1) <= X_j < X_i),    i = 1..M
           j
and the Gini coefficient is then calculated from the aggregated data.
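The aggregation step can be sketched as follows, again in Python as a language-neutral illustration and not as fastgini's implementation. The function name binned_gini is hypothetical, and placing the maximum value in the last bin is an assumption made here so that no observation is dropped.

```python
def binned_gini(values, m, weights=None):
    """Approximate Gini after aggregating the data into m equal-width bins."""
    if weights is None:
        weights = [1.0] * len(values)
    x_min, x_max = min(values), max(values)
    binsize = (x_max - x_min) / m
    if binsize == 0:
        return 0.0  # all values identical: perfect equality
    # Accumulate the total weight falling into each bin.
    bin_w = [0.0] * m
    for x, w in zip(values, weights):
        # Assumption: the maximum value is assigned to the last bin.
        k = min(int((x - x_min) / binsize), m - 1)
        bin_w[k] += w
    # Apply the Gini formula to the aggregated data, representing
    # bin i by its upper edge X_min + i*binsize (bins are already sorted).
    cum_wx = numer = sum_w = 0.0
    for i, w in enumerate(bin_w, start=1):
        x_tilde = x_min + i * binsize
        cum_wx += w * x_tilde
        numer += w * (cum_wx - w * x_tilde / 2.0)
        sum_w += w
    return 1.0 - 2.0 * numer / (cum_wx * sum_w)
```

The cost of the second loop depends on M and not on the number of observations, and no sort is needed, which is where the time saving on large datasets comes from. Increasing M shrinks the bins and brings the result closer to the exact value.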
Examples
. fastgini pc_exp
. fastgini income [w=weight], jk
. fastgini income [w=weight], bin(10000)
Author
Zurab Sajaia, DECRG-PO SDG, The World Bank, zsajaia@worldbank.org
References
Karagiannis, E. and M. Kovačević (2000), "A Method to Calculate the Jackknife Variance Estimator for the Gini Coefficient", Oxford Bulletin of Economics and Statistics, Vol. 62, Issue 1, 119-122.
Also see
Online: jackknife
Links to user-written programs: inequal7, egen_inequal, mm_gini(), ineqerr, ineqdeco, ineqdec0