{smcl}
{* 6/8/2017}{...}
{hi:help mcib}
Version 5.1, June 14, 2019.
{hline}
{title:Title}
{pstd}{hi:mcib} {hline 2} Mean-constrained Integration over Brackets (MCIB) estimator for grouped income data.
This program implements the method described in Jargowsky and Wheeler (2018), "Estimating Income Statistics from Grouped Data:
Mean-Constrained Integration over Brackets."
{title:Syntax}
{p 8 16 2} {cmd:mcib} {it:count} {it:lower} {it:upper} [{it:if}], {opt m:ean(mean)}|{opt twop:oint} [{it:options}]
{p_end}
{synoptset 25 tabbed}{...}
{marker opt}{synopthdr:options}
{synoptline}
{synopt :{opt by:(idvar)}} Specifies id variable for units, e.g. metropolitan areas.
{p_end}
{synopt :{opt uni:form(none|first|belowmed)}} Brackets in which to require uniform distribution.
Allowable values are none, first (the default), belowmed (all brackets below median).
{p_end}
{synopt :{opt pare:to(top|toptwo|abovemed)}} Brackets in which to use Pareto distribution.
Allowable values are top (the default), toptwo, or abovemed (all brackets above the median).
{p_end}
{synopt :{opt part:s(#)}} Number of parts to divide brackets for calculating Gini.
Default is 5.
{p_end}
{synopt :{opt mina:lpha(#)}} Minimum value for alpha; default is 2.
{p_end}
{synopt :{opt s:aving(filename)}} Save results to specified file.
{p_end}
{synopt :{opt replace: }} Replace results dataset on disk if it already exists.
{p_end}
{synopt :{opt l:ist}} Lists results on screen. This is the default if neither saving() nor keep is specified.
{p_end}
{synopt :{opt keep: }} Discard original data (destructive) and keep results in memory.
{p_end}
{title:Description}
{pstd} {cmd:mcib} implements the mean-constrained integration over brackets (MCIB) estimator
described in Jargowsky and Wheeler (2018) to estimate the standard deviation and other
parameters of an income distribution from summarized data in brackets or bins. Prior research
typically used midpoints of the brackets and Pareto extrapolation in the open-ended top bracket
(Henson 1967; Cloutier 1988). Von Hippel et al. (2016) presented two improved methods: the Robust
Pareto Midpoint Estimator (RPME) and the multimodel generalized beta estimator (MGBE).
This method, MCIB, estimates integrals of the desired statistics over the income brackets.
By default, the density of the lowest bracket is assumed to be uniform,
the intermediate brackets are assumed to have a sloping linear density function,
and the open-ended top bracket is assumed to follow a Pareto distribution.
Testing with PUMS data showed MCIB with the mean() option to be more accurate
than all previous methods (see Jargowsky and Wheeler 2018).
{pstd} The following
statistics are estimated: variance, standard deviation, various percentiles, the coefficient of
variation (cov), Gini, Theil, the ratio of P90/P10, the interquartile range, and the shares of
income going to each income quintile. If the twopoint option is specified, the mean is also estimated,
at the expense of some accuracy on other statistics. Results are displayed, saved to a file,
or kept in memory, replacing the original data, depending on the options specified.
{pstd} {cmd:mcib} assumes that the data are "grouped" (a.k.a. binned, bracketed, interval-censored)
so that each row reports how many individuals or households have values in the interval (lower,upper).
Grouped data are commonly used to summarize distributions of income or wealth
across individuals, families, and households.
{title:Data Preparation}
{pstd}The data must contain one row per income bracket (a.k.a. bin), with a count variable
specifying the number of households (persons, etc.) in each bracket.
The values representing the lower and upper bounds of the income brackets must also be
specified.
The mean income for the area should be specified if available in the mean() option;
doing so will greatly improve the accuracy of the estimates. If the mean is not available,
specify the twopoint option (Cloutier 1988).
When there are multiple
areas (e.g. metropolitan areas, states, nations), there must also be an id variable;
include it in the by() option.
{pstd}Reshape can be used to put data in the proper format. For example, the data pumstest.dta has
one row per metropolitan area, and the counts of households by income brackets are in a series
of variables hhs1-hhs16. The data in pumstest.dta look like this:
metaread meanhhy tothhs hhs1 hhs2 ... hhs15 hhs16
{result}Abilene, TX 55,815 49439 4359 3097 ... 1052 1238
Akron, OH 62,347 283246 25801 16915 ... 9899 8347
etc.
{text}
{pstd} The following steps will reshape the data into metro/bin format and
attach the minimum and maximum bin values:
{input} * First, create a temporary file with the bin amounts that will be used
tempfile amounts
* Clear memory so that input starts from an empty dataset
clear
input bin min max
1 0 10000
2 10000 15000
3 15000 20000
4 20000 25000
5 25000 30000
6 30000 35000
7 35000 40000
8 40000 45000
9 45000 50000
10 50000 60000
11 60000 75000
12 75000 100000
13 100000 125000
14 125000 150000
15 150000 200000
16 200000 .
end
save `amounts'
* Now, load the test data
use pumstest
* Reshape the data to be in metro/bin observations
reshape long hhs, i(metaread) j(bin)
merge m:1 bin using `amounts'
assert _merge==3
drop _merge
{text}
{pstd} The data are now in the correct format for use with {cmd:mcib}:
{input}list in 1/35, noobs sepby(metaread)
{result}
+----------------------------------------------------------------+
| metaread bin meanhhy tothhs hhs min max |
|----------------------------------------------------------------|
| Abilene, TX 1 55,815 49439 4359 0 10000 |
| Abilene, TX 2 55,815 49439 3097 10000 15000 |
| ...(brackets 3-14 omitted) |
| Abilene, TX 15 55,815 49439 1052 150000 200000 |
| Abilene, TX 16 55,815 49439 1238 200000 . |
|----------------------------------------------------------------|
| Akron, OH 1 62,347 283246 25801 0 10000 |
| Akron, OH 2 62,347 283246 16915 10000 15000 |
| ...(brackets 3-14 omitted) |
| Akron, OH 15 62,347 283246 9899 150000 200000 |
| Akron, OH 16 62,347 283246 8347 200000 . |
|----------------------------------------------------------------|
| Albany, GA 1 50,871 45536 5967 0 10000 |
| Albany, GA 2 50,871 45536 3877 10000 15000 |
| ...etc. |
{text}
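{pstd} Before running {cmd:mcib}, it can help to confirm the layout. The checks below are a minimal sketch
that uses the variable names from the example above. They assume that a missing upper bound marks the
open-ended top bracket, as in the listing, and they are not part of {cmd:mcib} itself:
{input} * Exactly one row per metro area and bracket
isid metaread bin
* Counts are present and non-negative
assert hhs >= 0 & !missing(hhs)
* Lower bounds are present and below the upper bound
* (a missing max compares as larger than any number)
assert !missing(min) & min < max
* Only the last bracket within each metro area is open-ended
bysort metaread (bin): assert missing(max) == (_n == _N)
{text}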
{title:Examples}
{pstd}Basic use with defaults, saving results to results.dta
{p 8 16 2} {cmd:mcib hhs min max, mean(meanhhy) by(metaread) saving(results)}
{pstd}Use Pareto distribution in all brackets above median, keeping results in memory
{p 8 16 2} {cmd:mcib hhs min max, mean(meanhhy) by(metaread) pareto(abovemed) keep}
{pstd}Compute values for the Los Angeles/Long Beach metro area only (4480) and display results on screen.
{p 8 16 2} {cmd:mcib hhs min max if metaread==4480, mean(meanhhy) list}
{pstd}The last command produces the following output:
{text}Basic Descriptives
{result}
ID N mean var sd
1 3,218,501 76,373 7.29e+09 85,354
{text}
Important Percentiles
{result}
ID p5 p25 p50 p75 p95
1 7,144 26,052 53,451 96,505 213,380
{text}
Deciles
{result}
ID p10 p20 p30 p40 p60 p70 p80 p90
1 12,584 21,395 31,162 41,524 67,415 85,075 110,274 155,904
{text}
Inequality Measures
{result}
ID cov gini theil rat9010 iqr
1 1.118 0.486 0.405 12.389 70453.38
{text}
Income shares by quintiles
{result}
ID shrq1 shrq2 shrq3 shrq4 shrq5
1 3.123 8.172 14.083 22.606 52.016
{text}
{pstd} If the mean of the data is not available, specify the twopoint option:
{p 8 16 2} {cmd: mcib hhs min max, twopoint by(metaread) }
{text}
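{pstd} Results written with saving() can be reloaded for further analysis. The lines below are a sketch
that assumes the first example above was run, so that results.dta exists, and that the saved variables
carry the same names as the column headings in the on-screen output (for example, gini and theil).
{cmd:describe} shows the actual names:
{input} * Load the saved results, replacing the data in memory
use results, clear
* Inspect the variable names before relying on them
describe
* Summarize two of the inequality measures across units (names assumed)
summarize gini theil
{text}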
{title:Estimation Details}
{pstd}See Jargowsky and Wheeler (2018), available at
{browse "https://journals.sagepub.com/doi/full/10.1177/0081175018782579"}.
{title:Author}
{pstd}Paul A. Jargowsky, Rutgers University - Camden, paul.jargowsky@rutgers.edu
{title:References}
{p 4 8 2} Cloutier, Norman R. 1988.
“Pareto Extrapolation Using Grouped Income Data.”
Journal of Regional Science 28:415–19.
{p 4 8 2} Henson, Mary F. 1967.
Trends in the Income of Families and Persons in the United States, 1947-1964.
U. S. Dept. of Commerce, Bureau of the Census.
{p 4 8 2} Jargowsky, Paul A., and Christopher A. Wheeler. 2018.
“Estimating Income Statistics from Grouped Data: Mean-Constrained Integration over Brackets.”
Sociological Methodology 48(1):337-374.
{p 4 8 2} von Hippel, P. T., S. V. Scarpino, and I. Holas. 2016.
“Robust Estimation of Inequality from Binned Incomes.”
Sociological Methodology 46(1):212-251.
[Also available as an arXiv working paper,
{browse "http://arxiv.org/abs/1402.4061"}.]