------------------------------------------------------------------------------- help for bkrosenblatt -------------------------------------------------------------------------------

Blum, Kiefer and Rosenblatt test of bivariate independence

bkrosenblatt xvar yvar [if] [in]

Description

bkrosenblatt provides a test of bivariate independence due to Blum, Kiefer and Rosenblatt (1961), itself a variant on a test due to Hoeffding (1948). P-values are computed or indicated following the work of Mudholkar and Wilding (2003, 2005).

Remarks

Bivariate independence of continuous variables is usually tested for by focusing on a specific alternative hypothesis of linear or monotonic relationship. However, Harrell (2001), among others, stresses the value of screening data for non-monotonic relationships. More general omnibus tests have been provided by Hoeffding (1948) and, in related work, by Blum, Kiefer and Rosenblatt (1961) (hereafter BKR). For an introductory overview and comparison of such tests, with many worked empirical examples, see Hollander and Wolfe (1999, Ch.8). Mudholkar and Wilding (2003, 2005) give further references.

These tests have long remained difficult to implement fully because of a lack of information on sampling distributions for finite sample sizes. Hollander and Wolfe (1999, pp.733-734) give detailed tables for Hoeffding's statistic for n = 5(1)9, thereby extending the results of Hoeffding (1948) for n = 5, 6, 7. Hollander and Wolfe (1999, p.735) table the BKR statistic only for the limiting (asymptotic) distribution. Note that the original table in BKR (1961, p.497) carries one more decimal place.

Mudholkar and Wilding (2003, 2005) have addressed this problem with a combination of analytical and simulation work. They show that the Hoeffding and BKR procedures differ more than is widely recognised and that the asymptotic distribution for the BKR statistic can be a poor approximation for many sample sizes. More positively, they detail procedures for calculating P-values for BKR for all but very small sample sizes.

bkrosenblatt implements the BKR test. See Mudholkar and Wilding (2003) for arguments and evidence that this is generally preferable to competing tests in terms of power against positive dependence alternatives.

The main idea is best explained by going into some detail. For data x[i], y[i], i = 1,...,n, count the numbers of values lying in each of four quadrants of the plane relative to each data point. That is, form vectors with typical elements as follows, where #(...) denotes the number of data points satisfying the condition in parentheses:

N_1[i] = #(x <= x[i] & y <= y[i])
N_2[i] = #(x >  x[i] & y <= y[i])
N_3[i] = #(x <= x[i] & y >  y[i])
N_4[i] = #(x >  x[i] & y >  y[i])
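As an illustration, the quadrant counts can be computed directly by comparing every point with every other point. The following is a minimal Python sketch, not the code used by bkrosenblatt itself; the function name quadrant_counts is hypothetical. It costs O(n^2) comparisons, and because of the <= / > split each point always falls in its own N_1 quadrant, so N_1[i] + N_2[i] + N_3[i] + N_4[i] = n for every i.

```python
from typing import List, Sequence, Tuple


def quadrant_counts(
    x: Sequence[float], y: Sequence[float]
) -> List[Tuple[int, int, int, int]]:
    """Return (N_1[i], N_2[i], N_3[i], N_4[i]) for every data point i.

    Hypothetical helper: direct O(n^2) counting using the <= / > split
    in the definitions, so ties are handled consistently.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    counts = []
    for xi, yi in zip(x, y):
        n1 = n2 = n3 = n4 = 0
        for xj, yj in zip(x, y):
            if yj <= yi:
                if xj <= xi:
                    n1 += 1  # x <= x[i] & y <= y[i] (includes point i itself)
                else:
                    n2 += 1  # x >  x[i] & y <= y[i]
            elif xj <= xi:
                n3 += 1      # x <= x[i] & y >  y[i]
            else:
                n4 += 1      # x >  x[i] & y >  y[i]
        counts.append((n1, n2, n3, n4))
    return counts


if __name__ == "__main__":
    # Perfectly increasing toy data: every other point lies either
    # south-west (N_1) or north-east (N_4) of the current point.
    print(quadrant_counts([1, 2, 3], [1, 2, 3]))
    # prints [(1, 0, 0, 2), (2, 0, 0, 1), (3, 0, 0, 0)]
```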

Near independence the four vectors N_1 to N_4 will have very similar averages, and for each point the products N_1 N_4 and N_2 N_3 will tend to be close. The reduction used by BKR is the sum of (N_1 N_4 - N_2 N_3)^2 divided by n^4. (The products and squares are calculated elementwise, that is, separately for each i, before summing.) This statistic, conventionally denoted n B_n, is a measure of bivariate dependence, but it is not on a particularly intuitive scale. Mudholkar and Wilding (2005) show that for moderate sample sizes a power transformation of n B_n has an approximately normal sampling distribution, which provides a way of computing a z-score and thus a P-value. The power h to be used and the mean mu and standard deviation sigma of the resulting distribution all depend on sample size n:

h = -0.36 + 2.866 * n^-0.775 - 0.683 * exp(-0.244 n)

mu    = 4.663 - 1 / (0.2137 + 0.00448 n)          15 <= n <= 24
      = 3.823 - 1 / (0.193 + 0.01662 n^0.8481)    n >= 25

sigma = 0.614 - 1 / (1.187 + 0.0328 n)
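The three approximations can be evaluated together for a given sample size. The following Python sketch simply transcribes the formulas above; the name bkr_constants is hypothetical, and it refuses n < 15 because no approximation is given there. Evaluating it over a range of n is a quick way to confirm that h is negative and sigma positive throughout the range where the formulas are used, and that the two branches for mu roughly agree at the switch from n = 24 to n = 25.

```python
import math
from typing import Tuple


def bkr_constants(n: int) -> Tuple[float, float, float]:
    """Return (h, mu, sigma) of the Mudholkar-Wilding (2005) approximation.

    Hypothetical helper transcribing the formulas above; valid for n >= 15.
    """
    if n < 15:
        raise ValueError("no approximation is given for n < 15")
    # power of the normalizing transformation of n B_n
    h = -0.36 + 2.866 * n ** -0.775 - 0.683 * math.exp(-0.244 * n)
    # mean of (n B_n)^h under independence: two branches by sample size
    if n <= 24:
        mu = 4.663 - 1 / (0.2137 + 0.00448 * n)
    else:
        mu = 3.823 - 1 / (0.193 + 0.01662 * n ** 0.8481)
    # standard deviation of (n B_n)^h under independence
    sigma = 0.614 - 1 / (1.187 + 0.0328 * n)
    return h, mu, sigma


if __name__ == "__main__":
    for n in (15, 24, 25, 50, 100):
        h, mu, sigma = bkr_constants(n)
        print(f"n = {n:3d}  h = {h:7.4f}  mu = {mu:6.4f}  sigma = {sigma:6.4f}")
```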

As h is negative for all n >= 15, the range over which these approximations are used, the transformation reverses high and low values. Hence the z-score reported by bkrosenblatt is calculated as z = (mu - (n B_n)^h) / sigma, so that high positive values correspond to strong bivariate dependence. Note especially that high negative values correspond to conditions near independence.
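The sign reversal can be seen in a small sketch. The following Python functions (hypothetical names bkr_z and upper_tail_p) take h, mu and sigma as inputs. The P-value shown is a one-sided upper-tail normal probability, Pr(Z >= z); that convention is an assumption made here, consistent with high z meaning strong dependence, and is not taken from the bkrosenblatt code. The case n B_n = 0 is handled separately because, with h < 0, (n B_n)^h grows without bound as n B_n approaches 0.

```python
import math


def bkr_z(nbn: float, h: float, mu: float, sigma: float) -> float:
    """z = (mu - (n B_n)^h) / sigma, for a negative power h (sketch)."""
    if nbn < 0:
        raise ValueError("n B_n cannot be negative")
    if nbn == 0:
        # (n B_n)^h -> +infinity as n B_n -> 0 when h < 0, so z -> -infinity
        return float("-inf")
    return (mu - nbn ** h) / sigma


def upper_tail_p(z: float) -> float:
    """Pr(Z >= z) for standard normal Z: the assumed one-sided P-value."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Roughly the constants the formulas give for n = 30 (rounded).
    h, mu, sigma = -0.155, 1.784, 0.153
    for nbn in (0.01, 0.03, 0.3):
        z = bkr_z(nbn, h, mu, sigma)
        print(nbn, round(z, 3), round(upper_tail_p(z), 4))
```

Larger values of n B_n give smaller values of (n B_n)^h and hence larger z, which is the point of subtracting from mu rather than the other way round.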

Mudholkar and Wilding (2005) do not suggest approximations for sample sizes less than 15. However, Mudholkar and Wilding (2003) give simulation results for sample sizes as small as n = 5. For sample sizes between 5 and 14, therefore, bkrosenblatt does not calculate a z-score or P-value, but it does display critical values of n B_n from the simulations for each sample size. See p.46 of the 2003 paper for a fuller table, while noting the irregularity of the distribution near n = 5. Bootstrapping may appeal as an alternative, but best of all is to get more data.
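Putting the pieces together, the whole procedure can be sketched end to end: compute n B_n from the quadrant counts, then compute z and a P-value only when n >= 15. This Python sketch is not the bkrosenblatt implementation. The function name bkr_test is hypothetical, its dictionary keys merely mirror the names of the saved results, the P-value uses the assumed one-sided upper-tail convention, and for n < 15 it just leaves z and P as None, since the simulated critical values are not reproduced here.

```python
import math
from itertools import product
from typing import Dict, Optional, Sequence


def bkr_test(x: Sequence[float], y: Sequence[float]) -> Dict[str, Optional[float]]:
    """Hypothetical end-to-end sketch of the BKR test as described above."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    # n B_n = sum over i of (N_1 N_4 - N_2 N_3)^2, divided by n^4
    total = 0
    for xi, yi in zip(x, y):
        n1 = sum(1 for xj, yj in zip(x, y) if xj <= xi and yj <= yi)
        n2 = sum(1 for xj, yj in zip(x, y) if xj > xi and yj <= yi)
        n3 = sum(1 for xj, yj in zip(x, y) if xj <= xi and yj > yi)
        n4 = n - n1 - n2 - n3  # the four quadrants partition the n points
        total += (n1 * n4 - n2 * n3) ** 2
    nbn = total / n ** 4
    result: Dict[str, Optional[float]] = {"n": n, "n_B_n": nbn, "z": None, "P_n_B_n": None}
    if n < 15:
        # No normal approximation: compare n B_n with simulated critical values.
        return result
    h = -0.36 + 2.866 * n ** -0.775 - 0.683 * math.exp(-0.244 * n)
    if n <= 24:
        mu = 4.663 - 1 / (0.2137 + 0.00448 * n)
    else:
        mu = 3.823 - 1 / (0.193 + 0.01662 * n ** 0.8481)
    sigma = 0.614 - 1 / (1.187 + 0.0328 * n)
    # h < 0 here, so n B_n = 0 corresponds to z = -infinity
    z = float("-inf") if nbn == 0 else (mu - nbn ** h) / sigma
    result["z"] = z
    result["P_n_B_n"] = 0.5 * math.erfc(z / math.sqrt(2.0))  # Pr(Z >= z)
    return result


if __name__ == "__main__":
    # Strong monotonic dependence: large z, small P.
    print(bkr_test(list(range(20)), list(range(20))))
    # A complete 4 x 4 grid: n B_n is exactly 0.
    grid = list(product(range(1, 5), repeat=2))
    print(bkr_test([a for a, _ in grid], [b for _, b in grid]))
```

For the complete grid, the empirical joint distribution function equals the product of the empirical marginal distribution functions at every data point, which is exactly the condition N_1 N_4 = N_2 N_3, so every term of the sum vanishes.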

Naturally, you should always look at a scatter plot too and apply whatever subject-matter knowledge is available to interpretation.

Vignettes

Wassily Hoeffding (1914-1991) was born in Mustamäki, then in Finland. He studied and worked in Denmark and Germany, gaining a doctorate from Berlin University, before entering the United States in 1946. Hoeffding attended statistical lectures at Columbia before moving in 1947 to the University of North Carolina at Chapel Hill, where he settled. He produced much pioneering work in the area of nonparametric statistics, including the development of U-statistics. Hoeffding is remembered as well read in Russian literature and as good-natured, even in the face of prolonged ill-health and disability. He was a member of the U.S. National Academy of Sciences.

Julius Rubin Blum (1922-1982) was born in Nuremberg in Germany and moved to the United States in 1937. His parents perished in the Holocaust. After war service, he gained degrees in mathematics and statistics from Berkeley. Blum researched and taught at Indiana University, Sandia Corporation, the University of New Mexico, the University of Wisconsin, Milwaukee, the University of Arizona and the University of California, Davis. He was gregarious and highly productive and collaborative in probability, ergodic theory and mathematical statistics, with 86 publications and 34 co-authors.

Jack Carl Kiefer (1924-1981) was born in Cincinnati. He studied electrical engineering, economics, and statistics at MIT and Columbia and researched and taught at Cornell and Berkeley. Kiefer was many-sided: he nearly sought a career in show business, he was a very serious amateur mycologist and he was active in liberal causes such as opposition to the Vietnam war and to the oppression of Jews and dissidents. In statistics he is best known for outstanding work on optimal design of experiments, but he contributed to several other areas. Kiefer was a member of the U.S. National Academy of Sciences.

Murray Rosenblatt (1926- ) was born in New York City. He gained degrees from City College of New York and Cornell before researching and teaching at Chicago, Indiana, Brown and (from 1964) University of California, San Diego. Rosenblatt has made outstanding contributions to several areas of probability and statistics, including Markov processes and time series. He made key specific contributions to density estimation, central limit theorems under strong mixing and long memory processes. Rosenblatt is a member of the U.S. National Academy of Sciences.

Examples

. sysuse auto, clear

. bkrosenblatt price foreign

Saved results

r(n)          number of data pairs n
r(n_B_n)      n B_n
r(z)          z-score (if n >= 15)
r(P_n_B_n)    P-value (if n >= 15)

Author

Nicholas J. Cox, Durham University, U.K. n.j.cox@durham.ac.uk

References

Blum, J.R., Kiefer, J. and Rosenblatt, M. 1961. Distribution free tests of independence based on the sample distribution function. Annals of Mathematical Statistics 32: 485-498.

Brillinger, D.R. and Davis, R.A. 2009. A conversation with Murray Rosenblatt. Statistical Science 24: 116-140.

Brown, L.D. 1984. The research of Jack Kiefer outside the area of experimental design. Annals of Statistics 12: 406-415.

Fisher, N.I. and van Zwet, W.R. 2005. Wassily Hoeffding 1914-1991. Biographical Memoirs, National Academy of Sciences 86: 208-227.

Fisher, N.I. and van Zwet, W.R. 2008. Remembering Wassily Hoeffding. Statistical Science 23: 536-547.

Harrell, F.E. 2001. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. New York: Springer.

Hoeffding, W. 1948. A non-parametric test of independence. Annals of Mathematical Statistics 19: 546-557.

Hollander, M. and Wolfe, D.A. 1999. Nonparametric statistical methods. New York: Wiley.

The publications and writings of Jack Kiefer. Annals of Statistics 12: 424-430.

Mudholkar, G.S. and Wilding, G.E. 2003. On the conventional wisdom regarding two consistent tests of bivariate dependence. The Statistician 52: 41-57.

Mudholkar, G.S. and Wilding, G.E. 2005. Two Wilson-Hilferty type approximations for the null distribution of the Blum, Kiefer and Rosenblatt test of bivariate independence. Journal of Statistical Planning and Inference 128: 31-41.

Rosenblatt, M. and Samaniego, F.J. 1985. Julius R. Blum 1922-1982. Annals of Statistics 13: 1-9.

Sacks, J. 1984. Jack Carl Kiefer 1924-1981. Annals of Statistics 12: 403-405.

Sen, P.K. 1997. Hoeffding, Wassily. In Johnson, N.L. and Kotz, S. (eds) Leading personalities in statistical sciences. New York: Wiley, 118-122.

Sun, T.C. 1997. Murray Rosenblatt: his contributions to probability and statistics. Journal of Theoretical Probability 10: 279-286.

Wynn, H.P. 1984. Jack Kiefer's contributions to experimental design. Annals of Statistics 12: 416-423.