Processing math: 58%

Multivariate Yatchew Test

Let D be a vector of K random variables. Let g(D)=E[Y|D]. Denote with ||.,.|| the Euclidean distance between two vectors. The null hypothesis of the multivariate test is g(D)=α0+AD, with A=(α1,...,αK), for K+1 real numbers α0, α1, ..., αK. This means that, under the null, g(.) is linear in D. Following the same logic as the univariate case, in a dataset with N i.i.d. realisations of (Y,D) we can approximate the first difference Δε by ΔY valuing g(.) between consecutive observations. The program runs a nearest neighbor algorithm to find the sequence of observations such that the Euclidean distance between consecutive positions is minimized. The program follows a very simple nearest neighbor approach:
  1. collect all the Euclidean distances between all the possible unique pairs of rows in D in the matrix M, where Mn,m=||Dn,Dm|| with n,m{1,...,N};
  2. setup the queue to Q={1,...,N}, the (empty) path vector I={} and the starting index i=1;
  3. remove i from Q and find the column index j of M such that Mi,j=min;
  4. append j to I and start again from step 3 with i = j until Q is empty.
To improve efficiency, the program collects only the N(N-1)/2 Euclidean distances corresponding to the lower triangle of matrix M and chooses j such that M_{i,j} = \min_{c \in Q} 1\lbrace c < i\rbrace M_{i,c} + 1\lbrace c > i\rbrace M_{c,i}. The output of the algorithm, i.e. the vector I, is a sequence of row numbers such that the distance between the corresponding rows \textbf{D}_{i}s is minimized. The program also uses two refinements suggested in Appendix A of Yatchew (1997): By convention, the program computes (2\lceil \log_{10} N \rceil)^K subcubes, where each univariate partition is defined by grouping observations in 2\lceil \log_{10} N \rceil quantile bins. If K = 2, the user can visualize in a graph the exact path across the normalized \textbf{D}_{i}s by running the command with the option path_plot. Once the dataset is sorted by I, the program resumes from step (2) of the univariate case.