Multivariate Yatchew Test
Let $\textbf{D}$ be a vector of $K$ random variables and let $g(\textbf{D}) = E[Y|\textbf{D}]$. Denote by $||.,.||$ the Euclidean distance between two vectors. The null hypothesis of the multivariate test is $g(\textbf{D}) = \alpha_0 + A'\textbf{D}$, with $A = (\alpha_1, ..., \alpha_K)$, for $K+1$ real numbers $\alpha_0, \alpha_1, ..., \alpha_K$. This means that, under the null, $g(.)$ is linear in $\textbf{D}$. Following the same logic as the univariate case, in a dataset with $N$ i.i.d. realisations of $(Y, \textbf{D})$ we can approximate the first difference $\Delta \varepsilon$ by $\Delta Y$, provided that consecutive observations are close enough in $\textbf{D}$ for $g(.)$ to take nearly the same value at both. The program therefore runs a nearest neighbor algorithm to find an ordering of the observations that keeps the Euclidean distance between consecutive positions small.
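The approximation of $\Delta \varepsilon$ by $\Delta Y$ can be spelled out in one short derivation. This is a sketch under the assumption that $\varepsilon$ denotes the regression residual $Y - g(\textbf{D})$, as in the univariate case, and that $i$ indexes observations after sorting:

```latex
\begin{align*}
Y_i &= g(\textbf{D}_i) + \varepsilon_i, \qquad \varepsilon_i = Y_i - E[Y_i \mid \textbf{D}_i], \\
\Delta Y_i &= Y_i - Y_{i-1}
            = \bigl[ g(\textbf{D}_i) - g(\textbf{D}_{i-1}) \bigr] + \Delta \varepsilon_i, \\
\Delta Y_i &\approx \Delta \varepsilon_i
            \quad \text{when } ||\textbf{D}_i, \textbf{D}_{i-1}|| \text{ is small and } g(.) \text{ is smooth.}
\end{align*}
```

The bracketed term is what the sorting step is meant to shrink: the closer two consecutive rows are, the less $g(.)$ contributes to $\Delta Y_i$.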
The program follows a very simple nearest neighbor approach:
1. collect the Euclidean distances between all unique pairs of rows of $\textbf{D}$ in the matrix $M$, where $M_{n,m} = ||\textbf{D}_n, \textbf{D}_m||$ with $n, m \in \lbrace 1, ..., N \rbrace$;
2. set up the queue $Q = \lbrace 1, ..., N \rbrace$, the (empty) path vector $I = \lbrace \rbrace$ and the starting index $i = 1$;
3. remove $i$ from $Q$ and find the column index $j$ of $M$ such that $M_{i,j} = \min_{c \in Q} M_{i,c}$;
4. append $j$ to $I$ and start again from step 3 with $i = j$ until $Q$ is empty.
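The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the program's actual code: the function name `nearest_neighbor_path` and the `start` parameter are hypothetical, indices are 0-based rather than 1-based, and the starting row is assumed to be recorded as the first element of the path so that every row appears exactly once.

```python
import numpy as np

def nearest_neighbor_path(D, start=0):
    """Greedy nearest-neighbor ordering of the rows of D (0-based indices)."""
    D = np.asarray(D, dtype=float)
    if D.ndim == 1:
        D = D[:, None]  # treat a 1-D input as N rows with K = 1
    N = D.shape[0]

    # Step 1: full matrix of pairwise Euclidean distances, M[n, m] = ||D_n, D_m||.
    diff = D[:, None, :] - D[None, :, :]
    M = np.sqrt((diff ** 2).sum(axis=2))

    # Step 2: queue of unvisited rows, path vector and starting index.
    in_queue = np.ones(N, dtype=bool)
    i = start
    in_queue[i] = False
    path = [i]  # assumption: the starting row opens the path

    # Steps 3-4: repeatedly move to the closest row still in the queue.
    while in_queue.any():
        candidates = np.flatnonzero(in_queue)
        j = candidates[np.argmin(M[i, candidates])]
        path.append(int(j))
        in_queue[j] = False
        i = j
    return path
```

With the returned list `path`, `np.asarray(D)[path]` gives the rows in visiting order. The search is greedy: each step picks the closest remaining row, which keeps consecutive distances small but does not guarantee the shortest possible overall path.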
To improve efficiency, the program collects only the $N(N-1)/2$ Euclidean distances corresponding to the lower triangle of matrix $M$ and chooses $j$ such that $M_{i,j} = \min_{c \in Q} 1\lbrace c < i\rbrace M_{i,c} + 1\lbrace c > i\rbrace M_{c,i}$. The output of the algorithm, i.e. the vector $I$, is a sequence of row numbers in which each row $\textbf{D}_i$ is followed by its nearest neighbor among the rows not yet visited. The program also uses two refinements suggested in Appendix A of Yatchew (1997):
- The entries in $\textbf{D}$ are normalized to $[0,1]$;
- The algorithm is applied to sub-cubes, i.e. partitions of the $[0,1]^K$ space, and the full path is obtained by joining the endpoints of the subpaths.
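The lower-triangle storage and the lookup rule with the two indicator functions can be illustrated as follows. This is a minimal sketch under assumed names (`lower_triangle_distances`, `tri_pos` and `nearest_in_queue` are hypothetical helpers, and the row-by-row flattening is one possible layout, not necessarily the one the program uses); indices are 0-based.

```python
import numpy as np

def lower_triangle_distances(D):
    """Return the N(N-1)/2 distances M[r, s] with r > s, stored row by row."""
    D = np.asarray(D, dtype=float)
    rows, cols = np.tril_indices(D.shape[0], k=-1)  # all pairs with row > col
    return np.sqrt(((D[rows] - D[cols]) ** 2).sum(axis=1))

def tri_pos(i, c):
    """Position of the pair {i, c} in the flattened lower triangle (i != c)."""
    # Read M[i, c] when c < i and M[c, i] when c > i, as in the indicator rule.
    r, s = (i, c) if c < i else (c, i)
    return r * (r - 1) // 2 + s

def nearest_in_queue(tri, i, queue):
    """Pick the index j in queue with the smallest stored distance to row i."""
    return min(queue, key=lambda c: tri[tri_pos(i, c)])
```

Storing only one triangle roughly halves the memory needed for the distances, at the cost of swapping the two indices whenever the candidate column lies above the diagonal.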
By convention, the program computes $(2\lceil \log_{10} N \rceil)^K$ sub-cubes, where each univariate partition is defined by grouping observations in $2\lceil \log_{10} N \rceil$ quantile bins. If $K = 2$, the user can visualize in a graph the exact path across the normalized $\textbf{D}_i$'s by running the command with the option `path_plot`.
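To make the bin count concrete, the sketch below normalizes the data and assigns each observation to a sub-cube. Several details are assumptions rather than the program's documented behavior: min-max scaling is used for the normalization, quantile cut points come from `np.quantile` with its default interpolation, and the helper names `bins_per_dimension` and `subcube_labels` are hypothetical.

```python
import math
import numpy as np

def bins_per_dimension(N):
    """Number of quantile bins along each coordinate: 2 * ceil(log10 N)."""
    return 2 * math.ceil(math.log10(N))

def subcube_labels(D):
    """Scale D to [0, 1]^K and label each row with its bin in every dimension."""
    D = np.asarray(D, dtype=float)
    N, K = D.shape
    B = bins_per_dimension(N)

    # Assumption: min-max scaling; constant columns are mapped to 0.
    lo, hi = D.min(axis=0), D.max(axis=0)
    Z = (D - lo) / np.where(hi > lo, hi - lo, 1.0)

    # Interior quantile levels 1/B, ..., (B-1)/B define B bins per coordinate.
    inner = np.linspace(0, 1, B + 1)[1:-1]
    labels = np.empty((N, K), dtype=int)
    for k in range(K):
        cuts = np.quantile(Z[:, k], inner)
        labels[:, k] = np.searchsorted(cuts, Z[:, k], side="right")
    return Z, labels
```

For example, with $N = 500$ and $K = 2$ there are $2\lceil \log_{10} 500 \rceil = 6$ bins per coordinate and $6^2 = 36$ sub-cubes. Each row of `labels` identifies one sub-cube, and some sub-cubes may contain no observations.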
Once the dataset is sorted by $I$, the program resumes from step (2) of the univariate case.