
Multivariate Yatchew Test

Let $D$ be a vector of $K$ random variables. Let $g(D)=E[Y|D]$. Denote by $||\cdot,\cdot||$ the Euclidean distance between two vectors. The null hypothesis of the multivariate test is $g(D)=\alpha_0+AD$, with $A=(\alpha_1,...,\alpha_K)$, for $K+1$ real numbers $\alpha_0, \alpha_1, ..., \alpha_K$. This means that, under the null, $g(\cdot)$ is linear in $D$. Following the same logic as the univariate case, in a dataset with $N$ i.i.d. realisations of $(Y,D)$ we can approximate the first difference $\Delta\varepsilon$ by $\Delta Y$, since the terms in $g(\cdot)$ approximately cancel when differencing between consecutive observations. The program runs a nearest-neighbor algorithm to find a sequence of observations such that the Euclidean distance between consecutive positions is minimized. It follows a very simple nearest-neighbor approach, sketched in code after the list below:
  1. collect all the Euclidean distances between all the possible unique pairs of rows of $D$ in the matrix $M$, where $M_{n,m}=||D_n,D_m||$ with $n,m\in\{1,...,N\}$;
  2. set up the queue $Q=\{1,...,N\}$, the (empty) path vector $I=\{\}$ and the starting index $i=1$;
  3. remove $i$ from $Q$ and find the column index $j$ of $M$ such that $M_{i,j}=\min_{c\in Q}M_{i,c}$;
  4. append $j$ to $I$ and start again from step 3 with $i=j$ until $Q$ is empty.
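For concreteness, here is a minimal Python sketch of the greedy nearest-neighbor ordering in steps 1-4, assuming the data are held in a NumPy array with one row per observation. The function name is illustrative, indices are 0-based, and the starting index is kept in the path so that the ordering covers all rows; this is not the package's actual implementation.

```python
import numpy as np

def nearest_neighbor_path(D):
    """Greedy nearest-neighbor ordering of the rows of D (steps 1-4 above)."""
    N = D.shape[0]
    # Step 1: pairwise Euclidean distances M[n, m] = ||D_n, D_m||
    diff = D[:, None, :] - D[None, :, :]
    M = np.sqrt((diff ** 2).sum(axis=2))
    # Step 2: queue of unvisited rows, path vector, starting index (0-based here)
    Q = set(range(N))
    I = [0]
    i = 0
    while True:
        Q.discard(i)                        # Step 3: remove i from the queue...
        if not Q:
            break
        j = min(Q, key=lambda c: M[i, c])   # ...and find its nearest unvisited neighbor
        I.append(j)                         # Step 4: append j and continue from i = j
        i = j
    return I

# Example: order 10 draws of a 3-dimensional regressor
rng = np.random.default_rng(0)
print(nearest_neighbor_path(rng.normal(size=(10, 3))))
```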
To improve efficiency, the program collects only the $N(N-1)/2$ Euclidean distances corresponding to the lower triangle of the matrix $M$ and chooses $j$ such that $M_{i,j}=\min_{c\in Q}\left(1\{c<i\}M_{i,c}+1\{c>i\}M_{c,i}\right)$. The output of the algorithm, i.e. the vector $I$, is a sequence of row numbers such that the distance between the corresponding rows $D_i$ is minimized.

The program also uses two refinements suggested in Appendix A of Yatchew (1997): the $D$ variables are normalized, and the nearest-neighbor search is run within subcubes of their support rather than over the whole dataset at once, as sketched at the end of this section. By convention, the program computes $(2\log_{10}N)^K$ subcubes, where each univariate partition is defined by grouping observations into $2\log_{10}N$ quantile bins. If $K=2$, the user can visualize the exact path across the normalized $D_i$'s in a graph by running the command with the path_plot option.

Once the dataset is sorted by $I$, the program resumes from step (2) of the univariate case.
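The sketch below illustrates, under stated assumptions, how the quantile-bin subcubes described above could be built and how the nearest-neighbor path from the previous sketch could be run within each subcube. The bin count is rounded up to an integer and the subcubes are visited in lexicographic order; both are simplifications for illustration, not the package's actual conventions.

```python
import numpy as np
import pandas as pd

def subcube_labels(D, n_bins=None):
    """Assign each row of D to a subcube defined by per-variable quantile bins."""
    N, K = D.shape
    if n_bins is None:
        # 2*log10(N) bins per variable, rounded up to an integer (assumption)
        n_bins = max(1, int(np.ceil(2 * np.log10(N))))
    labels = np.zeros((N, K), dtype=int)
    for k in range(K):
        # pd.qcut groups observations into (roughly) equal-sized quantile bins,
        # which is invariant to any monotone normalization of that variable
        labels[:, k] = pd.qcut(D[:, k], q=n_bins, labels=False, duplicates="drop")
    return [tuple(row) for row in labels]   # a subcube = a K-tuple of bin indices

def ordered_index(D):
    """Order observations subcube by subcube, running the greedy
    nearest-neighbor path (previous sketch) within each subcube."""
    cubes = subcube_labels(D)
    order = []
    for cube in sorted(set(cubes)):         # lexicographic visit order (simplification)
        rows = [r for r, c in enumerate(cubes) if c == cube]
        sub_path = nearest_neighbor_path(D[rows])
        order.extend(rows[p] for p in sub_path)
    return order
```

Running the path within subcubes keeps each distance matrix small, since every subcube only contributes the pairwise distances among its own rows, while still producing consecutive observations that are close in $D$.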