Title
blncdtree() -- Balanced search tree functions used by somersd
Syntax
real matrix blncdtree(nnode)
void _blncdtree(tree, imin, imax)
tidot = tidottree(x, y [ , weight [ , xcen [ , ycen ] ] ] )
tidot = tidot(x, y [ , weight [ , xcen [ , ycen ] ] ] )
where
nnode: real scalar tree: real matrix imin: real scalar imax: real scalar tidot: real colvector x: real colvector y: real colvector weight: real colvector xcen: real colvector ycen: real colvector
Description
These functions are used by the somersd package to create balanced binary search trees and to calculate tidot vectors as defined below under Remarks. Applications of these functions are discussed in the file somersd.pdf, which is distributed with the somersd package.
blncdtree(nnode) returns a balanced binary search tree matrix, representing a balanced binary search tree whose nodes are the positive integer indices from 1 to nnode. The result is defined as an nnode x 2 matrix, whose ith row contains, in column 1, the left daughter index of the index i (or zero if i has no left daughter) and contains, in column 2, the right daughter index of the index i (or zero if i has no right daughter). A balanced binary search tree has the feature that, for any index i, the numbers of indices in the left and right subtrees of i differ by no more than one. The root node of the returned balanced binary search tree matrix is the index given by the Mata expresstion trunc((1+nnode)/2).
_blncdtree(tree, imin, imax) inputs an existing matrix in tree and reassigns columns 1 and 2 of rows imin to imax of tree, so that these rows define a balanced binary search tree whose nodes are the row indices i such that imin <= i <= imax. The root node of this search tree is the index given by the Mata expression trunc((imin+imax)/2). blncdtree() works by calling _blncdtree(), which works by assigning daughter indices to the root node and then calling itself recursively to produce left and right subtrees for the root node. _blncdtree() is a fast program that performs no conformability or consistency checks, although it will abort if the matrix tree has fewer than two columns.
tidottree(x, y, weight, xcen, ycen) inputs a vector of X values in x and a parallel vector of Y values in y, and, optionally, other parallel vectors in weight, xcen, and ycen, containing, respectively, the weights, the censorship indicators for x, and the censorship indicators for y. A censorship indicator value is interpreted as implying left-censorship if negative, right-censorship if positive, and noncensorship if zero. tidottree() assumes that weight is a vector of ones and that xcen and ycen are vectors of zeros, if these arguments are not provided by the user. tidottree() returns, as output, a new vector tidot, parallel to the input vectors, which will contain the weighted sums of concordance-discordance differences corresponding to the input vectors, assuming that the data matrix defined by these input vectors is sorted in ascending order of x. tidottree() performs no checks that the input data matrix is indeed sorted by x. However, tidottree() uses a search tree algorithm, which calculates the tidot values in a time proportional to N*log(N). It creates the search tree by calling blncdtree().
tidot(x, y, weight, xcen, ycen) inputs the same vectors as tidottree() and returns the same output vector tidot, which in this case contains concordance-discordance totals whether or not the input data matrix is sorted. However, tidot() does not use the search tree algorithm and therefore requires a time proportional to the square of N. tidot() is therefore less efficient than tidottree() for large datasets, even after accounting for the time taken to sort the dataset before calling tidottree().
Remarks
The vector tidot calculated by tidottree and tidot is defined as follows. For scalars u, p, v, and q, define the function
csign(u,p,v,q)
as 1 if u>v and p>=0>=q, -1 if u<v and p<=0<=q, and 0 otherwise. For row indices i and j, define
T_ij = csign(x_i,xcen_i,x_j,xcen_j) * csign(y_i,ycen_i,y_j,ycen_j)
as the concordance-discordance difference between rows i and j of the input data vectors. Then the vector tidot, returned by tidottree or tidot, is defined by
N tidot_i = Sum weight_j * T_ij j=1
for each index i. The function tidot() calculates tidot by calculating all the T_ij and summing them, taking a time asymptotically proportional to the square of N. The function tidottree() uses a search tree algorithm to calculate tidot in a time asymptotically proportional to N*log(N). The algorithm used by tidottree() is a more advanced version of the one presented in Newson (2006) and assumes that the input data vectors are sorted in ascending order of the values of x. tidottree() calls blncdtree() to create a search tree with one row for each value of y and uses this search tree to calculate tidot without calculating the individual T_ij.
Applications of these functions are discussed in the manual somersd.pdf, which is distributed with the somersd package as an ancillary file. Balanced binary search trees are discussed in Wirth (1976).
Conformability
blncdtree(nnode): nnode: 1 x 1
_blncdtree(tree, imin, imax): tree: M x K where K >= 2 imin: 1 x 1 imax: 1 x 1
tidottree(x, y, weight, xcen, ycen): x: N x 1 y: N x 1 weight: N x 1 xcen: N x 1 ycen: N x 1
tidot(x, y, weight, xcen, ycen): x: N x 1 y: N x 1 weight: N x 1 xcen: N x 1 ycen: N x 1
Diagnostics
blncdtree(nnode) aborts with error if nnode < 0.
_blncdtree(tree, imin, imax) aborts with error if cols(tree) < 2.
blncdtree(0) can return a 0 x 2 result.
Source code
blncdtree.mata, _blncdtree.mata, tidottree.mata, tidot.mata
Author
Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk
References
Newson, R. 2006. Efficient calculation of jackknife confidence intervals for rank statistics. Journal of Statistical Software 15: 1-10.
Wirth, N. 1976. Algorithms + Data Structures = Programs. Englewood Cliffs, NJ: Prentice-Hall.
Also see
Manual: [M-0] intro
Online: mata, somersd (if installed)