.-
help for ^orthog^                                      (statalist: 10 July 1998)
.-

Orthogonalize variables
-----------------------

        ^orthog^ [varlist] [weight] [^if^ exp] [^in^ range] ^,^
                        ^g^enerate^(^newvarlist^)^ [ ^mat^rix^(^matname^)^ ^float^ ]


^aweight^s and ^fweight^s are allowed; see help @weights@.


Description
-----------

^orthog^ orthogonalizes "varlist" and creates a new set of orthogonal variables
"newvarlist" using a modified Gram-Schmidt procedure (Golub and Van Loan 1989).

The order of the variables in "varlist" determines the orthogonalization.
That is, if "varlist" is ^x1 x2 x3^, then the effect of the constant is first
removed from ^x1 x2 x3^, then ^x1^ is removed from ^x2^ and ^x3^, and then
^x2^ is removed from ^x3^.  If "newvarlist" is ^q1 q2 q3^, we have

        q1 = a10 + a11*x1
        q2 = a20 + a21*x1 + a22*x2
        q3 = a30 + a31*x1 + a32*x2 + a33*x3

where ^q1 q2 q3^ are orthogonal and aij are constants.
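
The sequential removal described above can be checked informally with
ordinary regressions.  The sketch below assumes the auto dataset is
available through ^sysuse^ (an assumption; no dataset is named at this point
in the help file) and uses illustrative names q1-q3, e2, and e3.  If the
description holds, each reported correlation should be 1 in absolute value,
up to rounding.

```stata
* Sketch only: assumes the auto dataset is available via sysuse.
sysuse auto, clear
orthog price weight mpg, gen(q1 q2 q3)

* q2 should be proportional to the residual of weight on price and a constant.
regress weight price
predict double e2, residuals
correlate q2 e2

* q3 should be proportional to the residual of mpg on price, weight,
* and a constant.
regress mpg price weight
predict double e3, residuals
correlate q3 e3
```

Changing the order of the variables in "varlist" changes which residuals the
new variables correspond to, which is why the order matters.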


Options
-------

^generate(^newvarlist^)^ is required.  It creates new variables containing
    the orthogonalized "varlist".  "newvarlist" must either contain exactly
    the same number of variable names as "varlist" or be abbreviated using
    either "newvar1-newvar#" or "newvar*".  See examples below.

^matrix(^matname^)^ creates an m x m matrix called "matname" containing the
    matrix R defined by X = QR, where X is the n x m matrix representation
    of "varlist" plus a column of ones and Q is the n x m matrix
    representation of "newvarlist" plus a column of ones (m = number of
    variables in "varlist" plus the constant; n = number of observations).

^float^ specifies that the new variables be of type float.  The default is
    double.


Warning
-------

^orthog^ will be slow when "varlist" contains many variables.  The time
required is proportional to the square of the number of variables.


Examples
--------

 . ^orthog x1 x2 x3, gen(u1 u2 u3)^
 . ^orthog x1 x2 x3, gen(u1-u3)^
 . ^orthog x1 x2 x3, gen(u*)^

 . ^orthog x1 x2 x3, gen(u*) matrix(r)^
 . ^orthog x*, gen(u*) mat(R) float^

The matrix R created by the ^matrix()^ option can be used to transform
coefficients from a regression:

 . ^orthog x*, gen(u*) mat(R)^
 . ^regress y u*^
 . ^matrix bu = get(_b)^
 . ^matrix invR = inv(R)^
 . ^matrix b1 = bu*invR'^  [note that the transpose of invR is used]

 . ^regress y x*^
 . ^matrix b2 = get(_b)^

Then b1 and b2 will be the same.
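
Why the transpose appears can be sketched as follows.  This assumes that X
(including the column of ones) has full column rank, and it writes the
coefficient row vectors bu and b2 as column vectors \beta_u and \beta_x;
that notation is introduced here for illustration only.

```latex
\begin{align*}
X &= QR \quad\Longrightarrow\quad Q = X R^{-1}, \\
Q\beta_u &= X R^{-1}\beta_u = X\beta_x
  \quad\Longrightarrow\quad \beta_x = R^{-1}\beta_u, \\
b_1 &= \beta_x' = \beta_u'\,(R^{-1})' = b_u\,(R^{-1})'.
\end{align*}
```

The second line uses the fact that both regressions produce the same fitted
values, because the columns of Q and of X span the same space.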

The matrix R can also be used to recover X (original "varlist") from
Q (orthogonalized "newvarlist") one variable at a time:

 . ^orthog price weight mpg, gen(upr uwei umpg) mat(R)^
 . ^matrix c = R[.,"price"]^
 . ^matrix c = c'^                      [^matrix score^ requires a row vector]
 . ^matrix score double samepr = c^
 . ^compare price samepr^

That is, the variable ^samepr^ is the same as the original ^price^.
This procedure can be performed as a check of the numerical soundness
of the orthogonalization.


Methods and formulas
--------------------

The X = QR orthogonalization is computed using a modified Gram-Schmidt
procedure (Golub and Van Loan 1989).

The columns of Q are orthogonal and R is upper triangular (actually R
is a permuted upper triangular matrix with row/column 1 interchanged
with row/column m so that the last row corresponds to the constant term).

Q is normalized so that

        Q'WQ = NI

where W = diag(w1, w2,..., wn) with w1, w2,..., wn the weights (all 1
if weights not specified), and N is the sum of the weights.  If the
weights are ^aweight^s, they are first normalized so that N is the
number of observations.
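
The normalization can be checked numerically in the unweighted case.  The
sketch below assumes the auto dataset is available through ^sysuse^ and uses
illustrative names q1-q3, A, and nobs.  Because ^matrix accum^ appends the
constant as the last row and column, A divided by the number of observations
should be close to the 4 x 4 identity matrix.

```stata
* Sketch only: unweighted case, auto dataset assumed to be available.
sysuse auto, clear
orthog price weight mpg, gen(q1 q2 q3)

* Form Q'Q; matrix accum adds the constant as the last row and column.
matrix accum A = q1 q2 q3
scalar nobs = r(N)

* Divide by N; the result should be (nearly) the identity matrix.
matrix A = A / nobs
matrix list A

* Relative difference from I(4); expected to be near zero.
display mreldif(A, I(4))
```

With weights, the same check would need the weights passed to
^matrix accum^ as well, with N being the sum of the weights.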


Author
------

        Bill Sribney
        Stata Corporation
        702 University Drive East
        College Station, TX 77840
        Phone: 409-696-4600
               800-782-8272
        Fax:   409-696-4601
        email: tech_support@@stata.com


Reference
---------

Golub, G.H. and C.F. Van Loan. 1989.  Matrix Computations, 2nd ed.
    Baltimore: Johns Hopkins University Press, pp. 218-219.


Also see
--------

 Manual:  ^[R] orthpoly^
On-line:  help for @matrix@, @orthpoly@, @regress@