Friday, September 18, 2020

Correct, but useless, math II: A Block Matrix Generalization of the Givens Rotation

**** Note that this is a working copy (for years now). If I waited until it was ready for publication, it would be be book length, and more years late.

Last edit: January 19th, 2023 - John Halleck
Added advice in the numbering of points paragraph.
Typographic cleanups
Some changes in the layout in the latex mathematics
Wording improvements.

Requires

   Basic familiarity with Matrices (transposes, inverses, identities, ...)
   Basic familiarity with orthogonal matrices.

Assumptions:

   If we multiply matrices below, we assume that they are conformable.

   If we we take the inverse of a matrix below , we assume that the matrix has an inverse.

Notation review:

   $A^{-1}$ is the [multipicative] Inverse of $A$
   $A^T$ is the Transpose of $A$
   $A^{-T}$ is $A$ with both Inverse and Transpose applied.

Contents

See my first post in the "Useless Results" series for some background on why I'm posting these. The short form is: Lots of folk discover useless results, then discover that they are useless, and get on with their lives. They don't publish them so that others can avoid similar problems. In fact, I don't imagine that there are any journals that specialize in publishing useless results. So I'll settle for blog posts, which don't seem to reject useless results.

Overview

This page just covers some basics of a specific block matrix generalization of the Givens Rotation. And how that generalization result can be used to solve some Least Squares problems (with some special, but common, sparsity patterns) in just one application.

My refereed paper on Least Squares Network Adjustments via QR factorizations may be a good reference for more detail on using the Givens rotation to solve a least squares problem. It also has a good example of the numerical unstability of Normal Equations.

Context

There are a class of problems in mathematics called "Least Squares problems". The details of what they are don't matter to this story, but you can Google them or read books on the topic, or even take classes on them if you want to know more. They were originally solved with "(Gaussian) Normal Equations", which were developed by Carl Friedrich Gauss. These are more commonly solved at this time with various "Orthogonalization Methods". The reason for the switch is that Normal Equations are "Numerically Unstable". This is a exact mathematical term, but for the purposes of this story you can pretend that it means "gives really inaccurate results". The opposite of this is called "Numerically Stable". For problems I care about (Least Squares Adjustment of cave and land surveying data) they take in survey data with some errors that make loops not actually come back to their starting points and make it into the "most likely" (but still wrong) data with loops that do. Terminology aside: surveyors call a loop that should return to its starting point "a loop that doesn't close". The errors are now scattered lightly over the data instead of concentrated in a few spots.

Many Orthogonalization Matrix methods exist, some are better at some kinds of problems, some are better at others. For my common problems there are two main orthogonal matrix methods used, Householder Reflections, and Givens Rotations. Both have good numerical stability, so they've edged out Normal Equations in most current mathematical use. My favorite is Givens Rotations, because they can make good use of the sparsity of the underlying problem. Householder reflections are more popular, partly because they have a "block matrix" version that aids rapidly solving problems. I searched the literature and could not find a block matrix version of Givens Rotations. Why not? Nobody seemed to have any reason that there could not be one, and nobody I talked to had even heard of one in the literature.

With no known reason there couldn't be a block matrix form, and apparently none being known, this sounded like a good hobby for me. So... I added it to my projects wish list, and started wondering about what one would look like.



Background

If we are going to extend the Givens Rotation then probably a quick review of traditional Givens Rotations is in order.

Givens Rotations were first introduced by James Wallace Givens Jr. (writing as "Wallace Givens") in "Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form" in the Jourmenal of the Society for Industrial and Applied Mathematics [J. SIAM] 6(1) (1958), pp. 26–50.

The Givens Rotation is:

$\begin{bmatrix} \quad \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ Where $x$ is non-zero, and we want to make $y$ zero.

Properly, the four by four matrix on the left is the actual Givens Rotation. The column on the right is the column we are operating on. Since the values in that column are what the values in the rotation are computed from, it seemed clearer to show the rotation applied to the column. The rotation is used to selectively zero out entries in a problem matrix in order to (for example) put a matrix into upper triangular form in a numerically stable manner.

Givens Rotations are orthogonal matrices, their inverses are equal to their transpose: $G^{-1}=G^T$. Orthogonal matrices are famous for their numerical stability. Fortunately, for most uses we don't actually compute any sine or cosine functions since the usual trig definitions are really defined in terms of the length of the sides of a right triangle ($\dfrac{opposite}{hypotenuse}$ for sine, and $\dfrac{adjacent}{hypotenuse}$ for cosine), and these are directly available to the problem from the column involved, adjcent = $x$, opposite = $y$, and the hypotenuse is therefore $\sqrt{x^2+y^2}$, without invoking explicit trigonometric functions at all.

Trivial example: If we had a column vector $ \begin{bmatrix} x \\ y \end{bmatrix}$ where $x$ is non-zero, then the Givens rotation to zero out $y$ would be

$c = \dfrac{x}{\sqrt{x^2+y^2}}$

$s = \dfrac{y}{\sqrt{x^2+y^2}}$

$\begin{bmatrix} \quad c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ $= \begin{bmatrix} \dfrac{x^2}{\sqrt{x^2+y^2}} + \dfrac{y^2}{\sqrt{x^2+y^2}} \\ 0 \end{bmatrix}$ $= \begin{bmatrix} \dfrac{(x^2+y^2)}{\sqrt{x^2+y^2}} \\ 0 \end{bmatrix}$ $= \begin{bmatrix} \sqrt{x^2+y^2} \\ 0 \end{bmatrix}$

We know that the $x$ position is going to get $\sqrt{x^2+y^2}$ and the $y$ position is going to get $0$, so in practice one can just fill them in without actually doing the full matrix multiply for this column.



Start

A bit of work gives a good candidate for a Block matrix form:

$G = \begin{bmatrix} A & AB^T \\ -CB & C \end{bmatrix}$ with the intent to zero out $Y$ in a vector $\begin{bmatrix} X \\ Y \end{bmatrix} $

Where $X$ is non-singular.
Note that the transpose in $B^T$ is there because $GG^T$ would have non-conformable matrix elements when $B$ is non-square and therefore $G$ could not possibly be orthogonal. Using $B^T$ keeps everything conformable.
$A$ and $C$ have been split into separate values since when B is non-square, they have different sizes to match $B^TB$ and $BB^T$.


Part one: can the matrix zero out a specific component of the vector?
Givens rotations are used to zero out entries in a matrix. And setting this up will explain why the block matrix form doesn't have EXACTLY the form of the standard Givens rotation.

$\begin{align} \begin{bmatrix} A & AB^{T} \\ -CB & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} AX + AB^TY \\ -CBX + CY \end{bmatrix} \end{align}$

For this to be equal to $\begin{bmatrix} AX + AB^TY \\ 0 \end{bmatrix}$

requires that $-CBX + CY = 0$, and we can easily solve for a $B$ that makes it true.

$\begin{align*} -CBX + CY &= 0 &&\text{Given} \\ BX + -Y &= 0 &&\text{Premultiply by $-C^{-1}$} \\ BX &= Y &&\text{Add $Y$ to both sides} \\ B &= YX^{-1} &&\text{Postmultiply by $X^{-1}$} \\ \end{align*} $

so setting $B$ to $YX^{-1}$ insures this property. Note that $B$ is not square if there are any reduntant observations.


Part Two: Is the matrix orthogonal? (I.E. Is $GG^T$ the identity?)

$G^T = \begin{bmatrix} A & AB^T \\ -CB & C \end{bmatrix}^T = \begin{bmatrix} A^T & (-CB)^T \\ (AB^T)^T & C^T \end{bmatrix} = \begin{bmatrix} A^T & -B^TC^T \\ BA^T & C^T \end{bmatrix}$

Now we just need to multiply out $GG^T$ to see if it is the identity

$\begin{align*} GG^{T} &= \begin{bmatrix} A & AB^T \\ -CB & C \end{bmatrix} \begin{bmatrix} A^T & -B^TC^T \\ BA^T & C^T \end{bmatrix} \\ \\ &= \begin{bmatrix} AA^T + AB^TBA^T & -AB^TC^T + AB^TC^T \\ -CBA^T + CBA^T & +CBB^TC^T + CC^T \end{bmatrix} \\ \\ &= \begin{bmatrix} AA^T + AB^TBA^T & 0 \\ 0 & CBB^TC^T + CC^T \end{bmatrix} \end{align*} $


Is that really the identity? Yes it is the identity, if $AA^T + AB^TBA^T = I$ and $CC^T + CBB^TC^T = I$.

So, for example, $A$ can be found as follows:
$\begin{align*} AA^T &+ AB^TBA^T &= I &&\text{Given} \\ A^T &+ B^TBA^T &= A^{-1} &&\text{Premultiply both sides by $A^{-1}$} \\ I &+ B^TB &= A^{-1}A^{-T} &&\text{Postmultiply both sides by $A^{-T}$} \\ I &+ B^TB &= (A^TA)^{-1} &&\text{extract inverse} \\ (I &+ B^TB)^{-1} &= A^TA &&\text{take inverse of both sides} \end{align*} $

$A$ can now be computed by a Cholesky factorization of the left hand side.

A similar derivation for works for $C$:
$\begin{align*} CC^T &+ CBB^TC^T &= I &&\text{Given} \\ C^T &+ BB^TC^T &= C^{-1} &&\text{Premultiply both sides by $C^{-1}$} \\ I &+ BB^T &= C^{-1}C^{-T} &&\text{Postmultiply both sides by $C^{-T}$} \\ I &+ BB^T &= (C^TC)^{-1} &&\text{extract inverse} \\ (I &+ BB^T)^{-1} &= C^TC &&\text{take inverse of both sides} \end{align*} $
$C$ can now be computed by a Cholesky factorization of the left hand side.

Note $A$ uses $B^TB$ and $C$ uses $BB^T$, the derivations are otherwise identical.



Summary

If we have $\begin{bmatrix} X \\ Y \end{bmatrix}$ and want to zero the $Y$ component with a single orthogonal block matrix transformation, then we:

set $B$ to $YX^{-1}$

set $A$ to the right hand factor of some $Z^TZ$ factorization of $(I + B^TB)^{-1}$

set $C$ to the right hand factor of some $Z^TZ$ factorization of $(I + BB^T)^{-1}$

The final matrix we want is:

$ \begin{bmatrix}
A & AB^T \\ -CB & C \end{bmatrix} $

Note that $Y X^{-1}$ has to be conformable, but $Y$ need not be square. (I.E. we can zero a column below some entry if need be.)



Notes

My environment has a lot of upper triangular submatrices, which was the reason for choosing the upper triangular Cholesky factor as opposed to other matrix square roots.

Note that while the "Cholesky" factorization traditionally means factoring it into $U^TU$ (or equivalently $LL^T$) my requirement is only that the one factor transpose by the other be the original. So I can use $L^TL$ or $UU^T$ forms, among others. This allows one to pick which kind of sparceity pattern to use for the problem at hand. Note that there are TWO factorizations in the setup ($A$ and $C$), and the form of each may be chosen independently of each other.

Note that $Y$ does not have to be square. Therefore this can be used (Just like Householder transformations) to zero entire columns at a time.


Addendum: Dimensional Analysis

Above I derived the block matrix Givens Rotation, and commented on how some parts are the way they are to make the sizes agree. But I never show it with the sizes explicit. I think it would be clearer for some folk if I did.

So here is the block givens with the size of every matrix, and every matrix operation, subscripted by its size. Extra parenthesies have been added, as needed, so that we have something to stick the subscripts on in the cases of products, transposes, and the like.


The problem matrix has $t$ total number of rows of observations,
and there are $s$ stations in each row.

$r$ is the number of redundant observations ($t-s$)

We look at the problem matrix ($P_{t,s}$) as if it were partitioned into two matrices
$X_{s,x}$ is the $s$ by $s$ set of non-redundant observations.
$Y_{r,x}$ is the the $r$ by $s$ redundant observations.

$\begin{bmatrix} P_{t,s} \end{bmatrix}_{t,s} =\begin{bmatrix} X_{s,s} \\ Y_{r,s} \end{bmatrix}_{t,s} $

Set $B_{r,s}$ to $(Y_{r,s}X_{s,s}^{-1})_{r,s}$

set $A_{s,s}$ to the right hand factor of some $((Z_{s,s}^T)_{s,s}\;Z_{s,s})_{s,s}$ factorization of

$(I_{s,s} + ((B_{r,s}^T)_{s,r}\;B_{r,s})_{s,s}^{-1})_{s,s}$

set $C_{r,r}$ to the right hand factor of some $((Z_{r,r}^T)_{r,r}\;Z_{r,r})_{r,r}$ factorization of

$(I_{r,r} + ((B_{r,s}\;(B_{r,s}^T)_{s,r})_{r,r})^{-1})_{r,r}$

Giving a final result of

$G = \begin{bmatrix} \quad A_{s,s} & (A_{s,s} (B_{r,s}^T)_{s,r})_{s,r} \\ -(C_{r,r}B_{rs})_{r,s} & C_{r,r} \end{bmatrix}_{t,t}$

with the intent to zero out $Y_{r,s}$ in $\begin{bmatrix} X_{s,s} \\ Y_{r,s} \end{bmatrix}_{t,s} $



One step QR factorization for Least Squares (for some problems)

There are some classes of problems where the block Given's allows a one step reduction of the problem to a QR factorization. I'll include my favorite example below.

For example, Surveying problems that have neither triangulation nor trilateralization [I.E. the average land survey] have the property [assuming the survey is connected and contains at least one control point] that Block Given's can directly produce the QR factorization in one step.

For this class of problem you have N points, and M observations of the [linearized] problem such that each observation is of the form "From point A to point B the delta is D" (where D is typically a column 3-vector for 3d). I'll ignore the weighted problem. My $QR$ paper covers casting the weighted problem into this form.

Each row has only two non-zero elements and hundreds, or thousands, or more zero elements in the row, so this matrix very sparse

Geometric constraints of a typical survey require that some point be tied down. (Sometimes just assumed to be zero, sometimes one or more explicit control points.) Surveys are also constrained to be connected. Together these two constraints mean that the survey has a spanning tree. [Usually a very large number of different spanning trees, in fact... choice of which observations are included in the spanning tree and which are redundant can make a big di/fference in both final accuracy and amount of work when solving the problem.] [TODO: characterize my best heuristics in a more formal form.]

Note that there are LOTS of special cases in the redundant rows that can be reduced by orthogonalization methods, near trivially, before hitting the problem with the full transformation. [Example: any two observations that tie together the same two points with no new non-zeros generated, and not affecting any other part of the survey.] TODO: and ordering the redundant rows also affects work load]

One can number the points in a "spanning tree order": Start at a control point, and then pick a point not in the survey but reachable in one step. [Aside: if there is more than one shot that "fits the bill", pick the one with the covariance matrix with the smallest determinant or equivalently the with the weight matrix with the largest determinant.] That's your next numbered point. Add it to the survey and repeat until all points are included. A reverse spanning tree order is obtained by reversing this numbering, or just forming it reversed.

If you consider the links that form the spanning tree (with the points numbered in a reverse spanning tree order), and sort then into descending order, you have an upper triangular matrix. And all the redundant observations are a collection of redundant rows at the end.

Now that the problem is reduced to a (sparse) upper triangular matrix and (sparse) redundant rows:
$\begin{bmatrix} * & * & * & * & * & * & \dots \\ & * & * & * & * & * & \dots \\ & & * & * & * & * & \dots \\ & & & * & * & * & \dots \\ & & & & * & * & \dots \\ & & & & & * & \dots \\ & & & & & & \ddots \\ * & * & * & * & * & * & \dots \\ * & * & * & * & * & * & \dots \\ * & * & * & * & * & * & \dots \\ * & * & * & * & * & * & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$ is $\begin{bmatrix} X \\ Y \end{bmatrix}$. with an upper triangular $X$ and full matrix $Y$

Note: For a typical land survey problem, the rows generally have only two non-zero elements. This rather extreme sparseness can be exploited when forming $B^TB$ below. I'm ignoring, for the sake of simplicity, the fact that the elements in a 3D survey are really 3x3 covariance matrices (which with a bit of work can be 3x3 upper triangular matrices)

Block Givens can be directly applied to this to zero out the $Y$, in one step, and the Least Squares problem then requires solving the resulting system.

Since the second row of the Block givens just zeros out the Y, we need not actually compute it. We can just compute the first row, and apply it, and zero out the $Y$ [actually, just ignore it], and solve the resultant exactly determined system. With care, $X$ is upper triangular (and with some work) the 3D survey case can be made truely upper triangular, and not just block triangular), computing B is fairly easy. And instead of computing $I+B^TB$ we can observe that using the Cholesky to compute $B$ makes $B$ start out lower triangular, and the computation of I+B^TB$ can be looked at as Cholesky factor update.

I believe that there are transformations in the Block Given's family that can clear Y and preserve X as triangular, if it starts triangular. For transformations that do this, the transformation *IS* the Q needed for a QR transformation, and the QR factorization of the usual Least Squares solution by orthogonalization methods is done in one step. [TODO: I need to add my notes that there is a family in general. ] [TODO: I need to finish proofs to show that this property is preserved with the right restrictions.] [TODO: Can I use my "trick" to using Givens Rotations to preserve triangularity when zeroing partitions for the case where the matrix starts out all partitions start out triangular? ]



Why is this a useless result?

  1. For Least squares problems, normal equations end up forming $A^TA$ which looses a lot of accuracy... a major factor in why orthogonal methods are used for the problem in the first place. And here we are generating the Orthogonal matrix with lots of $B^TB$ expressions in the formulation. Using construction techniques that loose accuracy means that the matrix that is Orthogonal when computed with exact arithmetic is not going to be so when computed with pieces computed with less numerically stable techniques.
  2. The construction is littered with finding a factor of $(I + B^TB)^{-1}$ or $(I + BB^T)^{-1}$. In the prior post in the "Useless Results" series, we showed that this is as much work as doing an $Ax=b$ problem of the same size. Doing more work to set this up than it actually saves, is probably a poor tradeoff.
  3. Of course, the individual Cholesky factorizations of expressions (of the form $I+BB^T$ or $I+B^TB$) needed to form the rotation probably suffer from some of the kind of numerical problems that a Normal Equations solution does and for the same reasons.


Epilogue February 10th, 2022

As I said at the start, this was a hobby project. So when it became a useless result, it wasn't a total surprise. But I learned a lot, and practiced a number of skills. And, there are still a few loose ends to clean up [marked with "TODO" above]. As I explained in the first "useless results" post, I think useless results should also be shared. So I put the main text of the above on my web pages... before they vanished when I retired nine years ago.

Given that the result is useless, you can imagine my surprise when I saw that Admissible and Attainable Convergence Behavior of Block Arnoldi and GMRES (Marie Kubínová and Kirk M. Soodhalter) had cited it. It wasn't cited for the useless result itself, but instead for the reasoning that I employed to get there. It was a simple one line reference in the text ("We follow the idea of [reference to my defunct web page]...)", but they then pushed it further (and in another direction) than I had ever thought of doing, in a somewhat different area of mathematics then I normally play in. It does, however, point to my old (no longer existing) web pages instead of here.

I think having made my "useless result" public has been shown to be worthwhile in this case.


Navigation:

next post | previous post | previous post in series

No comments:

Post a Comment