Saturday, June 04, 2022

Givens Rotations vs. Partitioned Matrix of Triangular Matrices.

Copyright 2022 by John Halleck.
All rights reserved except as modified by the
GNU Free Documentation License version 1.3 or later.

Last Edited July 29th, 2023 - John Halleck. Minor terminology fixes.

*** This page is truly a work in progress; expect it to change from time to time.


Requires:
Knowing what a triangular matrix is.
Basic familiarity with Matrices (transposes, inverses, identities, ...)
Basic familiarity with Orthogonal Matrices, At least for the definition.
Knowledge of what a "Least Squares" solution is.
Very Basic familiarity with the Givens Rotation
Some understanding of the "fill in" properties of Givens Rotations.

As shown in my "fill in" properties of Givens Rotations post: if you do a Givens rotation using any two rows of a triangular matrix, trying to zero out one of the non-diagonal non-zeros, then you WILL introduce non-zeros into the zero half of the triangular matrix. This makes it non-triangular, and loses all the nice mathematical properties of triangular matrices. That defeats the purpose of having triangular matrices in the first place.

Some people might assume that this means that Givens Rotations are not appropriate for triangular matrices.


Here is the critical observation: if you pick two rows, *in different triangles* of the same size, sitting in the same block column, and they are both the same numbered row of their respective triangles, then they have *EXACTLY* the same sparsity pattern. This is true both before and after the Givens rotation.
Fill in can only happen (see the sparsity post referenced above) when one row has a non-zero value in a column and the other doesn't. In those specific cases, the zero item becomes non-zero. For example, using "*" for a value that can possibly be non-zero:

$\begin{bmatrix} * & & \\ * & * & \\ * & * & * \\ \\ * & & \\ * & * & \\ * & * & * \\ \end{bmatrix} $

Consider a Givens Rotation of the first row of each triangle, used to zero out the first item in the first row of the other. The Givens rotation sees

$\begin{bmatrix} * & 0 & 0 \\ * & 0 & 0 \end{bmatrix}$

This cannot possibly cause any fill in.
The same reasoning applies to the 2nd row of each: only the first two values can end up non-zero. And so on, to the end of whatever size the matrix is.
Of course, if the outer partitioned matrix contains zero blocks, then there can be fill in when one of the paired blocks is zero:
Consider:

$\begin{bmatrix} * & & \\ * & * & \\ * & * & * \\ \\ 0 & & \\ 0 & 0 & \\ 0 & 0 & 0 \\ \end{bmatrix} $

In this case,
A Givens of the first row of each will only fill in column 1 in the zeroed matrix.
A Givens of the second row of each will only fill in columns 1, and 2 in the zeroed matrix.
A Givens of the third row of each will only fill in columns 1, 2, and 3 in the zeroed matrix.
So the fill-in only turns the zero block into another triangular matrix.

Of course, the Givens rotations need to be applied to the paired rows of every triangular block, all the way across the two block rows of the partitioned matrix housing these triangles.
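For readers who want to see this numerically, here is a minimal sketch (Python with NumPy; my own illustration, not something from the post). It pairs row i of one lower triangular block with row i of another, applies a Givens rotation to each pair (zeroing the diagonal entry of the second block, an arbitrary choice just to have a concrete target), and checks that neither block picks up a non-zero above its diagonal.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [hypot(a, b), 0]."""
    h = np.hypot(a, b)
    return (1.0, 0.0) if h == 0 else (a / h, b / h)

rng = np.random.default_rng(1)
n = 4
T1 = np.tril(rng.uniform(1, 2, (n, n)))   # first lower triangular block
T2 = np.tril(rng.uniform(1, 2, (n, n)))   # second lower triangular block

# Rotate matching rows: row i of T1 paired with row i of T2.
for i in range(n):
    c, s = givens(T1[i, i], T2[i, i])
    r1, r2 = T1[i].copy(), T2[i].copy()
    T1[i] =  c * r1 + s * r2
    T2[i] = -s * r1 + c * r2

# No fill-in above either diagonal: both blocks are still lower triangular.
assert np.allclose(np.triu(T1, 1), 0)
assert np.allclose(np.triu(T2, 1), 0)
print("both blocks remain lower triangular")
```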

Why triangular matrices? Because they have much lower operation counts for almost anything you would like to do with them. (Even finding the inverse of such a matrix is near trivial.)

Footnotes

1) A massive amount of mathematics has Gauss' name attached to it. The Gauss in question is Carl Friedrich Gauss, a German mathematician who lived from 30 April 1777 to 23 February 1855.

Wednesday, June 01, 2022

Sparsity and the Givens Rotation.

Last edited Dec 14th, 2023 - John Halleck

Requires:
Basic familiarity with Matrices (transposes, inverses, identities, ...)
Basic familiarity with Orthogonal Matrices.
Very Basic familiarity with the Givens Rotation

Gives:
Some introduction to the "fill in" properties of Givens Rotations.

Every so often, on sites that answer math questions, I see a confused request for help something more or less like the following:

"I read up on Givens Rotations, and how they can be used to make a matrix Upper (or Lower) triangular by zeroing out the part of the matrix below (or above) the diagonal. When I do this I can see that it does do that. *BUT*, when I zero out everything below the diagonal, and then zero out everything above the diagonal, I expect to get the diagonal but instead I've filled in elements set that on the side that was only zeros. What's going on here?"

This normally gets a number of correct answers quoting famous theorems that the person asking has not only never heard of, but doesn't yet have the mathematics background to read. The asker reframes the question and gets even more esoteric answers. And they are scared off, never to return.

So... This post is intended to answer the question directly without reference to any mathematics beyond the rotation itself. It also goes on to discuss in more detail some of the properties of Givens Rotations on sparsity.

What does the Givens Rotation REALLY do?

Textbooks explaining Givens Rotations usually show how to zero out an element of a matrix, showing only the column that the zeroing happens in. But they don't usually show how it affects the rest of the row. It is what happens to the rest of the row that eventually answers the original question.

For an example with several columns (which is almost always the case), let's trace what a Givens Rotation actually does for it. Each column is contrived to demonstrate some different situation.

Given: $\begin{bmatrix} x & w & a & 0 & 0 \\ y & z & 0 & b & 0 \end{bmatrix}$

This is really just two rows of a (usually much larger) matrix.
The example is contrived to have a column with two arbitrary non-zero entries, where we are trying to clear the lower entry with a Givens Rotation.
This is followed by a column with a zero lower entry and a non-zero upper entry.
Then we have a column whose only non-zero is the lower entry.
And finally a column where both relevant entries are zero.

We will expand the sines and cosines into the equivalent fractions.
If this form is unfamiliar to you, see my Introduction to the Givens Rotation.

$\begin{bmatrix}
\dfrac{ x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}} \\ \dfrac{-y}{\sqrt{x^2+y^2}} & \dfrac{x}{\sqrt{x^2+y^2}} \end{bmatrix} \begin{bmatrix} x & w & a & 0 & 0 \\ y & z & 0 & b & 0 \end{bmatrix} $

$= \begin{bmatrix} \dfrac{x^2+y^2}{\sqrt{x^2+y^2}} & \dfrac{xw+yz}{\sqrt{x^2+y^2}} & \dfrac{xa}{\sqrt{x^2+y^2}} & \dfrac{yb}{\sqrt{x^2+y^2}} & 0 \\ \dfrac{-yx+xy}{\sqrt{x^2+y^2}} & \dfrac{-yw+xz}{\sqrt{x^2+y^2}} & \dfrac{-ya}{\sqrt{x^2+y^2}} & \dfrac{xb}{\sqrt{x^2+y^2}} & 0 \end{bmatrix} $

$= \begin{bmatrix} \sqrt{x^2+y^2} & \dfrac{xw+yz}{\sqrt{x^2+y^2}} & \dfrac{xa}{\sqrt{x^2+y^2}} & \dfrac{yb}{\sqrt{x^2+y^2}} & 0 \\ 0 & \dfrac{-yw+xz}{\sqrt{x^2+y^2}} & \dfrac{-ya}{\sqrt{x^2+y^2}} & \dfrac{xb}{\sqrt{x^2+y^2}} & 0 \end{bmatrix} $

As can be seen by inspection: After an application of the Givens Rotation, we have:

  • In the column that was used to form the rotation, the lower row will be cleared.
  • In any other column that has some non-zero entry, the result will have a non-zero entry in BOTH rows of that column.
  • In the final case, if BOTH entries are zero, THEN the column has only zeros.

  • Now we can finally answer the original question.

    Assume you have a lower triangular matrix. (Non-zeros on the diagonal, zeros above it.)

    Assume you want to zero out the non-zeroes to leave a diagonal matrix.

    This can't be done because no matter which two rows you pick, the lower row has a non-zero diagonal element, and the entry in that column of the upper row is zero. And we just saw that if either row of a Givens Rotation pair is non-zero in a column, then after the rotation both rows are non-zero in that column. So we have introduced a non-zero element above the diagonal.

    Because the matrix now has an element above the diagonal as well as elements below it, it is no longer triangular. So the best one can hope for is to end up with a triangular matrix... not a diagonal one. Similar reasoning applies if the matrix starts out upper triangular: just swap upper and lower in the proof. Q.E.D.
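If you would like to watch that happen numerically, here is a minimal sketch (Python with NumPy; my own illustration, not part of the original question or any answer to it). It takes a lower triangular matrix, uses the first and third rows to zero a below-diagonal entry, and shows the fill-in appearing above the diagonal.

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [3.0, 4.0, 0.0],
              [5.0, 6.0, 7.0]])    # lower triangular

# Try to zero A[2, 0] (a below-diagonal entry) using rows 0 and 2.
a, b = A[0, 0], A[2, 0]
h = np.hypot(a, b)
c, s = a / h, b / h
r0, r2 = A[0].copy(), A[2].copy()
A[0] =  c * r0 + s * r2
A[2] = -s * r0 + c * r2

print(np.round(A, 3))
# A[2, 0] is now zero, as requested.  But row 2 had non-zeros in columns 1
# and 2 where row 0 had zeros, so row 0 picked up non-zeros there: fill-in
# above the diagonal, exactly as the argument above predicts.
```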


    Navigation:

    Previous article | Next article

Sunday, January 09, 2022

Musings on Collaboration

This post last edited January 30th, 2023 - John Halleck. (Addendum added October 6th, 2022)
Rewordings, mostly.

Stylistic note: If you see a noun phrase below, and the words look like they are capitalized for no apparent reason, then the phrase is likely something I view as worth googling if you are not familiar with it.

Fair warning: This post is not really finished. It has, however, gotten quite long. I thought I had better post this draft now before it becomes something book length before I get around to cleaning it up. I am still making edits, and will be for the foreseeable future.

I can read (almost anywhere) many great articles, papers, blogs, and web pages, that show off successful efforts to solve problems, often with incredibly clever techniques. But publishers rarely, if ever, cover any failed projects. So people exploring see that everyone else they read about has brilliant flashes of insight. But their own efforts just seem to be hard work, and their own explorations seem to lead nowhere with little or no real insights. Some folk even give up at that point.

BUT, in fact, every mathematician/logician/scientist I've ever talked to at length has followed very inviting lines of reasoning that dead-ended or proved to be more work than they thought it was worth. The explorations that panned out they publish, and the ones that don't work out never see the light of day. And the ones that work they ride as far and wide as they can. That was certainly true back in the days when math (and logic, and science) papers were almost always single author works. Nowadays, more and more math (and logic or science) papers have multiple authors. The mathematical/logical/scientific culture is now much more collaborative than it was... and the fields are better for it.

I used to have the "Everyone else has brilliant insights... but I work at it and barely get any insights at all, much less good ones" view. But in my case my personal history changed it. I had a brilliant coworker/mentor [LeRoy Eide] with very broad interests. I explored his interests, and he mine; his insights cleared out many of my dead-end projects, and I helped some with his dead-end projects. Almost all of his observations were good, and some of mine were good... but I didn't realize that they were good until he pointed it out. The longer we collaborated the more I was able to recognize good insights.

We were collaborating, although that's not how I viewed it at the time, since we didn't publish anything. We should have, on topics such as what properties a binary operator needs to admit a "shift and [operator]" version, the way multiplication can be done with "shift and add". I thought we were just having fun. It was enough fun that it expanded into the "Utah Logic Group", which met regularly and exchanged ideas (some in logic) for many many many years (more than 30). Everyone had different but broad interests, and the regulars were interested in expanding their knowledge. Sadly, the group finally folded a few years ago when most of the older regulars died off over a relatively short period.

My first paper in a refereed journal ("Least Squares Network Adjustment via QR factorization" in the June 2001 issue of Surveying and Land Information Systems) was a single author paper. Mathematicians, almost universally at the time, used Orthogonalization methods for such tasks, and I was not aware of them using Normal Equations for the task. I didn't have any great flash of insight on the mathematics, but I did notice that the Land Surveying field instead used Carl F. Gauss's "Normal Equations" for the task. But most surveyors (other than some surveying instructors) had never heard of anything else for the task, and many of those that had heard assumed that the newer methods were "esoteric trigonometry laden complex impractical theoretical methods". The biggest network adjustment in history up to that point in time was the National Geodetic Survey's NAD 1984 nationwide adjustment [roughly 400,000 stations, and 1.5 million observations], and it was done with Normal Equations [and some very clever factorizations]. Some people involved worried that the numerical stability problems of Normal Equations might cause the project to fail, and the mathematical literature at that time already had articles about using orthogonalization methods for such tasks. In fairness to NGS, their project was already underway when those came out, and the staffing for that adjustment was much lower than for the previous adjustments. Totally rewriting and testing new software using QR factorization instead of Normal Equations was not remotely practical at that time.

When I pointed out to mathematicians that surveyors were still using Normal Equations, the responses were things like "Really??? Why?". The main reason, in my opinion, is that the mathematics folk and the surveying folk had/have different cultures, different terminology, and different views of the problem. And I have to admit that Normal Equations have a great deal of intuitive appeal in surveying. I'd talk about Givens Rotations, and they'd assume it rotates some part of their survey. They weren't being dense... it was just a culture and language issue. And there was some inertia involved.

Since I'd been involved in surveying caves for years, I was lucky to have a foot in both cultures. So I decided to write a paper introducing land surveyors to the results of the mathematics world. It was a lot of work, with lots of iterations. And I got everyone I could find, in both fields, to look it over... in part because I worried that it wouldn't fly since it had no great or even clever insight. From this I learned how careless my usage of the terminology was... and I corrected it. And I learned lots and lots of ways to explain this to surveyors that didn't actually work, and I found some that did. So, no earth shattering insights, and lots and lots of work.

Did this make everyone convert over to orthogonalization methods? No. But it did enlighten some folk, and I still get favorable email and questions about it. I didn't realize at the time, but the "get everyone to look at it" step was actually collaboration... Just not the way I had envisioned it worked. "Collaboration? Oh no, I'm the sole author, but I had lots of folk proof reading and making suggestions."

I started my drift into *real* collaboration when I contacted a famous author in logic [Larry Wos], who happened to be a compulsive collaborator, because I saw that something I was working on in my solitary fashion might feed into one of his passions [shortest possible proofs]. This was the start of a number of very fun little collaborations, including us collaborating on a "shortest possible proofs" study that, by the end of the week we started it, was producing useful work on open questions about the status of alleged single axioms for the BCI logic system. And that got me involved in work by the person who produced the candidates we proved to be single axioms of the system. Larry wrote up articles on what we did and things that happened as a result of what we did, always with at least appropriate credit given. The paper on that first collaboration used to be at http://www.automatedreasoning.net/docs_and_pdfs/fecundity_of_the_bci_logic.pdf but now that he has died, his webpages have vanished. There is still a copy at https://web.archive.org/web/20210509092949/https://automatedreasoning.net/docs_and_pdfs/fecundity_of_the_bci_logic.pdf. And all of his old site is at https://web.archive.org/web/20210509092949/https://automatedreasoning. He was a very good writer, and had fun with the topics. Larry also kept urging me to write up some of what I was doing by myself. I dragged my feet because I was unsure that anything I had to say was all that deep or interesting to anyone other than me, and I thought I showed no real insight. He once made an argument sort of like "Making yourself feel good by playing with your logic behind closed doors is not as fun as letting others play with it, or even you playing with theirs." [He might have quibbled about the exact wording, and possibly the color of that quote.]

I finally dipped my feet into publishing my own work with a reply (October 2010 Association for Automated Reasoning newsletter) to a challenge Larry Wos offered (June 2010 AAR newsletter). [TODO: the articles link to my web pages for more information, and those don't exist any more; I need to host them somewhere.] This paper introduced the idea of using a certain class of not necessarily valid inferences to efficiently produce a valid proof. A lot of people found the very idea humorous (as did I). It does work, it is valid, and it makes proving some very special classes of VERY LARGE alleged theorems practical. [Another single author work, but there was a broad base of people to talk to about it, and interesting email exchanges where others asked for information and they, in turn, brought in others.]

My next refereed paper, in the Hawai‘i Journal of Medicine & Public Health, actually had multiple authors, in a medical journal. I was the second of three authors, and the only one who was not a doctor. Update: The journal has changed names since then; it is now "The Hawaiʻi Journal of Health & Social Welfare".

Since then I've worked with a number of folk, using what I've done to help with what they've done. Lately, I've been finding single axioms for systems of logic where the originator has asked if I could [a number of logic folk have their own "pet" logics.] And I'm really pleased and surprised that some of them were pointed my way by people with names I recognize as authors of work I admire.

I'm still having fun, but I'm not following Larry's advice as often as I should.

My personal advice is:

  • Keep broad interests
  • Discuss a broad range of topics with people with broad interests
  • Listen, study, share
  • New folk are a resource, not a burden, get them involved
  • Have a thick skin when asking questions, or for opinions
  • Ask questions, and for opinions
  • Have fun with your interests
  • Share your passions
  • At least try other's passions
  • Follow Larry's advice above, write it down, and SHARE!

Addendum (Oct. 6, 2022): Someone objected to part of the article above, on the grounds that my newsletter articles (mostly caving articles and website postings) aren't even mentioned. The complaint is valid. However, they tend to be soon forgotten, and are (usually) only relevant to the conversations going on at the time.


Navigation:

previous post | next post

Friday, September 18, 2020

Correct, but useless, math II: A Block Matrix Generalization of the Givens Rotation

**** Note that this is a working copy (for years now). If I waited until it was ready for publication, it would be book length, and even more years late.

Last edit: January 19th, 2023 - John Halleck
Added advice in the numbering of points paragraph.
Typographic cleanups
Some changes in the layout in the latex mathematics
Wording improvements.

Requires

   Basic familiarity with Matrices (transposes, inverses, identities, ...)
   Basic familiarity with orthogonal matrices.

Assumptions:

   If we multiply matrices below, we assume that they are conformable.

   If we take the inverse of a matrix below, we assume that the matrix has an inverse.

Notation review:

   $A^{-1}$ is the [multiplicative] Inverse of $A$
   $A^T$ is the Transpose of $A$
   $A^{-T}$ is $A$ with both Inverse and Transpose applied.

Contents

See my first post in the "Useless Results" series for some background on why I'm posting these. The short form is: Lots of folk discover useless results, then discover that they are useless, and get on with their lives. They don't publish them so that others can avoid similar problems. In fact, I don't imagine that there are any journals that specialize in publishing useless results. So I'll settle for blog posts, which don't seem to reject useless results.

Overview

This page just covers some basics of a specific block matrix generalization of the Givens Rotation, and how that generalization can be used to solve some Least Squares problems (with some special, but common, sparsity patterns) in just one application.

My refereed paper on Least Squares Network Adjustments via QR factorizations may be a good reference for more detail on using the Givens rotation to solve a least squares problem. It also has a good example of the numerical instability of Normal Equations.

Context

There is a class of problems in mathematics called "Least Squares problems". The details of what they are don't matter to this story, but you can Google them or read books on the topic, or even take classes on them if you want to know more. They were originally solved with "(Gaussian) Normal Equations", which were developed by Carl Friedrich Gauss. These are more commonly solved at this time with various "Orthogonalization Methods". The reason for the switch is that Normal Equations are "Numerically Unstable". This is an exact mathematical term, but for the purposes of this story you can pretend that it means "gives really inaccurate results". The opposite of this is called "Numerically Stable". For the problems I care about (Least Squares Adjustment of cave and land surveying data) these methods take in survey data with errors that make loops not actually come back to their starting points, and turn it into the "most likely" (but still wrong) data with loops that do. Terminology aside: surveyors call a loop that should return to its starting point, but doesn't, "a loop that doesn't close". The errors are now scattered lightly over the data instead of concentrated in a few spots.

Many Orthogonalization Matrix methods exist, some are better at some kinds of problems, some are better at others. For my common problems there are two main orthogonal matrix methods used, Householder Reflections, and Givens Rotations. Both have good numerical stability, so they've edged out Normal Equations in most current mathematical use. My favorite is Givens Rotations, because they can make good use of the sparsity of the underlying problem. Householder reflections are more popular, partly because they have a "block matrix" version that aids rapidly solving problems. I searched the literature and could not find a block matrix version of Givens Rotations. Why not? Nobody seemed to have any reason that there could not be one, and nobody I talked to had even heard of one in the literature.

With no known reason there couldn't be a block matrix form, and apparently none being known, this sounded like a good hobby for me. So... I added it to my projects wish list, and started wondering about what one would look like.



Background

If we are going to extend the Givens Rotation then probably a quick review of traditional Givens Rotations is in order.

Givens Rotations were first introduced by James Wallace Givens Jr. (writing as "Wallace Givens") in "Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form" in the Journal of the Society for Industrial and Applied Mathematics [J. SIAM] 6(1) (1958), pp. 26–50.

The Givens Rotation is:

$\begin{bmatrix} \quad \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ Where $x$ is non-zero, and we want to make $y$ zero.

Properly, the two by two matrix on the left is the actual Givens Rotation. The column on the right is the column we are operating on. Since the values in that column are what the values in the rotation are computed from, it seemed clearer to show the rotation applied to the column. The rotation is used to selectively zero out entries in a problem matrix in order to (for example) put a matrix into upper triangular form in a numerically stable manner.

Givens Rotations are orthogonal matrices; their inverses are equal to their transposes: $G^{-1}=G^T$. Orthogonal matrices are famous for their numerical stability. Fortunately, for most uses we don't actually compute any sine or cosine functions, since the usual trig definitions are really in terms of the lengths of the sides of a right triangle ($\dfrac{opposite}{hypotenuse}$ for sine, and $\dfrac{adjacent}{hypotenuse}$ for cosine), and these are directly available to the problem from the column involved: adjacent = $x$, opposite = $y$, and the hypotenuse is therefore $\sqrt{x^2+y^2}$. No explicit trigonometric functions are invoked at all.

Trivial example: If we had a column vector $ \begin{bmatrix} x \\ y \end{bmatrix}$ where $x$ is non-zero, then the Givens rotation to zero out $y$ would be

$c = \dfrac{x}{\sqrt{x^2+y^2}}$

$s = \dfrac{y}{\sqrt{x^2+y^2}}$

$\begin{bmatrix} \quad c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ $= \begin{bmatrix} \dfrac{x^2}{\sqrt{x^2+y^2}} + \dfrac{y^2}{\sqrt{x^2+y^2}} \\ 0 \end{bmatrix}$ $= \begin{bmatrix} \dfrac{(x^2+y^2)}{\sqrt{x^2+y^2}} \\ 0 \end{bmatrix}$ $= \begin{bmatrix} \sqrt{x^2+y^2} \\ 0 \end{bmatrix}$

We know that the $x$ position is going to get $\sqrt{x^2+y^2}$ and the $y$ position is going to get $0$, so in practice one can just fill them in without actually doing the full matrix multiply for this column.
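As a concrete illustration of "selectively zero out entries to put a matrix into upper triangular form", here is a minimal sketch (Python with NumPy; my own code, not from the post). It sweeps a small rectangular matrix column by column with Givens rotations, building c and s directly from the two entries involved exactly as above.

```python
import numpy as np

def zero_entry(M, pivot_row, target_row, col):
    """Rotate rows pivot_row and target_row of M so M[target_row, col] becomes 0."""
    a, b = M[pivot_row, col], M[target_row, col]
    h = np.hypot(a, b)
    if h == 0.0:
        return
    c, s = a / h, b / h
    rp, rt = M[pivot_row].copy(), M[target_row].copy()
    M[pivot_row] =  c * rp + s * rt
    M[target_row] = -s * rp + c * rt

rng = np.random.default_rng(2)
R = rng.standard_normal((4, 3))

# Zero everything below each diagonal entry, one column at a time.
for col in range(R.shape[1]):
    for row in range(col + 1, R.shape[0]):
        zero_entry(R, col, row, col)

print(np.round(R, 6))                    # upper triangular (trapezoidal) result
assert np.allclose(np.tril(R, -1), 0)
```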



Start

A bit of work gives a good candidate for a Block matrix form:

$G = \begin{bmatrix} A & AB^T \\ -CB & C \end{bmatrix}$ with the intent to zero out $Y$ in a vector $\begin{bmatrix} X \\ Y \end{bmatrix} $

Where $X$ is non-singular.
Note that the transpose in $B^T$ is there because $GG^T$ would have non-conformable matrix elements when $B$ is non-square, and therefore $G$ could not possibly be orthogonal. Using $B^T$ keeps everything conformable.
$A$ and $C$ have been split into separate values since, when $B$ is non-square, they have different sizes to match $B^TB$ and $BB^T$.


Part one: can the matrix zero out a specific component of the vector?
Givens rotations are used to zero out entries in a matrix. And setting this up will explain why the block matrix form doesn't have EXACTLY the form of the standard Givens rotation.

$\begin{align} \begin{bmatrix} A & AB^{T} \\ -CB & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} AX + AB^TY \\ -CBX + CY \end{bmatrix} \end{align}$

For this to be equal to $\begin{bmatrix} AX + AB^TY \\ 0 \end{bmatrix}$

requires that $-CBX + CY = 0$, and we can easily solve for a $B$ that makes it true.

$\begin{align*} -CBX + CY &= 0 &&\text{Given} \\ BX - Y &= 0 &&\text{Premultiply by $-C^{-1}$} \\ BX &= Y &&\text{Add $Y$ to both sides} \\ B &= YX^{-1} &&\text{Postmultiply by $X^{-1}$} \\ \end{align*} $

So setting $B$ to $YX^{-1}$ ensures this property. Note that $B$ is not square if there are any redundant observations.


Part Two: Is the matrix orthogonal? (I.E. Is $GG^T$ the identity?)

$G^T = \begin{bmatrix} A & AB^T \\ -CB & C \end{bmatrix}^T = \begin{bmatrix} A^T & (-CB)^T \\ (AB^T)^T & C^T \end{bmatrix} = \begin{bmatrix} A^T & -B^TC^T \\ BA^T & C^T \end{bmatrix}$

Now we just need to multiply out $GG^T$ to see if it is the identity

$\begin{align*} GG^{T} &= \begin{bmatrix} A & AB^T \\ -CB & C \end{bmatrix} \begin{bmatrix} A^T & -B^TC^T \\ BA^T & C^T \end{bmatrix} \\ \\ &= \begin{bmatrix} AA^T + AB^TBA^T & -AB^TC^T + AB^TC^T \\ -CBA^T + CBA^T & +CBB^TC^T + CC^T \end{bmatrix} \\ \\ &= \begin{bmatrix} AA^T + AB^TBA^T & 0 \\ 0 & CBB^TC^T + CC^T \end{bmatrix} \end{align*} $


Is that really the identity? Yes it is the identity, if $AA^T + AB^TBA^T = I$ and $CC^T + CBB^TC^T = I$.

So, for example, $A$ can be found as follows:
$\begin{align*} AA^T &+ AB^TBA^T &= I &&\text{Given} \\ A^T &+ B^TBA^T &= A^{-1} &&\text{Premultiply both sides by $A^{-1}$} \\ I &+ B^TB &= A^{-1}A^{-T} &&\text{Postmultiply both sides by $A^{-T}$} \\ I &+ B^TB &= (A^TA)^{-1} &&\text{extract inverse} \\ (I &+ B^TB)^{-1} &= A^TA &&\text{take inverse of both sides} \end{align*} $

$A$ can now be computed by a Cholesky factorization of the left hand side.

A similar derivation works for $C$:
$\begin{align*} CC^T &+ CBB^TC^T &= I &&\text{Given} \\ C^T &+ BB^TC^T &= C^{-1} &&\text{Premultiply both sides by $C^{-1}$} \\ I &+ BB^T &= C^{-1}C^{-T} &&\text{Postmultiply both sides by $C^{-T}$} \\ I &+ BB^T &= (C^TC)^{-1} &&\text{extract inverse} \\ (I &+ BB^T)^{-1} &= C^TC &&\text{take inverse of both sides} \end{align*} $
$C$ can now be computed by a Cholesky factorization of the left hand side.

Note $A$ uses $B^TB$ and $C$ uses $BB^T$, the derivations are otherwise identical.



Summary

If we have $\begin{bmatrix} X \\ Y \end{bmatrix}$ and want to zero the $Y$ component with a single orthogonal block matrix transformation, then we:

set $B$ to $YX^{-1}$

set $A$ to the right hand factor of some $Z^TZ$ factorization of $(I + B^TB)^{-1}$

set $C$ to the right hand factor of some $Z^TZ$ factorization of $(I + BB^T)^{-1}$

The final matrix we want is:

$ \begin{bmatrix}
A & AB^T \\ -CB & C \end{bmatrix} $

Note that $Y X^{-1}$ has to be conformable, but $Y$ need not be square. (I.E. we can zero a column below some entry if need be.)
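Here is a minimal numerical sketch of the recipe above (Python with NumPy; my own code, not part of the original derivation). It uses `np.linalg.cholesky` as one arbitrary choice of the $Z^TZ$ factorization (taking the transpose of the returned lower factor so that $A^TA$ and $C^TC$ come out as required), assembles $G$, and checks both that $G$ is orthogonal and that it zeroes the $Y$ block.

```python
import numpy as np

def block_givens(X, Y):
    """Block rotation that zeroes Y against a non-singular X, per the summary above."""
    s, r = X.shape[0], Y.shape[0]
    B = Y @ np.linalg.inv(X)                                    # B = Y X^{-1}
    # A^T A = (I + B^T B)^{-1}   and   C^T C = (I + B B^T)^{-1}
    A = np.linalg.cholesky(np.linalg.inv(np.eye(s) + B.T @ B)).T
    C = np.linalg.cholesky(np.linalg.inv(np.eye(r) + B @ B.T)).T
    return np.block([[ A,       A @ B.T],
                     [-C @ B,   C      ]])

rng = np.random.default_rng(3)
s, r = 4, 3
X = np.triu(rng.uniform(1, 2, (s, s)))    # a non-singular (here triangular) block
Y = rng.standard_normal((r, s))           # the block we want to zero

G = block_givens(X, Y)
Z = G @ np.vstack([X, Y])

print(np.allclose(G @ G.T, np.eye(s + r)))   # True: G is orthogonal
print(np.allclose(Z[s:], 0))                 # True: the Y block has been zeroed
```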



Notes

My environment has a lot of upper triangular submatrices, which was the reason for choosing the upper triangular Cholesky factor as opposed to other matrix square roots.

Note that while the "Cholesky" factorization traditionally means factoring it into $U^TU$ (or equivalently $LL^T$) my requirement is only that the one factor transpose by the other be the original. So I can use $L^TL$ or $UU^T$ forms, among others. This allows one to pick which kind of sparceity pattern to use for the problem at hand. Note that there are TWO factorizations in the setup ($A$ and $C$), and the form of each may be chosen independently of each other.

Note that $Y$ does not have to be square. Therefore this can be used (Just like Householder transformations) to zero entire columns at a time.


Addendum: Dimensional Analysis

Above I derived the block matrix Givens Rotation, and commented on how some parts are the way they are to make the sizes agree. But I never show it with the sizes explicit. I think it would be clearer for some folk if I did.

So here is the block Givens with the size of every matrix, and every matrix operation, subscripted by its size. Extra parentheses have been added, as needed, so that we have something to attach the subscripts to in the cases of products, transposes, and the like.


The problem matrix has $t$ total rows of observations,
and $s$ station columns in each row.

$r$ is the number of redundant observations ($t-s$)

We look at the problem matrix ($P_{t,s}$) as if it were partitioned into two matrices
$X_{s,s}$ is the $s$ by $s$ set of non-redundant observations.
$Y_{r,s}$ is the $r$ by $s$ set of redundant observations.

$\begin{bmatrix} P_{t,s} \end{bmatrix}_{t,s} =\begin{bmatrix} X_{s,s} \\ Y_{r,s} \end{bmatrix}_{t,s} $

Set $B_{r,s}$ to $(Y_{r,s}X_{s,s}^{-1})_{r,s}$

set $A_{s,s}$ to the right hand factor of some $((Z_{s,s}^T)_{s,s}\;Z_{s,s})_{s,s}$ factorization of

$((I_{s,s} + ((B_{r,s}^T)_{s,r}\;B_{r,s})_{s,s})_{s,s})^{-1}$

set $C_{r,r}$ to the right hand factor of some $((Z_{r,r}^T)_{r,r}\;Z_{r,r})_{r,r}$ factorization of

$((I_{r,r} + (B_{r,s}\;(B_{r,s}^T)_{s,r})_{r,r})_{r,r})^{-1}$

Giving a final result of

$G = \begin{bmatrix} \quad A_{s,s} & (A_{s,s} (B_{r,s}^T)_{s,r})_{s,r} \\ -(C_{r,r}B_{r,s})_{r,s} & C_{r,r} \end{bmatrix}_{t,t}$

with the intent to zero out $Y_{r,s}$ in $\begin{bmatrix} X_{s,s} \\ Y_{r,s} \end{bmatrix}_{t,s} $



One step QR factorization for Least Squares (for some problems)

There are some classes of problems where the block Givens allows a one step reduction of the problem to a QR factorization. I'll include my favorite example below.

For example, surveying problems that have neither triangulation nor trilateration [I.E. the average land survey] have the property [assuming the survey is connected and contains at least one control point] that the block Givens can directly produce the QR factorization in one step.

For this class of problem you have N points, and M observations of the [linearized] problem such that each observation is of the form "From point A to point B the delta is D" (where D is typically a column 3-vector for 3d). I'll ignore the weighted problem. My $QR$ paper covers casting the weighted problem into this form.

Each row has only two non-zero elements and hundreds, or thousands, or more zero elements, so this matrix is very sparse.

Geometric constraints of a typical survey require that some point be tied down. (Sometimes just assumed to be zero, sometimes one or more explicit control points.) Surveys are also constrained to be connected. Together these two constraints mean that the survey has a spanning tree. [Usually a very large number of different spanning trees, in fact... the choice of which observations are included in the spanning tree and which are redundant can make a big difference in both final accuracy and the amount of work when solving the problem.] [TODO: characterize my best heuristics in a more formal form.]

Note that there are LOTS of special cases in the redundant rows that can be reduced by orthogonalization methods, near trivially, before hitting the problem with the full transformation. [Example: any two observations that tie together the same two points can be combined with no new non-zeros generated, and without affecting any other part of the survey.] [TODO: ordering the redundant rows also affects the work load.]

One can number the points in a "spanning tree order": Start at a control point, and then pick a point not in the survey but reachable in one step. [Aside: if there is more than one shot that "fits the bill", pick the one with the covariance matrix with the smallest determinant, or equivalently the one with the weight matrix with the largest determinant.] That's your next numbered point. Add it to the survey and repeat until all points are included. A reverse spanning tree order is obtained by reversing this numbering, or just forming it reversed.

If you consider the links that form the spanning tree (with the points numbered in a reverse spanning tree order), and sort them into descending order, you have an upper triangular matrix. And all the redundant observations are a collection of redundant rows at the end.

Now that the problem is reduced to a (sparse) upper triangular matrix and (sparse) redundant rows:
$\begin{bmatrix} * & * & * & * & * & * & \dots \\ & * & * & * & * & * & \dots \\ & & * & * & * & * & \dots \\ & & & * & * & * & \dots \\ & & & & * & * & \dots \\ & & & & & * & \dots \\ & & & & & & \ddots \\ * & * & * & * & * & * & \dots \\ * & * & * & * & * & * & \dots \\ * & * & * & * & * & * & \dots \\ * & * & * & * & * & * & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$ is $\begin{bmatrix} X \\ Y \end{bmatrix}$. with an upper triangular $X$ and full matrix $Y$

Note: For a typical land survey problem, the rows generally have only two non-zero elements. This rather extreme sparseness can be exploited when forming $B^TB$ below. I'm ignoring, for the sake of simplicity, the fact that the elements in a 3D survey are really 3x3 covariance matrices (which with a bit of work can be 3x3 upper triangular matrices)

Block Givens can be directly applied to this to zero out the $Y$, in one step, and the Least Squares problem then requires solving the resulting system.

Since the second block row of the Block Givens just zeros out the $Y$, we need not actually compute it. We can compute just the first block row, apply it, zero out the $Y$ [actually, just ignore it], and solve the resulting exactly determined system. With care, $X$ is upper triangular (and with some work the 3D survey case can be made truly upper triangular, and not just block triangular), and computing $B$ is fairly easy. And instead of computing $I+B^TB$ directly, we can observe that using the Cholesky to compute $B$ makes $B$ start out lower triangular, and the computation of $I+B^TB$ can be looked at as a Cholesky factor update.

I believe that there are transformations in the Block Givens family that can clear $Y$ and preserve $X$ as triangular, if it starts out triangular. For transformations that do this, the transformation *IS* the $Q$ needed for a QR factorization, and the QR factorization of the usual Least Squares solution by orthogonalization methods is done in one step. [TODO: I need to add my notes that there is a family in general.] [TODO: I need to finish the proofs showing that this property is preserved with the right restrictions.] [TODO: Can I use my "trick" of using Givens Rotations to preserve triangularity when zeroing partitions, for the case where all partitions start out triangular?]
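To make the "only the first block row is needed" point concrete, here is a small follow-on sketch (Python with NumPy; my own code, and only a sanity check, not a proof of the triangularity conjecture above). It forms just the top block row $[A \; AB^T]$, applies it to $\begin{bmatrix} X \\ Y \end{bmatrix}$, and verifies that the result $R$ satisfies $R^TR = X^TX + Y^TY$, which is what any orthogonal reduction of the problem must preserve.

```python
import numpy as np

rng = np.random.default_rng(4)
s, r = 5, 3
X = np.triu(rng.uniform(1, 2, (s, s)))    # triangular "spanning tree" part
Y = rng.standard_normal((r, s))           # redundant observation rows

B = Y @ np.linalg.inv(X)
A = np.linalg.cholesky(np.linalg.inv(np.eye(s) + B.T @ B)).T

R = A @ (X + B.T @ Y)     # top block row [A, A B^T] applied to [X; Y]

# The bottom block row is known to produce zeros, so it is never formed.
# Sanity check: an orthogonal reduction must leave X^T X + Y^T Y unchanged.
print(np.allclose(R.T @ R, X.T @ X + Y.T @ Y))   # True
```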



Why is this a useless result?

  1. For Least Squares problems, normal equations end up forming $A^TA$, which loses a lot of accuracy... a major factor in why orthogonal methods are used for the problem in the first place. And here we are generating the orthogonal matrix with lots of $B^TB$ expressions in the formulation. Using construction techniques that lose accuracy means that a matrix that is orthogonal when computed with exact arithmetic is not going to be orthogonal when its pieces are computed with less numerically stable techniques.
  2. The construction is littered with finding a factor of $(I + B^TB)^{-1}$ or $(I + BB^T)^{-1}$. In the prior post in the "Useless Results" series, we showed that this is as much work as doing an $Ax=b$ problem of the same size. Doing more work to set this up than it actually saves, is probably a poor tradeoff.
  3. Of course, the individual Cholesky factorizations of expressions (of the form $I+BB^T$ or $I+B^TB$) needed to form the rotation probably suffer from some of the kind of numerical problems that a Normal Equations solution does and for the same reasons.


Epilogue February 10th, 2022

As I said at the start, this was a hobby project. So when it became a useless result, it wasn't a total surprise. But I learned a lot, and practiced a number of skills. And, there are still a few loose ends to clean up [marked with "TODO" above]. As I explained in the first "useless results" post, I think useless results should also be shared. So I put the main text of the above on my web pages... before they vanished when I retired nine years ago.

Given that the result is useless, you can imagine my surprise when I saw that Admissible and Attainable Convergence Behavior of Block Arnoldi and GMRES (Marie Kubínová and Kirk M. Soodhalter) had cited it. It wasn't cited for the useless result itself, but instead for the reasoning that I employed to get there. It was a simple one line reference in the text ("We follow the idea of [reference to my defunct web page]..."), but they then pushed it further (and in another direction) than I had ever thought of doing, in a somewhat different area of mathematics than I normally play in. It does, however, point to my old (no longer existing) web pages instead of here.

I think having made my "useless result" public has been shown to be worthwhile in this case.


Navigation:

next post | previous post | previous post in series

Thursday, August 27, 2020

The Givens Rotation

Topics:     The Givens Rotation

Assumes: Basic Familiarity with Matrices

Last edit: November 5th, 2023 - John Halleck, Minor edits.
Prior edit: August 9th, 2023 - John Halleck, Reinserted more of the original source.

James Wallace Givens, Jr. (1910-1993), writing as Wallace Givens, introduced the Givens rotation in "Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form", Journal of the Society for Industrial and Applied Mathematics Vol. 6, No. 1 (Mar., 1958), pp. 26-50.

Terminology aside: By the usual mathematical convention, if someone named "Bullwinkle" invented a rotation, it would be called "The Bullwinkle Rotation". For the same reason, Givens' rotation is called "The Givens Rotation".
Unfortunately, his name ends in an "s" sound, leading to an amazing number of arguments on the Internet that falsely claim it should be "The Given's Rotation" (which would be a rotation attributed to someone named "Given", not to someone named "Givens").

THIS POST IS MISSING THE ENTIRE FIRST PART THAT USED TO BE HERE... I'm working on reconstructing the post, sorry for the inconvenience. [Thursday, the 30th of June, 2023]

*** TODO: This section needs a lot more work. ***

Much of mathematics can be thought of in many different ways. The Givens rotation is a prime example of this. I'll start with a diagram of what we are rotating.

NOTE: This right triangle is arranged so that X increases to the right, and Y increases towards the top, which is the standard convention for coordinates nowadays. In ancient Greece (coordinate systems not having been invented yet) the triangle was often drawn with the point to the right, as many still do today. TODO: Explain why pointing it to the right leads to a common set of errors.

Graphically, we are going to rotate the hypotenuse down to the X axis. I.E. we are going to rotate the long side to the bottom. TODO: Fix this.

... still haven't replaced the text that used to be here.

... **** TODO: Working on filling this in ...

Applying $(\sin{\theta})^2 + (\cos{\theta})^2 = 1$, and performing the subtractions to get zeros
[ Text is missing here ]

= $\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}$

which is a 2x2 identity matrix... proving that the Givens Rotation is orthogonal.

But, of course, we can do the same demonstration for trigonophobes with just the definitions of sine and cosine given above.

$G= \begin{bmatrix} \quad \cos{\theta} & \sin{\theta} \\ - \sin{\theta} & \cos{\theta} \end{bmatrix} = \begin{bmatrix} \dfrac{ x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}} \\ \dfrac{-y}{\sqrt{x^2+y^2}} & \dfrac{x}{\sqrt{x^2+y^2}} \end{bmatrix} $


$\sin{\theta}$ and $\cos{\theta}$ always return a result in the range -1 to 1 inclusive.
And since $\sqrt{x^2+y^2}$ is at least as large as $|x|$ or $|y|$, the fractions must each also have a magnitude between -1 and 1 (inclusive).
Since the trig functions given and the fraction forms are equal,

$GG^T =\begin{bmatrix} \dfrac{ x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}} \\ \dfrac{-y}{\sqrt{x^2+y^2}} & \dfrac{x}{\sqrt{x^2+y^2}} \end{bmatrix} \begin{bmatrix} \dfrac{ x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}} \\ \dfrac{-y}{\sqrt{x^2+y^2}} & \dfrac{x}{\sqrt{x^2+y^2}} \end{bmatrix}^T $

$=\begin{bmatrix} \dfrac{ x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}} \\ \dfrac{-y}{\sqrt{x^2+y^2}} & \dfrac{x}{\sqrt{x^2+y^2}} \end{bmatrix} \begin{bmatrix} \dfrac{x}{\sqrt{x^2+y^2}} & \dfrac{-y}{\sqrt{x^2+y^2}} \\ \dfrac{y}{\sqrt{x^2+y^2}} & \dfrac{ x}{\sqrt{x^2+y^2}} \end{bmatrix} $

$= \begin{bmatrix} \dfrac{x^2+y^2}{x^2+y^2} & \dfrac{ -xy+yx}{x^2+y^2} \\ \dfrac{yx-xy }{x^2+y^2} & \dfrac{x^2+y^2}{x^2+y^2} \end{bmatrix} =\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $ which is, once again, an identity matrix.

Orthogonal matrices are famous for their numerical stability. And they are particularly good for selectively zeroing elements of a matrix in a manner that doesn't change the least squares solution.

The full Givens rotation matrix has the same width and height as the height of the matrix that you are applying it to. But since it only affects two rows it is usually written as if the two rows were extracted, a 2x2 matrix applied, and the rows put back. Of course, in practice, we don't actually move them around because the code to apply the rotation just has a slightly different indexing into data.

Givens Rotations are used to selectively zero out items in a problem matrix in order (for example) to put a matrix into upper triangular form in a numerically stable manner, so that it is easy to solve.

Trivial example:

If we had a column vector $ \begin{bmatrix} x \\ y \end{bmatrix}$ and we wanted to zero out $y$, we could set it up as:

$\begin{bmatrix}
\quad c & s \\
-s & c
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$

The hypotenuse is computed as $h=\sqrt{x^2+y^2}$

$\cos\theta$ = Adjacent over hypotenuse

$c = \dfrac{x}{h}$

$\sin\theta$ = Opposite over hypotenuse

$s = \dfrac{y}{h}$

Final rotation:

$\begin{bmatrix}
\dfrac{ x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}} \\
\dfrac{-y}{\sqrt{x^2+y^2}} & \dfrac{x}{\sqrt{x^2+y^2}} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} \dfrac{x^2+y^2}{\sqrt{x^2+y^2}} \\ \dfrac{-yx+xy}{\sqrt{x^2+y^2}} \end{bmatrix}
= \begin{bmatrix} \sqrt{x^2+y^2} \\ 0 \end{bmatrix}$
Note that this is what we expected to have in the discussion of the triangle above.

For those that like to have a numeric, rather than symbolic, example:

For the column $\begin{bmatrix} 4 \\ 3 \end{bmatrix}$ we have a hypotenuse of $\sqrt{4^2+3^2} = \sqrt{16+9} = \sqrt{25} = 5$.

$c = \dfrac{4}{5}$ (Adjacent over hypotenuse)

$s = \dfrac{3}{5}$ (Opposite over hypotenuse)

Final rotation:

$\begin{bmatrix}
\quad \dfrac{4}{5} & \dfrac{3}{5} \\
-\dfrac{3}{5} & \dfrac{4}{5}
\end{bmatrix}
\begin{bmatrix} 4 \\ 3 \end{bmatrix}
=
\begin{bmatrix} \dfrac{16+9}{5} \\ \dfrac{-12+12}{5} \end{bmatrix}
=\begin{bmatrix} \dfrac{25}{5} \\ \dfrac{0}{5} \end{bmatrix}
=\begin{bmatrix} 5 \\ 0 \end{bmatrix}
$
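To check the arithmetic by machine, and to illustrate the earlier remark that only the two rows involved are touched, here is a short sketch (Python with NumPy; my own code, not from the post). It repeats the 4-3-5 example and then embeds the same c and s at rows 1 and 3 of a 4x4 identity to rotate two rows of a taller matrix.

```python
import numpy as np

x, y = 4.0, 3.0
h = np.hypot(x, y)              # 5.0
c, s = x / h, y / h             # 4/5 and 3/5

G = np.array([[ c, s],
              [-s, c]])
print(G @ np.array([x, y]))     # [5. 0.]

# The full-size rotation is just an identity with c and s dropped in at the
# two rows involved; every other row of the target matrix is left alone.
G4 = np.eye(4)
G4[1, 1], G4[1, 3] = c, s
G4[3, 1], G4[3, 3] = -s, c

M = np.array([[1.0, 2.0],
              [x,   7.0],
              [3.0, 4.0],
              [y,   8.0]])
print(G4 @ M)                   # rows 0 and 2 unchanged; entry (3, 0) is now 0
```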


Footnotes:

1) While the Pythagorean Theorem is traditionally credited to Pythagoras, archeologists have a Mesopotamian tablet (Plimpton 322), written more than a thousand years before Pythagoras [who lived from 570 BCE to 495 BCE], that is a list of whole number Pythagorean triples.

2) The book "The Pythagorean Proposition" (ERIC ED037335, Classics in Mathematics Education Series, by the National Council of Teachers of Mathematics - NCTM) has "... 370 different proofs, whose origins range from 900 B.C. to 1940 A.D.". It also has the most complete and interesting biography of Pythagoras I (-JH) have ever seen. Lots of things I didn't know, and found interesting. The full book is available for free, in a number of formats, at https://archive.org/details/ERIC_ED037335/

Navigation:

previous post | next post

Sunday, July 05, 2020

Correct, but useless, math I: A triangular matrix result.

(This page last edited November 5th, 2023 -JH)

Requires:
Basic familiarity with Matrices (transposes, inverses, identities, ...)
Basic familiarity with Orthogonal Matrices.
Gives:
Some introduction to "big O notation". Some introduction to triangular matrices.
Assumptions:
   If we multiply matrices below, we assume that they are conformable.
   If we take the inverse of a matrix below, we assume that the matrix has an inverse.

Notation review:
   Matrix [multiplicative] Inverse of $A$: $A^{-1}$
   Matrix Transpose of $A$: $A^T$

General discussion.

There are many absolutely correct results in mathematics that are generally useless.

They can, for example:

  • be trivial or obvious
  • be much more work than usual methods without any other advantage.
  • be equivalent to some other well known result
  • be numerically unstable

Note that I don't include here not having a "practical" use because I'm a firm believer in formal mathematics. Results that enlighten us about how things work are of some use, even if they are not of any "practical" use. The term "numerically unstable" is an exact mathematical term, but for the purposes of this story you can pretend that it just means "gives really inaccurate results in some cases".

I, like most people playing in mathematics, have generated my share of these. I think I would have generated fewer if I could have seen bad examples labeled as such. That would have enabled me not to rediscover old ones, and would have given me more things to watch out for.

But, alas, there are no mathematics journals (or journals in other fields) specializing in correct but useless results. Most mathematics folk, on finally proving a useless result, don't run out and broadcast it to the world: "I just wasted months to prove the following useless result..." Blogs, unlike refereed journals, are held to less of a requirement of a topic being interesting or useful. So I've decided to drop some of them here.

Aside: Big O Notation

People dealing with the (worst case) complexity of algorithms use a notation called "Big O Notation" when talking about how much work a given algorithm is. This just covers which terms make all the difference when problems get bigger. If "$n$" is the size of a problem, then $O(1)$ says that no matter how big the problem is, it takes the same amount of time to solve. $O(n)$ means that if the size of the problem doubles, then the work to solve it doubles. $O(n^2)$ problems are four times the work when the problem size doubles.

$O(n) + O(n^2)$ is just $O(n^2)$, because as $n$ gets larger, the $O(n)$ term is *SO* much smaller than the $O(n^2)$ term as to not make any practical difference. This notation does *NOT* say anything about how bad the problem is on average; it just talks about the worst case. And there are cases where such a particularly painful amount of setup is required by an $O(n)$ solution that an $O(n^2)$ solution can take less time on problems of the size you actually want to solve. But we have to start somewhere, and "Big O" notation has proven to be very useful for a long time now.

Triangular Matrices.

Triangular matrices have either everything above the diagonal zero [called a lower triangular matrix], or everything below the diagonal zero [called an upper triangular matrix]. If everything above and everything below the diagonal is zero, it is called a diagonal matrix (and it is non-singular as long as there are no zero elements on the diagonal). If all of the diagonal elements of a (lower or upper) triangular matrix are ones, it is called a unit (lower or upper) triangular matrix.

If we have a matrix problem of the form $Ax=b$ [non-singular square $A$, with $x$ and $b$ being column vectors or matrices (viewed as matrices of column vectors)], then by the usual techniques this is an $O(n^3)$ problem... if the size of the problem doubles, the amount of work to solve goes up by a factor of eight. The obvious way of solving the problem is to find the inverse of $A$ and multiply both sides of the equation by the inverse. Forming the inverse, in general, is an $O(n^3)$ problem. The traditional way of multiplying by a general matrix takes $O(n^3)$ time.

There are known ways of computing the inverse of a general matrix in less than $O(n^3)$ time, but they have not proven practical, for a number of reasons: some are very complex, and some have enough setup overhead that they are slower than $O(n^3)$ methods for problems of the usual size.

But, in the meantime, there are some special cases that can already be solved in $O(n^2)$. Triangular matrices can be inverted in $O(n^2)$ time. So if we had two triangular matrices $L$ and $U$, the problem $LUx=b$ could be solved in $O(n^2)$ time. Since it is straightforward (but $O(n^3)$) to find an $L$ (traditionally unit lower triangular) and a $U$ (traditionally upper triangular) whose product is $A$ in an $Ax=b$ problem, one of the ways this is handled is to find the $L$ and $U$, and solve the triangular matrix problem.
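To make the $O(n^2)$ claim for triangular systems concrete, here is a minimal back-substitution sketch (Python with NumPy; my example, not from the post). Solving $Ux=b$ for an upper triangular $U$ touches each stored entry of $U$ once, so the work grows with $n^2$; a unit lower triangular $L$ is handled the same way with a forward sweep.

```python
import numpy as np

def back_substitute(U, b):
    """Solve U x = b for upper triangular U with a non-zero diagonal: O(n^2) work."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                  # last unknown first
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

x = back_substitute(U, b)
print(np.allclose(U @ x, b))    # True
```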

So, what was the blind alley I fell into, and what was wrong with it?

"If you have found a mathematical result that could make you rich, or famous, or both, then in all probability you have made a major mistake or entirely missed something critical."
- Nahaj's conjecture.

Given $Ax=b$, how can I break this down into triangular matrices in a way that hadn't been done in the literature before and is faster than $O(n^3)$?

Well, I observed that forming $LUx = b$ had already been beaten to death in the literature. There was no way I could contribute anything new.
*BUT* no literature seemed to exist at all on $(L + U) x = b$. It is easy to form this: scale the matrix to have twos on the diagonal (remembering to scale the right hand side to match), copy the strictly lower part of $A$ into the lower part of $L$, copy the strictly upper part of $A$ into the upper part of $U$, and put ones on the diagonals of both. The scaling is only $O(n^2)$, the copy is also $O(n^2)$... and the copying doesn't even need to be done...
The code in later steps can just reference the corresponding entries in $A$ instead of $L$ and $U$ (with a tiny bit of overhead generating the ones on the diagonals as needed). This does, of course, assume that the diagonal elements are all non-zero to start with. *IF* that is not true, then a procedure called "pivoting" needs to be done first. But I'll ignore that here to keep the explanation simple.
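A tiny sketch of the split just described (Python with NumPy; my own illustration, and it assumes the diagonal is already non-zero so no pivoting is needed):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # keep the diagonal safely non-zero
b = rng.standard_normal(4)

# Scale each row so its diagonal entry becomes 2 (and scale b to match).
d = 2.0 / np.diag(A)
A2 = A * d[:, None]
b2 = b * d

L = np.tril(A2, -1) + np.eye(4)     # unit lower triangular part
U = np.triu(A2,  1) + np.eye(4)     # unit upper triangular part

print(np.allclose(L + U, A2))       # True: (L + U) x = b2 is the same system
```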

But, of course, I have not yet solved the problem... I've only reduced it to another problem. But I could multiply both sides by $L^{-1}$ (which we can get in $O(n^2)$ time) and get $(L^{-1}L + L^{-1}U)x = L^{-1} b$, which is $(I+L^{-1}U)x = L^{-1}b$. So, if I could find a fast way to compute a new $L'$ and $U'$ such that $L'U' = I + LU$ for general triangular matrices, and do it in $O(n^2)$ time, I could find one with my $L^{-1}$ as the lower triangular matrix and my $U$ as the upper triangular one, and I could solve the entire problem $Ax=b$ in $O(n^2)$ time and be well on my way to being famous and maybe rich.
This, right then and there, should have been a red flag.
In fact, in retrospect, all steps in solving this problem are known to be no worse than $O(n^2)$ except for the $L'U' = I + LU$ factorization update. If it were also no worse than $O(n^2)$, the entire $A^{-1}$ problem would fall to $O(n^2)$ time.

I worked on it off and on for a year (I did, after all, have a day job), and did nothing but convince myself that the factorization update was a much harder problem than I thought. It was exactly as hard (worst case) as the $Ax=b$ problem since the two problems can be converted to each other in $O(n^2)$ time.

So, maybe, some brighter person than I am will eventually find an $O(n^2)$ algorithm to solve that factorization update problem. But it certainly won't be me. And I doubt it will be anyone trying to use my factorization problem as a route to fast inverses.

So in the end, this approach is just another mathematically correct (unless I made a mistake above I didn't catch) result that is useless.


Navigation:

next post | next post in series | previous post

Tuesday, May 19, 2020

Basic Balanced Ternary Arithmetic.

This page was last edited on July 23rd, 2022.

This column builds on a prior column on "Just in Time Subtraction".

What is Balanced Ternary Arithmetic?

Well, it is a modification of standard base three arithmetic.

OK, what is base three (ternary) arithmetic?

Base three arithmetic is like base 10, except that we only have digits 0, 1, 2 instead of 0 through 9. This means that counting has to carry to the next column after your digits get bigger than two, instead of when they get bigger than nine. In base 10, an integer has a "one's column", a "ten's column", a "hundred's column", and so on. In base 3, an integer has a "one's column", "a "three's column, a "nine's column", and so on.

On to balanced ternary

Balanced ternary is almost exactly like regular ternary arithmetic, except that instead of digits being 0, 1, and 2, they are +1, 0, and -1. This is awkward to write, so I'll follow a common convention and use "+", "0", and "-" as the digits.
Terminology aside: In binary, a digit is referred to as a "bit", short for BInary digiT. In ternary (or balanced ternary) a digit is called a "trit", short for TeRnary digIT.

Let's try counting:

 
Base 10, Binary, Ternary,  Balanced Ternary

      0,      0,       0,        0
      1,      1,       1,        +
      2,     10,       2,       +-
      3,     11,      10,       +0
      4,    100,      11,       ++
      5,    101,      12,      +--
      6,    110,      20,      +-0
      7,    111,      21,      +-+
      8,   1000,      22,      +0-
      9,   1001,     100,      +00
     10,   1010,     101,      +0+
     11,   1011,     102,      ++-
     12,   1100,     110,      ++0
     13,   1101,     111,      +++
     14,   1110,     112,     +---
     15,   1111,     120,     +--0
     16,  10000,     121,     +--+
      17,  10001,     122,     +-0-
      18,  10010,     200,     +-00
      .       .        .         .
      .       .        .         .
      .       .        .         .
I imagine that many of my readers are looking at that last column and cringing. No doubt many feel that it is ugly, complex, non-intuitive, etc.. And, I'm sure some question why anyone, in their right mind, would want to compute in balanced ternary.

I don't think anyone would actually want to compute in it. But it, like binary, is great for COMPUTERS to compute with. Balanced ternary has many nice properties; the famous computer scientist/mathematician Donald Knuth described it as the most elegant of the number bases.

For example, in the traditional number bases, if you have a negative number you mark it with a negative sign ("-6", for example). This convention is called "signed magnitude form". In binary, this convention leads to more complexity in the logic than some other choices. Other representations include "two's complement" and "one's complement" binary; you don't need to know what these are for this discussion, and you can Google them if you want details. But with balanced ternary you don't need anything other than the number itself, since the leftmost non-zero trit is not only a digit but also the sign of the number, so we don't need anything extra. As Knuth points out, rounding and truncating are EXACTLY the same in balanced ternary. This is true even to the left of the ternary point, if you understand "truncating" as "make all digits to the right be zeros". So there is no complex code needed to deal with rounding. Negating a number is just flipping the signs all the way through: for example, 6 is +-0, and negative six is just -+0.

For arithmetic logic units balanced ternary has numerous advantages. For example, the multiplication times table has *no* carries.
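Before we get to those tables, a couple of the claims above (negation by trit flipping, the leading trit being the sign) are one-liners to check; a minimal sketch, using trit strings like the ones in the counting table:

def negate(trits):
    """Negate a balanced ternary number by flipping every trit."""
    flip = {"+": "-", "-": "+", "0": "0"}
    return "".join(flip[t] for t in trits)

def sign(trits):
    """The leftmost non-zero trit is the sign of the whole number."""
    for t in trits:
        if t != "0":
            return t
    return "0"

print(negate("+-0"))   # "-+0": six becomes negative six
print(sign("-+0"))     # "-": negative, with no separate sign symbol needed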



      Balanced Ternary
Multiplication        Addition          

* | - 0 +            + |  -   0   +
--.------            --.-----------
- | + 0 -            - | -+   -   0
0 | 0 0 0            0 |  -   0   +
+ | - 0 +            + |  0   +  +-
  0 carries            2 carries

      Contrast with Ternary

* | 0  1  2          + |  0  1  2
--.--------          --.---------
0 | 0  0  0          0 |  0  0  0
1 | 0  1  2          1 |  0  1 10
2 | 0  2 11          2 |  0 10 11
  1 carry              3 carries
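You can regenerate that balanced ternary addition table (and count the carries) with the to_balanced_ternary sketch from earlier in this post:

trit_value = {"-": -1, "0": 0, "+": 1}
carries = 0
for a in "-0+":
    row = []
    for b in "-0+":
        s = to_balanced_ternary(trit_value[a] + trit_value[b])
        if len(s) > 1:        # a two-trit result means a carry was generated
            carries += 1
        row.append(s.rjust(2))
    print(a, "|", " ".join(row))
print(carries, "entries generate a carry")   # 2: -1 + -1 = -+ and +1 + +1 = +-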


Technical/historical aside: Most modern machines using binary use two's complement arithmetic for signed numbers, because it is straightforward to implement, intuitive, and easy to describe. Some computers (Univac mainframes, CRAY supercomputers, and some others) used one's complement arithmetic instead. Disadvantages of one's complement include that it has both a positive and a negative zero, and that the arithmetic unit has to do an "end around carry or borrow" in many cases to make the arithmetic work. (Two's complement, by contrast, has a single zero, though it can count down one more than it can count up.) An obvious question is why anyone would want to use a representation that has those bad properties. I'm told that the answer is that they didn't really want to, but IBM had patents on two's complement arithmetic, which effectively prevented others from using it. So why is it that modern machines are almost all using two's complement? Simple: the patents expired. END ASIDE

Why do computers, almost universally, use binary instead of ternary or balanced ternary? Slamming a current level all the way on, or all the way off, is a good match for a transistor, or even a vacuum tube. Trying to hit any intermediate level exactly leads to much more complexity, and transistor circuits that can drive a line positive, negative, or to zero are much more complex than a single transistor. So why did Russia's Setun series of computers use balanced ternary anyway, given the engineering issue above? I'm told that they didn't want to. But after the US invented the transistor, the military uses were clear... you could (for example) make smaller, lighter, more reliable missile guidance systems. So the US restricted the export of transistors, and in Russia, computer science folk at universities had a very hard time getting access to them. [Of course, they did get them, but they were big budget items due to the difficulty of getting the transistors.] Therefore transistors were not cost effective for computer logic units. So, what is an engineer to do? You can do logic with things other than transistors. Vacuum tubes work, but are unreliable, relatively short lived, easily broken, quite expensive, and relatively large (hundreds of times larger).

You can reliably tell the difference between current in one direction, current in the other direction, and no current at all. And if you wind coils on a metal rod or donut, you can use it for logic: a perfect match for balanced ternary. And diodes were available.


Example of two coils on a metallic rod:

Opposite directions  In the same direction.

  | - 0 +              | - 0 +
--.------            --.-------
- | 0 - 0            - | - 0 0
0 | - 0 +            0 | - 0 +
+ | 0 + 0            + | 0 0 +

You can combine signals directly by different winding arrangements and organizations, and while the Russians didn't have easy access to transistors, they did have access to diodes (which had been around at least since the old crystal radios).
You can use many, many windings on one core; if the windings all have the same orientation this is an "OR" gate, with as many inputs as you want. You can make a "NOT" of a signal by reversing the direction of the winding (you get zero if it is zero, and the opposite direction otherwise). Just as "NOT AND" and "NOT OR" can each generate any other binary function, there are "complete" functions for three valued logic. And there are a lot more than two of them, leaving a lot of room for creative design. You can also send the output to other logic gates, and assemble much more complex logic units.
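Here is a very rough software model of those gates; the "sign of the net drive" behavior is my own simplification of the hardware, not a claim about how the real cores were built:

def t_not(x):
    """Reversed winding: zero stays zero, anything else comes out reversed."""
    return -x

def t_or(*inputs):
    """Several same-direction windings on one core: the sign of the summed drive."""
    s = sum(inputs)
    return (s > 0) - (s < 0)   # clamp to -1, 0, or +1

print(t_not(+1), t_not(0), t_not(-1))   # -1 0 1
print(t_or(+1, -1, 0))                  # 0
print(t_or(+1, +1, -1))                 # 1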

In a ternary, or balanced ternary, computer one has to communicate with humans and with binary computers. So code or hardware needs to be able to convert balanced ternary (or ternary) numbers to base ten for people, and to base two or eight for binary computers. There is a method, discussed in Just in Time Subtraction I and Just in Time Subtraction II, that provides fast divides by any power of the base plus or minus 1. In decimal, that gives you divides and modulo by 10-1 = 9 and 10+1 = 11, 100-1 = 99 and 100+1 = 101... which doesn't seem all that useful. But in base three (balanced or not) it gives fast divides (and modulo functions) by 3-1 = 2, 3+1 = 4, 9-1 = 8, 9+1 = 10, ... Both 2 and 10 are exactly what we need to convert numbers for use by binary computers and humans.
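Going the other way, for display purposes, the conversion is one multiply and add per trit. (This sketch cheats and uses ordinary binary arithmetic under the hood rather than the JITS tricks; it just shows the target of the conversion.)

def trits_to_int(trits):
    """Value of a balanced ternary string such as '+-0-' (which is 17)."""
    value = 0
    for t in trits:
        value = 3 * value + {"+": 1, "0": 0, "-": -1}[t]
    return value

n = trits_to_int("+-0-")
print(n)        # 17, for the humans
print(bin(n))   # 0b10001, for the binary computers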


Navigation:

next post | previous post

Friday, May 15, 2020

Logic: Axioms, Single axioms.

(Last edited May 11th, 2022 -JH)

Axioms

You have to start somewhere when dealing with a new area of math or logic. In the western tradition, this is often done with axioms (αξιώματα, classical Greek: worthy, later Greek: fitting). Axioms are, theoretically, statements about the topic that are so obvious that they need no proof. And (hopefully) they allow you to prove ANY valid statement in the system in question, and nothing that is not valid in the system.

Euclid (Εὐκλείδης) of Alexandria pioneered this with geometry around 300 BC, or so. He wrote a series of books on it that are still in print, and have been translated into quite a few languages. David Hilbert formalized axiomatic proofs for logic ("Hilbert style proofs"), and once you have logic you can define all of mathematics, subject to the usual arguments about exactly which axioms and definitions to use. [As shown by Russell and Whitehead in "Principia Mathematica".] Computer theorem provers are usually working in axiomatic systems of one sort or another.

Basics:

Notation: I give axioms and theorems first in Łukasiewicz notation (often called Polish Notation), then a slash ("/"), and then the infix notation that I suspect more of my readers are used to. For example, the axiom of Modus Ponens is:
CpCCpqq / p -> ((p->q) -> q)
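If the prefix form is unfamiliar, here is a minimal sketch of a reader for it; it assumes, as in this post, that "C" is the only connective and that single lower case letters are variables:

def polish_to_infix(s):
    """Turn a Polish notation formula into the infix form used after the slash."""
    def parse(i):
        if s[i] == "C":                 # implication takes the next two formulas
            left, i = parse(i + 1)
            right, i = parse(i)
            return "(" + left + " -> " + right + ")", i
        return s[i], i + 1              # a propositional variable
    formula, _ = parse(0)
    return formula

print(polish_to_infix("CpCCpqq"))   # (p -> ((p -> q) -> q))
print(polish_to_infix("CpCqp"))     # (p -> (q -> p))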

An axiomatic basis for a system has a number of parts. There are rules of inference, and axioms.

The default rules of inference are usually Modus Ponens (MP), and Uniform Substitution (US). If you have those two, you provably also have a rule called Condensed Detachment (CD) that is easier to use for computers and almost subsumes both.

Modus Ponens says that if you have a theorem of the form
   Cpq / p -> q
and you have p, then you also have q.
This is also called "Detachment".
Uniform Substitution says that you can substitute anything well formed for a variable in a theorem, as long as you substitute the exact same thing for each and every occurrence of that variable. (I'm really oversimplifying here; sometimes one set of rules with one axiom basis can be equivalent to another set of rules with other axioms.) There are also other rules used in specialized study of (usually unusual) systems.
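Uniform Substitution is especially easy to see on the Polish notation strings, since each variable is a single letter; a minimal sketch:

def substitute(formula, var, replacement):
    """Replace every occurrence of one variable with the same well formed formula."""
    return "".join(replacement if ch == var else ch for ch in formula)

# Substitute Cqq in for p throughout axiom K (p -> (q -> p)):
print(substitute("CpCqp", "p", "Cqq"))   # CCqqCqCqq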

As an example, using the usual rules, the axioms for the (implicational fragment of) positive propositional logic are often given as:

S:  CCpCqrCCpqCpr   /    (p->(q->r))->((p->q)->(p->r))
K:  CpCqp           /     p->(q->p)

There is, in general, no "one true axiomatic basis" for a logic system. A logic can have MANY different collections of axioms that describe exactly some given system, and nothing else. Each such collection is called an axiomatic basis for the logic. For example, the collection of everything the logic can prove is an axiomatic basis, albeit one with a countable infinity of axioms. A convention that cuts the universe of bases down quite a bit is to only worry about bases that have no "redundant" axioms.

A redundant axiom is one that can be proven by the other axioms. If it is redundant, why have it at all? Removing it means one less case to worry about when doing proofs about the system.

*** EDITING NOTE. Some text has vanished here, probably an editing error.
*** I'm working to find the block of text that went here.***
-JH

CCCpqrCsCCqCrtCqt / ((p→q)→r)→(s→((q→(r→t))→(q→t)))

(This is from Carew A. Meredith, and it is a shortest single axiom for HI, the implicational fragment of the intuitionist logic system. See below.)

Another basis for this logic that is quite common is:


B: CCpqCCrpCrq  / (p->q)->((r->p)->(r->q))
C: CCpCqrCqCpr  / (p->(q->r))->(q->(p->r))
K: CpCqp        /  p->(q->p)
W: CCpCpqCpq    / (p->(p->q))->(p->q)

If there are two different axiomatic bases for the same logic, how do we know that they are the same? First some terminology: if a logic A can be described without redundancy by some axiom set B that is the A set plus some axioms, then A is called a sublogic of B, and B is called an extension of A. If you have two axiom sets, A and B, that allegedly are axioms for the same logic, they each must be able to prove everything in the system. That means that each should be able to prove the other. If axiom set A can prove set B, then B is at least a sublogic of A, or possibly is A. If B can prove A, then A is at least a sublogic of B, or possibly is B. If they can each prove the other, then they are sublogics of each other, and therefore must be the same logic. So if there is a known basis for a system, another alleged basis is the same system if you can find proofs both ways between the bases. Note that this is easier said than done. Such proofs can be very difficult to find, and there are candidate bases that are suspected to be axiomatic bases for a specific logic but have not been proven or disproven for fifty or more years.


OK, now that all that is out of the way, we can go on to talk about properties of axiomatic bases. An obvious observation is that different axiomatic bases for the same logic can have different numbers of axioms. And once one knows this, an obvious question to ask is whether or not a system has a basis with only ONE axiom.
In a basis that has only one axiom, that axiom is called a "Single Axiom". Like any other basis, it can prove all the theorems of the system, and nothing else. And if you have a collection of single axioms, you can ask which is the shortest. This is called *a* shortest single axiom, since there may be many of that shortest size. As an example, the system called BCI has 80 known shortest single axioms; it is known that no shorter ones exist, and there is a small number of candidates of that size that no one knows whether or not they are single axioms. They've been around for 40 or so years without anyone finding a proof or disproof of their status.

[TODO: Put in the citations to the actual articles and dates, not from memory.] A good question at this point is what it takes for a logic to have a single axiom. The first steps in answering that were taken by Tarski, who discovered in 1925 or so, and published in 1930 or so, that there were two different sets of two theorems (one given in the text, one in a footnote) such that if your system had them, your system also had a single axiom. Although Tarski distributed the proof privately, it never made it to print, and it was effectively lost until (1980?) when Adrian Rezus generated a proof based on Lambda Calculus and Combinators; it also expanded the number of different logic systems that could be proven to have single axioms. This was a constructive proof, and it produced famously long axioms. In (2008??) I produced a proof, based on just the logic generally available in Tarski's time, of yet another set of theorems that insures your system has a single axiom. It was also a constructive proof, and while the axioms generated were smaller than the ones Rezus' paper generated, they were HUGE. I was following an approach pioneered by Dr. Dolph Ulrich. He apparently proved the footnote pair of axioms from Tarski's paper... and presented it at a conference that never published its proceedings with his paper. I was also able to weaken one of the axioms (maintaining the property) in the set Tarski had in the footnotes of his paper. Over the past few years, I've generated several other such pairs of theorems, and I have a growing collection of pairs that work. So far my crowning achievement has been proving that having the pair of axioms K: (p -> (q -> p)) and C: ((p -> (q -> r)) -> (q -> (p -> r))) in your system insures that it has a single axiom. There are some other pairs whose presence (constructively) insures that a single axiom exists.


But, alas, nobody has been able to produce, with such constructions, anything as short as the known single axiom bases. For one popular logic, BCI, the shortest single axioms are of length 19 (in Polish notation). Rezus's single axiom is about 93 symbols long; my constructions are some 50 symbols long. This is a very, very large gap. Those of us working in the area would welcome some construction from the axioms that was in the ballpark of the size of the known shortest single axioms for the system.


Navigation:

next post |previous post

Thursday, April 16, 2020

A Note on Matrix Factorizations of Orthogonal Matrices

Last edited October 9th, 2023 - John Halleck

Last significant edit was on June 4th, 2022 - John Halleck

Requires: Basic Familiarity with Matrices (transposes, inverses, identities, ...)

Assumptions:
   If we multiply matrices below, we assume that they are conformable.
   If we take the inverse of a matrix below, we assume that the matrix has an inverse.

Notation review:
   Identity Matrix: I
   Zero Matrix: 0
   Matrix [multiplicative] Inverse of A: $A^{-1}$
   Matrix Transpose of A: $A^T$
   Matrix with both Inverse and Transpose applied to A: $A^{-T}$
     (Since the inverse of the transpose of a matrix equals the transpose of the inverse of the matrix.)

Definitions:
   A matrix Q is orthogonal if it, multiplied by its transpose, is the identity matrix.
Examples:
   $Q^TQ = I$ and $QQ^T = I$
   It follows that the transpose of an orthogonal matrix is equal to its inverse:
   $Q^T = Q^{-1}$, and so it follows that $Q = Q^{-T}$.

In a prior post I introduced Orthogonal Matrices, and it is probably past time to say something interesting about them in the context of factoring matrices.

Factoring a matrix just means finding two or more matrices (called factors) that, when multiplied together, give the original matrix as a product. (This is also called a decomposition of the matrix.) An LU factorization, for example, produces an L matrix that is a unit lower triangular matrix, multiplied by an upper triangular matrix U. This particular factorization is one of the most famous in Linear Algebra.

I have heard it argued that if you have an orthogonal matrix, there is little point in forming an LU factorization of it, because you *already* have what you need to solve problems with it efficiently. Well, that does happen to be true, but I'm *not* solving systems... I'm playing with these matrices looking for interesting relationships. :)

So, let's examine the LU factorization (also called a decomposition) of an orthogonal matrix Q: we know that $Q = Q^{-T}$, so if $Q = LU$, we have $LU = (LU)^{-T} = (U^{-1}L^{-1})^T = L^{-T}U^{-T}$. It is true in general, for any matrix that has an LU decomposition, that
$(LU)^{-T} = L^{-T}U^{-T}$
*BUT* with an orthogonal matrix both sides are also equal to $LU$.

Side note: There was nothing in that little derivation that depended on the factors being triangular matrices... This can be done with ANY matrix factorization of Q. Given a factorization ABCD... of Q, you would get the obvious $A^{-T}B^{-T}C^{-T}D^{-T}$...

The LU demo did show that if LU is orthogonal, then $L^{-T}U^{-T}$ is also orthogonal. $L^{-T}U^{-T}$ is actually a UL factorization (with a different L and U) of the *same* matrix, rather than an LU factorization, when the matrix is orthogonal: L is lower triangular, so $L^{-T}$ (because of the transpose) is upper triangular, and $U^{-T}$ is (because of the transpose) lower triangular. So, for what it is worth, if L is a unit lower triangular matrix, then $L^{-T}$ is still unit triangular after the inverse and the transpose, just upper instead of lower.
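Here is a quick numerical check of the above (a minimal sketch: I build a random orthogonal Q, factor it with a plain no-pivoting elimination of my own, which assumes no pivoting is needed, and verify that $L^{-T}U^{-T}$ really does give Q back; numpy is only a calculator here):

import numpy as np

def lu_nopivot(a):
    """Gaussian elimination with no pivoting: a = L @ U, L unit lower triangular."""
    n = a.shape[0]
    L = np.eye(n)
    U = a.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, :] -= L[i, k] * U[k, :]
    return L, U

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random orthogonal matrix
L, U = lu_nopivot(Q)

print(np.allclose(L @ U, Q))                                    # Q = LU
print(np.allclose(np.linalg.inv(L).T @ np.linalg.inv(U).T, Q))  # L^-T U^-T is Q too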

---- LU versus QR:

QR to LU: Solving a matrix problem like Ax=d is often done by finding a factorization of A as QR, where Q is orthogonal and R is upper triangular. If there is an LU factorization of Q, then this is (LU)R, and if you re-associate it you have L(UR). The U and R are both upper triangular, and can be multiplied together into a single upper triangular matrix, making the result have the form of an LU factorization.

*This side of the argument is a bit of a "hand wave", I need to make this a bit more rigorous -JH*
LU to QR: And, if you had Ax = d, and A = LU, you could convert this into a QR factorization by finding an upper triangular matrix E such that LE is orthogonal. Then insert $EE^{-1}$ into LU to get $LEE^{-1}U$. Re-associate this as $(LE)(E^{-1}U)$, which you can see (by inspection) is a QR factorization. [Note that $E^{-1}U$ is upper triangular because both of its factors are.] Finding such an E is probably as hard, or harder, than producing a QR factorization from scratch... an interesting theoretical fact, but not even worth it practically.
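Just to see the re-association happen numerically, here is a minimal sketch. The E it uses is manufactured (circularly!) from a QR factorization of L itself, which is exactly the "as hard as doing a QR from scratch" point above; the matrices are random ones I made up for the demonstration:

import numpy as np

rng = np.random.default_rng(2)
n = 5
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)   # unit lower triangular
U = np.triu(rng.standard_normal((n, n))) + 5 * np.eye(n)   # upper triangular, nonsingular
A = L @ U

QL, RL = np.linalg.qr(L)   # L = QL @ RL, with RL upper triangular
E = np.linalg.inv(RL)      # E is upper triangular, and L @ E = QL is orthogonal

print(np.allclose((L @ E) @ (RL @ U), A))            # (LE)(E^-1 U) = LU = A
print(np.allclose((L @ E).T @ (L @ E), np.eye(n)))   # LE really is orthogonal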


Navigation:

next post | previous post

Friday, April 03, 2020

Division by "Just in Time" Subtraction II

(Last edited July 25th, 2023 by John Halleck - fix arithmetic error.)

Topics: Division by "Just in Time" subtraction II
Requires: The information in the prior article in this series.
Definition: x mod y (called "x modulo y") is just the remainder when you do an integer division of x by y.
Disclaimer: The presentation below is mine; the ideas are from the late LeRoy N. Eide.

In my prior post on dividing by "Just in Time" subtraction [JITS], I pointed out that it did something bizarre if you were dividing by nine and the number was not a multiple of nine. In this column I'll cover what it really does, and two different ways to look at that result that can make proper use of it.

The basic problem:

Given 9x = 1332, JITS properly produces 148 for x. And it produces the obvious (and correct) answer for any multiple of 9. So... let's look at the case where the given number is NOT a multiple of nine. As an example we can take 9x = 1336.

10x - 9x = x, so we try the subtraction:

Borrows:
   10x = ?  ?  ?  ?  ?  0
   -9x =       1  3  3  6
   ======================
     x = ?  ?  ?  ?  ?  ?

And we try to proceed as before...

Borrows:            -1
   10x = ?  ?  ?  ?  4  0
   -9x =       1  3  3  6
   ======================
     x = ?  ?  ?  ?  0  4

Digit by digit...


Borrows:      -1    -1
   10x = ?  ?  7  0  4  0
   -9x =       1  3  3  6
   ======================
     x = ?  ?  ?  7  0  4

... until we get to:


Borrows:          -1    -1
   10x = ... 5  5  7  0  4  0
   -9x =           1  3  3  6
   ==========================
     x = ... 5  5  5  7  0  4

And off to (countable) infinity producing 5's to the left. When this is shown to most people, they object at this point that that is *** not *** the right answer, and it doesn't even look meaningful. And when LeRoy first showed this to me I had a similar reaction. We are used to numbers trailing off to infinity to the right, but not trailing off to the left. [Except for Mathematicians that deal with p-adic numbers, who have seen this before. But that is a topic for a much later time.]

Our normal numbers are represented by a series that may run off to smaller powers of the base, but only a finite distance into larger powers. So 123 is really $1\times10^2 + 2\times10^1 + 3\times10^0$, but we don't have much experience with representations that run off through infinitely many larger powers. It is easier not to think about these than to think about this sort of infinite series.

There are several ways of looking at this state of affairs. One is that there is something we can do to "fix" this to match our normal interpretation. The other (and LeRoy's favorite) is to take this as an unfamiliar representation with its own "virtues".

We'll just show the result from here on, instead of showing the computation of every step.

First, let's look at some very simple cases of dividing some small integers by 9. (And I'll assume everyone is happy with zeros trailing off infinitely to the left.)


0/9
borrows:
10x = ...  0  0  0  0
-9x =               0
=====================
  x = ...  0  0  0  0

1/9                    
borrows:        -1
10x = ...  8  8  9  0
-9x =               1
=====================
  x = ...  8  8  8  9

2/9
borrows:        -1
10x = ...  7  7  8  0
-9x =               2
=====================
  x = ...  7  7  7  8

3/9
borrows:        -1
10x = ...  6  6  7  0
-9x =               3
=====================
  x = ...  6  6  6  7

...

8/9
borrows:        -1
10x = ...  1  1  2  0
-9x =               8
=====================
  x = ...  1  1  1  2

9/9
borrows:        -1
10x = ...  0  0  1  0
-9x =               9
=====================
  x = ...           1

10/9
borrows:     -1
10x = ...  8  9  0  0
-9x =            1  0
=====================
  x = ...  8  8  9  0

That mysterious digit running to infinity on the left (call it d) is nothing but the distance to the next multiple of 9, and 9-d is nothing but the remainder when our 9x number is divided by 9. So, for example, if we have 9x = 1, the digit running off is 8, so the remainder is 9-8 = 1. Subtract the remainder from the 9x value to get it to be a multiple of 9, and use JITS to divide that by 9. A picky and astute student may question why we don't just add the 8 into the 9x value and use JITS. Well, that would, for 9x=19, give not x=2 with remainder 1, but x=1 with remainder 10.
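For anyone who wants to play with it, here is a minimal sketch of the digit-by-digit procedure as reconstructed from the worked examples above (showing eight digits is just an arbitrary choice for the demonstration):

def jits_divide_by_9(n, width=8):
    """Run the 10x - 9x subtraction digit by digit, with n playing the role of 9x."""
    digits = []      # digits of x, least significant first
    borrow = 0
    prev = 0         # the ones digit of 10x is always 0
    for _ in range(width):
        d = prev - (n % 10) - borrow
        borrow = 1 if d < 0 else 0
        digits.append(d % 10)
        prev = d % 10                 # this digit reappears one place up in 10x
        n //= 10
    lead = digits[-1]                 # the digit that repeats forever to the left
    remainder = 0 if lead == 0 else 9 - lead
    return "..." + "".join(str(t) for t in reversed(digits)), remainder

print(jits_divide_by_9(1332))   # ('...00000148', 0)
print(jits_divide_by_9(1336))   # ('...55555704', 4), and 1336 - 4 = 1332 = 9 * 148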

On the other hand

While the above is a perfectly good way to view the integer problem, and it reduces to our traditional view, there is another view that treats the result not as a quotient and remainder, but as a representation of numbers divided by nine. For example, we can take the "fractions" above and add them.


 Carries:...  +1 +1 +1 +1
 1/9 =   ...   8  8  8  8  9
+8/9 =   ...   1  1  1  1  2
============================
 9/9 =  ...    0  0  0  0  1
And

 Carries:...  +1 +1 +1 +1
 1/9 =   ...   8  8  8  8  9
+1/9 =   ...   8  8  8  8  9
============================
 2/9 =   ...   7  7  7  7  8
...

LeRoy went on to develop a full theory of rational numbers based on this, and it used to exist out on the net, but like his site, my site, and copies in places like the UK, those sites have long since gone away. I have posted a copy of those notes on the Pages section of my blog at https://eccentric-math.blogspot.com/p/jits-rational-representations-notes.html.

Coming attractions: Why was LeRoy actually doing this in balanced ternary rather than decimal anyway? What is balanced ternary? What has this got to do with a somewhat obscure early series of Russian computers no longer in use? And why did the US's restrictions on exporting transistors lead to that series? That, and more, can be found in my Balanced Ternary post.


Navigation:

next post | previous post