(This page last edited November 5th, 2023 -JH)
Requires: Basic familiarity with Matrices (transposes, inverses, identities, ...)
Basic familiarity with Orthogonal Matrices.
Gives:
Some introduction to "big O notation". Some introduction to triangular matrices.
Assumptions:
If we multiply matrices below, we assume that they are conformable.
If we take the inverse of a matrix below, we assume that the matrix has an inverse.
Notation review:
Matrix [multiplicative] Inverse of $A$: $A^{-1}$
Matrix Transpose of $A$: $A^T$
General discussion.
There are many absolutely correct results in mathematics that are generally useless.
They can, for example:
- be trivial or obvious
- be much more work than the usual methods without any other advantage.
- be equivalent to some other well-known result
- be numerically unstable
Note that I don't include "has no practical use" on that list, because I'm a firm believer in formal mathematics. Results that enlighten us about how things work are of some use, even if they have no "practical" use. The term "numerically unstable" has a precise mathematical meaning, but for the purposes of this story you can pretend that it just means "gives really inaccurate results in some cases".
I, like most people playing in mathematics, have generated my share of these. I think I would have generated fewer if I could have seen bad examples labeled as such. That would have enabled me not to rediscover old ones, and would have given me more things to watch out for.
But, alas, there are no mathematics journals (or journals in other fields) specializing in correct but useless results. Most mathematics folk, on finally proving a useless result, don't run out and broadcast it to the world: "I just wasted months proving the following useless result..." Blogs, unlike refereed journals, are held to less of a requirement that a topic be interesting or useful. So I've decided to drop some of them here.
Aside: Big O Notation
People dealing with the (worst case) complexity of algorithms use the "Big O Notation" when talking about how much work a given algorithm takes. It captures only the term that dominates the work as problems get bigger. If $n$ is the size of a problem, then $O(1)$ says that no matter how big the problem is, it takes the same amount of time to solve. $O(n)$ means that if the size of the problem doubles, then the work to solve it doubles. $O(n^2)$ problems are four times the work when the problem size doubles.
$O(n) + O(n^2)$ is just $O(n^2)$, because as $n$ gets larger, the $O(n)$ term is *SO* much smaller than the $O(n^2)$ term as to make no practical difference. This notation does *NOT* say anything about how bad the problem is on average; it only talks about the worst case. And there are cases where a particularly painful amount of setup is required by an $O(n)$ solution, so that an $O(n^2)$ solution can take less time on problems of the size you actually want to solve. But we have to start somewhere, and "Big O" notation has proven to be very useful for a long time now.
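A minimal sketch of that doubling behavior (Python; the loops just count "units of work" rather than doing anything useful):

    # Count the basic operations done by an O(n) loop and an O(n^2) double
    # loop, for two problem sizes, to see how the work scales.
    def linear_work(n):
        return sum(1 for i in range(n))                      # one unit per element

    def quadratic_work(n):
        return sum(1 for i in range(n) for j in range(n))    # one unit per pair

    for n in (1000, 2000):                                   # double the problem size
        print(n, linear_work(n), quadratic_work(n))
    # The linear count doubles (1000 -> 2000); the quadratic count
    # quadruples (1000000 -> 4000000).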
Triangular Matrices.
A triangular matrix has either all zeros above the diagonal [called a lower triangular matrix] or all zeros below the diagonal [called an upper triangular matrix]. If everything both above and below the diagonal is zero, it is called a diagonal matrix (and it is invertible exactly when none of its diagonal elements are zero). If all of the diagonal elements of a (lower or upper) triangular matrix are ones, it is called a unit (lower or upper) triangular matrix.
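For example, in the 3 by 3 case, $\begin{pmatrix} 2 & 0 & 0 \\ 5 & 3 & 0 \\ 1 & 4 & 7 \end{pmatrix}$ is lower triangular, $\begin{pmatrix} 2 & 5 & 1 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{pmatrix}$ is upper triangular, and $\begin{pmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 1 & 4 & 1 \end{pmatrix}$ is unit lower triangular.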
If we have a matrix problem of the form $Ax=b$ [non-singular square $A$, with $x$ and $b$ being column vectors, or matrices viewed as collections of column vectors], then by the usual techniques this is an $O(n^3)$ problem... if the size of the problem doubles, the amount of work to solve it goes up by a factor of eight. The obvious way of solving the problem is to find the inverse of $A$ and multiply both sides of the equation by that inverse. Forming the inverse, in general, is an $O(n^3)$ problem. The traditional way of multiplying by a general matrix also takes $O(n^3)$ time.
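A minimal sketch of that "obvious way" (Python with NumPy; this just illustrates the idea, not how one would solve systems in practice):

    import numpy as np

    n = 4
    A = np.random.rand(n, n)        # assumed non-singular, as stated above
    b = np.random.rand(n)
    x = np.linalg.inv(A) @ b        # form A^{-1} (O(n^3)), then apply it to b
    print(np.allclose(A @ x, b))    # check that A x really equals b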
There are known ways of computing the inverse of a general matrix in time less than $O(n^3)$, but they have not proven practical for a number of reasons, such as being very complex, or having enough setup time that they are slower than the $O(n^3)$ methods for problems of the usual sizes.
But, in the meantime, there are some special cases that can already be solved in $O(n^2)$. A triangular system can be solved in $O(n^2)$ time. So if we had two triangular matrices $L$ and $U$, the problem $LUx=b$ could be solved in $O(n^2)$ time. Since it is straightforward (but $O(n^3)$) to find an $L$ (traditionally unit lower triangular) and a $U$ (traditionally upper triangular) whose product is $A$ in an $Ax=b$ problem, one of the ways this is handled is to find the $L$ and $U$, and then solve the two triangular problems.
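A minimal sketch of those two $O(n^2)$ triangular solves (Python with NumPy; the function names are mine, not from any particular library):

    import numpy as np

    def forward_substitution(L, b):
        # Solve L y = b for lower triangular L: about n^2/2 multiply-adds.
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def back_substitution(U, y):
        # Solve U x = y for upper triangular U: about n^2/2 multiply-adds.
        n = len(y)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    # Given L and U with L U = A, solving A x = b is then just
    #     x = back_substitution(U, forward_substitution(L, b))
    # which is O(n^2) once the factors are in hand.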
So, what was the blind alley I fell into, and what was wrong with it?
"If you have found a mathematical result that could make you rich, or famous, or both, then
in all probability you have made a major mistake or entirely missed something critical."
- Nahaj's conjecture.
Given $Ax=b$, how could I break this down into triangular matrices in a way that hadn't already been done in the literature, and that is faster than $O(n^3)$?
Well, I observed that forming $LUx=b$ had already been beaten to death in the literature. There was no way I could contribute anything new there.
*BUT* no literature seemed to exist at all on $(L + U)x = b$. It is easy to form this: scale the matrix so that it has twos on the diagonal (remembering to scale the right-hand side to match), copy the strictly lower part of $A$ into the lower part of the $L$ matrix, copy the strictly upper part of $A$ into the upper part of the $U$ matrix, and put ones on the diagonals of both.
And the scaling is only $O(n^2)$, and the copying is also $O(n^2)$... and the copying doesn't even need to be done...
The code in later steps can just reference the corresponding entries in $A$ instead of $L$ and $U$ (with a tiny bit of overhead for generating the ones on the diagonals as needed). This does, of course, assume that the diagonal elements are all non-zero to start with. *IF* that is not true, then a procedure called "pivoting" needs to be done first. But I'll ignore that here to make the explanation simpler.
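A minimal sketch of that construction (Python with NumPy; the function name is mine, and it assumes no zero diagonal entries, so no pivoting):

    import numpy as np

    def split_into_L_plus_U(A, b):
        # Scale each row of A (and the matching entry of b) so the diagonal
        # becomes all twos, then split: L gets the strict lower part, U gets
        # the strict upper part, and both get ones on the diagonal, so that
        # L + U equals the scaled A and (L + U) x = b2 has the same solution x.
        d = 2.0 / np.diag(A)            # assumes no zero diagonal entries
        A2 = A * d[:, None]             # row i scaled by 2 / a_ii
        b2 = b * d
        n = len(b)
        L = np.tril(A2, -1) + np.eye(n)
        U = np.triu(A2, 1) + np.eye(n)
        return L, U, b2

    # Both the scaling and the splitting are O(n^2), and as noted above the
    # explicit copies could be skipped by just indexing into A.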
But, of course, I have not yet solved the problem... I've only reduced it to another problem. But, I could multiply both sides by $L^{-1}$ (applying $L^{-1}$ to the right-hand side is just a forward substitution, an $O(n^2)$ operation) and get $(L^{-1}L + L^{-1}U)x = L^{-1}b$, which is $(1+L^{-1}U)x = L^{-1}b$. So, if I could find a fast way to compute a new $L'$ and $U'$ such that $L'U' = 1 + LU$ for general triangular matrices, and do it in $O(n^2)$ time, then I could find one with my $L^{-1}$ as the lower triangular matrix and my $U$ as the upper triangular one, and I could solve the entire problem $Ax=b$ in $O(n^2)$ time and be well on my way to being famous and maybe rich.
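Written out (with $b'$ standing for the rescaled right-hand side from the construction above), the chain of reductions is:
$Ax = b \;\longrightarrow\; (L + U)x = b' \;\longrightarrow\; (1 + L^{-1}U)x = L^{-1}b' \;\longrightarrow\; L'U'x = L^{-1}b'$
Every arrow is an $O(n^2)$ step except the last one, which needs $L'$ and $U'$ with $L'U' = 1 + L^{-1}U$; once those are in hand, the final system is just two more $O(n^2)$ triangular solves.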
This, right then and there, should have been a red flag.
In fact, in retrospect, all of the steps in solving this problem are known to be no worse than $O(n^2)$ except for the $L'U' = 1 + LU$ problem.
If it were also no worse than $O(n^2)$, the entire $Ax=b$ problem would fall in $O(n^2)$ time.
I worked on it off and on for a year (I did, after all, have a day job), and did nothing but convince myself that the factorization update was a much harder problem than I thought. It was exactly as hard (worst case) as the $Ax=b$ problem since the two problems can be converted to each other in $O(n^2)$ time.
So, maybe, some brighter person than I am will eventually find an $O(n^2)$ algorithm to solve that factorization update problem. But it certainly won't be me. And I doubt it will be anyone trying to use my factorization problem as a route to fast inverses.
So in the end, this approach is just another mathematically correct (unless I made a mistake above I didn't catch) result that is useless.