Least Squares Network Adjustments via QR Factorization [Updated]

Author: John B. Halleck
This page was last edited 24 August 2023 by John Halleck: added footnote (Addendum) 10 and a pointer to my explanation of the Givens Rotation.


© Copyright 2001, by the American Congress on Surveying and Mapping
All Rights Reserved.

Displayed on my web pages with permission of the copyright holder.


This peer-reviewed paper appeared in the June 2001 issue of Surveying and Land Information Systems


**** This page has been hauled over from my now defunct web page. I'm currently switching from ASCII art equations to proper LaTeX equations, so changes are still being made. ****
These changes are purely cosmetic.
Footnotes are still being added and corrected.


Title Page

Least Squares Network Adjustments via QR Factorization

John.B.Halleck@gmail.com

Abstract

An introduction to orthogonal QR factorizations (via Given's rotations) as an alternative to Normal Equations for the least squares solution of network adjustment problems. The main advantage is its greater numerical stability. When the ratio of redundant shots to non-redundant shots is small [6], QR factorization also has lower operation counts. In addition, it allows for efficient use of the underlying sparsity of the design matrix and has better performance on computers with paged memory.

Table of Contents

Introduction

Network adjustment problems are usually set up as a (linearized) weighted matrix problem. The least squares solution of this problem is usually computed by "the method of normal equations". While this method has great intuitive appeal and a very long tradition, numerical analysts in other problem domains usually use other methods. The condition number of the matrix is a measure of the amount of significance lost when you solve the system it represents. Solving the problem via normal equations squares the condition number of the problem, while solving by many other methods does not.

Numerical analysts have many methods appropriate to least squares problems. A very good overview of the methods available can be found in "Matrix Computations" (Golub and Van Loan, 1996). "QR factorization" is a popular method that works well for the surveying problem. There are, in fact, several equivalent methods of forming the QR factorization. This article will discuss using Given's rotations for the task. Other techniques that are found in the literature that could also be used include Given's reflections, Householder reflections, fast Givens, Gentleman's methods, etc. Given's rotations are used in this paper because they have low operation counts on problems with the structure that survey networks show, and are relatively easy to describe and implement. For other types of least squares adjustment problems (such as photogrammetry) other methods of obtaining the QR factorization may be more appropriate.

Notes

The notational conventions of the main body of the numerical analysis literature have been used in this paper. This should make it easier for people who read this article to go to the actual literature. However it does make for some awkwardness in a few places. For the same reason the presentation will be in terms of an upper triangular matrix, even though a lower triangular matrix is somewhat more natural to the problem.

Instead of full three dimensional adjustments, level nets will be used in order to simplify this presentation. Extension of the adjustments to three dimensions is the same as it is for traditional normal equation adjustments. Some QR specific optimizations exist, but are beyond the scope of this paper.

Basics

Any good book about adjusting survey networks will cover the details of the problem setup, so only a quick summary of the usual background is given below.

A survey network consists of shots that connect points, along with weights for those shots. At least one point has been tied down (called a "control point"). The problem has been linearized, converting the shots into changes in coordinates. The (co)variances of the shots have also been computed so that weights can be correctly assigned.

There is a design matrix $A$ that describes the connectivity of the problem (the matrix contains only 1, -1, and 0). There is a collection of unknowns $X$, and an array of constants $L$ (the constraints and the linearized measured values). The design matrix $A$ has $n$ columns and $m$ rows (one row for each measurement, and one for each control point). The number of rows is greater than or equal to the number of columns. When it is greater, the problem is called an "over determined system", and least squares techniques are used to solve it. The redundancy of the problem, $r$, is the number of rows minus the number of columns ($m-n$). Not all shots are of equal accuracy, and they are not normally weighted the same. Most problems have a weight matrix $W$ that reflects this fact.
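
As a concrete (made-up) illustration, here is a minimal Python/NumPy sketch of these matrices for a hypothetical three-point level net with one control point and one redundant shot; the observed values and standard deviations are invented for the example.

    import numpy as np

    # Hypothetical three-point level net: points A, B, C, with A tied to a control
    # elevation.  Shots: the control value for A, A -> B, B -> C, and a redundant
    # A -> C.  Unknowns X = [A, B, C]; each shot row has -1 at its "from" point and
    # +1 at its "to" point (the control row is simply +1 at A).
    A_design = np.array([
        [ 1,  0,  0],    # control point A
        [-1,  1,  0],    # shot A -> B
        [ 0, -1,  1],    # shot B -> C
        [-1,  0,  1],    # redundant shot A -> C
    ], dtype=float)

    L = np.array([100.000, 1.500, 2.250, 3.740])     # observed values (made up)
    sigma = np.array([0.001, 0.004, 0.005, 0.006])   # standard deviations (made up)
    W = np.diag(1.0 / sigma**2)                      # weight matrix

    m, n = A_design.shape      # m = 4 rows, n = 3 columns
    r = m - n                  # redundancy: 1 in this sketch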

When a system is over determined, it is not generally possible to find "the solution" of the problem, since there is usually no specific solution that makes the result exactly right. There are an infinite number of possible "answers" that are nearly right. The amount that would have to be added to the $L$ matrix to make the problem come out exact is called the "residuals" (usually denoted by $V$). The residuals are often written explicitly into the problem equation when people solve by normal equations, but in numerical analysis texts they are usually computed only after the least squares solution is found. The specific solution usually sought for adjustments is the one that minimizes the sums of the squares of the residuals. This is called the "least squares" solution in survey texts, although the numerical analysis literature often refers to it as the "solution that minimizes the 2-norm".

Solving the problem:

1) Unit conversion, linearization, gross blunder detection, and any other preprocessing.

2) The problem is set up in the form $Ax=L$ subject to a weight matrix $W$ (which is the inverse of the matrix of covariances).

3) The least squares solution is then computed. This is the step that this paper addresses.

4) The statistics of the survey are computed, blunder detection, relinearization, or other post processing is performed.

Review of Solving by Normal Equations

The least squares solution of the over determined system $AX=L$, subject to a weight matrix $W$, is given by the equation $A^tWAx = A^tWL$, where $A^t$ means the transpose of $A$.

This new system is square, and may now be solved by a number of methods. The traditional method is to use a Cholesky factorization of the matrix on the left-hand side into an upper triangular matrix $R$ and its transpose. This gives:

$R^tRX = A^tWL$

The $R$ matrix above is called the "Cholesky factor" of the normal equations. Solving this system is now easy, since solving triangular matrices is easy. In the worst case, the operation count for Cholesky factorization is about $n^3/3$. The actual work in the sparse matrix case varies with the problem and is not easy to predict in advance.

Other techniques (such as Gaussian elimination) can also be used to solve the system, but there is no advantage in operation counts or in accuracy, so Cholesky factorization will be used in this paper.

The system is now solved for $X$. This is done either by appropriate backward and forward substitutions or by explicitly forming the inverse matrix. A common way of doing this is to form the matrix augmented by the identity, and solve the systems:

   $R^tY = [\, A^tWL \mid I \,]$
      Solving this is equivalent to multiplying both sides by
      the inverse of $R^t$.
   $RZ = Y$
      Solving this is equivalent to multiplying both sides by
      the inverse of $R$.

    $Z = [\, X \mid R^{-1}(R^t)^{-1} \,]$
   but since $R^tR$ is also $A^tWA$,
   $R^{-1}(R^t)^{-1}$ is also $(A^tWA)^{-1}$.

   So our result is $[\, X \mid (A^tWA)^{-1} \,]$
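
A minimal sketch of this recipe in Python, using NumPy and SciPy's triangular solvers on a small made-up system (the variable names are illustrative, not from the paper):

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    # A small made-up weighted system A X = L (same shape as the level net
    # sketch above): one control row plus three shots, diagonal weight matrix W.
    A = np.array([[ 1,  0,  0],
                  [-1,  1,  0],
                  [ 0, -1,  1],
                  [-1,  0,  1]], dtype=float)
    L = np.array([100.000, 1.500, 2.250, 3.740])
    W = np.diag(1.0 / np.array([0.001, 0.004, 0.005, 0.006])**2)

    N   = A.T @ W @ A                  # normal equation matrix  A^t W A
    rhs = A.T @ W @ L                  # right hand side         A^t W L
    R   = cholesky(N, lower=False)     # upper triangular R with R^t R = N

    # Augment the right hand side with the identity, then do the forward and
    # backward substitutions described above.
    aug = np.column_stack([rhs, np.eye(len(rhs))])
    Y = solve_triangular(R.T, aug, lower=True)     # solve R^t Y = [ A^t W L | I ]
    Z = solve_triangular(R,   Y,   lower=False)    # solve R Z = Y

    X    = Z[:, 0]       # the adjusted values
    Ninv = Z[:, 1:]      # (A^t W A)^{-1}, used later for the statistics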

It is possible to take advantage of the simple structure of the survey problem and of the resulting (sparse, positive definite) matrices in order to avoid having to do the full work that the matrix multiplications above would imply. There are many simplifications and speedups available that make actual implementations of this process by normal equations the traditional method of choice.

Equivalence of QR and Normal Equation Solutions

The least squares solution to the problem, while traditionally computed by Normal Equations, can be computed with some advantages by various orthogonalization methods such as $QR$ factorization.

The weighted problem is first reduced to an equivalent unweighted problem. The weight matrix can be factored into a matrix $W'$ such that $W'^tW'$ is equal to the original $W$ matrix. (This is called a "Cholesky factorization".) This is nearly trivial, since the matrix is usually diagonal (or block diagonal in three dimensional problems). Premultiplying both sides of the problem $AX = L$ by $W'$ produces $W'AX = W'L$, which has the same solution as the original weighted problem. Using $G$ for $W'A$ and $H$ for $W'L$, the new problem is $GX = H$.
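
For the usual diagonal weight matrix this factorization is a one-liner; a small sketch (made-up numbers) of the reduction to the unweighted problem $GX = H$:

    import numpy as np

    # For a diagonal weight matrix, the factor W' is just the diagonal matrix of
    # square roots of the weights, i.e. 1/sigma for each shot.
    sigma = np.array([0.001, 0.004, 0.005, 0.006])    # made-up standard deviations
    W  = np.diag(1.0 / sigma**2)
    Wc = np.diag(1.0 / sigma)                         # W' with W'^t W' == W
    assert np.allclose(Wc.T @ Wc, W)

    A = np.array([[ 1,  0,  0],
                  [-1,  1,  0],
                  [ 0, -1,  1],
                  [-1,  0,  1]], dtype=float)
    L = np.array([100.000, 1.500, 2.250, 3.740])

    G = Wc @ A     # weighted design matrix
    H = Wc @ L     # weighted constants; the unweighted problem is now G X = H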

The fact that this is the same problem can be shown by setting up the solution of this new problem by Normal Equations:

     $(W'A)^tW'AX = (W'A)^tW'L$

which is

     $A^tW'^tW'AX = A^tW'^tW'L$

and since $W'^tW'$ is $W$, that reduces the problem to

     $A^tWAX = A^tWL$

which is what the normal equation solution would have given.

Solving the weighted system ($GX=H$) proceeds by factoring $G$ into two matrices, $Q$ and $R$, such that $G = QR$, where $R$ is upper triangular and $Q$ is orthogonal. "Orthogonal" means that the transpose of $Q$ is also the inverse of $Q$.

The upper triangular part of the matrix R that comes out of this factorization is the same matrix that would have come out of the Cholesky factorization in the normal equation case. This can be demonstrated by forming the normal equations with QR instead of G.

      $(QR)^tQR = R^tQ^tQR$

and since $Q^tQ$ is $Q^{-1}Q$ (from the definition of $Q$ being orthogonal), and since that is the identity, the equations in this case reduce to $R^tR$. Since $R$ is upper triangular, this is already in the form that the Cholesky factorization would produce. Since the Cholesky factorization is unique, our $R$ must be the same $R$ that would have been produced by normal equation methods (up to the signs of the rows [4]). Since the Cholesky factorization's $R$ has positive diagonal elements, the correct signs for the rows could be established trivially. Instead of adjusting the signs of the rows after the fact, one can ensure the correct signs merely by making the shots in the spanning tree always shoot from a known point to an unknown point instead of the other way around. This works because the Given's rotations will produce positive values on the diagonal, and the restriction on direction ensures that the diagonal starts with positive entries.
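
This equivalence is easy to check numerically; a small sketch using NumPy's built-in QR and Cholesky routines (rather than explicit Given's rotations) on an arbitrary full-rank matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    G = rng.standard_normal((8, 4))          # any full-rank weighted design matrix

    R_qr   = np.linalg.qr(G, mode='r')       # upper triangular R from QR
    R_chol = np.linalg.cholesky(G.T @ G).T   # upper triangular Cholesky factor of G^t G

    # The two factors agree up to the signs of the rows, as discussed above.
    print(np.allclose(np.abs(R_qr), np.abs(R_chol)))   # True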

Solving by QR Factorization

A QR factorization splits the A matrix up into an upper triangular R matrix, and an orthogonal Q matrix. Since only R is really needed to compute the solution, there is no need to actually form and save Q. This reduces the storage required to just that needed to store R and some space for housekeeping.

Given's rotations, the QR factorization method covered in this paper, are orthogonal transformations that can be used to zero out items below the main diagonal of a matrix without changing the 2-norm of the problem. This means that they can be freely used to work on the problem without making changes to the least squares solution. [3]

Factoring with Given's rotations takes on the order of $(m-n)n^2/2$ operations, compared with on the order of $n^3/3$ for the Cholesky factorization. For small amounts of redundancy, this is a smaller operation count. (If there is no redundancy the overhead of this step really is zero, since the Cholesky factor is a spanning tree in that case.) The traditional solution has now been replaced with an algorithm that is usually faster and always more stable.

For the QR factorization, the R matrix starts out in the form of a spanning tree (discussed below) followed by the redundant shots. Most surveys are already in that form or can be put into it with trivial effort. For a collection of shots with no order, you would have to put it into that form, but that can be done in linear time.

After the QR factorization, the upper triangular part of $R$ is the same as would have been computed for the Cholesky factorization. This allows us to use the computed $R$ to get the solution exactly the same way that it would be used for a solution by normal equations.

Given's Rotations

A "Given's Rotation" is an orthogonal transformation that can be used to selectively zero out elements of a matrix, without changing the 2-norm (least squares solution) of the problem it represents.

If the leading element in the $R$ matrix row is $D$, and the leading element in the redundant row is $E$, then the Given's rotation of the two rows is a premultiply by the following matrix, where $S=E/P$ and $C=D/P$, with $P$ the Pythagorean sum of $D$ and $E$ [$\sqrt{D^2+E^2}$].

[ 1                          ]
[    .                       ]
[      .                     ]
[        1                   ]
[          C . . . S         ]   This row is changed
[          . 1     .         ]
[          .   1   .         ]
[          .     1 .         ]
[         -S . . . C         ]   This row is changed
[                    1       ]
[                      .     ]
[                        .   ]
[                          1 ]   All other rows are unaffected.

All other elements above are zero. Since a multiplication by this matrix only affects two rows, it can be written as a two by two matrix multiplied by those two rows:

[  C  S ] [ Row from $R$          ] = [ Updated row for $R$    ]
[ -S  C ] [ Row we are processing ]   [ Updated current row. ]
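
In code, the two-row form is all that is ever needed. A small NumPy sketch (not from the original paper; the function name and the use of np.hypot for the scale are choices made here) that applies one rotation:

    import numpy as np

    def givens_rotate(r_row, new_row, k):
        """Zero out new_row[k] against the pivot r_row[k], updating both rows.

        Follows the two-row form above: C = D/P and S = E/P, with P the
        Pythagorean sum of D and E (np.hypot computes it without overflow).
        """
        d, e = r_row[k], new_row[k]
        p = np.hypot(d, e)
        c, s = d / p, e / p
        upper =  c * r_row + s * new_row
        lower = -s * r_row + c * new_row
        lower[k] = 0.0   # mathematically zero; set it exactly to keep the sparsity clean
        return upper, lower

    # The first rotation of the worked example later in the paper (the right hand
    # side value is carried along as the last entry of each row):
    r_row   = np.array([ 200.000, -200.0, 0.0,   -1704.600])
    red_row = np.array([-333.333,    0.0, 0.0, -148314.667])
    upper, lower = givens_rotate(r_row, red_row, 0)
    print(upper)    # roughly [ 388.730, -102.899, 0, 126301.8 ]
    print(lower)    # roughly [ 0, -171.499, 0, -77769.0 ]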

Actually computing the scale [$\sqrt{D^2+E^2}$] by squaring the quantities, adding them, and taking the square root can lead to notable loss of significance in the answer and the danger of overflow. In practice the Moler-Morrison algorithm (Moler and Morrison, 1983) can be used to compute the scale factor without danger of overflow unless the scale itself overflows. Other techniques for avoiding numerical problems when computing this quantity are given in "Matrix Computations" (Golub and Van Loan, 1996). So that some measure of work for this part can be used when computing operation counts, the 24 floating point operations that Moler-Morrison uses to compute 20 significant figures will be used [2]. Variants of the Given's rotation such as Fast Given's or Gentleman's transformation (Golub and Van Loan, 1996) can be used to avoid a square root altogether.
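
For reference, a sketch of the Moler-Morrison iteration itself (as described in the 1983 paper; it converges cubically, so three passes of the loop give roughly 20 significant figures):

    def pythag(a, b):
        """Moler-Morrison Pythagorean sum sqrt(a^2 + b^2) without squaring a or b."""
        p, q = max(abs(a), abs(b)), min(abs(a), abs(b))
        for _ in range(3):
            if q == 0.0:
                break
            r = (q / p) ** 2
            s = r / (4.0 + r)
            p += 2.0 * s * p
            q *= s
        return p

    print(pythag(3.0, 4.0))       # 5.0
    print(pythag(1e200, 1e200))   # about 1.414e200, with no overflow from squaring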

Readers with a computer graphics or analytic geometry background will recognize this matrix as a classic rotation matrix, where C and S are the cosine and sine of an abstract angle.

Initial Setup

Every survey network has what graph theorists call a "rooted spanning tree". This is a minimum collection of shots that tie the network together. There may be many possible spanning trees and any such tree will do for our purposes. Separating the problem into the spanning tree and the remaining redundant shots allows us to set up the problem in a form easy to process with QR factorization. A spanning tree is a survey you would get if you started at a control point and always shot from some already surveyed point to some point not yet in the survey.

The shots in the design matrix $A$ usually form a matrix that is almost lower triangular. The matrix can be arranged to be lower triangular, with a control point first, and then the non-redundant shots. Arranging it in lower triangular form is equivalent to making it start with a spanning tree order [7].

Since the order of shots and the exact numbering of points make no difference to the solution of the network, both can be renumbered in order to make this part of the $A$ matrix an upper triangular matrix. (There is no formal requirement that upper triangular matrices be used, since a lower triangular matrix could be made to work just as well. Upper triangular matrices will be used here only to match the normal conventions for numerical analysis texts, and avoid having to rederive the standard QR properties for a QL factorization.) It should be noted that all the renumbering is just loop housekeeping in the program that processes this matrix, and need not be actual movement of the data.

The $A$ matrix is now (conceptually) in the following form:

[ * * * * ... * * * ]  Spanning tree part. (The $R$ matrix of the spanning tree)
[ 0 * * * ... * * * ]  (Upper triangular)
[ 0 0 * * ... * * * ]
[ 0 0 0 * ... * * * ]
[  ...    ...   ... ]
[ 0 0 0 0 ... 0 * * ]
[ 0 0 0 0 ... 0 0 * ] (This row is a control point)
[                   ]
[         ...       ] Redundant shots.
[         ...       ]
[         ...       ]

Note that if the survey has no redundant shots, then the $A$ matrix at this point is the upper triangular $R$ matrix and the factorization is finished. The QR factorization, as it processes each shot, converts the $R$ matrix from what it was into one that reflects the addition of this shot to the survey.

Solving the problem, given $R$, is equivalent to walking the graph from a control point to each reachable point and adding in a weighted difference from each involved shot. As each redundant shot is added to the factorization, the update is equivalent to walking from each end of the shot back to a control point, while readjusting weights along the way so that they reflect the influence of the shot.
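
A small Python sketch (not from the original paper; the (from, to, value, sigma) tuple layout is invented here) of separating an unordered shot list into a spanning tree and redundant shots by walking out from the control points, in linear time:

    from collections import defaultdict, deque

    def spanning_tree_order(control_points, shots):
        """Split `shots` into (spanning_tree, redundant) by a breadth-first walk
        from the control points.  A shot enters the spanning tree the first time
        it reaches a point not yet in the survey; everything else is redundant.
        """
        by_point = defaultdict(list)
        for i, (frm, to, *_rest) in enumerate(shots):
            by_point[frm].append(i)
            by_point[to].append(i)

        reached = set(control_points)
        queue = deque(control_points)
        tree, used = [], set()
        while queue:
            p = queue.popleft()
            for i in by_point[p]:
                if i in used:
                    continue
                frm, to, value, sigma = shots[i]
                other = to if frm == p else frm
                if other not in reached:
                    used.add(i)
                    reached.add(other)
                    queue.append(other)
                    # orient the shot so it runs from a known point to a new point
                    tree.append((p, other, value if frm == p else -value, sigma))
        redundant = [shots[i] for i in range(len(shots)) if i not in used]
        return tree, redundant

    # The shots of the example given later in the paper:
    shots = [("A", "B", 10.509, 0.006), ("B", "C", 5.360, 0.004),
             ("C", "D", -8.523, 0.005), ("D", "A", -7.348, 0.003),
             ("B", "D", -3.167, 0.004), ("A", "C", 15.881, 0.012)]
    tree, extra = spanning_tree_order(["A"], shots)
    # (Any spanning tree will do; this one differs from the one used in the example.)

Renumbering the points in the (reversed) order in which they are reached then makes the spanning tree portion of the matrix upper triangular, as described above.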

Processing

Since the matrix starts out in upper triangular form, the explanation of QR factorization is straightforward. Orthogonal matrices (Given's rotations in this case) are then used to zero out each non-zero element in a redundant shot being added, and also update the R matrix. After a redundant row (shot) is processed, it will not be needed further and may be discarded.

The problem is started by setting $R$ to the initial spanning tree. Processing each redundant shot (or control point) then proceeds as follows:

  • While there are non-zero items in this shot:
    1. If the first non-zero item of the row is the K'th item, then get the K'th row of $R$.
    2. Execute a Given's rotation to clear out that non-zero item. (This may add new non-zero elements to the right of this element in both the row of $R$ and in the redundant shot.)

Since the redundant rows and the rows of the $R$ matrix all start out sparse, the multiplications implied by the Given's rotation are usually multiplications by zero. The row of $R$ that is needed is the one corresponding to the next non-zero element in the redundant vector, so each zero element between our current position and the next non-zero item in the row means a row of $R$ that will not be affected by this update.
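
A sketch of this inner loop (not the paper's code; $R$ is held here as a dense upper triangular NumPy array for clarity, where a real implementation would store only the non-zero entries of each row):

    import numpy as np

    def add_redundant_row(R, rhs, row, row_rhs):
        """Fold one redundant shot (row, row_rhs) into the triangular factor R.

        Each non-zero entry of `row` is eliminated with a Given's rotation against
        the matching row of R; zero entries are simply skipped, so the work done
        is proportional to the fill-in actually produced.
        """
        n = len(row)
        for k in range(n):
            if row[k] == 0.0:
                continue                      # nothing to eliminate in this column
            d, e = R[k, k], row[k]
            p = np.hypot(d, e)
            c, s = d / p, e / p
            Rk, rk = R[k, k:].copy(), rhs[k]
            R[k, k:] =  c * Rk + s * row[k:]
            rhs[k]   =  c * rk + s * row_rhs
            row[k:]  = -s * Rk + c * row[k:]
            row_rhs  = -s * rk + c * row_rhs
            row[k] = 0.0
        return row_rhs                        # weighted residual left over from this shot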

Updates

Each shot is processed against the current $R$ matrix. This means that if we save the $R$ matrix, adding additional shots (or control points) later may be done directly. Shots that don't add a new point are just processed as above. Shots that do add a new point to the survey just add a row and column to the $R$ matrix. This new row would not have participated in any previous update, so no special processing needs to be done with it.

Assuming a consistent numbering scheme, two $R$ matrices can be combined by taking one as redundant rows for the other. This allows parts of a problem to be processed separately, and the result combined later.

Normal equation solutions can also be updated on the fly, but such Normal Equation update methods are notoriously poor numerically, and lose accuracy rapidly.

Memory Usage

QR factorization by Given's rotations makes use of the design matrix in a row oriented manner, with rows of the matrix being accessed in sequential order. The matrix itself is usually sparse, making it an ideal candidate for simple sparse matrix methods. The fill-in pattern of the result is easy to predict, since the sparsity pattern of a processed row is the union of the patterns of the two rows being used for the Given's rotation. Since the operations are row operations, this leads to locality of reference, which minimizes paging overhead on computers with paged memory.

On computers with paged memory (PCs, most mainframes, workstations, etc.), programs with memory accesses that are localized have less swapping overhead than those with wider-ranging access. A matrix algorithm which has mostly row operations, or mostly column operations, can be optimized to take advantage of the pattern of memory accesses. The Cholesky factorization used for solving normal equations (or any equivalent method such as Gaussian elimination) mixes both row and column operations and is therefore difficult to optimize. QR factorization can be easily arranged to do exclusively row operations. This allows implementations of QR factorization to have much lower page swapping than normal equations would have when solving the same problem.

Note that, while all redundant shots are incorporated into the computation of the normal equations, such shots are added to the QR factorization as they are processed, so there are many possible trade-offs. A single redundant shot can be processed against one linear pass through the existing $R$ matrix, with a memory usage about equal to the number of points ($n$). Or as many shots as will fit in memory can be processed, with an additional memory overhead of the number of shots times the number of redundant observations. Or any scheme between these can be used.

The pattern of memory accesses forming the normal equations jumps around and can cause page swapping on machines with paged memory. The QR factorization does not have that property. The step of finding the spanning tree will not exhibit much page swapping unless the survey was done with shots in somewhat random order.

Operation Counts

The analysis below is for level nets because they are simple to examine. A three dimensional problem would, of course, have higher operation counts. The comparison between the methods would still be similar since they would all have equivalent amounts of extra work.

In general, computational overhead depends on both the number of redundant shots and also on the interconnectivity of the network. The connectivity of the network determines the sparsity of the $R$ matrix, and therefore determines the savings available by sparse matrix techniques.

The standard figures for the amount of work to solve a least squares problem, by the two methods, are given in Matrix Computations (Golub and Van Loan, 1996):

   Given's rotations   $3mn^2 - n^3$     which is $(3m - n)n^2$
   Normal equations    $mn^2 + n^3/3$    which is $(m + n/3)n^2$

This means that when the redundancy $(m - n)$ is low, orthogonalization methods have a lower operation count [8].

These counts, however, are only valid for full matrix problems, and the survey problem has structure that both methods can exploit, and which is usually exploited for normal equation solutions. An analysis taking this into account is more complex, but gives more realistic figures. In addition, the overhead for a Given's rotation is higher than that of the basic operation in a Cholesky factorization; therefore that overhead needs to be accounted for also. These counts are usually just a measure of how fast the work increases as a function of problem size. They have implicit constants of work that don't normally appear in the numerical analysis literature, but are needed to work out an honest comparison.

Once $R$ is computed, the processing is the same for Normal Equations and for QR factorization. Therefore that part of the processing will not need to be considered when comparing the methods.

For a solution by normal equations, the structure of the problem allows formation of the $A^tWA$ matrix without floating point multiplies or divides, using only floating point additions. Since multiplies and divides are the operations that take most of the time in any matrix computation, operation counts consider them and ignore the floating point addition operations. That convention will be followed in this analysis. This means that the normal equation setup can be ignored completely in this analysis. This leaves only the $n^3/3$ term. It is difficult to trim this term any further, since the effects of the structure of the problem are hard to predict.

QR factorization has a weighting step that is conceptually three multiplies per shot; but, since it is always a multiply by 1 or -1, these are just data transfers in practice. So, by the reasoning that omitted the setup for normal equations, the setup for QR factorization can be ignored also.

A more exact count of operations for QR factorization is

       24rn     Compute each Given's rotation (by Moler-Morrison)
  + 4rn(n-1)/2  Carry out the rotations
  =================
  24rn +2rn(n-1) = 2(12rn+rn(n-1)) = 2rn(n+11)

These are worst case figures, assuming nothing special is done for sparsity, to match the worst case assumptions made above for the normal equation case. This result is general, and makes no assumptions about the structure of the problem. Without sparse matrix techniques QR factorization wins on computational time if the total redundancy of the survey is less than about 15% or so.

As an example, a survey with 1000 points and 100 redundant shots would be about 333,000,000 operations via Cholesky factorization and about 202,000,000 operations via QR factorization.
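
Those two figures come straight from the $n^3/3$ and $2rn(n+11)$ counts above:

    n, r = 1000, 100
    print(n**3 // 3)              # 333333333  -> "about 333,000,000" for Cholesky
    print(2 * r * n * (n + 11))   # 202200000  -> "about 202,000,000" for QR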

Operation counts found in simulations show much better performance than these figures indicate. A more careful analysis is needed to see the reasons for this. Taking advantage of the structure at hand (where each shot has just a point it comes from, and a point it goes to) can tighten up the analysis of operation counts.

The leading entries of the row are processed by just giving $R$'s element the scale factor computed by the rotation, and the redundant row's element zero. The remaining entries break down to three cases:

  1. neither entry is non-zero.
  2. only one of the two elements is non-zero.
  3. both of the elements are non-zero.

In case one, both rows get a zero, and no floating point operations are needed. In case two, one of the elements is zero, so there are only 2 multiplications that need be done to update the elements, and both rows end up with a non-zero element. In case three, a full 4 floating point multiplications need be done.

Starting with a problem that has at most 2 non-zero elements per row, the maximum work needed to compute an adjustment (using sparse matrix techniques) is straightforward to compute. If the redundancy is greater than $n-1$ shots, the analysis is that given above. If the redundancy is less than $n-1$, then processing the $k$th redundant shot has an overhead of $n$ Given's rotations, 2 operations for each non-zero element in each row that doesn't have a non-zero element in the other row, and 4 operations for each non-zero that appears in the same position of both rows. This is a total of $24n + 2(n-k-1) + 4(kn - (k-1)k/2)$ operations, which simplifies to $26n + 4kn - 2k^2 - 2$ operations to add in the $k$th shot. Note that each shot added can, in the worst case, take more time than the last. This is because as each shot is added, it can cause zero items in the matrix to get filled in, causing later shots to have more items to process. In order to add in $r$ shots, the work needed to add each shot from 1 to $r$ needs to be summed. This gives a total count of work of $26nr + 2nr(r-1) - r(r+1)(2r+1)/3 - 2r$, which simplifies to $r(2n(12+r) - (r+1)(2r+1)/3 - 2)$. This assumes worst case fill-in; actual operation counts can be much less depending on the structure of the survey.

For the example given earlier (n=1000, r=100) this gives worst case figures to compute R of:

   Cholesky (non-sparse)   about 333,000,000
   QR       (non-sparse)   about 202,000,000
   QR       (sparse)       about   1,560,000

I couldn't find a worst case analysis of a sparse Cholesky factorization on problems of this structure, so I can't properly compare what it would be [5].

Worst case figures can, of course, be much different than any typical cases. A test was run of 25 sample randomly generated surveys with 1000 points and 100 redundant shots, with instrumented code for each method. The average number of floating point multiplies and divides to compute $R$ were:

   Cholesky (non-sparse) averaged about 167,000,000
   QR       (non-sparse) averaged about   1,570,000
   QR       (sparse)     averaged about     417,000

Note that the actual Cholesky figure is about half of what the published figures for it would indicate, and all methods (of course) did better than their worst case figures.

There are many techniques that can be used with QR factorizations that are not covered in this paper but that are worth mentioning. The collection of redundant shots can, themselves, be simplified by QR techniques to lower operation counts dramatically. For example, two shots between the same two points can be combined into one equivalent weighted shot with a single Given's rotation. Sorting the input shots can also have an impact on the final operation count.

Conclusions

Orthogonalization methods are well known in the numerical analysis community for their numerical stability. Conversely, normal equation methods are known for their lack of numerical stability.

QR factorizations can make very good use of the sparsity of the problem.

If the ratio of redundant shots to non-redundant shots is low, QR factorization has lower operation counts in all cases. When sparse matrix techniques are used the count can be dramatically lower.

Example

With the permission of Dr. Charles D. Ghilani, I will use example 11.1 from "Adjustment Computations" (Wolf and Ghilani, 1997). This problem has a high percentage of redundant shots, so it has a fairly high operation count. However, since a detailed solution by traditional methods is available, it makes comparison of the methods easier. Refer to that book for details of solving the same problem via normal equations in the traditional manner.

The problem as given:

From To   Elevation change  standard deviation
A    B    10.509            0.006
B    C     5.360            0.004
C    D    -8.523            0.005
D    A    -7.348            0.003
B    D    -3.167            0.004
A    C    15.881            0.012

A -- B
| \/ |
| /\ |
D -- C

Wolf and Ghilani do some setup to avoid adjusting the control point, and come up with the system (ignoring residuals for the moment) of:

[  1  0  0 ]         [  448.105 ]
[ -1  1  0 ] [ B ]   [    5.360 ]
[  0 -1  1 ] [ C ] = [   -8.523 ]
[  0  0 -1 ] [ D ]   [ -444.944 ]
[ -1  0  1 ]         [   -3.167 ]
[  0  1  0 ]         [  453.477 ]

Subject to the weight matrix:

[ 1/(0.0062)        0            0            0            0            0      ]
[      0       1/(0.0042)        0            0            0            0      ]
[      0            0       1/(0.0052)        0            0            0      ]
[      0            0            0       1/(0.0032)        0            0      ]
[      0            0            0            0       1/(0.0042)        0      ]
[      0            0            0            0            0       1/(0.0122)  ]

The first step is to weight the problem. For that we need a $W_c$ matrix such that $W_c^tW_c = W$. Since this is a diagonal matrix, this is trivially the matrix where each diagonal element is the square root of that in the diagonal matrix:

[ 1/0.006      0         0         0         0         0    ]
[    0      1/0.004      0         0         0         0    ]
[    0         0      1/0.005      0         0         0    ]
[    0         0         0      1/0.003      0         0    ]
[    0         0         0         0      1/0.004      0    ]
[    0         0         0         0         0      1/0.012 ]

which is:

[ 166.667       0         0         0         0        0   ]
[    0      250.000       0         0         0        0   ]
[    0          0     200.000       0         0        0   ]
[    0          0         0     333.333       0        0   ]
[    0          0         0         0     250.000      0   ]
[    0          0         0         0         0     83.333 ]

Premultiplying by the $W_c$ matrix (the same, in the case of level nets, as multiplying each row by the inverse of the associated standard deviation), we get:

[  166.667    0        0     ]         [   74684.167 ]
[ -250.000  250.000    0     ] [ B ]   [    1340.000 ]
[    0     -200.000  200.000 ] [ C ] = [   -1704.600 ]
[    0        0     -333.333 ] [ D ]   [ -148314.667 ]
[ -250.000    0      250.000 ]         [    -791.750 ]
[    0       83.333    0     ]         [   37789.599 ]

This matrix is already in lower triangular form. So, to put it into the upper triangular form usually seen in numerical analysis texts, we need only renumber the points (reversing them) and in the first n rows we renumber the shots (reversing them):

[  200.000 -200.000    0     ] [ D ]   [   -1704.600 ]
[    0      250.000 -250.000 ] [ C ] = [    1340.000 ]
[    0        0      166.667 ] [ B ]   [   74684.167 ]
[                            ]         [             ]
[ -333.333    0        0     ]         [ -148314.667 ]
[  250.000    0     -250.000 ]         [    -791.750 ]
[    0       83.333    0     ]         [   37789.599 ]

 The upper part is the spanning tree, and the lower part the redundant shots.

There is no need to store the redundant shots in expanded rows. Until each shot needs to be processed they can be kept as just a from point, a to point, a value, and a standard deviation. The square of the standard deviation is the variance, and the inverse of the variance is the weight.

Starting with the first redundant row, the first element of that row is non-zero, so we need to process it against the first row of $R$. The leading entries are 200.000 and -333.333, so the Given's rotation is:

Scale =     388.730  (Sqrt(200.000^2 + (-333.333)^2))
C =  200.000 / Scale =  0.514
S = -333.333 / Scale = -0.857
[ 0.514  -0.857 ] [ Row from R    ] = [ 388.730 -102.899  0    ] [ 126301.768 ]
[ 0.857   0.514 ] [ Redundant Row ]   [    0    -171.499 0.000 ] [ -77768.949 ]

The second element is now the first non-zero element, so we now process it against the second row of R.

[ 0.825 -0.566 ] [ Row from R    ] = [ 0    303.170   -206.155] [  45097.753 ]
[ 0.566  0.825 ] [ Redundant Row ]   [ 0       0      -141.421] [ -63371.900 ]

The third element is now the first non-zero element, so we now process it against the third row of R.

[ 0.762 -0.647 ] [ Row from R    ] = [ 0 0 218.581 ] [  97947.549 ]
[ 0.647  0.762 ] [ Redundant Row ]   [ 0 0    0    ] [     -0.216 ]

The result after processing the first redundant row is:

[ 388.730  -102.899      0    ] [ D ]   [ 126301.768 ]
[    0      303.170  -206.155 ] [ C ] = [  45097.753 ]
[    0         0      218.581 ] [ B ]   [  97947.549 ]
[                             ]         [            ]
[    0         0         0    ]         [     -0.216 ]
[  250.000     0     -250.000 ]         [   -791.750 ]
[    0      200.000      0    ]         [  37789.599 ]

Note that the processed shot does not need to be stored any longer, since we know all elements of it are zero.

The second redundant shot is processed in a similar manner, giving:

[ 462.181   -86.546  -135.228 ] [ D ]   [ 105801.372 ]
[    0      308.237  -240.736 ] [ C ] = [  31899.621 ]
[    0         0      276.654 ] [ B ]   [ 123970.872 ]
[                             ]         [            ]
[    0         0         0    ]         [     -0.216 ]
[    0         0         0    ]         [     -0.809 ]
[    0      200.000      0    ]         [  37789.599 ]

The third redundant row is now processed. Note that, since its first item is zero, it doesn't have to process row one at all.

[ 462.181   -86.546  -135.228 ] [ D ]   [ 105801.372 ]
[    0      319.303  -232.392 ] [ C ] = [  40656.640 ]
[    0         0      283.698 ] [ B ]   [ 127127.754 ]
[                             ]         [            ]
[    0         0         0    ]         [     -0.216 ]
[    0         0         0    ]         [     -0.809 ]
[    0         0         0    ]         [      0.755 ]

The upper triangular matrix above is the same as the factor that the Cholesky factorization would have produced. Traditional methods could now be applied to the computed R, and we would have the same overhead as those methods. Note, however, that the first three lines above are a triangular system that can be solved directly to produce the values for B, C, and D.

B =  448.10871
C =  453.46847
D =  444.94361

This result can be computed solving just one triangular system, without computing $A^tWL$ that traditional methods use on the right hand side of the equation. This simplification was used neither in the computation of operation counts, nor in the comparison of the methods. Computing the inverse matrix, needed for the statistics, will still require both the forward and back substitutions just as it does in traditional adjustments.

Each redundant processed shot still has a non-zero right hand side. These values are a set of residuals of the problem. These are not the residuals of the actual shots that survey books are referring to when they talk of residuals, but are instead the residuals of the adjustment. Since the problem was weighted before we started the QR factorization, these are weighted residuals. The sum of the squares of these residuals is the value that would be computed if the shot residuals ($V$) were computed and $V^tWV$ were evaluated (1.27 in this case). The reference standard deviation can be computed from this in the normal manner.
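
For readers who want to check the arithmetic, here is a NumPy sketch (not the paper's code) that repeats the whole example: it weights the reordered system, applies the same Given's rotations, back-substitutes for B, C, and D, and sums the squared weighted residuals.

    import numpy as np

    # The example, set up as in the text: unknowns reordered to [D, C, B], the
    # three spanning tree rows first (upper triangular), then the three redundant
    # shots, and every row divided by its standard deviation.
    tree = np.array([[ 1, -1,  0,   -8.523],    # shot C -> D   (columns are D, C, B)
                     [ 0,  1, -1,    5.360],    # shot B -> C
                     [ 0,  0,  1,  448.105]])   # B, tied through the control point A
    tree_sigma = np.array([0.005, 0.004, 0.006])

    redundant = np.array([[-1,  0,  0, -444.944],   # shot D -> A
                          [ 1,  0, -1,   -3.167],   # shot B -> D
                          [ 0,  1,  0,  453.477]])  # shot A -> C
    red_sigma = np.array([0.003, 0.004, 0.012])

    R   = tree[:, :3] / tree_sigma[:, None]     # weighted spanning tree = initial R
    rhs = tree[:, 3]  / tree_sigma

    residuals = []
    for shot, sigma in zip(redundant, red_sigma):
        row, row_rhs = shot[:3] / sigma, shot[3] / sigma
        for k in range(3):
            if row[k] == 0.0:
                continue                        # nothing to eliminate in this column
            p = np.hypot(R[k, k], row[k])
            c, s = R[k, k] / p, row[k] / p
            Rk, rk = R[k].copy(), rhs[k]
            R[k], rhs[k] = c * Rk + s * row,  c * rk + s * row_rhs
            row, row_rhs = -s * Rk + c * row, -s * rk + c * row_rhs
            row[k] = 0.0
        residuals.append(row_rhs)

    D, C, B = np.linalg.solve(R, rhs)            # back substitution on triangular R
    print(B, C, D)                               # about 448.109, 453.468, 444.944
    print(sum(v * v for v in residuals))         # about 1.27, the weighted V^t W V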

Numeric Stability Example [10]

There are notable differences in numerical stability between Normal Equation and QR factorization methods. Consider the following rather contrived example:

From To Value Std Dev
      A  1.0  0.0001  (Control point)
A     B  1.0  0.0001
B     C  1.0  0.0001
B     C  1.0  0.0001

If we restrict our arithmetic to about 8 significant figures (approximately IEEE single precision), the problem is easily solved by either method, and one gets what one would expect:

A = 1.0
B = 2.0
C = 3.0  with a residual of zero.

If we change the problem slightly to

      A  1.0  0.0001
A     B  1.0  0.1     (The standard deviation here has changed.)
B     C  1.0  0.0001
B     C  1.0  0.0001

and continue to restrict our arithmetic to about eight significant figures, there are now differences in the answer that normal equations generate, while the QR factorization result is essentially unchanged. In theory both problems (with perfect arithmetic) should generate exactly the same answer, since the badly weighted shot is not in any loop, and the loop itself closes perfectly. So why are Normal Equations so much worse in cases like this?

What changed is that the Cholesky factor computed by the traditional method (with only 8 digit arithmetic) is now

[ 14142.136  -14142.136      0.000 ]
[                 9.798    -10.206 ]
[                        10000.000 ]

instead of the correct

[ 14142.136  -14142.136      0.000 ]
[                10.000    -10.000 ]
[                        10000.000 ]

(Note that the rows and columns here have been reordered to match the earlier presentation of QR factorization.) Examining the center element shows that factoring the normal equations has left it with only a little more than one significant figure. This is because forming the normal equations involves adding and subtracting the squares of the standard deviations, while QR factorization deals with them directly. Since squaring the numbers increases the differences between them, the chance of losing significance when adding the squares is dramatically greater with normal equations. This loss of significance happens in the step of forming the normal equation matrix; therefore any method of solving the problem with Normal Equations must inevitably have the same difficulties. And since the problem has the errors, no post processing has any real chance of producing an accurate result for the original survey.

Another important reason that normal equations do worse for this example is that it mixes some quantities that must later cancel out. If precision is lost they no longer cancel. QR factorization avoids some of this mixing to begin with, particularly when Given's rotations are used for the factorization.

On the other hand, the QR factorization has retained almost all significant figures in all of the elements of the array. In fact, QR factorization is still producing the correct results to seven significant figures when the problem is badly weighted to

From To Value Std. Dev.
      A  1.0  0.0001
A     B  1.0  100000000000000000.0
B     C  1.0  0.0001

The stability problem with normal equations, of course, depends on the underlying arithmetic. If we switch from single precision to double precision we don't see problems this bad with Normal Equations until the weight for the second shot is raised to about 1000.0, but the QR factorization also becomes even more stable and still produces the correct answer with a weight of $10^{60}$. Admittedly these sorts of weightings are not realistic, but the subtle precision problems that can be caused are hard to show in a small example.
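
A small experiment along these lines (not from the original paper; it uses NumPy's built-in Cholesky and QR routines in single precision rather than explicit Given's rotations) makes the effect easy to reproduce. The exact figures depend on the library's rounding, but the normal-equation factor typically loses several more digits than the QR factor:

    import numpy as np

    # The contrived example above, with unknowns [A, B, C]: a control value for A,
    # the shot A -> B with a (deliberately degraded) standard deviation, and the
    # shot B -> C observed twice.
    def upper_factor(sigma_ab, dtype, via_normal_equations):
        A = np.array([[ 1,  0,  0],
                      [-1,  1,  0],
                      [ 0, -1,  1],
                      [ 0, -1,  1]], dtype=dtype)
        sig = np.array([0.0001, sigma_ab, 0.0001, 0.0001], dtype=dtype)
        G = A / sig[:, None]                              # weighted design matrix W'A
        if via_normal_equations:
            return np.abs(np.linalg.cholesky(G.T @ G).T)  # Cholesky factor of A^t W A
        return np.abs(np.linalg.qr(G, mode='r'))          # R straight from G

    reference = upper_factor(0.1, np.float64, False)      # double precision reference
    mask = reference > 1e-6
    for name, via_ne in (("normal equations", True), ("QR factorization", False)):
        R32 = upper_factor(0.1, np.float32, via_ne)       # single precision run
        worst = np.max(np.abs(R32 - reference)[mask] / reference[mask])
        print(f"{name}: worst relative error in R = {worst:.1e}")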

Acknowledgements

I would like to thank Dr. Charles D. Ghilani, since without his encouragement, and time, this paper wouldn't have gotten off the ground.

I would also like to give my heartfelt thanks to Olly Betts, whose unique ability to spot problems and suggest improvements has contributed greatly to the accuracy and readability of this paper.

References [1]

George, Alan, Liu, Joseph, and Ng, Esmond, 1994, "Computer Solution of Large Sparse Positive Definite Systems" [9]

Ghilani, Charles D.,1990, "A Surveyor's Guide to Practical Least-Squares Adjustments" Surveying and Land Information Systems, Vol. 50, No. 4, 1999, pp 287-297.

Golub, Gene H. and Van Loan, Charles F. 1996, "Matrix Computations" third edition, Johns Hopkins University Press.

Lawson, Charles L. and Hanson, Richard J. 1974, "Solving Least Squares Problems" Prentice-Hall.

Moler, Cleve and Morrison, Donald 1983, "Replacing Square Roots by Pythagorean Sums", IBM Journal of Research and Development, Volume 27, Number 6, November 1983.

Pissanetzky, Sergio 1984, "Sparse Matrix Technology", Academic Press

Ralston, Anthony and Rabinowitz, Phillip 1978, "A First Course in Numerical Analysis", McGraw-Hill.

Schneick, J. T. 1997, "Linear Algebra with Applications", McGraw Hill.

Strang, Gilbert 1986, "Introduction to Applied Mathematics", Wellesley-Cambridge Press.

Wolf, Paul and Ghilani, Charles 1997, "Adjustment Computations, Statistics and Least Squares in Surveying and GIS", Wiley Interscience.


The paper above is © Copyright 2001 by the American Congress on Surveying and Mapping, All Rights Reserved.

Everything from here on down is © Copyright 2002-2022 by John Halleck, All Rights Reserved.



Addenda, and footnotes

Addendum 1

16oct2001. After the paper was published, I encountered a paper that really should have been referenced, but that I had not known about: Golub, Gene H. and Plemmons, Robert J. "Large-Scale Geodetic Least-Squares Adjustment by Dissection and Orthogonal Decomposition" in the journal "Linear Algebra and its Applications" Volume 34, December 1980. This paper covers the subject matter (from the math point of view) in depth, and the special topics issue it is in contains several other relevant articles, most notably Alan George and Michael T. Heath's "Solution of Sparse Linear Least Squares Problems Using Givens Rotations".

3nov2004: That special issue was reprinted in 1981 by the publishing company North Holland as: "Large Scale Matrix Problems", Ake Bjorck, Robert J. Plemmons, and Hans Schneider editors.

Addendum 2

6nov2001. Everywhere else I converted to just counting multiplications and divisions, but I somehow left the full operation count for computing the Pythagorean sums. (And, in fact, left it that way in the code that checked operation counts.) So, for what it is worth, the counts for QR would have come out even better if I had been consistent.

Addendum 3

7feb2002. Several people have asked "WHY doesn't multiplying by an orthogonal matrix change the solution?" (And alternately, "why does multiplying by a non-orthogonal matrix change the solution?") It is probably worth a quick comment. If we have the system $Ax = L$, then what the least squares solution is minimizing is the square of the length of the vector $Ax - L$. So the left hand side looks like:

   $(Ax-L)^t(Ax-L)$

Now, if we multiply both sides by an orthogonal matrix $G$, then we get the system $GAx = GL$, and the least squares problem (by Normal Equations) has a left hand side of

   $(GAx-GL)^t(GAx-GL)$

Since we know orthogonal matrices are non-singular, we can factor $G$ out of both terms...

    $(G(Ax-L))^tG(Ax-L)$

Noting that the transpose of a product is the product of the transposes in reverse order, we get:

    $(Ax-L)^tG^tG(Ax-L)$

This is exactly what the Normal Equation solution started with, except that we have $G^tG$ as a center term. If $G$ is orthogonal, then $G^tG$ is the identity (by the definition of orthogonal), and the identity changes nothing and can be removed. If $G$ is not orthogonal, then the solution changes by an amount related to $G^tG$, and is therefore not the same as the original.

Addendum 4

22jul2002. The matrix that just changes the sign of a row is orthogonal, so in QR it can be part of the Q. The restriction that diagonal elements of R be positive makes the QR factorization totally unique. The comment "... up to the signs of the rows ..." is there because the proof below it only really established that, and starting the problem with the matrix in spanning tree order could start with positive elements trivially anyway, so nothing more needed to be established.

Had the paper made the direct restriction to positive diagonal elements in R, and included the prior paragraph's information, I could have done away with that restriction and comment. But it didn't, and still doesn't, seem worth it.

Addendum 5

2003nov18: "I couldn't find a worst case analysis of a sparse Cholesky factorization on problems of this structure". To be honest, I can't find a sparse version of the Cholesky in the literature at all. (If you know of one, I'd be interested in being pointed at it.)

What the Geodesy Normal Equation folk commonly do is renumber the problem to make it narrow band, and then do a Cholesky confined to that band. This gains them most of the advantages of a sparse Cholesky factorization, without having to face the grimness of the actual sparse Cholesky problem. People processing large datasets for Geodesy usually make use of matrix partitioning to decrease the work load.

If there were a good fast sparse Cholesky routine, and one ignored all of the overhead it had, and you had effectively infinite memory so you didn't need to worry about locality of reference and page swapping issues, the sparse Cholesky would reduce Normal Equations' operation counts for this step to be within roughly the same range as QR factorization. But Normal Equations would still require you to solve two triangular systems afterwards, where QR only requires the solution of one. AND, Normal Equations would still have dramatically less numerical stability.

2007sep27 Tim Davis, of the University of Florida, wrote to inform me that there *is* a lot of literature on Sparse Cholesky, and he recommended the sparse Cholesky in MATLAB as a good example. It is online at:
http://www.cise.ufl.edu/research/sparse/cholmod His book http://www.cise.ufl.edu/research/sparse/CSparse/ seems to be a good starting place for those interested.

Addendum 6

2007sep27 Most non-Geodesy Survey problems have relatively few redundant shots.

Addendum 7

2007oct16 Actually, to form the Upper triangular matrix the point order has to be the REVERSE of a spanning tree order. Just carelessness on my part. I'm surprised nobody caught this. More proof that nobody reads this.

Addendum 8

2007oct16 Actually the figures for Orthogonalization methods are not as bad as this in real life because dense surveys have exploitable structure. As an extreme (unrealistic) example, once a survey is fully connected with a shot from every station to every station (unrealistic in practice) a new row can be added with only order n operations.

Addendum 9

2017mar13 I've replaced the reference in the paper by a later and better paper by the same authors. This reference may be overkill, covering the whole field, when this problem needs only a small subset of it. And the matrices in this paper have easily exploitable structure.

Addendum 10

Jan 7, 2023 The example didn't show any "fillin", and I forgot to address a fillin objection that some have mentioned as a strike against using QR factorization. Unfortunately, addressing it is somewhat lengthy, and requires some background. So, I'm just going to give a handwaving argument here for now.

Both methods are prone to "fillin" problems. That is when the number of non-zero elements in the final upper triangular matrix is greater than the number of non-zero elements in the original problem. And since more non-zero elements mean more work (in both methods), it is considered worthwhile to keep the number down. BUT both methods have the problem, and Normal Equations have larger matrices.

