How are Linear Algebraic Equations Solved?

Part 4

In the previous tutorial we ended by mentioning that if some of the singular values are very small then the solution can be “bad”.  Let’s look at that some more.

Remember that the matrix A is assumed to be decomposed into its Singular Value Decomposition:

·         A = U * S * V^T

and that the solution to A*X=B is given by

·         X = V * S^+ * U^T * B
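
For readers who want to try this on a computer, here is a minimal NumPy sketch of that formula. It is only an illustration (the function name svd_solve is made up here), not part of any particular solver package:

    import numpy as np

    def svd_solve(A, B):
        """Solve A*X = B via the SVD:  X = V * S^+ * U^T * B."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)            # A = U * diag(s) * V^T
        s_plus = np.divide(1.0, s, out=np.zeros_like(s), where=s > 0)  # reciprocals of the singular values
        return Vt.T @ (s_plus * (U.T @ B))                           # V * S^+ * (U^T * B)

Notice that the reciprocals of very small singular values are exactly the large numbers discussed next.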

When some of the singular values are small, some of the elements of S^+ are large, and that means the solution may have unexpectedly large values.  Let’s see an example.  We will go back to one of the example difficulties in an earlier tutorial: a user (call him Fred) tried to solve this “simple” problem:

1*x + 1*y = 2

1*x + 1.01*y  = 3

The answer Fred got was x=-98 and y=100.  Fred was very surprised at this result, as he was expecting answers not far from 1.  This kind of difficulty arises in many important problems.  Let’s trace the solution to see exactly where the large numbers came up.

A =

   1        1
   1        1.01

and B =

   2
   3

The SVD of A is U * S * V^T, where

U =

   0.705    0.709
   0.709   -0.705

and S =

   2.005    0
   0        0.0049875

and V =

   0.705    0.709
   0.709   -0.705

(Curiously, U and V are the same here.  This is because A is symmetric and its eigenvalues are positive.)

We want to compute X = V * S^+ * U^T * B.

U^T * B is (please compute this yourself if you are unsure of what is happening here)

   3.537
  -0.6983

S^+ is

   0.4988     0
   0          200.5
Oops!  Here is the large number we knew would show up!

And S^+ * U^T * B is

   1.764
  -140.0

Now we have a large number and a sign change… not good.

And finally X = V times the above, which is

  -98
  100

Of course, Fred could have figured this out much more quickly with simple elimination and back substitution!  But the point is to see that the matrix has a very small second singular value, and that this fact leads directly to large values in the answer, with oscillating signs.
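
If you would like to reproduce the intermediate numbers above, here is a short NumPy sketch (ours, for illustration; it is not how Fred worked the problem).  A library SVD may return the columns of U and V with opposite signs from those shown above, but the final X comes out the same:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.01]])
    B = np.array([2.0, 3.0])

    U, s, Vt = np.linalg.svd(A)
    print(s)              # roughly [2.005, 0.0049875] -- note the tiny second value

    utb = U.T @ B         # roughly [ 3.537, -0.6983]  (possibly with flipped signs)
    scaled = utb / s      # roughly [ 1.764, -140.0 ]  -- here the large number appears
    x = Vt.T @ scaled     # roughly [ -98, 100 ]
    print(utb, scaled, x)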

Fred really thought he should have gotten answers near 1.0, but he had no idea what was wrong with the equations.  Here are some approaches he tried to get better answers.  (All of these methods make good sense in particular real situations.  Of course, in realistic situations these systems of equations would probably be somewhat larger, or much larger.  We are using Fred’s tiny system for illustration.)

1.       The second equation looked suspicious, so Fred tried just discarding it.  Then he had just one equation:

1*x + 1*y = 2

There are many answers to this “system” of equations, such as x=0, y=2, or x=1, y=1.  But the solution with the smallest norm is x=1, y=1.  (By smallest norm we mean the smallest sum of squares of x and y.  Technically, the norm is the square root of that sum, but taking the root does not change which solution is smallest.)  Fred liked that, but felt maybe he had gone too far in using only one equation.

2.       Looking at the singular values, Fred felt that the second singular value of 0.0049875 seemed suspicious, so he tried just setting it to zero and solving (using the pseudo-inverse).  The answer (you can compute it from the matrices above if you like, or see the code sketch after this list) was x=1.244 and y=1.25.  Now he was getting somewhere!  This answer looked reasonable.

3.       Fred felt that the first equation was correct.  But he was unsure about the second one.  He wanted to somehow emphasize the first equation and de-emphasize the second equation, but with only two equations any attempt at scaling the equations would be useless, as the solution would be exact anyway.  He put this idea aside for now.

4.       Fred decided to try a simple “regularizing” method.  He added two equations asking x and y to be zero:

1*x + 1*y = 2

1*x + 1.01*y  = 3

1*x = 0

1*y = 0

The least-squares answer was x=0.994, y=1.004.  “Hmmm,” thought Fred.  “This looks too much like the first answer.  Now I remember!  The regularizing equations are supposed to be scaled much smaller than the ordinary equations!”  So he tried this:

1*x + 1*y = 2

1*x + 1.01*y  = 3

0.01*x = 0

0.01*y = 0

The answer was x=-18.5, y=20.9.  “Heavens!” thought Fred.  And he then tried several other factors.  “I can get just about any set of answers this way!” said Fred.  “How do I know which is right?”

5.       Next Fred decided to try a version of the “discrepancy” method to help pick a solution from the range of solutions he had.  He realized that the answer he liked best so far was the 2nd set, where x and y were both around 1.25.  So he substituted these values into the equations to get:

1*1.25 + 1*1.25 = 2.5  (not 2)

1*1.25 + 1.01*1.25 = 2.51 (not 3).

Fred took this to suggest that each of the right-side values was off by about 0.5.  So he adjusted the multiplier (usually called lambda) on the regularizing equations until the resulting root-mean-square of the residual was 0.5.  The lambda value that worked was 0.363, and the solution was x=1.186 and y=1.230.  (The code sketch after this list replays this computation.)  This seemed like the best answer yet, since Fred thought y should probably be the larger value.

6.       Next, Fred realized that with these regularized equations and a value for lambda, he could now try de-emphasizing the second equation.  So he tried this version, which scales the second equation down by a factor of 10:

1*x + 1*y = 2

0.1*x + 0.101*y = 0.3

0.363*x = 0

0.363*y = 0

The result was that x and y both came out to 0.85.  Fred had gone too far!  He re-tried with the second equation divided by 2 instead of 10, and got x and y = 0.94.  “This is not helping!” said Fred.  “It was better when I used the discrepancy method with a reasonable estimate for the right-side error.”

7.       Last, Fred decided to try SolveSA on his original equations to see what its automatic method produced.  He didn’t fully understand the process, but he liked the answer: x=1.117, y=1.135.  The lambda value it reported was a little larger than his, at 0.66, and the RMS of the residual vector was a little larger at 0.55.  He thought, “How much easier it would have been to just use an automatically regularized solver!”
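
To make Fred’s experiments easy to replay, here is a small NumPy sketch (ours, for illustration only) that reproduces the truncated-SVD answer from item 2 and the regularized least-squares answers from items 4 and 5.  The cutoff of 0.01 and the helper name regularized are arbitrary choices for this sketch, not anything prescribed above:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.01]])
    B = np.array([2.0, 3.0])

    # Item 2: zero out the tiny singular value, then apply the pseudo-inverse.
    U, s, Vt = np.linalg.svd(A)
    s_plus = np.where(s > 0.01, 1.0 / s, 0.0)      # 0.01 is an illustrative cutoff
    print(Vt.T @ (s_plus * (U.T @ B)))             # roughly [1.244, 1.25]

    # Items 4 and 5: append the regularizing equations lambda*x = 0 and lambda*y = 0,
    # then solve the whole stack by least squares.
    def regularized(lam):
        A_aug = np.vstack([A, lam * np.eye(2)])
        B_aug = np.concatenate([B, np.zeros(2)])
        x, *_ = np.linalg.lstsq(A_aug, B_aug, rcond=None)
        return x

    for lam in (1.0, 0.01, 0.363):
        x = regularized(lam)
        rms = np.sqrt(np.mean((A @ x - B) ** 2))   # RMS residual of the two original equations
        print(lam, x, rms)
    # lam = 1.0   gives roughly [ 0.994,  1.004]
    # lam = 0.01  gives roughly [-18.5,  20.9  ]
    # lam = 0.363 gives roughly [ 1.186,  1.230], with RMS residual about 0.5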

Summary

So if you have the SVD (that is, U, S, and V) for a matrix A, it is very straightforward to compute a solution unless some of the singular values drop to near zero.  Then the corresponding elements of S^+ become large, leading to a final solution that has undesirably large elements, typically with nearly oscillating signs.