How are Linear Algebraic Equations Solved?

Part 1

There are several ways to solve systems of linear algebraic equations.  Here we will focus on a sequence of methods that starts with what one does in high school algebra and proceeds to methods commonly used in sophisticated computer solutions.

Let’s focus on a single small problem for illustration:

                1*x + 2*y + 3*z = 14

                2*x + 2*y + 3*z = 15

                1*x - 2*y + 2*z = 3

Elimination and Back Substitution

In this method we add or subtract multiples of the equations to make certain of the coefficients go to zero.  The ones we want to go to zero are the ones below the “diagonal”.  The diagonal is the set of terms 1*x in the first equation, 2*y in the second, and 2*z in the third:

                1*x + 2*y + 3*z = 14

                2*x + 2*y + 3*z = 15

                1*x - 2*y + 2*z = 3

We can eliminate the 2*x term by subtracting the first equation from the second one twice (in other words, subtracting 2 times the first equation from the second):

                1*x + 2*y + 3*z = 14

                0*x - 2*y - 3*z = -13

                1*x - 2*y + 2*z = 3

Then we can eliminate the 1*x term by subtracting the first equation from the third equation:

                1*x + 2*y + 3*z = 14

                0*x - 2*y - 3*z = -13

                0*x - 4*y - 1*z = -11

Now we need to eliminate the -4*y term.  We can do this by subtracting the second equation from the third twice:

                1*x + 2*y + 3*z = 14

                0*x - 2*y - 3*z = -13

                0*x + 0*y + 5*z = 15

Now you can see that we have accomplished our goal: there are only zero coefficients below the diagonal.

The next phase, called back substitution, starts by solving the last equation.  But that is easy: just divide through by 5 to get z=3!  The back substitution process says to substitute this value of z into all the equations above the third.  So we have

                1*x + 2*y + 3*3 = 14

                0*x - 2*y - 3*3 = -13

Or

                1*x + 2*y = 5

                0*x - 2*y = -4

And we see that equation number 2 is now easy to solve: y=2.

Continuing, we substitute y=2 back into the first equation to get

                1*x + 2*2 = 5

Or

                1*x + 4  = 5

Or

                x  = 1.

So there we have it: x=1, y=2, z=3.
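By the way, the whole procedure is easy to express as a program.  Here is a minimal Python/NumPy sketch of the two phases; the function name is my own invention, and the loops do no row swapping (“pivoting”), so it is for illustration only, not production use:

    import numpy as np

    def eliminate_and_back_substitute(A, b):
        # Solve A*x = b by the two phases walked through above.
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        # Phase 1, elimination: zero out coefficients below the diagonal.
        for col in range(n - 1):
            for row in range(col + 1, n):
                factor = A[row, col] / A[col, col]
                A[row, :] -= factor * A[col, :]
                b[row] -= factor * b[col]
        # Phase 2, back substitution: solve from the last equation upward.
        x = np.zeros(n)
        for row in range(n - 1, -1, -1):
            x[row] = (b[row] - A[row, row + 1:] @ x[row + 1:]) / A[row, row]
        return x

    A = np.array([[1, 2, 3], [2, 2, 3], [1, -2, 2]])
    b = np.array([14, 15, 3])
    print(eliminate_and_back_substitute(A, b))   # prints [1. 2. 3.]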

An important point here is that the solution was easy once the system was in the form with zero coefficients below the diagonal:

                1*x + 2*y + 3*z = 14

                0*x - 2*y - 3*z = -13

                0*x + 0*y + 5*z = 15

The matrix of coefficients for this system is

                1    2    3
                0   -2   -3
                0    0    5

We call such a matrix “upper triangular” for obvious reasons.  The equations would be equally easy to solve if we had instead eliminated the coefficients above the diagonal (working from the bottom equation up), ending with a matrix of coefficients which is “lower triangular”, such as this one:

                1    0    0
                2   -2    0
                3   -3    5

A lower triangular system is solved from the top down by forward substitution: the first equation has only one unknown, and each solved value is substituted into the equations below it.

 

We will expand on this point in the next portion of the tutorial.

Part 2

The “Elimination and Back Substitution” method discussed in Part 1 is a good pencil-and-paper way to solve a small system of linear equations (up to three or four unknowns or so).  It is also the basis for a more sophisticated method which historically has been used a great deal on computers: the LU Decomposition Method (abbreviated as just LU).  The original motivation for this method was to enable solving several different versions of a linear equation system that differ only in the data on the “right-hand side”.  That is, the matrix of coefficients on the left is unchanged.

The idea for solving with LU is to call a math library to decompose our matrix into an LU form, then “easily” compute the solution using two substitution phases like the back substitution above.  This sounds like cheating, right?  If the computer is going to do the LU decomposition, then why not just let it finish the job?  The answer is that we will do just that, but you need to understand some of these fundamentals before you will understand the more complicated calculations the computer will do later.

So what is an LU decomposition?  Given a (square) matrix A, we get back three matrices from the decomposition:

                A = P * L * U

where

                P is a permutation matrix.  This means it is an identity matrix with the rows scrambled.  That’s all it is.  It captures the effect of the algorithm deciding that a row other than the first is the best one to begin with, or similar later decisions.

                L is a “lower triangular” matrix:  that is, one with all elements above the diagonal set to zero.

                U is an “upper triangular” matrix: that is, one with all elements below the diagonal set to zero.  Note how similar this idea is to the first stage of the Elimination and Back Substitution method in Part 1: we began by changing the equations until all the elements below the diagonal were zero.

In our case, for this system we have:

                1*x + 2*y + 3*z = 14

                2*x + 2*y + 3*z = 15

                1*x - 2*y + 2*z = 3

and the corresponding matrix problem, A*X = B, has

A =

                1    2    3
                2    2    3
                1   -2    2

and B =

                14
                15
                3

 

The LU (or PLU) decomposition of A is

P =

                0    0    1
                1    0    0
                0    1    0

L =

                1      0      0
                0.5    1      0
                0.5   -1/3    1

U =

                2    2    3
                0   -3    0.5
                0    0    5/3

 

If you multiply L times U (please do that) you get back the A matrix, except that the rows are in a different order.  If you then multiply that matrix by P (with P on the left), you get back A exactly.
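If you have Python with SciPy handy, you can do this check numerically; scipy.linalg.lu returns P, L, U in exactly the A = P*L*U convention used here:

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[1., 2., 3.],
                  [2., 2., 3.],
                  [1., -2., 2.]])
    P, L, U = lu(A)                              # A = P @ L @ U
    print(np.allclose(L @ U, A[[1, 2, 0], :]))   # L*U is A with rows reordered: True
    print(np.allclose(P @ L @ U, A))             # multiplying by P restores A: True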

Forward and Back Substitution

Now let’s see how we can solve the equations.

We have

                A*X = B

Or

                P*L*U*X = B

First, we get rid of P by multiplying through by its inverse.  The inverse of a permutation matrix is just its transpose, so this is easy:

                P^T*P*L*U*X = P^T*B

Or

                L*U*X = P^T*B

Now P^T*B is just B with its rows rearranged, so it is easy to calculate.  It is

                15
                3
                14

 

Let’s reorganize this last equation slightly by grouping some terms.  We get

                L*(U*X) = (P^T*B)

It is not really necessary, but it can help us think about the problem if we relabel the terms like this:

                L*X_U = B_P, where we have used X_U to stand for U*X, and B_P for P^T*B.

Remember that in Part 1 we realized that any system of the form L*X=B or U*X=B is really easy to solve: start with the equation that has only one non-zero coefficient, and substitute each solved value forward (for L) or backward (for U).  So solving L*X_U = B_P is easy.  That gives us the result for X_U.  That is, we now know that

                U*X = X_U

where we have numbers for all the coefficients in U and X_U.  This system is easy to solve too, by back substitution.  So now we have the desired solution, X.

So that is how you solve a (square) linear equation system if you have an LU decomposition of the matrix of coefficients.
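Here is the whole recipe as a short Python/SciPy sketch, first spelled out with the explicit permutation, forward substitution, and back substitution, and then using the library’s packaged factor-once, solve-many routines:

    import numpy as np
    from scipy.linalg import lu, lu_factor, lu_solve, solve_triangular

    A = np.array([[1., 2., 3.],
                  [2., 2., 3.],
                  [1., -2., 2.]])
    B = np.array([14., 15., 3.])

    P, L, U = lu(A)
    BP = P.T @ B                              # B_P: rearrange the right-hand side
    XU = solve_triangular(L, BP, lower=True)  # forward substitution: L*X_U = B_P
    X = solve_triangular(U, XU, lower=False)  # back substitution: U*X = X_U
    print(X)                                  # prints [1. 2. 3.]

    # In practice: factor once, then solve cheaply for each new right-hand side.
    lu_and_pivots = lu_factor(A)
    print(lu_solve(lu_and_pivots, B))         # prints [1. 2. 3.]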

 

Part 3

There are many reasons to prefer a more powerful decomposition than LU.  In most cases we would rather use a Singular Value Decomposition (SVD).  The SVD of a matrix A has these properties:

·         A = U * S * V^T

·         If A is of size m rows by n columns then theoretically

o   U is of size m by m

o   S is a diagonal matrix of size m by n, with at most min(m,n) non-zero elements on that diagonal

o   V^T is of size n by n.

·         Both U*U^T and U^T*U are identity matrices (of size m), so U and U^T are inverses of each other

·         Both V*V^T and V^T*V are identity matrices (of size n), so V and V^T are inverses of each other

·         The diagonal elements of S (the “singular values”) are non-negative and non-increasing.  That means they typically decrease, but values can repeat.
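These properties are easy to verify numerically.  For example, with NumPy (using the 3 by 3 matrix A from the earlier parts):

    import numpy as np

    A = np.array([[1., 2., 3.], [2., 2., 3.], [1., -2., 2.]])
    U, s, Vt = np.linalg.svd(A)                   # s is the diagonal of S, non-increasing
    print(np.allclose(U @ np.diag(s) @ Vt, A))    # A = U * S * V^T: True
    print(np.allclose(U.T @ U, np.eye(3)))        # U^T * U = I: True
    print(np.allclose(Vt @ Vt.T, np.eye(3)))      # V^T * V = I: True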

Here is how one solves A*X=B when an SVD is available:

·         We have                                              A*X = B

·         Substituting the SVD we get       U * S * V^T * X = B

·         Multiply through by U^T:              U^T * U * S * V^T * X = U^T * B

·         But U^T * U = I, so                          S * V^T * X = U^T * B

·         But U^T * B is easily computed, so we can call it B’ and have

S * V^T * X = B’

·         Which we relabel as                        S * X’ = B’                            (1)     where X’ stands for V^T * X, because once we have X’, computing X is easy.

·         Now since S is a diagonal matrix, its inverse is just the diagonal matrix with each non-zero diagonal element replaced by its reciprocal.

·         That is,                                                  X’ = S^-1 * B’.

Now that seems easy… but remember that S can have some singular values which are zero, and in that case we can’t compute the corresponding elements of S^-1.  Suppose the last two singular values are zero.  Then we simply have to realize that the last two equations in (1) are impossible to satisfy (unless we are so lucky that the last two elements of B’ are also zero!).  In fact, they represent conflicting information within the system of equations.  What we have to do is just drop those two equations, which can be done by setting the last two diagonal values of S^-1 to zero.  That leaves two equations with all zeros on the left side and something hopefully zero on the right, but probably not.  When we modify S^-1 that way we call it a pseudo-inverse and write it as S^+.  So we can continue the solution process, realizing we have made an unavoidable error which will affect the final results.

·         So we have                                         X’ = S^+ * B’             which is easily computed,

·         and then we have                           V^T * X = X’

·         and                                                        V * V^T * X = V * X’

·         and finally                                           X = V * X’

·         and it is nice to look back and see that this means      X = V * S^+ * B’ = V * S^+ * U^T * B.
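Putting those steps together gives a very short solver.  Here is a sketch in Python/NumPy; the function name svd_solve and the cutoff tol are my own choices, and the cutoff anticipates the near-zero singular values discussed next:

    import numpy as np

    def svd_solve(A, B, tol=1e-12):
        # Solve A*X = B via the SVD, treating tiny singular values as zero.
        U, s, Vt = np.linalg.svd(A)
        Bp = U.T @ B                                      # B' = U^T * B
        s_plus = np.where(s > tol * s[0], 1.0 / s, 0.0)   # diagonal of S^+
        Xp = s_plus * Bp[:len(s)]                         # X' = S^+ * B'
        return Vt.T @ Xp                                  # X = V * X'

    A = np.array([[1., 2., 3.], [2., 2., 3.], [1., -2., 2.]])
    B = np.array([14., 15., 3.])
    print(svd_solve(A, B))   # prints [1. 2. 3.]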

Example

Let’s return to the 3 by 3 matrix problem we used earlier:

                1*x + 2*y + 3*z = 14

                2*x + 2*y + 3*z = 15

                1*x - 2*y + 2*z = 3

The corresponding matrix problem, A*X = B, has

A =

                1    2    3
                2    2    3
                1   -2    2

and B =

                14
                15
                3

 

The SVD of A is U * S * V^T where

U =

                0.6530    0.1881   -0.7336
                0.7263    0.1189    0.6770
                0.2146   -0.9749   -0.0590

and S =

                5.627    0        0
                0        2.817    0
                0        0        0.6309

and V =

                0.4123   -0.1949    0.8900
                0.4139    0.9103    0.0076
                0.8116   -0.3653   -0.4560

 

We want to compute X = V * S^+ * U^T * B.

U^T * B is easy to compute:

                20.68
                1.492
                -0.2920

 

S^+ is

                0.1777    0         0
                0         0.3550    0
                0         0         1.585

 

So S^+ * U^T * B is the product of the above matrix times the above vector and is

                3.675
                0.5298
                -0.4628

 

And finally X = V times the above, which is

                1
                2
                3

Just as we thought!

(If there are arithmetic errors here it is due to my copying computed values into these figures!)

So if you have the SVD (that is, the U, S, and V) for a matrix A, it is very straightforward to compute a solution unless some of the singular values are zero.  And if the singular values drop very near to zero, then the latter elements of S^+ become huge, so the latter elements of S^+ * U^T * B also become huge, and the final solution has huge elements, typically with random or oscillating signs.

 

Part 4

In Part 3 we ended by mentioning that if some of the singular values are very small then the solution can be bad (that is, have huge oscillatory values).  Let’s look at that some more.

Remember that the matrix A is assumed to be decomposed into its Singular Value Decomposition:

·         A = U * S * V^T

and that the solution to A*X=B is given by

·         X = V * S^+ * U^T * B

When some of the singular values are small, some of the elements of S^+ are large, and that means the solution may have unexpectedly large values.  Let’s see an example.  We will go back to one of the example difficulties from an earlier tutorial: a user (call him Fred) tried to solve this “simple” problem:

1*x + 1*y = 2

1*x + 1.01*y  = 3

The answer Fred got was x=-98 and y=100.  Fred was very surprised at this result, as he was expecting answers not far from 1.  This kind of difficulty arises in many important problems.  Let’s walk through the solution to see exactly where the large numbers come up.
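(A quick computer check confirms both the answer and the trouble.  The condition number reported below is the ratio of the two singular values we are about to meet, and a value far above 1 warns that small changes in the data move the answer a lot:)

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 1.01]])
    B = np.array([2., 3.])
    print(np.linalg.solve(A, B))   # prints [-98. 100.], the exact algebraic answer
    print(np.linalg.cond(A))       # about 402, a warning of an ill-conditioned system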

A =

                1    1
                1    1.01

and B =

                2
                3

 

The SVD of A is U * S * V^T where

U =

                0.705    0.709
                0.709   -0.705

and S =

                2.005    0
                0        0.0049875

and V =

                0.705    0.709
                0.709   -0.705

(Curiously, U and V are the same here.  This is because A is symmetric and its eigenvalues are positive.)

We want to compute X = V * S^+ * U^T * B.

U^T * B is (please compute this if you are feeling unsure of what is happening here)

                3.537
                -0.6983

 

S^+ is

                0.4988    0
                0         200.5

Oops!  Here is the large number we knew would show up!

And S^+ * U^T * B is

                1.764
                -140.0

Now we have a large number and a sign change… not good.

And finally X = V times the above, which is

                -98
                100

 

Of course, Fred could have figured this out much more quickly using simple elimination and back substitution!  But our point is for you to see that the matrix has a quite small second singular value, and that fact leads directly to large values in the answer, with sign oscillation.

Fred really thought he should have gotten answers near 1.0, but he had no idea what was wrong with the equations.  Here are some approaches he tried to get better answers.  (All of these methods make good sense in particular real situations.  Of course, in realistic situations these systems of equations would probably be somewhat larger, or much larger.  We are using Fred’s tiny system for illustration.)

1.       The second equation looked suspicious, so Fred tried just discarding it.  Then he had just one equation:

1*x + 1*y = 2

There are many answers to this “system” of equations, such as x=0, y=2, or x=1, y=1.  But the solution with the smallest norm is x=1, y=1.  (By smallest norm we mean the smallest sum of squares of x and y.  Technically, we mean the square root of that, but the root makes no difference in this case.)  Fred liked that, but felt maybe he had gone too far in keeping just one equation.
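Fred’s one-equation experiment is, in fact, what a least-squares library routine does automatically for an underdetermined system.  For example, NumPy’s lstsq returns exactly this minimum-norm solution:

    import numpy as np

    A1 = np.array([[1., 1.]])       # the single remaining equation x + y = 2
    B1 = np.array([2.])
    x, *_ = np.linalg.lstsq(A1, B1, rcond=None)
    print(x)                        # prints [1. 1.], the minimum-norm solution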

2.       Looking at the singular values, Fred felt that the second singular value of 0.0049875 seemed suspicious, so he tried just setting it to zero and solving (using the pseudo-inverse).  The answer (you can compute it from the matrices above if you like) was x=1.244 and y=1.25.  Now he was getting somewhere!  This answer looked reasonable.
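In library terms, this second experiment is a truncated-SVD (pseudo-inverse) solve.  NumPy’s pinv takes a relative cutoff; here any singular value below 1% of the largest one (which catches the suspicious 0.0049875) is treated as zero:

    import numpy as np

    A = np.array([[1., 1.], [1., 1.01]])
    B = np.array([2., 3.])
    print(np.linalg.pinv(A, rcond=0.01) @ B)   # prints approximately [1.244 1.251]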

3.       Fred felt that the first equation was correct.  But he was unsure about the second one.  He wanted to somehow emphasize the first equation and de-emphasize the second equation, but with only two equations any attempt at scaling the equations would be useless, as the solution would be exact anyway.  He put this idea aside for now.

4.       Fred decided to try a simple “regularizing” method.  He added two equations asking x and y to be zero:

1*x + 1*y = 2

1*x + 1.01*y  = 3

1*x = 0

1*y = 0

The least-squares answer was x=0.994, y=1.004.  “Hmmm”, thought Fred.  “This looks too much like the 1st answer.  Now I remember!  The regularizing equations are supposed to be scaled much smaller than the ordinary equations!”  So he tried this:

1*x + 1*y = 2

1*x + 1.01*y  = 3

0.01*x = 0

0.01*y = 0

The answer was x=-18.5, y=20.9.  “Heavens!” thought Fred.  And he then tried several other factors.  “I can get just about any set of answers this way!” said Fred.  “How do I know which is right?”
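Fred’s complaint is easy to reproduce.  Sweeping the scale factor on the regularizing equations slides the answer smoothly between the extremes, as this sketch shows (the weight lam here plays the role of the lambda discussed in the next item):

    import numpy as np

    A = np.array([[1., 1.], [1., 1.01]])
    B = np.array([2., 3.])
    for lam in [1.0, 0.1, 0.01, 0.001]:
        A_aug = np.vstack([A, lam * np.eye(2)])    # append lam*x = 0, lam*y = 0
        B_aug = np.concatenate([B, np.zeros(2)])
        x, *_ = np.linalg.lstsq(A_aug, B_aug, rcond=None)
        print(lam, x)   # lam=1.0 gives ~[0.994 1.004]; lam=0.01 gives ~[-18.5 20.9]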

5.       Next Fred decided to try a version of the “discrepancy” method to help pick a solution from the range of solutions he had.  He realized that the answer he liked best so far was the 2nd set, where x and y were both around 1.25.  So he substituted these values into the equations to get:

1*1.25 + 1*1.25 = 2.5  (not 2)

1*1.25 + 1.01*1.25 = 2.51 (not 3).

So Fred felt this was suggesting that each of the right-hand-side values was off by about 0.5.  So Fred then adjusted the multiplier (usually called lambda) for the regularizing equations until the resulting root-mean-square of the residual was 0.5.  The lambda value that worked was 0.363, and the solution was x=1.186 and y=1.230.  This seemed like the best answer yet, since Fred thought y should probably be the larger value.
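Here is Fred’s discrepancy calculation as a sketch: solve the regularized least-squares system at lambda = 0.363, then check that the root-mean-square residual of the original two equations comes out at the target value of 0.5:

    import numpy as np

    A = np.array([[1., 1.], [1., 1.01]])
    B = np.array([2., 3.])
    lam = 0.363                                      # the lambda Fred settled on
    A_aug = np.vstack([A, lam * np.eye(2)])
    B_aug = np.concatenate([B, np.zeros(2)])
    x, *_ = np.linalg.lstsq(A_aug, B_aug, rcond=None)
    print(x)                                         # prints ~[1.186 1.230]
    residual = A @ x - B
    print(np.sqrt(np.mean(residual ** 2)))           # prints ~0.5, the target discrepancy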

6.       Finally, Fred realized that with these regularized equations and a value for lambda, he could now try de-emphasizing the second equation.  So he tried this version, which scaled the second equation down by a factor of 10:

1*x + 1*y = 2

0.1*x + 0.101*y = 0.3

0.363*x = 0

0.363*y = 0

The result was x and y both equal to 0.85.  Fred had gone too far!  He re-tried with the second equation divided by 2 instead of 10, and got x and y = 0.94.  “This is not helping!” said Fred.  “It was better when I used the discrepancy method with a reasonable estimate for the right-side error!”

7.       Last, Fred decided to try SolveSA on his original equations to see what its automatic method produced.  He didn’t fully understand the process, but he liked the answer: x=1.117, y=1.135.  The lambda value reported was a little larger than his, at 0.66, and the RMS of the residual vector was a little larger at 0.55, but then he thought: “How much easier it would have been to just use an automatically regularized solver!”

Summary

See www.rejonesconsulting.com for software that will automatically find a reasonable regularization parameter and apply it.

Also, software available there can accept general linear constraints to ensure that your solution satisfies all the known criteria.