Solving Ill-Conditioned Systems with the Solve* Programs

The Solve* programs may solve a set of equations in up to seven ways.  (The set of programs currently offered does not include all of these methods at this time.  But if you would like to try a method not currently delivered, email me a request.)  The reason for offering several ways is that we cannot know for sure which is most appropriate for your problem.  Here is a description of each way.

In each case, the problem we are solving is the ordinary equations portion of the three-way problem:

                Ax = b

where A has m rows and n columns, and the SVD of A is

                A = U S V^T.

Note: In ALL CASES, we begin by scaling the rows of the matrix A to unit norm, carrying along the right-hand-side values and the error estimates, if present.  The rest of the solution process then proceeds as described below:
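As a rough illustration of this preliminary step, here is a minimal NumPy sketch (the function name scale_rows is ours, not part of the Solve* programs): each row of A is divided by its Euclidean norm, and the same factor is applied to the corresponding right-hand-side value and error estimate.

    import numpy as np

    def scale_rows(A, b, err=None):
        """Scale each row of A to unit norm; scale b (and err, if given) by the same factors."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        norms = np.linalg.norm(A, axis=1)
        norms[norms == 0.0] = 1.0              # leave any all-zero rows unchanged
        A_scaled = A / norms[:, None]
        b_scaled = b / norms
        err_scaled = None if err is None else np.asarray(err, dtype=float) / norms
        return A_scaled, b_scaled, err_scaled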

CLS, or Classical Least Squares.  This method solves the equations exactly as described in Solution Method.  No use is made of any error estimates, even if they are provided.
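For readers who want to see the underlying computation, here is a hedged sketch of a classical least-squares solve via the SVD.  This is the standard textbook formulation (numpy.linalg.lstsq would give the same answer), not the Solve* source; it is shown explicitly because the later methods modify it.

    import numpy as np

    def cls_solve(A, b):
        """Ordinary least-squares solution of Ax = b via the SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        tol = np.finfo(float).eps * max(A.shape) * s[0]   # drop numerically zero singular values
        inv_s = np.zeros_like(s)
        inv_s[s > tol] = 1.0 / s[s > tol]
        return Vt.T @ (inv_s * (U.T @ b))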

WLS, or Weighted Least Squares.  This method finds the median of the scaled error estimates.  Each equation is then rescaled so that its error estimate is exactly equal to that median.  The equations are then solved exactly as described in Solution Method.
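A minimal sketch of that rescaling, assuming every error estimate is positive (wls_solve is our illustrative name, not the program's): each row is multiplied by the factor that makes its error estimate equal the median, then the system is solved by ordinary least squares.

    import numpy as np

    def wls_solve(A, b, err):
        """Rescale each equation so its error estimate equals the median estimate, then solve."""
        err = np.asarray(err, dtype=float)
        w = np.median(err) / err               # err[i] * w[i] == median for every row i
        x, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
        return x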

DIS, or Discrepancy Method.  This is the first of the regularized methods.  In the regularized methods we (usually) augment the ordinary equations with a scaled set of equations that ask for the solution to be all zeros.  That is, Ax = b becomes the augmented system:

                 (A)x=b                 (m rows by n columns)

                (λI)x=0  (n rows by n columns; that is, I is an identity matrix of size n by n)

So we add n rows to A, each of which is all zeros except for λ in one position.  The right-hand-side vector b is augmented by n zeros.  We relabel this augmented system as A’x = b’.  When we pick a value of λ and solve, we get a solution x_λ whose residual is greater than the minimum least-squares residual.  This residual vector is

res(λ) = A x_λ - b.

Generally, the norm of res(λ) is an increasing function of λ.  The discrepancy method adjusts λ until the norm of res(λ) is equal to the norm of the vector of error estimates.  That is, we have induced a discrepancy equal to the expected error vector.

Now, to perform the three-way system solution with DIS, we follow the description in Solution Method except that, instead of solving the ordinary equations with x_a = RTPI(A_o)*b_o, we compute x_a using this discrepancy method with the given set of error estimates.
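Here is a simplified sketch of the discrepancy idea, written from the description above rather than from the Solve* code.  It builds the augmented system (A stacked over λI, b padded with zeros) and bisects on λ, on a log scale, until the residual norm of the original system matches the norm of the error-estimate vector.  The search bounds and iteration count are illustrative choices.

    import numpy as np

    def dis_solve(A, b, err, lam_lo=1e-12, lam_hi=1e3, iters=60):
        """Discrepancy-style solve: pick lambda so that ||A x - b|| matches ||err||."""
        m, n = A.shape
        target = np.linalg.norm(err)

        def solve_for(lam):
            A_aug = np.vstack([A, lam * np.eye(n)])        # (m+n) x n augmented matrix
            b_aug = np.concatenate([b, np.zeros(n)])       # right-hand side padded with n zeros
            x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
            return x, np.linalg.norm(A @ x - b)

        x = None
        for _ in range(iters):                             # bisection on log(lambda)
            lam = np.sqrt(lam_lo * lam_hi)
            x, res_norm = solve_for(lam)
            if res_norm < target:
                lam_lo = lam                               # residual too small: increase lambda
            else:
                lam_hi = lam                               # residual too large: decrease lambda
        return x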

AUT, or Picard-Jones Auto-regularized method.  This method ignores the given error estimates, if any, and finds a value for λ another way.  (So it applies even when no error estimates are available.)  The Picard Condition says that for a good solution one should expect the coefficients in P = S^-1 U^T b to decline.  The general reason for this is that the elements of P become the coefficients of an orthogonal expansion for the solution: x = V*P.  (That is, x is a sum of scaled columns of V, the right set of singular vectors.)  In orthogonal expansions, such as Fourier series or other classical mathematical expansions, the coefficients form a sort of “spectrum”, and the “high frequency” terms generally decline toward zero after some point.  This is not a very exact rule.

The approach is fairly straightforward: one computes P and analyses it for a low point followed by a significant rise.  Traditionally the low point is found using a moving average centered on each successive element of P.  However, the elements of P tend to oscillate, so such an odd-length, centered moving average tends to echo that oscillation.  (If three elements are being averaged, for example, one sum will have two large values and one small value, and the next will have two small values and one large value.)  So in our method we use an even number of values in the moving average.  We then find the minimum of the moving averages and check whether there is a significant rise following that point.  If so, we do a secondary search within the minimum moving-average interval (say P_i, for i from j to k) by starting at the right end (P_k) and backing up as long as the earlier element of P is smaller (in absolute value) than its successor.  We stop at the beginning of this range of elements, or when the preceding element is larger than the current one.  We call r = i the “usable rank”.

We then compute the truncated SVD solution, x_r = V S^-1 U^T b, where 1/s_i is set to zero (or s_i is set to infinity, if you prefer that terminology) after s_r.  Next we compute the residual vector associated with x_r:  res(r) = A x_r - b.  We then estimate that the average error (or standard deviation, or sigma) in the original b_i is norm(res(r))/sqrt(m-r).  A key trick here is that final division by sqrt(m-r).  Think of this estimated error as an adjusted root-mean-square value: if we divided by sqrt(m), the formula would be exactly the RMS of the residual vector.  It is easy to set up a sample ill-conditioned problem with random errors in the RHS having a known standard deviation, s.  When it is solved with this method, the resulting estimated average error will be dramatically closer to s than it would be without the adjustment of dividing by sqrt(m-r).  The use of an even number of elements in the moving average, the secondary search for a precise usable rank, and the special adjusted RMS error estimate are the main content of the “Jones” addition in the “Picard-Jones” method.
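The following NumPy sketch is our loose reading of the Picard-Jones procedure.  The window length of 2, the factor used in the “significant rise” test, and the function name are illustrative assumptions, not values taken from the Solve* programs, and the sketch assumes no singular value is exactly zero.

    import numpy as np

    def aut_solve(A, b, window=2, rise_factor=10.0):
        """Picard-Jones-style auto-regularization: find a usable rank from the Picard vector."""
        m, n = A.shape
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        g = U.T @ b
        p = np.abs(g / s)                                  # Picard condition vector |P|

        # Even-length moving average, to damp the typical oscillation of P.
        ma = np.convolve(p, np.ones(window) / window, mode="valid")
        j = int(np.argmin(ma))                             # left edge of the minimum window

        if ma[j:].max() < rise_factor * ma[j]:
            r = len(s)                                     # no significant rise: keep full rank
        else:
            # Secondary search inside the minimum window: back up from its right end
            # while the earlier element of P is the smaller one.
            k = j + window - 1
            while k > j and p[k - 1] < p[k]:
                k -= 1
            r = k + 1                                      # usable rank

        inv_s = np.zeros_like(s)
        inv_s[:r] = 1.0 / s[:r]                            # truncated SVD solution
        x = Vt.T @ (inv_s * g)

        sigma = np.linalg.norm(A @ x - b) / np.sqrt(max(m - r, 1))   # adjusted RMS error estimate
        return x, r, sigma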

 

The following methods are not currently used in any Solve* program on our web site:

PIC, or Simple Picard Method.  The purpose of this method is to detect quite mild ill-conditioning, which can be missed by the more stringent requirements of the AUT method.  If AUT detects ill-conditioning, then PIC is considered irrelevant and is not done.  PIC is a much simpler method.  We just compute the Picard condition vector, P, as follows:

P_i = (1/s_i) * (U^T b)_i  if s_i > EPS(A), or else zero.

Then we look to see if the elements of P are rising at the end of the vector.  If so, we look backward from the last element of P for the first element whose preceding element is larger.  That is, we look for the first local minimum counting backward from the last element.  The position of that element becomes the usable rank, r.  To solve the system, we then just do a truncated SVD solution keeping r singular values.
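A small sketch of that search, written from the description above.  The threshold used here for EPS(A) is a common machine-precision choice that we are assuming, not one taken from the program.

    import numpy as np

    def pic_usable_rank(A, b):
        """Return the usable rank r found by the simple backward local-minimum search."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        eps_A = np.finfo(float).eps * max(A.shape) * s[0]   # assumed meaning of EPS(A)
        p = np.zeros_like(s)
        mask = s > eps_A
        p[mask] = np.abs(U.T @ b)[mask] / s[mask]           # Picard condition vector

        k = len(p) - 1
        while k > 0 and p[k - 1] < p[k]:                    # back up while P is rising toward the end
            k -= 1
        return k + 1                                        # position of the local minimum = usable rank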

 

MER, or Matrix Error Method.  This method only applies if the user gives a positive value for the estimated percentage error in each element of A.  The idea here is to find whether some of the last singular values contribute so little to the matrix A that they can be discarded.  Here is how we make that decision based on the provided estimated percent error in the coefficients, e.  Take a vector Z (of length equal to the number of singular values) and set its elements to zero.  Then set its last element equal to the last singular value.  Build the matrix B = U Z V^T, using Z as the diagonal of singular values.  Compare the average absolute value of B to the average absolute value of A.  If

                avg_abs_val(B)  < .01*e*avg_abs_val(A)

then this singular value can be neglected.  Then do the calculation over, but set the last two elements of Z equal to the last two singular values.  If the above condition still holds, then the last two singular values can be neglected.  Repeat until the above condition fails.  We have then gone one step too far, so we know exactly how many singular values must be kept.  This number becomes our usable rank, r.  To solve the system, we then just do a truncated SVD solution keeping r singular values.
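A sketch of that decision loop, written directly from the description above.  The variable names are ours; e is the estimated percentage error in the elements of A, so e = 5 means 5 percent.

    import numpy as np

    def mer_usable_rank(A, e):
        """Return the number of singular values that must be kept under the MER test."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        n_sv = len(s)
        avg_A = np.mean(np.abs(A))
        r = n_sv
        for k in range(1, n_sv):                   # try neglecting the last k singular values
            Z = np.zeros_like(s)
            Z[n_sv - k:] = s[n_sv - k:]            # keep only the trailing k singular values in Z
            B = (U * Z) @ Vt                       # B = U * diag(Z) * V^T
            if np.mean(np.abs(B)) < 0.01 * e * avg_A:
                r = n_sv - k                       # those k values contribute too little: drop them
            else:
                break                              # condition failed: we have gone one step too far
        return r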

RNK, or User Given Rank Method. In this method we interactively ask the user what usable rank they would like to try, and do a truncated SVD solution keeping that many singular values.