Linear least squares lets you find the line that best fits a set of data points but sometimes the points don’t lie along a line. The second part of this two-part article explains how you can use polynomial least squares to find curves that fit your data.As the first part of this article explained, sometimes you can use an advanced math concept — linear least squares — to predict the future. If you can find a line that closely fits your data, you can extend that line into the future to make predictions about what’s to come. If your data represents such things as web site traffic, retail sales, or electricity usage over time, then a good linear least squares fit may help you predict future revenue and expenses.But what if your data doesn’t fall along a neat line? Take a look at the data points plotted in Figure 1. These points clearly don’t lie in a line but they do form some sort of pattern. If you can find a curve that the points do fit, then you can use that curve to predict future values. **Figure 1. Out of Line:** These points don’t lie along a line but there is a clear pattern.If the points fall along some polynomial curve, then you can use techniques similar to those described earlier in this article to find a polynomial least squares fit. In other words, you can find a polynomial A0 + A1 * x + A2 * x2 + … + Ad * xd for values A0, A1, …, Ad to minimize the sum of the squares of the vertical distances between the points and the curve. In this equation, d is called the polynomial’s degree.Figure 2 shows a degree 2 polynomial least squares fit for the points shown in Figure 1. This curve fits the data pretty well. **Figure 2. Polynomial Perfection:** This data fits closely to a degree 2 polynomial.To find the least squares polynomial fit to the data, you need to use calculus that’s very similar to the calculus you used to find the linear least squares fit. (For more information on that calculus, see the first part of this article. You also need to use a little linear algebra. Neither of these is too hard but if you don’t want to work through them, you can skip the next few sections.

### A Little Calculus

Consider the general equation for a polynomial of degree d. A0 + A1 * x + A2 * x2 + … + Ad * xdYou need to find the values of A0, A1, A2, … Ad that minimizes the sum of the squares of the vertical distances between the points and the curve. For the ith data point, you can plug in the value xi to find the corresponding value on the curve. You can then subtract that value from the data point’s y coordinate yi and square the result to get the ith error term. Ei = (yi – (A0 + A1 * xi + A2 * xi2 + … + Ad * xid))2This is all analogous to the discussion of linear least squares in the first part of this article. In fact, if the polynomial has degree 1, this is the same equation you use for linear least squares.The next step is to take partial derivatives of this equation with respect to the values you want to find: A0, A1, A2, … Ad. That may seem like a terrifying mess but it’s really not too bad if you use the same three rules described in the first part of the article.To take the partial derivative of a function E with respect to variable A0, written ?/?A0(E) or ?E/?A0, you can use these rules: The partial derivative of a sum is the sum of the derivatives. The partial derivative of a constant (something that doesn’t have the variable A0 in it) is 0. The partial derivative of a function to a power ?/?A0(FK) equals K * (FK-1) * ?/?A0(F).If you use these three rules, you will find that the partial derivative with respect to Aj of the ith error term is: ?Ei/?Aj = 2 * (yi – (A0 + A1 * xi + … + Ad * xid)) * (-xij)= -2 * (yi * xij – A0 * xij – A1 * xij+1 – … – Ad * xij+d) Adding up the terms for the all of the data points you get: ?E/?Aj = ??Ei/?Aj = 2 * ?(yi * xij – A0 * xij – A1 * xij+1 – … – Ad * xij+d) If you pull out the Ai terms you can rewrite this as: ?E/?Aj = 2 * (?yi * xij – A0 * ?xij – A1 * ?xij+1 – … – Ad * ?xij+d)This equation is intimidating but all of the xi and yi values are coordinates of the data points so they’re just numbers. That means this equation has the following form for some constant S values.?E/?Aj = 2 * (S – A0 * S0 – A1 * S1 – … – Ad * Sd)This is a fairly simple equation with d + 1 unknowns A0, A1, … , Ad.When you take the partial derivatives with respect to Aj as j varies from 0 to d, you get d + 1 equations with d + 1 unknowns. Now you can solve the equations to find the coefficients and you’re done.For a linear least squares fit, solving 2 equations with 2 unknowns isn’t too hard. For this more general case where the degree of the polynomial might be large, however, you need a more general solution. That’s where the linear algebra comes in.### A Little Linear Algebra

Suppose you have a series of n equations with n unknown values x1, x2, … , xn: A11 * x1 + A12 * x2 + … + A1n * xn = C1A21 * x1 + A22 * x2 + … + A2n * xn = C2…An1 * x1 + An2 * x2 + … + Ann * xn = Cn (Note: To keep this discussion general and more consistent with what you’ll see in other places, the variables here are x1, x2, and so on and the known constants are A11, A12, and so forth. When we use this to find polynomial least squares fits, you’ll be solving for A1, A2, and so forth.)One way to solve this system of equations is Gaussian elimination. (Actually I use a slightly modified version of Gaussian elimination that avoids the need for back substitution. For a more rigorous discussion of Gaussian elimination, look in Wolfram MathWorld or Wikipedia.)It’s a fairly simple method but learning it from a description can be a bit tricky so after the following description I’ll work through an example that should make things clearer. You can write the system of equations as a matrix equation like this:The first step is to put a 1 in the matrix’s upper left corner. That’s easy. Simply divide each of the entries in the first row by the value currently in the upper left corner. This is called normalizing the row. If you think about it, you’ll realize that this doesn’t change the truth of the first equation. All you’re doing is dividing both sides of the equation by the same number so the equation remains true.Now you have an augmented matrix that looks like the following where ? means an entry whose value isn’t important right now. You’ll need those values later but they don’t change the basic shape of the augmented matrix.Now you repeat these steps to place a 1 in the second row’s second column and 0s in all of the other entries in that column. Continue the process until the matrix has this form:Now if you go back to the original meaning of this matrix, you can read out the xi values. Using the values in this matrix to reconstruct the first equation you get x1 + 0 + 0 + … + 0 = V1 so x1 = V1. Using the other rows in the matrix you can see that xi = Vi. There’s one odd special case that may throw a monkey wrench into this process. When you try to change the ith entry on row i into a 1, there’s a chance that the entry will have value 0. In that case, you cannot divide the other entries in that row by 0 to normalize the row.If that happens, simply swap that row with one of the following rows that does not have a 0 in this column and continue as before. (If there is no such row, then the system of equations does not have a unique solution.) If you think about what this swap means, you can see that it doesn’t change the truth of the equations. This operation corresponds to switching the original order of the equations and that certainly doesn’t change whether they are true or false.### A Gaussian Example

So how about that example? Suppose you want to find a solution to the following three equations with three unknowns:3 * x + 6 * y + 3 * z = 92 * x + 4 * y – z = -3

4 * x + 6 * y + 2 * z = 8The augmented matrix for this system of equations is:Now you can read off the solution: x = 2, y = -1, z = 3. If you plug these values into the original equations, you’ll see that they work.Hopefully by now you understand Gaussian elimination. It’s fairly simple and only requires some easy multiplication, addition, and row swapping. The only catch is that there are a lot of operations so it’s easy to make mistakes. Fortunately the computer is very good at performing these sorts of operations without making mistakes.The following code shows how the PolynomialLeastSquares example program (which is available for download in C# and Visual Basic versions) performs Gaussian elimination. I’ve put this in a separate method because it’s handy to have for solving other systems of equations not just those needed for polynomial least squares.

`// Perform Gaussian elimination on these coefficients.// Return the array of values that gives the solution.private double[] GaussianElimination(double[,] coeffs){ int max_equation = coeffs.GetUpperBound(0); int max_coeff = coeffs.GetUpperBound(1); for (int i = 0; i <= max_equation; i++) { // Use equation_coeffs[i, i] to eliminate the ith // coefficient in all of the other equations. // Find a row with non-zero ith coefficient. if (coeffs[i, i] == 0) { for (int j = i + 1; j<= max_equation; j++) { // See if this one works. if (coeffs[j, i] != 0) { // This one works. Swap equations i and j. // This starts at k = i because all // coefficients to the left are 0. for (int k = i; k <= max_coeff; k++) { double temp = coeffs[i, k]; coeffs[i, k] = coeffs[j, k]; coeffs[j, k] = temp; } break; } } } // Make sure we found an equation with // a non-zero ith coefficient. double coeff_i_i = coeffs[i, i]; if (coeff_i_i == 0) { throw new ArithmeticException(String.Format( "There is no unique solution for these points.", coeffs.GetUpperBound(0) - 1)); } // Normalize the ith equation. for (int j = i; j <= max_coeff; j++) { coeffs[i, j] /= coeff_i_i; } // Use this equation value to zero out // the other equations' ith coefficients. for (int j = 0; j<= max_equation; j++) { // Skip the ith equation. if (j != i) { // Zero the jth equation's ith coefficient. double coef_j_i = coeffs[j, i]; for (int d = 0; d <= max_coeff; d++) { coeffs[j, d] -= coeffs[i, d] * coef_j_i; } } } } // At this point, the ith equation contains // 2 non-zero entries: // The ith entry which is 1 // The last entry coeffs[max_coeff] // This means Ai = equation_coef[max_coeff]. double[] solution = new double[max_equation + 1]; for (int i = 0; i <= max_equation; i++) { solution[i] = coeffs[i, max_coeff]; } // Return the solution values. return solution;}`

### Back to Polynomial Least Squares

That was a long digression from polynomial least squares so let’s review where we were when the digression began. To find the best fit polynomial, you found that the partial derivative of the polynomial with respect to Aj is:?E/?Aj = 2 * (?yi * xij – A0 * ?xij – A1 * ?xij+1 – … – AD * ?xij+D)To find the best solution, you set this equation and the partial derivatives for the other Ai equal to 0 and solve for the Ai values. Now that you know how to perform Gaussian elimination, this is a snap. The only tricky part is figuring out what coefficients to use in the equations that you send into the Gaussian elimination method.The following code shows how the PolynomialLeastSquares example program builds the coeffs array holding the augmented matrix that it passes to the GaussianElimination method.`// Find the least squares polynomial fit.private double[] FindPolynomialLeastSquaresFit( List` points, int degree){ // Allocate space for (degree + 1) equations with // (degree + 2) terms each. double[,] coeffs = new double[degree + 1, degree + 2]; // Calculate the coefficients for the equations. for (int j = 0; j <= degree; j++) { // Calculate the coefficients for the jth equation. // Calculate the constant term for this equation. coeffs[j, degree + 1] = 0; foreach (PointF pt in points) { coeffs[j, degree + 1] -= Math.Pow(pt.X, j) * pt.Y; } // Calculate the other coefficients. for (int a_sub = 0; a_sub <= degree; a_sub++) { // Calculate the dth coefficient. coeffs[j, a_sub] = 0; foreach (PointF pt in points) { coeffs[j, a_sub] -= Math.Pow(pt.X, a_sub + j); } } } // Solve the equations. return GaussianElimination(coeffs);}

By using programs such as PolynomialLeastSquares, you can get a better understanding of your data. You can find the best fit of a given degree and then judge whether the data points fit that polynomial well.

Ideally if you know something about the process that generated the data, you may be able to figure out what degree to use for the polynomial, but if you can't, beware of fitting data to high degree polynomials. A degree d polynomial can exactly fit d + 1 points so if you use too high a degree you'll get a misleadingly exact match.For example, Figure 3 shows a degree 9 polynomial exactly fitting 10 data points. The curve fits the points exactly but that doesn't mean it's the best fit or the most useful one.**Figure 3. Too Tight a Fit:**A high-degree polynomial can fit most any data, possibly obscuring its true structure.Figure 4 shows the same points fitted by a degree 2 polynomial. Although this doesn't fit the points exactly, it's probably is a better representation of the data than the previous curve.For example, suppose you want to predict the y value when x is close to 0. The polynomial in Figure 4 would give you a value close to the bottom of the graph while the polynomial in Figure 3 would give you a value way off the top. Which is really better depends on your situation but unless something really complex is happening, the degree 2 curve is probably a better bet.

**Figure 4. Relaxed Fit:**A degree 2 polynomial probably fits this data better than a degree 9 polynomial even though it doesn't fit the points exactly.You can use techniques similar to those described here to find other kinds of least squares fits, too. For example, depending on your data you may want to use a logarithmic curve of the form y = a + b * ln(x), a power curve of the form y = a * xb, or an exponential curve of the form y = a * ebx. For more information on these kinds of curves, see Wolfram MathWorld.