If you do a little planning, it’s easy to gather information about a business’s past performance. You can track web site hits, retail sales, software downloads, or enrollment in your training courses over time. You can also track expenses such as new equipment purchases, electricity use, and overtime requests.
Once you have that data, however, what do you do with it? Just by looking at the numbers you can probably tell whether the values are increasing, decreasing, or staying about the same over time. It’s amazing how many companies leave it at that without considering questions such as, “How much are sales increasing over time?” If you could answer that question, you might be able to predict future costs and revenue.

For example, suppose the data points shown in Figure 1 represent enrollment in a training course over time. From this data it’s clear that more people are enrolling in the course as time goes on. That’s good to know, but it would be even better if you could use the data to predict future enrollment. Then you could decide whether you need to hire more instructors, buy more supplies, or find more classroom space.

A Little Calculus
Suppose the best fit line has equation y = m * x + b for some values m and b. Our goal is to find the values of m and b that minimize the sum of the squares of the vertical distances between this line and the data points.
1. The partial derivative of a sum is the sum of the derivatives.
2. The partial derivative of a constant (something that doesn’t have the variable m in it) is 0.
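To show how the goal stated above leads to the S values used in the code, here is a sketch of the derivation. It assumes the data points are (x_i, y_i) for i = 1, ..., n and that the quantity to minimize is the sum of squared vertical distances. The name E for that sum is an assumption of this sketch; the S names mirror the variables in the code.

```latex
% Error function: sum of squared vertical distances from the line y = m x + b.
E(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

% Shorthand sums (same names as the variables in the code).
S_1 = n, \quad
S_x = \sum_{i} x_i, \quad
S_y = \sum_{i} y_i, \quad
S_{xx} = \sum_{i} x_i^2, \quad
S_{xy} = \sum_{i} x_i y_i

% Differentiate term by term (rule 1) and set each partial derivative to 0.
\frac{\partial E}{\partial m}
  = \sum_{i} -2 x_i \bigl( y_i - m x_i - b \bigr)
  = 2 \bigl( m S_{xx} + b S_x - S_{xy} \bigr) = 0

\frac{\partial E}{\partial b}
  = \sum_{i} -2 \bigl( y_i - m x_i - b \bigr)
  = 2 \bigl( m S_x + b S_1 - S_y \bigr) = 0

% Solve the two linear equations for m and b.
m = \frac{S_{xy} S_1 - S_x S_y}{S_{xx} S_1 - S_x^2}, \qquad
b = \frac{S_{xy} S_x - S_y S_{xx}}{S_x^2 - S_1 S_{xx}}
```

The last line matches the two assignments at the end of the code. Both denominators have the same magnitude, S_xx S_1 - S_x^2, which is 0 when every point has the same x coordinate.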



// Find the least squares linear fit.
// (Requires the System, System.Collections.Generic, and System.Drawing namespaces.)
private void FindLinearLeastSquaresFit(List<PointF> points, out double m, out double b)
{
    // Make sure we have at least two points.
    if (points.Count < 2)
    {
        throw new ArgumentOutOfRangeException("points",
            "FindLinearLeastSquaresFit: Parameter points " +
            "must contain at least two points.");
    }

    // Perform the calculation.
    // Find the values S1, Sx, Sy, Sxx, and Sxy.
    double S1 = points.Count;
    double Sx = 0;
    double Sy = 0;
    double Sxx = 0;
    double Sxy = 0;
    foreach (PointF pt in points)
    {
        Sx += pt.X;
        Sy += pt.Y;
        Sxx += pt.X * pt.X;
        Sxy += pt.X * pt.Y;
    }

    // Solve for m and b.
    m = (Sxy * S1 - Sx * Sy) / (Sxx * S1 - Sx * Sx);
    b = (Sxy * Sx - Sy * Sxx) / (Sx * Sx - S1 * Sxx);
}
The code first checks that the input data contains at least two points. It then loops through the points, calculating the S values. It finishes by plugging the S values into the equations for m and b. (Note that the code is a bit light on error handling. For example, if you enter two points with the same x coordinate, the calculation divides by 0. With double values that does not throw an exception; instead m and b come out as NaN or infinity, which can crash the program when it tries to use them.)
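One way to handle the missing error case is sketched below. This is an illustration, not part of the original program: the class name LeastSquaresSketch, the method name TryFindLinearLeastSquaresFit, and the relative tolerance of 1e-12 are all assumptions chosen for the example. The method returns false instead of dividing when the points do not determine a non-vertical line.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

public static class LeastSquaresSketch
{
    // Hypothetical guarded variant of FindLinearLeastSquaresFit.
    // Returns false (with m = b = 0) when no fit can be computed.
    public static bool TryFindLinearLeastSquaresFit(
        IReadOnlyList<PointF> points, out double m, out double b)
    {
        m = 0;
        b = 0;

        // We need at least two points to define a line.
        if (points == null || points.Count < 2) return false;

        // Accumulate the sums S1, Sx, Sy, Sxx, and Sxy in double precision.
        double S1 = points.Count;
        double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
        foreach (PointF pt in points)
        {
            double x = pt.X;
            double y = pt.Y;
            Sx += x;
            Sy += y;
            Sxx += x * x;
            Sxy += x * y;
        }

        // The denominator is 0 when all x coordinates are equal.
        // The relative tolerance 1e-12 is an assumed value, not a tuned one.
        double denominator = Sxx * S1 - Sx * Sx;
        if (Math.Abs(denominator) <= 1e-12 * Math.Abs(Sxx * S1)) return false;

        // Same formulas as the original, written over a shared denominator.
        m = (Sxy * S1 - Sx * Sy) / denominator;
        b = (Sxx * Sy - Sx * Sxy) / denominator;
        return true;
    }

    public static void Main()
    {
        // These three points lie exactly on the line y = 2 * x + 1.
        var points = new List<PointF>
        {
            new PointF(1, 3), new PointF(2, 5), new PointF(3, 7)
        };

        if (TryFindLinearLeastSquaresFit(points, out double m, out double b))
            Console.WriteLine($"y = {m} * x + {b}");

        // Two points with the same x coordinate: no fit is reported.
        var vertical = new List<PointF> { new PointF(2, 1), new PointF(2, 5) };
        if (!TryFindLinearLeastSquaresFit(vertical, out _, out _))
            Console.WriteLine("The points do not determine a non-vertical line.");
    }
}
```

Returning a bool follows the common Try pattern in .NET and lets the caller decide how to report the problem. Throwing an exception, as the original does for too few points, would be an equally reasonable choice.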
