Understanding the Return Value of np.polynomial.Polynomial.fit when full=True: Why Residual Values Are Always Arrays

In NumPy, np.polynomial.Polynomial.fit is a class method that fits a polynomial of a given degree to a set of data points by least squares. When called with full=True, it returns the fitted polynomial together with a list of diagnostic values from the fitting process. In this article, we’ll explore why the residual value in that list is always an array, even when it holds just a single number.

The Fitting Process: An Overview

To understand why Polynomial.fit returns an array for the residual value, let’s first take a look at how this method works. When you call Polynomial.fit, it hands the problem to lstsq, the least-squares solver in NumPy’s linear algebra module (np.linalg.lstsq), to find the coefficients of the polynomial that best fits the data.

The Role of lstsq

The lstsq function solves a linear system in the least-squares sense: given a matrix M and a vector y, it finds the coefficients that minimize the sum of squared differences between M * coefficients and y. In the context of Polynomial.fit, this is how the polynomial is fitted to the data points.

Let’s consider an example with a simple polynomial curve: y = 12 + x + 3*x^2 + 0.5*x^3. We can represent this as a matrix equation:

M * coefficients = y

where M is a matrix whose columns are the basis polynomials (1, x, x^2, x^3) evaluated at the sample points, and coefficients is the vector of unknowns.
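The matrix equation above can be sketched directly with lstsq. This is a minimal illustration, not a copy of what Polynomial.fit does internally; the choice of polyvander to build M and the 11 sample points are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Sample the cubic from the text at 11 points (an assumed grid).
x = np.linspace(-1, 1, 11)
y = 12 + x + 3 * x**2 + 0.5 * x**3

# Columns of M are the basis polynomials 1, x, x^2, x^3 evaluated at x.
M = P.polyvander(x, 3)

# Solve M @ coefficients = y in the least-squares sense.
coefficients, residuals, rank, singular_values = np.linalg.lstsq(M, y, rcond=None)

print(M.shape)       # (11, 4)
print(coefficients)  # approximately [12, 1, 3, 0.5]
```

Because the data lie exactly on the cubic, the recovered coefficients match the ones used to generate y up to floating-point error.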

The Return Values of Polynomial.fit

When you call Polynomial.fit with full=True, it returns a pair: the fitted Polynomial object, followed by a list of four diagnostic values:

The sum of squared residuals of the fit (residuals)
The numerical rank of the scaled matrix M (rank)
The singular values of the scaled matrix M (singular_values)
The cutoff used for small singular values (rcond)

The residuals entry is not the individual differences between the actual and predicted y-values. It is the sum of the squares of those differences.
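As a short sketch of what that return value looks like in practice, the pair can be unpacked in two steps. The variable names series and diagnostics are chosen here for illustration only.

```python
import numpy as np

x = np.linspace(-1, 1, 11)
y = 12 + x + 3 * x**2 + 0.5 * x**3

# full=True yields a pair: the fitted series and a list of diagnostics.
series, diagnostics = np.polynomial.Polynomial.fit(x, y, 3, full=True)
residuals, rank, singular_values, rcond = diagnostics

print(type(series).__name__)  # Polynomial
print(series.convert().coef)  # approximately [12, 1, 3, 0.5]
print(rank)                   # 4
print(singular_values.shape)  # (4,)
```

The call to convert() expresses the coefficients in terms of the original x rather than the internally scaled variable. In this example the data already span [-1, 1], so the two coincide.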

Why Is the Residual Value Always an Array?

Now, let’s go back to the question: why is the residual value returned as an array when full=True, even if it’s just a single number?

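The answer lies in how NumPy implements lstsq. It accepts either a single set of y-values (a 1-D array) or several sets at once (a 2-D array with one column per set), and it returns the sums of squared residuals as an array with one entry per set.

When there is only one set of y-values, as in a typical Polynomial.fit call with a 1-D y, lstsq does not collapse that result to a scalar. It returns an array of shape (1,), and Polynomial.fit passes it through unchanged. If M is rank-deficient, or there are no more data points than coefficients, lstsq returns an empty array instead. With full=False, no residual value is returned at all.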
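The shape of the residuals can be checked on lstsq itself. The following sketch uses made-up noisy data and a hypothetical second data set (y2) purely to show the three cases: one set of y-values, two sets, and a system with fewer rows than unknowns.

```python
import numpy as np

x = np.linspace(-1, 1, 11)
M = np.vander(x, 4, increasing=True)  # columns: 1, x, x^2, x^3

# Noisy samples of the cubic (the noise level is an arbitrary choice).
rng = np.random.default_rng(0)
y1 = 12 + x + 3 * x**2 + 0.5 * x**3 + rng.normal(scale=0.1, size=x.size)
y2 = np.column_stack([y1, 2 * y1 + 1])  # two sets of y-values, one per column

res_1d = np.linalg.lstsq(M, y1, rcond=None)[1]
res_2d = np.linalg.lstsq(M, y2, rcond=None)[1]
# Only 3 rows for 4 unknowns: lstsq reports no residuals at all.
res_under = np.linalg.lstsq(M[:3], y1[:3], rcond=None)[1]

print(res_1d.shape)     # (1,)
print(res_2d.shape)     # (2,)
print(res_under.shape)  # (0,)
```

In every case the result is an array; only its length changes.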

An Artificial Example

To illustrate this further, let’s create an artificial example using NumPy:

import numpy as np

# Create some data points
x = np.linspace(-1, 1, 11)
y = 12 + x + 3*x**2 + 0.5*x**3

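# Build the matrix M by hand and call lstsq directly, for comparison
M = np.array([np.ones_like(x), x, x**2, x**3]).T
_, lstsq_residuals, _, _ = np.linalg.lstsq(M, y, rcond=None)

# Call Polynomial.fit with full=True
# It returns (series, [residuals, rank, singular_values, rcond])
poly, (residuals, rank, singular_values, rcond) = np.polynomial.Polynomial.fit(x, y, 3, full=True)

print(residuals)              # a one-element array, not a plain float
print(residuals.shape)        # (1,)
print(lstsq_residuals.shape)  # (1,)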

This code samples a simple cubic and fits a degree-3 polynomial to the data points using Polynomial.fit. Because the data lie exactly on a cubic, the sum of squared residuals is close to zero, but with full=True it still comes back as a one-element array of shape (1,) rather than as a scalar.
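If a plain number is needed, the single entry can be pulled out of the array. The sketch below is one possible way to do that, using noisy data (an assumption, so that the residual is visibly non-zero) and checking the value against a sum of squares computed by hand. The name sse and the NaN fallback for an empty array are illustrative choices, not part of NumPy’s API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 11)
y = 12 + x + 3 * x**2 + 0.5 * x**3 + rng.normal(scale=0.1, size=x.size)

poly, (residuals, rank, sv, rcond) = np.polynomial.Polynomial.fit(x, y, 3, full=True)

# residuals may be empty for rank-deficient or underdetermined fits,
# so check its size before extracting the single value.
sse = residuals.item() if residuals.size == 1 else float("nan")

# Recompute the sum of squared residuals from the fitted polynomial.
manual = float(np.sum((y - poly(x)) ** 2))

print(type(sse).__name__)       # float
print(np.isclose(sse, manual))  # True
```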

Conclusion

In conclusion, the residual value returned by np.polynomial.Polynomial.fit when full=True is always an array, even if it holds just a single number. This is because NumPy’s implementation of lstsq returns the sums of squared residuals as an array with one entry per set of y-values, and a single set yields a one-element array rather than a scalar.

While this might seem counterintuitive at first, understanding why this happens can help us better appreciate the inner workings of these libraries and make more informed decisions when working with them in our own projects.

Last modified on 2023-09-14