Implement Polynomial Regression in Python

Here’s a very simple tutorial on how to do Polynomial Regression using Python and scikit-learn.

Polynomial Regression is useful when your data has a curved relationship (not a straight line). It works by adding higher-degree features (x², x³, etc.) to the input, so that an ordinary linear model can fit a curve.

1. What is Polynomial Regression?

  • It’s still linear regression under the hood.
  • We transform the input features into polynomial terms (e.g., x → [x, x², x³]).
  • Then we fit a normal LinearRegression model on these new features.
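To see the transformation concretely, here is a tiny sketch of what PolynomialFeatures does to a column of inputs (with include_bias=False so that LinearRegression supplies its own intercept):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two sample inputs, as a 2-D column (scikit-learn expects 2-D X)
X = np.array([[2.0], [3.0]])

# Expand each x into [x, x^2, x^3]
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)

print(X_poly)
# [[ 2.  4.  8.]
#  [ 3.  9. 27.]]
```

Each row is still one sample; it just has three columns now instead of one, and those columns are what the linear model sees.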

2. Install scikit-learn (if needed)

pip install scikit-learn numpy matplotlib

3. Complete Simple Example

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# ------------------- Step 1: Create sample data -------------------
np.random.seed(42)
X = np.linspace(-3, 3, 50).reshape(-1, 1)          # Input feature (50 points)
# True underlying pattern + some noise
y = 0.5 * X**3 - 1.5 * X**2 + X + np.random.randn(50, 1) * 1.5

# Flatten y for scikit-learn
y = y.ravel()

# ------------------- Step 2: Create and train the model -------------------
# Degree = 3 means we include x, x², and x³
degree = 3
model = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    LinearRegression()
)

model.fit(X, y)

# ------------------- Step 3: Make predictions -------------------
X_test = np.linspace(-3, 3, 200).reshape(-1, 1)   # Smoother curve for plotting
y_pred = model.predict(X_test)

# ------------------- Step 4: Plot the results -------------------
plt.figure(figsize=(8, 6))
plt.scatter(X, y, color='blue', label='Actual data', alpha=0.7)
plt.plot(X_test, y_pred, color='red', linewidth=2, label=f'Polynomial Regression (degree={degree})')
plt.title('Polynomial Regression Example')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()

# Optional: Print the learned coefficients
poly_features = model.named_steps['polynomialfeatures']
linear_model = model.named_steps['linearregression']

print("Polynomial coefficients (in order: x, x^2, x^3):")
print(linear_model.coef_)
print(f"Intercept: {linear_model.intercept_:.4f}")

4. How It Works (Simple Explanation)

  1. PolynomialFeatures(degree=3):
    • Transforms each input x into [x, x², x³].
    • This is the key step that makes the model able to fit curves.
  2. LinearRegression:
    • Fits a linear model on the new polynomial features.
    • Learns a coefficient for each term, roughly recovering the values used to generate the data (about 0.5 for x³ and -1.5 for x²).
  3. make_pipeline:
    • Combines the two steps cleanly so you can call .fit() and .predict() easily.
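The pipeline is just a convenience: the three steps above can also be written out by hand, which makes the feature transform explicit. A sketch on the same data:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Same data as in the main example
np.random.seed(42)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = (0.5 * X**3 - 1.5 * X**2 + X + np.random.randn(50, 1) * 1.5).ravel()

# Step 1: expand x into [x, x^2, x^3] manually
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)           # shape (50, 3)

# Step 2: fit ordinary linear regression on the expanded features
lin = LinearRegression().fit(X_poly, y)

# Predicting new points requires applying the SAME transform first
X_new = np.array([[2.0]])
print(lin.predict(poly.transform(X_new)))
```

This is exactly what make_pipeline does for you on every .fit() and .predict() call, which is why the pipeline version is less error-prone: you cannot forget the transform.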

5. How to Do Inference (Prediction)

Once the model is trained, predicting on new data is very simple:

# Example: Predict for a single new value
new_x = np.array([[2.0]])                    # Must be 2D array
prediction = model.predict(new_x)
print(f"Prediction for x=2.0: {prediction[0]:.4f}")

You can also predict on many points at once:

new_values = np.array([[-2.5], [0.0], [1.5]])
predictions = model.predict(new_values)
print(predictions)                           # One predicted y per input row

6. Tips for Better Results

  • Try different degrees:
    • Degree 1 → Linear regression
    • Degree 2 → Quadratic (parabola)
    • Degree 3 → Cubic (good for many real curves)
    • Higher degrees (≥10) can cause overfitting (wiggly curve that doesn’t generalize).
  • To experiment quickly:
for degree in [1, 2, 3, 5]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    # plot or evaluate...
  • Evaluate performance using metrics (note: the scores below are computed on the training data, so a higher degree will always look at least as good; compare degrees on held-out data instead):
from sklearn.metrics import mean_squared_error, r2_score
y_pred_train = model.predict(X)
print("MSE:", mean_squared_error(y, y_pred_train))
print("R² Score:", r2_score(y, y_pred_train))
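One way to make the overfitting tip measurable is to hold out part of the data with train_test_split and compare degrees on both splits. This is a sketch; the 30% test size, random_state values, and the choice of degrees are arbitrary:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Same data as in the main example
np.random.seed(42)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = (0.5 * X**3 - 1.5 * X**2 + X + np.random.randn(50, 1) * 1.5).ravel()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in [1, 2, 3, 10]:
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  "
          f"train R²={r2_score(y_train, model.predict(X_train)):.3f}  "
          f"test R²={r2_score(y_test, model.predict(X_test)):.3f}")
```

Training R² can only go up as the degree grows (each model contains the previous one as a special case), so it is the gap between the train and test scores that signals overfitting.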
