DATA 622 Meetup 2: The Bias-Variance Tradeoff

George I. Hagstrom

2026-02-03

Recap: The Learning Problem

Goals

Model Fit versus Generalization
Bias-Variance Tradeoff

Weekly Tasks

Lab 1 due Sunday at midnight
Reading: ISLP 2.2-2.4
Coding Vignette: Chapter 2 of HOML in sklearn
Keep posting ideas and finding teammates in Slack

Assessing Model Accuracy

In regression, measures like the mean squared error \[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^n (y_i - g(\mathbf{x}_i))^2 \]
or \(R^2\) \[ R^2 = 1 - \frac{\mathrm{MSE}}{\mathrm{var}(y)} \] are used to assess model accuracy
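A minimal NumPy sketch of both metrics, with made-up numbers chosen so the arithmetic can be checked by hand. The helper names `mse` and `r_squared` are illustrative, and \(\mathrm{var}(y)\) is assumed to use the \(1/n\) convention so that it matches the MSE formula:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average of the squared residuals y_i - g(x_i)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def r_squared(y, y_hat):
    """R^2 = 1 - MSE / var(y), with var(y) computed using 1/n (np.var's default ddof=0)."""
    y = np.asarray(y, dtype=float)
    return 1.0 - mse(y, y_hat) / np.var(y)

y = np.array([3.0, 5.0, 7.0, 9.0])       # made-up observed responses
y_hat = np.array([2.5, 5.5, 7.0, 9.0])   # made-up model predictions g(x_i)
print(mse(y, y_hat))        # 0.125
print(r_squared(y, y_hat))  # approximately 0.975
```

Here \(\mathrm{var}(y) = 5\), so \(R^2 = 1 - 0.125/5\). A model that predicts \(y\) perfectly has \(R^2 = 1\), and one that always predicts the mean of \(y\) has \(R^2 = 0\).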

Question:

Suppose we have trained a model which has high accuracy on the training data. Are we done?

What we really care about is whether the accuracy we measured carries over to unseen examples

Generalization

Generalization is the ability of a model to maintain its accuracy on observations outside of its training data

When a model has high in-sample accuracy, it is not guaranteed to perform well out of sample

Overfitting

Overfitting is a phenomenon that causes bad generalization
Consider the following dataset:

Models of Increasing Complexity

Linear Model: \(g(x) = g_0 + g_1 x\)

Models of Increasing Complexity

Quadratic Model: \(g(x) = g_0 + g_1 x + g_2 x^2\)

Models of Increasing Complexity

5th Degree fit: \(g(x) = \sum_{i=0}^5 g_i x^i\)

Models of Increasing Complexity

10th Degree fit: \(g(x) = \sum_{i=0}^{10} g_i x^i\)

Models of Increasing Complexity

25th Degree fit: \(g(x) = \sum_{i=0}^{25} g_i x^i\)

Models of Increasing Complexity

50th Degree fit: \(g(x) = \sum_{i=0}^{50} g_i x^i\)
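The fits above can be sketched with NumPy's least-squares polynomial fitting. The dataset here is hypothetical (a sine target plus Gaussian noise, 30 points), and the sketch stops at degree 10 to keep the least-squares problem well conditioned. Because each model contains the lower-degree ones, in-sample MSE can only stay flat or drop as the degree grows:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical synthetic data: a smooth target plus Gaussian noise.
rng = np.random.default_rng(0)
n = 30
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

train_mse = {}
for degree in [1, 2, 5, 10]:
    # Least-squares fit of g(x) = sum_{i=0}^{degree} g_i x^i
    # (Polynomial.fit rescales x internally for numerical stability).
    g = Polynomial.fit(x, y, deg=degree)
    train_mse[degree] = np.mean((y - g(x)) ** 2)
    print(f"degree {degree:2d}: in-sample MSE = {train_mse[degree]:.4f}")
```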

Overfitting

At low flexibility, in-sample error was high
At mid flexibility, in-sample error dropped and the pattern was approximated
At high flexibility, in-sample error was zero

Training versus Testing Error

Generalization gap at high complexity: training error keeps falling while test error rises
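To see the gap numerically, here is a sketch that fits on 20 training points and scores on 1000 fresh points from the same process. The process (\(f(x) = \sin(2\pi x)\), noise standard deviation 0.3) and the chosen degrees are assumptions for illustration; exact numbers depend on the random seed:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def sample(n):
    """Draw n points from a hypothetical target f(x) = sin(2*pi*x) plus noise."""
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

x_train, y_train = sample(20)    # data the model sees
x_test, y_test = sample(1000)    # unseen data from the same process

train_mse, test_mse = {}, {}
for degree in [1, 3, 15]:
    g = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse[degree] = np.mean((y_train - g(x_train)) ** 2)
    test_mse[degree] = np.mean((y_test - g(x_test)) ** 2)
    print(f"degree {degree:2d}: train {train_mse[degree]:.3f}  test {test_mse[degree]:.3f}")
```

The difference `test_mse[d] - train_mse[d]` is the generalization gap for degree `d`.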

Irreducible Error

Target function \(f(\mathbf{x})\) encodes all the information about \(y\) contained in the variables \(\mathbf{x}\)

\[ y = f(\mathbf{x}) + \epsilon \]

\(\epsilon\) is called the irreducible error
It accounts for variables that are not measured and for intrinsic randomness
We have \(E(\epsilon) = 0\)
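A quick simulation illustrates why this error is irreducible: even a model that knows the target function exactly (\(g = f\)) still has MSE close to \(\mathrm{var}(\epsilon)\). The linear target and noise level below are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5                      # hypothetical noise standard deviation
n = 100_000

def f(x):
    """Hypothetical 'true' target function."""
    return 2.0 * x + 1.0

x = rng.uniform(0, 1, n)
eps = rng.normal(0, sigma, n)    # E(eps) = 0, var(eps) = sigma**2
y = f(x) + eps

# Even the perfect model g = f cannot beat the noise floor.
mse_of_true_f = np.mean((y - f(x)) ** 2)
print(mse_of_true_f, sigma ** 2)  # both close to 0.25
```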

Reducible Error

When fitting a model \(g\) to the data, there are two sources of error: \[ E((g-y)^2) = E((g-f)^2) + \mathrm{var}(\epsilon) \]
The \(E((g-f)^2)\) term is the reducible error
Total error is a sum of reducible and irreducible errors
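The two-term split follows from substituting \(y = f(\mathbf{x}) + \epsilon\) and expanding. This sketch assumes \(\epsilon\) at a new observation is independent of the fitted model \(g\) (the usual situation when \(g\) was trained on other data), so the cross term factors, and it uses \(E(\epsilon) = 0\) so that \(E(\epsilon^2) = \mathrm{var}(\epsilon)\):

```latex
\begin{aligned}
E\big((g-y)^2\big) &= E\big((g - f - \epsilon)^2\big) \\
&= E\big((g-f)^2\big) - 2\,E\big((g-f)\,\epsilon\big) + E(\epsilon^2) \\
&= E\big((g-f)^2\big) - 2\,E(g-f)\,E(\epsilon) + \mathrm{var}(\epsilon) \\
&= E\big((g-f)^2\big) + \mathrm{var}(\epsilon)
\end{aligned}
```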

Why do complex models overfit?

Hypothetical scenario: study the performance of a model on a repeated learning task

Why do complex models overfit?

Each fit will be compared to the target function \(f\)

Bias Variance Tradeoff

Can look at the “average” fit model

Why do complex models overfit?

Bias is the distance from average fit to target

Why do complex models overfit?

Variance is how much individual fits vary from average

Bias-Variance Tradeoff

Expected out-of-sample error is the sum of squared bias, variance, and irreducible error
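In symbols, at a single new point \(\mathbf{x}_0\), write \(\bar{g}(\mathbf{x}_0) = E[g(\mathbf{x}_0)]\) for the average fit over training sets (notation introduced for this sketch). Adding and subtracting \(\bar{g}\) inside the reducible error splits it in two, and the cross term vanishes because \(E[g(\mathbf{x}_0) - \bar{g}(\mathbf{x}_0)] = 0\):

```latex
\begin{aligned}
E\big[(y_0 - g(\mathbf{x}_0))^2\big]
  &= E\big[(g(\mathbf{x}_0) - f(\mathbf{x}_0))^2\big] + \mathrm{var}(\epsilon) \\
  &= \underbrace{\big(\bar{g}(\mathbf{x}_0) - f(\mathbf{x}_0)\big)^2}_{\text{bias}^2}
   + \underbrace{E\big[(g(\mathbf{x}_0) - \bar{g}(\mathbf{x}_0))^2\big]}_{\text{variance}}
   + \underbrace{\mathrm{var}(\epsilon)}_{\text{irreducible}}
\end{aligned}
```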

Bias-Variance Tradeoff

Bias and variance trade off: as flexibility increases, bias falls and variance rises

Bias-Variance Tradeoff

Bias and Variance Tradeoff

Learning Curves

Simple versus complex models as amount of data increases

Learning Curves

Simple versus complex models as amount of data increases
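A learning curve can be estimated by repeating the fit at several training-set sizes and averaging the test error. This sketch compares a hypothetical simple model (degree 1) with a hypothetical complex one (degree 9) on an assumed sine target; with very little data the simple model tends to win, and with plenty of data the complex one does:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)
sigma, n_repeats = 0.3, 200           # hypothetical noise level and number of repetitions
x_test = np.linspace(0.1, 0.9, 200)   # fixed test inputs

def f(x):
    """Hypothetical target function."""
    return np.sin(2 * np.pi * x)

def expected_test_mse(degree, n_train):
    """Average test MSE over many training sets of size n_train."""
    errs = []
    for _ in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        g = Polynomial.fit(x, y, deg=degree)
        y_test = f(x_test) + rng.normal(0, sigma, x_test.size)
        errs.append(np.mean((y_test - g(x_test)) ** 2))
    return np.mean(errs)

curves = {}
for n_train in [12, 50, 1000]:
    curves[n_train] = {d: expected_test_mse(d, n_train) for d in (1, 9)}
    print(f"n = {n_train:4d}: degree 1 -> {curves[n_train][1]:.3f}, degree 9 -> {curves[n_train][9]:.3f}")
```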

What Complexity to Pick?

Model complexity is dictated by your data more so than by the complexity of the phenomenon!

What Complexity to Pick?

Model complexity is dictated by your data more so than by the complexity of the phenomenon!

Classification Accuracy

Switch gears to classification
Now \(y\) is a class label
predictors \(\mathbf{x}\) can still be continuous or discrete

Classification Accuracy

\[ \mathrm{Error} = \frac{1}{n}\sum_{i=1}^n I(g(\mathbf{x}_i)\neq y_i) \]

Here \(I=1\) if \(g(\mathbf{x}_i)\neq y_i\) and \(I=0\) if \(g(\mathbf{x}_i)=y_i\)
This is the fraction of predictions that land in the wrong class
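The indicator sum maps directly onto an element-wise comparison. A minimal sketch with made-up labels; `classification_error` is a hypothetical helper name:

```python
import numpy as np

def classification_error(y_true, y_pred):
    """Fraction of observations assigned to the wrong class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # (y_pred != y_true) is the indicator I(g(x_i) != y_i); its mean is the error rate.
    return np.mean(y_pred != y_true)

y_true = np.array(["default", "no", "no", "default", "no"])
y_pred = np.array(["default", "no", "default", "no", "no"])
print(classification_error(y_true, y_pred))  # 0.4
```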

Bayes Classifier

Conditional probability of \(y\) given \(\mathbf{x}\): \[ P(y|\mathbf{x}) \]

This is the probability of a class given the characteristics
- Probability of default given balance, income
- Probability of hall of fame career given college stats
- Probability of disease given medical tests

Bayes Classifier

Best prediction is to pick class with highest chance: \[ y_{\mathrm{Bayes}}(\mathbf{x}) = \mathrm{argmax}_{y} P(y|\mathbf{x}) \]
Called Bayes Classifier or Bayes Decision Rule
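When \(P(y|\mathbf{x})\) is known, the rule can be checked by simulation. This sketch assumes a hypothetical two-class problem with a logistic \(P(\text{blue}|x)\). At each \(x\) the Bayes rule is wrong with probability \(\min(p, 1-p)\), and any other rule (here, an arbitrarily shifted threshold) does worse on average:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

def p_blue(x):
    """Hypothetical known conditional probability P(y = blue | x)."""
    return 1.0 / (1.0 + np.exp(-3.0 * x))

x = rng.normal(0, 1, n)
y_blue = rng.uniform(0, 1, n) < p_blue(x)     # True means class "blue"

# Bayes decision rule: pick the class with the larger conditional probability.
bayes_pred = p_blue(x) > 0.5
bayes_error = np.mean(bayes_pred != y_blue)

# Expected Bayes error: at each x the rule errs with probability min(p, 1 - p).
expected_error = np.mean(np.minimum(p_blue(x), 1 - p_blue(x)))

# A different rule, thresholding x at 0.5 instead of 0, for comparison.
other_error = np.mean((x > 0.5) != y_blue)
print(bayes_error, expected_error, other_error)
```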

Applying Bayes Decision Rule

We don’t generally know the probabilities
Classification models often approximate them
Their decision rule is often the Bayes decision rule applied to the estimated probabilities, which works well when those estimates are good

kNN Model

A very simple non-parametric classification model is k-nearest neighbors (kNN): \[ g_{kNN}(\mathbf{x}) = \mathrm{argmax}_{y}\sum_{\mathbf{x}_i \in N_k(\mathbf{x})} I(y = y_i) \]
Look at the \(k\) training points nearest to \(\mathbf{x}\), the set \(N_k(\mathbf{x})\)
Pick the \(y\) occurring most often
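The rule above fits in a few lines of NumPy. This is a brute-force sketch (all pairwise Euclidean distances) that assumes integer class labels \(0, 1, 2, \ldots\); `knn_predict` is a hypothetical name, and ties in the vote go to the smaller label:

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    """Majority vote among the k nearest training points (Euclidean distance).

    y_train must hold non-negative integer class labels (0, 1, 2, ...).
    """
    X_train, X_new = np.asarray(X_train, float), np.asarray(X_new, float)
    y_train = np.asarray(y_train)
    # Pairwise squared distances, shape (n_new, n_train).
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest training points for each new point: N_k(x).
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Count votes I(y = y_i) per class and take the argmax.
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, [[0.2, 0.3], [5.5, 5.2]], k=3))  # [0 1]
```

For large datasets a tree-based neighbor search (for example scikit-learn's `KNeighborsClassifier`) avoids building the full distance matrix.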

kNN Model

Here is an example for \(k=3\).

kNN Question

What do you think happens when \(k\) is very small?
What about when \(k\) is very big?

kNN Model

Classification problem with two classes
The boundary is where the probability of blue equals 50%

kNN Model

Decision boundary for \(k=10\)

kNN Model Over and Under fitting

\(k=1\) corresponds to overfitting
\(k=100\) corresponds to underfitting

kNN Model Generalization Error

\(1/k\) corresponds to model complexity
Optimal out of sample accuracy at intermediate \(k\)
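Training and test error can be tracked across \(k\) with a small simulation. The data are hypothetical (two overlapping Gaussian classes in 2-D), and the values of \(k\) are illustrative. With \(k=1\) each training point is its own nearest neighbor, so training error is exactly zero even though test error is not; exact test errors depend on the seed:

```python
import numpy as np

rng = np.random.default_rng(6)

def knn_predict(X_train, y_train, X_new, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return np.array([np.bincount(y_train[idx], minlength=2).argmax() for idx in nearest])

def sample(n):
    """Hypothetical data: two overlapping Gaussian classes in 2-D."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0, 1, (n, 2)) + 1.5 * y[:, None]
    return X, y

X_train, y_train = sample(200)
X_test, y_test = sample(2000)

errors = {}
for k in (1, 15, 100):
    train_err = np.mean(knn_predict(X_train, y_train, X_train, k) != y_train)
    test_err = np.mean(knn_predict(X_train, y_train, X_test, k) != y_test)
    errors[k] = (train_err, test_err)
    print(f"k = {k:3d} (1/k = {1 / k:.3f}): train error {train_err:.3f}, test error {test_err:.3f}")
```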

Thanks!