Module 6 - Resampling and Cross-Validation

Overview

In traditional statistical modeling, model fits are evaluated using test statistics, hypothesis tests, or examination of the posterior distribution. These statistical tools are often unavailable for machine learning models because of their complexity. Instead, computational methods based on resampling have been developed that allow estimation of uncertainty, out-of-sample accuracy (generalization), and model comparison. This week we begin our exploration of these tools by studying resampling, the bootstrap, and cross-validation, the last of which is one of the most crucial techniques for evaluating machine learning models.
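To make the idea concrete before the readings, here is a minimal sketch of k-fold cross-validation using only the Python standard library. The function and variable names (`k_fold_cv`, `fit_mean`, etc.) are illustrative and not taken from the course notebooks; a deliberately trivial model (predicting the training-set mean) keeps the focus on the resampling logic itself.

```python
import random
import statistics

def k_fold_cv(x, y, k, fit, predict):
    """Estimate out-of-sample error with k-fold cross-validation.

    fit(xs, ys) -> model; predict(model, xi) -> prediction.
    Returns the mean squared error averaged over the k held-out folds.
    """
    idx = list(range(len(x)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        errs = [(predict(model, x[i]) - y[i]) ** 2 for i in held_out]
        fold_mses.append(statistics.mean(errs))
    return statistics.mean(fold_mses)

# Toy "model": predict the training-set mean of y, ignoring x entirely.
def fit_mean(xs, ys):
    return statistics.mean(ys)

def predict_mean(model, xi):
    return model

random.seed(0)
x = list(range(20))
y = [xi + random.gauss(0, 1) for xi in x]
cv_mse = k_fold_cv(x, y, k=5, fit=fit_mean, predict=predict_mean)
print(f"5-fold CV estimate of test MSE: {cv_mse:.2f}")
```

Each observation is held out exactly once, so the averaged fold error is an estimate of how the model would perform on data it was not trained on; libraries like scikit-learn wrap this same pattern in utilities such as `cross_val_score`.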

Learning Objectives

  • Understand how to apply cross-validation to assess out-of-sample accuracy
  • Understand the trade-offs involved in different train/test data splits
  • Apply the bootstrap to estimate uncertainty in predictions and parameters
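The bootstrap objective above can also be sketched in a few lines: resample the data with replacement many times, recompute the statistic on each resample, and use the spread of those replicates as an uncertainty estimate. This is a stdlib-only illustration (the name `bootstrap_se` and the choice of the sample mean as the statistic are my own, not from the course materials).

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Estimate the standard error of stat(data) by resampling
    the data with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat([rng.choice(data) for _ in range(n)])
                  for _ in range(n_boot)]
    return statistics.stdev(replicates)

random.seed(1)
sample = [random.gauss(10, 2) for _ in range(50)]
boot_se = bootstrap_se(sample, statistics.mean)
# For the mean there is a closed-form check: s / sqrt(n)
analytic_se = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE: {boot_se:.3f}, analytic SE: {analytic_se:.3f}")
```

For the sample mean the two numbers should roughly agree; the payoff of the bootstrap is that the same recipe works for statistics and model parameters that have no closed-form standard error.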

Readings

  • ISLP (Introduction to Statistical Learning): Chapter 5

Extra Reading. If you want to be a cross-validation pro, this paper represents, in my opinion, the state of the art; it extends well beyond the ecology context in which it was published.

Why no HOML recommendation? There isn't a specific section on cross-validation in that book. But if you enjoy the book, I highly recommend searching it for "cross-validation" (include the hyphen) and reading the relevant snippets that appear throughout.

Course Meetup Video

Vignette Videos

The code for these videos can be found in the following Jupyter notebooks: cross-validation-bootstrap-vignette.ipynb or cv-nfl-live.ipynb

Videos

ISLP Coding Videos