Module 6 - Resampling and Cross Validation
Overview
In traditional statistical modeling, model fits are evaluated using test statistics, hypothesis tests, or examination of the posterior distribution. These statistical tools are often unavailable for machine learning models because of their complexity. Instead, computational methods based on resampling have been developed that allow for estimation of uncertainty, out-of-sample accuracy (generalization), and model comparison. This week we will begin our exploration of these tools by studying resampling, the bootstrap, and cross-validation, one of the most important techniques for evaluating machine learning models.
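To make the resampling idea concrete, here is a minimal sketch of the bootstrap, assuming a toy synthetic sample (not course data): we repeatedly resample the data with replacement and recompute a statistic, and the spread of those recomputed values estimates its uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample (hypothetical): 100 draws from a normal distribution.
data = rng.normal(loc=5.0, scale=2.0, size=100)

n_boot = 2000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Resample with replacement, same size as the original sample.
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = resample.mean()

# Bootstrap standard error and a 95% percentile interval for the mean.
se = boot_means.std(ddof=1)
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={data.mean():.3f}  bootstrap SE={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

The same loop works for any statistic (a regression coefficient, a prediction at a new point); only the line computing `boot_means[b]` changes.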
Learning Objectives
- Understand how to apply cross-validation to assess out-of-sample accuracy
- Understand the trade-offs among different data-splitting strategies
- Apply the bootstrap to estimate uncertainty in predictions and parameters
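The first objective can be sketched in a few lines. This is a minimal k-fold cross-validation loop on hypothetical synthetic data (a simple linear fit via `np.polyfit`, not a model from the course): the data are shuffled, split into k folds, and each fold takes a turn as the held-out test set.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic data: y = 2x + noise.
x = rng.uniform(0, 10, size=120)
y = 2.0 * x + rng.normal(scale=1.0, size=x.size)

k = 5
indices = rng.permutation(x.size)      # shuffle before splitting
folds = np.array_split(indices, k)

mse_per_fold = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # Fit a degree-1 polynomial (simple linear regression) on the training folds.
    coeffs = np.polyfit(x[train_idx], y[train_idx], deg=1)
    preds = np.polyval(coeffs, x[test_idx])
    mse_per_fold.append(np.mean((preds - y[test_idx]) ** 2))

# Average held-out error across folds estimates out-of-sample accuracy.
cv_mse = float(np.mean(mse_per_fold))
print(f"{k}-fold CV estimate of test MSE: {cv_mse:.3f}")
```

The choice of k is one of the data-splitting trade-offs in the second objective: larger k means more training data per fit (less bias) but more computation and more correlated folds.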
Readings
- ISLP (Introduction to Statistical Learning): Chapter 5
Extra Reading. If you want to be a cross-validation pro, this paper represents the state of the art in my opinion, and it extends well beyond the ecology context in which it was published.
- Luke Yates et al. 2022: Cross validation for model selection: A review with examples from ecology
Why no HOML recommendation? There really isn’t a specific section on cross-validation in that book. But if you enjoy the book, I highly recommend searching for cross-validation (include the hyphen) and reading the relevant snippets that occur throughout.
Course Meetup Video
Vignette Videos
The code for these can be found in the following Jupyter notebooks: cross-validation-bootstrap-vignette.ipynb and cv-nfl-live.ipynb