ML Pipelines in sklearn

Click here to watch three videos on building a full machine learning pipeline in sklearn
Author: George I. Hagstrom

Published: February 4, 2026

I have recorded a series of three videos in which I work through an example of building a full machine learning pipeline on a dataset of California housing prices at the census block level. I cover the material in Chapter 2 of HOML, and in particular the Python notebook for that chapter, which you can find here: End-to-End Machine Learning Project. The chapter and its notebook are long and comprehensive. They demonstrate many of the features used in carrying an sklearn project from start to finish (minus some production aspects).

Much of that material is advanced, both in its machine learning methodology and in the programming concepts it uses, so I made a shorter, less comprehensive version of the same notebook to introduce you to the basics of machine learning pipelines in sklearn. If you are reading Chapter 2, my notebook follows the development there up to the point where the book discusses hyperparameter optimization, leaving out some concepts along the way. We will return to these more advanced concepts as we proceed through the semester.

I divided the content into three videos, which you can watch below:

  1. Train-Test Splits
  2. Data Preparation: Exploration, Cleaning, and Transformation
  3. Fitting Models and Building ML Pipelines
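
To give a rough sense of how these three steps fit together in code, here is a minimal sketch. It is not taken from the videos or from the HOML notebook: the data is a small synthetic stand-in rather than the California housing dataset, and the particular choices (median imputation, standard scaling, linear regression) are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; NOT the California housing dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Knock out about 5% of the feature values so there is something to clean.
X[rng.random(X.shape) < 0.05] = np.nan

# Step 1: hold out a test set before looking closely at the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 2 and 3: chain data preparation and the model into one pipeline,
# so the same transformations are applied at fit time and at predict time.
model = make_pipeline(
    SimpleImputer(strategy="median"),  # fill missing values
    StandardScaler(),                  # put features on a common scale
    LinearRegression(),                # illustrative model choice
)
model.fit(X_train, y_train)

# R^2 on the held-out test set.
test_r2 = model.score(X_test, y_test)
print(f"Test R^2: {test_r2:.3f}")
```

Because the imputer and scaler live inside the pipeline, they are fitted on the training data only, which keeps information from the test set from leaking into the preparation steps.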

You can download the Python notebook from the vignette here: MLPipelineVignette.ipynb