
2026-01-26
- Slack channel: `#data-622-spring-2026`
- Python and Quarto (Python is the primary language of the course)
- If you haven't taken DATA 602 or 608
Abstract
In the age of generative AI and ubiquitous digital tools, human cognition faces a structural paradox: as external aids become more capable, internal memory systems risk atrophy. Drawing on neuroscience and cognitive psychology, this paper examines how heavy reliance on AI systems and discovery-based pedagogies may impair the consolidation of declarative and procedural memory – systems essential for expertise, critical thinking, and long-term retention. We review how tools like ChatGPT and calculators can short-circuit the retrieval, error correction, and schema-building processes necessary for robust neural encoding. Notably, we highlight striking parallels between deep learning phenomena such as “grokking” and the neuroscience of overlearning and intuition. Empirical studies are discussed showing how premature reliance on AI during learning inhibits proceduralization and intuitive mastery. We argue that effective human-AI interaction depends on strong internal models – biological “schemata” and neural manifolds – that enable users to evaluate, refine, and guide AI output….



| Date | Module | Main Deliverables |
|---|---|---|
| Jan 26 | Introduction to Machine Learning | |
| Feb 2 | Bias-Variance Trade-Off | Lab 1 |
| Feb 9 | The Linear Model | Lab 2 |
| Feb 16 | Classification | |
| Feb 23 | Generative Classification Models and Class Imbalance | Lab 3 |
| Mar 2 | Resampling and Cross-Validation | Project Proposal |
| Mar 9 | Regularization and Model Selection | Lab 4 |
| Mar 16 | Tree Models | |
| Mar 23 | Ensemble Models | Lab 5 |
| Mar 30 | Causal Inference | Minimal Viable Product Demo |
| Apr 6 | No Meetup (Spring Break) | |
| Apr 13 | Model Interpretation, Communication, and Ethics | Lab 6 |
| Apr 20 | Neural Networks | |
| Apr 27 | Deep Learning | Lab 7 |
| May 4 | Unsupervised Learning | |
| May 11 | Pretrained Models | Lab 8, Final Project Writeup and Demo |
ISLP and vignette video
Problem: Fraudulent transactions are costly
Can be several percent of total revenue, amounting to billions of dollars
Solution: Fraudulent transactions are different


Can create a formula: \[ y = f(\mathbf{x}) \]

Can create a formula: \[ y = f(\mathbf{x}) = \mathrm{sign}\left(w_1x_1 + w_2x_2 + w_3x_3 + \cdots + w_n x_n \right) \]

Can create a formula: \[ y = f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^n w_ix_i\right) \]
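The linear classifier above can be sketched in Python, the course's primary language. The weights and features below are made-up illustrative values, not fitted from data:

```python
import numpy as np

def linear_classifier(w, x):
    """Predict y = sign(w . x) for a feature vector x."""
    return int(np.sign(w @ x))

# Illustrative (hypothetical) weights and transaction features:
w = np.array([0.5, -1.2, 2.0])   # learned weights w_1..w_n
x = np.array([1.0, 0.3, 0.8])    # features x_1..x_n of one transaction

y = linear_classifier(w, x)      # sign(0.5 - 0.36 + 1.6) = sign(1.74) = +1
print(y)  # 1
```

In practice the weights would be learned from labeled transactions; here they only show the mechanics of the formula.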


Statistical Rethinking

\[ \mathbf{F} = -\frac{Gm_1m_2\hat{\mathbf{r}}}{\|\mathbf{r}\|^2} \]
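Unlike a learned \(f\), Newton's formula comes from theory; a quick numerical check, using standard textbook values for the Earth-Moon system, reproduces the familiar force magnitude:

```python
import numpy as np

G = 6.674e-11                      # gravitational constant, m^3 kg^-1 s^-2
m1, m2 = 5.972e24, 7.348e22        # Earth and Moon masses, kg
r = np.array([3.844e8, 0.0, 0.0])  # Earth-to-Moon separation vector, m

# F = -G m1 m2 r_hat / ||r||^2
r_norm = np.linalg.norm(r)
F = -G * m1 * m2 * (r / r_norm) / r_norm**2
print(F)  # magnitude is about 1.98e20 N
```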
This is the one dealbreaker: without data, there is no possibility for machine learning.



1. Prediction: Your entire goal is maximizing the accuracy of your prediction of \(y\). Understanding or insight into how \(y\) is determined by each variable in \(\mathbf{x}\) is not important.
Example: Image recognition, text translation, fraud detection
Predictive models are not necessarily pure black boxes. People will care to know how something works when money is on the line.
2a. Inference: What is the relationship between \(y\) and the variables in \(\mathbf{x}\)?
Example: What is the relationship between having a doorman and rent?
2b. Causal Inference: What will happen to \(y\) if I take a certain action?
Example: A/B testing, randomized controlled trials, if I hire a doorman for my building how much can I increase rent?
Example: Based on the observed fish stocks, water temperature, and ocean productivity, what should be the allowable fish catch?
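The A/B-testing idea above can be simulated in a few lines. The numbers here are assumptions for illustration (a true rent lift of $5 and a noise level of $10, not real data): randomly assign units to treatment or control, then estimate the causal effect as the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Random assignment: half the buildings get a doorman (treatment = 1).
treatment = rng.integers(0, 2, size=n)

# Simulated rents: baseline plus an assumed true causal effect of +5.
true_effect = 5.0
rent = 100 + true_effect * treatment + rng.normal(0, 10, size=n)

# Because assignment is random, the difference in means is an
# unbiased estimate of the average causal effect.
estimate = rent[treatment == 1].mean() - rent[treatment == 0].mean()
print(round(estimate, 2))  # close to the true effect of 5
```

Randomization is what licenses the causal reading; on purely observational data the same difference in means would only measure association.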
ISLP
AlphaGo, AlphaZero, fine-tuning of LLMs
DATA 622