Course Information
- Instructor: Nicholas Tomlin
- Time: Tues & Thurs, 2:00–3:20 PM CT
- Location: TBD
Please note: this syllabus is under construction and subject to change!
This course covers the fundamentals of natural language processing, with a focus on language models and reinforcement learning. The central organizing question is: what objective are we optimizing, and how do we optimize it efficiently? We'll cover everything from next-token prediction to reinforcement learning from human feedback, emphasizing both the theoretical foundations and the practical details behind today's language models. Students will implement language models from scratch and build a modern post-training stack.
Schedule
| Week | Date | Topic | Materials |
|---|---|---|---|
| Block I. Next-token prediction: the language modeling objective | | | |
| I | Tue, Jan 5 | The language modeling objective; a simple n-gram model | |
| | Thu, Jan 7 | Tokenization; learning vector representations of words | |
| II | Tue, Jan 12 | RNNs and vanishing gradients; LSTMs | |
| | Thu, Jan 14 | Sequence-to-sequence modeling; evaluation metrics and BLEU score; attention | |
| III | Tue, Jan 19 | The transformer architecture; scaling laws | |
| | Thu, Jan 21 | Sampling strategies; retrieval; alternative architectures | |
| Block II. Verifiable rewards: optimizing for correctness | | | |
| IV | Tue, Jan 26 | Filtered behavior cloning; weakly supervised semantic parsing; STaR | |
| | Thu, Jan 28 | A crash-course intro to RL; REINFORCE | |
| V | Tue, Feb 2 | Reasoning models; GRPO | |
| | Thu, Feb 4 | Midterm Examination | |
| Block III. Learned reward models: beyond verifiable rewards | | | |
| VI | Tue, Feb 9 | RLHF: preference data collection, Bradley-Terry, reward modeling; PPO | |
| | Thu, Feb 11 | DPO and offline RLHF; learning from rubrics; process reward models | |
| VII | Tue, Feb 16 | Distillation; context distillation; self-distillation | |
| | Thu, Feb 18 | Additional considerations: LoRA, asynchronous RL, etc. | |
| Block IV. Listener models: optimizing for interaction | | | |
| VIII | Tue, Feb 23 | Computational pragmatics; the Rational Speech Acts framework | |
| | Thu, Feb 25 | Training language models with user simulators | |
| IX | Tue, Mar 2 | LLM agents; tool use; vision-language models | |
| Block V. Objective gaming and misspecification: when objectives break down | | | |
| IX | Thu, Mar 4 | Goodhart’s Law; reward hacking; open problems in AI safety | |
| X | Thu, Mar 11 | Final Examination | |
Assignments
The course has four coding assignments and two examinations.
| Assessment | Description | Due |
|---|---|---|
| Assignment I | Implement an n-gram language model and word2vec | End of Week II |
| Assignment II | Implement a transformer language model | End of Week IV |
| Midterm | — | Week V |
| Assignment III | Implement GRPO for arithmetic problems | End of Week VI |
| Assignment IV | Implement a reward model based on human preference data | End of Week VIII |
| Final | — | Week X |