
CMU 15-779: Advanced Topics in Machine Learning Systems (LLM Edition)

Course Overview

  • University: Carnegie Mellon University
  • Prerequisites: No strict prerequisites. An intro ML background and hands-on deep learning training experience are recommended; familiarity with PyTorch helps, and basic CUDA/GPU knowledge will make the systems material much easier to follow
  • Programming Language: Python (systems and kernel-level topics involve CUDA/hardware concepts)
  • Course Difficulty: 🌟🌟🌟🌟
  • Estimated Study Hours: 80-120 hours

This course takes a systems-first view of modern machine learning and LLM infrastructure. The core question it keeps returning to is: how does a model written in a high-level framework (e.g., PyTorch) get decomposed into low-level kernels, and how is it executed efficiently on heterogeneous accelerators (GPUs/TPUs) and in distributed environments? The syllabus covers GPU programming, ML compilers, graph-level optimizations, distributed training and auto-parallelization, and LLM serving and inference acceleration. It is a strong fit if you want to connect “framework-level experience” with “kernels, compilation, hardware, and cluster execution.”
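To make the "high-level model to low-level kernels" idea concrete, here is a toy illustration (not course code, and far simpler than a real compiler such as TorchInductor): a framework-level expression like `y = relu(x @ W + b)` is represented as a small operator graph and "lowered" to a sequence of kernel launches, with consecutive elementwise ops fused into one kernel. All names here are made up for illustration.

```python
# Toy lowering pass: turn a list of high-level ops into kernel launches,
# fusing consecutive elementwise ops (a classic graph-level optimization).
from dataclasses import dataclass

@dataclass
class Op:
    name: str          # high-level operator ("matmul", "add", "relu")
    elementwise: bool  # elementwise ops are fusion candidates

def lower(graph):
    """Lower a list of Ops to kernel names, fusing runs of elementwise ops."""
    kernels, pending = [], []
    for op in graph:
        if op.elementwise:
            pending.append(op.name)                 # grow the fusion group
        else:
            if pending:                             # flush any pending group
                kernels.append("fused_" + "_".join(pending))
                pending = []
            kernels.append(op.name + "_kernel")     # non-fusible op: own kernel
    if pending:
        kernels.append("fused_" + "_".join(pending))
    return kernels

# y = relu(x @ W + b): one matmul kernel, then add+relu fused into one kernel.
graph = [Op("matmul", False), Op("add", True), Op("relu", True)]
print(lower(graph))  # ['matmul_kernel', 'fused_add_relu']
```

Real compilers do this on tensor IR with cost models and memory planning, but the payoff is the same: fewer kernel launches and fewer round trips through GPU memory.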

The workload is organized around consistent pre-lecture reading assignments (paper reviews) and a team-based final course project (proposal, presentation, report). For self-study, it is best to follow the schedule week by week rather than treating it as a slide-only course.

Topics Covered

The course is structured as lectures, with major themes including:

  1. ML systems fundamentals via TensorFlow/PyTorch (abstractions, execution models)
  2. GPU architecture and CUDA programming (memory, performance tuning)
  3. Transformer and attention case studies (FlashAttention and IO-aware attention)
  4. Advanced CUDA techniques (warp specialization, mega kernels)
  5. ML compilation (tile-based DSLs like Triton, kernel auto-tuning, graph-level optimizations, superoptimization such as Mirage)
  6. Parallelization and distributed training (ZeRO/FSDP, model/pipeline parallelism, auto-parallelization such as Alpa)
  7. LLM serving and inference (batching, PagedAttention, RadixAttention, speculative decoding)
  8. Post-training and architectures (PEFT like LoRA/QLoRA, MoE architectures/kernels/parallelism)

Course Resources