
Lab: Pipeline and Tensor Parallel Scheduling

Show how pipeline bubbles and tensor-parallel synchronization affect distributed model execution.

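The baseline path models fill-drain pipeline execution, in which the forward passes of all microbatches drain through every stage before any backward pass starts, with a synchronous tensor-parallel all-gather after every stage.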
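A minimal sketch of the baseline cost, assuming an idealized model in which every stage has the same forward time `t_f` and backward time `t_b`, and a blocking collective of cost `t_comm` follows each forward and each backward op. The names `fill_drain_cost` and `PipelineCost` and all parameters are hypothetical and are not taken from `compare.py`.

```python
from dataclasses import dataclass


@dataclass
class PipelineCost:
    makespan: float      # wall-clock time for one training step
    busy_compute: float  # useful compute per stage
    exposed_comm: float  # blocking tensor-parallel communication per stage
    bubble: float        # idle time per stage

    @property
    def bubble_fraction(self) -> float:
        return self.bubble / self.makespan


def fill_drain_cost(p: int, m: int, t_f: float, t_b: float, t_comm: float) -> PipelineCost:
    """Idealized fill-drain step (hypothetical model).

    p: pipeline stages, m: microbatches, t_f / t_b: per-stage forward / backward
    compute time, t_comm: cost of the synchronous collective after each op.
    """
    # Nothing hides the collective, so it lengthens every forward and backward slot.
    fwd_slot = t_f + t_comm
    bwd_slot = t_b + t_comm
    # All m forwards drain through p stages, then all m backwards drain back:
    # each phase spans m + p - 1 slots.
    makespan = (m + p - 1) * (fwd_slot + bwd_slot)
    busy_compute = m * (t_f + t_b)
    exposed_comm = m * 2 * t_comm
    # Whatever is neither compute nor communication is idle: (p - 1) * (fwd_slot + bwd_slot).
    bubble = makespan - busy_compute - exposed_comm
    return PipelineCost(makespan, busy_compute, exposed_comm, bubble)


cost = fill_drain_cost(p=4, m=8, t_f=1.0, t_b=2.0, t_comm=0.5)
print(cost.makespan, cost.bubble, round(cost.bubble_fraction, 3))  # 44.0 12.0 0.273
```

In this model the bubble fraction is (p - 1)/(m + p - 1) whatever `t_comm` is, but the synchronous collective lengthens every slot, so the absolute bubble time grows with it.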

The optimized path models 1F1B (one-forward-one-backward) pipeline scheduling and overlaps part of the tensor-parallel communication with useful stage compute.
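A small earliest-start simulator makes the two schedules concrete. It is a sketch under uniform-cost assumptions, not the lab's implementation: `simulate`, `fill_drain_order`, `one_f_one_b_order`, and the `overlap` knob (the fraction of each collective assumed to hide under the op's own compute) are hypothetical names. The 1F1B order follows the common non-interleaved pattern: stage `s` runs `p - s - 1` warm-up forwards, then alternates one forward and one backward, then drains the remaining backwards.

```python
def fill_drain_order(p: int, m: int, s: int) -> list[tuple[str, int]]:
    # Every stage runs all forwards, then all backwards.
    return [("F", i) for i in range(m)] + [("B", i) for i in range(m)]


def one_f_one_b_order(p: int, m: int, s: int) -> list[tuple[str, int]]:
    # Warm-up forwards, steady one-forward/one-backward, cool-down backwards.
    warmup = min(p - s - 1, m)
    ops = [("F", i) for i in range(warmup)]
    for i in range(m - warmup):
        ops += [("F", warmup + i), ("B", i)]
    ops += [("B", i) for i in range(m - warmup, m)]
    return ops


def peak_in_flight(ops: list[tuple[str, int]]) -> int:
    # Microbatches whose activations are held (forward done, backward pending).
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


def simulate(order_fn, p, m, t_f, t_b, t_comm, overlap):
    # Hypothetical overlap model: up to overlap * t_comm hides under the op's compute.
    exposed_f = t_comm - min(overlap * t_comm, t_f)
    exposed_b = t_comm - min(overlap * t_comm, t_b)
    cost = {"F": t_f + exposed_f, "B": t_b + exposed_b}
    orders = [order_fn(p, m, s) for s in range(p)]
    done = {}        # (kind, stage, microbatch) -> finish time
    free = [0.0] * p  # time at which each stage becomes idle
    nxt = [0] * p     # index of each stage's next op
    while any(nxt[s] < len(orders[s]) for s in range(p)):
        progressed = False
        for s in range(p):
            if nxt[s] == len(orders[s]):
                continue
            kind, i = orders[s][nxt[s]]
            # F needs activations from stage s - 1; B needs gradients from stage s + 1.
            if kind == "F":
                dep = ("F", s - 1, i) if s > 0 else None
            else:
                dep = ("B", s + 1, i) if s < p - 1 else ("F", s, i)
            if dep is not None and dep not in done:
                continue
            start = max(free[s], done[dep] if dep else 0.0)
            free[s] = done[(kind, s, i)] = start + cost[kind]
            nxt[s] += 1
            progressed = True
        if not progressed:
            raise RuntimeError("schedule deadlocked")
    makespan = max(free)
    return {
        "makespan": makespan,
        "bubble": makespan - m * (cost["F"] + cost["B"]),  # idle time per stage
        "exposed_comm": m * (exposed_f + exposed_b),
        "peak_in_flight": [peak_in_flight(o) for o in orders],
    }


base = simulate(fill_drain_order, 4, 8, 1.0, 2.0, 0.5, overlap=0.0)
opt = simulate(one_f_one_b_order, 4, 8, 1.0, 2.0, 0.5, overlap=0.6)
print(round(base["bubble"], 2), round(opt["bubble"], 2))              # 12.0 10.2
print(round(base["exposed_comm"], 2), round(opt["exposed_comm"], 2))  # 8.0 3.2
print(base["peak_in_flight"], opt["peak_in_flight"])                  # [8, 8, 8, 8] [4, 3, 2, 1]
```

With `overlap=0.0` both orders produce the same 44.0 makespan in this uniform-cost model. The structural change 1F1B brings here is the lower peak number of in-flight microbatches per stage, while the shorter bubble comes from hiding part of each collective, which shortens every slot, idle ones included.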

Run the comparison from a terminal:
python compare.py

Compared with the baseline, the optimized path should reduce both pipeline bubble time and exposed (non-overlapped) tensor-parallel communication.
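Both expected effects can be read off an idealized accounting, assumed here rather than taken from `compare.py`. The symbols are hypothetical: p stages, m microbatches, uniform per-stage compute times t_f and t_b, collective cost c, and hidden fraction α (α = 0 for the synchronous baseline). Under these assumptions the step time below holds for both fill-drain and non-interleaved 1F1B:

```latex
\begin{aligned}
e_f &= c - \min(\alpha c,\; t_f), \qquad e_b = c - \min(\alpha c,\; t_b) \\
T_{\text{step}} &= (m + p - 1)\,(t_f + e_f + t_b + e_b) \\
T_{\text{bubble}} &= T_{\text{step}} - m\,(t_f + e_f + t_b + e_b) = (p - 1)\,(t_f + e_f + t_b + e_b) \\
\frac{T_{\text{bubble}}}{T_{\text{step}}} &= \frac{p - 1}{m + p - 1}
\end{aligned}
```

Exposed communication per op falls from c toward c - αc as α grows. Because every bubble slot also carries that communication, bubble time falls with it. The bubble fraction depends only on p and m in this model, so adding microbatches is the other lever.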