Lab: NIXL-Style Tier Handoff
This lab shows why Chapter 4 separates training collectives from disaggregated-inference data movement. NCCL is the right lens for all-reduce, but KV cache handoff is a point-to-point transfer problem.
It adapts the core idea behind the baseline and optimized NIXL-style tier handoff examples.
Baseline
The baseline copies the selected KV blocks one at a time through a CPU staging buffer. Each block pays a fixed scheduling overhead, so the movement is fragmented into many small transfers.
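A minimal pure-Python sketch of the baseline pattern, assuming each tier is a flat `bytearray` of fixed-size blocks. `BLOCK_BYTES`, `make_tier`, and `baseline_handoff` are hypothetical names chosen for illustration, not the lab's `compare.py`. The function counts one scheduled operation per block, which is where the fixed per-block overhead comes from.

```python
# Hypothetical model of the baseline: each tier is a flat bytearray of
# fixed-size KV blocks. Names and sizes are illustrative assumptions.
BLOCK_BYTES = 4096  # assumed size of one KV block in bytes


def make_tier(num_blocks: int, fill: bool = False) -> bytearray:
    """Allocate a buffer standing in for one memory tier."""
    buf = bytearray(num_blocks * BLOCK_BYTES)
    if fill:
        for b in range(num_blocks):
            # Stamp each block with its index so copies are easy to check.
            buf[b * BLOCK_BYTES:(b + 1) * BLOCK_BYTES] = bytes([b % 256]) * BLOCK_BYTES
    return buf


def baseline_handoff(src: bytearray, dst: bytearray,
                     src_ids: list[int], dst_ids: list[int]) -> int:
    """Move each selected block separately through a CPU staging buffer.

    Returns the number of scheduled transfer operations (one per block).
    """
    staging = bytearray(BLOCK_BYTES)  # single-block CPU staging buffer
    ops = 0
    for s, d in zip(src_ids, dst_ids):
        staging[:] = src[s * BLOCK_BYTES:(s + 1) * BLOCK_BYTES]  # source tier -> staging
        dst[d * BLOCK_BYTES:(d + 1) * BLOCK_BYTES] = staging     # staging -> target tier
        ops += 1  # each block is its own transfer and pays its own overhead
    return ops


src = make_tier(16, fill=True)
dst = make_tier(4)
ops = baseline_handoff(src, dst, src_ids=[1, 5, 9, 12], dst_ids=[0, 1, 2, 3])
print(ops)  # 4
```

Four selected blocks cost four operations; the count grows linearly with the number of selected blocks, and so does the accumulated fixed overhead.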
Optimized
The optimized path packs the selected blocks into one contiguous transfer and unpacks them at the target tier. This models a NIXL/UCX-style handoff in which the movement layer sees a single compact payload.
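A matching sketch of the optimized path under the same assumption of flat `bytearray` tiers: gather the selected blocks into one payload, issue a single transfer, then scatter at the target. `pack`, `unpack`, and `optimized_handoff` are hypothetical helpers, and `bytes(payload)` merely stands in for one bulk transfer; this is not a NIXL or UCX API.

```python
# Hypothetical sketch of the optimized handoff: gather selected blocks into one
# contiguous payload, issue a single transfer, then scatter at the target tier.
BLOCK_BYTES = 4096  # assumed size of one KV block in bytes


def pack(src: bytearray, src_ids: list[int]) -> bytearray:
    """Gather the selected blocks into one contiguous payload (local copy)."""
    payload = bytearray(len(src_ids) * BLOCK_BYTES)
    for slot, s in enumerate(src_ids):
        payload[slot * BLOCK_BYTES:(slot + 1) * BLOCK_BYTES] = src[s * BLOCK_BYTES:(s + 1) * BLOCK_BYTES]
    return payload


def unpack(payload: bytes, dst: bytearray, dst_ids: list[int]) -> None:
    """Scatter payload slots back into their block positions at the target."""
    for slot, d in enumerate(dst_ids):
        dst[d * BLOCK_BYTES:(d + 1) * BLOCK_BYTES] = payload[slot * BLOCK_BYTES:(slot + 1) * BLOCK_BYTES]


def optimized_handoff(src: bytearray, dst: bytearray,
                      src_ids: list[int], dst_ids: list[int]) -> int:
    """Return the number of scheduled transfer operations (always one)."""
    payload = pack(src, src_ids)
    received = bytes(payload)  # stands in for ONE bulk transfer of the whole payload
    unpack(received, dst, dst_ids)
    return 1


src = bytearray(16 * BLOCK_BYTES)
for b in range(16):
    src[b * BLOCK_BYTES:(b + 1) * BLOCK_BYTES] = bytes([b]) * BLOCK_BYTES
dst = bytearray(4 * BLOCK_BYTES)
ops = optimized_handoff(src, dst, src_ids=[1, 5, 9, 12], dst_ids=[0, 1, 2, 3])
print(ops)  # 1
```

The design trades two local copies (gather and scatter) for one large transfer. That trade pays off when the per-transfer overhead outweighs the cost of the extra local copies.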
python compare.py
Expected Observation
The optimized path should transfer the same selected KV bytes with fewer operations and lower elapsed time.
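Wall-clock timings of a pure-Python simulation are noisy, so this sketch instead uses a simple latency-plus-bandwidth cost model, elapsed = ops × per-op overhead + bytes / bandwidth, to show why fewer operations lower elapsed time when the bytes are identical. `LinkModel`, its default overhead, and its bandwidth are assumed values for illustration, not measurements from the lab.

```python
from dataclasses import dataclass


@dataclass
class LinkModel:
    """Assumed latency-plus-bandwidth cost model for one transfer path."""
    overhead_s: float = 20e-6    # assumed fixed cost per scheduled transfer
    bandwidth_Bps: float = 25e9  # assumed sustained link bandwidth, bytes/s

    def elapsed(self, ops: int, nbytes: int) -> float:
        return ops * self.overhead_s + nbytes / self.bandwidth_Bps


def compare(num_blocks: int, block_bytes: int, link: LinkModel) -> dict:
    nbytes = num_blocks * block_bytes  # identical payload for both paths
    return {
        "bytes": nbytes,
        "baseline_ops": num_blocks,
        "optimized_ops": 1,
        "baseline_s": link.elapsed(num_blocks, nbytes),  # one transfer per block
        "optimized_s": link.elapsed(1, nbytes),          # one packed transfer
    }


result = compare(num_blocks=64, block_bytes=64 * 1024, link=LinkModel())
for key, value in result.items():
    print(f"{key}: {value}")
```

Under this model the gap is exactly (N − 1) × overhead for N selected blocks, so packing helps most when many small blocks are selected. The model ignores the extra local copies on both paths (CPU staging in the baseline, pack and unpack in the optimized path).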
What This Proves
Disaggregated serving performance depends on the shape of KV movement: block selection, packing, registration reuse, and prefill/decode placement matter as much as raw link bandwidth.
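Registration reuse is the least visible item in that list. Registering memory with an RDMA-capable transport is typically expensive, so transfer layers commonly keep registrations alive and reuse them instead of re-registering per transfer. The sketch below memoizes a registration call per buffer; `RegistrationCache` and `fake_register` are hypothetical stand-ins, not NIXL or UCX APIs.

```python
from typing import Callable


class RegistrationCache:
    """Memoize a (hypothetical) memory-registration call per buffer."""

    def __init__(self, register_fn: Callable[[bytearray], object]):
        self._register_fn = register_fn
        # Key -> (buffer, handle). Holding the buffer keeps it alive, so its
        # id() cannot be reused by another object while the entry exists.
        self._entries: dict = {}
        self.registrations = 0

    def handle_for(self, buf: bytearray) -> object:
        key = (id(buf), len(buf))
        entry = self._entries.get(key)
        if entry is None:
            entry = (buf, self._register_fn(buf))  # pay the registration cost once
            self._entries[key] = entry
            self.registrations += 1
        return entry[1]


def fake_register(buf: bytearray) -> str:
    """Stand-in for an expensive registration; returns an opaque handle."""
    return f"mr-{id(buf):x}-{len(buf)}"


staging = bytearray(1 << 20)  # reused 1 MiB staging/payload buffer
cache = RegistrationCache(fake_register)
handles = {cache.handle_for(staging) for _ in range(100)}  # 100 handoffs
print(cache.registrations, len(handles))  # 1 1
```

Without the cache, each of the 100 handoffs would call the registration function; with it, the cost is paid once per buffer, which favors reusing a small set of long-lived staging buffers.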