#Normalization (LayerNorm/RMSNorm) is foundational. Improved #torchcompile on #H100 & #B200 reaches near-SOTA kernel speed, 17x faster than eager on the backward pass, with automatic fusion for peak e2e performance.
https://bit.ly/3PXhSyf
Today at #PyTorchCon EU: Talk at 15:40 CEST.