After recent study of bf16 accuracy issues, we need fp32
#48
by TenStrip
Recent investigation found that PyTorch (v2.9.1+cu130) relies on JIT translation to run Hopper PTX instructions on Blackwell hardware.
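One way to check whether a given PyTorch wheel ships native kernels for your GPU, or instead falls back to PTX JIT, is to compare the device's compute capability against the architectures the wheel was built for. A minimal sketch (the printed values will depend on your build and card):

```python
import torch

# Check whether this PyTorch build has native SASS for the current GPU.
# If the device's sm_XX is absent from get_arch_list(), kernels are
# JIT-compiled from the embedded PTX via the forward-compatibility path.
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"
built_for = torch.cuda.get_arch_list()
print(f"device: {device_arch}, wheel built for: {built_for}")
print("native kernels:", device_arch in built_for)
```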
ISA mismatch: the tensor core Instruction Set Architecture (ISA) changed between Hopper and consumer Blackwell, and the forward-compatibility layer mistranslates these instructions. Instead of throwing an error or crashing, the hardware performs silently wrong math, which is particularly dangerous for model training and inference.
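Silently wrong math is detectable: you can compare a bf16 matmul against a high-precision reference on the same data and see whether the error matches ordinary bf16 rounding or is orders of magnitude worse. A minimal sketch; the matrix size and the ballpark threshold are illustrative assumptions, not official tolerances from PyTorch or NVIDIA:

```python
import torch

# Repro sketch: run the same matmul through the bf16 tensor core path
# and through an fp64 reference, then compare.
torch.manual_seed(0)
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

ref = a.double() @ b.double()                 # fp64 reference
out = (a.bfloat16() @ b.bfloat16()).double()  # bf16 tensor core path

rel_err = ((out - ref).norm() / ref.norm()).item()
print(f"relative error (bf16 vs fp64): {rel_err:.3e}")
# Ordinary bf16 input rounding at this size lands roughly around 1e-2;
# errors several orders of magnitude larger would suggest genuinely
# wrong results rather than expected rounding.
```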
This may never be fixed, but sharing a dev fp32 version shouldn't be difficult, and it would give people the best shot at tuning and training.
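As a stopgap, anyone affected can at least force the compute itself into fp32 today. A minimal sketch assuming a transformers-style checkpoint; "org/model" is a placeholder, not the actual repo id:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: upcast the released weights to fp32 at load time and disable
# the reduced-precision matmul paths, so bf16/TF32 tensor core kernels
# are avoided entirely.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

model = AutoModelForCausalLM.from_pretrained(
    "org/model",
    torch_dtype=torch.float32,  # upcast weights to fp32 on load
)
```

Note that upcasting is lossless in the sense that every bf16 value embeds exactly into fp32, but it cannot restore precision dropped when the original fp32 weights were truncated to bf16; a true fp32 release would need the pre-truncation checkpoint, which is the ask here.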
Do you have a reference to the issue or bug report? Has this been fixed in newer PyTorch versions?