After recent study of bf16 accuracy issues, we need fp32
#48
by TenStrip
Recent investigation found that PyTorch (v2.9.1+cu130) relies on JIT translation to run Hopper PTX instructions on Blackwell hardware.
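One way to check whether a given PyTorch wheel ships native kernels for your GPU, or instead falls back to PTX JIT, is to compare the device's compute capability against the architectures the wheel was built for. A minimal sketch (the printed values will depend on your build and card):

```python
import torch

# Check whether this PyTorch build has native SASS for the current GPU.
# If the device's sm_XX is absent from get_arch_list(), kernels are
# JIT-compiled from the embedded PTX via the forward-compatibility path.
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"
built_for = torch.cuda.get_arch_list()
print(f"device: {device_arch}, wheel built for: {built_for}")
print("native kernels:", device_arch in built_for)
```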
ISA mismatch: the tensor core Instruction Set Architecture (ISA) changed between Hopper and consumer Blackwell, and the forward-compatibility layer mistranslates these instructions. Instead of throwing an error or crashing, the hardware performs silently wrong math, which is particularly dangerous for model training and inference.
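Silently wrong math is detectable: you can compare a bf16 matmul against a high-precision reference on the same data and see whether the error matches ordinary bf16 rounding or is orders of magnitude worse. A minimal sketch; the matrix size and the ballpark threshold are illustrative assumptions, not official tolerances from PyTorch or NVIDIA:

```python
import torch

# Repro sketch: run the same matmul through the bf16 tensor core path
# and through an fp64 reference, then compare.
torch.manual_seed(0)
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

ref = a.double() @ b.double()                 # fp64 reference
out = (a.bfloat16() @ b.bfloat16()).double()  # bf16 tensor core path

rel_err = ((out - ref).norm() / ref.norm()).item()
print(f"relative error (bf16 vs fp64): {rel_err:.3e}")
# Ordinary bf16 input rounding at this size lands roughly around 1e-2;
# errors several orders of magnitude larger would suggest genuinely
# wrong results rather than expected rounding.
```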
This may never be fixed, but sharing a dev fp32 version shouldn't be difficult, and it would give people the best shot at tuning and training.
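As a stopgap, anyone affected can at least force the compute itself into fp32 today. A minimal sketch assuming a transformers-style checkpoint; "org/model" is a placeholder, not the actual repo id:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: upcast the released weights to fp32 at load time and disable
# the reduced-precision matmul paths, so bf16/TF32 tensor core kernels
# are avoided entirely.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

model = AutoModelForCausalLM.from_pretrained(
    "org/model",
    torch_dtype=torch.float32,  # upcast weights to fp32 on load
)
```

Note that upcasting is lossless in the sense that every bf16 value embeds exactly into fp32, but it cannot restore precision dropped when the original fp32 weights were truncated to bf16; a true fp32 release would need the pre-truncation checkpoint, which is the ask here.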
Do you have a reference to the issue or bug report? Has this been fixed in newer PyTorch versions?