gpt-oss was possible thanks to new engineering efforts in 🤗 transformers. We just dropped a blog covering them:
- Kernels from the Hub
- MXFP4 Quantization
- Tensor & Expert Parallelism
- Dynamic Sliding Window & Cache
- Continuous Batching & Paged Attention
Grab a coffee & dive in! ☕️
https://huggingface.co/blog/faster-transformers