---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
---
# GPT-OSS ONNX Model (Dequantized to BF16)
This repository contains an ONNX export of openai/gpt-oss-20b from Hugging Face, generated with the official onnxruntime-genai model builder. BF16 was chosen as the precision mainly because of limited resources on my M4 Mac mini, and secondarily because of my limited familiarity with the GenAI engineering ecosystem.
## Model Overview
- Source Model: openai/gpt-oss-20b from 🤗
- Exported Format: ONNX
- Precision: BF16 (dequantized from MXFP4 for GPU compatibility; see the sketch below)
- Layers: 24 decoder layers, embedding layer, final normalization, and language modeling (LM) head
This repository includes all supporting files: tokenizer, chat template, and configuration files.
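For context on the dequantization mentioned above: MXFP4 (from the OCP Microscaling Formats spec) stores weights as 4-bit E2M1 elements that share one 8-bit E8M0 scale per 32-element block, so dequantizing amounts to decoding each element and multiplying by its block scale. The sketch below is purely illustrative, not the builder's actual code; unpacking the two-codes-per-byte storage is omitted, and the result is left in float32 rather than BF16.

```python
# Illustrative MXFP4 -> float dequantization (OCP MX: 4-bit E2M1 elements,
# one shared E8M0 scale per 32-element block). Not the builder's actual code.
import numpy as np

# All 16 values representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by the 4-bit code with the sign in the high bit.
FP4_E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_mxfp4(codes: np.ndarray, scales_e8m0: np.ndarray) -> np.ndarray:
    """codes: uint8 4-bit indices, shape (n_blocks, 32), already unpacked;
    scales_e8m0: uint8 biased block exponents, shape (n_blocks,)."""
    elements = FP4_E2M1_VALUES[codes]                          # decode E2M1 values
    scales = np.exp2(scales_e8m0.astype(np.float32) - 127.0)   # E8M0 scale = 2^(e-127)
    return elements * scales[:, None]                          # one scale per block
```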
## Generation Details
The ONNX model was generated with the builder.py script from the onnxruntime-genai toolkit. The process involved:
- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4-quantized weights to BF16
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files needed by the GenAI runtime and for Hugging Face integration
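For reference, the builder is normally run as a Python module. The following is a minimal sketch of such an invocation, not the exact command used for this export: the output directory and execution provider are illustrative, and `-p bf16` assumes a builder version that supports BF16 output.

```python
# Sketch: driving the onnxruntime-genai model builder from Python,
# equivalent to `python -m onnxruntime_genai.models.builder ...` on the CLI.
import subprocess

subprocess.run(
    [
        "python", "-m", "onnxruntime_genai.models.builder",
        "-m", "openai/gpt-oss-20b",   # source checkpoint on Hugging Face
        "-o", "./gpt-oss-20b-onnx",   # output directory (illustrative)
        "-p", "bf16",                 # target precision (assumes BF16 support)
        "-e", "cuda",                 # execution provider, e.g. cpu or cuda
    ],
    check=True,
)
```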
## Usage
To use this ONNX model:
- Download the model files and tokenizer assets from this repository.
- Load the model with onnxruntime-genai or another compatible inference engine, as sketched below.
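A minimal generation loop with the onnxruntime-genai Python API might look like this (a sketch assuming a recent onnxruntime-genai release; the model directory is illustrative, and in practice you would format the prompt with the bundled chat template before encoding it):

```python
# Sketch of a streaming generation loop with onnxruntime-genai.
# Assumes the files from this repo sit in ./gpt-oss-20b-onnx (illustrative).
import onnxruntime_genai as og

model = og.Model("./gpt-oss-20b-onnx")   # folder with the .onnx model and genai_config.json
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()       # incremental detokenizer for streaming output

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain ONNX in one sentence."))

while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```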
## Acknowledgements
- Original model: openai/gpt-oss-20b from 🤗
- ONNX export: onnxruntime-genai by Microsoft