---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
---
# GPT-OSS ONNX model (Dequantized to BF16)
This repository contains an ONNX export of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from Hugging Face, generated with the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) model builder. BF16 precision was chosen mainly because of the limited resources on my M4 Mac mini, and secondarily because of my limited familiarity with the GenAI engineering ecosystem.
## Model Overview
- **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- **Exported Format:** ONNX
- **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
- **Layers:** 24 decoder layers, embedding layer, final normalization, and language modeling (LM) head
This repository includes all supporting files: tokenizer assets, the chat template, and configuration files.
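
For context on the precision conversion: MXFP4 is a microscaling format that stores weights as blocks of 4-bit floats (E2M1) sharing one 8-bit power-of-two scale (E8M0) per block. The sketch below is a conceptual illustration of that dequantization step, not the builder's actual code; the function name and block handling are mine.

```python
import numpy as np

# Conceptual sketch of MXFP4 -> higher-precision dequantization
# (hypothetical helper, not the builder's actual code). MXFP4 groups
# weights into blocks of 32 four-bit E2M1 values sharing one 8-bit
# power-of-two (E8M0) scale.

# The 16 values representable in E2M1 (codes 8-15 are the negations).
E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_mxfp4_block(codes: np.ndarray, scale_exp: int) -> np.ndarray:
    """codes: 32 uint8 values in [0, 15]; scale_exp: biased E8M0 exponent."""
    scale = np.float32(2.0) ** (scale_exp - 127)  # E8M0 decodes to 2^(e-127)
    # NumPy has no native bfloat16, so float32 stands in for BF16 here.
    return E2M1_VALUES[codes] * scale

# Example: one block of 32 codes with a shared scale of 2^-2.
block = np.random.randint(0, 16, size=32, dtype=np.uint8)
weights = dequantize_mxfp4_block(block, scale_exp=125)
```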
## Generation Details
The ONNX model was generated with the `builder.py` script from the onnxruntime-genai toolkit (a hedged invocation sketch follows the list). The process involved:
- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4 quantized weights to BF16
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files

All layers and weights were read and converted successfully, and every file needed for the GenAI runtime and Hugging Face integration was generated.
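
For reference, a typical builder invocation looks roughly like this. The flag names follow the onnxruntime-genai documentation, but whether `-p bf16` is accepted depends on the toolkit version, and the output and cache paths here are placeholders:

```bash
# Hedged sketch of the onnxruntime-genai model builder invocation.
#   -m  source checkpoint on Hugging Face
#   -o  output folder for the ONNX graph + external data file
#   -p  target precision (bf16 availability varies by version)
#   -e  execution provider targeted by the export
#   -c  cache directory for the downloaded HF files
python3 -m onnxruntime_genai.models.builder \
    -m openai/gpt-oss-20b \
    -o ./gpt-oss-20b-onnx \
    -p bf16 \
    -e cpu \
    -c ./hf_cache
```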
## Usage
To use this ONNX model:
1. Download the model files and tokenizer assets from this repository.
2. Load the ONNX model with [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine such as onnxruntime-genai (a minimal generation sketch follows).
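
A minimal generation sketch using the onnxruntime-genai Python package (`pip install onnxruntime-genai`). The API surface has shifted across releases (e.g. `append_tokens` is the newer interface), and `./gpt-oss-20b-onnx` is a placeholder for wherever the downloaded model files live:

```python
import onnxruntime_genai as og

# Minimal sketch, assuming a recent onnxruntime-genai release;
# "./gpt-oss-20b-onnx" is a placeholder for the model folder.
model = og.Model("./gpt-oss-20b-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain ONNX in one sentence."))

# Generate token by token, decoding incrementally via the stream.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```

Note this feeds a raw prompt; for chat-style use you would first apply the bundled chat template to format the input.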
## Acknowledgements
- Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft
---