Update README.md #1
by rishiikc · opened

README.md CHANGED
@@ -1,3 +1,43 @@

---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
---

# GPT-OSS ONNX model (Dequantized to BF16)

This repository contains an ONNX export of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from Hugging Face, generated using the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) builder. I chose BF16 precision mainly because of limited resources on my M4 Mac mini, and also because of my limited knowledge of the GenAI engineering ecosystem.

## Model Overview

- **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- **Exported Format:** ONNX
- **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
- **Layers:** 24 decoder layers, plus the embedding layer, final normalization, and language modeling (LM) head

This repository includes all supporting files: tokenizer, chat templates, and configuration files.
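
As a quick structural check, the exported graph can be inspected without loading the BF16 weights, since they live in the external data file. A minimal sketch using the `onnx` Python package (the local directory name is a placeholder for wherever you downloaded this repository):

```python
import onnx

# Load only the graph definition; skip the external weight file.
model = onnx.load("gpt-oss-20b-onnx/model.onnx", load_external_data=False)

print(f"{len(model.graph.node)} nodes in the graph")
for tensor in model.graph.input:
    print("input:", tensor.name)
```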

## Generation Details

The ONNX model was generated using the `builder.py` script from the onnxruntime-genai toolkit. The process involved:

- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers and weights (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4 quantized weights to BF16 (see the sketch after this list)
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files, so that everything needed for the GenAI runtime and Hugging Face integration is present
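
For context on the dequantization step above: MXFP4 stores weights as 4-bit FP4 (E2M1) values in blocks of 32, with each block sharing a single 8-bit power-of-two scale (E8M0). The sketch below is illustrative only, not the builder's actual code, and returns FP32 since NumPy has no native BF16 type:

```python
import numpy as np

# The 16 FP4 (E2M1) code points: 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_mxfp4_block(codes: np.ndarray, shared_exp: int) -> np.ndarray:
    """Dequantize one 32-element MXFP4 block.

    codes: uint8 array of 32 FP4 code points (values 0-15).
    shared_exp: the block's shared E8M0 scale, stored as a biased exponent.
    """
    scale = np.float32(2.0) ** (shared_exp - 127)  # E8M0 uses a bias of 127
    return FP4_E2M1_VALUES[codes] * scale
```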

## Usage

To use this ONNX model:

1. Download the model files and tokenizer assets from this repository.
2. Load the model with [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine, for example onnxruntime-genai as sketched below.
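
Since the export was produced with onnxruntime-genai, its matching runtime is the most direct way to run generation. A minimal sketch, assuming a recent `onnxruntime-genai` Python package (the generator API has changed across versions, and the model directory path is a placeholder):

```python
import onnxruntime_genai as og

model = og.Model("gpt-oss-20b-onnx")  # directory with model.onnx and genai_config.json
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX?"))

# Generate one token at a time until EOS or max_length.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```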

## Acknowledgements

- Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft

---