Files changed (1)
  1. README.md +43 -3
README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - openai/gpt-oss-20b
+ ---
+ # GPT-OSS ONNX model (Dequantized to BF16)
+
+ This repository contains an ONNX export of the [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) model from Hugging Face, generated using the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) model builder. BF16 precision was chosen mainly because of limited resources on my M4 Mac mini, and because of my limited familiarity with the GenAI engineering ecosystem.
+
+ ## Model Overview
+
+ - **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
+ - **Exported Format:** ONNX
+ - **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
+ - **Layers:** 24 decoder layers, embedding layer, final normalization, and language modeling (LM) head
+
+ This repository also includes all supporting files: tokenizer, chat templates, and configuration files.
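+
+ To sanity-check the export without loading the large external weight file, you can list the graph inputs and outputs with the `onnx` Python package. This is a minimal sketch; the file name `model.onnx` (the builder's usual output name) is assumed here:
+
+ ```python
+ import onnx
+
+ # Load only the graph structure; skip the multi-gigabyte external data file.
+ m = onnx.load("model.onnx", load_external_data=False)
+
+ print("Inputs: ", [i.name for i in m.graph.input])
+ print("Outputs:", [o.name for o in m.graph.output])
+ ```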
+
+ ## Generation Details
+
+ The ONNX model was generated using the `builder.py` script from the onnxruntime-genai toolkit. The process involved:
+
+ - Loading the original gpt-oss-20b checkpoint from 🤗
+ - Reading and converting all model layers and weights (embedding, decoder layers, final norm, LM head)
+ - Dequantizing the MXFP4 quantized weights to BF16 (see the sketch below)
+ - Saving the ONNX model and its associated external data file
+ - Exporting the tokenizer and configuration files needed for the GenAI runtime and Hugging Face integration
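+
+ For reference, MXFP4 stores weights in blocks of 32 FP4 (E2M1) values that share one power-of-two (E8M0) scale, following the OCP Microscaling spec. The sketch below only illustrates the dequantization idea and is not the builder's actual code; in particular, the byte-packing order (two 4-bit codes per byte, low nibble first) is an assumption:
+
+ ```python
+ import numpy as np
+ import torch
+
+ # Magnitudes representable by FP4 (E2M1), indexed by the 3 low bits;
+ # bit 3 is the sign, so codes 0..15 map to:
+ FP4_VALUES = np.array(
+     [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
+      -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
+     dtype=np.float32,
+ )
+
+ def dequantize_mxfp4_block(packed: np.ndarray, scale_code: int) -> torch.Tensor:
+     """Dequantize one 32-element MXFP4 block to BF16.
+
+     packed     : 16 uint8 bytes, each holding two FP4 codes (low nibble first, assumed)
+     scale_code : shared E8M0 scale byte, representing 2 ** (scale_code - 127)
+     """
+     lo = packed & 0x0F
+     hi = packed >> 4
+     codes = np.stack([lo, hi], axis=-1).reshape(-1)            # 32 FP4 codes
+     values = FP4_VALUES[codes] * np.float32(2.0) ** (int(scale_code) - 127)
+     return torch.from_numpy(values).to(torch.bfloat16)         # stored as BF16 in the export
+
+ # Example: one block of 16 packed bytes with a scale of 2**0.
+ block = np.random.randint(0, 256, size=16, dtype=np.uint8)
+ print(dequantize_mxfp4_block(block, scale_code=127))
+ ```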
+
+ ## Usage
+
+ To use this ONNX model:
+
+ 1. Download the model files and tokenizer assets from this repository.
+ 2. Load the model with [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine such as the onnxruntime-genai runtime, as in the sketch below.
+
+ ## Acknowledgements
+
+ - Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
+ - ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft
+
+ ---