Update README.md #1
by rishiikc · opened

README.md CHANGED
@@ -1,3 +1,43 @@

---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
---

# GPT-OSS ONNX model (Dequantized to BF16)

This repository contains an ONNX export of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from Hugging Face, generated using the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) builder. I chose BF16 precision mainly because of limited resources on my M4 Mac mini, and also because of my limited knowledge of the GenAI engineering ecosystem.

## Model Overview

- **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- **Exported Format:** ONNX
- **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
- **Layers:** 24 decoder layers, plus the embedding layer, final normalization, and language modeling (LM) head

This repository includes all supporting files: tokenizer, chat templates, and configuration files.
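
As a quick structural check, the exported graph can be inspected without loading the BF16 weights, since they live in the external data file. A minimal sketch using the `onnx` Python package (the local directory name is a placeholder for wherever you downloaded this repository):

```python
import onnx

# Load only the graph definition; skip the external weight file.
model = onnx.load("gpt-oss-20b-onnx/model.onnx", load_external_data=False)

print(f"{len(model.graph.node)} nodes in the graph")
for tensor in model.graph.input:
    print("input:", tensor.name)
```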

## Generation Details

The ONNX model was generated using the `builder.py` script from the onnxruntime-genai toolkit. The process involved:

- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers and weights (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4 quantized weights to BF16 (see the sketch after this list)
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files, so that everything needed for the GenAI runtime and Hugging Face integration is present
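
For context on the dequantization step above: MXFP4 stores weights as 4-bit FP4 (E2M1) values in blocks of 32, with each block sharing a single 8-bit power-of-two scale (E8M0). The sketch below is illustrative only, not the builder's actual code, and returns FP32 since NumPy has no native BF16 type:

```python
import numpy as np

# The 16 FP4 (E2M1) code points: 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_mxfp4_block(codes: np.ndarray, shared_exp: int) -> np.ndarray:
    """Dequantize one 32-element MXFP4 block.

    codes: uint8 array of 32 FP4 code points (values 0-15).
    shared_exp: the block's shared E8M0 scale, stored as a biased exponent.
    """
    scale = np.float32(2.0) ** (shared_exp - 127)  # E8M0 uses a bias of 127
    return FP4_E2M1_VALUES[codes] * scale
```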

## Usage

To use this ONNX model:

1. Download the model files and tokenizer assets from this repository.
2. Load the model with [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine, for example onnxruntime-genai as sketched below.
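
Since the export was produced with onnxruntime-genai, its matching runtime is the most direct way to run generation. A minimal sketch, assuming a recent `onnxruntime-genai` Python package (the generator API has changed across versions, and the model directory path is a placeholder):

```python
import onnxruntime_genai as og

model = og.Model("gpt-oss-20b-onnx")  # directory with model.onnx and genai_config.json
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX?"))

# Generate one token at a time until EOS or max_length.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```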

## Acknowledgements

- Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft

---