---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
---
# GPT-OSS ONNX model (Dequantized to BF16)
This repository contains an ONNX export of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from Hugging Face, generated with the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) model builder. BF16 precision was chosen mainly because of the limited resources on my M4 Mac mini, and secondarily because of my limited familiarity with the GenAI engineering ecosystem.
## Model Overview
- **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- **Exported Format:** ONNX
- **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
- **Layers:** 24 decoder layers, embedding layer, final normalization, and language modeling (LM) head
This repository includes all supporting files: tokenizer assets, the chat template, and configuration files.
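
For context on the precision conversion: MXFP4 is a microscaling format that stores weights as blocks of 4-bit floats (E2M1) sharing one 8-bit power-of-two scale (E8M0) per block. The sketch below is a conceptual illustration of that dequantization step, not the builder's actual code; the function name and block handling are mine.

```python
import numpy as np

# Conceptual sketch of MXFP4 -> higher-precision dequantization
# (hypothetical helper, not the builder's actual code). MXFP4 groups
# weights into blocks of 32 four-bit E2M1 values sharing one 8-bit
# power-of-two (E8M0) scale.

# The 16 values representable in E2M1 (codes 8-15 are the negations).
E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_mxfp4_block(codes: np.ndarray, scale_exp: int) -> np.ndarray:
    """codes: 32 uint8 values in [0, 15]; scale_exp: biased E8M0 exponent."""
    scale = np.float32(2.0) ** (scale_exp - 127)  # E8M0 decodes to 2^(e-127)
    # NumPy has no native bfloat16, so float32 stands in for BF16 here.
    return E2M1_VALUES[codes] * scale

# Example: one block of 32 codes with a shared scale of 2^-2.
block = np.random.randint(0, 16, size=32, dtype=np.uint8)
weights = dequantize_mxfp4_block(block, scale_exp=125)
```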
## Generation Details
The ONNX model was generated with the `builder.py` script from the onnxruntime-genai toolkit (a hedged invocation sketch follows the list). The process involved:
- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4 quantized weights to BF16
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files

All layers and weights were read and converted successfully, and every file needed for the GenAI runtime and Hugging Face integration was generated.
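
For reference, a typical builder invocation looks roughly like this. The flag names follow the onnxruntime-genai documentation, but whether `-p bf16` is accepted depends on the toolkit version, and the output and cache paths here are placeholders:

```bash
# Hedged sketch of the onnxruntime-genai model builder invocation.
#   -m  source checkpoint on Hugging Face
#   -o  output folder for the ONNX graph + external data file
#   -p  target precision (bf16 availability varies by version)
#   -e  execution provider targeted by the export
#   -c  cache directory for the downloaded HF files
python3 -m onnxruntime_genai.models.builder \
    -m openai/gpt-oss-20b \
    -o ./gpt-oss-20b-onnx \
    -p bf16 \
    -e cpu \
    -c ./hf_cache
```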
## Usage
To use this ONNX model:
1. Download the model files and tokenizer assets from this repository.
2. Load the ONNX model with [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine such as onnxruntime-genai (a minimal generation sketch follows).
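
A minimal generation sketch using the onnxruntime-genai Python package (`pip install onnxruntime-genai`). The API surface has shifted across releases (e.g. `append_tokens` is the newer interface), and `./gpt-oss-20b-onnx` is a placeholder for wherever the downloaded model files live:

```python
import onnxruntime_genai as og

# Minimal sketch, assuming a recent onnxruntime-genai release;
# "./gpt-oss-20b-onnx" is a placeholder for the model folder.
model = og.Model("./gpt-oss-20b-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain ONNX in one sentence."))

# Generate token by token, decoding incrementally via the stream.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```

Note this feeds a raw prompt; for chat-style use you would first apply the bundled chat template to format the input.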
## Acknowledgements
- Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft
---