---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
---
# GPT-OSS ONNX Model (Dequantized to BF16)
This repository contains an ONNX export of openai/gpt-oss-20b from Hugging Face, generated with the official onnxruntime-genai model builder. BF16 was chosen as the precision mainly because of limited resources on my M4 Mac mini, and secondarily because of my limited familiarity with the GenAI engineering ecosystem.
## Model Overview
- Source Model: openai/gpt-oss-20b from 🤗
- Exported Format: ONNX
- Precision: BF16 (dequantized from MXFP4 for GPU compatibility; see the sketch below)
- Layers: 24 decoder layers, embedding layer, final normalization, and language modeling (LM) head
This repository includes all supporting files: tokenizer, chat template, and configuration files.
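For context on the dequantization mentioned above: MXFP4 (from the OCP Microscaling Formats spec) stores weights as 4-bit E2M1 elements that share one 8-bit E8M0 scale per 32-element block, so dequantizing amounts to decoding each element and multiplying by its block scale. The sketch below is purely illustrative, not the builder's actual code; unpacking the two-codes-per-byte storage is omitted, and the result is left in float32 rather than BF16.

```python
# Illustrative MXFP4 -> float dequantization (OCP MX: 4-bit E2M1 elements,
# one shared E8M0 scale per 32-element block). Not the builder's actual code.
import numpy as np

# All 16 values representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by the 4-bit code with the sign in the high bit.
FP4_E2M1_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_mxfp4(codes: np.ndarray, scales_e8m0: np.ndarray) -> np.ndarray:
    """codes: uint8 4-bit indices, shape (n_blocks, 32), already unpacked;
    scales_e8m0: uint8 biased block exponents, shape (n_blocks,)."""
    elements = FP4_E2M1_VALUES[codes]                          # decode E2M1 values
    scales = np.exp2(scales_e8m0.astype(np.float32) - 127.0)   # E8M0 scale = 2^(e-127)
    return elements * scales[:, None]                          # one scale per block
```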
## Generation Details
The ONNX model was generated with the builder.py script from the onnxruntime-genai toolkit. The process involved:
- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4-quantized weights to BF16
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files needed by the GenAI runtime and for Hugging Face integration
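For reference, the builder is normally run as a Python module. The following is a minimal sketch of such an invocation, not the exact command used for this export: the output directory and execution provider are illustrative, and `-p bf16` assumes a builder version that supports BF16 output.

```python
# Sketch: driving the onnxruntime-genai model builder from Python,
# equivalent to `python -m onnxruntime_genai.models.builder ...` on the CLI.
import subprocess

subprocess.run(
    [
        "python", "-m", "onnxruntime_genai.models.builder",
        "-m", "openai/gpt-oss-20b",   # source checkpoint on Hugging Face
        "-o", "./gpt-oss-20b-onnx",   # output directory (illustrative)
        "-p", "bf16",                 # target precision (assumes BF16 support)
        "-e", "cuda",                 # execution provider, e.g. cpu or cuda
    ],
    check=True,
)
```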
## Usage
To use this ONNX model:
- Download the model files and tokenizer assets from this repository.
- Load the model with onnxruntime-genai or another compatible inference engine, as sketched below.
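A minimal generation loop with the onnxruntime-genai Python API might look like this (a sketch assuming a recent onnxruntime-genai release; the model directory is illustrative, and in practice you would format the prompt with the bundled chat template before encoding it):

```python
# Sketch of a streaming generation loop with onnxruntime-genai.
# Assumes the files from this repo sit in ./gpt-oss-20b-onnx (illustrative).
import onnxruntime_genai as og

model = og.Model("./gpt-oss-20b-onnx")   # folder with the .onnx model and genai_config.json
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()       # incremental detokenizer for streaming output

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain ONNX in one sentence."))

while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```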
## Acknowledgements
- Original model: openai/gpt-oss-20b from 🤗
- ONNX export: onnxruntime-genai by Microsoft