# Seed-OSS-36B-Instruct FP8 quantization (including KV-cache)

This repo contains Seed-OSS-36B-Instruct with FP8-quantized weights and an FP8 KV-cache.

Original model: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)

⚠️⚠️⚠️ This is, at the moment, a debugging upload. When used with an FP8 KV-cache (`--kv-cache-dtype fp8`), it triggers a vLLM assert because some scaling factors are not > 0.0.
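To help pinpoint the offending tensors, here is a minimal sketch that scans the checkpoint shards for non-positive KV-cache scales. It assumes the repo has been downloaded locally and that the scales follow the usual compressed-tensors `k_scale`/`v_scale` naming; adjust the path and names if your layout differs:

```python
# Scan local safetensors shards for non-positive KV-cache scaling factors.
# Assumptions: checkpoint downloaded to ./Seed-OSS-36B-Instruct-FP8-KV8 and
# scales named "*k_scale"/"*v_scale" (typical compressed-tensors naming).
import glob

from safetensors import safe_open

for shard in sorted(glob.glob("Seed-OSS-36B-Instruct-FP8-KV8/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if name.endswith(("k_scale", "v_scale")):
                scale = f.get_tensor(name)
                if (scale <= 0.0).any():
                    print(f"{shard}: non-positive scale {name} = {scale}")
```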

## 📥 Usage & Running Instructions

The model was tested with vLLM on 2x RTX Pro 6000; the following script is suitable for that configuration.

```bash
export MODEL="mratsim/Seed-OSS-36B-Instruct-FP8-KV8"
vllm serve "${MODEL}" \
  --served-model-name seed-oss-36b \
  --tensor-parallel-size 2 \
  --kv-cache-dtype 'fp8' \
  --gpu-memory-utilization 0.85
```
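
Once the server is up, you can query it through vLLM's OpenAI-compatible API. A minimal sketch, assuming the default port 8000 and the `seed-oss-36b` name set above (the prompt is purely illustrative):

```python
# Query the vLLM server started above via its OpenAI-compatible endpoint.
# Assumption: the server listens on the default http://localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="seed-oss-36b",  # matches --served-model-name above
    messages=[{"role": "user", "content": "Summarize FP8 KV-cache quantization in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```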