This repo contains Seed-OSS-36B-Instruct quantized to FP8, with an FP8 KV-cache.
Original model: ByteDance-Seed/Seed-OSS-36B-Instruct
⚠️⚠️⚠️ This is currently a debugging upload. When used with an FP8 KV-cache (`--kv-cache-dtype fp8`), it triggers a vLLM assertion because some scaling factors are not > 0.0.
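To illustrate the failure mode, a minimal sketch of per-tensor FP8 (E4M3) scale computation follows. The function name and logic are illustrative, not vLLM's actual implementation: a tensor whose maximum magnitude is zero yields a zero scale, which is exactly the kind of value an `assert scale > 0.0` check rejects.

```python
# Illustrative sketch, not vLLM's API: per-tensor FP8 scale computation.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_scale(values):
    """Map the tensor's max magnitude onto the FP8 representable range."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX


print(fp8_scale([0.5, -1.5, 3.0]))  # positive scale, fine
print(fp8_scale([0.0, 0.0]))        # 0.0 -> would fail a `scale > 0.0` assert
```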
The model was tested with vLLM on 2x RTX Pro 6000; the following script is suitable for that configuration.
```bash
export MODEL="mratsim/Seed-OSS-36B-Instruct-FP8-KV8"

vllm serve "${MODEL}" \
  --served-model-name seed-oss-36b \
  --tensor-parallel-size 2 \
  --kv-cache-dtype 'fp8' \
  --gpu-memory-utilization 0.85
```
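Once up, the server exposes vLLM's OpenAI-compatible API (port 8000 by default). A minimal smoke test, assuming the default host and port and the `seed-oss-36b` served model name from the script above:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "seed-oss-36b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```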
Base model: ByteDance-Seed/Seed-OSS-36B-Instruct