@JonnaMat on Hugging Face: "⚡ Qwen3.5, up to 1.4× faster. Same quality. Less latency. We applied…"

posted an update about 16 hours ago

⚡ Qwen3.5, up to 1.4× faster. Same quality. Less latency.

We applied FlashHead, a novel drop-in replacement for the LM head, to the Qwen3.5 family. The result is measurably lower latency on edge hardware. Benchmarks and models below.

📊 embedl/Edge-Inference-Benchmarks

🤗 https://huggingface.co/collections/embedl/qwen35

In this post

JonnaMat Jonna Matthiesen