⚡ Qwen3.5, up to 1.4× faster. Same quality. Less latency.
We applied FlashHead to the Qwen3.5 family. FlashHead is a novel drop-in replacement for the LM head that measurably lowers latency on edge hardware. Benchmarks and models below.
📊 embedl/Edge-Inference-Benchmarks
🤗 https://huggingface.co/collections/embedl/qwen35