⚡ Qwen3.5, up to 1.4× faster. Same quality. Less latency.
We applied FlashHead to the Qwen3.5 family. FlashHead is a novel drop-in replacement for the LM head that measurably lowers latency on edge hardware. Benchmarks and models below.
Benchmarks: embedl/Edge-Inference-Benchmarks
🤗 Models: https://huggingface.co/collections/embedl/qwen35