# Qwen3.5-9B-FlashHead
An optimized version of Qwen/Qwen3.5-9B using FlashHead, Embedl's lightweight replacement for the dense language model head. FlashHead significantly improves inference throughput while preserving accuracy; weights are kept in FP16 precision.

The model preserves the base model's Text + Image / Video -> Text behavior and reasoning capabilities.

FlashHead is available as a vLLM plugin via `pip install flash-head`.
## Model Details
| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-9B-FlashHead |
| Base Model | Qwen/Qwen3.5-9B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |
## Optimizations
- FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.
## Installation

```
pip install flash-head
```

The `flash-head` vLLM plugin is required. It activates automatically at vLLM startup.
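As a minimal sketch of a typical deployment (assuming a standard vLLM installation and the default OpenAI-compatible server port, which are not specified in this card), serving and querying the model might look like:

```shell
# Install vLLM and the FlashHead plugin; the plugin is picked up
# automatically when vLLM starts, no extra flags needed.
pip install vllm flash-head

# Launch an OpenAI-compatible server for the optimized model.
vllm serve embedl/Qwen3.5-9B-FlashHead

# In another terminal: send a chat request to the default endpoint (port 8000).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "embedl/Qwen3.5-9B-FlashHead",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

This is illustrative only; GPU requirements, ports, and serving flags should follow the standard vLLM documentation.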
## License

This model is a derivative of Qwen/Qwen3.5-9B.

- Upstream: Apache License 2.0
- Optimized components: Embedl Models Community Licence v1.0 (no redistribution)
## Contact

- Enterprise and Commercial Inquiries: models@embedl.com
- Technical Issues and Early Access: https://github.com/embedl/flash-head
- More Information and Model Releases: https://embedl.com
## Partner & Developer Opportunities
If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:
- Engineering support for on-prem and edge deployments
- Early access and partner co-marketing opportunities
Contact: models@embedl.com