# Qwen3.5-9B-FlashHead
An optimized version of Qwen/Qwen3.5-9B using FlashHead, Embedl's lightweight replacement for the dense language model head. FlashHead significantly improves inference throughput while preserving accuracy; weights are kept in FP16 precision.

The model preserves the base model's Text + Image / Video -> Text behavior and reasoning capabilities.

FlashHead is available as a vLLM plugin via `pip install flash-head`.
## Model Details
| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-9B-FlashHead |
| Base Model | Qwen/Qwen3.5-9B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |
## Optimizations
- FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.
## Installation

```
pip install flash-head
```

The `flash-head` vLLM plugin is required. It activates automatically at vLLM startup.
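As a minimal sketch of a typical deployment (assuming a standard vLLM installation and the default OpenAI-compatible server port, which are not specified in this card), serving and querying the model might look like:

```shell
# Install vLLM and the FlashHead plugin; the plugin is picked up
# automatically when vLLM starts, no extra flags needed.
pip install vllm flash-head

# Launch an OpenAI-compatible server for the optimized model.
vllm serve embedl/Qwen3.5-9B-FlashHead

# In another terminal: send a chat request to the default endpoint (port 8000).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "embedl/Qwen3.5-9B-FlashHead",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

This is illustrative only; GPU requirements, ports, and serving flags should follow the standard vLLM documentation.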
## License

This model is a derivative of Qwen/Qwen3.5-9B.

- Upstream: Apache License 2.0
- Optimized components: Embedl Models Community Licence v1.0 (no redistribution)
## Contact

- Enterprise and Commercial Inquiries: models@embedl.com
- Technical Issues and Early Access: https://github.com/embedl/flash-head
- More Information and Model Releases: https://embedl.com
## Partner & Developer Opportunities
If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:
- Engineering support for on-prem and edge deployments
- Early access and partner co-marketing opportunities
Contact: models@embedl.com