Qwen3.5-9B-FlashHead


Optimized version of Qwen/Qwen3.5-9B using FlashHead, Embedl's efficient replacement for the language model head.

This model adds FlashHead, a lightweight replacement for the dense LM head that significantly improves throughput while preserving accuracy. Weights are kept in FP16 precision.

The model preserves Text + Image / Video -> Text behavior and reasoning capabilities while improving inference throughput.

FlashHead is available as a vLLM plugin via pip install flash-head.


Model Details

Model: embedl/Qwen3.5-9B-FlashHead
Base Model: Qwen/Qwen3.5-9B
Input / Output: Text + Image / Video -> Text
Version: 1.0
Optimizations: FlashHead LM Head
Developers: Embedl
Licenses: Upstream: Apache License 2.0; optimized components: Embedl Models Community Licence v1.0 (no redistribution)
Intended Use: Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs

Optimizations

  • FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.
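
FlashHead's internal design is not described in this card. Purely as an illustration of why a lighter LM head reduces per-token cost, the sketch below compares a dense vocabulary projection with a low-rank factorized one; the sizes and the factorization itself are assumptions for illustration, not FlashHead's actual architecture.

```python
# Illustrative only: this is NOT FlashHead's design (not described in this card).
# It shows why shrinking the vocabulary projection lowers per-token compute:
# the dense head below has ~0.6B parameters, the factorized head ~40M.
import torch

hidden, vocab, rank = 4096, 152_064, 256   # assumed sizes, for illustration only

dense_head = torch.nn.Linear(hidden, vocab, bias=False)
light_head = torch.nn.Sequential(
    torch.nn.Linear(hidden, rank, bias=False),   # project down to a small rank
    torch.nn.Linear(rank, vocab, bias=False),    # then expand to vocabulary logits
)

x = torch.randn(1, hidden)                  # hidden state of one decoded token
with torch.no_grad():
    dense_logits = dense_head(x)            # shape (1, vocab)
    light_logits = light_head(x)            # same shape, far fewer FLOPs per token

print(dense_logits.shape, light_logits.shape)
```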

Benchmarks

Edge Inference Benchmarks for Qwen3.5

Installation

pip install flash-head

The flash-head vLLM plugin is required. Once installed, it is detected and activated automatically when vLLM starts.
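
As a minimal usage sketch (assuming a recent vLLM release with plugin auto-discovery; the prompt and sampling settings are illustrative), the model can be loaded by name once the plugin is installed:

```python
# Minimal sketch: offline generation with vLLM after `pip install flash-head`.
# No FlashHead-specific flags are needed; the plugin is picked up automatically.
from vllm import LLM, SamplingParams

llm = LLM(model="embedl/Qwen3.5-9B-FlashHead")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain what the language model head does in a decoder-only transformer."],
    sampling,
)
print(outputs[0].outputs[0].text)
```

For serving, the standard vllm serve embedl/Qwen3.5-9B-FlashHead entry point can be used unchanged, exposing the usual OpenAI-compatible endpoint.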

License

This model is a derivative of Qwen/Qwen3.5-9B.

  • Upstream: Apache License 2.0
  • Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)

Contact

  • Enterprise and Commercial Inquiries: models@embedl.com
  • Technical Issues and Early Access: https://github.com/embedl/flash-head
  • More Information and Model Releases: https://embedl.com

Partner & Developer Opportunities

If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:

  • Engineering support for on-prem and edge deployments
  • Early access and partner co-marketing opportunities

Contact: models@embedl.com
