Qwen3.5-9B-FlashHead


Optimized version of Qwen/Qwen3.5-9B using FlashHead, Embedl's efficient replacement for the language model head.

FlashHead replaces the dense LM head with a lightweight alternative that significantly improves throughput while preserving accuracy. Weights are kept in FP16 precision.

The model preserves Text + Image / Video -> Text behavior and reasoning capabilities while improving inference throughput.

FlashHead is available as a vLLM plugin via pip install flash-head.


Model Details

| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-9B-FlashHead |
| Base Model | Qwen/Qwen3.5-9B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |

Optimizations

  • FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.

Benchmarks

Edge Inference Benchmarks for Qwen3.5

Installation

pip install flash-head

The flash-head vLLM plugin is required. Once installed, it is discovered and activated automatically when vLLM starts.
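As a sketch of a typical workflow (the `vllm serve` command and `--dtype` flag follow standard vLLM usage; exact flags for your hardware and deployment may differ):

```shell
# Install the FlashHead vLLM plugin; vLLM picks it up automatically at startup
pip install flash-head

# Serve the optimized model via vLLM's OpenAI-compatible server,
# keeping weights in FP16 as described above
vllm serve embedl/Qwen3.5-9B-FlashHead --dtype float16
```

Once the server is up, the model can be queried through the standard OpenAI-compatible chat completions endpoint, as with any other vLLM deployment.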

License

This model is a derivative of Qwen/Qwen3.5-9B.

  • Upstream: Apache License 2.0
  • Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)

Contact

  • Enterprise and Commercial Inquiries: models@embedl.com
  • Technical Issues and Early Access: https://github.com/embedl/flash-head
  • More Information and Model Releases: https://embedl.com

Partner & Developer Opportunities

If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:

  • Engineering support for on-prem and edge deployments
  • Early access and partner co-marketing opportunities

Contact: models@embedl.com

Community & Support
Need help with this model? Chat with the Embedl team and other engineers on Discord.
Quantization gotchas, hardware questions, fine-tuning tips — bring them all.
Join our Discord →