Qwen3-Embedding-8B

Multi-format version of Qwen/Qwen3-Embedding-8B, optimized for deployment.

Model Information

Property           Value
-----------------  -----------------------
Base Model         Qwen/Qwen3-Embedding-8B
Task               feature-extraction
Type               Text Model
Trust Remote Code  True

Available Versions

Folder             Format        Description                            Size
-----------------  ------------  -------------------------------------  ---------
safetensors-fp16/  PyTorch FP16  GPU inference, ~50% smaller than FP32  14449 MB
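As a sanity check on the quoted size: FP16 stores two bytes per parameter, so the folder size implies the rough parameter count (a back-of-envelope sketch, nothing model-specific assumed beyond the table above):

```python
# FP16 = 2 bytes per parameter, so the 14449 MB (MiB) safetensors-fp16/
# folder implies roughly this many parameters:
size_mib = 14449
params = size_mib * 2**20 / 2
print(f"~{params / 1e9:.2f}B parameters")  # ~7.58B
```

This also explains the "~50% smaller" claim: FP32 weights would take twice the bytes per parameter.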

Usage

PyTorch (GPU)

from transformers import AutoModel, AutoTokenizer
import torch

# GPU inference with FP16
model = AutoModel.from_pretrained(
    "n24q02m/Qwen3-Embedding-8B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16, trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Embedding-8B",
    subfolder="safetensors-fp16", trust_remote_code=True
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
    # Mean pooling over non-padding tokens only
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
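Pooled embeddings are typically L2-normalized before comparison, so the dot product becomes cosine similarity. A minimal sketch with dummy tensors standing in for model outputs (the 4096 embedding width is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

# Dummy embeddings standing in for the model's pooled outputs
query = torch.randn(1, 4096)  # assumption: 4096-dim embeddings
docs = torch.randn(3, 4096)   # three candidate documents

# L2-normalize so the dot product equals cosine similarity
query_n = F.normalize(query, p=2, dim=1)
docs_n = F.normalize(docs, p=2, dim=1)

scores = query_n @ docs_n.T   # (1, 3) cosine similarities in [-1, 1]
best = scores.argmax(dim=1)   # index of the closest document
```

The same two lines of normalization plus a matrix product work unchanged on real embeddings produced by the snippet above.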

Notes

  1. SafeTensors FP16 is the primary format for GPU inference.
  2. Load the tokenizer from the same subfolder as the model weights.

License

Apache 2.0 (following the base model's license)
