Qwen3-Embedding-8B

Multi-format version of Qwen/Qwen3-Embedding-8B, optimized for deployment.

Model Information

Property           Value
-----------------  -----------------------
Base Model         Qwen/Qwen3-Embedding-8B
Task               feature-extraction
Type               Text Model
Trust Remote Code  True

Available Versions

Folder             Format        Description                            Size
-----------------  ------------  -------------------------------------  ---------
safetensors-fp16/  PyTorch FP16  GPU inference, ~50% smaller than FP32  14449 MB
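As a sanity check on the quoted size: FP16 stores two bytes per parameter, so the folder size implies the rough parameter count (a back-of-envelope sketch, nothing model-specific assumed beyond the table above):

```python
# FP16 = 2 bytes per parameter, so the 14449 MB (MiB) safetensors-fp16/
# folder implies roughly this many parameters:
size_mib = 14449
params = size_mib * 2**20 / 2
print(f"~{params / 1e9:.2f}B parameters")  # ~7.58B
```

This also explains the "~50% smaller" claim: FP32 weights would take twice the bytes per parameter.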

Usage

PyTorch (GPU)

from transformers import AutoModel, AutoTokenizer
import torch

# GPU inference with FP16
model = AutoModel.from_pretrained(
    "n24q02m/Qwen3-Embedding-8B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16, trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Embedding-8B",
    subfolder="safetensors-fp16", trust_remote_code=True
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
    # Mean pooling over non-padding tokens only
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
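Pooled embeddings are typically L2-normalized before comparison, so the dot product becomes cosine similarity. A minimal sketch with dummy tensors standing in for model outputs (the 4096 embedding width is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

# Dummy embeddings standing in for the model's pooled outputs
query = torch.randn(1, 4096)  # assumption: 4096-dim embeddings
docs = torch.randn(3, 4096)   # three candidate documents

# L2-normalize so the dot product equals cosine similarity
query_n = F.normalize(query, p=2, dim=1)
docs_n = F.normalize(docs, p=2, dim=1)

scores = query_n @ docs_n.T   # (1, 3) cosine similarities in [-1, 1]
best = scores.argmax(dim=1)   # index of the closest document
```

The same two lines of normalization plus a matrix product work unchanged on real embeddings produced by the snippet above.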

Notes

  1. SafeTensors FP16 is the primary format for GPU inference.
  2. Load the tokenizer from the same subfolder as the model weights.

License

Apache 2.0 (following the base model's license)
