jina-embeddings-v3

Multi-format version of jinaai/jina-embeddings-v3, repackaged for deployment.

Model Information

Property            Value
Base Model          jinaai/jina-embeddings-v3
Task                jina-embedding
Type                Text Model
Trust Remote Code   True

Available Versions

Folder              Format         Description                   Size
safetensors-fp32/   PyTorch FP32   Baseline, highest accuracy    2200 MB
safetensors-fp16/   PyTorch FP16   GPU inference, ~50% smaller   1108 MB
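
The FP32 weights load the same way as the FP16 example below; a minimal sketch for CPU inference or an accuracy baseline (same repository layout assumed):

from transformers import AutoModel

# Full-precision baseline, e.g. for CPU inference or accuracy comparisons
model_fp32 = AutoModel.from_pretrained(
    "n24q02m/jina-embeddings-v3",
    subfolder="safetensors-fp32",
    trust_remote_code=True
)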

Usage

PyTorch (GPU)

from transformers import AutoModel, AutoTokenizer
import torch

# GPU inference with FP16
model = AutoModel.from_pretrained(
    "n24q02m/jina-embeddings-v3",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
# The custom encode() method tokenizes internally; the tokenizer is only
# needed if you want to tokenize manually.
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/jina-embeddings-v3",
    subfolder="safetensors-fp16",
    trust_remote_code=True
)

# Task options: "retrieval.query", "retrieval.passage", "separation", "classification", "text-matching"
task = "retrieval.query"

# Inference: encode() selects the LoRA adapter for the given task and
# returns one embedding per input text
texts = ["Hello world", "How are you?"]
with torch.no_grad():
    embeddings = model.encode(texts, task=task)
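
In the upstream implementation, encode() returns a NumPy array, which the quick similarity check below assumes:

import numpy as np

# Cosine similarity between the two example embeddings above
a, b = embeddings[0], embeddings[1]
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))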

Notes

  1. SafeTensors FP16 is the primary format for GPU inference; FP32 is the full-precision baseline.
  2. The model ships with 5 task-specific LoRA adapters, selected via the task argument (see the retrieval sketch after this list):
    • retrieval.query: embeds queries for asymmetric retrieval
    • retrieval.passage: embeds passages/documents for asymmetric retrieval
    • separation: embeds texts for clustering and re-ranking
    • classification: embeds texts for classification tasks
    • text-matching: embeds texts for symmetric semantic similarity (e.g., STS)
  3. trust_remote_code=True is required because encode() and the adapter switching live in the repository's custom modeling code.
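
As an illustration of the asymmetric retrieval adapters, a minimal sketch reusing the model loaded in the Usage section (the example corpus is made up):

import numpy as np

query = "How do I reset my password?"
passages = [
    "To reset your password, open Settings and choose Security.",
    "Our office hours are 9am to 5pm on weekdays.",
]

# Encode each side with its matching LoRA adapter
q_emb = model.encode([query], task="retrieval.query")[0]
p_embs = model.encode(passages, task="retrieval.passage")

# Rank passages by cosine similarity against the query
scores = p_embs @ q_emb / (np.linalg.norm(p_embs, axis=1) * np.linalg.norm(q_emb))
print(passages[int(np.argmax(scores))])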

License

CC BY-NC 4.0 (following the base model's license)

Credits

Based on jinaai/jina-embeddings-v3 by Jina AI.