Tarka-Embedding-150M-V1 (ONNX)

ONNX version of Tarka-AIR/Tarka-Embedding-150M-V1.

  • Embedding dimension: 768
  • Context length: 2048
  • Model size: ~600MB

Usage

With ONNX Runtime (Python)

import numpy as np
import onnxruntime as ort
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer

session = ort.InferenceSession("tarka-150m-v1-onnx/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("permutans/Tarka-Embedding-150M-V1-ONNX")

texts = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

embeddings = []
for text in texts:
    inputs = tokenizer(text, return_tensors="np")
    # The model has two outputs; the second is the pooled sentence embedding.
    _, sentence_embedding = session.run(
        None,
        {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
    )
    embeddings.append(sentence_embedding[0])

embeddings = np.array(embeddings)
print(embeddings.shape)  # (3, 768)

# Compute pairwise cosine similarities
similarities = cosine_similarity(embeddings)
print(similarities)
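If you would rather avoid the scikit-learn dependency, cosine similarity is just a matrix product of L2-normalized rows. A minimal NumPy equivalent (the function name `cosine_similarity_np` is illustrative, not part of any package):

```python
import numpy as np

def cosine_similarity_np(embs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a 2-D array."""
    # Normalize each row to unit length, then take dot products of all pairs.
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return normed @ normed.T
```

For unit-normalized embeddings this reduces to a single matrix multiply, which is also how you would score a query against a large pre-normalized corpus.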

With FastEmbed (Rust)

Compatible with fastembed-rs for high-performance embedding generation.

Model Outputs

  • token_embeddings: Token-level embeddings (batch_size, sequence_length, 768)
  • sentence_embedding: Pooled sentence embeddings (batch_size, 768) - use this for most tasks
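The card does not state which pooling the graph applies internally. If you ever need to pool `token_embeddings` yourself (for example after batched, padded inference), a masked mean pool is a common sketch; this helper is an assumption for illustration, not the model's confirmed pooling method:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding.

    token_embeddings: (batch, seq_len, dim), attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts
```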

Performance

This ONNX export works with both CPU and CUDA execution providers for flexible deployment.
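To select an execution provider explicitly, pass a `providers` list when creating the session; ONNX Runtime tries them in order, so listing the CPU provider last gives a fallback when the CUDA provider is unavailable (a minimal sketch, assuming the `onnxruntime-gpu` package is installed for CUDA use):

```python
import onnxruntime as ort

# Prefer CUDA, fall back to CPU; providers are tried in list order.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
session = ort.InferenceSession("tarka-150m-v1-onnx/model.onnx", providers=providers)

# Shows which providers were actually activated for this session.
print(session.get_providers())
```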
