MiA-Emb-0.6B ONNX

ONNX conversion of MindscapeRAG/MiA-Emb-0.6B for fast CPU/GPU inference.

Model Info

  • Parameters: 0.6B
  • Embedding Dimension: 1024
  • Max Sequence Length: 8192

Usage with ONNX Runtime

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maxiboch/MiA-Emb-0.6B-onnx")
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU if no GPU is available
)

# Tokenize as NumPy arrays; ONNX Runtime expects int64 input tensors
inputs = tokenizer("Your text here", return_tensors="np", padding=True, truncation=True)
outputs = session.run(None, dict(inputs))
embeddings = outputs[0]
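Depending on how the model was exported, the first output may be token-level hidden states rather than pooled sentence embeddings. A minimal sketch of masked mean pooling plus cosine similarity in NumPy, in case pooling is needed (the array shapes below are illustrative, not taken from this model):

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Masked mean pooling: average token vectors, ignoring padding positions."""
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cosine_sim(a, b):
    """Cosine similarity between two batches of vectors."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

# Illustrative shapes: batch of 2, sequence length 4, hidden dim 1024
hidden = np.random.rand(2, 4, 1024).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)  # (2, 1024) sentence embeddings
```

Whether pooling and L2 normalization are required depends on the export; inspect the output shape (token-level states are 3-D, pooled embeddings 2-D) to decide.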

Conversion

Converted to ONNX by @maxiboch.
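An export like this can typically be reproduced with Hugging Face Optimum (a sketch, assuming the `optimum-cli` ONNX exporter; the output directory name is illustrative and the exact flags used for this conversion are not documented here):

```shell
pip install "optimum[exporters]" onnxruntime

# Export the original PyTorch checkpoint to ONNX for feature extraction
optimum-cli export onnx \
  --model MindscapeRAG/MiA-Emb-0.6B \
  --task feature-extraction \
  MiA-Emb-0.6B-onnx/
```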

Original Model

MindscapeRAG/MiA-Emb-0.6B
