MiA-Emb-8B ONNX

ONNX conversion of MindscapeRAG/MiA-Emb-8B for fast CPU/GPU inference.

Model Info

  • Parameters: 8B
  • Embedding Dimension: 4096
  • Max Sequence Length: 8192

Usage with ONNX Runtime

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maxiboch/MiA-Emb-8B-onnx")
session = ort.InferenceSession("model.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])  # prefer CUDA, fall back to CPU

inputs = tokenizer("Your text here", return_tensors="np", padding=True, truncation=True)
outputs = session.run(None, dict(inputs))
embeddings = outputs[0]  # first output holds the embedding tensor
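Once you have embeddings, a common next step is comparing them with cosine similarity. A minimal sketch using NumPy; the short vectors below are stand-ins for the model's 4096-dimensional outputs, and if the first ONNX output is token-level rather than pooled, apply your preferred pooling first:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Normalize both vectors to unit length, then take their dot product.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-in vectors; in practice pass two embeddings from session.run.
e1 = np.array([1.0, 0.0, 1.0])
e2 = np.array([1.0, 1.0, 0.0])
print(cosine_similarity(e1, e2))  # → 0.5
```

Normalizing first keeps the score in [-1, 1] regardless of embedding magnitude, which makes thresholds comparable across queries.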

Conversion

Converted to ONNX by @maxiboch.

Original Model

MindscapeRAG/MiA-Emb-8B