GigaAM-v3 e2e_rnnt (OpenVINO IR, pre-converted)

OpenVINO IR port of ai-sage/GigaAM-v3 (revision e2e_rnnt), Sber's SOTA Russian ASR model (220M parameters, Conformer + RNN-T with end-to-end punctuation and capitalization).

Conversion done with:

from transformers import AutoModel
import torch
model = AutoModel.from_pretrained("ai-sage/GigaAM-v3", revision="e2e_rnnt", trust_remote_code=True)
model.to_onnx(dir_path="onnx", dtype=torch.float16)
# Then convert each exported ONNX graph to OpenVINO IR (.xml + .bin):
import openvino as ov
for f in ["encoder", "decoder", "joint"]:
    m = ov.convert_model(f"onnx/v3_e2e_rnnt_{f}.onnx")
    ov.save_model(m, f"v3_e2e_rnnt_{f}.xml")

Files

| File | Purpose | Size |
|---|---|---|
| v3_e2e_rnnt_encoder.xml/.bin | Conformer encoder (main cost) | ~425 MB FP16 |
| v3_e2e_rnnt_decoder.xml/.bin | RNN-T decoder (prediction network) | ~2 MB |
| v3_e2e_rnnt_joint.xml/.bin | Joint network | ~1.3 MB |
| tokenizer.model | SentencePiece vocabulary (1024 subwords) | 250 KB |
| config.json | Original model config (for reference) | 2 KB |

Device compatibility (Intel hardware)

Verified on Intel Core Ultra 9 285H (OpenVINO 2025.4.1):

| Device | Encoder | Decoder | Joint | Usable? |
|---|---|---|---|---|
| CPU | ✅ | ✅ | ✅ | Yes (~34× RTFx on 10 s chunk) |
| GPU.0 (Arc Xe2 iGPU) | ✅ | ✅ | ✅ | Yes (~520× RTFx on encoder alone) |
| NPU | ❌ (dynamic shapes) | ✅ | ❌ (dynamic shapes) | Partial only |
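The RTFx figures in the table are inverse real-time factors: seconds of audio processed per second of wall-clock time. As a rough sketch of how such a number can be measured (the helper names `rtfx` and `measure_rtfx` are illustrative and not part of this repository, and the benchmark procedure behind the table is not documented here):

```python
import time


def rtfx(audio_seconds: float, elapsed_seconds: float) -> float:
    """Inverse real-time factor: audio duration divided by processing time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds


def measure_rtfx(run, audio_seconds: float, warmup: int = 1, repeats: int = 5) -> float:
    """Time `run()` (any zero-argument callable, e.g. one inference call)
    and return RTFx based on the fastest of `repeats` timed runs."""
    for _ in range(warmup):
        run()  # warm-up runs absorb one-time costs such as kernel compilation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return rtfx(audio_seconds, best)


# At ~34x RTFx, a 10 s chunk takes roughly 10 / 34 seconds to process.
print(round(10.0 / 34, 2))  # 0.29
```

Taking the fastest run rather than the mean is one possible convention; it reduces noise from background load but reports a best case.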

Recommended device: Intel Arc iGPU (GPU.0). It is the fastest option and does not compete for VRAM with a discrete NVIDIA GPU, if one is present.
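A minimal sketch of applying that recommendation with a fallback, assuming the three IR files sit in one directory. `pick_device` and `compile_all` are hypothetical helper names. Depending on how many GPUs are present, OpenVINO may report the iGPU as `GPU` or as `GPU.0`, so both are checked before falling back to CPU.

```python
def pick_device(available, preferred=("GPU.0", "GPU", "CPU")):
    """Return the first entry of `preferred` that appears in `available`."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


def compile_all(model_dir="."):
    """Compile encoder, decoder and joint on the best available device.

    Needs the `openvino` package; it is imported lazily so that
    pick_device() stays usable without it.
    """
    import openvino as ov

    core = ov.Core()
    device = pick_device(core.available_devices)
    parts = ("encoder", "decoder", "joint")
    compiled = {
        part: core.compile_model(f"{model_dir}/v3_e2e_rnnt_{part}.xml", device)
        for part in parts
    }
    return device, compiled
```

NPU is deliberately left out of the preference list, since only the decoder compiles there.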

NPU compilation fails for the encoder and the joint network because the exported ONNX graphs have dynamic input shapes (upper bound 9223372036854775807, the int64 maximum, i.e. effectively unbounded). A re-export with a static reshape to 10 s chunks would likely unlock the NPU.
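One way to experiment with static shapes without re-exporting is to pin the IR inputs with `Model.reshape` and feed fixed-length audio chunks. The sketch below is untested on NPU and swaps the re-export for an in-place IR reshape. The helper names are hypothetical, and the input names and shapes passed in `static_shapes` are not documented here, so they have to be read off the model first.

```python
def split_into_fixed_chunks(samples, sample_rate=16000, chunk_seconds=10.0):
    """Split a 1-D sample sequence into equal-length chunks.

    The last chunk is zero-padded. Returns (chunks, valid_lengths) so the
    padding can be ignored after decoding.
    """
    chunk_len = int(round(sample_rate * chunk_seconds))
    chunks, lengths = [], []
    for start in range(0, len(samples), chunk_len):
        piece = list(samples[start:start + chunk_len])
        lengths.append(len(piece))
        piece.extend([0.0] * (chunk_len - len(piece)))  # pad to fixed size
        chunks.append(piece)
    return chunks, lengths


def reshape_to_static(xml_path, static_shapes):
    """Read an IR, print its current input shapes, then pin them.

    `static_shapes` maps input name -> list of ints (assumption: the
    caller has inspected the printed names and chosen matching shapes).
    """
    import openvino as ov  # lazy import; only needed for this helper

    model = ov.Core().read_model(xml_path)
    for inp in model.inputs:
        print(inp.get_any_name(), inp.get_partial_shape())
    model.reshape(static_shapes)
    return model
```

At 16 kHz, a 10 s chunk is 160000 samples, so every chunk produced with the defaults has that length.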

Usage (Python, pure OpenVINO)

import openvino as ov
core = ov.Core()
encoder = core.compile_model("v3_e2e_rnnt_encoder.xml", "GPU.0")
decoder = core.compile_model("v3_e2e_rnnt_decoder.xml", "GPU.0")
joint   = core.compile_model("v3_e2e_rnnt_joint.xml",   "GPU.0")

# Preprocess: audio 16 kHz mono -> log-mel (64 bins, 20 ms win, 10 ms hop)
# Encoder: features -> encoder outputs
# Decoder + Joint: RNN-T greedy decode loop -> token IDs
# sentencepiece.SentencePieceProcessor(model_file="tokenizer.model").decode(ids) -> text
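The decode step outlined in the comments above can be sketched as a generic RNN-T greedy loop. This is not the Voice Scribe implementation. `predict` and `joint` are hypothetical callables that would wrap the compiled decoder and joint models, and the blank ID, the initial state, and priming the prediction network with blank are all assumptions to check against config.json.

```python
def rnnt_greedy_decode(enc_frames, predict, joint, blank_id,
                       initial_state=None, max_symbols_per_frame=10):
    """Greedy RNN-T decoding.

    enc_frames: iterable of per-frame encoder outputs.
    predict(token_id, state) -> (pred_out, new_state): prediction network step.
    joint(enc_frame, pred_out) -> sequence of scores over the vocabulary
        (including blank).
    Returns the list of emitted (non-blank) token IDs.
    """
    hypothesis = []
    # Assumption: the prediction network is primed with the blank token.
    pred_out, state = predict(blank_id, initial_state)
    for frame in enc_frames:
        emitted = 0
        # Several tokens may be emitted per frame; the cap prevents a
        # model that never predicts blank from looping forever.
        while emitted < max_symbols_per_frame:
            scores = joint(frame, pred_out)
            token = max(range(len(scores)), key=scores.__getitem__)  # argmax
            if token == blank_id:
                break  # blank: move on to the next encoder frame
            hypothesis.append(token)
            # Only non-blank tokens advance the prediction network.
            pred_out, state = predict(token, state)
            emitted += 1
    return hypothesis
```

The resulting IDs are what gets passed to the SentencePiece tokenizer for conversion to text.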

A reference Python backend is available in the Voice Scribe project (MIT license).

Credits

License

MIT (matches upstream ai-sage/GigaAM-v3).
