GigaAM-v3 e2e_rnnt (OpenVINO IR, pre-converted)
OpenVINO IR port of ai-sage/GigaAM-v3, revision e2e_rnnt: Sber's state-of-the-art Russian ASR model (220M parameters, Conformer + RNN-T with end-to-end punctuation and capitalization).
Conversion done with:
from transformers import AutoModel
import torch
model = AutoModel.from_pretrained("ai-sage/GigaAM-v3", revision="e2e_rnnt", trust_remote_code=True)
model.to_onnx(dir_path="onnx", dtype=torch.float16)
# then:
import openvino as ov
for f in ["encoder", "decoder", "joint"]:
    # Convert each exported ONNX graph to OpenVINO IR (.xml + .bin)
    m = ov.convert_model(f"onnx/v3_e2e_rnnt_{f}.onnx")
    ov.save_model(m, f"v3_e2e_rnnt_{f}.xml")
Files
| File | Purpose | Size |
|---|---|---|
| v3_e2e_rnnt_encoder.xml/.bin | Conformer encoder (main cost) | ~425 MB FP16 |
| v3_e2e_rnnt_decoder.xml/.bin | RNN-T decoder (prediction network) | ~2 MB |
| v3_e2e_rnnt_joint.xml/.bin | Joint network | ~1.3 MB |
| tokenizer.model | SentencePiece vocabulary (1024 subwords) | 250 KB |
| config.json | Original model config (for reference) | 2 KB |
Device compatibility (Intel hardware)
Verified on Intel Core Ultra 9 285H (OpenVINO 2025.4.1):
| Device | Encoder | Decoder | Joint | Usable? |
|---|---|---|---|---|
| CPU | ✅ | ✅ | ✅ | Yes (~34× RTFx on 10 s chunk) |
| GPU.0 (Arc Xe2 iGPU) | ✅ | ✅ | ✅ | Yes (~520× RTFx on encoder alone) |
| NPU | ❌ (dynamic shapes) | ✅ | ❌ (dynamic shapes) | Partial only |
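The table does not define RTFx. Reading it as the inverse real-time factor (audio duration divided by processing time) is an assumption, and so is applying the GPU figure to a 10 s chunk. Under those assumptions, the throughput figures translate into rough per-chunk latencies as in this small sketch:

```python
def processing_time_s(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock time implied by an RTFx value (audio duration / processing time)."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# Figures taken from the table above; the 10 s chunk length for the GPU row is assumed.
cpu_latency = processing_time_s(10.0, 34.0)    # CPU row
gpu_encoder = processing_time_s(10.0, 520.0)   # GPU.0 row, encoder only
print(f"CPU: {cpu_latency * 1000:.0f} ms, GPU encoder: {gpu_encoder * 1000:.0f} ms")
```

These are order-of-magnitude estimates only; the measured numbers in the table are approximate to begin with.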
Recommended device: Intel Arc iGPU (GPU.0). It is the fastest option and does not compete with a discrete NVIDIA GPU for VRAM.
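One way to apply this recommendation without hard-coding the device is to fall back to CPU when no Intel GPU is present. The helper below is a hypothetical sketch, not part of this repository; it only inspects a list of device names, which in practice could come from `ov.Core().available_devices`.

```python
def pick_device(available, preferred=("GPU.0", "GPU", "CPU")):
    """Return the first preferred device name found in `available`.

    `available` is a list of OpenVINO device names. A single GPU is
    typically listed as "GPU", several as "GPU.0", "GPU.1", and so on,
    hence both spellings appear in the default preference order.
    """
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


# With OpenVINO installed, one would call:
#   pick_device(ov.Core().available_devices)
print(pick_device(["CPU", "GPU", "NPU"]))
```

NPU is deliberately left out of the default preference order because only the decoder compiles there.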
Compilation for NPU fails on the encoder and joint networks because the exported ONNX has dynamic input shapes (upper bounds reported as 9223372036854775807, i.e. INT64_MAX). Re-exporting with static shapes for fixed 10 s chunks would likely unlock NPU.
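To make the static-shape idea concrete, the sketch below works out how many feature frames a fixed 10 s chunk produces with the preprocessing stated in this card (16 kHz, 20 ms window, 10 ms hop, 64 mel bins). The `[batch, n_mels, frames]` layout and the `center` padding option are assumptions; the real input names and layout have to be read from the IR before using such a shape with something like `Model.reshape` in OpenVINO.

```python
# Upper bound reported for the dynamic dimensions: the largest signed 64-bit integer.
INT64_MAX = 2**63 - 1


def num_frames(seconds, sample_rate=16_000, win_ms=20, hop_ms=10, center=False):
    """Number of STFT frames for a clip of the given length.

    center=False: only full windows are counted.
    center=True: the signal is padded so every hop yields a frame.
    Which convention the model's preprocessor uses is an assumption here.
    """
    samples = int(round(seconds * sample_rate))
    win = sample_rate * win_ms // 1000   # 320 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000   # 160 samples at 16 kHz
    if center:
        return 1 + samples // hop
    if samples < win:
        raise ValueError("clip is shorter than one analysis window")
    return 1 + (samples - win) // hop


def static_encoder_shape(seconds, n_mels=64, **kwargs):
    """Hypothetical static input shape [batch, n_mels, frames] for one chunk."""
    return [1, n_mels, num_frames(seconds, **kwargs)]


print(static_encoder_shape(10.0))
```

Shorter audio would then need to be padded to the fixed chunk length, which is the usual trade-off of static shapes.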
Usage (Python, pure OpenVINO)
import openvino as ov
core = ov.Core()
encoder = core.compile_model("v3_e2e_rnnt_encoder.xml", "GPU.0")
decoder = core.compile_model("v3_e2e_rnnt_decoder.xml", "GPU.0")
joint = core.compile_model("v3_e2e_rnnt_joint.xml", "GPU.0")
# Preprocess: audio 16 kHz mono -> log-mel (64 bins, 20 ms win, 10 ms hop)
# Encoder: features -> encoder outputs
# Decoder + Joint: RNN-T greedy decode loop -> token IDs
# SentencePieceProcessor(tokenizer.model).decode(ids) -> text
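The greedy decode step named in the comments above can be sketched independently of OpenVINO. The function below is a minimal illustration of standard RNN-T greedy decoding, not the reference backend's code. The callables `decoder_step` and `joint_step`, the use of the blank ID as the start symbol, and the `max_symbols_per_frame` cap are all assumptions introduced for illustration.

```python
def rnnt_greedy_decode(enc_frames, decoder_step, joint_step, blank_id,
                       max_symbols_per_frame=10):
    """Greedy RNN-T decoding over a sequence of encoder frames.

    decoder_step(token_id, state) -> (dec_out, new_state)   # prediction network
    joint_step(enc_frame, dec_out) -> 1-D logits (vocabulary plus blank)
    Both callables are hypothetical wrappers around the compiled models.
    """
    token_ids = []
    # Prime the prediction network; using blank as the start symbol is an assumption.
    dec_out, state = decoder_step(blank_id, None)
    for frame in enc_frames:
        # Emit symbols for this frame until blank wins or the cap is reached.
        for _ in range(max_symbols_per_frame):
            logits = joint_step(frame, dec_out)
            best = max(range(len(logits)), key=lambda i: logits[i])
            if best == blank_id:
                break  # blank: advance to the next encoder frame
            token_ids.append(best)
            # Only non-blank symbols update the prediction network state.
            dec_out, state = decoder_step(best, state)
    return token_ids
```

In a real pipeline, `decoder_step` and `joint_step` would wrap inference calls on the compiled decoder and joint models, with tensor names and shapes taken from the IR files, and the returned IDs would go to the SentencePiece tokenizer for decoding. The blank ID must match the exported model; it is left as a parameter here because this card does not state it.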
A reference Python backend is available in the Voice Scribe project (MIT license).
Credits
- Original model: Sber / ai-sage/GigaAM-v3 (MIT)
- OpenVINO conversion: Voice Scribe project
License
MIT (matches upstream ai-sage/GigaAM-v3).