metadata
library_name: mlx
license: mit
language:
  - ru
  - en
tags:
  - automatic-speech-recognition
  - mlx
  - apple-silicon
  - russian
  - gigaam
  - conformer
  - ctc
base_model: ai-sage/GigaAM-v3
pipeline_tag: automatic-speech-recognition
model-index:
  - name: GigaAM-v3-e2e-ctc-mlx
    results:
      - task:
          type: automatic-speech-recognition
        metrics:
          - name: RTF (M2 Max)
            type: rtf
            value: 0.006

GigaAM v3 e2e CTC — MLX

MLX port of GigaAM-v3 (end-to-end CTC variant) for fast Russian speech recognition on Apple Silicon. On an M2 Max it runs at about 180x realtime (RTF about 0.006).

Usage

pip install gigaam-mlx

from gigaam_mlx import load_model, transcribe

model, tokenizer = load_model()  # downloads weights automatically
text = transcribe(model, tokenizer, "recording.wav")
print(text)

Or via CLI:

gigaam-mlx recording.wav

Performance

MacBook Pro M2 Max, 20-second chunk:

Backend            Time    Realtime
MLX CTC (this)     0.11s   180x
PyTorch MPS RNNT   0.76s   26x
ONNX CPU CTC       1.66s   12x
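
The "Realtime" column is audio duration divided by processing time, and the RTF in the metadata is its inverse. The sketch below shows one way to compute both and to time a call yourself. The helper names (`wav_duration_seconds`, `realtime_stats`, `time_call`) are hypothetical and not part of gigaam-mlx; only the standard library is used.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def realtime_stats(processing_seconds, audio_seconds):
    """Return (rtf, speedup) for one transcription run.

    rtf     = processing time / audio duration (lower is faster)
    speedup = audio duration / processing time (the "Realtime" column)
    """
    rtf = processing_seconds / audio_seconds
    return rtf, 1.0 / rtf


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Recompute the first table row from its reported numbers.
rtf, speedup = realtime_stats(processing_seconds=0.11, audio_seconds=20.0)
print(f"RTF {rtf:.4f}, {speedup:.0f}x realtime")
```

To measure on your own machine, something like `text, elapsed = time_call(transcribe, model, tokenizer, "recording.wav")` followed by `realtime_stats(elapsed, wav_duration_seconds("recording.wav"))` should work. Running one untimed warm-up call first is probably worthwhile so that one-time setup cost is not counted.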

Model

  • Architecture: Conformer encoder (16 layers, model dimension 768, 16 attention heads, rotary position embeddings) + CTC
  • Parameters: 220M
  • Vocabulary: 257 tokens (SentencePiece)
  • Features: Punctuation, text normalization, Russian + English code-switching
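
For readers unfamiliar with CTC, the following is a minimal sketch of greedy CTC decoding: take the best-scoring token per frame, merge consecutive repeats, and drop blanks. It is a generic illustration, not the decoder shipped in gigaam-mlx. The function name and the position of the blank id are assumptions, since the card does not say which of the 257 vocabulary entries (if any) is the blank.

```python
def ctc_greedy_decode(frame_scores, blank_id):
    """Collapse per-frame scores into a token id sequence (greedy CTC).

    frame_scores: list of per-frame score lists, one score per vocabulary entry.
    blank_id: index of the CTC blank token (an assumption; check the real model).
    """
    # Step 1: pick the best-scoring token id for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    # Step 2: merge consecutive repeats and drop blanks.
    decoded = []
    previous = None
    for token_id in best_path:
        if token_id != previous and token_id != blank_id:
            decoded.append(token_id)
        previous = token_id
    return decoded


# Toy example: 4-entry vocabulary where id 3 plays the role of the blank.
toy_scores = [
    [0.1, 0.7, 0.1, 0.1],  # best: 1
    [0.1, 0.6, 0.2, 0.1],  # best: 1 (repeat, merged)
    [0.0, 0.1, 0.1, 0.8],  # best: blank
    [0.1, 0.8, 0.0, 0.1],  # best: 1 (kept, because a blank separates it)
    [0.2, 0.1, 0.6, 0.1],  # best: 2
]
print(ctc_greedy_decode(toy_scores, blank_id=3))  # → [1, 1, 2]
```

In a real pipeline the resulting ids would then be passed to the SentencePiece tokenizer to produce text.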

Links