metadata
language:
  - fa
license: cc-by-nc-4.0
library_name: nemo
pipeline_tag: automatic-speech-recognition
base_model:
  - Reza2kn/Shenava-Koochik-v1.0
base_model_relation: finetune
tags:
  - automatic-speech-recognition
  - speech
  - persian
  - farsi
  - fastconformer
  - ctc
  - streaming
  - on-device
  - shenava
  - shenava-1
  - visualears
  - rnnt
  - nemo
  - distillation
metrics:
  - wer
  - cer
datasets:
  - Reza2kn/visualears-persian-asr-16k
  - Reza2kn/visualears-golden-6669
  - Reza2kn/fleurs-fa-benchmark

Shenava — Rizeh v1.0 (32M) · Persian streaming ASR

Rizeh (ریزه, “tiny”) is the 32M mid‑tier of the Shenava‑1 family: a FastConformer Hybrid RNNT/CTC model distilled (logit + feature KD) from the 114M Koochik v1.0 teacher. The CTC head is the one deployed. This repo holds the fp32 NeMo source; quantized formats are published in their own repos.
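A minimal sketch of loading the fp32 checkpoint in NeMo and decoding with the CTC head. The repo id `Reza2kn/Shenava-Rizeh-v1.0`, the helper name `transcribe_ctc`, and the assumption that the checkpoint can be pulled directly by `from_pretrained` are not confirmed by this card; if the hub lookup does not work for this repo, downloading the `.nemo` file and calling `ASRModel.restore_from` is the usual alternative. The return type of `transcribe` differs across NeMo versions, so the sketch accepts both strings and hypothesis objects.

```python
# Assumption: this repo id mirrors the base_model naming; adjust if it differs.
REPO_ID = "Reza2kn/Shenava-Rizeh-v1.0"
# Context used for the benchmark numbers in this card.
ATT_CONTEXT_SIZE = [70, 13]


def transcribe_ctc(audio_paths, repo_id=REPO_ID, att_context_size=ATT_CONTEXT_SIZE):
    """Transcribe 16 kHz mono audio files with the CTC head (hypothetical helper)."""
    # Imported lazily so the module can be inspected without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=repo_id)
    model.eval()

    # Hybrid RNNT/CTC models default to the RNNT decoder; switch to CTC.
    model.change_decoding_strategy(decoder_type="ctc")

    # Pick one of the multi-latency contexts the encoder was trained with.
    model.encoder.set_default_att_context_size(list(att_context_size))

    results = model.transcribe(list(audio_paths))
    # Some NeMo versions return (best_hypotheses, all_hypotheses).
    if isinstance(results, tuple):
        results = results[0]
    # Entries may be plain strings or Hypothesis objects with a .text field.
    return [getattr(r, "text", r) for r in results]
```

Calling `transcribe_ctc(["sample.wav"])` would return one spoken‑form transcript per file; `sample.wav` is a placeholder path.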

The Shenava‑1 family

A knowledge‑distillation cascade of on‑device Persian ASR models, with one teacher distilled down to a 6.9M student. This model is one member; its siblings are Koochik v1.0 (114M) and Rizeh Pizeh v1.0 (6.9M).

Benchmark — fair WER/CER

Scores are computed after ITN and Persian‑digit normalization (the double‑benchmark convention), with decoding at att_context_size=[70,13].

| Member | golden‑6669 WER | golden‑6669 CER | FLEURS‑fa WER | FLEURS‑fa CER |
|---|---|---|---|---|
| Koochik v1.0 (114M) | 7.49% | 2.30% | 10.64% | 3.79% |
| Rizeh v1.0 (32M) | 12.11% | 3.94% | 14.45% | 5.10% |
| Rizeh Pizeh v1.0 (6.9M) | 24.55% | 8.89% | 26.95% | 10.22% |
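For readers who want to score their own transcripts in a comparable way, here is a rough sketch of corpus‑level WER/CER with a digit normalizer. It is not the card's actual scoring script: the function names are illustrative, only Persian and Arabic‑Indic digits are mapped to ASCII, the ITN step is left out, and counting spaces as characters in CER is an assumption.

```python
# Map Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits to ASCII.
PERSIAN_DIGITS = "".join(chr(0x06F0 + d) for d in range(10))
ARABIC_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
DIGIT_MAP = str.maketrans(PERSIAN_DIGITS + ARABIC_DIGITS, "0123456789" * 2)


def normalize(text):
    """Unify digit forms and collapse whitespace (ITN is NOT done here)."""
    return " ".join(text.translate(DIGIT_MAP).split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def _error_rate(refs, hyps, split):
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(normalize(ref)), split(normalize(hyp))
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


def wer(refs, hyps):
    """Corpus-level word error rate: total word edits / total reference words."""
    return _error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    """Corpus-level character error rate (spaces counted; an assumption)."""
    return _error_rate(refs, hyps, list)
```

Both rates are pooled over the whole test set rather than averaged per utterance, which is the more common convention for reported WER.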

Koochik v1.0 ranks #2 on the public double‑benchmark, behind only cloud Gemini. That makes it the best on‑device Persian ASR on that benchmark, beating a 1.5B Whisper‑Persian by >2× in WER at about 1/13 the size.

Quantized formats (own repos)

Architecture: 32M parameters, d_model 256, 16 layers, ×8 subsampling, multi‑latency att_context_size choices [[70,13],[70,6],[70,1],[70,0]].
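The latency implied by each context can be worked out with simple arithmetic. The sketch below assumes a 10 ms feature stride (the usual NeMo preprocessor default, not stated in this card), so that ×8 subsampling gives one encoder frame per 80 ms; each `[left, right]` pair is then read as encoder frames of past context and look‑ahead.

```python
FEATURE_STRIDE_MS = 10  # assumption: default NeMo mel-spectrogram stride
SUBSAMPLING = 8         # from the card: x8 subsampling
ENCODER_STEP_MS = FEATURE_STRIDE_MS * SUBSAMPLING  # 80 ms per encoder frame

MULTI_LATENCY = [[70, 13], [70, 6], [70, 1], [70, 0]]


def lookahead_ms(att_context_size):
    """Future audio the encoder waits for before emitting a frame."""
    _, right = att_context_size
    return right * ENCODER_STEP_MS


def chunk_ms(att_context_size):
    """Audio consumed per streaming step: the current frame plus look-ahead."""
    _, right = att_context_size
    return (right + 1) * ENCODER_STEP_MS


def left_context_ms(att_context_size):
    """Past audio visible to self-attention."""
    left, _ = att_context_size
    return left * ENCODER_STEP_MS


for ctx in MULTI_LATENCY:
    print(ctx, lookahead_ms(ctx), chunk_ms(ctx), left_context_ms(ctx))
```

Under these assumptions the four contexts correspond to look‑aheads of 1040, 480, 80 and 0 ms, each with 5600 ms of left context.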

Tokenizer: ve_tok_v4 (SentencePiece BPE‑1024 +blank, digit/punct/«»‑aware). The model outputs numbers in spoken form; apply ITN at display time to render digits. Part of VisualEars / Shenava.
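To illustrate what display‑time ITN means here, the toy sketch below rewrites a few spoken‑form Persian number words as digits. It is not the project's ITN: the lexicon is partial and hypothetical (units, tens, and one hundred only), the function names are made up, and ambiguity is ignored (for example, "نه" is also the word for "no").

```python
# Partial, hypothetical lexicon; a real ITN grammar covers far more.
NUM_WORDS = {
    "یک": 1, "دو": 2, "سه": 3, "چهار": 4, "پنج": 5,
    "شش": 6, "هفت": 7, "هشت": 8, "نه": 9,
    "بیست": 20, "سی": 30, "چهل": 40, "پنجاه": 50,
    "شصت": 60, "هفتاد": 70, "هشتاد": 80, "نود": 90,
    "صد": 100,
}
CONNECTOR = "و"  # "and", which joins the parts of a compound number


def _can_follow(last, nxt):
    """Allow only descending magnitudes: hundreds, then tens, then units."""
    if last >= 100:
        return nxt < 100
    if last >= 20:
        return nxt < 10
    return False


def spoken_to_digits(text):
    """Replace runs like '<tens> و <unit>' with their ASCII-digit value."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        value = NUM_WORDS.get(tokens[i])
        if value is None:
            out.append(tokens[i])
            i += 1
            continue
        total = last = value
        i += 1
        # Absorb "و <number word>" pairs while the magnitude keeps descending.
        while (i + 1 < len(tokens)
               and tokens[i] == CONNECTOR
               and tokens[i + 1] in NUM_WORDS
               and _can_follow(last, NUM_WORDS[tokens[i + 1]])):
            last = NUM_WORDS[tokens[i + 1]]
            total += last
            i += 2
        out.append(str(total))
    return " ".join(out)
```

With this lexicon, a transcript containing "twenty and three" in Persian is rendered with the digits 23, while "two and three" stays as two separate numbers because the magnitudes do not descend.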