	---
	language:
	- fa
	license: cc-by-nc-4.0
	library_name: nemo
	pipeline_tag: automatic-speech-recognition
	base_model:
	- Reza2kn/Shenava-Koochik-v1.0
	base_model_relation: finetune
	tags:
	- automatic-speech-recognition
	- speech
	- persian
	- farsi
	- fastconformer
	- ctc
	- streaming
	- on-device
	- shenava
	- shenava-1
	- visualears
	- rnnt
	- nemo
	- distillation
	metrics:
	- wer
	- cer
	datasets:
	- Reza2kn/visualears-persian-asr-16k
	- Reza2kn/visualears-golden-6669
	- Reza2kn/fleurs-fa-benchmark
	---

	# Shenava — Rizeh v1.0 (32M) · Persian streaming ASR

	Rizeh (ریزه, “tiny”) is the 32M‑parameter mid‑tier of the Shenava‑1 family: a FastConformer Hybrid RNNT/CTC model distilled (logit + feature KD) from the [Koochik v1.0 114M teacher](https://huggingface.co/Reza2kn/Shenava-Koochik-v1.0). The CTC head is the one deployed. This repo holds the fp32 NeMo source checkpoint; quantized builds live in their own repos, listed below.
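For readers who want to try the fp32 checkpoint, the following is a minimal sketch of loading it in NeMo and selecting the CTC head. It assumes the `.nemo` file has already been downloaded; the local file name `Shenava-Rizeh-v1.0.nemo`, the helper names, and the 16 kHz mono input format are illustrative assumptions, not something this card specifies. The `[70, 13]` context is the one used for the benchmark figures in this card.

```python
# Hypothetical local path to the .nemo checkpoint downloaded from this repo.
NEMO_CHECKPOINT = "Shenava-Rizeh-v1.0.nemo"
# [left, right] attention context in encoder frames, as used for the benchmark.
ATT_CONTEXT = [70, 13]


def load_rizeh(checkpoint_path=NEMO_CHECKPOINT, att_context=ATT_CONTEXT):
    """Restore the hybrid model, switch to the CTC head, and pin the context."""
    # Imported lazily so this file can be imported without NeMo installed.
    from nemo.collections.asr.models import ASRModel

    model = ASRModel.restore_from(restore_path=checkpoint_path)
    # Hybrid RNNT/CTC models decode with RNNT by default; select the CTC head.
    model.change_decoding_strategy(decoder_type="ctc")
    # Multi-latency FastConformer encoders let you choose one trained context.
    model.encoder.set_default_att_context_size(att_context_size=att_context)
    model.eval()
    return model


def transcribe(model, wav_paths):
    """Transcribe audio files (assumed 16 kHz mono WAV).

    The return type (plain strings or hypothesis objects) depends on the
    installed NeMo version, so the raw result is returned unchanged.
    """
    return model.transcribe(wav_paths)
```

Typical use would be `transcribe(load_rizeh(), ["sample.wav"])`, where `sample.wav` is a placeholder. The output is spoken-form text, so any digit rendering happens afterwards.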

	## The Shenava‑1 family

	A knowledge‑distillation cascade of on‑device Persian ASR models — one teacher distilled down to a 6.9M student. This model is one member; its siblings:

	- [`Reza2kn/Shenava-Koochik-v1.0`](https://huggingface.co/Reza2kn/Shenava-Koochik-v1.0): Koochik v1.0 (114M) · teacher / flagship; on-device WER record
	- [`Reza2kn/Shenava-Rizeh-v1.0`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0): Rizeh v1.0 (32M) · mid-tier student ◀ this model
	- [`Reza2kn/Shenava-Rizeh-Pizeh-v1.0`](https://huggingface.co/Reza2kn/Shenava-Rizeh-Pizeh-v1.0): Rizeh Pizeh v1.0 (6.9M) · tiniest; real-time on a 2015 Cortex-A7

	## Benchmark — fair WER/CER

	Scores are computed after ITN and Persian‑digit normalization (the [double‑benchmark](https://huggingface.co/spaces/Reza2kn/persian-asr-double-benchmark) convention), with decoding at `att_context_size=[70,13]`.

	| Member | golden‑6669 WER | golden‑6669 CER | FLEURS‑fa WER | FLEURS‑fa CER |
	|---|---|---|---|---|
	| Koochik v1.0 (114M) | 7.49% | 2.30% | 10.64% | 3.79% |
	| Rizeh v1.0 (32M) | 12.11% | 3.94% | 14.45% | 5.10% |
	| Rizeh Pizeh v1.0 (6.9M) | 24.55% | 8.89% | 26.95% | 10.22% |

	Koochik v1.0 ranks #2 on the public double‑benchmark, behind only cloud‑hosted Gemini. That makes it the best on‑device Persian ASR on that benchmark: it reaches less than half the WER of a 1.5B Whisper‑Persian model at about 1/13 the size.
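To make the WER/CER metrics concrete, here is a rough sketch of how a word or character error rate can be computed after digit normalization. It is not the double-benchmark's own scoring code: the `normalize` function only unifies Persian and Arabic-Indic digits with ASCII and collapses whitespace, and it does not reproduce the ITN step. All function names are illustrative.

```python
# Map Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits to ASCII.
_DIGITS = {0x06F0 + i: str(i) for i in range(10)}
_DIGITS.update({0x0660 + i: str(i) for i in range(10)})


def normalize(text):
    """Unify digit scripts and collapse whitespace (no ITN in this sketch)."""
    return " ".join(text.translate(_DIGITS).split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using one rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or free match)
            ))
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate for one utterance; the reference must be non-empty."""
    ref_words = normalize(ref).split()
    return edit_distance(ref_words, normalize(hyp).split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate for one utterance; spaces count as characters."""
    ref_chars = list(normalize(ref))
    return edit_distance(ref_chars, list(normalize(hyp))) / len(ref_chars)
```

Corpus-level figures such as those in the table are normally obtained by summing edit distances and reference lengths over all utterances before dividing, not by averaging per-utterance rates.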

	## Quantized formats (own repos)

	- [`Shenava-Rizeh-v1.0-ONNX-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0-ONNX-fp16)
	- [`Shenava-Rizeh-v1.0-CoreML-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0-CoreML-fp16)

	Architecture: 32M parameters, d_model 256, 16 layers, ×8 subsampling, multi‑latency `att_context_size` options `[[70,13],[70,6],[70,1],[70,0]]`.
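The context pairs can be translated into time with some simple arithmetic. The sketch below assumes a 10 ms feature hop, which this card does not state (it is a common default for NeMo FastConformer configs); with ×8 subsampling that gives 80 ms per encoder frame. Each pair is read as `[left, right]` context in encoder frames.

```python
FEATURE_HOP_MS = 10  # assumption: mel-feature hop, not stated in this card
SUBSAMPLING = 8      # from the card: x8 subsampling
CONTEXTS = [[70, 13], [70, 6], [70, 1], [70, 0]]


def encoder_frame_ms(hop_ms=FEATURE_HOP_MS, subsampling=SUBSAMPLING):
    """Duration covered by one encoder output frame."""
    return hop_ms * subsampling


def context_ms(att_context, frame_ms=None):
    """Return (left history, right lookahead) in milliseconds."""
    if frame_ms is None:
        frame_ms = encoder_frame_ms()
    left, right = att_context
    return left * frame_ms, right * frame_ms


for ctx in CONTEXTS:
    history, lookahead = context_ms(ctx)
    print(f"{ctx}: history {history} ms, lookahead {lookahead} ms")
```

Under that assumption the four options correspond to lookaheads of 1040, 480, 80, and 0 ms, each with 5600 ms of left history. Smaller lookahead lowers latency, usually at some cost in accuracy.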

	Tokenizer: ve_tok_v4 (SentencePiece BPE‑1024 plus blank; digit‑, punctuation‑, and «»‑aware). The model emits numbers in spoken form, so apply ITN at display time to render digits. Part of [VisualEars / Shenava](https://shenava.app).