Shenava-Rizeh-v1.0-CoreML-fp16 / README.md

Reza2kn


metadata

language:
  - fa
license: cc-by-nc-4.0
library_name: coreml
pipeline_tag: automatic-speech-recognition
base_model:
  - Reza2kn/Shenava-Rizeh-v1.0
base_model_relation: quantized
tags:
  - automatic-speech-recognition
  - speech
  - persian
  - farsi
  - fastconformer
  - ctc
  - streaming
  - on-device
  - shenava
  - shenava-1
  - visualears
  - coreml
  - fp16
  - quantized
  - ios
  - macos
metrics:
  - wer
  - cer
datasets:
  - Reza2kn/visualears-persian-asr-16k
  - Reza2kn/visualears-golden-6669
  - Reza2kn/fleurs-fa-benchmark

Shenava — Rizeh v1.0 (32M) · CoreML fp16

CoreML fp16 mlprogram export of Shenava-Rizeh-v1.0. Requires macOS 13 or later.

The Shenava‑1 family

A knowledge‑distillation cascade of on‑device Persian ASR models — one teacher distilled down to a 6.9M student. This model is one member; its siblings:

Reza2kn/Shenava-Koochik-v1.0: Koochik v1.0 (114M) · teacher / flagship; on-device WER record
Reza2kn/Shenava-Rizeh-v1.0: Rizeh v1.0 (32M) · mid-tier student ◀ parent of this CoreML export
Reza2kn/Shenava-Rizeh-Pizeh-v1.0: Rizeh Pizeh v1.0 (6.9M) · tiniest; real-time on a 2015 Cortex-A7

Benchmark — fair WER/CER

Scores are computed after inverse text normalization (ITN) and Persian‑digit normalization (the double‑benchmark convention), with decoding at att_context_size=[70,13].
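To illustrate how scoring of this kind usually works, here is a minimal sketch of WER and CER with a digit-normalization step. The function names and the choice to map Persian and Arabic-Indic digits to ASCII are assumptions made for illustration. The actual ITN rules and normalizer used by the double-benchmark are not specified in this card.

```python
# Hypothetical helpers for illustration; not the benchmark's actual scoring code.
PERSIAN_DIGITS = "۰۱۲۳۴۵۶۷۸۹"
ARABIC_DIGITS = "٠١٢٣٤٥٦٧٨٩"

# Map every Persian and Arabic-Indic digit to its ASCII counterpart (assumed direction).
DIGIT_MAP = {ord(ch): str(i) for i, ch in enumerate(PERSIAN_DIGITS)}
DIGIT_MAP.update({ord(ch): str(i) for i, ch in enumerate(ARABIC_DIGITS)})


def normalize_digits(text):
    """Rewrite Persian and Arabic-Indic digits as ASCII digits."""
    return text.translate(DIGIT_MAP)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate; the reference must contain at least one word."""
    ref_words = normalize_digits(ref).split()
    hyp_words = normalize_digits(hyp).split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(ref, hyp):
    """Character error rate (spaces count as characters); reference must be non-empty."""
    ref_chars = normalize_digits(ref)
    hyp_chars = normalize_digits(hyp)
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

With this normalization, a hypothesis that writes `12` where the reference has `۱۲` is not counted as an error.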

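| Member | golden‑6669 WER | golden‑6669 CER | FLEURS‑fa WER | FLEURS‑fa CER |
| --- | --- | --- | --- | --- |
| Koochik v1.0 (114M) | 7.49% | 2.30% | 10.64% | 3.79% |
| Rizeh v1.0 (32M) | 12.11% | 3.94% | 14.45% | 5.10% |
| Rizeh Pizeh v1.0 (6.9M) | 24.55% | 8.89% | 26.95% | 10.22% |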

Koochik v1.0 ranks #2 on the public double‑benchmark, behind only cloud Gemini. That makes it the best on‑device Persian ASR model on that benchmark: its WER is less than half that of a 1.5B Whisper‑Persian, at about 1/13 the size.
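The size and accuracy trade-off across the family can be checked with a few lines of arithmetic. This sketch only reuses the parameter counts and golden‑6669 WER figures quoted above; the dictionary and function names are illustrative.

```python
# Figures copied from the benchmark table and the text above.
params_m = {"Koochik": 114, "Rizeh": 32, "Rizeh Pizeh": 6.9}         # millions of parameters
golden_wer = {"Koochik": 7.49, "Rizeh": 12.11, "Rizeh Pizeh": 24.55}  # percent
whisper_params_m = 1500                                               # the 1.5B Whisper-Persian

# How many times larger the 1.5B model is than Koochik.
size_ratio = whisper_params_m / params_m["Koochik"]


def tradeoff(student, teacher="Koochik"):
    """Return (size reduction factor, WER increase factor) relative to the teacher."""
    return (params_m[teacher] / params_m[student],
            golden_wer[student] / golden_wer[teacher])
```

`size_ratio` comes out a little above 13, consistent with the "1/13 the size" figure. `tradeoff("Rizeh")` shows this model's parent is roughly 3.6 times smaller than the teacher for about 1.6 times its golden‑6669 WER.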

Parity (269‑gold)

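Parity with the parent model on the 269‑gold set: exact transcript match 257/269 (95.5%), argmax agreement 0.9977, character similarity 0.9981. Model size: 52 MB.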
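The card does not say how these parity figures were computed. The sketch below shows one plausible reading, under loudly stated assumptions: "exact" counts identical transcripts, "argmax" is the fraction of frames where both models pick the same top token, and "char‑sim" is an average character-level similarity, approximated here with `difflib.SequenceMatcher`. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def argmax_agreement(ref_logits, test_logits):
    """Fraction of frames whose top-scoring token id matches.

    Each argument is a list of frames; each frame is a list of class scores.
    """
    def top(frame):
        return max(range(len(frame)), key=frame.__getitem__)

    pairs = list(zip(ref_logits, test_logits))
    return sum(top(a) == top(b) for a, b in pairs) / len(pairs)


def parity_report(ref_texts, test_texts):
    """Compare transcripts from the reference model and the exported model."""
    pairs = list(zip(ref_texts, test_texts))
    exact = sum(r == t for r, t in pairs)
    # SequenceMatcher.ratio() is a stand-in for the unspecified similarity measure.
    sims = [SequenceMatcher(None, r, t).ratio() for r, t in pairs]
    return {"exact": exact, "total": len(pairs), "char_sim": sum(sims) / len(sims)}
```

Small fp16 rounding differences can flip the top token on a few frames without changing the decoded text, which is why frame-level agreement and transcript-level match are reported separately.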

Fixed‑window contract: inputs processed_signal [1,80,2005] and processed_signal_length [1]; outputs logits [1,~252,1025] and encoded_lengths [1]. Sidecars were regenerated from the parent .nemo.
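Because the input shape is fixed, shorter feature sequences have to be padded to 2005 frames before the call, and the output trimmed to the valid length afterwards. The following is a minimal sketch of that bookkeeping using plain lists. The helper names and the zero pad value are assumptions; feature extraction and the actual CoreML prediction call are left out.

```python
N_MELS = 80    # feature bins, from processed_signal [1, 80, 2005]
WINDOW = 2005  # fixed number of feature frames per call


def pad_to_window(features, pad_value=0.0):
    """Pad an [80, T] feature matrix (T <= 2005) to the fixed window.

    Returns (processed_signal, processed_signal_length) shaped [1, 80, 2005] and [1].
    The pad value of 0.0 is an assumption, not taken from the card.
    """
    if len(features) != N_MELS:
        raise ValueError(f"expected {N_MELS} feature rows, got {len(features)}")
    n_frames = len(features[0])
    if any(len(row) != n_frames for row in features):
        raise ValueError("all feature rows must have the same number of frames")
    if n_frames > WINDOW:
        raise ValueError(f"{n_frames} frames exceed the fixed window of {WINDOW}")
    padded = [list(row) + [pad_value] * (WINDOW - n_frames) for row in features]
    return [padded], [n_frames]


def trim_logits(logits, encoded_lengths):
    """Keep only the valid output frames of a [1, T_out, C] logits array."""
    return logits[0][:encoded_lengths[0]]
```

Audio longer than one window would need to be split into chunks and decoded window by window; how chunk boundaries should be handled is not described here.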

Tokenizer: ve_tok_v4 (SentencePiece BPE‑1024 plus a blank token; digit/punct/«»‑aware). Numbers are transcribed in spoken form; apply ITN at display time to render them as digits. Part of VisualEars / Shenava.
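Turning the logits into text follows the usual greedy CTC recipe: take the best class per frame, collapse repeats, drop blanks, then join the SentencePiece pieces. The sketch below assumes the blank is the last of the 1025 classes (index 1024), which is a common convention but is not stated in this card. The function names are hypothetical, and in practice the id-to-piece lookup would come from the tokenizer sidecar.

```python
BLANK_ID = 1024  # assumption: blank is the last of the 1025 output classes


def ctc_greedy_ids(logits, blank_id=BLANK_ID):
    """Greedy CTC decode: per-frame argmax, collapse repeats, drop blanks.

    logits is a list of frames; each frame is a list of class scores.
    """
    ids = []
    prev = None
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != prev and best != blank_id:
            ids.append(best)
        prev = best
    return ids


def pieces_to_text(pieces):
    """Join SentencePiece pieces; the marker U+2581 denotes the start of a word."""
    return "".join(pieces).replace("\u2581", " ").strip()
```

The decoded text still contains numbers in spoken form, so any ITN step would run on the output of `pieces_to_text`.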