| --- |
| language: |
| - fa |
| license: cc-by-nc-4.0 |
| library_name: coreml |
| pipeline_tag: automatic-speech-recognition |
| base_model: |
| - Reza2kn/Shenava-Rizeh-v1.0 |
| base_model_relation: quantized |
| tags: |
| - coreml |
| - neuralnetwork |
| - ios |
| - ios15 |
| - ios14 |
| - persian |
| - farsi |
| - asr |
| - fastconformer |
| - streaming |
| - ctc |
| - fp16 |
| - on-device |
| - shenava |
| - shenava-1 |
| - visualears |
| --- |
| |
| # Shenava — Rizeh v1.0 (32M) · CoreML iOS15 NeuralNetwork fp16 |
|
|
| CoreML **NeuralNetwork** (not ML Program) fp16 export of [`Reza2kn/Shenava-Rizeh-v1.0`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0) — built so **older Apple devices capped at iOS 15** (e.g. iPad Air 2 / iOS 15.8) can load and run it. ML Program packages require iOS 16+; this targets **NeuralNetwork / CoreML spec v5 with iOS 14 availability**, so it runs on iOS 15. |
|
|
| This is the **cache-aware streaming** step (one 170 ms prediction), the same kind of artifact as [`shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16`](https://huggingface.co/Reza2kn/shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16). |
|
|
| ## The Shenava-1 family (CoreML iOS15) |
|
|
| - [`Shenava-Koochik-1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Koochik-1.0-CoreML-iOS15-fp16) — **Koochik 1.0** (114M) · teacher / flagship |
| - [`Shenava-Rizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0-CoreML-iOS15-fp16) — **Rizeh v1.0** (32M) · mid-tier |
| - [`Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16) — **Rizeh Pizeh v1.0** (6.9M) · tiniest |
|
|
| ## Benchmark — fair WER/CER (parent model, decoded @ `[70,13]`) |
|
|
| | Member | golden-6669 WER | golden-6669 CER | FLEURS-fa WER | FLEURS-fa CER | |
| |---|---|---|---|---| |
| | **Rizeh v1.0 (32M)** | **12.11%** | 3.94% | **14.45%** | 5.10% | |
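
For readers who want to reproduce figures of this kind, WER and CER are conventionally the Levenshtein edit distance between hypothesis and reference, divided by the reference length, counted over words and characters respectively. The sketch below assumes that standard definition. The text normalization behind the "fair" numbers above is not described in this card, so none is applied here, and `edit_distance`, `wer`, and `cer` are illustrative names rather than part of the Shenava tooling.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance; substitutions, insertions and deletions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate for one utterance (reference must be non-empty)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate; assumption: spaces count as characters."""
    return edit_distance(ref, hyp) / len(ref)
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances before dividing, not by averaging per-utterance rates.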
|
|
| ## CoreML contract (cache-aware streaming CTC step, att_context `[70,0]`) |
| |
| Inputs: |
| - `processed_signal`: `Float32 [1, 80, 17]` |
| - `cache_last_channel`: `Float32 [16, 1, 70, 256]` |
| - `cache_last_time`: `Float32 [16, 1, 256, 8]` |
|
|
| Outputs: |
| - `logits`: `Float32 [1, 1, 1025]` |
| - `cache_last_channel_next`: `Float32 [16, 1, 70, 256]` |
| - `cache_last_time_next`: `Float32 [16, 1, 256, 8]` |
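
The contract above implies a simple loop: feed one feature window plus the two caches, read the logits, and pass each `*_next` output back in as the matching cache input of the following step. The sketch below illustrates that loop under two assumptions that this card does not state: a new stream starts from all-zero caches, and the caller supplies the `predict` and `zeros` callables. `run_stream` is a hypothetical helper name.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Cache input names and shapes, copied from the contract above.
CACHE_SHAPES: Dict[str, Tuple[int, ...]] = {
    "cache_last_channel": (16, 1, 70, 256),
    "cache_last_time": (16, 1, 256, 8),
}


def run_stream(
    predict: Callable[[Dict[str, Any]], Dict[str, Any]],
    windows: Iterable[Any],
    zeros: Callable[[Tuple[int, ...]], Any],
) -> List[Any]:
    """Run one prediction per window and carry the caches forward."""
    # Assumption: a fresh stream starts from all-zero caches.
    state = {name: zeros(shape) for name, shape in CACHE_SHAPES.items()}
    logits_per_step = []
    for window in windows:  # each window: Float32 [1, 80, 17]
        out = predict({"processed_signal": window, **state})
        logits_per_step.append(out["logits"])  # Float32 [1, 1, 1025]
        # The *_next outputs become the cache inputs of the next step.
        state = {name: out[name + "_next"] for name in CACHE_SHAPES}
    return logits_per_step
```

On macOS with coremltools and NumPy installed, `predict` could be the `predict` method of a `coremltools.models.MLModel` loaded from the `.mlmodel` file, and `zeros` could be `lambda shape: numpy.zeros(shape, dtype=numpy.float32)`. An iOS app would follow the same carry-over pattern through the Core ML prediction API.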
|
|
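| Streaming geometry: each prediction consumes **17** feature frames (`pre_encode_cache` 9 + chunk 8), i.e. an audio window of **170 ms** at 10 ms per feature frame. With ×8 subsampling, the 8-frame chunk maps to one 80 ms encoder frame, so each step emits a single logit frame. The cache length is constant at **70**, with `d_model=256` and `16` conformer layers.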
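
The arithmetic behind these numbers can be checked with a short sketch that also plans how a feature sequence might be cut into 17-frame windows. It assumes every step, including the first, uses the same 9 + 8 layout and that missing frames at the start and tail are filled with padding; the export script or the app may handle the first chunk differently. `plan_windows` and `Window` are hypothetical names.

```python
import math
from typing import List, NamedTuple

PRE_ENCODE_CACHE = 9  # left-context feature frames per prediction
CHUNK = 8             # new feature frames per prediction
WINDOW = PRE_ENCODE_CACHE + CHUNK  # 17 frames fed as processed_signal
SUBSAMPLING = 8
ENCODER_FRAME_MS = 80
HOP_MS = ENCODER_FRAME_MS // SUBSAMPLING  # 10 ms per feature frame


class Window(NamedTuple):
    start: int      # first available feature frame (inclusive)
    end: int        # last available feature frame (exclusive)
    pad_left: int   # padding frames needed before `start`
    pad_right: int  # padding frames needed after `end`


def plan_windows(num_frames: int) -> List[Window]:
    """Plan one 17-frame window per 8-frame chunk of the input."""
    steps = math.ceil(num_frames / CHUNK)
    plan = []
    for k in range(steps):
        lo = k * CHUNK - PRE_ENCODE_CACHE  # may be negative at the start
        hi = k * CHUNK + CHUNK             # may overrun at the tail
        start, end = max(lo, 0), min(hi, num_frames)
        plan.append(Window(start, end, start - lo, hi - end))
    return plan
```

Under these assumptions, `WINDOW * HOP_MS` gives the 170 ms audio window and `CHUNK * HOP_MS` gives the 80 ms advance per step.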
|
|
| ## Compatibility (Xcode `coremlc`) |
|
|
| - model type: `MLModelType_neuralNetwork` |
| - storage precision: `Float16` |
| - specification version: `5` |
| - availability: `iOS 14.0`, `macOS 11.0` |
|
|
| ```bash |
| xcrun coremlc compile shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel /tmp/out --deployment-target 15.0 --platform ios |
| ``` |
|
|
| ## Files |
|
|
| - `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel` — fp16 NeuralNetwork model (~52 MB) |
| - `tokens.json`, `preprocessor.json`, `mel_filters_slaney_80x257.json` — sidecars (ve_tok_v4, shared across the family) |
| - `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16_manifest.json` — export manifest |
| - `export_koochik10_streaming_coreml.py` — reproducible export script |
|
|
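| Tokenizer: ve_tok_v4 (SentencePiece BPE with 1024 pieces plus a CTC blank, matching the 1025 logit classes; digit-, punctuation- and «»-aware). Numbers are emitted in **spoken form**; apply Persian inverse text normalization (ITN) at display time if digits are wanted. Part of [VisualEars / Shenava](https://shenava.app).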
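
Because the model is a CTC head over SentencePiece pieces, the per-step logits can be turned into text with greedy decoding: take the argmax of each step, collapse consecutive repeats, drop blanks, and join the pieces. The sketch below is a minimal version of that procedure. It assumes the blank is the last class (index 1024) and that the vocabulary is available as a list of piece strings indexed by id. Neither detail is confirmed by this card, and the layout of `tokens.json` is not described here, so check both against the sidecar files.

```python
from typing import List, Optional, Sequence

BLANK_ID = 1024        # assumption: blank is the last of the 1025 classes
WORD_START = "\u2581"  # SentencePiece word-boundary marker


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def ctc_greedy_ids(step_ids: Sequence[int], blank: int = BLANK_ID) -> List[int]:
    """Collapse consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for i in step_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out


def pieces_to_text(ids: Sequence[int], vocab: Sequence[str]) -> str:
    """Join pieces and turn word-boundary markers into spaces."""
    return "".join(vocab[i] for i in ids).replace(WORD_START, " ").strip()
```

In a streaming setting, the previous step's id has to be remembered across calls so that a token repeated over a step boundary is still collapsed. The output remains in spoken form, so any ITN pass would run on the joined text.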
|
|
| Export stack: coremltools 9.0, torch 2.7.0, NeMo 2.7.3. fp16 vs fp32 argmax agreement: **1.000**. |
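
One plausible reading of "argmax agreement" is the fraction of output frames for which the fp16 and fp32 models pick the same top class. The sketch below computes that quantity for two sequences of per-frame logits. It is an assumption about the metric, not the author's exact procedure, and `argmax_agreement` is a hypothetical name.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def argmax_agreement(
    logits_a: Sequence[Sequence[float]],
    logits_b: Sequence[Sequence[float]],
) -> float:
    """Fraction of frames whose top-scoring class matches in both runs."""
    if len(logits_a) != len(logits_b):
        raise ValueError("both runs must cover the same number of frames")
    if not logits_a:
        raise ValueError("at least one frame is required")
    matches = sum(argmax(a) == argmax(b) for a, b in zip(logits_a, logits_b))
    return matches / len(logits_a)
```

A value of 1.000 under this definition would mean the greedy CTC output is identical for both precisions on the frames compared, even if the raw logit values differ slightly.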
|
|