---
language:
- fa
license: cc-by-nc-4.0
library_name: coreml
pipeline_tag: automatic-speech-recognition
base_model:
- Reza2kn/Shenava-Rizeh-v1.0
base_model_relation: quantized
tags:
- coreml
- neuralnetwork
- ios
- ios15
- ios14
- persian
- farsi
- asr
- fastconformer
- streaming
- ctc
- fp16
- on-device
- shenava
- shenava-1
- visualears
---

# Shenava — Rizeh v1.0 (32M) · CoreML iOS15 NeuralNetwork fp16

CoreML **NeuralNetwork** (not ML Program) fp16 export of [`Reza2kn/Shenava-Rizeh-v1.0`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0), built so that **older Apple devices capped at iOS 15** (e.g. an iPad Air 2 on iOS 15.8) can load and run it. ML Program packages require iOS 16+; this export targets **NeuralNetwork / CoreML spec v5 with iOS 14 availability**, so it runs on iOS 15.

This is the **cache-aware streaming** step (one 170 ms prediction), the same kind of artifact as [`shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16`](https://huggingface.co/Reza2kn/shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16).
## The Shenava-1 family (CoreML iOS15)

- [`Shenava-Koochik-1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Koochik-1.0-CoreML-iOS15-fp16): **Koochik 1.0** (114M) · teacher / flagship
- [`Shenava-Rizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0-CoreML-iOS15-fp16): **Rizeh v1.0** (32M) · mid-tier
- [`Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16): **Rizeh Pizeh v1.0** (6.9M) · tiniest

## Benchmark: fair WER/CER (parent model, decoded @ `[70,13]`)

Scores are for the parent model decoded with att_context `[70,13]`, not for this `[70,0]` CoreML export.

| Member | golden-6669 WER | golden-6669 CER | FLEURS-fa WER | FLEURS-fa CER |
|---|---|---|---|---|
| **Rizeh v1.0 (32M)** | **12.11%** | 3.94% | **14.45%** | 5.10% |

## CoreML contract (cache-aware streaming CTC step, att_context `[70,0]`)

Inputs:

- `processed_signal`: `Float32 [1, 80, 17]`
- `cache_last_channel`: `Float32 [16, 1, 70, 256]`
- `cache_last_time`: `Float32 [16, 1, 256, 8]`

Outputs:

- `logits`: `Float32 [1, 1, 1025]`
- `cache_last_channel_next`: `Float32 [16, 1, 70, 256]`
- `cache_last_time_next`: `Float32 [16, 1, 256, 8]`

Streaming geometry: feature frames per prediction = **17** (pre_encode_cache 9 + chunk 8), audio window **170 ms**, constant cache length **70**, `d_model=256`, `16` conformer layers, ×8 subsampling (80 ms per output frame).

Each prediction therefore consumes 8 new feature frames (80 ms of audio) plus 9 frames of look-back, emits one logits frame, and returns `*_next` caches that are passed back in as `cache_last_channel` / `cache_last_time` on the next call.
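The contract implies a simple driver loop. The sketch below is a minimal, non-authoritative illustration in Python: it slices a time-major log-mel sequence into 17-frame windows that advance by 8 frames, transposes each window to the `[1, 80, 17]` layout, and threads the `*_next` caches into the next call. `feature_windows`, `to_processed_signal`, and `run_stream` are hypothetical helpers, not files in this repo. Zero left-padding of the first window is an assumption the export does not document. `predict` is any callable with the dict-in/dict-out shape of coremltools' `MLModel.predict`.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

# Streaming geometry from the CoreML contract.
N_MELS = 80
PRE_ENCODE_CACHE = 9                 # look-back feature frames per window
CHUNK = 8                            # new feature frames per prediction
WINDOW = PRE_ENCODE_CACHE + CHUNK    # 17 frames -> processed_signal [1, 80, 17]
FRAME_MS = 170 / WINDOW              # 170 ms window / 17 frames = 10 ms per frame
STEP_MS = CHUNK * FRAME_MS           # 80 ms of new audio = 1 logits frame after x8 subsampling

Predict = Callable[[Dict[str, Any]], Dict[str, Any]]


def feature_windows(frames: Sequence[Sequence[float]]) -> Iterator[List[List[float]]]:
    """Yield 17-frame, time-major windows that advance by CHUNK frames.

    ASSUMPTION: the first window is left-padded with PRE_ENCODE_CACHE zero
    frames and a trailing partial chunk is dropped; a real app's
    preprocessor may pad or flush differently.
    """
    padded = [[0.0] * N_MELS for _ in range(PRE_ENCODE_CACHE)]
    padded += [list(f) for f in frames]
    start = 0
    while start + WINDOW <= len(padded):
        yield padded[start:start + WINDOW]
        start += CHUNK


def to_processed_signal(window: List[List[float]]) -> List[List[List[float]]]:
    """Transpose a [17][80] time-major window to the [1][80][17] contract layout."""
    return [[list(mel_row) for mel_row in zip(*window)]]


def run_stream(frames: Sequence[Sequence[float]], predict: Predict,
               cache_channel: Any, cache_time: Any) -> List[Any]:
    """Call `predict` once per window, feeding each *_next cache back in."""
    logits_per_step = []
    for window in feature_windows(frames):
        out = predict({
            "processed_signal": to_processed_signal(window),
            "cache_last_channel": cache_channel,
            "cache_last_time": cache_time,
        })
        logits_per_step.append(out["logits"])  # one [1, 1, 1025] frame per step
        cache_channel = out["cache_last_channel_next"]
        cache_time = out["cache_last_time_next"]
    return logits_per_step
```

With the real model on macOS, `predict` could wrap `MLModel.predict` and convert `processed_signal` to a `float32` NumPy array, starting from caches shaped `[16, 1, 70, 256]` and `[16, 1, 256, 8]`. Zero-filled initial caches are assumed here, not stated by this card.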
## Compatibility (Xcode `coremlc`)

- model type: `MLModelType_neuralNetwork`
- storage precision: `Float16`
- specification version: `5`
- availability: `iOS 14.0`, `macOS 11.0`

```bash
coremlc compile shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel /tmp/out --deployment-target 15.0 --platform ios
```

## Files

- `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel`: fp16 NeuralNetwork model (~52 MB)
- `tokens.json`, `preprocessor.json`, `mel_filters_slaney_80x257.json`: sidecars (ve_tok_v4, shared across the family)
- `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16_manifest.json`: export manifest
- `export_koochik10_streaming_coreml.py`: reproducible export script

Tokenizer: ve_tok_v4 (SentencePiece BPE-1024 + blank, digit/punct/«»-aware). Numbers are emitted in **spoken form**; apply Persian inverse text normalization (ITN) at display time to render them as digits.

Part of [VisualEars / Shenava](https://shenava.app). Export stack: coremltools 9.0, torch 2.7.0, NeMo 2.7.3. fp16 vs fp32 argmax agreement: **1.000**.
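Because the vocabulary is BPE-1024 plus a CTC blank (the 1025-wide `logits`), each streaming step can be decoded greedily as it arrives. The sketch below is a minimal illustration under stated assumptions: the blank is taken to be the last index (1024, the usual NeMo CTC convention), and `pieces` is a hypothetical id-to-SentencePiece list; check `tokens.json` for the real layout and blank id. The previous argmax is carried across calls, so repeats are collapsed across step boundaries as well as within them. The output is still spoken-form Persian and needs ITN for digits.

```python
from typing import List, Optional, Sequence, Tuple

VOCAB_SIZE = 1025   # logits width: 1024 BPE pieces + 1 CTC blank
BLANK_ID = 1024     # ASSUMPTION: blank is the last index (NeMo CTC convention)


def argmax(row: Sequence[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def greedy_ctc_step(frame: Sequence[float], prev_id: Optional[int]) -> Tuple[List[int], int]:
    """Greedy CTC on one output frame.

    Emits the argmax token unless it is blank or repeats the previous frame's
    argmax; returns the new prev_id so the caller can carry it to the next step.
    """
    tok = argmax(frame)
    emit = [] if tok in (BLANK_ID, prev_id) else [tok]
    return emit, tok


def decode_stream(logits_per_step: Sequence[Sequence[Sequence[Sequence[float]]]],
                  pieces: Sequence[str]) -> str:
    """Decode a sequence of [1, 1, 1025] logits into text.

    `pieces` is a HYPOTHETICAL id -> SentencePiece piece list (verify against
    the real tokens.json); U+2581 marks the start of a word.
    """
    ids: List[int] = []
    prev: Optional[int] = None
    for step in logits_per_step:
        emit, prev = greedy_ctc_step(step[0][0], prev)
        ids.extend(emit)
    text = "".join(pieces[i] for i in ids)
    return text.replace("\u2581", " ").strip()
```

Greedy decoding is used here for brevity; this card does not say which CTC decoder the app itself uses.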