---
language:
- fa
license: cc-by-nc-4.0
library_name: coreml
pipeline_tag: automatic-speech-recognition
base_model:
- Reza2kn/Shenava-Rizeh-v1.0
base_model_relation: quantized
tags:
- coreml
- neuralnetwork
- ios
- ios15
- ios14
- persian
- farsi
- asr
- fastconformer
- streaming
- ctc
- fp16
- on-device
- shenava
- shenava-1
- visualears
---
# Shenava — Rizeh v1.0 (32M) · CoreML iOS15 NeuralNetwork fp16
CoreML **NeuralNetwork** (not ML Program) fp16 export of [`Reza2kn/Shenava-Rizeh-v1.0`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0) — built so **older Apple devices capped at iOS 15** (e.g. iPad Air 2 / iOS 15.8) can load and run it. ML Program packages require iOS 16+; this targets **NeuralNetwork / CoreML spec v5 with iOS 14 availability**, so it runs on iOS 15.
This is the **cache-aware streaming** step (one 170 ms prediction), the same kind of artifact as [`shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16`](https://huggingface.co/Reza2kn/shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16).
## The Shenava-1 family (CoreML iOS15)
- [`Shenava-Koochik-1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Koochik-1.0-CoreML-iOS15-fp16) — **Koochik 1.0** (114M) · teacher / flagship
- [`Shenava-Rizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0-CoreML-iOS15-fp16) — **Rizeh v1.0** (32M) · mid-tier
- [`Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16) — **Rizeh Pizeh v1.0** (6.9M) · tiniest
## Benchmark — fair WER/CER (parent model, decoded @ `[70,13]`)
| Member | golden-6669 WER | golden-6669 CER | FLEURS-fa WER | FLEURS-fa CER |
|---|---|---|---|---|
| **Rizeh v1.0 (32M)** | **12.11%** | 3.94% | **14.45%** | 5.10% |
## CoreML contract (cache-aware streaming CTC step, att_context `[70,0]`)
Inputs:
- `processed_signal`: `Float32 [1, 80, 17]`
- `cache_last_channel`: `Float32 [16, 1, 70, 256]`
- `cache_last_time`: `Float32 [16, 1, 256, 8]`
Outputs:
- `logits`: `Float32 [1, 1, 1025]`
- `cache_last_channel_next`: `Float32 [16, 1, 70, 256]`
- `cache_last_time_next`: `Float32 [16, 1, 256, 8]`
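
The contract above implies a simple loop: each step's `*_next` cache outputs become the next step's cache inputs. A minimal sketch of that loop follows. `predict` is a stand-in for any callable that maps an input dict to an output dict (on macOS, the `predict` method of a loaded coremltools `MLModel` has that shape), and starting from all-zero caches is an assumption of this sketch, not something the card states.

```python
def stream_logits(predict, windows, cache_last_channel, cache_last_time):
    """Yield one logits frame per window while threading the two caches.

    predict: callable mapping an input dict to an output dict (stand-in
             for the model's prediction call).
    windows: iterable of processed_signal arrays shaped [1, 80, 17].
    cache_*: initial caches shaped [16, 1, 70, 256] and [16, 1, 256, 8];
             all-zero initial values are an assumption of this sketch.
    """
    for window in windows:
        out = predict({
            "processed_signal": window,
            "cache_last_channel": cache_last_channel,
            "cache_last_time": cache_last_time,
        })
        # Feed this step's *_next outputs back in as the next step's inputs.
        cache_last_channel = out["cache_last_channel_next"]
        cache_last_time = out["cache_last_time_next"]
        yield out["logits"]
```

Because the function is a generator, logits can be decoded as they arrive instead of after the whole utterance.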
Streaming geometry: each prediction consumes **17** feature frames (pre_encode_cache 9 + chunk 8), i.e. a **170 ms** audio window at the implied 10 ms frame hop; the cache length is a constant **70**; `d_model=256`; `16` conformer layers; ×8 subsampling (80 ms per encoder frame).
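
The geometry can be turned into frame arithmetic. The sketch below assumes that consecutive predictions advance by one 8-frame chunk (80 ms) and that the 9 pre_encode_cache frames are the feature frames immediately before that chunk, zero-padded at the start of the stream. Both points are a reading of the numbers above rather than something the card spells out, and the helper names are hypothetical.

```python
HOP_MS = 10            # implied by the card: 17 frames span 170 ms
PRE_ENCODE_CACHE = 9   # past feature frames re-fed as left context
CHUNK = 8              # new feature frames consumed per prediction
WINDOW = PRE_ENCODE_CACHE + CHUNK  # 17, the time axis of processed_signal


def window_spans(n_frames):
    """Return (start, stop) frame spans, one per prediction.

    A negative start means that many frames of left padding are needed,
    because no past audio exists yet. A trailing partial chunk is dropped.
    """
    spans = []
    for chunk_start in range(0, n_frames - CHUNK + 1, CHUNK):
        spans.append((chunk_start - PRE_ENCODE_CACHE, chunk_start + CHUNK))
    return spans


def slice_window(mel, start, stop, pad_value=0.0):
    """Cut one window from mel (a list of per-band rows), left-padding as needed."""
    pad = max(0, -start)
    return [[pad_value] * pad + row[max(0, start):stop] for row in mel]
```

For example, 24 feature frames (240 ms of audio) give three spans under these assumptions, and every sliced window is 17 frames wide.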
## Compatibility (Xcode `coremlc`)
- model type: `MLModelType_neuralNetwork`
- storage precision: `Float16`
- specification version: `5`
- availability: `iOS 14.0`, `macOS 11.0`
```bash
xcrun coremlc compile shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel /tmp/out --deployment-target 15.0 --platform ios
```
## Files
- `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel` — fp16 NeuralNetwork model (~52 MB)
- `tokens.json`, `preprocessor.json`, `mel_filters_slaney_80x257.json` — sidecars (ve_tok_v4, shared across the family)
- `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16_manifest.json` — export manifest
- `export_koochik10_streaming_coreml.py` — reproducible export script
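Tokenizer: ve_tok_v4 (SentencePiece BPE with 1024 pieces plus a CTC blank, 1025 classes in total; digit-, punctuation-, and «»-aware). Numbers are emitted in **spoken form**; apply Persian inverse text normalization (ITN) at display time to render them as digits. Part of [VisualEars / Shenava](https://shenava.app).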
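
Turning the per-step logits into text is a standard greedy CTC decode: take the argmax of each frame, merge consecutive repeats, and drop blanks. The sketch below assumes the blank is the last of the 1025 classes (index 1024) and that `id_to_piece` is a lookup built from `tokens.json`. The layout of that file is not described here, so the lookup is hypothetical.

```python
BLANK_ID = 1024  # assumption: blank is the last of the 1025 logit classes


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def ctc_greedy_ids(frames, blank_id=BLANK_ID):
    """Collapse per-frame argmax ids: merge repeats, then drop blanks."""
    ids, prev = [], None
    for frame in frames:
        best = argmax(frame)
        if best != prev and best != blank_id:
            ids.append(best)
        prev = best
    return ids


def pieces_to_text(ids, id_to_piece):
    """Join SentencePiece pieces; U+2581 marks a word boundary."""
    return "".join(id_to_piece[i] for i in ids).replace("\u2581", " ").strip()
```

In a live stream the `prev` value has to persist across prediction steps, so a stateful decoder would keep it between calls instead of decoding a complete list of frames at once.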
Export stack: coremltools 9.0, torch 2.7.0, NeMo 2.7.3. fp16 vs fp32 argmax agreement: **1.000**.
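
The card does not say how the agreement figure was computed. One plausible reading, sketched below as an assumption, is the fraction of frames whose top-scoring class is the same in the fp16 and fp32 runs, so 1.000 would mean greedy decoding produces identical token ids in both precisions.

```python
def argmax_agreement(logits_a, logits_b):
    """Fraction of frames whose top-scoring class matches in both runs."""
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("need two non-empty runs with the same number of frames")
    same = sum(
        max(range(len(a)), key=a.__getitem__) == max(range(len(b)), key=b.__getitem__)
        for a, b in zip(logits_a, logits_b)
    )
    return same / len(logits_a)
```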