README.md · Reza2kn/Shenava-Rizeh-v1.0-CoreML-iOS15-fp16 at main

Reza2kn

Add CoreML iOS15 NeuralNetwork fp16 streaming (Rizeh v1.0)

51cb0c3 verified 6 days ago

	---
	language:
	- fa
	license: cc-by-nc-4.0
	library_name: coreml
	pipeline_tag: automatic-speech-recognition
	base_model:
	- Reza2kn/Shenava-Rizeh-v1.0
	base_model_relation: quantized
	tags:
	- coreml
	- neuralnetwork
	- ios
	- ios15
	- ios14
	- persian
	- farsi
	- asr
	- fastconformer
	- streaming
	- ctc
	- fp16
	- on-device
	- shenava
	- shenava-1
	- visualears
	---

	# Shenava — Rizeh v1.0 (32M) · CoreML iOS15 NeuralNetwork fp16

	This is a CoreML NeuralNetwork (not ML Program) fp16 export of [`Reza2kn/Shenava-Rizeh-v1.0`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0), built so that older Apple devices capped at iOS 15 (e.g. an iPad Air 2 on iOS 15.8) can load and run it. ML Program packages require iOS 16+; this export instead targets the NeuralNetwork format at CoreML specification version 5, which is available from iOS 14, so it runs on iOS 15.

	The model implements a single cache-aware streaming step: each prediction covers one 170 ms audio window. It is the same kind of artifact as [`shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16`](https://huggingface.co/Reza2kn/shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16).

	## The Shenava-1 family (CoreML iOS15)

	- [`Shenava-Koochik-1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Koochik-1.0-CoreML-iOS15-fp16) — Koochik 1.0 (114M) · teacher / flagship
	- [`Shenava-Rizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-v1.0-CoreML-iOS15-fp16) — Rizeh v1.0 (32M) · mid-tier
	- [`Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16`](https://huggingface.co/Reza2kn/Shenava-Rizeh-Pizeh-v1.0-CoreML-iOS15-fp16) — Rizeh Pizeh v1.0 (6.9M) · tiniest

	## Benchmark — fair WER/CER (parent model, decoded @ `[70,13]`)

	| Member | golden-6669 WER | golden-6669 CER | FLEURS-fa WER | FLEURS-fa CER |
	|---|---|---|---|---|
	| Rizeh v1.0 (32M) | 12.11% | 3.94% | 14.45% | 5.10% |
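
For readers who want to reproduce numbers of this kind, WER and CER are conventionally the Levenshtein edit distance over words or characters, divided by the reference length. The sketch below assumes that standard definition. The text normalization behind the "fair" figures above is not described here, so none is applied, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate; assumes a non-empty reference."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.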

	## CoreML contract (cache-aware streaming CTC step, att_context `[70,0]`)

	Inputs:
	- `processed_signal`: `Float32 [1, 80, 17]`
	- `cache_last_channel`: `Float32 [16, 1, 70, 256]`
	- `cache_last_time`: `Float32 [16, 1, 256, 8]`

	Outputs:
	- `logits`: `Float32 [1, 1, 1025]`
	- `cache_last_channel_next`: `Float32 [16, 1, 70, 256]`
	- `cache_last_time_next`: `Float32 [16, 1, 256, 8]`
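
Because the `*_next` outputs have the same shapes as the cache inputs, each step's caches can be fed straight into the next step. A minimal sketch of that loop follows. It assumes `predict` is any callable that takes and returns a dict keyed by the names above (coremltools' `MLModel.predict` has that shape), and that the initial caches are all zeros, which is an assumption not stated in this card. `run_stream` is a hypothetical helper name.

```python
def run_stream(predict, windows, initial_caches):
    """Run one prediction per window, carrying both caches forward.

    predict        -- callable: dict of inputs -> dict of outputs
    windows        -- iterable of processed_signal windows ([1, 80, 17] each)
    initial_caches -- (cache_last_channel, cache_last_time); zeros assumed
    """
    cache_channel, cache_time = initial_caches
    all_logits = []
    for window in windows:
        out = predict({
            "processed_signal": window,
            "cache_last_channel": cache_channel,
            "cache_last_time": cache_time,
        })
        all_logits.append(out["logits"])
        # The returned caches become the inputs of the next step.
        cache_channel = out["cache_last_channel_next"]
        cache_time = out["cache_last_time_next"]
    return all_logits
```

Passing the predictor in as an argument keeps the loop independent of the runtime, so the same logic can be exercised with a stub before wiring in the real model.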

	Streaming geometry:
	- feature frames per prediction: `17` (`pre_encode_cache` 9 + chunk 8)
	- audio window: 170 ms
	- cache length: `70` (constant)
	- `d_model=256`, `16` conformer layers
	- subsampling: ×8 (80 ms per output frame)
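
The geometry figures are mutually consistent, which the arithmetic below spells out. It assumes a 10 ms feature hop (170 ms / 17 frames), and it assumes the first window is left-padded so that every window is 17 frames wide; the card does not say how the first chunk is handled. `window_bounds` is a hypothetical helper.

```python
FRAME_HOP_MS = 10        # assumption: 170 ms window / 17 feature frames
PRE_ENCODE_FRAMES = 9    # pre_encode_cache
CHUNK_FRAMES = 8         # new feature frames consumed per prediction
SUBSAMPLING = 8
CACHE_LEN = 70           # encoder frames of left context

WINDOW_FRAMES = PRE_ENCODE_FRAMES + CHUNK_FRAMES    # 17
WINDOW_MS = WINDOW_FRAMES * FRAME_HOP_MS            # 170
ENCODER_FRAME_MS = SUBSAMPLING * FRAME_HOP_MS       # 80
OUTPUTS_PER_STEP = CHUNK_FRAMES // SUBSAMPLING      # 1 logit frame per step
LEFT_CONTEXT_MS = CACHE_LEN * ENCODER_FRAME_MS      # 5600


def window_bounds(num_feature_frames):
    """Yield (start, end) feature-frame bounds for each full chunk.

    A negative start marks frames that precede the audio and would need
    padding (assumed here, not confirmed by the export).
    """
    last_start = num_feature_frames - CHUNK_FRAMES
    for chunk_start in range(0, last_start + 1, CHUNK_FRAMES):
        yield chunk_start - PRE_ENCODE_FRAMES, chunk_start + CHUNK_FRAMES
```

Each window overlaps the previous one by 9 frames, so the stream advances 80 ms per prediction while the model still sees 170 ms of features.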

	## Compatibility (Xcode `coremlc`)

	- model type: `MLModelType_neuralNetwork`
	- storage precision: `Float16`
	- specification version: `5`
	- availability: `iOS 14.0`, `macOS 11.0`

	```bash
	xcrun coremlc compile shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel /tmp/out --deployment-target 15.0 --platform ios
	```
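
The availability line follows from the specification version. As a rough reference, the sketch below maps Core ML specification versions to the first iOS release that can load them. The table reflects the published Core ML specification versions as best understood here and is worth checking against Apple's documentation; `can_load` is a hypothetical helper.

```python
# Core ML specification version -> first iOS major release that supports it.
MIN_IOS_FOR_SPEC = {4: 13, 5: 14, 6: 15, 7: 16, 8: 17}


def can_load(spec_version, device_ios_major):
    """True if a device on the given iOS major version meets the spec's minimum."""
    return device_ios_major >= MIN_IOS_FOR_SPEC[spec_version]
```

With specification version 5, any device on iOS 14 or later qualifies, which covers devices capped at iOS 15.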

	## Files

	- `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel` — fp16 NeuralNetwork model (~52 MB)
	- `tokens.json`, `preprocessor.json`, `mel_filters_slaney_80x257.json` — sidecars (ve_tok_v4, shared across the family)
	- `shenava_rizeh_v1_0_ctc_streaming_att70_0_ios15_fp16_manifest.json` — export manifest
	- `export_koochik10_streaming_coreml.py` — reproducible export script

	Tokenizer: ve_tok_v4 (SentencePiece BPE-1024 +blank, digit/punct/«»-aware). Numbers are emitted in spoken form; apply Persian inverse text normalization (ITN) at display time to render them as digits. Part of [VisualEars / Shenava](https://shenava.app).
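
Turning the per-step `logits` into text is typically done with greedy CTC decoding: take the argmax per frame, collapse consecutive repeats, and drop blanks. The sketch below assumes the blank is the last of the 1025 classes (index 1024, the usual NeMo convention) and that `pieces` is a list mapping token id to SentencePiece piece. The actual layout of `tokens.json` is not described here, so both are assumptions.

```python
BLANK_ID = 1024  # assumption: blank is the last of the 1025 logit classes


def ctc_greedy_ids(frame_logits, blank_id=BLANK_ID):
    """Greedy CTC: per-frame argmax, collapse repeats, remove blanks."""
    ids, prev = [], None
    for logits in frame_logits:
        best = max(range(len(logits)), key=logits.__getitem__)
        if best != prev and best != blank_id:
            ids.append(best)
        prev = best
    return ids


def pieces_to_text(ids, pieces):
    """Join SentencePiece pieces; U+2581 marks a word boundary."""
    return "".join(pieces[i] for i in ids).replace("\u2581", " ").strip()
```

In a streaming setting, the previous frame's argmax has to be carried across predictions so that a token repeated across a chunk boundary is still collapsed.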

	Export stack: coremltools 9.0, torch 2.7.0, NeMo 2.7.3. fp16 vs fp32 argmax agreement: 1.000.
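
One plausible way to measure an argmax agreement figure like the one above is to compare the per-frame argmax of the two models' logits on the same inputs. The sketch below shows that calculation; the exact procedure used for this export is not documented here, and `argmax_agreement` is a hypothetical helper.

```python
def argmax_agreement(logits_a, logits_b):
    """Fraction of frames where both logit sequences pick the same class."""
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("need two non-empty sequences of equal length")

    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    same = sum(argmax(a) == argmax(b) for a, b in zip(logits_a, logits_b))
    return same / len(logits_a)
```

A value of 1.0 means the fp16 model picks the same token as fp32 on every frame tested, even if the raw logit values differ slightly.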