FluidInference/sensevoice-small-coreml

@@ -32,12 +32,17 @@ audio-event tags. One forward pass yields all output tokens.
 ## Files (3-stage pipeline)
-| File | Precision | Compute unit | Role |
-|------|-----------|--------------|------|
-| `SenseVoicePreprocessor.mlmodelc` | FLOAT32 | CPU | front-end: waveform → 560-d LFR features |
-| `SenseVoiceSmall.mlmodelc` | FLOAT16 | **`CPU_AND_NE` (ANE)** | **primary** encoder+CTC |
-| `SenseVoiceSmall_fp32.mlmodelc` | FLOAT32 | any | encoder fallback (see limitation) |
-| `vocab.json` | — | — | 25055 SentencePiece tokens (array form) |
+| File | Precision | Compute unit | Size | Role |
+|------|-----------|--------------|------|------|
+| `SenseVoicePreprocessor.mlmodelc` | FLOAT32 | CPU | 3 MB | front-end: waveform → 560-d LFR features |
+| `SenseVoiceSmall.mlmodelc` | FLOAT16 | **`CPU_AND_NE` (ANE)** | 447 MB | **default** encoder+CTC |
+| `SenseVoiceSmall_int8.mlmodelc` | INT8 (weights) | `CPU_AND_NE` (ANE) | 225 MB | int8-weight encoder+CTC: ~half the size, accuracy-neutral |
+| `SenseVoiceSmall_fp32.mlmodelc` | FLOAT32 | any | 897 MB | encoder fallback (non-ANE) |
+| `vocab.json` | — | — | — | 25055 SentencePiece tokens (array form) |
+**int8** uses post-training weight quantization (`linear_symmetric`) and is accuracy-neutral
+relative to fp16 on the full canonical evaluation sets: LibriSpeech WER 3.27% → 3.22%, AISHELL CER 3.40% → 3.43%
+(|Δ| ≤ 0.05 pp, no NaN outputs on the ANE). Choose it to roughly halve the on-disk and memory footprint (225 MB vs 447 MB).
 Pipeline: `waveform → [Preprocessor, fp32/CPU] → features → [encoder+CTC, fp16/ANE] → logits → host greedy-CTC decode`.
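The last stage of the pipeline above, the host-side greedy-CTC decode, fits in a few lines. The Python below is a minimal sketch, not FluidInference's implementation. It assumes the encoder logits have already been copied out of Core ML into a `T × 25055` nested list (one row per output frame), that `vocab.json` is a JSON array indexed by token id, and that the CTC blank is token id 0 (an assumption; verify it against your vocab). The names `greedy_ctc_ids`, `ids_to_text`, and `load_vocab` are hypothetical. It does not strip SenseVoice's language/emotion/event tag tokens, and the SentencePiece join is simplified.

```python
import json
from typing import List, Optional, Sequence

BLANK_ID = 0  # assumption: CTC blank is token id 0; check against vocab.json


def load_vocab(path: str = "vocab.json") -> List[str]:
    """Load the array-form vocab: index i holds the piece for token id i."""
    with open(path, encoding="utf-8") as f:
        vocab = json.load(f)
    if not isinstance(vocab, list):
        raise ValueError("expected vocab.json to be a JSON array")
    return vocab


def greedy_ctc_ids(logits: Sequence[Sequence[float]], blank_id: int = BLANK_ID) -> List[int]:
    """Greedy CTC: per-frame argmax, collapse consecutive repeats, drop blanks."""
    ids: List[int] = []
    prev: Optional[int] = None
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax over vocab
        # Emit only when the label changes and is not blank; a blank between two
        # identical labels resets `prev`, so both copies are kept.
        if best != prev and best != blank_id:
            ids.append(best)
        prev = best
    return ids


def ids_to_text(ids: Sequence[int], vocab: Sequence[str]) -> str:
    """Join SentencePiece pieces; U+2581 marks a word start and becomes a space."""
    return "".join(vocab[i] for i in ids).replace("\u2581", " ").strip()


if __name__ == "__main__":
    toy_vocab = ["<blank>", "\u2581he", "llo", "\u2581world"]  # hypothetical 4-token vocab
    toy_logits = [
        [0.1, 0.9, 0.0, 0.0],  # "▁he"
        [0.2, 0.7, 0.1, 0.0],  # "▁he" again (collapsed)
        [0.8, 0.1, 0.1, 0.0],  # blank
        [0.1, 0.0, 0.9, 0.0],  # "llo"
        [0.6, 0.1, 0.2, 0.1],  # blank
        [0.1, 0.0, 0.0, 0.9],  # "▁world"
    ]
    print(ids_to_text(greedy_ctc_ids(toy_logits), toy_vocab))  # prints: hello world
```

Collapsing repeats before dropping blanks is what separates CTC decoding from a plain argmax: `[a, a]` decodes to one `a`, while `[a, blank, a]` decodes to two.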
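To make the `linear_symmetric` int8 variant concrete: symmetric linear quantization fixes the zero point at 0 and stores each weight as an int8 value times a floating-point scale, so a weight costs 1 byte instead of fp16's 2 (plus a small per-channel scale overhead), which is consistent with the 447 MB → 225 MB size drop. The sketch below is a generic illustration, not coremltools' exact algorithm: it assumes one scale per weight row (output channel) and the clamped range [-127, 127], and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_row_symmetric(row: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one row: zero point 0, one shared scale."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # all-zero row: any scale works
    # Round to nearest (Python's round is half-to-even), then clamp to [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q: Sequence[int], scale: float) -> List[float]:
    """Reconstruct approximate float weights from int8 values and the row scale."""
    return [v * scale for v in q]


def quantize_matrix(weights: Sequence[Sequence[float]]) -> List[Tuple[List[int], float]]:
    """Per-output-channel quantization: each row gets its own scale."""
    return [quantize_row_symmetric(row) for row in weights]


if __name__ == "__main__":
    w = [[0.5, -1.27, 0.0, 1.0], [0.02, -0.01, 0.03, 0.0]]  # hypothetical weights
    for row, (q, s) in zip(w, quantize_matrix(w)):
        err = max(abs(a - b) for a, b in zip(row, dequantize_row(q, s)))
        print(q, f"scale={s:.6f}", f"max_err={err:.2e}")
```

Per-row scales keep a single large-magnitude channel from coarsening the resolution of every other channel; the reconstruction error for each weight is bounded by half that row's scale.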