Add SenseVoiceSmall CoreML (preprocessor fp32 + encoder fp16/ANE + fp32 fallback + vocab + card)

- README.md +117 -0
- SenseVoicePreprocessor.mlmodelc/analytics/coremldata.bin +3 -0
- SenseVoicePreprocessor.mlmodelc/coremldata.bin +3 -0
- SenseVoicePreprocessor.mlmodelc/model.mil +101 -0
- SenseVoicePreprocessor.mlmodelc/weights/weight.bin +3 -0
- SenseVoiceSmall.mlmodelc/analytics/coremldata.bin +3 -0
- SenseVoiceSmall.mlmodelc/coremldata.bin +3 -0
- SenseVoiceSmall.mlmodelc/model.mil +0 -0
- SenseVoiceSmall.mlmodelc/weights/weight.bin +3 -0
- SenseVoiceSmall_fp32.mlmodelc/analytics/coremldata.bin +3 -0
- SenseVoiceSmall_fp32.mlmodelc/coremldata.bin +3 -0
- SenseVoiceSmall_fp32.mlmodelc/model.mil +0 -0
- SenseVoiceSmall_fp32.mlmodelc/weights/weight.bin +3 -0
- vocab.json +0 -0
README.md
ADDED
@@ -0,0 +1,117 @@
---
license: other
license_name: sensevoice-upstream
license_link: https://github.com/FunAudioLLM/SenseVoice
language:
- zh
- en
- ja
- ko
- yue
library_name: coreml
tags:
- coreml
- ane
- speech-recognition
- sensevoice
- funasr
- fluidaudio
pipeline_tag: automatic-speech-recognition
---

# SenseVoiceSmall: CoreML (Apple Neural Engine)

CoreML conversion of [FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)
for on-device inference on Apple Silicon, intended for
[FluidInference/FluidAudio](https://github.com/FluidInference/FluidAudio)
(tracks issues #645 / #646).

SenseVoiceSmall is a **non-autoregressive** multilingual ASR model (~234M params,
SANM encoder + single CTC head) covering 50+ languages, with emotion and
audio-event tags. A single forward pass yields all output tokens; there is no
token-by-token decoding loop.

## Files (3-stage pipeline)

| File | Precision | Compute unit | Role |
|------|-----------|--------------|------|
| `SenseVoicePreprocessor.mlmodelc` | FLOAT32 | CPU | front-end: waveform → 560-d LFR features (CMVN-normalized) |
| `SenseVoiceSmall.mlmodelc` | FLOAT16 | **`CPU_AND_NE` (ANE)** | **primary** encoder+CTC |
| `SenseVoiceSmall_fp32.mlmodelc` | FLOAT32 | any | encoder fallback (see the compute-unit note below) |
| `vocab.json` | n/a | n/a | 25055 SentencePiece tokens (JSON array; index = token ID) |

Pipeline: `waveform → [Preprocessor, fp32/CPU] → features → pad to bucket → [encoder+CTC, fp16/ANE] → logits → host greedy-CTC decode`.

> ⚠️ **Compute-unit requirement.** The FLOAT16 encoder is numerically correct on
> the **Neural Engine** but produces **NaN on the CPU/GPU fp16 path**. Load it
> with `MLModelConfiguration.computeUnits = .cpuAndNeuralEngine`. On hardware
> without an ANE, or if Core ML falls back from the ANE to CPU/GPU, use
> `SenseVoiceSmall_fp32`. The preprocessor must run in **fp32**: the power
> spectrum exceeds the fp16 range (max 65504) before the log is taken.

## I/O

**`SenseVoicePreprocessor`**: input `waveform [1, N]` fp32, 16 kHz mono, scaled ×32768
(int16 range, as Kaldi expects); `N` is flexible within 3200 to 480000 samples (0.2 to 30 s).
Output `features [1, T, 560]` fp32.
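
A minimal host-side sketch of preparing `waveform`, assuming NumPy and mono 16 kHz audio already decoded to floats in [-1, 1]. The length bounds and the 400-sample window with a 160-sample hop are read off the preprocessor's `model.mil` (`RangeDims` and the strided framing `conv`); `prepare_waveform` and `num_fbank_frames` are hypothetical helpers, not FluidAudio or FunASR APIs.

```python
import numpy as np

SAMPLE_RATE = 16_000
MIN_SAMPLES, MAX_SAMPLES = 3_200, 480_000  # RangeDims of `waveform` in model.mil
WIN, HOP = 400, 160                         # 25 ms window, 10 ms hop at 16 kHz


def prepare_waveform(samples: np.ndarray) -> np.ndarray:
    """Turn mono float audio in [-1, 1] into the [1, N] fp32 `waveform` input.

    Kaldi-style fbank expects int16-range amplitudes, hence the x32768 scale.
    """
    x = np.asarray(samples, dtype=np.float32).reshape(-1)
    if not MIN_SAMPLES <= x.size <= MAX_SAMPLES:
        raise ValueError(
            f"{x.size} samples is outside [{MIN_SAMPLES}, {MAX_SAMPLES}] "
            f"({MIN_SAMPLES / SAMPLE_RATE:.1f} to {MAX_SAMPLES / SAMPLE_RATE:.0f} s); "
            "split longer audio into chunks"
        )
    return (x * 32768.0)[np.newaxis, :]


def num_fbank_frames(n_samples: int) -> int:
    """Frames produced by the 'valid' framing conv (kernel 400, stride 160)."""
    return 1 + (n_samples - WIN) // HOP
```

For the model's default input length of 88747 samples (about 5.55 s), `num_fbank_frames(88747)` returns 553.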

**`SenseVoiceSmall`** (encoder+CTC) inputs:

| name | shape | dtype | notes |
|------|-------|-------|-------|
| `speech` | `[1, T, 560]` | fp32 | preprocessor output; `T` must be one of the enumerated buckets `[128, 256, 512, 1024, 1800]` (pad up) |
| `speech_lengths` | `[1]` | int32 | valid frame count (before padding) |
| `language` | `[1]` | int32 | language embedding index; `0` = auto |
| `textnorm` | `[1]` | int32 | `14` = with inverse text normalization (`withitn`), `15` = without (`woitn`) |
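
The padding rule can be sketched as below (NumPy; `encoder_inputs` is a hypothetical helper). Zero-padding is an assumption, since the README only says to pad up and to report the valid length in `speech_lengths`. On macOS, a dict like this would presumably be passed to coremltools' `ct.models.CompiledMLModel(..., compute_units=ct.ComputeUnit.CPU_AND_NE).predict(...)`, but that call is not exercised here.

```python
import numpy as np

BUCKETS = (128, 256, 512, 1024, 1800)  # enumerated `speech` lengths
AUTO_LANGUAGE = 0                       # `language` embedding index for auto-detect
WOITN, WITHITN = 15, 14                 # `textnorm` embedding indices


def encoder_inputs(features: np.ndarray, language: int = AUTO_LANGUAGE,
                   textnorm: int = WOITN) -> dict:
    """Pad [1, T, 560] features up to the smallest bucket >= T and build the input dict."""
    if features.ndim != 3 or features.shape[0] != 1 or features.shape[2] != 560:
        raise ValueError(f"expected [1, T, 560], got {features.shape}")
    t = features.shape[1]
    bucket = next((b for b in BUCKETS if b >= t), None)
    if bucket is None:
        raise ValueError(f"T={t} exceeds the largest bucket {BUCKETS[-1]}")
    # Zero-padding along time is an assumption; speech_lengths marks the valid part.
    speech = np.zeros((1, bucket, 560), dtype=np.float32)
    speech[:, :t, :] = features
    return {
        "speech": speech,
        "speech_lengths": np.array([t], dtype=np.int32),
        "language": np.array([language], dtype=np.int32),
        "textnorm": np.array([textnorm], dtype=np.int32),
    }
```

A 93-frame input, for example, becomes `speech` of shape `[1, 128, 560]` with `speech_lengths = [93]`.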

**Output:** `ctc_logits` `[1, T+4, 25055]`. The 4 leading positions correspond to the
language, emotion, audio-event, and ITN query tokens; the remaining `T` positions carry the transcript.
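
Splitting the output is then a slice. This sketch assumes that positions beyond `4 + speech_lengths` come only from bucket padding and should be ignored, mirroring how FunASR's SenseVoice inference trims `ctc_logits` to an encoder output length that already counts the 4 query slots; `split_ctc_logits` is a hypothetical helper.

```python
import numpy as np

NUM_QUERY = 4  # language, emotion, audio-event, ITN query positions


def split_ctc_logits(ctc_logits: np.ndarray, valid_frames: int):
    """Split [1, T+4, V] logits into (query part, transcript part).

    Assumes positions past 4 + valid_frames come from bucket padding and are dropped.
    """
    if ctc_logits.ndim != 3 or ctc_logits.shape[0] != 1:
        raise ValueError(f"expected [1, T+4, V], got {ctc_logits.shape}")
    padded_t = ctc_logits.shape[1] - NUM_QUERY
    if not 0 <= valid_frames <= padded_t:
        raise ValueError(f"valid_frames={valid_frames} not in [0, {padded_t}]")
    logits = ctc_logits[0]
    return logits[:NUM_QUERY], logits[NUM_QUERY:NUM_QUERY + valid_frames]
```

Decoding the two parts back to back yields the tags first, then the transcript.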

## Host pre/post-processing

**Pre:** handled by `SenseVoicePreprocessor` (Kaldi fbank with 80 mel bins → LFR stacking with m=7, n=6 → CMVN),
which matches FunASR `WavFrontend` to within max |Δ| ≈ 2e-5. Pad its output along the time axis up to the
smallest encoder bucket ≥ `T`, and pass the unpadded `T` as `speech_lengths`.
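
For debugging feature parity, the LFR and CMVN steps of the preprocessor graph can be mirrored in NumPy. This sketch follows the ops visible in `SenseVoicePreprocessor.mlmodelc/model.mil`: 3 copies of the first fbank frame prepended, 7 copies of the last appended, a 7-frame window with stride 6, then `(x + cmvn_neg_mean) * cmvn_inv_std`. It assumes the `lfr_kernel` weights simply stack frames in frame-major order, which the MIL text alone does not show; `apply_lfr` and `apply_cmvn` are hypothetical helpers.

```python
import numpy as np

LFR_M, LFR_N = 7, 6  # window of 7 fbank frames, hop of 6


def apply_lfr(fbank: np.ndarray) -> np.ndarray:
    """Stack [T, 80] fbank frames into [T', 560] low-frame-rate features.

    Mirrors the preprocessor graph: repeat the first frame 3 times in front and the
    last frame 7 times at the end, then take 7-frame windows with a stride of 6,
    flattened frame-major (index = k * 80 + mel_bin; an assumption about lfr_kernel).
    """
    dim = fbank.shape[1]
    head = np.repeat(fbank[:1], (LFR_M - 1) // 2, axis=0)   # 3 copies of frame 0
    tail = np.repeat(fbank[-1:], LFR_M, axis=0)              # 7 copies of the last frame
    padded = np.concatenate([head, fbank, tail], axis=0)
    n_out = (padded.shape[0] - LFR_M) // LFR_N + 1           # 'valid' conv length
    idx = np.arange(n_out)[:, None] * LFR_N + np.arange(LFR_M)[None, :]
    return padded[idx].reshape(n_out, LFR_M * dim)


def apply_cmvn(lfr: np.ndarray, neg_mean: np.ndarray, inv_std: np.ndarray) -> np.ndarray:
    """CMVN as in the graph: (x + neg_mean) * inv_std, both vectors of length 560."""
    return (lfr + neg_mean) * inv_std
```

One thing worth checking: the graph's window count, `(T + 3) // 6 + 1`, is one frame longer than `ceil(T / 6)` whenever `T % 6` is 0, 3, 4, or 5, while FunASR's `apply_lfr` emits `ceil(T / 6)` frames. The last preprocessor frame may therefore have no FunASR counterpart, which matters when choosing the value passed as `speech_lengths`.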

**Post (decode):** greedy CTC over `ctc_logits` (per-frame argmax → collapse repeats → drop blank,
id 0) → SentencePiece detokenize → strip `<|...|>` tags for the clean
transcript. Reference Python is in the repo's `decode.py`.
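
The decode steps can be sketched in plain Python and NumPy, assuming `vocab.json` loads (via `json.load`) into a list where the index is the token ID. Joining pieces and turning U+2581 (`▁`) into spaces is a simplification of SentencePiece detokenization, not the repo's `decode.py`; the helper names are hypothetical.

```python
import re

import numpy as np

BLANK_ID = 0
TAG_RE = re.compile(r"<\|[^|]*\|>")  # <|zh|>, <|NEUTRAL|>, <|Speech|>, <|woitn|>, ...


def greedy_ctc_ids(logits: np.ndarray) -> list[int]:
    """Argmax per frame, collapse consecutive repeats, then drop blanks."""
    best = logits.argmax(axis=-1).tolist()
    ids, prev = [], None
    for tok in best:
        # A blank between two equal tokens resets `prev`, so both are kept.
        if tok != prev and tok != BLANK_ID:
            ids.append(tok)
        prev = tok
    return ids


def detokenize(ids: list[int], vocab: list[str]) -> str:
    """Join SentencePiece pieces; U+2581 marks a word boundary (simplified)."""
    return "".join(vocab[i] for i in ids).replace("\u2581", " ").strip()


def clean_transcript(text: str) -> str:
    """Strip the <|...|> tags to leave only the transcript."""
    return TAG_RE.sub("", text).strip()
```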

`language` and `textnorm` take **embedding indices**, not vocabulary token IDs; map them on the host:

```python
# vocabulary token ID -> embedding index
lid_int_dict = {24884: 3, 24885: 4, 24888: 7, 24892: 11, 24896: 12, 24992: 13}  # <|zh|> etc. -> embed idx
textnorm_int_dict = {25016: 14, 25017: 15}  # withitn -> 14, woitn -> 15
# a language token not in lid_int_dict -> 0 (auto)
```
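
With `vocab.json` loaded as a list, the token IDs above can be looked up by tag text instead of being hard-coded at the call site. This sketch assumes the tags are spelled `<|zh|>`, `<|withitn|>`, and `<|woitn|>` in the vocabulary, as in the decoded output shown under Verification; `embed_indices` is a hypothetical helper.

```python
LID_INT_DICT = {24884: 3, 24885: 4, 24888: 7, 24892: 11, 24896: 12, 24992: 13}
TEXTNORM_INT_DICT = {25016: 14, 25017: 15}


def embed_indices(vocab: list[str], language: str = "auto",
                  use_itn: bool = False) -> tuple[int, int]:
    """Return (language, textnorm) embedding indices for the encoder inputs.

    `language` is a tag body such as "zh"; anything not found maps to 0 (auto).
    """
    token_id = {tok: i for i, tok in enumerate(vocab)}
    lang = LID_INT_DICT.get(token_id.get(f"<|{language}|>", -1), 0)
    norm_tok = "<|withitn|>" if use_itn else "<|woitn|>"  # assumed tag spelling
    textnorm = TEXTNORM_INT_DICT[token_id[norm_tok]]
    return lang, textnorm
```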

## Verification & benchmarks

Conversion: PyTorch (FunASR) → `torch.jit.trace` → coremltools (FLOAT16,
`EnumeratedShapes`, iOS 17 deployment target). Measured on the conversion machine (Apple M-series)
with FunASR 1.3.9, PyTorch 2.5.1, and coremltools 8.3.

- **End-to-end correctness:** on the cached zh sample clip, the CoreML (ANE) →
  greedy-CTC pipeline reproduces FunASR `am.generate` output **exactly**:
  `<|zh|><|NEUTRAL|><|Speech|><|woitn|>欢迎大家来体验达摩院推出的语音识别模型`
  ("Welcome, everyone, to try the speech recognition model released by DAMO Academy").
- **Parity (torch ↔ CoreML, ANE):** CTC argmax token agreement is **100%** on real audio.
- **LibriSpeech test-clean (the benchmark used in the official chart):** CoreML (ANE)
  scores **3.21% WER** (torch: 3.26%) on a 100-utterance subset, versus the published
  SenseVoice-Small figure of **~3.1%**. Within the noise of such a small sample, this indicates
  that the full pipeline (front-end + CoreML + decode) reproduces the published accuracy.
  (For the full 2,620-utterance split, see the repo README.)
- **FLEURS WER/CER (CoreML ANE vs torch), 100 samples per language:** the conversion is accuracy-neutral.

| lang (metric) | torch | CoreML (ANE) | Δ (CoreML − torch) | RTFx |
|---------------|-------|--------------|--------------------|------|
| en_us (WER) | 9.52% | 9.52% | +0.00 pp | 402 |
| cmn_hans_cn (CER) | 9.60% | 9.57% | −0.03 pp | 372 |

> FLEURS is a harder read-speech set, from a different domain than LibriSpeech/AISHELL,
> so its absolute numbers are not comparable to the official benchmark chart; it is
> used here only to check CoreML ↔ torch parity across languages.

- **RTFx (audio duration ÷ processing time; 5.55 s clip, ANE, by bucket):** 128→524, 256→274,
  512→97, 1024→36, 1800→14.5. The clip needs only ~93 LFR frames, so it fits the 128 bucket and
  larger buckets only add padding: throughput drops ~36× from 128 to 1800, so use the smallest
  bucket that fits. (M-series Mac; iPhone ANE not yet measured.)
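
For reproducing the numbers above, a minimal corpus-level WER/CER and RTFx sketch in plain Python follows. It assumes references and hypotheses are already text-normalized (the README does not state which normalization produced its figures) and that CER ignores spaces; the helper names are hypothetical.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def error_rate(refs: list[str], hyps: list[str], unit: str = "word") -> float:
    """Corpus-level WER (unit='word') or CER (unit='char'), in percent."""
    split = str.split if unit == "word" else (lambda s: list(s.replace(" ", "")))
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * errors / total


def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    return audio_seconds / compute_seconds
```

For example, `rtfx(5.55, 0.0106)` is about 524, the bucket-128 figure, i.e. roughly 10.6 ms of compute for the 5.55 s clip.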

## License & attribution

Weights derive from [FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall);
the upstream model license applies. This repo contains only a format conversion
(no retraining). See the [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
and [FunASR](https://github.com/modelscope/FunASR) projects.

SenseVoicePreprocessor.mlmodelc/analytics/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5bdb0b132e48c7e852ec18eeba7e217b6cb7153e6a939ce76b5ed17242e956dd
size 243

SenseVoicePreprocessor.mlmodelc/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e64cc73b2a9b01bad799a23874bc20dba3cf3342c23e3f60012c3e884f682944
size 330

SenseVoicePreprocessor.mlmodelc/model.mil
ADDED
@@ -0,0 +1,101 @@
program(1.0)
[buildInfo = dict<tensor<string, []>, tensor<string, []>>({{"coremlc-component-MIL", "3520.4.1"}, {"coremlc-version", "3520.5.1"}, {"coremltools-component-torch", "2.5.1"}, {"coremltools-source-dialect", "TorchScript"}, {"coremltools-version", "8.3.0"}})]
{
func main<ios17>(tensor<fp32, [1, ?]> waveform) [FlexibleShapeInformation = tuple<tuple<tensor<string, []>, dict<tensor<string, []>, tensor<int32, [?]>>>, tuple<tensor<string, []>, dict<tensor<string, []>, list<tensor<int32, [2]>, ?>>>>((("DefaultShapes", {{"waveform", [1, 88747]}}), ("RangeDims", {{"waveform", [[1, 1], [3200, 480000]]}})))] {
tensor<fp32, [560]> cmvn_inv_std = const()[name = tensor<string, []>("cmvn_inv_std"), val = tensor<fp32, [560]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(64)))];
tensor<fp32, [560]> cmvn_neg_mean = const()[name = tensor<string, []>("cmvn_neg_mean"), val = tensor<fp32, [560]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(2368)))];
tensor<fp32, [560, 80, 7]> lfr_kernel = const()[name = tensor<string, []>("lfr_kernel"), val = tensor<fp32, [560, 80, 7]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4672)))];
tensor<fp32, [400]> window = const()[name = tensor<string, []>("window"), val = tensor<fp32, [400]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1259136)))];
tensor<fp32, [400, 1, 400]> frame_kernel = const()[name = tensor<string, []>("frame_kernel"), val = tensor<fp32, [400, 1, 400]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1260800)))];
tensor<int32, [1]> var_11_axes_0 = const()[name = tensor<string, []>("op_11_axes_0"), val = tensor<int32, [1]>([1])];
tensor<fp32, [1, 1, ?]> var_11 = expand_dims(axes = var_11_axes_0, x = waveform)[name = tensor<string, []>("op_11")];
tensor<string, []> var_27_pad_type_0 = const()[name = tensor<string, []>("op_27_pad_type_0"), val = tensor<string, []>("valid")];
tensor<int32, [1]> var_27_strides_0 = const()[name = tensor<string, []>("op_27_strides_0"), val = tensor<int32, [1]>([160])];
tensor<int32, [2]> var_27_pad_0 = const()[name = tensor<string, []>("op_27_pad_0"), val = tensor<int32, [2]>([0, 0])];
tensor<int32, [1]> var_27_dilations_0 = const()[name = tensor<string, []>("op_27_dilations_0"), val = tensor<int32, [1]>([1])];
tensor<int32, []> var_27_groups_0 = const()[name = tensor<string, []>("op_27_groups_0"), val = tensor<int32, []>(1)];
tensor<fp32, [1, 400, ?]> var_27 = conv(dilations = var_27_dilations_0, groups = var_27_groups_0, pad = var_27_pad_0, pad_type = var_27_pad_type_0, strides = var_27_strides_0, weight = frame_kernel, x = var_11)[name = tensor<string, []>("op_27")];
tensor<int32, [3]> var_30_begin_0 = const()[name = tensor<string, []>("op_30_begin_0"), val = tensor<int32, [3]>([0, 0, 0])];
tensor<int32, [3]> var_30_end_0 = const()[name = tensor<string, []>("op_30_end_0"), val = tensor<int32, [3]>([1, 400, 0])];
tensor<bool, [3]> var_30_end_mask_0 = const()[name = tensor<string, []>("op_30_end_mask_0"), val = tensor<bool, [3]>([false, true, true])];
tensor<bool, [3]> var_30_squeeze_mask_0 = const()[name = tensor<string, []>("op_30_squeeze_mask_0"), val = tensor<bool, [3]>([true, false, false])];
tensor<fp32, [400, ?]> var_30 = slice_by_index(begin = var_30_begin_0, end = var_30_end_0, end_mask = var_30_end_mask_0, squeeze_mask = var_30_squeeze_mask_0, x = var_27)[name = tensor<string, []>("op_30")];
tensor<int32, [2]> frames_1_perm_0 = const()[name = tensor<string, []>("frames_1_perm_0"), val = tensor<int32, [2]>([1, 0])];
tensor<int32, [1]> var_36_axes_0 = const()[name = tensor<string, []>("op_36_axes_0"), val = tensor<int32, [1]>([1])];
tensor<bool, []> var_36_keep_dims_0 = const()[name = tensor<string, []>("op_36_keep_dims_0"), val = tensor<bool, []>(true)];
tensor<fp32, [?, 400]> frames_1 = transpose(perm = frames_1_perm_0, x = var_30)[name = tensor<string, []>("transpose_5")];
tensor<fp32, [?, 1]> var_36 = reduce_mean(axes = var_36_axes_0, keep_dims = var_36_keep_dims_0, x = frames_1)[name = tensor<string, []>("op_36")];
tensor<fp32, [?, 400]> frames_3 = sub(x = frames_1, y = var_36)[name = tensor<string, []>("frames_3")];
tensor<int32, [2]> var_48_begin_0 = const()[name = tensor<string, []>("op_48_begin_0"), val = tensor<int32, [2]>([0, 0])];
tensor<int32, [2]> var_48_end_0 = const()[name = tensor<string, []>("op_48_end_0"), val = tensor<int32, [2]>([0, 1])];
tensor<bool, [2]> var_48_end_mask_0 = const()[name = tensor<string, []>("op_48_end_mask_0"), val = tensor<bool, [2]>([true, false])];
tensor<fp32, [?, 1]> var_48 = slice_by_index(begin = var_48_begin_0, end = var_48_end_0, end_mask = var_48_end_mask_0, x = frames_3)[name = tensor<string, []>("op_48")];
tensor<int32, [2]> var_58_begin_0 = const()[name = tensor<string, []>("op_58_begin_0"), val = tensor<int32, [2]>([0, 0])];
tensor<int32, [2]> var_58_end_0 = const()[name = tensor<string, []>("op_58_end_0"), val = tensor<int32, [2]>([0, 399])];
tensor<bool, [2]> var_58_end_mask_0 = const()[name = tensor<string, []>("op_58_end_mask_0"), val = tensor<bool, [2]>([true, false])];
tensor<fp32, [?, 399]> var_58 = slice_by_index(begin = var_58_begin_0, end = var_58_end_0, end_mask = var_58_end_mask_0, x = frames_3)[name = tensor<string, []>("op_58")];
tensor<int32, []> var_60 = const()[name = tensor<string, []>("op_60"), val = tensor<int32, []>(1)];
tensor<bool, []> shifted_interleave_0 = const()[name = tensor<string, []>("shifted_interleave_0"), val = tensor<bool, []>(false)];
tensor<fp32, [?, 400]> shifted = concat(axis = var_60, interleave = shifted_interleave_0, values = (var_48, var_58))[name = tensor<string, []>("shifted")];
tensor<fp32, []> var_62 = const()[name = tensor<string, []>("op_62"), val = tensor<fp32, []>(0x1.f0a3d8p-1)];
tensor<fp32, [?, 400]> var_63 = mul(x = shifted, y = var_62)[name = tensor<string, []>("op_63")];
tensor<fp32, [?, 400]> frames_5 = sub(x = frames_3, y = var_63)[name = tensor<string, []>("frames_5")];
tensor<fp32, [?, 400]> input = mul(x = frames_5, y = window)[name = tensor<string, []>("input")];
tensor<fp32, []> const_0 = const()[name = tensor<string, []>("const_0"), val = tensor<fp32, []>(0x0p+0)];
tensor<int32, [4]> frames_pad_0 = const()[name = tensor<string, []>("frames_pad_0"), val = tensor<int32, [4]>([0, 0, 0, 112])];
tensor<string, []> frames_mode_0 = const()[name = tensor<string, []>("frames_mode_0"), val = tensor<string, []>("constant")];
tensor<fp32, [?, 512]> frames = pad(constant_val = const_0, mode = frames_mode_0, pad = frames_pad_0, x = input)[name = tensor<string, []>("frames")];
tensor<fp32, [257, 512]> transpose_0 = const()[name = tensor<string, []>("transpose_0"), val = tensor<fp32, [257, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1900864)))];
tensor<fp32, [257]> re_bias_0 = const()[name = tensor<string, []>("re_bias_0"), val = tensor<fp32, [257]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(2427264)))];
tensor<fp32, [?, 257]> re = linear(bias = re_bias_0, weight = transpose_0, x = frames)[name = tensor<string, []>("re")];
tensor<fp32, [257, 512]> transpose_1 = const()[name = tensor<string, []>("transpose_1"), val = tensor<fp32, [257, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(2428416)))];
tensor<fp32, [?, 257]> im = linear(bias = re_bias_0, weight = transpose_1, x = frames)[name = tensor<string, []>("im")];
tensor<fp32, [?, 257]> var_75 = mul(x = re, y = re)[name = tensor<string, []>("op_75")];
tensor<fp32, [?, 257]> var_76 = mul(x = im, y = im)[name = tensor<string, []>("op_76")];
tensor<fp32, [?, 257]> power = add(x = var_75, y = var_76)[name = tensor<string, []>("power")];
tensor<fp32, [80, 257]> transpose_2 = const()[name = tensor<string, []>("transpose_2"), val = tensor<fp32, [80, 257]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(2954816)))];
tensor<fp32, [80]> mel_energies_bias_0 = const()[name = tensor<string, []>("mel_energies_bias_0"), val = tensor<fp32, [80]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3037120)))];
tensor<fp32, [?, 80]> mel_energies = linear(bias = mel_energies_bias_0, weight = transpose_2, x = power)[name = tensor<string, []>("mel_energies")];
tensor<fp32, []> var_80 = const()[name = tensor<string, []>("op_80"), val = tensor<fp32, []>(0x1p-23)];
tensor<fp32, []> const_1 = const()[name = tensor<string, []>("const_1"), val = tensor<fp32, []>(0x1.fffffep+127)];
tensor<fp32, [?, 80]> clip_0 = clip(alpha = var_80, beta = const_1, x = mel_energies)[name = tensor<string, []>("clip_0")];
tensor<fp32, []> fbank_1_epsilon_0 = const()[name = tensor<string, []>("fbank_1_epsilon_0"), val = tensor<fp32, []>(0x1p-149)];
tensor<fp32, [?, 80]> fbank_1 = log(epsilon = fbank_1_epsilon_0, x = clip_0)[name = tensor<string, []>("fbank_1")];
tensor<int32, [2]> var_88_begin_0 = const()[name = tensor<string, []>("op_88_begin_0"), val = tensor<int32, [2]>([0, 0])];
tensor<int32, [2]> var_88_end_0 = const()[name = tensor<string, []>("op_88_end_0"), val = tensor<int32, [2]>([1, 80])];
tensor<bool, [2]> var_88_end_mask_0 = const()[name = tensor<string, []>("op_88_end_mask_0"), val = tensor<bool, [2]>([false, true])];
tensor<fp32, [1, 80]> var_88 = slice_by_index(begin = var_88_begin_0, end = var_88_end_0, end_mask = var_88_end_mask_0, x = fbank_1)[name = tensor<string, []>("op_88")];
tensor<int32, [2]> var_91 = const()[name = tensor<string, []>("op_91"), val = tensor<int32, [2]>([3, 1])];
tensor<fp32, [3, 80]> var_92 = tile(reps = var_91, x = var_88)[name = tensor<string, []>("op_92")];
tensor<int32, [2]> var_97_begin_0 = const()[name = tensor<string, []>("op_97_begin_0"), val = tensor<int32, [2]>([-1, 0])];
tensor<int32, [2]> var_97_end_0 = const()[name = tensor<string, []>("op_97_end_0"), val = tensor<int32, [2]>([0, 80])];
tensor<bool, [2]> var_97_end_mask_0 = const()[name = tensor<string, []>("op_97_end_mask_0"), val = tensor<bool, [2]>([true, true])];
tensor<fp32, [1, 80]> var_97 = slice_by_index(begin = var_97_begin_0, end = var_97_end_0, end_mask = var_97_end_mask_0, x = fbank_1)[name = tensor<string, []>("op_97")];
tensor<int32, [2]> var_100 = const()[name = tensor<string, []>("op_100"), val = tensor<int32, [2]>([7, 1])];
tensor<fp32, [7, 80]> var_101 = tile(reps = var_100, x = var_97)[name = tensor<string, []>("op_101")];
tensor<int32, []> var_103 = const()[name = tensor<string, []>("op_103"), val = tensor<int32, []>(0)];
tensor<bool, []> fbank_interleave_0 = const()[name = tensor<string, []>("fbank_interleave_0"), val = tensor<bool, []>(false)];
tensor<fp32, [?, 80]> fbank = concat(axis = var_103, interleave = fbank_interleave_0, values = (var_92, fbank_1, var_101))[name = tensor<string, []>("fbank")];
tensor<int32, [2]> var_105_perm_0 = const()[name = tensor<string, []>("op_105_perm_0"), val = tensor<int32, [2]>([1, 0])];
tensor<int32, [1]> var_107_axes_0 = const()[name = tensor<string, []>("op_107_axes_0"), val = tensor<int32, [1]>([0])];
tensor<fp32, [80, ?]> var_105 = transpose(perm = var_105_perm_0, x = fbank)[name = tensor<string, []>("transpose_4")];
tensor<fp32, [1, 80, ?]> var_107 = expand_dims(axes = var_107_axes_0, x = var_105)[name = tensor<string, []>("op_107")];
tensor<string, []> var_123_pad_type_0 = const()[name = tensor<string, []>("op_123_pad_type_0"), val = tensor<string, []>("valid")];
tensor<int32, [1]> var_123_strides_0 = const()[name = tensor<string, []>("op_123_strides_0"), val = tensor<int32, [1]>([6])];
tensor<int32, [2]> var_123_pad_0 = const()[name = tensor<string, []>("op_123_pad_0"), val = tensor<int32, [2]>([0, 0])];
tensor<int32, [1]> var_123_dilations_0 = const()[name = tensor<string, []>("op_123_dilations_0"), val = tensor<int32, [1]>([1])];
tensor<int32, []> var_123_groups_0 = const()[name = tensor<string, []>("op_123_groups_0"), val = tensor<int32, []>(1)];
tensor<fp32, [1, 560, ?]> var_123 = conv(dilations = var_123_dilations_0, groups = var_123_groups_0, pad = var_123_pad_0, pad_type = var_123_pad_type_0, strides = var_123_strides_0, weight = lfr_kernel, x = var_107)[name = tensor<string, []>("op_123")];
tensor<int32, [3]> var_126_begin_0 = const()[name = tensor<string, []>("op_126_begin_0"), val = tensor<int32, [3]>([0, 0, 0])];
tensor<int32, [3]> var_126_end_0 = const()[name = tensor<string, []>("op_126_end_0"), val = tensor<int32, [3]>([1, 560, 0])];
tensor<bool, [3]> var_126_end_mask_0 = const()[name = tensor<string, []>("op_126_end_mask_0"), val = tensor<bool, [3]>([false, true, true])];
tensor<bool, [3]> var_126_squeeze_mask_0 = const()[name = tensor<string, []>("op_126_squeeze_mask_0"), val = tensor<bool, [3]>([true, false, false])];
tensor<fp32, [560, ?]> var_126 = slice_by_index(begin = var_126_begin_0, end = var_126_end_0, end_mask = var_126_end_mask_0, squeeze_mask = var_126_squeeze_mask_0, x = var_123)[name = tensor<string, []>("op_126")];
tensor<int32, [2]> lfr_perm_0 = const()[name = tensor<string, []>("lfr_perm_0"), val = tensor<int32, [2]>([1, 0])];
tensor<fp32, [?, 560]> lfr = transpose(perm = lfr_perm_0, x = var_126)[name = tensor<string, []>("transpose_3")];
tensor<fp32, [?, 560]> var_129 = add(x = lfr, y = cmvn_neg_mean)[name = tensor<string, []>("op_129")];
tensor<fp32, [?, 560]> feats = mul(x = var_129, y = cmvn_inv_std)[name = tensor<string, []>("feats")];
tensor<int32, [1]> var_132_axes_0 = const()[name = tensor<string, []>("op_132_axes_0"), val = tensor<int32, [1]>([0])];
tensor<fp32, [1, ?, 560]> features = expand_dims(axes = var_132_axes_0, x = feats)[name = tensor<string, []>("op_132")];
} -> (features);
}

SenseVoicePreprocessor.mlmodelc/weights/weight.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:69c630a115da5e4db36ec41662f0b776c0ef33ec6776d86f8cdaaba022518396
size 3037504

SenseVoiceSmall.mlmodelc/analytics/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2dd2919d1ef534ecd4d0c9843dea078b0ad337e0918e692d9811cb16a31fb02b
size 243

SenseVoiceSmall.mlmodelc/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8af6326236369150e5540e15996877a71b281e98cb9ede6b646c2f4b3d9be88c
size 436

SenseVoiceSmall.mlmodelc/model.mil
ADDED
The diff for this file is too large to render.

SenseVoiceSmall.mlmodelc/weights/weight.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f435f29513464bcda175e449fd72e28ef5183b963f116394a38eadbbc12ca694
size 468060094

SenseVoiceSmall_fp32.mlmodelc/analytics/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09bdfe5eee1fd3cc70fc39e1e144ede5118e138c3c2dd52a2822d0d72fbb91f8
size 243

SenseVoiceSmall_fp32.mlmodelc/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba5a1b5d9bf9b1b85ef2d1f69717e1f4424cc72e7316fc3edb0b604e449f9919
size 396

SenseVoiceSmall_fp32.mlmodelc/model.mil
ADDED
The diff for this file is too large to render.

SenseVoiceSmall_fp32.mlmodelc/weights/weight.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62919f3a37419a1e4ede3763d6efcf2ae9ed320e6bd9fb4a37d2b15ef891b92d
size 940100992

vocab.json
ADDED
The diff for this file is too large to render.