# Kokoro CoreML (HAR-Optimized)

High-performance **Kokoro TTS CoreML conversion** with Apple Neural Engine (ANE) optimized HAR decoder buckets.

This repository contains precompiled `.mlpackage` models for fast on-device speech synthesis on Apple platforms.

---

*Based on this open source project:*
https://github.com/mattmireles/kokoro-coreml



## 📦 Included Models

### 🧠 Duration Model (Stage 1)

- `kokoro_duration.mlpackage`

Handles variable-length text and predicts phoneme durations + intermediate features.

---

### 🔊 HAR Decoder Buckets (Stage 2 – ANE Optimized)

Fixed-size audio synthesis models:

- `KokoroDecoder_HAR_1s.mlpackage`
- `KokoroDecoder_HAR_2s.mlpackage`
- `KokoroDecoder_HAR_3s.mlpackage`
- `KokoroDecoder_HAR_5s.mlpackage`
- `KokoroDecoder_HAR_8s.mlpackage`
- `KokoroDecoder_HAR_10s.mlpackage`
- `KokoroDecoder_HAR_15s.mlpackage`
- `KokoroDecoder_HAR_20s.mlpackage`
- `KokoroDecoder_HAR.mlpackage`
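
Choosing among these buckets at run time can be sketched as follows. This is a minimal illustration, assuming the number in each file name is the bucket's capacity in seconds of audio and that the unsuffixed `KokoroDecoder_HAR.mlpackage` is not part of the size ladder. `pick_bucket` is a hypothetical helper, not something shipped in this repository.

```python
from bisect import bisect_left

# Bucket capacities in seconds, read off the file names listed above.
# Assumption: the suffix denotes the maximum audio length the bucket can hold.
BUCKET_SECONDS = [1, 2, 3, 5, 8, 10, 15, 20]


def pick_bucket(duration_s: float) -> str:
    """Return the file name of the smallest bucket that fits duration_s.

    Hypothetical helper. Requests longer than the largest bucket fall back
    to the largest one; the caller would then have to split the text.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    idx = bisect_left(BUCKET_SECONDS, duration_s)
    idx = min(idx, len(BUCKET_SECONDS) - 1)  # clamp to the largest bucket
    return f"KokoroDecoder_HAR_{BUCKET_SECONDS[idx]}s.mlpackage"
```

For example, `pick_bucket(4.2)` selects the 5 s bucket, since the 3 s bucket is too small. Picking the smallest bucket that fits keeps padding, and therefore wasted compute, to a minimum.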

---

### Decoder-Only Variants

- `kokoro_decoder_only_3s.mlpackage`
- `kokoro_decoder_only_5s.mlpackage`
- `kokoro_decoder_only_10s.mlpackage`

---

### 🎛 F0 / Feature Variants

- `kokoro_f0n_3s.mlpackage`
- `kokoro_f0n_5s.mlpackage`
- `kokoro_f0n_10s.mlpackage`

---

### 🔊 Vocoder Variants

- `KokoroVocoder.mlpackage`
- `KokoroVocoder_asr64_f0128.mlpackage`
- `KokoroVocoder_asr80_f0160.mlpackage`
- `KokoroVocoder_asr96_f0192.mlpackage`
- `KokoroVocoder_asr128_f0256.mlpackage`
- `KokoroVocoder_asr160_f0320.mlpackage`
- `KokoroVocoder_asr200_f0400.mlpackage`

---

### 🧪 Experimental / Alternative

- `kokoro_synthesizer_3s.mlpackage`
- `kokoro_synthesizer_3s_nolstm.mlpackage`
- `StyleTTS2_iSTFTNet_Decoder.mlpackage`

---

# Architecture

This CoreML conversion uses a **two-stage pipeline** to support Kokoro’s dynamic operations while maximizing ANE performance.

---

## Stage 1 – Duration Model (CPU/GPU)

**Input:** Variable-length text (`ct.RangeDim`)  
**Process:** Transformer + LSTM duration prediction  
**Output:** Phoneme durations + intermediate features  
**Compute:** CPU / GPU  

Why not the ANE?
- LSTM layers are not ANE-compatible
- Text input has a dynamic shape

---

## Stage 2 – HAR Decoder (ANE Optimized)

**Input:**
- Features from duration model  
- Alignment matrix (built client-side)

**Process:**  
Vocoder synthesis using iSTFTNet architecture

**Output:**  
24 kHz waveform audio

**Compute:**  
Apple Neural Engine
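
The client-side alignment step can be illustrated in plain Python. This is a sketch under stated assumptions: the matrix is taken to be a hard 0/1 expansion with one row per phoneme and one column per output frame, where each phoneme occupies as many consecutive frames as its predicted duration. The exact layout and dtype the models expect are not stated here, and `build_alignment` is a hypothetical name.

```python
from typing import List


def build_alignment(durations: List[int], total_frames: int) -> List[List[float]]:
    """Expand per-phoneme frame counts into a 0/1 alignment matrix.

    Row i marks the frames assigned to phoneme i. Columns past the summed
    durations stay zero, which pads the matrix to a fixed bucket width.
    Assumption: durations are whole frame counts from the duration model.
    """
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    if sum(durations) > total_frames:
        raise ValueError("durations exceed the bucket's frame capacity")
    matrix = [[0.0] * total_frames for _ in durations]
    start = 0
    for row, d in zip(matrix, durations):
        for frame in range(start, start + d):
            row[frame] = 1.0
        start += d  # the next phoneme begins where this one ends
    return matrix
```

With durations `[2, 1, 3]` and a width of 8 frames, the first six columns each contain exactly one `1.0` and the last two columns are all zeros. Multiplying per-phoneme features by such a matrix repeats each phoneme's features across its frames.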

---

# 🚀 Key Innovations

- **HAR Processing** – Harmonic/phase separation for ANE efficiency  
- **Fixed-size Buckets** – Avoid CoreML dynamic shape issues  
- **Client-side Alignment** – Swift/Python builds alignment matrix  
- **On-demand Model Loading** – Memory optimized  
- **MIL Graph Patching** – CoreML compatibility fixes  
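
Fixed-size buckets imply that shorter inputs are padded before inference and the output is trimmed afterwards. A minimal sketch of that idea, assuming zero padding along the frame axis and a constant number of audio samples per frame; `samples_per_frame` is a hypothetical parameter whose real value is not given in this README, and both helper names are illustrative.

```python
from typing import List, Sequence


def pad_frames(frames: Sequence[Sequence[float]], bucket_frames: int) -> List[List[float]]:
    """Zero-pad a (frames x channels) feature list up to bucket_frames rows."""
    if len(frames) > bucket_frames:
        raise ValueError("input is longer than the bucket")
    channels = len(frames[0]) if frames else 0
    padded = [list(f) for f in frames]
    # Append all-zero frames until the bucket's fixed length is reached.
    padded.extend([0.0] * channels for _ in range(bucket_frames - len(frames)))
    return padded


def trim_audio(samples: Sequence[float], real_frames: int, samples_per_frame: int) -> List[float]:
    """Drop the samples that were synthesized from padding frames."""
    if real_frames < 0 or samples_per_frame <= 0:
        raise ValueError("real_frames must be >= 0 and samples_per_frame > 0")
    return list(samples[: real_frames * samples_per_frame])
```

The caller keeps track of the unpadded frame count so that the trailing audio produced from padding never reaches the listener.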

---

# ⚡ Performance

### Runs on ANE (HAR Models)

- Conv1D  
- ConvTranspose1D  
- LeakyReLU  
- Element-wise ops  

**Result:** synthesis runs ~17× faster than real time
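
To make the speed figure concrete, the arithmetic can be written out. The speed factor is the audio duration divided by the time spent synthesizing it. This sketch assumes the 24 kHz output rate stated in the Architecture section; the timings used in the example are illustrative, not measurements from this repository.

```python
SAMPLE_RATE_HZ = 24_000  # output rate stated in the Architecture section


def realtime_factor(num_samples: int, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return (num_samples / SAMPLE_RATE_HZ) / synthesis_seconds


def synthesis_budget(audio_seconds: float, factor: float = 17.0) -> float:
    """Compute time implied by a given speed factor (default: the ~17x above)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor
```

A 5-second clip is 120,000 samples at 24 kHz. At a factor of 17 it would take roughly 0.29 s to synthesize (`synthesis_budget(5.0)`).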

---

### Runs on CPU/GPU (Duration Model)

- LSTM layers  
- Transformer attention  
- AdaLayerNorm  
- Dynamic shape processing  

---

# 🧠 Production Optimizations

- Bucket auto-selection  
- ~200 MB per loaded model  
- Warm-up optimization  
- Graceful bucket fallback  
- Memory cleanup during idle  
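
On-demand loading and idle cleanup can be sketched as a small cache. This is an illustration only: `ModelCache` and its `loader` callable are hypothetical, and the class does not call any Core ML API itself. The caller supplies whatever function actually loads an `.mlpackage`.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ModelCache:
    """Load models on first use and evict those that have been idle too long."""

    def __init__(
        self,
        loader: Callable[[str], Any],  # hypothetical: file name -> loaded model
        max_idle_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._max_idle_s = max_idle_s
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, name: str) -> Any:
        """Return the model for name, loading it only if it is not cached."""
        if name in self._entries:
            model = self._entries[name][0]
        else:
            model = self._loader(name)
        self._entries[name] = (model, self._clock())  # refresh last-used time
        return model

    def evict_idle(self) -> List[str]:
        """Drop models unused for longer than max_idle_s; return their names."""
        now = self._clock()
        stale = [n for n, (_, used) in self._entries.items()
                 if now - used > self._max_idle_s]
        for n in stale:
            del self._entries[n]
        return stale

    def loaded(self) -> List[str]:
        """Names of the models currently held in memory, sorted."""
        return sorted(self._entries)
```

With roughly 200 MB per loaded model, holding only the buckets in recent use keeps the footprint bounded. Calling `get` once at startup for the most common bucket is one way to do the warm-up mentioned above.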

---

# 📥 Downloading

Because each `.mlpackage` is a directory rather than a single file, download the repository with the Hugging Face CLI:

```bash
huggingface-cli download <username>/<repo> --local-dir .
```

---

# License

MIT