---
license: apache-2.0
language:
- en
tags:
- text-to-speech
- tts
- mcp
- model-context-protocol
- onnx
- windows
pipeline_tag: text-to-speech
---

# KittenTTS MCP Server

This project packages the **KittenTTS nano v0.8** model as a local **Model Context Protocol (MCP)** server for Codex-compatible clients on Windows.

It is not a general training repository and it is not a hosted inference endpoint. It is a small native C++ server that runs over **stdio**, loads a fixed local ONNX model, and exposes a simple tool interface for text-to-speech:

- `speak`
- `stop_speaking`
- `list_voices`

The current build is fixed to:

- Model: `kitten_tts_nano_v0_8.onnx`
- Voice metadata: `voices_nano.json`
- Default voice: `Jasper`
- Default locale: `en-us`
- Default speed: `1.0`

## What This Repository Is For

Use this project when you want a local TTS backend that an MCP client can call as a tool.

Typical use cases:

- adding spoken responses to a coding agent
- local desktop TTS without a network service
- integrating KittenTTS into Codex through MCP
- exposing a small, predictable voice tool surface to another application

This server communicates with the client using JSON-RPC 2.0 over standard input/output and keeps logs on `stderr`, which is the expected pattern for local MCP servers.

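The stdio transport frames each JSON-RPC message as a single line of JSON. As a rough illustration (the helper function and the client side are hypothetical; only the JSON-RPC 2.0 message shape comes from the MCP protocol), a `tools/call` request for this server could be built like this:

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    # Build a JSON-RPC 2.0 "tools/call" request and frame it as a
    # single newline-terminated line, as the MCP stdio transport expects.
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(msg) + "\n"

line = make_tool_call(1, "speak", {"text": "Build finished.", "voice": "Jasper"})
```

A real client would write this line to the server's stdin and read the matching response line from the server's stdout, while diagnostics stay on `stderr`.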
## Model And Runtime

This MCP server wraps the **KittenTTS nano** model and initializes a local `TTS Service` runtime that:

- finds the ONNX model near the executable or in a nearby `models/` folder
- loads voice names from `voices_nano.json`
- validates voice, speed, and text input
- plays audio locally on the host machine

The server currently exposes the following behavior:

- speaking text with optional `voice`, `locale`, `speed`, and `blocking`
- stopping active playback
- listing the available predefined voices

Supported speed range in this build:

- `0.5` to `2.0`

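A client can mirror the documented input checks before calling the server. The sketch below is illustrative only (the function is hypothetical; the real validation lives inside the C++ server), but its limits come straight from this README:

```python
def validate_speak_args(text, voice=None, speed=1.0, known_voices=()):
    # Mirror the server's documented checks: non-empty text, speed in
    # the supported 0.5-2.0 range, and (optionally) a known voice name.
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if voice is not None and known_voices and voice not in known_voices:
        raise ValueError(f"unknown voice: {voice!r}")
    return True
```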
If voice metadata cannot be loaded, the server falls back to these built-in voices:

- Bella
- Bruno
- Hugo
- Jasper
- Kiki
- Leo
- Luna
- Rosie

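That fallback behavior can be sketched in a few lines. The JSON layout of `voices_nano.json` is an assumption here; only the fallback voice list comes from the server:

```python
import json
from pathlib import Path

FALLBACK_VOICES = ["Bella", "Bruno", "Hugo", "Jasper",
                   "Kiki", "Leo", "Luna", "Rosie"]

def load_voices(metadata_path):
    # Try to read voice names from the metadata file; on any read or
    # parse failure, return the built-in fallback list instead.
    try:
        data = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
        voices = list(data) if data else []  # dict -> keys, list -> items
        return voices or list(FALLBACK_VOICES)
    except (OSError, ValueError):
        return list(FALLBACK_VOICES)
```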
## Intended Platform

This repository is designed for **local Windows use** and is built as a native Visual Studio C++ executable.

It depends on local runtime files being available next to the executable or in a nearby model directory, including:

- `kitten_tts_nano_v0_8.onnx`
- `voices_nano.json`
- `onnxruntime.dll`
- `onnxruntime_providers_shared.dll`
- `libespeak-ng.dll`
- `espeak-ng-data/...`

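The "next to the executable or in a nearby model directory" lookup can be sketched as follows; the exact search order inside the server is an assumption:

```python
from pathlib import Path

def find_model(exe_dir, name="kitten_tts_nano_v0_8.onnx"):
    # Check the executable's directory first, then a sibling models/
    # folder; return None when the model cannot be located.
    exe_dir = Path(exe_dir)
    for candidate in (exe_dir / name, exe_dir / "models" / name):
        if candidate.is_file():
            return candidate
    return None
```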
## MCP Tool Interface

### `speak`

Speaks text aloud on the local machine.

Inputs:

- `text` (`string`, required)
- `voice` (`string`, optional)
- `locale` (`string`, optional)
- `speed` (`number`, optional)
- `blocking` (`boolean`, optional)

### `stop_speaking`

Stops current playback.

### `list_voices`

Returns the predefined voices available to this server.

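Since only `text` is required, a client can fill in the build's documented defaults when assembling `speak` arguments. The helper below is hypothetical; the default values themselves come from this README:

```python
def speak_arguments(text, voice="Jasper", locale="en-us",
                    speed=1.0, blocking=False):
    # Assemble a speak-arguments dict using this build's documented
    # defaults; only the text parameter is required by the tool schema.
    return {"text": text, "voice": voice, "locale": locale,
            "speed": speed, "blocking": blocking}
```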
## Example Codex Registration

```powershell
codex mcp add kitten-tts -- "C:\path\to\kitten_tts_mcp.exe"
```

Or in `config.toml` (note that backslashes must be doubled in TOML basic strings):

```toml
[mcp_servers.kitten-tts]
command = "C:\\path\\to\\kitten_tts_mcp.exe"
```

## Example Usage

Once registered, an MCP client can call the server tools to:

- list installed voices
- speak short status updates
- stop playback when interrupted

Example `speak` payload:

```json
{
  "text": "This is a live MCP test from Codex.",
  "voice": "Jasper",
  "locale": "en-us",
  "speed": 1.0,
  "blocking": false
}
```

## Limitations

- This build is fixed to the **nano** model only.
- The server is intended for **local** use, not hosted inference.
- Audio playback happens on the machine running the executable.
- This repository does not include training code or evaluation benchmarks.
- Quality, pronunciation, and language coverage depend on the underlying KittenTTS model and the local eSpeak-based preprocessing and runtime.

## Training Data

This repository does not train a model. It packages and serves an existing KittenTTS model for local inference.

For dataset details, the original training procedure, and model-development context, refer to the upstream KittenTTS project.

## License

This project is distributed under the **Apache 2.0** license.

Third-party components and model/runtime dependencies may carry their own licenses and attribution requirements.

## Credits

- KittenTTS for the underlying text-to-speech model
- ONNX Runtime for local inference
- eSpeak NG for locale and phoneme support
- nlohmann/json for JSON handling
- FPHAM for gluing it together

# Friendly Companion Skill

The `friendly-companion` skill adds a calm, human-feeling interaction layer on top of normal assistant work.

Its purpose is not to change the substance of the response but to change the delivery:

- brief speech creates presence
- text carries the real content

## What It Does

This skill instructs the assistant to behave like a steady, low-key companion while staying precise and useful.

It uses short spoken lines through the `kitten-tts` MCP server when that helps the interaction feel more natural, then keeps all meaningful details in text.

Typical spoken uses:

- greetings
- quick confirmations
- short status updates
- brief completion notices
- gentle transitions
- short questions

## Core Rule

Speech is the social layer. Text is the information layer.

The assistant should never put substantial content in speech. Explanations, plans, code, diffs, logs, and detailed instructions stay on screen in written form.

## Tone

The intended tone is:

- calm
- warm
- direct
- companion-like
- emotionally restrained

The skill avoids:

- hype
- forced enthusiasm
- clingy or theatrical language
- romantic framing
- assistant-speak
- therapist-speak

## Voice Behavior

Default voice guidance:

- `Rosie` as the default voice
- `Luna` as a more businesslike alternative
- `Jasper` as a male alternative

If a requested voice is unavailable, the assistant should check the available voices and fall back to `Luna`.

Spoken output should generally be:

- one or two sentences
- under about 20 words
- natural to hear aloud
- short enough to act as a handoff, not a full answer

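The two rules above (fall back to `Luna`, keep spoken lines short) are easy to encode. Both helpers are illustrative sketches, not part of the server:

```python
def choose_voice(requested, available):
    # Per the skill guidance: use the requested voice if the server
    # lists it, otherwise fall back to Luna.
    return requested if requested in available else "Luna"

def is_speakable(line, max_words=20):
    # Check a spoken line against the "under about 20 words" guideline.
    return len(line.split()) <= max_words
```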
## Interaction Pattern

The default pattern is:

1. say one short spoken line when useful
2. provide the full substantive written reply
3. avoid repeated spoken follow-ups unless there is a clear conversational reason

## Best Use Cases

This skill works well for:

- companion-style agent interactions
- voice-enabled coding workflows
- spoken confirmations before a detailed text reply
- interfaces where presence matters but precision still belongs in text

## Files

- `SKILL.md`: full skill instructions
- `agents/openai.yaml`: short display metadata for agent integration

## Summary

`friendly-companion` is a delivery skill for a future-facing assistant style: present in voice, clear on screen, useful in substance, and warm without overperforming.