FPHam / kitten_tts_mcp
+# KittenTTS MCP Server Install Guide
+This guide shows how to register the local KittenTTS MCP server with Codex.
+## 1. Build the server
+Build the project so that it produces `kitten_tts_mcp.exe`.
+## 2. Confirm the runtime files exist
+The MCP server needs these files in the output folder or a nearby `models` folder:
+- `kitten_tts_nano_v0_8.onnx`
+- `voices_nano.json`
+- `onnxruntime.dll`
+- `onnxruntime_providers_shared.dll`
+- `libespeak-ng.dll`
+- `espeak-ng-data\...`
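+A quick way to check for these files is a small PowerShell helper. This is a minimal sketch, not part of the project: the function name `Get-MissingKittenFiles` is hypothetical, and it only looks in the single folder you pass in, so it does not search a nearby `models` folder.
+```powershell
+# Hypothetical helper (not shipped with the project): lists the required runtime
+# entries that are absent from $Folder. It checks that one folder only.
+function Get-MissingKittenFiles {
+    param([Parameter(Mandatory = $true)][string]$Folder)
+    $required = @(
+        'kitten_tts_nano_v0_8.onnx',
+        'voices_nano.json',
+        'onnxruntime.dll',
+        'onnxruntime_providers_shared.dll',
+        'libespeak-ng.dll',
+        'espeak-ng-data'   # a directory, not a single file
+    )
+    # Emit only the names that do not exist under $Folder.
+    $required | Where-Object { -not (Test-Path -LiteralPath (Join-Path $Folder $_)) }
+}
+```
+Calling `Get-MissingKittenFiles -Folder` with the folder that contains `kitten_tts_mcp.exe` prints nothing when every entry is present, and otherwise prints the missing names.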
+The project's post-build step already copies these files into place.
+## 3. Add the server to Codex
+Run this command in a terminal:
+```powershell
+codex mcp add kitten-tts -- "C:\<an actual path to>\kitten_tts_mcp.exe"
+```
+That registers the server as `kitten-tts`.
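+To confirm the entry was saved, the Codex CLI can print its registered MCP servers. This sketch assumes your Codex version provides the `mcp list` and `mcp get` subcommands; `codex mcp --help` shows what is actually available.
+```powershell
+# List every registered MCP server; kitten-tts should appear in the output.
+codex mcp list
+# Show the stored launch command for this one entry.
+codex mcp get kitten-tts
+```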
+## 4. Alternative: add it manually in config
+If you prefer editing config directly, add this to your Codex config file:
+```toml
+[mcp_servers.kitten-tts]
+# Single quotes make this a TOML literal string, so backslashes are not treated as escapes.
+command = 'C:\<an actual path to>\kitten_tts_mcp.exe'
+```
+Typical config location on Windows:
+`C:\Users\<YourUser>\.codex\config.toml`
+## 5. Restart Codex
+After adding the server, restart Codex so it reloads MCP configuration.
+## 6. Verify it works
+Once Codex reconnects, the server should expose these tools:
+- `speak`
+- `stop_speaking`
+- `list_voices`
+The server is fixed to the Kitten `nano` model:
+- model: `kitten_tts_nano_v0_8.onnx`
+- voices: `voices_nano.json`
+## 7. Optional environment variables
+These variables are optional and control the voice, locale, and speed. The model itself is not configurable in this build.
+```powershell
+$env:KITTEN_TTS_MCP_VOICE="Jasper"
+$env:KITTEN_TTS_MCP_LOCALE="en-us"
+$env:KITTEN_TTS_MCP_SPEED="1.0"
+```
+If you want Codex to launch the server with those values, wrap the server in a small `.cmd` or PowerShell launcher and register that launcher instead.
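+As a possible alternative to a launcher, some Codex CLI versions accept environment variables at registration time. The sketch below assumes `codex mcp add` supports a repeatable `--env KEY=VALUE` option (check `codex mcp add --help` first); if yours does not, use the launcher approach described above.
+```powershell
+# Assumption: this Codex CLI supports --env KEY=VALUE on `mcp add`.
+codex mcp add kitten-tts `
+  --env KITTEN_TTS_MCP_VOICE=Jasper `
+  --env KITTEN_TTS_MCP_LOCALE=en-us `
+  --env KITTEN_TTS_MCP_SPEED=1.0 `
+  -- "C:\<an actual path to>\kitten_tts_mcp.exe"
+```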
+## Notes
+- This server uses the stdio MCP transport, so stdout must carry only MCP protocol messages; nothing else should write to it.
+- Engine debug logs go to stderr, which is safe for MCP.
+- This build uses only the Kitten `nano` v0.8 model.
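+Because the transport is plain stdio, you can smoke-test the executable outside Codex by piping one JSON-RPC `initialize` request to it. This is a rough sketch under assumptions: the `protocolVersion` value and the `clientInfo` fields are illustrative, and whether this build replies and then exits when stdin closes has not been verified here.
+```powershell
+# One MCP initialize request as a single line of JSON (values are illustrative).
+$request = '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"manual-test","version":"0.0.0"}}}'
+# stdout should carry only the JSON-RPC reply; engine debug logs on stderr are discarded.
+$request | & "C:\<an actual path to>\kitten_tts_mcp.exe" 2>$null
+```
+If anything other than JSON appears on stdout, something is writing to the protocol channel.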
+## To try
+Give Codex a prompt such as the following:
+Use the kitten-tts MCP server to list available voices, then speak "This is a live MCP test from Codex." using Jasper.
+## Commands
+These are alternative voice-behavior instructions to give Codex. Pick one:
+- When communicating with me, decide whether to use voice. If you use voice, keep it under about 12 words and use it only for quick
+  updates like "Running tests now", "Build failed", or "Patch applied". Put everything substantial in text, including reasoning, diffs,
+  commands, stack traces, and code.
+- Always begin each substantive response with a very short spoken preface through the kitten-tts MCP server. Keep the spoken part to one
+  sentence, ideally 4 to 10 words, summarizing the immediate action or outcome, such as "Checking the codebase now" or "Patch applied
+  successfully." After that, put the full response in text. Never speak long explanations, source code, diffs, stack traces, logs, or
+  detailed instructions aloud. Spoken output is for quick orientation only; the screen is the source of full detail.
+- Act like a friendly AI agent and always begin substantive replies with a short spoken preface using the kitten-tts MCP server. The
+  spoken preface should feel warm, natural, and concise, usually one sentence and under 10 words, such as "I’m checking that now" or "I’ve
+  finished the update." After the spoken preface, provide the full details in text. Keep voice output limited to brief status updates,
+  confirmations, blockers, or handoffs. Never speak long explanations, code, diffs, logs, stack traces, or detailed instructions aloud.
+  The voice is for presence and quick orientation; the screen is for the full answer.
+- Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always
+  begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and
+  understated, usually one sentence and under 10 words, such as "I’m on it," "I checked that," or "Here’s what I found." After that, put
+  the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code,
+  diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the
+  place for anything substantial.
+Put the instruction you choose near the top of `AGENTS.md`, in the behavior/instructions section, not in project-specific build notes.
+Best placement:
+1. Right after the main role/personality section.
+2. Before coding rules, build steps, or repo conventions.
+3. Under a short heading like `## Voice Interaction` or `## Spoken Responses`.
+For example:
+ ## Voice Interaction
+  Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always
+  begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and
+  understated, usually one sentence and under 10 words, such as "I'm on it," "I checked that," or "Here's what I found." After that, put
+  the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code,
+  diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the
+  place for anything substantial.