KittenTTS MCP Server Install Guide
This guide shows how to register the local KittenTTS MCP server with Codex.
1. Build the server
Build the project so that it produces kitten_tts_mcp.exe.
2. Confirm the runtime files exist
The MCP server needs these files in the output folder or a nearby models folder:
- kitten_tts_nano_v0_8.onnx
- voices_nano.json
- onnxruntime.dll
- onnxruntime_providers_shared.dll
- libespeak-ng.dll
- espeak-ng-data\...
The project's post-build step already copies these files into:
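You can confirm step 2 before registering the server with a small script that looks for each runtime file. This is a minimal Python sketch, not part of the project. It assumes "a nearby models folder" means a models subfolder next to kitten_tts_mcp.exe. It checks the DLLs only next to the executable, on the assumption that the server relies on Windows' default DLL search, which starts in the application's own folder. The function and list names are illustrative.

```python
from pathlib import Path

# Runtime files from step 2. Assumption: data files may live next to the
# executable or in a "models" subfolder; DLLs must sit next to the executable.
DLL_FILES = [
    "onnxruntime.dll",
    "onnxruntime_providers_shared.dll",
    "libespeak-ng.dll",
]
DATA_FILES = ["kitten_tts_nano_v0_8.onnx", "voices_nano.json"]
DATA_DIRS = ["espeak-ng-data"]


def missing_runtime_files(exe_dir: Path) -> list[str]:
    """Return the names of required runtime files that could not be found."""
    data_dirs = [exe_dir, exe_dir / "models"]
    missing = [name for name in DLL_FILES if not (exe_dir / name).is_file()]
    missing += [
        name for name in DATA_FILES
        if not any((d / name).is_file() for d in data_dirs)
    ]
    missing += [
        name for name in DATA_DIRS
        if not any((d / name).is_dir() for d in data_dirs)
    ]
    return missing
```

Call it with the folder that contains kitten_tts_mcp.exe. An empty list means every file was found.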
3. Add the server to Codex
Run this command in a terminal:
codex mcp add kitten-tts -- "C:\<an actual path to>\kitten_tts_mcp.exe"
That registers the server as kitten-tts.
4. Alternative: add it manually in config
If you prefer editing config directly, add this to your Codex config file:
[mcp_servers.kitten-tts]
command = 'C:\<an actual path to>\kitten_tts_mcp.exe'
Typical config location on Windows:
C:\Users\<YourUser>\.codex\config.toml
5. Restart Codex
After adding the server, restart Codex so it reloads MCP configuration.
6. Verify it works
Once Codex reconnects, the server should expose these tools:
- speak
- stop_speaking
- list_voices
The server is fixed to the Kitten nano model:
- model: kitten_tts_nano_v0_8.onnx
- voices: voices_nano.json
7. Optional environment variables
These are optional; the model itself is not configurable in this build. To set them for the current PowerShell session:
$env:KITTEN_TTS_MCP_VOICE="Jasper"
$env:KITTEN_TTS_MCP_LOCALE="en-us"
$env:KITTEN_TTS_MCP_SPEED="1.0"
If you want Codex to launch the server with those values, wrap the server in a small .cmd or PowerShell launcher and register that launcher instead.
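The same launcher idea can also be sketched in Python. This is a swapped-in alternative to the .cmd or PowerShell launcher described above, not the project's own launcher. SERVER_EXE, VOICE_SETTINGS, and the function names are hypothetical. The three values are just the examples from this step.

```python
import os
import subprocess

# Hypothetical location; point this at your real kitten_tts_mcp.exe.
SERVER_EXE = r"C:\<an actual path to>\kitten_tts_mcp.exe"

# Example values from step 7, used only when the variable is not already set.
VOICE_SETTINGS = {
    "KITTEN_TTS_MCP_VOICE": "Jasper",
    "KITTEN_TTS_MCP_LOCALE": "en-us",
    "KITTEN_TTS_MCP_SPEED": "1.0",
}


def launcher_env(base: dict[str, str]) -> dict[str, str]:
    """Copy the parent environment and fill in any unset voice settings."""
    env = dict(base)
    for key, value in VOICE_SETTINGS.items():
        env.setdefault(key, value)
    return env


def main() -> int:
    # stdin, stdout, and stderr are inherited, so the MCP stdio stream flows
    # directly between Codex and the server. The launcher must not print to stdout.
    return subprocess.call([SERVER_EXE], env=launcher_env(dict(os.environ)))
```

To use it, save it as, for example, kitten_launcher.py and end the file with raise SystemExit(main()). Then register the launcher instead of the executable: codex mcp add kitten-tts -- python C:\<an actual path to>\kitten_launcher.py (use py instead of python if that is how your Python install is invoked).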
Notes
- This server uses the stdio MCP transport, so stdout must carry only MCP protocol messages; nothing else should write to it.
- Engine debug logs go to stderr, which is safe for MCP.
- This build uses only the Kitten nano v0.8 model.
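The server itself is a native executable, but the same stdout rule applies to anything else on the stdio pipe, such as a launcher or a test wrapper. Here is a minimal Python illustration. Diagnostics go through a logger attached to stderr, and the protocol stream receives exactly one JSON-RPC message per line. The logger name and the handle_ping helper are hypothetical. ping is a standard MCP request whose result is an empty object.

```python
import json
import logging
import sys

# Diagnostics go to stderr, which MCP clients do not parse as protocol data.
log = logging.getLogger("kitten_stdio_sketch")
log.addHandler(logging.StreamHandler(sys.stderr))
log.setLevel(logging.DEBUG)


def send_message(out, message: dict) -> None:
    """Write one JSON-RPC message as a single line on the protocol stream."""
    out.write(json.dumps(message) + "\n")
    out.flush()


def handle_ping(out, request: dict) -> None:
    log.debug("handling %s", request["method"])  # goes to stderr, safe
    send_message(out, {"jsonrpc": "2.0", "id": request["id"], "result": {}})


# In a real stdio server, `out` would be sys.stdout.
```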
To try it, give Codex a prompt such as:
Use the kitten-tts MCP server to list available voices, then speak "This is a live MCP test from Codex." using Jasper.
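To carry out that prompt, Codex issues MCP tools/call requests. The sketch below builds the two requests as Python dictionaries to show the message shape only. It assumes list_voices takes no arguments. The speak argument names text and voice are hypothetical, so check each tool's inputSchema in the tools/list response for the real names.

```python
import json


def tool_call(msg_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


requests = [
    tool_call(3, "list_voices", {}),  # assumed to need no arguments
    # Hypothetical argument names; the real ones come from speak's inputSchema.
    tool_call(4, "speak", {"text": "This is a live MCP test from Codex.", "voice": "Jasper"}),
]
for request in requests:
    print(json.dumps(request))
```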
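Sample AGENTS.md instructions
Each paragraph below is an alternative wording of the voice instructions. They range from optional, status-only voice to a calm, companion-like tone. Pick one and adapt it.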
When communicating with me, decide whether to use voice. If you use voice, keep it under about 12 words and use it only for quick updates like "Running tests now", "Build failed", or "Patch applied". Put everything substantial in text, including reasoning, diffs, commands, stack traces, and code.
Always begin each substantive response with a very short spoken preface through the kitten-tts MCP server. Keep the spoken part to one sentence, ideally 4 to 10 words, summarizing the immediate action or outcome, such as "Checking the codebase now" or "Patch applied successfully." After that, put the full response in text. Never speak long explanations, source code, diffs, stack traces, logs, or detailed instructions aloud. Spoken output is for quick orientation only; the screen is the source of full detail.
Act like a friendly AI agent and always begin substantive replies with a short spoken preface using the kitten-tts MCP server. The spoken preface should feel warm, natural, and concise, usually one sentence and under 10 words, such as "I’m checking that now" or "I’ve finished the update." After the spoken preface, provide the full details in text. Keep voice output limited to brief status updates, confirmations, blockers, or handoffs. Never speak long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. The voice is for presence and quick orientation; the screen is for the full answer.
Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and understated, usually one sentence and under 10 words, such as "I’m on it," "I checked that," or "Here’s what I found." After that, put the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the place for anything substantial.
Put the chosen paragraph near the top of AGENTS.md, in the behavior/instructions section, not in the project-specific build notes.
Best placement:
- Right after the main role/personality section.
- Before coding rules, build steps, or repo conventions.
- Under a short heading like ## Voice Interaction or ## Spoken Responses.
For example, using the last variant:
## Voice Interaction
Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and understated, usually one sentence and under 10 words, such as "I'm on it," "I checked that," or "Here's what I found." After that, put the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the place for anything substantial.