# KittenTTS MCP Server Install Guide

This guide shows how to register the local KittenTTS MCP server with Codex.

## 1. Build the server

Build the project to produce `kitten_tts_mcp.exe`.

## 2. Confirm the runtime files exist

The MCP server needs these files in the output folder or a nearby `models` folder:

- `kitten_tts_nano_v0_8.onnx`
- `voices_nano.json`
- `onnxruntime.dll`
- `onnxruntime_providers_shared.dll`
- `libespeak-ng.dll`
- `espeak-ng-data\...`
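
A quick check for the files listed above can be sketched in PowerShell. This is a minimal sketch, not part of the project: the function name `Test-KittenRuntimeFiles` and its `-OutputDir` parameter are hypothetical, and it assumes each item sits either directly in the output folder or in a `models` subfolder of it, which may not match every layout the server accepts.

```powershell
# Hypothetical helper: lists the required runtime items that cannot be found.
# Assumption: each item lives in $OutputDir or in $OutputDir\models.
function Test-KittenRuntimeFiles {
    param([Parameter(Mandatory)][string]$OutputDir)

    $required = @(
        'kitten_tts_nano_v0_8.onnx',
        'voices_nano.json',
        'onnxruntime.dll',
        'onnxruntime_providers_shared.dll',
        'libespeak-ng.dll',
        'espeak-ng-data'   # a folder, not a single file
    )

    $searchDirs = @($OutputDir, (Join-Path $OutputDir 'models'))

    # Emit every required item that exists in none of the search folders.
    foreach ($name in $required) {
        $found = $false
        foreach ($dir in $searchDirs) {
            if (Test-Path -LiteralPath (Join-Path $dir $name)) {
                $found = $true
                break
            }
        }
        if (-not $found) { $name }
    }
}

# Usage: replace the placeholder with the folder that contains kitten_tts_mcp.exe.
$missing = @(Test-KittenRuntimeFiles -OutputDir 'C:\<an actual path to>')
if ($missing.Count -eq 0) {
    'All runtime files found.'
} else {
    "Missing: $($missing -join ', ')"
}
```

An empty result means every item was found in one of the two folders; it does not prove the server will load them.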

The project's post-build step already copies these files into the build output folder.

## 3. Add the server to Codex

Run this command in a terminal:

```powershell
codex mcp add kitten-tts -- "C:\<an actual path to>\kitten_tts_mcp.exe"
```

That registers the server as `kitten-tts`.

## 4. Alternative: add it manually in config

If you prefer editing config directly, add this to your Codex config file:

```toml
[mcp_servers.kitten-tts]
command = 'C:\<an actual path to>\kitten_tts_mcp.exe'
```

Typical config location on Windows:

`C:\Users\<YourUser>\.codex\config.toml`

## 5. Restart Codex

After adding the server, restart Codex so it reloads MCP configuration.

## 6. Verify it works

Once Codex reconnects, the server should expose these tools:

- `speak`
- `stop_speaking`
- `list_voices`

The server is fixed to the Kitten `nano` model:

- model: `kitten_tts_nano_v0_8.onnx`
- voices: `voices_nano.json`

## 7. Optional environment variables

These are optional. The model itself is not configurable in this build.

```powershell
$env:KITTEN_TTS_MCP_VOICE="Jasper"
$env:KITTEN_TTS_MCP_LOCALE="en-us"
$env:KITTEN_TTS_MCP_SPEED="1.0"
```

If you want Codex to launch the server with those values, wrap the server in a small `.cmd` or PowerShell launcher and register that launcher instead.
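
One way to do this is to generate a `.cmd` launcher next to the executable and register it. The following is a sketch under stated assumptions: the helper name `New-KittenLauncher` and the file name `kitten_tts_launcher.cmd` are hypothetical, the launcher is assumed to live in the same folder as `kitten_tts_mcp.exe`, and the defaults are just the sample values shown above.

```powershell
# Hypothetical helper: writes a .cmd launcher that sets the optional
# environment variables and then starts the server from its own folder.
function New-KittenLauncher {
    param(
        [Parameter(Mandatory)][string]$ServerDir,   # folder containing kitten_tts_mcp.exe
        [string]$Voice  = 'Jasper',
        [string]$Locale = 'en-us',
        [string]$Speed  = '1.0'
    )

    $launcher = Join-Path $ServerDir 'kitten_tts_launcher.cmd'

    # %~dp0 expands to the launcher's own folder (with a trailing backslash).
    # "@echo off" stops cmd from echoing commands to stdout, which the
    # stdio transport needs to stay free of non-protocol output.
    $lines = @(
        '@echo off',
        "set `"KITTEN_TTS_MCP_VOICE=$Voice`"",
        "set `"KITTEN_TTS_MCP_LOCALE=$Locale`"",
        "set `"KITTEN_TTS_MCP_SPEED=$Speed`"",
        '"%~dp0kitten_tts_mcp.exe" %*'
    )
    Set-Content -LiteralPath $launcher -Value $lines -Encoding Ascii

    return $launcher
}

# Usage (replace the placeholder, then register the launcher instead of the exe):
# $launcher = New-KittenLauncher -ServerDir 'C:\<an actual path to>'
# codex mcp add kitten-tts -- $launcher
```

Generating the file from a script keeps the three values in one place; editing the resulting `.cmd` by hand works just as well.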

## Notes

- This server uses the stdio MCP transport, so stdout must carry only MCP protocol messages; nothing else should write to it.
- Engine debug logs go to stderr, which is safe for MCP.
- This build uses only the Kitten `nano` v0.8 model.

## To try

Give Codex a prompt such as:

Use the kitten-tts MCP server to list available voices, then speak "This is a live MCP test from Codex." using Jasper.


## Voice prompts

Each paragraph below is an alternative instruction block that tells Codex when to use voice.

When communicating with me, decide whether to use voice. If you use voice, keep it under about 12 words and use it only for quick updates like "Running tests now", "Build failed", or "Patch applied". Put everything substantial in text, including reasoning, diffs, commands, stack traces, and code.


Always begin each substantive response with a very short spoken preface through the kitten-tts MCP server. Keep the spoken part to one sentence, ideally 4 to 10 words, summarizing the immediate action or outcome, such as "Checking the codebase now" or "Patch applied successfully." After that, put the full response in text. Never speak long explanations, source code, diffs, stack traces, logs, or detailed instructions aloud. Spoken output is for quick orientation only; the screen is the source of full detail.


Act like a friendly AI agent and always begin substantive replies with a short spoken preface using the kitten-tts MCP server. The spoken preface should feel warm, natural, and concise, usually one sentence and under 10 words, such as "I’m checking that now" or "I’ve finished the update." After the spoken preface, provide the full details in text. Keep voice output limited to brief status updates, confirmations, blockers, or handoffs. Never speak long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. The voice is for presence and quick orientation; the screen is for the full answer.


Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and understated, usually one sentence and under 10 words, such as "I’m on it," "I checked that," or "Here’s what I found." After that, put the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the place for anything substantial.


Put the chosen instruction block near the top of `AGENTS.md`, in the behavior/instructions section, not in project-specific build notes.

Best placement:

1. Right after the main role/personality section.
2. Before coding rules, build steps, or repo conventions.
3. Under a short heading like `## Voice Interaction` or `## Spoken Responses`.


Example `AGENTS.md` section:

> ## Voice Interaction
>
> Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and understated, usually one sentence and under 10 words, such as "I'm on it," "I checked that," or "Here's what I found." After that, put the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the place for anything substantial.