# KittenTTS MCP Server Install Guide
This guide shows how to register the local KittenTTS MCP server with Codex.
## 1. Build the server

Build the project with your usual toolchain. The build output is `kitten_tts_mcp.exe`.

## 2. Confirm the runtime files exist
The MCP server needs these files in the output folder or a nearby `models` folder:

- `kitten_tts_nano_v0_8.onnx`
- `voices_nano.json`
- `onnxruntime.dll`
- `onnxruntime_providers_shared.dll`
- `libespeak-ng.dll`
- `espeak-ng-data\...`
The project post-build step already copies these files into the output folder.
## 3. Add the server to Codex
Run this command in a terminal:
```powershell
codex mcp add kitten-tts -- "C:\<an actual path to>\kitten_tts_mcp.exe"
```
That registers the server as `kitten-tts`.
## 4. Alternative: add it manually in config
If you prefer editing config directly, add this to your Codex config file:
```toml
[mcp_servers.kitten-tts]
# Use a literal (single-quoted) string so the backslashes are not
# treated as TOML escape sequences.
command = 'C:\<an actual path to>\kitten_tts_mcp.exe'
```
Typical config location on Windows:
`C:\Users\<YourUser>\.codex\config.toml`
## 5. Restart Codex
After adding the server, restart Codex so it reloads MCP configuration.
## 6. Verify it works
Once Codex reconnects, the server should expose these tools:

- `speak`
- `stop_speaking`
- `list_voices`
The server is fixed to the Kitten `nano` model:

- model: `kitten_tts_nano_v0_8.onnx`
- voices: `voices_nano.json`
## 7. Optional environment variables
These are optional. The model itself is not configurable in this build.
```powershell
$env:KITTEN_TTS_MCP_VOICE="Jasper"
$env:KITTEN_TTS_MCP_LOCALE="en-us"
$env:KITTEN_TTS_MCP_SPEED="1.0"
```
If you want Codex to launch the server with those values, wrap the server in a small `.cmd` or PowerShell launcher and register that launcher instead.
## Notes

- This server uses stdio MCP transport, so nothing else should write protocol data to stdout.
- Engine debug logs go to stderr, which is safe for MCP.
- This build uses only the Kitten `nano` v0.8 model.
## Try it
Use the kitten-tts MCP server to list available voices, then speak "This is a live MCP test from Codex." using Jasper.
## Agent voice instructions

Candidate phrasings for the AGENTS.md voice rule, from first draft to final:

When communicating with me, decide whether to use voice. If you use voice, keep it under about 12 words and use it only for quick updates like "Running tests now", "Build failed", or "Patch applied". Put everything substantial in text, including reasoning, diffs, commands, stack traces, and code.
Always begin each substantive response with a very short spoken preface through the kitten-tts MCP server. Keep the spoken part to one sentence, ideally 4 to 10 words, summarizing the immediate action or outcome, such as "Checking the codebase now" or "Patch applied successfully." After that, put the full response in text. Never speak long explanations, source code, diffs, stack traces, logs, or detailed instructions aloud. Spoken output is for quick orientation only; the screen is the source of full detail.
Act like a friendly AI agent and always begin substantive replies with a short spoken preface using the kitten-tts MCP server. The spoken preface should feel warm, natural, and concise, usually one sentence and under 10 words, such as "I’m checking that now" or "I’ve finished the update." After the spoken preface, provide the full details in text. Keep voice output limited to brief status updates, confirmations, blockers, or handoffs. Never speak long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. The voice is for presence and quick orientation; the screen is for the full answer.
Put it near the top of AGENTS.md, in the behavior/instructions section, not in project-specific build notes.
Best placement:
1. Right after the main role/personality section.
2. Before coding rules, build steps, or repo conventions.
3. Under a short heading like `## Voice Interaction` or `## Spoken Responses`.
## Voice Interaction

Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and understated, usually one sentence and under 10 words, such as "I'm on it," "I checked that," or "Here's what I found." After that, put the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code, diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the place for anything substantial.