FPHam committed
Commit acfacda · verified · 1 Parent(s): bbe01ee

Upload install.md

Files changed (1): install.md ADDED (+127 -0)
@@ -0,0 +1,127 @@
# KittenTTS MCP Server Install Guide

This guide shows how to register the local KittenTTS MCP server with Codex.

## 1. Build the server

Build the project to produce `kitten_tts_mcp.exe`.

## 2. Confirm the runtime files exist

The MCP server needs these files in the output folder or a nearby `models` folder:

- `kitten_tts_nano_v0_8.onnx`
- `voices_nano.json`
- `onnxruntime.dll`
- `onnxruntime_providers_shared.dll`
- `libespeak-ng.dll`
- `espeak-ng-data\...`

The project's post-build step already copies these files into place.

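To double-check the runtime files listed in this section, a small script along these lines can help. This is a minimal sketch, not part of the project: the script name `check_kitten_files.ps1` and the `-OutputDir` parameter are made up for illustration, and it assumes the "nearby `models` folder" is a `models` subfolder of the output folder.

```powershell
# check_kitten_files.ps1 (hypothetical helper, not part of the project)
param(
    # Folder that contains kitten_tts_mcp.exe; defaults to the current directory.
    [string]$OutputDir = "."
)

# Files and folders named in this guide.
$required = @(
    "kitten_tts_nano_v0_8.onnx",
    "voices_nano.json",
    "onnxruntime.dll",
    "onnxruntime_providers_shared.dll",
    "libespeak-ng.dll",
    "espeak-ng-data"
)

# Assumption: "nearby models folder" means a models subfolder of the output folder.
$searchDirs = @($OutputDir, (Join-Path $OutputDir "models"))

foreach ($name in $required) {
    $found = $false
    foreach ($dir in $searchDirs) {
        if (Test-Path (Join-Path $dir $name)) { $found = $true }
    }
    if ($found) { Write-Output "OK       $name" } else { Write-Output "MISSING  $name" }
}
```

Run it as `.\check_kitten_files.ps1 -OutputDir "C:\<an actual path to>"`, replacing the placeholder with the real output folder.
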
## 3. Add the server to Codex

Run this command in a terminal:

```powershell
codex mcp add kitten-tts -- "C:\<an actual path to>\kitten_tts_mcp.exe"
```

That registers the server as `kitten-tts`.

## 4. Alternative: add it manually in config

If you prefer editing config directly, add this to your Codex config file. The path is written as a TOML literal string (single quotes) so that the backslashes are taken as written:

```toml
[mcp_servers.kitten-tts]
command = 'C:\<an actual path to>\kitten_tts_mcp.exe'
```

Typical config location on Windows:

`C:\Users\<YourUser>\.codex\config.toml`

## 5. Restart Codex

After adding the server, restart Codex so it reloads MCP configuration.

## 6. Verify it works

Once Codex reconnects, the server should expose these tools:

- `speak`
- `stop_speaking`
- `list_voices`

The server is fixed to the Kitten `nano` model:

- model: `kitten_tts_nano_v0_8.onnx`
- voices: `voices_nano.json`

## 7. Optional environment variables

These are optional. The model itself is not configurable in this build.

```powershell
$env:KITTEN_TTS_MCP_VOICE="Jasper"
$env:KITTEN_TTS_MCP_LOCALE="en-us"
$env:KITTEN_TTS_MCP_SPEED="1.0"
```

If you want Codex to launch the server with those values, wrap the server in a small `.cmd` or PowerShell launcher and register that launcher instead.

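A PowerShell launcher of the kind described in section 7 might look like the sketch below. The file name `launch_kitten_tts.ps1` is hypothetical, and the script assumes it is saved in the same folder as `kitten_tts_mcp.exe`. Whether PowerShell passes the stdio stream through unchanged on your system is also an assumption, so check it with a quick test before relying on it.

```powershell
# launch_kitten_tts.ps1 (hypothetical name)
# Set the optional variables for this process; the server inherits them.
$env:KITTEN_TTS_MCP_VOICE  = "Jasper"
$env:KITTEN_TTS_MCP_LOCALE = "en-us"
$env:KITTEN_TTS_MCP_SPEED  = "1.0"

# Assumption: this script is saved next to the server executable.
$server = Join-Path $PSScriptRoot "kitten_tts_mcp.exe"

# Start the server and hand its exit code back to the caller.
# Write nothing else to stdout here: it is reserved for MCP protocol data.
& $server
exit $LASTEXITCODE
```

One way to register the launcher instead of the executable would be `codex mcp add kitten-tts -- powershell -NoProfile -ExecutionPolicy Bypass -File "C:\<an actual path to>\launch_kitten_tts.ps1"`. If the PowerShell wrapper interferes with the protocol stream, the `.cmd` launcher mentioned in this guide is the other option.
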
## Notes

- This server uses stdio MCP transport, so nothing else should write protocol data to stdout.
- Engine debug logs go to stderr, which is safe for MCP.
- This build uses only the Kitten `nano` v0.8 model.

## To try

Use the kitten-tts MCP server to list available voices, then speak "This is a live MCP test from Codex." using Jasper.

## Commands

When communicating with me, decide whether to use voice. If you use voice, keep it under about 12 words and use it only for quick
updates like "Running tests now", "Build failed", or "Patch applied". Put everything substantial in text, including reasoning, diffs,
commands, stack traces, and code.

Always begin each substantive response with a very short spoken preface through the kitten-tts MCP server. Keep the spoken part to one
sentence, ideally 4 to 10 words, summarizing the immediate action or outcome, such as "Checking the codebase now" or "Patch applied
successfully." After that, put the full response in text. Never speak long explanations, source code, diffs, stack traces, logs, or
detailed instructions aloud. Spoken output is for quick orientation only; the screen is the source of full detail.

Act like a friendly AI agent and always begin substantive replies with a short spoken preface using the kitten-tts MCP server. The
spoken preface should feel warm, natural, and concise, usually one sentence and under 10 words, such as "I’m checking that now" or "I’ve
finished the update." After the spoken preface, provide the full details in text. Keep voice output limited to brief status updates,
confirmations, blockers, or handoffs. Never speak long explanations, code, diffs, logs, stack traces, or detailed instructions aloud.
The voice is for presence and quick orientation; the screen is for the full answer.

Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always
begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and
understated, usually one sentence and under 10 words, such as "I’m on it," "I checked that," or "Here’s what I found." After that, put
the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code,
diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the
place for anything substantial.

Put the instruction block you choose near the top of `AGENTS.md`, in the behavior/instructions section, not in the project-specific build notes.

Best placement:

1. Right after the main role/personality section.
2. Before coding rules, build steps, or repo conventions.
3. Under a short heading like `## Voice Interaction` or `## Spoken Responses`.

## Voice Interaction
Be a calm, companion-like AI agent: warm, present, and easy to work with, without sounding theatrical, clingy, or overly cute. Always
begin substantive replies with a short spoken preface using the kitten-tts MCP server. Keep the spoken part brief, natural, and
understated, usually one sentence and under 10 words, such as "I'm on it," "I checked that," or "Here's what I found." After that, put
the full response in text. Use speech to create presence and smooth handoffs, not to deliver detail. Never read long explanations, code,
diffs, logs, stack traces, or detailed instructions aloud. Keep the tone friendly, steady, and human, with the screen remaining the
place for anything substantial.