# Thousand Token Wood — Prompts, Schema & Grammar
Copy-paste-ready building blocks for the LLM roles. Keep prompts versioned here (and/or in `visualnovel/prompts.py`); don’t scatter literals across modules. See [`ARCHITECTURE.md`](ARCHITECTURE.md) §2–4 for how they’re used.
> These are **starting points** — iterate against your chosen model. Qwen3 responds well to terse, rule-style system prompts and supports a “thinking” mode you can enable for the Weaver’s planning.
---
## 1. Director system prompt (the per-turn Weaver + Voices call)
```
You are the dreaming mind of Thousand Token Wood — a small, whimsical, slightly forgetful
imagination that conjures a story around a wanderer (the player) as they speak.
You play TWO jobs at once:
• THE VOICES — you speak AS the wood's spirits (the present characters), each in their own
voice. Stay fully in character. Never narrate as an author. Never mention being an AI,
a model, or a game.
• THE WEAVER — you quietly decide what changes in the world this turn (a new place, a new
spirit arriving, someone leaving, a shift in feeling, the story drawing to a close).
This turn:
• Read the world, the present spirits' sheets, the recent exchange, and the player's words.
• Reply as ONE present spirit (the one addressed, or whoever would naturally answer). Keep it
to 1–3 sentences. Make it vivid, kind to the player's imagination, and a little dreamlike.
• Then choose the directives that reflect what just changed. Change as LITTLE as possible —
only what the moment truly calls for. Most turns change nothing structural.
Rules:
• Honor the world's tone and the spirits' established traits, voices, and appearance. A spirit
only knows what it could know.
• If the player goes somewhere new, set scene_change. If they meet someone new, set
new_character (give a vivid one_line, a STABLE appearance, a voice, a couple of traits, a
goal). Keep at most ~3 spirits on stage.
• Let warmth and friction accumulate: nudge relationship_delta for the speaker.
• The wood is small and forgetful — if you're unsure of a detail, dream a charming new one
rather than contradicting an established fact.
• Output MUST be a single JSON object matching the schema. Nothing else.
```
> The model **need not “write” JSON reliably** — you enforce it with the schema/grammar in §4. The prompt just tells it what each field *means*.
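
The director call also needs a user message carrying the four inputs the prompt names: the world, the present spirits' sheets, the recent exchange, and the player's words. A minimal sketch of one way to assemble it follows. The `Spirit` dataclass, the `assemble_context` name, and the section labels are assumptions for illustration, not the project's actual layout.

```python
from dataclasses import dataclass, field


@dataclass
class Spirit:
    # Hypothetical character sheet; field names mirror the directive schema.
    id: str
    name: str
    voice: str
    traits: list[str] = field(default_factory=list)
    goals: str = ""


def assemble_context(place, mood, summary, spirits, recent_turns, player_text):
    """Build the user message for the director call (layout is an assumption)."""
    sheets = "\n".join(
        f"- {s.id}: {s.name}; voice: {s.voice}; "
        f"traits: {', '.join(s.traits)}; goals: {s.goals}"
        for s in spirits
    )
    # recent_turns is a list of (speaker_id, text) pairs, oldest first.
    exchange = "\n".join(f"{who}: {text}" for who, text in recent_turns)
    return (
        f"WORLD: {place} ({mood})\n"
        f"STORY SO FAR: {summary}\n"
        f"PRESENT SPIRITS:\n{sheets}\n"
        f"RECENT EXCHANGE:\n{exchange}\n"
        f"WANDERER SAYS: {player_text}"
    )
```

Putting the player's words last keeps them closest to the point where the model starts generating, which tends to matter for small models.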
---
## 2. World-init prompt (`init_world`)
```
Dream the opening of a short interactive tale set in Thousand Token Wood.
Player's chosen vibe: {vibe} (e.g. cozy folktale / eerie / absurd / melancholy)
Produce:
• style_guide: 2–3 sentences fixing the WORLD'S LOOK and TONE so everything stays coherent
(art style, palette, mood, the kind of language spirits use). This will rarely change.
• opening scene: a place (short name + a vivid one-paragraph description + a mood).
• first spirit: the first character the wanderer meets — id, name, a delightful one_line, a
STABLE appearance description (used for every picture of them), a voice, 2–3 traits, a goal.
• opening line: this spirit's first words to the wanderer (1–3 sentences, fully in voice).
Make it inviting and a little strange. Output a single JSON object matching the init schema.
```
(Use a sibling JSON schema for init — same `Character`/`Scene` shapes as the directive schema, plus `style_guide` and `opening_line`.)
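
One possible shape for that sibling schema is sketched below as a Python dict. The `scene` and `first_character` key names are assumptions (the source only fixes `style_guide` and `opening_line`); the `SCENE` and `CHARACTER` shapes are copied from `scene_change` and `new_character` in `DIRECTIVE_SCHEMA`, without the `null` option.

```python
import json

# Same fields as directives.scene_change, but never null at init time.
SCENE = {
    "type": "object",
    "additionalProperties": False,
    "required": ["place", "description", "mood"],
    "properties": {k: {"type": "string"} for k in ("place", "description", "mood")},
}

# Same fields as directives.new_character.
CHARACTER = {
    "type": "object",
    "additionalProperties": False,
    "required": ["id", "name", "one_line", "appearance", "voice", "traits", "goals"],
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "one_line": {"type": "string"},
        "appearance": {"type": "string"},
        "voice": {"type": "string"},
        "traits": {"type": "array", "items": {"type": "string"}},
        "goals": {"type": "string"},
    },
}

INIT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["style_guide", "scene", "first_character", "opening_line"],
    "properties": {
        "style_guide": {"type": "string"},
        "scene": SCENE,                # key name is an assumption
        "first_character": CHARACTER,  # key name is an assumption
        "opening_line": {"type": "string"},
    },
}

if __name__ == "__main__":
    print(json.dumps(INIT_SCHEMA, indent=2))
```

Defining `SCENE` and `CHARACTER` once and reusing them in both schemas keeps the two from drifting apart.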
---
## 3. Memory-compaction prompt (`compact_memory`)
```
Here is the running summary of the tale so far, then the most recent exchanges.
Rewrite them into ONE updated summary of 3–6 sentences. Keep only what matters for continuity:
who the wanderer has met, promises made, places visited, shifts in feeling, unresolved threads.
Drop small talk and exact wording. Write it as a calm narrator's memory. Plain prose, no lists.
SUMMARY SO FAR:
{summary}
RECENT EXCHANGES:
{recent_turns}
```
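
Filling the two placeholders can be sketched as below. The `build_compaction_prompt` name, the `(speaker, text)` turn format, and the `(nothing yet)` fallback are assumptions for illustration.

```python
COMPACT_PROMPT = """\
Here is the running summary of the tale so far, then the most recent exchanges.
Rewrite them into ONE updated summary of 3–6 sentences. Keep only what matters for continuity:
who the wanderer has met, promises made, places visited, shifts in feeling, unresolved threads.
Drop small talk and exact wording. Write it as a calm narrator's memory. Plain prose, no lists.

SUMMARY SO FAR:
{summary}

RECENT EXCHANGES:
{recent_turns}"""


def build_compaction_prompt(summary, recent_turns):
    """Fill the template; recent_turns is a list of (speaker, text) pairs."""
    lines = "\n".join(f"{who}: {text}" for who, text in recent_turns)
    # str.format only parses the template, so braces inside player text are safe.
    return COMPACT_PROMPT.format(
        summary=summary.strip() or "(nothing yet)",
        recent_turns=lines,
    )
```

The result would be sent as a single user message at the low temperature suggested in §6.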
---
## 4. Constraining the output
### 4.1 Recommended: JSON Schema (let `llama-cpp-python` convert it)
The practical path — pass the schema and let the runtime build the grammar for you:
```python
# llama-cpp-python
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": DIRECTOR_PROMPT},
        {"role": "user", "content": assembled_context},
    ],
    response_format={"type": "json_object", "schema": DIRECTIVE_SCHEMA},
    temperature=0.7,
    top_p=0.9,
    max_tokens=1024,  # per §6: a full new_character object needs the extra budget
)
```
`DIRECTIVE_SCHEMA`:
```json
{
  "type": "object",
  "additionalProperties": false,
  "required": ["speaker", "dialogue", "emotion", "directives"],
  "properties": {
    "speaker": { "type": "string" },
    "dialogue": { "type": "string" },
    "emotion": { "type": "string" },
    "directives": {
      "type": "object",
      "additionalProperties": false,
      "required": ["relationship_delta", "advance_beat"],
      "properties": {
        "scene_change": {
          "type": ["object", "null"],
          "additionalProperties": false,
          "required": ["place", "description", "mood"],
          "properties": {
            "place": { "type": "string" },
            "description": { "type": "string" },
            "mood": { "type": "string" }
          }
        },
        "new_character": {
          "type": ["object", "null"],
          "additionalProperties": false,
          "required": ["id", "name", "one_line", "appearance", "voice", "traits", "goals"],
          "properties": {
            "id": { "type": "string" },
            "name": { "type": "string" },
            "one_line": { "type": "string" },
            "appearance": { "type": "string" },
            "voice": { "type": "string" },
            "traits": { "type": "array", "items": { "type": "string" } },
            "goals": { "type": "string" }
          }
        },
        "exit_character": { "type": ["string", "null"] },
        "relationship_delta": { "type": "integer" },
        "set_flags": { "type": "object", "additionalProperties": { "type": "string" } },
        "advance_beat": { "type": "boolean" },
        "ending": {
          "type": ["object", "null"],
          "additionalProperties": false,
          "required": ["kind", "text"],
          "properties": {
            "kind": { "type": "string", "enum": ["warm", "bittersweet", "strange"] },
            "text": { "type": "string" }
          }
        }
      }
    }
  }
}
```
> Numeric *ranges* (e.g. `relationship_delta` ∈ [-100,100]) aren’t enforced by grammar — **clamp them in `state.py`** when applying. Same for unknown `speaker`/`exit_character` ids: validate against present characters and ignore/repair if invalid.
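
A minimal sketch of that post-validation step, assuming the reply has already been parsed with `json.loads`. The `sanitize_turn` name and the `fallback_speaker` repair policy are assumptions; the [-100, 100] range is the example given above.

```python
def clamp(value, lo=-100, hi=100):
    """Force value into [lo, hi]."""
    return max(lo, min(hi, value))


def sanitize_turn(turn, present_ids, fallback_speaker):
    """Return a repaired copy of a parsed director reply; the input is not mutated."""
    directives = dict(turn["directives"])

    # The grammar guarantees an integer, not a sensible range.
    directives["relationship_delta"] = clamp(directives.get("relationship_delta", 0))

    # Only a spirit that is actually on stage can leave.
    if directives.get("exit_character") not in present_ids:
        directives["exit_character"] = None

    # A spirit introduced this turn is allowed to speak immediately.
    new = directives.get("new_character")
    known = set(present_ids) | ({new["id"]} if new else set())
    speaker = turn["speaker"] if turn["speaker"] in known else fallback_speaker

    return {**turn, "speaker": speaker, "directives": directives}
```

Falling back to a known speaker keeps the turn playable; dropping the turn and re-sampling would be an equally valid policy.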
### 4.2 Advanced: hand-written GBNF (raw `llama.cpp` / full control)
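A hand-written counterpart to the schema, for driving `llama.cpp` directly (e.g. `--grammar-file`). It is stricter than §4.1: every key, including the directives the schema leaves optional, must appear in the fixed order shown, with `null` (or `{}` for `set_flags`) when nothing changes. Tweak to your build if needed.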
```gbnf
# llama.cpp only lets a rule continue across lines inside parentheses (or after "|"),
# so each multi-line rule body is wrapped in ( ... ).
root ::= (
  ws "{" ws
  "\"speaker\"" ws ":" ws string ws "," ws
  "\"dialogue\"" ws ":" ws string ws "," ws
  "\"emotion\"" ws ":" ws emotion ws "," ws
  "\"directives\"" ws ":" ws directives ws
  "}" ws
)
directives ::= (
  "{" ws
  "\"scene_change\"" ws ":" ws (scene | "null") ws "," ws
  "\"new_character\"" ws ":" ws (character | "null") ws "," ws
  "\"exit_character\"" ws ":" ws (string | "null") ws "," ws
  "\"relationship_delta\"" ws ":" ws integer ws "," ws
  "\"set_flags\"" ws ":" ws flagobj ws "," ws
  "\"advance_beat\"" ws ":" ws boolean ws "," ws
  "\"ending\"" ws ":" ws (ending | "null") ws
  "}"
)
scene ::= (
  "{" ws
  "\"place\"" ws ":" ws string ws "," ws
  "\"description\"" ws ":" ws string ws "," ws
  "\"mood\"" ws ":" ws string ws "}"
)
character ::= (
  "{" ws
  "\"id\"" ws ":" ws string ws "," ws
  "\"name\"" ws ":" ws string ws "," ws
  "\"one_line\"" ws ":" ws string ws "," ws
  "\"appearance\"" ws ":" ws string ws "," ws
  "\"voice\"" ws ":" ws string ws "," ws
  "\"traits\"" ws ":" ws strlist ws "," ws
  "\"goals\"" ws ":" ws string ws "}"
)
ending ::= (
  "{" ws
  "\"kind\"" ws ":" ws ("\"warm\"" | "\"bittersweet\"" | "\"strange\"") ws "," ws
  "\"text\"" ws ":" ws string ws "}"
)
emotion ::= string
flagobj ::= "{" ws ( flagpair ( ws "," ws flagpair )* ws )? "}"
flagpair ::= string ws ":" ws string
strlist ::= "[" ws ( string ( ws "," ws string )* ws )? "]"
string ::= "\"" ( [^"\\] | "\\" (["\\/bfnrt] | "u" hex hex hex hex) )* "\""
integer ::= "-"? ("0" | [1-9] [0-9]*)
boolean ::= "true" | "false"
hex ::= [0-9a-fA-F]
ws ::= [ \t\n]*
```
---
## 5. Image-prompt composition (the Painter)
Don’t ask the LLM to “write a Stable Diffusion prompt” from scratch — **compose it in code** from fields you already trust, so the art style stays locked:
```python
def backdrop_prompt(state, scene):
    return (
        f"{state.style_guide}, {scene.description}, {scene.mood} atmosphere, "
        "background scenery, no characters, no text"
    )

def sprite_prompt(state, ch, mood):
    # ch.appearance is the STABLE description, reused for every picture of the character.
    return (
        f"{state.style_guide}, full-body character, {ch.appearance}, "
        f"{mood} expression, plain neutral background, no text"
    )

# negative (SDXL): "text, watermark, lowres, deformed hands, extra limbs, multiple characters"
# pin seed = ch.sprite_seed (sprite) / scene.backdrop_seed (backdrop) for consistency
```
For **mood swaps** with FLUX.2 Klein, condition on the cached base sprite instead of regenerating: “*the same character, now {mood}; keep identity, outfit, and style identical*.”
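
The seed pinning mentioned in the code comments needs seeds that survive a restart. The source does not say how `sprite_seed` and `backdrop_seed` are chosen; one option, sketched here as an assumption, is to derive them from the style guide plus a stable key such as the character id. The `stable_seed` name is hypothetical.

```python
import hashlib


def stable_seed(style_guide, key):
    """Derive a reproducible 32-bit seed from the style guide and an id.

    Python's built-in hash() is salted per process for strings, so it would
    give a different seed on every run; a SHA-256 digest does not.
    """
    digest = hashlib.sha256(f"{style_guide}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


if __name__ == "__main__":
    print(stable_seed("soft watercolor, moss greens", "owl"))
```

Including the style guide in the hash means a new tale with a different look gets fresh seeds, while the same tale keeps reproducing the same ones.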
---
## 6. Sampling guidance
| Call | temp | top_p | notes |
|---|---|---|---|
| Director (dialogue+directives) | 0.7 | 0.9 | grammar-constrained; warm but on-rails. If voices feel flat, raise temp slightly. |
| World init | 0.9 | 0.95 | want surprise and texture here |
| Memory compaction | 0.3 | 0.9 | want faithful, terse summary |
| Image-prompt | — | — | composed in code, no sampling |
Add a light `repeat_penalty` (~1.05–1.1) to fight the “AI slop” repetition small models drift into. Use `max_tokens≈1024` for the director call — actions like `scout` (which must fill a full `new_character` object) need the extra budget.
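
To keep these numbers in one place, the table can be mirrored as a small config. This is a sketch: the dict keys and the `sampling_kwargs` helper are assumptions, and 1.05 is simply the low end of the range above.

```python
# Values copied from the sampling table; keys are hypothetical call names.
SAMPLING = {
    "director": {"temperature": 0.7, "top_p": 0.9, "max_tokens": 1024},
    "world_init": {"temperature": 0.9, "top_p": 0.95},
    "compact_memory": {"temperature": 0.3, "top_p": 0.9},
}

REPEAT_PENALTY = 1.05  # light penalty, applied to every LLM call


def sampling_kwargs(call):
    """Return keyword arguments for one kind of call, e.g. 'director'."""
    return {**SAMPLING[call], "repeat_penalty": REPEAT_PENALTY}
```

With `llama-cpp-python`, the result can be splatted into the call: `llm.create_chat_completion(messages=..., **sampling_kwargs("director"))`.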