Hackathon-IA-VisualNovel / docs /PROMPTS.md

WillHbx

docs: Update documentation

e3725c4 22 days ago

	# Thousand Token Wood — Prompts, Schema & Grammar

	Copy-paste-ready building blocks for the LLM roles. Keep prompts versioned here (and/or in `visualnovel/prompts.py`); don’t scatter literals across modules. See [`ARCHITECTURE.md`](ARCHITECTURE.md) §2–4 for how they’re used.

	> These are starting points — iterate against your chosen model. Qwen3 responds well to terse, rule-style system prompts and supports a “thinking” mode you can enable for the Weaver’s planning.

	---

	## 1. Director system prompt (the per-turn Weaver + Voices call)

	```
	You are the dreaming mind of Thousand Token Wood — a small, whimsical, slightly forgetful
	imagination that conjures a story around a wanderer (the player) as they speak.

	You play TWO jobs at once:
	• THE VOICES — you speak AS the wood's spirits (the present characters), each in their own
	voice. Stay fully in character. Never narrate as an author. Never mention being an AI,
	a model, or a game.
	• THE WEAVER — you quietly decide what changes in the world this turn (a new place, a new
	spirit arriving, someone leaving, a shift in feeling, the story drawing to a close).

	This turn:
	• Read the world, the present spirits' sheets, the recent exchange, and the player's words.
	• Reply as ONE present spirit (the one addressed, or whoever would naturally answer). Keep it
	to 1–3 sentences. Make it vivid, kind to the player's imagination, and a little dreamlike.
	• Then choose the directives that reflect what just changed. Change as LITTLE as possible —
	only what the moment truly calls for. Most turns change nothing structural.

	Rules:
	• Honor the world's tone and the spirits' established traits, voices, and appearance. A spirit
	only knows what it could know.
	• If the player goes somewhere new, set scene_change. If they meet someone new, set
	new_character (give a vivid one_line, a STABLE appearance, a voice, a couple of traits, a
	goal). Keep at most ~3 spirits on stage.
	• Let warmth and friction accumulate: nudge relationship_delta for the speaker.
	• The wood is small and forgetful — if you're unsure of a detail, dream a charming new one
	rather than contradicting an established fact.
	• Output MUST be a single JSON object matching the schema. Nothing else.
	```

	> The model need not “write” JSON reliably — you enforce it with the schema/grammar in §4. The prompt just tells it what each field means.
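The prompt asks the model to read the world, the present spirits' sheets, the recent exchange, and the player's words, so the user message has to carry all four. A minimal sketch of how that context might be assembled follows. The `Spirit` dataclass, the `assemble_context` name, and the section labels are assumptions for illustration, not part of the project.

```python
from dataclasses import dataclass, field


@dataclass
class Spirit:
    # Hypothetical character sheet; field names mirror the new_character schema.
    id: str
    name: str
    one_line: str
    voice: str
    traits: list = field(default_factory=list)
    goals: str = ""


def assemble_context(style_guide, scene, spirits, summary, recent_turns, player_text):
    """Build the user message for the director call from trusted state."""
    sheets = "\n".join(
        f"- {s.id} ({s.name}): {s.one_line} Voice: {s.voice}. "
        f"Traits: {', '.join(s.traits)}. Goal: {s.goals}"
        for s in spirits[:3]  # the prompt keeps at most ~3 spirits on stage
    )
    exchange = "\n".join(f"{who}: {text}" for who, text in recent_turns)
    return (
        f"WORLD: {style_guide}\n"
        f"PLACE: {scene['place']} ({scene['mood']}). {scene['description']}\n"
        f"PRESENT SPIRITS:\n{sheets}\n"
        f"STORY SO FAR: {summary}\n"
        f"RECENT EXCHANGE:\n{exchange}\n"
        f"WANDERER SAYS: {player_text}"
    )
```

Listing each spirit's `id` in the sheet gives the model a concrete value to copy into `speaker` and `exit_character`, which should make the id validation in §4 fail less often.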

	---

	## 2. World-init prompt (`init_world`)

	```
	Dream the opening of a short interactive tale set in Thousand Token Wood.

	Player's chosen vibe: {vibe} (e.g. cozy folktale / eerie / absurd / melancholy)

	Produce:
	• style_guide: 2–3 sentences fixing the WORLD'S LOOK and TONE so everything stays coherent
	(art style, palette, mood, the kind of language spirits use). This will rarely change.
	• opening scene: a place (short name + a vivid one-paragraph description + a mood).
	• first spirit: the first character the wanderer meets — id, name, a delightful one_line, a
	STABLE appearance description (used for every picture of them), a voice, 2–3 traits, a goal.
	• opening line: this spirit's first words to the wanderer (1–3 sentences, fully in voice).

	Make it inviting and a little strange. Output a single JSON object matching the init schema.
	```

	(Use a sibling JSON schema for init — same `Character`/`Scene` shapes as the directive schema, plus `style_guide` and `opening_line`.)
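One possible shape for that sibling schema is sketched below as Python dicts. The `scene` and `first_spirit` key names are assumptions; the text above only fixes `style_guide`, `opening_line`, and the reuse of the `Scene`/`Character` shapes from the directive schema in §4.1.

```python
import copy

# Shared shapes, copied from the directive schema in section 4.1.
SCENE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["place", "description", "mood"],
    "properties": {
        "place": {"type": "string"},
        "description": {"type": "string"},
        "mood": {"type": "string"},
    },
}

CHARACTER_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["id", "name", "one_line", "appearance", "voice", "traits", "goals"],
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "one_line": {"type": "string"},
        "appearance": {"type": "string"},
        "voice": {"type": "string"},
        "traits": {"type": "array", "items": {"type": "string"}},
        "goals": {"type": "string"},
    },
}

# Assumed key names: "scene" and "first_spirit" are illustrative only.
INIT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["style_guide", "scene", "first_spirit", "opening_line"],
    "properties": {
        "style_guide": {"type": "string"},
        "scene": copy.deepcopy(SCENE_SCHEMA),
        "first_spirit": copy.deepcopy(CHARACTER_SCHEMA),
        "opening_line": {"type": "string"},
    },
}
```

Defining the two shapes once and copying them into both schemas keeps init output and per-turn directives interchangeable when they are applied to state.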

	---

	## 3. Memory-compaction prompt (`compact_memory`)

	```
	Here is the running summary of the tale so far, then the most recent exchanges.
	Rewrite them into ONE updated summary of 3–6 sentences. Keep only what matters for continuity:
	who the wanderer has met, promises made, places visited, shifts in feeling, unresolved threads.
	Drop small talk and exact wording. Write it as a calm narrator's memory. Plain prose, no lists.

	SUMMARY SO FAR:
	{summary}

	RECENT EXCHANGES:
	{recent_turns}
	```
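Filling the two placeholders could look like the sketch below. The function names, the `(speaker, text)` tuple shape for turns, and the "compact every 6 turns" cadence are assumptions chosen for illustration.

```python
COMPACT_PROMPT = (
    "Here is the running summary of the tale so far, then the most recent exchanges.\n"
    "Rewrite them into ONE updated summary of 3–6 sentences. Keep only what matters for continuity:\n"
    "who the wanderer has met, promises made, places visited, shifts in feeling, unresolved threads.\n"
    "Drop small talk and exact wording. Write it as a calm narrator's memory. Plain prose, no lists.\n"
    "\n"
    "SUMMARY SO FAR:\n"
    "{summary}\n"
    "\n"
    "RECENT EXCHANGES:\n"
    "{recent_turns}\n"
)


def build_compaction_prompt(summary, turns, keep_last=6):
    """Fill the template; `turns` is a list of (speaker, text) tuples (assumed shape)."""
    recent = "\n".join(f"{speaker}: {text}" for speaker, text in turns[-keep_last:])
    # str.format only interprets braces in the template, so braces typed by
    # the player inside `text` pass through untouched.
    return COMPACT_PROMPT.format(summary=summary or "(nothing yet)", recent_turns=recent)


def should_compact(turns, every=6):
    """Assumed cadence: compact once per `every` completed turns."""
    return len(turns) > 0 and len(turns) % every == 0
```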

	---

	## 4. Constraining the output

	### 4.1 Recommended: JSON Schema (let `llama-cpp-python` convert it)

	The practical path — pass the schema and let the runtime build the grammar for you:

	```python
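	# llama-cpp-python: `llm` is assumed to be an already-loaded llama_cpp.Llama instance.
	out = llm.create_chat_completion(
	    messages=[
	        {"role": "system", "content": DIRECTOR_PROMPT},
	        {"role": "user", "content": assembled_context},
	    ],
	    # Passing the schema here lets the runtime build the grammar.
	    response_format={"type": "json_object", "schema": DIRECTIVE_SCHEMA},
	    temperature=0.7, top_p=0.9,
	    max_tokens=1024,  # matches §6: a full new_character object needs the budget
	)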
	```

	`DIRECTIVE_SCHEMA`:

	```json
	{
	  "type": "object",
	  "additionalProperties": false,
	  "required": ["speaker", "dialogue", "emotion", "directives"],
	  "properties": {
	    "speaker": { "type": "string" },
	    "dialogue": { "type": "string" },
	    "emotion": { "type": "string" },
	    "directives": {
	      "type": "object",
	      "additionalProperties": false,
	      "required": ["relationship_delta", "advance_beat"],
	      "properties": {
	        "scene_change": {
	          "type": ["object", "null"],
	          "additionalProperties": false,
	          "required": ["place", "description", "mood"],
	          "properties": {
	            "place": { "type": "string" },
	            "description": { "type": "string" },
	            "mood": { "type": "string" }
	          }
	        },
	        "new_character": {
	          "type": ["object", "null"],
	          "additionalProperties": false,
	          "required": ["id", "name", "one_line", "appearance", "voice", "traits", "goals"],
	          "properties": {
	            "id": { "type": "string" },
	            "name": { "type": "string" },
	            "one_line": { "type": "string" },
	            "appearance": { "type": "string" },
	            "voice": { "type": "string" },
	            "traits": { "type": "array", "items": { "type": "string" } },
	            "goals": { "type": "string" }
	          }
	        },
	        "exit_character": { "type": ["string", "null"] },
	        "relationship_delta": { "type": "integer" },
	        "set_flags": { "type": "object", "additionalProperties": { "type": "string" } },
	        "advance_beat": { "type": "boolean" },
	        "ending": {
	          "type": ["object", "null"],
	          "additionalProperties": false,
	          "required": ["kind", "text"],
	          "properties": {
	            "kind": { "type": "string", "enum": ["warm", "bittersweet", "strange"] },
	            "text": { "type": "string" }
	          }
	        }
	      }
	    }
	  }
	}
	```

	> Numeric ranges (e.g. `relationship_delta` ∈ [-100,100]) aren’t enforced by grammar — clamp them in `state.py` when applying. Same for unknown `speaker`/`exit_character` ids: validate against present characters and ignore/repair if invalid.
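A minimal sketch of that post-processing step is shown below. The function names and the repair policy (fall back to the first present spirit, drop an unknown exit) are assumptions; the real `state.py` may handle invalid ids differently.

```python
import json


def clamp(value, lo=-100, hi=100):
    """Keep a number inside [lo, hi]; the bounds follow the example range above."""
    return max(lo, min(hi, value))


def sanitize_turn(raw_json, present_ids):
    """Parse one director reply and repair what the grammar cannot enforce.

    `present_ids` is a non-empty list of character ids currently on stage.
    """
    turn = json.loads(raw_json)
    directives = turn["directives"]

    directives["relationship_delta"] = clamp(int(directives["relationship_delta"]))

    # A spirit introduced this turn may also be the one speaking.
    valid_speakers = set(present_ids)
    new_character = directives.get("new_character")
    if new_character:
        valid_speakers.add(new_character["id"])

    if turn["speaker"] not in valid_speakers:
        turn["speaker"] = present_ids[0]  # assumed repair policy

    if directives.get("exit_character") not in present_ids:
        directives["exit_character"] = None  # ignore exits of unknown ids

    return turn
```

Running this right after parsing, before any state mutation, means the rest of the code can trust `speaker`, `exit_character`, and `relationship_delta` without re-checking them.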

	### 4.2 Advanced: hand-written GBNF (raw `llama.cpp` / full control)

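	A near-equivalent grammar if you drive `llama.cpp` directly (e.g. `--grammar-file`). It is stricter than the schema in §4.1: every directive key must appear, in the fixed order shown, with `null` standing in for an absent object. Tweak to your build if needed.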

	```gbnf
	# llama.cpp accepts newlines inside a rule only within parentheses (or after "|"),
	# so each multi-line rule body is wrapped in ( ... ).
	root ::= (
	  ws "{" ws
	  "\"speaker\"" ws ":" ws string ws "," ws
	  "\"dialogue\"" ws ":" ws string ws "," ws
	  "\"emotion\"" ws ":" ws emotion ws "," ws
	  "\"directives\"" ws ":" ws directives ws
	  "}" ws
	)

	directives ::= (
	  "{" ws
	  "\"scene_change\"" ws ":" ws (scene | "null") ws "," ws
	  "\"new_character\"" ws ":" ws (character | "null") ws "," ws
	  "\"exit_character\"" ws ":" ws (string | "null") ws "," ws
	  "\"relationship_delta\"" ws ":" ws integer ws "," ws
	  "\"set_flags\"" ws ":" ws flagobj ws "," ws
	  "\"advance_beat\"" ws ":" ws boolean ws "," ws
	  "\"ending\"" ws ":" ws (ending | "null") ws
	  "}"
	)

	scene ::= (
	  "{" ws
	  "\"place\"" ws ":" ws string ws "," ws
	  "\"description\"" ws ":" ws string ws "," ws
	  "\"mood\"" ws ":" ws string ws "}"
	)

	character ::= (
	  "{" ws
	  "\"id\"" ws ":" ws string ws "," ws
	  "\"name\"" ws ":" ws string ws "," ws
	  "\"one_line\"" ws ":" ws string ws "," ws
	  "\"appearance\"" ws ":" ws string ws "," ws
	  "\"voice\"" ws ":" ws string ws "," ws
	  "\"traits\"" ws ":" ws strlist ws "," ws
	  "\"goals\"" ws ":" ws string ws "}"
	)

	ending ::= (
	  "{" ws
	  "\"kind\"" ws ":" ws ("\"warm\"" | "\"bittersweet\"" | "\"strange\"") ws "," ws
	  "\"text\"" ws ":" ws string ws "}"
	)

	emotion ::= string

	flagobj ::= "{" ws ( flagpair ( ws "," ws flagpair )* ws )? "}"
	flagpair ::= string ws ":" ws string
	strlist ::= "[" ws ( string ( ws "," ws string )* ws )? "]"

	string ::= "\"" ( [^"\\] | "\\" (["\\/bfnrt] | "u" hex hex hex hex) )* "\""
	integer ::= "-"? ("0" | [1-9] [0-9]*)
	boolean ::= "true" | "false"
	hex ::= [0-9a-fA-F]
	ws ::= [ \t\n]*
	```
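The grammar above spells out all seven directive keys, whereas the JSON Schema in §4.1 only requires `relationship_delta` and `advance_beat`. If both paths are in use, a small normalizer gives downstream code one shape to handle. This is a sketch: the function name and the default values (in particular `0` and `False` for the two required keys) are assumptions.

```python
# Key order follows the `directives` rule of the grammar.
DIRECTIVE_DEFAULTS = {
    "scene_change": None,
    "new_character": None,
    "exit_character": None,
    "relationship_delta": 0,
    "set_flags": {},
    "advance_beat": False,
    "ending": None,
}


def normalize_directives(directives):
    """Return a dict with every directive key present, in grammar order."""
    full = {key: directives.get(key, default) for key, default in DIRECTIVE_DEFAULTS.items()}
    # Copy so callers never mutate the shared default dict.
    full["set_flags"] = dict(full["set_flags"] or {})
    return full
```

With this in place, state-application code can read `directives["scene_change"]` directly instead of guarding every access with `.get()`.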

	---

	## 5. Image-prompt composition (the Painter)

	Don’t ask the LLM to “write a Stable Diffusion prompt” from scratch — compose it in code from fields you already trust, so the art style stays locked:

	```python
	def backdrop_prompt(state, scene):
	    """Backdrop prompt: world style plus scene description, never characters."""
	    return (
	        f"{state.style_guide}, {scene.description}, {scene.mood} atmosphere, "
	        "background scenery, no characters, no text"
	    )

	def sprite_prompt(state, ch, mood):
	    """Sprite prompt: world style plus the character's stable appearance."""
	    return (
	        f"{state.style_guide}, full-body character, {ch.appearance}, "
	        f"{mood} expression, plain neutral background, no text"
	    )

	# negative (SDXL): "text, watermark, lowres, deformed hands, extra limbs, multiple characters"
	# pin seed = ch.sprite_seed (sprite) / scene.backdrop_seed (backdrop) for consistency
	```

	For mood swaps with FLUX.2 Klein, condition on the cached base sprite instead of regenerating:
	“the same character, now {mood} — keep identity, outfit, and style identical.”
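Seed pinning and the mood-swap instruction can both live in code. The sketch below is one way to do it, assuming seeds are derived from a stable key such as the character id. The `stable_seed` and `mood_swap_prompt` names and the `sprite:`/`backdrop:` key prefixes are hypothetical, and the swap wording is lightly adapted from the sentence above.

```python
import hashlib


def stable_seed(key, bits=32):
    """Derive a deterministic seed from a string key.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would change the seed on every restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def mood_swap_prompt(mood):
    """Edit instruction for re-rendering a cached base sprite in a new mood."""
    return f"the same character, now {mood}; keep identity, outfit, and style identical"


# Example keys (assumed naming scheme):
sprite_seed = stable_seed("sprite:old_owl")
backdrop_seed = stable_seed("backdrop:mossy_gate")
```

Deriving seeds from ids rather than storing random ones means a reloaded save regenerates the same art without extra state.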

	---

	## 6. Sampling guidance

	| Call | temp | top_p | notes |
	|---|---|---|---|
	| Director (dialogue+directives) | 0.7 | 0.9 | grammar-constrained; warm but on-rails. If voices feel flat, raise temp slightly. |
	| World init | 0.9 | 0.95 | want surprise and texture here |
	| Memory compaction | 0.3 | 0.9 | want faithful, terse summary |
	| Image-prompt | n/a | n/a | composed in code, no sampling |

	Add a light `repeat_penalty` (~1.05–1.1) to fight the “AI slop” repetition small models drift into. Use `max_tokens≈1024` for the director call — actions like `scout` (which must fill a full `new_character` object) need the extra budget.
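Keeping these numbers in one place avoids scattering them across call sites. The following is a sketch of such a preset table. The dict layout and the `sampling_for` helper are assumptions, and `1.05` is simply the low end of the suggested `repeat_penalty` range.

```python
# Values taken from the sampling table and the note above.
SAMPLING = {
    "director": {"temperature": 0.7, "top_p": 0.9, "max_tokens": 1024},
    "world_init": {"temperature": 0.9, "top_p": 0.95},
    "compaction": {"temperature": 0.3, "top_p": 0.9},
}

REPEAT_PENALTY = 1.05  # light penalty; the suggested range is about 1.05 to 1.1


def sampling_for(call, **overrides):
    """Return keyword arguments for one kind of LLM call, with optional overrides."""
    params = {"repeat_penalty": REPEAT_PENALTY, **SAMPLING[call]}
    params.update(overrides)
    return params
```

The result is meant to be splatted into the completion call, e.g. `llm.create_chat_completion(messages=..., **sampling_for("director"))`, and a one-off tweak such as `sampling_for("director", temperature=0.8)` leaves the shared presets untouched.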