---
title: StoryWeaver
emoji: 📖
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 4.43.0
app_file: app.py
python_version: '3.10'
pinned: false
license: mit
short_description: Interactive NLP story engine with evaluation and logging
---
StoryWeaver
StoryWeaver is an interactive text-adventure system built for our NLP course project. The repo is structured as an engineering project first and a demo second: it contains the playable app, the state-management core, evaluation scripts, and logging utilities needed for report writing and team collaboration.
This README is written for teammates who need to:
- understand how the system is organized
- run the app locally
- know where to change prompts, rules, or UI
- collect evaluation results for the report
- debug a bad interaction without reading the whole codebase first
What This Repository Contains
At a high level, the project has five responsibilities:
- parse player input into structured intent
- keep the world state consistent across turns
- generate the next story response and options
- expose the system through a Gradio UI
- export logs and run reproducible evaluation
This means the repo is not only a "game demo". It is also the evidence pipeline for the course deliverables.
Quick Start
1. Install dependencies
pip install -r requirements.txt
2. Create .env
Create a .env file in the project root:
QWEN_API_KEY=your_api_key_here
Optional:
STORYWEAVER_LOG_DIR=logs/interactions
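The exact loading code lives in utils.py and app.py; as a sketch, the two variables above might be read like this (the load_config helper and its warning message are hypothetical, not the repo's actual code):

```python
import os

def load_config():
    """Read StoryWeaver settings from the environment (hypothetical helper)."""
    api_key = os.environ.get("QWEN_API_KEY")  # required for real model calls
    log_dir = os.environ.get("STORYWEAVER_LOG_DIR", "logs/interactions")  # optional override
    if not api_key:
        print("warning: QWEN_API_KEY is not set; fallback logic will be used")
    return {"api_key": api_key, "log_dir": log_dir}
```
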
3. Run the app
python app.py
Default local URL:
http://localhost:7860
4. Run evaluation
python evaluation/run_evaluations.py --task all --repeats 3
Useful variants:
python evaluation/run_evaluations.py --task intent
python evaluation/run_evaluations.py --task consistency
python evaluation/run_evaluations.py --task latency --repeats 5
python evaluation/run_evaluations.py --task branch
Recommended Reading Order
If you are new to the repo, read files in this order:
- state_manager.py Why: this is the single source of truth for player state, world state, quests, items, consistency checks, and state updates.
- nlu_engine.py Why: this shows how raw player text becomes structured intent.
- story_engine.py Why: this is the main generation pipeline and fallback logic.
- app.py Why: this connects the UI with the engines and now also writes interaction logs.
- evaluation/run_evaluations.py Why: this shows how we measure the system for the report.
If you only have 10 minutes, start with:
- GameState.pre_validate_action
- GameState.check_consistency
- GameState.apply_changes
- NLUEngine.parse_intent
- StoryEngine.generate_story_stream
- process_user_input in app.py
Repository Map
StoryWeaver/
|-- app.py
|-- nlu_engine.py
|-- story_engine.py
|-- state_manager.py
|-- telemetry.py
|-- utils.py
|-- requirements.txt
|-- evaluation/
| |-- run_evaluations.py
| |-- datasets/
| `-- results/
`-- logs/
`-- interactions/
Core responsibilities by file:
- app.py Gradio app, session lifecycle, UI callbacks, per-turn logging.
- state_manager.py Player/world models, item registry, NPC registry, quest registry, state validation, consistency checks, change application.
- nlu_engine.py Intent parsing. Uses LLM parsing when available and keyword fallback when not.
- story_engine.py Opening generation, main story generation, option generation, stream handling, fallback handling, telemetry tags.
- telemetry.py Session metadata and JSONL interaction log export.
- utils.py API client setup, Qwen calls, JSON extraction, retry helpers.
- evaluation/run_evaluations.py Reproducible experiment runner for the report.
System Architecture
The main runtime path is:
Player Input -> NLU -> Validation -> Story Generation -> State Update -> UI Output -> Interaction Log
There are two ideas that matter most in this codebase:
1. GameState is the source of truth
Almost everything meaningful lives in state_manager.py:
- player stats
- location
- time and weather
- inventory and equipment
- quests
- NPC states
- event history
When changing gameplay, try to keep state logic here instead of scattering it across prompts and UI code.
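The serialization side of this idea is GameState.to_prompt, which turns the fields above into prompt-ready text. A minimal sketch of that pattern, assuming a plain dict state (the state_to_prompt function and the exact field names are hypothetical; the real method lives on GameState):

```python
def state_to_prompt(state: dict) -> str:
    # Hypothetical sketch in the spirit of GameState.to_prompt: flatten the
    # state fields into short, prompt-ready lines for the story engine.
    lines = [
        f"Location: {state['location']}",
        f"Time/Weather: {state['time']}, {state['weather']}",
        f"Inventory: {', '.join(state['inventory']) or 'empty'}",
        f"Active quests: {', '.join(state['quests']) or 'none'}",
    ]
    return "\n".join(lines)
```

Keeping serialization in one place means a prompt change never has to touch the UI or NLU layers.
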
2. The app is a coordinator, not the game logic
app.py should mostly:
- receive user input
- call NLU
- call the story engine
- update the chat UI
- write telemetry logs
If a new feature changes game rules, it probably belongs in state_manager.py or story_engine.py, not in the UI layer.
Runtime Flow
Text input flow
For normal text input, the path is:
- process_user_input receives raw text from the UI
- NLUEngine.parse_intent converts it into a structured intent dict
- GameState.pre_validate_action blocks clearly invalid actions early
- StoryEngine.generate_story_stream runs the main narrative pipeline
- GameState.check_consistency and apply_changes update state
- UI is refreshed with story text, options, and status panel
- _record_interaction_log writes a JSONL record to disk
Option click flow
Button clicks do not go through full free-text parsing. Instead:
- the selected option is converted to an intent-like dict
- the story engine processes it the same way as text input
- the result is rendered and logged
This is useful because option interactions and free-text interactions now share the same evaluation and observability format.
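The conversion step can be sketched like this (option_to_intent is a hypothetical name, and the exact dict keys are assumptions based on the intent shape described in this README):

```python
def option_to_intent(option_text: str) -> dict:
    # Hypothetical sketch: option clicks bypass free-text parsing and become
    # an intent-like dict that the story engine consumes the same way as
    # parsed text input.
    return {
        "intent": "CUSTOM",
        "target": None,
        "raw_text": option_text,
        "parser_source": "option_click",  # distinguishes clicks from typed input in logs
    }
```
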
Main Modules in More Detail
state_manager.py
This file defines:
- PlayerState
- WorldState
- GameEvent
- GameState
Important methods:
- pre_validate_action: rejects obviously invalid actions before calling the model.
- check_consistency: detects contradictions in proposed state changes.
- apply_changes: applies state changes and returns a readable change log.
- validate: makes sure the resulting state is legal.
- to_prompt: serializes the current game state into prompt-ready text.
When to edit this file:
- adding new items, NPCs, quests, or locations
- adding deterministic rules
- improving consistency checks
- changing state serialization for prompts
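To illustrate the kind of deterministic rule pre_validate_action enforces, here is a minimal sketch assuming a plain dict state (the real method lives on GameState and its signature may differ; the specific checks here are illustrative):

```python
def pre_validate_action(state: dict, intent: dict) -> tuple[bool, str]:
    # Hypothetical guard in the spirit of GameState.pre_validate_action:
    # reject clearly impossible actions before spending a model call.
    target = intent.get("target")
    inventory = state.get("inventory", [])
    if intent["intent"] == "USE_ITEM" and target not in inventory:
        return False, f"Player does not carry {target!r}"
    if intent["intent"] == "EQUIP" and target not in inventory:
        return False, "Cannot equip an item that is not in the inventory"
    return True, "ok"
```

Guards like this keep obviously invalid turns cheap and make the consistency checks downstream easier to reason about.
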
nlu_engine.py
This file is responsible for intent recognition.
Current behavior:
- try LLM parsing first
- fall back to keyword rules if parsing fails
- return a normalized intent dict that includes a parser_source field
Current intent labels include:
ATTACK, TALK, MOVE, EXPLORE, USE_ITEM, TRADE, EQUIP, REST, QUEST, SKILL, PICKUP, FLEE, CUSTOM
When to edit this file:
- adding a new intent type
- improving keyword fallback
- adding target extraction logic
- improving low-confidence handling
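The keyword fallback might look roughly like this (the KEYWORD_RULES table and the naive substring matching are assumptions for illustration; the real rules live in nlu_engine.py):

```python
# Hypothetical keyword fallback in the spirit of nlu_engine.py: first rule
# whose keyword appears in the text wins; otherwise fall back to CUSTOM.
KEYWORD_RULES = {
    "attack": "ATTACK",
    "fight": "ATTACK",
    "talk": "TALK",
    "rest": "REST",
    "flee": "FLEE",
}

def keyword_fallback(text: str) -> dict:
    lowered = text.lower()
    for keyword, label in KEYWORD_RULES.items():
        if keyword in lowered:
            return {"intent": label, "parser_source": "keyword"}
    return {"intent": "CUSTOM", "parser_source": "keyword"}
```

Naive substring matching is easy to fool, which is one reason the README suggests improving the fallback as a development task.
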
story_engine.py
This is the main generation module.
It currently handles:
- opening generation
- story generation for each turn
- streaming and non-streaming paths
- default/fallback outputs
- consistency-aware regeneration
- response telemetry such as fallback reason and engine mode
Important methods:
- generate_opening_stream
- generate_story
- generate_story_stream
- process_option_selection_stream
- _fallback_response
When to edit this file:
- changing prompts
- changing multi-stage generation logic
- changing fallback behavior
- adding generation-side telemetry
app.py
This file is the UI entry point and interaction orchestrator.
Important responsibilities:
- create a new game session
- start and restart the app session
- process text input
- process option clicks
- update Gradio components
- write structured interaction logs
When to edit this file:
- changing UI flow
- adding debug panels
- changing how logs are written
- changing how outputs are displayed
telemetry.py
This file handles structured log export.
It is intentionally simple and file-based:
- one session gets one JSONL file
- one turn becomes one JSON object line
This is useful for:
- report case studies
- measuring fallback rate
- debugging weird turns
- collecting examples for later evaluation
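The one-session-one-file, one-turn-one-line scheme can be sketched as follows (append_turn_record is a hypothetical name; the real exporter lives in telemetry.py):

```python
import json
from pathlib import Path

def append_turn_record(session_id: str, record: dict, log_dir: str = "logs/interactions") -> Path:
    # Hypothetical sketch of the JSONL export described above: one file per
    # session, one JSON object per line per turn.
    path = Path(log_dir) / f"{session_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Appending one line per turn means a crashed session still leaves every completed turn on disk.
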
Logging and Observability
Interaction logs are written under logs/interactions/ by default (override with STORYWEAVER_LOG_DIR).
Each turn record includes at least:
- input source
- user input
- NLU result
- latency
- fallback metadata
- state changes
- consistency issues
- final output text
- post-turn state snapshot
Example shape:
{
"timestamp": "2026-03-14T18:55:00",
"session_id": "sw-20260314-185500-ab12cd34",
"turn_index": 3,
"input_source": "text_input",
"user_input": "和村长老伯谈谈最近森林里的怪事",
"nlu_result": {
"intent": "TALK",
"target": "村长老伯",
"parser_source": "llm"
},
"latency_ms": 842.13,
"used_fallback": false,
"state_changes": {},
"output_text": "...",
"post_turn_snapshot": {
"location": "村庄广场"
}
}
If you need to debug a bad interaction, the fastest path is:
- check the log file
- inspect nlu_result
- inspect telemetry.used_fallback
- inspect state_changes
- inspect the post-turn snapshot
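A small triage script can do the first pass for you. This sketch assumes the record shape shown above (the suspicious_turns helper and the consistency_issues key name are assumptions):

```python
import json
from pathlib import Path

def suspicious_turns(log_path: str) -> list[dict]:
    # Hypothetical triage helper for the debugging steps above: surface turns
    # that used a fallback or tripped consistency checks.
    records = [
        json.loads(line)
        for line in Path(log_path).read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    return [r for r in records if r.get("used_fallback") or r.get("consistency_issues")]
```
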
Evaluation Pipeline
Evaluation entry point: evaluation/run_evaluations.py
Datasets:
- evaluation/datasets/intent_accuracy.json
- evaluation/datasets/consistency.json
- evaluation/datasets/latency.json
- evaluation/datasets/branch_divergence.json
Results are written under evaluation/results/.
What each task measures
Intent
- labeled input -> predicted intent
- optional target matching
- parser source breakdown
- per-example latency
Consistency
- action guard correctness via pre_validate_action
- contradiction detection via check_consistency
Latency
- NLU latency
- generation latency
- total latency
- fallback rate
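Fallback rate is straightforward to compute from the JSONL records (fallback_rate is a hypothetical helper name; the real computation lives in the evaluation runner):

```python
def fallback_rate(records: list[dict]) -> float:
    # Hypothetical metric helper: fraction of turns whose log record has
    # used_fallback set to true.
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("used_fallback")) / len(records)
```
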
Branch divergence
- same start state, different choices
- compare resulting story text
- compare option differences
- compare state snapshot differences
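The snapshot comparison can be done with a simple key-wise diff (snapshot_diff is a hypothetical helper; the real comparison logic lives in the evaluation runner):

```python
def snapshot_diff(a: dict, b: dict) -> dict:
    # Hypothetical helper for the branch-divergence comparison: report each
    # key whose value differs between two post-turn state snapshots.
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```
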
Common Development Tasks
Add a new intent
You will usually need to touch nlu_engine.py, and often state_manager.py, story_engine.py, and the files under evaluation/datasets/ as well.
Suggested checklist:
- add the label to the NLU logic
- decide whether it needs pre-validation
- make sure story prompts know how to handle it
- add at least a few evaluation examples
Add a new location, NPC, quest, or item
Most of the time you only need state_manager.py. That file contains the initial world setup and registry-style data.
Add more evaluation cases
Edit files under evaluation/datasets/.
This is the easiest way to improve the report without changing runtime logic.
Investigate a strange game turn
Check in this order:
- the interaction log under logs/interactions
- parser_source in the NLU result
- telemetry in the final story result
- whether pre_validate_action rejected or allowed the turn
- whether check_consistency flagged anything
Change UI behavior without touching gameplay
Edit app.py.
Try not to put game rules in the UI layer.
Environment Notes
If QWEN_API_KEY is missing
- warning logs will appear
- some paths will still run through fallback logic
- evaluation can still execute, but model-quality conclusions are not meaningful
If openai is not installed
- the repo can still import in some cases because the client is lazily initialized
- full Qwen generation will not work
- evaluation scripts will mostly reflect fallback behavior
If gradio is not installed
- the app cannot launch
- offline evaluation scripts can still be useful
Current Known Limitations
These are the main gaps we still know about:
- some item and equipment effects are stored as metadata but not fully executed as deterministic rules
- combat and trade are still more prompt-driven than rule-driven
- branch divergence is much more meaningful with a real model than in fallback-only mode
- evaluation quality depends on whether the real model environment is available
Suggested Team Workflow
If multiple teammates are working in parallel, this split is usually clean:
- gameplay/state teammate Focus on state_manager.py
- prompt/generation teammate Focus on story_engine.py
- NLU/evaluation teammate Focus on nlu_engine.py and evaluation
- UI/demo teammate Focus on app.py
- report teammate Focus on evaluation/results, logs/interactions, and case-study collection
What To Use in the Final Report
For the course report, the most useful artifacts from this repo are:
- evaluation JSON outputs under evaluation/results
- interaction logs under logs/interactions
- dataset files under evaluation/datasets
- readable state transitions from change_log
- fallback metadata from telemetry
These can directly support:
- experiment setup
- metric definition
- result tables
- success cases
- failure case analysis
License
MIT