D&D RAG System - Implementation Progress

Project Start Date: November 6, 2024
Status: ✅ Production Ready
Last Updated: December 26, 2024

📊 Overall Progress

| Phase | Status | Progress | Notes |
|---|---|---|---|
| Phase 1: Core Infrastructure | ✅ Complete | 5/5 | All core systems operational |
| Phase 2: Data Processors | ✅ Complete | 4/4 | All parsers with name weighting |
| Phase 3: Initialization | ✅ Complete | 2/2 | Full system initialization working |
| Phase 4: Query Interface | ✅ Complete | 1/1 | Interactive CLI tool added |
| Phase 5: GM Dialogue | ✅ Complete | 2/2 | RAG-enhanced AI GM working |
| Phase 6: Character Creation | ✅ Complete | 2/2 | Full character creator with RAG |
| Phase 7: Testing & Validation | ✅ Complete | 3/3 | 26+ comprehensive tests passing |
| Phase 8: Game Mechanics Engine | 🚧 In Progress | 0/5 | Character-aware gameplay enhancements |

Legend: ✅ Complete | 🚧 In Progress | ⏳ Pending | ❌ Blocked

📁 Phase 1: Core Infrastructure ✅ COMPLETE

✅ 1.1 Project Structure

Created dnd_rag_system/ directory
Created config/, core/, parsers/, systems/ subdirectories
Created __init__.py files for all packages

✅ 1.2 Configuration System

File: config/settings.py

ChromaDB configuration
Ollama model settings
Embedding model settings (all-MiniLM-L6-v2)
Collection naming conventions
Data source paths
Chunk size parameters (400 tokens)

✅ 1.3 Base Parser

File: core/base_parser.py

BaseParser abstract class
PDF extraction utilities (pdfplumber)
Text extraction utilities
Common validation methods
Error handling framework

✅ 1.4 Base Chunker

File: core/base_chunker.py

BaseChunker abstract class
Token estimation function
Chunk splitting with overlap
Metadata generation helpers
Chunk validation
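
As a rough illustration of token estimation and chunk splitting with overlap, here is a minimal sketch. The four-characters-per-token heuristic, the word-based splitting, the 50-token overlap, and both function names are assumptions for illustration, not the actual BaseChunker API; only the 400-token chunk size comes from the configuration above.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (an assumption)."""
    return (len(text) + 3) // 4


def split_with_overlap(text: str, max_tokens: int = 400, overlap_tokens: int = 50) -> List[str]:
    """Split text into word-aligned chunks that repeat some trailing context."""
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be >= 0 and smaller than max_tokens")
    words = text.split()
    chunks: List[str] = []
    start = 0
    while start < len(words):
        # Always take at least one word so the loop makes progress,
        # then grow the chunk until the token budget would be exceeded.
        end = start + 1
        while end < len(words) and estimate_tokens(" ".join(words[start:end + 1])) <= max_tokens:
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        # Step back so the next chunk repeats roughly overlap_tokens of context.
        next_start = end
        while next_start - 1 > start and estimate_tokens(" ".join(words[next_start - 1:end])) <= overlap_tokens:
            next_start -= 1
        start = next_start
    return chunks
```

Splitting on word boundaries keeps entity names intact across chunk edges, at the cost of a less precise token count than a real tokenizer would give.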

✅ 1.5 ChromaDB Manager

File: core/chroma_manager.py

ChromaDBManager class
Collection management (create, get, delete)
Batch add operations
Single/multi-collection search
Statistics and reporting
Connection pooling

📚 Phase 2: Data Processors ✅ COMPLETE

✅ 2.1 Spell Parser ⭐ ENHANCED

File: parsers/spell_parser.py

Parse spells.txt (detailed descriptions)
Parse all_spells.txt (class/level associations)
Merge spell data
Name weighting - spell names appear 2-3× in chunks
Create spell chunks (full_spell, quick_reference, by_class)
Generate spell metadata (level, school, components, classes)
OCR error handling
~86 spells → 250+ chunks

✅ 2.2 Monster Parser ⭐ ENHANCED

File: initialize_rag.py (inline loader)

Load from extracted_monsters.txt
Name weighting - monster names appear 2-3× in chunks
Monster stat block parsing
Combat stats extraction (CR, AC, HP)
Monster type extraction (e.g., "Large dragon", "Medium humanoid")
Generate monster metadata
Type tags for filtering (dragon, undead, beast, etc.)
~332 monsters loaded
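
The monster type extraction could look roughly like the sketch below. It assumes the stat block contains a line that starts with a size followed by a creature type (for example "Large dragon, chaotic evil"); the regular expression, the function name, and the returned keys are illustrative assumptions, since the layout of extracted_monsters.txt is not shown here.

```python
import re
from typing import Dict, Optional

# The six creature sizes used in D&D 5e stat blocks.
SIZES = ("Tiny", "Small", "Medium", "Large", "Huge", "Gargantuan")
TYPE_LINE = re.compile(
    r"^(?P<size>" + "|".join(SIZES) + r")\s+(?P<type>[a-z]+)",
    re.IGNORECASE,
)


def extract_monster_type(line: str) -> Optional[Dict[str, str]]:
    """Return size + type metadata for a stat-block type line, or None."""
    match = TYPE_LINE.match(line.strip())
    if match is None:
        return None
    size = match.group("size").capitalize()
    creature_type = match.group("type").lower()
    return {
        "monster_type": f"{size} {creature_type}",  # e.g. "Large dragon"
        "type_tag": creature_type,                  # e.g. "dragon", usable as a filter
    }
```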

✅ 2.3 Class Parser ⭐ ENHANCED

File: initialize_rag.py (inline loader)

Load from extracted_classes.txt
Name weighting - class names appear 2-3× in chunks
Class feature extraction
Generate class metadata
~12 classes loaded (all core D&D classes)

✅ 2.4 Race Parser ⭐ NEW!

File: initialize_rag.py (inline loader with PDF extraction)

PDF extraction from Player's Handbook (pages 18-46)
Race traits extraction
Ability score bonuses
Name weighting - race names appear 2-3× in chunks
Create race chunks (description, traits)
Generate race metadata (ability_increases, size, speed, darkvision, languages)
~9 core races → 18 chunks

🚀 Phase 3: Initialization System ✅ COMPLETE

✅ 3.1 Master Init Script

File: initialize_rag.py

Command-line argument parsing
ChromaDB initialization
Collection creation/verification
Selective data loading (--only flag)
Clear existing data (--clear flag)
Progress reporting
Error handling and recovery
Summary statistics report
All 4 collections: spells, monsters, classes, races
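
A minimal sketch of how the --only and --clear flags might be parsed is shown below. The flag names and the four collection names come from the list above; the assumption that --only accepts a single collection name, and the helper names, are illustrative and may differ from initialize_rag.py.

```python
import argparse
from typing import List, Optional

COLLECTIONS = ("spells", "monsters", "classes", "races")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Initialize the D&D RAG collections")
    # Assumption: --only takes exactly one collection name.
    parser.add_argument("--only", choices=COLLECTIONS, help="load just this collection")
    parser.add_argument("--clear", action="store_true", help="delete existing data before loading")
    return parser.parse_args(argv)


def collections_to_load(args: argparse.Namespace) -> List[str]:
    """Selective loading: one collection if --only was given, otherwise all four."""
    return [args.only] if args.only else list(COLLECTIONS)
```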

✅ 3.2 Data Validation

Verify all source files present
Test full initialization
Benchmark loading times (~30s first run, ~5s subsequent)
600+ total chunks loaded

🔍 Phase 4: Query Interface ✅ COMPLETE

✅ 4.1 Interactive Query Tool ⭐ NEW!

File: query_rag.py

Interactive CLI mode
Single-query mode
Collection-specific search (--spell, --monster, --class, --race)
Search all collections
Result formatting with metadata
Relevance scores
Commands: /spell, /monster, /class, /race, /stats, /help, /quit
Beautiful formatted output

Usage:

python query_rag.py                    # Interactive mode
python query_rag.py "fireball"         # Quick search
python query_rag.py --monster "dragon" # Search monsters

🎮 Phase 5: GM Dialogue System ✅ COMPLETE

✅ 5.1 RAG-Enhanced GM

File: systems/gm_dialogue.py

RAG-powered rule lookups in real-time
GM searches ChromaDB for spells, monsters, classes
Ollama integration
Context window management
Session state management

✅ 5.2 Dialogue Manager

File: run_gm_dialogue.py

Interactive game session
Commands: /help, /context, /history, /rag, /save, /quit
Turn tracking
Scene state persistence

👤 Phase 6: Character Creation ✅ COMPLETE

✅ 6.1 Character Creator

File: systems/character_creator.py

Interactive CLI interface
Race selection with RAG lookup
Class selection with RAG lookup
Ability score generation (standard array, roll, point buy)
Background selection
Equipment selection
Spell selection (for casters)
Character validation
JSON export
Character sheet display
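
The three ability score methods can be sketched as follows. The values are the standard D&D 5e ones (the 15/14/13/12/10/8 array, 4d6 drop lowest, and a 27-point budget); the function names are hypothetical and the creator in systems/character_creator.py may structure this differently.

```python
import random
from typing import List, Optional

STANDARD_ARRAY = [15, 14, 13, 12, 10, 8]
POINT_BUY_COST = {8: 0, 9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 7, 15: 9}
POINT_BUY_BUDGET = 27


def roll_ability_score(rng: random.Random) -> int:
    """Roll 4d6 and drop the lowest die."""
    dice = sorted(rng.randint(1, 6) for _ in range(4))
    return sum(dice[1:])


def roll_ability_scores(rng: Optional[random.Random] = None) -> List[int]:
    """Roll six scores, highest first. Pass a seeded rng for reproducible tests."""
    rng = rng or random.Random()
    return sorted((roll_ability_score(rng) for _ in range(6)), reverse=True)


def point_buy_cost(scores: List[int]) -> int:
    """Total point cost; raises ValueError for scores outside 8-15."""
    try:
        return sum(POINT_BUY_COST[score] for score in scores)
    except KeyError as exc:
        raise ValueError(f"score {exc.args[0]} is outside the 8-15 point-buy range") from None


def is_valid_point_buy(scores: List[int]) -> bool:
    return len(scores) == 6 and point_buy_cost(scores) <= POINT_BUY_BUDGET
```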

✅ 6.2 Character Management

File: create_character.py

Save/load character files
Character sheet viewer
Integration with RAG system

🧪 Phase 7: Testing & Validation ✅ COMPLETE ⭐ NEW!

✅ 7.1 Comprehensive Test Suite

File: test_all_collections.py

26+ automated tests
Name weighting validation - exact names rank first
Semantic search tests - related concepts found
Metadata extraction tests - CR, level, abilities validated
All 4 collections tested - spells, monsters, classes, races
Cross-collection search - multi-type queries
Pass/fail reporting with statistics
Detailed error messages

✅ 7.2 Manual Test Scripts

File: test_spell_search.py

Detailed search results for all collections
Distance/relevance scores
Metadata display
Preview of results

✅ 7.3 Integration Tests

Full initialization test
End-to-end query test
GM dialogue integration
Character creation flow

🎮 Phase 8: Game Mechanics Engine ✅ COMPLETE ⭐ COMPREHENSIVE STATE SYSTEM!

Goal: Transform AI from rule-maker to narrator by implementing programmatic game mechanics

✅ 8.0 Character-Aware Dialogue System ⭐ NEW!

File: play_with_character.py

Load or create characters for gameplay
Character context passed to GM (stats, equipment, spells)
Three character modes: Create new, Load JSON, Quick test
Commands: /character, /stats, /context for character info
Fixed tokenizer warning suppression
Dynamic character support (not hardcoded to one character)
Proper first/second person context ("The player is X" → AI uses "you")
Integration testing completed (Dec 1, 2024)

✅ 8.1 Comprehensive Game State System ⭐ COMPLETE! (Dec 25, 2024)

File: systems/game_state.py

Character State Management:

HP tracking (current, max, temporary HP)
Spell slots by level (1-9) with use/restore mechanics
Inventory system (add/remove items with quantities)
Equipment slots (main_hand, off_hand, armor, etc.)
D&D 5e conditions (14 official conditions: blinded, charmed, etc.)
Death saving throws (3 successes/failures)
Concentration mechanics for spells
Experience points and leveling system
Hit dice for short rest healing
Status query methods
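
To make the spell slot and death save tracking concrete, here is a minimal sketch. The class names match the test groups listed in section 8.2, but the fields and method names are assumptions and not necessarily those in systems/game_state.py.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpellSlots:
    maximum: Dict[int, int] = field(default_factory=dict)  # spell level -> max slots
    current: Dict[int, int] = field(default_factory=dict)  # spell level -> slots left

    def __post_init__(self) -> None:
        if not self.current:
            self.current = dict(self.maximum)  # start fully rested

    def has_slot(self, level: int) -> bool:
        return self.current.get(level, 0) > 0

    def use(self, level: int) -> bool:
        """Consume one slot of the given level (1-9); False if none is available."""
        if not 1 <= level <= 9 or not self.has_slot(level):
            return False
        self.current[level] -= 1
        return True

    def restore_all(self) -> None:
        """Long rest: every slot comes back."""
        self.current = dict(self.maximum)


@dataclass
class DeathSaves:
    successes: int = 0
    failures: int = 0

    def record(self, success: bool) -> str:
        """Record one death saving throw and report the resulting status."""
        if success:
            self.successes += 1
        else:
            self.failures += 1
        if self.successes >= 3:
            return "stable"
        if self.failures >= 3:
            return "dead"
        return "dying"
```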

Combat State Management:

Initiative system (sorted by roll)
Turn tracking with round numbers
Active effects with duration (buffs/debuffs)
Combat start/end mechanics
Effect duration ticking
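
The initiative, turn, and effect handling described above could be sketched like this. The initiative_order name appears later in this document; everything else (field names, method names, and ticking effect durations at the start of each new round) is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CombatState:
    initiative_order: List[Tuple[str, int]] = field(default_factory=list)
    turn_index: int = 0
    round_number: int = 0
    effects: Dict[str, int] = field(default_factory=dict)  # effect name -> rounds left

    def start(self, rolls: Dict[str, int]) -> None:
        """Begin combat: highest initiative roll acts first."""
        self.initiative_order = sorted(rolls.items(), key=lambda item: item[1], reverse=True)
        self.turn_index = 0
        self.round_number = 1

    def current(self) -> str:
        return self.initiative_order[self.turn_index][0]

    def next_turn(self) -> str:
        """Advance to the next combatant, wrapping into a new round when needed."""
        self.turn_index += 1
        if self.turn_index >= len(self.initiative_order):
            self.turn_index = 0
            self.round_number += 1
            self._tick_effects()
        return self.current()

    def _tick_effects(self) -> None:
        # Assumption: durations count down once per round and expire at zero.
        self.effects = {name: left - 1 for name, left in self.effects.items() if left - 1 > 0}
```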

Party Management:

Multiple character support
Party-wide operations (XP distribution, rests)
Shared party inventory
Party gold/currency management
Alive/conscious character filtering

Game Session State:

Location and scene tracking
Quest system (active/completed quests)
NPC tracking
In-game time advancement (day/night cycle)
Session notes
Comprehensive session summaries

Core Mechanics:

Take damage (with temp HP absorption)
Healing (can't exceed max HP)
Spell casting with slot consumption
Cantrip support (no slot cost)
Concentration checks
Short rest (spend hit dice to heal)
Long rest (restore HP, spell slots, hit dice)
Inventory add/remove/check
Item equipping/unequipping
Condition add/remove
State serialization (save/load to JSON)
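
Two of the core mechanics, damage with temporary HP absorption and healing capped at max HP, can be sketched as below. The HitPoints class and its method names are hypothetical; the real logic lives in systems/game_state.py and may be organised differently.

```python
from dataclasses import dataclass


@dataclass
class HitPoints:
    maximum: int
    current: int
    temporary: int = 0

    def take_damage(self, amount: int) -> int:
        """Apply damage, draining temporary HP first. Returns real HP lost."""
        if amount < 0:
            raise ValueError("damage must be non-negative")
        absorbed = min(self.temporary, amount)
        self.temporary -= absorbed
        lost = min(self.current, amount - absorbed)  # HP never drops below zero
        self.current -= lost
        return lost

    def heal(self, amount: int) -> int:
        """Restore HP without exceeding the maximum. Returns HP actually healed."""
        if amount < 0:
            raise ValueError("healing must be non-negative")
        healed = min(amount, self.maximum - self.current)
        self.current += healed
        return healed
```

For example, a character at 20/20 HP with 5 temporary HP who takes 8 damage loses all 5 temporary HP and 3 real HP, ending at 17/20.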

✅ 8.2 Comprehensive Testing ⭐ 70 TESTS PASSING!

File: tests/test_game_state.py

6 SpellSlots tests (use, restore, long rest, availability)
3 DeathSaves tests (successes, failures, reset)
34 CharacterState tests (HP, damage, healing, spells, inventory, conditions, rests, XP, serialization)
9 CombatState tests (initiative, turns, rounds, effects, combat flow)
11 PartyState tests (characters, XP distribution, gold, shared inventory, party rests)
7 GameSession tests (quests, time, location, session summary)
100% test pass rate
Full coverage of all game mechanics

🚧 8.3 Gradio Integration IN PROGRESS

Status: ⏳ Pending

Integrate state system with Gradio web interface
Display character HP, spell slots, and conditions in UI
Combat mode UI with initiative tracker
Inventory management UI
Party management UI
Save/load game sessions

🏗️ Phase 8 Architecture Notes

Option B: Hybrid AI + Rules Engine (SELECTED)

Problem: AI is unreliable at following D&D rules consistently

Ignores spells player casts
Allows spells player doesn't know
Doesn't track resources (HP, spell slots)
Makes up mechanics on the fly

Solution: Intercept player actions BEFORE AI sees them

Flow:

Player Input: "I cast Magic Missile at the goblin"
Rules Engine (Python code):
- Parse: Detect spell casting intent
- Validate: Check if player knows "Magic Missile" ✓
- Validate: Check if player has 1st-level spell slot ✓
- Retrieve: Get spell details from RAG (3 darts, 1d4+1 each)
- Roll: 3d4+3 = 11 damage (programmatically)
- Deduct: Spell slot consumed
- Update: Target HP reduced by 11
AI Prompt: "You successfully cast Magic Missile dealing 11 force damage to the goblin. The goblin now has 5 HP remaining. Describe the magical missiles striking the goblin."
AI Response: (Just narrates the flavor, mechanics already handled)
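
The flow above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's rules engine: the regular expression, the Caster class, the resolve_cast function, and the hard-coded SPELLS table (standing in for the RAG lookup) are all hypothetical.

```python
import random
import re
from dataclasses import dataclass
from typing import Dict, List

CAST_PATTERN = re.compile(
    r"\bcast\s+(?P<spell>.+?)\s+(?:at|on)\s+(?:the\s+)?(?P<target>.+)$",
    re.IGNORECASE,
)

# Stand-in for the RAG lookup: Magic Missile fires 3 darts of 1d4+1 force damage.
SPELLS = {"magic missile": {"level": 1, "darts": 3, "die": 4, "bonus": 1, "damage_type": "force"}}


@dataclass
class Caster:
    known_spells: List[str]
    slots: Dict[int, int]  # spell level -> slots left


def resolve_cast(text: str, caster: Caster, target_hp: Dict[str, int], rng: random.Random) -> str:
    """Apply the mechanics in code and return the prompt handed to the AI narrator."""
    match = CAST_PATTERN.search(text.strip().rstrip(".!"))
    if match is None:
        return text  # not a spell cast; pass the input through unchanged
    spell_name = match.group("spell").strip().lower()
    target = match.group("target").strip().lower()
    if spell_name not in {s.lower() for s in caster.known_spells}:
        return f"The player does not know {spell_name.title()}. Narrate the attempt fizzling."
    spell = SPELLS.get(spell_name)
    if spell is None:
        return f"No rules data found for {spell_name.title()}. Ask the player to clarify."
    if target not in target_hp:
        return f"There is no {target} here. Ask the player to pick a valid target."
    if caster.slots.get(spell["level"], 0) <= 0:
        return f"The player has no level {spell['level']} spell slots left. Narrate the spell failing."
    caster.slots[spell["level"]] -= 1  # deduct the slot
    damage = sum(rng.randint(1, spell["die"]) + spell["bonus"] for _ in range(spell["darts"]))
    target_hp[target] = max(0, target_hp[target] - damage)  # update target HP
    return (
        f"You successfully cast {spell_name.title()} dealing {damage} {spell['damage_type']} "
        f"damage to the {target}. The {target} now has {target_hp[target]} HP remaining. "
        f"Describe the spell striking the {target}."
    )
```

Passing the random number generator in as a parameter keeps the dice rolls reproducible in tests, while the narrator only ever sees numbers that were already computed.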

Benefits:

AI becomes a narrator, not a rules engine
Mechanics are deterministic and accurate
AI can focus on storytelling
Players can trust the rules

Alternatives Rejected:

Option A (Pure AI): Too unreliable, tested and failed
Option C (Post-process AI): Too hard to fix bad outputs

📦 Supporting Files ✅ COMPLETE

✅ Dependencies

File: requirements.txt

chromadb
sentence-transformers
pdfplumber
ollama (Python client)
All dependencies working

✅ Documentation

README.md with full installation instructions
Quick start guide
Usage examples for all tools
Troubleshooting section
plan_progress.md (this file)

🎯 Success Metrics

| Metric | Target | Current | Status |
|---|---|---|---|
| Init Time (full) | < 5 min | ~30s | ✅ Exceeded |
| Query Latency | < 500ms | ~100-200ms | ✅ Exceeded |
| Name Weighting | Exact match ranks #1 | ✅ Working | ✅ Complete |
| Total Chunks | ~600 | 612+ | ✅ Complete |
| Test Coverage | > 80% | 26+ tests | ✅ Complete |
| Collections | 4 collections | 4 active | ✅ Complete |

🎨 Key Features Implemented

✅ Name-Weighted Retrieval

Spells: Name appears 3× (SPELL: name, name, name)
Monsters: Name appears 3× (MONSTER: name, name, name) + type extraction
Classes: Name appears 3× (CLASS: name, name, name)
Races: Name appears 3× (RACE: name, name, name) + trait extraction
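
As a concrete illustration of the pattern listed above, a name-weighted chunk could be assembled as follows. The helper name and signature are hypothetical; only the "KIND: name, name, name" header layout comes from this document.

```python
def build_weighted_chunk(kind: str, name: str, body: str, repeats: int = 3) -> str:
    """Prefix a chunk with its entity name repeated, e.g. 'SPELL: Fireball, Fireball, Fireball'."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    header = f"{kind.upper()}: " + ", ".join([name] * repeats)
    return f"{header}\n{body.strip()}"
```

Repeating the name at the start of the chunk pulls its embedding toward the entity name, which is why an exact-name query such as "fireball" tends to rank that chunk first.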

✅ Multiple Chunk Types Per Entity

Spells: full_spell, quick_reference, by_class
Monsters: monster_stats with type tags
Classes: class_features
Races: race_description, race_traits

✅ Rich Metadata

Spells: level, school, casting_time, range, components, duration, classes, ritual, concentration
Monsters: challenge_rating, monster_type (size + type), type tags
Classes: name, content_type
Races: ability_increases, size, speed, darkvision, languages

📝 Notes & Decisions

Design Decisions

Database: ChromaDB for persistence and semantic search
Embeddings: sentence-transformers/all-MiniLM-L6-v2 for speed/quality balance
LLM: Ollama with Qwen3-4B-RPG-Roleplay-V2 for D&D-tuned responses
Collection Strategy: Separate collections per content type for clean organization
Name Weighting: Entity names repeated 2-3× at chunk start for better exact-match retrieval
Multiple Chunks: Each entity creates multiple specialized chunks for different use cases

Key Improvements (Nov 6, 2024)

✅ Spell Parser Upgrade - Now uses sophisticated SpellParser class instead of inline code
✅ Name Weighting - All entity types now have weighted names for better retrieval
✅ Race Extraction - Full race data extracted from PDF with traits and metadata
✅ Monster Type Extraction - Automatic extraction of size and creature type
✅ Interactive Query Tool - New CLI for exploring the RAG system
✅ Comprehensive Tests - 26+ automated tests validating all functionality

Known Issues (Phase 8 Discovery - Dec 1, 2024)

AI Unreliability: Pure AI approach fails to consistently enforce D&D rules
- Ignores valid spell casts (Magic Missile cast was turned into melee combat)
- Allows invalid spells (Let Elara cast Fireball, which she doesn't know)
- No resource tracking (spell slots, HP, gold)
Solution: Moving to Hybrid Architecture (Option B) with programmatic rules engine

Current Work (Phase 8)

🚧 Spell Validation System: Programmatically check spell ownership before AI generation
🚧 Resource Tracking: HP, spell slots, inventory management
🚧 Combat Mechanics: Attack/damage rolls, initiative, turn tracking
🚧 Rules Engine: Intercept player actions, apply mechanics, then AI narrates

Future Enhancements (Post-Phase 8)

⏳ Subrace Support: High Elf, Mountain Dwarf, etc. with specific abilities
⏳ Advanced Filtering: Search by CR range, spell level range, class, type
⏳ Web UI: Web interface for GM dialogue
⏳ Multiplayer Support: Multi-player sessions
⏳ Custom Content Import: User-created monsters/spells
⏳ Voice Interface: Voice commands for GM dialogue
⏳ Map/Battle Grid Integration: Visual battle maps

📅 Timeline

| Date | Milestone |
|---|---|
| 2024-11-06 09:00 | Project started, directory structure created |
| 2024-11-06 12:00 | Phase 1-2 complete (core infrastructure + basic parsers) |
| 2024-11-06 15:00 | Phase 3-6 complete (initialization, query, GM, character creator) |
| 2024-11-06 18:00 | Major upgrades: Name weighting, race extraction, comprehensive tests |
| 2024-11-06 20:00 | Phase 7 complete: All tests passing, documentation updated |
| 2024-11-06 21:00 | V1.0 COMPLETE - Production ready! |
| 2024-12-01 18:00 | Phase 8 started: Character-aware dialogue system created |
| 2024-12-01 19:00 | Testing reveals AI reliability issues with game mechanics |
| 2024-12-01 19:30 | Architecture decision: Option B (Hybrid Rules Engine) selected |

🚀 Production Deployment Checklist

All 4 collections operational
Name weighting implemented for all entity types
Comprehensive test suite (26+ tests passing)
Interactive query tool
Documentation complete
GM dialogue system working
Character creator working
All dependencies installed
Error handling in place
Performance targets met

📊 Statistics

Collection Counts

Spells: 86 spells → 250+ chunks
Monsters: 332 monsters → 332 chunks
Classes: 12 classes → 12 chunks
Races: 9 races → 18 chunks
Total: 612+ chunks in ChromaDB

Test Results

Total Tests: 26+
Pass Rate: 100%
Collections Tested: 4/4
Features Validated: Name weighting, semantic search, metadata extraction, cross-collection search

🏪 Phase 9: Shop System & Equipment Database ✅ COMPLETE ⭐ NEW! (Dec 26, 2024)

Goal: Implement GM-driven conversational shopping with NPC shopkeepers

✅ 9.1 Equipment Database

Files: loaders/equipment_loader.py, dnd_rag_system/data/equipment.txt

Parse D&D 5e equipment tables (weapons, armor, gear, tools, mounts)
Extract 58 equipment items with prices, weights, and properties
Load into ChromaDB dnd_equipment collection
Metadata: name, cost_gp, weight, category, properties
Integration with RAG system for shop queries
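
One step in producing the cost_gp metadata field is normalising prices given in different coins. The sketch below assumes prices appear as strings like "15 gp" or "5 sp" (the layout of equipment.txt is not shown here) and uses the standard D&D 5e exchange rates; the function name is hypothetical.

```python
import re

# Standard 5e exchange rates expressed in gold pieces.
GP_PER_COIN = {"cp": 0.01, "sp": 0.1, "ep": 0.5, "gp": 1.0, "pp": 10.0}
COST_PATTERN = re.compile(
    r"^\s*(?P<amount>[\d,]+(?:\.\d+)?)\s*(?P<coin>cp|sp|ep|gp|pp)\s*$",
    re.IGNORECASE,
)


def cost_to_gp(cost: str) -> float:
    """Convert a price string such as '5 sp' into gold pieces."""
    match = COST_PATTERN.match(cost)
    if match is None:
        raise ValueError(f"unrecognised cost: {cost!r}")
    amount = float(match.group("amount").replace(",", ""))
    return round(amount * GP_PER_COIN[match.group("coin").lower()], 2)
```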

✅ 9.2 Shop System

File: systems/shop_system.py

ShopSystem class with RAG-powered inventory search
Natural language purchase/sell command parsing
Transaction validation (gold checks, inventory updates)
Fuzzy item name matching ("longsword", "long sword" both work)
D&D 5e sell mechanics (half price)
Shopkeeper personality context generator (friendly, grumpy, mysterious, etc.)
Integration hooks for GM dialogue system
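
Fuzzy item matching and the half-price sell rule can be sketched with the standard library alone. The catalog below is sample data, and the function names, the 0.8 similarity cutoff, and the use of difflib are assumptions for illustration; the ShopSystem class itself searches the RAG collection instead.

```python
import difflib
from typing import Dict, Optional

PRICES_GP = {"longsword": 15.0, "shortbow": 25.0, "rope (50 ft)": 1.0}  # sample catalog


def _normalize(name: str) -> str:
    """Lowercase and drop spaces/punctuation so 'Long Sword' matches 'longsword'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def find_item(query: str, catalog: Dict[str, float]) -> Optional[str]:
    by_key = {_normalize(name): name for name in catalog}
    key = _normalize(query)
    if key in by_key:
        return by_key[key]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=0.8)
    return by_key[close[0]] if close else None


def buy(item: str, gold: float, inventory: Dict[str, int], catalog: Dict[str, float]) -> float:
    """Validate and apply a purchase; returns the gold left."""
    name = find_item(item, catalog)
    if name is None:
        raise LookupError(f"the shop does not stock {item!r}")
    if gold < catalog[name]:
        raise ValueError("not enough gold")
    inventory[name] = inventory.get(name, 0) + 1
    return gold - catalog[name]


def sell(item: str, gold: float, inventory: Dict[str, int], catalog: Dict[str, float]) -> float:
    """Sell one item at half its list price; returns the new gold total."""
    name = find_item(item, catalog)
    if name is None or inventory.get(name, 0) <= 0:
        raise LookupError(f"no {item!r} to sell")
    inventory[name] -= 1
    if inventory[name] == 0:
        del inventory[name]
    return gold + catalog[name] / 2
```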

✅ 9.3 Shop System Testing

File: test_shop_system.py

7 comprehensive test suites
Shop inventory search tests
Item price lookup tests (with fuzzy matching)
Purchase transaction tests (gold deduction, inventory add)
Sell transaction tests (gold increase, inventory remove)
Chat command parsing tests (natural language + commands)
Shopkeeper context generation tests
Complete shopping experience simulation
100% test pass rate

✅ 9.4 Documentation

File: SHOP_SYSTEM_GUIDE.md

Comprehensive usage guide with examples
Philosophy: Chat-first, mechanics-second
Example shopping sessions
Technical API documentation
GM best practices

Philosophy: Shop interactions happen through natural GM chat with NPC shopkeepers. The system validates transactions and manages gold and inventory automatically, while the GM brings the shopkeeper to life with personality!

✅ Phase 10: Reality Check System ✅ COMPLETE ⭐ NEW! (Dec 26, 2024)

Goal: Prevent GM hallucinations by validating player actions against game state

✅ 10.1 Action Validation System

File: systems/action_validator.py

ActionType enum (combat, spell_cast, conversation, item_use, exploration)
ValidationResult enum (valid, invalid, npc_introduction, fuzzy_match)
ActionIntent dataclass (structured action parsing)
ValidationReport dataclass (validation results with guidance)
Intent analysis from natural language input
State validation against GameSession
Fuzzy matching for flexible input (e.g., "goblin" → "Goblin Scout")
Context-aware prompting for GM guidance
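
The types above and the fuzzy target matching might look roughly like this. The enum names and members come from the list above; the ValidationReport fields, the validate_target function, and the matching strategy (exact, then partial name, then difflib similarity) are assumptions rather than the contents of systems/action_validator.py.

```python
import difflib
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ActionType(Enum):
    COMBAT = "combat"
    SPELL_CAST = "spell_cast"
    CONVERSATION = "conversation"
    ITEM_USE = "item_use"
    EXPLORATION = "exploration"


class ValidationResult(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NPC_INTRODUCTION = "npc_introduction"
    FUZZY_MATCH = "fuzzy_match"


@dataclass
class ValidationReport:
    result: ValidationResult
    matched: Optional[str] = None  # canonical entity name, if one was found
    guidance: str = ""             # text added to the GM prompt


def validate_target(target: str, npcs_present: List[str]) -> ValidationReport:
    """Check that a combat target exists in the scene, allowing loose names."""
    wanted = target.strip().lower()
    by_lower = {npc.lower(): npc for npc in npcs_present}
    if wanted in by_lower:
        return ValidationReport(ValidationResult.VALID, by_lower[wanted])
    # Partial name, e.g. "goblin" -> "Goblin Scout"
    partial = [npc for key, npc in by_lower.items() if wanted and wanted in key]
    if partial:
        return ValidationReport(ValidationResult.FUZZY_MATCH, partial[0], f"Treat the target as {partial[0]}.")
    # Small typos, e.g. "gobln scout"
    close = difflib.get_close_matches(wanted, list(by_lower), n=1, cutoff=0.75)
    if close:
        name = by_lower[close[0]]
        return ValidationReport(ValidationResult.FUZZY_MATCH, name, f"Treat the target as {name}.")
    return ValidationReport(
        ValidationResult.INVALID,
        guidance=f"There is no '{target}' in this scene. Do not invent one.",
    )
```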

✅ 10.2 Validation Logic

Combat Validation:

Target must exist in npcs_present or combat.initiative_order
Fuzzy matching for partial names
Clear error messages for invalid targets

Spell Validation:

Character must know the spell (fuzzy matching)
Spell must exist in character's spell list
Helpful suggestions for similar spells

Item Validation:

Item must be in character inventory
Quantity validation

NPC Conversation:

Allows contextually appropriate NPC introductions
Rejects NPCs that don't make sense in current scene
Auto-adds introduced NPCs to game state

✅ 10.3 GM Integration

File: systems/gm_dialogue_unified.py

Integrated ActionValidator into GameMaster.__init__
Modified generate_response() to validate before LLM generation
Updated _build_prompt() with validation guidance
Added _post_process_response() to auto-add introduced NPCs
Debug logging for validation steps

✅ 10.4 Reality Check Testing

File: test_reality_check.py

Combat validation tests (valid/invalid targets, fuzzy matching)
NPC conversation tests (introduction, rejection, fuzzy matching)
Spell casting tests (known/unknown spells)
Item usage tests (inventory validation)
Exploration tests (always allowed)
100% test pass rate

Benefits:

Prevents GM from inventing non-existent entities
Maintains game state consistency
Preserves narrative freedom (GM can still introduce appropriate NPCs)
Fuzzy matching allows flexible player input
Clear error messaging guides GM narration

📦 Updated Statistics (Dec 26, 2024)

Collection Counts

Spells: 86 spells → 250+ chunks
Monsters: 332 monsters → 332 chunks
Classes: 12 classes → 12 chunks
Races: 9 races → 18 chunks
Equipment: 58 items → 58 chunks ⭐ NEW!
Total: 670+ chunks in ChromaDB

Test Results

SpellSlots: 6 tests ✅
DeathSaves: 3 tests ✅
CharacterState: 34 tests ✅
CombatState: 9 tests ✅
PartyState: 11 tests ✅
GameSession: 7 tests ✅
Shop System: 7 test suites ✅ NEW!
Reality Check: 15+ tests ✅ NEW!
Total Tests: 92+ tests
Pass Rate: 100% ✅

Status: ✅ V3.0 PRODUCTION READY! (Shop System + Reality Check)

Current Focus: Documentation and deployment

Latest Achievements (Dec 26, 2024):

🎉 GM-Driven Shop System - Conversational shopping with NPC shopkeepers
🎉 Reality Check System - Prevents hallucinations while preserving narrative freedom
🎉 Equipment Database - 58 D&D 5e items with accurate prices

Next Steps:

Deploy to Hugging Face Spaces
Test with HF Inference API model (Qwen2.5-7B-Instruct)
User feedback and iteration

Last Updated: December 26, 2024