
D&D RAG System - Implementation Progress

Project Start Date: November 6, 2024 | Status: ✅ Production Ready | Last Updated: December 26, 2024


📊 Overall Progress

| Phase | Status | Progress | Notes |
|---|---|---|---|
| Phase 1: Core Infrastructure | ✅ Complete | 5/5 | All core systems operational |
| Phase 2: Data Processors | ✅ Complete | 4/4 | All parsers with name weighting |
| Phase 3: Initialization | ✅ Complete | 2/2 | Full system initialization working |
| Phase 4: Query Interface | ✅ Complete | 1/1 | Interactive CLI tool added |
| Phase 5: GM Dialogue | ✅ Complete | 2/2 | RAG-enhanced AI GM working |
| Phase 6: Character Creation | ✅ Complete | 2/2 | Full character creator with RAG |
| Phase 7: Testing & Validation | ✅ Complete | 3/3 | 26+ comprehensive tests passing |
| Phase 8: Game Mechanics Engine | 🚧 In Progress | 0/5 | Character-aware gameplay enhancements |

Legend: ✅ Complete | 🚧 In Progress | ⏳ Pending | ❌ Blocked


📁 Phase 1: Core Infrastructure ✅ COMPLETE

✅ 1.1 Project Structure

  • Created dnd_rag_system/ directory
  • Created config/, core/, parsers/, systems/ subdirectories
  • Created __init__.py files for all packages

✅ 1.2 Configuration System

File: config/settings.py

  • ChromaDB configuration
  • Ollama model settings
  • Embedding model settings (all-MiniLM-L6-v2)
  • Collection naming conventions
  • Data source paths
  • Chunk size parameters (400 tokens)

✅ 1.3 Base Parser

File: core/base_parser.py

  • BaseParser abstract class
  • PDF extraction utilities (pdfplumber)
  • Text extraction utilities
  • Common validation methods
  • Error handling framework

✅ 1.4 Base Chunker

File: core/base_chunker.py

  • BaseChunker abstract class
  • Token estimation function
  • Chunk splitting with overlap
  • Metadata generation helpers
  • Chunk validation
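
The overlap splitting listed above can be illustrated with a minimal sketch. It is not the project's BaseChunker: the function name `split_with_overlap`, the 50-token overlap, and the shortcut of counting each whitespace-separated word as one token are all assumptions made for illustration. Only the 400-token chunk size comes from the configuration section.

```python
from typing import List


def split_with_overlap(text: str, max_tokens: int = 400, overlap_tokens: int = 50) -> List[str]:
    """Split text into chunks of at most max_tokens words.

    Consecutive chunks share overlap_tokens words so that a sentence cut at a
    boundary still appears whole in one of the two chunks.
    Assumption: one whitespace-separated word counts as one token.
    """
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")

    words = text.split()
    step = max_tokens - overlap_tokens  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With `max_tokens=4` and `overlap_tokens=1`, a ten-word text yields three chunks, each starting with the last word of the previous one.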

✅ 1.5 ChromaDB Manager

File: core/chroma_manager.py

  • ChromaDBManager class
  • Collection management (create, get, delete)
  • Batch add operations
  • Single/multi-collection search
  • Statistics and reporting
  • Connection pooling

📚 Phase 2: Data Processors ✅ COMPLETE

✅ 2.1 Spell Parser ⭐ ENHANCED

File: parsers/spell_parser.py

  • Parse spells.txt (detailed descriptions)
  • Parse all_spells.txt (class/level associations)
  • Merge spell data
  • Name weighting - spell names appear 2-3× in chunks
  • Create spell chunks (full_spell, quick_reference, by_class)
  • Generate spell metadata (level, school, components, classes)
  • OCR error handling
  • ~86 spells → 250+ chunks

✅ 2.2 Monster Parser ⭐ ENHANCED

File: initialize_rag.py (inline loader)

  • Load from extracted_monsters.txt
  • Name weighting - monster names appear 2-3× in chunks
  • Monster stat block parsing
  • Combat stats extraction (CR, AC, HP)
  • Monster type extraction (e.g., "Large dragon", "Medium humanoid")
  • Generate monster metadata
  • Type tags for filtering (dragon, undead, beast, etc.)
  • ~332 monsters loaded
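
As a rough illustration of the monster type extraction, the sketch below pulls the size and creature type out of a stat block line such as "Large dragon, chaotic evil". The regular expression, the function name `extract_monster_type`, and the returned keys are hypothetical; the actual inline loader in initialize_rag.py may work differently.

```python
import re
from typing import Dict, Optional

# The six creature sizes used by D&D 5e stat blocks.
SIZE_TYPE_RE = re.compile(
    r"\b(Tiny|Small|Medium|Large|Huge|Gargantuan)\s+([A-Za-z]+)",
    re.IGNORECASE,
)


def extract_monster_type(line: str) -> Optional[Dict[str, str]]:
    """Return size, creature type, and a combined label, or None if no match."""
    match = SIZE_TYPE_RE.search(line)
    if match is None:
        return None
    size = match.group(1).capitalize()
    creature_type = match.group(2).lower()
    return {
        "size": size,
        "type": creature_type,  # usable as a filter tag, e.g. "dragon"
        "monster_type": f"{size} {creature_type}",
    }
```

The lowercase `type` value could serve as the filtering tag, while `monster_type` keeps the human-readable label.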

✅ 2.3 Class Parser ⭐ ENHANCED

File: initialize_rag.py (inline loader)

  • Load from extracted_classes.txt
  • Name weighting - class names appear 2-3× in chunks
  • Class feature extraction
  • Generate class metadata
  • ~12 classes loaded (all core D&D classes)

✅ 2.4 Race Parser ⭐ NEW!

File: initialize_rag.py (inline loader with PDF extraction)

  • PDF extraction from Player's Handbook (pages 18-46)
  • Race traits extraction
  • Ability score bonuses
  • Name weighting - race names appear 2-3× in chunks
  • Create race chunks (description, traits)
  • Generate race metadata (ability_increases, size, speed, darkvision, languages)
  • ~9 core races → 18 chunks

🚀 Phase 3: Initialization System ✅ COMPLETE

✅ 3.1 Master Init Script

File: initialize_rag.py

  • Command-line argument parsing
  • ChromaDB initialization
  • Collection creation/verification
  • Selective data loading (--only flag)
  • Clear existing data (--clear flag)
  • Progress reporting
  • Error handling and recovery
  • Summary statistics report
  • All 4 collections: spells, monsters, classes, races
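
The `--only` and `--clear` flags could be wired up roughly as follows. This is a sketch under assumptions: the source does not say whether `--only` accepts one or several collection names, so accepting several here is a guess, and `parse_args` is an illustrative name.

```python
import argparse
from typing import List, Optional

COLLECTIONS = ("spells", "monsters", "classes", "races")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Initialize the D&D RAG collections (illustrative sketch)."
    )
    # Assumption: --only takes one or more collection names; default is all four.
    parser.add_argument(
        "--only",
        nargs="+",
        choices=COLLECTIONS,
        default=list(COLLECTIONS),
        help="load only the named collections",
    )
    parser.add_argument(
        "--clear",
        action="store_true",
        help="delete existing data before loading",
    )
    return parser.parse_args(argv)
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`.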

✅ 3.2 Data Validation

  • Verify all source files present
  • Test full initialization
  • Benchmark loading times (~30s first run, ~5s subsequent)
  • 600+ total chunks loaded

🔍 Phase 4: Query Interface ✅ COMPLETE

✅ 4.1 Interactive Query Tool ⭐ NEW!

File: query_rag.py

  • Interactive CLI mode
  • Single-query mode
  • Collection-specific search (--spell, --monster, --class, --race)
  • Search all collections
  • Result formatting with metadata
  • Relevance scores
  • Commands: /spell, /monster, /class, /race, /stats, /help, /quit
  • Beautifully formatted output

Usage:

python query_rag.py                    # Interactive mode
python query_rag.py "fireball"         # Quick search
python query_rag.py --monster "dragon" # Search monsters

🎮 Phase 5: GM Dialogue System ✅ COMPLETE

✅ 5.1 RAG-Enhanced GM

File: systems/gm_dialogue.py

  • RAG-powered rule lookups in real-time
  • GM searches ChromaDB for spells, monsters, classes
  • Ollama integration
  • Context window management
  • Session state management

✅ 5.2 Dialogue Manager

File: run_gm_dialogue.py

  • Interactive game session
  • Commands: /help, /context, /history, /rag, /save, /quit
  • Turn tracking
  • Scene state persistence

👤 Phase 6: Character Creation ✅ COMPLETE

✅ 6.1 Character Creator

File: systems/character_creator.py

  • Interactive CLI interface
  • Race selection with RAG lookup
  • Class selection with RAG lookup
  • Ability score generation (standard array, roll, point buy)
  • Background selection
  • Equipment selection
  • Spell selection (for casters)
  • Character validation
  • JSON export
  • Character sheet display
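
The three ability score methods can be sketched as below. The standard array, the 4d6-drop-lowest roll, and the 27-point budget with its cost table follow the D&D 5e rules; the function names are illustrative and not taken from character_creator.py.

```python
import random
from typing import List, Optional

STANDARD_ARRAY = [15, 14, 13, 12, 10, 8]

# Point buy: cost of each score from 8 to 15, with a budget of 27 points.
POINT_BUY_COST = {8: 0, 9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 7, 15: 9}
POINT_BUY_BUDGET = 27


def roll_ability_scores(rng: Optional[random.Random] = None) -> List[int]:
    """Roll six scores, each the sum of the best three of four d6."""
    rng = rng or random.Random()
    scores = []
    for _ in range(6):
        dice = sorted(rng.randint(1, 6) for _ in range(4))
        scores.append(sum(dice[1:]))  # drop the lowest die
    return scores


def point_buy_cost(scores: List[int]) -> int:
    """Total point cost; raises KeyError for scores outside 8 to 15."""
    return sum(POINT_BUY_COST[score] for score in scores)


def is_valid_point_buy(scores: List[int]) -> bool:
    """True if there are six scores, all in range, within the budget."""
    return (
        len(scores) == 6
        and all(score in POINT_BUY_COST for score in scores)
        and point_buy_cost(scores) <= POINT_BUY_BUDGET
    )
```

The standard array itself costs exactly 27 points, which makes it a convenient sanity check for the cost table.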

✅ 6.2 Character Management

File: create_character.py

  • Save/load character files
  • Character sheet viewer
  • Integration with RAG system

🧪 Phase 7: Testing & Validation ✅ COMPLETE ⭐ NEW!

✅ 7.1 Comprehensive Test Suite

File: test_all_collections.py

  • 26+ automated tests
  • Name weighting validation - exact names rank first
  • Semantic search tests - related concepts found
  • Metadata extraction tests - CR, level, abilities validated
  • All 4 collections tested - spells, monsters, classes, races
  • Cross-collection search - multi-type queries
  • Pass/fail reporting with statistics
  • Detailed error messages

✅ 7.2 Manual Test Scripts

File: test_spell_search.py

  • Detailed search results for all collections
  • Distance/relevance scores
  • Metadata display
  • Preview of results

✅ 7.3 Integration Tests

  • Full initialization test
  • End-to-end query test
  • GM dialogue integration
  • Character creation flow

🎮 Phase 8: Game Mechanics Engine ✅ COMPLETE ⭐ COMPREHENSIVE STATE SYSTEM!

Goal: Transform AI from rule-maker to narrator by implementing programmatic game mechanics

✅ 8.0 Character-Aware Dialogue System ⭐ NEW!

File: play_with_character.py

  • Load or create characters for gameplay
  • Character context passed to GM (stats, equipment, spells)
  • Three character modes: Create new, Load JSON, Quick test
  • Commands: /character, /stats, /context for character info
  • Fixed tokenizer warning suppression
  • Dynamic character support (not hardcoded to one character)
  • Proper first/second person context ("The player is X" → AI uses "you")
  • Integration testing completed (Dec 1, 2024)

✅ 8.1 Comprehensive Game State System ⭐ COMPLETE! (Dec 25, 2024)

File: systems/game_state.py

Character State Management:

  • HP tracking (current, max, temporary HP)
  • Spell slots by level (1-9) with use/restore mechanics
  • Inventory system (add/remove items with quantities)
  • Equipment slots (main_hand, off_hand, armor, etc.)
  • D&D 5e conditions (14 official conditions: blinded, charmed, etc.)
  • Death saving throws (3 successes/failures)
  • Concentration mechanics for spells
  • Experience points and leveling system
  • Hit dice for short rest healing
  • Status query methods
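
Spell slot tracking with use/restore could look like the following sketch. The class name SpellSlots appears in the test list later in this document, but the fields and methods shown here are assumptions, not the API of systems/game_state.py.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpellSlots:
    """Tracks remaining spell slots per spell level (1-9)."""

    maximum: Dict[int, int]
    remaining: Dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start fully rested unless explicit remaining counts were given.
        if not self.remaining:
            self.remaining = dict(self.maximum)

    def use(self, level: int) -> bool:
        """Spend one slot of the given level; return False if none is left."""
        if level == 0:
            return True  # cantrips cost no slot
        if self.remaining.get(level, 0) <= 0:
            return False
        self.remaining[level] -= 1
        return True

    def long_rest(self) -> None:
        """Restore every slot to its maximum."""
        self.remaining = dict(self.maximum)
```

Returning a boolean from `use` lets the caller decide how to report a failed cast instead of handling an exception.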

Combat State Management:

  • Initiative system (sorted by roll)
  • Turn tracking with round numbers
  • Active effects with duration (buffs/debuffs)
  • Combat start/end mechanics
  • Effect duration ticking
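
A minimal sketch of initiative ordering and turn/round tracking follows. `InitiativeTracker` and its methods are hypothetical names; the sketch assumes all combatants are added before the first turn and omits effect durations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InitiativeTracker:
    order: List[Tuple[str, int]] = field(default_factory=list)
    turn_index: int = 0
    round_number: int = 1

    def add(self, name: str, roll: int) -> None:
        """Add a combatant and keep the order sorted by roll, highest first."""
        self.order.append((name, roll))
        self.order.sort(key=lambda entry: entry[1], reverse=True)

    def current(self) -> str:
        """Name of the combatant whose turn it is."""
        return self.order[self.turn_index][0]

    def next_turn(self) -> str:
        """Advance to the next combatant, starting a new round after the last."""
        self.turn_index += 1
        if self.turn_index >= len(self.order):
            self.turn_index = 0
            self.round_number += 1
        return self.current()
```

Because Python's sort is stable, combatants with equal rolls keep the order in which they were added.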

Party Management:

  • Multiple character support
  • Party-wide operations (XP distribution, rests)
  • Shared party inventory
  • Party gold/currency management
  • Alive/conscious character filtering

Game Session State:

  • Location and scene tracking
  • Quest system (active/completed quests)
  • NPC tracking
  • In-game time advancement (day/night cycle)
  • Session notes
  • Comprehensive session summaries

Core Mechanics:

  • Take damage (with temp HP absorption)
  • Healing (can't exceed max HP)
  • Spell casting with slot consumption
  • Cantrip support (no slot cost)
  • Concentration checks
  • Short rest (spend hit dice to heal)
  • Long rest (restore HP, spell slots, hit dice)
  • Inventory add/remove/check
  • Item equipping/unequipping
  • Condition add/remove
  • State serialization (save/load to JSON)
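
The damage and healing rules above (temporary HP absorbs damage first, healing cannot exceed max HP) can be sketched as follows. `HitPoints` is an illustrative class, not the CharacterState implementation.

```python
from dataclasses import dataclass


@dataclass
class HitPoints:
    current: int
    maximum: int
    temporary: int = 0

    def take_damage(self, amount: int) -> int:
        """Apply damage; temporary HP is consumed before current HP.

        Returns the damage that got past temporary HP.
        """
        if amount < 0:
            raise ValueError("damage must not be negative")
        absorbed = min(self.temporary, amount)
        self.temporary -= absorbed
        remaining = amount - absorbed
        self.current = max(0, self.current - remaining)  # HP never drops below 0
        return remaining

    def heal(self, amount: int) -> int:
        """Heal up to the maximum; returns the HP actually restored."""
        if amount < 0:
            raise ValueError("healing must not be negative")
        before = self.current
        self.current = min(self.maximum, self.current + amount)
        return self.current - before
```

Note that healing does not restore temporary HP, which matches the 5e rule that temporary HP is a separate pool.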

✅ 8.2 Comprehensive Testing ⭐ 70 TESTS PASSING!

File: tests/test_game_state.py

  • 6 SpellSlots tests (use, restore, long rest, availability)
  • 3 DeathSaves tests (successes, failures, reset)
  • 34 CharacterState tests (HP, damage, healing, spells, inventory, conditions, rests, XP, serialization)
  • 9 CombatState tests (initiative, turns, rounds, effects, combat flow)
  • 11 PartyState tests (characters, XP distribution, gold, shared inventory, party rests)
  • 7 GameSession tests (quests, time, location, session summary)
  • 100% test pass rate
  • Full coverage of all game mechanics

🚧 8.3 Gradio Integration IN PROGRESS

Status: ⏳ Pending

  • Integrate state system with Gradio web interface
  • Display character HP, spell slots, and conditions in UI
  • Combat mode UI with initiative tracker
  • Inventory management UI
  • Party management UI
  • Save/load game sessions

🏗️ Phase 8 Architecture Notes

Option B: Hybrid AI + Rules Engine (SELECTED)

Problem: AI is unreliable at following D&D rules consistently

  • Ignores spells player casts
  • Allows spells player doesn't know
  • Doesn't track resources (HP, spell slots)
  • Makes up mechanics on the fly

Solution: Intercept player actions BEFORE AI sees them

Flow:

  1. Player Input: "I cast Magic Missile at the goblin"
  2. Rules Engine (Python code):
    • Parse: Detect spell casting intent
    • Validate: Check if player knows "Magic Missile" ✓
    • Validate: Check if player has 1st-level spell slot ✓
    • Retrieve: Get spell details from RAG (3 darts, 1d4+1 each)
    • Roll: 3d4+3 = 11 damage (programmatically)
    • Deduct: Spell slot consumed
    • Update: Target HP reduced by 11
  3. AI Prompt: "You successfully cast Magic Missile dealing 11 force damage to the goblin. The goblin now has 5 HP remaining. Describe the magical missiles striking the goblin."
  4. AI Response: (Just narrates the flavor, mechanics already handled)
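
The flow above can be sketched end to end for the Magic Missile example. This is a simplified illustration, not the project's rules engine: `Resolution` and `resolve_magic_missile` are hypothetical names, intent parsing and the RAG lookup are skipped, and the spell details (3 darts, 1d4+1 each) are hardcoded from the example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resolution:
    ok: bool
    damage: int
    target_hp: int
    prompt: str  # text handed to the AI, which only narrates


def resolve_magic_missile(
    known_spells: List[str],
    slots: Dict[int, int],
    target: str,
    target_hp: int,
    rng: Optional[random.Random] = None,
) -> Resolution:
    rng = rng or random.Random()
    # Validate: the character must know the spell.
    if "Magic Missile" not in known_spells:
        return Resolution(False, 0, target_hp,
                          "The player does not know Magic Missile. Narrate the attempt failing.")
    # Validate: a 1st-level slot must be available.
    if slots.get(1, 0) <= 0:
        return Resolution(False, 0, target_hp,
                          "The player has no 1st-level spell slots left. Narrate the spell fizzling.")
    # Roll: three darts at 1d4+1 each, done in code rather than by the AI.
    damage = sum(rng.randint(1, 4) + 1 for _ in range(3))
    slots[1] -= 1  # Deduct: spell slot consumed
    new_hp = max(0, target_hp - damage)  # Update: target HP reduced
    prompt = (
        f"You successfully cast Magic Missile dealing {damage} force damage to the {target}. "
        f"The {target} now has {new_hp} HP remaining. "
        f"Describe the magical missiles striking the {target}."
    )
    return Resolution(True, damage, new_hp, prompt)
```

By the time the prompt is built, every number in it has been decided by code, so the AI has nothing left to adjudicate.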

Benefits:

  • AI becomes a narrator, not a rules engine
  • Mechanics are deterministic and accurate
  • AI can focus on storytelling
  • Players can trust the rules

Alternatives Rejected:

  • Option A (Pure AI): Too unreliable, tested and failed
  • Option C (Post-process AI): Too hard to fix bad outputs

📦 Supporting Files ✅ COMPLETE

✅ Dependencies

File: requirements.txt

  • chromadb
  • sentence-transformers
  • pdfplumber
  • ollama (Python client)
  • All dependencies working

✅ Documentation

  • README.md with full installation instructions
  • Quick start guide
  • Usage examples for all tools
  • Troubleshooting section
  • plan_progress.md (this file)

🎯 Success Metrics

| Metric | Target | Current | Status |
|---|---|---|---|
| Init Time (full) | < 5 min | ~30s | ✅ Exceeded |
| Query Latency | < 500ms | ~100-200ms | ✅ Exceeded |
| Name Weighting | Exact match ranks #1 | ✅ Working | ✅ Complete |
| Total Chunks | ~600 | 612+ | ✅ Complete |
| Test Coverage | > 80% | 26+ tests | ✅ Complete |
| Collections | 4 collections | 4 active | ✅ Complete |

🎨 Key Features Implemented

✅ Name-Weighted Retrieval

  • Spells: Name appears 3× (SPELL: name, name, name)
  • Monsters: Name appears 3× (MONSTER: name, name, name) + type extraction
  • Classes: Name appears 3× (CLASS: name, name, name)
  • Races: Name appears 3× (RACE: name, name, name) + trait extraction
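
The header format above can be reproduced with a few lines of code. The helper name `build_weighted_chunk` and its parameters are illustrative; only the "LABEL: name, name, name" layout comes from this document.

```python
def build_weighted_chunk(label: str, name: str, body: str, repeats: int = 3) -> str:
    """Prefix a chunk with a header that repeats the entity name.

    Repeating the name at the start of the chunk pulls the chunk's embedding
    toward the name, which helps exact-name queries rank it first.
    """
    header = f"{label.upper()}: " + ", ".join([name] * repeats)
    return f"{header}\n{body.strip()}"
```

For example, `build_weighted_chunk("spell", "Fireball", "A bright streak flashes...")` produces a chunk whose first line is `SPELL: Fireball, Fireball, Fireball`.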

✅ Multiple Chunk Types Per Entity

  • Spells: full_spell, quick_reference, by_class
  • Monsters: monster_stats with type tags
  • Classes: class_features
  • Races: race_description, race_traits

✅ Rich Metadata

  • Spells: level, school, casting_time, range, components, duration, classes, ritual, concentration
  • Monsters: challenge_rating, monster_type (size + type), type tags
  • Classes: name, content_type
  • Races: ability_increases, size, speed, darkvision, languages

📝 Notes & Decisions

Design Decisions

  • Database: ChromaDB for persistence and semantic search
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2 for speed/quality balance
  • LLM: Ollama with Qwen3-4B-RPG-Roleplay-V2 for D&D-tuned responses
  • Collection Strategy: Separate collections per content type for clean organization
  • Name Weighting: Entity names repeated 2-3× at chunk start for better exact-match retrieval
  • Multiple Chunks: Each entity creates multiple specialized chunks for different use cases

Key Improvements (Nov 6, 2024)

  1. Spell Parser Upgrade - Now uses sophisticated SpellParser class instead of inline code
  2. Name Weighting - All entity types now have weighted names for better retrieval
  3. Race Extraction - Full race data extracted from PDF with traits and metadata
  4. Monster Type Extraction - Automatic extraction of size and creature type
  5. Interactive Query Tool - New CLI for exploring the RAG system
  6. Comprehensive Tests - 26+ automated tests validating all functionality

Known Issues (Phase 8 Discovery - Dec 1, 2024)

  • AI Unreliability: Pure AI approach fails to consistently enforce D&D rules
    • Ignores valid spell casts (a Magic Missile cast was narrated as melee combat)
    • Allows invalid spells (Elara was allowed to cast Fireball, which she doesn't know)
    • No resource tracking (spell slots, HP, gold)
  • Solution: Moving to Hybrid Architecture (Option B) with programmatic rules engine

Current Work (Phase 8)

  • 🚧 Spell Validation System: Programmatically check spell ownership before AI generation
  • 🚧 Resource Tracking: HP, spell slots, inventory management
  • 🚧 Combat Mechanics: Attack/damage rolls, initiative, turn tracking
  • 🚧 Rules Engine: Intercept player actions, apply mechanics, then AI narrates

Future Enhancements (Post-Phase 8)

  • Subrace Support: High Elf, Mountain Dwarf, etc. with specific abilities
  • Advanced Filtering: Search by CR range, spell level range, class, type
  • Web UI: Web interface for GM dialogue
  • Multiplayer Support: Multi-player sessions
  • Custom Content Import: User-created monsters/spells
  • Voice Interface: Voice commands for GM dialogue
  • Map/Battle Grid Integration: Visual battle maps

📅 Timeline

| Date | Milestone |
|---|---|
| 2024-11-06 09:00 | Project started, directory structure created |
| 2024-11-06 12:00 | Phase 1-2 complete (core infrastructure + basic parsers) |
| 2024-11-06 15:00 | Phase 3-6 complete (initialization, query, GM, character creator) |
| 2024-11-06 18:00 | Major upgrades: Name weighting, race extraction, comprehensive tests |
| 2024-11-06 20:00 | Phase 7 complete: All tests passing, documentation updated |
| 2024-11-06 21:00 | V1.0 COMPLETE - Production ready! |
| 2024-12-01 18:00 | Phase 8 started: Character-aware dialogue system created |
| 2024-12-01 19:00 | Testing reveals AI reliability issues with game mechanics |
| 2024-12-01 19:30 | Architecture decision: Option B (Hybrid Rules Engine) selected |

🚀 Production Deployment Checklist

  • All 4 collections operational
  • Name weighting implemented for all entity types
  • Comprehensive test suite (26+ tests passing)
  • Interactive query tool
  • Documentation complete
  • GM dialogue system working
  • Character creator working
  • All dependencies installed
  • Error handling in place
  • Performance targets met

📊 Statistics

Collection Counts

  • Spells: 86 spells → 250+ chunks
  • Monsters: 332 monsters → 332 chunks
  • Classes: 12 classes → 12 chunks
  • Races: 9 races → 18 chunks
  • Total: ~612+ chunks in ChromaDB

Test Results

  • Total Tests: 26+
  • Pass Rate: 100%
  • Collections Tested: 4/4
  • Features Validated: Name weighting, semantic search, metadata extraction, cross-collection search


🏪 Phase 9: Shop System & Equipment Database ✅ COMPLETE ⭐ NEW! (Dec 26, 2024)

Goal: Implement GM-driven conversational shopping with NPC shopkeepers

✅ 9.1 Equipment Database

File: loaders/equipment_loader.py, dnd_rag_system/data/equipment.txt

  • Parse D&D 5e equipment tables (weapons, armor, gear, tools, mounts)
  • Extract 58 equipment items with prices, weights, and properties
  • Load into ChromaDB dnd_equipment collection
  • Metadata: name, cost_gp, weight, category, properties
  • Integration with RAG system for shop queries

✅ 9.2 Shop System

File: systems/shop_system.py

  • ShopSystem class with RAG-powered inventory search
  • Natural language purchase/sell command parsing
  • Transaction validation (gold checks, inventory updates)
  • Fuzzy item name matching ("longsword", "long sword" both work)
  • D&D 5e sell mechanics (half price)
  • Shopkeeper personality context generator (friendly, grumpy, mysterious, etc.)
  • Integration hooks for GM dialogue system
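
Fuzzy item matching, gold checks, and half-price selling could be sketched as below. This uses the standard library's `difflib` as a stand-in; the source does not say how ShopSystem matches names, and the price table, function names, and 0.8 cutoff are assumptions for illustration.

```python
import difflib
from typing import Dict, Optional

# Illustrative price list in gold pieces (D&D 5e list prices).
PRICES_GP: Dict[str, float] = {"longsword": 15.0, "shortbow": 25.0, "dagger": 2.0}


def _normalize(name: str) -> str:
    """Lowercase and remove whitespace so 'Long Sword' matches 'longsword'."""
    return "".join(name.lower().split())


def find_item(query: str, prices: Dict[str, float] = PRICES_GP, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest catalogue name for the query, or None."""
    lookup = {_normalize(name): name for name in prices}
    matches = difflib.get_close_matches(_normalize(query), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


def buy(gold: float, item: str, prices: Dict[str, float] = PRICES_GP) -> float:
    """Return the gold left after buying; raise if the buyer cannot afford it."""
    price = prices[item]
    if gold < price:
        raise ValueError(f"not enough gold for {item}: need {price}, have {gold}")
    return gold - price


def sell_value(item: str, prices: Dict[str, float] = PRICES_GP) -> float:
    """Items sell for half their list price."""
    return prices[item] / 2
```

Validating the transaction in code before the GM narrates it keeps gold and inventory consistent regardless of what the shopkeeper says.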

✅ 9.3 Shop System Testing

File: test_shop_system.py

  • 7 comprehensive test suites
  • Shop inventory search tests
  • Item price lookup tests (with fuzzy matching)
  • Purchase transaction tests (gold deduction, inventory add)
  • Sell transaction tests (gold increase, inventory remove)
  • Chat command parsing tests (natural language + commands)
  • Shopkeeper context generation tests
  • Complete shopping experience simulation
  • 100% test pass rate

✅ 9.4 Documentation

File: SHOP_SYSTEM_GUIDE.md

  • Comprehensive usage guide with examples
  • Philosophy: Chat-first, mechanics-second
  • Example shopping sessions
  • Technical API documentation
  • GM best practices

Philosophy: Shop interactions happen through natural GM chat with NPC shopkeepers. The system validates transactions and manages gold and inventory automatically, while the GM brings the shopkeeper to life with personality.


✅ Phase 10: Reality Check System ✅ COMPLETE ⭐ NEW! (Dec 26, 2024)

Goal: Prevent GM hallucinations by validating player actions against game state

✅ 10.1 Action Validation System

File: systems/action_validator.py

  • ActionType enum (combat, spell_cast, conversation, item_use, exploration)
  • ValidationResult enum (valid, invalid, npc_introduction, fuzzy_match)
  • ActionIntent dataclass (structured action parsing)
  • ValidationReport dataclass (validation results with guidance)
  • Intent analysis from natural language input
  • State validation against GameSession
  • Fuzzy matching for flexible input (e.g., "goblin" → "Goblin Scout")
  • Context-aware prompting for GM guidance

✅ 10.2 Validation Logic

Combat Validation:

  • Target must exist in npcs_present or combat.initiative_order
  • Fuzzy matching for partial names
  • Clear error messages for invalid targets
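
Combat target validation with partial-name matching might look like this sketch. The ValidationResult member names come from the list in 10.1; the function `validate_target`, its signature, and the substring-based matching are assumptions and may differ from action_validator.py.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ValidationResult(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NPC_INTRODUCTION = "npc_introduction"
    FUZZY_MATCH = "fuzzy_match"


def validate_target(target: str, npcs_present: List[str]) -> Tuple[ValidationResult, Optional[str]]:
    """Check a named combat target against the NPCs in the scene.

    Returns the result plus the canonical NPC name when one was found.
    """
    wanted = target.strip().lower()
    if not wanted:
        return ValidationResult.INVALID, None
    # Exact (case-insensitive) match first.
    for npc in npcs_present:
        if npc.lower() == wanted:
            return ValidationResult.VALID, npc
    # Otherwise accept a partial name, e.g. "goblin" for "Goblin Scout".
    for npc in npcs_present:
        if wanted in npc.lower():
            return ValidationResult.FUZZY_MATCH, npc
    return ValidationResult.INVALID, None
```

Returning the canonical name alongside the result lets the prompt builder refer to "Goblin Scout" even when the player typed "goblin".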

Spell Validation:

  • Character must know the spell (fuzzy matching)
  • Spell must exist in character's spell list
  • Helpful suggestions for similar spells

Item Validation:

  • Item must be in character inventory
  • Quantity validation

NPC Conversation:

  • Allows contextually appropriate NPC introductions
  • Rejects NPCs that don't make sense in current scene
  • Auto-adds introduced NPCs to game state

✅ 10.3 GM Integration

File: systems/gm_dialogue_unified.py

  • Integrated ActionValidator into GameMaster.__init__
  • Modified generate_response() to validate before LLM generation
  • Updated _build_prompt() with validation guidance
  • Added _post_process_response() to auto-add introduced NPCs
  • Debug logging for validation steps

✅ 10.4 Reality Check Testing

File: test_reality_check.py

  • Combat validation tests (valid/invalid targets, fuzzy matching)
  • NPC conversation tests (introduction, rejection, fuzzy matching)
  • Spell casting tests (known/unknown spells)
  • Item usage tests (inventory validation)
  • Exploration tests (always allowed)
  • 100% test pass rate

Benefits:

  • Prevents GM from inventing non-existent entities
  • Maintains game state consistency
  • Preserves narrative freedom (GM can still introduce appropriate NPCs)
  • Fuzzy matching allows flexible player input
  • Clear error messaging guides GM narration

📦 Updated Statistics (Dec 26, 2024)

Collection Counts

  • Spells: 86 spells → 250+ chunks
  • Monsters: 332 monsters → 332 chunks
  • Classes: 12 classes → 12 chunks
  • Races: 9 races → 18 chunks
  • Equipment: 58 items → 58 chunks ⭐ NEW!
  • Total: ~670+ chunks in ChromaDB

Test Results

  • SpellSlots: 6 tests ✅
  • DeathSaves: 3 tests ✅
  • CharacterState: 34 tests ✅
  • CombatState: 9 tests ✅
  • PartyState: 11 tests ✅
  • GameSession: 7 tests ✅
  • Shop System: 7 test suites ✅ NEW!
  • Reality Check: 15+ tests ✅ NEW!
  • Total Tests: 92+ tests
  • Pass Rate: 100% ✅

Status: ✅ V3.0 PRODUCTION READY! (Shop System + Reality Check)

Current Focus: Documentation and deployment

Latest Achievements (Dec 26, 2024):

  • 🎉 GM-Driven Shop System - Conversational shopping with NPC shopkeepers
  • 🎉 Reality Check System - Prevents hallucinations while preserving narrative freedom
  • 🎉 Equipment Database - 58 D&D 5e items with accurate prices

Next Steps:

  1. Deploy to Hugging Face Spaces
  2. Test with HF Inference API model (Qwen2.5-7B-Instruct)
  3. User feedback and iteration

Last Updated: December 26, 2024