D&D RAG System - Implementation Progress
Project Start Date: November 6, 2024
Status: ✅ Production Ready
Last Updated: November 6, 2024
📊 Overall Progress
| Phase | Status | Progress | Notes |
|---|---|---|---|
| Phase 1: Core Infrastructure | ✅ Complete | 5/5 | All core systems operational |
| Phase 2: Data Processors | ✅ Complete | 4/4 | All parsers with name weighting |
| Phase 3: Initialization | ✅ Complete | 2/2 | Full system initialization working |
| Phase 4: Query Interface | ✅ Complete | 1/1 | Interactive CLI tool added |
| Phase 5: GM Dialogue | ✅ Complete | 2/2 | RAG-enhanced AI GM working |
| Phase 6: Character Creation | ✅ Complete | 2/2 | Full character creator with RAG |
| Phase 7: Testing & Validation | ✅ Complete | 3/3 | 26+ comprehensive tests passing |
| Phase 8: Game Mechanics Engine | 🚧 In Progress | 3/4 | State system and tests complete; Gradio integration pending |
Legend: ✅ Complete | 🚧 In Progress | ⏳ Pending | ❌ Blocked
📁 Phase 1: Core Infrastructure ✅ COMPLETE
✅ 1.1 Project Structure
- Created `dnd_rag_system/` directory
- Created `config/`, `core/`, `parsers/`, `systems/` subdirectories
- Created `__init__.py` files for all packages
✅ 1.2 Configuration System
File: config/settings.py
- ChromaDB configuration
- Ollama model settings
- Embedding model settings (all-MiniLM-L6-v2)
- Collection naming conventions
- Data source paths
- Chunk size parameters (400 tokens)
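As an illustration of how these settings might be grouped, here is a minimal sketch. Only the embedding model name and the 400-token chunk size come from the list above; the class name, the persistence path, the model placeholder, and the `dnd_` collection prefix (guessed from the `dnd_equipment` collection mentioned later) are assumptions, not the contents of the real `config/settings.py`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RAGSettings:
    """Hypothetical settings container; not the project's actual settings module."""

    embedding_model: str = "all-MiniLM-L6-v2"       # from the notes above
    chunk_size_tokens: int = 400                     # from the notes above
    chroma_persist_dir: Path = Path("./chroma_db")   # assumed path
    ollama_model: str = "your-ollama-model"          # placeholder, set to the local model
    collection_prefix: str = "dnd_"                  # assumed naming convention

    def collection_name(self, content_type: str) -> str:
        """Build a collection name such as 'dnd_spells' from a content type."""
        return f"{self.collection_prefix}{content_type}"


settings = RAGSettings()
print(settings.collection_name("spells"))
```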
✅ 1.3 Base Parser
File: core/base_parser.py
- `BaseParser` abstract class
- PDF extraction utilities (pdfplumber)
- Text extraction utilities
- Common validation methods
- Error handling framework
✅ 1.4 Base Chunker
File: core/base_chunker.py
- `BaseChunker` abstract class
- Token estimation function
- Chunk splitting with overlap
- Metadata generation helpers
- Chunk validation
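The token estimation and overlap splitting listed above could look roughly like the sketch below. It assumes a characters-divided-by-four token heuristic and a word-based overlap; both are illustrative choices, and the function names are hypothetical rather than the actual `BaseChunker` API.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about 4 characters per token."""
    return math.ceil(len(text) / 4)


def split_with_overlap(text: str, max_tokens: int = 400, overlap_words: int = 20) -> list[str]:
    """Split text into chunks under max_tokens, repeating the last words of each chunk."""
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = current + [word]
        if current and estimate_tokens(" ".join(candidate)) > max_tokens:
            # Budget exceeded: emit the chunk and seed the next one with its tail.
            chunks.append(" ".join(current))
            current = current[-overlap_words:] if overlap_words else []
            current.append(word)
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of some duplicated text in the index.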
✅ 1.5 ChromaDB Manager
File: core/chroma_manager.py
- `ChromaDBManager` class
- Collection management (create, get, delete)
- Batch add operations
- Single/multi-collection search
- Statistics and reporting
- Connection pooling
📚 Phase 2: Data Processors ✅ COMPLETE
✅ 2.1 Spell Parser ⭐ ENHANCED
File: parsers/spell_parser.py
- Parse `spells.txt` (detailed descriptions)
- Parse `all_spells.txt` (class/level associations)
- Merge spell data
- Name weighting - spell names appear 2-3× in chunks
- Create spell chunks (full_spell, quick_reference, by_class)
- Generate spell metadata (level, school, components, classes)
- OCR error handling
- ~86 spells → 250+ chunks
✅ 2.2 Monster Parser ⭐ ENHANCED
File: initialize_rag.py (inline loader)
- Load from `extracted_monsters.txt`
- Name weighting - monster names appear 2-3× in chunks
- Monster stat block parsing
- Combat stats extraction (CR, AC, HP)
- Monster type extraction (e.g., "Large dragon", "Medium humanoid")
- Generate monster metadata
- Type tags for filtering (dragon, undead, beast, etc.)
- ~332 monsters loaded
✅ 2.3 Class Parser ⭐ ENHANCED
File: initialize_rag.py (inline loader)
- Load from `extracted_classes.txt`
- Name weighting - class names appear 2-3× in chunks
- Class feature extraction
- Generate class metadata
- ~12 classes loaded (all core D&D classes)
✅ 2.4 Race Parser ⭐ NEW!
File: initialize_rag.py (inline loader with PDF extraction)
- PDF extraction from Player's Handbook (pages 18-46)
- Race traits extraction
- Ability score bonuses
- Name weighting - race names appear 2-3× in chunks
- Create race chunks (description, traits)
- Generate race metadata (ability_increases, size, speed, darkvision, languages)
- ~9 core races → 18 chunks
🚀 Phase 3: Initialization System ✅ COMPLETE
✅ 3.1 Master Init Script
File: initialize_rag.py
- Command-line argument parsing
- ChromaDB initialization
- Collection creation/verification
- Selective data loading (--only flag)
- Clear existing data (--clear flag)
- Progress reporting
- Error handling and recovery
- Summary statistics report
- All 4 collections: spells, monsters, classes, races
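A command-line interface with the `--only` and `--clear` flags could be sketched with `argparse` as follows. Whether the real `--only` accepts one or several collection names is not stated here, so accepting several is an assumption, as are the helper names.

```python
import argparse

COLLECTIONS = ["spells", "monsters", "classes", "races"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags described above."""
    parser = argparse.ArgumentParser(description="Initialize the D&D RAG collections")
    parser.add_argument(
        "--only",
        nargs="+",
        choices=COLLECTIONS,
        help="load only the named collections (assumed to accept several)",
    )
    parser.add_argument(
        "--clear",
        action="store_true",
        help="delete existing data before loading",
    )
    return parser


def collections_to_load(args: argparse.Namespace) -> list[str]:
    """Return the selected collections, defaulting to all four."""
    return list(args.only) if args.only else list(COLLECTIONS)


args = build_parser().parse_args(["--only", "spells", "--clear"])
print(collections_to_load(args), args.clear)
```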
✅ 3.2 Data Validation
- Verify all source files present
- Test full initialization
- Benchmark loading times (~30s first run, ~5s subsequent)
- 600+ total chunks loaded
🔍 Phase 4: Query Interface ✅ COMPLETE
✅ 4.1 Interactive Query Tool ⭐ NEW!
File: query_rag.py
- Interactive CLI mode
- Single-query mode
- Collection-specific search (--spell, --monster, --class, --race)
- Search all collections
- Result formatting with metadata
- Relevance scores
- Commands: /spell, /monster, /class, /race, /stats, /help, /quit
- Beautiful formatted output
Usage:
python query_rag.py # Interactive mode
python query_rag.py "fireball" # Quick search
python query_rag.py --monster "dragon" # Search monsters
🎮 Phase 5: GM Dialogue System ✅ COMPLETE
✅ 5.1 RAG-Enhanced GM
File: systems/gm_dialogue.py
- RAG-powered rule lookups in real-time
- GM searches ChromaDB for spells, monsters, classes
- Ollama integration
- Context window management
- Session state management
✅ 5.2 Dialogue Manager
File: run_gm_dialogue.py
- Interactive game session
- Commands: /help, /context, /history, /rag, /save, /quit
- Turn tracking
- Scene state persistence
👤 Phase 6: Character Creation ✅ COMPLETE
✅ 6.1 Character Creator
File: systems/character_creator.py
- Interactive CLI interface
- Race selection with RAG lookup
- Class selection with RAG lookup
- Ability score generation (standard array, roll, point buy)
- Background selection
- Equipment selection
- Spell selection (for casters)
- Character validation
- JSON export
- Character sheet display
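The three ability score methods can be sketched as below. The standard array and point-buy costs are the usual D&D 5e values; treating "roll" as 4d6-drop-lowest is an assumption about what the creator does, and the function names are hypothetical.

```python
import random

STANDARD_ARRAY = [15, 14, 13, 12, 10, 8]
# D&D 5e point-buy cost per score, with a 27-point budget.
POINT_BUY_COST = {8: 0, 9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 7, 15: 9}
POINT_BUY_BUDGET = 27


def roll_ability_score(rng: random.Random) -> int:
    """Roll 4d6 and drop the lowest die (assumed rolling method)."""
    dice = sorted(rng.randint(1, 6) for _ in range(4))
    return sum(dice[1:])


def point_buy_total(scores: list[int]) -> int:
    """Total point cost of a set of scores; raises KeyError outside 8-15."""
    return sum(POINT_BUY_COST[score] for score in scores)


def is_valid_point_buy(scores: list[int]) -> bool:
    """Check that six in-range scores fit within the point budget."""
    return (
        len(scores) == 6
        and all(score in POINT_BUY_COST for score in scores)
        and point_buy_total(scores) <= POINT_BUY_BUDGET
    )
```

Passing a seeded `random.Random` makes rolled characters reproducible in tests.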
✅ 6.2 Character Management
File: create_character.py
- Save/load character files
- Character sheet viewer
- Integration with RAG system
🧪 Phase 7: Testing & Validation ✅ COMPLETE ⭐ NEW!
✅ 7.1 Comprehensive Test Suite
File: test_all_collections.py
- 26+ automated tests
- Name weighting validation - exact names rank first
- Semantic search tests - related concepts found
- Metadata extraction tests - CR, level, abilities validated
- All 4 collections tested - spells, monsters, classes, races
- Cross-collection search - multi-type queries
- Pass/fail reporting with statistics
- Detailed error messages
✅ 7.2 Manual Test Scripts
File: test_spell_search.py
- Detailed search results for all collections
- Distance/relevance scores
- Metadata display
- Preview of results
✅ 7.3 Integration Tests
- Full initialization test
- End-to-end query test
- GM dialogue integration
- Character creation flow
🎮 Phase 8: Game Mechanics Engine ✅ COMPLETE ⭐ COMPREHENSIVE STATE SYSTEM!
Goal: Transform AI from rule-maker to narrator by implementing programmatic game mechanics
✅ 8.0 Character-Aware Dialogue System ⭐ NEW!
File: play_with_character.py
- Load or create characters for gameplay
- Character context passed to GM (stats, equipment, spells)
- Three character modes: Create new, Load JSON, Quick test
- Commands: `/character`, `/stats`, `/context` for character info
- Fixed tokenizer warning suppression
- Dynamic character support (not hardcoded to one character)
- Proper first/second person context ("The player is X" → AI uses "you")
- Integration testing completed (Dec 1, 2024)
✅ 8.1 Comprehensive Game State System ⭐ COMPLETE! (Dec 25, 2024)
File: systems/game_state.py
Character State Management:
- HP tracking (current, max, temporary HP)
- Spell slots by level (1-9) with use/restore mechanics
- Inventory system (add/remove items with quantities)
- Equipment slots (main_hand, off_hand, armor, etc.)
- D&D 5e conditions (14 official conditions: blinded, charmed, etc.)
- Death saving throws (3 successes/failures)
- Concentration mechanics for spells
- Experience points and leveling system
- Hit dice for short rest healing
- Status query methods
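To make the spell slot mechanics concrete, here is a minimal sketch of use/restore tracking. The class and method names are hypothetical and do not claim to match `systems/game_state.py`; treating level 0 as a cantrip that costs no slot follows the cantrip note in this phase.

```python
from dataclasses import dataclass, field


@dataclass
class SpellSlotTracker:
    """Hypothetical tracker: maximum and remaining slots keyed by spell level (1-9)."""

    maximum: dict[int, int] = field(default_factory=dict)
    current: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start fully rested unless explicit remaining slots were supplied.
        if not self.current:
            self.current = dict(self.maximum)

    def has_slot(self, level: int) -> bool:
        return self.current.get(level, 0) > 0

    def use(self, level: int) -> bool:
        """Consume one slot; cantrips (level 0) always succeed and cost nothing."""
        if level == 0:
            return True
        if not self.has_slot(level):
            return False
        self.current[level] -= 1
        return True

    def restore_all(self) -> None:
        """Long rest: reset every level to its maximum."""
        self.current = dict(self.maximum)
```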
Combat State Management:
- Initiative system (sorted by roll)
- Turn tracking with round numbers
- Active effects with duration (buffs/debuffs)
- Combat start/end mechanics
- Effect duration ticking
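Initiative ordering and round tracking can be illustrated with the short sketch below. It sorts combatants by roll, highest first, and increments the round when the order wraps. The class name and the combatant names in the usage are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CombatTracker:
    """Hypothetical initiative tracker, not the project's CombatState class."""

    order: list[tuple[str, int]] = field(default_factory=list)
    turn_index: int = 0
    round_number: int = 0

    def start(self, rolls: dict[str, int]) -> None:
        """Begin combat with combatants sorted by initiative roll, highest first."""
        self.order = sorted(rolls.items(), key=lambda item: item[1], reverse=True)
        self.turn_index = 0
        self.round_number = 1

    def current(self) -> str:
        return self.order[self.turn_index][0]

    def next_turn(self) -> str:
        """Advance to the next combatant, starting a new round after the last one."""
        self.turn_index += 1
        if self.turn_index >= len(self.order):
            self.turn_index = 0
            self.round_number += 1
        return self.current()


tracker = CombatTracker()
tracker.start({"Elara": 15, "Goblin": 12, "Thorin": 18})
print(tracker.current())
```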
Party Management:
- Multiple character support
- Party-wide operations (XP distribution, rests)
- Shared party inventory
- Party gold/currency management
- Alive/conscious character filtering
Game Session State:
- Location and scene tracking
- Quest system (active/completed quests)
- NPC tracking
- In-game time advancement (day/night cycle)
- Session notes
- Comprehensive session summaries
Core Mechanics:
- Take damage (with temp HP absorption)
- Healing (can't exceed max HP)
- Spell casting with slot consumption
- Cantrip support (no slot cost)
- Concentration checks
- Short rest (spend hit dice to heal)
- Long rest (restore HP, spell slots, hit dice)
- Inventory add/remove/check
- Item equipping/unequipping
- Condition add/remove
- State serialization (save/load to JSON)
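The damage and healing rules above (temporary HP absorbs damage first, healing cannot exceed max HP) can be sketched as follows. The `HitPoints` class is a hypothetical stand-in, not the actual character state implementation.

```python
from dataclasses import dataclass


@dataclass
class HitPoints:
    """Hypothetical HP container with temporary hit points."""

    maximum: int
    current: int
    temporary: int = 0

    def take_damage(self, amount: int) -> int:
        """Apply damage, draining temporary HP first. Returns damage dealt to real HP."""
        amount = max(0, amount)
        absorbed = min(self.temporary, amount)
        self.temporary -= absorbed
        dealt = min(self.current, amount - absorbed)
        self.current -= dealt
        return dealt

    def heal(self, amount: int) -> int:
        """Heal up to the maximum. Returns the HP actually restored."""
        healed = min(max(0, amount), self.maximum - self.current)
        self.current += healed
        return healed
```

Returning the amount actually dealt or healed gives the narrator prompt an exact number to report.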
✅ 8.2 Comprehensive Testing ⭐ 70 TESTS PASSING!
File: tests/test_game_state.py
- 6 SpellSlots tests (use, restore, long rest, availability)
- 3 DeathSaves tests (successes, failures, reset)
- 34 CharacterState tests (HP, damage, healing, spells, inventory, conditions, rests, XP, serialization)
- 9 CombatState tests (initiative, turns, rounds, effects, combat flow)
- 11 PartyState tests (characters, XP distribution, gold, shared inventory, party rests)
- 7 GameSession tests (quests, time, location, session summary)
- 100% test pass rate
- Full coverage of all game mechanics
🚧 8.3 Gradio Integration IN PROGRESS
Status: ⏳ Pending
- Integrate state system with Gradio web interface
- Display character HP, spell slots, and conditions in UI
- Combat mode UI with initiative tracker
- Inventory management UI
- Party management UI
- Save/load game sessions
🏗️ Phase 8 Architecture Notes
Option B: Hybrid AI + Rules Engine (SELECTED)
Problem: AI is unreliable at following D&D rules consistently
- Ignores spells player casts
- Allows spells player doesn't know
- Doesn't track resources (HP, spell slots)
- Makes up mechanics on the fly
Solution: Intercept player actions BEFORE AI sees them
Flow:
- Player Input: "I cast Magic Missile at the goblin"
- Rules Engine (Python code):
- Parse: Detect spell casting intent
- Validate: Check if player owns "Magic Missile" ✓
- Validate: Check if player has 1st-level spell slot ✓
- Retrieve: Get spell details from RAG (3 darts, 1d4+1 each)
- Roll: 3d4+3 = 11 damage (programmatically)
- Deduct: Spell slot consumed
- Update: Target HP reduced by 11
- AI Prompt: "You successfully cast Magic Missile dealing 11 force damage to the goblin. The goblin now has 5 HP remaining. Describe the magical missiles striking the goblin."
- AI Response: (Just narrates the flavor, mechanics already handled)
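The flow above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `SPELLBOOK` dictionary stands in for the RAG lookup, and the `Caster`, `Target`, and `resolve_cast` names are hypothetical rather than the project's rules engine.

```python
import random
from dataclasses import dataclass


@dataclass
class Caster:
    name: str
    known_spells: set[str]   # lowercase spell names
    slots: dict[int, int]    # remaining slots by spell level


@dataclass
class Target:
    name: str
    hp: int


# Stand-in for the RAG lookup: Magic Missile fires 3 darts of 1d4+1 force damage.
SPELLBOOK = {
    "magic missile": {"level": 1, "darts": 3, "die": 4, "bonus": 1, "damage_type": "force"},
}


def resolve_cast(caster: Caster, target: Target, spell_name: str, rng: random.Random) -> str:
    """Validate and resolve a spell in code, then return a prompt for the narrator."""
    key = spell_name.lower()
    spell = SPELLBOOK.get(key)
    if spell is None or key not in caster.known_spells:
        return f"{caster.name} does not know {spell_name}. Narrate the attempt failing."
    level = spell["level"]
    if caster.slots.get(level, 0) <= 0:
        return f"{caster.name} has no level-{level} spell slots left. Narrate the spell fizzling."
    caster.slots[level] -= 1  # deduct the slot
    damage = sum(rng.randint(1, spell["die"]) + spell["bonus"] for _ in range(spell["darts"]))
    target.hp = max(0, target.hp - damage)  # update target state
    return (
        f"You successfully cast {spell_name} dealing {damage} {spell['damage_type']} damage "
        f"to the {target.name}. The {target.name} now has {target.hp} HP remaining. "
        f"Describe the spell striking the {target.name}."
    )
```

Because every number in the returned prompt is computed before the model is called, the model only has to describe an outcome that has already been decided.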
Benefits:
- AI becomes a narrator, not a rules engine
- Mechanics are deterministic and accurate
- AI can focus on storytelling
- Players can trust the rules
Alternatives Rejected:
- Option A (Pure AI): Too unreliable, tested and failed
- Option C (Post-process AI): Too hard to fix bad outputs
📦 Supporting Files ✅ COMPLETE
✅ Dependencies
File: requirements.txt
- chromadb
- sentence-transformers
- pdfplumber
- ollama (Python client)
- All dependencies working
✅ Documentation
- README.md with full installation instructions
- Quick start guide
- Usage examples for all tools
- Troubleshooting section
- plan_progress.md (this file)
🎯 Success Metrics
| Metric | Target | Current | Status |
|---|---|---|---|
| Init Time (full) | < 5 min | ~30s | ✅ Exceeded |
| Query Latency | < 500ms | ~100-200ms | ✅ Exceeded |
| Name Weighting | Exact match ranks #1 | ✅ Working | ✅ Complete |
| Total Chunks | ~600 | 612+ | ✅ Complete |
| Test Coverage | > 80% | 26+ tests | ✅ Complete |
| Collections | 4 collections | 4 active | ✅ Complete |
🎨 Key Features Implemented
✅ Name-Weighted Retrieval
- Spells: Name appears 3× (SPELL: name, name, name)
- Monsters: Name appears 3× (MONSTER: name, name, name) + type extraction
- Classes: Name appears 3× (CLASS: name, name, name)
- Races: Name appears 3× (RACE: name, name, name) + trait extraction
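A minimal sketch of the name-weighting format listed above: the entity name is repeated in a header line before the chunk body, so exact-name queries land closer to that chunk in embedding space. The function name and signature are hypothetical.

```python
def build_weighted_chunk(entity_type: str, name: str, body: str, repeats: int = 3) -> str:
    """Prefix a chunk with 'TYPE: name, name, name' to weight the entity name."""
    header = f"{entity_type.upper()}: " + ", ".join([name] * repeats)
    return f"{header}\n{body}"


chunk = build_weighted_chunk("spell", "Fireball", "3rd-level evocation spell.")
print(chunk.splitlines()[0])  # SPELL: Fireball, Fireball, Fireball
```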
✅ Multiple Chunk Types Per Entity
- Spells: full_spell, quick_reference, by_class
- Monsters: monster_stats with type tags
- Classes: class_features
- Races: race_description, race_traits
✅ Rich Metadata
- Spells: level, school, casting_time, range, components, duration, classes, ritual, concentration
- Monsters: challenge_rating, monster_type (size + type), type tags
- Classes: name, content_type
- Races: ability_increases, size, speed, darkvision, languages
📝 Notes & Decisions
Design Decisions
- Database: ChromaDB for persistence and semantic search
- Embeddings: sentence-transformers/all-MiniLM-L6-v2 for speed/quality balance
- LLM: Ollama with Qwen3-4B-RPG-Roleplay-V2 for D&D-tuned responses
- Collection Strategy: Separate collections per content type for clean organization
- Name Weighting: Entity names repeated 2-3× at chunk start for better exact-match retrieval
- Multiple Chunks: Each entity creates multiple specialized chunks for different use cases
Key Improvements (Nov 6, 2024)
- ✅ Spell Parser Upgrade - Now uses sophisticated `SpellParser` class instead of inline code
- ✅ Name Weighting - All entity types now have weighted names for better retrieval
- ✅ Race Extraction - Full race data extracted from PDF with traits and metadata
- ✅ Monster Type Extraction - Automatic extraction of size and creature type
- ✅ Interactive Query Tool - New CLI for exploring the RAG system
- ✅ Comprehensive Tests - 26+ automated tests validating all functionality
Known Issues (Phase 8 Discovery - Dec 1, 2024)
- AI Unreliability: Pure AI approach fails to consistently enforce D&D rules
- Ignores valid spell casts (Magic Missile cast was turned into melee combat)
- Allows invalid spells (Let Elara cast Fireball, which she doesn't know)
- No resource tracking (spell slots, HP, gold)
- Solution: Moving to Hybrid Architecture (Option B) with programmatic rules engine
Current Work (Phase 8)
- 🚧 Spell Validation System: Programmatically check spell ownership before AI generation
- 🚧 Resource Tracking: HP, spell slots, inventory management
- 🚧 Combat Mechanics: Attack/damage rolls, initiative, turn tracking
- 🚧 Rules Engine: Intercept player actions, apply mechanics, then AI narrates
Future Enhancements (Post-Phase 8)
- ⏳ Subrace Support: High Elf, Mountain Dwarf, etc. with specific abilities
- ⏳ Advanced Filtering: Search by CR range, spell level range, class, type
- ⏳ Web UI: Web interface for GM dialogue
- ⏳ Multiplayer Support: Multi-player sessions
- ⏳ Custom Content Import: User-created monsters/spells
- ⏳ Voice Interface: Voice commands for GM dialogue
- ⏳ Map/Battle Grid Integration: Visual battle maps
📅 Timeline
| Date | Milestone |
|---|---|
| 2024-11-06 09:00 | Project started, directory structure created |
| 2024-11-06 12:00 | Phase 1-2 complete (core infrastructure + basic parsers) |
| 2024-11-06 15:00 | Phase 3-6 complete (initialization, query, GM, character creator) |
| 2024-11-06 18:00 | Major upgrades: Name weighting, race extraction, comprehensive tests |
| 2024-11-06 20:00 | Phase 7 complete: All tests passing, documentation updated |
| 2024-11-06 21:00 | V1.0 COMPLETE - Production ready! |
| 2024-12-01 18:00 | Phase 8 started: Character-aware dialogue system created |
| 2024-12-01 19:00 | Testing reveals AI reliability issues with game mechanics |
| 2024-12-01 19:30 | Architecture decision: Option B (Hybrid Rules Engine) selected |
🚀 Production Deployment Checklist
- All 4 collections operational
- Name weighting implemented for all entity types
- Comprehensive test suite (26+ tests passing)
- Interactive query tool
- Documentation complete
- GM dialogue system working
- Character creator working
- All dependencies installed
- Error handling in place
- Performance targets met
📊 Statistics
Collection Counts
- Spells: 86 spells → 250+ chunks
- Monsters: 332 monsters → 332 chunks
- Classes: 12 classes → 12 chunks
- Races: 9 races → 18 chunks
- Total: 612+ chunks in ChromaDB
Test Results
- Total Tests: 26+
- Pass Rate: 100%
- Collections Tested: 4/4
- Features Validated: Name weighting, semantic search, metadata extraction, cross-collection search
🏪 Phase 9: Shop System & Equipment Database ✅ COMPLETE ⭐ NEW! (Dec 26, 2024)
Goal: Implement GM-driven conversational shopping with NPC shopkeepers
✅ 9.1 Equipment Database
File: loaders/equipment_loader.py, dnd_rag_system/data/equipment.txt
- Parse D&D 5e equipment tables (weapons, armor, gear, tools, mounts)
- Extract 58 equipment items with prices, weights, and properties
- Load into ChromaDB `dnd_equipment` collection
- Metadata: name, cost_gp, weight, category, properties
- Integration with RAG system for shop queries
✅ 9.2 Shop System
File: systems/shop_system.py
- ShopSystem class with RAG-powered inventory search
- Natural language purchase/sell command parsing
- Transaction validation (gold checks, inventory updates)
- Fuzzy item name matching ("longsword", "long sword" both work)
- D&D 5e sell mechanics (half price)
- Shopkeeper personality context generator (friendly, grumpy, mysterious, etc.)
- Integration hooks for GM dialogue system
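The fuzzy name matching, gold check, and half-price selling described above might be sketched like this. It uses `difflib` from the standard library instead of the RAG search, a three-item price table with standard D&D 5e prices, and hypothetical function names, so it illustrates the idea rather than the `ShopSystem` class itself.

```python
import difflib
from typing import Optional

# Small illustrative catalog (prices in gp); the real system loads 58 items.
PRICES_GP = {"longsword": 15.0, "shortbow": 25.0, "dagger": 2.0}


def normalize(name: str) -> str:
    """Lowercase and drop whitespace so 'Long Sword' matches 'longsword'."""
    return "".join(name.lower().split())


def find_item(query: str, catalog: dict[str, float]) -> Optional[str]:
    """Return the closest catalog item name, or None if nothing is similar enough."""
    by_key = {normalize(name): name for name in catalog}
    matches = difflib.get_close_matches(normalize(query), list(by_key), n=1, cutoff=0.8)
    return by_key[matches[0]] if matches else None


def buy(gold: float, query: str, catalog: dict[str, float]) -> tuple[float, str]:
    """Validate a purchase and return (remaining gold, item name)."""
    item = find_item(query, catalog)
    if item is None:
        raise LookupError(f"No item matching {query!r}")
    price = catalog[item]
    if price > gold:
        raise ValueError(f"{item} costs {price} gp but only {gold} gp available")
    return gold - price, item


def sell_price(item: str, catalog: dict[str, float]) -> float:
    """Items sell for half their listed price."""
    return catalog[item] / 2
```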
✅ 9.3 Shop System Testing
File: test_shop_system.py
- 7 comprehensive test suites
- Shop inventory search tests
- Item price lookup tests (with fuzzy matching)
- Purchase transaction tests (gold deduction, inventory add)
- Sell transaction tests (gold increase, inventory remove)
- Chat command parsing tests (natural language + commands)
- Shopkeeper context generation tests
- Complete shopping experience simulation
- 100% test pass rate
✅ 9.4 Documentation
File: SHOP_SYSTEM_GUIDE.md
- Comprehensive usage guide with examples
- Philosophy: Chat-first, mechanics-second
- Example shopping sessions
- Technical API documentation
- GM best practices
Philosophy: Shop interactions happen through natural GM chat with NPC shopkeepers. The system validates transactions and manages gold and inventory automatically, while the GM brings the shopkeeper to life with personality.
✅ Phase 10: Reality Check System ✅ COMPLETE ⭐ NEW! (Dec 26, 2024)
Goal: Prevent GM hallucinations by validating player actions against game state
✅ 10.1 Action Validation System
File: systems/action_validator.py
- ActionType enum (combat, spell_cast, conversation, item_use, exploration)
- ValidationResult enum (valid, invalid, npc_introduction, fuzzy_match)
- ActionIntent dataclass (structured action parsing)
- ValidationReport dataclass (validation results with guidance)
- Intent analysis from natural language input
- State validation against GameSession
- Fuzzy matching for flexible input (e.g., "goblin" → "Goblin Scout")
- Context-aware prompting for GM guidance
✅ 10.2 Validation Logic
Combat Validation:
- Target must exist in npcs_present or combat.initiative_order
- Fuzzy matching for partial names
- Clear error messages for invalid targets
Spell Validation:
- Character must know the spell (fuzzy matching)
- Spell must exist in character's spell list
- Helpful suggestions for similar spells
Item Validation:
- Item must be in character inventory
- Quantity validation
NPC Conversation:
- Allows contextually appropriate NPC introductions
- Rejects NPCs that don't make sense in current scene
- Auto-adds introduced NPCs to game state
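As a rough illustration of the combat target check, the sketch below validates a target name against the NPCs present, accepting an exact match, a single partial match (for example "goblin" for "Goblin Scout"), or rejecting with guidance. The enum and dataclass reuse names from section 10.1, but their fields and the `validate_target` function are assumptions, and the NPC-introduction case is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ValidationResult(Enum):
    VALID = "valid"
    INVALID = "invalid"
    FUZZY_MATCH = "fuzzy_match"


@dataclass
class ValidationReport:
    result: ValidationResult
    matched: Optional[str]  # canonical name from game state, if any
    guidance: str           # text that could be added to the GM prompt


def validate_target(target: str, npcs_present: list[str]) -> ValidationReport:
    """Check a combat target against the NPCs currently in the scene."""
    wanted = target.strip().lower()
    for npc in npcs_present:
        if npc.lower() == wanted:
            return ValidationReport(ValidationResult.VALID, npc, f"Target {npc} is present.")
    # Partial-name match: accept only when it is unambiguous.
    partial = [npc for npc in npcs_present if wanted and wanted in npc.lower()]
    if len(partial) == 1:
        return ValidationReport(
            ValidationResult.FUZZY_MATCH, partial[0], f"Interpreting '{target}' as {partial[0]}."
        )
    return ValidationReport(
        ValidationResult.INVALID,
        None,
        f"There is no single '{target}' here. Narrate that the target cannot be found.",
    )
```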
✅ 10.3 GM Integration
File: systems/gm_dialogue_unified.py
- Integrated ActionValidator into `GameMaster.__init__`
- Modified generate_response() to validate before LLM generation
- Updated _build_prompt() with validation guidance
- Added _post_process_response() to auto-add introduced NPCs
- Debug logging for validation steps
✅ 10.4 Reality Check Testing
File: test_reality_check.py
- Combat validation tests (valid/invalid targets, fuzzy matching)
- NPC conversation tests (introduction, rejection, fuzzy matching)
- Spell casting tests (known/unknown spells)
- Item usage tests (inventory validation)
- Exploration tests (always allowed)
- 100% test pass rate
Benefits:
- Prevents GM from inventing non-existent entities
- Maintains game state consistency
- Preserves narrative freedom (GM can still introduce appropriate NPCs)
- Fuzzy matching allows flexible player input
- Clear error messaging guides GM narration
📦 Updated Statistics (Dec 26, 2024)
Collection Counts
- Spells: 86 spells → 250+ chunks
- Monsters: 332 monsters → 332 chunks
- Classes: 12 classes → 12 chunks
- Races: 9 races → 18 chunks
- Equipment: 58 items → 58 chunks ⭐ NEW!
- Total: 670+ chunks in ChromaDB
Test Results
- SpellSlots: 6 tests ✅
- DeathSaves: 3 tests ✅
- CharacterState: 34 tests ✅
- CombatState: 9 tests ✅
- PartyState: 11 tests ✅
- GameSession: 7 tests ✅
- Shop System: 7 test suites ✅ NEW!
- Reality Check: 15+ tests ✅ NEW!
- Total Tests: 92+ tests
- Pass Rate: 100% ✅
Status: ✅ V3.0 PRODUCTION READY! (Shop System + Reality Check)
Current Focus: Documentation and deployment
Latest Achievements (Dec 26, 2024):
- 🎉 GM-Driven Shop System - Conversational shopping with NPC shopkeepers
- 🎉 Reality Check System - Prevents hallucinations while preserving narrative freedom
- 🎉 Equipment Database - 58 D&D 5e items with accurate prices
Next Steps:
- Deploy to Hugging Face Spaces
- Test with HF Inference API model (Qwen2.5-7B-Instruct)
- User feedback and iteration
Last Updated: December 26, 2024