alexchilton Claude committed
Commit dc58c5d · 1 Parent(s): b3bc075

feat: Add auto-detection for HF Spaces with unified GameMaster

- Auto-detects environment (local vs HF Spaces) without manual config
- Uses same RPG model (Chun121/Qwen3-4B-RPG-Roleplay-V2) everywhere
- Local: Ollama with quantized model (Q4_K_M)
- HF Spaces: Inference API with full precision model
- Optimized Dockerfile to skip Ollama on HF Spaces (faster builds)
- Updated start.sh to conditionally start Ollama
- Added huggingface_hub dependency to requirements.txt
- Updated deployment docs with auto-detection info
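
The detection itself is small; a standalone sketch of the `is_huggingface_space()` helper this commit adds (mirroring the diff in `gm_dialogue_unified.py` below):

```python
import os

def is_huggingface_space() -> bool:
    """Detect HF Spaces via env vars the platform sets; USE_HF_API is a manual override."""
    return (
        os.getenv("SPACE_ID") is not None
        or os.getenv("SPACE_AUTHOR_NAME") is not None
        or os.getenv("HF_SPACE") is not None
        or os.getenv("USE_HF_API", "false").lower() == "true"
    )
```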

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Dockerfile CHANGED

```diff
@@ -15,8 +15,11 @@ RUN pip install --no-cache-dir -r requirements.txt
 # The .dockerignore file will exclude specified files and directories
 COPY . .
 
-# Install Ollama so the application can call it
-RUN apt-get update && apt-get install -y curl && curl -fsSL https://ollama.com/install.sh | sh
+# Only install Ollama if not on Hugging Face Spaces
+# HF Spaces will use the Inference API instead
+RUN if [ -z "$SPACE_ID" ]; then \
+        apt-get update && apt-get install -y curl && curl -fsSL https://ollama.com/install.sh | sh; \
+    fi
 
 # Run the ingestion script to populate the ChromaDB vector store
 # This step pre-builds the database so the app starts ready
```
HUGGINGFACE_DEPLOYMENT.md CHANGED

````diff
@@ -4,11 +4,11 @@ This guide explains how to deploy your D&D RAG Game Master to Hugging Face Space
 
 ## 🎯 Overview
 
-The app now supports **BOTH** local Ollama and Hugging Face Inference API:
-- **Local Mode** (default): Uses Ollama with `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`
-- **HF Spaces Mode**: Uses HF Inference API with `Chun121/Qwen3-4B-RPG-Roleplay-V2`
-
-The same model is used in both modes for consistency!
+The app **automatically detects** its environment and uses the optimal backend:
+- **Local Mode** (auto-detected): Uses Ollama with `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M` (quantized)
+- **HF Spaces Mode** (auto-detected): Uses HF Inference API with `Chun121/Qwen3-4B-RPG-Roleplay-V2` (full model)
+
+**Same RPG-optimized model in both modes!** The app detects HF Spaces by checking for `SPACE_ID`, `SPACE_AUTHOR_NAME`, or `HF_SPACE` environment variables.
 
 ## 📦 Step 1: Prepare Files
 
@@ -40,16 +40,16 @@ dnd_rag_system/ # Entire package
 - **Space hardware**: CPU basic (free tier works!)
 - **Visibility**: Public or Private
 
-## ⚙️ Step 3: Configure Environment Variables
+## ⚙️ Step 3: Configure Environment Variables (Optional)
 
-In your Space settings, add these **Secrets**:
+The app **auto-detects** HF Spaces, so you only need to set:
 
-1. **`USE_HF_API`** = `true`
-   - This enables HF Inference API mode
-
-2. **`HF_TOKEN`** = Your HF token
+1. **`HF_TOKEN`** (Optional) = Your HF token
    - Get from: https://huggingface.co/settings/tokens
-   - Needs "Read" permissions
+   - Only needed if using private models
+   - The RPG model (Chun121/Qwen3-4B-RPG-Roleplay-V2) is public
+
+**Note:** No need to set `USE_HF_API` - it's detected automatically!
 
 ## 📁 Step 4: Upload Files
 
@@ -89,7 +89,8 @@ git push
 2. Check logs for:
    ```
    🎲 Initializing D&D RAG System...
-   🌐 Using Hugging Face Inference API mode
+   🤗 Using Hugging Face Inference API mode
+      Model: Chun121/Qwen3-4B-RPG-Roleplay-V2
    ```
 3. Test the interface:
    - Load a character
@@ -98,32 +99,38 @@ git push
 
 ## 🏠 Running Locally vs HF Spaces
 
-### Local (Ollama):
+### Local (Ollama) - Auto-detected:
 ```bash
 # No environment variables needed
-python3 app.py
-# Uses Ollama by default
+python3 app_gradio.py
+# Automatically uses Ollama
 ```
 
-### Local (Test HF Mode):
+### Local (Test HF Mode) - Manual override:
 ```bash
 export USE_HF_API=true
 export HF_TOKEN=your_token_here
-python3 app.py
-# Uses HF API locally (for testing before deployment)
+python3 app_gradio.py
+# Manually enables HF API mode for testing
 ```
 
-### HF Spaces:
-- Automatically uses HF API when `USE_HF_API=true` is set in Space secrets
-- No Ollama needed!
+### HF Spaces - Auto-detected:
+- **Automatically** detects HF Spaces environment
+- Uses HF Inference API without any configuration
+- Skips Ollama installation in Docker (faster builds!)
+- No manual env vars needed!
 
 ## 📊 Model Information
 
-**Model Used:** `Chun121/Qwen3-4B-RPG-Roleplay-V2`
-- **Local**: Via Ollama `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`
-- **HF Spaces**: Via Inference API `Chun121/Qwen3-4B-RPG-Roleplay-V2`
-- **Size**: 4B parameters, quantized to Q4_K_M for Ollama
-- **Optimized for**: D&D roleplay and narrative generation
+**Model Used:** `Chun121/Qwen3-4B-RPG-Roleplay-V2` (same model everywhere!)
+- **Local Ollama**: `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M` (quantized to Q4_K_M)
+- **HF Spaces**: `Chun121/Qwen3-4B-RPG-Roleplay-V2` (full precision model)
+
+**Benefits of using the same model:**
+- **Consistent behavior** across local and cloud environments
+- **RPG-optimized** - specifically fine-tuned for D&D roleplay and narrative
+- **Better on HF Spaces** - full precision vs quantized version
+- **Faster inference** via HF's optimized infrastructure
 
 ## 🐛 Troubleshooting
 
@@ -133,8 +140,10 @@ python3 app.py
 - Check build logs for errors
 
 ### "HF_TOKEN not found" error:
-- Add `HF_TOKEN` to Space secrets
-- Make sure `USE_HF_API=true` is set
+- Only needed for private models
+- The RPG model (Chun121/Qwen3-4B-RPG-Roleplay-V2) should be public
+- Check model visibility at: https://huggingface.co/Chun121/Qwen3-4B-RPG-Roleplay-V2
+- If private, add `HF_TOKEN` to Space secrets
 
 ### ChromaDB errors:
 - Ensure entire `chromadb/` folder is uploaded
@@ -156,7 +165,9 @@ python3 app.py
 1. **Free tier works!** CPU basic is enough for this app
 2. **Keep ChromaDB small**: Current 85MB is fine
 3. **Monitor usage**: HF Inference API has rate limits on free tier
-4. **Test locally first**: Use `USE_HF_API=true` locally before deploying
+4. **Auto-detection**: No need to configure env vars - it just works!
+5. **Same RPG model everywhere**: Consistent D&D gameplay experience
+6. **Better quality on HF Spaces**: Full precision vs quantized local version
 
 ## 📚 Resources
````
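
The mode rules in the doc above map to a few lines of code; a hedged sketch (the `resolve_backend` helper name is illustrative, not part of the commit) of which backend and model string those rules select:

```python
import os

def resolve_backend() -> tuple:
    """Return (backend, model) following the documented rules (illustrative helper)."""
    on_spaces = any(
        os.getenv(v) is not None
        for v in ("SPACE_ID", "SPACE_AUTHOR_NAME", "HF_SPACE")
    )
    manual = os.getenv("USE_HF_API", "false").lower() == "true"
    if on_spaces or manual:
        # HF Spaces (or manual override): full-precision model via Inference API
        return "hf-inference-api", "Chun121/Qwen3-4B-RPG-Roleplay-V2"
    # Local default: quantized model via Ollama
    return "ollama", "hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M"
```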
app_gradio.py CHANGED

```diff
@@ -19,7 +19,7 @@ sys.path.insert(0, str(Path(__file__).parent))
 
 from dnd_rag_system.core.chroma_manager import ChromaDBManager
 from dnd_rag_system.systems.character_creator import Character
-from dnd_rag_system.systems.gm_dialogue import GameMaster
+from dnd_rag_system.systems.gm_dialogue_unified import GameMaster
 
 
 # Initialize system
@@ -229,7 +229,10 @@ Otherwise, just type your action and press Enter!"""
             {"role": "assistant", "content": response}
         ]
     except Exception as e:
-        error_msg = f"Error: {str(e)}\n\nMake sure Ollama is running and the model is installed:\n`ollama pull hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`"
+        error_msg = f"Error: {str(e)}"
+        # Only add Ollama instructions if running locally
+        if not (os.getenv("SPACE_ID") or os.getenv("SPACE_AUTHOR_NAME") or os.getenv("HF_SPACE")):
+            error_msg += "\n\nMake sure Ollama is running and the model is installed:\n`ollama pull hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`"
         return history + [
             {"role": "user", "content": message},
             {"role": "assistant", "content": error_msg}
@@ -308,7 +311,7 @@ with gr.Blocks(title="D&D RAG Game Master") as demo:
     - `/rag Goblin` - Look up monster stats
     - `/rag Fighter` - Look up class features
 
-    **Powered by:** ChromaDB RAG + Ollama (Qwen3-4B-RPG-Roleplay-V2)
+    **Powered by:** ChromaDB RAG + AI Language Model (Auto-detected: Ollama locally, HF Inference API on Spaces)
     """)
 
 # Event handlers
```
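
The new error-handling branch can be exercised in isolation; a minimal sketch (the standalone `build_error_message` helper is illustrative — the app inlines this logic in its exception handler):

```python
import os

def build_error_message(err: Exception) -> str:
    """Sketch of the handler above: append Ollama setup hints only when off Spaces."""
    msg = f"Error: {err}"
    if not (os.getenv("SPACE_ID") or os.getenv("SPACE_AUTHOR_NAME") or os.getenv("HF_SPACE")):
        msg += (
            "\n\nMake sure Ollama is running and the model is installed:\n"
            "`ollama pull hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`"
        )
    return msg
```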
dnd_rag_system/systems/gm_dialogue_unified.py CHANGED

```diff
@@ -1,11 +1,11 @@
 """
 D&D Game Master Dialogue System - Unified Version
 
-RAG-enhanced AI Dungeon Master that works with both:
+RAG-enhanced AI Dungeon Master that automatically detects environment:
+- Hugging Face Inference API (when running on HF Spaces)
 - Local Ollama (for local development)
-- Hugging Face Inference API (for HF Spaces)
 
-Set USE_HF_API=true environment variable to use HF Inference API.
+Auto-detection based on SPACE_ID, SPACE_AUTHOR_NAME, or HF_SPACE environment variables.
 """
 
 import sys
@@ -22,6 +22,16 @@ from dnd_rag_system.core.chroma_manager import ChromaDBManager
 from dnd_rag_system.config import settings
 
 
+def is_huggingface_space() -> bool:
+    """Check if running on Hugging Face Spaces."""
+    return (
+        os.getenv("SPACE_ID") is not None or
+        os.getenv("SPACE_AUTHOR_NAME") is not None or
+        os.getenv("HF_SPACE") is not None or
+        os.getenv("USE_HF_API", "false").lower() == "true"  # Manual override
+    )
+
+
 @dataclass
 class Message:
     """Conversation message."""
@@ -51,36 +61,45 @@ class GameMaster:
     """
     RAG-Enhanced AI Game Master - Unified Version.
 
-    Supports both local Ollama and HF Inference API.
-    Uses the same model: Chun121/Qwen3-4B-RPG-Roleplay-V2
+    Automatically uses:
+    - HF Inference API on Hugging Face Spaces
+    - Ollama for local development
     """
 
-    def __init__(self, db_manager: ChromaDBManager, hf_token: str = None):
+    def __init__(self, db_manager: ChromaDBManager, hf_token: str = None, model_name: str = None):
         """
         Initialize Game Master.
 
         Args:
             db_manager: ChromaDBManager instance
-            hf_token: Hugging Face API token (optional, for HF API mode)
+            hf_token: Hugging Face API token (optional, will use env var)
+            model_name: Model name override (optional)
         """
         self.db = db_manager
         self.session = GameSession()
 
-        # Detect mode
-        self.use_hf_api = os.getenv("USE_HF_API", "false").lower() == "true"
+        # Auto-detect environment
+        self.use_hf_api = is_huggingface_space()
 
         if self.use_hf_api:
-            print("🌐 Using Hugging Face Inference API mode")
-            from huggingface_hub import InferenceClient
+            print("🤗 Using Hugging Face Inference API mode")
+            try:
+                from huggingface_hub import InferenceClient
+            except ImportError:
+                raise ImportError("huggingface_hub is required for HF Spaces. Install with: pip install huggingface_hub")
+
             self.hf_token = hf_token or os.getenv("HF_TOKEN")
-            # Use the same model from HF
-            self.model_name = "Chun121/Qwen3-4B-RPG-Roleplay-V2"
+            # Use the same RPG roleplay model as local Ollama for consistency
+            self.model_name = model_name or "Chun121/Qwen3-4B-RPG-Roleplay-V2"
             self.client = InferenceClient(token=self.hf_token)
+            print(f"   Model: {self.model_name}")
         else:
-            print("🖥️ Using local Ollama mode")
+            print("🦙 Using local Ollama mode")
             # Local Ollama model
-            self.model_name = settings.OLLAMA_MODEL_NAME
+            self.model_name = model_name or settings.OLLAMA_MODEL_NAME
+            self.client = None
             self._verify_ollama()
+            print(f"   Model: {self.model_name}")
 
     def _verify_ollama(self):
         """Check if Ollama is installed and model is available (local mode only)."""
@@ -246,7 +265,7 @@ GM RESPONSE:"""
 
         return prompt
 
-    def _query_ollama(self, prompt: str, timeout: int = 30) -> str:
+    def _query_ollama(self, prompt: str, timeout: int = 120) -> str:
         """
         Send prompt to Ollama and get response (local mode).
 
@@ -281,7 +300,7 @@ GM RESPONSE:"""
         except Exception as e:
             raise Exception(f"Ollama query failed: {e}")
 
-    def _query_hf(self, prompt: str, timeout: int = 30) -> str:
+    def _query_hf(self, prompt: str, timeout: int = 60) -> str:
         """
         Send prompt to Hugging Face Inference API and get response (HF mode).
 
```
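
The body of `_query_hf` is not shown in this hunk; a hypothetical sketch of what such a call can look like, with the client injected so it is testable offline (in the real class the client is a `huggingface_hub.InferenceClient`, whose `text_generation` method this mirrors — the standalone function and its defaults are assumptions, not the commit's actual code):

```python
def query_hf(prompt: str, client,
             model: str = "Chun121/Qwen3-4B-RPG-Roleplay-V2",
             max_new_tokens: int = 512) -> str:
    """Hypothetical sketch of an HF Inference API query.

    `client` is expected to behave like huggingface_hub.InferenceClient.
    """
    # Delegates to the client's text-generation endpoint for the given model
    return client.text_generation(prompt, model=model, max_new_tokens=max_new_tokens)
```

A stub client stands in for `InferenceClient(token=...)` when testing without network access.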
requirements.txt CHANGED

```diff
@@ -6,9 +6,12 @@ chromadb>=0.4.18
 sentence-transformers>=2.2.0
 pdfplumber>=0.10.0
 
-# Ollama Python client
+# Ollama Python client (for local development)
 ollama>=0.1.0
 
+# Hugging Face Inference API (for HF Spaces deployment)
+huggingface_hub>=0.20.0
+
 # Web UI
 gradio>=4.0.0
```
start.sh CHANGED

```diff
@@ -1,13 +1,21 @@
 #!/bin/bash
-# Start Ollama server in the background
-ollama serve &
 
-# Wait a few seconds for the server to initialize
-sleep 3
+# Check if running on Hugging Face Spaces
+if [ -n "$SPACE_ID" ] || [ -n "$SPACE_AUTHOR_NAME" ] || [ -n "$HF_SPACE" ]; then
+    echo "🤗 Running on Hugging Face Spaces - using HF Inference API"
+    echo "Skipping Ollama setup..."
+else
+    echo "🦙 Running locally - starting Ollama"
+    # Start Ollama server in the background
+    ollama serve &
 
-# Pull the required model
-echo "Pulling Ollama model..."
-ollama pull "hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M"
+    # Wait a few seconds for the server to initialize
+    sleep 3
+
+    # Pull the required model
+    echo "Pulling Ollama model..."
+    ollama pull "hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M"
+fi
 
 # Start the Gradio application
 echo "Starting Gradio app..."
```