alexchilton Claude committed
Commit dc58c5d · 1 Parent(s): b3bc075

feat: Add auto-detection for HF Spaces with unified GameMaster

- Auto-detects environment (local vs HF Spaces) without manual config
- Uses same RPG model (Chun121/Qwen3-4B-RPG-Roleplay-V2) everywhere
- Local: Ollama with quantized model (Q4_K_M)
- HF Spaces: Inference API with full precision model
- Optimized Dockerfile to skip Ollama on HF Spaces (faster builds)
- Updated start.sh to conditionally start Ollama
- Added huggingface_hub dependency to requirements.txt
- Updated deployment docs with auto-detection info
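
The detection itself is small; a standalone sketch of the `is_huggingface_space()` helper this commit adds (mirroring the diff in `gm_dialogue_unified.py` below):

```python
import os

def is_huggingface_space() -> bool:
    """Detect HF Spaces via env vars the platform sets; USE_HF_API is a manual override."""
    return (
        os.getenv("SPACE_ID") is not None
        or os.getenv("SPACE_AUTHOR_NAME") is not None
        or os.getenv("HF_SPACE") is not None
        or os.getenv("USE_HF_API", "false").lower() == "true"
    )
```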

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Dockerfile CHANGED

```diff
@@ -15,8 +15,11 @@ RUN pip install --no-cache-dir -r requirements.txt
 # The .dockerignore file will exclude specified files and directories
 COPY . .
 
-# Install Ollama so the application can call it
-RUN apt-get update && apt-get install -y curl && curl -fsSL https://ollama.com/install.sh | sh
+# Only install Ollama if not on Hugging Face Spaces
+# HF Spaces will use the Inference API instead
+RUN if [ -z "$SPACE_ID" ]; then \
+        apt-get update && apt-get install -y curl && curl -fsSL https://ollama.com/install.sh | sh; \
+    fi
 
 # Run the ingestion script to populate the ChromaDB vector store
 # This step pre-builds the database so the app starts ready
```
HUGGINGFACE_DEPLOYMENT.md CHANGED

````diff
@@ -4,11 +4,11 @@ This guide explains how to deploy your D&D RAG Game Master to Hugging Face Space
 
 ## 🎯 Overview
 
-The app now supports **BOTH** local Ollama and Hugging Face Inference API:
-- **Local Mode** (default): Uses Ollama with `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`
-- **HF Spaces Mode**: Uses HF Inference API with `Chun121/Qwen3-4B-RPG-Roleplay-V2`
-
-The same model is used in both modes for consistency!
+The app **automatically detects** its environment and uses the optimal backend:
+- **Local Mode** (auto-detected): Uses Ollama with `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M` (quantized)
+- **HF Spaces Mode** (auto-detected): Uses HF Inference API with `Chun121/Qwen3-4B-RPG-Roleplay-V2` (full model)
+
+**Same RPG-optimized model in both modes!** The app detects HF Spaces by checking for `SPACE_ID`, `SPACE_AUTHOR_NAME`, or `HF_SPACE` environment variables.
 
 ## 📦 Step 1: Prepare Files
 
@@ -40,16 +40,16 @@ dnd_rag_system/ # Entire package
 - **Space hardware**: CPU basic (free tier works!)
 - **Visibility**: Public or Private
 
-## ⚙️ Step 3: Configure Environment Variables
+## ⚙️ Step 3: Configure Environment Variables (Optional)
 
-In your Space settings, add these **Secrets**:
+The app **auto-detects** HF Spaces, so you only need to set:
 
-1. **`USE_HF_API`** = `true`
-   - This enables HF Inference API mode
-
-2. **`HF_TOKEN`** = Your HF token
+1. **`HF_TOKEN`** (Optional) = Your HF token
    - Get from: https://huggingface.co/settings/tokens
-   - Needs "Read" permissions
+   - Only needed if using private models
+   - The RPG model (Chun121/Qwen3-4B-RPG-Roleplay-V2) is public
+
+**Note:** No need to set `USE_HF_API` - it's detected automatically!
 
 ## 📁 Step 4: Upload Files
 
@@ -89,7 +89,8 @@ git push
 2. Check logs for:
    ```
    🎲 Initializing D&D RAG System...
-   🌐 Using Hugging Face Inference API mode
+   🤗 Using Hugging Face Inference API mode
+      Model: Chun121/Qwen3-4B-RPG-Roleplay-V2
    ```
 3. Test the interface:
    - Load a character
@@ -98,32 +99,38 @@ git push
 
 ## 🏠 Running Locally vs HF Spaces
 
-### Local (Ollama):
+### Local (Ollama) - Auto-detected:
 ```bash
 # No environment variables needed
-python3 app.py
-# Uses Ollama by default
+python3 app_gradio.py
+# Automatically uses Ollama
 ```
 
-### Local (Test HF Mode):
+### Local (Test HF Mode) - Manual override:
 ```bash
 export USE_HF_API=true
 export HF_TOKEN=your_token_here
-python3 app.py
-# Uses HF API locally (for testing before deployment)
+python3 app_gradio.py
+# Manually enables HF API mode for testing
 ```
 
-### HF Spaces:
-- Automatically uses HF API when `USE_HF_API=true` is set in Space secrets
-- No Ollama needed!
+### HF Spaces - Auto-detected:
+- **Automatically** detects HF Spaces environment
+- Uses HF Inference API without any configuration
+- Skips Ollama installation in Docker (faster builds!)
+- No manual env vars needed!
 
 ## 📊 Model Information
 
-**Model Used:** `Chun121/Qwen3-4B-RPG-Roleplay-V2`
-- **Local**: Via Ollama `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`
-- **HF Spaces**: Via Inference API `Chun121/Qwen3-4B-RPG-Roleplay-V2`
-- **Size**: 4B parameters, quantized to Q4_K_M for Ollama
-- **Optimized for**: D&D roleplay and narrative generation
+**Model Used:** `Chun121/Qwen3-4B-RPG-Roleplay-V2` (same model everywhere!)
+- **Local Ollama**: `hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M` (quantized to Q4_K_M)
+- **HF Spaces**: `Chun121/Qwen3-4B-RPG-Roleplay-V2` (full precision model)
+
+**Benefits of using the same model:**
+- **Consistent behavior** across local and cloud environments
+- **RPG-optimized** - specifically fine-tuned for D&D roleplay and narrative
+- **Better on HF Spaces** - full precision vs quantized version
+- **Faster inference** via HF's optimized infrastructure
 
 ## 🐛 Troubleshooting
 
@@ -133,8 +140,10 @@ python3 app.py
 - Check build logs for errors
 
 ### "HF_TOKEN not found" error:
-- Add `HF_TOKEN` to Space secrets
-- Make sure `USE_HF_API=true` is set
+- Only needed for private models
+- The RPG model (Chun121/Qwen3-4B-RPG-Roleplay-V2) should be public
+- Check model visibility at: https://huggingface.co/Chun121/Qwen3-4B-RPG-Roleplay-V2
+- If private, add `HF_TOKEN` to Space secrets
 
 ### ChromaDB errors:
 - Ensure entire `chromadb/` folder is uploaded
@@ -156,7 +165,9 @@ python3 app.py
 1. **Free tier works!** CPU basic is enough for this app
 2. **Keep ChromaDB small**: Current 85MB is fine
 3. **Monitor usage**: HF Inference API has rate limits on free tier
-4. **Test locally first**: Use `USE_HF_API=true` locally before deploying
+4. **Auto-detection**: No need to configure env vars - it just works!
+5. **Same RPG model everywhere**: Consistent D&D gameplay experience
+6. **Better quality on HF Spaces**: Full precision vs quantized local version
 
 ## 📚 Resources
````
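
The mode rules in the doc above map to a few lines of code; a hedged sketch (the `resolve_backend` helper name is illustrative, not part of the commit) of which backend and model string those rules select:

```python
import os

def resolve_backend() -> tuple:
    """Return (backend, model) following the documented rules (illustrative helper)."""
    on_spaces = any(
        os.getenv(v) is not None
        for v in ("SPACE_ID", "SPACE_AUTHOR_NAME", "HF_SPACE")
    )
    manual = os.getenv("USE_HF_API", "false").lower() == "true"
    if on_spaces or manual:
        # HF Spaces (or manual override): full-precision model via Inference API
        return "hf-inference-api", "Chun121/Qwen3-4B-RPG-Roleplay-V2"
    # Local default: quantized model via Ollama
    return "ollama", "hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M"
```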
app_gradio.py CHANGED

```diff
@@ -19,7 +19,7 @@ sys.path.insert(0, str(Path(__file__).parent))
 
 from dnd_rag_system.core.chroma_manager import ChromaDBManager
 from dnd_rag_system.systems.character_creator import Character
-from dnd_rag_system.systems.gm_dialogue import GameMaster
+from dnd_rag_system.systems.gm_dialogue_unified import GameMaster
 
 
 # Initialize system
@@ -229,7 +229,10 @@ Otherwise, just type your action and press Enter!"""
             {"role": "assistant", "content": response}
         ]
     except Exception as e:
-        error_msg = f"Error: {str(e)}\n\nMake sure Ollama is running and the model is installed:\n`ollama pull hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`"
+        error_msg = f"Error: {str(e)}"
+        # Only add Ollama instructions if running locally
+        if not (os.getenv("SPACE_ID") or os.getenv("SPACE_AUTHOR_NAME") or os.getenv("HF_SPACE")):
+            error_msg += "\n\nMake sure Ollama is running and the model is installed:\n`ollama pull hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`"
         return history + [
             {"role": "user", "content": message},
             {"role": "assistant", "content": error_msg}
@@ -308,7 +311,7 @@ with gr.Blocks(title="D&D RAG Game Master") as demo:
     - `/rag Goblin` - Look up monster stats
     - `/rag Fighter` - Look up class features
 
-    **Powered by:** ChromaDB RAG + Ollama (Qwen3-4B-RPG-Roleplay-V2)
+    **Powered by:** ChromaDB RAG + AI Language Model (Auto-detected: Ollama locally, HF Inference API on Spaces)
     """)
 
 # Event handlers
```
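
The new error-handling branch can be exercised in isolation; a minimal sketch (the standalone `build_error_message` helper is illustrative — the app inlines this logic in its exception handler):

```python
import os

def build_error_message(err: Exception) -> str:
    """Sketch of the handler above: append Ollama setup hints only when off Spaces."""
    msg = f"Error: {err}"
    if not (os.getenv("SPACE_ID") or os.getenv("SPACE_AUTHOR_NAME") or os.getenv("HF_SPACE")):
        msg += (
            "\n\nMake sure Ollama is running and the model is installed:\n"
            "`ollama pull hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M`"
        )
    return msg
```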
dnd_rag_system/systems/gm_dialogue_unified.py CHANGED

```diff
@@ -1,11 +1,11 @@
 """
 D&D Game Master Dialogue System - Unified Version
 
-RAG-enhanced AI Dungeon Master that works with both:
+RAG-enhanced AI Dungeon Master that automatically detects environment:
+- Hugging Face Inference API (when running on HF Spaces)
 - Local Ollama (for local development)
-- Hugging Face Inference API (for HF Spaces)
 
-Set USE_HF_API=true environment variable to use HF Inference API.
+Auto-detection based on SPACE_ID, SPACE_AUTHOR_NAME, or HF_SPACE environment variables.
 """
 
 import sys
@@ -22,6 +22,16 @@ from dnd_rag_system.core.chroma_manager import ChromaDBManager
 from dnd_rag_system.config import settings
 
 
+def is_huggingface_space() -> bool:
+    """Check if running on Hugging Face Spaces."""
+    return (
+        os.getenv("SPACE_ID") is not None or
+        os.getenv("SPACE_AUTHOR_NAME") is not None or
+        os.getenv("HF_SPACE") is not None or
+        os.getenv("USE_HF_API", "false").lower() == "true"  # Manual override
+    )
+
+
 @dataclass
 class Message:
     """Conversation message."""
@@ -51,36 +61,45 @@ class GameMaster:
     """
     RAG-Enhanced AI Game Master - Unified Version.
 
-    Supports both local Ollama and HF Inference API.
-    Uses the same model: Chun121/Qwen3-4B-RPG-Roleplay-V2
+    Automatically uses:
+    - HF Inference API on Hugging Face Spaces
+    - Ollama for local development
     """
 
-    def __init__(self, db_manager: ChromaDBManager, hf_token: str = None):
+    def __init__(self, db_manager: ChromaDBManager, hf_token: str = None, model_name: str = None):
         """
         Initialize Game Master.
 
         Args:
             db_manager: ChromaDBManager instance
-            hf_token: Hugging Face API token (optional, for HF API mode)
+            hf_token: Hugging Face API token (optional, will use env var)
+            model_name: Model name override (optional)
         """
         self.db = db_manager
         self.session = GameSession()
 
-        # Detect mode
-        self.use_hf_api = os.getenv("USE_HF_API", "false").lower() == "true"
+        # Auto-detect environment
+        self.use_hf_api = is_huggingface_space()
 
         if self.use_hf_api:
-            print("🌐 Using Hugging Face Inference API mode")
-            from huggingface_hub import InferenceClient
+            print("🤗 Using Hugging Face Inference API mode")
+            try:
+                from huggingface_hub import InferenceClient
+            except ImportError:
+                raise ImportError("huggingface_hub is required for HF Spaces. Install with: pip install huggingface_hub")
+
             self.hf_token = hf_token or os.getenv("HF_TOKEN")
-            # Use the same model from HF
-            self.model_name = "Chun121/Qwen3-4B-RPG-Roleplay-V2"
+            # Use the same RPG roleplay model as local Ollama for consistency
+            self.model_name = model_name or "Chun121/Qwen3-4B-RPG-Roleplay-V2"
             self.client = InferenceClient(token=self.hf_token)
+            print(f"   Model: {self.model_name}")
         else:
-            print("🖥️ Using local Ollama mode")
+            print("🦙 Using local Ollama mode")
             # Local Ollama model
-            self.model_name = settings.OLLAMA_MODEL_NAME
+            self.model_name = model_name or settings.OLLAMA_MODEL_NAME
+            self.client = None
             self._verify_ollama()
+            print(f"   Model: {self.model_name}")
 
     def _verify_ollama(self):
         """Check if Ollama is installed and model is available (local mode only)."""
@@ -246,7 +265,7 @@ GM RESPONSE:"""
 
         return prompt
 
-    def _query_ollama(self, prompt: str, timeout: int = 30) -> str:
+    def _query_ollama(self, prompt: str, timeout: int = 120) -> str:
         """
         Send prompt to Ollama and get response (local mode).
 
@@ -281,7 +300,7 @@ GM RESPONSE:"""
         except Exception as e:
             raise Exception(f"Ollama query failed: {e}")
 
-    def _query_hf(self, prompt: str, timeout: int = 30) -> str:
+    def _query_hf(self, prompt: str, timeout: int = 60) -> str:
         """
         Send prompt to Hugging Face Inference API and get response (HF mode).
 
```
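
The body of `_query_hf` is not shown in this hunk; a hypothetical sketch of what such a call can look like, with the client injected so it is testable offline (in the real class the client is a `huggingface_hub.InferenceClient`, whose `text_generation` method this mirrors — the standalone function and its defaults are assumptions, not the commit's actual code):

```python
def query_hf(prompt: str, client,
             model: str = "Chun121/Qwen3-4B-RPG-Roleplay-V2",
             max_new_tokens: int = 512) -> str:
    """Hypothetical sketch of an HF Inference API query.

    `client` is expected to behave like huggingface_hub.InferenceClient.
    """
    # Delegates to the client's text-generation endpoint for the given model
    return client.text_generation(prompt, model=model, max_new_tokens=max_new_tokens)
```

A stub client stands in for `InferenceClient(token=...)` when testing without network access.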
requirements.txt CHANGED

```diff
@@ -6,9 +6,12 @@ chromadb>=0.4.18
 sentence-transformers>=2.2.0
 pdfplumber>=0.10.0
 
-# Ollama Python client
+# Ollama Python client (for local development)
 ollama>=0.1.0
 
+# Hugging Face Inference API (for HF Spaces deployment)
+huggingface_hub>=0.20.0
+
 # Web UI
 gradio>=4.0.0
```
start.sh CHANGED

```diff
@@ -1,13 +1,21 @@
 #!/bin/bash
-# Start Ollama server in the background
-ollama serve &
 
-# Wait a few seconds for the server to initialize
-sleep 3
+# Check if running on Hugging Face Spaces
+if [ -n "$SPACE_ID" ] || [ -n "$SPACE_AUTHOR_NAME" ] || [ -n "$HF_SPACE" ]; then
+    echo "🤗 Running on Hugging Face Spaces - using HF Inference API"
+    echo "Skipping Ollama setup..."
+else
+    echo "🦙 Running locally - starting Ollama"
+    # Start Ollama server in the background
+    ollama serve &
 
-# Pull the required model
-echo "Pulling Ollama model..."
-ollama pull "hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M"
+    # Wait a few seconds for the server to initialize
+    sleep 3
+
+    # Pull the required model
+    echo "Pulling Ollama model..."
+    ollama pull "hf.co/Chun121/Qwen3-4B-RPG-Roleplay-V2:Q4_K_M"
+fi
 
 # Start the Gradio application
 echo "Starting Gradio app..."
```