Upload folder using huggingface_hub
README.md CHANGED
@@ -12,8 +12,83 @@ tags:
 - reinforcement-learning
 - planning
 - ai-education
+- model-based-rl
+- muzero
+- dreamer
 ---
 
-# World Model Demo
+# 🧠 World Model Demo
 
-
+**An interactive visualization of model-based reinforcement learning concepts**
+
+## What is a World Model?
+
+A **world model** is an internal representation that an AI agent uses to *simulate* the environment without actually interacting with it. Think of it as the agent's "imagination": it can mentally rehearse actions and predict their outcomes before committing to them in the real world.
+
+### The Key Insight
+
+Instead of learning through pure trial and error (which is slow and potentially dangerous), an agent with a world model can:
+
+1. **Imagine** possible futures by simulating "what if I do X?"
+2. **Evaluate** which imagined future looks best
+3. **Plan** a sequence of actions to reach that future
+4. **Act** with confidence, having already "seen" the outcome
+
+## How This Differs from Language Models
+
+| Aspect | Language Model (GPT, Claude) | World Model (MuZero, Dreamer) |
+|--------|------------------------------|-------------------------------|
+| **Primary function** | Predict the next token in a sequence | Predict the next *state* given an action |
+| **Training signal** | Text prediction loss | Reward from the environment |
+| **"Imagination"** | Generates plausible text continuations | Simulates future environment states |
+| **Planning** | Implicit (via chain-of-thought) | Explicit (via tree search or rollouts) |
+| **Grounding** | Statistical patterns in text | Causal dynamics of an environment |
+
+### A Concrete Example
+
+**Language Model**: "If I push a ball off a table, it will..." → generates plausible text based on patterns
+
+**World Model**: Given state (ball on table) + action (push) → predicts the new state (ball falling, trajectory, landing position) with enough fidelity to *plan* around it
+
+## What You're Seeing in This Demo
+
+This visualization shows a simplified world model operating on a grid navigation task:
+
+### The Four Phases
+
+1. 🔍 **Observe**: The agent perceives the current grid state (its position, the goal location, obstacles)
+2. 🔮 **Imagine**: The world model predicts what would happen for each possible action (up/down/left/right). You see this as the "mental simulation" exploring future states.
+3. 🌳 **Plan**: Using tree search (similar to how chess engines work), the agent evaluates sequences of actions by imagining multiple steps ahead. Better paths to the goal get higher scores.
+4. ⚡ **Act**: The agent executes the best action found during planning, then the cycle repeats.
+
+### Why This Matters for AI Safety
+
+World models are crucial for AI safety research because:
+
+- **Predictability**: Agents that plan can be analyzed - we can inspect what futures they're considering
+- **Corrigibility**: Planning agents can incorporate "don't do irreversible things" into their search
+- **Interpretability**: The world model's predictions can be examined for accuracy and bias
+- **Scalable oversight**: Humans can audit the agent's "reasoning" by inspecting its simulated futures
+
+## Real-World Architectures
+
+This demo is inspired by:
+
+- **MuZero** (DeepMind): Learned world models that mastered Go, chess, and Atari without being given the rules
+- **Dreamer** (Hafner et al.): World models for continuous control from pixels
+- **IRIS** (Micheli et al.): Transformer-based world models for Atari
+- **Genie** (DeepMind): Generative world models learned from video
+
+## Try It Yourself
+
+1. Click **"Run World Model"** to watch the full planning cycle
+2. Use **Step Mode** to see each phase individually
+3. Adjust the grid size and obstacles to see how planning adapts
+4. Watch the **Imagined Futures** panel to see the agent's "thoughts"
+
+---
+
+*Created by [Anthony Maio](https://huggingface.co/anthonym21) as an educational resource for AI safety research*
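The four-phase cycle described in the README (observe, imagine, plan, act) can be sketched in a few lines of Python. This is an illustrative toy, not the Space's actual code: the state layout, the reward of 10, the 0.1 step cost, and the Manhattan-distance heuristic at the search horizon are all assumptions made for the sketch.

```python
# Hypothetical sketch of the Observe -> Imagine -> Plan -> Act cycle on a grid.

def imagine(state, action):
    """World model: predict the next agent position for an action (no real env step)."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    r, c = state["agent"]
    dr, dc = moves[action]
    nr, nc = r + dr, c + dc
    size = state["size"]
    # Predict "no movement" for walls and obstacles.
    if not (0 <= nr < size and 0 <= nc < size) or (nr, nc) in state["obstacles"]:
        nr, nc = r, c
    return {**state, "agent": (nr, nc)}

def plan(state, depth=3):
    """Evaluate action sequences purely in imagination; return the best first action."""
    def value(s, d):
        if s["agent"] == s["goal"]:
            return 10.0                          # imagined reward for reaching the goal
        if d == 0:                               # heuristic at the search horizon:
            (r, c), (gr, gc) = s["agent"], s["goal"]
            return -(abs(r - gr) + abs(c - gc))  # negative Manhattan distance
        return max(value(imagine(s, a), d - 1) - 0.1
                   for a in ("up", "down", "left", "right"))
    return max(("up", "down", "left", "right"),
               key=lambda a: value(imagine(state, a), depth - 1))

state = {"size": 4, "agent": (0, 0), "goal": (3, 3), "obstacles": {(1, 1)}}
while state["agent"] != state["goal"]:
    state = imagine(state, plan(state))  # Act: here the "real" step reuses the model
print(state["agent"])  # (3, 3)
```

Increasing `depth` trades computation for foresight, which is exactly the knob that tree-search planners like the one in this demo expose.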
app.py CHANGED
@@ -99,21 +99,21 @@ class WorldModel:
 # Visualization
 # ============================================================================
 
-def render_grid_html(state, …
-    """Render the grid as …
-    size = state['size']
+def render_grid_html(state, phase="observe", prediction=None):
+    """Render the grid as HTML with phase-appropriate styling"""
     agent = state['agent']
     goal = state['goal']
-    obstacles = set(…
+    obstacles = set(tuple(o) if isinstance(o, list) else o for o in state['obstacles'])
+    size = state['size']
 
-    …
-        'observe': '#3b82f6',
-        'predict': '#…
-        'plan': '#…
-        'act': '#…
-        'learn': '#ec4899'
+    phase_colors = {
+        'observe': '#3b82f6',  # blue
+        'predict': '#f59e0b',  # amber
+        'plan': '#8b5cf6',     # purple
+        'act': '#10b981',      # green
+        'learn': '#ec4899'     # pink
     }
-    phase_color = …
+    phase_color = phase_colors.get(phase, '#6b7280')
 
     html = f'''
     <div style="text-align: center; font-family: system-ui, sans-serif;">
@@ -203,8 +203,9 @@ def do_action(action):
 
 def reset_env():
     global current_state, current_phase, world, model
+    world = GridWorld()  # Create fresh world
+    model = WorldModel()  # Create fresh model
     current_state = world.reset()
-    model = WorldModel()
     current_phase = "observe"
     html, stats = get_display()
     return html, stats, "Environment reset!"
@@ -214,14 +215,8 @@ with gr.Blocks(title="World Model Demo", theme=gr.themes.Soft()) as demo:
     gr.Markdown("""
     # 🧠 World Model Demo
 
-
-
-    **The Learning Cycle:**
-    1. **Observe** - Agent perceives current state
-    2. **Predict** - World model predicts action outcomes
-    3. **Plan** - Agent evaluates possible futures
-    4. **Act** - Execute chosen action
-    5. **Learn** - Update model from observed outcome
+    **What is this?** An interactive demonstration of how AI agents can build internal "mental models"
+    of the world to plan and reason, rather than just reacting to inputs.
     """)
 
     with gr.Row():
@@ -233,9 +228,9 @@ with gr.Blocks(title="World Model Demo", theme=gr.themes.Soft()) as demo:
     with gr.Column(scale=1):
         gr.Markdown("### Controls")
         with gr.Row():
-            gr.Button("…
+            gr.Button("", visible=False, min_width=1)
             up_btn = gr.Button("⬆️ Up")
-            gr.Button("…
+            gr.Button("", visible=False, min_width=1)
         with gr.Row():
             left_btn = gr.Button("⬅️ Left")
             down_btn = gr.Button("⬇️ Down")
@@ -244,16 +239,62 @@ with gr.Blocks(title="World Model Demo", theme=gr.themes.Soft()) as demo:
         reset_btn = gr.Button("🔄 Reset", variant="secondary")
 
         gr.Markdown("""
-        …
-        Used in: MuZero, Dreamer, PlaNet
+        ---
+        **The Learning Cycle:**
+        1. 🔍 **Observe** - Perceive state
+        2. 🔮 **Predict** - Imagine outcomes
+        3. ⚡ **Act** - Execute action
+        4. 🔄 **Learn** - Update model
         """)
 
+    # Educational content in collapsible sections
+    with gr.Accordion("📚 What is a World Model?", open=False):
+        gr.Markdown("""
+        A **world model** is an internal representation that an AI agent uses to *simulate* the
+        environment without actually interacting with it. Think of it as the agent's "imagination."
+
+        **Instead of pure trial-and-error, an agent with a world model can:**
+        - 🎯 **Imagine** possible futures ("what if I do X?")
+        - ⚖️ **Evaluate** which imagined future looks best
+        - 🗺️ **Plan** a sequence of actions to reach that future
+        - ✅ **Act** with confidence, having already "seen" the outcome
+
+        **Real examples:** MuZero (mastered Go/chess without knowing the rules), Dreamer (robot control),
+        IRIS (Atari from pixels)
+        """)
+
+    with gr.Accordion("🤔 How is this different from ChatGPT/Claude?", open=False):
+        gr.Markdown("""
+        | Aspect | Language Model (GPT, Claude) | World Model (This Demo) |
+        |--------|------------------------------|-------------------------|
+        | **Predicts** | Next *word* in a sequence | Next *state* given an action |
+        | **Training** | Text prediction | Reward from environment |
+        | **"Thinking"** | Generates plausible text | Simulates physical outcomes |
+        | **Planning** | Implicit (chain-of-thought) | Explicit (tree search) |
+        | **Grounding** | Statistical text patterns | Causal dynamics |
+
+        **Example:**
+        - **LLM**: "If I push a ball off a table..." → generates plausible *text*
+        - **World Model**: state(ball on table) + action(push) → predicts the actual *trajectory*
+
+        Language models learn *what sounds right*. World models learn *what actually happens*.
+        """)
+
+    with gr.Accordion("🔬 Why does this matter for AI Safety?", open=False):
+        gr.Markdown("""
+        World models are crucial for AI safety research because:
+
+        - **Predictability**: Agents that plan can be analyzed - we can inspect what futures they're considering
+        - **Corrigibility**: Planning agents can incorporate "avoid irreversible actions" into their search
+        - **Interpretability**: The model's predictions can be examined for accuracy and bias
+        - **Scalable Oversight**: Humans can audit the agent's "reasoning" by inspecting simulated futures
+
+        Understanding how AI systems model the world helps us build systems we can trust and verify.
+
+        ---
+        *Created by [Anthony Maio](https://huggingface.co/anthonym21) as an educational resource*
+        """)
+
     # Connect buttons
     up_btn.click(lambda: do_action("up"), outputs=[grid_display, stats_display, message_display])
     down_btn.click(lambda: do_action("down"), outputs=[grid_display, stats_display, message_display])