Spaces:

kgdrathan
/

explainer-env


kgdrathan committed on Apr 25

Commit

eb1ebe6

verified ·

1 Parent(s): c5b0dcd

Upload folder using huggingface_hub


Files changed (27)

README.md +65 -213
client.py +10 -6
models.py +43 -12
openenv_explainer_env.egg-info/PKG-INFO +3 -0
openenv_explainer_env.egg-info/SOURCES.txt +13 -1
openenv_explainer_env.egg-info/requires.txt +3 -0
out.txt +0 -0
pyproject.toml +8 -2
rewards/README.md +107 -0
rewards/__init__.py +16 -0
rewards/exploration.py +138 -0
rewards/generation.py +218 -0
rewards/llm_judge.py +132 -0
rewards/notes.ipynb +53 -0
rewards/sandbox.py +83 -0
rewards/sources.py +321 -0
server/explainer_env_environment.py +264 -346
task_bank.py +4 -33
tests/__init__.py +0 -0
tests/run_tests.sh +76 -0
tests/test_client_server.py +96 -0
tests/test_docker.py +113 -0
tests/test_environment.py +163 -0
tests/test_models.py +77 -0
tests/test_rewards.py +217 -0
tests/test_task_bank.py +58 -0
uv.lock +102 -70

README.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 title: Explainer Env Environment Server
-emoji: 💻
 colorFrom: pink
 colorTo: gray
 sdk: docker
@@ -11,245 +11,97 @@ tags:
   - openenv
 ---
-# Explainer Env Environment
-A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
-## Quick Start
-The simplest way to use the Explainer Env environment is through the `ExplainerEnv` class:
-```python
-from explainer_env import ExplainerAction, ExplainerEnv
-try:
-    # Create environment from Docker image
-    explainer_envenv = ExplainerEnv.from_docker_image("explainer_env-env:latest")
-    # Reset
-    result = explainer_envenv.reset()
-    print(f"Reset: {result.observation.echoed_message}")
-    # Send multiple messages
-    messages = ["Hello, World!", "Testing echo", "Final message"]
-    for msg in messages:
-        result = explainer_envenv.step(ExplainerAction(message=msg))
-        print(f"Sent: '{msg}'")
-        print(f"  → Echoed: '{result.observation.echoed_message}'")
-        print(f"  → Length: {result.observation.message_length}")
-        print(f"  → Reward: {result.reward}")
-finally:
-    # Always clean up
-    explainer_envenv.close()
 ```
-That's it! The `ExplainerEnv.from_docker_image()` method handles:
-- Starting the Docker container
-- Waiting for the server to be ready
-- Connecting to the environment
-- Container cleanup when you call `close()`
-## Building the Docker Image
-Before using the environment, you need to build the Docker image:
-```bash
-# From project root
-docker build -t explainer_env-env:latest -f server/Dockerfile .
 ```
-## Deploying to Hugging Face Spaces
-You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
 ```bash
-# From the environment directory (where openenv.yaml is located)
-openenv push
-# Or specify options
-openenv push --namespace my-org --private
 ```
-The `openenv push` command will:
-1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
-2. Prepare a custom build for Hugging Face Docker space (enables web interface)
-3. Upload to Hugging Face (ensuring you're logged in)
-### Prerequisites
-- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
-### Options
-- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
-- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
-- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
-- `--private`: Deploy the space as private (default: public)
-### Examples
 ```bash
-# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
-openenv push
-# Push to a specific repository
-openenv push --repo-id my-org/my-env
-# Push with a custom base image
-openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
-# Push as a private space
-openenv push --private
-# Combine options
-openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
-```
-After deployment, your space will be available at:
-`https://huggingface.co/spaces/<repo-id>`
-The deployed space includes:
-- **Web Interface** at `/web` - Interactive UI for exploring the environment
-- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
-- **Health Check** at `/health` - Container health monitoring
-- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
-## Environment Details
-### Action
-**ExplainerAction**: Contains a single field
-- `message` (str) - The message to echo back
-### Observation
-**ExplainerObservation**: Contains the echo response and metadata
-- `echoed_message` (str) - The message echoed back
-- `message_length` (int) - Length of the message
-- `reward` (float) - Reward based on message length (length × 0.1)
-- `done` (bool) - Always False for echo environment
-- `metadata` (dict) - Additional info like step count
-### Reward
-The reward is calculated as: `message_length × 0.1`
-- "Hi" → reward: 0.2
-- "Hello, World!" → reward: 1.3
-- Empty message → reward: 0.0
-## Advanced Usage
-### Connecting to an Existing Server
-If you already have a Explainer Env environment server running, you can connect directly:
-```python
-from explainer_env import ExplainerEnv
-# Connect to existing server
-explainer_envenv = ExplainerEnv(base_url="<ENV_HTTP_URL_HERE>")
-# Use as normal
-result = explainer_envenv.reset()
-result = explainer_envenv.step(ExplainerAction(message="Hello!"))
 ```
-Note: When connecting to an existing server, `explainer_envenv.close()` will NOT stop the server.
-### Using the Context Manager
-The client supports context manager usage for automatic connection management:
 ```python
-from explainer_env import ExplainerAction, ExplainerEnv
-# Connect with context manager (auto-connects and closes)
-with ExplainerEnv(base_url="http://localhost:8000") as env:
-    result = env.reset()
-    print(f"Reset: {result.observation.echoed_message}")
-    # Multiple steps with low latency
-    for msg in ["Hello", "World", "!"]:
-        result = env.step(ExplainerAction(message=msg))
-        print(f"Echoed: {result.observation.echoed_message}")
-```
-The client uses WebSocket connections for:
-- **Lower latency**: No HTTP connection overhead per request
-- **Persistent session**: Server maintains your environment state
-- **Efficient for episodes**: Better for many sequential steps
-### Concurrent WebSocket Sessions
-The server supports multiple concurrent WebSocket connections. To enable this,
-modify `server/app.py` to use factory mode:
-```python
-# In server/app.py - use factory mode for concurrent sessions
-app = create_app(
-    ExplainerEnvironment,  # Pass class, not instance
-    ExplainerAction,
-    ExplainerObservation,
-    max_concurrent_envs=4,  # Allow 4 concurrent sessions
-)
-```
-Then multiple clients can connect simultaneously:
-```python
-from explainer_env import ExplainerAction, ExplainerEnv
 from concurrent.futures import ThreadPoolExecutor
 def run_episode(client_id: int):
-    with ExplainerEnv(base_url="http://localhost:8000") as env:
-        result = env.reset()
-        for i in range(10):
-            result = env.step(ExplainerAction(message=f"Client {client_id}, step {i}"))
-        return client_id, result.observation.message_length
-# Run 4 episodes concurrently
 with ThreadPoolExecutor(max_workers=4) as executor:
     results = list(executor.map(run_episode, range(4)))
 ```
-## Development & Testing
-### Direct Environment Testing
-Test the environment logic directly without starting the HTTP server:
-```bash
-# From the server directory
-python3 server/explainer_env_environment.py
-```
-This verifies that:
-- Environment resets correctly
-- Step executes actions properly
-- State tracking works
-- Rewards are calculated correctly
-### Running Locally
-Run the server locally for development:
-```bash
-uvicorn server.app:app --reload
-```
-## Project Structure
-```
-explainer_env/
-├── .dockerignore         # Docker build exclusions
-├── __init__.py            # Module exports
-├── README.md              # This file
-├── openenv.yaml           # OpenEnv manifest
-├── pyproject.toml         # Project metadata and dependencies
-├── uv.lock                # Locked dependencies (generated)
-├── client.py              # ExplainerEnv client
-├── models.py              # Action and Observation models
-└── server/
-    ├── __init__.py        # Server module exports
-    ├── explainer_env_environment.py  # Core environment logic
-    ├── app.py             # FastAPI application (HTTP + WebSocket endpoints)
-    └── Dockerfile         # Container image definition
-```

 ---
 title: Explainer Env Environment Server
+emoji: "\U0001F4BB"
 colorFrom: pink
 colorTo: gray
 sdk: docker
   - openenv
 ---
+# Research → Interactive Explainer Environment
+An OpenEnv RL environment that trains small language models to create interactive educational content. Given a research topic, the agent:
+1. **Explores** — searches HuggingFace Papers (ML topics) or Wikipedia (general topics) for relevant content
+2. **Generates** — produces a **Marimo** reactive notebook or **Manim** math animation explaining the topic
+The agent learns *what* to search, *when to stop exploring*, and how to produce high-quality interactive explanations.
+## Episode Flow
 ```
+reset() → topic + tier assigned
+  ↓
+explore × 0..3 → search queries, accumulate research context
+  ↓
+generate × 1 → produce marimo/manim code → episode ends
 ```
+Each step returns a per-step reward. See [rewards/README.md](rewards/README.md) for the full reward breakdown.
+## Quick Start
 ```bash
+# Install & run locally
+cd explainer_env && uv sync
+uv run server  # http://localhost:8000
+# Client usage
+python -c "
+from client import ExplainerEnv
+from models import ExplainerAction
+with ExplainerEnv(base_url='http://localhost:8000').sync() as sc:
+    result = sc.reset()
+    print(f'Topic: {result.observation.topic}, Tier: {result.observation.tier}')
+    # Explore
+    result = sc.step(ExplainerAction(action_type='explore', query=result.observation.topic))
+    print(f'Explore reward: {result.reward:.3f}')
+    # Generate
+    result = sc.step(ExplainerAction(
+        action_type='generate',
+        format='marimo',
+        code='import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n    mo.md(\"# Hello\")\n    return\n',
+    ))
+    print(f'Generate reward: {result.reward:.3f}, done: {result.done}')
+"
 ```
+## LLM-as-Judge (Optional Eval)
+For final evaluation of explanation quality, an optional LLM judge scores outputs on clarity, accuracy, engagement, completeness, and appropriateness.
+**Not used during training** — too slow and non-deterministic for RL rewards. Training uses 12 fast heuristic reward components instead.
 ```bash
+# Configure (any OpenAI-compatible endpoint)
+export JUDGE_API_URL="http://localhost:11434/v1"  # e.g. ollama
+export JUDGE_MODEL="llama3"
+# Usage
+python -c "
+from rewards.llm_judge import judge_explainability, is_available
+if is_available():
+    score, details = judge_explainability(code='...', topic='Linear Regression', tier='beginner')
+    print(f'Score: {score:.2f}, Rationale: {details.get(\"rationale\", \"\")}')"
 ```
+See [rewards/README.md](rewards/README.md) for full configuration details.
+## Concurrent WebSocket Sessions
+The server supports multiple concurrent WebSocket connections for parallel training rollouts:
 ```python
+from client import ExplainerEnv
+from models import ExplainerAction
 from concurrent.futures import ThreadPoolExecutor
 def run_episode(client_id: int):
+    with ExplainerEnv(base_url="http://localhost:8000").sync() as sc:
+        result = sc.reset()
+        result = sc.step(ExplainerAction(action_type="explore", query=result.observation.topic))
+        result = sc.step(ExplainerAction(
+            action_type="generate", format="marimo",
+            code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n    return\n",
+        ))
+        return client_id, result.reward
 with ThreadPoolExecutor(max_workers=4) as executor:
     results = list(executor.map(run_episode, range(4)))
 ```

client.py CHANGED Viewed

@@ -16,12 +16,16 @@ class ExplainerEnv(
     Client for the Research → Interactive Explainer environment.
     Example:
-        >>> with ExplainerEnv(base_url="http://localhost:8000") as client:
-        ...     result = client.reset()
-        ...     print(result.observation.topic)
-        ...     action = ExplainerAction(format="marimo", code="import marimo...")
-        ...     result = client.step(action)
-        ...     print(result.reward)
     """
     def _step_payload(self, action: ExplainerAction) -> Dict:

     Client for the Research → Interactive Explainer environment.
     Example:
+        >>> with ExplainerEnv(base_url="http://localhost:8000").sync() as sc:
+        ...     result = sc.reset()
+        ...     # Explore phase
+        ...     result = sc.step(ExplainerAction(
+        ...         action_type="explore", query="attention mechanism transformers"
+        ...     ))
+        ...     # Generate phase
+        ...     result = sc.step(ExplainerAction(
+        ...         action_type="generate", format="marimo", code="import marimo..."
+        ...     ))
     """
     def _step_payload(self, action: ExplainerAction) -> Dict:

models.py CHANGED Viewed

@@ -1,8 +1,9 @@
 """
 Data models for the Research → Interactive Explainer environment.
-The agent receives a topic/paper and generates interactive educational content
-as either a Marimo notebook or Manim animation (with narration script).
 """
 from typing import Literal
@@ -12,29 +13,59 @@ from pydantic import Field
 class ExplainerAction(Action):
-    """Action: agent chooses a format and generates code (+ optional narration)."""
-    format: Literal["marimo", "manim"] = Field(
-        ..., description="Output format: 'marimo' for interactive notebook, 'manim' for animation"
     )
-    code: str = Field(..., description="Complete Python source code (Marimo .py or Manim Scene)")
     narration: str = Field(
         default="",
-        description="Scene-by-scene narration script (required when format is 'manim')",
     )
 class ExplainerObservation(Observation):
-    """Observation: the topic to explain and feedback on the last attempt."""
     topic: str = Field(default="", description="Title of the topic or paper")
     content: str = Field(default="", description="Abstract or concept description")
     tier: Literal["beginner", "intermediate", "advanced"] = Field(
         default="beginner", description="Explanation depth tier"
     )
-    keywords: str = Field(default="", description="Comma-separated key terms from the source")
-    category: str = Field(default="", description="arXiv category or domain (e.g. cs.LG, math.NA)")
     data_available: bool = Field(
-        default=False, description="Whether the topic references datasets/numbers"
     )
-    feedback: str = Field(default="", description="Feedback on the last action (execution result)")

 """
 Data models for the Research → Interactive Explainer environment.
+Two-phase episode:
+  1. Explore: agent searches for papers/resources (0-3 steps)
+  2. Generate: agent produces marimo/manim code (1 step, ends episode)
 """
 from typing import Literal
 class ExplainerAction(Action):
+    """Action: agent either explores (searches) or generates (produces code)."""
+    action_type: Literal["explore", "generate"] = Field(
+        ..., description="'explore' to search for papers, 'generate' to produce code"
+    )
+    # -- explore fields --
+    query: str = Field(
+        default="",
+        description="Search query for HF Papers/Wikipedia (used when action_type='explore')",
+    )
+    # -- generate fields --
+    format: Literal["marimo", "manim"] | None = Field(
+        default=None,
+        description="Output format (required when action_type='generate')",
+    )
+    code: str = Field(
+        default="",
+        description="Complete Python source code (required when action_type='generate')",
     )
     narration: str = Field(
         default="",
+        description="Narration script (required when format='manim')",
     )
 class ExplainerObservation(Observation):
+    """Observation returned to the agent after each step."""
+    # -- task info (set on reset, echoed back each step) --
     topic: str = Field(default="", description="Title of the topic or paper")
     content: str = Field(default="", description="Abstract or concept description")
     tier: Literal["beginner", "intermediate", "advanced"] = Field(
         default="beginner", description="Explanation depth tier"
     )
+    keywords: str = Field(default="", description="Comma-separated key terms")
     data_available: bool = Field(
+        default=False, description="Whether the topic references datasets"
+    )
+    # -- per-step feedback --
+    phase: Literal["explore", "generate", "done"] = Field(
+        default="explore", description="Current episode phase"
+    )
+    feedback: str = Field(default="", description="Feedback on the last action")
+    search_results: str = Field(
+        default="", description="Papers/snippets returned from an explore step"
+    )
+    explored_context: str = Field(
+        default="",
+        description="Accumulated research context from all explore steps so far",
+    )
+    explore_steps_left: int = Field(
+        default=3, description="Remaining explore steps before forced generate"
     )
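The field descriptions above mark `format` and `code` as required for `generate`, and `narration` as required for `manim`, yet the defaults make every one of them optional at the type level. Below is a minimal sketch of how those cross-field rules could be checked. It uses a plain dataclass stand-in rather than the project's `Action` base class; `ActionSketch` and `validate_action` are hypothetical names, and nothing in this diff confirms where (or whether) the environment enforces these rules.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class ActionSketch:
    """Hypothetical stand-in mirroring the ExplainerAction fields."""

    action_type: Literal["explore", "generate"]
    query: str = ""
    format: Optional[Literal["marimo", "manim"]] = None
    code: str = ""
    narration: str = ""


def validate_action(action: ActionSketch) -> list[str]:
    """Return the cross-field problems implied by the field descriptions."""
    problems: list[str] = []
    if action.action_type == "generate":
        if action.format is None:
            problems.append("format is required when action_type='generate'")
        if not action.code.strip():
            problems.append("code is required when action_type='generate'")
        if action.format == "manim" and not action.narration.strip():
            problems.append("narration is required when format='manim'")
    return problems


ok = ActionSketch(action_type="generate", format="marimo", code="import marimo as mo")
bad = ActionSketch(action_type="generate", format="manim", code="from manim import *")
print(validate_action(ok))   # empty list: nothing missing
print(validate_action(bad))  # one problem: narration missing for manim
```

Returning a list of problems instead of raising keeps the sketch usable as step feedback; the real environment may handle invalid actions differently.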

openenv_explainer_env.egg-info/PKG-INFO CHANGED Viewed

@@ -6,6 +6,9 @@ Requires-Python: >=3.10
 Requires-Dist: openenv-core[core]>=0.2.2
 Requires-Dist: marimo>=0.10.0
 Requires-Dist: manim>=0.18.0
 Provides-Extra: dev
 Requires-Dist: pytest>=8.0.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

 Requires-Dist: openenv-core[core]>=0.2.2
 Requires-Dist: marimo>=0.10.0
 Requires-Dist: manim>=0.18.0
+Requires-Dist: wikipedia-api>=0.14.1
+Requires-Dist: huggingface-hub>=1.12.0
+Requires-Dist: httpx>=0.28.1
 Provides-Extra: dev
 Requires-Dist: pytest>=8.0.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_explainer_env.egg-info/SOURCES.txt CHANGED Viewed

@@ -14,6 +14,18 @@ openenv_explainer_env.egg-info/dependency_links.txt
 openenv_explainer_env.egg-info/entry_points.txt
 openenv_explainer_env.egg-info/requires.txt
 openenv_explainer_env.egg-info/top_level.txt
 server/__init__.py
 server/app.py
-server/explainer_env_environment.py

 openenv_explainer_env.egg-info/entry_points.txt
 openenv_explainer_env.egg-info/requires.txt
 openenv_explainer_env.egg-info/top_level.txt
+rewards/__init__.py
+rewards/exploration.py
+rewards/generation.py
+rewards/llm_judge.py
+rewards/sandbox.py
+rewards/sources.py
 server/__init__.py
 server/app.py
+server/explainer_env_environment.py
+tests/test_client_server.py
+tests/test_docker.py
+tests/test_environment.py
+tests/test_models.py
+tests/test_rewards.py
+tests/test_task_bank.py

openenv_explainer_env.egg-info/requires.txt CHANGED Viewed

@@ -1,6 +1,9 @@
 openenv-core[core]>=0.2.2
 marimo>=0.10.0
 manim>=0.18.0
 [dev]
 pytest>=8.0.0

 openenv-core[core]>=0.2.2
 marimo>=0.10.0
 manim>=0.18.0
+wikipedia-api>=0.14.1
+huggingface-hub>=1.12.0
+httpx>=0.28.1
 [dev]
 pytest>=8.0.0

out.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

pyproject.toml CHANGED Viewed

@@ -11,6 +11,9 @@ dependencies = [
     "openenv-core[core]>=0.2.2",
     "marimo>=0.10.0",
     "manim>=0.18.0",
 ]
 [project.optional-dependencies]
@@ -24,5 +27,8 @@ server = "explainer_env.server.app:main"
 [tool.setuptools]
 include-package-data = true
-packages = ["explainer_env", "explainer_env.server"]
-package-dir = { "explainer_env" = ".", "explainer_env.server" = "server" }

     "openenv-core[core]>=0.2.2",
     "marimo>=0.10.0",
     "manim>=0.18.0",
+    "wikipedia-api>=0.14.1",
+    "huggingface-hub>=1.12.0",
+    "httpx>=0.28.1",
 ]
 [project.optional-dependencies]
 [tool.setuptools]
 include-package-data = true
+packages = ["explainer_env", "explainer_env.server", "explainer_env.rewards"]
+package-dir = { "explainer_env" = ".", "explainer_env.server" = "server", "explainer_env.rewards" = "rewards" }
+[dependency-groups]
+dev = []

rewards/README.md ADDED Viewed

	@@ -0,0 +1,107 @@

+# Rewards
+Multi-component reward system for the two-phase explore → generate episode.
+## Episode Flow
+```
+reset() → [explore × 0..3] → generate × 1 → done
+```
+Each step returns a per-step reward. The agent learns both *what* to explore and *when to stop*.
+## Exploration Rewards (`exploration.py`)
+Per-step reward for each `explore` action. Gated by information need — once the agent has enough info, further exploration yields diminishing returns.
+| Component | Weight | Range | Description |
+|---|---|---|---|
+| `query_relevance` | 0.40 | 0–1 | Topic + keyword overlap with search query |
+| `result_novelty` | 0.30 | 0–1 | New words vs. already-seen content |
+| `research_breadth` | 0.10 | 0–1 | Number of sources gathered (target >= 2) |
+| `content_sufficiency` | 0.20 | 0–1 | Keyword coverage across task + research (gates reward) |
+| `step_cost` | -0.05 | flat | Per-step penalty — exploration must justify itself |
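+**Gating mechanism**: `info_need = 1 - sufficiency`. The raw reward (weighted query relevance + novelty + breadth) is scaled by `0.3 + 0.7 * info_need`; `0.20 * info_need` is then added, the step cost is subtracted, and the total is floored at 0. High sufficiency → low reward for more exploration, which teaches the agent to stop when it has enough.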
+## Generation Rewards (`generation.py`)
+Single reward on the `generate` action that ends the episode.
+| Component | Weight | Range | Description |
+|---|---|---|---|
+| `code_valid` | 0.15 | 0/1 | AST parses without errors |
+| `code_runs` | 0.15 | 0/1 | Sandbox execution succeeds (marimo export / manim render) |
+| `coverage` | 0.15 | 0–1 | Fraction of task keywords in generated code |
+| `format_match` | 0.10 | 0.3/1.0 | Chosen format matches task's preferred format (1.0 if task has no preference) |
+| `structure` | 0.15* | 0–1 | Structural quality (cells/scenes, UI elements, viz) |
+| `narration` | 0.10* | 0–1 | Narration quality (manim only; words, scene markers) |
+| `context_usage` | 0.20 | 0–1 | Code references terms from exploration research |
+*For marimo format, narration weight (0.10) is redistributed to structure (→ 0.25 total).
+**Skip penalty**: Generating without any exploration incurs a -0.1 penalty.
+## Search Sources (`sources.py`)
+All search calls are **async** (httpx + wikipediaapi.AsyncWikipedia). Content is retrieved at section/chunk level and ranked using **BM25** to surface the most relevant parts.
+| Source | Library | Use Case | Retrieval |
+|---|---|---|---|
+| HuggingFace Papers | httpx → `huggingface.co/api/papers/search` + `papers/{id}.md` | ML/AI topics (semantic search) | Search → top paper → read markdown → BM25 chunk ranking |
+| Wikipedia | `wikipediaapi.AsyncWikipedia` | Math, algorithms, general topics | Search → top page → section tree → BM25 section ranking |
+**Routing**: ML-related queries (detected by keyword heuristic) → HF Papers. Everything else → Wikipedia. Agent can override with prefix: `hf: query` or `wiki: query`. No explicit routing reward — bad routing leads to weak content → low novelty/relevance naturally.
+**Top 1 result** by default from each source, with top-3 BM25-ranked sections/chunks returned.
+## Sandbox (`sandbox.py`)
+| Check | Tool | Timeout |
+|---|---|---|
+| `ast_parses` | Python `ast.parse` | — |
+| `run_marimo` | `marimo export html` | 15s |
+| `run_manim` | `manim render -ql` | 30s |
+## LLM-as-Judge (`llm_judge.py`)
+**Eval-only** — not used in the training loop (too slow, non-deterministic for RL reward signals).
+### What it scores
+5 dimensions on a 1-10 scale, normalized to 0-1:
+| Dimension | Description |
+|---|---|
+| Clarity | Is the concept explained clearly for the target tier? |
+| Accuracy | Is the content technically correct? |
+| Engagement | Does the code create an engaging, interactive experience? |
+| Completeness | Does it cover the key aspects of the topic? |
+| Appropriateness | Is the depth appropriate for the audience tier? |
+### Configuration
+Set environment variables:
+- `JUDGE_API_URL` (required) — OpenAI-compatible endpoint (e.g. vLLM, ollama, OpenAI)
+- `JUDGE_API_KEY` (optional) — Bearer token for the API
+- `JUDGE_MODEL` (optional, default: `gpt-4o-mini`) — Model to use for judging
+### Usage
+```python
+from rewards.llm_judge import judge_explainability, is_available
+if is_available():
+    score, details = judge_explainability(
+        code="import marimo as mo\n...",
+        topic="Linear Regression",
+        tier="beginner",
+        fmt="marimo",
+    )
+    print(f"Explainability score: {score:.2f}")
+    print(f"Rationale: {details.get('rationale', '')}")
+```
+### What's used during training instead
+During GRPO training, the 12 heuristic reward components above provide the learning signal. They are deterministic, fast (<1ms per step), and decomposable for debugging. The LLM-as-judge is reserved for final evaluation and human-interpretable quality assessment.
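The generation weights listed in this README, including the marimo redistribution, can be read as a weighted sum. The sketch below follows the README table only: the names `generation_weights` and `combine_generation_scores`, the `explored` flag, and the choice to subtract the skip penalty from the weighted sum without clamping are assumptions, not something the visible part of `generation.py` confirms.

```python
# Weights copied from the "Generation Rewards" table.
BASE_WEIGHTS = {
    "code_valid": 0.15,
    "code_runs": 0.15,
    "coverage": 0.15,
    "format_match": 0.10,
    "structure": 0.15,
    "narration": 0.10,
    "context_usage": 0.20,
}
SKIP_PENALTY = 0.1  # applied when the agent generates without exploring


def generation_weights(fmt: str) -> dict[str, float]:
    """Return per-component weights; marimo moves narration's share to structure."""
    weights = dict(BASE_WEIGHTS)
    if fmt == "marimo":
        weights["structure"] += weights["narration"]
        weights["narration"] = 0.0
    return weights


def combine_generation_scores(scores: dict[str, float], fmt: str, explored: bool) -> float:
    """Weighted sum of component scores, minus the skip penalty if applicable."""
    weights = generation_weights(fmt)
    total = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    if not explored:
        total -= SKIP_PENALTY  # assumption: subtracted directly, no clamping
    return total


perfect = {name: 1.0 for name in BASE_WEIGHTS}
print(round(combine_generation_scores(perfect, "marimo", explored=True), 3))
print(round(combine_generation_scores(perfect, "marimo", explored=False), 3))
```

Under these assumptions the weights sum to 1.0 for both formats, so a perfect marimo notebook still scores the maximum even though it has no narration.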

rewards/__init__.py ADDED Viewed

	@@ -0,0 +1,16 @@

+"""Reward components for the Explainer environment."""
+from .exploration import compute_explore_reward
+from .generation import compute_generate_reward
+from .sandbox import run_marimo, run_manim
+from .sources import search, search_hf_papers, search_wikipedia
+__all__ = [
+    "compute_explore_reward",
+    "compute_generate_reward",
+    "run_marimo",
+    "run_manim",
+    "search",
+    "search_hf_papers",
+    "search_wikipedia",
+]

rewards/exploration.py ADDED Viewed

	@@ -0,0 +1,138 @@

+"""Reward components for the exploration phase.
+During exploration, the agent searches for papers/resources relevant to the
+task topic. Rewards measure query quality, result novelty, research breadth,
+and exploration efficiency (knowing when to stop).
+"""
+from __future__ import annotations
+def query_relevance(query: str, topic: str, keywords_csv: str) -> float:
+    """Score how relevant the search query is to the task (0-1)."""
+    if not query or not query.strip():
+        return 0.0
+    query_lower = query.strip().lower()
+    score = 0.0
+    if topic.lower() in query_lower:
+        score += 0.4
+    keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
+    if keywords:
+        hits = sum(1 for kw in keywords if kw in query_lower)
+        score += 0.4 * (hits / len(keywords))
+    if len(query_lower.split()) >= 3:
+        score += 0.2
+    return min(1.0, score)
+def result_novelty(
+    new_content: str, accumulated_context: list[str]
+) -> float:
+    """Score how much new information this result adds (0-1).
+    Penalises repeated searches that return content already seen.
+    """
+    if not new_content or not new_content.strip():
+        return 0.0
+    if not accumulated_context:
+        return 1.0
+    new_words = set(new_content.lower().split())
+    seen_words: set[str] = set()
+    for ctx in accumulated_context:
+        seen_words.update(ctx.lower().split())
+    if not new_words:
+        return 0.0
+    novel = new_words - seen_words
+    return min(1.0, len(novel) / max(len(new_words), 1))
+def research_breadth(accumulated_context: list[str], min_sources: int = 2) -> float:
+    """Score whether the agent gathered enough sources (0-1)."""
+    n = len(accumulated_context)
+    if n >= min_sources:
+        return 1.0
+    return n / min_sources
+def content_sufficiency(
+    task_content: str,
+    keywords_csv: str,
+    accumulated_context: list[str],
+) -> float:
+    """Measure how many of the task's keywords are already covered (0-1).
+    Combines the task's own content with accumulated research. When this is
+    high (>0.8), further exploration has diminishing value — the agent already
+    has enough information.
+    """
+    keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
+    if not keywords:
+        return 1.0  # no keywords to cover
+    # Build combined text from task content + all research so far
+    combined = task_content.lower()
+    for ctx in accumulated_context:
+        combined += " " + ctx.lower()
+    hits = sum(1 for kw in keywords if kw in combined)
+    return hits / len(keywords)
+# -- Weights --
+W_QUERY = 0.40
+W_NOVELTY = 0.30
+W_BREADTH = 0.10
+W_SUFFICIENCY_GATE = 0.20  # gates reward by remaining information need
+# Flat cost per explore step — agent must expect enough gain to justify it
+STEP_COST = 0.05
+def compute_explore_reward(
+    query: str,
+    result_text: str,
+    topic: str,
+    keywords_csv: str,
+    task_content: str,
+    accumulated_context: list[str],
+) -> tuple[float, dict]:
+    """Compute per-step exploration reward. Returns (total, components).
+    Reward is gated by (1 - sufficiency): once the agent has enough info,
+    further exploration is nearly unrewarded. A flat step cost penalises
+    unnecessary searches.
+    """
+    q_rel = query_relevance(query, topic, keywords_csv)
+    novelty = result_novelty(result_text, accumulated_context)
+    breadth = research_breadth(accumulated_context)
+    sufficiency = content_sufficiency(task_content, keywords_csv, accumulated_context)
+    # Information need: how much value exploration still has
+    info_need = max(0.0, 1.0 - sufficiency)
+    # Raw reward from query + novelty + breadth
+    raw = W_QUERY * q_rel + W_NOVELTY * novelty + W_BREADTH * breadth
+    # Gate by info need: high sufficiency → low reward for exploring more
+    # Also add direct sufficiency-gate component so agent sees the signal
+    total = raw * (0.3 + 0.7 * info_need) + W_SUFFICIENCY_GATE * info_need - STEP_COST
+    total = max(0.0, total)
+    components = {
+        "query_relevance": round(q_rel, 3),
+        "result_novelty": round(novelty, 3),
+        "research_breadth": round(breadth, 3),
+        "content_sufficiency": round(sufficiency, 3),
+        "info_need": round(info_need, 3),
+        "step_cost": STEP_COST,
+        "explore_total": round(total, 4),
+    }
+    return total, components
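To see the gating in numbers, the combination step of `compute_explore_reward` can be isolated from the individual scorers. The sketch below copies the weights and the formula from the file above; the helper name `gated_total` and the input scores are illustrative assumptions.

```python
# Constants copied from rewards/exploration.py.
W_QUERY, W_NOVELTY, W_BREADTH = 0.40, 0.30, 0.10
W_SUFFICIENCY_GATE = 0.20
STEP_COST = 0.05


def gated_total(q_rel: float, novelty: float, breadth: float, sufficiency: float) -> float:
    """Combine component scores exactly as compute_explore_reward does."""
    info_need = max(0.0, 1.0 - sufficiency)
    raw = W_QUERY * q_rel + W_NOVELTY * novelty + W_BREADTH * breadth
    total = raw * (0.3 + 0.7 * info_need) + W_SUFFICIENCY_GATE * info_need - STEP_COST
    return max(0.0, total)


# Same query quality and novelty, different remaining information need.
early = gated_total(q_rel=1.0, novelty=1.0, breadth=0.0, sufficiency=0.5)
late = gated_total(q_rel=1.0, novelty=1.0, breadth=0.0, sufficiency=1.0)
print(f"{early:.3f} {late:.3f}")  # 0.505 0.160
```

With half the keywords still uncovered, a strong search earns about 0.505; once sufficiency reaches 1.0 the same search earns about 0.160, and a weak search at full sufficiency is floored at 0.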

rewards/generation.py ADDED Viewed

	@@ -0,0 +1,218 @@

+"""Reward components for the generation phase.
+After exploration, the agent generates marimo/manim code. Rewards measure
+code quality, execution success, keyword coverage, format match, structural
+quality, and narration (manim only).
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+from .sandbox import ast_parses
+if TYPE_CHECKING:
+    from ..task_bank import Task
+# ---------------------------------------------------------------------------
+# Individual scorers
+# ---------------------------------------------------------------------------
+def keyword_coverage(code: str, keywords_csv: str) -> float:
+    """Fraction of task keywords mentioned in the code (case-insensitive)."""
+    if not keywords_csv:
+        return 0.0
+    keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
+    if not keywords:
+        return 0.0
+    code_lower = code.lower()
+    hits = sum(1 for kw in keywords if kw in code_lower)
+    return hits / len(keywords)
+def format_match(chosen_format: str, task: Task) -> float:
+    """1.0 if format matches the task's preferred format, else 0.3.
+    If the task has no preferred format (None), any choice scores 1.0.
+    """
+    if task.preferred_format is None:
+        return 1.0
+    return 1.0 if chosen_format == task.preferred_format else 0.3
+def marimo_structure(code: str, task: Task) -> float:
+    """Score structural quality of a marimo notebook (0-1)."""
+    score = 0.0
+    if "import marimo" in code or "from marimo" in code:
+        score += 0.2
+    if "marimo.App" in code or "mo.App" in code:
+        score += 0.1
+    cell_count = code.count("@app.cell")
+    if cell_count >= 3:
+        score += 0.2
+    elif cell_count >= 1:
+        score += 0.1
+    ui_patterns = ["mo.ui.", "mo.md(", "mo.Html", "mo.accordion", "mo.callout"]
+    ui_hits = sum(1 for p in ui_patterns if p in code)
+    score += min(0.2, ui_hits * 0.05)
+    viz_patterns = ["plt.", "px.", "altair", "matplotlib", "plotly", "mo.ui.slider"]
+    viz_hits = sum(1 for p in viz_patterns if p in code)
+    if task.data_available and viz_hits > 0:
+        score += 0.2
+    elif viz_hits > 0:
+        score += 0.1
+    if task.tier == "advanced" and cell_count >= 6:
+        score += 0.1
+    elif task.tier == "intermediate" and cell_count >= 4:
+        score += 0.1
+    elif task.tier == "beginner" and cell_count >= 2:
+        score += 0.1
+    return min(1.0, score)
+def manim_structure(code: str, task: Task) -> float:
+    """Score structural quality of a manim scene (0-1)."""
+    from .sandbox import extract_scene_class
+    score = 0.0
+    if "from manim" in code or "import manim" in code:
+        score += 0.2
+    if extract_scene_class(code) is not None:
+        score += 0.2
+    if "def construct" in code:
+        score += 0.1
+    anim_patterns = [
+        "self.play(",
+        "self.wait(",
+        "Create(",
+        "FadeIn(",
+        "FadeOut(",
+        "Transform(",
+        "Write(",
+        "MoveToTarget",
+        "Indicate(",
+        "ReplacementTransform(",
+    ]
+    anim_hits = sum(1 for p in anim_patterns if p in code)
+    score += min(0.3, anim_hits * 0.05)
+    math_patterns = ["MathTex(", "Tex(", "Axes(", "NumberPlane(", "Graph("]
+    math_hits = sum(1 for p in math_patterns if p in code)
+    if math_hits > 0:
+        score += 0.1
+    if task.tier == "advanced" and anim_hits >= 6:
+        score += 0.1
+    elif task.tier == "intermediate" and anim_hits >= 4:
+        score += 0.1
+    elif task.tier == "beginner" and anim_hits >= 2:
+        score += 0.1
+    return min(1.0, score)
+def structure_score(code: str, fmt: str, task: Task) -> float:
+    if fmt == "marimo":
+        return marimo_structure(code, task)
+    return manim_structure(code, task)
+def narration_score(narration: str, fmt: str) -> float:
+    """Score narration quality. Only relevant for manim format."""
+    if fmt != "manim":
+        return 1.0
+    if not narration or not narration.strip():
+        return 0.0
+    score = 0.0
+    words = narration.split()
+    if len(words) >= 30:
+        score += 0.4
+    elif len(words) >= 10:
+        score += 0.2
+    scene_markers = ["scene", "step", "first", "next", "then", "finally", "now"]
+    marker_hits = sum(1 for m in scene_markers if m in narration.lower())
+    score += min(0.3, marker_hits * 0.1)
+    if len(words) >= 50:
+        score += 0.3
+    elif len(words) >= 20:
+        score += 0.15
+    return min(1.0, score)
+def context_usage(code: str, accumulated_context: list[str]) -> float:
+    """Score whether the generated code incorporates research findings (0-1).
+    Higher score if the code references terms found during exploration.
+    """
+    if not accumulated_context:
+        return 0.5  # no exploration context to compare against
+    context_words: set[str] = set()
+    for ctx in accumulated_context:
+        context_words.update(
+            w.lower() for w in ctx.split() if len(w) > 3
+        )
+    if not context_words:
+        return 0.5
+    code_words = set(w.lower() for w in code.split() if len(w) > 3)
+    overlap = code_words & context_words
+    return min(1.0, len(overlap) / max(len(context_words), 1) * 5)
+# -- Weights for generation reward --
+W_CODE_VALID = 0.15
+W_CODE_RUNS = 0.15
+W_COVERAGE = 0.15
+W_FORMAT = 0.10
+W_STRUCTURE = 0.15
+W_NARRATION = 0.10
+W_CONTEXT_USE = 0.20  # rewards using exploration findings
+def compute_generate_reward(
+    code: str,
+    fmt: str,
+    narration: str,
+    task: Task,
+    exec_success: bool,
+    accumulated_context: list[str],
+) -> tuple[float, dict]:
+    """Compute the generation-phase reward. Returns (total, components)."""
+    c_valid = 1.0 if ast_parses(code) else 0.0
+    c_runs = 1.0 if exec_success else 0.0
+    c_coverage = keyword_coverage(code, task.keywords)
+    c_format = format_match(fmt, task)
+    c_struct = structure_score(code, fmt, task)
+    c_narr = narration_score(narration, fmt)
+    c_ctx = context_usage(code, accumulated_context)
+    # Redistribute narration weight to structure for marimo
+    if fmt == "marimo":
+        w_struct = W_STRUCTURE + W_NARRATION
+        w_narr = 0.0
+    else:
+        w_struct = W_STRUCTURE
+        w_narr = W_NARRATION
+    total = (
+        W_CODE_VALID * c_valid
+        + W_CODE_RUNS * c_runs
+        + W_COVERAGE * c_coverage
+        + W_FORMAT * c_format
+        + w_struct * c_struct
+        + w_narr * c_narr
+        + W_CONTEXT_USE * c_ctx
+    )
+    components = {
+        "code_valid": round(c_valid, 3),
+        "code_runs": round(c_runs, 3),
+        "coverage": round(c_coverage, 3),
+        "format_match": round(c_format, 3),
+        "structure": round(c_struct, 3),
+        "narration": round(c_narr, 3),
+        "context_usage": round(c_ctx, 3),
+        "generate_total": round(total, 4),
+    }
+    return total, components
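Review note: the seven weights above sum to 1.0, and for marimo the narration weight is folded into structure, so a notebook is not penalized for having no narration. A minimal sketch of that arithmetic follows; `weighted_total` is a hypothetical helper written for this note and is not part of the commit.

```python
# Weights copied from rewards/generation.py in this diff.
WEIGHTS = {
    "code_valid": 0.15,
    "code_runs": 0.15,
    "coverage": 0.15,
    "format_match": 0.10,
    "structure": 0.15,
    "narration": 0.10,
    "context_usage": 0.20,
}


def weighted_total(components: dict, fmt: str) -> float:
    """Hypothetical helper mirroring the weighted sum in compute_generate_reward."""
    weights = dict(WEIGHTS)
    if fmt == "marimo":
        # Narration does not apply to notebooks: move its weight to structure.
        weights["structure"] += weights["narration"]
        weights["narration"] = 0.0
    return sum(weights[name] * components[name] for name in weights)


perfect = {name: 1.0 for name in WEIGHTS}
no_narration = {**perfect, "narration": 0.0}

print(round(weighted_total(perfect, "marimo"), 4))
print(round(weighted_total(no_narration, "manim"), 4))
print(round(weighted_total(no_narration, "marimo"), 4))
```

With every other component at 1.0, a manim submission without narration loses the full narration weight, while a marimo submission is unaffected.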

rewards/llm_judge.py ADDED

	@@ -0,0 +1,132 @@

+"""Optional LLM-as-judge for final explainability scoring.
+This module is eval-only — it is NOT used in the training loop because
+LLM judge calls are too slow and non-deterministic for RL reward signals.
+Usage at eval time:
+    score, rationale = judge_explainability(code, topic, tier)
+Requires an OpenAI-compatible endpoint (e.g. vLLM, ollama, or OpenAI API).
+Set JUDGE_API_URL and optionally JUDGE_API_KEY environment variables.
+"""
+from __future__ import annotations
+import json
+import os
+import urllib.request
+JUDGE_API_URL = os.environ.get("JUDGE_API_URL", "")
+JUDGE_API_KEY = os.environ.get("JUDGE_API_KEY", "")
+JUDGE_MODEL = os.environ.get("JUDGE_MODEL", "gpt-4o-mini")
+JUDGE_PROMPT = """\
+You are an expert educator evaluating the quality of an interactive explanation.
+TOPIC: {topic}
+AUDIENCE TIER: {tier}
+FORMAT: {fmt}
+CODE:
+```
+{code}
+```
+{narration_section}
+Rate the explanation on a scale of 1-10 across these dimensions:
+1. **Clarity**: Is the concept explained clearly for the target audience tier?
+2. **Accuracy**: Is the content technically correct?
+3. **Engagement**: Does the code create an engaging, interactive experience?
+4. **Completeness**: Does it cover the key aspects of the topic?
+5. **Appropriateness**: Is the depth appropriate for the audience tier?
+Respond in JSON format:
+{{
+  "clarity": <1-10>,
+  "accuracy": <1-10>,
+  "engagement": <1-10>,
+  "completeness": <1-10>,
+  "appropriateness": <1-10>,
+  "overall": <1-10>,
+  "rationale": "<brief explanation>"
+}}
+"""
+def judge_explainability(
+    code: str,
+    topic: str,
+    tier: str = "intermediate",
+    fmt: str = "marimo",
+    narration: str = "",
+    api_url: str | None = None,
+    api_key: str | None = None,
+    model: str | None = None,
+) -> tuple[float, dict]:
+    """Score explainability using an LLM judge.
+    Returns (normalized_score, details) where normalized_score is 0.0-1.0
+    and details contains per-dimension scores and rationale.
+    Returns (0.0, {"error": ...}) if the judge is unavailable or fails.
+    """
+    url = api_url or JUDGE_API_URL
+    key = api_key or JUDGE_API_KEY
+    mdl = model or JUDGE_MODEL
+    if not url:
+        return 0.0, {"error": "JUDGE_API_URL not configured"}
+    narration_section = ""
+    if narration and fmt == "manim":
+        narration_section = f"NARRATION:\n{narration}"
+    prompt = JUDGE_PROMPT.format(
+        topic=topic,
+        tier=tier,
+        fmt=fmt,
+        code=code[:4000],  # trim to avoid exceeding context
+        narration_section=narration_section,
+    )
+    payload = json.dumps({
+        "model": mdl,
+        "messages": [{"role": "user", "content": prompt}],
+        "temperature": 0.0,
+        "max_tokens": 300,
+    }).encode()
+    headers = {
+        "Content-Type": "application/json",
+        "User-Agent": "ExplainerEnv/1.0",
+    }
+    if key:
+        headers["Authorization"] = f"Bearer {key}"
+    try:
+        req = urllib.request.Request(
+            f"{url.rstrip('/')}/chat/completions",
+            data=payload,
+            headers=headers,
+        )
+        with urllib.request.urlopen(req, timeout=30) as resp:
+            data = json.loads(resp.read().decode())
+        content = data["choices"][0]["message"]["content"]
+        # Parse JSON from response (handle markdown code blocks)
+        content = content.strip()
+        if content.startswith("```"):
+            content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip()
+        scores = json.loads(content)
+        overall = scores.get("overall", 5) / 10.0
+        return overall, scores
+    except Exception as e:
+        return 0.0, {"error": str(e)}
+def is_available() -> bool:
+    """Check if the LLM judge is configured."""
+    return bool(JUDGE_API_URL)
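Review note: the response handling in `judge_explainability` strips an optional Markdown fence before parsing the JSON, then divides `overall` by 10. The sketch below isolates that step so it can be exercised without a live endpoint; `parse_judge_reply` and the sample reply are hypothetical and only illustrate the logic shown in the diff.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable


def parse_judge_reply(content: str) -> tuple:
    """Hypothetical helper: strip an optional fence, parse JSON, normalize to 0-1."""
    content = content.strip()
    if content.startswith(FENCE):
        # Drop the opening fence line (for example a "json" tag) and the closing fence.
        content = content.split("\n", 1)[1].rsplit(FENCE, 1)[0].strip()
    scores = json.loads(content)
    return scores.get("overall", 5) / 10.0, scores


reply = FENCE + 'json\n{"clarity": 8, "overall": 7, "rationale": "clear"}\n' + FENCE
score, details = parse_judge_reply(reply)
print(score, details["rationale"])
```

A reply that opens with a fence but contains no newline would raise `IndexError` in this sketch; in the committed function the surrounding `except Exception` turns that into a `(0.0, {"error": ...})` result.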

rewards/notes.ipynb ADDED

	@@ -0,0 +1,53 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "c55af9de",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "/Users/mmt10913/Personal/hackathons/openenv-hackathon/.venv/bin/python\n"
+     ]
+    }
+   ],
+   "source": [
+    "! which python"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4905024a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

rewards/sandbox.py ADDED

	@@ -0,0 +1,83 @@

+"""Sandbox execution for marimo and manim code."""
+import ast
+import subprocess
+import tempfile
+from pathlib import Path
+def ast_parses(code: str) -> bool:
+    """Check whether the code is valid Python (AST-parseable)."""
+    try:
+        ast.parse(code)
+        return True
+    except SyntaxError:
+        return False
+def extract_scene_class(code: str) -> str | None:
+    """Return the first Scene subclass name found in manim code."""
+    try:
+        tree = ast.parse(code)
+    except SyntaxError:
+        return None
+    for node in ast.walk(tree):
+        if isinstance(node, ast.ClassDef):
+            for base in node.bases:
+                base_name = ""
+                if isinstance(base, ast.Name):
+                    base_name = base.id
+                elif isinstance(base, ast.Attribute):
+                    base_name = base.attr
+                if "Scene" in base_name:
+                    return node.name
+    return None
+def run_marimo(code: str, timeout: int = 15) -> tuple[bool, str]:
+    """Try exporting a marimo notebook to HTML. Returns (success, message)."""
+    with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
+        f.write(code)
+        f.flush()
+        tmp = f.name
+    try:
+        result = subprocess.run(
+            ["marimo", "export", "html", tmp],
+            capture_output=True,
+            text=True,
+            timeout=timeout,
+        )
+        if result.returncode == 0:
+            return True, "marimo export succeeded"
+        return False, result.stderr[:500]
+    except FileNotFoundError:
+        return False, "marimo not installed"
+    except subprocess.TimeoutExpired:
+        return False, "marimo export timed out"
+    finally:
+        Path(tmp).unlink(missing_ok=True)
+def run_manim(code: str, timeout: int = 30) -> tuple[bool, str]:
+    """Try rendering a manim scene (low quality). Returns (success, message)."""
+    scene = extract_scene_class(code)
+    if scene is None:
+        return False, "No Scene subclass found in code"
+    with tempfile.TemporaryDirectory() as tmpdir:
+        src = Path(tmpdir) / "scene.py"
+        src.write_text(code)
+        try:
+            result = subprocess.run(
+                ["manim", "render", "-ql", "--media_dir", tmpdir, str(src), scene],
+                capture_output=True,
+                text=True,
+                timeout=timeout,
+            )
+            if result.returncode == 0:
+                return True, "manim render succeeded"
+            return False, result.stderr[:500]
+        except FileNotFoundError:
+            return False, "manim not installed"
+        except subprocess.TimeoutExpired:
+            return False, "manim render timed out"
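Review note: `extract_scene_class` decides which scene `run_manim` renders, so its matching rule is worth spelling out. The sketch below re-implements the same AST walk for illustration only (`first_scene_class` is a hypothetical name) and shows that both bare and dotted base classes are recognized, and that unparseable code yields `None`.

```python
from __future__ import annotations

import ast


def first_scene_class(code: str) -> str | None:
    """Illustrative re-implementation of extract_scene_class from this diff."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return None
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                # `Scene` parses as ast.Name (.id), `manim.Scene` as ast.Attribute (.attr).
                name = getattr(base, "id", None) or getattr(base, "attr", "")
                if "Scene" in name:
                    return node.name
    return None


plain = "from manim import *\nclass Intro(Scene):\n    def construct(self):\n        pass\n"
dotted = "import manim\nclass Demo(manim.ThreeDScene):\n    pass\n"

print(first_scene_class(plain))
print(first_scene_class(dotted))
print(first_scene_class("x = 1"))
print(first_scene_class("class Broken(:"))
```

Because the check is a substring match on the base name, any base class whose name contains "Scene" qualifies, including user-defined ones that do not derive from manim.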

rewards/sources.py ADDED

	@@ -0,0 +1,321 @@

+"""Async search sources for the exploration phase.
+Two backends:
+  - HuggingFace Papers: ML-focused semantic search via the huggingface.co papers API (httpx)
+  - Wikipedia: general topics via wikipediaapi (section-level + BM25 RAG)
+The agent's query is routed to the most appropriate source, or the agent
+can specify a source prefix (e.g. "wiki: merge sort", "hf: attention").
+All external calls use async I/O (httpx / wikipediaapi.AsyncWikipedia).
+"""
+from __future__ import annotations
+import math
+import re
+from collections import Counter
+import httpx
+import wikipediaapi
+HF_MAX_RESULTS = 1
+WIKI_TOP_SECTIONS = 3
+# BM25 parameters
+_BM25_K1 = 1.5
+_BM25_B = 0.75
+# ---------------------------------------------------------------------------
+# BM25 scoring (pure Python, no external deps)
+# ---------------------------------------------------------------------------
+_STOP_WORDS = {
+    "the", "a", "an", "is", "are", "was", "were", "be", "been", "being",
+    "have", "has", "had", "do", "does", "did", "will", "would", "could",
+    "should", "may", "might", "shall", "can", "need", "dare", "ought",
+    "to", "of", "in", "for", "on", "with", "at", "by", "from", "as",
+    "into", "through", "during", "before", "after", "and", "but", "or",
+    "not", "no", "nor", "so", "yet", "both", "either", "neither",
+    "this", "that", "these", "those", "it", "its", "he", "she", "they",
+}
+def _tokenize(text: str) -> list[str]:
+    """Lowercase alphanumeric tokenization, stop words removed."""
+    return [w for w in re.findall(r"\w+", text.lower()) if w not in _STOP_WORDS and len(w) > 1]
+def _bm25_rank(
+    query: str, documents: list[tuple[str, str]], top_k: int = 3
+) -> list[tuple[float, str, str]]:
+    """Rank (title, text) documents against query using BM25.
+    Returns top_k results sorted by score descending.
+    """
+    if not documents:
+        return []
+    query_terms = _tokenize(query)
+    if not query_terms:
+        return [(0.0, t, txt) for t, txt in documents[:top_k]]
+    # Precompute document token stats
+    doc_tokens = [_tokenize(f"{title} {text}") for title, text in documents]
+    doc_lengths = [len(t) for t in doc_tokens]
+    avgdl = sum(doc_lengths) / max(len(doc_lengths), 1)
+    n_docs = len(documents)
+    # Document frequency per query term
+    df: dict[str, int] = {}
+    for term in set(query_terms):
+        df[term] = sum(1 for tokens in doc_tokens if term in tokens)
+    # Score each document
+    scored: list[tuple[float, str, str]] = []
+    for i, (title, text) in enumerate(documents):
+        tf_counts = Counter(doc_tokens[i])
+        dl = doc_lengths[i]
+        score = 0.0
+        for term in query_terms:
+            if term not in df or df[term] == 0:
+                continue
+            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
+            tf = tf_counts.get(term, 0)
+            numerator = tf * (_BM25_K1 + 1)
+            denominator = tf + _BM25_K1 * (1 - _BM25_B + _BM25_B * dl / max(avgdl, 1))
+            score += idf * numerator / denominator
+        scored.append((score, title, text))
+    scored.sort(key=lambda x: x[0], reverse=True)
+    return scored[:top_k]
+# ---------------------------------------------------------------------------
+# Wikipedia section flattening
+# ---------------------------------------------------------------------------
+_SKIP_SECTIONS = {
+    "references", "external links", "see also", "further reading",
+    "notes", "citations", "bibliography", "sources",
+}
+def _flatten_sections(
+    sections: list[wikipediaapi.WikipediaPageSection],
+    max_depth: int = 2,
+    _depth: int = 0,
+) -> list[tuple[str, str]]:
+    """Flatten Wikipedia section tree into (title, text) pairs."""
+    result: list[tuple[str, str]] = []
+    for section in sections:
+        if section.title.lower() in _SKIP_SECTIONS:
+            continue
+        if section.text.strip():
+            result.append((section.title, section.text.strip()))
+        if _depth < max_depth and section.sections:
+            result.extend(
+                _flatten_sections(section.sections, max_depth, _depth + 1)
+            )
+    return result
+# ---------------------------------------------------------------------------
+# Wikipedia (async, section-level BM25)
+# ---------------------------------------------------------------------------
+async def search_wikipedia(
+    query: str, top_sections: int = WIKI_TOP_SECTIONS
+) -> str:
+    """Search Wikipedia and return the most relevant sections via BM25.
+    Flow: search(query) -> top page -> get sections -> BM25 rank -> top-k.
+    """
+    try:
+        wiki = wikipediaapi.AsyncWikipedia(
+            user_agent="ExplainerEnv/1.0 (hackathon project)",
+            language="en",
+        )
+        # Search for the top page
+        search_results = await wiki.search(query, limit=1)
+        if not search_results or not search_results.pages:
+            return f"No Wikipedia results for: {query}"
+        # pages is a dict keyed by title
+        title = next(iter(search_results.pages))
+        page = wiki.page(title)
+        # Check page exists
+        exists = await page.exists()
+        if not exists:
+            return f"No Wikipedia article found for: {query}"
+        # Get summary + sections
+        summary = await page.summary
+        sections = await page.sections
+        # Build document list: summary as first doc, then flattened sections
+        docs: list[tuple[str, str]] = []
+        if summary:
+            docs.append((title, summary))
+        docs.extend(_flatten_sections(sections))
+        if not docs:
+            return f"Wikipedia article '{title}' has no content."
+        # BM25 rank sections against query
+        ranked = _bm25_rank(query, docs, top_k=top_sections)
+        parts = []
+        for score, sec_title, sec_text in ranked:
+            # Truncate long sections to keep total size reasonable
+            trimmed = sec_text[:800] if len(sec_text) > 800 else sec_text
+            parts.append(f"## {sec_title}\n{trimmed}")
+        return f"Wikipedia: {title}\n\n" + "\n\n---\n\n".join(parts)
+    except Exception as e:
+        return f"Wikipedia search error: {e}"
+# ---------------------------------------------------------------------------
+# HuggingFace Papers (async, httpx + read_paper)
+# ---------------------------------------------------------------------------
+async def search_hf_papers(
+    query: str, max_results: int = HF_MAX_RESULTS
+) -> str:
+    """Search HuggingFace Papers (semantic search) and read top result's content.
+    Flow: search(query) -> top paper ID -> read_paper(id) -> BM25 chunk.
+    """
+    try:
+        async with httpx.AsyncClient(timeout=15.0) as client:
+            # 1. Search for papers
+            resp = await client.get(
+                "https://huggingface.co/api/papers/search",
+                params={"q": query, "limit": max_results},
+                headers={"User-Agent": "ExplainerEnv/1.0"},
+            )
+            resp.raise_for_status()
+            papers = resp.json()
+            if not papers:
+                return f"No HF papers found for: {query}"
+            paper = papers[0]
+            paper_id = paper.get("id", "")
+            title = paper.get("title", "Untitled")
+            summary = paper.get("summary", "")
+            if not paper_id:
+                # No paper ID — return just the search result
+                return f"Title: {title}\nAbstract: {summary[:600]}"
+            # 2. Read paper markdown content
+            md_resp = await client.get(
+                f"https://huggingface.co/papers/{paper_id}.md",
+                headers={"User-Agent": "ExplainerEnv/1.0"},
+                follow_redirects=True,
+            )
+            if md_resp.status_code == 200 and md_resp.text.strip():
+                md_content = md_resp.text
+                # Chunk markdown by headings
+                chunks = _chunk_markdown(md_content)
+                if chunks:
+                    ranked = _bm25_rank(query, chunks, top_k=3)
+                    parts = [f"Title: {title}\nPaper ID: {paper_id}\n"]
+                    for _score, sec_title, sec_text in ranked:
+                        trimmed = sec_text[:800] if len(sec_text) > 800 else sec_text
+                        parts.append(f"## {sec_title}\n{trimmed}")
+                    return "\n\n---\n\n".join(parts)
+            # Fallback: return abstract only
+            return (
+                f"Title: {title}\n"
+                f"Paper ID: {paper_id}\n"
+                f"Abstract: {summary[:600]}"
+            )
+    except Exception as e:
+        return f"HF Papers search error: {e}"
+def _chunk_markdown(md_text: str) -> list[tuple[str, str]]:
+    """Split markdown text into (heading, body) chunks."""
+    chunks: list[tuple[str, str]] = []
+    current_heading = "Introduction"
+    current_lines: list[str] = []
+    for line in md_text.split("\n"):
+        if line.startswith("#"):
+            # Save previous chunk
+            body = "\n".join(current_lines).strip()
+            if body:
+                chunks.append((current_heading, body))
+            # Start new chunk
+            current_heading = line.lstrip("#").strip() or "Section"
+            current_lines = []
+        else:
+            current_lines.append(line)
+    # Save last chunk
+    body = "\n".join(current_lines).strip()
+    if body:
+        chunks.append((current_heading, body))
+    return chunks
+# ---------------------------------------------------------------------------
+# Router
+# ---------------------------------------------------------------------------
+# Keywords that suggest ML/AI topics (used when category is not available)
+_ML_KEYWORDS = {
+    "neural", "network", "transformer", "attention", "embedding", "gradient",
+    "backpropagation", "cnn", "rnn", "lstm", "gpt", "bert", "diffusion",
+    "reinforcement", "generative", "discriminative", "autoencoder", "vae",
+    "gan", "fine-tuning", "pretraining", "tokenizer", "llm", "rlhf",
+    "classification", "regression", "clustering", "deep learning",
+    "machine learning", "optimization", "sgd", "adam", "batch normalization",
+}
+def _is_ml_topic(query: str) -> bool:
+    """Heuristic: does the query look like an ML/AI topic?"""
+    query_lower = query.lower()
+    return any(kw in query_lower for kw in _ML_KEYWORDS)
+async def search(query: str, category_hint: str = "") -> str:
+    """Route a search query to the best source.
+    The agent can override by prefixing the query:
+      - "hf: attention mechanism"    -> HF Papers only
+      - "wiki: merge sort"           -> Wikipedia only
+    Otherwise, uses keyword heuristic to route ML topics to HF Papers
+    and everything else to Wikipedia.
+    """
+    query = query.strip()
+    # Explicit source prefix
+    lower = query.lower()
+    if lower.startswith("hf:"):
+        return await search_hf_papers(query[3:].strip())
+    if lower.startswith("wiki:"):
+        return await search_wikipedia(query[5:].strip())
+    # Auto-route based on keyword heuristic
+    if _is_ml_topic(query) or _is_ml_topic(category_hint):
+        hf = await search_hf_papers(query)
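+        # Fall back only on the sentinel messages, not on results that merely mention "error"
+        if hf.startswith(("HF Papers search error:", "No HF papers found for:")):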
+            return await search_wikipedia(query)
+        return hf
+    # Default: Wikipedia
+    return await search_wikipedia(query)
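Review note: both search backends rank sections with the BM25 formula in `_bm25_rank`. The sketch below is a simplified, self-contained version for checking the scoring by hand. It uses the same `k1` and `b` values and the same IDF form as the diff, but it skips stop-word removal and scores plain strings instead of (title, text) pairs; the function names and sample documents are made up for this note.

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75  # same values as _BM25_K1 and _BM25_B in the diff


def tokenize(text: str) -> list:
    # Simplified: lowercase word tokens, no stop-word filtering.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list) -> list:
    """Return one BM25 score per document, in input order."""
    doc_tokens = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in doc_tokens) / max(len(doc_tokens), 1)
    n_docs = len(docs)
    scores = []
    for tokens in doc_tokens:
        tf_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in doc_tokens if term in t)
            if df == 0:
                continue  # term appears in no document
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = tf_counts[term]
            # Length normalization: longer-than-average documents are damped.
            norm = tf + K1 * (1 - B + B * len(tokens) / max(avgdl, 1))
            score += idf * tf * (K1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Merge sort splits the list in half and merges the sorted halves.",
    "Quicksort picks a pivot and partitions the list around it.",
    "The history of sorting algorithms goes back to the 1950s.",
]
scores = bm25_scores("merge sorted halves", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Only the first document contains any query term, so it is the only one with a non-zero score; matching is on exact tokens, which is why "sorting" in the third document does not match "sorted".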

server/explainer_env_environment.py CHANGED

@@ -1,17 +1,19 @@
 """
-Research → Interactive Explainer Environment.
-The agent receives a topic and generates interactive educational content
-as either a Marimo notebook or Manim animation (with narration script).
-Reward is computed from 6 components: code_valid, code_runs, coverage,
-format_match, structure, and narration.
 """
-import ast
 import random
-import subprocess
-import tempfile
-from pathlib import Path
 from uuid import uuid4
 from openenv.core.env_server.interfaces import Environment
@@ -19,284 +21,31 @@ from openenv.core.env_server.types import State
 try:
     from ..models import ExplainerAction, ExplainerObservation
     from ..task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
 except ImportError:
     from models import ExplainerAction, ExplainerObservation
     from task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
-# ---------------------------------------------------------------------------
-# Reward helpers
-# ---------------------------------------------------------------------------
-MAX_STEPS = 1  # single-turn for now
-def _ast_parses(code: str) -> bool:
-    """Check whether the code is valid Python (AST-parseable)."""
-    try:
-        ast.parse(code)
-        return True
-    except SyntaxError:
-        return False
-def _run_marimo(code: str, timeout: int = 15) -> tuple[bool, str]:
-    """Try exporting a marimo notebook to HTML. Returns (success, message)."""
-    with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
-        f.write(code)
-        f.flush()
-        tmp = f.name
-    try:
-        result = subprocess.run(
-            ["marimo", "export", "html", tmp],
-            capture_output=True,
-            text=True,
-            timeout=timeout,
-        )
-        if result.returncode == 0:
-            return True, "marimo export succeeded"
-        return False, result.stderr[:500]
-    except FileNotFoundError:
-        return False, "marimo not installed"
-    except subprocess.TimeoutExpired:
-        return False, "marimo export timed out"
-    finally:
-        Path(tmp).unlink(missing_ok=True)
-def _extract_scene_class(code: str) -> str | None:
-    """Return the first Scene subclass name found in the code."""
-    try:
-        tree = ast.parse(code)
-    except SyntaxError:
-        return None
-    for node in ast.walk(tree):
-        if isinstance(node, ast.ClassDef):
-            for base in node.bases:
-                base_name = ""
-                if isinstance(base, ast.Name):
-                    base_name = base.id
-                elif isinstance(base, ast.Attribute):
-                    base_name = base.attr
-                if "Scene" in base_name:
-                    return node.name
-    return None
-def _run_manim(code: str, timeout: int = 30) -> tuple[bool, str]:
-    """Try rendering a manim scene (low quality). Returns (success, message)."""
-    scene = _extract_scene_class(code)
-    if scene is None:
-        return False, "No Scene subclass found in code"
-    with tempfile.TemporaryDirectory() as tmpdir:
-        src = Path(tmpdir) / "scene.py"
-        src.write_text(code)
-        try:
-            result = subprocess.run(
-                ["manim", "render", "-ql", "--media_dir", tmpdir, str(src), scene],
-                capture_output=True,
-                text=True,
-                timeout=timeout,
-            )
-            if result.returncode == 0:
-                return True, "manim render succeeded"
-            return False, result.stderr[:500]
-        except FileNotFoundError:
-            return False, "manim not installed"
-        except subprocess.TimeoutExpired:
-            return False, "manim render timed out"
-def _keyword_coverage(code: str, keywords_csv: str) -> float:
-    """Fraction of task keywords mentioned in the code (case-insensitive)."""
-    if not keywords_csv:
-        return 0.0
-    keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
-    if not keywords:
-        return 0.0
-    code_lower = code.lower()
-    hits = sum(1 for kw in keywords if kw in code_lower)
-    return hits / len(keywords)
-def _format_match_score(chosen_format: str, task: Task) -> float:
-    """1.0 if format matches the task's preferred format, else 0.3."""
-    return 1.0 if chosen_format == task.preferred_format else 0.3
-def _marimo_structure(code: str, task: Task) -> float:
-    """Score structural quality of a marimo notebook (0-1)."""
-    score = 0.0
-    # Has marimo import
-    if "import marimo" in code or "from marimo" in code:
-        score += 0.2
-    # Has app = marimo.App()
-    if "marimo.App" in code or "mo.App" in code:
-        score += 0.1
-    # Cell count: look for @app.cell decorators
-    cell_count = code.count("@app.cell")
-    if cell_count >= 3:
-        score += 0.2
-    elif cell_count >= 1:
-        score += 0.1
-    # Interactive elements
-    ui_patterns = ["mo.ui.", "mo.md(", "mo.Html", "mo.accordion", "mo.callout"]
-    ui_hits = sum(1 for p in ui_patterns if p in code)
-    score += min(0.2, ui_hits * 0.05)
-    # Data visualization when data_available
-    viz_patterns = ["plt.", "px.", "altair", "matplotlib", "plotly", "mo.ui.slider"]
-    viz_hits = sum(1 for p in viz_patterns if p in code)
-    if task.data_available and viz_hits > 0:
-        score += 0.2
-    elif viz_hits > 0:
-        score += 0.1
-    # Tier depth: advanced should have more cells
-    if task.tier == "advanced" and cell_count >= 6:
-        score += 0.1
-    elif task.tier == "intermediate" and cell_count >= 4:
-        score += 0.1
-    elif task.tier == "beginner" and cell_count >= 2:
-        score += 0.1
-    return min(1.0, score)
-def _manim_structure(code: str, task: Task) -> float:
-    """Score structural quality of a manim scene (0-1)."""
-    score = 0.0
-    # Has manim import
-    if "from manim" in code or "import manim" in code:
-        score += 0.2
-    # Has Scene subclass
-    if _extract_scene_class(code) is not None:
-        score += 0.2
-    # Has construct method
-    if "def construct" in code:
-        score += 0.1
-    # Animation calls
-    anim_patterns = [
-        "self.play(",
-        "self.wait(",
-        "Create(",
-        "FadeIn(",
-        "FadeOut(",
-        "Transform(",
-        "Write(",
-        "MoveToTarget",
-        "Indicate(",
-        "ReplacementTransform(",
-    ]
-    anim_hits = sum(1 for p in anim_patterns if p in code)
-    score += min(0.3, anim_hits * 0.05)
-    # Math objects for math topics
-    math_patterns = ["MathTex(", "Tex(", "Axes(", "NumberPlane(", "Graph("]
-    math_hits = sum(1 for p in math_patterns if p in code)
-    if task.category.startswith("math") and math_hits > 0:
-        score += 0.1
-    elif math_hits > 0:
-        score += 0.05
-    # Tier depth
-    if task.tier == "advanced" and anim_hits >= 6:
-        score += 0.1
-    elif task.tier == "intermediate" and anim_hits >= 4:
-        score += 0.1
-    elif task.tier == "beginner" and anim_hits >= 2:
-        score += 0.1
-    return min(1.0, score)
-def _structure_score(code: str, fmt: str, task: Task) -> float:
-    if fmt == "marimo":
-        return _marimo_structure(code, task)
-    return _manim_structure(code, task)
-def _narration_score(narration: str, fmt: str) -> float:
-    """Score narration quality. Only relevant for manim format."""
-    if fmt != "manim":
-        return 1.0  # full marks when narration not applicable
-    if not narration or not narration.strip():
-        return 0.0
-    score = 0.0
-    words = narration.split()
-    # Has meaningful length
-    if len(words) >= 30:
-        score += 0.4
-    elif len(words) >= 10:
-        score += 0.2
-    # Has scene markers or structure
-    scene_markers = ["scene", "step", "first", "next", "then", "finally", "now"]
-    marker_hits = sum(1 for m in scene_markers if m in narration.lower())
-    score += min(0.3, marker_hits * 0.1)
-    # Extra credit for longer narration (word-count thresholds only; code complexity is not measured)
-    if len(words) >= 50:
-        score += 0.3
-    elif len(words) >= 20:
-        score += 0.15
-    return min(1.0, score)
-# Reward weights
-W_CODE_VALID = 0.20
-W_CODE_RUNS = 0.20
-W_COVERAGE = 0.20
-W_FORMAT = 0.15
-W_STRUCTURE = 0.15
-W_NARRATION = 0.10
-def compute_reward(
-    action: ExplainerAction, task: Task, exec_success: bool
-) -> tuple[float, dict]:
-    """Compute the 6-component reward. Returns (total, components_dict)."""
-    code_valid = 1.0 if _ast_parses(action.code) else 0.0
-    code_runs = 1.0 if exec_success else 0.0
-    coverage = _keyword_coverage(action.code, task.keywords)
-    fmt_match = _format_match_score(action.format, task)
-    structure = _structure_score(action.code, action.format, task)
-    narration = _narration_score(action.narration, action.format)
-    # When format is marimo, redistribute narration weight to structure
-    if action.format == "marimo":
-        w_struct = W_STRUCTURE + W_NARRATION
-        w_narr = 0.0
-    else:
-        w_struct = W_STRUCTURE
-        w_narr = W_NARRATION
-    total = (
-        W_CODE_VALID * code_valid
-        + W_CODE_RUNS * code_runs
-        + W_COVERAGE * coverage
-        + W_FORMAT * fmt_match
-        + w_struct * structure
-        + w_narr * narration
-    )
-    components = {
-        "code_valid": round(code_valid, 3),
-        "code_runs": round(code_runs, 3),
-        "coverage": round(coverage, 3),
-        "format_match": round(fmt_match, 3),
-        "structure": round(structure, 3),
-        "narration": round(narration, 3),
-        "total": round(total, 4),
-    }
-    return total, components
-# ---------------------------------------------------------------------------
-# Environment
-# ---------------------------------------------------------------------------
 class ExplainerEnvironment(Environment):
     """
-    Research → Interactive Explainer environment.
-    reset() samples a task from the task bank and returns it as an observation.
-    step() receives the agent's generated code, executes it in a sandbox,
-    computes the multi-component reward, and returns feedback.
     """
     SUPPORTS_CONCURRENT_SESSIONS: bool = True
@@ -305,15 +54,108 @@ class ExplainerEnvironment(Environment):
         super().__init__()
         self._state = State(episode_id=str(uuid4()), step_count=0)
         self._current_task: Task | None = None
-        self._difficulty_pool: list[Task] = EASY_TASKS  # start easy
     def reset(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
-        """Sample a task and return the initial observation."""
         self._state = State(
             episode_id=episode_id or str(uuid4()), step_count=0
         )
-        # Allow caller to set difficulty via kwargs
         difficulty = kwargs.get("difficulty", None)
         if difficulty == "medium":
             pool = MEDIUM_TASKS
@@ -324,11 +166,7 @@ class ExplainerEnvironment(Environment):
         else:
             pool = self._difficulty_pool
-        if seed is not None:
-            rng = random.Random(seed)
-        else:
-            rng = random.Random()
         self._current_task = rng.choice(pool) if pool else rng.choice(ALL_TASKS)
         t = self._current_task
@@ -337,80 +175,160 @@ class ExplainerEnvironment(Environment):
             content=t.content,
             tier=t.tier,
             keywords=t.keywords,
-            category=t.category,
             data_available=t.data_available,
-            feedback="",
             done=False,
             reward=0.0,
         )
-    def step(self, action: ExplainerAction, timeout_s=None, **kwargs) -> ExplainerObservation:
-        """Execute the agent's code, compute reward, return feedback."""
-        self._state.step_count += 1
-        task = self._current_task
-        if task is None:
-            return ExplainerObservation(
-                feedback="Error: no task set. Call reset() first.",
-                done=True,
-                reward=-1.0,
             )
-        try:
-            # 1. Check if code parses
-            parses = _ast_parses(action.code)
-            # 2. Try to run the code
-            exec_success = False
-            exec_msg = ""
-            if parses:
-                if action.format == "marimo":
-                    exec_success, exec_msg = _run_marimo(action.code)
-                elif action.format == "manim":
-                    exec_success, exec_msg = _run_manim(action.code)
-            else:
-                exec_msg = "Code has syntax errors and cannot be parsed."
-            # 3. Compute reward
-            reward, components = compute_reward(action, task, exec_success)
-            # 4. Build feedback
-            feedback_parts = []
-            if not parses:
-                feedback_parts.append("SYNTAX ERROR: code does not parse.")
-            elif not exec_success:
-                feedback_parts.append(f"EXECUTION FAILED: {exec_msg}")
-            else:
-                feedback_parts.append(f"EXECUTION OK: {exec_msg}")
-            feedback_parts.append(
-                f"Reward breakdown: {', '.join(f'{k}={v}' for k, v in components.items())}"
-            )
-            feedback = "\n".join(feedback_parts)
-            done = self._state.step_count >= MAX_STEPS
-            return ExplainerObservation(
-                topic=task.topic,
-                content=task.content,
-                tier=task.tier,
-                keywords=task.keywords,
-                category=task.category,
-                data_available=task.data_available,
-                feedback=feedback,
-                done=done,
-                reward=reward,
-                metadata={"step": self._state.step_count, **components},
-            )
-        except Exception as e:
-            return ExplainerObservation(
-                topic=task.topic if task else "",
-                content="",
-                tier="beginner",
-                feedback=f"Environment error: {e}",
-                done=True,
-                reward=0.0,
-            )
     @property
     def state(self) -> State:

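As a worked illustration of the weighting that this commit removes from the environment file, the sketch below recombines six component scores with the same weights and the same marimo rule (narration weight folded into structure). The function name `weighted_total` and the dictionary layout are assumptions made for this example; only the weight values and the redistribution rule come from the removed code.

```python
# Weights copied from the removed W_* constants.
WEIGHTS = {
    "code_valid": 0.20,
    "code_runs": 0.20,
    "coverage": 0.20,
    "format_match": 0.15,
    "structure": 0.15,
    "narration": 0.10,
}


def weighted_total(components: dict[str, float], fmt: str) -> float:
    """Combine component scores in [0, 1] into one reward (hypothetical helper)."""
    weights = dict(WEIGHTS)
    if fmt == "marimo":
        # Narration does not apply to marimo, so its weight moves to structure.
        weights["structure"] += weights["narration"]
        weights["narration"] = 0.0
    return sum(weights[name] * components[name] for name in weights)


perfect = {name: 1.0 for name in WEIGHTS}
silent = {**perfect, "narration": 0.0}  # everything perfect, but no narration

print(round(weighted_total(perfect, "manim"), 4))
print(round(weighted_total(silent, "manim"), 4))
print(round(weighted_total(silent, "marimo"), 4))
```

Because the weights sum to 1, a manim submission with empty narration tops out at 0.9 under this scheme, while a marimo submission is unaffected by narration.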
 """
+Research → Interactive Explainer Environment (multi-step, async).
+Episode flow:
+  1. reset() → agent gets a topic + tier
+  2. step(explore) × 1..MAX_EXPLORE → agent searches, gets papers back
+  3. step(generate) × 1 → agent produces marimo/manim code → episode ends
+Each step returns a per-step reward. The final generate step also includes
+a generation reward that accounts for how well the code uses the research.
+The environment supports async via reset_async() / step_async() overrides.
+OpenEnv's HTTP server detects these and calls them directly (no thread pool).
 """
 import random
 from uuid import uuid4
 from openenv.core.env_server.interfaces import Environment
 try:
     from ..models import ExplainerAction, ExplainerObservation
+    from ..rewards.exploration import compute_explore_reward
+    from ..rewards.generation import compute_generate_reward
+    from ..rewards.sandbox import ast_parses, run_manim, run_marimo
+    from ..rewards.sources import search as search_sources
     from ..task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
 except ImportError:
     from models import ExplainerAction, ExplainerObservation
+    from rewards.exploration import compute_explore_reward
+    from rewards.generation import compute_generate_reward
+    from rewards.sandbox import ast_parses, run_manim, run_marimo
+    from rewards.sources import search as search_sources
     from task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
+MAX_EXPLORE_STEPS = 3
 class ExplainerEnvironment(Environment):
     """
+    Multi-step Research → Interactive Explainer environment.
+    Phase 1 (explore): agent issues search queries, receives papers/wiki sections.
+    Phase 2 (generate): agent produces marimo/manim code using the research.
+    Supports async via reset_async() / step_async() — OpenEnv's server detects
+    the overrides and awaits them directly instead of using a thread pool.
     """
     SUPPORTS_CONCURRENT_SESSIONS: bool = True
         super().__init__()
         self._state = State(episode_id=str(uuid4()), step_count=0)
         self._current_task: Task | None = None
+        self._difficulty_pool: list[Task] = EASY_TASKS
+        self._accumulated_context: list[str] = []
+        self._explore_steps: int = 0
+    # ------------------------------------------------------------------
+    # Sync interface (fallback — OpenEnv prefers async when overridden)
+    # ------------------------------------------------------------------
     def reset(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
+        """Sample a task and return the initial observation (sync)."""
+        return self._do_reset(seed=seed, episode_id=episode_id, **kwargs)
+    def step(self, action: ExplainerAction, timeout_s=None, **kwargs) -> ExplainerObservation:
+        """Route to explore or generate handler (sync — explore uses blocking fallback)."""
+        import asyncio
+        self._state.step_count += 1
+        task = self._current_task
+        if task is None:
+            return ExplainerObservation(
+                feedback="Error: no task set. Call reset() first.",
+                done=True,
+                reward=-1.0,
+            )
+        try:
+            if action.action_type == "explore":
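+                # Run the async explore handler in a fresh event loop for sync callers.
+                # Note: asyncio.run() raises RuntimeError if this thread already has a
+                # running event loop, so callers inside a loop should use step_async().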
+                return asyncio.run(self._handle_explore(action, task))
+            elif action.action_type == "generate":
+                return self._handle_generate(action, task)
+            else:
+                return self._make_obs(
+                    task,
+                    phase="explore",
+                    feedback=f"Unknown action_type: {action.action_type}",
+                    reward=0.0,
+                    done=True,
+                )
+        except Exception as e:
+            return self._make_obs(
+                task,
+                phase="done",
+                feedback=f"Environment error: {e}",
+                reward=0.0,
+                done=True,
+            )
+    # ------------------------------------------------------------------
+    # Async interface (preferred — OpenEnv detects these overrides)
+    # ------------------------------------------------------------------
+    async def reset_async(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
+        """Sample a task and return the initial observation (async)."""
+        return self._do_reset(seed=seed, episode_id=episode_id, **kwargs)
+    async def step_async(self, action: ExplainerAction, timeout_s=None, **kwargs) -> ExplainerObservation:
+        """Route to explore or generate handler (async)."""
+        self._state.step_count += 1
+        task = self._current_task
+        if task is None:
+            return ExplainerObservation(
+                feedback="Error: no task set. Call reset() first.",
+                done=True,
+                reward=-1.0,
+            )
+        try:
+            if action.action_type == "explore":
+                return await self._handle_explore(action, task)
+            elif action.action_type == "generate":
+                return self._handle_generate(action, task)
+            else:
+                return self._make_obs(
+                    task,
+                    phase="explore",
+                    feedback=f"Unknown action_type: {action.action_type}",
+                    reward=0.0,
+                    done=True,
+                )
+        except Exception as e:
+            return self._make_obs(
+                task,
+                phase="done",
+                feedback=f"Environment error: {e}",
+                reward=0.0,
+                done=True,
+            )
+    # ------------------------------------------------------------------
+    # Internal
+    # ------------------------------------------------------------------
+    def _do_reset(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
+        """Shared reset logic (no I/O, so sync is fine)."""
         self._state = State(
             episode_id=episode_id or str(uuid4()), step_count=0
         )
+        self._accumulated_context = []
+        self._explore_steps = 0
         difficulty = kwargs.get("difficulty", None)
         if difficulty == "medium":
             pool = MEDIUM_TASKS
         else:
             pool = self._difficulty_pool
+        rng = random.Random(seed) if seed is not None else random.Random()
         self._current_task = rng.choice(pool) if pool else rng.choice(ALL_TASKS)
         t = self._current_task
             content=t.content,
             tier=t.tier,
             keywords=t.keywords,
             data_available=t.data_available,
+            phase="explore",
+            feedback="Research phase: search for relevant papers before generating.",
+            search_results="",
+            explored_context="",
+            explore_steps_left=MAX_EXPLORE_STEPS,
             done=False,
             reward=0.0,
         )
+    async def _handle_explore(self, action: ExplainerAction, task: Task) -> ExplainerObservation:
+        """Process an explore action: search HF Papers/Wikipedia, score query."""
+        if self._explore_steps >= MAX_EXPLORE_STEPS:
+            return self._make_obs(
+                task,
+                phase="generate",
+                feedback="Max explore steps reached. You must now generate.",
+                reward=0.0,
+            )
+        self._explore_steps += 1
+        query = action.query.strip()
+        if not query:
+            return self._make_obs(
+                task,
+                phase="explore",
+                feedback="Empty query. Provide a search query.",
+                reward=0.0,
             )
+        # Search HF Papers / Wikipedia (async, routed by keyword heuristic)
+        results_text = await search_sources(query, category_hint=task.topic)
+        self._accumulated_context.append(results_text)
+        # Compute per-step exploration reward
+        reward, components = compute_explore_reward(
+            query=query,
+            result_text=results_text,
+            topic=task.topic,
+            keywords_csv=task.keywords,
+            task_content=task.content,
+            accumulated_context=self._accumulated_context,
+        )
+        steps_left = MAX_EXPLORE_STEPS - self._explore_steps
+        if steps_left > 0:
+            phase = "explore"
+            hint = f"{steps_left} explore step(s) left. Continue researching or generate."
+        else:
+            phase = "generate"
+            hint = "Max explore steps reached. You must now generate."
+        return self._make_obs(
+            task,
+            phase=phase,
+            feedback=f"{hint}\nReward: {components}",
+            search_results=results_text,
+            reward=reward,
+            metadata={"step": self._state.step_count, "phase": "explore", **components},
+        )
+    def _handle_generate(self, action: ExplainerAction, task: Task) -> ExplainerObservation:
+        """Process a generate action: run sandbox, compute generation reward."""
+        fmt = action.format or "marimo"
+        code = action.code
+        narration = action.narration
+        # Penalise generating without any exploration
+        if self._explore_steps == 0:
+            skip_penalty = -0.1
+            penalty_msg = "Warning: generating without any research. -0.1 penalty."
+        else:
+            skip_penalty = 0.0
+            penalty_msg = ""
+        # Sandbox execution
+        parses = ast_parses(code)
+        exec_success = False
+        exec_msg = ""
+        if parses:
+            if fmt == "marimo":
+                exec_success, exec_msg = run_marimo(code)
+            elif fmt == "manim":
+                exec_success, exec_msg = run_manim(code)
+            else:
+                exec_msg = f"Unsupported format: {fmt!r}"
+        else:
+            exec_msg = "Code has syntax errors and cannot be parsed."
+        # Generation reward
+        reward, components = compute_generate_reward(
+            code=code,
+            fmt=fmt,
+            narration=narration,
+            task=task,
+            exec_success=exec_success,
+            accumulated_context=self._accumulated_context,
+        )
+        reward = max(0.0, reward + skip_penalty)
+        # Feedback
+        parts = []
+        if penalty_msg:
+            parts.append(penalty_msg)
+        if not parses:
+            parts.append("SYNTAX ERROR: code does not parse.")
+        elif not exec_success:
+            parts.append(f"EXECUTION FAILED: {exec_msg}")
+        else:
+            parts.append(f"EXECUTION OK: {exec_msg}")
+        parts.append(
+            f"Reward: {', '.join(f'{k}={v}' for k, v in components.items())}"
+        )
+        return self._make_obs(
+            task,
+            phase="done",
+            feedback="\n".join(parts),
+            reward=reward,
+            done=True,
+            metadata={
+                "step": self._state.step_count,
+                "phase": "generate",
+                "explore_steps_used": self._explore_steps,
+                **components,
+            },
+        )
+    def _make_obs(
+        self,
+        task: Task,
+        *,
+        phase: str,
+        feedback: str,
+        reward: float = 0.0,
+        done: bool = False,
+        search_results: str = "",
+        metadata: dict | None = None,
+    ) -> ExplainerObservation:
+        """Helper to build a consistent observation."""
+        return ExplainerObservation(
+            topic=task.topic,
+            content=task.content,
+            tier=task.tier,
+            keywords=task.keywords,
+            data_available=task.data_available,
+            phase=phase,
+            feedback=feedback,
+            search_results=search_results,
+            explored_context="\n---\n".join(self._accumulated_context),
+            explore_steps_left=MAX_EXPLORE_STEPS - self._explore_steps,
+            done=done,
+            reward=reward,
+            metadata=metadata or {},
+        )
     @property
     def state(self) -> State:

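The phase logic in the new environment can be sketched without OpenEnv, the sandbox, or the search backend. The class below is a simplified stand-in, not the project's code: `ToyEpisode` and its method names are hypothetical, while the three-step explore budget, the step being counted even for an empty query, the 0.1 penalty for generating without research, and the clamp at zero mirror the diff above.

```python
MAX_EXPLORE_STEPS = 3
SKIP_PENALTY = -0.1  # applied when generate is called with zero explore steps


class ToyEpisode:
    """Hypothetical, I/O-free model of the explore -> generate episode flow."""

    def __init__(self) -> None:
        self.explore_steps = 0
        self.done = False

    def explore(self, query: str) -> str:
        """Return the phase the agent is in after this explore action."""
        if self.explore_steps >= MAX_EXPLORE_STEPS:
            return "generate"  # budget exhausted; nothing is searched
        self.explore_steps += 1  # counted even when the query is empty
        if not query.strip():
            return "explore"
        if self.explore_steps < MAX_EXPLORE_STEPS:
            return "explore"
        return "generate"

    def generate(self, raw_reward: float) -> float:
        """End the episode and return the penalised, clamped reward."""
        penalty = SKIP_PENALTY if self.explore_steps == 0 else 0.0
        self.done = True
        return max(0.0, raw_reward + penalty)


episode = ToyEpisode()
phases = [episode.explore("attention mechanism") for _ in range(4)]
print(phases)
print(episode.generate(0.5))
```

The fourth explore call is refused without consuming anything, which is why the counter stays at the budget rather than growing past it.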
task_bank.py CHANGED

@@ -1,8 +1,9 @@
 """
 Curated task bank for the Research → Interactive Explainer environment.
-Tasks are organized by category and difficulty. Each task has a preferred format
-(marimo or manim) that the format_match reward component checks against.
 """
 from dataclasses import dataclass
@@ -15,10 +16,9 @@ class Task:
     content: str
     tier: Literal["beginner", "intermediate", "advanced"]
     keywords: str
-    category: str
     data_available: bool
-    preferred_format: Literal["marimo", "manim"]
     difficulty: Literal["easy", "medium", "hard"]
 # ---------- ML Concepts (Marimo-biased) ----------
@@ -29,7 +29,6 @@ ML_CONCEPTS: list[Task] = [
         content="Linear regression fits a line to data by minimizing squared errors. Given input features X and target y, it finds weights w such that y ≈ Xw. The loss function is MSE = (1/n) Σ(yi - ŷi)².",
         tier="beginner",
         keywords="linear regression,least squares,MSE,gradient descent,weights,bias",
-        category="cs.LG",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
@@ -39,7 +38,6 @@ ML_CONCEPTS: list[Task] = [
         content="Gradient descent iteratively updates parameters by moving in the direction of steepest decrease of the loss function. Update rule: θ = θ - α∇L(θ), where α is the learning rate. Variants include SGD, mini-batch, and Adam.",
         tier="beginner",
         keywords="gradient descent,learning rate,loss function,SGD,convergence,optimization",
-        category="cs.LG",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
@@ -49,7 +47,6 @@ ML_CONCEPTS: list[Task] = [
         content="Decision trees split data recursively based on feature thresholds that maximize information gain (or minimize Gini impurity). Each leaf node represents a class label or regression value.",
         tier="beginner",
         keywords="decision tree,information gain,Gini impurity,splitting,leaf node,classification",
-        category="cs.LG",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
@@ -59,7 +56,6 @@ ML_CONCEPTS: list[Task] = [
         content="K-means partitions n observations into k clusters by iteratively assigning points to nearest centroid and updating centroids to cluster means. Converges to local optimum. Sensitive to initialization — use k-means++ for better starts.",
         tier="intermediate",
         keywords="k-means,clustering,centroid,Euclidean distance,convergence,k-means++",
-        category="cs.LG",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
@@ -69,7 +65,6 @@ ML_CONCEPTS: list[Task] = [
         content="The attention mechanism computes a weighted sum of values (V) where weights come from compatibility of queries (Q) and keys (K): Attention(Q,K,V) = softmax(QK^T/√dk)V. Self-attention allows each position to attend to all positions in the input.",
         tier="intermediate",
         keywords="attention,self-attention,query,key,value,softmax,transformer,scaled dot-product",
-        category="cs.LG",
         data_available=False,
         preferred_format="marimo",
         difficulty="medium",
@@ -79,7 +74,6 @@ ML_CONCEPTS: list[Task] = [
         content="Backpropagation computes gradients of the loss with respect to each weight by applying the chain rule layer by layer from output to input. It enables efficient training of deep networks by reusing intermediate computations.",
         tier="intermediate",
         keywords="backpropagation,chain rule,gradient,computational graph,forward pass,backward pass",
-        category="cs.LG",
         data_available=False,
         preferred_format="marimo",
         difficulty="medium",
@@ -89,7 +83,6 @@ ML_CONCEPTS: list[Task] = [
         content="CNNs use learnable filters that slide over input (convolution) to detect local patterns like edges, textures, and shapes. Key operations: convolution, pooling, and fully-connected layers. Translation equivariance is a key inductive bias.",
         tier="intermediate",
         keywords="CNN,convolution,pooling,filter,feature map,stride,padding,translation equivariance",
-        category="cs.CV",
         data_available=False,
         preferred_format="marimo",
         difficulty="medium",
@@ -99,7 +92,6 @@ ML_CONCEPTS: list[Task] = [
         content="Batch normalization normalizes activations within a mini-batch: x̂ = (x - μ_B) / √(σ²_B + ε), then scales and shifts: y = γx̂ + β. Reduces internal covariate shift, enables higher learning rates, and acts as a regularizer.",
         tier="advanced",
         keywords="batch normalization,internal covariate shift,running mean,running variance,gamma,beta",
-        category="cs.LG",
         data_available=False,
         preferred_format="marimo",
         difficulty="hard",
@@ -109,7 +101,6 @@ ML_CONCEPTS: list[Task] = [
         content="VAEs learn a probabilistic latent space by encoding inputs to distributions q(z|x) and decoding samples p(x|z). The ELBO loss = reconstruction + KL divergence. Reparameterization trick enables backprop through sampling: z = μ + σ⊙ε.",
         tier="advanced",
         keywords="VAE,ELBO,KL divergence,reparameterization,latent space,encoder,decoder,generative",
-        category="cs.LG",
         data_available=False,
         preferred_format="marimo",
         difficulty="hard",
@@ -119,9 +110,7 @@ ML_CONCEPTS: list[Task] = [
         content="An agent interacts with an environment, observing states, taking actions, and receiving rewards. The goal is to learn a policy π(a|s) that maximizes cumulative discounted reward. Key concepts: value function V(s), Q-function Q(s,a), Bellman equation.",
         tier="beginner",
         keywords="reinforcement learning,agent,environment,reward,policy,value function,Q-function,Bellman",
-        category="cs.LG",
         data_available=False,
-        preferred_format="marimo",
         difficulty="easy",
     ),
 ]
@@ -135,7 +124,6 @@ MATH_TOPICS: list[Task] = [
         content="The Fourier transform decomposes a function into its constituent frequencies: F(ω) = ∫f(t)e^(-iωt)dt. Any periodic signal can be represented as a sum of sines and cosines. The DFT computes this for discrete samples.",
         tier="intermediate",
         keywords="Fourier transform,frequency,sine,cosine,DFT,spectrum,decomposition,harmonics",
-        category="math.NA",
         data_available=True,
         preferred_format="manim",
         difficulty="medium",
@@ -145,7 +133,6 @@ MATH_TOPICS: list[Task] = [
         content="For a matrix A, eigenvector v satisfies Av = λv where λ is the eigenvalue. Eigenvectors represent directions unchanged by the transformation (only scaled). PCA uses eigenvectors of the covariance matrix.",
         tier="intermediate",
         keywords="eigenvalue,eigenvector,matrix,linear transformation,PCA,covariance,diagonalization",
-        category="math.LA",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
@@ -155,7 +142,6 @@ MATH_TOPICS: list[Task] = [
         content="The Taylor series expands a function as an infinite sum of terms: f(x) = Σ f^(n)(a)/n! · (x-a)^n. Provides polynomial approximations to functions. Convergence depends on the radius of convergence.",
         tier="beginner",
         keywords="Taylor series,polynomial approximation,derivative,convergence,Maclaurin,expansion",
-        category="math.CA",
         data_available=False,
         preferred_format="manim",
         difficulty="easy",
@@ -165,9 +151,7 @@ MATH_TOPICS: list[Task] = [
         content="Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A)P(A)/P(B). It enables updating beliefs given new evidence. Foundation of Bayesian inference, spam filters, and medical diagnosis.",
         tier="beginner",
         keywords="Bayes theorem,conditional probability,prior,posterior,likelihood,evidence,Bayesian",
-        category="stat.ML",
         data_available=True,
-        preferred_format="manim",
         difficulty="easy",
     ),
     Task(
@@ -175,7 +159,6 @@ MATH_TOPICS: list[Task] = [
         content="The gradient ∇f points in the direction of steepest ascent. The directional derivative Duf = ∇f · u gives the rate of change in direction u. Gradient descent follows -∇f to minimize functions.",
         tier="intermediate",
         keywords="gradient,directional derivative,steepest ascent,contour,level set,multivariable calculus",
-        category="math.CA",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
@@ -185,7 +168,6 @@ MATH_TOPICS: list[Task] = [
         content="Multiplying a vector by a matrix transforms it: the columns of A define where basis vectors land. Composition of transformations = matrix multiplication. Determinant measures area/volume scaling.",
         tier="beginner",
         keywords="matrix multiplication,linear transformation,basis vectors,determinant,composition",
-        category="math.LA",
         data_available=False,
         preferred_format="manim",
         difficulty="easy",
@@ -195,9 +177,7 @@ MATH_TOPICS: list[Task] = [
         content="The CLT states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the population distribution. Requires finite variance. Rate: O(1/√n).",
         tier="intermediate",
         keywords="central limit theorem,normal distribution,sample mean,variance,convergence,CLT",
-        category="stat.ML",
         data_available=True,
-        preferred_format="manim",
         difficulty="medium",
     ),
     Task(
@@ -205,7 +185,6 @@ MATH_TOPICS: list[Task] = [
         content="SVD decomposes any matrix A = UΣV^T where U,V are orthogonal and Σ is diagonal with singular values. Used in dimensionality reduction, matrix completion, and computing pseudoinverse. Truncated SVD approximates with k largest singular values.",
         tier="advanced",
         keywords="SVD,singular value,orthogonal,dimensionality reduction,low-rank approximation,pseudoinverse",
-        category="math.LA",
         data_available=False,
         preferred_format="manim",
         difficulty="hard",
@@ -221,7 +200,6 @@ ALGORITHMS: list[Task] = [
         content="Merge sort divides the array in half, recursively sorts each half, then merges the sorted halves. Time complexity O(n log n), space O(n). Stable sort. Divide-and-conquer paradigm.",
         tier="beginner",
         keywords="merge sort,divide and conquer,recursion,O(n log n),stable sort,merging",
-        category="cs.DS",
         data_available=True,
         preferred_format="manim",
         difficulty="easy",
@@ -231,7 +209,6 @@ ALGORITHMS: list[Task] = [
         content="Binary search finds a target in a sorted array by repeatedly halving the search space. Compare target with middle element; eliminate half. Time O(log n). Requires sorted input.",
         tier="beginner",
         keywords="binary search,sorted array,O(log n),divide and conquer,search space,comparison",
-        category="cs.DS",
         data_available=True,
         preferred_format="manim",
         difficulty="easy",
@@ -241,7 +218,6 @@ ALGORITHMS: list[Task] = [
         content="Dijkstra's algorithm finds shortest paths from a source vertex to all others in a weighted graph with non-negative edges. Uses a priority queue. Greedily selects the nearest unvisited vertex. Time O((V+E) log V) with binary heap.",
         tier="intermediate",
         keywords="Dijkstra,shortest path,graph,priority queue,greedy,weighted edges,relaxation",
-        category="cs.DS",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
@@ -251,7 +227,6 @@ ALGORITHMS: list[Task] = [
         content="A* combines Dijkstra's algorithm with heuristics: f(n) = g(n) + h(n), where g is cost-so-far and h is estimated cost-to-goal. Optimal if h is admissible (never overestimates). Used in pathfinding and game AI.",
         tier="intermediate",
         keywords="A-star,heuristic,admissible,pathfinding,f-score,g-score,h-score,optimal",
-        category="cs.AI",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
@@ -261,7 +236,6 @@ ALGORITHMS: list[Task] = [
         content="Quick sort selects a pivot, partitions elements into less-than and greater-than groups, then recursively sorts each. Average O(n log n), worst O(n²). In-place. Pivot selection strategy matters (median-of-three).",
         tier="beginner",
         keywords="quick sort,pivot,partition,in-place,O(n log n),recursion,divide and conquer",
-        category="cs.DS",
         data_available=True,
         preferred_format="manim",
         difficulty="easy",
@@ -277,7 +251,6 @@ STATISTICS_TASKS: list[Task] = [
         content="EDA uses summary statistics and visualizations to understand data distributions, correlations, and anomalies before modeling. Key tools: histograms, box plots, scatter matrices, correlation heatmaps.",
         tier="beginner",
         keywords="EDA,histogram,box plot,correlation,scatter plot,distribution,outliers,summary statistics",
-        category="stat.ML",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
@@ -287,7 +260,6 @@ STATISTICS_TASKS: list[Task] = [
         content="Hypothesis testing determines if observed data provides sufficient evidence against a null hypothesis H0. Steps: formulate H0/H1, choose significance level α, compute test statistic, compare with critical value or p-value.",
         tier="intermediate",
         keywords="hypothesis testing,null hypothesis,p-value,significance level,t-test,type I error,type II error",
-        category="stat.ML",
         data_available=True,
         preferred_format="marimo",
         difficulty="medium",
@@ -297,7 +269,6 @@ STATISTICS_TASKS: list[Task] = [
         content="PCA reduces dimensionality by projecting data onto directions of maximum variance. Steps: center data, compute covariance matrix, find eigenvectors (principal components), project. Explained variance ratio guides k selection.",
         tier="intermediate",
         keywords="PCA,principal components,variance,dimensionality reduction,eigenvector,covariance,projection",
-        category="stat.ML",
         data_available=True,
         preferred_format="marimo",
         difficulty="medium",

 """
 Curated task bank for the Research → Interactive Explainer environment.
+Tasks are organized by difficulty (easy/medium/hard) and tier (beginner/intermediate/
+advanced). Each task optionally specifies a preferred format (marimo or manim); when
+None, the SLM must infer the best format and gets full format_match reward either way.
 """
 from dataclasses import dataclass
     content: str
     tier: Literal["beginner", "intermediate", "advanced"]
     keywords: str
     data_available: bool
     difficulty: Literal["easy", "medium", "hard"]
+    preferred_format: Literal["marimo", "manim"] | None = None
 # ---------- ML Concepts (Marimo-biased) ----------
         content="Linear regression fits a line to data by minimizing squared errors. Given input features X and target y, it finds weights w such that y ≈ Xw. The loss function is MSE = (1/n) Σ(yi - ŷi)².",
         tier="beginner",
         keywords="linear regression,least squares,MSE,gradient descent,weights,bias",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
         content="Gradient descent iteratively updates parameters by moving in the direction of steepest decrease of the loss function. Update rule: θ = θ - α∇L(θ), where α is the learning rate. Variants include SGD, mini-batch, and Adam.",
         tier="beginner",
         keywords="gradient descent,learning rate,loss function,SGD,convergence,optimization",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
         content="Decision trees split data recursively based on feature thresholds that maximize information gain (or minimize Gini impurity). Each leaf node represents a class label or regression value.",
         tier="beginner",
         keywords="decision tree,information gain,Gini impurity,splitting,leaf node,classification",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
         content="K-means partitions n observations into k clusters by iteratively assigning points to nearest centroid and updating centroids to cluster means. Converges to local optimum. Sensitive to initialization — use k-means++ for better starts.",
         tier="intermediate",
         keywords="k-means,clustering,centroid,Euclidean distance,convergence,k-means++",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
         content="The attention mechanism computes a weighted sum of values (V) where weights come from compatibility of queries (Q) and keys (K): Attention(Q,K,V) = softmax(QK^T/√dk)V. Self-attention allows each position to attend to all positions in the input.",
         tier="intermediate",
         keywords="attention,self-attention,query,key,value,softmax,transformer,scaled dot-product",
         data_available=False,
         preferred_format="marimo",
         difficulty="medium",
         content="Backpropagation computes gradients of the loss with respect to each weight by applying the chain rule layer by layer from output to input. It enables efficient training of deep networks by reusing intermediate computations.",
         tier="intermediate",
         keywords="backpropagation,chain rule,gradient,computational graph,forward pass,backward pass",
         data_available=False,
         preferred_format="marimo",
         difficulty="medium",
         content="CNNs use learnable filters that slide over input (convolution) to detect local patterns like edges, textures, and shapes. Key operations: convolution, pooling, and fully-connected layers. Translation equivariance is a key inductive bias.",
         tier="intermediate",
         keywords="CNN,convolution,pooling,filter,feature map,stride,padding,translation equivariance",
         data_available=False,
         preferred_format="marimo",
         difficulty="medium",
         content="Batch normalization normalizes activations within a mini-batch: x̂ = (x - μ_B) / √(σ²_B + ε), then scales and shifts: y = γx̂ + β. Reduces internal covariate shift, enables higher learning rates, and acts as a regularizer.",
         tier="advanced",
         keywords="batch normalization,internal covariate shift,running mean,running variance,gamma,beta",
         data_available=False,
         preferred_format="marimo",
         difficulty="hard",
         content="VAEs learn a probabilistic latent space by encoding inputs to distributions q(z|x) and decoding samples p(x|z). The ELBO loss = reconstruction + KL divergence. Reparameterization trick enables backprop through sampling: z = μ + σ⊙ε.",
         tier="advanced",
         keywords="VAE,ELBO,KL divergence,reparameterization,latent space,encoder,decoder,generative",
         data_available=False,
         preferred_format="marimo",
         difficulty="hard",
         content="An agent interacts with an environment, observing states, taking actions, and receiving rewards. The goal is to learn a policy π(a|s) that maximizes cumulative discounted reward. Key concepts: value function V(s), Q-function Q(s,a), Bellman equation.",
         tier="beginner",
         keywords="reinforcement learning,agent,environment,reward,policy,value function,Q-function,Bellman",
         data_available=False,
         difficulty="easy",
     ),
 ]
         content="The Fourier transform decomposes a function into its constituent frequencies: F(ω) = ∫f(t)e^(-iωt)dt. Any periodic signal can be represented as a sum of sines and cosines. The DFT computes this for discrete samples.",
         tier="intermediate",
         keywords="Fourier transform,frequency,sine,cosine,DFT,spectrum,decomposition,harmonics",
         data_available=True,
         preferred_format="manim",
         difficulty="medium",
         content="For a matrix A, eigenvector v satisfies Av = λv where λ is the eigenvalue. Eigenvectors represent directions unchanged by the transformation (only scaled). PCA uses eigenvectors of the covariance matrix.",
         tier="intermediate",
         keywords="eigenvalue,eigenvector,matrix,linear transformation,PCA,covariance,diagonalization",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
         content="The Taylor series expands a function as an infinite sum of terms: f(x) = Σ f^(n)(a)/n! · (x-a)^n. Provides polynomial approximations to functions. Convergence depends on the radius of convergence.",
         tier="beginner",
         keywords="Taylor series,polynomial approximation,derivative,convergence,Maclaurin,expansion",
         data_available=False,
         preferred_format="manim",
         difficulty="easy",
         content="Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A)P(A)/P(B). It enables updating beliefs given new evidence. Foundation of Bayesian inference, spam filters, and medical diagnosis.",
         tier="beginner",
         keywords="Bayes theorem,conditional probability,prior,posterior,likelihood,evidence,Bayesian",
         data_available=True,
         difficulty="easy",
     ),
     Task(
         content="The gradient ∇f points in the direction of steepest ascent. The directional derivative Duf = ∇f · u gives the rate of change in direction u. Gradient descent follows -∇f to minimize functions.",
         tier="intermediate",
         keywords="gradient,directional derivative,steepest ascent,contour,level set,multivariable calculus",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
         content="Multiplying a vector by a matrix transforms it: the columns of A define where basis vectors land. Composition of transformations = matrix multiplication. Determinant measures area/volume scaling.",
         tier="beginner",
         keywords="matrix multiplication,linear transformation,basis vectors,determinant,composition",
         data_available=False,
         preferred_format="manim",
         difficulty="easy",
         content="The CLT states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the population distribution. Requires finite variance. Rate: O(1/√n).",
         tier="intermediate",
         keywords="central limit theorem,normal distribution,sample mean,variance,convergence,CLT",
         data_available=True,
         difficulty="medium",
     ),
     Task(
         content="SVD decomposes any matrix A = UΣV^T where U,V are orthogonal and Σ is diagonal with singular values. Used in dimensionality reduction, matrix completion, and computing pseudoinverse. Truncated SVD approximates with k largest singular values.",
         tier="advanced",
         keywords="SVD,singular value,orthogonal,dimensionality reduction,low-rank approximation,pseudoinverse",
         data_available=False,
         preferred_format="manim",
         difficulty="hard",
         content="Merge sort divides the array in half, recursively sorts each half, then merges the sorted halves. Time complexity O(n log n), space O(n). Stable sort. Divide-and-conquer paradigm.",
         tier="beginner",
         keywords="merge sort,divide and conquer,recursion,O(n log n),stable sort,merging",
         data_available=True,
         preferred_format="manim",
         difficulty="easy",
         content="Binary search finds a target in a sorted array by repeatedly halving the search space. Compare target with middle element; eliminate half. Time O(log n). Requires sorted input.",
         tier="beginner",
         keywords="binary search,sorted array,O(log n),divide and conquer,search space,comparison",
         data_available=True,
         preferred_format="manim",
         difficulty="easy",
         content="Dijkstra's algorithm finds shortest paths from a source vertex to all others in a weighted graph with non-negative edges. Uses a priority queue. Greedily selects the nearest unvisited vertex. Time O((V+E) log V) with binary heap.",
         tier="intermediate",
         keywords="Dijkstra,shortest path,graph,priority queue,greedy,weighted edges,relaxation",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
         content="A* combines Dijkstra's algorithm with heuristics: f(n) = g(n) + h(n), where g is cost-so-far and h is estimated cost-to-goal. Optimal if h is admissible (never overestimates). Used in pathfinding and game AI.",
         tier="intermediate",
         keywords="A-star,heuristic,admissible,pathfinding,f-score,g-score,h-score,optimal",
         data_available=False,
         preferred_format="manim",
         difficulty="medium",
         content="Quick sort selects a pivot, partitions elements into less-than and greater-than groups, then recursively sorts each. Average O(n log n), worst O(n²). In-place. Pivot selection strategy matters (median-of-three).",
         tier="beginner",
         keywords="quick sort,pivot,partition,in-place,O(n log n),recursion,divide and conquer",
         data_available=True,
         preferred_format="manim",
         difficulty="easy",
         content="EDA uses summary statistics and visualizations to understand data distributions, correlations, and anomalies before modeling. Key tools: histograms, box plots, scatter matrices, correlation heatmaps.",
         tier="beginner",
         keywords="EDA,histogram,box plot,correlation,scatter plot,distribution,outliers,summary statistics",
         data_available=True,
         preferred_format="marimo",
         difficulty="easy",
         content="Hypothesis testing determines if observed data provides sufficient evidence against a null hypothesis H0. Steps: formulate H0/H1, choose significance level α, compute test statistic, compare with critical value or p-value.",
         tier="intermediate",
         keywords="hypothesis testing,null hypothesis,p-value,significance level,t-test,type I error,type II error",
         data_available=True,
         preferred_format="marimo",
         difficulty="medium",
         content="PCA reduces dimensionality by projecting data onto directions of maximum variance. Steps: center data, compute covariance matrix, find eigenvectors (principal components), project. Explained variance ratio guides k selection.",
         tier="intermediate",
         keywords="PCA,principal components,variance,dimensionality reduction,eigenvector,covariance,projection",
         data_available=True,
         preferred_format="marimo",
         difficulty="medium",
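
The docstring above says a task with `preferred_format=None` earns the full `format_match` reward whichever format is chosen. A minimal sketch of that rule follows. `TaskSketch` and `format_match_sketch` are hypothetical names, not the code in `rewards/generation.py`, and the 0.3 mismatch score is taken only from the assertions in `tests/test_rewards.py`.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Format = Literal["marimo", "manim"]


@dataclass
class TaskSketch:
    """Hypothetical, trimmed-down stand-in for the task bank's Task."""

    topic: str
    preferred_format: Optional[Format] = None


def format_match_sketch(
    chosen: Format, task: TaskSketch, mismatch_score: float = 0.3
) -> float:
    """Score how well the chosen format fits the task's preference."""
    # No preference recorded: any format earns full credit.
    if task.preferred_format is None:
        return 1.0
    # Preference recorded: full credit on a match, partial credit otherwise.
    return 1.0 if chosen == task.preferred_format else mismatch_score
```

Keeping the mismatch score as a parameter makes it clear that the partial-credit value is a tunable assumption rather than part of the rule itself.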

tests/__init__.py ADDED

File without changes

tests/run_tests.sh ADDED

	@@ -0,0 +1,76 @@

+#!/usr/bin/env bash
+# Run the test suite for explainer_env.
+#
+# Usage:
+#   tests/run_tests.sh              # fast tests (models, task_bank, rewards, environment)
+#   tests/run_tests.sh --all        # fast + client-server integration
+#   tests/run_tests.sh --docker     # fast + docker build & test
+#   tests/run_tests.sh --full       # everything
+set -euo pipefail
+cd "$(dirname "$0")/.."  # explainer_env/
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[0;33m'
+NC='\033[0m'
+PASSED=0
+FAILED=0
+SKIPPED=0
+run() {
+    local label="$1"; shift
+    printf "%-40s" "$label"
+    if output=$("$@" 2>&1); then
+        echo -e "${GREEN}OK${NC}"
+        PASSED=$((PASSED + 1))
+    else
+        echo -e "${RED}FAIL${NC}"
+        echo "$output" | tail -5
+        FAILED=$((FAILED + 1))
+    fi
+}
+skip() {
+    printf "%-40s" "$1"
+    echo -e "${YELLOW}SKIP${NC}"
+    SKIPPED=$((SKIPPED + 1))
+}
+echo "=== explainer_env test suite ==="
+echo ""
+# --- Fast tests (no server needed) ---
+echo "--- Unit tests ---"
+run "models"      uv run python tests/test_models.py
+run "task_bank"   uv run python tests/test_task_bank.py
+run "rewards"     uv run python tests/test_rewards.py
+run "environment" uv run python tests/test_environment.py
+run "ruff lint"   uvx ruff check .
+# --- Integration tests (need server / docker) ---
+MODE="${1:-}"
+if [[ "$MODE" == "--all" || "$MODE" == "--full" ]]; then
+    echo ""
+    echo "--- Client-server integration ---"
+    run "client_server" uv run python tests/test_client_server.py
+else
+    echo ""
+    skip "client_server (use --all)"
+fi
+if [[ "$MODE" == "--docker" || "$MODE" == "--full" ]]; then
+    echo ""
+    echo "--- Docker integration ---"
+    run "docker" uv run python tests/test_docker.py
+else
+    skip "docker (use --docker or --full)"
+fi
+# --- Summary ---
+echo ""
+TOTAL=$((PASSED + FAILED + SKIPPED))
+echo "=== ${PASSED} passed, ${FAILED} failed, ${SKIPPED} skipped (${TOTAL} total) ==="
+[[ $FAILED -eq 0 ]] || exit 1

tests/test_client_server.py ADDED

	@@ -0,0 +1,96 @@

+"""Integration test: start server, connect client, run explore→generate.
+Usage:
+    uv run python tests/test_client_server.py          # auto-starts server
+    uv run python tests/test_client_server.py --url http://localhost:8000
+"""
+import argparse
+import subprocess
+import sys
+import time
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from client import ExplainerEnv
+from models import ExplainerAction
+def wait_for_server(url: str, timeout: int = 15):
+    import urllib.request
+    deadline = time.time() + timeout
+    while time.time() < deadline:
+        try:
+            urllib.request.urlopen(f"{url}/health", timeout=2)
+            return True
+        except Exception:
+            time.sleep(0.5)
+    return False
+def run_tests(base_url: str):
+    client = ExplainerEnv(base_url=base_url)
+    with client.sync() as sc:
+        # --- reset ---
+        result = sc.reset()
+        obs = result.observation
+        assert obs.topic, "reset should return a topic"
+        assert obs.phase == "explore"
+        assert obs.explore_steps_left == 3
+        print(f"  reset: topic={obs.topic!r}, phase={obs.phase}")
+        # --- explore ---
+        action = ExplainerAction(action_type="explore", query=obs.topic)
+        result = sc.step(action)
+        assert not result.done
+        assert result.observation.explore_steps_left == 2
+        print(f"  explore: reward={result.reward:.3f}, steps_left={result.observation.explore_steps_left}")
+        # --- generate ---
+        action = ExplainerAction(
+            action_type="generate",
+            format="marimo",
+            code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n    mo.md('hi')\n    return\n",
+        )
+        result = sc.step(action)
+        assert result.done
+        assert isinstance(result.reward, (int, float))
+        print(f"  generate: reward={result.reward:.3f}, done={result.done}")
+        # --- second episode ---
+        result2 = sc.reset()
+        assert result2.observation.topic
+        print(f"  reset2: topic={result2.observation.topic!r}")
+    print("PASS: test_client_server (4/4)")
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--url", default=None)
+    args = parser.parse_args()
+    if args.url:
+        run_tests(args.url)
+    else:
+        proc = subprocess.Popen(
+            ["uv", "run", "server"],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+        )
+        try:
+            url = "http://localhost:8000"
+            if not wait_for_server(url):
+                # Stop the server first: reading stderr from a live process blocks until EOF.
+                proc.terminate()
+                stderr = proc.stderr.read().decode() if proc.stderr else ""
+                print(f"FAIL: server did not start\n{stderr}", file=sys.stderr)
+                sys.exit(1)
+            run_tests(url)
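+        finally:
+            proc.terminate()
+            try:
+                proc.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                # Server ignored SIGTERM; force-kill so the test run does not hang.
+                proc.kill()
+                proc.wait()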
+if __name__ == "__main__":
+    main()
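
Both integration tests poll `/health` until the server answers or a deadline passes. The same pattern can be sketched as a generic helper, shown below. `wait_until` is a hypothetical name, not part of this commit, and it uses `time.monotonic()` where the tests use `time.time()`.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool], timeout: float = 15.0, interval: float = 0.5
) -> bool:
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat any error (e.g. connection refused) as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A health probe would then pass something like `lambda: urllib.request.urlopen(f"{url}/health", timeout=2).status == 200` as `check`. The monotonic clock is a design choice here: it keeps the deadline unaffected by system clock adjustments during the wait.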

tests/test_docker.py ADDED

	@@ -0,0 +1,113 @@

+"""Integration test: build + run Docker image, test via client.
+Usage:
+    uv run python tests/test_docker.py                    # build + run + test
+    uv run python tests/test_docker.py --skip-build       # reuse existing image
+    uv run python tests/test_docker.py --image my:tag     # custom image name
+"""
+import argparse
+import subprocess
+import sys
+import time
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+IMAGE = "explainer-env:latest"
+CONTAINER = "explainer-env-test"
+def wait_for_server(url: str, timeout: int = 30):
+    import urllib.request
+    deadline = time.time() + timeout
+    while time.time() < deadline:
+        try:
+            urllib.request.urlopen(f"{url}/health", timeout=2)
+            return True
+        except Exception:
+            time.sleep(1)
+    return False
+def docker_build(image: str):
+    env_dir = Path(__file__).resolve().parents[1]
+    print(f"  building {image} from {env_dir}...")
+    result = subprocess.run(
+        ["docker", "build", "-t", image, "-f", "server/Dockerfile", "."],
+        cwd=str(env_dir),
+        capture_output=True,
+        text=True,
+    )
+    if result.returncode != 0:
+        print(f"FAIL: docker build\n{result.stderr[-1000:]}", file=sys.stderr)
+        sys.exit(1)
+    print("  build OK")
+def docker_run(image: str, container: str):
+    # clean up stale container
+    subprocess.run(["docker", "rm", "-f", container], capture_output=True)
+    result = subprocess.run(
+        ["docker", "run", "-d", "--name", container, "-p", "8000:8000", image],
+        capture_output=True,
+        text=True,
+    )
+    if result.returncode != 0:
+        print(f"FAIL: docker run\n{result.stderr}", file=sys.stderr)
+        sys.exit(1)
+    print(f"  container {container} started")
+def docker_cleanup(container: str):
+    subprocess.run(["docker", "rm", "-f", container], capture_output=True)
+    print(f"  container {container} removed")
+def run_tests(base_url: str):
+    from client import ExplainerEnv
+    from models import ExplainerAction
+    client = ExplainerEnv(base_url=base_url)
+    with client.sync() as sc:
+        result = sc.reset()
+        assert result.observation.topic, "reset should return topic"
+        print(f"  reset: topic={result.observation.topic!r}")
+        action = ExplainerAction(
+            action_type="generate",
+            format="marimo",
+            code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n    return\n",
+        )
+        result = sc.step(action)
+        assert isinstance(result.reward, (int, float))
+        print(f"  step:  reward={result.reward:.3f}, done={result.done}")
+    print("PASS: test_docker (2/2)")
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--skip-build", action="store_true")
+    parser.add_argument("--image", default=IMAGE)
+    args = parser.parse_args()
+    if not args.skip_build:
+        docker_build(args.image)
+    docker_run(args.image, CONTAINER)
+    try:
+        url = "http://localhost:8000"
+        if not wait_for_server(url):
+            logs = subprocess.run(
+                ["docker", "logs", CONTAINER], capture_output=True, text=True
+            )
+            print(f"FAIL: container didn't start\n{logs.stdout}\n{logs.stderr}", file=sys.stderr)
+            sys.exit(1)
+        run_tests(url)
+    finally:
+        docker_cleanup(CONTAINER)
+if __name__ == "__main__":
+    main()

tests/test_environment.py ADDED

	@@ -0,0 +1,163 @@

+"""Tests for ExplainerEnvironment — multi-step explore→generate lifecycle."""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from models import ExplainerAction, ExplainerObservation
+from server.explainer_env_environment import ExplainerEnvironment
+def test_reset_returns_observation():
+    env = ExplainerEnvironment()
+    obs = env.reset(seed=1)
+    assert isinstance(obs, ExplainerObservation)
+    assert obs.topic != ""
+    assert obs.phase == "explore"
+    assert obs.explore_steps_left == 3
+    assert obs.done is False
+def test_reset_deterministic_with_seed():
+    env = ExplainerEnvironment()
+    obs1 = env.reset(seed=42)
+    obs2 = env.reset(seed=42)
+    assert obs1.topic == obs2.topic
+def test_explore_step():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    action = ExplainerAction(action_type="explore", query="gradient descent optimization")
+    obs = env.step(action)
+    assert obs.done is False
+    assert obs.explore_steps_left == 2
+    assert isinstance(obs.reward, (int, float))
+    assert obs.reward >= 0.0
+def test_explore_empty_query():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    action = ExplainerAction(action_type="explore", query="")
+    obs = env.step(action)
+    assert obs.reward == 0.0
+    assert "Empty query" in obs.feedback
+def test_explore_max_steps():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    for i in range(3):
+        obs = env.step(ExplainerAction(action_type="explore", query=f"search {i}"))
+    assert obs.phase == "generate"
+    assert obs.explore_steps_left == 0
+def test_explore_then_generate():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    # Explore
+    obs = env.step(ExplainerAction(action_type="explore", query="gradient descent"))
+    assert obs.done is False
+    assert obs.explored_context != ""
+    # Generate
+    obs = env.step(ExplainerAction(
+        action_type="generate",
+        format="marimo",
+        code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n    return\n",
+    ))
+    assert obs.done is True
+    assert obs.phase == "done"
+    assert isinstance(obs.reward, (int, float))
+def test_generate_without_explore_penalty():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    obs = env.step(ExplainerAction(
+        action_type="generate",
+        format="marimo",
+        code="x = 1",
+    ))
+    assert obs.done is True
+    assert "penalty" in obs.feedback.lower() or "without" in obs.feedback.lower()
+def test_step_without_reset():
+    env = ExplainerEnvironment()
+    action = ExplainerAction(action_type="explore", query="test")
+    obs = env.step(action)
+    assert obs.done is True
+    assert obs.reward == -1.0
+def test_generate_reward_in_metadata():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    env.step(ExplainerAction(action_type="explore", query="gradient descent"))
+    obs = env.step(ExplainerAction(
+        action_type="generate",
+        format="marimo",
+        code="x = 1",
+    ))
+    for key in ("code_valid", "code_runs", "coverage", "format_match", "structure"):
+        assert key in obs.metadata, f"missing {key} in metadata"
+    assert "explore_steps_used" in obs.metadata
+def test_state_episode_id_changes():
+    env = ExplainerEnvironment()
+    env.reset()
+    eid1 = env.state.episode_id
+    env.reset()
+    eid2 = env.state.episode_id
+    assert eid1 != eid2
+def test_step_increments_count():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    assert env.state.step_count == 0
+    env.step(ExplainerAction(action_type="explore", query="test"))
+    assert env.state.step_count == 1
+    env.step(ExplainerAction(action_type="generate", format="marimo", code="x=1"))
+    assert env.state.step_count == 2
+def test_bad_code_does_not_crash():
+    env = ExplainerEnvironment()
+    env.reset(seed=1)
+    obs = env.step(ExplainerAction(
+        action_type="generate",
+        format="marimo",
+        code=")))syntax error(((",
+    ))
+    assert obs.done is True
+    assert "SYNTAX ERROR" in obs.feedback
+if __name__ == "__main__":
+    tests = [
+        test_reset_returns_observation,
+        test_reset_deterministic_with_seed,
+        test_explore_step,
+        test_explore_empty_query,
+        test_explore_max_steps,
+        test_explore_then_generate,
+        test_generate_without_explore_penalty,
+        test_step_without_reset,
+        test_generate_reward_in_metadata,
+        test_state_episode_id_changes,
+        test_step_increments_count,
+        test_bad_code_does_not_crash,
+    ]
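+    passed = 0
+    for t in tests:
+        try:
+            t()
+            passed += 1
+        except Exception as e:
+            print(f"FAIL: {t.__name__}: {e!r}")
+    # Report PASS only when every test passed; exit non-zero otherwise so
+    # tests/run_tests.sh counts this file as failed.
+    status = "PASS" if passed == len(tests) else "FAIL"
+    print(f"{status}: test_environment ({passed}/{len(tests)})")
+    if passed != len(tests):
+        sys.exit(1)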
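
The explore→generate lifecycle these tests exercise can be pictured as a small state machine. The sketch below is a toy model reconstructed only from the assertions above. `ToyLifecycle` and `ToyObs` are hypothetical names, the rewards are placeholders, and the handling of unexpected actions is an assumption, not the behavior of `ExplainerEnvironment`.

```python
from dataclasses import dataclass
from typing import Optional

MAX_EXPLORE_STEPS = 3  # matches explore_steps_left == 3 after reset in the tests


@dataclass
class ToyObs:
    phase: str
    explore_steps_left: int
    done: bool
    reward: float = 0.0  # placeholder; the real rewards come from rewards/


class ToyLifecycle:
    """Hypothetical phase tracker for one explore→generate episode."""

    def __init__(self) -> None:
        self.phase: Optional[str] = None  # None until reset() is called
        self.steps_left = 0

    def reset(self) -> ToyObs:
        self.phase = "explore"
        self.steps_left = MAX_EXPLORE_STEPS
        return ToyObs(self.phase, self.steps_left, done=False)

    def step(self, action_type: str) -> ToyObs:
        if self.phase is None:
            # Stepping before reset ends the episode with a penalty.
            # (Reporting phase "done" here is an assumption.)
            return ToyObs("done", 0, done=True, reward=-1.0)
        if action_type == "explore" and self.phase == "explore":
            self.steps_left -= 1
            if self.steps_left == 0:
                # Exploration budget exhausted: only generation remains.
                self.phase = "generate"
            return ToyObs(self.phase, self.steps_left, done=False)
        # Assumption: any other action ends the episode, as "generate" does.
        self.phase = "done"
        return ToyObs(self.phase, self.steps_left, done=True)
```

Generating straight after `reset()` also ends the episode in this toy model; the penalty the real environment reports for skipping exploration is left out.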

tests/test_models.py ADDED

	@@ -0,0 +1,77 @@

+"""Tests for Action/Observation model creation and validation."""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from models import ExplainerAction, ExplainerObservation
+def test_action_explore():
+    a = ExplainerAction(action_type="explore", query="attention mechanism")
+    assert a.action_type == "explore"
+    assert a.query == "attention mechanism"
+    assert a.code == ""
+    assert a.format is None
+def test_action_generate_marimo():
+    a = ExplainerAction(
+        action_type="generate",
+        format="marimo",
+        code="import marimo as mo\napp = mo.App()",
+    )
+    assert a.action_type == "generate"
+    assert a.format == "marimo"
+    assert a.narration == ""
+def test_action_generate_manim():
+    a = ExplainerAction(
+        action_type="generate",
+        format="manim",
+        code="from manim import *\nclass S(Scene): pass",
+        narration="First we show the scene.",
+    )
+    assert a.format == "manim"
+    assert a.narration != ""
+def test_observation_defaults():
+    obs = ExplainerObservation()
+    assert obs.topic == ""
+    assert obs.tier == "beginner"
+    assert obs.phase == "explore"
+    assert obs.explore_steps_left == 3
+    assert obs.done is False
+def test_observation_full():
+    obs = ExplainerObservation(
+        topic="Gradient Descent",
+        content="GD iteratively updates params.",
+        tier="intermediate",
+        keywords="gradient,learning rate",
+        data_available=True,
+        phase="generate",
+        feedback="looks good",
+        search_results="paper1...",
+        explored_context="accumulated...",
+        explore_steps_left=1,
+        done=True,
+        reward=0.85,
+    )
+    assert obs.topic == "Gradient Descent"
+    assert obs.phase == "generate"
+    assert obs.explore_steps_left == 1
+    assert obs.reward == 0.85
+if __name__ == "__main__":
+    test_action_explore()
+    test_action_generate_marimo()
+    test_action_generate_manim()
+    test_observation_defaults()
+    test_observation_full()
+    print("PASS: test_models (5/5)")

tests/test_rewards.py ADDED

	@@ -0,0 +1,217 @@

+"""Tests for reward components — exploration and generation."""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from rewards.exploration import (
+    compute_explore_reward,
+    query_relevance,
+    research_breadth,
+    result_novelty,
+)
+from rewards.generation import (
+    compute_generate_reward,
+    context_usage,
+    format_match,
+    keyword_coverage,
+    marimo_structure,
+    narration_score,
+)
+from rewards.sandbox import ast_parses
+from task_bank import ALL_TASKS
+MARIMO_TASK = next(t for t in ALL_TASKS if t.topic == "Linear Regression")
+MANIM_TASK = next(t for t in ALL_TASKS if t.topic == "Fourier Transform")
+# --- Sandbox ---
+def test_ast_parses():
+    assert ast_parses("x = 1") is True
+    assert ast_parses("not python!!!") is False
+# --- Exploration rewards ---
+def test_query_relevance():
+    assert query_relevance("linear regression MSE", "Linear Regression", "linear regression,MSE") > 0.5
+    assert query_relevance("", "Linear Regression", "x") == 0.0
+    assert query_relevance("cats", "Linear Regression", "linear regression") < 0.3
+def test_result_novelty():
+    assert result_novelty("new information here", []) == 1.0
+    assert result_novelty("same words again", ["same words again"]) < 0.5
+    assert result_novelty("", []) == 0.0
+def test_research_breadth():
+    assert research_breadth([], min_sources=2) == 0.0
+    assert research_breadth(["a"], min_sources=2) == 0.5
+    assert research_breadth(["a", "b"], min_sources=2) == 1.0
+def test_explore_reward_integration():
+    reward, comp = compute_explore_reward(
+        query="linear regression least squares",
+        result_text="Linear regression minimizes squared error...",
+        topic="Linear Regression",
+        keywords_csv="linear regression,least squares,MSE",
+        task_content="Linear regression is a method for modeling the relationship between variables.",
+        accumulated_context=["first search result"],
+    )
+    assert reward > 0.1
+    assert "query_relevance" in comp
+    assert "result_novelty" in comp
+    assert "research_breadth" in comp
+    assert "content_sufficiency" in comp
+# --- Generation rewards ---
+def test_keyword_coverage():
+    assert keyword_coverage("linear regression MSE", "linear regression,MSE,gradient descent") > 0.5
+    assert keyword_coverage("nothing", "linear regression,MSE") == 0.0
+def test_format_match():
+    assert format_match("marimo", MARIMO_TASK) == 1.0
+    assert format_match("manim", MARIMO_TASK) == 0.3
+    # Task with preferred_format=None should score 1.0 for any format
+    no_pref_task = next(t for t in ALL_TASKS if t.preferred_format is None)
+    assert format_match("marimo", no_pref_task) == 1.0
+    assert format_match("manim", no_pref_task) == 1.0
+def test_narration_marimo():
+    assert narration_score("", "marimo") == 1.0
+def test_narration_manim():
+    assert narration_score("", "manim") == 0.0
+    long_narration = (
+        "First we introduce the concept. Next we show the graph. "
+        "Then we animate the transformation step by step. "
+        "Finally we summarize the key takeaways from this scene."
+    )
+    assert narration_score(long_narration, "manim") > 0.5
+def test_structure_marimo():
+    good = """import marimo as mo
+app = mo.App()
+@app.cell
+def _():
+    mo.md("# Regression")
+    return
+@app.cell
+def _():
+    import matplotlib.pyplot as plt
+    return
+@app.cell
+def _():
+    slider = mo.ui.slider(0, 5)
+    return
+"""
+    assert marimo_structure(good, MARIMO_TASK) > 0.5
+def test_context_usage():
+    assert context_usage("x = 1", []) == 0.5  # no context
+    assert context_usage(
+        "linear regression least squares gradient descent optimization",
+        ["linear regression least squares optimization methods"],
+    ) > 0.3
+def test_generate_reward_garbage():
+    reward, comp = compute_generate_reward(
+        code="not python!!!",
+        fmt="marimo",
+        narration="",
+        task=MARIMO_TASK,
+        exec_success=False,
+        accumulated_context=[],
+    )
+    assert reward < 0.4
+    assert comp["code_valid"] == 0.0
+def test_generate_reward_good():
+    code = """import marimo as mo
+app = mo.App()
+@app.cell
+def _():
+    mo.md("# Linear Regression")
+    return
+@app.cell
+def _():
+    import numpy as np
+    import matplotlib.pyplot as plt
+    # linear regression least squares MSE gradient descent weights bias
+    X = np.linspace(0, 10, 50)
+    y = 2 * X + 1
+    return X, y
+@app.cell
+def _(X, y):
+    slider = mo.ui.slider(0, 5, value=2, label="Slope")
+    return
+"""
+    reward, comp = compute_generate_reward(
+        code=code,
+        fmt="marimo",
+        narration="",
+        task=MARIMO_TASK,
+        exec_success=True,
+        accumulated_context=["linear regression least squares"],
+    )
+    assert reward > 0.6
+    assert comp["code_valid"] == 1.0
+    assert comp["code_runs"] == 1.0
+def test_generate_reward_wrong_format():
+    code = "import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n    return\n"
+    r_right, _ = compute_generate_reward(code, "marimo", "", MARIMO_TASK, False, [])
+    r_wrong, _ = compute_generate_reward(code, "manim", "", MARIMO_TASK, False, [])
+    assert r_right > r_wrong
+def test_reward_spread():
+    rewards = []
+    for task in ALL_TASKS[:5]:
+        for code in ["bad!!!", "x = 1", "import marimo as mo\napp = mo.App()"]:
+            r, _ = compute_generate_reward(code, "marimo", "", task, False, [])
+            rewards.append(r)
+    unique = set(round(r, 3) for r in rewards)
+    assert len(unique) >= 3
+if __name__ == "__main__":
+    tests = [
+        test_ast_parses,
+        test_query_relevance,
+        test_result_novelty,
+        test_research_breadth,
+        test_explore_reward_integration,
+        test_keyword_coverage,
+        test_format_match,
+        test_narration_marimo,
+        test_narration_manim,
+        test_structure_marimo,
+        test_context_usage,
+        test_generate_reward_garbage,
+        test_generate_reward_good,
+        test_generate_reward_wrong_format,
+        test_reward_spread,
+    ]
+    passed = 0
+    for t in tests:
+        try:
+            t()
+            passed += 1
+        except Exception as e:
+            print(f"FAIL: {t.__name__}: {e}")
+    print(f"{'PASS' if passed == len(tests) else 'FAIL'}: test_rewards ({passed}/{len(tests)})")
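
Reviewer note: the exploration tests above pin down only a few input/output pairs for `research_breadth` and `result_novelty`. The sketch below is one minimal pair of functions that satisfies those assertions. It is an assumption for illustration only and is not the code in `rewards/exploration.py`, which this part of the diff does not show. The whitespace tokenization and the default `min_sources=2` are likewise assumed.

```python
from typing import Sequence


def research_breadth(sources: Sequence[str], min_sources: int = 2) -> float:
    """Fraction of the required source count reached so far, capped at 1.0."""
    if min_sources <= 0:
        # Assumed convention: with no requirement, breadth is trivially met.
        return 1.0
    return min(1.0, len(sources) / min_sources)


def result_novelty(result_text: str, accumulated_context: Sequence[str]) -> float:
    """Share of the result's distinct tokens that no earlier result contained."""
    tokens = set(result_text.lower().split())
    if not tokens:
        # An empty result carries no new information.
        return 0.0
    seen = set()
    for chunk in accumulated_context:
        seen.update(chunk.lower().split())
    return len(tokens - seen) / len(tokens)
```

Under these assumptions a repeated result scores 0.0 and a first result scores 1.0, which matches the bounds the tests check. The real implementation may weight or smooth these values differently.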

tests/test_task_bank.py ADDED

	@@ -0,0 +1,58 @@

+"""Tests for task bank integrity."""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+from task_bank import (
+    ALL_TASKS,
+    ALGORITHMS,
+    EASY_TASKS,
+    HARD_TASKS,
+    MATH_TOPICS,
+    MEDIUM_TASKS,
+    ML_CONCEPTS,
+    STATISTICS_TASKS,
+    Task,
+)
+def test_task_counts():
+    assert len(ML_CONCEPTS) >= 5, f"ML_CONCEPTS has {len(ML_CONCEPTS)} (need >=5)"
+    assert len(MATH_TOPICS) >= 5, f"MATH_TOPICS has {len(MATH_TOPICS)} (need >=5)"
+    assert len(ALGORITHMS) >= 3, f"ALGORITHMS has {len(ALGORITHMS)} (need >=3)"
+    assert len(STATISTICS_TASKS) >= 2, f"STATISTICS_TASKS has {len(STATISTICS_TASKS)} (need >=2)"
+    assert len(ALL_TASKS) == len(ML_CONCEPTS) + len(MATH_TOPICS) + len(ALGORITHMS) + len(STATISTICS_TASKS)
+def test_difficulty_partition():
+    assert len(EASY_TASKS) + len(MEDIUM_TASKS) + len(HARD_TASKS) == len(ALL_TASKS)
+    assert len(EASY_TASKS) > 0
+    assert len(MEDIUM_TASKS) > 0
+    assert len(HARD_TASKS) > 0
+def test_task_fields():
+    for t in ALL_TASKS:
+        assert isinstance(t, Task)
+        assert t.topic, f"empty topic: {t}"
+        assert t.content, f"empty content: {t}"
+        assert t.tier in ("beginner", "intermediate", "advanced"), f"bad tier: {t.tier}"
+        assert t.keywords, f"empty keywords: {t.topic}"
+        assert t.preferred_format in ("marimo", "manim", None), f"bad format: {t.preferred_format}"
+        assert t.difficulty in ("easy", "medium", "hard"), f"bad difficulty: {t.difficulty}"
+def test_both_formats_present():
+    formats = {t.preferred_format for t in ALL_TASKS}
+    assert "marimo" in formats, "no marimo tasks"
+    assert "manim" in formats, "no manim tasks"
+if __name__ == "__main__":
+    test_task_counts()
+    test_difficulty_partition()
+    test_task_fields()
+    test_both_formats_present()
+    print("PASS: test_task_bank (4/4)")
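
Reviewer note: `test_difficulty_partition` compares list lengths only, so it would still pass if one task appeared in two difficulty lists while another was missing from all of them. A stricter check could compare membership, as in the sketch below. The `Task` class here is a hypothetical two-field stand-in (the real one lives in `task_bank.py`), the helper name `is_difficulty_partition` is invented for illustration, and the check assumes topics are unique across the bank.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Task:  # hypothetical stand-in, not the Task from task_bank.py
    topic: str
    difficulty: str


def is_difficulty_partition(
    groups: Sequence[Iterable[Task]], all_tasks: Sequence[Task]
) -> bool:
    """True when every task appears in exactly one group and nothing extra does."""
    grouped = [t.topic for group in groups for t in group]
    expected = [t.topic for t in all_tasks]
    # Same multiset of topics, and no topic listed twice (assumes unique topics).
    return sorted(grouped) == sorted(expected) and len(set(grouped)) == len(grouped)
```

With two tasks, passing the same single-task list twice gives matching counts (2 == 2) but fails this check, which is the case the length-only assertion cannot catch.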

uv.lock CHANGED

@@ -544,14 +544,14 @@ wheels = [
 [[package]]
 name = "click"
-version = "8.3.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "colorama", marker = "sys_platform == 'win32'" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/bb/63/f9e1ea081ce35720d8b92acde70daaedace594dc93b693c869e0d5910718/click-8.3.3.tar.gz", hash = "sha256:398329ad4837b2ff7cbe1dd166a4c0f8900c3ca3a218de04466f38f6497f18a2", size = 328061, upload-time = "2026-04-22T15:11:27.506Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/ae/44/c1221527f6a71a01ec6fbad7fa78f1d50dfa02217385cf0fa3eec7087d59/click-8.3.3-py3-none-any.whl", hash = "sha256:a2bf429bb3033c89fa4936ffb35d5cb471e3719e1f3c8a7c3fff0b8314305613", size = 110502, upload-time = "2026-04-22T15:11:25.044Z" },
 ]
 [[package]]
@@ -696,62 +696,62 @@ toml = [
 [[package]]
 name = "cryptography"
-version = "46.0.7"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
     { name = "typing-extensions", marker = "python_full_version < '3.11'" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/47/93/ac8f3d5ff04d54bc814e961a43ae5b0b146154c89c61b47bb07557679b18/cryptography-46.0.7.tar.gz", hash = "sha256:e4cfd68c5f3e0bfdad0d38e023239b96a2fe84146481852dffbcca442c245aa5", size = 750652, upload-time = "2026-04-08T01:57:54.692Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/0b/5d/4a8f770695d73be252331e60e526291e3df0c9b27556a90a6b47bccca4c2/cryptography-46.0.7-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:ea42cbe97209df307fdc3b155f1b6fa2577c0defa8f1f7d3be7d31d189108ad4", size = 7179869, upload-time = "2026-04-08T01:56:17.157Z" },
-    { url = "https://files.pythonhosted.org/packages/5f/45/6d80dc379b0bbc1f9d1e429f42e4cb9e1d319c7a8201beffd967c516ea01/cryptography-46.0.7-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b36a4695e29fe69215d75960b22577197aca3f7a25b9cf9d165dcfe9d80bc325", size = 4275492, upload-time = "2026-04-08T01:56:19.36Z" },
-    { url = "https://files.pythonhosted.org/packages/4a/9a/1765afe9f572e239c3469f2cb429f3ba7b31878c893b246b4b2994ffe2fe/cryptography-46.0.7-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5ad9ef796328c5e3c4ceed237a183f5d41d21150f972455a9d926593a1dcb308", size = 4426670, upload-time = "2026-04-08T01:56:21.415Z" },
-    { url = "https://files.pythonhosted.org/packages/8f/3e/af9246aaf23cd4ee060699adab1e47ced3f5f7e7a8ffdd339f817b446462/cryptography-46.0.7-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:73510b83623e080a2c35c62c15298096e2a5dc8d51c3b4e1740211839d0dea77", size = 4280275, upload-time = "2026-04-08T01:56:23.539Z" },
-    { url = "https://files.pythonhosted.org/packages/0f/54/6bbbfc5efe86f9d71041827b793c24811a017c6ac0fd12883e4caa86b8ed/cryptography-46.0.7-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cbd5fb06b62bd0721e1170273d3f4d5a277044c47ca27ee257025146c34cbdd1", size = 4928402, upload-time = "2026-04-08T01:56:25.624Z" },
-    { url = "https://files.pythonhosted.org/packages/2d/cf/054b9d8220f81509939599c8bdbc0c408dbd2bdd41688616a20731371fe0/cryptography-46.0.7-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:420b1e4109cc95f0e5700eed79908cef9268265c773d3a66f7af1eef53d409ef", size = 4459985, upload-time = "2026-04-08T01:56:27.309Z" },
-    { url = "https://files.pythonhosted.org/packages/f9/46/4e4e9c6040fb01c7467d47217d2f882daddeb8828f7df800cb806d8a2288/cryptography-46.0.7-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:24402210aa54baae71d99441d15bb5a1919c195398a87b563df84468160a65de", size = 3990652, upload-time = "2026-04-08T01:56:29.095Z" },
-    { url = "https://files.pythonhosted.org/packages/36/5f/313586c3be5a2fbe87e4c9a254207b860155a8e1f3cca99f9910008e7d08/cryptography-46.0.7-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:8a469028a86f12eb7d2fe97162d0634026d92a21f3ae0ac87ed1c4a447886c83", size = 4279805, upload-time = "2026-04-08T01:56:30.928Z" },
-    { url = "https://files.pythonhosted.org/packages/69/33/60dfc4595f334a2082749673386a4d05e4f0cf4df8248e63b2c3437585f2/cryptography-46.0.7-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:9694078c5d44c157ef3162e3bf3946510b857df5a3955458381d1c7cfc143ddb", size = 4892883, upload-time = "2026-04-08T01:56:32.614Z" },
-    { url = "https://files.pythonhosted.org/packages/c7/0b/333ddab4270c4f5b972f980adef4faa66951a4aaf646ca067af597f15563/cryptography-46.0.7-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:42a1e5f98abb6391717978baf9f90dc28a743b7d9be7f0751a6f56a75d14065b", size = 4459756, upload-time = "2026-04-08T01:56:34.306Z" },
-    { url = "https://files.pythonhosted.org/packages/d2/14/633913398b43b75f1234834170947957c6b623d1701ffc7a9600da907e89/cryptography-46.0.7-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91bbcb08347344f810cbe49065914fe048949648f6bd5c2519f34619142bbe85", size = 4410244, upload-time = "2026-04-08T01:56:35.977Z" },
-    { url = "https://files.pythonhosted.org/packages/10/f2/19ceb3b3dc14009373432af0c13f46aa08e3ce334ec6eff13492e1812ccd/cryptography-46.0.7-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5d1c02a14ceb9148cc7816249f64f623fbfee39e8c03b3650d842ad3f34d637e", size = 4674868, upload-time = "2026-04-08T01:56:38.034Z" },
-    { url = "https://files.pythonhosted.org/packages/1a/bb/a5c213c19ee94b15dfccc48f363738633a493812687f5567addbcbba9f6f/cryptography-46.0.7-cp311-abi3-win32.whl", hash = "sha256:d23c8ca48e44ee015cd0a54aeccdf9f09004eba9fc96f38c911011d9ff1bd457", size = 3026504, upload-time = "2026-04-08T01:56:39.666Z" },
-    { url = "https://files.pythonhosted.org/packages/2b/02/7788f9fefa1d060ca68717c3901ae7fffa21ee087a90b7f23c7a603c32ae/cryptography-46.0.7-cp311-abi3-win_amd64.whl", hash = "sha256:397655da831414d165029da9bc483bed2fe0e75dde6a1523ec2fe63f3c46046b", size = 3488363, upload-time = "2026-04-08T01:56:41.893Z" },
-    { url = "https://files.pythonhosted.org/packages/7b/56/15619b210e689c5403bb0540e4cb7dbf11a6bf42e483b7644e471a2812b3/cryptography-46.0.7-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:d151173275e1728cf7839aaa80c34fe550c04ddb27b34f48c232193df8db5842", size = 7119671, upload-time = "2026-04-08T01:56:44Z" },
-    { url = "https://files.pythonhosted.org/packages/74/66/e3ce040721b0b5599e175ba91ab08884c75928fbeb74597dd10ef13505d2/cryptography-46.0.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:db0f493b9181c7820c8134437eb8b0b4792085d37dbb24da050476ccb664e59c", size = 4268551, upload-time = "2026-04-08T01:56:46.071Z" },
-    { url = "https://files.pythonhosted.org/packages/03/11/5e395f961d6868269835dee1bafec6a1ac176505a167f68b7d8818431068/cryptography-46.0.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ebd6daf519b9f189f85c479427bbd6e9c9037862cf8fe89ee35503bd209ed902", size = 4408887, upload-time = "2026-04-08T01:56:47.718Z" },
-    { url = "https://files.pythonhosted.org/packages/40/53/8ed1cf4c3b9c8e611e7122fb56f1c32d09e1fff0f1d77e78d9ff7c82653e/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:b7b412817be92117ec5ed95f880defe9cf18a832e8cafacf0a22337dc1981b4d", size = 4271354, upload-time = "2026-04-08T01:56:49.312Z" },
-    { url = "https://files.pythonhosted.org/packages/50/46/cf71e26025c2e767c5609162c866a78e8a2915bbcfa408b7ca495c6140c4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:fbfd0e5f273877695cb93baf14b185f4878128b250cc9f8e617ea0c025dfb022", size = 4905845, upload-time = "2026-04-08T01:56:50.916Z" },
-    { url = "https://files.pythonhosted.org/packages/c0/ea/01276740375bac6249d0a971ebdf6b4dc9ead0ee0a34ef3b5a88c1a9b0d4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:ffca7aa1d00cf7d6469b988c581598f2259e46215e0140af408966a24cf086ce", size = 4444641, upload-time = "2026-04-08T01:56:52.882Z" },
-    { url = "https://files.pythonhosted.org/packages/3d/4c/7d258f169ae71230f25d9f3d06caabcff8c3baf0978e2b7d65e0acac3827/cryptography-46.0.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:60627cf07e0d9274338521205899337c5d18249db56865f943cbe753aa96f40f", size = 3967749, upload-time = "2026-04-08T01:56:54.597Z" },
-    { url = "https://files.pythonhosted.org/packages/b5/2a/2ea0767cad19e71b3530e4cad9605d0b5e338b6a1e72c37c9c1ceb86c333/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:80406c3065e2c55d7f49a9550fe0c49b3f12e5bfff5dedb727e319e1afb9bf99", size = 4270942, upload-time = "2026-04-08T01:56:56.416Z" },
-    { url = "https://files.pythonhosted.org/packages/41/3d/fe14df95a83319af25717677e956567a105bb6ab25641acaa093db79975d/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:c5b1ccd1239f48b7151a65bc6dd54bcfcc15e028c8ac126d3fada09db0e07ef1", size = 4871079, upload-time = "2026-04-08T01:56:58.31Z" },
-    { url = "https://files.pythonhosted.org/packages/9c/59/4a479e0f36f8f378d397f4eab4c850b4ffb79a2f0d58704b8fa0703ddc11/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:d5f7520159cd9c2154eb61eb67548ca05c5774d39e9c2c4339fd793fe7d097b2", size = 4443999, upload-time = "2026-04-08T01:57:00.508Z" },
-    { url = "https://files.pythonhosted.org/packages/28/17/b59a741645822ec6d04732b43c5d35e4ef58be7bfa84a81e5ae6f05a1d33/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fcd8eac50d9138c1d7fc53a653ba60a2bee81a505f9f8850b6b2888555a45d0e", size = 4399191, upload-time = "2026-04-08T01:57:02.654Z" },
-    { url = "https://files.pythonhosted.org/packages/59/6a/bb2e166d6d0e0955f1e9ff70f10ec4b2824c9cfcdb4da772c7dd69cc7d80/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:65814c60f8cc400c63131584e3e1fad01235edba2614b61fbfbfa954082db0ee", size = 4655782, upload-time = "2026-04-08T01:57:04.592Z" },
-    { url = "https://files.pythonhosted.org/packages/95/b6/3da51d48415bcb63b00dc17c2eff3a651b7c4fed484308d0f19b30e8cb2c/cryptography-46.0.7-cp314-cp314t-win32.whl", hash = "sha256:fdd1736fed309b4300346f88f74cd120c27c56852c3838cab416e7a166f67298", size = 3002227, upload-time = "2026-04-08T01:57:06.91Z" },
-    { url = "https://files.pythonhosted.org/packages/32/a8/9f0e4ed57ec9cebe506e58db11ae472972ecb0c659e4d52bbaee80ca340a/cryptography-46.0.7-cp314-cp314t-win_amd64.whl", hash = "sha256:e06acf3c99be55aa3b516397fe42f5855597f430add9c17fa46bf2e0fb34c9bb", size = 3475332, upload-time = "2026-04-08T01:57:08.807Z" },
-    { url = "https://files.pythonhosted.org/packages/a7/7f/cd42fc3614386bc0c12f0cb3c4ae1fc2bbca5c9662dfed031514911d513d/cryptography-46.0.7-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:462ad5cb1c148a22b2e3bcc5ad52504dff325d17daf5df8d88c17dda1f75f2a4", size = 7165618, upload-time = "2026-04-08T01:57:10.645Z" },
-    { url = "https://files.pythonhosted.org/packages/a5/d0/36a49f0262d2319139d2829f773f1b97ef8aef7f97e6e5bd21455e5a8fb5/cryptography-46.0.7-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:84d4cced91f0f159a7ddacad249cc077e63195c36aac40b4150e7a57e84fffe7", size = 4270628, upload-time = "2026-04-08T01:57:12.885Z" },
-    { url = "https://files.pythonhosted.org/packages/8a/6c/1a42450f464dda6ffbe578a911f773e54dd48c10f9895a23a7e88b3e7db5/cryptography-46.0.7-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:128c5edfe5e5938b86b03941e94fac9ee793a94452ad1365c9fc3f4f62216832", size = 4415405, upload-time = "2026-04-08T01:57:14.923Z" },
-    { url = "https://files.pythonhosted.org/packages/9a/92/4ed714dbe93a066dc1f4b4581a464d2d7dbec9046f7c8b7016f5286329e2/cryptography-46.0.7-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:5e51be372b26ef4ba3de3c167cd3d1022934bc838ae9eaad7e644986d2a3d163", size = 4272715, upload-time = "2026-04-08T01:57:16.638Z" },
-    { url = "https://files.pythonhosted.org/packages/b7/e6/a26b84096eddd51494bba19111f8fffe976f6a09f132706f8f1bf03f51f7/cryptography-46.0.7-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cdf1a610ef82abb396451862739e3fc93b071c844399e15b90726ef7470eeaf2", size = 4918400, upload-time = "2026-04-08T01:57:19.021Z" },
-    { url = "https://files.pythonhosted.org/packages/c7/08/ffd537b605568a148543ac3c2b239708ae0bd635064bab41359252ef88ed/cryptography-46.0.7-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:1d25aee46d0c6f1a501adcddb2d2fee4b979381346a78558ed13e50aa8a59067", size = 4450634, upload-time = "2026-04-08T01:57:21.185Z" },
-    { url = "https://files.pythonhosted.org/packages/16/01/0cd51dd86ab5b9befe0d031e276510491976c3a80e9f6e31810cce46c4ad/cryptography-46.0.7-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:cdfbe22376065ffcf8be74dc9a909f032df19bc58a699456a21712d6e5eabfd0", size = 3985233, upload-time = "2026-04-08T01:57:22.862Z" },
-    { url = "https://files.pythonhosted.org/packages/92/49/819d6ed3a7d9349c2939f81b500a738cb733ab62fbecdbc1e38e83d45e12/cryptography-46.0.7-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:abad9dac36cbf55de6eb49badd4016806b3165d396f64925bf2999bcb67837ba", size = 4271955, upload-time = "2026-04-08T01:57:24.814Z" },
-    { url = "https://files.pythonhosted.org/packages/80/07/ad9b3c56ebb95ed2473d46df0847357e01583f4c52a85754d1a55e29e4d0/cryptography-46.0.7-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:935ce7e3cfdb53e3536119a542b839bb94ec1ad081013e9ab9b7cfd478b05006", size = 4879888, upload-time = "2026-04-08T01:57:26.88Z" },
-    { url = "https://files.pythonhosted.org/packages/b8/c7/201d3d58f30c4c2bdbe9b03844c291feb77c20511cc3586daf7edc12a47b/cryptography-46.0.7-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:35719dc79d4730d30f1c2b6474bd6acda36ae2dfae1e3c16f2051f215df33ce0", size = 4449961, upload-time = "2026-04-08T01:57:29.068Z" },
-    { url = "https://files.pythonhosted.org/packages/a5/ef/649750cbf96f3033c3c976e112265c33906f8e462291a33d77f90356548c/cryptography-46.0.7-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7bbc6ccf49d05ac8f7d7b5e2e2c33830d4fe2061def88210a126d130d7f71a85", size = 4401696, upload-time = "2026-04-08T01:57:31.029Z" },
-    { url = "https://files.pythonhosted.org/packages/41/52/a8908dcb1a389a459a29008c29966c1d552588d4ae6d43f3a1a4512e0ebe/cryptography-46.0.7-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a1529d614f44b863a7b480c6d000fe93b59acee9c82ffa027cfadc77521a9f5e", size = 4664256, upload-time = "2026-04-08T01:57:33.144Z" },
-    { url = "https://files.pythonhosted.org/packages/4b/fa/f0ab06238e899cc3fb332623f337a7364f36f4bb3f2534c2bb95a35b132c/cryptography-46.0.7-cp38-abi3-win32.whl", hash = "sha256:f247c8c1a1fb45e12586afbb436ef21ff1e80670b2861a90353d9b025583d246", size = 3013001, upload-time = "2026-04-08T01:57:34.933Z" },
-    { url = "https://files.pythonhosted.org/packages/d2/f1/00ce3bde3ca542d1acd8f8cfa38e446840945aa6363f9b74746394b14127/cryptography-46.0.7-cp38-abi3-win_amd64.whl", hash = "sha256:506c4ff91eff4f82bdac7633318a526b1d1309fc07ca76a3ad182cb5b686d6d3", size = 3472985, upload-time = "2026-04-08T01:57:36.714Z" },
-    { url = "https://files.pythonhosted.org/packages/63/0c/dca8abb64e7ca4f6b2978769f6fea5ad06686a190cec381f0a796fdcaaba/cryptography-46.0.7-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:fc9ab8856ae6cf7c9358430e49b368f3108f050031442eaeb6b9d87e4dcf4e4f", size = 3476879, upload-time = "2026-04-08T01:57:38.664Z" },
-    { url = "https://files.pythonhosted.org/packages/3a/ea/075aac6a84b7c271578d81a2f9968acb6e273002408729f2ddff517fed4a/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:d3b99c535a9de0adced13d159c5a9cf65c325601aa30f4be08afd680643e9c15", size = 4219700, upload-time = "2026-04-08T01:57:40.625Z" },
-    { url = "https://files.pythonhosted.org/packages/6c/7b/1c55db7242b5e5612b29fc7a630e91ee7a6e3c8e7bf5406d22e206875fbd/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:d02c738dacda7dc2a74d1b2b3177042009d5cab7c7079db74afc19e56ca1b455", size = 4385982, upload-time = "2026-04-08T01:57:42.725Z" },
-    { url = "https://files.pythonhosted.org/packages/cb/da/9870eec4b69c63ef5925bf7d8342b7e13bc2ee3d47791461c4e49ca212f4/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:04959522f938493042d595a736e7dbdff6eb6cc2339c11465b3ff89343b65f65", size = 4219115, upload-time = "2026-04-08T01:57:44.939Z" },
-    { url = "https://files.pythonhosted.org/packages/f4/72/05aa5832b82dd341969e9a734d1812a6aadb088d9eb6f0430fc337cc5a8f/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:3986ac1dee6def53797289999eabe84798ad7817f3e97779b5061a95b0ee4968", size = 4385479, upload-time = "2026-04-08T01:57:46.86Z" },
-    { url = "https://files.pythonhosted.org/packages/20/2a/1b016902351a523aa2bd446b50a5bc1175d7a7d1cf90fe2ef904f9b84ebc/cryptography-46.0.7-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:258514877e15963bd43b558917bc9f54cf7cf866c38aa576ebf47a77ddbc43a4", size = 3412829, upload-time = "2026-04-08T01:57:48.874Z" },
 ]
 [[package]]
@@ -1136,7 +1136,7 @@ wheels = [
 [[package]]
 name = "huggingface-hub"
-version = "1.11.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "filelock" },
@@ -1149,9 +1149,9 @@ dependencies = [
     { name = "typer" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/dc/89/e7aa12d8a6b9259bed10671abb25ae6fa437c0f88a86ecbf59617bae7759/huggingface_hub-1.11.0.tar.gz", hash = "sha256:15fb3713c7f9cdff7b808a94fd91664f661ab142796bb48c9cd9493e8d166278", size = 761749, upload-time = "2026-04-16T13:07:39.73Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/37/02/4f3f8997d1ea7fe0146b343e5e14bd065fa87af790d07e5576d31b31cc18/huggingface_hub-1.11.0-py3-none-any.whl", hash = "sha256:42a6de0afbfeb5e022222d36398f029679db4eb4778801aafda32257ae9131ab", size = 645499, upload-time = "2026-04-16T13:07:37.716Z" },
 ]
 [[package]]
@@ -1751,7 +1751,7 @@ wheels = [
 [[package]]
 name = "marimo"
-version = "0.23.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "click" },
@@ -1774,9 +1774,9 @@ dependencies = [
     { name = "uvicorn" },
     { name = "websockets" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/31/7f/8490043913942e48e8f21b88ad49fe736c2a236ff2a393c8ae67724105f2/marimo-0.23.2.tar.gz", hash = "sha256:25d810b4864d534c1cf33eb3020320e8ef7319e9809b3fb6ae5644135f5a660a", size = 38383208, upload-time = "2026-04-20T21:49:06.137Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/b2/23/c4d34eb5da111b0f31d2feeddf7ee619ce4883999dbf360ed781b327dc0f/marimo-0.23.2-py3-none-any.whl", hash = "sha256:b1fbf5684fbb20d987d9ce6f569fd32789693ff4fd59a5678a0598b680cc41ba", size = 38801200, upload-time = "2026-04-20T21:48:58.165Z" },
 ]
 [[package]]
@@ -2311,10 +2311,13 @@ name = "openenv-explainer-env"
 version = "0.1.0"
 source = { editable = "." }
 dependencies = [
     { name = "manim", version = "0.19.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
     { name = "manim", version = "0.20.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
     { name = "marimo" },
     { name = "openenv-core", extra = ["core"] },
 ]
 [package.optional-dependencies]
@@ -2325,25 +2328,31 @@ dev = [
 [package.metadata]
 requires-dist = [
     { name = "manim", specifier = ">=0.18.0" },
     { name = "marimo", specifier = ">=0.10.0" },
     { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
     { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
 ]
 provides-extras = ["dev"]
 [[package]]
 name = "opentelemetry-api"
-version = "1.41.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "importlib-metadata" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/47/8e/3778a7e87801d994869a9396b9fc2a289e5f9be91ff54a27d41eace494b0/opentelemetry_api-1.41.0.tar.gz", hash = "sha256:9421d911326ec12dee8bc933f7839090cad7a3f13fcfb0f9e82f8174dc003c09", size = 71416, upload-time = "2026-04-09T14:38:34.544Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/58/ee/99ab786653b3bda9c37ade7e24a7b607a1b1f696063172768417539d876d/opentelemetry_api-1.41.0-py3-none-any.whl", hash = "sha256:0e77c806e6a89c9e4f8d372034622f3e1418a11bdbe1c80a50b3d3397ad0fa4f", size = 69007, upload-time = "2026-04-09T14:38:11.833Z" },
 ]
 [[package]]
@@ -2429,11 +2438,11 @@ wheels = [
 [[package]]
 name = "packaging"
-version = "26.1"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/df/de/0d2b39fb4af88a0258f3bac87dfcbb48e73fbdea4a2ed0e2213f9a4c2f9a/packaging-26.1.tar.gz", hash = "sha256:f042152b681c4bfac5cae2742a55e103d27ab2ec0f3d88037136b6bfe7c9c5de", size = 215519, upload-time = "2026-04-14T21:12:49.362Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/7a/c2/920ef838e2f0028c8262f16101ec09ebd5969864e5a64c4c05fad0617c56/packaging-26.1-py3-none-any.whl", hash = "sha256:5d9c0669c6285e491e0ced2eee587eaf67b670d94a19e94e3984a481aba6802f", size = 95831, upload-time = "2026-04-14T21:12:47.56Z" },
 ]
 [[package]]
@@ -3087,7 +3096,7 @@ name = "pyobjc-framework-cocoa"
 version = "12.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "pyobjc-core" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/02/a3/16ca9a15e77c061a9250afbae2eae26f2e1579eb8ca9462ae2d2c71e1169/pyobjc_framework_cocoa-12.1.tar.gz", hash = "sha256:5556c87db95711b985d5efdaaf01c917ddd41d148b1e52a0c66b1a2e2c5c1640", size = 2772191, upload-time = "2025-11-14T10:13:02.069Z" }
 wheels = [
@@ -3810,6 +3819,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/48/2c/6c9bb53db56c8a12a736d2158a8b842a5993b96daabc29d90a098e840280/svgelements-1.9.6-py2.py3-none-any.whl", hash = "sha256:8a5cf2cc066d98e713d5b875b1d6e5eeb9b92e855e835ebd7caab2713ae1dcad", size = 137856, upload-time = "2023-08-17T02:01:48.76Z" },
 ]
 [[package]]
 name = "tomli"
 version = "2.4.1"
@@ -3932,11 +3950,11 @@ wheels = [
 [[package]]
 name = "tzdata"
-version = "2026.1"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/19/f5/cd531b2d15a671a40c0f66cf06bc3570a12cd56eef98960068ebbad1bf5a/tzdata-2026.1.tar.gz", hash = "sha256:67658a1903c75917309e753fdc349ac0efd8c27db7a0cb406a25be4840f87f98", size = 197639, upload-time = "2026-04-03T11:25:22.002Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/b0/70/d460bd685a170790ec89317e9bd33047988e4bce507b831f5db771e142de/tzdata-2026.1-py2.py3-none-any.whl", hash = "sha256:4b1d2be7ac37ceafd7327b961aa3a54e467efbdb563a23655fbfe0d39cfc42a9", size = 348952, upload-time = "2026-04-03T11:25:20.313Z" },
 ]
 [[package]]
@@ -4174,6 +4192,20 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
 ]
 [[package]]
 name = "zipp"
 version = "3.23.1"

 [[package]]
 name = "click"
+version = "8.3.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "colorama", marker = "sys_platform == 'win32'" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/57/75/31212c6bf2503fdf920d87fee5d7a86a2e3bcf444984126f13d8e4016804/click-8.3.2.tar.gz", hash = "sha256:14162b8b3b3550a7d479eafa77dfd3c38d9dc8951f6f69c78913a8f9a7540fd5", size = 302856, upload-time = "2026-04-03T19:14:45.118Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/e4/20/71885d8b97d4f3dde17b1fdb92dbd4908b00541c5a3379787137285f602e/click-8.3.2-py3-none-any.whl", hash = "sha256:1924d2c27c5653561cd2cae4548d1406039cb79b858b747cfea24924bbc1616d", size = 108379, upload-time = "2026-04-03T19:14:43.505Z" },
 ]
 [[package]]
 [[package]]
 name = "cryptography"
+version = "47.0.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
     { name = "typing-extensions", marker = "python_full_version < '3.11'" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/ef/b2/7ffa7fe8207a8c42147ffe70c3e360b228160c1d85dc3faff16aaa3244c0/cryptography-47.0.0.tar.gz", hash = "sha256:9f8e55fe4e63613a5e1cc5819030f27b97742d720203a087802ce4ce9ceb52bb", size = 830863, upload-time = "2026-04-24T19:54:57.056Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a4/98/40dfe932134bdcae4f6ab5927c87488754bf9eb79297d7e0070b78dd58e9/cryptography-47.0.0-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:160ad728f128972d362e714054f6ba0067cab7fb350c5202a9ae8ae4ce3ef1a0", size = 7912214, upload-time = "2026-04-24T19:53:03.864Z" },
+    { url = "https://files.pythonhosted.org/packages/34/c6/2733531243fba725f58611b918056b277692f1033373dcc8bd01af1c05d4/cryptography-47.0.0-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b9a8943e359b7615db1a3ba587994618e094ff3d6fa5a390c73d079ce18b3973", size = 4644617, upload-time = "2026-04-24T19:53:06.909Z" },
+    { url = "https://files.pythonhosted.org/packages/00/e3/b27be1a670a9b87f855d211cf0e1174a5d721216b7616bd52d8581d912ed/cryptography-47.0.0-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f5c15764f261394b22aef6b00252f5195f46f2ca300bec57149474e2538b31f8", size = 4668186, upload-time = "2026-04-24T19:53:09.053Z" },
+    { url = "https://files.pythonhosted.org/packages/81/b9/8443cfe5d17d482d348cee7048acf502bb89a51b6382f06240fd290d4ca3/cryptography-47.0.0-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:9c59ab0e0fa3a180a5a9c59f3a5abe3ef90d474bc56d7fadfbe80359491b615b", size = 4651244, upload-time = "2026-04-24T19:53:11.217Z" },
+    { url = "https://files.pythonhosted.org/packages/5d/5e/13ed0cdd0eb88ba159d6dd5ebfece8cb901dbcf1ae5ac4072e28b55d3153/cryptography-47.0.0-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:34b4358b925a5ea3e14384ca781a2c0ef7ac219b57bb9eacc4457078e2b19f92", size = 5252906, upload-time = "2026-04-24T19:53:13.532Z" },
+    { url = "https://files.pythonhosted.org/packages/64/16/ed058e1df0f33d440217cd120d41d5dda9dd215a80b8187f68483185af82/cryptography-47.0.0-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0024b87d47ae2399165a6bfb20d24888881eeab83ae2566d62467c5ff0030ce7", size = 4701842, upload-time = "2026-04-24T19:53:15.618Z" },
+    { url = "https://files.pythonhosted.org/packages/02/e0/3d30986b30fdbd9e969abbdf8ba00ed0618615144341faeb57f395a084fe/cryptography-47.0.0-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:1e47422b5557bb82d3fff997e8d92cff4e28b9789576984f08c248d2b3535d93", size = 4289313, upload-time = "2026-04-24T19:53:17.755Z" },
+    { url = "https://files.pythonhosted.org/packages/df/fd/32db38e3ad0cb331f0691cb4c7a8a6f176f679124dee746b3af6633db4d9/cryptography-47.0.0-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:6f29f36582e6151d9686235e586dd35bb67491f024767d10b842e520dc6a07ac", size = 4650964, upload-time = "2026-04-24T19:53:20.062Z" },
+    { url = "https://files.pythonhosted.org/packages/86/53/5395d944dfd48cb1f67917f533c609c34347185ef15eb4308024c876f274/cryptography-47.0.0-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:a9b761f012a943b7de0e828843c5688d0de94a0578d44d6c85a1bae32f87791f", size = 5207817, upload-time = "2026-04-24T19:53:22.498Z" },
+    { url = "https://files.pythonhosted.org/packages/34/4f/e5711b28e1901f7d480a2b1b688b645aa4c77c73f10731ed17e7f7db3f0d/cryptography-47.0.0-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:4e1de79e047e25d6e9f8cea71c86b4a53aced64134f0f003bbcbf3655fd172c8", size = 4701544, upload-time = "2026-04-24T19:53:24.356Z" },
+    { url = "https://files.pythonhosted.org/packages/22/22/c8ddc25de3010fc8da447648f5a092c40e7a8fadf01dd6d255d9c0b9373d/cryptography-47.0.0-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:ef6b3634087f18d2155b1e8ce264e5345a753da2c5fa9815e7d41315c90f8318", size = 4783536, upload-time = "2026-04-24T19:53:26.665Z" },
+    { url = "https://files.pythonhosted.org/packages/66/b6/d4a68f4ea999c6d89e8498579cba1c5fcba4276284de7773b17e4fa69293/cryptography-47.0.0-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:11dbb9f50a0f1bb9757b3d8c27c1101780efb8f0bdecfb12439c22a74d64c001", size = 4926106, upload-time = "2026-04-24T19:53:28.686Z" },
+    { url = "https://files.pythonhosted.org/packages/54/ed/5f524db1fade9c013aa618e1c99c6ed05e8ffc9ceee6cda22fed22dda3f4/cryptography-47.0.0-cp311-abi3-win32.whl", hash = "sha256:7fda2f02c9015db3f42bb8a22324a454516ed10a8c29ca6ece6cdbb5efe2a203", size = 3258581, upload-time = "2026-04-24T19:53:31.058Z" },
+    { url = "https://files.pythonhosted.org/packages/b2/dc/1b901990b174786569029f67542b3edf72ac068b6c3c8683c17e6a2f5363/cryptography-47.0.0-cp311-abi3-win_amd64.whl", hash = "sha256:f5c3296dab66202f1b18a91fa266be93d6aa0c2806ea3d67762c69f60adc71aa", size = 3775309, upload-time = "2026-04-24T19:53:33.054Z" },
+    { url = "https://files.pythonhosted.org/packages/14/88/7aa18ad9c11bc87689affa5ce4368d884b517502d75739d475fc6f4a03c7/cryptography-47.0.0-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:be12cb6a204f77ed968bcefe68086eb061695b540a3dd05edac507a3111b25f0", size = 7904299, upload-time = "2026-04-24T19:53:35.003Z" },
+    { url = "https://files.pythonhosted.org/packages/07/55/c18f75724544872f234678fdedc871391722cb34a2aee19faa9f63100bb2/cryptography-47.0.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2ebd84adf0728c039a3be2700289378e1c164afc6748df1a5ed456767bef9ba7", size = 4631180, upload-time = "2026-04-24T19:53:37.517Z" },
+    { url = "https://files.pythonhosted.org/packages/ee/65/31a5cc0eaca99cec5bafffe155d407115d96136bb161e8b49e0ef73f09a7/cryptography-47.0.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7f68d6fbc7fbbcfb0939fea72c3b96a9f9a6edfc0e1b1d29778a2066030418b1", size = 4653529, upload-time = "2026-04-24T19:53:39.775Z" },
+    { url = "https://files.pythonhosted.org/packages/e5/bc/641c0519a495f3bfd0421b48d7cd325c4336578523ccd76ea322b6c29c7a/cryptography-47.0.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:6651d32eff255423503aa276739da98c30f26c40cbeffcc6048e0d54ef704c0c", size = 4638570, upload-time = "2026-04-24T19:53:42.129Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/f2/300327b0a47f6dc94dd8b71b57052aefe178bb51745073d73d80604f11ab/cryptography-47.0.0-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:3fb8fa48075fad7193f2e5496135c6a76ac4b2aa5a38433df0a539296b377829", size = 5238019, upload-time = "2026-04-24T19:53:44.577Z" },
+    { url = "https://files.pythonhosted.org/packages/e9/5a/5b5cf994391d4bf9d9c7efd4c66aabe4d95227256627f8fea6cff7dfadbd/cryptography-47.0.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:11438c7518132d95f354fa01a4aa2f806d172a061a7bed18cf18cbdacdb204d7", size = 4686832, upload-time = "2026-04-24T19:53:47.015Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/2c/ae950e28fd6475c852fc21a44db3e6b5bcc1261d1e370f2b6e42fa800fef/cryptography-47.0.0-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:8c1a736bbb3288005796c3f7ccb9453360d7fed483b13b9f468aea5171432923", size = 4269301, upload-time = "2026-04-24T19:53:48.97Z" },
+    { url = "https://files.pythonhosted.org/packages/67/fb/6a39782e150ffe5cc1b0018cb6ddc48bf7ca62b498d7539ffc8a758e977d/cryptography-47.0.0-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:f1557695e5c2b86e204f6ce9470497848634100787935ab7adc5397c54abd7ab", size = 4638110, upload-time = "2026-04-24T19:53:51.011Z" },
+    { url = "https://files.pythonhosted.org/packages/8e/d7/0b3c71090a76e5c203164a47688b697635ece006dcd2499ab3a4dbd3f0bd/cryptography-47.0.0-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:f9a034b642b960767fb343766ae5ba6ad653f2e890ddd82955aef288ffea8736", size = 5194988, upload-time = "2026-04-24T19:53:52.962Z" },
+    { url = "https://files.pythonhosted.org/packages/63/33/63a961498a9df51721ab578c5a2622661411fc520e00bd83b0cc64eb20c4/cryptography-47.0.0-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:b1c76fca783aa7698eb21eb14f9c4aa09452248ee54a627d125025a43f83e7a7", size = 4686563, upload-time = "2026-04-24T19:53:55.274Z" },
+    { url = "https://files.pythonhosted.org/packages/b7/bf/5ee5b145248f92250de86145d1c1d6edebbd57a7fe7caa4dedb5d4cf06a1/cryptography-47.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:4f7722c97826770bab8ae92959a2e7b20a5e9e9bf4deae68fd86c3ca457bab52", size = 4770094, upload-time = "2026-04-24T19:53:57.753Z" },
+    { url = "https://files.pythonhosted.org/packages/92/43/21d220b2da5d517773894dacdcdb5c682c28d3fffce65548cb06e87d5501/cryptography-47.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:09f6d7bf6724f8db8b32f11eccf23efc8e759924bc5603800335cf8859a3ddbd", size = 4913811, upload-time = "2026-04-24T19:54:00.236Z" },
+    { url = "https://files.pythonhosted.org/packages/31/98/dc4ad376ac5f1a1a7d4a83f7b0c6f2bcad36b5d2d8f30aeb482d3a7d9582/cryptography-47.0.0-cp314-cp314t-win32.whl", hash = "sha256:6eebcaf0df1d21ce1f90605c9b432dd2c4f4ab665ac29a40d5e3fc68f51b5e63", size = 3237158, upload-time = "2026-04-24T19:54:02.606Z" },
+    { url = "https://files.pythonhosted.org/packages/bc/da/97f62d18306b5133468bc3f8cc73a3111e8cdc8cf8d3e69474d6e5fd2d1b/cryptography-47.0.0-cp314-cp314t-win_amd64.whl", hash = "sha256:51c9313e90bd1690ec5a75ed047c27c0b8e6c570029712943d6116ef9a90620b", size = 3758706, upload-time = "2026-04-24T19:54:04.433Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/34/a4fae8ae7c3bc227460c9ae43f56abf1b911da0ec29e0ebac53bb0a4b6b7/cryptography-47.0.0-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:14432c8a9bcb37009784f9594a62fae211a2ae9543e96c92b2a8e4c3cd5cd0c4", size = 7904072, upload-time = "2026-04-24T19:54:06.411Z" },
+    { url = "https://files.pythonhosted.org/packages/01/64/d7b1e54fdb69f22d24a64bb3e88dc718b31c7fb10ef0b9691a3cf7eeea6e/cryptography-47.0.0-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:07efe86201817e7d3c18781ca9770bc0db04e1e48c994be384e4602bc38f8f27", size = 4635767, upload-time = "2026-04-24T19:54:08.519Z" },
+    { url = "https://files.pythonhosted.org/packages/8b/7b/cca826391fb2a94efdcdfe4631eb69306ee1cff0b22f664a412c90713877/cryptography-47.0.0-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:2b45761c6ec22b7c726d6a829558777e32d0f1c8be7c3f3480f9c912d5ee8a10", size = 4654350, upload-time = "2026-04-24T19:54:10.795Z" },
+    { url = "https://files.pythonhosted.org/packages/4c/65/4b57bcc823f42a991627c51c2f68c9fd6eb1393c1756aac876cba2accae2/cryptography-47.0.0-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:edd4da498015da5b9f26d38d3bfc2e90257bfa9cbed1f6767c282a0025ae649b", size = 4643394, upload-time = "2026-04-24T19:54:13.275Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/c4/2c5fbeea70adbbca2bbae865e1d605d6a4a7f8dbd9d33eaf69645087f06c/cryptography-47.0.0-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:9af828c0d5a65c70ec729cd7495a4bf1a67ecb66417b8f02ff125ab8a6326a74", size = 5225777, upload-time = "2026-04-24T19:54:15.18Z" },
+    { url = "https://files.pythonhosted.org/packages/7e/b8/ac57107ef32749d2b244e36069bb688792a363aaaa3acc9e3cf84c130315/cryptography-47.0.0-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:256d07c78a04d6b276f5df935a9923275f53bd1522f214447fdf365494e2d515", size = 4688771, upload-time = "2026-04-24T19:54:17.835Z" },
+    { url = "https://files.pythonhosted.org/packages/56/fc/9f1de22ff8be99d991f240a46863c52d475404c408886c5a38d2b5c3bb26/cryptography-47.0.0-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:5d0e362ff51041b0c0d219cc7d6924d7b8996f57ce5712bdcef71eb3c65a59cc", size = 4270753, upload-time = "2026-04-24T19:54:19.963Z" },
+    { url = "https://files.pythonhosted.org/packages/00/68/d70c852797aa68e8e48d12e5a87170c43f67bb4a59403627259dd57d15de/cryptography-47.0.0-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:1581aef4219f7ca2849d0250edaa3866212fb74bf5667284f46aa92f9e65c1ca", size = 4642911, upload-time = "2026-04-24T19:54:21.818Z" },
+    { url = "https://files.pythonhosted.org/packages/a5/51/661cbee74f594c5d97ff82d34f10d5551c085ca4668645f4606ebd22bd5d/cryptography-47.0.0-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:a49a3eb5341b9503fa3000a9a0db033161db90d47285291f53c2a9d2cd1b7f76", size = 5181411, upload-time = "2026-04-24T19:54:24.376Z" },
+    { url = "https://files.pythonhosted.org/packages/94/87/f2b6c374a82cf076cfa1416992ac8e8ec94d79facc37aec87c1a5cb72352/cryptography-47.0.0-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:2207a498b03275d0051589e326b79d4cf59985c99031b05bb292ac52631c37fe", size = 4688262, upload-time = "2026-04-24T19:54:26.946Z" },
+    { url = "https://files.pythonhosted.org/packages/14/e2/8b7462f4acf21ec509616f0245018bb197194ab0b65c2ea21a0bdd53c0eb/cryptography-47.0.0-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7a02675e2fabd0c0fc04c868b8781863cbf1967691543c22f5470500ff840b31", size = 4775506, upload-time = "2026-04-24T19:54:28.926Z" },
+    { url = "https://files.pythonhosted.org/packages/70/75/158e494e4c08dc05e039da5bb48553826bd26c23930cf8d3cd5f21fa8921/cryptography-47.0.0-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:80887c5cbd1774683cb126f0ab4184567f080071d5acf62205acb354b4b753b7", size = 4912060, upload-time = "2026-04-24T19:54:30.869Z" },
+    { url = "https://files.pythonhosted.org/packages/06/bd/0a9d3edbf5eadbac926d7b9b3cd0c4be584eeeae4a003d24d9eda4affbbd/cryptography-47.0.0-cp38-abi3-win32.whl", hash = "sha256:ed67ea4e0cfb5faa5bc7ecb6e2b8838f3807a03758eec239d6c21c8769355310", size = 3248487, upload-time = "2026-04-24T19:54:33.494Z" },
+    { url = "https://files.pythonhosted.org/packages/60/80/5681af756d0da3a599b7bdb586fac5a1540f1bcefd2717a20e611ddade45/cryptography-47.0.0-cp38-abi3-win_amd64.whl", hash = "sha256:835d2d7f47cdc53b3224e90810fb1d36ca94ea29cc1801fb4c1bc43876735769", size = 3755737, upload-time = "2026-04-24T19:54:35.408Z" },
+    { url = "https://files.pythonhosted.org/packages/1b/a0/928c9ce0d120a40a81aa99e3ba383e87337b9ac9ef9f6db02e4d7822424d/cryptography-47.0.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:7f1207974a904e005f762869996cf620e9bf79ecb4622f148550bb48e0eb35a7", size = 3909893, upload-time = "2026-04-24T19:54:38.334Z" },
+    { url = "https://files.pythonhosted.org/packages/81/75/d691e284750df5d9569f2b1ce4a00a71e1d79566da83b2b3e5549c84917f/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:1a405c08857258c11016777e11c02bacbe7ef596faf259305d282272a3a05cbe", size = 4587867, upload-time = "2026-04-24T19:54:40.619Z" },
+    { url = "https://files.pythonhosted.org/packages/07/d6/1b90f1a4e453009730b4545286f0b39bb348d805c11181fc31544e4f9a65/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:20fdbe3e38fb67c385d233c89371fa27f9909f6ebca1cecc20c13518dae65475", size = 4627192, upload-time = "2026-04-24T19:54:42.849Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/53/cb358a80e9e359529f496870dd08c102aa8a4b5b9f9064f00f0d6ed5b527/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:f7db373287273d8af1414cf95dc4118b13ffdc62be521997b0f2b270771fef50", size = 4587486, upload-time = "2026-04-24T19:54:44.908Z" },
+    { url = "https://files.pythonhosted.org/packages/8b/57/aaa3d53876467a226f9a7a82fd14dd48058ad2de1948493442dfa16e2ffd/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:9fe6b7c64926c765f9dff301f9c1b867febcda5768868ca084e18589113732ab", size = 4626327, upload-time = "2026-04-24T19:54:47.813Z" },
+    { url = "https://files.pythonhosted.org/packages/ab/9c/51f28c3550276bcf35660703ba0ab829a90b88be8cd98a71ef23c2413913/cryptography-47.0.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:cffbba3392df0fa8629bb7f43454ee2925059ee158e23c54620b9063912b86c8", size = 3698916, upload-time = "2026-04-24T19:54:49.782Z" },
 ]
 [[package]]
 [[package]]
 name = "huggingface-hub"
+version = "1.12.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "filelock" },
     { name = "typer" },
     { name = "typing-extensions" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/56/52/1b54cb569509c725a32c1315261ac9fd0e6b91bbbf74d86fca10d3376164/huggingface_hub-1.12.0.tar.gz", hash = "sha256:7c3fe85e24b652334e5d456d7a812cd9a071e75630fac4365d9165ab5e4a34b6", size = 763091, upload-time = "2026-04-24T13:32:08.674Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/7e/2b/ef03ddb96bd1123503c2bd6932001020292deea649e9bf4caa2cb65a85bf/huggingface_hub-1.12.0-py3-none-any.whl", hash = "sha256:d74939969585ee35748bd66de09baf84099d461bda7287cd9043bfb99b0e424d", size = 646806, upload-time = "2026-04-24T13:32:06.717Z" },
 ]
 [[package]]
 [[package]]
 name = "marimo"
+version = "0.23.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "click" },
     { name = "uvicorn" },
     { name = "websockets" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/b6/3f/7fb38c6c2a1f8d6b3c3ffb8ca6db5ff0b9dacbb113b4d05aa7690b51a771/marimo-0.23.3.tar.gz", hash = "sha256:251a8724b58882d65956ff6a20552cb21e59a6fd4149ca437727894375ec31e9", size = 38406206, upload-time = "2026-04-24T17:56:21.016Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/46/e7/02d672006fb04cb8aef23aeaf0384482fe63a13f9db6125ad8e13146daee/marimo-0.23.3-py3-none-any.whl", hash = "sha256:329b35b9ca221db9c78780d1714b11f010a00e2a929942db8ae6187960d42496", size = 38828150, upload-time = "2026-04-24T17:56:16.204Z" },
 ]
 [[package]]
 version = "0.1.0"
 source = { editable = "." }
 dependencies = [
+    { name = "httpx" },
+    { name = "huggingface-hub" },
     { name = "manim", version = "0.19.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
     { name = "manim", version = "0.20.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
     { name = "marimo" },
     { name = "openenv-core", extra = ["core"] },
+    { name = "wikipedia-api" },
 ]
 [package.optional-dependencies]
 [package.metadata]
 requires-dist = [
+    { name = "httpx", specifier = ">=0.28.1" },
+    { name = "huggingface-hub", specifier = ">=1.12.0" },
     { name = "manim", specifier = ">=0.18.0" },
     { name = "marimo", specifier = ">=0.10.0" },
     { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
     { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
+    { name = "wikipedia-api", specifier = ">=0.14.1" },
 ]
 provides-extras = ["dev"]
+[package.metadata.requires-dev]
+dev = []
 [[package]]
 name = "opentelemetry-api"
+version = "1.41.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "importlib-metadata" },
     { name = "typing-extensions" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/fa/fc/b7564cbef36601aef0d6c9bc01f7badb64be8e862c2e1c3c5c3b43b53e4f/opentelemetry_api-1.41.1.tar.gz", hash = "sha256:0ad1814d73b875f84494387dae86ce0b12c68556331ce6ce8fe789197c949621", size = 71416, upload-time = "2026-04-24T13:15:38.262Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/29/59/3e7118ed140f76b0982ba4321bdaed1997a0473f9720de2d10788a577033/opentelemetry_api-1.41.1-py3-none-any.whl", hash = "sha256:a22df900e75c76dc08440710e51f52f1aa6b451b429298896023e60db5b3139f", size = 69007, upload-time = "2026-04-24T13:15:15.662Z" },
 ]
 [[package]]
 [[package]]
 name = "packaging"
+version = "26.2"
 source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d7/f1/e7a6dd94a8d4a5626c03e4e99c87f241ba9e350cd9e6d75123f992427270/packaging-26.2.tar.gz", hash = "sha256:ff452ff5a3e828ce110190feff1178bb1f2ea2281fa2075aadb987c2fb221661", size = 228134, upload-time = "2026-04-24T20:15:23.917Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/df/b2/87e62e8c3e2f4b32e5fe99e0b86d576da1312593b39f47d8ceef365e95ed/packaging-26.2-py3-none-any.whl", hash = "sha256:5fc45236b9446107ff2415ce77c807cee2862cb6fac22b8a73826d0693b0980e", size = 100195, upload-time = "2026-04-24T20:15:22.081Z" },
 ]
 [[package]]
 version = "12.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "pyobjc-core", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/02/a3/16ca9a15e77c061a9250afbae2eae26f2e1579eb8ca9462ae2d2c71e1169/pyobjc_framework_cocoa-12.1.tar.gz", hash = "sha256:5556c87db95711b985d5efdaaf01c917ddd41d148b1e52a0c66b1a2e2c5c1640", size = 2772191, upload-time = "2025-11-14T10:13:02.069Z" }
 wheels = [
     { url = "https://files.pythonhosted.org/packages/48/2c/6c9bb53db56c8a12a736d2158a8b842a5993b96daabc29d90a098e840280/svgelements-1.9.6-py2.py3-none-any.whl", hash = "sha256:8a5cf2cc066d98e713d5b875b1d6e5eeb9b92e855e835ebd7caab2713ae1dcad", size = 137856, upload-time = "2023-08-17T02:01:48.76Z" },
 ]
+[[package]]
+name = "tenacity"
+version = "9.1.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/47/c6/ee486fd809e357697ee8a44d3d69222b344920433d3b6666ccd9b374630c/tenacity-9.1.4.tar.gz", hash = "sha256:adb31d4c263f2bd041081ab33b498309a57c77f9acf2db65aadf0898179cf93a", size = 49413, upload-time = "2026-02-07T10:45:33.841Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d7/c1/eb8f9debc45d3b7918a32ab756658a0904732f75e555402972246b0b8e71/tenacity-9.1.4-py3-none-any.whl", hash = "sha256:6095a360c919085f28c6527de529e76a06ad89b23659fa881ae0649b867a9d55", size = 28926, upload-time = "2026-02-07T10:45:32.24Z" },
+]
 [[package]]
 name = "tomli"
 version = "2.4.1"
 [[package]]
 name = "tzdata"
+version = "2026.2"
 source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/ba/19/1b9b0e29f30c6d35cb345486df41110984ea67ae69dddbc0e8a100999493/tzdata-2026.2.tar.gz", hash = "sha256:9173fde7d80d9018e02a662e168e5a2d04f87c41ea174b139fbef642eda62d10", size = 198254, upload-time = "2026-04-24T15:22:08.651Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/ce/e4/dccd7f47c4b64213ac01ef921a1337ee6e30e8c6466046018326977efd95/tzdata-2026.2-py2.py3-none-any.whl", hash = "sha256:bbe9af844f658da81a5f95019480da3a89415801f6cc966806612cc7169bffe7", size = 349321, upload-time = "2026-04-24T15:22:05.876Z" },
 ]
 [[package]]
     { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
 ]
+[[package]]
+name = "wikipedia-api"
+version = "0.14.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "click" },
+    { name = "httpx" },
+    { name = "tenacity" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/98/a5/166011c4d24d80a88e466a9ce1beb4d39884569f329dad82aa7d15c001f7/wikipedia_api-0.14.1.tar.gz", hash = "sha256:1a4ac428711f673a983be5676eb6c5fa39130fc5869893923435884e0e2c3c31", size = 141350, upload-time = "2026-04-10T22:38:34.313Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/72/de/0c66576815650bc74d6fbbdc92d17df906db34043bfc59323a004391c0ed/wikipedia_api-0.14.1-py3-none-any.whl", hash = "sha256:cacfdb953c3802b96605d7ac78ee42dd7fe049f28ed47e632cfe943187b83c2b", size = 129096, upload-time = "2026-04-10T22:38:32.22Z" },
+]
 [[package]]
 name = "zipp"
 version = "3.23.1"