kgdrathan committed on
Commit
eb1ebe6
·
verified ·
1 Parent(s): c5b0dcd

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: Explainer Env Environment Server
3
- emoji: 💻
4
  colorFrom: pink
5
  colorTo: gray
6
  sdk: docker
@@ -11,245 +11,97 @@ tags:
11
  - openenv
12
  ---
13
 
14
- # Explainer Env Environment
15
 
16
- A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
17
 
18
- ## Quick Start
19
-
20
- The simplest way to use the Explainer Env environment is through the `ExplainerEnv` class:
21
-
22
- ```python
23
- from explainer_env import ExplainerAction, ExplainerEnv
24
 
25
- try:
26
- # Create environment from Docker image
27
- explainer_envenv = ExplainerEnv.from_docker_image("explainer_env-env:latest")
28
 
29
- # Reset
30
- result = explainer_envenv.reset()
31
- print(f"Reset: {result.observation.echoed_message}")
32
 
33
- # Send multiple messages
34
- messages = ["Hello, World!", "Testing echo", "Final message"]
35
-
36
- for msg in messages:
37
- result = explainer_envenv.step(ExplainerAction(message=msg))
38
- print(f"Sent: '{msg}'")
39
- print(f" → Echoed: '{result.observation.echoed_message}'")
40
- print(f" → Length: {result.observation.message_length}")
41
- print(f" → Reward: {result.reward}")
42
-
43
- finally:
44
- # Always clean up
45
- explainer_envenv.close()
46
  ```
47
-
48
- That's it! The `ExplainerEnv.from_docker_image()` method handles:
49
- - Starting the Docker container
50
- - Waiting for the server to be ready
51
- - Connecting to the environment
52
- - Container cleanup when you call `close()`
53
-
54
- ## Building the Docker Image
55
-
56
- Before using the environment, you need to build the Docker image:
57
-
58
- ```bash
59
- # From project root
60
- docker build -t explainer_env-env:latest -f server/Dockerfile .
61
  ```
62
 
63
- ## Deploying to Hugging Face Spaces
64
 
65
- You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
66
 
67
  ```bash
68
- # From the environment directory (where openenv.yaml is located)
69
- openenv push
70
-
71
- # Or specify options
72
- openenv push --namespace my-org --private
73
  ```
74
 
75
- The `openenv push` command will:
76
- 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
77
- 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
78
- 3. Upload to Hugging Face (ensuring you're logged in)
79
-
80
- ### Prerequisites
81
-
82
- - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
83
 
84
- ### Options
85
 
86
- - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
87
- - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
88
- - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
89
- - `--private`: Deploy the space as private (default: public)
90
-
91
- ### Examples
92
 
93
  ```bash
94
- # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
95
- openenv push
96
-
97
- # Push to a specific repository
98
- openenv push --repo-id my-org/my-env
99
-
100
- # Push with a custom base image
101
- openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
102
-
103
- # Push as a private space
104
- openenv push --private
105
-
106
- # Combine options
107
- openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
108
- ```
109
-
110
- After deployment, your space will be available at:
111
- `https://huggingface.co/spaces/<repo-id>`
112
-
113
- The deployed space includes:
114
- - **Web Interface** at `/web` - Interactive UI for exploring the environment
115
- - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
116
- - **Health Check** at `/health` - Container health monitoring
117
- - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
118
-
119
- ## Environment Details
120
-
121
- ### Action
122
- **ExplainerAction**: Contains a single field
123
- - `message` (str) - The message to echo back
124
-
125
- ### Observation
126
- **ExplainerObservation**: Contains the echo response and metadata
127
- - `echoed_message` (str) - The message echoed back
128
- - `message_length` (int) - Length of the message
129
- - `reward` (float) - Reward based on message length (length × 0.1)
130
- - `done` (bool) - Always False for echo environment
131
- - `metadata` (dict) - Additional info like step count
132
-
133
- ### Reward
134
- The reward is calculated as: `message_length × 0.1`
135
- - "Hi" → reward: 0.2
136
- - "Hello, World!" → reward: 1.3
137
- - Empty message → reward: 0.0
138
-
139
- ## Advanced Usage
140
-
141
- ### Connecting to an Existing Server
142
-
143
- If you already have a Explainer Env environment server running, you can connect directly:
144
-
145
- ```python
146
- from explainer_env import ExplainerEnv
147
-
148
- # Connect to existing server
149
- explainer_envenv = ExplainerEnv(base_url="<ENV_HTTP_URL_HERE>")
150
-
151
- # Use as normal
152
- result = explainer_envenv.reset()
153
- result = explainer_envenv.step(ExplainerAction(message="Hello!"))
154
  ```
155
 
156
- Note: When connecting to an existing server, `explainer_envenv.close()` will NOT stop the server.
157
 
158
- ### Using the Context Manager
159
 
160
- The client supports context manager usage for automatic connection management:
161
 
162
  ```python
163
- from explainer_env import ExplainerAction, ExplainerEnv
164
-
165
- # Connect with context manager (auto-connects and closes)
166
- with ExplainerEnv(base_url="http://localhost:8000") as env:
167
- result = env.reset()
168
- print(f"Reset: {result.observation.echoed_message}")
169
- # Multiple steps with low latency
170
- for msg in ["Hello", "World", "!"]:
171
- result = env.step(ExplainerAction(message=msg))
172
- print(f"Echoed: {result.observation.echoed_message}")
173
- ```
174
-
175
- The client uses WebSocket connections for:
176
- - **Lower latency**: No HTTP connection overhead per request
177
- - **Persistent session**: Server maintains your environment state
178
- - **Efficient for episodes**: Better for many sequential steps
179
-
180
- ### Concurrent WebSocket Sessions
181
-
182
- The server supports multiple concurrent WebSocket connections. To enable this,
183
- modify `server/app.py` to use factory mode:
184
-
185
- ```python
186
- # In server/app.py - use factory mode for concurrent sessions
187
- app = create_app(
188
- ExplainerEnvironment, # Pass class, not instance
189
- ExplainerAction,
190
- ExplainerObservation,
191
- max_concurrent_envs=4, # Allow 4 concurrent sessions
192
- )
193
- ```
194
-
195
- Then multiple clients can connect simultaneously:
196
-
197
- ```python
198
- from explainer_env import ExplainerAction, ExplainerEnv
199
  from concurrent.futures import ThreadPoolExecutor
200
 
201
  def run_episode(client_id: int):
202
- with ExplainerEnv(base_url="http://localhost:8000") as env:
203
- result = env.reset()
204
- for i in range(10):
205
- result = env.step(ExplainerAction(message=f"Client {client_id}, step {i}"))
206
- return client_id, result.observation.message_length
 
 
 
207
 
208
- # Run 4 episodes concurrently
209
  with ThreadPoolExecutor(max_workers=4) as executor:
210
  results = list(executor.map(run_episode, range(4)))
211
  ```
212
-
213
- ## Development & Testing
214
-
215
- ### Direct Environment Testing
216
-
217
- Test the environment logic directly without starting the HTTP server:
218
-
219
- ```bash
220
- # From the server directory
221
- python3 server/explainer_env_environment.py
222
- ```
223
-
224
- This verifies that:
225
- - Environment resets correctly
226
- - Step executes actions properly
227
- - State tracking works
228
- - Rewards are calculated correctly
229
-
230
- ### Running Locally
231
-
232
- Run the server locally for development:
233
-
234
- ```bash
235
- uvicorn server.app:app --reload
236
- ```
237
-
238
- ## Project Structure
239
-
240
- ```
241
- explainer_env/
242
- ├── .dockerignore # Docker build exclusions
243
- ├── __init__.py # Module exports
244
- ├── README.md # This file
245
- ├── openenv.yaml # OpenEnv manifest
246
- ├── pyproject.toml # Project metadata and dependencies
247
- ├── uv.lock # Locked dependencies (generated)
248
- ├── client.py # ExplainerEnv client
249
- ├── models.py # Action and Observation models
250
- └── server/
251
- ├── __init__.py # Server module exports
252
- ├── explainer_env_environment.py # Core environment logic
253
- ├── app.py # FastAPI application (HTTP + WebSocket endpoints)
254
- └── Dockerfile # Container image definition
255
- ```
 
1
  ---
2
  title: Explainer Env Environment Server
3
+ emoji: "\U0001F4BB"
4
  colorFrom: pink
5
  colorTo: gray
6
  sdk: docker
 
11
  - openenv
12
  ---
13
 
14
+ # Research Interactive Explainer Environment
15
 
16
+ An OpenEnv RL environment that trains small language models to create interactive educational content. Given a research topic, the agent:
17
 
18
+ 1. **Explores** — searches HuggingFace Papers (ML topics) or Wikipedia (general topics) for relevant content
19
+ 2. **Generates** — produces a **Marimo** reactive notebook or **Manim** math animation explaining the topic
 
 
 
 
20
 
21
+ The agent learns *what* to search, *when to stop exploring*, and how to produce high-quality interactive explanations.
 
 
22
 
23
+ ## Episode Flow
 
 
24
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25
  ```
26
+ reset() → topic + tier assigned
27
+
28
+ explore × 0..3 search queries, accumulate research context
29
+
30
+ generate × 1 produce marimo/manim code → episode ends
 
 
 
 
 
 
 
 
 
31
  ```
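The flow above can be read as a small state machine. The sketch below is only an illustration under stated assumptions: `EpisodeSketch` and `MAX_EXPLORE_STEPS` are hypothetical names, not part of the environment's code, and the behaviour when the explore budget runs out (switching the phase to `generate` and rejecting further explores) is a guess at what "forced generate" means.

```python
from dataclasses import dataclass, field

MAX_EXPLORE_STEPS = 3  # mirrors the "explore x 0..3" budget in the flow above


@dataclass
class EpisodeSketch:
    """Hypothetical stand-in for the server-side episode state."""

    phase: str = "explore"
    explore_steps_left: int = MAX_EXPLORE_STEPS
    context: list = field(default_factory=list)

    def step(self, action_type: str, payload: str = "") -> str:
        """Apply one action and return the phase the episode is in afterwards."""
        if self.phase == "done":
            raise RuntimeError("episode already finished; call reset()")
        if action_type == "explore":
            if self.explore_steps_left == 0:
                raise ValueError("explore budget exhausted; generate instead")
            self.context.append(payload)  # accumulate research context
            self.explore_steps_left -= 1
            if self.explore_steps_left == 0:
                self.phase = "generate"  # assumption: only generate is allowed now
        elif action_type == "generate":
            self.phase = "done"  # a single generate step ends the episode
        else:
            raise ValueError(f"unknown action_type: {action_type!r}")
        return self.phase
```

An agent that skips exploration simply calls `step("generate")` first, which corresponds to the `0` end of the `0..3` range.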
32
 
33
+ Each step returns a per-step reward. See [rewards/README.md](rewards/README.md) for the full reward breakdown.
34
 
35
+ ## Quick Start
36
 
37
  ```bash
38
+ # Install & run locally
39
+ cd explainer_env && uv sync
40
+ uv run server # http://localhost:8000
41
+
42
+ # Client usage
43
+ python -c "
44
+ from client import ExplainerEnv
45
+ from models import ExplainerAction
46
+
47
+ with ExplainerEnv(base_url='http://localhost:8000').sync() as sc:
48
+ result = sc.reset()
49
+ print(f'Topic: {result.observation.topic}, Tier: {result.observation.tier}')
50
+
51
+ # Explore
52
+ result = sc.step(ExplainerAction(action_type='explore', query=result.observation.topic))
53
+ print(f'Explore reward: {result.reward:.3f}')
54
+
55
+ # Generate
56
+ result = sc.step(ExplainerAction(
57
+ action_type='generate',
58
+ format='marimo',
59
+ code='import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n mo.md(\"# Hello\")\n return\n',
60
+ ))
61
+ print(f'Generate reward: {result.reward:.3f}, done: {result.done}')
62
+ "
63
  ```
64
 
65
+ ## LLM-as-Judge (Optional Eval)
 
 
 
 
 
 
 
66
 
67
+ For final evaluation of explanation quality, an optional LLM judge scores outputs on clarity, accuracy, engagement, completeness, and appropriateness.
68
 
69
+ **Not used during training** — too slow and non-deterministic for RL rewards. Training uses 12 fast heuristic reward components instead.
 
 
 
 
 
70
 
71
  ```bash
72
+ # Configure (any OpenAI-compatible endpoint)
73
+ export JUDGE_API_URL="http://localhost:11434/v1" # e.g. ollama
74
+ export JUDGE_MODEL="llama3"
75
+
76
+ # Usage
77
+ python -c "
78
+ from rewards.llm_judge import judge_explainability, is_available
79
+ if is_available():
80
+ score, details = judge_explainability(code='...', topic='Linear Regression', tier='beginner')
81
+ print(f'Score: {score:.2f}, Rationale: {details.get(\"rationale\", \"\")}')"
82
  ```
83
 
84
+ See [rewards/README.md](rewards/README.md) for full configuration details.
85
 
86
+ ## Concurrent WebSocket Sessions
87
 
88
+ The server supports multiple concurrent WebSocket connections for parallel training rollouts:
89
 
90
  ```python
91
+ from client import ExplainerEnv
92
+ from models import ExplainerAction
93
  from concurrent.futures import ThreadPoolExecutor
94
 
95
  def run_episode(client_id: int):
96
+ with ExplainerEnv(base_url="http://localhost:8000").sync() as sc:
97
+ result = sc.reset()
98
+ result = sc.step(ExplainerAction(action_type="explore", query=result.observation.topic))
99
+ result = sc.step(ExplainerAction(
100
+ action_type="generate", format="marimo",
101
+ code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n return\n",
102
+ ))
103
+ return client_id, result.reward
104
 
 
105
  with ThreadPoolExecutor(max_workers=4) as executor:
106
  results = list(executor.map(run_episode, range(4)))
107
  ```
client.py CHANGED
@@ -16,12 +16,16 @@ class ExplainerEnv(
16
  Client for the Research → Interactive Explainer environment.
17
 
18
  Example:
19
- >>> with ExplainerEnv(base_url="http://localhost:8000") as client:
20
- ... result = client.reset()
21
- ... print(result.observation.topic)
22
- ... action = ExplainerAction(format="marimo", code="import marimo...")
23
- ... result = client.step(action)
24
- ... print(result.reward)
 
 
 
 
25
  """
26
 
27
  def _step_payload(self, action: ExplainerAction) -> Dict:
 
16
  Client for the Research → Interactive Explainer environment.
17
 
18
  Example:
19
+ >>> with ExplainerEnv(base_url="http://localhost:8000").sync() as sc:
20
+ ... result = sc.reset()
21
+ ... # Explore phase
22
+ ... result = sc.step(ExplainerAction(
23
+ ... action_type="explore", query="attention mechanism transformers"
24
+ ... ))
25
+ ... # Generate phase
26
+ ... result = sc.step(ExplainerAction(
27
+ ... action_type="generate", format="marimo", code="import marimo..."
28
+ ... ))
29
  """
30
 
31
  def _step_payload(self, action: ExplainerAction) -> Dict:
models.py CHANGED
@@ -1,8 +1,9 @@
1
  """
2
  Data models for the Research → Interactive Explainer environment.
3
 
4
- The agent receives a topic/paper and generates interactive educational content
5
- as either a Marimo notebook or Manim animation (with narration script).
 
6
  """
7
 
8
  from typing import Literal
@@ -12,29 +13,59 @@ from pydantic import Field
12
 
13
 
14
  class ExplainerAction(Action):
15
- """Action: agent chooses a format and generates code (+ optional narration)."""
16
 
17
- format: Literal["marimo", "manim"] = Field(
18
- ..., description="Output format: 'marimo' for interactive notebook, 'manim' for animation"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
  )
20
- code: str = Field(..., description="Complete Python source code (Marimo .py or Manim Scene)")
21
  narration: str = Field(
22
  default="",
23
- description="Scene-by-scene narration script (required when format is 'manim')",
24
  )
25
 
26
 
27
  class ExplainerObservation(Observation):
28
- """Observation: the topic to explain and feedback on the last attempt."""
29
 
 
30
  topic: str = Field(default="", description="Title of the topic or paper")
31
  content: str = Field(default="", description="Abstract or concept description")
32
  tier: Literal["beginner", "intermediate", "advanced"] = Field(
33
  default="beginner", description="Explanation depth tier"
34
  )
35
- keywords: str = Field(default="", description="Comma-separated key terms from the source")
36
- category: str = Field(default="", description="arXiv category or domain (e.g. cs.LG, math.NA)")
37
  data_available: bool = Field(
38
- default=False, description="Whether the topic references datasets/numbers"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
39
  )
40
- feedback: str = Field(default="", description="Feedback on the last action (execution result)")
 
1
  """
2
  Data models for the Research → Interactive Explainer environment.
3
 
4
+ Two-phase episode:
5
+ 1. Explore: agent searches for papers/resources (0-3 steps)
6
+ 2. Generate: agent produces marimo/manim code (1 step, ends episode)
7
  """
8
 
9
  from typing import Literal
 
13
 
14
 
15
  class ExplainerAction(Action):
16
+ """Action: agent either explores (searches) or generates (produces code)."""
17
 
18
+ action_type: Literal["explore", "generate"] = Field(
19
+ ..., description="'explore' to search for papers, 'generate' to produce code"
20
+ )
21
+
22
+ # -- explore fields --
23
+ query: str = Field(
24
+ default="",
25
+ description="Search query for HF Papers/Wikipedia (used when action_type='explore')",
26
+ )
27
+
28
+ # -- generate fields --
29
+ format: Literal["marimo", "manim"] | None = Field(
30
+ default=None,
31
+ description="Output format (required when action_type='generate')",
32
+ )
33
+ code: str = Field(
34
+ default="",
35
+ description="Complete Python source code (required when action_type='generate')",
36
  )
 
37
  narration: str = Field(
38
  default="",
39
+ description="Narration script (required when format='manim')",
40
  )
41
 
42
 
43
  class ExplainerObservation(Observation):
44
+ """Observation returned to the agent after each step."""
45
 
46
+ # -- task info (set on reset, echoed back each step) --
47
  topic: str = Field(default="", description="Title of the topic or paper")
48
  content: str = Field(default="", description="Abstract or concept description")
49
  tier: Literal["beginner", "intermediate", "advanced"] = Field(
50
  default="beginner", description="Explanation depth tier"
51
  )
52
+ keywords: str = Field(default="", description="Comma-separated key terms")
 
53
  data_available: bool = Field(
54
+ default=False, description="Whether the topic references datasets"
55
+ )
56
+
57
+ # -- per-step feedback --
58
+ phase: Literal["explore", "generate", "done"] = Field(
59
+ default="explore", description="Current episode phase"
60
+ )
61
+ feedback: str = Field(default="", description="Feedback on the last action")
62
+ search_results: str = Field(
63
+ default="", description="Papers/snippets returned from an explore step"
64
+ )
65
+ explored_context: str = Field(
66
+ default="",
67
+ description="Accumulated research context from all explore steps so far",
68
+ )
69
+ explore_steps_left: int = Field(
70
+ default=3, description="Remaining explore steps before forced generate"
71
  )
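The field descriptions above mark `format`, `code`, and `narration` as conditionally required, but the defaults alone do not enforce that. The following is a minimal, dependency-free sketch of such a check; `validate_action` is a hypothetical helper, not something defined in `models.py`, and the real project might instead enforce this with a Pydantic validator or inside the environment's `step`.

```python
from typing import List, Optional


def validate_action(
    action_type: str,
    query: str = "",
    fmt: Optional[str] = None,
    code: str = "",
    narration: str = "",
) -> List[str]:
    """Return human-readable problems; an empty list means the action is usable."""
    errors: List[str] = []
    if action_type == "explore":
        # The query is described as "used", not "required", so it is not checked.
        return errors
    if action_type != "generate":
        return [f"unknown action_type: {action_type!r}"]
    if fmt not in ("marimo", "manim"):
        errors.append("format must be 'marimo' or 'manim' when generating")
    if not code.strip():
        errors.append("code is required when generating")
    if fmt == "manim" and not narration.strip():
        errors.append("narration is required when format is 'manim'")
    return errors
```

Returning a list rather than raising keeps the check usable for producing the `feedback` string of an observation.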
 
openenv_explainer_env.egg-info/PKG-INFO CHANGED
@@ -6,6 +6,9 @@ Requires-Python: >=3.10
6
  Requires-Dist: openenv-core[core]>=0.2.2
7
  Requires-Dist: marimo>=0.10.0
8
  Requires-Dist: manim>=0.18.0
 
 
 
9
  Provides-Extra: dev
10
  Requires-Dist: pytest>=8.0.0; extra == "dev"
11
  Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
 
6
  Requires-Dist: openenv-core[core]>=0.2.2
7
  Requires-Dist: marimo>=0.10.0
8
  Requires-Dist: manim>=0.18.0
9
+ Requires-Dist: wikipedia-api>=0.14.1
10
+ Requires-Dist: huggingface-hub>=1.12.0
11
+ Requires-Dist: httpx>=0.28.1
12
  Provides-Extra: dev
13
  Requires-Dist: pytest>=8.0.0; extra == "dev"
14
  Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_explainer_env.egg-info/SOURCES.txt CHANGED
@@ -14,6 +14,18 @@ openenv_explainer_env.egg-info/dependency_links.txt
14
  openenv_explainer_env.egg-info/entry_points.txt
15
  openenv_explainer_env.egg-info/requires.txt
16
  openenv_explainer_env.egg-info/top_level.txt
 
 
 
 
 
 
17
  server/__init__.py
18
  server/app.py
19
- server/explainer_env_environment.py
 
 
 
 
 
 
 
14
  openenv_explainer_env.egg-info/entry_points.txt
15
  openenv_explainer_env.egg-info/requires.txt
16
  openenv_explainer_env.egg-info/top_level.txt
17
+ rewards/__init__.py
18
+ rewards/exploration.py
19
+ rewards/generation.py
20
+ rewards/llm_judge.py
21
+ rewards/sandbox.py
22
+ rewards/sources.py
23
  server/__init__.py
24
  server/app.py
25
+ server/explainer_env_environment.py
26
+ tests/test_client_server.py
27
+ tests/test_docker.py
28
+ tests/test_environment.py
29
+ tests/test_models.py
30
+ tests/test_rewards.py
31
+ tests/test_task_bank.py
openenv_explainer_env.egg-info/requires.txt CHANGED
@@ -1,6 +1,9 @@
1
  openenv-core[core]>=0.2.2
2
  marimo>=0.10.0
3
  manim>=0.18.0
 
 
 
4
 
5
  [dev]
6
  pytest>=8.0.0
 
1
  openenv-core[core]>=0.2.2
2
  marimo>=0.10.0
3
  manim>=0.18.0
4
+ wikipedia-api>=0.14.1
5
+ huggingface-hub>=1.12.0
6
+ httpx>=0.28.1
7
 
8
  [dev]
9
  pytest>=8.0.0
out.txt ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml CHANGED
@@ -11,6 +11,9 @@ dependencies = [
11
  "openenv-core[core]>=0.2.2",
12
  "marimo>=0.10.0",
13
  "manim>=0.18.0",
 
 
 
14
  ]
15
 
16
  [project.optional-dependencies]
@@ -24,5 +27,8 @@ server = "explainer_env.server.app:main"
24
 
25
  [tool.setuptools]
26
  include-package-data = true
27
- packages = ["explainer_env", "explainer_env.server"]
28
- package-dir = { "explainer_env" = ".", "explainer_env.server" = "server" }
 
 
 
 
11
  "openenv-core[core]>=0.2.2",
12
  "marimo>=0.10.0",
13
  "manim>=0.18.0",
14
+ "wikipedia-api>=0.14.1",
15
+ "huggingface-hub>=1.12.0",
16
+ "httpx>=0.28.1",
17
  ]
18
 
19
  [project.optional-dependencies]
 
27
 
28
  [tool.setuptools]
29
  include-package-data = true
30
+ packages = ["explainer_env", "explainer_env.server", "explainer_env.rewards"]
31
+ package-dir = { "explainer_env" = ".", "explainer_env.server" = "server", "explainer_env.rewards" = "rewards" }
32
+
33
+ [dependency-groups]
34
+ dev = []
rewards/README.md ADDED
@@ -0,0 +1,107 @@
1
+ # Rewards
2
+
3
+ Multi-component reward system for the two-phase explore → generate episode.
4
+
5
+ ## Episode Flow
6
+
7
+ ```
8
+ reset() → [explore × 0..3] → generate × 1 → done
9
+ ```
10
+
11
+ Each step returns a per-step reward. The agent learns both *what* to explore and *when to stop*.
12
+
13
+ ## Exploration Rewards (`exploration.py`)
14
+
15
+ Per-step reward for each `explore` action. Gated by information need — once the agent has enough info, further exploration yields diminishing returns.
16
+
17
+ | Component | Weight | Range | Description |
18
+ |---|---|---|---|
19
+ | `query_relevance` | 0.40 | 0–1 | Topic + keyword overlap with search query |
20
+ | `result_novelty` | 0.30 | 0–1 | New words vs. already-seen content |
21
+ | `research_breadth` | 0.10 | 0–1 | Number of sources gathered (target >= 2) |
22
+ | `content_sufficiency` | 0.20 | 0–1 | Keyword coverage across task + research (gates reward) |
23
+ | `step_cost` | -0.05 | flat | Per-step penalty — exploration must justify itself |
24
+
25
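+ **Gating mechanism**: `info_need = 1 - sufficiency`. The raw reward (weighted query relevance, novelty, and breadth) is scaled by `0.3 + 0.7 * info_need`; `0.20 * info_need` is then added, the 0.05 step cost is subtracted, and the total is floored at 0. High sufficiency therefore yields low reward for further exploration, which teaches the agent to stop when it has enough.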
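As a worked example of the gating, the sketch below recomputes the exploration total from the weights in the table and the formula in `exploration.py`. The function name `gated_explore_reward` is introduced here for illustration only; the component scores passed in are made-up inputs.

```python
def gated_explore_reward(q_rel, novelty, breadth, sufficiency):
    """Combine component scores (each in 0-1) into one exploration reward."""
    info_need = max(0.0, 1.0 - sufficiency)
    # Weighted sum of the three ungated components.
    raw = 0.40 * q_rel + 0.30 * novelty + 0.10 * breadth
    # Scale by remaining information need, add the direct sufficiency-gate
    # term, subtract the flat step cost, and floor the result at zero.
    total = raw * (0.3 + 0.7 * info_need) + 0.20 * info_need - 0.05
    return max(0.0, total)


# Same perfect component scores, opposite ends of the sufficiency range.
needy = gated_explore_reward(1.0, 1.0, 1.0, sufficiency=0.0)
saturated = gated_explore_reward(1.0, 1.0, 1.0, sufficiency=1.0)
print(round(needy, 2), round(saturated, 2))  # 0.95 0.19
```

With identical query, novelty, and breadth scores, the reward drops from about 0.95 to about 0.19 once every keyword is already covered.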
26
+
27
+ ## Generation Rewards (`generation.py`)
28
+
29
+ Single reward on the `generate` action that ends the episode.
30
+
31
+ | Component | Weight | Range | Description |
32
+ |---|---|---|---|
33
+ | `code_valid` | 0.15 | 0/1 | AST parses without errors |
34
+ | `code_runs` | 0.15 | 0/1 | Sandbox execution succeeds (marimo export / manim render) |
35
+ | `coverage` | 0.15 | 0–1 | Fraction of task keywords in generated code |
36
+ | `format_match` | 0.10 | 0.3/1.0 | Chosen format matches task's preferred format (1.0 if task has no preference) |
37
+ | `structure` | 0.15* | 0–1 | Structural quality (cells/scenes, UI elements, viz) |
38
+ | `narration` | 0.10* | 0–1 | Narration quality (manim only; words, scene markers) |
39
+ | `context_usage` | 0.20 | 0–1 | Code references terms from exploration research |
40
+
41
+ *For marimo format, narration weight (0.10) is redistributed to structure (→ 0.25 total).
42
+
43
+ **Skip penalty**: Generating without any exploration incurs -0.1 penalty.
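The weight redistribution and skip penalty can be sketched as follows. This is an illustration built only from the table above: `weights_for` and `generate_reward` are hypothetical names, and subtracting the penalty from the weighted sum without clamping is an assumption, since the combining code in `generation.py` is not shown here.

```python
BASE_WEIGHTS = {
    "code_valid": 0.15,
    "code_runs": 0.15,
    "coverage": 0.15,
    "format_match": 0.10,
    "structure": 0.15,
    "narration": 0.10,
    "context_usage": 0.20,
}
SKIP_PENALTY = 0.1  # applied when the agent generated without exploring


def weights_for(fmt):
    """Return per-component weights for the chosen output format."""
    weights = dict(BASE_WEIGHTS)
    if fmt == "marimo":
        # Marimo has no narration, so its weight moves to structure (0.25).
        weights["structure"] += weights.pop("narration")
    return weights


def generate_reward(scores, fmt, explored):
    """Weighted sum of component scores, minus the skip penalty if relevant."""
    weights = weights_for(fmt)
    total = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    if not explored:
        total -= SKIP_PENALTY  # assumption: a plain subtraction, no clamping
    return total
```

Both weight sets sum to 1.0, so a perfect generation after at least one explore step scores 1.0 in either format under these assumptions.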
44
+
45
+ ## Search Sources (`sources.py`)
46
+
47
+ All search calls are **async** (httpx + wikipediaapi.AsyncWikipedia). Content is retrieved at section/chunk level and ranked using **BM25** to surface the most relevant parts.
48
+
49
+ | Source | Library | Use Case | Retrieval |
50
+ |---|---|---|---|
51
+ | HuggingFace Papers | httpx → `huggingface.co/api/papers/search` + `papers/{id}.md` | ML/AI topics (semantic search) | Search → top paper → read markdown → BM25 chunk ranking |
52
+ | Wikipedia | `wikipediaapi.AsyncWikipedia` | Math, algorithms, general topics | Search → top page → section tree → BM25 section ranking |
53
+
54
+ **Routing**: ML-related queries (detected by keyword heuristic) → HF Papers. Everything else → Wikipedia. Agent can override with prefix: `hf: query` or `wiki: query`. No explicit routing reward — bad routing leads to weak content → low novelty/relevance naturally.
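A rough sketch of the routing rule described above, assuming a simple word-overlap heuristic. The `ML_HINTS` set, the `route_query` name, and the `"hf_papers"`/`"wikipedia"` labels are all illustrative assumptions; the actual keyword list and return values in `sources.py` may differ.

```python
# Assumed hint words; the real heuristic's keyword list is not shown here.
ML_HINTS = {"transformer", "attention", "neural", "gradient", "llm", "diffusion"}


def route_query(query):
    """Return (source, cleaned_query) for a search query."""
    cleaned = query.strip()
    lowered = cleaned.lower()
    # An explicit prefix always wins over the heuristic.
    for prefix, source in (("hf:", "hf_papers"), ("wiki:", "wikipedia")):
        if lowered.startswith(prefix):
            return source, cleaned[len(prefix):].strip()
    # Otherwise fall back to the keyword heuristic.
    words = set(lowered.split())
    source = "hf_papers" if words & ML_HINTS else "wikipedia"
    return source, cleaned
```

For example, `route_query("wiki: attention")` goes to Wikipedia despite containing an ML hint word, because the prefix is checked first.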
55
+
56
+ **Top 1 result** by default from each source, with top-3 BM25-ranked sections/chunks returned.
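To make the ranking step concrete, here is a small self-contained BM25 (Okapi) scorer that returns the top-k chunks for a query. It is a sketch, not the project's implementation: `bm25_rank`, whitespace tokenisation, and the `k1`/`b` defaults are assumptions, and `sources.py` may well use a library instead.

```python
import math
from collections import Counter


def bm25_rank(query, chunks, k1=1.5, b=0.75, top_k=3):
    """Return the top_k chunks ranked by BM25 score against the query."""
    docs = [chunk.lower().split() for chunk in chunks]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency: in how many chunks each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scored = []
    for idx, doc in enumerate(docs):
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalisation penalises long chunks relative to the average.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scored.append((score, idx))

    # Highest score first; ties keep the original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[idx] for _, idx in scored[:top_k]]
```

Calling it with `top_k=3` on the sections of a single page matches the "top-3 BM25-ranked sections/chunks" behaviour described above.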
57
+
58
+ ## Sandbox (`sandbox.py`)
59
+
60
+ | Check | Tool | Timeout |
61
+ |---|---|---|
62
+ | `ast_parses` | Python `ast.parse` | — |
63
+ | `run_marimo` | `marimo export html` | 15s |
64
+ | `run_manim` | `manim render -ql` | 30s |
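The checks in this table boil down to a syntax parse plus a subprocess call with a timeout. The sketch below shows that pattern; `run_with_timeout` is a hypothetical helper, and the project's actual `run_marimo`/`run_manim` signatures and return values are not shown in this diff.

```python
import ast
import subprocess
import sys


def ast_parses(code):
    """Return True if the source parses as Python."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def run_with_timeout(cmd, timeout_s):
    """Run a command; return (succeeded, message) without raising."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    except FileNotFoundError:
        return False, f"command not found: {cmd[0]}"
    return proc.returncode == 0, proc.stderr


# Demonstrated with the Python interpreter so the sketch runs anywhere.
ok, _ = run_with_timeout([sys.executable, "-c", "print('ok')"], 10)
```

A marimo check along these lines might pass `["marimo", "export", "html", "notebook.py", "-o", "out.html"]` with a 15-second timeout, and a manim check `["manim", "render", "-ql", "scene.py"]` with 30 seconds; the file names are placeholders.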
65
+
66
+ ## LLM-as-Judge (`llm_judge.py`)
67
+
68
+ **Eval-only** — not used in the training loop (too slow, non-deterministic for RL reward signals).
69
+
70
+ ### What it scores
71
+
72
+ 5 dimensions on a 1-10 scale, normalized to 0-1:
73
+
74
+ | Dimension | Description |
75
+ |---|---|
76
+ | Clarity | Is the concept explained clearly for the target tier? |
77
+ | Accuracy | Is the content technically correct? |
78
+ | Engagement | Does the code create an engaging, interactive experience? |
79
+ | Completeness | Does it cover the key aspects of the topic? |
80
+ | Appropriateness | Is the depth appropriate for the audience tier? |
81
+
82
+ ### Configuration
83
+
84
+ Set environment variables:
85
+ - `JUDGE_API_URL` (required) — OpenAI-compatible endpoint (e.g. vLLM, ollama, OpenAI)
86
+ - `JUDGE_API_KEY` (optional) — Bearer token for the API
87
+ - `JUDGE_MODEL` (optional, default: `gpt-4o-mini`) — Model to use for judging
88
+
89
+ ### Usage
90
+
91
+ ```python
92
+ from rewards.llm_judge import judge_explainability, is_available
93
+
94
+ if is_available():
95
+ score, details = judge_explainability(
96
+ code="import marimo as mo\n...",
97
+ topic="Linear Regression",
98
+ tier="beginner",
99
+ fmt="marimo",
100
+ )
101
+ print(f"Explainability score: {score:.2f}")
102
+ print(f"Rationale: {details.get('rationale', '')}")
103
+ ```
104
+
105
+ ### What's used during training instead
106
+
107
+ During GRPO training, the 12 heuristic reward components above provide the learning signal. They are deterministic, fast (<1ms per step), and decomposable for debugging. The LLM-as-judge is reserved for final evaluation and human-interpretable quality assessment.
rewards/__init__.py ADDED
@@ -0,0 +1,16 @@
1
+ """Reward components for the Explainer environment."""
2
+
3
+ from .exploration import compute_explore_reward
4
+ from .generation import compute_generate_reward
5
+ from .sandbox import run_marimo, run_manim
6
+ from .sources import search, search_hf_papers, search_wikipedia
7
+
8
+ __all__ = [
9
+ "compute_explore_reward",
10
+ "compute_generate_reward",
11
+ "run_marimo",
12
+ "run_manim",
13
+ "search",
14
+ "search_hf_papers",
15
+ "search_wikipedia",
16
+ ]
rewards/exploration.py ADDED
@@ -0,0 +1,138 @@
1
+ """Reward components for the exploration phase.
2
+
3
+ During exploration, the agent searches for papers/resources relevant to the
4
+ task topic. Rewards measure query quality, result relevance, research breadth,
5
+ and exploration efficiency (knowing when to stop).
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+
11
+ def query_relevance(query: str, topic: str, keywords_csv: str) -> float:
12
+ """Score how relevant the search query is to the task (0-1)."""
13
+ if not query or not query.strip():
14
+ return 0.0
15
+
16
+ query_lower = query.strip().lower()
17
+ score = 0.0
18
+
19
+ if topic.lower() in query_lower:
20
+ score += 0.4
21
+
22
+ keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
23
+ if keywords:
24
+ hits = sum(1 for kw in keywords if kw in query_lower)
25
+ score += 0.4 * (hits / len(keywords))
26
+
27
+ if len(query_lower.split()) >= 3:
28
+ score += 0.2
29
+
30
+ return min(1.0, score)
31
+
32
+
33
+ def result_novelty(
34
+ new_content: str, accumulated_context: list[str]
35
+ ) -> float:
36
+ """Score how much new information this result adds (0-1).
37
+
38
+ Penalises repeated searches that return content already seen.
39
+ """
40
+ if not new_content or not new_content.strip():
41
+ return 0.0
42
+ if not accumulated_context:
43
+ return 1.0
44
+
45
+ new_words = set(new_content.lower().split())
46
+ seen_words: set[str] = set()
47
+ for ctx in accumulated_context:
48
+ seen_words.update(ctx.lower().split())
49
+
50
+ if not new_words:
51
+ return 0.0
52
+
53
+ novel = new_words - seen_words
54
+ return min(1.0, len(novel) / max(len(new_words), 1))
55
+
56
+
57
+ def research_breadth(accumulated_context: list[str], min_sources: int = 2) -> float:
58
+ """Score whether the agent gathered enough sources (0-1)."""
59
+ n = len(accumulated_context)
60
+ if n >= min_sources:
61
+ return 1.0
62
+ return n / min_sources
63
+
64
+
65
+ def content_sufficiency(
66
+ task_content: str,
67
+ keywords_csv: str,
68
+ accumulated_context: list[str],
69
+ ) -> float:
70
+ """Measure how much of the task's keywords are already covered (0-1).
71
+
72
+ Combines the task's own content with accumulated research. When this is
73
+ high (>0.8), further exploration has diminishing value — the agent already
74
+ has enough information.
75
+ """
76
+ keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
77
+ if not keywords:
78
+ return 1.0 # no keywords to cover
79
+
80
+ # Build combined text from task content + all research so far
81
+ combined = task_content.lower()
82
+ for ctx in accumulated_context:
83
+ combined += " " + ctx.lower()
84
+
85
+ hits = sum(1 for kw in keywords if kw in combined)
86
+ return hits / len(keywords)
87
+
88
+
89
+ # -- Weights --
90
+ W_QUERY = 0.40
91
+ W_NOVELTY = 0.30
92
+ W_BREADTH = 0.10
93
+ W_SUFFICIENCY_GATE = 0.20 # gates reward by remaining information need
94
+
95
+ # Flat cost per explore step — agent must expect enough gain to justify it
96
+ STEP_COST = 0.05
97
+
98
+
99
+ def compute_explore_reward(
100
+ query: str,
101
+ result_text: str,
102
+ topic: str,
103
+ keywords_csv: str,
104
+ task_content: str,
105
+ accumulated_context: list[str],
106
+ ) -> tuple[float, dict]:
107
+ """Compute per-step exploration reward. Returns (total, components).
108
+
109
+ Reward is gated by (1 - sufficiency): once the agent has enough info,
110
+ further exploration is nearly unrewarded. A flat step cost penalises
111
+ unnecessary searches.
112
+ """
113
+ q_rel = query_relevance(query, topic, keywords_csv)
114
+ novelty = result_novelty(result_text, accumulated_context)
115
+ breadth = research_breadth(accumulated_context)
116
+ sufficiency = content_sufficiency(task_content, keywords_csv, accumulated_context)
117
+
118
+ # Information need: how much value exploration still has
119
+ info_need = max(0.0, 1.0 - sufficiency)
120
+
121
+ # Raw reward from query + novelty + breadth
122
+ raw = W_QUERY * q_rel + W_NOVELTY * novelty + W_BREADTH * breadth
123
+
124
+ # Gate by info need: high sufficiency → low reward for exploring more
125
+ # Also add direct sufficiency-gate component so agent sees the signal
126
+ total = raw * (0.3 + 0.7 * info_need) + W_SUFFICIENCY_GATE * info_need - STEP_COST
127
+ total = max(0.0, total)
128
+
129
+ components = {
130
+ "query_relevance": round(q_rel, 3),
131
+ "result_novelty": round(novelty, 3),
132
+ "research_breadth": round(breadth, 3),
133
+ "content_sufficiency": round(sufficiency, 3),
134
+ "info_need": round(info_need, 3),
135
+ "step_cost": STEP_COST,
136
+ "explore_total": round(total, 4),
137
+ }
138
+ return total, components
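
A worked example of the gating arithmetic may help here. The sketch below copies the weights and the `total` expression from `compute_explore_reward` into a standalone helper. The name `gated_explore_total` is hypothetical and not part of the commit. It takes the four component scores as plain floats, so it runs without the scorers defined earlier in the file.

```python
# Standalone sketch of the gating formula; constants mirror the diff above.
W_QUERY = 0.40
W_NOVELTY = 0.30
W_BREADTH = 0.10
W_SUFFICIENCY_GATE = 0.20
STEP_COST = 0.05


def gated_explore_total(
    q_rel: float, novelty: float, breadth: float, sufficiency: float
) -> float:
    """Hypothetical helper: same arithmetic as compute_explore_reward."""
    info_need = max(0.0, 1.0 - sufficiency)
    raw = W_QUERY * q_rel + W_NOVELTY * novelty + W_BREADTH * breadth
    total = raw * (0.3 + 0.7 * info_need) + W_SUFFICIENCY_GATE * info_need - STEP_COST
    return max(0.0, total)


if __name__ == "__main__":
    # Perfect query with nothing known yet: the largest value the formula allows.
    print(round(gated_explore_total(1.0, 1.0, 1.0, sufficiency=0.0), 4))
    # Same query once every keyword is covered: only the 0.3 floor on raw remains.
    print(round(gated_explore_total(1.0, 1.0, 1.0, sufficiency=1.0), 4))
    # A weak query at full sufficiency falls below the step cost and is clipped to 0.
    print(round(gated_explore_total(0.1, 0.1, 0.1, sufficiency=1.0), 4))
```

Read this way, the step cost only bites when the gated reward is already small, which seems to be the intent of the "unnecessary searches" remark in the docstring.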
rewards/generation.py ADDED
@@ -0,0 +1,218 @@
1
+ """Reward components for the generation phase.
2
+
3
+ After exploration, the agent generates marimo/manim code. Rewards measure
4
+ code quality, execution success, keyword coverage, format match, structural
5
+ quality, narration (manim only), and use of exploration context.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ from typing import TYPE_CHECKING
11
+
12
+ from .sandbox import ast_parses
13
+
14
+ if TYPE_CHECKING:
15
+ from ..task_bank import Task
16
+
17
+
18
+ # ---------------------------------------------------------------------------
19
+ # Individual scorers
20
+ # ---------------------------------------------------------------------------
21
+
22
+
23
+ def keyword_coverage(code: str, keywords_csv: str) -> float:
24
+ """Fraction of task keywords mentioned in the code (case-insensitive)."""
25
+ if not keywords_csv:
26
+ return 0.0
27
+ keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
28
+ if not keywords:
29
+ return 0.0
30
+ code_lower = code.lower()
31
+ hits = sum(1 for kw in keywords if kw in code_lower)
32
+ return hits / len(keywords)
33
+
34
+
35
+ def format_match(chosen_format: str, task: Task) -> float:
36
+ """1.0 if format matches the task's preferred format, else 0.3.
37
+
38
+ If the task has no preferred format (None), any choice scores 1.0.
39
+ """
40
+ if task.preferred_format is None:
41
+ return 1.0
42
+ return 1.0 if chosen_format == task.preferred_format else 0.3
43
+
44
+
45
+ def marimo_structure(code: str, task: Task) -> float:
46
+ """Score structural quality of a marimo notebook (0-1)."""
47
+ score = 0.0
48
+ if "import marimo" in code or "from marimo" in code:
49
+ score += 0.2
50
+ if "marimo.App" in code or "mo.App" in code:
51
+ score += 0.1
52
+ cell_count = code.count("@app.cell")
53
+ if cell_count >= 3:
54
+ score += 0.2
55
+ elif cell_count >= 1:
56
+ score += 0.1
57
+ ui_patterns = ["mo.ui.", "mo.md(", "mo.Html", "mo.accordion", "mo.callout"]
58
+ ui_hits = sum(1 for p in ui_patterns if p in code)
59
+ score += min(0.2, ui_hits * 0.05)
60
+ viz_patterns = ["plt.", "px.", "altair", "matplotlib", "plotly", "mo.ui.slider"]
61
+ viz_hits = sum(1 for p in viz_patterns if p in code)
62
+ if task.data_available and viz_hits > 0:
63
+ score += 0.2
64
+ elif viz_hits > 0:
65
+ score += 0.1
66
+ if task.tier == "advanced" and cell_count >= 6:
67
+ score += 0.1
68
+ elif task.tier == "intermediate" and cell_count >= 4:
69
+ score += 0.1
70
+ elif task.tier == "beginner" and cell_count >= 2:
71
+ score += 0.1
72
+ return min(1.0, score)
73
+
74
+
75
+ def manim_structure(code: str, task: Task) -> float:
76
+ """Score structural quality of a manim scene (0-1)."""
77
+ from .sandbox import extract_scene_class
78
+
79
+ score = 0.0
80
+ if "from manim" in code or "import manim" in code:
81
+ score += 0.2
82
+ if extract_scene_class(code) is not None:
83
+ score += 0.2
84
+ if "def construct" in code:
85
+ score += 0.1
86
+ anim_patterns = [
87
+ "self.play(",
88
+ "self.wait(",
89
+ "Create(",
90
+ "FadeIn(",
91
+ "FadeOut(",
92
+ "Transform(",
93
+ "Write(",
94
+ "MoveToTarget",
95
+ "Indicate(",
96
+ "ReplacementTransform(",
97
+ ]
98
+ anim_hits = sum(1 for p in anim_patterns if p in code)
99
+ score += min(0.3, anim_hits * 0.05)
100
+ math_patterns = ["MathTex(", "Tex(", "Axes(", "NumberPlane(", "Graph("]
101
+ math_hits = sum(1 for p in math_patterns if p in code)
102
+ if math_hits > 0:
103
+ score += 0.1
104
+ if task.tier == "advanced" and anim_hits >= 6:
105
+ score += 0.1
106
+ elif task.tier == "intermediate" and anim_hits >= 4:
107
+ score += 0.1
108
+ elif task.tier == "beginner" and anim_hits >= 2:
109
+ score += 0.1
110
+ return min(1.0, score)
111
+
112
+
113
+ def structure_score(code: str, fmt: str, task: Task) -> float:
114
+ if fmt == "marimo":
115
+ return marimo_structure(code, task)
116
+ return manim_structure(code, task)
117
+
118
+
119
+ def narration_score(narration: str, fmt: str) -> float:
120
+ """Score narration quality. Only relevant for manim format."""
121
+ if fmt != "manim":
122
+ return 1.0
123
+ if not narration or not narration.strip():
124
+ return 0.0
125
+ score = 0.0
126
+ words = narration.split()
127
+ if len(words) >= 30:
128
+ score += 0.4
129
+ elif len(words) >= 10:
130
+ score += 0.2
131
+ scene_markers = ["scene", "step", "first", "next", "then", "finally", "now"]
132
+ marker_hits = sum(1 for m in scene_markers if m in narration.lower())
133
+ score += min(0.3, marker_hits * 0.1)
134
+ if len(words) >= 50:
135
+ score += 0.3
136
+ elif len(words) >= 20:
137
+ score += 0.15
138
+ return min(1.0, score)
139
+
140
+
141
+ def context_usage(code: str, accumulated_context: list[str]) -> float:
142
+ """Score whether the generated code incorporates research findings (0-1).
143
+
144
+ Higher score if the code references terms found during exploration.
145
+ """
146
+ if not accumulated_context:
147
+ return 0.5 # no exploration context to compare against
148
+
149
+ context_words: set[str] = set()
150
+ for ctx in accumulated_context:
151
+ context_words.update(
152
+ w.lower() for w in ctx.split() if len(w) > 3
153
+ )
154
+
155
+ if not context_words:
156
+ return 0.5
157
+
158
+ code_words = set(w.lower() for w in code.split() if len(w) > 3)
159
+ overlap = code_words & context_words
160
+ return min(1.0, len(overlap) / max(len(context_words), 1) * 5)
161
+
162
+
163
+ # -- Weights for generation reward --
164
+ W_CODE_VALID = 0.15
165
+ W_CODE_RUNS = 0.15
166
+ W_COVERAGE = 0.15
167
+ W_FORMAT = 0.10
168
+ W_STRUCTURE = 0.15
169
+ W_NARRATION = 0.10
170
+ W_CONTEXT_USE = 0.20 # rewards using exploration findings
171
+
172
+
173
+ def compute_generate_reward(
174
+ code: str,
175
+ fmt: str,
176
+ narration: str,
177
+ task: Task,
178
+ exec_success: bool,
179
+ accumulated_context: list[str],
180
+ ) -> tuple[float, dict]:
181
+ """Compute the generation-phase reward. Returns (total, components)."""
182
+ c_valid = 1.0 if ast_parses(code) else 0.0
183
+ c_runs = 1.0 if exec_success else 0.0
184
+ c_coverage = keyword_coverage(code, task.keywords)
185
+ c_format = format_match(fmt, task)
186
+ c_struct = structure_score(code, fmt, task)
187
+ c_narr = narration_score(narration, fmt)
188
+ c_ctx = context_usage(code, accumulated_context)
189
+
190
+ # Redistribute narration weight to structure for marimo
191
+ if fmt == "marimo":
192
+ w_struct = W_STRUCTURE + W_NARRATION
193
+ w_narr = 0.0
194
+ else:
195
+ w_struct = W_STRUCTURE
196
+ w_narr = W_NARRATION
197
+
198
+ total = (
199
+ W_CODE_VALID * c_valid
200
+ + W_CODE_RUNS * c_runs
201
+ + W_COVERAGE * c_coverage
202
+ + W_FORMAT * c_format
203
+ + w_struct * c_struct
204
+ + w_narr * c_narr
205
+ + W_CONTEXT_USE * c_ctx
206
+ )
207
+
208
+ components = {
209
+ "code_valid": round(c_valid, 3),
210
+ "code_runs": round(c_runs, 3),
211
+ "coverage": round(c_coverage, 3),
212
+ "format_match": round(c_format, 3),
213
+ "structure": round(c_struct, 3),
214
+ "narration": round(c_narr, 3),
215
+ "context_usage": round(c_ctx, 3),
216
+ "generate_total": round(total, 4),
217
+ }
218
+ return total, components
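
The weight redistribution for marimo is easier to check with numbers. The sketch below is a standalone restatement of the weighted sum in `compute_generate_reward`. The helper name `generate_total` and the `WEIGHTS` dict are assumptions made for illustration; the commit keeps the weights as module-level constants.

```python
from __future__ import annotations

# Weights copied from the W_* constants in the diff above.
WEIGHTS = {
    "code_valid": 0.15,
    "code_runs": 0.15,
    "coverage": 0.15,
    "format_match": 0.10,
    "structure": 0.15,
    "narration": 0.10,
    "context_usage": 0.20,
}


def generate_total(components: dict[str, float], fmt: str) -> float:
    """Hypothetical helper: weighted sum with the marimo redistribution."""
    weights = dict(WEIGHTS)
    if fmt == "marimo":
        # Narration is not scored for notebooks, so its weight moves to structure.
        weights["structure"] += weights["narration"]
        weights["narration"] = 0.0
    return sum(weights[name] * components[name] for name in weights)


if __name__ == "__main__":
    perfect = {name: 1.0 for name in WEIGHTS}
    silent = {**perfect, "narration": 0.0}
    # A manim submission without narration loses the narration weight.
    print(round(generate_total(silent, "manim"), 4))
    # A marimo submission is unaffected by the narration score.
    print(round(generate_total(silent, "marimo"), 4))
```

Because the effective weights sum to 1.0 in both branches, totals stay comparable across the two formats.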
rewards/llm_judge.py ADDED
@@ -0,0 +1,132 @@
1
+ """Optional LLM-as-judge for final explainability scoring.
2
+
3
+ This module is eval-only — it is NOT used in the training loop because
4
+ LLM judge calls are too slow and non-deterministic for RL reward signals.
5
+
6
+ Usage at eval time:
7
+ score, details = judge_explainability(code, topic, tier)
8
+
9
+ Requires an OpenAI-compatible endpoint (e.g. vLLM, ollama, or OpenAI API).
10
+ Set JUDGE_API_URL and optionally JUDGE_API_KEY environment variables.
11
+ """
12
+
13
+ from __future__ import annotations
14
+
15
+ import json
16
+ import os
17
+ import urllib.request
18
+
19
+ JUDGE_API_URL = os.environ.get("JUDGE_API_URL", "")
20
+ JUDGE_API_KEY = os.environ.get("JUDGE_API_KEY", "")
21
+ JUDGE_MODEL = os.environ.get("JUDGE_MODEL", "gpt-4o-mini")
22
+
23
+ JUDGE_PROMPT = """\
24
+ You are an expert educator evaluating the quality of an interactive explanation.
25
+
26
+ TOPIC: {topic}
27
+ AUDIENCE TIER: {tier}
28
+ FORMAT: {fmt}
29
+
30
+ CODE:
31
+ ```
32
+ {code}
33
+ ```
34
+
35
+ {narration_section}
36
+
37
+ Rate the explanation on a scale of 1-10 across these dimensions:
38
+ 1. **Clarity**: Is the concept explained clearly for the target audience tier?
39
+ 2. **Accuracy**: Is the content technically correct?
40
+ 3. **Engagement**: Does the code create an engaging, interactive experience?
41
+ 4. **Completeness**: Does it cover the key aspects of the topic?
42
+ 5. **Appropriateness**: Is the depth appropriate for the audience tier?
43
+
44
+ Respond in JSON format:
45
+ {{
46
+ "clarity": <1-10>,
47
+ "accuracy": <1-10>,
48
+ "engagement": <1-10>,
49
+ "completeness": <1-10>,
50
+ "appropriateness": <1-10>,
51
+ "overall": <1-10>,
52
+ "rationale": "<brief explanation>"
53
+ }}
54
+ """
55
+
56
+
57
+ def judge_explainability(
58
+ code: str,
59
+ topic: str,
60
+ tier: str = "intermediate",
61
+ fmt: str = "marimo",
62
+ narration: str = "",
63
+ api_url: str | None = None,
64
+ api_key: str | None = None,
65
+ model: str | None = None,
66
+ ) -> tuple[float, dict]:
67
+ """Score explainability using an LLM judge.
68
+
69
+ Returns (normalized_score, details) where normalized_score is 0.0-1.0
70
+ and details contains per-dimension scores and rationale.
71
+
72
+ Returns (0.0, {"error": ...}) if the judge is unavailable or fails.
73
+ """
74
+ url = api_url or JUDGE_API_URL
75
+ key = api_key or JUDGE_API_KEY
76
+ mdl = model or JUDGE_MODEL
77
+
78
+ if not url:
79
+ return 0.0, {"error": "JUDGE_API_URL not configured"}
80
+
81
+ narration_section = ""
82
+ if narration and fmt == "manim":
83
+ narration_section = f"NARRATION:\n{narration}"
84
+
85
+ prompt = JUDGE_PROMPT.format(
86
+ topic=topic,
87
+ tier=tier,
88
+ fmt=fmt,
89
+ code=code[:4000], # trim to avoid exceeding context
90
+ narration_section=narration_section,
91
+ )
92
+
93
+ payload = json.dumps({
94
+ "model": mdl,
95
+ "messages": [{"role": "user", "content": prompt}],
96
+ "temperature": 0.0,
97
+ "max_tokens": 300,
98
+ }).encode()
99
+
100
+ headers = {
101
+ "Content-Type": "application/json",
102
+ "User-Agent": "ExplainerEnv/1.0",
103
+ }
104
+ if key:
105
+ headers["Authorization"] = f"Bearer {key}"
106
+
107
+ try:
108
+ req = urllib.request.Request(
109
+ f"{url.rstrip('/')}/chat/completions",
110
+ data=payload,
111
+ headers=headers,
112
+ )
113
+ with urllib.request.urlopen(req, timeout=30) as resp:
114
+ data = json.loads(resp.read().decode())
115
+
116
+ content = data["choices"][0]["message"]["content"]
117
+ # Parse JSON from response (handle markdown code blocks)
118
+ content = content.strip()
119
+ if content.startswith("```"):
120
+ content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip()
121
+
122
+ scores = json.loads(content)
123
+ overall = scores.get("overall", 5) / 10.0
124
+ return overall, scores
125
+
126
+ except Exception as e:
127
+ return 0.0, {"error": str(e)}
128
+
129
+
130
+ def is_available() -> bool:
131
+ """Check if the LLM judge is configured."""
132
+ return bool(JUDGE_API_URL)
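
The response-parsing step in `judge_explainability` can be exercised without a live endpoint. The sketch below isolates it in a hypothetical helper, `parse_judge_response`, which is not part of the commit. It assumes the judge replies with either bare JSON or JSON wrapped in a markdown code fence, the two cases the original code handles.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable


def parse_judge_response(content: str) -> tuple[float, dict]:
    """Hypothetical helper mirroring the parsing step in judge_explainability."""
    content = content.strip()
    if content.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        content = content.split("\n", 1)[1].rsplit(FENCE, 1)[0].strip()
    scores = json.loads(content)
    # A missing "overall" key falls back to the midpoint, as in the original.
    return scores.get("overall", 5) / 10.0, scores


if __name__ == "__main__":
    fenced = FENCE + 'json\n{"overall": 8, "rationale": "clear"}\n' + FENCE
    print(parse_judge_response(fenced))
    print(parse_judge_response('{"clarity": 7}'))
```

Malformed replies raise here; in the commit, the surrounding `try`/`except` turns them into `(0.0, {"error": ...})`.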
rewards/notes.ipynb ADDED
@@ -0,0 +1,53 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "id": "c55af9de",
7
+ "metadata": {},
8
+ "outputs": [
9
+ {
10
+ "name": "stdout",
11
+ "output_type": "stream",
12
+ "text": [
13
+ "/Users/mmt10913/Personal/hackathons/openenv-hackathon/.venv/bin/python\n"
14
+ ]
15
+ }
16
+ ],
17
+ "source": [
18
+ "! which python"
19
+ ]
20
+ },
21
+ {
22
+ "cell_type": "code",
23
+ "execution_count": null,
24
+ "id": "4905024a",
25
+ "metadata": {},
26
+ "outputs": [],
27
+ "source": [
28
+ "from huggingface_hub import"
29
+ ]
30
+ }
31
+ ],
32
+ "metadata": {
33
+ "kernelspec": {
34
+ "display_name": ".venv",
35
+ "language": "python",
36
+ "name": "python3"
37
+ },
38
+ "language_info": {
39
+ "codemirror_mode": {
40
+ "name": "ipython",
41
+ "version": 3
42
+ },
43
+ "file_extension": ".py",
44
+ "mimetype": "text/x-python",
45
+ "name": "python",
46
+ "nbconvert_exporter": "python",
47
+ "pygments_lexer": "ipython3",
48
+ "version": "3.12.12"
49
+ }
50
+ },
51
+ "nbformat": 4,
52
+ "nbformat_minor": 5
53
+ }
rewards/sandbox.py ADDED
@@ -0,0 +1,83 @@
1
+ """Sandbox execution for marimo and manim code."""
2
+
3
+ import ast
4
+ import subprocess
5
+ import tempfile
6
+ from pathlib import Path
7
+
8
+
9
+ def ast_parses(code: str) -> bool:
10
+ """Check whether the code is valid Python (AST-parseable)."""
11
+ try:
12
+ ast.parse(code)
13
+ return True
14
+ except SyntaxError:
15
+ return False
16
+
17
+
18
+ def extract_scene_class(code: str) -> str | None:
19
+ """Return the first Scene subclass name found in manim code."""
20
+ try:
21
+ tree = ast.parse(code)
22
+ except SyntaxError:
23
+ return None
24
+ for node in ast.walk(tree):
25
+ if isinstance(node, ast.ClassDef):
26
+ for base in node.bases:
27
+ base_name = ""
28
+ if isinstance(base, ast.Name):
29
+ base_name = base.id
30
+ elif isinstance(base, ast.Attribute):
31
+ base_name = base.attr
32
+ if "Scene" in base_name:
33
+ return node.name
34
+ return None
35
+
36
+
37
+ def run_marimo(code: str, timeout: int = 15) -> tuple[bool, str]:
38
+ """Try exporting a marimo notebook to HTML. Returns (success, message)."""
39
+ with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
40
+ f.write(code)
41
+ f.flush()
42
+ tmp = f.name
43
+ try:
44
+ result = subprocess.run(
45
+ ["marimo", "export", "html", tmp],
46
+ capture_output=True,
47
+ text=True,
48
+ timeout=timeout,
49
+ )
50
+ if result.returncode == 0:
51
+ return True, "marimo export succeeded"
52
+ return False, result.stderr[:500]
53
+ except FileNotFoundError:
54
+ return False, "marimo not installed"
55
+ except subprocess.TimeoutExpired:
56
+ return False, "marimo export timed out"
57
+ finally:
58
+ Path(tmp).unlink(missing_ok=True)
59
+
60
+
61
+ def run_manim(code: str, timeout: int = 30) -> tuple[bool, str]:
62
+ """Try rendering a manim scene (low quality). Returns (success, message)."""
63
+ scene = extract_scene_class(code)
64
+ if scene is None:
65
+ return False, "No Scene subclass found in code"
66
+
67
+ with tempfile.TemporaryDirectory() as tmpdir:
68
+ src = Path(tmpdir) / "scene.py"
69
+ src.write_text(code)
70
+ try:
71
+ result = subprocess.run(
72
+ ["manim", "render", "-ql", "--media_dir", tmpdir, str(src), scene],
73
+ capture_output=True,
74
+ text=True,
75
+ timeout=timeout,
76
+ )
77
+ if result.returncode == 0:
78
+ return True, "manim render succeeded"
79
+ return False, result.stderr[:500]
80
+ except FileNotFoundError:
81
+ return False, "manim not installed"
82
+ except subprocess.TimeoutExpired:
83
+ return False, "manim render timed out"
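
The scene lookup that `run_manim` relies on is pure AST inspection, so it can be tried without manim installed. The sketch below restates the logic of `extract_scene_class` under a hypothetical name, `scene_class_name`, and runs it on a few small inputs.

```python
from __future__ import annotations

import ast


def scene_class_name(code: str) -> str | None:
    """Hypothetical copy of extract_scene_class, for illustration only."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return None
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                # Bare names (Scene) and dotted names (manim.ThreeDScene) both count.
                if isinstance(base, ast.Name):
                    base_name = base.id
                elif isinstance(base, ast.Attribute):
                    base_name = base.attr
                else:
                    base_name = ""
                if "Scene" in base_name:
                    return node.name
    return None


if __name__ == "__main__":
    plain = "class Demo(Scene):\n    def construct(self):\n        pass\n"
    dotted = "class Orbit(manim.ThreeDScene):\n    pass\n"
    print(scene_class_name(plain))
    print(scene_class_name(dotted))
    print(scene_class_name("class Broken(Scene:"))
```

Note that the substring test also accepts any base class whose name merely contains "Scene", so a project-specific helper class could be picked up by mistake.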
rewards/sources.py ADDED
@@ -0,0 +1,321 @@
1
+ """Async search sources for the exploration phase.
2
+
3
+ Two backends:
4
+ - HuggingFace Papers: ML-focused semantic search via the huggingface.co papers API (httpx)
5
+ - Wikipedia: general topics via wikipediaapi (section-level + BM25 RAG)
6
+
7
+ The agent's query is routed to the most appropriate source, or the agent
8
+ can specify a source prefix (e.g. "wiki: merge sort", "hf: attention").
9
+
10
+ All external calls use async I/O (httpx / wikipediaapi.AsyncWikipedia).
11
+ """
12
+
13
+ from __future__ import annotations
14
+
15
+ import math
16
+ import re
17
+ from collections import Counter
18
+
19
+ import httpx
20
+ import wikipediaapi
21
+
22
+ HF_MAX_RESULTS = 1
23
+ WIKI_TOP_SECTIONS = 3
24
+
25
+ # BM25 parameters
26
+ _BM25_K1 = 1.5
27
+ _BM25_B = 0.75
28
+
29
+
30
+ # ---------------------------------------------------------------------------
31
+ # BM25 scoring (pure Python, no external deps)
32
+ # ---------------------------------------------------------------------------
33
+
34
+ _STOP_WORDS = {
35
+ "the", "a", "an", "is", "are", "was", "were", "be", "been", "being",
36
+ "have", "has", "had", "do", "does", "did", "will", "would", "could",
37
+ "should", "may", "might", "shall", "can", "need", "dare", "ought",
38
+ "to", "of", "in", "for", "on", "with", "at", "by", "from", "as",
39
+ "into", "through", "during", "before", "after", "and", "but", "or",
40
+ "not", "no", "nor", "so", "yet", "both", "either", "neither",
41
+ "this", "that", "these", "those", "it", "its", "he", "she", "they",
42
+ }
43
+
44
+
45
+ def _tokenize(text: str) -> list[str]:
46
+ """Lowercase alphanumeric tokenization, stop words removed."""
47
+ return [w for w in re.findall(r"\w+", text.lower()) if w not in _STOP_WORDS and len(w) > 1]
48
+
49
+
50
+ def _bm25_rank(
51
+ query: str, documents: list[tuple[str, str]], top_k: int = 3
52
+ ) -> list[tuple[float, str, str]]:
53
+ """Rank (title, text) documents against query using BM25.
54
+
55
+ Returns top_k results sorted by score descending.
56
+ """
57
+ if not documents:
58
+ return []
59
+
60
+ query_terms = _tokenize(query)
61
+ if not query_terms:
62
+ return [(0.0, t, txt) for t, txt in documents[:top_k]]
63
+
64
+ # Precompute document token stats
65
+ doc_tokens = [_tokenize(f"{title} {text}") for title, text in documents]
66
+ doc_lengths = [len(t) for t in doc_tokens]
67
+ avgdl = sum(doc_lengths) / max(len(doc_lengths), 1)
68
+ n_docs = len(documents)
69
+
70
+ # Document frequency per query term
71
+ df: dict[str, int] = {}
72
+ for term in set(query_terms):
73
+ df[term] = sum(1 for tokens in doc_tokens if term in tokens)
74
+
75
+ # Score each document
76
+ scored: list[tuple[float, str, str]] = []
77
+ for i, (title, text) in enumerate(documents):
78
+ tf_counts = Counter(doc_tokens[i])
79
+ dl = doc_lengths[i]
80
+ score = 0.0
81
+ for term in query_terms:
82
+ if term not in df or df[term] == 0:
83
+ continue
84
+ idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
85
+ tf = tf_counts.get(term, 0)
86
+ numerator = tf * (_BM25_K1 + 1)
87
+ denominator = tf + _BM25_K1 * (1 - _BM25_B + _BM25_B * dl / max(avgdl, 1))
88
+ score += idf * numerator / denominator
89
+ scored.append((score, title, text))
90
+
91
+ scored.sort(key=lambda x: x[0], reverse=True)
92
+ return scored[:top_k]
93
+
94
+
95
+ # ---------------------------------------------------------------------------
96
+ # Wikipedia section flattening
97
+ # ---------------------------------------------------------------------------
98
+
99
+ _SKIP_SECTIONS = {
100
+ "references", "external links", "see also", "further reading",
101
+ "notes", "citations", "bibliography", "sources",
102
+ }
103
+
104
+
105
+ def _flatten_sections(
106
+ sections: list[wikipediaapi.WikipediaPageSection],
107
+ max_depth: int = 2,
108
+ _depth: int = 0,
109
+ ) -> list[tuple[str, str]]:
110
+ """Flatten Wikipedia section tree into (title, text) pairs."""
111
+ result: list[tuple[str, str]] = []
112
+ for section in sections:
113
+ if section.title.lower() in _SKIP_SECTIONS:
114
+ continue
115
+ if section.text.strip():
116
+ result.append((section.title, section.text.strip()))
117
+ if _depth < max_depth and section.sections:
118
+ result.extend(
119
+ _flatten_sections(section.sections, max_depth, _depth + 1)
120
+ )
121
+ return result
122
+
123
+
124
+ # ---------------------------------------------------------------------------
125
+ # Wikipedia (async, section-level BM25)
126
+ # ---------------------------------------------------------------------------
127
+
128
+ async def search_wikipedia(
129
+ query: str, top_sections: int = WIKI_TOP_SECTIONS
130
+ ) -> str:
131
+ """Search Wikipedia and return the most relevant sections via BM25.
132
+
133
+ Flow: search(query) -> top page -> get sections -> BM25 rank -> top-k.
134
+ """
135
+ try:
136
+ wiki = wikipediaapi.AsyncWikipedia(
137
+ user_agent="ExplainerEnv/1.0 (hackathon project)",
138
+ language="en",
139
+ )
140
+
141
+ # Search for the top page
142
+ search_results = await wiki.search(query, limit=1)
143
+ if not search_results or not search_results.pages:
144
+ return f"No Wikipedia results for: {query}"
145
+
146
+ # pages is a dict keyed by title
147
+ title = next(iter(search_results.pages))
148
+ page = wiki.page(title)
149
+
150
+ # Check page exists
151
+ exists = await page.exists()
152
+ if not exists:
153
+ return f"No Wikipedia article found for: {query}"
154
+
155
+ # Get summary + sections
156
+ summary = await page.summary
157
+ sections = await page.sections
158
+
159
+ # Build document list: summary as first doc, then flattened sections
160
+ docs: list[tuple[str, str]] = []
161
+ if summary:
162
+ docs.append((title, summary))
163
+ docs.extend(_flatten_sections(sections))
164
+
165
+ if not docs:
166
+ return f"Wikipedia article '{title}' has no content."
167
+
168
+ # BM25 rank sections against query
169
+ ranked = _bm25_rank(query, docs, top_k=top_sections)
170
+
171
+ parts = []
172
+ for score, sec_title, sec_text in ranked:
173
+ # Truncate long sections to keep total size reasonable
174
+ trimmed = sec_text[:800] if len(sec_text) > 800 else sec_text
175
+ parts.append(f"## {sec_title}\n{trimmed}")
176
+
177
+ return f"Wikipedia: {title}\n\n" + "\n\n---\n\n".join(parts)
178
+
179
+ except Exception as e:
180
+ return f"Wikipedia search error: {e}"
181
+
182
+
183
+ # ---------------------------------------------------------------------------
184
+ # HuggingFace Papers (async, httpx + read_paper)
185
+ # ---------------------------------------------------------------------------
186
+
187
+ async def search_hf_papers(
188
+ query: str, max_results: int = HF_MAX_RESULTS
189
+ ) -> str:
190
+ """Search HuggingFace Papers (semantic search) and read top result's content.
191
+
192
+ Flow: search(query) -> top paper ID -> read_paper(id) -> BM25 chunk.
193
+ """
194
+ try:
195
+ async with httpx.AsyncClient(timeout=15.0) as client:
196
+ # 1. Search for papers
197
+ resp = await client.get(
198
+ "https://huggingface.co/api/papers/search",
199
+ params={"q": query, "limit": max_results},
200
+ headers={"User-Agent": "ExplainerEnv/1.0"},
201
+ )
202
+ resp.raise_for_status()
203
+ papers = resp.json()
204
+
205
+ if not papers:
206
+ return f"No HF papers found for: {query}"
207
+
208
+ paper = papers[0]
209
+ paper_id = paper.get("id", "")
210
+ title = paper.get("title", "Untitled")
211
+ summary = paper.get("summary", "")
212
+
213
+ if not paper_id:
214
+ # No paper ID — return just the search result
215
+ return f"Title: {title}\nAbstract: {summary[:600]}"
216
+
217
+ # 2. Read paper markdown content
218
+ md_resp = await client.get(
219
+ f"https://huggingface.co/papers/{paper_id}.md",
220
+ headers={"User-Agent": "ExplainerEnv/1.0"},
221
+ follow_redirects=True,
222
+ )
223
+
224
+ if md_resp.status_code == 200 and md_resp.text.strip():
225
+ md_content = md_resp.text
226
+ # Chunk markdown by headings
227
+ chunks = _chunk_markdown(md_content)
228
+ if chunks:
229
+ ranked = _bm25_rank(query, chunks, top_k=3)
230
+ parts = [f"Title: {title}\nPaper ID: {paper_id}\n"]
231
+ for _score, sec_title, sec_text in ranked:
232
+ trimmed = sec_text[:800] if len(sec_text) > 800 else sec_text
233
+ parts.append(f"## {sec_title}\n{trimmed}")
234
+ return "\n\n---\n\n".join(parts)
235
+
236
+ # Fallback: return abstract only
237
+ return (
238
+ f"Title: {title}\n"
239
+ f"Paper ID: {paper_id}\n"
240
+ f"Abstract: {summary[:600]}"
241
+ )
242
+
243
+ except Exception as e:
244
+ return f"HF Papers search error: {e}"
245
+
246
+
247
+ def _chunk_markdown(md_text: str) -> list[tuple[str, str]]:
248
+ """Split markdown text into (heading, body) chunks."""
249
+ chunks: list[tuple[str, str]] = []
250
+ current_heading = "Introduction"
251
+ current_lines: list[str] = []
252
+
253
+ for line in md_text.split("\n"):
254
+ if line.startswith("#"):
255
+ # Save previous chunk
256
+ body = "\n".join(current_lines).strip()
257
+ if body:
258
+ chunks.append((current_heading, body))
259
+ # Start new chunk
260
+ current_heading = line.lstrip("#").strip() or "Section"
261
+ current_lines = []
262
+ else:
263
+ current_lines.append(line)
264
+
265
+ # Save last chunk
266
+ body = "\n".join(current_lines).strip()
267
+ if body:
268
+ chunks.append((current_heading, body))
269
+
270
+ return chunks
271
+
272
+
273
+ # ---------------------------------------------------------------------------
274
+ # Router
275
+ # ---------------------------------------------------------------------------
276
+
277
+ # Keywords that suggest ML/AI topics (used when category is not available)
278
+ _ML_KEYWORDS = {
279
+ "neural", "network", "transformer", "attention", "embedding", "gradient",
280
+ "backpropagation", "cnn", "rnn", "lstm", "gpt", "bert", "diffusion",
281
+ "reinforcement", "generative", "discriminative", "autoencoder", "vae",
282
+ "gan", "fine-tuning", "pretraining", "tokenizer", "llm", "rlhf",
283
+ "classification", "regression", "clustering", "deep learning",
284
+ "machine learning", "optimization", "sgd", "adam", "batch normalization",
285
+ }
286
+
287
+
288
+ def _is_ml_topic(query: str) -> bool:
289
+ """Heuristic: does the query look like an ML/AI topic?"""
290
+ query_lower = query.lower()
291
+ return any(kw in query_lower for kw in _ML_KEYWORDS)
292
+
293
+
294
+ async def search(query: str, category_hint: str = "") -> str:
295
+ """Route a search query to the best source.
296
+
297
+ The agent can override by prefixing the query:
298
+ - "hf: attention mechanism" -> HF Papers only
299
+ - "wiki: merge sort" -> Wikipedia only
300
+
301
+ Otherwise, uses keyword heuristic to route ML topics to HF Papers
302
+ and everything else to Wikipedia.
303
+ """
304
+ query = query.strip()
305
+
306
+ # Explicit source prefix
307
+ lower = query.lower()
308
+ if lower.startswith("hf:"):
309
+ return await search_hf_papers(query[3:].strip())
310
+ if lower.startswith("wiki:"):
311
+ return await search_wikipedia(query[5:].strip())
312
+
313
+ # Auto-route based on keyword heuristic
314
+ if _is_ml_topic(query) or _is_ml_topic(category_hint):
315
+ hf = await search_hf_papers(query)
316
+ if hf.startswith(("HF Papers search error", "No HF papers found")):
317
+ return await search_wikipedia(query)
318
+ return hf
319
+
320
+ # Default: Wikipedia
321
+ return await search_wikipedia(query)
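
The routing decision in `search()` can be looked at separately from the network calls. The sketch below is a synchronous stand-in under a hypothetical name, `pick_source`. It returns which backend would be tried first and the cleaned query, and it takes the keyword set as a parameter so the example stays small.

```python
from __future__ import annotations


def pick_source(query: str, ml_keywords: set[str]) -> tuple[str, str]:
    """Hypothetical, synchronous stand-in for the routing decision in search()."""
    query = query.strip()
    lower = query.lower()
    # An explicit prefix always wins and is stripped from the query.
    if lower.startswith("hf:"):
        return "hf", query[3:].strip()
    if lower.startswith("wiki:"):
        return "wiki", query[5:].strip()
    # Otherwise use the same substring heuristic as _is_ml_topic.
    if any(kw in lower for kw in ml_keywords):
        return "hf", query
    return "wiki", query


if __name__ == "__main__":
    keywords = {"transformer", "attention", "gan"}
    print(pick_source("hf: attention mechanism", keywords))
    print(pick_source("wiki: merge sort", keywords))
    print(pick_source("merge sort", keywords))
    # Substring matching routes this to HF Papers because "organ" contains "gan".
    print(pick_source("organ donation", keywords))
```

The last call shows a limitation of substring matching on short keywords. Matching on whole tokens would avoid it, at the cost of missing multi-word entries such as "deep learning".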
server/explainer_env_environment.py CHANGED
@@ -1,17 +1,19 @@
1
  """
2
- Research → Interactive Explainer Environment.
3
 
4
- The agent receives a topic and generates interactive educational content
5
- as either a Marimo notebook or Manim animation (with narration script).
6
- Reward is computed from 6 components: code_valid, code_runs, coverage,
7
- format_match, structure, and narration.
 
 
 
 
 
 
8
  """
9
 
10
- import ast
11
  import random
12
- import subprocess
13
- import tempfile
14
- from pathlib import Path
15
  from uuid import uuid4
16
 
17
  from openenv.core.env_server.interfaces import Environment
@@ -19,284 +21,31 @@ from openenv.core.env_server.types import State
19
 
20
  try:
21
  from ..models import ExplainerAction, ExplainerObservation
 
 
 
 
22
  from ..task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
23
  except ImportError:
24
  from models import ExplainerAction, ExplainerObservation
 
 
 
 
25
  from task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
26
 
27
- # ---------------------------------------------------------------------------
28
- # Reward helpers
29
- # ---------------------------------------------------------------------------
30
-
31
- MAX_STEPS = 1 # single-turn for now
32
-
33
-
34
- def _ast_parses(code: str) -> bool:
35
- """Check whether the code is valid Python (AST-parseable)."""
36
- try:
37
- ast.parse(code)
38
- return True
39
- except SyntaxError:
40
- return False
41
-
42
-
43
- def _run_marimo(code: str, timeout: int = 15) -> tuple[bool, str]:
44
- """Try exporting a marimo notebook to HTML. Returns (success, message)."""
45
- with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
46
- f.write(code)
47
- f.flush()
48
- tmp = f.name
49
- try:
50
- result = subprocess.run(
51
- ["marimo", "export", "html", tmp],
52
- capture_output=True,
53
- text=True,
54
- timeout=timeout,
55
- )
56
- if result.returncode == 0:
57
- return True, "marimo export succeeded"
58
- return False, result.stderr[:500]
59
- except FileNotFoundError:
60
- return False, "marimo not installed"
61
- except subprocess.TimeoutExpired:
62
- return False, "marimo export timed out"
63
- finally:
64
- Path(tmp).unlink(missing_ok=True)
65
-
66
-
67
- def _extract_scene_class(code: str) -> str | None:
68
- """Return the first Scene subclass name found in the code."""
69
- try:
70
- tree = ast.parse(code)
71
- except SyntaxError:
72
- return None
73
- for node in ast.walk(tree):
74
- if isinstance(node, ast.ClassDef):
75
- for base in node.bases:
76
- base_name = ""
77
- if isinstance(base, ast.Name):
78
- base_name = base.id
79
- elif isinstance(base, ast.Attribute):
80
- base_name = base.attr
81
- if "Scene" in base_name:
82
- return node.name
83
- return None
84
-
85
-
86
- def _run_manim(code: str, timeout: int = 30) -> tuple[bool, str]:
87
- """Try rendering a manim scene (low quality). Returns (success, message)."""
88
- scene = _extract_scene_class(code)
89
- if scene is None:
90
- return False, "No Scene subclass found in code"
91
-
92
- with tempfile.TemporaryDirectory() as tmpdir:
93
- src = Path(tmpdir) / "scene.py"
94
- src.write_text(code)
95
- try:
96
- result = subprocess.run(
97
- ["manim", "render", "-ql", "--media_dir", tmpdir, str(src), scene],
98
- capture_output=True,
99
- text=True,
100
- timeout=timeout,
101
- )
102
- if result.returncode == 0:
103
- return True, "manim render succeeded"
104
- return False, result.stderr[:500]
105
- except FileNotFoundError:
106
- return False, "manim not installed"
107
- except subprocess.TimeoutExpired:
108
- return False, "manim render timed out"
109
-
110
-
111
- def _keyword_coverage(code: str, keywords_csv: str) -> float:
112
- """Fraction of task keywords mentioned in the code (case-insensitive)."""
113
- if not keywords_csv:
114
- return 0.0
115
- keywords = [k.strip().lower() for k in keywords_csv.split(",") if k.strip()]
116
- if not keywords:
117
- return 0.0
118
- code_lower = code.lower()
119
- hits = sum(1 for kw in keywords if kw in code_lower)
120
- return hits / len(keywords)
121
-
122
-
123
- def _format_match_score(chosen_format: str, task: Task) -> float:
124
- """1.0 if format matches the task's preferred format, else 0.3."""
125
- return 1.0 if chosen_format == task.preferred_format else 0.3
126
-
127
-
128
- def _marimo_structure(code: str, task: Task) -> float:
129
- """Score structural quality of a marimo notebook (0-1)."""
130
- score = 0.0
131
- # Has marimo import
132
- if "import marimo" in code or "from marimo" in code:
133
- score += 0.2
134
- # Has app = marimo.App()
135
- if "marimo.App" in code or "mo.App" in code:
136
- score += 0.1
137
- # Cell count: look for @app.cell decorators
138
- cell_count = code.count("@app.cell")
139
- if cell_count >= 3:
140
- score += 0.2
141
- elif cell_count >= 1:
142
- score += 0.1
143
- # Interactive elements
144
- ui_patterns = ["mo.ui.", "mo.md(", "mo.Html", "mo.accordion", "mo.callout"]
145
- ui_hits = sum(1 for p in ui_patterns if p in code)
146
- score += min(0.2, ui_hits * 0.05)
147
- # Data visualization when data_available
148
- viz_patterns = ["plt.", "px.", "altair", "matplotlib", "plotly", "mo.ui.slider"]
149
- viz_hits = sum(1 for p in viz_patterns if p in code)
150
- if task.data_available and viz_hits > 0:
151
- score += 0.2
152
- elif viz_hits > 0:
153
- score += 0.1
154
- # Tier depth: advanced should have more cells
155
- if task.tier == "advanced" and cell_count >= 6:
156
- score += 0.1
157
- elif task.tier == "intermediate" and cell_count >= 4:
158
- score += 0.1
159
- elif task.tier == "beginner" and cell_count >= 2:
160
- score += 0.1
161
- return min(1.0, score)
162
-
163
-
164
- def _manim_structure(code: str, task: Task) -> float:
165
- """Score structural quality of a manim scene (0-1)."""
166
- score = 0.0
167
- # Has manim import
168
- if "from manim" in code or "import manim" in code:
169
- score += 0.2
170
- # Has Scene subclass
171
- if _extract_scene_class(code) is not None:
172
- score += 0.2
173
- # Has construct method
174
- if "def construct" in code:
175
- score += 0.1
176
- # Animation calls
177
- anim_patterns = [
178
- "self.play(",
179
- "self.wait(",
180
- "Create(",
181
- "FadeIn(",
182
- "FadeOut(",
183
- "Transform(",
184
- "Write(",
185
- "MoveToTarget",
186
- "Indicate(",
187
- "ReplacementTransform(",
188
- ]
189
- anim_hits = sum(1 for p in anim_patterns if p in code)
190
- score += min(0.3, anim_hits * 0.05)
191
- # Math objects for math topics
192
- math_patterns = ["MathTex(", "Tex(", "Axes(", "NumberPlane(", "Graph("]
193
- math_hits = sum(1 for p in math_patterns if p in code)
194
- if task.category.startswith("math") and math_hits > 0:
195
- score += 0.1
196
- elif math_hits > 0:
197
- score += 0.05
198
- # Tier depth
199
- if task.tier == "advanced" and anim_hits >= 6:
200
- score += 0.1
201
- elif task.tier == "intermediate" and anim_hits >= 4:
202
- score += 0.1
203
- elif task.tier == "beginner" and anim_hits >= 2:
204
- score += 0.1
205
- return min(1.0, score)
206
-
207
-
208
- def _structure_score(code: str, fmt: str, task: Task) -> float:
209
- if fmt == "marimo":
210
- return _marimo_structure(code, task)
211
- return _manim_structure(code, task)
212
-
213
-
214
- def _narration_score(narration: str, fmt: str) -> float:
215
- """Score narration quality. Only relevant for manim format."""
216
- if fmt != "manim":
217
- return 1.0 # full marks when narration not applicable
218
- if not narration or not narration.strip():
219
- return 0.0
220
- score = 0.0
221
- words = narration.split()
222
- # Has meaningful length
223
- if len(words) >= 30:
224
- score += 0.4
225
- elif len(words) >= 10:
226
- score += 0.2
227
- # Has scene markers or structure
228
- scene_markers = ["scene", "step", "first", "next", "then", "finally", "now"]
229
- marker_hits = sum(1 for m in scene_markers if m in narration.lower())
230
- score += min(0.3, marker_hits * 0.1)
231
- # Proportional to code complexity (rough heuristic)
232
- if len(words) >= 50:
233
- score += 0.3
234
- elif len(words) >= 20:
235
- score += 0.15
236
- return min(1.0, score)
237
-
238
-
239
- # Reward weights
240
- W_CODE_VALID = 0.20
241
- W_CODE_RUNS = 0.20
242
- W_COVERAGE = 0.20
243
- W_FORMAT = 0.15
244
- W_STRUCTURE = 0.15
245
- W_NARRATION = 0.10
246
-
247
-
248
- def compute_reward(
249
- action: ExplainerAction, task: Task, exec_success: bool
250
- ) -> tuple[float, dict]:
251
- """Compute the 6-component reward. Returns (total, components_dict)."""
252
- code_valid = 1.0 if _ast_parses(action.code) else 0.0
253
- code_runs = 1.0 if exec_success else 0.0
254
- coverage = _keyword_coverage(action.code, task.keywords)
255
- fmt_match = _format_match_score(action.format, task)
256
- structure = _structure_score(action.code, action.format, task)
257
- narration = _narration_score(action.narration, action.format)
258
-
259
- # When format is marimo, redistribute narration weight to structure
260
- if action.format == "marimo":
261
- w_struct = W_STRUCTURE + W_NARRATION
262
- w_narr = 0.0
263
- else:
264
- w_struct = W_STRUCTURE
265
- w_narr = W_NARRATION
266
-
267
- total = (
268
- W_CODE_VALID * code_valid
269
- + W_CODE_RUNS * code_runs
270
- + W_COVERAGE * coverage
271
- + W_FORMAT * fmt_match
272
- + w_struct * structure
273
- + w_narr * narration
274
- )
275
-
276
- components = {
277
- "code_valid": round(code_valid, 3),
278
- "code_runs": round(code_runs, 3),
279
- "coverage": round(coverage, 3),
280
- "format_match": round(fmt_match, 3),
281
- "structure": round(structure, 3),
282
- "narration": round(narration, 3),
283
- "total": round(total, 4),
284
- }
285
- return total, components
286
-
287
-
288
- # ---------------------------------------------------------------------------
289
- # Environment
290
- # ---------------------------------------------------------------------------
291
 
292
 
293
  class ExplainerEnvironment(Environment):
294
  """
295
- Research → Interactive Explainer environment.
 
 
 
296
 
297
- reset() samples a task from the task bank and returns it as an observation.
298
- step() receives the agent's generated code, executes it in a sandbox,
299
- computes the multi-component reward, and returns feedback.
300
  """
301
 
302
  SUPPORTS_CONCURRENT_SESSIONS: bool = True
@@ -305,15 +54,108 @@ class ExplainerEnvironment(Environment):
305
  super().__init__()
306
  self._state = State(episode_id=str(uuid4()), step_count=0)
307
  self._current_task: Task | None = None
308
- self._difficulty_pool: list[Task] = EASY_TASKS # start easy
 
 
 
 
 
 
309
 
310
  def reset(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
311
- """Sample a task and return the initial observation."""
312
  self._state = State(
313
  episode_id=episode_id or str(uuid4()), step_count=0
314
  )
 
 
315
 
316
- # Allow caller to set difficulty via kwargs
317
  difficulty = kwargs.get("difficulty", None)
318
  if difficulty == "medium":
319
  pool = MEDIUM_TASKS
@@ -324,11 +166,7 @@ class ExplainerEnvironment(Environment):
324
  else:
325
  pool = self._difficulty_pool
326
 
327
- if seed is not None:
328
- rng = random.Random(seed)
329
- else:
330
- rng = random.Random()
331
-
332
  self._current_task = rng.choice(pool) if pool else rng.choice(ALL_TASKS)
333
 
334
  t = self._current_task
@@ -337,80 +175,160 @@ class ExplainerEnvironment(Environment):
337
  content=t.content,
338
  tier=t.tier,
339
  keywords=t.keywords,
340
- category=t.category,
341
  data_available=t.data_available,
342
- feedback="",
 
 
 
 
343
  done=False,
344
  reward=0.0,
345
  )
346
 
347
- def step(self, action: ExplainerAction, timeout_s=None, **kwargs) -> ExplainerObservation:
348
- """Execute the agent's code, compute reward, return feedback."""
349
- self._state.step_count += 1
350
- task = self._current_task
 
 
 
 
 
351
 
352
- if task is None:
353
- return ExplainerObservation(
354
- feedback="Error: no task set. Call reset() first.",
355
- done=True,
356
- reward=-1.0,
 
 
 
 
357
  )
358
 
359
- try:
360
- # 1. Check if code parses
361
- parses = _ast_parses(action.code)
362
-
363
- # 2. Try to run the code
364
- exec_success = False
365
- exec_msg = ""
366
- if parses:
367
- if action.format == "marimo":
368
- exec_success, exec_msg = _run_marimo(action.code)
369
- elif action.format == "manim":
370
- exec_success, exec_msg = _run_manim(action.code)
371
- else:
372
- exec_msg = "Code has syntax errors and cannot be parsed."
373
 
374
- # 3. Compute reward
375
- reward, components = compute_reward(action, task, exec_success)
376
 
377
- # 4. Build feedback
378
- feedback_parts = []
379
- if not parses:
380
- feedback_parts.append("SYNTAX ERROR: code does not parse.")
381
- elif not exec_success:
382
- feedback_parts.append(f"EXECUTION FAILED: {exec_msg}")
383
- else:
384
- feedback_parts.append(f"EXECUTION OK: {exec_msg}")
385
- feedback_parts.append(
386
- f"Reward breakdown: {', '.join(f'{k}={v}' for k, v in components.items())}"
387
- )
388
- feedback = "\n".join(feedback_parts)
389
 
390
- done = self._state.step_count >= MAX_STEPS
391
 
392
- return ExplainerObservation(
393
- topic=task.topic,
394
- content=task.content,
395
- tier=task.tier,
396
- keywords=task.keywords,
397
- category=task.category,
398
- data_available=task.data_available,
399
- feedback=feedback,
400
- done=done,
401
- reward=reward,
402
- metadata={"step": self._state.step_count, **components},
403
- )
 
404
 
405
- except Exception as e:
406
- return ExplainerObservation(
407
- topic=task.topic if task else "",
408
- content="",
409
- tier="beginner",
410
- feedback=f"Environment error: {e}",
411
- done=True,
412
- reward=0.0,
413
- )
414
 
415
  @property
416
  def state(self) -> State:
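
The removed `compute_reward` combines six component scores with fixed weights and, for marimo submissions, moves the narration weight onto structure. The sketch below reproduces only that arithmetic so the redistribution is easy to check. `WEIGHTS` and `weighted_total` are hypothetical names introduced for illustration; the numeric weights are copied from the removed `W_*` constants.

```python
# Weights copied from the removed module-level W_* constants.
WEIGHTS = {
    "code_valid": 0.20,
    "code_runs": 0.20,
    "coverage": 0.20,
    "format_match": 0.15,
    "structure": 0.15,
    "narration": 0.10,
}


def weighted_total(components: dict[str, float], fmt: str) -> float:
    """Hypothetical helper: combine per-component scores in [0, 1] into one reward."""
    weights = dict(WEIGHTS)
    if fmt == "marimo":
        # Narration does not apply to notebooks, so its weight moves to structure.
        weights["structure"] += weights["narration"]
        weights["narration"] = 0.0
    return sum(weights[name] * components[name] for name in weights)


perfect = {name: 1.0 for name in WEIGHTS}
print(round(weighted_total(perfect, "manim"), 4))
print(round(weighted_total(perfect, "marimo"), 4))
```

Because the weights sum to 1.0 in both branches, a submission that maxes every component reaches the same ceiling regardless of format; only the share attributed to structure changes.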
 
  """
+ Research → Interactive Explainer Environment (multi-step, async).
 
+ Episode flow:
+   1. reset() → agent gets a topic + tier
+   2. step(explore) × 1..MAX_EXPLORE_STEPS → agent searches, gets papers back
+   3. step(generate) × 1 → agent produces marimo/manim code → episode ends
+
+ Each step returns a per-step reward. The final generate step also includes
+ a generation reward that accounts for how well the code uses the research.
+
+ The environment supports async via reset_async() / step_async() overrides.
+ OpenEnv's HTTP server detects these and calls them directly (no thread pool).
  """
15
 
 
16
  import random
 
 
 
17
  from uuid import uuid4
18
 
19
  from openenv.core.env_server.interfaces import Environment
 
21
 
22
  try:
23
  from ..models import ExplainerAction, ExplainerObservation
24
+ from ..rewards.exploration import compute_explore_reward
25
+ from ..rewards.generation import compute_generate_reward
26
+ from ..rewards.sandbox import ast_parses, run_manim, run_marimo
27
+ from ..rewards.sources import search as search_sources
28
  from ..task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
29
  except ImportError:
30
  from models import ExplainerAction, ExplainerObservation
31
+ from rewards.exploration import compute_explore_reward
32
+ from rewards.generation import compute_generate_reward
33
+ from rewards.sandbox import ast_parses, run_manim, run_marimo
34
+ from rewards.sources import search as search_sources
35
  from task_bank import ALL_TASKS, EASY_TASKS, HARD_TASKS, MEDIUM_TASKS, Task
36
 
37
+ MAX_EXPLORE_STEPS = 3
38
 
39
 
40
  class ExplainerEnvironment(Environment):
      """
+     Multi-step Research → Interactive Explainer environment.
+
+     Phase 1 (explore): agent issues search queries, receives papers/wiki sections.
+     Phase 2 (generate): agent produces marimo/manim code using the research.
 
+     Supports async via reset_async() / step_async(). OpenEnv's server detects
+     the overrides and awaits them directly instead of using a thread pool.
      """
50
 
51
  SUPPORTS_CONCURRENT_SESSIONS: bool = True
 
54
  super().__init__()
55
  self._state = State(episode_id=str(uuid4()), step_count=0)
56
  self._current_task: Task | None = None
57
+ self._difficulty_pool: list[Task] = EASY_TASKS
58
+ self._accumulated_context: list[str] = []
59
+ self._explore_steps: int = 0
60
+
61
+ # ------------------------------------------------------------------
62
+ # Sync interface (fallback — OpenEnv prefers async when overridden)
63
+ # ------------------------------------------------------------------
64
 
65
  def reset(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
66
+ """Sample a task and return the initial observation (sync)."""
67
+ return self._do_reset(seed=seed, episode_id=episode_id, **kwargs)
68
+
+     def step(self, action: ExplainerAction, timeout_s=None, **kwargs) -> ExplainerObservation:
+         """Route to explore or generate handler (sync; explore uses blocking fallback)."""
+         import asyncio
+         self._state.step_count += 1
+         task = self._current_task
+
+         if task is None:
+             return ExplainerObservation(
+                 feedback="Error: no task set. Call reset() first.",
+                 done=True,
+                 reward=-1.0,
+             )
+
+         try:
+             if action.action_type == "explore":
+                 # Run async explore in a new event loop for sync callers
+                 return asyncio.run(self._handle_explore(action, task))
+             elif action.action_type == "generate":
+                 return self._handle_generate(action, task)
+             else:
+                 return self._make_obs(
+                     task,
+                     phase="explore",
+                     feedback=f"Unknown action_type: {action.action_type}",
+                     reward=0.0,
+                     done=True,
+                 )
+         except Exception as e:
+             return self._make_obs(
+                 task,
+                 phase="done",
+                 feedback=f"Environment error: {e}",
+                 reward=0.0,
+                 done=True,
+             )
104
+
105
+ # ------------------------------------------------------------------
106
+ # Async interface (preferred — OpenEnv detects these overrides)
107
+ # ------------------------------------------------------------------
108
+
+     async def reset_async(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
+         """Sample a task and return the initial observation (async)."""
+         return self._do_reset(seed=seed, episode_id=episode_id, **kwargs)
+
+     async def step_async(self, action: ExplainerAction, timeout_s=None, **kwargs) -> ExplainerObservation:
+         """Route to explore or generate handler (async)."""
+         self._state.step_count += 1
+         task = self._current_task
+
+         if task is None:
+             return ExplainerObservation(
+                 feedback="Error: no task set. Call reset() first.",
+                 done=True,
+                 reward=-1.0,
+             )
+
+         try:
+             if action.action_type == "explore":
+                 return await self._handle_explore(action, task)
+             elif action.action_type == "generate":
+                 return self._handle_generate(action, task)
+             else:
+                 return self._make_obs(
+                     task,
+                     phase="explore",
+                     feedback=f"Unknown action_type: {action.action_type}",
+                     reward=0.0,
+                     done=True,
+                 )
+         except Exception as e:
+             return self._make_obs(
+                 task,
+                 phase="done",
+                 feedback=f"Environment error: {e}",
+                 reward=0.0,
+                 done=True,
+             )
146
+
147
+ # ------------------------------------------------------------------
148
+ # Internal
149
+ # ------------------------------------------------------------------
150
+
151
+ def _do_reset(self, seed=None, episode_id=None, **kwargs) -> ExplainerObservation:
152
+ """Shared reset logic (no I/O, so sync is fine)."""
153
  self._state = State(
154
  episode_id=episode_id or str(uuid4()), step_count=0
155
  )
156
+ self._accumulated_context = []
157
+ self._explore_steps = 0
158
 
 
159
  difficulty = kwargs.get("difficulty", None)
160
  if difficulty == "medium":
161
  pool = MEDIUM_TASKS
 
166
  else:
167
  pool = self._difficulty_pool
168
 
169
+ rng = random.Random(seed) if seed is not None else random.Random()
 
 
 
 
170
  self._current_task = rng.choice(pool) if pool else rng.choice(ALL_TASKS)
171
 
172
  t = self._current_task
 
175
  content=t.content,
176
  tier=t.tier,
177
  keywords=t.keywords,
 
178
  data_available=t.data_available,
179
+ phase="explore",
180
+ feedback="Research phase: search for relevant papers before generating.",
181
+ search_results="",
182
+ explored_context="",
183
+ explore_steps_left=MAX_EXPLORE_STEPS,
184
  done=False,
185
  reward=0.0,
186
  )
187
 
+     async def _handle_explore(self, action: ExplainerAction, task: Task) -> ExplainerObservation:
+         """Process an explore action: search HF Papers/Wikipedia, score query."""
+         if self._explore_steps >= MAX_EXPLORE_STEPS:
+             return self._make_obs(
+                 task,
+                 phase="generate",
+                 feedback="Max explore steps reached. You must now generate.",
+                 reward=0.0,
+             )
 
+         self._explore_steps += 1
+         query = action.query.strip()
+
+         if not query:
+             return self._make_obs(
+                 task,
+                 phase="explore",
+                 feedback="Empty query. Provide a search query.",
+                 reward=0.0,
              )
 
+         # Search HF Papers / Wikipedia (async, routed by keyword heuristic)
+         results_text = await search_sources(query, category_hint=task.topic)
+         self._accumulated_context.append(results_text)
+
+         # Compute per-step exploration reward
+         reward, components = compute_explore_reward(
+             query=query,
+             result_text=results_text,
+             topic=task.topic,
+             keywords_csv=task.keywords,
+             task_content=task.content,
+             accumulated_context=self._accumulated_context,
+         )
 
+         steps_left = MAX_EXPLORE_STEPS - self._explore_steps
+         if steps_left > 0:
+             phase = "explore"
+             hint = f"{steps_left} explore step(s) left. Continue researching or generate."
+         else:
+             phase = "generate"
+             hint = "Max explore steps reached. You must now generate."
+
+         return self._make_obs(
+             task,
+             phase=phase,
+             feedback=f"{hint}\nReward: {components}",
+             search_results=results_text,
+             reward=reward,
+             metadata={"step": self._state.step_count, "phase": "explore", **components},
+         )
239
 
+     def _handle_generate(self, action: ExplainerAction, task: Task) -> ExplainerObservation:
+         """Process a generate action: run sandbox, compute generation reward."""
+         fmt = action.format or "marimo"
+         code = action.code
+         narration = action.narration
 
+         # Penalise generating without any exploration
+         if self._explore_steps == 0:
+             skip_penalty = -0.1
+             penalty_msg = "Warning: generating without any research. -0.1 penalty."
+         else:
+             skip_penalty = 0.0
+             penalty_msg = ""
+
+         # Sandbox execution
+         parses = ast_parses(code)
+         exec_success = False
+         exec_msg = ""
+         if parses:
+             if fmt == "marimo":
+                 exec_success, exec_msg = run_marimo(code)
+             elif fmt == "manim":
+                 exec_success, exec_msg = run_manim(code)
+         else:
+             exec_msg = "Code has syntax errors and cannot be parsed."
+
+         # Generation reward
+         reward, components = compute_generate_reward(
+             code=code,
+             fmt=fmt,
+             narration=narration,
+             task=task,
+             exec_success=exec_success,
+             accumulated_context=self._accumulated_context,
+         )
+         reward = max(0.0, reward + skip_penalty)
+
+         # Feedback
+         parts = []
+         if penalty_msg:
+             parts.append(penalty_msg)
+         if not parses:
+             parts.append("SYNTAX ERROR: code does not parse.")
+         elif not exec_success:
+             parts.append(f"EXECUTION FAILED: {exec_msg}")
+         else:
+             parts.append(f"EXECUTION OK: {exec_msg}")
+         parts.append(
+             f"Reward: {', '.join(f'{k}={v}' for k, v in components.items())}"
+         )
 
+         return self._make_obs(
+             task,
+             phase="done",
+             feedback="\n".join(parts),
+             reward=reward,
+             done=True,
+             metadata={
+                 "step": self._state.step_count,
+                 "phase": "generate",
+                 "explore_steps_used": self._explore_steps,
+                 **components,
+             },
+         )
304
 
+     def _make_obs(
+         self,
+         task: Task,
+         *,
+         phase: str,
+         feedback: str,
+         reward: float = 0.0,
+         done: bool = False,
+         search_results: str = "",
+         metadata: dict | None = None,
+     ) -> ExplainerObservation:
+         """Helper to build a consistent observation."""
+         return ExplainerObservation(
+             topic=task.topic,
+             content=task.content,
+             tier=task.tier,
+             keywords=task.keywords,
+             data_available=task.data_available,
+             phase=phase,
+             feedback=feedback,
+             search_results=search_results,
+             explored_context="\n---\n".join(self._accumulated_context),
+             explore_steps_left=MAX_EXPLORE_STEPS - self._explore_steps,
+             done=done,
+             reward=reward,
+             metadata=metadata or {},
+         )
332
 
333
  @property
334
  def state(self) -> State:
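
The explore/generate protocol added in this file can be hard to follow from the diff alone, so here is a minimal sketch of just the phase and budget bookkeeping. `ToyEpisode` is a hypothetical stand-in, not part of the commit: it performs no search and computes no real reward, and assumes only what the handlers above show (a budget of `MAX_EXPLORE_STEPS = 3`, a `-0.1` penalty for generating without research, and a final reward clamped at zero).

```python
from dataclasses import dataclass, field

MAX_EXPLORE_STEPS = 3  # same value as the constant added in this commit


@dataclass
class ToyEpisode:
    """Hypothetical stand-in that keeps only the phase and budget bookkeeping."""

    explore_steps: int = 0
    context: list[str] = field(default_factory=list)

    def explore(self, query: str) -> str:
        """Record one explore action and return the phase reported afterwards."""
        if self.explore_steps >= MAX_EXPLORE_STEPS:
            return "generate"  # budget already spent: nothing is searched
        self.explore_steps += 1  # incremented before validation, so an empty query still costs a step
        if not query.strip():
            return "explore"
        self.context.append(f"results for {query!r}")  # placeholder for real search output
        return "explore" if self.explore_steps < MAX_EXPLORE_STEPS else "generate"

    def generate(self, raw_reward: float) -> float:
        """Apply the skip penalty and clamp the final reward at zero."""
        skip_penalty = -0.1 if self.explore_steps == 0 else 0.0
        return max(0.0, raw_reward + skip_penalty)


episode = ToyEpisode()
phases = [episode.explore(q) for q in ("attention", "softmax scaling", "transformer", "extra")]
print(phases)
print(episode.generate(0.5))
```

One detail the sketch makes visible: because the step counter is incremented before the query is validated, an empty query consumes part of the explore budget without adding any context.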
task_bank.py CHANGED
@@ -1,8 +1,9 @@
1
  """
2
  Curated task bank for the Research → Interactive Explainer environment.
3
 
4
- Tasks are organized by category and difficulty. Each task has a preferred format
5
- (marimo or manim) that the format_match reward component checks against.
 
6
  """
7
 
8
  from dataclasses import dataclass
@@ -15,10 +16,9 @@ class Task:
15
  content: str
16
  tier: Literal["beginner", "intermediate", "advanced"]
17
  keywords: str
18
- category: str
19
  data_available: bool
20
- preferred_format: Literal["marimo", "manim"]
21
  difficulty: Literal["easy", "medium", "hard"]
 
22
 
23
 
24
  # ---------- ML Concepts (Marimo-biased) ----------
@@ -29,7 +29,6 @@ ML_CONCEPTS: list[Task] = [
29
  content="Linear regression fits a line to data by minimizing squared errors. Given input features X and target y, it finds weights w such that y ≈ Xw. The loss function is MSE = (1/n) Σ(yi - ŷi)².",
30
  tier="beginner",
31
  keywords="linear regression,least squares,MSE,gradient descent,weights,bias",
32
- category="cs.LG",
33
  data_available=True,
34
  preferred_format="marimo",
35
  difficulty="easy",
@@ -39,7 +38,6 @@ ML_CONCEPTS: list[Task] = [
39
  content="Gradient descent iteratively updates parameters by moving in the direction of steepest decrease of the loss function. Update rule: θ = θ - α∇L(θ), where α is the learning rate. Variants include SGD, mini-batch, and Adam.",
40
  tier="beginner",
41
  keywords="gradient descent,learning rate,loss function,SGD,convergence,optimization",
42
- category="cs.LG",
43
  data_available=True,
44
  preferred_format="marimo",
45
  difficulty="easy",
@@ -49,7 +47,6 @@ ML_CONCEPTS: list[Task] = [
49
  content="Decision trees split data recursively based on feature thresholds that maximize information gain (or minimize Gini impurity). Each leaf node represents a class label or regression value.",
50
  tier="beginner",
51
  keywords="decision tree,information gain,Gini impurity,splitting,leaf node,classification",
52
- category="cs.LG",
53
  data_available=True,
54
  preferred_format="marimo",
55
  difficulty="easy",
@@ -59,7 +56,6 @@ ML_CONCEPTS: list[Task] = [
59
  content="K-means partitions n observations into k clusters by iteratively assigning points to nearest centroid and updating centroids to cluster means. Converges to local optimum. Sensitive to initialization — use k-means++ for better starts.",
60
  tier="intermediate",
61
  keywords="k-means,clustering,centroid,Euclidean distance,convergence,k-means++",
62
- category="cs.LG",
63
  data_available=True,
64
  preferred_format="marimo",
65
  difficulty="easy",
@@ -69,7 +65,6 @@ ML_CONCEPTS: list[Task] = [
69
  content="The attention mechanism computes a weighted sum of values (V) where weights come from compatibility of queries (Q) and keys (K): Attention(Q,K,V) = softmax(QK^T/√dk)V. Self-attention allows each position to attend to all positions in the input.",
70
  tier="intermediate",
71
  keywords="attention,self-attention,query,key,value,softmax,transformer,scaled dot-product",
72
- category="cs.LG",
73
  data_available=False,
74
  preferred_format="marimo",
75
  difficulty="medium",
@@ -79,7 +74,6 @@ ML_CONCEPTS: list[Task] = [
79
  content="Backpropagation computes gradients of the loss with respect to each weight by applying the chain rule layer by layer from output to input. It enables efficient training of deep networks by reusing intermediate computations.",
80
  tier="intermediate",
81
  keywords="backpropagation,chain rule,gradient,computational graph,forward pass,backward pass",
82
- category="cs.LG",
83
  data_available=False,
84
  preferred_format="marimo",
85
  difficulty="medium",
@@ -89,7 +83,6 @@ ML_CONCEPTS: list[Task] = [
89
  content="CNNs use learnable filters that slide over input (convolution) to detect local patterns like edges, textures, and shapes. Key operations: convolution, pooling, and fully-connected layers. Translation equivariance is a key inductive bias.",
90
  tier="intermediate",
91
  keywords="CNN,convolution,pooling,filter,feature map,stride,padding,translation equivariance",
92
- category="cs.CV",
93
  data_available=False,
94
  preferred_format="marimo",
95
  difficulty="medium",
@@ -99,7 +92,6 @@ ML_CONCEPTS: list[Task] = [
99
  content="Batch normalization normalizes activations within a mini-batch: x̂ = (x - μ_B) / √(σ²_B + ε), then scales and shifts: y = γx̂ + β. Reduces internal covariate shift, enables higher learning rates, and acts as a regularizer.",
100
  tier="advanced",
101
  keywords="batch normalization,internal covariate shift,running mean,running variance,gamma,beta",
102
- category="cs.LG",
103
  data_available=False,
104
  preferred_format="marimo",
105
  difficulty="hard",
@@ -109,7 +101,6 @@ ML_CONCEPTS: list[Task] = [
109
  content="VAEs learn a probabilistic latent space by encoding inputs to distributions q(z|x) and decoding samples p(x|z). The ELBO loss = reconstruction + KL divergence. Reparameterization trick enables backprop through sampling: z = μ + σ⊙ε.",
110
  tier="advanced",
111
  keywords="VAE,ELBO,KL divergence,reparameterization,latent space,encoder,decoder,generative",
112
- category="cs.LG",
113
  data_available=False,
114
  preferred_format="marimo",
115
  difficulty="hard",
@@ -119,9 +110,7 @@ ML_CONCEPTS: list[Task] = [
119
  content="An agent interacts with an environment, observing states, taking actions, and receiving rewards. The goal is to learn a policy π(a|s) that maximizes cumulative discounted reward. Key concepts: value function V(s), Q-function Q(s,a), Bellman equation.",
120
  tier="beginner",
121
  keywords="reinforcement learning,agent,environment,reward,policy,value function,Q-function,Bellman",
122
- category="cs.LG",
123
  data_available=False,
124
- preferred_format="marimo",
125
  difficulty="easy",
126
  ),
127
  ]
@@ -135,7 +124,6 @@ MATH_TOPICS: list[Task] = [
135
  content="The Fourier transform decomposes a function into its constituent frequencies: F(ω) = ∫f(t)e^(-iωt)dt. Any periodic signal can be represented as a sum of sines and cosines. The DFT computes this for discrete samples.",
136
  tier="intermediate",
137
  keywords="Fourier transform,frequency,sine,cosine,DFT,spectrum,decomposition,harmonics",
138
- category="math.NA",
139
  data_available=True,
140
  preferred_format="manim",
141
  difficulty="medium",
@@ -145,7 +133,6 @@ MATH_TOPICS: list[Task] = [
145
  content="For a matrix A, eigenvector v satisfies Av = λv where λ is the eigenvalue. Eigenvectors represent directions unchanged by the transformation (only scaled). PCA uses eigenvectors of the covariance matrix.",
146
  tier="intermediate",
147
  keywords="eigenvalue,eigenvector,matrix,linear transformation,PCA,covariance,diagonalization",
148
- category="math.LA",
149
  data_available=False,
150
  preferred_format="manim",
151
  difficulty="medium",
@@ -155,7 +142,6 @@ MATH_TOPICS: list[Task] = [
155
  content="The Taylor series expands a function as an infinite sum of terms: f(x) = Σ f^(n)(a)/n! · (x-a)^n. Provides polynomial approximations to functions. Convergence depends on the radius of convergence.",
156
  tier="beginner",
157
  keywords="Taylor series,polynomial approximation,derivative,convergence,Maclaurin,expansion",
158
- category="math.CA",
159
  data_available=False,
160
  preferred_format="manim",
161
  difficulty="easy",
@@ -165,9 +151,7 @@ MATH_TOPICS: list[Task] = [
165
  content="Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A)P(A)/P(B). It enables updating beliefs given new evidence. Foundation of Bayesian inference, spam filters, and medical diagnosis.",
166
  tier="beginner",
167
  keywords="Bayes theorem,conditional probability,prior,posterior,likelihood,evidence,Bayesian",
168
- category="stat.ML",
169
  data_available=True,
170
- preferred_format="manim",
171
  difficulty="easy",
172
  ),
173
  Task(
@@ -175,7 +159,6 @@ MATH_TOPICS: list[Task] = [
175
  content="The gradient ∇f points in the direction of steepest ascent. The directional derivative Duf = ∇f · u gives the rate of change in direction u. Gradient descent follows -∇f to minimize functions.",
176
  tier="intermediate",
177
  keywords="gradient,directional derivative,steepest ascent,contour,level set,multivariable calculus",
178
- category="math.CA",
179
  data_available=False,
180
  preferred_format="manim",
181
  difficulty="medium",
@@ -185,7 +168,6 @@ MATH_TOPICS: list[Task] = [
185
  content="Multiplying a vector by a matrix transforms it: the columns of A define where basis vectors land. Composition of transformations = matrix multiplication. Determinant measures area/volume scaling.",
186
  tier="beginner",
187
  keywords="matrix multiplication,linear transformation,basis vectors,determinant,composition",
188
- category="math.LA",
189
  data_available=False,
190
  preferred_format="manim",
191
  difficulty="easy",
@@ -195,9 +177,7 @@ MATH_TOPICS: list[Task] = [
195
  content="The CLT states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the population distribution. Requires finite variance. Rate: O(1/√n).",
196
  tier="intermediate",
197
  keywords="central limit theorem,normal distribution,sample mean,variance,convergence,CLT",
198
- category="stat.ML",
199
  data_available=True,
200
- preferred_format="manim",
201
  difficulty="medium",
202
  ),
203
  Task(
@@ -205,7 +185,6 @@ MATH_TOPICS: list[Task] = [
205
  content="SVD decomposes any matrix A = UΣV^T where U,V are orthogonal and Σ is diagonal with singular values. Used in dimensionality reduction, matrix completion, and computing pseudoinverse. Truncated SVD approximates with k largest singular values.",
206
  tier="advanced",
207
  keywords="SVD,singular value,orthogonal,dimensionality reduction,low-rank approximation,pseudoinverse",
208
- category="math.LA",
209
  data_available=False,
210
  preferred_format="manim",
211
  difficulty="hard",
@@ -221,7 +200,6 @@ ALGORITHMS: list[Task] = [
221
  content="Merge sort divides the array in half, recursively sorts each half, then merges the sorted halves. Time complexity O(n log n), space O(n). Stable sort. Divide-and-conquer paradigm.",
222
  tier="beginner",
223
  keywords="merge sort,divide and conquer,recursion,O(n log n),stable sort,merging",
224
- category="cs.DS",
225
  data_available=True,
226
  preferred_format="manim",
227
  difficulty="easy",
@@ -231,7 +209,6 @@ ALGORITHMS: list[Task] = [
231
  content="Binary search finds a target in a sorted array by repeatedly halving the search space. Compare target with middle element; eliminate half. Time O(log n). Requires sorted input.",
232
  tier="beginner",
233
  keywords="binary search,sorted array,O(log n),divide and conquer,search space,comparison",
234
- category="cs.DS",
235
  data_available=True,
236
  preferred_format="manim",
237
  difficulty="easy",
@@ -241,7 +218,6 @@ ALGORITHMS: list[Task] = [
241
  content="Dijkstra's algorithm finds shortest paths from a source vertex to all others in a weighted graph with non-negative edges. Uses a priority queue. Greedily selects the nearest unvisited vertex. Time O((V+E) log V) with binary heap.",
242
  tier="intermediate",
243
  keywords="Dijkstra,shortest path,graph,priority queue,greedy,weighted edges,relaxation",
244
- category="cs.DS",
245
  data_available=False,
246
  preferred_format="manim",
247
  difficulty="medium",
@@ -251,7 +227,6 @@ ALGORITHMS: list[Task] = [
251
  content="A* combines Dijkstra's algorithm with heuristics: f(n) = g(n) + h(n), where g is cost-so-far and h is estimated cost-to-goal. Optimal if h is admissible (never overestimates). Used in pathfinding and game AI.",
252
  tier="intermediate",
253
  keywords="A-star,heuristic,admissible,pathfinding,f-score,g-score,h-score,optimal",
254
- category="cs.AI",
255
  data_available=False,
256
  preferred_format="manim",
257
  difficulty="medium",
@@ -261,7 +236,6 @@ ALGORITHMS: list[Task] = [
261
  content="Quick sort selects a pivot, partitions elements into less-than and greater-than groups, then recursively sorts each. Average O(n log n), worst O(n²). In-place. Pivot selection strategy matters (median-of-three).",
262
  tier="beginner",
263
  keywords="quick sort,pivot,partition,in-place,O(n log n),recursion,divide and conquer",
264
- category="cs.DS",
265
  data_available=True,
266
  preferred_format="manim",
267
  difficulty="easy",
@@ -277,7 +251,6 @@ STATISTICS_TASKS: list[Task] = [
277
  content="EDA uses summary statistics and visualizations to understand data distributions, correlations, and anomalies before modeling. Key tools: histograms, box plots, scatter matrices, correlation heatmaps.",
278
  tier="beginner",
279
  keywords="EDA,histogram,box plot,correlation,scatter plot,distribution,outliers,summary statistics",
280
- category="stat.ML",
281
  data_available=True,
282
  preferred_format="marimo",
283
  difficulty="easy",
@@ -287,7 +260,6 @@ STATISTICS_TASKS: list[Task] = [
287
  content="Hypothesis testing determines if observed data provides sufficient evidence against a null hypothesis H0. Steps: formulate H0/H1, choose significance level α, compute test statistic, compare with critical value or p-value.",
288
  tier="intermediate",
289
  keywords="hypothesis testing,null hypothesis,p-value,significance level,t-test,type I error,type II error",
290
- category="stat.ML",
291
  data_available=True,
292
  preferred_format="marimo",
293
  difficulty="medium",
@@ -297,7 +269,6 @@ STATISTICS_TASKS: list[Task] = [
297
  content="PCA reduces dimensionality by projecting data onto directions of maximum variance. Steps: center data, compute covariance matrix, find eigenvectors (principal components), project. Explained variance ratio guides k selection.",
298
  tier="intermediate",
299
  keywords="PCA,principal components,variance,dimensionality reduction,eigenvector,covariance,projection",
300
- category="stat.ML",
301
  data_available=True,
302
  preferred_format="marimo",
303
  difficulty="medium",
 
1
  """
2
  Curated task bank for the Research → Interactive Explainer environment.
3
 
4
+ Tasks are organized by difficulty (easy/medium/hard) and tier (beginner/intermediate/
5
+ advanced). Each task optionally specifies a preferred format (marimo or manim); when
6
+ None, the SLM must infer the best format and gets full format_match reward either way.
7
  """
8
 
9
  from dataclasses import dataclass
 
16
  content: str
17
  tier: Literal["beginner", "intermediate", "advanced"]
18
  keywords: str
 
19
  data_available: bool
 
20
  difficulty: Literal["easy", "medium", "hard"]
21
+ preferred_format: Literal["marimo", "manim"] | None = None
22
 
23
 
24
  # ---------- ML Concepts (Marimo-biased) ----------
 
29
  content="Linear regression fits a line to data by minimizing squared errors. Given input features X and target y, it finds weights w such that y ≈ Xw. The loss function is MSE = (1/n) Σ(yi - ŷi)².",
30
  tier="beginner",
31
  keywords="linear regression,least squares,MSE,gradient descent,weights,bias",
 
32
  data_available=True,
33
  preferred_format="marimo",
34
  difficulty="easy",
 
38
  content="Gradient descent iteratively updates parameters by moving in the direction of steepest decrease of the loss function. Update rule: θ = θ - α∇L(θ), where α is the learning rate. Variants include SGD, mini-batch, and Adam.",
39
  tier="beginner",
40
  keywords="gradient descent,learning rate,loss function,SGD,convergence,optimization",
 
41
  data_available=True,
42
  preferred_format="marimo",
43
  difficulty="easy",
 
47
  content="Decision trees split data recursively based on feature thresholds that maximize information gain (or minimize Gini impurity). Each leaf node represents a class label or regression value.",
48
  tier="beginner",
49
  keywords="decision tree,information gain,Gini impurity,splitting,leaf node,classification",
 
50
  data_available=True,
51
  preferred_format="marimo",
52
  difficulty="easy",
 
56
  content="K-means partitions n observations into k clusters by iteratively assigning points to nearest centroid and updating centroids to cluster means. Converges to local optimum. Sensitive to initialization — use k-means++ for better starts.",
57
  tier="intermediate",
58
  keywords="k-means,clustering,centroid,Euclidean distance,convergence,k-means++",
 
59
  data_available=True,
60
  preferred_format="marimo",
61
  difficulty="easy",
 
65
  content="The attention mechanism computes a weighted sum of values (V) where weights come from compatibility of queries (Q) and keys (K): Attention(Q,K,V) = softmax(QK^T/√dk)V. Self-attention allows each position to attend to all positions in the input.",
66
  tier="intermediate",
67
  keywords="attention,self-attention,query,key,value,softmax,transformer,scaled dot-product",
 
68
  data_available=False,
69
  preferred_format="marimo",
70
  difficulty="medium",
 
74
  content="Backpropagation computes gradients of the loss with respect to each weight by applying the chain rule layer by layer from output to input. It enables efficient training of deep networks by reusing intermediate computations.",
75
  tier="intermediate",
76
  keywords="backpropagation,chain rule,gradient,computational graph,forward pass,backward pass",
 
77
  data_available=False,
78
  preferred_format="marimo",
79
  difficulty="medium",
 
83
  content="CNNs use learnable filters that slide over input (convolution) to detect local patterns like edges, textures, and shapes. Key operations: convolution, pooling, and fully-connected layers. Translation equivariance is a key inductive bias.",
84
  tier="intermediate",
85
  keywords="CNN,convolution,pooling,filter,feature map,stride,padding,translation equivariance",
 
86
  data_available=False,
87
  preferred_format="marimo",
88
  difficulty="medium",
 
92
  content="Batch normalization normalizes activations within a mini-batch: x̂ = (x - μ_B) / √(σ²_B + ε), then scales and shifts: y = γx̂ + β. Reduces internal covariate shift, enables higher learning rates, and acts as a regularizer.",
93
  tier="advanced",
94
  keywords="batch normalization,internal covariate shift,running mean,running variance,gamma,beta",
 
95
  data_available=False,
96
  preferred_format="marimo",
97
  difficulty="hard",
 
101
  content="VAEs learn a probabilistic latent space by encoding inputs to distributions q(z|x) and decoding samples p(x|z). The ELBO loss = reconstruction + KL divergence. Reparameterization trick enables backprop through sampling: z = μ + σ⊙ε.",
102
  tier="advanced",
103
  keywords="VAE,ELBO,KL divergence,reparameterization,latent space,encoder,decoder,generative",
 
104
  data_available=False,
105
  preferred_format="marimo",
106
  difficulty="hard",
 
110
  content="An agent interacts with an environment, observing states, taking actions, and receiving rewards. The goal is to learn a policy π(a|s) that maximizes cumulative discounted reward. Key concepts: value function V(s), Q-function Q(s,a), Bellman equation.",
111
  tier="beginner",
112
  keywords="reinforcement learning,agent,environment,reward,policy,value function,Q-function,Bellman",
 
113
  data_available=False,
 
114
  difficulty="easy",
115
  ),
116
  ]
 
124
  content="The Fourier transform decomposes a function into its constituent frequencies: F(ω) = ∫f(t)e^(-iωt)dt. Any periodic signal can be represented as a sum of sines and cosines. The DFT computes this for discrete samples.",
125
  tier="intermediate",
126
  keywords="Fourier transform,frequency,sine,cosine,DFT,spectrum,decomposition,harmonics",
 
127
  data_available=True,
128
  preferred_format="manim",
129
  difficulty="medium",
 
133
  content="For a matrix A, eigenvector v satisfies Av = λv where λ is the eigenvalue. Eigenvectors represent directions unchanged by the transformation (only scaled). PCA uses eigenvectors of the covariance matrix.",
134
  tier="intermediate",
135
  keywords="eigenvalue,eigenvector,matrix,linear transformation,PCA,covariance,diagonalization",
 
136
  data_available=False,
137
  preferred_format="manim",
138
  difficulty="medium",
 
142
  content="The Taylor series expands a function as an infinite sum of terms: f(x) = Σ f^(n)(a)/n! · (x-a)^n. Provides polynomial approximations to functions. Convergence depends on the radius of convergence.",
143
  tier="beginner",
144
  keywords="Taylor series,polynomial approximation,derivative,convergence,Maclaurin,expansion",
 
145
  data_available=False,
146
  preferred_format="manim",
147
  difficulty="easy",
 
151
  content="Bayes' theorem relates conditional probabilities: P(A|B) = P(B|A)P(A)/P(B). It enables updating beliefs given new evidence. Foundation of Bayesian inference, spam filters, and medical diagnosis.",
152
  tier="beginner",
153
  keywords="Bayes theorem,conditional probability,prior,posterior,likelihood,evidence,Bayesian",
 
154
  data_available=True,
 
155
  difficulty="easy",
156
  ),
157
  Task(
 
159
  content="The gradient ∇f points in the direction of steepest ascent. The directional derivative Duf = ∇f · u gives the rate of change in direction u. Gradient descent follows -∇f to minimize functions.",
160
  tier="intermediate",
161
  keywords="gradient,directional derivative,steepest ascent,contour,level set,multivariable calculus",
 
162
  data_available=False,
163
  preferred_format="manim",
164
  difficulty="medium",
 
168
  content="Multiplying a vector by a matrix transforms it: the columns of A define where basis vectors land. Composition of transformations = matrix multiplication. Determinant measures area/volume scaling.",
169
  tier="beginner",
170
  keywords="matrix multiplication,linear transformation,basis vectors,determinant,composition",
 
171
  data_available=False,
172
  preferred_format="manim",
173
  difficulty="easy",
 
177
  content="The CLT states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the population distribution. Requires finite variance. Rate: O(1/√n).",
178
  tier="intermediate",
179
  keywords="central limit theorem,normal distribution,sample mean,variance,convergence,CLT",
 
180
  data_available=True,
 
181
  difficulty="medium",
182
  ),
183
  Task(
 
185
  content="SVD decomposes any matrix A = UΣV^T where U,V are orthogonal and Σ is diagonal with singular values. Used in dimensionality reduction, matrix completion, and computing pseudoinverse. Truncated SVD approximates with k largest singular values.",
186
  tier="advanced",
187
  keywords="SVD,singular value,orthogonal,dimensionality reduction,low-rank approximation,pseudoinverse",
 
188
  data_available=False,
189
  preferred_format="manim",
190
  difficulty="hard",
 
200
  content="Merge sort divides the array in half, recursively sorts each half, then merges the sorted halves. Time complexity O(n log n), space O(n). Stable sort. Divide-and-conquer paradigm.",
201
  tier="beginner",
202
  keywords="merge sort,divide and conquer,recursion,O(n log n),stable sort,merging",
 
203
  data_available=True,
204
  preferred_format="manim",
205
  difficulty="easy",
 
209
  content="Binary search finds a target in a sorted array by repeatedly halving the search space. Compare target with middle element; eliminate half. Time O(log n). Requires sorted input.",
210
  tier="beginner",
211
  keywords="binary search,sorted array,O(log n),divide and conquer,search space,comparison",
 
212
  data_available=True,
213
  preferred_format="manim",
214
  difficulty="easy",
 
218
  content="Dijkstra's algorithm finds shortest paths from a source vertex to all others in a weighted graph with non-negative edges. Uses a priority queue. Greedily selects the nearest unvisited vertex. Time O((V+E) log V) with binary heap.",
219
  tier="intermediate",
220
  keywords="Dijkstra,shortest path,graph,priority queue,greedy,weighted edges,relaxation",
 
221
  data_available=False,
222
  preferred_format="manim",
223
  difficulty="medium",
 
227
  content="A* combines Dijkstra's algorithm with heuristics: f(n) = g(n) + h(n), where g is cost-so-far and h is estimated cost-to-goal. Optimal if h is admissible (never overestimates). Used in pathfinding and game AI.",
228
  tier="intermediate",
229
  keywords="A-star,heuristic,admissible,pathfinding,f-score,g-score,h-score,optimal",
 
230
  data_available=False,
231
  preferred_format="manim",
232
  difficulty="medium",
 
236
  content="Quick sort selects a pivot, partitions elements into less-than and greater-than groups, then recursively sorts each. Average O(n log n), worst O(n²). In-place. Pivot selection strategy matters (median-of-three).",
237
  tier="beginner",
238
  keywords="quick sort,pivot,partition,in-place,O(n log n),recursion,divide and conquer",
 
239
  data_available=True,
240
  preferred_format="manim",
241
  difficulty="easy",
 
251
  content="EDA uses summary statistics and visualizations to understand data distributions, correlations, and anomalies before modeling. Key tools: histograms, box plots, scatter matrices, correlation heatmaps.",
252
  tier="beginner",
253
  keywords="EDA,histogram,box plot,correlation,scatter plot,distribution,outliers,summary statistics",
 
254
  data_available=True,
255
  preferred_format="marimo",
256
  difficulty="easy",
 
260
  content="Hypothesis testing determines if observed data provides sufficient evidence against a null hypothesis H0. Steps: formulate H0/H1, choose significance level α, compute test statistic, compare with critical value or p-value.",
261
  tier="intermediate",
262
  keywords="hypothesis testing,null hypothesis,p-value,significance level,t-test,type I error,type II error",
 
263
  data_available=True,
264
  preferred_format="marimo",
265
  difficulty="medium",
 
269
  content="PCA reduces dimensionality by projecting data onto directions of maximum variance. Steps: center data, compute covariance matrix, find eigenvectors (principal components), project. Explained variance ratio guides k selection.",
270
  tier="intermediate",
271
  keywords="PCA,principal components,variance,dimensionality reduction,eigenvector,covariance,projection",
 
272
  data_available=True,
273
  preferred_format="marimo",
274
  difficulty="medium",
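The task bank docstring says that a task with `preferred_format=None` earns the full `format_match` reward whichever format the model picks. The real scorer lives in `rewards/generation.py` and is not shown in this diff, so the following is only a minimal sketch of that rule. The name `format_match_sketch` and the 0.0 score for a mismatch are assumptions made for illustration.

```python
from typing import Literal, Optional

Format = Literal["marimo", "manim"]


def format_match_sketch(chosen: Format, preferred: Optional[Format]) -> float:
    """Hypothetical stand-in for the real format_match reward component."""
    if preferred is None:
        # No stated preference: either format earns full credit.
        return 1.0
    # Assumption: a mismatch scores 0.0; the real component may be softer.
    return 1.0 if chosen == preferred else 0.0
```

Under these assumptions, the tasks that dropped `preferred_format` in this commit (for example the CLT and Bayes' theorem entries) would no longer penalize either format.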
tests/__init__.py ADDED
File without changes
tests/run_tests.sh ADDED
@@ -0,0 +1,76 @@
+ #!/usr/bin/env bash
+ # Run the test suite for explainer_env.
+ #
+ # Usage:
+ #   tests/run_tests.sh            # fast tests (models, task_bank, rewards, environment)
+ #   tests/run_tests.sh --all      # fast + client-server integration
+ #   tests/run_tests.sh --docker   # fast + docker build & test
+ #   tests/run_tests.sh --full     # everything
+
+ set -euo pipefail
+ cd "$(dirname "$0")/.."  # explainer_env/
+
+ RED='\033[0;31m'
+ GREEN='\033[0;32m'
+ YELLOW='\033[0;33m'
+ NC='\033[0m'
+
+ PASSED=0
+ FAILED=0
+ SKIPPED=0
+
+ run() {
+     local label="$1"; shift
+     printf "%-40s" "$label"
+     if output=$("$@" 2>&1); then
+         echo -e "${GREEN}OK${NC}"
+         PASSED=$((PASSED + 1))
+     else
+         echo -e "${RED}FAIL${NC}"
+         echo "$output" | tail -5
+         FAILED=$((FAILED + 1))
+     fi
+ }
+
+ skip() {
+     printf "%-40s" "$1"
+     echo -e "${YELLOW}SKIP${NC}"
+     SKIPPED=$((SKIPPED + 1))
+ }
+
+ echo "=== explainer_env test suite ==="
+ echo ""
+
+ # --- Fast tests (no server needed) ---
+ echo "--- Unit tests ---"
+ run "models" uv run python tests/test_models.py
+ run "task_bank" uv run python tests/test_task_bank.py
+ run "rewards" uv run python tests/test_rewards.py
+ run "environment" uv run python tests/test_environment.py
+ run "ruff lint" uvx ruff check .
+
+ # --- Integration tests (need server / docker) ---
+ MODE="${1:-}"
+
+ if [[ "$MODE" == "--all" || "$MODE" == "--full" ]]; then
+     echo ""
+     echo "--- Client-server integration ---"
+     run "client_server" uv run python tests/test_client_server.py
+ else
+     echo ""
+     skip "client_server (use --all)"
+ fi
+
+ if [[ "$MODE" == "--docker" || "$MODE" == "--full" ]]; then
+     echo ""
+     echo "--- Docker integration ---"
+     run "docker" uv run python tests/test_docker.py
+ else
+     skip "docker (use --docker or --full)"
+ fi
+
+ # --- Summary ---
+ echo ""
+ TOTAL=$((PASSED + FAILED + SKIPPED))
+ echo "=== ${PASSED} passed, ${FAILED} failed, ${SKIPPED} skipped (${TOTAL} total) ==="
+ [[ $FAILED -eq 0 ]] || exit 1
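The `run` helper in this script decides pass or fail purely from the exit status of the wrapped command, so a test script that prints `FAIL` but still exits 0 is counted as `OK`. The sketch below restates that logic in Python to make the dependency explicit. It is an illustration only: the function name `run` and the `counts` dictionary are stand-ins chosen here, not part of the repository.

```python
import subprocess
import sys


def run(label: str, cmd: list[str], counts: dict[str, int]) -> bool:
    """Run one command; count it as passed only if it exits with status 0."""
    # stderr=STDOUT merges both streams, like `2>&1` in the shell version.
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    ok = proc.returncode == 0
    counts["passed" if ok else "failed"] += 1
    print(f"{label:<40}{'OK' if ok else 'FAIL'}")
    if not ok:
        # Mirror `tail -5`: show only the last few lines of output.
        print("\n".join(proc.stdout.splitlines()[-5:]))
    return ok
```

A call such as `run("models", [sys.executable, "tests/test_models.py"], counts)` would then behave like the corresponding shell line, provided each test script exits non-zero when any check fails.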
tests/test_client_server.py ADDED
@@ -0,0 +1,96 @@
+ """Integration test: start server, connect client, run explore→generate.
+
+ Usage:
+     uv run python tests/test_client_server.py           # auto-starts server
+     uv run python tests/test_client_server.py --url http://localhost:8000
+ """
+
+ import argparse
+ import subprocess
+ import sys
+ import time
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ from client import ExplainerEnv
+ from models import ExplainerAction
+
+
+ def wait_for_server(url: str, timeout: int = 15):
+     import urllib.request
+
+     deadline = time.time() + timeout
+     while time.time() < deadline:
+         try:
+             urllib.request.urlopen(f"{url}/health", timeout=2)
+             return True
+         except Exception:
+             time.sleep(0.5)
+     return False
+
+
+ def run_tests(base_url: str):
+     client = ExplainerEnv(base_url=base_url)
+     with client.sync() as sc:
+         # --- reset ---
+         result = sc.reset()
+         obs = result.observation
+         assert obs.topic, "reset should return a topic"
+         assert obs.phase == "explore"
+         assert obs.explore_steps_left == 3
+         print(f" reset: topic={obs.topic!r}, phase={obs.phase}")
+
+         # --- explore ---
+         action = ExplainerAction(action_type="explore", query=obs.topic)
+         result = sc.step(action)
+         assert not result.done
+         assert result.observation.explore_steps_left == 2
+         print(f" explore: reward={result.reward:.3f}, steps_left={result.observation.explore_steps_left}")
+
+         # --- generate ---
+         action = ExplainerAction(
+             action_type="generate",
+             format="marimo",
+             code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n mo.md('hi')\n return\n",
+         )
+         result = sc.step(action)
+         assert result.done
+         assert isinstance(result.reward, (int, float))
+         print(f" generate: reward={result.reward:.3f}, done={result.done}")
+
+         # --- second episode ---
+         result2 = sc.reset()
+         assert result2.observation.topic
+         print(f" reset2: topic={result2.observation.topic!r}")
+
+     print("PASS: test_client_server (4/4)")
+
+
+ def main():
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--url", default=None)
+     args = parser.parse_args()
+
+     if args.url:
+         run_tests(args.url)
+     else:
+         proc = subprocess.Popen(
+             ["uv", "run", "server"],
+             stdout=subprocess.PIPE,
+             stderr=subprocess.PIPE,
+         )
+         try:
+             url = "http://localhost:8000"
+             if not wait_for_server(url):
+                 stderr = proc.stderr.read().decode() if proc.stderr else ""
+                 print(f"FAIL: server did not start\n{stderr}", file=sys.stderr)
+                 sys.exit(1)
+             run_tests(url)
+         finally:
+             proc.terminate()
+             proc.wait(timeout=5)
+
+
+ if __name__ == "__main__":
+     main()
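`wait_for_server` in this file is a poll-until-deadline loop: try the health check, sleep, and give up once the timeout passes. A generic version of that pattern might look like the sketch below. The name `wait_until` and its `check` callback are hypothetical, and `time.monotonic()` is swapped in for `time.time()` so that wall-clock adjustments cannot stretch or shorten the wait.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool], timeout: float = 15.0, interval: float = 0.5
) -> bool:
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if check():
                return True
        except Exception:
            # Treat any error as "not ready yet", as the original helper does.
            pass
        time.sleep(interval)
    return False
```

A health probe could then be passed in as a small function that calls `urllib.request.urlopen` on the `/health` URL and returns `True` when it succeeds.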
tests/test_docker.py ADDED
@@ -0,0 +1,113 @@
+ """Integration test: build + run Docker image, test via client.
+
+ Usage:
+     uv run python tests/test_docker.py                  # build + run + test
+     uv run python tests/test_docker.py --skip-build     # reuse existing image
+     uv run python tests/test_docker.py --image my:tag   # custom image name
+ """
+
+ import argparse
+ import subprocess
+ import sys
+ import time
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ IMAGE = "explainer-env:latest"
+ CONTAINER = "explainer-env-test"
+
+
+ def wait_for_server(url: str, timeout: int = 30):
+     import urllib.request
+
+     deadline = time.time() + timeout
+     while time.time() < deadline:
+         try:
+             urllib.request.urlopen(f"{url}/health", timeout=2)
+             return True
+         except Exception:
+             time.sleep(1)
+     return False
+
+
+ def docker_build(image: str):
+     env_dir = Path(__file__).resolve().parents[1]
+     print(f" building {image} from {env_dir}...")
+     result = subprocess.run(
+         ["docker", "build", "-t", image, "-f", "server/Dockerfile", "."],
+         cwd=str(env_dir),
+         capture_output=True,
+         text=True,
+     )
+     if result.returncode != 0:
+         print(f"FAIL: docker build\n{result.stderr[-1000:]}", file=sys.stderr)
+         sys.exit(1)
+     print(" build OK")
+
+
+ def docker_run(image: str, container: str):
+     # clean up stale container
+     subprocess.run(["docker", "rm", "-f", container], capture_output=True)
+     result = subprocess.run(
+         ["docker", "run", "-d", "--name", container, "-p", "8000:8000", image],
+         capture_output=True,
+         text=True,
+     )
+     if result.returncode != 0:
+         print(f"FAIL: docker run\n{result.stderr}", file=sys.stderr)
+         sys.exit(1)
+     print(f" container {container} started")
+
+
+ def docker_cleanup(container: str):
+     subprocess.run(["docker", "rm", "-f", container], capture_output=True)
+     print(f" container {container} removed")
+
+
+ def run_tests(base_url: str):
+     from client import ExplainerEnv
+     from models import ExplainerAction
+
+     client = ExplainerEnv(base_url=base_url)
+     with client.sync() as sc:
+         result = sc.reset()
+         assert result.observation.topic, "reset should return topic"
+         print(f" reset: topic={result.observation.topic!r}")
+
+         action = ExplainerAction(
+             format="marimo",
+             code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n return\n",
+         )
+         result = sc.step(action)
+         assert isinstance(result.reward, (int, float))
+         print(f" step: reward={result.reward:.3f}, done={result.done}")
+
+     print("PASS: test_docker (2/2)")
+
+
+ def main():
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--skip-build", action="store_true")
+     parser.add_argument("--image", default=IMAGE)
+     args = parser.parse_args()
+
+     if not args.skip_build:
+         docker_build(args.image)
+
+     docker_run(args.image, CONTAINER)
+     try:
+         url = "http://localhost:8000"
+         if not wait_for_server(url):
+             logs = subprocess.run(
+                 ["docker", "logs", CONTAINER], capture_output=True, text=True
+             )
+             print(f"FAIL: container didn't start\n{logs.stdout}\n{logs.stderr}", file=sys.stderr)
+             sys.exit(1)
+         run_tests(url)
+     finally:
+         docker_cleanup(CONTAINER)
+
+
+ if __name__ == "__main__":
+     main()
tests/test_environment.py ADDED
@@ -0,0 +1,163 @@
+ """Tests for ExplainerEnvironment — multi-step explore→generate lifecycle."""
+
+ import sys
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ from models import ExplainerAction, ExplainerObservation
+ from server.explainer_env_environment import ExplainerEnvironment
+
+
+ def test_reset_returns_observation():
+     env = ExplainerEnvironment()
+     obs = env.reset(seed=1)
+     assert isinstance(obs, ExplainerObservation)
+     assert obs.topic != ""
+     assert obs.phase == "explore"
+     assert obs.explore_steps_left == 3
+     assert obs.done is False
+
+
+ def test_reset_deterministic_with_seed():
+     env = ExplainerEnvironment()
+     obs1 = env.reset(seed=42)
+     obs2 = env.reset(seed=42)
+     assert obs1.topic == obs2.topic
+
+
+ def test_explore_step():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     action = ExplainerAction(action_type="explore", query="gradient descent optimization")
+     obs = env.step(action)
+     assert obs.done is False
+     assert obs.explore_steps_left == 2
+     assert isinstance(obs.reward, (int, float))
+     assert obs.reward >= 0.0
+
+
+ def test_explore_empty_query():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     action = ExplainerAction(action_type="explore", query="")
+     obs = env.step(action)
+     assert obs.reward == 0.0
+     assert "Empty query" in obs.feedback
+
+
+ def test_explore_max_steps():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     for i in range(3):
+         obs = env.step(ExplainerAction(action_type="explore", query=f"search {i}"))
+     assert obs.phase == "generate"
+     assert obs.explore_steps_left == 0
+
+
+ def test_explore_then_generate():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     # Explore
+     obs = env.step(ExplainerAction(action_type="explore", query="gradient descent"))
+     assert obs.done is False
+     assert obs.explored_context != ""
+     # Generate
+     obs = env.step(ExplainerAction(
+         action_type="generate",
+         format="marimo",
+         code="import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n return\n",
+     ))
+     assert obs.done is True
+     assert obs.phase == "done"
+     assert isinstance(obs.reward, (int, float))
+
+
+ def test_generate_without_explore_penalty():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     obs = env.step(ExplainerAction(
+         action_type="generate",
+         format="marimo",
+         code="x = 1",
+     ))
+     assert obs.done is True
+     assert "penalty" in obs.feedback.lower() or "without" in obs.feedback.lower()
+
+
+ def test_step_without_reset():
+     env = ExplainerEnvironment()
+     action = ExplainerAction(action_type="explore", query="test")
+     obs = env.step(action)
+     assert obs.done is True
+     assert obs.reward == -1.0
+
+
+ def test_generate_reward_in_metadata():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     env.step(ExplainerAction(action_type="explore", query="gradient descent"))
+     obs = env.step(ExplainerAction(
+         action_type="generate",
+         format="marimo",
+         code="x = 1",
+     ))
+     for key in ("code_valid", "code_runs", "coverage", "format_match", "structure"):
+         assert key in obs.metadata, f"missing {key} in metadata"
+     assert "explore_steps_used" in obs.metadata
+
+
+ def test_state_episode_id_changes():
+     env = ExplainerEnvironment()
+     env.reset()
+     eid1 = env.state.episode_id
+     env.reset()
+     eid2 = env.state.episode_id
+     assert eid1 != eid2
+
+
+ def test_step_increments_count():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     assert env.state.step_count == 0
+     env.step(ExplainerAction(action_type="explore", query="test"))
+     assert env.state.step_count == 1
+     env.step(ExplainerAction(action_type="generate", format="marimo", code="x=1"))
+     assert env.state.step_count == 2
+
+
+ def test_bad_code_does_not_crash():
+     env = ExplainerEnvironment()
+     env.reset(seed=1)
+     obs = env.step(ExplainerAction(
+         action_type="generate",
+         format="marimo",
+         code=")))syntax error(((",
+     ))
+     assert obs.done is True
+     assert "SYNTAX ERROR" in obs.feedback
+
+
+ if __name__ == "__main__":
+     tests = [
+         test_reset_returns_observation,
+         test_reset_deterministic_with_seed,
+         test_explore_step,
+         test_explore_empty_query,
+         test_explore_max_steps,
+         test_explore_then_generate,
+         test_generate_without_explore_penalty,
+         test_step_without_reset,
+         test_generate_reward_in_metadata,
+         test_state_episode_id_changes,
+         test_step_increments_count,
+         test_bad_code_does_not_crash,
+     ]
+     passed = 0
+     for t in tests:
+         try:
+             t()
+             passed += 1
+         except Exception as e:
+             print(f"FAIL: {t.__name__}: {e}")
+     # Exit non-zero on any failure so tests/run_tests.sh does not report OK.
+     if passed != len(tests):
+         print(f"FAIL: test_environment ({passed}/{len(tests)})")
+         sys.exit(1)
+     print(f"PASS: test_environment ({passed}/{len(tests)})")
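Taken together, the tests in this file pin down a small lifecycle: `reset` starts in the `explore` phase with three steps left, each explore action uses one, the third moves the episode to `generate`, a generate action is terminal, and stepping before `reset` ends the episode with reward -1.0. The toy model below restates only that bookkeeping. `ToyObs` and `ToyLifecycle` are hypothetical names; the real `ExplainerEnvironment` is not shown in this part of the diff, and its handling of cases the tests do not cover (stepping after `done`, exploring with no steps left) is assumed here.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_EXPLORE_STEPS = 3  # the tests assert explore_steps_left == 3 after reset


@dataclass(frozen=True)
class ToyObs:
    phase: str = "explore"
    explore_steps_left: int = MAX_EXPLORE_STEPS
    done: bool = False
    reward: float = 0.0


class ToyLifecycle:
    """Hypothetical stand-in for ExplainerEnvironment: phase bookkeeping only."""

    def __init__(self) -> None:
        self._obs: Optional[ToyObs] = None  # stays None until reset()

    def reset(self) -> ToyObs:
        self._obs = ToyObs()
        return self._obs

    def step(self, action_type: str) -> ToyObs:
        if self._obs is None:
            # Stepping before reset ends the episode with reward -1.0.
            return ToyObs(phase="done", explore_steps_left=0, done=True, reward=-1.0)
        if self._obs.done:
            # Assumption: a finished episode ignores further actions.
            return self._obs
        if action_type == "explore" and self._obs.explore_steps_left > 0:
            left = self._obs.explore_steps_left - 1
            # Using up the last explore step moves the episode to "generate".
            phase = "generate" if left == 0 else "explore"
            self._obs = replace(self._obs, explore_steps_left=left, phase=phase)
        elif action_type == "generate":
            # Generating is terminal, even if explore steps remain unused.
            self._obs = replace(self._obs, phase="done", done=True)
        # Assumption: any other action leaves the observation unchanged.
        return self._obs
```

Rewards for explore and generate actions are left at 0.0 on purpose, since the scoring lives in the `rewards` package rather than in the phase logic.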
tests/test_models.py ADDED
@@ -0,0 +1,77 @@
+ """Tests for Action/Observation model creation and validation."""
+
+ import sys
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ from models import ExplainerAction, ExplainerObservation
+
+
+ def test_action_explore():
+     a = ExplainerAction(action_type="explore", query="attention mechanism")
+     assert a.action_type == "explore"
+     assert a.query == "attention mechanism"
+     assert a.code == ""
+     assert a.format is None
+
+
+ def test_action_generate_marimo():
+     a = ExplainerAction(
+         action_type="generate",
+         format="marimo",
+         code="import marimo as mo\napp = mo.App()",
+     )
+     assert a.action_type == "generate"
+     assert a.format == "marimo"
+     assert a.narration == ""
+
+
+ def test_action_generate_manim():
+     a = ExplainerAction(
+         action_type="generate",
+         format="manim",
+         code="from manim import *\nclass S(Scene): pass",
+         narration="First we show the scene.",
+     )
+     assert a.format == "manim"
+     assert a.narration != ""
+
+
+ def test_observation_defaults():
+     obs = ExplainerObservation()
+     assert obs.topic == ""
+     assert obs.tier == "beginner"
+     assert obs.phase == "explore"
+     assert obs.explore_steps_left == 3
+     assert obs.done is False
+
+
+ def test_observation_full():
+     obs = ExplainerObservation(
+         topic="Gradient Descent",
+         content="GD iteratively updates params.",
+         tier="intermediate",
+         keywords="gradient,learning rate",
+         data_available=True,
+         phase="generate",
+         feedback="looks good",
+         search_results="paper1...",
+         explored_context="accumulated...",
+         explore_steps_left=1,
+         done=True,
+         reward=0.85,
+     )
+     assert obs.topic == "Gradient Descent"
+     assert obs.phase == "generate"
+     assert obs.explore_steps_left == 1
+     assert obs.reward == 0.85
+
+
+ if __name__ == "__main__":
+     test_action_explore()
+     test_action_generate_marimo()
+     test_action_generate_manim()
+     test_observation_defaults()
+     test_observation_full()
+     print("PASS: test_models (5/5)")
tests/test_rewards.py ADDED
@@ -0,0 +1,217 @@
+ """Tests for reward components: exploration and generation."""
+
+ import sys
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ from rewards.exploration import (
+     compute_explore_reward,
+     query_relevance,
+     research_breadth,
+     result_novelty,
+ )
+ from rewards.generation import (
+     compute_generate_reward,
+     context_usage,
+     format_match,
+     keyword_coverage,
+     marimo_structure,
+     narration_score,
+ )
+ from rewards.sandbox import ast_parses
+ from task_bank import ALL_TASKS
+
+ MARIMO_TASK = next(t for t in ALL_TASKS if t.topic == "Linear Regression")
+ MANIM_TASK = next(t for t in ALL_TASKS if t.topic == "Fourier Transform")
+
+
+ # --- Sandbox ---
+
+ def test_ast_parses():
+     assert ast_parses("x = 1") is True
+     assert ast_parses("not python!!!") is False
+
+
+ # --- Exploration rewards ---
+
+ def test_query_relevance():
+     assert query_relevance("linear regression MSE", "Linear Regression", "linear regression,MSE") > 0.5
+     assert query_relevance("", "Linear Regression", "x") == 0.0
+     assert query_relevance("cats", "Linear Regression", "linear regression") < 0.3
+
+
+ def test_result_novelty():
+     assert result_novelty("new information here", []) == 1.0
+     assert result_novelty("same words again", ["same words again"]) < 0.5
+     assert result_novelty("", []) == 0.0
+
+
+ def test_research_breadth():
+     assert research_breadth([], min_sources=2) == 0.0
+     assert research_breadth(["a"], min_sources=2) == 0.5
+     assert research_breadth(["a", "b"], min_sources=2) == 1.0
+
+
+ def test_explore_reward_integration():
+     reward, comp = compute_explore_reward(
+         query="linear regression least squares",
+         result_text="Linear regression minimizes squared error...",
+         topic="Linear Regression",
+         keywords_csv="linear regression,least squares,MSE",
+         task_content="Linear regression is a method for modeling the relationship between variables.",
+         accumulated_context=["first search result"],
+     )
+     assert reward > 0.1
+     assert "query_relevance" in comp
+     assert "result_novelty" in comp
+     assert "research_breadth" in comp
+     assert "content_sufficiency" in comp
+
+
+ # --- Generation rewards ---
+
+ def test_keyword_coverage():
+     assert keyword_coverage("linear regression MSE", "linear regression,MSE,gradient descent") > 0.5
+     assert keyword_coverage("nothing", "linear regression,MSE") == 0.0
+
+
+ def test_format_match():
+     assert format_match("marimo", MARIMO_TASK) == 1.0
+     assert format_match("manim", MARIMO_TASK) == 0.3
+     # Task with preferred_format=None should score 1.0 for any format
+     no_pref_task = next(t for t in ALL_TASKS if t.preferred_format is None)
+     assert format_match("marimo", no_pref_task) == 1.0
+     assert format_match("manim", no_pref_task) == 1.0
+
+
+ def test_narration_marimo():
+     assert narration_score("", "marimo") == 1.0
+
+
+ def test_narration_manim():
+     assert narration_score("", "manim") == 0.0
+     long_narration = (
+         "First we introduce the concept. Next we show the graph. "
+         "Then we animate the transformation step by step. "
+         "Finally we summarize the key takeaways from this scene."
+     )
+     assert narration_score(long_narration, "manim") > 0.5
+
+
+ def test_structure_marimo():
+     good = """import marimo as mo
+ app = mo.App()
+ @app.cell
+ def _():
+     mo.md("# Regression")
+     return
+ @app.cell
+ def _():
+     import matplotlib.pyplot as plt
+     return
+ @app.cell
+ def _():
+     slider = mo.ui.slider(0, 5)
+     return
+ """
+     assert marimo_structure(good, MARIMO_TASK) > 0.5
+
+
+ def test_context_usage():
+     assert context_usage("x = 1", []) == 0.5  # no context
+     assert context_usage(
+         "linear regression least squares gradient descent optimization",
+         ["linear regression least squares optimization methods"],
+     ) > 0.3
+
+
+ def test_generate_reward_garbage():
+     reward, comp = compute_generate_reward(
+         code="not python!!!",
+         fmt="marimo",
+         narration="",
+         task=MARIMO_TASK,
+         exec_success=False,
+         accumulated_context=[],
+     )
+     assert reward < 0.4
+     assert comp["code_valid"] == 0.0
+
+
+ def test_generate_reward_good():
+     code = """import marimo as mo
+ app = mo.App()
+ @app.cell
+ def _():
+     mo.md("# Linear Regression")
+     return
+ @app.cell
+ def _():
+     import numpy as np
+     import matplotlib.pyplot as plt
+     # linear regression least squares MSE gradient descent weights bias
+     X = np.linspace(0, 10, 50)
+     y = 2 * X + 1
+     return X, y
+ @app.cell
+ def _(X, y):
+     slider = mo.ui.slider(0, 5, value=2, label="Slope")
+     return
+ """
+     reward, comp = compute_generate_reward(
+         code=code,
+         fmt="marimo",
+         narration="",
+         task=MARIMO_TASK,
+         exec_success=True,
+         accumulated_context=["linear regression least squares"],
+     )
+     assert reward > 0.6
+     assert comp["code_valid"] == 1.0
+     assert comp["code_runs"] == 1.0
+
+
+ def test_generate_reward_wrong_format():
+     code = "import marimo as mo\napp = mo.App()\n@app.cell\ndef _():\n return\n"
+     r_right, _ = compute_generate_reward(code, "marimo", "", MARIMO_TASK, False, [])
+     r_wrong, _ = compute_generate_reward(code, "manim", "", MARIMO_TASK, False, [])
+     assert r_right > r_wrong
+
+
+ def test_reward_spread():
+     rewards = []
+     for task in ALL_TASKS[:5]:
+         for code in ["bad!!!", "x = 1", "import marimo as mo\napp = mo.App()"]:
+             r, _ = compute_generate_reward(code, "marimo", "", task, False, [])
+             rewards.append(r)
+     unique = set(round(r, 3) for r in rewards)
+     assert len(unique) >= 3
+
+
+ if __name__ == "__main__":
+     tests = [
+         test_ast_parses,
+         test_query_relevance,
+         test_result_novelty,
+         test_research_breadth,
+         test_explore_reward_integration,
+         test_keyword_coverage,
+         test_format_match,
+         test_narration_marimo,
+         test_narration_manim,
+         test_structure_marimo,
+         test_context_usage,
+         test_generate_reward_garbage,
+         test_generate_reward_good,
+         test_generate_reward_wrong_format,
+         test_reward_spread,
+     ]
+     passed = 0
+     for t in tests:
+         try:
+             t()
+             passed += 1
+         except Exception as e:
+             print(f"FAIL: {t.__name__}: {e}")
+     print(f"{'PASS' if passed == len(tests) else 'FAIL'}: test_rewards ({passed}/{len(tests)})")
tests/test_task_bank.py ADDED
@@ -0,0 +1,58 @@
+ """Tests for task bank integrity."""
+
+ import sys
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+ from task_bank import (
+     ALL_TASKS,
+     ALGORITHMS,
+     EASY_TASKS,
+     HARD_TASKS,
+     MATH_TOPICS,
+     MEDIUM_TASKS,
+     ML_CONCEPTS,
+     STATISTICS_TASKS,
+     Task,
+ )
+
+
+ def test_task_counts():
+     assert len(ML_CONCEPTS) >= 5, f"ML_CONCEPTS has {len(ML_CONCEPTS)} (need >=5)"
+     assert len(MATH_TOPICS) >= 5, f"MATH_TOPICS has {len(MATH_TOPICS)} (need >=5)"
+     assert len(ALGORITHMS) >= 3, f"ALGORITHMS has {len(ALGORITHMS)} (need >=3)"
+     assert len(STATISTICS_TASKS) >= 2, f"STATISTICS_TASKS has {len(STATISTICS_TASKS)} (need >=2)"
+     assert len(ALL_TASKS) == len(ML_CONCEPTS) + len(MATH_TOPICS) + len(ALGORITHMS) + len(STATISTICS_TASKS)
+
+
+ def test_difficulty_partition():
+     assert len(EASY_TASKS) + len(MEDIUM_TASKS) + len(HARD_TASKS) == len(ALL_TASKS)
+     assert len(EASY_TASKS) > 0
+     assert len(MEDIUM_TASKS) > 0
+     assert len(HARD_TASKS) > 0
+
+
+ def test_task_fields():
+     for t in ALL_TASKS:
+         assert isinstance(t, Task)
+         assert t.topic, f"empty topic: {t}"
+         assert t.content, f"empty content: {t}"
+         assert t.tier in ("beginner", "intermediate", "advanced"), f"bad tier: {t.tier}"
+         assert t.keywords, f"empty keywords: {t.topic}"
+         assert t.preferred_format in ("marimo", "manim", None), f"bad format: {t.preferred_format}"
+         assert t.difficulty in ("easy", "medium", "hard"), f"bad difficulty: {t.difficulty}"
+
+
+ def test_both_formats_present():
+     formats = {t.preferred_format for t in ALL_TASKS}
+     assert "marimo" in formats, "no marimo tasks"
+     assert "manim" in formats, "no manim tasks"
+
+
+ if __name__ == "__main__":
+     test_task_counts()
+     test_difficulty_partition()
+     test_task_fields()
+     test_both_formats_present()
+     print("PASS: test_task_bank (4/4)")
uv.lock CHANGED
@@ -544,14 +544,14 @@ wheels = [
544
 
545
  [[package]]
546
  name = "click"
547
- version = "8.3.3"
548
  source = { registry = "https://pypi.org/simple" }
549
  dependencies = [
550
  { name = "colorama", marker = "sys_platform == 'win32'" },
551
  ]
552
- sdist = { url = "https://files.pythonhosted.org/packages/bb/63/f9e1ea081ce35720d8b92acde70daaedace594dc93b693c869e0d5910718/click-8.3.3.tar.gz", hash = "sha256:398329ad4837b2ff7cbe1dd166a4c0f8900c3ca3a218de04466f38f6497f18a2", size = 328061, upload-time = "2026-04-22T15:11:27.506Z" }
553
  wheels = [
554
- { url = "https://files.pythonhosted.org/packages/ae/44/c1221527f6a71a01ec6fbad7fa78f1d50dfa02217385cf0fa3eec7087d59/click-8.3.3-py3-none-any.whl", hash = "sha256:a2bf429bb3033c89fa4936ffb35d5cb471e3719e1f3c8a7c3fff0b8314305613", size = 110502, upload-time = "2026-04-22T15:11:25.044Z" },
555
  ]
556
 
557
  [[package]]
@@ -696,62 +696,62 @@ toml = [
696
 
697
  [[package]]
698
  name = "cryptography"
699
- version = "46.0.7"
700
  source = { registry = "https://pypi.org/simple" }
701
  dependencies = [
702
  { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
703
  { name = "typing-extensions", marker = "python_full_version < '3.11'" },
704
  ]
705
- sdist = { url = "https://files.pythonhosted.org/packages/47/93/ac8f3d5ff04d54bc814e961a43ae5b0b146154c89c61b47bb07557679b18/cryptography-46.0.7.tar.gz", hash = "sha256:e4cfd68c5f3e0bfdad0d38e023239b96a2fe84146481852dffbcca442c245aa5", size = 750652, upload-time = "2026-04-08T01:57:54.692Z" }
706
- wheels = [
707
- { url = "https://files.pythonhosted.org/packages/0b/5d/4a8f770695d73be252331e60e526291e3df0c9b27556a90a6b47bccca4c2/cryptography-46.0.7-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:ea42cbe97209df307fdc3b155f1b6fa2577c0defa8f1f7d3be7d31d189108ad4", size = 7179869, upload-time = "2026-04-08T01:56:17.157Z" },
708
- { url = "https://files.pythonhosted.org/packages/5f/45/6d80dc379b0bbc1f9d1e429f42e4cb9e1d319c7a8201beffd967c516ea01/cryptography-46.0.7-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b36a4695e29fe69215d75960b22577197aca3f7a25b9cf9d165dcfe9d80bc325", size = 4275492, upload-time = "2026-04-08T01:56:19.36Z" },
709
- { url = "https://files.pythonhosted.org/packages/4a/9a/1765afe9f572e239c3469f2cb429f3ba7b31878c893b246b4b2994ffe2fe/cryptography-46.0.7-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5ad9ef796328c5e3c4ceed237a183f5d41d21150f972455a9d926593a1dcb308", size = 4426670, upload-time = "2026-04-08T01:56:21.415Z" },
710
- { url = "https://files.pythonhosted.org/packages/8f/3e/af9246aaf23cd4ee060699adab1e47ced3f5f7e7a8ffdd339f817b446462/cryptography-46.0.7-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:73510b83623e080a2c35c62c15298096e2a5dc8d51c3b4e1740211839d0dea77", size = 4280275, upload-time = "2026-04-08T01:56:23.539Z" },
711
- { url = "https://files.pythonhosted.org/packages/0f/54/6bbbfc5efe86f9d71041827b793c24811a017c6ac0fd12883e4caa86b8ed/cryptography-46.0.7-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cbd5fb06b62bd0721e1170273d3f4d5a277044c47ca27ee257025146c34cbdd1", size = 4928402, upload-time = "2026-04-08T01:56:25.624Z" },
712
- { url = "https://files.pythonhosted.org/packages/2d/cf/054b9d8220f81509939599c8bdbc0c408dbd2bdd41688616a20731371fe0/cryptography-46.0.7-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:420b1e4109cc95f0e5700eed79908cef9268265c773d3a66f7af1eef53d409ef", size = 4459985, upload-time = "2026-04-08T01:56:27.309Z" },
713
- { url = "https://files.pythonhosted.org/packages/f9/46/4e4e9c6040fb01c7467d47217d2f882daddeb8828f7df800cb806d8a2288/cryptography-46.0.7-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:24402210aa54baae71d99441d15bb5a1919c195398a87b563df84468160a65de", size = 3990652, upload-time = "2026-04-08T01:56:29.095Z" },
714
- { url = "https://files.pythonhosted.org/packages/36/5f/313586c3be5a2fbe87e4c9a254207b860155a8e1f3cca99f9910008e7d08/cryptography-46.0.7-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:8a469028a86f12eb7d2fe97162d0634026d92a21f3ae0ac87ed1c4a447886c83", size = 4279805, upload-time = "2026-04-08T01:56:30.928Z" },
715
- { url = "https://files.pythonhosted.org/packages/69/33/60dfc4595f334a2082749673386a4d05e4f0cf4df8248e63b2c3437585f2/cryptography-46.0.7-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:9694078c5d44c157ef3162e3bf3946510b857df5a3955458381d1c7cfc143ddb", size = 4892883, upload-time = "2026-04-08T01:56:32.614Z" },
716
- { url = "https://files.pythonhosted.org/packages/c7/0b/333ddab4270c4f5b972f980adef4faa66951a4aaf646ca067af597f15563/cryptography-46.0.7-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:42a1e5f98abb6391717978baf9f90dc28a743b7d9be7f0751a6f56a75d14065b", size = 4459756, upload-time = "2026-04-08T01:56:34.306Z" },
717
- { url = "https://files.pythonhosted.org/packages/d2/14/633913398b43b75f1234834170947957c6b623d1701ffc7a9600da907e89/cryptography-46.0.7-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91bbcb08347344f810cbe49065914fe048949648f6bd5c2519f34619142bbe85", size = 4410244, upload-time = "2026-04-08T01:56:35.977Z" },
718
- { url = "https://files.pythonhosted.org/packages/10/f2/19ceb3b3dc14009373432af0c13f46aa08e3ce334ec6eff13492e1812ccd/cryptography-46.0.7-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5d1c02a14ceb9148cc7816249f64f623fbfee39e8c03b3650d842ad3f34d637e", size = 4674868, upload-time = "2026-04-08T01:56:38.034Z" },
719
- { url = "https://files.pythonhosted.org/packages/1a/bb/a5c213c19ee94b15dfccc48f363738633a493812687f5567addbcbba9f6f/cryptography-46.0.7-cp311-abi3-win32.whl", hash = "sha256:d23c8ca48e44ee015cd0a54aeccdf9f09004eba9fc96f38c911011d9ff1bd457", size = 3026504, upload-time = "2026-04-08T01:56:39.666Z" },
720
- { url = "https://files.pythonhosted.org/packages/2b/02/7788f9fefa1d060ca68717c3901ae7fffa21ee087a90b7f23c7a603c32ae/cryptography-46.0.7-cp311-abi3-win_amd64.whl", hash = "sha256:397655da831414d165029da9bc483bed2fe0e75dde6a1523ec2fe63f3c46046b", size = 3488363, upload-time = "2026-04-08T01:56:41.893Z" },
721
- { url = "https://files.pythonhosted.org/packages/7b/56/15619b210e689c5403bb0540e4cb7dbf11a6bf42e483b7644e471a2812b3/cryptography-46.0.7-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:d151173275e1728cf7839aaa80c34fe550c04ddb27b34f48c232193df8db5842", size = 7119671, upload-time = "2026-04-08T01:56:44Z" },
722
- { url = "https://files.pythonhosted.org/packages/74/66/e3ce040721b0b5599e175ba91ab08884c75928fbeb74597dd10ef13505d2/cryptography-46.0.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:db0f493b9181c7820c8134437eb8b0b4792085d37dbb24da050476ccb664e59c", size = 4268551, upload-time = "2026-04-08T01:56:46.071Z" },
723
- { url = "https://files.pythonhosted.org/packages/03/11/5e395f961d6868269835dee1bafec6a1ac176505a167f68b7d8818431068/cryptography-46.0.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ebd6daf519b9f189f85c479427bbd6e9c9037862cf8fe89ee35503bd209ed902", size = 4408887, upload-time = "2026-04-08T01:56:47.718Z" },
724
- { url = "https://files.pythonhosted.org/packages/40/53/8ed1cf4c3b9c8e611e7122fb56f1c32d09e1fff0f1d77e78d9ff7c82653e/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:b7b412817be92117ec5ed95f880defe9cf18a832e8cafacf0a22337dc1981b4d", size = 4271354, upload-time = "2026-04-08T01:56:49.312Z" },
725
- { url = "https://files.pythonhosted.org/packages/50/46/cf71e26025c2e767c5609162c866a78e8a2915bbcfa408b7ca495c6140c4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:fbfd0e5f273877695cb93baf14b185f4878128b250cc9f8e617ea0c025dfb022", size = 4905845, upload-time = "2026-04-08T01:56:50.916Z" },
726
- { url = "https://files.pythonhosted.org/packages/c0/ea/01276740375bac6249d0a971ebdf6b4dc9ead0ee0a34ef3b5a88c1a9b0d4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:ffca7aa1d00cf7d6469b988c581598f2259e46215e0140af408966a24cf086ce", size = 4444641, upload-time = "2026-04-08T01:56:52.882Z" },
727
- { url = "https://files.pythonhosted.org/packages/3d/4c/7d258f169ae71230f25d9f3d06caabcff8c3baf0978e2b7d65e0acac3827/cryptography-46.0.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:60627cf07e0d9274338521205899337c5d18249db56865f943cbe753aa96f40f", size = 3967749, upload-time = "2026-04-08T01:56:54.597Z" },
728
- { url = "https://files.pythonhosted.org/packages/b5/2a/2ea0767cad19e71b3530e4cad9605d0b5e338b6a1e72c37c9c1ceb86c333/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:80406c3065e2c55d7f49a9550fe0c49b3f12e5bfff5dedb727e319e1afb9bf99", size = 4270942, upload-time = "2026-04-08T01:56:56.416Z" },
729
- { url = "https://files.pythonhosted.org/packages/41/3d/fe14df95a83319af25717677e956567a105bb6ab25641acaa093db79975d/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:c5b1ccd1239f48b7151a65bc6dd54bcfcc15e028c8ac126d3fada09db0e07ef1", size = 4871079, upload-time = "2026-04-08T01:56:58.31Z" },
730
- { url = "https://files.pythonhosted.org/packages/9c/59/4a479e0f36f8f378d397f4eab4c850b4ffb79a2f0d58704b8fa0703ddc11/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:d5f7520159cd9c2154eb61eb67548ca05c5774d39e9c2c4339fd793fe7d097b2", size = 4443999, upload-time = "2026-04-08T01:57:00.508Z" },
731
- { url = "https://files.pythonhosted.org/packages/28/17/b59a741645822ec6d04732b43c5d35e4ef58be7bfa84a81e5ae6f05a1d33/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fcd8eac50d9138c1d7fc53a653ba60a2bee81a505f9f8850b6b2888555a45d0e", size = 4399191, upload-time = "2026-04-08T01:57:02.654Z" },
732
- { url = "https://files.pythonhosted.org/packages/59/6a/bb2e166d6d0e0955f1e9ff70f10ec4b2824c9cfcdb4da772c7dd69cc7d80/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:65814c60f8cc400c63131584e3e1fad01235edba2614b61fbfbfa954082db0ee", size = 4655782, upload-time = "2026-04-08T01:57:04.592Z" },
733
- { url = "https://files.pythonhosted.org/packages/95/b6/3da51d48415bcb63b00dc17c2eff3a651b7c4fed484308d0f19b30e8cb2c/cryptography-46.0.7-cp314-cp314t-win32.whl", hash = "sha256:fdd1736fed309b4300346f88f74cd120c27c56852c3838cab416e7a166f67298", size = 3002227, upload-time = "2026-04-08T01:57:06.91Z" },
734
- { url = "https://files.pythonhosted.org/packages/32/a8/9f0e4ed57ec9cebe506e58db11ae472972ecb0c659e4d52bbaee80ca340a/cryptography-46.0.7-cp314-cp314t-win_amd64.whl", hash = "sha256:e06acf3c99be55aa3b516397fe42f5855597f430add9c17fa46bf2e0fb34c9bb", size = 3475332, upload-time = "2026-04-08T01:57:08.807Z" },
735
- { url = "https://files.pythonhosted.org/packages/a7/7f/cd42fc3614386bc0c12f0cb3c4ae1fc2bbca5c9662dfed031514911d513d/cryptography-46.0.7-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:462ad5cb1c148a22b2e3bcc5ad52504dff325d17daf5df8d88c17dda1f75f2a4", size = 7165618, upload-time = "2026-04-08T01:57:10.645Z" },
736
- { url = "https://files.pythonhosted.org/packages/a5/d0/36a49f0262d2319139d2829f773f1b97ef8aef7f97e6e5bd21455e5a8fb5/cryptography-46.0.7-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:84d4cced91f0f159a7ddacad249cc077e63195c36aac40b4150e7a57e84fffe7", size = 4270628, upload-time = "2026-04-08T01:57:12.885Z" },
737
- { url = "https://files.pythonhosted.org/packages/8a/6c/1a42450f464dda6ffbe578a911f773e54dd48c10f9895a23a7e88b3e7db5/cryptography-46.0.7-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:128c5edfe5e5938b86b03941e94fac9ee793a94452ad1365c9fc3f4f62216832", size = 4415405, upload-time = "2026-04-08T01:57:14.923Z" },
738
- { url = "https://files.pythonhosted.org/packages/9a/92/4ed714dbe93a066dc1f4b4581a464d2d7dbec9046f7c8b7016f5286329e2/cryptography-46.0.7-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:5e51be372b26ef4ba3de3c167cd3d1022934bc838ae9eaad7e644986d2a3d163", size = 4272715, upload-time = "2026-04-08T01:57:16.638Z" },
739
- { url = "https://files.pythonhosted.org/packages/b7/e6/a26b84096eddd51494bba19111f8fffe976f6a09f132706f8f1bf03f51f7/cryptography-46.0.7-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cdf1a610ef82abb396451862739e3fc93b071c844399e15b90726ef7470eeaf2", size = 4918400, upload-time = "2026-04-08T01:57:19.021Z" },
740
- { url = "https://files.pythonhosted.org/packages/c7/08/ffd537b605568a148543ac3c2b239708ae0bd635064bab41359252ef88ed/cryptography-46.0.7-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:1d25aee46d0c6f1a501adcddb2d2fee4b979381346a78558ed13e50aa8a59067", size = 4450634, upload-time = "2026-04-08T01:57:21.185Z" },
741
- { url = "https://files.pythonhosted.org/packages/16/01/0cd51dd86ab5b9befe0d031e276510491976c3a80e9f6e31810cce46c4ad/cryptography-46.0.7-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:cdfbe22376065ffcf8be74dc9a909f032df19bc58a699456a21712d6e5eabfd0", size = 3985233, upload-time = "2026-04-08T01:57:22.862Z" },
742
- { url = "https://files.pythonhosted.org/packages/92/49/819d6ed3a7d9349c2939f81b500a738cb733ab62fbecdbc1e38e83d45e12/cryptography-46.0.7-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:abad9dac36cbf55de6eb49badd4016806b3165d396f64925bf2999bcb67837ba", size = 4271955, upload-time = "2026-04-08T01:57:24.814Z" },
743
- { url = "https://files.pythonhosted.org/packages/80/07/ad9b3c56ebb95ed2473d46df0847357e01583f4c52a85754d1a55e29e4d0/cryptography-46.0.7-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:935ce7e3cfdb53e3536119a542b839bb94ec1ad081013e9ab9b7cfd478b05006", size = 4879888, upload-time = "2026-04-08T01:57:26.88Z" },
744
- { url = "https://files.pythonhosted.org/packages/b8/c7/201d3d58f30c4c2bdbe9b03844c291feb77c20511cc3586daf7edc12a47b/cryptography-46.0.7-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:35719dc79d4730d30f1c2b6474bd6acda36ae2dfae1e3c16f2051f215df33ce0", size = 4449961, upload-time = "2026-04-08T01:57:29.068Z" },
745
- { url = "https://files.pythonhosted.org/packages/a5/ef/649750cbf96f3033c3c976e112265c33906f8e462291a33d77f90356548c/cryptography-46.0.7-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7bbc6ccf49d05ac8f7d7b5e2e2c33830d4fe2061def88210a126d130d7f71a85", size = 4401696, upload-time = "2026-04-08T01:57:31.029Z" },
746
- { url = "https://files.pythonhosted.org/packages/41/52/a8908dcb1a389a459a29008c29966c1d552588d4ae6d43f3a1a4512e0ebe/cryptography-46.0.7-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a1529d614f44b863a7b480c6d000fe93b59acee9c82ffa027cfadc77521a9f5e", size = 4664256, upload-time = "2026-04-08T01:57:33.144Z" },
747
- { url = "https://files.pythonhosted.org/packages/4b/fa/f0ab06238e899cc3fb332623f337a7364f36f4bb3f2534c2bb95a35b132c/cryptography-46.0.7-cp38-abi3-win32.whl", hash = "sha256:f247c8c1a1fb45e12586afbb436ef21ff1e80670b2861a90353d9b025583d246", size = 3013001, upload-time = "2026-04-08T01:57:34.933Z" },
748
- { url = "https://files.pythonhosted.org/packages/d2/f1/00ce3bde3ca542d1acd8f8cfa38e446840945aa6363f9b74746394b14127/cryptography-46.0.7-cp38-abi3-win_amd64.whl", hash = "sha256:506c4ff91eff4f82bdac7633318a526b1d1309fc07ca76a3ad182cb5b686d6d3", size = 3472985, upload-time = "2026-04-08T01:57:36.714Z" },
749
- { url = "https://files.pythonhosted.org/packages/63/0c/dca8abb64e7ca4f6b2978769f6fea5ad06686a190cec381f0a796fdcaaba/cryptography-46.0.7-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:fc9ab8856ae6cf7c9358430e49b368f3108f050031442eaeb6b9d87e4dcf4e4f", size = 3476879, upload-time = "2026-04-08T01:57:38.664Z" },
750
- { url = "https://files.pythonhosted.org/packages/3a/ea/075aac6a84b7c271578d81a2f9968acb6e273002408729f2ddff517fed4a/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:d3b99c535a9de0adced13d159c5a9cf65c325601aa30f4be08afd680643e9c15", size = 4219700, upload-time = "2026-04-08T01:57:40.625Z" },
751
- { url = "https://files.pythonhosted.org/packages/6c/7b/1c55db7242b5e5612b29fc7a630e91ee7a6e3c8e7bf5406d22e206875fbd/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:d02c738dacda7dc2a74d1b2b3177042009d5cab7c7079db74afc19e56ca1b455", size = 4385982, upload-time = "2026-04-08T01:57:42.725Z" },
752
- { url = "https://files.pythonhosted.org/packages/cb/da/9870eec4b69c63ef5925bf7d8342b7e13bc2ee3d47791461c4e49ca212f4/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:04959522f938493042d595a736e7dbdff6eb6cc2339c11465b3ff89343b65f65", size = 4219115, upload-time = "2026-04-08T01:57:44.939Z" },
753
- { url = "https://files.pythonhosted.org/packages/f4/72/05aa5832b82dd341969e9a734d1812a6aadb088d9eb6f0430fc337cc5a8f/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:3986ac1dee6def53797289999eabe84798ad7817f3e97779b5061a95b0ee4968", size = 4385479, upload-time = "2026-04-08T01:57:46.86Z" },
754
- { url = "https://files.pythonhosted.org/packages/20/2a/1b016902351a523aa2bd446b50a5bc1175d7a7d1cf90fe2ef904f9b84ebc/cryptography-46.0.7-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:258514877e15963bd43b558917bc9f54cf7cf866c38aa576ebf47a77ddbc43a4", size = 3412829, upload-time = "2026-04-08T01:57:48.874Z" },
755
  ]
756
 
757
  [[package]]
@@ -1136,7 +1136,7 @@ wheels = [
1136
 
1137
  [[package]]
1138
  name = "huggingface-hub"
1139
- version = "1.11.0"
1140
  source = { registry = "https://pypi.org/simple" }
1141
  dependencies = [
1142
  { name = "filelock" },
@@ -1149,9 +1149,9 @@ dependencies = [
1149
  { name = "typer" },
1150
  { name = "typing-extensions" },
1151
  ]
1152
- sdist = { url = "https://files.pythonhosted.org/packages/dc/89/e7aa12d8a6b9259bed10671abb25ae6fa437c0f88a86ecbf59617bae7759/huggingface_hub-1.11.0.tar.gz", hash = "sha256:15fb3713c7f9cdff7b808a94fd91664f661ab142796bb48c9cd9493e8d166278", size = 761749, upload-time = "2026-04-16T13:07:39.73Z" }
1153
  wheels = [
1154
- { url = "https://files.pythonhosted.org/packages/37/02/4f3f8997d1ea7fe0146b343e5e14bd065fa87af790d07e5576d31b31cc18/huggingface_hub-1.11.0-py3-none-any.whl", hash = "sha256:42a6de0afbfeb5e022222d36398f029679db4eb4778801aafda32257ae9131ab", size = 645499, upload-time = "2026-04-16T13:07:37.716Z" },
1155
  ]
1156
 
1157
  [[package]]
@@ -1751,7 +1751,7 @@ wheels = [
1751
 
1752
  [[package]]
1753
  name = "marimo"
1754
- version = "0.23.2"
1755
  source = { registry = "https://pypi.org/simple" }
1756
  dependencies = [
1757
  { name = "click" },
@@ -1774,9 +1774,9 @@ dependencies = [
1774
  { name = "uvicorn" },
1775
  { name = "websockets" },
1776
  ]
1777
- sdist = { url = "https://files.pythonhosted.org/packages/31/7f/8490043913942e48e8f21b88ad49fe736c2a236ff2a393c8ae67724105f2/marimo-0.23.2.tar.gz", hash = "sha256:25d810b4864d534c1cf33eb3020320e8ef7319e9809b3fb6ae5644135f5a660a", size = 38383208, upload-time = "2026-04-20T21:49:06.137Z" }
1778
  wheels = [
1779
- { url = "https://files.pythonhosted.org/packages/b2/23/c4d34eb5da111b0f31d2feeddf7ee619ce4883999dbf360ed781b327dc0f/marimo-0.23.2-py3-none-any.whl", hash = "sha256:b1fbf5684fbb20d987d9ce6f569fd32789693ff4fd59a5678a0598b680cc41ba", size = 38801200, upload-time = "2026-04-20T21:48:58.165Z" },
1780
  ]
1781
 
1782
  [[package]]
@@ -2311,10 +2311,13 @@ name = "openenv-explainer-env"
2311
  version = "0.1.0"
2312
  source = { editable = "." }
2313
  dependencies = [
 
 
2314
  { name = "manim", version = "0.19.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
2315
  { name = "manim", version = "0.20.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
2316
  { name = "marimo" },
2317
  { name = "openenv-core", extra = ["core"] },
 
2318
  ]
2319
 
2320
  [package.optional-dependencies]
@@ -2325,25 +2328,31 @@ dev = [
2325
 
2326
  [package.metadata]
2327
  requires-dist = [
 
 
2328
  { name = "manim", specifier = ">=0.18.0" },
2329
  { name = "marimo", specifier = ">=0.10.0" },
2330
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
2331
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
2332
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
 
2333
  ]
2334
  provides-extras = ["dev"]
2335
 
 
 
 
2336
  [[package]]
2337
  name = "opentelemetry-api"
2338
- version = "1.41.0"
2339
  source = { registry = "https://pypi.org/simple" }
2340
  dependencies = [
2341
  { name = "importlib-metadata" },
2342
  { name = "typing-extensions" },
2343
  ]
2344
- sdist = { url = "https://files.pythonhosted.org/packages/47/8e/3778a7e87801d994869a9396b9fc2a289e5f9be91ff54a27d41eace494b0/opentelemetry_api-1.41.0.tar.gz", hash = "sha256:9421d911326ec12dee8bc933f7839090cad7a3f13fcfb0f9e82f8174dc003c09", size = 71416, upload-time = "2026-04-09T14:38:34.544Z" }
2345
  wheels = [
2346
- { url = "https://files.pythonhosted.org/packages/58/ee/99ab786653b3bda9c37ade7e24a7b607a1b1f696063172768417539d876d/opentelemetry_api-1.41.0-py3-none-any.whl", hash = "sha256:0e77c806e6a89c9e4f8d372034622f3e1418a11bdbe1c80a50b3d3397ad0fa4f", size = 69007, upload-time = "2026-04-09T14:38:11.833Z" },
2347
  ]
2348
 
2349
  [[package]]
@@ -2429,11 +2438,11 @@ wheels = [
2429
 
2430
  [[package]]
2431
  name = "packaging"
2432
- version = "26.1"
2433
  source = { registry = "https://pypi.org/simple" }
2434
- sdist = { url = "https://files.pythonhosted.org/packages/df/de/0d2b39fb4af88a0258f3bac87dfcbb48e73fbdea4a2ed0e2213f9a4c2f9a/packaging-26.1.tar.gz", hash = "sha256:f042152b681c4bfac5cae2742a55e103d27ab2ec0f3d88037136b6bfe7c9c5de", size = 215519, upload-time = "2026-04-14T21:12:49.362Z" }
2435
  wheels = [
2436
- { url = "https://files.pythonhosted.org/packages/7a/c2/920ef838e2f0028c8262f16101ec09ebd5969864e5a64c4c05fad0617c56/packaging-26.1-py3-none-any.whl", hash = "sha256:5d9c0669c6285e491e0ced2eee587eaf67b670d94a19e94e3984a481aba6802f", size = 95831, upload-time = "2026-04-14T21:12:47.56Z" },
  ]
 
  [[package]]
@@ -3087,7 +3096,7 @@ name = "pyobjc-framework-cocoa"
  version = "12.1"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
- { name = "pyobjc-core" },
  ]
  sdist = { url = "https://files.pythonhosted.org/packages/02/a3/16ca9a15e77c061a9250afbae2eae26f2e1579eb8ca9462ae2d2c71e1169/pyobjc_framework_cocoa-12.1.tar.gz", hash = "sha256:5556c87db95711b985d5efdaaf01c917ddd41d148b1e52a0c66b1a2e2c5c1640", size = 2772191, upload-time = "2025-11-14T10:13:02.069Z" }
  wheels = [
@@ -3810,6 +3819,15 @@ wheels = [
  { url = "https://files.pythonhosted.org/packages/48/2c/6c9bb53db56c8a12a736d2158a8b842a5993b96daabc29d90a098e840280/svgelements-1.9.6-py2.py3-none-any.whl", hash = "sha256:8a5cf2cc066d98e713d5b875b1d6e5eeb9b92e855e835ebd7caab2713ae1dcad", size = 137856, upload-time = "2023-08-17T02:01:48.76Z" },
  ]
 
  [[package]]
  name = "tomli"
  version = "2.4.1"
@@ -3932,11 +3950,11 @@ wheels = [
 
  [[package]]
  name = "tzdata"
- version = "2026.1"
  source = { registry = "https://pypi.org/simple" }
- sdist = { url = "https://files.pythonhosted.org/packages/19/f5/cd531b2d15a671a40c0f66cf06bc3570a12cd56eef98960068ebbad1bf5a/tzdata-2026.1.tar.gz", hash = "sha256:67658a1903c75917309e753fdc349ac0efd8c27db7a0cb406a25be4840f87f98", size = 197639, upload-time = "2026-04-03T11:25:22.002Z" }
  wheels = [
- { url = "https://files.pythonhosted.org/packages/b0/70/d460bd685a170790ec89317e9bd33047988e4bce507b831f5db771e142de/tzdata-2026.1-py2.py3-none-any.whl", hash = "sha256:4b1d2be7ac37ceafd7327b961aa3a54e467efbdb563a23655fbfe0d39cfc42a9", size = 348952, upload-time = "2026-04-03T11:25:20.313Z" },
  ]
 
  [[package]]
@@ -4174,6 +4192,20 @@ wheels = [
  { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
  ]
 
  [[package]]
  name = "zipp"
  version = "3.23.1"
 
 
  [[package]]
  name = "click"
+ version = "8.3.2"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
  { name = "colorama", marker = "sys_platform == 'win32'" },
  ]
+ sdist = { url = "https://files.pythonhosted.org/packages/57/75/31212c6bf2503fdf920d87fee5d7a86a2e3bcf444984126f13d8e4016804/click-8.3.2.tar.gz", hash = "sha256:14162b8b3b3550a7d479eafa77dfd3c38d9dc8951f6f69c78913a8f9a7540fd5", size = 302856, upload-time = "2026-04-03T19:14:45.118Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/e4/20/71885d8b97d4f3dde17b1fdb92dbd4908b00541c5a3379787137285f602e/click-8.3.2-py3-none-any.whl", hash = "sha256:1924d2c27c5653561cd2cae4548d1406039cb79b858b747cfea24924bbc1616d", size = 108379, upload-time = "2026-04-03T19:14:43.505Z" },
  ]
 
  [[package]]
 
 
  [[package]]
  name = "cryptography"
+ version = "47.0.0"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
  { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
  { name = "typing-extensions", marker = "python_full_version < '3.11'" },
  ]
+ sdist = { url = "https://files.pythonhosted.org/packages/ef/b2/7ffa7fe8207a8c42147ffe70c3e360b228160c1d85dc3faff16aaa3244c0/cryptography-47.0.0.tar.gz", hash = "sha256:9f8e55fe4e63613a5e1cc5819030f27b97742d720203a087802ce4ce9ceb52bb", size = 830863, upload-time = "2026-04-24T19:54:57.056Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/a4/98/40dfe932134bdcae4f6ab5927c87488754bf9eb79297d7e0070b78dd58e9/cryptography-47.0.0-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:160ad728f128972d362e714054f6ba0067cab7fb350c5202a9ae8ae4ce3ef1a0", size = 7912214, upload-time = "2026-04-24T19:53:03.864Z" },
+ { url = "https://files.pythonhosted.org/packages/34/c6/2733531243fba725f58611b918056b277692f1033373dcc8bd01af1c05d4/cryptography-47.0.0-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b9a8943e359b7615db1a3ba587994618e094ff3d6fa5a390c73d079ce18b3973", size = 4644617, upload-time = "2026-04-24T19:53:06.909Z" },
+ { url = "https://files.pythonhosted.org/packages/00/e3/b27be1a670a9b87f855d211cf0e1174a5d721216b7616bd52d8581d912ed/cryptography-47.0.0-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f5c15764f261394b22aef6b00252f5195f46f2ca300bec57149474e2538b31f8", size = 4668186, upload-time = "2026-04-24T19:53:09.053Z" },
+ { url = "https://files.pythonhosted.org/packages/81/b9/8443cfe5d17d482d348cee7048acf502bb89a51b6382f06240fd290d4ca3/cryptography-47.0.0-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:9c59ab0e0fa3a180a5a9c59f3a5abe3ef90d474bc56d7fadfbe80359491b615b", size = 4651244, upload-time = "2026-04-24T19:53:11.217Z" },
+ { url = "https://files.pythonhosted.org/packages/5d/5e/13ed0cdd0eb88ba159d6dd5ebfece8cb901dbcf1ae5ac4072e28b55d3153/cryptography-47.0.0-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:34b4358b925a5ea3e14384ca781a2c0ef7ac219b57bb9eacc4457078e2b19f92", size = 5252906, upload-time = "2026-04-24T19:53:13.532Z" },
+ { url = "https://files.pythonhosted.org/packages/64/16/ed058e1df0f33d440217cd120d41d5dda9dd215a80b8187f68483185af82/cryptography-47.0.0-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0024b87d47ae2399165a6bfb20d24888881eeab83ae2566d62467c5ff0030ce7", size = 4701842, upload-time = "2026-04-24T19:53:15.618Z" },
+ { url = "https://files.pythonhosted.org/packages/02/e0/3d30986b30fdbd9e969abbdf8ba00ed0618615144341faeb57f395a084fe/cryptography-47.0.0-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:1e47422b5557bb82d3fff997e8d92cff4e28b9789576984f08c248d2b3535d93", size = 4289313, upload-time = "2026-04-24T19:53:17.755Z" },
+ { url = "https://files.pythonhosted.org/packages/df/fd/32db38e3ad0cb331f0691cb4c7a8a6f176f679124dee746b3af6633db4d9/cryptography-47.0.0-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:6f29f36582e6151d9686235e586dd35bb67491f024767d10b842e520dc6a07ac", size = 4650964, upload-time = "2026-04-24T19:53:20.062Z" },
+ { url = "https://files.pythonhosted.org/packages/86/53/5395d944dfd48cb1f67917f533c609c34347185ef15eb4308024c876f274/cryptography-47.0.0-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:a9b761f012a943b7de0e828843c5688d0de94a0578d44d6c85a1bae32f87791f", size = 5207817, upload-time = "2026-04-24T19:53:22.498Z" },
+ { url = "https://files.pythonhosted.org/packages/34/4f/e5711b28e1901f7d480a2b1b688b645aa4c77c73f10731ed17e7f7db3f0d/cryptography-47.0.0-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:4e1de79e047e25d6e9f8cea71c86b4a53aced64134f0f003bbcbf3655fd172c8", size = 4701544, upload-time = "2026-04-24T19:53:24.356Z" },
+ { url = "https://files.pythonhosted.org/packages/22/22/c8ddc25de3010fc8da447648f5a092c40e7a8fadf01dd6d255d9c0b9373d/cryptography-47.0.0-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:ef6b3634087f18d2155b1e8ce264e5345a753da2c5fa9815e7d41315c90f8318", size = 4783536, upload-time = "2026-04-24T19:53:26.665Z" },
+ { url = "https://files.pythonhosted.org/packages/66/b6/d4a68f4ea999c6d89e8498579cba1c5fcba4276284de7773b17e4fa69293/cryptography-47.0.0-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:11dbb9f50a0f1bb9757b3d8c27c1101780efb8f0bdecfb12439c22a74d64c001", size = 4926106, upload-time = "2026-04-24T19:53:28.686Z" },
+ { url = "https://files.pythonhosted.org/packages/54/ed/5f524db1fade9c013aa618e1c99c6ed05e8ffc9ceee6cda22fed22dda3f4/cryptography-47.0.0-cp311-abi3-win32.whl", hash = "sha256:7fda2f02c9015db3f42bb8a22324a454516ed10a8c29ca6ece6cdbb5efe2a203", size = 3258581, upload-time = "2026-04-24T19:53:31.058Z" },
+ { url = "https://files.pythonhosted.org/packages/b2/dc/1b901990b174786569029f67542b3edf72ac068b6c3c8683c17e6a2f5363/cryptography-47.0.0-cp311-abi3-win_amd64.whl", hash = "sha256:f5c3296dab66202f1b18a91fa266be93d6aa0c2806ea3d67762c69f60adc71aa", size = 3775309, upload-time = "2026-04-24T19:53:33.054Z" },
+ { url = "https://files.pythonhosted.org/packages/14/88/7aa18ad9c11bc87689affa5ce4368d884b517502d75739d475fc6f4a03c7/cryptography-47.0.0-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:be12cb6a204f77ed968bcefe68086eb061695b540a3dd05edac507a3111b25f0", size = 7904299, upload-time = "2026-04-24T19:53:35.003Z" },
+ { url = "https://files.pythonhosted.org/packages/07/55/c18f75724544872f234678fdedc871391722cb34a2aee19faa9f63100bb2/cryptography-47.0.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2ebd84adf0728c039a3be2700289378e1c164afc6748df1a5ed456767bef9ba7", size = 4631180, upload-time = "2026-04-24T19:53:37.517Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/65/31a5cc0eaca99cec5bafffe155d407115d96136bb161e8b49e0ef73f09a7/cryptography-47.0.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7f68d6fbc7fbbcfb0939fea72c3b96a9f9a6edfc0e1b1d29778a2066030418b1", size = 4653529, upload-time = "2026-04-24T19:53:39.775Z" },
+ { url = "https://files.pythonhosted.org/packages/e5/bc/641c0519a495f3bfd0421b48d7cd325c4336578523ccd76ea322b6c29c7a/cryptography-47.0.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:6651d32eff255423503aa276739da98c30f26c40cbeffcc6048e0d54ef704c0c", size = 4638570, upload-time = "2026-04-24T19:53:42.129Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/f2/300327b0a47f6dc94dd8b71b57052aefe178bb51745073d73d80604f11ab/cryptography-47.0.0-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:3fb8fa48075fad7193f2e5496135c6a76ac4b2aa5a38433df0a539296b377829", size = 5238019, upload-time = "2026-04-24T19:53:44.577Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/5a/5b5cf994391d4bf9d9c7efd4c66aabe4d95227256627f8fea6cff7dfadbd/cryptography-47.0.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:11438c7518132d95f354fa01a4aa2f806d172a061a7bed18cf18cbdacdb204d7", size = 4686832, upload-time = "2026-04-24T19:53:47.015Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/2c/ae950e28fd6475c852fc21a44db3e6b5bcc1261d1e370f2b6e42fa800fef/cryptography-47.0.0-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:8c1a736bbb3288005796c3f7ccb9453360d7fed483b13b9f468aea5171432923", size = 4269301, upload-time = "2026-04-24T19:53:48.97Z" },
+ { url = "https://files.pythonhosted.org/packages/67/fb/6a39782e150ffe5cc1b0018cb6ddc48bf7ca62b498d7539ffc8a758e977d/cryptography-47.0.0-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:f1557695e5c2b86e204f6ce9470497848634100787935ab7adc5397c54abd7ab", size = 4638110, upload-time = "2026-04-24T19:53:51.011Z" },
+ { url = "https://files.pythonhosted.org/packages/8e/d7/0b3c71090a76e5c203164a47688b697635ece006dcd2499ab3a4dbd3f0bd/cryptography-47.0.0-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:f9a034b642b960767fb343766ae5ba6ad653f2e890ddd82955aef288ffea8736", size = 5194988, upload-time = "2026-04-24T19:53:52.962Z" },
+ { url = "https://files.pythonhosted.org/packages/63/33/63a961498a9df51721ab578c5a2622661411fc520e00bd83b0cc64eb20c4/cryptography-47.0.0-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:b1c76fca783aa7698eb21eb14f9c4aa09452248ee54a627d125025a43f83e7a7", size = 4686563, upload-time = "2026-04-24T19:53:55.274Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/bf/5ee5b145248f92250de86145d1c1d6edebbd57a7fe7caa4dedb5d4cf06a1/cryptography-47.0.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:4f7722c97826770bab8ae92959a2e7b20a5e9e9bf4deae68fd86c3ca457bab52", size = 4770094, upload-time = "2026-04-24T19:53:57.753Z" },
+ { url = "https://files.pythonhosted.org/packages/92/43/21d220b2da5d517773894dacdcdb5c682c28d3fffce65548cb06e87d5501/cryptography-47.0.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:09f6d7bf6724f8db8b32f11eccf23efc8e759924bc5603800335cf8859a3ddbd", size = 4913811, upload-time = "2026-04-24T19:54:00.236Z" },
+ { url = "https://files.pythonhosted.org/packages/31/98/dc4ad376ac5f1a1a7d4a83f7b0c6f2bcad36b5d2d8f30aeb482d3a7d9582/cryptography-47.0.0-cp314-cp314t-win32.whl", hash = "sha256:6eebcaf0df1d21ce1f90605c9b432dd2c4f4ab665ac29a40d5e3fc68f51b5e63", size = 3237158, upload-time = "2026-04-24T19:54:02.606Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/da/97f62d18306b5133468bc3f8cc73a3111e8cdc8cf8d3e69474d6e5fd2d1b/cryptography-47.0.0-cp314-cp314t-win_amd64.whl", hash = "sha256:51c9313e90bd1690ec5a75ed047c27c0b8e6c570029712943d6116ef9a90620b", size = 3758706, upload-time = "2026-04-24T19:54:04.433Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/34/a4fae8ae7c3bc227460c9ae43f56abf1b911da0ec29e0ebac53bb0a4b6b7/cryptography-47.0.0-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:14432c8a9bcb37009784f9594a62fae211a2ae9543e96c92b2a8e4c3cd5cd0c4", size = 7904072, upload-time = "2026-04-24T19:54:06.411Z" },
+ { url = "https://files.pythonhosted.org/packages/01/64/d7b1e54fdb69f22d24a64bb3e88dc718b31c7fb10ef0b9691a3cf7eeea6e/cryptography-47.0.0-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:07efe86201817e7d3c18781ca9770bc0db04e1e48c994be384e4602bc38f8f27", size = 4635767, upload-time = "2026-04-24T19:54:08.519Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/7b/cca826391fb2a94efdcdfe4631eb69306ee1cff0b22f664a412c90713877/cryptography-47.0.0-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:2b45761c6ec22b7c726d6a829558777e32d0f1c8be7c3f3480f9c912d5ee8a10", size = 4654350, upload-time = "2026-04-24T19:54:10.795Z" },
+ { url = "https://files.pythonhosted.org/packages/4c/65/4b57bcc823f42a991627c51c2f68c9fd6eb1393c1756aac876cba2accae2/cryptography-47.0.0-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:edd4da498015da5b9f26d38d3bfc2e90257bfa9cbed1f6767c282a0025ae649b", size = 4643394, upload-time = "2026-04-24T19:54:13.275Z" },
+ { url = "https://files.pythonhosted.org/packages/f4/c4/2c5fbeea70adbbca2bbae865e1d605d6a4a7f8dbd9d33eaf69645087f06c/cryptography-47.0.0-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:9af828c0d5a65c70ec729cd7495a4bf1a67ecb66417b8f02ff125ab8a6326a74", size = 5225777, upload-time = "2026-04-24T19:54:15.18Z" },
+ { url = "https://files.pythonhosted.org/packages/7e/b8/ac57107ef32749d2b244e36069bb688792a363aaaa3acc9e3cf84c130315/cryptography-47.0.0-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:256d07c78a04d6b276f5df935a9923275f53bd1522f214447fdf365494e2d515", size = 4688771, upload-time = "2026-04-24T19:54:17.835Z" },
+ { url = "https://files.pythonhosted.org/packages/56/fc/9f1de22ff8be99d991f240a46863c52d475404c408886c5a38d2b5c3bb26/cryptography-47.0.0-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:5d0e362ff51041b0c0d219cc7d6924d7b8996f57ce5712bdcef71eb3c65a59cc", size = 4270753, upload-time = "2026-04-24T19:54:19.963Z" },
+ { url = "https://files.pythonhosted.org/packages/00/68/d70c852797aa68e8e48d12e5a87170c43f67bb4a59403627259dd57d15de/cryptography-47.0.0-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:1581aef4219f7ca2849d0250edaa3866212fb74bf5667284f46aa92f9e65c1ca", size = 4642911, upload-time = "2026-04-24T19:54:21.818Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/51/661cbee74f594c5d97ff82d34f10d5551c085ca4668645f4606ebd22bd5d/cryptography-47.0.0-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:a49a3eb5341b9503fa3000a9a0db033161db90d47285291f53c2a9d2cd1b7f76", size = 5181411, upload-time = "2026-04-24T19:54:24.376Z" },
+ { url = "https://files.pythonhosted.org/packages/94/87/f2b6c374a82cf076cfa1416992ac8e8ec94d79facc37aec87c1a5cb72352/cryptography-47.0.0-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:2207a498b03275d0051589e326b79d4cf59985c99031b05bb292ac52631c37fe", size = 4688262, upload-time = "2026-04-24T19:54:26.946Z" },
+ { url = "https://files.pythonhosted.org/packages/14/e2/8b7462f4acf21ec509616f0245018bb197194ab0b65c2ea21a0bdd53c0eb/cryptography-47.0.0-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7a02675e2fabd0c0fc04c868b8781863cbf1967691543c22f5470500ff840b31", size = 4775506, upload-time = "2026-04-24T19:54:28.926Z" },
+ { url = "https://files.pythonhosted.org/packages/70/75/158e494e4c08dc05e039da5bb48553826bd26c23930cf8d3cd5f21fa8921/cryptography-47.0.0-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:80887c5cbd1774683cb126f0ab4184567f080071d5acf62205acb354b4b753b7", size = 4912060, upload-time = "2026-04-24T19:54:30.869Z" },
+ { url = "https://files.pythonhosted.org/packages/06/bd/0a9d3edbf5eadbac926d7b9b3cd0c4be584eeeae4a003d24d9eda4affbbd/cryptography-47.0.0-cp38-abi3-win32.whl", hash = "sha256:ed67ea4e0cfb5faa5bc7ecb6e2b8838f3807a03758eec239d6c21c8769355310", size = 3248487, upload-time = "2026-04-24T19:54:33.494Z" },
+ { url = "https://files.pythonhosted.org/packages/60/80/5681af756d0da3a599b7bdb586fac5a1540f1bcefd2717a20e611ddade45/cryptography-47.0.0-cp38-abi3-win_amd64.whl", hash = "sha256:835d2d7f47cdc53b3224e90810fb1d36ca94ea29cc1801fb4c1bc43876735769", size = 3755737, upload-time = "2026-04-24T19:54:35.408Z" },
+ { url = "https://files.pythonhosted.org/packages/1b/a0/928c9ce0d120a40a81aa99e3ba383e87337b9ac9ef9f6db02e4d7822424d/cryptography-47.0.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:7f1207974a904e005f762869996cf620e9bf79ecb4622f148550bb48e0eb35a7", size = 3909893, upload-time = "2026-04-24T19:54:38.334Z" },
+ { url = "https://files.pythonhosted.org/packages/81/75/d691e284750df5d9569f2b1ce4a00a71e1d79566da83b2b3e5549c84917f/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:1a405c08857258c11016777e11c02bacbe7ef596faf259305d282272a3a05cbe", size = 4587867, upload-time = "2026-04-24T19:54:40.619Z" },
+ { url = "https://files.pythonhosted.org/packages/07/d6/1b90f1a4e453009730b4545286f0b39bb348d805c11181fc31544e4f9a65/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:20fdbe3e38fb67c385d233c89371fa27f9909f6ebca1cecc20c13518dae65475", size = 4627192, upload-time = "2026-04-24T19:54:42.849Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/53/cb358a80e9e359529f496870dd08c102aa8a4b5b9f9064f00f0d6ed5b527/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:f7db373287273d8af1414cf95dc4118b13ffdc62be521997b0f2b270771fef50", size = 4587486, upload-time = "2026-04-24T19:54:44.908Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/57/aaa3d53876467a226f9a7a82fd14dd48058ad2de1948493442dfa16e2ffd/cryptography-47.0.0-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:9fe6b7c64926c765f9dff301f9c1b867febcda5768868ca084e18589113732ab", size = 4626327, upload-time = "2026-04-24T19:54:47.813Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/9c/51f28c3550276bcf35660703ba0ab829a90b88be8cd98a71ef23c2413913/cryptography-47.0.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:cffbba3392df0fa8629bb7f43454ee2925059ee158e23c54620b9063912b86c8", size = 3698916, upload-time = "2026-04-24T19:54:49.782Z" },
  ]
 
  [[package]]
 
 
  [[package]]
  name = "huggingface-hub"
+ version = "1.12.0"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
  { name = "filelock" },
 
  { name = "typer" },
  { name = "typing-extensions" },
  ]
+ sdist = { url = "https://files.pythonhosted.org/packages/56/52/1b54cb569509c725a32c1315261ac9fd0e6b91bbbf74d86fca10d3376164/huggingface_hub-1.12.0.tar.gz", hash = "sha256:7c3fe85e24b652334e5d456d7a812cd9a071e75630fac4365d9165ab5e4a34b6", size = 763091, upload-time = "2026-04-24T13:32:08.674Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/7e/2b/ef03ddb96bd1123503c2bd6932001020292deea649e9bf4caa2cb65a85bf/huggingface_hub-1.12.0-py3-none-any.whl", hash = "sha256:d74939969585ee35748bd66de09baf84099d461bda7287cd9043bfb99b0e424d", size = 646806, upload-time = "2026-04-24T13:32:06.717Z" },
  ]
 
  [[package]]
 
 
  [[package]]
  name = "marimo"
+ version = "0.23.3"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
  { name = "click" },
 
  { name = "uvicorn" },
  { name = "websockets" },
  ]
+ sdist = { url = "https://files.pythonhosted.org/packages/b6/3f/7fb38c6c2a1f8d6b3c3ffb8ca6db5ff0b9dacbb113b4d05aa7690b51a771/marimo-0.23.3.tar.gz", hash = "sha256:251a8724b58882d65956ff6a20552cb21e59a6fd4149ca437727894375ec31e9", size = 38406206, upload-time = "2026-04-24T17:56:21.016Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/e7/02d672006fb04cb8aef23aeaf0384482fe63a13f9db6125ad8e13146daee/marimo-0.23.3-py3-none-any.whl", hash = "sha256:329b35b9ca221db9c78780d1714b11f010a00e2a929942db8ae6187960d42496", size = 38828150, upload-time = "2026-04-24T17:56:16.204Z" },
  ]
 
  [[package]]
 
  version = "0.1.0"
  source = { editable = "." }
  dependencies = [
+ { name = "httpx" },
+ { name = "huggingface-hub" },
  { name = "manim", version = "0.19.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
  { name = "manim", version = "0.20.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
  { name = "marimo" },
  { name = "openenv-core", extra = ["core"] },
+ { name = "wikipedia-api" },
  ]
 
  [package.optional-dependencies]
 
 
  [package.metadata]
  requires-dist = [
+ { name = "httpx", specifier = ">=0.28.1" },
+ { name = "huggingface-hub", specifier = ">=1.12.0" },
  { name = "manim", specifier = ">=0.18.0" },
  { name = "marimo", specifier = ">=0.10.0" },
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
+ { name = "wikipedia-api", specifier = ">=0.14.1" },
  ]
  provides-extras = ["dev"]
 
+ [package.metadata.requires-dev]
+ dev = []
+
  [[package]]
  name = "opentelemetry-api"
+ version = "1.41.1"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
  { name = "importlib-metadata" },
  { name = "typing-extensions" },
  ]
+ sdist = { url = "https://files.pythonhosted.org/packages/fa/fc/b7564cbef36601aef0d6c9bc01f7badb64be8e862c2e1c3c5c3b43b53e4f/opentelemetry_api-1.41.1.tar.gz", hash = "sha256:0ad1814d73b875f84494387dae86ce0b12c68556331ce6ce8fe789197c949621", size = 71416, upload-time = "2026-04-24T13:15:38.262Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/29/59/3e7118ed140f76b0982ba4321bdaed1997a0473f9720de2d10788a577033/opentelemetry_api-1.41.1-py3-none-any.whl", hash = "sha256:a22df900e75c76dc08440710e51f52f1aa6b451b429298896023e60db5b3139f", size = 69007, upload-time = "2026-04-24T13:15:15.662Z" },
  ]
 
  [[package]]
 
 
  [[package]]
  name = "packaging"
+ version = "26.2"
  source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/d7/f1/e7a6dd94a8d4a5626c03e4e99c87f241ba9e350cd9e6d75123f992427270/packaging-26.2.tar.gz", hash = "sha256:ff452ff5a3e828ce110190feff1178bb1f2ea2281fa2075aadb987c2fb221661", size = 228134, upload-time = "2026-04-24T20:15:23.917Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/df/b2/87e62e8c3e2f4b32e5fe99e0b86d576da1312593b39f47d8ceef365e95ed/packaging-26.2-py3-none-any.whl", hash = "sha256:5fc45236b9446107ff2415ce77c807cee2862cb6fac22b8a73826d0693b0980e", size = 100195, upload-time = "2026-04-24T20:15:22.081Z" },
  ]
 
  [[package]]
 
  version = "12.1"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
+ { name = "pyobjc-core", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
  ]
  sdist = { url = "https://files.pythonhosted.org/packages/02/a3/16ca9a15e77c061a9250afbae2eae26f2e1579eb8ca9462ae2d2c71e1169/pyobjc_framework_cocoa-12.1.tar.gz", hash = "sha256:5556c87db95711b985d5efdaaf01c917ddd41d148b1e52a0c66b1a2e2c5c1640", size = 2772191, upload-time = "2025-11-14T10:13:02.069Z" }
  wheels = [
 
  { url = "https://files.pythonhosted.org/packages/48/2c/6c9bb53db56c8a12a736d2158a8b842a5993b96daabc29d90a098e840280/svgelements-1.9.6-py2.py3-none-any.whl", hash = "sha256:8a5cf2cc066d98e713d5b875b1d6e5eeb9b92e855e835ebd7caab2713ae1dcad", size = 137856, upload-time = "2023-08-17T02:01:48.76Z" },
  ]
 
+ [[package]]
+ name = "tenacity"
+ version = "9.1.4"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/47/c6/ee486fd809e357697ee8a44d3d69222b344920433d3b6666ccd9b374630c/tenacity-9.1.4.tar.gz", hash = "sha256:adb31d4c263f2bd041081ab33b498309a57c77f9acf2db65aadf0898179cf93a", size = 49413, upload-time = "2026-02-07T10:45:33.841Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/d7/c1/eb8f9debc45d3b7918a32ab756658a0904732f75e555402972246b0b8e71/tenacity-9.1.4-py3-none-any.whl", hash = "sha256:6095a360c919085f28c6527de529e76a06ad89b23659fa881ae0649b867a9d55", size = 28926, upload-time = "2026-02-07T10:45:32.24Z" },
+ ]
+
  [[package]]
  name = "tomli"
  version = "2.4.1"
 
 
  [[package]]
  name = "tzdata"
+ version = "2026.2"
  source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/ba/19/1b9b0e29f30c6d35cb345486df41110984ea67ae69dddbc0e8a100999493/tzdata-2026.2.tar.gz", hash = "sha256:9173fde7d80d9018e02a662e168e5a2d04f87c41ea174b139fbef642eda62d10", size = 198254, upload-time = "2026-04-24T15:22:08.651Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/ce/e4/dccd7f47c4b64213ac01ef921a1337ee6e30e8c6466046018326977efd95/tzdata-2026.2-py2.py3-none-any.whl", hash = "sha256:bbe9af844f658da81a5f95019480da3a89415801f6cc966806612cc7169bffe7", size = 349321, upload-time = "2026-04-24T15:22:05.876Z" },
  ]
 
  [[package]]
 
  { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
  ]
 
+ [[package]]
+ name = "wikipedia-api"
+ version = "0.14.1"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+ { name = "click" },
+ { name = "httpx" },
+ { name = "tenacity" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/98/a5/166011c4d24d80a88e466a9ce1beb4d39884569f329dad82aa7d15c001f7/wikipedia_api-0.14.1.tar.gz", hash = "sha256:1a4ac428711f673a983be5676eb6c5fa39130fc5869893923435884e0e2c3c31", size = 141350, upload-time = "2026-04-10T22:38:34.313Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/72/de/0c66576815650bc74d6fbbdc92d17df906db34043bfc59323a004391c0ed/wikipedia_api-0.14.1-py3-none-any.whl", hash = "sha256:cacfdb953c3802b96605d7ac78ee42dd7fe049f28ed47e632cfe943187b83c2b", size = 129096, upload-time = "2026-04-10T22:38:32.22Z" },
+ ]
+
  [[package]]
  name = "zipp"
  version = "3.23.1"