retroam Amp committed on
Commit 00b2ea2 · 1 Parent(s): 821b942

Add VendSim VB2 environment

Amp-Thread-ID: https://ampcode.com/threads/T-019cce9e-be2b-718e-880f-eeb8e81cf219
Co-authored-by: Amp <amp@ampcode.com>

Dockerfile ADDED
@@ -0,0 +1,12 @@
+ FROM python:3.12-slim
+
+ WORKDIR /app
+
+ COPY pyproject.toml README.md ./
+ COPY vendsim_vb2/ vendsim_vb2/
+
+ RUN pip install --no-cache-dir ".[server]"
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "vendsim_vb2.server.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,120 @@
- ---
- title: Vendsim Vb2
- emoji: 👁
- colorFrom: yellow
- colorTo: purple
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # vendsim-vb2
+
+ `vendsim-vb2` is an OpenEnv 0.2.1-compatible implementation of a Vending-Bench 2 style environment.
+
+ The agent runs a vending machine business over a 365-day horizon. It sets prices, manages storage and machine inventory, negotiates with adversarial suppliers, delegates physical actions to a sub-agent, tracks notes/reminders, and is scored by final bank balance.
+
+ ## Environment Summary
+
+ - Starting balance: `$500`
+ - Episode length: `365` simulated days
+ - Daily machine fee: `$2`
+ - Bankruptcy rule: `10` consecutive negative-balance days
+ - Weekly token billing: `$100 / 1M output tokens`
+ - Machine layout: `4 x 3` slots
+   `2` small rows and `2` large rows
+ - Restock travel time: `75` minutes
+ - Reward:
+   Default benchmark reward is sparse terminal reward equal to final bank balance.
+   Dense shaping is available behind a training flag.
+
+ ## MCP Tool Surface
+
+ Main-agent tools:
+
+ - `set_price`
+ - `send_email`
+ - `check_balance`
+ - `check_storage_inventory`
+ - `wait_for_next_day`
+ - `run_sub_agent`
+ - `chat_with_sub_agent`
+ - `request_supplier_quote`
+ - `negotiate_supplier`
+ - `place_supplier_order`
+ - `check_delivery`
+ - `get_status`
+
+ Memory tools:
+
+ - `write_scratchpad`
+ - `read_scratchpad`
+ - `search_notes`
+ - `set_reminder`
+
+ Sub-agent tools exposed through `run_sub_agent`:
+
+ - `restock_machine`
+ - `collect_cash`
+ - `get_machine_inventory`
+
+ ## Repository Artifacts
+
+ Code:
+
+ - Environment server: [vendsim_vb2/server/app.py](./vendsim_vb2/server/app.py)
+ - MCP wrapper: [vendsim_vb2/mcp_env.py](./vendsim_vb2/mcp_env.py)
+ - Core simulation: [vendsim_vb2/environment.py](./vendsim_vb2/environment.py)
+
+ Notebooks:
+
+ - Setup verification: [00_setup_verification.ipynb](../notebooks/00_setup_verification.ipynb)
+ - Training notebook: [01_vb2_training_grpo.ipynb](../notebooks/01_vb2_training_grpo.ipynb)
+ - Final benchmark run: [02_vb2_final_run.ipynb](../notebooks/02_vb2_final_run.ipynb)
+
+ Tests:
+
+ - Test suite: [tests](./tests)
+
+ ## Local Setup
+
+ From the repository root:
+
+ ```bash
+ python3 -m venv .venv
+ source .venv/bin/activate
+ pip install -e ./vendsim_vb2[server,dev]
+ ```
+
+ Run the tests:
+
+ ```bash
+ PYTHONPATH=vendsim_vb2 pytest vendsim_vb2/tests -q
+ ```
+
+ ## Run Locally
+
+ Start the OpenEnv-compatible server:
+
+ ```bash
+ PYTHONPATH=vendsim_vb2 python -m uvicorn vendsim_vb2.server.app:create_app --factory --host 0.0.0.0 --port 8000
+ ```
+
+ Then connect with `VB2Client` or use the notebooks.
+
+ ## Hugging Face Spaces Deployment
+
+ Build and verify locally first:
+
+ ```bash
+ cd vendsim_vb2
+ docker build -t vendsim-vb2 .
+ ```
+
+ Then deploy with OpenEnv tooling from the repo root after configuring your Hugging Face credentials:
+
+ ```bash
+ openenv push
+ ```
+
+ Submission artifact placeholders:
+
+ - HF Space URL: `TODO`
+ - Installable package / repo URL: `TODO`
+ - Demo video URL: `TODO`
+
+ ## Training Artifact
+
+ A minimal training script in Colab using Unsloth or HF TRL is included:
+
+ - [01_vb2_training_grpo.ipynb](../notebooks/01_vb2_training_grpo.ipynb)
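The README lists the bankruptcy rule as 10 consecutive negative-balance days, but the code that enforces it is not part of this excerpt. The following is only an illustration. The helper name `is_bankrupt` is hypothetical, and so is the choice to feed it a sequence of end-of-day balances. The sketch shows one way to count such a streak.

```python
from collections.abc import Iterable


def is_bankrupt(end_of_day_balances: Iterable[float], limit: int = 10) -> bool:
    """Hypothetical check: True once `limit` consecutive days end below zero."""
    streak = 0
    for balance in end_of_day_balances:
        # A negative day extends the streak; any non-negative day resets it.
        streak = streak + 1 if balance < 0 else 0
        if streak >= limit:
            return True
    return False


# Nine bad days, a recovery, then nine more: the streak never reaches 10.
print(is_bankrupt([-5.0] * 9 + [12.0] + [-5.0] * 9))  # False
print(is_bankrupt([-5.0] * 10))  # True
```

The streak is checked inside the loop, so the function stops at the first qualifying run instead of scanning the whole year.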
pyproject.toml ADDED
@@ -0,0 +1,25 @@
+ [build-system]
+ requires = ["setuptools>=68", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "vendsim-vb2"
+ version = "0.1.0"
+ description = "OpenEnv-compatible Vending-Bench 2 simulation environment"
+ readme = "README.md"
+ requires-python = ">=3.11"
+ dependencies = [
+     "openenv-core==0.2.1",
+     "fastmcp",
+ ]
+
+ [project.optional-dependencies]
+ server = ["fastapi>=0.115", "uvicorn>=0.34"]
+ dev = ["pytest>=8.0", "ruff>=0.11"]
+
+ [tool.setuptools]
+ package-dir = {"" = "."}
+
+ [tool.setuptools.packages.find]
+ where = ["."]
+ include = ["vendsim_vb2*"]
vendsim_vb2/__init__.py ADDED
@@ -0,0 +1,8 @@
+ """Vending-Bench 2 environment package."""
+
+ from vendsim_vb2.client import VB2Client
+ from vendsim_vb2.config import VB2Config
+ from vendsim_vb2.environment import VendingBench2Environment
+ from vendsim_vb2.mcp_env import VB2MCPEnvironment
+
+ __all__ = ["VB2Client", "VB2Config", "VendingBench2Environment", "VB2MCPEnvironment"]
vendsim_vb2/billing.py ADDED
@@ -0,0 +1,13 @@
+ from __future__ import annotations
+
+
+ def apply_weekly_costs(
+     cash_balance: float,
+     weekly_output_tokens: int,
+     token_cost_per_million: float,
+     daily_fee: float,
+     days_in_week: int,
+ ) -> float:
+     token_cost = (weekly_output_tokens / 1_000_000) * token_cost_per_million
+     total_cost = token_cost + (daily_fee * days_in_week)
+     return round(cash_balance - total_cost, 2)
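This is a worked example of one week of billing under the `VB2Config` defaults ($100 per 1M output tokens, $2 machine fee per day). It assumes the daily fee is billed together with tokens over a 7-day week, which is what the function's parameters suggest. The block restates `apply_weekly_costs` so it runs on its own. The 250,000-token figure is an arbitrary example value.

```python
def apply_weekly_costs(
    cash_balance: float,
    weekly_output_tokens: int,
    token_cost_per_million: float,
    daily_fee: float,
    days_in_week: int,
) -> float:
    # Same arithmetic as vendsim_vb2/billing.py above.
    token_cost = (weekly_output_tokens / 1_000_000) * token_cost_per_million
    total_cost = token_cost + (daily_fee * days_in_week)
    return round(cash_balance - total_cost, 2)


# 250,000 tokens cost $25 and seven days of fees cost $14, so $500 - $39 = $461.
print(apply_weekly_costs(500.0, 250_000, 100.0, 2.0, 7))  # 461.0
# A chattier agent emitting 2M tokens pays $200 + $14 = $214 for the same week.
print(apply_weekly_costs(500.0, 2_000_000, 100.0, 2.0, 7))  # 286.0
```

At these rates, token spend overtakes the fixed machine fee once an agent emits more than 140,000 output tokens per week.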
vendsim_vb2/client.py ADDED
@@ -0,0 +1,238 @@
+ """VB2 environment client for agents and training scripts."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict, List, Optional
+
+ from openenv.core.env_server.mcp_types import (
+     CallToolAction,
+     CallToolObservation,
+     ListToolsAction,
+     ListToolsObservation,
+     Observation,
+     Tool,
+     ToolError,
+ )
+ from openenv.core.env_client import EnvClient, StepResult
+ from openenv.core.mcp_client import State
+
+
+ class VB2Client(EnvClient[Any, Observation, State]):
+     """
+     Client for the Vending-Bench 2 MCP environment.
+
+     Provides typed convenience methods for every VB2 tool, plus the full
+     ``step()`` / ``reset()`` API inherited from :class:`EnvClient`.
+
+     Example::
+
+         with VB2Client(base_url="http://localhost:8000") as env:
+             env.reset()
+             balance = env.check_balance()
+             env.set_price("soda", 1.75)
+             quote = env.request_supplier_quote("chips", 20)
+             sales = env.wait_for_next_day()
+     """
+
+     def __init__(
+         self,
+         base_url: str,
+         connect_timeout_s: float = 10.0,
+         message_timeout_s: float = 60.0,
+         provider: Optional[Any] = None,
+     ) -> None:
+         super().__init__(
+             base_url=base_url,
+             connect_timeout_s=connect_timeout_s,
+             message_timeout_s=message_timeout_s,
+             provider=provider,
+         )
+         self._tools_cache: Optional[List[Tool]] = None
+
+     # ------------------------------------------------------------------
+     # Abstract method implementations
+     # ------------------------------------------------------------------
+
+     def _step_payload(self, action: Any) -> Dict[str, Any]:
+         if isinstance(action, ListToolsAction):
+             return {"type": "list_tools"}
+         if isinstance(action, CallToolAction):
+             return {
+                 "type": "call_tool",
+                 "tool_name": action.tool_name,
+                 "arguments": action.arguments,
+             }
+         if hasattr(action, "model_dump"):
+             return action.model_dump()
+         return {"action": str(action)}
+
+     def _parse_result(self, payload: Dict[str, Any]) -> StepResult[Observation]:
+         obs_data = payload.get("observation", {})
+
+         if "tools" in obs_data:
+             tools = [
+                 Tool(
+                     name=t.get("name", ""),
+                     description=t.get("description", ""),
+                     input_schema=t.get("input_schema", t.get("inputSchema", {})),
+                 )
+                 for t in obs_data.get("tools", [])
+             ]
+             observation: Observation = ListToolsObservation(
+                 tools=tools,
+                 done=payload.get("done", False),
+                 reward=payload.get("reward"),
+                 metadata=obs_data.get("metadata", {}),
+             )
+         elif "tool_name" in obs_data:
+             error = None
+             if obs_data.get("error"):
+                 error = ToolError(**obs_data["error"])
+             observation = CallToolObservation(
+                 tool_name=obs_data.get("tool_name", ""),
+                 result=obs_data.get("result"),
+                 error=error,
+                 done=payload.get("done", False),
+                 reward=payload.get("reward"),
+                 metadata=obs_data.get("metadata", {}),
+             )
+         else:
+             observation = Observation(
+                 done=payload.get("done", False),
+                 reward=payload.get("reward"),
+                 metadata=obs_data.get("metadata", {}),
+             )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict[str, Any]) -> State:
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
+
+     # ------------------------------------------------------------------
+     # Helper: call a tool and return its result
+     # ------------------------------------------------------------------
+
+     def _call_tool(self, tool_name: str, /, **kwargs: Any) -> Any:  # "/" lets sub-agent helpers forward a tool_name kwarg
+         """Call a tool by name and return its result (or raise on error)."""
+         result = self.call_tool_step(tool_name, **kwargs)
+         obs = result.observation
+
+         if isinstance(obs, CallToolObservation) and obs.error is not None:
+             raise RuntimeError(
+                 f"Tool '{tool_name}' failed: {obs.error.message} "
+                 f"(type: {obs.error.error_type.value})"
+             )
+
+         if isinstance(obs, CallToolObservation):
+             res = obs.result
+             if hasattr(res, "data"):
+                 return res.data
+             if isinstance(res, dict) and "data" in res:
+                 return res["data"]
+             return res
+
+         return obs
+
+     def call_tool_step(self, tool_name: str, /, **kwargs: Any) -> StepResult[Observation]:
+         """Call a tool and return the full StepResult with reward/done metadata."""
+         action = CallToolAction(tool_name=tool_name, arguments=kwargs)
+         return self.step(action)
+
+     # ------------------------------------------------------------------
+     # Convenience methods
+     # ------------------------------------------------------------------
+
+     def list_tools(self, use_cache: bool = True) -> List[Tool]:
+         """Discover available tools from the environment."""
+         if use_cache and self._tools_cache is not None:
+             return self._tools_cache
+         result = self.step(ListToolsAction())
+         if isinstance(result.observation, ListToolsObservation):
+             self._tools_cache = result.observation.tools
+             return self._tools_cache
+         return []
+
+     def set_price(self, product: str, price: float) -> Any:
+         """Update the price of a product in the vending machine."""
+         return self._call_tool("set_price", product=product, price=price)
+
+     def check_balance(self) -> Any:
+         """Review current bank balance."""
+         return self._call_tool("check_balance")
+
+     def check_storage_inventory(self) -> Any:
+         """Inspect the storage inventory."""
+         return self._call_tool("check_storage_inventory")
+
+     def wait_for_next_day(self, output_tokens: int = 0) -> Any:
+         """Advance simulation to the next business day."""
+         return self._call_tool("wait_for_next_day", output_tokens=output_tokens)
+
+     def send_email(self, recipient: str, subject: str, body: str) -> Any:
+         """Send an email to a supplier or service provider."""
+         return self._call_tool(
+             "send_email", recipient=recipient, subject=subject, body=body
+         )
+
+     def restock_machine(self, product: str, qty: int) -> Any:
+         """Delegate to sub-agent: restock the vending machine from storage."""
+         return self._call_tool(
+             "run_sub_agent",
+             tool_name="restock_machine",
+             arguments={"product": product, "qty": qty},
+         )
+
+     def collect_cash(self) -> Any:
+         """Delegate to sub-agent: collect cash from the vending machine."""
+         return self._call_tool("run_sub_agent", tool_name="collect_cash", arguments={})
+
+     def get_machine_inventory(self) -> Any:
+         """Delegate to sub-agent: get current machine inventory."""
+         return self._call_tool(
+             "run_sub_agent",
+             tool_name="get_machine_inventory",
+             arguments={},
+         )
+
+     def chat_with_sub_agent(self, message: str) -> Any:
+         """Message the sub-agent without taking action."""
+         return self._call_tool("chat_with_sub_agent", message=message)
+
+     def write_scratchpad(self, note: str) -> Any:
+         """Append a note to working memory."""
+         return self._call_tool("write_scratchpad", note=note)
+
+     def read_scratchpad(self) -> Any:
+         """Read the working-memory scratchpad."""
+         return self._call_tool("read_scratchpad")
+
+     def search_notes(self, query: str) -> Any:
+         """Search saved notes for a keyword."""
+         return self._call_tool("search_notes", query=query)
+
+     def set_reminder(self, day: int, message: str) -> Any:
+         """Schedule a future reminder."""
+         return self._call_tool("set_reminder", day=day, message=message)
+
+     def request_supplier_quote(self, product: str, qty: int) -> Any:
+         """Request a price quote from a supplier for a product."""
+         return self._call_tool("request_supplier_quote", product=product, qty=qty)
+
+     def negotiate_supplier(self, quote_id: str, proposed_unit_price: float) -> Any:
+         """Negotiate a supplier quote with a proposed unit price."""
+         return self._call_tool("negotiate_supplier", quote_id=quote_id, proposed_unit_price=proposed_unit_price)
+
+     def place_supplier_order(self, product: str, qty: int) -> Any:
+         """Place a confirmed order with a supplier."""
+         return self._call_tool("place_supplier_order", product=product, qty=qty)
+
+     def check_delivery(self, order_id: str) -> Any:
+         """Check the delivery status of a supplier order."""
+         return self._call_tool("check_delivery", order_id=order_id)
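Three sub-agent helpers forward a `tool_name=` keyword to `_call_tool`: `restock_machine`, `collect_cash` and `get_machine_inventory`. `_call_tool`'s own first parameter is also named `tool_name`. With an ordinary parameter, Python rejects that call with a `TypeError` ("got multiple values for argument 'tool_name'"). Marking the parameter positional-only with `/` lets the keyword fall through into `**kwargs` instead. The standalone sketch below demonstrates both behaviors. `plain_call` and `positional_only_call` are illustrative names, not part of the package.

```python
from typing import Any


def plain_call(tool_name: str, **kwargs: Any) -> tuple[str, dict[str, Any]]:
    """Ordinary parameter: a `tool_name=` keyword collides with it."""
    return tool_name, kwargs


def positional_only_call(tool_name: str, /, **kwargs: Any) -> tuple[str, dict[str, Any]]:
    """Positional-only parameter: a `tool_name=` keyword lands in kwargs."""
    return tool_name, kwargs


try:
    plain_call("run_sub_agent", tool_name="collect_cash", arguments={})
except TypeError as exc:
    print(f"TypeError: {exc}")

print(positional_only_call("run_sub_agent", tool_name="collect_cash", arguments={}))
# ('run_sub_agent', {'tool_name': 'collect_cash', 'arguments': {}})
```

The same applies to any wrapper that forwards `**kwargs` onward, such as `call_tool_step`: its first parameter must also be positional-only, or the collision simply moves one call deeper.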
vendsim_vb2/compat.py ADDED
@@ -0,0 +1,36 @@
+ """Compatibility shims for optional third-party dependencies."""
+
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Any, Callable
+
+
+ @dataclass(slots=True)
+ class Route:
+     method: str
+     path: str
+     endpoint: Callable[..., Any]
+
+
+ class FastAPI:
+     """Small subset of FastAPI used for local smoke tests when FastAPI is absent."""
+
+     def __init__(self, *, title: str) -> None:
+         self.title = title
+         self.routes: list[Route] = []
+
+     def get(self, path: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+         return self._register("GET", path)
+
+     def post(self, path: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+         return self._register("POST", path)
+
+     def _register(
+         self, method: str, path: str
+     ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+         def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
+             self.routes.append(Route(method=method, path=path, endpoint=func))
+             return func
+
+         return decorator
vendsim_vb2/config.py ADDED
@@ -0,0 +1,21 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+
+ @dataclass(slots=True)
+ class VB2Config:
+     starting_balance: float = 500.0
+     daily_machine_fee: float = 2.0
+     episode_days: int = 365
+     bankruptcy_consecutive_negative_days: int = 10
+     output_token_cost_per_million: float = 100.0
+     storage_address: str = "1680 Mission St, San Francisco"
+     machine_address: str = "1421 Bay St, San Francisco"
+     restock_travel_time_minutes: int = 75
+     supplier_message_time_minutes: int = 10
+     delivery_check_time_minutes: int = 5
+     minutes_per_day: int = 24 * 60
+     machine_small_rows: int = 2
+     machine_large_rows: int = 2
+     machine_slots_per_row: int = 3
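The three layout fields give the README's `4 x 3` machine: 2 small rows and 2 large rows of 3 slots each, or 12 slots split 6/6 by size. The sketch computes those counts. It also adds a hypothetical `assortment_fits` check, which assumes each product occupies exactly one slot of its size. The sub-agent's real slot mapping is not shown in this commit. The product sizes come from the `PRODUCTS` catalog in `demand.py`.

```python
def slot_layout(
    small_rows: int = 2, large_rows: int = 2, slots_per_row: int = 3
) -> dict[str, int]:
    """Slot counts implied by the VB2Config machine layout fields."""
    return {
        "small": small_rows * slots_per_row,
        "large": large_rows * slots_per_row,
        "total": (small_rows + large_rows) * slots_per_row,
    }


def assortment_fits(product_sizes: dict[str, str], layout: dict[str, int]) -> bool:
    """Hypothetical check assuming each product occupies exactly one slot of its size."""
    needed = {"small": 0, "large": 0}
    for size in product_sizes.values():
        needed[size] += 1
    return all(needed[size] <= layout[size] for size in needed)


catalog_sizes = {
    "soda": "small",
    "water": "small",
    "candy": "small",
    "chips": "large",
    "sandwich": "large",
}
print(slot_layout())  # {'small': 6, 'large': 6, 'total': 12}
print(assortment_fits(catalog_sizes, slot_layout()))  # True
```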
vendsim_vb2/customer_service.py ADDED
@@ -0,0 +1,41 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from random import Random
+
+
+ @dataclass(slots=True)
+ class ComplaintTicket:
+     ticket_id: str
+     type: str
+     day: int
+     amount: float
+     reason: str
+
+
+ class CustomerServiceEngine:
+     def __init__(self, seed: int | None = None) -> None:
+         self._rng = Random(seed)
+         self._ticket_counter = 0
+
+     def maybe_create_complaint(
+         self, day: int, sales: dict[str, int]
+     ) -> ComplaintTicket | None:
+         total_units = sum(sales.values())
+         if total_units <= 0:
+             return None
+         complaint_probability = min(0.35, total_units / 150)
+         if self._rng.random() >= complaint_probability:
+             return None
+         self._ticket_counter += 1
+         amount = round(1.5 + self._rng.random() * 4.0, 2)
+         return ComplaintTicket(
+             ticket_id=f"ticket-{self._ticket_counter}",
+             type="refund_request",
+             day=day,
+             amount=amount,
+             reason="Customer reported a vending issue.",
+         )
+
+     def process_refund(self, cash_balance: float, amount: float) -> float:
+         return round(cash_balance - amount, 2)
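`maybe_create_complaint` opens at most one refund ticket per day. The probability is `min(0.35, units / 150)`, and the amount is `1.5 + U * 4.0` with `U` uniform on [0, 1), so its mean is $3.50. The sketch below turns that into an expected daily refund cost. It assumes every ticket is refunded and ignores the rounding of amounts to cents. Both helper names are illustrative.

```python
def complaint_probability(total_units: int) -> float:
    """Daily chance of a refund ticket, as in maybe_create_complaint."""
    if total_units <= 0:
        return 0.0
    return min(0.35, total_units / 150)


def expected_daily_refund(total_units: int) -> float:
    # Mean refund amount is 1.5 + 4.0 / 2 = 3.5 (cent rounding ignored).
    return complaint_probability(total_units) * 3.5


for units in (0, 15, 30, 60, 120):
    print(units, complaint_probability(units), round(expected_daily_refund(units), 3))
```

Because the 0.35 cap binds from 53 units per day upward, the expected refund cost stops growing with volume at about $1.23 per day.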
vendsim_vb2/demand.py ADDED
@@ -0,0 +1,171 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from random import Random
+
+ PRODUCTS: dict[str, dict[str, float | str]] = {
+     "soda": {
+         "size": "small",
+         "base_daily_demand": 7.0,
+         "ideal_price": 1.50,
+         "wholesale_price": 0.58,
+         "weather_bias": "hot",
+     },
+     "water": {
+         "size": "small",
+         "base_daily_demand": 6.0,
+         "ideal_price": 1.25,
+         "wholesale_price": 0.42,
+         "weather_bias": "hot",
+     },
+     "candy": {
+         "size": "small",
+         "base_daily_demand": 4.0,
+         "ideal_price": 1.25,
+         "wholesale_price": 0.35,
+         "weather_bias": "neutral",
+     },
+     "chips": {
+         "size": "large",
+         "base_daily_demand": 5.0,
+         "ideal_price": 2.00,
+         "wholesale_price": 0.72,
+         "weather_bias": "neutral",
+     },
+     "sandwich": {
+         "size": "large",
+         "base_daily_demand": 2.0,
+         "ideal_price": 4.50,
+         "wholesale_price": 2.20,
+         "weather_bias": "cold",
+     },
+ }
+
+ SEASON_MULTIPLIERS = {
+     "winter": 0.9,
+     "spring": 1.0,
+     "summer": 1.15,
+     "autumn": 1.0,
+ }
+
+ DAY_OF_WEEK_MULTIPLIERS = {
+     "monday": 0.95,
+     "tuesday": 1.0,
+     "wednesday": 1.0,
+     "thursday": 1.0,
+     "friday": 1.1,
+     "saturday": 1.2,
+     "sunday": 0.85,
+ }
+
+ WEATHER_MULTIPLIERS = {
+     "sunny": 1.15,
+     "cloudy": 1.0,
+     "rainy": 0.85,
+     "foggy": 0.9,
+     "heatwave": 1.25,
+ }
+
+ WEATHER_SEQUENCE = ["sunny", "cloudy", "rainy", "sunny", "foggy", "cloudy", "sunny"]
+ DAY_NAMES = [
+     "monday",
+     "tuesday",
+     "wednesday",
+     "thursday",
+     "friday",
+     "saturday",
+     "sunday",
+ ]
+
+
+ @dataclass(slots=True)
+ class DailySalesResult:
+     units_sold: dict[str, int]
+     revenue: float
+     debug: dict[str, float]
+
+
+ def season_for_day(day_index: int) -> str:
+     day_of_year = ((day_index - 1) % 365) + 1
+     if day_of_year <= 79:
+         return "winter"
+     if day_of_year <= 171:
+         return "spring"
+     if day_of_year <= 265:
+         return "summer"
+     if day_of_year <= 354:
+         return "autumn"
+     return "winter"
+
+
+ def day_of_week_for_day(day_index: int) -> str:
+     return DAY_NAMES[(day_index - 1) % len(DAY_NAMES)]
+
+
+ def weather_for_day(day_index: int) -> str:
+     return WEATHER_SEQUENCE[(day_index - 1) % len(WEATHER_SEQUENCE)]
+
+
+ def _weather_bias_multiplier(product: str, weather: str) -> float:
+     bias = str(PRODUCTS.get(product, {}).get("weather_bias", "neutral"))
+     if bias == "hot" and weather in {"sunny", "heatwave"}:
+         return 1.1
+     if bias == "cold" and weather in {"rainy", "foggy"}:
+         return 1.08
+     return 1.0
+
+
+ def compute_daily_sales(
+     products: list[str],
+     prices: dict[str, float],
+     weather: str,
+     season: str,
+     day_of_week: str,
+     inventory: dict[str, int] | None = None,
+     seed: int | None = None,
+ ) -> DailySalesResult:
+     rng = Random(seed)
+     choice_multiplier = 1.0 + min(max(len(products) - 1, 0), 5) * 0.05
+     weather_multiplier = WEATHER_MULTIPLIERS.get(weather, 1.0)
+     season_multiplier = SEASON_MULTIPLIERS.get(season, 1.0)
+     dow_multiplier = DAY_OF_WEEK_MULTIPLIERS.get(day_of_week, 1.0)
+     inventory = inventory or {}
+
+     units_sold: dict[str, int] = {}
+     revenue = 0.0
+     for product in products:
+         catalog = PRODUCTS.get(product, {})
+         base_demand = float(catalog.get("base_daily_demand", 1.0))
+         ideal_price = float(
+             catalog.get("ideal_price", max(prices.get(product, 1.0), 0.01))
+         )
+         price = float(prices.get(product, ideal_price))
+         price_multiplier = max(
+             0.15, 1.0 - ((price - ideal_price) / max(ideal_price, 0.01)) * 0.45
+         )
+         noise_multiplier = 0.9 + (rng.random() * 0.2)
+         expected_units = (
+             base_demand
+             * choice_multiplier
+             * weather_multiplier
+             * season_multiplier
+             * dow_multiplier
+             * price_multiplier
+             * _weather_bias_multiplier(product, weather)
+             * noise_multiplier
+         )
+         sold = max(0, int(round(expected_units)))
+         if product in inventory:
+             sold = min(sold, inventory[product])
+         units_sold[product] = sold
+         revenue += sold * price
+
+     debug = {
+         "choice_multiplier": round(choice_multiplier, 3),
+         "weather_multiplier": round(weather_multiplier, 3),
+         "season_multiplier": round(season_multiplier, 3),
+         "day_of_week_multiplier": round(dow_multiplier, 3),
+     }
+     return DailySalesResult(
+         units_sold=units_sold, revenue=round(revenue, 2), debug=debug
+     )
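In `compute_daily_sales`, price enters only through a linear multiplier, `max(0.15, 1 - 0.45 * (p - i) / i)`, where `i` is the catalog `ideal_price`. All other multipliers scale demand independently of price. If integer rounding, inventory caps and the 0.15 floor are ignored, the expected profit per unit of base demand is therefore `(p - w) * (1.45 - 0.45 p / i)` for wholesale price `w`. Setting its derivative to zero gives `p* = (1.45 i + 0.45 w) / 0.9`. This is a derivation from the formula above, not a function the package provides. The helper names are illustrative.

```python
def price_multiplier(price: float, ideal_price: float) -> float:
    """Linear price response used in compute_daily_sales, floored at 0.15."""
    return max(0.15, 1.0 - ((price - ideal_price) / max(ideal_price, 0.01)) * 0.45)


def expected_daily_profit(
    price: float, ideal: float, wholesale: float, base_units: float
) -> float:
    # Ignores noise, rounding, stockouts and the price-independent multipliers.
    return (price - wholesale) * base_units * price_multiplier(price, ideal)


def profit_maximizing_price(ideal: float, wholesale: float) -> float:
    # Zero of d/dp [(p - w) * (1.45 - 0.45 p / i)]; valid while the 0.15 floor is inactive.
    return (1.45 * ideal + 0.45 * wholesale) / 0.9


# Soda in PRODUCTS: ideal_price 1.50, wholesale_price 0.58, base_daily_demand 7.0.
best = profit_maximizing_price(1.50, 0.58)
print(round(best, 2))  # 2.71
print(round(expected_daily_profit(1.50, 1.50, 0.58, 7.0), 2))
print(round(expected_daily_profit(best, 1.50, 0.58, 7.0), 2))
```

Under this simplified model, the catalog's `ideal_price` (which `reset()` uses as the starting price) is well below the profit-maximizing point. Real sales are also capped by machine inventory, so the optimum in an episode can differ.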
vendsim_vb2/environment.py ADDED
@@ -0,0 +1,395 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from random import Random
+ from typing import Any
+
+ from vendsim_vb2.billing import apply_weekly_costs
+ from vendsim_vb2.config import VB2Config
+ from vendsim_vb2.customer_service import CustomerServiceEngine
+ from vendsim_vb2.demand import (
+     PRODUCTS,
+     compute_daily_sales,
+     day_of_week_for_day,
+     season_for_day,
+     weather_for_day,
+ )
+ from vendsim_vb2.state import SimulationState
+ from vendsim_vb2.subagent import SubAgent
+ from vendsim_vb2.suppliers import SupplierEngine
+ from vendsim_vb2.tools.main_agent_tools import get_main_tool_specs
+ from vendsim_vb2.tools.memory_tools import get_memory_tool_specs
+
+
+ @dataclass(slots=True)
+ class ToolCallResult:
+     status: str
+     payload: dict[str, Any]
+
+
+ class VendingBench2Environment:
+     def __init__(
+         self,
+         config: VB2Config | None = None,
+         seed: int | None = None,
+         use_dense_rewards: bool = False,
+     ) -> None:
+         self.config = config or VB2Config()
+         self._seed = seed
+         self._rng = Random(seed)
+         self.use_dense_rewards = use_dense_rewards
+         self.suppliers = SupplierEngine(seed=seed)
+         self.customer_service = CustomerServiceEngine(seed=seed)
+         self.subagent = SubAgent(config=self.config)
+         self.state = self.reset()
+
+     def reset(self) -> SimulationState:
+         self._rng = Random(self._seed)
+         self.suppliers = SupplierEngine(seed=self._seed)
+         self.customer_service = CustomerServiceEngine(seed=self._seed)
+         self.subagent = SubAgent(config=self.config)
+         self.state = SimulationState.new_episode(self.config)
+         self.state.prices = {
+             product: float(spec["ideal_price"]) for product, spec in PRODUCTS.items()
+         }
+         return self.state
+
+     def tool_registry(self) -> dict[str, list[str]]:
+         return {
+             "main": [spec.name for spec in get_main_tool_specs()],
+             "memory": [spec.name for spec in get_memory_tool_specs()],
+             "subagent": list(self.subagent.specs()["tools"]),
+         }
+
+     def _log_email(
+         self,
+         *,
+         sender: str,
+         recipient: str,
+         subject: str,
+         body: str,
+         category: str = "email",
+     ) -> None:
+         self.state.email_log.append(
+             {
+                 "day": self.state.day_index,
+                 "minute_of_day": self.state.minute_of_day,
+                 "sender": sender,
+                 "recipient": recipient,
+                 "subject": subject,
+                 "body": body,
+                 "category": category,
+             }
+         )
+
+     def resolve_delivery(self, order_id: str) -> ToolCallResult:
+         """Check delivery status; on success, add items to storage and charge cost."""
+         delivery = self.suppliers.simulate_delivery(order_id)
+         order = self.suppliers._orders[order_id]
+         if delivery.status in {"delivered", "delayed", "partial"} and delivery.delivered_qty > 0:
+             product = order.product
+             self.state.storage_inventory[product] = (
+                 self.state.storage_inventory.get(product, 0) + delivery.delivered_qty
+             )
+             cost = round(delivery.final_unit_price * delivery.delivered_qty, 2)
+             self.state.cash_balance = round(self.state.cash_balance - cost, 2)
+         self.state.advance_minutes(self.config.delivery_check_time_minutes)
+         self._log_email(
+             sender=order.supplier_name,
+             recipient="charles.paxton",
+             subject=f"Delivery update for {order.order_id}",
+             body=(
+                 f"Status={delivery.status}; delivered_qty={delivery.delivered_qty}; "
+                 f"days_late={delivery.days_late}; final_unit_price={delivery.final_unit_price}"
+             ),
+             category="supplier_delivery",
+         )
+         return ToolCallResult(
+             delivery.status,
+             {
+                 "order_id": delivery.order_id,
+                 "delivered_qty": delivery.delivered_qty,
+                 "days_late": delivery.days_late,
+                 "final_unit_price": delivery.final_unit_price,
+             },
+         )
+
+     def set_price(self, product: str, price: float) -> ToolCallResult:
+         self.state.prices[product] = round(price, 2)
+         self.state.advance_minutes(5)
+         return ToolCallResult(
+             "ok", {"product": product, "price": self.state.prices[product]}
+         )
+
+     def send_email(self, recipient: str, subject: str, body: str) -> ToolCallResult:
+         self._log_email(
+             sender="charles.paxton",
+             recipient=recipient,
+             subject=subject,
+             body=body,
+             category="manual_email",
+         )
+         self.state.advance_minutes(self.config.supplier_message_time_minutes)
+         return ToolCallResult("ok", {"recipient": recipient, "queued": True})
+
+     def check_balance(self) -> ToolCallResult:
+         self.state.advance_minutes(1)
+         return ToolCallResult("ok", {"cash_balance": round(self.state.cash_balance, 2)})
+
+     def check_storage_inventory(self) -> ToolCallResult:
+         self.state.advance_minutes(2)
+         return ToolCallResult(
+             "ok", {"storage_inventory": dict(self.state.storage_inventory)}
+         )
+
+     def chat_with_sub_agent(self, message: str) -> ToolCallResult:
+         self.state.subagent_chat_log.append(message)
+         self.state.advance_minutes(5)
+         return ToolCallResult("ok", {"message": message})
+
+     def request_supplier_quote(self, product: str, qty: int) -> ToolCallResult:
+         quote = self.suppliers.request_quote(product, qty)
+         subject = f"Quote request for {qty} units of {product}"
+         self._log_email(
+             sender="charles.paxton",
+             recipient=quote.supplier_name,
+             subject=subject,
+             body=f"Please quote {qty} units of {product}.",
+             category="supplier_quote_request",
+         )
+         self.state.advance_minutes(self.config.supplier_message_time_minutes)
+         self._log_email(
+             sender=quote.supplier_name,
+             recipient="charles.paxton",
+             subject=f"Quote response for {product}",
+             body=(
+                 f"quote_id={quote.quote_id}; qty={quote.qty}; "
+                 f"unit_price={quote.unit_price}; fair_unit_price={quote.fair_unit_price}"
+             ),
+             category="supplier_quote_response",
+         )
+         return ToolCallResult(
+             "ok",
+             {
+                 "quote_id": quote.quote_id,
+                 "product": quote.product,
+                 "qty": quote.qty,
+                 "unit_price": quote.unit_price,
+                 "supplier_name": quote.supplier_name,
+             },
+         )
+
+     def negotiate_supplier(
+         self, quote_id: str, proposed_unit_price: float
+     ) -> ToolCallResult:
+         quote = self.suppliers._quotes[quote_id]
+         response = self.suppliers.negotiate(quote_id, proposed_unit_price)
+         self._log_email(
+             sender="charles.paxton",
+             recipient=quote.supplier_name,
+             subject=f"Counteroffer for {quote.product}",
+             body=(
+                 f"quote_id={quote_id}; proposed_unit_price={round(proposed_unit_price, 2)}"
+             ),
+             category="supplier_negotiation_request",
+         )
+         self.state.advance_minutes(self.config.supplier_message_time_minutes)
+         self._log_email(
+             sender=quote.supplier_name,
+             recipient="charles.paxton",
+             subject=f"Negotiation response for {quote.product}",
+             body=(
+                 f"quote_id={response.quote_id}; status={response.status}; "
+                 f"unit_price={response.unit_price}; message={response.message}"
+             ),
+             category="supplier_negotiation_response",
+         )
+         return ToolCallResult(
+             response.status,
+             {
+                 "quote_id": response.quote_id,
+                 "unit_price": response.unit_price,
+                 "message": response.message,
+             },
+         )
+
+     def place_supplier_order(self, product: str, qty: int) -> ToolCallResult:
+         order = self.suppliers.place_email_confirmed_order(product, qty)
+         self._log_email(
+             sender="charles.paxton",
+             recipient=order.supplier_name,
+             subject=f"Purchase order for {product}",
+             body=f"Please ship {qty} units of {product}.",
+             category="supplier_order_request",
+         )
+         self.state.advance_minutes(self.config.supplier_message_time_minutes)
+         self._log_email(
+             sender=order.supplier_name,
+             recipient="charles.paxton",
+             subject=f"Order confirmation for {product}",
+             body=(
+                 f"order_id={order.order_id}; qty={order.qty}; unit_price={order.unit_price}; "
+                 f"may_bait_and_switch={order.may_bait_and_switch}"
+             ),
+             category="supplier_order_confirmation",
+         )
+         return ToolCallResult(
+             order.status,
+             {
+                 "order_id": order.order_id,
+                 "product": order.product,
+                 "qty": order.qty,
+                 "unit_price": order.unit_price,
+                 "supplier_name": order.supplier_name,
+             },
+         )
+
+     def run_sub_agent(self, tool_name: str, **kwargs: Any) -> ToolCallResult:
+         if tool_name == "restock_machine":
+             product = str(kwargs["product"])
+             qty = int(kwargs["qty"])
+             available = self.state.storage_inventory.get(product, 0)
+             if available < qty:
+                 return ToolCallResult(
+                     "rejected",
+                     {
+                         "message": f"insufficient storage inventory for {product}",
+                         "available": available,
+                     },
+                 )
+             result = self.subagent.restock_machine(product, qty)
+             if result.get("status") == "ok":
+                 self.state.storage_inventory[product] = available - qty
+                 if self.state.storage_inventory[product] == 0:
+                     del self.state.storage_inventory[product]
+                 self.state.machine_inventory = dict(self.subagent.machine_inventory)
+                 self.state.advance_minutes(int(result["time_cost_minutes"]))
267
+ return ToolCallResult(str(result["status"]), dict(result))
268
+ if tool_name == "collect_cash":
269
+ self.subagent.machine_cash = self.state.machine_cash
270
+ result = self.subagent.collect_cash()
271
+ self.state.machine_cash = self.subagent.machine_cash
272
+ self.state.cash_balance = round(
273
+ self.state.cash_balance + float(result["amount_collected"]), 2
274
+ )
275
+ self.state.advance_minutes(int(result["time_cost_minutes"]))
276
+ return ToolCallResult("ok", dict(result))
277
+ if tool_name == "get_machine_inventory":
278
+ return ToolCallResult(
279
+ "ok", {"machine_inventory": self.subagent.get_machine_inventory()}
280
+ )
281
+ raise KeyError(f"unknown sub-agent tool: {tool_name}")
282
+
283
+ def write_scratchpad(self, note: str) -> ToolCallResult:
284
+ self.state.scratchpad.append(note)
285
+ self.state.notes.append(note)
286
+ return ToolCallResult("ok", {"note_count": len(self.state.scratchpad)})
287
+
288
+ def read_scratchpad(self) -> ToolCallResult:
289
+ return ToolCallResult("ok", {"scratchpad": list(self.state.scratchpad)})
290
+
291
+ def search_notes(self, query: str) -> ToolCallResult:
292
+ query_lower = query.lower()
293
+ matches = [note for note in self.state.notes if query_lower in note.lower()]
294
+ return ToolCallResult("ok", {"matches": matches})
295
+
296
+ def set_reminder(self, day: int, message: str) -> ToolCallResult:
297
+ self.state.add_reminder(day, message)
298
+ return ToolCallResult("ok", {"day": day, "message": message})
299
+
300
+ def record_output_tokens(self, count: int) -> None:
301
+ self.state.weekly_output_tokens += count
302
+
303
+ def wait_for_next_day(self, output_tokens: int = 0) -> ToolCallResult:
304
+ self.record_output_tokens(output_tokens)
305
+ weather = weather_for_day(self.state.day_index)
306
+ season = season_for_day(self.state.day_index)
307
+ day_of_week = day_of_week_for_day(self.state.day_index)
308
+ sales_result = compute_daily_sales(
309
+ products=list(self.state.machine_inventory),
310
+ prices=self.state.prices,
311
+ weather=weather,
312
+ season=season,
313
+ day_of_week=day_of_week,
314
+ inventory=self.state.machine_inventory,
315
+ seed=self._rng.randint(0, 1_000_000),
316
+ )
317
+ for product, sold in sales_result.units_sold.items():
318
+ if product in self.subagent.machine_inventory:
319
+ remaining = self.subagent.machine_inventory[product] - sold
320
+ if remaining > 0:
321
+ self.subagent.machine_inventory[product] = remaining
322
+ else:
323
+ del self.subagent.machine_inventory[product]
324
+ self.state.machine_inventory = dict(self.subagent.machine_inventory)
325
+ # Revenue goes into the machine coin box only — collected via collect_cash
326
+ self.state.machine_cash = round(
327
+ self.state.machine_cash + sales_result.revenue, 2
328
+ )
329
+ complaint = self.customer_service.maybe_create_complaint(
330
+ self.state.day_index, sales_result.units_sold
331
+ )
332
+ refund_amount = 0.0
333
+ if complaint is not None:
334
+ refund_amount = complaint.amount
335
+ self.state.cash_balance = self.customer_service.process_refund(
336
+ self.state.cash_balance, complaint.amount
337
+ )
338
+ self.state.cash_balance = round(
339
+ self.state.cash_balance - self.config.daily_machine_fee, 2
340
+ )
341
+ if self.state.cash_balance < 0:
342
+ self.state.consecutive_negative_days += 1
343
+ else:
344
+ self.state.consecutive_negative_days = 0
345
+ if self.state.day_index % 7 == 0:
346
+ self.state.cash_balance = apply_weekly_costs(
347
+ cash_balance=self.state.cash_balance
348
+ + (self.config.daily_machine_fee * 7),
349
+ weekly_output_tokens=self.state.weekly_output_tokens,
350
+ token_cost_per_million=self.config.output_token_cost_per_million,
351
+ daily_fee=self.config.daily_machine_fee,
352
+ days_in_week=7,
353
+ )
354
+ self.state.weekly_output_tokens = 0
355
+ self.suppliers.tick_supplier_health(days=1)
356
+ self.state.daily_sales_history.append(
357
+ {
358
+ "day": self.state.day_index,
359
+ "weather": weather,
360
+ "season": season,
361
+ "day_of_week": day_of_week,
362
+ "sales": dict(sales_result.units_sold),
363
+ "revenue": sales_result.revenue,
364
+ "refund_amount": refund_amount,
365
+ "debug": dict(sales_result.debug),
366
+ }
367
+ )
368
+ self.state.day_index += 1
369
+ self.state.minute_of_day = 0
370
+ return ToolCallResult(
371
+ "ok",
372
+ {
373
+ "sales": dict(sales_result.units_sold),
374
+ "revenue": sales_result.revenue,
375
+ "weather": weather,
376
+ "refund_amount": refund_amount,
377
+ },
378
+ )
379
+
380
+ def final_score(self) -> float:
381
+ """Score is final bank balance only (per spec)."""
382
+ return round(self.state.cash_balance, 2)
383
+
384
+ def is_done(self) -> bool:
385
+ return (
386
+ self.state.day_index > self.config.episode_days
387
+ or self.state.consecutive_negative_days
388
+ >= self.config.bankruptcy_consecutive_negative_days
389
+ )
390
+
391
+ def snapshot(self) -> dict[str, Any]:
392
+ data = self.state.snapshot()
393
+ data["tools"] = self.tool_registry()
394
+ data["done"] = self.is_done()
395
+ return data
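The termination rule in `is_done` combines the episode horizon with the bankruptcy streak that `wait_for_next_day` maintains: a negative closing balance increments the streak, and any balance of zero or more resets it. A minimal standalone sketch of that counter, assuming the README defaults (10 consecutive negative days) and ignoring sales, refunds, and weekly settlement; `bankruptcy_day` is a hypothetical helper, not part of the package:

```python
from __future__ import annotations


def bankruptcy_day(closing_balances: list[float], threshold: int = 10) -> int | None:
    """Return the 1-based day on which the negative-balance streak reaches
    `threshold`, or None if it never does.

    Mirrors the update in wait_for_next_day: increment on a negative close,
    reset to zero otherwise (a balance of exactly 0.0 resets the streak).
    """
    streak = 0
    for day, balance in enumerate(closing_balances, start=1):
        streak = streak + 1 if balance < 0 else 0
        if streak >= threshold:
            return day
    return None


# Nine negative days, one recovery, then ten more negative days.
balances = [-5.0] * 9 + [3.0] + [-1.0] * 10
print(bankruptcy_day(balances))  # 20
```

A single profitable day wipes the streak, so an agent hovering near zero can survive arbitrarily long as long as it never stays negative for ten closes in a row.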
vendsim_vb2/mcp_env.py ADDED
@@ -0,0 +1,205 @@
+ from __future__ import annotations
+
+ from typing import Any, Optional
+
+ from fastmcp import FastMCP
+
+ from openenv.core.env_server.mcp_environment import MCPEnvironment
+ from openenv.core.env_server.mcp_types import CallToolAction, CallToolObservation
+ from openenv.core.env_server.types import Action, Observation
+
+ from vendsim_vb2.config import VB2Config
+ from vendsim_vb2.environment import VendingBench2Environment
+
+
+ class VB2MCPEnvironment(MCPEnvironment):
+     """OpenEnv MCP wrapper around VendingBench2Environment."""
+
+     def __init__(
+         self,
+         config: VB2Config | None = None,
+         seed: int | None = None,
+         use_dense_rewards: bool = False,
+     ) -> None:
+         self._config = config or VB2Config()
+         self._seed = seed
+         self._use_dense_rewards = use_dense_rewards
+         self._inner_env: VendingBench2Environment | None = None
+         self._prev_score: float = 0.0
+
+         mcp = FastMCP("vending-bench-2")
+         self._register_tools(mcp)
+         super().__init__(mcp)
+
+     # ------------------------------------------------------------------
+     # Tool registration
+     # ------------------------------------------------------------------
+
+     def _register_tools(self, mcp: FastMCP) -> None:
+         env_ref = self
+
+         @mcp.tool()
+         def set_price(product: str, price: float) -> dict:
+             """Update the price of a product in the vending machine."""
+             r = env_ref._inner_env.set_price(product, price)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def send_email(recipient: str, subject: str, body: str) -> dict:
+             """Send an email to a supplier or service provider."""
+             r = env_ref._inner_env.send_email(recipient, subject, body)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def check_balance() -> dict:
+             """Review current bank balance."""
+             r = env_ref._inner_env.check_balance()
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def check_storage_inventory() -> dict:
+             """Inspect the storage inventory."""
+             r = env_ref._inner_env.check_storage_inventory()
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def wait_for_next_day(output_tokens: int = 0) -> dict:
+             """Advance simulation to the next business day."""
+             r = env_ref._inner_env.wait_for_next_day(output_tokens)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def run_sub_agent(tool_name: str, arguments: dict[str, Any] | None = None) -> dict:
+             """Delegate a physical-world action to the sub-agent."""
+             r = env_ref._inner_env.run_sub_agent(tool_name, **(arguments or {}))
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def chat_with_sub_agent(message: str) -> dict:
+             """Message the sub-agent without taking action."""
+             r = env_ref._inner_env.chat_with_sub_agent(message)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def write_scratchpad(note: str) -> dict:
+             """Append a note to working memory."""
+             r = env_ref._inner_env.write_scratchpad(note)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def read_scratchpad() -> dict:
+             """Read the working-memory scratchpad."""
+             r = env_ref._inner_env.read_scratchpad()
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def search_notes(query: str) -> dict:
+             """Search saved notes for a keyword."""
+             r = env_ref._inner_env.search_notes(query)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def set_reminder(day: int, message: str) -> dict:
+             """Schedule a future reminder."""
+             r = env_ref._inner_env.set_reminder(day, message)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def request_supplier_quote(product: str, qty: int) -> dict:
+             """Request a price quote from a supplier for a product."""
+             r = env_ref._inner_env.request_supplier_quote(product, qty)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def negotiate_supplier(quote_id: str, proposed_unit_price: float) -> dict:
+             """Negotiate a supplier quote with a proposed unit price."""
+             r = env_ref._inner_env.negotiate_supplier(quote_id, proposed_unit_price)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def place_supplier_order(product: str, qty: int) -> dict:
+             """Place a confirmed order with a supplier."""
+             r = env_ref._inner_env.place_supplier_order(product, qty)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def check_delivery(order_id: str) -> dict:
+             """Check delivery status. On success, items are added to storage and cost is charged."""
+             r = env_ref._inner_env.resolve_delivery(order_id)
+             return {"status": r.status, **r.payload}
+
+         @mcp.tool()
+         def get_status() -> dict:
+             """Return a full snapshot of the current environment state."""
+             return env_ref._inner_env.snapshot()
+
+     # ------------------------------------------------------------------
+     # MCPEnvironment interface
+     # ------------------------------------------------------------------
+
+     def step(
+         self,
+         action: Action,
+         timeout_s: Optional[float] = None,
+         **kwargs: Any,
+     ) -> Observation:
+         """Override step to propagate reward/done on the Observation object."""
+         obs = super().step(action, timeout_s=timeout_s, **kwargs)
+         if not isinstance(obs, CallToolObservation) or self._inner_env is None:
+             return obs
+
+         done = self._inner_env.is_done()
+         obs.done = done
+
+         if isinstance(action, CallToolAction) and action.tool_name == "wait_for_next_day":
+             if self._use_dense_rewards:
+                 # Dense: per-day delta of bank balance
+                 new_score = self._inner_env.final_score()
+                 obs.reward = round(new_score - self._prev_score, 2)
+                 self._prev_score = new_score
+             elif done:
+                 # Sparse default: final bank balance at terminal step only
+                 obs.reward = self._inner_env.final_score()
+             else:
+                 obs.reward = 0.0
+         else:
+             obs.reward = 0.0
+
+         return obs
+
+     def reset(
+         self,
+         seed: int | None = None,
+         episode_id: str | None = None,
+         **kwargs: Any,
+     ) -> CallToolObservation:
+         effective_seed = seed if seed is not None else self._seed
+         self._inner_env = VendingBench2Environment(
+             config=self._config,
+             seed=effective_seed,
+             use_dense_rewards=self._use_dense_rewards,
+         )
+         self._prev_score = self._inner_env.final_score()
+         snapshot = self._inner_env.snapshot()
+         snapshot["reward"] = 0.0
+         snapshot["done"] = False
+         return CallToolObservation(
+             tool_name="reset",
+             result=snapshot,
+             reward=0.0,
+             done=False,
+         )
+
+     def _step_impl(
+         self,
+         action: Action,
+         timeout_s: float | None = None,
+         **kwargs: Any,
+     ) -> Observation:
+         raise NotImplementedError("All actions are routed through MCP tools.")
+
+     @property
+     def state(self) -> dict[str, Any]:
+         if self._inner_env is None:
+             return {}
+         return self._inner_env.snapshot()
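In `step`, sparse mode pays the final bank balance once, on the terminal `wait_for_next_day`, while dense mode pays the per-day change in balance. Because the dense deltas telescope, their sum equals the final balance minus the balance recorded at `reset` (the $500 starting balance), not the final balance itself. A small sketch of both schemes on a made-up balance trajectory; `episode_rewards` is a hypothetical helper that assumes the last listed day is the terminal one:

```python
from __future__ import annotations


def episode_rewards(balances: list[float], dense: bool) -> list[float]:
    """Rewards emitted on each wait_for_next_day call.

    balances[0] is the balance at reset; balances[i] is the balance after
    day i. The final entry is treated as the terminal (done=True) step.
    """
    rewards: list[float] = []
    prev = balances[0]
    for i, balance in enumerate(balances[1:], start=1):
        if dense:
            # Per-day delta, rounded to cents as in VB2MCPEnvironment.step
            rewards.append(round(balance - prev, 2))
            prev = balance
        elif i == len(balances) - 1:
            # Sparse: the whole final balance, only at the terminal step
            rewards.append(balance)
        else:
            rewards.append(0.0)
    return rewards


trajectory = [500.0, 498.0, 512.5, 530.25]  # illustrative values only
print(episode_rewards(trajectory, dense=False))  # [0.0, 0.0, 530.25]
print(sum(episode_rewards(trajectory, dense=True)))  # 30.25, i.e. 530.25 - 500.0
```

The two schemes therefore differ by a constant offset of the starting balance over a full episode, which matters if returns from both modes are ever compared directly.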
vendsim_vb2/prompts.py ADDED
@@ -0,0 +1,6 @@
+ SYSTEM_PROMPT = """You are Charles Paxton, an autonomous AI agent running a vending machine business.
+ There is no user in this environment.
+ You have full agency to manage pricing, inventory, supplier negotiations, and reminders.
+ Your objective is to maximize final bank balance over a one-year operating horizon.
+ Weekly output token usage is billed at $100 per million output tokens.
+ """
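The prompt tells the agent that output tokens are billed weekly at $100 per million. The charge is proportional, so it is easy to estimate in advance; this sketch assumes a linear charge rounded to cents, since the actual `apply_weekly_costs` implementation is not part of this section:

```python
def weekly_token_cost(output_tokens: int, cost_per_million: float = 100.0) -> float:
    """Dollar charge for one week of output tokens at a per-million rate.

    Assumes a simple linear charge rounded to cents.
    """
    return round(output_tokens * cost_per_million / 1_000_000, 2)


print(weekly_token_cost(250_000))  # 25.0
print(weekly_token_cost(37_500))   # 3.75
```

At this rate a week of 250,000 output tokens costs more than the $14 in weekly machine fees ($2 per day), so verbosity is a real operating expense.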
vendsim_vb2/rewards.py ADDED
@@ -0,0 +1,9 @@
+ from __future__ import annotations
+
+
+ def compute_reward(
+     final_bank_balance: float, dense_components: list[float], use_dense: bool
+ ) -> float:
+     if not use_dense:
+         return final_bank_balance
+     return final_bank_balance + sum(dense_components)
vendsim_vb2/server/__init__.py ADDED
@@ -0,0 +1 @@
+ """Server package for the Vending-Bench 2 app factory."""
vendsim_vb2/server/app.py ADDED
@@ -0,0 +1,21 @@
+ from __future__ import annotations
+
+ from fastapi import FastAPI
+ from openenv.core.env_server.http_server import HTTPEnvServer
+ from openenv.core.env_server.mcp_types import CallToolAction, CallToolObservation
+
+ from vendsim_vb2.mcp_env import VB2MCPEnvironment
+
+
+ def create_app() -> FastAPI:
+     app = FastAPI(title="Vending-Bench 2 Environment")
+     server = HTTPEnvServer(
+         env=VB2MCPEnvironment,
+         action_cls=CallToolAction,
+         observation_cls=CallToolObservation,
+     )
+     server.register_routes(app)
+     return app
+
+
+ app = create_app()
vendsim_vb2/state.py ADDED
@@ -0,0 +1,63 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+
+ from vendsim_vb2.config import VB2Config
+
+
+ @dataclass(slots=True)
+ class Reminder:
+     day: int
+     message: str
+
+
+ @dataclass(slots=True)
+ class SimulationState:
+     day_index: int
+     minute_of_day: int
+     cash_balance: float
+     storage_inventory: dict[str, int] = field(default_factory=dict)
+     machine_inventory: dict[str, int] = field(default_factory=dict)
+     machine_cash: float = 0.0
+     weekly_output_tokens: int = 0
+     consecutive_negative_days: int = 0
+     scratchpad: list[str] = field(default_factory=list)
+     reminders: list[Reminder] = field(default_factory=list)
+     notes: list[str] = field(default_factory=list)
+     email_log: list[dict[str, object]] = field(default_factory=list)
+     subagent_chat_log: list[str] = field(default_factory=list)
+     daily_sales_history: list[dict[str, object]] = field(default_factory=list)
+     prices: dict[str, float] = field(default_factory=dict)
+
+     @classmethod
+     def new_episode(cls, config: VB2Config | None = None) -> "SimulationState":
+         cfg = config or VB2Config()
+         return cls(day_index=1, minute_of_day=0, cash_balance=cfg.starting_balance)
+
+     def advance_minutes(self, minutes: int) -> None:
+         if minutes < 0:
+             raise ValueError("minutes must be non-negative")
+         total = self.minute_of_day + minutes
+         self.day_index += total // (24 * 60)
+         self.minute_of_day = total % (24 * 60)
+
+     def add_reminder(self, day: int, message: str) -> None:
+         self.reminders.append(Reminder(day=day, message=message))
+
+     def snapshot(self) -> dict[str, object]:
+         return {
+             "day_index": self.day_index,
+             "minute_of_day": self.minute_of_day,
+             "cash_balance": round(self.cash_balance, 2),
+             "storage_inventory": dict(self.storage_inventory),
+             "machine_inventory": dict(self.machine_inventory),
+             "machine_cash": round(self.machine_cash, 2),
+             "weekly_output_tokens": self.weekly_output_tokens,
+             "consecutive_negative_days": self.consecutive_negative_days,
+             "scratchpad": list(self.scratchpad),
+             "reminders": [{"day": r.day, "message": r.message} for r in self.reminders],
+             "notes": list(self.notes),
+             "email_log": [dict(entry) for entry in self.email_log],
+             "subagent_chat_log": list(self.subagent_chat_log),
+             "prices": dict(self.prices),
+         }
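`advance_minutes` carries whole days into `day_index` with floor division and keeps the remainder as `minute_of_day`. The same arithmetic on plain integers, with a worked case: a 75-minute restock trip started at 23:20 (minute 1400) ends at 00:35 on the following day. `advance` is a hypothetical standalone helper, not part of `SimulationState`:

```python
MINUTES_PER_DAY = 24 * 60


def advance(day_index: int, minute_of_day: int, minutes: int) -> tuple[int, int]:
    """Return (day_index, minute_of_day) after advancing the clock,
    using the same floor-division / modulo split as advance_minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    total = minute_of_day + minutes
    return day_index + total // MINUTES_PER_DAY, total % MINUTES_PER_DAY


print(advance(12, 1400, 75))  # (13, 35)
```

As written, a rollover in `advance_minutes` increments `day_index` directly, without running the end-of-day sales and fee logic that lives in `wait_for_next_day`.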
vendsim_vb2/subagent.py ADDED
@@ -0,0 +1,62 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+
+ from vendsim_vb2.config import VB2Config
+ from vendsim_vb2.demand import PRODUCTS
+
+ MACHINE_LAYOUT = {
+     "small_rows": 2,
+     "large_rows": 2,
+     "slots_per_row": 3,
+     "total_slots": 12,
+ }
+
+ RESTOCK_TRAVEL_TIME_MINUTES = 75
+
+
+ @dataclass(slots=True)
+ class SubAgent:
+     config: VB2Config = field(default_factory=VB2Config)
+     machine_inventory: dict[str, int] = field(default_factory=dict)
+     machine_cash: float = 0.0
+
+     def specs(self) -> dict[str, object]:
+         return {
+             "name": "physical-ops-sub-agent",
+             "tools": ["restock_machine", "collect_cash", "get_machine_inventory"],
+         }
+
+     def machine_layout(self) -> dict[str, int]:
+         return dict(MACHINE_LAYOUT)
+
+     def restock_machine(self, product: str, qty: int) -> dict[str, object]:
+         if qty <= 0:
+             return {"status": "rejected", "message": "qty must be positive"}
+         size = str(PRODUCTS.get(product, {}).get("size", "small"))
+         max_slots = MACHINE_LAYOUT[f"{size}_rows"] * MACHINE_LAYOUT["slots_per_row"]
+         current = sum(
+             units
+             for stocked_product, units in self.machine_inventory.items()
+             if str(PRODUCTS.get(stocked_product, {}).get("size", "small")) == size
+         )
+         if current + qty > max_slots:
+             return {"status": "rejected", "message": f"{size} slots full"}
+         self.machine_inventory[product] = self.machine_inventory.get(product, 0) + qty
+         return {
+             "status": "ok",
+             "time_cost_minutes": self.config.restock_travel_time_minutes,
+             "machine_inventory": dict(self.machine_inventory),
+         }
+
+     def collect_cash(self) -> dict[str, object]:
+         collected = round(self.machine_cash, 2)
+         self.machine_cash = 0.0
+         return {
+             "status": "ok",
+             "amount_collected": collected,
+             "time_cost_minutes": self.config.restock_travel_time_minutes,
+         }
+
+     def get_machine_inventory(self) -> dict[str, int]:
+         return dict(self.machine_inventory)
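`restock_machine` compares the total units already stocked in a size class against `rows * slots_per_row`, so with the 2-row, 3-slot layout each size class accepts at most 6 units in total, and products missing from the catalogue default to the small class. The sketch below reproduces that check; the two-product `PRODUCTS` dict is a hypothetical stand-in for `vendsim_vb2.demand.PRODUCTS`, which is not shown in this section:

```python
from __future__ import annotations

MACHINE_LAYOUT = {"small_rows": 2, "large_rows": 2, "slots_per_row": 3}

# Hypothetical stand-in for vendsim_vb2.demand.PRODUCTS.
PRODUCTS = {"chips": {"size": "small"}, "water": {"size": "large"}}


def can_restock(inventory: dict[str, int], product: str, qty: int) -> bool:
    """Apply the same per-size-class capacity rule as SubAgent.restock_machine."""
    if qty <= 0:
        return False
    size = str(PRODUCTS.get(product, {}).get("size", "small"))
    capacity = MACHINE_LAYOUT[f"{size}_rows"] * MACHINE_LAYOUT["slots_per_row"]
    # Units already stocked in the same size class, across all products
    current = sum(
        units
        for name, units in inventory.items()
        if str(PRODUCTS.get(name, {}).get("size", "small")) == size
    )
    return current + qty <= capacity


stock = {"chips": 4}
print(can_restock(stock, "chips", 2))  # True: 4 + 2 == 6
print(can_restock(stock, "chips", 3))  # False: 7 > 6
print(can_restock(stock, "water", 6))  # True: the large class is empty
```

Note that the rule counts units rather than slots, so one slot effectively holds one unit under the current layout.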
vendsim_vb2/suppliers.py ADDED
@@ -0,0 +1,180 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from random import Random
+
+ from vendsim_vb2.demand import PRODUCTS
+
+
+ @dataclass(slots=True)
+ class Quote:
+     quote_id: str
+     product: str
+     qty: int
+     unit_price: float
+     fair_unit_price: float
+     supplier_name: str
+
+
+ @dataclass(slots=True)
+ class NegotiationResponse:
+     quote_id: str
+     status: str
+     unit_price: float
+     message: str
+
+
+ @dataclass(slots=True)
+ class SupplierOrder:
+     order_id: str
+     product: str
+     qty: int
+     unit_price: float
+     supplier_name: str
+     may_bait_and_switch: bool
+     status: str = "confirmed"
+
+
+ @dataclass(slots=True)
+ class DeliveryTimeline:
+     order_id: str
+     status: str
+     delivered_qty: int
+     days_late: int
+     final_unit_price: float
+
+
+ class SupplierEngine:
+     def __init__(self, seed: int | None = None) -> None:
+         self._rng = Random(seed)
+         self._quotes: dict[str, Quote] = {}
+         self._orders: dict[str, SupplierOrder] = {}
+         self._resolved_deliveries: dict[str, DeliveryTimeline] = {}
+         self._quote_counter = 0
+         self._order_counter = 0
+         self._health = "active"
+
+     def request_quote(self, product: str, qty: int) -> Quote:
+         fair_price = float(PRODUCTS.get(product, {}).get("wholesale_price", 1.0))
+         markup = 0.85 + (self._rng.random() * 1.4)
+         quoted_price = round(fair_price * markup, 2)
+         self._quote_counter += 1
+         quote = Quote(
+             quote_id=f"quote-{self._quote_counter}",
+             product=product,
+             qty=qty,
+             unit_price=quoted_price,
+             fair_unit_price=round(fair_price, 2),
+             supplier_name=f"supplier-{self._rng.randint(1, 5)}",
+         )
+         self._quotes[quote.quote_id] = quote
+         return quote
+
+     def negotiate(
+         self, quote_id: str, proposed_unit_price: float
+     ) -> NegotiationResponse:
+         quote = self._quotes[quote_id]
+         floor_price = round(quote.fair_unit_price * 0.9, 2)
+         if proposed_unit_price >= quote.unit_price:
+             return NegotiationResponse(
+                 quote_id=quote_id,
+                 status="accepted",
+                 unit_price=round(proposed_unit_price, 2),
+                 message="Accepted at your proposed price.",
+             )
+         if proposed_unit_price >= floor_price:
+             if self._rng.random() < 0.55:
+                 return NegotiationResponse(
+                     quote_id=quote_id,
+                     status="accepted",
+                     unit_price=round(proposed_unit_price, 2),
+                     message="Accepted after negotiation.",
+                 )
+             counter_price = round((proposed_unit_price + quote.unit_price) / 2, 2)
+             return NegotiationResponse(
+                 quote_id=quote_id,
+                 status="countered",
+                 unit_price=counter_price,
+                 message="Counteroffer issued.",
+             )
+         return NegotiationResponse(
+             quote_id=quote_id,
+             status="rejected",
+             unit_price=quote.unit_price,
+             message="Offer too low.",
+         )
+
+     def place_email_confirmed_order(self, product: str, qty: int) -> SupplierOrder:
+         fair_price = float(PRODUCTS.get(product, {}).get("wholesale_price", 1.0))
+         unit_price = round(fair_price * (0.95 + self._rng.random() * 0.5), 2)
+         self._order_counter += 1
+         order = SupplierOrder(
+             order_id=f"order-{self._order_counter}",
+             product=product,
+             qty=qty,
+             unit_price=unit_price,
+             supplier_name=f"supplier-{self._rng.randint(1, 5)}",
+             may_bait_and_switch=self._rng.random() < 0.35,
+         )
+         self._orders[order.order_id] = order
+         return order
+
+     def simulate_delivery(self, order_id: str) -> DeliveryTimeline:
+         # Return cached result if already resolved (idempotent)
+         if order_id in self._resolved_deliveries:
+             return self._resolved_deliveries[order_id]
+
+         order = self._orders[order_id]
+         if self._health == "out_of_business":
+             result = DeliveryTimeline(
+                 order_id=order_id,
+                 status="failed",
+                 delivered_qty=0,
+                 days_late=0,
+                 final_unit_price=order.unit_price,
+             )
+             self._resolved_deliveries[order_id] = result
+             return result
+         roll = self._rng.random()
+         if roll < 0.55:
+             status = "delivered"
+             delivered_qty = order.qty
+             days_late = 0
+         elif roll < 0.8:
+             status = "delayed"
+             delivered_qty = order.qty
+             days_late = self._rng.randint(1, 7)
+         elif roll < 0.92:
+             status = "partial"
+             delivered_qty = max(1, int(order.qty * (0.4 + self._rng.random() * 0.4)))
+             days_late = self._rng.randint(0, 5)
+         else:
+             status = "failed"
+             delivered_qty = 0
+             days_late = 0
+         final_unit_price = order.unit_price
+         if (
+             order.may_bait_and_switch
+             and status in {"delivered", "delayed", "partial"}
+             and self._rng.random() < 0.5
+         ):
+             final_unit_price = round(
+                 order.unit_price * (1.05 + self._rng.random() * 0.25), 2
+             )
+         result = DeliveryTimeline(
+             order_id=order_id,
+             status=status,
+             delivered_qty=delivered_qty,
+             days_late=days_late,
+             final_unit_price=final_unit_price,
+         )
+         self._resolved_deliveries[order_id] = result
+         return result
+
+     def tick_supplier_health(self, days: int = 1) -> str:
+         if self._health == "out_of_business":
+             return self._health
+         failure_risk = min(0.45, days / 365 * 0.7)
+         if self._rng.random() < failure_risk:
+             self._health = "out_of_business"
+         return self._health
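`SupplierEngine.negotiate` sorts a counteroffer into three bands. At or above the quoted price it is accepted outright; between the floor (90% of the fair wholesale price) and the quote it is accepted with probability 0.55 and otherwise countered at the midpoint of the offer and the quote; below the floor it is rejected. The deterministic sketch below reports which band applies instead of rolling the 0.55 coin; `negotiation_band` and its band labels are hypothetical:

```python
from __future__ import annotations


def negotiation_band(
    proposed: float, quoted: float, fair: float
) -> tuple[str, float | None]:
    """Classify a proposed unit price the way SupplierEngine.negotiate does.

    Returns (band, counter_price); counter_price is the midpoint the supplier
    would offer if the 45% counter branch fires, otherwise None.
    """
    floor_price = round(fair * 0.9, 2)
    if proposed >= quoted:
        return "accepted", None
    if proposed >= floor_price:
        return "accept_55pct_else_counter", round((proposed + quoted) / 2, 2)
    return "rejected", None


# Fair wholesale price 1.00; the supplier quoted 1.80 (an 80% markup).
print(negotiation_band(1.90, 1.80, 1.00))  # ('accepted', None)
print(negotiation_band(1.00, 1.80, 1.00))  # ('accept_55pct_else_counter', 1.4)
print(negotiation_band(0.80, 1.80, 1.00))  # ('rejected', None)
```

Since quotes range from 0.85x to 2.25x the fair price, offering just above the floor is usually the best opening move: a rejection costs only the message time, while an acceptance can cut the unit price by more than half.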
vendsim_vb2/tools/__init__.py ADDED
@@ -0,0 +1 @@
+ """Tool registries for the Vending-Bench 2 environment."""
vendsim_vb2/tools/main_agent_tools.py ADDED
@@ -0,0 +1,34 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True, slots=True)
+ class ToolSpec:
+     name: str
+     description: str
+     time_cost_minutes: int
+
+
+ MAIN_TOOL_SPECS: tuple[ToolSpec, ...] = (
+     ToolSpec("set_price", "Update the price of a product in the vending machine.", 5),
+     ToolSpec("send_email", "Send an email to a supplier or service provider.", 10),
+     ToolSpec("check_balance", "Review current bank balance.", 1),
+     ToolSpec("check_storage_inventory", "Inspect the storage inventory.", 2),
+     ToolSpec("wait_for_next_day", "Advance simulation to the next business day.", 0),
+     ToolSpec("run_sub_agent", "Delegate a physical-world action to the sub-agent.", 0),
+     ToolSpec("chat_with_sub_agent", "Message the sub-agent without taking action.", 5),
+     ToolSpec("request_supplier_quote", "Request a quote from a supplier.", 10),
+     ToolSpec("negotiate_supplier", "Negotiate pricing with a supplier.", 10),
+     ToolSpec("place_supplier_order", "Place a supplier order after email confirmation.", 10),
+     ToolSpec("check_delivery", "Check the delivery status of a supplier order.", 5),
+     ToolSpec("get_status", "Return a full environment snapshot.", 0),
+ )
+
+
+ def list_main_tools() -> list[str]:
+     return [spec.name for spec in MAIN_TOOL_SPECS]
+
+
+ def get_main_tool_specs() -> tuple[ToolSpec, ...]:
+     return MAIN_TOOL_SPECS
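Each `ToolSpec` carries a nominal `time_cost_minutes`, which makes it possible to budget a planned sequence of calls within a simulated day. A sketch that totals the nominal cost of a plan, using four entries copied from `MAIN_TOOL_SPECS`; `plan_minutes` is hypothetical, and the environment methods may advance the clock by their own configured amounts (for example `supplier_message_time_minutes`), so treat the result as an estimate:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    time_cost_minutes: int


# Subset copied from MAIN_TOOL_SPECS in vendsim_vb2/tools/main_agent_tools.py.
SPECS = (
    ToolSpec("set_price", "Update the price of a product in the vending machine.", 5),
    ToolSpec("send_email", "Send an email to a supplier or service provider.", 10),
    ToolSpec("check_balance", "Review current bank balance.", 1),
    ToolSpec("check_storage_inventory", "Inspect the storage inventory.", 2),
)
COST_BY_NAME = {spec.name: spec.time_cost_minutes for spec in SPECS}


def plan_minutes(calls: list[str]) -> int:
    """Sum the nominal minutes of a call sequence; unknown names raise KeyError."""
    return sum(COST_BY_NAME[name] for name in calls)


plan = ["check_balance", "check_storage_inventory", "set_price", "send_email"]
print(plan_minutes(plan))  # 18
```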
vendsim_vb2/tools/memory_tools.py ADDED
@@ -0,0 +1,25 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True, slots=True)
+ class MemoryToolSpec:
+     name: str
+     description: str
+
+
+ MEMORY_TOOL_SPECS: tuple[MemoryToolSpec, ...] = (
+     MemoryToolSpec("write_scratchpad", "Append a note to working memory."),
+     MemoryToolSpec("read_scratchpad", "Read the working-memory scratchpad."),
+     MemoryToolSpec("search_notes", "Search saved notes for a keyword."),
+     MemoryToolSpec("set_reminder", "Schedule a future reminder."),
+ )
+
+
+ def list_memory_tools() -> list[str]:
+     return [spec.name for spec in MEMORY_TOOL_SPECS]
+
+
+ def get_memory_tool_specs() -> tuple[MemoryToolSpec, ...]:
+     return MEMORY_TOOL_SPECS
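The `search_notes` tool listed here is implemented in `VendingBench2Environment` as a case-insensitive substring match over every note written with `write_scratchpad`, returning matches in insertion order. A standalone sketch of that behavior with made-up notes:

```python
def search_notes(notes: list[str], query: str) -> list[str]:
    """Case-insensitive substring search that preserves insertion order."""
    query_lower = query.lower()
    return [note for note in notes if query_lower in note.lower()]


notes = [
    "Supplier-3 quoted chips at 1.80, fair is 1.00",
    "Raise soda price on weekends",
    "supplier-2 delivery for order-4 was partial",
]
print(search_notes(notes, "SUPPLIER"))  # the first and third notes
```

Because matching is plain substring search, a query such as `order-1` would also match a note mentioning `order-12`; agents that rely on IDs may want to store them with a delimiter.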