retroam Amp committed on
Commit · 00b2ea2
1 Parent(s): 821b942
Add VendSim VB2 environment
Amp-Thread-ID: https://ampcode.com/threads/T-019cce9e-be2b-718e-880f-eeb8e81cf219
Co-authored-by: Amp <amp@ampcode.com>
- Dockerfile +12 -0
- README.md +120 -10
- pyproject.toml +25 -0
- vendsim_vb2/__init__.py +8 -0
- vendsim_vb2/billing.py +13 -0
- vendsim_vb2/client.py +238 -0
- vendsim_vb2/compat.py +36 -0
- vendsim_vb2/config.py +21 -0
- vendsim_vb2/customer_service.py +41 -0
- vendsim_vb2/demand.py +171 -0
- vendsim_vb2/environment.py +395 -0
- vendsim_vb2/mcp_env.py +205 -0
- vendsim_vb2/prompts.py +6 -0
- vendsim_vb2/rewards.py +9 -0
- vendsim_vb2/server/__init__.py +1 -0
- vendsim_vb2/server/app.py +21 -0
- vendsim_vb2/state.py +63 -0
- vendsim_vb2/subagent.py +62 -0
- vendsim_vb2/suppliers.py +180 -0
- vendsim_vb2/tools/__init__.py +1 -0
- vendsim_vb2/tools/main_agent_tools.py +34 -0
- vendsim_vb2/tools/memory_tools.py +25 -0
Dockerfile
ADDED
@@ -0,0 +1,12 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+COPY pyproject.toml README.md ./
+COPY vendsim_vb2/ vendsim_vb2/
+
+RUN pip install --no-cache-dir ".[server]"
+
+EXPOSE 7860
+
+CMD ["uvicorn", "vendsim_vb2.server.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "7860"]
README.md
CHANGED
@@ -1,10 +1,120 @@
-
-
-
-
-
-
-
-
-
-
+# vendsim-vb2
+
+`vendsim-vb2` is an OpenEnv 0.2.1-compatible implementation of a Vending-Bench 2 style environment.
+
+The agent runs a vending machine business over a 365-day horizon. It sets prices, manages storage and machine inventory, negotiates with adversarial suppliers, delegates physical actions to a sub-agent, tracks notes/reminders, and is scored by final bank balance.
+
+## Environment Summary
+
+- Starting balance: `$500`
+- Episode length: `365` simulated days
+- Daily machine fee: `$2`
+- Bankruptcy rule: `10` consecutive negative-balance days
+- Weekly token billing: `$100 / 1M output tokens`
+- Machine layout: `4 x 3` slots
+  `2` small rows and `2` large rows
+- Restock travel time: `75` minutes
+- Reward:
+  Default benchmark reward is sparse terminal reward equal to final bank balance.
+  Dense shaping is available behind a training flag.
+
+## MCP Tool Surface
+
+Main-agent tools:
+
+- `set_price`
+- `send_email`
+- `check_balance`
+- `check_storage_inventory`
+- `wait_for_next_day`
+- `run_sub_agent`
+- `chat_with_sub_agent`
+- `request_supplier_quote`
+- `negotiate_supplier`
+- `place_supplier_order`
+- `check_delivery`
+- `get_status`
+
+Memory tools:
+
+- `write_scratchpad`
+- `read_scratchpad`
+- `search_notes`
+- `set_reminder`
+
+Sub-agent tools exposed through `run_sub_agent`:
+
+- `restock_machine`
+- `collect_cash`
+- `get_machine_inventory`
+
+## Repository Artifacts
+
+Code:
+
+- Environment server: [vendsim_vb2/server/app.py](./vendsim_vb2/server/app.py)
+- MCP wrapper: [vendsim_vb2/mcp_env.py](./vendsim_vb2/mcp_env.py)
+- Core simulation: [vendsim_vb2/environment.py](./vendsim_vb2/environment.py)
+
+Notebooks:
+
+- Setup verification: [00_setup_verification.ipynb](../notebooks/00_setup_verification.ipynb)
+- Training notebook: [01_vb2_training_grpo.ipynb](../notebooks/01_vb2_training_grpo.ipynb)
+- Final benchmark run: [02_vb2_final_run.ipynb](../notebooks/02_vb2_final_run.ipynb)
+
+Tests:
+
+- Test suite: [tests](./tests)
+
+## Local Setup
+
+From the repository root:
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -e ./vendsim_vb2[server,dev]
+```
+
+Run the tests:
+
+```bash
+PYTHONPATH=vendsim_vb2 pytest vendsim_vb2/tests -q
+```
+
+## Run Locally
+
+Start the OpenEnv-compatible server:
+
+```bash
+PYTHONPATH=vendsim_vb2 python -m uvicorn vendsim_vb2.server.app:create_app --factory --host 0.0.0.0 --port 8000
+```
+
+Then connect with `VB2Client` or use the notebooks.
+
+## Hugging Face Spaces Deployment
+
+Build and verify locally first:
+
+```bash
+cd vendsim_vb2
+docker build -t vendsim-vb2 .
+```
+
+Then deploy with OpenEnv tooling from the repo root after configuring your Hugging Face credentials:
+
+```bash
+openenv push
+```
+
+Submission artifact placeholders:
+
+- HF Space URL: `TODO`
+- Installable package / repo URL: `TODO`
+- Demo video URL: `TODO`
+
+## Training Artifact
+
+A minimal training script in Colab using Unsloth or HF TRL is included:
+
+- [01_vb2_training_grpo.ipynb](../notebooks/01_vb2_training_grpo.ipynb)
pyproject.toml
ADDED
@@ -0,0 +1,25 @@
+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "vendsim-vb2"
+version = "0.1.0"
+description = "OpenEnv-compatible Vending-Bench 2 simulation environment"
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "openenv-core==0.2.1",
+    "fastmcp",
+]
+
+[project.optional-dependencies]
+server = ["fastapi>=0.115", "uvicorn>=0.34"]
+dev = ["pytest>=8.0", "ruff>=0.11"]
+
+[tool.setuptools]
+package-dir = {"" = "."}
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["vendsim_vb2*"]
vendsim_vb2/__init__.py
ADDED
@@ -0,0 +1,8 @@
+"""Vending-Bench 2 environment package."""
+
+from vendsim_vb2.client import VB2Client
+from vendsim_vb2.config import VB2Config
+from vendsim_vb2.environment import VendingBench2Environment
+from vendsim_vb2.mcp_env import VB2MCPEnvironment
+
+__all__ = ["VB2Client", "VB2Config", "VendingBench2Environment", "VB2MCPEnvironment"]
vendsim_vb2/billing.py
ADDED
@@ -0,0 +1,13 @@
+from __future__ import annotations
+
+
+def apply_weekly_costs(
+    cash_balance: float,
+    weekly_output_tokens: int,
+    token_cost_per_million: float,
+    daily_fee: float,
+    days_in_week: int,
+) -> float:
+    token_cost = (weekly_output_tokens / 1_000_000) * token_cost_per_million
+    total_cost = token_cost + (daily_fee * days_in_week)
+    return round(cash_balance - total_cost, 2)
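Review note: a worked example may help check the arithmetic in `apply_weekly_costs`. The sketch below copies the function body from the diff and applies it to one hypothetical week. The 250,000-token usage figure is an assumption chosen for illustration; the `$100` per million rate and `$2` daily fee are the README defaults.

```python
def apply_weekly_costs(
    cash_balance: float,
    weekly_output_tokens: int,
    token_cost_per_million: float,
    daily_fee: float,
    days_in_week: int,
) -> float:
    # Same body as vendsim_vb2/billing.py in this commit.
    token_cost = (weekly_output_tokens / 1_000_000) * token_cost_per_million
    total_cost = token_cost + (daily_fee * days_in_week)
    return round(cash_balance - total_cost, 2)


# Hypothetical week: 250k output tokens billed at $100 per 1M, plus 7 daily fees.
week_end_balance = apply_weekly_costs(
    cash_balance=500.0,
    weekly_output_tokens=250_000,
    token_cost_per_million=100.0,
    daily_fee=2.0,
    days_in_week=7,
)
print(week_end_balance)  # 500 - (25 + 14) = 461.0
```

With no token usage at all, the same call would deduct only the seven daily fees, leaving `486.0`.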
vendsim_vb2/client.py
ADDED
@@ -0,0 +1,238 @@
+"""VB2 environment client for agents and training scripts."""
+
+from __future__ import annotations
+
+from typing import Any, Dict, List, Optional
+
+from openenv.core.env_server.mcp_types import (
+    CallToolAction,
+    CallToolObservation,
+    ListToolsAction,
+    ListToolsObservation,
+    Observation,
+    Tool,
+    ToolError,
+)
+from openenv.core.env_client import EnvClient, StepResult
+from openenv.core.mcp_client import State
+
+
+class VB2Client(EnvClient[Any, Observation, State]):
+    """
+    Client for the Vending-Bench 2 MCP environment.
+
+    Provides typed convenience methods for every VB2 tool, plus the full
+    ``step()`` / ``reset()`` API inherited from :class:`EnvClient`.
+
+    Example::
+
+        with VB2Client(base_url="http://localhost:8000") as env:
+            env.reset()
+            balance = env.check_balance()
+            env.set_price("soda", 1.75)
+            quote = env.request_supplier_quote("chips", 20)
+            sales = env.wait_for_next_day()
+    """
+
+    def __init__(
+        self,
+        base_url: str,
+        connect_timeout_s: float = 10.0,
+        message_timeout_s: float = 60.0,
+        provider: Optional[Any] = None,
+    ) -> None:
+        super().__init__(
+            base_url=base_url,
+            connect_timeout_s=connect_timeout_s,
+            message_timeout_s=message_timeout_s,
+            provider=provider,
+        )
+        self._tools_cache: Optional[List[Tool]] = None
+
+    # ------------------------------------------------------------------
+    # Abstract method implementations
+    # ------------------------------------------------------------------
+
+    def _step_payload(self, action: Any) -> Dict[str, Any]:
+        if isinstance(action, ListToolsAction):
+            return {"type": "list_tools"}
+        if isinstance(action, CallToolAction):
+            return {
+                "type": "call_tool",
+                "tool_name": action.tool_name,
+                "arguments": action.arguments,
+            }
+        if hasattr(action, "model_dump"):
+            return action.model_dump()
+        return {"action": str(action)}
+
+    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[Observation]:
+        obs_data = payload.get("observation", {})
+
+        if "tools" in obs_data:
+            tools = [
+                Tool(
+                    name=t.get("name", ""),
+                    description=t.get("description", ""),
+                    input_schema=t.get("input_schema", t.get("inputSchema", {})),
+                )
+                for t in obs_data.get("tools", [])
+            ]
+            observation: Observation = ListToolsObservation(
+                tools=tools,
+                done=payload.get("done", False),
+                reward=payload.get("reward"),
+                metadata=obs_data.get("metadata", {}),
+            )
+        elif "tool_name" in obs_data:
+            error = None
+            if obs_data.get("error"):
+                error = ToolError(**obs_data["error"])
+            observation = CallToolObservation(
+                tool_name=obs_data.get("tool_name", ""),
+                result=obs_data.get("result"),
+                error=error,
+                done=payload.get("done", False),
+                reward=payload.get("reward"),
+                metadata=obs_data.get("metadata", {}),
+            )
+        else:
+            observation = Observation(
+                done=payload.get("done", False),
+                reward=payload.get("reward"),
+                metadata=obs_data.get("metadata", {}),
+            )
+
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+
+    def _parse_state(self, payload: Dict[str, Any]) -> State:
+        return State(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+        )
+
+    # ------------------------------------------------------------------
+    # Helper: call a tool and return its result
+    # ------------------------------------------------------------------
+
+    def _call_tool(self, tool_name: str, **kwargs: Any) -> Any:
+        """Call a tool by name and return its result (or raise on error)."""
+        result = self.call_tool_step(tool_name, **kwargs)
+        obs = result.observation
+
+        if isinstance(obs, CallToolObservation) and obs.error is not None:
+            raise RuntimeError(
+                f"Tool '{tool_name}' failed: {obs.error.message} "
+                f"(type: {obs.error.error_type.value})"
+            )
+
+        if isinstance(obs, CallToolObservation):
+            res = obs.result
+            if hasattr(res, "data"):
+                return res.data
+            if isinstance(res, dict) and "data" in res:
+                return res["data"]
+            return res
+
+        return obs
+
+    def call_tool_step(self, tool_name: str, **kwargs: Any) -> StepResult[Observation]:
+        """Call a tool and return the full StepResult with reward/done metadata."""
+        action = CallToolAction(tool_name=tool_name, arguments=kwargs)
+        return self.step(action)
+
+    # ------------------------------------------------------------------
+    # Convenience methods
+    # ------------------------------------------------------------------
+
+    def list_tools(self, use_cache: bool = True) -> List[Tool]:
+        """Discover available tools from the environment."""
+        if use_cache and self._tools_cache is not None:
+            return self._tools_cache
+        result = self.step(ListToolsAction())
+        if isinstance(result.observation, ListToolsObservation):
+            self._tools_cache = result.observation.tools
+            return self._tools_cache
+        return []
+
+    def set_price(self, product: str, price: float) -> Any:
+        """Update the price of a product in the vending machine."""
+        return self._call_tool("set_price", product=product, price=price)
+
+    def check_balance(self) -> Any:
+        """Review current bank balance."""
+        return self._call_tool("check_balance")
+
+    def check_storage_inventory(self) -> Any:
+        """Inspect the storage inventory."""
+        return self._call_tool("check_storage_inventory")
+
+    def wait_for_next_day(self, output_tokens: int = 0) -> Any:
+        """Advance simulation to the next business day."""
+        return self._call_tool("wait_for_next_day", output_tokens=output_tokens)
+
+    def send_email(self, recipient: str, subject: str, body: str) -> Any:
+        """Send an email to a supplier or service provider."""
+        return self._call_tool(
+            "send_email", recipient=recipient, subject=subject, body=body
+        )
+
+    def restock_machine(self, product: str, qty: int) -> Any:
+        """Delegate to sub-agent: restock the vending machine from storage."""
+        return self._call_tool(
+            "run_sub_agent",
+            tool_name="restock_machine",
+            arguments={"product": product, "qty": qty},
+        )
+
+    def collect_cash(self) -> Any:
+        """Delegate to sub-agent: collect cash from the vending machine."""
+        return self._call_tool("run_sub_agent", tool_name="collect_cash", arguments={})
+
+    def get_machine_inventory(self) -> Any:
+        """Delegate to sub-agent: get current machine inventory."""
+        return self._call_tool(
+            "run_sub_agent",
+            tool_name="get_machine_inventory",
+            arguments={},
+        )
+
+    def chat_with_sub_agent(self, message: str) -> Any:
+        """Message the sub-agent without taking action."""
+        return self._call_tool("chat_with_sub_agent", message=message)
+
+    def write_scratchpad(self, note: str) -> Any:
+        """Append a note to working memory."""
+        return self._call_tool("write_scratchpad", note=note)
+
+    def read_scratchpad(self) -> Any:
+        """Read the working-memory scratchpad."""
+        return self._call_tool("read_scratchpad")
+
+    def search_notes(self, query: str) -> Any:
+        """Search saved notes for a keyword."""
+        return self._call_tool("search_notes", query=query)
+
+    def set_reminder(self, day: int, message: str) -> Any:
+        """Schedule a future reminder."""
+        return self._call_tool("set_reminder", day=day, message=message)
+
+    def request_supplier_quote(self, product: str, qty: int) -> Any:
+        """Request a price quote from a supplier for a product."""
+        return self._call_tool("request_supplier_quote", product=product, qty=qty)
+
+    def negotiate_supplier(self, quote_id: str, proposed_unit_price: float) -> Any:
+        """Negotiate a supplier quote with a proposed unit price."""
+        return self._call_tool("negotiate_supplier", quote_id=quote_id, proposed_unit_price=proposed_unit_price)
+
+    def place_supplier_order(self, product: str, qty: int) -> Any:
+        """Place a confirmed order with a supplier."""
+        return self._call_tool("place_supplier_order", product=product, qty=qty)
+
+    def check_delivery(self, order_id: str) -> Any:
+        """Check the delivery status of a supplier order."""
+        return self._call_tool("check_delivery", order_id=order_id)
vendsim_vb2/compat.py
ADDED
@@ -0,0 +1,36 @@
+"""Compatibility shims for optional third-party dependencies."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, Callable
+
+
+@dataclass(slots=True)
+class Route:
+    method: str
+    path: str
+    endpoint: Callable[..., Any]
+
+
+class FastAPI:
+    """Small subset of FastAPI used for local smoke tests when FastAPI is absent."""
+
+    def __init__(self, *, title: str) -> None:
+        self.title = title
+        self.routes: list[Route] = []
+
+    def get(self, path: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+        return self._register("GET", path)
+
+    def post(self, path: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+        return self._register("POST", path)
+
+    def _register(
+        self, method: str, path: str
+    ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
+            self.routes.append(Route(method=method, path=path, endpoint=func))
+            return func
+
+        return decorator
vendsim_vb2/config.py
ADDED
@@ -0,0 +1,21 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(slots=True)
+class VB2Config:
+    starting_balance: float = 500.0
+    daily_machine_fee: float = 2.0
+    episode_days: int = 365
+    bankruptcy_consecutive_negative_days: int = 10
+    output_token_cost_per_million: float = 100.0
+    storage_address: str = "1680 Mission St, San Francisco"
+    machine_address: str = "1421 Bay St, San Francisco"
+    restock_travel_time_minutes: int = 75
+    supplier_message_time_minutes: int = 10
+    delivery_check_time_minutes: int = 5
+    minutes_per_day: int = 24 * 60
+    machine_small_rows: int = 2
+    machine_large_rows: int = 2
+    machine_slots_per_row: int = 3
vendsim_vb2/customer_service.py
ADDED
@@ -0,0 +1,41 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from random import Random
+
+
+@dataclass(slots=True)
+class ComplaintTicket:
+    ticket_id: str
+    type: str
+    day: int
+    amount: float
+    reason: str
+
+
+class CustomerServiceEngine:
+    def __init__(self, seed: int | None = None) -> None:
+        self._rng = Random(seed)
+        self._ticket_counter = 0
+
+    def maybe_create_complaint(
+        self, day: int, sales: dict[str, int]
+    ) -> ComplaintTicket | None:
+        total_units = sum(sales.values())
+        if total_units <= 0:
+            return None
+        complaint_probability = min(0.35, total_units / 150)
+        if self._rng.random() >= complaint_probability:
+            return None
+        self._ticket_counter += 1
+        amount = round(1.5 + self._rng.random() * 4.0, 2)
+        return ComplaintTicket(
+            ticket_id=f"ticket-{self._ticket_counter}",
+            type="refund_request",
+            day=day,
+            amount=amount,
+            reason="Customer reported a vending issue.",
+        )
+
+    def process_refund(self, cash_balance: float, amount: float) -> float:
+        return round(cash_balance - amount, 2)
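Review note: the complaint rule in `maybe_create_complaint` scales linearly with units sold and is capped at 35%. The sketch below isolates that formula in a hypothetical helper, `complaint_probability` (not part of the commit), so the cap is easier to see. Returning `0.0` for a day with no sales is an assumption standing in for the engine's early `return None`.

```python
def complaint_probability(total_units: int) -> float:
    """Chance that a day with `total_units` sales produces a refund request."""
    if total_units <= 0:
        return 0.0  # the engine returns None (no ticket) in this case
    # Linear in units sold, capped at 35%, as in the diff above.
    return min(0.35, total_units / 150)


# The cap is reached once total_units / 150 >= 0.35, i.e. at 52.5 units,
# so any day with 53 or more units sold has the same 35% chance.
for units in (0, 15, 30, 52, 53, 120):
    print(units, round(complaint_probability(units), 3))
```

When a ticket is created, the refund is drawn as `1.5 + r * 4.0` with `r` in `[0, 1)`, so amounts fall between `$1.50` and `$5.50` before rounding.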
vendsim_vb2/demand.py
ADDED
@@ -0,0 +1,171 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from random import Random
+
+PRODUCTS: dict[str, dict[str, float | str]] = {
+    "soda": {
+        "size": "small",
+        "base_daily_demand": 7.0,
+        "ideal_price": 1.50,
+        "wholesale_price": 0.58,
+        "weather_bias": "hot",
+    },
+    "water": {
+        "size": "small",
+        "base_daily_demand": 6.0,
+        "ideal_price": 1.25,
+        "wholesale_price": 0.42,
+        "weather_bias": "hot",
+    },
+    "candy": {
+        "size": "small",
+        "base_daily_demand": 4.0,
+        "ideal_price": 1.25,
+        "wholesale_price": 0.35,
+        "weather_bias": "neutral",
+    },
+    "chips": {
+        "size": "large",
+        "base_daily_demand": 5.0,
+        "ideal_price": 2.00,
+        "wholesale_price": 0.72,
+        "weather_bias": "neutral",
+    },
+    "sandwich": {
+        "size": "large",
+        "base_daily_demand": 2.0,
+        "ideal_price": 4.50,
+        "wholesale_price": 2.20,
+        "weather_bias": "cold",
+    },
+}
+
+SEASON_MULTIPLIERS = {
+    "winter": 0.9,
+    "spring": 1.0,
+    "summer": 1.15,
+    "autumn": 1.0,
+}
+
+DAY_OF_WEEK_MULTIPLIERS = {
+    "monday": 0.95,
+    "tuesday": 1.0,
+    "wednesday": 1.0,
+    "thursday": 1.0,
+    "friday": 1.1,
+    "saturday": 1.2,
+    "sunday": 0.85,
+}
+
+WEATHER_MULTIPLIERS = {
+    "sunny": 1.15,
+    "cloudy": 1.0,
+    "rainy": 0.85,
+    "foggy": 0.9,
+    "heatwave": 1.25,
+}
+
+WEATHER_SEQUENCE = ["sunny", "cloudy", "rainy", "sunny", "foggy", "cloudy", "sunny"]
+DAY_NAMES = [
+    "monday",
+    "tuesday",
+    "wednesday",
+    "thursday",
+    "friday",
+    "saturday",
+    "sunday",
+]
+
+
+@dataclass(slots=True)
+class DailySalesResult:
+    units_sold: dict[str, int]
+    revenue: float
+    debug: dict[str, float]
+
+
+def season_for_day(day_index: int) -> str:
+    day_of_year = ((day_index - 1) % 365) + 1
+    if day_of_year <= 79:
+        return "winter"
+    if day_of_year <= 171:
+        return "spring"
+    if day_of_year <= 265:
+        return "summer"
+    if day_of_year <= 354:
+        return "autumn"
+    return "winter"
+
+
+def day_of_week_for_day(day_index: int) -> str:
+    return DAY_NAMES[(day_index - 1) % len(DAY_NAMES)]
+
+
+def weather_for_day(day_index: int) -> str:
+    return WEATHER_SEQUENCE[(day_index - 1) % len(WEATHER_SEQUENCE)]
+
+
+def _weather_bias_multiplier(product: str, weather: str) -> float:
+    bias = str(PRODUCTS.get(product, {}).get("weather_bias", "neutral"))
+    if bias == "hot" and weather in {"sunny", "heatwave"}:
+        return 1.1
+    if bias == "cold" and weather in {"rainy", "foggy"}:
+        return 1.08
+    return 1.0
+
+
+def compute_daily_sales(
+    products: list[str],
+    prices: dict[str, float],
+    weather: str,
+    season: str,
+    day_of_week: str,
+    inventory: dict[str, int] | None = None,
+    seed: int | None = None,
+) -> DailySalesResult:
+    rng = Random(seed)
+    choice_multiplier = 1.0 + min(max(len(products) - 1, 0), 5) * 0.05
+    weather_multiplier = WEATHER_MULTIPLIERS.get(weather, 1.0)
+    season_multiplier = SEASON_MULTIPLIERS.get(season, 1.0)
+    dow_multiplier = DAY_OF_WEEK_MULTIPLIERS.get(day_of_week, 1.0)
+    inventory = inventory or {}
+
+    units_sold: dict[str, int] = {}
+    revenue = 0.0
+    for product in products:
+        catalog = PRODUCTS.get(product, {})
+        base_demand = float(catalog.get("base_daily_demand", 1.0))
+        ideal_price = float(
+            catalog.get("ideal_price", max(prices.get(product, 1.0), 0.01))
+        )
+        price = float(prices.get(product, ideal_price))
+        price_multiplier = max(
+            0.15, 1.0 - ((price - ideal_price) / max(ideal_price, 0.01)) * 0.45
+        )
+        noise_multiplier = 0.9 + (rng.random() * 0.2)
+        expected_units = (
+            base_demand
+            * choice_multiplier
+            * weather_multiplier
+            * season_multiplier
+            * dow_multiplier
+            * price_multiplier
+            * _weather_bias_multiplier(product, weather)
+            * noise_multiplier
+        )
+        sold = max(0, int(round(expected_units)))
+        if product in inventory:
+            sold = min(sold, inventory[product])
+        units_sold[product] = sold
+        revenue += sold * price
+
+    debug = {
+        "choice_multiplier": round(choice_multiplier, 3),
+        "weather_multiplier": round(weather_multiplier, 3),
+        "season_multiplier": round(season_multiplier, 3),
+        "day_of_week_multiplier": round(dow_multiplier, 3),
+    }
+    return DailySalesResult(
+        units_sold=units_sold, revenue=round(revenue, 2), debug=debug
+    )
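Review note: the price response inside `compute_daily_sales` is a linear penalty on the relative markup over `ideal_price`, floored at 0.15. The sketch below pulls that expression into a standalone function, here called `price_multiplier` (a hypothetical name; the commit computes it inline), and evaluates it for soda's `ideal_price` of `1.50` at a few illustrative prices.

```python
def price_multiplier(price: float, ideal_price: float) -> float:
    """Demand scaling from price, using the same expression as compute_daily_sales."""
    # Relative markup: 0.0 at the ideal price, 1.0 at double the ideal price.
    relative_markup = (price - ideal_price) / max(ideal_price, 0.01)
    # Each unit of relative markup removes 45% of demand, floored at 15%.
    return max(0.15, 1.0 - relative_markup * 0.45)


soda_ideal = 1.50
for price in (0.75, 1.50, 1.75, 3.00, 6.00):
    print(f"{price:.2f} -> {price_multiplier(price, soda_ideal):.3f}")
```

Pricing below the ideal raises the multiplier above 1.0, doubling the ideal price cuts demand to 0.55 of baseline, and the 0.15 floor means a product priced far above ideal still has a nonzero expected demand before rounding.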
vendsim_vb2/environment.py
ADDED
@@ -0,0 +1,395 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from dataclasses import dataclass
|
| 4 |
+
from random import Random
|
| 5 |
+
from typing import Any
|
| 6 |
+
|
| 7 |
+
from vendsim_vb2.billing import apply_weekly_costs
|
| 8 |
+
from vendsim_vb2.config import VB2Config
|
| 9 |
+
from vendsim_vb2.customer_service import CustomerServiceEngine
|
| 10 |
+
from vendsim_vb2.demand import (
|
| 11 |
+
PRODUCTS,
|
| 12 |
+
compute_daily_sales,
|
| 13 |
+
day_of_week_for_day,
|
| 14 |
+
season_for_day,
|
| 15 |
+
weather_for_day,
|
| 16 |
+
)
|
| 17 |
+
from vendsim_vb2.state import SimulationState
|
| 18 |
+
from vendsim_vb2.subagent import SubAgent
|
| 19 |
+
from vendsim_vb2.suppliers import SupplierEngine
|
| 20 |
+
from vendsim_vb2.tools.main_agent_tools import get_main_tool_specs
|
| 21 |
+
from vendsim_vb2.tools.memory_tools import get_memory_tool_specs
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
@dataclass(slots=True)
|
| 25 |
+
class ToolCallResult:
|
| 26 |
+
status: str
|
| 27 |
+
payload: dict[str, Any]
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
class VendingBench2Environment:
|
| 31 |
+
def __init__(
|
| 32 |
+
self,
|
| 33 |
+
config: VB2Config | None = None,
|
| 34 |
+
seed: int | None = None,
|
| 35 |
+
use_dense_rewards: bool = False,
|
| 36 |
+
) -> None:
|
| 37 |
+
self.config = config or VB2Config()
|
| 38 |
+
self._seed = seed
|
| 39 |
+
self._rng = Random(seed)
|
| 40 |
+
self.use_dense_rewards = use_dense_rewards
|
| 41 |
+
self.suppliers = SupplierEngine(seed=seed)
|
| 42 |
+
self.customer_service = CustomerServiceEngine(seed=seed)
|
| 43 |
+
self.subagent = SubAgent(config=self.config)
|
| 44 |
+
self.state = self.reset()
|
| 45 |
+
|
| 46 |
+
def reset(self) -> SimulationState:
|
| 47 |
+
self._rng = Random(self._seed)
|
| 48 |
+
self.suppliers = SupplierEngine(seed=self._seed)
|
| 49 |
+
self.customer_service = CustomerServiceEngine(seed=self._seed)
|
| 50 |
+
self.subagent = SubAgent(config=self.config)
|
| 51 |
+
self.state = SimulationState.new_episode(self.config)
|
| 52 |
+
self.state.prices = {
|
| 53 |
+
product: float(spec["ideal_price"]) for product, spec in PRODUCTS.items()
|
| 54 |
+
}
|
| 55 |
+
return self.state
|
| 56 |
+
|
| 57 |
+
def tool_registry(self) -> dict[str, list[str]]:
|
| 58 |
+
return {
|
| 59 |
+
"main": [spec.name for spec in get_main_tool_specs()],
|
| 60 |
+
"memory": [spec.name for spec in get_memory_tool_specs()],
|
| 61 |
+
"subagent": list(self.subagent.specs()["tools"]),
|
| 62 |
+
}
|
| 63 |
+
|
| 64 |
+
def _log_email(
|
| 65 |
+
self,
|
| 66 |
+
*,
|
| 67 |
+
sender: str,
|
| 68 |
+
recipient: str,
|
| 69 |
+
subject: str,
|
| 70 |
+
body: str,
|
| 71 |
+
category: str = "email",
|
| 72 |
+
) -> None:
|
| 73 |
+
self.state.email_log.append(
|
| 74 |
+
{
|
| 75 |
+
"day": self.state.day_index,
|
| 76 |
+
"minute_of_day": self.state.minute_of_day,
|
| 77 |
+
"sender": sender,
|
| 78 |
+
"recipient": recipient,
|
| 79 |
+
"subject": subject,
|
| 80 |
+
"body": body,
|
| 81 |
+
"category": category,
|
| 82 |
+
}
|
| 83 |
+
)
|
| 84 |
+
|
| 85 |
+
def resolve_delivery(self, order_id: str) -> ToolCallResult:
|
| 86 |
+
"""Check delivery status; on success, add items to storage and charge cost."""
|
| 87 |
+
delivery = self.suppliers.simulate_delivery(order_id)
|
| 88 |
+
order = self.suppliers._orders[order_id]
|
| 89 |
+
if delivery.status in {"delivered", "delayed", "partial"} and delivery.delivered_qty > 0:
|
| 90 |
+
product = order.product
|
| 91 |
+
self.state.storage_inventory[product] = (
|
| 92 |
+
self.state.storage_inventory.get(product, 0) + delivery.delivered_qty
|
| 93 |
+
)
|
| 94 |
+
cost = round(delivery.final_unit_price * delivery.delivered_qty, 2)
|
| 95 |
+
self.state.cash_balance = round(self.state.cash_balance - cost, 2)
|
| 96 |
+
self.state.advance_minutes(self.config.delivery_check_time_minutes)
|
| 97 |
+
self._log_email(
|
| 98 |
+
sender=order.supplier_name,
|
| 99 |
+
recipient="charles.paxton",
|
| 100 |
+
subject=f"Delivery update for {order.order_id}",
|
| 101 |
+
body=(
|
| 102 |
+
f"Status={delivery.status}; delivered_qty={delivery.delivered_qty}; "
|
| 103 |
+
f"days_late={delivery.days_late}; final_unit_price={delivery.final_unit_price}"
|
| 104 |
+
),
|
| 105 |
+
category="supplier_delivery",
|
| 106 |
+
)
|
| 107 |
+
return ToolCallResult(
|
| 108 |
+
delivery.status,
|
| 109 |
+
{
|
| 110 |
+
"order_id": delivery.order_id,
|
| 111 |
+
"delivered_qty": delivery.delivered_qty,
|
| 112 |
+
"days_late": delivery.days_late,
|
| 113 |
+
"final_unit_price": delivery.final_unit_price,
|
| 114 |
+
},
|
| 115 |
+
)
|
| 116 |
+
|
| 117 |
+
def set_price(self, product: str, price: float) -> ToolCallResult:
|
| 118 |
+
self.state.prices[product] = round(price, 2)
|
| 119 |
+
self.state.advance_minutes(5)
|
| 120 |
+
return ToolCallResult(
|
| 121 |
+
"ok", {"product": product, "price": self.state.prices[product]}
|
| 122 |
+
)
|
| 123 |
+
|
| 124 |
+
def send_email(self, recipient: str, subject: str, body: str) -> ToolCallResult:
|
| 125 |
+
self._log_email(
|
| 126 |
+
sender="charles.paxton",
|
| 127 |
+
recipient=recipient,
|
| 128 |
+
subject=subject,
|
| 129 |
+
body=body,
|
| 130 |
+
category="manual_email",
|
| 131 |
+
)
|
| 132 |
+
self.state.advance_minutes(self.config.supplier_message_time_minutes)
|
| 133 |
+
return ToolCallResult("ok", {"recipient": recipient, "queued": True})
|
| 134 |
+
|
| 135 |
+
def check_balance(self) -> ToolCallResult:
|
| 136 |
+
self.state.advance_minutes(1)
|
| 137 |
+
return ToolCallResult("ok", {"cash_balance": round(self.state.cash_balance, 2)})
|
| 138 |
+
|
| 139 |
+
def check_storage_inventory(self) -> ToolCallResult:
|
| 140 |
+
self.state.advance_minutes(2)
|
| 141 |
+
return ToolCallResult(
|
| 142 |
+
"ok", {"storage_inventory": dict(self.state.storage_inventory)}
|
| 143 |
+
)
|
| 144 |
+
|
| 145 |
+
def chat_with_sub_agent(self, message: str) -> ToolCallResult:
|
| 146 |
+
self.state.subagent_chat_log.append(message)
|
| 147 |
+
self.state.advance_minutes(5)
|
| 148 |
+
return ToolCallResult("ok", {"message": message})
|
| 149 |
+
|
| 150 |
+
def request_supplier_quote(self, product: str, qty: int) -> ToolCallResult:
|
| 151 |
+
quote = self.suppliers.request_quote(product, qty)
|
| 152 |
+
subject = f"Quote request for {qty} units of {product}"
|
| 153 |
+
self._log_email(
|
| 154 |
+
sender="charles.paxton",
|
| 155 |
+
recipient=quote.supplier_name,
|
| 156 |
+
subject=subject,
|
| 157 |
+
body=f"Please quote {qty} units of {product}.",
|
| 158 |
+
category="supplier_quote_request",
|
| 159 |
+
)
|
| 160 |
+
self.state.advance_minutes(self.config.supplier_message_time_minutes)
|
| 161 |
+
self._log_email(
|
| 162 |
+
sender=quote.supplier_name,
|
| 163 |
+
recipient="charles.paxton",
|
| 164 |
+
subject=f"Quote response for {product}",
|
| 165 |
+
body=(
|
| 166 |
+
f"quote_id={quote.quote_id}; qty={quote.qty}; "
|
| 167 |
+
f"unit_price={quote.unit_price}; fair_unit_price={quote.fair_unit_price}"
|
| 168 |
+
),
|
| 169 |
+
category="supplier_quote_response",
|
| 170 |
+
)
|
| 171 |
+
return ToolCallResult(
|
| 172 |
+
"ok",
|
| 173 |
+
{
|
| 174 |
+
"quote_id": quote.quote_id,
|
| 175 |
+
"product": quote.product,
|
| 176 |
+
"qty": quote.qty,
|
| 177 |
+
"unit_price": quote.unit_price,
|
| 178 |
+
"supplier_name": quote.supplier_name,
|
| 179 |
+
},
|
| 180 |
+
)
|
| 181 |
+
|
| 182 |
+
def negotiate_supplier(
|
| 183 |
+
self, quote_id: str, proposed_unit_price: float
|
| 184 |
+
) -> ToolCallResult:
|
| 185 |
+
quote = self.suppliers._quotes[quote_id]
|
| 186 |
+
response = self.suppliers.negotiate(quote_id, proposed_unit_price)
|
| 187 |
+
self._log_email(
|
| 188 |
+
sender="charles.paxton",
|
| 189 |
+
recipient=quote.supplier_name,
|
| 190 |
+
subject=f"Counteroffer for {quote.product}",
|
| 191 |
+
body=(
|
| 192 |
+
f"quote_id={quote_id}; proposed_unit_price={round(proposed_unit_price, 2)}"
|
| 193 |
+
),
|
| 194 |
+
category="supplier_negotiation_request",
|
| 195 |
+
)
|
| 196 |
+
self.state.advance_minutes(self.config.supplier_message_time_minutes)
|
| 197 |
+
self._log_email(
|
| 198 |
+
sender=quote.supplier_name,
|
| 199 |
+
recipient="charles.paxton",
|
| 200 |
+
subject=f"Negotiation response for {quote.product}",
|
| 201 |
+
body=(
|
| 202 |
+
f"quote_id={response.quote_id}; status={response.status}; "
|
| 203 |
+
f"unit_price={response.unit_price}; message={response.message}"
|
| 204 |
+
),
|
| 205 |
+
category="supplier_negotiation_response",
|
| 206 |
+
)
|
| 207 |
+
return ToolCallResult(
|
| 208 |
+
response.status,
|
| 209 |
+
{
|
| 210 |
+
"quote_id": response.quote_id,
|
| 211 |
+
"unit_price": response.unit_price,
|
| 212 |
+
"message": response.message,
|
| 213 |
+
},
|
| 214 |
+
)
|
| 215 |
+
|
| 216 |
+
def place_supplier_order(self, product: str, qty: int) -> ToolCallResult:
|
| 217 |
+
order = self.suppliers.place_email_confirmed_order(product, qty)
|
| 218 |
+
self._log_email(
|
| 219 |
+
sender="charles.paxton",
|
| 220 |
+
recipient=order.supplier_name,
|
| 221 |
+
subject=f"Purchase order for {product}",
|
| 222 |
+
body=f"Please ship {qty} units of {product}.",
|
| 223 |
+
category="supplier_order_request",
|
| 224 |
+
)
|
| 225 |
+
self.state.advance_minutes(self.config.supplier_message_time_minutes)
|
| 226 |
+
self._log_email(
|
| 227 |
+
sender=order.supplier_name,
|
| 228 |
+
recipient="charles.paxton",
|
| 229 |
+
subject=f"Order confirmation for {product}",
|
| 230 |
+
body=(
|
| 231 |
+
f"order_id={order.order_id}; qty={order.qty}; unit_price={order.unit_price}; "
|
| 232 |
+
f"may_bait_and_switch={order.may_bait_and_switch}"
|
| 233 |
+
),
|
| 234 |
+
category="supplier_order_confirmation",
|
| 235 |
+
)
|
| 236 |
+
return ToolCallResult(
|
| 237 |
+
order.status,
|
| 238 |
+
{
|
| 239 |
+
"order_id": order.order_id,
|
| 240 |
+
"product": order.product,
|
| 241 |
+
"qty": order.qty,
|
| 242 |
+
"unit_price": order.unit_price,
|
| 243 |
+
"supplier_name": order.supplier_name,
|
| 244 |
+
},
|
| 245 |
+
)
|
| 246 |
+
|
| 247 |
+
def run_sub_agent(self, tool_name: str, **kwargs: Any) -> ToolCallResult:
|
| 248 |
+
if tool_name == "restock_machine":
|
| 249 |
+
product = str(kwargs["product"])
|
| 250 |
+
qty = int(kwargs["qty"])
|
| 251 |
+
available = self.state.storage_inventory.get(product, 0)
|
| 252 |
+
if available < qty:
|
| 253 |
+
return ToolCallResult(
|
| 254 |
+
"rejected",
|
| 255 |
+
{
|
| 256 |
+
"message": f"insufficient storage inventory for {product}",
|
| 257 |
+
"available": available,
|
| 258 |
+
},
|
| 259 |
+
)
|
| 260 |
+
result = self.subagent.restock_machine(product, qty)
|
| 261 |
+
if result.get("status") == "ok":
|
| 262 |
+
self.state.storage_inventory[product] = available - qty
|
| 263 |
+
if self.state.storage_inventory[product] == 0:
|
| 264 |
+
del self.state.storage_inventory[product]
|
| 265 |
+
self.state.machine_inventory = dict(self.subagent.machine_inventory)
|
| 266 |
+
self.state.advance_minutes(int(result["time_cost_minutes"]))
|
| 267 |
+
return ToolCallResult(str(result["status"]), dict(result))
|
| 268 |
+
if tool_name == "collect_cash":
|
| 269 |
+
self.subagent.machine_cash = self.state.machine_cash
|
| 270 |
+
result = self.subagent.collect_cash()
|
| 271 |
+
self.state.machine_cash = self.subagent.machine_cash
|
| 272 |
+
self.state.cash_balance = round(
|
| 273 |
+
self.state.cash_balance + float(result["amount_collected"]), 2
|
| 274 |
+
)
|
| 275 |
+
self.state.advance_minutes(int(result["time_cost_minutes"]))
|
| 276 |
+
return ToolCallResult("ok", dict(result))
|
| 277 |
+
if tool_name == "get_machine_inventory":
|
| 278 |
+
return ToolCallResult(
|
| 279 |
+
"ok", {"machine_inventory": self.subagent.get_machine_inventory()}
|
| 280 |
+
)
|
| 281 |
+
raise KeyError(f"unknown sub-agent tool: {tool_name}")
|
| 282 |
+
|
| 283 |
+
def write_scratchpad(self, note: str) -> ToolCallResult:
|
| 284 |
+
self.state.scratchpad.append(note)
|
| 285 |
+
self.state.notes.append(note)
|
| 286 |
+
return ToolCallResult("ok", {"note_count": len(self.state.scratchpad)})
|
| 287 |
+
|
| 288 |
+
def read_scratchpad(self) -> ToolCallResult:
|
| 289 |
+
return ToolCallResult("ok", {"scratchpad": list(self.state.scratchpad)})
|
| 290 |
+
|
| 291 |
+
def search_notes(self, query: str) -> ToolCallResult:
|
| 292 |
+
query_lower = query.lower()
|
| 293 |
+
matches = [note for note in self.state.notes if query_lower in note.lower()]
|
| 294 |
+
return ToolCallResult("ok", {"matches": matches})
|
| 295 |
+
|
| 296 |
+
def set_reminder(self, day: int, message: str) -> ToolCallResult:
|
| 297 |
+
self.state.add_reminder(day, message)
|
| 298 |
+
return ToolCallResult("ok", {"day": day, "message": message})
|
| 299 |
+
|
| 300 |
+
def record_output_tokens(self, count: int) -> None:
|
| 301 |
+
self.state.weekly_output_tokens += count
|
| 302 |
+
|
| 303 |
+
def wait_for_next_day(self, output_tokens: int = 0) -> ToolCallResult:
|
| 304 |
+
self.record_output_tokens(output_tokens)
|
| 305 |
+
weather = weather_for_day(self.state.day_index)
|
| 306 |
+
season = season_for_day(self.state.day_index)
|
| 307 |
+
day_of_week = day_of_week_for_day(self.state.day_index)
|
| 308 |
+
sales_result = compute_daily_sales(
|
| 309 |
+
products=list(self.state.machine_inventory),
|
| 310 |
+
prices=self.state.prices,
|
| 311 |
+
weather=weather,
|
| 312 |
+
season=season,
|
| 313 |
+
day_of_week=day_of_week,
|
| 314 |
+
inventory=self.state.machine_inventory,
|
| 315 |
+
seed=self._rng.randint(0, 1_000_000),
|
| 316 |
+
)
|
| 317 |
+
for product, sold in sales_result.units_sold.items():
|
| 318 |
+
if product in self.subagent.machine_inventory:
|
| 319 |
+
remaining = self.subagent.machine_inventory[product] - sold
|
| 320 |
+
if remaining > 0:
|
| 321 |
+
self.subagent.machine_inventory[product] = remaining
|
| 322 |
+
else:
|
| 323 |
+
del self.subagent.machine_inventory[product]
|
| 324 |
+
self.state.machine_inventory = dict(self.subagent.machine_inventory)
|
| 325 |
+
# Revenue accrues in the machine's cash box; it reaches cash_balance only via collect_cash
|
| 326 |
+
self.state.machine_cash = round(
|
| 327 |
+
self.state.machine_cash + sales_result.revenue, 2
|
| 328 |
+
)
|
| 329 |
+
complaint = self.customer_service.maybe_create_complaint(
|
| 330 |
+
self.state.day_index, sales_result.units_sold
|
| 331 |
+
)
|
| 332 |
+
refund_amount = 0.0
|
| 333 |
+
if complaint is not None:
|
| 334 |
+
refund_amount = complaint.amount
|
| 335 |
+
self.state.cash_balance = self.customer_service.process_refund(
|
| 336 |
+
self.state.cash_balance, complaint.amount
|
| 337 |
+
)
|
| 338 |
+
self.state.cash_balance = round(
|
| 339 |
+
self.state.cash_balance - self.config.daily_machine_fee, 2
|
| 340 |
+
)
|
| 341 |
+
if self.state.cash_balance < 0:
|
| 342 |
+
self.state.consecutive_negative_days += 1
|
| 343 |
+
else:
|
| 344 |
+
self.state.consecutive_negative_days = 0
|
| 345 |
+
if self.state.day_index % 7 == 0:
|
| 346 |
+
self.state.cash_balance = apply_weekly_costs(
|
| 347 |
+
cash_balance=self.state.cash_balance
|
| 348 |
+
+ (self.config.daily_machine_fee * 7),
|
| 349 |
+
weekly_output_tokens=self.state.weekly_output_tokens,
|
| 350 |
+
token_cost_per_million=self.config.output_token_cost_per_million,
|
| 351 |
+
daily_fee=self.config.daily_machine_fee,
|
| 352 |
+
days_in_week=7,
|
| 353 |
+
)
|
| 354 |
+
self.state.weekly_output_tokens = 0
|
| 355 |
+
self.suppliers.tick_supplier_health(days=1)
|
| 356 |
+
self.state.daily_sales_history.append(
|
| 357 |
+
{
|
| 358 |
+
"day": self.state.day_index,
|
| 359 |
+
"weather": weather,
|
| 360 |
+
"season": season,
|
| 361 |
+
"day_of_week": day_of_week,
|
| 362 |
+
"sales": dict(sales_result.units_sold),
|
| 363 |
+
"revenue": sales_result.revenue,
|
| 364 |
+
"refund_amount": refund_amount,
|
| 365 |
+
"debug": dict(sales_result.debug),
|
| 366 |
+
}
|
| 367 |
+
)
|
| 368 |
+
self.state.day_index += 1
|
| 369 |
+
self.state.minute_of_day = 0
|
| 370 |
+
return ToolCallResult(
|
| 371 |
+
"ok",
|
| 372 |
+
{
|
| 373 |
+
"sales": dict(sales_result.units_sold),
|
| 374 |
+
"revenue": sales_result.revenue,
|
| 375 |
+
"weather": weather,
|
| 376 |
+
"refund_amount": refund_amount,
|
| 377 |
+
},
|
| 378 |
+
)
|
| 379 |
+
|
| 380 |
+
def final_score(self) -> float:
|
| 381 |
+
"""Score is final bank balance only (per spec)."""
|
| 382 |
+
return round(self.state.cash_balance, 2)
|
| 383 |
+
|
| 384 |
+
def is_done(self) -> bool:
|
| 385 |
+
return (
|
| 386 |
+
self.state.day_index > self.config.episode_days
|
| 387 |
+
or self.state.consecutive_negative_days
|
| 388 |
+
>= self.config.bankruptcy_consecutive_negative_days
|
| 389 |
+
)
|
| 390 |
+
|
| 391 |
+
def snapshot(self) -> dict[str, Any]:
|
| 392 |
+
data = self.state.snapshot()
|
| 393 |
+
data["tools"] = self.tool_registry()
|
| 394 |
+
data["done"] = self.is_done()
|
| 395 |
+
return data
|
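Review note on the bankruptcy rule in `wait_for_next_day` and `is_done`: a minimal sketch of the streak counter, written as standalone functions so it can run without the package. The helper names and the threshold value used in the example are hypothetical; the actual threshold lives in `VB2Config.bankruptcy_consecutive_negative_days`, whose value is not shown in this part of the diff.

```python
from typing import Iterable


def update_negative_streak(streak: int, cash_balance: float) -> int:
    """Increment the streak on a negative balance, otherwise reset it to zero."""
    return streak + 1 if cash_balance < 0 else 0


def goes_bankrupt(end_of_day_balances: Iterable[float], threshold: int) -> bool:
    """Return True once `threshold` consecutive days end with a negative balance."""
    streak = 0
    for balance in end_of_day_balances:
        streak = update_negative_streak(streak, balance)
        if streak >= threshold:
            return True
    return False


if __name__ == "__main__":
    # threshold=3 is an illustrative value only.
    print(goes_bankrupt([-1.0, -2.0, 5.0, -1.0, -1.0], threshold=3))
    print(goes_bankrupt([-1.0, -1.0, -1.0], threshold=3))
```

In the diff the streak is updated after the daily machine fee is charged but before the weekly cost adjustment, so a balance pushed below zero by weekly token billing would only count on the following day.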
vendsim_vb2/mcp_env.py
ADDED
|
@@ -0,0 +1,205 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from typing import Any, Optional
|
| 4 |
+
|
| 5 |
+
from fastmcp import FastMCP
|
| 6 |
+
|
| 7 |
+
from openenv.core.env_server.mcp_environment import MCPEnvironment
|
| 8 |
+
from openenv.core.env_server.mcp_types import CallToolAction, CallToolObservation
|
| 9 |
+
from openenv.core.env_server.types import Action, Observation
|
| 10 |
+
|
| 11 |
+
from vendsim_vb2.config import VB2Config
|
| 12 |
+
from vendsim_vb2.environment import VendingBench2Environment
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
class VB2MCPEnvironment(MCPEnvironment):
|
| 16 |
+
"""OpenEnv MCP wrapper around VendingBench2Environment."""
|
| 17 |
+
|
| 18 |
+
def __init__(
|
| 19 |
+
self,
|
| 20 |
+
config: VB2Config | None = None,
|
| 21 |
+
seed: int | None = None,
|
| 22 |
+
use_dense_rewards: bool = False,
|
| 23 |
+
) -> None:
|
| 24 |
+
self._config = config or VB2Config()
|
| 25 |
+
self._seed = seed
|
| 26 |
+
self._use_dense_rewards = use_dense_rewards
|
| 27 |
+
self._inner_env: VendingBench2Environment | None = None
|
| 28 |
+
self._prev_score: float = 0.0
|
| 29 |
+
|
| 30 |
+
mcp = FastMCP("vending-bench-2")
|
| 31 |
+
self._register_tools(mcp)
|
| 32 |
+
super().__init__(mcp)
|
| 33 |
+
|
| 34 |
+
# ------------------------------------------------------------------
|
| 35 |
+
# Tool registration
|
| 36 |
+
# ------------------------------------------------------------------
|
| 37 |
+
|
| 38 |
+
def _register_tools(self, mcp: FastMCP) -> None:
|
| 39 |
+
env_ref = self
|
| 40 |
+
|
| 41 |
+
@mcp.tool()
|
| 42 |
+
def set_price(product: str, price: float) -> dict:
|
| 43 |
+
"""Update the price of a product in the vending machine."""
|
| 44 |
+
r = env_ref._inner_env.set_price(product, price)
|
| 45 |
+
return {"status": r.status, **r.payload}
|
| 46 |
+
|
| 47 |
+
@mcp.tool()
|
| 48 |
+
def send_email(recipient: str, subject: str, body: str) -> dict:
|
| 49 |
+
"""Send an email to a supplier or service provider."""
|
| 50 |
+
r = env_ref._inner_env.send_email(recipient, subject, body)
|
| 51 |
+
return {"status": r.status, **r.payload}
|
| 52 |
+
|
| 53 |
+
@mcp.tool()
|
| 54 |
+
def check_balance() -> dict:
|
| 55 |
+
"""Review current bank balance."""
|
| 56 |
+
r = env_ref._inner_env.check_balance()
|
| 57 |
+
return {"status": r.status, **r.payload}
|
| 58 |
+
|
| 59 |
+
@mcp.tool()
|
| 60 |
+
def check_storage_inventory() -> dict:
|
| 61 |
+
"""Inspect the storage inventory."""
|
| 62 |
+
r = env_ref._inner_env.check_storage_inventory()
|
| 63 |
+
return {"status": r.status, **r.payload}
|
| 64 |
+
|
| 65 |
+
@mcp.tool()
|
| 66 |
+
def wait_for_next_day(output_tokens: int = 0) -> dict:
|
| 67 |
+
"""Advance simulation to the next business day."""
|
| 68 |
+
r = env_ref._inner_env.wait_for_next_day(output_tokens)
|
| 69 |
+
return {"status": r.status, **r.payload}
|
| 70 |
+
|
| 71 |
+
@mcp.tool()
|
| 72 |
+
def run_sub_agent(tool_name: str, arguments: dict[str, Any] | None = None) -> dict:
|
| 73 |
+
"""Delegate a physical-world action to the sub-agent."""
|
| 74 |
+
r = env_ref._inner_env.run_sub_agent(tool_name, **(arguments or {}))
|
| 75 |
+
return {"status": r.status, **r.payload}
|
| 76 |
+
|
| 77 |
+
@mcp.tool()
|
| 78 |
+
def chat_with_sub_agent(message: str) -> dict:
|
| 79 |
+
"""Message the sub-agent without taking action."""
|
| 80 |
+
r = env_ref._inner_env.chat_with_sub_agent(message)
|
| 81 |
+
return {"status": r.status, **r.payload}
|
| 82 |
+
|
| 83 |
+
@mcp.tool()
|
| 84 |
+
def write_scratchpad(note: str) -> dict:
|
| 85 |
+
"""Append a note to working memory."""
|
| 86 |
+
r = env_ref._inner_env.write_scratchpad(note)
|
| 87 |
+
return {"status": r.status, **r.payload}
|
| 88 |
+
|
| 89 |
+
@mcp.tool()
|
| 90 |
+
def read_scratchpad() -> dict:
|
| 91 |
+
"""Read the working-memory scratchpad."""
|
| 92 |
+
r = env_ref._inner_env.read_scratchpad()
|
| 93 |
+
return {"status": r.status, **r.payload}
|
| 94 |
+
|
| 95 |
+
@mcp.tool()
|
| 96 |
+
def search_notes(query: str) -> dict:
|
| 97 |
+
"""Search saved notes for a keyword."""
|
| 98 |
+
r = env_ref._inner_env.search_notes(query)
|
| 99 |
+
return {"status": r.status, **r.payload}
|
| 100 |
+
|
| 101 |
+
@mcp.tool()
|
| 102 |
+
def set_reminder(day: int, message: str) -> dict:
|
| 103 |
+
"""Schedule a future reminder."""
|
| 104 |
+
r = env_ref._inner_env.set_reminder(day, message)
|
| 105 |
+
return {"status": r.status, **r.payload}
|
| 106 |
+
|
| 107 |
+
@mcp.tool()
|
| 108 |
+
def request_supplier_quote(product: str, qty: int) -> dict:
|
| 109 |
+
"""Request a price quote from a supplier for a product."""
|
| 110 |
+
r = env_ref._inner_env.request_supplier_quote(product, qty)
|
| 111 |
+
return {"status": r.status, **r.payload}
|
| 112 |
+
|
| 113 |
+
@mcp.tool()
|
| 114 |
+
def negotiate_supplier(quote_id: str, proposed_unit_price: float) -> dict:
|
| 115 |
+
"""Negotiate a supplier quote with a proposed unit price."""
|
| 116 |
+
r = env_ref._inner_env.negotiate_supplier(quote_id, proposed_unit_price)
|
| 117 |
+
return {"status": r.status, **r.payload}
|
| 118 |
+
|
| 119 |
+
@mcp.tool()
|
| 120 |
+
def place_supplier_order(product: str, qty: int) -> dict:
|
| 121 |
+
"""Place a confirmed order with a supplier."""
|
| 122 |
+
r = env_ref._inner_env.place_supplier_order(product, qty)
|
| 123 |
+
return {"status": r.status, **r.payload}
|
| 124 |
+
|
| 125 |
+
@mcp.tool()
|
| 126 |
+
def check_delivery(order_id: str) -> dict:
|
| 127 |
+
"""Check delivery status. On success, items are added to storage and cost is charged."""
|
| 128 |
+
r = env_ref._inner_env.resolve_delivery(order_id)
|
| 129 |
+
return {"status": r.status, **r.payload}
|
| 130 |
+
|
| 131 |
+
@mcp.tool()
|
| 132 |
+
def get_status() -> dict:
|
| 133 |
+
"""Return a full snapshot of the current environment state."""
|
| 134 |
+
return env_ref._inner_env.snapshot()
|
| 135 |
+
|
| 136 |
+
# ------------------------------------------------------------------
|
| 137 |
+
# MCPEnvironment interface
|
| 138 |
+
# ------------------------------------------------------------------
|
| 139 |
+
|
| 140 |
+
def step(
|
| 141 |
+
self,
|
| 142 |
+
action: Action,
|
| 143 |
+
timeout_s: Optional[float] = None,
|
| 144 |
+
**kwargs: Any,
|
| 145 |
+
) -> Observation:
|
| 146 |
+
"""Override step to propagate reward/done on the Observation object."""
|
| 147 |
+
obs = super().step(action, timeout_s=timeout_s, **kwargs)
|
| 148 |
+
if not isinstance(obs, CallToolObservation) or self._inner_env is None:
|
| 149 |
+
return obs
|
| 150 |
+
|
| 151 |
+
done = self._inner_env.is_done()
|
| 152 |
+
obs.done = done
|
| 153 |
+
|
| 154 |
+
if isinstance(action, CallToolAction) and action.tool_name == "wait_for_next_day":
|
| 155 |
+
if self._use_dense_rewards:
|
| 156 |
+
# Dense: per-day delta of bank balance
|
| 157 |
+
new_score = self._inner_env.final_score()
|
| 158 |
+
obs.reward = round(new_score - self._prev_score, 2)
|
| 159 |
+
self._prev_score = new_score
|
| 160 |
+
elif done:
|
| 161 |
+
# Sparse default: final bank balance at terminal step only
|
| 162 |
+
obs.reward = self._inner_env.final_score()
|
| 163 |
+
else:
|
| 164 |
+
obs.reward = 0.0
|
| 165 |
+
else:
|
| 166 |
+
obs.reward = 0.0
|
| 167 |
+
|
| 168 |
+
return obs
|
| 169 |
+
|
| 170 |
+
def reset(
|
| 171 |
+
self,
|
| 172 |
+
seed: int | None = None,
|
| 173 |
+
episode_id: str | None = None,
|
| 174 |
+
**kwargs: Any,
|
| 175 |
+
) -> CallToolObservation:
|
| 176 |
+
effective_seed = seed if seed is not None else self._seed
|
| 177 |
+
self._inner_env = VendingBench2Environment(
|
| 178 |
+
config=self._config,
|
| 179 |
+
seed=effective_seed,
|
| 180 |
+
use_dense_rewards=self._use_dense_rewards,
|
| 181 |
+
)
|
| 182 |
+
self._prev_score = self._inner_env.final_score()
|
| 183 |
+
snapshot = self._inner_env.snapshot()
|
| 184 |
+
snapshot["reward"] = 0.0
|
| 185 |
+
snapshot["done"] = False
|
| 186 |
+
return CallToolObservation(
|
| 187 |
+
tool_name="reset",
|
| 188 |
+
result=snapshot,
|
| 189 |
+
reward=0.0,
|
| 190 |
+
done=False,
|
| 191 |
+
)
|
| 192 |
+
|
| 193 |
+
def _step_impl(
|
| 194 |
+
self,
|
| 195 |
+
action: Action,
|
| 196 |
+
timeout_s: float | None = None,
|
| 197 |
+
**kwargs: Any,
|
| 198 |
+
) -> Observation:
|
| 199 |
+
raise NotImplementedError("All actions are routed through MCP tools.")
|
| 200 |
+
|
| 201 |
+
@property
|
| 202 |
+
def state(self) -> dict[str, Any]:
|
| 203 |
+
if self._inner_env is None:
|
| 204 |
+
return {}
|
| 205 |
+
return self._inner_env.snapshot()
|
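Review note on the reward logic in `VB2MCPEnvironment.step`: the branch structure can be sketched as a pure function, which makes the dense and sparse modes easier to compare. The function `step_reward` and its tuple return are hypothetical and exist only for illustration; the branching mirrors the diff.

```python
def step_reward(
    tool_name: str,
    done: bool,
    use_dense: bool,
    score: float,
    prev_score: float,
) -> tuple[float, float]:
    """Return (reward, updated_prev_score) for one tool call."""
    if tool_name != "wait_for_next_day":
        # Every other tool yields zero reward and leaves the baseline alone.
        return 0.0, prev_score
    if use_dense:
        # Dense mode: change in bank balance since the previous day boundary.
        return round(score - prev_score, 2), score
    if done:
        # Sparse mode: the full final balance, on the terminal step only.
        return score, prev_score
    return 0.0, prev_score


if __name__ == "__main__":
    print(step_reward("wait_for_next_day", False, True, 498.0, 500.0))
    print(step_reward("wait_for_next_day", True, False, 742.5, 500.0))
```

Because the baseline is only refreshed on `wait_for_next_day`, balance changes caused by other tools (for example `collect_cash` or `check_delivery`) are attributed to the next day boundary in dense mode. One case that may be worth checking: if `advance_minutes` rolls `day_index` past the episode length during a different tool call, `done` becomes true on a step whose reward is fixed at 0.0, so the sparse terminal reward would not be emitted.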
vendsim_vb2/prompts.py
ADDED
|
@@ -0,0 +1,6 @@
| 1 |
+
SYSTEM_PROMPT = """You are Charles Paxton, an autonomous AI agent running a vending machine business.
|
| 2 |
+
There is no user in this environment.
|
| 3 |
+
You have full agency to manage pricing, inventory, supplier negotiations, and reminders.
|
| 4 |
+
Your objective is to maximize final bank balance over a one-year operating horizon.
|
| 5 |
+
Weekly output token usage is billed at $100 per million output tokens.
|
| 6 |
+
"""
|
vendsim_vb2/rewards.py
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
def compute_reward(
|
| 5 |
+
final_bank_balance: float, dense_components: list[float], use_dense: bool
|
| 6 |
+
) -> float:
|
| 7 |
+
if not use_dense:
|
| 8 |
+
return final_bank_balance
|
| 9 |
+
return final_bank_balance + sum(dense_components)
|
vendsim_vb2/server/__init__.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Server package for the Vending-Bench 2 app factory."""
|
vendsim_vb2/server/app.py
ADDED
|
@@ -0,0 +1,21 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from fastapi import FastAPI
|
| 4 |
+
from openenv.core.env_server.http_server import HTTPEnvServer
|
| 5 |
+
from openenv.core.env_server.mcp_types import CallToolAction, CallToolObservation
|
| 6 |
+
|
| 7 |
+
from vendsim_vb2.mcp_env import VB2MCPEnvironment
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
def create_app() -> FastAPI:
|
| 11 |
+
app = FastAPI(title="Vending-Bench 2 Environment")
|
| 12 |
+
server = HTTPEnvServer(
|
| 13 |
+
env=VB2MCPEnvironment,
|
| 14 |
+
action_cls=CallToolAction,
|
| 15 |
+
observation_cls=CallToolObservation,
|
| 16 |
+
)
|
| 17 |
+
server.register_routes(app)
|
| 18 |
+
return app
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
app = create_app()
|
vendsim_vb2/state.py
ADDED
|
@@ -0,0 +1,63 @@
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+from vendsim_vb2.config import VB2Config
+
+
+@dataclass(slots=True)
+class Reminder:
+    day: int
+    message: str
+
+
+@dataclass(slots=True)
+class SimulationState:
+    day_index: int
+    minute_of_day: int
+    cash_balance: float
+    storage_inventory: dict[str, int] = field(default_factory=dict)
+    machine_inventory: dict[str, int] = field(default_factory=dict)
+    machine_cash: float = 0.0
+    weekly_output_tokens: int = 0
+    consecutive_negative_days: int = 0
+    scratchpad: list[str] = field(default_factory=list)
+    reminders: list[Reminder] = field(default_factory=list)
+    notes: list[str] = field(default_factory=list)
+    email_log: list[dict[str, object]] = field(default_factory=list)
+    subagent_chat_log: list[str] = field(default_factory=list)
+    daily_sales_history: list[dict[str, object]] = field(default_factory=list)
+    prices: dict[str, float] = field(default_factory=dict)
+
+    @classmethod
+    def new_episode(cls, config: VB2Config | None = None) -> "SimulationState":
+        cfg = config or VB2Config()
+        return cls(day_index=1, minute_of_day=0, cash_balance=cfg.starting_balance)
+
+    def advance_minutes(self, minutes: int) -> None:
+        if minutes < 0:
+            raise ValueError("minutes must be non-negative")
+        total = self.minute_of_day + minutes
+        self.day_index += total // (24 * 60)
+        self.minute_of_day = total % (24 * 60)
+
+    def add_reminder(self, day: int, message: str) -> None:
+        self.reminders.append(Reminder(day=day, message=message))
+
+    def snapshot(self) -> dict[str, object]:
+        return {
+            "day_index": self.day_index,
+            "minute_of_day": self.minute_of_day,
+            "cash_balance": round(self.cash_balance, 2),
+            "storage_inventory": dict(self.storage_inventory),
+            "machine_inventory": dict(self.machine_inventory),
+            "machine_cash": round(self.machine_cash, 2),
+            "weekly_output_tokens": self.weekly_output_tokens,
+            "consecutive_negative_days": self.consecutive_negative_days,
+            "scratchpad": list(self.scratchpad),
+            "reminders": [{"day": r.day, "message": r.message} for r in self.reminders],
+            "notes": list(self.notes),
+            "email_log": [dict(entry) for entry in self.email_log],
+            "subagent_chat_log": list(self.subagent_chat_log),
+            "prices": dict(self.prices),
+        }
|
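The day rollover in `SimulationState.advance_minutes` can be checked in isolation. The sketch below is a standalone restatement of that arithmetic, not part of the commit; `advance_clock` and `MINUTES_PER_DAY` are hypothetical names introduced only for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # hypothetical constant; the commit inlines 24 * 60


def advance_clock(day_index: int, minute_of_day: int, minutes: int) -> tuple[int, int]:
    """Return the (day_index, minute_of_day) pair after `minutes` elapse."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # divmod yields the same values as the // and % pair in advance_minutes.
    extra_days, new_minute = divmod(minute_of_day + minutes, MINUTES_PER_DAY)
    return day_index + extra_days, new_minute


# Day 1 at 23:30 plus a 75-minute action lands on day 2 at 00:45.
print(advance_clock(1, 23 * 60 + 30, 75))  # (2, 45)
```

Because the quotient is added to `day_index`, a single call can cross several days at once, for example when a long wait is expressed in minutes.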
vendsim_vb2/subagent.py
ADDED
|
@@ -0,0 +1,62 @@
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+from vendsim_vb2.config import VB2Config
+from vendsim_vb2.demand import PRODUCTS
+
+MACHINE_LAYOUT = {
+    "small_rows": 2,
+    "large_rows": 2,
+    "slots_per_row": 3,
+    "total_slots": 12,
+}
+
+# Not referenced in this module; the methods below read the travel time from VB2Config.
+RESTOCK_TRAVEL_TIME_MINUTES = 75
+
+
+@dataclass(slots=True)
+class SubAgent:
+    config: VB2Config = field(default_factory=VB2Config)
+    machine_inventory: dict[str, int] = field(default_factory=dict)
+    machine_cash: float = 0.0
+
+    def specs(self) -> dict[str, object]:
+        return {
+            "name": "physical-ops-sub-agent",
+            "tools": ["restock_machine", "collect_cash", "get_machine_inventory"],
+        }
+
+    def machine_layout(self) -> dict[str, int]:
+        return dict(MACHINE_LAYOUT)
+
+    def restock_machine(self, product: str, qty: int) -> dict[str, object]:
+        if qty <= 0:
+            return {"status": "rejected", "message": "qty must be positive"}
+        # Products missing from PRODUCTS fall back to the "small" size class.
+        size = str(PRODUCTS.get(product, {}).get("size", "small"))
+        # Capacity per size class is rows * slots_per_row, compared against stocked units.
+        max_slots = MACHINE_LAYOUT[f"{size}_rows"] * MACHINE_LAYOUT["slots_per_row"]
+        current = sum(
+            units
+            for stocked_product, units in self.machine_inventory.items()
+            if str(PRODUCTS.get(stocked_product, {}).get("size", "small")) == size
+        )
+        if current + qty > max_slots:
+            return {"status": "rejected", "message": f"{size} slots full"}
+        self.machine_inventory[product] = self.machine_inventory.get(product, 0) + qty
+        return {
+            "status": "ok",
+            "time_cost_minutes": self.config.restock_travel_time_minutes,
+            "machine_inventory": dict(self.machine_inventory),
+        }
+
+    def collect_cash(self) -> dict[str, object]:
+        collected = round(self.machine_cash, 2)
+        self.machine_cash = 0.0
+        return {
+            "status": "ok",
+            "amount_collected": collected,
+            "time_cost_minutes": self.config.restock_travel_time_minutes,
+        }
+
+    def get_machine_inventory(self) -> dict[str, int]:
+        return dict(self.machine_inventory)
|
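The capacity rule in `SubAgent.restock_machine` pools all products of one size class and compares their stocked units against `rows * slots_per_row`. The following is a minimal standalone sketch of that check. The `SIZES` catalogue and the `can_restock` helper are assumptions made for illustration; the real size data lives in `vendsim_vb2.demand.PRODUCTS`, which is not shown in this part of the diff.

```python
MACHINE_LAYOUT = {"small_rows": 2, "large_rows": 2, "slots_per_row": 3}

# Hypothetical catalogue standing in for vendsim_vb2.demand.PRODUCTS.
SIZES = {"soda": "small", "chips": "small", "sandwich": "large"}


def can_restock(inventory: dict[str, int], product: str, qty: int) -> bool:
    """Mirror the accept/reject decision of restock_machine without mutating state."""
    if qty <= 0:
        return False
    size = SIZES.get(product, "small")  # unknown products count as small
    capacity = MACHINE_LAYOUT[f"{size}_rows"] * MACHINE_LAYOUT["slots_per_row"]
    used = sum(
        units for name, units in inventory.items()
        if SIZES.get(name, "small") == size
    )
    return used + qty <= capacity


inventory = {"soda": 4, "sandwich": 6}
print(can_restock(inventory, "chips", 2))     # True: 4 + 2 fits in 6 small slots
print(can_restock(inventory, "chips", 3))     # False: 4 + 3 exceeds 6
print(can_restock(inventory, "sandwich", 1))  # False: large slots already full
```

Note that under this rule a product shares capacity with every other product of the same size, so restocking one small item reduces the room left for all other small items.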
vendsim_vb2/suppliers.py
ADDED
|
@@ -0,0 +1,180 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from random import Random
+
+from vendsim_vb2.demand import PRODUCTS
+
+
+@dataclass(slots=True)
+class Quote:
+    quote_id: str
+    product: str
+    qty: int
+    unit_price: float
+    fair_unit_price: float
+    supplier_name: str
+
+
+@dataclass(slots=True)
+class NegotiationResponse:
+    quote_id: str
+    status: str
+    unit_price: float
+    message: str
+
+
+@dataclass(slots=True)
+class SupplierOrder:
+    order_id: str
+    product: str
+    qty: int
+    unit_price: float
+    supplier_name: str
+    may_bait_and_switch: bool
+    status: str = "confirmed"
+
+
+@dataclass(slots=True)
+class DeliveryTimeline:
+    order_id: str
+    status: str
+    delivered_qty: int
+    days_late: int
+    final_unit_price: float
+
+
+class SupplierEngine:
+    def __init__(self, seed: int | None = None) -> None:
+        self._rng = Random(seed)
+        self._quotes: dict[str, Quote] = {}
+        self._orders: dict[str, SupplierOrder] = {}
+        self._resolved_deliveries: dict[str, DeliveryTimeline] = {}
+        self._quote_counter = 0
+        self._order_counter = 0
+        self._health = "active"
+
+    def request_quote(self, product: str, qty: int) -> Quote:
+        fair_price = float(PRODUCTS.get(product, {}).get("wholesale_price", 1.0))
+        # Quoted markup is uniform between 0.85x and 2.25x of the fair wholesale price.
+        markup = 0.85 + (self._rng.random() * 1.4)
+        quoted_price = round(fair_price * markup, 2)
+        self._quote_counter += 1
+        quote = Quote(
+            quote_id=f"quote-{self._quote_counter}",
+            product=product,
+            qty=qty,
+            unit_price=quoted_price,
+            fair_unit_price=round(fair_price, 2),
+            supplier_name=f"supplier-{self._rng.randint(1, 5)}",
+        )
+        self._quotes[quote.quote_id] = quote
+        return quote
+
+    def negotiate(
+        self, quote_id: str, proposed_unit_price: float
+    ) -> NegotiationResponse:
+        quote = self._quotes[quote_id]
+        # Offers below 90% of the fair price are always rejected.
+        floor_price = round(quote.fair_unit_price * 0.9, 2)
+        if proposed_unit_price >= quote.unit_price:
+            return NegotiationResponse(
+                quote_id=quote_id,
+                status="accepted",
+                unit_price=round(proposed_unit_price, 2),
+                message="Accepted at your proposed price.",
+            )
+        if proposed_unit_price >= floor_price:
+            if self._rng.random() < 0.55:
+                return NegotiationResponse(
+                    quote_id=quote_id,
+                    status="accepted",
+                    unit_price=round(proposed_unit_price, 2),
+                    message="Accepted after negotiation.",
+                )
+            counter_price = round((proposed_unit_price + quote.unit_price) / 2, 2)
+            return NegotiationResponse(
+                quote_id=quote_id,
+                status="countered",
+                unit_price=counter_price,
+                message="Counteroffer issued.",
+            )
+        return NegotiationResponse(
+            quote_id=quote_id,
+            status="rejected",
+            unit_price=quote.unit_price,
+            message="Offer too low.",
+        )
+
+    def place_email_confirmed_order(self, product: str, qty: int) -> SupplierOrder:
+        fair_price = float(PRODUCTS.get(product, {}).get("wholesale_price", 1.0))
+        # Email-confirmed orders are priced between 0.95x and 1.45x of the fair price.
+        unit_price = round(fair_price * (0.95 + self._rng.random() * 0.5), 2)
+        self._order_counter += 1
+        order = SupplierOrder(
+            order_id=f"order-{self._order_counter}",
+            product=product,
+            qty=qty,
+            unit_price=unit_price,
+            supplier_name=f"supplier-{self._rng.randint(1, 5)}",
+            may_bait_and_switch=self._rng.random() < 0.35,
+        )
+        self._orders[order.order_id] = order
+        return order
+
+    def simulate_delivery(self, order_id: str) -> DeliveryTimeline:
+        # Return cached result if already resolved (idempotent)
+        if order_id in self._resolved_deliveries:
+            return self._resolved_deliveries[order_id]
+
+        order = self._orders[order_id]
+        if self._health == "out_of_business":
+            result = DeliveryTimeline(
+                order_id=order_id,
+                status="failed",
+                delivered_qty=0,
+                days_late=0,
+                final_unit_price=order.unit_price,
+            )
+            self._resolved_deliveries[order_id] = result
+            return result
+        # Outcome bands: 55% on time, 25% delayed, 12% partial, 8% failed.
+        roll = self._rng.random()
+        if roll < 0.55:
+            status = "delivered"
+            delivered_qty = order.qty
+            days_late = 0
+        elif roll < 0.8:
+            status = "delayed"
+            delivered_qty = order.qty
+            days_late = self._rng.randint(1, 7)
+        elif roll < 0.92:
+            status = "partial"
+            delivered_qty = max(1, int(order.qty * (0.4 + self._rng.random() * 0.4)))
+            days_late = self._rng.randint(0, 5)
+        else:
+            status = "failed"
+            delivered_qty = 0
+            days_late = 0
+        final_unit_price = order.unit_price
+        # Flagged orders that arrive have a 50% chance of a 5% to 30% price increase.
+        if (
+            order.may_bait_and_switch
+            and status in {"delivered", "delayed", "partial"}
+            and self._rng.random() < 0.5
+        ):
+            final_unit_price = round(
+                order.unit_price * (1.05 + self._rng.random() * 0.25), 2
+            )
+        result = DeliveryTimeline(
+            order_id=order_id,
+            status=status,
+            delivered_qty=delivered_qty,
+            days_late=days_late,
+            final_unit_price=final_unit_price,
+        )
+        self._resolved_deliveries[order_id] = result
+        return result
+
+    def tick_supplier_health(self, days: int = 1) -> str:
+        if self._health == "out_of_business":
+            return self._health
+        # Failure risk scales with the number of days ticked and is capped at 45% per call.
+        failure_risk = min(0.45, days / 365 * 0.7)
+        if self._rng.random() < failure_risk:
+            self._health = "out_of_business"
+        return self._health
|
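The three negotiation outcomes in `SupplierEngine.negotiate` are easier to see with the random draw made explicit. The sketch below restates the decision as a pure function. `negotiate_outcome` and its `roll` parameter are hypothetical; in the commit the draw comes from the engine's seeded `Random` instance.

```python
def negotiate_outcome(
    quoted: float, fair: float, proposed: float, roll: float
) -> tuple[str, float]:
    """Return (status, unit_price) for an offer, given a random draw in [0, 1)."""
    floor_price = round(fair * 0.9, 2)
    if proposed >= quoted:
        # Meeting or beating the quote is always accepted at the proposed price.
        return "accepted", round(proposed, 2)
    if proposed >= floor_price:
        if roll < 0.55:
            return "accepted", round(proposed, 2)
        # Otherwise the supplier counters at the midpoint of offer and quote.
        return "countered", round((proposed + quoted) / 2, 2)
    return "rejected", quoted


# Quote of 1.50 on a fair price of 1.00, so the floor is 0.90.
print(negotiate_outcome(1.50, 1.00, 1.10, roll=0.30))  # accepted at 1.10
print(negotiate_outcome(1.50, 1.00, 1.10, roll=0.80))  # countered near 1.30
print(negotiate_outcome(1.50, 1.00, 0.80, roll=0.30))  # rejected, quote stays 1.50
```

Passing the draw in as an argument keeps the function deterministic for tests; seeding `Random` as the engine does is the other common way to get reproducible runs.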
vendsim_vb2/tools/__init__.py
ADDED
|
@@ -0,0 +1 @@
+"""Tool registries for the Vending-Bench 2 environment."""
|
vendsim_vb2/tools/main_agent_tools.py
ADDED
|
@@ -0,0 +1,34 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True, slots=True)
+class ToolSpec:
+    name: str
+    description: str
+    time_cost_minutes: int
+
+
+MAIN_TOOL_SPECS: tuple[ToolSpec, ...] = (
+    ToolSpec("set_price", "Update the price of a product in the vending machine.", 5),
+    ToolSpec("send_email", "Send an email to a supplier or service provider.", 10),
+    ToolSpec("check_balance", "Review current bank balance.", 1),
+    ToolSpec("check_storage_inventory", "Inspect the storage inventory.", 2),
+    ToolSpec("wait_for_next_day", "Advance simulation to the next business day.", 0),
+    ToolSpec("run_sub_agent", "Delegate a physical-world action to the sub-agent.", 0),
+    ToolSpec("chat_with_sub_agent", "Message the sub-agent without taking action.", 5),
+    ToolSpec("request_supplier_quote", "Request a quote from a supplier.", 10),
+    ToolSpec("negotiate_supplier", "Negotiate pricing with a supplier.", 10),
+    ToolSpec("place_supplier_order", "Place a supplier order after email confirmation.", 10),
+    ToolSpec("check_delivery", "Check the delivery status of a supplier order.", 5),
+    ToolSpec("get_status", "Return a full environment snapshot.", 0),
+)
+
+
+def list_main_tools() -> list[str]:
+    return [spec.name for spec in MAIN_TOOL_SPECS]
+
+
+def get_main_tool_specs() -> tuple[ToolSpec, ...]:
+    return MAIN_TOOL_SPECS
|
vendsim_vb2/tools/memory_tools.py
ADDED
|
@@ -0,0 +1,25 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True, slots=True)
+class MemoryToolSpec:
+    name: str
+    description: str
+
+
+MEMORY_TOOL_SPECS: tuple[MemoryToolSpec, ...] = (
+    MemoryToolSpec("write_scratchpad", "Append a note to working memory."),
+    MemoryToolSpec("read_scratchpad", "Read the working-memory scratchpad."),
+    MemoryToolSpec("search_notes", "Search saved notes for a keyword."),
+    MemoryToolSpec("set_reminder", "Schedule a future reminder."),
+)
+
+
+def list_memory_tools() -> list[str]:
+    return [spec.name for spec in MEMORY_TOOL_SPECS]
+
+
+def get_memory_tool_specs() -> tuple[MemoryToolSpec, ...]:
+    return MEMORY_TOOL_SPECS
|