Spaces: huzzle-labs/spreadsheet

kdemon1011 committed 30 days ago

Commit 6b4e5a8 (verified) · 1 parent: 1a4ae64

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.

Files changed (50)

Dockerfile +41 -0
README.md +78 -5
__init__.py +11 -0
client.py +100 -0
comparison.md +96 -0
generate_scenarios.py +1336 -0
models.py +86 -0
openenv.yaml +8 -0
openenv_spreadsheet.egg-info/PKG-INFO +17 -0
openenv_spreadsheet.egg-info/SOURCES.txt +24 -0
openenv_spreadsheet.egg-info/dependency_links.txt +1 -0
openenv_spreadsheet.egg-info/entry_points.txt +2 -0
openenv_spreadsheet.egg-info/requires.txt +13 -0
openenv_spreadsheet.egg-info/top_level.txt +1 -0
pyproject.toml +37 -0
scenarios/.gitkeep +0 -0
scenarios/buggy_template_fix_01.json +8 -0
scenarios/conditional_aggregation_01.json +8 -0
scenarios/conditional_aggregation_02.json +8 -0
scenarios/cross_sheet_lookup_01.json +8 -0
scenarios/cross_sheet_lookup_02.json +8 -0
scenarios/formula_repair_01.json +8 -0
scenarios/formula_repair_02.json +8 -0
scenarios/ledger_reconciliation_01.json +8 -0
scenarios/ledger_reconciliation_02.json +8 -0
scenarios/messy_table_extraction_01.json +8 -0
scenarios/range_transformation_01.json +8 -0
scenarios/schedule_grid_fill_01.json +8 -0
server/__init__.py +11 -0
server/app.py +41 -0
server/formula_utils.py +39 -0
server/scenario_loader.py +62 -0
server/spreadsheet_environment.py +445 -0
server/workbook_engine.py +564 -0
spreadsheet.egg-info/PKG-INFO +16 -0
spreadsheet.egg-info/SOURCES.txt +22 -0
spreadsheet.egg-info/dependency_links.txt +1 -0
spreadsheet.egg-info/entry_points.txt +2 -0
spreadsheet.egg-info/requires.txt +13 -0
spreadsheet.egg-info/top_level.txt +1 -0
uv.lock +0 -0
workbooks/fixtures/.gitkeep +0 -0
workbooks/fixtures/037858b5-3d0e-4714-8640-2dea23fc3a18_multi_currency_reconciliation.xlsx +0 -0
workbooks/fixtures/1333ba32-7957-4f7f-b310-6a9ba0e718bd_data_pivot_reshape.xlsx +0 -0
workbooks/fixtures/15123e53-9510-48d4-ae1a-a01556145b8e_employee_bonus_calculation.xlsx +0 -0
workbooks/fixtures/158cebfc-4813-49c4-bd54-fceef44c4860_employee_schedule_grid.xlsx +0 -0
workbooks/fixtures/19d8a671-1769-45aa-af51-39d12e81d45c_multi_currency_reconciliation.xlsx +0 -0
workbooks/fixtures/30f43287-34a1-4620-a9ae-3d982705a5e5_bank_reconciliation.xlsx +0 -0
workbooks/fixtures/45bb730b-c042-491e-9f7d-ff9ff3de25a6_cascading_formula_errors.xlsx +0 -0
workbooks/fixtures/5a43518a-7b4c-4e49-a2e3-ae0e550f5351_multi_department_budget.xlsx +0 -0

Dockerfile ADDED

	@@ -0,0 +1,41 @@

+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git curl && \
+    rm -rf /var/lib/apt/lists/*
+WORKDIR /app
+COPY . /app/env
+WORKDIR /app/env
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then uv sync --frozen --no-install-project --no-editable; \
+    else uv sync --no-install-project --no-editable; fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then uv sync --frozen --no-editable; \
+    else uv sync --no-editable; fi
+FROM ${BASE_IMAGE}
+WORKDIR /app
+COPY --from=builder /app/env/.venv /app/.venv
+COPY --from=builder /app/env /app/env
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+ENV ENABLE_WEB_INTERFACE=true
+ENV WORKBOOKS_DIR=/app/env/workbooks
+ENV SCENARIOS_DIR=/app/env/scenarios
+EXPOSE 8000
+HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
+    CMD curl -sf http://localhost:8000/health || exit 1
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

README.md CHANGED

@@ -1,10 +1,83 @@
 ---
-title: Spreadsheet
-emoji: 🚀
-colorFrom: red
-colorTo: red
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: "Spreadsheet Environment Server"
+emoji: 📊
+colorFrom: green
+colorTo: blue
 sdk: docker
 pinned: false
+license: mit
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
+  - rl-environment
 ---
+# Spreadsheet Environment
+Exact workbook manipulation and reasoning over realistic spreadsheet tasks. This gym targets weaknesses in structured state tracking, cross-sheet reasoning, non-standard table layouts, and exact edit correctness.
+## Quick Start
+```bash
+cd spreadsheet && docker build -t openenv-spreadsheet -f server/Dockerfile .
+docker run -d --name spreadsheet -p 8000:8000 openenv-spreadsheet
+curl http://localhost:8000/health
+```
+```python
+from spreadsheet import SpreadsheetEnv
+with SpreadsheetEnv(base_url="http://localhost:8000") as env:
+    result = env.reset()
+    # Use MCP tools: list_sheets, read_range, write_cell, submit_workbook, etc.
+```
+## Project Structure
+```
+spreadsheet/
+├── __init__.py
+├── client.py
+├── models.py
+├── openenv.yaml
+├── pyproject.toml
+├── README.md
+├── .env
+├── .dockerignore
+├── uv.lock
+├── server/
+│   ├── __init__.py
+│   ├── app.py
+│   ├── spreadsheet_environment.py
+│   ├── workbook_engine.py
+│   ├── formula_utils.py
+│   └── scenario_loader.py
+├── workbooks/
+│   ├── templates/
+│   ├── fixtures/
+│   └── hidden_tests/
+├── scenarios/
+└── server/Dockerfile
+```
+## Reward System
+Both reward modes use a unified scoring formula:
+```
+total = 0.25 × quality + 0.15 × efficiency + 0.60 × ground_truth + penalty
+```
+- **Quality (0.25)** — Custom mode: F1 of expected vs used tools + success rate. OpenEnv mode: fraction of non-neutral steps that were productive (sign-based).
+- **Efficiency (0.15)** — `1.0 - (actual_steps / max_steps)`. Fewer steps = higher score.
+- **Ground Truth (0.60)** — Outcome checks verified against submit_workbook hidden test results (pass rate of cell/formula checks).
+- **Penalty** — Graduated: -0.5 (all calls succeed, 0% ground truth) or -0.2 (<30% ground truth).
+See [Reward System](../docs/reward-system.md) for full details.
+## Deployment
+```bash
+openenv push . --private --repo-id huzzle-labs/spreadsheet
+```
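
As a reading aid for the Reward System section of the README above, here is a minimal sketch of how the unified score could be computed. The function names `efficiency_score` and `total_reward`, the `all_calls_succeeded` flag, and the precedence between the two penalty tiers are assumptions made for illustration; the actual logic lives in the server code, which this view does not show.

```python
def efficiency_score(actual_steps: int, max_steps: int) -> float:
    """Efficiency as stated in the README: 1.0 - (actual_steps / max_steps)."""
    return 1.0 - actual_steps / max_steps


def total_reward(
    quality: float,
    efficiency: float,
    ground_truth: float,
    all_calls_succeeded: bool,
) -> float:
    """Weighted sum from the README plus a graduated penalty.

    Assumption: the -0.5 tier is checked first, and the -0.2 tier applies
    to every other case with ground truth below 30%.
    """
    if all_calls_succeeded and ground_truth == 0.0:
        penalty = -0.5
    elif ground_truth < 0.30:
        penalty = -0.2
    else:
        penalty = 0.0
    return 0.25 * quality + 0.15 * efficiency + 0.60 * ground_truth + penalty


if __name__ == "__main__":
    # A perfect run that used 10 of 50 allowed steps:
    # 0.25 * 1.0 + 0.15 * 0.8 + 0.60 * 1.0 + 0.0
    print(round(total_reward(1.0, efficiency_score(10, 50), 1.0, True), 2))
```

Because ground truth carries 60% of the weight, a run that fails every hidden check cannot score above 0.40 before the penalty is applied.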

__init__.py ADDED

	@@ -0,0 +1,11 @@

+"""Spreadsheet Environment."""
+from .client import SpreadsheetEnv
+from .models import SpreadsheetAction, SpreadsheetObservation, SpreadsheetState
+__all__ = [
+    "SpreadsheetAction",
+    "SpreadsheetObservation",
+    "SpreadsheetState",
+    "SpreadsheetEnv",
+]

client.py ADDED

	@@ -0,0 +1,100 @@

+"""Spreadsheet Environment Client.
+Connects to a running Spreadsheet OpenEnv server over HTTP/WebSocket.
+Agents interact via MCP tools (read_range, write_cell, submit_workbook, etc.).
+"""
+from __future__ import annotations
+from typing import Any, Dict
+from openenv.core.client_types import StepResult
+from openenv.core.env_client import EnvClient
+from openenv.core.env_server.mcp_types import CallToolAction, ListToolsAction, Tool
+from .models import SpreadsheetAction, SpreadsheetObservation, SpreadsheetState
+class SpreadsheetEnv(
+    EnvClient[SpreadsheetAction, SpreadsheetObservation, SpreadsheetState]
+):
+    """Client for the Spreadsheet Environment.
+    Example:
+        >>> with SpreadsheetEnv(base_url="http://localhost:8000") as client:
+        ...     result = client.reset()
+        ...     result = client.step(
+        ...         SpreadsheetAction(tool_name="list_scenarios", arguments_json="{}")
+        ...     )
+        ...     result = client.step(
+        ...         SpreadsheetAction(
+        ...             tool_name="read_range",
+        ...             arguments_json='{"sheet":"Summary","range":"A1:D10"}'
+        ...         )
+        ...     )
+    """
+    def list_tools(self, use_cache: bool = True):
+        if use_cache and hasattr(self, "_tools_cache") and self._tools_cache:
+            return self._tools_cache
+        import requests
+        http_base = (
+            self._ws_url
+            .replace("ws://", "http://")
+            .replace("wss://", "https://")
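+            # str.rstrip("/ws") would strip any trailing '/', 'w', or 's' characters,
+            # so drop the trailing slash first and then the exact "/ws" suffix.
+            .rstrip("/")
+            .removesuffix("/ws")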
+        )
+        resp = requests.post(
+            f"{http_base}/step",
+            json={"action": {"type": "list_tools"}},
+        )
+        data = resp.json()
+        raw_tools = data.get("observation", {}).get("tools", [])
+        tools = [
+            Tool(
+                name=t["name"],
+                description=t.get("description", ""),
+                input_schema=t.get("input_schema", {}),
+            )
+            for t in raw_tools
+        ]
+        self._tools_cache = tools
+        return tools
+    def _step_payload(self, action: Any) -> Dict:
+        if hasattr(action, "to_mcp_action"):
+            action = action.to_mcp_action()
+        if isinstance(action, ListToolsAction):
+            return {"type": "list_tools"}
+        if isinstance(action, CallToolAction):
+            return {
+                "type": "call_tool",
+                "tool_name": action.tool_name,
+                "arguments": action.arguments or {},
+            }
+        if hasattr(action, "model_dump"):
+            return action.model_dump()
+        return {"tool_name": getattr(action, "tool_name", ""), "arguments": {}}
+    def _parse_result(self, payload: Dict) -> StepResult[SpreadsheetObservation]:
+        obs_data = payload.get("observation", payload)
+        observation = SpreadsheetObservation(
+            tool_name=obs_data.get("tool_name", ""),
+            result=obs_data.get("result"),
+            error=obs_data.get("error"),
+            done=payload.get("done", False),
+            reward=payload.get("reward"),
+            metadata=obs_data.get("metadata", {}),
+        )
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict[str, Any]) -> SpreadsheetState:
+        return SpreadsheetState(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+        )
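
A note on the URL handling in `list_tools` above: `str.rstrip` takes a set of characters rather than a suffix, which matters when deriving an HTTP base URL from a WebSocket URL. The sketch below shows one suffix-safe way to do the conversion. The helper name `ws_to_http_base` is hypothetical and is not part of the client.

```python
def ws_to_http_base(ws_url: str) -> str:
    """Turn a ws:// or wss:// URL into an HTTP base URL without the /ws path."""
    if ws_url.startswith("wss://"):
        http_url = "https://" + ws_url[len("wss://"):]
    elif ws_url.startswith("ws://"):
        http_url = "http://" + ws_url[len("ws://"):]
    else:
        http_url = ws_url
    # Drop trailing slashes, then remove the exact "/ws" suffix if present.
    http_url = http_url.rstrip("/")
    if http_url.endswith("/ws"):
        http_url = http_url[: -len("/ws")]
    return http_url


if __name__ == "__main__":
    print(ws_to_http_base("ws://localhost:8000/ws"))
    # Character-set stripping also eats the end of the host name here:
    print("https://news/ws".rstrip("/ws"))
```

For `localhost:8000` both approaches give the same result, since the port ends in a digit; the difference only shows up when the host or path ends in `w`, `s`, or `/`.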

comparison.md ADDED

	@@ -0,0 +1,96 @@

+# Spreadsheet Gym — Model Comparison
+**5 SOTA models × 2 reward modes** evaluated on 12 scenarios.
+## Summary
+| Model | Custom Avg | OpenEnv Avg | GT Pass Rate | Avg Steps | Time |
+|---|:---:|:---:|:---:|:---:|:---:|
+| **gpt-5.4** | **0.67** | **0.65** | 10/12 (83%) | 7.8 | 161s |
+| claude-opus-4-20250514 | 0.39 | 0.46 | 7/12 (58%) | 33.6 | 1759s |
+| claude-sonnet-4-6 | 0.33 | 0.44 | 6/12 (50%) | 18.9 | 895s |
+| claude-opus-4-6 | -0.03 | 0.39 | 5/12 (42%) | 41.3 | 1062s |
+| gpt-5 | -0.44 | 0.14 | 1/12 (8%) | 8.8 | 1876s |
+**Best model:** gpt-5.4 — highest scores on both reward modes, fastest execution, most scenarios solved.
+## Per-Scenario Breakdown (Custom Mode)
+| Scenario | gpt-5.4 | sonnet-4-6 | opus-4-6 | opus-0514 | gpt-5 |
+|---|:---:|:---:|:---:|:---:|:---:|
+| buggy_template_fix_01 | **0.92** | 0.94 | 0.94 | 0.85 | 0.89 |
+| conditional_aggregation_01 | **0.96** | 0.87 | 0.81 | -0.78 | -0.71 |
+| conditional_aggregation_02 | **0.91** | -0.68 | -0.66 | 0.85 | -0.68 |
+| cross_sheet_lookup_01 | **0.92** | -0.68 | -0.70 | -0.80 | -0.68 |
+| cross_sheet_lookup_02 | **0.95** | 0.23 | 0.82 | 0.16 | -0.68 |
+| formula_repair_01 | **0.88** | 0.88 | -0.69 | 0.86 | -0.72 |
+| formula_repair_02 | **0.93** | 0.89 | 0.91 | 0.91 | 0.92 |
+| ledger_reconciliation_01 | -0.66 | 0.28 | -0.68 | **0.83** | -0.68 |
+| ledger_reconciliation_02 | **0.92** | 0.22 | -0.66 | 0.13 | -0.80 |
+| messy_table_extraction_01 | -0.66 | **0.83** | -0.68 | ERROR | -0.75 |
+| range_transformation_01 | **0.98** | 0.86 | 0.91 | 0.83 | -0.68 |
+| schedule_grid_fill_01 | **0.96** | -0.68 | -0.68 | 0.88 | -0.69 |
+## Per-Scenario Breakdown (OpenEnv Mode)
+| Scenario | gpt-5.4 | sonnet-4-6 | opus-4-6 | opus-0514 | gpt-5 |
+|---|:---:|:---:|:---:|:---:|:---:|
+| buggy_template_fix_01 | **0.76** | 0.76 | 0.76 | 0.74 | 0.15 |
+| conditional_aggregation_01 | **0.76** | 0.74 | 0.73 | 0.74 | 0.14 |
+| conditional_aggregation_02 | **0.76** | 0.14 | 0.14 | 0.74 | 0.14 |
+| cross_sheet_lookup_01 | **0.75** | 0.14 | 0.14 | 0.14 | 0.14 |
+| cross_sheet_lookup_02 | **0.76** | 0.74 | 0.73 | 0.74 | 0.14 |
+| formula_repair_01 | **0.75** | 0.14 | 0.14 | ERROR | 0.15 |
+| formula_repair_02 | **0.76** | 0.75 | 0.75 | ERROR | 0.15 |
+| ledger_reconciliation_01 | 0.14 | **0.74** | 0.14 | ERROR | 0.14 |
+| ledger_reconciliation_02 | **0.75** | 0.14 | 0.14 | 0.13 | 0.14 |
+| messy_table_extraction_01 | 0.14 | 0.13 | 0.14 | **0.75** | 0.14 |
+| range_transformation_01 | **0.76** | 0.76 | 0.75 | 0.74 | 0.14 |
+| schedule_grid_fill_01 | **0.76** | 0.14 | 0.14 | 0.75 | 0.14 |
+## Step Count Comparison
+| Scenario | max_steps | gpt-5.4 | sonnet-4-6 | opus-4-6 | opus-0514 | gpt-5 |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|
+| buggy_template_fix_01 | 50 | **10** | 9 | 10 | 26 | 15 |
+| conditional_aggregation_01 | 55 | **6** | 10 | 234 | 55 | 9 |
+| conditional_aggregation_02 | 55 | **8** | 5 | 4 | 12 | 5 |
+| cross_sheet_lookup_01 | 60 | **9** | 6 | 7 | 60 | 5 |
+| cross_sheet_lookup_02 | 60 | **7** | 84 | 198 | 60 | 6 |
+| formula_repair_01 | 40 | **12** | 15 | 7 | 27 | 12 |
+| formula_repair_02 | 40 | **8** | 15 | 13 | 13 | 11 |
+| ledger_reconciliation_01 | 60 | 3 | 7 | 5 | **23** | 5 |
+| ledger_reconciliation_02 | 60 | **9** | 18 | 5 | 55 | 21 |
+| messy_table_extraction_01 | 55 | 3 | **15** | 4 | 17 | 8 |
+| range_transformation_01 | 50 | **5** | 12 | 7 | 39 | 4 |
+| schedule_grid_fill_01 | 55 | **8** | 5 | 5 | 26 | 6 |
+Bold = model that solved the scenario (GT=1.0) in fewest steps.
+## Key Observations
+### 1. Step Score Compression (OpenEnv Mode)
+All step_score values fall between 0.31 and 0.41 regardless of agent quality. This is a known issue: the per-step reward magnitudes (0.02–0.50) are too small relative to the normalizer's expected range [-0.5, +1.0]. As a result, the entire ranking comes from ground truth alone.
+### 2. Hallucination Penalty Dominance (Custom Mode)
+The -1.0 hallucination penalty fires frequently (when all tool calls succeed but ground truth fails). This causes scores like -0.66 to -0.80, making the average for models that fail a few scenarios deeply negative — even if they perform well on others.
+### 3. Efficiency Score Bug (Custom Mode)
+The efficiency denominator uses `len(expected_tools)` (tool types, not steps). A scenario with 8 expected tool types but a realistic 20-step solution gives efficiency = 8/20 = 0.40 even for a perfect agent.
+### 4. gpt-5 OpenEnv Anomaly
+gpt-5 scored 0.14–0.15 on every scenario in OpenEnv mode (GT=0.0 for every scenario). The same model scored higher on some scenarios in custom mode. This suggests a session or environment issue during the OpenEnv batch run, not a model capability problem.
+### 5. Step Count vs Quality
+Some models take far more steps than others (opus-4-6: 234 on conditional_aggregation_01, 198 on cross_sheet_lookup_02) but still achieve GT=1.0, while others fail in just 3-5 steps. The current scoring does not properly differentiate efficient correct agents from slow correct agents.
+## Hardening Assessment
+**SOTA average (custom): 0.67** (gpt-5.4) — below the 0.7 threshold.
+No hardening required at this time. The scenarios are challenging enough — even the best model fails 2/12 scenarios.
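
To make the efficiency discrepancy from Key Observation 3 concrete, the sketch below contrasts the tool-type denominator described there with the step-budget formula given in the README. Both function names are hypothetical, the clamping to the range 0 to 1 is an assumption, and the `max_steps=50` used in the demo is an illustrative value rather than one taken from a specific run.

```python
def efficiency_by_tool_types(expected_tool_types: int, actual_steps: int) -> float:
    """Efficiency as described in Key Observation 3: expected tool types / steps."""
    if actual_steps <= 0:
        return 0.0
    # Capping at 1.0 is an assumption for illustration.
    return min(1.0, expected_tool_types / actual_steps)


def efficiency_by_step_budget(actual_steps: int, max_steps: int) -> float:
    """Efficiency as stated in the README: 1.0 - (actual_steps / max_steps)."""
    # Clamping at 0.0 is an assumption; it covers runs that exceed max_steps.
    return max(0.0, 1.0 - actual_steps / max_steps)


if __name__ == "__main__":
    # The example from the text: 8 expected tool types, a 20-step solution.
    print(efficiency_by_tool_types(8, 20))
    # The same 20-step run scored against a hypothetical 50-step budget.
    print(efficiency_by_step_budget(20, 50))
```

The first formula penalizes any solution that needs more steps than there are distinct tool types, while the second only measures how much of the step budget was used.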

generate_scenarios.py ADDED

	@@ -0,0 +1,1336 @@

+"""Generate all 12 scenario workbooks, scenario JSONs, and hidden test JSONs.
+Run once:  cd spreadsheet && python generate_scenarios.py
+"""
+from __future__ import annotations
+import json
+import os
+import random
+from datetime import date, datetime, timedelta
+from pathlib import Path
+import openpyxl
+from openpyxl.styles import Font, Alignment, PatternFill
+from openpyxl.utils import get_column_letter
+random.seed(42)
+BASE = Path(__file__).resolve().parent
+TEMPLATES_DIR = BASE / "workbooks" / "templates"
+HIDDEN_TESTS_DIR = BASE / "workbooks" / "hidden_tests"
+SCENARIOS_DIR = BASE / "scenarios"
+TEMPLATES_DIR.mkdir(parents=True, exist_ok=True)
+HIDDEN_TESTS_DIR.mkdir(parents=True, exist_ok=True)
+SCENARIOS_DIR.mkdir(parents=True, exist_ok=True)
+NAMES = [
+    "Alice Chen", "Bob Martinez", "Carol Singh", "David Kim", "Elena Petrov",
+    "Frank Okafor", "Grace Yamamoto", "Hector Reyes", "Irene Muller", "James Adebayo",
+    "Karen Novak", "Liam O'Brien", "Maria Santos", "Nathan Park", "Olivia Johansson",
+    "Peter Andersen", "Quinn Zhao", "Rosa Fernandez", "Samuel Osei", "Tanya Volkov",
+    "Uma Krishnan", "Viktor Sokolov", "Wendy Liu", "Xavier Torres", "Yuki Tanaka",
+]
+PRODUCTS = [
+    ("PRD-001", "Widget Alpha", "Hardware"),
+    ("PRD-002", "Widget Beta", "Hardware"),
+    ("PRD-003", "Service Gold", "Services"),
+    ("PRD-004", "Service Silver", "Services"),
+    ("PRD-005", "License Pro", "Software"),
+    ("PRD-006", "License Basic", "Software"),
+    ("PRD-007", "Widget Gamma", "Hardware"),
+    ("PRD-008", "Consulting Plus", "Services"),
+]
+REGIONS = ["North", "South", "East", "West"]
+def _write_json(path: Path, data: dict):
+    with open(path, "w") as f:
+        json.dump(data, f, indent=2)
+def _bold_font():
+    return Font(bold=True)
+def _header_fill():
+    return PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
+def _header_font():
+    return Font(bold=True, color="FFFFFF")
+def _yellow_fill():
+    return PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 1: formula_repair_01 — Multi-Department Budget Repair
+# ══════════════════════════════════════════════════════════════════════
+def gen_formula_repair_01():
+    wb = openpyxl.Workbook()
+    # -- Engineering sheet --
+    ws_eng = wb.active
+    ws_eng.title = "Engineering"
+    headers = ["Name", "Role", "Base Salary", "Bonus %", "Total Comp"]
+    for c, h in enumerate(headers, 1):
+        cell = ws_eng.cell(row=1, column=c, value=h)
+        cell.font = _header_font()
+        cell.fill = _header_fill()
+    roles = ["Senior Engineer", "Staff Engineer", "Principal", "Junior Engineer", "Tech Lead"]
+    for i in range(20):
+        r = i + 2
+        ws_eng.cell(row=r, column=1, value=NAMES[i % len(NAMES)])
+        ws_eng.cell(row=r, column=2, value=roles[i % len(roles)])
+        salary = random.randint(80, 200) * 1000
+        bonus_pct = random.choice([0.05, 0.10, 0.15, 0.20])
+        ws_eng.cell(row=r, column=3, value=salary)
+        ws_eng.cell(row=r, column=4, value=bonus_pct)
+        ws_eng.cell(row=r, column=5, value=f"=C{r}*(1+D{r})")
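+    # NOTE: intended as a blank separator row, but openpyxl's ws.cell() ignores
+    # value=None, so the next line is a no-op and row 12 keeps its data.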
+    ws_eng.cell(row=12, column=1, value=None)
+    # -- Marketing sheet --
+    ws_mkt = wb.create_sheet("Marketing")
+    mkt_headers = ["Name", "Role", "Base Salary", "Campaign Budget", "Total Comp"]
+    for c, h in enumerate(mkt_headers, 1):
+        cell = ws_mkt.cell(row=1, column=c, value=h)
+        cell.font = _header_font()
+        cell.fill = _header_fill()
+    mkt_roles = ["Marketing Manager", "Content Lead", "SEO Specialist", "Brand Director"]
+    for i in range(15):
+        r = i + 2
+        ws_mkt.cell(row=r, column=1, value=NAMES[(i + 5) % len(NAMES)])
+        ws_mkt.cell(row=r, column=2, value=mkt_roles[i % len(mkt_roles)])
+        salary = random.randint(60, 150) * 1000
+        ws_mkt.cell(row=r, column=3, value=salary)
+        ws_mkt.cell(row=r, column=4, value=random.randint(5, 50) * 1000)
+        # BUG: References a deleted "OldBudget" sheet
+        ws_mkt.cell(row=r, column=5, value=f"=C{r}+OldBudget!B{r}")
+    # -- HR Policies sheet (non-standard layout) --
+    ws_hr = wb.create_sheet("HR Policies")
+    ws_hr.cell(row=1, column=1, value="Company Bonus Policy").font = _bold_font()
+    policies = [
+        "All employees are eligible for annual performance bonuses.",
+        "Bonus tiers: Junior 5%, Mid 10%, Senior 15%, Principal 20%.",
+        "Bonuses are calculated on base salary before taxes.",
+        "Campaign budgets are separate from compensation.",
+        "Total compensation = Base Salary × (1 + Bonus %).",
+    ]
+    for i, p in enumerate(policies):
+        ws_hr.cell(row=i + 3, column=1, value=p)
+    # Lookup table starts at F1 (non-standard)
+    ws_hr.cell(row=1, column=6, value="Tier").font = _bold_font()
+    ws_hr.cell(row=1, column=7, value="Min Bonus %").font = _bold_font()
+    ws_hr.cell(row=1, column=8, value="Max Bonus %").font = _bold_font()
+    tiers = [("Junior", 0.05, 0.08), ("Mid", 0.10, 0.12), ("Senior", 0.15, 0.18), ("Principal", 0.20, 0.25)]
+    for i, (tier, mn, mx) in enumerate(tiers):
+        ws_hr.cell(row=i + 2, column=6, value=tier)
+        ws_hr.cell(row=i + 2, column=7, value=mn)
+        ws_hr.cell(row=i + 2, column=8, value=mx)
+    # -- Summary sheet --
+    ws_sum = wb.create_sheet("Summary")
+    ws_sum.cell(row=1, column=1, value="Department Budget Summary").font = Font(bold=True, size=14)
+    ws_sum.merge_cells("A1:C1")
+    ws_sum.cell(row=3, column=1, value="Department").font = _bold_font()
+    ws_sum.cell(row=3, column=2, value="Total Base Salary").font = _bold_font()
+    ws_sum.cell(row=3, column=3, value="Total Compensation").font = _bold_font()
+    ws_sum.cell(row=4, column=1, value="Engineering")
+    ws_sum.cell(row=4, column=2, value="=SUM(Engineering!C2:C21)")
+    # BUG: wrong column and range (C2:C10 instead of E2:E21); this sums base salary, not total comp
+    ws_sum.cell(row=4, column=3, value="=SUM(Engineering!C2:C10)")
+    ws_sum.cell(row=5, column=1, value="Marketing")
+    ws_sum.cell(row=5, column=2, value="=SUM(Marketing!C2:C16)")
+    # BUG: references deleted OldBudget sheet
+    ws_sum.cell(row=5, column=3, value="=SUM(OldBudget!E2:E16)")
+    ws_sum.cell(row=7, column=1, value="Grand Total").font = _bold_font()
+    ws_sum.cell(row=7, column=2, value="=B4+B5")
+    # BUG: formula should sum total comp, not base salary again
+    ws_sum.cell(row=7, column=3, value="=B4+B5")
+    # -- Metadata sheet (hidden) --
+    ws_meta = wb.create_sheet("Metadata")
+    ws_meta.sheet_state = "hidden"
+    ws_meta.cell(row=1, column=1, value="Target: Fix Summary!C4, Summary!C5, Summary!C7")
+    ws_meta.cell(row=2, column=1, value="Target: Fix Marketing total comp formulas (remove OldBudget refs)")
+    ws_meta.cell(row=3, column=1, value="Expected: C7 = sum of all total comp across both departments")
+    wb.move_sheet("Summary", offset=-3)
+    wb.save(TEMPLATES_DIR / "multi_department_budget.xlsx")
+    # Scenario JSON
+    _write_json(SCENARIOS_DIR / "formula_repair_01.json", {
+        "id": "formula_repair_01",
+        "description": "Fix broken formulas in a multi-department budget workbook. Summary sheet has wrong ranges and references to a deleted sheet. Marketing total comp formulas reference a non-existent OldBudget sheet.",
+        "instructions": "The Summary sheet has broken formulas. Engineering total compensation references wrong cell ranges. Marketing total compensation references a deleted 'OldBudget' sheet. Fix all broken formulas so Summary correctly aggregates total compensation from Engineering and Marketing sheets. Also fix the Marketing sheet's Total Comp column to use the correct formula (Base Salary × (1 + Bonus %)). Check the HR Policies sheet for the correct bonus calculation method. There is a hidden Metadata sheet with hints.",
+        "workbook": "multi_department_budget.xlsx",
+        "max_steps": 50,
+        "category": "formula_repair",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "formula_repair_01.json", {
+        "scenario_id": "formula_repair_01",
+        "checks": [
+            {"sheet": "Summary", "cell": "C4", "expected_formula": "=SUM(Engineering!E2:E21)"},
+            {"sheet": "Summary", "cell": "C5", "expected_formula": "=SUM(Marketing!E2:E16)"},
+            {"sheet": "Summary", "cell": "C7", "expected_formula": "=C4+C5"},
+            {"sheet": "Marketing", "range": "E2:E16", "check": "no_blanks"},
+            {"sheet": "Engineering", "range": "E2:E21", "check": "no_blanks"},
+        ],
+        "target_regions": [
+            {"sheet": "Summary", "range": "C4:C7"},
+            {"sheet": "Marketing", "range": "E2:E16"},
+        ],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 2: formula_repair_02 — Cascading Formula Errors
+# ══════════════════════════════════════════════════════════════════════
+def gen_formula_repair_02():
+    wb = openpyxl.Workbook()
+    # -- Assumptions sheet --
+    ws_a = wb.active
+    ws_a.title = "Assumptions"
+    ws_a.cell(row=1, column=1, value="Parameter").font = _bold_font()
+    ws_a.cell(row=1, column=2, value="Value").font = _bold_font()
+    params = [
+        ("Growth Rate", 0.08), ("Discount Rate", 0.12), ("Tax Rate", 0.21),
+        ("Inflation", 0.03), ("Depreciation Years", 5),
+    ]
+    for i, (name, val) in enumerate(params):
+        ws_a.cell(row=i + 2, column=1, value=name)
+        ws_a.cell(row=i + 2, column=2, value=val)
+    # -- Revenue Projections sheet --
+    ws_rev = wb.create_sheet("Revenue")
+    ws_rev.cell(row=1, column=1, value="Year").font = _bold_font()
+    for y in range(5):
+        ws_rev.cell(row=1, column=y + 2, value=2024 + y).font = _bold_font()
+    ws_rev.cell(row=2, column=1, value="Base Revenue")
+    ws_rev.cell(row=2, column=2, value=1000000)
+    for y in range(1, 5):
+        col = y + 2
+        prev_col = get_column_letter(col - 1)
+        # Years 2-3 (columns C-D) correctly reference Assumptions!B2; years 4-5 (columns E-F) get the injected bug below
+        if y < 3:
+            ws_rev.cell(row=2, column=col, value=f"={prev_col}2*(1+Assumptions!B2)")
+        else:
+            # BUG: hardcoded 0.05 instead of referencing Assumptions
+            ws_rev.cell(row=2, column=col, value=f"={prev_col}2*1.05")
+    ws_rev.cell(row=3, column=1, value="Operating Costs")
+    for y in range(5):
+        col = y + 2
+        ws_rev.cell(row=3, column=col, value=f"={get_column_letter(col)}2*0.65")
+    ws_rev.cell(row=4, column=1, value="EBIT")
+    for y in range(5):
+        col = y + 2
+        cl = get_column_letter(col)
+        ws_rev.cell(row=4, column=col, value=f"={cl}2-{cl}3")
+    ws_rev.cell(row=5, column=1, value="Tax")
+    for y in range(5):
+        col = y + 2
+        cl = get_column_letter(col)
+        # BUG: Uses hardcoded 0.25 instead of Assumptions!B4 (Tax Rate)
+        ws_rev.cell(row=5, column=col, value=f"={cl}4*0.25")
+    ws_rev.cell(row=6, column=1, value="Net Income")
+    for y in range(5):
+        col = y + 2
+        cl = get_column_letter(col)
+        ws_rev.cell(row=6, column=col, value=f"={cl}4-{cl}5")
+    # -- DCF sheet --
+    ws_dcf = wb.create_sheet("DCF")
+    ws_dcf.cell(row=1, column=1, value="Year").font = _bold_font()
+    ws_dcf.cell(row=1, column=2, value="Net Income").font = _bold_font()
+    ws_dcf.cell(row=1, column=3, value="Discount Factor").font = _bold_font()
+    ws_dcf.cell(row=1, column=4, value="PV").font = _bold_font()
+    for y in range(5):
+        r = y + 2
+        col_letter = get_column_letter(y + 2)
+        ws_dcf.cell(row=r, column=1, value=2024 + y)
+        ws_dcf.cell(row=r, column=2, value=f"=Revenue!{col_letter}6")
+        # NOTE: Assumptions!B3 is the Discount Rate, so this reference is already correct (not an injected bug)
+        ws_dcf.cell(row=r, column=3, value=f"=1/(1+Assumptions!B3)^{y + 1}")
+        ws_dcf.cell(row=r, column=4, value=f"=B{r}*C{r}")
+    ws_dcf.cell(row=8, column=1, value="Total NPV").font = _bold_font()
+    ws_dcf.cell(row=8, column=4, value="=SUM(D2:D6)")
+    wb.save(TEMPLATES_DIR / "cascading_formula_errors.xlsx")
+    _write_json(SCENARIOS_DIR / "formula_repair_02.json", {
+        "id": "formula_repair_02",
+        "description": "Fix cascading formula errors in a 5-year financial projection. Revenue growth, tax rates, and discount factors reference wrong cells or use hardcoded values instead of the Assumptions sheet.",
+        "instructions": "This workbook has a 5-year financial projection with three sheets: Assumptions, Revenue, and DCF. Multiple formulas contain errors: (1) Revenue years 4-5 use hardcoded 5% growth instead of the Assumptions growth rate. (2) Tax calculations use 25% instead of the Assumptions tax rate (21%). (3) The Assumptions sheet has the correct values — all formulas should reference it. Fix all broken formulas in Revenue and DCF sheets to properly reference Assumptions.",
+        "workbook": "cascading_formula_errors.xlsx",
+        "max_steps": 50,
+        "category": "formula_repair",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "formula_repair_02.json", {
+        "scenario_id": "formula_repair_02",
+        "checks": [
+            {"sheet": "Revenue", "cell": "E2", "expected_formula": "=D2*(1+Assumptions!B2)"},
+            {"sheet": "Revenue", "cell": "F2", "expected_formula": "=E2*(1+Assumptions!B2)"},
+            {"sheet": "Revenue", "cell": "B5", "expected_formula": "=B4*Assumptions!B4"},
+            {"sheet": "Revenue", "cell": "C5", "expected_formula": "=C4*Assumptions!B4"},
+            {"sheet": "Revenue", "cell": "D5", "expected_formula": "=D4*Assumptions!B4"},
+            {"sheet": "Revenue", "cell": "E5", "expected_formula": "=E4*Assumptions!B4"},
+            {"sheet": "Revenue", "cell": "F5", "expected_formula": "=F4*Assumptions!B4"},
+        ],
+        "target_regions": [
+            {"sheet": "Revenue", "range": "B2:F6"},
+            {"sheet": "DCF", "range": "B2:D8"},
+        ],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 3: cross_sheet_lookup_01 — Product Revenue by Region
+# ══════════════════════════════════════════════════════════════════════
+def gen_cross_sheet_lookup_01():
+    wb = openpyxl.Workbook()
+    # -- Products sheet (lookup table) --
+    ws_prod = wb.active
+    ws_prod.title = "Products"
+    for c, h in enumerate(["Code", "Name", "Category", "Unit Price"], 1):
+        ws_prod.cell(row=1, column=c, value=h).font = _header_font()
+        ws_prod.cell(row=1, column=c).fill = _header_fill()
+    for i, (code, name, cat) in enumerate(PRODUCTS):
+        r = i + 2
+        ws_prod.cell(row=r, column=1, value=code)
+        ws_prod.cell(row=r, column=2, value=name)
+        ws_prod.cell(row=r, column=3, value=cat)
+        ws_prod.cell(row=r, column=4, value=random.randint(50, 500))
+    # -- Sales Q1 sheet (raw data with some bad codes) --
+    ws_q1 = wb.create_sheet("Sales_Q1")
+    q1_headers = ["Date", "Product Code", "Region", "Quantity", "Revenue"]
+    for c, h in enumerate(q1_headers, 1):
+        ws_q1.cell(row=1, column=c, value=h).font = _bold_font()
+    q1_rows = 80
+    for i in range(q1_rows):
+        r = i + 2
+        d = date(2024, 1, 1) + timedelta(days=random.randint(0, 89))
+        code = PRODUCTS[random.randint(0, len(PRODUCTS) - 1)][0]
+        # Introduce some bad codes (typos)
+        if i in (15, 32, 55, 71):
+            code = code.replace("-", "")  # PRD001 instead of PRD-001
+        region = random.choice(REGIONS)
+        qty = random.randint(1, 50)
+        ws_q1.cell(row=r, column=1, value=d)
+        ws_q1.cell(row=r, column=2, value=code)
+        ws_q1.cell(row=r, column=3, value=region)
+        ws_q1.cell(row=r, column=4, value=qty)
+        ws_q1.cell(row=r, column=5, value=qty * random.randint(50, 500))
+    # -- Sales Q2 sheet --
+    ws_q2 = wb.create_sheet("Sales_Q2")
+    for c, h in enumerate(q1_headers, 1):
+        ws_q2.cell(row=1, column=c, value=h).font = _bold_font()
+    q2_rows = 90
+    for i in range(q2_rows):
+        r = i + 2
+        d = date(2024, 4, 1) + timedelta(days=random.randint(0, 90))
+        code = PRODUCTS[random.randint(0, len(PRODUCTS) - 1)][0]
+        if i in (20, 45, 67):
+            code = code.lower()  # prd-003 instead of PRD-003
+        region = random.choice(REGIONS)
+        qty = random.randint(1, 50)
+        ws_q2.cell(row=r, column=1, value=d)
+        ws_q2.cell(row=r, column=2, value=code)
+        ws_q2.cell(row=r, column=3, value=region)
+        ws_q2.cell(row=r, column=4, value=qty)
+        ws_q2.cell(row=r, column=5, value=qty * random.randint(50, 500))
+    # -- Summary sheet (agent must fill) --
+    ws_sum = wb.create_sheet("Summary")
+    ws_sum.cell(row=1, column=1, value="Revenue Summary by Region and Category").font = Font(bold=True, size=14)
+    ws_sum.merge_cells("A1:E1")
+    ws_sum.cell(row=3, column=1, value="Region").font = _bold_font()
+    ws_sum.cell(row=3, column=2, value="Hardware").font = _bold_font()
+    ws_sum.cell(row=3, column=3, value="Services").font = _bold_font()
+    ws_sum.cell(row=3, column=4, value="Software").font = _bold_font()
+    ws_sum.cell(row=3, column=5, value="Total").font = _bold_font()
+    for i, region in enumerate(REGIONS):
+        r = i + 4
+        ws_sum.cell(row=r, column=1, value=region)
+        for c in range(2, 6):
+            cell = ws_sum.cell(row=r, column=c)
+            cell.fill = _yellow_fill()
+    ws_sum.cell(row=8, column=1, value="Grand Total").font = _bold_font()
+    for c in range(2, 6):
+        ws_sum.cell(row=8, column=c).fill = _yellow_fill()
+    wb.save(TEMPLATES_DIR / "product_revenue_by_region.xlsx")
+    _write_json(SCENARIOS_DIR / "cross_sheet_lookup_01.json", {
+        "id": "cross_sheet_lookup_01",
+        "description": "Aggregate product revenue by region and category across two quarterly sales sheets. Some product codes have typos. The Summary sheet must be filled with correct totals.",
+        "instructions": "Fill the Summary sheet with revenue totals broken down by Region (rows) and Product Category (columns: Hardware, Services, Software). Data is in Sales_Q1 and Sales_Q2. Use the Products sheet to map product codes to categories. WARNING: Some product codes in the sales sheets have typos (missing dashes or lowercase). You must account for these when aggregating. The Total column should sum across categories for each region. Grand Total row should sum each column.",
+        "workbook": "product_revenue_by_region.xlsx",
+        "max_steps": 60,
+        "category": "cross_sheet_lookup",
+    })
+    # Calculate expected values
+    cat_map = {code: cat for code, _, cat in PRODUCTS}
+    # Also map typo variants
+    for code, _, cat in PRODUCTS:
+        cat_map[code.replace("-", "")] = cat
+        cat_map[code.lower()] = cat
+    totals = {r: {"Hardware": 0, "Services": 0, "Software": 0} for r in REGIONS}
+    for ws_name in ["Sales_Q1", "Sales_Q2"]:
+        ws = wb[ws_name]
+        for row in ws.iter_rows(min_row=2, values_only=True):
+            _, code, region, _, revenue = row
+            if code is None:
+                continue
+            cat = cat_map.get(str(code))
+            if cat and region in totals:
+                totals[region][cat] += revenue
+    checks = []
+    for i, region in enumerate(REGIONS):
+        r = i + 4
+        for j, cat in enumerate(["Hardware", "Services", "Software"]):
+            col = get_column_letter(j + 2)
+            val = totals[region][cat]
+            checks.append({
+                "sheet": "Summary", "cell": f"{col}{r}",
+                "expected_value_range": [val * 0.99, val * 1.01],
+            })
+    checks.append({"sheet": "Summary", "range": "B4:E8", "check": "no_blanks"})
+    _write_json(HIDDEN_TESTS_DIR / "cross_sheet_lookup_01.json", {
+        "scenario_id": "cross_sheet_lookup_01",
+        "checks": checks,
+        "target_regions": [{"sheet": "Summary", "range": "B4:E8"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 4: cross_sheet_lookup_02 — Employee Bonus Calculation
+# ══════════════════════════════════════════════════════════════════════
+def gen_cross_sheet_lookup_02():
+    wb = openpyxl.Workbook()
+    # -- Employees --
+    ws_emp = wb.active
+    ws_emp.title = "Employees"
+    for c, h in enumerate(["ID", "Name", "Department", "Level", "Base Salary"], 1):
+        ws_emp.cell(row=1, column=c, value=h).font = _bold_font()
+    levels = ["Junior", "Mid", "Senior", "Principal"]
+    depts = ["Engineering", "Marketing", "Sales", "Operations"]
+    for i in range(25):
+        r = i + 2
+        ws_emp.cell(row=r, column=1, value=f"EMP-{i+1:03d}")
+        ws_emp.cell(row=r, column=2, value=NAMES[i])
+        ws_emp.cell(row=r, column=3, value=depts[i % len(depts)])
+        ws_emp.cell(row=r, column=4, value=levels[i % len(levels)])
+        ws_emp.cell(row=r, column=5, value=random.randint(50, 200) * 1000)
+    # -- Bonus Tiers (non-standard layout: starts at column F) --
+    ws_tiers = wb.create_sheet("Bonus_Tiers")
+    ws_tiers.cell(row=1, column=1, value="This sheet contains the bonus tier lookup table.")
+    ws_tiers.cell(row=2, column=1, value="The table is located in columns F-I, not A-D.")
+    ws_tiers.cell(row=3, column=1, value="Do not modify columns A-D.")
+    ws_tiers.cell(row=1, column=6, value="Level").font = _bold_font()
+    ws_tiers.cell(row=1, column=7, value="Bonus Rate").font = _bold_font()
+    ws_tiers.cell(row=1, column=8, value="Min Performance").font = _bold_font()
+    ws_tiers.cell(row=1, column=9, value="Cap Multiplier").font = _bold_font()
+    tier_data = [
+        ("Junior", 0.05, 3, 1.0), ("Mid", 0.10, 3, 1.2),
+        ("Senior", 0.15, 4, 1.5), ("Principal", 0.20, 4, 2.0),
+    ]
+    for i, (level, rate, min_perf, cap) in enumerate(tier_data):
+        r = i + 2
+        ws_tiers.cell(row=r, column=6, value=level)
+        ws_tiers.cell(row=r, column=7, value=rate)
+        ws_tiers.cell(row=r, column=8, value=min_perf)
+        ws_tiers.cell(row=r, column=9, value=cap)
+    # -- Performance scores --
+    ws_perf = wb.create_sheet("Performance")
+    for c, h in enumerate(["Employee ID", "Q1", "Q2", "Q3", "Q4", "Avg Score"], 1):
+        ws_perf.cell(row=1, column=c, value=h).font = _bold_font()
+    for i in range(25):
+        r = i + 2
+        ws_perf.cell(row=r, column=1, value=f"EMP-{i+1:03d}")
+        scores = [random.randint(1, 5) for _ in range(4)]
+        for q in range(4):
+            ws_perf.cell(row=r, column=q + 2, value=scores[q])
+        ws_perf.cell(row=r, column=6, value=f"=AVERAGE(B{r}:E{r})")
+    # -- Payroll (agent must fill) --
+    ws_pay = wb.create_sheet("Payroll")
+    for c, h in enumerate(["Employee ID", "Name", "Level", "Base Salary", "Avg Score", "Bonus Rate", "Bonus Amount", "Total Comp"], 1):
+        cell = ws_pay.cell(row=1, column=c, value=h)
+        cell.font = _bold_font()
+    for i in range(25):
+        r = i + 2
+        ws_pay.cell(row=r, column=1, value=f"EMP-{i+1:03d}")
+        for c in range(2, 9):
+            ws_pay.cell(row=r, column=c).fill = _yellow_fill()
+    wb.save(TEMPLATES_DIR / "employee_bonus_calculation.xlsx")
+    _write_json(SCENARIOS_DIR / "cross_sheet_lookup_02.json", {
+        "id": "cross_sheet_lookup_02",
+        "description": "Calculate employee bonuses by cross-referencing Employees, Bonus_Tiers (non-standard layout at column F), and Performance sheets. Fill the Payroll sheet.",
+        "instructions": "Fill the Payroll sheet for all 25 employees. For each employee: (1) Look up their Name, Level, and Base Salary from the Employees sheet. (2) Look up their average performance score from the Performance sheet. (3) Find the bonus rate from the Bonus_Tiers sheet (NOTE: the tier table is in columns F-I, not A-D). (4) If the employee's avg score meets the minimum performance threshold for their tier, apply the bonus rate; otherwise bonus is 0. (5) Bonus Amount = Base Salary × Bonus Rate. (6) Total Comp = Base Salary + Bonus Amount.",
+        "workbook": "employee_bonus_calculation.xlsx",
+        "max_steps": 60,
+        "category": "cross_sheet_lookup",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "cross_sheet_lookup_02.json", {
+        "scenario_id": "cross_sheet_lookup_02",
+        "checks": [
+            {"sheet": "Payroll", "range": "B2:H26", "check": "no_blanks"},
+        ],
+        "target_regions": [{"sheet": "Payroll", "range": "B2:H26"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 5: messy_table_extraction_01 — Vendor Invoice Processing
+# ══════════════════════════════════════════════════════════════════════
+def gen_messy_table_extraction_01():
+    wb = openpyxl.Workbook()
+    ws_raw = wb.active
+    ws_raw.title = "Raw_Invoices"
+    # Messy layout: header row at row 3 (rows 1-2 are title), blank rows between sections
+    ws_raw.cell(row=1, column=1, value="ACME Corp — Invoice Register 2024").font = Font(bold=True, size=14)
+    ws_raw.merge_cells("A1:F1")
+    ws_raw.cell(row=2, column=1, value="Exported from legacy system on 2024-06-15")
+    headers = ["Invoice #", "Date", "Vendor", "Amount", "Currency", "Status"]
+    for c, h in enumerate(headers, 1):
+        ws_raw.cell(row=3, column=c, value=h).font = _bold_font()
+    vendors = ["TechSupply Co.", "Office Depot", "CloudHost Inc.", "DataPipe LLC", "SecureNet Corp"]
+    statuses = ["Paid", "Pending", "Overdue", "Paid", "Pending"]
+    row = 4
+    invoice_count = 0
+    for section in range(4):
+        # Section header
+        ws_raw.cell(row=row, column=1, value=f"--- Q{section+1} 2024 ---").font = Font(italic=True)
+        row += 1
+        for i in range(random.randint(12, 18)):
+            inv_num = f"INV-2024-{invoice_count+1:04d}"
+            d = date(2024, section * 3 + 1, 1) + timedelta(days=random.randint(0, 85))
+            # Mix date formats deliberately
+            if i % 3 == 0:
+                date_str = d.strftime("%m/%d/%Y")  # US format
+            elif i % 3 == 1:
+                date_str = d.strftime("%d-%m-%Y")  # EU format
+            else:
+                date_str = d.isoformat()  # ISO format
+            ws_raw.cell(row=row, column=1, value=inv_num)
+            ws_raw.cell(row=row, column=2, value=date_str)
+            ws_raw.cell(row=row, column=3, value=vendors[i % len(vendors)])
+            ws_raw.cell(row=row, column=4, value=round(random.uniform(500, 50000), 2))
+            ws_raw.cell(row=row, column=5, value="USD")
+            ws_raw.cell(row=row, column=6, value=statuses[i % len(statuses)])
+            row += 1
+            invoice_count += 1
+        # Blank separator
+        row += 1
+    # -- Processed sheet (target) --
+    ws_proc = wb.create_sheet("Processed")
+    proc_headers = ["Invoice #", "Date", "Vendor", "Amount", "Status"]
+    for c, h in enumerate(proc_headers, 1):
+        cell = ws_proc.cell(row=1, column=c, value=h)
+        cell.font = _header_font()
+        cell.fill = _header_fill()
+    # -- Vendor Lookup --
+    ws_vendor = wb.create_sheet("Vendor_Lookup")
+    ws_vendor.cell(row=1, column=1, value="Vendor Name").font = _bold_font()
+    ws_vendor.cell(row=1, column=2, value="Category").font = _bold_font()
+    ws_vendor.cell(row=1, column=3, value="Payment Terms").font = _bold_font()
+    vendor_cats = [
+        ("TechSupply Co.", "Hardware", "Net 30"),
+        ("Office Depot", "Supplies", "Net 15"),
+        ("CloudHost Inc.", "Cloud", "Net 30"),
+        ("DataPipe LLC", "Data", "Net 45"),
+        ("SecureNet Corp", "Security", "Net 30"),
+    ]
+    for i, (v, c, t) in enumerate(vendor_cats):
+        ws_vendor.cell(row=i + 2, column=1, value=v)
+        ws_vendor.cell(row=i + 2, column=2, value=c)
+        ws_vendor.cell(row=i + 2, column=3, value=t)
+    wb.save(TEMPLATES_DIR / "vendor_invoice_processing.xlsx")
+    _write_json(SCENARIOS_DIR / "messy_table_extraction_01.json", {
+        "id": "messy_table_extraction_01",
+        "description": "Extract and clean invoice data from a messy raw export with mixed date formats, section headers mixed in with data rows, and blank separator rows. All dates must be normalized to ISO format.",
+        "instructions": "The Raw_Invoices sheet has messy data exported from a legacy system: title rows at top, section header rows (like '--- Q1 2024 ---') mixed in with data, blank separator rows, and inconsistent date formats (MM/DD/YYYY, DD-MM-YYYY, and ISO). Extract all actual invoice rows into the Processed sheet with: (1) Invoice #, (2) Date in ISO format (YYYY-MM-DD), (3) Vendor, (4) Amount, (5) Status. Skip section headers and blank rows. Dates must all be converted to ISO format.",
+        "workbook": "vendor_invoice_processing.xlsx",
+        "max_steps": 60,
+        "category": "messy_table_extraction",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "messy_table_extraction_01.json", {
+        "scenario_id": "messy_table_extraction_01",
+        "checks": [
+            {"sheet": "Processed", "check": "row_count_equals", "value": invoice_count},
+            {"sheet": "Processed", "column": "B", "check": "all_dates_iso_format"},
+            {"sheet": "Processed", "range": f"A2:E{invoice_count + 1}", "check": "no_blanks"},
+        ],
+        "target_regions": [{"sheet": "Processed", "range": f"A2:E{invoice_count + 1}"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 6: schedule_grid_fill_01 — Employee Schedule Planning
+# ══════════════════════════════════════════════════════════════════════
+def gen_schedule_grid_fill_01():
+    wb = openpyxl.Workbook()
+    employees = NAMES[:12]
+    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
+    # -- Constraints sheet --
+    ws_con = wb.active
+    ws_con.title = "Constraints"
+    ws_con.cell(row=1, column=1, value="Scheduling Constraints").font = Font(bold=True, size=12)
+    constraints = [
+        "No employee works more than 5 days per week.",
+        "Night shift (N) must not be followed by Morning shift (M) the next day.",
+        "At least 2 employees must be on Morning shift every day.",
+        "At least 1 employee must be on Night shift every day.",
+        "Each employee must work at least 3 days per week.",
+        "Saturday and Sunday must have at least 3 employees on Afternoon shift.",
+    ]
+    for i, c in enumerate(constraints):
+        ws_con.cell(row=i + 3, column=1, value=c)
+    # -- Availability sheet (exceptions) --
+    ws_avail = wb.create_sheet("Availability")
+    ws_avail.cell(row=1, column=1, value="Employee").font = _bold_font()
+    ws_avail.cell(row=1, column=2, value="Unavailable Day").font = _bold_font()
+    ws_avail.cell(row=1, column=3, value="Reason").font = _bold_font()
+    exceptions = [
+        (employees[0], "Monday", "PTO"),
+        (employees[0], "Tuesday", "PTO"),
+        (employees[3], "Saturday", "Personal"),
+        (employees[5], "Sunday", "Religious"),
+        (employees[7], "Friday", "Medical"),
+        (employees[9], "Wednesday", "Training"),
+        (employees[11], "Thursday", "Court duty"),
+    ]
+    for i, (emp, day, reason) in enumerate(exceptions):
+        ws_avail.cell(row=i + 2, column=1, value=emp)
+        ws_avail.cell(row=i + 2, column=2, value=day)
+        ws_avail.cell(row=i + 2, column=3, value=reason)
+    # -- Output sheet (empty grid) --
+    ws_out = wb.create_sheet("Output")
+    ws_out.cell(row=1, column=1, value="Employee").font = _bold_font()
+    for j, day in enumerate(days):
+        ws_out.cell(row=1, column=j + 2, value=day).font = _bold_font()
+    for i, emp in enumerate(employees):
+        ws_out.cell(row=i + 2, column=1, value=emp)
+        for j in range(len(days)):
+            ws_out.cell(row=i + 2, column=j + 2).fill = _yellow_fill()
+    # -- Reference (shift codes) --
+    ws_ref = wb.create_sheet("Shift_Codes")
+    ws_ref.cell(row=1, column=1, value="Code").font = _bold_font()
+    ws_ref.cell(row=1, column=2, value="Shift").font = _bold_font()
+    ws_ref.cell(row=1, column=3, value="Hours").font = _bold_font()
+    codes = [("M", "Morning", "6:00-14:00"), ("A", "Afternoon", "14:00-22:00"), ("N", "Night", "22:00-6:00"), ("X", "Off", "N/A")]
+    for i, (code, shift, hours) in enumerate(codes):
+        ws_ref.cell(row=i + 2, column=1, value=code)
+        ws_ref.cell(row=i + 2, column=2, value=shift)
+        ws_ref.cell(row=i + 2, column=3, value=hours)
+    wb.save(TEMPLATES_DIR / "employee_schedule_grid.xlsx")
+    _write_json(SCENARIOS_DIR / "schedule_grid_fill_01.json", {
+        "id": "schedule_grid_fill_01",
+        "description": "Fill an employee schedule grid for 12 employees across 7 days, respecting prose constraints on max days, shift transitions, minimum coverage, and availability exceptions.",
+        "instructions": "Fill the Output sheet with shift codes (M=Morning, A=Afternoon, N=Night, X=Off) for each employee and day. You must satisfy ALL constraints from the Constraints sheet and respect the availability exceptions from the Availability sheet. Unavailable employees must have X for that day. Check the Shift_Codes sheet for valid codes.",
+        "workbook": "employee_schedule_grid.xlsx",
+        "max_steps": 70,
+        "category": "schedule_grid_fill",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "schedule_grid_fill_01.json", {
+        "scenario_id": "schedule_grid_fill_01",
+        "checks": [
+            {"sheet": "Output", "range": "B2:H13", "check": "no_blanks"},
+            {"sheet": "Output", "check": "constraint_satisfaction", "constraints_sheet": "Constraints"},
+        ],
+        "target_regions": [{"sheet": "Output", "range": "B2:H13"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 7: ledger_reconciliation_01 — Bank Statement Reconciliation
+# ══════════════════════════════════════════════════════════════════════
+def gen_ledger_reconciliation_01():
+    wb = openpyxl.Workbook()
+    # -- Bank Statement --
+    ws_bank = wb.active
+    ws_bank.title = "Bank_Statement"
+    for c, h in enumerate(["Date", "Description", "Reference", "Amount", "Balance"], 1):
+        ws_bank.cell(row=1, column=c, value=h).font = _bold_font()
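+    bank_txns = []
+    for i in range(60):
+        d = date(2024, 1, 1) + timedelta(days=random.randint(0, 180))
+        desc = random.choice(["Wire Transfer", "ACH Payment", "Check #" + str(random.randint(1000, 9999)),
+                              "Deposit", "Service Fee", "Interest Credit"])
+        ref = f"BNK-{random.randint(10000, 99999)}"
+        amt = round(random.uniform(-5000, 10000), 2)
+        bank_txns.append((d, desc, ref, amt))
+    bank_txns.sort(key=lambda x: x[0])
+    # Compute the running balance only after sorting by date; accumulating it in
+    # generation order would leave the Balance column inconsistent with row order.
+    balance = 50000
+    for i, (d, desc, ref, amt) in enumerate(bank_txns):
+        balance = round(balance + amt, 2)
+        bank_txns[i] = (d, desc, ref, amt, balance)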
+    for i, (d, desc, ref, amt, bal) in enumerate(bank_txns):
+        r = i + 2
+        ws_bank.cell(row=r, column=1, value=d)
+        ws_bank.cell(row=r, column=2, value=desc)
+        ws_bank.cell(row=r, column=3, value=ref)
+        ws_bank.cell(row=r, column=4, value=amt)
+        ws_bank.cell(row=r, column=5, value=bal)
+    # -- Internal Ledger (slightly different — some missing, some extra, some amount mismatches) --
+    ws_ledger = wb.create_sheet("Internal_Ledger")
+    for c, h in enumerate(["Date", "Description", "GL Code", "Amount", "Reconciled"], 1):
+        ws_ledger.cell(row=1, column=c, value=h).font = _bold_font()
+    ledger_row = 2
+    matched = 0
+    unmatched_bank = []
+    for i, (d, desc, ref, amt, bal) in enumerate(bank_txns):
+        if random.random() < 0.1:
+            unmatched_bank.append(i)
+            continue
+        gl = f"GL-{random.randint(4000, 4999)}"
+        ledger_amt = amt
+        if random.random() < 0.08:
+            ledger_amt = round(amt + random.uniform(-50, 50), 2)
+        ws_ledger.cell(row=ledger_row, column=1, value=d)
+        ws_ledger.cell(row=ledger_row, column=2, value=desc)
+        ws_ledger.cell(row=ledger_row, column=3, value=gl)
+        ws_ledger.cell(row=ledger_row, column=4, value=ledger_amt)
+        ws_ledger.cell(row=ledger_row, column=5, value="No")
+        ledger_row += 1
+        matched += 1
+    # Add some extra ledger entries not in bank
+    for i in range(5):
+        d = date(2024, 1, 1) + timedelta(days=random.randint(0, 180))
+        ws_ledger.cell(row=ledger_row, column=1, value=d)
+        ws_ledger.cell(row=ledger_row, column=2, value=f"Manual Adjustment {i+1}")
+        ws_ledger.cell(row=ledger_row, column=3, value=f"GL-{random.randint(5000, 5999)}")
+        ws_ledger.cell(row=ledger_row, column=4, value=round(random.uniform(-2000, 2000), 2))
+        ws_ledger.cell(row=ledger_row, column=5, value="No")
+        ledger_row += 1
+    # -- Reconciled sheet (target) --
+    ws_recon = wb.create_sheet("Reconciled")
+    for c, h in enumerate(["Date", "Description", "Bank Amount", "Ledger Amount", "Difference", "Status"], 1):
+        cell = ws_recon.cell(row=1, column=c, value=h)
+        cell.font = _header_font()
+        cell.fill = _header_fill()
+    wb.save(TEMPLATES_DIR / "bank_reconciliation.xlsx")
+    _write_json(SCENARIOS_DIR / "ledger_reconciliation_01.json", {
+        "id": "ledger_reconciliation_01",
+        "description": "Reconcile a bank statement against an internal ledger. Find mismatches, missing entries, and amount discrepancies. Fill the Reconciled sheet.",
+        "instructions": "Compare Bank_Statement and Internal_Ledger to produce a reconciliation report in the Reconciled sheet. For each transaction: match by date and description. Record the Bank Amount, Ledger Amount, Difference (Bank - Ledger), and Status (Matched/Mismatch/Bank Only/Ledger Only). Include ALL transactions from both sources. Sort by date and write dates in ISO format (YYYY-MM-DD).",
+        "workbook": "bank_reconciliation.xlsx",
+        "max_steps": 60,
+        "category": "ledger_reconciliation",
+    })
+    total_entries = len(bank_txns) + 5
+    _write_json(HIDDEN_TESTS_DIR / "ledger_reconciliation_01.json", {
+        "scenario_id": "ledger_reconciliation_01",
+        "checks": [
+            {"sheet": "Reconciled", "range": f"A2:F{total_entries + 1}", "check": "no_blanks"},
+            {"sheet": "Reconciled", "column": "A", "check": "all_dates_iso_format"},
+        ],
+        "target_regions": [{"sheet": "Reconciled", "range": f"A2:F{total_entries + 1}"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 8: ledger_reconciliation_02 — Multi-Currency Reconciliation
+# ══════════════════════════════════════════════════════════════════════
+def gen_ledger_reconciliation_02():
+    wb = openpyxl.Workbook()
+    # -- Transactions USD --
+    ws_usd = wb.active
+    ws_usd.title = "Transactions_USD"
+    for c, h in enumerate(["Date", "Description", "Amount USD"], 1):
+        ws_usd.cell(row=1, column=c, value=h).font = _bold_font()
+    usd_total = 0
+    for i in range(30):
+        r = i + 2
+        d = date(2024, 1, 1) + timedelta(days=random.randint(0, 180))
+        amt = round(random.uniform(100, 15000), 2)
+        usd_total += amt
+        ws_usd.cell(row=r, column=1, value=d.strftime("%m/%d/%Y"))
+        ws_usd.cell(row=r, column=2, value=f"USD Transaction {i+1}")
+        ws_usd.cell(row=r, column=3, value=amt)
+    # -- Transactions EUR --
+    ws_eur = wb.create_sheet("Transactions_EUR")
+    for c, h in enumerate(["Date", "Description", "Amount EUR"], 1):
+        ws_eur.cell(row=1, column=c, value=h).font = _bold_font()
+    eur_amounts = []
+    for i in range(20):
+        r = i + 2
+        d = date(2024, 1, 1) + timedelta(days=random.randint(0, 180))
+        amt = round(random.uniform(100, 12000), 2)
+        eur_amounts.append(amt)
+        ws_eur.cell(row=r, column=1, value=d.strftime("%d-%m-%Y"))
+        ws_eur.cell(row=r, column=2, value=f"EUR Transaction {i+1}")
+        ws_eur.cell(row=r, column=3, value=amt)
+    # -- Exchange Rates --
+    ws_fx = wb.create_sheet("Exchange_Rates")
+    ws_fx.cell(row=1, column=1, value="Month").font = _bold_font()
+    ws_fx.cell(row=1, column=2, value="EUR_to_USD").font = _bold_font()
+    months = ["January", "February", "March", "April", "May", "June"]
+    rates = [1.08, 1.09, 1.07, 1.10, 1.08, 1.11]
+    for i, (m, rate) in enumerate(zip(months, rates)):
+        ws_fx.cell(row=i + 2, column=1, value=m)
+        ws_fx.cell(row=i + 2, column=2, value=rate)
+    # -- Summary (target) --
+    ws_sum = wb.create_sheet("Summary")
+    ws_sum.cell(row=1, column=1, value="Multi-Currency Reconciliation Summary").font = Font(bold=True, size=14)
+    ws_sum.cell(row=3, column=1, value="Category").font = _bold_font()
+    ws_sum.cell(row=3, column=2, value="Amount (USD)").font = _bold_font()
+    ws_sum.cell(row=4, column=1, value="Total USD Transactions")
+    ws_sum.cell(row=4, column=2).fill = _yellow_fill()
+    ws_sum.cell(row=5, column=1, value="Total EUR Transactions (converted to USD)")
+    ws_sum.cell(row=5, column=2).fill = _yellow_fill()
+    ws_sum.cell(row=6, column=1, value="Grand Total (USD)")
+    ws_sum.cell(row=6, column=2).fill = _yellow_fill()
+    ws_sum.cell(row=8, column=1, value="EUR Transaction Count")
+    ws_sum.cell(row=8, column=2).fill = _yellow_fill()
+    ws_sum.cell(row=9, column=1, value="USD Transaction Count")
+    ws_sum.cell(row=9, column=2).fill = _yellow_fill()
+    ws_sum.cell(row=10, column=1, value="Total Transaction Count")
+    ws_sum.cell(row=10, column=2).fill = _yellow_fill()
+    wb.save(TEMPLATES_DIR / "multi_currency_reconciliation.xlsx")
+    _write_json(SCENARIOS_DIR / "ledger_reconciliation_02.json", {
+        "id": "ledger_reconciliation_02",
+        "description": "Reconcile USD and EUR transaction sheets into a unified summary. EUR dates are in DD-MM-YYYY format, USD dates in MM/DD/YYYY. Convert EUR to USD using the Exchange_Rates sheet.",
+        "instructions": "The workbook has USD and EUR transaction sheets with different date formats. Convert all EUR transactions to USD using the monthly exchange rate from the Exchange_Rates sheet (match each transaction's month to the correct rate). Fill the Summary sheet with: Total USD Transactions, Total EUR Transactions converted to USD, Grand Total, and transaction counts. Dates in the transaction sheets use different formats — be careful when determining which month each EUR transaction falls in.",
+        "workbook": "multi_currency_reconciliation.xlsx",
+        "max_steps": 55,
+        "category": "ledger_reconciliation",
+    })
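+    # Expected EUR total in USD: convert each transaction at the rate for its own
+    # month, as the instructions require (EUR dates are DD-MM-YYYY strings).
+    eur_in_usd = 0.0
+    for date_str, _, amt in ws_eur.iter_rows(min_row=2, max_col=3, values_only=True):
+        eur_in_usd += amt * rates[int(date_str.split("-")[1]) - 1]
+    grand_total = usd_total + eur_in_usd
+    _write_json(HIDDEN_TESTS_DIR / "ledger_reconciliation_02.json", {
+        "scenario_id": "ledger_reconciliation_02",
+        "checks": [
+            {"sheet": "Summary", "cell": "B4", "expected_value_range": [usd_total * 0.99, usd_total * 1.01]},
+            {"sheet": "Summary", "cell": "B5", "expected_value_range": [eur_in_usd * 0.99, eur_in_usd * 1.01]},
+            {"sheet": "Summary", "cell": "B6", "expected_value_range": [grand_total * 0.99, grand_total * 1.01]},
+            {"sheet": "Summary", "cell": "B8", "expected_value_range": [20, 20]},
+            {"sheet": "Summary", "cell": "B9", "expected_value_range": [30, 30]},
+            {"sheet": "Summary", "cell": "B10", "expected_value_range": [50, 50]},
+            {"sheet": "Summary", "range": "B4:B6", "check": "no_blanks"},
+            {"sheet": "Summary", "range": "B8:B10", "check": "no_blanks"},
+        ],
+        "target_regions": [{"sheet": "Summary", "range": "B4:B6"}, {"sheet": "Summary", "range": "B8:B10"}],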
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 9: range_transformation_01 — Data Pivot and Reshape
+# ══════════════════════════════════════════════════════════════════════
+def gen_range_transformation_01():
+    wb = openpyxl.Workbook()
+    # -- Raw Data (long format) --
+    ws_raw = wb.active
+    ws_raw.title = "Raw_Data"
+    for c, h in enumerate(["Employee", "Month", "Metric", "Value"], 1):
+        ws_raw.cell(row=1, column=c, value=h).font = _bold_font()
+    emps = NAMES[:8]
+    months_list = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
+    metrics = ["Sales", "Returns", "Net Revenue"]
+    row = 2
+    data_map = {}
+    for emp in emps:
+        for month in months_list:
+            sales = random.randint(10000, 80000)
+            returns = random.randint(500, sales // 5)
+            net = sales - returns
+            for metric, val in [("Sales", sales), ("Returns", returns), ("Net Revenue", net)]:
+                ws_raw.cell(row=row, column=1, value=emp)
+                ws_raw.cell(row=row, column=2, value=month)
+                ws_raw.cell(row=row, column=3, value=metric)
+                ws_raw.cell(row=row, column=4, value=val)
+                data_map[(emp, month, metric)] = val
+                row += 1
+    # -- Instructions --
+    ws_instr = wb.create_sheet("Instructions")
+    ws_instr.cell(row=1, column=1, value="Data Transformation Task").font = Font(bold=True, size=12)
+    ws_instr.cell(row=3, column=1, value="Pivot the Raw_Data into the Pivot_Output sheet.")
+    ws_instr.cell(row=4, column=1, value="Layout: Rows = Employees, Column groups = Months")
+    ws_instr.cell(row=5, column=1, value="Under each month, show three sub-columns: Sales, Returns, Net Revenue")
+    ws_instr.cell(row=6, column=1, value="Add a final column 'Total Net Revenue' summing Net Revenue across all months")
+    ws_instr.cell(row=7, column=1, value="Sort employees alphabetically.")
+    # -- Pivot Output (target, headers pre-filled) --
+    ws_pivot = wb.create_sheet("Pivot_Output")
+    ws_pivot.cell(row=1, column=1, value="Employee").font = _bold_font()
+    col = 2
+    for month in months_list:
+        ws_pivot.cell(row=1, column=col, value=f"{month} Sales").font = _bold_font()
+        ws_pivot.cell(row=1, column=col + 1, value=f"{month} Returns").font = _bold_font()
+        ws_pivot.cell(row=1, column=col + 2, value=f"{month} Net Revenue").font = _bold_font()
+        col += 3
+    ws_pivot.cell(row=1, column=col, value="Total Net Revenue").font = _bold_font()
+    for i, emp in enumerate(sorted(emps)):
+        ws_pivot.cell(row=i + 2, column=1, value=emp)
+        for c in range(2, col + 1):
+            ws_pivot.cell(row=i + 2, column=c).fill = _yellow_fill()
+    wb.save(TEMPLATES_DIR / "data_pivot_reshape.xlsx")
+    # Calculate expected total net revenues for checks
+    sorted_emps = sorted(emps)
+    checks = []
+    for i, emp in enumerate(sorted_emps):
+        total_net = sum(data_map.get((emp, m, "Net Revenue"), 0) for m in months_list)
+        checks.append({
+            "sheet": "Pivot_Output",
+            "cell": f"{get_column_letter(col)}{i+2}",
+            "expected_value_range": [total_net * 0.99, total_net * 1.01],
+        })
+    checks.append({"sheet": "Pivot_Output", "range": f"B2:{get_column_letter(col)}{len(sorted_emps)+1}", "check": "no_blanks"})
+    checks.append({"sheet": "Pivot_Output", "check": "row_count_equals", "value": len(sorted_emps)})
+    _write_json(SCENARIOS_DIR / "range_transformation_01.json", {
+        "id": "range_transformation_01",
+        "description": "Pivot long-format employee metrics data into a wide-format table. Each employee gets one row with Sales, Returns, and Net Revenue for each of 6 months, plus a total.",
+        "instructions": "The Raw_Data sheet has employee performance metrics in long format (one row per employee-month-metric combination). Pivot this into the Pivot_Output sheet: one row per employee (sorted alphabetically), with columns for each month's Sales, Returns, and Net Revenue. Add a Total Net Revenue column at the end. The headers are pre-filled; fill the data cells.",
+        "workbook": "data_pivot_reshape.xlsx",
+        "max_steps": 60,
+        "category": "range_transformation",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "range_transformation_01.json", {
+        "scenario_id": "range_transformation_01",
+        "checks": checks,
+        "target_regions": [{"sheet": "Pivot_Output", "range": f"B2:{get_column_letter(col)}{len(sorted_emps)+1}"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 10: conditional_aggregation_01 — Sales Commission
+# ══════════════════════════════════════════════════════════════════════
+def gen_conditional_aggregation_01():
+    wb = openpyxl.Workbook()
+    # -- Sales data --
+    ws_sales = wb.active
+    ws_sales.title = "Sales"
+    for c, h in enumerate(["Salesperson", "Region", "Q1", "Q2", "Q3", "Q4", "Annual Total"], 1):
+        ws_sales.cell(row=1, column=c, value=h).font = _bold_font()
+    salespeople = NAMES[:15]
+    sales_data = {}
+    for i, sp in enumerate(salespeople):
+        r = i + 2
+        ws_sales.cell(row=r, column=1, value=sp)
+        ws_sales.cell(row=r, column=2, value=REGIONS[i % len(REGIONS)])
+        quarterly = [random.randint(20000, 150000) for _ in range(4)]
+        for q in range(4):
+            ws_sales.cell(row=r, column=q + 3, value=quarterly[q])
+        ws_sales.cell(row=r, column=7, value=f"=SUM(C{r}:F{r})")
+        sales_data[sp] = sum(quarterly)
+    # -- Commission Rules (tiered, complex) --
+    ws_rules = wb.create_sheet("Commission_Rules")
+    ws_rules.cell(row=1, column=1, value="Commission Tier Structure").font = Font(bold=True, size=12)
+    ws_rules.cell(row=3, column=1, value="Tier").font = _bold_font()
+    ws_rules.cell(row=3, column=2, value="Min Annual Sales").font = _bold_font()
+    ws_rules.cell(row=3, column=3, value="Max Annual Sales").font = _bold_font()
+    ws_rules.cell(row=3, column=4, value="Commission Rate").font = _bold_font()
+    tiers = [
+        ("Bronze", 0, 200000, 0.03),
+        ("Silver", 200001, 400000, 0.05),
+        ("Gold", 400001, 600000, 0.08),
+        ("Platinum", 600001, 99999999, 0.12),
+    ]
+    for i, (tier, mn, mx, rate) in enumerate(tiers):
+        r = i + 4
+        ws_rules.cell(row=r, column=1, value=tier)
+        ws_rules.cell(row=r, column=2, value=mn)
+        ws_rules.cell(row=r, column=3, value=mx)
+        ws_rules.cell(row=r, column=4, value=rate)
+    ws_rules.cell(row=9, column=1, value="IMPORTANT: Commission is calculated on the FULL annual total,")
+    ws_rules.cell(row=10, column=1, value="not just the amount within each tier. Apply the single rate for the tier.")
+    ws_rules.cell(row=11, column=1, value="Regional bonus: West region gets +2% on top of tier rate.")
+    # -- Commissions (target) --
+    ws_comm = wb.create_sheet("Commissions")
+    for c, h in enumerate(["Salesperson", "Region", "Annual Sales", "Tier", "Base Rate", "Regional Bonus", "Total Rate", "Commission Amount"], 1):
+        cell = ws_comm.cell(row=1, column=c, value=h)
+        cell.font = _bold_font()
+    for i, sp in enumerate(salespeople):
+        r = i + 2
+        ws_comm.cell(row=r, column=1, value=sp)
+        for c in range(2, 9):
+            ws_comm.cell(row=r, column=c).fill = _yellow_fill()
+    wb.save(TEMPLATES_DIR / "sales_commission.xlsx")
+    # Calculate expected values
+    checks = []
+    for i, sp in enumerate(salespeople):
+        total = sales_data[sp]
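+        for tier_name, mn, mx, rate in tiers:
+            if mn <= total <= mx:
+                base_rate = rate
+                break
+        else:
+            # Fail loudly rather than silently reuse the previous salesperson's rate
+            raise ValueError(f"No commission tier covers annual total {total} for {sp}")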
+        region = REGIONS[i % len(REGIONS)]
+        regional_bonus = 0.02 if region == "West" else 0
+        total_rate = base_rate + regional_bonus
+        commission = round(total * total_rate, 2)
+        r = i + 2
+        checks.append({
+            "sheet": "Commissions", "cell": f"H{r}",
+            "expected_value_range": [commission * 0.99, commission * 1.01],
+        })
+    checks.append({"sheet": "Commissions", "range": "B2:H16", "check": "no_blanks"})
+    _write_json(SCENARIOS_DIR / "conditional_aggregation_01.json", {
+        "id": "conditional_aggregation_01",
+        "description": "Calculate tiered sales commissions for 15 salespeople. Commission rates depend on annual total tier. West region gets a +2% bonus on top of the base tier rate.",
+        "instructions": "Fill the Commissions sheet for each salesperson. Look up their Annual Sales from the Sales sheet, determine their tier from Commission_Rules, and calculate the commission. IMPORTANT: Read the Commission_Rules sheet carefully — the commission rate is applied to the FULL annual total (not marginal). The West region gets an additional +2% regional bonus. Fill all columns: Region, Annual Sales, Tier, Base Rate, Regional Bonus, Total Rate, Commission Amount.",
+        "workbook": "sales_commission.xlsx",
+        "max_steps": 55,
+        "category": "conditional_aggregation",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "conditional_aggregation_01.json", {
+        "scenario_id": "conditional_aggregation_01",
+        "checks": checks,
+        "target_regions": [{"sheet": "Commissions", "range": "B2:H16"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 11: conditional_aggregation_02 — Budget Allocation
+# ══════════════════════════════════════════════════════════════════════
+def gen_conditional_aggregation_02():
+    wb = openpyxl.Workbook()
+    # -- Requests --
+    ws_req = wb.active
+    ws_req.title = "Requests"
+    for c, h in enumerate(["Request ID", "Department", "Priority", "Requested Amount", "Justification"], 1):
+        ws_req.cell(row=1, column=c, value=h).font = _bold_font()
+    priorities = ["Critical", "High", "Medium", "Low"]
+    departments = ["Engineering", "Marketing", "Sales", "Operations", "HR"]
+    requests_data = []
+    for i in range(20):
+        r = i + 2
+        req_id = f"REQ-{i+1:03d}"
+        dept = departments[i % len(departments)]
+        pri = priorities[i % len(priorities)]
+        amt = random.randint(5, 100) * 1000
+        ws_req.cell(row=r, column=1, value=req_id)
+        ws_req.cell(row=r, column=2, value=dept)
+        ws_req.cell(row=r, column=3, value=pri)
+        ws_req.cell(row=r, column=4, value=amt)
+        ws_req.cell(row=r, column=5, value=f"Budget for {dept.lower()} initiative {i+1}")
+        requests_data.append((req_id, dept, pri, amt))
+    # -- Budget Pool --
+    ws_pool = wb.create_sheet("Budget_Pool")
+    ws_pool.cell(row=1, column=1, value="Total Available Budget").font = _bold_font()
+    total_budget = 800000
+    ws_pool.cell(row=1, column=2, value=total_budget)
+    ws_pool.cell(row=3, column=1, value="Allocation Rules:").font = _bold_font()
+    ws_pool.cell(row=4, column=1, value="1. Critical requests get 100% of requested amount (if budget allows).")
+    ws_pool.cell(row=5, column=1, value="2. High requests get 80% of requested amount.")
+    ws_pool.cell(row=6, column=1, value="3. Medium requests get 50% of requested amount.")
+    ws_pool.cell(row=7, column=1, value="4. Low requests get 25% of requested amount.")
+    ws_pool.cell(row=8, column=1, value="5. Process in priority order (Critical first, then High, etc.).")
+    ws_pool.cell(row=9, column=1, value="6. If remaining budget is less than the allocation, give remaining budget.")
+    ws_pool.cell(row=10, column=1, value="7. Once budget is exhausted, remaining requests get $0.")
+    # -- Output --
+    ws_out = wb.create_sheet("Output")
+    for c, h in enumerate(["Request ID", "Department", "Priority", "Requested", "Allocation %", "Allocated Amount", "Remaining Budget"], 1):
+        ws_out.cell(row=1, column=c, value=h).font = _bold_font()
+    for i in range(20):
+        r = i + 2
+        ws_out.cell(row=r, column=1, value=requests_data[i][0])
+        for c in range(2, 8):
+            ws_out.cell(row=r, column=c).fill = _yellow_fill()
+    wb.save(TEMPLATES_DIR / "budget_allocation.xlsx")
+    # Calculate expected allocations
+    priority_rates = {"Critical": 1.0, "High": 0.8, "Medium": 0.5, "Low": 0.25}
+    # sorted() is stable, so requests with equal priority keep their order of appearance
+    sorted_requests = sorted(requests_data, key=lambda x: priorities.index(x[2]))
+    remaining = total_budget
+    allocations = {}
+    for req_id, dept, pri, amt in sorted_requests:
+        rate = priority_rates[pri]
+        intended = round(amt * rate)
+        actual = min(intended, remaining)
+        remaining -= actual
+        allocations[req_id] = actual
+    checks = []
+    for i, (req_id, dept, pri, amt) in enumerate(requests_data):
+        r = i + 2
+        expected = allocations[req_id]
+        checks.append({
+            "sheet": "Output", "cell": f"F{r}",
+            "expected_value_range": [expected * 0.99 - 1, expected * 1.01 + 1],
+        })
+    checks.append({"sheet": "Output", "range": "B2:G21", "check": "no_blanks"})
+    _write_json(SCENARIOS_DIR / "conditional_aggregation_02.json", {
+        "id": "conditional_aggregation_02",
+        "description": "Allocate a fixed budget across 20 requests with priority-based allocation rates. Process in priority order; when budget runs out, remaining requests get $0.",
+        "instructions": "Fill the Output sheet by allocating the budget from Budget_Pool across all 20 requests. Read the allocation rules carefully from the Budget_Pool sheet. Process requests in priority order (Critical first, then High, Medium, Low). Within the same priority, process in order of appearance. Each priority level gets a different % of requested amount. Track remaining budget — when it's exhausted, remaining requests get $0.",
+        "workbook": "budget_allocation.xlsx",
+        "max_steps": 55,
+        "category": "conditional_aggregation",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "conditional_aggregation_02.json", {
+        "scenario_id": "conditional_aggregation_02",
+        "checks": checks,
+        "target_regions": [{"sheet": "Output", "range": "B2:G21"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Scenario 12: buggy_template_fix_01 — Quarterly Report Debug
+# ══════════════════════════════════════════════════════════════════════
+def gen_buggy_template_fix_01():
+    wb = openpyxl.Workbook()
+    quarters = ["Q1", "Q2", "Q3", "Q4"]
+    metrics = ["Revenue", "COGS", "Gross Profit", "OpEx", "EBITDA", "Depreciation", "Net Income"]
+    # Generate quarterly data
+    q_data = {}
+    for q_name in quarters:
+        ws = wb.active if q_name == "Q1" else wb.create_sheet(q_name)
+        if q_name == "Q1":
+            ws.title = "Q1"
+        ws.cell(row=1, column=1, value=f"{q_name} Financial Data").font = Font(bold=True, size=12)
+        ws.cell(row=3, column=1, value="Metric").font = _bold_font()
+        ws.cell(row=3, column=2, value="Amount").font = _bold_font()
+        revenue = random.randint(800, 1200) * 1000
+        cogs = round(revenue * random.uniform(0.35, 0.45))
+        gross = revenue - cogs
+        opex = round(revenue * random.uniform(0.15, 0.25))
+        ebitda = gross - opex
+        depreciation = round(revenue * 0.05)
+        net_income = ebitda - depreciation
+        values = [revenue, cogs, gross, opex, ebitda, depreciation, net_income]
+        q_data[q_name] = dict(zip(metrics, values))
+        for i, (metric, val) in enumerate(zip(metrics, values)):
+            ws.cell(row=i + 4, column=1, value=metric)
+            if metric in ("Gross Profit", "EBITDA", "Net Income"):
+                # Derived metrics are written as formulas over the rows above
+                if metric == "Gross Profit":
+                    ws.cell(row=i + 4, column=2, value="=B4-B5")
+                elif metric == "EBITDA":
+                    ws.cell(row=i + 4, column=2, value="=B6-B7")
+                elif metric == "Net Income":
+                    ws.cell(row=i + 4, column=2, value="=B8-B9")
+            else:
+                ws.cell(row=i + 4, column=2, value=val)
+    # -- Annual Summary (with BUGS) --
+    ws_annual = wb.create_sheet("Annual_Summary")
+    ws_annual.cell(row=1, column=1, value="Annual Financial Summary").font = Font(bold=True, size=14)
+    ws_annual.cell(row=3, column=1, value="Metric").font = _bold_font()
+    for i, q in enumerate(quarters):
+        ws_annual.cell(row=3, column=i + 2, value=q).font = _bold_font()
+    ws_annual.cell(row=3, column=6, value="Annual Total").font = _bold_font()
+    for mi, metric in enumerate(metrics):
+        r = mi + 4
+        ws_annual.cell(row=r, column=1, value=metric)
+        for qi, q in enumerate(quarters):
+            col = qi + 2
+            data_row = mi + 4
+            # BUG 1: Q3 references Q2 sheet instead of Q3
+            if qi == 2:
+                ws_annual.cell(row=r, column=col, value=f"=Q2!B{data_row}")
+            # BUG 2: Q4 has off-by-one (references row+1)
+            elif qi == 3:
+                ws_annual.cell(row=r, column=col, value=f"=Q4!B{data_row + 1}")
+            else:
+                ws_annual.cell(row=r, column=col, value=f"={q}!B{data_row}")
+        # BUG 3: Annual total only sums Q1 and Q2 (missing Q3 and Q4)
+        ws_annual.cell(row=r, column=6, value=f"=B{r}+C{r}")
+    wb.save(TEMPLATES_DIR / "quarterly_report_debug.xlsx")
+    # Build expected formulas
+    checks = []
+    for mi, metric in enumerate(metrics):
+        r = mi + 4
+        data_row = mi + 4
+        # Q3 should reference Q3, not Q2
+        checks.append({
+            "sheet": "Annual_Summary", "cell": f"D{r}",
+            "expected_formula": f"=Q3!B{data_row}",
+        })
+        # Q4 should reference correct row
+        checks.append({
+            "sheet": "Annual_Summary", "cell": f"E{r}",
+            "expected_formula": f"=Q4!B{data_row}",
+        })
+        # Annual total should sum all 4 quarters
+        checks.append({
+            "sheet": "Annual_Summary", "cell": f"F{r}",
+            "expected_formula": f"=B{r}+C{r}+D{r}+E{r}",
+        })
+    _write_json(SCENARIOS_DIR / "buggy_template_fix_01.json", {
+        "id": "buggy_template_fix_01",
+        "description": "Debug a quarterly financial report template. The Annual_Summary sheet has three types of formula bugs: Q3 references Q2 data, Q4 has off-by-one row errors, and Annual Totals only sum 2 of 4 quarters.",
+        "instructions": "The Annual_Summary sheet should show each metric's value for Q1-Q4 (pulled from the individual quarter sheets) and an Annual Total. Three bugs exist: (1) The Q3 column references the Q2 sheet instead of Q3. (2) The Q4 column references the wrong row (off by one). (3) The Annual Total formula only sums Q1+Q2 instead of all four quarters. Fix all formulas in the Annual_Summary sheet. Do NOT modify the individual quarter sheets.",
+        "workbook": "quarterly_report_debug.xlsx",
+        "max_steps": 50,
+        "category": "buggy_template_fix",
+    })
+    _write_json(HIDDEN_TESTS_DIR / "buggy_template_fix_01.json", {
+        "scenario_id": "buggy_template_fix_01",
+        "checks": checks,
+        "target_regions": [{"sheet": "Annual_Summary", "range": "D4:F10"}],
+    })
+# ══════════════════════════════════════════════════════════════════════
+# Main
+# ══════════════════════════════════════════════════════════════════════
+def main():
+    generators = [
+        gen_formula_repair_01,
+        gen_formula_repair_02,
+        gen_cross_sheet_lookup_01,
+        gen_cross_sheet_lookup_02,
+        gen_messy_table_extraction_01,
+        gen_schedule_grid_fill_01,
+        gen_ledger_reconciliation_01,
+        gen_ledger_reconciliation_02,
+        gen_range_transformation_01,
+        gen_conditional_aggregation_01,
+        gen_conditional_aggregation_02,
+        gen_buggy_template_fix_01,
+    ]
+    for gen_fn in generators:
+        name = gen_fn.__name__.replace("gen_", "")
+        print(f"Generating {name}...")
+        gen_fn()
+        print(f"  ✓ {name}")
+    print(f"\nGenerated {len(generators)} scenarios:")
+    print(f"  Templates:    {TEMPLATES_DIR}")
+    print(f"  Scenarios:    {SCENARIOS_DIR}")
+    print(f"  Hidden Tests: {HIDDEN_TESTS_DIR}")
+if __name__ == "__main__":
+    main()
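
The generators above emit hidden-test entries of several kinds, including `expected_value_range` and `expected_formula` for single cells. The grader that consumes them is not part of this excerpt, so the following is only a minimal sketch of how those two cell-level kinds might be evaluated. The name `check_cell` and the plain-dict cell representation are assumptions for illustration, not the environment's actual API.

```python
def check_cell(check: dict, cells: dict) -> bool:
    """Evaluate one cell-level check against a {cell_ref: value} mapping.

    Hypothetical helper: the real grader is not shown in this diff.
    """
    actual = cells.get(check["cell"])
    if "expected_value_range" in check:
        low, high = check["expected_value_range"]
        # Reject non-numeric values (bool is excluded because it subclasses int).
        if isinstance(actual, bool) or not isinstance(actual, (int, float)):
            return False
        return low <= actual <= high
    if "expected_formula" in check:
        # Compare formulas ignoring spaces and letter case (an assumed policy).
        def normalize(text):
            return str(text).replace(" ", "").upper()
        return normalize(actual) == normalize(check["expected_formula"])
    raise ValueError(f"Unsupported check: {check}")
```

Whether the real grader tolerates case or whitespace differences in formulas is unknown; the normalization here is a design choice of the sketch.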

models.py ADDED

	@@ -0,0 +1,86 @@

+"""Data models for the Spreadsheet Environment.
+SpreadsheetAction has explicit Pydantic fields for MCP-style tool calls
+(tool_name, arguments_json) compatible with the OpenEnv web interface.
+"""
+from __future__ import annotations
+import json as _json
+from typing import Any, Union
+from pydantic import ConfigDict, Field, TypeAdapter, model_validator
+from openenv.core.env_server.mcp_types import (
+    CallToolAction,
+    CallToolObservation,
+    ListToolsAction,
+    ListToolsObservation,
+)
+from openenv.core.env_server.types import Action, Observation, State
+_mcp_action_adapter = TypeAdapter(Union[ListToolsAction, CallToolAction])
+_AVAILABLE_TOOLS = (
+    "list_tools, get_session_info, list_scenarios, load_scenario, "
+    "list_sheets, read_range, write_cell, write_range, inspect_formula, "
+    "list_named_targets, validate_partial, submit_workbook, "
+    "get_edit_history, reset_scenario"
+)
+class SpreadsheetAction(Action):
+    """Action with explicit fields for the web UI and MCP compatibility."""
+    model_config = ConfigDict(
+        extra="forbid",
+        validate_assignment=True,
+        arbitrary_types_allowed=True,
+    )
+    tool_name: str = Field(
+        default="list_tools",
+        description=f"MCP tool to invoke. Available: {_AVAILABLE_TOOLS}",
+    )
+    arguments_json: str = Field(
+        default="{}",
+        description=(
+            'Tool arguments as a JSON string. Examples: '
+            '"{}" for no args, '
+            '\'{"scenario_id":"formula_repair_01"}\' for load_scenario, '
+            '\'{"sheet":"Summary","range":"A1:D10"}\' for read_range, '
+            '\'{"sheet":"Summary","cell":"C15","value":"=SUM(A1:A10)"}\' for write_cell'
+        ),
+    )
+    @model_validator(mode="after")
+    def _validate_json(self) -> "SpreadsheetAction":
+        if self.arguments_json.strip():
+            _json.loads(self.arguments_json)
+        return self
+    @classmethod
+    def model_validate(cls, data: Any, **kwargs: Any) -> Action:
+        if isinstance(data, dict) and data.get("type") in ("call_tool", "list_tools"):
+            return _mcp_action_adapter.validate_python(data)
+        return super().model_validate(data, **kwargs)
+    def to_mcp_action(self) -> Action:
+        if self.tool_name == "list_tools":
+            return ListToolsAction()
+        # Match the validator: treat an empty or whitespace-only string as "no arguments"
+        args = _json.loads(self.arguments_json) if self.arguments_json.strip() else {}
+        return CallToolAction(tool_name=self.tool_name, arguments=args)
+SpreadsheetObservation = CallToolObservation
+SpreadsheetState = State
+__all__ = [
+    "SpreadsheetAction",
+    "SpreadsheetObservation",
+    "SpreadsheetState",
+    "CallToolAction",
+    "CallToolObservation",
+    "ListToolsAction",
+    "ListToolsObservation",
+]
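
Because `arguments_json` is a JSON string rather than a dict, callers have to serialize tool arguments before building an action. A minimal sketch of that step, using only the standard library; `build_action_payload` is a hypothetical helper and is not defined in models.py:

```python
import json


def build_action_payload(tool_name: str, **arguments) -> dict:
    """Return a dict shaped like SpreadsheetAction's two fields.

    Hypothetical helper for illustration; it is not part of this commit.
    """
    return {
        "tool_name": tool_name,
        # json.dumps always yields a string that json.loads can parse back.
        "arguments_json": json.dumps(arguments),
    }


payload = build_action_payload(
    "write_cell", sheet="Summary", cell="C15", value="=SUM(A1:A10)"
)
print(payload["arguments_json"])
```

Serializing with `json.dumps` avoids hand-escaping quotes inside formulas, which is the easiest way to trip the model's JSON validator.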

openenv.yaml ADDED

	@@ -0,0 +1,8 @@

+spec_version: 1
+name: spreadsheet
+description: "Spreadsheet — exact workbook manipulation and reasoning over realistic spreadsheet tasks"
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000

openenv_spreadsheet.egg-info/PKG-INFO ADDED

	@@ -0,0 +1,17 @@

+Metadata-Version: 2.4
+Name: openenv-spreadsheet
+Version: 0.1.0
+Summary: Spreadsheet gym — exact workbook manipulation and reasoning over realistic spreadsheet tasks
+Requires-Python: >=3.11
+Requires-Dist: openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@v0.2.1
+Requires-Dist: fastapi>=0.115.0
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: uvicorn[standard]>=0.24.0
+Requires-Dist: fastmcp>=0.1.0
+Requires-Dist: httpx>=0.25.0
+Requires-Dist: openpyxl>=3.1.0
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: formulas>=1.2.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_spreadsheet.egg-info/SOURCES.txt ADDED

	@@ -0,0 +1,24 @@

+README.md
+__init__.py
+client.py
+generate_scenarios.py
+models.py
+openenv.yaml
+pyproject.toml
+./__init__.py
+./client.py
+./generate_scenarios.py
+./models.py
+./openenv.yaml
+openenv_spreadsheet.egg-info/PKG-INFO
+openenv_spreadsheet.egg-info/SOURCES.txt
+openenv_spreadsheet.egg-info/dependency_links.txt
+openenv_spreadsheet.egg-info/entry_points.txt
+openenv_spreadsheet.egg-info/requires.txt
+openenv_spreadsheet.egg-info/top_level.txt
+server/__init__.py
+server/app.py
+server/formula_utils.py
+server/scenario_loader.py
+server/spreadsheet_environment.py
+server/workbook_engine.py

openenv_spreadsheet.egg-info/dependency_links.txt ADDED

	@@ -0,0 +1 @@


+

openenv_spreadsheet.egg-info/entry_points.txt ADDED

	@@ -0,0 +1,2 @@


+[console_scripts]
+server = spreadsheet.server.app:main

openenv_spreadsheet.egg-info/requires.txt ADDED

	@@ -0,0 +1,13 @@

+openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@v0.2.1
+fastapi>=0.115.0
+pydantic>=2.0.0
+uvicorn[standard]>=0.24.0
+fastmcp>=0.1.0
+httpx>=0.25.0
+openpyxl>=3.1.0
+pandas>=2.0.0
+formulas>=1.2.0
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

openenv_spreadsheet.egg-info/top_level.txt ADDED

	@@ -0,0 +1 @@


+spreadsheet

pyproject.toml ADDED

	@@ -0,0 +1,37 @@

+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-spreadsheet"
+version = "0.1.0"
+description = "Spreadsheet gym — exact workbook manipulation and reasoning over realistic spreadsheet tasks"
+requires-python = ">=3.11"
+dependencies = [
+    "openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@v0.2.1",
+    "fastapi>=0.115.0",
+    "pydantic>=2.0.0",
+    "uvicorn[standard]>=0.24.0",
+    "fastmcp>=0.1.0",
+    "httpx>=0.25.0",
+    "openpyxl>=3.1.0",
+    "pandas>=2.0.0",
+    "formulas>=1.2.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+server = "spreadsheet.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["spreadsheet", "spreadsheet.server"]
+package-dir = { "spreadsheet" = ".", "spreadsheet.server" = "server" }
+[tool.setuptools.package-data]
+spreadsheet = ["openenv.yaml"]

scenarios/.gitkeep ADDED

File without changes

scenarios/buggy_template_fix_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "buggy_template_fix_01",
+  "description": "Debug a quarterly financial report template. The Annual_Summary sheet has three types of formula bugs: Q3 references Q2 data, Q4 has off-by-one row errors, and Annual Totals only sum 2 of 4 quarters.",
+  "instructions": "The Annual_Summary sheet should show each metric's value for Q1-Q4 (pulled from the individual quarter sheets) and an Annual Total. Three bugs exist: (1) The Q3 column references the Q2 sheet instead of Q3. (2) The Q4 column references the wrong row (off by one). (3) The Annual Total formula only sums Q1+Q2 instead of all four quarters. Fix all formulas in the Annual_Summary sheet. Do NOT modify the individual quarter sheets.",
+  "workbook": "quarterly_report_debug.xlsx",
+  "max_steps": 50,
+  "category": "buggy_template_fix"
+}
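
The three fixes described in this scenario follow a regular pattern across rows 4 to 10 of Annual_Summary, which matches the `expected_formula` checks written by the generator in this commit. A small sketch that enumerates the corrected formulas; the function name `corrected_formulas` is an assumption for illustration:

```python
def corrected_formulas(first_row: int = 4, n_metrics: int = 7) -> dict:
    """Map each Annual_Summary cell in columns D:F to its corrected formula."""
    fixes = {}
    for r in range(first_row, first_row + n_metrics):
        fixes[f"D{r}"] = f"=Q3!B{r}"  # Q3 column must read the Q3 sheet
        fixes[f"E{r}"] = f"=Q4!B{r}"  # Q4 column must read the same row, not row + 1
        fixes[f"F{r}"] = f"=B{r}+C{r}+D{r}+E{r}"  # total covers all four quarters
    return fixes


fixes = corrected_formulas()
print(len(fixes), fixes["D4"], fixes["F10"])
```

Each entry could then be sent through a `write_cell` call, one cell at a time, which stays well inside the scenario's 50-step limit only if writes are batched or the range tool is used instead.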

scenarios/conditional_aggregation_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "conditional_aggregation_01",
+  "description": "Calculate tiered sales commissions for 15 salespeople. Commission rates depend on annual total tier. West region gets a +2% bonus on top of the base tier rate.",
+  "instructions": "Fill the Commissions sheet for each salesperson. Look up their Annual Sales from the Sales sheet, determine their tier from Commission_Rules, and calculate the commission. IMPORTANT: Read the Commission_Rules sheet carefully \u2014 the commission rate is applied to the FULL annual total (not marginal). The West region gets an additional +2% regional bonus. Fill all columns: Region, Annual Sales, Tier, Base Rate, Regional Bonus, Total Rate, Commission Amount.",
+  "workbook": "sales_commission.xlsx",
+  "max_steps": 55,
+  "category": "conditional_aggregation"
+}
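
The rule in this scenario is a flat tier rate, not a marginal one: a single rate is chosen from the annual total and applied to the whole amount, with two extra percentage points for the West region. A minimal sketch of that arithmetic, with the tier table copied from the generator in this commit; the function name `commission` is an assumption:

```python
# Tier table as (name, min, max, rate), copied from generate_scenarios.py.
TIERS = [
    ("Bronze", 0, 200000, 0.03),
    ("Silver", 200001, 400000, 0.05),
    ("Gold", 400001, 600000, 0.08),
    ("Platinum", 600001, 99999999, 0.12),
]


def commission(annual_sales: int, region: str) -> float:
    """Flat (non-marginal) commission: one rate applied to the full total."""
    for _name, low, high, rate in TIERS:
        if low <= annual_sales <= high:
            bonus = 0.02 if region == "West" else 0.0
            return round(annual_sales * (rate + bonus), 2)
    raise ValueError(f"No tier covers {annual_sales}")
```

For example, a Silver-tier total of 300000 is charged at 5% of the full 300000, and at 7% if the salesperson is in the West region.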

scenarios/conditional_aggregation_02.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "conditional_aggregation_02",
+  "description": "Allocate a fixed budget across 20 requests with priority-based allocation rates. Process in priority order; when budget runs out, remaining requests get $0.",
+  "instructions": "Fill the Output sheet by allocating the budget from Budget_Pool across all 20 requests. Read the allocation rules carefully from the Budget_Pool sheet. Process requests in priority order (Critical first, then High, Medium, Low). Within the same priority, process in order of appearance. Each priority level gets a different % of requested amount. Track remaining budget \u2014 when it's exhausted, remaining requests get $0.",
+  "workbook": "budget_allocation.xlsx",
+  "max_steps": 55,
+  "category": "conditional_aggregation"
+}
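
The allocation described here is a greedy pass over the requests in priority order, capped by whatever budget is left. The sketch below mirrors the expected-value computation in the generator from this commit; the function name `allocate` and the three-field tuple layout are assumptions made for brevity:

```python
RATES = {"Critical": 1.0, "High": 0.8, "Medium": 0.5, "Low": 0.25}
ORDER = list(RATES)  # dicts preserve insertion order: Critical, High, Medium, Low


def allocate(requests: list, budget: int) -> dict:
    """Greedy allocation; requests are (request_id, priority, amount) tuples."""
    remaining = budget
    result = {}
    # sorted() is stable, so ties keep their order of appearance.
    for req_id, priority, amount in sorted(requests, key=lambda r: ORDER.index(r[1])):
        granted = min(round(amount * RATES[priority]), remaining)
        remaining -= granted
        result[req_id] = granted
    return result


demo = allocate(
    [("REQ-001", "Low", 40000), ("REQ-002", "Critical", 50000), ("REQ-003", "High", 50000)],
    budget=80000,
)
print(demo)
```

In the demo, the Critical request takes 50000, the High request is entitled to 40000 but receives only the remaining 30000, and the Low request receives 0.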

scenarios/cross_sheet_lookup_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "cross_sheet_lookup_01",
+  "description": "Aggregate product revenue by region and category across two quarterly sales sheets. Some product codes have typos. The Summary sheet must be filled with correct totals.",
+  "instructions": "Fill the Summary sheet with revenue totals broken down by Region (rows) and Product Category (columns: Hardware, Services, Software). Data is in Sales_Q1 and Sales_Q2. Use the Products sheet to map product codes to categories. WARNING: Some product codes in the sales sheets have typos (missing dashes or lowercase). You must account for these when aggregating. The Total column should sum across categories for each region. Grand Total row should sum each column.",
+  "workbook": "product_revenue_by_region.xlsx",
+  "max_steps": 60,
+  "category": "cross_sheet_lookup"
+}
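
The typos named in the instructions (missing dashes, lowercase) can be undone by normalizing every code before the category lookup. The actual product-code format is not visible in this excerpt, so the sketch below assumes codes shaped like letters, a dash, then digits (for example `HW-101`); both that shape and the name `normalize_code` are hypothetical:

```python
import re


def normalize_code(raw: str) -> str:
    """Uppercase a code and restore a single dash between letters and digits.

    Assumes codes look like 'HW-101'; the real format is not shown here.
    """
    compact = raw.strip().upper().replace("-", "")
    match = re.fullmatch(r"([A-Z]+)(\d+)", compact)
    if match is None:
        return raw.strip().upper()  # leave unrecognized shapes for manual review
    return f"{match.group(1)}-{match.group(2)}"
```

Normalizing both the sales rows and the Products sheet keys with the same function keeps the lookup symmetric.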

scenarios/cross_sheet_lookup_02.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "cross_sheet_lookup_02",
+  "description": "Calculate employee bonuses by cross-referencing Employees, Bonus_Tiers (non-standard layout at column F), and Performance sheets. Fill the Payroll sheet.",
+  "instructions": "Fill the Payroll sheet for all 25 employees. For each employee: (1) Look up their Name, Level, and Base Salary from the Employees sheet. (2) Look up their average performance score from the Performance sheet. (3) Find the bonus rate from the Bonus_Tiers sheet (NOTE: the tier table is in columns F-I, not A-D). (4) If the employee's avg score meets the minimum performance threshold for their tier, apply the bonus rate; otherwise bonus is 0. (5) Bonus Amount = Base Salary \u00d7 Bonus Rate. (6) Total Comp = Base Salary + Bonus Amount.",
+  "workbook": "employee_bonus_calculation.xlsx",
+  "max_steps": 60,
+  "category": "cross_sheet_lookup"
+}

scenarios/formula_repair_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "formula_repair_01",
+  "description": "Fix broken formulas in a multi-department budget workbook. Summary sheet has wrong ranges and references to a deleted sheet. Marketing total comp formulas reference a non-existent OldBudget sheet.",
+  "instructions": "The Summary sheet has broken formulas. Engineering total compensation references wrong cell ranges. Marketing total compensation references a deleted 'OldBudget' sheet. Fix all broken formulas so Summary correctly aggregates total compensation from Engineering and Marketing sheets. Also fix the Marketing sheet's Total Comp column to use the correct formula (Base Salary \u00d7 (1 + Bonus %)). Check the HR Policies sheet for the correct bonus calculation method. There is a hidden Metadata sheet with hints.",
+  "workbook": "multi_department_budget.xlsx",
+  "max_steps": 50,
+  "category": "formula_repair"
+}

scenarios/formula_repair_02.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "formula_repair_02",
+  "description": "Fix cascading formula errors in a 5-year financial projection. Revenue growth, tax rates, and discount factors reference wrong cells or use hardcoded values instead of the Assumptions sheet.",
+  "instructions": "This workbook has a 5-year financial projection with three sheets: Assumptions, Revenue, and DCF. Multiple formulas contain errors: (1) Revenue years 4-5 use hardcoded 5% growth instead of the Assumptions growth rate. (2) Tax calculations use 25% instead of the Assumptions tax rate (21%). (3) The Assumptions sheet has the correct values \u2014 all formulas should reference it. Fix all broken formulas in Revenue and DCF sheets to properly reference Assumptions.",
+  "workbook": "cascading_formula_errors.xlsx",
+  "max_steps": 50,
+  "category": "formula_repair"
+}

scenarios/ledger_reconciliation_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "ledger_reconciliation_01",
+  "description": "Reconcile a bank statement against an internal ledger. Find mismatches, missing entries, and amount discrepancies. Fill the Reconciled sheet.",
+  "instructions": "Compare Bank_Statement and Internal_Ledger to produce a reconciliation report in the Reconciled sheet. For each transaction: match by date and description. Record the Bank Amount, Ledger Amount, Difference (Bank - Ledger), and Status (Matched/Mismatch/Bank Only/Ledger Only). Include ALL transactions from both sources. Sort by date.",
+  "workbook": "bank_reconciliation.xlsx",
+  "max_steps": 60,
+  "category": "ledger_reconciliation"
+}
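
The matching rule in the instructions amounts to a full outer join on (date, description). A minimal sketch follows, assuming each key appears at most once per source and dates are already ISO strings so that sorting keys sorts by date. The function name `reconcile` and the use of `None` for the missing side are assumptions, since the scenario does not say how one-sided rows should fill the Difference column:

```python
def reconcile(bank: dict, ledger: dict) -> list:
    """bank/ledger map (iso_date, description) -> amount.

    Returns rows of (date, description, bank_amt, ledger_amt, difference, status),
    sorted by date. Amounts missing on one side are reported as None.
    """
    rows = []
    for key in sorted(set(bank) | set(ledger)):
        bank_amt, ledger_amt = bank.get(key), ledger.get(key)
        if ledger_amt is None:
            diff, status = None, "Bank Only"
        elif bank_amt is None:
            diff, status = None, "Ledger Only"
        else:
            diff = round(bank_amt - ledger_amt, 2)  # Difference = Bank - Ledger
            status = "Matched" if diff == 0 else "Mismatch"
        rows.append((*key, bank_amt, ledger_amt, diff, status))
    return rows
```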

scenarios/ledger_reconciliation_02.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "ledger_reconciliation_02",
+  "description": "Reconcile USD and EUR transaction sheets into a unified summary. EUR dates are in DD-MM-YYYY format, USD dates in MM/DD/YYYY. Convert EUR to USD using the Exchange_Rates sheet.",
+  "instructions": "The workbook has USD and EUR transaction sheets with different date formats. Convert all EUR transactions to USD using the monthly exchange rate from the Exchange_Rates sheet (match each transaction's month to the correct rate). Fill the Summary sheet with: Total USD Transactions, Total EUR Transactions converted to USD, Grand Total, and transaction counts. Dates in the transaction sheets use different formats \u2014 be careful when determining which month each EUR transaction falls in.",
+  "workbook": "multi_currency_reconciliation.xlsx",
+  "max_steps": 55,
+  "category": "ledger_reconciliation"
+}
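
The trap in this scenario is reading a DD-MM-YYYY date as month-first and picking the wrong monthly rate. A short sketch of the conversion for one EUR transaction; the function name `eur_to_usd` and the `'YYYY-MM'` keying of the rate table are assumptions, because the layout of the Exchange_Rates sheet is not shown in this excerpt:

```python
from datetime import datetime


def eur_to_usd(date_text: str, amount_eur: float, monthly_rates: dict) -> float:
    """Convert one EUR transaction using the rate for its month.

    date_text is DD-MM-YYYY (per the scenario); monthly_rates maps 'YYYY-MM'
    to a EUR->USD rate, which is an assumed layout for Exchange_Rates.
    """
    when = datetime.strptime(date_text, "%d-%m-%Y")
    return round(amount_eur * monthly_rates[when.strftime("%Y-%m")], 2)
```

With this parsing, `03-02-2024` is 3 February 2024, so the February rate applies rather than the March one.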

scenarios/messy_table_extraction_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "messy_table_extraction_01",
+  "description": "Extract and clean invoice data from a messy raw export with mixed date formats, section headers mixed in with data rows, and blank separator rows. All dates must be normalized to ISO format.",
+  "instructions": "The Raw_Invoices sheet has messy data exported from a legacy system: title rows at top, section header rows (like '--- Q1 2024 ---') mixed in with data, blank separator rows, and inconsistent date formats (MM/DD/YYYY, DD-MM-YYYY, and ISO). Extract all actual invoice rows into the Processed sheet with: (1) Invoice #, (2) Date in ISO format (YYYY-MM-DD), (3) Vendor, (4) Amount, (5) Status. Skip section headers and blank rows. Dates must all be converted to ISO format.",
+  "workbook": "vendor_invoice_processing.xlsx",
+  "max_steps": 60,
+  "category": "messy_table_extraction"
+}
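
The three date layouts named in the instructions can be told apart by their separators alone, which avoids guessing between day-first and month-first. A minimal sketch, assuming the raw cells hold text rather than native date values; the function name `to_iso` is an assumption:

```python
from datetime import datetime


def to_iso(text: str) -> str:
    """Normalize MM/DD/YYYY, DD-MM-YYYY, or YYYY-MM-DD text to YYYY-MM-DD."""
    text = text.strip()
    if "/" in text:
        fmt = "%m/%d/%Y"  # slashes: month first
    elif len(text.split("-")[0]) == 4:
        fmt = "%Y-%m-%d"  # dashes with a leading 4-digit year: already ISO
    else:
        fmt = "%d-%m-%Y"  # dashes otherwise: day first
    return datetime.strptime(text, fmt).date().isoformat()
```

`strptime` raises `ValueError` on anything that does not fit the chosen layout, which is a convenient way to catch section headers that slipped past the row filter.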

scenarios/range_transformation_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "range_transformation_01",
+  "description": "Pivot long-format employee metrics data into a wide-format table. Each employee gets one row with Sales, Returns, and Net Revenue for each of 6 months, plus a total.",
+  "instructions": "The Raw_Data sheet has employee performance metrics in long format (one row per employee-month-metric combination). Pivot this into the Pivot_Output sheet: one row per employee (sorted alphabetically), with columns for each month's Sales, Returns, and Net Revenue. Add a Total Net Revenue column at the end. The headers are pre-filled; fill the data cells.",
+  "workbook": "data_pivot_reshape.xlsx",
+  "max_steps": 60,
+  "category": "range_transformation"
+}
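
The long-to-wide reshape can be sketched without any spreadsheet library: index the long rows by (employee, month, metric), then emit one row per employee. The function name `pivot` and the per-month column order (Sales, Returns, Net Revenue) are assumptions; the real order should be read from the pre-filled headers in Pivot_Output. Missing combinations default to 0, as in the generator's expected-total computation.

```python
from collections import defaultdict


def pivot(long_rows: list, months: list, metrics=("Sales", "Returns", "Net Revenue")) -> list:
    """long_rows: list of (employee, month, metric, value) tuples.

    Returns one wide row per employee, sorted by name:
    [employee, <metric values month by month>, total net revenue].
    """
    lookup = defaultdict(float)
    for emp, month, metric, value in long_rows:
        lookup[(emp, month, metric)] += value
    wide = []
    for emp in sorted({row[0] for row in long_rows}):
        cells = [lookup[(emp, m, k)] for m in months for k in metrics]
        total = sum(lookup[(emp, m, "Net Revenue")] for m in months)
        wide.append([emp, *cells, total])
    return wide
```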

scenarios/schedule_grid_fill_01.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "id": "schedule_grid_fill_01",
+  "description": "Fill an employee schedule grid for 12 employees across 7 days, respecting prose constraints on max days, shift transitions, minimum coverage, and availability exceptions.",
+  "instructions": "Fill the Output sheet with shift codes (M=Morning, A=Afternoon, N=Night, X=Off) for each employee and day. You must satisfy ALL constraints from the Constraints sheet and respect the availability exceptions from the Availability sheet. Unavailable employees must have X for that day. Check the Shift_Codes sheet for valid codes.",
+  "workbook": "employee_schedule_grid.xlsx",
+  "max_steps": 70,
+  "category": "schedule_grid_fill"
+}
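Two of the rules stated in these instructions (valid shift codes, and X on unavailable days) can be checked mechanically before submitting. The sketch below assumes a grid shaped as `{employee: [code per day]}` and an availability set of `(employee, day_index)` pairs; both shapes and the name `check_grid` are hypothetical. The rules on the Constraints sheet (max days, shift transitions, minimum coverage) are not reproduced here because their exact wording is not in this diff.

```python
VALID_CODES = {"M", "A", "N", "X"}  # codes listed in the scenario instructions


def check_grid(grid, unavailable):
    """Return a list of (employee, day_index, problem) tuples for rule violations.

    grid: dict mapping employee name to a list of shift codes, one per day.
    unavailable: set of (employee, day_index) pairs that must be 'X'.
    """
    problems = []
    for employee, codes in grid.items():
        for day, code in enumerate(codes):
            if code not in VALID_CODES:
                problems.append((employee, day, "invalid code"))
            elif (employee, day) in unavailable and code != "X":
                problems.append((employee, day, "scheduled while unavailable"))
    return problems
```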

server/__init__.py ADDED

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Spreadsheet environment server components."""
+from .spreadsheet_environment import SpreadsheetEnvironment
+__all__ = ["SpreadsheetEnvironment"]

server/app.py ADDED

	@@ -0,0 +1,41 @@

+"""FastAPI application for the Spreadsheet Environment."""
+from __future__ import annotations
+import os
+import sys
+from pathlib import Path
+try:
+    from openenv.core.env_server.http_server import create_app
+except ImportError as e:
+    raise ImportError(
+        "openenv is required. Install with: uv sync"
+    ) from e
+try:
+    from spreadsheet.models import SpreadsheetAction, SpreadsheetObservation
+    from spreadsheet.server.spreadsheet_environment import SpreadsheetEnvironment
+except ImportError:
+    sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+    from models import SpreadsheetAction, SpreadsheetObservation
+    from server.spreadsheet_environment import SpreadsheetEnvironment
+MAX_CONCURRENT_ENVS = int(os.getenv("MAX_CONCURRENT_ENVS", "8"))
+app = create_app(
+    SpreadsheetEnvironment,
+    SpreadsheetAction,
+    SpreadsheetObservation,
+    env_name="spreadsheet",
+    max_concurrent_envs=MAX_CONCURRENT_ENVS,
+)
+def main(host: str = "0.0.0.0", port: int = 8000):
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+if __name__ == "__main__":
+    main()

server/formula_utils.py ADDED

	@@ -0,0 +1,39 @@

+"""Formula utilities — Excel-compatible formula evaluation using the formulas library."""
+from __future__ import annotations
+from typing import Any, Optional
+import openpyxl
+def evaluate_formula(wb: openpyxl.Workbook, sheet_name: str, cell_ref: str) -> Optional[Any]:
+    """Evaluate an Excel formula in-memory using the formulas library.
+    Falls back to openpyxl data_only reload if formulas library fails.
+    Returns the computed value or None on failure.
+    """
+    try:
+        import formulas
+    except ImportError:
+        return _fallback_evaluate(wb, sheet_name, cell_ref)
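+    try:
+        # Caveat: openpyxl sets Workbook.path to the internal part name
+        # "/xl/workbook.xml", not the file the workbook was loaded from, so
+        # this load is likely to fail and fall through to the fallback.
+        xl_model = formulas.ExcelModel().loads(wb.path).finish()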
+        solution = xl_model.calculate()
+        key = f"'{sheet_name}'!{cell_ref.upper()}"
+        return solution.get(key)
+    except Exception:
+        return _fallback_evaluate(wb, sheet_name, cell_ref)
+def _fallback_evaluate(wb: openpyxl.Workbook, sheet_name: str, cell_ref: str) -> Optional[Any]:
+    """Fallback: reload workbook with data_only=True to get cached values."""
+    if not wb.path:
+        return None
+    try:
+        wb_data = openpyxl.load_workbook(wb.path, data_only=True)
+        ws = wb_data[sheet_name]
+        return ws[cell_ref].value
+    except Exception:
+        return None

server/scenario_loader.py ADDED

	@@ -0,0 +1,62 @@

+"""Scenario loader — load scenario definitions from JSON files."""
+from __future__ import annotations
+import json
+import os
+import shutil
+from pathlib import Path
+from typing import Optional
+WORKBOOKS_DIR = os.getenv("WORKBOOKS_DIR", str(Path(__file__).resolve().parent.parent / "workbooks"))
+SCENARIOS_DIR = os.getenv("SCENARIOS_DIR", str(Path(__file__).resolve().parent.parent / "scenarios"))
+FIXTURES_DIR = os.path.join(WORKBOOKS_DIR, "fixtures")
+TEMPLATES_DIR = os.path.join(WORKBOOKS_DIR, "templates")
+def list_scenarios() -> list[dict]:
+    """List all available scenario definitions."""
+    if not os.path.isdir(SCENARIOS_DIR):
+        return []
+    scenarios = []
+    for f in sorted(os.listdir(SCENARIOS_DIR)):
+        if not f.endswith(".json"):
+            continue
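+        # Strip only the trailing extension; str.replace would also remove
+        # ".json" occurring elsewhere in the file name.
+        scenario_id = os.path.splitext(f)[0]
+        try:
+            data = load_scenario_def(scenario_id)
+            scenarios.append({
+                "scenario_id": data.get("id", scenario_id),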
+                "description": data.get("description", ""),
+                "workbook": data.get("workbook", ""),
+                "max_steps": data.get("max_steps", 50),
+            })
+        except Exception:
+            continue
+    return scenarios
+def load_scenario_def(scenario_id: str) -> dict:
+    """Load a single scenario definition JSON."""
+    path = os.path.join(SCENARIOS_DIR, f"{scenario_id}.json")
+    if not os.path.isfile(path):
+        raise FileNotFoundError(f"Scenario '{scenario_id}' not found at {path}")
+    with open(path, encoding="utf-8") as f:
+        return json.load(f)
+def prepare_workbook_for_session(scenario_id: str, session_id: str) -> str:
+    """Copy the template workbook to a session-specific fixture path.
+    Returns the path to the session's workbook copy.
+    """
+    scenario = load_scenario_def(scenario_id)
+    template_name = scenario.get("workbook", f"{scenario_id}.xlsx")
+    template_path = os.path.join(TEMPLATES_DIR, template_name)
+    if not os.path.isfile(template_path):
+        raise FileNotFoundError(f"Template workbook not found: {template_path}")
+    os.makedirs(FIXTURES_DIR, exist_ok=True)
+    session_wb_path = os.path.join(FIXTURES_DIR, f"{session_id}_{template_name}")
+    shutil.copy2(template_path, session_wb_path)
+    return session_wb_path
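The per-session copy in `prepare_workbook_for_session` keeps one session's edits from leaking into another session or into the template. A minimal sketch of that pattern, using a temporary directory and placeholder bytes instead of a real `.xlsx` file (the helper name `copy_for_session` and the file names are hypothetical):

```python
import shutil
import tempfile
from pathlib import Path


def copy_for_session(template: Path, fixtures_dir: Path, session_id: str) -> Path:
    """Copy `template` to `<fixtures_dir>/<session_id>_<template name>`."""
    fixtures_dir.mkdir(parents=True, exist_ok=True)
    target = fixtures_dir / f"{session_id}_{template.name}"
    shutil.copy2(template, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    template = root / "templates" / "demo.xlsx"
    template.parent.mkdir()
    template.write_bytes(b"placeholder")  # stand-in bytes, not a real workbook

    a = copy_for_session(template, root / "fixtures", "session-a")
    b = copy_for_session(template, root / "fixtures", "session-b")
    a.write_bytes(b"edited by a")  # simulate edits in one session

    # The other session's copy and the template are untouched.
    isolated = (
        b.read_bytes() == b"placeholder"
        and template.read_bytes() == b"placeholder"
    )
```

Because the session ID is a fresh UUID on every `reset`, copies accumulate under `fixtures/` unless something removes them; the code shown in this diff does not delete them.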

server/spreadsheet_environment.py ADDED

	@@ -0,0 +1,445 @@

+"""Spreadsheet Environment — MCPEnvironment with 13 real MCP tools."""
+from __future__ import annotations
+import json
+import os
+from typing import Any, Optional
+from uuid import uuid4
+from fastmcp import FastMCP
+from openenv.core.env_server.mcp_environment import MCPEnvironment
+from openenv.core.env_server.types import Action, EnvironmentMetadata, Observation, State
+from .scenario_loader import list_scenarios as _list_scenarios
+from .scenario_loader import load_scenario_def, prepare_workbook_for_session
+from .workbook_engine import WorkbookEngine, WorkbookSession
+WRITE_TOOLS = frozenset({"write_cell", "write_range"})
+READ_TOOLS = frozenset({"read_range", "read_cell"})
+class SpreadsheetEnvironment(MCPEnvironment):
+    """Workbook manipulation environment — 13 MCP tools exposed via OpenEnv."""
+    SUPPORTS_CONCURRENT_SESSIONS = True
+    def __init__(self):
+        mcp = FastMCP("spreadsheet")
+        self._session_id: Optional[str] = None
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._action_history: list[dict] = []
+        self._engine = WorkbookEngine()
+        self._scenario: Optional[dict] = None
+        self._last_validate_passed: int = 0
+        def _record(tool_name: str, **kwargs: Any) -> None:
+            self._action_history.append({"tool": tool_name, "arguments": kwargs})
+        # ── Tool 1: get_session_info ──────────────────────────────────
+        @mcp.tool()
+        def get_session_info() -> dict:
+            """Return current session metadata: session ID, loaded scenario, step count, edit count, and solve status."""
+            _record("get_session_info")
+            if not self._session_id:
+                return {"status": "no_session", "message": "Reset the environment first."}
+            return self._engine.get_session_info(self._session_id)
+        # ── Tool 2: list_scenarios ────────────────────────────────────
+        @mcp.tool()
+        def list_scenarios() -> dict:
+            """List all available spreadsheet task scenarios. Each entry has a scenario_id, description, workbook name, and max_steps."""
+            _record("list_scenarios")
+            scenarios = _list_scenarios()
+            return {"scenarios": scenarios, "count": len(scenarios)}
+        # ── Tool 3: load_scenario ─────────────────────────────────────
+        @mcp.tool()
+        def load_scenario(scenario_id: str) -> dict:
+            """Load a scenario and its workbook to begin working on a task.
+            Args:
+                scenario_id: The ID of the scenario to load (from list_scenarios).
+            Returns the scenario description, instructions, sheet list, and target regions.
+            """
+            _record("load_scenario", scenario_id=scenario_id)
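+            if not self._session_id:
+                return {"error": "No active session. Reset the environment first."}
+            try:
+                scenario_def = load_scenario_def(scenario_id)
+                # Copying the template can also raise FileNotFoundError,
+                # so it belongs inside the same try block.
+                wb_path = prepare_workbook_for_session(scenario_id, self._session_id)
+            except FileNotFoundError as e:
+                return {"error": str(e)}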
+            session = WorkbookSession(
+                session_id=self._session_id,
+                scenario_id=scenario_id,
+                workbook_path=wb_path,
+            )
+            self._engine.load_workbook(session)
+            self._scenario = scenario_def
+            self._last_validate_passed = 0
+            sheets = self._engine.list_sheets(self._session_id)
+            targets = self._engine.get_named_targets(self._session_id)
+            return {
+                "scenario_id": scenario_id,
+                "description": scenario_def.get("description", ""),
+                "instructions": scenario_def.get("instructions", ""),
+                "max_steps": scenario_def.get("max_steps", 50),
+                "sheets": sheets,
+                "target_regions": targets,
+            }
+        # ── Tool 4: list_sheets ───────────────────────────────────────
+        @mcp.tool()
+        def list_sheets() -> dict:
+            """List all sheets in the current workbook with their names, row/column dimensions, and visibility state.
+            Returns an error if no scenario is loaded.
+            """
+            _record("list_sheets")
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            sheets = self._engine.list_sheets(self._session_id)
+            return {"sheets": sheets}
+        # ── Tool 5: read_range ────────────────────────────────────────
+        @mcp.tool()
+        def read_range(sheet: str, range: str) -> dict:
+            """Read a rectangular range of cells from a sheet.
+            Args:
+                sheet: Sheet name (e.g. "Summary", "Engineering").
+                range: Cell range in A1 notation (e.g. "A1", "B2:D10", "A1:Z100").
+            Returns a 2D array of cell values. Formulas are shown as their formula strings (e.g. "=SUM(A1:A10)").
+            """
+            _record("read_range", sheet=sheet, range=range)
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            try:
+                data = self._engine.read_range(self._session_id, sheet, range)
+                return {"sheet": sheet, "range": range, "data": data}
+            except (ValueError, KeyError) as e:
+                return {"error": str(e)}
+        # ── Tool 6: write_cell ────────────────────────────────────────
+        @mcp.tool()
+        def write_cell(sheet: str, cell: str, value: str) -> dict:
+            """Write a value or formula to a single cell.
+            Args:
+                sheet: Sheet name.
+                cell: Cell reference in A1 notation (e.g. "C15").
+                value: The value to write. Use "=" prefix for formulas (e.g. "=SUM(A1:A10)").
+                       Numeric strings are auto-converted to numbers.
+            Returns confirmation of the write.
+            """
+            _record("write_cell", sheet=sheet, cell=cell, value=value)
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            try:
+                parsed = _parse_value(value)
+                result = self._engine.write_cell(self._session_id, sheet, cell, parsed)
+                return result
+            except (ValueError, KeyError) as e:
+                return {"error": str(e)}
+        # ── Tool 7: write_range ───────────────────────────────────────
+        @mcp.tool()
+        def write_range(sheet: str, start_cell: str, data: str) -> dict:
+            """Write a 2D block of values starting from a cell.
+            Args:
+                sheet: Sheet name.
+                start_cell: Top-left cell in A1 notation (e.g. "A1").
+                data: JSON string of a 2D array, e.g. '[[1, 2], [3, 4]]'.
+                      Use "=" prefix for formulas within cells.
+            Returns the range written and cell count.
+            """
+            _record("write_range", sheet=sheet, start_cell=start_cell, data=data)
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            try:
+                parsed_data = json.loads(data)
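+                # Reject flat arrays such as '[1, 2]': every row must itself be a list.
+                if not isinstance(parsed_data, list) or not all(
+                    isinstance(row, list) for row in parsed_data
+                ):
+                    return {"error": "data must be a JSON 2D array, e.g. '[[1, 2], [3, 4]]'"}
+                converted = [[_parse_value(str(v)) for v in row] for row in parsed_data]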
+                result = self._engine.write_range(self._session_id, sheet, start_cell, converted)
+                return result
+            except json.JSONDecodeError:
+                return {"error": "Invalid JSON in data parameter."}
+            except (ValueError, KeyError) as e:
+                return {"error": str(e)}
+        # ── Tool 8: inspect_formula ───────────────────────────────────
+        @mcp.tool()
+        def inspect_formula(sheet: str, cell: str) -> dict:
+            """Return the raw formula string from a cell, or indicate it's not a formula.
+            Args:
+                sheet: Sheet name.
+                cell: Cell reference (e.g. "C15").
+            Returns the formula string if the cell contains one, or is_formula=false otherwise.
+            """
+            _record("inspect_formula", sheet=sheet, cell=cell)
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            try:
+                return self._engine.inspect_formula(self._session_id, sheet, cell)
+            except (ValueError, KeyError) as e:
+                return {"error": str(e)}
+        # ── Tool 9: list_named_targets ────────────────────────────────
+        @mcp.tool()
+        def list_named_targets() -> dict:
+            """Show the target areas and allowed output zones for the current scenario.
+            Target regions are the cells/ranges where the agent is expected to write.
+            Writing outside these areas may incur a penalty.
+            """
+            _record("list_named_targets")
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            targets = self._engine.get_named_targets(self._session_id)
+            return {"target_regions": targets}
+        # ── Tool 10: validate_partial ─────────────────────────────────
+        @mcp.tool()
+        def validate_partial() -> dict:
+            """Check partial progress on the current scenario.
+            Returns the number of hidden test checks that pass and fail,
+            without revealing the specific expected answers. Use this to
+            gauge progress before submitting.
+            """
+            _record("validate_partial")
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            result = self._engine.validate_partial(self._session_id)
+            self._last_validate_passed = result.get("passed", 0)
+            return result
+        # ── Tool 11: submit_workbook ──────────────────────────────────
+        @mcp.tool()
+        def submit_workbook() -> dict:
+            """Submit the workbook for final evaluation against hidden tests.
+            Runs all hidden test checks and returns structured results including
+            pass rate, per-check pass/fail, and whether the scenario is fully solved.
+            """
+            _record("submit_workbook")
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            result = self._engine.run_hidden_tests(self._session_id)
+            return result
+        # ── Tool 12: get_edit_history ─────────────────────────────────
+        @mcp.tool()
+        def get_edit_history() -> dict:
+            """Return the full list of cell edits made in this session, in order.
+            Each entry shows the sheet, cell, value written, and the step number.
+            """
+            _record("get_edit_history")
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            history = self._engine.get_edit_history(self._session_id)
+            return {"edits": history, "count": len(history)}
+        # ── Tool 13: reset_scenario ───────────────────────────────────
+        @mcp.tool()
+        def reset_scenario() -> dict:
+            """Restore the workbook to its original state, discarding all edits.
+            The scenario remains loaded; you do not need to call load_scenario again.
+            """
+            _record("reset_scenario")
+            if not self._session_id or self._session_id not in self._engine._sessions:
+                return {"error": "No workbook loaded. Use load_scenario first."}
+            self._engine.reset_workbook(self._session_id)
+            self._last_validate_passed = 0
+            sheets = self._engine.list_sheets(self._session_id)
+            return {"message": "Workbook reset to original state.", "sheets": sheets}
+        super().__init__(mcp)
+    # ── Lifecycle ─────────────────────────────────────────────────────
+    def reset(
+        self,
+        seed: Optional[int] = None,
+        episode_id: Optional[str] = None,
+        **kwargs: Any,
+    ) -> Observation:
+        if self._session_id and self._session_id in self._engine._sessions:
+            self._engine.close_session(self._session_id)
+        self._session_id = str(uuid4())
+        self._state = State(
+            episode_id=episode_id or self._session_id,
+            step_count=0,
+        )
+        self._scenario = None
+        self._action_history = []
+        self._last_validate_passed = 0
+        return Observation(
+            done=False,
+            reward=0.0,
+            metadata={
+                "status": "ready",
+                "session_id": self._session_id,
+                "instructions": (
+                    "Use list_scenarios to see available tasks, then load_scenario to begin. "
+                    "Read the workbook structure with list_sheets and read_range before making edits. "
+                    "Use submit_workbook when done."
+                ),
+            },
+        )
+    def step(self, action: Action, timeout_s: Optional[float] = None, **kwargs: Any) -> Observation:
+        self._state.step_count += 1
+        if hasattr(action, "to_mcp_action"):
+            action = action.to_mcp_action()
+        obs = super().step(action, timeout_s=timeout_s, **kwargs)
+        tool_name = getattr(action, "tool_name", None)
+        args = getattr(action, "arguments", None) or {}
+        result = getattr(obs, "result", None)
+        if hasattr(result, "data"):
+            result = result.data
+        elif isinstance(result, dict) and "data" in result:
+            result = result["data"]
+        if not isinstance(result, dict):
+            result = {}
+        reward = self._compute_step_reward(tool_name, args, result)
+        if reward != 0:
+            obs.reward = (obs.reward or 0) + reward
+        session = self._engine._sessions.get(self._session_id)
+        if session:
+            obs.done = session.solved
+        return obs
+    def _compute_step_reward(self, tool_name: Optional[str], args: dict, result: dict) -> float:
+        """Layer 1 per-step reward heuristics (internal, approximate)."""
+        if isinstance(result, dict) and result.get("error"):
+            return 0.0
+        if tool_name == "inspect_formula":
+            return 0.05
+        if tool_name == "validate_partial":
+            new_passed = result.get("passed", 0)
+            if new_passed > self._last_validate_passed:
+                return 0.10
+            return 0.05
+        if tool_name in WRITE_TOOLS:
+            sheet = args.get("sheet", "")
+            cell = args.get("cell", args.get("start_cell", ""))
+            in_target = True
+            if self._session_id and self._session_id in self._engine._sessions:
+                in_target = self._engine.is_in_target_region(self._session_id, sheet, cell)
+            if not in_target:
+                return -0.10
+            # Look at the three actions before the current one (already recorded last).
+            recent_reads = any(
+                a["tool"] in READ_TOOLS
+                for a in self._action_history[-4:-1]
+            )
+            reward = 0.05
+            if recent_reads:
+                reward += 0.05
+            if self._session_id and self._session_id in self._engine._sessions:
+                cell_ref = cell.upper()
+                write_count = sum(
+                    1 for a in self._action_history
+                    if a["tool"] in WRITE_TOOLS
+                    and a["arguments"].get("cell", a["arguments"].get("start_cell", "")).upper() == cell_ref
+                    and a["arguments"].get("sheet", "") == sheet
+                )
+                if write_count >= 3:
+                    reward -= 0.05
+            return reward
+        if tool_name in READ_TOOLS:
+            return 0.0
+        if tool_name == "submit_workbook":
+            pass_rate = result.get("pass_rate", 0)
+            if pass_rate == 1.0:
+                return 0.50
+            if pass_rate > 0.5:
+                return 0.20
+            if pass_rate < 0.3:
+                return -0.10
+            return 0.0
+        return 0.0
+    def _step_impl(self, action: Action, timeout_s: Optional[float] = None, **kwargs: Any) -> Observation:
+        return Observation(
+            done=False,
+            reward=0.0,
+            metadata={
+                "error": f"Unknown action type: {type(action).__name__}. "
+                "Use ListToolsAction or CallToolAction."
+            },
+        )
+    @property
+    def state(self) -> State:
+        return self._state
+    def get_metadata(self) -> EnvironmentMetadata:
+        return EnvironmentMetadata(
+            name="spreadsheet",
+            description="Spreadsheet — exact workbook manipulation and reasoning over realistic spreadsheet tasks",
+            version="0.1.0",
+        )
+def _parse_value(value: str) -> Any:
+    """Convert string input to the appropriate Python type for cell writing."""
+    # Pass already-typed input through unchanged instead of failing on .lower().
+    if not isinstance(value, str):
+        return value
+    # Formulas are written verbatim.
+    if value.startswith("="):
+        return value
+    # Numeric strings: a "." selects float, anything else is tried as int.
+    try:
+        if "." in value:
+            return float(value)
+        return int(value)
+    except ValueError:
+        pass
+    lowered = value.lower()
+    if lowered == "true":
+        return True
+    if lowered == "false":
+        return False
+    if lowered in ("none", "null", ""):
+        return None
+    return value
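The reward tiers in `_compute_step_reward` are easier to check as standalone arithmetic. The sketch below restates the submit and write branches as pure functions; the names `submit_reward` and `write_reward` and their parameters are hypothetical, and `writes_to_cell` is assumed to count the current write, matching how the action history already contains the current action when the reward is computed.

```python
def submit_reward(pass_rate: float) -> float:
    """Reward for submit_workbook, by hidden-test pass rate."""
    if pass_rate == 1.0:
        return 0.50
    if pass_rate > 0.5:
        return 0.20
    if pass_rate < 0.3:
        return -0.10
    return 0.0  # pass rates from 0.3 to 0.5 inclusive are neutral


def write_reward(in_target: bool, read_recently: bool, writes_to_cell: int) -> float:
    """Reward for a write_cell / write_range call.

    writes_to_cell counts writes to the same sheet and cell, including this one.
    """
    if not in_target:
        return -0.10
    reward = 0.05
    if read_recently:
        reward += 0.05  # bonus for reading before writing
    if writes_to_cell >= 3:
        reward -= 0.05  # penalty for rewriting the same cell repeatedly
    return reward
```

Under these rules a first in-target write that follows a read earns 0.10, while the third write to the same cell earns 0.05 even after a read.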

server/workbook_engine.py ADDED

	@@ -0,0 +1,564 @@

+"""Workbook engine — load, edit, and validate Excel workbooks via openpyxl."""
+from __future__ import annotations
+import copy
+import json
+import os
+import re
+import shutil
+from datetime import date, datetime
+from pathlib import Path
+from typing import Any, Optional
+import openpyxl
+from openpyxl.utils import get_column_letter, column_index_from_string
+from pydantic import BaseModel
+WORKBOOKS_DIR = os.getenv("WORKBOOKS_DIR", str(Path(__file__).resolve().parent.parent / "workbooks"))
+HIDDEN_TESTS_DIR = os.path.join(WORKBOOKS_DIR, "hidden_tests")
+FIXTURES_DIR = os.path.join(WORKBOOKS_DIR, "fixtures")
+TEMPLATES_DIR = os.path.join(WORKBOOKS_DIR, "templates")
+class WorkbookSession(BaseModel):
+    session_id: str
+    scenario_id: str
+    workbook_path: str
+    modified_cells: list[dict] = []
+    step_count: int = 0
+    solved: bool = False
+def _parse_cell_ref(ref: str) -> tuple[str, int]:
+    """Parse 'A1' into (column_letter, row_number)."""
+    m = re.match(r"^([A-Z]+)(\d+)$", ref.upper().strip())
+    if not m:
+        raise ValueError(f"Invalid cell reference: {ref}")
+    return m.group(1), int(m.group(2))
+def _parse_range_ref(range_str: str) -> tuple[str, str]:
+    """Parse 'A1:D10' into ('A1', 'D10')."""
+    parts = range_str.upper().strip().split(":")
+    if len(parts) == 1:
+        return parts[0], parts[0]
+    if len(parts) == 2:
+        return parts[0], parts[1]
+    raise ValueError(f"Invalid range reference: {range_str}")
+class WorkbookEngine:
+    """In-memory workbook operations backed by openpyxl."""
+    def __init__(self):
+        self._sessions: dict[str, WorkbookSession] = {}
+        self._workbooks: dict[str, openpyxl.Workbook] = {}
+    def load_workbook(self, session: WorkbookSession) -> None:
+        """Load a workbook from disk into memory for a session."""
+        if not os.path.isfile(session.workbook_path):
+            raise FileNotFoundError(f"Workbook not found: {session.workbook_path}")
+        wb = openpyxl.load_workbook(session.workbook_path, data_only=False)
+        self._sessions[session.session_id] = session
+        self._workbooks[session.session_id] = wb
+    def reset_workbook(self, session_id: str) -> None:
+        """Reload the original workbook from disk, discarding all edits."""
+        session = self._get_session(session_id)
+        session.modified_cells = []
+        session.step_count = 0
+        session.solved = False
+        wb = openpyxl.load_workbook(session.workbook_path, data_only=False)
+        self._workbooks[session_id] = wb
+    def close_session(self, session_id: str) -> None:
+        """Remove a session and free its workbook."""
+        self._sessions.pop(session_id, None)
+        self._workbooks.pop(session_id, None)
+    def list_sheets(self, session_id: str) -> list[dict]:
+        """Return sheet names with basic metadata."""
+        wb = self._get_wb(session_id)
+        sheets = []
+        for name in wb.sheetnames:
+            ws = wb[name]
+            sheets.append({
+                "name": name,
+                "min_row": ws.min_row,
+                "max_row": ws.max_row,
+                "min_column": ws.min_column,
+                "max_column": ws.max_column,
+                "state": ws.sheet_state,
+            })
+        return sheets
+    def read_range(self, session_id: str, sheet: str, range_str: str) -> list[list[Any]]:
+        """Read a rectangular range and return a 2D list of cell values."""
+        ws = self._get_sheet(session_id, sheet)
+        start, end = _parse_range_ref(range_str)
+        start_col, start_row = _parse_cell_ref(start)
+        end_col, end_row = _parse_cell_ref(end)
+        min_col = column_index_from_string(start_col)
+        max_col = column_index_from_string(end_col)
+        rows = []
+        for r in range(start_row, end_row + 1):
+            row_data = []
+            for c in range(min_col, max_col + 1):
+                cell = ws.cell(row=r, column=c)
+                row_data.append(self._cell_display_value(cell))
+            rows.append(row_data)
+        return rows
+    def read_cell(self, session_id: str, sheet: str, cell_ref: str) -> dict:
+        """Read a single cell and return value, formula, and type info."""
+        ws = self._get_sheet(session_id, sheet)
+        col_letter, row_num = _parse_cell_ref(cell_ref)
+        col_idx = column_index_from_string(col_letter)
+        cell = ws.cell(row=row_num, column=col_idx)
+        return {
+            "cell": cell_ref.upper(),
+            "value": self._cell_display_value(cell),
+            "formula": cell.value if isinstance(cell.value, str) and cell.value.startswith("=") else None,
+            "data_type": cell.data_type,
+            "number_format": cell.number_format,
+        }
+    def inspect_formula(self, session_id: str, sheet: str, cell_ref: str) -> dict:
+        """Return the raw formula string from a cell, or None if not a formula."""
+        ws = self._get_sheet(session_id, sheet)
+        col_letter, row_num = _parse_cell_ref(cell_ref)
+        col_idx = column_index_from_string(col_letter)
+        cell = ws.cell(row=row_num, column=col_idx)
+        raw = cell.value
+        is_formula = isinstance(raw, str) and raw.startswith("=")
+        return {
+            "cell": cell_ref.upper(),
+            "formula": raw if is_formula else None,
+            "is_formula": is_formula,
+        }
+    def write_cell(self, session_id: str, sheet: str, cell_ref: str, value: Any) -> dict:
+        """Write a value or formula to a single cell."""
+        session = self._get_session(session_id)
+        ws = self._get_sheet(session_id, sheet)
+        col_letter, row_num = _parse_cell_ref(cell_ref)
+        col_idx = column_index_from_string(col_letter)
+        ws.cell(row=row_num, column=col_idx, value=value)
+        session.modified_cells.append({
+            "sheet": sheet,
+            "cell": cell_ref.upper(),
+            "value": str(value),
+            "step": session.step_count,
+        })
+        return {"written": cell_ref.upper(), "sheet": sheet, "value": str(value)}
+    def write_range(self, session_id: str, sheet: str, start_cell: str, data: list[list[Any]]) -> dict:
+        """Write a 2D block of values starting from start_cell."""
+        session = self._get_session(session_id)
+        ws = self._get_sheet(session_id, sheet)
+        col_letter, start_row = _parse_cell_ref(start_cell)
+        start_col = column_index_from_string(col_letter)
+        cells_written = 0
+        for r_offset, row_data in enumerate(data):
+            for c_offset, val in enumerate(row_data):
+                row_num = start_row + r_offset
+                col_idx = start_col + c_offset
+                ws.cell(row=row_num, column=col_idx, value=val)
+                cell_ref = f"{get_column_letter(col_idx)}{row_num}"
+                session.modified_cells.append({
+                    "sheet": sheet,
+                    "cell": cell_ref,
+                    "value": str(val),
+                    "step": session.step_count,
+                })
+                cells_written += 1
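+        # Clamp to at least one row and column so empty input ([] or [[]])
+        # does not produce an end cell before start_cell or column index 0.
+        width = max((len(r) for r in data), default=0)
+        end_row = start_row + max(len(data), 1) - 1
+        end_col = start_col + max(width, 1) - 1
+        end_ref = f"{get_column_letter(end_col)}{end_row}"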
+        return {
+            "range": f"{start_cell.upper()}:{end_ref}",
+            "sheet": sheet,
+            "cells_written": cells_written,
+        }
+    def copy_range(
+        self, session_id: str,
+        src_sheet: str, src_range: str,
+        dst_sheet: str, dst_start: str,
+    ) -> dict:
+        """Copy a range of cells from one location to another (values and formulas)."""
+        src_ws = self._get_sheet(session_id, src_sheet)
+        start_ref, end_ref = _parse_range_ref(src_range)
+        start_col_letter, start_row = _parse_cell_ref(start_ref)
+        end_col_letter, end_row = _parse_cell_ref(end_ref)
+        min_col = column_index_from_string(start_col_letter)
+        max_col = column_index_from_string(end_col_letter)
+        raw_data = []
+        for r in range(start_row, end_row + 1):
+            row = []
+            for c in range(min_col, max_col + 1):
+                cell = src_ws.cell(row=r, column=c)
+                row.append(cell.value)
+            raw_data.append(row)
+        result = self.write_range(session_id, dst_sheet, dst_start, raw_data)
+        return {"copied_from": f"{src_sheet}!{src_range}", **result}
+    def get_edit_history(self, session_id: str) -> list[dict]:
+        """Return the list of all edits made in this session."""
+        session = self._get_session(session_id)
+        return list(session.modified_cells)
+    def get_session_info(self, session_id: str) -> dict:
+        """Return session metadata."""
+        session = self._get_session(session_id)
+        return {
+            "session_id": session.session_id,
+            "scenario_id": session.scenario_id,
+            "step_count": session.step_count,
+            "edits_made": len(session.modified_cells),
+            "solved": session.solved,
+        }
+    # ── Hidden test execution ──────────────────────────────────────────
+    def run_hidden_tests(self, session_id: str) -> dict:
+        """Run all hidden test checks for the current scenario and return results."""
+        session = self._get_session(session_id)
+        wb = self._get_wb(session_id)
+        test_path = os.path.join(HIDDEN_TESTS_DIR, f"{session.scenario_id}.json")
+        if not os.path.isfile(test_path):
+            return {"error": f"No hidden tests found for scenario {session.scenario_id}"}
+        with open(test_path, encoding="utf-8") as f:
+            test_spec = json.load(f)
+        checks = test_spec.get("checks", [])
+        results = []
+        passed = 0
+        for check in checks:
+            result = self._run_single_check(wb, check)
+            results.append(result)
+            if result["passed"]:
+                passed += 1
+        total = len(checks)
+        pass_rate = passed / total if total > 0 else 0.0
+        session.solved = pass_rate == 1.0
+        return {
+            "scenario_id": session.scenario_id,
+            "total_checks": total,
+            "passed": passed,
+            "failed": total - passed,
+            "pass_rate": pass_rate,
+            "results": results,
+        }
+    def validate_partial(self, session_id: str) -> dict:
+        """Run hidden tests but return only pass/fail counts, not full answers."""
+        full = self.run_hidden_tests(session_id)
+        if "error" in full:
+            return full
+        return {
+            "scenario_id": full["scenario_id"],
+            "total_checks": full["total_checks"],
+            "passed": full["passed"],
+            "failed": full["failed"],
+            "pass_rate": full["pass_rate"],
+        }
+    # ── Target region helpers ──────────────────────────────────────────
+    def get_named_targets(self, session_id: str) -> list[dict]:
+        """Return scenario-defined target areas where the agent should write."""
+        session = self._get_session(session_id)
+        test_path = os.path.join(HIDDEN_TESTS_DIR, f"{session.scenario_id}.json")
+        if not os.path.isfile(test_path):
+            return []
+        with open(test_path, encoding="utf-8") as f:
+            test_spec = json.load(f)
+        return test_spec.get("target_regions", [])
+    def is_in_target_region(self, session_id: str, sheet: str, cell_ref: str) -> bool:
+        """Check if a cell is within a designated target region."""
+        targets = self.get_named_targets(session_id)
+        if not targets:
+            return True
+        cell_ref = cell_ref.upper()
+        for t in targets:
+            if t.get("sheet") != sheet:
+                continue
+            t_range = t.get("range")
+            if t_range and self._cell_in_range(cell_ref, t_range):
+                return True
+        return False
+    # ── Private helpers ────────────────────────────────────────────────
+    def _get_session(self, session_id: str) -> WorkbookSession:
+        if session_id not in self._sessions:
+            raise KeyError(f"Session not found: {session_id}")
+        return self._sessions[session_id]
+    def _get_wb(self, session_id: str) -> openpyxl.Workbook:
+        if session_id not in self._workbooks:
+            raise KeyError(f"No workbook loaded for session: {session_id}")
+        return self._workbooks[session_id]
+    def _get_sheet(self, session_id: str, sheet_name: str):
+        wb = self._get_wb(session_id)
+        if sheet_name not in wb.sheetnames:
+            raise ValueError(f"Sheet '{sheet_name}' not found. Available: {wb.sheetnames}")
+        return wb[sheet_name]
+    def _cell_display_value(self, cell) -> Any:
+        """Return a JSON-safe display value for a cell."""
+        val = cell.value
+        if val is None:
+            return None
+        if isinstance(val, str) and val.startswith("="):
+            return val
+        if isinstance(val, (datetime, date)):
+            return val.isoformat()
+        return val
+    def _cell_in_range(self, cell_ref: str, range_str: str) -> bool:
+        """Check if cell_ref falls within range_str (e.g. 'B2:D10')."""
+        start, end = _parse_range_ref(range_str)
+        s_col, s_row = _parse_cell_ref(start)
+        e_col, e_row = _parse_cell_ref(end)
+        c_col, c_row = _parse_cell_ref(cell_ref)
+        return (
+            column_index_from_string(s_col) <= column_index_from_string(c_col) <= column_index_from_string(e_col)
+            and s_row <= c_row <= e_row
+        )
+    def _run_single_check(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        """Execute a single hidden test check against the workbook."""
+        check_type = self._determine_check_type(check)
+        try:
+            if check_type == "expected_formula":
+                return self._check_expected_formula(wb, check)
+            elif check_type == "expected_value_range":
+                return self._check_expected_value_range(wb, check)
+            elif check_type == "no_blanks":
+                return self._check_no_blanks(wb, check)
+            elif check_type == "row_count_equals":
+                return self._check_row_count_equals(wb, check)
+            elif check_type == "all_dates_iso_format":
+                return self._check_all_dates_iso(wb, check)
+            elif check_type == "constraint_satisfaction":
+                return self._check_constraint_satisfaction(wb, check)
+            else:
+                return {"check": check_type, "passed": False, "reason": f"Unknown check type: {check_type}"}
+        except Exception as e:
+            return {"check": check_type, "passed": False, "reason": str(e)}
+    def _determine_check_type(self, check: dict) -> str:
+        if "expected_formula" in check:
+            return "expected_formula"
+        if "expected_value_range" in check:
+            return "expected_value_range"
+        return check.get("check", "unknown")
+    def _check_expected_formula(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        ws = wb[check["sheet"]]
+        col_letter, row_num = _parse_cell_ref(check["cell"])
+        col_idx = column_index_from_string(col_letter)
+        cell = ws.cell(row=row_num, column=col_idx)
+        actual = cell.value
+        expected = check["expected_formula"]
+        passed = isinstance(actual, str) and actual.strip() == expected.strip()
+        return {
+            "check": "expected_formula",
+            "cell": f"{check['sheet']}!{check['cell']}",
+            "passed": passed,
+            "expected": expected,
+            "actual": actual,
+        }
+    def _check_expected_value_range(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        ws = wb[check["sheet"]]
+        col_letter, row_num = _parse_cell_ref(check["cell"])
+        col_idx = column_index_from_string(col_letter)
+        cell = ws.cell(row=row_num, column=col_idx)
+        val = cell.value
+        lo, hi = check["expected_value_range"]
+        if isinstance(val, str) and val.startswith("="):
+            from .formula_utils import evaluate_formula
+            val = evaluate_formula(wb, check["sheet"], check["cell"])
+        numeric = self._to_numeric(val)
+        if numeric is None:
+            return {
+                "check": "expected_value_range",
+                "cell": f"{check['sheet']}!{check['cell']}",
+                "passed": False,
+                "reason": f"Non-numeric value: {val}",
+            }
+        passed = lo <= numeric <= hi
+        return {
+            "check": "expected_value_range",
+            "cell": f"{check['sheet']}!{check['cell']}",
+            "passed": passed,
+            "expected_range": [lo, hi],
+            "actual": numeric,
+        }
+    def _check_no_blanks(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        ws = wb[check["sheet"]]
+        range_str = check["range"]
+        start, end = _parse_range_ref(range_str)
+        s_col, s_row = _parse_cell_ref(start)
+        e_col, e_row = _parse_cell_ref(end)
+        min_col = column_index_from_string(s_col)
+        max_col = column_index_from_string(e_col)
+        blanks = []
+        for r in range(s_row, e_row + 1):
+            for c in range(min_col, max_col + 1):
+                if ws.cell(row=r, column=c).value is None:
+                    blanks.append(f"{get_column_letter(c)}{r}")
+        return {
+            "check": "no_blanks",
+            "range": f"{check['sheet']}!{range_str}",
+            "passed": len(blanks) == 0,
+            "blank_count": len(blanks),
+        }
+    def _check_row_count_equals(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        ws = wb[check["sheet"]]
+        expected = check["value"]
+        actual = 0
+        for row in ws.iter_rows(min_row=2):
+            if any(cell.value is not None for cell in row):
+                actual += 1
+        return {
+            "check": "row_count_equals",
+            "sheet": check["sheet"],
+            "passed": actual == expected,
+            "expected": expected,
+            "actual": actual,
+        }
+    def _check_all_dates_iso(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        ws = wb[check["sheet"]]
+        col_letter = check["column"].upper()
+        col_idx = column_index_from_string(col_letter)
+        iso_re = re.compile(r"^\d{4}-\d{2}-\d{2}")
+        non_iso = []
+        for r in range(2, ws.max_row + 1):
+            val = ws.cell(row=r, column=col_idx).value
+            if val is None:
+                continue
+            if isinstance(val, (datetime, date)):
+                continue
+            if isinstance(val, str) and iso_re.match(val):
+                continue
+            non_iso.append(f"{col_letter}{r}: {val}")
+        return {
+            "check": "all_dates_iso_format",
+            "column": f"{check['sheet']}!{col_letter}",
+            "passed": len(non_iso) == 0,
+            "non_iso_count": len(non_iso),
+        }
+    def _check_constraint_satisfaction(self, wb: openpyxl.Workbook, check: dict) -> dict:
+        """Evaluate domain constraints from a constraints sheet against an output sheet.
+        Constraints are read from prose text in column A of the constraints sheet.
+        Each row is a rule. The engine checks common patterns:
+        - "No employee works >N days"
+        - "Night→Morning gap required"
+        These are matched via regex and evaluated against the output grid.
+        """
+        output_sheet = check["sheet"]
+        constraints_sheet = check.get("constraints_sheet", "Constraints")
+        if output_sheet not in wb.sheetnames:
+            return {"check": "constraint_satisfaction", "passed": False, "reason": f"Sheet '{output_sheet}' not found"}
+        if constraints_sheet not in wb.sheetnames:
+            return {"check": "constraint_satisfaction", "passed": False, "reason": f"Sheet '{constraints_sheet}' not found"}
+        ws_out = wb[output_sheet]
+        ws_con = wb[constraints_sheet]
+        constraints = []
+        for row in ws_con.iter_rows(min_col=1, max_col=1, values_only=True):
+            if row[0] and isinstance(row[0], str) and row[0].strip():
+                constraints.append(row[0].strip())
+        violations = []
+        for constraint_text in constraints:
+            violation = self._evaluate_constraint(ws_out, constraint_text)
+            if violation:
+                violations.append(violation)
+        return {
+            "check": "constraint_satisfaction",
+            "sheet": output_sheet,
+            "passed": len(violations) == 0,
+            "total_constraints": len(constraints),
+            "violations": violations,
+        }
+    def _evaluate_constraint(self, ws, constraint_text: str) -> Optional[str]:
+        """Evaluate a single prose constraint against the output sheet.
+        Returns a violation description or None if satisfied."""
+        text_lower = constraint_text.lower()
+        max_days_match = re.search(r"no employee works?\s*>\s*(\d+)\s*days?", text_lower)
+        if max_days_match:
+            max_days = int(max_days_match.group(1))
+            for row in ws.iter_rows(min_row=2):
+                working_days = sum(
+                    1 for cell in row[1:]
+                    if cell.value is not None
+                    and str(cell.value).upper().strip() not in ("X", "", "OFF")
+                )
+                emp = row[0].value
+                if working_days > max_days:
+                    return f"{emp} works {working_days} days (max {max_days})"
+            return None
+        if "night" in text_lower and "morning" in text_lower and "gap" in text_lower:
+            for row in ws.iter_rows(min_row=2):
+                emp = row[0].value
+                shifts = [str(cell.value).upper().strip() if cell.value else "X" for cell in row[1:]]
+                for i in range(len(shifts) - 1):
+                    if shifts[i] == "N" and shifts[i + 1] == "M":
+                        return f"{emp} has Night→Morning on days {i + 1}→{i + 2}"
+            return None
+        return None
+    def _to_numeric(self, val: Any) -> Optional[float]:
+        if val is None:
+            return None
+        if isinstance(val, (int, float)):
+            return float(val)
+        if isinstance(val, str):
+            cleaned = val.replace(",", "").replace("$", "").replace("%", "").strip()
+            try:
+                return float(cleaned)
+            except (ValueError, TypeError):
+                return None
+        return None
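
The prose-constraint matching in `_evaluate_constraint` above can be tried in isolation. The following is a minimal sketch, not the engine's code: it assumes a plain list-of-lists grid in place of an openpyxl worksheet, and the names `Grid`, `OFF_CODES`, `max_days_violation`, `night_morning_violation`, and the sample `schedule` are hypothetical. The regex and the off-shift codes are copied from the diff; the violation message uses ASCII `->` rather than the arrow in the original.

```python
import re
from typing import Optional

# Hypothetical stand-in for a worksheet: header row first, then one row per
# employee with the name in the first column and one shift code per day.
Grid = list[list[Optional[str]]]

OFF_CODES = {"X", "", "OFF"}  # codes treated as "not working", as in the diff


def max_days_violation(grid: Grid, constraint_text: str) -> Optional[str]:
    """Return a violation message for a 'No employee works >N days' rule."""
    match = re.search(
        r"no employee works?\s*>\s*(\d+)\s*days?", constraint_text.lower()
    )
    if not match:
        return None  # the rule text does not follow this pattern
    max_days = int(match.group(1))
    for name, *shifts in grid[1:]:  # skip the header row
        worked = sum(
            1
            for s in shifts
            if s is not None and str(s).upper().strip() not in OFF_CODES
        )
        if worked > max_days:
            return f"{name} works {worked} days (max {max_days})"
    return None


def night_morning_violation(grid: Grid) -> Optional[str]:
    """Return a message if any night shift is followed by a morning shift."""
    for name, *shifts in grid[1:]:
        # Empty cells count as off days ("X"), mirroring the diff.
        codes = [str(s).upper().strip() if s else "X" for s in shifts]
        for i in range(len(codes) - 1):
            if codes[i] == "N" and codes[i + 1] == "M":
                return f"{name} has Night->Morning on days {i + 1}->{i + 2}"
    return None


schedule: Grid = [
    ["Employee", "Mon", "Tue", "Wed", "Thu"],
    ["Ana", "M", "M", "N", "M"],
    ["Ben", "N", "OFF", "M", None],
]

print(max_days_violation(schedule, "No employee works >3 days"))
print(night_morning_violation(schedule))
```

As in the diff, only the first violation per rule is reported, and a rule that matches neither pattern produces no violation, so unrecognized constraint text passes silently.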

spreadsheet.egg-info/PKG-INFO ADDED

	@@ -0,0 +1,16 @@

+Metadata-Version: 2.4
+Name: spreadsheet
+Version: 0.1.0
+Requires-Python: >=3.11
+Requires-Dist: openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@v0.2.1
+Requires-Dist: fastapi>=0.115.0
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: uvicorn[standard]>=0.24.0
+Requires-Dist: fastmcp>=0.1.0
+Requires-Dist: httpx>=0.25.0
+Requires-Dist: openpyxl>=3.1.0
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: formulas>=1.2.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

spreadsheet.egg-info/SOURCES.txt ADDED

	@@ -0,0 +1,22 @@

+README.md
+__init__.py
+client.py
+generate_scenarios.py
+models.py
+pyproject.toml
+./__init__.py
+./client.py
+./generate_scenarios.py
+./models.py
+server/__init__.py
+server/app.py
+server/formula_utils.py
+server/scenario_loader.py
+server/spreadsheet_environment.py
+server/workbook_engine.py
+spreadsheet.egg-info/PKG-INFO
+spreadsheet.egg-info/SOURCES.txt
+spreadsheet.egg-info/dependency_links.txt
+spreadsheet.egg-info/entry_points.txt
+spreadsheet.egg-info/requires.txt
+spreadsheet.egg-info/top_level.txt

spreadsheet.egg-info/dependency_links.txt ADDED

	@@ -0,0 +1 @@

+

spreadsheet.egg-info/entry_points.txt ADDED

	@@ -0,0 +1,2 @@

+[console_scripts]
+server = spreadsheet.server.app:main

spreadsheet.egg-info/requires.txt ADDED

	@@ -0,0 +1,13 @@

+openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git@v0.2.1
+fastapi>=0.115.0
+pydantic>=2.0.0
+uvicorn[standard]>=0.24.0
+fastmcp>=0.1.0
+httpx>=0.25.0
+openpyxl>=3.1.0
+pandas>=2.0.0
+formulas>=1.2.0
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

spreadsheet.egg-info/top_level.txt ADDED

	@@ -0,0 +1 @@

+spreadsheet

uv.lock ADDED

The diff for this file is too large to render.

workbooks/fixtures/.gitkeep ADDED

File without changes

workbooks/fixtures/037858b5-3d0e-4714-8640-2dea23fc3a18_multi_currency_reconciliation.xlsx ADDED

Binary file (8 kB).

workbooks/fixtures/1333ba32-7957-4f7f-b310-6a9ba0e718bd_data_pivot_reshape.xlsx ADDED

Binary file (9.93 kB).

workbooks/fixtures/15123e53-9510-48d4-ae1a-a01556145b8e_employee_bonus_calculation.xlsx ADDED

Binary file (9.15 kB).

workbooks/fixtures/158cebfc-4813-49c4-bd54-fceef44c4860_employee_schedule_grid.xlsx ADDED

Binary file (7.38 kB).

workbooks/fixtures/19d8a671-1769-45aa-af51-39d12e81d45c_multi_currency_reconciliation.xlsx ADDED

Binary file (8 kB).

workbooks/fixtures/30f43287-34a1-4620-a9ae-3d982705a5e5_bank_reconciliation.xlsx ADDED

Binary file (10.8 kB).

workbooks/fixtures/45bb730b-c042-491e-9f7d-ff9ff3de25a6_cascading_formula_errors.xlsx ADDED

Binary file (6.49 kB).

workbooks/fixtures/5a43518a-7b4c-4e49-a2e3-ae0e550f5351_multi_department_budget.xlsx ADDED

Binary file (9.1 kB).