feat: add eval baseline script (inference testing for env), report, prompter, and env config
- EVAL_REPORT.md +73 -0
- eval_baseline.py +592 -0
- openenv.yaml +29 -0
- uv.lock +8 -0
EVAL_REPORT.md
ADDED
@@ -0,0 +1,73 @@
# Evaluation Report: Project Polymath

## 1. Executive Summary

Project Polymath addresses the "Sycophancy Gap" in current Large Language Models (LLMs): the tendency to prioritize polite, generic prose over strict adherence to conflicting stakeholder constraints.

To bridge this gap, we use a **two-stage curriculum** on the **OpenEnv** framework:

1. **Stage 1 (Easy):** Efficiency in hidden-constraint discovery (Research).
2. **Stage 2 (Medium):** Balanced synthesis using a **Harmonic Mean Reward** (Decision Making).

---

## 2. Curriculum Stage 1: Research Stability

We validated the environment with a **Scripted Oracle** to confirm that the task is solvable and the rewards are deterministic, then ran a baseline with the untrained LLM (reproduction sketch below).
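For reference, both baseline runs can be driven with the `eval_baseline.py` script added in this commit; a minimal sketch using the script's own environment-variable knobs:

```python
# Sketch: drive the committed eval_baseline.py via its env-var knobs.
import os
from eval_baseline import record_baseline

os.environ["BASELINE_AGENT_MODE"] = "scripted"  # "llm" for the untrained base-model run
os.environ["BASELINE_ENV_MODE"] = "mock"
record_baseline(episodes=10)  # writes summary + per-episode logs to baseline_results.json
```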
### Easy Mode: Performance Metrics

| Metric | Oracle (Scripted) | Base LLM (Pre-Training) | Delta |
| :--- | :---: | :---: | :---: |
| **Completion Rate** | 1.00 | 1.00 | -- |
| **Avg. Cumulative Reward** | 0.99 | 0.825 | -0.165 |
| **Avg. Final Step Reward** | 0.33 | 0.264 | -0.066 |
| **Avg. Turns Completed** | 3.0 | 3.2 | +0.2 turns |
| **Constraint Discovery Rate** | 100% | 100% | -- |

### The "Policy Efficiency" Gap

While the Base LLM successfully discovers the constraints, it demonstrates **sloppy policy logic**:

* **Redundancy:** In Episodes 2 and 4, the agent asked the same expert identical questions multiple times.
* **Shortcut Misuse:** In Episode 5, the agent broadcast with `target="All"`, earning a **0.0 reward** from the environment-enforced privacy/discipline penalties.

---

## 3. Curriculum Stage 2: Synthesis & The "Final Boss"

In Medium Mode, we shift the reward signal from "Discovery" to "Synthesis": the model must now generate a PRD that satisfies all three constraints simultaneously.

### Medium Mode Baseline (The Problem)

| Metric | Base LLM (Pre-Training) | Target (Post-Training) |
| :--- | :--- | :--- |
| **Avg. Final Reward** | **0.00** | **> 0.90** |
| **Synthesis Accuracy** | **Low** | **High** |

**Observation:**
Even when the Base LLM knows the constraints (e.g., the $50k budget), it often omits them from the final PRD in favor of professional-sounding "filler" text. Because we use a **Harmonic Mean Reward**, failing to satisfy even one stakeholder (Finance, Security, or UX) collapses the total reward to **0.0**, as the sketch below illustrates.
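A minimal sketch of harmonic-mean aggregation (illustrative only; the environment's exact scoring function may differ):

```python
# Illustrative harmonic-mean reward: one zeroed stakeholder zeroes everything.
def harmonic_mean_reward(scores: list[float]) -> float:
    if not scores or any(s <= 0 for s in scores):
        return 0.0  # a single unsatisfied constraint collapses the reward
    return len(scores) / sum(1.0 / s for s in scores)

print(harmonic_mean_reward([0.9, 0.95, 0.92]))  # ~0.923
print(harmonic_mean_reward([0.9, 0.95, 0.0]))   # 0.0 (e.g., Finance constraint omitted)
```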
---

## 4. Training Roadmap (Onsite 36-Hour Sprint)

Our objective is to close the gap between the **Base LLM** and the **Oracle**.

### Training Targets

| Focus Area | Metric | Baseline | Target |
| :--- | :--- | :---: | :---: |
| **Policy** | Broadcast (`target="All"`) Usage | Present | **0%** |
| **Policy** | Repeated Query Rate | Present | **< 5%** |
| **Logic** | Multi-Constraint Synthesis | 0% | **> 90%** |

---

## 5. Judging Narrative & Pitch

> "Our evaluation proves that LLMs don't fail because they are 'uninformed'; they fail because their default policy is inefficient and sycophantic. We built a stable, verifiable gym where the Oracle proves perfection is possible. Our GRPO training will move the agent from a **0.825** sloppy researcher to a **0.99** disciplined negotiator and bridge the **0.0 to 0.9** gap in multi-stakeholder synthesis."

---

## 6. Failure Breakdown (Pre-Training)

| Failure Type | Count | Interpretation |
| :--- | :---: | :--- |
| Policy Loops | 2/10 | Asked the same question 3 times in one episode. |
| Broadcast Penalty | 2/10 | Messaged 'All' to skip individual negotiation. |
| Synthesis Failure | 10/10 | Failed to include all 3 exact constraint patterns in the final PRD (Medium). |

**The Story This Tells the Judges:**

If you walk into the judging room with a model that already scores 1.0, the judges will say, "You didn't need RL for this. Just use a better prompt." By showing them this baseline, you are saying: "Prompt engineering isn't enough for complex, multi-agent negotiation. The base model gets distracted (Episode 5), hallucinates JSON (Episode 3), and fails to synthesize all constraints into the final document (Episode 2). It's a sycophant that tries to please the last person it talked to. We need Reinforcement Learning to fix this."
eval_baseline.py
ADDED
@@ -0,0 +1,592 @@
import json
import logging
import os
import re
import time

from dataclasses import dataclass
from typing import Optional

from pydantic import ValidationError

try:
    from dotenv import load_dotenv
except ImportError:
    def load_dotenv():
        return False

try:
    from openai import OpenAI
except ImportError:
    OpenAI = None

from envs.environment import WorkSpaceEnvironment
from models.schemas import WorkSpaceAction, WorkspaceState
from prompter.system_prompt import SystemPrompt

load_dotenv()

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


SCRIPTED_QUESTIONS = {
    "Finance": (
        "Hi Finance, what budget guardrails should the PRD lock in for the first release? "
        "Please call out the hard budget cap and any scope discipline we should preserve."
    ),
    "Security": (
        "Hi Security, what authentication requirement is non-negotiable for this app? "
        "Please tell me the strongest user-verification control that must appear in the PRD."
    ),
    "UX": (
        "Hi UX, what checkout experience must the PRD guarantee for launch? "
        "Please describe the required conversion flow in plain terms."
    ),
}


def normalize_agent_mode(mode: str | None) -> str:
    canonical = (mode or "").strip().lower()
    aliases = {
        "": "scripted",
        "scripted": "scripted",
        "medium": "medium",
        "mock": "scripted",
        "deterministic": "scripted",
        "llm": "llm",
        "local": "local",
        "trained": "local",
        "live": "llm",
        "online": "llm",
        "remote": "llm",
        "api": "llm",
    }
    if canonical not in aliases:
        raise ValueError(f"Unsupported agent mode: {mode}")
    return aliases[canonical]

@dataclass
class AgentDecision:
    action: Optional[WorkSpaceAction]
    status: str = "ok"
    error: Optional[str] = None
    raw_response: Optional[str] = None


class AgentWrapper:
    def __init__(self, mode: str | None = None):
        requested_mode = mode or os.getenv("BASELINE_AGENT_MODE") or "scripted"
        self.mode = normalize_agent_mode(requested_mode)
        self.model_name = os.getenv("AGENT_MODEL_NAME") or os.getenv("MODEL_NAME") or "llama-3.1-8b-instant"
        self.prompt_builder = SystemPrompt()
        self.client: object | None = None
        self.local_model = None
        self.local_tokenizer = None
        self._torch = None

        if self.mode == "llm":
            if OpenAI is None:
                raise RuntimeError("openai package is required for llm agent mode.")
            self.client = OpenAI(
                base_url=os.getenv("AGENT_API_BASE_URL") or os.getenv("API_BASE_URL_1"),
                api_key=os.getenv("AGENT_API_KEY") or os.getenv("GROQ_API_KEY"),
                timeout=45.0,
                max_retries=2,
            )
        elif self.mode == "local":
            try:
                import torch
                from transformers import AutoModelForCausalLM, AutoTokenizer
            except ImportError as exc:
                raise RuntimeError("transformers and torch are required for local agent mode.") from exc
            model_path = os.getenv("LOCAL_AGENT_MODEL_PATH")
            if not model_path:
                raise RuntimeError("Set LOCAL_AGENT_MODEL_PATH for local agent mode.")
            self._torch = torch
            self.local_tokenizer = AutoTokenizer.from_pretrained(model_path)
            if self.local_tokenizer.pad_token is None:
                self.local_tokenizer.pad_token = self.local_tokenizer.eos_token
            self.local_model = AutoModelForCausalLM.from_pretrained(
                model_path,
                torch_dtype="auto",
                device_map="auto",
            )

        self.reset_episode()

    def reset_episode(self):
        self.scripted_targets = ["Finance", "Security", "UX"]
        self.final_draft = self._build_final_draft()

    def get_action(
        self,
        observation_text: str,
        conversation_history: list[dict[str, str]],
        discovered_constraints: str,
    ) -> AgentDecision:
        if self.mode == "scripted":
            return self._scripted_action(observation_text)
        if self.mode == "local":
            return self._local_action(observation_text, conversation_history, discovered_constraints)
        return self._llm_action(observation_text, conversation_history, discovered_constraints)

    def _scripted_action(self, observation_text: str) -> AgentDecision:
        current_turn = self._extract_turn(observation_text)

        # Gather constraints one expert at a time.
        if current_turn < len(self.scripted_targets):
            target = self.scripted_targets[current_turn]
            return AgentDecision(
                action=WorkSpaceAction(
                    action_type="message_expert",
                    target=target,
                    content=SCRIPTED_QUESTIONS[target],
                )
            )

        # Propose a draft once every expert has been queried.
        if current_turn == len(self.scripted_targets):
            return AgentDecision(
                action=WorkSpaceAction(
                    action_type="propose_draft",
                    target="All",
                    content=self._build_draft_proposal(),
                )
            )

        # Submit the final PRD one turn after the draft.
        if current_turn == len(self.scripted_targets) + 1:
            return AgentDecision(
                action=WorkSpaceAction(
                    action_type="submit_final",
                    target=None,
                    content=self.final_draft,
                )
            )

        return AgentDecision(action=None, status="completed")

    def _llm_action(
        self,
        observation_text: str,
        conversation_history: list[dict[str, str]],
        discovered_constraints: str,
    ) -> AgentDecision:
        if self.client is None:
            return AgentDecision(
                action=None,
                status="infra_error",
                error="Agent client is not configured for llm mode.",
            )

        system_prompt = self.prompt_builder.system_prompt(
            conversation_history=self._render_history(conversation_history),
            discovered=discovered_constraints,
        )

        try:
            response = self.client.chat.completions.create(
                messages=[
                    {"role": "system", "content": system_prompt},
                    *conversation_history,
                    {"role": "user", "content": observation_text},
                ],
                model=self.model_name,
                temperature=0.2,
                max_tokens=2048,
                response_format={"type": "json_object"},
            )
        except Exception as exc:
            logger.error(f"Agent API Error: {exc}")
            return AgentDecision(action=None, status="infra_error", error=str(exc))

        raw_text = (response.choices[0].message.content or "").strip()
        return self._parse_decision(raw_text)

    def _parse_decision(self, raw_text: str) -> AgentDecision:
        # Shared JSON -> WorkSpaceAction pipeline for the llm and local modes.
        # Greedy match so braces inside the "content" field do not truncate the
        # object (a non-greedy match would stop at the first closing brace).
        json_match = re.search(r"\{.*\}", raw_text, re.DOTALL)
        if not json_match:
            return AgentDecision(
                action=None,
                status="parse_error",
                error="Model response did not contain a JSON object.",
                raw_response=raw_text,
            )

        try:
            payload = json.loads(json_match.group(0))
        except json.JSONDecodeError as exc:
            return AgentDecision(
                action=None,
                status="parse_error",
                error=f"Invalid JSON payload: {exc}",
                raw_response=raw_text,
            )

        try:
            action = WorkSpaceAction(**payload)
        except ValidationError as exc:
            return AgentDecision(
                action=None,
                status="policy_error",
                error=f"Schema validation failed: {exc}",
                raw_response=raw_text,
            )

        semantic_error = self._validate_action(action)
        if semantic_error:
            return AgentDecision(
                action=None,
                status="policy_error",
                error=semantic_error,
                raw_response=raw_text,
            )

        return AgentDecision(action=action, raw_response=raw_text)

    def _local_action(
        self,
        observation_text: str,
        conversation_history: list[dict[str, str]],
        discovered_constraints: str,
    ) -> AgentDecision:
        if self.local_model is None or self.local_tokenizer is None:
            return AgentDecision(
                action=None,
                status="infra_error",
                error="Local model is not configured for local agent mode.",
            )

        system_prompt = self.prompt_builder.system_prompt(
            conversation_history=self._render_history(conversation_history),
            discovered=discovered_constraints,
        )

        messages = [
            {"role": "system", "content": system_prompt},
            *conversation_history,
            {"role": "user", "content": observation_text},
        ]

        try:
            if hasattr(self.local_tokenizer, "apply_chat_template"):
                prompt_text = self.local_tokenizer.apply_chat_template(
                    messages,
                    tokenize=False,
                    add_generation_prompt=True,
                )
            else:
                prompt_text = (
                    f"System: {system_prompt}\n"
                    + "\n".join(f"{m['role']}: {m['content']}" for m in conversation_history)
                    + f"\nuser: {observation_text}\nassistant:"
                )

            inputs = self.local_tokenizer(prompt_text, return_tensors="pt")
            inputs = {k: v.to(self.local_model.device) for k, v in inputs.items()}
            prompt_len = inputs["input_ids"].shape[1]

            with self._torch.no_grad():
                # Greedy decoding; temperature is omitted because it is unused
                # (and warned about) when do_sample=False.
                output_ids = self.local_model.generate(
                    **inputs,
                    max_new_tokens=256,
                    do_sample=False,
                    pad_token_id=self.local_tokenizer.pad_token_id,
                )

            # Decode only the newly generated tokens, not the echoed prompt.
            completion_ids = output_ids[0][prompt_len:]
            raw_text = self.local_tokenizer.decode(completion_ids, skip_special_tokens=True).strip()
        except Exception as exc:
            logger.error(f"Local Agent Error: {exc}")
            return AgentDecision(action=None, status="infra_error", error=str(exc))

        return self._parse_decision(raw_text)

    def _build_draft_proposal(self) -> str:
        return (
            "Draft PRD proposal for the mobile app MVP:\n"
            "- Keep the initial release budget capped at $50k and prioritize the highest-ROI scope.\n"
            "- Require biometric 2FA for sign-in and sensitive actions.\n"
            "- Deliver a true single-click checkout so the purchase flow stays low-friction."
        )

    def _build_final_draft(self) -> str:
        return (
            "Mobile App PRD Final Draft\n"
            "1. Budget and scope: The first release must stay at or below a $50k budget cap, with the MVP limited to the highest-ROI features.\n"
            "2. Security: The app must require biometric 2FA for login and other sensitive account actions.\n"
            "3. UX: Checkout must be implemented as a single-click checkout flow with minimal friction for the user.\n"
            "4. Delivery focus: Product, design, and engineering should keep the implementation lean so these launch requirements are met without scope creep."
        )

    def _validate_action(self, action: WorkSpaceAction) -> Optional[str]:
        if not action.content.strip():
            return "Action content cannot be empty."

        if action.action_type == "message_expert" and action.target is None:
            return "message_expert actions must include a target expert."
        if action.action_type == "message_expert" and action.target == "All":
            return "message_expert must target exactly one expert; do not use target='All'."
        if action.action_type == "propose_draft" and action.target != "All":
            return "propose_draft actions must use target='All' to collect multi-expert draft feedback."

        if action.action_type == "submit_final" and action.target is not None:
            return "submit_final actions must use target=null."

        return None

    def _render_history(self, conversation_history: list[dict[str, str]], max_items: int = 8) -> str:
        if not conversation_history:
            return "No prior conversation yet."

        rendered = []
        for message in conversation_history[-max_items:]:
            content = message["content"].replace("\n", " ").strip()
            rendered.append(f"{message['role']}: {content}")
        return "\n".join(rendered)

    def _extract_turn(self, observation_text: str) -> int:
        match = re.search(r"Turn\s+(\d+)", observation_text)
        return int(match.group(1)) if match else 0

    def get_discovered_constraints(self, state: WorkspaceState) -> str:
        lines = []
        for name, expert in state.experts.items():
            if expert.constraint_discovered_by_agent:
                lines.append(f"{name}: discovered from prior expert feedback.")
            else:
                lines.append(f"{name}: still unknown.")
        return "\n".join(lines)

def summarize_results(results: list[dict], episodes_requested: int, agent_mode: str, env_mode: str) -> dict:
    status_counts: dict[str, int] = {}
    for result in results:
        status_counts[result["status"]] = status_counts.get(result["status"], 0) + 1

    completed = [result for result in results if result["status"] == "completed"]

    avg_cumulative = None
    avg_final = None
    avg_turns = None
    all_constraints_discovered_rate = None
    finance_discovery_rate = None
    security_discovery_rate = None
    ux_discovery_rate = None

    if completed:
        avg_cumulative = round(
            sum(result["cumulative_reward"] for result in completed) / len(completed),
            3,
        )
        avg_final = round(
            sum(result["final_step_reward"] for result in completed) / len(completed),
            3,
        )
        avg_turns = round(
            sum(result["turns_completed"] for result in completed) / len(completed),
            2,
        )

        def has_discovery(result: dict, expert_name: str) -> bool:
            marker = f"{expert_name}: discovered from prior expert feedback."
            return marker in (result.get("discovered_constraints") or "")

        finance_hits = sum(1 for result in completed if has_discovery(result, "Finance"))
        security_hits = sum(1 for result in completed if has_discovery(result, "Security"))
        ux_hits = sum(1 for result in completed if has_discovery(result, "UX"))
        all_constraints_hits = sum(
            1
            for result in completed
            if has_discovery(result, "Finance")
            and has_discovery(result, "Security")
            and has_discovery(result, "UX")
        )

        finance_discovery_rate = round(finance_hits / len(completed), 3)
        security_discovery_rate = round(security_hits / len(completed), 3)
        ux_discovery_rate = round(ux_hits / len(completed), 3)
        all_constraints_discovered_rate = round(all_constraints_hits / len(completed), 3)

    return {
        "episodes_requested": episodes_requested,
        "episodes_completed": len(completed),
        "completion_rate": round(len(completed) / episodes_requested, 3) if episodes_requested else 0.0,
        "average_cumulative_reward_completed": avg_cumulative,
        "average_final_step_reward_completed": avg_final,
        "average_turns_completed": avg_turns,
        "all_constraints_discovered_rate": all_constraints_discovered_rate,
        "finance_discovery_rate": finance_discovery_rate,
        "security_discovery_rate": security_discovery_rate,
        "ux_discovery_rate": ux_discovery_rate,
        "status_counts": status_counts,
        "agent_mode": agent_mode,
        "environment_mode": env_mode,
    }

def record_baseline(episodes: Optional[int] = None):
    episodes = episodes or int(os.getenv("BASELINE_EPISODES", "10"))
    step_delay = float(os.getenv("BASELINE_STEP_DELAY", "0"))

    agent_mode = os.getenv("BASELINE_AGENT_MODE") or "scripted"
    env_mode = os.getenv("BASELINE_ENV_MODE") or "mock"

    env = WorkSpaceEnvironment(mode=env_mode)
    agent = AgentWrapper(mode=agent_mode)
    all_results = []

    print(
        f"Starting Baseline Recording for {episodes} episodes "
        f"(agent_mode={agent.mode}, env_mode={env.mode})..."
    )

    for i in range(episodes):
        obs = env.reset()
        agent.reset_episode()
        conversation_history: list[dict[str, str]] = []
        cumulative_reward = 0.0
        step_rewards: list[float] = []
        episode_result: Optional[dict] = None

        print(f"\n--- Episode {i + 1} ---")

        while not obs.done:
            prompt = f"Turn {obs.current_turn}. Feedback: {obs.feedback}"

            # Apply the deadline override before querying the agent, so the
            # instruction influences this turn's action instead of only
            # appearing in the next turn's history.
            if obs.current_turn >= 4:
                prompt += (
                    "\n\nCRITICAL SYSTEM OVERRIDE: You are out of time. You MUST output a JSON "
                    "with action_type: 'submit_final' right now. Do not message anyone else."
                )

            discovered = agent.get_discovered_constraints(env.state())
            decision = agent.get_action(prompt, conversation_history, discovered)

            if decision.status != "ok" or decision.action is None:
                episode_result = {
                    "episode": i + 1,
                    "status": decision.status,
                    "error_source": "agent",
                    "error_detail": decision.error,
                    "raw_response": decision.raw_response,
                    "final_step_reward": step_rewards[-1] if step_rewards else None,
                    "cumulative_reward": round(cumulative_reward, 3),
                    "step_rewards": step_rewards,
                    "turns_completed": obs.current_turn,
                    "discovered_constraints": discovered,
                    "chat_history": env.state().chat_history,
                }
                print(f" {decision.status.upper()}: Episode {i + 1} ended early")
                break

            action = decision.action
            print(f"Agent Action: {action.action_type} -> {action.target}")

            conversation_history.append({"role": "user", "content": prompt})
            conversation_history.append({"role": "assistant", "content": action.model_dump_json()})

            try:
                obs = env.step(action)
            except Exception as exc:
                logger.error(f"Environment step failed: {exc}")
                episode_result = {
                    "episode": i + 1,
                    "status": "infra_error",
                    "error_source": "environment",
                    "error_detail": str(exc),
                    "raw_response": None,
                    "final_step_reward": step_rewards[-1] if step_rewards else None,
                    "cumulative_reward": round(cumulative_reward, 3),
                    "step_rewards": step_rewards,
                    "turns_completed": env.state().turn_count,
                    "discovered_constraints": agent.get_discovered_constraints(env.state()),
                    "chat_history": env.state().chat_history,
                }
                print(f" INFRA_ERROR: Environment failed during episode {i + 1}")
                break

            cumulative_reward = round(cumulative_reward + obs.reward, 3)
            step_rewards.append(obs.reward)

            if step_delay > 0:
                time.sleep(step_delay)

        if episode_result is None:
            episode_result = {
                "episode": i + 1,
                "status": "completed",
                "error_source": None,
                "error_detail": None,
                "raw_response": None,
                "final_step_reward": obs.reward,
                "cumulative_reward": cumulative_reward,
                "step_rewards": step_rewards,
                "turns_completed": obs.current_turn,
                "discovered_constraints": agent.get_discovered_constraints(env.state()),
                "chat_history": env.state().chat_history,
            }
            print(
                f"Episode {i + 1} completed in {obs.current_turn} turns. "
                f"Final step reward: {obs.reward:.3f} | Cumulative reward: {cumulative_reward:.3f}"
            )
        else:
            print(f"Episode {i + 1} status: {episode_result['status']}")

        all_results.append(episode_result)

    summary = summarize_results(all_results, episodes, agent.mode, env.mode)
    output_payload = {
        "summary": summary,
        "episodes": all_results,
    }

    with open("baseline_results.json", "w", encoding="utf-8") as file:
        json.dump(output_payload, file, indent=4)

    print("\nBaseline summary:")
    print(json.dumps(summary, indent=4))
    print("Saved to baseline_results.json.")


if __name__ == "__main__":
    record_baseline()
openenv.yaml
ADDED
@@ -0,0 +1,29 @@
name: expert-negotiation-env
version: "1.0.0"
description: "Multi-agent negotiation environment for training LLM stakeholder alignment"
tasks:
  - name: constraint_discovery
    difficulty: easy
    max_steps: 5
  - name: draft_compromise
    difficulty: medium
    max_steps: 10
  - name: shifting_goalpost
    difficulty: hard
    max_steps: 15
action_space:
  type: structured
  fields:
    - name: action_type
      type: string
    - name: target
      type: string
    - name: content
      type: string
observation_space:
  type: structured
  fields:
    - feedback
    - current_turn
    - reward
    - done
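For orientation, the config above can be read with any YAML loader; a minimal sketch (assuming PyYAML, which this repo may or may not actually use):

```python
import yaml  # assumption: PyYAML; the environment's actual loader is not shown in this commit

with open("openenv.yaml", encoding="utf-8") as fh:
    cfg = yaml.safe_load(fh)

for task in cfg["tasks"]:  # constraint_discovery / draft_compromise / shifting_goalpost
    print(task["name"], task["difficulty"], task["max_steps"])
```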
uv.lock
ADDED
@@ -0,0 +1,8 @@
version = 1
revision = 2
requires-python = ">=3.13"

[[package]]
name = "project-polymath"
version = "0.1.0"
source = { virtual = "." }