Commit 56724ad by Revanth-ml (verified) · 1 parent: e2eb9d7

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +95 -9
  2. client.py +4 -1
  3. inference.py +1 -1
  4. server/app.py +16 -6
  5. server/environment.py +20 -9
  6. server/inference.py +1 -1
README.md CHANGED
@@ -1,8 +1,8 @@
 ---
 title: Agentops Gym Environment Server
-emoji: 🏏
-colorFrom: gray
-colorTo: pink
+emoji: 🔊
+colorFrom: blue
+colorTo: indigo
 sdk: docker
 pinned: false
 app_port: 8000
@@ -11,9 +11,11 @@ tags:
 - openenv
 ---
 
-# Agentops Gym Environment
+# Agentops Gym: Optimizing Tool-Use Efficiency
 
-Stateful, partially observable, efficiency-penalizing RL environment for training agents on software engineering tool-use tasks.
+**"LLMs burn tokens via inefficient tool usage."**
+
+Agentops Gym is a stateful, partially observable, efficiency-penalizing RL environment designed to train and evaluate agents on software engineering tasks. While many environments focus solely on task completion, Agentops Gym prioritizes **efficiency**—penalizing redundant calls, reward-hacking, and "hallucinated" file reads to help you build agents that solve problems with minimal token consumption.
 
 ## Quick Start
 
@@ -44,13 +46,97 @@ finally:
     agentops_gymenv.close()
 ```
 
-## Building the Docker Image
+## Docker Build & Run
+
+### 1. Build the Image
+Build the environment server from the project root:
+```bash
+docker build -t agentops-gym -f agentops_gym/server/Dockerfile .
+```
+
+### 2. Run the Container
+Start the server on port 8000:
+```bash
+# Remove existing container if necessary
+docker stop agentops-gym && docker rm agentops-gym
 
-Before using the environment, you need to build the Docker image:
+# Run new container
+docker run -d --name agentops-gym -p 8000:8000 agentops-gym
+```
 
+### 3. Verify & Logs
 ```bash
-# From project root
-docker build -t agentops_gym-env:latest -f agentops_gym/server/Dockerfile .
+# Check health
+curl http://localhost:8000/health
+
+# Tail logs
+docker logs -f agentops-gym
+```
+
+## Run Baseline Inference
+
+The project includes a baseline inference script to evaluate agents across all tasks (including the new Task 4: Secret Migration).
+
+### Setup
+```bash
+export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
+export IMAGE_NAME=agentops-gym
+
+# Optional overrides:
+# export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
+# export API_BASE_URL=https://router.huggingface.co/v1
+```
+
+### Run
+```bash
+python agentops_gym/inference.py
+```
+
+### Expected Output
+```text
+============================================================
+AgentOps Gym — Baseline Inference
+Model: gpt-4.1 | Server: http://localhost:8000
+============================================================
+────────────────────────────────────────
+[START] task=task_1 env=agentops-gym model=gpt-4.1
+[STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null
+[STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null
+[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null
+...
+[STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null
+[END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14
+────────────────────────────────────────
+[START] task=task_2 env=agentops-gym model=gpt-4.1
+[STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null
+[STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null
+[STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null
+[END] success=true steps=3 rewards=0.05,0.10,0.55
+────────────────────────────────────────
+[START] task=task_3 env=agentops-gym model=gpt-4.1
+...
+[STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null
+[END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20
+────────────────────────────────────────
+[START] task=task_4 env=agentops-gym model=gpt-4.1
+[STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null
+[STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null
+[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null
+[STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null
+[STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null
+[END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43
+
+============================================================
+BASELINE SUMMARY
+============================================================
+task_1 score=0.390 steps= 8 ❌ FAIL
+task_2 score=1.000 steps= 3 ✅ PASS
+task_3 score=0.392 steps= 8 ❌ FAIL
+task_4 score=0.856 steps=10 ✅ PASS
+
+Average score: 0.659
+Solved: 2 / 4
+============================================================
 ```
 
 ## Environment Details
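The Expected Output in the README uses a line-oriented `[START]` / `[STEP]` / `[END]` format with `key=value` fields. Below is a minimal Python sketch for collecting per-episode results from such a log. It assumes the field layout in the sample stays stable. `parse_episodes` and `Episode` are hypothetical names and are not part of the repository.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Simple key=value fields; \S+ stops at whitespace. [STEP] lines are skipped
# because their action=... payload can contain spaces.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


@dataclass
class Episode:
    task: str
    steps: int = 0
    success: bool = False
    rewards: List[float] = field(default_factory=list)


def parse_episodes(lines: Iterable[str]) -> List[Episode]:
    """Group [START] ... [END] log lines into Episode records (hypothetical helper)."""
    episodes: List[Episode] = []
    current: Optional[Episode] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[START]"):
            fields = dict(FIELD_RE.findall(line))
            current = Episode(task=fields["task"])
        elif line.startswith("[END]") and current is not None:
            fields = dict(FIELD_RE.findall(line))
            current.success = fields["success"] == "true"
            current.steps = int(fields["steps"])
            current.rewards = [float(r) for r in fields["rewards"].split(",")]
            episodes.append(current)
            current = None
    return episodes


# Excerpt of the task_2 episode from the sample output.
sample = """\
[START] task=task_2 env=agentops-gym model=gpt-4.1
[STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null
[STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null
[END] success=true steps=3 rewards=0.05,0.10,0.55
"""

for ep in parse_episodes(sample.splitlines()):
    print(ep.task, ep.success, ep.steps, round(sum(ep.rewards), 2))
# task_2 True 3 0.7
```

The summary `score` is not the sum of the step rewards. In the sample, task_2's rewards add up to 0.70, yet its score is 1.000. A parser like this therefore reports only what the log states and does not try to reproduce the grader's score.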
client.py CHANGED
@@ -9,7 +9,10 @@ from typing import Dict, Any
 from openenv.core.env_client import EnvClient
 from openenv.core.client_types import StepResult
 
-from agentops_gym.models import ToolCall, AgentObservation, AgentState
+try:
+    from agentops_gym.models import ToolCall, AgentObservation, AgentState
+except (ModuleNotFoundError, ImportError):
+    from models import ToolCall, AgentObservation, AgentState
 
 
 class AgentOpsEnv(EnvClient[ToolCall, AgentObservation, AgentState]):
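The `try`/`except` block added to `client.py` lets one module load its models in two ways. It works when the code is imported as part of the `agentops_gym` package and also when it runs from a flat layout where `models.py` sits directly on `sys.path`. The same fallback can be written as a small reusable helper. `import_first` is hypothetical and not part of the repository. It is shown with standard-library modules so the sketch runs anywhere.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first module in module_names that imports successfully.

    Hypothetical helper mirroring the fallback in client.py: try the
    package-qualified path first, then the flat path.
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also catches ModuleNotFoundError (a subclass)
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates imported:\n" + "\n".join(errors))


# The first name does not exist, so the helper falls back to the second,
# analogous to agentops_gym.models -> models.
json_mod = import_first("agentops_gym_missing.models", "json")
print(json_mod.__name__)  # json
```

`ModuleNotFoundError` is a subclass of `ImportError`, so `except (ModuleNotFoundError, ImportError)` behaves the same as `except ImportError`. This pattern has one caveat. If the package import fails for an unrelated reason, such as a missing third-party dependency inside `agentops_gym.models`, the fallback still runs. The resulting error then names `models` instead of the real cause.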
inference.py CHANGED
@@ -246,7 +246,7 @@ async def async_main() -> None:
 
     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
 
-    async with AgentOpsEnv.from_docker_image(IMAGE_NAME) as env:
+    async with await AgentOpsEnv.from_docker_image(IMAGE_NAME) as env:
         results = []
         for task_id in ALL_TASKS:
             result = await run_episode(env, client, task_id)
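This change adds `await` before `AgentOpsEnv.from_docker_image(IMAGE_NAME)`. The fix implies that `from_docker_image` returns an awaitable whose result is the client, and that the client is the async context manager. This is inferred from the diff, not from openenv's documentation. `async with` needs an object with `__aenter__`/`__aexit__`, and a bare coroutine object has neither. The self-contained sketch below uses a hypothetical `FakeEnv` class to show the difference.

```python
import asyncio


class FakeEnv:
    """Hypothetical stand-in for a client built by an async factory."""

    def __init__(self, image: str) -> None:
        self.image = image
        self.closed = False

    @classmethod
    async def from_docker_image(cls, image: str) -> "FakeEnv":
        # A real client might start a container here; this just yields once.
        await asyncio.sleep(0)
        return cls(image)

    async def __aenter__(self) -> "FakeEnv":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.closed = True


async def main() -> tuple:
    # Without `await`, `async with` receives a coroutine object. Python 3.11+
    # raises TypeError; older versions raise AttributeError (__aenter__).
    coro = FakeEnv.from_docker_image("agentops-gym")
    try:
        async with coro:
            pass
        failed = False
    except (TypeError, AttributeError):
        failed = True
    finally:
        coro.close()  # discard the never-started coroutine cleanly

    # With `await`, the factory runs first and its result is entered.
    async with await FakeEnv.from_docker_image("agentops-gym") as env:
        image = env.image
    return failed, image, env.closed


print(asyncio.run(main()))  # (True, 'agentops-gym', True)
```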
server/app.py CHANGED
@@ -11,15 +11,24 @@ get their own AgentOpsEnvironment instance (via create_app factory pattern).
 
 import threading
 import logging
+import os
 from typing import Optional
 
 from fastapi.responses import JSONResponse
 
-from openenv.core.env_server.http_server import create_app
+try:
+    from openenv.core.env_server.http_server import create_app
+except ImportError:
+    raise ImportError("openenv is required. Install with 'pip install openenv-core'")
 
-from agentops_gym.models import ToolCall, AgentObservation
-from agentops_gym.server.environment import AgentOpsEnvironment, get_last_grader_result
-from agentops_gym.server.tasks import TASK_REGISTRY
+try:
+    from agentops_gym.models import ToolCall, AgentObservation
+    from agentops_gym.server.environment import AgentOpsEnvironment, get_last_grader_result
+    from agentops_gym.server.tasks import TASK_REGISTRY
+except (ModuleNotFoundError, ImportError):
+    from models import ToolCall, AgentObservation
+    from server.environment import AgentOpsEnvironment, get_last_grader_result
+    from server.tasks import TASK_REGISTRY
 
 logger = logging.getLogger(__name__)
 
@@ -143,11 +152,12 @@ async def health():
 
 
 def main():
+    """Entry point for running the AgentOps Gym server."""
     import uvicorn
     import os
     host = os.getenv("HOST", "0.0.0.0")
-    port = int(os.getenv("PORT", 8000))
-    uvicorn.run(app, host=host, port=port)
+    port = int(os.getenv("PORT", "8000"))
+    uvicorn.run(app, host=host, port=int(port))
 
 
 if __name__ == "__main__":
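The new `main()` gives `PORT` a string default, so `os.getenv` always returns a `str`. A single `int(...)` then covers both the default and an override, which makes the second `int(port)` in the `uvicorn.run` call a no-op. A malformed value still surfaces as a bare `ValueError`. For a clearer startup error, a small validating helper is one option. `read_port` is hypothetical and not in the repository. It reads the same `PORT` variable, and the 1..65535 range check is the usual TCP port range.

```python
import os
from collections.abc import Mapping
from typing import Optional


def read_port(env: Optional[Mapping] = None, default: str = "8000") -> int:
    """Parse PORT from the environment (hypothetical helper, not in the repo)."""
    env = os.environ if env is None else env
    raw = env.get("PORT", default)
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be in 1..65535, got {port}")
    return port


print(read_port({}))                # 8000
print(read_port({"PORT": "9001"}))  # 9001
```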
server/environment.py CHANGED
@@ -17,15 +17,26 @@ from typing import Optional, Any
 
 from openenv.core.env_server.interfaces import Environment
 
-from agentops_gym.models import ToolCall, AgentObservation, AgentState
-from agentops_gym.server.tools import run_tool, PROJECT_SNAPSHOTS, AVAILABLE_TOOLS
-from agentops_gym.server.tasks import (
-    TASK_REGISTRY,
-    get_task,
-    list_task_ids,
-    compute_step_reward,
-    grade_episode,
-)
+try:
+    from agentops_gym.models import ToolCall, AgentObservation, AgentState
+    from agentops_gym.server.tools import run_tool, PROJECT_SNAPSHOTS, AVAILABLE_TOOLS
+    from agentops_gym.server.tasks import (
+        TASK_REGISTRY,
+        get_task,
+        list_task_ids,
+        compute_step_reward,
+        grade_episode,
+    )
+except (ModuleNotFoundError, ImportError):
+    from models import ToolCall, AgentObservation, AgentState
+    from server.tools import run_tool, PROJECT_SNAPSHOTS, AVAILABLE_TOOLS
+    from server.tasks import (
+        TASK_REGISTRY,
+        get_task,
+        list_task_ids,
+        compute_step_reward,
+        grade_episode,
+    )
 
 logger = logging.getLogger(__name__)
 
server/inference.py CHANGED
@@ -40,7 +40,7 @@ except ImportError:
 # ---------------------------------------------------------------------------
 
 IMAGE_NAME = os.getenv("IMAGE_NAME")
-API_KEY = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY")
+API_KEY = os.getenv("OPENAI_API_KEY")
 API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
 MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
 BASE_URL = os.getenv("ENV_BASE_URL", "http://localhost:8000")