| title: Agentops Gym Environment Server | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| pinned: false | |
| app_port: 8000 | |
| base_path: /web | |
| tags: | |
| - openenv | |
| # Agentops Gym: Optimizing Tool-Use Efficiency | |
| **"LLMs burn tokens via inefficient tool usage."** | |
| Agentops Gym is a stateful, partially observable, efficiency-penalizing RL environment designed to train and evaluate agents on software engineering tasks. While many environments focus solely on task completion, Agentops Gym prioritizes **efficiency**: it penalizes redundant calls, reward hacking, and "hallucinated" file reads to help you build agents that solve problems with minimal token consumption. | |
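To make the idea of an efficiency penalty concrete, here is a minimal sketch of how a redundancy penalty could be applied to a per-step reward. It is not the environment's actual scoring logic: the function name `shaped_reward`, its parameters, and the penalty value of 0.05 are all assumptions chosen for illustration.

```python
def shaped_reward(base_reward, action, history, penalty=0.05):
    """Return base_reward, reduced by `penalty` if `action` repeats an earlier one.

    Hypothetical illustration only; the real environment's scoring may differ.
    """
    if action in history:
        return round(base_reward - penalty, 2)
    return base_reward


history = []
rewards = []
# The third action repeats the first, so it is the only one penalized.
for action in ['Grep("json")', 'FileRead("main.py")', 'Grep("json")']:
    rewards.append(shaped_reward(0.10, action, history))
    history.append(action)

print(rewards)
```

Under this sketch, an agent that re-issues a call it has already made earns less for that step than one that moves on to new information.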
| ## Quick Start | |
| The simplest way to use the Agentops Gym environment is through the `AgentopsGymEnv` class: | |
| ```python | |
| from agentops_gym import AgentopsGymAction, AgentopsGymEnv | |
| from agentops_gym.models import ToolCall | |
| # Create environment from Docker image. This happens outside the try block | |
| # so that close() is only called on an environment that was actually created. | |
| agentops_gymenv = AgentopsGymEnv.from_docker_image("agentops_gym-env:latest") | |
| try: | |
|     # Reset to start a task | |
|     result = agentops_gymenv.reset(task_id="task_1") | |
|     print(f"Task: {result.observation.task_description}") | |
|     # Use tools to complete the task | |
|     # Example: Search for a pattern | |
|     action = AgentopsGymAction( | |
|         tool_call=ToolCall(tool="Grep", parameters={"pattern": "json"}) | |
|     ) | |
|     result = agentops_gymenv.step(action) | |
|     print(f"Grep Result: {result.observation.last_tool_result}") | |
| finally: | |
|     # Always clean up | |
|     agentops_gymenv.close() | |
| ``` | |
| ## Docker Build & Run | |
| ### 1. Build the Image | |
| Build the environment server from the project root: | |
| ```bash | |
| docker build -t agentops-gym -f agentops_gym/server/Dockerfile . | |
| ``` | |
| ### 2. Run the Container | |
| Start the server on port 8000: | |
| ```bash | |
| # Remove existing container if necessary | |
| docker rm -f agentops-gym 2>/dev/null || true | |
| # Run new container | |
| docker run -d --name agentops-gym -p 8000:8000 agentops-gym | |
| ``` | |
| ### 3. Verify & Logs | |
| ```bash | |
| # Check health | |
| curl http://localhost:8000/health | |
| # Tail logs | |
| docker logs -f agentops-gym | |
| ``` | |
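When the container is started from a script, the server may not be ready immediately. The following is a small sketch of a readiness poll against the `/health` endpoint shown above, using only the Python standard library. The helper names `fetch_status` and `wait_for_health`, the retry count, and the delay are assumptions, not part of the project.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_for_health(url, attempts=10, delay=1.0, fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it returns HTTP 200 or `attempts` polls have been made."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example (requires a running server):
# wait_for_health("http://localhost:8000/health")
```

The `fetch` and `sleep` parameters are injectable so the polling logic can be exercised without a live server.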
| ## Run Baseline Inference | |
| The project includes a baseline inference script to evaluate agents across all tasks (including the new Task 4: Secret Migration). | |
| ### Setup | |
| ```bash | |
| export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx | |
| export IMAGE_NAME=agentops-gym | |
| # Optional overrides: | |
| # export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct | |
| # export API_BASE_URL=https://router.huggingface.co/v1 | |
| ``` | |
| ### Run | |
| ```bash | |
| python agentops_gym/inference.py | |
| ``` | |
| ### Expected Output | |
| ```text | |
| ============================================================ | |
| AgentOps Gym - Baseline Inference | |
| Model: gpt-4.1 | Server: http://localhost:8000 | |
| ============================================================ | |
| ──────────────────────────────────────── | |
| [START] task=task_1 env=agentops-gym model=gpt-4.1 | |
| [STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null | |
| [STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null | |
| [STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null | |
| ... | |
| [STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null | |
| [END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14 | |
| ──────────────────────────────────────── | |
| [START] task=task_2 env=agentops-gym model=gpt-4.1 | |
| [STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null | |
| [STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null | |
| [STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null | |
| [END] success=true steps=3 rewards=0.05,0.10,0.55 | |
| ──────────────────────────────────────── | |
| [START] task=task_3 env=agentops-gym model=gpt-4.1 | |
| ... | |
| [STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null | |
| [END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20 | |
| ──────────────────────────────────────── | |
| [START] task=task_4 env=agentops-gym model=gpt-4.1 | |
| [STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null | |
| [STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null | |
| [STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null | |
| [STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null | |
| ... | |
| [STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null | |
| [END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43 | |
| ============================================================ | |
| BASELINE SUMMARY | |
| ============================================================ | |
| task_1 score=0.390 steps= 8 ✗ FAIL | |
| task_2 score=1.000 steps= 3 ✓ PASS | |
| task_3 score=0.392 steps= 8 ✗ FAIL | |
| task_4 score=0.856 steps=10 ✓ PASS | |
| Average score: 0.659 | |
| Solved: 2 / 4 | |
| ============================================================ | |
| ``` | |
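The `[STEP]` and `[END]` lines in the sample output follow a regular `key=value` layout, which makes them easy to post-process. Below is a sketch of a parser for those two line types. The regular expressions are inferred from the sample output above rather than from the inference script itself, and the function names `parse_step` and `parse_end` are hypothetical.

```python
import re

# Patterns inferred from the sample log; the real script's format may vary.
STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>.*) "
    r"reward=(?P<reward>-?\d+\.\d+) done=(?P<done>true|false) error=(?P<error>.*)$"
)
END_RE = re.compile(
    r"^\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) "
    r"rewards=(?P<rewards>\S+)$"
)


def parse_step(line):
    """Parse one [STEP] line into a dict, or return None if it does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "action": match["action"],
        "reward": float(match["reward"]),
        "done": match["done"] == "true",
        "error": None if match["error"] == "null" else match["error"],
    }


def parse_end(line):
    """Parse one [END] line into a dict, or return None if it does not match."""
    match = END_RE.match(line.strip())
    if match is None:
        return None
    return {
        "success": match["success"] == "true",
        "steps": int(match["steps"]),
        "rewards": [float(value) for value in match["rewards"].split(",")],
    }


print(parse_end("[END] success=true steps=3 rewards=0.05,0.10,0.55"))
```

Parsed records like these can be aggregated across runs, for example to compare step counts between models.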
| ## Environment Details | |
| ### Action | |
| **AgentopsGymAction**: | |
| - `tool_call` (ToolCall) - The tool to execute (Grep, FileRead, FileWrite, Bash, TodoWrite, Submit) | |
| - `reasoning` (str, optional) - Agent's explanation for the action | |
| ### Observation | |
| **AgentopsGymObservation**: | |
| - `task_description` (str) - The task objective | |
| - `visible_files` (list[str]) - Files discovered so far | |
| - `last_tool_result` (str) - Output of the last tool call | |
| - `action_history` (list[str]) - Previous actions in this episode | |
| - `step_count` (int) - Current step number | |
| - `max_steps` (int) - Maximum allowed steps | |
| - `done` (bool) - Whether the episode is complete | |
| - `feedback` (str, optional) - Warnings or penalties from the environment | |
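An LLM-driven agent usually needs these observation fields flattened into prompt text. The sketch below shows one possible way to do that. `ObservationSketch` is a stand-in dataclass that mirrors the documented field names so the example runs on its own; it is not the real `AgentopsGymObservation`, and `render_prompt` and its output layout are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObservationSketch:
    """Stand-in mirroring the documented fields; not the real observation class."""
    task_description: str
    visible_files: List[str] = field(default_factory=list)
    last_tool_result: str = ""
    action_history: List[str] = field(default_factory=list)
    step_count: int = 0
    max_steps: int = 0
    done: bool = False
    feedback: Optional[str] = None


def render_prompt(obs):
    """Flatten an observation into a plain-text block for an LLM prompt."""
    lines = [
        f"Task: {obs.task_description}",
        f"Step {obs.step_count} of {obs.max_steps}",
        "Visible files: " + (", ".join(obs.visible_files) or "(none)"),
        "Previous actions: " + ("; ".join(obs.action_history) or "(none)"),
        f"Last tool result: {obs.last_tool_result or '(empty)'}",
    ]
    # Surface warnings and penalties so the agent can react to them.
    if obs.feedback:
        lines.append(f"Environment feedback: {obs.feedback}")
    return "\n".join(lines)


example = ObservationSketch(
    task_description="Fix the timeout in config.json",
    visible_files=["config.json"],
    step_count=2,
    max_steps=8,
)
print(render_prompt(example))
```

Including the action history and remaining step budget in the prompt gives the model the information it needs to avoid repeating calls.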
| ### Available Tools | |
| - **Grep**: Search for patterns in the virtual filesystem. | |
| - **FileRead**: Read file contents. | |
| - **FileWrite**: Modify file contents. | |
| - **Bash**: Run simulated commands (lint, test). | |
| - **TodoWrite**: Save a plan for the task. | |
| - **Submit**: Submit the final answer. | |
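Because the tool set is fixed, a client can reject misspelled tool names before spending a step on them. The following sketch builds a tool-call payload as a plain dictionary and validates the tool name against the six tools listed above. The helper `build_tool_call` is hypothetical, and whether the server itself rejects unknown tools is not stated here.

```python
# Tool names taken from the list above.
ALLOWED_TOOLS = ("Grep", "FileRead", "FileWrite", "Bash", "TodoWrite", "Submit")


def build_tool_call(tool, **parameters):
    """Return a {"tool", "parameters"} payload, rejecting unknown tool names."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(
            f"Unknown tool {tool!r}; expected one of {', '.join(ALLOWED_TOOLS)}"
        )
    return {"tool": tool, "parameters": parameters}


print(build_tool_call("Grep", pattern="json"))
```

Assuming `ToolCall` accepts `tool` and `parameters` keywords as in the Quick Start example, the payload could be passed on as `ToolCall(**build_tool_call("Grep", pattern="json"))`.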
| ## Advanced Usage | |
| ### Using the Context Manager | |
| ```python | |
| from agentops_gym import AgentopsGymAction, AgentopsGymEnv | |
| from agentops_gym.models import ToolCall | |
| with AgentopsGymEnv(base_url="http://localhost:8000") as env: | |
|     result = env.reset(task_id="task_1") | |
|     # Execute steps... | |
|     action = AgentopsGymAction( | |
|         tool_call=ToolCall(tool="FileRead", parameters={"filename": "README.md"}) | |
|     ) | |
|     result = env.step(action) | |
| ``` | |
| ## Running Locally | |
| Run the server locally for development: | |
| ```bash | |
| cd agentops_gym | |
| uvicorn server.app:app --reload | |
| ``` | |
| ## Project Structure | |
| ``` | |
| agentops_gym/ | |
| ├── __init__.py # Module exports | |
| ├── README.md # This file | |
| ├── openenv.yaml # OpenEnv manifest | |
| ├── pyproject.toml # Project metadata and dependencies | |
| ├── models.py # Action and Observation models | |
| └── server/ | |
|     ├── __init__.py # Server module exports | |
|     ├── agentops_gym_environment.py # Core environment logic | |
|     ├── app.py # FastAPI application | |
|     └── Dockerfile # Container image definition | |
| ``` | |