--- title: Agentops Gym Environment Server emoji: πŸ”Š colorFrom: blue colorTo: indigo sdk: docker pinned: false app_port: 8000 base_path: /web tags: - openenv --- # Agentops Gym: Optimizing Tool-Use Efficiency **"LLMs burn tokens via inefficient tool usage."** Agentops Gym is a stateful, partially observable, efficiency-penalizing RL environment designed to train and evaluate agents on software engineering tasks. While many environments focus solely on task completion, Agentops Gym prioritizes **efficiency**β€”penalizing redundant calls, reward-hacking, and "hallucinated" file reads to help you build agents that solve problems with minimal token consumption. ## Quick Start The simplest way to use the Agentops Gym environment is through the `AgentopsGymEnv` class: ```python from agentops_gym import AgentopsGymAction, AgentopsGymEnv from agentops_gym.models import ToolCall try: # Create environment from Docker image agentops_gymenv = AgentopsGymEnv.from_docker_image("agentops_gym-env:latest") # Reset to start a task result = agentops_gymenv.reset(task_id="task_1") print(f"Task: {result.observation.task_description}") # Use tools to complete the task # Example: Search for a pattern action = AgentopsGymAction( tool_call=ToolCall(tool="Grep", parameters={"pattern": "json"}) ) result = agentops_gymenv.step(action) print(f"Grep Result: {result.observation.last_tool_result}") finally: # Always clean up agentops_gymenv.close() ``` ## Docker Build & Run ### 1. Build the Image Build the environment server from the project root: ```bash docker build -t agentops-gym -f agentops_gym/server/Dockerfile . ``` ### 2. Run the Container Start the server on port 8000: ```bash # Remove existing container if necessary docker stop agentops-gym && docker rm agentops-gym # Run new container docker run -d --name agentops-gym -p 8000:8000 agentops-gym ``` ### 3. Verify & Logs ```bash # Check health curl http://localhost:8000/health # Tail logs docker logs -f agentops-gym ``` ## Run Baseline Inference The project includes a baseline inference script to evaluate agents across all tasks (including the new Task 4: Secret Migration). ### Setup ```bash export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx export IMAGE_NAME=agentops-gym # Optional overrides: # export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # export API_BASE_URL=https://router.huggingface.co/v1 ``` ### Run ```bash python agentops_gym/inference.py ``` ### Expected Output ```text ============================================================ AgentOps Gym β€” Baseline Inference Model: gpt-4.1 | Server: http://localhost:8000 ============================================================ ──────────────────────────────────────── [START] task=task_1 env=agentops-gym model=gpt-4.1 [STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null [STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null [STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null ... [STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null [END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14 ──────────────────────────────────────── [START] task=task_2 env=agentops-gym model=gpt-4.1 [STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null [STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null [STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null [END] success=true steps=3 rewards=0.05,0.10,0.55 ──────────────────────────────────────── [START] task=task_3 env=agentops-gym model=gpt-4.1 ... [STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null [END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20 ──────────────────────────────────────── [START] task=task_4 env=agentops-gym model=gpt-4.1 [STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null [STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null [STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null [STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null [STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null [END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43 ============================================================ BASELINE SUMMARY ============================================================ task_1 score=0.390 steps= 8 ❌ FAIL task_2 score=1.000 steps= 3 βœ… PASS task_3 score=0.392 steps= 8 ❌ FAIL task_4 score=0.856 steps=10 βœ… PASS Average score: 0.659 Solved: 2 / 4 ============================================================ ``` ## Environment Details ### Action **AgentopsGymAction**: - `tool_call` (ToolCall) - The tool to execute (Grep, FileRead, FileWrite, Bash, TodoWrite, Submit) - `reasoning` (str, optional) - Agent's explanation for the action ### Observation **AgentopsGymObservation**: - `task_description` (str) - The task objective - `visible_files` (list[str]) - Files discovered so far - `last_tool_result` (str) - Output of the last tool call - `action_history` (list[str]) - Previous actions in this episode - `step_count` (int) - Current step number - `max_steps` (int) - Maximum allowed steps - `done` (bool) - Whether the episode is complete - `feedback` (str, optional) - Warnings or penalties from the environment ### Available Tools - **Grep**: Search for patterns in the virtual filesystem. - **FileRead**: Read file contents. - **FileWrite**: Modify file contents. - **Bash**: Run simulated commands (lint, test). - **TodoWrite**: Save a plan for the task. - **Submit**: Submit the final answer. ## Advanced Usage ### Using the Context Manager ```python from agentops_gym import AgentopsGymAction, AgentopsGymEnv from agentops_gym.models import ToolCall with AgentopsGymEnv(base_url="http://localhost:8000") as env: result = env.reset(task_id="task_1") # Execute steps... action = AgentopsGymAction(tool_call=ToolCall(tool="FileRead", parameters={"filename": "README.md"})) result = env.step(action) ``` ## Running Locally Run the server locally for development: ```bash cd agentops_gym uvicorn server.app:app --reload ``` ## Project Structure ``` agentops_gym/ β”œβ”€β”€ __init__.py # Module exports β”œβ”€β”€ README.md # This file β”œβ”€β”€ openenv.yaml # OpenEnv manifest β”œβ”€β”€ pyproject.toml # Project metadata and dependencies β”œβ”€β”€ models.py # Action and Observation models └── server/ β”œβ”€β”€ __init__.py # Server module exports β”œβ”€β”€ agentops_gym_environment.py # Core environment logic β”œβ”€β”€ app.py # FastAPI application └── Dockerfile # Container image definition ```