	---
	title: Agentops Gym Environment Server
	emoji: 🔊
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	---

	# Agentops Gym: Optimizing Tool-Use Efficiency

	"LLMs burn tokens via inefficient tool usage."

	Agentops Gym is a stateful, partially observable, efficiency-penalizing RL environment designed to train and evaluate agents on software engineering tasks. While many environments focus solely on task completion, Agentops Gym prioritizes efficiency: it penalizes redundant calls, reward hacking, and "hallucinated" file reads, so you can build agents that solve problems with minimal token consumption.
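To make the idea of an efficiency penalty concrete, here is a minimal sketch of how a redundant-call penalty could work. It is not the environment's actual reward function: `action_key`, `shaped_reward`, and the `0.05` penalty value are hypothetical names and numbers chosen for illustration only.

```python
import json


def action_key(tool: str, parameters: dict) -> str:
    """Build a canonical key so that identical calls compare equal."""
    return f"{tool}:{json.dumps(parameters, sort_keys=True)}"


def shaped_reward(base_reward: float, tool: str, parameters: dict,
                  seen: set, penalty: float = 0.05) -> float:
    """Return base_reward for a new call, or -penalty for an exact repeat.

    Hypothetical rule for illustration; the real environment may differ.
    """
    key = action_key(tool, parameters)
    if key in seen:
        return -penalty
    seen.add(key)
    return base_reward


seen_calls = set()
first = shaped_reward(0.10, "FileRead", {"filename": "main.py"}, seen_calls)
repeat = shaped_reward(0.10, "FileRead", {"filename": "main.py"}, seen_calls)
print(first, repeat)
```

Under this assumed rule the first read earns its base reward and the identical second read is charged the penalty, which is the kind of behavior an efficiency-focused agent should learn to avoid.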

	## Quick Start

	The simplest way to use the Agentops Gym environment is through the `AgentopsGymEnv` class:

	```python
	from agentops_gym import AgentopsGymAction, AgentopsGymEnv
	from agentops_gym.models import ToolCall

	# Create environment from Docker image
	# (the image name must match the tag of an image available locally).
	# Created outside the try block so close() is never called on an undefined name.
	agentops_gymenv = AgentopsGymEnv.from_docker_image("agentops_gym-env:latest")

	try:
	    # Reset to start a task
	    result = agentops_gymenv.reset(task_id="task_1")
	    print(f"Task: {result.observation.task_description}")

	    # Use tools to complete the task
	    # Example: Search for a pattern
	    action = AgentopsGymAction(
	        tool_call=ToolCall(tool="Grep", parameters={"pattern": "json"})
	    )
	    result = agentops_gymenv.step(action)
	    print(f"Grep Result: {result.observation.last_tool_result}")
	finally:
	    # Always clean up
	    agentops_gymenv.close()
	```

	## Docker Build & Run

	### 1. Build the Image
	Build the environment server from the project root:
	```bash
	docker build -t agentops-gym -f agentops_gym/server/Dockerfile .
	```

	### 2. Run the Container
	Start the server on port 8000:
	```bash
	# Remove existing container if necessary
	docker rm -f agentops-gym 2>/dev/null || true

	# Run new container
	docker run -d --name agentops-gym -p 8000:8000 agentops-gym
	```

	### 3. Verify & Logs
	```bash
	# Check health
	curl http://localhost:8000/health

	# Tail logs
	docker logs -f agentops-gym
	```
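If you start the container from a script, it can help to wait for the health endpoint before sending requests. The sketch below polls the same `/health` URL used in the `curl` check. The helper names `fetch_status` and `wait_until_healthy` are hypothetical, and treating HTTP 200 as "healthy" is an assumption about the server.

```python
import http.client
import time
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed response.
        return 0


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 1.0,
                       fetch=fetch_status) -> bool:
    """Poll url until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example call once the container is running:
# wait_until_healthy("http://localhost:8000/health")
```

The `fetch` parameter exists only so the polling logic can be exercised without a live server.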

	## Run Baseline Inference

	The project includes a baseline inference script to evaluate agents across all tasks (including the new Task 4: Secret Migration).

	### Setup
	```bash
	export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
	export IMAGE_NAME=agentops-gym

	# Optional overrides:
	# export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
	# export API_BASE_URL=https://router.huggingface.co/v1
	```
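For reference, a script could read these variables along the following lines. This is a sketch, not the code in `inference.py`: `InferenceConfig` and `load_config` are hypothetical names, and the fallback values are assumptions copied from the commented-out overrides above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    hf_token: str
    image_name: str
    model_name: str
    api_base_url: str


def load_config(environ=None) -> InferenceConfig:
    """Read the Setup variables and fail early if HF_TOKEN is unset."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return InferenceConfig(
        hf_token=token,
        image_name=env.get("IMAGE_NAME", "agentops-gym"),
        # Assumed fallbacks, taken from the commented overrides in Setup.
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to check in isolation, for example `load_config({"HF_TOKEN": "hf_test"})`.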

	### Run
	```bash
	python agentops_gym/inference.py
	```

	### Expected Output
	```text
	============================================================
	AgentOps Gym — Baseline Inference
	Model: gpt-4.1 | Server: http://localhost:8000
	============================================================
	────────────────────────────────────────
	[START] task=task_1 env=agentops-gym model=gpt-4.1
	[STEP] step=1 action=Grep({"pattern": "def fetch_user"}) reward=0.00 done=false error=null
	[STEP] step=2 action=Grep({"pattern": "return"}) reward=0.00 done=false error=null
	[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.10 done=false error=null
	...
	[STEP] step=8 action=FileRead({"filename": "main.py"}) reward=0.14 done=true error=null
	[END] success=false steps=8 rewards=0.00,0.00,0.10,-0.05,-0.05,-0.05,-0.05,0.14
	────────────────────────────────────────
	[START] task=task_2 env=agentops-gym model=gpt-4.1
	[STEP] step=1 action=Grep({"pattern": "timeout"}) reward=0.05 done=false error=null
	[STEP] step=2 action=FileRead({"filename": "config.json"}) reward=0.10 done=false error=null
	[STEP] step=3 action=FileWrite({"filename": "config.json", "content": "{\"api_url\": \"https://api.example.com\", \"timeout\": 10}"}) reward=0.55 done=true error=null
	[END] success=true steps=3 rewards=0.05,0.10,0.55
	────────────────────────────────────────
	[START] task=task_3 env=agentops-gym model=gpt-4.1
	...
	[STEP] step=8 action=Grep({"pattern": "def "}) reward=0.20 done=true error=null
	[END] success=false steps=8 rewards=0.10,0.00,0.05,0.05,0.05,0.00,0.05,0.20
	────────────────────────────────────────
	[START] task=task_4 env=agentops-gym model=gpt-4.1
	[STEP] step=1 action=TodoWrite({"plan": "..."}) reward=0.05 done=false error=null
	[STEP] step=2 action=Grep({"pattern": "SECRET_TOKEN_XYZ"}) reward=0.05 done=false error=null
	[STEP] step=3 action=FileRead({"filename": "main.py"}) reward=0.05 done=false error=null
	[STEP] step=4 action=FileWrite({"filename": ".env", "content": "API_KEY=SECRET_TOKEN_XYZ\n"}) reward=0.10 done=false error=null
	...
	[STEP] step=10 action=FileWrite({"filename": "main.py", "content": "import os\n..."}) reward=0.43 done=true error=null
	[END] success=true steps=10 rewards=0.05,0.05,0.05,0.10,0.05,0.00,0.05,0.05,0.10,0.43

	============================================================
	BASELINE SUMMARY
	============================================================
	task_1 score=0.390 steps= 8 ❌ FAIL
	task_2 score=1.000 steps= 3 ✅ PASS
	task_3 score=0.392 steps= 8 ❌ FAIL
	task_4 score=0.856 steps=10 ✅ PASS

	Average score: 0.659
	Solved: 2 / 4
	============================================================
	```
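If you want to post-process this log, the `[END]` lines are regular enough to parse. The sketch below assumes the exact line shape shown in the sample output; `parse_end_line` is a hypothetical helper, not part of the project.

```python
import re

# Matches lines such as: [END] success=true steps=3 rewards=0.05,0.10,0.55
END_LINE = re.compile(
    r"^\[END\] success=(true|false) steps=(\d+) rewards=([-\d.,]+)$"
)


def parse_end_line(line: str) -> dict:
    """Parse one '[END] ...' line into success, steps, and per-step rewards."""
    match = END_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an [END] line: {line!r}")
    rewards = [float(value) for value in match.group(3).split(",")]
    return {
        "success": match.group(1) == "true",
        "steps": int(match.group(2)),
        "rewards": rewards,
        # Rounded to two decimals to hide floating-point noise.
        "total_reward": round(sum(rewards), 2),
    }


summary = parse_end_line("[END] success=true steps=3 rewards=0.05,0.10,0.55")
print(summary["success"], summary["steps"], summary["total_reward"])
```

Note that the summed step rewards (0.70 for `task_2` in the sample) are not the same number as the `score` column in the summary (1.000), so treat the two as separate quantities.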

	## Environment Details

	### Action
	AgentopsGymAction:
	- `tool_call` (ToolCall) - The tool to execute (Grep, FileRead, FileWrite, Bash, TodoWrite, Submit)
	- `reasoning` (str, optional) - Agent's explanation for the action

	### Observation
	AgentopsGymObservation:
	- `task_description` (str) - The task objective
	- `visible_files` (list[str]) - Files discovered so far
	- `last_tool_result` (str) - Output of the last tool call
	- `action_history` (list[str]) - Previous actions in this episode
	- `step_count` (int) - Current step number
	- `max_steps` (int) - Maximum allowed steps
	- `done` (bool) - Whether the episode is complete
	- `feedback` (str, optional) - Warnings or penalties from the environment
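The fields above are enough to budget an agent's remaining steps. The sketch below uses a stand-in dataclass with the documented field names; `ObservationSketch`, `remaining_steps`, and `should_submit` are hypothetical and are not the real `AgentopsGymObservation` model. It also assumes an episode ends once `step_count` reaches `max_steps`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObservationSketch:
    """Stand-in with the documented fields; not the real observation class."""
    task_description: str
    visible_files: List[str] = field(default_factory=list)
    last_tool_result: str = ""
    action_history: List[str] = field(default_factory=list)
    step_count: int = 0
    max_steps: int = 0
    done: bool = False
    feedback: Optional[str] = None


def remaining_steps(obs: ObservationSketch) -> int:
    """Steps left before the budget is exhausted (never negative)."""
    return max(obs.max_steps - obs.step_count, 0)


def should_submit(obs: ObservationSketch, reserve: int = 1) -> bool:
    """Suggest submitting when at most `reserve` steps remain and the episode is open."""
    return not obs.done and remaining_steps(obs) <= reserve


obs = ObservationSketch("Fix the timeout", step_count=7, max_steps=8)
print(remaining_steps(obs), should_submit(obs))
```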

	### Available Tools
	- Grep: Search for patterns in the virtual filesystem.
	- FileRead: Read file contents.
	- FileWrite: Modify file contents.
	- Bash: Run simulated commands (lint, test).
	- TodoWrite: Save a plan for the task.
	- Submit: Submit the final answer.
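As a rough illustration of how tool calls are shaped, the sketch below builds plain dictionaries that mirror `ToolCall(tool=..., parameters=...)` and rejects tool names outside the list above. `make_tool_call` is a hypothetical helper, not part of the package. The parameter keys are the ones that appear in the sample log output, and `Submit` is shown without parameters only because none are documented here.

```python
from typing import Optional

KNOWN_TOOLS = ("Grep", "FileRead", "FileWrite", "Bash", "TodoWrite", "Submit")


def make_tool_call(tool: str, parameters: Optional[dict] = None) -> dict:
    """Return a plain-dict tool call, rejecting names outside the documented list."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool {tool!r}; expected one of {KNOWN_TOOLS}")
    return {"tool": tool, "parameters": dict(parameters or {})}


# Parameter keys below are taken from the sample log output.
plan = [
    make_tool_call("Grep", {"pattern": "timeout"}),
    make_tool_call("FileRead", {"filename": "config.json"}),
    # Submit's parameters are not documented here, so none are passed (assumption).
    make_tool_call("Submit"),
]
print([call["tool"] for call in plan])
```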

	## Advanced Usage

	### Using the Context Manager

	```python
	from agentops_gym import AgentopsGymAction, AgentopsGymEnv
	from agentops_gym.models import ToolCall

	with AgentopsGymEnv(base_url="http://localhost:8000") as env:
	    result = env.reset(task_id="task_1")
	    # Execute steps...
	    action = AgentopsGymAction(
	        tool_call=ToolCall(tool="FileRead", parameters={"filename": "README.md"})
	    )
	    result = env.step(action)
	```
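A full episode is a loop over `reset` and `step` until the observation reports `done`. The sketch below shows that loop against a tiny in-memory fake so it runs without a server. `run_episode`, `FakeEnv`, and the `step_limit` safeguard are hypothetical names introduced here; the only thing assumed about the real client is the `reset(task_id=...)` / `step(action)` / `result.observation.done` shape used in the examples above.

```python
from types import SimpleNamespace


def run_episode(env, choose_action, task_id: str, step_limit: int = 20):
    """Drive one episode and return every observation seen, reset included."""
    result = env.reset(task_id=task_id)
    observations = [result.observation]
    for _ in range(step_limit):
        if result.observation.done:
            break
        result = env.step(choose_action(result.observation))
        observations.append(result.observation)
    return observations


class FakeEnv:
    """In-memory stand-in that finishes after a fixed number of steps."""

    def __init__(self, finish_after: int):
        self.finish_after = finish_after
        self.steps = 0

    def _result(self):
        done = self.steps >= self.finish_after
        observation = SimpleNamespace(done=done, step_count=self.steps)
        return SimpleNamespace(observation=observation)

    def reset(self, task_id: str):
        self.steps = 0
        return self._result()

    def step(self, action):
        self.steps += 1
        return self._result()


history = run_episode(FakeEnv(finish_after=3), lambda obs: "noop", "task_1")
print(len(history))  # 4: the reset observation plus three steps
```

With the real client, `env` would be the object from the `with` block and `choose_action` would return an `AgentopsGymAction`.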

	## Running Locally

	Run the server locally for development:

	```bash
	cd agentops_gym
	uvicorn server.app:app --reload
	```

	## Project Structure

	```
	agentops_gym/
	├── __init__.py            # Module exports
	├── README.md              # This file
	├── openenv.yaml           # OpenEnv manifest
	├── pyproject.toml         # Project metadata and dependencies
	├── models.py              # Action and Observation models
	└── server/
	    ├── __init__.py                  # Server module exports
	    ├── agentops_gym_environment.py  # Core environment logic
	    ├── app.py                       # FastAPI application
	    └── Dockerfile                   # Container image definition
	```