metadata

title: API Triage Agent
emoji: 🔧
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

API Triage Agent

An OpenEnv-compliant reinforcement learning environment that trains AI agents to diagnose and resolve real-world API integration failures. The agent inspects logs, identifies error patterns, and applies corrective actions — mirroring the workflow of an on-call support engineer.

Why This Matters

API failures cause costly downtime and lost revenue. Authentication expiry, malformed payloads, rate limiting, timeouts, endpoint misconfiguration, and server errors are failure categories that SRE teams handle routinely, and they are the six covered here. This environment lets an LLM agent learn the diagnostic playbook through trial and error, receiving shaped rewards that encourage methodical investigation before action.

Project Structure

api-triage-agent/
├── app.py                          # FastAPI server (main entry point)
├── inference.py                    # Baseline agent script (OpenAI client)
├── openenv.yaml                    # OpenEnv manifest (tasks + graders)
├── Dockerfile                      # Container definition for HF Spaces
├── requirements.txt
├── pyproject.toml
├── environment/
│   ├── api_triage_env.py           # Core environment (reset/step/state)
│   ├── incident_generator.py       # Procedural incident generation
│   ├── action_space.py             # Valid action definitions
│   └── reward.py                   # 5-factor reward function
├── tasks/
│   ├── grading_helper.py           # Shared grading utilities
│   ├── auth_error/grader.py        # Task 1 grader
│   ├── missing_fields/grader.py    # Task 2 grader
│   ├── rate_limit/grader.py        # Task 3 grader
│   ├── timeout/grader.py           # Task 4 grader
│   ├── wrong_endpoint/grader.py    # Task 5 grader
│   └── server_error/grader.py      # Task 6 grader
└── tests/
    ├── test_env.py                 # Environment unit tests
    └── test_graders.py             # Grader validation tests

Action Space

The agent chooses from 8 discrete actions per step:

Action	Description	When to Use
`inspect_logs`	Read error logs for diagnostic clues	First step in any incident
`inspect_request`	Examine the failed HTTP request	Gather additional context
`refresh_token`	Regenerate expired API credentials	401 Unauthorized errors
`add_field`	Add missing required payload fields	400 Bad Request errors
`wait_retry`	Back off and retry the request	429 Rate Limit / 408 Timeout
`change_endpoint`	Switch to the correct API path	404 Not Found errors
`escalate`	Escalate to a human operator	500 Internal Server errors
`resolve`	End the episode (must apply fix first)	After successful remediation
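
As an illustration, the action table can be written as a Python enum. This is a minimal sketch, not the contents of `environment/action_space.py`: the names `Action`, `DIAGNOSTIC_ACTIONS`, and `parse_action` are hypothetical, and only the eight action strings come from the table.

```python
from enum import Enum


class Action(str, Enum):
    """The eight discrete actions from the table (class name is hypothetical)."""

    INSPECT_LOGS = "inspect_logs"
    INSPECT_REQUEST = "inspect_request"
    REFRESH_TOKEN = "refresh_token"
    ADD_FIELD = "add_field"
    WAIT_RETRY = "wait_retry"
    CHANGE_ENDPOINT = "change_endpoint"
    ESCALATE = "escalate"
    RESOLVE = "resolve"


# Assumption: the two inspect_* actions are the purely diagnostic ones.
DIAGNOSTIC_ACTIONS = {Action.INSPECT_LOGS, Action.INSPECT_REQUEST}


def parse_action(raw: str) -> Action:
    """Convert a raw string (e.g. from an LLM reply) into an Action.

    Raises ValueError when the string is not one of the eight actions.
    """
    return Action(raw.strip().lower())
```

Validating the model's output against a closed set like this is one way to reject malformed actions before they reach the environment.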

Observation Space

Each step returns a structured observation:

Field	Type	Description
`step`	`int`	Current step number (1-indexed)
`max_steps`	`int`	Maximum allowed steps (default: 10)
`incident_summary`	`str`	Human-readable problem description
`logs`	`list[str]`	Simulated error log entries
`response_code`	`int`	HTTP status code (401, 400, 429, 408, 404, 500)
`fix_applied`	`bool`	Whether the correct fix action has been taken
`is_resolved`	`bool`	Whether the episode has terminated
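
The observation fields can be mirrored in a small typed container. The sketch below uses a standard-library dataclass so it runs without extra dependencies; the project itself states that it uses Pydantic models, so treat this as an approximation. The helper `steps_remaining` and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Mirror of the observation fields listed in the table (illustrative only)."""

    step: int
    incident_summary: str
    response_code: int
    logs: List[str] = field(default_factory=list)
    max_steps: int = 10  # default taken from the table
    fix_applied: bool = False
    is_resolved: bool = False

    def steps_remaining(self) -> int:
        """Hypothetical helper: steps left before the max-step limit."""
        return max(self.max_steps - self.step, 0)


# Example values are made up for illustration.
obs = Observation(
    step=1,
    incident_summary="Requests to the billing API are rejected",
    response_code=401,
    logs=["401 Unauthorized: token expired"],
)
print(obs.steps_remaining())  # 9
```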

Tasks (6 tasks, Easy → Hard)

Each task targets a specific API failure pattern. Graders evaluate whether the agent follows the correct diagnostic-then-fix workflow.

#	Task ID	Difficulty	HTTP Code	Correct Fix	Grader
1	`auth_error`	Easy	401	`refresh_token` → `resolve`	`tasks.auth_error.grader:grade`
2	`missing_fields`	Easy	400	`add_field` → `resolve`	`tasks.missing_fields.grader:grade`
3	`rate_limit`	Medium	429	`wait_retry` → `resolve`	`tasks.rate_limit.grader:grade`
4	`timeout`	Medium	408	`wait_retry` → `resolve`	`tasks.timeout.grader:grade`
5	`wrong_endpoint`	Medium	404	`change_endpoint` → `resolve`	`tasks.wrong_endpoint.grader:grade`
6	`server_error`	Hard	500	`escalate` → `resolve`	`tasks.server_error.grader:grade`
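
The "correct fix, then `resolve`" pattern in the table can be checked with a few lines of Python. This is a hedged sketch of the idea only: the mapping is copied from the table, but `follows_fix_then_resolve` is a hypothetical function, and the real graders in `tasks/` return scores in [0.0, 1.0] and may award partial credit in ways not shown here.

```python
from typing import List

# Correct fix per HTTP status code, copied from the task table.
CORRECT_FIX = {
    401: "refresh_token",
    400: "add_field",
    429: "wait_retry",
    408: "wait_retry",
    404: "change_endpoint",
    500: "escalate",
}


def follows_fix_then_resolve(response_code: int, actions: List[str]) -> bool:
    """Return True if the correct fix was applied before a final `resolve`.

    Hypothetical pass/fail check, not the project's grading logic.
    """
    fix = CORRECT_FIX[response_code]
    if not actions or actions[-1] != "resolve":
        return False
    return fix in actions[:-1]


print(follows_fix_then_resolve(401, ["inspect_logs", "refresh_token", "resolve"]))  # True
print(follows_fix_then_resolve(401, ["resolve"]))  # False
```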

Reward Function

A shaped reward function provides partial-credit signals at every step, guiding the agent toward the optimal diagnostic workflow. The project calls it a 5-factor reward, although the table lists six components:

Factor	Reward	Rationale
Correct fix action	+5.0	Directly addresses the root cause
Diagnostic action (`inspect_logs`, `inspect_request`)	+0.5	Encourages investigation before action
Successful resolution (`resolve` after fix)	+15.0	Large bonus for completing the episode correctly
Premature resolution (`resolve` without fix)	-10.0	Prevents the agent from "lying" about fixing the issue
Wrong action	-2.0	Mild penalty to discourage random exploration
Max steps reached	-5.0	Time pressure to act efficiently

Reward range: [-20.0, +20.5]
Optimal 3-step episode: inspect_logs (+0.5) → correct fix (+5.0) → resolve (+15.0) = +20.5
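
The arithmetic of the optimal episode can be reproduced with a short sketch of the reward table. The numeric values come from the table; everything else is an assumption made for illustration, including the function names, the rule that `resolve` always ends the episode, the rule that a repeated correct fix earns +5.0 again, and the point at which the max-steps penalty is applied. It is not the code in `environment/reward.py`.

```python
from typing import Iterable

DIAGNOSTIC = {"inspect_logs", "inspect_request"}


def step_reward(action: str, correct_fix: str, fix_applied: bool) -> float:
    """Per-step reward using the values from the reward table."""
    if action in DIAGNOSTIC:
        return 0.5
    if action == "resolve":
        # +15.0 only if the fix was applied earlier, otherwise -10.0.
        return 15.0 if fix_applied else -10.0
    if action == correct_fix:
        return 5.0
    return -2.0  # any other action counts as a wrong action


def episode_return(actions: Iterable[str], correct_fix: str, max_steps: int = 10) -> float:
    """Sum rewards over an episode (termination rules are assumptions)."""
    total = 0.0
    fix_applied = False
    for i, action in enumerate(actions, start=1):
        total += step_reward(action, correct_fix, fix_applied)
        if action == correct_fix:
            fix_applied = True
        if action == "resolve":
            break  # assumption: resolve always ends the episode
        if i >= max_steps:
            total -= 5.0  # max-steps penalty from the table
            break
    return total


print(episode_return(["inspect_logs", "refresh_token", "resolve"], "refresh_token"))  # 20.5
print(episode_return(["resolve"], "refresh_token"))  # -10.0
```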

API Endpoints

The FastAPI server exposes the following endpoints:

Method	Path	Description
`GET`	`/`	Root status check
`GET`	`/health`	Health check
`POST`	`/reset`	Reset environment, get initial observation
`POST`	`/step`	Execute an action, receive observation + reward
`GET`	`/state`	Get current environment state
`GET`	`/tasks`	List all tasks with grader references
`POST`	`/grade/{task_id}`	Run a specific task's grader
`GET`	`/docs`	Swagger UI documentation
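
A client can reach these endpoints with any HTTP library. The sketch below uses only the Python standard library and builds a `POST /step` request without sending it. The request body shape `{"action": "<name>"}` is an assumption, since the README does not document the schema (the Swagger UI at `/docs` is the place to confirm it), and `build_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860"  # local server address used elsewhere in this README


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumption: /step accepts a JSON body of the form {"action": "<name>"}.
step_req = build_request("POST", "/step", {"action": "inspect_logs"})
health_req = build_request("GET", "/health")
print(step_req.get_method(), step_req.full_url)

# With the server running, a round trip would look like:
#   call(build_request("POST", "/reset"))
#   call(step_req)
```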

Environment Variables

Variable	Description	Default
`API_BASE_URL`	LLM API endpoint	`https://router.huggingface.co/v1`
`MODEL_NAME`	Model identifier for inference	`Qwen/Qwen2.5-72B-Instruct`
`HF_TOKEN`	Hugging Face API key	(required)
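
One way a script could read these variables is sketched below, with the defaults copied from the table and a hard failure when `HF_TOKEN` is missing. The function `load_settings` and the returned key names are hypothetical; `inference.py` may handle configuration differently.

```python
import os
from typing import Dict, Mapping

DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-72B-Instruct"


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the three variables from the table, applying the documented defaults."""
    token = env.get("HF_TOKEN")
    if not token:
        # HF_TOKEN has no default in the table, so fail early with a clear message.
        raise RuntimeError("HF_TOKEN is required but not set")
    return {
        "api_base_url": env.get("API_BASE_URL", DEFAULT_API_BASE_URL),
        "model_name": env.get("MODEL_NAME", DEFAULT_MODEL_NAME),
        "hf_token": token,
    }


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"HF_TOKEN": "your_token"})
print(settings["model_name"])  # Qwen/Qwen2.5-72B-Instruct
```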

Setup & Run

Local Development

# Clone the repo
git clone https://huggingface.co/spaces/Kavya988/API_DEBUG_SOLVER
cd API_DEBUG_SOLVER

# Create virtual environment
python -m venv venv
source venv/bin/activate      # Linux/macOS
# venv\Scripts\activate       # Windows: run this instead of the line above

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py
# Server starts at http://localhost:7860

# Run tests
pytest tests/test_env.py -v
python tests/test_graders.py

Docker

docker build -t api-triage-agent .
docker run -p 7860:7860 -e HF_TOKEN=your_token api-triage-agent

Run Inference

export HF_TOKEN=your_token
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct

python inference.py

The inference script runs all 6 tasks sequentially, emitting structured [START]/[STEP]/[END] logs for each task. Scores are computed from actual episode rewards, clamped to [0.0, 1.0].
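
The README does not state how episode rewards are mapped into [0.0, 1.0]. One plausible approach, shown purely as an assumption, is min-max scaling over the documented reward range followed by clamping; `normalize_score` is a hypothetical name and this may not match what `inference.py` does.

```python
def normalize_score(total_reward: float, low: float = -20.0, high: float = 20.5) -> float:
    """Min-max scale a total episode reward into [0.0, 1.0], then clamp.

    `low` and `high` default to the reward range quoted in the Reward Function
    section. The scaling formula itself is an assumption.
    """
    scaled = (total_reward - low) / (high - low)
    return min(1.0, max(0.0, scaled))


print(normalize_score(20.5))   # 1.0 (optimal 3-step episode)
print(normalize_score(-20.0))  # 0.0
print(normalize_score(30.0))   # 1.0 (clamped)
```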

OpenEnv Compliance

This environment implements the full OpenEnv specification:

Typed models: Pydantic BaseModel for observations, actions, and state
Standard endpoints: reset(), step(), state() via FastAPI
Task discovery: GET /tasks returns all tasks with grader module references
Agent graders: 6 grader functions (one per task), each returning scores in [0.0, 1.0]
Manifest: openenv.yaml defines environment metadata, action/observation spaces, and task-grader mappings
Containerized: Dockerfile builds and runs on HF Spaces (port 7860)

Deployment

Live at: https://kavya988-api-debug-solver.hf.space