---
title: API Triage Agent
emoji: π§
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# API Triage Agent
An OpenEnv-compliant reinforcement learning environment that trains AI agents to diagnose and resolve real-world API integration failures. The agent inspects logs, identifies error patterns, and applies corrective actions, mirroring the workflow of an on-call support engineer.
## Why This Matters
API failures are costly in downtime and lost revenue. Expired authentication, malformed payloads, rate limiting, timeouts, endpoint misconfiguration, and server errors are among the most common failure categories that SRE teams handle day to day. This environment lets an LLM agent learn the diagnostic playbook through trial and error, receiving shaped rewards that encourage methodical investigation before action.
## Project Structure
```
api-triage-agent/
├── app.py                        # FastAPI server (main entry point)
├── inference.py                  # Baseline agent script (OpenAI client)
├── openenv.yaml                  # OpenEnv manifest (tasks + graders)
├── Dockerfile                    # Container definition for HF Spaces
├── requirements.txt
├── pyproject.toml
├── environment/
│   ├── api_triage_env.py         # Core environment (reset/step/state)
│   ├── incident_generator.py     # Procedural incident generation
│   ├── action_space.py           # Valid action definitions
│   └── reward.py                 # 5-factor reward function
├── tasks/
│   ├── grading_helper.py         # Shared grading utilities
│   ├── auth_error/grader.py      # Task 1 grader
│   ├── missing_fields/grader.py  # Task 2 grader
│   ├── rate_limit/grader.py      # Task 3 grader
│   ├── timeout/grader.py         # Task 4 grader
│   ├── wrong_endpoint/grader.py  # Task 5 grader
│   └── server_error/grader.py    # Task 6 grader
└── tests/
    ├── test_env.py               # Environment unit tests
    └── test_graders.py           # Grader validation tests
```
## Action Space
The agent chooses from 8 discrete actions per step:
| Action | Description | When to Use |
|--------|-------------|-------------|
| `inspect_logs` | Read error logs for diagnostic clues | First step in any incident |
| `inspect_request` | Examine the failed HTTP request | Gather additional context |
| `refresh_token` | Regenerate expired API credentials | 401 Unauthorized errors |
| `add_field` | Add missing required payload fields | 400 Bad Request errors |
| `wait_retry` | Back off and retry the request | 429 Rate Limit / 408 Timeout |
| `change_endpoint` | Switch to the correct API path | 404 Not Found errors |
| `escalate` | Escalate to a human operator | 500 Internal Server errors |
| `resolve` | End the episode (must apply fix first) | After successful remediation |
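One way a client or test harness might represent these actions, together with the "When to Use" column, in code. The names `Action`, `DIAGNOSTIC_ACTIONS`, `FIX_FOR_STATUS`, and `parse_action` are hypothetical and are not taken from `environment/action_space.py`; the normalization in `parse_action` is an assumption about how free-form LLM output could be mapped onto the action space.

```python
from enum import Enum


class Action(str, Enum):
    """The 8 discrete actions from the table above (hypothetical class name)."""

    INSPECT_LOGS = "inspect_logs"
    INSPECT_REQUEST = "inspect_request"
    REFRESH_TOKEN = "refresh_token"
    ADD_FIELD = "add_field"
    WAIT_RETRY = "wait_retry"
    CHANGE_ENDPOINT = "change_endpoint"
    ESCALATE = "escalate"
    RESOLVE = "resolve"


# Diagnostic actions gather information but never fix anything.
DIAGNOSTIC_ACTIONS = frozenset({Action.INSPECT_LOGS, Action.INSPECT_REQUEST})

# "When to Use" column: HTTP status code -> the fix action it calls for.
FIX_FOR_STATUS = {
    401: Action.REFRESH_TOKEN,
    400: Action.ADD_FIELD,
    429: Action.WAIT_RETRY,
    408: Action.WAIT_RETRY,
    404: Action.CHANGE_ENDPOINT,
    500: Action.ESCALATE,
}


def parse_action(raw: str) -> Action:
    """Map raw model output such as ' Inspect_Logs ' onto a valid Action."""
    try:
        return Action(raw.strip().lower())
    except ValueError:
        raise ValueError(f"unknown action: {raw!r}") from None
```

Rejecting unknown strings up front keeps invalid model output from ever reaching `/step`.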
## Observation Space
Each step returns a structured observation:
| Field | Type | Description |
|-------|------|-------------|
| `step` | `int` | Current step number (1-indexed) |
| `max_steps` | `int` | Maximum allowed steps (default: 10) |
| `incident_summary` | `str` | Human-readable problem description |
| `logs` | `list[str]` | Simulated error log entries |
| `response_code` | `int` | HTTP status code (401, 400, 429, 408, 404, 500) |
| `fix_applied` | `bool` | Whether the correct fix action has been taken |
| `is_resolved` | `bool` | Whether the episode has terminated |
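On the agent side, the observation can be mirrored in a typed container. The environment's own models use Pydantic `BaseModel`; this sketch swaps in a stdlib `dataclass` so it runs without dependencies. The `VALID_CODES` check and the `to_prompt` helper are hypothetical illustrations and are not part of the environment's API.

```python
from dataclasses import dataclass, field

# The HTTP codes listed for `response_code` in the table above.
VALID_CODES = {401, 400, 429, 408, 404, 500}


@dataclass
class Observation:
    step: int
    incident_summary: str
    response_code: int
    max_steps: int = 10
    logs: list[str] = field(default_factory=list)
    fix_applied: bool = False
    is_resolved: bool = False

    def __post_init__(self) -> None:
        # Reject codes outside the six failure patterns the tasks cover.
        if self.response_code not in VALID_CODES:
            raise ValueError(f"unexpected response_code {self.response_code}")

    def to_prompt(self) -> str:
        """Render the observation as plain text for an LLM agent (hypothetical helper)."""
        lines = [
            f"Step {self.step}/{self.max_steps}",
            f"Incident: {self.incident_summary}",
            f"HTTP status: {self.response_code}",
            f"Fix applied: {self.fix_applied}",
        ]
        lines += [f"LOG: {entry}" for entry in self.logs]
        return "\n".join(lines)
```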
## Tasks (6 tasks, Easy → Hard)
Each task targets a specific API failure pattern. Graders evaluate whether the agent follows the correct diagnostic-then-fix workflow.
| # | Task ID | Difficulty | HTTP Code | Correct Fix | Grader |
|---|---------|-----------|-----------|-------------|--------|
| 1 | `auth_error` | Easy | 401 | `refresh_token` → `resolve` | `tasks.auth_error.grader:grade` |
| 2 | `missing_fields` | Easy | 400 | `add_field` → `resolve` | `tasks.missing_fields.grader:grade` |
| 3 | `rate_limit` | Medium | 429 | `wait_retry` → `resolve` | `tasks.rate_limit.grader:grade` |
| 4 | `timeout` | Medium | 408 | `wait_retry` → `resolve` | `tasks.timeout.grader:grade` |
| 5 | `wrong_endpoint` | Medium | 404 | `change_endpoint` → `resolve` | `tasks.wrong_endpoint.grader:grade` |
| 6 | `server_error` | Hard | 500 | `escalate` → `resolve` | `tasks.server_error.grader:grade` |
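The diagnostic-then-fix workflow can be checked directly on an episode's action list: was the correct fix applied, did a diagnostic action come before it, and was `resolve` called only afterwards? The sketch below scores those three conditions with made-up weights (0.5 / 0.25 / 0.25). `grade_trajectory` and its weights are hypothetical; the real graders in `tasks/*/grader.py` and `grading_helper.py` may weigh these conditions differently or inspect more of the episode.

```python
# Correct fix per task, from the table above.
CORRECT_FIX = {
    "auth_error": "refresh_token",
    "missing_fields": "add_field",
    "rate_limit": "wait_retry",
    "timeout": "wait_retry",
    "wrong_endpoint": "change_endpoint",
    "server_error": "escalate",
}
DIAGNOSTIC_ACTIONS = {"inspect_logs", "inspect_request"}


def grade_trajectory(task_id: str, actions: list[str]) -> float:
    """Score an episode's actions in [0.0, 1.0] (illustrative weights)."""
    fix = CORRECT_FIX[task_id]
    if fix not in actions:
        return 0.0  # never addressed the root cause
    fix_idx = actions.index(fix)  # first time the correct fix was applied
    if "resolve" in actions[:fix_idx]:
        return 0.0  # claimed resolution before fixing anything
    score = 0.5
    if any(a in DIAGNOSTIC_ACTIONS for a in actions[:fix_idx]):
        score += 0.25  # investigated before acting
    if "resolve" in actions[fix_idx + 1:]:
        score += 0.25  # closed the incident after the fix
    return score
```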
## Reward Function
A 5-factor shaped reward system provides partial-credit signals at every step, guiding the agent toward the optimal diagnostic workflow:
| Factor | Reward | Rationale |
|--------|--------|-----------|
| Correct fix action | **+5.0** | Directly addresses the root cause |
| Diagnostic action (`inspect_logs`, `inspect_request`) | **+0.5** | Encourages investigation before action |
| Successful resolution (`resolve` after fix) | **+15.0** | Large bonus for completing the episode correctly |
| Premature resolution (`resolve` without fix) | **-10.0** | Prevents the agent from "lying" about fixing the issue |
| Wrong action | **-2.0** | Mild penalty to discourage random exploration |
| Max steps reached | **-5.0** | Time pressure to act efficiently |
**Reward range:** `[-20.0, +20.5]`
**Optimal 3-step episode:** `inspect_logs` (+0.5) → correct fix (+5.0) → `resolve` (+15.0) = **+20.5**
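The factor table can be read as a per-step reward function. The sketch below is a minimal illustration, not the code in `environment/reward.py`. How the factors stack is an assumption; for example, here the max-steps penalty is added on top of the last step's reward. The sketch also ignores whether repeated diagnostic or fix actions keep earning reward, so it does not attempt to reproduce the exact bounds of the stated reward range.

```python
DIAGNOSTIC_ACTIONS = {"inspect_logs", "inspect_request"}


def step_reward(action: str, correct_fix: str, fix_applied: bool,
                step: int, max_steps: int = 10) -> tuple[float, bool]:
    """Reward for one step, following the factor table above.

    Returns (reward, fix_applied) so the caller can carry the fix flag forward.
    """
    if action == "resolve":
        # +15 only when the fix is already in place; otherwise it is premature.
        reward = 15.0 if fix_applied else -10.0
    elif action == correct_fix:
        reward, fix_applied = 5.0, True
    elif action in DIAGNOSTIC_ACTIONS:
        reward = 0.5
    else:
        reward = -2.0
    # Assumption: the max-steps penalty stacks on the final step's reward.
    if step >= max_steps and action != "resolve":
        reward -= 5.0
    return reward, fix_applied


def episode_return(actions: list[str], correct_fix: str, max_steps: int = 10) -> float:
    """Sum step rewards until `resolve` ends the episode or the budget runs out."""
    total, fixed = 0.0, False
    for step, action in enumerate(actions[:max_steps], start=1):
        reward, fixed = step_reward(action, correct_fix, fixed, step, max_steps)
        total += reward
        if action == "resolve":
            break
    return total


print(episode_return(["inspect_logs", "refresh_token", "resolve"], "refresh_token"))  # 20.5
```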
## API Endpoints
The FastAPI server exposes the following endpoints:
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Root status check |
| `GET` | `/health` | Health check |
| `POST` | `/reset` | Reset environment, get initial observation |
| `POST` | `/step` | Execute an action, receive observation + reward |
| `GET` | `/state` | Get current environment state |
| `GET` | `/tasks` | List all tasks with grader references |
| `POST` | `/grade/{task_id}` | Run a specific task's grader |
| `GET` | `/docs` | Swagger UI documentation |
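A minimal stdlib client for these endpoints. The paths and HTTP methods come from the table; the JSON body fields used in the usage comments (`task_id` for `/reset`, `action` for `/step`) are assumptions, so check the request schemas in the Swagger UI at `/docs` before relying on them. `build_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860"


def build_request(path: str, payload: Optional[dict] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """POST with a JSON body when a payload is given, otherwise GET."""
    if payload is None:
        return urllib.request.Request(base_url + path, method="GET")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Against a running server (body fields are assumptions):
# obs = call(build_request("/reset", {"task_id": "auth_error"}))
# result = call(build_request("/step", {"action": "inspect_logs"}))
```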
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `API_BASE_URL` | LLM API endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier for inference | `Qwen/Qwen2.5-72B-Instruct` |
| `HF_TOKEN` | Hugging Face API key | *(required)* |
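A sketch of how an inference script might read these variables, applying the defaults from the table and failing fast when `HF_TOKEN` is missing. `InferenceConfig` and `load_config` are hypothetical names, not necessarily what `inference.py` uses.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read settings from `env` (defaults to os.environ) using the table's defaults."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required")
    return InferenceConfig(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        hf_token=token,
    )
```

Since the baseline agent uses the OpenAI client, these values would typically be passed as `OpenAI(base_url=cfg.api_base_url, api_key=cfg.hf_token)`. Accepting an explicit mapping also makes the loader easy to test without touching the real environment.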
## Setup & Run
### Local Development
```bash
# Clone the repo
git clone https://huggingface.co/spaces/Kavya988/API_DEBUG_SOLVER
cd API_DEBUG_SOLVER
# Create virtual environment
python -m venv venv
source venv/bin/activate    # Linux/macOS
venv\Scripts\activate       # Windows (use instead of the line above)
# Install dependencies
pip install -r requirements.txt
# Run the server
python app.py
# Server starts at http://localhost:7860
# Run tests
pytest tests/test_env.py -v
python tests/test_graders.py
```
### Docker
```bash
docker build -t api-triage-agent .
docker run -p 7860:7860 -e HF_TOKEN=your_token api-triage-agent
```
### Run Inference
```bash
export HF_TOKEN=your_token
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
python inference.py
```
The inference script runs all 6 tasks sequentially, emitting structured `[START]`/`[STEP]`/`[END]` logs for each task. Scores are computed from actual episode rewards, clamped to `[0.0, 1.0]`.
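One plausible way to turn an episode return into a score in `[0.0, 1.0]` is to divide by the optimal return (+20.5) and clamp. That normalization is an assumption, not confirmed by the source. The `key=value` layout produced by `log_line` is likewise hypothetical; only the `[START]`/`[STEP]`/`[END]` tags come from the description above.

```python
MAX_EPISODE_REWARD = 20.5  # optimal 3-step episode, from the Reward Function section


def episode_score(total_reward: float, max_reward: float = MAX_EPISODE_REWARD) -> float:
    """Map a raw episode return onto [0.0, 1.0] (assumed normalization)."""
    return min(1.0, max(0.0, total_reward / max_reward))


def log_line(tag: str, **fields) -> str:
    """Format one structured log line, e.g. '[STEP] step=1 action=inspect_logs'."""
    body = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"[{tag}] {body}"
```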
## OpenEnv Compliance
This environment implements the full OpenEnv specification:
- **Typed models:** Pydantic `BaseModel` for observations, actions, and state
- **Standard endpoints:** `reset()`, `step()`, `state()` via FastAPI
- **Task discovery:** `GET /tasks` returns all tasks with grader module references
- **Agent graders:** 6 grader functions (one per task), each returning scores in `[0.0, 1.0]`
- **Manifest:** `openenv.yaml` defines environment metadata, action/observation spaces, and task-grader mappings
- **Containerized:** Dockerfile builds and runs on HF Spaces (port 7860)
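The grader references returned by `GET /tasks` (for example `tasks.auth_error.grader:grade`) use the `module:attribute` object-reference form. A minimal sketch of resolving such a reference into a callable with `importlib`; `resolve_grader` is a hypothetical helper, and the environment may load graders differently.

```python
import importlib
from typing import Callable


def resolve_grader(ref: str) -> Callable:
    """Turn a 'package.module:function' reference into the callable it names."""
    module_path, sep, attr = ref.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:function', got {ref!r}")
    module = importlib.import_module(module_path)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{ref!r} does not name a callable")
    return func
```

Inside the repository, `resolve_grader("tasks.auth_error.grader:grade")` would return the Task 1 grader; the same function works for any importable module.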
## Deployment
Live at: **[https://kavya988-api-debug-solver.hf.space](https://kavya988-api-debug-solver.hf.space)**