title: API Triage Agent
emoji: 🔧
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

API Triage Agent

An OpenEnv-compliant reinforcement learning environment that trains AI agents to diagnose and resolve real-world API integration failures. The agent inspects logs, identifies error patterns, and applies corrective actions, mirroring the workflow of an on-call support engineer.

Why This Matters

API failures can be costly in downtime and lost revenue. Authentication expiry, malformed payloads, rate limiting, endpoint misconfiguration, and server errors are among the failure categories that SRE teams deal with most often. This environment lets an LLM agent learn the diagnostic playbook through trial and error, receiving shaped rewards that encourage methodical investigation before action.

Project Structure

api-triage-agent/
├── app.py                          # FastAPI server (main entry point)
├── inference.py                    # Baseline agent script (OpenAI client)
├── openenv.yaml                    # OpenEnv manifest (tasks + graders)
├── Dockerfile                      # Container definition for HF Spaces
├── requirements.txt
├── pyproject.toml
├── environment/
│   ├── api_triage_env.py           # Core environment (reset/step/state)
│   ├── incident_generator.py       # Procedural incident generation
│   ├── action_space.py             # Valid action definitions
│   └── reward.py                   # 5-factor reward function
├── tasks/
│   ├── grading_helper.py           # Shared grading utilities
│   ├── auth_error/grader.py        # Task 1 grader
│   ├── missing_fields/grader.py    # Task 2 grader
│   ├── rate_limit/grader.py        # Task 3 grader
│   ├── timeout/grader.py           # Task 4 grader
│   ├── wrong_endpoint/grader.py    # Task 5 grader
│   └── server_error/grader.py      # Task 6 grader
└── tests/
    ├── test_env.py                 # Environment unit tests
    └── test_graders.py             # Grader validation tests

Action Space

The agent chooses from 8 discrete actions per step:

| Action | Description | When to Use |
|---|---|---|
| inspect_logs | Read error logs for diagnostic clues | First step in any incident |
| inspect_request | Examine the failed HTTP request | Gather additional context |
| refresh_token | Regenerate expired API credentials | 401 Unauthorized errors |
| add_field | Add missing required payload fields | 400 Bad Request errors |
| wait_retry | Back off and retry the request | 429 Rate Limit / 408 Timeout |
| change_endpoint | Switch to the correct API path | 404 Not Found errors |
| escalate | Escalate to a human operator | 500 Internal Server errors |
| resolve | End the episode (must apply fix first) | After successful remediation |
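
As a rough illustration, the eight actions above could be modeled as a string-valued enum. This is a minimal sketch, not the contents of environment/action_space.py: the names `Action`, `DIAGNOSTIC_ACTIONS`, `is_diagnostic`, and `parse_action` are assumptions introduced here.

```python
from enum import Enum


class Action(str, Enum):
    """Hypothetical enum mirroring the 8 actions in the table above."""

    INSPECT_LOGS = "inspect_logs"
    INSPECT_REQUEST = "inspect_request"
    REFRESH_TOKEN = "refresh_token"
    ADD_FIELD = "add_field"
    WAIT_RETRY = "wait_retry"
    CHANGE_ENDPOINT = "change_endpoint"
    ESCALATE = "escalate"
    RESOLVE = "resolve"


# The two actions that only gather information.
DIAGNOSTIC_ACTIONS = frozenset({Action.INSPECT_LOGS, Action.INSPECT_REQUEST})


def is_diagnostic(action: Action) -> bool:
    """Return True for the two inspection actions."""
    return action in DIAGNOSTIC_ACTIONS


def parse_action(name: str) -> Action:
    """Convert a raw string (for example, from an LLM reply) into an Action.

    Raises ValueError for anything outside the 8 valid names.
    """
    return Action(name.strip().lower())
```

Parsing through an enum means an invalid action name fails loudly with a ValueError instead of reaching the environment.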

Observation Space

Each step returns a structured observation:

| Field | Type | Description |
|---|---|---|
| step | int | Current step number (1-indexed) |
| max_steps | int | Maximum allowed steps (default: 10) |
| incident_summary | str | Human-readable problem description |
| logs | list[str] | Simulated error log entries |
| response_code | int | HTTP status code (401, 400, 429, 408, 404, 500) |
| fix_applied | bool | Whether the correct fix action has been taken |
| is_resolved | bool | Whether the episode has terminated |
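
The fields above map naturally onto a typed record. The sketch below uses a standard-library dataclass to stay dependency-free; the project itself states that it uses Pydantic BaseModel, so treat the class name `Observation` and the example values as assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Observation:
    """Hypothetical stand-in for the environment's observation model."""

    step: int
    incident_summary: str
    response_code: int
    logs: List[str] = field(default_factory=list)
    max_steps: int = 10  # default taken from the table above
    fix_applied: bool = False
    is_resolved: bool = False


# Example values are invented for illustration, not real environment output.
obs = Observation(
    step=1,
    incident_summary="Requests to the upstream API are being rejected.",
    response_code=401,
    logs=["401 Unauthorized: token expired"],
)
payload = asdict(obs)  # plain dict, ready for JSON serialization
```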

Tasks (6 tasks, Easy → Hard)

Each task targets a specific API failure pattern. Graders evaluate whether the agent follows the correct diagnostic-then-fix workflow.

| # | Task ID | Difficulty | HTTP Code | Correct Fix | Grader |
|---|---|---|---|---|---|
| 1 | auth_error | Easy | 401 | refresh_token → resolve | tasks.auth_error.grader:grade |
| 2 | missing_fields | Easy | 400 | add_field → resolve | tasks.missing_fields.grader:grade |
| 3 | rate_limit | Medium | 429 | wait_retry → resolve | tasks.rate_limit.grader:grade |
| 4 | timeout | Medium | 408 | wait_retry → resolve | tasks.timeout.grader:grade |
| 5 | wrong_endpoint | Medium | 404 | change_endpoint → resolve | tasks.wrong_endpoint.grader:grade |
| 6 | server_error | Hard | 500 | escalate → resolve | tasks.server_error.grader:grade |
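
To make the diagnostic-then-fix idea concrete, here is a hedged sketch of how a grader might score an action trace. The task-to-fix mapping comes from the table above; the function signature and the 0.2 / 0.4 / 0.4 weights are assumptions and are not taken from the real graders under tasks/.

```python
from typing import List

# Task ID -> fix action, copied from the table above.
CORRECT_FIX = {
    "auth_error": "refresh_token",
    "missing_fields": "add_field",
    "rate_limit": "wait_retry",
    "timeout": "wait_retry",
    "wrong_endpoint": "change_endpoint",
    "server_error": "escalate",
}


def grade(task_id: str, actions: List[str]) -> float:
    """Hypothetical grader: score an action trace in [0.0, 1.0].

    The weights below are invented for illustration.
    """
    fix = CORRECT_FIX[task_id]
    score = 0.0
    if fix in actions:
        fix_index = actions.index(fix)
        # Credit for inspecting before attempting the fix.
        if any(a.startswith("inspect_") for a in actions[:fix_index]):
            score += 0.2
        # Credit for applying the correct fix at all.
        score += 0.4
        # Credit for resolving only after the fix was applied.
        if "resolve" in actions and actions.index("resolve") > fix_index:
            score += 0.4
    return min(max(score, 0.0), 1.0)
```

Under these assumed weights, a trace that resolves without ever applying the fix scores 0.0, which matches the intent that premature resolution earns no credit.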

Reward Function

A 5-factor shaped reward system provides partial-credit signals at every step, guiding the agent toward the optimal diagnostic workflow:

| Factor | Reward | Rationale |
|---|---|---|
| Correct fix action | +5.0 | Directly addresses the root cause |
| Diagnostic action (inspect_logs, inspect_request) | +0.5 | Encourages investigation before action |
| Successful resolution (resolve after fix) | +15.0 | Large bonus for completing the episode correctly |
| Premature resolution (resolve without fix) | -10.0 | Prevents the agent from "lying" about fixing the issue |
| Wrong action | -2.0 | Mild penalty to discourage random exploration |
| Max steps reached | -5.0 | Time pressure to act efficiently |

Reward range: [-20.0, +20.5]
Optimal 3-step episode: inspect_logs (+0.5) → correct fix (+5.0) → resolve (+15.0) = +20.5
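
The arithmetic of the optimal episode can be checked with a small sketch of the reward rules. The per-action values come from the table above; everything else is an assumption for illustration, including the function names, the rule that resolve always ends the episode, and the point at which the max-steps penalty is applied. The real logic lives in environment/reward.py and may differ.

```python
from typing import Iterable

DIAGNOSTIC = {"inspect_logs", "inspect_request"}


def step_reward(action: str, correct_fix: str, fix_applied: bool) -> float:
    """Reward for one action, using the values from the table above."""
    if action in DIAGNOSTIC:
        return 0.5
    if action == "resolve":
        return 15.0 if fix_applied else -10.0
    if action == correct_fix:
        return 5.0
    return -2.0  # any other action is treated as a wrong action


def episode_return(actions: Iterable[str], correct_fix: str, max_steps: int = 10) -> float:
    """Sum step rewards over one episode (termination rules are assumptions)."""
    total, fix_applied = 0.0, False
    for step, action in enumerate(actions, start=1):
        total += step_reward(action, correct_fix, fix_applied)
        if action == correct_fix:
            fix_applied = True
        if action == "resolve":
            return total  # assumption: resolve always ends the episode
        if step >= max_steps:
            return total - 5.0  # assumption: penalty applied on the final step
    return total


optimal = episode_return(["inspect_logs", "refresh_token", "resolve"], "refresh_token")
```

Here `optimal` evaluates to 0.5 + 5.0 + 15.0 = 20.5, matching the optimal 3-step episode stated above.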

API Endpoints

The FastAPI server exposes the following endpoints:

| Method | Path | Description |
|---|---|---|
| GET | / | Root status check |
| GET | /health | Health check |
| POST | /reset | Reset environment, get initial observation |
| POST | /step | Execute an action, receive observation + reward |
| GET | /state | Get current environment state |
| GET | /tasks | List all tasks with grader references |
| POST | /grade/{task_id} | Run a specific task's grader |
| GET | /docs | Swagger UI documentation |
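
A client might talk to these endpoints along the following lines, using only the standard library. The paths and methods are from the table above, but the request body shapes are assumptions: this README does not document them, so the empty body for /reset and the "action" key for /step are guesses, and the Swagger UI at /docs is the place to confirm the real schema. The helper names `build_request` and `call` are also introduced here.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860"  # local server address from "Setup & Run"


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the environment server."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def call(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumption: /reset accepts an empty JSON object.
reset_req = build_request("POST", "/reset", {})
# Assumption: /step accepts a JSON body with an "action" key.
step_req = build_request("POST", "/step", {"action": "inspect_logs"})
state_req = build_request("GET", "/state")

# With the server running, an episode would start with: call(reset_req)
```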

Environment Variables

| Variable | Description | Default |
|---|---|---|
| API_BASE_URL | LLM API endpoint | https://router.huggingface.co/v1 |
| MODEL_NAME | Model identifier for inference | Qwen/Qwen2.5-72B-Instruct |
| HF_TOKEN | Hugging Face API key | (required) |
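
One way a script could read these variables is sketched below, with the defaults from the table and a hard failure when HF_TOKEN is missing. The names `InferenceConfig` and `load_config` are assumptions, and inference.py may load its settings differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the three settings in the table above."""

    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the variables, applying the documented defaults."""
    token = env.get("HF_TOKEN")
    if not token:
        # HF_TOKEN has no default, so fail early with a clear message.
        raise RuntimeError("HF_TOKEN is required but not set")
    return InferenceConfig(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        hf_token=token,
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dict.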

Setup & Run

Local Development

# Clone the repo
git clone https://huggingface.co/spaces/Kavya988/API_DEBUG_SOLVER
cd API_DEBUG_SOLVER

# Create virtual environment
python -m venv venv
source venv/bin/activate      # Linux/Mac
# venv\Scripts\activate       # Windows (run this instead of the line above)

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py
# Server starts at http://localhost:7860

# Run tests
pytest tests/test_env.py -v
python tests/test_graders.py

Docker

docker build -t api-triage-agent .
docker run -p 7860:7860 -e HF_TOKEN=your_token api-triage-agent

Run Inference

export HF_TOKEN=your_token
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct

python inference.py

The inference script runs all 6 tasks sequentially, emitting structured [START]/[STEP]/[END] logs for each task. Scores are computed from actual episode rewards, clamped to [0.0, 1.0].
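
The exact mapping from episode reward to score is not spelled out here. One plausible reading is a min-max scaling over the stated reward range followed by a clamp, sketched below. The formula and the function name `normalize_score` are assumptions, so check inference.py for the actual computation.

```python
# Reward range as stated in the "Reward Function" section.
REWARD_MIN, REWARD_MAX = -20.0, 20.5


def normalize_score(episode_reward: float) -> float:
    """Min-max scale an episode reward, then clamp to [0.0, 1.0]."""
    scaled = (episode_reward - REWARD_MIN) / (REWARD_MAX - REWARD_MIN)
    return min(max(scaled, 0.0), 1.0)
```

Under this assumed formula, the optimal episode reward of +20.5 maps to 1.0, and any reward at or below -20.0 maps to 0.0.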

OpenEnv Compliance

This environment implements the full OpenEnv specification:

  • Typed models: Pydantic BaseModel for observations, actions, and state
  • Standard endpoints: reset(), step(), state() via FastAPI
  • Task discovery: GET /tasks returns all tasks with grader module references
  • Agent graders: 6 grader functions (one per task), each returning scores in [0.0, 1.0]
  • Manifest: openenv.yaml defines environment metadata, action/observation spaces, and task-grader mappings
  • Containerized: Dockerfile builds and runs on HF Spaces (port 7860)

Deployment

Live at: https://kavya988-api-debug-solver.hf.space