--- title: API Contract Debugger emoji: 🔍 colorFrom: blue colorTo: indigo sdk: docker app_port: 7860 pinned: false tags: - openenv - rl-environment - api-debugging - contract-testing --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference # API Contract Debugger — OpenEnv Environment An OpenEnv environment where AI agents debug broken OpenAPI-style contract specifications by proposing targeted field-level corrections. ## What Is This? Every backend engineer debugs API contract violations constantly — mismatched types, missing required fields, wrong HTTP status codes, forbidden extra fields leaking into responses. This environment turns that real-world task into a structured RL benchmark. The agent receives a broken API spec and a list of violations. Each step, it proposes one fix. It gets rewarded for each violation resolved and penalised for introducing new ones. --- ## Action Space ```json { "kind": "add_field | remove_field | change_type | change_status | no_op", "endpoint_index": 0, "location": "request_body | response_body | status_code", "field_name": "field_name_or_null", "new_value": "" } ``` | `kind` | `new_value` type | Description | |-----------------|-----------------|-------------| | `add_field` | `{"type": "...", "required": true, "description": "..."}` | Add a missing field | | `remove_field` | `null` | Remove a forbidden field | | `change_type` | `"integer"` / `"string"` / `"boolean"` / `"number"` | Fix a field's type | | `change_status` | `204` / `200` / `201` etc. | Fix the HTTP status code | | `no_op` | `null` | Do nothing (small implicit cost) | --- ## Observation Space | Field | Type | Description | |-------|------|-------------| | `task_name` | str | Active task: `easy`, `medium`, `hard` | | `task_description` | str | Plain-English description of violations | | `endpoints` | list | Current (partially fixed) endpoint specs | | `violations` | list | Remaining violations with type + description | | `violations_fixed_this_step` | int | How many the last action resolved | | `violations_introduced_this_step` | int | How many the last action introduced | | `total_violations_at_start` | int | Violation count at episode start | | `step_count` | int | Steps taken so far | | `max_steps` | int | Episode step budget | | `last_action_error` | str\|null | Validation error if action was malformed | | `reward` | float | Per-step reward | | `done` | bool | Whether the episode has terminated | --- ## Tasks ### Easy (1 endpoint, 1 violation, max 5 steps) A user registration endpoint is missing `created_at` (string) in its response. Expected score for a capable agent: **1.0** ### Medium (3 endpoints, 3 violations, max 10 steps) An e-commerce API has: 1. `GET /products/{id}` — `product_id` returned as `string` instead of `integer` 2. `POST /orders` — `quantity` accepted as `string` instead of `integer` 3. `DELETE /orders/{id}` — returns status `200` instead of `204` Expected score for a capable agent: **1.0** ### Hard (4 endpoints, 6 violations, max 15 steps) An auth + profile API has: 1. `POST /auth/login` — missing `refresh_token` in response 2. `POST /auth/login` — `expires_in` is `string` instead of `integer` 3. `GET /users/{id}/profile` — missing `created_at` in response 4. `GET /users/{id}/profile` — exposes forbidden `password_hash` field (must be removed) 5. `PATCH /users/{id}/profile` — returns status `500` instead of `200` 6. `PATCH /users/{id}/profile` — missing `updated_at` in response Expected score for a capable agent: **0.7–1.0** (frontier models) --- ## Reward Function | Event | Reward | |-------|--------| | Fix a violation | `+0.2 × severity` | | Introduce a violation | `−0.15 × severity` | | Malformed action | `−0.05` | | Solve all violations | `+0.5` bonus | Severity weights: `missing_field=1.0`, `wrong_type=0.9`, `wrong_status=0.8`, `extra_field=0.7` Final episode score is computed by `grade_episode()` → float in `[0.0, 1.0]`. --- ## API Endpoints | Method | Path | Description | |--------|------|-------------| | `POST` | `/reset` | Reset environment. Body: `{"task_name": "easy\|medium\|hard"}` | | `POST` | `/step` | Apply one action. Body: `{"action": {...}}` | | `GET` | `/state` | Full internal state | | `GET` | `/score` | Final episode score | | `GET` | `/tasks` | List all available tasks | | `GET` | `/health`| Health check | | `GET` | `/schema`| JSON schemas for action + observation | --- ## Setup & Usage ### Installation ```bash # Clone the repository git clone cd api-contract-debugger # Create virtual environment python3 -m venv .venv source .venv/bin/activate # Install dependencies pip install --upgrade pip pip install -r requirements.txt ``` ### Run locally ```bash # Start the server uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload ``` The server will be available at `http://localhost:7860` ### Run with Docker ```bash docker build -t api-contract-debugger . docker run -p 7860:7860 api-contract-debugger ``` ### Run tests ```bash # Run entire test suite (56 tests) pytest tests/ -v # Run with coverage pytest tests/ -v --cov=server ``` ### Run the baseline agent The baseline agent uses an LLM (via OpenAI client) to propose fixes. **Required environment variables** (must be set): ```bash export HF_TOKEN="your_huggingface_api_token" # Get from huggingface.co/settings/tokens export ENV_BASE_URL="http://localhost:7860" # Environment server URL export TASK_NAME="all" # "easy", "medium", "hard", or "all" ``` **Optional environment variables** (have defaults): ```bash export API_BASE_URL="https://router.huggingface.co/v1" # LLM endpoint export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct" # Model identifier export LOCAL_IMAGE_NAME="optional_docker_image" # For docker image initialization ``` Then run the agent: ```bash python inference.py ``` **Example output:** ``` [START] task=easy env=api_contract_debugger model=Qwen/Qwen2.5-72B-Instruct [STEP] step=1 action={"kind":"add_field",...} reward=0.70 done=true error=null [END] success=true steps=1 score=1.000 rewards=0.70 ``` ### Test individual endpoints ```bash # Health check curl http://localhost:7860/health # List available tasks curl http://localhost:7860/tasks # Reset to a task curl -X POST http://localhost:7860/reset \ -H "Content-Type: application/json" \ -d '{"task_name":"easy"}' # Apply an action curl -X POST http://localhost:7860/step \ -H "Content-Type: application/json" \ -d '{ "action": { "kind": "add_field", "endpoint_index": 0, "location": "response_body", "field_name": "created_at", "new_value": {"type": "string", "description": "ISO-8601 timestamp"} } }' # Get final score curl http://localhost:7860/score ``` --- ## Baseline Scores | Task | Model | Score | Steps Used | |------|-------|-------|-----------| | easy | Qwen2.5-72B-Instruct | 1.000 | 1 | | medium | Qwen2.5-72B-Instruct | 1.000 | 3 | | hard | Qwen2.5-72B-Instruct | ~0.85 | 12 | --- ## Project Structure ``` api-contract-debugger/ ├── server/ │ ├── __init__.py │ ├── app.py # FastAPI app, route registration │ ├── environment.py # OpenEnv Environment subclass │ ├── models.py # Pydantic Action / Observation / State │ ├── graders.py # Violation detection + reward shaping │ └── fixtures.py # Task definitions (broken + golden specs) ├── tests/ │ └── test_env.py # 56 unit tests covering all components ├── inference.py # Baseline LLM-powered agent ├── openenv.yaml # OpenEnv metadata ├── pyproject.toml # Package configuration ├── requirements.txt # Python dependencies ├── Dockerfile # Container image configuration └── RL_ARCHITECTURE.md # Complete RL framework documentation ``` --- ## Documentation ### RL_ARCHITECTURE.md Comprehensive guide to the reinforcement learning implementation: - **Agent** — How external AI systems interact with the environment via HTTP API - **Environment** — Core `APIContractDebuggerEnv` class and episode lifecycle - **State** — Observation space and full internal state representation - **Action** — All 5 action types with validation rules and examples - **Reward & Scoring** — Dense per-step rewards and episode grading formula - **Complete example episode transcript** with JSON payloads - **Python agent pseudocode** for custom implementations ---