| # API Reference | |
| LedgerShield exposes an OpenEnv-compatible HTTP API backed by FastAPI. This page documents the endpoints, action payloads, response envelope, and the key object shapes an agent needs to handle. | |
| ## Base URL | |
| ```text | |
| http://127.0.0.1:8000 | |
| ``` | |
| ## Response Envelope | |
| `POST /reset` and `POST /step` return a common top-level envelope: | |
| ```json | |
| { | |
| "observation": {}, | |
| "reward": 0.0, | |
| "done": false, | |
| "truncated": false, | |
| "terminated": false, | |
| "info": {} | |
| } | |
| ``` | |
| ### Semantics | |
| - `done`: the episode has ended, for any reason | |
| - `terminated`: the episode reached a true terminal condition; currently this means a successful `submit_decision` | |
| - `truncated`: the episode was cut off because the budget or the maximum step count was exhausted | |
| - `info.reward_model`: structured reward breakdown for the last action | |
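A client can fold the three flags into a single status string. The sketch below assumes only the semantics listed above; the helper name `episode_status` and its return labels are illustrative and not part of LedgerShield.

```python
from typing import Any, Mapping


def episode_status(envelope: Mapping[str, Any]) -> str:
    """Return 'terminated', 'truncated', 'ended', or 'running' for an envelope."""
    if envelope.get("terminated"):
        return "terminated"  # true terminal condition, e.g. submit_decision
    if envelope.get("truncated"):
        return "truncated"  # budget or max-step exhaustion
    if envelope.get("done"):
        return "ended"  # done is set without a more specific flag
    return "running"
```

Checking `terminated` before `truncated` is a design choice for the sketch; the source does not say whether both flags can be set at once.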
| ## Endpoints | |
| ### `GET /` | |
| Basic service probe. | |
| Example response: | |
| ```json | |
| { | |
| "status": "ok", | |
| "service": "LedgerShield OpenEnv" | |
| } | |
| ``` | |
| ### `GET /health` | |
| Health check used by local smoke tests, Docker smoke tests, and CI. | |
| Example response: | |
| ```json | |
| { | |
| "status": "ok" | |
| } | |
| ``` | |
| ### `POST /reset` | |
| Start a new episode or load a specific case. | |
| Request body: | |
| ```json | |
| { | |
| "seed": 42, | |
| "case_id": "CASE-D-001" | |
| } | |
| ``` | |
| Fields: | |
| | Field | Type | Required | Notes | | |
| |---|---|---|---| | |
| | `seed` | integer | no | seeds random case selection | | |
| | `case_id` | string | no | when provided, loads that specific case | | |
| Example response: | |
| ```json | |
| { | |
| "observation": { | |
| "case_id": "CASE-D-001", | |
| "task_type": "task_d", | |
| "instruction": "Act as an AP analyst...", | |
| "visible_documents": [ | |
| { | |
| "doc_id": "INV-D-001", | |
| "doc_type": "invoice", | |
| "thumbnail": "thumbnail::INV-D-001", | |
| "page_count": 1, | |
| "language": "en", | |
| "available_views": [ | |
| "thumbnail", | |
| "zoom", | |
| "get_doc_crop", | |
| "ocr_fast", | |
| "ocr_accurate" | |
| ] | |
| } | |
| ], | |
| "revealed_artifacts": [], | |
| "pending_events": [], | |
| "budget_remaining": 16.0, | |
| "budget_total": 16.0, | |
| "step_count": 0, | |
| "max_steps": 18, | |
| "case_clock": 0, | |
| "risk_snapshot": {}, | |
| "investigation_status": {}, | |
| "last_tool_result": {}, | |
| "messages": ["Loaded case CASE-D-001"], | |
| "allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"], | |
| "available_interventions": ["request_callback_verification", "route_to_security"], | |
| "case_metadata": { | |
| "task_label": "AP inbox incident triage", | |
| "due_date_days": 30 | |
| }, | |
| "portfolio_context": {} | |
| }, | |
| "reward": 0.0, | |
| "done": false, | |
| "truncated": false, | |
| "terminated": false, | |
| "info": { | |
| "case_id": "CASE-D-001" | |
| } | |
| } | |
| ``` | |
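A minimal client sketch for `/reset`, using only the Python standard library. The helper names `build_reset_body` and `post_json` are hypothetical, and the sketch assumes the server accepts a body that omits either optional field; the base URL is the one given in this page.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:8000"


def build_reset_body(seed: Optional[int] = None, case_id: Optional[str] = None) -> Dict[str, Any]:
    """Build the /reset request body, leaving out fields that were not given."""
    body: Dict[str, Any] = {}
    if seed is not None:
        body["seed"] = seed
    if case_id is not None:
        body["case_id"] = case_id
    return body


def post_json(path: str, body: Dict[str, Any], base_url: str = BASE_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON response envelope."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `post_json("/reset", build_reset_body(seed=42, case_id="CASE-D-001"))` would send the request body shown above.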
| ### `POST /step` | |
| Execute one action. | |
| Request body: | |
| ```json | |
| { | |
| "action_type": "ocr", | |
| "payload": { | |
| "doc_id": "INV-D-001", | |
| "mode": "accurate" | |
| } | |
| } | |
| ``` | |
| Example response: | |
| ```json | |
| { | |
| "observation": { | |
| "case_id": "CASE-D-001", | |
| "step_count": 1, | |
| "budget_remaining": 14.9, | |
| "last_tool_result": { | |
| "tool_name": "ocr", | |
| "success": true, | |
| "doc_id": "INV-D-001", | |
| "mode": "accurate", | |
| "scope": "document", | |
| "text_preview": "Invoice ...", | |
| "cost": 1.1, | |
| "reward_model": { | |
| "value": -0.055, | |
| "terminal": false, | |
| "components": { | |
| "cost_penalty": -0.055, | |
| "info_gain_bonus": 0.0, | |
| "potential_delta": 0.0 | |
| }, | |
| "metadata": { | |
| "action_type": "ocr", | |
| "success": true | |
| } | |
| } | |
| } | |
| }, | |
| "reward": -0.055, | |
| "done": false, | |
| "truncated": false, | |
| "terminated": false, | |
| "info": { | |
| "tool_name": "ocr", | |
| "success": true, | |
| "reward_model": { | |
| "value": -0.055, | |
| "terminal": false | |
| } | |
| } | |
| } | |
| ``` | |
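The reset and step calls combine into a simple episode loop. The sketch below keeps the transport injectable, so `reset` and `step` can be thin wrappers around `POST /reset` and `POST /step` or in-memory fakes. The names `run_episode`, `policy`, and `step_limit` are illustrative; the only assumption about the server is the `observation`, `reward`, and `done` keys of the response envelope.

```python
from typing import Any, Callable, Dict, List

Envelope = Dict[str, Any]
Action = Dict[str, Any]


def run_episode(
    reset: Callable[[], Envelope],
    step: Callable[[Action], Envelope],
    policy: Callable[[Dict[str, Any]], Action],
    step_limit: int = 100,
) -> List[float]:
    """Run one episode and return the per-step rewards."""
    envelope = reset()
    rewards: List[float] = []
    for _ in range(step_limit):  # client-side guard in addition to the server's max_steps
        if envelope["done"]:
            break
        action = policy(envelope["observation"])
        envelope = step(action)
        rewards.append(float(envelope["reward"]))
    return rewards
```

The policy receives only the observation, which mirrors what an agent sees through the HTTP API.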
| ### `GET /state` | |
| Return the current public environment state, not the full hidden system state. | |
| Key fields: | |
| | Field | Meaning | | |
| |---|---| | |
| | `episode_id` | current episode UUID | | |
| | `case_id` | current case | | |
| | `task_type` | task family | | |
| | `budget_total`, `budget_remaining` | budget accounting | | |
| | `step_count`, `case_clock`, `max_steps` | episode progress | | |
| | `trajectory` | public action history | | |
| | `interventions_taken` | public intervention log | | |
| | `observed_risk_signals` | only signals the agent has revealed | | |
| | `pending_events` | delayed artifacts waiting to resolve | | |
| | `pressure_events_seen` | injected pressure events already observed | | |
| | `terminal_reason` | why the episode ended if it ended | | |
| ### `GET /leaderboard` | |
| Returns leaderboard entries if a leaderboard artifact exists; otherwise it derives a minimal payload from the latest benchmark report artifact. | |
| Typical response shape: | |
| ```json | |
| { | |
| "benchmark": "ledgershield-v3", | |
| "generated_at": "2026-04-08T12:00:00+00:00", | |
| "entries": [ | |
| { | |
| "model": "openai/gpt-4.1-mini", | |
| "type": "deterministic-policy", | |
| "public_mean": 0.9674, | |
| "holdout_mean": 0.6649, | |
| "holdout_pass_k_consistent": 0.619 | |
| } | |
| ] | |
| } | |
| ``` | |
| ### `GET /benchmark-report` | |
| Returns the latest benchmark report artifact if present. If none exists yet, the endpoint returns a placeholder note telling you to run `benchmark_report.py`. | |
| ## Observation Shape | |
| The observation returned by `/reset` and `/step` includes: | |
| | Field | Type | Notes | | |
| |---|---|---| | |
| | `case_id` | string | current case ID | | |
| | `task_type` | string | one of `task_a` through `task_e` | | |
| | `instruction` | string | natural-language episode instruction | | |
| | `visible_documents` | list | document catalog entries only, not raw OCR | | |
| | `revealed_artifacts` | list | artifacts unlocked by interventions | | |
| | `pending_events` | list | future artifact events not yet resolved | | |
| | `budget_remaining` | float | current remaining budget | | |
| | `budget_total` | float | episode budget | | |
| | `step_count` | integer | executed step count | | |
| | `max_steps` | integer | episode cap | | |
| | `case_clock` | integer | logical clock used by delayed events | | |
| | `risk_snapshot` | object | summarized public risk signals | | |
| | `investigation_status` | object | tool/intervention/reveal counts | | |
| | `last_tool_result` | object | payload from the most recent action | | |
| | `messages` | list[string] | user-facing environment messages | | |
| | `allowed_actions` | list[string] | investigation + intervention + final action names | | |
| | `available_interventions` | list[string] | intervention subset | | |
| | `case_metadata` | object | task label and due-date info | | |
| | `portfolio_context` | object | cross-invoice/campaign context when relevant | | |
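An agent rarely needs every field on every step. The sketch below pulls a few of the fields from the table above into a typed summary; the `ObservationSummary` class and the `steps_left` property are hypothetical helpers, and the defaults for missing keys are an assumption made so that abbreviated observations still parse.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class ObservationSummary:
    case_id: str
    task_type: str
    budget_remaining: float
    step_count: int
    max_steps: int
    allowed_actions: Tuple[str, ...]

    @property
    def steps_left(self) -> int:
        """Steps remaining before the episode cap, never negative."""
        return max(self.max_steps - self.step_count, 0)


def summarize_observation(obs: Mapping[str, Any]) -> ObservationSummary:
    """Copy selected observation fields into a typed, immutable summary."""
    return ObservationSummary(
        case_id=str(obs["case_id"]),
        task_type=str(obs.get("task_type", "")),
        budget_remaining=float(obs.get("budget_remaining", 0.0)),
        step_count=int(obs.get("step_count", 0)),
        max_steps=int(obs.get("max_steps", 0)),
        allowed_actions=tuple(obs.get("allowed_actions", ())),
    )
```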
| ## Action Taxonomy | |
| ### Investigation actions | |
| | Action | Required fields | Optional fields | | |
| |---|---|---| | |
| | `zoom` | `doc_id` | `page`, `bbox` | | |
| | `get_doc_crop` | `doc_id` | `page`, `bbox` | | |
| | `ocr` | `doc_id` | `mode`, `page`, `bbox` | | |
| | `lookup_vendor` | `vendor_key` | none | | |
| | `lookup_vendor_history` | `vendor_key` | none | | |
| | `lookup_policy` | none | `rule_id` | | |
| | `lookup_po` | `po_id` | none | | |
| | `lookup_receipt` | `receipt_id` | none | | |
| | `search_ledger` | none | `vendor_key`, `invoice_number`, `amount` | | |
| | `inspect_email_thread` | `thread_id` | none | | |
| | `compare_bank_account` | `vendor_key`, `proposed_bank_account` | none | | |
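The table above can be turned into a client-side check that runs before an action is sent. This is a sketch under two assumptions: every field listed after "optional" in a row is optional, and unlisted fields are worth flagging. The server's actual validation rules are not documented here, and `payload_problems` is a hypothetical helper.

```python
from typing import Any, Dict, FrozenSet, List, Mapping, Tuple

# (required fields, optional fields) per investigation action, read off the table above.
INVESTIGATION_FIELDS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "zoom": (frozenset({"doc_id"}), frozenset({"page", "bbox"})),
    "get_doc_crop": (frozenset({"doc_id"}), frozenset({"page", "bbox"})),
    "ocr": (frozenset({"doc_id"}), frozenset({"mode", "page", "bbox"})),
    "lookup_vendor": (frozenset({"vendor_key"}), frozenset()),
    "lookup_vendor_history": (frozenset({"vendor_key"}), frozenset()),
    "lookup_policy": (frozenset(), frozenset({"rule_id"})),
    "lookup_po": (frozenset({"po_id"}), frozenset()),
    "lookup_receipt": (frozenset({"receipt_id"}), frozenset()),
    "search_ledger": (frozenset(), frozenset({"vendor_key", "invoice_number", "amount"})),
    "inspect_email_thread": (frozenset({"thread_id"}), frozenset()),
    "compare_bank_account": (frozenset({"vendor_key", "proposed_bank_account"}), frozenset()),
}


def payload_problems(action_type: str, payload: Mapping[str, Any]) -> List[str]:
    """List missing or unexpected payload fields; an empty list means the payload looks fine."""
    if action_type not in INVESTIGATION_FIELDS:
        return [f"unknown investigation action: {action_type}"]
    required, optional = INVESTIGATION_FIELDS[action_type]
    keys = set(payload)
    problems = [f"missing field: {name}" for name in sorted(required - keys)]
    problems += [f"unexpected field: {name}" for name in sorted(keys - required - optional)]
    return problems
```

For example, `payload_problems("ocr", {"doc_id": "INV-D-001", "mode": "accurate"})` returns an empty list.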
| ### Intervention actions | |
| | Action | Typical use | | |
| |---|---| | |
| | `request_callback_verification` | verify vendor identity or remittance changes | | |
| | `freeze_vendor_profile` | contain high-risk vendor state | | |
| | `request_bank_change_approval_chain` | unlock approval-chain artifact | | |
| | `request_po_reconciliation` | unlock PO reconciliation artifact | | |
| | `request_additional_receipt_evidence` | unlock receipt reconciliation artifact | | |
| | `route_to_procurement` | route operationally | | |
| | `route_to_security` | escalate suspicious incidents | | |
| | `flag_duplicate_cluster_review` | request duplicate cluster artifact | | |
| | `create_human_handoff` | create structured handoff packet | | |
| ### Final decision action | |
| `submit_decision` carries the structured task output. | |
| Minimal example: | |
| ```json | |
| { | |
| "action_type": "submit_decision", | |
| "payload": { | |
| "decision": "ESCALATE_FRAUD", | |
| "confidence": 0.95, | |
| "reason_codes": ["sender_domain_spoof", "bank_override_attempt"], | |
| "policy_checks": { | |
| "bank_change_verification": "fail" | |
| }, | |
| "evidence_map": {} | |
| } | |
| } | |
| ``` | |
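A small builder keeps the final action well-formed. The sketch below reproduces the field names from the minimal example; the `[0.0, 1.0]` range check on `confidence` is an assumption rather than a documented server rule, and `build_submit_decision` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def build_submit_decision(
    decision: str,
    confidence: float,
    reason_codes: Sequence[str] = (),
    policy_checks: Optional[Mapping[str, str]] = None,
    evidence_map: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a submit_decision action in the shape of the minimal example."""
    if not 0.0 <= confidence <= 1.0:
        # Assumed range; the source only shows confidence as a float such as 0.95.
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        "action_type": "submit_decision",
        "payload": {
            "decision": decision,
            "confidence": float(confidence),
            "reason_codes": list(reason_codes),
            "policy_checks": dict(policy_checks or {}),
            "evidence_map": dict(evidence_map or {}),
        },
    }
```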
| ## Reward Model | |
| Each step response may carry a `reward_model` object in two places, `info.reward_model` and `observation.last_tool_result.reward_model`, with these fields: | |
| | Field | Meaning | | |
| |---|---| | |
| | `value` | scalar reward emitted for the step | | |
| | `terminal` | whether this reward was emitted on the step that ended the episode | | |
| | `components` | shaping/cost/outcome breakdown | | |
| | `metadata` | action type, success flag, terminal reason, and other step context | | |
| The environment currently combines: | |
| - action cost penalties | |
| - potential-based reward shaping (PBRS) delta | |
| - information-gain bonus | |
| - milestone rewards | |
| - terminal score on `submit_decision` | |
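In the `/step` example on this page, the `components` values add up to `value` (`-0.055 + 0.0 + 0.0`). The sketch below checks that relationship for a given `reward_model`. Treating `value` as the plain sum of `components` is an assumption drawn from that single example, not a documented guarantee, and both helper names are hypothetical.

```python
import math
from typing import Any, Mapping


def components_total(reward_model: Mapping[str, Any]) -> float:
    """Sum the numeric entries of reward_model['components'] (0.0 if absent)."""
    return float(sum(reward_model.get("components", {}).values()))


def value_matches_components(reward_model: Mapping[str, Any], tol: float = 1e-9) -> bool:
    """Check whether `value` equals the component sum within a small tolerance."""
    return math.isclose(float(reward_model["value"]), components_total(reward_model), abs_tol=tol)
```

A mismatch may simply mean the environment applies scaling or clipping that this page does not describe.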
| ## Python API Notes | |
| The HTTP API is the main integration path, but the Python environment class also exposes: | |
| - `LedgerShieldEnvironment.action_space()` | |
| - `LedgerShieldEnvironment.observation_space()` | |
| - `LedgerShieldEnvironment.render(mode="text")` | |
| These are useful for local experiments and Gymnasium-style tooling, but they are not separate REST endpoints. | |
| ## Agent Capability Profiles | |
| The reference agent in `inference.py` uses a `ModelCapabilityProfile` to adapt behavior to model strength. This is part of the agent-side logic, not the server API, but it affects how different models interact with the environment: | |
| | Tier | Capability score | Plan mode | Repair level | Decision token budget | | |
| |---|---|---|---|---| | |
| | Elite | ≥ 5.0 | coverage | grounded | ≥ 1536 | | |
| | Strong | ≥ 4.5 and < 5.0 | hybrid | partial | ≥ 1280 | | |
| | Standard | < 4.5 | LLM-first | none | model default | | |
| The tier determines investigation and intervention budget bonuses, whether repair attempts are made on malformed outputs, and how much planning context the agent maintains. | |
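The thresholds in the table can be read as a simple cascade. The sketch below is an illustration of that reading only: it is not the code in `inference.py`, and `tier_for_score` and `TIER_SETTINGS` are hypothetical names. It assumes a score of 5.0 or more is Elite and a score of at least 4.5 but below 5.0 is Strong.

```python
from typing import Dict

# Plan mode and repair level per tier, copied from the table above.
TIER_SETTINGS: Dict[str, Dict[str, str]] = {
    "elite": {"plan_mode": "coverage", "repair_level": "grounded"},
    "strong": {"plan_mode": "hybrid", "repair_level": "partial"},
    "standard": {"plan_mode": "LLM-first", "repair_level": "none"},
}


def tier_for_score(capability_score: float) -> str:
    """Map a capability score to a tier name using the table's thresholds."""
    if capability_score >= 5.0:
        return "elite"
    if capability_score >= 4.5:
        return "strong"
    return "standard"
```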