# API Reference
LedgerShield exposes an OpenEnv-compatible HTTP API backed by FastAPI. This page documents the endpoints, action payloads, response envelope, and the key object shapes an agent needs to handle.
## Base URL
```text
http://127.0.0.1:8000
```
## Response Envelope
`POST /reset` and `POST /step` return a common top-level envelope:
```json
{
  "observation": {},
  "reward": 0.0,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {}
}
```
### Semantics
- `done`: the episode has ended for any reason
- `terminated`: a true terminal condition, currently a successful `submit_decision`
- `truncated`: the episode ended because of budget exhaustion or max-step exhaustion
- `info.reward_model`: structured reward breakdown for the last action
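A client loop usually needs to tell these end states apart. The sketch below classifies a parsed envelope (a plain `dict`) using only the three flags described above. The function name `episode_outcome` and the `"done_unspecified"` fallback are hypothetical and not part of LedgerShield.

```python
from typing import Any, Mapping


def episode_outcome(envelope: Mapping[str, Any]) -> str:
    """Classify a /reset or /step envelope as running, terminated, or truncated."""
    if not envelope.get("done", False):
        return "running"
    if envelope.get("terminated", False):
        # True terminal condition, currently a successful submit_decision.
        return "terminated"
    if envelope.get("truncated", False):
        # Budget exhaustion or max-step exhaustion.
        return "truncated"
    # done=True with neither flag set is not described here; surface it explicitly.
    return "done_unspecified"
```

Checking `done` first keeps the loop condition simple (`while outcome == "running"`), while the finer-grained flags are only consulted once the episode has ended.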
## Endpoints
### `GET /`
Basic service probe.
Example response:
```json
{
  "status": "ok",
  "service": "LedgerShield OpenEnv"
}
```
### `GET /health`
Health check used by local smoke tests, Docker smoke tests, and CI.
Example response:
```json
{
  "status": "ok"
}
```
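Smoke tests typically poll this endpoint until the server is up. The following is a minimal standard-library sketch, assuming the base URL above and the `{"status": "ok"}` body shown. `fetch_json` and `wait_until_healthy` are hypothetical helpers, and the `fetch` parameter exists only so the polling logic can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"


def fetch_json(url: str, timeout: float = 2.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    base_url: str = BASE_URL,
    attempts: int = 10,
    delay_s: float = 0.5,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Poll GET /health until it reports status "ok" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(f"{base_url}/health").get("status") == "ok":
                return True
        except OSError:
            # Connection refused or timed out (URLError subclasses OSError); retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```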
### `POST /reset`
Start a new episode or load a specific case.
Request body:
```json
{
  "seed": 42,
  "case_id": "CASE-D-001"
}
```
Fields:
| Field | Type | Required | Notes |
|---|---|---|---|
| `seed` | integer | no | used for random case selection |
| `case_id` | string | no | when provided, loads that specific case |
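Because both fields are optional, a client can send only the ones it has. The sketch below builds the `POST /reset` request with the standard library and omits unset fields, assuming the server accepts `{}` when neither is provided. `build_reset_request` and `send` are hypothetical names, not part of LedgerShield.

```python
import json
import urllib.request
from typing import Any, Optional


def build_reset_request(
    base_url: str, seed: Optional[int] = None, case_id: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST /reset request, leaving out fields the caller did not set."""
    body: dict[str, Any] = {}
    if seed is not None:
        body["seed"] = seed
    if case_id is not None:
        body["case_id"] = case_id
    return urllib.request.Request(
        f"{base_url}/reset",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON response envelope."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage against a running server would be `send(build_reset_request("http://127.0.0.1:8000", case_id="CASE-D-001"))`, which returns the envelope shown below.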
Example response:
```json
{
  "observation": {
    "case_id": "CASE-D-001",
    "task_type": "task_d",
    "instruction": "Act as an AP analyst...",
    "visible_documents": [
      {
        "doc_id": "INV-D-001",
        "doc_type": "invoice",
        "thumbnail": "thumbnail::INV-D-001",
        "page_count": 1,
        "language": "en",
        "available_views": [
          "thumbnail",
          "zoom",
          "get_doc_crop",
          "ocr_fast",
          "ocr_accurate"
        ]
      }
    ],
    "revealed_artifacts": [],
    "pending_events": [],
    "budget_remaining": 16.0,
    "budget_total": 16.0,
    "step_count": 0,
    "max_steps": 18,
    "case_clock": 0,
    "risk_snapshot": {},
    "investigation_status": {},
    "last_tool_result": {},
    "messages": ["Loaded case CASE-D-001"],
    "allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"],
    "available_interventions": ["request_callback_verification", "route_to_security"],
    "case_metadata": {
      "task_label": "AP inbox incident triage",
      "due_date_days": 30
    },
    "portfolio_context": {}
  },
  "reward": 0.0,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {
    "case_id": "CASE-D-001"
  }
}
```
### `POST /step`
Execute one action.
Request body:
```json
{
  "action_type": "ocr",
  "payload": {
    "doc_id": "INV-D-001",
    "mode": "accurate"
  }
}
```
Example response:
```json
{
  "observation": {
    "case_id": "CASE-D-001",
    "step_count": 1,
    "budget_remaining": 14.9,
    "last_tool_result": {
      "tool_name": "ocr",
      "success": true,
      "doc_id": "INV-D-001",
      "mode": "accurate",
      "scope": "document",
      "text_preview": "Invoice ...",
      "cost": 1.1,
      "reward_model": {
        "value": -0.055,
        "terminal": false,
        "components": {
          "cost_penalty": -0.055,
          "info_gain_bonus": 0.0,
          "potential_delta": 0.0
        },
        "metadata": {
          "action_type": "ocr",
          "success": true
        }
      }
    }
  },
  "reward": -0.055,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {
    "tool_name": "ocr",
    "success": true,
    "reward_model": {
      "value": -0.055,
      "terminal": false
    }
  }
}
```
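The fields an agent usually logs after each step are spread across the envelope and `observation.last_tool_result`. The sketch below pulls them into one record; `StepSummary` and `summarize_step` are hypothetical names, and defaulting `cost` to `0.0` when absent is an assumption.

```python
from typing import Any, Mapping, NamedTuple


class StepSummary(NamedTuple):
    tool_name: str
    success: bool
    reward: float
    cost: float
    budget_remaining: float


def summarize_step(envelope: Mapping[str, Any]) -> StepSummary:
    """Collect loggable fields from a POST /step envelope."""
    obs = envelope.get("observation", {})
    result = obs.get("last_tool_result", {})
    return StepSummary(
        tool_name=result.get("tool_name", ""),
        success=bool(result.get("success", False)),
        reward=float(envelope.get("reward", 0.0)),
        # Assumption: a missing cost means the action was free.
        cost=float(result.get("cost", 0.0)),
        budget_remaining=float(obs.get("budget_remaining", float("nan"))),
    )
```

In the example above, the budget falls from 16.0 to 14.9, matching the reported `cost` of 1.1, and `cost_penalty` is -0.055, which is -0.05 × 1.1. That ratio is observed in this one example only; this page does not document it as a fixed rate.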
### `GET /state`
Returns the current public environment state, not the full hidden system state.
Key fields:
| Field | Meaning |
|---|---|
| `episode_id` | current episode UUID |
| `case_id` | current case |
| `task_type` | task family |
| `budget_total`, `budget_remaining` | budget accounting |
| `step_count`, `case_clock`, `max_steps` | episode progress |
| `trajectory` | public action history |
| `interventions_taken` | public intervention log |
| `observed_risk_signals` | only signals the agent has revealed |
| `pending_events` | delayed artifacts waiting to resolve |
| `pressure_events_seen` | injected pressure events already observed |
| `terminal_reason` | why the episode ended if it ended |
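An agent can use the budget and step fields to decide how much investigation it can still afford. This sketch derives remaining headroom from a parsed `/state` payload. `episode_progress` is a hypothetical helper, and treating an empty or missing `terminal_reason` as "still running" is an assumption about how the field looks before the episode ends.

```python
from typing import Any, Mapping


def episode_progress(state: Mapping[str, Any]) -> dict:
    """Derive remaining budget and steps from GET /state fields."""
    total = float(state["budget_total"])
    remaining = float(state["budget_remaining"])
    return {
        "budget_used": total - remaining,
        "budget_fraction_left": remaining / total if total > 0 else 0.0,
        "steps_left": int(state["max_steps"]) - int(state["step_count"]),
        # Assumption: terminal_reason is None or "" while the episode is running.
        "ended": bool(state.get("terminal_reason")),
    }
```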
### `GET /leaderboard`
Returns leaderboard entries if a leaderboard artifact exists; otherwise, it derives a minimal payload from the latest benchmark report artifact.
Typical response shape:
```json
{
  "benchmark": "ledgershield-v3",
  "generated_at": "2026-04-08T12:00:00+00:00",
  "entries": [
    {
      "model": "openai/gpt-4.1-mini",
      "type": "deterministic-policy",
      "public_mean": 0.9674,
      "holdout_mean": 0.6649,
      "holdout_pass_k_consistent": 0.619
    }
  ]
}
```
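Comparing `public_mean` with `holdout_mean` is one way to spot entries that do well on public cases but generalize poorly. This sketch ranks entries by holdout score and reports the gap; `rank_by_holdout` is a hypothetical helper, and reading the gap as an overfitting signal is an interpretation, not something this endpoint reports. For the entry above, the gap is 0.9674 − 0.6649 = 0.3025.

```python
from typing import Any, Mapping


def rank_by_holdout(report: Mapping[str, Any]) -> list[tuple[str, float, float]]:
    """Return (model, holdout_mean, public_minus_holdout) sorted by holdout score."""
    rows = []
    for entry in report.get("entries", []):
        public = float(entry["public_mean"])
        holdout = float(entry["holdout_mean"])
        rows.append((entry["model"], holdout, public - holdout))
    # Highest holdout score first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```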
### `GET /benchmark-report`
Returns the latest benchmark report artifact if present. If none exists yet, the endpoint returns a placeholder note telling you to run `benchmark_report.py`.
## Observation Shape
The observation returned by `/reset` and `/step` includes:
| Field | Type | Notes |
|---|---|---|
| `case_id` | string | current case ID |
| `task_type` | string | one of `task_a` through `task_e` |
| `instruction` | string | natural-language episode instruction |
| `visible_documents` | list | document catalog entries only, not raw OCR |
| `revealed_artifacts` | list | artifacts unlocked by interventions |
| `pending_events` | list | future artifact events not yet resolved |
| `budget_remaining` | float | current remaining budget |
| `budget_total` | float | episode budget |
| `step_count` | integer | executed step count |
| `max_steps` | integer | episode cap |
| `case_clock` | integer | logical clock used by delayed events |
| `risk_snapshot` | object | summarized public risk signals |
| `investigation_status` | object | tool/intervention/reveal counts |
| `last_tool_result` | object | payload from the most recent action |
| `messages` | list[string] | user-facing environment messages |
| `allowed_actions` | list[string] | investigation + intervention + final action names |
| `available_interventions` | list[string] | intervention subset |
| `case_metadata` | object | task label and due-date info |
| `portfolio_context` | object | cross-invoice/campaign context when relevant |
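Two checks an agent often makes against an observation are which documents support a given view and whether an action name is currently allowed. The helpers below (`docs_with_view`, `is_allowed`, both hypothetical) read only the `visible_documents` and `allowed_actions` fields described in the table.

```python
from typing import Any, Mapping


def docs_with_view(observation: Mapping[str, Any], view: str) -> list[str]:
    """Return doc_ids whose catalog entry lists the given view."""
    return [
        doc["doc_id"]
        for doc in observation.get("visible_documents", [])
        if view in doc.get("available_views", [])
    ]


def is_allowed(observation: Mapping[str, Any], action_type: str) -> bool:
    """Check an action name against the observation's allowed_actions list."""
    return action_type in observation.get("allowed_actions", [])
```

With the `/reset` example above, `docs_with_view(obs, "ocr_accurate")` yields `["INV-D-001"]`, and `is_allowed(obs, "ocr")` is `True`.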
## Action Taxonomy
### Investigation actions
| Action | Payload fields |
|---|---|
| `zoom` | `doc_id`; optional `page`, `bbox` |
| `get_doc_crop` | `doc_id`; optional `page`, `bbox` |
| `ocr` | `doc_id`; optional `mode`, `page`, `bbox` |
| `lookup_vendor` | `vendor_key` |
| `lookup_vendor_history` | `vendor_key` |
| `lookup_policy` | optional `rule_id` |
| `lookup_po` | `po_id` |
| `lookup_receipt` | `receipt_id` |
| `search_ledger` | optional `vendor_key`, `invoice_number`, `amount` |
| `inspect_email_thread` | `thread_id` |
| `compare_bank_account` | `vendor_key`, `proposed_bank_account` |
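Checking required keys on the client before calling `/step` avoids spending a step on an obviously malformed action. The mapping below is transcribed from the investigation-actions table; `REQUIRED_FIELDS` and `make_investigation_action` are hypothetical names, and the server may validate differently (for example, it may also check value types).

```python
from typing import Any, Mapping

# Required payload keys per investigation action, transcribed from the table above.
# Optional keys (page, bbox, mode, rule_id, search filters) are not checked here.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "zoom": ("doc_id",),
    "get_doc_crop": ("doc_id",),
    "ocr": ("doc_id",),
    "lookup_vendor": ("vendor_key",),
    "lookup_vendor_history": ("vendor_key",),
    "lookup_policy": (),
    "lookup_po": ("po_id",),
    "lookup_receipt": ("receipt_id",),
    "search_ledger": (),
    "inspect_email_thread": ("thread_id",),
    "compare_bank_account": ("vendor_key", "proposed_bank_account"),
}


def make_investigation_action(action_type: str, payload: Mapping[str, Any]) -> dict:
    """Build a POST /step body, failing fast on missing required keys."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"not an investigation action: {action_type}")
    missing = [key for key in REQUIRED_FIELDS[action_type] if key not in payload]
    if missing:
        raise ValueError(f"{action_type} is missing required field(s): {missing}")
    return {"action_type": action_type, "payload": dict(payload)}
```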
### Intervention actions
| Action | Typical use |
|---|---|
| `request_callback_verification` | verify vendor identity or remittance changes |
| `freeze_vendor_profile` | contain high-risk vendor state |
| `request_bank_change_approval_chain` | unlock approval-chain artifact |
| `request_po_reconciliation` | unlock PO reconciliation artifact |
| `request_additional_receipt_evidence` | unlock receipt reconciliation artifact |
| `route_to_procurement` | route operationally |
| `route_to_security` | escalate suspicious incidents |
| `flag_duplicate_cluster_review` | request duplicate cluster artifact |
| `create_human_handoff` | create structured handoff packet |
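Interventions are only useful when the current case offers them through `available_interventions`. This sketch picks the first intervention from an agent-defined preference order that the case actually offers. `first_available_intervention` is hypothetical, and the empty payload is an assumption because intervention payload fields are not documented on this page.

```python
from typing import Any, Mapping, Optional, Sequence


def first_available_intervention(
    observation: Mapping[str, Any], preferred: Sequence[str]
) -> Optional[dict]:
    """Return a /step body for the first preferred intervention the case offers."""
    offered = set(observation.get("available_interventions", []))
    for name in preferred:
        if name in offered:
            # Assumption: an empty payload is accepted; payload fields are undocumented here.
            return {"action_type": name, "payload": {}}
    return None
```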
### Final decision action
`submit_decision` carries the structured task output.
Minimal example:
```json
{
  "action_type": "submit_decision",
  "payload": {
    "decision": "ESCALATE_FRAUD",
    "confidence": 0.95,
    "reason_codes": ["sender_domain_spoof", "bank_override_attempt"],
    "policy_checks": {
      "bank_change_verification": "fail"
    },
    "evidence_map": {}
  }
}
```
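A small builder keeps the final action's shape consistent with the minimal example. The sketch below assumes `confidence` is a probability in [0, 1], as the example's `0.95` suggests; this page does not state the valid range. `make_decision` is a hypothetical name, and the set of accepted `decision` values is not listed here, so the function does not validate it.

```python
from typing import Any, Mapping, Optional, Sequence


def make_decision(
    decision: str,
    confidence: float,
    reason_codes: Sequence[str],
    policy_checks: Optional[Mapping[str, str]] = None,
    evidence_map: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Assemble a submit_decision body with the fields from the minimal example."""
    if not 0.0 <= confidence <= 1.0:
        # Assumption: confidence is a probability in [0, 1].
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    return {
        "action_type": "submit_decision",
        "payload": {
            "decision": decision,
            "confidence": confidence,
            "reason_codes": list(reason_codes),
            "policy_checks": dict(policy_checks or {}),
            "evidence_map": dict(evidence_map or {}),
        },
    }
```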
## Reward Model
Every step may include `info.reward_model` and `observation.last_tool_result.reward_model` with:
| Field | Meaning |
|---|---|
| `value` | scalar reward emitted for the step |
| `terminal` | whether this is the terminal reward, i.e. the step ended the episode |
| `components` | shaping/cost/outcome breakdown |
| `metadata` | action type, success flag, terminal reason, and other step context |
The environment currently combines:
- action cost penalties
- potential-based reward shaping (PBRS) delta
- information-gain bonus
- milestone rewards
- terminal score on `submit_decision`
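PBRS is conventionally written as F(s, s') = γΦ(s') − Φ(s) for a potential function Φ over states and discount γ. The potential and discount that LedgerShield uses are not given on this page, so both are placeholders below. The additive combination in `combined_step_reward` (a hypothetical helper) is also an assumption: it matches the `/step` example, where the components sum to `value`, but the actual weighting is not documented.

```python
def pbrs_delta(phi_prev: float, phi_next: float, gamma: float = 1.0) -> float:
    """Potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * phi_next - phi_prev


def combined_step_reward(
    cost_penalty: float,
    phi_prev: float,
    phi_next: float,
    info_gain_bonus: float = 0.0,
    milestone_reward: float = 0.0,
    gamma: float = 1.0,
) -> float:
    """Hypothetical additive mix of the non-terminal components listed above."""
    # Assumption: components are summed with unit weights.
    shaping = pbrs_delta(phi_prev, phi_next, gamma)
    return cost_penalty + info_gain_bonus + milestone_reward + shaping
```

With the `/step` example's components (`cost_penalty` -0.055, no information gain, zero potential change), this returns -0.055, the reported `value`.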
## Python API Notes
The HTTP API is the main integration path, but the Python environment class also exposes:
- `LedgerShieldEnvironment.action_space()`
- `LedgerShieldEnvironment.observation_space()`
- `LedgerShieldEnvironment.render(mode="text")`
These are useful for local experiments and Gymnasium-style tooling, but they are not separate REST endpoints.
## Agent Capability Profiles
The reference agent in `inference.py` uses a `ModelCapabilityProfile` to adapt behavior to model strength. This is part of the agent-side logic, not the server API, but it affects how different models interact with the environment:
| Tier | Capability score | Plan mode | Repair level | Decision token budget |
|---|---|---|---|---|
| Elite | ≥ 5.0 | coverage | grounded | ≥ 1536 |
| Strong | ≥ 4.5 | hybrid | partial | ≥ 1280 |
| Standard | < 4.5 | LLM-first | none | model default |
The tier determines investigation and intervention budget bonuses, whether repair attempts are made on malformed outputs, and how much planning context the agent maintains.
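The tier thresholds in the table can be expressed as a simple lookup. This sketch mirrors the table only; it is not the `ModelCapabilityProfile` code in `inference.py`, and how the capability score itself is computed is not described on this page. `TierSettings`, `TIERS`, and `capability_tier` are hypothetical names.

```python
from typing import NamedTuple, Optional


class TierSettings(NamedTuple):
    plan_mode: str
    repair_level: str
    min_decision_tokens: Optional[int]  # None means "use the model default"


# Values copied from the capability-profile table.
TIERS = {
    "Elite": TierSettings("coverage", "grounded", 1536),
    "Strong": TierSettings("hybrid", "partial", 1280),
    "Standard": TierSettings("LLM-first", "none", None),
}


def capability_tier(score: float) -> str:
    """Map a capability score to a tier name using the table's thresholds."""
    if score >= 5.0:
        return "Elite"
    if score >= 4.5:
        return "Strong"
    return "Standard"
```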