Spaces:
Sleeping
API Reference
LedgerShield exposes an OpenEnv-compatible HTTP API backed by FastAPI. This page documents the endpoints, action payloads, response envelope, and the key object shapes an agent needs to handle.
Base URL
http://127.0.0.1:8000
Response Envelope
POST /reset and POST /step return a common top-level envelope:
{
"observation": {},
"reward": 0.0,
"done": false,
"truncated": false,
"terminated": false,
"info": {}
}
Semantics
done: the episode has ended for any reasonterminated: a true terminal condition, currently a successfulsubmit_decisiontruncated: the episode ended because of budget exhaustion or max-step exhaustioninfo.reward_model: structured reward breakdown for the last action
Endpoints
GET /
Basic service probe.
Example response:
{
"status": "ok",
"service": "LedgerShield OpenEnv"
}
GET /health
Health check used by local smoke tests, Docker smoke tests, and CI.
Example response:
{
"status": "ok"
}
POST /reset
Start a new episode or load a specific case.
Request body:
{
"seed": 42,
"case_id": "CASE-D-001"
}
Fields:
| Field | Type | Required | Notes |
|---|---|---|---|
seed |
integer | no | used for random case selection |
case_id |
string | no | when provided, loads that specific case |
Example response:
{
"observation": {
"case_id": "CASE-D-001",
"task_type": "task_d",
"instruction": "Act as an AP analyst...",
"visible_documents": [
{
"doc_id": "INV-D-001",
"doc_type": "invoice",
"thumbnail": "thumbnail::INV-D-001",
"page_count": 1,
"language": "en",
"available_views": [
"thumbnail",
"zoom",
"get_doc_crop",
"ocr_fast",
"ocr_accurate"
]
}
],
"revealed_artifacts": [],
"pending_events": [],
"budget_remaining": 16.0,
"budget_total": 16.0,
"step_count": 0,
"max_steps": 18,
"case_clock": 0,
"risk_snapshot": {},
"investigation_status": {},
"last_tool_result": {},
"messages": ["Loaded case CASE-D-001"],
"allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"],
"available_interventions": ["request_callback_verification", "route_to_security"],
"case_metadata": {
"task_label": "AP inbox incident triage",
"due_date_days": 30
},
"portfolio_context": {}
},
"reward": 0.0,
"done": false,
"truncated": false,
"terminated": false,
"info": {
"case_id": "CASE-D-001"
}
}
POST /step
Execute one action.
Request body:
{
"action_type": "ocr",
"payload": {
"doc_id": "INV-D-001",
"mode": "accurate"
}
}
Example response:
{
"observation": {
"case_id": "CASE-D-001",
"step_count": 1,
"budget_remaining": 14.9,
"last_tool_result": {
"tool_name": "ocr",
"success": true,
"doc_id": "INV-D-001",
"mode": "accurate",
"scope": "document",
"text_preview": "Invoice ...",
"cost": 1.1,
"reward_model": {
"value": -0.055,
"terminal": false,
"components": {
"cost_penalty": -0.055,
"info_gain_bonus": 0.0,
"potential_delta": 0.0
},
"metadata": {
"action_type": "ocr",
"success": true
}
}
}
},
"reward": -0.055,
"done": false,
"truncated": false,
"terminated": false,
"info": {
"tool_name": "ocr",
"success": true,
"reward_model": {
"value": -0.055,
"terminal": false
}
}
}
GET /state
Return the current public environment state, not the full hidden system state.
Key fields:
| Field | Meaning |
|---|---|
episode_id |
current episode UUID |
case_id |
current case |
task_type |
task family |
budget_total, budget_remaining |
budget accounting |
step_count, case_clock, max_steps |
episode progress |
trajectory |
public action history |
interventions_taken |
public intervention log |
observed_risk_signals |
only signals the agent has revealed |
pending_events |
delayed artifacts waiting to resolve |
pressure_events_seen |
injected pressure events already observed |
terminal_reason |
why the episode ended if it ended |
GET /leaderboard
Returns leaderboard entries if a leaderboard artifact exists, otherwise derives a minimal payload from the latest benchmark report artifact.
Typical response shape:
{
"benchmark": "ledgershield-v3",
"generated_at": "2026-04-08T12:00:00+00:00",
"entries": [
{
"model": "openai/gpt-4.1-mini",
"type": "deterministic-policy",
"public_mean": 0.9674,
"holdout_mean": 0.6649,
"holdout_pass_k_consistent": 0.619
}
]
}
GET /benchmark-report
Returns the latest benchmark report artifact if present. If none exists yet, the endpoint returns a placeholder note telling you to run benchmark_report.py.
Observation Shape
The observation returned by /reset and /step includes:
| Field | Type | Notes |
|---|---|---|
case_id |
string | current case ID |
task_type |
string | one of task_a..task_e |
instruction |
string | natural-language episode instruction |
visible_documents |
list | document catalog entries only, not raw OCR |
revealed_artifacts |
list | artifacts unlocked by interventions |
pending_events |
list | future artifact events not yet resolved |
budget_remaining |
float | current remaining budget |
budget_total |
float | episode budget |
step_count |
integer | executed step count |
max_steps |
integer | episode cap |
case_clock |
integer | logical clock used by delayed events |
risk_snapshot |
object | summarized public risk signals |
investigation_status |
object | tool/intervention/reveal counts |
last_tool_result |
object | payload from the most recent action |
messages |
list[string] | user-facing environment messages |
allowed_actions |
list[string] | investigation + intervention + final action names |
available_interventions |
list[string] | intervention subset |
case_metadata |
object | task label and due-date info |
portfolio_context |
object | cross-invoice/campaign context when relevant |
Action Taxonomy
Investigation actions
| Action | Required payload |
|---|---|
zoom |
doc_id, optional page, bbox |
get_doc_crop |
doc_id, optional page, bbox |
ocr |
doc_id, optional mode, page, bbox |
lookup_vendor |
vendor_key |
lookup_vendor_history |
vendor_key |
lookup_policy |
optional rule_id |
lookup_po |
po_id |
lookup_receipt |
receipt_id |
search_ledger |
optional vendor_key, invoice_number, amount |
inspect_email_thread |
thread_id |
compare_bank_account |
vendor_key, proposed_bank_account |
Intervention actions
| Action | Typical use |
|---|---|
request_callback_verification |
verify vendor identity or remittance changes |
freeze_vendor_profile |
contain high-risk vendor state |
request_bank_change_approval_chain |
unlock approval-chain artifact |
request_po_reconciliation |
unlock PO reconciliation artifact |
request_additional_receipt_evidence |
unlock receipt reconciliation artifact |
route_to_procurement |
route operationally |
route_to_security |
escalate suspicious incidents |
flag_duplicate_cluster_review |
request duplicate cluster artifact |
create_human_handoff |
create structured handoff packet |
Final decision action
submit_decision carries the structured task output.
Minimal example:
{
"action_type": "submit_decision",
"payload": {
"decision": "ESCALATE_FRAUD",
"confidence": 0.95,
"reason_codes": ["sender_domain_spoof", "bank_override_attempt"],
"policy_checks": {
"bank_change_verification": "fail"
},
"evidence_map": {}
}
}
Reward Model
Every step may include info.reward_model and observation.last_tool_result.reward_model with:
| Field | Meaning |
|---|---|
value |
scalar reward emitted for the step |
terminal |
whether the reward ended the episode |
components |
shaping/cost/outcome breakdown |
metadata |
action type, success flag, terminal reason, and other step context |
The environment currently combines:
- action cost penalties
- PBRS shaping delta
- information-gain bonus
- milestone rewards
- terminal score on
submit_decision
Python API Notes
The HTTP API is the main integration path, but the Python environment class also exposes:
LedgerShieldEnvironment.action_space()LedgerShieldEnvironment.observation_space()LedgerShieldEnvironment.render(mode="text")
These are useful for local experiments and Gymnasium-style tooling, but they are not separate REST endpoints.
Agent Capability Profiles
The reference agent in inference.py uses a ModelCapabilityProfile to adapt behavior to model strength. This is part of the agent-side logic, not the server API, but it affects how different models interact with the environment:
| Tier | Capability score | Plan mode | Repair level | Decision token budget |
|---|---|---|---|---|
| Elite | ≥ 5.0 | coverage | grounded | ≥ 1536 |
| Strong | ≥ 4.5 | hybrid | partial | ≥ 1280 |
| Standard | < 4.5 | LLM-first | none | model default |
The tier determines investigation and intervention budget bonuses, whether repair attempts are made on malformed outputs, and how much planning context the agent maintains.