	# API Reference

	LedgerShield exposes an OpenEnv-compatible HTTP API backed by FastAPI. This page documents the endpoints, action payloads, response envelope, and the key object shapes an agent needs to handle.

	## Base URL

	```text
	http://127.0.0.1:8000
	```

	## Response Envelope

	`POST /reset` and `POST /step` return a common top-level envelope:

	```json
	{
	  "observation": {},
	  "reward": 0.0,
	  "done": false,
	  "truncated": false,
	  "terminated": false,
	  "info": {}
	}
	```

	### Semantics

	- `done`: the episode has ended for any reason
	- `terminated`: a true terminal condition, currently a successful `submit_decision`
	- `truncated`: the episode ended because of budget exhaustion or max-step exhaustion
	- `info.reward_model`: structured reward breakdown for the last action
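
The flag semantics above can be turned into a small client-side classifier. This is a minimal sketch, not part of LedgerShield itself: the helper name `episode_status` and the `done_unspecified` label are hypothetical, and the fallback branch assumes that `done` is normally accompanied by either `terminated` or `truncated`.

```python
from typing import Any, Mapping


def episode_status(envelope: Mapping[str, Any]) -> str:
    """Classify a /reset or /step envelope by its end-of-episode flags."""
    if envelope.get("terminated"):
        return "terminated"
    if envelope.get("truncated"):
        return "truncated"
    # Assumption: `done` without either flag should not normally occur,
    # so it is reported separately instead of guessing at the cause.
    if envelope.get("done"):
        return "done_unspecified"
    return "running"


# The empty envelope shown in the Response Envelope section.
fresh_envelope = {
    "observation": {},
    "reward": 0.0,
    "done": False,
    "truncated": False,
    "terminated": False,
    "info": {},
}
print(episode_status(fresh_envelope))  # running
```

An agent loop would typically keep calling `/step` while the status is `running`.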

	## Endpoints

	### `GET /`

	Basic service probe.

	Example response:

	```json
	{
	  "status": "ok",
	  "service": "LedgerShield OpenEnv"
	}
	```

	### `GET /health`

	Health check used by local smoke tests, Docker smoke tests, and CI.

	Example response:

	```json
	{
	  "status": "ok"
	}
	```
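
A smoke test can poll this endpoint until the service reports `ok`. The sketch below uses only the Python standard library. The function names, retry count, and delay are illustrative assumptions, and the base URL is the local default from the Base URL section.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def fetch_health(base_url: str = "http://127.0.0.1:8000",
                 timeout: float = 2.0) -> Dict[str, Any]:
    """Call GET /health and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(probe: Callable[[], Dict[str, Any]] = fetch_health,
                       attempts: int = 5,
                       delay: float = 0.5) -> bool:
    """Return True once the probe reports status "ok", False after all attempts fail."""
    for attempt in range(attempts):
        try:
            if probe().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection errors (URLError is an OSError) and invalid JSON
            # are treated as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing a custom `probe` keeps the retry logic testable without a running server, for example `wait_until_healthy(probe=lambda: {"status": "ok"})`.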

	### `POST /reset`

	Start a new episode or load a specific case.

	Request body:

	```json
	{
	  "seed": 42,
	  "case_id": "CASE-D-001"
	}
	```

	Fields:

	| Field | Type | Required | Notes |
	|---|---|---|---|
	| `seed` | integer | no | used for random case selection |
	| `case_id` | string | no | when provided, loads that specific case |
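
Because both fields are optional, a client can build the body from whichever values it has. The helper below is a hypothetical sketch. Leaving out unset fields, rather than sending `null`, is an assumption about what the server accepts.

```python
from typing import Any, Dict, Optional


def build_reset_body(seed: Optional[int] = None,
                     case_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a /reset request body, including only the fields that were given."""
    body: Dict[str, Any] = {}
    if seed is not None:
        body["seed"] = seed
    if case_id is not None:
        body["case_id"] = case_id
    return body


print(build_reset_body(seed=42, case_id="CASE-D-001"))
# {'seed': 42, 'case_id': 'CASE-D-001'}
```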

	Example response:

	```json
	{
	  "observation": {
	    "case_id": "CASE-D-001",
	    "task_type": "task_d",
	    "instruction": "Act as an AP analyst...",
	    "visible_documents": [
	      {
	        "doc_id": "INV-D-001",
	        "doc_type": "invoice",
	        "thumbnail": "thumbnail::INV-D-001",
	        "page_count": 1,
	        "language": "en",
	        "available_views": [
	          "thumbnail",
	          "zoom",
	          "get_doc_crop",
	          "ocr_fast",
	          "ocr_accurate"
	        ]
	      }
	    ],
	    "revealed_artifacts": [],
	    "pending_events": [],
	    "budget_remaining": 16.0,
	    "budget_total": 16.0,
	    "step_count": 0,
	    "max_steps": 18,
	    "case_clock": 0,
	    "risk_snapshot": {},
	    "investigation_status": {},
	    "last_tool_result": {},
	    "messages": ["Loaded case CASE-D-001"],
	    "allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"],
	    "available_interventions": ["request_callback_verification", "route_to_security"],
	    "case_metadata": {
	      "task_label": "AP inbox incident triage",
	      "due_date_days": 30
	    },
	    "portfolio_context": {}
	  },
	  "reward": 0.0,
	  "done": false,
	  "truncated": false,
	  "terminated": false,
	  "info": {
	    "case_id": "CASE-D-001"
	  }
	}
	```

	### `POST /step`

	Execute one action.

	Request body:

	```json
	{
	  "action_type": "ocr",
	  "payload": {
	    "doc_id": "INV-D-001",
	    "mode": "accurate"
	  }
	}
	```

	Example response:

	```json
	{
	  "observation": {
	    "case_id": "CASE-D-001",
	    "step_count": 1,
	    "budget_remaining": 14.9,
	    "last_tool_result": {
	      "tool_name": "ocr",
	      "success": true,
	      "doc_id": "INV-D-001",
	      "mode": "accurate",
	      "scope": "document",
	      "text_preview": "Invoice ...",
	      "cost": 1.1,
	      "reward_model": {
	        "value": -0.055,
	        "terminal": false,
	        "components": {
	          "cost_penalty": -0.055,
	          "info_gain_bonus": 0.0,
	          "potential_delta": 0.0
	        },
	        "metadata": {
	          "action_type": "ocr",
	          "success": true
	        }
	      }
	    }
	  },
	  "reward": -0.055,
	  "done": false,
	  "truncated": false,
	  "terminated": false,
	  "info": {
	    "tool_name": "ocr",
	    "success": true,
	    "reward_model": {
	      "value": -0.055,
	      "terminal": false
	    }
	  }
	}
	```
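
The `/step` request shown above can be sent from Python with the standard library alone. This is a minimal sketch: `make_post` and `send` are hypothetical helper names, the base URL is the local default from this page, and sending `Content-Type: application/json` is an assumption about what the FastAPI server expects.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8000"  # local default from the Base URL section


def make_post(path: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send a prepared request and decode the JSON response envelope."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The OCR action from the request body example.
step_request = make_post(
    "/step",
    {"action_type": "ocr", "payload": {"doc_id": "INV-D-001", "mode": "accurate"}},
)
# With a server running after a /reset call: envelope = send(step_request)
```

Separating request construction from sending lets the request be inspected or logged before any network call is made.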

	### `GET /state`

	Return the current public environment state, not the full hidden system state.

	Key fields:

	| Field | Meaning |
	|---|---|
	| `episode_id` | current episode UUID |
	| `case_id` | current case |
	| `task_type` | task family |
	| `budget_total`, `budget_remaining` | budget accounting |
	| `step_count`, `case_clock`, `max_steps` | episode progress |
	| `trajectory` | public action history |
	| `interventions_taken` | public intervention log |
	| `observed_risk_signals` | only signals the agent has revealed |
	| `pending_events` | delayed artifacts waiting to resolve |
	| `pressure_events_seen` | injected pressure events already observed |
	| `terminal_reason` | why the episode ended if it ended |

	### `GET /leaderboard`

	Returns leaderboard entries if a leaderboard artifact exists. Otherwise, it derives a minimal payload from the latest benchmark report artifact.

	Typical response shape:

	```json
	{
	  "benchmark": "ledgershield-v3",
	  "generated_at": "2026-04-08T12:00:00+00:00",
	  "entries": [
	    {
	      "model": "openai/gpt-4.1-mini",
	      "type": "deterministic-policy",
	      "public_mean": 0.9674,
	      "holdout_mean": 0.6649,
	      "holdout_pass_k_consistent": 0.619
	    }
	  ]
	}
	```

	### `GET /benchmark-report`

	Returns the latest benchmark report artifact if present. If none exists yet, the endpoint returns a placeholder note telling you to run `benchmark_report.py`.

	## Observation Shape

	The observation returned by `/reset` and `/step` includes:

	| Field | Type | Notes |
	|---|---|---|
	| `case_id` | string | current case ID |
	| `task_type` | string | one of `task_a`..`task_e` |
	| `instruction` | string | natural-language episode instruction |
	| `visible_documents` | list | document catalog entries only, not raw OCR |
	| `revealed_artifacts` | list | artifacts unlocked by interventions |
	| `pending_events` | list | future artifact events not yet resolved |
	| `budget_remaining` | float | current remaining budget |
	| `budget_total` | float | episode budget |
	| `step_count` | integer | executed step count |
	| `max_steps` | integer | episode cap |
	| `case_clock` | integer | logical clock used by delayed events |
	| `risk_snapshot` | object | summarized public risk signals |
	| `investigation_status` | object | tool/intervention/reveal counts |
	| `last_tool_result` | object | payload from the most recent action |
	| `messages` | list[string] | user-facing environment messages |
	| `allowed_actions` | list[string] | investigation + intervention + final action names |
	| `available_interventions` | list[string] | intervention subset |
	| `case_metadata` | object | task label and due-date info |
	| `portfolio_context` | object | cross-invoice/campaign context when relevant |
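
A client may want to confirm that an observation carries every field in the table before acting on it. The check below is a hypothetical sketch: the field tuple is copied from the table, and the helper name is illustrative. The `/step` example on this page shows only a few observation fields and appears to be abbreviated, so the check would report the rest as missing for that sample.

```python
from typing import Any, List, Mapping

# Field names copied from the Observation Shape table.
OBSERVATION_FIELDS = (
    "case_id", "task_type", "instruction", "visible_documents",
    "revealed_artifacts", "pending_events", "budget_remaining",
    "budget_total", "step_count", "max_steps", "case_clock",
    "risk_snapshot", "investigation_status", "last_tool_result",
    "messages", "allowed_actions", "available_interventions",
    "case_metadata", "portfolio_context",
)


def missing_observation_fields(observation: Mapping[str, Any]) -> List[str]:
    """Return the documented field names that are absent, in table order."""
    return [name for name in OBSERVATION_FIELDS if name not in observation]


partial = {"case_id": "CASE-D-001", "step_count": 1, "budget_remaining": 14.9}
print(missing_observation_fields(partial)[:3])
# ['task_type', 'instruction', 'visible_documents']
```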

	## Action Taxonomy

	### Investigation actions

	| Action | Payload fields |
	|---|---|
	| `zoom` | `doc_id`; optional `page`, `bbox` |
	| `get_doc_crop` | `doc_id`; optional `page`, `bbox` |
	| `ocr` | `doc_id`; optional `mode`, `page`, `bbox` |
	| `lookup_vendor` | `vendor_key` |
	| `lookup_vendor_history` | `vendor_key` |
	| `lookup_policy` | optional `rule_id` |
	| `lookup_po` | `po_id` |
	| `lookup_receipt` | `receipt_id` |
	| `search_ledger` | optional `vendor_key`, `invoice_number`, `amount` |
	| `inspect_email_thread` | `thread_id` |
	| `compare_bank_account` | `vendor_key`, `proposed_bank_account` |
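
The table can be encoded as a lookup for client-side validation before an action is sent. This sketch is not the server's own validation. It assumes that every field listed after "optional" in a row is optional, and the helper name `missing_payload_keys` is hypothetical.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Required payload keys per investigation action, read from the table.
# Assumption: every field listed after "optional" in a row is optional.
REQUIRED_PAYLOAD_KEYS: Dict[str, Tuple[str, ...]] = {
    "zoom": ("doc_id",),
    "get_doc_crop": ("doc_id",),
    "ocr": ("doc_id",),
    "lookup_vendor": ("vendor_key",),
    "lookup_vendor_history": ("vendor_key",),
    "lookup_policy": (),
    "lookup_po": ("po_id",),
    "lookup_receipt": ("receipt_id",),
    "search_ledger": (),
    "inspect_email_thread": ("thread_id",),
    "compare_bank_account": ("vendor_key", "proposed_bank_account"),
}


def missing_payload_keys(action_type: str, payload: Mapping[str, Any]) -> List[str]:
    """List required keys absent from the payload; raise on unknown actions."""
    if action_type not in REQUIRED_PAYLOAD_KEYS:
        raise ValueError(f"unknown investigation action: {action_type}")
    return [key for key in REQUIRED_PAYLOAD_KEYS[action_type] if key not in payload]


print(missing_payload_keys("compare_bank_account", {"vendor_key": "V-1"}))
# ['proposed_bank_account']
```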

	### Intervention actions

	| Action | Typical use |
	|---|---|
	| `request_callback_verification` | verify vendor identity or remittance changes |
	| `freeze_vendor_profile` | contain high-risk vendor state |
	| `request_bank_change_approval_chain` | unlock approval-chain artifact |
	| `request_po_reconciliation` | unlock PO reconciliation artifact |
	| `request_additional_receipt_evidence` | unlock receipt reconciliation artifact |
	| `route_to_procurement` | route operationally |
	| `route_to_security` | escalate suspicious incidents |
	| `flag_duplicate_cluster_review` | request duplicate cluster artifact |
	| `create_human_handoff` | create structured handoff packet |

	### Final decision action

	`submit_decision` carries the structured task output.

	Minimal example:

	```json
	{
	  "action_type": "submit_decision",
	  "payload": {
	    "decision": "ESCALATE_FRAUD",
	    "confidence": 0.95,
	    "reason_codes": ["sender_domain_spoof", "bank_override_attempt"],
	    "policy_checks": {
	      "bank_change_verification": "fail"
	    },
	    "evidence_map": {}
	  }
	}
	```
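
A small builder keeps the final action consistent with the minimal example. This is a hypothetical helper, not part of the LedgerShield codebase. The `[0, 1]` range check on `confidence` is an assumption based on the sample value, and the set of valid `decision` strings is not documented on this page, so it is not validated.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def build_submit_decision(decision: str,
                          confidence: float,
                          reason_codes: Sequence[str],
                          policy_checks: Optional[Mapping[str, str]] = None,
                          evidence_map: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Assemble a submit_decision action in the shape of the minimal example."""
    # Assumption: confidence is a probability-like value in [0, 1].
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {
        "action_type": "submit_decision",
        "payload": {
            "decision": decision,
            "confidence": confidence,
            "reason_codes": list(reason_codes),
            "policy_checks": dict(policy_checks or {}),
            "evidence_map": dict(evidence_map or {}),
        },
    }


action = build_submit_decision(
    "ESCALATE_FRAUD",
    0.95,
    ["sender_domain_spoof", "bank_override_attempt"],
    policy_checks={"bank_change_verification": "fail"},
)
```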

	## Reward Model

	Every step may include `info.reward_model` and `observation.last_tool_result.reward_model` with:

	| Field | Meaning |
	|---|---|
	| `value` | scalar reward emitted for the step |
	| `terminal` | whether the reward ended the episode |
	| `components` | shaping/cost/outcome breakdown |
	| `metadata` | action type, success flag, terminal reason, and other step context |

	The environment currently combines:

	- action cost penalties
	- potential-based reward shaping (PBRS) delta
	- information-gain bonus
	- milestone rewards
	- terminal score on `submit_decision`
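
In the `/step` example on this page, the three listed components add up to `value` (`-0.055 + 0.0 + 0.0`). The sketch below checks that relationship for a given `reward_model`. That `value` always equals the sum of `components` is an assumption drawn from that single example, not a documented guarantee, and the helper name is hypothetical.

```python
import math
from typing import Any, Mapping, Optional


def components_match_value(reward_model: Mapping[str, Any],
                           tolerance: float = 1e-9) -> Optional[bool]:
    """Compare the sum of components to value; None when no breakdown is present."""
    components = reward_model.get("components")
    if not components:
        # info.reward_model in the /step example carries no components.
        return None
    total = sum(components.values())
    return math.isclose(total, reward_model["value"], abs_tol=tolerance)


# reward_model taken from last_tool_result in the /step example.
sample = {
    "value": -0.055,
    "terminal": False,
    "components": {
        "cost_penalty": -0.055,
        "info_gain_bonus": 0.0,
        "potential_delta": 0.0,
    },
}
print(components_match_value(sample))  # True
```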

	## Python API Notes

	The HTTP API is the main integration path, but the Python environment class also exposes:

	- `LedgerShieldEnvironment.action_space()`
	- `LedgerShieldEnvironment.observation_space()`
	- `LedgerShieldEnvironment.render(mode="text")`

	These are useful for local experiments and Gymnasium-style tooling, but they are not separate REST endpoints.

	## Agent Capability Profiles

	The reference agent in `inference.py` uses a `ModelCapabilityProfile` to adapt behavior to model strength. This is part of the agent-side logic, not the server API, but it affects how different models interact with the environment:

	| Tier | Capability score | Plan mode | Repair level | Decision token budget |
	|---|---|---|---|---|
	| Elite | ≥ 5.0 | coverage | grounded | ≥ 1536 |
	| Strong | ≥ 4.5 | hybrid | partial | ≥ 1280 |
	| Standard | < 4.5 | LLM-first | none | model default |

	The tier determines investigation and intervention budget bonuses, whether repair attempts are made on malformed outputs, and how much planning context the agent maintains.
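
The score thresholds in the table map to tiers as sketched below. This illustrates the thresholds only and is not the actual `ModelCapabilityProfile` logic in `inference.py`. The function name `capability_tier` is hypothetical.

```python
def capability_tier(score: float) -> str:
    """Map a capability score to the tier names used in the table."""
    if score >= 5.0:
        return "Elite"
    if score >= 4.5:
        return "Strong"
    return "Standard"


for score in (5.2, 4.5, 4.49):
    print(score, capability_tier(score))
# 5.2 Elite
# 4.5 Strong
# 4.49 Standard
```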