API Reference

LedgerShield exposes an OpenEnv-compatible HTTP API backed by FastAPI. This page documents the endpoints, action payloads, response envelope, and the key object shapes an agent needs to handle.

Base URL

http://127.0.0.1:8000

Response Envelope

POST /reset and POST /step return a common top-level envelope:

{
  "observation": {},
  "reward": 0.0,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {}
}

Semantics

- `done`: the episode has ended for any reason
- `terminated`: a true terminal condition, currently a successful `submit_decision`
- `truncated`: the episode ended because of budget exhaustion or max-step exhaustion
- `info.reward_model`: structured reward breakdown for the last action
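
The flags above can be folded into a single status check. The following is a minimal sketch, assuming the envelope has already been parsed into a dict; the function name `episode_status` and the `done_unspecified` fallback label are illustrative and not part of LedgerShield.

```python
from typing import Any, Mapping


def episode_status(envelope: Mapping[str, Any]) -> str:
    """Classify a /reset or /step envelope by its done/terminated/truncated flags."""
    if not envelope.get("done", False):
        return "running"
    if envelope.get("terminated", False):
        return "terminated"  # true terminal condition, e.g. a successful submit_decision
    if envelope.get("truncated", False):
        return "truncated"  # budget exhaustion or max-step exhaustion
    # done=True with neither flag set is not described in this reference
    return "done_unspecified"


fresh = {"observation": {}, "reward": 0.0, "done": False,
         "truncated": False, "terminated": False, "info": {}}
print(episode_status(fresh))  # running
```

Checking `done` first keeps the agent loop simple: it only needs to look at `terminated` and `truncated` once the episode has actually stopped.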

Endpoints

`GET /`

Basic service probe.

Example response:

{
  "status": "ok",
  "service": "LedgerShield OpenEnv"
}

`GET /health`

Health check used by local smoke tests, Docker smoke tests, and CI.

Example response:

{
  "status": "ok"
}
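
A smoke test can poll this endpoint before sending any episode traffic. The sketch below uses only the standard library; the helper names `is_healthy` and `check_health` and the 2-second timeout are assumptions for illustration, not part of the project.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # base URL documented on this page


def is_healthy(body: bytes) -> bool:
    """Return True when a /health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """GET /health and report whether the service answered with status ok."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except (urllib.error.URLError, OSError):
        return False  # server not reachable or not ready yet
```

Calling `check_health()` in a short retry loop is usually enough to wait for a freshly started container.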

`POST /reset`

Start a new episode or load a specific case.

Request body:

{
  "seed": 42,
  "case_id": "CASE-D-001"
}

Fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `seed` | integer | no | used for random case selection |
| `case_id` | string | no | when provided, loads that specific case |
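
Because both fields are optional, a client can build the body conditionally. This is a hedged sketch: `build_reset_body` and `make_post` are hypothetical helpers, and it assumes the server accepts a JSON body that omits unset fields. The request object is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_reset_body(seed: Optional[int] = None,
                     case_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a /reset body, leaving out optional fields that were not set."""
    body: Dict[str, Any] = {}
    if seed is not None:
        body["seed"] = seed
    if case_id is not None:
        body["case_id"] = case_id
    return body


def make_post(base_url: str, path: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_post("http://127.0.0.1:8000", "/reset",
                    build_reset_body(seed=42, case_id="CASE-D-001"))
```

Passing `request` to `urllib.request.urlopen` would perform the call against a running server.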

Example response:

{
  "observation": {
    "case_id": "CASE-D-001",
    "task_type": "task_d",
    "instruction": "Act as an AP analyst...",
    "visible_documents": [
      {
        "doc_id": "INV-D-001",
        "doc_type": "invoice",
        "thumbnail": "thumbnail::INV-D-001",
        "page_count": 1,
        "language": "en",
        "available_views": [
          "thumbnail",
          "zoom",
          "get_doc_crop",
          "ocr_fast",
          "ocr_accurate"
        ]
      }
    ],
    "revealed_artifacts": [],
    "pending_events": [],
    "budget_remaining": 16.0,
    "budget_total": 16.0,
    "step_count": 0,
    "max_steps": 18,
    "case_clock": 0,
    "risk_snapshot": {},
    "investigation_status": {},
    "last_tool_result": {},
    "messages": ["Loaded case CASE-D-001"],
    "allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"],
    "available_interventions": ["request_callback_verification", "route_to_security"],
    "case_metadata": {
      "task_label": "AP inbox incident triage",
      "due_date_days": 30
    },
    "portfolio_context": {}
  },
  "reward": 0.0,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {
    "case_id": "CASE-D-001"
  }
}

`POST /step`

Execute one action.

Request body:

{
  "action_type": "ocr",
  "payload": {
    "doc_id": "INV-D-001",
    "mode": "accurate"
  }
}

Example response:

{
  "observation": {
    "case_id": "CASE-D-001",
    "step_count": 1,
    "budget_remaining": 14.9,
    "last_tool_result": {
      "tool_name": "ocr",
      "success": true,
      "doc_id": "INV-D-001",
      "mode": "accurate",
      "scope": "document",
      "text_preview": "Invoice ...",
      "cost": 1.1,
      "reward_model": {
        "value": -0.055,
        "terminal": false,
        "components": {
          "cost_penalty": -0.055,
          "info_gain_bonus": 0.0,
          "potential_delta": 0.0
        },
        "metadata": {
          "action_type": "ocr",
          "success": true
        }
      }
    }
  },
  "reward": -0.055,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {
    "tool_name": "ocr",
    "success": true,
    "reward_model": {
      "value": -0.055,
      "terminal": false
    }
  }
}
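
In the example above, the budget drops by the tool cost (16.0 to 14.9 for a cost of 1.1) and the reward components add up to the reported value. The sketch below checks those relationships on a parsed response. Treating them as invariants is an assumption drawn from this single example, and `check_step` is a hypothetical helper.

```python
import math
from typing import Any, Dict, Mapping


def check_step(prev_budget: float, envelope: Mapping[str, Any]) -> Dict[str, bool]:
    """Compare a /step envelope against the bookkeeping seen in the example."""
    obs = envelope["observation"]
    result = obs["last_tool_result"]
    model = result.get("reward_model", {})
    components = model.get("components", {})
    value = model.get("value", 0.0)
    return {
        # budget_remaining should fall by the cost of the tool call
        "budget_matches_cost": math.isclose(
            prev_budget - result["cost"], obs["budget_remaining"], abs_tol=1e-9),
        # the component breakdown should add up to the scalar value
        "components_sum_to_value": math.isclose(
            sum(components.values()), value, abs_tol=1e-9),
        # the top-level reward should equal the reward model value
        "reward_matches_model": math.isclose(envelope["reward"], value, abs_tol=1e-9),
    }


step = {
    "observation": {
        "budget_remaining": 14.9,
        "last_tool_result": {
            "cost": 1.1,
            "reward_model": {
                "value": -0.055,
                "components": {"cost_penalty": -0.055,
                               "info_gain_bonus": 0.0,
                               "potential_delta": 0.0},
            },
        },
    },
    "reward": -0.055,
}
checks = check_step(16.0, step)
```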

`GET /state`

Return the current public environment state, not the full hidden system state.

Key fields:

| Field | Meaning |
| --- | --- |
| `episode_id` | current episode UUID |
| `case_id` | current case |
| `task_type` | task family |
| `budget_total`, `budget_remaining` | budget accounting |
| `step_count`, `case_clock`, `max_steps` | episode progress |
| `trajectory` | public action history |
| `interventions_taken` | public intervention log |
| `observed_risk_signals` | only signals the agent has revealed |
| `pending_events` | delayed artifacts waiting to resolve |
| `pressure_events_seen` | injected pressure events already observed |
| `terminal_reason` | why the episode ended, if it has ended |

`GET /leaderboard`

Returns leaderboard entries if a leaderboard artifact exists. Otherwise, it derives a minimal payload from the latest benchmark report artifact.

Typical response shape:

{
  "benchmark": "ledgershield-v3",
  "generated_at": "2026-04-08T12:00:00+00:00",
  "entries": [
    {
      "model": "openai/gpt-4.1-mini",
      "type": "deterministic-policy",
      "public_mean": 0.9674,
      "holdout_mean": 0.6649,
      "holdout_pass_k_consistent": 0.619
    }
  ]
}
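
A client will often want to rank entries and see how far holdout performance trails public performance. The following sketch works on the response shape shown above; the `gap` key and the `rank_entries` helper are illustrative additions, not fields returned by the endpoint.

```python
from typing import Any, Dict, List


def rank_entries(leaderboard: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sort entries by holdout_mean (best first) and attach the public-holdout gap."""
    ranked = []
    for entry in leaderboard.get("entries", []):
        gap = round(entry["public_mean"] - entry["holdout_mean"], 4)
        ranked.append({**entry, "gap": gap})  # "gap" is a client-side field
    return sorted(ranked, key=lambda e: e["holdout_mean"], reverse=True)


sample = {
    "benchmark": "ledgershield-v3",
    "entries": [
        {"model": "openai/gpt-4.1-mini", "type": "deterministic-policy",
         "public_mean": 0.9674, "holdout_mean": 0.6649,
         "holdout_pass_k_consistent": 0.619},
    ],
}
ranked = rank_entries(sample)
```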

`GET /benchmark-report`

Returns the latest benchmark report artifact if present. If none exists yet, the endpoint returns a placeholder note telling you to run benchmark_report.py.

Observation Shape

The observation returned by /reset and /step includes:

| Field | Type | Notes |
| --- | --- | --- |
| `case_id` | string | current case ID |
| `task_type` | string | one of `task_a`..`task_e` |
| `instruction` | string | natural-language episode instruction |
| `visible_documents` | list | document catalog entries only, not raw OCR |
| `revealed_artifacts` | list | artifacts unlocked by interventions |
| `pending_events` | list | future artifact events not yet resolved |
| `budget_remaining` | float | current remaining budget |
| `budget_total` | float | episode budget |
| `step_count` | integer | executed step count |
| `max_steps` | integer | episode cap |
| `case_clock` | integer | logical clock used by delayed events |
| `risk_snapshot` | object | summarized public risk signals |
| `investigation_status` | object | tool/intervention/reveal counts |
| `last_tool_result` | object | payload from the most recent action |
| `messages` | list[string] | user-facing environment messages |
| `allowed_actions` | list[string] | investigation + intervention + final action names |
| `available_interventions` | list[string] | intervention subset |
| `case_metadata` | object | task label and due-date info |
| `portfolio_context` | object | cross-invoice/campaign context when relevant |
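
An agent rarely needs every field on every step. The sketch below pulls the budget and progress fields into a small typed view. `ObservationSummary` and its methods are hypothetical names, and the code assumes a full observation such as the one returned by `/reset`.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class ObservationSummary:
    case_id: str
    task_type: str
    budget_remaining: float
    budget_total: float
    step_count: int
    max_steps: int
    allowed_actions: List[str]

    @classmethod
    def from_observation(cls, obs: Mapping[str, Any]) -> "ObservationSummary":
        """Copy the budget/progress fields out of a parsed observation dict."""
        return cls(
            case_id=obs["case_id"],
            task_type=obs["task_type"],
            budget_remaining=float(obs["budget_remaining"]),
            budget_total=float(obs["budget_total"]),
            step_count=int(obs["step_count"]),
            max_steps=int(obs["max_steps"]),
            allowed_actions=list(obs.get("allowed_actions", [])),
        )

    @property
    def steps_left(self) -> int:
        return self.max_steps - self.step_count

    def can_afford(self, cost: float) -> bool:
        """True if another action of this cost fits the remaining budget and steps."""
        return self.steps_left > 0 and cost <= self.budget_remaining


summary = ObservationSummary.from_observation({
    "case_id": "CASE-D-001", "task_type": "task_d",
    "budget_remaining": 16.0, "budget_total": 16.0,
    "step_count": 0, "max_steps": 18,
    "allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"],
})
```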

Action Taxonomy

Investigation actions

| Action | Payload fields |
| --- | --- |
| `zoom` | `doc_id` (required); `page`, `bbox` (optional) |
| `get_doc_crop` | `doc_id` (required); `page`, `bbox` (optional) |
| `ocr` | `doc_id` (required); `mode`, `page`, `bbox` (optional) |
| `lookup_vendor` | `vendor_key` (required) |
| `lookup_vendor_history` | `vendor_key` (required) |
| `lookup_policy` | `rule_id` (optional) |
| `lookup_po` | `po_id` (required) |
| `lookup_receipt` | `receipt_id` (required) |
| `search_ledger` | `vendor_key`, `invoice_number`, `amount` (all optional) |
| `inspect_email_thread` | `thread_id` (required) |
| `compare_bank_account` | `vendor_key`, `proposed_bank_account` (both required) |
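
Validating payloads on the client side avoids spending a step on a malformed action. The sketch below encodes the table above as a lookup, reading each row as "required fields, then optional fields". The `validate_action` helper and its message strings are hypothetical, and the server may apply different or stricter rules.

```python
from typing import Any, Dict, List, Mapping, Set, Tuple

# action -> (required fields, optional fields), transcribed from the table
INVESTIGATION_FIELDS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "zoom": ({"doc_id"}, {"page", "bbox"}),
    "get_doc_crop": ({"doc_id"}, {"page", "bbox"}),
    "ocr": ({"doc_id"}, {"mode", "page", "bbox"}),
    "lookup_vendor": ({"vendor_key"}, set()),
    "lookup_vendor_history": ({"vendor_key"}, set()),
    "lookup_policy": (set(), {"rule_id"}),
    "lookup_po": ({"po_id"}, set()),
    "lookup_receipt": ({"receipt_id"}, set()),
    "search_ledger": (set(), {"vendor_key", "invoice_number", "amount"}),
    "inspect_email_thread": ({"thread_id"}, set()),
    "compare_bank_account": ({"vendor_key", "proposed_bank_account"}, set()),
}


def validate_action(action: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks well-formed."""
    name = action.get("action_type")
    if name not in INVESTIGATION_FIELDS:
        return [f"unknown investigation action: {name}"]
    required, optional = INVESTIGATION_FIELDS[name]
    payload = action.get("payload", {})
    problems = [f"missing required field: {key}"
                for key in sorted(required - payload.keys())]
    problems += [f"unexpected field: {key}"
                 for key in sorted(payload.keys() - required - optional)]
    return problems


ok = validate_action({"action_type": "ocr",
                      "payload": {"doc_id": "INV-D-001", "mode": "accurate"}})
```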

Intervention actions

| Action | Typical use |
| --- | --- |
| `request_callback_verification` | verify vendor identity or remittance changes |
| `freeze_vendor_profile` | contain high-risk vendor state |
| `request_bank_change_approval_chain` | unlock approval-chain artifact |
| `request_po_reconciliation` | unlock PO reconciliation artifact |
| `request_additional_receipt_evidence` | unlock receipt reconciliation artifact |
| `route_to_procurement` | route operationally |
| `route_to_security` | escalate suspicious incidents |
| `flag_duplicate_cluster_review` | request duplicate cluster artifact |
| `create_human_handoff` | create structured handoff packet |
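
Not every intervention is offered in every episode; the observation lists the usable subset in `available_interventions`. A minimal sketch of picking from an agent-side preference order follows. The helper name and the preference list are assumptions for illustration.

```python
from typing import Any, Mapping, Optional, Sequence


def first_available_intervention(observation: Mapping[str, Any],
                                 preferences: Sequence[str]) -> Optional[str]:
    """Return the first preferred intervention the environment currently offers."""
    available = set(observation.get("available_interventions", []))
    for name in preferences:
        if name in available:
            return name
    return None  # nothing from the preference list is on offer


obs = {"available_interventions": ["request_callback_verification",
                                   "route_to_security"]}
choice = first_available_intervention(
    obs, ["freeze_vendor_profile", "route_to_security"])
```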

Final decision action

submit_decision carries the structured task output.

Minimal example:

{
  "action_type": "submit_decision",
  "payload": {
    "decision": "ESCALATE_FRAUD",
    "confidence": 0.95,
    "reason_codes": ["sender_domain_spoof", "bank_override_attempt"],
    "policy_checks": {
      "bank_change_verification": "fail"
    },
    "evidence_map": {}
  }
}
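
The payload above can be assembled with a small builder. In this sketch the `[0, 1]` range check on `confidence` is an assumption (the reference does not state the allowed range), and `build_submit_decision` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def build_submit_decision(decision: str,
                          confidence: float,
                          reason_codes: List[str],
                          policy_checks: Optional[Dict[str, str]] = None,
                          evidence_map: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble a submit_decision action in the shape of the minimal example."""
    if not 0.0 <= confidence <= 1.0:  # assumed range, not confirmed by the reference
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    return {
        "action_type": "submit_decision",
        "payload": {
            "decision": decision,
            "confidence": confidence,
            "reason_codes": list(reason_codes),
            "policy_checks": dict(policy_checks or {}),
            "evidence_map": dict(evidence_map or {}),
        },
    }


action = build_submit_decision(
    "ESCALATE_FRAUD", 0.95,
    ["sender_domain_spoof", "bank_override_attempt"],
    policy_checks={"bank_change_verification": "fail"},
)
```

The resulting dict can be sent as the JSON body of `POST /step`.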

Reward Model

A step response may carry a reward model at both `info.reward_model` and `observation.last_tool_result.reward_model`, with these fields:

| Field | Meaning |
| --- | --- |
| `value` | scalar reward emitted for the step |
| `terminal` | whether this step's reward ended the episode |
| `components` | shaping/cost/outcome breakdown |
| `metadata` | action type, success flag, terminal reason, and other step context |

The environment currently combines:

- action cost penalties
- PBRS shaping delta
- information-gain bonus
- milestone rewards
- terminal score on `submit_decision`
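
To see which of these terms dominates an episode, a client can accumulate the per-step breakdowns. The sketch below sums whatever component keys appear; only `cost_penalty`, `info_gain_bonus`, and `potential_delta` are shown in this reference, so no other key names are assumed. `summarize_rewards` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple


def summarize_rewards(
    reward_models: Iterable[Mapping[str, Any]],
) -> Tuple[float, Dict[str, float]]:
    """Return the episode return and per-component totals across steps."""
    episode_return = 0.0
    totals: Dict[str, float] = defaultdict(float)
    for model in reward_models:
        episode_return += model.get("value", 0.0)
        for name, amount in model.get("components", {}).items():
            totals[name] += amount
    return episode_return, dict(totals)


# Two OCR steps with the reward model from the /step example
ocr_step = {
    "value": -0.055,
    "terminal": False,
    "components": {"cost_penalty": -0.055,
                   "info_gain_bonus": 0.0,
                   "potential_delta": 0.0},
}
total, by_component = summarize_rewards([ocr_step, ocr_step])
```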

Python API Notes

The HTTP API is the main integration path, but the Python environment class also exposes:

- `LedgerShieldEnvironment.action_space()`
- `LedgerShieldEnvironment.observation_space()`
- `LedgerShieldEnvironment.render(mode="text")`

These are useful for local experiments and Gymnasium-style tooling, but they are not separate REST endpoints.

Agent Capability Profiles

The reference agent in inference.py uses a ModelCapabilityProfile to adapt behavior to model strength. This is part of the agent-side logic, not the server API, but it affects how different models interact with the environment:

| Tier | Capability score | Plan mode | Repair level | Decision token budget |
| --- | --- | --- | --- | --- |
| Elite | ≥ 5.0 | coverage | grounded | ≥ 1536 |
| Strong | ≥ 4.5 | hybrid | partial | ≥ 1280 |
| Standard | < 4.5 | LLM-first | none | model default |

The tier determines investigation and intervention budget bonuses, whether repair attempts are made on malformed outputs, and how much planning context the agent maintains.
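
The tier thresholds in the table can be expressed as a simple lookup. This is an illustrative sketch of the table only, not the actual `ModelCapabilityProfile` code in `inference.py`; `TierProfile` and `tier_for` are hypothetical names, and `None` stands in for "model default".

```python
from typing import NamedTuple, Optional


class TierProfile(NamedTuple):
    tier: str
    plan_mode: str
    repair_level: str
    min_decision_tokens: Optional[int]  # None means "model default"


def tier_for(capability_score: float) -> TierProfile:
    """Map a capability score to the tier row it falls into."""
    if capability_score >= 5.0:
        return TierProfile("Elite", "coverage", "grounded", 1536)
    if capability_score >= 4.5:
        return TierProfile("Strong", "hybrid", "partial", 1280)
    return TierProfile("Standard", "LLM-first", "none", None)


print(tier_for(4.7).tier)  # Strong
```

Checking the highest threshold first means a score of exactly 5.0 lands in Elite and a score of exactly 4.5 lands in Strong, matching the `≥` bounds in the table.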