API Reference

LedgerShield exposes an OpenEnv-compatible HTTP API backed by FastAPI. This page documents the endpoints, action payloads, response envelope, and the key object shapes an agent needs to handle.

Base URL

http://127.0.0.1:8000

Response Envelope

POST /reset and POST /step return a common top-level envelope:

{
  "observation": {},
  "reward": 0.0,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {}
}

Semantics

  • done: the episode has ended for any reason
  • terminated: the episode reached a true terminal condition; currently this means a successful submit_decision
  • truncated: the episode ended because of budget exhaustion or max-step exhaustion
  • info.reward_model: structured reward breakdown for the last action
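
A client can collapse the three flags above into a single status label. The following is a minimal sketch; the helper name episode_status and the returned labels are illustrative assumptions, not part of the LedgerShield API.

```python
def episode_status(envelope: dict) -> str:
    """Collapse the done/terminated/truncated flags into one label.

    Only the three flags come from the API; the labels are illustrative.
    """
    if not envelope.get("done", False):
        return "running"
    if envelope.get("terminated", False):
        return "terminated"
    if envelope.get("truncated", False):
        return "truncated"
    # done is set but neither specific flag is: report it generically.
    return "done"


step_result = {"reward": 0.0, "done": True, "truncated": True, "terminated": False}
print(episode_status(step_result))  # truncated
```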

Endpoints

GET /

Basic service probe.

Example response:

{
  "status": "ok",
  "service": "LedgerShield OpenEnv"
}

GET /health

Health check used by local smoke tests, Docker smoke tests, and CI.

Example response:

{
  "status": "ok"
}
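
A smoke test against this endpoint might look like the sketch below. It uses only the standard library; the helper names is_healthy and probe and the two-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def is_healthy(body: bytes) -> bool:
    """Return True when the /health body reports status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Fetch GET /health; any connection or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(resp.read())
    except OSError:
        return False
```

Calling probe() requires a running server; is_healthy can be exercised on its own with a captured response body.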

POST /reset

Start a new episode or load a specific case.

Request body:

{
  "seed": 42,
  "case_id": "CASE-D-001"
}

Fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| seed | integer | no | used for random case selection |
| case_id | string | no | when provided, loads that specific case |

Example response:

{
  "observation": {
    "case_id": "CASE-D-001",
    "task_type": "task_d",
    "instruction": "Act as an AP analyst...",
    "visible_documents": [
      {
        "doc_id": "INV-D-001",
        "doc_type": "invoice",
        "thumbnail": "thumbnail::INV-D-001",
        "page_count": 1,
        "language": "en",
        "available_views": [
          "thumbnail",
          "zoom",
          "get_doc_crop",
          "ocr_fast",
          "ocr_accurate"
        ]
      }
    ],
    "revealed_artifacts": [],
    "pending_events": [],
    "budget_remaining": 16.0,
    "budget_total": 16.0,
    "step_count": 0,
    "max_steps": 18,
    "case_clock": 0,
    "risk_snapshot": {},
    "investigation_status": {},
    "last_tool_result": {},
    "messages": ["Loaded case CASE-D-001"],
    "allowed_actions": ["zoom", "get_doc_crop", "ocr", "submit_decision"],
    "available_interventions": ["request_callback_verification", "route_to_security"],
    "case_metadata": {
      "task_label": "AP inbox incident triage",
      "due_date_days": 30
    },
    "portfolio_context": {}
  },
  "reward": 0.0,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {
    "case_id": "CASE-D-001"
  }
}
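
The reset call can be sketched with the standard library as follows. The helper names build_reset_body and post_json are hypothetical. Because both request fields are documented as optional, the sketch omits unset fields; whether the server accepts an empty JSON object is an assumption.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8000"


def build_reset_body(seed: Optional[int] = None, case_id: Optional[str] = None) -> dict:
    """Include only the fields the caller set; both are optional."""
    body = {}
    if seed is not None:
        body["seed"] = seed
    if case_id is not None:
        body["case_id"] = case_id
    return body


def post_json(path: str, body: dict, base_url: str = BASE_URL) -> dict:
    """POST a JSON body and decode the JSON response envelope."""
    request = urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())


# Usage (requires a running server):
# envelope = post_json("/reset", build_reset_body(seed=42, case_id="CASE-D-001"))
# print(envelope["observation"]["allowed_actions"])
```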

POST /step

Execute one action.

Request body:

{
  "action_type": "ocr",
  "payload": {
    "doc_id": "INV-D-001",
    "mode": "accurate"
  }
}

Example response:

{
  "observation": {
    "case_id": "CASE-D-001",
    "step_count": 1,
    "budget_remaining": 14.9,
    "last_tool_result": {
      "tool_name": "ocr",
      "success": true,
      "doc_id": "INV-D-001",
      "mode": "accurate",
      "scope": "document",
      "text_preview": "Invoice ...",
      "cost": 1.1,
      "reward_model": {
        "value": -0.055,
        "terminal": false,
        "components": {
          "cost_penalty": -0.055,
          "info_gain_bonus": 0.0,
          "potential_delta": 0.0
        },
        "metadata": {
          "action_type": "ocr",
          "success": true
        }
      }
    }
  },
  "reward": -0.055,
  "done": false,
  "truncated": false,
  "terminated": false,
  "info": {
    "tool_name": "ocr",
    "success": true,
    "reward_model": {
      "value": -0.055,
      "terminal": false
    }
  }
}
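
In the example above, the budget drops from 16.0 to 14.9 after an action with cost 1.1. The sketch below builds a step request and reproduces that arithmetic. The helper names are hypothetical, and treating the cost as a direct deduction from the budget is an assumption drawn from this one example.

```python
def make_action(action_type: str, **payload) -> dict:
    """Build the /step request body from an action name and payload fields."""
    return {"action_type": action_type, "payload": payload}


def budget_after(previous_budget: float, step_envelope: dict) -> float:
    """Subtract the reported tool cost from the previous budget.

    Rounded to six decimals to avoid floating-point noise when comparing.
    """
    cost = step_envelope["observation"]["last_tool_result"].get("cost", 0.0)
    return round(previous_budget - cost, 6)


action = make_action("ocr", doc_id="INV-D-001", mode="accurate")
response = {"observation": {"budget_remaining": 14.9, "last_tool_result": {"cost": 1.1}}}
print(budget_after(16.0, response) == response["observation"]["budget_remaining"])  # True
```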

GET /state

Return the current public environment state, not the full hidden system state.

Key fields:

| Field | Meaning |
| --- | --- |
| episode_id | current episode UUID |
| case_id | current case |
| task_type | task family |
| budget_total, budget_remaining | budget accounting |
| step_count, case_clock, max_steps | episode progress |
| trajectory | public action history |
| interventions_taken | public intervention log |
| observed_risk_signals | only signals the agent has revealed |
| pending_events | delayed artifacts waiting to resolve |
| pressure_events_seen | injected pressure events already observed |
| terminal_reason | why the episode ended if it ended |
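
As an illustration of how these fields might be consumed, the sketch below derives a small progress summary from a /state payload. The function name and the summary keys are hypothetical, and it assumes terminal_reason is null, empty, or absent while the episode is still running.

```python
def progress_summary(state: dict) -> dict:
    """Summarize budget use and remaining steps from a /state payload."""
    budget_total = state["budget_total"]
    spent = budget_total - state["budget_remaining"]
    return {
        # Guard against a zero budget to avoid dividing by zero.
        "budget_spent_fraction": spent / budget_total if budget_total else 0.0,
        "steps_left": state["max_steps"] - state["step_count"],
        # Assumption: a missing or empty terminal_reason means "still running".
        "ended": bool(state.get("terminal_reason")),
    }


state = {
    "budget_total": 16.0,
    "budget_remaining": 12.0,
    "step_count": 4,
    "max_steps": 18,
    "terminal_reason": None,
}
print(progress_summary(state))
```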

GET /leaderboard

Returns leaderboard entries if a leaderboard artifact exists. Otherwise, it derives a minimal payload from the latest benchmark report artifact.

Typical response shape:

{
  "benchmark": "ledgershield-v3",
  "generated_at": "2026-04-08T12:00:00+00:00",
  "entries": [
    {
      "model": "openai/gpt-4.1-mini",
      "type": "deterministic-policy",
      "public_mean": 0.9674,
      "holdout_mean": 0.6649,
      "holdout_pass_k_consistent": 0.619
    }
  ]
}
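
One thing a consumer might compute from this payload is the gap between public and holdout scores. The sketch below is only an illustration; the function name holdout_gaps is hypothetical, and it assumes every entry carries public_mean and holdout_mean as in the example above.

```python
def holdout_gaps(leaderboard: dict) -> list:
    """Return (model, public_mean - holdout_mean) pairs, largest gap first."""
    gaps = [
        (entry["model"], round(entry["public_mean"] - entry["holdout_mean"], 4))
        for entry in leaderboard.get("entries", [])
    ]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


leaderboard = {
    "benchmark": "ledgershield-v3",
    "entries": [
        {"model": "openai/gpt-4.1-mini", "public_mean": 0.9674, "holdout_mean": 0.6649}
    ],
}
print(holdout_gaps(leaderboard))  # [('openai/gpt-4.1-mini', 0.3025)]
```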

GET /benchmark-report

Returns the latest benchmark report artifact if present. If none exists yet, the endpoint returns a placeholder note telling you to run benchmark_report.py.

Observation Shape

The observation returned by /reset and /step includes:

| Field | Type | Notes |
| --- | --- | --- |
| case_id | string | current case ID |
| task_type | string | one of task_a..task_e |
| instruction | string | natural-language episode instruction |
| visible_documents | list | document catalog entries only, not raw OCR |
| revealed_artifacts | list | artifacts unlocked by interventions |
| pending_events | list | future artifact events not yet resolved |
| budget_remaining | float | current remaining budget |
| budget_total | float | episode budget |
| step_count | integer | executed step count |
| max_steps | integer | episode cap |
| case_clock | integer | logical clock used by delayed events |
| risk_snapshot | object | summarized public risk signals |
| investigation_status | object | tool/intervention/reveal counts |
| last_tool_result | object | payload from the most recent action |
| messages | list[string] | user-facing environment messages |
| allowed_actions | list[string] | investigation + intervention + final action names |
| available_interventions | list[string] | intervention subset |
| case_metadata | object | task label and due-date info |
| portfolio_context | object | cross-invoice/campaign context when relevant |
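
The field list above can be turned into a lightweight client-side check. The sketch below is an assumption-laden illustration: the names EXPECTED_FIELDS and check_observation are hypothetical, JSON types are mapped to Python types by hand, and it assumes every field is present in each observation, which abbreviated examples on this page do not confirm.

```python
# Documented observation fields mapped to the Python types json.loads produces.
EXPECTED_FIELDS = {
    "case_id": str,
    "task_type": str,
    "instruction": str,
    "visible_documents": list,
    "revealed_artifacts": list,
    "pending_events": list,
    "budget_remaining": (int, float),  # JSON may encode 16.0 as 16
    "budget_total": (int, float),
    "step_count": int,
    "max_steps": int,
    "case_clock": int,
    "risk_snapshot": dict,
    "investigation_status": dict,
    "last_tool_result": dict,
    "messages": list,
    "allowed_actions": list,
    "available_interventions": list,
    "case_metadata": dict,
    "portfolio_context": dict,
}


def check_observation(observation: dict) -> list:
    """Return human-readable problems; an empty list means the shape matches."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in observation:
            problems.append(f"missing: {name}")
        elif not isinstance(observation[name], expected):
            problems.append(f"wrong type: {name}")
    return problems
```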

Action Taxonomy

Investigation actions

| Action | Required payload |
| --- | --- |
| zoom | doc_id, optional page, bbox |
| get_doc_crop | doc_id, optional page, bbox |
| ocr | doc_id, optional mode, page, bbox |
| lookup_vendor | vendor_key |
| lookup_vendor_history | vendor_key |
| lookup_policy | optional rule_id |
| lookup_po | po_id |
| lookup_receipt | receipt_id |
| search_ledger | optional vendor_key, invoice_number, amount |
| inspect_email_thread | thread_id |
| compare_bank_account | vendor_key, proposed_bank_account |
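
An agent may want to validate a payload before sending it. The sketch below encodes the list above as a lookup; REQUIRED_KEYS and missing_payload_keys are hypothetical names. It treats every field listed after the word "optional" as optional, which is an interpretation of the list rather than a confirmed rule.

```python
# Keys treated as required per action; fields listed as optional are left out.
REQUIRED_KEYS = {
    "zoom": ["doc_id"],
    "get_doc_crop": ["doc_id"],
    "ocr": ["doc_id"],
    "lookup_vendor": ["vendor_key"],
    "lookup_vendor_history": ["vendor_key"],
    "lookup_policy": [],
    "lookup_po": ["po_id"],
    "lookup_receipt": ["receipt_id"],
    "search_ledger": [],
    "inspect_email_thread": ["thread_id"],
    "compare_bank_account": ["vendor_key", "proposed_bank_account"],
}


def missing_payload_keys(action: dict) -> list:
    """List required payload keys that the action does not provide."""
    action_type = action["action_type"]
    if action_type not in REQUIRED_KEYS:
        raise ValueError(f"not an investigation action: {action_type}")
    payload = action.get("payload", {})
    return [key for key in REQUIRED_KEYS[action_type] if key not in payload]


action = {"action_type": "compare_bank_account", "payload": {"vendor_key": "V-1"}}
print(missing_payload_keys(action))  # ['proposed_bank_account']
```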

Intervention actions

| Action | Typical use |
| --- | --- |
| request_callback_verification | verify vendor identity or remittance changes |
| freeze_vendor_profile | contain high-risk vendor state |
| request_bank_change_approval_chain | unlock approval-chain artifact |
| request_po_reconciliation | unlock PO reconciliation artifact |
| request_additional_receipt_evidence | unlock receipt reconciliation artifact |
| route_to_procurement | route operationally |
| route_to_security | escalate suspicious incidents |
| flag_duplicate_cluster_review | request duplicate cluster artifact |
| create_human_handoff | create structured handoff packet |
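
Not every intervention is offered in every episode; the observation's available_interventions field lists the current subset. A minimal sketch of filtering an agent's preferred interventions against that field follows. The function name and the preference list are illustrative assumptions.

```python
def pick_interventions(observation: dict, preferred: list) -> list:
    """Keep preferred interventions that the observation offers, in preference order."""
    available = set(observation.get("available_interventions", []))
    return [name for name in preferred if name in available]


observation = {
    "available_interventions": ["request_callback_verification", "route_to_security"]
}
preferred = ["freeze_vendor_profile", "route_to_security", "request_callback_verification"]
print(pick_interventions(observation, preferred))
# ['route_to_security', 'request_callback_verification']
```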

Final decision action

submit_decision carries the structured task output.

Minimal example:

{
  "action_type": "submit_decision",
  "payload": {
    "decision": "ESCALATE_FRAUD",
    "confidence": 0.95,
    "reason_codes": ["sender_domain_spoof", "bank_override_attempt"],
    "policy_checks": {
      "bank_change_verification": "fail"
    },
    "evidence_map": {}
  }
}
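
A small builder can keep the decision payload consistent across an agent's code paths. The sketch below mirrors only the fields in the minimal example above. The function name build_decision is hypothetical, and the [0, 1] range check on confidence is an assumption, since the only documented value is 0.95.

```python
from typing import Optional


def build_decision(
    decision: str,
    confidence: float,
    reason_codes: Optional[list] = None,
    policy_checks: Optional[dict] = None,
    evidence_map: Optional[dict] = None,
) -> dict:
    """Assemble a submit_decision action with the fields from the minimal example."""
    # Assumption: confidence is a probability; the docs only show 0.95.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence is assumed to lie in [0, 1]")
    return {
        "action_type": "submit_decision",
        "payload": {
            "decision": decision,
            "confidence": confidence,
            # Copy the inputs so later caller mutations do not leak in.
            "reason_codes": list(reason_codes or []),
            "policy_checks": dict(policy_checks or {}),
            "evidence_map": dict(evidence_map or {}),
        },
    }
```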

Reward Model

A step response may include a reward model at both info.reward_model and observation.last_tool_result.reward_model, with these fields:

| Field | Meaning |
| --- | --- |
| value | scalar reward emitted for the step |
| terminal | whether the reward ended the episode |
| components | shaping/cost/outcome breakdown |
| metadata | action type, success flag, terminal reason, and other step context |

The environment currently combines:

  • action cost penalties
  • PBRS shaping delta
  • information-gain bonus
  • milestone rewards
  • terminal score on submit_decision
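
In the /step example on this page, the listed components (cost_penalty, info_gain_bonus, potential_delta) add up to the reported value of -0.055. The sketch below checks that relationship for a given reward model. Whether value always equals the sum of components is an assumption, not something this page states, and the helper names are hypothetical.

```python
import math


def components_total(reward_model: dict) -> float:
    """Sum the numeric entries under components (0.0 if the key is absent)."""
    return math.fsum(reward_model.get("components", {}).values())


def matches_value(reward_model: dict, tol: float = 1e-9) -> bool:
    """Check whether the components add up to the scalar value within a tolerance."""
    return math.isclose(components_total(reward_model), reward_model["value"], abs_tol=tol)


reward_model = {
    "value": -0.055,
    "terminal": False,
    "components": {"cost_penalty": -0.055, "info_gain_bonus": 0.0, "potential_delta": 0.0},
}
print(matches_value(reward_model))  # True
```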

Python API Notes

The HTTP API is the main integration path, but the Python environment class also exposes:

  • LedgerShieldEnvironment.action_space()
  • LedgerShieldEnvironment.observation_space()
  • LedgerShieldEnvironment.render(mode="text")

These are useful for local experiments and Gymnasium-style tooling, but they are not separate REST endpoints.

Agent Capability Profiles

The reference agent in inference.py uses a ModelCapabilityProfile to adapt behavior to model strength. This is part of the agent-side logic, not the server API, but it affects how different models interact with the environment:

| Tier | Capability score | Plan mode | Repair level | Decision token budget |
| --- | --- | --- | --- | --- |
| Elite | ≥ 5.0 | coverage | grounded | ≥ 1536 |
| Strong | ≥ 4.5 | hybrid | partial | ≥ 1280 |
| Standard | < 4.5 | LLM-first | none | model default |

The tier determines investigation and intervention budget bonuses, whether repair attempts are made on malformed outputs, and how much planning context the agent maintains.
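
The score thresholds listed for the three tiers can be expressed as a short lookup. This is a sketch of the thresholds only, under the assumption that they are applied from the highest tier down; it is not the actual ModelCapabilityProfile logic in inference.py, and the function name is hypothetical.

```python
def capability_tier(score: float) -> str:
    """Map a capability score to a tier name using the documented thresholds."""
    if score >= 5.0:
        return "Elite"
    if score >= 4.5:
        return "Strong"
    return "Standard"


for score in (5.2, 4.5, 3.0):
    print(score, capability_tier(score))
```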