---
title: AuditRepairEnv++
emoji: 📊
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
tags:
  - reinforcement-learning
  - finance
  - ledger-repair
  - multi-step-decision-making
pinned: false
---

# AuditRepairEnv++: RL Environment for Cost-Constrained Iterative Ledger Repair

**Multi-Step RL Environment | Financial Ledger Repair | Budget-Constrained Optimization**

A Gymnasium-compatible RL environment where agents must iteratively repair inconsistencies in a financial ledger while managing costs and avoiding cascading errors.

> "An RL environment where fixing one problem can create another, and the agent must find the best sequence of fixes under cost constraints."


## 🎯 Core Problem

In real-world financial systems, inconsistencies arise from failures, retries, and delayed updates. These problems are:

- **Interconnected**: fixing one error can introduce new errors
- **Hidden**: not all effects appear immediately
- **Costly**: each repair action has a monetary cost
- **Constrained**: work must be completed within a budget

**Real-world impact**: financial reconciliation, audit repair, and transaction correction in payment systems.


## 🤖 What the Agent Does

1. **Observes**: ledger state, errors, budget remaining
2. **Acts**: fix an entry, revert a change, or skip
3. **Learns**: which fixes minimize cost and side effects
4. **Balances**:
   - correctness (minimize errors)
   - cost efficiency (stay within budget)
   - caution (avoid overcorrection)
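
This loop can be sketched as a minimal hand-coded policy. The action codes and observation layout follow the tables in this README, but the function name and the budget heuristic are illustrative, not part of the environment:

```python
# Illustrative rule-based policy. Action codes (0 = fix, 1 = revert,
# 2 = skip) and the observation layout match this README; the budget
# check is a made-up heuristic for demonstration.
FIX, REVERT, SKIP = 0, 1, 2

def cautious_policy(obs, budget=200.0, fix_cost=10.0):
    error_ratio, total_cost, actions_taken, num_transactions = obs
    if error_ratio == 0.0:
        return SKIP          # ledger already consistent: stop spending
    if budget - total_cost < fix_cost:
        return SKIP          # cannot afford another fix
    return FIX               # otherwise keep repairing

print(cautious_policy([0.3, 40.0, 4, 20]))  # -> 0 (fix)
print(cautious_policy([0.0, 60.0, 6, 20]))  # -> 2 (skip)
```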

πŸ—οΈ Environment Architecture

Action Space

The agent can take one of 3 discrete actions:

Action Cost Effect
Fix (0) $10 Correct an entry error
Revert (1) $5 Undo the last fix action
Skip (2) $0 Do nothing

### Observation Space

A 4-dimensional vector:

```python
[
    error_ratio,        # num_errors / num_transactions
    total_cost,         # cost spent so far
    actions_taken,      # number of actions executed
    num_transactions    # total transactions in the ledger
]
```
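
As a concrete sketch, the observation can be assembled from raw ledger statistics like this (the helper name is hypothetical; the field order follows the vector above):

```python
# Hypothetical helper assembling the 4-dimensional observation described
# above from raw ledger statistics.
def make_observation(num_errors, num_transactions, total_cost, actions_taken):
    return [
        num_errors / num_transactions,  # error_ratio
        total_cost,                     # cost spent so far
        actions_taken,                  # number of actions executed
        num_transactions,               # total transactions in the ledger
    ]

print(make_observation(6, 20, 0.0, 0))  # -> [0.3, 0.0, 0, 20]
```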

### Reward Function

```
+10.0  per successful fix
 -3.0  per revert
 -1.0  per skip
-20.0  if the budget is exceeded
+50.0  bonus for achieving full consistency under budget
 -0.5  per action (discourages excessive fixes)
```

Deterministic and reproducible: the same state and action always yield the same reward.
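
The shaping above can be sketched as a single function. This is a sketch, not the environment's actual code: the outcome flags are hypothetical names, but the constants mirror the table:

```python
# Sketch of the reward shaping listed above. The flags (fix_succeeded,
# budget_exceeded, fully_consistent) are hypothetical names standing in
# for the environment's internal state checks.
def compute_reward(action, fix_succeeded=False, budget_exceeded=False,
                   fully_consistent=False):
    reward = -0.5                       # per-action penalty
    if action == 0 and fix_succeeded:
        reward += 10.0                  # successful fix
    elif action == 1:
        reward += -3.0                  # revert
    elif action == 2:
        reward += -1.0                  # skip
    if budget_exceeded:
        reward += -20.0
    if fully_consistent and not budget_exceeded:
        reward += 50.0                  # full consistency under budget
    return reward

print(compute_reward(0, fix_succeeded=True))  # -> 9.5
```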


## 📊 Task Scenarios

### Scenario 1: Simple Repair (Easy)

**Setup:**

- 20 transactions
- 30% error rate (~6 errors)
- $200 budget
- max 50 steps

**Challenge**: fix all errors within budget.

**Expected agent behavior**: fix errors sequentially while monitoring cost.

### Scenario 2: Cascading Effects (Hard)

**Setup:**

- 30 transactions
- errors have dependencies (fixing A can corrupt B)
- $150 budget
- max 50 steps

**Challenge**: identify the correct fix sequence to avoid cascades.

**Expected agent behavior**: learn to test fixes carefully; use reverts strategically.

### Scenario 3: Deep Complexity (Expert)

**Setup:**

- 50+ transactions
- hidden dependencies across multiple entries
- limited budget, tight constraints
- max 100 steps
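
The scenarios map naturally onto the constructor parameters shown in Quick Start. A sketch of such presets (the Expert budget of $200 is taken from the baseline-results table; the cascading/hidden-dependency switches are not documented here, so only the shared knobs appear):

```python
# Scenario presets as LedgerRepairEnv keyword arguments. Error-dependency
# settings for the harder scenarios are not specified in this README, so
# only the documented parameters are included.
SCENARIOS = {
    "simple":    dict(num_transactions=20, error_probability=0.3,
                      budget=200.0, max_steps=50),
    "cascading": dict(num_transactions=30, budget=150.0, max_steps=50),
    "complex":   dict(num_transactions=50, budget=200.0, max_steps=100),
}

print(sorted(SCENARIOS))  # -> ['cascading', 'complex', 'simple']
```

Usage would then be e.g. `env = LedgerRepairEnv(**SCENARIOS["simple"])`.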

## 🚀 Quick Start

### Installation

```bash
# Clone and install
git clone https://github.com/your-repo/auditrepairenv-plus.git
cd auditrepairenv-plus

pip install -e .
```

### Running the Server

```bash
# Start the API server
python server.py

# Server runs on http://localhost:8000
# Docs: http://localhost:8000/docs
```

### Using the Environment (Direct)

```python
from chronostasis import LedgerRepairEnv

# Create environment
env = LedgerRepairEnv(
    num_transactions=20,
    error_probability=0.3,
    budget=200.0,
    max_steps=50
)

# Reset to start
obs, info = env.reset()

# Step through an episode
for step in range(50):
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        break

print(f"Final cost: ${info['total_cost']:.2f}")
print(f"Errors fixed: {env.initial_error_count - len(env.ledger.errors)}")
```

### Using via REST API

```bash
# 1. Create an environment
curl -X POST http://localhost:8000/env/create \
  -H "Content-Type: application/json" \
  -d '{
    "num_transactions": 20,
    "error_probability": 0.3,
    "budget": 200.0,
    "max_steps": 50
  }'

# Returns:
# {
#   "env_id": "a7f3k2j1",
#   "observation": [0.3, 0.0, 0, 20],
#   "info": {...}
# }

# 2. Take an action (0 = fix)
curl -X POST http://localhost:8000/env/a7f3k2j1/step \
  -H "Content-Type: application/json" \
  -d '{"action": 0}'

# 3. Check status
curl http://localhost:8000/env/a7f3k2j1/status

# 4. Render a readable state
curl http://localhost:8000/env/a7f3k2j1/render
```

## 🧠 Example: Train a Baseline Agent

```python
from stable_baselines3 import PPO
from chronostasis import LedgerRepairEnv

# Create environment
env = LedgerRepairEnv(
    num_transactions=20,
    error_probability=0.3,
    budget=200.0,
    max_steps=50
)

# Train with PPO
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50000)

# Evaluate
obs, info = env.reset()
for _ in range(100):
    action, _ = model.predict(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

print(f"✓ Episode completed with cost: ${info['total_cost']:.2f}")
```

## 📈 Evaluation Metrics

When submitting an agent, we score on:

| Metric | Definition | Weight |
|--------|------------|--------|
| Consistency Ratio | `1 - errors_remaining / initial_errors` | 0.40 |
| Cost Efficiency | `max(0, 1 - cost / budget)` | 0.35 |
| Action Efficiency | `1 - actions_taken / max_steps` | 0.15 |
| Stability | `1 - overcorrections / total_actions` | 0.10 |

**Final score** = weighted sum, in [0, 1].
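
The weighted sum can be computed directly. A sketch (variable names are illustrative; the weights are the ones in the table above):

```python
# Weighted evaluation score in [0, 1] from the four metrics above.
def final_score(errors_remaining, initial_errors, cost, budget,
                actions_taken, max_steps, overcorrections, total_actions):
    consistency = 1 - errors_remaining / initial_errors   # weight 0.40
    cost_eff    = max(0.0, 1 - cost / budget)             # weight 0.35
    action_eff  = 1 - actions_taken / max_steps           # weight 0.15
    stability   = 1 - overcorrections / total_actions     # weight 0.10
    return (0.40 * consistency + 0.35 * cost_eff
            + 0.15 * action_eff + 0.10 * stability)

# e.g. all 6 errors fixed for $60 of a $200 budget in 6 of 50 steps:
print(round(final_score(0, 6, 60.0, 200.0, 6, 50, 0, 6), 3))  # -> 0.877
```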


πŸ† Baseline Results

Baseline agent: Simple greedy fix strategy (always fix next available error)

Scenario Consistency Cost Efficiency Final Score
Simple (20 txns, $200) 0.95 0.72 0.81
Cascading (30 txns, $150) 0.78 0.45 0.65
Complex (50 txns, $200) 0.62 0.38 0.54
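
The greedy baseline is a one-line decision rule; a minimal runner against the Gymnasium step API might look like this (a sketch; env construction is shown in Quick Start):

```python
# Greedy baseline sketch: always issue the fix action (0) until the
# episode terminates or is truncated. Works with any Gymnasium-style env.
def run_greedy(env, fix_action=0):
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(fix_action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            return total_reward, steps
```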

## 🔧 Docker Deployment

```bash
# Build the image ("+" is not a valid character in Docker image names,
# so use a plain name)
docker build -t auditrepairenv-plus .

# Run locally
docker run -p 8000:8000 auditrepairenv-plus

# Or deploy to Hugging Face Spaces with the Docker SDK
```

## 📚 File Structure

```
.
├── chronostasis/
│   ├── __init__.py
│   └── ledger_repair_env.py       # Core RL environment
├── server/
│   ├── app.py                     # FastAPI server
│   └── static/
│       └── index.html
├── pyproject.toml
├── requirements.txt
├── Dockerfile
└── README.md
```

## ❓ FAQ

**Q1: Why use RL instead of a solver?**

The system changes after every action. Classic optimization solvers assume static problems; RL naturally handles sequential decision-making where each step affects the next.

**Q2: Is this realistic?**

Yes. Financial reconciliation systems regularly face interdependent errors where fixing one entry impacts others. This is exactly what auditors deal with.

**Q3: How do you measure success?**

Deterministic scoring: consistency ratio, cost efficiency, action count, and stability. No randomness; results are reproducible every time.

**Q4: What makes the hard task difficult?**

Hidden dependencies. Fixing entry A might silently corrupt entries B and C, which become visible only after subsequent checks. The agent must learn to be cautious.

**Q5: Can I use my own agent?**

Yes! The environment is Gymnasium-compatible. Use any RL framework (Stable Baselines3, RLlib, etc.) or hand-coded policies.

**Q6: What's the license?**

MIT. Free to use, modify, and distribute.


## 🤝 Contributing

Found a bug? Have an idea for a harder task variant? Open an issue or a PR!


## 📖 Citation

If you use AuditRepairEnv++ in your research, please cite:

```bibtex
@software{auditrepairenv2024,
  title={AuditRepairEnv++: RL Environment for Cost-Constrained Iterative Ledger Repair},
  author={Your Name},
  year={2024},
  url={https://github.com/your-repo/auditrepairenv-plus}
}
```

Built with ❤️ for the AI community. Let's teach agents to be careful accountants.