---
title: AuditRepairEnv++
emoji: 📊
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
tags:
  - reinforcement-learning
  - finance
  - ledger-repair
  - multi-step-decision-making
pinned: false
---
# AuditRepairEnv++: RL Environment for Cost-Constrained Iterative Ledger Repair

**Multi-Step RL Environment | Financial Ledger Repair | Budget-Constrained Optimization**

An OpenAI Gymnasium-compatible RL environment where agents must iteratively repair inconsistencies in a financial ledger while managing costs and avoiding cascading errors.

> "An RL environment where fixing one problem can create another, and the agent must find the best sequence of fixes under cost constraints."
## 🎯 Core Problem
In real-world financial systems, inconsistencies arise due to failures, retries, and delayed updates. These problems are:
- Interconnected: Fixing one error can introduce new errors
- Hidden: Not all effects appear immediately
- Costly: Each repair action has a monetary cost
- Constrained: Work must be completed within a budget
Real-world impact: Financial reconciliation, audit repair, transaction correction in payment systems.
## 🤖 What the Agent Does
- Observes: Ledger state, errors, budget remaining
- Acts: Fix an entry, revert a change, or skip
- Learns: Which fixes minimize cost and side effects
- Balances:
  - Correctness (minimize errors)
  - Cost efficiency (stay within budget)
  - Caution (avoid overcorrection)
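As a toy illustration of this balancing act, a hand-coded policy might fix only while errors remain and the budget still covers the $10 fix cost. This is a hypothetical sketch; the action ids and observation layout follow the tables under Environment Architecture:

```python
# Hypothetical hand-coded policy (illustration only). Action ids and the
# observation layout follow the tables in this README.
FIX, REVERT, SKIP = 0, 1, 2
FIX_COST = 10.0

def heuristic_policy(obs, budget=200.0):
    error_ratio, total_cost, actions_taken, num_transactions = obs
    remaining = budget - total_cost
    if error_ratio > 0 and remaining >= FIX_COST:
        return FIX   # errors remain and the next fix is affordable
    return SKIP      # otherwise preserve budget
```

A trained agent should beat this baseline by also learning *which* fixes trigger cascades, not just whether a fix is affordable.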
## 🏗️ Environment Architecture

### Action Space

The agent can take one of three discrete actions:
| Action | Cost | Effect |
|---|---|---|
| Fix (0) | $10 | Correct an entry error |
| Revert (1) | $5 | Undo the last fix action |
| Skip (2) | $0 | Do nothing |
### Observation Space

A 4-dimensional vector:

```
[
    error_ratio,       # num_errors / num_transactions
    total_cost,        # cost spent so far
    actions_taken,     # number of actions executed
    num_transactions,  # total transactions in the ledger
]
```
### Reward Function

```
+10.0  per successful fix
 -3.0  per revert
 -1.0  per skip
-20.0  if budget exceeded
+50.0  bonus for achieving full consistency under budget
 -0.5  per action (discourages excessive fixes)
```

Deterministic and reproducible: the same state and action always yield the same reward.
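Structurally, the reward can be sketched as a pure function. This is an illustrative reconstruction of the table above, not the environment's actual source; in particular, whether the -0.5 step penalty also applies to skips is an assumption here:

```python
def compute_reward(action, fix_succeeded=False, budget_exceeded=False,
                   fully_consistent=False):
    """Illustrative sketch of the reward table (not the actual source)."""
    FIX, REVERT, SKIP = 0, 1, 2
    reward = -0.5  # flat per-action penalty (assumed to apply to every action)
    if action == FIX and fix_succeeded:
        reward += 10.0
    elif action == REVERT:
        reward -= 3.0
    elif action == SKIP:
        reward -= 1.0
    if budget_exceeded:
        reward -= 20.0
    if fully_consistent and not budget_exceeded:
        reward += 50.0  # bonus for full consistency under budget
    return reward
```

Under these assumptions, a successful fix nets +9.5 after the step penalty, and finishing an episode fully consistent adds the +50 bonus on top.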
## 📋 Task Scenarios

### Scenario 1: Simple Repair (Easy)
Setup:
- 20 transactions
- 30% error rate (~6 errors)
- $200 budget
- Max 50 steps
Challenge: Fix all errors within budget.
Expected agent behavior: Fix errors sequentially while monitoring cost.
### Scenario 2: Cascading Effects (Hard)
Setup:
- 30 transactions
- Errors have dependencies (fixing A can corrupt B)
- $150 budget
- Max 50 steps
Challenge: Identify correct fix sequence to avoid cascades.
Expected agent behavior: Learn to test fixes carefully; use reverts strategically.
### Scenario 3: Deep Complexity (Expert)
Setup:
- 50+ transactions
- Hidden dependencies across multiple entries
- Limited budget, tight constraints
- Max 100 steps
## 🚀 Quick Start

### Installation
```bash
# Clone and install
git clone https://github.com/your-repo/auditrepairenv-plus.git
cd auditrepairenv-plus
pip install -e .
```
### Running the Server

```bash
# Start the API server
python server.py

# Server runs on http://localhost:8000
# Docs: http://localhost:8000/docs
```
### Using the Environment (Direct)
```python
from chronostasis import LedgerRepairEnv

# Create environment
env = LedgerRepairEnv(
    num_transactions=20,
    error_probability=0.3,
    budget=200.0,
    max_steps=50,
)

# Reset to start
obs, info = env.reset()

# Step through episode
for step in range(50):
    action = env.action_space.sample()  # Random policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

print(f"Final cost: ${info['total_cost']:.2f}")
print(f"Errors fixed: {env.initial_error_count - len(env.ledger.errors)}")
```
### Using via REST API

```bash
# 1. Create environment
curl -X POST http://localhost:8000/env/create \
  -H "Content-Type: application/json" \
  -d '{
    "num_transactions": 20,
    "error_probability": 0.3,
    "budget": 200.0,
    "max_steps": 50
  }'
# Returns:
# {
#   "env_id": "a7f3k2j1",
#   "observation": [0.3, 0.0, 0, 20],
#   "info": {...}
# }

# 2. Take an action (Fix, action 0)
curl -X POST http://localhost:8000/env/a7f3k2j1/step \
  -H "Content-Type: application/json" \
  -d '{"action": 0}'

# 3. Check status
curl http://localhost:8000/env/a7f3k2j1/status

# 4. Render readable state
curl http://localhost:8000/env/a7f3k2j1/render
```
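The same flow can be driven from Python using only the standard library. This is a sketch: endpoint paths and request fields are taken from the curl examples above, but the `terminated`/`truncated` keys in the step response are an assumption.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # started with `python server.py`

CREATE_PAYLOAD = {
    "num_transactions": 20,
    "error_probability": 0.3,
    "budget": 200.0,
    "max_steps": 50,
}

def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def run_fix_only_episode():
    """Create an env, send Fix (0) until the episode ends, return last step."""
    env_id = post_json(f"{BASE}/env/create", CREATE_PAYLOAD)["env_id"]
    while True:
        step = post_json(f"{BASE}/env/{env_id}/step", {"action": 0})
        if step.get("terminated") or step.get("truncated"):
            return step
```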
## 🧠 Example: Train a Baseline Agent
```python
from stable_baselines3 import PPO
from chronostasis import LedgerRepairEnv

# Create environment
env = LedgerRepairEnv(
    num_transactions=20,
    error_probability=0.3,
    budget=200.0,
    max_steps=50,
)

# Train with PPO
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50000)

# Evaluate
obs, info = env.reset()
for _ in range(100):
    action, _ = model.predict(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

print(f"✅ Episode completed with cost: ${info['total_cost']:.2f}")
```
## 📊 Evaluation Metrics
When submitting an agent, we score on:
| Metric | Definition | Weight |
|---|---|---|
| Consistency Ratio | (1 - errors_remaining / initial_errors) | 0.40 |
| Cost Efficiency | max(0, 1 - cost/budget) | 0.35 |
| Action Efficiency | (1 - actions_taken / max_steps) | 0.15 |
| Stability | (1 - overcorrections / total_actions) | 0.10 |
**Final Score** = weighted sum (0 to 1)
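The weighted sum can be computed directly from the table above (a sketch; it assumes each metric has already been clamped to [0, 1] as in the definitions):

```python
def final_score(consistency, cost_efficiency, action_efficiency, stability):
    """Weighted sum from the evaluation table (weights total 1.0)."""
    return (0.40 * consistency
            + 0.35 * cost_efficiency
            + 0.15 * action_efficiency
            + 0.10 * stability)
```

For example, an agent that is perfectly action-efficient and stable but reaches 0.95 consistency at 0.72 cost efficiency scores `final_score(0.95, 0.72, 1.0, 1.0) ≈ 0.88`.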
## 📈 Baseline Results

Baseline agent: simple greedy fix strategy (always fix the next available error).
| Scenario | Consistency | Cost Efficiency | Final Score |
|---|---|---|---|
| Simple (20 txns, $200) | 0.95 | 0.72 | 0.81 |
| Cascading (30 txns, $150) | 0.78 | 0.45 | 0.65 |
| Complex (50 txns, $200) | 0.62 | 0.38 | 0.54 |
## 🔧 Docker Deployment

```bash
# Build image (note: "+" is not allowed in Docker tags)
docker build -t auditrepairenv-plus .

# Run locally
docker run -p 8000:8000 auditrepairenv-plus

# Or deploy to Hugging Face Spaces with the Docker SDK
```
## 📁 File Structure

```
.
├── chronostasis/
│   ├── __init__.py
│   └── ledger_repair_env.py   # Core RL environment
├── server/
│   ├── app.py                 # FastAPI server
│   └── static/
│       └── index.html
├── pyproject.toml
├── requirements.txt
├── Dockerfile
└── README.md
```
## ❓ FAQ

**Q1: Why use RL instead of a solver?**
The system changes after every action. Classic optimization solvers assume static problems. RL naturally handles sequential decision-making where each step affects the next.
**Q2: Is this realistic?**
Yes. Financial reconciliation systems regularly face interdependent errors where fixing one entry impacts others. This is exactly what auditors deal with.
**Q3: How do you measure success?**

Deterministic scoring: consistency ratio, cost efficiency, action count, and stability. No randomness; results are reproducible every time.
**Q4: What makes the hard task difficult?**
Hidden dependencies. Fixing entry A might silently corrupt entries B and C, which become visible only after subsequent checks. The agent must learn to be cautious.
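That cascade can be pictured with a tiny dependency model (purely hypothetical, not the environment's internal representation):

```python
def apply_fix(entry, errors, dependents):
    """Fix `entry`, then surface its dependents as new (hidden) errors."""
    return (errors - {entry}) | set(dependents.get(entry, []))

# Hypothetical dependency graph: fixing A corrupts B and C.
deps = {"A": ["B", "C"]}
print(apply_fix("A", {"A"}, deps))  # → {"B", "C"}: one error became two
```

Greedily fixing A here trades one visible error for two hidden ones, which is exactly the trap the Stability metric penalizes.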
**Q5: Can I use my own agent?**
Yes! The environment is Gymnasium-compatible. Use any RL framework (Stable Baselines3, RLlib, etc.) or hand-coded policies.
**Q6: What's the license?**
MIT. Free to use, modify, and distribute.
## 🤝 Contributing
Found a bug? Have an idea for a harder task variant? Open an issue or PR!
## 📚 Citation

If you use AuditRepairEnv++ in your research, please cite:

```bibtex
@software{auditrepairenv2024,
  title={AuditRepairEnv++: RL Environment for Cost-Constrained Iterative Ledger Repair},
  author={Your Name},
  year={2024},
  url={https://github.com/your-repo/auditrepairenv-plus}
}
```
Built with ❤️ for the AI community. Let's teach agents to be careful accountants.