---
title: AuditRepairEnv++
emoji: 📊
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
tags:
  - reinforcement-learning
  - finance
  - ledger-repair
  - multi-step-decision-making
pinned: false
---
# AuditRepairEnv++: RL Environment for Cost-Constrained Iterative Ledger Repair

**Multi-Step RL Environment | Financial Ledger Repair | Budget-Constrained Optimization**

An OpenAI Gymnasium-compatible RL environment where agents must iteratively repair inconsistencies in a financial ledger while managing costs and avoiding cascading errors.

> "An RL environment where fixing one problem can create another, and the agent must find the best sequence of fixes under cost constraints."
## 🎯 Core Problem
In real-world financial systems, inconsistencies arise due to failures, retries, and delayed updates. These problems are:
- Interconnected: Fixing one error can introduce new errors
- Hidden: Not all effects appear immediately
- Costly: Each repair action has a monetary cost
- Constrained: Work must be completed within a budget
Real-world impact: Financial reconciliation, audit repair, transaction correction in payment systems.
## 🤖 What the Agent Does
- Observes: Ledger state, errors, budget remaining
- Acts: Fix an entry, revert a change, or skip
- Learns: Which fixes minimize cost and side effects
- Balances:
  - Correctness (minimize errors)
  - Cost efficiency (stay within budget)
  - Caution (avoid overcorrection)
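As a toy illustration of this balancing act, a hand-coded policy might fix only while errors remain and the budget still covers the $10 fix cost. This is a hypothetical sketch; the action ids and observation layout follow the tables under Environment Architecture:

```python
# Hypothetical hand-coded policy (illustration only). Action ids and the
# observation layout follow the tables in this README.
FIX, REVERT, SKIP = 0, 1, 2
FIX_COST = 10.0

def heuristic_policy(obs, budget=200.0):
    error_ratio, total_cost, actions_taken, num_transactions = obs
    remaining = budget - total_cost
    if error_ratio > 0 and remaining >= FIX_COST:
        return FIX   # errors remain and the next fix is affordable
    return SKIP      # otherwise preserve budget
```

A trained agent should beat this baseline by also learning *which* fixes trigger cascades, not just whether a fix is affordable.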
## 🏗️ Environment Architecture

### Action Space

The agent can take one of three discrete actions:
| Action | Cost | Effect |
|---|---|---|
| Fix (0) | $10 | Correct an entry error |
| Revert (1) | $5 | Undo the last fix action |
| Skip (2) | $0 | Do nothing |
### Observation Space

A 4-dimensional vector:

```
[
    error_ratio,       # num_errors / num_transactions
    total_cost,        # cost spent so far
    actions_taken,     # number of actions executed
    num_transactions,  # total transactions in the ledger
]
```
### Reward Function

```
+10.0  per successful fix
 -3.0  per revert
 -1.0  per skip
-20.0  if budget exceeded
+50.0  bonus for achieving full consistency under budget
 -0.5  per action (discourages excessive fixes)
```

Deterministic and reproducible: the same state and action always yield the same reward.
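Structurally, the reward can be sketched as a pure function. This is an illustrative reconstruction of the table above, not the environment's actual source; in particular, whether the -0.5 step penalty also applies to skips is an assumption here:

```python
def compute_reward(action, fix_succeeded=False, budget_exceeded=False,
                   fully_consistent=False):
    """Illustrative sketch of the reward table (not the actual source)."""
    FIX, REVERT, SKIP = 0, 1, 2
    reward = -0.5  # flat per-action penalty (assumed to apply to every action)
    if action == FIX and fix_succeeded:
        reward += 10.0
    elif action == REVERT:
        reward -= 3.0
    elif action == SKIP:
        reward -= 1.0
    if budget_exceeded:
        reward -= 20.0
    if fully_consistent and not budget_exceeded:
        reward += 50.0  # bonus for full consistency under budget
    return reward
```

Under these assumptions, a successful fix nets +9.5 after the step penalty, and finishing an episode fully consistent adds the +50 bonus on top.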
## 📋 Task Scenarios

### Scenario 1: Simple Repair (Easy)
Setup:
- 20 transactions
- 30% error rate (~6 errors)
- $200 budget
- Max 50 steps
Challenge: Fix all errors within budget.
Expected agent behavior: Fix errors sequentially while monitoring cost.
### Scenario 2: Cascading Effects (Hard)
Setup:
- 30 transactions
- Errors have dependencies (fixing A can corrupt B)
- $150 budget
- Max 50 steps
Challenge: Identify correct fix sequence to avoid cascades.
Expected agent behavior: Learn to test fixes carefully; use reverts strategically.
### Scenario 3: Deep Complexity (Expert)
Setup:
- 50+ transactions
- Hidden dependencies across multiple entries
- Limited budget, tight constraints
- Max 100 steps
## 🚀 Quick Start

### Installation
```bash
# Clone and install
git clone https://github.com/your-repo/auditrepairenv-plus.git
cd auditrepairenv-plus
pip install -e .
```
### Running the Server

```bash
# Start the API server
python server.py

# Server runs on http://localhost:8000
# Docs: http://localhost:8000/docs
```
### Using the Environment (Direct)
```python
from chronostasis import LedgerRepairEnv

# Create environment
env = LedgerRepairEnv(
    num_transactions=20,
    error_probability=0.3,
    budget=200.0,
    max_steps=50,
)

# Reset to start
obs, info = env.reset()

# Step through episode
for step in range(50):
    action = env.action_space.sample()  # Random policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

print(f"Final cost: ${info['total_cost']:.2f}")
print(f"Errors fixed: {env.initial_error_count - len(env.ledger.errors)}")
```
### Using via REST API

```bash
# 1. Create environment
curl -X POST http://localhost:8000/env/create \
  -H "Content-Type: application/json" \
  -d '{
    "num_transactions": 20,
    "error_probability": 0.3,
    "budget": 200.0,
    "max_steps": 50
  }'
# Returns:
# {
#   "env_id": "a7f3k2j1",
#   "observation": [0.3, 0.0, 0, 20],
#   "info": {...}
# }

# 2. Take an action (Fix, action 0)
curl -X POST http://localhost:8000/env/a7f3k2j1/step \
  -H "Content-Type: application/json" \
  -d '{"action": 0}'

# 3. Check status
curl http://localhost:8000/env/a7f3k2j1/status

# 4. Render readable state
curl http://localhost:8000/env/a7f3k2j1/render
```
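The same flow can be driven from Python using only the standard library. This is a sketch: endpoint paths and request fields are taken from the curl examples above, but the `terminated`/`truncated` keys in the step response are an assumption.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # started with `python server.py`

CREATE_PAYLOAD = {
    "num_transactions": 20,
    "error_probability": 0.3,
    "budget": 200.0,
    "max_steps": 50,
}

def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def run_fix_only_episode():
    """Create an env, send Fix (0) until the episode ends, return last step."""
    env_id = post_json(f"{BASE}/env/create", CREATE_PAYLOAD)["env_id"]
    while True:
        step = post_json(f"{BASE}/env/{env_id}/step", {"action": 0})
        if step.get("terminated") or step.get("truncated"):
            return step
```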
## 🧠 Example: Train a Baseline Agent
```python
from stable_baselines3 import PPO
from chronostasis import LedgerRepairEnv

# Create environment
env = LedgerRepairEnv(
    num_transactions=20,
    error_probability=0.3,
    budget=200.0,
    max_steps=50,
)

# Train with PPO
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50000)

# Evaluate
obs, info = env.reset()
for _ in range(100):
    action, _ = model.predict(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

print(f"✅ Episode completed with cost: ${info['total_cost']:.2f}")
```
## 📊 Evaluation Metrics
When submitting an agent, we score on:
| Metric | Definition | Weight |
|---|---|---|
| Consistency Ratio | (1 - errors_remaining / initial_errors) | 0.40 |
| Cost Efficiency | max(0, 1 - cost/budget) | 0.35 |
| Action Efficiency | (1 - actions_taken / max_steps) | 0.15 |
| Stability | (1 - overcorrections / total_actions) | 0.10 |
**Final Score** = weighted sum (0 to 1)
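The weighted sum can be computed directly from the table above (a sketch; it assumes each metric has already been clamped to [0, 1] as in the definitions):

```python
def final_score(consistency, cost_efficiency, action_efficiency, stability):
    """Weighted sum from the evaluation table (weights total 1.0)."""
    return (0.40 * consistency
            + 0.35 * cost_efficiency
            + 0.15 * action_efficiency
            + 0.10 * stability)
```

For example, an agent that is perfectly action-efficient and stable but reaches 0.95 consistency at 0.72 cost efficiency scores `final_score(0.95, 0.72, 1.0, 1.0) ≈ 0.88`.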
## 📈 Baseline Results

Baseline agent: simple greedy fix strategy (always fix the next available error).
| Scenario | Consistency | Cost Efficiency | Final Score |
|---|---|---|---|
| Simple (20 txns, $200) | 0.95 | 0.72 | 0.81 |
| Cascading (30 txns, $150) | 0.78 | 0.45 | 0.65 |
| Complex (50 txns, $200) | 0.62 | 0.38 | 0.54 |
## 🔧 Docker Deployment

```bash
# Build image (note: "+" is not allowed in Docker tags)
docker build -t auditrepairenv-plus .

# Run locally
docker run -p 8000:8000 auditrepairenv-plus

# Or deploy to Hugging Face Spaces with the Docker SDK
```
## 📁 File Structure

```
.
├── chronostasis/
│   ├── __init__.py
│   └── ledger_repair_env.py   # Core RL environment
├── server/
│   ├── app.py                 # FastAPI server
│   └── static/
│       └── index.html
├── pyproject.toml
├── requirements.txt
├── Dockerfile
└── README.md
```
## ❓ FAQ

**Q1: Why use RL instead of a solver?**
The system changes after every action. Classic optimization solvers assume static problems. RL naturally handles sequential decision-making where each step affects the next.
**Q2: Is this realistic?**
Yes. Financial reconciliation systems regularly face interdependent errors where fixing one entry impacts others. This is exactly what auditors deal with.
**Q3: How do you measure success?**

Deterministic scoring: consistency ratio, cost efficiency, action count, and stability. No randomness; results are reproducible every time.
**Q4: What makes the hard task difficult?**
Hidden dependencies. Fixing entry A might silently corrupt entries B and C, which become visible only after subsequent checks. The agent must learn to be cautious.
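That cascade can be pictured with a tiny dependency model (purely hypothetical, not the environment's internal representation):

```python
def apply_fix(entry, errors, dependents):
    """Fix `entry`, then surface its dependents as new (hidden) errors."""
    return (errors - {entry}) | set(dependents.get(entry, []))

# Hypothetical dependency graph: fixing A corrupts B and C.
deps = {"A": ["B", "C"]}
print(apply_fix("A", {"A"}, deps))  # → {"B", "C"}: one error became two
```

Greedily fixing A here trades one visible error for two hidden ones, which is exactly the trap the Stability metric penalizes.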
**Q5: Can I use my own agent?**
Yes! The environment is Gymnasium-compatible. Use any RL framework (Stable Baselines3, RLlib, etc.) or hand-coded policies.
**Q6: What's the license?**
MIT. Free to use, modify, and distribute.
## 🤝 Contributing
Found a bug? Have an idea for a harder task variant? Open an issue or PR!
## 📚 Citation

If you use AuditRepairEnv++ in your research, please cite:

```bibtex
@software{auditrepairenv2024,
  title={AuditRepairEnv++: RL Environment for Cost-Constrained Iterative Ledger Repair},
  author={Your Name},
  year={2024},
  url={https://github.com/your-repo/auditrepairenv-plus}
}
```
Built with ❤️ for the AI community. Let's teach agents to be careful accountants.