---
title: AuditRepairEnv++
emoji: 🔧
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
tags:
- openenv
- ledger-repair
- reinforcement-learning
- dependency-propagation
pinned: false
---
# AuditRepairEnv++: Cost-Constrained Iterative Ledger Repair
**OpenEnv Environment | RL for Financial Ledger Auditing**
An RL environment where an AI agent must repair inconsistencies in a financial ledger. Errors are interdependent: fixing one entry may introduce new errors in dependent entries. The agent must maximize ledger correctness while minimizing cost and avoiding overcorrection, all under a limited budget.
---
## Problem Description
A financial ledger contains entries where `value ≠ expected_value` (errors). These errors are interconnected through a **dependency graph** (hidden from the agent in `hard` mode): fixing one entry can cascade changes to the `expected_value` of dependent entries, potentially creating new errors.
The agent has a **limited action budget** and must strategically choose which entries to fix, and in what order, to:
1. **Maximize consistency**: fix as many errors as possible
2. **Minimize cost**: use the fewest actions possible
3. **Avoid overcorrection**: don't fix entries that are already correct
---
## Solution Approach
**AuditRepairEnv++** addresses this challenge by:
1. **Modeling Real Dependencies**: Entries are linked through a dependency DAG, simulating cascading effects in real ledgers
2. **Cost-Constrained Optimization**: Agents must repair ledgers within a limited budget, forcing strategic decisions
3. **Multi-Objective Scoring**: Balances correctness, efficiency, and overcorrection penalties
4. **Scalable Difficulty**: Three task levels (easy/medium/hard) with increasing complexity
5. **OpenEnv-Compatible API**: Standard HTTP endpoints for seamless integration with any LLM agent
This environment tests an LLM agent's ability to:
- Parse complex structured state (ledger + dependencies)
- Reason about side effects (dependency propagation)
- Plan multi-step actions under uncertainty
- Handle budget constraints and trade-offs
---
## RL Reasoning
This environment tests **multi-step decision making** under uncertainty:
- **State**: The current ledger, errors, remaining budget, and step count
- **Actions**: FIX_ENTRY, ADJUST_ENTRY, REVERT_ENTRY, NO_OP
- **Transitions**: Non-trivial due to dependency propagation
- **Reward**: Composite score based on consistency, efficiency, budget usage, and overcorrection penalties
The key challenge is that actions have **side effects** (dependency propagation), requiring the agent to plan ahead and reason about cascading consequences.
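To make the side effect concrete, here is a minimal sketch of how a fix could cascade. The propagation rule used here (each dependent's `expected_value` shifts by the amount the fixed entry changed) and the `Entry` and `fix_entry` names are assumptions for illustration only; the real rule lives in `tasks.py` and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    id: int
    value: int
    expected_value: int
    dependencies: list[int] = field(default_factory=list)


def fix_entry(ledger: dict[int, Entry], entry_id: int) -> list[int]:
    """Apply FIX_ENTRY and return the ids of entries in error afterwards.

    Hypothetical rule: each dependent's expected_value shifts by the
    amount the fixed entry's value changed.
    """
    entry = ledger[entry_id]
    shift = entry.expected_value - entry.value
    entry.value = entry.expected_value
    for dep_id in entry.dependencies:
        ledger[dep_id].expected_value += shift
    return [e.id for e in ledger.values() if e.value != e.expected_value]


ledger = {
    1: Entry(1, 180, 200, [3, 5]),
    3: Entry(3, 50, 50),
    5: Entry(5, 70, 70),
}
errors_after = fix_entry(ledger, 1)
print(errors_after)  # [3, 5]: both were correct before entry 1 was fixed
```

Under this assumed rule, one fix removes one error and creates two, which is why repair order matters once dependencies are present.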
---
## Action Space
| Action | Description | Cost |
|--------|-------------|------|
| `FIX_ENTRY <id>` | Sets `value = expected_value` for the entry. Triggers dependency updates. | 1 |
| `ADJUST_ENTRY <id> <delta>` | Increments/decrements the entry's value by delta. | 1 |
| `REVERT_ENTRY <id>` | Undoes the last change to an entry. | 1 |
| `NO_OP` | Does nothing. No budget cost. | 0 |
### Action Model (Pydantic)
```python
from pydantic import BaseModel


class AuditAction(BaseModel):
    action_type: str   # FIX_ENTRY | ADJUST_ENTRY | REVERT_ENTRY | NO_OP
    target_id: int     # ID of the ledger entry (not needed for NO_OP)
    adjust_delta: int  # +/- value for ADJUST_ENTRY
```
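An LLM agent typically emits the text form shown in the action table (for example `ADJUST_ENTRY 3 -20`), which then has to be mapped onto the three model fields. The sketch below is one way to do that with a plain dict; `parse_action` is a hypothetical helper, and both the fallback to `NO_OP` on malformed input and the placeholder `target_id` of `0` for `NO_OP` are assumptions, not documented behavior.

```python
VALID_ACTIONS = {"FIX_ENTRY", "ADJUST_ENTRY", "REVERT_ENTRY", "NO_OP"}


def parse_action(text: str) -> dict:
    """Turn a command such as 'ADJUST_ENTRY 3 -20' into an action payload.

    Malformed input falls back to NO_OP (an assumed policy).
    """
    no_op = {"action_type": "NO_OP", "target_id": 0, "adjust_delta": 0}
    parts = text.strip().split()
    if not parts or parts[0].upper() not in VALID_ACTIONS:
        return no_op
    action_type = parts[0].upper()
    if action_type == "NO_OP":
        return no_op
    try:
        target_id = int(parts[1])
        # Only ADJUST_ENTRY carries a delta; other actions use 0.
        delta = int(parts[2]) if action_type == "ADJUST_ENTRY" else 0
    except (IndexError, ValueError):
        return no_op
    return {"action_type": action_type, "target_id": target_id, "adjust_delta": delta}


print(parse_action("ADJUST_ENTRY 3 -20"))
```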
---
## Observation Space
```json
{
  "task_id": "medium",
  "task_description": "Repair a financial ledger with 8 entries...",
  "ledger": [
    {"id": 0, "value": 100, "expected_value": 100, "dependencies": []},
    {"id": 1, "value": 180, "expected_value": 200, "dependencies": [3, 5]}
  ],
  "errors": [
    {"entry_id": 1, "current_value": 180, "expected_value": 200, "delta": -20}
  ],
  "remaining_budget": 12,
  "initial_budget": 12,
  "step": 0,
  "max_steps": 15,
  "done": false
}
```
> **Note**: In `hard` mode, the `dependencies` list is hidden (shown as `[]`), requiring the agent to discover dependency effects through interaction.
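As an illustration of consuming this observation, the sketch below implements a deliberately naive policy: fix the entry with the largest absolute `delta` while budget remains. `choose_action` is a hypothetical name, and the policy ignores dependencies entirely, so it is a starting point rather than a recommended strategy.

```python
def choose_action(obs: dict) -> str:
    """Greedy baseline: fix the entry with the largest absolute delta."""
    if obs["done"] or obs["remaining_budget"] <= 0 or not obs["errors"]:
        return "NO_OP"
    worst = max(obs["errors"], key=lambda e: abs(e["delta"]))
    return f"FIX_ENTRY {worst['entry_id']}"


# Trimmed observation; the second error is made up for the example.
obs = {
    "errors": [
        {"entry_id": 1, "current_value": 180, "expected_value": 200, "delta": -20},
        {"entry_id": 4, "current_value": 95, "expected_value": 90, "delta": 5},
    ],
    "remaining_budget": 12,
    "done": False,
}
print(choose_action(obs))  # FIX_ENTRY 1
```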
---
## Tasks
### Task 1: Easy Ledger Repair · `easy` · max 10 steps · budget 10
> 5 independent entries, 3 errors, no dependencies.
The simplest tier: errors are independent and can be fixed in any order. Tests basic comprehension and action selection.
### Task 2: Medium Ledger Repair · `medium` · max 15 steps · budget 12
> 8 entries with visible dependencies and moderate budget.
Fixing entry 1 changes `expected_value` of entries 3 and 5. The agent must reason about repair ordering to avoid creating new errors.
### Task 3: Hard Ledger Repair · `hard` · max 12 steps · budget 8
> 10 entries with HIDDEN dependency graph. Cascading errors. Tight budget.
Dependencies are **not visible** in observations. Fixing entries triggers hidden cascades. Overcorrection is heavily penalized. Requires exploration and strategic planning.
---
## Reward / Scoring Logic
Final score is computed **deterministically** (no randomness):
```
score = 0.5 × consistency_score
      + 0.3 × efficiency_score
      + 0.2 × budget_remaining_ratio
      − overcorrection_penalty
```
Where:
- `consistency_score` = `correct_entries / total_entries`
- `efficiency_score` = `optimal_steps / actual_steps` (capped at 1.0)
- `budget_remaining_ratio` = `remaining_budget / initial_budget`
- `overcorrection_penalty` = `0.05 × overcorrection_count`
Final score is clamped to **[0.0, 1.0]**.
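The formula translates directly into a few lines of Python. This is a sketch of the definitions above, not the code in `tasks.py`; the function name `compute_score` is hypothetical, and treating a zero-step episode as efficiency `0.0` is an assumption, since the formula does not define that case.

```python
def compute_score(
    correct_entries: int,
    total_entries: int,
    optimal_steps: int,
    actual_steps: int,
    remaining_budget: int,
    initial_budget: int,
    overcorrection_count: int,
) -> float:
    """Composite score, clamped to [0.0, 1.0]."""
    consistency = correct_entries / total_entries
    # Assumption: an episode with no steps earns no efficiency credit.
    efficiency = min(1.0, optimal_steps / actual_steps) if actual_steps > 0 else 0.0
    budget_ratio = remaining_budget / initial_budget
    penalty = 0.05 * overcorrection_count
    raw = 0.5 * consistency + 0.3 * efficiency + 0.2 * budget_ratio - penalty
    return max(0.0, min(1.0, raw))


# Illustrative hard-task episode: 8 of 10 entries correct, 5 steps taken
# where 4 would have sufficed, 4 of 8 budget left, one overcorrection.
score = compute_score(8, 10, 4, 5, 4, 8, 1)
print(f"{score:.2f}")  # 0.69 = 0.40 + 0.24 + 0.10 - 0.05
```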
---
## Setup & Running
### Local
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Start the environment server
python server.py
# 3. Set env vars for inference
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="hf_..."
# 4. Run the inference agent
python inference.py
```
### Docker
```bash
docker build -t auditrepairenv .
docker run -p 7860:7860 \
  -e HF_TOKEN=hf_... \
  auditrepairenv
```
### How to run inference.py
```bash
# Set required environment variables
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="hf_..."
export ENV_BASE_URL="http://localhost:7860"
# Run the agent (runs all 3 tasks: easy, medium, hard)
python inference.py
```
The inference script will:
1. Connect to the environment server at `ENV_BASE_URL`
2. Run each task (easy → medium → hard) sequentially
3. Use the LLM to decide repair actions at each step
4. Print structured logs in the required format
5. Output final scores for each task
### Validate
```bash
# Verify the space is running
curl -X POST http://localhost:7860/reset -d '{"task_id":"easy"}' -H "Content-Type: application/json"
# Check health
curl http://localhost:7860/health
```
---
## Baseline Results
Baseline agent: `inference.py` with `Qwen/Qwen2.5-72B-Instruct`
| Task | Score |
|--------|-------|
| easy | 0.90 |
| medium | 0.70 |
| hard | 0.55 |
---
## Deployment & Submission
### Submission Checklist
Before submitting, verify:
**Files at root**:
- [ ] `inference.py`: exactly at root (not in a subfolder)
- [ ] `requirements.txt`: all dependencies listed
- [ ] `README.md`: clear setup instructions
- [ ] `demo.py`: working Gradio UI
- [ ] `Dockerfile`: builds successfully
**inference.py Requirements**:
- [ ] Reads `HF_TOKEN` env variable
- [ ] Reads `API_BASE_URL` with default
- [ ] Reads `MODEL_NAME` with default
- [ ] **Validates** `HF_TOKEN` and raises error if missing
- [ ] Uses OpenAI Python client (not raw HTTP)
- [ ] Prints `[START]` at beginning
- [ ] Prints `[STEP]` per step with action and reward
- [ ] Prints `[END]` at end (even on error)
- [ ] Formats rewards to 2 decimal places
- [ ] Prints booleans as lowercase (`true`/`false`)
- [ ] Step count matches actual steps taken
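The environment-variable items in this list could be satisfied along the following lines. This is a sketch, not the contents of `inference.py`: `load_config` is a hypothetical name, and using the values from the setup section as defaults is an assumption.

```python
import os


def load_config(env=os.environ) -> dict:
    """Read inference settings and fail early if HF_TOKEN is absent."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export HF_TOKEN before running inference.py")
    return {
        # Assumed defaults, taken from the setup commands in this README.
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        "hf_token": token,
    }
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.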
**Output Format**:
```
[START]
Task: easy
[STEP]
Action: FIX_ENTRY 1
Reward: 0.20
[STEP]
Action: NO_OP
Reward: 0.00
[END]
Final Score: 0.85
```
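A small formatter is enough to produce this layout, including the two-decimal rewards. `format_episode` is a hypothetical helper, shown as a sketch of the format above rather than the logging code in `inference.py`.

```python
def format_episode(task_id: str, steps: list[tuple[str, float]], final_score: float) -> str:
    """Render one episode in the [START] / [STEP] / [END] layout."""
    lines = ["[START]", f"Task: {task_id}"]
    for action, reward in steps:
        lines += ["[STEP]", f"Action: {action}", f"Reward: {reward:.2f}"]
    lines += ["[END]", f"Final Score: {final_score:.2f}"]
    return "\n".join(lines)


log = format_episode("easy", [("FIX_ENTRY 1", 0.2), ("NO_OP", 0.0)], 0.85)
print(log)
```

To honor the "even on error" requirement, one option is to emit the `[END]` block from a `finally` clause around the episode loop.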
**Public GitHub Repo**:
- [ ] Repository is public
- [ ] All code is committed
- [ ] README has clear instructions
- [ ] Dockerfile is present and works
**Hugging Face Spaces Demo**:
- [ ] Space URL is public
- [ ] Space is built and running (not broken)
- [ ] `demo.py` loads successfully
- [ ] Inference runs end-to-end
- [ ] HF_TOKEN secret is set
**Resource Limits** (Free Tier):
- [ ] Model size fits in 8GB RAM
- [ ] Container stays within 2 vCPUs
- [ ] App starts in <60 seconds
- [ ] No unnecessary background services
### Hugging Face Spaces Deployment
For detailed deployment instructions, see [HF_SPACES_GUIDE.md](./HF_SPACES_GUIDE.md)
**Quick Start**:
1. **Prepare GitHub Repo**
   ```bash
   git add .
   git commit -m "Ready for submission"
   git push origin main
   ```
2. **Create HF Space**
   - Go to [huggingface.co/spaces/create](https://huggingface.co/spaces/create)
   - Choose **Docker** SDK
   - Link your GitHub repo
   - Set the `HF_TOKEN` secret in Settings
3. **Monitor Build**
   - Watch the Logs tab for build status
   - Wait for "Running" status
   - Access the app via its public URL
4. **Test**
   ```bash
   curl -X POST https://your-space.hf.space/reset \
     -d '{"task_id":"easy"}' \
     -H "Content-Type: application/json"
   ```
### Project Pitch
For pitching at hackathons, see [PITCH.md](./PITCH.md)
**30-second pitch:**
> "We built AuditRepairEnv++, an RL environment where AI agents repair financial ledgers with interdependent errors under budget constraints. Fixing one entry cascades changes to others, forcing agents to plan strategically. It benchmarks LLM reasoning on cost-constrained optimization."
### Troubleshooting
**Issue**: `inference.py` fails with "module not found"
- Verify `requirements.txt` is installed: `pip install -r requirements.txt`
**Issue**: `HF_TOKEN` error
- Generate token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- Export: `export HF_TOKEN="hf_..."`
**Issue**: Space shows "Application Error"
- Check Logs tab in HF Spaces
- Verify app listens on `0.0.0.0:7860`
- Ensure HF_TOKEN secret is set
**Issue**: "Out of memory" on Spaces
- Use smaller model or quantized version
- Reduce MAX_TOKENS in inference.py
- Consider upgrading Space tier
See [HF_SPACES_GUIDE.md](./HF_SPACES_GUIDE.md) for detailed troubleshooting.
---
## Project Structure
```
audit-repair-env/
├── inference.py          # Main submission file (MUST be at root)
├── server.py             # OpenEnv environment server
├── tasks.py              # Task definitions & environment logic
├── demo.py               # Gradio UI (minimal black aesthetic)
├── requirements.txt      # Python dependencies
├── Dockerfile            # Docker image definition
├── README.md             # This file
├── HF_SPACES_GUIDE.md    # Deployment instructions
├── PITCH.md              # Project pitch & overview
└── auditrepairenv/       # Python package (optional)
    └── __init__.py
```
---
## Documentation
- **[README.md](./README.md)**: This file; environment overview
- **[PITCH.md](./PITCH.md)**: Project pitch, problem statement, comparison to other benchmarks
- **[HF_SPACES_GUIDE.md](./HF_SPACES_GUIDE.md)**: Step-by-step Spaces deployment, troubleshooting, how HF Spaces works
- **[inference.py](./inference.py)**: Submission script with `HF_TOKEN` validation
- **[demo.py](./demo.py)**: Live Gradio demo with dark theme
---
## Community & Support
- **GitHub Issues**: Report bugs or suggest features
- **Discussions**: Ask questions about the environment
- **Spaces Discussions**: Comment on the demo
---
## License
MIT License; see the LICENSE file.
---
## Citation
If you use AuditRepairEnv++ in your research, please cite:
```bibtex
@misc{auditrepairenv2024,
title={AuditRepairEnv++: Cost-Constrained Iterative Ledger Repair},
author={Your Name},
year={2024},
howpublished={Hugging Face Spaces},
url={https://huggingface.co/spaces/username/audit-repair-env}
}
```
---
**Good luck with your submission!**