title: Corp AI
emoji: 🏒
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
tags:
  - openenv
  - gradio
  - reinforcement-learning
  - compliance

📊 AuditEnv

AuditEnv is an OpenEnv-style reinforcement learning environment for autonomous compliance auditing.

Where a traditional game-playing agent earns points for eating pac-dots, an AuditEnv agent earns points (rewards) for finding fraud, policy violations, or access anomalies in corporate data. The goal is to teach AI models to act as corporate compliance auditors.

The environment simulates real audit workflows and exposes deterministic, graded episodes across three difficulty levels.


What is AuditEnv? (The Core Concept)

In Reinforcement Learning (RL), the loop works like this:

  1. State/Observation: The environment gives the AI some data (e.g., procurement invoices, employee access logs).
  2. Action: The AI decides what to do (e.g., "submit_finding" for a fake invoice, or "noop" if everything looks fine).
  3. Reward: The environment grades the AI's action. Did it catch the fraud? (+1.0). Did it falsely accuse someone? (0.0).

AuditEnv implements this loop as a standardized simulation framework that follows the OpenEnv standard, so any AI agent can connect to it.
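
The observation, action, reward cycle can be sketched with a toy, in-memory environment. This is not the AuditEnv implementation: `ToyAuditEnv`, the invoice fields, the `is_fraud` label, and the amount threshold are all hypothetical, and rewarding a correct `noop` with 1.0 is an assumption. Only the action names and the 1.0 / 0.0 reward values come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ToyAuditEnv:
    """Hypothetical in-memory stand-in for an audit environment."""

    invoices: list  # each item: {"id": str, "amount": float, "is_fraud": bool}
    cursor: int = 0

    def reset(self) -> dict:
        """Start a new episode and return the first observation."""
        self.cursor = 0
        return self._observe()

    def _observe(self) -> dict:
        # The ground-truth label is hidden from the agent.
        inv = self.invoices[self.cursor]
        return {"id": inv["id"], "amount": inv["amount"]}

    def step(self, action: str):
        """Grade one action and return (observation, reward, done)."""
        is_fraud = self.invoices[self.cursor]["is_fraud"]
        # Assumption: 1.0 for a correct call (including a correct noop),
        # 0.0 for a missed fraud or a false accusation.
        correct = (action == "submit_finding") == is_fraud
        reward = 1.0 if correct else 0.0
        self.cursor += 1
        done = self.cursor >= len(self.invoices)
        obs = None if done else self._observe()
        return obs, reward, done


def threshold_agent(obs: dict) -> str:
    """Naive policy: flag any invoice above an arbitrary amount."""
    return "submit_finding" if obs["amount"] > 10_000 else "noop"


def run_episode(env: ToyAuditEnv, policy) -> float:
    """Loop observation -> action -> reward and return the mean reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total / len(env.invoices)
```

Any callable that maps an observation to an action name can be passed as `policy`, which is the same contract an LLM-backed agent would have to satisfy.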


How the Codebase is Structured

  • The API (src/auditenv/server.py): A FastAPI web server that acts as the "game engine". It has three main endpoints:
    • POST /reset: Starts a new audit scenario.
    • GET /state: Lets the AI look at the current documents/data.
    • POST /step: The AI submits its action here, and the server replies with the reward and whether the audit is done.
  • The Rules (src/auditenv/models.py): Defines exactly what the data looks like using Pydantic. It ensures the AI can only take specific actions: submit_finding, flag_human_review, or noop.
  • The Logic & Referee (src/auditenv/grader.py): This is the core of the environment. When the AI takes an action, this module calculates a deterministic score between 0.0 (total failure) and 1.0 (perfect audit).
  • The World Builders (src/auditenv/datasets/): Scripts that take messy real-world procurement datasets and turn them into controlled "episodes" for the AI to solve.
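
To illustrate how a closed action set can be enforced, here is a minimal sketch using only the standard library as a stand-in for the Pydantic models. The three action names come from the list above; `AuditAction`, `target_id`, `rationale`, and the `action_type` key are hypothetical and may not match `src/auditenv/models.py`.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    """The only actions an agent may take."""

    SUBMIT_FINDING = "submit_finding"
    FLAG_HUMAN_REVIEW = "flag_human_review"
    NOOP = "noop"


@dataclass(frozen=True)
class AuditAction:
    action_type: ActionType
    target_id: str = ""  # hypothetical: the record the action refers to
    rationale: str = ""  # hypothetical: free-text justification

    @classmethod
    def from_dict(cls, payload: dict) -> "AuditAction":
        """Validate a raw payload; unknown action names raise ValueError."""
        return cls(
            action_type=ActionType(payload["action_type"]),
            target_id=str(payload.get("target_id", "")),
            rationale=str(payload.get("rationale", "")),
        )
```

Rejecting unknown actions at the boundary keeps the grader simple, since it only ever sees one of the three valid action types.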

Structure Tree

AI_AUDIT/
+-- configs/       # YAML configs for dataset mapping & reward logic
+-- data/          # Local dataset folder (ignored in git)
+-- docker/        # Dockerization for the environment
+-- scripts/       # Baselines and data checkers
+-- src/auditenv/  # Core environment server, State, Models, and Graders
+-- tests/         # Unit and smoke tests


Tasks & Difficulty Levels

OpenEnv metadata is defined in openenv.yaml. All tasks return deterministic, normalized rewards in the range [0.0, 1.0].

  1. Easy: Expense Report Audit (Single policy checks)
  2. Medium: Access Control Review (Cross-referencing logs)
  3. Hard: Multi-System Fraud Detection (Complex duplicate and anomaly hunting)
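
As a rough illustration of what a deterministic, normalized reward can look like, the sketch below scores a set of submitted finding IDs against a ground-truth set. The scoring rule (true positives minus false positives, divided by the number of real issues, clamped to [0.0, 1.0]) and the function name `grade_episode` are assumptions made for this example; the actual rules live in `src/auditenv/grader.py` and the YAML configs.

```python
def grade_episode(submitted: set, ground_truth: set) -> float:
    """Return a deterministic score in [0.0, 1.0] (hypothetical rule)."""
    if not ground_truth:
        # Nothing to find: a clean report is perfect, any accusation is not.
        return 1.0 if not submitted else 0.0
    true_pos = len(submitted & ground_truth)   # real issues that were caught
    false_pos = len(submitted - ground_truth)  # false accusations
    raw = (true_pos - false_pos) / len(ground_truth)
    # Clamp so that heavy over-flagging bottoms out at 0.0.
    return max(0.0, min(1.0, raw))
```

Because the function depends only on its two inputs, the same episode always receives the same score, and partial credit falls out naturally when only some issues are found.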

Current Status (As of April 2026)

We have successfully built the "game engine" and the levels!

  • Core requirements implemented (FastAPI, Pydantic models, strict OpenEnv compliance).
  • 3 difficulty levels built with deterministic grading rules.
  • Data pipeline connected, processing external data (data/procurement_invoice_fraud) with smart duplicate-preservation cleaning.
  • Fully tested, with all 10 unit tests passing.
  • Heuristic Baseline Agent (scripts/run_baseline.py) built, scoring about 0.23 out of 1.0. This indicates that the environment gives partial rewards and penalizes bad behavior as intended.

What Needs to be Done Next (Roadmap)

Now that the simulator is built, the next steps are pure Machine Learning: hooking up LLMs to this environment and optimizing them.

Phase A: The LLM Baseline

Our OpenAI baseline is currently blocked by quota limits.

  • Action: Add billing credits to the OpenAI key, or swap out the OpenAI code in scripts/run_baseline.py for a free local model (like Llama-3 or Mistral) served through Ollama or vLLM.
  • Goal: See what an untuned, prompt-based LLM scores on Easy, Medium, and Hard.
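
Whichever backend ends up serving the model, a prompt-based baseline needs two pieces of glue: a prompt built from the observation, and a tolerant parser that turns free-form model text into a valid action. The sketch below shows one way to do that. The prompt wording, the `"action"` JSON key, and both function names are assumptions for illustration; the model call itself is deliberately left out.

```python
import json

# Action names taken from the environment's action set.
ALLOWED_ACTIONS = {"submit_finding", "flag_human_review", "noop"}


def build_prompt(observation: dict) -> str:
    """Render an observation as an instruction for a chat or completion model."""
    return (
        "You are a compliance auditor. Review the record below and reply "
        'with JSON only, e.g. {"action": "noop"}.\n'
        f"Allowed actions: {sorted(ALLOWED_ACTIONS)}\n"
        f"Record: {json.dumps(observation, sort_keys=True)}"
    )


def parse_action(reply: str) -> str:
    """Extract a valid action from raw model text, defaulting to noop."""
    # Models often wrap JSON in extra prose, so take the outermost braces.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return "noop"
    try:
        action = json.loads(reply[start:end + 1]).get("action")
    except (json.JSONDecodeError, AttributeError):
        return "noop"
    if isinstance(action, str) and action in ALLOWED_ACTIONS:
        return action
    return "noop"
```

Falling back to `noop` on malformed output keeps an episode running, at the cost of hiding formatting failures; logging those cases separately would make the baseline score easier to interpret.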

Phase B: The Reinforcement Learning (RL) Pipeline

Train an open-source model using policy optimization algorithms like GRPO or PPO.

  • Action: Scaffold src/training/grpo_train.py.
  • Goal: The AI tries an audit task, the environment grades it, and the RL algorithm updates the AI's weights. Loop this thousands of times until the model achieves perfect 1.0 scores!
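
For GRPO specifically, the environment's score feeds into training through group-relative advantages: several completions are sampled for the same task, each is graded, and every reward is normalized against the mean and standard deviation of its own group. The sketch below shows only that step. It is not a training loop, the function name is hypothetical, and using the population standard deviation with a small `eps` is a simplifying choice that RL libraries may make differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize each reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    # eps avoids division by zero when every sample got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

A group where every sample scores the same yields all-zero advantages and therefore no learning signal, which is one reason partial rewards between 0.0 and 1.0 are useful for this kind of training.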

Phase C: Multi-modal Support (Future Expansion)

Right now, the AI only parses text/tabular data.

  • Action: Integrate optical character recognition (OCR) to parse raw invoice images in images/.
  • Goal: Allow the AI to spot visual anomalies directly on physical invoice scans.

Setup & Installation

1) Create environment and install dependencies

Using uv (recommended):

uv venv
.\.venv\Scripts\Activate.ps1
uv pip install -r requirements.txt

2) Run the Server

uvicorn auditenv.server:app --reload --app-dir src

(Endpoints available: GET /health, POST /reset, POST /step, GET /state)
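
Once the server is running, the endpoints can be exercised with a small standard-library client such as the sketch below. The base URL assumes uvicorn's default host and port (127.0.0.1:8000). The request bodies, in particular the empty `/reset` body and the `action_type` key sent to `/step`, are assumptions and may need adjusting to the schemas in `src/auditenv/models.py`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Create a JSON request for one of the environment endpoints."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method: str, path: str, payload=None) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test() -> None:
    """Walk through one minimal episode; requires a running server."""
    print(call("GET", "/health"))
    print(call("POST", "/reset", {}))  # assumed: empty JSON body
    print(call("GET", "/state"))
    print(call("POST", "/step", {"action_type": "noop"}))  # assumed shape
```

Calling `smoke_test()` from a Python shell is a quick way to confirm that all four endpoints respond before pointing an agent at the server.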