---
title: Contract Validation Environment Server
emoji: 📝
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Contract Validation Environment
The **Contract Validation Environment** is an OpenEnv-compliant RL and LLM benchmark designed to test an agent's ability to act as a precise legal assistant. The agent must review various contract clauses, identify specific legal risks (e.g., liability, termination, payment), and correctly flag them without generating false positives on standard, safe clauses.
## 🚀 Motivation
Legal contract review is a massive industry, but off-the-shelf LLMs often struggle with "alert fatigue", flagging everything as a risk. This environment challenges agents to precisely isolate genuine liabilities across varying difficulty levels while explicitly rewarding speed and accuracy.
---
## 🎯 Tasks & Difficulty
The environment features 3 deterministic tasks with increasing complexity:
1. **Easy:** 1 clause. Contains a single, explicit liability risk. Tests basic risk identification.
2. **Medium:** 3 clauses. Requires identifying payment and termination risks while actively ignoring a safe governing-law distractor clause.
3. **Hard:** 5 clauses. A complex mix of confidentiality, liability, and compliance risks interspersed with dense, safe, standard boilerplate clauses. Challenges frontier models to avoid false positives.
---
## 📊 Environment Details
### Action Space (`ContractValidationAction`)
- `clause_id` (int): The ID of the clause being reviewed (set to `0` when submitting the final answer).
- `risk_type` (str): The identified risk (e.g., 'liability', 'payment', 'termination', 'confidentiality', 'compliance', or 'none').
- `submit_final` (bool): Set to `True` when the agent has finished flagging risks to end the episode and receive a final score.
- `explanation` (str): The agent's chain-of-thought or reasoning for the decision.
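
As an illustration of the fields above, here is a minimal sketch of an action object. It uses a standard-library dataclass as a stand-in for the Pydantic model in `models.py`; the default values, the example explanation text, and the use of `risk_type="none"` on the final submission are assumptions, not confirmed behavior of the environment.

```python
from dataclasses import dataclass


@dataclass
class ContractValidationAction:
    """Stand-in for the real Pydantic model; field names follow the list above."""

    clause_id: int               # ID of the clause being reviewed (0 on final submission)
    risk_type: str               # e.g. "liability", "payment", ..., or "none"
    submit_final: bool = False   # default is an assumption
    explanation: str = ""        # default is an assumption


# Flag clause 1 as a liability risk (the explanation text is invented).
flag = ContractValidationAction(
    clause_id=1,
    risk_type="liability",
    explanation="Clause 1 imposes uncapped liability on one party.",
)

# End the episode; using risk_type="none" here is an assumption.
final = ContractValidationAction(clause_id=0, risk_type="none", submit_final=True)
```

One action is sent per step, so an agent would typically emit one flagging action per risky clause and then a single final submission.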
### Observation Space (`ContractValidationObservation`)
- `task_level` (str): Difficulty level of the current task ("easy", "medium", "hard").
- `contract_clauses` (list): List of dictionaries containing the `id` and `text` of the contract clauses to review.
- `flagged_risks` (dict): A dictionary mapping clause IDs to the risks currently flagged by the agent.
- `step_count` (int): Number of steps taken in the current episode.
- `reward` (float): The reward delta granted for the most recent action.
- `done` (bool): Whether the episode has concluded.
- `info` (dict): Additional environment info, including the current internal `score` from the grader.
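
To make the observation layout concrete, the sketch below treats an observation as a plain dictionary with the fields listed above and picks out the clauses that have no entry in `flagged_risks` yet. The clause texts, the helper name `clauses_without_flags`, and the key type of `flagged_risks` are hypothetical; the real environment returns a `ContractValidationObservation` object.

```python
def clauses_without_flags(observation: dict) -> list[dict]:
    """Return the clauses that have no entry in `flagged_risks` yet.

    Keys are compared as strings because JSON transport may turn integer
    clause IDs into strings (an assumption about the wire format).
    """
    flagged = {str(key) for key in observation["flagged_risks"]}
    return [
        clause
        for clause in observation["contract_clauses"]
        if str(clause["id"]) not in flagged
    ]


# Invented example observation; the clause texts are placeholders.
example_observation = {
    "task_level": "medium",
    "contract_clauses": [
        {"id": 1, "text": "Payment is due within 5 days of invoice."},
        {"id": 2, "text": "Either party may terminate without notice."},
        {"id": 3, "text": "This agreement is governed by the laws of the State."},
    ],
    "flagged_risks": {1: "payment"},
    "step_count": 1,
    "reward": 0.0,
    "done": False,
    "info": {"score": 0.0},
}

remaining = clauses_without_flags(example_observation)
print([clause["id"] for clause in remaining])
```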
### Reward Function & Grader
The environment utilizes a **trajectory-based reward system**. The grader calculates a score between `0.0` and `1.0` based on precision and recall.
* **Positive Reward:** Granted for newly correct flags.
* **Negative Penalty:** Applied for flagging safe clauses or assigning the wrong risk type.
* **Step Penalty:** A `-0.02` penalty is applied per step to encourage the agent to evaluate the contract efficiently. Each reward is then clamped to a minimum of `0.0` (that is, `max(0.0, reward)`) to ensure compatibility with OpenEnv graders.
* **Completion Bonus:** A `+0.5` bonus is awarded if the agent submits the contract with a perfect 1.0 grader score.
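
The rules above can be sketched as follows. This is a rough illustration, not the environment's actual grader: combining precision and recall as an F1 score, computing the per-step reward as the change in score, and the function names `grade` and `step_reward` are all assumptions. Only the `-0.02` step penalty, the `+0.5` completion bonus, and the clamp at `0.0` come from the description.

```python
STEP_PENALTY = 0.02       # from the description above
COMPLETION_BONUS = 0.5    # from the description above


def grade(flagged: dict[int, str], truth: dict[int, str]) -> float:
    """Score flags in [0.0, 1.0] from precision and recall (F1 is an assumption)."""
    flagged_set = {(cid, risk) for cid, risk in flagged.items() if risk != "none"}
    truth_set = set(truth.items())
    if not flagged_set or not truth_set:
        # Perfect only when both sides are empty.
        return 1.0 if flagged_set == truth_set else 0.0
    hits = len(flagged_set & truth_set)  # right clause AND right risk type
    if hits == 0:
        return 0.0
    precision = hits / len(flagged_set)
    recall = hits / len(truth_set)
    return 2 * precision * recall / (precision + recall)


def step_reward(prev_score: float, new_score: float, submit_final: bool) -> float:
    """Score delta minus the step penalty, plus the bonus, clamped at 0.0."""
    reward = (new_score - prev_score) - STEP_PENALTY
    if submit_final and new_score == 1.0:
        reward += COMPLETION_BONUS
    return max(0.0, reward)


# Hypothetical ground truth for a two-risk task.
truth = {1: "payment", 2: "termination"}
partial = grade({1: "payment"}, truth)                      # one of two risks found
perfect = grade({1: "payment", 2: "termination"}, truth)    # both risks found
print(step_reward(0.0, partial, submit_final=False))
print(step_reward(perfect, perfect, submit_final=True))
```

Under these assumptions, a wrong flag lowers the score and so produces a negative delta, but the clamp means the agent observes a reward of `0.0` for that step rather than a negative number.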
---
## Project Structure
```
contract_validation/
├── .dockerignore      # Docker build exclusions
├── .env               # Local environment variables (API keys - DO NOT COMMIT)
├── .gitignore         # Git tracking exclusions (ignores .env, caches, etc.)
├── __init__.py        # Module exports
├── README.md          # Project documentation (with tags: - openenv)
├── openenv.yaml       # OpenEnv manifest
├── pyproject.toml     # Project metadata and dependencies
├── uv.lock            # Locked dependencies (generated)
├── client.py          # ContractValidationEnv client
├── inference.py       # Evaluation script for the OpenEnv grader (JSON logging)
├── models.py          # Action and Observation Pydantic models
├── Dockerfile         # Container image definition
└── server/
    ├── __init__.py                          # Server module exports
    ├── contract_validation_environment.py   # Core environment logic and task data
    └── app.py                               # FastAPI application (HTTP + WebSocket endpoints)
```