---
title: Contract Validation Environment Server
emoji: 📝
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Contract Validation Environment

The **Contract Validation Environment** is an OpenEnv-compliant RL and LLM benchmark designed to test an agent's ability to act as a precise legal assistant. The agent must review various contract clauses, identify specific legal risks (e.g., liability, termination, payment), and correctly flag them without generating false positives on standard, safe clauses.

## 🚀 Motivation
Legal contract review is a massive industry, but off-the-shelf LLMs often struggle with "alert fatigue": flagging everything as a risk. This environment challenges agents to precisely isolate genuine liabilities across varying difficulty levels while explicitly rewarding speed and accuracy.

---

## 🎯 Tasks & Difficulty

The environment features 3 deterministic tasks with increasing complexity:

1. **Easy:** 1 clause. Contains a single, explicit liability risk. Tests basic risk identification.
2. **Medium:** 3 clauses. Requires identifying payment and termination risks while actively ignoring a safe governing-law distractor clause.
3. **Hard:** 5 clauses. A complex mix of confidentiality, liability, and compliance risks interspersed with dense, safe, standard boilerplate clauses. Challenges frontier models to avoid false positives.
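
As a rough illustration of how the three tasks could be represented, the sketch below maps each level to a hypothetical answer key. The clause counts and risk types follow the list above; the clause ordering, the split of the hard task into three risky and two safe clauses, and the names `ANSWER_KEYS` and `false_positives` are assumptions made for this example, not the environment's actual task data.

```python
# Hypothetical answer keys: clause id -> expected risk type.
# "none" marks a safe clause. Ordering and the 3-risky/2-safe split
# for "hard" are assumptions, not taken from the real task data.
ANSWER_KEYS = {
    "easy": {1: "liability"},
    "medium": {1: "payment", 2: "termination", 3: "none"},
    "hard": {
        1: "confidentiality",
        2: "none",
        3: "liability",
        4: "none",
        5: "compliance",
    },
}


def false_positives(level: str, flags: dict) -> list:
    """Return ids of safe clauses that were flagged with a risk."""
    key = ANSWER_KEYS[level]
    return sorted(
        cid
        for cid, risk in flags.items()
        if risk != "none" and key.get(cid, "none") == "none"
    )


# Flagging the safe distractor in the medium task counts as a false positive.
flagged = {1: "payment", 3: "liability"}
distractors_hit = false_positives("medium", flagged)
```

Under these assumptions, `distractors_hit` contains only clause `3`, the safe distractor.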

---

## 📊 Environment Details

### Action Space (`ContractValidationAction`)
- `clause_id` (int): The ID of the clause being reviewed (set to `0` when submitting the final answer).
- `risk_type` (str): The identified risk (e.g., 'liability', 'payment', 'termination', 'confidentiality', 'compliance', or 'none').
- `submit_final` (bool): Set to `True` when the agent has finished flagging risks to end the episode and receive a final score.
- `explanation` (str): The agent's chain-of-thought or reasoning for the decision.
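
The fields above can be sketched as a small data class. This is a minimal stand-in that uses standard-library dataclasses rather than the Pydantic model in `models.py`; the class name `ActionSketch`, the default values, and the treatment of the listed risk types as a closed set are assumptions for illustration.

```python
from dataclasses import dataclass, asdict

# Risk labels named in the README; treating them as exhaustive is an assumption.
RISK_TYPES = {
    "liability", "payment", "termination",
    "confidentiality", "compliance", "none",
}


@dataclass
class ActionSketch:
    """Hypothetical stand-in for ContractValidationAction."""
    clause_id: int = 0
    risk_type: str = "none"
    submit_final: bool = False
    explanation: str = ""

    def __post_init__(self):
        # Reject labels outside the assumed set early, before sending anything.
        if self.risk_type not in RISK_TYPES:
            raise ValueError(f"unknown risk_type: {self.risk_type!r}")


# Flag one clause, then end the episode with clause_id=0 and submit_final=True.
flag = ActionSketch(clause_id=1, risk_type="liability",
                    explanation="Uncapped indemnity.")
submit = ActionSketch(clause_id=0, submit_final=True,
                      explanation="All clauses reviewed.")
payload = asdict(flag)  # plain dict, ready to serialize as JSON
```

A typical episode would send one flagging action per risky clause, followed by a single submit action.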

### Observation Space (`ContractValidationObservation`)
- `task_level` (str): Difficulty level of the current task ("easy", "medium", "hard").
- `contract_clauses` (list): List of dictionaries containing the `id` and `text` of the contract clauses to review.
- `flagged_risks` (dict): A dictionary mapping clause IDs to the risks currently flagged by the agent.
- `step_count` (int): Number of steps taken in the current episode.
- `reward` (float): The reward delta granted for the most recent action.
- `done` (bool): Whether the episode has concluded.
- `info` (dict): Additional environment info, including the current internal `score` from the grader.
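
To show how an agent might consume these fields, the sketch below works on an observation represented as a plain dictionary and lists the clauses that have not been flagged yet. The helper name `pending_clauses`, the placeholder clause text, and the numeric values are hypothetical; the real observation is the `ContractValidationObservation` model.

```python
def pending_clauses(observation: dict) -> list:
    """Return clauses with no flag yet, in their original order."""
    # Keys may arrive as strings after JSON serialization, so normalize to int.
    flagged = {int(k) for k in observation["flagged_risks"]}
    return [c for c in observation["contract_clauses"] if c["id"] not in flagged]


# Placeholder observation shaped like the field list above (values are made up).
obs = {
    "task_level": "medium",
    "contract_clauses": [
        {"id": 1, "text": "<clause 1 text>"},
        {"id": 2, "text": "<clause 2 text>"},
        {"id": 3, "text": "<clause 3 text>"},
    ],
    "flagged_risks": {"1": "payment"},
    "step_count": 1,
    "reward": 0.0,
    "done": False,
    "info": {"score": 0.0},
}

remaining_ids = [c["id"] for c in pending_clauses(obs)]
```

Here `remaining_ids` holds clauses `2` and `3`, which the agent still has to review before setting `submit_final`.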

### Reward Function & Grader
The environment utilizes a **trajectory-based reward system**. The grader calculates a score between `0.0` and `1.0` based on precision and recall.
* **Positive Reward:** Granted for newly correct flags.
* **Negative Penalty:** Applied for flagging safe clauses or assigning the wrong risk type.
* **Step Penalty:** A `-0.02` penalty is applied per step to encourage the agent to evaluate the contract efficiently. Rewards are clamped at a floor of `0.0` (that is, `max(0.0, reward)`) to ensure compatibility with OpenEnv graders.
* **Completion Bonus:** A `+0.5` bonus is awarded if the agent submits the contract with a perfect 1.0 grader score.
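
The rules above can be sketched as follows. This is one possible reading, not the environment's grader: combining precision and recall as an F1 score, computing the per-step reward as the change in score minus the step penalty, and adding the bonus after clamping are all assumptions, as are the function names.

```python
STEP_PENALTY = 0.02      # per-step penalty from the README
COMPLETION_BONUS = 0.5   # bonus for submitting with a perfect score


def grade(flags: dict, truth: dict) -> float:
    """Score in [0, 1]; F1 is an assumed way to combine precision and recall."""
    risky = {cid: r for cid, r in truth.items() if r != "none"}
    made = {cid: r for cid, r in flags.items() if r != "none"}
    # A flag is correct only if both the clause and the risk type match.
    correct = sum(1 for cid, r in made.items() if risky.get(cid) == r)
    if correct == 0:
        return 0.0
    precision = correct / len(made)
    recall = correct / len(risky)
    return 2 * precision * recall / (precision + recall)


def step_reward(prev_score: float, new_score: float,
                submit_final: bool = False) -> float:
    """Score delta minus step penalty, floored at 0.0, plus an optional bonus."""
    reward = max(0.0, (new_score - prev_score) - STEP_PENALTY)
    if submit_final and new_score == 1.0:
        reward += COMPLETION_BONUS
    return reward


# Hypothetical answer key for a three-clause task with one safe clause.
truth = {1: "payment", 2: "termination", 3: "none"}
partial = grade({1: "payment"}, truth)                     # one of two risks found
perfect = grade({1: "payment", 2: "termination"}, truth)   # both risks, no extras
noisy = grade({1: "payment", 3: "liability"}, truth)       # one hit, one false positive
```

In this reading, a false positive lowers precision, so the score drops and the clamped step reward for that action is `0.0`.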

---

## Project Structure

```
contract_validation/
├── .dockerignore          # Docker build exclusions
├── .env                   # Local environment variables (API keys - DO NOT COMMIT)
├── .gitignore             # Git tracking exclusions (ignores .env, caches, etc.)
├── __init__.py            # Module exports
├── README.md              # Project documentation (with tags: - openenv)
├── openenv.yaml           # OpenEnv manifest
├── pyproject.toml         # Project metadata and dependencies
├── uv.lock                # Locked dependencies (generated)
├── client.py              # ContractValidationEnv client
├── inference.py           # Evaluation script for the OpenEnv grader (JSON logging)
├── models.py              # Action and Observation Pydantic models
├── Dockerfile             # Container image definition
└── server/
    ├── __init__.py        # Server module exports
    ├── contract_validation_environment.py  # Core environment logic and task data
    └── app.py             # FastAPI application (HTTP + WebSocket endpoints)
```