title: Contract Validation Environment Server
emoji: 📝
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv

Contract Validation Environment

The Contract Validation Environment is an OpenEnv-compliant RL and LLM benchmark designed to test an agent's ability to act as a precise legal assistant. The agent must review various contract clauses, identify specific legal risks (e.g., liability, termination, payment), and correctly flag them without generating false positives on standard, safe clauses.

🚀 Motivation

Legal contract review is a massive industry, but off-the-shelf LLMs often over-flag, marking nearly every clause as a risk and causing "alert fatigue" for the reviewer. This environment challenges agents to precisely isolate genuine liabilities across varying difficulty levels while explicitly rewarding speed and accuracy.


🎯 Tasks & Difficulty

The environment features 3 deterministic tasks with increasing complexity:

  1. Easy: 1 clause. Contains a single, explicit liability risk. Tests basic risk identification.
  2. Medium: 3 clauses. Requires identifying payment and termination risks while actively ignoring a safe governing-law distractor clause.
  3. Hard: 5 clauses. A complex mix of confidentiality, liability, and compliance risks interspersed with dense, safe, standard boilerplate clauses. Challenges frontier models to avoid false positives.
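
One way to picture the task data described above is a mapping from difficulty level to a list of clauses plus a ground-truth map of risky clause IDs. This is only a sketch: the `TASKS` layout, the clause ordering, the `ground_truth` key, and the `safe_clause_ids` helper are assumptions for illustration, and the clause texts are placeholders rather than the real task data (the hard task is omitted for brevity).

```python
# Hypothetical layout; clause texts are placeholders, not the real task data.
TASKS = {
    "easy": {
        "clauses": [
            {"id": 1, "text": "<clause with an explicit liability risk>"},
        ],
        "ground_truth": {1: "liability"},
    },
    "medium": {
        "clauses": [
            {"id": 1, "text": "<payment clause>"},
            {"id": 2, "text": "<termination clause>"},
            {"id": 3, "text": "<standard governing-law clause>"},
        ],
        # Clause 3 is the safe distractor, so it has no ground-truth entry.
        "ground_truth": {1: "payment", 2: "termination"},
    },
}


def safe_clause_ids(task):
    """Return the IDs of clauses that carry no risk and should not be flagged."""
    return [c["id"] for c in task["clauses"] if c["id"] not in task["ground_truth"]]
```

Under this layout, a false positive is simply any flag placed on an ID returned by `safe_clause_ids`.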

πŸ“Š Environment Details

Action Space (ContractValidationAction)

  • clause_id (int): The ID of the clause being reviewed (Set to 0 if submitting final).
  • risk_type (str): The identified risk (e.g., 'liability', 'payment', 'termination', 'confidentiality', 'compliance', or 'none').
  • submit_final (bool): Set to True when the agent has finished flagging risks to end the episode and receive a final score.
  • explanation (str): The agent's chain-of-thought or reasoning for the decision.
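
The fields above can be sketched as a small data class. The project defines its models with Pydantic in models.py; the version below uses a plain dataclass to stay dependency-free, and the default values, the `RISK_TYPES` set, and the validation check are assumptions rather than the project's actual behavior.

```python
from dataclasses import dataclass

# Risk labels named in the action-space description; the real set may be larger.
RISK_TYPES = {"liability", "payment", "termination",
              "confidentiality", "compliance", "none"}


@dataclass
class ContractValidationAction:
    """Dependency-free stand-in for the Pydantic model in models.py."""
    clause_id: int = 0           # 0 when submitting the final answer
    risk_type: str = "none"
    submit_final: bool = False
    explanation: str = ""

    def __post_init__(self):
        # Hypothetical check: the real model may validate differently.
        if self.risk_type not in RISK_TYPES:
            raise ValueError(f"unknown risk_type: {self.risk_type!r}")


# Flag clause 2 as a payment risk, then end the episode.
flag = ContractValidationAction(
    clause_id=2,
    risk_type="payment",
    explanation="Illustrative reasoning text.",
)
final = ContractValidationAction(submit_final=True,
                                 explanation="All clauses reviewed.")
```

A typical episode would send one flagging action per risky clause and finish with a single `submit_final=True` action.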

Observation Space (ContractValidationObservation)

  • task_level (str): Difficulty level of the current task ("easy", "medium", "hard").
  • contract_clauses (list): List of dictionaries containing the id and text of the contract clauses to review.
  • flagged_risks (dict): A dictionary mapping clause IDs to the risks currently flagged by the agent.
  • step_count (int): Number of steps taken in the current episode.
  • reward (float): The reward delta granted for the most recent action.
  • done (bool): Whether the episode has concluded.
  • info (dict): Additional environment info, including the current internal score from the grader.
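
A matching sketch of the observation, again as a plain dataclass standing in for the Pydantic model. The integer keys in `flagged_risks`, the `"score"` key inside `info`, and the `unreviewed_clause_ids` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ContractValidationObservation:
    """Dependency-free stand-in for the Pydantic model in models.py."""
    task_level: str
    contract_clauses: List[Dict[str, Any]]
    flagged_risks: Dict[int, str] = field(default_factory=dict)
    step_count: int = 0
    reward: float = 0.0
    done: bool = False
    info: Dict[str, Any] = field(default_factory=dict)


def unreviewed_clause_ids(obs: ContractValidationObservation) -> List[int]:
    """Return IDs of clauses the agent has not flagged yet."""
    return [c["id"] for c in obs.contract_clauses
            if c["id"] not in obs.flagged_risks]


obs = ContractValidationObservation(
    task_level="easy",
    contract_clauses=[{"id": 1, "text": "<clause text>"}],
    info={"score": 0.0},  # key name "score" is an assumption
)
```

An agent can use a helper like `unreviewed_clause_ids` to decide when to stop flagging and send the final submission.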

Reward Function & Grader

The environment utilizes a trajectory-based reward system. The grader calculates a score between 0.0 and 1.0 based on precision and recall.

  • Positive Reward: Granted for newly correct flags.
  • Negative Penalty: Applied for flagging safe clauses or assigning the wrong risk type.
  • Step Penalty: A -0.02 penalty is applied per step to encourage the agent to evaluate the contract efficiently. Each step's reward is then clamped to a minimum of 0.0, i.e. max(0.0, reward), to ensure compatibility with OpenEnv graders.
  • Completion Bonus: A +0.5 bonus is awarded if the agent submits the contract with a perfect 1.0 grader score.
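
The reward rules above can be sketched as follows. The README only says the score is based on precision and recall, so combining them as F1 is an assumption, as are the function names and the idea that the per-step reward is the change in grader score; the -0.02 step penalty, the +0.5 bonus, and the clamp at 0.0 come from the description.

```python
from typing import Dict

STEP_PENALTY = 0.02      # per-step cost stated above
COMPLETION_BONUS = 0.5   # bonus for submitting with a perfect score


def grade(flagged: Dict[int, str], truth: Dict[int, str]) -> float:
    """Score in [0, 1] from precision and recall (using F1 is an assumption)."""
    # A "none" flag means the agent considers the clause safe.
    flagged = {cid: risk for cid, risk in flagged.items() if risk != "none"}
    correct = sum(1 for cid, risk in flagged.items() if truth.get(cid) == risk)
    if correct == 0:
        return 0.0
    precision = correct / len(flagged)
    recall = correct / len(truth)
    return 2 * precision * recall / (precision + recall)


def step_reward(prev_score: float, new_score: float, submit_final: bool) -> float:
    """Score delta minus the step penalty, plus any bonus, floored at 0.0."""
    reward = (new_score - prev_score) - STEP_PENALTY
    if submit_final and new_score == 1.0:
        reward += COMPLETION_BONUS
    return max(0.0, reward)


truth = {1: "payment", 2: "termination"}          # clause 3 is safe
partial = grade({1: "payment"}, truth)            # one of two risks found
noisy = grade({1: "payment", 3: "liability"}, truth)  # adds a false positive
```

In this sketch a false positive lowers precision, so the score delta turns negative and the clamped step reward becomes 0.0, while a perfect final submission earns the bonus minus one step penalty.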

Project Structure

contract_validation/
├── .dockerignore          # Docker build exclusions
├── .env                   # Local environment variables (API keys - DO NOT COMMIT)
├── .gitignore             # Git tracking exclusions (ignores .env, caches, etc.)
├── __init__.py            # Module exports
├── README.md              # Project documentation (with tags: - openenv)
├── openenv.yaml           # OpenEnv manifest
├── pyproject.toml         # Project metadata and dependencies
├── uv.lock                # Locked dependencies (generated)
├── client.py              # ContractValidationEnv client
├── inference.py           # Evaluation script for the OpenEnv grader (JSON logging)
├── models.py              # Action and Observation Pydantic models
├── Dockerfile             # Container image definition
└── server/
    ├── __init__.py        # Server module exports
    ├── contract_validation_environment.py  # Core environment logic and task data
    └── app.py             # FastAPI application (HTTP + WebSocket endpoints)