Spaces: Jayant2304/commitment-os
jayantaggarwal-sketch committed 27 days ago

Commit

d53a65c

1 Parent(s): 98b25a9

Sync latest project updates to Hugging Face Space.

Include current code, evaluation scripts, notebook, and docs, while excluding PNG binaries as required by the Space push policy.

Made-with: Cursor

Files changed (13)

.env.example +29 -0
.gitignore +3 -0
HF_README.md +19 -0
README.md +19 -0
artifacts/evals/README.md +2 -0
artifacts/evals_llm/README.md +63 -0
evaluation/CommitmentOS_Checkpoint_Eval_Colab.ipynb +247 -0
evaluation/evaluate_llm_checkpoints.py +565 -0
evaluation/plot_llm_checkpoints.py +133 -0
pyproject.toml +16 -1
server/__init__.py +1 -0
training/CommitmentOS_Training.ipynb +116 -92
uv.lock +100 -18

.env.example ADDED

	@@ -0,0 +1,29 @@

+# Copy to .env and fill in. Never commit real secrets.
+# --- inference.py (OpenAI-compatible HTTP API) ---
+API_BASE_URL=https://api.openai.com/v1
+MODEL_NAME=gpt-4o-mini
+# Used as API key by inference.py (or set OPENAI_API_KEY instead)
+HF_TOKEN=hf_xxx
+# --- CommitmentOS HTTP environment (inference + LLM eval) ---
+ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
+# --- evaluation/evaluate_llm_checkpoints.py (local Transformers + PEFT) ---
+# Base model on Hugging Face (must match what you trained on)
+BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
+# REQUIRED: absolute or relative path to a folder containing adapter_config.json
+# (e.g. ./training_output after train_grpo.py, or a downloaded adapter dir)
+TRAINED_MODEL_PATH=./training_output
+# Optional eval protocol (defaults shown)
+EVAL_SEED=42
+EVAL_MAX_STEPS=12
+EVAL_TEMPERATURE=0.0
+EVAL_TOP_P=1.0
+EVAL_MAX_NEW_TOKENS=256
+EVAL_SUCCESS_THRESHOLD=0.6
+# --- training/train_grpo.py --push_to_hub only ---
+# Hub repo id when using: python training/train_grpo.py ... --push_to_hub --hub_model_id your/repo
+# TRAINED_MODEL_NAME is not read by evaluate_llm_checkpoints.py; use TRAINED_MODEL_PATH.
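
Reviewer sketch (not part of the commit): the comments above say `TRAINED_MODEL_PATH` must point at a folder containing `adapter_config.json`. A small pre-flight check along those lines could look like the following. The helper name `check_adapter_dir` is hypothetical and does not exist in the repository.

```python
from pathlib import Path


def check_adapter_dir(path: str) -> list[str]:
    """Return a list of problems with a LoRA adapter directory (empty if none).

    Hypothetical helper: it only checks what the .env.example comments
    describe, namely that the folder exists and holds adapter_config.json.
    """
    problems: list[str] = []
    adapter_dir = Path(path)
    if not adapter_dir.is_dir():
        problems.append(f"not a directory: {adapter_dir}")
        return problems
    if not (adapter_dir / "adapter_config.json").is_file():
        problems.append(f"missing adapter_config.json in {adapter_dir}")
    return problems
```

Calling `check_adapter_dir("./training_output")` before a long evaluation run would surface a misconfigured path early instead of partway through model loading.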

.gitignore CHANGED

@@ -11,3 +11,6 @@ build/
 .ruff_cache/
 *.log
 .DS_Store

+# Local GRPO / LoRA output (large; do not commit)
+training_output/

HF_README.md CHANGED

@@ -91,3 +91,22 @@ Headline metrics (`summary.json`):
 - Mean reward: **0.5427 -> 0.9777** (**+0.4350**)
 - Success rate: **0.3333 -> 1.0000** (**+0.6667**)
 - Median per-task reward delta: **+0.4200**

+For true model-learning proof (pre-RL checkpoint vs post-RL checkpoint),
+run:
+```bash
+# From cloned repo (core deps + torch/transformers/peft/… via optional extra):
+pip install -e ".[llm-eval]"
+export BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
+export TRAINED_MODEL_PATH=/content/commitment_os/training_output
+export ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
+python3 evaluation/evaluate_llm_checkpoints.py
+python3 evaluation/plot_llm_checkpoints.py
+```
+Artifacts are written to `artifacts/evals_llm/`.
+**Published LLM run (bundle on Drive):** success **46.7% → 60.0%** at reward threshold **0.6**; mean reward ~flat; gains concentrated on **hard** tasks. Traces: `artifacts/evals_llm/*.json` in the folder below.
+**Pretrained adapter + LLM eval artifacts (Google Drive):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing) — download `training_output/` and set `TRAINED_MODEL_PATH` accordingly; full `gdown` notes are in the GitHub `README.md`.
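
Reviewer sketch (not part of the commit): the success figures quoted above follow from counting tasks whose final reward meets the threshold. Assuming the run covers the 15 CommitmentOS tasks mentioned elsewhere in this commit, 46.7% and 60.0% would correspond to 7 and 9 successful tasks respectively. The reward values below are made up purely to illustrate the arithmetic, and `success_rate` is a hypothetical helper.

```python
def success_rate(rewards: list[float], threshold: float = 0.6) -> float:
    """Fraction of tasks whose final reward is at or above the threshold."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    return sum(1 for r in rewards if r >= threshold) / len(rewards)


# Illustrative (invented) per-task rewards for 15 tasks.
baseline = [0.9] * 7 + [0.3] * 8   # 7 of 15 at or above 0.6
trained = [0.9] * 9 + [0.3] * 6    # 9 of 15 at or above 0.6

baseline_pct = round(success_rate(baseline) * 100, 1)
trained_pct = round(success_rate(trained) * 100, 1)
print(baseline_pct, trained_pct)
```

This mirrors how the evaluator marks a task as successful (`final_reward >= SUCCESS_THRESHOLD`) and then averages the 0/1 flags.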

README.md CHANGED

@@ -91,3 +91,22 @@ Headline metrics (`summary.json`):
 - Mean reward: **0.5427 -> 0.9777** (**+0.4350**)
 - Success rate: **0.3333 -> 1.0000** (**+0.6667**)
 - Median per-task reward delta: **+0.4200**

+For true model-learning proof (pre-RL checkpoint vs post-RL checkpoint),
+run:
+```bash
+# From cloned repo (core deps + torch/transformers/peft/… via optional extra):
+pip install -e ".[llm-eval]"
+export BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
+export TRAINED_MODEL_PATH=/content/commitment_os/training_output
+export ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
+python3 evaluation/evaluate_llm_checkpoints.py
+python3 evaluation/plot_llm_checkpoints.py
+```
+Artifacts are written to `artifacts/evals_llm/`.
+**Published LLM run (bundle on Drive):** success **46.7% → 60.0%** at reward threshold **0.6**; mean reward ~flat; gains concentrated on **hard** tasks. Traces: `artifacts/evals_llm/*.json` in the folder below.
+**Pretrained adapter + LLM eval artifacts (Google Drive):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing) — download `training_output/` and set `TRAINED_MODEL_PATH` accordingly; full `gdown` notes are in the GitHub `README.md`.

artifacts/evals/README.md CHANGED

@@ -2,6 +2,8 @@
 This folder contains deterministic baseline-vs-trained-style evaluation outputs for all 15 CommitmentOS tasks.
+This is **not** the same as the real LLM checkpoint comparison; see root **README** section **B) True LLM Learning Eval** and `artifacts/evals_llm/`.
 ## Files
 - `eval_protocol.json`: fixed protocol (task set, seed, max steps, decode config)

artifacts/evals_llm/README.md ADDED

	@@ -0,0 +1,63 @@

+# True LLM Learning Evaluation (Pre-RL vs Post-RL)
+This folder is for checkpoint-vs-checkpoint evidence:
+- pre-RL base model
+- post-RL trained checkpoint
+Both are evaluated with an identical protocol.
+## Required environment variables
+- `BASELINE_MODEL_NAME`
+- `TRAINED_MODEL_PATH` (local directory with `adapter_config.json`)
+- `ENV_BASE_URL` (CommitmentOS HTTP API)
+Optional:
+- `HF_TOKEN` (gated Hub models / rate limits)
+Optional protocol overrides:
+- `EVAL_SEED` (default: `42`)
+- `EVAL_MAX_STEPS` (default: `12`)
+- `EVAL_TEMPERATURE` (default: `0.0`)
+- `EVAL_TOP_P` (default: `1.0`)
+- `EVAL_MAX_NEW_TOKENS` (default: `256`)
+- `EVAL_SUCCESS_THRESHOLD` (default: `0.6`)
+## Run
+```bash
+cd commitment_os
+pip install -e ".[llm-eval]"
+python3 evaluation/evaluate_llm_checkpoints.py
+python3 evaluation/plot_llm_checkpoints.py
+```
+The evaluator prints one line per task (`[eval …] task i/n`) so long Colab runs do not look frozen.
+## After Colab
+Zip weights + artifacts for download (paths assume `/content/commitment_os`):
+```bash
+cd /content/commitment_os && zip -r /content/commitment_os_bundle.zip training_output artifacts/evals_llm
+```
+Or copy `training_output/` and `artifacts/evals_llm/` to Google Drive if the zip is too large for the browser.
+These bundles are **not** checked into git (clone speed + history). A **~330MB** zip (weights + this folder) is a normal size: publish it as a **GitHub Release** asset, **HF Hub**, or **Google Drive**.
+**Drive (weights + this folder):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing) — after download you should have `artifacts/evals_llm/` (this layout) next to `training_output/`. See root **README** for `gdown` / `TRAINED_MODEL_PATH` notes.
+## Expected outputs
+- `llm_eval_protocol.json`
+- `baseline_llm_eval.json`
+- `trained_llm_eval.json`
+- `llm_comparison.csv`
+- `llm_summary.json`
+- `llm_case_study_hard_015.md`
+- `llm_reward_by_task.svg`
+- `llm_violations_before_after.svg`
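
Reviewer sketch (not part of the commit): after running both scripts, the expected outputs listed above can be checked mechanically. The file names are taken from the list; the helper `missing_outputs` is hypothetical.

```python
from pathlib import Path

# File names copied from the "Expected outputs" list in this README.
EXPECTED_OUTPUTS = [
    "llm_eval_protocol.json",
    "baseline_llm_eval.json",
    "trained_llm_eval.json",
    "llm_comparison.csv",
    "llm_summary.json",
    "llm_case_study_hard_015.md",
    "llm_reward_by_task.svg",
    "llm_violations_before_after.svg",
]


def missing_outputs(artifact_dir: str = "artifacts/evals_llm") -> list[str]:
    """Return the expected artifact names that are not present as files."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```

An empty list from `missing_outputs()` suggests both the evaluation and plotting steps completed; any names returned point at the step that needs rerunning.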

evaluation/CommitmentOS_Checkpoint_Eval_Colab.ipynb ADDED

	@@ -0,0 +1,247 @@

+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# CommitmentOS Checkpoint Evaluation (Colab)\n",
+        "\n",
+        "This notebook compares a base model against a locally saved LoRA-trained checkpoint on the CommitmentOS environment.\n",
+        "\n",
+        "It uses:\n",
+        "- `BASELINE_MODEL_NAME` from Hugging Face\n",
+        "- `TRAINED_MODEL_PATH` from disk in Colab\n",
+        "- the existing `evaluation/evaluate_llm_checkpoints.py` script\n",
+        "\n",
+        "By default the notebook evaluates against the hosted CommitmentOS environment on Hugging Face Space. An optional local-server cell is included below."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "d43c692d",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!pip -q install --upgrade pip\n",
+        "!pip -q install transformers peft accelerate torch sentencepiece fastapi uvicorn requests python-dotenv pydantic \"openenv-core>=0.2.0\""
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!git clone https://github.com/Jayant2304/commitment_os.git\n",
+        "%cd commitment_os"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Configure Paths\n",
+        "\n",
+        "Set the base model ID and the local adapter/checkpoint path. Change `TRAINED_MODEL_PATH` to the folder you actually want to evaluate.\n",
+        "\n",
+        "If the base model is gated, set `HF_TOKEN` as well."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "\n",
+        "# Colab: load Hugging Face token from Secrets (key must be exactly HF_TOKEN)\n",
+        "try:\n",
+        "    from google.colab import userdata\n",
+        "\n",
+        "    os.environ[\"HF_TOKEN\"] = userdata.get(\"HF_TOKEN\")\n",
+        "    print(\"HF_TOKEN loaded from Colab secrets\")\n",
+        "except ImportError:\n",
+        "    print(\"Not on Colab; set HF_TOKEN in the shell or .env if downloads fail.\")\n",
+        "except Exception as exc:\n",
+        "    print(\"Could not load HF_TOKEN from secrets:\", exc)\n",
+        "\n",
+        "os.environ[\"BASELINE_MODEL_NAME\"] = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
+        "os.environ[\"TRAINED_MODEL_PATH\"] = \"/content/commitment_os/training_output\"\n",
+        "os.environ[\"ENV_BASE_URL\"] = \"https://jayant2304-commitment-os.hf.space\"\n",
+        "\n",
+        "# Optional for gated base models:\n",
+        "# os.environ[\"HF_TOKEN\"] = \"hf_xxx\"\n",
+        "\n",
+        "# Optional eval overrides:\n",
+        "os.environ[\"EVAL_SEED\"] = \"42\"\n",
+        "os.environ[\"EVAL_MAX_STEPS\"] = \"12\"\n",
+        "os.environ[\"EVAL_TEMPERATURE\"] = \"0.0\"\n",
+        "os.environ[\"EVAL_TOP_P\"] = \"1.0\"\n",
+        "os.environ[\"EVAL_MAX_NEW_TOKENS\"] = \"256\"\n",
+        "os.environ[\"EVAL_SUCCESS_THRESHOLD\"] = \"0.6\"\n",
+        "\n",
+        "for key in [\n",
+        "    \"BASELINE_MODEL_NAME\",\n",
+        "    \"TRAINED_MODEL_PATH\",\n",
+        "    \"ENV_BASE_URL\",\n",
+        "    \"EVAL_SEED\",\n",
+        "    \"EVAL_MAX_STEPS\",\n",
+        "    \"EVAL_TEMPERATURE\",\n",
+        "    \"EVAL_TOP_P\",\n",
+        "    \"EVAL_MAX_NEW_TOKENS\",\n",
+        "    \"EVAL_SUCCESS_THRESHOLD\",\n",
+        "]:\n",
+        "    print(f\"{key}={os.environ[key]}\")"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from pathlib import Path\n",
+        "\n",
+        "trained_path = Path(os.environ[\"TRAINED_MODEL_PATH\"])\n",
+        "print(\"Checkpoint exists:\", trained_path.exists())\n",
+        "if trained_path.exists():\n",
+        "    print(\"Checkpoint contents:\")\n",
+        "    for item in sorted(trained_path.iterdir()):\n",
+        "        print(\" -\", item.name)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Optional: Run CommitmentOS Locally Instead Of HF Space\n",
+        "\n",
+        "Only run this if you want evaluation against a local server inside Colab. Otherwise skip this section and keep `ENV_BASE_URL` pointed at the hosted Space."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Optional local server setup\n",
+        "# import os\n",
+        "# os.environ[\"ENV_BASE_URL\"] = \"http://127.0.0.1:7860\"\n",
+        "# !nohup python -m uvicorn server.app:app --host 0.0.0.0 --port 7860 >/tmp/commitmentos.log 2>&1 &\n",
+        "# !sleep 5\n",
+        "# !curl -s http://127.0.0.1:7860/health"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Run Checkpoint Comparison"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!python evaluation/evaluate_llm_checkpoints.py"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!python evaluation/plot_llm_checkpoints.py"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Inspect Artifacts"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "from pathlib import Path\n",
+        "\n",
+        "artifact_dir = Path(\"artifacts/evals_llm\")\n",
+        "print(sorted(p.name for p in artifact_dir.iterdir()))\n",
+        "\n",
+        "summary = json.loads((artifact_dir / \"llm_summary.json\").read_text())\n",
+        "summary"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import pandas as pd\n",
+        "\n",
+        "pd.read_csv(\"artifacts/evals_llm/llm_comparison.csv\")"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from IPython.display import SVG, display\n",
+        "\n",
+        "display(SVG(filename=\"artifacts/evals_llm/llm_reward_by_task.svg\"))\n",
+        "display(SVG(filename=\"artifacts/evals_llm/llm_violations_before_after.svg\"))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "9e8a35c5",
+      "metadata": {},
+      "source": [
+        "## Backup results (zip and download)\n",
+        "\n",
+        "Run after eval/plot finish. Large runs: copy `training_output` to Google Drive instead of browser download.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "b4a5bcc7",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!cd /content/commitment_os && du -sh training_output artifacts/evals_llm 2>/dev/null || true\n",
+        "!cd /content/commitment_os && zip -r /content/commitment_os_bundle.zip training_output artifacts/evals_llm\n",
+        "from google.colab import files\n",
+        "\n",
+        "files.download(\"/content/commitment_os_bundle.zip\")\n"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.x"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}

evaluation/evaluate_llm_checkpoints.py ADDED

	@@ -0,0 +1,565 @@

+"""Evaluate base vs RL-trained LLM checkpoints on CommitmentOS.
+This script runs the SAME protocol for two local-loading model setups:
+- baseline model loaded from a Hugging Face model ID
+- trained model loaded from a local LoRA adapter path on top of that base model
+It writes judge-friendly artifacts under artifacts/evals_llm/.
+"""
+from __future__ import annotations
+import csv
+import gc
+import json
+import os
+import sys
+import uuid
+from pathlib import Path
+from statistics import mean, median
+from typing import Any
+import requests
+from dotenv import load_dotenv
+from pydantic import ValidationError
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+from models import CommitmentAction
+ARTIFACT_DIR = Path("artifacts/evals_llm")
+ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
+load_dotenv()
+ENV_BASE_URL = os.getenv("ENV_BASE_URL", "https://jayant2304-commitment-os.hf.space")
+HF_TOKEN = os.getenv("HF_TOKEN", "").strip() or None
+BASELINE_MODEL = os.getenv("BASELINE_MODEL_NAME", "").strip()
+TRAINED_MODEL_PATH = os.getenv("TRAINED_MODEL_PATH", "").strip()
+EVAL_SEED = int(os.getenv("EVAL_SEED", "42"))
+MAX_STEPS = int(os.getenv("EVAL_MAX_STEPS", "12"))
+TEMPERATURE = float(os.getenv("EVAL_TEMPERATURE", "0.0"))
+TOP_P = float(os.getenv("EVAL_TOP_P", "1.0"))
+MAX_NEW_TOKENS = int(os.getenv("EVAL_MAX_NEW_TOKENS", "256"))
+SUCCESS_THRESHOLD = float(os.getenv("EVAL_SUCCESS_THRESHOLD", "0.6"))
+SYSTEM_PROMPT = """You are an expert executive assistant AI. You manage calendars, emails, and dining reservations.
+You will be given a scenario briefing describing a situation with calendar conflicts, emails, or planning tasks.
+For each turn, you must respond with EXACTLY ONE JSON object choosing a tool to call:
+Available tools:
+- {"action_type": "view_calendar", "date": "2026-04-25"}
+- {"action_type": "check_availability", "person": "Client_Jones"}
+- {"action_type": "search_restaurants", "cuisine": "Italian", "max_price": 50, "dietary": "vegetarian", "max_distance_miles": 3.0, "near_airport": false}
+- {"action_type": "schedule_meeting", "title": "Demo", "date": "2026-04-25", "time": "14:00", "duration_min": 60, "participants": ["Client_Jones"], "location": "Room A"}
+- {"action_type": "reschedule_event", "event_id": "evt_1", "new_time": "15:00"}
+- {"action_type": "cancel_event", "event_id": "evt_1"}
+- {"action_type": "send_email", "to": "VP_Chen", "subject": "Meeting update", "body": "Hi, I need to reschedule..."}
+- {"action_type": "book_restaurant", "restaurant_name": "Sky Lounge"}
+- {"action_type": "submit_plan"}
+IMPORTANT RULES:
+1. Respond with ONLY a JSON object, no markdown, no explanation
+2. Handle higher-priority items before lower-priority ones
+3. When cancelling or rescheduling commitments, ALWAYS send an email to affected parties BEFORE submitting
+4. Call submit_plan when you have resolved all issues
+5. Never silently drop a commitment — always notify the affected person"""
+def _require_env() -> None:
+    if not BASELINE_MODEL:
+        raise RuntimeError("Set BASELINE_MODEL_NAME")
+    if not TRAINED_MODEL_PATH:
+        raise RuntimeError("Set TRAINED_MODEL_PATH")
+    if not Path(TRAINED_MODEL_PATH).exists():
+        raise RuntimeError(f"TRAINED_MODEL_PATH does not exist: {TRAINED_MODEL_PATH}")
+def _load_runtime_deps() -> tuple[Any, Any, Any, Any]:
+    try:
+        import torch
+        from peft import PeftModel
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+    except ImportError as exc:
+        raise RuntimeError(
+            "Missing evaluation dependencies. From the repo root: "
+            'pip install -e ".[llm-eval]"'
+            "  (or: pip install transformers peft accelerate torch sentencepiece)"
+        ) from exc
+    return torch, AutoModelForCausalLM, AutoTokenizer, PeftModel
+def _get_task_ids() -> list[str]:
+    resp = requests.get(f"{ENV_BASE_URL}/tasks", timeout=30)
+    resp.raise_for_status()
+    data = resp.json()
+    task_ids: list[str] = []
+    for difficulty in ("easy", "medium", "hard"):
+        task_ids.extend(data.get(difficulty, []))
+    return task_ids
+def _parse_action(text: str) -> dict[str, Any]:
+    text = (text or "").strip()
+    if text.startswith("```"):
+        lines = text.split("\n")
+        text = "\n".join(lines[1:-1]) if len(lines) > 2 else lines[0]
+    try:
+        action = json.loads(text)
+        if isinstance(action, dict) and action.get("action_type"):
+            return action
+    except json.JSONDecodeError:
+        pass
+    return {"action_type": "submit_plan"}
+def _normalize_action(action: dict[str, Any]) -> dict[str, Any]:
+    allowed_fields = set(CommitmentAction.model_fields.keys())
+    payload = {k: v for k, v in action.items() if k in allowed_fields}
+    if isinstance(payload.get("participants"), str):
+        participants = [
+            item.strip()
+            for item in payload["participants"].split(",")
+            if item.strip()
+        ]
+        payload["participants"] = participants
+    if "duration_min" in payload:
+        try:
+            payload["duration_min"] = int(payload["duration_min"])
+        except (TypeError, ValueError):
+            payload.pop("duration_min", None)
+    if "max_price" in payload:
+        try:
+            payload["max_price"] = int(payload["max_price"])
+        except (TypeError, ValueError):
+            payload.pop("max_price", None)
+    if "max_distance_miles" in payload:
+        try:
+            payload["max_distance_miles"] = float(payload["max_distance_miles"])
+        except (TypeError, ValueError):
+            payload.pop("max_distance_miles", None)
+    if isinstance(payload.get("near_airport"), str):
+        payload["near_airport"] = payload["near_airport"].strip().lower() in {"true", "1", "yes"}
+    try:
+        return CommitmentAction.model_validate(payload).model_dump()
+    except ValidationError:
+        return CommitmentAction(action_type="submit_plan").model_dump()
+def _dtype_and_device(torch_mod: Any) -> tuple[Any, str | None]:
+    if not torch_mod.cuda.is_available():
+        return torch_mod.float32, None
+    if torch_mod.cuda.is_bf16_supported():
+        return torch_mod.bfloat16, "auto"
+    return torch_mod.float16, "auto"
+def _path_has_tokenizer_files(path: Path) -> bool:
+    tokenizer_files = {
+        "tokenizer.json",
+        "tokenizer_config.json",
+        "special_tokens_map.json",
+        "vocab.json",
+        "merges.txt",
+        "spiece.model",
+    }
+    return any((path / file_name).exists() for file_name in tokenizer_files)
+class LocalChatModel:
+    def __init__(
+        self,
+        *,
+        display_name: str,
+        tokenizer: Any,
+        model: Any,
+        torch_mod: Any,
+    ) -> None:
+        self.display_name = display_name
+        self.tokenizer = tokenizer
+        self.model = model
+        self.torch = torch_mod
+    def generate_action(self, messages: list[dict[str, str]]) -> tuple[dict[str, Any], str]:
+        prompt = self.tokenizer.apply_chat_template(
+            messages,
+            tokenize=False,
+            add_generation_prompt=True,
+        )
+        inputs = self.tokenizer(prompt, return_tensors="pt")
+        target_device = next(self.model.parameters()).device
+        inputs = {k: v.to(target_device) for k, v in inputs.items()}
+        generation_kwargs: dict[str, Any] = {
+            "max_new_tokens": MAX_NEW_TOKENS,
+            "pad_token_id": self.tokenizer.pad_token_id,
+            "eos_token_id": self.tokenizer.eos_token_id,
+        }
+        if TEMPERATURE > 0:
+            generation_kwargs.update(
+                {
+                    "do_sample": True,
+                    "temperature": TEMPERATURE,
+                    "top_p": TOP_P,
+                }
+            )
+        else:
+            generation_kwargs["do_sample"] = False
+        with self.torch.inference_mode():
+            output_ids = self.model.generate(**inputs, **generation_kwargs)
+        prompt_len = inputs["input_ids"].shape[-1]
+        new_tokens = output_ids[0][prompt_len:]
+        raw = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+        return _normalize_action(_parse_action(raw)), raw
+    def unload(self) -> None:
+        del self.model
+        gc.collect()
+        if self.torch.cuda.is_available():
+            self.torch.cuda.empty_cache()
+def _load_tokenizer(AutoTokenizer: Any, model_or_path: str | Path) -> Any:
+    tokenizer = AutoTokenizer.from_pretrained(
+        model_or_path,
+        trust_remote_code=True,
+        token=HF_TOKEN,
+    )
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    return tokenizer
+def load_baseline_model() -> LocalChatModel:
+    torch_mod, AutoModelForCausalLM, AutoTokenizer, _ = _load_runtime_deps()
+    dtype, device_map = _dtype_and_device(torch_mod)
+    tokenizer = _load_tokenizer(AutoTokenizer, BASELINE_MODEL)
+    model = AutoModelForCausalLM.from_pretrained(
+        BASELINE_MODEL,
+        trust_remote_code=True,
+        token=HF_TOKEN,
+        dtype=dtype,
+        device_map=device_map,
+    )
+    model.eval()
+    return LocalChatModel(
+        display_name=BASELINE_MODEL,
+        tokenizer=tokenizer,
+        model=model,
+        torch_mod=torch_mod,
+    )
+def load_trained_model() -> LocalChatModel:
+    torch_mod, AutoModelForCausalLM, AutoTokenizer, PeftModel = _load_runtime_deps()
+    dtype, device_map = _dtype_and_device(torch_mod)
+    adapter_path = Path(TRAINED_MODEL_PATH)
+    tokenizer_source: str | Path = adapter_path if _path_has_tokenizer_files(adapter_path) else BASELINE_MODEL
+    tokenizer = _load_tokenizer(AutoTokenizer, tokenizer_source)
+    base_model = AutoModelForCausalLM.from_pretrained(
+        BASELINE_MODEL,
+        trust_remote_code=True,
+        token=HF_TOKEN,
+        dtype=dtype,
+        device_map=device_map,
+    )
+    model = PeftModel.from_pretrained(base_model, adapter_path)
+    model.eval()
+    return LocalChatModel(
+        display_name=str(adapter_path),
+        tokenizer=tokenizer,
+        model=model,
+        torch_mod=torch_mod,
+    )
+def _env_reset(task_id: str, episode_id: str) -> dict[str, Any]:
+    resp = requests.post(
+        f"{ENV_BASE_URL}/reset",
+        params={"task_id": task_id, "seed": EVAL_SEED, "episode_id": episode_id},
+        timeout=30,
+    )
+    resp.raise_for_status()
+    data = resp.json()
+    return data.get("observation", data)
+def _env_step(action: dict[str, Any], episode_id: str) -> dict[str, Any]:
+    resp = requests.post(
+        f"{ENV_BASE_URL}/step",
+        params={"episode_id": episode_id},
+        json={"action": action},
+        timeout=30,
+    )
+    if resp.status_code >= 400:
+        raise requests.HTTPError(
+            f"{resp.status_code} {resp.reason}: {resp.text}",
+            response=resp,
+        )
+    data = resp.json()
+    obs = data.get("observation", data)
+    obs["done"] = data.get("done", obs.get("done", False))
+    obs["reward"] = float(data.get("reward", obs.get("reward", 0.0)) or 0.0)
+    return obs
+def _env_state(episode_id: str) -> dict[str, Any]:
+    resp = requests.get(f"{ENV_BASE_URL}/state", params={"episode_id": episode_id}, timeout=30)
+    resp.raise_for_status()
+    return resp.json()
+def run_task(chat_model: LocalChatModel, task_id: str) -> dict[str, Any]:
+    safe_name = chat_model.display_name.replace("/", "-").replace(" ", "_")
+    episode_id = f"eval-{safe_name}-{task_id}-{uuid.uuid4().hex[:8]}"
+    obs = _env_reset(task_id, episode_id)
+    briefing = obs.get("briefing", "")
+    calendar = json.dumps(obs.get("calendar_snapshot", []), indent=2)
+    inbox = json.dumps(obs.get("inbox", []), indent=2)
+    messages: list[dict[str, str]] = [
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": f"SCENARIO: {briefing}\n\nCALENDAR:\n{calendar}\n\nINBOX:\n{inbox}\n\nWhat is your first action?"},
+    ]
+    trace: list[dict[str, Any]] = []
+    step_num = 0
+    done = False
+    final_obs: dict[str, Any] = obs
+    for step_num in range(1, MAX_STEPS + 1):
+        action, raw = chat_model.generate_action(messages)
+        step_obs = _env_step(action, episode_id)
+        final_obs = step_obs
+        done = bool(step_obs.get("done", False))
+        trace.append(
+            {
+                "step": step_num,
+                "action": action,
+                "raw_model_output": raw,
+                "reward": float(step_obs.get("reward", 0.0)),
+                "done": done,
+                "tool_result": step_obs.get("tool_result", ""),
+            }
+        )
+        if done:
+            break
+        messages.append({"role": "assistant", "content": raw})
+        messages.append({"role": "user", "content": f"TOOL RESULT: {step_obs.get('tool_result', '')}\n\nWhat is your next action?"})
+    if not done:
+        final_obs = _env_step({"action_type": "submit_plan"}, episode_id)
+        step_num += 1
+        trace.append(
+            {
+                "step": step_num,
+                "action": {"action_type": "submit_plan"},
+                "raw_model_output": '{"action_type":"submit_plan"}',
+                "reward": float(final_obs.get("reward", 0.0)),
+                "done": True,
+                "tool_result": final_obs.get("tool_result", ""),
+            }
+        )
+    state = _env_state(episode_id)
+    final_reward = float(final_obs.get("reward", 0.0))
+    return {
+        "task_id": task_id,
+        "difficulty": final_obs.get("difficulty", ""),
+        "model_name": chat_model.display_name,
+        "final_reward": round(final_reward, 4),
+        "success": final_reward >= SUCCESS_THRESHOLD,
+        "steps_used": int(state.get("step_count", step_num)),
+        "violation_count": int(state.get("violation_count", 0)),
+        "reward_breakdown": final_obs.get("reward_breakdown", {}),
+        "feedback": final_obs.get("feedback", ""),
+        "trace": trace,
+    }
+def run_model(chat_model: LocalChatModel, task_ids: list[str]) -> list[dict[str, Any]]:
+    results: list[dict[str, Any]] = []
+    n = len(task_ids)
+    label = chat_model.display_name
+    for i, task_id in enumerate(task_ids, start=1):
+        print(f"[eval {label}] task {i}/{n}: {task_id}", flush=True)
+        results.append(run_task(chat_model, task_id=task_id))
+    return results
+def _write_json(path: Path, payload: Any) -> None:
+    path.write_text(json.dumps(payload, indent=2))
+def write_artifacts(baseline: list[dict[str, Any]], trained: list[dict[str, Any]]) -> None:
+    by_task = {row["task_id"]: row for row in trained}
+    comparison_rows: list[dict[str, Any]] = []
+    for base in baseline:
+        tr = by_task[base["task_id"]]
+        comparison_rows.append(
+            {
+                "task_id": base["task_id"],
+                "difficulty": base["difficulty"],
+                "baseline_reward": base["final_reward"],
+                "trained_reward": tr["final_reward"],
+                "reward_delta": round(tr["final_reward"] - base["final_reward"], 4),
+                "baseline_steps": base["steps_used"],
+                "trained_steps": tr["steps_used"],
+                "step_delta": tr["steps_used"] - base["steps_used"],
+                "baseline_violations": base["violation_count"],
+                "trained_violations": tr["violation_count"],
+                "violation_delta": tr["violation_count"] - base["violation_count"],
+                "baseline_success": int(base["success"]),
+                "trained_success": int(tr["success"]),
+            }
+        )
+    _write_json(ARTIFACT_DIR / "baseline_llm_eval.json", baseline)
+    _write_json(ARTIFACT_DIR / "trained_llm_eval.json", trained)
+    _write_json(
+        ARTIFACT_DIR / "llm_eval_protocol.json",
+        {
+            "task_set": "easy_001..hard_015",
+            "seed": EVAL_SEED,
+            "max_steps": MAX_STEPS,
+            "decode_config": {
+                "temperature": TEMPERATURE,
+                "top_p": TOP_P,
+                "max_new_tokens": MAX_NEW_TOKENS,
+            },
+            "env_base_url": ENV_BASE_URL,
+            "baseline_model_name": BASELINE_MODEL,
+            "trained_model_path": TRAINED_MODEL_PATH,
+            "success_threshold": SUCCESS_THRESHOLD,
+        },
+    )
+    with (ARTIFACT_DIR / "llm_comparison.csv").open("w", newline="") as f:
+        writer = csv.DictWriter(f, fieldnames=list(comparison_rows[0].keys()))
+        writer.writeheader()
+        writer.writerows(comparison_rows)
+    baseline_rewards = [r["baseline_reward"] for r in comparison_rows]
+    trained_rewards = [r["trained_reward"] for r in comparison_rows]
+    reward_deltas = [r["reward_delta"] for r in comparison_rows]
+    baseline_steps = [r["baseline_steps"] for r in comparison_rows]
+    trained_steps = [r["trained_steps"] for r in comparison_rows]
+    baseline_violations = [r["baseline_violations"] for r in comparison_rows]
+    trained_violations = [r["trained_violations"] for r in comparison_rows]
+    baseline_success = [r["baseline_success"] for r in comparison_rows]
+    trained_success = [r["trained_success"] for r in comparison_rows]
+    summary = {
+        "task_count": len(comparison_rows),
+        "baseline_mean_reward": round(mean(baseline_rewards), 4),
+        "trained_mean_reward": round(mean(trained_rewards), 4),
+        "mean_reward_delta": round(mean(trained_rewards) - mean(baseline_rewards), 4),
+        "median_reward_delta": round(median(reward_deltas), 4),
+        "baseline_success_rate": round(mean(baseline_success), 4),
+        "trained_success_rate": round(mean(trained_success), 4),
+        "success_rate_delta": round(mean(trained_success) - mean(baseline_success), 4),
+        "baseline_mean_steps": round(mean(baseline_steps), 4),
+        "trained_mean_steps": round(mean(trained_steps), 4),
+        "step_delta": round(mean(trained_steps) - mean(baseline_steps), 4),
+        "baseline_mean_violations": round(mean(baseline_violations), 4),
+        "trained_mean_violations": round(mean(trained_violations), 4),
+        "violation_delta": round(mean(trained_violations) - mean(baseline_violations), 4),
+        "tasks_with_positive_reward_delta": sum(1 for x in reward_deltas if x > 0),
+        "tasks_with_no_reward_delta": sum(1 for x in reward_deltas if x == 0),
+        "per_difficulty": {},
+    }
+    for difficulty in ("easy", "medium", "hard"):
+        subset = [r for r in comparison_rows if r["difficulty"] == difficulty]
+        if not subset:
+            continue
+        summary["per_difficulty"][difficulty] = {
+            "count": len(subset),
+            "baseline_mean_reward": round(mean([r["baseline_reward"] for r in subset]), 4),
+            "trained_mean_reward": round(mean([r["trained_reward"] for r in subset]), 4),
+            "reward_delta": round(
+                mean([r["trained_reward"] for r in subset]) - mean([r["baseline_reward"] for r in subset]),
+                4,
+            ),
+            "baseline_mean_steps": round(mean([r["baseline_steps"] for r in subset]), 4),
+            "trained_mean_steps": round(mean([r["trained_steps"] for r in subset]), 4),
+            "step_delta": round(
+                mean([r["trained_steps"] for r in subset]) - mean([r["baseline_steps"] for r in subset]),
+                4,
+            ),
+        }
+    _write_json(ARTIFACT_DIR / "llm_summary.json", summary)
+    target_task = "hard_015"
+    base_case = next((r for r in baseline if r["task_id"] == target_task), None)
+    tr_case = next((r for r in trained if r["task_id"] == target_task), None)
+    if base_case and tr_case:
+        case_study = f"""# LLM Case Study: {target_task}
+## Baseline model ({BASELINE_MODEL})
+- Reward: {base_case['final_reward']:.4f}
+- Steps: {base_case['steps_used']}
+- Violations: {base_case['violation_count']}
+- Feedback: {base_case['feedback']}
+## Trained model ({TRAINED_MODEL_PATH})
+- Reward: {tr_case['final_reward']:.4f}
+- Steps: {tr_case['steps_used']}
+- Violations: {tr_case['violation_count']}
+- Feedback: {tr_case['feedback']}
+"""
+        (ARTIFACT_DIR / f"llm_case_study_{target_task}.md").write_text(case_study, encoding="utf-8")
+def _print_summary() -> None:
+    summary_path = ARTIFACT_DIR / "llm_summary.json"
+    summary = json.loads(summary_path.read_text())
+    print("\nCheckpoint comparison summary")
+    print(f"Baseline mean reward: {summary['baseline_mean_reward']:.4f}")
+    print(f"Trained mean reward:  {summary['trained_mean_reward']:.4f}")
+    print(f"Reward delta:         {summary['mean_reward_delta']:+.4f}")
+    print(f"Baseline success:     {summary['baseline_success_rate']:.4f}")
+    print(f"Trained success:      {summary['trained_success_rate']:.4f}")
+    print(f"Success delta:        {summary['success_rate_delta']:+.4f}")
+def main() -> None:
+    _require_env()
+    task_ids = _get_task_ids()
+    print(f"CommitmentOS LLM eval: {len(task_ids)} tasks, env={ENV_BASE_URL}", flush=True)
+    print("Loading baseline model…", flush=True)
+    baseline_model = load_baseline_model()
+    print("Running baseline…", flush=True)
+    baseline_results = run_model(baseline_model, task_ids)
+    baseline_model.unload()
+    print("Loading trained adapter…", flush=True)
+    trained_model = load_trained_model()
+    print("Running trained…", flush=True)
+    trained_results = run_model(trained_model, task_ids)
+    trained_model.unload()
+    write_artifacts(baseline_results, trained_results)
+    print("Wrote LLM checkpoint artifacts to", ARTIFACT_DIR)
+    _print_summary()
+if __name__ == "__main__":
+    main()
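
Reviewer sketch (not part of the commit): write_artifacts pairs baseline and trained rows by task_id and then aggregates reward deltas. The snippet below isolates that pairing on invented toy rows. The helper name paired_summary and the tasks_with_negative_reward_delta field are assumptions added for illustration; the committed code indexes by_task directly and reports only positive and zero deltas.

```python
from statistics import mean, median
from typing import Any


def paired_summary(baseline: list[dict[str, Any]], trained: list[dict[str, Any]]) -> dict[str, Any]:
    """Pair rows by task_id and summarise reward deltas (hypothetical helper)."""
    by_task = {row["task_id"]: row for row in trained}
    # Keep only tasks present in both runs instead of raising KeyError.
    pairs = [(b, by_task[b["task_id"]]) for b in baseline if b["task_id"] in by_task]
    if not pairs:
        # statistics.mean raises on empty input, so return early.
        return {"task_count": 0}
    deltas = [round(t["final_reward"] - b["final_reward"], 4) for b, t in pairs]
    return {
        "task_count": len(pairs),
        "mean_reward_delta": round(mean(deltas), 4),
        "median_reward_delta": round(median(deltas), 4),
        "tasks_with_positive_reward_delta": sum(1 for d in deltas if d > 0),
        "tasks_with_negative_reward_delta": sum(1 for d in deltas if d < 0),
    }


# Toy rows, invented purely for illustration.
toy_baseline = [
    {"task_id": "easy_001", "final_reward": 0.5},
    {"task_id": "hard_015", "final_reward": 0.25},
]
toy_trained = [
    {"task_id": "easy_001", "final_reward": 0.75},
    {"task_id": "hard_015", "final_reward": 0.25},
]
summary = paired_summary(toy_baseline, toy_trained)
print(summary)
```

Averaging per-task deltas gives the same mean as subtracting the two run means (up to rounding), which is how the committed summary computes mean_reward_delta.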

evaluation/plot_llm_checkpoints.py ADDED

	@@ -0,0 +1,133 @@

+"""Render SVG visuals for LLM checkpoint comparison."""
+from __future__ import annotations
+import csv
+from pathlib import Path
+ARTIFACT_DIR = Path("artifacts/evals_llm")
+COMPARISON_CSV = ARTIFACT_DIR / "llm_comparison.csv"
+def _svg_header(width: int, height: int) -> list[str]:
+    return [
+        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" viewBox="0 0 {width} {height}">',
+        '<rect width="100%" height="100%" fill="#FFFFFF"/>',
+    ]
+def _svg_footer() -> list[str]:
+    return ["</svg>"]
+def _rows() -> list[dict[str, str]]:
+    # The csv module expects files opened with newline="".
+    with COMPARISON_CSV.open(newline="") as f:
+        return list(csv.DictReader(f))
+def plot_reward(rows: list[dict[str, str]]) -> None:
+    tasks = [r["task_id"] for r in rows]
+    base = [float(r["baseline_reward"]) for r in rows]
+    trained = [float(r["trained_reward"]) for r in rows]
+    width, height = 1360, 520
+    left, right, top, bottom = 80, 40, 70, 110
+    plot_w = width - left - right
+    plot_h = height - top - bottom
+    group_w = plot_w / max(len(tasks), 1)
+    bar_w = max(group_w * 0.32, 10)
+    lines = _svg_header(width, height)
+    lines.append('<text x="80" y="35" font-size="22" font-family="Arial" fill="#111827">Base vs Trained LLM Reward by Task</text>')
+    lines.append(f'<line x1="{left}" y1="{top+plot_h}" x2="{left+plot_w}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
+    lines.append(f'<line x1="{left}" y1="{top}" x2="{left}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
+    for tick in range(0, 6):
+        value = tick / 5
+        y = top + plot_h - (value * plot_h)
+        lines.append(f'<line x1="{left}" y1="{y:.2f}" x2="{left+plot_w}" y2="{y:.2f}" stroke="#E5E7EB" stroke-width="1"/>')
+        lines.append(f'<text x="{left-38}" y="{y+5:.2f}" font-size="12" font-family="Arial" fill="#374151">{value:.1f}</text>')
+    for idx, task in enumerate(tasks):
+        gx = left + (idx * group_w) + (group_w * 0.5)
+        b_h = base[idx] * plot_h
+        t_h = trained[idx] * plot_h
+        b_x = gx - bar_w - 2
+        t_x = gx + 2
+        b_y = top + plot_h - b_h
+        t_y = top + plot_h - t_h
+        lines.append(f'<rect x="{b_x:.2f}" y="{b_y:.2f}" width="{bar_w:.2f}" height="{b_h:.2f}" fill="#9CA3AF"/>')
+        lines.append(f'<rect x="{t_x:.2f}" y="{t_y:.2f}" width="{bar_w:.2f}" height="{t_h:.2f}" fill="#2563EB"/>')
+        lines.append(
+            f'<text x="{gx:.2f}" y="{top+plot_h+22}" font-size="10" text-anchor="middle" '
+            f'font-family="Arial" fill="#374151" transform="rotate(25 {gx:.2f},{top+plot_h+22})">{task}</text>'
+        )
+    legend_y = 52
+    lines.append(f'<rect x="{width-310}" y="{legend_y-10}" width="12" height="12" fill="#9CA3AF"/>')
+    lines.append(f'<text x="{width-292}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Base</text>')
+    lines.append(f'<rect x="{width-230}" y="{legend_y-10}" width="12" height="12" fill="#2563EB"/>')
+    lines.append(f'<text x="{width-212}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Trained</text>')
+    lines.extend(_svg_footer())
+    (ARTIFACT_DIR / "llm_reward_by_task.svg").write_text("\n".join(lines))
+def plot_violations(rows: list[dict[str, str]]) -> None:
+    tasks = [r["task_id"] for r in rows]
+    base = [int(r["baseline_violations"]) for r in rows]
+    trained = [int(r["trained_violations"]) for r in rows]
+    max_v = max(max(base, default=0), max(trained, default=0), 1)
+    width, height = 1360, 500
+    left, right, top, bottom = 80, 40, 70, 100
+    plot_w = width - left - right
+    plot_h = height - top - bottom
+    def point_x(i: int) -> float:
+        return left + (i / max(len(tasks) - 1, 1)) * plot_w
+    def point_y(v: int) -> float:
+        return top + plot_h - ((v / max_v) * plot_h)
+    lines = _svg_header(width, height)
+    lines.append('<text x="80" y="35" font-size="22" font-family="Arial" fill="#111827">Base vs Trained LLM Commitment Violations</text>')
+    lines.append(f'<line x1="{left}" y1="{top+plot_h}" x2="{left+plot_w}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
+    lines.append(f'<line x1="{left}" y1="{top}" x2="{left}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
+    for tick in range(max_v + 1):
+        y = point_y(tick)
+        lines.append(f'<line x1="{left}" y1="{y:.2f}" x2="{left+plot_w}" y2="{y:.2f}" stroke="#E5E7EB" stroke-width="1"/>')
+        lines.append(f'<text x="{left-24}" y="{y+5:.2f}" font-size="12" font-family="Arial" fill="#374151">{tick}</text>')
+    base_points = " ".join(f"{point_x(i):.2f},{point_y(v):.2f}" for i, v in enumerate(base))
+    tr_points = " ".join(f"{point_x(i):.2f},{point_y(v):.2f}" for i, v in enumerate(trained))
+    lines.append(f'<polyline points="{base_points}" fill="none" stroke="#DC2626" stroke-width="2"/>')
+    lines.append(f'<polyline points="{tr_points}" fill="none" stroke="#059669" stroke-width="2"/>')
+    for i, task in enumerate(tasks):
+        x = point_x(i)
+        lines.append(f'<circle cx="{x:.2f}" cy="{point_y(base[i]):.2f}" r="3" fill="#DC2626"/>')
+        lines.append(f'<circle cx="{x:.2f}" cy="{point_y(trained[i]):.2f}" r="3" fill="#059669"/>')
+        lines.append(
+            f'<text x="{x:.2f}" y="{top+plot_h+20}" font-size="10" text-anchor="middle" '
+            f'font-family="Arial" fill="#374151" transform="rotate(25 {x:.2f},{top+plot_h+20})">{task}</text>'
+        )
+    legend_y = 52
+    lines.append(f'<line x1="{width-320}" y1="{legend_y-5}" x2="{width-300}" y2="{legend_y-5}" stroke="#DC2626" stroke-width="2"/>')
+    lines.append(f'<text x="{width-295}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Base</text>')
+    lines.append(f'<line x1="{width-230}" y1="{legend_y-5}" x2="{width-210}" y2="{legend_y-5}" stroke="#059669" stroke-width="2"/>')
+    lines.append(f'<text x="{width-205}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Trained</text>')
+    lines.extend(_svg_footer())
+    (ARTIFACT_DIR / "llm_violations_before_after.svg").write_text("\n".join(lines))
+def main() -> None:
+    rows = _rows()
+    plot_reward(rows)
+    plot_violations(rows)
+    print("Wrote checkpoint comparison SVG plots to", ARTIFACT_DIR)
+if __name__ == "__main__":
+    main()
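
Reviewer sketch (not part of the commit): plot_reward maps a reward directly to a bar height via value * plot_h, which assumes rewards lie in [0, 1]. The snippet below shows the same mapping with clamping, so an out-of-range reward cannot produce a negative or overflowing rect. The helper name bar_rect is hypothetical, and whether CommitmentOS rewards can leave [0, 1] is not stated in the diff.

```python
def bar_rect(value: float, x: float, bar_w: float, top: float, plot_h: float, fill: str) -> str:
    """Return one SVG <rect> for a bar whose value is clamped to [0, 1]."""
    clamped = min(max(value, 0.0), 1.0)
    h = clamped * plot_h
    # SVG y grows downward, so the bar starts h pixels above the x-axis.
    y = top + plot_h - h
    return f'<rect x="{x:.2f}" y="{y:.2f}" width="{bar_w:.2f}" height="{h:.2f}" fill="{fill}"/>'


# Geometry taken from plot_reward: top = 70, plot_h = 520 - 70 - 110 = 340.
half = bar_rect(0.5, 10, 20, 70, 340, "#2563EB")
negative = bar_rect(-0.2, 10, 20, 70, 340, "#9CA3AF")
over = bar_rect(1.5, 10, 20, 70, 340, "#9CA3AF")
print(half)
```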

pyproject.toml CHANGED

@@ -7,7 +7,7 @@ name = "commitment-os"
 version = "0.1.0"
 description = "CommitmentOS: the first RL environment that trains temporal commitment coherence in LLMs"
 requires-python = ">=3.10"
-license = {text = "MIT"}
+license = "MIT"
 authors = [
     {name = "Jayant Aggarwal"},
 ]
@@ -40,4 +40,19 @@ training = [
     "torch>=2.0.0",
     "peft>=0.14.0",
     "datasets>=3.0.0",
+    "accelerate>=0.30.0",
+    "sentencepiece>=0.2.0",
 ]
+# Local Transformers + PEFT eval (evaluate_llm_checkpoints.py); not in Docker requirements.txt
+llm-eval = [
+    "transformers>=4.45.0",
+    "peft>=0.14.0",
+    "torch>=2.0.0",
+    "accelerate>=0.30.0",
+    "sentencepiece>=0.2.0",
+    "requests>=2.31.0",
+]
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["server*", "training*"]
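
Reviewer sketch (not part of the commit): the new llm-eval extra is kept out of the Docker requirements, so a local run can fail late if it was never installed. A small preflight check along these lines could surface that early. The helper name missing_modules is hypothetical, and the module list assumes each distribution in the extra is imported under the same name as its package.

```python
from importlib.util import find_spec

# Import names assumed to match the distributions listed in the llm-eval extra.
LLM_EVAL_MODULES = ["transformers", "peft", "torch", "accelerate", "sentencepiece", "requests"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(LLM_EVAL_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
        # Standard pip extras syntax, run from the repository root.
        print('Try: pip install -e ".[llm-eval]"')
    else:
        print("llm-eval dependencies look importable.")
```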

server/__init__.py CHANGED

	@@ -0,0 +1 @@


+"""CommitmentOS HTTP server and environment implementation."""

training/CommitmentOS_Training.ipynb CHANGED

@@ -1,95 +1,119 @@
 {
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# CommitmentOS Training Notebook\\n",
-    "\\n",
-    "This notebook reproduces GRPO training for CommitmentOS using TRL + LoRA."
-   ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!pip -q install --upgrade pip\\n",
-    "!pip -q install openenv trl transformers peft datasets torch accelerate bitsandbytes matplotlib pandas"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!git clone https://github.com/Jayant2304/commitment_os.git\\n",
-    "%cd commitment_os\\n",
-    "!python -m pytest tests/test_environment.py -q"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!python training/train_grpo.py \\\\\\n",
-    "  --model Qwen/Qwen2.5-1.5B-Instruct \\\\\\n",
-    "  --epochs 2 \\\\\\n",
-    "  --lr 5e-6 \\\\\\n",
-    "  --batch_size 1 \\\\\\n",
-    "  --group_size 2 \\\\\\n",
-    "  --lora_rank 16 \\\\\\n",
-    "  --lora_alpha 32 \\\\\\n",
-    "  --output_dir ./training_output"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import json\\n",
-    "import matplotlib.pyplot as plt\\n",
-    "from pathlib import Path\\n",
-    "\\n",
-    "p = Path('training_output/training_metrics.json')\\n",
-    "logs = json.loads(p.read_text())\\n",
-    "\\n",
-    "steps = [float(x['step']) for x in logs if 'step' in x and 'loss' in x]\\n",
-    "loss = [float(x['loss']) for x in logs if 'step' in x and 'loss' in x]\\n",
-    "r_steps = [float(x['step']) for x in logs if 'step' in x and 'reward' in x]\\n",
-    "rewards = [float(x['reward']) for x in logs if 'step' in x and 'reward' in x]\\n",
-    "\\n",
-    "plt.figure(figsize=(8,5))\\n",
-    "plt.plot(steps, loss, marker='o')\\n",
-    "plt.title('CommitmentOS GRPO Loss vs Step')\\n",
-    "plt.xlabel('Step'); plt.ylabel('Loss'); plt.grid(alpha=0.3)\\n",
-    "plt.tight_layout(); plt.savefig('loss_curve.png', dpi=200); plt.show()\\n",
-    "\\n",
-    "plt.figure(figsize=(8,5))\\n",
-    "plt.plot(r_steps, rewards, marker='o')\\n",
-    "plt.title('CommitmentOS GRPO Reward vs Step')\\n",
-    "plt.xlabel('Step'); plt.ylabel('Reward'); plt.grid(alpha=0.3)\\n",
-    "plt.tight_layout(); plt.savefig('reward_curve.png', dpi=200); plt.show()"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "name": "python",
-   "version": "3.10"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
 }

 {
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# CommitmentOS Training Notebook\n",
+        "\n",
+        "This notebook reproduces GRPO training for CommitmentOS using TRL + LoRA."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "5bc9c2fe",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!pip -q install --upgrade pip\n",
+        "!pip -q install \"openenv-core>=0.2.0\" trl transformers peft datasets torch accelerate bitsandbytes matplotlib pandas pydantic"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!git clone https://github.com/Jayant2304/commitment_os.git\n",
+        "%cd commitment_os\n",
+        "!python -m pytest tests/test_environment.py -q"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!python training/train_grpo.py \\\n",
+        "  --model Qwen/Qwen2.5-1.5B-Instruct \\\n",
+        "  --epochs 2 \\\n",
+        "  --lr 5e-6 \\\n",
+        "  --batch_size 1 \\\n",
+        "  --group_size 2 \\\n",
+        "  --lora_rank 16 \\\n",
+        "  --lora_alpha 32 \\\n",
+        "  --output_dir ./training_output"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "import matplotlib.pyplot as plt\n",
+        "from pathlib import Path\n",
+        "\n",
+        "p = Path('training_output/training_metrics.json')\n",
+        "logs = json.loads(p.read_text())\n",
+        "\n",
+        "steps = [float(x['step']) for x in logs if 'step' in x and 'loss' in x]\n",
+        "loss = [float(x['loss']) for x in logs if 'step' in x and 'loss' in x]\n",
+        "r_steps = [float(x['step']) for x in logs if 'step' in x and 'reward' in x]\n",
+        "rewards = [float(x['reward']) for x in logs if 'step' in x and 'reward' in x]\n",
+        "\n",
+        "plt.figure(figsize=(8,5))\n",
+        "plt.plot(steps, loss, marker='o')\n",
+        "plt.title('CommitmentOS GRPO Loss vs Step')\n",
+        "plt.xlabel('Step'); plt.ylabel('Loss'); plt.grid(alpha=0.3)\n",
+        "plt.tight_layout(); plt.savefig('loss_curve.png', dpi=200); plt.show()\n",
+        "\n",
+        "plt.figure(figsize=(8,5))\n",
+        "plt.plot(r_steps, rewards, marker='o')\n",
+        "plt.title('CommitmentOS GRPO Reward vs Step')\n",
+        "plt.xlabel('Step'); plt.ylabel('Reward'); plt.grid(alpha=0.3)\n",
+        "plt.tight_layout(); plt.savefig('reward_curve.png', dpi=200); plt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "e788b455",
+      "metadata": {},
+      "source": [
+        "### Optional: zip `training_output` for download\n",
+        "\n",
+        "Run after training completes. On Colab, use **Files** sidebar or `files.download` for the zip.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "1b3c760a",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "!cd /content/commitment_os && du -sh training_output && zip -r /content/training_output_only.zip training_output\n",
+        "from google.colab import files\n",
+        "\n",
+        "files.download(\"/content/training_output_only.zip\")\n"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.10"
+    }
   },
+  "nbformat": 4,
+  "nbformat_minor": 5
 }
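
Reviewer sketch (not part of the commit): the new optional cell zips training_output with a shell command and google.colab, so it only works on Colab. A portable alternative could use the standard library instead. The helper name zip_dir is hypothetical, and the demo below runs on a throwaway directory with a placeholder file, not on real training output.

```python
import shutil
import tempfile
from pathlib import Path


def zip_dir(src: Path, dest_stem: Path) -> Path:
    """Zip src into dest_stem.zip, keeping the top-level folder name."""
    archive = shutil.make_archive(
        str(dest_stem), "zip", root_dir=src.parent, base_dir=src.name
    )
    return Path(archive)


# Demo on a temporary directory; the file written here is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "training_output"
    out_dir.mkdir()
    (out_dir / "adapter_config.json").write_text("{}")
    archive_path = zip_dir(out_dir, Path(tmp) / "training_output_only")
    archive_ok = archive_path.exists() and archive_path.suffix == ".zip"
    print(archive_path.name, archive_ok)
```

Outside Colab, the resulting zip can be copied or uploaded with whatever tooling the environment provides.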

uv.lock CHANGED

@@ -660,9 +660,19 @@ inference = [
     { name = "openai" },
     { name = "requests" },
 ]
 training = [
     { name = "datasets" },
     { name = "peft" },
     { name = "torch" },
     { name = "transformers" },
     { name = "trl" },
@@ -670,24 +680,32 @@ training = [
 [package.metadata]
 requires-dist = [
     { name = "datasets", marker = "extra == 'training'", specifier = ">=3.0.0" },
     { name = "fastapi", specifier = ">=0.110.0" },
     { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27.0" },
     { name = "openai", marker = "extra == 'dev'", specifier = ">=1.0.0" },
     { name = "openai", marker = "extra == 'inference'", specifier = ">=1.0.0" },
     { name = "openenv-core", specifier = ">=0.2.0" },
     { name = "peft", marker = "extra == 'training'", specifier = ">=0.14.0" },
     { name = "pydantic", specifier = ">=2.0.0" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
     { name = "python-dotenv", specifier = ">=1.0.0" },
     { name = "requests", marker = "extra == 'dev'", specifier = ">=2.31.0" },
     { name = "requests", marker = "extra == 'inference'", specifier = ">=2.31.0" },
     { name = "torch", marker = "extra == 'training'", specifier = ">=2.0.0" },
     { name = "transformers", marker = "extra == 'training'", specifier = ">=4.45.0" },
     { name = "trl", marker = "extra == 'training'", specifier = ">=0.14.0" },
     { name = "uvicorn", extras = ["standard"], specifier = ">=0.29.0" },
 ]
-provides-extras = ["inference", "dev", "training"]
 [[package]]
 name = "cryptography"
@@ -754,7 +772,7 @@ name = "cuda-bindings"
 version = "13.2.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "cuda-pathfinder" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/1a/fe/7351d7e586a8b4c9f89731bfe4cf0148223e8f9903ff09571f78b3fb0682/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:08b395f79cb89ce0cd8effff07c4a1e20101b873c256a1aeb286e8fd7bd0f556", size = 5744254 },
@@ -789,37 +807,37 @@ wheels = [
 [package.optional-dependencies]
 cublas = [
-    { name = "nvidia-cublas", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 cudart = [
-    { name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 cufft = [
-    { name = "nvidia-cufft", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 cufile = [
     { name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
 ]
 cupti = [
-    { name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 curand = [
-    { name = "nvidia-curand", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 cusolver = [
-    { name = "nvidia-cusolver", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 cusparse = [
-    { name = "nvidia-cusparse", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 nvjitlink = [
-    { name = "nvidia-nvjitlink", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 nvrtc = [
-    { name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 nvtx = [
-    { name = "nvidia-nvtx", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
 ]
 [[package]]
@@ -2158,7 +2176,7 @@ name = "nvidia-cudnn-cu13"
 version = "9.19.0.56"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "nvidia-cublas" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201 },
@@ -2170,7 +2188,7 @@ name = "nvidia-cufft"
 version = "12.0.0.61"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "nvidia-nvjitlink" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554 },
@@ -2200,9 +2218,9 @@ name = "nvidia-cusolver"
 version = "12.0.4.66"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "nvidia-cublas" },
-    { name = "nvidia-cusparse" },
-    { name = "nvidia-nvjitlink" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760 },
@@ -2214,7 +2232,7 @@ name = "nvidia-cusparse"
 version = "12.6.3.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "nvidia-nvjitlink" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568 },
@@ -3637,6 +3655,70 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/6a/23/8146aad7d88f4fcb3a6218f41a60f6c2d4e3a72de72da1825dc7c8f7877c/semantic_version-2.10.0-py2.py3-none-any.whl", hash = "sha256:de78a3b8e0feda74cabc54aab2da702113e33ac9d9eb9d2389bcf1f58b7d9177", size = 15552 },
 ]
 [[package]]
 name = "setuptools"
 version = "81.0.0"

     { name = "openai" },
     { name = "requests" },
 ]
+llm-eval = [
+    { name = "accelerate" },
+    { name = "peft" },
+    { name = "requests" },
+    { name = "sentencepiece" },
+    { name = "torch" },
+    { name = "transformers" },
+]
 training = [
+    { name = "accelerate" },
     { name = "datasets" },
     { name = "peft" },
+    { name = "sentencepiece" },
     { name = "torch" },
     { name = "transformers" },
     { name = "trl" },
 [package.metadata]
 requires-dist = [
+    { name = "accelerate", marker = "extra == 'llm-eval'", specifier = ">=0.30.0" },
+    { name = "accelerate", marker = "extra == 'training'", specifier = ">=0.30.0" },
     { name = "datasets", marker = "extra == 'training'", specifier = ">=3.0.0" },
     { name = "fastapi", specifier = ">=0.110.0" },
     { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27.0" },
     { name = "openai", marker = "extra == 'dev'", specifier = ">=1.0.0" },
     { name = "openai", marker = "extra == 'inference'", specifier = ">=1.0.0" },
     { name = "openenv-core", specifier = ">=0.2.0" },
+    { name = "peft", marker = "extra == 'llm-eval'", specifier = ">=0.14.0" },
     { name = "peft", marker = "extra == 'training'", specifier = ">=0.14.0" },
     { name = "pydantic", specifier = ">=2.0.0" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
     { name = "python-dotenv", specifier = ">=1.0.0" },
     { name = "requests", marker = "extra == 'dev'", specifier = ">=2.31.0" },
     { name = "requests", marker = "extra == 'inference'", specifier = ">=2.31.0" },
+    { name = "requests", marker = "extra == 'llm-eval'", specifier = ">=2.31.0" },
+    { name = "sentencepiece", marker = "extra == 'llm-eval'", specifier = ">=0.2.0" },
+    { name = "sentencepiece", marker = "extra == 'training'", specifier = ">=0.2.0" },
+    { name = "torch", marker = "extra == 'llm-eval'", specifier = ">=2.0.0" },
     { name = "torch", marker = "extra == 'training'", specifier = ">=2.0.0" },
+    { name = "transformers", marker = "extra == 'llm-eval'", specifier = ">=4.45.0" },
     { name = "transformers", marker = "extra == 'training'", specifier = ">=4.45.0" },
     { name = "trl", marker = "extra == 'training'", specifier = ">=0.14.0" },
     { name = "uvicorn", extras = ["standard"], specifier = ">=0.29.0" },
 ]
+provides-extras = ["inference", "dev", "training", "llm-eval"]
 [[package]]
 name = "cryptography"
 version = "13.2.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "cuda-pathfinder", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/1a/fe/7351d7e586a8b4c9f89731bfe4cf0148223e8f9903ff09571f78b3fb0682/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:08b395f79cb89ce0cd8effff07c4a1e20101b873c256a1aeb286e8fd7bd0f556", size = 5744254 },
 [package.optional-dependencies]
 cublas = [
+    { name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 cudart = [
+    { name = "nvidia-cuda-runtime", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 cufft = [
+    { name = "nvidia-cufft", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 cufile = [
     { name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
 ]
 cupti = [
+    { name = "nvidia-cuda-cupti", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 curand = [
+    { name = "nvidia-curand", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 cusolver = [
+    { name = "nvidia-cusolver", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 cusparse = [
+    { name = "nvidia-cusparse", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 nvjitlink = [
+    { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 nvrtc = [
+    { name = "nvidia-cuda-nvrtc", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 nvtx = [
+    { name = "nvidia-nvtx", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
 ]
 [[package]]
 version = "9.19.0.56"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201 },
 version = "12.0.0.61"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554 },
 version = "12.0.4.66"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
+    { name = "nvidia-cusparse", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
+    { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760 },
 version = "12.6.3.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
 ]
 wheels = [
     { url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568 },
     { url = "https://files.pythonhosted.org/packages/6a/23/8146aad7d88f4fcb3a6218f41a60f6c2d4e3a72de72da1825dc7c8f7877c/semantic_version-2.10.0-py2.py3-none-any.whl", hash = "sha256:de78a3b8e0feda74cabc54aab2da702113e33ac9d9eb9d2389bcf1f58b7d9177", size = 15552 },
 ]
+[[package]]
+name = "sentencepiece"
+version = "0.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/15/15/2e7a025fc62d764b151ae6d0f2a92f8081755ebe8d4a64099accc6f77ba6/sentencepiece-0.2.1.tar.gz", hash = "sha256:8138cec27c2f2282f4a34d9a016e3374cd40e5c6e9cb335063db66a0a3b71fad", size = 3228515 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/af/31/5b7cccb307b485db1a2372d6d2980b0a65d067f8be5ca943a103b4acd5b3/sentencepiece-0.2.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:e10fa50bdbaa5e2445dbd387979980d391760faf0ec99a09bd7780ff37eaec44", size = 1942557 },
+    { url = "https://files.pythonhosted.org/packages/1f/41/0ac923a8e685ad290c5afc8ae55c5844977b8d75076fcc04302b9a324274/sentencepiece-0.2.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:2f27ae6deea72efdb6f361750c92f6c21fd0ad087445082770cc34015213c526", size = 1325384 },
+    { url = "https://files.pythonhosted.org/packages/fc/ef/3751555d67daf9003384978f169d31c775cb5c7baf28633caaf1eb2b2b4d/sentencepiece-0.2.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:60937c959e6f44159fdd9f56fbdd302501f96114a5ba436829496d5f32d8de3f", size = 1253317 },
+    { url = "https://files.pythonhosted.org/packages/46/a5/742c69b7bd144eb32b6e5fd50dbd8abbbc7a95fce2fe16e50156fa400e3b/sentencepiece-0.2.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d8b1d91545578852f128650b8cce4ec20f93d39b378ff554ebe66290f2dabb92", size = 1316379 },
+    { url = "https://files.pythonhosted.org/packages/c8/89/8deeafbba2871e8fa10f20f17447786f4ac38085925335728d360eaf4cae/sentencepiece-0.2.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:27e38eee653abc3d387862e67bc5c8b6f428cd604e688b85d29170b7e725c26c", size = 1387926 },
+    { url = "https://files.pythonhosted.org/packages/c3/ca/67fe73005f0ab617c6a970b199754e28e524b6873aa7025224fad3cda252/sentencepiece-0.2.1-cp310-cp310-win32.whl", hash = "sha256:251874d720ac7f28024a168501f3c7bb15d1802245f6e66de565f18bbb9b5eaa", size = 999550 },
+    { url = "https://files.pythonhosted.org/packages/6d/33/dc5b54042050d2dda4229c3ce1f862541c99966390b6aa20f54d520d2dc2/sentencepiece-0.2.1-cp310-cp310-win_amd64.whl", hash = "sha256:e52144670738b4b477fade6c2a9b6af71a8d0094514c9853ac9f6fc1fcfabae7", size = 1054613 },
+    { url = "https://files.pythonhosted.org/packages/fa/19/1ea47f46ff97fe04422b78997da1a37cd632f414aae042d27a9009c5b733/sentencepiece-0.2.1-cp310-cp310-win_arm64.whl", hash = "sha256:9076430ac25dfa7147d9d05751dbc66a04bc1aaac371c07f84952979ea59f0d0", size = 1033884 },
+    { url = "https://files.pythonhosted.org/packages/d8/15/46afbab00733d81788b64be430ca1b93011bb9388527958e26cc31832de5/sentencepiece-0.2.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:6356d0986b8b8dc351b943150fcd81a1c6e6e4d439772e8584c64230e58ca987", size = 1942560 },
+    { url = "https://files.pythonhosted.org/packages/fa/79/7c01b8ef98a0567e9d84a4e7a910f8e7074fcbf398a5cd76f93f4b9316f9/sentencepiece-0.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:8f8ba89a3acb3dc1ae90f65ec1894b0b9596fdb98ab003ff38e058f898b39bc7", size = 1325385 },
+    { url = "https://files.pythonhosted.org/packages/bb/88/2b41e07bd24f33dcf2f18ec3b74247aa4af3526bad8907b8727ea3caba03/sentencepiece-0.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:02593eca45440ef39247cee8c47322a34bdcc1d8ae83ad28ba5a899a2cf8d79a", size = 1253319 },
+    { url = "https://files.pythonhosted.org/packages/a0/54/38a1af0c6210a3c6f95aa46d23d6640636d020fba7135cd0d9a84ada05a7/sentencepiece-0.2.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a0d15781a171d188b661ae4bde1d998c303f6bd8621498c50c671bd45a4798e", size = 1316162 },
+    { url = "https://files.pythonhosted.org/packages/ef/66/fb191403ade791ad2c3c1e72fe8413e63781b08cfa3aa4c9dfc536d6e795/sentencepiece-0.2.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4f5a3e0d9f445ed9d66c0fec47d4b23d12cfc858b407a03c194c1b26c2ac2a63", size = 1387785 },
+    { url = "https://files.pythonhosted.org/packages/a9/2d/3bd9b08e70067b2124518b308db6a84a4f8901cc8a4317e2e4288cdd9b4d/sentencepiece-0.2.1-cp311-cp311-win32.whl", hash = "sha256:6d297a1748d429ba8534eebe5535448d78b8acc32d00a29b49acf28102eeb094", size = 999555 },
+    { url = "https://files.pythonhosted.org/packages/32/b8/f709977f5fda195ae1ea24f24e7c581163b6f142b1005bc3d0bbfe4d7082/sentencepiece-0.2.1-cp311-cp311-win_amd64.whl", hash = "sha256:82d9ead6591015f009cb1be1cb1c015d5e6f04046dbb8c9588b931e869a29728", size = 1054617 },
+    { url = "https://files.pythonhosted.org/packages/7a/40/a1fc23be23067da0f703709797b464e8a30a1c78cc8a687120cd58d4d509/sentencepiece-0.2.1-cp311-cp311-win_arm64.whl", hash = "sha256:39f8651bd10974eafb9834ce30d9bcf5b73e1fc798a7f7d2528f9820ca86e119", size = 1033877 },
+    { url = "https://files.pythonhosted.org/packages/4a/be/32ce495aa1d0e0c323dcb1ba87096037358edee539cac5baf8755a6bd396/sentencepiece-0.2.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:57cae326c8727de58c85977b175af132a7138d84c764635d7e71bbee7e774133", size = 1943152 },
+    { url = "https://files.pythonhosted.org/packages/88/7e/ff23008899a58678e98c6ff592bf4d368eee5a71af96d0df6b38a039dd4f/sentencepiece-0.2.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:56dd39a3c4d6493db3cdca7e8cc68c6b633f0d4195495cbadfcf5af8a22d05a6", size = 1325651 },
+    { url = "https://files.pythonhosted.org/packages/19/84/42eb3ce4796777a1b5d3699dfd4dca85113e68b637f194a6c8d786f16a04/sentencepiece-0.2.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d9381351182ff9888cc80e41c632e7e274b106f450de33d67a9e8f6043da6f76", size = 1253645 },
+    { url = "https://files.pythonhosted.org/packages/89/fa/d3d5ebcba3cb9e6d3775a096251860c41a6bc53a1b9461151df83fe93255/sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:99f955df238021bf11f0fc37cdb54fd5e5b5f7fd30ecc3d93fb48b6815437167", size = 1316273 },
+    { url = "https://files.pythonhosted.org/packages/04/88/14f2f4a2b922d8b39be45bf63d79e6cd3a9b2f248b2fcb98a69b12af12f5/sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0cdfecef430d985f1c2bcbfff3defd1d95dae876fbd0173376012d2d7d24044b", size = 1387881 },
+    { url = "https://files.pythonhosted.org/packages/fd/b8/903e5ccb77b4ef140605d5d71b4f9e0ad95d456d6184688073ed11712809/sentencepiece-0.2.1-cp312-cp312-win32.whl", hash = "sha256:a483fd29a34c3e34c39ac5556b0a90942bec253d260235729e50976f5dba1068", size = 999540 },
+    { url = "https://files.pythonhosted.org/packages/2d/81/92df5673c067148c2545b1bfe49adfd775bcc3a169a047f5a0e6575ddaca/sentencepiece-0.2.1-cp312-cp312-win_amd64.whl", hash = "sha256:4cdc7c36234fda305e85c32949c5211faaf8dd886096c7cea289ddc12a2d02de", size = 1054671 },
+    { url = "https://files.pythonhosted.org/packages/fe/02/c5e3bc518655d714622bec87d83db9cdba1cd0619a4a04e2109751c4f47f/sentencepiece-0.2.1-cp312-cp312-win_arm64.whl", hash = "sha256:daeb5e9e9fcad012324807856113708614d534f596d5008638eb9b40112cd9e4", size = 1033923 },
+    { url = "https://files.pythonhosted.org/packages/ba/4a/85fbe1706d4d04a7e826b53f327c4b80f849cf1c7b7c5e31a20a97d8f28b/sentencepiece-0.2.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:dcd8161eee7b41aae57ded06272905dbd680a0a04b91edd0f64790c796b2f706", size = 1943150 },
+    { url = "https://files.pythonhosted.org/packages/c2/83/4cfb393e287509fc2155480b9d184706ef8d9fa8cbf5505d02a5792bf220/sentencepiece-0.2.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:c6c8f42949f419ff8c7e9960dbadcfbc982d7b5efc2f6748210d3dd53a7de062", size = 1325651 },
+    { url = "https://files.pythonhosted.org/packages/8d/de/5a007fb53b1ab0aafc69d11a5a3dd72a289d5a3e78dcf2c3a3d9b14ffe93/sentencepiece-0.2.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:097f3394e99456e9e4efba1737c3749d7e23563dd1588ce71a3d007f25475fff", size = 1253641 },
+    { url = "https://files.pythonhosted.org/packages/2c/d2/f552be5928105588f4f4d66ee37dd4c61460d8097e62d0e2e0eec41bc61d/sentencepiece-0.2.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d7b670879c370d350557edabadbad1f6561a9e6968126e6debca4029e5547820", size = 1316271 },
+    { url = "https://files.pythonhosted.org/packages/96/df/0cfe748ace5485be740fed9476dee7877f109da32ed0d280312c94ec259f/sentencepiece-0.2.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c7f0fd2f2693309e6628aeeb2e2faf6edd221134dfccac3308ca0de01f8dab47", size = 1387882 },
+    { url = "https://files.pythonhosted.org/packages/ac/dd/f7774d42a881ced8e1739f393ab1e82ece39fc9abd4779e28050c2e975b5/sentencepiece-0.2.1-cp313-cp313-win32.whl", hash = "sha256:92b3816aa2339355fda2c8c4e021a5de92180b00aaccaf5e2808972e77a4b22f", size = 999541 },
+    { url = "https://files.pythonhosted.org/packages/dd/e9/932b9eae6fd7019548321eee1ab8d5e3b3d1294df9d9a0c9ac517c7b636d/sentencepiece-0.2.1-cp313-cp313-win_amd64.whl", hash = "sha256:10ed3dab2044c47f7a2e7b4969b0c430420cdd45735d78c8f853191fa0e3148b", size = 1054669 },
+    { url = "https://files.pythonhosted.org/packages/c9/3a/76488a00ea7d6931689cda28726a1447d66bf1a4837943489314593d5596/sentencepiece-0.2.1-cp313-cp313-win_arm64.whl", hash = "sha256:ac650534e2251083c5f75dde4ff28896ce7c8904133dc8fef42780f4d5588fcd", size = 1033922 },
+    { url = "https://files.pythonhosted.org/packages/4a/b6/08fe2ce819e02ccb0296f4843e3f195764ce9829cbda61b7513f29b95718/sentencepiece-0.2.1-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:8dd4b477a7b069648d19363aad0cab9bad2f4e83b2d179be668efa672500dc94", size = 1946052 },
+    { url = "https://files.pythonhosted.org/packages/ab/d9/1ea0e740591ff4c6fc2b6eb1d7510d02f3fb885093f19b2f3abd1363b402/sentencepiece-0.2.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:0c0f672da370cc490e4c59d89e12289778310a0e71d176c541e4834759e1ae07", size = 1327408 },
+    { url = "https://files.pythonhosted.org/packages/99/7e/1fb26e8a21613f6200e1ab88824d5d203714162cf2883248b517deb500b7/sentencepiece-0.2.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ad8493bea8432dae8d6830365352350f3b4144415a1d09c4c8cb8d30cf3b6c3c", size = 1254857 },
+    { url = "https://files.pythonhosted.org/packages/bc/85/c72fd1f3c7a6010544d6ae07f8ddb38b5e2a7e33bd4318f87266c0bbafbf/sentencepiece-0.2.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b81a24733726e3678d2db63619acc5a8dccd074f7aa7a54ecd5ca33ca6d2d596", size = 1315722 },
+    { url = "https://files.pythonhosted.org/packages/4a/e8/661e5bd82a8aa641fd6c1020bd0e890ef73230a2b7215ddf9c8cd8e941c2/sentencepiece-0.2.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0a81799d0a68d618e89063fb423c3001a034c893069135ffe51fee439ae474d6", size = 1387452 },
+    { url = "https://files.pythonhosted.org/packages/99/5e/ae66c361023a470afcbc1fbb8da722c72ea678a2fcd9a18f1a12598c7501/sentencepiece-0.2.1-cp313-cp313t-win32.whl", hash = "sha256:89a3ea015517c42c0341d0d962f3e6aaf2cf10d71b1932d475c44ba48d00aa2b", size = 1002501 },
+    { url = "https://files.pythonhosted.org/packages/c1/03/d332828c4ff764e16c1b56c2c8f9a33488bbe796b53fb6b9c4205ddbf167/sentencepiece-0.2.1-cp313-cp313t-win_amd64.whl", hash = "sha256:33f068c9382dc2e7c228eedfd8163b52baa86bb92f50d0488bf2b7da7032e484", size = 1057555 },
+    { url = "https://files.pythonhosted.org/packages/88/14/5aee0bf0864df9bd82bd59e7711362908e4935e3f9cdc1f57246b5d5c9b9/sentencepiece-0.2.1-cp313-cp313t-win_arm64.whl", hash = "sha256:b3616ad246f360e52c85781e47682d31abfb6554c779e42b65333d4b5f44ecc0", size = 1036042 },
+    { url = "https://files.pythonhosted.org/packages/24/9c/89eb8b2052f720a612478baf11c8227dcf1dc28cd4ea4c0c19506b5af2a2/sentencepiece-0.2.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:5d0350b686c320068702116276cfb26c066dc7e65cfef173980b11bb4d606719", size = 1943147 },
+    { url = "https://files.pythonhosted.org/packages/82/0b/a1432bc87f97c2ace36386ca23e8bd3b91fb40581b5e6148d24b24186419/sentencepiece-0.2.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:c7f54a31cde6fa5cb030370566f68152a742f433f8d2be458463d06c208aef33", size = 1325624 },
+    { url = "https://files.pythonhosted.org/packages/ea/99/bbe054ebb5a5039457c590e0a4156ed073fb0fe9ce4f7523404dd5b37463/sentencepiece-0.2.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c83b85ab2d6576607f31df77ff86f28182be4a8de6d175d2c33ca609925f5da1", size = 1253670 },
+    { url = "https://files.pythonhosted.org/packages/19/ad/d5c7075f701bd97971d7c2ac2904f227566f51ef0838dfbdfdccb58cd212/sentencepiece-0.2.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1855f57db07b51fb51ed6c9c452f570624d2b169b36f0f79ef71a6e6c618cd8b", size = 1316247 },
+    { url = "https://files.pythonhosted.org/packages/fb/03/35fbe5f3d9a7435eebd0b473e09584bd3cc354ce118b960445b060d33781/sentencepiece-0.2.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:01e6912125cb45d3792f530a4d38f8e21bf884d6b4d4ade1b2de5cf7a8d2a52b", size = 1387894 },
+    { url = "https://files.pythonhosted.org/packages/dc/aa/956ef729aafb6c8f9c443104c9636489093bb5c61d6b90fc27aa1a865574/sentencepiece-0.2.1-cp314-cp314-win32.whl", hash = "sha256:c415c9de1447e0a74ae3fdb2e52f967cb544113a3a5ce3a194df185cbc1f962f", size = 1096698 },
+    { url = "https://files.pythonhosted.org/packages/b8/cb/fe400d8836952cc535c81a0ce47dc6875160e5fedb71d2d9ff0e9894c2a6/sentencepiece-0.2.1-cp314-cp314-win_amd64.whl", hash = "sha256:881b2e44b14fc19feade3cbed314be37de639fc415375cefaa5bc81a4be137fd", size = 1155115 },
+    { url = "https://files.pythonhosted.org/packages/32/89/047921cf70f36c7b6b6390876b2399b3633ab73b8d0cb857e5a964238941/sentencepiece-0.2.1-cp314-cp314-win_arm64.whl", hash = "sha256:2005242a16d2dc3ac5fe18aa7667549134d37854823df4c4db244752453b78a8", size = 1133890 },
+    { url = "https://files.pythonhosted.org/packages/a1/11/5b414b9fae6255b5fb1e22e2ed3dc3a72d3a694e5703910e640ac78346bb/sentencepiece-0.2.1-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:a19adcec27c524cb7069a1c741060add95f942d1cbf7ad0d104dffa0a7d28a2b", size = 1946081 },
+    { url = "https://files.pythonhosted.org/packages/77/eb/7a5682bb25824db8545f8e5662e7f3e32d72a508fdce086029d89695106b/sentencepiece-0.2.1-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:e37e4b4c4a11662b5db521def4e44d4d30ae69a1743241412a93ae40fdcab4bb", size = 1327406 },
+    { url = "https://files.pythonhosted.org/packages/03/b0/811dae8fb9f2784e138785d481469788f2e0d0c109c5737372454415f55f/sentencepiece-0.2.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:477c81505db072b3ab627e7eab972ea1025331bd3a92bacbf798df2b75ea86ec", size = 1254846 },
+    { url = "https://files.pythonhosted.org/packages/ef/23/195b2e7ec85ebb6a547969f60b723c7aca5a75800ece6cc3f41da872d14e/sentencepiece-0.2.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:010f025a544ef770bb395091d57cb94deb9652d8972e0d09f71d85d5a0816c8c", size = 1315721 },
+    { url = "https://files.pythonhosted.org/packages/7e/aa/553dbe4178b5f23eb28e59393dddd64186178b56b81d9b8d5c3ff1c28395/sentencepiece-0.2.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:733e59ff1794d26db706cd41fc2d7ca5f6c64a820709cb801dc0ea31780d64ab", size = 1387458 },
+    { url = "https://files.pythonhosted.org/packages/66/7c/08ff0012507297a4dd74a5420fdc0eb9e3e80f4e88cab1538d7f28db303d/sentencepiece-0.2.1-cp314-cp314t-win32.whl", hash = "sha256:d3233770f78e637dc8b1fda2cd7c3b99ec77e7505041934188a4e7fe751de3b0", size = 1099765 },
+    { url = "https://files.pythonhosted.org/packages/91/d5/2a69e1ce15881beb9ddfc7e3f998322f5cedcd5e4d244cb74dade9441663/sentencepiece-0.2.1-cp314-cp314t-win_amd64.whl", hash = "sha256:5e4366c97b68218fd30ea72d70c525e6e78a6c0a88650f57ac4c43c63b234a9d", size = 1157807 },
+    { url = "https://files.pythonhosted.org/packages/f3/16/54f611fcfc2d1c46cbe3ec4169780b2cfa7cf63708ef2b71611136db7513/sentencepiece-0.2.1-cp314-cp314t-win_arm64.whl", hash = "sha256:105e36e75cbac1292642045458e8da677b2342dcd33df503e640f0b457cb6751", size = 1136264 },
+]
 [[package]]
 name = "setuptools"
 version = "81.0.0"