jayantaggarwal-sketch committed
Commit: d53a65c
Parent(s): 98b25a9
Sync latest project updates to Hugging Face Space.
Include current code, evaluation scripts, notebook, and docs while excluding PNG binaries, as required by Space push policy.
Made-with: Cursor
- .env.example +29 -0
- .gitignore +3 -0
- HF_README.md +19 -0
- README.md +19 -0
- artifacts/evals/README.md +2 -0
- artifacts/evals_llm/README.md +63 -0
- evaluation/CommitmentOS_Checkpoint_Eval_Colab.ipynb +247 -0
- evaluation/evaluate_llm_checkpoints.py +565 -0
- evaluation/plot_llm_checkpoints.py +133 -0
- pyproject.toml +16 -1
- server/__init__.py +1 -0
- training/CommitmentOS_Training.ipynb +116 -92
- uv.lock +100 -18
.env.example
ADDED
|
@@ -0,0 +1,29 @@
+# Copy to .env and fill in. Never commit real secrets.
+
+# --- inference.py (OpenAI-compatible HTTP API) ---
+API_BASE_URL=https://api.openai.com/v1
+MODEL_NAME=gpt-4o-mini
+# Used as API key by inference.py (or set OPENAI_API_KEY instead)
+HF_TOKEN=hf_xxx
+
+# --- CommitmentOS HTTP environment (inference + LLM eval) ---
+ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
+
+# --- evaluation/evaluate_llm_checkpoints.py (local Transformers + PEFT) ---
+# Base model on Hugging Face (must match what you trained on)
+BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
+# REQUIRED: absolute or relative path to a folder containing adapter_config.json
+# (e.g. ./training_output after train_grpo.py, or a downloaded adapter dir)
+TRAINED_MODEL_PATH=./training_output
+
+# Optional eval protocol (defaults shown)
+EVAL_SEED=42
+EVAL_MAX_STEPS=12
+EVAL_TEMPERATURE=0.0
+EVAL_TOP_P=1.0
+EVAL_MAX_NEW_TOKENS=256
+EVAL_SUCCESS_THRESHOLD=0.6
+
+# --- training/train_grpo.py --push_to_hub only ---
+# Hub repo id when using: python training/train_grpo.py ... --push_to_hub --hub_model_id your/repo
+# TRAINED_MODEL_NAME is not read by evaluate_llm_checkpoints.py; use TRAINED_MODEL_PATH.
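Review note: the optional `EVAL_*` settings in `.env.example` arrive as strings, so whatever reads them has to cast each one and supply a default. A minimal sketch of that pattern, using the variable names and default values listed in the file above; the `EvalConfig` dataclass and `load_eval_config` helper are hypothetical names for illustration and are not part of this commit.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvalConfig:
    """Hypothetical container for the EVAL_* protocol settings."""

    seed: int
    max_steps: int
    temperature: float
    top_p: float
    max_new_tokens: int
    success_threshold: float


def load_eval_config(env: Optional[Mapping[str, str]] = None) -> EvalConfig:
    """Read EVAL_* overrides, falling back to the defaults shown in .env.example."""
    env = os.environ if env is None else env
    return EvalConfig(
        seed=int(env.get("EVAL_SEED", "42")),
        max_steps=int(env.get("EVAL_MAX_STEPS", "12")),
        temperature=float(env.get("EVAL_TEMPERATURE", "0.0")),
        top_p=float(env.get("EVAL_TOP_P", "1.0")),
        max_new_tokens=int(env.get("EVAL_MAX_NEW_TOKENS", "256")),
        success_threshold=float(env.get("EVAL_SUCCESS_THRESHOLD", "0.6")),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
print(load_eval_config({"EVAL_MAX_STEPS": "20"}))
```

Taking the mapping as a parameter makes the defaults easy to check without touching the process environment.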
.gitignore
CHANGED
|
@@ -11,3 +11,6 @@ build/
|
|
 .ruff_cache/
 *.log
 .DS_Store
+
+# Local GRPO / LoRA output (large; do not commit)
+training_output/
HF_README.md
CHANGED
|
@@ -91,3 +91,22 @@ Headline metrics (`summary.json`):
|
|
 - Mean reward: **0.5427 -> 0.9777** (**+0.4350**)
 - Success rate: **0.3333 -> 1.0000** (**+0.6667**)
 - Median per-task reward delta: **+0.4200**
+
+For true model-learning proof (pre-RL checkpoint vs post-RL checkpoint),
+run:
+
+```bash
+# From cloned repo (core deps + torch/transformers/peft/… via optional extra):
+pip install -e ".[llm-eval]"
+export BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
+export TRAINED_MODEL_PATH=/content/commitment_os/training_output
+export ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
+python3 evaluation/evaluate_llm_checkpoints.py
+python3 evaluation/plot_llm_checkpoints.py
+```
+
+Artifacts are written to `artifacts/evals_llm/`.
+
+**Published LLM run (bundle on Drive):** success **46.7% → 60.0%** at reward threshold **0.6**; mean reward ~flat; gains concentrated on **hard** tasks. Traces: `artifacts/evals_llm/*.json` in the folder below.
+
+**Pretrained adapter + LLM eval artifacts (Google Drive):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing) — download `training_output/` and set `TRAINED_MODEL_PATH` accordingly; full `gdown` notes are in the GitHub `README.md`.
README.md
CHANGED
|
@@ -91,3 +91,22 @@ Headline metrics (`summary.json`):
|
|
 - Mean reward: **0.5427 -> 0.9777** (**+0.4350**)
 - Success rate: **0.3333 -> 1.0000** (**+0.6667**)
 - Median per-task reward delta: **+0.4200**
+
+For true model-learning proof (pre-RL checkpoint vs post-RL checkpoint),
+run:
+
+```bash
+# From cloned repo (core deps + torch/transformers/peft/… via optional extra):
+pip install -e ".[llm-eval]"
+export BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
+export TRAINED_MODEL_PATH=/content/commitment_os/training_output
+export ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
+python3 evaluation/evaluate_llm_checkpoints.py
+python3 evaluation/plot_llm_checkpoints.py
+```
+
+Artifacts are written to `artifacts/evals_llm/`.
+
+**Published LLM run (bundle on Drive):** success **46.7% → 60.0%** at reward threshold **0.6**; mean reward ~flat; gains concentrated on **hard** tasks. Traces: `artifacts/evals_llm/*.json` in the folder below.
+
+**Pretrained adapter + LLM eval artifacts (Google Drive):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing) — download `training_output/` and set `TRAINED_MODEL_PATH` accordingly; full `gdown` notes are in the GitHub `README.md`.
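Review note: the success percentages quoted in the README depend on the reward threshold (`EVAL_SUCCESS_THRESHOLD`, default `0.6`). A minimal sketch of how a success rate at a threshold could be computed from per-task rewards. The reward values below are made up for illustration and are not the published results, and counting a reward equal to the threshold as a success (`>=`) is an assumption, since the evaluator's comparison is not visible in this part of the diff.

```python
def success_rate(rewards: list[float], threshold: float = 0.6) -> float:
    """Fraction of tasks whose reward reaches the threshold (assumes >=)."""
    if not rewards:
        return 0.0
    successes = sum(1 for reward in rewards if reward >= threshold)
    return successes / len(rewards)


# Illustrative per-task rewards for 15 tasks (hypothetical, not real results).
baseline = [0.9, 0.8, 0.7, 0.7, 0.65, 0.6, 0.6] + [0.3] * 8  # 7 of 15 reach 0.6
trained = [0.9, 0.8, 0.7, 0.7, 0.65, 0.6, 0.6, 0.75, 0.62] + [0.3] * 6  # 9 of 15

print(f"{success_rate(baseline):.1%} -> {success_rate(trained):.1%}")
```

With 15 tasks, 7 and 9 successes give 46.7% and 60.0%. That is plain arithmetic, not a statement about which tasks passed in the published run.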
artifacts/evals/README.md
CHANGED
|
@@ -2,6 +2,8 @@
|
|
 
 This folder contains deterministic baseline-vs-trained-style evaluation outputs for all 15 CommitmentOS tasks.
 
+This is **not** the same as the real LLM checkpoint comparison; see root **README** section **B) True LLM Learning Eval** and `artifacts/evals_llm/`.
+
 ## Files
 
 - `eval_protocol.json`: fixed protocol (task set, seed, max steps, decode config)
artifacts/evals_llm/README.md
ADDED
|
@@ -0,0 +1,63 @@
+# True LLM Learning Evaluation (Pre-RL vs Post-RL)
+
+This folder is for checkpoint-vs-checkpoint evidence:
+
+- pre-RL base model
+- post-RL trained checkpoint
+
+Both are evaluated with an identical protocol.
+
+## Required environment variables
+
+- `BASELINE_MODEL_NAME`
+- `TRAINED_MODEL_PATH` (local directory with `adapter_config.json`)
+- `ENV_BASE_URL` (CommitmentOS HTTP API)
+
+Optional:
+
+- `HF_TOKEN` (gated Hub models / rate limits)
+
+Optional protocol overrides:
+
+- `EVAL_SEED` (default: `42`)
+- `EVAL_MAX_STEPS` (default: `12`)
+- `EVAL_TEMPERATURE` (default: `0.0`)
+- `EVAL_TOP_P` (default: `1.0`)
+- `EVAL_MAX_NEW_TOKENS` (default: `256`)
+- `EVAL_SUCCESS_THRESHOLD` (default: `0.6`)
+
+## Run
+
+```bash
+cd commitment_os
+pip install -e ".[llm-eval]"
+python3 evaluation/evaluate_llm_checkpoints.py
+python3 evaluation/plot_llm_checkpoints.py
+```
+
+The evaluator prints one line per task (`[eval …] task i/n`) so long Colab runs do not look frozen.
+
+## After Colab
+
+Zip weights + artifacts for download (paths assume `/content/commitment_os`):
+
+```bash
+cd /content/commitment_os && zip -r /content/commitment_os_bundle.zip training_output artifacts/evals_llm
+```
+
+Or copy `training_output/` and `artifacts/evals_llm/` to Google Drive if the zip is too large for the browser.
+
+These bundles are **not** checked into git (clone speed + history). A **~330MB** zip (weights + this folder) is a normal size: publish it as a **GitHub Release** asset, **HF Hub**, or **Google Drive**.
+
+**Drive (weights + this folder):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing) — after download you should have `artifacts/evals_llm/` (this layout) next to `training_output/`. See root **README** for `gdown` / `TRAINED_MODEL_PATH` notes.
+
+## Expected outputs
+
+- `llm_eval_protocol.json`
+- `baseline_llm_eval.json`
+- `trained_llm_eval.json`
+- `llm_comparison.csv`
+- `llm_summary.json`
+- `llm_case_study_hard_015.md`
+- `llm_reward_by_task.svg`
+- `llm_violations_before_after.svg`
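Review note: the "Expected outputs" list in `artifacts/evals_llm/README.md` doubles as a checklist after a run or after unpacking the Drive bundle. A minimal sketch of such a check, assuming the eight file names listed there; `missing_artifacts` is a hypothetical helper, not something this commit provides.

```python
from pathlib import Path

# File names copied from the "Expected outputs" list in artifacts/evals_llm/README.md.
EXPECTED_ARTIFACTS = (
    "llm_eval_protocol.json",
    "baseline_llm_eval.json",
    "trained_llm_eval.json",
    "llm_comparison.csv",
    "llm_summary.json",
    "llm_case_study_hard_015.md",
    "llm_reward_by_task.svg",
    "llm_violations_before_after.svg",
)


def missing_artifacts(artifact_dir: Path) -> list[str]:
    """Return the expected file names that are not present in artifact_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (artifact_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(Path("artifacts/evals_llm"))
    print("all artifacts present" if not missing else f"missing: {missing}")
```

An empty result means both the evaluation and plotting scripts wrote everything the README promises.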
evaluation/CommitmentOS_Checkpoint_Eval_Colab.ipynb
ADDED
|
@@ -0,0 +1,247 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# CommitmentOS Checkpoint Evaluation (Colab)\n",
+    "\n",
+    "This notebook compares a base model against a locally saved LoRA-trained checkpoint on the CommitmentOS environment.\n",
+    "\n",
+    "It uses:\n",
+    "- `BASELINE_MODEL_NAME` from Hugging Face\n",
+    "- `TRAINED_MODEL_PATH` from disk in Colab\n",
+    "- the existing `evaluation/evaluate_llm_checkpoints.py` script\n",
+    "\n",
+    "By default the notebook evaluates against the hosted CommitmentOS environment on Hugging Face Space. An optional local-server cell is included below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d43c692d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip -q install --upgrade pip\n",
+    "!pip -q install transformers peft accelerate torch sentencepiece fastapi uvicorn requests python-dotenv pydantic \"openenv-core>=0.2.0\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!git clone https://github.com/Jayant2304/commitment_os.git\n",
+    "%cd commitment_os"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Configure Paths\n",
+    "\n",
+    "Set the base model ID and the local adapter/checkpoint path. Change `TRAINED_MODEL_PATH` to the folder you actually want to evaluate.\n",
+    "\n",
+    "If the base model is gated, set `HF_TOKEN` as well."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Colab: load Hugging Face token from Secrets (key must be exactly HF_TOKEN)\n",
+    "try:\n",
+    "    from google.colab import userdata\n",
+    "\n",
+    "    os.environ[\"HF_TOKEN\"] = userdata.get(\"HF_TOKEN\")\n",
+    "    print(\"HF_TOKEN loaded from Colab secrets\")\n",
+    "except ImportError:\n",
+    "    print(\"Not on Colab; set HF_TOKEN in the shell or .env if downloads fail.\")\n",
+    "except Exception as exc:\n",
+    "    print(\"Could not load HF_TOKEN from secrets:\", exc)\n",
+    "\n",
+    "os.environ[\"BASELINE_MODEL_NAME\"] = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
+    "os.environ[\"TRAINED_MODEL_PATH\"] = \"/content/commitment_os/training_output\"\n",
+    "os.environ[\"ENV_BASE_URL\"] = \"https://jayant2304-commitment-os.hf.space\"\n",
+    "\n",
+    "# Optional for gated base models:\n",
+    "# os.environ[\"HF_TOKEN\"] = \"hf_xxx\"\n",
+    "\n",
+    "# Optional eval overrides:\n",
+    "os.environ[\"EVAL_SEED\"] = \"42\"\n",
+    "os.environ[\"EVAL_MAX_STEPS\"] = \"12\"\n",
+    "os.environ[\"EVAL_TEMPERATURE\"] = \"0.0\"\n",
+    "os.environ[\"EVAL_TOP_P\"] = \"1.0\"\n",
+    "os.environ[\"EVAL_MAX_NEW_TOKENS\"] = \"256\"\n",
+    "os.environ[\"EVAL_SUCCESS_THRESHOLD\"] = \"0.6\"\n",
+    "\n",
+    "for key in [\n",
+    "    \"BASELINE_MODEL_NAME\",\n",
+    "    \"TRAINED_MODEL_PATH\",\n",
+    "    \"ENV_BASE_URL\",\n",
+    "    \"EVAL_SEED\",\n",
+    "    \"EVAL_MAX_STEPS\",\n",
+    "    \"EVAL_TEMPERATURE\",\n",
+    "    \"EVAL_TOP_P\",\n",
+    "    \"EVAL_MAX_NEW_TOKENS\",\n",
+    "    \"EVAL_SUCCESS_THRESHOLD\",\n",
+    "]:\n",
+    "    print(f\"{key}={os.environ[key]}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "\n",
+    "trained_path = Path(os.environ[\"TRAINED_MODEL_PATH\"])\n",
+    "print(\"Checkpoint exists:\", trained_path.exists())\n",
+    "if trained_path.exists():\n",
+    "    print(\"Checkpoint contents:\")\n",
+    "    for item in sorted(trained_path.iterdir()):\n",
+    "        print(\" -\", item.name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Optional: Run CommitmentOS Locally Instead Of HF Space\n",
+    "\n",
+    "Only run this if you want evaluation against a local server inside Colab. Otherwise skip this section and keep `ENV_BASE_URL` pointed at the hosted Space."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional local server setup\n",
+    "# import os\n",
+    "# os.environ[\"ENV_BASE_URL\"] = \"http://127.0.0.1:7860\"\n",
+    "# !nohup python -m uvicorn server.app:app --host 0.0.0.0 --port 7860 >/tmp/commitmentos.log 2>&1 &\n",
+    "# !sleep 5\n",
+    "# !curl -s http://127.0.0.1:7860/health"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run Checkpoint Comparison"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!python evaluation/evaluate_llm_checkpoints.py"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!python evaluation/plot_llm_checkpoints.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Inspect Artifacts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "from pathlib import Path\n",
+    "\n",
+    "artifact_dir = Path(\"artifacts/evals_llm\")\n",
+    "print(sorted(p.name for p in artifact_dir.iterdir()))\n",
+    "\n",
+    "summary = json.loads((artifact_dir / \"llm_summary.json\").read_text())\n",
+    "summary"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "pd.read_csv(\"artifacts/evals_llm/llm_comparison.csv\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import SVG, display\n",
+    "\n",
+    "display(SVG(filename=\"artifacts/evals_llm/llm_reward_by_task.svg\"))\n",
+    "display(SVG(filename=\"artifacts/evals_llm/llm_violations_before_after.svg\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9e8a35c5",
+   "metadata": {},
+   "source": [
+    "## Backup results (zip and download)\n",
+    "\n",
+    "Run after eval/plot finish. Large runs: copy `training_output` to Google Drive instead of browser download.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b4a5bcc7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cd /content/commitment_os && du -sh training_output artifacts/evals_llm 2>/dev/null || true\n",
+    "!cd /content/commitment_os && zip -r /content/commitment_os_bundle.zip training_output artifacts/evals_llm\n",
+    "from google.colab import files\n",
+    "\n",
+    "files.download(\"/content/commitment_os_bundle.zip\")\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.x"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
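Review note: the notebook's optional local-server cell waits a fixed `sleep 5` before calling `/health`. One alternative is to poll until the server answers. A minimal sketch under stated assumptions: the `/health` path and port `7860` are taken from that cell, while `http_ok` and `wait_until_healthy` are hypothetical helpers written here with the standard library only.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage (assumes the local server from the optional cell has been started):
# wait_until_healthy("http://127.0.0.1:7860/health", attempts=30, delay=1.0)
```

The `probe` parameter exists only so the loop can be exercised without a running server.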
evaluation/evaluate_llm_checkpoints.py
ADDED
|
@@ -0,0 +1,565 @@
+"""Evaluate base vs RL-trained LLM checkpoints on CommitmentOS.
+
+This script runs the SAME protocol for two local-loading model setups:
+- baseline model loaded from a Hugging Face model ID
+- trained model loaded from a local LoRA adapter path on top of that base model
+
+It writes judge-friendly artifacts under artifacts/evals_llm/.
+"""
+
+from __future__ import annotations
+
+import csv
+import gc
+import json
+import os
+import sys
+import uuid
+from pathlib import Path
+from statistics import mean, median
+from typing import Any
+
+import requests
+from dotenv import load_dotenv
+from pydantic import ValidationError
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from models import CommitmentAction
+
+ARTIFACT_DIR = Path("artifacts/evals_llm")
+ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
+
+load_dotenv()
+
+ENV_BASE_URL = os.getenv("ENV_BASE_URL", "https://jayant2304-commitment-os.hf.space")
+HF_TOKEN = os.getenv("HF_TOKEN", "").strip() or None
+
+BASELINE_MODEL = os.getenv("BASELINE_MODEL_NAME", "").strip()
+TRAINED_MODEL_PATH = os.getenv("TRAINED_MODEL_PATH", "").strip()
+
+EVAL_SEED = int(os.getenv("EVAL_SEED", "42"))
+MAX_STEPS = int(os.getenv("EVAL_MAX_STEPS", "12"))
+TEMPERATURE = float(os.getenv("EVAL_TEMPERATURE", "0.0"))
+TOP_P = float(os.getenv("EVAL_TOP_P", "1.0"))
+MAX_NEW_TOKENS = int(os.getenv("EVAL_MAX_NEW_TOKENS", "256"))
+SUCCESS_THRESHOLD = float(os.getenv("EVAL_SUCCESS_THRESHOLD", "0.6"))
+
+SYSTEM_PROMPT = """You are an expert executive assistant AI. You manage calendars, emails, and dining reservations.
+
+You will be given a scenario briefing describing a situation with calendar conflicts, emails, or planning tasks.
+
+For each turn, you must respond with EXACTLY ONE JSON object choosing a tool to call:
+
+Available tools:
+- {"action_type": "view_calendar", "date": "2026-04-25"}
+- {"action_type": "check_availability", "person": "Client_Jones"}
+- {"action_type": "search_restaurants", "cuisine": "Italian", "max_price": 50, "dietary": "vegetarian", "max_distance_miles": 3.0, "near_airport": false}
+- {"action_type": "schedule_meeting", "title": "Demo", "date": "2026-04-25", "time": "14:00", "duration_min": 60, "participants": ["Client_Jones"], "location": "Room A"}
+- {"action_type": "reschedule_event", "event_id": "evt_1", "new_time": "15:00"}
+- {"action_type": "cancel_event", "event_id": "evt_1"}
+- {"action_type": "send_email", "to": "VP_Chen", "subject": "Meeting update", "body": "Hi, I need to reschedule..."}
+- {"action_type": "book_restaurant", "restaurant_name": "Sky Lounge"}
+- {"action_type": "submit_plan"}
+
+IMPORTANT RULES:
+1. Respond with ONLY a JSON object, no markdown, no explanation
+2. Handle higher-priority items before lower-priority ones
+3. When cancelling or rescheduling commitments, ALWAYS send an email to affected parties BEFORE submitting
+4. Call submit_plan when you have resolved all issues
+5. Never silently drop a commitment — always notify the affected person"""
+
+
+def _require_env() -> None:
+    if not BASELINE_MODEL:
+        raise RuntimeError("Set BASELINE_MODEL_NAME")
+    if not TRAINED_MODEL_PATH:
+        raise RuntimeError("Set TRAINED_MODEL_PATH")
+    if not Path(TRAINED_MODEL_PATH).exists():
+        raise RuntimeError(f"TRAINED_MODEL_PATH does not exist: {TRAINED_MODEL_PATH}")
+
+
+def _load_runtime_deps() -> tuple[Any, Any, Any, Any]:
+    try:
+        import torch
+        from peft import PeftModel
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+    except ImportError as exc:
+        raise RuntimeError(
+            "Missing evaluation dependencies. From the repo root: "
+            'pip install -e ".[llm-eval]"'
+            " (or: pip install transformers peft accelerate torch sentencepiece)"
+        ) from exc
+    return torch, AutoModelForCausalLM, AutoTokenizer, PeftModel
+
+
+def _get_task_ids() -> list[str]:
+    resp = requests.get(f"{ENV_BASE_URL}/tasks", timeout=30)
+    resp.raise_for_status()
+    data = resp.json()
+    task_ids: list[str] = []
+    for difficulty in ("easy", "medium", "hard"):
+        task_ids.extend(data.get(difficulty, []))
+    return task_ids
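Review note: `_get_task_ids` implies that `GET /tasks` returns a JSON object keyed by difficulty, and that the evaluator visits tasks in a fixed easy, medium, hard order regardless of key order in the response. A minimal offline sketch of that flattening step; the sample payload and its task IDs are hypothetical and only illustrate the shape the code appears to expect.

```python
from typing import Any

DIFFICULTY_ORDER = ("easy", "medium", "hard")


def flatten_task_ids(payload: dict[str, Any]) -> list[str]:
    """Concatenate task IDs in easy -> medium -> hard order; absent keys contribute nothing."""
    task_ids: list[str] = []
    for difficulty in DIFFICULTY_ORDER:
        task_ids.extend(payload.get(difficulty, []))
    return task_ids


# Hypothetical response body: keys arrive out of order and "medium" is absent.
sample = {"hard": ["hard_015"], "easy": ["easy_001", "easy_002"]}
print(flatten_task_ids(sample))  # ['easy_001', 'easy_002', 'hard_015']
```

Because absent difficulty keys are skipped silently, an unexpected response shape would produce an empty task list rather than an error.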
+
+
+def _parse_action(text: str) -> dict[str, Any]:
+    text = (text or "").strip()
+    if text.startswith("```"):
+        lines = text.split("\n")
+        text = "\n".join(lines[1:-1]) if len(lines) > 2 else lines[0]
+    try:
+        action = json.loads(text)
+        if isinstance(action, dict) and action.get("action_type"):
+            return action
+    except json.JSONDecodeError:
+        pass
+    return {"action_type": "submit_plan"}
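Review note: `_parse_action` accepts bare JSON or JSON wrapped in a Markdown code fence, and turns anything else into `submit_plan`. The sketch below is a standalone copy of that logic (renamed `parse_action`, with the fence marker built at runtime) so the three cases can be run without the repo; the sample model outputs are made up.

```python
import json
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker, built at runtime


def parse_action(text: str) -> dict[str, Any]:
    """Standalone copy of the parsing logic shown in the diff above."""
    text = (text or "").strip()
    if text.startswith(FENCE):
        lines = text.split("\n")
        # Drop the opening and closing fence lines when both are present.
        text = "\n".join(lines[1:-1]) if len(lines) > 2 else lines[0]
    try:
        action = json.loads(text)
        if isinstance(action, dict) and action.get("action_type"):
            return action
    except json.JSONDecodeError:
        pass
    return {"action_type": "submit_plan"}


bare = '{"action_type": "cancel_event", "event_id": "evt_1"}'
fenced = f"{FENCE}json\n{bare}\n{FENCE}"
chatty = f"Sure, here is my action: {bare}"

print(parse_action(bare))    # parsed as-is
print(parse_action(fenced))  # fence lines stripped, then parsed
print(parse_action(chatty))  # not valid JSON, so it falls back to submit_plan
```

The third case matters when reading results: a reply that mixes prose with JSON is not recovered, it is replaced by `submit_plan`.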
+
+
+def _normalize_action(action: dict[str, Any]) -> dict[str, Any]:
+    allowed_fields = set(CommitmentAction.model_fields.keys())
+    payload = {k: v for k, v in action.items() if k in allowed_fields}
+
+    if isinstance(payload.get("participants"), str):
+        participants = [
+            item.strip()
+            for item in payload["participants"].split(",")
+            if item.strip()
+        ]
+        payload["participants"] = participants
+
+    if "duration_min" in payload:
+        try:
+            payload["duration_min"] = int(payload["duration_min"])
+        except (TypeError, ValueError):
+            payload.pop("duration_min", None)
+
+    if "max_price" in payload:
+        try:
+            payload["max_price"] = int(payload["max_price"])
+        except (TypeError, ValueError):
+            payload.pop("max_price", None)
+
+    if "max_distance_miles" in payload:
+        try:
+            payload["max_distance_miles"] = float(payload["max_distance_miles"])
+        except (TypeError, ValueError):
+            payload.pop("max_distance_miles", None)
+
+    if isinstance(payload.get("near_airport"), str):
+        payload["near_airport"] = payload["near_airport"].strip().lower() in {"true", "1", "yes"}
+
+    try:
+        return CommitmentAction.model_validate(payload).model_dump()
+    except ValidationError:
+        return CommitmentAction(action_type="submit_plan").model_dump()
+
+
+def _dtype_and_device(torch_mod: Any) -> tuple[Any, str | None]:
+    if not torch_mod.cuda.is_available():
+        return torch_mod.float32, None
+    if torch_mod.cuda.is_bf16_supported():
+        return torch_mod.bfloat16, "auto"
+    return torch_mod.float16, "auto"
+
+
+def _path_has_tokenizer_files(path: Path) -> bool:
+    tokenizer_files = {
+        "tokenizer.json",
+        "tokenizer_config.json",
+        "special_tokens_map.json",
+        "vocab.json",
+        "merges.txt",
+        "spiece.model",
+    }
+    return any((path / file_name).exists() for file_name in tokenizer_files)
+
+
+class LocalChatModel:
+    def __init__(
+        self,
+        *,
+        display_name: str,
+        tokenizer: Any,
+        model: Any,
+        torch_mod: Any,
+    ) -> None:
+        self.display_name = display_name
+        self.tokenizer = tokenizer
+        self.model = model
+        self.torch = torch_mod
+
+    def generate_action(self, messages: list[dict[str, str]]) -> tuple[dict[str, Any], str]:
+        prompt = self.tokenizer.apply_chat_template(
+            messages,
+            tokenize=False,
+            add_generation_prompt=True,
+        )
+        inputs = self.tokenizer(prompt, return_tensors="pt")
+        target_device = next(self.model.parameters()).device
+        inputs = {k: v.to(target_device) for k, v in inputs.items()}
+
+        generation_kwargs: dict[str, Any] = {
+            "max_new_tokens": MAX_NEW_TOKENS,
+            "pad_token_id": self.tokenizer.pad_token_id,
+            "eos_token_id": self.tokenizer.eos_token_id,
+        }
+        if TEMPERATURE > 0:
+            generation_kwargs.update(
+                {
+                    "do_sample": True,
+                    "temperature": TEMPERATURE,
+                    "top_p": TOP_P,
+                }
+            )
+        else:
+            generation_kwargs["do_sample"] = False
+
+        with self.torch.inference_mode():
+            output_ids = self.model.generate(**inputs, **generation_kwargs)
+
+        prompt_len = inputs["input_ids"].shape[-1]
+        new_tokens = output_ids[0][prompt_len:]
+        raw = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+        return _normalize_action(_parse_action(raw)), raw
| 228 |
+
|
| 229 |
+
def unload(self) -> None:
|
| 230 |
+
del self.model
|
| 231 |
+
gc.collect()
|
| 232 |
+
if self.torch.cuda.is_available():
|
| 233 |
+
self.torch.cuda.empty_cache()
|
| 234 |
+
|
| 235 |
+
|
| 236 |
+
def _load_tokenizer(AutoTokenizer: Any, model_or_path: str | Path) -> Any:
|
| 237 |
+
tokenizer = AutoTokenizer.from_pretrained(
|
| 238 |
+
model_or_path,
|
| 239 |
+
trust_remote_code=True,
|
| 240 |
+
token=HF_TOKEN,
|
| 241 |
+
)
|
| 242 |
+
if tokenizer.pad_token is None:
|
| 243 |
+
tokenizer.pad_token = tokenizer.eos_token
|
| 244 |
+
return tokenizer
|
| 245 |
+
|
| 246 |
+
|
| 247 |
+
def load_baseline_model() -> LocalChatModel:
|
| 248 |
+
torch_mod, AutoModelForCausalLM, AutoTokenizer, _ = _load_runtime_deps()
|
| 249 |
+
dtype, device_map = _dtype_and_device(torch_mod)
|
| 250 |
+
tokenizer = _load_tokenizer(AutoTokenizer, BASELINE_MODEL)
|
| 251 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 252 |
+
BASELINE_MODEL,
|
| 253 |
+
trust_remote_code=True,
|
| 254 |
+
token=HF_TOKEN,
|
| 255 |
+
dtype=dtype,
|
| 256 |
+
device_map=device_map,
|
| 257 |
+
)
|
| 258 |
+
model.eval()
|
| 259 |
+
return LocalChatModel(
|
| 260 |
+
display_name=BASELINE_MODEL,
|
| 261 |
+
tokenizer=tokenizer,
|
| 262 |
+
model=model,
|
| 263 |
+
torch_mod=torch_mod,
|
| 264 |
+
)
|
| 265 |
+
|
| 266 |
+
|
| 267 |
+
def load_trained_model() -> LocalChatModel:
|
| 268 |
+
torch_mod, AutoModelForCausalLM, AutoTokenizer, PeftModel = _load_runtime_deps()
|
| 269 |
+
dtype, device_map = _dtype_and_device(torch_mod)
|
| 270 |
+
adapter_path = Path(TRAINED_MODEL_PATH)
|
| 271 |
+
tokenizer_source: str | Path = adapter_path if _path_has_tokenizer_files(adapter_path) else BASELINE_MODEL
|
| 272 |
+
tokenizer = _load_tokenizer(AutoTokenizer, tokenizer_source)
|
| 273 |
+
|
| 274 |
+
base_model = AutoModelForCausalLM.from_pretrained(
|
| 275 |
+
BASELINE_MODEL,
|
| 276 |
+
trust_remote_code=True,
|
| 277 |
+
token=HF_TOKEN,
|
| 278 |
+
dtype=dtype,
|
| 279 |
+
device_map=device_map,
|
| 280 |
+
)
|
| 281 |
+
model = PeftModel.from_pretrained(base_model, adapter_path)
|
| 282 |
+
model.eval()
|
| 283 |
+
return LocalChatModel(
|
| 284 |
+
display_name=str(adapter_path),
|
| 285 |
+
tokenizer=tokenizer,
|
| 286 |
+
model=model,
|
| 287 |
+
torch_mod=torch_mod,
|
| 288 |
+
)
|
| 289 |
+
|
| 290 |
+
|
| 291 |
+
def _env_reset(task_id: str, episode_id: str) -> dict[str, Any]:
|
| 292 |
+
resp = requests.post(
|
| 293 |
+
f"{ENV_BASE_URL}/reset",
|
| 294 |
+
params={"task_id": task_id, "seed": EVAL_SEED, "episode_id": episode_id},
|
| 295 |
+
timeout=30,
|
| 296 |
+
)
|
| 297 |
+
resp.raise_for_status()
|
| 298 |
+
data = resp.json()
|
| 299 |
+
return data.get("observation", data)
|
| 300 |
+
|
| 301 |
+
|
| 302 |
+
def _env_step(action: dict[str, Any], episode_id: str) -> dict[str, Any]:
|
| 303 |
+
resp = requests.post(
|
| 304 |
+
f"{ENV_BASE_URL}/step",
|
| 305 |
+
params={"episode_id": episode_id},
|
| 306 |
+
json={"action": action},
|
| 307 |
+
timeout=30,
|
| 308 |
+
)
|
| 309 |
+
if resp.status_code >= 400:
|
| 310 |
+
raise requests.HTTPError(
|
| 311 |
+
f"{resp.status_code} {resp.reason}: {resp.text}",
|
| 312 |
+
response=resp,
|
| 313 |
+
)
|
| 314 |
+
data = resp.json()
|
| 315 |
+
obs = data.get("observation", data)
|
| 316 |
+
obs["done"] = data.get("done", obs.get("done", False))
|
| 317 |
+
obs["reward"] = float(data.get("reward", obs.get("reward", 0.0)) or 0.0)
|
| 318 |
+
return obs
|
| 319 |
+
|
| 320 |
+
|
| 321 |
+
def _env_state(episode_id: str) -> dict[str, Any]:
|
| 322 |
+
resp = requests.get(f"{ENV_BASE_URL}/state", params={"episode_id": episode_id}, timeout=30)
|
| 323 |
+
resp.raise_for_status()
|
| 324 |
+
return resp.json()
|
| 325 |
+
|
| 326 |
+
|
| 327 |
+
def run_task(chat_model: LocalChatModel, task_id: str) -> dict[str, Any]:
|
| 328 |
+
safe_name = chat_model.display_name.replace("/", "-").replace(" ", "_")
|
| 329 |
+
episode_id = f"eval-{safe_name}-{task_id}-{uuid.uuid4().hex[:8]}"
|
| 330 |
+
obs = _env_reset(task_id, episode_id)
|
| 331 |
+
|
| 332 |
+
briefing = obs.get("briefing", "")
|
| 333 |
+
calendar = json.dumps(obs.get("calendar_snapshot", []), indent=2)
|
| 334 |
+
inbox = json.dumps(obs.get("inbox", []), indent=2)
|
| 335 |
+
messages: list[dict[str, str]] = [
|
| 336 |
+
{"role": "system", "content": SYSTEM_PROMPT},
|
| 337 |
+
{"role": "user", "content": f"SCENARIO: {briefing}\n\nCALENDAR:\n{calendar}\n\nINBOX:\n{inbox}\n\nWhat is your first action?"},
|
| 338 |
+
]
|
| 339 |
+
|
| 340 |
+
trace: list[dict[str, Any]] = []
|
| 341 |
+
step_num = 0
|
| 342 |
+
done = False
|
| 343 |
+
final_obs: dict[str, Any] = obs
|
| 344 |
+
|
| 345 |
+
for step_num in range(1, MAX_STEPS + 1):
|
| 346 |
+
action, raw = chat_model.generate_action(messages)
|
| 347 |
+
step_obs = _env_step(action, episode_id)
|
| 348 |
+
final_obs = step_obs
|
| 349 |
+
done = bool(step_obs.get("done", False))
|
| 350 |
+
trace.append(
|
| 351 |
+
{
|
| 352 |
+
"step": step_num,
|
| 353 |
+
"action": action,
|
| 354 |
+
"raw_model_output": raw,
|
| 355 |
+
"reward": float(step_obs.get("reward", 0.0)),
|
| 356 |
+
"done": done,
|
| 357 |
+
"tool_result": step_obs.get("tool_result", ""),
|
| 358 |
+
}
|
| 359 |
+
)
|
| 360 |
+
if done:
|
| 361 |
+
break
|
| 362 |
+
messages.append({"role": "assistant", "content": raw})
|
| 363 |
+
messages.append({"role": "user", "content": f"TOOL RESULT: {step_obs.get('tool_result', '')}\n\nWhat is your next action?"})
|
| 364 |
+
|
| 365 |
+
if not done:
|
| 366 |
+
final_obs = _env_step({"action_type": "submit_plan"}, episode_id)
|
| 367 |
+
step_num += 1
|
| 368 |
+
trace.append(
|
| 369 |
+
{
|
| 370 |
+
"step": step_num,
|
| 371 |
+
"action": {"action_type": "submit_plan"},
|
| 372 |
+
"raw_model_output": '{"action_type":"submit_plan"}',
|
| 373 |
+
"reward": float(final_obs.get("reward", 0.0)),
|
| 374 |
+
"done": True,
|
| 375 |
+
"tool_result": final_obs.get("tool_result", ""),
|
| 376 |
+
}
|
| 377 |
+
)
|
| 378 |
+
|
| 379 |
+
state = _env_state(episode_id)
|
| 380 |
+
final_reward = float(final_obs.get("reward", 0.0))
|
| 381 |
+
return {
|
| 382 |
+
"task_id": task_id,
|
| 383 |
+
"difficulty": final_obs.get("difficulty", ""),
|
| 384 |
+
"model_name": chat_model.display_name,
|
| 385 |
+
"final_reward": round(final_reward, 4),
|
| 386 |
+
"success": final_reward >= SUCCESS_THRESHOLD,
|
| 387 |
+
"steps_used": int(state.get("step_count", step_num)),
|
| 388 |
+
"violation_count": int(state.get("violation_count", 0)),
|
| 389 |
+
"reward_breakdown": final_obs.get("reward_breakdown", {}),
|
| 390 |
+
"feedback": final_obs.get("feedback", ""),
|
| 391 |
+
"trace": trace,
|
| 392 |
+
}
|
| 393 |
+
|
| 394 |
+
|
| 395 |
+
def run_model(chat_model: LocalChatModel, task_ids: list[str]) -> list[dict[str, Any]]:
|
| 396 |
+
results: list[dict[str, Any]] = []
|
| 397 |
+
n = len(task_ids)
|
| 398 |
+
label = chat_model.display_name
|
| 399 |
+
for i, task_id in enumerate(task_ids, start=1):
|
| 400 |
+
print(f"[eval {label}] task {i}/{n}: {task_id}", flush=True)
|
| 401 |
+
results.append(run_task(chat_model, task_id=task_id))
|
| 402 |
+
return results
|
| 403 |
+
|
| 404 |
+
|
| 405 |
+
def _write_json(path: Path, payload: Any) -> None:
|
| 406 |
+
path.write_text(json.dumps(payload, indent=2))
|
| 407 |
+
|
| 408 |
+
|
| 409 |
+
def write_artifacts(baseline: list[dict[str, Any]], trained: list[dict[str, Any]]) -> None:
|
| 410 |
+
by_task = {row["task_id"]: row for row in trained}
|
| 411 |
+
comparison_rows: list[dict[str, Any]] = []
|
| 412 |
+
for base in baseline:
|
| 413 |
+
tr = by_task[base["task_id"]]
|
| 414 |
+
comparison_rows.append(
|
| 415 |
+
{
|
| 416 |
+
"task_id": base["task_id"],
|
| 417 |
+
"difficulty": base["difficulty"],
|
| 418 |
+
"baseline_reward": base["final_reward"],
|
| 419 |
+
"trained_reward": tr["final_reward"],
|
| 420 |
+
"reward_delta": round(tr["final_reward"] - base["final_reward"], 4),
|
| 421 |
+
"baseline_steps": base["steps_used"],
|
| 422 |
+
"trained_steps": tr["steps_used"],
|
| 423 |
+
"step_delta": tr["steps_used"] - base["steps_used"],
|
| 424 |
+
"baseline_violations": base["violation_count"],
|
| 425 |
+
"trained_violations": tr["violation_count"],
|
| 426 |
+
"violation_delta": tr["violation_count"] - base["violation_count"],
|
| 427 |
+
"baseline_success": int(base["success"]),
|
| 428 |
+
"trained_success": int(tr["success"]),
|
| 429 |
+
}
|
| 430 |
+
)
|
| 431 |
+
|
| 432 |
+
_write_json(ARTIFACT_DIR / "baseline_llm_eval.json", baseline)
|
| 433 |
+
_write_json(ARTIFACT_DIR / "trained_llm_eval.json", trained)
|
| 434 |
+
_write_json(
|
| 435 |
+
ARTIFACT_DIR / "llm_eval_protocol.json",
|
| 436 |
+
{
|
| 437 |
+
"task_set": "easy_001..hard_015",
|
| 438 |
+
"seed": EVAL_SEED,
|
| 439 |
+
"max_steps": MAX_STEPS,
|
| 440 |
+
"decode_config": {
|
| 441 |
+
"temperature": TEMPERATURE,
|
| 442 |
+
"top_p": TOP_P,
|
| 443 |
+
"max_new_tokens": MAX_NEW_TOKENS,
|
| 444 |
+
},
|
| 445 |
+
"env_base_url": ENV_BASE_URL,
|
| 446 |
+
"baseline_model_name": BASELINE_MODEL,
|
| 447 |
+
"trained_model_path": TRAINED_MODEL_PATH,
|
| 448 |
+
"success_threshold": SUCCESS_THRESHOLD,
|
| 449 |
+
},
|
| 450 |
+
)
|
| 451 |
+
|
| 452 |
+
with (ARTIFACT_DIR / "llm_comparison.csv").open("w", newline="") as f:
|
| 453 |
+
writer = csv.DictWriter(f, fieldnames=list(comparison_rows[0].keys()))
|
| 454 |
+
writer.writeheader()
|
| 455 |
+
writer.writerows(comparison_rows)
|
| 456 |
+
|
| 457 |
+
baseline_rewards = [r["baseline_reward"] for r in comparison_rows]
|
| 458 |
+
trained_rewards = [r["trained_reward"] for r in comparison_rows]
|
| 459 |
+
reward_deltas = [r["reward_delta"] for r in comparison_rows]
|
| 460 |
+
baseline_steps = [r["baseline_steps"] for r in comparison_rows]
|
| 461 |
+
trained_steps = [r["trained_steps"] for r in comparison_rows]
|
| 462 |
+
baseline_violations = [r["baseline_violations"] for r in comparison_rows]
|
| 463 |
+
trained_violations = [r["trained_violations"] for r in comparison_rows]
|
| 464 |
+
baseline_success = [r["baseline_success"] for r in comparison_rows]
|
| 465 |
+
trained_success = [r["trained_success"] for r in comparison_rows]
|
| 466 |
+
|
| 467 |
+
summary = {
|
| 468 |
+
"task_count": len(comparison_rows),
|
| 469 |
+
"baseline_mean_reward": round(mean(baseline_rewards), 4),
|
| 470 |
+
"trained_mean_reward": round(mean(trained_rewards), 4),
|
| 471 |
+
"mean_reward_delta": round(mean(trained_rewards) - mean(baseline_rewards), 4),
|
| 472 |
+
"median_reward_delta": round(median(reward_deltas), 4),
|
| 473 |
+
"baseline_success_rate": round(mean(baseline_success), 4),
|
| 474 |
+
"trained_success_rate": round(mean(trained_success), 4),
|
| 475 |
+
"success_rate_delta": round(mean(trained_success) - mean(baseline_success), 4),
|
| 476 |
+
"baseline_mean_steps": round(mean(baseline_steps), 4),
|
| 477 |
+
"trained_mean_steps": round(mean(trained_steps), 4),
|
| 478 |
+
"step_delta": round(mean(trained_steps) - mean(baseline_steps), 4),
|
| 479 |
+
"baseline_mean_violations": round(mean(baseline_violations), 4),
|
| 480 |
+
"trained_mean_violations": round(mean(trained_violations), 4),
|
| 481 |
+
"violation_delta": round(mean(trained_violations) - mean(baseline_violations), 4),
|
| 482 |
+
"tasks_with_positive_reward_delta": sum(1 for x in reward_deltas if x > 0),
|
| 483 |
+
"tasks_with_no_reward_delta": sum(1 for x in reward_deltas if x == 0),
|
| 484 |
+
"per_difficulty": {},
|
| 485 |
+
}
|
| 486 |
+
|
| 487 |
+
for difficulty in ("easy", "medium", "hard"):
|
| 488 |
+
subset = [r for r in comparison_rows if r["difficulty"] == difficulty]
|
| 489 |
+
if not subset:
|
| 490 |
+
continue
|
| 491 |
+
summary["per_difficulty"][difficulty] = {
|
| 492 |
+
"count": len(subset),
|
| 493 |
+
"baseline_mean_reward": round(mean([r["baseline_reward"] for r in subset]), 4),
|
| 494 |
+
"trained_mean_reward": round(mean([r["trained_reward"] for r in subset]), 4),
|
| 495 |
+
"reward_delta": round(
|
| 496 |
+
mean([r["trained_reward"] for r in subset]) - mean([r["baseline_reward"] for r in subset]),
|
| 497 |
+
4,
|
| 498 |
+
),
|
| 499 |
+
"baseline_mean_steps": round(mean([r["baseline_steps"] for r in subset]), 4),
|
| 500 |
+
"trained_mean_steps": round(mean([r["trained_steps"] for r in subset]), 4),
|
| 501 |
+
"step_delta": round(
|
| 502 |
+
mean([r["trained_steps"] for r in subset]) - mean([r["baseline_steps"] for r in subset]),
|
| 503 |
+
4,
|
| 504 |
+
),
|
| 505 |
+
}
|
| 506 |
+
|
| 507 |
+
_write_json(ARTIFACT_DIR / "llm_summary.json", summary)
|
| 508 |
+
|
| 509 |
+
target_task = "hard_015"
|
| 510 |
+
base_case = next((r for r in baseline if r["task_id"] == target_task), None)
|
| 511 |
+
tr_case = next((r for r in trained if r["task_id"] == target_task), None)
|
| 512 |
+
if base_case and tr_case:
|
| 513 |
+
case_study = f"""# LLM Case Study: {target_task}
|
| 514 |
+
|
| 515 |
+
## Baseline model ({BASELINE_MODEL})
|
| 516 |
+
- Reward: {base_case['final_reward']:.4f}
|
| 517 |
+
- Steps: {base_case['steps_used']}
|
| 518 |
+
- Violations: {base_case['violation_count']}
|
| 519 |
+
- Feedback: {base_case['feedback']}
|
| 520 |
+
|
| 521 |
+
## Trained model ({TRAINED_MODEL_PATH})
|
| 522 |
+
- Reward: {tr_case['final_reward']:.4f}
|
| 523 |
+
- Steps: {tr_case['steps_used']}
|
| 524 |
+
- Violations: {tr_case['violation_count']}
|
| 525 |
+
- Feedback: {tr_case['feedback']}
|
| 526 |
+
"""
|
| 527 |
+
(ARTIFACT_DIR / "llm_case_study_hard_015.md").write_text(case_study)
|
| 528 |
+
|
| 529 |
+
|
| 530 |
+
def _print_summary() -> None:
|
| 531 |
+
summary_path = ARTIFACT_DIR / "llm_summary.json"
|
| 532 |
+
summary = json.loads(summary_path.read_text())
|
| 533 |
+
print("\nCheckpoint comparison summary")
|
| 534 |
+
print(f"Baseline mean reward: {summary['baseline_mean_reward']:.4f}")
|
| 535 |
+
print(f"Trained mean reward: {summary['trained_mean_reward']:.4f}")
|
| 536 |
+
print(f"Reward delta: {summary['mean_reward_delta']:+.4f}")
|
| 537 |
+
print(f"Baseline success: {summary['baseline_success_rate']:.4f}")
|
| 538 |
+
print(f"Trained success: {summary['trained_success_rate']:.4f}")
|
| 539 |
+
print(f"Success delta: {summary['success_rate_delta']:+.4f}")
|
| 540 |
+
|
| 541 |
+
|
| 542 |
+
def main() -> None:
|
| 543 |
+
_require_env()
|
| 544 |
+
task_ids = _get_task_ids()
|
| 545 |
+
print(f"CommitmentOS LLM eval: {len(task_ids)} tasks, env={ENV_BASE_URL}", flush=True)
|
| 546 |
+
|
| 547 |
+
print("Loading baseline model…", flush=True)
|
| 548 |
+
baseline_model = load_baseline_model()
|
| 549 |
+
print("Running baseline…", flush=True)
|
| 550 |
+
baseline_results = run_model(baseline_model, task_ids)
|
| 551 |
+
baseline_model.unload()
|
| 552 |
+
|
| 553 |
+
print("Loading trained adapter…", flush=True)
|
| 554 |
+
trained_model = load_trained_model()
|
| 555 |
+
print("Running trained…", flush=True)
|
| 556 |
+
trained_results = run_model(trained_model, task_ids)
|
| 557 |
+
trained_model.unload()
|
| 558 |
+
|
| 559 |
+
write_artifacts(baseline_results, trained_results)
|
| 560 |
+
print("Wrote LLM checkpoint artifacts to", ARTIFACT_DIR)
|
| 561 |
+
_print_summary()
|
| 562 |
+
|
| 563 |
+
|
| 564 |
+
if __name__ == "__main__":
|
| 565 |
+
main()
|
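Note on the action normalization in `evaluate_llm_checkpoints.py`: before validating a model-generated action, the script coerces loosely typed fields (comma-separated participants, numeric strings, string booleans). The sketch below isolates only those coercion steps so they can be tried without the server package. It is a minimal illustration, not the script's own function: `coerce_action_fields` and the sample action are hypothetical, and the Pydantic validation against `CommitmentAction` (with its `submit_plan` fallback) is deliberately left out.

```python
from typing import Any


def coerce_action_fields(action: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the coercion steps only (no schema validation)."""
    payload = dict(action)

    # A comma-separated string becomes a list of trimmed, non-empty names.
    if isinstance(payload.get("participants"), str):
        payload["participants"] = [
            item.strip() for item in payload["participants"].split(",") if item.strip()
        ]

    # Numeric fields are cast; values that cannot be cast are dropped.
    for key, cast in (
        ("duration_min", int),
        ("max_price", int),
        ("max_distance_miles", float),
    ):
        if key in payload:
            try:
                payload[key] = cast(payload[key])
            except (TypeError, ValueError):
                payload.pop(key, None)

    # String booleans are mapped through a small set of truthy spellings.
    if isinstance(payload.get("near_airport"), str):
        payload["near_airport"] = payload["near_airport"].strip().lower() in {"true", "1", "yes"}

    return payload


example = coerce_action_fields(
    {
        "action_type": "schedule_meeting",  # hypothetical action name
        "participants": "alice, bob, ,carol",
        "duration_min": "45",
        "max_price": "cheap",  # not castable to int, so the key is dropped
        "near_airport": " Yes ",
    }
)
print(example)
```

Dropping an uncastable field rather than raising keeps a single malformed value from discarding an otherwise usable action; whether the remaining payload is acceptable is still decided by the schema validation step in the script.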
evaluation/plot_llm_checkpoints.py
ADDED
|
@@ -0,0 +1,133 @@
|
| 1 |
+
"""Render SVG visuals for LLM checkpoint comparison."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import csv
|
| 6 |
+
from pathlib import Path
|
| 7 |
+
|
| 8 |
+
ARTIFACT_DIR = Path("artifacts/evals_llm")
|
| 9 |
+
COMPARISON_CSV = ARTIFACT_DIR / "llm_comparison.csv"
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def _svg_header(width: int, height: int) -> list[str]:
|
| 13 |
+
return [
|
| 14 |
+
f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" viewBox="0 0 {width} {height}">',
|
| 15 |
+
'<rect width="100%" height="100%" fill="#FFFFFF"/>',
|
| 16 |
+
]
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
def _svg_footer() -> list[str]:
|
| 20 |
+
return ["</svg>"]
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def _rows() -> list[dict[str, str]]:
|
| 24 |
+
with COMPARISON_CSV.open() as f:
|
| 25 |
+
return list(csv.DictReader(f))
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
def plot_reward(rows: list[dict[str, str]]) -> None:
|
| 29 |
+
tasks = [r["task_id"] for r in rows]
|
| 30 |
+
base = [float(r["baseline_reward"]) for r in rows]
|
| 31 |
+
trained = [float(r["trained_reward"]) for r in rows]
|
| 32 |
+
|
| 33 |
+
width, height = 1360, 520
|
| 34 |
+
left, right, top, bottom = 80, 40, 70, 110
|
| 35 |
+
plot_w = width - left - right
|
| 36 |
+
plot_h = height - top - bottom
|
| 37 |
+
group_w = plot_w / max(len(tasks), 1)
|
| 38 |
+
bar_w = max(group_w * 0.32, 10)
|
| 39 |
+
|
| 40 |
+
lines = _svg_header(width, height)
|
| 41 |
+
lines.append('<text x="80" y="35" font-size="22" font-family="Arial" fill="#111827">Base vs Trained LLM Reward by Task</text>')
|
| 42 |
+
lines.append(f'<line x1="{left}" y1="{top+plot_h}" x2="{left+plot_w}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
|
| 43 |
+
lines.append(f'<line x1="{left}" y1="{top}" x2="{left}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
|
| 44 |
+
|
| 45 |
+
for tick in range(0, 6):
|
| 46 |
+
value = tick / 5
|
| 47 |
+
y = top + plot_h - (value * plot_h)
|
| 48 |
+
lines.append(f'<line x1="{left}" y1="{y:.2f}" x2="{left+plot_w}" y2="{y:.2f}" stroke="#E5E7EB" stroke-width="1"/>')
|
| 49 |
+
lines.append(f'<text x="{left-38}" y="{y+5:.2f}" font-size="12" font-family="Arial" fill="#374151">{value:.1f}</text>')
|
| 50 |
+
|
| 51 |
+
for idx, task in enumerate(tasks):
|
| 52 |
+
gx = left + (idx * group_w) + (group_w * 0.5)
|
| 53 |
+
b_h = base[idx] * plot_h
|
| 54 |
+
t_h = trained[idx] * plot_h
|
| 55 |
+
b_x = gx - bar_w - 2
|
| 56 |
+
t_x = gx + 2
|
| 57 |
+
b_y = top + plot_h - b_h
|
| 58 |
+
t_y = top + plot_h - t_h
|
| 59 |
+
lines.append(f'<rect x="{b_x:.2f}" y="{b_y:.2f}" width="{bar_w:.2f}" height="{b_h:.2f}" fill="#9CA3AF"/>')
|
| 60 |
+
lines.append(f'<rect x="{t_x:.2f}" y="{t_y:.2f}" width="{bar_w:.2f}" height="{t_h:.2f}" fill="#2563EB"/>')
|
| 61 |
+
lines.append(
|
| 62 |
+
f'<text x="{gx:.2f}" y="{top+plot_h+22}" font-size="10" text-anchor="middle" '
|
| 63 |
+
f'font-family="Arial" fill="#374151" transform="rotate(25 {gx:.2f},{top+plot_h+22})">{task}</text>'
|
| 64 |
+
)
|
| 65 |
+
|
| 66 |
+
legend_y = 52
|
| 67 |
+
lines.append(f'<rect x="{width-310}" y="{legend_y-10}" width="12" height="12" fill="#9CA3AF"/>')
|
| 68 |
+
lines.append(f'<text x="{width-292}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Base</text>')
|
| 69 |
+
lines.append(f'<rect x="{width-230}" y="{legend_y-10}" width="12" height="12" fill="#2563EB"/>')
|
| 70 |
+
lines.append(f'<text x="{width-212}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Trained</text>')
|
| 71 |
+
lines.extend(_svg_footer())
|
| 72 |
+
(ARTIFACT_DIR / "llm_reward_by_task.svg").write_text("\n".join(lines))
|
| 73 |
+
|
| 74 |
+
|
| 75 |
+
def plot_violations(rows: list[dict[str, str]]) -> None:
|
| 76 |
+
tasks = [r["task_id"] for r in rows]
|
| 77 |
+
base = [int(r["baseline_violations"]) for r in rows]
|
| 78 |
+
trained = [int(r["trained_violations"]) for r in rows]
|
| 79 |
+
max_v = max(max(base, default=0), max(trained, default=0), 1)
|
| 80 |
+
|
| 81 |
+
width, height = 1360, 500
|
| 82 |
+
left, right, top, bottom = 80, 40, 70, 100
|
| 83 |
+
plot_w = width - left - right
|
| 84 |
+
plot_h = height - top - bottom
|
| 85 |
+
|
| 86 |
+
def point_x(i: int) -> float:
|
| 87 |
+
return left + (i / max(len(tasks) - 1, 1)) * plot_w
|
| 88 |
+
|
| 89 |
+
def point_y(v: int) -> float:
|
| 90 |
+
return top + plot_h - ((v / max_v) * plot_h)
|
| 91 |
+
|
| 92 |
+
lines = _svg_header(width, height)
|
| 93 |
+
lines.append('<text x="80" y="35" font-size="22" font-family="Arial" fill="#111827">Base vs Trained LLM Commitment Violations</text>')
|
| 94 |
+
lines.append(f'<line x1="{left}" y1="{top+plot_h}" x2="{left+plot_w}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
|
| 95 |
+
lines.append(f'<line x1="{left}" y1="{top}" x2="{left}" y2="{top+plot_h}" stroke="#374151" stroke-width="1"/>')
|
| 96 |
+
|
| 97 |
+
for tick in range(max_v + 1):
|
| 98 |
+
y = point_y(tick)
|
| 99 |
+
lines.append(f'<line x1="{left}" y1="{y:.2f}" x2="{left+plot_w}" y2="{y:.2f}" stroke="#E5E7EB" stroke-width="1"/>')
|
| 100 |
+
lines.append(f'<text x="{left-24}" y="{y+5:.2f}" font-size="12" font-family="Arial" fill="#374151">{tick}</text>')
|
| 101 |
+
|
| 102 |
+
base_points = " ".join(f"{point_x(i):.2f},{point_y(v):.2f}" for i, v in enumerate(base))
|
| 103 |
+
tr_points = " ".join(f"{point_x(i):.2f},{point_y(v):.2f}" for i, v in enumerate(trained))
|
| 104 |
+
lines.append(f'<polyline points="{base_points}" fill="none" stroke="#DC2626" stroke-width="2"/>')
|
| 105 |
+
lines.append(f'<polyline points="{tr_points}" fill="none" stroke="#059669" stroke-width="2"/>')
|
| 106 |
+
|
| 107 |
+
for i, task in enumerate(tasks):
|
| 108 |
+
x = point_x(i)
|
| 109 |
+
lines.append(f'<circle cx="{x:.2f}" cy="{point_y(base[i]):.2f}" r="3" fill="#DC2626"/>')
|
| 110 |
+
lines.append(f'<circle cx="{x:.2f}" cy="{point_y(trained[i]):.2f}" r="3" fill="#059669"/>')
|
| 111 |
+
lines.append(
|
| 112 |
+
f'<text x="{x:.2f}" y="{top+plot_h+20}" font-size="10" text-anchor="middle" '
|
| 113 |
+
f'font-family="Arial" fill="#374151" transform="rotate(25 {x:.2f},{top+plot_h+20})">{task}</text>'
|
| 114 |
+
)
|
| 115 |
+
|
| 116 |
+
legend_y = 52
|
| 117 |
+
lines.append(f'<line x1="{width-320}" y1="{legend_y-5}" x2="{width-300}" y2="{legend_y-5}" stroke="#DC2626" stroke-width="2"/>')
|
| 118 |
+
lines.append(f'<text x="{width-295}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Base</text>')
|
| 119 |
+
lines.append(f'<line x1="{width-230}" y1="{legend_y-5}" x2="{width-210}" y2="{legend_y-5}" stroke="#059669" stroke-width="2"/>')
|
| 120 |
+
lines.append(f'<text x="{width-205}" y="{legend_y}" font-size="12" font-family="Arial" fill="#111827">Trained</text>')
|
| 121 |
+
lines.extend(_svg_footer())
|
| 122 |
+
(ARTIFACT_DIR / "llm_violations_before_after.svg").write_text("\n".join(lines))
|
| 123 |
+
|
| 124 |
+
|
| 125 |
+
def main() -> None:
|
| 126 |
+
rows = _rows()
|
| 127 |
+
plot_reward(rows)
|
| 128 |
+
plot_violations(rows)
|
| 129 |
+
print("Wrote checkpoint comparison SVG plots to", ARTIFACT_DIR)
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
if __name__ == "__main__":
|
| 133 |
+
main()
|
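Note on the bar geometry in `plot_reward`: SVG's y axis grows downward, so a bar of height `h` has to start at `top + plot_h - h` to sit on the baseline. The sketch below reproduces that arithmetic for a single series so the numbers can be checked by hand. `bar_rects` is a hypothetical helper, not part of the script, and it assumes rewards lie in the range 0 to 1, as the fixed 0.0 to 1.0 tick labels suggest.

```python
def bar_rects(
    values: list[float],
    left: float,
    top: float,
    plot_w: float,
    plot_h: float,
) -> list[tuple[float, float, float, float]]:
    """Return (x, y, width, height) for the left-hand bar of each task group."""
    group_w = plot_w / max(len(values), 1)  # horizontal slot per task
    bar_w = max(group_w * 0.32, 10)  # same minimum bar width as the script
    rects = []
    for idx, value in enumerate(values):
        gx = left + idx * group_w + group_w * 0.5  # centre of the slot
        height = value * plot_h
        # SVG origin is top-left, so taller bars start higher up (smaller y).
        rects.append((gx - bar_w - 2, top + plot_h - height, bar_w, height))
    return rects


# Plot area taken from the script: 1360x520 canvas with 80/40/70/110 margins.
rects = bar_rects([0.5, 1.0], left=80, top=70, plot_w=1240, plot_h=340)
for rect in rects:
    print(tuple(round(v, 2) for v in rect))
```

A reward above 1 would push the bar above the plot area, since nothing in this arithmetic clamps the height.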
pyproject.toml
CHANGED
|
@@ -7,7 +7,7 @@ name = "commitment-os"
|
|
| 7 |
version = "0.1.0"
|
| 8 |
description = "CommitmentOS: the first RL environment that trains temporal commitment coherence in LLMs"
|
| 9 |
requires-python = ">=3.10"
|
| 10 |
-
license =
|
| 11 |
authors = [
|
| 12 |
{name = "Jayant Aggarwal"},
|
| 13 |
]
|
|
@@ -40,4 +40,19 @@ training = [
|
|
| 40 |
"torch>=2.0.0",
|
| 41 |
"peft>=0.14.0",
|
| 42 |
"datasets>=3.0.0",
|
| 43 |
]
|
| 7 |
version = "0.1.0"
|
| 8 |
description = "CommitmentOS: the first RL environment that trains temporal commitment coherence in LLMs"
|
| 9 |
requires-python = ">=3.10"
|
| 10 |
+
license = "MIT"
|
| 11 |
authors = [
|
| 12 |
{name = "Jayant Aggarwal"},
|
| 13 |
]
|
|
|
|
| 40 |
"torch>=2.0.0",
|
| 41 |
"peft>=0.14.0",
|
| 42 |
"datasets>=3.0.0",
|
| 43 |
+
"accelerate>=0.30.0",
|
| 44 |
+
"sentencepiece>=0.2.0",
|
| 45 |
]
|
| 46 |
+
# Local Transformers + PEFT eval (evaluate_llm_checkpoints.py); not in Docker requirements.txt
|
| 47 |
+
llm-eval = [
|
| 48 |
+
"transformers>=4.45.0",
|
| 49 |
+
"peft>=0.14.0",
|
| 50 |
+
"torch>=2.0.0",
|
| 51 |
+
"accelerate>=0.30.0",
|
| 52 |
+
"sentencepiece>=0.2.0",
|
| 53 |
+
"requests>=2.31.0",
|
| 54 |
+
]
|
| 55 |
+
|
| 56 |
+
[tool.setuptools.packages.find]
|
| 57 |
+
where = ["."]
|
| 58 |
+
include = ["server*", "training*"]
|
server/__init__.py
CHANGED
|
@@ -0,0 +1 @@
|
| 1 |
+
"""CommitmentOS HTTP server and environment implementation."""
|
training/CommitmentOS_Training.ipynb
CHANGED
|
@@ -1,95 +1,119 @@
|
|
| 1 |
{
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
},
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
"execution_count": null,
|
| 15 |
-
"metadata": {},
|
| 16 |
-
"outputs": [],
|
| 17 |
-
"source": [
|
| 18 |
-
"!pip -q install --upgrade pip\\n",
|
| 19 |
-
"!pip -q install openenv trl transformers peft datasets torch accelerate bitsandbytes matplotlib pandas"
|
| 20 |
-
]
|
| 21 |
-
},
|
| 22 |
-
{
|
| 23 |
-
"cell_type": "code",
|
| 24 |
-
"execution_count": null,
|
| 25 |
-
"metadata": {},
|
| 26 |
-
"outputs": [],
|
| 27 |
-
"source": [
|
| 28 |
-
"!git clone https://github.com/Jayant2304/commitment_os.git\\n",
|
| 29 |
-
"%cd commitment_os\\n",
|
| 30 |
-
"!python -m pytest tests/test_environment.py -q"
|
| 31 |
-
]
|
| 32 |
-
},
|
| 33 |
-
{
|
| 34 |
-
"cell_type": "code",
|
| 35 |
-
"execution_count": null,
|
| 36 |
-
"metadata": {},
|
| 37 |
-
"outputs": [],
|
| 38 |
-
"source": [
|
| 39 |
-
"!python training/train_grpo.py \\\\\\n",
|
| 40 |
-
" --model Qwen/Qwen2.5-1.5B-Instruct \\\\\\n",
|
| 41 |
-
" --epochs 2 \\\\\\n",
|
| 42 |
-
" --lr 5e-6 \\\\\\n",
|
| 43 |
-
" --batch_size 1 \\\\\\n",
|
| 44 |
-
" --group_size 2 \\\\\\n",
|
| 45 |
-
" --lora_rank 16 \\\\\\n",
|
| 46 |
-
" --lora_alpha 32 \\\\\\n",
|
| 47 |
-
" --output_dir ./training_output"
|
| 48 |
-
]
|
| 49 |
-
},
|
| 50 |
-
{
|
| 51 |
-
"cell_type": "code",
|
| 52 |
-
"execution_count": null,
|
| 53 |
-
"metadata": {},
|
| 54 |
-
"outputs": [],
|
| 55 |
-
"source": [
|
| 56 |
-
"import json\\n",
|
| 57 |
-
"import matplotlib.pyplot as plt\\n",
|
| 58 |
-
"from pathlib import Path\\n",
|
| 59 |
-
"\\n",
|
| 60 |
-
"p = Path('training_output/training_metrics.json')\\n",
|
| 61 |
-
"logs = json.loads(p.read_text())\\n",
|
| 62 |
-
"\\n",
|
| 63 |
-
"steps = [float(x['step']) for x in logs if 'step' in x and 'loss' in x]\\n",
|
| 64 |
-
"loss = [float(x['loss']) for x in logs if 'step' in x and 'loss' in x]\\n",
|
| 65 |
-
"r_steps = [float(x['step']) for x in logs if 'step' in x and 'reward' in x]\\n",
|
| 66 |
-
"rewards = [float(x['reward']) for x in logs if 'step' in x and 'reward' in x]\\n",
|
| 67 |
-
"\\n",
|
| 68 |
-
"plt.figure(figsize=(8,5))\\n",
|
| 69 |
-
"plt.plot(steps, loss, marker='o')\\n",
|
| 70 |
-
"plt.title('CommitmentOS GRPO Loss vs Step')\\n",
|
| 71 |
-
"plt.xlabel('Step'); plt.ylabel('Loss'); plt.grid(alpha=0.3)\\n",
|
| 72 |
-
"plt.tight_layout(); plt.savefig('loss_curve.png', dpi=200); plt.show()\\n",
|
| 73 |
-
"\\n",
|
| 74 |
-
"plt.figure(figsize=(8,5))\\n",
|
| 75 |
-
"plt.plot(r_steps, rewards, marker='o')\\n",
|
| 76 |
-
"plt.title('CommitmentOS GRPO Reward vs Step')\\n",
|
| 77 |
-
"plt.xlabel('Step'); plt.ylabel('Reward'); plt.grid(alpha=0.3)\\n",
|
| 78 |
-
"plt.tight_layout(); plt.savefig('reward_curve.png', dpi=200); plt.show()"
|
| 79 |
-
]
|
| 80 |
-
}
|
| 81 |
-
],
|
| 82 |
-
"metadata": {
|
| 83 |
-
"kernelspec": {
|
| 84 |
-
"display_name": "Python 3",
|
| 85 |
-
"language": "python",
|
| 86 |
-
"name": "python3"
|
| 87 |
-
},
|
| 88 |
-
"language_info": {
|
| 89 |
-
"name": "python",
|
| 90 |
-
"version": "3.10"
|
| 91 |
-
}
|
| 92 |
-
},
|
| 93 |
-
"nbformat": 4,
|
| 94 |
-
"nbformat_minor": 5
|
| 95 |
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "markdown",
|
| 5 |
+
"metadata": {},
|
| 6 |
+
"source": [
|
| 7 |
+
"# CommitmentOS Training Notebook\\n\n",
|
| 8 |
+
"\\n\n",
|
| 9 |
+
"This notebook reproduces GRPO training for CommitmentOS using TRL + LoRA."
|
| 10 |
+
]
|
| 11 |
+
},
|
| 12 |
+
{
|
| 13 |
+
"cell_type": "code",
|
| 14 |
+
"execution_count": null,
|
| 15 |
+
"id": "5bc9c2fe",
|
| 16 |
+
"metadata": {},
|
| 17 |
+
"outputs": [],
|
| 18 |
+
"source": [
|
| 19 |
+
"!pip -q install --upgrade pip\\n\n",
|
| 20 |
+
"!pip -q install \"openenv-core>=0.2.0\" trl transformers peft datasets torch accelerate bitsandbytes matplotlib pandas pydantic"
|
| 21 |
+
]
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"cell_type": "code",
|
| 25 |
+
"execution_count": null,
|
| 26 |
+
"metadata": {},
|
| 27 |
+
"outputs": [],
|
| 28 |
+
"source": [
|
| 29 |
+
"!git clone https://github.com/Jayant2304/commitment_os.git\\n\n",
|
| 30 |
+
"%cd commitment_os\\n\n",
|
| 31 |
+
"!python -m pytest tests/test_environment.py -q"
|
| 32 |
+
]
|
| 33 |
+
},
|
| 34 |
+
{
|
| 35 |
+
"cell_type": "code",
|
| 36 |
+
"execution_count": null,
|
| 37 |
+
"metadata": {},
|
| 38 |
+
"outputs": [],
|
| 39 |
+
"source": [
|
| 40 |
+
"!python training/train_grpo.py \\\\\\n\n",
|
| 41 |
+
" --model Qwen/Qwen2.5-1.5B-Instruct \\\\\\n\n",
|
| 42 |
+
" --epochs 2 \\\\\\n\n",
|
| 43 |
+
" --lr 5e-6 \\\\\\n\n",
|
| 44 |
+
" --batch_size 1 \\\\\\n\n",
|
| 45 |
+
" --group_size 2 \\\\\\n\n",
|
| 46 |
+
" --lora_rank 16 \\\\\\n\n",
|
| 47 |
+
" --lora_alpha 32 \\\\\\n\n",
|
| 48 |
+
" --output_dir ./training_output"
|
| 49 |
+
]
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"cell_type": "code",
|
| 53 |
+
"execution_count": null,
|
| 54 |
+
"metadata": {},
|
| 55 |
+
"outputs": [],
|
| 56 |
+
"source": [
|
| 57 |
+
"import json\\n\n",
|
| 58 |
+
"import matplotlib.pyplot as plt\\n\n",
|
| 59 |
+
"from pathlib import Path\\n\n",
|
| 60 |
+
"\\n\n",
|
| 61 |
+
"p = Path('training_output/training_metrics.json')\\n\n",
|
| 62 |
+
"logs = json.loads(p.read_text())\\n\n",
|
| 63 |
+
"\\n\n",
|
| 64 |
+
"steps = [float(x['step']) for x in logs if 'step' in x and 'loss' in x]\\n\n",
|
| 65 |
+
"loss = [float(x['loss']) for x in logs if 'step' in x and 'loss' in x]\\n\n",
|
| 66 |
+
"r_steps = [float(x['step']) for x in logs if 'step' in x and 'reward' in x]\\n\n",
|
| 67 |
+
"rewards = [float(x['reward']) for x in logs if 'step' in x and 'reward' in x]\\n\n",
|
| 68 |
+
"\\n\n",
|
| 69 |
+
"plt.figure(figsize=(8,5))\\n\n",
|
| 70 |
+
"plt.plot(steps, loss, marker='o')\\n\n",
|
| 71 |
+
"plt.title('CommitmentOS GRPO Loss vs Step')\\n\n",
|
| 72 |
+
"plt.xlabel('Step'); plt.ylabel('Loss'); plt.grid(alpha=0.3)\\n\n",
|
| 73 |
+
"plt.tight_layout(); plt.savefig('loss_curve.png', dpi=200); plt.show()\\n\n",
|
| 74 |
+
"\\n\n",
|
| 75 |
+
"plt.figure(figsize=(8,5))\\n\n",
|
| 76 |
+
"plt.plot(r_steps, rewards, marker='o')\\n\n",
|
| 77 |
+
"plt.title('CommitmentOS GRPO Reward vs Step')\\n\n",
|
| 78 |
+
"plt.xlabel('Step'); plt.ylabel('Reward'); plt.grid(alpha=0.3)\\n\n",
|
| 79 |
+
"plt.tight_layout(); plt.savefig('reward_curve.png', dpi=200); plt.show()"
|
| 80 |
+
]
|
| 81 |
+
},
|
| 82 |
+
{
|
| 83 |
+
"cell_type": "markdown",
|
| 84 |
+
"id": "e788b455",
|
| 85 |
+
"metadata": {},
|
| 86 |
+
"source": [
|
| 87 |
+
"### Optional: zip `training_output` for download\n",
|
| 88 |
+
"\n",
|
| 89 |
+
"Run after training completes. On Colab, use **Files** sidebar or `files.download` for the zip.\n"
|
| 90 |
+
]
|
| 91 |
+
},
|
| 92 |
+
{
|
| 93 |
+
"cell_type": "code",
|
| 94 |
+
"execution_count": null,
|
| 95 |
+
"id": "1b3c760a",
|
| 96 |
+
"metadata": {},
|
| 97 |
+
"outputs": [],
|
| 98 |
+
"source": [
|
| 99 |
+
"!cd /content/commitment_os && du -sh training_output && zip -r /content/training_output_only.zip training_output\n",
|
| 100 |
+
"from google.colab import files\n",
|
| 101 |
+
"\n",
|
| 102 |
+
"files.download(\"/content/training_output_only.zip\")\n"
|
| 103 |
+
]
|
| 104 |
+
}
|
| 105 |
+
],
|
| 106 |
+
"metadata": {
|
| 107 |
+
"kernelspec": {
|
| 108 |
+
"display_name": "Python 3",
|
| 109 |
+
"language": "python",
|
| 110 |
+
"name": "python3"
|
| 111 |
+
},
|
| 112 |
+
"language_info": {
|
| 113 |
+
"name": "python",
|
| 114 |
+
"version": "3.10"
|
| 115 |
+
}
|
| 116 |
},
|
| 117 |
+
"nbformat": 4,
|
| 118 |
+
"nbformat_minor": 5
|
| 119 |
}
|
uv.lock
CHANGED
|
@@ -660,9 +660,19 @@ inference = [
|
|
| 660 |
{ name = "openai" },
|
| 661 |
{ name = "requests" },
|
| 662 |
]
|
| 663 |
training = [
|
| 664 |
{ name = "datasets" },
|
| 665 |
{ name = "peft" },
|
| 666 |
{ name = "torch" },
|
| 667 |
{ name = "transformers" },
|
| 668 |
{ name = "trl" },
|
|
@@ -670,24 +680,32 @@ training = [
|
|
| 670 |
|
| 671 |
[package.metadata]
|
| 672 |
requires-dist = [
|
| 673 |
{ name = "datasets", marker = "extra == 'training'", specifier = ">=3.0.0" },
|
| 674 |
{ name = "fastapi", specifier = ">=0.110.0" },
|
| 675 |
{ name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27.0" },
|
| 676 |
{ name = "openai", marker = "extra == 'dev'", specifier = ">=1.0.0" },
|
| 677 |
{ name = "openai", marker = "extra == 'inference'", specifier = ">=1.0.0" },
|
| 678 |
{ name = "openenv-core", specifier = ">=0.2.0" },
|
| 679 |
{ name = "peft", marker = "extra == 'training'", specifier = ">=0.14.0" },
|
| 680 |
{ name = "pydantic", specifier = ">=2.0.0" },
|
| 681 |
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
|
| 682 |
{ name = "python-dotenv", specifier = ">=1.0.0" },
|
| 683 |
{ name = "requests", marker = "extra == 'dev'", specifier = ">=2.31.0" },
|
| 684 |
{ name = "requests", marker = "extra == 'inference'", specifier = ">=2.31.0" },
|
| 685 |
{ name = "torch", marker = "extra == 'training'", specifier = ">=2.0.0" },
|
| 686 |
{ name = "transformers", marker = "extra == 'training'", specifier = ">=4.45.0" },
|
| 687 |
{ name = "trl", marker = "extra == 'training'", specifier = ">=0.14.0" },
|
| 688 |
{ name = "uvicorn", extras = ["standard"], specifier = ">=0.29.0" },
|
| 689 |
]
|
| 690 |
-
provides-extras = ["inference", "dev", "training"]
|
| 691 |
|
| 692 |
[[package]]
|
| 693 |
name = "cryptography"
|
|
@@ -754,7 +772,7 @@ name = "cuda-bindings"
|
|
| 754 |
version = "13.2.0"
|
| 755 |
source = { registry = "https://pypi.org/simple" }
|
| 756 |
dependencies = [
|
| 757 |
-
{ name = "cuda-pathfinder" },
|
| 758 |
]
|
| 759 |
wheels = [
|
| 760 |
{ url = "https://files.pythonhosted.org/packages/1a/fe/7351d7e586a8b4c9f89731bfe4cf0148223e8f9903ff09571f78b3fb0682/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:08b395f79cb89ce0cd8effff07c4a1e20101b873c256a1aeb286e8fd7bd0f556", size = 5744254 },
|
|
@@ -789,37 +807,37 @@ wheels = [
|
|
| 789 |
|
| 790 |
[package.optional-dependencies]
|
| 791 |
cublas = [
|
| 792 |
-
{ name = "nvidia-cublas", marker = "sys_platform == '
|
| 793 |
]
|
| 794 |
cudart = [
|
| 795 |
-
{ name = "nvidia-cuda-runtime", marker = "sys_platform == '
|
| 796 |
]
|
| 797 |
cufft = [
|
| 798 |
-
{ name = "nvidia-cufft", marker = "sys_platform == '
|
| 799 |
]
|
| 800 |
cufile = [
|
| 801 |
{ name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
|
| 802 |
]
|
| 803 |
cupti = [
|
| 804 |
-
{ name = "nvidia-cuda-cupti", marker = "sys_platform == '
|
| 805 |
]
|
| 806 |
curand = [
|
| 807 |
-
{ name = "nvidia-curand", marker = "sys_platform == '
|
| 808 |
]
|
| 809 |
cusolver = [
|
| 810 |
-
{ name = "nvidia-cusolver", marker = "sys_platform == '
|
| 811 |
]
|
| 812 |
cusparse = [
|
| 813 |
-
{ name = "nvidia-cusparse", marker = "sys_platform == '
|
| 814 |
]
|
| 815 |
nvjitlink = [
|
| 816 |
-
{ name = "nvidia-nvjitlink", marker = "sys_platform == '
|
| 817 |
]
|
| 818 |
nvrtc = [
|
| 819 |
-
{ name = "nvidia-cuda-nvrtc", marker = "sys_platform == '
|
| 820 |
]
|
| 821 |
nvtx = [
|
| 822 |
-
{ name = "nvidia-nvtx", marker = "sys_platform == '
|
| 823 |
]
|
| 824 |
|
| 825 |
[[package]]
|
|
@@ -2158,7 +2176,7 @@ name = "nvidia-cudnn-cu13"
|
|
| 2158 |
version = "9.19.0.56"
|
| 2159 |
source = { registry = "https://pypi.org/simple" }
|
| 2160 |
dependencies = [
|
| 2161 |
-
{ name = "nvidia-cublas" },
|
| 2162 |
]
|
| 2163 |
wheels = [
|
| 2164 |
{ url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201 },
|
|
@@ -2170,7 +2188,7 @@ name = "nvidia-cufft"
|
|
| 2170 |
version = "12.0.0.61"
|
| 2171 |
source = { registry = "https://pypi.org/simple" }
|
| 2172 |
dependencies = [
|
| 2173 |
-
{ name = "nvidia-nvjitlink" },
|
| 2174 |
]
|
| 2175 |
wheels = [
|
| 2176 |
{ url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554 },
|
|
@@ -2200,9 +2218,9 @@ name = "nvidia-cusolver"
|
|
| 2200 |
version = "12.0.4.66"
|
| 2201 |
source = { registry = "https://pypi.org/simple" }
|
| 2202 |
dependencies = [
|
| 2203 |
-
{ name = "nvidia-cublas" },
|
| 2204 |
-
{ name = "nvidia-cusparse" },
|
| 2205 |
-
{ name = "nvidia-nvjitlink" },
|
| 2206 |
]
|
| 2207 |
wheels = [
|
| 2208 |
{ url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760 },
|
|
@@ -2214,7 +2232,7 @@ name = "nvidia-cusparse"
|
|
| 2214 |
version = "12.6.3.3"
|
| 2215 |
source = { registry = "https://pypi.org/simple" }
|
| 2216 |
dependencies = [
|
| 2217 |
-
{ name = "nvidia-nvjitlink" },
|
| 2218 |
]
|
| 2219 |
wheels = [
|
| 2220 |
{ url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568 },
|
|
@@ -3637,6 +3655,70 @@ wheels = [
|
|
| 3637 |
{ url = "https://files.pythonhosted.org/packages/6a/23/8146aad7d88f4fcb3a6218f41a60f6c2d4e3a72de72da1825dc7c8f7877c/semantic_version-2.10.0-py2.py3-none-any.whl", hash = "sha256:de78a3b8e0feda74cabc54aab2da702113e33ac9d9eb9d2389bcf1f58b7d9177", size = 15552 },
|
| 3638 |
]
|
| 3639 |
|
| 3640 |
[[package]]
|
| 3641 |
name = "setuptools"
|
| 3642 |
version = "81.0.0"
|
|
|
|
| 660 |
{ name = "openai" },
|
| 661 |
{ name = "requests" },
|
| 662 |
]
|
| 663 |
+
llm-eval = [
|
| 664 |
+
{ name = "accelerate" },
|
| 665 |
+
{ name = "peft" },
|
| 666 |
+
{ name = "requests" },
|
| 667 |
+
{ name = "sentencepiece" },
|
| 668 |
+
{ name = "torch" },
|
| 669 |
+
{ name = "transformers" },
|
| 670 |
+
]
|
| 671 |
training = [
|
| 672 |
+
{ name = "accelerate" },
|
| 673 |
{ name = "datasets" },
|
| 674 |
{ name = "peft" },
|
| 675 |
+
{ name = "sentencepiece" },
|
| 676 |
{ name = "torch" },
|
| 677 |
{ name = "transformers" },
|
| 678 |
{ name = "trl" },
|
|
|
|
| 680 |
|
| 681 |
[package.metadata]
|
| 682 |
requires-dist = [
|
| 683 |
+
{ name = "accelerate", marker = "extra == 'llm-eval'", specifier = ">=0.30.0" },
|
| 684 |
+
{ name = "accelerate", marker = "extra == 'training'", specifier = ">=0.30.0" },
|
| 685 |
{ name = "datasets", marker = "extra == 'training'", specifier = ">=3.0.0" },
|
| 686 |
{ name = "fastapi", specifier = ">=0.110.0" },
|
| 687 |
{ name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27.0" },
|
| 688 |
{ name = "openai", marker = "extra == 'dev'", specifier = ">=1.0.0" },
|
| 689 |
{ name = "openai", marker = "extra == 'inference'", specifier = ">=1.0.0" },
|
| 690 |
{ name = "openenv-core", specifier = ">=0.2.0" },
|
| 691 |
+
{ name = "peft", marker = "extra == 'llm-eval'", specifier = ">=0.14.0" },
|
| 692 |
{ name = "peft", marker = "extra == 'training'", specifier = ">=0.14.0" },
|
| 693 |
{ name = "pydantic", specifier = ">=2.0.0" },
|
| 694 |
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
|
| 695 |
{ name = "python-dotenv", specifier = ">=1.0.0" },
|
| 696 |
{ name = "requests", marker = "extra == 'dev'", specifier = ">=2.31.0" },
|
| 697 |
{ name = "requests", marker = "extra == 'inference'", specifier = ">=2.31.0" },
|
| 698 |
+
{ name = "requests", marker = "extra == 'llm-eval'", specifier = ">=2.31.0" },
|
| 699 |
+
{ name = "sentencepiece", marker = "extra == 'llm-eval'", specifier = ">=0.2.0" },
|
| 700 |
+
{ name = "sentencepiece", marker = "extra == 'training'", specifier = ">=0.2.0" },
|
| 701 |
+
{ name = "torch", marker = "extra == 'llm-eval'", specifier = ">=2.0.0" },
|
| 702 |
{ name = "torch", marker = "extra == 'training'", specifier = ">=2.0.0" },
|
| 703 |
+
{ name = "transformers", marker = "extra == 'llm-eval'", specifier = ">=4.45.0" },
|
| 704 |
{ name = "transformers", marker = "extra == 'training'", specifier = ">=4.45.0" },
|
| 705 |
{ name = "trl", marker = "extra == 'training'", specifier = ">=0.14.0" },
|
| 706 |
{ name = "uvicorn", extras = ["standard"], specifier = ">=0.29.0" },
|
| 707 |
]
|
| 708 |
+
provides-extras = ["inference", "dev", "training", "llm-eval"]
|
| 709 |
|
| 710 |
[[package]]
|
| 711 |
name = "cryptography"
|
|
|
|
| 772 |
version = "13.2.0"
|
| 773 |
source = { registry = "https://pypi.org/simple" }
|
| 774 |
dependencies = [
|
| 775 |
+
{ name = "cuda-pathfinder", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 776 |
]
|
| 777 |
wheels = [
|
| 778 |
{ url = "https://files.pythonhosted.org/packages/1a/fe/7351d7e586a8b4c9f89731bfe4cf0148223e8f9903ff09571f78b3fb0682/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:08b395f79cb89ce0cd8effff07c4a1e20101b873c256a1aeb286e8fd7bd0f556", size = 5744254 },
|
|
|
|
| 807 |
|
| 808 |
[package.optional-dependencies]
|
| 809 |
cublas = [
|
| 810 |
+
{ name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 811 |
]
|
| 812 |
cudart = [
|
| 813 |
+
{ name = "nvidia-cuda-runtime", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 814 |
]
|
| 815 |
cufft = [
|
| 816 |
+
{ name = "nvidia-cufft", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 817 |
]
|
| 818 |
cufile = [
|
| 819 |
{ name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
|
| 820 |
]
|
| 821 |
cupti = [
|
| 822 |
+
{ name = "nvidia-cuda-cupti", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 823 |
]
|
| 824 |
curand = [
|
| 825 |
+
{ name = "nvidia-curand", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 826 |
]
|
| 827 |
cusolver = [
|
| 828 |
+
{ name = "nvidia-cusolver", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 829 |
]
|
| 830 |
cusparse = [
|
| 831 |
+
{ name = "nvidia-cusparse", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 832 |
]
|
| 833 |
nvjitlink = [
|
| 834 |
+
{ name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 835 |
]
|
| 836 |
nvrtc = [
|
| 837 |
+
{ name = "nvidia-cuda-nvrtc", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 838 |
]
|
| 839 |
nvtx = [
|
| 840 |
+
{ name = "nvidia-nvtx", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" },
|
| 841 |
]
|
| 842 |
|
| 843 |
[[package]]
|
|
|
|
| 2176 |
version = "9.19.0.56"
|
| 2177 |
source = { registry = "https://pypi.org/simple" }
|
| 2178 |
dependencies = [
|
| 2179 |
+
{ name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 2180 |
]
|
| 2181 |
wheels = [
|
| 2182 |
{ url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201 },
|
|
|
|
| 2188 |
version = "12.0.0.61"
|
| 2189 |
source = { registry = "https://pypi.org/simple" }
|
| 2190 |
dependencies = [
|
| 2191 |
+
{ name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 2192 |
]
|
| 2193 |
wheels = [
|
| 2194 |
{ url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554 },
|
|
|
|
| 2218 |
version = "12.0.4.66"
|
| 2219 |
source = { registry = "https://pypi.org/simple" }
|
| 2220 |
dependencies = [
|
| 2221 |
+
{ name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 2222 |
+
{ name = "nvidia-cusparse", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 2223 |
+
{ name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 2224 |
]
|
| 2225 |
wheels = [
|
| 2226 |
{ url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760 },
|
|
|
|
| 2232 |
version = "12.6.3.3"
|
| 2233 |
source = { registry = "https://pypi.org/simple" }
|
| 2234 |
dependencies = [
|
| 2235 |
+
{ name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
|
| 2236 |
]
|
| 2237 |
wheels = [
|
| 2238 |
{ url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568 },
|
|
|
|
| 3655 |
{ url = "https://files.pythonhosted.org/packages/6a/23/8146aad7d88f4fcb3a6218f41a60f6c2d4e3a72de72da1825dc7c8f7877c/semantic_version-2.10.0-py2.py3-none-any.whl", hash = "sha256:de78a3b8e0feda74cabc54aab2da702113e33ac9d9eb9d2389bcf1f58b7d9177", size = 15552 },
|
| 3656 |
]
|
| 3657 |
|
| 3658 |
+
[[package]]
|
| 3659 |
+
name = "sentencepiece"
|
| 3660 |
+
version = "0.2.1"
|
| 3661 |
+
source = { registry = "https://pypi.org/simple" }
|
| 3662 |
+
sdist = { url = "https://files.pythonhosted.org/packages/15/15/2e7a025fc62d764b151ae6d0f2a92f8081755ebe8d4a64099accc6f77ba6/sentencepiece-0.2.1.tar.gz", hash = "sha256:8138cec27c2f2282f4a34d9a016e3374cd40e5c6e9cb335063db66a0a3b71fad", size = 3228515 }
|
| 3663 |
+
wheels = [
|
| 3664 |
+
{ url = "https://files.pythonhosted.org/packages/af/31/5b7cccb307b485db1a2372d6d2980b0a65d067f8be5ca943a103b4acd5b3/sentencepiece-0.2.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:e10fa50bdbaa5e2445dbd387979980d391760faf0ec99a09bd7780ff37eaec44", size = 1942557 },
|
| 3665 |
+
{ url = "https://files.pythonhosted.org/packages/1f/41/0ac923a8e685ad290c5afc8ae55c5844977b8d75076fcc04302b9a324274/sentencepiece-0.2.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:2f27ae6deea72efdb6f361750c92f6c21fd0ad087445082770cc34015213c526", size = 1325384 },
|
| 3666 |
+
{ url = "https://files.pythonhosted.org/packages/fc/ef/3751555d67daf9003384978f169d31c775cb5c7baf28633caaf1eb2b2b4d/sentencepiece-0.2.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:60937c959e6f44159fdd9f56fbdd302501f96114a5ba436829496d5f32d8de3f", size = 1253317 },
|
| 3667 |
+
{ url = "https://files.pythonhosted.org/packages/46/a5/742c69b7bd144eb32b6e5fd50dbd8abbbc7a95fce2fe16e50156fa400e3b/sentencepiece-0.2.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d8b1d91545578852f128650b8cce4ec20f93d39b378ff554ebe66290f2dabb92", size = 1316379 },
|
| 3668 |
+
{ url = "https://files.pythonhosted.org/packages/c8/89/8deeafbba2871e8fa10f20f17447786f4ac38085925335728d360eaf4cae/sentencepiece-0.2.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:27e38eee653abc3d387862e67bc5c8b6f428cd604e688b85d29170b7e725c26c", size = 1387926 },
|
| 3669 |
+
{ url = "https://files.pythonhosted.org/packages/c3/ca/67fe73005f0ab617c6a970b199754e28e524b6873aa7025224fad3cda252/sentencepiece-0.2.1-cp310-cp310-win32.whl", hash = "sha256:251874d720ac7f28024a168501f3c7bb15d1802245f6e66de565f18bbb9b5eaa", size = 999550 },
|
| 3670 |
+
{ url = "https://files.pythonhosted.org/packages/6d/33/dc5b54042050d2dda4229c3ce1f862541c99966390b6aa20f54d520d2dc2/sentencepiece-0.2.1-cp310-cp310-win_amd64.whl", hash = "sha256:e52144670738b4b477fade6c2a9b6af71a8d0094514c9853ac9f6fc1fcfabae7", size = 1054613 },
|
| 3671 |
+
{ url = "https://files.pythonhosted.org/packages/fa/19/1ea47f46ff97fe04422b78997da1a37cd632f414aae042d27a9009c5b733/sentencepiece-0.2.1-cp310-cp310-win_arm64.whl", hash = "sha256:9076430ac25dfa7147d9d05751dbc66a04bc1aaac371c07f84952979ea59f0d0", size = 1033884 },
|
| 3672 |
+
{ url = "https://files.pythonhosted.org/packages/d8/15/46afbab00733d81788b64be430ca1b93011bb9388527958e26cc31832de5/sentencepiece-0.2.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:6356d0986b8b8dc351b943150fcd81a1c6e6e4d439772e8584c64230e58ca987", size = 1942560 },
|
| 3673 |
+
{ url = "https://files.pythonhosted.org/packages/fa/79/7c01b8ef98a0567e9d84a4e7a910f8e7074fcbf398a5cd76f93f4b9316f9/sentencepiece-0.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:8f8ba89a3acb3dc1ae90f65ec1894b0b9596fdb98ab003ff38e058f898b39bc7", size = 1325385 },
|
| 3674 |
+
{ url = "https://files.pythonhosted.org/packages/bb/88/2b41e07bd24f33dcf2f18ec3b74247aa4af3526bad8907b8727ea3caba03/sentencepiece-0.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:02593eca45440ef39247cee8c47322a34bdcc1d8ae83ad28ba5a899a2cf8d79a", size = 1253319 },
|
| 3675 |
+
{ url = "https://files.pythonhosted.org/packages/a0/54/38a1af0c6210a3c6f95aa46d23d6640636d020fba7135cd0d9a84ada05a7/sentencepiece-0.2.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a0d15781a171d188b661ae4bde1d998c303f6bd8621498c50c671bd45a4798e", size = 1316162 },
|
| 3676 |
+
{ url = "https://files.pythonhosted.org/packages/ef/66/fb191403ade791ad2c3c1e72fe8413e63781b08cfa3aa4c9dfc536d6e795/sentencepiece-0.2.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4f5a3e0d9f445ed9d66c0fec47d4b23d12cfc858b407a03c194c1b26c2ac2a63", size = 1387785 },
|
| 3677 |
+
{ url = "https://files.pythonhosted.org/packages/a9/2d/3bd9b08e70067b2124518b308db6a84a4f8901cc8a4317e2e4288cdd9b4d/sentencepiece-0.2.1-cp311-cp311-win32.whl", hash = "sha256:6d297a1748d429ba8534eebe5535448d78b8acc32d00a29b49acf28102eeb094", size = 999555 },
|
| 3678 |
+
{ url = "https://files.pythonhosted.org/packages/32/b8/f709977f5fda195ae1ea24f24e7c581163b6f142b1005bc3d0bbfe4d7082/sentencepiece-0.2.1-cp311-cp311-win_amd64.whl", hash = "sha256:82d9ead6591015f009cb1be1cb1c015d5e6f04046dbb8c9588b931e869a29728", size = 1054617 },
|
| 3679 |
+
{ url = "https://files.pythonhosted.org/packages/7a/40/a1fc23be23067da0f703709797b464e8a30a1c78cc8a687120cd58d4d509/sentencepiece-0.2.1-cp311-cp311-win_arm64.whl", hash = "sha256:39f8651bd10974eafb9834ce30d9bcf5b73e1fc798a7f7d2528f9820ca86e119", size = 1033877 },
|
| 3680 |
+
{ url = "https://files.pythonhosted.org/packages/4a/be/32ce495aa1d0e0c323dcb1ba87096037358edee539cac5baf8755a6bd396/sentencepiece-0.2.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:57cae326c8727de58c85977b175af132a7138d84c764635d7e71bbee7e774133", size = 1943152 },
|
| 3681 |
+
{ url = "https://files.pythonhosted.org/packages/88/7e/ff23008899a58678e98c6ff592bf4d368eee5a71af96d0df6b38a039dd4f/sentencepiece-0.2.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:56dd39a3c4d6493db3cdca7e8cc68c6b633f0d4195495cbadfcf5af8a22d05a6", size = 1325651 },
|
| 3682 |
+
{ url = "https://files.pythonhosted.org/packages/19/84/42eb3ce4796777a1b5d3699dfd4dca85113e68b637f194a6c8d786f16a04/sentencepiece-0.2.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d9381351182ff9888cc80e41c632e7e274b106f450de33d67a9e8f6043da6f76", size = 1253645 },
|
| 3683 |
+
{ url = "https://files.pythonhosted.org/packages/89/fa/d3d5ebcba3cb9e6d3775a096251860c41a6bc53a1b9461151df83fe93255/sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:99f955df238021bf11f0fc37cdb54fd5e5b5f7fd30ecc3d93fb48b6815437167", size = 1316273 },
|
| 3684 |
+
{ url = "https://files.pythonhosted.org/packages/04/88/14f2f4a2b922d8b39be45bf63d79e6cd3a9b2f248b2fcb98a69b12af12f5/sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0cdfecef430d985f1c2bcbfff3defd1d95dae876fbd0173376012d2d7d24044b", size = 1387881 },
|
| 3685 |
+
{ url = "https://files.pythonhosted.org/packages/fd/b8/903e5ccb77b4ef140605d5d71b4f9e0ad95d456d6184688073ed11712809/sentencepiece-0.2.1-cp312-cp312-win32.whl", hash = "sha256:a483fd29a34c3e34c39ac5556b0a90942bec253d260235729e50976f5dba1068", size = 999540 },
|
| 3686 |
+
{ url = "https://files.pythonhosted.org/packages/2d/81/92df5673c067148c2545b1bfe49adfd775bcc3a169a047f5a0e6575ddaca/sentencepiece-0.2.1-cp312-cp312-win_amd64.whl", hash = "sha256:4cdc7c36234fda305e85c32949c5211faaf8dd886096c7cea289ddc12a2d02de", size = 1054671 },
|
| 3687 |
+
{ url = "https://files.pythonhosted.org/packages/fe/02/c5e3bc518655d714622bec87d83db9cdba1cd0619a4a04e2109751c4f47f/sentencepiece-0.2.1-cp312-cp312-win_arm64.whl", hash = "sha256:daeb5e9e9fcad012324807856113708614d534f596d5008638eb9b40112cd9e4", size = 1033923 },
|
| 3688 |
+
{ url = "https://files.pythonhosted.org/packages/ba/4a/85fbe1706d4d04a7e826b53f327c4b80f849cf1c7b7c5e31a20a97d8f28b/sentencepiece-0.2.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:dcd8161eee7b41aae57ded06272905dbd680a0a04b91edd0f64790c796b2f706", size = 1943150 },
|
| 3689 |
+
{ url = "https://files.pythonhosted.org/packages/c2/83/4cfb393e287509fc2155480b9d184706ef8d9fa8cbf5505d02a5792bf220/sentencepiece-0.2.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:c6c8f42949f419ff8c7e9960dbadcfbc982d7b5efc2f6748210d3dd53a7de062", size = 1325651 },
|
| 3690 |
+
{ url = "https://files.pythonhosted.org/packages/8d/de/5a007fb53b1ab0aafc69d11a5a3dd72a289d5a3e78dcf2c3a3d9b14ffe93/sentencepiece-0.2.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:097f3394e99456e9e4efba1737c3749d7e23563dd1588ce71a3d007f25475fff", size = 1253641 },
|
| 3691 |
+
{ url = "https://files.pythonhosted.org/packages/2c/d2/f552be5928105588f4f4d66ee37dd4c61460d8097e62d0e2e0eec41bc61d/sentencepiece-0.2.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d7b670879c370d350557edabadbad1f6561a9e6968126e6debca4029e5547820", size = 1316271 },
|
| 3692 |
+
{ url = "https://files.pythonhosted.org/packages/96/df/0cfe748ace5485be740fed9476dee7877f109da32ed0d280312c94ec259f/sentencepiece-0.2.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c7f0fd2f2693309e6628aeeb2e2faf6edd221134dfccac3308ca0de01f8dab47", size = 1387882 },
|
| 3693 |
+
{ url = "https://files.pythonhosted.org/packages/ac/dd/f7774d42a881ced8e1739f393ab1e82ece39fc9abd4779e28050c2e975b5/sentencepiece-0.2.1-cp313-cp313-win32.whl", hash = "sha256:92b3816aa2339355fda2c8c4e021a5de92180b00aaccaf5e2808972e77a4b22f", size = 999541 },
|
| 3694 |
+
{ url = "https://files.pythonhosted.org/packages/dd/e9/932b9eae6fd7019548321eee1ab8d5e3b3d1294df9d9a0c9ac517c7b636d/sentencepiece-0.2.1-cp313-cp313-win_amd64.whl", hash = "sha256:10ed3dab2044c47f7a2e7b4969b0c430420cdd45735d78c8f853191fa0e3148b", size = 1054669 },
|
| 3695 |
+
{ url = "https://files.pythonhosted.org/packages/c9/3a/76488a00ea7d6931689cda28726a1447d66bf1a4837943489314593d5596/sentencepiece-0.2.1-cp313-cp313-win_arm64.whl", hash = "sha256:ac650534e2251083c5f75dde4ff28896ce7c8904133dc8fef42780f4d5588fcd", size = 1033922 },
|
| 3696 |
+
{ url = "https://files.pythonhosted.org/packages/4a/b6/08fe2ce819e02ccb0296f4843e3f195764ce9829cbda61b7513f29b95718/sentencepiece-0.2.1-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:8dd4b477a7b069648d19363aad0cab9bad2f4e83b2d179be668efa672500dc94", size = 1946052 },
|
| 3697 |
+
{ url = "https://files.pythonhosted.org/packages/ab/d9/1ea0e740591ff4c6fc2b6eb1d7510d02f3fb885093f19b2f3abd1363b402/sentencepiece-0.2.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:0c0f672da370cc490e4c59d89e12289778310a0e71d176c541e4834759e1ae07", size = 1327408 },
|
| 3698 |
+
{ url = "https://files.pythonhosted.org/packages/99/7e/1fb26e8a21613f6200e1ab88824d5d203714162cf2883248b517deb500b7/sentencepiece-0.2.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ad8493bea8432dae8d6830365352350f3b4144415a1d09c4c8cb8d30cf3b6c3c", size = 1254857 },
|
| 3699 |
+
{ url = "https://files.pythonhosted.org/packages/bc/85/c72fd1f3c7a6010544d6ae07f8ddb38b5e2a7e33bd4318f87266c0bbafbf/sentencepiece-0.2.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b81a24733726e3678d2db63619acc5a8dccd074f7aa7a54ecd5ca33ca6d2d596", size = 1315722 },
|
| 3700 |
+
{ url = "https://files.pythonhosted.org/packages/4a/e8/661e5bd82a8aa641fd6c1020bd0e890ef73230a2b7215ddf9c8cd8e941c2/sentencepiece-0.2.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0a81799d0a68d618e89063fb423c3001a034c893069135ffe51fee439ae474d6", size = 1387452 },
|
| 3701 |
+
{ url = "https://files.pythonhosted.org/packages/99/5e/ae66c361023a470afcbc1fbb8da722c72ea678a2fcd9a18f1a12598c7501/sentencepiece-0.2.1-cp313-cp313t-win32.whl", hash = "sha256:89a3ea015517c42c0341d0d962f3e6aaf2cf10d71b1932d475c44ba48d00aa2b", size = 1002501 },
|
| 3702 |
+
{ url = "https://files.pythonhosted.org/packages/c1/03/d332828c4ff764e16c1b56c2c8f9a33488bbe796b53fb6b9c4205ddbf167/sentencepiece-0.2.1-cp313-cp313t-win_amd64.whl", hash = "sha256:33f068c9382dc2e7c228eedfd8163b52baa86bb92f50d0488bf2b7da7032e484", size = 1057555 },
|
| 3703 |
+
{ url = "https://files.pythonhosted.org/packages/88/14/5aee0bf0864df9bd82bd59e7711362908e4935e3f9cdc1f57246b5d5c9b9/sentencepiece-0.2.1-cp313-cp313t-win_arm64.whl", hash = "sha256:b3616ad246f360e52c85781e47682d31abfb6554c779e42b65333d4b5f44ecc0", size = 1036042 },
|
| 3704 |
+
{ url = "https://files.pythonhosted.org/packages/24/9c/89eb8b2052f720a612478baf11c8227dcf1dc28cd4ea4c0c19506b5af2a2/sentencepiece-0.2.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:5d0350b686c320068702116276cfb26c066dc7e65cfef173980b11bb4d606719", size = 1943147 },
|
| 3705 |
+
{ url = "https://files.pythonhosted.org/packages/82/0b/a1432bc87f97c2ace36386ca23e8bd3b91fb40581b5e6148d24b24186419/sentencepiece-0.2.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:c7f54a31cde6fa5cb030370566f68152a742f433f8d2be458463d06c208aef33", size = 1325624 },
{ url = "https://files.pythonhosted.org/packages/ea/99/bbe054ebb5a5039457c590e0a4156ed073fb0fe9ce4f7523404dd5b37463/sentencepiece-0.2.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c83b85ab2d6576607f31df77ff86f28182be4a8de6d175d2c33ca609925f5da1", size = 1253670 },
{ url = "https://files.pythonhosted.org/packages/19/ad/d5c7075f701bd97971d7c2ac2904f227566f51ef0838dfbdfdccb58cd212/sentencepiece-0.2.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1855f57db07b51fb51ed6c9c452f570624d2b169b36f0f79ef71a6e6c618cd8b", size = 1316247 },
{ url = "https://files.pythonhosted.org/packages/fb/03/35fbe5f3d9a7435eebd0b473e09584bd3cc354ce118b960445b060d33781/sentencepiece-0.2.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:01e6912125cb45d3792f530a4d38f8e21bf884d6b4d4ade1b2de5cf7a8d2a52b", size = 1387894 },
{ url = "https://files.pythonhosted.org/packages/dc/aa/956ef729aafb6c8f9c443104c9636489093bb5c61d6b90fc27aa1a865574/sentencepiece-0.2.1-cp314-cp314-win32.whl", hash = "sha256:c415c9de1447e0a74ae3fdb2e52f967cb544113a3a5ce3a194df185cbc1f962f", size = 1096698 },
{ url = "https://files.pythonhosted.org/packages/b8/cb/fe400d8836952cc535c81a0ce47dc6875160e5fedb71d2d9ff0e9894c2a6/sentencepiece-0.2.1-cp314-cp314-win_amd64.whl", hash = "sha256:881b2e44b14fc19feade3cbed314be37de639fc415375cefaa5bc81a4be137fd", size = 1155115 },
{ url = "https://files.pythonhosted.org/packages/32/89/047921cf70f36c7b6b6390876b2399b3633ab73b8d0cb857e5a964238941/sentencepiece-0.2.1-cp314-cp314-win_arm64.whl", hash = "sha256:2005242a16d2dc3ac5fe18aa7667549134d37854823df4c4db244752453b78a8", size = 1133890 },
{ url = "https://files.pythonhosted.org/packages/a1/11/5b414b9fae6255b5fb1e22e2ed3dc3a72d3a694e5703910e640ac78346bb/sentencepiece-0.2.1-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:a19adcec27c524cb7069a1c741060add95f942d1cbf7ad0d104dffa0a7d28a2b", size = 1946081 },
{ url = "https://files.pythonhosted.org/packages/77/eb/7a5682bb25824db8545f8e5662e7f3e32d72a508fdce086029d89695106b/sentencepiece-0.2.1-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:e37e4b4c4a11662b5db521def4e44d4d30ae69a1743241412a93ae40fdcab4bb", size = 1327406 },
{ url = "https://files.pythonhosted.org/packages/03/b0/811dae8fb9f2784e138785d481469788f2e0d0c109c5737372454415f55f/sentencepiece-0.2.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:477c81505db072b3ab627e7eab972ea1025331bd3a92bacbf798df2b75ea86ec", size = 1254846 },
{ url = "https://files.pythonhosted.org/packages/ef/23/195b2e7ec85ebb6a547969f60b723c7aca5a75800ece6cc3f41da872d14e/sentencepiece-0.2.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:010f025a544ef770bb395091d57cb94deb9652d8972e0d09f71d85d5a0816c8c", size = 1315721 },
{ url = "https://files.pythonhosted.org/packages/7e/aa/553dbe4178b5f23eb28e59393dddd64186178b56b81d9b8d5c3ff1c28395/sentencepiece-0.2.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:733e59ff1794d26db706cd41fc2d7ca5f6c64a820709cb801dc0ea31780d64ab", size = 1387458 },
{ url = "https://files.pythonhosted.org/packages/66/7c/08ff0012507297a4dd74a5420fdc0eb9e3e80f4e88cab1538d7f28db303d/sentencepiece-0.2.1-cp314-cp314t-win32.whl", hash = "sha256:d3233770f78e637dc8b1fda2cd7c3b99ec77e7505041934188a4e7fe751de3b0", size = 1099765 },
{ url = "https://files.pythonhosted.org/packages/91/d5/2a69e1ce15881beb9ddfc7e3f998322f5cedcd5e4d244cb74dade9441663/sentencepiece-0.2.1-cp314-cp314t-win_amd64.whl", hash = "sha256:5e4366c97b68218fd30ea72d70c525e6e78a6c0a88650f57ac4c43c63b234a9d", size = 1157807 },
{ url = "https://files.pythonhosted.org/packages/f3/16/54f611fcfc2d1c46cbe3ec4169780b2cfa7cf63708ef2b71611136db7513/sentencepiece-0.2.1-cp314-cp314t-win_arm64.whl", hash = "sha256:105e36e75cbac1292642045458e8da677b2342dcd33df503e640f0b457cb6751", size = 1136264 },
]
[[package]]
name = "setuptools"
version = "81.0.0"