True LLM Learning Evaluation (Pre-RL vs Post-RL)

This folder holds checkpoint-vs-checkpoint evidence comparing two models:

  • pre-RL base model
  • post-RL trained checkpoint

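Both are evaluated with an identical protocol (same seed, step budget, and decoding settings), so differences in the results can be attributed to training rather than to the evaluation setup.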

Required environment variables

  • BASELINE_MODEL_NAME
  • TRAINED_MODEL_PATH (local directory with adapter_config.json)
  • ENV_BASE_URL (CommitmentOS HTTP API)

Optional:

  • HF_TOKEN (needed for gated Hub models; also helps with Hub rate limits)
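
Before starting a long run, it can help to confirm that the required variables are set and that TRAINED_MODEL_PATH really contains adapter_config.json. The following is a minimal sketch of such a pre-flight check, not part of the evaluator itself; the function name check_required_env is hypothetical, and the only facts taken from this README are the variable names and the adapter_config.json requirement.

```python
import os
from pathlib import Path
from typing import Mapping

# Variable names as listed in this README.
REQUIRED_VARS = ("BASELINE_MODEL_NAME", "TRAINED_MODEL_PATH", "ENV_BASE_URL")


def check_required_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems; an empty list means the setup looks fine.

    Hypothetical helper for illustration; the real evaluator may validate differently.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    # The README says TRAINED_MODEL_PATH is a local directory with adapter_config.json.
    trained = env.get("TRAINED_MODEL_PATH")
    if trained and not (Path(trained) / "adapter_config.json").is_file():
        problems.append(f"TRAINED_MODEL_PATH={trained!r} has no adapter_config.json")
    return problems


if __name__ == "__main__":
    for problem in check_required_env():
        print("config problem:", problem)
```

Passing the environment as a mapping keeps the check easy to exercise with a plain dict instead of the real process environment.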

Optional protocol overrides:

  • EVAL_SEED (default: 42)
  • EVAL_MAX_STEPS (default: 12)
  • EVAL_TEMPERATURE (default: 0.0)
  • EVAL_TOP_P (default: 1.0)
  • EVAL_MAX_NEW_TOKENS (default: 256)
  • EVAL_SUCCESS_THRESHOLD (default: 0.6)
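
As an illustration of how these overrides could be resolved against their defaults, here is a small sketch. The EvalProtocol dataclass and load_protocol function are hypothetical names, not the evaluator's actual code; only the variable names and default values come from the list above.

```python
import os
from dataclasses import asdict, dataclass
from typing import Mapping


@dataclass(frozen=True)
class EvalProtocol:
    """Defaults copied from this README; field names are illustrative."""
    seed: int = 42
    max_steps: int = 12
    temperature: float = 0.0
    top_p: float = 1.0
    max_new_tokens: int = 256
    success_threshold: float = 0.6


def load_protocol(env: Mapping[str, str] = os.environ) -> EvalProtocol:
    """Build a protocol from EVAL_* variables, falling back to the defaults."""
    values = {}
    for field_name, default in asdict(EvalProtocol()).items():
        raw = env.get(f"EVAL_{field_name.upper()}")
        # Cast the string to the default's type (int or float); keep the default if unset.
        values[field_name] = type(default)(raw) if raw not in (None, "") else default
    return EvalProtocol(**values)


if __name__ == "__main__":
    print(load_protocol())
```

Resolving the protocol once and reusing the same object for both checkpoints is one way to keep the two evaluations identical.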

Run

cd commitment_os
pip install -e ".[llm-eval]"
python3 evaluation/evaluate_llm_checkpoints.py
python3 evaluation/plot_llm_checkpoints.py

The evaluator prints one line per task ([eval …] task i/n) so long Colab runs do not look frozen.

After Colab

Zip weights + artifacts for download (paths assume /content/commitment_os):

cd /content/commitment_os && zip -r /content/commitment_os_bundle.zip training_output artifacts/evals_llm

Alternatively, copy training_output/ and artifacts/evals_llm/ to Google Drive if the zip is too large to download through the browser.

These bundles are not checked into git, which keeps clones fast and the history small. A ~330MB zip (weights + this folder) is a normal size: publish it as a GitHub Release asset, or upload it to the HF Hub or Google Drive.

Drive (weights + this folder): commitment_os_bundle. After download, you should have artifacts/evals_llm/ (this layout) next to training_output/. See the root README for gdown / TRAINED_MODEL_PATH notes.

Expected outputs

  • llm_eval_protocol.json
  • baseline_llm_eval.json
  • trained_llm_eval.json
  • llm_comparison.csv
  • llm_summary.json
  • llm_case_study_hard_015.md
  • llm_reward_by_task.svg
  • llm_violations_before_after.svg
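
To confirm that a run or a downloaded bundle is complete, the list above can be checked programmatically. This is a minimal sketch: the helper name missing_outputs is hypothetical, and the artifacts/evals_llm path in the usage line is an assumption based on the layout described earlier.

```python
from pathlib import Path

# File names as listed under "Expected outputs".
EXPECTED_OUTPUTS = (
    "llm_eval_protocol.json",
    "baseline_llm_eval.json",
    "trained_llm_eval.json",
    "llm_comparison.csv",
    "llm_summary.json",
    "llm_case_study_hard_015.md",
    "llm_reward_by_task.svg",
    "llm_violations_before_after.svg",
)


def missing_outputs(folder: str) -> list[str]:
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust if the outputs are written elsewhere.
    missing = missing_outputs("artifacts/evals_llm")
    print("all outputs present" if not missing else f"missing: {missing}")
```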