LES-wrapper: Learning Efficiency Score Evaluation
Overview
The LES-wrapper automates the evaluation of model trainability across multiple training checkpoints. It runs inference on PRS (Positive Reference Set) and RRS (Random Reference Set) datasets at each checkpoint, computes ROC metrics, and derives integrated learning efficiency scores.
What is LES?
LES (Learning Efficiency Score) is defined as the area under the metric-vs-iteration curve. Unlike metrics that measure only final performance, LES captures the entire learning trajectory:
- LES-AUC: Area under the AUC trajectory curve
- LES-F1: Area under the Best-F1 trajectory curve
- LES-Threshold: Area under the optimal-threshold trajectory curve
Higher LES values indicate faster learning, better overall performance across training, and more efficient use of training iterations.
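As a rough illustration of the definition above, the area under a metric trajectory can be approximated with the trapezoidal rule. This is a minimal sketch, not the wrapper's actual code: the helper name `les_trapezoid`, the choice of the trapezoidal rule, and the absence of any normalization by iteration range are all assumptions, and the trajectories are made-up numbers.

```python
def les_trapezoid(iterations, values):
    """Area under a metric-vs-iteration curve (trapezoidal rule).

    Hypothetical helper: the real LES-wrapper may integrate or
    normalize differently.
    """
    if len(iterations) != len(values):
        raise ValueError("iterations and values must have the same length")
    if len(iterations) < 2:
        raise ValueError("need at least two checkpoints to integrate")
    # Sort by iteration so checkpoints can be passed in any order.
    points = sorted(zip(iterations, values))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Width of the interval times the mean metric value on it.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Two invented AUC trajectories that end at the same final value.
iters = [1000, 2000, 3000]
fast_learner = [0.80, 0.90, 0.90]
slow_learner = [0.50, 0.70, 0.90]
print(les_trapezoid(iters, fast_learner))
print(les_trapezoid(iters, slow_learner))
```

Both trajectories finish at the same AUC, but the fast learner accumulates the larger area (about 1750 versus 1400 in metric-times-iteration units), which is the behavior a final-performance metric cannot distinguish.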
Workflow
For each checkpoint the wrapper:
- Runs inference on PRS and RRS prompt files
- Extracts softmax probabilities for the positive class
- Combines probabilities into a single file for ROC analysis
- Computes AUC, Best-F1, and optimal threshold
- Generates a color-coded ROC curve plot
- Aggregates results into a summary table
- Plots metric trajectories across checkpoints
- Computes LES values for each metric
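The per-checkpoint metric step can be sketched as follows. The wrapper presumably relies on scikit-learn for this (it is listed under Installation), but its exact calls are not shown here, so this is a dependency-free approximation. The function names, the rule that a probability at or above the threshold counts as a positive call, and the example probabilities are all assumptions.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the chance that a PRS score outranks an RRS score (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def best_f1(pos_scores, neg_scores):
    """Try every observed score as a threshold; return (best F1, threshold)."""
    best = (0.0, None)
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(1 for p in pos_scores if p >= thr)  # PRS called positive
        fp = sum(1 for n in neg_scores if n >= thr)  # RRS called positive
        fn = len(pos_scores) - tp                    # PRS called negative
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best[0]:
            best = (f1, thr)
    return best


# Invented positive-class probabilities for one checkpoint.
prs = [0.9, 0.8, 0.7, 0.4]   # Positive Reference Set
rrs = [0.6, 0.3, 0.2, 0.1]   # Random Reference Set
print(roc_auc(prs, rrs))
print(best_f1(prs, rrs))
```

With these numbers, 15 of the 16 PRS/RRS pairs are ranked correctly, and the F1-optimal threshold is 0.4, where all four PRS items and one RRS item are called positive.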
Installation
conda activate gpt
pip install scikit-learn matplotlib numpy
Basic Usage
python LES-wrapper.py \
--checkpoint_dir out_ppiGPLM_MED4 \
--prs_file MED4_Int_100pairs_prompts.txt \
--rrs_file MED4_100_RND_prompts.txt \
--output_dir LES_results_MED4 \
--vanilla
The --vanilla flag is required when evaluating standard ppiGPLM checkpoints
(i.e., checkpoints trained with train_.py rather than a HOPE-variant trainer).
Common Patterns
Selecting Specific Checkpoints
# Only checkpoints at iterations 1000, 2000, and 5000
python LES-wrapper.py \
--checkpoint_dir out_ppiGPLM_MED4 \
--prs_file prs.txt \
--rrs_file rrs.txt \
--output_dir results \
--checkpoint_pattern "ckpt_[125]000.pt" \
--vanilla
# Every 5000 iterations
python LES-wrapper.py \
--checkpoint_dir out_ppiGPLM_MED4 \
--prs_file prs.txt \
--rrs_file rrs.txt \
--output_dir results \
--checkpoint_pattern "ckpt_*[05]000.pt" \
--vanilla
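Before launching a long run, it can help to check which checkpoint files a pattern will actually select. The sketch below assumes the wrapper applies ordinary shell-style glob matching to filenames in --checkpoint_dir; `select_checkpoints` and the file list are hypothetical and only illustrate the matching rules.

```python
import re
from fnmatch import fnmatchcase


def select_checkpoints(filenames, pattern):
    """Return filenames matching a glob pattern, ordered by iteration number."""
    def iteration(name):
        match = re.search(r"(\d+)", name)
        return int(match.group(1)) if match else -1

    return sorted((f for f in filenames if fnmatchcase(f, pattern)), key=iteration)


# Hypothetical contents of a checkpoint directory.
names = ["ckpt_500.pt", "ckpt_1000.pt", "ckpt_2000.pt",
         "ckpt_5000.pt", "ckpt_10000.pt", "ckpt.pt"]

print(select_checkpoints(names, "ckpt_[125]000.pt"))
print(select_checkpoints(names, "ckpt_*000.pt"))
print(select_checkpoints(names, "ckpt_*[05]000.pt"))
```

Note that `ckpt_*000.pt` selects every checkpoint whose iteration number ends in 000 (1000, 2000, 5000, 10000 here), whereas `ckpt_*[05]000.pt` narrows the selection to multiples of 5000.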
Skipping Inference (Re-computing Metrics Only)
If you have already run inference and just want to recompute metrics or plots:
python LES-wrapper.py \
--checkpoint_dir out_ppiGPLM_MED4 \
--prs_file MED4_Int_100pairs_prompts.txt \
--rrs_file MED4_100_RND_prompts.txt \
--output_dir LES_results_MED4 \
--skip_inference \
--vanilla
Plot Customization
Use --no_plots to skip trajectory figure generation when you only need the
summary CSV.
Command-Line Arguments (Vanilla Flags)
| Argument | Default | Description |
|---|---|---|
| --checkpoint_dir | (required) | Directory containing checkpoint files (ckpt_*.pt) |
| --prs_file | (required) | Path to Positive Reference Set prompts file |
| --rrs_file | (required) | Path to Random Reference Set prompts file |
| --output_dir | LES_results | Directory for all output files |
| --checkpoint_pattern | ckpt_*.pt | Glob pattern to select checkpoints |
| --include_final | False | Also evaluate ckpt.pt (the final checkpoint) |
| --no_plots | False | Skip generating trajectory plots |
| --skip_inference | False | Skip inference; reuse existing probability files |
| --vanilla | False | Use standard GPT checkpoint format (required for ppiGPLM) |
Output Structure
LES_results/
├── ckpt_1000/
│   ├── PRS_iter1000_probabilities.csv
│   ├── PRS_iter1000_classifications.txt
│   ├── RRS_iter1000_probabilities.csv
│   ├── RRS_iter1000_classifications.txt
│   ├── combined_probabilities_iter1000.csv
│   ├── ROC_iter1000.png
│   └── inference_log.md
├── ckpt_2000/ ...
├── trajectory_AUC.png
├── trajectory_F1.png
├── trajectory_Threshold.png
├── trajectory_combined.png
├── summary_table.csv
└── manifest.json
summary_table.csv contains per-checkpoint metrics plus a final row with the
integrated LES values. manifest.json records complete run metadata.
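When reusing an existing output directory (for example with --skip_inference), a quick check that each checkpoint folder holds its probability files can save a failed run. The sketch below uses only the file names shown in the layout above. Which files the wrapper actually requires when inference is skipped is an assumption, and `missing_outputs` is a hypothetical helper, not part of the script.

```python
import re
import tempfile
from pathlib import Path

# File names taken from the output layout; {it} is the iteration number.
EXPECTED_TEMPLATES = [
    "PRS_iter{it}_probabilities.csv",
    "RRS_iter{it}_probabilities.csv",
    "combined_probabilities_iter{it}.csv",
]


def missing_outputs(output_dir):
    """Map each ckpt_<iter> directory to the expected files it lacks."""
    missing = {}
    for ckpt_dir in sorted(Path(output_dir).glob("ckpt_*")):
        match = re.fullmatch(r"ckpt_(\d+)", ckpt_dir.name)
        if not (ckpt_dir.is_dir() and match):
            continue
        expected = [t.format(it=match.group(1)) for t in EXPECTED_TEMPLATES]
        absent = [name for name in expected if not (ckpt_dir / name).exists()]
        if absent:
            missing[ckpt_dir.name] = absent
    return missing


# Demo on a throwaway directory with one incomplete checkpoint folder.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "ckpt_1000"
    ckpt.mkdir()
    (ckpt / "PRS_iter1000_probabilities.csv").touch()
    (ckpt / "RRS_iter1000_probabilities.csv").touch()
    print(missing_outputs(tmp))
```

An empty dictionary means every checkpoint folder found contains the three probability files; otherwise the result lists what is absent per folder.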
Appendix: HOPE/Titan Checkpoint Support (not used for vanilla ppiGPLM)
The script also supports checkpoints from HOPE/Titan-variant trainers. These flags
are no-ops when --vanilla is passed, so they do not affect standard ppiGPLM runs.
| Flag | Description |
|---|---|
| --use_titan_in_forward | Override Titan memory-in-forward flag (-1 = use checkpoint value) |
| --enable_surprise_updates | Enable Titan surprise-based memory updates (0/1) |
| --surprise_update_in_eval | Allow memory updates during evaluation (0/1) |
| --adapt_mode | Prefix-adaptation mode: none or prefix |
| --adapt_steps | Number of adaptation steps or teaching epochs |
| --memory_state_in | Path to a saved memory-only state file |
| --teach_file | CSV of supervised teaching pairs for pre-evaluation conditioning |
| --teach_delim | Delimiter for the teaching CSV (default: \|) |
| --teach_has_header | Whether the teaching CSV has a header row (0/1) |
| --teach_reset_policy | Memory reset policy during teaching: pair, file, or none |
| --teach_shuffle | Shuffle teaching examples each epoch (0/1) |
| --teach_max_rows | Limit teaching rows; 0 = use all |
These flags matter only for the HOPE project, which may later get its own public repository.