Commit bca3e91 by explcre (verified) · 1 parent: d01b5e3

server3_h100_20260502: server3_h100_20260502/README.md

Files changed (1)
  1. server3_h100_20260502/README.md +24 -0
server3_h100_20260502/README.md ADDED
@@ -0,0 +1,24 @@
+ # Phase 8 RL — server3 H100 cluster runs (2026-05-02)
+
+ This subfolder contains Phase 8 RL ablation runs from the server3 H100 cluster,
+ in addition to the lab cluster's runs already present in the repo.
+
+ ## Algos × ckpts
+ - {svgspo, dapo, gspo_v2, grpo} × {ckpt_step000100.pt (peak), ckpt_step000200.pt (final)}
+
+ ## Audit
+ - GSPO includes a length-normalization fix (geometric-mean ratio, Qwen 2025); see GitHub commit
+ `fa2b5ab` on branch `mllm-integrate-server3`.
+ - DAPO is a "Clip-Higher only" ablation (1 of the 4 components in the DAPO paper).
+ - See EXPERIMENTS.md cycles 25-33 for the full audit and cross-cluster comparison.
+
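To make the two audit notes concrete, here is a minimal sketch of a length-normalized (geometric-mean) sequence ratio and an asymmetric "Clip-Higher" clip. It is an illustration under stated assumptions, not the code from commit `fa2b5ab`: the function names are hypothetical, and the default `eps_low=0.2` / `eps_high=0.28` are the values commonly quoted from the DAPO paper, not values confirmed for these runs.

```python
import math


def gspo_sequence_ratio(logp_new, logp_old):
    """Geometric mean of per-token ratios: exp(mean_t(logp_new[t] - logp_old[t])).

    Averaging the log-ratio over tokens keeps the sequence-level ratio on the
    same scale regardless of response length.
    """
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("expected two non-empty log-prob lists of equal length")
    mean_diff = sum(n - o for n, o in zip(logp_new, logp_old)) / len(logp_new)
    return math.exp(mean_diff)


def clip_higher(ratio, eps_low=0.2, eps_high=0.28):
    """Asymmetric clip to [1 - eps_low, 1 + eps_high] (assumed defaults)."""
    return min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)


def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style pessimistic objective using the asymmetric clip range."""
    clipped = clip_higher(ratio, eps_low, eps_high)
    return min(ratio * advantage, clipped * advantage)
```

With `eps_high > eps_low`, a positive-advantage sample can push its ratio further above 1 before the clip removes its gradient, which is the only DAPO component this ablation is described as keeping.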
+ ## Logs
+ - log.jsonl: per-step training metrics (18 columns, including reward channels, ratio,
+ KL, clip_frac).
+ - rollouts.jsonl: per-rollout case-study log (1600 rollouts per algo) with
+ full reward channel breakdown for visualization.
+
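Both logs are JSON Lines files, so they can be read one record per line with the standard library. The sketch below is a hedged example: the two sample rows and the `step` key are hypothetical stand-ins, and `clip_frac` is assumed to be the literal key name because the README lists it among the columns.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def column(records, key):
    """Collect the values stored under `key`, skipping records that lack it."""
    return [r[key] for r in records if key in r]


# Hypothetical two-row sample standing in for log.jsonl (note the blank line).
sample = io.StringIO(
    '{"step": 1, "clip_frac": 0.10}\n'
    "\n"
    '{"step": 2, "clip_frac": 0.30}\n'
)
records = list(read_jsonl(sample))
clip = column(records, "clip_frac")
mean_clip = sum(clip) / len(clip)
```

For a real run, replace the `io.StringIO` sample with `open(path, encoding="utf-8")` on the algorithm's log.jsonl; the key names should be checked against the file itself.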
+ ## Figures
+ - F6_rl_training_curves.pdf: basic 6-panel grid (4 algos overlaid)
+ - F6_rl_rich.pdf / .png: rolling-mean, per-channel, best-of-K, and clip-activity panels
+ (4 rows × 4 cols; addresses the "raw curve looks spiky" question).
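As a rough illustration of how a rolling mean tames a spiky per-step curve, the sketch below computes a trailing-window average. The window size and the exact smoothing used for F6_rl_rich are not stated in the README, so `rolling_mean` and its `window` argument are assumptions for illustration only.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; the first window-1 points average what is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


# Example: a window of 2 over a short hypothetical reward series.
smoothed = rolling_mean([1, 2, 3, 4], window=2)
```

A trailing window keeps each smoothed point dependent only on past steps, so the curve does not leak information from later training steps into earlier ones.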