qtzx06 committed on
Commit 8da9024 · 1 Parent(s): 0b5e8b0

docs: log Qwen 3.5 9B inference test on H100 (reward=0.25)

  1. docs/process.md +11 -0
docs/process.md CHANGED
@@ -70,3 +70,14 @@ Logging rules:
  - Rewrote `train/minimal_trl_openenv.py` to include both `--mode handcrafted` (quick demo) and `--mode train` (TRL GRPO with Qwen2.5-Coder-0.5B).
  - Verified end-to-end: server starts, `/health` returns OK, handcrafted rollout completes with reward=0.125.
  - Next: make repo public, deploy to HF Spaces, test training script in Colab.
+
+ ## 2026-03-07 18:30 PST
+
+ - Made repo public via `gh repo edit --visibility public`.
+ - Set up Northflank H100 (80GB HBM3) instance with PyTorch 2.8.0 + CUDA 12.6.
+ - Installed project and all deps on the H100 pod.
+ - Ran Qwen 3.5 9B (`Qwen/Qwen3.5-9B`) inference test against live Zero960 env on H100.
+ - Model downloaded 19.3GB weights at ~3.3GB/s, loaded in ~7s on H100.
+ - Result: model ran 2-step episode (run_static_eval → finish), got reward=0.25.
+ - Model reasons coherently about the Chess960 task but doesn't attempt code edits yet (expected pre-training baseline).
+ - Next: run GRPO training loop to teach the model to edit eval.py and improve match scores.
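
The download and memory figures in the new log entry can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It assumes decimal gigabytes (1 GB = 1e9 bytes), reads the "9B" in the model name as a nominal 9e9 parameters, and assumes the weights take the same space on the GPU as on disk. The helper names `download_seconds` and `bytes_per_param` are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope check of the figures in the log entry above.
# Assumptions (not from the log): decimal gigabytes (1 GB = 1e9 bytes),
# a nominal 9e9 parameter count read off the "9B" in the model name, and
# weights occupying the same space on the GPU as on disk.

WEIGHTS_GB = 19.3       # logged download size
RATE_GB_PER_S = 3.3     # logged approximate download rate
GPU_MEMORY_GB = 80.0    # logged H100 memory (80GB HBM3)
NOMINAL_PARAMS = 9e9    # assumption: "9B" taken as exactly 9e9 parameters


def download_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Time to transfer size_gb at a constant rate of rate_gb_per_s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


def bytes_per_param(size_gb: float, n_params: float) -> float:
    """Average number of bytes stored per parameter in the checkpoint."""
    if n_params <= 0:
        raise ValueError("parameter count must be positive")
    return size_gb * 1e9 / n_params


if __name__ == "__main__":
    secs = download_seconds(WEIGHTS_GB, RATE_GB_PER_S)
    bpp = bytes_per_param(WEIGHTS_GB, NOMINAL_PARAMS)
    headroom = GPU_MEMORY_GB - WEIGHTS_GB
    print(f"download time: ~{secs:.1f} s")
    print(f"bytes per parameter: ~{bpp:.2f}")
    print(f"GPU memory not used by weights: ~{headroom:.1f} GB")
```

Under these assumptions the download takes a little under six seconds. The checkpoint averages slightly more than two bytes per parameter, which is consistent with 16-bit weights if the true parameter count is somewhat above the nominal 9e9. The log does not state the dtype, so that reading is an inference rather than a confirmed detail.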