---
base_model: THUDM/GLM-4-32B-0414
library_name: peft
---

Checkpoint at 40% of an epoch (~40M tokens seen). Output is interesting but inconsistent, which makes this a potential target for stabilizing RL. Saved in case later checkpoints get worse.
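
Since the metadata lists `peft` as the library, the checkpoint is presumably an adapter applied on top of the base model. A minimal loading sketch under that assumption follows. `ADAPTER_PATH` is a hypothetical placeholder (this card does not state the repo id or local path), and the dtype and device placement are illustrative choices, not settings taken from the training run.

```python
BASE_MODEL_ID = "THUDM/GLM-4-32B-0414"  # taken from this card's metadata
ADAPTER_PATH = "path/to/this-adapter"   # hypothetical placeholder, replace with the real location


def load_checkpoint(adapter_path: str = ADAPTER_PATH, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the adapter weights on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference
        device_map="auto",           # assumption: requires the accelerate package
    )
    # Wrap the base model with the adapter stored at adapter_path.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model
```

Calling `load_checkpoint()` downloads the full 32B base model, so it needs substantial disk space and GPU memory.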