| base_model: THUDM/GLM-4-32B-0414 | |
| library_name: peft | |
| Checkpoint at 40% of an epoch (~40M tokens seen). It produces some interesting output, but inconsistently, which makes it a potential target for stabilizing RL. Saved in case later checkpoints get worse. |