---
base_model:
- ConicCat/GLM-4.1V-Text-9B-Base
datasets:
- ConicCat/TuluAmoral50K-MIG
---

Preview / testing tune of GLM4-9B for data mix and hyperparameters. I'm hoping to find a roughly optimal SFT setup for GLM 9B base before testing KTO and transferring the setup over to the new Arcee GLM4-32B-Base pretrain.

This is a hybrid thinking model that defaults to no thinking, but can think if prompted to and prefilled with the `` tags.

| dataset | version | metric | mode | Marvin-9B_hf-vllm |
|----- | ----- | ----- | ----- | -----|
| GPQA_diamond | 5aeece | accuracy | gen | 33.33 |
| ARC-c | 1e0de5 | accuracy | gen | 82.71 |

| dataset | version | metric | mode | GLM-4-9B-0414_hf-vllm |
|----- | ----- | ----- | ----- | -----|
| GPQA_diamond | 5aeece | accuracy | gen | 34.34 |

Roughly equivalent on GPQA to the official checkpoint without RL, which is pretty nice too.
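The prefill trick above can be sketched as follows. This is a hypothetical illustration, not the model's actual chat template: the turn markers and the `<think>` tag name are assumptions (check the tokenizer's chat template for the real tokens), and in practice you would use `tokenizer.apply_chat_template` rather than string concatenation.

```python
# Hedged sketch: forcing thinking mode by prefilling the assistant turn.
# All special tokens below are illustrative placeholders, NOT confirmed
# to match this model's chat template.

def build_prompt(user_msg: str, force_thinking: bool = False) -> str:
    """Assemble a minimal chat-style prompt string.

    With force_thinking=True, the assistant turn is prefilled with an
    opening reasoning tag so generation continues inside the thinking
    block instead of answering directly (the model's default).
    """
    prompt = f"<|user|>\n{user_msg}\n<|assistant|>\n"
    if force_thinking:
        # Prefill so the model's continuation starts inside the tag.
        prompt += "<think>\n"
    return prompt

# Default: no thinking block is opened.
plain = build_prompt("What is 17 * 23?")
# Forced: the prompt ends with the (assumed) opening thinking tag.
thinking = build_prompt("What is 17 * 23?", force_thinking=True)
```

The prefilled string is then passed to the generation endpoint as-is (e.g. a raw completion call in vLLM), so the model treats the open tag as the start of its own response.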