---
base_model:
- ConicCat/GLM-4.1V-Text-9B-Base
datasets:
- ConicCat/TuluAmoral50K-MIG
---

Preview / testing tune of GLM4-9B for data mix and hyperparameters. I'm hoping to find a roughly optimal SFT setup for GLM 9B base before testing KTO and transferring the setup over to the new Arcee GLM4-32B-Base pretrain.

This is a hybrid thinking model that defaults to no thinking, but can think if prompted to and prefilled with the `` tags.

| dataset | version | metric | mode | Marvin-9B_hf-vllm |
|----- | ----- | ----- | ----- | -----|
| GPQA_diamond | 5aeece | accuracy | gen | 33.33 |
| ARC-c | 1e0de5 | accuracy | gen | 82.71 |

| dataset | version | metric | mode | GLM-4-9B-0414_hf-vllm |
|----- | ----- | ----- | ----- | -----|
| GPQA_diamond | 5aeece | accuracy | gen | 34.34 |

Roughly equivalent on GPQA to the official checkpoint without RL, which is pretty nice too.
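The prefill trick above can be sketched as follows. This is a hypothetical illustration, not the model's actual chat template: the turn markers and the `<think>` tag name are assumptions (check the tokenizer's chat template for the real tokens), and in practice you would use `tokenizer.apply_chat_template` rather than string concatenation.

```python
# Hedged sketch: forcing thinking mode by prefilling the assistant turn.
# All special tokens below are illustrative placeholders, NOT confirmed
# to match this model's chat template.

def build_prompt(user_msg: str, force_thinking: bool = False) -> str:
    """Assemble a minimal chat-style prompt string.

    With force_thinking=True, the assistant turn is prefilled with an
    opening reasoning tag so generation continues inside the thinking
    block instead of answering directly (the model's default).
    """
    prompt = f"<|user|>\n{user_msg}\n<|assistant|>\n"
    if force_thinking:
        # Prefill so the model's continuation starts inside the tag.
        prompt += "<think>\n"
    return prompt

# Default: no thinking block is opened.
plain = build_prompt("What is 17 * 23?")
# Forced: the prompt ends with the (assumed) opening thinking tag.
thinking = build_prompt("What is 17 * 23?", force_thinking=True)
```

The prefilled string is then passed to the generation endpoint as-is (e.g. a raw completion call in vLLM), so the model treats the open tag as the start of its own response.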