Gemma-3-1B-IT — Knights-and-Knaves SFT

Paper Link: https://arxiv.org/abs/2605.28814

A cold-start supervised fine-tuning (SFT) checkpoint of google/gemma-3-1b-it, trained on the Knights-and-Knaves (K&K) logic-puzzle dataset.
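
In a K&K puzzle, knights always tell the truth and knaves always lie, and the goal is to label every speaker. The minimal brute-force solver below makes the task concrete. The two-person puzzle, the predicate-based encoding, and the `solve` helper are illustrative assumptions. They are not taken from the training data or from the dataset's own tooling.

```python
from itertools import product

def solve(people, statements):
    """Return every assignment consistent with the K&K rules.

    Each assignment maps a name to True (knight) or False (knave).
    `statements` is a list of (speaker, predicate) pairs; this encoding
    is a hypothetical illustration, not the dataset's format.
    """
    solutions = []
    for values in product([True, False], repeat=len(people)):
        world = dict(zip(people, values))
        # A knight's statement must be true and a knave's must be false,
        # so each statement's truth value must equal "speaker is a knight".
        if all(stmt(world) == world[speaker] for speaker, stmt in statements):
            solutions.append(world)
    return solutions

people = ["A", "B"]
statements = [
    ("A", lambda w: not w["B"]),         # A: "B is a knave."
    ("B", lambda w: w["A"] and w["B"]),  # B: "A and I are both knights."
]
print(solve(people, statements))  # [{'A': True, 'B': False}]
```

Enumerating all 2^n assignments is only practical for small puzzles, but it gives an unambiguous ground truth to check model answers against.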

For the model post-trained on top of this SFT checkpoint, see Xkev/gemma-3-1b-it-kk-bes.

Training

  • Base model: google/gemma-3-1b-it
  • Dataset: K&K puzzles (1k subset), formatted as chat conversations that pair a reasoning trace with a JSON answer
  • Framework: verl sft_trainer
  • Hyperparameters: lr=1e-5, weight_decay=0.01, lr_warmup_ratio=0.1, cosine schedule, epochs=3, dtype=bf16
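
The learning-rate settings above describe a warmup phase covering the first 10% of steps, followed by a cosine decay. The sketch below assumes linear warmup from 0 and cosine decay to 0. verl's scheduler may differ in details, such as a minimum-LR floor. The card does not state the batch size, so the `total_steps=100` used in the example is hypothetical.

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical run of 100 optimizer steps.
for s in (0, 5, 10, 55, 100):
    print(s, lr_at_step(s, total_steps=100))
```

With 100 steps, the LR peaks at 1e-5 at step 10, passes roughly 5e-6 at the midpoint of the decay (step 55), and reaches 0 at the final step.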

Intended use

Research on logical reasoning and post-training. Not intended for general dialog or production.
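
The training data pairs a reasoning trace with a JSON answer, so research evaluation typically needs to pull that answer out of a response and compare it with the ground truth. The helpers below are a minimal sketch that rests on two assumptions. First, the answer is the last JSON object in the response. Second, it maps each character name to "knight" or "knave". The schema and the helper names (`extract_json_answer`, `is_correct`) are hypothetical and are not the dataset's documented format.

```python
import json

def extract_json_answer(response):
    """Return the last top-level JSON object in `response`, or None."""
    decoder = json.JSONDecoder()
    last, pos = None, 0
    while True:
        start = response.find("{", pos)
        if start == -1:
            return last
        try:
            # raw_decode parses one JSON value starting at `start`
            # and returns it together with the index just past it.
            obj, end = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        last, pos = obj, end  # skip past this object so nested braces are not re-parsed

def is_correct(response, gold):
    """1.0 if the extracted answer exactly equals `gold`, else 0.0."""
    return float(extract_json_answer(response) == gold)

gold = {"A": "knight", "B": "knave"}  # hypothetical answer schema
response = 'If A lies, B tells the truth, a contradiction. So:\n{"A": "knight", "B": "knave"}'
print(is_correct(response, gold))  # 1.0
```

Exact-match scoring is deliberately strict. A response whose JSON is malformed or missing scores 0.0.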

License

MIT. The base model, google/gemma-3-1b-it, is governed by Google's Gemma Terms of Use, which also apply to this model.
