Gemma-3-1B-IT: Countdown Task LoRA (Rationale Distillation)

This repository contains LoRA adapters for google/gemma-3-1b-it, fine-tuned to solve the Countdown math puzzle as part of a knowledge distillation competition.

📌 Model Overview

Base Model: google/gemma-3-1b-it
Teacher Model: Qwen/Qwen2.5-Math-7B-Instruct
Training Method: Teacher-Forced Rationale Distillation (SFT) + Direct Preference Optimization (DPO).
Task: Countdown (reach a target number using provided numbers and basic arithmetic).
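To make the task concrete, here is a minimal sketch of a solution checker. The function name `check_countdown` is illustrative and not part of this repository. It assumes that each provided number may be used at most once and that only `+`, `-`, `*`, `/` and parentheses are allowed; the competition's exact rules may differ.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic operations are accepted (assumption about the rules).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_countdown(expression, nums, target):
    """Return True if `expression` reaches `target` using only numbers from `nums`."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Reject expressions that use a number more often than it was provided.
    if Counter(used) - Counter(nums):
        return False
    return value == target


print(check_countdown("20 * 50 + 10 - 2 - 4 / 4", [20, 50, 2, 4, 10, 4], 1007))
```

Exact `Fraction` arithmetic avoids floating-point surprises when a solution involves division.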

🧠 Distillation Approach

Instead of learning from raw equations, the student was trained to reproduce the teacher model's reasoning steps in a strict telegraphic style. This strips conversational filler from the rationales, so the 1B student does not spend attention capacity on it. DPO was then applied to penalize arithmetic hallucinations and logic shortcuts found in the student's own generations.
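As a rough illustration of the DPO stage, the sketch below flags rationales that contain a wrong arithmetic step and pairs them with clean ones. This is not the author's pipeline: the step format (`a op b = c` on each line), the helper names, and the pairing strategy are all assumptions. The `prompt` / `chosen` / `rejected` keys follow the common preference-dataset layout used by DPO trainers.

```python
import re
from fractions import Fraction

# Hypothetical step format: "<int> <op> <int> = <int>", e.g. "20 * 50 = 1000".
STEP_RE = re.compile(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else None,  # None marks division by zero
}


def has_arithmetic_error(rationale: str) -> bool:
    """Return True if any 'a op b = c' step in the rationale is numerically wrong."""
    for a, op, b, claimed in STEP_RE.findall(rationale):
        result = OPS[op](Fraction(a), Fraction(b))
        # Non-integer claimed results are not parsed in full and get flagged too.
        if result is None or result != Fraction(claimed):
            return True
    return False


def build_preference_pairs(prompt: str, generations: list) -> list:
    """Pair arithmetically clean generations (chosen) with flawed ones (rejected)."""
    clean = [g for g in generations if not has_arithmetic_error(g)]
    flawed = [g for g in generations if has_arithmetic_error(g)]
    return [
        {"prompt": prompt, "chosen": c, "rejected": r}
        for c, r in zip(clean, flawed)
    ]


good = "Step 1: 20 * 50 = 1000\nStep 2: 1000 + 10 = 1010"
bad = "Step 1: 20 * 50 = 1000\nStep 2: 1000 + 10 = 1020"
pairs = build_preference_pairs("Reach 1010 using [20, 50, 10].", [good, bad])
print(len(pairs))
```

A check like this only catches arithmetic slips; detecting logic shortcuts, such as skipping a step or reusing a number, would need additional rules.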

🚀 How to Load and Use

Since these are LoRA weights, load the base Gemma model first and then apply the adapters with peft.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# 1. Load Base Model
base_model_name = "google/gemma-3-1b-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# 2. Load LoRA Adapters
lora_repo = "Ilyayaya/gemma_kontur" 
model = PeftModel.from_pretrained(model, lora_repo)

# 3. Inference Example
nums = [20, 50, 2, 4, 10, 4]
target = 1007

instruction = f"Using the numbers {nums}, create an equation that equals {target}. Show work in <think> and result in <answer>."
prompt = f"<start_of_turn>user\n{instruction}<end_of_turn>\n<start_of_turn>model\n<think>\nTarget: {target}.\nStep 1:"

# Move inputs to wherever device_map="auto" placed the model (may not be "cuda")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Note: the model performs best with Best-of-N sampling; this call draws a single sample
with torch.inference_mode():
    outputs = model.generate(
        **inputs, 
        max_new_tokens=400, 
        temperature=0.8, 
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
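The snippet above draws one sample, while the note in it recommends Best-of-N. A minimal selection sketch could look like the following. The helper names are hypothetical, and `is_valid` stands in for whatever correctness check you use (for example, an exact arithmetic check of the equation against the target). The content of the model's `<answer>` block is assumed to be a single expression.

```python
from typing import Callable, Iterable, Optional


def extract_answer(text: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> block, if any.

    The last opening tag is used because the decoded text also contains the
    prompt, whose instruction mentions "<answer>" without a closing tag.
    """
    _, tag, tail = text.rpartition("<answer>")
    if not tag or "</answer>" not in tail:
        return None
    return tail.split("</answer>", 1)[0].strip()


def best_of_n(candidates: Iterable[str], is_valid: Callable[[str], bool]) -> Optional[str]:
    """Return the first extracted answer that passes `is_valid`, else None."""
    for text in candidates:
        answer = extract_answer(text)
        if answer is not None and is_valid(answer):
            return answer
    return None


samples = [
    "Show result in <answer>.\n<think>Step 1: ...</think>\n<answer>20 * 50 + 4</answer>",
    "Show result in <answer>.\n<think>Step 1: ...</think>\n<answer>20 * 50 + 10 - 2 - 4 / 4</answer>",
]
# Toy validity check for the demo only; replace it with a real verifier.
print(best_of_n(samples, lambda a: a.endswith("4 / 4")))
```

To obtain N candidates in one call, `model.generate` accepts `num_return_sequences`, and `tokenizer.batch_decode` turns the returned sequences into strings that can be passed in as `candidates`.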
