Qwen3-4B — DASD Fine-tune (GSM8K)

LoRA adapter trained with the DASD method (Distribution-Aligned Sequence Distillation) on the GSM8K dataset.

Method

The model was distilled from GPT-oss-120B (teacher) into Qwen3-4B (student) using:

  • Divergence-Aware Sampling (DAS): keeps examples where the teacher is confident but the student is not
  • Temperature-Scheduled Learning: Stage 1 at τ=0.3 (stable), Stage 2 at τ=0.9 (diverse)
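The effect of the temperature schedule on the teacher's sampling distribution can be sketched as follows. This is illustrative only; `temperature_probs` is a hypothetical helper, not part of the actual training pipeline:

```python
import math

def temperature_probs(logits, tau):
    """Softmax over logits scaled by temperature tau (lower tau = sharper)."""
    scaled = [l / tau for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Stage 1 (tau = 0.3) concentrates probability mass on the top logit,
# giving stable, near-greedy teacher traces; Stage 2 (tau = 0.9) spreads
# the mass out, yielding more diverse samples.
stage1 = temperature_probs([2.0, 1.0, 0.5], 0.3)
stage2 = temperature_probs([2.0, 1.0, 0.5], 0.9)
```

The top-token probability under Stage 1 is strictly higher than under Stage 2, which is the intended trade-off: stability first, diversity later.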

Training was performed with LLaMA-Factory.
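A Stage 1 LoRA SFT run in LLaMA-Factory might look like the config sketch below. This is a hedged illustration: the dataset name, template, batch sizes, and output path are assumptions; only the base model, learning rate, and epoch count come from this card.

```yaml
# Illustrative Stage 1 config — dataset name and paths are assumptions
model_name_or_path: unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit
stage: sft
do_train: true
finetuning_type: lora
dataset: gsm8k_dasd_stage1        # hypothetical dataset registered in dataset_info.json
template: qwen
cutoff_len: 2048
learning_rate: 2.0e-4             # Stage 1 value from the table below
num_train_epochs: 3
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
output_dir: saves/qwen3-4b-dasd-stage1
```

Stage 2 would reuse the same shape with the Stage 2 learning rate (1e-4) and the τ=0.9 dataset.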

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit"
adapter = "While4ent/qwen3-4b-dasd-gsm8k"

# 4-bit NF4 quantization, matching the pre-quantized base checkpoint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized base model, then attach the LoRA adapter on top
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Build the prompt with the model's chat template
prompt = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Training data

  • Source: GSM8K, 2000 examples (1000 per stage)
  • Teacher: GPT-oss-120B via the Infomaniak API
  • DAS filter: examples kept if P_teacher > 0.6 and divergence > 0.15
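The DAS filter above reduces to a simple predicate over per-example teacher confidence and teacher-student divergence. The helper below is a minimal sketch (`das_keep` is a hypothetical name; the thresholds are the ones stated in this card):

```python
def das_keep(p_teacher, divergence, p_min=0.6, div_min=0.15):
    """Keep an example when the teacher is confident (p_teacher > p_min)
    and the student disagrees enough to learn from it (divergence > div_min)."""
    return p_teacher > p_min and divergence > div_min

# Toy examples: only the first passes both thresholds
examples = [
    {"p_teacher": 0.9, "divergence": 0.40},  # kept
    {"p_teacher": 0.9, "divergence": 0.05},  # dropped: student already agrees
    {"p_teacher": 0.4, "divergence": 0.50},  # dropped: teacher not confident
]
kept = [e for e in examples if das_keep(e["p_teacher"], e["divergence"])]
```

The second condition is what makes the sampling "divergence-aware": examples the student already handles well are discarded rather than re-trained on.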

Training parameters

Parameter           Stage 1   Stage 2
Learning rate       2e-4      1e-4
Epochs              3         3
Data temperature    0.3       0.9
Final loss          0.057     0.079

Example output

Question: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Answer:

We are told that Natalia sold clips to 48 friends in April. In May, she sold half as many as in April: 48 ÷ 2 = 24. Total = 48 + 24 = 72.

Course project (TP4, M2 AI), based on "Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning" (Alibaba, 2026).
