Qwen3-4B — DASD Fine-tune (GSM8K)

LoRA adapter trained with the DASD method (Distribution-Aligned Sequence Distillation) on the GSM8K dataset.

Method

The model was distilled from GPT-oss-120B (teacher) into Qwen3-4B (student) using:

  • Divergence-Aware Sampling (DAS): keeps examples where the teacher is confident but the student is not
  • Temperature-Scheduled Learning: Stage 1 at τ=0.3 (stable), Stage 2 at τ=0.9 (diverse)
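The effect of the temperature schedule on the teacher's sampling distribution can be sketched as follows. This is illustrative only; `temperature_probs` is a hypothetical helper, not part of the actual training pipeline:

```python
import math

def temperature_probs(logits, tau):
    """Softmax over logits scaled by temperature tau (lower tau = sharper)."""
    scaled = [l / tau for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Stage 1 (tau = 0.3) concentrates probability mass on the top logit,
# giving stable, near-greedy teacher traces; Stage 2 (tau = 0.9) spreads
# the mass out, yielding more diverse samples.
stage1 = temperature_probs([2.0, 1.0, 0.5], 0.3)
stage2 = temperature_probs([2.0, 1.0, 0.5], 0.9)
```

The top-token probability under Stage 1 is strictly higher than under Stage 2, which is the intended trade-off: stability first, diversity later.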

Training was performed with LLaMA-Factory.
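A Stage 1 LoRA SFT run in LLaMA-Factory might look like the config sketch below. This is a hedged illustration: the dataset name, template, batch sizes, and output path are assumptions; only the base model, learning rate, and epoch count come from this card.

```yaml
# Illustrative Stage 1 config — dataset name and paths are assumptions
model_name_or_path: unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit
stage: sft
do_train: true
finetuning_type: lora
dataset: gsm8k_dasd_stage1        # hypothetical dataset registered in dataset_info.json
template: qwen
cutoff_len: 2048
learning_rate: 2.0e-4             # Stage 1 value from the table below
num_train_epochs: 3
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
output_dir: saves/qwen3-4b-dasd-stage1
```

Stage 2 would reuse the same shape with the Stage 2 learning rate (1e-4) and the τ=0.9 dataset.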

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit"
adapter = "While4ent/qwen3-4b-dasd-gsm8k"

# 4-bit NF4 quantization, matching the pre-quantized base checkpoint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized base model, then attach the LoRA adapter on top
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Build the prompt with the model's chat template
prompt = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Training data

  • Source: GSM8K, 2000 examples (1000 per stage)
  • Teacher: GPT-oss-120B via the Infomaniak API
  • DAS filter: examples kept if P_teacher > 0.6 and divergence > 0.15
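The DAS filter above reduces to a simple predicate over per-example teacher confidence and teacher-student divergence. The helper below is a minimal sketch (`das_keep` is a hypothetical name; the thresholds are the ones stated in this card):

```python
def das_keep(p_teacher, divergence, p_min=0.6, div_min=0.15):
    """Keep an example when the teacher is confident (p_teacher > p_min)
    and the student disagrees enough to learn from it (divergence > div_min)."""
    return p_teacher > p_min and divergence > div_min

# Toy examples: only the first passes both thresholds
examples = [
    {"p_teacher": 0.9, "divergence": 0.40},  # kept
    {"p_teacher": 0.9, "divergence": 0.05},  # dropped: student already agrees
    {"p_teacher": 0.4, "divergence": 0.50},  # dropped: teacher not confident
]
kept = [e for e in examples if das_keep(e["p_teacher"], e["divergence"])]
```

The second condition is what makes the sampling "divergence-aware": examples the student already handles well are discarded rather than re-trained on.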

Training parameters

Parameter           Stage 1   Stage 2
Learning rate       2e-4      1e-4
Epochs              3         3
Data temperature    0.3       0.9
Final loss          0.057     0.079

Example output

Question: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Answer:

We are told that Natalia sold clips to 48 friends in April. In May, she sold half as many as in April: 48 ÷ 2 = 24. Total = 48 + 24 = 72.

Course project (TP4, M2 AI), based on "Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning" (Alibaba, 2026).
