oscar128372/difl-qwen2.5-3b-math

Model details

  • Base model: Qwen/Qwen2.5-3B-Instruct
  • Params trained: LoRA adapters q_proj, v_proj, gate_proj, up_proj, down_proj
  • Method: QLoRA-style PEFT training in 4-bit (nf4), mixed precision
  • Objective: Causal LM with token-level importance weighting (DIFL)
  • Context during training: up to 512 tokens per sample

About DIFL: tokens are weighted by an “importance field” that blends two signals:

  • Normalized token entropy from the model’s own logits (higher entropy → more weight)
  • A causal position bias that decays with distance (recent tokens receive a bit more weight)

A small smoothing regularizer encourages stable importance across valid positions.
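The blending described above can be sketched as follows. This is a minimal illustration, not the training code: the exact way the two signals and the smoothing term are combined is an assumption, and only the hyperparameter names (`alpha_entropy`, `beta_causal`, `causal_decay`, `lambda_smooth`) are taken from the configuration listed below.

```python
import torch
import torch.nn.functional as F

def difl_importance(logits, alpha_entropy=0.3, beta_causal=0.4, causal_decay=0.95):
    """Sketch of a DIFL-style importance field for one sequence.

    logits: tensor of shape (seq_len, vocab_size).
    Blends normalized per-token entropy with a causal position bias
    that decays with distance from the most recent token.
    """
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(-1)  # (seq_len,)
    entropy = entropy / entropy.max().clamp_min(1e-9)              # normalize to [0, 1]

    seq_len = logits.shape[0]
    # Distance from the last token: recent tokens get more weight.
    distance = torch.arange(seq_len - 1, -1, -1, dtype=torch.float)
    position_bias = causal_decay ** distance

    return alpha_entropy * entropy + beta_causal * position_bias

def smoothing_penalty(weights, lambda_smooth=0.05):
    # Regularizer: penalize abrupt changes in importance between
    # adjacent positions, encouraging a stable field.
    return lambda_smooth * (weights[1:] - weights[:-1]).pow(2).mean()
```

In this sketch the importance weights would multiply the per-token cross-entropy losses before averaging, and the smoothing penalty would be added to the total loss.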

Note: DIFL is still experimental and has not been thoroughly evaluated, so output quality may be uneven.


What’s inside

  • Importance weights: alpha_entropy=0.3, beta_causal=0.4, causal_decay=0.95
  • Smoothing: lambda_smooth=0.05
  • Contrastive component: disabled (gamma_contrastive=0.0)
  • Label masking: only_assistant_loss=True; a simple heuristic un-masks the latter part of each conversation turn
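With only_assistant_loss=True, tokens outside the assistant's portion of a turn are excluded from the loss by setting their labels to -100 (the ignore index used by PyTorch's cross-entropy). A minimal sketch, assuming the boundary index has already been found by the heuristic mentioned above (the heuristic itself is not reproduced here):

```python
def mask_non_assistant_labels(input_ids, assistant_start_idx, ignore_index=-100):
    """Return labels where everything before the assistant's tokens is ignored.

    assistant_start_idx: index of the first token that should contribute
    to the loss; finding it (the "latter part of each turn") is the job
    of the masking heuristic and is assumed here.
    """
    labels = list(input_ids)
    for i in range(assistant_start_idx):
        labels[i] = ignore_index  # excluded from cross-entropy
    return labels
```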

Intended use and limitations

Intended use:

  • Math Q&A, worked examples, short-form tutoring
  • Helpful as a teaching or practice assistant, not a source of absolute truth

Limitations:

  • Not a general-purpose model
  • May produce incorrect math or flawed logic
  • Trained on contexts of up to 512 tokens; longer inputs may still work but are out-of-distribution for this adapter
  • No formal quantitative evaluation is reported here

Training data

  • Dataset: anaonymous-aad/GenQA_math
  • Splits: train and test
  • Sample counts: up to ~2,000 training examples and ~200 evaluation examples loaded from the dataset
  • Data format: chat-style messages with roles: system, user, assistant

Please refer to the dataset card for licenses and known issues for that dataset.
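For reference, a record in the chat format described above looks roughly like the following (an illustrative example, not an actual row from the dataset):

```python
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful math assistant."},
        {"role": "user", "content": "What is 12 * 7?"},
        {"role": "assistant", "content": "12 * 7 = 84."},
    ]
}
```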

Training procedure

Key hyperparameters:

  • Epochs: 1
  • Max length: 512
  • Batch size: 2
  • Gradient accumulation: 8 (effective batch ≈ 16 sequences)
  • Learning rate: 2e-5 (AdamW, linear warmup 100 steps, weight decay 0.01)
  • Mixed precision: fp16 by default (bf16 optional)
  • Gradient checkpointing: enabled
  • Quantization: 4-bit nf4 with double quant; PEFT LoRA r=8, alpha=16, dropout=0.05
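The hyperparameters above map onto standard transformers/PEFT configuration objects roughly as follows. This is a reconstruction from the values listed, not the original training script; `output_dir` and any omitted arguments are placeholders.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="difl-qwen2.5-3b-math",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch ~ 16 sequences
    learning_rate=2e-5,
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,  # swap for bf16=True on supported hardware
    gradient_checkpointing=True,
)
```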

How to use

Load the base model and apply the LoRA adapter:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-3B-Instruct"
adapter = "oscar128372/difl-qwen2.5-3b-math"

# Optional: 4-bit inference for low memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base,
    trust_remote_code=True,
    quantization_config=bnb_config,  # or remove for full-precision
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "Find the derivative of f(x) = x^3 - 5x + 2."}
]

inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tok.pad_token_id,
    eos_token_id=tok.eos_token_id,
)
print(tok.decode(outputs[0], skip_special_tokens=True))