oscar128372/difl-qwen2.5-3b-math

Model details

  • Base model: Qwen/Qwen2.5-3B-Instruct
  • Params trained: LoRA adapters q_proj, v_proj, gate_proj, up_proj, down_proj
  • Method: QLoRA-style PEFT training in 4-bit (nf4), mixed precision
  • Objective: Causal LM with token-level importance weighting (DIFL)
  • Context during training: up to 512 tokens per sample

About DIFL: tokens are weighted by an “importance field” that blends two signals:

  • Normalized token entropy from the model’s own logits (higher entropy → more weight)
  • A causal position bias that decays with distance (recent tokens receive a bit more weight)

A small smoothing regularizer encourages stable importance across valid positions.
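The blending described above can be sketched as follows. This is a minimal illustration, not the training code: the exact way the two signals and the smoothing term are combined is an assumption, and only the hyperparameter names (`alpha_entropy`, `beta_causal`, `causal_decay`, `lambda_smooth`) are taken from the configuration listed below.

```python
import torch
import torch.nn.functional as F

def difl_importance(logits, alpha_entropy=0.3, beta_causal=0.4, causal_decay=0.95):
    """Sketch of a DIFL-style importance field for one sequence.

    logits: tensor of shape (seq_len, vocab_size).
    Blends normalized per-token entropy with a causal position bias
    that decays with distance from the most recent token.
    """
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(-1)  # (seq_len,)
    entropy = entropy / entropy.max().clamp_min(1e-9)              # normalize to [0, 1]

    seq_len = logits.shape[0]
    # Distance from the last token: recent tokens get more weight.
    distance = torch.arange(seq_len - 1, -1, -1, dtype=torch.float)
    position_bias = causal_decay ** distance

    return alpha_entropy * entropy + beta_causal * position_bias

def smoothing_penalty(weights, lambda_smooth=0.05):
    # Regularizer: penalize abrupt changes in importance between
    # adjacent positions, encouraging a stable field.
    return lambda_smooth * (weights[1:] - weights[:-1]).pow(2).mean()
```

In this sketch the importance weights would multiply the per-token cross-entropy losses before averaging, and the smoothing penalty would be added to the total loss.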

Note: DIFL is still experimental and has not been thoroughly evaluated, so output quality may be uneven.


What’s inside

  • Importance weights: alpha_entropy=0.3, beta_causal=0.4, causal_decay=0.95
  • Smoothing: lambda_smooth=0.05
  • Contrastive component: disabled (gamma_contrastive=0.0)
  • Label masking: only_assistant_loss=True; a simple heuristic un-masks the latter part of each conversation turn
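With only_assistant_loss=True, tokens outside the assistant's portion of a turn are excluded from the loss by setting their labels to -100 (the ignore index used by PyTorch's cross-entropy). A minimal sketch, assuming the boundary index has already been found by the heuristic mentioned above (the heuristic itself is not reproduced here):

```python
def mask_non_assistant_labels(input_ids, assistant_start_idx, ignore_index=-100):
    """Return labels where everything before the assistant's tokens is ignored.

    assistant_start_idx: index of the first token that should contribute
    to the loss; finding it (the "latter part of each turn") is the job
    of the masking heuristic and is assumed here.
    """
    labels = list(input_ids)
    for i in range(assistant_start_idx):
        labels[i] = ignore_index  # excluded from cross-entropy
    return labels
```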

Intended use and limitations

Intended use:

  • Math Q&A, worked examples, short-form tutoring
  • Helpful as a teaching or practice assistant, not a source of absolute truth

Limitations:

  • Not a general-purpose model
  • May produce incorrect math or flawed logic
  • Trained on contexts of up to 512 tokens; longer inputs may still work but are out-of-distribution for this adapter
  • No formal quantitative evaluation is reported here

Training data

  • Dataset: anaonymous-aad/GenQA_math
  • Splits: train and test
  • Sample counts: up to ~2,000 training examples and ~200 evaluation examples loaded from the dataset
  • Data format: chat-style messages with roles: system, user, assistant

Please refer to the dataset card for licenses and known issues for that dataset.
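For reference, a record in the chat format described above looks roughly like the following (an illustrative example, not an actual row from the dataset):

```python
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful math assistant."},
        {"role": "user", "content": "What is 12 * 7?"},
        {"role": "assistant", "content": "12 * 7 = 84."},
    ]
}
```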

Training procedure

Key hyperparameters:

  • Epochs: 1
  • Max length: 512
  • Batch size: 2
  • Gradient accumulation: 8 (effective batch ≈ 16 sequences)
  • Learning rate: 2e-5 (AdamW, linear warmup 100 steps, weight decay 0.01)
  • Mixed precision: fp16 by default (bf16 optional)
  • Gradient checkpointing: enabled
  • Quantization: 4-bit nf4 with double quant; PEFT LoRA r=8, alpha=16, dropout=0.05
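The hyperparameters above map onto standard transformers/PEFT configuration objects roughly as follows. This is a reconstruction from the values listed, not the original training script; `output_dir` and any omitted arguments are placeholders.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="difl-qwen2.5-3b-math",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch ~ 16 sequences
    learning_rate=2e-5,
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,  # swap for bf16=True on supported hardware
    gradient_checkpointing=True,
)
```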

How to use

Load the base model and apply the LoRA adapter:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-3B-Instruct"
adapter = "oscar128372/difl-qwen2.5-3b-math"

# Optional: 4-bit inference for low memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base,
    trust_remote_code=True,
    quantization_config=bnb_config,  # or remove for full-precision
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "Find the derivative of f(x) = x^3 - 5x + 2."}
]

inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tok.pad_token_id,
    eos_token_id=tok.eos_token_id,
)
print(tok.decode(outputs[0], skip_special_tokens=True))