oscar128372/difl-qwen2.5-3b-math
Model details
- Base model: Qwen/Qwen2.5-3B-Instruct
- Params trained: LoRA adapters on q_proj, v_proj, gate_proj, up_proj, down_proj
- Method: QLoRA-style PEFT training in 4-bit (nf4), mixed precision
- Objective: Causal LM with token-level importance weighting (DIFL)
- Context during training: up to 512 tokens per sample
About DIFL: tokens are weighted by an “importance field” that blends two signals:
- Normalized token entropy from the model’s own logits (higher entropy → more weight)
- A causal position bias that decays with distance (recent tokens receive a bit more weight)
A small smoothing regularizer encourages stable importance across valid positions.
Note: DIFL is experimental and has not been thoroughly evaluated; treat this model's quality as unverified.
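The blend described above can be sketched roughly as follows. This is an illustration, not the training code: the function name and the exact way the two signals are combined and normalized are assumptions, though the `alpha_entropy`, `beta_causal`, and `causal_decay` values match those listed below.

```python
import numpy as np

def importance_weights(logits, alpha_entropy=0.3, beta_causal=0.4, causal_decay=0.95):
    """Hypothetical DIFL-style importance field over a token sequence.

    logits: array of shape (seq_len, vocab_size).
    Returns per-token weights with mean 1 (so the average loss scale is unchanged).
    """
    # Softmax probabilities per position (numerically stable)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Normalized token entropy in [0, 1]: higher entropy -> more weight
    entropy = -(probs * np.log(probs + 1e-9)).sum(axis=-1) / np.log(logits.shape[-1])

    # Causal position bias: decays with distance from the most recent token
    seq_len = logits.shape[0]
    recency = causal_decay ** np.arange(seq_len - 1, -1, -1)

    raw = 1.0 + alpha_entropy * entropy + beta_causal * recency
    return raw / raw.mean()
```

In this sketch the weights would multiply the per-token cross-entropy terms before averaging, which is why they are normalized to mean 1.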
What’s inside
- Importance weights: alpha_entropy=0.3, beta_causal=0.4, causal_decay=0.95
- Smoothing: lambda_smooth=0.05
- Contrastive component: disabled (gamma_contrastive=0.0)
- Label masking: only_assistant_loss=True; a simple heuristic un-masks the latter part of each conversation turn
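The smoothing regularizer mentioned above (lambda_smooth=0.05) could look like the minimal sketch below. The function name and the squared-difference form are assumptions; only the coefficient comes from the card.

```python
import numpy as np

def smoothness_penalty(weights, valid_mask, lambda_smooth=0.05):
    """Penalize abrupt changes in importance between adjacent valid positions.

    weights: per-token importance weights, shape (seq_len,).
    valid_mask: boolean mask of positions that contribute to the loss.
    """
    # A pair of adjacent positions counts only if both are valid
    pair_valid = valid_mask[:-1] & valid_mask[1:]
    diffs = np.diff(weights)[pair_valid]
    return lambda_smooth * float((diffs ** 2).mean()) if diffs.size else 0.0
```

Constant weights incur zero penalty, while rapidly alternating weights are pushed back toward a smooth profile.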
Intended use and limitations
Intended use:
- Math Q&A, worked examples, short-form tutoring
- Helpful as a teaching or practice assistant, not a source of absolute truth
Limitations:
- Not a general-purpose model
- May produce incorrect math or flawed logic
- Trained on contexts up to 512 tokens; longer inputs may still work but are out-of-distribution for this adapter
- No formal quantitative evaluation is reported here
Training data
- Dataset: anaonymous-aad/GenQA_math
- Splits: train and test
- Sample counts: up to ~2,000 training examples and ~200 evaluation examples loaded from the dataset
- Data format: chat-style messages with roles: system, user, assistant
Refer to the dataset card for its license and known issues.
Training procedure
Key hyperparameters:
- Epochs: 1
- Max length: 512
- Batch size: 2
- Gradient accumulation: 8 (effective batch ≈ 16 sequences)
- Learning rate: 2e-5 (AdamW, linear warmup 100 steps, weight decay 0.01)
- Mixed precision: fp16 by default (bf16 optional)
- Gradient checkpointing: enabled
- Quantization: 4-bit nf4 with double quant; PEFT LoRA r=8, alpha=16, dropout=0.05
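A training configuration matching these hyperparameters might look like the fragment below. This is a sketch, not the actual training script; the argument names follow the standard peft and bitsandbytes APIs, and the target modules are taken from the Model details section above.

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit nf4 quantization with double quant, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="float16",
)

# LoRA r=8, alpha=16, dropout=0.05 on the modules named in Model details
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```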
How to use
Load the base model and apply the LoRA adapter:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = "Qwen/Qwen2.5-3B-Instruct"
adapter = "oscar128372/difl-qwen2.5-3b-math"

# Optional: 4-bit inference for low memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base,
    trust_remote_code=True,
    quantization_config=bnb_config,  # remove for full-precision loading
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "Find the derivative of f(x) = x^3 - 5x + 2."},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tok.pad_token_id,
    eos_token_id=tok.eos_token_id,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
```