Maxx โ€” qwen2.5-1.5b-agentic-v1

A fine-tuned version of Qwen2.5-1.5B-Instruct optimised for agentic tool-use and step-by-step reasoning.

Training

  • Base model: Qwen2.5-1.5B-Instruct
  • Method: QLoRA SFT โ†’ DPO (Unsloth)
  • SFT data: OpenHermes 2.5, SlimOrca, Glaive FC v2, Hermes FC, synthetic ReAct tool-use trajectories (~131K examples)
  • DPO data: Synthetic preference pairs โ€” good vs bad tool calls (~5K pairs)
  • Compute: Kaggle T4

Targets

  • HuggingFace Open LLM Leaderboard
  • Berkeley BFCL (Function Calling Leaderboard)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "bolajiev/maxx1.5Bv2",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("bolajiev/qwen2.5-1.5b-agentic-v1")
Downloads last month
-
Safetensors
Model size
2B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for bolajiev/maxx1.5Bv2

Finetuned
(1582)
this model