Qwen2.5-14B-Function-Calling-xLAM

This model is a fine-tuned version of Qwen/Qwen2.5-14B-Instruct trained on the Salesforce/xlam-function-calling-60k dataset using SFT with LoRA adapters.

Overview

Qwen2.5-14B-Function-Calling-xLAM was adapted with Supervised Fine-Tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.
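The LoRA adapters used here freeze the base weights W and learn only a low-rank update ΔW = (α/r)·BA. A toy NumPy sketch of the idea (illustrative only, not the training code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16          # toy dimensions; the real model uses r=64, alpha=128
W = rng.normal(size=(d, d))     # frozen base weight
A = rng.normal(size=(r, d))     # trainable down-projection
B = np.zeros((d, r))            # trainable up-projection, zero-initialized
x = rng.normal(size=(d,))

scaling = alpha / r
y = W @ x + scaling * (B @ (A @ x))   # adapted forward pass

# With B zero-initialized, the adapter starts as a no-op: output equals base output.
assert np.allclose(y, W @ x)
```

Only A and B are updated during training, which is why LoRA fine-tuning fits on a single GPU even for a 14B model.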

Key Features

  • High-Quality Fine-Tuning: Trained on the curated Salesforce/xlam-function-calling-60k dataset
  • Efficient Training: Uses LoRA (Low-Rank Adaptation) with 4-bit NF4 quantization
  • Optimized for Inference: Available in multiple formats, including GGUF quantizations

Model Details

| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen/Qwen2.5-14B-Instruct |
| Model Size | 14B parameters |
| Tensor Type | BF16 |
| Context Length | 2,048 tokens |
| Training Method | SFT with LoRA |

Training Information

Training Configuration

| Parameter | Value |
|---|---|
| Learning Rate | 0.0002 |
| Batch Size | 2 per device |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Number of Epochs | 1 |
| Max Sequence Length | 2,048 tokens |
| LR Scheduler | Linear warmup + cosine annealing |
| Warmup Ratio | 0.1 |
| Precision | BF16 mixed precision |
| Gradient Checkpointing | Enabled |
| Random Seed | 42 |
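The table above maps onto a TRL training config roughly as follows. This is a hypothetical reconstruction, not the actual training script; argument names follow TRL 0.24's `SFTConfig` and may differ in other releases (older versions call `max_length` `max_seq_length`):

```python
from trl import SFTConfig

# Hypothetical reconstruction of the run's settings from the table above.
config = SFTConfig(
    output_dir="qwen2.5-14b-fc-xlam",
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size: 2 * 8 = 16
    num_train_epochs=1,
    max_length=2048,
    lr_scheduler_type="cosine",      # with linear warmup
    warmup_ratio=0.1,
    bf16=True,
    gradient_checkpointing=True,
    seed=42,
)
```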

LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 |
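Expressed with PEFT and bitsandbytes, the adapter and quantization settings above would look roughly like this (a sketch assuming the PEFT `LoraConfig` and transformers `BitsAndBytesConfig` APIs, not the actual training code):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```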

Training Metrics

| Metric | Value |
|---|---|
| Hardware | NVIDIA H100 MIG |

Dataset

This model was trained on the Salesforce/xlam-function-calling-60k dataset.

| Split | Samples |
|---|---|
| Training | N/A |
| Evaluation | N/A |

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
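The Quick Start above uses plain chat. For function calling, recent transformers releases let `apply_chat_template` take a `tools` argument with OpenAI-style JSON schemas. The tool below (`get_weather`) is a hypothetical example, not part of the training data:

```python
import json

# Hypothetical example tool in the OpenAI-style JSON schema accepted by
# tokenizer.apply_chat_template(..., tools=...).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# With the tokenizer from the Quick Start loaded:
# text = tokenizer.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, tokenize=False
# )
# The model is then expected to emit a JSON tool call, e.g.
# {"name": "get_weather", "arguments": {"city": "Paris"}}

assert json.loads(json.dumps(tools)) == tools  # schema is valid JSON
```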

Using Pipeline

from transformers import pipeline

generator = pipeline("text-generation", model="ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM", device_map="auto")
messages = [{"role": "user", "content": "Explain the concept of machine learning."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])

4-bit Quantized Inference

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    quantization_config=quantization_config,
    device_map="auto"
)
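As a rough rule of thumb (weights only, ignoring activations, KV cache, and quantization overhead), 4-bit loading cuts weight memory to about a quarter of BF16:

```python
# Back-of-the-envelope weight memory for a 14B-parameter model.
params = 14e9
bytes_per_param = {"bf16": 2.0, "int8": 1.0, "nf4": 0.5}
weight_gib = {fmt: params * b / 2**30 for fmt, b in bytes_per_param.items()}
# bf16 ≈ 26 GiB, int8 ≈ 13 GiB, nf4 ≈ 6.5 GiB (weights only)
```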

GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at: ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF

Using with Ollama

ollama pull hf.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF:Q4_K_M "Hello!"

Limitations

  • Language: Primarily trained on English data
  • Knowledge Cutoff: Limited to base model's training data cutoff
  • Hallucinations: May generate plausible-sounding but incorrect information
  • Context Length: Fine-tuned with a 2,048-token maximum sequence length; longer inputs may be less reliable
  • Safety: Not extensively safety-tuned; use with appropriate guardrails

Intended Use

Recommended Uses

  • Research on language model fine-tuning
  • Educational purposes
  • Personal projects
  • Prototyping conversational AI

Out-of-Scope Uses

  • Production systems without additional safety measures
  • Medical, legal, or financial advice
  • Generating harmful or misleading content

Training Framework

  • TRL: 0.24.0
  • Transformers: 4.57.3
  • PyTorch: 2.9.0
  • Datasets: 4.3.0
  • PEFT: 0.18.0
  • BitsAndBytes: 0.49.0

Citation

@misc{ermiaazarkhalili_qwen2.5_14b_function_calling_xlam,
    author = {ermiaazarkhalili},
    title = {Qwen2.5-14B-Function-Calling-xLAM: Fine-tuned Qwen2.5-14B-Instruct on xlam-function-calling-60k},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM}}
}

Acknowledgments

  • Base model developers at Qwen
  • Hugging Face TRL Team for the training library
  • Dataset creators and contributors
  • Compute Canada / DRAC for HPC resources

Contact

For questions or issues, please open an issue on the model repository.
