Qwen2.5-14B-Function-Calling-xLAM
Qwen2.5-14B-Function-Calling-xLAM is a fine-tuned version of Qwen/Qwen2.5-14B-Instruct, trained on the Salesforce/xlam-function-calling-60k dataset using supervised fine-tuning (SFT) with LoRA adapters. SFT teaches the model to follow instructions by learning from high-quality demonstration data.
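Concretely, SFT computes cross-entropy loss only on the demonstration (assistant) tokens, masking prompt positions with the ignore index -100. A minimal sketch of that masking, with illustrative token ids:

```python
IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy loss

prompt_ids = [1, 42, 7]      # illustrative prompt tokens (no loss computed here)
response_ids = [99, 100, 2]  # illustrative demonstration tokens (loss computed here)

input_ids = prompt_ids + response_ids
labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids

print(labels)  # [-100, -100, -100, 99, 100, 2]
```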
| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen/Qwen2.5-14B-Instruct |
| Model Size | 14B parameters |
| Tensor Type | BF16 |
| Context Length | 2,048 tokens |
| Training Method | SFT with LoRA |
Training hyperparameters:

| Parameter | Value |
|---|---|
| Learning Rate | 0.0002 |
| Batch Size | 2 per device |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Number of Epochs | 1 |
| Max Sequence Length | 2,048 tokens |
| LR Scheduler | Linear warmup + Cosine annealing |
| Warmup Ratio | 0.1 |
| Precision | BF16 mixed precision |
| Gradient Checkpointing | Enabled |
| Random Seed | 42 |
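The scheduler rows above describe a linear warmup over the first 10% of steps followed by cosine annealing of the 2e-4 base learning rate (with an effective batch of 2 × 8 = 16). A minimal sketch of that schedule; the total step count below is an illustrative assumption:

```python
import math

BASE_LR = 2e-4      # learning rate from the table
WARMUP_RATIO = 0.1  # warmup ratio from the table

def lr_at(step, total_steps, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup followed by cosine annealing to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp from 0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay

print(lr_at(0, 1000))     # 0.0 (start of warmup)
print(lr_at(100, 1000))   # 0.0002 (peak, end of warmup)
print(lr_at(1000, 1000))  # 0.0 (fully annealed)
```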
LoRA and quantization configuration:

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 |
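A hedged sketch of how these adapter and quantization settings map onto `peft` and `bitsandbytes` configs, assuming the training stack used those libraries (the exact trainer setup is not published here):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# LoRA adapter settings from the table above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# 4-bit NF4 quantization of the base model (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```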
Training infrastructure:

| Resource | Value |
|---|---|
| Hardware | NVIDIA H100 MIG |
This model was trained on the Salesforce/xlam-function-calling-60k dataset.
| Split | Samples |
|---|---|
| Training | N/A |
| Evaluation | N/A |
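Each record in Salesforce/xlam-function-calling-60k pairs a user query with JSON-encoded tool definitions and the expected call(s). The field names below follow the dataset card; the sample values are illustrative:

```python
import json

# Illustrative record in the xlam-function-calling-60k layout:
# "query" is plain text; "tools" and "answers" are JSON strings.
record = {
    "query": "What is the weather in Paris?",
    "tools": json.dumps([
        {"name": "get_weather", "parameters": {"city": {"type": "str"}}}
    ]),
    "answers": json.dumps([
        {"name": "get_weather", "arguments": {"city": "Paris"}}
    ]),
}

calls = json.loads(record["answers"])
print(calls[0]["name"], calls[0]["arguments"])  # get_weather {'city': 'Paris'}
```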
Basic usage with the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM"

# Load the tokenizer and the model in bfloat16, sharded across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"},
]

# Apply the chat template and generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
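Because the model is tuned for function calling, replies to tool-style prompts typically contain a JSON array of `{name, arguments}` objects. Below is a hedged helper for extracting such calls; the output format is an assumption based on the xLAM training data, so validate it against your own prompts:

```python
import json

def parse_tool_calls(text):
    """Extract a JSON array of tool calls from a model reply, if one is present.

    Assumes (per the xLAM format) that calls appear as a JSON list of
    {"name": ..., "arguments": ...} objects somewhere in the reply.
    """
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return []  # no bracketed span at all
    try:
        calls = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return []  # bracketed span was not valid JSON
    return calls if isinstance(calls, list) else []

reply = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'
print(parse_tool_calls(reply))  # [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
print(parse_tool_calls("Sorry, I cannot help."))  # []
```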
Using the pipeline API:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the concept of machine learning."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```
Loading in 4-bit with bitsandbytes (matching the NF4 quantization used during training):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with double quantization and bfloat16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    quantization_config=quantization_config,
    device_map="auto",
)
```
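As a rough sizing note (a back-of-envelope estimate, not a measured figure): NF4 stores weights at about half a byte per parameter, so the 14B model's weights fit in roughly 6.5 GiB before activation, KV-cache, and quantization-constant overhead:

```python
params = 14e9          # parameter count
bytes_per_param = 0.5  # 4-bit NF4 is roughly half a byte per weight
weight_gib = params * bytes_per_param / 1024**3
print(round(weight_gib, 1))  # 6.5
```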
For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at: ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF
With Ollama:

```shell
ollama pull hf.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF:Q4_K_M "Hello!"
```
If you use this model, please cite:

```bibtex
@misc{ermiaazarkhalili_qwen2.5_14b_function_calling_xlam,
  author = {ermiaazarkhalili},
  title = {Qwen2.5-14B-Function-Calling-xLAM: Fine-tuned Qwen2.5-14B-Instruct on xlam-function-calling-60k},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM}}
}
```
For questions or issues, please open an issue on the model repository.