Model Card for Qwen2.5_1.5B_Reasoning
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct, trained using TRL. It was fine-tuned on the Jofthomas/hermes-function-calling-thinking-V1 dataset for function calling with chain-of-thought (CoT) reasoning.
Quick start
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

PEFT_MODEL_ID = "skander-bs/Qwen2.5_1.5B_Reasoning"

# Load the base model in 4-bit NF4 with bfloat16 compute.
BNB_CONFIG = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

def load_model_and_tokenizer():
    """Loads the quantized base model and tokenizer, and applies the LoRA adapter."""
    config = PeftConfig.from_pretrained(PEFT_MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        quantization_config=BNB_CONFIG,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(PEFT_MODEL_ID)
    # The adapter repo's tokenizer adds special tokens, so resize the
    # embedding matrix before loading the adapter weights on top.
    model.resize_token_embeddings(len(tokenizer))
    model = PeftModel.from_pretrained(model, PEFT_MODEL_ID)
    # No explicit .to(torch.bfloat16): a 4-bit model does not support dtype
    # casts, and the compute dtype is already set in BNB_CONFIG.
    model.eval()
    return model, tokenizer
def generate_response(model, tokenizer, prompt, max_new_tokens=300):
    """Generates a response given a prompt using the fine-tuned model."""
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}  # Move tensors to the model's device
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        temperature=0.01,  # near-greedy decoding keeps tool-call JSON stable
        repetition_penalty=1.0,
        eos_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
def main():
    """Loads the model, prepares a prompt, and generates a response."""
    model, tokenizer = load_model_and_tokenizer()
    prompt = """<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.
You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
Here are the available tools:<tools>
[{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another',
'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'},
'from_currency': {'type': 'string', 'description': 'The currency to convert from'},
'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}},
{'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations',
'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'},
'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}]
</tools>
Use the following pydantic model json schema for each tool call you will make:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'},
'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>
Also, before making a call to a function take the time to plan the function to take.
Make that thinking process between <think>{your thoughts}</think>
Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
<start_of_turn>model
<think>"""
    response = generate_response(model, tokenizer, prompt)
    print("\nGenerated Response:\n", response)

if __name__ == "__main__":
    main()
Training procedure
This model was trained with supervised fine-tuning (SFT).
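For reference, a minimal sketch of what an SFT setup with TRL and a LoRA adapter could look like is shown below; the hyperparameters, LoRA settings, and output directory are illustrative assumptions, not the exact configuration used to produce this checkpoint.

# Illustrative SFT sketch with TRL + LoRA; all hyperparameters are
# assumptions, not the exact values used to train this checkpoint.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("Jofthomas/hermes-function-calling-thinking-V1", split="train")

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

peft_config = LoraConfig(  # assumed LoRA settings
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(  # assumed training hyperparameters
    output_dir="Qwen2.5_1.5B_Reasoning",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,  # TRL wraps the base model with the LoRA adapter
)
trainer.train()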
Framework versions
- TRL: 0.15.1
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citations
Cite TRL as:
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}