# GPT-OSS 20B Tool-Calling (LoRA)
LoRA adapter for GPT-OSS 20B fine-tuned for tool/function calling. Use this adapter with the base model to get a model that follows tool-calling conventions (e.g. emitting structured tool calls in chat).
## Model description
- Base model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
- Task: Causal language modeling, supervised fine-tuning (SFT) for tool-calling behavior
- Adapter: PEFT LoRA (r=8, alpha=16, dropout=0), applied to `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
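For reference, the adapter settings above correspond roughly to a PEFT `LoraConfig` like the following. This is a sketch, not the exact training code; the `task_type` value is assumed:

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type is an assumption.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```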
Training data comes from zhendongnvidia/qwen3-tool-calling-sft-dataset: OpenAI-style messages with tool_calls and tool schemas, rendered with the base model’s chat template. The adapter was trained with Unsloth and TRL’s SFTTrainer.
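To illustrate the data format, a hypothetical training example in this OpenAI-style schema might look like the following (field values are invented for illustration, not taken from the actual dataset):

```python
# Hypothetical example in the OpenAI-style messages/tools schema;
# the assistant turn carries a structured tool_calls entry.
example = {
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"location": "Paris"}',
                },
            }],
        },
        {"role": "tool", "content": '{"temp_c": 18}'},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
}
```

Examples like this are rendered to plain text with the chat template before SFT.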
## Intended use
- In scope: Chat assistants that use tools (function calling) in the same style as the training data.
- Out of scope: General-purpose chat without tools; production safety or moderation (use additional safeguards as needed).
## How to use
Load the base model and this PEFT adapter, then run text generation (e.g. with the same chat/tool template used in training).
### With `transformers` and PEFT
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "unsloth/gpt-oss-20b-unsloth-bnb-4bit"
adapter = "nileagi/gpt-oss-20b-tool-calling"

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(
    base,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

# Example tool schema (replace with your own tools).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

# Example: one user turn (use your chat/tool template in practice).
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
```
### With Unsloth (4-bit base)
If you use Unsloth's `FastLanguageModel` for 4-bit loading:
```python
from unsloth import FastLanguageModel
from peft import PeftModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach the trained adapter directly; do not call get_peft_model,
# which would initialize a fresh (untrained) LoRA instead.
model = PeftModel.from_pretrained(model, "nileagi/gpt-oss-20b-tool-calling")
FastLanguageModel.for_inference(model)
# Then generate as above.
```
## Training
- Data: zhendongnvidia/qwen3-tool-calling-sft-dataset (train split), up to 50k examples; 1% held out for validation. Rendered with the base tokenizer's chat template and tool-schema cleaning (see `prepare_dataset.py` in the source repo).
- Objective: SFT on the rendered text (next-token prediction).
- Setup: 1 epoch; max sequence length 2048; effective batch size 8 (batch 1 × grad_accum 8); learning rate 2e-4; warmup ratio 0.03; LoRA r=8, alpha=16, dropout=0.
- Framework: Unsloth, TRL SFTTrainer, PEFT.
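The setup above maps onto a TRL `SFTConfig` roughly as follows. This is a sketch under stated assumptions, not the exact training script; `output_dir` is a placeholder, and depending on your TRL version the sequence-length field may be named `max_seq_length` or `max_length`:

```python
from trl import SFTConfig

# Hyperparameters as stated above; output_dir is a placeholder.
args = SFTConfig(
    output_dir="gpt-oss-20b-tool-calling-lora",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    learning_rate=2e-4,
    warmup_ratio=0.03,
    max_seq_length=2048,             # may be `max_length` in newer TRL
)
```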
## Citation
If you use this adapter or the training code, please cite it, along with the base model and TRL:

```bibtex
@misc{nileagi2026gptoss,
  title  = {{GPT-OSS-20B: For Calling Tools}},
  author = {NileAGI},
  year   = {2026},
  url    = {https://github.com/nsomazr/temporalabs-gpt-oss-tool-usage.git},
}
```
## License
MIT (see license in the repo).