GPT-OSS 20B Tool-Calling (LoRA)

LoRA adapter for GPT-OSS 20B fine-tuned for tool/function calling. Use this adapter with the base model to get a model that follows tool-calling conventions (e.g. emitting structured tool calls in chat).

Model description

  • Base model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
  • Task: Causal language modeling, supervised fine-tuning (SFT) for tool-calling behavior
  • Adapter: PEFT LoRA (r=8, alpha=16, dropout=0), applied to q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Training data comes from zhendongnvidia/qwen3-tool-calling-sft-dataset: OpenAI-style messages with tool_calls and tool schemas, rendered with the base model’s chat template. The adapter was trained with Unsloth and TRL’s SFTTrainer.
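For reference, the adapter settings listed above correspond to roughly the following PEFT `LoraConfig` (a sketch for orientation only; the authoritative config ships with the adapter in `adapter_config.json`):

```python
from peft import LoraConfig

# Sketch of the adapter configuration described in this card
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```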

Intended use

  • In scope: Chat assistants that use tools (function calling) in the same style as the training data.
  • Out of scope: General-purpose chat without tools; production safety or moderation (use additional safeguards as needed).

How to use

Load the base model and this PEFT adapter, then run text generation (e.g. with the same chat/tool template used in training).

With transformers and PEFT

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "unsloth/gpt-oss-20b-unsloth-bnb-4bit"
adapter = "nileagi/gpt-oss-20b-tool-calling"  

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(
    base,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

# Example tool schema (OpenAI function-calling style); replace with your real tools
your_tools_list = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Example: one user turn
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
text = tokenizer.apply_chat_template(
    messages,
    tools=your_tools_list,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
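When the model emits a structured tool call, your application is responsible for executing it and feeding the result back as a tool message. A minimal dispatch sketch (the parsed-call format and the `get_weather` stub below are illustrative assumptions, not part of this repo):

```python
import json

# Hypothetical local implementation of the tool advertised in the schema
def get_weather(city: str) -> str:
    return f"Sunny, 21 C in {city}"  # stub; call a real weather API in practice

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(call_json: str) -> str:
    """Execute one parsed tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call["arguments"])

result = execute_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
# result can then be appended as a {"role": "tool", ...} message and fed back to the model
```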

With Unsloth (4-bit base)

If you use Unsloth’s FastLanguageModel for 4-bit loading:

from unsloth import FastLanguageModel
from peft import PeftModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach the trained adapter (do not create a fresh LoRA with get_peft_model)
model = PeftModel.from_pretrained(model, "nileagi/gpt-oss-20b-tool-calling")
FastLanguageModel.for_inference(model)
# Then generate as above

Training

  • Data: zhendongnvidia/qwen3-tool-calling-sft-dataset (train split), up to 50k examples; 1% held out for validation. Rendered with the base tokenizer’s chat template and tool schema cleaning (see prepare_dataset.py in the source repo).
  • Objective: SFT on the rendered text (next-token prediction).
  • Setup: 1 epoch; max sequence length 2048; effective batch size 8 (batch 1 × grad_accum 8); learning rate 2e-4; warmup ratio 0.03; LoRA r=8, alpha=16, dropout=0.
  • Framework: Unsloth, TRL SFTTrainer, PEFT.
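The hyperparameters above map onto TRL's `SFTConfig` roughly as follows (a sketch of the setup, not the exact training script; argument names follow TRL's API and may differ slightly across TRL versions):

```python
from trl import SFTConfig

# Sketch of the training configuration described in this card
training_config = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size 8
    num_train_epochs=1,
    learning_rate=2e-4,
    warmup_ratio=0.03,
    max_seq_length=2048,
    output_dir="outputs",
)
```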

Citation

If you use this adapter or the training code, please cite this repository (and see the base model and TRL pages for their own citations):

@misc{nileagi2026gptoss,
  title  = {{GPT-OSS-20B: For Calling Tools}},
  author = {NileAGI},
  year   = {2026},
  url    = {https://github.com/nsomazr/temporalabs-gpt-oss-tool-usage.git},
}

License

MIT (see license in the repo).
