HybriKo-117M-ToolLLaMA-SFT

A HybriKo model fine-tuned for Function Calling / Tool Use using the ToolLLaMA dataset (187k samples).

Model Description

| Property | Value |
|---|---|
| Base Model | HybriKo-117M (Griffin-inspired RNN+Attention Hybrid) |
| Parameters | 117.8M |
| Context Length | 8192 tokens |
| Training Data | ToolLLaMA G123 DFS (187,542 samples) |
| Training Time | ~71 minutes on 8× A100 |
| Final Loss | 0.90 |
| Final PPL | 2.46 |

Architecture

HybriKo uses a 2:1 hybrid ratio of RNN (RGLRU) to Attention blocks:

  • Layers 1, 2: Griffin Block (RGLRU)
  • Layer 3: Attention Block (GQA with RoPE)
  • Pattern repeats...

This provides efficient long-context modeling with linear complexity for most layers.
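The repeating 2:1 schedule above can be sketched as a simple layer-type lookup (illustrative only; the actual block classes live in the model's remote code, and `layer_type` is a hypothetical helper, not part of the repo):

```python
def layer_type(layer_idx: int) -> str:
    """Return the block type for a 1-indexed layer in a 2:1
    Griffin/attention hybrid: two RGLRU blocks, then one GQA block."""
    return "attention" if layer_idx % 3 == 0 else "rglru"

# First six layers of the repeating pattern
print([layer_type(i) for i in range(1, 7)])
# ['rglru', 'rglru', 'attention', 'rglru', 'rglru', 'attention']
```

With this schedule, two out of every three layers run in linear time, which is where the long-context efficiency comes from.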

Training Details (arXiv:2512.15943 Settings)

| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Max Grad Norm | 0.3 (aggressive clipping) |
| Batch Size | 256 effective (16 per GPU × 8 GPUs × 2 grad accum) |
| Epochs | 1 |
| Context Length | 8192 |
| Optimizer | AdamW |
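The effective batch size is simply the product of the three factors listed above:

```python
per_gpu_batch = 16   # micro-batch per GPU
num_gpus = 8         # A100 x 8
grad_accum = 2       # gradient accumulation steps

effective_batch = per_gpu_batch * num_gpus * grad_accum
print(effective_batch)  # 256
```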

Usage (Colab Compatible)

Installation

pip install torch transformers sentencepiece huggingface_hub

Quick Start

import torch
import sentencepiece as spm
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoConfig

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Yaongi/HybriKo-117M-ToolLLaMA-SFT",
    trust_remote_code=True,
    torch_dtype=torch.float32
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Load tokenizer
sp_path = hf_hub_download("Yaongi/HybriKo-117M-ToolLLaMA-SFT", "HybriKo_tok.model")
sp = spm.SentencePieceProcessor()
sp.Load(sp_path)

# Special tokens
SPECIAL_TOKENS = {
    "<|im_start|>": 32000,
    "<|im_end|>": 32001,
    "<thought>": 32002,
    "</thought>": 32003,
    "<tool_call>": 32004,
    "</tool_call>": 32005,
    "<tools>": 32006,
    "</tools>": 32007,
}

def encode(text):
    """Encode text with special token handling."""
    for token, token_id in SPECIAL_TOKENS.items():
        text = text.replace(token, f" \x00{token_id}\x00 ")

    tokens = []
    for part in text.split("\x00"):
        if part.strip().isdigit() and int(part.strip()) in SPECIAL_TOKENS.values():
            tokens.append(int(part.strip()))
        elif part.strip():
            tokens.extend(sp.EncodeAsIds(part))
    return tokens

def decode(ids):
    """Decode token IDs to text."""
    id_to_token = {v: k for k, v in SPECIAL_TOKENS.items()}
    result = []
    regular_ids = []

    for token_id in ids:
        if token_id in id_to_token:
            if regular_ids:
                result.append(sp.DecodeIds(regular_ids))
                regular_ids = []
            result.append(id_to_token[token_id])
        else:
            regular_ids.append(token_id)

    if regular_ids:
        result.append(sp.DecodeIds(regular_ids))

    return "".join(result)

@torch.no_grad()
def generate(prompt, max_new_tokens=200, temperature=0.7, top_k=50):
    """Generate with stop-sequence detection."""
    prompt_ids = encode(prompt)
    prompt_len = len(prompt_ids)
    input_ids = torch.tensor([prompt_ids]).to(device)
    stop_sequences = ["<|im_end|>", "</tool_call>"]

    for _ in range(max_new_tokens):
        logits = model(input_ids)["logits"][:, -1] / temperature

        if top_k:
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = float("-inf")

        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, 1)
        input_ids = torch.cat([input_ids, next_token], dim=1)

        # Check stop sequences against the generated tokens only;
        # splitting the decoded text on the prompt string is unreliable
        # because the SentencePiece round trip may not reproduce it exactly.
        generated = decode(input_ids[0, prompt_len:].tolist())
        if any(stop in generated for stop in stop_sequences):
            break

    return decode(input_ids[0].tolist())

Example: Weather API Call

prompt = """<|im_start|>system
You are an AI assistant with access to tools.
<tools>
{"name": "get_weather", "description": "Get current weather", "parameters": {"location": {"type": "string", "description": "City name"}}}
</tools><|im_end|>
<|im_start|>user
What's the weather in Seoul?<|im_end|>
<|im_start|>assistant
"""

response = generate(prompt, temperature=0.3, top_k=10)
print(response)

Expected Output:

<|im_start|>assistant
<thought>
The user wants to know the weather in Seoul. I should call the get_weather function.
</thought>
<tool_call>
{"name": "get_weather", "arguments": {"location": "Seoul"}}
</tool_call><|im_end|>
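Once the model emits a `<tool_call>` block like the one above, the JSON payload can be pulled out with a small helper (a sketch for downstream use; `parse_tool_call` is not part of the repo):

```python
import json
import re

def parse_tool_call(text: str):
    """Extract the JSON payload from the first <tool_call>...</tool_call> block."""
    match = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>", text, re.DOTALL)
    return json.loads(match.group(1)) if match else None

response = (
    '<tool_call>\n'
    '{"name": "get_weather", "arguments": {"location": "Seoul"}}\n'
    '</tool_call><|im_end|>'
)
call = parse_tool_call(response)
print(call["name"], call["arguments"])  # get_weather {'location': 'Seoul'}
```

The parsed `name` and `arguments` can then be dispatched to the real API, and its result fed back in a `<|im_start|>tool` turn.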

Example: Search Query

prompt = """<|im_start|>system
You are an AI assistant with access to tools.
<tools>
{"name": "web_search", "description": "Search the web", "parameters": {"query": {"type": "string"}}}
</tools><|im_end|>
<|im_start|>user
Find information about the latest AI research<|im_end|>
<|im_start|>assistant
"""

response = generate(prompt, temperature=0.3)
print(response)

Prompt Format (Hermes ChatML)

<|im_start|>system
You are an AI assistant with access to tools.
<tools>
[Tool definitions in JSON format]
</tools><|im_end|>
<|im_start|>user
[User message]<|im_end|>
<|im_start|>assistant
<thought>
[Model's reasoning]
</thought>
<tool_call>
{"name": "[tool_name]", "arguments": {...}}
</tool_call><|im_end|>
<|im_start|>tool
[Tool response]<|im_end|>
<|im_start|>assistant
[Final response]<|im_end|>
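The format above can be assembled programmatically; a minimal builder, assuming tools are given as plain dicts (`build_prompt` is illustrative, not part of the repo):

```python
import json

def build_prompt(tools, user_message):
    """Assemble a Hermes-ChatML tool-use prompt ending at the assistant turn."""
    tool_lines = "\n".join(json.dumps(t) for t in tools)
    return (
        "<|im_start|>system\n"
        "You are an AI assistant with access to tools.\n"
        f"<tools>\n{tool_lines}\n</tools><|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt(
    [{"name": "get_weather", "description": "Get current weather",
      "parameters": {"location": {"type": "string"}}}],
    "What's the weather in Seoul?",
)
print(prompt)
```

Note that the prompt ends immediately after the `<|im_start|>assistant` header, so the model produces the `<thought>` and `<tool_call>` blocks itself.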

Special Tokens

| Token | ID | Purpose |
|---|---|---|
| `<\|im_start\|>` | 32000 | Start of message |
| `<\|im_end\|>` | 32001 | End of message |
| `<thought>` | 32002 | Start of reasoning |
| `</thought>` | 32003 | End of reasoning |
| `<tool_call>` | 32004 | Start of tool call |
| `</tool_call>` | 32005 | End of tool call |
| `<tools>` | 32006 | Start of tool definitions |
| `</tools>` | 32007 | End of tool definitions |

Training Loss Curve

| Step | Loss | PPL |
|---|---|---|
| 10 | 6.72 | 825 |
| 100 | 2.15 | 8.6 |
| 400 | 1.06 | 2.9 |
| 730 (final) | 0.90 | 2.5 |
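The PPL column is exp(loss), which the reported figures match up to rounding of the logged losses:

```python
import math

# (step, loss) pairs from the training log
for step, loss in [(10, 6.72), (100, 2.15), (400, 1.06), (730, 0.90)]:
    print(step, round(math.exp(loss), 2))
```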

Limitations

  • Optimized for English tool-calling; Korean support is limited
  • 117M parameters - suitable for edge deployment but less capable than larger models
  • Best with structured tool-calling format; may struggle with free-form conversation

Citation

If you use this model, please cite:

@misc{hybriko-toolllama-sft,
  title={HybriKo-117M-ToolLLaMA-SFT: Function Calling Fine-tuned Hybrid RNN-Attention Model},
  author={Yaongi},
  year={2024},
  url={https://huggingface.co/Yaongi/HybriKo-117M-ToolLLaMA-SFT}
}

License

Apache 2.0
