Instructions to use sriksven/ToolSmith-8b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use sriksven/ToolSmith-8b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sriksven/ToolSmith-8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sriksven/ToolSmith-8b")
model = AutoModelForCausalLM.from_pretrained("sriksven/ToolSmith-8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sriksven/ToolSmith-8b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sriksven/ToolSmith-8b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/ToolSmith-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
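
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `send_chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request, timeout=60):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "sriksven/ToolSmith-8b",
    "What is the capital of France?",
)
```

With the server running, `print(send_chat(request))` prints the reply. The same helper should work against the SGLang server by changing the base URL to port 30000.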

Use Docker

docker model run hf.co/sriksven/ToolSmith-8b

SGLang

How to use sriksven/ToolSmith-8b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sriksven/ToolSmith-8b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/ToolSmith-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sriksven/ToolSmith-8b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sriksven/ToolSmith-8b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use sriksven/ToolSmith-8b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sriksven/ToolSmith-8b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sriksven/ToolSmith-8b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sriksven/ToolSmith-8b to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="sriksven/ToolSmith-8b",
    max_seq_length=2048,
)

Docker Model Runner
How to use sriksven/ToolSmith-8b with Docker Model Runner:
```
docker model run hf.co/sriksven/ToolSmith-8b
```

ToolSmith-8b / README.md

sriksven

Create README.md

1204477 verified 15 days ago

metadata

license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - function-calling
  - tool-use
  - qlora
  - unsloth
  - qwen2.5
  - agents
  - json
datasets:
  - glaiveai/glaive-function-calling-v2
language:
  - en
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: krishna-toolcall-7b
    results: []

krishna-toolcall-7b

A fine-tuned Qwen2.5-7B-Instruct model specialized for reliable JSON tool/function calling in AI agent workflows. It is tuned to emit function calls as structured JSON consistently, which makes it suitable for local agentic pipelines where tool-invocation accuracy matters.

Key Details


| Item | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA (4-bit NF4, rank 16, alpha 16) |
| Library | Unsloth + TRL SFTTrainer |
| Dataset | glaiveai/glaive-function-calling-v2 (10K examples) |
| Hardware | NVIDIA RTX A5000 (24 GB VRAM) on RunPod |
| Training time | ~2.75 hours |
| Final loss | 0.375 |
| Parameters trained | 40.4M of 7.66B (0.53%) |
| Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Output | Merged 16-bit safetensors |
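
The "Parameters trained" figure can be cross-checked with a short calculation. The sketch below assumes LoRA adapters on all seven linear projections of every decoder layer; the card does not list the target modules, so that choice is an assumption. The layer dimensions are those of the Qwen2.5-7B architecture.

```python
# Qwen2.5-7B architecture dimensions (from the base model's published config).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

RANK = 16  # LoRA rank from the table above

# ASSUMPTION: adapters on all seven linear projections in each decoder layer.
# The card does not state which modules were targeted.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"{total:,} trainable parameters")   # 40,370,176
print(f"{total / 7.66e9:.2%} of 7.66B")    # 0.53%
```

Under that assumption the count comes to about 40.4M, which agrees with the figure in the table.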

Training Metrics

Training ran for 500 steps, about 3.2 epochs. The logged loss fell from 1.172 at step 10 to 0.295 at step 500, with stable gradient norms throughout.

| Step | Loss | Epoch |
|---|---|---|
| 10 | 1.172 | 0.06 |
| 100 | 0.428 | 0.64 |
| 250 | 0.348 | 1.60 |
| 400 | 0.331 | 2.57 |
| 500 | 0.295 | 3.21 |
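
The step and epoch counts also hint at how densely examples were packed. The rough calculation below combines them with the effective batch size of 16 from Training Infrastructure and the 10K dataset size from Key Details. It assumes all 10K examples were used for training with no held-out split, which the card does not confirm, and the epoch value is rounded, so the results are approximate.

```python
total_steps = 500          # from the metrics table
epochs_completed = 3.21    # epoch value logged at step 500
effective_batch = 16       # 4 per device x 4 accumulation steps
dataset_examples = 10_000  # ASSUMPTION: every example was used for training

# Optimizer steps needed to see the whole (packed) dataset once.
steps_per_epoch = total_steps / epochs_completed

# Each step consumes `effective_batch` packed sequences.
sequences_per_epoch = steps_per_epoch * effective_batch

# Average number of original examples packed into one training sequence.
examples_per_sequence = dataset_examples / sequences_per_epoch

print(f"steps per epoch:        {steps_per_epoch:.1f}")
print(f"packed sequences/epoch: {sequences_per_epoch:.0f}")
print(f"examples per sequence:  {examples_per_sequence:.2f}")
```

Under these assumptions, roughly four conversations share each packed sequence, which is plausible for short function-calling dialogues.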

Usage

Transformers

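from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID as written in this card. The hosting page lists the repository as
# "sriksven/ToolSmith-8b"; use that ID if this one does not resolve.
model_id = "sriksven/krishna-toolcall-7b"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The system prompt lists the available functions as JSON schemas.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant with access to the following functions. "
            "Use them if required -\n"
            '{"name": "get_weather", "description": "Get current weather", '
            '"parameters": {"type": "object", "properties": {"location": '
            '{"type": "string"}}, "required": ["location"]}}'
        ),
    },
    {"role": "user", "content": "What's the weather in Boston?"},
]

# return_dict=True yields input_ids plus attention_mask, so generate() can
# take the encoding as keyword arguments.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))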
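
The system prompt in the example above embeds the tool schema as a hand-written JSON string. One way to reduce quoting mistakes is to keep schemas as Python dicts and serialize them with `json.dumps`. The helper `build_messages` below is hypothetical, and joining multiple tools with a newline is an assumption about the training format, not something the card specifies.

```python
import json

SYSTEM_PREFIX = (
    "You are a helpful assistant with access to the following functions. "
    "Use them if required -\n"
)

# The same schema as in the usage example, kept as a Python dict.
get_weather = {
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def build_messages(tools, user_text):
    """Serialize tool schemas into the system prompt and add the user turn."""
    # ASSUMPTION: one JSON object per line when several tools are given.
    tool_block = "\n".join(json.dumps(tool) for tool in tools)
    return [
        {"role": "system", "content": SYSTEM_PREFIX + tool_block},
        {"role": "user", "content": user_text},
    ]


messages = build_messages([get_weather], "What's the weather in Boston?")
print(messages[0]["content"])
```

For a single tool, this produces the same system prompt string as the hand-written version.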

Unsloth (faster inference)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sriksven/krishna-toolcall-7b",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

Intended Use

- Building AI agents that invoke tools via structured JSON function calls
- Local/private agentic pipelines where API-based models are not an option
- Prototyping multi-agent systems with reliable tool-use behavior
- Research on function-calling capabilities in open-weight 7B models

Limitations

- Trained on synthetic function-calling data (glaive-v2), not real API traces
- Only 10K training examples, so it may not cover all tool-calling edge cases
- No RLHF or DPO alignment applied, so outputs may occasionally be off-format
- Best used with the ChatML prompt template matching the training format
- Not suitable for safety-critical applications without additional validation
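
Because outputs may occasionally be off-format, it can help to validate each generated call before executing it. The sketch below is one possible check, not the author's method. It assumes the reply contains a JSON object with `name` and `arguments` keys; the card does not document the exact output format, so adjust the extraction to what the model actually emits. The helpers `extract_call` and `validate_call` are hypothetical.

```python
import json

# Schema copied from the usage example in this card.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def extract_call(text):
    """Return the first JSON object in `text` that has a "name" key, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "name" in obj:
            return obj
        start = text.find("{", start + 1)
    return None


def validate_call(call, tools):
    """Check that the call names a known tool and supplies every required argument."""
    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return False
    args = call.get("arguments", {})
    if isinstance(args, str):  # some formats JSON-encode the arguments
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return False
    if not isinstance(args, dict):
        return False
    required = tools[name]["parameters"].get("required", [])
    return all(key in args for key in required)


tools = {GET_WEATHER["name"]: GET_WEATHER}
sample = 'Sure. {"name": "get_weather", "arguments": {"location": "Boston"}}'
call = extract_call(sample)
print(call, validate_call(call, tools))
```

A call that fails validation can be retried or rejected rather than passed to the tool. This check covers only tool names and required keys; type checking against the full JSON Schema would need a dedicated validator.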

Training Infrastructure


| Component | Value |
|---|---|
| GPU | NVIDIA RTX A5000 24 GB |
| Cloud | RunPod ($0.27/hr) |
| Framework | Unsloth 2026.5.2 + TRL + Transformers 5.5.0 |
| Precision | BF16 training, 4-bit NF4 base quantization |
| Optimizer | AdamW 8-bit |
| Learning rate | 2e-4, linear decay |
| Batch size | 16 effective (4 per device × 4 accumulation) |
| Packing | Enabled |
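
For readers reproducing the run, the table can be transcribed into a plain configuration dict. The key names below follow Hugging Face `TrainingArguments` / TRL `SFTConfig` conventions. That mapping is an assumption, since the actual training script is not shown here. The cost estimate assumes billing for exactly the reported ~2.75 hours of training time.

```python
# Hyperparameters transcribed from the table above.
# ASSUMPTION: key names mirror TrainingArguments / SFTConfig fields; the
# author's real configuration may be organized differently.
train_config = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "linear",
    "optim": "adamw_8bit",
    "bf16": True,
    "packing": True,
}

num_gpus = 1  # a single RTX A5000

# Effective batch size = per-device batch x accumulation steps x GPUs.
effective_batch = (
    train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
    * num_gpus
)

# Rough GPU rental cost: reported training time x hourly rate.
training_hours = 2.75
hourly_rate_usd = 0.27
estimated_cost_usd = training_hours * hourly_rate_usd

print(f"effective batch size: {effective_batch}")
print(f"estimated GPU cost:   ${estimated_cost_usd:.2f}")
```

The estimate excludes setup, data preparation, and model merging time, so the real bill would likely be somewhat higher.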

Source Code

Training scripts and configs: github.com/sriksven/LLM-FineTune-Suite

License

Apache 2.0