Libraries

How to use Arojit/orbi-1b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Arojit/orbi-1b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Arojit/orbi-1b")
model = AutoModelForCausalLM.from_pretrained("Arojit/orbi-1b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use Arojit/orbi-1b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Arojit/orbi-1b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Arojit/orbi-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Arojit/orbi-1b

SGLang

How to use Arojit/orbi-1b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Arojit/orbi-1b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Arojit/orbi-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Arojit/orbi-1b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Arojit/orbi-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Orbi-1B

Orbi-1B is a fine-tuned variant of TinyLlama-1.1B-Chat specialized for function calling and robotic assistant interactions. The model is trained to generate structured tool calls in response to natural language commands.

Model Description

Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Model Size: 1.1B parameters
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Precision: bfloat16
License: Apache 2.0 (inherited from base model)

Intended Use

Orbi-1B is designed to act as the "brain" of a robotic assistant named Orbi. It translates natural language user requests into structured JSON tool calls that can be executed by downstream systems.

Supported Tools

The model can generate calls for the following functions:

Physical Actions: smile(), cry(), move_hands(), dance()
Content Generation: tell_news(), tell_story()
Information: whats_your_name(), who_am_i()
Utilities: answer_arithmetic(), english_learning()

Usage

Installation

pip install transformers torch peft accelerate

Basic Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "Arojit/orbi-1b"  # Hub repo ID; a local checkpoint path also works
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

system_prompt = """You are Orbi's brain.
Respond with one or more <tool_call> JSON blocks, in the exact order the user requests actions.
that calls the best tool for the user's request. Do not write stories yourself.
Do not summarize news yourself. Map synonyms to the tool argument enums.
If parameters are missing, pick sensible defaults. Keep outputs terse.

Available tools and enums:
- smile() -> {}
- cry() -> {}
- move_hands(direction ∈ {left,right,up,down,wave}, speed ∈ {slow,normal,fast})
- dance(style ∈ {hiphop,ballet,robot,random}, duration_sec ∈ [10..120])
- tell_news(topic: string)
- tell_story(topic: string, tone ∈ {wholesome,funny,dramatic,spooky,random}, length ∈ {short,medium,long})
"""

user_input = "Wave your hands quickly and smile"
prompt = f"<|system|>\n{system_prompt}\n<|user|>\n{user_input}\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=192,
    do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
print(response)

Expected Output Format

The model generates responses in the following format:

<tool_call>
{"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}}
</tool_call>
<tool_call>
{"name": "smile", "arguments": {}}
</tool_call>

Parsing Tool Calls

import json
import re

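def parse_tool_calls(text):
    """Extract every <tool_call> JSON block from the model output, in order."""
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    matches = re.findall(pattern, text, re.DOTALL)
    tools = []
    for match in matches:
        try:
            tools.append(json.loads(match))
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of failing the whole parse
            continue
    return tools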

tools = parse_tool_calls(response)
print(tools)
# [{'name': 'move_hands', 'arguments': {'direction': 'wave', 'speed': 'fast'}},
#  {'name': 'smile', 'arguments': {}}]
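
Once tool calls are parsed, a downstream system still has to route each one to robot code. The following is a minimal sketch of such a dispatcher, not the project's actual implementation: the `HANDLERS` registry, the `dispatch` function, and the two handler bodies (which only return strings here) are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List

# Hypothetical handlers: real ones would drive motors, a display, and so on.
def smile() -> str:
    return "smiling"

def move_hands(direction: str = "wave", speed: str = "normal") -> str:
    return f"moving hands {direction} at {speed} speed"

# Hypothetical registry mapping tool names emitted by the model to handlers.
HANDLERS: Dict[str, Callable[..., str]] = {
    "smile": smile,
    "move_hands": move_hands,
}

def dispatch(tool_calls: List[Dict[str, Any]]) -> List[str]:
    """Run parsed tool calls in order; unknown or malformed calls are reported, not executed."""
    results = []
    for call in tool_calls:
        name = call.get("name")
        handler = HANDLERS.get(name) if isinstance(name, str) else None
        if handler is None:
            results.append(f"skipped unknown tool: {name!r}")
            continue
        try:
            results.append(handler(**call.get("arguments", {})))
        except TypeError as exc:  # unexpected keyword or non-object arguments
            results.append(f"bad arguments for {name}: {exc}")
    return results

calls = [
    {"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}},
    {"name": "smile", "arguments": {}},
]
print(dispatch(calls))
```

Reporting unknown tools instead of raising keeps one bad call from aborting the rest of a multi-step request; whether that is the right policy depends on the robot.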

Training Details

Training Data

The model was fine-tuned on a custom dataset of conversational examples mapping natural language commands to structured tool calls in JSONL format.
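
The dataset is not published with this card, so its exact schema is unknown. Purely as an illustration, the sketch below shows one way a single example could be serialized as a JSONL line, reusing the `<tool_call>` block format from "Expected Output Format". The field names `user` and `assistant` and both helper functions are assumptions, not the author's format.

```python
import json

def render_tool_calls(calls):
    """Render calls as consecutive <tool_call> blocks, one JSON object per block."""
    return "\n".join(f"<tool_call>\n{json.dumps(c)}\n</tool_call>" for c in calls)

def make_record(user_text, calls):
    # HYPOTHETICAL schema: the real dataset's field names are not documented.
    record = {"user": user_text, "assistant": render_tool_calls(calls)}
    # json.dumps escapes newlines, so each record stays on one physical line.
    return json.dumps(record)

line = make_record(
    "Wave your hands quickly and smile",
    [
        {"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}},
        {"name": "smile", "arguments": {}},
    ],
)
print(line)
```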

Training Procedure

Method: Supervised Fine-Tuning (SFT) with LoRA
LoRA Configuration:
- Rank (r): 16
- Alpha: 32
- Dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization: 4-bit NF4 during training
Optimizer: Paged AdamW (32-bit)
Learning Rate: 2e-4 with cosine scheduling
Batch Size: 8 per device with 2 gradient accumulation steps
Epochs: 2
Warmup Ratio: 0.03
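
A few derived quantities follow from these hyperparameters: the effective batch size, the LoRA scaling factor (alpha / r), and a rough count of trainable adapter parameters. The sketch below works them out. The layer shapes are an assumption taken from the base model's published configuration (hidden size 2048, MLP size 5632, 22 layers, 4 key/value heads of dimension 64), not from this model card, and the number of training devices is not stated, so it is left as a parameter.

```python
# Values from the training procedure above.
RANK, ALPHA = 16, 32
PER_DEVICE_BATCH, GRAD_ACCUM = 8, 2

# ASSUMPTION: TinyLlama-1.1B layer shapes (not stated in this model card).
HIDDEN, MLP, KV_DIM, LAYERS = 2048, 5632, 256, 22

# (in_features, out_features) for each targeted projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for every adapted weight matrix."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in SHAPES.values())
    return per_layer * LAYERS

def effective_batch(num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_devices

lora_scale = ALPHA / RANK
print(f"effective batch size (1 device): {effective_batch()}")
print(f"LoRA scaling (alpha / r): {lora_scale}")
print(f"estimated trainable LoRA parameters: {lora_param_count(RANK):,}")
```

Under these assumed shapes the adapters come to roughly 12.6M parameters, about 1% of the base model.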

Limitations

The model is specialized for a specific set of tools and may not generalize well to arbitrary function calling tasks
Limited to 1.1B parameters, so reasoning capabilities are constrained compared to larger models
Best performance with greedy decoding (do_sample=False in Transformers)
Requires exact tool names and argument formats as specified in the system prompt

Ethical Considerations

This model is designed for robotic assistant applications. Users should:

Ensure appropriate safety measures when connecting to physical robotic systems
Validate all tool calls before execution
Implement proper error handling and fallback mechanisms
Consider privacy implications when using news/story generation features
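
As one way to validate tool calls before execution, the sketch below checks each call against the argument enums listed in the system prompt under "Basic Inference". `SCHEMA` and `validate_call` are hypothetical names introduced for illustration. Tools whose arguments are not documented in that prompt (for example `answer_arithmetic()`) are left out, and missing arguments are not flagged because the prompt tells the model to pick defaults.

```python
from typing import Any, Dict, List

# Constraints transcribed from the system prompt:
# a set means "one of these strings", a range means "integer within bounds",
# and str means "any string".
SCHEMA: Dict[str, Dict[str, Any]] = {
    "smile": {},
    "cry": {},
    "move_hands": {
        "direction": {"left", "right", "up", "down", "wave"},
        "speed": {"slow", "normal", "fast"},
    },
    "dance": {
        "style": {"hiphop", "ballet", "robot", "random"},
        "duration_sec": range(10, 121),  # inclusive 10..120
    },
    "tell_news": {"topic": str},
    "tell_story": {
        "topic": str,
        "tone": {"wholesome", "funny", "dramatic", "spooky", "random"},
        "length": {"short", "medium", "long"},
    },
}

def validate_call(call: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call passed every check."""
    name = call.get("name")
    if not isinstance(name, str) or name not in SCHEMA:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = []
    for key, value in args.items():
        rule = SCHEMA[name].get(key)
        if rule is None:
            problems.append(f"unexpected argument: {key}")
        elif rule is str:
            if not isinstance(value, str):
                problems.append(f"{key} must be a string")
        elif isinstance(rule, range):
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, int) or value not in rule:
                problems.append(f"{key}={value!r} must be an integer in {rule.start}..{rule.stop - 1}")
        elif not isinstance(value, str) or value not in rule:
            problems.append(f"{key}={value!r} is not one of {sorted(rule)}")
    return problems

print(validate_call({"name": "dance", "arguments": {"style": "robot", "duration_sec": 500}}))
```

Returning a list of problems rather than raising lets the caller decide whether to reject the call, clamp the value, or ask the user again.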

Citation

@misc{orbi-1b,
  title={Orbi-1B: A Fine-tuned TinyLlama for Function Calling},
  author={Arojit Ghosh},
  year={2025},
  howpublished={\url{https://huggingface.co/Arojit/orbi-1b}}
}

Acknowledgments

Built on top of TinyLlama-1.1B-Chat-v1.0 by the TinyLlama team.
