Orbi-1B
Orbi-1B is a fine-tuned variant of TinyLlama-1.1B-Chat specialized for function calling and robotic assistant interactions. The model is trained to generate structured tool calls in response to natural language commands.
Model Description
- Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Model Size: 1.1B parameters
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Precision: bfloat16
- License: Apache 2.0 (inherited from base model)
Intended Use
Orbi-1B is designed to act as the "brain" of a robotic assistant named Orbi. It translates natural language user requests into structured JSON tool calls that can be executed by downstream systems.
Supported Tools
The model can generate calls for the following functions:
- Physical Actions: smile(), cry(), move_hands(), dance()
- Content Generation: tell_news(), tell_story()
- Information: whats_your_name(), who_am_i()
- Utilities: answer_arithmetic(), english_learning()
Usage
Installation
```bash
pip install transformers torch peft accelerate
```
Basic Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "Arojit/orbi-1b"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # uses a GPU when available; requires accelerate
)

system_prompt = """You are Orbi's brain.
Respond with one or more <tool_call> JSON blocks, in the exact order the user requests actions,
each calling the best tool for the user's request. Do not write stories yourself.
Do not summarize news yourself. Map synonyms to the tool argument enums.
If parameters are missing, pick sensible defaults. Keep outputs terse.
Available tools and enums:
- smile() -> {}
- cry() -> {}
- move_hands(direction ∈ {left,right,up,down,wave}, speed ∈ {slow,normal,fast})
- dance(style ∈ {hiphop,ballet,robot,random}, duration_sec ∈ [10..120])
- tell_news(topic: string)
- tell_story(topic: string, tone ∈ {wholesome,funny,dramatic,spooky,random}, length ∈ {short,medium,long})
"""

user_input = "Wave your hands quickly and smile"
prompt = f"<|system|>\n{system_prompt}\n<|user|>\n{user_input}\n<|assistant|>\n"

# Send inputs to whichever device device_map placed the model on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; temperature is ignored when do_sample=False, so it is not set
outputs = model.generate(
    **inputs,
    max_new_tokens=192,
    do_sample=False,
)

# Decode only the newly generated tokens, keeping special tokens visible
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
print(response)
```
Expected Output Format
The model generates responses in the following format:
```
<tool_call>
{"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}}
</tool_call>
<tool_call>
{"name": "smile", "arguments": {}}
</tool_call>
```
Parsing Tool Calls
```python
import json
import re


def parse_tool_calls(text):
    """Extract every <tool_call>{...}</tool_call> block and decode its JSON payload."""
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    matches = re.findall(pattern, text, re.DOTALL)
    tools = []
    for match in matches:
        try:
            tools.append(json.loads(match))
        except json.JSONDecodeError:
            # Skip malformed blocks instead of failing the whole response
            continue
    return tools


tools = parse_tool_calls(response)
print(tools)
# [{'name': 'move_hands', 'arguments': {'direction': 'wave', 'speed': 'fast'}},
#  {'name': 'smile', 'arguments': {}}]
```
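Parsing only checks that each block is valid JSON. Before anything is executed, the calls can also be checked against the argument enums and the duration range listed in the system prompt above. The following is a minimal sketch under that assumption: the enum tables are copied from the sample prompt, the four tools the prompt does not describe are accepted by name only, and absent arguments are allowed on the assumption that the downstream handler applies a default.

```python
from typing import Any, Dict, List, Tuple

# Argument enums copied from the sample system prompt (assumption: same at deployment).
ENUMS: Dict[str, Dict[str, set]] = {
    "move_hands": {
        "direction": {"left", "right", "up", "down", "wave"},
        "speed": {"slow", "normal", "fast"},
    },
    "dance": {"style": {"hiphop", "ballet", "robot", "random"}},
    "tell_story": {
        "tone": {"wholesome", "funny", "dramatic", "spooky", "random"},
        "length": {"short", "medium", "long"},
    },
}
DANCE_DURATION_RANGE = (10, 120)

# Every tool listed under Supported Tools; argument schemas for the last four
# are not documented, so they are checked by name only.
KNOWN_TOOLS = {
    "smile", "cry", "move_hands", "dance", "tell_news", "tell_story",
    "whats_your_name", "who_am_i", "answer_arithmetic", "english_learning",
}


def validate_call(call: Dict[str, Any]) -> List[str]:
    """Return the problems found in one parsed call; an empty list means valid."""
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in KNOWN_TOOLS:
        return [f"unknown tool: {name!r}"]
    if not isinstance(args, dict):
        return [f"{name}: arguments must be a JSON object"]
    problems = []
    for arg, allowed in ENUMS.get(name, {}).items():
        # Absent arguments are accepted; present ones must be an allowed string.
        if arg in args and (not isinstance(args[arg], str) or args[arg] not in allowed):
            problems.append(f"{name}.{arg}={args[arg]!r} is not one of {sorted(allowed)}")
    if name == "dance" and "duration_sec" in args:
        low, high = DANCE_DURATION_RANGE
        d = args["duration_sec"]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(d, bool) or not isinstance(d, int) or not low <= d <= high:
            problems.append(f"dance.duration_sec={d!r} is outside [{low}, {high}]")
    return problems


def split_valid(calls: List[Dict[str, Any]]) -> Tuple[list, list]:
    """Partition calls into (valid, [(call, problems), ...])."""
    valid, rejected = [], []
    for call in calls:
        problems = validate_call(call)
        if problems:
            rejected.append((call, problems))
        else:
            valid.append(call)
    return valid, rejected


calls = [
    {"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}},
    {"name": "dance", "arguments": {"style": "tango", "duration_sec": 300}},
]
valid, rejected = split_valid(calls)
print(valid)     # [{'name': 'move_hands', 'arguments': {'direction': 'wave', 'speed': 'fast'}}]
print(rejected)  # the dance call, with one problem for style and one for duration_sec
```

Rejecting rather than silently correcting out-of-range values keeps the decision about what to do with a bad call in the caller's hands, which matters once the calls drive physical hardware.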
Training Details
Training Data
The model was fine-tuned on a custom dataset of conversational examples mapping natural language commands to structured tool calls in JSONL format.
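The dataset is not published with this card and its exact schema is not documented. As one plausible layout, assuming chat-style records with a hypothetical messages field (not confirmed as the author's format), a training line could be built so that the assistant turn uses the same <tool_call> layout the model is expected to emit:

```python
import json
from typing import Any, Dict, List


def make_record(system_prompt: str, user_text: str, calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one chat-style training example (hypothetical schema)."""
    # Assistant turn: one <tool_call> block per call, in the requested order.
    assistant = "\n".join(f"<tool_call>\n{json.dumps(call)}\n</tool_call>" for call in calls)
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant},
        ]
    }


def to_jsonl(records: List[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return "".join(json.dumps(record) + "\n" for record in records)


record = make_record(
    "You are Orbi's brain.",  # abbreviated; a real record would carry the full system prompt
    "Wave your hands quickly and smile",
    [
        {"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}},
        {"name": "smile", "arguments": {}},
    ],
)
print(record["messages"][2]["content"])
print(to_jsonl([record]))
```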
Training Procedure
- Method: Supervised Fine-Tuning (SFT) with LoRA
- LoRA Configuration:
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.05
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Quantization: 4-bit NF4 during training
- Optimizer: Paged AdamW (32-bit)
- Learning Rate: 2e-4 with a cosine schedule
- Batch Size: 8 per device with 2 gradient accumulation steps (effective batch size of 16 per device)
- Epochs: 2
- Warmup Ratio: 0.03
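The training script itself is not included. As a rough sketch, assuming the Hugging Face PEFT, Transformers, and bitsandbytes stack (the card does not name its tooling), the hyperparameters above would map onto configuration objects like these. The compute dtype, bias setting, and output directory are assumptions not stated in this card.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization used while training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",  # assumption: not stated in the card
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="orbi-1b-lora",  # hypothetical path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = effective batch size of 16 per device
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=2,
    optim="paged_adamw_32bit",  # paged AdamW with 32-bit optimizer states
)
```

The base model would then be loaded with quantization_config=bnb_config and wrapped with peft's get_peft_model(model, lora_config) to obtain the trainable adapter; the paged optimizer and 4-bit loading both need bitsandbytes installed at training time.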
Limitations
- The model is specialized for a specific set of tools and may not generalize well to arbitrary function calling tasks
- Limited to 1.1B parameters, so reasoning capabilities are constrained compared to larger models
- Best performance with greedy decoding (do_sample=False, equivalent to temperature 0)
- Requires exact tool names and argument formats as specified in the system prompt
Ethical Considerations
This model is designed for robotic assistant applications. Users should:
- Ensure appropriate safety measures when connecting to physical robotic systems
- Validate all tool calls before execution
- Implement proper error handling and fallback mechanisms
- Consider privacy implications when using news/story generation features
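The validation, error-handling, and fallback points above can be combined in a small executor. This is a minimal sketch, assuming a hypothetical dispatch table of handler functions and a caller-supplied fallback; the policy shown (stop at the first problem rather than continue with later actions) is one reasonable choice for physical systems, not the card's prescribed behavior.

```python
from typing import Any, Callable, Dict, List


def execute_calls(
    calls: List[Dict[str, Any]],
    handlers: Dict[str, Callable[..., Any]],
    fallback: Callable[[str], Any],
) -> List[Any]:
    """Run tool calls in order, stopping at the first problem.

    If nothing was parsed, a tool is unknown, or a handler raises (for example
    a TypeError from unexpected arguments), execution stops and fallback(reason)
    runs instead of the remaining actions.
    """
    if not calls:
        return [fallback("no tool calls parsed")]
    results: List[Any] = []
    for call in calls:
        name = call.get("name")
        handler = handlers.get(name) if isinstance(name, str) else None
        if handler is None:
            results.append(fallback(f"unknown tool {name!r}"))
            break
        args = call.get("arguments") or {}
        try:
            results.append(handler(**args))
        except Exception as exc:  # bad arguments or a fault inside the handler
            results.append(fallback(f"{name} failed: {exc}"))
            break
    return results


# Hypothetical handlers standing in for real robot control code.
handlers = {
    "smile": lambda: "smiling",
    "move_hands": lambda direction="wave", speed="normal": f"hands {direction} ({speed})",
}


def say_sorry(reason: str) -> str:
    """Hypothetical fallback: report the reason instead of acting."""
    return f"fallback: {reason}"


print(execute_calls(
    [
        {"name": "move_hands", "arguments": {"direction": "wave", "speed": "fast"}},
        {"name": "smile", "arguments": {}},
    ],
    handlers,
    say_sorry,
))
```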
Citation
@misc{orbi-1b,
title={Orbi-1B: A Fine-tuned TinyLlama for Function Calling},
author={Arojit Ghosh},
year={2025},
howpublished={\url{https://huggingface.co/Arojit/orbi-1b}}
}
Acknowledgments
Built on top of TinyLlama-1.1B-Chat-v1.0 by the TinyLlama team.