🚀 TIGER-OM (SKT-OM) - 13B MoE Agentic Model

Advanced 13B Mixture-of-Experts (MoE) Model optimized for Agentic RAG with Think Mode & Plugin Architecture.

Built for AMD Developer Hackathon 2026 using AMD Developer Cloud.


📊 Model Details

  • Model Name: TIGER-OM (SKT-OM)
  • Architecture: Mixture of Experts (MoE)
  • Total Parameters: 13B (far fewer parameters are active per token thanks to MoE sparsity)
  • Base Models:
    • Primary Base: Shrijanagain/ST-X-0
    • Expert Integration: Mistral-7B
  • Format: Safetensors (Safe & Fast loading)
  • Quantization: FP16 / BF16 (original weights); Q4_K_M GGUF available in a separate repo
  • Context Length: 8192 tokens
  • Training Hardware: AMD Developer Cloud GPUs ($100 developer credits)
  • Inference Optimized: ROCm 7.0 + vLLM + AMD MI300X

🌟 Key Features

  • True MoE Architecture — Sparse activation for better efficiency and performance
  • Think Mode Reasoning — Advanced Chain-of-Thought, Planning, Self-Reflection & Verification
  • Dynamic Plugin System — Intelligent routing to Code, Math, Search, and Data Analysis plugins (a minimal parsing-and-routing sketch follows this list)
  • Agentic Capabilities — Full LangGraph multi-agent workflow
  • Advanced RAG Integration — SKT RAG + Query Rewriting + Multi-hop + Reranking
  • Stateful Memory — Persistent conversation context
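
The exact Think Mode output format and plugin-call schema are not documented in this card. The sketch below is a hypothetical illustration only: it assumes the reasoning trace is wrapped in <think>...</think> tags and that plugin calls can be dispatched from a simple name/payload pair; neither detail is confirmed by the model.

import re

# Hypothetical plugin registry; the real plugin head may expose a different interface.
PLUGINS = {
    "search": lambda query: f"[search results for: {query}]",
    "code":   lambda source: f"[execution output for: {source!r}]",
}

def parse_think_mode(text):
    """Split an assumed <think>...</think> reasoning block from the visible answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

def dispatch_plugin(name, payload):
    """Route a plugin call emitted by the model to a local tool (illustrative only)."""
    if name not in PLUGINS:
        raise ValueError(f"unknown plugin: {name}")
    return PLUGINS[name](payload)

demo = "<think>The user needs fresh data, so call the search plugin.</think>Searching now."
reasoning, answer = parse_think_mode(demo)
print(reasoning)                                              # hidden reasoning trace
print(dispatch_plugin("search", "MI300X memory bandwidth"))   # routed tool call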

🏗️ Architecture Breakdown

TIGER-OM is built on a 13B MoE backbone:

  • Base: Shrijanagain/ST-X-0 (strong foundational model)
  • Experts: Fine-tuned using Mistral-7B as expert layers for specialized reasoning and tool-use capabilities
  • Router Network: Learned gating mechanism for expert selection
  • Think Mode Layer: Custom system prompt + reasoning controller
  • Plugin Head: Tool calling & execution layer

This hybrid approach (ST-X-0 + Mistral-7B experts) gives excellent reasoning, code understanding, and general intelligence while maintaining MoE efficiency.
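
The router network above is described only as a learned gating mechanism; the number of experts, top-k value, and hidden size are not published here. The PyTorch sketch below is a generic top-k gating layer of the kind used in sparse MoE blocks, with made-up dimensions, not the actual TIGER-OM implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Generic learned gating: score all experts per token, keep only the top-k."""
    def __init__(self, hidden_size=4096, num_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size)
        logits = self.gate(hidden_states)                  # (batch, seq, num_experts)
        weights, expert_ids = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # renormalise over the chosen experts
        return weights, expert_ids                         # used to mix the selected experts' outputs

router = TopKRouter()
w, ids = router(torch.randn(1, 16, 4096))
print(w.shape, ids.shape)  # torch.Size([1, 16, 2]) torch.Size([1, 16, 2])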


📁 Files in this Repo (Safetensors)

  • model-00001-of-0000X.safetensors → Main model weights
  • config.json
  • tokenizer.json / tokenizer_config.json
  • generation_config.json
  • special_tokens_map.json
  • model.safetensors.index.json

All weights are stored in the safetensors format, so there is no pickle deserialization risk.
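
If you only want to inspect the shards, the safetensors library can read tensor metadata without materialising the weights. A minimal sketch, assuming the files above have been downloaded locally; the shard filename mirrors the placeholder name listed above rather than a literal path.

import json
from safetensors import safe_open

# model.safetensors.index.json maps every tensor name to the shard that stores it.
with open("model.safetensors.index.json") as f:
    index = json.load(f)
shards = set(index["weight_map"].values())
print(f"{len(index['weight_map'])} tensors across {len(shards)} shards")

# Lazily open one shard and list a few tensor names and shapes without loading them.
with safe_open("model-00001-of-0000X.safetensors", framework="pt") as f:
    for name in list(f.keys())[:5]:
        print(name, f.get_slice(name).get_shape())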


🚀 How to Use (Safetensors)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Shrijanagain/TIGER-OM"

# Load the tokenizer and the sharded safetensors weights in bfloat16,
# letting accelerate place layers across the available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = """You are SKT-OM, an advanced agentic AI with Think Mode enabled.
User Query: Calculate training cost comparison and suggest best option..."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 1024 new tokens with moderate temperature and nucleus sampling.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
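
For higher-throughput serving on MI300X (see the stack section below), the same checkpoint can also be loaded with vLLM. A minimal sketch; dtype and max_model_len are reasonable choices given the card's stated BF16 weights and 8192-token context, not tuned values.

from vllm import LLM, SamplingParams

llm = LLM(
    model="Shrijanagain/TIGER-OM",
    dtype="bfloat16",
    max_model_len=8192,
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

prompt = """You are SKT-OM, an advanced agentic AI with Think Mode enabled.
User Query: Summarise the trade-offs between MoE and dense 13B models."""

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)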

🛠️ Technologies & Stack

  • Base Models: Shrijanagain/ST-X-0 + Mistral-7B Experts
  • RAG: SKT RAG + AMD ADK Kit
  • Agents: LangGraph (see the workflow sketch after this list)
  • Hardware: AMD MI300X + ROCm 7.0
  • Inference: vLLM (FP16) + transformers (Safetensors)
  • Training: AMD Developer Cloud
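
The LangGraph workflow itself is not included in this card, so the sketch below only shows the general shape of a two-node graph (think, then answer) around a placeholder generate() call. The node names, state schema, and generate() stub are illustrative, not the project's actual agent graph.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    plan: str
    answer: str

def generate(prompt: str) -> str:
    # Placeholder for a call into the model loaded above (transformers or vLLM).
    return f"[model output for: {prompt}]"

def think(state: AgentState) -> dict:
    return {"plan": generate(f"Think step by step about: {state['query']}")}

def answer(state: AgentState) -> dict:
    return {"answer": generate(f"Plan: {state['plan']}\nAnswer the query: {state['query']}")}

graph = StateGraph(AgentState)
graph.add_node("think", think)
graph.add_node("answer", answer)
graph.set_entry_point("think")
graph.add_edge("think", "answer")
graph.add_edge("answer", END)

app = graph.compile()
print(app.invoke({"query": "Compare MI300X and a dense 13B deployment."})["answer"])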

⚡ Performance

  • Excellent balance of quality vs efficiency due to MoE architecture
  • Strong performance on reasoning, tool-use, code, and multi-step tasks
  • Significantly lower inference cost than dense 13B+ models (an illustrative calculation follows this list)
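
The expert count and routing top-k are not published in this card, so exact active-parameter numbers cannot be stated. The calculation below is purely illustrative, using made-up numbers that happen to total 13B, to show why sparse activation cuts per-token compute relative to a dense model of the same size.

def active_params(shared_b, expert_b, num_experts, top_k):
    """Total vs. per-token active parameters (in billions) for a sparse MoE; illustrative only."""
    total = shared_b + num_experts * expert_b
    active = shared_b + top_k * expert_b
    return total, active

# Hypothetical configuration, NOT the published TIGER-OM layout.
total, active = active_params(shared_b=3.0, expert_b=1.25, num_experts=8, top_k=2)
print(f"total = {total:.1f}B, active per token = {active:.1f}B")  # total = 13.0B, active per token = 5.5B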

📌 Use Cases

  • Complex technical Q&A
  • Agentic workflows & tool calling
  • Research assistance
  • Code generation & debugging
  • Mathematical & logical reasoning
  • Comparative analysis
  • Data analysis with plugins

🏆 Hackathon

AMD Developer Hackathon 2026
Trained entirely on AMD Developer Cloud
Built fully in public, with multiple technical updates shared along the way.


📄 License

MIT License

