# Model Card for SeaLION-TC v1 (Tool Calling)
SeaLION-TC v1 is a specialized QLoRA fine-tune of aisingapore/Qwen-SEA-LION-v4-8B-VL, engineered specifically for Agentic Workflow Orchestration and Function Calling.
Unlike general-purpose chat models, this adapter was trained to enforce strict syntax compliance for tool usage while prioritizing safety (hallucination resistance). It is designed to act as a reliable "Edge Agent" for orchestrating multi-step tasks in regional contexts.
This model was built for HackRift 2025 at the Singapore Institute of Technology.
## 📊 Benchmark Performance (BFCL v4)
This model was evaluated on the Berkeley Function Calling Leaderboard (BFCL v4) against the base SeaLION Instruct model.
Key Result: We achieved a +12% improvement in Safety (Irrelevance) and a +25% improvement in Real-World Multitasking (Live Parallel) compared to the base model.
| Metric | SeaLION Base | SeaLION-TC v1 | Delta | Analysis |
|---|---|---|---|---|
| Irrelevance (Safety) | 79.17% | 91.25% | 🟢 +12.08% | Significantly reduced hallucinated tool calls during casual conversation. |
| Live Parallel | 50.00% | 75.00% | 🟢 +25.00% | Massive gain in handling simultaneous, multi-intent requests. |
| Live Parallel Multiple | 54.17% | 70.83% | 🟢 +16.66% | Improved orchestration of complex, concurrent tool calls. |
| Simple Python | 95.00% | 93.50% | 🔴 -1.50% | Negligible trade-off for increased safety. |
| Simple JS | 76.00% | 70.00% | 🔴 -6.00% | Known limitation: non-Python syntax degraded slightly. |
The remaining tests stay within the margin of error or show slight improvements. The full benchmark suite and comparison will be published later.
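The "Live Parallel" categories above measure handling of several tool calls in a single assistant turn. A rough sketch of splitting a multi-intent emission into individual calls; the list-of-calls JSON shape is an assumption for illustration, not BFCL's exact wire format:

```python
import json

def parse_parallel_calls(raw: str) -> list[dict]:
    """Split one assistant turn into individual tool calls.
    Accepts either a single call object or a JSON list of calls."""
    payload = json.loads(raw)
    calls = payload if isinstance(payload, list) else [payload]
    # Keep only entries that actually look like tool calls.
    return [c for c in calls if isinstance(c, dict) and "name" in c]

# A multi-intent request ("weather in SG and KL") may yield two calls at once.
turn = ('[{"name": "get_weather", "arguments": {"city": "Singapore"}},'
        ' {"name": "get_weather", "arguments": {"city": "Kuala Lumpur"}}]')
calls = parse_parallel_calls(turn)
```

An orchestrator can then dispatch the returned calls concurrently, which is exactly the capability the Live Parallel score tracks.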
## ⚠️ Intended Use & Limitations
**Best For:**
- Python-based Agentic Backends: The model is highly optimized for Python function definitions.
- RAG Orchestration: Excellent at selecting relevant tools from long lists (Multiple score: 94.5%).
- Edge Deployment: Optimized for 4-bit quantization (GGUF) on consumer hardware (e.g., NVIDIA GeForce, AMD Ryzen AI).
**Known Limitations:**
- The "Alignment Tax": In exchange for higher safety and parallel reasoning, the model's ability to generate valid JavaScript tool calls has regressed by ~5-6% compared to the base model.
- Vision Capabilities: While based on a VLM, this fine-tune focused exclusively on text-based function calling. Vision-related tool usage has not been strictly benchmarked.
## ⚙️ Training Procedure
This model was trained using TRL with QLoRA instruction tuning.
### Training Hyperparameters
- Compute: 1x NVIDIA RTX 3090 (24GB VRAM)
- Precision: 4-bit (NF4) Quantization
- LoRA Rank: 32
- LoRA Alpha: 64
- LoRA Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Strategy: Checkpoint selection via early stopping based on agentic capability (BFCL v4) at step 1000.
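The hyperparameters above can be written out as a PEFT-style LoRA configuration. This is a sketch of the values listed in this card, not the exact training script; with `peft` installed, the same fields map directly onto `peft.LoraConfig(**lora_config)`.

```python
# Sketch of the LoRA settings listed above as a plain config dict.
# (Assumption: field names follow the peft.LoraConfig convention.)
lora_config = {
    "r": 32,                  # LoRA rank
    "lora_alpha": 64,         # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# The effective LoRA scaling factor applied to adapter outputs is alpha / r.
scaling = lora_config["lora_alpha"] / lora_config["r"]
```

With alpha set to twice the rank, the adapter updates are scaled by a factor of 2, a common heuristic for QLoRA fine-tunes.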
## Citations (WIP)
Berkeley Function Calling Leaderboard:
```bibtex
@article{patil2023gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  journal={arXiv preprint arXiv:2305.15334},
  year={2023}
}
```
SeaLION (AI Singapore):
```bibtex
@article{sealion2024,
  title={SeaLION: Southeast Asian Languages In One Network},
  author={AI Singapore},
  year={2024}
}
```