Model Card for SeaLION-TC v1 (Tool Calling)

SeaLION-TC v1 is a specialized QLoRA fine-tune of aisingapore/Qwen-SEA-LION-v4-8B-VL, engineered specifically for Agentic Workflow Orchestration and Function Calling.

Unlike general-purpose chat models, this adapter was trained to enforce strict syntax compliance for tool usage while prioritizing safety (hallucination resistance). It is designed to act as a reliable "Edge Agent" for orchestrating multi-step tasks in regional contexts.

This model was built for HackRift 2025 at the Singapore Institute of Technology.
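As a quick orientation, below is a minimal sketch of invoking the model for a single tool call via the transformers chat template. The `get_weather` tool is purely illustrative, the `Auto*` classes are assumptions (the underlying Qwen3-VL architecture may require a vision-aware class such as `AutoModelForImageTextToText`), and the `<tool_call>` wrapper follows the Qwen convention rather than anything confirmed by this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LLJYY/SEALION-TC-v1"  # this repo; adjust if loading base + adapter separately

# One tool definition in the JSON-schema style accepted by chat templates.
# `get_weather` is illustrative; it does not ship with the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Singapore'."}
            },
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [{"role": "user", "content": "What's the weather in Singapore right now?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=False))
# Expected shape of a compliant reply (assumed Qwen-style convention):
# <tool_call>{"name": "get_weather", "arguments": {"city": "Singapore"}}</tool_call>
```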

πŸ† Benchmark Performance (BFCL v4)

This model was evaluated on the Berkeley Function Calling Leaderboard (BFCL v4) against the base SeaLION Instruct model.

Key Result: We achieved a 12-point improvement in Safety (Irrelevance) and a 25-point improvement in Real-World Multitasking (Live Parallel) compared to the base model.

| Metric | SeaLION Base | SeaLION-TC v1 | Delta | Analysis |
|---|---|---|---|---|
| Irrelevance (Safety) | 79.17% | 91.25% | 🟒 +12.08% | Significantly reduced hallucinated tool calls during casual conversation. |
| Live Parallel | 50.00% | 75.00% | 🟒 +25.00% | Massive gain in handling simultaneous, multi-intent requests. |
| Live Parallel Multiple | 54.17% | 70.83% | 🟒 +16.66% | Improved orchestration of complex, concurrent tool calls. |
| Simple Python | 95.00% | 93.50% | πŸ”΄ -1.50% | Negligible trade-off for increased safety. |
| Simple JS | 76.00% | 70.00% | πŸ”΄ -6.00% | Known limitation: non-Python syntax degraded slightly. |

The remaining tests are within the margin of error or show slight improvements. The full benchmark suite and comparison will be published soon.
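To make the Irrelevance metric concrete, here is a short sketch (reusing `tokenizer`, `model`, and `tools` from the snippet above) of the behavior it rewards: when tools are available but the message is casual chat, the model should reply in plain text rather than emit a tool call. The `<tool_call>` tag is again the assumed Qwen convention.

```python
# The tools are still registered, but the request is casual chit-chat.
# A safe agent should answer in plain text, not hallucinate a tool call.
messages = [{"role": "user", "content": "Haha nice, how was your weekend?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

reply = tokenizer.decode(
    model.generate(inputs, max_new_tokens=128)[0][inputs.shape[-1]:],
    skip_special_tokens=False,
)
assert "<tool_call>" not in reply  # the refusal behavior the Irrelevance score measures
```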

⚠️ Intended Use & Limitations

Best For:

  • Python-based Agentic Backends: The model is highly optimized for Python function definitions.
  • RAG Orchestration: Excellent at selecting relevant tools from long lists (Multiple score: 94.5%).
  • Edge Deployment: Optimized for 4-bit quantization (GGUF) on consumer hardware (e.g., NVIDIA GeForce, AMD Ryzen AI); see the loading sketch after this list.
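A minimal edge-deployment sketch using llama-cpp-python, assuming a 4-bit GGUF quant is published in this repo; the quant filename pattern is hypothetical and should be matched to the file actually uploaded.

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LLJYY/SEALION-TC-v1",
    filename="*Q4_K_M.gguf",   # hypothetical quant name; match the published file
    n_ctx=8192,                # headroom for long tool lists plus the conversation
    n_gpu_layers=-1,           # offload all layers when VRAM allows
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Book a taxi to Changi Airport at 6pm."}],
    max_tokens=256,
)
print(resp["choices"][0]["message"]["content"])
```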

Known Limitations:

  • The "Alignment Tax": In exchange for higher safety and parallel reasoning, the model's ability to generate valid Javascript tool calls has regressed by ~5-6% compared to the base model.
  • Vision Capabilities: While based on a VLM, this fine-tune focused exclusively on text-based function calling. Vision-related tool usage has not been strictly benchmarked.

βš™οΈ Training procedure

This model was trained using TRL with QLoRA instruction tuning; a configuration sketch follows the hyperparameter list below.

Training Hyperparameters

  • Compute: 1x NVIDIA RTX 3090 (24GB VRAM)
  • Precision: 4-bit (NF4) Quantization
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • LoRA Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Strategy: Checkpoint selected at step 1000 via early stopping on agentic capability (BFCL v4).
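A configuration sketch matching the hyperparameters above, assuming a recent TRL version; the dataset path, output directory, and save cadence are illustrative placeholders, not the values actually used.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset of function-calling conversations in chat format.
dataset = load_dataset("json", data_files="tool_calls.jsonl", split="train")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # 4-bit NF4, as listed above
    bnb_4bit_compute_dtype=torch.bfloat16,
)

peft_config = LoraConfig(
    r=32,                                    # LoRA rank
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="aisingapore/Qwen-SEA-LION-v4-8B-VL",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="sealion-tc-v1",          # illustrative
        save_steps=250,                      # illustrative; the step-1000 checkpoint was kept
        model_init_kwargs={"quantization_config": bnb_config},
    ),
)
trainer.train()
```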

Citations (WIP)

Berkeley Function Calling Leaderboard:

@article{patil2023gorilla,
    title={Gorilla: Large Language Model Connected with Massive APIs},
    author={Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
    journal={arXiv preprint arXiv:2305.15334},
    year={2023}
}

SeaLION (AI Singapore):

@article{sealion2024,
    title={SeaLION: Southeast Asian Languages In One Network},
    author={AI Singapore},
    year={2024}
}