LLJYY
/

SEALION-TC-v1

@@ -1,58 +1,55 @@
 ---
 base_model: aisingapore/Qwen-SEA-LION-v4-8B-VL
 library_name: transformers
-model_name: SEALION-Test
 tags:
-- generated_from_trainer
-- trl
-- sft
-licence: license
 ---
-# Model Card for SEALION-Test
-This model is a fine-tuned version of [aisingapore/Qwen-SEA-LION-v4-8B-VL](https://huggingface.co/aisingapore/Qwen-SEA-LION-v4-8B-VL).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-## Quick start
-```python
-from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="LLJYY/SEALION-Test", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-## Training procedure
-This model was trained with SFT.
-### Framework versions
-- TRL: 0.26.0
-- Transformers: 4.57.3
-- Pytorch: 2.9.1
-- Datasets: 4.4.1
-- Tokenizers: 0.22.1
-## Citations
-Cite TRL as:
-```bibtex
-@misc{vonwerra2022trl,
-	title        = {{TRL: Transformer Reinforcement Learning}},
-	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-	year         = 2020,
-	journal      = {GitHub repository},
-	publisher    = {GitHub},
-	howpublished = {\url{https://github.com/huggingface/trl}}
-}
-```

 ---
 base_model: aisingapore/Qwen-SEA-LION-v4-8B-VL
 library_name: transformers
+model_name: SeaLION-TC-v1
 tags:
+- function-calling
+- tool-use
+- agent
+- sealion
+- qlora
+- bfcl-v4
+license: apache-2.0
+language:
+- en
+- zh
+- th
+- vi
+- id
 ---
+# Model Card for SeaLION-TC v1 (Tool Chain)
+**SeaLION-TC v1** is a specialized QLoRA fine-tune of [aisingapore/Qwen-SEA-LION-v4-8B-VL](https://huggingface.co/aisingapore/Qwen-SEA-LION-v4-8B-VL), engineered specifically for **Agentic Workflow Orchestration** and **Function Calling**.
+Unlike general-purpose chat models, this adapter was trained to enforce strict syntax compliance for tool usage while prioritizing safety (hallucination resistance). It is designed to act as a reliable "Edge Agent" for orchestrating multi-step tasks in regional contexts.
+This model was built for HackRift 2025 at the Singapore Institute of Technology.
+## 🏆 Benchmark Performance (BFCL v4)
+This model was evaluated on the **Berkeley Function Calling Leaderboard (BFCL v4)** against the base SeaLION Instruct model.
+**Key Result:** Compared to the base model, SeaLION-TC v1 improves **Safety (Irrelevance)** by **12.08 percentage points** and **Real-World Multitasking (Live Parallel)** by **25 percentage points**.
+| Metric | SeaLION Base | **SeaLION-TC v1** | Delta | Analysis |
+| :--- | :--- | :--- | :--- | :--- |
+| **Irrelevance (Safety)** | 79.17% | **91.25%** | 🟢 **+12.08%** | Significantly reduced hallucinated tool calls during casual conversation. |
+| **Live Parallel** | 50.00% | **75.00%** | 🟢 **+25.00%** | Massive gain in handling simultaneous, multi-intent requests. |
+| **Simple Python** | 95.00% | **93.50%** | 🔴 -1.50% | Negligible trade-off for increased safety. |
+| **Simple JS** | 76.00% | **70.00%** | 🔴 -6.00% | **Known Limitation:** Non-Python syntax degraded slightly. |
+Results on the remaining BFCL v4 categories are within the margin of error or show slight improvements.
+The full benchmark suite and comparison will be published later.
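The Delta column in the table above is a plain difference between two accuracy percentages, so it measures percentage points rather than relative improvement. The sketch below recomputes both figures from the four rows shown; the `deltas` helper and the `SCORES` dictionary are illustrative names and not part of the BFCL tooling.

```python
# Scores copied from the benchmark table above (percent accuracy):
# metric -> (SeaLION Base, SeaLION-TC v1)
SCORES = {
    "Irrelevance (Safety)": (79.17, 91.25),
    "Live Parallel": (50.00, 75.00),
    "Simple Python": (95.00, 93.50),
    "Simple JS": (76.00, 70.00),
}


def deltas(base: float, tuned: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = round(tuned - base, 2)
    relative = round((tuned - base) / base * 100, 2)
    return points, relative


if __name__ == "__main__":
    for metric, (base, tuned) in SCORES.items():
        points, relative = deltas(base, tuned)
        print(f"{metric}: {points:+.2f} pp ({relative:+.2f}% relative)")
```

Reporting both numbers avoids ambiguity: a gain from 50% to 75% is 25 percentage points but a 50% relative improvement.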
+## ⚠️ Intended Use & Limitations
+### Best For:
+* **Python-based Agentic Backends:** The model is highly optimized for Python function definitions.
+* **RAG Orchestration:** Excellent at selecting relevant tools from long lists (`Multiple` score: 94.5%).
+* **Edge Deployment:** Optimized for 4-bit quantization (GGUF) on consumer hardware (e.g., NVIDIA GeForce, AMD Ryzen AI).
+### Known Limitations:
+* **The "Alignment Tax":** In exchange for higher safety and parallel reasoning, the model's ability to generate valid **JavaScript** and **Java** tool calls has regressed by roughly 5 to 6 percentage points compared to the base model.
+* **Vision Capabilities:** While based on a VLM, this fine-tune focused exclusively on text-based function calling. Vision-related tool usage has not been strictly benchmarked.
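To make the Irrelevance and parallel-call behaviour described above concrete, here is a minimal sketch of how an agent backend might check a model reply before executing anything. It is not the BFCL harness. It assumes the reply wraps each call in `<tool_call>` tags containing a JSON object with `name` and `arguments` keys, a convention common to Qwen-family chat templates that should be verified against this model's actual output. The `get_weather` tool is hypothetical.

```python
import json
import re

# Hypothetical tool list in the JSON-schema style used by many function-calling APIs.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]

# Assumed output format: one JSON object per <tool_call>...</tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str, tools: list[dict]) -> list[tuple[str, dict]]:
    """Return valid (name, arguments) pairs; an empty list means no tool call."""
    known = {tool["name"]: tool for tool in tools}
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a syntax failure
        if not isinstance(payload, dict):
            continue
        name = payload.get("name")
        args = payload.get("arguments", {})
        if not isinstance(name, str) or name not in known:
            continue  # the reply named a tool that was never offered
        if not isinstance(args, dict):
            continue
        required = known[name]["parameters"].get("required", [])
        if all(key in args for key in required):
            calls.append((name, args))
    return calls
```

A casual reply with no tags yields an empty list, which is the outcome the Irrelevance category rewards, while a reply with several tagged blocks yields one entry per call, as in the Live Parallel category.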