- id
---

# Model Card for SeaLION-TC v1 (Tool Calling)

**SeaLION-TC v1** is a specialized QLoRA fine-tune of [aisingapore/Qwen-SEA-LION-v4-8B-VL](https://huggingface.co/aisingapore/Qwen-SEA-LION-v4-8B-VL), engineered specifically for **Agentic Workflow Orchestration** and **Function Calling**.
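As an illustration of the function-calling loop this card targets, the sketch below defines an OpenAI-style tool schema and a small dispatcher that routes a model-emitted JSON tool call to a local Python function. The schema shape, the `get_exchange_rate` tool, and the simulated completion are all hypothetical — the exact format SeaLION-TC emits may differ.

```python
import json

# Hypothetical tool schema in the OpenAI-style JSON format commonly used
# by function-calling fine-tunes (illustrative; not the model's exact spec).
tools = [{
    "type": "function",
    "function": {
        "name": "get_exchange_rate",
        "description": "Get the exchange rate between two currencies.",
        "parameters": {
            "type": "object",
            "properties": {
                "base": {"type": "string", "description": "ISO 4217 code, e.g. 'SGD'"},
                "quote": {"type": "string", "description": "ISO 4217 code, e.g. 'IDR'"},
            },
            "required": ["base", "quote"],
        },
    },
}]

def dispatch(tool_call_json: str, registry: dict) -> str:
    """Parse a model-emitted tool call and route it to a local Python function."""
    call = json.loads(tool_call_json)
    fn = registry[call["name"]]
    return fn(**call["arguments"])

# Simulated model output (illustrative, not an actual SeaLION-TC completion):
model_output = '{"name": "get_exchange_rate", "arguments": {"base": "SGD", "quote": "IDR"}}'
registry = {"get_exchange_rate": lambda base, quote: f"1 {base} = 12000 {quote}"}
print(dispatch(model_output, registry))
```

The same pattern generalizes to any tool registry: the model only ever produces JSON, and the host application owns execution.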
This model was evaluated on the **Berkeley Function Calling Leaderboard (BFCL v4)**.

| Benchmark | Base | SeaLION-TC v1 | Delta | Notes |
| :--- | :--- | :--- | :--- | :--- |
| **Irrelevance (Safety)** | 79.17% | **91.25%** | 🟢 **+12.08%** | Significantly reduced hallucinated tool calls during casual conversation. |
| **Live Parallel** | 50.00% | **75.00%** | 🟢 **+25.00%** | Massive gain in handling simultaneous, multi-intent requests. |
| **Live Parallel Multiple** | 54.17% | **70.83%** | 🟢 **+16.66%** | Improved orchestration of complex, concurrent tool calls. |
| **Simple Python** | 95.00% | **93.50%** | 🔴 -1.50% | Negligible trade-off for increased safety. |
| **Simple JS** | 76.00% | **70.00%** | 🔴 -6.00% | **Known limitation:** non-Python syntax degraded slightly. |

Full benchmark suite and comparison to come.
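The "Live Parallel" categories measure one user turn that requires several independent tool calls at once. A minimal sketch of how a host application might consume such a completion, assuming the model emits a JSON array of calls (the array format and the two tools are hypothetical):

```python
import json

# Simulated parallel completion: one turn, two independent tool calls.
completion = json.dumps([
    {"name": "set_alarm", "arguments": {"time": "07:00"}},
    {"name": "get_weather", "arguments": {"city": "Singapore"}},
])

# Local stub implementations standing in for real tools.
registry = {
    "set_alarm": lambda time: f"alarm set for {time}",
    "get_weather": lambda city: f"sunny in {city}",
}

# Execute each call and collect one result per parallel call, in order.
results = [registry[c["name"]](**c["arguments"]) for c in json.loads(completion)]
print(results)
```

Scoring in this category hinges on the model emitting *all* required calls with valid arguments, not just the first one.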
* **Edge Deployment:** Optimized for 4-bit quantization (GGUF) on consumer hardware (e.g., NVIDIA GeForce, AMD Ryzen AI).

### Known Limitations

* **The "Alignment Tax":** In exchange for higher safety and parallel reasoning, the model's ability to generate valid **JavaScript** tool calls has regressed by ~5-6% compared to the base model.
* **Vision Capabilities:** While based on a VLM, this fine-tune focused exclusively on text-based function calling. Vision-related tool usage has not been rigorously benchmarked.
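The edge-deployment claim can be sanity-checked with back-of-envelope arithmetic. The ~4.8 effective bits/weight figure for a Q4_K_M-style GGUF (quantized weights plus scales and metadata) is an assumption, and KV cache and activations add overhead on top of the weights:

```python
# Rough VRAM estimate for an 8B-parameter model in 4-bit GGUF quantization.
PARAMS = 8e9              # parameter count of the base model
BITS_PER_WEIGHT = 4.8     # assumed effective bits/weight incl. quant metadata

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
weight_gb = weight_bytes / 1e9
print(f"{weight_gb:.1f} GB")  # weights alone, before KV cache / activations
```

Under these assumptions the weights alone need well under 8 GB, which is why the model fits on common consumer GPUs and NPU-equipped laptops.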
## ⚙️ Training procedure

This model was trained using [TRL](https://github.com/huggingface/trl) with QLoRA instruction tuning.

### Training Hyperparameters

* **Compute:** 1x NVIDIA RTX 3090 (24 GB VRAM)
* **Precision:** 4-bit (NF4) quantization
* **LoRA Rank:** 32
* **LoRA Alpha:** 64
* **LoRA Dropout:** 0.05
* **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
* **Strategy:** Checkpoint selected via early stopping based on agentic capability (BFCL v4) at step 1000.
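The hyperparameters above map directly onto a PEFT/bitsandbytes configuration. This is a minimal sketch of that mapping, not the exact training script; the compute dtype and double quantization settings are assumptions not stated in this card:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, matching the "Precision" entry above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
    bnb_4bit_use_double_quant=True,         # assumption: not stated in the card
)

# LoRA adapter matching the listed rank, alpha, dropout, and target modules.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

With alpha at twice the rank, adapter updates are scaled by 2.0, a common convention for QLoRA instruction tuning.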
## Citations (WIP)

**Berkeley Function Calling Leaderboard:**

```bibtex
@misc{patil2024gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  year={2023},
  journal={arXiv preprint arXiv:2305.15334}
}
```

**SeaLION (AI Singapore):**

```bibtex
@article{sealion2024,
  title={SeaLION: Southeast Asian Languages In One Network},
  author={AI Singapore},
  year={2024}
}
```