- id
---

# Model Card for SeaLION-TC v1 (Tool Calling)

**SeaLION-TC v1** is a specialized QLoRA fine-tune of [aisingapore/Qwen-SEA-LION-v4-8B-VL](https://huggingface.co/aisingapore/Qwen-SEA-LION-v4-8B-VL), engineered specifically for **Agentic Workflow Orchestration** and **Function Calling**.
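As an illustration of the function-calling loop this card targets, the sketch below defines an OpenAI-style tool schema and a small dispatcher that routes a model-emitted JSON tool call to a local Python function. The schema shape, the `get_exchange_rate` tool, and the simulated completion are all hypothetical — the exact format SeaLION-TC emits may differ.

```python
import json

# Hypothetical tool schema in the OpenAI-style JSON format commonly used
# by function-calling fine-tunes (illustrative; not the model's exact spec).
tools = [{
    "type": "function",
    "function": {
        "name": "get_exchange_rate",
        "description": "Get the exchange rate between two currencies.",
        "parameters": {
            "type": "object",
            "properties": {
                "base": {"type": "string", "description": "ISO 4217 code, e.g. 'SGD'"},
                "quote": {"type": "string", "description": "ISO 4217 code, e.g. 'IDR'"},
            },
            "required": ["base", "quote"],
        },
    },
}]

def dispatch(tool_call_json: str, registry: dict) -> str:
    """Parse a model-emitted tool call and route it to a local Python function."""
    call = json.loads(tool_call_json)
    fn = registry[call["name"]]
    return fn(**call["arguments"])

# Simulated model output (illustrative, not an actual SeaLION-TC completion):
model_output = '{"name": "get_exchange_rate", "arguments": {"base": "SGD", "quote": "IDR"}}'
registry = {"get_exchange_rate": lambda base, quote: f"1 {base} = 12000 {quote}"}
print(dispatch(model_output, registry))
```

The same pattern generalizes to any tool registry: the model only ever produces JSON, and the host application owns execution.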
This model was evaluated on the **Berkeley Function Calling Leaderboard (BFCL v4)**.

| Benchmark | Base | SeaLION-TC v1 | Delta | Notes |
| :--- | :--- | :--- | :--- | :--- |
| **Irrelevance (Safety)** | 79.17% | **91.25%** | 🟢 **+12.08%** | Significantly reduced hallucinated tool calls during casual conversation. |
| **Live Parallel** | 50.00% | **75.00%** | 🟢 **+25.00%** | Massive gain in handling simultaneous, multi-intent requests. |
| **Live Parallel Multiple** | 54.17% | **70.83%** | 🟢 **+16.66%** | Improved orchestration of complex, concurrent tool calls. |
| **Simple Python** | 95.00% | **93.50%** | 🔴 -1.50% | Negligible trade-off for increased safety. |
| **Simple JS** | 76.00% | **70.00%** | 🔴 -6.00% | **Known limitation:** non-Python syntax degraded slightly. |

Full benchmark suite and comparison to come.
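The "Live Parallel" categories measure one user turn that requires several independent tool calls at once. A minimal sketch of how a host application might consume such a completion, assuming the model emits a JSON array of calls (the array format and the two tools are hypothetical):

```python
import json

# Simulated parallel completion: one turn, two independent tool calls.
completion = json.dumps([
    {"name": "set_alarm", "arguments": {"time": "07:00"}},
    {"name": "get_weather", "arguments": {"city": "Singapore"}},
])

# Local stub implementations standing in for real tools.
registry = {
    "set_alarm": lambda time: f"alarm set for {time}",
    "get_weather": lambda city: f"sunny in {city}",
}

# Execute each call and collect one result per parallel call, in order.
results = [registry[c["name"]](**c["arguments"]) for c in json.loads(completion)]
print(results)
```

Scoring in this category hinges on the model emitting *all* required calls with valid arguments, not just the first one.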
* **Edge Deployment:** Optimized for 4-bit quantization (GGUF) on consumer hardware (e.g., NVIDIA GeForce, AMD Ryzen AI).

### Known Limitations

* **The "Alignment Tax":** In exchange for higher safety and parallel reasoning, the model's ability to generate valid **JavaScript** tool calls has regressed by ~5-6% compared to the base model.
* **Vision Capabilities:** While based on a VLM, this fine-tune focused exclusively on text-based function calling. Vision-related tool usage has not been rigorously benchmarked.
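The edge-deployment claim can be sanity-checked with back-of-envelope arithmetic. The ~4.8 effective bits/weight figure for a Q4_K_M-style GGUF (quantized weights plus scales and metadata) is an assumption, and KV cache and activations add overhead on top of the weights:

```python
# Rough VRAM estimate for an 8B-parameter model in 4-bit GGUF quantization.
PARAMS = 8e9              # parameter count of the base model
BITS_PER_WEIGHT = 4.8     # assumed effective bits/weight incl. quant metadata

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
weight_gb = weight_bytes / 1e9
print(f"{weight_gb:.1f} GB")  # weights alone, before KV cache / activations
```

Under these assumptions the weights alone need well under 8 GB, which is why the model fits on common consumer GPUs and NPU-equipped laptops.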
## ⚙️ Training procedure

This model was trained using [TRL](https://github.com/huggingface/trl) with QLoRA instruction tuning.

### Training Hyperparameters

* **Compute:** 1x NVIDIA RTX 3090 (24 GB VRAM)
* **Precision:** 4-bit (NF4) quantization
* **LoRA Rank:** 32
* **LoRA Alpha:** 64
* **LoRA Dropout:** 0.05
* **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
* **Strategy:** Checkpoint selected via early stopping based on agentic capability (BFCL v4) at step 1000.
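The hyperparameters above map directly onto a PEFT/bitsandbytes configuration. This is a minimal sketch of that mapping, not the exact training script; the compute dtype and double quantization settings are assumptions not stated in this card:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, matching the "Precision" entry above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
    bnb_4bit_use_double_quant=True,         # assumption: not stated in the card
)

# LoRA adapter matching the listed rank, alpha, dropout, and target modules.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

With alpha at twice the rank, adapter updates are scaled by 2.0, a common convention for QLoRA instruction tuning.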
## Citations (WIP)

**Berkeley Function Calling Leaderboard:**

```bibtex
@misc{patil2024gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  year={2023},
  journal={arXiv preprint arXiv:2305.15334}
}
```

**SeaLION (AI Singapore):**

```bibtex
@article{sealion2024,
  title={SeaLION: Southeast Asian Languages In One Network},
  author={AI Singapore},
  year={2024}
}
```