unicorn-team/Unicorn-R3


rin2401 committed on Dec 3, 2025

Commit e347455 · verified · 1 Parent(s): dbd5b67

Update README.md

Files changed (1)

README.md +12 -10


@@ -1,10 +1,10 @@
----
-library_name: transformers
-base_model:
-- Qwen/Qwen3-8B
-datasets:
-- allenai/Dolci-Think-SFT-7B
----
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -189,12 +189,14 @@ special_tokens:
 ## 3. Submission
 + The best model the team trained was Qwen3-VL-8B, which scored 74.87 on VMLU, but on the two tasks Instruction Following and Function Calling its quality fell short of Qwen3-8B (which scored only 71.74 on VMLU).
-=>  After averaging the scores, Qwen3-8B reached 74.03 and Qwen3-8B-VL reached 73.43, so the team chose Qwen3-8B as the final model
 + While running inference tests to better understand the model, the team noticed that Qwen3 often inserts Chinese tokens into its responses, even with careful prompting
-=> Pruned the model weights so that the model no longer generates Chinese tokens
 + Final results for Qwen3-8B before and after training:
   * VMLU: 69.0 -> 71.74
-  * LLM Judge (12 tasks): 52 > 72 (Gemini-2.5-Flash: 84 / Gemini-2.5-Pro: 90)

+---
+library_name: transformers
+base_model:
+- Qwen/Qwen3-8B
+datasets:
+- allenai/Dolci-Think-SFT-7B
+---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 ## 3. Submission
 + The best model the team trained was Qwen3-VL-8B, which scored 74.87 on VMLU, but on the two tasks Instruction Following and Function Calling its quality fell short of Qwen3-8B (which scored only 71.74 on VMLU).
+  =>  After averaging the scores, Qwen3-8B reached 74.03 and Qwen3-8B-VL reached 73.43, so the team chose Qwen3-8B as the final model
 + While running inference tests to better understand the model, the team noticed that Qwen3 often inserts Chinese tokens into its responses, even with careful prompting
+  => Pruned the model weights so that the model no longer generates Chinese tokens
 + Final results for Qwen3-8B before and after training:
   * VMLU: 69.0 -> 71.74
+  * LLM Judge (12 tasks): 52 -> 72 (Gemini-2.5-Flash: 84 / Gemini-2.5-Pro: 90)
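
The README says the weights were pruned so the model stops emitting Chinese tokens, but it does not describe how. As a rough illustration only, the sketch below shows one possible first step: finding vocabulary entries that contain CJK characters and blocking them at decode time by masking their logits. This is a stand-in for weight editing, not the team's method. The names `CJK_RANGES`, `find_cjk_token_ids`, `mask_logits`, and the toy vocabulary are all hypothetical, and a real byte-level tokenizer may split one character across several tokens, which this sketch does not handle.

```python
import math

# Unicode ranges treated as "Chinese" here (an assumption; widen as needed).
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def contains_cjk(text: str) -> bool:
    """Return True if any character of `text` falls in CJK_RANGES."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


def find_cjk_token_ids(vocab: dict[int, str]) -> set[int]:
    """Collect ids of tokens whose decoded text contains a CJK character."""
    return {tid for tid, tok in vocab.items() if contains_cjk(tok)}


def mask_logits(logits: list[float], banned: set[int]) -> list[float]:
    """Set banned positions to -inf so they can never be sampled."""
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


# Hypothetical toy vocabulary standing in for a real tokenizer's id -> text map.
toy_vocab = {0: "Xin", 1: " chào", 2: "你好", 3: "!", 4: "模型"}
banned_ids = find_cjk_token_ids(toy_vocab)
masked = mask_logits([1.2, 0.3, 2.5, -0.1, 0.9], banned_ids)
```

With a real model, the same set of ids could in principle feed a decode-time filter or a change to the output projection; which of these the team used is not stated in the README.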