Upload README.md with huggingface_hub
README.md CHANGED

```diff
@@ -2,6 +2,8 @@
 language: en
 license: apache-2.0
 base_model: Nanbeige/Nanbeige4.1-3B
+datasets:
+- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
 tags:
 - tool-use
 - gmail
@@ -16,19 +18,21 @@ pipeline_tag: text-generation
 Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
 for Gmail tool-calling tasks using a two-stage training pipeline.
 
+**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)
+
 ## Training Pipeline
 
 ### Stage 1 — Supervised Fine-Tuning (SFT)
-- **Dataset:**
+- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
 - **Format:** ChatML with tool_calls (OpenAI function-calling schema)
 - **Method:** LoRA r=16, α=32, 7 target modules
 - **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
 
 ### Stage 2 — Direct Preference Optimization (DPO)
-- **Dataset:** 3223 preference pairs
-  - `wrong_tool` — incorrect tool selected
-  - `missing_args` — required arguments omitted
-  - `bad_answer` — poor final response
+- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies:
+  - `wrong_tool` — incorrect tool selected (~34%)
+  - `missing_args` — required arguments omitted (~32%)
+  - `bad_answer` — poor final response (~34%)
 - **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref)
 - **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
 
```
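For reference, the dataset added to the front matter in this commit can be pulled straight from the Hub. A minimal sketch with the `datasets` library; the `data_files` paths are taken from the card and assumed to match the repo layout:

```python
from datasets import load_dataset

REPO = "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets"

# Load each stage's file separately; the JSONL paths come from the card
# and are assumed to reflect the actual repo layout.
sft_traces = load_dataset(REPO, data_files="sft/traces_chatml_clean.jsonl", split="train")
dpo_pairs = load_dataset(REPO, data_files="dpo/dpo_dataset.jsonl", split="train")

print(len(sft_traces))  # card reports 740 multi-turn traces
print(len(dpo_pairs))   # card reports 3223 preference pairs
```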
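The Stage 1 traces use ChatML messages carrying OpenAI-style `tool_calls`. The record below is a hypothetical illustration of that schema only; the actual Gmail tool names and argument fields are defined by the dataset, not shown here:

```python
# Hypothetical SFT record in the ChatML + tool_calls format the card
# describes. "search_emails" and its arguments are invented for
# illustration; the real traces define their own Gmail tool schema.
example = {
    "messages": [
        {"role": "system", "content": "You are a Gmail assistant with access to tools."},
        {"role": "user", "content": "Do I have unread mail from Alice?"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "search_emails",
                        "arguments": '{"query": "from:alice is:unread"}',
                    },
                }
            ],
        },
        {
            "role": "tool",
            "tool_call_id": "call_1",
            "content": '[{"subject": "Q3 report", "from": "alice@example.com"}]',
        },
        {"role": "assistant", "content": 'Yes, one unread email from Alice: "Q3 report".'},
    ]
}
```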
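Stage 1's adapter settings (r=16, α=32, 7 target modules) map directly onto a PEFT `LoraConfig`. The module list below is an assumption (the seven attention and MLP projections of a Llama-style decoder block), since the card does not name them:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Nanbeige/Nanbeige4.1-3B")

# r and alpha come from the card; the 7 target modules are an assumption
# (the standard attention + MLP projections of a Llama-style block).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```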
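Stage 2's settings (β=0.1, sigmoid loss, `ref_model=None`) correspond to a TRL `DPOTrainer` run. A sketch under those settings; the output path and the adapter-loading step are assumptions, and `dpo_pairs` is the preference dataset loaded in the first sketch above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

tokenizer = AutoTokenizer.from_pretrained("Nanbeige/Nanbeige4.1-3B")
base = AutoModelForCausalLM.from_pretrained("Nanbeige/Nanbeige4.1-3B")

# Attach the Stage 1 adapter; "sft-adapter" is a hypothetical local path.
model = PeftModel.from_pretrained(base, "sft-adapter", is_trainable=True)

args = DPOConfig(
    output_dir="dpo-out",  # hypothetical
    beta=0.1,              # from the card
    loss_type="sigmoid",   # from the card
)

# With a PEFT model attached, ref_model=None makes TRL compute reference
# log-probs with the adapter disabled (the implicit reference the card notes).
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=args,
    train_dataset=dpo_pairs,  # prompt/chosen/rejected pairs
    processing_class=tokenizer,
)
trainer.train()
```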
|