TurkishCodeMan committed
Commit ae237d8 · verified · 1 Parent(s): 6bf2c04

Upload README.md with huggingface_hub

Files changed (1): README.md +9 -5
README.md CHANGED
```diff
@@ -2,6 +2,8 @@
 language: en
 license: apache-2.0
 base_model: Nanbeige/Nanbeige4.1-3B
+datasets:
+- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
 tags:
 - tool-use
 - gmail
@@ -16,19 +18,21 @@ pipeline_tag: text-generation
 Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
 for Gmail tool-calling tasks using a two-stage training pipeline.
 
+**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)
+
 ## Training Pipeline
 
 ### Stage 1 — Supervised Fine-Tuning (SFT)
-- **Dataset:** Multi-turn Gmail agent traces generated via LangGraph + Claude emulation
+- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
 - **Format:** ChatML with tool_calls (OpenAI function-calling schema)
 - **Method:** LoRA r=16, α=32, 7 target modules
 - **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
 
 ### Stage 2 — Direct Preference Optimization (DPO)
-- **Dataset:** 3223 preference pairs from SFT traces (3 rejection strategies)
-  - `wrong_tool` — incorrect tool selected
-  - `missing_args` — required arguments omitted
-  - `bad_answer` — poor final response
+- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`), 3 rejection strategies:
+  - `wrong_tool` — incorrect tool selected (~34%)
+  - `missing_args` — required arguments omitted (~32%)
+  - `bad_answer` — poor final response (~34%)
 - **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref)
 - **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
```
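The Stage 1 bullets mention "ChatML with tool_calls (OpenAI function-calling schema)". A minimal sketch of what one JSONL training record in that shape could look like follows; the tool name `gmail_search`, its arguments, and the message contents are illustrative assumptions, not taken from the actual dataset.

```python
import json

# Hypothetical single training record in ChatML + tool_calls form
# (OpenAI function-calling schema). The tool "gmail_search" and all
# message contents are illustrative, not from the real dataset.
record = {
    "messages": [
        {"role": "system", "content": "You are a Gmail assistant with tool access."},
        {"role": "user", "content": "Find unread emails from Alice."},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "gmail_search",
                        # Arguments are a JSON-encoded string, per the OpenAI schema.
                        "arguments": json.dumps({"query": "is:unread from:alice"}),
                    },
                }
            ],
        },
        {"role": "tool", "tool_call_id": "call_1", "content": "[2 messages found]"},
        {"role": "assistant", "content": "You have 2 unread emails from Alice."},
    ]
}

# One record per line in a .jsonl file such as traces_chatml_clean.jsonl.
line = json.dumps(record)
```

Each trace is multi-turn: the model first emits a `tool_calls` turn, receives a `tool` result, and then produces the final natural-language answer.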
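Both stages list LoRA with r=16 and α=32. In standard LoRA a frozen weight W gets a low-rank update W + (α/r)·B·A, so here the scaling factor is α/r = 2.0. A small pure-Python sketch of that arithmetic (dimensions are arbitrary; B is zero-initialized as in standard LoRA, so the update starts at zero):

```python
# LoRA hyperparameters as listed under "Method": r=16, alpha=32.
r, alpha = 16, 32
scaling = alpha / r  # 2.0

out_dim, in_dim = 4, 3
# B is zero-initialized in standard LoRA, so the initial update is zero.
B = [[0.0] * r for _ in range(out_dim)]   # (out_dim, r)
A = [[0.1] * in_dim for _ in range(r)]    # (r, in_dim)

def matmul(X, Y):
    # Plain nested-list matrix multiply.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# The low-rank update applied on top of the frozen base weight.
delta_W = [[scaling * v for v in row] for row in matmul(B, A)]
```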
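The DPO dataset derives a rejected sample from each chosen SFT trace via three rejection strategies. A sketch of how such corruptions could be generated; the function name, the `gmail_search`/`gmail_delete` tool names, and the exact record schema are assumptions for illustration:

```python
import json

# Hypothetical sketch of the three rejection strategies: corrupt a "chosen"
# assistant tool call into a "rejected" one. Tool names are illustrative.
def make_rejected(chosen_call: dict, strategy: str) -> dict:
    call = json.loads(json.dumps(chosen_call))  # cheap deep copy
    if strategy == "wrong_tool":
        # Plausible-looking but incorrect tool selection.
        call["function"]["name"] = "gmail_delete"
    elif strategy == "missing_args":
        # Drop a required argument from the call.
        args = json.loads(call["function"]["arguments"])
        args.pop(next(iter(args)))
        call["function"]["arguments"] = json.dumps(args)
    elif strategy == "bad_answer":
        # Replace the tool call with an unhelpful final response.
        call = {"role": "assistant", "content": "I cannot help with that."}
    return call

chosen = {
    "function": {"name": "gmail_search",
                 "arguments": json.dumps({"query": "is:unread"})}
}
rejected = make_rejected(chosen, "missing_args")
```

Pairing each chosen turn with one corrupted variant yields the prompt/chosen/rejected triples that DPO consumes.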
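Stage 2 uses the sigmoid DPO loss with β=0.1 and `ref_model=None` (with a PEFT adapter, TRL obtains reference log-probs by disabling the adapter rather than loading a second model). A minimal arithmetic sketch of that loss, with made-up log-probabilities for illustration:

```python
import math

# Sigmoid DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).
# With ref_model=None + PEFT, the reference log-probs come from the base model
# (adapter disabled); numerically the loss is the same as shown here.
def dpo_sigmoid_loss(logp_chosen, logp_rejected,
                     ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_ratio = logp_chosen - ref_logp_chosen       # policy vs. reference
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # Small loss when the policy prefers chosen by a wide margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Made-up log-probs: policy favors chosen more than the reference does.
loss = dpo_sigmoid_loss(-5.0, -9.0, -6.0, -7.0)
```

The "normalized margin" reported in the results corresponds to the gap between the chosen and rejected log-ratios that this loss pushes apart.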