---
language: en
license: apache-2.0
base_model: Nanbeige/Nanbeige4.1-3B
datasets:
- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
tags:
- tool-use
- gmail
- function-calling
- sft
- dpo
pipeline_tag: text-generation
---

# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)

Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B) for Gmail tool-calling tasks using a two-stage training pipeline.

**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)

## Training Pipeline

### Stage 1 — Supervised Fine-Tuning (SFT)

- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
- **Format:** ChatML with tool_calls (OpenAI function-calling schema)
- **Method:** LoRA r=16, α=32, 7 target modules
- **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21

### Stage 2 — Direct Preference Optimization (DPO)

- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies:
  - `wrong_tool` — incorrect tool selected (~34%)
  - `missing_args` — required arguments omitted (~32%)
  - `bad_answer` — poor final response (~34%)
- **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref)
- **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52

## Supported Tools

| Tool | Description |
|---|---|
| `search_emails` | Search Gmail inbox with filters |
| `read_email` | Read full email content by ID |
| `send_email` | Send a new email |
| `draft_email` | Create a draft |
| `modify_email` | Add/remove labels, mark read/unread |
| `download_attachment` | Download email attachment |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
    trust_remote_code=True,
)
```

## Training Details

| Parameter | Value |
|---|---|
| Base model | Nanbeige/Nanbeige4.1-3B |
| SFT LoRA rank | 16 |
| DPO LoRA rank | 16 |
| DPO β | 0.1 |
| Max length | 2682 tokens |
| GPU | 1× RTX 4090 24GB |
| Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 |