---
language: en
license: apache-2.0
base_model: Nanbeige/Nanbeige4.1-3B
datasets:
- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
tags:
- tool-use
- gmail
- function-calling
- sft
- dpo
pipeline_tag: text-generation
---
# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)
Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
for Gmail tool-calling tasks using a two-stage training pipeline.
**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)
## Training Pipeline
### Stage 1 — Supervised Fine-Tuning (SFT)
- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
- **Format:** ChatML with tool_calls (OpenAI function-calling schema); see the sketch below
- **Method:** LoRA r=16, α=32, 7 target modules
- **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
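A minimal sketch of what one SFT record might look like, assuming the common ChatML-plus-`tool_calls` layout; the message contents, tool arguments, and IDs are illustrative, not taken from the dataset.

```python
# Illustrative shape of one multi-turn trace from
# sft/traces_chatml_clean.jsonl (all values are hypothetical).
example_trace = {
    "messages": [
        {"role": "system", "content": "You are a Gmail assistant with tool access."},
        {"role": "user", "content": "Find unread emails from Alice."},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "search_emails",
                    # OpenAI-style schema: arguments are a JSON string.
                    "arguments": '{"query": "from:alice is:unread"}',
                },
            }],
        },
        {"role": "tool", "content": '[{"id": "msg_123", "subject": "Q3 report"}]'},
        {"role": "assistant", "content": 'One unread email from Alice: "Q3 report".'},
    ]
}
```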
### Stage 2 — Direct Preference Optimization (DPO)
- **Dataset:** 3,223 preference pairs (`dpo/dpo_dataset.jsonl`) — three rejection strategies (example pair below):
- `wrong_tool` — incorrect tool selected (~34%)
- `missing_args` — required arguments omitted (~32%)
- `bad_answer` — poor final response (~34%)
- **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (TRL reuses the frozen base weights of the PEFT model as the implicit reference)
- **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52
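For reference, a preference pair might be shaped like the sketch below; the field names and the serialization of the completions are assumptions, not the dataset's actual schema.

```python
# Illustrative shape of one record from dpo/dpo_dataset.jsonl
# (field names and values are hypothetical).
example_pair = {
    "prompt": "Find unread emails from Alice.",
    # Chosen: the correct tool with its required argument.
    "chosen": '{"name": "search_emails", "arguments": {"query": "from:alice is:unread"}}',
    # Rejected (wrong_tool strategy): a plausible but incorrect tool choice.
    "rejected": '{"name": "read_email", "arguments": {"email_id": "msg_123"}}',
}
```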
## Supported Tools
| Tool | Description |
|---|---|
| `search_emails` | Search Gmail inbox with filters |
| `read_email` | Read full email content by ID |
| `send_email` | Send a new email |
| `draft_email` | Create a draft |
| `modify_email` | Add/remove labels, mark read/unread |
| `download_attachment` | Download email attachment |
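As a sketch, a tool such as `search_emails` would be declared to the model in the OpenAI function-calling schema roughly as follows; the parameter names (`query`, `max_results`) are assumptions for illustration, not the repo's actual schema.

```python
# Hypothetical function-calling declaration for `search_emails`;
# parameter names are illustrative.
search_emails_tool = {
    "type": "function",
    "function": {
        "name": "search_emails",
        "description": "Search Gmail inbox with filters",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Gmail search string, e.g. 'from:alice is:unread'",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of messages to return",
                },
            },
            "required": ["query"],
        },
    },
}
```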
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# trust_remote_code=True permits loading any custom model code
# shipped in the Hub repository.
model = AutoModelForCausalLM.from_pretrained(
    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
    trust_remote_code=True,
)
```
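A minimal generation sketch, assuming the model's chat template accepts a `tools` list (supported by recent Transformers releases) and reusing the hypothetical `search_emails_tool` schema from above; if the template ignores `tools`, inline the schemas in the system prompt instead.

```python
messages = [{"role": "user", "content": "Find unread emails from Alice."}]

# Render the prompt with the tool declarations and generate.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[search_emails_tool],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```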
## Training Details
| Parameter | Value |
|---|---|
| Base model | Nanbeige/Nanbeige4.1-3B |
| SFT LoRA rank | 16 |
| DPO LoRA rank | 16 |
| DPO β | 0.1 |
| Max length | 2682 tokens |
| GPU | 1× RTX 4090 24GB |
| Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 |
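The adapter configuration implied by the table would look roughly like the sketch below; the exact target-module list is an assumption based on the seven projection layers typical of Llama-style models, not confirmed by the repo.

```python
from peft import LoraConfig

# Hedged reconstruction of the LoRA setup (r=16, α=32, 7 modules);
# the target_modules list is assumed, not taken from the repo.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```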