---
language: en
license: apache-2.0
base_model: Nanbeige/Nanbeige4.1-3B
datasets:
- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
tags:
- tool-use
- gmail
- function-calling
- sft
- dpo
pipeline_tag: text-generation
---

# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)

A fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
for Gmail tool-calling tasks, trained with a two-stage pipeline.

**Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)

## Training Pipeline

### Stage 1 — Supervised Fine-Tuning (SFT)
- **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
- **Format:** ChatML with tool_calls (OpenAI function-calling schema)
- **Method:** LoRA r=16, α=32, 7 target modules
- **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21
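Each trace is a ChatML conversation whose assistant turns may carry `tool_calls`. A minimal sketch of one record, with hypothetical contents (the field layout follows the OpenAI function-calling schema named above; the actual dataset rows may differ):

```python
import json

# Illustrative shape of one SFT trace: ChatML messages with tool_calls
# in the OpenAI function-calling schema. The message contents below are
# hypothetical — see the dataset repo for real records.
example_trace = {
    "messages": [
        {"role": "user", "content": "Do I have unread mail from Alice?"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "search_emails",
                    "arguments": json.dumps({"query": "is:unread from:alice"}),
                },
            }],
        },
        {
            # Tool result turn: references the call it answers via tool_call_id.
            "role": "tool",
            "tool_call_id": "call_1",
            "content": json.dumps([{"id": "m1", "subject": "Quick question"}]),
        },
        {"role": "assistant", "content": 'Yes — one unread email from Alice: "Quick question".'},
    ]
}
```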

### Stage 2 — Direct Preference Optimization (DPO)
- **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`), built with 3 rejection strategies:
  - `wrong_tool` — incorrect tool selected (~34%)
  - `missing_args` — required arguments omitted (~32%)
  - `bad_answer` — poor final response (~34%)
- **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit reference model)
- **Result:** val_loss = 0.000765 · reward accuracy = 100% · normalized margin = +0.52
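A preference pair contrasts a correct completion with one degraded by a rejection strategy. A hypothetical `wrong_tool` pair (field names are assumptions, not taken from the dataset):

```python
import json

# Hypothetical DPO preference pair; field names and contents are
# illustrative, not copied from dpo_dataset.jsonl. The "rejected"
# completion selects the wrong tool (drafting instead of sending).
pair = {
    "prompt": "Send a thank-you note to alice@example.com.",
    "chosen": json.dumps({
        "name": "send_email",
        "arguments": {"to": "alice@example.com", "subject": "Thank you", "body": "Thanks for your help!"},
    }),
    "rejected": json.dumps({
        "name": "draft_email",  # wrong_tool: creates a draft instead of sending
        "arguments": {"to": "alice@example.com", "subject": "Thank you", "body": "Thanks for your help!"},
    }),
    "strategy": "wrong_tool",
}
```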

## Supported Tools

| Tool | Description |
|---|---|
| `search_emails` | Search Gmail inbox with filters |
| `read_email` | Read full email content by ID |
| `send_email` | Send a new email |
| `draft_email` | Create a draft |
| `modify_email` | Add/remove labels, mark read/unread |
| `download_attachment` | Download email attachment |
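At inference time, each tool is typically described to the model as a JSON schema. A sketch of what `search_emails` might look like in the OpenAI function-calling format (the parameter names are illustrative assumptions, not the model's actual spec):

```python
# Hypothetical schema for `search_emails` in the OpenAI function-calling
# format; parameter names here are assumptions for illustration.
search_emails_tool = {
    "type": "function",
    "function": {
        "name": "search_emails",
        "description": "Search Gmail inbox with filters",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Gmail search expression, e.g. 'is:unread from:alice'",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of messages to return",
                },
            },
            "required": ["query"],
        },
    },
}
```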

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
    trust_remote_code=True,
)
```

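After generating, you still need to pull the tool call out of the decoded text. A minimal sketch, assuming the model wraps calls in `<tool_call>` tags as JSON — that tag format is an assumption, so verify it against real model outputs before relying on this:

```python
import json
import re

def extract_tool_call(text: str):
    """Return the first tool call found in generated text, or None.

    Assumes the model emits calls as JSON wrapped in <tool_call> tags;
    this serialization is an assumption, not confirmed by the model card.
    """
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

# Example on a hand-written string shaped like a model completion:
generated = '<tool_call>{"name": "search_emails", "arguments": {"query": "is:unread"}}</tool_call>'
call = extract_tool_call(generated)
```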
## Training Details

| Parameter | Value |
|---|---|
| Base model | Nanbeige/Nanbeige4.1-3B |
| SFT LoRA rank | 16 |
| DPO LoRA rank | 16 |
| DPO β | 0.1 |
| Max length | 2682 tokens |
| GPU | 1× RTX 4090 24GB |
| Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 |