---
library_name: transformers
language:
- ar
tags:
- function-calling
- tool-use
- arabic
- instruction-tuning
- gemma
- transformers
license: apache-2.0
base_model: google/functiongemma-270m-it
---

# FunctionGemma-270M Arabic Tool Use

This model is a fine-tuned version of **`google/functiongemma-270m-it`** for Arabic **tool use / function calling** across multiple dialects and domains.

It is trained to produce **exactly one tool call** when a tool is required, using **FunctionGemma-native tool formatting** (special function-call tokens) with structured JSON arguments.

## Base model

- `google/functiongemma-270m-it`

## Dataset

- `metga97/arabic-tooluse-functiongemma-v1`

## What the model outputs

When a tool is required, generation should include a FunctionGemma tool-call pattern such as:

- `<start_function_call>call:TOOL_NAME{ ...json args... }<end_function_call>`

For non-tool requests, it returns a short Arabic reply.

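A call in the format above can be extracted with a small parser. A minimal sketch, assuming the arguments between the function-call tokens are valid JSON (the helper name `parse_tool_call` is illustrative, not part of the model's tooling):

```python
import json
import re

# Matches the FunctionGemma-style pattern shown above:
# <start_function_call>call:TOOL_NAME{ ...json args... }<end_function_call>
TOOL_CALL_RE = re.compile(
    r"<start_function_call>call:(?P<name>[\w.\-]+)\s*(?P<args>\{.*?\})<end_function_call>",
    re.DOTALL,
)

def parse_tool_call(text: str):
    """Return (tool_name, args_dict) for the first tool call in text, or None."""
    m = TOOL_CALL_RE.search(text)
    if m is None:
        return None  # no tool call emitted
    try:
        args = json.loads(m.group("args"))
    except json.JSONDecodeError:
        return None  # malformed arguments count against the "parsed OK" rate
    return m.group("name"), args
```

For example, `parse_tool_call('<start_function_call>call:get_weather{"city": "Cairo"}<end_function_call>')` returns `("get_weather", {"city": "Cairo"})`.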
## Evaluation (by slang / dialect)

Evaluated on the test split of `metga97/arabic-tooluse-functiongemma-v1`.

### Overall

- Parsed OK rate: **0.891**
- Tool name accuracy: **0.9921**
- Strict EM (exact match): **0.6564**
- Key-F1 (avg): **0.9925**
- Missed-call rate: **0.0064**
- False-call rate (negatives): **0.0**

### Strict EM by slang / dialect

- **Egyptian**: 0.6791 (denom_calls: 1069)
- **Gulf**: 0.6237 (denom_calls: 1172)
- **Levantine**: 0.6558 (denom_calls: 706)
- **MSA**: 0.6804 (denom_calls: 1408)
- **Maghrebi**: 0.5455 (denom_calls: 176)

### Strict EM by domain

- banking_finance: 0.6255 (denom_calls: 542)
- ecommerce: 0.64 (denom_calls: 550)
- government_services: 0.7651 (denom_calls: 613)
- healthcare: 0.5754 (denom_calls: 577)
- islamic_services: 0.7119 (denom_calls: 597)
- travel: 0.6028 (denom_calls: 564)
- utilities: 0.4652 (denom_calls: 561)
- weather: 0.8653 (denom_calls: 527)

## Inference (important)

### 1) Use left padding for decoder-only generation

Set:

- `tokenizer.padding_side = "left"`
- `tokenizer.pad_token = tokenizer.eos_token` (if missing)

### 2) Pass tools via `apply_chat_template(..., tools=tools_list)`

This is critical for FunctionGemma-style function calling.

Example outline:

1. Select a tool subset for the request (domain pack + deterministic sampling).
2. Build the prompt with `apply_chat_template`, passing `tools=tools_list`.
3. Call `generate()` deterministically (`do_sample=False`, `temperature=0.0`).
4. Parse the tool-call tokens and arguments.

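The outline above can be sketched end to end. A minimal sketch, assuming a recent `transformers` version whose `apply_chat_template` accepts a `tools` argument; `MODEL_ID` is a placeholder for this checkpoint's Hub id, and the empty `tools_list` stands in for your selected tool subset:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Deterministic decoding, as recommended above.
GENERATION_KWARGS = {"do_sample": False, "temperature": 0.0, "max_new_tokens": 256}

def build_inputs(tokenizer, user_message, tools_list):
    """Steps 1-2: build the prompt with the chat template, passing the tools."""
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages,
        tools=tools_list,          # critical for FunctionGemma-style calling
        add_generation_prompt=True,
        return_tensors="pt",
    )

if __name__ == "__main__":
    MODEL_ID = "path/to/this/checkpoint"  # placeholder: replace with the Hub id
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    tokenizer.padding_side = "left"               # decoder-only generation
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    tools_list = []  # step 1: your selected tool subset (JSON-schema tool dicts)
    # "What is the weather like in Cairo?"
    input_ids = build_inputs(tokenizer, "ما حالة الطقس في القاهرة؟", tools_list)
    output = model.generate(input_ids, **GENERATION_KWARGS)
    # Keep special tokens so the function-call markers can be parsed (step 4).
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```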
## Known limitations / improvement ideas

- Some outputs may translate slot values into English (e.g., "Abu Dhabi", "ID renewal").
  - Mitigations: stronger developer-prompt constraints, post-processing, explicit anti-translation supervision, and/or filtering or rebalancing training examples whose values are in English.
- Parsed OK rate < 1.0: formatting consistency can be improved with:
  - longer training
  - a slightly stronger prompt
  - more negative/no-tool examples with explicit non-tool responses
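The post-processing mitigation above can be as simple as flagging Latin-script slot values. A minimal sketch (the `has_english_values` helper is a hypothetical name, not part of the model's tooling); note it is only a cheap proxy, since some values such as IDs or emails may legitimately contain Latin letters:

```python
import re

# Any Latin letter in a string value is treated as a sign of translation.
LATIN_RE = re.compile(r"[A-Za-z]")

def has_english_values(args: dict) -> bool:
    """Flag tool-call arguments whose string values contain Latin letters,
    a cheap proxy for slot values translated into English."""
    return any(
        isinstance(v, str) and LATIN_RE.search(v)
        for v in args.values()
    )
```

Flagged calls can then be retried with a stronger prompt, or the values mapped back to Arabic via a lookup table.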