	---
	language:
	- en
	license: apache-2.0
	tags:
	- function-calling
	- music
	- gemma
	- peft
	- lora
	base_model: google/functiongemma-270m-it
	---

	# FunctionGemma Music Assistant (2-Function)

	A fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it) for music playback control.
	This model supports 2 core music functions and scored 100% (5/5) on the small test set listed under Test Results.

	## Model Description

	This is a LoRA-adapted FunctionGemma model trained specifically for music function calling.
	The model generates function calls in the FunctionGemma format for controlling music playback.

	Training Details:
	- Base Model: google/functiongemma-270m-it (270M parameters)
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- Parameters Trained: 3.8M (1.40% of total)
	- Training Examples: 60 (30 per function)
	- Training Time: ~1 minute
	- Accuracy: 100% (5/5 test cases)

	## Supported Functions

	### 1. play_song
	Play a specific song by name or artist.

	Parameters:
	- `song_name` (required): Name of the song to play
	- `artist` (optional): Artist name
	- `album` (optional): Album name

	Examples:
	- "Play Bohemian Rhapsody"
	- "Play Imagine by John Lennon"
	- "I want to hear Wonderwall"

	### 2. playback_control
	Control music playback (pause, resume, skip).

	Parameters:
	- `action` (required): One of: play, pause, skip, next, previous, stop, resume

	Examples:
	- "Pause"
	- "Resume"
	- "Skip to next song"
	- "Stop the music"
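The parameter lists above can also be enforced on the application side before a generated call is executed. The following is a minimal sketch of such a check, not part of the model or of any library: `SCHEMAS` and `validate_call` are hypothetical names chosen for illustration, and the rules simply mirror the required/optional parameters and the `action` values listed above.

```python
# Hypothetical application-side validator for the two supported functions.
# The schema mirrors the parameter lists above; it is not part of the model.
SCHEMAS = {
    "play_song": {
        "required": {"song_name"},
        "optional": {"artist", "album"},
        "enums": {},
    },
    "playback_control": {
        "required": {"action"},
        "optional": set(),
        "enums": {
            "action": {"play", "pause", "skip", "next", "previous", "stop", "resume"},
        },
    },
}


def validate_call(name, arguments):
    """Return a list of problems; an empty list means the call is valid."""
    schema = SCHEMAS.get(name)
    if schema is None:
        return [f"unknown function: {name}"]

    problems = []
    provided = set(arguments)
    allowed = schema["required"] | schema["optional"]
    for key in sorted(schema["required"] - provided):
        problems.append(f"missing required parameter: {key}")
    for key in sorted(provided - allowed):
        problems.append(f"unexpected parameter: {key}")
    for key, choices in schema["enums"].items():
        if key in arguments and arguments[key] not in choices:
            problems.append(f"invalid value for {key}: {arguments[key]!r}")
    return problems
```

Returning a list of problems instead of raising lets the caller decide whether to retry generation, ask the user to rephrase, or ignore the call.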

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import PeftModel
	import torch

	# Load base model and tokenizer
	base_model = AutoModelForCausalLM.from_pretrained(
	    "google/functiongemma-270m-it",
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

	# Load LoRA adapters
	model = PeftModel.from_pretrained(base_model, "your-username/music-2func")

	# Define functions
	FUNCTIONS = [
	    {
	        "type": "function",
	        "function": {
	            "name": "play_song",
	            "description": "Play a specific song by name or artist",
	            "parameters": {
	                "type": "object",
	                "properties": {
	                    "song_name": {"type": "string", "description": "Name of the song to play"},
	                    "artist": {"type": "string", "description": "Artist name (optional)"},
	                    "album": {"type": "string", "description": "Album name (optional)"},
	                },
	                "required": ["song_name"],
	            },
	        },
	    },
	    {
	        "type": "function",
	        "function": {
	            "name": "playback_control",
	            "description": "Control music playback",
	            "parameters": {
	                "type": "object",
	                "properties": {
	                    "action": {
	                        "type": "string",
	                        "enum": ["play", "pause", "skip", "next", "previous", "stop", "resume"],
	                    }
	                },
	                "required": ["action"],
	            },
	        },
	    },
	]

	# Generate function call
	user_input = "Play Bohemian Rhapsody"

	messages = [{"role": "user", "content": user_input}]

	prompt = tokenizer.apply_chat_template(
	    messages,
	    tools=FUNCTIONS,
	    add_generation_prompt=True,
	    tokenize=False,
	)

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=128,
	        temperature=0.1,
	        do_sample=True,
	        pad_token_id=tokenizer.eos_token_id,
	    )

	response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=False)
	print(response)
	```

	Expected Output:
	```
	<start_function_call>call:play_song{song_name:<escape>Bohemian Rhapsody<escape>}<end_function_call>
	```
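To act on this output, an application needs the function name and arguments as plain Python values. The sketch below parses the format shown in the expected output above. It is an illustration, not an official parser: `parse_function_call` is a hypothetical helper, it handles only string arguments wrapped in `<escape>` markers, and its handling of more than one argument is an assumption, since the example above shows a single argument.

```python
import re

# Matches one call of the form shown above:
#   <start_function_call>call:NAME{ARGS}<end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>call:(?P<name>\w+)\{(?P<args>.*?)\}<end_function_call>",
    re.DOTALL,
)
# Matches one string argument of the form key:<escape>value<escape>.
ARG_RE = re.compile(r"(?P<key>\w+):<escape>(?P<value>.*?)<escape>", re.DOTALL)


def parse_function_call(text):
    """Return (name, arguments) for the first call in `text`, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    arguments = {m["key"]: m["value"] for m in ARG_RE.finditer(match["args"])}
    return match["name"], arguments


response = (
    "<start_function_call>call:play_song"
    "{song_name:<escape>Bohemian Rhapsody<escape>}<end_function_call>"
)
print(parse_function_call(response))
```

Returning `None` when no call is found lets the caller treat plain-text replies separately from function calls.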

	## Test Results

	| Test Input | Expected Function | Result |
	|------------|-------------------|--------|
	| "Play Bohemian Rhapsody" | `play_song` | ✅ Pass |
	| "Pause the music" | `playback_control` | ✅ Pass |
	| "Skip to next song" | `playback_control` | ✅ Pass |
	| "Play Wonderwall" | `play_song` | ✅ Pass |
	| "Resume" | `playback_control` | ✅ Pass |

	Success Rate: 100% (5/5 tests)
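The success rate can be recomputed with a small harness like the one sketched below. The five cases are taken from the table above. Everything else is an assumption for illustration: `success_rate` is a hypothetical helper, and `stub_predict` is a keyword placeholder that stands in for the model so the example runs without downloading any weights. Its score says nothing about the model itself.

```python
# The five test cases from the table above: (input, expected function name).
TEST_CASES = [
    ("Play Bohemian Rhapsody", "play_song"),
    ("Pause the music", "playback_control"),
    ("Skip to next song", "playback_control"),
    ("Play Wonderwall", "play_song"),
    ("Resume", "playback_control"),
]


def success_rate(predict, cases):
    """Return (passed, total) for a predictor mapping text to a function name."""
    passed = sum(1 for text, expected in cases if predict(text) == expected)
    return passed, len(cases)


def stub_predict(text):
    # Hypothetical placeholder, NOT the model: swap in a function that runs
    # the model and extracts the called function name from its output.
    return "play_song" if text.lower().startswith("play ") else "playback_control"


passed, total = success_rate(stub_predict, TEST_CASES)
print(f"Success Rate: {passed / total:.0%} ({passed}/{total} tests)")
```

With only five cases, a single failure moves the rate by 20 percentage points, so a larger held-out set would give a more informative number.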

	## Training Methodology

	This model was trained using a gradual scaling approach, which avoids overloading the small model with too many functions at once ("cognitive overload"):

	1. Started with 2 functions (`play_song`, `playback_control`)
	2. Used 30 examples per function, covering diverse phrasings
	3. Passed function-call arguments to `apply_chat_template` as a dict (NOT as a `json.dumps()` string)

	### Key Learnings

	1. Critical Bug Fixed: Function-call arguments must be passed as a dict, not as `json.dumps(arguments)`
	2. Cognitive Overload: Training with 18 functions at once failed (0% accuracy), while training with 2 functions reached 100% (5/5 test cases)
	3. Gradual Scaling: The recommended path is 2→4→8→18 functions
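The first learning is about how tool-call arguments are stored in a training example. The sketch below shows the difference, assuming an OpenAI-style `tool_calls` message layout of the kind Hugging Face chat templates commonly accept. The exact fields FunctionGemma's template reads are an assumption here, and `make_assistant_message` is a hypothetical helper.

```python
import json

arguments = {"song_name": "Bohemian Rhapsody"}


def make_assistant_message(name, arguments):
    """Build an assistant turn whose tool-call arguments stay a dict."""
    if not isinstance(arguments, dict):
        raise TypeError("arguments must be a dict, not a JSON string")
    return {
        "role": "assistant",
        "tool_calls": [
            {"type": "function", "function": {"name": name, "arguments": arguments}}
        ],
    }


# Correct: the dict is passed through unchanged.
message = make_assistant_message("play_song", arguments)

# Incorrect: json.dumps() turns the dict into a str, which the guard rejects.
try:
    make_assistant_message("play_song", json.dumps(arguments))
except TypeError as err:
    print(f"rejected: {err}")
```

A type guard like this in the data-preparation step surfaces the mistake before training starts, instead of after a run that yields malformed function calls.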

	## Limitations

	- Only supports 2 functions (play_song and playback_control)
	- Trained on English language only
	- Best performance with clear, direct commands
	- Not compatible with Ollama (Ollama doesn't support FunctionGemma's dynamic tool schema)

	## Future Work

	- Scale to 4 functions (add search_music, create_playlist)
	- Scale to 8 functions (add volume control, queue management)
	- Eventually scale to full 18-function music system

	## Citation

	```bibtex
	@misc{music-2func-2026,
	  title={FunctionGemma Music Assistant (2-Function)},
	  author={Your Name},
	  year={2026},
	  publisher={HuggingFace},
	  howpublished={\url{https://huggingface.co/your-username/music-2func}}
	}
	```

	## License

	Apache 2.0 (inherited from base model)

	## Base Model

	This model is based on [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it).