	---
	base_model: google/gemma-2-2b-it
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- lora
	- function-calling
	- sports
	- event-parsing
	- natural-language-processing
	license: gemma
	language:
	- en
	---

	# Gemma 2B Event Parser - Sports Event Function Calling

	A LoRA adapter for Gemma 2 2B (`google/gemma-2-2b-it`), fine-tuned to convert natural-language descriptions of sports events into structured JSON for event creation.

	## Model Description

	This model takes casual text like "I want to play soccer this week Friday 4 PM @ Central Park" and converts it into a properly formatted `CreateEventRequest` JSON object for backend API consumption.

	- Base Model: `google/gemma-2-2b-it`
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- Training Framework: Transformers + PEFT
	- Primary Use Case: Natural language to structured API requests for sports event creation

	## Usage

	### Installation
	```bash
	pip install transformers peft torch
	```

	### Quick Start
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import PeftModel
	import torch
	import json

	# Load the base model. Depending on the installed Transformers version,
	# the dtype argument may need to be spelled torch_dtype instead.
	base_model = AutoModelForCausalLM.from_pretrained(
	    "google/gemma-2-2b-it",
	    device_map="auto",
	    dtype=torch.float16,
	)

	# Load the fine-tuned adapter and its tokenizer
	model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/gemma-event-parser")
	tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-event-parser")

	# Define the function schema that is passed to the model in the prompt
	function_schema = {
	    "name": "create_sports_event",
	    "description": "Create a new sports event from natural language description",
	    "parameters": {
	        "type": "object",
	        "properties": {
	            "sport": {"type": "string", "description": "Sport type (e.g., Soccer, Basketball, Tennis)"},
	            "venue_name": {"type": "string", "description": "Venue name"},
	            "start_time": {"type": "string", "description": "ISO 8601 format (e.g., 2026-02-07T16:00:00Z)"},
	            "max_participants": {"type": "integer", "default": 2},
	            "event_type": {
	                "type": "string",
	                "enum": ["Casual", "Light Training", "Looking to Improve", "Competitive Game"],
	                "default": "Casual",
	            },
	        },
	        "required": ["sport", "venue_name", "start_time"],
	    },
	}

	# Parse natural language into the arguments of a create_sports_event call
	def parse_event(user_query):
	    # Gemma chat format: one user turn holding the query and the schema,
	    # followed by the opening of the model turn
	    prompt = (
	        "<start_of_turn>user\n"
	        f"{user_query}\n\n"
	        "Available functions:\n"
	        f"{json.dumps([function_schema], indent=2)}<end_of_turn>\n"
	        "<start_of_turn>model\n"
	    )

	    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=256,
	        temperature=0.1,
	        do_sample=True,
	        top_p=0.95,
	    )

	    # Decode only the newly generated tokens, not the echoed prompt
	    prompt_length = inputs["input_ids"].shape[1]
	    result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

	    # Extract the JSON payload between the <function_call> tags
	    start = result.find("<function_call>")
	    end = result.find("</function_call>", start)
	    if start == -1 or end == -1:
	        raise ValueError(f"No function call found in model output: {result!r}")
	    start += len("<function_call>")
	    function_call = json.loads(result[start:end].strip())

	    return function_call["arguments"]

	# Example
	query = "I want to play soccer this week Friday 4 PM @ Central Park"
	event_json = parse_event(query)
	print(json.dumps(event_json, indent=2))
	```

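	Example output (the date shown is illustrative; the resolved date depends on the reference date the model assumes):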
	```json
	{
	  "sport": "Soccer",
	  "venue_name": "Central Park",
	  "start_time": "2026-02-07T16:00:00Z",
	  "max_participants": 22,
	  "event_type": "Casual"
	}
	```

	## Examples

	| Input | Output |
	|-------|--------|
	| "Basketball game tomorrow 6pm at Riverside Courts, competitive" | `{"sport": "Basketball", "venue_name": "Riverside Courts", "start_time": "2026-02-07T18:00:00Z", "max_participants": 10, "event_type": "Competitive Game"}` |
	| "Tennis match Wednesday 10 AM Ashburn Park, looking to improve" | `{"sport": "Tennis", "venue_name": "Ashburn Park", "start_time": "2026-02-12T10:00:00Z", "max_participants": 2, "event_type": "Looking to Improve"}` |
	| "Casual volleyball Saturday 2pm Beach Courts" | `{"sport": "Volleyball", "venue_name": "Beach Courts", "start_time": "2026-02-08T14:00:00Z", "max_participants": 12, "event_type": "Casual"}` |

	## Training Details

	### Training Data

	Fine-tuned on synthetic examples covering:
	- Multiple sports (Soccer, Basketball, Tennis, Volleyball, Badminton, etc.)
	- Various time formats (relative dates, specific times)
	- All event types (Casual, Light Training, Looking to Improve, Competitive Game)
	- Different venue patterns

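	Training Size: approximately 10 to 20 high-quality examples. LoRA trains only a small set of adapter weights, which keeps training on a dataset this small practical, but expect limited coverage beyond the patterns listed above.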

	### Training Hyperparameters

	- LoRA Rank (r): 16
	- LoRA Alpha: 32
	- Target Modules: `q_proj, k_proj, v_proj, o_proj`
	- Learning Rate: 2e-4
	- Epochs: 20
	- Batch Size: 2, with 4 gradient accumulation steps (effective batch size 8 on a single device)
	- Optimizer: AdamW
	- Scheduler: Cosine with warmup
	- Precision: FP16
	- Training Time: ~1-2 minutes on free Colab
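The hyperparameters above map onto PEFT and Transformers configuration roughly as follows. This is a sketch rather than the original training script: `task_type="CAUSAL_LM"` and `optim="adamw_torch"` are assumptions, the warmup length is not stated above and is therefore left out, and `training_kwargs` is a hypothetical name for keyword arguments that would be passed to `transformers.TrainingArguments`.

```python
from peft import LoraConfig

# LoRA settings taken from the list above; task_type is an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Keyword arguments for transformers.TrainingArguments (hypothetical grouping).
# The warmup length is not documented, so no warmup value is set here.
training_kwargs = dict(
    learning_rate=2e-4,
    num_train_epochs=20,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 2 * 4 = 8 on one device
    lr_scheduler_type="cosine",
    optim="adamw_torch",  # assumption: the PyTorch AdamW implementation
    fp16=True,  # requires a GPU that supports FP16 training
)
```

The adapter would then be attached with `peft.get_peft_model(base_model, lora_config)` before training.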

	### Framework Versions

	- Transformers: 4.x
	- PEFT: 0.18.1
	- PyTorch: 2.x
	- Python: 3.10+

	## Limitations

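	- Date Parsing: Handles relative dates ("Friday", "tomorrow"), but the prompt carries no reference date, so the model assumes a current-week context; verify or recompute resolved dates before use
	- Time Zones: Times are emitted in UTC (`Z` suffix); a stated local time such as "4 PM" is written as-is, with no time zone conversion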
	- Sports Coverage: Best performance on common sports; may need examples for niche sports
	- Language: English only
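Because the prompt carries no reference date or time zone, one option is to resolve relative dates in application code and rely on the model only for the remaining fields. The sketch below is not part of this adapter: `resolve_weekday` and `to_utc_iso` are hypothetical helper names, and the fixed UTC-5 offset is an example value.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_weekday(weekday: str, reference: date) -> date:
    """Return the first date on or after `reference` that falls on `weekday`."""
    target = WEEKDAYS.index(weekday.strip().lower())  # Monday is 0, as in date.weekday()
    days_ahead = (target - reference.weekday()) % 7
    return reference + timedelta(days=days_ahead)


def to_utc_iso(day: date, local_time: time, tz: tzinfo) -> str:
    """Combine a local date and time in `tz` and format the instant as UTC ISO 8601."""
    local_dt = datetime.combine(day, local_time, tzinfo=tz)
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: "Friday 4 PM", resolved relative to Monday 2026-02-02 in a UTC-5 zone
reference = date(2026, 2, 2)
friday = resolve_weekday("Friday", reference)
start_time = to_utc_iso(friday, time(16, 0), timezone(timedelta(hours=-5)))
print(start_time)  # 2026-02-06T21:00:00Z
```

A `zoneinfo.ZoneInfo` instance can be passed as `tz` instead of a fixed offset when daylight saving time matters.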

	## Intended Use

	✅ Good for:
	- Converting casual user input to structured API requests
	- Sports event management applications
	- Voice-to-API integrations
	- Chatbot backends for sports booking

	❌ Not suitable for:
	- Mission-critical systems without validation
	- Non-English languages
	- Complex multi-event scheduling
	- Historical date parsing
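Since generated output is not guaranteed to match the schema, a validation step before the backend call is a reasonable safeguard. The following sketch checks parsed arguments against the `create_sports_event` schema from the Quick Start. `validate_event` is a hypothetical helper, and the rule that `max_participants` must be at least 1 is an assumption, not something the schema states.

```python
from datetime import datetime

REQUIRED_FIELDS = ("sport", "venue_name", "start_time")
ALLOWED_EVENT_TYPES = {"Casual", "Light Training", "Looking to Improve", "Competitive Game"}


def validate_event(args: dict) -> dict:
    """Return a copy of `args` with schema defaults applied, or raise ValueError."""
    missing = [field for field in REQUIRED_FIELDS if not args.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    event = dict(args)
    event.setdefault("max_participants", 2)  # schema default
    event.setdefault("event_type", "Casual")  # schema default

    if event["event_type"] not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event['event_type']!r}")

    count = event["max_participants"]
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(count, int) or isinstance(count, bool) or count < 1:
        raise ValueError(f"max_participants must be a positive integer, got {count!r}")

    start_time = event["start_time"]
    if not isinstance(start_time, str):
        raise ValueError("start_time must be a string")
    try:
        # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on
        datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    except ValueError as exc:
        raise ValueError(f"start_time is not ISO 8601: {start_time!r}") from exc

    return event


# Example with only the required fields present
checked = validate_event(
    {"sport": "Soccer", "venue_name": "Central Park", "start_time": "2026-02-07T16:00:00Z"}
)
print(checked["max_participants"], checked["event_type"])  # 2 Casual
```

Raising `ValueError` keeps the failure explicit, so the calling application can decide whether to retry generation or ask the user to rephrase.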

	## License

	This adapter follows the [Gemma License](https://ai.google.dev/gemma/terms). The base model is subject to Google's Gemma terms of use.

	## Citation

	If you use this model, please cite:
	```bibtex
	@misc{gemma-event-parser-2026,
	  author = {YOUR_NAME},
	  title = {Gemma 2B Event Parser - Sports Event Function Calling},
	  year = {2026},
	  publisher = {Hugging Face},
	  url = {https://huggingface.co/YOUR_USERNAME/gemma-event-parser}
	}
	```

	## Acknowledgments

	- Base model: Google's Gemma 2 2B IT (`google/gemma-2-2b-it`)
	- Fine-tuning framework: Hugging Face PEFT
	- Training compute: Google Colab

	---

	Questions? Open an issue or discussion on this model's page!