---
base_model: google/gemma-2-2b-it
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- function-calling
- sports
- event-parsing
- natural-language-processing
license: gemma
language:
- en
---

# Gemma 2B Event Parser - Sports Event Function Calling

A LoRA adapter for `google/gemma-2-2b-it` that converts natural-language descriptions into structured JSON for creating sports events.

## Model Description

This model takes casual text like **"I want to play soccer this week Friday 4 PM @ Central Park"** and converts it into a properly formatted `CreateEventRequest` JSON object that a backend API can consume.

- **Base Model:** `google/gemma-2-2b-it`
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Training Framework:** Transformers + PEFT
- **Primary Use Case:** Natural language to structured API requests for sports event creation

## Usage

### Installation

```bash
pip install transformers peft torch
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    dtype=torch.float16  # on older Transformers releases, pass torch_dtype= instead
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/gemma-event-parser")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/gemma-event-parser")

# Define function schema
function_schema = {
    "name": "create_sports_event",
    "description": "Create a new sports event from natural language description",
    "parameters": {
        "type": "object",
        "properties": {
            "sport": {"type": "string", "description": "Sport type (e.g., Soccer, Basketball, Tennis)"},
            "venue_name": {"type": "string", "description": "Venue name"},
            "start_time": {"type": "string", "description": "ISO 8601 format (e.g., 2026-02-07T16:00:00Z)"},
            "max_participants": {"type": "integer", "default": 2},
            "event_type": {
                "type": "string",
                "enum": ["Casual", "Light Training", "Looking to Improve", "Competitive Game"],
                "default": "Casual"
            }
        },
        "required": ["sport", "venue_name", "start_time"]
    }
}

# Parse natural language
def parse_event(user_query):
    # Gemma chat format: the user turn carries the query plus the function schema
    prompt = f"""<start_of_turn>user
{user_query}

Available functions:
{json.dumps([function_schema], indent=2)}<end_of_turn>
<start_of_turn>model
"""

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
        top_p=0.95
    )

    # Decode only the newly generated tokens, not the echoed prompt
    prompt_length = inputs["input_ids"].shape[1]
    result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

    # Extract the JSON between the <function_call> tags
    start = result.find("<function_call>")
    end = result.find("</function_call>")
    if start == -1 or end == -1:
        raise ValueError(f"No function call found in model output: {result!r}")
    start += len("<function_call>")
    function_call = json.loads(result[start:end].strip())

    return function_call["arguments"]

# Example
query = "I want to play soccer this week Friday 4 PM @ Central Park"
event_json = parse_event(query)
print(json.dumps(event_json, indent=2))
```

**Output:**

```json
{
  "sport": "Soccer",
  "venue_name": "Central Park",
  "start_time": "2026-02-07T16:00:00Z",
  "max_participants": 22,
  "event_type": "Casual"
}
```

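
The function schema marks `max_participants` and `event_type` as optional, with defaults of `2` and `"Casual"`. Whether the adapter always emits both fields is not stated here, so a caller may want to fill them in before sending the request. A minimal sketch, assuming the defaults from the schema (the helper name `apply_defaults` is hypothetical and not part of this model):

```python
# Defaults copied from the function schema in the Quick Start
SCHEMA_DEFAULTS = {"max_participants": 2, "event_type": "Casual"}


def apply_defaults(arguments, defaults=SCHEMA_DEFAULTS):
    """Return a copy of arguments with missing optional fields filled in."""
    # Keys present in arguments win over the defaults
    return {**defaults, **arguments}


# Hand-written example with only the required fields (not real model output)
partial = {
    "sport": "Tennis",
    "venue_name": "Ashburn Park",
    "start_time": "2026-02-12T10:00:00Z",
}
event = apply_defaults(partial)
print(event["max_participants"], event["event_type"])  # prints: 2 Casual
```

Values the model did produce are left untouched, so an output with `"max_participants": 22` keeps that value.
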

## Examples

| Input | Output |
|-------|--------|
| "Basketball game tomorrow 6pm at Riverside Courts, competitive" | `{"sport": "Basketball", "venue_name": "Riverside Courts", "start_time": "2026-02-07T18:00:00Z", "max_participants": 10, "event_type": "Competitive Game"}` |
| "Tennis match Wednesday 10 AM Ashburn Park, looking to improve" | `{"sport": "Tennis", "venue_name": "Ashburn Park", "start_time": "2026-02-12T10:00:00Z", "max_participants": 2, "event_type": "Looking to Improve"}` |
| "Casual volleyball Saturday 2pm Beach Courts" | `{"sport": "Volleyball", "venue_name": "Beach Courts", "start_time": "2026-02-08T14:00:00Z", "max_participants": 12, "event_type": "Casual"}` |

## Training Details

### Training Data

Fine-tuned on synthetic examples covering:
- Multiple sports (Soccer, Basketball, Tennis, Volleyball, Badminton, etc.)
- Various time formats (relative dates, specific times)
- All event types (Casual, Light Training, Looking to Improve, Competitive Game)
- Different venue patterns

**Training Size:** ~10-20 high-quality examples (LoRA trains only small adapter matrices, so it needs less data than full fine-tuning)

### Training Hyperparameters

- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **Target Modules:** `q_proj, k_proj, v_proj, o_proj`
- **Learning Rate:** 2e-4
- **Epochs:** 20
- **Batch Size:** 2 (gradient accumulation: 4, effective batch size 8)
- **Optimizer:** AdamW
- **Scheduler:** Cosine with warmup
- **Precision:** FP16
- **Training Time:** ~1-2 minutes on free Colab

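
For readers who want to reproduce a similar run, the values above can be written down under the argument names that `peft.LoraConfig` and `transformers.TrainingArguments` use. This is a sketch, not the original training script: `task_type` and the exact AdamW variant are assumptions, the warmup length is omitted because it is not given above, and the arithmetic assumes a single device.

```python
# LoRA settings, keyed by the matching peft.LoraConfig argument names
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not listed above, implied by text generation
}

# Trainer settings, keyed by the matching transformers.TrainingArguments argument names
training_kwargs = {
    "learning_rate": 2e-4,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch",  # assumption: the AdamW variant is not specified above
    "lr_scheduler_type": "cosine",
    "fp16": True,
}

# Samples contributing to each optimizer step (single device assumed)
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)

# Standard LoRA scales the low-rank update by alpha / r
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(effective_batch_size, lora_scaling)  # prints: 8 2.0
```

With `peft` and `transformers` installed, the two dictionaries could be unpacked as `LoraConfig(**lora_kwargs)` and, together with an output directory, `TrainingArguments(**training_kwargs)`.
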

### Framework Versions

- **Transformers:** 4.x
- **PEFT:** 0.18.1
- **PyTorch:** 2.x
- **Python:** 3.10+

## Limitations

- **Date Parsing:** Handles relative dates ("Friday", "tomorrow") but assumes they fall within the current week
- **Time Zones:** Defaults to UTC (`Z` suffix)
- **Sports Coverage:** Best performance on common sports; may need examples for niche sports
- **Language:** English only

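
Because every timestamp carries the `Z` suffix, a request for "4 PM" comes back as 16:00 UTC, as in the Quick Start output. If the user meant 4 PM local time, the caller has to correct it. One possible post-processing step, assuming the caller knows the user's UTC offset (the helper `reinterpret_as_local` and the fixed offset are illustrative and not part of this model):

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def reinterpret_as_local(start_time, utc_offset_hours):
    """Treat the clock time in a Z-suffixed timestamp as local time and return true UTC."""
    naive = datetime.strptime(start_time, TIMESTAMP_FORMAT)
    # Attach the user's offset to the wall-clock time without shifting it
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    # Convert that local moment to UTC and re-emit it in the same format
    return local.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


# 4 PM at UTC-5 is 9 PM UTC
print(reinterpret_as_local("2026-02-07T16:00:00Z", -5))  # prints: 2026-02-07T21:00:00Z
```

A fixed offset ignores daylight saving time; the standard-library `zoneinfo` module handles named zones when that matters.
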

## Intended Use

✅ **Good for:**
- Converting casual user input to structured API requests
- Sports event management applications
- Voice-to-API integrations
- Chatbot backends for sports booking

❌ **Not suitable for:**
- Mission-critical systems without validation
- Non-English languages
- Complex multi-event scheduling
- Historical date parsing

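
Since the output should be validated before it reaches a backend, here is a minimal sketch of such a check using only the standard library. The field names and allowed event types come from the function schema in the Quick Start; the function name `validate_event` and the rule that `max_participants` must be at least 1 are assumptions made for illustration.

```python
from datetime import datetime

EVENT_TYPES = ("Casual", "Light Training", "Looking to Improve", "Competitive Game")
REQUIRED_FIELDS = ("sport", "venue_name", "start_time")


def validate_event(event):
    """Return a list of problems found in a parsed event; an empty list means it passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not event.get(name)]
    if "start_time" in event:
        try:
            datetime.strptime(event["start_time"], "%Y-%m-%dT%H:%M:%SZ")
        except (TypeError, ValueError):
            problems.append("start_time is not in YYYY-MM-DDTHH:MM:SSZ form")
    if event.get("event_type", "Casual") not in EVENT_TYPES:
        problems.append("event_type is not one of the allowed values")
    participants = event.get("max_participants", 2)
    # bool is a subclass of int, so reject it explicitly
    if isinstance(participants, bool) or not isinstance(participants, int) or participants < 1:
        problems.append("max_participants must be a positive integer")  # assumed rule
    return problems


good = {"sport": "Soccer", "venue_name": "Central Park",
        "start_time": "2026-02-07T16:00:00Z", "max_participants": 22, "event_type": "Casual"}
bad = {"sport": "Soccer", "start_time": "next Friday", "event_type": "Friendly"}

print(validate_event(good))  # prints: []
print(validate_event(bad))   # three problems: venue_name, start_time, event_type
```

Returning a list of problems rather than raising on the first one lets an application show the user everything that needs correcting in a single pass.
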

## License

This adapter follows the [Gemma License](https://ai.google.dev/gemma/terms). The base model is subject to Google's Gemma terms of use.

## Citation

If you use this model, please cite:

```bibtex
@misc{gemma-event-parser-2026,
  author = {YOUR_NAME},
  title = {Gemma 2B Event Parser - Sports Event Function Calling},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/YOUR_USERNAME/gemma-event-parser}
}
```

## Acknowledgments

- Base model: Google's Gemma 2B-IT
- Fine-tuning framework: Hugging Face PEFT
- Training compute: Google Colab

---

**Questions?** Open an issue or discussion on this model's page!