---
library_name: transformers
tags:
- multimodal
- video-understanding
- sports
- commentary-generation
- llama3
- soccer
language:
- en
datasets:
- MatchTime
pipeline_tag: text-generation
---
# Matchcommentary: Automatic Soccer Game Commentary Generation
## Model Description
Matchcommentary is a multimodal model for automatic soccer game commentary generation. It combines a video feature encoder with a large language model to generate fluent, contextually appropriate soccer commentary.
## Architecture
The model consists of:
- **Vision Encoder**: Q-Former architecture for processing video features
- **Language Model**: LLaMA-3-8B-Instruct for text generation
- **Feature Fusion**: Cross-attention mechanism between visual and textual information
- **Domain Adaptation**: Soccer-specific vocabulary constraints
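
The Q-Former-style fusion above can be sketched as a small set of learnable query tokens that cross-attend over per-frame video features, compressing them into a short visual prefix for the language model. The sketch below is illustrative only: the class name, dimensions, and single attention layer are assumptions, not the released implementation (which stacks multiple Q-Former blocks).

```python
import torch
import torch.nn as nn

class VideoQueryPooler(nn.Module):
    """Minimal sketch of Q-Former-style fusion: learnable query tokens
    cross-attend over video features to produce a fixed-length summary.
    Names and sizes are hypothetical; defaults mirror the usage example
    below (32 query tokens, 512-dim features)."""

    def __init__(self, num_query_tokens=32, feature_dim=512, num_heads=8):
        super().__init__()
        # One learnable embedding per query token
        self.queries = nn.Parameter(torch.randn(num_query_tokens, feature_dim))
        self.cross_attn = nn.MultiheadAttention(
            feature_dim, num_heads, batch_first=True
        )

    def forward(self, video_feats):
        # video_feats: (batch, num_frames, feature_dim)
        batch = video_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        # Queries attend to the video features (keys and values)
        pooled, _ = self.cross_attn(q, video_feats, video_feats)
        return pooled  # (batch, num_query_tokens, feature_dim)

pooler = VideoQueryPooler()
out = pooler(torch.randn(2, 100, 512))  # 2 clips, 100 frames each
print(out.shape)  # torch.Size([2, 32, 512])
```

Regardless of clip length, the output is a fixed 32-token sequence, which is what makes the visual prefix compatible with the LLM's input.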
## Intended Use
### Primary Use Cases
- Automatic soccer game commentary generation
- Sports video understanding and description
- Multimodal video-to-text generation
### Limitations
- Trained specifically on soccer/football content
- Requires pre-extracted video features
- Performance may vary across video qualities and camera angles
## Training Data
The model was trained on the MatchTime dataset, which contains:
- Soccer game videos with corresponding commentary
- Multiple leagues and seasons
- Temporal alignment between visual events and commentary
## Performance
The model achieves state-of-the-art results on the MatchTime benchmark; the released checkpoint is the configuration with the best validation CIDEr score.
## Usage
```python
import torch

from models.matchvoice_model import matchvoice_model

# Build the model in inference mode with the LLaMA-3-8B-Instruct backbone
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,
    num_features=512,
    device="cuda:0",
    inference=True,
)

# Load the best-validation-CIDEr checkpoint
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cuda:0")
model.load_state_dict(checkpoint)
model.eval()

# Generate commentary from pre-extracted video features
with torch.no_grad():
    commentary = model(video_samples)
```