metadata
library_name: transformers
tags:
- multimodal
- video-understanding
- sports
- commentary-generation
- llama3
- soccer
language:
- en
datasets:
- MatchTime
pipeline_tag: text-generation
Matchcommentary: Automatic Soccer Game Commentary Generation
Model Description
Matchcommentary is a multimodal model for automatic soccer game commentary generation. It combines pre-extracted video features with a large language model to generate fluent, contextually appropriate commentary.
Architecture
The model consists of:
- Vision Encoder: Q-Former architecture for processing video features
- Language Model: LLaMA-3-8B-Instruct for text generation
- Feature Fusion: Cross-attention mechanism between visual and textual information
- Domain Adaptation: Soccer-specific vocabulary constraints
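To illustrate the idea behind the Q-Former component, the sketch below shows how a fixed set of query tokens can cross-attend to a variable number of frame features, so the language model always receives the same number of visual tokens. It is a framework-free toy, not the model's implementation: it uses a single attention head with no learned projections, and the names `cross_attend`, `queries`, `short_clip`, and `long_clip` are hypothetical. The toy sizes stand in for the `num_video_query_token=32` and `num_features=512` values passed in the usage example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    queries:  list of vectors standing in for learnable query tokens
    features: list of vectors standing in for pre-extracted frame features
    Returns one output vector per query, so the output length depends only
    on the number of queries, not on the number of frames.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every frame feature
        scores = [
            sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(dim)
            for f in features
        ]
        weights = softmax(scores)
        # Weighted average of the frame features
        outputs.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return outputs


rng = random.Random(0)
dim = 8          # toy feature size (assumption, chosen for readability)
num_queries = 4  # toy number of query tokens (assumption)
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
short_clip = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
long_clip = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(30)]

short_tokens = cross_attend(queries, short_clip)
long_tokens = cross_attend(queries, long_clip)
print(len(short_tokens), len(long_tokens))  # both equal num_queries
```

A 5-frame clip and a 30-frame clip both come out as `num_queries` vectors, which is what lets a fixed-length visual prefix be handed to the language model.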
Intended Use
Primary Use Cases
- Automatic soccer game commentary generation
- Sports video understanding and description
- Multimodal video-to-text generation
Limitations
- Trained specifically on soccer/football content
- Requires pre-extracted video features
- Performance may vary with video quality and camera angle
Training Data
The model was trained on the MatchTime dataset, which contains:
- Soccer game videos with corresponding commentary
- Multiple leagues and seasons
- Temporal alignment between visual events and commentary
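As a rough illustration of what temporal alignment makes possible, the sketch below looks up the feature frames that fall in a window around a commentary timestamp. It does not describe the MatchTime format: the function name `frames_for_comment`, the extraction rate `fps`, and the half-window `window_s` are all hypothetical values chosen for the example.

```python
def frames_for_comment(timestamp_s, num_frames, fps=2.0, window_s=15.0):
    """Return indices of feature frames within +/- window_s of a comment.

    All parameters are assumptions for illustration:
    timestamp_s: time of the commentary line, in seconds from the start
    num_frames:  number of feature frames available for the video
    fps:         rate at which features were extracted
    window_s:    half-width of the clip around the comment, in seconds
    """
    # Clamp the window to the valid frame range at both ends of the video
    start = max(0, int((timestamp_s - window_s) * fps))
    end = min(num_frames, int((timestamp_s + window_s) * fps) + 1)
    return list(range(start, end))


# A comment at 60 s in a long video: frames 90 through 150
mid_game = frames_for_comment(60.0, num_frames=10000)
# A comment at 5 s: the window is clipped at the start of the video
kickoff = frames_for_comment(5.0, num_frames=10000)
print(mid_game[0], mid_game[-1], len(kickoff))
```

If commentary timestamps are off by more than the window width, the selected frames no longer show the described event, which is why alignment quality matters for training pairs.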
Performance
The model achieves state-of-the-art performance on the MatchTime benchmark, with the best validation CIDEr score among tested configurations.
Usage
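# Requires the model source code that provides models/matchvoice_model.py on the Python path
import torch

from models.matchvoice_model import matchvoice_model

# Build the model in inference mode
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,
    num_features=512,
    device="cuda:0",
    inference=True,
)

# Load the checkpoint weights onto the CPU first; load_state_dict copies them into the model
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Generate commentary
# video_samples is not defined here: supply a batch of pre-extracted video features
with torch.no_grad():
    commentary = model(video_samples)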