---
library_name: transformers
tags:
- multimodal
- video-understanding
- sports
- commentary-generation
- llama3
- soccer
language:
- en
datasets:
- MatchTime
pipeline_tag: text-generation
---

# Matchcommentary: Automatic Soccer Game Commentary Generation

## Model Description

Matchcommentary is a multimodal model for automatic soccer game commentary generation. It combines video feature understanding with a large language model to generate fluent, contextually appropriate soccer commentary.

## Architecture

The model consists of:
- Vision Encoder: a Q-Former that processes pre-extracted video features
- Language Model: LLaMA-3-8B-Instruct, which generates the commentary text
- Feature Fusion: cross-attention between visual and textual information
- Domain Adaptation: soccer-specific vocabulary constraints
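The general idea behind a Q-Former-style bridge is that a small set of learnable query tokens cross-attend to the video features and are then projected into the language model's embedding space. The PyTorch sketch below illustrates that idea. It is a minimal illustration, not the Matchcommentary implementation: the class name `VideoQueryBridge`, the use of a single cross-attention layer in place of a multi-layer Q-Former, and the head count are assumptions. The 32 query tokens and 512-dimensional features mirror `num_video_query_token` and `num_features` in the usage example, and 4096 is the hidden size of LLaMA-3-8B.

```python
import torch
import torch.nn as nn


class VideoQueryBridge(nn.Module):
    """Hypothetical sketch: compress video features into a fixed number of
    query tokens, then project them to the LLM embedding size."""

    def __init__(self, num_queries: int = 32, feat_dim: int = 512,
                 llm_dim: int = 4096, num_heads: int = 8):
        super().__init__()
        # Learnable query tokens shared across all clips
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim) * 0.02)
        # Queries attend to the per-frame video features (keys and values)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        # Map the attended queries into the language model's embedding space
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (batch, num_frames, feat_dim)
        batch = video_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        attended, _ = self.cross_attn(q, video_feats, video_feats)
        # Residual + norm, then project: (batch, num_queries, llm_dim)
        return self.proj(self.norm(attended + q))


if __name__ == "__main__":
    bridge = VideoQueryBridge()
    feats = torch.randn(2, 60, 512)  # 2 clips, 60 dummy feature frames each
    print(bridge(feats).shape)  # torch.Size([2, 32, 4096])
```

Whatever the clip length, the output always has `num_queries` tokens. This gives the language model a fixed-size visual prefix to condition on.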

## Intended Use

### Primary Use Cases
- Automatic soccer game commentary generation
- Sports video understanding and description
- Multimodal video-to-text generation

### Limitations
- Trained only on soccer (football) content
- Requires pre-extracted video features as input rather than raw video
- Performance may vary with video quality and camera angle

## Training Data

The model was trained on the MatchTime dataset, which contains:
- Soccer game videos with corresponding commentary
- Multiple leagues and seasons
- Temporal alignment between visual events and commentary
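Temporal alignment between visual events and commentary can be illustrated by a helper that selects the window of pre-extracted feature frames around a commentary timestamp. The sketch below does this. The function name `feature_window`, the feature rate of 2 frames per second, and the symmetric 30-second window are hypothetical choices for illustration. They are not values taken from the MatchTime dataset or the Matchcommentary code.

```python
from typing import List, Sequence


def feature_window(features: Sequence, timestamp_s: float,
                   fps: float = 2.0, window_s: float = 30.0) -> List:
    """Return the feature frames within +/- window_s / 2 seconds of timestamp_s.

    Hypothetical helper: `features` holds one pre-extracted feature vector per
    frame, sampled at `fps` frames per second from the start of the video.
    """
    center = int(round(timestamp_s * fps))   # frame index of the commentary
    half = int(round(window_s * fps / 2))    # frames on each side of it
    start = max(0, center - half)            # clamp at the start of the video
    end = min(len(features), center + half)  # clamp at the end of the video
    return list(features[start:end])


if __name__ == "__main__":
    # 100 seconds of dummy 512-dimensional features at 2 frames per second
    features = [[float(i)] * 512 for i in range(200)]
    clip = feature_window(features, timestamp_s=50.0)
    print(len(clip))  # 60 frames, covering seconds 35.0 to 64.5
```

Windows near the start or end of the video are clamped, so they can be shorter than `window_s`.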

## Performance

The model achieves state-of-the-art performance on the MatchTime benchmark. The released checkpoint (`model_save_best_val_CIDEr.pth`) is the configuration with the best validation CIDEr score among those tested.
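CIDEr scores a generated sentence by the TF-IDF-weighted cosine similarity between its n-grams (n = 1 to 4) and those of the reference sentences. As a result, phrases shared by every clip count for little and clip-specific phrases count for a lot. The pure-Python sketch below implements this basic form for intuition only. It omits the length penalty, count clipping, and x10 scaling of the CIDEr-D variant implemented in common captioning evaluation toolkits, so its numbers are not comparable to reported benchmark scores. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(sentence: str, n: int) -> Counter:
    """Count the n-grams of a whitespace-tokenized, lower-cased sentence."""
    tokens = sentence.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(sentence: str, n: int, df: Counter, num_clips: int) -> Dict[tuple, float]:
    """Term frequency times inverse document frequency for each n-gram."""
    counts = ngrams(sentence, n)
    total = sum(counts.values()) or 1
    return {g: (c / total) * math.log(num_clips / max(1, df[g]))
            for g, c in counts.items()}


def cider(candidates: Dict[str, str], references: Dict[str, List[str]],
          max_n: int = 4) -> Dict[str, float]:
    """Basic CIDEr per clip id (no CIDEr-D penalty, clipping or scaling)."""
    num_clips = len(references)
    scores = {cid: 0.0 for cid in candidates}
    for n in range(1, max_n + 1):
        # Document frequency: in how many clips' references each n-gram occurs
        df: Counter = Counter()
        for refs in references.values():
            df.update(set().union(*(ngrams(r, n) for r in refs)))
        for cid, cand in candidates.items():
            vc = tfidf(cand, n, df, num_clips)
            sims = []
            for ref in references[cid]:
                vr = tfidf(ref, n, df, num_clips)
                dot = sum(w * vr.get(g, 0.0) for g, w in vc.items())
                norm = (math.sqrt(sum(w * w for w in vc.values()))
                        * math.sqrt(sum(w * w for w in vr.values())))
                sims.append(dot / norm if norm > 0 else 0.0)
            # Average over references, then weight each n-gram order by 1 / max_n
            scores[cid] += sum(sims) / len(sims) / max_n
    return scores


if __name__ == "__main__":
    refs = {"clip1": ["messi scores a goal"],
            "clip2": ["the referee shows a yellow card"]}
    cands = {"clip1": "messi scores a penalty",
             "clip2": "the referee shows a yellow card"}
    print(cider(cands, refs))  # clip1 ~0.458, clip2 ~1.0
```

In the example, the word "a" appears in both clips' references, so its weight is zero. The partially correct "messi scores a penalty" earns credit only for the clip-specific n-grams it shares with the reference.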

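## Usage

```python
from models.matchvoice_model import matchvoice_model  # requires the project's `models` package
import torch

device = "cuda:0"

# Build the model around LLaMA-3-8B-Instruct
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,  # number of video query tokens
    num_features=512,          # dimension of the pre-extracted video features
    device=device,
    inference=True,
)

# Load the fine-tuned weights (best validation CIDEr checkpoint)
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location=device)
model.load_state_dict(checkpoint)
model.eval()

# Generate commentary; `video_samples` must be a batch of pre-extracted
# video features prepared in the format the model expects
with torch.no_grad():
    commentary = model(video_samples)
```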