metadata
library_name: transformers
tags:
  - multimodal
  - video-understanding
  - sports
  - commentary-generation
  - llama3
  - soccer
language:
  - en
datasets:
  - MatchTime
pipeline_tag: text-generation

Matchcommentary: Automatic Soccer Game Commentary Generation

Model Description

Matchcommentary is a multimodal model for automatic soccer commentary generation. It feeds pre-extracted video features to a large language model to produce fluent, contextually appropriate commentary.

Architecture

The model consists of:

  • Vision Encoder: Q-Former architecture for processing video features
  • Language Model: LLaMA-3-8B-Instruct for text generation
  • Feature Fusion: Cross-attention mechanism between visual and textual information
  • Domain Adaptation: Soccer-specific vocabulary constraints
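To illustrate the role a Q-Former-style encoder plays in the list above, here is a minimal sketch of cross-attention in which a fixed set of query tokens attends over a variable number of frame features. It is written in plain Python for portability and is not the model's actual implementation: the function names, the toy dimensions, and the absence of learned projections and multiple heads are all simplifying assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, frame_features):
    """Single-head cross-attention without learned projections (illustrative).

    queries:        num_queries x dim, a stand-in for learned query tokens
    frame_features: num_frames  x dim, a stand-in for pre-extracted features
    returns:        num_queries x dim, regardless of num_frames
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every frame.
        scores = [
            scale * sum(qi * fi for qi, fi in zip(q, frame))
            for frame in frame_features
        ]
        weights = softmax(scores)
        # The attention-weighted average of the frames is the output token.
        outputs.append([
            sum(w * frame[d] for w, frame in zip(weights, frame_features))
            for d in range(dim)
        ])
    return outputs


rng = random.Random(0)
dim, num_queries = 8, 4  # toy sizes; the usage example on this card sets 32 query tokens
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]

# Clips of different lengths yield the same number of visual tokens.
short_clip = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(10)]
long_clip = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(60)]
tokens_short = cross_attend(queries, short_clip)
tokens_long = cross_attend(queries, long_clip)
print(len(tokens_short), len(tokens_long))
```

The point of the design is that the language model always receives a fixed-length sequence of visual tokens, however many frames the clip contains.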

Intended Use

Primary Use Cases

  • Automatic soccer game commentary generation
  • Sports video understanding and description
  • Multimodal video-to-text generation

Limitations

  • Trained specifically on soccer/football content
  • Requires pre-extracted video features
  • Performance may vary with video quality and camera angle

Training Data

The model was trained on the MatchTime dataset, which contains:

  • Soccer game videos with corresponding commentary
  • Multiple leagues and seasons
  • Temporal alignment between visual events and commentary
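As a rough illustration of what temporal alignment means in practice, the sketch below maps a commentary timestamp to the slice of pre-extracted features around it. The `CommentaryEvent` record, the `feature_window` helper, the 2 features-per-second rate, and the 30-second window are hypothetical choices made for this example; they are not documented properties of MatchTime.

```python
from dataclasses import dataclass


@dataclass
class CommentaryEvent:
    """Hypothetical record: one commentary line and the time it refers to."""
    timestamp_s: float  # seconds from the start of the half
    text: str


def feature_window(timestamp_s, num_frames, fps=2.0, window_s=30.0):
    """Return the [start, end) feature indices centred on a timestamp.

    fps and window_s are illustrative assumptions, not documented values.
    The window is clamped to the bounds of the feature sequence.
    """
    center = int(round(timestamp_s * fps))
    half = int(round(window_s * fps / 2))
    start = max(0, center - half)
    end = min(num_frames, center + half)
    return start, end


# Made-up commentary line, 125 s into a 45-minute half sampled at 2 features/s.
event = CommentaryEvent(timestamp_s=125.0, text="A low cross is cleared at the near post.")
start, end = feature_window(event.timestamp_s, num_frames=5400)
print(start, end)  # 220 280
```

If the timestamps are misaligned with the footage, the window no longer contains the event being described, which is why alignment quality matters for training.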

Performance

The model reports state-of-the-art performance on the MatchTime benchmark. The released checkpoint is the configuration with the highest validation CIDEr score among those tested.
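For readers unfamiliar with CIDEr, the sketch below shows the core idea: TF-IDF weighted n-gram vectors for the generated and reference sentences, compared by cosine similarity and averaged over n = 1 to 4. It is a simplified illustration and not the scorer used for this model. It omits stemming, the length penalty and count clipping of CIDEr-D, and the x10 scaling that common toolkits apply, so its values are not comparable with published scores. The function name `cider_sketch` and the example sentences are made up.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cider_sketch(candidates, references, max_n=4):
    """Simplified corpus-level CIDEr: mean TF-IDF cosine over n = 1..max_n.

    candidates: one generated sentence per clip
    references: one list of reference sentences per clip
    """
    cand_tokens = [c.lower().split() for c in candidates]
    ref_tokens = [[r.lower().split() for r in refs] for refs in references]
    num_clips = len(references)
    total = 0.0
    for n in range(1, max_n + 1):
        # Document frequency: number of clips whose references contain the n-gram.
        doc_freq = Counter()
        for refs in ref_tokens:
            seen = set()
            for r in refs:
                seen.update(ngrams(r, n))
            doc_freq.update(seen)

        def tfidf(tokens):
            counts = ngrams(tokens, n)
            length = sum(counts.values()) or 1
            return {
                g: (c / length) * math.log(num_clips / max(doc_freq[g], 1))
                for g, c in counts.items()
            }

        def cosine(a, b):
            dot = sum(v * b.get(g, 0.0) for g, v in a.items())
            norm_a = math.sqrt(sum(v * v for v in a.values()))
            norm_b = math.sqrt(sum(v * v for v in b.values()))
            return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

        for cand, refs in zip(cand_tokens, ref_tokens):
            cand_vec = tfidf(cand)
            sims = [cosine(cand_vec, tfidf(r)) for r in refs]
            total += sum(sims) / len(sims)
    return total / (max_n * num_clips)


references = [
    ["striker scores a brilliant goal"],
    ["the referee shows a yellow card"],
]
# Identical captions score about 1; captions sharing no n-grams score 0.
perfect = cider_sketch([refs[0] for refs in references], references)
unrelated = cider_sketch(["corner kick taken quickly"] * 2, references)
print(round(perfect, 6), unrelated)
```

Because n-grams that appear in every clip's references receive zero IDF weight, generic phrases contribute little, and the score rewards wording specific to the event being described.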

Usage

# Requires the model code (models/matchvoice_model.py) to be importable
from models.matchvoice_model import matchvoice_model
import torch

# Load model
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,
    num_features=512,
    device="cuda:0",
    inference=True
)

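# Load the checkpoint on CPU first; load_state_dict copies the weights
# into the model's existing parameters on its own device
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Generate commentary
# video_samples is not defined in this snippet: it is a batch of pre-extracted
# video features (see Limitations) prepared by your own data loading code
with torch.no_grad():
    commentary = model(video_samples)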