matchcommentary / model_card.md

abocide

metadata

library_name: transformers
tags:
  - multimodal
  - video-understanding
  - sports
  - commentary-generation
  - llama3
  - soccer
language:
  - en
datasets:
  - MatchTime
pipeline_tag: text-generation

Matchcommentary: Automatic Soccer Game Commentary Generation

Model Description

Matchcommentary is a multimodal model for automatic soccer game commentary generation. It combines a video feature encoder with a large language model to generate fluent, contextually appropriate commentary.

Architecture

The model consists of:

Vision Encoder: Q-Former architecture for processing video features
Language Model: LLaMA-3-8B-Instruct for text generation
Feature Fusion: Cross-attention mechanism between visual and textual information
Domain Adaptation: Soccer-specific vocabulary constraints
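
To make the fusion step concrete, here is a minimal sketch of single-head cross-attention in which a small set of query vectors attends over pre-extracted frame features. It is written in plain Python for readability and is not the model's implementation: the function name `cross_attend`, the toy dimensions, and the omission of learned projection matrices are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, features):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    queries:  list of query vectors (stand-ins for learned query tokens)
    features: list of frame feature vectors (used as both keys and values)
    Returns one fused vector per query.
    """
    dim = len(features[0])
    fused = []
    for q in queries:
        # Similarity of this query to every frame, scaled by sqrt(dim)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in features]
        weights = softmax(scores)
        # Attention-weighted average of the frame features
        fused.append([sum(w * v[d] for w, v in zip(weights, features)) for d in range(dim)])
    return fused

# Toy example: 2 queries attending over 3 frames of 4-dimensional features
queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
features = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]]
fused = cross_attend(queries, features)
```

Each query returns a blend of the frame features, weighted toward the frames it matches best. The usage example configures the model with 32 query tokens (`num_video_query_token=32`) and 512-dimensional features (`num_features=512`); the toy sizes above are only for readability.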

Intended Use

Primary Use Cases

Automatic soccer game commentary generation
Sports video understanding and description
Multimodal video-to-text generation

Limitations

Trained specifically on soccer/football content
Requires pre-extracted video features
Performance may vary on different video qualities or angles
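
Because the model expects pre-extracted features rather than raw video, it can help to validate inputs before inference. The sketch below checks that a clip's features form a rectangular frames-by-dimension array. The helper name `check_clip_features` is hypothetical, and the expected width of 512 is an assumption taken from `num_features=512` in the usage example; the layout the model actually requires may differ.

```python
def check_clip_features(clip, expected_dim=512):
    """Validate a clip given as a list of per-frame feature vectors.

    Returns (num_frames, feature_dim) or raises ValueError.
    expected_dim=512 is an assumption based on num_features=512.
    """
    if not clip:
        raise ValueError("clip has no frames")
    dims = {len(frame) for frame in clip}
    if len(dims) != 1:
        raise ValueError(f"frames have inconsistent widths: {sorted(dims)}")
    dim = dims.pop()
    if dim != expected_dim:
        raise ValueError(f"expected {expected_dim}-dim features, got {dim}")
    return len(clip), dim

# Example: 30 frames of 512-dimensional placeholder (all-zero) features
shape = check_clip_features([[0.0] * 512 for _ in range(30)])
```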

Training Data

The model was trained on the MatchTime dataset, which contains:

Soccer game videos with corresponding commentary
Multiple leagues and seasons
Temporal alignment between visual events and commentary
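
Temporal alignment means each commentary line is paired with the stretch of video it describes. As a rough sketch, the function below maps a commentary timestamp to the range of feature-frame indices in a window around it. The name `frame_window`, the 2 frames-per-second feature rate, and the 15-second half-window are illustrative assumptions, not values documented for MatchTime.

```python
def frame_window(timestamp_s, num_frames, fps=2.0, half_window_s=15.0):
    """Return (start, end) frame indices, end exclusive, around a timestamp.

    fps and half_window_s are illustrative defaults, not dataset values.
    The window is clipped to the valid range [0, num_frames].
    """
    start = int((timestamp_s - half_window_s) * fps)
    end = int((timestamp_s + half_window_s) * fps)
    return max(start, 0), min(end, num_frames)

# A comment at 10 s in a 45-minute half sampled at 2 fps (5400 frames)
start, end = frame_window(10.0, num_frames=5400)
```

Clipping matters at the edges of a half: a comment in the first few seconds has no earlier frames to draw on, so its window is shorter than the nominal length.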

Performance

The model achieves state-of-the-art performance on the MatchTime benchmark. The released checkpoint (model_save_best_val_CIDEr.pth) is the configuration with the best validation CIDEr score among those tested.

Usage

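# matchvoice_model comes from the model's source code, which must be importable as the `models` package
from models.matchvoice_model import matchvoice_model
import torch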

# Load model
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,
    num_features=512,
    device="cuda:0",
    inference=True
)

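# Load checkpoint on the CPU first; load_state_dict then copies the weights into the model
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Generate commentary
# video_samples is not defined here: it is a batch of pre-extracted video features prepared by the caller
with torch.no_grad():
    commentary = model(video_samples)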