---
library_name: transformers
tags:
- multimodal
- video-understanding
- sports
- commentary-generation
- llama3
- soccer
language:
- en
datasets:
- MatchTime
pipeline_tag: text-generation
---

# Matchcommentary: Automatic Soccer Game Commentary Generation

## Model Description

Matchcommentary is a multimodal model for automatic soccer game commentary generation. It couples a video feature encoder with a large language model to generate fluent, contextually appropriate soccer commentary.

## Architecture

The model consists of:
- **Vision Encoder**: Q-Former architecture for processing video features
- **Language Model**: LLaMA-3-8B-Instruct for text generation
- **Feature Fusion**: Cross-attention mechanism between visual and textual information
- **Domain Adaptation**: Soccer-specific vocabulary constraints
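The fusion step above can be illustrated with a minimal sketch: a small set of learnable query tokens cross-attends over the frame features, compressing a variable-length video into a fixed number of vectors for the language model. This is a simplified NumPy illustration, not the model's actual implementation; the token and feature dimensions match the `num_video_query_token=32` and `num_features=512` settings used below, while `num_frames` is a hypothetical clip length.

```python
import numpy as np

rng = np.random.default_rng(0)

num_query_tokens = 32   # learnable Q-Former queries (num_video_query_token)
num_frames = 50         # hypothetical number of frame features in one clip
d = 512                 # feature dimension (num_features)

queries = rng.standard_normal((num_query_tokens, d))   # stand-in for learned query embeddings
video_feats = rng.standard_normal((num_frames, d))     # stand-in for pre-extracted frame features

# Cross-attention: each query token attends over all frame features
scores = queries @ video_feats.T / np.sqrt(d)              # (32, 50) attention logits
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)              # softmax over frames
fused = weights @ video_feats                              # (32, 512), fed to the LLM

print(fused.shape)  # (32, 512)
```

However long the clip is, the output is always 32 vectors of dimension 512, which is what lets the language model consume videos of varying length through a fixed-size visual prefix.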

## Intended Use

### Primary Use Cases
- Automatic soccer game commentary generation
- Sports video understanding and description
- Multimodal video-to-text generation

### Limitations
- Trained specifically on soccer/football content
- Requires pre-extracted video features
- Performance may vary on different video qualities or angles
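Because the model consumes pre-extracted features rather than raw video, clips of different lengths must be padded into a fixed-shape batch before inference. A minimal sketch, assuming per-clip feature arrays of dimension 512 and zero-padding with a validity mask (the lengths here are hypothetical):

```python
import numpy as np

# Hypothetical pre-extracted features for three clips of different lengths,
# each frame represented by a 512-dimensional feature vector
clips = [np.ones((n, 512)) for n in (30, 45, 50)]

max_len = max(c.shape[0] for c in clips)
batch = np.zeros((len(clips), max_len, 512))       # zero-padded feature batch
mask = np.zeros((len(clips), max_len), dtype=bool)  # True where features are real
for i, c in enumerate(clips):
    batch[i, : c.shape[0]] = c
    mask[i, : c.shape[0]] = True

print(batch.shape)  # (3, 50, 512)
```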

## Training Data

The model was trained on the MatchTime dataset, which contains:
- Soccer game videos with corresponding commentary
- Multiple leagues and seasons
- Temporal alignment between visual events and commentary

## Performance

The model achieves state-of-the-art performance on the MatchTime benchmark, with the best validation CIDEr score among tested configurations.

## Usage

```python
from models.matchvoice_model import matchvoice_model
import torch

# Instantiate the model in inference mode
model = matchvoice_model(
    llm_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_ckpt="meta-llama/Meta-Llama-3-8B-Instruct",
    num_video_query_token=32,   # number of visual query tokens
    num_features=512,           # dimension of the pre-extracted video features
    device="cuda:0",
    inference=True
)

# Load the fine-tuned checkpoint onto the target device
checkpoint = torch.load("model_save_best_val_CIDEr.pth", map_location="cuda:0")
model.load_state_dict(checkpoint)
model.eval()

# Generate commentary; video_samples is a batch of pre-extracted video features
with torch.no_grad():
    commentary = model(video_samples)
```