---
language: en
license: apache-2.0
tags:
- text-summarization
- meeting-summarization
- t5
- transformers
- qmsum
datasets:
- qmsum
metrics:
- rouge
pipeline_tag: summarization
---

# Meeting Summarizer

This model is a fine-tuned version of `t5-small` for meeting summarization tasks.

## Model Details
- **Base Model**: t5-small
- **Task**: Abstractive Meeting Summarization
- **Training Data**: QMSum Dataset + Enhanced Training
- **Parameters**: approximately 60 million (t5-small architecture)

## Training Configuration
- **Max Input Length**: 256 tokens
- **Max Output Length**: 64 tokens
- **Batch Size**: 16
- **Learning Rate**: 5e-05
- **Training Epochs**: 1
- **Training Samples**: N/A

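For readers who want to set up a comparable run, the settings above could map onto the fields of Hugging Face `Seq2SeqTrainingArguments` roughly as sketched below. This is not the author's training script, which the card does not include. The `output_dir` path is hypothetical, treating the batch size as per-device is an assumption, and `predict_with_generate` and `generation_max_length` are assumptions about how evaluation might be configured.

```python
# Hyperparameters copied from the Training Configuration list.
MAX_INPUT_LENGTH = 256   # tokens; limit when tokenizing transcripts
MAX_TARGET_LENGTH = 64   # tokens; limit when tokenizing reference summaries

# Keys are named after fields of transformers.Seq2SeqTrainingArguments.
training_kwargs = {
    "output_dir": "./meeting-summarizer-run",    # hypothetical path
    "per_device_train_batch_size": 16,           # assumes the card's batch size is per device
    "learning_rate": 5e-5,
    "num_train_epochs": 1,
    "predict_with_generate": True,               # assumption, not stated in the card
    "generation_max_length": MAX_TARGET_LENGTH,  # assumption, not stated in the card
}


def tokenizer_kwargs(max_length):
    """Return truncation settings for a tokenizer call at the given length."""
    return {"max_length": max_length, "truncation": True}


# With transformers installed, the dictionary can be unpacked directly:
#   from transformers import Seq2SeqTrainingArguments
#   args = Seq2SeqTrainingArguments(**training_kwargs)
print(training_kwargs)
print(tokenizer_kwargs(MAX_INPUT_LENGTH))
```
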
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CodeXRyu/meeting-summarizer")
model = AutoModelForSeq2SeqLM.from_pretrained("CodeXRyu/meeting-summarizer")

def generate_summary(meeting_text, max_length=150):
    """Summarize a meeting transcript; max_length caps the summary length in tokens."""
    # Prepare input: T5 expects a task prefix; input beyond 512 tokens is truncated
    input_text = "summarize: " + meeting_text
    inputs = tokenizer(input_text, max_length=512, truncation=True, return_tensors="pt")

    # Generate summary with beam search
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True,
    )

    # Decode the best beam back to text
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage
meeting_transcript = '''
John: Good morning team. Let's discuss our Q3 results.
Sarah: Our sales exceeded targets by 15%, reaching $2.1M in revenue.
Mike: The new marketing campaign was very effective.
John: Great work everyone. Let's plan for Q4.
'''

summary = generate_summary(meeting_transcript)
print(summary)
```

## Training Data
This model was trained on the QMSum dataset, which contains real meeting transcripts from multiple domains:
- Academic meetings
- Product development meetings
- Committee meetings

## Performance
The model achieves competitive ROUGE scores on meeting summarization benchmarks.

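For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of ROUGE-1 F1, which measures unigram overlap between a reference summary and a generated one. It is a simplified illustration, not the scorer used to evaluate this model: it assumes lowercase whitespace tokenization and applies no stemming. The helper name `rouge1_f1` and the two example strings are made up for demonstration.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokenization."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only, not model output.
reference = "sales exceeded targets by 15% in Q3"
candidate = "Q3 sales exceeded targets"
print(round(rouge1_f1(reference, candidate), 3))  # 0.727
```

For reported results, a maintained scorer such as the `rouge_score` package or the `rouge` metric in the `evaluate` library is the usual choice, since those handle stemming and ROUGE-2/ROUGE-L as well.
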
## Limitations
- Optimized for English meeting transcripts
- Input beyond 512 tokens is truncated in the usage example (fine-tuning used a 256-token input limit), so summaries of long meetings may miss later content
- Works best on structured transcripts with speaker labels (for example, `John: ...`)

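One possible workaround for the input limit is to split a long transcript at speaker-turn boundaries, summarize each chunk, and join the partial summaries. The sketch below assumes one speaker turn per line and uses whitespace word count as a rough stand-in for the tokenizer's token count. The summarizer is passed in as a callable so the example runs without downloading the model. The names `chunk_turns` and `summarize_long` and the 200-word default budget are hypothetical choices, not part of this model's release.

```python
from typing import Callable, List


def chunk_turns(transcript: str, max_words: int = 200) -> List[str]:
    """Group speaker turns (one per line) into chunks of at most max_words words.

    A single turn longer than max_words is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0
    for line in transcript.splitlines():
        turn = line.strip()
        if not turn:
            continue  # skip blank lines
        n_words = len(turn.split())
        # Start a new chunk when adding this turn would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n".join(current))
            current, current_words = [], 0
        current.append(turn)
        current_words += n_words
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_long(transcript: str, summarize: Callable[[str], str],
                   max_words: int = 200) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_turns(transcript, max_words))


demo = ("John: Good morning team.\n"
        "Sarah: Sales exceeded targets.\n"
        "Mike: The campaign worked.")
print(chunk_turns(demo, max_words=8))
```

With the `generate_summary` function from the Usage section, `summarize_long(transcript, generate_summary)` would produce one partial summary per chunk. Summarizing chunks independently can lose context that spans chunk boundaries, so the joined result may read less coherently than a single-pass summary.
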
## Citation
If you use this model, please cite:
```
@misc{meeting-summarizer-codexryu,
  author = {CodeXRyu},
  title = {Meeting Summarizer},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CodeXRyu/meeting-summarizer}
}
```