---
language: en
license: apache-2.0
tags:
- text-summarization
- meeting-summarization
- t5
- transformers
- qmsum
datasets:
- qmsum
metrics:
- rouge
pipeline_tag: summarization
---
# Meeting Summarizer
This model is a fine-tuned version of `t5-small` for meeting summarization tasks.
## Model Details
- **Base Model**: t5-small
- **Task**: Abstractive Meeting Summarization
- **Training Data**: QMSum Dataset + Enhanced Training
- **Parameters**: ~60M (t5-small architecture)
## Training Configuration
- **Max Input Length**: 256 tokens
- **Max Output Length**: 64 tokens
- **Batch Size**: 16
- **Learning Rate**: 5e-05
- **Training Epochs**: 1
- **Training Samples**: N/A
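
To make the relationship between these settings concrete, here is a minimal sketch that collects them in one place and estimates the number of optimizer steps in a run. The `TrainingConfig` class and its `total_steps` helper are hypothetical names introduced for illustration, not part of the original training script, and the calculation assumes no gradient accumulation and that the batch size is the total batch per step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters copied from the list above (hypothetical container)."""
    max_input_length: int = 256   # tokens kept per transcript
    max_output_length: int = 64   # tokens kept per reference summary
    batch_size: int = 16
    learning_rate: float = 5e-5
    num_epochs: int = 1

    def total_steps(self, num_samples: int) -> int:
        """Optimizer steps for the whole run, assuming no gradient accumulation."""
        return math.ceil(num_samples / self.batch_size) * self.num_epochs


config = TrainingConfig()
# num_samples=1000 is a made-up figure; this card lists the real count as N/A.
print(config.total_steps(num_samples=1000))
```

With a single epoch, the step count is simply the number of batches, so the learning-rate schedule covers each sample once.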
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CodeXRyu/meeting-summarizer")
model = AutoModelForSeq2SeqLM.from_pretrained("CodeXRyu/meeting-summarizer")


def generate_summary(meeting_text, max_length=150):
    # Prepare input: T5 expects a task prefix
    input_text = "summarize: " + meeting_text
    inputs = tokenizer(input_text, max_length=512, truncation=True, return_tensors="pt")

    # Generate summary with beam search
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example usage
meeting_transcript = '''
John: Good morning team. Let's discuss our Q3 results.
Sarah: Our sales exceeded targets by 15%, reaching $2.1M in revenue.
Mike: The new marketing campaign was very effective.
John: Great work everyone. Let's plan for Q4.
'''

summary = generate_summary(meeting_transcript)
print(summary)
```
## Training Data
This model was trained on the QMSum dataset, which contains real meeting transcripts from multiple domains:
- Academic meetings
- Product development meetings
- Committee meetings
## Performance
The model is reported to achieve competitive ROUGE scores on meeting summarization benchmarks; specific values are not listed in this card.
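
For readers who want to score generated summaries against their own references, the sketch below approximates ROUGE-1 as a unigram-overlap F1. It is a simplified illustration, not the official ROUGE scorer: it lowercases and splits on whitespace, with no stemming or special tokenization, so its numbers will not match a standard ROUGE package exactly. The function name `rouge1_f1` and the sample strings are made up for the example.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only; these are not model outputs or benchmark data.
reference = "sales exceeded targets by 15 percent in q3"
candidate = "q3 sales exceeded targets"
score = rouge1_f1(reference, candidate)
print(round(score, 3))
```

Here every candidate word appears in the reference (precision 1.0) but only half of the reference is covered (recall 0.5), which is the typical profile of a short summary.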
## Limitations
- Optimized for English meeting transcripts
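- Long meetings are truncated: the usage example keeps only the first 512 input tokens (training used a 256-token maximum), so later parts of a long transcript do not influence the summary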
- Best suited for structured meeting formats with speaker labels
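
One common way to work around the input-length limit is to split a long transcript into chunks at speaker-turn boundaries and summarize each chunk separately. The sketch below shows only the splitting step. It is an assumption-laden illustration, not something this model card prescribes: `chunk_transcript` is a hypothetical helper, and it counts whitespace-separated words as a rough proxy for tokens. The T5 tokenizer usually produces more tokens than words, so the word budget should be set conservatively.

```python
def chunk_transcript(transcript: str, max_words: int = 300) -> list[str]:
    """Group speaker turns (one per line) into chunks of roughly max_words words.

    A single turn longer than max_words becomes its own oversized chunk and
    would still be truncated by the tokenizer.
    """
    chunks, current, current_len = [], [], 0
    for line in transcript.strip().splitlines():
        turn = line.strip()
        if not turn:
            continue
        n_words = len(turn.split())
        # Start a new chunk when adding this turn would exceed the budget.
        if current and current_len + n_words > max_words:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(turn)
        current_len += n_words
    if current:
        chunks.append("\n".join(current))
    return chunks


# Tiny budget chosen only to make the splitting visible.
demo = (
    "John: Good morning team.\n"
    "Sarah: Sales exceeded targets.\n"
    "Mike: The campaign worked.\n"
    "John: Let's plan for Q4."
)
chunks = chunk_transcript(demo, max_words=8)
print(len(chunks))
```

Each chunk could then be passed to `generate_summary` from the Usage section and the partial summaries concatenated. Splitting on turn boundaries keeps speaker labels attached to their utterances.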
## Citation
If you use this model, please cite:
```
@misc{meeting-summarizer-codexryu,
  author = {CodeXRyu},
  title = {Meeting Summarizer},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CodeXRyu/meeting-summarizer}
}
```