How to use yunu919/pegasus-large-dialogue-summarization with Transformers:

```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("summarization", model="yunu919/pegasus-large-dialogue-summarization")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("yunu919/pegasus-large-dialogue-summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("yunu919/pegasus-large-dialogue-summarization")
```
pegasus-large-dialogue-summarization
This model is a fine-tuned version of google/pegasus-large for English dialogue summarization.
It takes an English multi-turn dialogue as input and generates a short English summary.
Model Details
- Base model: google/pegasus-large
- Task: Dialogue Summarization
- Language: English
- Framework: Hugging Face Transformers
- Training environment: Google Colab with NVIDIA T4 GPU
Training Data
The model was trained on a dialogue summarization dataset with the following fields:
- input: dialogue
- target: summary
After cleaning, the dataset size was:
- train: 14,730
- validation: 818
- test: 819
For a quicker experiment, training and evaluation were run on subsets:
- train: 3,000
- validation: 500
- test: 500
Preprocessing
The following preprocessing steps were applied:
- removed placeholder tokens such as <file_gif>
- normalized whitespace
- removed empty rows
- removed duplicate examples
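The card does not include the cleaning code, so the following is only a minimal sketch of the four steps listed above. The placeholder pattern, the choice to keep one dialogue turn per line, and the function names `clean_text` and `clean_rows` are assumptions made for illustration.

```python
import re

# Assumption: placeholders look like <file_gif>, <file_photo>, and so on.
PLACEHOLDER_RE = re.compile(r"<file_\w+>")


def clean_text(text):
    """Remove placeholder tokens and normalize whitespace, keeping one turn per line."""
    lines = []
    for line in text.splitlines():
        line = PLACEHOLDER_RE.sub(" ", line)
        line = " ".join(line.split())  # collapse runs of spaces and tabs
        if line:  # drop lines that end up empty
            lines.append(line)
    return "\n".join(lines)


def clean_rows(rows):
    """Clean (dialogue, summary) rows, skipping empty rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        dialogue = clean_text(row.get("dialogue") or "")
        summary = clean_text(row.get("summary") or "")
        if not dialogue or not summary:
            continue  # empty row
        key = (dialogue, summary)
        if key in seen:
            continue  # duplicate example
        seen.add(key)
        cleaned.append({"dialogue": dialogue, "summary": summary})
    return cleaned
```

Deduplicating after cleaning, rather than before, also catches rows that differ only in spacing or placeholder tokens.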
Training Setup
- Max source length: 512
- Max target length: 48
- Epochs: 2
- Learning rate: 2e-5
- Beam size: 2
- Per-device train batch size: 1
- Per-device eval batch size: 1
- Gradient accumulation steps: 8
- Optimizer: Adafactor
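With a per-device batch size of 1 and 8 gradient accumulation steps, each optimizer update sees 8 examples on a single GPU. The sketch below works through that arithmetic and shows one way the settings could map onto `Seq2SeqTrainingArguments`. The mapping of "Beam size" and "Max target length" to `generation_num_beams` and `generation_max_length` is an assumption, as are the helper names; the max source length of 512 would be applied at tokenization time and is not a training argument.

```python
import math

# Values copied from the Training Setup list; the key names are assumed
# to correspond to Seq2SeqTrainingArguments fields.
HPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "optim": "adafactor",
    "predict_with_generate": True,
    "generation_max_length": 48,  # assumed mapping of "Max target length"
    "generation_num_beams": 2,    # assumed mapping of "Beam size"
}


def effective_batch_size(hparams, num_devices=1):
    """Examples contributing to each optimizer update."""
    return (
        hparams["per_device_train_batch_size"]
        * hparams["gradient_accumulation_steps"]
        * num_devices
    )


def optimizer_steps(num_examples, hparams, num_devices=1):
    """Approximate number of optimizer updates over all epochs."""
    per_epoch = math.ceil(num_examples / effective_batch_size(hparams, num_devices))
    return per_epoch * hparams["num_train_epochs"]


def build_training_args(output_dir):
    """Build the arguments object; transformers is imported lazily."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HPARAMS)
```

For the 3,000-example training subset, `optimizer_steps(3000, HPARAMS)` gives 750 updates, that is, 375 per epoch over 2 epochs.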
Results
Validation
- ROUGE-1: 44.4848
- ROUGE-2: 21.2919
- ROUGE-L: 37.0605
- ROUGE-Lsum: 40.5285
Test
- ROUGE-1: 44.5355
- ROUGE-2: 20.5233
- ROUGE-L: 37.2629
- ROUGE-Lsum: 40.2454
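To make the ROUGE-1 and ROUGE-2 numbers easier to interpret, here is a simplified n-gram overlap F-measure on a 0 to 100 scale. It is an illustration only and not the scorer used for the results above: the card does not say which ROUGE implementation was used, standard packages typically add their own tokenization and optional stemming, and the assumption that the reported values are F-measures scaled by 100 is not confirmed by the card.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N F-measure (0 to 100) using lowercase whitespace tokens."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)
```

For the reference "the cat sat" and the candidate "the cat", unigram precision is 1 and recall is 2/3, so `rouge_n` returns 80 for `n=1`.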
Example
Input
Hannah: Hey, do you have Betty's number?
Amanda: Lemme check
Amanda: Sorry, can't find it.
Amanda: Ask Larry
Amanda: He called her last time we were at the park together
Hannah: I don't know him well
Amanda: Don't be shy, he's very nice
Hannah: I'd rather you texted him
Amanda: Just text him
Hannah: Urgh.. Alright
Hannah: Bye
Amanda: Bye bye
Generated Summary
Betty's number is on Amanda's phone. Amanda will text Larry.
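A summary like the one above could be produced along the following lines. This is a sketch rather than the author's exact script: it reuses the max source length, max target length, and beam size from the training setup as generation settings, which is an assumption, and the `summarize` helper and constant names are hypothetical.

```python
MODEL_ID = "yunu919/pegasus-large-dialogue-summarization"
MAX_SOURCE_LENGTH = 512  # from Training Setup
MAX_TARGET_LENGTH = 48   # from Training Setup
NUM_BEAMS = 2            # from Training Setup


def summarize(dialogue, model_id=MODEL_ID):
    """Summarize a dialogue given as one string with one turn per line."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long dialogues to the source length used in training.
    inputs = tokenizer(
        dialogue,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_SOURCE_LENGTH,
    )
    output_ids = model.generate(
        **inputs,
        num_beams=NUM_BEAMS,
        max_length=MAX_TARGET_LENGTH,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize` downloads the model on first use. The output may not match the example summary exactly, since generation settings and library versions can change the result.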