
linydub

intentionally misspell rouge metrics as rogue for paperswithcode leaderboard

5d32c80 over 4 years ago


language:
  - en
license: apache-2.0
tags:
  - summarization
  - azureml
  - azure
  - codecarbon
  - bart
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: bart-large-samsum
    results:
      - task:
          name: Abstractive Text Summarization
          type: abstractive-text-summarization
        dataset:
          name: >-
            SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive
            Summarization
          type: samsum
        metrics:
          - name: Validation ROGUE-1
            type: rouge-1
            value: 55.0234
          - name: Validation ROGUE-2
            type: rouge-2
            value: 29.6005
          - name: Validation ROGUE-L
            type: rouge-L
            value: 44.914
          - name: Validation ROGUE-Lsum
            type: rouge-Lsum
            value: 50.464
          - name: Test ROGUE-1
            type: rouge-1
            value: 53.4345
          - name: Test ROGUE-2
            type: rouge-2
            value: 28.7445
          - name: Test ROGUE-L
            type: rouge-L
            value: 44.1848
          - name: Test ROGUE-Lsum
            type: rouge-Lsum
            value: 49.1874
widget:
  - text: >
      Henry: Hey, is Nate coming over to watch the movie tonight?

      Kevin: Yea, he said he'll be arriving a bit later at around 7 since he
      gets off of work at 6. Have you taken out the garbage yet?

      Henry: Oh I forgot. I'll do that once I'm finished with my assignment for
      my math class.

      Kevin: Yea, you should take it out as soon as possible. And also, Nate is
      bringing his girlfriend.

      Henry: Nice, I'm really looking forward to seeing them again.

`bart-large-samsum`

This model was trained using Microsoft's Azure Machine Learning Service. It was fine-tuned from the facebook/bart-large checkpoint on the SAMSum corpus.

Usage (Inference)

from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
summarizer = pipeline("summarization", model="linydub/bart-large-samsum")

input_text = '''
    Henry: Hey, is Nate coming over to watch the movie tonight?
    Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet?
    Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
    Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend.
    Henry: Nice, I'm really looking forward to seeing them again.
'''

# The pipeline returns a list with one dict per input; the summary is under "summary_text"
result = summarizer(input_text)
print(result[0]["summary_text"])

Fine-tune on AzureML

More information about the fine-tuning process (including samples and benchmarks):
[Preview] https://github.com/linydub/azureml-greenai-txtsum

Resource Usage

These results were retrieved from Azure Monitor Metrics. All experiments were run on AzureML low-priority compute clusters.

Key	Value
Region	US West 2
AzureML Compute SKU	STANDARD_ND40RS_V2
Compute SKU GPU Device	8 x NVIDIA V100 32GB (NVLink)
Compute Node Count	1
Run Duration	6m 48s
Compute Cost* (Dedicated / Low Priority)	$2.50 / $0.50 USD
Average CPU Utilization	47.9%
Average GPU Utilization	69.8%
Average GPU Memory Usage	25.71 GB
Total GPU Energy Usage	370.84 kJ

*Compute cost ($) is estimated from the run duration, the number of compute nodes used, and the SKU's hourly price. Updated SKU pricing can be found here.
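The footnote's cost formula can be written out directly. This is a minimal sketch: the helper names are hypothetical, and because the README does not list hourly prices, the script back-solves the per-hour rate implied by the reported costs instead of quoting official Azure pricing.

```python
# Hypothetical helpers; the formula follows the footnote:
#   cost ~= duration (hours) x node count x SKU price per hour


def estimate_cost(duration_s: float, node_count: int, price_per_hour: float) -> float:
    """Estimated compute cost in USD for a run lasting duration_s seconds."""
    return duration_s / 3600 * node_count * price_per_hour


def implied_hourly_price(cost: float, duration_s: float, node_count: int) -> float:
    """Invert the formula: the per-node hourly price that yields `cost`."""
    return cost / (duration_s / 3600 * node_count)


RUN_DURATION_S = 6 * 60 + 48  # "6m 48s" in the table
NODE_COUNT = 1                # "Compute Node Count" in the table

for label, reported_cost in [("dedicated", 2.50), ("low priority", 0.50)]:
    # Reported costs are rounded to the cent, so these rates are approximate
    price = implied_hourly_price(reported_cost, RUN_DURATION_S, NODE_COUNT)
    print(f"{label}: ~${price:.2f}/hour")
```

With the table's values this works out to roughly $22.06/hour dedicated and $4.41/hour low priority for the single node; actual pricing varies by region and over time.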

Carbon Emissions

These results were obtained using CodeCarbon. The carbon emissions are estimated from training runtime only (excl. setup and evaluation runtimes).

Key	Value
timestamp	2021-09-16T23:54:25
duration (s)	263.2430217266083
emissions (kg CO2eq)	0.029715544634717518
energy_consumed (kWh)	0.09985062041235725
country_name	USA
region	Washington
cloud_provider	azure
cloud_region	westus2
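The CodeCarbon figures can be turned into more familiar quantities with simple unit conversions. A minimal sketch, assuming CodeCarbon's standard units (duration in seconds, emissions in kg CO2eq, energy_consumed in kWh). Note that CodeCarbon's energy covers the training runtime only, while the Azure Monitor GPU energy in the previous table covers the whole run on GPUs alone, so the two are not directly comparable.

```python
# Values copied from the CodeCarbon table above
DURATION_S = 263.2430217266083
EMISSIONS_KG = 0.029715544634717518
ENERGY_KWH = 0.09985062041235725

KJ_PER_KWH = 3600     # 1 kWh = 3.6 MJ
J_PER_KWH = 3.6e6

energy_kj = ENERGY_KWH * KJ_PER_KWH              # ~359.5 kJ
carbon_intensity = EMISSIONS_KG / ENERGY_KWH     # kg CO2eq per kWh, ~0.298
avg_power_w = ENERGY_KWH * J_PER_KWH / DURATION_S  # joules per second, ~1365 W

print(f"energy consumed:  {energy_kj:.1f} kJ")
print(f"carbon intensity: {carbon_intensity:.3f} kg CO2eq/kWh")
print(f"average power:    {avg_power_w:.0f} W")
```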

Hyperparameters

max_source_length: 512
max_target_length: 90
fp16: True
seed: 1
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
gradient_accumulation_steps: 1
learning_rate: 5e-5
num_train_epochs: 3.0
weight_decay: 0.1
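The total_steps value reported in the results (348) can be reproduced from these hyperparameters. A minimal sketch, assuming data-parallel training across all 8 V100 GPUs of the single STANDARD_ND40RS_V2 node and the Hugging Face Trainer default of keeping each epoch's last partial batch (dataloader_drop_last=False):

```python
import math

# Hyperparameters from the list above
per_device_train_batch_size = 16
gradient_accumulation_steps = 1
num_train_epochs = 3

# Assumption: one process per GPU on the 8-GPU node
num_gpus = 8
train_samples = 14732  # "train_samples" from the results table

effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
# The final, smaller batch of each epoch still counts as an optimizer step
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 128 116 348
```

An effective batch of 128 gives 116 steps per epoch (115 full batches plus one partial batch of 12 samples), which is consistent with the reported 348 steps over 3 epochs.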

Results

eval_* scores are computed on the SAMSum validation split and predict_* scores on the test split.

ROUGE	Score
eval_rouge1	55.0234
eval_rouge2	29.6005
eval_rougeL	44.914
eval_rougeLsum	50.464
predict_rouge1	53.4345
predict_rouge2	28.7445
predict_rougeL	44.1848
predict_rougeLsum	49.1874
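To give a feel for what ROUGE-1 measures, here is a simplified sketch of unigram-overlap F1 for a single reference/prediction pair. It is an illustration only, not the scorer used to produce the table: it skips stemming and corpus-level aggregation, and its tokenizer only roughly mirrors the `rouge_score` package's default. The tabled scores appear to be F1 values scaled to 0 to 100.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters; no stemming
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(reference: str, prediction: str) -> float:
    """Unigram-overlap F1 between one reference and one prediction."""
    ref_counts = Counter(tokenize(reference))
    pred_counts = Counter(tokenize(prediction))
    # Counter intersection keeps the minimum count per token (clipped matches)
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative summaries, not actual model outputs
reference = "Nate is coming at 7 with his girlfriend."
prediction = "Nate will come at 7 with his girlfriend."
print(round(100 * rouge1_f1(reference, prediction), 2))  # 75.0
```

Six of the eight unigrams match in each direction, so precision and recall are both 0.75. ROUGE-2 applies the same idea to bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.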

Metric	Value
epoch	3.0
eval_gen_len	30.6027
eval_loss	1.4327096939086914
eval_runtime	22.9127
eval_samples	818
eval_samples_per_second	35.701
eval_steps_per_second	0.306
predict_gen_len	30.4835
predict_loss	1.4501988887786865
predict_runtime	26.0269
predict_samples	819
predict_samples_per_second	31.467
predict_steps_per_second	0.269
train_loss	1.2014821151207233
train_runtime	263.3678
train_samples	14732
train_samples_per_second	167.811
train_steps_per_second	1.321
total_steps	348
total_flops	4.26008990669865e+16
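The throughput rows follow from the sample counts, step counts, and runtimes in the same table. A minimal consistency check, assuming runtimes are in seconds and that evaluation and prediction used per_device_eval_batch_size=16 on all 8 GPUs with the last partial batch kept (the README does not state the evaluation setup explicitly):

```python
import math


def per_second(count: float, runtime_s: float) -> float:
    """Rate rounded to 3 decimals, matching the table's precision."""
    return round(count / runtime_s, 3)


# Training: every epoch passes over all 14732 samples (3 epochs, 348 steps)
train_sps = per_second(14732 * 3, 263.3678)
train_steps_ps = per_second(348, 263.3678)
print("train", train_sps, train_steps_ps)

# Assumption: 16 samples per device x 8 GPUs = 128 samples per eval step
EVAL_BATCH = 16 * 8
derived = {}
for name, samples, runtime_s in [("eval", 818, 22.9127), ("predict", 819, 26.0269)]:
    steps = math.ceil(samples / EVAL_BATCH)
    derived[name] = (steps, per_second(samples, runtime_s), per_second(steps, runtime_s))
    print(name, *derived[name])
```

The derived rates match the table (167.811 and 1.321 for training; 35.701 and 0.306 for eval; 31.467 and 0.269 for predict), with 7 steps for each of the validation and test splits.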