---
license: apache-2.0
datasets: billsum
tags:
- summarization
model-index:
- name: d0r1h/LEDBill
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: billsum
      type: billsum
      config: default
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 38.6502
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 18.5458
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 25.6561
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 33.1575
      verified: true
    - name: loss
      type: loss
      value: 2.1305277347564697
      verified: true
    - name: gen_len
      type: gen_len
      value: 288.372
      verified: true
---

# Longformer Encoder-Decoder (LED) fine-tuned on Billsum



This model is a fine-tuned version of [led-base-16384](https://huggingface.co/allenai/led-base-16384) on the [billsum](https://huggingface.co/datasets/billsum) dataset.

As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, *led-base-16384* was initialized from [*bart-base*](https://huggingface.co/facebook/bart-base), since both models share the exact same architecture. To process up to 16K tokens, *bart-base*'s position embedding matrix was simply copied 16 times.
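The position-embedding trick described above can be sketched as follows. This is an illustrative toy with random weights, not the actual conversion script; the dimensions (1024 positions, hidden size 768) are *bart-base*'s:

```python
import torch

# Toy illustration of the LED initialization trick: tile bart-base's
# learned position-embedding matrix 16 times to cover 16K positions.
max_pos_bart, d_model = 1024, 768                   # bart-base dimensions
bart_pos_emb = torch.randn(max_pos_bart, d_model)   # stand-in for the learned weights

led_pos_emb = bart_pos_emb.repeat(16, 1)            # shape: (16384, 768)
print(led_pos_emb.shape)
```

Each 1024-position slice of the resulting matrix is an identical copy, so the model starts from sensible position representations everywhere in the 16K window and refines them during fine-tuning.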


## How to use

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("d0r1h/LEDBill")
model = AutoModelForSeq2SeqLM.from_pretrained("d0r1h/LEDBill", return_dict_in_generate=True).to(device)

case = "......."

input_ids = tokenizer(case, return_tensors="pt").input_ids.to(device)

# LED needs a global attention mask; here only the first token attends globally
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

sequences = model.generate(input_ids,
                           global_attention_mask=global_attention_mask).sequences
summary = tokenizer.batch_decode(sequences,
                                 skip_special_tokens=True)
```


## Evaluation results

When the model is used to summarize Billsum documents (a 10-sample subset), it achieves the following results:

| Model |            rouge1-f  |  rouge1-p  |  rouge2-f   |  rouge2-p  | rougeL-f   |  rougeL-p  |
|:-----------:|:-----:|:-----:|:------:|:-----:|:------:|:-----:|
| LEDBill           | **34** |  **37** | **15**  | **16** | **30**  | **32** | 
| led-base          | 2     |  15     | 0      | 0     | 2      | 15 |
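For reference, the f/p columns above are ROUGE F-measure and precision. A minimal pure-Python sketch of ROUGE-1 F-measure (unigram overlap; real ROUGE implementations add stemming and other normalization, so this is not the exact scoring script used above):

```python
from collections import Counter

def rouge1_f(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f("the bill amends the code",
                     "the bill amends the tax code"), 3))  # ≈ 0.909
```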

[This notebook](https://colab.research.google.com/drive/1iEEFbWeTGUSDesmxHIU2QDsPQM85Ka1K?usp=sharing) shows how *led* can effectively be used for downstream tasks such as summarization.