---
license: mit
language: en
---

# BART-SLED (SLiding-Encoder and Decoder, base-sized model)

SLED models use pretrained, short-range encoder-decoder models and apply them to
long-text inputs by splitting the input into multiple overlapping chunks, encoding each chunk independently, and fusing the encoded chunks in the decoder (fusion-in-decoder).
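
As a rough illustration of the chunking idea (not the library's actual implementation; `chunk_size` and `stride` are hypothetical values), the encoder input might be split like this:

```python
def split_into_overlapping_chunks(token_ids, chunk_size=256, stride=128):
    """Illustrative only: consecutive chunks overlap by (chunk_size - stride) tokens."""
    chunks, start = [], 0
    while True:
        chunks.append(token_ids[start:start + chunk_size])
        if start + chunk_size >= len(token_ids):
            break
        start += stride
    return chunks
```

Each chunk is then encoded independently by the short-range encoder, and the decoder attends over the concatenated chunk encodings.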

## Model description

This SLED model is based on the BART model, which is described in its [model card](https://huggingface.co/facebook/bart-base).
BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works
well for comprehension tasks (e.g. text classification, question answering). When used as a BART-SLED model, it can be applied to long-text tasks.

## Intended uses & limitations

You can use the raw model for text infilling. However, the model is mostly meant to be fine-tuned on a supervised dataset.

### How to use
To use the model, you first have to get a local copy of the SLED code from the [official repository](https://github.com/Mivg/SLED/blob/main/README.md).

Here is how to use this model in PyTorch:

```python
from sled import SledTokenizer, SledModel

tokenizer = SledTokenizer.from_pretrained('tau/bart-base-sled')
model = SledModel.from_pretrained('tau/bart-base-sled')

# SLED transparently handles inputs longer than the underlying BART model's limit.
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
```
You can also replace `SledModel` with `SledModelForConditionalGeneration` for seq2seq generation, as in the sketch below.
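
For example (a minimal sketch; the generation arguments follow the standard Hugging Face `generate` API and may need tuning for your task):

```python
from sled import SledTokenizer, SledModelForConditionalGeneration

tokenizer = SledTokenizer.from_pretrained('tau/bart-base-sled')
model = SledModelForConditionalGeneration.from_pretrained('tau/bart-base-sled')

long_document = "Replace this with a long document, e.g. an article to summarize."
inputs = tokenizer(long_document, return_tensors="pt")

# Standard seq2seq generation; the input may be far longer than BART's usual 1024-token limit.
generated_ids = model.generate(**inputs, num_beams=4, max_new_tokens=128)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```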

In case you wish to apply SLED to a task containing a prefix (e.g. a question) that should be given as context to
every chunk, you can also pass the `prefix_length` tensor input (a `LongTensor` with one entry per example in the batch), as sketched below.
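
A minimal sketch (it assumes the prefix is simply concatenated in front of the document and that the prefix length can be estimated by tokenizing the prefix alone; the exact preprocessing for your dataset may differ):

```python
import torch
from sled import SledTokenizer, SledModel

tokenizer = SledTokenizer.from_pretrained('tau/bart-base-sled')
model = SledModel.from_pretrained('tau/bart-base-sled')

question = "What is the main finding of the paper?"
document = "A very long document ..."

# prefix_length tells SLED how many leading tokens form the prefix, so that the
# prefix can be prepended as context to every chunk. Counting tokens this way is
# an approximation, since special tokens may shift the exact count.
inputs = tokenizer(question + " " + document, return_tensors="pt")
prefix_length = torch.LongTensor([len(tokenizer(question).input_ids)])

outputs = model(**inputs, prefix_length=prefix_length)
```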

SLED is fully compatible with the AutoClasses (AutoTokenizer, AutoConfig, AutoModel
and AutoModelForCausalLM) and can be loaded using the `from_pretrained` methods.
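
For example (a minimal sketch; it assumes that importing the `sled` package is what registers the SLED classes with the Auto API):

```python
from transformers import AutoTokenizer, AutoModel
import sled  # assumption: importing the package registers the SLED classes with the AutoClasses

tokenizer = AutoTokenizer.from_pretrained('tau/bart-base-sled')
model = AutoModel.from_pretrained('tau/bart-base-sled')
```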

### BibTeX entry and citation info

Please cite both the SLED [paper](https://arxiv.org/abs/2208.00748) and the BART [paper](https://arxiv.org/abs/1910.13461) by Lewis et al.

```bibtex
@inproceedings{Ivgi2022EfficientLU,
  title={Efficient Long-Text Understanding with Short-Text Models},
  author={Maor Ivgi and Uri Shaham and Jonathan Berant},
  year={2022}
}
```

```bibtex
@article{DBLP:journals/corr/abs-1910-13461,
  author     = {Mike Lewis and
                Yinhan Liu and
                Naman Goyal and
                Marjan Ghazvininejad and
                Abdelrahman Mohamed and
                Omer Levy and
                Veselin Stoyanov and
                Luke Zettlemoyer},
  title      = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
                Generation, Translation, and Comprehension},
  journal    = {CoRR},
  volume     = {abs/1910.13461},
  year       = {2019},
  url        = {http://arxiv.org/abs/1910.13461},
  eprinttype = {arXiv},
  eprint     = {1910.13461},
  timestamp  = {Thu, 31 Oct 2019 14:02:26 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```