Commit 600de3b (verified) · pszemraj, with autoevaluator (HF Staff) and SFconvertbot

Super-squash branch 'main' using huggingface_hub

Co-authored-by: autoevaluator <autoevaluator@users.noreply.huggingface.co>
Co-authored-by: SFconvertbot <SFconvertbot@users.noreply.huggingface.co>
.gitattributes ADDED
@@ -0,0 +1,28 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1 @@
checkpoint-*/
README.md ADDED
@@ -0,0 +1,518 @@
---
language:
- en
license:
- apache-2.0
- bsd-3-clause
tags:
- summarization
- led
- summary
- longformer
- booksum
- long-document
- long-form
datasets:
- kmfoda/booksum
metrics:
- rouge
widget:
- text: large earthquakes along a given fault segment do not occur at random intervals
    because it takes time to accumulate the strain energy for the rupture. The rates
    at which tectonic plates move and accumulate strain at their boundaries are approximately
    uniform. Therefore, in first approximation, one may expect that large ruptures
    of the same fault segment will occur at approximately constant time intervals.
    If subsequent main shocks have different amounts of slip across the fault, then
    the recurrence time may vary, and the basic idea of periodic mainshocks must be
    modified. For great plate boundary ruptures the length and slip often vary by
    a factor of 2. Along the southern segment of the San Andreas fault the recurrence
    interval is 145 years with variations of several decades. The smaller the standard
    deviation of the average recurrence interval, the more specific could be the long
    term prediction of a future mainshock.
  example_title: earthquakes
- text: ' A typical feed-forward neural field algorithm. Spatiotemporal coordinates
    are fed into a neural network that predicts values in the reconstructed domain.
    Then, this domain is mapped to the sensor domain where sensor measurements are
    available as supervision. Class and Section Problems Addressed Generalization
    (Section 2) Inverse problems, ill-posed problems, editability; symmetries. Hybrid
    Representations (Section 3) Computation & memory efficiency, representation capacity,
    editability: Forward Maps (Section 4) Inverse problems Network Architecture (Section
    5) Spectral bias, integration & derivatives. Manipulating Neural Fields (Section
    6) Edit ability, constraints, regularization. Table 2: The five classes of techniques
    in the neural field toolbox each addresses problems that arise in learning, inference,
    and control. (Section 3). We can supervise reconstruction via differentiable forward
    maps that transform Or project our domain (e.g, 3D reconstruction via 2D images;
    Section 4) With appropriate network architecture choices, we can overcome neural
    network spectral biases (blurriness) and efficiently compute derivatives and integrals
    (Section 5). Finally, we can manipulate neural fields to add constraints and regularizations,
    and to achieve editable representations (Section 6). Collectively, these classes
    constitute a ''toolbox'' of techniques to help solve problems with neural fields
    There are three components in a conditional neural field: (1) An encoder or inference
    function € that outputs the conditioning latent variable 2 given an observation
    0 E(0) =2. 2 is typically a low-dimensional vector, and is often referred to aS
    a latent code Or feature code_ (2) A mapping function 4 between Z and neural field
    parameters O: Y(z) = O; (3) The neural field itself $. The encoder € finds the
    most probable z given the observations O: argmaxz P(2/0). The decoder maximizes
    the inverse conditional probability to find the most probable 0 given Z: arg-
    max P(Olz). We discuss different encoding schemes with different optimality guarantees
    (Section 2.1.1), both global and local conditioning (Section 2.1.2), and different
    mapping functions Y (Section 2.1.3) 2. Generalization Suppose we wish to estimate
    a plausible 3D surface shape given a partial or noisy point cloud. We need a suitable
    prior over the sur- face in its reconstruction domain to generalize to the partial
    observations. A neural network expresses a prior via the function space of its
    architecture and parameters 0, and generalization is influenced by the inductive
    bias of this function space (Section 5).'
  example_title: scientific paper
- text: ' the big variety of data coming from diverse sources is one of the key properties
    of the big data phenomenon. It is, therefore, beneficial to understand how data
    is generated in various environments and scenarios, before looking at what should
    be done with this data and how to design the best possible architecture to accomplish
    this The evolution of IT architectures, described in Chapter 2, means that the
    data is no longer processed by a few big monolith systems, but rather by a group
    of services In parallel to the processing layer, the underlying data storage has
    also changed and became more distributed This, in turn, required a significant
    paradigm shift as the traditional approach to transactions (ACID) could no longer
    be supported. On top of this, cloud computing is becoming a major approach with
    the benefits of reducing costs and providing on-demand scalability but at the
    same time introducing concerns about privacy, data ownership, etc In the meantime
    the Internet continues its exponential growth: Every day both structured and unstructured
    data is published and available for processing: To achieve competitive advantage
    companies have to relate their corporate resources to external services, e.g.
    financial markets, weather forecasts, social media, etc While several of the sites
    provide some sort of API to access the data in a more orderly fashion; countless
    sources require advanced web mining and Natural Language Processing (NLP) processing
    techniques: Advances in science push researchers to construct new instruments
    for observing the universe O conducting experiments to understand even better
    the laws of physics and other domains. Every year humans have at their disposal
    new telescopes, space probes, particle accelerators, etc These instruments generate
    huge streams of data, which need to be stored and analyzed. The constant drive
    for efficiency in the industry motivates the introduction of new automation techniques
    and process optimization: This could not be done without analyzing the precise
    data that describe these processes. As more and more human tasks are automated,
    machines provide rich data sets, which can be analyzed in real-time to drive efficiency
    to new levels. Finally, it is now evident that the growth of the Internet of Things
    is becoming a major source of data. More and more of the devices are equipped
    with significant computational power and can generate a continuous data stream
    from their sensors. In the subsequent sections of this chapter, we will look at
    the domains described above to see what they generate in terms of data sets. We
    will compare the volumes but will also look at what is characteristic and important
    from their respective points of view. 3.1 The Internet is undoubtedly the largest
    database ever created by humans. While several well described; cleaned, and structured
    data sets have been made available through this medium, most of the resources
    are of an ambiguous, unstructured, incomplete or even erroneous nature. Still,
    several examples in the areas such as opinion mining, social media analysis, e-governance,
    etc, clearly show the potential lying in these resources. Those who can successfully
    mine and interpret the Internet data can gain unique insight and competitive advantage
    in their business An important area of data analytics on the edge of corporate
    IT and the Internet is Web Analytics.'
  example_title: data science textbook
- text: 'Transformer-based models have shown to be very useful for many NLP tasks.
    However, a major limitation of transformers-based models is its O(n^2)O(n 2) time
    & memory complexity (where nn is sequence length). Hence, it''s computationally
    very expensive to apply transformer-based models on long sequences n > 512n>512.
    Several recent papers, e.g. Longformer, Performer, Reformer, Clustered attention
    try to remedy this problem by approximating the full attention matrix. You can
    checkout 🤗''s recent blog post in case you are unfamiliar with these models.

    BigBird (introduced in paper) is one of such recent models to address this issue.
    BigBird relies on block sparse attention instead of normal attention (i.e. BERT''s
    attention) and can handle sequences up to a length of 4096 at a much lower computational
    cost compared to BERT. It has achieved SOTA on various tasks involving very long
    sequences such as long documents summarization, question-answering with long contexts.

    BigBird RoBERTa-like model is now available in 🤗Transformers. The goal of this
    post is to give the reader an in-depth understanding of big bird implementation
    & ease one''s life in using BigBird with 🤗Transformers. But, before going into
    more depth, it is important to remember that the BigBird''s attention is an approximation
    of BERT''s full attention and therefore does not strive to be better than BERT''s
    full attention, but rather to be more efficient. It simply allows to apply transformer-based
    models to much longer sequences since BERT''s quadratic memory requirement quickly
    becomes unbearable. Simply put, if we would have ∞ compute & ∞ time, BERT''s attention
    would be preferred over block sparse attention (which we are going to discuss
    in this post).

    If you wonder why we need more compute when working with longer sequences, this
    blog post is just right for you!

    Some of the main questions one might have when working with standard BERT-like
    attention include:

    Do all tokens really have to attend to all other tokens? Why not compute attention
    only over important tokens? How to decide what tokens are important? How to attend
    to just a few tokens in a very efficient way? In this blog post, we will try to
    answer those questions.

    What tokens should be attended to? We will give a practical example of how attention
    works by considering the sentence ''BigBird is now available in HuggingFace for
    extractive question answering''. In BERT-like attention, every word would simply
    attend to all other tokens.

    Let''s think about a sensible choice of key tokens that a queried token actually
    only should attend to by writing some pseudo-code. Will will assume that the token
    available is queried and build a sensible list of key tokens to attend to.

    >>> # let''s consider following sentence as an example >>> example = [''BigBird'',
    ''is'', ''now'', ''available'', ''in'', ''HuggingFace'', ''for'', ''extractive'',
    ''question'', ''answering'']

    >>> # further let''s assume, we''re trying to understand the representation of
    ''available'' i.e. >>> query_token = ''available'' >>> # We will initialize an
    empty `set` and fill up the tokens of our interest as we proceed in this section.
    >>> key_tokens = [] # => currently ''available'' token doesn''t have anything
    to attend Nearby tokens should be important because, in a sentence (sequence of
    words), the current word is highly dependent on neighboring past & future tokens.
    This intuition is the idea behind the concept of sliding attention.'
  example_title: bigbird blog intro
- text: 'The majority of available text summarization datasets include short-form
    source documents that lack long-range causal and temporal dependencies, and often
    contain strong layout and stylistic biases. While relevant, such datasets will
    offer limited challenges for future generations of text summarization systems.
    We address these issues by introducing BookSum, a collection of datasets for long-form
    narrative summarization. Our dataset covers source documents from the literature
    domain, such as novels, plays and stories, and includes highly abstractive, human
    written summaries on three levels of granularity of increasing difficulty: paragraph-,
    chapter-, and book-level. The domain and structure of our dataset poses a unique
    set of challenges for summarization systems, which include: processing very long
    documents, non-trivial causal and temporal dependencies, and rich discourse structures.
    To facilitate future work, we trained and evaluated multiple extractive and abstractive
    summarization models as baselines for our dataset.'
  example_title: BookSum Abstract
inference:
  parameters:
    max_length: 64
    min_length: 8
    no_repeat_ngram_size: 3
    early_stopping: true
    repetition_penalty: 3.5
    length_penalty: 0.3
    encoder_no_repeat_ngram_size: 3
    num_beams: 4
model-index:
- name: pszemraj/led-large-book-summary
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: kmfoda/booksum
      type: kmfoda/booksum
      config: kmfoda--booksum
      split: test
    metrics:
    - type: rouge
      value: 31.7308
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjJmZjMxYTY0OGU3MzNjNmIzNmYyODNlNDg2ZGRhZDAzNTMwMDM5YWMxODc1OTc1ZWE3MzM2OTg1ODFhZDBkNCIsInZlcnNpb24iOjF9.B8BCKgySYVZW910_1zP0LfCpQYJbAe6loyWut76JlgZb2kV1_x9ybqtNESX0ka-lNqhYyXUNDpuS-7pTmsJVDg
    - type: rouge
      value: 5.3311
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzViMmY4ODFjYTc5ODk5MmRhMDQ3ZDRiYWQwMDg0OTk3ZTA4NDAxYTNiNDgyMmI4NDA3ZDMwYWViOTBkODBjNyIsInZlcnNpb24iOjF9.MOhJLDcgvv93mVFL1igIgIiTAH3b2Xa4gmBObq7RF44Mmu8Kxtd1KP7rOlDVFOrtrsooGPGsyE1GMCQ2kqeMDg
    - type: rouge
      value: 16.1465
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzNjMzEwMTliZGE3ZmQ4M2UxMDAyMTY3YzJjZmMyMDYyN2YyNDM0N2VhNzI1MDc1YTg4MTRjMmEzNjVkNTk1NCIsInZlcnNpb24iOjF9.XLJ-DVKiYLlbw5E5rWADKbzUzf5fNHhlTCWPCC5dU4NI9Yeh76aR7TPt36ZzLDwTBknnR8KHqlaF8F8YAvBUAg
    - type: rouge
      value: 29.0883
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTcwNzEwMmE5NjQxZTkzYmQyZDZmNzllYzYyNGI5OTMyNWMwNjdiM2I2YmM5YjdmY2E5OWQ3OTk3ZDA1MTc3YyIsInZlcnNpb24iOjF9.d6rFxjCB6RJNI_pn2DNNSjuZe4rdvj0RatkaTJRp5lP0F_AFfU5Zn9zRWzZJV7V-xMauIc4UhfdoLp9r_-CABA
    - type: loss
      value: 4.815707206726074
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTMwMTgxMmJkODY3MjkzOWJhMzJhOTIxMWVkODhjZmM0MWUzMWQ1N2JkZjRhOTQxNmU1YWVjYzQ0MDNlZWI3OSIsInZlcnNpb24iOjF9.mkBQHYhYFfDV6F4klXGJ1dSsF-pbCs-6F9zcw6IYznwmXUjtk7m5J4Zt4JAju5LKz4YizvEcUCl_L0WddnfvDA
    - type: gen_len
      value: 154.9036
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTc0ZmM1ZDM4MDE0MzY3MDM3OWJhNDkzZjJkZDdkMjU5M2JmMDJjYTIxODA1OTllNmY5ZWQzZDlmNWFiYzk4NiIsInZlcnNpb24iOjF9.VQ_O_xSTz870tnM08PJXQOwg9OsNNwI_HVX4S7AuW57_FzGGyRaWSuGE5SWzRS4Tur9YP0QxV4VV0Yoaoi3IAA
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: test
    metrics:
    - type: rouge
      value: 33.4484
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTk4Yjg1YTc4YmY0MzBiZDU4ZjFhNzI4MjZkMWU1MzBlOWNlMjQ5ODMzY2YzYzRhYjJkMGUzNmI3ZjdkMzIzZSIsInZlcnNpb24iOjF9.AqS8A1OUiM0IZFBEGirv5F3Novk8lSUYSfPc3bYWLA6t-W7wgup3qA207eGbE5j9CkDWZ7QrSG1U6Z9A0sOqAA
    - type: rouge
      value: 10.4249
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2U4NjUyNTFmOGM5OTlhZDMyMTlmM2E4OWI2NGFiMDAyMGJjMzRjNWNlMGEyYWFmNTE5ZWMxM2I0ZGZmNWNmOCIsInZlcnNpb24iOjF9.SgJcHJ4qoRWXFvFiwv1PUutWktvsxQNynVPEv-GtBgxd6WI7o561ONyco5U-5tcyE_1SbSCJzz-L-R-q3cvoDA
    - type: rouge
      value: 24.5802
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmQ5MDI5MzdiNGE5NDM0MmU5OThmZTBkNjkxMzg5N2IxNGVlODdhZTZhNjg3NzFjYWEyMzA3MTQxNjMyMjRkOCIsInZlcnNpb24iOjF9.Bg5dHqCcJjmxa-xGWNR5lD9g3quX7lKkH0pjiTd2xE5WiPoLLN2c0mYa2GovdW7__WnYwhhHC7es03jmvyZbCw
    - type: rouge
      value: 29.8226
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGFhOTEwNGM1MmZkNDk2ZjQ1Y2MyNjM3MGI5MGY3MWVkM2I0MjU2NWFiYmEwMjE4MTJlZWIwOGQ2MjQ3YjgzYSIsInZlcnNpb24iOjF9.W_aQKs10oXQdKEczJBGM3iiwJgb-VaXTpyA3sGof5WbhHf9vITAQA-xvynh5LgKtXQ1zjx737hnHgjEsu_Y0Cw
    - type: loss
      value: 4.176078796386719
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2JhODQ5YTZkNDZkZGYyNGU2MzkxMWU5MTEwMGM2YmVjZTA5YzI5NTMxMDNhYjhlOTAxMzFiMDYwYmM0MjEzZCIsInZlcnNpb24iOjF9.OvZrPBOR5jhkoTGBgsInkH7j3_xpacXHDoT7UIXEnyXzadfBO-O-K6fjalLNZw8wSkbjHIFcL_6S_qTTxPsNAQ
    - type: gen_len
      value: 65.4005
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2NhYjc3ZjQzNDEwYmMzOTM0ODkyZTJhZWNhNzZhYmEyZTYxMzA2YTYzMWFjOTA5ZjlhYWMzODg3NzY1ZTUwYSIsInZlcnNpb24iOjF9.vk9bgmtQFeRwdY3VXjtrJr_5wUCIeoAkI3kO0cHxhxmJo6RvUnyXiut72FuB-mlLZvqgiNkaZ-u_bh0Z3DjuCw
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: billsum
      type: billsum
      config: default
      split: test
    metrics:
    - type: rouge
      value: 40.5843
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTVjMDkyMWZjYTQ0NzgzNGUxZjNiMTg3NjU1MWJlNTQ2MWQ1NjE1MDk1OTU4ZjJiNGQ5ODg3Y2VlMWUyMzllNyIsInZlcnNpb24iOjF9.OhqBcVIuHk7fzmdrsWMvUe1bLeVMZVstZUoZpP7C1vR-3aIDl7r6eBmPrt5w-KcNq5p4teNPBsq7oKzbd5ZgDQ
    - type: rouge
      value: 17.3401
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGQxYmQzMmE0OTcyNTM5NmMwNjIxNzYxZDcwMDFkYzJkOWY4YWY3NTdhZGRhZDdlMDAxNzcwODQ5OGM3Mzc1MCIsInZlcnNpb24iOjF9.Pksn25EEqvmx757N7Swrd4yXc_xU7-AMN9yNe8lrbBa-l1LoI_2PUASvnjML4f705cfuyMAfb0FkFp5WfER2AA
    - type: rouge
      value: 25.1256
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjhjYzI5MDBiMjk2NTY3MDNmZTdiOGYwMTRlYjIwZjAwMjdlNTAyYzdhYTJlODQ4MjYzYmQ3MjRlYTA2YzhhZSIsInZlcnNpb24iOjF9.1jPepsweS2bzIqDverQzzhmhFGch7gpoEGFGqQ8zW7K10aUKWFX8lt-uZAmTa1Z5ZhzyXGBzc3dReFPhWRRJBg
    - type: rouge
      value: 34.6619
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2VkZDIxNWJjOTA0NzFjOTIwOTdjYjc1M2EyNDVjZjY2ZjY3MjIxNDk3YTc5YWExNzAwN2FhOTc1NjVhYjBkYiIsInZlcnNpb24iOjF9.8opqHSUckPohoSF9jfPTpXDz2AtDwvdMqOdIXx2kE1tkOcbLPbOBfcc8RhRR98y8S26yC6EYFhFnf03CV2ejAQ
    - type: loss
      value: 4.792657375335693
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTY5ZTRkMGU3OGVkODMzMDU5OWE1NTM5YjA4NDliZDlmNzc2NzZjNjFmNTA3M2EwY2NmN2E0MWJmZjQ5ZDliMiIsInZlcnNpb24iOjF9.KCKdk8xt2NWcMmYKV3-9eVEsFm9MqGllSMu9QCFJFIQlnyNXllHKdBLouoaGQz8IRYXvZKH8_TLDPIQx-31jAg
    - type: gen_len
      value: 163.9394
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzdkZDYyZGUzYmFkZmI2NjUwYmQ0MzZjMmIyZjI1YTFiMzM4OThiZjBiMzljOTVkZTgwMjA0NTE5OGM2YmFjMiIsInZlcnNpb24iOjF9.XyMZLUdkUIF32KTJMuv_bJswQCx_Tfg4Fx823cURUixSeoIKps8_a634AreZ3Z8kb7bfE_sFGh3rM9KWsMxlDw
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: multi_news
      type: multi_news
      config: default
      split: test
    metrics:
    - type: rouge
      value: 39.0834
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjYzMmVlMDM4MTNkMTI4MjAyMTU2YTg1ZWQwNTI1MmJlNGUwZmE1NTRmYTljZTQwY2RlMjcxOTgyZGMyYTc0ZiIsInZlcnNpb24iOjF9.6yuSr7UmsFatwqQ-mEO4gmsEtWI05kGB5Ib2pnl05H1OiPT2uUwmqdUytUw8KTx9u1jv9q0cTF1cL-n2kPEJAA
    - type: rouge
      value: 11.4043
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWI5N2U2ZWI1ODM2MWUwOTIzYTAzNmRhNDA2OWEzZWRjMGEzMjBmY2EwN2YyYzU1NWE0YjIyZDE3MWE0MmMxZCIsInZlcnNpb24iOjF9.wonuxbBl25TzEaHUH_E816nHJ1OSXKfkaq7eJzbLpsfeGwcDklxUSxZxRO7VBiBMaY3Qttf9ywmEIPp40HnpBA
    - type: rouge
      value: 19.1813
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjU1NDZhN2NkMzZiZGJkODE4NDZiYjViOTZkNGMyNDlkNjBlZmFjYzU1N2IzMjFjYjY1MDU1Zjk2MzA0M2U4NyIsInZlcnNpb24iOjF9.bTCRzv3J9NiCh4aV23tAWGTvrdQCv_RS40zGwC4AJXtGS40cY7tJHYwBf9U9_rCetDBxqfjJpdaUbCAOglxLAA
    - type: rouge
      value: 35.1581
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDNhNTUyZjE4NjYxYjIzYThmMDM2YWNhM2QwYzY1ODI2ZTE3NmNjMmVhOTAzZjZlOWQwYzc1NzU2NDNjNzIxMyIsInZlcnNpb24iOjF9.cWlSbEBgrMN5D-fV_yL9geNMyMkIItcVO3wehNJPzFi3E0v1-4q8pnX-UgjLzto8X7JLi6as2V_HtZE4-C-CDw
    - type: loss
      value: 4.654905319213867
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTc5Nzk0ODhiNWUzNTAxNzk2YzZmMjU2NDliY2UzOTYyYTdmZGEyYjI5NDNhOTE0MGUxOTgxMGVjMmNhM2UyMSIsInZlcnNpb24iOjF9.eBBAebcl3AwkrjR6a8BvoSjDfpw8LWTRFjyIFHVzspvoOKVfnO8_NB_UeR_K127OwXyoZ70Z7X_aKJOe-2kTDA
    - type: gen_len
      value: 186.2494
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWI2NjVlYjgwYWJiMjcyMDUzMzEwNDNjZTMxMDM0MjAzMzk1ZmIwY2Q1ZDQ2Y2M5NDBlMDEzYzFkNWEyNzJmNiIsInZlcnNpb24iOjF9.iZ1Iy7FuWL4GH7LS5EylVj5eZRC3L2ZsbYQapAkMNzR_VXPoMGvoM69Hp-kU7gW55tmz2V4Qxhvoz9cM8fciBA
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: cnn_dailymail
      type: cnn_dailymail
      config: 3.0.0
      split: test
    metrics:
    - type: rouge
      value: 32.8774
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYWVlNjQzNWU1NTgyNTk2MzdhMDkyM2U3N2UzYzQ3ODJmOTJiMGViZDc0NzNiNDlmZGZmNTQzZmNjYTFjMzJmMCIsInZlcnNpb24iOjF9.qA54KJrGf79XCLnDrAMPp0saErVL_zKicLso9ZX2xxNdCANGExal5PFmmTT7aw7TUdkmUsNhmIRI9cBZ8J_1BA
    - type: rouge
      value: 13.3706
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDMzZWVjZmQ4ZWI2MWZmMGEzNjJhY2JmZjJhZTYwMTk2OTM2ODhlMmFmYmMxZGUyZWQzMmUxYzA0ZjJiMjcwYiIsInZlcnNpb24iOjF9.03Di-BfbZoWAVqRJc3x37Tn1Ae6vtZWymZL2w1ob8OQ8iOggYwmDmNQwv-bCXjT7fLjXYvh9uTndYsL05nj_Ag
    - type: rouge
      value: 20.4365
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjI5YzdjZmM0YmZjYTU0OTg3ZTRjZWZkYTU2NzhlZjkwNGE2YmUzYzI1OThjMDUxOTcyNzk3ZTUyNmIzMWYzZCIsInZlcnNpb24iOjF9.LDg9lCKTh74kilxRBpunGSeOXJohaICXWjNf525ck-1h21AtjIQB8U7BTm80eyNRe7yIQpAlgOruCAxRqpTHDw
    - type: rouge
      value: 30.4408
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTZhMGJjMzg0MzQxY2U2ZTIzYTYzOGRhMGEyYjY1ZjQyZjNmNGIwMzFjOWJjNzU2NWQzMzc1Y2IxYWZkZGY5YyIsInZlcnNpb24iOjF9.LkvaIEsw0U-osBR--46f7rsF-s1fcu19Z22DkvwiMwWJj9AnsUwDWNcCecIyi5tziQpUx0PpZEKyXAhCrVx1Bw
    - type: loss
      value: 5.3488945960998535
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTc4Y2JlZWRlNDRkOTI4ODQyZjBlMjU5NmUyZTZmNzJjYTg0NjM1YzI4NzUzYjhmODBkY2U4NGJiMTlhYTc2ZiIsInZlcnNpb24iOjF9.CB6oO5j3cKJPOelM8pwT2lTenp5bZTkBFC5MPYW_nus-O5F1s4DaY-gdSUK3baTkMXbQ2yqaI_g_QAfNVmqhDQ
    - type: gen_len
      value: 181.8326
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOThmMGNlMGEwYjljMmNiZjdkMjc5NzZhNTYwMzAzOWFkYzA1NzZiNTIyN2IxNDJmOTk4MDliYzY2YjdjNGY4MSIsInZlcnNpb24iOjF9._buvRpxKLuKNNtOmALbFm3-nWCs2NCLh1l8gfVqDmKmv8JqJHQ27cdgZ4mklPLYOUhf6YWjby5_lp3ZGEctkCQ
---
# led-large-book-summary

<a href="https://colab.research.google.com/gist/pszemraj/3eba944ddc9fc9a4a1bfb21e83b57620/summarization-token-batching.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This model is a fine-tuned version of [allenai/led-large-16384](https://huggingface.co/allenai/led-large-16384) on the `BookSum` dataset (`kmfoda/booksum`). It aims to generalize well and be useful for summarizing lengthy text for both academic and everyday purposes.

- Handles up to 16,384 tokens of input
- See the Colab demo linked above, or try the [demo on Spaces](https://huggingface.co/spaces/pszemraj/summarize-long-text)

> **Note:** Due to inference API timeout constraints, outputs may be truncated before the full summary is returned (run the model locally in Python, or use the demo).

---

## Basic Usage

To improve summary quality, pass `encoder_no_repeat_ngram_size=3` when calling the pipeline object. This setting encourages the model to use new vocabulary and construct an abstractive summary.

Load the model into a pipeline object:

```python
import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,
)
```

Feed the text into the pipeline object:

```python
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)
```

**Important:** For optimal summary quality, use the global attention mask when decoding, as demonstrated in [this community notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing); see the definition of `generate_answer(batch)` there.
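
As a minimal sketch of that pattern (using the standard `transformers` LED API directly rather than the pipeline; the input text is a placeholder, and the generation values mirror the pipeline example above), global attention is placed on the first token before calling `generate`:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "pszemraj/led-large-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "your long document here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)

# LED convention: the first (<s>) token gets global attention;
# all other tokens use the local sliding-window attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```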

If you're facing computing constraints, consider using the base version, [`pszemraj/led-base-book-summary`](https://huggingface.co/pszemraj/led-base-book-summary).

---

## Training Information

### Data

The model was fine-tuned on the [BookSum](https://arxiv.org/abs/2105.08209) dataset. During training, the `chapter` column was the input and the `summary_text` column was the output.

### Procedure

Fine-tuning was run on the BookSum dataset across 13+ epochs. Notably, the final four epochs combined the training and validation sets as 'train' to enhance generalization.

### Hyperparameters

The training process involved different settings across stages:

- **Initial three epochs:** low learning rate (5e-05), batch size of 1, 4 gradient accumulation steps, and a linear learning rate scheduler.
- **In-between epochs:** learning rate reduced to 4e-05, batch size increased to 2, 16 gradient accumulation steps, and a switch to a cosine learning rate scheduler with a 0.05 warmup ratio.
- **Final two epochs:** learning rate further reduced to 2e-05, batch size reverted to 1, gradient accumulation steps kept at 16, and the cosine learning rate scheduler retained, albeit with a lower warmup ratio (0.03).
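
The staged schedule above can be tabulated in code (a sketch only; the keys loosely follow `transformers.TrainingArguments` naming, and only the values stated above are included):

```python
# Hypothetical reconstruction of the staged settings listed above.
stages = [
    {"epochs": 3, "learning_rate": 5e-5, "per_device_batch_size": 1,
     "gradient_accumulation_steps": 4, "lr_scheduler_type": "linear"},
    {"learning_rate": 4e-5, "per_device_batch_size": 2,
     "gradient_accumulation_steps": 16, "lr_scheduler_type": "cosine",
     "warmup_ratio": 0.05},
    {"epochs": 2, "learning_rate": 2e-5, "per_device_batch_size": 1,
     "gradient_accumulation_steps": 16, "lr_scheduler_type": "cosine",
     "warmup_ratio": 0.03},
]

# Effective batch size per stage = per-device batch size * accumulation steps
effective = [s["per_device_batch_size"] * s["gradient_accumulation_steps"] for s in stages]
print(effective)  # [4, 32, 16]
```

Note that although the per-device batch size stays small, gradient accumulation keeps the effective batch size in a more typical range.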

### Versions

- Transformers 4.19.2
- Pytorch 1.11.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1

---

## Simplified Usage with TextSum

To streamline the process of using this and other models, I've developed [a Python package utility](https://github.com/pszemraj/textsum) named `textsum`. This package offers simple interfaces for applying summarization models to text documents of arbitrary length.

Install TextSum:

```bash
pip install textsum
```

Then use it in Python with this model:

```python
from textsum.summarize import Summarizer

model_name = "pszemraj/led-large-book-summary"
summarizer = Summarizer(
    model_name_or_path=model_name,  # you can use any Seq2Seq model on the Hub
    token_batch_length=4096,  # tokens to batch summarize at a time, up to 16384
)
long_string = "This is a long string of text that will be summarized."
out_str = summarizer.summarize_string(long_string)
print(f"summary: {out_str}")
```
497
+
498
+ Currently implemented interfaces include a Python API, a Command-Line Interface (CLI), and a demo/web UI.
499
+
500
+ For detailed explanations and documentation, check the [README](https://github.com/pszemraj/textsum) or the [wiki](https://github.com/pszemraj/textsum/wiki)
501
+
502
+
---

## Related Models

Check out these other related models, also trained on the BookSum dataset:

- [LED-large continued](https://huggingface.co/pszemraj/led-large-book-summary-continued) - experiment with further fine-tuning
- [Long-T5-tglobal-base](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary)
- [BigBird-Pegasus-Large-K](https://huggingface.co/pszemraj/bigbird-pegasus-large-K-booksum)
- [Pegasus-X-Large](https://huggingface.co/pszemraj/pegasus-x-large-book-summary)
- [Long-T5-tglobal-XL](https://huggingface.co/pszemraj/long-t5-tglobal-xl-16384-book-summary)

There are also variants trained on other datasets on my Hugging Face profile - feel free to try them out :)

---
config.json ADDED
{
  "_name_or_path": "pszemraj/led-large-book-summary",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
    "LEDForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "attention_window": [
    1024,
    1024,
    1024,
    1024,
    1024,
    1024,
    1024,
    1024,
    1024,
    1024,
    1024,
    1024
  ],
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "length_penalty": 0.8,
  "max_decoder_position_embeddings": 1024,
  "max_encoder_position_embeddings": 16384,
  "max_length": 1024,
  "min_length": 8,
  "model_type": "led",
  "no_repeat_ngram_size": 3,
  "num_beams": 4,
  "num_hidden_layers": 12,
  "output_past": false,
  "pad_token_id": 1,
  "prefix": " ",
  "repetition_penalty": 3.5,
  "torch_dtype": "float32",
  "transformers_version": "4.19.2",
  "use_cache": true,
  "vocab_size": 50265
}
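The config above bakes beam-search generation defaults into the checkpoint. As a sketch of what those defaults amount to, here they are collected into a plain kwargs dict; this is illustrative only, since `transformers` reads these values from the config automatically when you call `model.generate()`, and you would only pass them explicitly to override:

```python
# Generation defaults from config.json above, gathered as kwargs (illustrative).
gen_kwargs = {
    "max_length": 1024,        # decoder output cap
    "min_length": 8,
    "num_beams": 4,
    "no_repeat_ngram_size": 3,
    "length_penalty": 0.8,     # < 1.0 mildly favors shorter beams
    "repetition_penalty": 3.5,
    "early_stopping": True,
}
print(sorted(gen_kwargs))
```
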
ds_config_zero2.json ADDED
{
  "amp": {
    "enabled": "auto",
    "opt_level": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "round_robin_gradients": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 4000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
evals-outputs/GAUNTLET.md ADDED
# gauntlet results

These are this model's output results on my "summarization gauntlet". You can find more info about it [here on my dropbox for it](https://www.dropbox.com/sh/axu1xlscrrexy55/AADAm01-4Zs3POyHQrgbDAsda?dl=0) or at [this dataset](https://huggingface.co/datasets/pszemraj/summcomparer-gauntlet-v0p1).

- If you aren't familiar with it, one thing to note is that some of the input docs are **purposefully** "messy"/contain spelling errors etc.

Parameters:

```json
{
  "model_name_or_path": "pszemraj/led-large-book-summary",
  "use_cuda": true,
  "token_batch_length": 16384,
  "batch_stride": 16,
  "max_length_ratio": 0.25,
  "load_in_8bit": false,
  "compile_model": true,
  "optimum_onnx": false,
  "device": "cuda",
  "inference_params": {
    "min_length": 8,
    "max_length": 4096,
    "no_repeat_ngram_size": 3,
    "encoder_no_repeat_ngram_size": 4,
    "repetition_penalty": 2.5,
    "num_beams": 10,
    "num_beam_groups": 1,
    "length_penalty": 1.0,
    "early_stopping": true,
    "do_sample": false
  },
  "textsum_version": "0.2.0"
}
```

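The `token_batch_length` and `batch_stride` settings above control how the long input is split into overlapping token windows that are summarized one at a time. A minimal sketch of that idea, assuming a simple sliding window (this is not textsum's actual implementation; the function name and details are mine):

```python
def chunk_tokens(token_ids, batch_length=16384, stride=16):
    """Split a token sequence into windows of `batch_length` tokens,
    where consecutive windows overlap by `stride` tokens for context."""
    if batch_length <= stride:
        raise ValueError("batch_length must exceed stride")
    chunks, start = [], 0
    step = batch_length - stride  # advance by window size minus the overlap
    while start < len(token_ids):
        chunks.append(token_ids[start : start + batch_length])
        start += step
    return chunks

# e.g. a 40,000-token input with the gauntlet settings above -> 3 windows
windows = chunk_tokens(list(range(40000)))
```

With `max_length_ratio` of 0.25, each window's summary is then capped at roughly a quarter of that window's token count.
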
- Created: `2023-11-28T20:15:03.532200`

## ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary

In this and the next two talks, Dr. Barlow discusses some of the most important issues in the field of machine learning. He begins by explaining that he wants to discuss the "fundamental computational operations" that go into constructing natural language and why they should be used rather than other types of artificial intelligence. In particular, he would like to discuss several things: 1) what we're trying to achieve by studying language; 2) why there are so many different ways to look at language; and 3) why we need to find a common foundation for all of these different approaches. In order to answer these broad questions, he goes back to the beginning of the scientific revolutions of the seventeenth century when people marveled at the fact that we could create such a large number of sounds with such a small amount of computing power. Back then, scientists believed that it was possible to express thoughts using only a few different sounds. Now, though, advances in science have made it possible to understand the entire range of information within a single language or set of words. This has led to new developments in the fields of linguistics, including the concept of structure theory, which is now called "the null hypothesis." The purpose of this talk is to give a more detailed overview of the various approaches to machine learning where each approach has its own set of advantages and disadvantages.

---

Section Scores for ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853561_0_part1_summary:

- -0.8402

---

## ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2_summary

the UM more clearly lays out the problems with the concept of merge and explains how he thinks it can be used to solve real-world problems. Earlier in the chapter, he had talked about how the concept was loosely defined and other applications that had been given, but now he sees that many of these applications don't fit within the original intent of the merge and are therefore illegitimate. Later, he would like to address some of the specific problems caused by this lack of clarity. Here, he discusses the principle of stochasticism, or "separation of equals." The idea is that if two objects are joined together, they will eventually form a whole new set of objects. He uses the example of a sandwich to demonstrate how this can be done. In addition, he uses examples from the natural sciences such as geometry and logic to show how the same set of ideas can be applied to different types of problems. Finally, he introduces concepts such as "parallel" and "no-tide," which are used in deterministic machine learning.
the author discusses the notion of resource restriction and how it relates to head movement. In particular, he discusses the concept of an adjunction structure, which is a pair- merged structure. He discusses the problems posed by this structure, such as what happens if you try to build something with an existing structure x and then decide to merge it with another structure y. This leads to the problem of what happens when you want to build up an exocentric structure with no space left for new additions. The author suggests that there is a way to solve this problem by "sharpening the idea of restricting computation." That is, instead of adding lots of stuff to the screen, you should only add a tiny bit at a time.

---

Section Scores for ASR-whisper-rpunctuated_Noam Chomsky, Fundam_1669853631_0_part2_summary:

- -0.7798

- -0.7059

---

## ASRnlp_law_lecture_week_1_v_2_c_transcription_1_summary

The course is a "applied Natural Language Processing" course where the students learn about natural language processing and machine learning. The professor explains to the class why this course is different from other courses he has taught in the past and discusses some of the tools that will be used in this course including but not limited to machine learning, natural language analysis, social science, and political science. He also gives a detailed overview of the course syllabus with all of the sections as they will apply to the real-world use of these tools in the real world.

---

Section Scores for ASRnlp_law_lecture_week_1_v_2_c_transcription_1_summary:

- -0.6212

---

## ASRnlp_law_lecture_week_2_v_2_c_transcription_2_summary

Now that the class is live, the professor explains some of the things that people might be wondering about. For example, should students copy paste their notes so they know what they're doing? Should students do homework in general without books? The course model has been set up so that anyone who has any questions can just hit it off with the professor and he'll answer all of her questions from there. Some people wanted to know if courses could count different kinds of courses as minor courses or if credits were counted differently on different types of courses. The professor says that these sorts of questions can be handled by talking to someone in your own department. Another thing the professor wanted to talk about was how to get a course project done during the school year so that you don't have to wait until the summer for it to be due. Next, he talked about feature-specific tasks like deciding which words to include in a text and which ones to cut down on. He also talked about learning to classify using machine learning. Finally, he gave us a long description of how he's going to help people learn to read and write.

---

Section Scores for ASRnlp_law_lecture_week_2_v_2_c_transcription_2_summary:

- -0.7026

---

## ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary

The next morning, Dr. Barlow and his colleagues get up early and go over some of the slides from the previous week. This includes a bunch of machine learning exercises that focus on different types of data. For example, they're going to do some spot-on analysis of phrases in English so that they can predict what each phrase is going to mean in terms of who's using it and how people are using it. A lot of this will be useful for companies that need to make some sort of quick decision about whether or not to hire an outside company to help them with a particular problem. The first thing the team does is use machine learning to predict the likelihood that a given piece of information will be used in a particular business transaction. Next, they do some deep learning exercises where they train their artificial intelligence systems on a set of randomly generated strings of numbers and then run them through a series of filters to see what kinds of patterns they see. They also do some modeling exercises that involve clustering and prediction using machine learning. Finally, they give a short tutorial on topic-learning, which is a technique for predicting the meaning of words in documents by clustering them into topics.

---

Section Scores for ASRnlp_law_lecture_week_3_part_1_v_2_c_transcription_3_summary:

- -0.8123

---

## Emie_dissertation_cleansed_summary

Dissertation No. 10 is entitled "Act of Violence and the Man Between" and it consists of only 14,952 words. It is a dissertation on post-war American and British film Noir. The narrator thanks his mentors, Dr. Lisa Mullen and Mr. Rhodes, for helping him to develop as a scholar. He also thanks his course mates, especially Sarah Sharp and Sebastian Benzecry, for inspiring him. Most of all, he is thankful for his family.
Ivo tries to convince Susanne to go to the opera with him, but she rejects his offer. During their confrontation, she accidentally slaps a lightbulb, causing it to go off and send her into a tizzy. Reed uses "material reality" to heighten the tension in this scene as well as other similar ones in the film. Another important consideration in this chapter is the relationship that the various characters have with the outside world. Specifically, we get to see more of Ivo, Susanne, and Reed's use of the camera to track their movement through the city. This allows the audience to feel what they are experiencing, and helps to create a sense of distance between the two characters. The subplot of The Man Between also uses the cinema as a vehicle for showing the different ways in which people deal with the "post-war period" in the Western Sector of Berlin.

---

Section Scores for Emie_dissertation_cleansed_summary:

- -0.5613

- -0.6878

---

## OCR_ML4HLecture02image__summary

Gunnar Ratsch, a graduate student at the School of Engineering at the University of Zurich, presents a lecture on machine learning to the faculty and students in the Department of Computer Science. The purpose of the lecture is to advance the use of artificial intelligence in medicine by enabling doctors to more easily diagnose diseases without the need for expensive special equipment. In order to aid in this mission, the university has invested a large amount of money in the creation of a number of cutting-edge research centers devoted to the study of medical image analysis. This lecture will consist of only a few of the many lectures that are expected to be given in the course of the next several years. The topics covered in this lecture include but are not limited to:Medical image analysis problemsMachine learning for Healthcare 99 - The Basics of Machine Learning for Medical Imaging Julia Vogt and Valentina Boevo & Gunnar Ratsche Institute for Machine Logistics, Computer ScienceDepartment @ gxr @gelnarschinstitute #datastorectorserviceservice #dataScience #precisionmedicine #clinicaldata Li BPORicys DinFKD BIOL University Hospital Zurich Gunnar ratsch 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33Next, the lecture focuses on some of the most commonly used terms in the field of medicine such as "segmentation" and "separation." Segmentation is the division or segmentation of images into regions. It can be applied to both biological and non-biological tissues. A section on segmentation is also included in the lecture

---

Section Scores for OCR_ML4HLecture02image__summary:

- -0.778

---

## OCR_ML4HLecture04RepresentationLearning.pptx__summary

Ratsch, a graduate student at the University of Ingolstadt in Germany, demonstrates how machine learning can be used to predict the future in order to improve medicine by predicting which drugs will work and which ones will fail. He uses a model developed by Eynsford-Rand for predicting disease severity and predicts that heart failure will be more common in women than in men. He predicts that smoking cessation will have less of a negative effect on mortality than if smoking had been thought of as a curative or curative agent. He also predicts that there will be fewer false alarms about heart failure in women because they are more likely to respond to specific drugs than men.

---

Section Scores for OCR_ML4HLecture04RepresentationLearning.pptx__summary:

- -0.7075

---

## OCR_ML4HLecture05-NLP.pptx__summary

Ezurich delivers a lecture on machine learning to the entire faculty at the Russian Academy of Engineering. The purpose of the lecture is to advance the art of machine learning in the field of medicine by making use of natural language processing and statistical analysis in order to predict patient outcomes in real-time. The authors do not cover all of the topics covered in the lecture but rather they cover the basics.

---

Section Scores for OCR_ML4HLecture05-NLP.pptx__summary:

- -0.7488

---

## OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary

the authors discuss the advances in artificial intelligence that have been made in the field of video-on-demand since the advent of computer vision. They examine how different types of artificial intelligence--including those based on machine learning and deep learning--have been used to predict the behavior of actors in live television shows and movies. Their goal is to create a reliable, high-quality, fully-prepared model of how humans watch and learn from video clips. They use a variety of approaches to train their model, which they call "Cog Video," to predict what scenes to show in upcoming TV shows.

---

Section Scores for OCR_PAPER_Hong et al. - 2022 - CogVideo Large-scale Pretraining for Text-to-Video Generation via Transformers-annotated__summary:

- -0.7125

---

## OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary

The authors present a new approach to enhancing music performance by using machine learning and artificial intelligence to improve the sound of music. Their approach works better than previous methods such as those of "speech enhancement" and "audio perception" and they note that it does not interfere with human perception. They discuss their approach in more detail in this paper.

---

Section Scores for OCR_PAPER_Kandpal, Nieto, Jin - 2022 - Music Enhancement via Image Translation and Vocoding-annotated__summary:

- -0.7279

---

## OCR_PAPER_dall-e-2-annotated__summary

This paper describes a method of image generation using computer vision and machine learning to predict and classify images. It uses two different approaches to train its model: first, it uses an image-learning algorithm to predict objects in images, and second, it applies the same principle to predictions of sentimentality and sentimentality in text. The underlying principles guiding both approaches are shown in Figure 1. In both cases, however, the final product is more accurate and better discriminative than either approach. The authors believe that this superiority to statistical analysis lies in the fact that the underlying principles are more easily modifiable and thus easier to apply to real-world applications.

---

Section Scores for OCR_PAPER_dall-e-2-annotated__summary:

- -0.7276

---

## The Most Dangerous Game--Richard Connell_summary

Rainsford is sitting on the deck of the ship with General Zaroff and Captain Nielsen when Whitney, another sailor, tells him that they need to head south in order to hunt for jaguars in the Amazon. There is a superstition that some sailors have a dread of this island because of a bad reputation; it has been said that it is full of cannibals and other bad things. Even Capt Nielsen, the toughest of the sailors, was afraid to go ashore because of this bad reputation. But Rainsford tries to change the subject by talking about the dangerous game they are about to hunt. He talks about the reputation of the island as though it were full of bad ghosts and evil spirits. Then he talks about how Captain Nielsen felt a "mental chill" when they were drawing closer to the island. This chill was so strong that even Captain Nielsen had to take off his clothes and shout at the top of his lungs. The captain explains that sailors have extra senses that tell them what sort of danger they are in, and that an evil place like this can send out vibes of evil. So far as he can tell, the men aboard the ship haven't picked up on any of these feelings, but he hopes they will once they get to the South American jungle. They approach the island, which is dark and misty. As they climb higher up the side of the Hispaniola towards the high treetops, there is a loud noise of something crashing down below them. It turns out to be a gun shot fired from somewhere deep down in the jungle. Something else is going on: someone has fired a gunpowder three times into the blackened sea. Suddenly everyone on board the ship starts hearing the sounds of the Gun Club's guns. Uh oh. What could this be all about? Well, we don't know yet, but it won't come up again until later in the novel. At this point, you might want to take a look at this map to figure out where everything is. You'll see why in a second.
Once they get close enough to see the lights of the boat through binoculars, the captain realizes that they must be approaching a large building. When he climbs down to the level of the stairs leading to the main building, he notices that all of the lights are in one part of the building. In the center of the giant structure is seated General Ivan Zaroff, who is holding a gun. And speaking of huge animals, there are lions, tigers, bears, elephants, and moose. Oh, did we mention moose? All of these animals are huge and very manly looking. Also lurking near the edge of the big building is a pack of viciously noiseless hounds. After a few minutes of swimming along the massive tree root-lined ravine outside the main door of the fortress-like chateau called the Tanzanian tiger pit, Rainsfords first thoughts turn to fear. He knows that he can do only two things: he can either stay where he is and die, or he can run away and meet up with the rest of the pack. He decides to run away. His first instinct is to flee, but then he thinks better of it. He climbs a tree and ties his knife to his belt just in case he needs to use it to stop the hounds if he wants to make it back to the ship. The idea of running away doesn't really appeal to him, so he keeps climbing until he finds a gap in the trees between the trees and the shore. He leaps out onto the shore and watches as the sea rumbles nearby. Later that night, after having enjoyed a good dinner and a bottle of brandy, he goes to bed. He looks out his window and sees the glistening water of the bay and decides that this is the best bed he's ever slept in. Back to Top

---

Section Scores for The Most Dangerous Game--Richard Connell_summary:

- -1.1326

---

## gpt_peter_testing_group_exemplars_summary

The boys' school is in session. The narrator introduces us to some of the characters and asks a few questions to find out who they are and what they like to do. Some of the questions include: What hobbies do you enjoy? Do you like to climb the Swiss Alps? What music do you listen to? Why does the cinema charge too much for popcorn? Which variant of Korea do you think is better? Peter, the leader of the gang, says he would like to know how it feels to be a "fly on the wall" while someone has a "psychotic episode. How does it feel?" Is it scary? Which covert operations occur on Thursday nights? I've always wondered," he says. He goes on to tell us about the beliefs of the South Park kids when their mother creates a riot when they refuse to let them buy Cheesy Poofs because it contains a sexually graphic scene. The film is banned in South park because of its graphic nature, so the students have to find other ways to occupy themselves. Philip and Terrance try to get the Canadian ambassador's movie to be released without fear of retaliation from the U.S. The main issue here is whether or not the narrator can call himself a hard worker. As the chapter comes to a close, we learn that Peter has more soldiers on his side after Paul's defection.

---

Section Scores for gpt_peter_testing_group_exemplars_summary:

- -0.8511

---

## navy seals copy pasta_summary

The narrator interrupts the girl's little tirade to remind her that she is nothing more than a target. He graduated at the top of his class, he tells her, and has done a lot more killing than she ever could. In fact, he's so good at it that he's become a "top sniper" in the entire United States armed forces. She's nothing to him, kid, and he'll just be another target. The narrator brags about all the great things he's done, including killing over 300 men in secret raids against Al-Qaeda and training in gorilla warfare. He promises to kill her in seven hundred ways with his bare hands. If she can't take that, she's going to have to face the full fury of the U.S. military.

---

Section Scores for navy seals copy pasta_summary:

- -0.5135

---

## script_findingnemo_summary

The Finding Nemo film transcript is presented as a work in progress. It is not 100% accurate, as some words are missing or the character does not seem to have fitted into the script exactly. However, it is about 99% accurate. BaD_Burn informs the audience that he did not intend to infringe on the copyright of the movie or its screenplay. The narrator reminds the audience to remember the saying, "Fish are friends; not food". Marlin and Coral discuss the ocean view they have been given by Mr. Creakle, the man who built their house. They talk about the many clownfish that have come to build their home, and wonder if they will be able to create a nest of their own right next to their bedroom window. They discuss naming all of the fish Marlin Jr., Coral's children, and decide to name them all "Coral jalopies." They also discuss whether or not they will grow up to be parents, commenting on the 400 eggs that are expected to hatch over the next few days. Finally, they discuss the possibility of having a sextant, a drug that makes people addicted to drugs.
Nemo tells his friends that his dad just fought three giant sharks. Oh, and did we mention that he also fought a bunch of jellyfish and turtles? Brain Snack: Jellyfish is a common term for jellyfish, a kind of large jellyfish that's often mistaken for a jellyfish. In fact, lots of people think of them as "jellyfish" because they have a habit of chewing on their own corpses. Anyway, the point is that Nemo fought some big fish and beat them up, and now he's headed to Sydney. The gang decides to play a little game called "Swim-A-Thon," in which they try to figure out how to get back to the bottom of the ocean before Darla makes it there in time. Of course, they don't know how to do this without knowing how to talk to each other, so they all switch back and forth between talking to animals and trying to find Nemo. At one point, Dory even tries to speak like a whale, but everyone else can't make any sense out of it. It's pretty funny.

---

Section Scores for script_findingnemo_summary:

- -0.887

- -0.8901

---

## script_frozendisney_summary

Arendelle opens with a shot of ice that looks like it's about to freeze solid. A band of workers are making their way across the frozen lake. They sing "The Frozen Heart," a Sami folk song, as they drag huge ice blocks through the water through a chink in the ice. An 8-year old boy named Kristoff is trying to get a piece of ice off his step. There's also a reindeer with a baby on it. Then there's a young girl named Anna. She and her little sister, 5-year-old Anna, do a lot of running around and chasing after each other during the party. At one point, Anna even runs headfirst into a giant pile of ice. It's all very dramatic. Next, we get a close-up look at some of the action as the girls try to figure out how to make a snowman for the first time ever. Anna gets so freaked out by the whole situation that she forgets to bring her gloves. And here comes the icing on the cake: a giant ice sculpture made of styrofoam. Of course, everyone thinks it's supposed to be a toy, not a real thing. But then again, who hasn't seen a toy before? As the scene continues, we find out that this is a set up for a big production called Arendelle's Trading Post and Fair. The two companies are competing to see who can make the most extravagant snow sculpture. Here, you can think of it as a giant version of Legally Blonde meets Disney's Beauty and the Beast. Oh, and here's a shout-out to Count Olaf.
Anxiously, Anna and Sven watch through the window as Arendelle thaws into the frozen mountains. The three try to find their way to the North Mountain, but are unable to find it. As they near the top of the mountain, Anna sees a glimmering icicle in the distance and realizes it is Olaf. Suddenly, an ax hits Olaf right in the nose and he falls down, just barely avoiding being hit by a falling icicle. Anna catches Olaf's axe and pulls him back to safety. She offers to replace his sled with hers if he does not want to continue to help her. He tells her that this incident has ruined his chances of helping anyone ever again. In the middle of their conversation, the three see Anna take a wrong turn and end up at the wrong place. They follow her as she tries to find her way back to the main road. It is now very cold and everyone is worried about the possibility of frostbite. On the other hand, there seems to be plenty of other things to worry about such as whether or not Anna will be able to make it back alive. At last, they hear a voice calling out to them: "It's Olaf!" And who should it be but the little snowman Olaf! All four of them rush toward the ice palace only to find themselves surrounded by a huge wall of ice--the result of a giant icicle flung from a cannonball thrown by one of the storm chandeliers. No matter how hard they try, though, no amount of effort can break through the impassable ice. When they finally get to the top level, everything seems to have frozen solid except for two things: 1) Anna's sword; and 2) Olaf, who keeps trying to bite through the ice like a faucet; and 3) Anna herself, who manages to free herself from the ice by using a stick as a thrower. After much hilarity ensues, the play ends with the three friends having a heart-to-heart.
As the storm continues, Anna and Olaf are trapped in the castle. They try to break out through a window, but the ice is so thick that they are trapped. Anna struggles on with Olaf's help. As the storm gets worse, Anna realizes she needs to find Kristoff. She runs to him, but he doesn't seem to care. Suddenly, the walls begin to crack under the pressure of the ice. The whole castle begins to crack and ice begins to spill out. Anna sees Kristoff and starts to run towards him. Just as she reaches him, his sword is drawn. With her last strength, Anna leaps in front of her sister and catches the sword. The sword strikes her instead of Elsa, and it knocks both of them out cold. When they come to, Anna embraces her sister. Olaf is so excited about what has happened that he lifts his head off his head so that you can see how excited he is. He says that an act of "true love" will thaw even a frozen heart. As Anna hugs her sister, the ice breaks loose and the whole fjord thaws. The villagers come outside to watch the warmth return. Anna gives a signal to Olaf and he comes over to her. In one final gesture, she distributes the snow in a giant snowflake and waves it away. It becomes a warm summer day.

---

Section Scores for script_frozendisney_summary:

- -0.9939

- -0.9178

- -0.6323

---

## script_strangersonatrain_summary

On a train platform in Washington D.C., the action shifts to the station at the Union Station. Cameras focus on a man getting out of a taxi and into a waiting limousine; he is wearing dark colored brogues, a conservative suit, and sport shoes. A second scene shows two young men sitting in the parlor of a train car. One is Bruno Anthony, a well-dressed tennis player, and the other is George Haines, who we learn is his mother's son. They are engaged in conversation as they wait for the train to pull into the station. When it arrives, they alight and board the train. On the train platform, there is much activity as people make their way from one terminal to another. The scene switches to the Parlor car where Anne Burton and her friends are waiting for Bruno to arrive. She is accompanied by two of her younger boy friends. As she waits with her friend George, she receives a message that her father wants to see her. In the meantime, Mrs. Joyce approaches her and tells her not to stay out so late. This prompts Anne to tell her mother that she will be staying out too late. At this point, the action switches to Guy's apartment where he is being introduced to his fiancee, Anne's father, Mr. Alden. He introduces himself as "Bruno" and explains that he has come to take care of some family business. There follows a series of brief conversations between him and Anne. Next, we cut to a scene in a music shop where both parties are involved in what appears to be a domestic quarrel. It begins when Captain Bruno meets up with Eunice (Guy's mother) and invites her to have lunch with him. We then switch to the scene in the living room of Mrs. Anthony's home where Bruno is trying to convince his mother about the plan to kill her husband. While doing so, he also tries to get her to realize that she does not want a divorce from him. Finally, the scene shifts back to the front of the store where Ganymede is attempting to deliver a letter to Mrs. Jarndyce de Bassompierre.
All of this takes place in a shot taken from the rear window of the shop.
260
+ The scene shifts to the home of Mr. and Mrs. Burton, where Guy is being questioned by the police about his whereabouts on the night of the murder. When he arrives at the Burton house, he gives a false alibi for 9:30 that night and explains that he spoke with a professor named Collins who advised him to stay away from prostitutes. Anne is relieved that Guy can give her an alibi, but she is still worried about what will happen in the morning when news of this incident reaches the family. The narrator informs us that there will be "a lot of reporters at your house tomorrow morning". At the same time, however, the Burton's servant, Captain Turley, has been given the key to the house by one of the other guests. A few moments later, Barbary enters the room carrying a large envelope which contains some papers. She asks Haines if he is planning any plans for the evening. He tells her that he would like to do so much as long as they don't disturb Mrs. Antony or Mr. Bushy. In response, she says that she doesn't mind scandal but that she'd like them to find someone else to take care of it. Meanwhile, back at the train station, Anne confesses to Guy that she thinks it would be nice to have a man who loves her so much to marry her. They are interrupted by the arrival of Bruno, whom Anne recognizes as the man she talked to on the day of the Metcalf affair. After introductions are made between Anne and Guy, the two other women enter the room. One of these is Barbara, while the other is Drusseemingly interested in Guy. It turns out to be Professor Collins, who was Guy's drinking buddy during their date at the night before. As soon realizes that Anne knows about the murder, he tries to convince her not to involve herself too heavily in the matter. However, she does not want to get involved because she wants to protect everyone around her--her father, her father, Barbara, and now Guy. Anne decides that she must tell her father all about the situation. 
Back at the Burton house, Guy goes up to his room and finds a note addressed to him written in a sloppy, illegible handwriting. The note reads: "We have to call me at Arlington." Before Guy can read the note, though, he is confronted by Bruno, who demands to know how he came to be in the wrong place at the wrong time. This leads to a series of misunderstandings and misunderstandings until Guy finally realizes that there is no point in trying to figure out exactly what happened. He begins to feel guilty, knowing that he might be implicated in the murder even though he hasn't yet committed himself to the crime. To cover his own assiduously, he takes off his dinner clothes and puts on a pair of tracksuit pants. Then he jumps into a cab and makes his way toward the Bennett household.
261
+ The next scene opens late at night in the apartment where Guy and Anne are engaged in a conversation. The mother of one of Guy's tennis opponents, Barbara, is trying to convince Anne that her son should be punished for murdering a woman. She tries to talk Anne into talking to her son about it, but Anne is so upset that she can only stammer out that she leaves the room. As soon as she leaves, however, she hears her son's voice: "Bruno". It turns out that his mother was not very helpful after he had been away from home for such a long time. He tells Anne that he is upset with Guy because he did not have the foresight to send her on this errand. But he also confesses that he feels guilty for protecting Anne ever since they had their first sexual encounter when he told her how much he hated his then-wife. At this point, we cut back to the play on the tennis court between Guy and Mr. Reynolds, who is preparing for the second round of the doubles match between him and Guy. While watching the action on the court, we learn that Haines has won the first set and will need to win the next set to take control of the match. On the other side of the net, there is an even more intense battle between Bruno and Guy over the love of a young girl who is riding the merry go-round. In the midst of all of this excitement, we catch a brief glimpse of Bruno trying to get rid of a cigarette lighter that he dropped by accidentally into the drain. When he fails miserably, he grabs hold of a man's hand and runs off with it. A crowd gathers around the broken up towards the end of the final set.
262
+
263
+ ---
264
+
265
+ Section Scores for script_strangersonatrain_summary:
266
+
267
+ - -1.1458
268
+
269
+ - -1.2179
270
+
271
+ - -1.1111
272
+
273
+ ---
274
+
275
+ ## script_sunsetblvd._summary
276
+
277
+ On March 21, the setting shifts to Los Angeles where Charles Brackett and Billy Wilder are involved in a murder investigation. A 46 Plymouth convertible is found floating in a pool of Mabel Norma Desmond's swimming pool. The story opens with a street sign that reads "Sunset Boulevard" and the police squad cars drive by. Two men arrive at the apartment where John Gillis is working on a script and they ask him for his car, which he has lent to a friend. When they leave, he gives them the keys to the car only to discover that it is gone. He meets with Norma who offers him a job as a writer. She wants him to write a romantic comedy set in the fictional town of Bel-Air about a young man who falls in love with a girl and then commits suicide. At the same time, she shows him a picture of a dead chimpanzee wrapped in a shawl lying on a table. Later, we learn that the dead man was George Sheldrake who had been shot in the back and killed by a machinegun while attempting to steal a $5,000 pearl from a pawnbroker. Next, we see scenes from the shooting of a number of films including Bases Loaded, Gold Rush, The Bride of Lamon and many others. Finally, we jump forward in time to the scene of a newly married couple, Norma and Max sitting in their living room. They are interrupted by the sound of an organ being played. Outside, two men come to pick up the car, but when they get to the apartment, they find out that there is no one there. They decide to go to the garret where the car is stored. There, however, where they find Norma waiting for them. She explains to them that she will not only take care of the car but that she also plans to bury it in the garden because she thinks it will be a good way to honor her late husband. Then she asks them to give her some money so that she can have a New Years Eve party.
278
+ It's New Year's Eve, and the whole place is decorated to the nines with lights and garlands. There's a buffet going on in one corner of the room. Norma is there with Max and she's wearing a very high-style outfit. She notices Gillis and flirts with him, complimenting his shoulders and assuring him that he looks divine. They order some drinks and start to dance. The musicians are watching them while they do their thing. When they're done, Norma leads Gillis over to another room where they have a little chat. She tells him about a rich guy who bought all the tickets to the play at the theater and then sat around listening to it all by himself because he was afraid of catching cold. This rich guy's name is Mr. de Mille. He's planning to rent out the play for a couple of weeks and wants Norma to write a script for him. Meanwhile, back at the studio, Betty is trying to convince her to quit being a prostitute and get back into acting. She hints that she'd like to try her hand at writing again since she's so much better at it than she ever was when she was working as a prostitute. At this point, Joe Gillis shows up and takes a walk down to see if anyone knows him. No one does. We find out that he's looking for a woman named "Norma Desmond." A few seconds later, though, someone does know him. It's none other than... Norma herself! She gives him a hug good-bye before running off to get her script. Next, we head over to the next scene, which is set in a dingy little apartment called the "The Readers' Department". It's full of people getting ready for a New Years Eve party - actors, stagehands, comedians, and more. In fact, it's such a shindig that the place is filled to the brim with people from all over town. Everyone's having such a good time that they decide to give Norma a run for her money. After a bit of banter about how awesome Norma would make a great actress, she gets up to leave. 
As she leaves, she runs into Joe, who tries to flirt with her but ends up stabbing her in the arm with a penknife. Finally, she retreats to her room and sends Joe packing. Back at the readers' department, everyone is talking about what a dope Norma has turned out to be. Apparently, she's playing hooky from De Mille's Theater Club and instead of waiting for the rest of the group to show up she's spending the night doing her own thing. What could possibly go wrong? Well, you can't exactly count the number of times she's had sex without getting chewed out, can you? Just look at these lines: "I had a nightmare and screamed for you. I had a toothache and woke up screaming for you". That's not a euphemistic way to say "hey, did you guys have sex last night?" Yeah, well, um, no. Some things never change.
279
+ Gillis notices that Betty is staring at him and asks her why she is staring. Betty tells him that something came up, but she won't say what. She walks out in tears. When she comes back in, she tells Gillis that she's getting married to Artie Green. He takes her in his arms for a moment and then the camera pans away. The next night, Norma goes to sleep with the door open. In the middle of the night, Betty gets a telegram from Artie, who wants her to come to his house in Arizona to get married so they can save money on the dowry. Betty starts crying and tells them that she isn't in love with him anymore. They tell her to stop crying and that she can get married as long as she wants. Just then, someone picks up the telegram and says, "Congrats, Betty, you're engaged." This is too bad, because it means that Norma will have to give up her job. Then, there's a phone call. It's none other than Norma herself, trying to reach out to Betty via the phone. But when she gets to the part about how she doesn't want to hurt anyone, she realizes that she was a fool to have lied to her. She wishes she didn't have to tell her secret; maybe she could get away with it and get away from Norma once and for all. From Norma's bedroom, we hear a telephone being dialled. It turns out to be the home of Joe Gladstone. Norma gives the number to Betty, who shares the room with another girl named Betty. On hearing the phone call, both women share the same bedroom. At first, neither of them understands what's going on. Finally, after listening to a few more questions, they get it straightened out enough so that they understand what Norma really wants. She wants to go down to the lower level to shoot herself in the face. Before she does, though, she makes one last attempt to assure Joe that he'll never leave her again. After this point, the action switches to a scene in front of a bunch of news cameras. 
Hedda is talking to the Times newspaper's city desk, reporting on the happenings at Norma Desmond's house. As Norma prepares to do her scene, she meets with the captain of the hills division and the homicide squad. The captain tries to get her to admit that she did indeed kill Salome, but Norma cuts him off before she can give an unequivocal yes or no answer. Instead, she just repeats over and over that she had nothing to do with it. Meanwhile, everyone is gathered in the hallway below watching Norma prepare for her scene. Newsmen, reporters, and even a famous actress are milling around speculating on what would happen if Norma were to get off scot-free. No matter what happens, however, Normas life will go on forever.
280
+
281
+ ---
282
+
283
+ Section Scores for script_sunsetblvd._summary:
284
+
285
+ - -0.9227
286
+
287
+ - -1.1084
288
+
289
+ - -0.873
290
+
291
+ ---
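
The per-section scores listed above can be collapsed into one mean score per script. A minimal sketch of that aggregation; the helper below is not part of this repository, and the numbers are copied verbatim from the score lists above:

```python
# Hypothetical helper (not part of this repo): average the per-section
# scores reported above into one mean score per script.
scores = {
    "script_frozendisney_summary": [-0.9939, -0.9178, -0.6323],
    "script_strangersonatrain_summary": [-1.1458, -1.2179, -1.1111],
    "script_sunsetblvd._summary": [-0.9227, -1.1084, -0.873],
}

means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
for name, mean in means.items():
    print(f"{name}: {mean:.4f}")
```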
292
+
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ee872e86edb7c76ea6631b672e2cd3df5df6a749cfea7c8412a319123c87976
3
+ size 1839478370
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0603a70f15308ebccb9b66369464a83efb920ed266f736bdca0279bc066eecf7
3
+ size 1839482407
special_tokens_map.json ADDED
@@ -0,0 +1 @@
1
+ {"bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true}}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1 @@
1
+ {"errors": "replace", "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "trim_offsets": true, "model_max_length": 16384, "special_tokens_map_file": "/root/.cache/huggingface/transformers/2ad921573d53ebf0c0450d63a211e61d8e328324e84830c365abff01f2d115f1.cb2244924ab24d706b02fd7fcedaea4531566537687a539ebb94db511fd122a0", "name_or_path": "pszemraj/led-large-book-summary", "tokenizer_class": "LEDTokenizer"}
trainer_state.json ADDED
@@ -0,0 +1,364 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 2.9914638001896936,
5
+ "global_step": 294,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.05,
12
+ "learning_rate": 7.5e-06,
13
+ "loss": 0.481,
14
+ "step": 5
15
+ },
16
+ {
17
+ "epoch": 0.1,
18
+ "learning_rate": 1.5e-05,
19
+ "loss": 0.5091,
20
+ "step": 10
21
+ },
22
+ {
23
+ "epoch": 0.15,
24
+ "learning_rate": 2.25e-05,
25
+ "loss": 0.4303,
26
+ "step": 15
27
+ },
28
+ {
29
+ "epoch": 0.2,
30
+ "learning_rate": 3e-05,
31
+ "loss": 0.4055,
32
+ "step": 20
33
+ },
34
+ {
35
+ "epoch": 0.25,
36
+ "learning_rate": 2.998662940889891e-05,
37
+ "loss": 0.4338,
38
+ "step": 25
39
+ },
40
+ {
41
+ "epoch": 0.3,
42
+ "learning_rate": 2.9946541471956496e-05,
43
+ "loss": 0.4793,
44
+ "step": 30
45
+ },
46
+ {
47
+ "epoch": 0.35,
48
+ "learning_rate": 2.9879807655761145e-05,
49
+ "loss": 0.4291,
50
+ "step": 35
51
+ },
52
+ {
53
+ "epoch": 0.4,
54
+ "learning_rate": 2.9786546929722055e-05,
55
+ "loss": 0.4908,
56
+ "step": 40
57
+ },
58
+ {
59
+ "epoch": 0.46,
60
+ "learning_rate": 2.966692555397705e-05,
61
+ "loss": 0.463,
62
+ "step": 45
63
+ },
64
+ {
65
+ "epoch": 0.51,
66
+ "learning_rate": 2.9521156782993066e-05,
67
+ "loss": 0.528,
68
+ "step": 50
69
+ },
70
+ {
71
+ "epoch": 0.56,
72
+ "learning_rate": 2.9349500485387718e-05,
73
+ "loss": 0.5178,
74
+ "step": 55
75
+ },
76
+ {
77
+ "epoch": 0.61,
78
+ "learning_rate": 2.9152262680649704e-05,
79
+ "loss": 0.4602,
80
+ "step": 60
81
+ },
82
+ {
83
+ "epoch": 0.66,
84
+ "learning_rate": 2.8929794993583937e-05,
85
+ "loss": 0.5044,
86
+ "step": 65
87
+ },
88
+ {
89
+ "epoch": 0.71,
90
+ "learning_rate": 2.8682494027454e-05,
91
+ "loss": 0.4217,
92
+ "step": 70
93
+ },
94
+ {
95
+ "epoch": 0.76,
96
+ "learning_rate": 2.8410800656939512e-05,
97
+ "loss": 0.502,
98
+ "step": 75
99
+ },
100
+ {
101
+ "epoch": 0.81,
102
+ "learning_rate": 2.811519924216873e-05,
103
+ "loss": 0.4549,
104
+ "step": 80
105
+ },
106
+ {
107
+ "epoch": 0.86,
108
+ "learning_rate": 2.779621676522777e-05,
109
+ "loss": 0.5692,
110
+ "step": 85
111
+ },
112
+ {
113
+ "epoch": 0.91,
114
+ "learning_rate": 2.7454421890685647e-05,
115
+ "loss": 0.4312,
116
+ "step": 90
117
+ },
118
+ {
119
+ "epoch": 0.96,
120
+ "learning_rate": 2.709042395181008e-05,
121
+ "loss": 0.4938,
122
+ "step": 95
123
+ },
124
+ {
125
+ "epoch": 1.02,
126
+ "learning_rate": 2.6704871864281377e-05,
127
+ "loss": 0.5433,
128
+ "step": 100
129
+ },
130
+ {
131
+ "epoch": 1.07,
132
+ "learning_rate": 2.6298452969340952e-05,
133
+ "loss": 0.3459,
134
+ "step": 105
135
+ },
136
+ {
137
+ "epoch": 1.12,
138
+ "learning_rate": 2.58718918084368e-05,
139
+ "loss": 0.3739,
140
+ "step": 110
141
+ },
142
+ {
143
+ "epoch": 1.17,
144
+ "learning_rate": 2.5425948831550528e-05,
145
+ "loss": 0.375,
146
+ "step": 115
147
+ },
148
+ {
149
+ "epoch": 1.22,
150
+ "learning_rate": 2.496141904150859e-05,
151
+ "loss": 0.3809,
152
+ "step": 120
153
+ },
154
+ {
155
+ "epoch": 1.27,
156
+ "learning_rate": 2.447913057669456e-05,
157
+ "loss": 0.4183,
158
+ "step": 125
159
+ },
160
+ {
161
+ "epoch": 1.32,
162
+ "learning_rate": 2.3979943234689226e-05,
163
+ "loss": 0.4207,
164
+ "step": 130
165
+ },
166
+ {
167
+ "epoch": 1.37,
168
+ "learning_rate": 2.3464746939470288e-05,
169
+ "loss": 0.3767,
170
+ "step": 135
171
+ },
172
+ {
173
+ "epoch": 1.42,
174
+ "learning_rate": 2.2934460154904436e-05,
175
+ "loss": 0.4248,
176
+ "step": 140
177
+ },
178
+ {
179
+ "epoch": 1.48,
180
+ "learning_rate": 2.2390028247360042e-05,
181
+ "loss": 0.3374,
182
+ "step": 145
183
+ },
184
+ {
185
+ "epoch": 1.53,
186
+ "learning_rate": 2.183242180035951e-05,
187
+ "loss": 0.4582,
188
+ "step": 150
189
+ },
190
+ {
191
+ "epoch": 1.58,
192
+ "learning_rate": 2.1262634884275948e-05,
193
+ "loss": 0.4153,
194
+ "step": 155
195
+ },
196
+ {
197
+ "epoch": 1.63,
198
+ "learning_rate": 2.068168328415864e-05,
199
+ "loss": 0.409,
200
+ "step": 160
201
+ },
202
+ {
203
+ "epoch": 1.68,
204
+ "learning_rate": 2.0090602688846884e-05,
205
+ "loss": 0.4023,
206
+ "step": 165
207
+ },
208
+ {
209
+ "epoch": 1.73,
210
+ "learning_rate": 1.9490446844600375e-05,
211
+ "loss": 0.3426,
212
+ "step": 170
213
+ },
214
+ {
215
+ "epoch": 1.78,
216
+ "learning_rate": 1.888228567653781e-05,
217
+ "loss": 0.4059,
218
+ "step": 175
219
+ },
220
+ {
221
+ "epoch": 1.83,
222
+ "learning_rate": 1.8267203381232774e-05,
223
+ "loss": 0.4449,
224
+ "step": 180
225
+ },
226
+ {
227
+ "epoch": 1.88,
228
+ "learning_rate": 1.764629649386713e-05,
229
+ "loss": 0.4362,
230
+ "step": 185
231
+ },
232
+ {
233
+ "epoch": 1.93,
234
+ "learning_rate": 1.7020671933387917e-05,
235
+ "loss": 0.4874,
236
+ "step": 190
237
+ },
238
+ {
239
+ "epoch": 1.98,
240
+ "learning_rate": 1.63914450291526e-05,
241
+ "loss": 0.3326,
242
+ "step": 195
243
+ },
244
+ {
245
+ "epoch": 2.04,
246
+ "learning_rate": 1.5633197410233404e-05,
247
+ "loss": 0.4035,
248
+ "step": 200
249
+ },
250
+ {
251
+ "epoch": 2.09,
252
+ "learning_rate": 1.5e-05,
253
+ "loss": 0.3291,
254
+ "step": 205
255
+ },
256
+ {
257
+ "epoch": 2.14,
258
+ "learning_rate": 1.4366802589766598e-05,
259
+ "loss": 0.353,
260
+ "step": 210
261
+ },
262
+ {
263
+ "epoch": 2.19,
264
+ "learning_rate": 1.373473400935433e-05,
265
+ "loss": 0.319,
266
+ "step": 215
267
+ },
268
+ {
269
+ "epoch": 2.24,
270
+ "learning_rate": 1.3104921076168065e-05,
271
+ "loss": 0.341,
272
+ "step": 220
273
+ },
274
+ {
275
+ "epoch": 2.29,
276
+ "learning_rate": 1.247848658636778e-05,
277
+ "loss": 0.3276,
278
+ "step": 225
279
+ },
280
+ {
281
+ "epoch": 2.34,
282
+ "learning_rate": 1.185654731320877e-05,
283
+ "loss": 0.3628,
284
+ "step": 230
285
+ },
286
+ {
287
+ "epoch": 2.39,
288
+ "learning_rate": 1.124021201611919e-05,
289
+ "loss": 0.2727,
290
+ "step": 235
291
+ },
292
+ {
293
+ "epoch": 2.45,
294
+ "learning_rate": 1.0630579464064182e-05,
295
+ "loss": 0.3466,
296
+ "step": 240
297
+ },
298
+ {
299
+ "epoch": 2.5,
300
+ "learning_rate": 1.0028736476720464e-05,
301
+ "loss": 0.3187,
302
+ "step": 245
303
+ },
304
+ {
305
+ "epoch": 2.55,
306
+ "learning_rate": 9.435755986953485e-06,
307
+ "loss": 0.3837,
308
+ "step": 250
309
+ },
310
+ {
311
+ "epoch": 2.6,
312
+ "learning_rate": 8.852695128051192e-06,
313
+ "loss": 0.2955,
314
+ "step": 255
315
+ },
316
+ {
317
+ "epoch": 2.65,
318
+ "learning_rate": 8.280593349124432e-06,
319
+ "loss": 0.3793,
320
+ "step": 260
321
+ },
322
+ {
323
+ "epoch": 2.7,
324
+ "learning_rate": 7.720470562033787e-06,
325
+ "loss": 0.3443,
326
+ "step": 265
327
+ },
328
+ {
329
+ "epoch": 2.75,
330
+ "learning_rate": 7.17332532314626e-06,
331
+ "loss": 0.2915,
332
+ "step": 270
333
+ },
334
+ {
335
+ "epoch": 2.8,
336
+ "learning_rate": 6.640133053163455e-06,
337
+ "loss": 0.3514,
338
+ "step": 275
339
+ },
340
+ {
341
+ "epoch": 2.85,
342
+ "learning_rate": 6.12184429819474e-06,
343
+ "loss": 0.3221,
344
+ "step": 280
345
+ },
346
+ {
347
+ "epoch": 2.9,
348
+ "learning_rate": 5.619383035175448e-06,
349
+ "loss": 0.2903,
350
+ "step": 285
351
+ },
352
+ {
353
+ "epoch": 2.95,
354
+ "learning_rate": 5.133645024651171e-06,
355
+ "loss": 0.3397,
356
+ "step": 290
357
+ }
358
+ ],
359
+ "max_steps": 392,
360
+ "num_train_epochs": 4,
361
+ "total_flos": 1.821325738775675e+17,
362
+ "trial_name": null,
363
+ "trial_params": null
364
+ }
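
The `log_history` entries above follow the standard `transformers` trainer-state layout, so the loss trajectory can be pulled out with the stdlib `json` module. A minimal sketch: the two entries here are copied from the log above rather than read from disk, and the commented-out `open()` call shows the usual way to load the shipped file:

```python
import json

# Normally you would read the file shipped with the checkpoint:
# with open("trainer_state.json") as f:
#     log_history = json.load(f)["log_history"]

# Two entries copied from the log above (the full history has 58 rows).
log_history = [
    {"epoch": 0.05, "learning_rate": 7.5e-06, "loss": 0.481, "step": 5},
    {"epoch": 2.95, "learning_rate": 5.133645024651171e-06, "loss": 0.3397, "step": 290},
]

first, last = log_history[0], log_history[-1]
print(f"train loss {first['loss']:.4f} -> {last['loss']:.4f} "
      f"between step {first['step']} and step {last['step']}")
```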
vocab.json ADDED
The diff for this file is too large to render. See raw diff