Sampath1987 committed
Commit 3c75b75 · verified · 1 Parent(s): 22747bf

fine-tuned EnergyEmbed-nv1 for 1 epoch
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
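
This config tells sentence-transformers to collapse the token embeddings into one sentence vector by keeping only the CLS token. A minimal sketch of that behaviour, using the standard `models.Pooling` module (the tensor shapes below are illustrative, not from this repo):

```python
import torch
from sentence_transformers import models

# Mirrors 1_Pooling/config.json: keep only the CLS token of 768-dim token embeddings.
pooling = models.Pooling(
    word_embedding_dimension=768,
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)

# Dummy batch: 2 sequences of 5 tokens, 768 dims each.
features = {
    "token_embeddings": torch.randn(2, 5, 768),
    "attention_mask": torch.ones(2, 5),
}
print(pooling(features)["sentence_embedding"].shape)  # torch.Size([2, 768])
```
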
README.md ADDED
@@ -0,0 +1,814 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:44838
- loss:MultipleNegativesRankingLoss
base_model: Alibaba-NLP/gte-multilingual-base
widget:
- source_sentence: How does the volume and flow rate of cement affect the cementing
    process in oil and gas wells?
  sentences:
  - "Overview of International Offshore Decommissioning Regulations: Volume 1 – Facilities\
    \ \nThe Petroleum Code does not make any specific requirements in relation to\
    \ whether\noffshore facilities need to be removed following cessation of production.\
    \ However, as a\nsignatory to UNCLOS III/IMO and the Abidjan Convention, the Republic\
    \ of Guinea is bound\nby these international and regional agreements. \nThe Environment\
    \ Code is enforced by the Ministry of Natural Resources, Energy and\nEnvironment.\
    \ Its key aims are to protect the environment while promoting the use of\nnatural\
    \ resources. Title 2/Chapter III of the Environment Code deals with maritime waters\n\
    and their resources and Title 5 deals with EIA requirements for major projects."
  - 'Well Cementing design is a critical component of Well engineering, as efficient
    cement design ensures the protection of the casing assemblies from fluid corrosion,
    and ensures the mechanical support of the well. It also ensures that hydraulic
    communication between different zones is prevented.

    Well abandonment is also critical as the design of the slurry required needs to
    be designed to efficiently keep hydrocarbons in the wellbore and prevent any immediate,
    short term or long term migration of hydrocarbons to surface.

    There are numerous studies and publications discussing the causes of gas migration
    after primary cement jobs and well abandonment, some of the causes of gas migration
    have been linked to poor fluid loss control, poor drilling fluid displacement
    (reduces seal efficiency at the interfaces), and long cement setting times which
    allows time for gas to percolate through the partially set cement slurry.

    This paper highlights the engineering methods, and how they can be used to properly
    evaluate the cement slurry design to ensure that gas flow through the cement lattice
    is completely prevented. It assumes that all other issues which involving poor
    execution (mud displacement, poor slurry mixing, use of low quality materials
    and chemicals, human errors), are annulled.

    The correlations/equations discussed and used for the evaluation of the abandoned
    case study well (Well XRT) are the Gas Flow Potential, Slurry Performance Number,
    Hydrostatic Number and Pressure Decay Limit Parameter. Results from critical evaluation
    with these equations confirmed that the Well XRT was efficiently abandoned.

    The paper further recommends that these equations should be used by Well Engineers
    be used to evaluate slurry designs for casing cementing and abandonment operations,
    as they will help ensure that the mechanical and hydraulic isolation is efficiently
    designed for and achieved.'
  - 'This article discusses the big volume top job of oil and gas wells, specifically
    wells A and B which were drilled in Kuwait. The process involves pumping a larger
    volume of mixture of cement, water, and other additives into the annulus to seal
    the wellbore, prevent fluid migration and provide structural support.

    The article highlights the need for precision and control to ensure proper placement.
    The conventional methods like two stage method and lightweight systems used for
    the wells A and B were not sufficient to get the good zonal isolation throughout
    the well bore due to the lower fracture gradient observed in this well. The successful
    zonal isolation was achieved due to pumping large volumes from the annulus.

    The wells were under losses before and during the primary cementing process, which
    was difficult to achieve the desired top of cement (up to surface). To overcome
    these challenges, the well was cemented in unique unconventional method which
    is pumping the bigger volumes from the annulus to cover up to loss zone and eliminate
    any other fluid column in between. Cement Bond Log (CBL) and Variable Density
    Log (VDL) were taken after a 24 Hrs wait on cement and the results were good,
    indicating that the wellbore is properly sealed, and the well is structurally
    stable.

    Pumping large volumes of cement through the annulus can be challenging, as it
    requires a high level of precision and control to ensure that the cement is properly
    placed. This process is different to that of conventional top jobs carried out
    by installing cement baskets. The intention of conventional top job methods is
    to just seal the annulus at surface without paying any attention to mud caps left
    in the open hole. This has resulted in remedial jobs which has increased the cost
    or reduced the life span of wells.

    One of the key considerations when pumping cement through the annulus is the volumes
    considered and thickening time. The rate of flow must be carefully controlled
    to ensure that the cement is properly mixed along with the additives and that
    it does not become too thick or too thin. In addition, the rate of flow must be
    adjusted to account for the variations in pressure and temperature that occur
    as the cement moves through the well.

    Cementing also plays an important role in preventing fluid migration. If the well
    is not properly sealed, there might be inter communication of the fluids which
    affects the life of the well. The extremely lower frac gradient wells undergo
    losses Inspite of using the conventional methods (light weight systems and two
    stage method) and is the reason to follow the unconventional method of cementing
    from the annulus so that entire well bore from shoe to the surface is properly
    sealed with cement. This will result in reducing the unnecessary remedial jobs
    during the life of the well.'
- source_sentence: How do the various water cut measurement techniques compare for
    suitability in permanent downhole deployment?
  sentences:
  - Optimization of hole cleaning remains a vital challenge when planning and drilling
    deviated, high angle and extended reach wells. Hole cleaning depends on a number
    of factors and as to date most existing models have been deployed in solving hole
    cleaning problems. However, the flow rate predicted by these models may not be
    feasible to apply practically in field operations because it gives a pressure
    exceeding allowable limits of the pop-up valves on the mud pump. This is the major
    cause of downtime during drilling operations. This research is aimed at adding
    value to the existing models in achieving better hole cleaning and reduced down
    time. This was made possible through the use of cutting monitoring model which
    is a real time and quantitative tool. A case study on a well being drilled in
    the Niger Delta was conducted whose from which it was observed that within 5800ft
    to 11500ft, the hole was not properly clean as less cuttings were recovered. This
    information was used to initiate hole cleaning procedure. From the validation,
    the results shows Non-Productive Time associated with hole cleaning has a significant
    drop of 2-5 days when the cutting monitoring model is used in conjunction with
    the existing models.
  - Exhumation describes vertical displacements of rocks from maximum depth of burial
    that results from the removal of overburden material. In this study we invert
    seismic velocity profiles from 2D and 3D seismic reflection datasets to constrain
    the distribution and the magnitude of exhumation within the Slyne Basin, offshore
    NW Ireland. The method has already been successfully applied to 2D datasets offshore
    Britain and Africa; this study is the first attempt to extract exhumation estimates
    from 3D seismic data. Inversion of 3D seismic velocity data yields a continuous
    map of exhumation across the entire 3D footprint. Exhumation estimates from 2D
    seismic sections agree with estimates from co-located 3D data. However, there
    is greater scatter in the 2D-derived exhumation estimates, most easily seen at
    line ties. This scatter in the 2D measurements arises because 2D seismic stacking
    velocities are less well constrained than 3D velocities. Together, the 2D and
    3D seismic stacking velocity profiles can be used to estimate exhumation patterns
    on spatial scales >10 km to an accuracy of ±200 m. Many estimated changes in exhumation
    are associated with geological structures, suggesting confidence in the results.
    The margins of Slyne Basin have undergone about 1 km more erosion than the basin
    centre to form the Jurassic-Miocene composite unconformity. Inversion anticlines
    in the centre of the basin have undergone a few hundred metres more erosion at
    their crests than at their flanks. There is good agreement between 3D seismic-derived
    exhumation estimates and existing exhumation estimates using traditional techniques
    applied to borehole data. Overall, our results show that regional exhumation can
    be mapped in hitherto unprecedented detail using good quality seismic stacking
    velocity data.
  - This paper addresses the need and challenges associated with the permanent downhole
    water cut measurement in multiphase flow at an individual lateral level for efficient
    and reliable water cut management in a multilateral horizontal well environment.
    Furthermore, it reviews the available water cut measurement techniques and evaluates
    their suitability for permanent downhole deployment in multilateral horizontal
    wells. A comprehensive analysis of the state-of-the-art water cut measurement
    techniques is presented for the first time in this paper to evaluate their suitability
    for permanent downhole deployment. Downhole water cut measurement challenges are
    described in detail and a table is presented comparing various techniques against
    a set of requirements suitable for permanent downhole water cut measurement.
- source_sentence: What role does AI play in the integrated logistics process in the
    offshore sector?
  sentences:
  - Sustainability has become a pivotal point in the maritime industry, encompassing
    environmental, economic, and social dimensions. This study investigates the impact
    of Industry 4.0 technologies on improving maritime logistics sustainability. An
    extensive literature review will identify key technologies and sustainability
    goals across these dimensions. Using advanced decision-making frameworks like
    AI and ML-enabled decision intelligence or Neutrosophic-TOPSIS methods, the impact
    of these technologies will be quantified and ranked. The results will yield a
    prioritization of technologies and a strategic roadmap for their implementation,
    aimed at optimizing resource allocation and enhancing sustainability. This research
    provides an integrated approach to sustainability and technological adoption,
    offering a novel, industry-specific roadmap.
  - Detection of production and well events is crucial for planning of production
    and operational strategies. Event detection is especially challenging in mature
    fields in which various off-normal events might occur simultaneously. Manual detection
    of these events by an engineer is a tedious task and prone to errors. On the other
    hand, abundance of data in mature fields provides an opportunity to employ data-driven
    methods for an accurate and robust production event detection. In this study a
    data-driven workflow to automatically detect production events based on signatures
    of events provided by experts is demonstrated. In the developed workflow, state-of-the-art
    data-driven methods were integrated with the domain knowledge for an accurate
    and robust detection. The methodology was applied on several case studies of mature
    fields suffering from production issues, such as scaling and liquid loading. It
    was found that the workflow is accurate, robust and computationally efficient
    which could detect new events (verified by the expert). The demonstrated method
    could be implemented both in the real-time or offline fashion. Such a workflow
    is sufficiently generic which can be applied for detection of different events
    and anomalies than tested and verified in this paper, such as leakage, production
    losses, …
  - 'This case study aims to showcase how integrated logistics in the offshore sector
    streamline the supply chain process, reduce costs, and improves efficiency. The
    scope of integrated logistics includes planning, transportation, warehousing,
    inventory management, and information management, focusing on collaboration and
    transparency between all stakeholders in the offshore supply chain.

    The process of integrated logistics in the offshore sector begins with the cargo
    booking. A detailed logistics plan and schedule are then developed, outlining
    the supply chain network, transportation modes, and inventory management strategies.
    The process is managed by an AI-based platform that automatically creates short
    and long-term schedules using various cargo and telemetric data. During the execution
    phase, real-time tracking and monitoring of the supply chain process are crucial
    to managing disruptions. Continuous improvement is key to optimising the integrated
    logistics process with a machine learning element to the logistics tool, resulting
    in increased efficiency, reduced costs, and improved safety and reliability.

    Implementing integrated logistics in the offshore sector has yielded several positive
    results. Firstly, it has improved efficiency in the supply chain process, reducing
    the time and cost required to move goods and equipment from the point of origin
    to the point of consumption. Delivery time has been reduced by 23%, achieved by
    using an AI planning system, real-time tracking, and optimised transportation
    modes.

    Secondly, integrated logistics has helped to maintain high levels of safety by
    reducing the number of entries into the 500M zone by consolidating cargo and increasing
    back deck utilisation. Standardised procedures for logistics operations have been
    established, minimising the risk of errors and improving overall safety.

    Thirdly, the implementation of integrated logistics has led to increased collaboration
    and communication between stakeholders involved in offshore operations, resulting
    in improved decision-making and reduced delays, as well as better transparency
    between all elements of the supply chain.

    Real-time tracking and monitoring of the supply chain process have been crucial
    for effectively managing disruptions and addressing issues, which is made possible
    by automating the process using AI, which is more efficient than manual processes.

    The use of integrated logistics in the offshore sector has resulted in an overall
    cost reduction of 23% on the shipment of goods and a reduction of CO2 emissions
    by 32%, enabling effective management of the movement of goods and equipment while
    promoting sustainability.

    This approach to integrated offshore logistics will enable effective management
    of the movement of goods and equipment from the point of origin to the point of
    consumption and reduce costs for the oil and gas sector while ensuring compliance
    with regulatory requirements.'
- source_sentence: How does the incorporation of polyamine and encapsulation polymer
    in the HPWBM contribute to clay stabilization?
  sentences:
  - Clay bearing shale formations tend to swell upon contact with water-based drilling
    fluid. The migration of hydrogen ions into the nano-spacing of shale platelets
    is mainly responsible for its disintegration and swelling. To mitigate the clay
    swelling problem, various shale stabilization materials are added in the water-based
    muds (WBMs). Before adding these additives, it is crucial to understand their
    physical and chemical interactions with clay minerals as well as within fluid.
    In this study, Taro Root Mucilage (TRM) is used as a green chemical in WBM to
    decrease the shale swelling characteristics. Taro root was boiled in distilled
    water at 40°C for 24 h and mucilage was prepared, which was characterized by FTIR
    and XRD pattern. It was then made part of a mud system, which then interacted
    with the shale sample collected from the western zone of Pakistan. Moreover, this
    mucilage was compared with sodium alginate mud system, a biopolymer commonly used
    in industry. The results of the experimental studies showed that TRM appreciably
    reduces clay swelling characteristics compared with the distilled water and sodium
    alginate. Moreover, all the rheological parameters fall under the recommended
    API range for TRM samples. Furthermore, it was found that the TRM produces a thin
    filter cake and minimizes fluid loss volume. In addition, during the shale cutting
    recovery test, 50%, 80% and 100% recoveries were obtained from base mud, whereas
    10% and 20% were obtained from TRM based WBM respectively. TRM encapsulates the
    drilled cutting and preserves it from breaking into smaller fragments. In addition,
    TRM concentration in drilling mud increases the hydrophobicity of the shale sample.
    The adsorption of TRM over the surface of shale allows less penetration of water
    in the nano-spacing of shale structure and improves the shale stability. Hence,
    the finding in this article implies that TRM can be used as a green and sustainable
    substitute for traditional clay stabilizers in drilling operations to reduce formation
    damage. It has all the desired properties that help it to become an alternate
    solution in the form of a clay swelling inhibitor.
  - 'Exploration drilling obviously requires a robust drilling fluid system to be
    a key factor in overcoming both the known and unexpected challenges of a structure
    that consists of reactive clay and lost circulation zones. Extra consideration
    has to be given to regulatory environmental requirements and complications resulting
    from regional politics. A High-Performance Water Based Mud (HPWBM) system was
    selected to address the aforementioned issues.

    The HPWBM was customized to respond to the subsurface conditions with the main
    requirement to provide maximum shale inhibition through a non-dispersed environment.
    Polyamine was utilized to stabilize all types of clay; an encapsulation polymer
    and a non-ionic polymer were included to prevent dispersion and to seal micro-fractures.
    A complete shale study was performed to determine the optimum concentration of
    the base fluid and each shale inhibitor. Then hydraulic behaviour of the mud was
    simulated with contractor proprietary software to understand the parameters for
    optimal hole cleaning as well as Equivalent Circulating Density (ECD) simulation.

    The HPWBM system successfully facilitated the execution of the exploration well
    and provided highly effective clay stabilization. No Non-Productive Time (NPT)
    was recorded as a result of reactive clay issues. The mud system also facilitated
    a good rate of penetration (ROP), formation stability, and lubricity. Waste cuttings
    transportation was not required. In addition, there is also no requirement for
    costly base oil including its associated transportation costs. The successful
    implementation of the HPWBM yielded an estimating saving of 25% compared to invert
    emulsion fluids, prior to considering costs associated with an expensive Liquid
    Mud Plant (LMP), environmental, and freight costs. Significant cost savings were
    achieved by eliminating the need for LMP rental, mobilization and demobilization.
    Another notable saving was realized from the reduced system maintenance of the
    HPWBM as less dilution was required compared to a regular Water Based Mud.

    Thinking outside of the box and embracing the departure from the default consideration
    of an invert system with a thorough risk assessment augmented value to wellbore
    construction. A smartly designed HPWBM system provided performance comparable
    to an invert emulsion system but with superior benefits with respect to environmental
    protection, simplified logistics and lower costs.'
  - Business Process Outsourcing can be aptly described as the process of forging
    a contractual relationship with external supplier for the provision of capacity
    that has been previously undertaken within an organization. In the global oil
    and gas industry, Business Process Outsourcing (BPO) has emerged in contemporary
    times as a potent tool in their operational mix. This is particularly hinged on
    the imperatives to find a delicate balance between rising global demand, diminishing
    reserves in some of the world's major oil fields, while managing distribution
    and operating costs. The collapse of crude oil prices from US$100.00 in May 2014
    to about US$30.00 and even below in early 2016 has reinforced outsourcing. Empirical
    studies reveal that outsourcing of non-core activities may result in 25% cost
    saving associated with on-/near-site operations and as much as 50-75% for offshore
    operations compared to the cost of engaging in same activities in-house. Apart
    from cost-cutting, other benefits associated with BPO include a stronger focus
    on core competencies; improved regulatory conformity and compliance; as well as
    access to a larger talent pool and novel technologies. The oil and gas industry
    has emerged as the cornerstone of Nigeria's economy, accounting for about 70%
    of annual government revenue and more than 90% of the nation's foreign exchange
    reserves. Since the 1990s, outsourcing has assumed an increasing dimension in
    the nation's oil and gas industry. Empirical studies reveal, for example, that
    up until the early 1990s, employees in the oil industry comprised about 70% and
    30% of permanent and temporary employees, respectively. The temporary employees
    were initially focused on non-core activities. However, in recent times core activities
    are increasingly contracted to service providers, reversing the structure of employment
    in the industry by 2010, with 40% of permanent employees, while 60% were permanent
    employees. The increasing replacement of permanent employees with temporary ones
    has fueled concern in the industry, led by labour unions, which have expressed
    concern about the sub-standard welfare of contract workers. This development has
    led the Federal government of Nigeria to issue guidelines on staff contracting
    and outsourcing in the Nigerian oil and gas industry.
- source_sentence: How does the predictive reservoir effectiveness model aid in the
    exploration of the Winduck Interval?
  sentences:
  - 'In recent years, the challenge of reducing accident costs, the results of inquiries
    into large-scale disasters has highlighted the important role of a proactive approach
    to safety management.

    This has led to many organizations assigning high priority to improve an organization''s
    safety culture. Safety Culture of any organization has an impact on organization
    image, productivity and profitability.

    This paper describes the importance of applying safety culture into the company
    business and provide a practical knowledge required to put safety culture characteristics
    in place. Many organizations have realized that this provides the perfect opportunity
    for them to streamline their operational process and optimize the associated management
    and control system.

    It is also true to say that people do not really know what a "safety culture"
    is.

    Busy Managers asked ‘what does an identifiable safety culture look like?’

    Definition saying that it is the product of people''s values and beliefs, their
    behavior, and their commitment to Health and Safety programs.

    Different levels of efforts are concerned with developing strategic plans, converting
    these into action plans and implementing these so that the organization can fully
    integrate safety into all of its systems. Then the most important indicator of
    a positive safety culture is the extent to which employees are actively in safety
    on daily basis.

    So many organizational endeavors, one of the most salient features that affects
    people''s motivation is the total commitment of senior management and line management.
    This feature in particular has been shown to account for much of the variation
    in safety performance at many different levels in an organization. Since the development
    of a proactive safety culture is an empowering process that aims to win people''s
    hearts and minds, it is absolutely vital that senior management actively demonstrate
    their commitment by providing the necessary leadership.'
  - 'In this multi-Tcf subsea gas development off the North West coast of Australia,
    reservoir simulation supports the key business decisions and processes. An important
    factor when providing production forecasts is ensuring that a range of possible
    outcomes (low-mid-high) are captured accurately by the models. The output from
    these models may then be used by decision makers for evaluating different developments
    and scenarios. The design of experiments (DoE) is commonly employed to aid the
    evaluation of subsurface uncertainties and characterise the impact and influence
    to key model outcomes supporting development decisions.

    Field production performance is often driven by uncertainty in reservoir outcome.
    This paper is helpful to practitioners involved in any computer modelling of petroleum
    reservoirs who are interested in capturing the uncertainty inherent in a field
    and building an appropriate workflow for the development and sensitivity of a
    range of models. Both model building and using DoE to evaluate developments and
    Value of Information (VoI) studies for reservoir management will be shared. Integrated
    DoE focusing on static, dynamic and well-based uncertainties will be illustrated.

    Results will cover:



    Lessons learned and best practices using ED (Experimental Design) to generate
    low-mid-high reservoir simulation models



    Understanding reservoir and well based uncertainties separately



    Evaluating incremental field developments using ED



    Utilizing ED to anticipate range of surveillance responses

    Few papers exist on the integrated application of ED to giant gas fields using
    reservoir simulation. Firstly, this case study will highlight some pitfalls to
    avoid during the workflow. Secondly, the authors will discuss the important issue
    of how to integrate or separate static, dynamic, well and facility based uncertainties.
    Thirdly, the work will show the unique application of ED in VoI and field development
    scoping.'
  - The latest Silurian to Early Devonian Winduck Interval of the extensive but poorly
    exposed Neckarboo Sub-basin, consists of several thousands of metres of a quartzose
    siliciclastic sandstone succession that has been divided into three sequence divisions
    called (in ascending parasequence order) parasequence A (coarse-grained quartz
    sandstone), parasequence B (fining-upward succession of sandstone with siltstone
    and sandstone beds thicken upward) and parasequence C (coarse-grained quartz sandstone
    with siltstone and interbedded calcareous sandstones). These three geophysically
    defined parasequences are separated by slightly discordant erosion surfaces. The
    erosion surfaces are characterised by abrupt breaks at the top of parasequences
    A and B and the surface at the top of parasequence B represents relatively local
    erosion. The top of parasequence C is marked by a major unconformity with the
    Snake Cave Interval. Gamma ray and self-potential signatures within the parasequences
    can be correlated throughout the Neckarboo Sub-basin. The three sequence divisions
    are further subdivided into depositional parasequences, which are readily recognised
    from core sedimentology and electrofacies analysis. The parasequences provide
    the framework for a detailed sedimentological analysis, which focuses on the identification
    of lithofacies successions and parasequences. Petrophysical data are recorded
    and their relationships to the depositional parasequences are discussed. This
    paper presents a predictive reservoir effectiveness model that has been developed
    to aid exploration of the Winduck Interval. The aim is to find the distribution
    of parasequences (based on variations in porosity, net effective thickness and
    lithofacies with burial depth) and to provide a dataset for lithostratigraphic
    units within the Winduck Interval and parameter input for exploration prospect
    evaluation. Parasequence stratigraphic analyses were obtained where there is good
    lithofacies control. The porosity and permeability results have been analyzed
    in a number of parasequences and poor reservoir quality may be due to the effects
    of structure and fluid flow. This approach provides for better and more precise
    stratigraphic trap analysis.
datasets:
- Sampath1987/offshore_energy_v1
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: ai job validation
      type: ai-job-validation
    metrics:
    - type: cosine_accuracy
      value: 0.9800142645835876
      name: Cosine Accuracy
---

# SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) on the [offshore_energy_v1](https://huggingface.co/datasets/Sampath1987/offshore_energy_v1) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) <!-- at revision 9bbca17d9273fd0d03d5725c7a4b0f6b45142062 -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [offshore_energy_v1](https://huggingface.co/datasets/Sampath1987/offshore_energy_v1)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'NewModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

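The printed architecture maps one-to-one onto the modular sentence-transformers API. As a rough sketch (not code shipped with this repo), an equivalent pipeline can be assembled by hand; the `trust_remote_code` arguments are assumptions carried over from the custom `NewModel` architecture of the gte base model:

```python
from sentence_transformers import SentenceTransformer, models

# (0) Transformer backbone; the custom gte architecture requires trusting remote code.
transformer = models.Transformer(
    "Alibaba-NLP/gte-multilingual-base",
    max_seq_length=8192,
    model_args={"trust_remote_code": True},
    config_args={"trust_remote_code": True},
)
# (1) CLS pooling over the token embeddings, as in 1_Pooling/config.json.
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
# (2) L2 normalization, so dot product and cosine similarity coincide.
model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])

print(model.max_seq_length)                      # 8192
print(model.get_sentence_embedding_dimension())  # 768
```
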
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Sampath1987/EnergyEmbed-nv1")
# Run inference
sentences = [
    'How does the predictive reservoir effectiveness model aid in the exploration of the Winduck Interval?',
    'The latest Silurian to Early Devonian Winduck Interval of the extensive but poorly exposed Neckarboo Sub-basin, consists of several thousands of metres of a quartzose siliciclastic sandstone succession that has been divided into three sequence divisions called (in ascending parasequence order) parasequence A (coarse-grained quartz sandstone), parasequence B (fining-upward succession of sandstone with siltstone and sandstone beds thicken upward) and parasequence C (coarse-grained quartz sandstone with siltstone and interbedded calcareous sandstones). These three geophysically defined parasequences are separated by slightly discordant erosion surfaces. The erosion surfaces are characterised by abrupt breaks at the top of parasequences A and B and the surface at the top of parasequence B represents relatively local erosion. The top of parasequence C is marked by a major unconformity with the Snake Cave Interval. Gamma ray and self-potential signatures within the parasequences can be correlated throughout the Neckarboo Sub-basin. The three sequence divisions are further subdivided into depositional parasequences, which are readily recognised from core sedimentology and electrofacies analysis. The parasequences provide the framework for a detailed sedimentological analysis, which focuses on the identification of lithofacies successions and parasequences. Petrophysical data are recorded and their relationships to the depositional parasequences are discussed. This paper presents a predictive reservoir effectiveness model that has been developed to aid exploration of the Winduck Interval. The aim is to find the distribution of parasequences (based on variations in porosity, net effective thickness and lithofacies with burial depth) and to provide a dataset for lithostratigraphic units within the Winduck Interval and parameter input for exploration prospect evaluation. Parasequence stratigraphic analyses were obtained where there is good lithofacies control. The porosity and permeability results have been analyzed in a number of parasequences and poor reservoir quality may be due to the effects of structure and fluid flow. This approach provides for better and more precise stratigraphic trap analysis.',
    'In this multi-Tcf subsea gas development off the North West coast of Australia, reservoir simulation supports the key business decisions and processes. An important factor when providing production forecasts is ensuring that a range of possible outcomes (low-mid-high) are captured accurately by the models. The output from these models may then be used by decision makers for evaluating different developments and scenarios. The design of experiments (DoE) is commonly employed to aid the evaluation of subsurface uncertainties and characterise the impact and influence to key model outcomes supporting development decisions.\nField production performance is often driven by uncertainty in reservoir outcome. This paper is helpful to practitioners involved in any computer modelling of petroleum reservoirs who are interested in capturing the uncertainty inherent in a field and building an appropriate workflow for the development and sensitivity of a range of models. Both model building and using DoE to evaluate developments and Value of Information (VoI) studies for reservoir management will be shared. Integrated DoE focusing on static, dynamic and well-based uncertainties will be illustrated.\nResults will cover:\n–\nLessons learned and best practices using ED (Experimental Design) to generate low-mid-high reservoir simulation models\n–\nUnderstanding reservoir and well based uncertainties separately\n–\nEvaluating incremental field developments using ED\n–\nUtilizing ED to anticipate range of surveillance responses\nFew papers exist on the integrated application of ED to giant gas fields using reservoir simulation. Firstly, this case study will highlight some pitfalls to avoid during the workflow. Secondly, the authors will discuss the important issue of how to integrate or separate static, dynamic, well and facility based uncertainties. Thirdly, the work will show the unique application of ED in VoI and field development scoping.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6207, 0.1418],
#         [0.6207, 1.0000, 0.0860],
#         [0.1418, 0.0860, 1.0000]])
```
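
For retrieval-style use, embed a document collection once and score queries against it. A small sketch follows; the corpus strings are illustrative placeholders, and passing `trust_remote_code=True` is an assumption carried over from the custom gte backbone:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Sampath1987/EnergyEmbed-nv1", trust_remote_code=True)

# Hypothetical mini-corpus; replace with your own abstracts or reports.
corpus = [
    "Cement slurry design parameters that prevent gas migration after abandonment.",
    "Hole cleaning optimization for extended reach drilling in the Niger Delta.",
    "Water cut measurement techniques for permanent downhole deployment.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "How can gas flow through set cement be prevented?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search; returns the top matches with scores.
for hit in util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]:
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])
```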

## Evaluation

### Metrics

#### Triplet

* Dataset: `ai-job-validation`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value    |
|:--------------------|:---------|
| **cosine_accuracy** | **0.98** |

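Cosine accuracy here is the fraction of evaluation triplets for which the anchor embeds closer to its positive than to its negative. A sketch of reproducing it; the split name and loading via the `datasets` library are assumptions, so check the dataset card:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("Sampath1987/EnergyEmbed-nv1", trust_remote_code=True)

# Assumed split name; adjust to however offshore_energy_v1 is partitioned.
eval_split = load_dataset("Sampath1987/offshore_energy_v1", split="validation")

evaluator = TripletEvaluator(
    anchors=eval_split["anchor"],
    positives=eval_split["positive"],
    negatives=eval_split["negative"],
    name="ai-job-validation",
)
print(evaluator(model))  # reports cosine accuracy over the triplets
```
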
## Training Details

### Training Dataset

#### offshore_energy_v1

* Dataset: [offshore_energy_v1](https://huggingface.co/datasets/Sampath1987/offshore_energy_v1) at [d4682d4](https://huggingface.co/datasets/Sampath1987/offshore_energy_v1/tree/d4682d4c446c51dfc8da8976e83e9499ef082de5)
* Size: 44,838 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 13 tokens</li><li>mean: 24.54 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>min: 33 tokens</li><li>mean: 430.25 tokens</li><li>max: 1027 tokens</li></ul> | <ul><li>min: 45 tokens</li><li>mean: 423.92 tokens</li><li>max: 1204 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>What benefits were realized through the adoption of remote operations services in the North Sea?</code> | <code>The North Sea has always been a pioneer for the adoption of remote operations services (ROS) in offshore drilling applications. Drilling services such as Measurement While Drilling (MWD), Logging While Drilling (LWD) and/or mud logging (ML) have been performed with an element of ROS for over the last two decades. Early adoption of these remote services delivered initial benefits to operators such as reducing HSE risks related to the travel and accommodation of field service employees at offshore rig sites. Meanwhile service companies were able to explore the added efficiencies gained by having multi-skilled employees providing a higher level of support to customers while also gaining additional agility to manage their personnel through tighter market cycles. The mutual benefit of this early adoption created a solid foundation for ROS to expand the scope of influence in drilling operations to include Directional Drilling (DD).<br>Despite the maturity of ROS within a select community of ope...</code> | <code>A new program for the development of graduate engineers has been implemented in Denmark on a stimulation vessel in the North Sea. It is designed to provide graduate engineers with a three-year period of extensive experience in offshore operations, knowledge of equipment and designing effective stimulation jobs. There are many components to the program that address training, skills, demonstration of capabilities and evidence of competence. These are essential components that ultimately lead to improved operational performance and highlights.<br>The North Sea oil and gas industry requires a constant effort to maintain the engineering skills of its offshore workers so vital to continued success. Paradoxically, there are numerous factors that hinder on site development of young engineering talent in the North Sea. There is a lack of offshore accommodation that often restricts onsite time for trainees. This is exacerbated by a low frequency of many operations compared to other provinces in the...</code> |
  | <code>What is the estimated storage capacity for CO2 in the analyzed study area?</code> | <code>The oil and gas industry is a significant contributor to carbon dioxide (CO2) emissions, which have a major impact on climate change. Geoscientists in the industry play a crucial role in mitigating climate change by identifying and evaluating potential CO2 storage sites, monitoring CO2 behavior after injection, and exploring CO2 enhanced oil recovery (EOR) techniques. CO2 -EOR involves injecting CO2 into depleted oil reservoirs to increase oil production. Reservoir characterization using well log and seismic data analysis helps determine storage capacity, containment, and injectivity of reservoirs for CO2 sequestration and EOR. In this study, two sand reservoirs (RES 1 and RES 2) were analyzed, with RES 2 being considered more suitable for CO2 sequestration and CO2 -EOR. The estimated storage capacity of the study area was approximately 40 million metric tons (MT). Assessments of fault sealing capacity and reservoir properties were conducted to validate storage potential. Further inves...</code> | <code>Transported and geologically stored CO2 contains several impurities that depend on its source and associated capture technology. Impurities in anthropogenic CO2 can have damaging impacts on the different elements of a CCS system, which must be considered when developing a CO2 specification (Table 1). Thus, characterising all the impurities and determining the required purity of the CO2 mixture is critically important for the safe design and operation of CCS transport and storage systems.<br>It is important to note that CO2 specifications relate to normal operations. Short-term excursions outside of the recommended maximum concentrations for each impurity may be permissible provided they do not lead to health and safety risks and / or risks to the mechanical integrity of the asset.</code> |
  | <code>What is the role of a Preventive Maintenance Program (PMP) in enhancing the reliability of Electrical Submersible Pumps (ESPs)?</code> | <code>The reliability of Electrical Submersible Pumps (ESPs) is a critical target for companies managing artificially lifted fields. While efforts to continuously improve the reliability in the downhole system are crucial, it is necessary to focus on the health and long-term reliability of the ESP surface equipment. One effective approach toward achieving this goal is through conducting a comprehensive Preventive Maintenance Program (PMP) for the different components of the ESP surface system.<br>An ESP PMP should be managed without jeopardizing production strategy. The design of the PMP must meet the production demand while maintaining the best-in-class PMP practices. The well operating condition, frequency, weather, well location, required periodic inspection and preemptive servicing and replacement of surface equipment components must be considered, based on studied criterion. The design of the PMP considers equipment upgrades and thermal imaging surveillance to guarantee healthy electrical ...</code> | <code>A family of exciting new Electric Submersible Pump (ESP) technologies promises to radically improve the development economics of many oilfields and field extensions. This technology is particularly relevant to prospects in the range 5-100 million barrels reserves, which are located greater than 15 kilometres from existing platforms and often suffer uncertainties on reservoir performance (pressure, sweep, heterogeneities inflow performance etc.). Prospects in that category generally offer mediocre to inadequate economics or unacceptable risks of ‘downside’ potential. Platform development entails untenable capex exposure, whereas conventional subsea development (e.g. by gas lift) will result in very inferior production performance.<br>The new technologies which ‘unlock’ the economics of such fields are:<br>Viable subsea ESP technology is available now and will be field proven during 1994/95.<br>Proven high reliability pump systems are now available, underwritten by performance contract.<br>Bottom di...</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```

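MultipleNegativesRankingLoss scores each anchor against its own positive and, as negatives, every other passage in the batch plus the explicit negative column, applying the scale of 20.0 before a softmax cross-entropy. A minimal training sketch (hyperparameters and split name are illustrative, not the exact settings behind this checkpoint):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

# Assumed split name; the card lists 44,838 (anchor, positive, negative) rows.
train_split = load_dataset("Sampath1987/offshore_energy_v1", split="train")

# In-batch negatives ranking loss with the scale shown above.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_split, loss=loss)
trainer.train()
```
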
### Evaluation Dataset

#### offshore_energy_v1

* Dataset: [offshore_energy_v1](https://huggingface.co/datasets/Sampath1987/offshore_energy_v1) at [d4682d4](https://huggingface.co/datasets/Sampath1987/offshore_energy_v1/tree/d4682d4c446c51dfc8da8976e83e9499ef082de5)
* Size: 5,604 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 14 tokens</li><li>mean: 24.45 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 47 tokens</li><li>mean: 440.51 tokens</li><li>max: 1091 tokens</li></ul> | <ul><li>min: 56 tokens</li><li>mean: 426.21 tokens</li><li>max: 1152 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>What is the role of nanocrystalline cellulose (NCC) in the formulation of hydraulic fracturing fluids?</code> | <code>Guar gum and its derivative based-gels cross-linked with boron have been used in hydraulic fracturing for decades. In order to achieve gel strength requirements, conventional fracturing requires the use of a large amount of thickener and cross-linking agent, which results in more residue and difficulty in the recovery of permeability. At the same time, the gel can be used to achieve the best thermal stability in a high pH environment. Therefore, we proposed a highly efficient organoboron nanocellulose cross-linker for low polymer loading fracturing fluids.<br>Nanocrystalline cellulose (NCC) resulted from sulfuric acid hydrolysis of cellulose microciystalline. Boron-modified nanoparticles were synthesized by one-pot reaction as nano boron cross-linker (NBC). Nanocrystalline cellulose (NCC), (3-Aminopropyl) triethoxysilane, Organic boron (OBC) was mixed at a ratio of 1:4:4 and stirred at a constant temperature of 85°C for 5 hours. The presence of surface modification was shown with FTIR spe...</code> | <code>The unstable wellbore created by the infiltration of drilling fluids into the reservoir formation is a great challenge in drilling operations. Reducing the fluid infiltration using nanoparticles (NPs) brings about a significant improvement in drilling operation. Herein, a mixture of iron oxide nanoparticle (IONP) and polyanionic cellulose nanoparticle (nano-PAC) additives were added to water-based mud (WBM) to determine their impact on rheological and filtration properties measured at 80 °F, 100 °F, and 250 °F. Polyanionic cellulose (PAC-R) was processed into nano-PAC by wet ball-milling process. The rheological behaviour, low-pressure low-temperature (LPLT), and high-pressure high-temperature (HPHT) filtration properties performance of IONP, nano-PAC, and IONP and nano-PAC mixtures were compared in the WBM. The results showed that IONP, nano-PAC, and synergy effect of IONP and nano-PAC in WBM at temperatures of 80 °F and 250 °F improved the density, 10-s and 10-min gel strength (10-s ...</code> |
  | <code>What is the definition of tail gas in oil and gas engineering processes?</code> | <code>#### T <br>**Tail gas** <br>Effluent gas at the end of a process. <br>**Technical Potential** <br>The amount by which it is possible to reduce greenhouse gas emissions by implementing a<br>technology or practice that has reached the demonstration phase. <br>**Tectonically active area** <br>Area of the Earth where deformation is presently causing structural changes. <br>**Thermocline** <br>The ocean phenomenon characterized by a sharp change in temperature with depth. <br>**Thermohaline** <br>The vertical overturning of water masses due to seasonal heating, evaporation, and cooling. <br>**Third party** <br>Entity that is independent of the parties involved with the issues in question Top-down model.<br>A model based on applying macro-economic theory and econometric techniques to historical<br>data about consumption, prices, etc. <br>**Tracer** <br>A chemical compound or isotope added in small quantities to trace flow patterns. <br>36</code> | <code>SUSTAINABILITY REPORTING GUIDANCE FOR THE OIL AND GAS INDUSTRY <br>**Particulate matter:** A complex mixture of small particles or droplets such as salts, organic<br>chemicals, metals and soil particles [ENV-5]. <br>**Petrochemicals:** Chemical products derived from oil and gas. <br>**Pipelines:** Construction and use of facilities to transport liquid or gaseous hydrocarbons<br>over long distances in above-ground, below-ground or underwater pipes. <br>**Primary containment:** The vessel, pipe, barrel, equipment or other barrier that is designed<br>to keep a material within it [ENV-6, ENV-7, SHS-6]. <br>**Primary energy:** The energy content of a hydrocarbon fuel or other energy source used to<br>produce power, usually in the form of electricity, heat or steam [CCE-6]. <br>**Process safety:** A systematic approach to ensuring the safe containment of hazardous<br>materials or energy by applying good design, construction and operating principles [SHS-6].<br>In this Guidance, this term is used synonymously with Asset i...</code> |
  | <code>How is dense phase acid gas injected back into the formation to mitigate environmental impacts?</code> | <code>A systematic hazard management approach was used to identify, assess and mitigate hazards at the conceptual design stage of a large onshore sour gas development in Abu Dhabi. The potential environmental impact of sulphur block production and poor prospects of a sulphur market led to a concept involving injection of dense phase acid gas back into the formation. Significant Health, Safety and Environmental (HSE) challenges were addressed relating to the scale of the sour gas development which included the gathering, processing and injection of sour/acid gas containing 33% – 80% H2S. Quantitative Risk Assessment and H2S dispersion calculations were performed to evaluate the risk reduction effectiveness of specific HSE design considerations including material selection, pipeline design, pipeline routing, well design and the location of the processing facility and sour/acid gas wells. These HSE design considerations were integrated into the concept selection. Best industry practices in desi...</code> | <code>Nowadays, as the deep gas reservoirs in Daqing are explored, the complex volcanic reservoirs have been the major reservoirs in deep natural gas exploration and production. The reserves of volcanic gas reservoirs take up 88% of the total gas reserves. However, the deep complex gas reservoirs may cause heavy pollution during the drilling completion, and some of the barriers between target zones of the wells are very thin, leading to a poor stability. Additionally, because of the complex water/gas relations in the formation, such as appearance of bottom water and water and gas sharing the same formation in some wells, the fracturing operations will induce water channeling. All these facts may cause the failure of the fracturing operations.<br>Especially, when the fractured formation is close to the water/gas interface, the fractures will easily extend into the water layer. The existence of water in the gas wells directly leads to the reduction of production and recovery rate of the gas reser...</code> |
611
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
612
+ ```json
613
+ {
614
+ "scale": 20.0,
615
+ "similarity_fct": "cos_sim",
616
+ "gather_across_devices": false
617
+ }
618
+ ```
619
+
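+ For reference, a minimal sketch of how this loss is typically constructed in Sentence Transformers; the `scale` and `similarity_fct` values mirror the parameters above, and the base model id comes from this card's metadata (a sketch, not the exact training script):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+ from sentence_transformers.util import cos_sim
+
+ model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
+
+ # In-batch negatives: each anchor treats every other positive in the batch as a negative.
+ loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)
+ ```
+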
620
+ ### Training Hyperparameters
621
+ #### Non-Default Hyperparameters
622
+
623
+ - `eval_strategy`: steps
624
+ - `per_device_train_batch_size`: 16
625
+ - `per_device_eval_batch_size`: 16
626
+ - `learning_rate`: 2e-05
627
+ - `num_train_epochs`: 1
628
+ - `warmup_ratio`: 0.1
629
+
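+ These values map directly onto `SentenceTransformerTrainingArguments`; a minimal sketch (the `output_dir` is a placeholder, not taken from the actual run):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="output",  # placeholder
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     learning_rate=2e-5,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+ )
+ ```
+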
630
+ #### All Hyperparameters
631
+ <details><summary>Click to expand</summary>
632
+
633
+ - `overwrite_output_dir`: False
634
+ - `do_predict`: False
635
+ - `eval_strategy`: steps
636
+ - `prediction_loss_only`: True
637
+ - `per_device_train_batch_size`: 16
638
+ - `per_device_eval_batch_size`: 16
639
+ - `per_gpu_train_batch_size`: None
640
+ - `per_gpu_eval_batch_size`: None
641
+ - `gradient_accumulation_steps`: 1
642
+ - `eval_accumulation_steps`: None
643
+ - `torch_empty_cache_steps`: None
644
+ - `learning_rate`: 2e-05
645
+ - `weight_decay`: 0.0
646
+ - `adam_beta1`: 0.9
647
+ - `adam_beta2`: 0.999
648
+ - `adam_epsilon`: 1e-08
649
+ - `max_grad_norm`: 1.0
650
+ - `num_train_epochs`: 1
651
+ - `max_steps`: -1
652
+ - `lr_scheduler_type`: linear
653
+ - `lr_scheduler_kwargs`: {}
654
+ - `warmup_ratio`: 0.1
655
+ - `warmup_steps`: 0
656
+ - `log_level`: passive
657
+ - `log_level_replica`: warning
658
+ - `log_on_each_node`: True
659
+ - `logging_nan_inf_filter`: True
660
+ - `save_safetensors`: True
661
+ - `save_on_each_node`: False
662
+ - `save_only_model`: False
663
+ - `restore_callback_states_from_checkpoint`: False
664
+ - `no_cuda`: False
665
+ - `use_cpu`: False
666
+ - `use_mps_device`: False
667
+ - `seed`: 42
668
+ - `data_seed`: None
669
+ - `jit_mode_eval`: False
670
+ - `use_ipex`: False
671
+ - `bf16`: False
672
+ - `fp16`: False
673
+ - `fp16_opt_level`: O1
674
+ - `half_precision_backend`: auto
675
+ - `bf16_full_eval`: False
676
+ - `fp16_full_eval`: False
677
+ - `tf32`: None
678
+ - `local_rank`: 0
679
+ - `ddp_backend`: None
680
+ - `tpu_num_cores`: None
681
+ - `tpu_metrics_debug`: False
682
+ - `debug`: []
683
+ - `dataloader_drop_last`: False
684
+ - `dataloader_num_workers`: 0
685
+ - `dataloader_prefetch_factor`: None
686
+ - `past_index`: -1
687
+ - `disable_tqdm`: False
688
+ - `remove_unused_columns`: True
689
+ - `label_names`: None
690
+ - `load_best_model_at_end`: False
691
+ - `ignore_data_skip`: False
692
+ - `fsdp`: []
693
+ - `fsdp_min_num_params`: 0
694
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
695
+ - `fsdp_transformer_layer_cls_to_wrap`: None
696
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
697
+ - `deepspeed`: None
698
+ - `label_smoothing_factor`: 0.0
699
+ - `optim`: adamw_torch
700
+ - `optim_args`: None
701
+ - `adafactor`: False
702
+ - `group_by_length`: False
703
+ - `length_column_name`: length
704
+ - `ddp_find_unused_parameters`: None
705
+ - `ddp_bucket_cap_mb`: None
706
+ - `ddp_broadcast_buffers`: False
707
+ - `dataloader_pin_memory`: True
708
+ - `dataloader_persistent_workers`: False
709
+ - `skip_memory_metrics`: True
710
+ - `use_legacy_prediction_loop`: False
711
+ - `push_to_hub`: False
712
+ - `resume_from_checkpoint`: None
713
+ - `hub_model_id`: None
714
+ - `hub_strategy`: every_save
715
+ - `hub_private_repo`: None
716
+ - `hub_always_push`: False
717
+ - `hub_revision`: None
718
+ - `gradient_checkpointing`: False
719
+ - `gradient_checkpointing_kwargs`: None
720
+ - `include_inputs_for_metrics`: False
721
+ - `include_for_metrics`: []
722
+ - `eval_do_concat_batches`: True
723
+ - `fp16_backend`: auto
724
+ - `push_to_hub_model_id`: None
725
+ - `push_to_hub_organization`: None
726
+ - `mp_parameters`:
727
+ - `auto_find_batch_size`: False
728
+ - `full_determinism`: False
729
+ - `torchdynamo`: None
730
+ - `ray_scope`: last
731
+ - `ddp_timeout`: 1800
732
+ - `torch_compile`: False
733
+ - `torch_compile_backend`: None
734
+ - `torch_compile_mode`: None
735
+ - `include_tokens_per_second`: False
736
+ - `include_num_input_tokens_seen`: False
737
+ - `neftune_noise_alpha`: None
738
+ - `optim_target_modules`: None
739
+ - `batch_eval_metrics`: False
740
+ - `eval_on_start`: False
741
+ - `use_liger_kernel`: False
742
+ - `liger_kernel_config`: None
743
+ - `eval_use_gather_object`: False
744
+ - `average_tokens_across_devices`: False
745
+ - `prompts`: None
746
+ - `batch_sampler`: batch_sampler
747
+ - `multi_dataset_batch_sampler`: proportional
748
+ - `router_mapping`: {}
749
+ - `learning_rate_mapping`: {}
750
+
751
+ </details>
752
+
753
+ ### Training Logs
754
+ | Epoch | Step | Validation Loss | ai-job-validation_cosine_accuracy |
755
+ |:------:|:----:|:---------------:|:---------------------------------:|
756
+ | 0.3568 | 1000 | 0.0982 | 0.9764 |
757
+ | 0.7135 | 2000 | 0.0870 | 0.9800 |
758
+
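+ The `ai-job-validation_cosine_accuracy` column follows the metric naming of a triplet-style evaluator; a hedged sketch of how such a score is typically produced (the model path and the texts are placeholders, not the actual validation split):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ model = SentenceTransformer("path/to/this/model", trust_remote_code=True)  # placeholder path
+
+ evaluator = TripletEvaluator(
+     anchors=["How does cement flow rate affect displacement efficiency?"],  # placeholder
+     positives=["Cement volume and pump rate control mud removal in the annulus."],  # placeholder
+     negatives=["The thermocline is a sharp change in temperature with depth."],  # placeholder
+     name="ai-job-validation",
+ )
+ print(evaluator(model))  # e.g. {"ai-job-validation_cosine_accuracy": 0.98}
+ ```
+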
759
+
760
+ ### Framework Versions
761
+ - Python: 3.10.12
762
+ - Sentence Transformers: 5.1.0
763
+ - Transformers: 4.53.3
764
+ - PyTorch: 2.8.0+cu128
765
+ - Accelerate: 1.9.0
766
+ - Datasets: 4.0.0
767
+ - Tokenizers: 0.21.2
768
+
769
+ ## Citation
770
+
771
+ ### BibTeX
772
+
773
+ #### Sentence Transformers
774
+ ```bibtex
775
+ @inproceedings{reimers-2019-sentence-bert,
776
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
777
+ author = "Reimers, Nils and Gurevych, Iryna",
778
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
779
+ month = "11",
780
+ year = "2019",
781
+ publisher = "Association for Computational Linguistics",
782
+ url = "https://arxiv.org/abs/1908.10084",
783
+ }
784
+ ```
785
+
786
+ #### MultipleNegativesRankingLoss
787
+ ```bibtex
788
+ @misc{henderson2017efficient,
789
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
790
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
791
+ year={2017},
792
+ eprint={1705.00652},
793
+ archivePrefix={arXiv},
794
+ primaryClass={cs.CL}
795
+ }
796
+ ```
797
+
798
+ <!--
799
+ ## Glossary
800
+
801
+ *Clearly define terms in order to be accessible across audiences.*
802
+ -->
803
+
804
+ <!--
805
+ ## Model Card Authors
806
+
807
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
808
+ -->
809
+
810
+ <!--
811
+ ## Model Card Contact
812
+
813
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
814
+ -->
config.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "architectures": [
3
+ "NewModel"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.0,
6
+ "auto_map": {
7
+ "AutoConfig": "configuration.NewConfig",
8
+ "AutoModel": "modeling.NewModel",
9
+ "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
10
+ "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
11
+ "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
12
+ "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
13
+ "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
14
+ },
15
+ "classifier_dropout": 0.0,
16
+ "hidden_act": "gelu",
17
+ "hidden_dropout_prob": 0.1,
18
+ "hidden_size": 768,
19
+ "id2label": {
20
+ "0": "LABEL_0"
21
+ },
22
+ "initializer_range": 0.02,
23
+ "intermediate_size": 3072,
24
+ "label2id": {
25
+ "LABEL_0": 0
26
+ },
27
+ "layer_norm_eps": 1e-12,
28
+ "layer_norm_type": "layer_norm",
29
+ "logn_attention_clip1": false,
30
+ "logn_attention_scale": false,
31
+ "max_position_embeddings": 8192,
32
+ "model_type": "new",
33
+ "num_attention_heads": 12,
34
+ "num_hidden_layers": 12,
35
+ "pack_qkv": true,
36
+ "pad_token_id": 1,
37
+ "position_embedding_type": "rope",
38
+ "rope_scaling": {
39
+ "factor": 8.0,
40
+ "type": "ntk"
41
+ },
42
+ "rope_theta": 20000,
43
+ "torch_dtype": "float32",
44
+ "transformers_version": "4.53.3",
45
+ "type_vocab_size": 1,
46
+ "unpad_inputs": false,
47
+ "use_memory_efficient_attention": false,
48
+ "vocab_size": 250048
49
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SentenceTransformer",
3
+ "__version__": {
4
+ "sentence_transformers": "5.1.0",
5
+ "transformers": "4.53.3",
6
+ "pytorch": "2.8.0+cu128"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "cosine"
14
+ }
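
Both prompts above are empty strings, so queries and documents are embedded without any instruction prefix, and `similarity_fn_name` makes `SentenceTransformer.similarity` use cosine similarity. A minimal sketch of how these settings surface at inference time (the model path is a placeholder):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("path/to/this/model", trust_remote_code=True)  # placeholder path

# With both prompts empty, encoding with and without prompt_name="query" is equivalent.
plain = model.encode(["cement slurry design"])
prompted = model.encode(["cement slurry design"], prompt_name="query")

# similarity_fn_name: "cosine" -> model.similarity returns cosine similarities.
print(model.similarity(plain, prompted))
```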
configuration.py ADDED
@@ -0,0 +1,145 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """ NEW model configuration"""
17
+ from transformers.configuration_utils import PretrainedConfig
18
+ from transformers.utils import logging
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+
23
+ class NewConfig(PretrainedConfig):
24
+ r"""
25
+ This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
26
+ instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
27
+ configuration with the defaults will yield a similar configuration to that of the NEW
28
+ [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
29
+
30
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
31
+ documentation from [`PretrainedConfig`] for more information.
32
+
33
+
34
+ Args:
35
+         vocab_size (`int`, *optional*, defaults to 30528):
36
+ Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
37
+ `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
38
+ hidden_size (`int`, *optional*, defaults to 768):
39
+ Dimensionality of the encoder layers and the pooler layer.
40
+ num_hidden_layers (`int`, *optional*, defaults to 12):
41
+ Number of hidden layers in the Transformer encoder.
42
+ num_attention_heads (`int`, *optional*, defaults to 12):
43
+ Number of attention heads for each attention layer in the Transformer encoder.
44
+ intermediate_size (`int`, *optional*, defaults to 3072):
45
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
46
+ hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
47
+ The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
48
+ `"relu"`, `"silu"` and `"gelu_new"` are supported.
49
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
50
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
51
+         attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
52
+ The dropout ratio for the attention probabilities.
53
+         max_position_embeddings (`int`, *optional*, defaults to 2048):
54
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
55
+ just in case (e.g., 512 or 1024 or 2048).
56
+         type_vocab_size (`int`, *optional*, defaults to 1):
57
+ The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
58
+ initializer_range (`float`, *optional*, defaults to 0.02):
59
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
60
+ layer_norm_eps (`float`, *optional*, defaults to 1e-12):
61
+ The epsilon used by the layer normalization layers.
62
+ position_embedding_type (`str`, *optional*, defaults to `"rope"`):
63
+ Type of position embedding. Choose one of `"absolute"`, `"rope"`.
64
+ rope_theta (`float`, *optional*, defaults to 10000.0):
65
+ The base period of the RoPE embeddings.
66
+ rope_scaling (`Dict`, *optional*):
67
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
68
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
69
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
70
+ `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
71
+ these scaling strategies behave:
72
+ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
73
+ experimental feature, subject to breaking API changes in future versions.
74
+ classifier_dropout (`float`, *optional*):
75
+ The dropout ratio for the classification head.
76
+
77
+ Examples:
78
+
79
+ ```python
80
+ >>> from transformers import NewConfig, NewModel
81
+
82
+ >>> # Initializing a NEW izhx/new-base-en style configuration
83
+ >>> configuration = NewConfig()
84
+
85
+ >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
86
+ >>> model = NewModel(configuration)
87
+
88
+ >>> # Accessing the model configuration
89
+ >>> configuration = model.config
90
+ ```"""
91
+
92
+ model_type = "new"
93
+
94
+ def __init__(
95
+ self,
96
+ vocab_size=30528,
97
+ hidden_size=768,
98
+ num_hidden_layers=12,
99
+ num_attention_heads=12,
100
+ intermediate_size=3072,
101
+ hidden_act="gelu",
102
+ hidden_dropout_prob=0.1,
103
+ attention_probs_dropout_prob=0.0,
104
+ max_position_embeddings=2048,
105
+ type_vocab_size=1,
106
+ initializer_range=0.02,
107
+ layer_norm_type='layer_norm',
108
+ layer_norm_eps=1e-12,
109
+ # pad_token_id=0,
110
+ position_embedding_type="rope",
111
+ rope_theta=10000.0,
112
+ rope_scaling=None,
113
+ classifier_dropout=None,
114
+ pack_qkv=True,
115
+ unpad_inputs=False,
116
+ use_memory_efficient_attention=False,
117
+ logn_attention_scale=False,
118
+ logn_attention_clip1=False,
119
+ **kwargs,
120
+ ):
121
+ super().__init__(**kwargs)
122
+
123
+ self.vocab_size = vocab_size
124
+ self.hidden_size = hidden_size
125
+ self.num_hidden_layers = num_hidden_layers
126
+ self.num_attention_heads = num_attention_heads
127
+ self.hidden_act = hidden_act
128
+ self.intermediate_size = intermediate_size
129
+ self.hidden_dropout_prob = hidden_dropout_prob
130
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
131
+ self.max_position_embeddings = max_position_embeddings
132
+ self.type_vocab_size = type_vocab_size
133
+ self.initializer_range = initializer_range
134
+ self.layer_norm_type = layer_norm_type
135
+ self.layer_norm_eps = layer_norm_eps
136
+ self.position_embedding_type = position_embedding_type
137
+ self.rope_theta = rope_theta
138
+ self.rope_scaling = rope_scaling
139
+ self.classifier_dropout = classifier_dropout
140
+
141
+ self.pack_qkv = pack_qkv
142
+ self.unpad_inputs = unpad_inputs
143
+ self.use_memory_efficient_attention = use_memory_efficient_attention
144
+ self.logn_attention_scale = logn_attention_scale
145
+ self.logn_attention_clip1 = logn_attention_clip1
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ef910e1bc051c7278474399f603dc788cfdcf73b75c5e41128bd990822828f7f
3
+ size 1221487872
modeling.py ADDED
@@ -0,0 +1,1418 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """PyTorch NEW model."""
17
+
18
+ import math
19
+ from dataclasses import dataclass
20
+ from typing import List, Optional, Tuple, Union
21
+
22
+ import torch
23
+ import torch.utils.checkpoint
24
+ from torch import nn
25
+
26
+ from transformers.activations import ACT2FN
27
+ from transformers.modeling_outputs import (
28
+ BaseModelOutput,
29
+ BaseModelOutputWithPooling,
30
+ MaskedLMOutput,
31
+ MultipleChoiceModelOutput,
32
+ QuestionAnsweringModelOutput,
33
+ SequenceClassifierOutput,
34
+ ModelOutput,
35
+ )
36
+ from transformers.modeling_utils import PreTrainedModel
37
+ from transformers.utils import logging
38
+
39
+ try:
40
+ import xformers.ops as xops
41
+ except ImportError:
42
+ xops = None
43
+
44
+ from .configuration import NewConfig
45
+
46
+
47
+ logger = logging.get_logger(__name__)
48
+
49
+
50
+ # Adapted from https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/bert_padding.py
51
+ # Which was adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
52
+ class IndexFirstAxis(torch.autograd.Function):
53
+ @staticmethod
54
+ def forward(ctx, input, indices):
55
+ ctx.save_for_backward(indices)
56
+ assert input.ndim >= 2
57
+ ctx.first_axis_dim, other_shape = input.shape[0], input.shape[1:]
58
+ second_dim = other_shape.numel()
59
+ # TD [2022-03-04] For some reason torch.gather is a bit faster than indexing.
60
+ # return input[indices]
61
+ # return torch.gather(
62
+ # rearrange(input, "b ... -> b (...)"), 0, repeat(indices, "z -> z d", d=second_dim)
63
+ # ).reshape(-1, *other_shape)
64
+ return torch.gather(
65
+ input.view(ctx.first_axis_dim, second_dim),
66
+ 0,
67
+ indices.unsqueeze(-1).expand(indices.size(0), second_dim)
68
+ ).reshape(-1, *other_shape)
69
+
70
+ @staticmethod
71
+ def backward(ctx, grad_output):
72
+ (indices,) = ctx.saved_tensors
73
+ assert grad_output.ndim >= 2
74
+ other_shape = grad_output.shape[1:]
75
+ # grad_output = rearrange(grad_output, "b ... -> b (...)")
76
+ grad_output = grad_output.view(grad_output.size(0), other_shape.numel())
77
+ grad_input = torch.zeros(
78
+ [ctx.first_axis_dim, grad_output.shape[1]],
79
+ device=grad_output.device,
80
+ dtype=grad_output.dtype,
81
+ )
82
+ # TD [2022-03-04] For some reason torch.scatter is a bit faster than indexing.
83
+ # grad_input[indices] = grad_output
84
+ # grad_input.scatter_(0, repeat(indices, "z -> z d", d=grad_output.shape[1]), grad_output)
85
+ grad_input.scatter_(
86
+ 0, indices.unsqueeze(-1).expand(indices.size(0), grad_output.size(1)), grad_output
87
+ )
88
+ return grad_input.reshape(ctx.first_axis_dim, *other_shape), None
89
+
90
+
91
+ index_first_axis = IndexFirstAxis.apply
92
+
93
+
94
+ def unpad_input(hidden_states, attention_mask=None, indices=None):
95
+ """
96
+ Arguments:
97
+ hidden_states: (batch, seqlen, ...)
98
+ attention_mask: (batch, seqlen), bool / int, 1 means valid and 0 means not valid.
99
+ indices: (total_nnz), the indices of non-masked tokens from the flattened input sequence.
100
+ Return:
101
+         hidden_states: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
102
+ """
103
+ if indices is None:
104
+ assert attention_mask is not None
105
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
106
+
107
+ # TD [2022-03-04] We don't want to index with a bool mask, because Pytorch will expand the
108
+     # bool mask, then call nonzero to get the indices, then index with those. The indices are @dim
109
+ # times larger than it needs to be, wasting memory. It's faster and more memory-efficient to
110
+ # index with integer indices. Moreover, torch's index is a bit slower than it needs to be,
111
+ # so we write custom forward and backward to make it a bit faster.
112
+ hidden_states = hidden_states.view(-1, *hidden_states.shape[2:])
113
+ return index_first_axis(hidden_states, indices)
114
+
115
+
116
+ class IndexPutFirstAxis(torch.autograd.Function):
117
+ @staticmethod
118
+ def forward(
119
+ ctx,
120
+ values: torch.Tensor,
121
+ indices: torch.Tensor,
122
+ first_axis_dim
123
+ ) -> torch.Tensor:
124
+ ctx.save_for_backward(indices)
125
+ assert indices.ndim == 1
126
+ assert values.ndim >= 2
127
+ output = torch.zeros(
128
+ first_axis_dim, *values.shape[1:], device=values.device, dtype=values.dtype
129
+ )
130
+ output[indices] = values
131
+ return output
132
+
133
+ @staticmethod
134
+ def backward(ctx, grad_output: torch.Tensor) -> Tuple[torch.Tensor, None, None]:
135
+ indices, = ctx.saved_tensors
136
+ grad_values = grad_output[indices]
137
+ return grad_values, None, None
138
+
139
+
140
+ index_put_first_axis = IndexPutFirstAxis.apply
141
+
142
+
143
+ def pad_input(inputs: torch.Tensor, indices: torch.Tensor, batch: int, seqlen: int) -> torch.Tensor:
144
+ """Add padding to sequences.
145
+
146
+ Arguments:
147
+         inputs: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
148
+ indices: (total_nnz), `indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()`
149
+ batch: int batch_size
150
+ seqlen: int max sequence length
151
+
152
+ Returns:
153
+ inputs: (batch, seqlen, ...)
154
+ """
155
+ output = index_put_first_axis(inputs, indices, batch * seqlen)
156
+ return output.view(batch, seqlen, *inputs.shape[1:])
157
+
158
+
159
+ def rotate_half(x):
160
+ """Rotates half the hidden dims of the input."""
161
+ x1 = x[..., : x.shape[-1] // 2]
162
+ x2 = x[..., x.shape[-1] // 2 :]
163
+ return torch.cat((-x2, x1), dim=-1)
164
+
165
+
166
+ def apply_rotary_pos_emb(q, k, cos, sin):
167
+ """Applies Rotary Position Embedding to the query and key tensors.
168
+
169
+ Args:
170
+ q (`torch.Tensor`): The query tensor.
171
+ k (`torch.Tensor`): The key tensor.
172
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
173
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
174
+ Returns:
175
+         `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
176
+ """
177
+ cos, sin = cos.to(q.dtype), sin.to(q.dtype)
178
+ q_embed = (q * cos) + (rotate_half(q) * sin)
179
+ k_embed = (k * cos) + (rotate_half(k) * sin)
180
+ return q_embed, k_embed
181
+
182
+
183
+ class RotaryEmbedding(torch.nn.Module):
184
+ def __init__(self, dim, max_position_embeddings=512, base=10000.0, device=None):
185
+ super().__init__()
186
+
187
+ self.dim = dim
188
+ self.max_position_embeddings = max_position_embeddings
189
+ self.base = base
190
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
191
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
192
+
193
+ # Build here to make `torch.jit.trace` work.
194
+ self._set_cos_sin_cache(
195
+ seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
196
+ )
197
+
198
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
199
+ self.max_seq_len_cached = seq_len
200
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
201
+
202
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
203
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
204
+ emb = torch.cat((freqs, freqs), dim=-1)
205
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
206
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
207
+
208
+ def forward(self, x, seq_len=None):
209
+ # x: [bs, num_attention_heads, seq_len, head_size]
210
+ if seq_len > self.max_seq_len_cached:
211
+ self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
212
+
213
+ return (
214
+ self.cos_cached[:seq_len, ...].to(dtype=x.dtype),
215
+ self.sin_cached[:seq_len, ...].to(dtype=x.dtype),
216
+ )
217
+
218
+
219
+ class NTKScalingRotaryEmbedding(RotaryEmbedding):
220
+ """RotaryEmbedding extended with fixed and mixed NTK scaling. https://kexue.fm/archives/9706 """
221
+
222
+ def __init__(self, dim, max_position_embeddings=512, base=10000, device=None, scaling_factor=1.0, mixed_b=None):
223
+ self.scaling_factor = scaling_factor
224
+ self.mixed_b = mixed_b
225
+ super().__init__(dim, max_position_embeddings, base, device)
226
+ max_position_embeddings = max_position_embeddings * self.scaling_factor
227
+ self._set_cos_sin_cache(max_position_embeddings, self.inv_freq.device, torch.get_default_dtype())
228
+
229
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
230
+ self.max_seq_len_cached = seq_len
231
+
232
+ if seq_len > self.max_position_embeddings:
233
+ base = self.base * (self.scaling_factor if self.mixed_b is None else 1)
234
+ inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
235
+
236
+ if self.mixed_b is None:
237
+ inv_freq = inv_freq / self.scaling_factor ** (2 / self.dim) # (6)
238
+ else:
239
+ a = torch.tensor(self.scaling_factor).log() / (self.dim / 2) ** self.mixed_b # (13)
240
+ lambda_1_m = (a * torch.arange(1, self.dim // 2 + 1).float().to(device) ** self.mixed_b).exp() # (12)
241
+ inv_freq = inv_freq / lambda_1_m # (10)
242
+
243
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
244
+
245
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
246
+
247
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
248
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
249
+ emb = torch.cat((freqs, freqs), dim=-1)
250
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
251
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
252
+
253
+
254
+ class RMSNorm(nn.Module):
255
+ def __init__(self, hidden_size, eps=1e-6):
256
+ """
257
+ RMSNorm is equivalent to T5LayerNorm
258
+ """
259
+ super().__init__()
260
+ self.weight = nn.Parameter(torch.ones(hidden_size))
261
+ self.variance_epsilon = eps
262
+
263
+ def forward(self, hidden_states):
264
+ input_dtype = hidden_states.dtype
265
+ hidden_states = hidden_states.to(torch.float32)
266
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
267
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
268
+ return self.weight * hidden_states.to(input_dtype)
269
+
270
+
271
+ LAYER_NORM = {
272
+ 'layer_norm': nn.LayerNorm,
273
+ 'rms_norm': RMSNorm
274
+ }
275
+
276
+
277
+ class NewEmbeddings(nn.Module):
278
+ """
279
+ Embedding and Unpadding.
280
+ """
281
+
282
+ def __init__(self, config: NewConfig):
283
+ super().__init__()
284
+ self.padding_idx = config.pad_token_id
285
+ self.word_embeddings = nn.Embedding(
286
+ config.vocab_size, config.hidden_size, padding_idx=self.padding_idx
287
+ )
288
+
289
+ self.position_embedding_type = config.position_embedding_type
290
+ if self.position_embedding_type == 'absolute':
291
+ self.position_embeddings = nn.Embedding(
292
+ config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
293
+ )
294
+ elif self.position_embedding_type == 'rope':
295
+ self._init_rope(config)
296
+ else:
297
+             raise ValueError(f"Unknown position embedding type {self.position_embedding_type}")
298
+
299
+ self.type_vocab_size = config.type_vocab_size
300
+ if self.type_vocab_size > 0:
301
+ self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
302
+
303
+ # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
304
+ # any TensorFlow checkpoint file
305
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
306
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
307
+ # position_ids is contiguous in memory and excluded when serialized
308
+ self.register_buffer(
309
+ "position_ids", torch.arange(config.max_position_embeddings), persistent=False
310
+ )
311
+
312
+ def _init_rope(self, config):
313
+ kwargs = dict(
314
+ dim=int(config.hidden_size / config.num_attention_heads),
315
+ max_position_embeddings=config.max_position_embeddings,
316
+ base=config.rope_theta
317
+ )
318
+ if config.rope_scaling is None:
319
+ self.rotary_emb = RotaryEmbedding(**kwargs)
320
+ else:
321
+ kwargs.update(scaling_factor=config.rope_scaling["factor"])
322
+ scaling_type = config.rope_scaling["type"]
323
+ if scaling_type == 'ntk':
324
+ kwargs.update(mixed_b=config.rope_scaling.get('mixed_b', None))
325
+ self.rotary_emb = NTKScalingRotaryEmbedding(**kwargs)
326
+ # elif scaling_type == "linear":
327
+ # self.rotary_emb = LinearScalingRotaryEmbedding(**kwargs)
328
+ # elif scaling_type == "dynamic":
329
+ # self.rotary_emb = DynamicNTKScalingRotaryEmbedding(**kwargs)
330
+ else:
331
+ raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
332
+
333
+ def forward(
334
+ self,
335
+ unpad_inputs: bool,
336
+ input_ids: Optional[torch.Tensor] = None,
337
+ attention_mask: Optional[torch.Tensor] = None,
338
+ length: Optional[List[int]] = None,
339
+ token_type_ids: Optional[torch.Tensor] = None,
340
+ position_ids: Optional[torch.Tensor] = None,
341
+ inputs_embeds: Optional[torch.Tensor] = None,
342
+ ) -> Tuple[torch.Tensor, torch.Tensor, Optional[Tuple], Optional[List[int]]]:
343
+ """
344
+ """
345
+ if inputs_embeds is None:
346
+ device, input_shape = input_ids.device, input_ids.shape
347
+ else:
348
+ device, input_shape = inputs_embeds.device, inputs_embeds.shape[:2]
349
+ batch_size, seq_length = input_shape
350
+
351
+ # Set attention_mask if it's None
352
+ if attention_mask is None:
353
+ attention_mask = torch.ones(input_shape, device=device)
354
+ if length is not None:
355
+ for i, l in enumerate(length):
356
+ attention_mask[i, l:] = 0
357
+
358
+ # Set attention_mask_bool for unpadding
359
+ if unpad_inputs:
360
+ attention_mask_bool = attention_mask.bool()
361
+ if length is None:
362
+ length = attention_mask.sum(-1).tolist()
363
+
364
+ # Get word embeddings
365
+ if inputs_embeds is None:
366
+ if unpad_inputs:
367
+ input_ids = input_ids[attention_mask_bool].unsqueeze(0)
368
+ inputs_embeds = self.word_embeddings(input_ids)
369
+ else:
370
+ if unpad_inputs:
371
+ inputs_embeds = inputs_embeds[attention_mask_bool].unsqueeze(0)
372
+ embeddings = inputs_embeds
373
+
374
+ # Set and unpad position_ids
375
+ if position_ids is None:
376
+ if seq_length > self.position_ids.size(0):
377
+ self.register_buffer(
378
+ "position_ids", torch.arange(seq_length, device=embeddings.device), persistent=False
379
+ )
380
+ if unpad_inputs:
381
+ # [1, cumsum_seq_len]
382
+ position_ids = torch.cat([self.position_ids[:l] for l in length]).unsqueeze(0)
383
+ else:
384
+ # [bs, seq_len]
385
+ position_ids = self.position_ids[:seq_length].expand(batch_size, -1)
386
+ elif unpad_inputs:
387
+ position_ids = position_ids[attention_mask_bool].unsqueeze(0) # [1, cumsum_seq_len]
388
+
389
+ # Compute rotary embedding
390
+ if self.position_embedding_type == 'rope':
391
+ rope_cos, rope_sin = self.rotary_emb(inputs_embeds, seq_len=seq_length)
392
+ rope_cos = rope_cos[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
393
+ rope_sin = rope_sin[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
394
+ rope_embeds = rope_cos, rope_sin
395
+ else:
396
+ rope_embeds = None
397
+
398
+ if self.type_vocab_size > 0:
399
+ if token_type_ids is None:
400
+ token_type_ids = position_ids.mul(0)
401
+ else:
402
+ if self.type_vocab_size < 2:
403
+ token_type_ids.mul_(0)
404
+ if unpad_inputs:
405
+ token_type_ids = token_type_ids[attention_mask_bool].unsqueeze(0)
406
+
407
+ token_type_embeddings = self.token_type_embeddings(token_type_ids)
408
+ embeddings = embeddings + token_type_embeddings
409
+
410
+ # BERT position
411
+ if self.position_embedding_type == "absolute":
412
+ position_embeddings = self.position_embeddings(position_ids)
413
+ embeddings = embeddings + position_embeddings
414
+
415
+ embeddings = self.LayerNorm(embeddings)
416
+ embeddings = self.dropout(embeddings)
417
+
418
+ return embeddings, attention_mask, rope_embeds, length
419
+
420
+
421
+ class NewAttention(nn.Module):
422
+ def __init__(self, config: NewConfig, pack_qkv=None, use_memory_efficient_attention=None):
423
+ super().__init__()
424
+ self.config = config
425
+ if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
426
+ raise ValueError(
427
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
428
+ f"heads ({config.num_attention_heads})"
429
+ )
430
+
431
+ self.hidden_size = config.hidden_size
432
+ self.num_attention_heads = config.num_attention_heads
433
+ self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
434
+ self.all_head_size = self.num_attention_heads * self.attention_head_size
435
+
436
+ if pack_qkv is None:
437
+ pack_qkv = config.pack_qkv
438
+ self.pack_qkv = pack_qkv
439
+
440
+ if self.pack_qkv:
441
+ self.qkv_proj = nn.Linear(config.hidden_size, self.all_head_size * 3, bias=True)
442
+ else:
443
+ self.q_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
444
+ self.k_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
445
+ self.v_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
446
+
447
+ self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
448
+ self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
449
+
450
+ if use_memory_efficient_attention is None:
451
+ use_memory_efficient_attention = self.config.use_memory_efficient_attention
452
+ self.use_memory_efficient_attention = use_memory_efficient_attention
453
+ self.memory_efficient_attention = None if xops is None else xops.memory_efficient_attention
454
+ if self.use_memory_efficient_attention:
455
+ assert self.memory_efficient_attention is not None, 'please install xformers'
456
+
457
+ def forward(
458
+ self,
459
+ hidden_states: torch.Tensor,
460
+ attention_bias: torch.FloatTensor,
461
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
462
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
463
+ attention_scale: Optional[torch.FloatTensor] = None,
464
+ head_mask: Optional[torch.FloatTensor] = None,
465
+ output_attentions: Optional[bool] = False,
466
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
467
+ ) -> Tuple[torch.Tensor, ...]:
468
+ shape_hd = (self.num_attention_heads, self.attention_head_size)
469
+ # qkv
470
+ if self.pack_qkv and qkv_inputs is None:
471
+ qkv_pack = self.qkv_proj(hidden_states).split(self.all_head_size, dim=-1)
472
+ else:
473
+ if qkv_inputs is None:
474
+ qkv_inputs = (hidden_states, hidden_states, hidden_states)
475
+ qkv_pack = [
476
+ getattr(self, n + '_proj')(s) for s, n in zip(qkv_inputs, 'qkv')
477
+ ]
478
+ query_states, key_states, value_states = [t.view(t.shape[:-1] + shape_hd) for t in qkv_pack]
479
+
480
+ if self.config.position_embedding_type == 'rope':
481
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, *rope_embeds)
482
+
483
+ dtype = query_states.dtype
484
+
485
+ if self.config.logn_attention_scale and attention_scale is not None:
486
+ # https://kexue.fm/archives/8823
487
+ query_states = query_states * attention_scale.to(dtype)
488
+
489
+ if padding_inputs is not None:
490
+ query_states = pad_input(query_states.squeeze(), *padding_inputs)
491
+ key_states = pad_input(key_states.squeeze(), *padding_inputs)
492
+ value_states = pad_input(value_states.squeeze(), *padding_inputs)
493
+
494
+ if self.use_memory_efficient_attention:
495
+ assert self.memory_efficient_attention is not None, "xformers is not loaded"
496
+             assert output_attentions is False, "memory_efficient_attention does not output attentions"
497
+             assert head_mask is None, "Not supported yet"
498
+ attention_probs = None
499
+ if torch.is_tensor(attention_bias):
500
+ attention_bias = attention_bias.to(dtype)
501
+ context_layer = self.memory_efficient_attention(
502
+ query_states,
503
+ key_states,
504
+ value_states,
505
+ attn_bias=attention_bias,
506
+ p=self.dropout.p
507
+ )
508
+ else:
509
+ if output_attentions and isinstance(self, NewSdpaAttention):
510
+ raise RuntimeError("SDPA do not output attentions")
511
+ context_layer, attention_probs = self._attention(
512
+ query_states, key_states, value_states, attention_bias, head_mask
513
+ )
514
+
515
+ if padding_inputs is not None:
516
+ context_layer = unpad_input(context_layer, indices=padding_inputs[0])
517
+
518
+ new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
519
+ context_layer = context_layer.view(new_context_layer_shape)
520
+
521
+ # output proj
522
+ attn_output = self.o_proj(context_layer)
523
+
524
+ # add attentions if we output them
525
+ outputs = (attn_output, attention_probs) if output_attentions else (attn_output,)
526
+ return outputs
527
+
528
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
529
+ """
530
+ Args:
531
+ q/k/v: (B, L, n_head, head_dim),
532
+ Returns:
533
+             attn_output: (B, L, n_head, head_dim)
534
+ """
535
+ query_states = query_states.transpose(1, 2)
536
+ key_states = key_states.transpose(1, 2)
537
+ value_states = value_states.transpose(1, 2)
538
+ # Take the dot product between "query" and "key" to get the raw attention scores.
539
+ attention_scores = torch.matmul(query_states, key_states.transpose(-1, -2))
540
+
541
+ attention_scores = attention_scores / math.sqrt(self.attention_head_size)
542
+ if attention_bias is not None:
543
+             # Apply the attention mask (precomputed for all layers in BertModel forward() function)
544
+ attention_scores = attention_scores + attention_bias
545
+
546
+ # Normalize the attention scores to probabilities.
547
+ attention_probs = nn.functional.softmax(attention_scores, dim=-1)
548
+
549
+ # This is actually dropping out entire tokens to attend to, which might
550
+ # seem a bit unusual, but is taken from the original Transformer paper.
551
+ if self.dropout.p > 0:
552
+ attention_probs = self.dropout(attention_probs)
553
+
554
+ # Mask heads if we want to
555
+ if head_mask is not None:
556
+ attention_probs = attention_probs * head_mask
557
+
558
+ context_layer = torch.matmul(attention_probs, value_states)
559
+
560
+ context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
561
+ return context_layer, attention_probs
562
+
563
+
564
+ class NewSdpaAttention(NewAttention):
565
+ """
566
+ New attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
567
+     `NewAttention`, as the weights of the module stay untouched. The only changes are in the forward pass to adapt to
568
+ SDPA API.
569
+ """
570
+ def __init__(self, config: NewConfig, **kwargs):
571
+ super().__init__(config, **kwargs)
572
+ # torch.backends.cuda.enable_mem_efficient_sdp(False)
573
+ # logger.warning(
574
+ # "Disable memory efficient attention kernel for `NewSdpaAttention`, you can set "
575
+ # "`use_memory_efficient_attention=True` if it expected to use."
576
+ # )
577
+
578
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
579
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
580
+ query_states.transpose(1, 2),
581
+ key_states.transpose(1, 2),
582
+ value_states.transpose(1, 2),
583
+ attn_mask=attention_bias,
584
+ dropout_p=self.dropout.p if self.training else 0.0,
585
+ )
586
+ attn_output = attn_output.permute(0, 2, 1, 3).contiguous()
587
+ return attn_output, None
588
+
589
+
590
+ NEW_ATTENTION_CLASSES = {
591
+ "eager": NewAttention,
592
+ # "flash_attention_2": , # TODO
593
+ "sdpa": NewSdpaAttention,
594
+ }
595
+
596
+
597
+ class NewGatedMLP(nn.Module):
598
+ """
599
+ GLU Variants Improve Transformer.
600
+ """
601
+
602
+ def __init__(self, config: NewConfig):
603
+ super().__init__()
604
+ self.intermediate_size = config.intermediate_size
605
+ self.up_gate_proj = nn.Linear(config.hidden_size, self.intermediate_size * 2, bias=False)
606
+ self.down_proj = nn.Linear(self.intermediate_size, config.hidden_size, bias=True)
607
+ self.act_fn = ACT2FN[config.hidden_act]
608
+ if config.hidden_dropout_prob > 0:
609
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
610
+ else:
611
+ self.hidden_dropout = None
612
+
613
+ def forward(self, hidden_states):
614
+ up_gate = self.up_gate_proj(hidden_states)
615
+ up_states, gate = torch.split(up_gate, self.intermediate_size, dim=-1)
616
+ gate = self.act_fn(gate)
617
+ gated_states = gate * up_states
618
+ if self.hidden_dropout is not None:
619
+ gated_states = self.hidden_dropout(gated_states)
620
+ down_states = self.down_proj(gated_states)
621
+ return down_states
622
+
623
+
624
+ class NewLayer(nn.Module):
625
+ def __init__(
626
+ self,
627
+ config: NewConfig,
628
+ pack_qkv=None,
629
+ use_memory_efficient_attention=None,
630
+ attn_implementation=None
631
+ ):
632
+ super().__init__()
633
+ if attn_implementation is None:
634
+ attn_implementation = config._attn_implementation
635
+ if use_memory_efficient_attention is None:
636
+ use_memory_efficient_attention = config.use_memory_efficient_attention
637
+ if use_memory_efficient_attention:
638
+ if attn_implementation != 'eager':
639
+ logger.warning_once(f"Override {attn_implementation=} to 'eager' as {use_memory_efficient_attention=}")
640
+ attn_implementation = 'eager' # Since it will be SDPA by default for torch>=2.1.1
641
+ self.attention = NEW_ATTENTION_CLASSES[attn_implementation](
642
+ config, pack_qkv=pack_qkv, use_memory_efficient_attention=use_memory_efficient_attention
643
+ )
644
+ self.mlp = NewGatedMLP(config)
645
+
646
+ ln_class = LAYER_NORM[config.layer_norm_type]
647
+ self.attn_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
648
+ self.mlp_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
649
+
650
+ if config.hidden_dropout_prob > 0:
651
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
652
+ else:
653
+ self.hidden_dropout = None
654
+
655
+ def forward(
656
+ self,
657
+ hidden_states: torch.Tensor,
658
+ attention_bias: torch.FloatTensor,
659
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
660
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
661
+ attention_scale: Optional[torch.FloatTensor] = None,
662
+ subset_indices: Optional[torch.LongTensor] = None,
663
+ head_mask: Optional[torch.FloatTensor] = None,
664
+ output_attentions: Optional[bool] = False,
665
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
666
+ ) -> Tuple[torch.Tensor, ...]:
667
+ # Multi head self attention
668
+ residual = hidden_states if qkv_inputs is None else qkv_inputs[0]
669
+ attention_outputs = self.attention(
670
+ hidden_states,
671
+ attention_bias,
672
+ rope_embeds,
673
+ padding_inputs,
674
+ attention_scale,
675
+ head_mask,
676
+ output_attentions=output_attentions,
677
+ qkv_inputs=qkv_inputs,
678
+ )
679
+ hidden_states = attention_outputs[0]
680
+ if self.hidden_dropout is not None:
681
+ hidden_states = self.hidden_dropout(hidden_states)
682
+ hidden_states = residual + hidden_states
683
+
684
+         # In pretraining, after the attention of the last layer, we only need the masked tokens.
685
+ if subset_indices is not None:
686
+ hidden_states = hidden_states[subset_indices]
687
+
688
+ hidden_states = self.attn_ln(hidden_states)
689
+
690
+ # Fully Connected
691
+ residual = hidden_states
692
+ hidden_states = self.mlp(hidden_states)
693
+ if self.hidden_dropout is not None:
694
+ hidden_states = self.hidden_dropout(hidden_states)
695
+ hidden_states = residual + hidden_states
696
+ hidden_states = self.mlp_ln(hidden_states)
697
+
698
+ # add self attentions if we output attention weights
699
+ outputs = (hidden_states,) + attention_outputs[1:]
700
+ return outputs
701
+
702
+
703
+ class NewEncoder(nn.Module):
704
+ def __init__(self, config):
705
+ super().__init__()
706
+ self.config = config
707
+ self.layer = nn.ModuleList([NewLayer(config) for _ in range(config.num_hidden_layers)])
708
+ self.gradient_checkpointing = False
709
+
710
+ def forward(
711
+ self,
712
+ hidden_states: torch.Tensor,
713
+ attention_bias: Optional[torch.FloatTensor] = None,
714
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
715
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
716
+ attention_scale: Optional[torch.FloatTensor] = None,
717
+ subset_indices: Optional[torch.LongTensor] = None,
718
+ head_mask: Optional[torch.FloatTensor] = None,
719
+ output_attentions: Optional[bool] = False,
720
+ output_hidden_states: Optional[bool] = False,
721
+ return_dict: Optional[bool] = True,
722
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutput]:
723
+ all_hidden_states = () if output_hidden_states else None
724
+ all_self_attentions = () if output_attentions else None
725
+
726
+ for i, layer_module in enumerate(self.layer):
727
+ if output_hidden_states:
728
+ all_hidden_states = all_hidden_states + (hidden_states,)
729
+
730
+ if i >= len(self.layer) - 1:
731
+ layer_subset_indices = subset_indices
732
+ else:
733
+ layer_subset_indices = None
734
+
735
+ layer_head_mask = head_mask[i] if head_mask is not None else None
736
+
737
+ if self.gradient_checkpointing and self.training:
738
+ layer_outputs = self._gradient_checkpointing_func(
739
+ layer_module.__call__,
740
+ hidden_states,
741
+ attention_bias,
742
+ rope_embeds,
743
+ padding_inputs,
744
+ attention_scale,
745
+ layer_subset_indices,
746
+ layer_head_mask,
747
+ )
748
+ else:
749
+ layer_outputs = layer_module(
750
+ hidden_states,
751
+ attention_bias,
752
+ rope_embeds,
753
+ padding_inputs,
754
+ attention_scale,
755
+ layer_subset_indices,
756
+ layer_head_mask,
757
+ output_attentions,
758
+ )
759
+
760
+ hidden_states = layer_outputs[0]
761
+ if output_attentions:
762
+ all_self_attentions = all_self_attentions + (layer_outputs[1],)
763
+
764
+ if output_hidden_states:
765
+ all_hidden_states = all_hidden_states + (hidden_states,)
766
+
767
+ if not return_dict:
768
+ return tuple(
769
+ v
770
+ for v in [
771
+ hidden_states,
772
+ all_hidden_states,
773
+ all_self_attentions,
774
+ ]
775
+ if v is not None
776
+ )
777
+ return BaseModelOutput(
778
+ last_hidden_state=hidden_states,
779
+ hidden_states=all_hidden_states,
780
+ attentions=all_self_attentions,
781
+ )
782
+
783
+
784
+ # Copied from transformers.models.bert.modeling_bert.BertPooler with Bert->New
785
+ class NewPooler(nn.Module):
786
+ def __init__(self, config):
787
+ super().__init__()
788
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
789
+ self.activation = nn.Tanh()
790
+
791
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
792
+ # We "pool" the model by simply taking the hidden state corresponding
793
+ # to the first token.
794
+ first_token_tensor = hidden_states[:, 0]
795
+ pooled_output = self.dense(first_token_tensor)
796
+ pooled_output = self.activation(pooled_output)
797
+ return pooled_output
798
+
+
+class NewPreTrainedModel(PreTrainedModel):
+    """
+    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
+    models.
+    """
+
+    config_class = NewConfig
+    base_model_prefix = "new"
+    supports_gradient_checkpointing = True
+    _supports_sdpa = True
+
+    def _init_weights(self, module):
+        """Initialize the weights"""
+        if isinstance(module, nn.Linear):
+            # Slightly different from the TF version which uses truncated_normal for initialization
+            # cf https://github.com/pytorch/pytorch/pull/5617
+            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
+            if module.bias is not None:
+                module.bias.data.zero_()
+        elif isinstance(module, nn.Embedding):
+            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
+            if module.padding_idx is not None:
+                module.weight.data[module.padding_idx].zero_()
+        elif isinstance(module, nn.LayerNorm):
+            module.bias.data.zero_()
+            module.weight.data.fill_(1.0)
+
+
+class NewModel(NewPreTrainedModel):
+    """
+    The bare New Model transformer outputting raw hidden-states without any specific head on top.
+    """
+
+    def __init__(self, config: NewConfig, add_pooling_layer=False):
+        super().__init__(config)
+        self.config = config
+
+        self.embeddings = NewEmbeddings(config)
+        self.encoder = NewEncoder(config)
+
+        self.pooler = NewPooler(config) if add_pooling_layer else None
+
+        # Initialize weights and apply final processing
+        self.post_init()
+
+    def get_input_embeddings(self):
+        return self.embeddings.word_embeddings
+
+    def set_input_embeddings(self, value):
+        self.embeddings.word_embeddings = value
+
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        length: Optional[List[int]] = None,
+        subset_indices: Optional[torch.LongTensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPooling]:
+ r"""
867
+ length (`list` of length `batch_size`, *optional*):
868
+ If is `None`, return padded `last_hidden_state`.
869
+ subset_indices ():
870
+ pass
871
+ unpad_inputs (`bool`, *optional*):
872
+ pass
873
+ """
874
+        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+        output_hidden_states = (
+            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+        )
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        unpad_inputs = unpad_inputs if unpad_inputs is not None else self.config.unpad_inputs
+        output_padded = length is None
+
+        if input_ids is not None and inputs_embeds is not None:
+            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
+        elif input_ids is not None:
+            self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
+            input_shape = input_ids.size()
+        elif inputs_embeds is not None:
+            input_shape = inputs_embeds.size()[:-1]
+        else:
+            raise ValueError("You have to specify either input_ids or inputs_embeds")
+
+        # TODO: not used
+        # # Prepare head mask if needed
+        # # 1.0 in head_mask indicate we keep the head
+        # # attention_probs has shape bsz x n_heads x N x N
+        # # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
+        # # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
+        # head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
+
+        # Get embeddings, may unpad them
+        (embedding_output, attention_mask, rope_embeds, length) = self.embeddings(
+            unpad_inputs,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            length=length,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            inputs_embeds=inputs_embeds,
+        )
+
+        batch_size, seq_length = input_shape
+        if unpad_inputs and self.config.use_memory_efficient_attention:
+            attention_bias = xops.fmha.attn_bias.BlockDiagonalMask.from_seqlens(length)
+        else:
+            # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
+            # ourselves in which case we just need to make it broadcastable to all heads.
+            attention_bias = self.get_extended_attention_mask(attention_mask, input_shape)
+            if self.config.use_memory_efficient_attention:
+                # Invalid shape for attention bias: torch.Size([48, 1, 1, 512]) (expected (48, 12, 512, 512))
+                attention_bias = attention_bias.expand(-1, self.config.num_attention_heads, seq_length, -1)
+
+        padding_inputs = None
+        if unpad_inputs and (output_padded or not self.config.use_memory_efficient_attention):
+            indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
+            if not self.config.use_memory_efficient_attention:
+                padding_inputs = (indices, *input_shape)
+
+        attention_scale = None
+        if self.config.logn_attention_scale:
+            logger.warning_once("TODO: logn_attention_scale")
+            # # attention scale log_512(input_len)
+            # attention_scale = attention_mask.sum(1).log() / torch.tensor(self.config.max_position_embeddings).log()
+            # # inference-time logn scale need clip 1
+            # if self.config.logn_attention_clip1:
+            #     attention_scale.clip_(1)
+            # attention_scale = attention_scale[:, None, None, None]
+        # else:
+        #     attention_scale = None
+
+        encoder_outputs = self.encoder(
+            embedding_output,
+            attention_bias=attention_bias,
+            rope_embeds=rope_embeds,
+            padding_inputs=padding_inputs,
+            attention_scale=attention_scale,
+            subset_indices=subset_indices,
+            head_mask=head_mask,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+        )
+        sequence_output = encoder_outputs[0]
+        if unpad_inputs and output_padded:
+            sequence_output = pad_input(
+                sequence_output.squeeze(), indices, batch_size, seq_length
+            )
+
+        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
+
+        if not return_dict:
+            return (sequence_output, pooled_output) + encoder_outputs[1:]
+
+        return BaseModelOutputWithPooling(
+            last_hidden_state=sequence_output,
+            pooler_output=pooled_output,
+            hidden_states=encoder_outputs.hidden_states,
+            attentions=encoder_outputs.attentions,
+        )
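
Because this model ships custom code, the bare encoder has to be loaded with `trust_remote_code=True`. A minimal usage sketch against the base checkpoint named in this repo's metadata (treat the snippet as illustrative, not part of the committed files):

    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "Alibaba-NLP/gte-multilingual-base"  # the base model of this fine-tune
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    batch = tokenizer(["a query", "a passage"], padding=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    cls = out.last_hidden_state[:, 0]                 # CLS pooling
    emb = torch.nn.functional.normalize(cls, dim=-1)  # unit-length embeddings
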
+
+
+class NewLMPredictionHead(nn.Module):
+    def __init__(self, config):
+        super().__init__()
+        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+        self.transform_act_fn = ACT2FN[config.hidden_act]
+        self.norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+
+        # The output weights are the same as the input embeddings, but there is
+        # an output-only bias for each token.
+        self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
+
+    def forward(self, hidden_states):
+        hidden_states = self.dense(hidden_states)
+        hidden_states = self.transform_act_fn(hidden_states)
+        hidden_states = self.norm(hidden_states)
+        hidden_states = self.decoder(hidden_states)
+        return hidden_states
+
+
+class NewForMaskedLM(NewPreTrainedModel):
+    _tied_weights_keys = ["lm_head.decoder.bias", "lm_head.decoder.weight"]
+
+    def __init__(self, config: NewConfig):
+        super().__init__(config)
+        self.new = NewModel(config, add_pooling_layer=False)
+        self.lm_head = NewLMPredictionHead(config)
+        self.loss_fct = nn.CrossEntropyLoss()
+
+        # Initialize weights and apply final processing
+        self.post_init()
+
+    def get_output_embeddings(self):
+        return self.lm_head.decoder
+
+    def set_output_embeddings(self, new_embeddings):
+        self.lm_head.decoder = new_embeddings
+
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
+            config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
+            (masked); the loss is only computed for tokens with labels in `[0, ..., config.vocab_size]`.
+        """
+
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+        if labels is None or not self.new.config.unpad_inputs:
+            length = None
+            subset_indices = None
+        else:
+            length = attention_mask.sum(-1).tolist()
+            labels = labels[attention_mask.bool()].unsqueeze(0)
+            subset_indices = labels > -100
+
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            length=length,
+            subset_indices=subset_indices,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+
+        sequence_output = outputs[0]
+        prediction_scores = self.lm_head(sequence_output)
+
+        masked_lm_loss = None
+        if labels is not None:
+            if subset_indices is None:
+                mask = attention_mask.bool()
+                prediction_scores = prediction_scores[mask]
+                labels = labels[mask]
+            else:
+                labels = labels[subset_indices]
+            masked_lm_loss = self.loss_fct(prediction_scores, labels)
+
+        if not return_dict:
+            output = (prediction_scores,) + outputs[2:]
+            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
+
+        return MaskedLMOutput(
+            loss=masked_lm_loss,
+            logits=prediction_scores,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
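
The `-100` convention in the docstring above is the standard `ignore_index` of `nn.CrossEntropyLoss`. A toy sketch of how it drops the unmasked positions from the loss (all tensors fabricated):

    import torch
    import torch.nn.functional as F

    vocab_size = 10
    logits = torch.randn(4, vocab_size)        # scores for 4 token positions
    labels = torch.tensor([3, -100, 7, -100])  # -100 positions contribute no loss
    loss = F.cross_entropy(logits, labels, ignore_index=-100)
    print(float(loss))                         # averaged over the 2 labelled tokens only
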
+
+
+class NewForSequenceClassification(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+        self.config = config
+
+        self.new = NewModel(config, add_pooling_layer=True)
+        classifier_dropout = (
+            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+        )
+        self.dropout = nn.Dropout(classifier_dropout)
+        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+        # Initialize weights and apply final processing
+        self.post_init()
+
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
+            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square
+            loss); if `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+
+        pooled_output = outputs[1]
+
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+
+        loss = None
+        if labels is not None:
+            if self.config.problem_type is None:
+                if self.num_labels == 1:
+                    self.config.problem_type = "regression"
+                elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
+                    self.config.problem_type = "single_label_classification"
+                else:
+                    self.config.problem_type = "multi_label_classification"
+
+            if self.config.problem_type == "regression":
+                loss_fct = nn.MSELoss()
+                if self.num_labels == 1:
+                    loss = loss_fct(logits.squeeze(), labels.squeeze())
+                else:
+                    loss = loss_fct(logits, labels)
+            elif self.config.problem_type == "single_label_classification":
+                loss_fct = nn.CrossEntropyLoss()
+                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+            elif self.config.problem_type == "multi_label_classification":
+                loss_fct = nn.BCEWithLogitsLoss()
+                loss = loss_fct(logits, labels)
+
+        if not return_dict:
+            output = (logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+
+        return SequenceClassifierOutput(
+            loss=loss,
+            logits=logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
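
The `problem_type` branching above is the standard Hugging Face pattern for picking a loss from the label shape and dtype. A condensed sketch of the same decision (an illustrative helper, not part of the file):

    import torch
    import torch.nn as nn

    def pick_loss(num_labels: int, labels: torch.Tensor) -> nn.Module:
        # mirrors the branches in NewForSequenceClassification.forward
        if num_labels == 1:
            return nn.MSELoss()               # regression
        if labels.dtype in (torch.long, torch.int):
            return nn.CrossEntropyLoss()      # single-label classification
        return nn.BCEWithLogitsLoss()         # multi-label classification
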
+
+
+class NewForMultipleChoice(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+
+        self.new = NewModel(config, add_pooling_layer=True)
+        classifier_dropout = (
+            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+        )
+        self.dropout = nn.Dropout(classifier_dropout)
+        self.classifier = nn.Linear(config.hidden_size, 1)
+
+        # Initialize weights and apply final processing
+        self.post_init()
+
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
+            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
+            `input_ids` above)
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
+
+        input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
+        attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
+        token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
+        position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
+        inputs_embeds = (
+            inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
+            if inputs_embeds is not None
+            else None
+        )
+
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+
+        pooled_output = outputs[1]
+
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        reshaped_logits = logits.view(-1, num_choices)
+
+        loss = None
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(reshaped_logits, labels)
+
+        if not return_dict:
+            output = (reshaped_logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+
+        return MultipleChoiceModelOutput(
+            loss=loss,
+            logits=reshaped_logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
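
The multiple-choice head flattens `(batch, num_choices, seq_len)` inputs into one row per choice, scores each row with a 1-unit classifier, then reshapes so cross-entropy runs over the choices. A shape-only sketch with fabricated tensors:

    import torch

    batch, num_choices, seq_len = 2, 4, 16
    input_ids = torch.randint(0, 100, (batch, num_choices, seq_len))
    flat = input_ids.view(-1, seq_len)        # (8, 16): one row per choice
    logits = torch.randn(flat.size(0), 1)     # stand-in for classifier(pooled_output)
    reshaped = logits.view(-1, num_choices)   # (2, 4): cross-entropy over the choices
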
+
+
+@dataclass
+class NewTokenClassifierOutput(ModelOutput):
+    loss: Optional[torch.FloatTensor] = None
+    logits: torch.FloatTensor = None
+    last_hidden_state: torch.FloatTensor = None
+    hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
+    attentions: Optional[Tuple[torch.FloatTensor, ...]] = None
+
+
+class NewForTokenClassification(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+
+        self.new = NewModel(config, add_pooling_layer=False)
+        classifier_dropout = (
+            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+        )
+        self.dropout = nn.Dropout(classifier_dropout)
+        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+        # Initialize weights and apply final processing
+        self.post_init()
+
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        labels: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], NewTokenClassifierOutput]:
+        r"""
+        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+
+        sequence_output = outputs[0]
+
+        sequence_output = self.dropout(sequence_output)
+        logits = self.classifier(sequence_output)
+
+        loss = None
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+
+        if not return_dict:
+            output = (logits,) + outputs[2:]
+            return ((loss,) + output) if loss is not None else output
+
+        return NewTokenClassifierOutput(
+            loss=loss,
+            logits=logits,
+            last_hidden_state=sequence_output,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+
+
+class NewForQuestionAnswering(NewPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+
+        self.new = NewModel(config, add_pooling_layer=False)
+        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
+
+        # Initialize weights and apply final processing
+        self.post_init()
+
+    def forward(
+        self,
+        input_ids: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        token_type_ids: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.Tensor] = None,
+        head_mask: Optional[torch.Tensor] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        start_positions: Optional[torch.Tensor] = None,
+        end_positions: Optional[torch.Tensor] = None,
+        output_attentions: Optional[bool] = None,
+        output_hidden_states: Optional[bool] = None,
+        return_dict: Optional[bool] = None,
+        unpad_inputs: Optional[bool] = None,
+    ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
+        r"""
+        start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for the position (index) of the start of the labelled span, used to compute the token
+            classification loss. Positions are clamped to the length of the sequence (`sequence_length`);
+            positions outside of the sequence are not taken into account for computing the loss.
+        end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+            Labels for the position (index) of the end of the labelled span, used to compute the token
+            classification loss. Positions are clamped to the length of the sequence (`sequence_length`);
+            positions outside of the sequence are not taken into account for computing the loss.
+        """
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+        outputs = self.new(
+            input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            position_ids=position_ids,
+            head_mask=head_mask,
+            inputs_embeds=inputs_embeds,
+            output_attentions=output_attentions,
+            output_hidden_states=output_hidden_states,
+            return_dict=return_dict,
+            unpad_inputs=unpad_inputs,
+        )
+
+        sequence_output = outputs[0]
+
+        logits = self.qa_outputs(sequence_output)
+        start_logits, end_logits = logits.split(1, dim=-1)
+        start_logits = start_logits.squeeze(-1).contiguous()
+        end_logits = end_logits.squeeze(-1).contiguous()
+
+        total_loss = None
+        if start_positions is not None and end_positions is not None:
+            # If we are on multi-GPU, split adds a dimension
+            if len(start_positions.size()) > 1:
+                start_positions = start_positions.squeeze(-1)
+            if len(end_positions.size()) > 1:
+                end_positions = end_positions.squeeze(-1)
+            # sometimes the start/end positions are outside our model inputs; we ignore these terms
+            ignored_index = start_logits.size(1)
+            start_positions = start_positions.clamp(0, ignored_index)
+            end_positions = end_positions.clamp(0, ignored_index)
+
+            loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
+            start_loss = loss_fct(start_logits, start_positions)
+            end_loss = loss_fct(end_logits, end_positions)
+            total_loss = (start_loss + end_loss) / 2
+
+        if not return_dict:
+            output = (start_logits, end_logits) + outputs[2:]
+            return ((total_loss,) + output) if total_loss is not None else output
+
+        return QuestionAnsweringModelOutput(
+            loss=total_loss,
+            start_logits=start_logits,
+            end_logits=end_logits,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
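
For completeness, a naive sketch of turning the start/end logits above into an answer span (real decoders search the top-k start/end pairs instead of taking independent argmaxes; tensors fabricated):

    import torch

    seq_len = 32
    start_logits = torch.randn(1, seq_len)
    end_logits = torch.randn(1, seq_len)
    start = int(start_logits.argmax(-1))
    end = int(end_logits.argmax(-1))
    if end < start:  # naive guard against inverted spans
        start, end = end, start
    print(f"predicted answer token span: [{start}, {end}]")
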
modules.json ADDED
@@ -0,0 +1,20 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
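
`modules.json` declares the three-stage sentence-transformers pipeline: Transformer encoder, CLS pooling, then L2 normalization. A sketch of assembling the same pipeline programmatically (the checkpoint id is the base model; exact keyword support may vary across sentence-transformers versions):

    from sentence_transformers import SentenceTransformer, models

    word = models.Transformer(
        "Alibaba-NLP/gte-multilingual-base",
        max_seq_length=8192,
        model_args={"trust_remote_code": True},
    )
    pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="cls")
    norm = models.Normalize()
    model = SentenceTransformer(modules=[word, pool, norm])
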
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 8192,
+  "do_lower_case": false
+}
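
`max_seq_length` caps the tokens fed to the Transformer module; it can be lowered at runtime to trade context length for speed and memory. A sketch (the repo id is a placeholder for wherever this checkpoint is hosted):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("<this-repo-id>", trust_remote_code=True)  # placeholder id
    print(model.max_seq_length)  # 8192, from sentence_bert_config.json
    model.max_seq_length = 512   # shorter inputs encode faster and use less memory
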
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
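
These special tokens follow the XLM-RoBERTa convention, where `<s>` doubles as the CLS token and `</s>` as the SEP token. A quick sanity check via the base model's tokenizer:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-multilingual-base")
    print(tok.cls_token, tok.sep_token, tok.mask_token)  # <s> </s> <mask>
    print(tok.pad_token, tok.pad_token_id)               # <pad> 1, per added_tokens_decoder
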
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aa7a6ad87a7ce8fe196787355f6af7d03aee94d19c54a5eb1392ed18c8ef451a
+size 17082988
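
This is a Git LFS pointer, not the tokenizer itself: the real ~17 MB `tokenizer.json` is stored via Git LFS (see the `.gitattributes` rule). `huggingface_hub` resolves the pointer transparently when downloading; a sketch (repo id is a placeholder):

    from huggingface_hub import hf_hub_download

    # downloads and resolves the LFS object behind tokenizer.json (~17 MB)
    path = hf_hub_download(repo_id="<this-repo-id>", filename="tokenizer.json")
    print(path)
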
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "model_max_length": 8192,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "XLMRobertaTokenizerFast",
+  "unk_token": "<unk>"
+}
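
Putting the committed pieces together, encoding with this checkpoint reduces to a standard sentence-transformers call; `trust_remote_code=True` is required because of the custom `NewModel` code above (repo id again a placeholder):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("<this-repo-id>", trust_remote_code=True)
    emb = model.encode(["a sample query", "a sample passage"])
    print(emb.shape)  # (2, 768); rows are unit-normalised by the Normalize module
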