suhwan3 committed on
Commit 27bb250 · verified · 1 Parent(s): 565b138

Upload fine-tuned model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
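The config above enables only `pooling_mode_mean_tokens`, so the sentence embedding is the average of the token embeddings, with padding positions left out. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name `mean_pool` and the toy 3-dimensional vectors are assumptions made for illustration (the real model works with 768 dimensions).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input: two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0, 4.0]
```

The padding vector `[9.0, 9.0, 9.0]` does not affect the result because its mask entry is 0.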
README.md ADDED
@@ -0,0 +1,845 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:128997
+ - loss:MultipleNegativesRankingLoss
+ base_model: suhwan3/mpnet_step1
+ widget:
+ - source_sentence: The Global X S&P 500 Risk Managed Income ETF seeks to track the
+ Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets
+ in index securities. The index's strategy involves holding the underlying stocks
+ of the S&P 500 Index while applying an options collar, specifically selling at-the-money
+ covered call options and buying monthly 5% out-of-the-money put options corresponding
+ to the portfolio's value. This approach aims to generate income, ideally resulting
+ in a net credit from the options premiums, and provide risk management, though
+ selling at-the-money calls inherently caps the fund's potential for upside participation.
+ sentences:
+ - Nasdaq, Inc. operates as a technology company that serves capital markets and
+ other industries worldwide. The Market Technology segment includes anti financial
+ crime technology business, which offers Nasdaq Trade Surveillance, a SaaS solution
+ for brokers and other market participants to assist them in complying with market
+ rules, regulations, and internal market surveillance policies; Nasdaq Automated
+ Investigator, a cloud-deployed anti-money laundering tool; and Verafin, a SaaS
+ technology provider of anti-financial crime management solutions. This segment
+ also handles assets, such as cash equities, equity derivatives, currencies, interest-bearing
+ securities, commodities, energy products, and digital currencies. The Investment
+ Intelligence segment sells and distributes historical and real-time market data;
+ develops and licenses Nasdaq-branded indexes and financial products; and provides
+ investment insights and workflow solutions. The Corporate Platforms segment operates
+ listing platforms; and offers investor relations intelligence and governance solutions.
+ As of December 31, 2021, it had 4,178 companies listed securities on The Nasdaq
+ Stock Market, including 1,632 listings on The Nasdaq Global Select Market; 1,169
+ on The Nasdaq Global Market; and 1,377 on The Nasdaq Capital Market. The Market
+ Services segment includes equity derivative trading and clearing, cash equity
+ trading, fixed income and commodities trading and clearing, and trade management
+ service businesses. This segment operates various exchanges and other marketplace
+ facilities across various asset classes, which include derivatives, commodities,
+ cash equity, debt, structured products, and exchange traded products; and provides
+ broker, clearing, settlement, and central depository services. The company was
+ formerly known as The NASDAQ OMX Group, Inc. and changed its name to Nasdaq, Inc.
+ in September 2015. Nasdaq, Inc. was founded in 1971 and is headquartered in New
+ York, New York.
+ - Jabil Inc. provides manufacturing services and solutions worldwide. The company
+ operates in two segments, Electronics Manufacturing Services and Diversified Manufacturing
+ Services. It offers electronics design, production, and product management services.
+ The company provides electronic design services, such as application-specific
+ integrated circuit design, firmware development, and rapid prototyping services;
+ and designs plastic and metal enclosures that include the electro-mechanics, such
+ as the printed circuit board assemblies (PCBA). It also specializes in the three-dimensional
+ mechanical design comprising the analysis of electronic, electro-mechanical, and
+ optical assemblies, as well as offers various industrial design, mechanism development,
+ and tooling management services. In addition, the company provides computer-assisted
+ design services consisting of PCBA design, as well as PCBA design validation and
+ verification services; and other consulting services, such as the generation of
+ a bill of materials, approved vendor list, and assembly equipment configuration
+ for various PCBA designs. Further, it offers product and process validation services,
+ such as product system, product safety, regulatory compliance, and reliability
+ tests, as well as manufacturing test solution development services. Additionally,
+ the company provides systems assembly, test, direct-order fulfillment, and configure-to-order
+ services. It serves 5G, wireless and cloud, digital print and retail, industrial
+ and semi-cap, networking and storage, automotive and transportation, connected
+ devices, healthcare and packaging, and mobility industries. The company was formerly
+ known as Jabil Circuit, Inc. and changed its name to Jabil Inc. in June 2017.
+ Jabil Inc. was founded in 1966 and is headquartered in Saint Petersburg, Florida.
+ - 'Realty Income, The Monthly Dividend Company, is an S&P 500 company dedicated
+ to providing stockholders with dependable monthly income. The company is structured
+ as a REIT, and its monthly dividends are supported by the cash flow from over
+ 6,500 real estate properties owned under long-term lease agreements with our commercial
+ clients. To date, the company has declared 608 consecutive common stock monthly
+ dividends throughout its 52-year operating history and increased the dividend
+ 109 times since Realty Income''s public listing in 1994 (NYSE: O). The company
+ is a member of the S&P 500 Dividend Aristocrats index. Additional information
+ about the company can be obtained from the corporate website at www.realtyincome.com.'
+ - source_sentence: The iShares U.S. Telecommunications ETF (IYZ) seeks to track the
+ investment results of the Russell 1000 Telecommunications RIC 22.5/45 Capped Index,
+ which measures the performance of the U.S. telecommunications sector of the U.S.
+ equity market as defined by FTSE Russell. This market-cap-weighted index includes
+ large-cap companies involved in telecom equipment and service provision and is
+ subject to regulatory capping that limits single holdings to 22.5% and aggregate
+ large holdings to 45%. The fund generally invests at least 80% of its assets in
+ the component securities of its underlying index and is non-diversified; the underlying
+ index is rebalanced quarterly.
+ sentences:
+ - Kanzhun Limited operates an online recruitment platform, BOSS Zhipin in the People's
+ Republic of China. Its recruitment platform assists the recruitment process between
+ job seekers and employers for enterprises, and corporations. The company was founded
+ in 2013 and is headquartered in Beijing, the People's Republic of China.
+ - Frontier Communications Parent, Inc., together with its subsidiaries, provides
+ communications services for consumer and business customers in 25 states in the
+ United States. It offers data and Internet, voice, video, and other services.
+ The company was formerly known as Frontier Communications Corporation and changed
+ its name to Frontier Communications Parent, Inc. in April 2021. Frontier Communications
+ Parent, Inc. was incorporated in 1935 and is based in Norwalk, Connecticut.
+ - Broadcom Inc. designs, develops, and supplies various semiconductor devices with
+ a focus on complex digital and mixed signal complementary metal oxide semiconductor
+ based devices and analog III-V based products worldwide. The company operates
+ in two segments, Semiconductor Solutions and Infrastructure Software. It provides
+ set-top box system-on-chips (SoCs); cable, digital subscriber line, and passive
+ optical networking central office/consumer premise equipment SoCs; wireless local
+ area network access point SoCs; Ethernet switching and routing merchant silicon
+ products; embedded processors and controllers; serializer/deserializer application
+ specific integrated circuits; optical and copper, and physical layers; and fiber
+ optic transmitter and receiver components. The company also offers RF front end
+ modules, filters, and power amplifiers; Wi-Fi, Bluetooth, and global positioning
+ system/global navigation satellite system SoCs; custom touch controllers; serial
+ attached small computer system interface, and redundant array of independent disks
+ controllers and adapters; peripheral component interconnect express switches;
+ fiber channel host bus adapters; read channel based SoCs; custom flash controllers;
+ preamplifiers; and optocouplers, industrial fiber optics, and motion control encoders
+ and subsystems. Its products are used in various applications, including enterprise
+ and data center networking, home connectivity, set-top boxes, broadband access,
+ telecommunication equipment, smartphones and base stations, data center servers
+ and storage systems, factory automation, power generation and alternative energy
+ systems, and electronic displays. Broadcom Inc. was incorporated in 2018 and is
+ headquartered in San Jose, California.
+ - source_sentence: The Xtrackers MSCI Emerging Markets ESG Leaders Equity ETF tracks
+ an index of large- and mid-cap emerging market stocks that emphasize strong environmental,
+ social, and governance (ESG) characteristics. The index first excludes companies
+ involved in specific controversial industries. From the remaining universe, it
+ ranks stocks based on MSCI ESG scores, including a controversy component, to identify
+ and select the highest-ranking ESG leaders, effectively screening out ESG laggards.
+ To maintain market-like country and sector weights, the index selects the top
+ ESG-scoring stocks within each sector until a specified market capitalization
+ threshold is reached. Selected stocks are then weighted by market capitalization
+ within their respective sectors. The fund typically invests over 80% of its assets
+ in the securities of this underlying index.
+ sentences:
+ - Info Edge (India) Limited operates as an online classifieds company in the areas
+ of recruitment, matrimony, real estate, and education and related services in
+ India and internationally. It operates through Recruitment Solutions, 99acres,
+ and Other segments. The company offers recruitment services through naukri.com,
+ an online job website for job seekers and corporate customers, including hiring
+ consultants; firstnaukri.com, a job search network for college students and recent
+ graduates; naukrigulf.com, a website catering to Gulf markets; and quadranglesearch.com,
+ a site that provides off-line placement services to middle and senior management,
+ as well as Highorbit/iimjobs.com, zwayam.com, hirist.com, doselect.com, ambitionbox.com,
+ bigshyft.com, and jobhai.com. It also provides 99acres.com, which offers listing
+ of properties for sale, purchase, and rent; Jeevansathi.com, an online matrimonial
+ classifieds services; and shiksha.com, an education classified website that helps
+ students to decide their undergraduate and postgraduate options by providing useful
+ information on careers, exams, colleges, and courses, as well as operates multiple
+ dating platforms on the web through its mobile apps Aisle, Anbe, Arike and HeyDil.
+ In addition, the company provides internet, computer, and electronic and related
+ services; and software development, consultancy, technical support for consumer
+ companies, SAAS providers, and other services in the field of information technology
+ and product development, as well as brokerage services in the real estate sector.
+ Further, it acts as an investment adviser and manager, financial and management
+ consultant, and sponsor of alternative investment funds, as well as provides advertising
+ space for colleges and universities on www.shiksha.com. Info Edge (India) Limited
+ was incorporated in 1995 and is based in Noida, India.
+ - China Overseas Land & Investment Limited, an investment holding company, engages
+ in the property development and investment, and other operations in the People's
+ Republic of China and the United Kingdom. The company operates through Property
+ Development, Property Investment, and Other Operations segments. It is involved
+ in the investment, development, and rental of residential and commercial properties;
+ issuance of guaranteed notes and corporate bonds; and hotel operation activities.
+ The company also provides construction and building design consultancy services.
+ In addition, it engages in the investment and financing, land consolidation, regional
+ planning, engineering construction, industrial import, commercial operation, and
+ property management. Further, the company offers urban services, including office
+ buildings, flexible working space, shopping malls, star-rated hotels, long-term
+ rental apartments, logistics parks, and architectural design and construction.
+ The company was founded in 1979 and is based in Central, Hong Kong. China Overseas
+ Land & Investment Limited is a subsidiary of China Overseas Holdings Limited.
+ - Mastercard Incorporated, a technology company, provides transaction processing
+ and other payment-related products and services in the United States and internationally.
+ It facilitates the processing of payment transactions, including authorization,
+ clearing, and settlement, as well as delivers other payment-related products and
+ services. The company offers integrated products and value-added services for
+ account holders, merchants, financial institutions, businesses, governments, and
+ other organizations, such as programs that enable issuers to provide consumers
+ with credits to defer payments; prepaid programs and management services; commercial
+ credit and debit payment products and solutions; and payment products and solutions
+ that allow its customers to access funds in deposit and other accounts. It also
+ provides value-added products and services comprising cyber and intelligence solutions
+ for parties to transact, as well as proprietary insights, drawing on principled
+ use of consumer, and merchant data services. In addition, the company offers analytics,
+ test and learn, consulting, managed services, loyalty, processing, and payment
+ gateway solutions for e-commerce merchants. Further, it provides open banking
+ and digital identity platforms services. The company offers payment solutions
+ and services under the MasterCard, Maestro, and Cirrus. Mastercard Incorporated
+ was founded in 1966 and is headquartered in Purchase, New York.
+ - source_sentence: The Global X S&P 500 Risk Managed Income ETF seeks to track the
+ Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets
+ in index securities. The index's strategy involves holding the underlying stocks
+ of the S&P 500 Index while applying an options collar, specifically selling at-the-money
+ covered call options and buying monthly 5% out-of-the-money put options corresponding
+ to the portfolio's value. This approach aims to generate income, ideally resulting
+ in a net credit from the options premiums, and provide risk management, though
+ selling at-the-money calls inherently caps the fund's potential for upside participation.
+ sentences:
+ - Incyte Corporation, a biopharmaceutical company, focuses on the discovery, development,
+ and commercialization of proprietary therapeutics in the United States and internationally.
+ The company offers JAKAFI, a drug for the treatment of myelofibrosis and polycythemia
+ vera; PEMAZYRE, a fibroblast growth factor receptor kinase inhibitor that act
+ as oncogenic drivers in various liquid and solid tumor types; and ICLUSIG, a kinase
+ inhibitor to treat chronic myeloid leukemia and philadelphia-chromosome positive
+ acute lymphoblastic leukemia. Its clinical stage products include ruxolitinib,
+ a steroid-refractory chronic graft-versus-host-diseases (GVHD); itacitinib, which
+ is in Phase II/III clinical trial to treat naive chronic GVHD; and pemigatinib
+ for treating bladder cancer, cholangiocarcinoma, myeloproliferative syndrome,
+ and tumor agnostic. In addition, the company engages in developing Parsaclisib,
+ which is in Phase II clinical trial for follicular lymphoma, marginal zone lymphoma,
+ and mantel cell lymphoma. Additionally, it develops Retifanlimab that is in Phase
+ II clinical trials for MSI-high endometrial cancer, merkel cell carcinoma, and
+ anal cancer, as well as in Phase II clinical trials for patients with non-small
+ cell lung cancer. It has collaboration agreements with Novartis International
+ Pharmaceutical Ltd.; Eli Lilly and Company; Agenus Inc.; Calithera Biosciences,
+ Inc; MacroGenics, Inc.; Merus N.V.; Syros Pharmaceuticals, Inc.; Innovent Biologics,
+ Inc.; Zai Lab Limited; Cellenkos, Inc.; and Nimble Therapeutics, as well as clinical
+ collaborations with MorphoSys AG and Xencor, Inc. to investigate the combination
+ of tafasitamab, plamotamab, and lenalidomide in patients with relapsed or refractory
+ diffuse large B-cell lymphoma, and relapsed or refractory follicular lymphoma.
+ The company was incorporated in 1991 and is headquartered in Wilmington, Delaware.
+ - Omnicom Group Inc., together with its subsidiaries, provides advertising, marketing,
+ and corporate communications services. It provides a range of services in the
+ areas of advertising, customer relationship management, public relations, and
+ healthcare. The company's services include advertising, branding, content marketing,
+ corporate social responsibility consulting, crisis communications, custom publishing,
+ data analytics, database management, digital/direct marketing, digital transformation,
+ entertainment marketing, experiential marketing, field marketing, financial/corporate
+ business-to-business advertising, graphic arts/digital imaging, healthcare marketing
+ and communications, and in-store design services. Its services also comprise interactive
+ marketing, investor relations, marketing research, media planning and buying,
+ merchandising and point of sale, mobile marketing, multi-cultural marketing, non-profit
+ marketing, organizational communications, package design, product placement, promotional
+ marketing, public affairs, retail marketing, sales support, search engine marketing,
+ shopper marketing, social media marketing, and sports and event marketing services.
+ It operates in the United States, Canada, Puerto Rico, South America, Mexico,
+ Europe, the Middle East, Africa, Australia, Greater China, India, Japan, Korea,
+ New Zealand, Singapore, and other Asian countries. The company was incorporated
+ in 1944 and is based in New York, New York.
+ - NetApp, Inc. provides cloud-led and data-centric services to manage and share
+ data on-premises, and private and public clouds worldwide. It operates in two
+ segments, Hybrid Cloud and Public Could. The company offers intelligent data management
+ software, such as NetApp ONTAP, NetApp Snapshot, NetApp SnapCenter Backup Management,
+ NetApp SnapMirror Data Replication, NetApp SnapLock Data Compliance, NetApp ElementOS
+ software, and NetApp SANtricity software; and storage infrastructure solutions,
+ including NetApp All-Flash FAS series, NetApp Fabric Attached Storage, NetApp
+ FlexPod, NetApp E/EF series, NetApp StorageGRID, and NetApp SolidFire. It also
+ provides cloud storage and data services comprising NetApp Cloud Volumes ONTAP,
+ Azure NetApp Files, Amazon FSx for NetApp ONTAP, NetApp Cloud Volumes Service
+ for Google Cloud, NetApp Cloud Sync, NetApp Cloud Tiering, NetApp Cloud Backup,
+ NetApp Cloud Data Sense, and NetApp Cloud Volumes Edge Cache; and cloud operations
+ services, such as NetApp Cloud Insights, Spot Ocean Kubernetes Suite, Spot Security,
+ Spot Eco, and Spot CloudCheckr. In addition, the company offers application-aware
+ data management service under the NetApp Astra name; and professional and support
+ services, such as strategic consulting, professional, managed, and support services.
+ Further, it provides assessment, design, implementation, and migration services.
+ The company serves the energy, financial service, government, technology, internet,
+ life science, healthcare service, manufacturing, media, entertainment, animation,
+ video postproduction, and telecommunication markets through a direct sales force
+ and an ecosystem of partners. NetApp, Inc. was incorporated in 1992 and is headquartered
+ in San Jose, California.
+ - source_sentence: The Global X S&P 500 Risk Managed Income ETF seeks to track the
+ Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets
+ in index securities. The index's strategy involves holding the underlying stocks
+ of the S&P 500 Index while applying an options collar, specifically selling at-the-money
+ covered call options and buying monthly 5% out-of-the-money put options corresponding
+ to the portfolio's value. This approach aims to generate income, ideally resulting
+ in a net credit from the options premiums, and provide risk management, though
+ selling at-the-money calls inherently caps the fund's potential for upside participation.
+ sentences:
+ - Walgreens Boots Alliance, Inc. operates as a pharmacy-led health and beauty retail
+ company. It operates through two segments, the United States and International.
+ The United States segment sells prescription drugs and an assortment of retail
+ products, including health, wellness, beauty, personal care, consumable, and general
+ merchandise products through its retail drugstores. It also provides central specialty
+ pharmacy services and mail services. As of August 31, 2021, this segment operated
+ 8,965 retail stores under the Walgreens and Duane Reade brands in the United States;
+ and five specialty pharmacies. The International segment sells prescription drugs;
+ and health and wellness, beauty, personal care, and other consumer products through
+ its pharmacy-led health and beauty retail stores and optical practices, as well
+ as through boots.com and an integrated mobile application. It also engages in
+ pharmaceutical wholesaling and distribution business in Germany. As of August
+ 31, 2021, this segment operated 4,031 retail stores under the Boots, Benavides,
+ and Ahumada in the United Kingdom, Thailand, Norway, the Republic of Ireland,
+ the Netherlands, Mexico, and Chile; and 548 optical practices, including 160 on
+ a franchise basis. Walgreens Boots Alliance, Inc. was founded in 1901 and is based
+ in Deerfield, Illinois.
+ - Middlesex Water Company owns and operates regulated water utility and wastewater
+ systems. It operates in two segments, Regulated and Non-Regulated. The Regulated
+ segment collects, treats, and distributes water on a retail and wholesale basis
+ to residential, commercial, industrial, and fire protection customers, as well
+ as provides regulated wastewater systems in New Jersey and Delaware. The Non-Regulated
+ segment provides non-regulated contract services for the operation and maintenance
+ of municipal and private water and wastewater systems in New Jersey and Delaware.
+ The company was incorporated in 1896 and is headquartered in Iselin, New Jersey.
+ - Liberty Broadband Corporation engages in the communications businesses. It operates
+ through GCI Holdings and Charter segments. The GCI Holdings segment provides a
+ range of wireless, data, video, voice, and managed services to residential customers,
+ businesses, governmental entities, and educational and medical institutions primarily
+ in Alaska under the GCI brand. The Charter segment offers subscription-based video
+ services comprising video on demand, high-definition television, and digital video
+ recorder service; local and long-distance calling, voicemail, call waiting, caller
+ ID, call forwarding, and other voice services, as well as international calling
+ services; and Spectrum TV. It also provides internet services, including an in-home
+ Wi-Fi product that provides customers with high-performance wireless routers and
+ managed Wi-Fi services; advanced community Wi-Fi; mobile internet; and a security
+ suite that offers protection against computer viruses and spyware. In addition,
+ this segment offers internet access, data networking, fiber connectivity to cellular
+ towers and office buildings, video entertainment, and business telephone services;
+ advertising services on cable television networks and digital outlets; and operates
+ regional sports and news networks. Liberty Broadband Corporation was incorporated
+ in 2014 and is based in Englewood, Colorado.
+ datasets:
+ - hobbang/stage2-dataset
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on suhwan3/mpnet_step1
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [suhwan3/mpnet_step1](https://huggingface.co/suhwan3/mpnet_step1) on the [stage2-dataset](https://huggingface.co/datasets/hobbang/stage2-dataset) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [suhwan3/mpnet_step1](https://huggingface.co/suhwan3/mpnet_step1) <!-- at revision 8857c26669998d56b0735085b269cfc7890ca67d -->
+ - **Maximum Sequence Length:** 384 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [stage2-dataset](https://huggingface.co/datasets/hobbang/stage2-dataset)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
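The architecture above ends with a `Normalize()` step, and the model card lists cosine similarity as the similarity function. Once vectors are scaled to unit length, their dot product equals their cosine similarity. A small pure-Python sketch of that relationship follows; the helper names and the 2-dimensional toy vectors are assumptions for illustration only.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (what a normalization layer does per embedding)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy embeddings; real ones have 768 dimensions.
a = [3.0, 4.0]
b = [4.0, 3.0]

dot_of_units = dot(l2_normalize(a), l2_normalize(b))
cosine = cosine_similarity(a, b)
print(dot_of_units, cosine)  # both are approximately 0.96
```

Because the embeddings come out already normalized, a plain dot product is enough to score them when speed matters.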
346
+
347
+ ## Usage
348
+
349
+ ### Direct Usage (Sentence Transformers)
350
+
351
+ First install the Sentence Transformers library:
352
+
353
+ ```bash
354
+ pip install -U sentence-transformers
355
+ ```
356
+
357
+ Then you can load this model and run inference.
358
+ ```python
359
+ from sentence_transformers import SentenceTransformer
360
+
361
+ # Download from the 🤗 Hub
362
+ model = SentenceTransformer("sentence_transformers_model_id")
363
+ # Run inference
364
+ sentences = [
365
+ "The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.",
366
+ 'Walgreens Boots Alliance, Inc. operates as a pharmacy-led health and beauty retail company. It operates through two segments, the United States and International. The United States segment sells prescription drugs and an assortment of retail products, including health, wellness, beauty, personal care, consumable, and general merchandise products through its retail drugstores. It also provides central specialty pharmacy services and mail services. As of August 31, 2021, this segment operated 8,965 retail stores under the Walgreens and Duane Reade brands in the United States; and five specialty pharmacies. The International segment sells prescription drugs; and health and wellness, beauty, personal care, and other consumer products through its pharmacy-led health and beauty retail stores and optical practices, as well as through boots.com and an integrated mobile application. It also engages in pharmaceutical wholesaling and distribution business in Germany. As of August 31, 2021, this segment operated 4,031 retail stores under the Boots, Benavides, and Ahumada in the United Kingdom, Thailand, Norway, the Republic of Ireland, the Netherlands, Mexico, and Chile; and 548 optical practices, including 160 on a franchise basis. Walgreens Boots Alliance, Inc. was founded in 1901 and is based in Deerfield, Illinois.',
367
+ 'Liberty Broadband Corporation engages in the communications businesses. It operates through GCI Holdings and Charter segments. The GCI Holdings segment provides a range of wireless, data, video, voice, and managed services to residential customers, businesses, governmental entities, and educational and medical institutions primarily in Alaska under the GCI brand. The Charter segment offers subscription-based video services comprising video on demand, high-definition television, and digital video recorder service; local and long-distance calling, voicemail, call waiting, caller ID, call forwarding, and other voice services, as well as international calling services; and Spectrum TV. It also provides internet services, including an in-home Wi-Fi product that provides customers with high-performance wireless routers and managed Wi-Fi services; advanced community Wi-Fi; mobile internet; and a security suite that offers protection against computer viruses and spyware. In addition, this segment offers internet access, data networking, fiber connectivity to cellular towers and office buildings, video entertainment, and business telephone services; advertising services on cable television networks and digital outlets; and operates regional sports and news networks. Liberty Broadband Corporation was incorporated in 2014 and is based in Englewood, Colorado.',
368
+ ]
369
+ embeddings = model.encode(sentences)
370
+ print(embeddings.shape)
371
+ # (3, 768)
372
+
373
+ # Get the similarity scores for the embeddings
374
+ similarities = model.similarity(embeddings, embeddings)
375
+ print(similarities.shape)
376
+ # torch.Size([3, 3])
377
+ ```
378
+
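The similarity call above can be illustrated without downloading the model. This is a minimal sketch, assuming only what the repository files state: the pipeline ends in a `Normalize` module and `similarity_fn_name` is `cosine`. The random vectors stand in for `model.encode(sentences)` and are not real model outputs; `query` and `order` are illustrative names.

```python
import numpy as np

# Stand-in for model.encode(sentences): random vectors, L2-normalized
# (assumption: real embeddings are unit length because of the Normalize module)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# For unit-length rows, cosine similarity reduces to a plain dot product
similarities = embeddings @ embeddings.T

# Rank the other sentences by similarity to the first one (index 0)
query = 0
order = [int(i) for i in np.argsort(-similarities[query]) if i != query]
```

With real embeddings, `order` would list the remaining descriptions from most to least similar to the first one.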
379
+ <!--
380
+ ### Direct Usage (Transformers)
381
+
382
+ <details><summary>Click to see the direct usage in Transformers</summary>
383
+
384
+ </details>
385
+ -->
386
+
387
+ <!--
388
+ ### Downstream Usage (Sentence Transformers)
389
+
390
+ You can finetune this model on your own dataset.
391
+
392
+ <details><summary>Click to expand</summary>
393
+
394
+ </details>
395
+ -->
396
+
397
+ <!--
398
+ ### Out-of-Scope Use
399
+
400
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
401
+ -->
402
+
403
+ <!--
404
+ ## Bias, Risks and Limitations
405
+
406
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
407
+ -->
408
+
409
+ <!--
410
+ ### Recommendations
411
+
412
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
413
+ -->
414
+
415
+ ## Training Details
416
+
417
+ ### Training Dataset
418
+
419
+ #### stage2-dataset
420
+
421
+ * Dataset: [stage2-dataset](https://huggingface.co/datasets/hobbang/stage2-dataset) at [cd393c2](https://huggingface.co/datasets/hobbang/stage2-dataset/tree/cd393c24f4017971e95aa6f73736f2fcb45e30a0)
422
+ * Size: 128,997 training samples
423
+ * Columns: <code>anchor</code> and <code>positive</code>
424
+ * Approximate statistics based on the first 1000 samples:
425
+ | | anchor | positive |
426
+ |:--------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
427
+ | type | string | string |
428
+ | details | <ul><li>min: 101 tokens</li><li>mean: 143.15 tokens</li><li>max: 186 tokens</li></ul> | <ul><li>min: 35 tokens</li><li>mean: 238.69 tokens</li><li>max: 384 tokens</li></ul> |
429
+ * Samples:
430
+ | anchor | positive |
431
+ |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
432
+ | <code>The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.</code> | <code>JPMorgan Chase & Co. operates as a financial services company worldwide. It operates through four segments: Consumer & Community Banking (CCB), Corporate & Investment Bank (CIB), Commercial Banking (CB), and Asset & Wealth Management (AWM). The CCB segment offers s deposit, investment and lending products, payments, and services to consumers; lending, deposit, and cash management and payment solutions to small businesses; mortgage origination and servicing activities; residential mortgages and home equity loans; and credit card, auto loan, and leasing services. The CIB segment provides investment banking products and services, including corporate strategy and structure advisory, and equity and debt markets capital-raising services, as well as loan origination and syndication; payments and cross-border financing; and cash and derivative instruments, risk management solutions, prime brokerage, and research. This segment also offers securities services, including custody, fund accounting ...</code> |
433
+ | <code>The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.</code> | <code>JPMorgan Chase & Co. operates as a financial services company worldwide. It operates through four segments: Consumer & Community Banking (CCB), Corporate & Investment Bank (CIB), Commercial Banking (CB), and Asset & Wealth Management (AWM). The CCB segment offers s deposit, investment and lending products, payments, and services to consumers; lending, deposit, and cash management and payment solutions to small businesses; mortgage origination and servicing activities; residential mortgages and home equity loans; and credit card, auto loan, and leasing services. The CIB segment provides investment banking products and services, including corporate strategy and structure advisory, and equity and debt markets capital-raising services, as well as loan origination and syndication; payments and cross-border financing; and cash and derivative instruments, risk management solutions, prime brokerage, and research. This segment also offers securities services, including custody, fund accounting ...</code> |
434
+ | <code>The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.</code> | <code>The Allstate Corporation, together with its subsidiaries, provides property and casualty, and other insurance products in the United States and Canada. The company operates through Allstate Protection; Protection Services; Allstate Health and Benefits; and Run-off Property-Liability segments. The Allstate Protection segment offers private passenger auto and homeowners insurance; other personal lines products; and commercial lines products under the Allstate and Encompass brand names. The Protection Services segment provides consumer product protection plans and related technical support for mobile phones, consumer electronics, furniture, and appliances; finance and insurance products, including vehicle service contracts, guaranteed asset protection waivers, road hazard tire and wheel, and paint and fabric protection; towing, jump-start, lockout, fuel delivery, and tire change services; device and mobile data collection services; data and analytic solutions using automotive telematics i...</code> |
435
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
436
+ ```json
437
+ {
438
+ "scale": 20.0,
439
+ "similarity_fct": "cos_sim"
440
+ }
441
+ ```
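To make the two loss parameters concrete, here is a small NumPy sketch of the idea behind `MultipleNegativesRankingLoss`: cosine similarities between every anchor and every positive in the batch are multiplied by `scale`, and a softmax cross-entropy treats the matching positive (the diagonal) as the correct class. It is an illustration under those assumptions, not the library's implementation; `mnrl_loss` is a hypothetical helper name.

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                    # shape (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching positive for anchor i sits on the diagonal
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
aligned = mnrl_loss(anchors, anchors)          # each positive equals its anchor
shuffled = mnrl_loss(anchors, anchors[::-1])   # positives paired with the wrong anchors
```

Every other positive in the batch acts as a negative for a given anchor, which is why the `no_duplicates` batch sampler listed under the hyperparameters matters when the same anchor text appears in several rows.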
442
+
443
+ ### Evaluation Dataset
444
+
445
+ #### stage2-dataset
446
+
447
+ * Dataset: [stage2-dataset](https://huggingface.co/datasets/hobbang/stage2-dataset) at [cd393c2](https://huggingface.co/datasets/hobbang/stage2-dataset/tree/cd393c24f4017971e95aa6f73736f2fcb45e30a0)
448
+ * Size: 16,944 evaluation samples
449
+ * Columns: <code>anchor</code> and <code>positive</code>
450
+ * Approximate statistics based on the first 1000 samples:
451
+ | | anchor | positive |
452
+ |:--------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
453
+ | type | string | string |
454
+ | details | <ul><li>min: 135 tokens</li><li>mean: 149.21 tokens</li><li>max: 214 tokens</li></ul> | <ul><li>min: 42 tokens</li><li>mean: 252.75 tokens</li><li>max: 384 tokens</li></ul> |
455
+ * Samples:
456
+ | anchor | positive |
457
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
458
+ | <code>The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.</code> | <code>Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription ...</code> |
459
+ | <code>The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.</code> | <code>Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company operates in three segments: Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment offers Office, Exchange, SharePoint, Microsoft Teams, Office 365 Security and Compliance, Microsoft Viva, and Skype for Business; Skype, Outlook.com, OneDrive, and LinkedIn; and Dynamics 365, a set of cloud-based and on-premises business solutions for organizations and enterprise divisions. The Intelligent Cloud segment licenses SQL, Windows Servers, Visual Studio, System Center, and related Client Access Licenses; GitHub that provides a collaboration platform and code hosting service for developers; Nuance provides healthcare and enterprise AI solutions; and Azure, a cloud platform. It also offers enterprise support, Microsoft consulting, and nuance professional services to assist customers in developing, de...</code> |
460
+ | <code>The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.</code> | <code>NVIDIA Corporation provides graphics, and compute and networking solutions in the United States, Taiwan, China, and internationally. The company's Graphics segment offers GeForce GPUs for gaming and PCs, the GeForce NOW game streaming service and related infrastructure, and solutions for gaming platforms; Quadro/NVIDIA RTX GPUs for enterprise workstation graphics; vGPU software for cloud-based visual and virtual computing; automotive platforms for infotainment systems; and Omniverse software for building 3D designs and virtual worlds. Its Compute & Networking segment provides Data Center platforms and systems for AI, HPC, and accelerated computing; Mellanox networking and interconnect solutions; automotive AI Cockpit, autonomous driving development agreements, and autonomous vehicle solutions; cryptocurrency mining processors; Jetson for robotics and other embedded platforms; and NVIDIA AI Enterprise and other software. The company's products are used in gaming, professional visualizat...</code> |
461
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
462
+ ```json
463
+ {
464
+ "scale": 20.0,
465
+ "similarity_fct": "cos_sim"
466
+ }
467
+ ```
468
+
469
+ ### Training Hyperparameters
470
+ #### Non-Default Hyperparameters
471
+
472
+ - `eval_strategy`: steps
473
+ - `per_device_train_batch_size`: 64
474
+ - `per_device_eval_batch_size`: 32
475
+ - `learning_rate`: 3e-05
476
+ - `num_train_epochs`: 1
477
+ - `warmup_ratio`: 0.1
478
+ - `bf16`: True
479
+ - `dataloader_drop_last`: True
480
+ - `load_best_model_at_end`: True
481
+ - `batch_sampler`: no_duplicates
482
+
483
+ #### All Hyperparameters
484
+ <details><summary>Click to expand</summary>
485
+
486
+ - `overwrite_output_dir`: False
487
+ - `do_predict`: False
488
+ - `eval_strategy`: steps
489
+ - `prediction_loss_only`: True
490
+ - `per_device_train_batch_size`: 64
491
+ - `per_device_eval_batch_size`: 32
492
+ - `per_gpu_train_batch_size`: None
493
+ - `per_gpu_eval_batch_size`: None
494
+ - `gradient_accumulation_steps`: 1
495
+ - `eval_accumulation_steps`: None
496
+ - `torch_empty_cache_steps`: None
497
+ - `learning_rate`: 3e-05
498
+ - `weight_decay`: 0.0
499
+ - `adam_beta1`: 0.9
500
+ - `adam_beta2`: 0.999
501
+ - `adam_epsilon`: 1e-08
502
+ - `max_grad_norm`: 1.0
503
+ - `num_train_epochs`: 1
504
+ - `max_steps`: -1
505
+ - `lr_scheduler_type`: linear
506
+ - `lr_scheduler_kwargs`: {}
507
+ - `warmup_ratio`: 0.1
508
+ - `warmup_steps`: 0
509
+ - `log_level`: passive
510
+ - `log_level_replica`: warning
511
+ - `log_on_each_node`: True
512
+ - `logging_nan_inf_filter`: True
513
+ - `save_safetensors`: True
514
+ - `save_on_each_node`: False
515
+ - `save_only_model`: False
516
+ - `restore_callback_states_from_checkpoint`: False
517
+ - `no_cuda`: False
518
+ - `use_cpu`: False
519
+ - `use_mps_device`: False
520
+ - `seed`: 42
521
+ - `data_seed`: None
522
+ - `jit_mode_eval`: False
523
+ - `use_ipex`: False
524
+ - `bf16`: True
525
+ - `fp16`: False
526
+ - `fp16_opt_level`: O1
527
+ - `half_precision_backend`: auto
528
+ - `bf16_full_eval`: False
529
+ - `fp16_full_eval`: False
530
+ - `tf32`: None
531
+ - `local_rank`: 0
532
+ - `ddp_backend`: None
533
+ - `tpu_num_cores`: None
534
+ - `tpu_metrics_debug`: False
535
+ - `debug`: []
536
+ - `dataloader_drop_last`: True
537
+ - `dataloader_num_workers`: 0
538
+ - `dataloader_prefetch_factor`: None
539
+ - `past_index`: -1
540
+ - `disable_tqdm`: False
541
+ - `remove_unused_columns`: True
542
+ - `label_names`: None
543
+ - `load_best_model_at_end`: True
544
+ - `ignore_data_skip`: False
545
+ - `fsdp`: []
546
+ - `fsdp_min_num_params`: 0
547
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
548
+ - `tp_size`: 0
549
+ - `fsdp_transformer_layer_cls_to_wrap`: None
550
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
551
+ - `deepspeed`: None
552
+ - `label_smoothing_factor`: 0.0
553
+ - `optim`: adamw_torch
554
+ - `optim_args`: None
555
+ - `adafactor`: False
556
+ - `group_by_length`: False
557
+ - `length_column_name`: length
558
+ - `ddp_find_unused_parameters`: None
559
+ - `ddp_bucket_cap_mb`: None
560
+ - `ddp_broadcast_buffers`: False
561
+ - `dataloader_pin_memory`: True
562
+ - `dataloader_persistent_workers`: False
563
+ - `skip_memory_metrics`: True
564
+ - `use_legacy_prediction_loop`: False
565
+ - `push_to_hub`: False
566
+ - `resume_from_checkpoint`: None
567
+ - `hub_model_id`: None
568
+ - `hub_strategy`: every_save
569
+ - `hub_private_repo`: None
570
+ - `hub_always_push`: False
571
+ - `gradient_checkpointing`: False
572
+ - `gradient_checkpointing_kwargs`: None
573
+ - `include_inputs_for_metrics`: False
574
+ - `include_for_metrics`: []
575
+ - `eval_do_concat_batches`: True
576
+ - `fp16_backend`: auto
577
+ - `push_to_hub_model_id`: None
578
+ - `push_to_hub_organization`: None
579
+ - `mp_parameters`:
580
+ - `auto_find_batch_size`: False
581
+ - `full_determinism`: False
582
+ - `torchdynamo`: None
583
+ - `ray_scope`: last
584
+ - `ddp_timeout`: 1800
585
+ - `torch_compile`: False
586
+ - `torch_compile_backend`: None
587
+ - `torch_compile_mode`: None
588
+ - `include_tokens_per_second`: False
589
+ - `include_num_input_tokens_seen`: False
590
+ - `neftune_noise_alpha`: None
591
+ - `optim_target_modules`: None
592
+ - `batch_eval_metrics`: False
593
+ - `eval_on_start`: False
594
+ - `use_liger_kernel`: False
595
+ - `eval_use_gather_object`: False
596
+ - `average_tokens_across_devices`: False
597
+ - `prompts`: None
598
+ - `batch_sampler`: no_duplicates
599
+ - `multi_dataset_batch_sampler`: proportional
600
+
601
+ </details>
602
+
603
+ ### Training Logs
604
+ <details><summary>Click to expand</summary>
605
+
606
+ | Epoch | Step | Training Loss | Validation Loss |
607
+ |:------:|:----:|:-------------:|:---------------:|
608
+ | 0.0050 | 10 | 4.6656 | - |
609
+ | 0.0099 | 20 | 4.4733 | - |
610
+ | 0.0149 | 30 | 4.0093 | - |
611
+ | 0.0199 | 40 | 3.9259 | - |
612
+ | 0.0248 | 50 | 3.8315 | - |
613
+ | 0.0298 | 60 | 3.673 | - |
614
+ | 0.0347 | 70 | 3.5076 | - |
615
+ | 0.0397 | 80 | 3.4416 | - |
616
+ | 0.0447 | 90 | 3.4362 | - |
617
+ | 0.0496 | 100 | 3.3934 | - |
618
+ | 0.0546 | 110 | 3.3343 | - |
619
+ | 0.0596 | 120 | 3.3018 | - |
620
+ | 0.0645 | 130 | 3.2882 | - |
621
+ | 0.0695 | 140 | 3.3027 | - |
622
+ | 0.0744 | 150 | 3.2177 | - |
623
+ | 0.0794 | 160 | 3.2708 | - |
624
+ | 0.0844 | 170 | 3.2645 | - |
625
+ | 0.0893 | 180 | 3.1939 | - |
626
+ | 0.0943 | 190 | 3.0575 | - |
627
+ | 0.0993 | 200 | 3.0799 | - |
628
+ | 0.1042 | 210 | 3.0824 | - |
629
+ | 0.1092 | 220 | 3.0693 | - |
630
+ | 0.1141 | 230 | 3.1014 | - |
631
+ | 0.1191 | 240 | 3.0458 | - |
632
+ | 0.1241 | 250 | 3.04 | - |
633
+ | 0.1290 | 260 | 3.0311 | - |
634
+ | 0.1340 | 270 | 2.9778 | - |
635
+ | 0.1390 | 280 | 3.0701 | - |
636
+ | 0.1439 | 290 | 2.9039 | - |
637
+ | 0.1489 | 300 | 3.0449 | 2.5685 |
638
+ | 0.1538 | 310 | 2.8896 | - |
639
+ | 0.1588 | 320 | 3.0527 | - |
640
+ | 0.1638 | 330 | 3.0153 | - |
641
+ | 0.1687 | 340 | 2.869 | - |
642
+ | 0.1737 | 350 | 2.9678 | - |
643
+ | 0.1787 | 360 | 2.9756 | - |
644
+ | 0.1836 | 370 | 2.9348 | - |
645
+ | 0.1886 | 380 | 2.9967 | - |
646
+ | 0.1935 | 390 | 2.8953 | - |
647
+ | 0.1985 | 400 | 2.9546 | - |
648
+ | 0.2035 | 410 | 2.9919 | - |
649
+ | 0.2084 | 420 | 2.8487 | - |
650
+ | 0.2134 | 430 | 2.7609 | - |
651
+ | 0.2184 | 440 | 2.9126 | - |
652
+ | 0.2233 | 450 | 2.8991 | - |
653
+ | 0.2283 | 460 | 2.9272 | - |
654
+ | 0.2333 | 470 | 2.9084 | - |
655
+ | 0.2382 | 480 | 2.7963 | - |
656
+ | 0.2432 | 490 | 2.822 | - |
657
+ | 0.2481 | 500 | 2.9376 | - |
658
+ | 0.2531 | 510 | 2.8969 | - |
659
+ | 0.2581 | 520 | 2.7745 | - |
660
+ | 0.2630 | 530 | 2.8103 | - |
661
+ | 0.2680 | 540 | 2.8189 | - |
662
+ | 0.2730 | 550 | 2.8322 | - |
663
+ | 0.2779 | 560 | 2.7627 | - |
664
+ | 0.2829 | 570 | 2.7796 | - |
665
+ | 0.2878 | 580 | 2.8515 | - |
666
+ | 0.2928 | 590 | 2.8758 | - |
667
+ | 0.2978 | 600 | 2.7963 | 2.4142 |
668
+ | 0.3027 | 610 | 2.8259 | - |
669
+ | 0.3077 | 620 | 2.829 | - |
670
+ | 0.3127 | 630 | 2.7699 | - |
671
+ | 0.3176 | 640 | 2.7311 | - |
672
+ | 0.3226 | 650 | 2.735 | - |
673
+ | 0.3275 | 660 | 2.7306 | - |
674
+ | 0.3325 | 670 | 2.7467 | - |
675
+ | 0.3375 | 680 | 2.7494 | - |
676
+ | 0.3424 | 690 | 2.7386 | - |
677
+ | 0.3474 | 700 | 2.8513 | - |
678
+ | 0.3524 | 710 | 2.673 | - |
679
+ | 0.3573 | 720 | 2.8101 | - |
680
+ | 0.3623 | 730 | 2.7527 | - |
681
+ | 0.3672 | 740 | 2.7213 | - |
682
+ | 0.3722 | 750 | 2.753 | - |
683
+ | 0.3772 | 760 | 2.8034 | - |
684
+ | 0.3821 | 770 | 2.8288 | - |
685
+ | 0.3871 | 780 | 2.613 | - |
686
+ | 0.3921 | 790 | 2.7315 | - |
687
+ | 0.3970 | 800 | 2.8077 | - |
688
+ | 0.4020 | 810 | 2.7442 | - |
689
+ | 0.4069 | 820 | 2.7351 | - |
690
+ | 0.4119 | 830 | 2.7643 | - |
691
+ | 0.4169 | 840 | 2.8984 | - |
692
+ | 0.4218 | 850 | 2.7377 | - |
693
+ | 0.4268 | 860 | 2.7021 | - |
694
+ | 0.4318 | 870 | 2.6756 | - |
695
+ | 0.4367 | 880 | 2.7852 | - |
696
+ | 0.4417 | 890 | 2.7531 | - |
697
+ | 0.4467 | 900 | 2.6636 | 2.3456 |
698
+ | 0.4516 | 910 | 2.7089 | - |
699
+ | 0.4566 | 920 | 2.8029 | - |
700
+ | 0.4615 | 930 | 2.721 | - |
701
+ | 0.4665 | 940 | 2.5606 | - |
702
+ | 0.4715 | 950 | 2.6397 | - |
703
+ | 0.4764 | 960 | 2.6563 | - |
704
+ | 0.4814 | 970 | 2.7163 | - |
705
+ | 0.4864 | 980 | 2.6225 | - |
706
+ | 0.4913 | 990 | 2.645 | - |
707
+ | 0.4963 | 1000 | 2.6576 | - |
708
+ | 0.5012 | 1010 | 2.7019 | - |
709
+ | 0.5062 | 1020 | 2.7195 | - |
710
+ | 0.5112 | 1030 | 2.7242 | - |
711
+ | 0.5161 | 1040 | 2.6729 | - |
712
+ | 0.5211 | 1050 | 2.7637 | - |
713
+ | 0.5261 | 1060 | 2.677 | - |
714
+ | 0.5310 | 1070 | 2.7018 | - |
715
+ | 0.5360 | 1080 | 2.6469 | - |
716
+ | 0.5409 | 1090 | 2.7186 | - |
717
+ | 0.5459 | 1100 | 2.6728 | - |
718
+ | 0.5509 | 1110 | 2.6694 | - |
719
+ | 0.5558 | 1120 | 2.7839 | - |
720
+ | 0.5608 | 1130 | 2.5834 | - |
721
+ | 0.5658 | 1140 | 2.6905 | - |
722
+ | 0.5707 | 1150 | 2.7223 | - |
723
+ | 0.5757 | 1160 | 2.7235 | - |
724
+ | 0.5806 | 1170 | 2.636 | - |
725
+ | 0.5856 | 1180 | 2.6314 | - |
726
+ | 0.5906 | 1190 | 2.5941 | - |
727
+ | 0.5955 | 1200 | 2.7827 | 2.2911 |
728
+ | 0.6005 | 1210 | 2.6104 | - |
729
+ | 0.6055 | 1220 | 2.6148 | - |
730
+ | 0.6104 | 1230 | 2.6355 | - |
731
+ | 0.6154 | 1240 | 2.6269 | - |
732
+ | 0.6203 | 1250 | 2.6003 | - |
733
+ | 0.6253 | 1260 | 2.6256 | - |
734
+ | 0.6303 | 1270 | 2.6326 | - |
735
+ | 0.6352 | 1280 | 2.681 | - |
736
+ | 0.6402 | 1290 | 2.5776 | - |
737
+ | 0.6452 | 1300 | 2.7528 | - |
738
+ | 0.6501 | 1310 | 2.6076 | - |
739
+ | 0.6551 | 1320 | 2.5784 | - |
740
+ | 0.6600 | 1330 | 2.6064 | - |
741
+ | 0.6650 | 1340 | 2.5757 | - |
742
+ | 0.6700 | 1350 | 2.5851 | - |
743
+ | 0.6749 | 1360 | 2.6007 | - |
744
+ | 0.6799 | 1370 | 2.5674 | - |
745
+ | 0.6849 | 1380 | 2.6984 | - |
746
+ | 0.6898 | 1390 | 2.6202 | - |
747
+ | 0.6948 | 1400 | 2.6729 | - |
748
+ | 0.6998 | 1410 | 2.6683 | - |
749
+ | 0.7047 | 1420 | 2.6355 | - |
750
+ | 0.7097 | 1430 | 2.6033 | - |
751
+ | 0.7146 | 1440 | 2.6834 | - |
752
+ | 0.7196 | 1450 | 2.6597 | - |
753
+ | 0.7246 | 1460 | 2.6298 | - |
754
+ | 0.7295 | 1470 | 2.6232 | - |
755
+ | 0.7345 | 1480 | 2.5672 | - |
756
+ | 0.7395 | 1490 | 2.5139 | - |
757
+ | 0.7444 | 1500 | 2.6248 | 2.3090 |
758
+ | 0.7494 | 1510 | 2.6417 | - |
759
+ | 0.7543 | 1520 | 2.6197 | - |
760
+ | 0.7593 | 1530 | 2.6911 | - |
761
+ | 0.7643 | 1540 | 2.5542 | - |
762
+ | 0.7692 | 1550 | 2.6584 | - |
763
+ | 0.7742 | 1560 | 2.6182 | - |
764
+ | 0.7792 | 1570 | 2.6301 | - |
765
+ | 0.7841 | 1580 | 2.5629 | - |
766
+ | 0.7891 | 1590 | 2.5965 | - |
767
+ | 0.7940 | 1600 | 2.5722 | - |
768
+ | 0.7990 | 1610 | 2.5835 | - |
769
+ | 0.8040 | 1620 | 2.5901 | - |
770
+ | 0.8089 | 1630 | 2.6055 | - |
771
+ | 0.8139 | 1640 | 2.6019 | - |
772
+ | 0.8189 | 1650 | 2.6421 | - |
773
+ | 0.8238 | 1660 | 2.6049 | - |
774
+ | 0.8288 | 1670 | 2.5351 | - |
775
+ | 0.8337 | 1680 | 2.6158 | - |
776
+ | 0.8387 | 1690 | 2.5994 | - |
777
+ | 0.8437 | 1700 | 2.5816 | - |
778
+ | 0.8486 | 1710 | 2.5848 | - |
779
+ | 0.8536 | 1720 | 2.6138 | - |
780
+ | 0.8586 | 1730 | 2.5811 | - |
781
+ | 0.8635 | 1740 | 2.5933 | - |
782
+ | 0.8685 | 1750 | 2.5869 | - |
783
+ | 0.8734 | 1760 | 2.5464 | - |
784
+ | 0.8784 | 1770 | 2.6842 | - |
785
+ | 0.8834 | 1780 | 2.6312 | - |
786
+ | 0.8883 | 1790 | 2.5621 | - |
787
+ | 0.8933 | 1800 | 2.6103 | 2.2858 |
788
+
789
+ </details>
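The `Epoch` column in the log can be cross-checked from the figures reported earlier in the card. The sketch below assumes a single device (so the effective batch size equals `per_device_train_batch_size`), uses the reported `gradient_accumulation_steps` of 1, and relies on `dataloader_drop_last` discarding the final partial batch; `epoch_at` is an illustrative helper.

```python
train_samples = 128_997   # training set size reported in the card
batch_size = 64           # per_device_train_batch_size (single device assumed)

# dataloader_drop_last=True discards the final partial batch
steps_per_epoch = train_samples // batch_size

def epoch_at(step):
    """Fraction of the epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, epoch_at(10), epoch_at(300), epoch_at(1800))
```

Under these assumptions one epoch is 2,015 steps, and the computed fractions match the logged values at steps 10, 300, and 1800 (0.0050, 0.1489, 0.8933).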
790
+
791
+ ### Framework Versions
792
+ - Python: 3.10.12
793
+ - Sentence Transformers: 4.1.0
794
+ - Transformers: 4.51.3
795
+ - PyTorch: 2.1.0+cu118
796
+ - Accelerate: 1.6.0
797
+ - Datasets: 3.5.0
798
+ - Tokenizers: 0.21.1
799
+
800
+ ## Citation
801
+
802
+ ### BibTeX
803
+
804
+ #### Sentence Transformers
805
+ ```bibtex
806
+ @inproceedings{reimers-2019-sentence-bert,
807
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
808
+ author = "Reimers, Nils and Gurevych, Iryna",
809
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
810
+ month = "11",
811
+ year = "2019",
812
+ publisher = "Association for Computational Linguistics",
813
+ url = "https://arxiv.org/abs/1908.10084",
814
+ }
815
+ ```
816
+
817
+ #### MultipleNegativesRankingLoss
818
+ ```bibtex
819
+ @misc{henderson2017efficient,
820
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
821
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
822
+ year={2017},
823
+ eprint={1705.00652},
824
+ archivePrefix={arXiv},
825
+ primaryClass={cs.CL}
826
+ }
827
+ ```
828
+
829
+ <!--
830
+ ## Glossary
831
+
832
+ *Clearly define terms in order to be accessible across audiences.*
833
+ -->
834
+
835
+ <!--
836
+ ## Model Card Authors
837
+
838
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
839
+ -->
840
+
841
+ <!--
842
+ ## Model Card Contact
843
+
844
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
845
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.1.0",
+     "transformers": "4.51.3",
+     "pytorch": "2.1.0+cu118"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a82d71fb5c4db686344d46bd5f3b89c03175ea4388f90df4bed4961e6715f53
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
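`modules.json` chains three modules: the Transformer encoder, a Pooling layer (configured in `1_Pooling/config.json` with `pooling_mode_mean_tokens: true`), and a Normalize layer. As a rough illustration of what the last two stages compute, here is a standard-library sketch on toy 2-dimensional vectors. It is not the sentence-transformers implementation, which works on batched tensors; the function names `mean_pool`, `l2_normalize`, `embed`, and `dot` are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(token_vecs: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average token vectors where mask == 1; padding positions are skipped."""
    dim = len(token_vecs[0])
    kept = max(sum(mask), 1)  # guard against an all-padding input
    return [
        sum(vec[d] for vec, m in zip(token_vecs, mask) if m) / kept
        for d in range(dim)
    ]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def embed(token_vecs: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Pooling followed by Normalize, mirroring modules 1 and 2."""
    return l2_normalize(mean_pool(token_vecs, mask))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Toy token embeddings; the real model produces 768-dimensional vectors.
a = embed([[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]], mask=[1, 1, 0])  # last token is padding
b = embed([[0.0, 2.0], [0.0, 4.0]], mask=[1, 1])
print(a, b, dot(a, b))
```

Because the final module normalizes every embedding to unit length, the cosine similarity named in `config_sentence_transformers.json` reduces to a plain dot product between embeddings.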
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02801de9884e5ed0a20cd47b4a20c53e0e4061c76d612a1cba170e4b46e9b937
+ size 871331770
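As a rough sanity check, the byte sizes in the `model.safetensors` and `optimizer.pt` pointers can be turned into an approximate parameter count. The sketch below assumes every weight is stored as float32 (4 bytes, in line with `torch_dtype` in `config.json`) and ignores the small file header. The optimizer type is not stated in these files, so reading the size ratio as "two float32 state buffers per parameter", as Adam-style optimizers keep, is an assumption.

```python
# Sizes copied from the Git LFS pointer files in this commit.
MODEL_BYTES = 437_967_672      # model.safetensors
OPTIMIZER_BYTES = 871_331_770  # optimizer.pt
BYTES_PER_FLOAT32 = 4          # assumption: all weights stored as float32


def approx_param_count(num_bytes: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Upper-bound estimate of stored values; file headers are ignored."""
    return num_bytes // bytes_per_value


params = approx_param_count(MODEL_BYTES)
ratio = OPTIMIZER_BYTES / MODEL_BYTES

print(f"approx. parameters: {params / 1e6:.1f}M")
print(f"optimizer/model size ratio: {ratio:.2f}")
```

The estimate comes out at roughly 109.5M parameters, with the optimizer state just under twice the size of the weights.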
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e953d565ab8d03d27a69e4a09113a526233508f1995ed065d8d184b59c250a61
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:abcaaf39545a0471725d97cdeb8ce9b6da32d113c78097ef6a9f53753bf81f30
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
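The `learning_rate` values logged in the `trainer_state.json` that follows rise roughly linearly for the first couple of hundred steps and then fall linearly. The sketch below is one way to reproduce those numbers with a linear warmup followed by a linear decay. The three constants (peak rate 3e-5, 202 warmup steps, 2015 total steps) are not stated anywhere in that file; they are inferred from the logged values and should be read as assumptions, as should the one-step offset between the logged step and the number of completed scheduler updates.

```python
import math

PEAK_LR = 3e-5       # assumption: inferred from the logged values
WARMUP_STEPS = 202   # assumption: inferred from the logged values
TOTAL_STEPS = 2015   # assumption: inferred, about one epoch at this batch size


def linear_schedule(updates_done: int) -> float:
    """Learning rate after `updates_done` scheduler updates."""
    if updates_done < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak rate.
        return PEAK_LR * updates_done / WARMUP_STEPS
    # Linear decay from the peak rate down to 0 at TOTAL_STEPS.
    remaining = max(TOTAL_STEPS - updates_done, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# (logged step, logged learning_rate) pairs copied from log_history.
logged = [
    (10, 1.3366336633663365e-06),
    (100, 1.4702970297029704e-05),
    (210, 2.9884169884169887e-05),
    (1000, 1.68119139547711e-05),
]

for step, lr in logged:
    # The value logged at step N appears to reflect N - 1 completed updates.
    predicted = linear_schedule(step - 1)
    print(step, lr, predicted, math.isclose(predicted, lr, rel_tol=1e-6))
```

If the inferred constants are right, each logged value should agree with the prediction to within floating-point rounding; a mismatch would suggest a different schedule or a different offset.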
trainer_state.json ADDED
@@ -0,0 +1,1351 @@
1
+ {
2
+ "best_global_step": 1800,
3
+ "best_metric": 2.285778284072876,
4
+ "best_model_checkpoint": "models/mini_stage2/checkpoint-1800",
5
+ "epoch": 0.8933002481389578,
6
+ "eval_steps": 300,
7
+ "global_step": 1800,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.004962779156327543,
14
+ "grad_norm": 7.886381149291992,
15
+ "learning_rate": 1.3366336633663365e-06,
16
+ "loss": 4.6656,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.009925558312655087,
21
+ "grad_norm": 5.6866021156311035,
22
+ "learning_rate": 2.821782178217822e-06,
23
+ "loss": 4.4733,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.01488833746898263,
28
+ "grad_norm": 4.5250091552734375,
29
+ "learning_rate": 4.306930693069306e-06,
30
+ "loss": 4.0093,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.019851116625310174,
35
+ "grad_norm": 3.3117995262145996,
36
+ "learning_rate": 5.792079207920792e-06,
37
+ "loss": 3.9259,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.02481389578163772,
42
+ "grad_norm": 3.5785279273986816,
43
+ "learning_rate": 7.277227722772277e-06,
44
+ "loss": 3.8315,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.02977667493796526,
49
+ "grad_norm": 5.32008695602417,
50
+ "learning_rate": 8.762376237623762e-06,
51
+ "loss": 3.673,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.034739454094292806,
56
+ "grad_norm": 4.776355266571045,
57
+ "learning_rate": 1.0247524752475248e-05,
58
+ "loss": 3.5076,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.03970223325062035,
63
+ "grad_norm": 4.523674488067627,
64
+ "learning_rate": 1.1732673267326734e-05,
65
+ "loss": 3.4416,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.04466501240694789,
70
+ "grad_norm": 4.444321155548096,
71
+ "learning_rate": 1.3217821782178218e-05,
72
+ "loss": 3.4362,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.04962779156327544,
77
+ "grad_norm": 4.364451885223389,
78
+ "learning_rate": 1.4702970297029704e-05,
79
+ "loss": 3.3934,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.05459057071960298,
84
+ "grad_norm": 5.098319053649902,
85
+ "learning_rate": 1.618811881188119e-05,
86
+ "loss": 3.3343,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.05955334987593052,
91
+ "grad_norm": 4.762153148651123,
92
+ "learning_rate": 1.7673267326732672e-05,
93
+ "loss": 3.3018,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.06451612903225806,
98
+ "grad_norm": 4.8034563064575195,
99
+ "learning_rate": 1.915841584158416e-05,
100
+ "loss": 3.2882,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.06947890818858561,
105
+ "grad_norm": 4.955429553985596,
106
+ "learning_rate": 2.0643564356435643e-05,
107
+ "loss": 3.3027,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.07444168734491315,
112
+ "grad_norm": 4.338839530944824,
113
+ "learning_rate": 2.212871287128713e-05,
114
+ "loss": 3.2177,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.0794044665012407,
119
+ "grad_norm": 4.854152202606201,
120
+ "learning_rate": 2.3613861386138615e-05,
121
+ "loss": 3.2708,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.08436724565756824,
126
+ "grad_norm": 4.855740070343018,
127
+ "learning_rate": 2.5099009900990097e-05,
128
+ "loss": 3.2645,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.08933002481389578,
133
+ "grad_norm": 5.82074499130249,
134
+ "learning_rate": 2.6584158415841586e-05,
135
+ "loss": 3.1939,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.09429280397022333,
140
+ "grad_norm": 5.284363746643066,
141
+ "learning_rate": 2.8069306930693072e-05,
142
+ "loss": 3.0575,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.09925558312655088,
147
+ "grad_norm": 5.663958549499512,
148
+ "learning_rate": 2.9554455445544555e-05,
149
+ "loss": 3.0799,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.10421836228287841,
154
+ "grad_norm": 5.67520809173584,
155
+ "learning_rate": 2.9884169884169887e-05,
156
+ "loss": 3.0824,
157
+ "step": 210
158
+ },
159
+ {
160
+ "epoch": 0.10918114143920596,
161
+ "grad_norm": 5.432056903839111,
162
+ "learning_rate": 2.9718698290126862e-05,
163
+ "loss": 3.0693,
164
+ "step": 220
165
+ },
166
+ {
167
+ "epoch": 0.1141439205955335,
168
+ "grad_norm": 5.387454032897949,
169
+ "learning_rate": 2.9553226696083837e-05,
170
+ "loss": 3.1014,
171
+ "step": 230
172
+ },
173
+ {
174
+ "epoch": 0.11910669975186104,
175
+ "grad_norm": 5.515647888183594,
176
+ "learning_rate": 2.9387755102040816e-05,
177
+ "loss": 3.0458,
178
+ "step": 240
179
+ },
180
+ {
181
+ "epoch": 0.12406947890818859,
182
+ "grad_norm": 5.962894916534424,
183
+ "learning_rate": 2.9222283507997795e-05,
184
+ "loss": 3.04,
185
+ "step": 250
186
+ },
187
+ {
188
+ "epoch": 0.12903225806451613,
189
+ "grad_norm": 5.549206733703613,
190
+ "learning_rate": 2.9056811913954774e-05,
191
+ "loss": 3.0311,
192
+ "step": 260
193
+ },
194
+ {
195
+ "epoch": 0.13399503722084366,
196
+ "grad_norm": 4.67368221282959,
197
+ "learning_rate": 2.889134031991175e-05,
198
+ "loss": 2.9778,
199
+ "step": 270
200
+ },
201
+ {
202
+ "epoch": 0.13895781637717122,
203
+ "grad_norm": 5.445274829864502,
204
+ "learning_rate": 2.8725868725868724e-05,
205
+ "loss": 3.0701,
206
+ "step": 280
207
+ },
208
+ {
209
+ "epoch": 0.14392059553349876,
210
+ "grad_norm": 5.643553256988525,
211
+ "learning_rate": 2.8560397131825703e-05,
212
+ "loss": 2.9039,
213
+ "step": 290
214
+ },
215
+ {
216
+ "epoch": 0.1488833746898263,
217
+ "grad_norm": 5.453239440917969,
218
+ "learning_rate": 2.8394925537782682e-05,
219
+ "loss": 3.0449,
220
+ "step": 300
221
+ },
222
+ {
223
+ "epoch": 0.1488833746898263,
224
+ "eval_loss": 2.5684616565704346,
225
+ "eval_runtime": 119.1131,
226
+ "eval_samples_per_second": 142.251,
227
+ "eval_steps_per_second": 4.45,
228
+ "step": 300
229
+ },
230
+ {
231
+ "epoch": 0.15384615384615385,
232
+ "grad_norm": 5.800920009613037,
233
+ "learning_rate": 2.822945394373966e-05,
234
+ "loss": 2.8896,
235
+ "step": 310
236
+ },
237
+ {
238
+ "epoch": 0.1588089330024814,
239
+ "grad_norm": 4.8736724853515625,
240
+ "learning_rate": 2.8063982349696636e-05,
241
+ "loss": 3.0527,
242
+ "step": 320
243
+ },
244
+ {
245
+ "epoch": 0.16377171215880892,
246
+ "grad_norm": 4.498941898345947,
247
+ "learning_rate": 2.7898510755653615e-05,
248
+ "loss": 3.0153,
249
+ "step": 330
250
+ },
251
+ {
252
+ "epoch": 0.1687344913151365,
253
+ "grad_norm": 5.518968105316162,
254
+ "learning_rate": 2.773303916161059e-05,
255
+ "loss": 2.869,
256
+ "step": 340
257
+ },
258
+ {
259
+ "epoch": 0.17369727047146402,
260
+ "grad_norm": 5.431845188140869,
261
+ "learning_rate": 2.756756756756757e-05,
262
+ "loss": 2.9678,
263
+ "step": 350
264
+ },
265
+ {
266
+ "epoch": 0.17866004962779156,
267
+ "grad_norm": 5.1342973709106445,
268
+ "learning_rate": 2.7402095973524544e-05,
269
+ "loss": 2.9756,
270
+ "step": 360
271
+ },
272
+ {
273
+ "epoch": 0.18362282878411912,
274
+ "grad_norm": 4.8498101234436035,
275
+ "learning_rate": 2.7236624379481523e-05,
276
+ "loss": 2.9348,
277
+ "step": 370
278
+ },
279
+ {
280
+ "epoch": 0.18858560794044665,
281
+ "grad_norm": 5.223404407501221,
282
+ "learning_rate": 2.70711527854385e-05,
283
+ "loss": 2.9967,
284
+ "step": 380
285
+ },
286
+ {
287
+ "epoch": 0.1935483870967742,
288
+ "grad_norm": 5.078680038452148,
289
+ "learning_rate": 2.690568119139548e-05,
290
+ "loss": 2.8953,
291
+ "step": 390
292
+ },
293
+ {
294
+ "epoch": 0.19851116625310175,
295
+ "grad_norm": 4.336000442504883,
296
+ "learning_rate": 2.6740209597352456e-05,
297
+ "loss": 2.9546,
298
+ "step": 400
299
+ },
300
+ {
301
+ "epoch": 0.20347394540942929,
302
+ "grad_norm": 5.194223880767822,
303
+ "learning_rate": 2.657473800330943e-05,
304
+ "loss": 2.9919,
305
+ "step": 410
306
+ },
307
+ {
308
+ "epoch": 0.20843672456575682,
309
+ "grad_norm": 4.870718002319336,
310
+ "learning_rate": 2.640926640926641e-05,
311
+ "loss": 2.8487,
312
+ "step": 420
313
+ },
314
+ {
315
+ "epoch": 0.21339950372208435,
316
+ "grad_norm": 5.526561260223389,
317
+ "learning_rate": 2.624379481522339e-05,
318
+ "loss": 2.7609,
319
+ "step": 430
320
+ },
321
+ {
322
+ "epoch": 0.21836228287841192,
323
+ "grad_norm": 5.372407913208008,
324
+ "learning_rate": 2.6078323221180364e-05,
325
+ "loss": 2.9126,
326
+ "step": 440
327
+ },
328
+ {
329
+ "epoch": 0.22332506203473945,
330
+ "grad_norm": 4.934298992156982,
331
+ "learning_rate": 2.5912851627137343e-05,
332
+ "loss": 2.8991,
333
+ "step": 450
334
+ },
335
+ {
336
+ "epoch": 0.228287841191067,
337
+ "grad_norm": 4.916905403137207,
338
+ "learning_rate": 2.574738003309432e-05,
339
+ "loss": 2.9272,
340
+ "step": 460
341
+ },
342
+ {
343
+ "epoch": 0.23325062034739455,
344
+ "grad_norm": 5.660480976104736,
345
+ "learning_rate": 2.5581908439051297e-05,
346
+ "loss": 2.9084,
347
+ "step": 470
348
+ },
349
+ {
350
+ "epoch": 0.23821339950372208,
351
+ "grad_norm": 4.987634658813477,
352
+ "learning_rate": 2.5416436845008272e-05,
353
+ "loss": 2.7963,
354
+ "step": 480
355
+ },
356
+ {
357
+ "epoch": 0.24317617866004962,
358
+ "grad_norm": 6.746623992919922,
359
+ "learning_rate": 2.525096525096525e-05,
360
+ "loss": 2.822,
361
+ "step": 490
362
+ },
363
+ {
364
+ "epoch": 0.24813895781637718,
365
+ "grad_norm": 4.658477783203125,
366
+ "learning_rate": 2.508549365692223e-05,
367
+ "loss": 2.9376,
368
+ "step": 500
369
+ },
370
+ {
371
+ "epoch": 0.2531017369727047,
372
+ "grad_norm": 5.2974162101745605,
373
+ "learning_rate": 2.492002206287921e-05,
374
+ "loss": 2.8969,
375
+ "step": 510
376
+ },
377
+ {
378
+ "epoch": 0.25806451612903225,
379
+ "grad_norm": 5.682817459106445,
380
+ "learning_rate": 2.4754550468836184e-05,
381
+ "loss": 2.7745,
382
+ "step": 520
383
+ },
384
+ {
385
+ "epoch": 0.2630272952853598,
386
+ "grad_norm": 5.85469388961792,
387
+ "learning_rate": 2.458907887479316e-05,
388
+ "loss": 2.8103,
389
+ "step": 530
390
+ },
391
+ {
392
+ "epoch": 0.2679900744416873,
393
+ "grad_norm": 4.600194454193115,
394
+ "learning_rate": 2.4423607280750138e-05,
395
+ "loss": 2.8189,
396
+ "step": 540
397
+ },
398
+ {
399
+ "epoch": 0.2729528535980149,
400
+ "grad_norm": 6.168639659881592,
401
+ "learning_rate": 2.4258135686707117e-05,
402
+ "loss": 2.8322,
403
+ "step": 550
404
+ },
405
+ {
406
+ "epoch": 0.27791563275434245,
407
+ "grad_norm": 4.763863563537598,
408
+ "learning_rate": 2.4092664092664092e-05,
409
+ "loss": 2.7627,
410
+ "step": 560
411
+ },
412
+ {
413
+ "epoch": 0.28287841191067,
414
+ "grad_norm": 5.0977091789245605,
415
+ "learning_rate": 2.392719249862107e-05,
416
+ "loss": 2.7796,
417
+ "step": 570
418
+ },
419
+ {
420
+ "epoch": 0.2878411910669975,
421
+ "grad_norm": 4.786524772644043,
422
+ "learning_rate": 2.376172090457805e-05,
423
+ "loss": 2.8515,
424
+ "step": 580
425
+ },
426
+ {
427
+ "epoch": 0.29280397022332505,
428
+ "grad_norm": 5.386171817779541,
429
+ "learning_rate": 2.3596249310535025e-05,
430
+ "loss": 2.8758,
431
+ "step": 590
432
+ },
433
+ {
434
+ "epoch": 0.2977667493796526,
435
+ "grad_norm": 4.96185827255249,
436
+ "learning_rate": 2.3430777716492e-05,
437
+ "loss": 2.7963,
438
+ "step": 600
439
+ },
440
+ {
441
+ "epoch": 0.2977667493796526,
442
+ "eval_loss": 2.4141688346862793,
443
+ "eval_runtime": 112.2744,
444
+ "eval_samples_per_second": 150.916,
445
+ "eval_steps_per_second": 4.721,
446
+ "step": 600
447
+ },
448
+ {
449
+ "epoch": 0.3027295285359802,
450
+ "grad_norm": 7.280220985412598,
451
+ "learning_rate": 2.326530612244898e-05,
452
+ "loss": 2.8259,
453
+ "step": 610
454
+ },
455
+ {
456
+ "epoch": 0.3076923076923077,
457
+ "grad_norm": 5.671541690826416,
458
+ "learning_rate": 2.3099834528405958e-05,
459
+ "loss": 2.829,
460
+ "step": 620
461
+ },
462
+ {
463
+ "epoch": 0.31265508684863524,
464
+ "grad_norm": 5.268543720245361,
465
+ "learning_rate": 2.2934362934362936e-05,
466
+ "loss": 2.7699,
467
+ "step": 630
468
+ },
469
+ {
470
+ "epoch": 0.3176178660049628,
471
+ "grad_norm": 4.945257186889648,
472
+ "learning_rate": 2.2768891340319915e-05,
473
+ "loss": 2.7311,
474
+ "step": 640
475
+ },
476
+ {
477
+ "epoch": 0.3225806451612903,
478
+ "grad_norm": 5.198265075683594,
479
+ "learning_rate": 2.2603419746276887e-05,
480
+ "loss": 2.735,
481
+ "step": 650
482
+ },
483
+ {
484
+ "epoch": 0.32754342431761785,
485
+ "grad_norm": 4.624203205108643,
486
+ "learning_rate": 2.2437948152233866e-05,
487
+ "loss": 2.7306,
488
+ "step": 660
489
+ },
490
+ {
491
+ "epoch": 0.3325062034739454,
492
+ "grad_norm": 5.143148899078369,
493
+ "learning_rate": 2.2272476558190845e-05,
494
+ "loss": 2.7467,
495
+ "step": 670
496
+ },
497
+ {
498
+ "epoch": 0.337468982630273,
499
+ "grad_norm": 4.643360614776611,
500
+ "learning_rate": 2.2107004964147823e-05,
501
+ "loss": 2.7494,
502
+ "step": 680
503
+ },
504
+ {
505
+ "epoch": 0.3424317617866005,
506
+ "grad_norm": 5.404439926147461,
507
+ "learning_rate": 2.19415333701048e-05,
508
+ "loss": 2.7386,
509
+ "step": 690
510
+ },
511
+ {
512
+ "epoch": 0.34739454094292804,
513
+ "grad_norm": 4.590571403503418,
514
+ "learning_rate": 2.1776061776061778e-05,
515
+ "loss": 2.8513,
516
+ "step": 700
517
+ },
518
+ {
519
+ "epoch": 0.3523573200992556,
520
+ "grad_norm": 5.185126781463623,
521
+ "learning_rate": 2.1610590182018756e-05,
522
+ "loss": 2.673,
523
+ "step": 710
524
+ },
525
+ {
526
+ "epoch": 0.3573200992555831,
527
+ "grad_norm": 5.4743547439575195,
528
+ "learning_rate": 2.144511858797573e-05,
529
+ "loss": 2.8101,
530
+ "step": 720
531
+ },
532
+ {
533
+ "epoch": 0.36228287841191065,
534
+ "grad_norm": 5.463687896728516,
535
+ "learning_rate": 2.1279646993932707e-05,
536
+ "loss": 2.7527,
537
+ "step": 730
538
+ },
539
+ {
540
+ "epoch": 0.36724565756823824,
541
+ "grad_norm": 6.102997303009033,
542
+ "learning_rate": 2.1114175399889686e-05,
543
+ "loss": 2.7213,
544
+ "step": 740
545
+ },
546
+ {
547
+ "epoch": 0.37220843672456577,
548
+ "grad_norm": 5.1998724937438965,
549
+ "learning_rate": 2.0948703805846664e-05,
550
+ "loss": 2.753,
551
+ "step": 750
552
+ },
553
+ {
554
+ "epoch": 0.3771712158808933,
555
+ "grad_norm": 5.181605815887451,
556
+ "learning_rate": 2.0783232211803643e-05,
557
+ "loss": 2.8034,
558
+ "step": 760
559
+ },
560
+ {
561
+ "epoch": 0.38213399503722084,
562
+ "grad_norm": 5.865420818328857,
563
+ "learning_rate": 2.061776061776062e-05,
564
+ "loss": 2.8288,
565
+ "step": 770
566
+ },
567
+ {
568
+ "epoch": 0.3870967741935484,
569
+ "grad_norm": 4.965085029602051,
570
+ "learning_rate": 2.0452289023717594e-05,
571
+ "loss": 2.613,
572
+ "step": 780
573
+ },
574
+ {
575
+ "epoch": 0.3920595533498759,
576
+ "grad_norm": 5.534601211547852,
577
+ "learning_rate": 2.0286817429674573e-05,
578
+ "loss": 2.7315,
579
+ "step": 790
580
+ },
581
+ {
582
+ "epoch": 0.3970223325062035,
583
+ "grad_norm": 5.220632076263428,
584
+ "learning_rate": 2.012134583563155e-05,
585
+ "loss": 2.8077,
586
+ "step": 800
587
+ },
588
+ {
589
+ "epoch": 0.40198511166253104,
590
+ "grad_norm": 5.059537410736084,
591
+ "learning_rate": 1.9955874241588527e-05,
592
+ "loss": 2.7442,
593
+ "step": 810
594
+ },
595
+ {
596
+ "epoch": 0.40694789081885857,
597
+ "grad_norm": 5.325795650482178,
598
+ "learning_rate": 1.9790402647545506e-05,
599
+ "loss": 2.7351,
600
+ "step": 820
601
+ },
602
+ {
603
+ "epoch": 0.4119106699751861,
604
+ "grad_norm": 8.455648422241211,
605
+ "learning_rate": 1.9624931053502484e-05,
606
+ "loss": 2.7643,
607
+ "step": 830
608
+ },
609
+ {
610
+ "epoch": 0.41687344913151364,
611
+ "grad_norm": 5.344241619110107,
612
+ "learning_rate": 1.945945945945946e-05,
613
+ "loss": 2.8984,
614
+ "step": 840
615
+ },
616
+ {
617
+ "epoch": 0.4218362282878412,
618
+ "grad_norm": 4.816617965698242,
619
+ "learning_rate": 1.9293987865416435e-05,
620
+ "loss": 2.7377,
621
+ "step": 850
622
+ },
623
+ {
624
+ "epoch": 0.4267990074441687,
625
+ "grad_norm": 5.334592819213867,
626
+ "learning_rate": 1.9128516271373414e-05,
627
+ "loss": 2.7021,
628
+ "step": 860
629
+ },
630
+ {
631
+ "epoch": 0.4317617866004963,
632
+ "grad_norm": 5.0712385177612305,
633
+ "learning_rate": 1.8963044677330393e-05,
634
+ "loss": 2.6756,
635
+ "step": 870
636
+ },
637
+ {
638
+ "epoch": 0.43672456575682383,
639
+ "grad_norm": 5.225603103637695,
640
+ "learning_rate": 1.879757308328737e-05,
641
+ "loss": 2.7852,
642
+ "step": 880
643
+ },
644
+ {
645
+ "epoch": 0.44168734491315137,
646
+ "grad_norm": 5.172723770141602,
647
+ "learning_rate": 1.8632101489244347e-05,
648
+ "loss": 2.7531,
649
+ "step": 890
650
+ },
651
+ {
652
+ "epoch": 0.4466501240694789,
653
+ "grad_norm": 4.887333869934082,
654
+ "learning_rate": 1.8466629895201322e-05,
655
+ "loss": 2.6636,
656
+ "step": 900
657
+ },
658
+ {
659
+ "epoch": 0.4466501240694789,
660
+ "eval_loss": 2.3456013202667236,
661
+ "eval_runtime": 113.8402,
662
+ "eval_samples_per_second": 148.84,
663
+ "eval_steps_per_second": 4.656,
664
+ "step": 900
665
+ },
666
+ {
667
+ "epoch": 0.45161290322580644,
668
+ "grad_norm": 5.249894142150879,
669
+ "learning_rate": 1.83011583011583e-05,
670
+ "loss": 2.7089,
671
+ "step": 910
672
+ },
673
+ {
674
+ "epoch": 0.456575682382134,
675
+ "grad_norm": 4.984340667724609,
676
+ "learning_rate": 1.813568670711528e-05,
677
+ "loss": 2.8029,
678
+ "step": 920
679
+ },
680
+ {
681
+ "epoch": 0.46153846153846156,
682
+ "grad_norm": 5.297105312347412,
683
+ "learning_rate": 1.7970215113072255e-05,
684
+ "loss": 2.721,
685
+ "step": 930
686
+ },
687
+ {
688
+ "epoch": 0.4665012406947891,
689
+ "grad_norm": 5.074361801147461,
690
+ "learning_rate": 1.7804743519029234e-05,
691
+ "loss": 2.5606,
692
+ "step": 940
693
+ },
694
+ {
695
+ "epoch": 0.47146401985111663,
696
+ "grad_norm": 5.639915943145752,
697
+ "learning_rate": 1.7639271924986212e-05,
698
+ "loss": 2.6397,
699
+ "step": 950
700
+ },
701
+ {
702
+ "epoch": 0.47642679900744417,
703
+ "grad_norm": 5.445567607879639,
704
+ "learning_rate": 1.7473800330943188e-05,
705
+ "loss": 2.6563,
706
+ "step": 960
707
+ },
708
+ {
709
+ "epoch": 0.4813895781637717,
710
+ "grad_norm": 4.462475299835205,
711
+ "learning_rate": 1.7308328736900166e-05,
712
+ "loss": 2.7163,
713
+ "step": 970
714
+ },
715
+ {
716
+ "epoch": 0.48635235732009924,
717
+ "grad_norm": 5.833073139190674,
718
+ "learning_rate": 1.7142857142857142e-05,
719
+ "loss": 2.6225,
720
+ "step": 980
721
+ },
722
+ {
723
+ "epoch": 0.4913151364764268,
724
+ "grad_norm": 5.428512096405029,
725
+ "learning_rate": 1.697738554881412e-05,
726
+ "loss": 2.645,
727
+ "step": 990
728
+ },
729
+ {
730
+ "epoch": 0.49627791563275436,
731
+ "grad_norm": 4.768510818481445,
732
+ "learning_rate": 1.68119139547711e-05,
733
+ "loss": 2.6576,
734
+ "step": 1000
735
+ },
736
+ {
737
+ "epoch": 0.5012406947890818,
738
+ "grad_norm": 4.692178726196289,
739
+ "learning_rate": 1.6646442360728078e-05,
740
+ "loss": 2.7019,
741
+ "step": 1010
742
+ },
743
+ {
744
+ "epoch": 0.5062034739454094,
745
+ "grad_norm": 5.51165771484375,
746
+ "learning_rate": 1.6480970766685053e-05,
747
+ "loss": 2.7195,
748
+ "step": 1020
749
+ },
750
+ {
751
+ "epoch": 0.511166253101737,
752
+ "grad_norm": 4.9710516929626465,
753
+ "learning_rate": 1.631549917264203e-05,
754
+ "loss": 2.7242,
755
+ "step": 1030
756
+ },
757
+ {
758
+ "epoch": 0.5161290322580645,
759
+ "grad_norm": 5.166830062866211,
760
+ "learning_rate": 1.6150027578599008e-05,
761
+ "loss": 2.6729,
762
+ "step": 1040
763
+ },
764
+ {
765
+ "epoch": 0.5210918114143921,
766
+ "grad_norm": 4.945868015289307,
767
+ "learning_rate": 1.5984555984555986e-05,
768
+ "loss": 2.7637,
769
+ "step": 1050
770
+ },
771
+ {
772
+ "epoch": 0.5260545905707196,
773
+ "grad_norm": 5.7991943359375,
774
+ "learning_rate": 1.581908439051296e-05,
775
+ "loss": 2.677,
776
+ "step": 1060
777
+ },
778
+ {
779
+ "epoch": 0.5310173697270472,
780
+ "grad_norm": 5.255971431732178,
781
+ "learning_rate": 1.565361279646994e-05,
782
+ "loss": 2.7018,
783
+ "step": 1070
784
+ },
785
+ {
786
+ "epoch": 0.5359801488833746,
787
+ "grad_norm": 5.910732746124268,
788
+ "learning_rate": 1.548814120242692e-05,
789
+ "loss": 2.6469,
790
+ "step": 1080
791
+ },
792
+ {
793
+ "epoch": 0.5409429280397022,
794
+ "grad_norm": 4.5612616539001465,
795
+ "learning_rate": 1.5322669608383895e-05,
796
+ "loss": 2.7186,
797
+ "step": 1090
798
+ },
799
+ {
800
+ "epoch": 0.5459057071960298,
801
+ "grad_norm": 5.111081600189209,
802
+ "learning_rate": 1.515719801434087e-05,
803
+ "loss": 2.6728,
804
+ "step": 1100
805
+ },
806
+ {
807
+ "epoch": 0.5508684863523573,
808
+ "grad_norm": 5.450465202331543,
809
+ "learning_rate": 1.4991726420297849e-05,
810
+ "loss": 2.6694,
811
+ "step": 1110
812
+ },
813
+ {
814
+ "epoch": 0.5558312655086849,
815
+ "grad_norm": 4.850940704345703,
816
+ "learning_rate": 1.4826254826254827e-05,
817
+ "loss": 2.7839,
818
+ "step": 1120
819
+ },
820
+ {
821
+ "epoch": 0.5607940446650124,
822
+ "grad_norm": 5.036434173583984,
823
+ "learning_rate": 1.4660783232211803e-05,
824
+ "loss": 2.5834,
825
+ "step": 1130
826
+ },
827
+ {
828
+ "epoch": 0.56575682382134,
829
+ "grad_norm": 40.256370544433594,
830
+ "learning_rate": 1.4495311638168781e-05,
831
+ "loss": 2.6905,
832
+ "step": 1140
833
+ },
834
+ {
835
+ "epoch": 0.5707196029776674,
836
+ "grad_norm": 5.939204692840576,
837
+ "learning_rate": 1.4329840044125759e-05,
838
+ "loss": 2.7223,
839
+ "step": 1150
840
+ },
841
+ {
842
+ "epoch": 0.575682382133995,
843
+ "grad_norm": 5.288106441497803,
844
+ "learning_rate": 1.4164368450082736e-05,
845
+ "loss": 2.7235,
846
+ "step": 1160
847
+ },
848
+ {
849
+ "epoch": 0.5806451612903226,
850
+ "grad_norm": 5.540660858154297,
851
+ "learning_rate": 1.3998896856039713e-05,
852
+ "loss": 2.636,
853
+ "step": 1170
854
+ },
855
+ {
856
+ "epoch": 0.5856079404466501,
857
+ "grad_norm": 5.182808876037598,
858
+ "learning_rate": 1.3833425261996691e-05,
859
+ "loss": 2.6314,
860
+ "step": 1180
861
+ },
862
+ {
863
+ "epoch": 0.5905707196029777,
864
+ "grad_norm": 4.9914140701293945,
865
+ "learning_rate": 1.3667953667953668e-05,
866
+ "loss": 2.5941,
867
+ "step": 1190
868
+ },
869
+ {
870
+ "epoch": 0.5955334987593052,
871
+ "grad_norm": 4.8229475021362305,
872
+ "learning_rate": 1.3502482073910646e-05,
873
+ "loss": 2.7827,
874
+ "step": 1200
875
+ },
876
+ {
877
+ "epoch": 0.5955334987593052,
878
+ "eval_loss": 2.2910733222961426,
879
+ "eval_runtime": 115.8769,
880
+ "eval_samples_per_second": 146.224,
881
+ "eval_steps_per_second": 4.574,
882
+ "step": 1200
883
+ },
884
+ {
885
+ "epoch": 0.6004962779156328,
886
+ "grad_norm": 5.556408882141113,
887
+ "learning_rate": 1.3337010479867623e-05,
888
+ "loss": 2.6104,
889
+ "step": 1210
890
+ },
891
+ {
892
+ "epoch": 0.6054590570719603,
893
+ "grad_norm": 6.0620551109313965,
894
+ "learning_rate": 1.3171538885824601e-05,
895
+ "loss": 2.6148,
896
+ "step": 1220
897
+ },
898
+ {
899
+ "epoch": 0.6104218362282878,
900
+ "grad_norm": 5.285867214202881,
901
+ "learning_rate": 1.3006067291781577e-05,
902
+ "loss": 2.6355,
903
+ "step": 1230
904
+ },
905
+ {
906
+ "epoch": 0.6153846153846154,
907
+ "grad_norm": 5.499397277832031,
908
+ "learning_rate": 1.2840595697738555e-05,
909
+ "loss": 2.6269,
910
+ "step": 1240
911
+ },
912
+ {
913
+ "epoch": 0.6203473945409429,
914
+ "grad_norm": 4.932219505310059,
915
+ "learning_rate": 1.2675124103695532e-05,
916
+ "loss": 2.6003,
917
+ "step": 1250
918
+ },
919
+ {
920
+ "epoch": 0.6253101736972705,
921
+ "grad_norm": 5.04417610168457,
922
+ "learning_rate": 1.250965250965251e-05,
923
+ "loss": 2.6256,
924
+ "step": 1260
925
+ },
926
+ {
927
+ "epoch": 0.630272952853598,
928
+ "grad_norm": 5.0753679275512695,
929
+ "learning_rate": 1.2344180915609488e-05,
930
+ "loss": 2.6326,
931
+ "step": 1270
932
+ },
933
+ {
934
+ "epoch": 0.6352357320099256,
935
+ "grad_norm": 5.003329753875732,
936
+ "learning_rate": 1.2178709321566465e-05,
937
+ "loss": 2.681,
938
+ "step": 1280
939
+ },
940
+ {
941
+ "epoch": 0.6401985111662531,
942
+ "grad_norm": 5.094631671905518,
943
+ "learning_rate": 1.2013237727523442e-05,
944
+ "loss": 2.5776,
945
+ "step": 1290
946
+ },
947
+ {
948
+ "epoch": 0.6451612903225806,
949
+ "grad_norm": 4.915186405181885,
950
+ "learning_rate": 1.184776613348042e-05,
951
+ "loss": 2.7528,
952
+ "step": 1300
953
+ },
954
+ {
955
+ "epoch": 0.6501240694789082,
956
+ "grad_norm": 5.3234076499938965,
957
+ "learning_rate": 1.1682294539437398e-05,
958
+ "loss": 2.6076,
959
+ "step": 1310
960
+ },
961
+ {
962
+ "epoch": 0.6550868486352357,
963
+ "grad_norm": 6.5297064781188965,
964
+ "learning_rate": 1.1516822945394374e-05,
965
+ "loss": 2.5784,
966
+ "step": 1320
967
+ },
968
+ {
969
+ "epoch": 0.6600496277915633,
970
+ "grad_norm": 5.549312591552734,
971
+ "learning_rate": 1.1351351351351352e-05,
972
+ "loss": 2.6064,
973
+ "step": 1330
974
+ },
975
+ {
976
+ "epoch": 0.6650124069478908,
977
+ "grad_norm": 4.589717388153076,
978
+ "learning_rate": 1.118587975730833e-05,
979
+ "loss": 2.5757,
980
+ "step": 1340
981
+ },
982
+ {
983
+ "epoch": 0.6699751861042184,
984
+ "grad_norm": 6.217405796051025,
985
+ "learning_rate": 1.1020408163265306e-05,
986
+ "loss": 2.5851,
987
+ "step": 1350
988
+ },
989
+ {
990
+ "epoch": 0.674937965260546,
991
+ "grad_norm": 5.602436542510986,
992
+ "learning_rate": 1.0854936569222283e-05,
993
+ "loss": 2.6007,
994
+ "step": 1360
995
+ },
996
+ {
997
+ "epoch": 0.6799007444168734,
998
+ "grad_norm": 4.934968948364258,
999
+ "learning_rate": 1.0689464975179262e-05,
1000
+ "loss": 2.5674,
1001
+ "step": 1370
1002
+ },
1003
+ {
1004
+ "epoch": 0.684863523573201,
1005
+ "grad_norm": 5.8763322830200195,
1006
+ "learning_rate": 1.0523993381136238e-05,
1007
+ "loss": 2.6984,
1008
+ "step": 1380
1009
+ },
1010
+ {
1011
+ "epoch": 0.6898263027295285,
1012
+ "grad_norm": 5.077223777770996,
1013
+ "learning_rate": 1.0358521787093216e-05,
1014
+ "loss": 2.6202,
1015
+ "step": 1390
1016
+ },
1017
+ {
1018
+ "epoch": 0.6947890818858561,
1019
+ "grad_norm": 6.423431873321533,
1020
+ "learning_rate": 1.0193050193050193e-05,
1021
+ "loss": 2.6729,
1022
+ "step": 1400
1023
+ },
1024
+ {
1025
+ "epoch": 0.6997518610421837,
1026
+ "grad_norm": 5.683679103851318,
1027
+ "learning_rate": 1.002757859900717e-05,
1028
+ "loss": 2.6683,
1029
+ "step": 1410
1030
+ },
1031
+ {
1032
+ "epoch": 0.7047146401985112,
1033
+ "grad_norm": 5.217991828918457,
1034
+ "learning_rate": 9.862107004964148e-06,
1035
+ "loss": 2.6355,
1036
+ "step": 1420
1037
+ },
1038
+ {
1039
+ "epoch": 0.7096774193548387,
1040
+ "grad_norm": 5.0821332931518555,
1041
+ "learning_rate": 9.696635410921126e-06,
1042
+ "loss": 2.6033,
1043
+ "step": 1430
1044
+ },
1045
+ {
1046
+ "epoch": 0.7146401985111662,
1047
+ "grad_norm": 5.337713718414307,
1048
+ "learning_rate": 9.531163816878102e-06,
1049
+ "loss": 2.6834,
1050
+ "step": 1440
1051
+ },
1052
+ {
1053
+ "epoch": 0.7196029776674938,
1054
+ "grad_norm": 5.425897598266602,
1055
+ "learning_rate": 9.36569222283508e-06,
1056
+ "loss": 2.6597,
1057
+ "step": 1450
1058
+ },
1059
+ {
1060
+ "epoch": 0.7245657568238213,
1061
+ "grad_norm": 4.878344535827637,
1062
+ "learning_rate": 9.200220628792057e-06,
1063
+ "loss": 2.6298,
1064
+ "step": 1460
1065
+ },
1066
+ {
1067
+ "epoch": 0.7295285359801489,
1068
+ "grad_norm": 5.754893779754639,
1069
+ "learning_rate": 9.034749034749034e-06,
1070
+ "loss": 2.6232,
1071
+ "step": 1470
1072
+ },
1073
+ {
1074
+ "epoch": 0.7344913151364765,
1075
+ "grad_norm": 4.991476535797119,
1076
+ "learning_rate": 8.869277440706012e-06,
1077
+ "loss": 2.5672,
1078
+ "step": 1480
1079
+ },
1080
+ {
1081
+ "epoch": 0.739454094292804,
1082
+ "grad_norm": 6.236905097961426,
1083
+ "learning_rate": 8.70380584666299e-06,
1084
+ "loss": 2.5139,
1085
+ "step": 1490
1086
+ },
1087
+ {
1088
+ "epoch": 0.7444168734491315,
1089
+ "grad_norm": 4.892019748687744,
1090
+ "learning_rate": 8.538334252619966e-06,
1091
+ "loss": 2.6248,
1092
+ "step": 1500
1093
+ },
1094
+ {
1095
+ "epoch": 0.7444168734491315,
1096
+ "eval_loss": 2.3090243339538574,
1097
+ "eval_runtime": 114.529,
1098
+ "eval_samples_per_second": 147.945,
1099
+ "eval_steps_per_second": 4.628,
1100
+ "step": 1500
1101
+ },
1102
+ {
1103
+ "epoch": 0.749379652605459,
1104
+ "grad_norm": 5.293705463409424,
1105
+ "learning_rate": 8.372862658576944e-06,
1106
+ "loss": 2.6417,
1107
+ "step": 1510
1108
+ },
1109
+ {
1110
+ "epoch": 0.7543424317617866,
1111
+ "grad_norm": 6.500164031982422,
1112
+ "learning_rate": 8.207391064533921e-06,
1113
+ "loss": 2.6197,
1114
+ "step": 1520
1115
+ },
1116
+ {
1117
+ "epoch": 0.7593052109181141,
1118
+ "grad_norm": 5.456136226654053,
1119
+ "learning_rate": 8.0419194704909e-06,
1120
+ "loss": 2.6911,
1121
+ "step": 1530
1122
+ },
1123
+ {
1124
+ "epoch": 0.7642679900744417,
1125
+ "grad_norm": 5.395550727844238,
1126
+ "learning_rate": 7.876447876447876e-06,
1127
+ "loss": 2.5542,
1128
+ "step": 1540
1129
+ },
1130
+ {
1131
+ "epoch": 0.7692307692307693,
1132
+ "grad_norm": 5.446476936340332,
1133
+ "learning_rate": 7.710976282404854e-06,
1134
+ "loss": 2.6584,
1135
+ "step": 1550
1136
+ },
1137
+ {
1138
+ "epoch": 0.7741935483870968,
1139
+ "grad_norm": 5.699354648590088,
1140
+ "learning_rate": 7.5455046883618305e-06,
1141
+ "loss": 2.6182,
1142
+ "step": 1560
1143
+ },
1144
+ {
1145
+ "epoch": 0.7791563275434243,
1146
+ "grad_norm": 5.835291862487793,
1147
+ "learning_rate": 7.3800330943188084e-06,
1148
+ "loss": 2.6301,
1149
+ "step": 1570
1150
+ },
1151
+ {
1152
+ "epoch": 0.7841191066997518,
1153
+ "grad_norm": 5.066339015960693,
1154
+ "learning_rate": 7.214561500275786e-06,
1155
+ "loss": 2.5629,
1156
+ "step": 1580
1157
+ },
1158
+ {
1159
+ "epoch": 0.7890818858560794,
1160
+ "grad_norm": 6.113178253173828,
1161
+ "learning_rate": 7.049089906232763e-06,
1162
+ "loss": 2.5965,
1163
+ "step": 1590
1164
+ },
1165
+ {
1166
+ "epoch": 0.794044665012407,
1167
+ "grad_norm": 5.544576168060303,
1168
+ "learning_rate": 6.8836183121897404e-06,
1169
+ "loss": 2.5722,
1170
+ "step": 1600
1171
+ },
1172
+ {
1173
+ "epoch": 0.7990074441687345,
1174
+ "grad_norm": 5.560300350189209,
1175
+ "learning_rate": 6.718146718146718e-06,
1176
+ "loss": 2.5835,
1177
+ "step": 1610
1178
+ },
1179
+ {
1180
+ "epoch": 0.8039702233250621,
1181
+ "grad_norm": 5.354392051696777,
1182
+ "learning_rate": 6.552675124103695e-06,
1183
+ "loss": 2.5901,
1184
+ "step": 1620
1185
+ },
1186
+ {
1187
+ "epoch": 0.8089330024813896,
1188
+ "grad_norm": 5.256809234619141,
1189
+ "learning_rate": 6.387203530060673e-06,
1190
+ "loss": 2.6055,
1191
+ "step": 1630
1192
+ },
1193
+ {
1194
+ "epoch": 0.8138957816377171,
1195
+ "grad_norm": 6.089339256286621,
1196
+ "learning_rate": 6.22173193601765e-06,
1197
+ "loss": 2.6019,
1198
+ "step": 1640
1199
+ },
1200
+ {
1201
+ "epoch": 0.8188585607940446,
1202
+ "grad_norm": 4.713355541229248,
1203
+ "learning_rate": 6.056260341974627e-06,
1204
+ "loss": 2.6421,
1205
+ "step": 1650
1206
+ },
1207
+ {
1208
+ "epoch": 0.8238213399503722,
1209
+ "grad_norm": 6.513923168182373,
1210
+ "learning_rate": 5.890788747931605e-06,
1211
+ "loss": 2.6049,
1212
+ "step": 1660
1213
+ },
1214
+ {
1215
+ "epoch": 0.8287841191066998,
1216
+ "grad_norm": 5.37587833404541,
1217
+ "learning_rate": 5.725317153888582e-06,
1218
+ "loss": 2.5351,
1219
+ "step": 1670
1220
+ },
1221
+ {
1222
+ "epoch": 0.8337468982630273,
1223
+ "grad_norm": 6.346861362457275,
1224
+ "learning_rate": 5.5598455598455594e-06,
1225
+ "loss": 2.6158,
1226
+ "step": 1680
1227
+ },
1228
+ {
1229
+ "epoch": 0.8387096774193549,
1230
+ "grad_norm": 5.206553936004639,
1231
+ "learning_rate": 5.394373965802537e-06,
1232
+ "loss": 2.5994,
1233
+ "step": 1690
1234
+ },
1235
+ {
1236
+ "epoch": 0.8436724565756824,
1237
+ "grad_norm": 6.567471504211426,
1238
+ "learning_rate": 5.228902371759514e-06,
1239
+ "loss": 2.5816,
1240
+ "step": 1700
1241
+ },
1242
+ {
1243
+ "epoch": 0.8486352357320099,
1244
+ "grad_norm": 5.689151287078857,
1245
+ "learning_rate": 5.0634307777164914e-06,
1246
+ "loss": 2.5848,
1247
+ "step": 1710
1248
+ },
1249
+ {
1250
+ "epoch": 0.8535980148883374,
1251
+ "grad_norm": 5.288684844970703,
1252
+ "learning_rate": 4.897959183673469e-06,
1253
+ "loss": 2.6138,
1254
+ "step": 1720
1255
+ },
1256
+ {
1257
+ "epoch": 0.858560794044665,
1258
+ "grad_norm": 5.238705635070801,
1259
+ "learning_rate": 4.732487589630447e-06,
1260
+ "loss": 2.5811,
1261
+ "step": 1730
1262
+ },
1263
+ {
1264
+ "epoch": 0.8635235732009926,
1265
+ "grad_norm": 5.114046573638916,
1266
+ "learning_rate": 4.567015995587424e-06,
1267
+ "loss": 2.5933,
1268
+ "step": 1740
1269
+ },
1270
+ {
1271
+ "epoch": 0.8684863523573201,
1272
+ "grad_norm": 6.13997220993042,
1273
+ "learning_rate": 4.401544401544402e-06,
1274
+ "loss": 2.5869,
1275
+ "step": 1750
1276
+ },
1277
+ {
1278
+ "epoch": 0.8734491315136477,
1279
+ "grad_norm": 5.258871555328369,
1280
+ "learning_rate": 4.236072807501379e-06,
1281
+ "loss": 2.5464,
1282
+ "step": 1760
1283
+ },
1284
+ {
1285
+ "epoch": 0.8784119106699751,
1286
+ "grad_norm": 5.823874473571777,
1287
+ "learning_rate": 4.070601213458356e-06,
1288
+ "loss": 2.6842,
1289
+ "step": 1770
1290
+ },
1291
+ {
1292
+ "epoch": 0.8833746898263027,
1293
+ "grad_norm": 5.351442337036133,
1294
+ "learning_rate": 3.905129619415334e-06,
1295
+ "loss": 2.6312,
1296
+ "step": 1780
1297
+ },
1298
+ {
1299
+ "epoch": 0.8883374689826302,
1300
+ "grad_norm": 5.9153947830200195,
1301
+ "learning_rate": 3.739658025372311e-06,
1302
+ "loss": 2.5621,
1303
+ "step": 1790
1304
+ },
1305
+ {
1306
+ "epoch": 0.8933002481389578,
1307
+ "grad_norm": 5.032203197479248,
1308
+ "learning_rate": 3.5741864313292883e-06,
1309
+ "loss": 2.6103,
1310
+ "step": 1800
1311
+ },
1312
+ {
1313
+ "epoch": 0.8933002481389578,
1314
+ "eval_loss": 2.285778284072876,
1315
+ "eval_runtime": 108.9864,
1316
+ "eval_samples_per_second": 155.469,
1317
+ "eval_steps_per_second": 4.863,
1318
+ "step": 1800
1319
+ }
1320
+ ],
1321
+ "logging_steps": 10,
1322
+ "max_steps": 2015,
1323
+ "num_input_tokens_seen": 0,
1324
+ "num_train_epochs": 1,
1325
+ "save_steps": 300,
1326
+ "stateful_callbacks": {
1327
+ "EarlyStoppingCallback": {
1328
+ "args": {
1329
+ "early_stopping_patience": 5,
1330
+ "early_stopping_threshold": 0.0
1331
+ },
1332
+ "attributes": {
1333
+ "early_stopping_patience_counter": 0
1334
+ }
1335
+ },
1336
+ "TrainerControl": {
1337
+ "args": {
1338
+ "should_epoch_stop": false,
1339
+ "should_evaluate": false,
1340
+ "should_log": false,
1341
+ "should_save": true,
1342
+ "should_training_stop": false
1343
+ },
1344
+ "attributes": {}
1345
+ }
1346
+ },
1347
+ "total_flos": 0.0,
1348
+ "train_batch_size": 64,
1349
+ "trial_name": null,
1350
+ "trial_params": null
1351
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:68854b8e47ac69eebd49c5870121ff0ff8db3121fee3cb68ea673141818acc69
3
+ size 5560
vocab.txt ADDED
The diff for this file is too large to render.