yangheng committed on
Commit
11fc638
·
verified ·
1 Parent(s): e987d12

Upload 9 files

README.MD ADDED
@@ -0,0 +1,175 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - ar
5
+ - zh
6
+ - nl
7
+ - fr
8
+ - ru
9
+ - es
10
+ - tr
11
+ tags:
12
+ - multilingual-sentiment-analysis
13
+ - sentiment-analysis
14
+ - aspect-based-sentiment-analysis
15
+ - deberta
16
+ - pyabsa
17
+ - efficient
18
+ - lightweight
19
+ - production-ready
20
+ - no-llm
21
+ license: mit
22
+ pipeline_tag: text-classification
23
+ widget:
24
+ - text: >-
25
+ The user interface is brilliant, but the documentation is a total mess.
26
+ [SEP] user interface [SEP]
27
+ - text: >-
28
+ The user interface is brilliant, but the documentation is a total mess.
29
+ [SEP] documentation [SEP]
30
+ ---
31
+
32
+ # State-of-the-Art Multilingual Sentiment Analysis
33
+
34
+ ## Multilingual -> English, Chinese, Arabic, Dutch, French, Russian, Spanish, Turkish, etc.
35
+
36
+ Tired of the high costs, slow latency, and massive computational footprint of Large Language Models? This is the sentiment analysis model you've been waiting for.
37
+
38
+ **`deberta-v3-base-absa-v1.1`** delivers **state-of-the-art accuracy** for fine-grained sentiment analysis with the speed, efficiency, and simplicity of a classic encoder model. It represents a paradigm shift in production-ready AI: maximum performance with minimum operational burden.
39
+
40
+ ### Why This Model?
41
+ - **🎯 Wide Usage:** This model has already passed **one million downloads**, which may make it the most downloaded open-source ABSA model to date.
42
+ - **🏆 SOTA Performance:** Built on the powerful `DeBERTa-v3` architecture and fine-tuned with advanced, context-aware methods from [PyABSA](https://github.com/yangheng95/PyABSA), this model achieves top-tier accuracy on complex sentiment tasks.
43
+ - **⚡ LLM-Free Efficiency:** No need for A100s or massive GPU clusters. This model runs inference at a fraction of the computational cost, enabling real-time performance on standard CPUs or modest GPUs.
44
+ - **💰 Lower Costs:** Slash your hosting and API call expenses. The small footprint and high efficiency translate directly to significant savings, whether you're a startup or an enterprise.
45
+ - **🚀 Production-Ready:** Lightweight, fast, and reliable. This model is built to be deployed at scale for applications that demand immediate and accurate sentiment feedback.
46
+
47
+ ### Ideal Use Cases
48
+
49
+ This model excels where speed, cost, and precision are critical:
50
+
51
+ - **Real-time Social Media Monitoring:** Analyze brand sentiment towards specific product features as it happens.
52
+ - **Intelligent Customer Support:** Automatically route tickets based on the sentiment towards different aspects of a complaint.
53
+ - **Product Review Analysis:** Aggregate fine-grained feedback on thousands of reviews to identify precise strengths and weaknesses.
54
+ - **Market Intelligence:** Understand nuanced public opinion on key industry topics.
55
+
56
+ ## How to Use
57
+
58
+ Getting started is incredibly simple. You can use the Hugging Face `pipeline` for a zero-effort implementation.
59
+
60
+ ### Load the classifier pipeline
61
+
62
+ ```python
63
+ from transformers import pipeline
64
+
65
+ classifier = pipeline("text-classification", model="yangheng/deberta-v3-base-absa-v1.1")
66
+ sentence = "The food was exceptional, although the service was a bit slow."
67
+ ```
68
+ ### Analyze sentiment for the 'food' aspect
69
+ ```python
70
+ result_food = classifier(sentence, text_pair="food")
71
+ # Example scores for the 'food' aspect:
72
+ # {
73
+ #   'Negative': 0.003,
74
+ #   'Neutral': 0.008,
75
+ #   'Positive': 0.989
76
+ # }
77
+ ```
78
+ ### Analyze sentiment for Chinese text
79
+ ```python
80
+ result_performance = classifier("这部手机的性能差劲", text_pair="性能")  # "This phone's performance is poor."
81
+ result_engine = classifier("这台汽车的引擎推力强劲", text_pair="引擎")  # "This car's engine delivers strong thrust."
82
+ ```
83
+
84
+ ## Using PyABSA for End-to-End Analysis
85
+ For a more powerful, end-to-end solution that handles both aspect term extraction and sentiment classification in a single call, you can use the PyABSA library. This is the very framework used to train and optimize this model.
86
+
87
+ First, install PyABSA:
88
+
89
+ ```bash
90
+ pip install pyabsa
91
+ ```
92
+ Then, you can perform inference like this. The model will automatically find the aspects in the text and classify their sentiment.
93
+
94
+ ```python3
95
+ from pyabsa import AspectTermExtraction as ATEPC, available_checkpoints
96
+
97
+ # Load the model directly from Hugging Face Hub
98
+ aspect_extractor = ATEPC.AspectExtractor(
99
+ 'multilingual', # Can be replaced with a specific checkpoint name or a local file path
100
+ auto_device=True, # Use GPU/CPU or Auto
101
+ cal_perplexity=True # Calculate text perplexity
102
+ )
103
+ texts = [
104
+ "这家餐厅的牛排很好吃,但是服务很慢。",
105
+ "The battery life is terrible but the camera is excellent."
106
+ ]
107
+ # Perform end-to-end aspect-based sentiment analysis
108
+ result = aspect_extractor.predict(
109
+ texts,
110
+ print_result=True, # Console Printing
111
+ save_result=False, # Save results into a json file
112
+ ignore_error=True, # Exception handling for error cases
113
+ pred_sentiment=True # Predict sentiment for extracted aspects
114
+ )
115
+
116
+ # Each result lists the extracted aspects and their sentiments. The structure looks like this (shown for a different example sentence):
117
+ # {
118
+ # "text": "The user interface is brilliant, but the documentation is a total mess.",
119
+ # "aspect": ["user interface", "documentation"],
120
+ # "position": [[4, 19], [41, 54]],
121
+ # "sentiment": ["Positive", "Negative"],
122
+ # "probability": [[1e-05, 0.0001, 0.9998], [0.9998, 0.0001, 1e-05]],
123
+ # "confidence": [0.9997, 0.9997]
124
+ # }
125
+ ```
126
+ Find more solutions for ABSA tasks in PyABSA.
127
+
128
+ ## The Technology Behind the Performance
129
+
130
+ ### Base Model
131
+
132
+ It starts with `microsoft/deberta-v3-base`, a highly optimized encoder known for its disentangled attention mechanism, which improves efficiency and performance over original BERT/RoBERTa models.
133
+
134
+ ### Fine-Tuning Architecture
135
+
136
+ It employs the FAST-LCF-BERT backbone, trained with the PyABSA framework. This adds a Local Context Focus (LCF) layer that dynamically guides the model to concentrate on the words and phrases most relevant to the given aspect, improving contextual understanding and accuracy.
137
+
138
+ ### Training Data
139
+
140
+ This model was trained on a robust, aggregated corpus of over 30,000 unique samples (augmented to ~180,000 examples) from canonical ABSA datasets, including SemEval-2014, SemEval-2016, MAMS, and more. The standard test sets were excluded to ensure fair and reliable benchmarking.
141
+
142
+ ## Citation
143
+
144
+ If you use this model in your research or application, please cite the foundational work on the PyABSA framework.
145
+
146
+ ### BibTeX Citation
147
+
148
+ ```bibtex
149
+ @inproceedings{YangCL23PyABSA,
150
+ author = {Heng Yang and Chen Zhang and Ke Li},
151
+ title = {PyABSA: {A} Modularized Framework for Reproducible Aspect-based Sentiment Analysis},
152
+ booktitle = {Proceedings of the 32nd {ACM} International Conference on Information and Knowledge Management, {CIKM} 2023},
153
+ pages = {5117--5122},
154
+ publisher = {{ACM}},
155
+ year = {2023},
156
+ doi = {10.1145/3583780.3614752}
157
+ }
158
+
159
+ @inproceedings{YangL24LCF/LCA,
160
+ author = {Heng Yang and
161
+ Ke Li},
162
+ editor = {Yvette Graham and
163
+ Matthew Purver},
164
+ title = {Modeling Aspect Sentiment Coherency via Local Sentiment Aggregation},
165
+ booktitle = {Findings of the Association for Computational Linguistics: {EACL}
166
+ 2024, St. Julian's, Malta, March 17-22, 2024},
167
+ pages = {182--195},
168
+ publisher = {Association for Computational Linguistics},
169
+ year = {2024},
170
+ url = {https://aclanthology.org/2024.findings-eacl.13},
171
+ timestamp = {Tue, 23 Jul 2024 08:21:59 +0200},
172
+ biburl = {https://dblp.org/rec/conf/eacl/YangL24.bib},
173
+ bibsource = {dblp computer science bibliography, https://dblp.org}
174
+ }
175
+ ```
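The "How to Use" section of the README scores one aspect per call. Below is a minimal sketch of how several aspects of the same sentence might be prepared in one go. The helper name `build_aspect_inputs` is hypothetical; the dictionary form with `text` and `text_pair` keys is the sentence-pair input accepted by the `transformers` text-classification pipeline. The pipeline call is left commented out because it downloads the model.

```python
from typing import Dict, List


def build_aspect_inputs(sentence: str, aspects: List[str]) -> List[Dict[str, str]]:
    """Pair one sentence with each non-empty aspect term (hypothetical helper)."""
    cleaned = [a.strip() for a in aspects if a.strip()]
    return [{"text": sentence, "text_pair": aspect} for aspect in cleaned]


sentence = "The food was exceptional, although the service was a bit slow."
inputs = build_aspect_inputs(sentence, ["food", "service", " "])
print(inputs)

# Assumed usage with the classifier from the README (requires a model download):
# from transformers import pipeline
# classifier = pipeline("text-classification", model="yangheng/deberta-v3-base-absa-v1.1")
# for item, prediction in zip(inputs, classifier(inputs)):
#     print(item["text_pair"], prediction)
```

Keeping input construction separate from inference makes it easy to drop blank aspect strings before they reach the model.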
added_tokens.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "[MASK]": 128000
3
+ }
config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "_num_labels": 3,
3
+ "architectures": [
4
+ "DebertaV2ForTokenClassification"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "hidden_act": "gelu",
8
+ "hidden_dropout_prob": 0.1,
9
+ "hidden_size": 768,
10
+ "id2label": {
11
+ "0": "B-ASP-Unknown",
12
+ "1": "B-ASP-Negative",
13
+ "2": "B-ASP-Neutral",
14
+ "3": "B-ASP-Positive",
15
+ "4": "I-ASP-Unknown",
16
+ "5": "I-ASP-Negative",
17
+ "6": "I-ASP-Neutral",
18
+ "7": "I-ASP-Positive",
19
+ "8": "O"
20
+ },
21
+ "initializer_range": 0.02,
22
+ "intermediate_size": 3072,
23
+ "label2id": {
24
+ "B-ASP-Unknown": 0,
25
+ "B-ASP-Negative": 1,
26
+ "B-ASP-Neutral": 2,
27
+ "B-ASP-Positive": 3,
28
+ "I-ASP-Unknown": 4,
29
+ "I-ASP-Negative": 5,
30
+ "I-ASP-Neutral": 6,
31
+ "I-ASP-Positive": 7,
32
+ "O": 8
33
+ },
34
+ "layer_norm_eps": 1e-07,
35
+ "legacy": true,
36
+ "max_position_embeddings": 512,
37
+ "max_relative_positions": -1,
38
+ "model_type": "deberta-v2",
39
+ "norm_rel_ebd": "layer_norm",
40
+ "num_attention_heads": 12,
41
+ "num_hidden_layers": 12,
42
+ "pad_token_id": 0,
43
+ "pooler_dropout": 0,
44
+ "pooler_hidden_act": "gelu",
45
+ "pooler_hidden_size": 768,
46
+ "pos_att_type": [
47
+ "p2c",
48
+ "c2p"
49
+ ],
50
+ "position_biased_input": false,
51
+ "position_buckets": 256,
52
+ "relative_attention": true,
53
+ "share_att_key": true,
54
+ "torch_dtype": "float32",
55
+ "transformers_version": "4.55.0",
56
+ "type_vocab_size": 0,
57
+ "vocab_size": 128100
58
+ }
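The `id2label` map above follows a BIO tagging scheme in which every aspect tag also carries a polarity (`B-ASP-Positive`, `I-ASP-Negative`, and so on). The sketch below shows one way per-token label ids could be grouped into aspect spans. `decode_aspects` and the whitespace tokens are hypothetical, and taking the polarity from the `B-` tag is an assumption of this sketch, not the model's documented post-processing.

```python
from typing import List, Tuple

ID2LABEL = {
    0: "B-ASP-Unknown", 1: "B-ASP-Negative", 2: "B-ASP-Neutral", 3: "B-ASP-Positive",
    4: "I-ASP-Unknown", 5: "I-ASP-Negative", 6: "I-ASP-Neutral", 7: "I-ASP-Positive",
    8: "O",
}


def decode_aspects(tokens: List[str], label_ids: List[int]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (aspect text, sentiment) pairs."""
    spans = []
    current = None  # (words of the open span, sentiment), or None when no span is open
    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label.startswith("B-"):
            # A new aspect begins; close any span that is still open.
            if current is not None:
                spans.append(current)
            current = ([token], label.rsplit("-", 1)[1])
        elif label.startswith("I-") and current is not None:
            current[0].append(token)
        else:
            # "O", or a stray "I-" tag with no open span, ends the current span.
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [(" ".join(words), sentiment) for words, sentiment in spans]


tokens = ["The", "user", "interface", "is", "brilliant", "but", "documentation", "fails"]
label_ids = [8, 3, 7, 8, 8, 8, 1, 8]
aspects = decode_aspects(tokens, label_ids)
print(aspects)
```

With a real tokenizer the tokens would be subword pieces, so the span text would need to be rebuilt from character offsets and not by joining with spaces.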
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8812932976c1a9f5668689b2675770aa63617a6370e12518b111825b84d0ffde
3
+ size 735378268
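The three lines above are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 of the real file and `size` is its length in bytes. As a rough sketch, the pointer can be parsed and the size converted into an approximate parameter count. This assumes every tensor is stored as 4-byte `float32` (consistent with `torch_dtype` in `config.json`) and ignores the small safetensors header. The function names are hypothetical.

```python
from typing import Dict

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8812932976c1a9f5668689b2675770aa63617a6370e12518b111825b84d0ffde
size 735378268
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def approx_param_count(size_bytes: int, bytes_per_param: int = 4) -> int:
    """Rough upper bound: file size divided by bytes per parameter."""
    return size_bytes // bytes_per_param


pointer = parse_lfs_pointer(POINTER)
size_bytes = int(pointer["size"])
print(pointer["oid"].split(":", 1)[0])  # hash algorithm named in the pointer
print(f"{size_bytes / 1024**2:.1f} MiB")
print(f"~{approx_param_count(size_bytes) / 1e6:.0f}M parameters")
```

Under these assumptions the file holds roughly 184 million parameters, a little over half of which would sit in the 128,100 × 768 embedding matrix implied by `vocab_size` and `hidden_size`.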
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "bos_token": "[CLS]",
3
+ "cls_token": "[CLS]",
4
+ "eos_token": "[SEP]",
5
+ "mask_token": "[MASK]",
6
+ "pad_token": "[PAD]",
7
+ "sep_token": "[SEP]",
8
+ "unk_token": {
9
+ "content": "[UNK]",
10
+ "lstrip": false,
11
+ "normalized": true,
12
+ "rstrip": false,
13
+ "single_word": false
14
+ }
15
+ }
spm.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
3
+ size 2464616
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[CLS]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[SEP]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[UNK]",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128000": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "[CLS]",
45
+ "clean_up_tokenization_spaces": false,
46
+ "cls_token": "[CLS]",
47
+ "do_lower_case": false,
48
+ "eos_token": "[SEP]",
49
+ "extra_special_tokens": {},
50
+ "mask_token": "[MASK]",
51
+ "model_max_length": 1000000000000000019884624838656,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "sp_model_kwargs": {},
55
+ "split_by_punct": false,
56
+ "tokenizer_class": "DebertaV2Tokenizer",
57
+ "unk_token": "[UNK]",
58
+ "vocab_type": "spm"
59
+ }
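The `model_max_length` value above equals `int(1e30)`, the placeholder `transformers` writes when no limit is configured, so the tokenizer will not truncate on its own. `config.json`, by contrast, lists `max_position_embeddings` as 512. The sketch below picks an explicit truncation length from the two values. `effective_max_length` is a hypothetical helper, and treating anything above one million as "unset" is an assumption of this sketch.

```python
VERY_LARGE = int(1e30)  # same value as model_max_length in tokenizer_config.json


def effective_max_length(model_max_length: int, max_position_embeddings: int,
                         unset_threshold: int = 1_000_000) -> int:
    """Return a usable truncation length from tokenizer and model settings."""
    if model_max_length > unset_threshold:
        # The tokenizer limit looks like a placeholder; fall back to the model's value.
        return max_position_embeddings
    return min(model_max_length, max_position_embeddings)


max_len = effective_max_length(1000000000000000019884624838656, 512)
print(max_len)

# Assumed usage with a loaded tokenizer:
# encoded = tokenizer(text, text_pair=aspect, truncation=True, max_length=max_len)
```

Passing `max_length` explicitly is a conservative choice: it keeps long reviews within the length the model was configured for, whatever the tokenizer's placeholder says.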
trainer_state.json ADDED
@@ -0,0 +1,979 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 2.9940119760479043,
6
+ "eval_steps": 500,
7
+ "global_step": 13500,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.022177866489243733,
14
+ "grad_norm": 4.408344745635986,
15
+ "learning_rate": 2.3645320197044334e-06,
16
+ "loss": 1.7171,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.04435573297848747,
21
+ "grad_norm": 0.8857895135879517,
22
+ "learning_rate": 4.8275862068965525e-06,
23
+ "loss": 0.6279,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.0665335994677312,
28
+ "grad_norm": 2.241079807281494,
29
+ "learning_rate": 7.290640394088671e-06,
30
+ "loss": 0.5064,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.08871146595697493,
35
+ "grad_norm": 1.0983695983886719,
36
+ "learning_rate": 9.75369458128079e-06,
37
+ "loss": 0.4626,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.11088933244621868,
42
+ "grad_norm": 1.031287670135498,
43
+ "learning_rate": 1.2216748768472909e-05,
44
+ "loss": 0.423,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.1330671989354624,
49
+ "grad_norm": 1.961317777633667,
50
+ "learning_rate": 1.4679802955665026e-05,
51
+ "loss": 0.4184,
52
+ "step": 600
53
+ },
54
+ {
55
+ "epoch": 0.15524506542470615,
56
+ "grad_norm": 1.3524340391159058,
57
+ "learning_rate": 1.7142857142857142e-05,
58
+ "loss": 0.397,
59
+ "step": 700
60
+ },
61
+ {
62
+ "epoch": 0.17742293191394987,
63
+ "grad_norm": 1.7465412616729736,
64
+ "learning_rate": 1.9605911330049263e-05,
65
+ "loss": 0.3795,
66
+ "step": 800
67
+ },
68
+ {
69
+ "epoch": 0.1996007984031936,
70
+ "grad_norm": 1.1473759412765503,
71
+ "learning_rate": 1.986787259142745e-05,
72
+ "loss": 0.3657,
73
+ "step": 900
74
+ },
75
+ {
76
+ "epoch": 0.22177866489243736,
77
+ "grad_norm": 1.5489747524261475,
78
+ "learning_rate": 1.9710578057412507e-05,
79
+ "loss": 0.3726,
80
+ "step": 1000
81
+ },
82
+ {
83
+ "epoch": 0.24395653138168108,
84
+ "grad_norm": 1.2034003734588623,
85
+ "learning_rate": 1.9553283523397563e-05,
86
+ "loss": 0.3502,
87
+ "step": 1100
88
+ },
89
+ {
90
+ "epoch": 0.2661343978709248,
91
+ "grad_norm": 2.0690724849700928,
92
+ "learning_rate": 1.939598898938262e-05,
93
+ "loss": 0.3518,
94
+ "step": 1200
95
+ },
96
+ {
97
+ "epoch": 0.28831226436016855,
98
+ "grad_norm": 1.9681050777435303,
99
+ "learning_rate": 1.9238694455367677e-05,
100
+ "loss": 0.3361,
101
+ "step": 1300
102
+ },
103
+ {
104
+ "epoch": 0.3104901308494123,
105
+ "grad_norm": 1.3863286972045898,
106
+ "learning_rate": 1.9081399921352733e-05,
107
+ "loss": 0.3397,
108
+ "step": 1400
109
+ },
110
+ {
111
+ "epoch": 0.33266799733865604,
112
+ "grad_norm": 0.9572322964668274,
113
+ "learning_rate": 1.8924105387337793e-05,
114
+ "loss": 0.3339,
115
+ "step": 1500
116
+ },
117
+ {
118
+ "epoch": 0.35484586382789973,
119
+ "grad_norm": 0.892398476600647,
120
+ "learning_rate": 1.8766810853322847e-05,
121
+ "loss": 0.3218,
122
+ "step": 1600
123
+ },
124
+ {
125
+ "epoch": 0.3770237303171435,
126
+ "grad_norm": 1.2381540536880493,
127
+ "learning_rate": 1.8609516319307907e-05,
128
+ "loss": 0.3255,
129
+ "step": 1700
130
+ },
131
+ {
132
+ "epoch": 0.3992015968063872,
133
+ "grad_norm": 0.8742302060127258,
134
+ "learning_rate": 1.8452221785292963e-05,
135
+ "loss": 0.3166,
136
+ "step": 1800
137
+ },
138
+ {
139
+ "epoch": 0.421379463295631,
140
+ "grad_norm": 1.0703165531158447,
141
+ "learning_rate": 1.829492725127802e-05,
142
+ "loss": 0.3098,
143
+ "step": 1900
144
+ },
145
+ {
146
+ "epoch": 0.4435573297848747,
147
+ "grad_norm": 1.6606981754302979,
148
+ "learning_rate": 1.8137632717263076e-05,
149
+ "loss": 0.3102,
150
+ "step": 2000
151
+ },
152
+ {
153
+ "epoch": 0.4657351962741184,
154
+ "grad_norm": 1.0174481868743896,
155
+ "learning_rate": 1.7980338183248133e-05,
156
+ "loss": 0.3061,
157
+ "step": 2100
158
+ },
159
+ {
160
+ "epoch": 0.48791306276336216,
161
+ "grad_norm": 0.9234058856964111,
162
+ "learning_rate": 1.7823043649233193e-05,
163
+ "loss": 0.3023,
164
+ "step": 2200
165
+ },
166
+ {
167
+ "epoch": 0.5100909292526059,
168
+ "grad_norm": 0.8972137570381165,
169
+ "learning_rate": 1.7665749115218246e-05,
170
+ "loss": 0.3062,
171
+ "step": 2300
172
+ },
173
+ {
174
+ "epoch": 0.5322687957418496,
175
+ "grad_norm": 0.7803289890289307,
176
+ "learning_rate": 1.7508454581203306e-05,
177
+ "loss": 0.2996,
178
+ "step": 2400
179
+ },
180
+ {
181
+ "epoch": 0.5544466622310934,
182
+ "grad_norm": 0.879205584526062,
183
+ "learning_rate": 1.7351160047188363e-05,
184
+ "loss": 0.303,
185
+ "step": 2500
186
+ },
187
+ {
188
+ "epoch": 0.5766245287203371,
189
+ "grad_norm": 1.0589395761489868,
190
+ "learning_rate": 1.719386551317342e-05,
191
+ "loss": 0.2876,
192
+ "step": 2600
193
+ },
194
+ {
195
+ "epoch": 0.5988023952095808,
196
+ "grad_norm": 0.9810135960578918,
197
+ "learning_rate": 1.7036570979158476e-05,
198
+ "loss": 0.2841,
199
+ "step": 2700
200
+ },
201
+ {
202
+ "epoch": 0.6209802616988246,
203
+ "grad_norm": 0.835926353931427,
204
+ "learning_rate": 1.6879276445143533e-05,
205
+ "loss": 0.2861,
206
+ "step": 2800
207
+ },
208
+ {
209
+ "epoch": 0.6431581281880683,
210
+ "grad_norm": 0.9618144631385803,
211
+ "learning_rate": 1.672198191112859e-05,
212
+ "loss": 0.2881,
213
+ "step": 2900
214
+ },
215
+ {
216
+ "epoch": 0.6653359946773121,
217
+ "grad_norm": 1.2271337509155273,
218
+ "learning_rate": 1.6564687377113646e-05,
219
+ "loss": 0.2795,
220
+ "step": 3000
221
+ },
222
+ {
223
+ "epoch": 0.6875138611665558,
224
+ "grad_norm": 0.933788537979126,
225
+ "learning_rate": 1.6407392843098702e-05,
226
+ "loss": 0.2758,
227
+ "step": 3100
228
+ },
229
+ {
230
+ "epoch": 0.7096917276557995,
231
+ "grad_norm": 1.3361326456069946,
232
+ "learning_rate": 1.6250098309083762e-05,
233
+ "loss": 0.2755,
234
+ "step": 3200
235
+ },
236
+ {
237
+ "epoch": 0.7318695941450433,
238
+ "grad_norm": 0.9134598970413208,
239
+ "learning_rate": 1.6092803775068816e-05,
240
+ "loss": 0.2693,
241
+ "step": 3300
242
+ },
243
+ {
244
+ "epoch": 0.754047460634287,
245
+ "grad_norm": 0.8436419367790222,
246
+ "learning_rate": 1.5935509241053876e-05,
247
+ "loss": 0.2709,
248
+ "step": 3400
249
+ },
250
+ {
251
+ "epoch": 0.7762253271235308,
252
+ "grad_norm": 0.7325775623321533,
253
+ "learning_rate": 1.5778214707038932e-05,
254
+ "loss": 0.2766,
255
+ "step": 3500
256
+ },
257
+ {
258
+ "epoch": 0.7984031936127745,
259
+ "grad_norm": 0.9576388597488403,
260
+ "learning_rate": 1.562092017302399e-05,
261
+ "loss": 0.2683,
262
+ "step": 3600
263
+ },
264
+ {
265
+ "epoch": 0.8205810601020181,
266
+ "grad_norm": 0.812353789806366,
267
+ "learning_rate": 1.5463625639009045e-05,
268
+ "loss": 0.2643,
269
+ "step": 3700
270
+ },
271
+ {
272
+ "epoch": 0.842758926591262,
273
+ "grad_norm": 1.00551176071167,
274
+ "learning_rate": 1.5306331104994102e-05,
275
+ "loss": 0.2696,
276
+ "step": 3800
277
+ },
278
+ {
279
+ "epoch": 0.8649367930805056,
280
+ "grad_norm": 0.7504218816757202,
281
+ "learning_rate": 1.5149036570979159e-05,
282
+ "loss": 0.262,
283
+ "step": 3900
284
+ },
285
+ {
286
+ "epoch": 0.8871146595697494,
287
+ "grad_norm": 0.6838926076889038,
288
+ "learning_rate": 1.4991742036964217e-05,
289
+ "loss": 0.2582,
290
+ "step": 4000
291
+ },
292
+ {
293
+ "epoch": 0.9092925260589931,
294
+ "grad_norm": 0.9068514108657837,
295
+ "learning_rate": 1.4834447502949274e-05,
296
+ "loss": 0.2613,
297
+ "step": 4100
298
+ },
299
+ {
300
+ "epoch": 0.9314703925482368,
301
+ "grad_norm": 0.8156359791755676,
302
+ "learning_rate": 1.4677152968934332e-05,
303
+ "loss": 0.2575,
304
+ "step": 4200
305
+ },
306
+ {
307
+ "epoch": 0.9536482590374806,
308
+ "grad_norm": 0.8061220049858093,
309
+ "learning_rate": 1.4519858434919387e-05,
310
+ "loss": 0.2512,
311
+ "step": 4300
312
+ },
313
+ {
314
+ "epoch": 0.9758261255267243,
315
+ "grad_norm": 0.7665420174598694,
316
+ "learning_rate": 1.4362563900904445e-05,
317
+ "loss": 0.2551,
318
+ "step": 4400
319
+ },
320
+ {
321
+ "epoch": 0.998003992015968,
322
+ "grad_norm": 1.094953179359436,
323
+ "learning_rate": 1.4205269366889502e-05,
324
+ "loss": 0.2515,
325
+ "step": 4500
326
+ },
327
+ {
328
+ "epoch": 1.0201818585052118,
329
+ "grad_norm": 1.0698802471160889,
330
+ "learning_rate": 1.4047974832874558e-05,
331
+ "loss": 0.2425,
332
+ "step": 4600
333
+ },
334
+ {
335
+ "epoch": 1.0423597249944556,
336
+ "grad_norm": 0.9805143475532532,
337
+ "learning_rate": 1.3890680298859615e-05,
338
+ "loss": 0.2353,
339
+ "step": 4700
340
+ },
341
+ {
342
+ "epoch": 1.0645375914836992,
343
+ "grad_norm": 1.0466519594192505,
344
+ "learning_rate": 1.3733385764844673e-05,
345
+ "loss": 0.2449,
346
+ "step": 4800
347
+ },
348
+ {
349
+ "epoch": 1.086715457972943,
350
+ "grad_norm": 0.9419561624526978,
351
+ "learning_rate": 1.3576091230829728e-05,
352
+ "loss": 0.2362,
353
+ "step": 4900
354
+ },
355
+ {
356
+ "epoch": 1.1088933244621868,
357
+ "grad_norm": 0.9370637536048889,
358
+ "learning_rate": 1.3418796696814786e-05,
359
+ "loss": 0.2327,
360
+ "step": 5000
361
+ },
362
+ {
363
+ "epoch": 1.1310711909514304,
364
+ "grad_norm": 0.7672102451324463,
365
+ "learning_rate": 1.3261502162799845e-05,
366
+ "loss": 0.2337,
367
+ "step": 5100
368
+ },
369
+ {
370
+ "epoch": 1.1532490574406742,
371
+ "grad_norm": 1.0745601654052734,
372
+ "learning_rate": 1.3104207628784901e-05,
373
+ "loss": 0.24,
374
+ "step": 5200
375
+ },
376
+ {
377
+ "epoch": 1.175426923929918,
378
+ "grad_norm": 1.0820897817611694,
379
+ "learning_rate": 1.2946913094769958e-05,
380
+ "loss": 0.2271,
381
+ "step": 5300
382
+ },
383
+ {
384
+ "epoch": 1.1976047904191618,
385
+ "grad_norm": 1.155911922454834,
386
+ "learning_rate": 1.2789618560755015e-05,
387
+ "loss": 0.2361,
388
+ "step": 5400
389
+ },
390
+ {
391
+ "epoch": 1.2197826569084054,
392
+ "grad_norm": 0.9654746651649475,
393
+ "learning_rate": 1.2632324026740073e-05,
394
+ "loss": 0.2389,
395
+ "step": 5500
396
+ },
397
+ {
398
+ "epoch": 1.2419605233976492,
399
+ "grad_norm": 1.0573245286941528,
400
+ "learning_rate": 1.2475029492725128e-05,
401
+ "loss": 0.2264,
402
+ "step": 5600
403
+ },
404
+ {
405
+ "epoch": 1.264138389886893,
406
+ "grad_norm": 1.3749500513076782,
407
+ "learning_rate": 1.2317734958710186e-05,
408
+ "loss": 0.229,
409
+ "step": 5700
410
+ },
411
+ {
412
+ "epoch": 1.2863162563761366,
413
+ "grad_norm": 0.9389622211456299,
414
+ "learning_rate": 1.2160440424695243e-05,
415
+ "loss": 0.2277,
416
+ "step": 5800
417
+ },
418
+ {
419
+ "epoch": 1.3084941228653804,
420
+ "grad_norm": 1.2547938823699951,
421
+ "learning_rate": 1.20031458906803e-05,
422
+ "loss": 0.2265,
423
+ "step": 5900
424
+ },
425
+ {
426
+ "epoch": 1.3306719893546242,
427
+ "grad_norm": 1.1487092971801758,
428
+ "learning_rate": 1.1845851356665356e-05,
429
+ "loss": 0.2266,
430
+ "step": 6000
431
+ },
432
+ {
433
+ "epoch": 1.3528498558438677,
434
+ "grad_norm": 0.6461149454116821,
435
+ "learning_rate": 1.1688556822650414e-05,
436
+ "loss": 0.2235,
437
+ "step": 6100
438
+ },
439
+ {
440
+ "epoch": 1.3750277223331115,
441
+ "grad_norm": 0.8437641859054565,
442
+ "learning_rate": 1.1531262288635473e-05,
443
+ "loss": 0.2266,
444
+ "step": 6200
445
+ },
446
+ {
447
+ "epoch": 1.3972055888223553,
448
+ "grad_norm": 0.8984001278877258,
449
+ "learning_rate": 1.1373967754620527e-05,
450
+ "loss": 0.2195,
451
+ "step": 6300
452
+ },
453
+ {
454
+ "epoch": 1.419383455311599,
455
+ "grad_norm": 1.1755112409591675,
456
+ "learning_rate": 1.1216673220605586e-05,
457
+ "loss": 0.2168,
458
+ "step": 6400
459
+ },
460
+ {
461
+ "epoch": 1.4415613218008427,
462
+ "grad_norm": 1.250999927520752,
463
+ "learning_rate": 1.1059378686590642e-05,
464
+ "loss": 0.2214,
465
+ "step": 6500
466
+ },
467
+ {
468
+ "epoch": 1.4637391882900865,
469
+ "grad_norm": 1.2418690919876099,
470
+ "learning_rate": 1.0902084152575699e-05,
471
+ "loss": 0.2196,
472
+ "step": 6600
473
+ },
474
+ {
475
+ "epoch": 1.4859170547793301,
476
+ "grad_norm": 0.9416905641555786,
477
+ "learning_rate": 1.0744789618560756e-05,
478
+ "loss": 0.2237,
479
+ "step": 6700
480
+ },
481
+ {
482
+ "epoch": 1.508094921268574,
483
+ "grad_norm": 0.9549462199211121,
484
+ "learning_rate": 1.0587495084545814e-05,
485
+ "loss": 0.2231,
486
+ "step": 6800
487
+ },
488
+ {
489
+ "epoch": 1.5302727877578177,
490
+ "grad_norm": 0.9897739291191101,
491
+ "learning_rate": 1.0430200550530869e-05,
492
+ "loss": 0.221,
493
+ "step": 6900
494
+ },
495
+ {
496
+ "epoch": 1.5524506542470613,
497
+ "grad_norm": 1.0174314975738525,
498
+ "learning_rate": 1.0272906016515927e-05,
499
+ "loss": 0.2193,
500
+ "step": 7000
501
+ },
502
+ {
503
+ "epoch": 1.5746285207363053,
504
+ "grad_norm": 0.8986598253250122,
505
+ "learning_rate": 1.0115611482500984e-05,
506
+ "loss": 0.2114,
507
+ "step": 7100
508
+ },
509
+ {
510
+ "epoch": 1.596806387225549,
511
+ "grad_norm": 0.7662016749382019,
512
+ "learning_rate": 9.95831694848604e-06,
513
+ "loss": 0.2162,
514
+ "step": 7200
515
+ },
516
+ {
517
+ "epoch": 1.6189842537147925,
518
+ "grad_norm": 0.875023603439331,
519
+ "learning_rate": 9.801022414471097e-06,
520
+ "loss": 0.2093,
521
+ "step": 7300
522
+ },
523
+ {
524
+ "epoch": 1.6411621202040365,
525
+ "grad_norm": 1.059648036956787,
526
+ "learning_rate": 9.643727880456155e-06,
527
+ "loss": 0.2114,
528
+ "step": 7400
529
+ },
530
+ {
531
+ "epoch": 1.66333998669328,
532
+ "grad_norm": 1.2008799314498901,
533
+ "learning_rate": 9.486433346441212e-06,
534
+ "loss": 0.2129,
535
+ "step": 7500
536
+ },
537
+ {
538
+ "epoch": 1.685517853182524,
539
+ "grad_norm": 1.009397029876709,
540
+ "learning_rate": 9.32913881242627e-06,
541
+ "loss": 0.2069,
542
+ "step": 7600
543
+ },
544
+ {
545
+ "epoch": 1.7076957196717677,
546
+ "grad_norm": 0.9461073875427246,
547
+ "learning_rate": 9.171844278411327e-06,
548
+ "loss": 0.2109,
549
+ "step": 7700
550
+ },
551
+ {
552
+ "epoch": 1.7298735861610113,
553
+ "grad_norm": 0.7946839332580566,
554
+ "learning_rate": 9.014549744396383e-06,
555
+ "loss": 0.2051,
556
+ "step": 7800
557
+ },
558
+ {
559
+ "epoch": 1.752051452650255,
560
+ "grad_norm": 1.0686787366867065,
561
+ "learning_rate": 8.85725521038144e-06,
562
+ "loss": 0.2114,
563
+ "step": 7900
564
+ },
565
+ {
566
+ "epoch": 1.7742293191394989,
567
+ "grad_norm": 1.1309982538223267,
568
+ "learning_rate": 8.699960676366497e-06,
569
+ "loss": 0.2113,
570
+ "step": 8000
571
+ },
572
+ {
573
+ "epoch": 1.7964071856287425,
574
+ "grad_norm": 0.8873094320297241,
575
+ "learning_rate": 8.542666142351555e-06,
576
+ "loss": 0.2032,
577
+ "step": 8100
578
+ },
579
+ {
580
+ "epoch": 1.8185850521179863,
581
+ "grad_norm": 1.1685720682144165,
582
+ "learning_rate": 8.385371608336611e-06,
583
+ "loss": 0.2046,
584
+ "step": 8200
585
+ },
586
+ {
587
+ "epoch": 1.84076291860723,
588
+ "grad_norm": 1.1391305923461914,
589
+ "learning_rate": 8.228077074321668e-06,
590
+ "loss": 0.2059,
591
+ "step": 8300
592
+ },
593
+ {
594
+ "epoch": 1.8629407850964737,
595
+ "grad_norm": 1.0028046369552612,
596
+ "learning_rate": 8.070782540306725e-06,
597
+ "loss": 0.2051,
598
+ "step": 8400
599
+ },
600
+ {
601
+ "epoch": 1.8851186515857175,
602
+ "grad_norm": 1.3470697402954102,
603
+ "learning_rate": 7.913488006291781e-06,
604
+ "loss": 0.2059,
605
+ "step": 8500
606
+ },
607
+ {
608
+ "epoch": 1.9072965180749613,
609
+ "grad_norm": 1.290456771850586,
610
+ "learning_rate": 7.75619347227684e-06,
611
+ "loss": 0.1995,
612
+ "step": 8600
613
+ },
614
+ {
615
+ "epoch": 1.9294743845642048,
616
+ "grad_norm": 0.7506065964698792,
617
+ "learning_rate": 7.598898938261896e-06,
618
+ "loss": 0.2011,
619
+ "step": 8700
620
+ },
621
+ {
622
+ "epoch": 1.9516522510534486,
623
+ "grad_norm": 1.170919418334961,
624
+ "learning_rate": 7.441604404246953e-06,
625
+ "loss": 0.2017,
626
+ "step": 8800
627
+ },
628
+ {
629
+ "epoch": 1.9738301175426924,
630
+ "grad_norm": 1.1888222694396973,
631
+ "learning_rate": 7.28430987023201e-06,
632
+ "loss": 0.1998,
633
+ "step": 8900
634
+ },
635
+ {
636
+ "epoch": 1.996007984031936,
637
+ "grad_norm": 1.1401287317276,
638
+ "learning_rate": 7.127015336217067e-06,
639
+ "loss": 0.1996,
640
+ "step": 9000
641
+ },
642
+ {
643
+ "epoch": 2.01818585052118,
644
+ "grad_norm": 1.0609304904937744,
645
+ "learning_rate": 6.969720802202124e-06,
646
+ "loss": 0.194,
647
+ "step": 9100
648
+ },
649
+ {
650
+ "epoch": 2.0403637170104236,
651
+ "grad_norm": 0.7136222124099731,
652
+ "learning_rate": 6.812426268187181e-06,
653
+ "loss": 0.1907,
654
+ "step": 9200
655
+ },
656
+ {
657
+ "epoch": 2.062541583499667,
658
+ "grad_norm": 0.9201442003250122,
659
+ "learning_rate": 6.6551317341722375e-06,
660
+ "loss": 0.1899,
661
+ "step": 9300
662
+ },
663
+ {
664
+ "epoch": 2.0847194499889112,
665
+ "grad_norm": 1.034180998802185,
666
+ "learning_rate": 6.497837200157295e-06,
667
+ "loss": 0.1905,
668
+ "step": 9400
669
+ },
670
+ {
671
+ "epoch": 2.106897316478155,
672
+ "grad_norm": 1.2538888454437256,
673
+ "learning_rate": 6.340542666142352e-06,
674
+ "loss": 0.1895,
675
+ "step": 9500
676
+ },
677
+ {
678
+ "epoch": 2.1290751829673984,
679
+ "grad_norm": 1.1865867376327515,
680
+ "learning_rate": 6.18324813212741e-06,
681
+ "loss": 0.1903,
682
+ "step": 9600
683
+ },
684
+ {
685
+ "epoch": 2.1512530494566424,
686
+ "grad_norm": 1.1879113912582397,
687
+ "learning_rate": 6.0259535981124665e-06,
688
+ "loss": 0.1827,
+ "step": 9700
+ },
+ {
+ "epoch": 2.173430915945886,
+ "grad_norm": 0.959338903427124,
+ "learning_rate": 5.868659064097523e-06,
+ "loss": 0.1871,
+ "step": 9800
+ },
+ {
+ "epoch": 2.1956087824351296,
+ "grad_norm": 1.0765694379806519,
+ "learning_rate": 5.7113645300825806e-06,
+ "loss": 0.1904,
+ "step": 9900
+ },
+ {
+ "epoch": 2.2177866489243736,
+ "grad_norm": 1.1562960147857666,
+ "learning_rate": 5.554069996067637e-06,
+ "loss": 0.1852,
+ "step": 10000
+ },
+ {
+ "epoch": 2.239964515413617,
+ "grad_norm": 1.1772807836532593,
+ "learning_rate": 5.396775462052695e-06,
+ "loss": 0.1875,
+ "step": 10100
+ },
+ {
+ "epoch": 2.2621423819028608,
+ "grad_norm": 0.9771366715431213,
+ "learning_rate": 5.239480928037751e-06,
+ "loss": 0.1899,
+ "step": 10200
+ },
+ {
+ "epoch": 2.284320248392105,
+ "grad_norm": 0.7828590273857117,
+ "learning_rate": 5.082186394022808e-06,
+ "loss": 0.1846,
+ "step": 10300
+ },
+ {
+ "epoch": 2.3064981148813484,
+ "grad_norm": 1.0688682794570923,
+ "learning_rate": 4.924891860007865e-06,
+ "loss": 0.186,
+ "step": 10400
+ },
+ {
+ "epoch": 2.3286759813705924,
+ "grad_norm": 1.2667362689971924,
+ "learning_rate": 4.767597325992922e-06,
+ "loss": 0.186,
+ "step": 10500
+ },
+ {
+ "epoch": 2.350853847859836,
+ "grad_norm": 0.9742441177368164,
+ "learning_rate": 4.610302791977979e-06,
+ "loss": 0.1822,
+ "step": 10600
+ },
+ {
+ "epoch": 2.3730317143490796,
+ "grad_norm": 0.8631011843681335,
+ "learning_rate": 4.453008257963036e-06,
+ "loss": 0.1789,
+ "step": 10700
+ },
+ {
+ "epoch": 2.3952095808383236,
+ "grad_norm": 0.7579483985900879,
+ "learning_rate": 4.2957137239480934e-06,
+ "loss": 0.1865,
+ "step": 10800
+ },
+ {
+ "epoch": 2.417387447327567,
+ "grad_norm": 0.8615408539772034,
+ "learning_rate": 4.13841918993315e-06,
+ "loss": 0.1805,
+ "step": 10900
+ },
+ {
+ "epoch": 2.4395653138168107,
+ "grad_norm": 1.0644463300704956,
+ "learning_rate": 3.9811246559182075e-06,
+ "loss": 0.1849,
+ "step": 11000
+ },
+ {
+ "epoch": 2.4617431803060548,
+ "grad_norm": 0.9933910965919495,
+ "learning_rate": 3.823830121903264e-06,
+ "loss": 0.1846,
+ "step": 11100
+ },
+ {
+ "epoch": 2.4839210467952983,
+ "grad_norm": 1.011958360671997,
+ "learning_rate": 3.666535587888321e-06,
+ "loss": 0.1863,
+ "step": 11200
+ },
+ {
+ "epoch": 2.506098913284542,
+ "grad_norm": 1.0306683778762817,
+ "learning_rate": 3.5092410538733786e-06,
+ "loss": 0.1853,
+ "step": 11300
+ },
+ {
+ "epoch": 2.528276779773786,
+ "grad_norm": 1.0129719972610474,
+ "learning_rate": 3.351946519858435e-06,
+ "loss": 0.1855,
+ "step": 11400
+ },
+ {
+ "epoch": 2.5504546462630295,
+ "grad_norm": 1.0215705633163452,
+ "learning_rate": 3.1946519858434922e-06,
+ "loss": 0.1867,
+ "step": 11500
+ },
+ {
+ "epoch": 2.572632512752273,
+ "grad_norm": 1.202038288116455,
+ "learning_rate": 3.0373574518285493e-06,
+ "loss": 0.1839,
+ "step": 11600
+ },
+ {
+ "epoch": 2.594810379241517,
+ "grad_norm": 1.19171142578125,
+ "learning_rate": 2.8800629178136063e-06,
+ "loss": 0.1776,
+ "step": 11700
+ },
+ {
+ "epoch": 2.6169882457307607,
+ "grad_norm": 1.0898429155349731,
+ "learning_rate": 2.7227683837986633e-06,
+ "loss": 0.178,
+ "step": 11800
+ },
+ {
+ "epoch": 2.6391661122200043,
+ "grad_norm": 1.005279779434204,
+ "learning_rate": 2.56547384978372e-06,
+ "loss": 0.1811,
+ "step": 11900
+ },
+ {
+ "epoch": 2.6613439787092483,
+ "grad_norm": 1.0780277252197266,
+ "learning_rate": 2.408179315768777e-06,
+ "loss": 0.1831,
+ "step": 12000
+ },
+ {
+ "epoch": 2.683521845198492,
+ "grad_norm": 1.318746566772461,
+ "learning_rate": 2.252457727093984e-06,
+ "loss": 0.1835,
+ "step": 12100
+ },
+ {
+ "epoch": 2.7056997116877355,
+ "grad_norm": 1.289838433265686,
+ "learning_rate": 2.0951631930790405e-06,
+ "loss": 0.1813,
+ "step": 12200
+ },
+ {
+ "epoch": 2.7278775781769795,
+ "grad_norm": 0.806324303150177,
+ "learning_rate": 1.9378686590640976e-06,
+ "loss": 0.1778,
+ "step": 12300
+ },
+ {
+ "epoch": 2.750055444666223,
+ "grad_norm": 1.2230814695358276,
+ "learning_rate": 1.7805741250491546e-06,
+ "loss": 0.1797,
+ "step": 12400
+ },
+ {
+ "epoch": 2.7722333111554667,
+ "grad_norm": 1.0323050022125244,
+ "learning_rate": 1.6232795910342116e-06,
+ "loss": 0.1832,
+ "step": 12500
+ },
+ {
+ "epoch": 2.7944111776447107,
+ "grad_norm": 0.9353643655776978,
+ "learning_rate": 1.4659850570192689e-06,
+ "loss": 0.1828,
+ "step": 12600
+ },
+ {
+ "epoch": 2.8165890441339543,
+ "grad_norm": 0.8385490775108337,
+ "learning_rate": 1.3086905230043257e-06,
+ "loss": 0.1763,
+ "step": 12700
+ },
+ {
+ "epoch": 2.838766910623198,
+ "grad_norm": 0.9432787299156189,
+ "learning_rate": 1.1513959889893827e-06,
+ "loss": 0.18,
+ "step": 12800
+ },
+ {
+ "epoch": 2.860944777112442,
+ "grad_norm": 1.0854963064193726,
+ "learning_rate": 9.941014549744397e-07,
+ "loss": 0.1786,
+ "step": 12900
+ },
+ {
+ "epoch": 2.8831226436016855,
+ "grad_norm": 1.0914461612701416,
+ "learning_rate": 8.368069209594968e-07,
+ "loss": 0.1804,
+ "step": 13000
+ },
+ {
+ "epoch": 2.905300510090929,
+ "grad_norm": 0.8744707703590393,
+ "learning_rate": 6.795123869445537e-07,
+ "loss": 0.1776,
+ "step": 13100
+ },
+ {
+ "epoch": 2.927478376580173,
+ "grad_norm": 1.073390245437622,
+ "learning_rate": 5.222178529296107e-07,
+ "loss": 0.1797,
+ "step": 13200
+ },
+ {
+ "epoch": 2.9496562430694167,
+ "grad_norm": 1.0887576341629028,
+ "learning_rate": 3.6492331891466777e-07,
+ "loss": 0.1791,
+ "step": 13300
+ },
+ {
+ "epoch": 2.9718341095586602,
+ "grad_norm": 1.3841413259506226,
+ "learning_rate": 2.0762878489972477e-07,
+ "loss": 0.1792,
+ "step": 13400
+ },
+ {
+ "epoch": 2.9940119760479043,
+ "grad_norm": 1.0988340377807617,
+ "learning_rate": 5.033425088478176e-08,
+ "loss": 0.1834,
+ "step": 13500
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 13527,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 8.765106604499366e+16,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }
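The summary fields at the end of the trainer state can be cross-checked against the log entries with a little arithmetic. The sketch below is illustrative only: it hard-codes a few values copied from the file, and the helper names (`steps_per_epoch`, `epoch_at`) are hypothetical, not part of any library. The per-step learning-rate drop is inferred from two adjacent log entries; the file itself does not state which scheduler was used.

```python
# Values copied from the trainer state above.
MAX_STEPS = 13527
NUM_TRAIN_EPOCHS = 3
LOGGING_STEPS = 100


def steps_per_epoch(max_steps: int, num_epochs: int) -> float:
    """Optimizer steps in one epoch, assuming epochs of equal length."""
    return max_steps / num_epochs


def epoch_at(step: int, max_steps: int, num_epochs: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch(max_steps, num_epochs)


# Learning rates logged at steps 8600 and 8700 (copied from the log entries).
lr_8600 = 7.75619347227684e-06
lr_8700 = 7.598898938261896e-06

# Average learning-rate decrease per optimizer step between those two entries.
lr_drop_per_step = (lr_8600 - lr_8700) / LOGGING_STEPS

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch(MAX_STEPS, NUM_TRAIN_EPOCHS))
    print("epoch at step 9000:", epoch_at(9000, MAX_STEPS, NUM_TRAIN_EPOCHS))
    print("lr drop per step:", lr_drop_per_step)
```

With these inputs, the computed epoch at step 9000 agrees with the logged `"epoch"` value for that step, which suggests the `"epoch"` field is simply `step / (max_steps / num_train_epochs)`.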