kth8 committed
Commit 510061d · verified · 1 Parent(s): c96d55e

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,111 @@
---
license: gemma
language:
- en
base_model: unsloth/gemma-3-270m-it
datasets:
- kth8/system_prompts_SuperGPQA-26000x
- kth8/system_prompts_Jobs-20000x
pipeline_tag: text-generation
library_name: transformers
tags:
- sft
- trl
- unsloth
- google
- gemma
- gemma3
- gemma3_text
---
![logo](https://storage.googleapis.com/gweb-developer-goog-blog-assets/images/gemma-3_2.original.png)
A fine-tune of [unsloth/gemma-3-270m-it](https://huggingface.co/unsloth/gemma-3-270m-it) on the [kth8/system_prompts_SuperGPQA-26000x](https://huggingface.co/datasets/kth8/system_prompts_SuperGPQA-26000x) and [kth8/system_prompts_Jobs-20000x](https://huggingface.co/datasets/kth8/system_prompts_Jobs-20000x) datasets.

## Usage example
**System prompt**
```
You are a system prompt generator.
```
**User prompt**
```
Write a system prompt for an AI assistant that will help users with Engineering problems, especially Mechanics and Solid Mechanics.
```
**Assistant response**
```
You are a world-class engineering consultant and educator with deep, up-to-date expertise in all sub-disciplines of Mechanics and Solid Mechanics. Your knowledge spans classical and modern theories, computational methods, experimental techniques, and industry standards (e.g., ASME, ASTM, ISO, Eurocode, ACI). You are fluent in the language of engineering analysis, from continuum mechanics, stress-strain relationships, and material constitutive models to advanced topics such as plasticity, fracture mechanics, fatigue, buckling, vibration, and dynamic impact.

Your core responsibilities include:
* Providing clear, step-by-step explanations of theoretical concepts, derivations, and physical intuition.
* Solving analytical problems, performing hand calculations, and guiding the formulation of governing equations.
* Recommending appropriate numerical methods (Finite Element Analysis, Boundary Element Method, Mesh-free methods) and assisting with model setup, meshing strategies, boundary condition enforcement, and result interpretation.
* Advising on material selection, property estimation, and the application of constitutive models for metals, polymers, composites, and emerging materials.
* Interpreting and applying relevant design codes, standards, and safety factors for structural components, pressure vessels, bridges, aerospace structures, and micro-scale devices.
* Generating concise, well-structured technical reports, design memos, and documentation, including clear diagrams, tables, and units.
* Producing reproducible code snippets in MATLAB, Python (NumPy/SciPy, FEniCS, PyTorch), or other engineering software, with thorough comments and explanations.
* Asking clarifying questions when problem statements are ambiguous, to ensure accurate and context-appropriate solutions.
* Maintaining a professional tone, avoiding speculation beyond the provided information, and explicitly stating any assumptions made during analysis.
* Providing references to textbooks, peer-reviewed papers, standards, or reputable online resources when relevant, and citing them in a consistent format.
* Upholding safety and ethical considerations: never suggest designs that violate known safety regulations, and always warn about potential failure modes, limitations of models, and the need for experimental validation.

When responding, follow these guidelines:
1. Begin with a brief summary of the key takeaway.
2. Present the solution or explanation in logical sections (Conceptual Background, Assumptions, Governing Equations, Solution Procedure, Results, Interpretation, Recommendations).
3. Use SI units unless otherwise specified, and include unit conversions where helpful.
4. Highlight any assumptions, approximations, and their impact on accuracy.
5. If a calculation is performed, show all intermediate steps and final numerical results with appropriate significant figures.
6. For code, provide a complete, runnable snippet with comments; explain any external libraries required.
7. Conclude with any additional considerations, potential extensions, or validation steps.

You adapt your depth of explanation to the user's expertise level: for novices, use analogies and avoid unnecessary jargon; for experts, you may discuss higher-order theories, recent research trends, and nuanced code interpretations. Always strive for clarity, rigor, and pedagogical value.

You are empowered to ask the user for missing data, clarifications, or context before delivering a final answer. Your ultimate goal is to enable users to understand, apply, and extend mechanical and solid-mechanics principles across academia, industry, and research.
```
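The prompts above can be run with the standard `transformers` chat-template API. A minimal sketch, assuming a regular `transformers` install; `model_id` is a placeholder for this repository's id, and `max_new_tokens` is an arbitrary choice (the sampling settings mirror this repo's `generation_config.json`):

```python
def build_messages(user_prompt: str) -> list:
    """Chat messages matching the usage example above."""
    return [
        {"role": "system", "content": "You are a system prompt generator."},
        {"role": "user", "content": user_prompt},
    ]


def generate(model_id: str, user_prompt: str, max_new_tokens: int = 512) -> str:
    # Imported here so build_messages() stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    ids = tok.apply_chat_template(
        build_messages(user_prompt), add_generation_prompt=True, return_tensors="pt"
    )
    # Sampling mirrors generation_config.json: do_sample, top_k=64, top_p=0.95.
    out = model.generate(
        ids, max_new_tokens=max_new_tokens, do_sample=True, top_k=64, top_p=0.95
    )
    # Decode only the newly generated tokens.
    return tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
```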
## Model Details
- Base Model: `unsloth/gemma-3-270m-it`
- Parameter Count: 268,098,176
- Precision: torch.bfloat16

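The parameter count can be reproduced from the architecture values in this repo's `config.json`. A quick sketch; the per-layer norm breakdown (four RMSNorms plus q/k-norms) is an assumption about the standard Gemma-3 text layout:

```python
# Recompute the reported parameter count from config.json values.
vocab, hidden, layers = 262144, 640, 18
heads, kv_heads, head_dim, inter = 4, 1, 256, 2048

embed = vocab * hidden                    # tied input/output embeddings
attn = hidden * heads * head_dim * 2      # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2  # k_proj + v_proj
mlp = hidden * inter * 3                  # gate_proj, up_proj, down_proj
norms = 4 * hidden + 2 * head_dim         # 4 RMSNorms + q_norm/k_norm (assumed layout)
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
print(total)  # 268098176
```

This matches the reported 268,098,176 exactly; at 2 bytes per bfloat16 weight it also accounts for essentially all of the 536,223,056-byte `model.safetensors` (the small remainder is the safetensors header).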
## Hardware
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
- Announced: Mar 17th, 2025
- Release Date: Mar 18th, 2025
- Memory Type: GDDR7
- Bandwidth: 1.79 TB/s
- Memory Size: 96 GB
- Memory Bus: 512 bit
- Shading Units: 24064
- TDP: 600W

## Training Settings
### PEFT
- Rank: 32
- LoRA alpha: 64
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Gradient checkpointing: unsloth

### SFT
- Epochs: 2
- Batch size: 32
- Gradient Accumulation steps: 1
- Warmup ratio: 0.05
- Learning rate: 0.0002
- Optimizer: adamw_torch_fused
- Learning rate scheduler: cosine

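The warmup ratio is consistent with the learning-rate column of `train/log.json`: with 2830 total steps, `warmup_ratio: 0.05` gives 142 warmup steps under the usual ceil convention, and the logged LR at step `n` corresponds to scheduler step `n - 1` (a quick consistency check, assuming linear warmup to the 2e-4 peak):

```python
import math

max_steps, warmup_ratio, peak_lr = 2830, 0.05, 2e-4
warmup_steps = math.ceil(max_steps * warmup_ratio)  # 142

# Linear warmup: LR logged at step n reflects scheduler step n - 1,
# so the first logged value (step 10) should be peak_lr * 9 / 142.
lr_at_step_10 = peak_lr * (10 - 1) / warmup_steps
print(warmup_steps, lr_at_step_10)  # 142, ~1.2676e-05 as in the log
```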
## Training stats
- Date: 2026-03-30T15:42:56.091336
- Peak VRAM usage: 68.33 GB
- Global step: 2830
- Training runtime (seconds): 1496.9978
- Average training loss: 1.398907420828991
- Final validation loss: 1.282422423362732

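These figures imply the following throughput (derived values, not taken from the log):

```python
# Throughput implied by the reported step count, batch size, and runtime.
steps, batch, runtime_s = 2830, 32, 1496.9978
samples = steps * batch            # 90,560 samples over 2 epochs
throughput = samples / runtime_s   # samples per second
steps_per_s = steps / runtime_s
print(round(throughput, 1), round(steps_per_s, 2))  # ~60.5 samples/s, ~1.89 steps/s
```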
## Framework versions
- Unsloth: 2026.3.17
- TRL: 0.22.2
- Transformers: 4.56.2
- PyTorch: 2.10.0+cu128
- Datasets: 4.8.4
- Tokenizers: 0.22.2

## License
This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and [Prohibited Use Policy](https://policies.google.com/terms/generative-ai/use-policy) regarding the use of Gemma-generated content.
added_tokens.json ADDED
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
chat_template.jinja ADDED
@@ -0,0 +1,50 @@
{# Unsloth Chat template fixes #}
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
{%- if messages[0]['content'] is string -%}
{%- set first_user_prefix = messages[0]['content'] + '

' -%}
{%- else -%}
{%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
{%- endif -%}
{%- set loop_messages = messages[1:] -%}
{%- else -%}
{%- set first_user_prefix = "" -%}
{%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- set role = "model" -%}
{%- else -%}
{%- set role = message['role'] -%}
{%- endif -%}
{{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'image' -%}
{{ '<start_of_image>' }}
{%- elif item['type'] == 'text' -%}
{{ item['text'] | trim }}
{%- endif -%}
{%- endfor -%}
{%- elif message['content'] is defined -%}
{{ raise_exception("Invalid content type") }}
{%- endif -%}
{{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{'<start_of_turn>model
'}}
{%- endif -%}

{# Copyright 2025-present Unsloth. Apache 2.0 License. #}
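The template's control flow — system prompt folded into the first turn, `assistant` renamed to `model`, strict role alternation — can be mirrored in a few lines of plain Python. This is a hypothetical re-implementation for illustration (text-only path), not the tokenizer's own renderer:

```python
def render(messages, add_generation_prompt=False, bos_token="<bos>"):
    """Pure-Python mirror of chat_template.jinja's text-only logic (illustrative)."""
    out = bos_token
    prefix = ""
    if messages and messages[0]["role"] == "system":
        # System content becomes a prefix on the first user turn.
        prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]
    for i, msg in enumerate(messages):
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        role = "model" if msg["role"] == "assistant" else msg["role"]
        out += f"<start_of_turn>{role}\n" + (prefix if i == 0 else "")
        out += msg["content"].strip() + "<end_of_turn>\n"
    if add_generation_prompt:
        out += "<start_of_turn>model\n"
    return out
```

For example, `render([{"role": "system", "content": "S"}, {"role": "user", "content": "hi"}], add_generation_prompt=True)` produces `<bos><start_of_turn>user\nS\n\nhi<end_of_turn>\n<start_of_turn>model\n`.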
config.json ADDED
@@ -0,0 +1,55 @@
{
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 106,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 640,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 18,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000.0,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 512,
  "transformers_version": "4.56.2",
  "unsloth_fixed": true,
  "use_bidirectional_attention": false,
  "use_cache": true,
  "vocab_size": 262144
}
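The `layer_types` list is just `_sliding_window_pattern: 6` expanded over the 18 layers — every sixth layer uses full attention, the rest use 512-token sliding-window attention:

```python
# Expand the sliding-window pattern into the explicit per-layer list.
pattern, n_layers = 6, 18
layer_types = [
    "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
    for i in range(n_layers)
]
print(layer_types.count("full_attention"))  # 3 (layers 6, 12, and 18)
```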
generation_config.json ADDED
@@ -0,0 +1,14 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "max_length": 32768,
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.56.2"
}
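With `do_sample: true`, decoding samples from the distribution after top-k (64) and nucleus (top-p 0.95) filtering. A toy sketch of the nucleus step — a standalone illustration, not the `transformers` implementation:

```python
def top_p_keep(probs, top_p=0.95):
    """Indices kept by nucleus filtering: the smallest set of
    highest-probability tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

print(top_p_keep([0.5, 0.3, 0.15, 0.04, 0.01]))  # [0, 1, 2]
```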
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c62f9d0efecb91dc206c2d4788e5e716fbe6f34c3bb6ec195e017710edf9dfb
size 536223056
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
train/log.json ADDED
@@ -0,0 +1,2072 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "loss": 3.0223,
4
+ "grad_norm": 9.622929573059082,
5
+ "learning_rate": 1.267605633802817e-05,
6
+ "epoch": 0.007067137809187279,
7
+ "step": 10
8
+ },
9
+ {
10
+ "loss": 2.4561,
11
+ "grad_norm": 2.88118052482605,
12
+ "learning_rate": 2.676056338028169e-05,
13
+ "epoch": 0.014134275618374558,
14
+ "step": 20
15
+ },
16
+ {
17
+ "loss": 2.2697,
18
+ "grad_norm": 1.5007869005203247,
19
+ "learning_rate": 4.0845070422535214e-05,
20
+ "epoch": 0.02120141342756184,
21
+ "step": 30
22
+ },
23
+ {
24
+ "loss": 2.1608,
25
+ "grad_norm": 1.195833444595337,
26
+ "learning_rate": 5.492957746478874e-05,
27
+ "epoch": 0.028268551236749116,
28
+ "step": 40
29
+ },
30
+ {
31
+ "loss": 2.0508,
32
+ "grad_norm": 1.1800744533538818,
33
+ "learning_rate": 6.901408450704226e-05,
34
+ "epoch": 0.0353356890459364,
35
+ "step": 50
36
+ },
37
+ {
38
+ "loss": 1.9724,
39
+ "grad_norm": 1.0913054943084717,
40
+ "learning_rate": 8.309859154929578e-05,
41
+ "epoch": 0.04240282685512368,
42
+ "step": 60
43
+ },
44
+ {
45
+ "loss": 1.9282,
46
+ "grad_norm": 1.0702667236328125,
47
+ "learning_rate": 9.718309859154931e-05,
48
+ "epoch": 0.04946996466431095,
49
+ "step": 70
50
+ },
51
+ {
52
+ "loss": 1.9144,
53
+ "grad_norm": 0.9813940525054932,
54
+ "learning_rate": 0.00011126760563380282,
55
+ "epoch": 0.05653710247349823,
56
+ "step": 80
57
+ },
58
+ {
59
+ "loss": 1.8361,
60
+ "grad_norm": 1.041872501373291,
61
+ "learning_rate": 0.00012535211267605635,
62
+ "epoch": 0.0636042402826855,
63
+ "step": 90
64
+ },
65
+ {
66
+ "loss": 1.8409,
67
+ "grad_norm": 1.254153847694397,
68
+ "learning_rate": 0.00013943661971830987,
69
+ "epoch": 0.0706713780918728,
70
+ "step": 100
71
+ },
72
+ {
73
+ "loss": 1.7993,
74
+ "grad_norm": 1.0529217720031738,
75
+ "learning_rate": 0.00015352112676056339,
76
+ "epoch": 0.07773851590106007,
77
+ "step": 110
78
+ },
79
+ {
80
+ "loss": 1.795,
81
+ "grad_norm": 1.0209791660308838,
82
+ "learning_rate": 0.0001676056338028169,
83
+ "epoch": 0.08480565371024736,
84
+ "step": 120
85
+ },
86
+ {
87
+ "loss": 1.7774,
88
+ "grad_norm": 0.9271851181983948,
89
+ "learning_rate": 0.00018169014084507045,
90
+ "epoch": 0.09187279151943463,
91
+ "step": 130
92
+ },
93
+ {
94
+ "loss": 1.766,
95
+ "grad_norm": 0.8660218715667725,
96
+ "learning_rate": 0.00019577464788732396,
97
+ "epoch": 0.0989399293286219,
98
+ "step": 140
99
+ },
100
+ {
101
+ "loss": 1.7462,
102
+ "grad_norm": 1.034064769744873,
103
+ "learning_rate": 0.00019999665339174013,
104
+ "epoch": 0.10600706713780919,
105
+ "step": 150
106
+ },
107
+ {
108
+ "loss": 1.7193,
109
+ "grad_norm": 0.9861654043197632,
110
+ "learning_rate": 0.00019998026238030888,
111
+ "epoch": 0.11307420494699646,
112
+ "step": 160
113
+ },
114
+ {
115
+ "loss": 1.7128,
116
+ "grad_norm": 1.0704911947250366,
117
+ "learning_rate": 0.00019995021451869546,
118
+ "epoch": 0.12014134275618374,
119
+ "step": 170
120
+ },
121
+ {
122
+ "loss": 1.713,
123
+ "grad_norm": 0.9765517115592957,
124
+ "learning_rate": 0.00019990651391130147,
125
+ "epoch": 0.127208480565371,
126
+ "step": 180
127
+ },
128
+ {
129
+ "loss": 1.6701,
130
+ "grad_norm": 0.8447274565696716,
131
+ "learning_rate": 0.00019984916652743156,
132
+ "epoch": 0.13427561837455831,
133
+ "step": 190
134
+ },
135
+ {
136
+ "loss": 1.6735,
137
+ "grad_norm": 0.9753164052963257,
138
+ "learning_rate": 0.00019977818020047817,
139
+ "epoch": 0.1413427561837456,
140
+ "step": 200
141
+ },
142
+ {
143
+ "loss": 1.6576,
144
+ "grad_norm": 0.8660338521003723,
145
+ "learning_rate": 0.00019969356462685146,
146
+ "epoch": 0.14840989399293286,
147
+ "step": 210
148
+ },
149
+ {
150
+ "loss": 1.6457,
151
+ "grad_norm": 0.8690060377120972,
152
+ "learning_rate": 0.0001995953313646548,
153
+ "epoch": 0.15547703180212014,
154
+ "step": 220
155
+ },
156
+ {
157
+ "loss": 1.6314,
158
+ "grad_norm": 0.8115194439888,
159
+ "learning_rate": 0.0001994834938321061,
160
+ "epoch": 0.1625441696113074,
161
+ "step": 230
162
+ },
163
+ {
164
+ "loss": 1.6366,
165
+ "grad_norm": 0.9583505392074585,
166
+ "learning_rate": 0.00019935806730570488,
167
+ "epoch": 0.1696113074204947,
168
+ "step": 240
169
+ },
170
+ {
171
+ "loss": 1.6171,
172
+ "grad_norm": 0.7596977353096008,
173
+ "learning_rate": 0.00019921906891814551,
174
+ "epoch": 0.17667844522968199,
175
+ "step": 250
176
+ },
177
+ {
178
+ "loss": 1.5992,
179
+ "grad_norm": 0.832810640335083,
180
+ "learning_rate": 0.000199066517655977,
181
+ "epoch": 0.18374558303886926,
182
+ "step": 260
183
+ },
184
+ {
185
+ "loss": 1.6132,
186
+ "grad_norm": 0.7862087488174438,
187
+ "learning_rate": 0.00019890043435700954,
188
+ "epoch": 0.19081272084805653,
189
+ "step": 270
190
+ },
191
+ {
192
+ "loss": 1.597,
193
+ "grad_norm": 0.9267857670783997,
194
+ "learning_rate": 0.00019872084170746829,
195
+ "epoch": 0.1978798586572438,
196
+ "step": 280
197
+ },
198
+ {
199
+ "eval_loss": 1.587104320526123,
200
+ "eval_runtime": 33.1782,
201
+ "eval_samples_per_second": 71.794,
202
+ "eval_steps_per_second": 17.964,
203
+ "epoch": 0.19929328621908127,
204
+ "step": 282
205
+ },
206
+ {
207
+ "loss": 1.5856,
208
+ "grad_norm": 0.7953941822052002,
209
+ "learning_rate": 0.0001985277642388941,
210
+ "epoch": 0.2049469964664311,
211
+ "step": 290
212
+ },
213
+ {
214
+ "loss": 1.5911,
215
+ "grad_norm": 0.9416302442550659,
216
+ "learning_rate": 0.00019832122832479326,
217
+ "epoch": 0.21201413427561838,
218
+ "step": 300
219
+ },
220
+ {
221
+ "loss": 1.5807,
222
+ "grad_norm": 0.7640643119812012,
223
+ "learning_rate": 0.0001981012621770344,
224
+ "epoch": 0.21908127208480566,
225
+ "step": 310
226
+ },
227
+ {
228
+ "loss": 1.6089,
229
+ "grad_norm": 0.7899666428565979,
230
+ "learning_rate": 0.00019786789584199524,
231
+ "epoch": 0.22614840989399293,
232
+ "step": 320
233
+ },
234
+ {
235
+ "loss": 1.5768,
236
+ "grad_norm": 0.8222067356109619,
237
+ "learning_rate": 0.00019762116119645818,
238
+ "epoch": 0.2332155477031802,
239
+ "step": 330
240
+ },
241
+ {
242
+ "loss": 1.5571,
243
+ "grad_norm": 0.9370843172073364,
244
+ "learning_rate": 0.00019736109194325635,
245
+ "epoch": 0.24028268551236748,
246
+ "step": 340
247
+ },
248
+ {
249
+ "loss": 1.5752,
250
+ "grad_norm": 0.8056897521018982,
251
+ "learning_rate": 0.00019708772360666957,
252
+ "epoch": 0.24734982332155478,
253
+ "step": 350
254
+ },
255
+ {
256
+ "loss": 1.5278,
257
+ "grad_norm": 0.866102933883667,
258
+ "learning_rate": 0.00019680109352757227,
259
+ "epoch": 0.254416961130742,
260
+ "step": 360
261
+ },
262
+ {
263
+ "loss": 1.5634,
264
+ "grad_norm": 0.8707364797592163,
265
+ "learning_rate": 0.0001965012408583327,
266
+ "epoch": 0.26148409893992935,
267
+ "step": 370
268
+ },
269
+ {
270
+ "loss": 1.5348,
271
+ "grad_norm": 0.8081828951835632,
272
+ "learning_rate": 0.00019618820655746487,
273
+ "epoch": 0.26855123674911663,
274
+ "step": 380
275
+ },
276
+ {
277
+ "loss": 1.5707,
278
+ "grad_norm": 0.8864259123802185,
279
+ "learning_rate": 0.0001958620333840339,
280
+ "epoch": 0.2756183745583039,
281
+ "step": 390
282
+ },
283
+ {
284
+ "loss": 1.5324,
285
+ "grad_norm": 0.8165407776832581,
286
+ "learning_rate": 0.00019552276589181522,
287
+ "epoch": 0.2826855123674912,
288
+ "step": 400
289
+ },
290
+ {
291
+ "loss": 1.5627,
292
+ "grad_norm": 0.7587102055549622,
293
+ "learning_rate": 0.00019517045042320892,
294
+ "epoch": 0.28975265017667845,
295
+ "step": 410
296
+ },
297
+ {
298
+ "loss": 1.5713,
299
+ "grad_norm": 0.819656491279602,
300
+ "learning_rate": 0.00019480513510290934,
301
+ "epoch": 0.2968197879858657,
302
+ "step": 420
303
+ },
304
+ {
305
+ "loss": 1.5437,
306
+ "grad_norm": 0.8668874502182007,
307
+ "learning_rate": 0.00019442686983133168,
308
+ "epoch": 0.303886925795053,
309
+ "step": 430
310
+ },
311
+ {
312
+ "loss": 1.5436,
313
+ "grad_norm": 0.8535803556442261,
314
+ "learning_rate": 0.0001940357062777956,
315
+ "epoch": 0.31095406360424027,
316
+ "step": 440
317
+ },
318
+ {
319
+ "loss": 1.51,
320
+ "grad_norm": 0.8437765836715698,
321
+ "learning_rate": 0.0001936316978734676,
322
+ "epoch": 0.31802120141342755,
323
+ "step": 450
324
+ },
325
+ {
326
+ "loss": 1.5283,
327
+ "grad_norm": 0.7021297812461853,
328
+ "learning_rate": 0.0001932148998040626,
329
+ "epoch": 0.3250883392226148,
330
+ "step": 460
331
+ },
332
+ {
333
+ "loss": 1.5183,
334
+ "grad_norm": 0.7457623481750488,
335
+ "learning_rate": 0.00019278536900230563,
336
+ "epoch": 0.3321554770318021,
337
+ "step": 470
338
+ },
339
+ {
340
+ "loss": 1.5222,
341
+ "grad_norm": 0.8366155028343201,
342
+ "learning_rate": 0.0001923431641401552,
343
+ "epoch": 0.3392226148409894,
344
+ "step": 480
345
+ },
346
+ {
347
+ "loss": 1.4932,
348
+ "grad_norm": 0.8243674039840698,
349
+ "learning_rate": 0.00019188834562078902,
350
+ "epoch": 0.3462897526501767,
351
+ "step": 490
352
+ },
353
+ {
354
+ "loss": 1.4981,
355
+ "grad_norm": 0.9153965711593628,
356
+ "learning_rate": 0.00019142097557035308,
357
+ "epoch": 0.35335689045936397,
358
+ "step": 500
359
+ },
360
+ {
361
+ "loss": 1.4876,
362
+ "grad_norm": 0.8276180028915405,
363
+ "learning_rate": 0.0001909411178294756,
364
+ "epoch": 0.36042402826855124,
365
+ "step": 510
366
+ },
367
+ {
368
+ "loss": 1.4709,
369
+ "grad_norm": 0.8758065700531006,
370
+ "learning_rate": 0.0001904488379445466,
371
+ "epoch": 0.3674911660777385,
372
+ "step": 520
373
+ },
374
+ {
375
+ "loss": 1.4856,
376
+ "grad_norm": 0.7498217821121216,
377
+ "learning_rate": 0.00018994420315876468,
378
+ "epoch": 0.3745583038869258,
379
+ "step": 530
380
+ },
381
+ {
382
+ "loss": 1.495,
383
+ "grad_norm": 0.7916860580444336,
384
+ "learning_rate": 0.0001894272824029518,
385
+ "epoch": 0.38162544169611307,
386
+ "step": 540
387
+ },
388
+ {
389
+ "loss": 1.5118,
390
+ "grad_norm": 0.7216470837593079,
391
+ "learning_rate": 0.0001888981462861377,
392
+ "epoch": 0.38869257950530034,
393
+ "step": 550
394
+ },
395
+ {
396
+ "loss": 1.4686,
397
+ "grad_norm": 0.7964893579483032,
398
+ "learning_rate": 0.00018835686708591496,
399
+ "epoch": 0.3957597173144876,
400
+ "step": 560
401
+ },
402
+ {
403
+ "eval_loss": 1.4734739065170288,
404
+ "eval_runtime": 32.7333,
405
+ "eval_samples_per_second": 72.77,
406
+ "eval_steps_per_second": 18.208,
407
+ "epoch": 0.39858657243816253,
408
+ "step": 564
409
+ },
410
+ {
411
+ "loss": 1.4518,
412
+ "grad_norm": 0.9140339493751526,
413
+ "learning_rate": 0.00018780351873856627,
414
+ "epoch": 0.4028268551236749,
415
+ "step": 570
416
+ },
417
+ {
418
+ "loss": 1.5053,
419
+ "grad_norm": 0.9250938296318054,
420
+ "learning_rate": 0.00018723817682896515,
421
+ "epoch": 0.4098939929328622,
422
+ "step": 580
423
+ },
424
+ {
425
+ "loss": 1.4785,
426
+ "grad_norm": 0.8234674334526062,
427
+ "learning_rate": 0.00018666091858025112,
428
+ "epoch": 0.4169611307420495,
429
+ "step": 590
430
+ },
431
+ {
432
+ "loss": 1.4808,
433
+ "grad_norm": 0.8389473557472229,
434
+ "learning_rate": 0.0001860718228432817,
435
+ "epoch": 0.42402826855123676,
436
+ "step": 600
437
+ },
438
+ {
439
+ "loss": 1.4858,
440
+ "grad_norm": 0.7509133219718933,
441
+ "learning_rate": 0.00018547097008586155,
442
+ "epoch": 0.43109540636042404,
443
+ "step": 610
444
+ },
445
+ {
446
+ "loss": 1.4595,
447
+ "grad_norm": 0.863993227481842,
448
+ "learning_rate": 0.00018485844238175095,
449
+ "epoch": 0.4381625441696113,
450
+ "step": 620
451
+ },
452
+ {
453
+ "loss": 1.4701,
454
+ "grad_norm": 0.8800034523010254,
455
+ "learning_rate": 0.000184234323399455,
456
+ "epoch": 0.4452296819787986,
457
+ "step": 630
458
+ },
459
+ {
460
+ "loss": 1.4599,
461
+ "grad_norm": 1.2106516361236572,
462
+ "learning_rate": 0.0001835986983907947,
463
+ "epoch": 0.45229681978798586,
464
+ "step": 640
465
+ },
466
+ {
467
+ "loss": 1.4743,
468
+ "grad_norm": 0.8351795673370361,
469
+ "learning_rate": 0.00018295165417926207,
470
+ "epoch": 0.45936395759717313,
471
+ "step": 650
472
+ },
473
+ {
474
+ "loss": 1.4668,
475
+ "grad_norm": 0.7767188549041748,
476
+ "learning_rate": 0.00018229327914816052,
477
+ "epoch": 0.4664310954063604,
478
+ "step": 660
479
+ },
480
+ {
481
+ "loss": 1.4477,
482
+ "grad_norm": 0.7641212940216064,
483
+ "learning_rate": 0.00018162366322853191,
484
+ "epoch": 0.4734982332155477,
485
+ "step": 670
486
+ },
487
+ {
488
+ "loss": 1.4451,
489
+ "grad_norm": 0.8441094160079956,
490
+ "learning_rate": 0.00018094289788687245,
491
+ "epoch": 0.48056537102473496,
492
+ "step": 680
493
+ },
494
+ {
495
+ "loss": 1.4572,
496
+ "grad_norm": 0.7727362513542175,
497
+ "learning_rate": 0.0001802510761126389,
498
+ "epoch": 0.4876325088339223,
499
+ "step": 690
500
+ },
501
+ {
502
+ "loss": 1.4557,
503
+ "grad_norm": 0.7655246257781982,
504
+ "learning_rate": 0.00017954829240554644,
505
+ "epoch": 0.49469964664310956,
506
+ "step": 700
507
+ },
508
+ {
509
+ "loss": 1.4721,
510
+ "grad_norm": 0.8504830002784729,
511
+ "learning_rate": 0.00017883464276266064,
512
+ "epoch": 0.5017667844522968,
513
+ "step": 710
514
+ },
515
+ {
516
+ "loss": 1.4533,
517
+ "grad_norm": 0.7493986487388611,
518
+ "learning_rate": 0.00017811022466528452,
519
+ "epoch": 0.508833922261484,
+ "step": 720
+ },
+ {
+ "loss": 1.4379,
+ "grad_norm": 0.7659527659416199,
+ "learning_rate": 0.0001773751370656431,
+ "epoch": 0.5159010600706714,
+ "step": 730
+ },
+ {
+ "loss": 1.406,
+ "grad_norm": 0.8079085350036621,
+ "learning_rate": 0.0001766294803733671,
+ "epoch": 0.5229681978798587,
+ "step": 740
+ },
+ {
+ "loss": 1.4479,
+ "grad_norm": 0.8187495470046997,
+ "learning_rate": 0.0001758733564417773,
+ "epoch": 0.5300353356890459,
+ "step": 750
+ },
+ {
+ "loss": 1.4745,
+ "grad_norm": 0.7997586131095886,
+ "learning_rate": 0.00017510686855397176,
+ "epoch": 0.5371024734982333,
+ "step": 760
+ },
+ {
+ "loss": 1.4422,
+ "grad_norm": 0.7756392955780029,
+ "learning_rate": 0.00017433012140871811,
+ "epoch": 0.5441696113074205,
+ "step": 770
+ },
+ {
+ "loss": 1.4563,
+ "grad_norm": 0.7612184286117554,
+ "learning_rate": 0.00017354322110615188,
+ "epoch": 0.5512367491166078,
+ "step": 780
+ },
+ {
+ "loss": 1.4478,
+ "grad_norm": 0.7223451733589172,
+ "learning_rate": 0.00017274627513328385,
+ "epoch": 0.558303886925795,
+ "step": 790
+ },
+ {
+ "loss": 1.447,
+ "grad_norm": 0.8275614380836487,
+ "learning_rate": 0.00017193939234931777,
+ "epoch": 0.5653710247349824,
+ "step": 800
+ },
+ {
+ "loss": 1.4681,
+ "grad_norm": 0.8228889107704163,
+ "learning_rate": 0.00017112268297078077,
+ "epoch": 0.5724381625441696,
+ "step": 810
+ },
+ {
+ "loss": 1.4154,
+ "grad_norm": 0.7482108473777771,
+ "learning_rate": 0.0001702962585564681,
+ "epoch": 0.5795053003533569,
+ "step": 820
+ },
+ {
+ "loss": 1.4272,
+ "grad_norm": 0.813960075378418,
+ "learning_rate": 0.00016946023199220487,
+ "epoch": 0.5865724381625441,
+ "step": 830
+ },
+ {
+ "loss": 1.4076,
+ "grad_norm": 0.8059828877449036,
+ "learning_rate": 0.0001686147174754263,
+ "epoch": 0.5936395759717314,
+ "step": 840
+ },
+ {
+ "eval_loss": 1.414860725402832,
+ "eval_runtime": 32.6307,
+ "eval_samples_per_second": 72.999,
+ "eval_steps_per_second": 18.265,
+ "epoch": 0.5978798586572438,
+ "step": 846
+ },
+ {
+ "loss": 1.435,
+ "grad_norm": 0.7831512689590454,
+ "learning_rate": 0.00016775983049957887,
+ "epoch": 0.6007067137809188,
+ "step": 850
+ },
+ {
+ "loss": 1.453,
+ "grad_norm": 0.7612254023551941,
+ "learning_rate": 0.0001668956878383445,
+ "epoch": 0.607773851590106,
+ "step": 860
+ },
+ {
+ "loss": 1.4496,
+ "grad_norm": 0.7569178938865662,
+ "learning_rate": 0.0001660224075296896,
+ "epoch": 0.6148409893992933,
+ "step": 870
+ },
+ {
+ "loss": 1.4032,
+ "grad_norm": 0.8306989669799805,
+ "learning_rate": 0.00016514010885974184,
+ "epoch": 0.6219081272084805,
+ "step": 880
+ },
+ {
+ "loss": 1.4582,
+ "grad_norm": 0.7806069850921631,
+ "learning_rate": 0.00016424891234649618,
+ "epoch": 0.6289752650176679,
+ "step": 890
+ },
+ {
+ "loss": 1.4078,
+ "grad_norm": 0.8022964596748352,
+ "learning_rate": 0.00016334893972335247,
+ "epoch": 0.6360424028268551,
+ "step": 900
+ },
+ {
+ "loss": 1.4182,
+ "grad_norm": 0.7912920713424683,
+ "learning_rate": 0.00016244031392248748,
+ "epoch": 0.6431095406360424,
+ "step": 910
+ },
+ {
+ "loss": 1.4092,
+ "grad_norm": 0.7984107136726379,
+ "learning_rate": 0.00016152315905806268,
+ "epoch": 0.6501766784452296,
+ "step": 920
+ },
+ {
+ "loss": 1.4038,
+ "grad_norm": 0.7535344362258911,
+ "learning_rate": 0.00016059760040927103,
+ "epoch": 0.657243816254417,
+ "step": 930
+ },
+ {
+ "loss": 1.3912,
+ "grad_norm": 0.7016357183456421,
+ "learning_rate": 0.0001596637644032242,
+ "epoch": 0.6643109540636042,
+ "step": 940
+ },
+ {
+ "loss": 1.4376,
+ "grad_norm": 0.7795329689979553,
+ "learning_rate": 0.00015872177859768333,
+ "epoch": 0.6713780918727915,
+ "step": 950
+ },
+ {
+ "loss": 1.4172,
+ "grad_norm": 0.8167974352836609,
+ "learning_rate": 0.00015777177166363527,
+ "epoch": 0.6784452296819788,
+ "step": 960
+ },
+ {
+ "loss": 1.4088,
+ "grad_norm": 0.71951824426651,
+ "learning_rate": 0.00015681387336771656,
+ "epoch": 0.6855123674911661,
+ "step": 970
+ },
+ {
+ "loss": 1.4202,
+ "grad_norm": 0.774888813495636,
+ "learning_rate": 0.0001558482145544879,
+ "epoch": 0.6925795053003534,
+ "step": 980
+ },
+ {
+ "loss": 1.4101,
+ "grad_norm": 0.8282411694526672,
+ "learning_rate": 0.0001548749271285616,
+ "epoch": 0.6996466431095406,
+ "step": 990
+ },
+ {
+ "loss": 1.3747,
+ "grad_norm": 0.7824772000312805,
+ "learning_rate": 0.0001538941440365837,
+ "epoch": 0.7067137809187279,
+ "step": 1000
+ },
+ {
+ "loss": 1.3852,
+ "grad_norm": 0.839332103729248,
+ "learning_rate": 0.00015290599924907433,
+ "epoch": 0.7137809187279152,
+ "step": 1010
+ },
+ {
+ "loss": 1.4326,
+ "grad_norm": 0.8095847964286804,
+ "learning_rate": 0.00015191062774212773,
+ "epoch": 0.7208480565371025,
+ "step": 1020
+ },
+ {
+ "loss": 1.4082,
+ "grad_norm": 0.827717661857605,
+ "learning_rate": 0.0001509081654789753,
+ "epoch": 0.7279151943462897,
+ "step": 1030
+ },
+ {
+ "loss": 1.3871,
+ "grad_norm": 0.7732033133506775,
+ "learning_rate": 0.00014989874939141351,
+ "epoch": 0.734982332155477,
+ "step": 1040
+ },
+ {
+ "loss": 1.3649,
+ "grad_norm": 0.7641321420669556,
+ "learning_rate": 0.0001488825173610997,
+ "epoch": 0.7420494699646644,
+ "step": 1050
+ },
+ {
+ "loss": 1.4024,
+ "grad_norm": 0.7748322486877441,
+ "learning_rate": 0.0001478596082007181,
+ "epoch": 0.7491166077738516,
+ "step": 1060
+ },
+ {
+ "loss": 1.3805,
+ "grad_norm": 0.7583175301551819,
+ "learning_rate": 0.00014683016163501855,
+ "epoch": 0.7561837455830389,
+ "step": 1070
+ },
+ {
+ "loss": 1.3786,
+ "grad_norm": 0.7941022515296936,
+ "learning_rate": 0.0001457943182817308,
+ "epoch": 0.7632508833922261,
+ "step": 1080
+ },
+ {
+ "loss": 1.389,
+ "grad_norm": 0.8075069785118103,
+ "learning_rate": 0.00014475221963235687,
+ "epoch": 0.7703180212014135,
+ "step": 1090
+ },
+ {
+ "loss": 1.4011,
+ "grad_norm": 0.8261104822158813,
+ "learning_rate": 0.00014370400803284374,
+ "epoch": 0.7773851590106007,
+ "step": 1100
+ },
+ {
+ "loss": 1.419,
+ "grad_norm": 0.8428652882575989,
+ "learning_rate": 0.00014264982666413958,
+ "epoch": 0.784452296819788,
+ "step": 1110
+ },
+ {
+ "loss": 1.3845,
+ "grad_norm": 0.7421643137931824,
+ "learning_rate": 0.00014158981952263608,
+ "epoch": 0.7915194346289752,
+ "step": 1120
+ },
+ {
+ "eval_loss": 1.373445749282837,
+ "eval_runtime": 32.6518,
+ "eval_samples_per_second": 72.952,
+ "eval_steps_per_second": 18.253,
+ "epoch": 0.7971731448763251,
+ "step": 1128
+ },
+ {
+ "loss": 1.3417,
+ "grad_norm": 0.861863911151886,
+ "learning_rate": 0.000140524131400499,
+ "epoch": 0.7985865724381626,
+ "step": 1130
+ },
+ {
+ "loss": 1.3582,
+ "grad_norm": 0.7944890260696411,
+ "learning_rate": 0.00013945290786589027,
+ "epoch": 0.8056537102473498,
+ "step": 1140
+ },
+ {
+ "loss": 1.3801,
+ "grad_norm": 0.7751563787460327,
+ "learning_rate": 0.00013837629524308408,
+ "epoch": 0.8127208480565371,
+ "step": 1150
+ },
+ {
+ "loss": 1.3959,
+ "grad_norm": 0.8568748235702515,
+ "learning_rate": 0.00013729444059247954,
+ "epoch": 0.8197879858657244,
+ "step": 1160
+ },
+ {
+ "loss": 1.3286,
+ "grad_norm": 0.7741242051124573,
+ "learning_rate": 0.00013620749169051307,
+ "epoch": 0.8268551236749117,
+ "step": 1170
+ },
+ {
+ "loss": 1.3591,
+ "grad_norm": 0.7787532806396484,
+ "learning_rate": 0.00013511559700947264,
+ "epoch": 0.833922261484099,
+ "step": 1180
+ },
+ {
+ "loss": 1.3575,
+ "grad_norm": 0.9094675779342651,
+ "learning_rate": 0.00013401890569721725,
+ "epoch": 0.8409893992932862,
+ "step": 1190
+ },
+ {
+ "loss": 1.3837,
+ "grad_norm": 0.785794198513031,
+ "learning_rate": 0.00013291756755680388,
+ "epoch": 0.8480565371024735,
+ "step": 1200
+ },
+ {
+ "loss": 1.3939,
+ "grad_norm": 0.7012341022491455,
+ "learning_rate": 0.00013181173302602528,
+ "epoch": 0.8551236749116607,
+ "step": 1210
+ },
+ {
+ "loss": 1.3591,
+ "grad_norm": 0.783419668674469,
+ "learning_rate": 0.0001307015531568606,
+ "epoch": 0.8621908127208481,
+ "step": 1220
+ },
+ {
+ "loss": 1.3824,
+ "grad_norm": 0.8260233998298645,
+ "learning_rate": 0.00012958717959484254,
+ "epoch": 0.8692579505300353,
+ "step": 1230
+ },
+ {
+ "loss": 1.3627,
+ "grad_norm": 0.8838050961494446,
+ "learning_rate": 0.0001284687645583432,
+ "epoch": 0.8763250883392226,
+ "step": 1240
+ },
+ {
+ "loss": 1.3743,
+ "grad_norm": 0.8196436166763306,
+ "learning_rate": 0.0001273464608177818,
+ "epoch": 0.8833922261484098,
+ "step": 1250
+ },
+ {
+ "loss": 1.3663,
+ "grad_norm": 0.77605801820755,
+ "learning_rate": 0.00012622042167475693,
+ "epoch": 0.8904593639575972,
+ "step": 1260
+ },
+ {
+ "loss": 1.3548,
+ "grad_norm": 0.8123463988304138,
+ "learning_rate": 0.00012509080094110604,
+ "epoch": 0.8975265017667845,
+ "step": 1270
+ },
+ {
+ "loss": 1.362,
+ "grad_norm": 0.8651890158653259,
+ "learning_rate": 0.00012395775291789568,
+ "epoch": 0.9045936395759717,
+ "step": 1280
+ },
+ {
+ "loss": 1.3513,
+ "grad_norm": 0.7966169118881226,
+ "learning_rate": 0.00012282143237434478,
+ "epoch": 0.911660777385159,
+ "step": 1290
+ },
+ {
+ "loss": 1.3654,
+ "grad_norm": 0.7800924181938171,
+ "learning_rate": 0.00012168199452668341,
+ "epoch": 0.9187279151943463,
+ "step": 1300
+ },
+ {
+ "loss": 1.3414,
+ "grad_norm": 0.7868988513946533,
+ "learning_rate": 0.00012053959501695145,
+ "epoch": 0.9257950530035336,
+ "step": 1310
+ },
+ {
+ "loss": 1.3444,
+ "grad_norm": 0.7945041656494141,
+ "learning_rate": 0.00011939438989173828,
+ "epoch": 0.9328621908127208,
+ "step": 1320
+ },
+ {
+ "loss": 1.368,
+ "grad_norm": 0.8256521224975586,
+ "learning_rate": 0.00011824653558086769,
+ "epoch": 0.9399293286219081,
+ "step": 1330
+ },
+ {
+ "loss": 1.3333,
+ "grad_norm": 0.7671541571617126,
+ "learning_rate": 0.00011709618887603014,
+ "epoch": 0.9469964664310954,
+ "step": 1340
+ },
+ {
+ "loss": 1.3531,
+ "grad_norm": 0.706906259059906,
+ "learning_rate": 0.00011594350690936581,
+ "epoch": 0.9540636042402827,
+ "step": 1350
+ },
+ {
+ "loss": 1.385,
+ "grad_norm": 0.8090202808380127,
+ "learning_rate": 0.00011478864713200113,
+ "epoch": 0.9611307420494699,
+ "step": 1360
+ },
+ {
+ "loss": 1.3731,
+ "grad_norm": 0.7614802718162537,
+ "learning_rate": 0.00011363176729254146,
+ "epoch": 0.9681978798586572,
+ "step": 1370
+ },
+ {
+ "loss": 1.3742,
+ "grad_norm": 0.7586848735809326,
+ "learning_rate": 0.00011247302541552359,
+ "epoch": 0.9752650176678446,
+ "step": 1380
+ },
+ {
+ "loss": 1.3183,
+ "grad_norm": 0.8582295775413513,
+ "learning_rate": 0.00011131257977983014,
+ "epoch": 0.9823321554770318,
+ "step": 1390
+ },
+ {
+ "loss": 1.3627,
+ "grad_norm": 0.7340189814567566,
+ "learning_rate": 0.00011015058889706942,
+ "epoch": 0.9893992932862191,
+ "step": 1400
+ },
+ {
+ "loss": 1.3606,
+ "grad_norm": 0.8590829372406006,
+ "learning_rate": 0.00010898721148992351,
+ "epoch": 0.9964664310954063,
+ "step": 1410
+ },
+ {
+ "eval_loss": 1.341532826423645,
+ "eval_runtime": 32.4522,
+ "eval_samples_per_second": 73.4,
+ "eval_steps_per_second": 18.365,
+ "epoch": 0.9964664310954063,
+ "step": 1410
+ },
+ {
+ "loss": 1.3476,
+ "grad_norm": 0.8152231574058533,
+ "learning_rate": 0.00010782260647046742,
+ "epoch": 1.0035335689045937,
+ "step": 1420
+ },
+ {
+ "loss": 1.3144,
+ "grad_norm": 0.8353652954101562,
+ "learning_rate": 0.00010665693291846244,
+ "epoch": 1.010600706713781,
+ "step": 1430
+ },
+ {
+ "loss": 1.3064,
+ "grad_norm": 0.9546340703964233,
+ "learning_rate": 0.00010549035005962653,
+ "epoch": 1.017667844522968,
+ "step": 1440
+ },
+ {
+ "loss": 1.2902,
+ "grad_norm": 0.8087377548217773,
+ "learning_rate": 0.00010432301724388485,
+ "epoch": 1.0247349823321554,
+ "step": 1450
+ },
+ {
+ "loss": 1.2955,
+ "grad_norm": 0.8206213116645813,
+ "learning_rate": 0.0001031550939236033,
+ "epoch": 1.0318021201413428,
+ "step": 1460
+ },
+ {
+ "loss": 1.3192,
+ "grad_norm": 0.8617863655090332,
+ "learning_rate": 0.00010198673963180796,
+ "epoch": 1.03886925795053,
+ "step": 1470
+ },
+ {
+ "loss": 1.3309,
+ "grad_norm": 0.815900444984436,
+ "learning_rate": 0.00010081811396039373,
+ "epoch": 1.0459363957597174,
+ "step": 1480
+ },
+ {
+ "loss": 1.2872,
+ "grad_norm": 0.8019934892654419,
+ "learning_rate": 9.964937653832468e-05,
+ "epoch": 1.0530035335689045,
+ "step": 1490
+ },
+ {
+ "loss": 1.2956,
+ "grad_norm": 0.7981704473495483,
+ "learning_rate": 9.848068700982955e-05,
+ "epoch": 1.0600706713780919,
+ "step": 1500
+ },
+ {
+ "loss": 1.3132,
+ "grad_norm": 0.8272152543067932,
+ "learning_rate": 9.731220501259501e-05,
+ "epoch": 1.0671378091872792,
+ "step": 1510
+ },
+ {
+ "loss": 1.291,
+ "grad_norm": 0.7766007781028748,
+ "learning_rate": 9.614409015595995e-05,
+ "epoch": 1.0742049469964665,
+ "step": 1520
+ },
+ {
+ "loss": 1.3021,
+ "grad_norm": 0.830301821231842,
+ "learning_rate": 9.497650199911341e-05,
+ "epoch": 1.0812720848056536,
+ "step": 1530
+ },
+ {
+ "loss": 1.3022,
+ "grad_norm": 0.7850944995880127,
+ "learning_rate": 9.380960002929979e-05,
+ "epoch": 1.088339222614841,
+ "step": 1540
+ },
+ {
+ "loss": 1.3042,
+ "grad_norm": 0.8939189314842224,
+ "learning_rate": 9.264354364003327e-05,
+ "epoch": 1.0954063604240283,
+ "step": 1550
+ },
+ {
+ "loss": 1.2754,
+ "grad_norm": 0.8872391581535339,
+ "learning_rate": 9.147849210932571e-05,
+ "epoch": 1.1024734982332156,
+ "step": 1560
+ },
+ {
+ "loss": 1.3033,
+ "grad_norm": 0.8385890126228333,
+ "learning_rate": 9.031460457792982e-05,
+ "epoch": 1.1095406360424027,
+ "step": 1570
+ },
+ {
+ "loss": 1.3267,
+ "grad_norm": 0.8151512742042542,
+ "learning_rate": 8.915204002760122e-05,
+ "epoch": 1.11660777385159,
+ "step": 1580
+ },
+ {
+ "loss": 1.3131,
+ "grad_norm": 0.80403733253479,
+ "learning_rate": 8.799095725938243e-05,
+ "epoch": 1.1236749116607774,
+ "step": 1590
+ },
+ {
+ "loss": 1.3262,
+ "grad_norm": 0.8914912939071655,
+ "learning_rate": 8.68315148719111e-05,
+ "epoch": 1.1307420494699647,
+ "step": 1600
+ },
+ {
+ "loss": 1.3178,
+ "grad_norm": 0.924933671951294,
+ "learning_rate": 8.567387123975648e-05,
+ "epoch": 1.137809187279152,
+ "step": 1610
+ },
+ {
+ "loss": 1.31,
+ "grad_norm": 0.7894246578216553,
+ "learning_rate": 8.451818449178591e-05,
+ "epoch": 1.1448763250883391,
+ "step": 1620
+ },
+ {
+ "loss": 1.2846,
+ "grad_norm": 0.7919443845748901,
+ "learning_rate": 8.336461248956522e-05,
+ "epoch": 1.1519434628975265,
+ "step": 1630
+ },
+ {
+ "loss": 1.2923,
+ "grad_norm": 0.7391479015350342,
+ "learning_rate": 8.221331280579564e-05,
+ "epoch": 1.1590106007067138,
+ "step": 1640
+ },
+ {
+ "loss": 1.2891,
+ "grad_norm": 0.8959116339683533,
+ "learning_rate": 8.106444270278999e-05,
+ "epoch": 1.1660777385159011,
+ "step": 1650
+ },
+ {
+ "loss": 1.3105,
+ "grad_norm": 0.788266122341156,
+ "learning_rate": 7.991815911099126e-05,
+ "epoch": 1.1731448763250882,
+ "step": 1660
+ },
+ {
+ "loss": 1.3184,
+ "grad_norm": 0.9487005472183228,
+ "learning_rate": 7.877461860753697e-05,
+ "epoch": 1.1802120141342756,
+ "step": 1670
+ },
+ {
+ "loss": 1.2882,
+ "grad_norm": 0.843334972858429,
+ "learning_rate": 7.763397739487098e-05,
+ "epoch": 1.187279151943463,
+ "step": 1680
+ },
+ {
+ "loss": 1.277,
+ "grad_norm": 0.7756645679473877,
+ "learning_rate": 7.649639127940735e-05,
+ "epoch": 1.1943462897526502,
+ "step": 1690
+ },
+ {
+ "eval_loss": 1.319354772567749,
+ "eval_runtime": 37.3799,
+ "eval_samples_per_second": 63.724,
+ "eval_steps_per_second": 15.944,
+ "epoch": 1.1957597173144876,
+ "step": 1692
+ },
+ {
+ "loss": 1.3241,
+ "grad_norm": 0.7394362092018127,
+ "learning_rate": 7.536201565024767e-05,
+ "epoch": 1.2014134275618376,
+ "step": 1700
+ },
+ {
+ "loss": 1.3292,
+ "grad_norm": 0.9032957553863525,
+ "learning_rate": 7.423100545795565e-05,
+ "epoch": 1.2084805653710247,
+ "step": 1710
+ },
+ {
+ "loss": 1.3072,
+ "grad_norm": 0.8041402697563171,
+ "learning_rate": 7.310351519339165e-05,
+ "epoch": 1.215547703180212,
+ "step": 1720
+ },
+ {
+ "loss": 1.3149,
+ "grad_norm": 0.8472148180007935,
+ "learning_rate": 7.197969886660984e-05,
+ "epoch": 1.2226148409893993,
+ "step": 1730
+ },
+ {
+ "loss": 1.2901,
+ "grad_norm": 0.8664119243621826,
+ "learning_rate": 7.085970998582112e-05,
+ "epoch": 1.2296819787985867,
+ "step": 1740
+ },
+ {
+ "loss": 1.2961,
+ "grad_norm": 0.8825384974479675,
+ "learning_rate": 6.974370153642468e-05,
+ "epoch": 1.2367491166077738,
+ "step": 1750
+ },
+ {
+ "loss": 1.2856,
+ "grad_norm": 0.8917942047119141,
+ "learning_rate": 6.863182596011087e-05,
+ "epoch": 1.243816254416961,
+ "step": 1760
+ },
+ {
+ "loss": 1.2594,
+ "grad_norm": 0.8514686226844788,
+ "learning_rate": 6.752423513403824e-05,
+ "epoch": 1.2508833922261484,
+ "step": 1770
+ },
+ {
+ "loss": 1.3013,
+ "grad_norm": 0.7883872985839844,
+ "learning_rate": 6.642108035008803e-05,
+ "epoch": 1.2579505300353357,
+ "step": 1780
+ },
+ {
+ "loss": 1.3038,
+ "grad_norm": 0.9033199548721313,
+ "learning_rate": 6.53225122941981e-05,
+ "epoch": 1.265017667844523,
+ "step": 1790
+ },
+ {
+ "loss": 1.3343,
+ "grad_norm": 0.8501263856887817,
+ "learning_rate": 6.422868102578018e-05,
+ "epoch": 1.2720848056537102,
+ "step": 1800
+ },
+ {
+ "loss": 1.3122,
+ "grad_norm": 0.8853347301483154,
+ "learning_rate": 6.31397359572223e-05,
+ "epoch": 1.2791519434628975,
+ "step": 1810
+ },
+ {
+ "loss": 1.2868,
+ "grad_norm": 0.7910575866699219,
+ "learning_rate": 6.205582583347974e-05,
+ "epoch": 1.2862190812720848,
+ "step": 1820
+ },
+ {
+ "loss": 1.2864,
+ "grad_norm": 0.8016606569290161,
+ "learning_rate": 6.097709871175723e-05,
+ "epoch": 1.293286219081272,
+ "step": 1830
+ },
+ {
+ "loss": 1.2966,
+ "grad_norm": 0.8374130725860596,
+ "learning_rate": 5.990370194128479e-05,
+ "epoch": 1.3003533568904593,
+ "step": 1840
+ },
+ {
+ "loss": 1.3106,
+ "grad_norm": 0.852772057056427,
+ "learning_rate": 5.88357821431908e-05,
+ "epoch": 1.3074204946996466,
+ "step": 1850
+ },
+ {
+ "loss": 1.2808,
+ "grad_norm": 0.8999858498573303,
+ "learning_rate": 5.7773485190474044e-05,
+ "epoch": 1.314487632508834,
+ "step": 1860
+ },
+ {
+ "loss": 1.2539,
+ "grad_norm": 0.8973957896232605,
+ "learning_rate": 5.671695618807802e-05,
+ "epoch": 1.3215547703180213,
+ "step": 1870
+ },
+ {
+ "loss": 1.2802,
+ "grad_norm": 0.7528464794158936,
+ "learning_rate": 5.566633945307052e-05,
+ "epoch": 1.3286219081272086,
+ "step": 1880
+ },
+ {
+ "loss": 1.2715,
+ "grad_norm": 0.8808310627937317,
+ "learning_rate": 5.4621778494930397e-05,
+ "epoch": 1.3356890459363957,
+ "step": 1890
+ },
+ {
+ "loss": 1.3017,
+ "grad_norm": 0.8389664888381958,
+ "learning_rate": 5.358341599594483e-05,
+ "epoch": 1.342756183745583,
+ "step": 1900
+ },
+ {
+ "loss": 1.2849,
+ "grad_norm": 0.9317607283592224,
+ "learning_rate": 5.255139379171967e-05,
+ "epoch": 1.3498233215547704,
+ "step": 1910
+ },
+ {
+ "loss": 1.2916,
+ "grad_norm": 0.803419828414917,
+ "learning_rate": 5.152585285180517e-05,
+ "epoch": 1.3568904593639575,
+ "step": 1920
+ },
+ {
+ "loss": 1.2805,
+ "grad_norm": 0.8279209136962891,
+ "learning_rate": 5.050693326044036e-05,
+ "epoch": 1.3639575971731448,
+ "step": 1930
+ },
+ {
+ "loss": 1.2576,
+ "grad_norm": 0.893677830696106,
+ "learning_rate": 4.949477419741814e-05,
+ "epoch": 1.3710247349823321,
+ "step": 1940
+ },
+ {
+ "loss": 1.2781,
+ "grad_norm": 0.8157869577407837,
+ "learning_rate": 4.848951391907377e-05,
+ "epoch": 1.3780918727915195,
+ "step": 1950
+ },
+ {
+ "loss": 1.2735,
+ "grad_norm": 0.8294868469238281,
+ "learning_rate": 4.749128973940001e-05,
+ "epoch": 1.3851590106007068,
+ "step": 1960
+ },
+ {
+ "loss": 1.3018,
+ "grad_norm": 0.8437710404396057,
+ "learning_rate": 4.6500238011290295e-05,
+ "epoch": 1.3922261484098941,
+ "step": 1970
+ },
+ {
+ "eval_loss": 1.3012111186981201,
+ "eval_runtime": 36.7927,
+ "eval_samples_per_second": 64.741,
+ "eval_steps_per_second": 16.199,
+ "epoch": 1.3950530035335689,
+ "step": 1974
+ },
+ {
+ "loss": 1.2794,
+ "grad_norm": 0.7894673347473145,
+ "learning_rate": 4.551649410791384e-05,
+ "epoch": 1.3992932862190812,
+ "step": 1980
+ },
+ {
+ "loss": 1.2718,
+ "grad_norm": 0.7576306462287903,
+ "learning_rate": 4.454019240422412e-05,
+ "epoch": 1.4063604240282686,
+ "step": 1990
+ },
+ {
+ "loss": 1.2912,
+ "grad_norm": 0.8169530630111694,
+ "learning_rate": 4.357146625860391e-05,
+ "epoch": 1.4134275618374559,
+ "step": 2000
+ },
+ {
+ "loss": 1.3073,
+ "grad_norm": 0.8148018717765808,
+ "learning_rate": 4.261044799464915e-05,
+ "epoch": 1.420494699646643,
+ "step": 2010
+ },
+ {
+ "loss": 1.2797,
+ "grad_norm": 0.8685176968574524,
+ "learning_rate": 4.165726888309402e-05,
+ "epoch": 1.4275618374558303,
+ "step": 2020
+ },
+ {
+ "loss": 1.2755,
+ "grad_norm": 0.8314803838729858,
+ "learning_rate": 4.0712059123880155e-05,
+ "epoch": 1.4346289752650176,
+ "step": 2030
+ },
+ {
+ "loss": 1.2559,
+ "grad_norm": 0.9396986961364746,
+ "learning_rate": 3.977494782837182e-05,
+ "epoch": 1.441696113074205,
+ "step": 2040
+ },
+ {
+ "loss": 1.2518,
+ "grad_norm": 0.9005519151687622,
+ "learning_rate": 3.884606300171979e-05,
+ "epoch": 1.4487632508833923,
+ "step": 2050
+ },
+ {
+ "loss": 1.3159,
+ "grad_norm": 0.8290562033653259,
+ "learning_rate": 3.7925531525376623e-05,
+ "epoch": 1.4558303886925796,
+ "step": 2060
+ },
+ {
+ "loss": 1.2345,
+ "grad_norm": 0.8309129476547241,
+ "learning_rate": 3.7013479139765115e-05,
+ "epoch": 1.4628975265017667,
+ "step": 2070
+ },
+ {
+ "loss": 1.2771,
+ "grad_norm": 0.8853537440299988,
+ "learning_rate": 3.611003042710266e-05,
+ "epoch": 1.469964664310954,
+ "step": 2080
+ },
+ {
+ "loss": 1.2894,
+ "grad_norm": 0.8893609642982483,
+ "learning_rate": 3.521530879438407e-05,
+ "epoch": 1.4770318021201414,
+ "step": 2090
+ },
+ {
+ "loss": 1.2939,
+ "grad_norm": 0.8105437755584717,
+ "learning_rate": 3.432943645652453e-05,
+ "epoch": 1.4840989399293285,
+ "step": 2100
+ },
+ {
+ "loss": 1.26,
+ "grad_norm": 0.8063532710075378,
+ "learning_rate": 3.345253441966579e-05,
+ "epoch": 1.4911660777385158,
+ "step": 2110
+ },
+ {
+ "loss": 1.2793,
+ "grad_norm": 0.8935479521751404,
+ "learning_rate": 3.258472246464717e-05,
+ "epoch": 1.4982332155477032,
+ "step": 2120
+ },
+ {
+ "loss": 1.3013,
+ "grad_norm": 0.7904537916183472,
+ "learning_rate": 3.172611913064402e-05,
+ "epoch": 1.5053003533568905,
+ "step": 2130
+ },
+ {
+ "loss": 1.2707,
+ "grad_norm": 0.8947263956069946,
+ "learning_rate": 3.087684169897588e-05,
+ "epoch": 1.5123674911660778,
+ "step": 2140
+ },
+ {
+ "loss": 1.285,
+ "grad_norm": 0.8150575160980225,
+ "learning_rate": 3.0037006177086346e-05,
+ "epoch": 1.5194346289752652,
+ "step": 2150
+ },
+ {
+ "loss": 1.2681,
+ "grad_norm": 0.833285391330719,
+ "learning_rate": 2.920672728269692e-05,
+ "epoch": 1.5265017667844523,
+ "step": 2160
+ },
+ {
+ "loss": 1.2484,
+ "grad_norm": 0.8433587551116943,
+ "learning_rate": 2.8386118428137254e-05,
+ "epoch": 1.5335689045936396,
+ "step": 2170
+ },
+ {
+ "loss": 1.2804,
+ "grad_norm": 0.8045548796653748,
+ "learning_rate": 2.7575291704853323e-05,
+ "epoch": 1.5406360424028267,
+ "step": 2180
+ },
+ {
+ "loss": 1.2542,
+ "grad_norm": 0.8309583067893982,
+ "learning_rate": 2.6774357868096432e-05,
+ "epoch": 1.547703180212014,
+ "step": 2190
+ },
+ {
+ "loss": 1.2715,
+ "grad_norm": 1.0081511735916138,
+ "learning_rate": 2.5983426321794502e-05,
+ "epoch": 1.5547703180212014,
+ "step": 2200
+ },
+ {
+ "loss": 1.291,
+ "grad_norm": 0.8824013471603394,
+ "learning_rate": 2.5202605103607835e-05,
+ "epoch": 1.5618374558303887,
+ "step": 2210
+ },
+ {
+ "loss": 1.2509,
+ "grad_norm": 0.8758257031440735,
+ "learning_rate": 2.443200087017192e-05,
+ "epoch": 1.568904593639576,
+ "step": 2220
+ },
+ {
+ "loss": 1.2616,
+ "grad_norm": 0.9100747108459473,
+ "learning_rate": 2.3671718882528437e-05,
+ "epoch": 1.5759717314487633,
+ "step": 2230
+ },
+ {
+ "loss": 1.2652,
+ "grad_norm": 0.7860396504402161,
+ "learning_rate": 2.292186299174712e-05,
+ "epoch": 1.5830388692579507,
+ "step": 2240
+ },
+ {
+ "loss": 1.2654,
+ "grad_norm": 0.7228366732597351,
+ "learning_rate": 2.218253562474023e-05,
+ "epoch": 1.5901060070671378,
+ "step": 2250
+ },
+ {
+ "eval_loss": 1.2894463539123535,
+ "eval_runtime": 38.7873,
+ "eval_samples_per_second": 61.412,
+ "eval_steps_per_second": 15.366,
+ "epoch": 1.5943462897526501,
+ "step": 2256
+ },
+ {
+ "loss": 1.249,
+ "grad_norm": 0.8728644251823425,
+ "learning_rate": 2.1453837770271334e-05,
+ "epoch": 1.5971731448763251,
+ "step": 2260
+ },
+ {
+ "loss": 1.2812,
+ "grad_norm": 0.8223507404327393,
+ "learning_rate": 2.0735868965160953e-05,
+ "epoch": 1.6042402826855122,
+ "step": 2270
+ },
+ {
+ "loss": 1.255,
+ "grad_norm": 0.8579265475273132,
+ "learning_rate": 2.0028727280690107e-05,
+ "epoch": 1.6113074204946995,
+ "step": 2280
+ },
+ {
+ "loss": 1.2282,
+ "grad_norm": 0.8867602348327637,
+ "learning_rate": 1.9332509309204183e-05,
+ "epoch": 1.6183745583038869,
+ "step": 2290
+ },
+ {
+ "loss": 1.2313,
+ "grad_norm": 0.8542134761810303,
+ "learning_rate": 1.8647310150919083e-05,
+ "epoch": 1.6254416961130742,
+ "step": 2300
+ },
+ {
+ "loss": 1.2614,
+ "grad_norm": 0.956292986869812,
+ "learning_rate": 1.797322340093067e-05,
+ "epoch": 1.6325088339222615,
+ "step": 2310
+ },
+ {
+ "loss": 1.2972,
+ "grad_norm": 0.8404093980789185,
+ "learning_rate": 1.7310341136430385e-05,
+ "epoch": 1.6395759717314489,
+ "step": 2320
+ },
+ {
+ "loss": 1.2437,
+ "grad_norm": 0.8927039504051208,
+ "learning_rate": 1.6658753904127734e-05,
+ "epoch": 1.6466431095406362,
+ "step": 2330
+ },
+ {
+ "loss": 1.2578,
+ "grad_norm": 0.8942687511444092,
+ "learning_rate": 1.6018550707882062e-05,
+ "epoch": 1.6537102473498233,
+ "step": 2340
+ },
+ {
+ "loss": 1.2816,
+ "grad_norm": 0.8343712687492371,
+ "learning_rate": 1.538981899654508e-05,
+ "epoch": 1.6607773851590106,
+ "step": 2350
+ },
+ {
+ "loss": 1.263,
+ "grad_norm": 0.8375242352485657,
+ "learning_rate": 1.477264465201572e-05,
+ "epoch": 1.6678445229681977,
+ "step": 2360
+ },
+ {
+ "loss": 1.2703,
+ "grad_norm": 0.802559494972229,
+ "learning_rate": 1.4167111977508973e-05,
+ "epoch": 1.674911660777385,
+ "step": 2370
+ },
+ {
+ "loss": 1.2546,
+ "grad_norm": 0.9226746559143066,
+ "learning_rate": 1.3573303686040628e-05,
+ "epoch": 1.6819787985865724,
+ "step": 2380
+ },
+ {
+ "loss": 1.2529,
+ "grad_norm": 0.8805841207504272,
+ "learning_rate": 1.2991300889128866e-05,
+ "epoch": 1.6890459363957597,
+ "step": 2390
+ },
+ {
+ "loss": 1.2908,
+ "grad_norm": 0.830085039138794,
+ "learning_rate": 1.2421183085714927e-05,
+ "epoch": 1.696113074204947,
+ "step": 2400
+ },
+ {
+ "loss": 1.2519,
+ "grad_norm": 0.8588423132896423,
+ "learning_rate": 1.1863028151303879e-05,
+ "epoch": 1.7031802120141344,
+ "step": 2410
+ },
+ {
+ "loss": 1.2771,
+ "grad_norm": 0.7939627766609192,
+ "learning_rate": 1.13169123273271e-05,
+ "epoch": 1.7102473498233217,
+ "step": 2420
+ },
+ {
+ "loss": 1.2684,
+ "grad_norm": 0.7675894498825073,
+ "learning_rate": 1.078291021072817e-05,
+ "epoch": 1.7173144876325088,
+ "step": 2430
+ },
+ {
+ "loss": 1.2842,
+ "grad_norm": 0.9253499507904053,
+ "learning_rate": 1.0261094743773203e-05,
+ "epoch": 1.7243816254416962,
+ "step": 2440
+ },
+ {
+ "loss": 1.2408,
+ "grad_norm": 0.786151111125946,
+ "learning_rate": 9.751537204087258e-06,
+ "epoch": 1.7314487632508833,
+ "step": 2450
+ },
+ {
+ "loss": 1.2586,
+ "grad_norm": 0.8652048110961914,
+ "learning_rate": 9.254307194918144e-06,
+ "epoch": 1.7385159010600706,
+ "step": 2460
+ },
+ {
+ "loss": 1.2762,
+ "grad_norm": 0.8132146000862122,
+ "learning_rate": 8.769472635628905e-06,
+ "epoch": 1.745583038869258,
+ "step": 2470
+ },
+ {
+ "loss": 1.2605,
+ "grad_norm": 0.9114975333213806,
+ "learning_rate": 8.297099752420446e-06,
+ "epoch": 1.7526501766784452,
+ "step": 2480
+ },
+ {
+ "loss": 1.2713,
+ "grad_norm": 0.7870185971260071,
+ "learning_rate": 7.837253069285234e-06,
+ "epoch": 1.7597173144876326,
+ "step": 2490
+ },
+ {
+ "loss": 1.258,
+ "grad_norm": 0.8339151740074158,
+ "learning_rate": 7.389995399193595e-06,
+ "epoch": 1.76678445229682,
+ "step": 2500
+ },
+ {
+ "loss": 1.2465,
+ "grad_norm": 0.8879913091659546,
+ "learning_rate": 6.9553878355138936e-06,
1820
+ "epoch": 1.773851590106007,
1821
+ "step": 2510
1822
+ },
1823
+ {
1824
+ "loss": 1.2698,
1825
+ "grad_norm": 0.797390341758728,
1826
+ "learning_rate": 6.5334897436672535e-06,
1827
+ "epoch": 1.7809187279151943,
1828
+ "step": 2520
1829
+ },
1830
+ {
1831
+ "loss": 1.272,
1832
+ "grad_norm": 0.8963540196418762,
1833
+ "learning_rate": 6.124358753018689e-06,
1834
+ "epoch": 1.7879858657243817,
1835
+ "step": 2530
1836
+ },
1837
+ {
1838
+ "eval_loss": 1.283734917640686,
1839
+ "eval_runtime": 37.4706,
1840
+ "eval_samples_per_second": 63.57,
1841
+ "eval_steps_per_second": 15.906,
1842
+ "epoch": 1.7936395759717314,
1843
+ "step": 2538
1844
+ },
1845
+ {
1846
+ "loss": 1.2606,
1847
+ "grad_norm": 0.93072110414505,
+ "learning_rate": 5.7280507490050985e-06,
+ "epoch": 1.7950530035335688,
+ "step": 2540
+ },
+ {
+ "loss": 1.2404,
+ "grad_norm": 0.8684141635894775,
+ "learning_rate": 5.3446198655015765e-06,
+ "epoch": 1.802120141342756,
+ "step": 2550
+ },
+ {
+ "loss": 1.2564,
+ "grad_norm": 0.8672965168952942,
+ "learning_rate": 4.974118477426992e-06,
+ "epoch": 1.8091872791519434,
+ "step": 2560
+ },
+ {
+ "loss": 1.2504,
+ "grad_norm": 0.7872085571289062,
+ "learning_rate": 4.616597193589833e-06,
+ "epoch": 1.8162544169611308,
+ "step": 2570
+ },
+ {
+ "loss": 1.2653,
+ "grad_norm": 0.863117516040802,
+ "learning_rate": 4.272104849775216e-06,
+ "epoch": 1.823321554770318,
+ "step": 2580
+ },
+ {
+ "loss": 1.2169,
+ "grad_norm": 1.0649867057800293,
+ "learning_rate": 3.940688502074186e-06,
+ "epoch": 1.8303886925795054,
+ "step": 2590
+ },
+ {
+ "loss": 1.2485,
+ "grad_norm": 0.8719679117202759,
+ "learning_rate": 3.622393420456016e-06,
+ "epoch": 1.8374558303886925,
+ "step": 2600
+ },
+ {
+ "loss": 1.2318,
+ "grad_norm": 0.9570611715316772,
+ "learning_rate": 3.3172630825846095e-06,
+ "epoch": 1.8445229681978799,
+ "step": 2610
+ },
+ {
+ "loss": 1.2593,
+ "grad_norm": 0.8749817609786987,
+ "learning_rate": 3.025339167879615e-06,
+ "epoch": 1.851590106007067,
+ "step": 2620
+ },
+ {
+ "loss": 1.259,
+ "grad_norm": 0.7263243198394775,
+ "learning_rate": 2.7466615518231486e-06,
+ "epoch": 1.8586572438162543,
+ "step": 2630
+ },
+ {
+ "loss": 1.2725,
+ "grad_norm": 0.8946100473403931,
+ "learning_rate": 2.4812683005130843e-06,
+ "epoch": 1.8657243816254416,
+ "step": 2640
+ },
+ {
+ "loss": 1.253,
+ "grad_norm": 0.7784376740455627,
+ "learning_rate": 2.229195665463324e-06,
+ "epoch": 1.872791519434629,
+ "step": 2650
+ },
+ {
+ "loss": 1.284,
+ "grad_norm": 0.8362652659416199,
+ "learning_rate": 1.990478078652047e-06,
+ "epoch": 1.8798586572438163,
+ "step": 2660
+ },
+ {
+ "loss": 1.2727,
+ "grad_norm": 0.8206844329833984,
+ "learning_rate": 1.7651481478184296e-06,
+ "epoch": 1.8869257950530036,
+ "step": 2670
+ },
+ {
+ "loss": 1.2561,
+ "grad_norm": 0.7767570614814758,
+ "learning_rate": 1.553236652008605e-06,
+ "epoch": 1.893992932862191,
+ "step": 2680
+ },
+ {
+ "loss": 1.2774,
+ "grad_norm": 0.8429548740386963,
+ "learning_rate": 1.3547725373713405e-06,
+ "epoch": 1.901060070671378,
+ "step": 2690
+ },
+ {
+ "loss": 1.2471,
+ "grad_norm": 0.8795408606529236,
+ "learning_rate": 1.169782913204176e-06,
+ "epoch": 1.9081272084805654,
+ "step": 2700
+ },
+ {
+ "loss": 1.2484,
+ "grad_norm": 0.8149511814117432,
+ "learning_rate": 9.98293048250376e-07,
+ "epoch": 1.9151943462897525,
+ "step": 2710
+ },
+ {
+ "loss": 1.2341,
+ "grad_norm": 0.9525308609008789,
+ "learning_rate": 8.403263672473793e-07,
+ "epoch": 1.9222614840989398,
+ "step": 2720
+ },
+ {
+ "loss": 1.2646,
+ "grad_norm": 0.849061906337738,
+ "learning_rate": 6.959044477270138e-07,
+ "epoch": 1.9293286219081272,
+ "step": 2730
+ },
+ {
+ "loss": 1.2561,
+ "grad_norm": 0.7326868772506714,
+ "learning_rate": 5.650470170681876e-07,
+ "epoch": 1.9363957597173145,
+ "step": 2740
+ },
+ {
+ "loss": 1.2533,
+ "grad_norm": 0.8101119995117188,
+ "learning_rate": 4.477719498021782e-07,
+ "epoch": 1.9434628975265018,
+ "step": 2750
+ },
+ {
+ "loss": 1.2551,
+ "grad_norm": 0.8675141334533691,
+ "learning_rate": 3.440952651710072e-07,
+ "epoch": 1.9505300353356891,
+ "step": 2760
+ },
+ {
+ "loss": 1.274,
+ "grad_norm": 0.8655520081520081,
+ "learning_rate": 2.540311249393912e-07,
+ "epoch": 1.9575971731448765,
+ "step": 2770
+ },
+ {
+ "loss": 1.258,
+ "grad_norm": 0.8928612470626831,
+ "learning_rate": 1.7759183146021096e-07,
+ "epoch": 1.9646643109540636,
+ "step": 2780
+ },
+ {
+ "loss": 1.2497,
+ "grad_norm": 0.8795515298843384,
+ "learning_rate": 1.1478782599411153e-07,
+ "epoch": 1.971731448763251,
+ "step": 2790
+ },
+ {
+ "loss": 1.2383,
+ "grad_norm": 0.8107221126556396,
+ "learning_rate": 6.562768728327618e-08,
+ "epoch": 1.978798586572438,
+ "step": 2800
+ },
+ {
+ "loss": 1.2578,
+ "grad_norm": 0.8321946859359741,
+ "learning_rate": 3.0118130379575005e-08,
+ "epoch": 1.9858657243816253,
+ "step": 2810
+ },
+ {
+ "loss": 1.273,
+ "grad_norm": 0.8319383859634399,
+ "learning_rate": 8.2640057273764e-09,
+ "epoch": 1.9929328621908127,
+ "step": 2820
+ },
+ {
+ "eval_loss": 1.282422423362732,
+ "eval_runtime": 32.3803,
+ "eval_samples_per_second": 73.563,
+ "eval_steps_per_second": 18.406,
+ "epoch": 1.9929328621908127,
+ "step": 2820
+ },
+ {
+ "loss": 1.2187,
+ "grad_norm": 1.9112646579742432,
+ "learning_rate": 6.829850092149315e-11,
+ "epoch": 2.0,
+ "step": 2830
+ },
+ {
+ "train_runtime": 1496.9978,
+ "train_samples_per_second": 60.461,
+ "train_steps_per_second": 1.89,
+ "total_flos": 6.672248883264e+16,
+ "train_loss": 1.398907420828991,
+ "epoch": 2.0,
+ "step": 2830
+ }
+ ]
train/training_loss.png ADDED
train/validation_loss.png ADDED