Text Generation
Transformers
Safetensors
qwen3
llama-factory
full
Generated from Trainer
conversational
text-generation-inference
Instructions to use TabibitoQZP/Qwen3-4B-3Task with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TabibitoQZP/Qwen3-4B-3Task with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TabibitoQZP/Qwen3-4B-3Task")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (config.json declares the Qwen3ForCausalLM architecture)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TabibitoQZP/Qwen3-4B-3Task")
model = AutoModelForCausalLM.from_pretrained("TabibitoQZP/Qwen3-4B-3Task")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use TabibitoQZP/Qwen3-4B-3Task with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TabibitoQZP/Qwen3-4B-3Task"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TabibitoQZP/Qwen3-4B-3Task",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/TabibitoQZP/Qwen3-4B-3Task
- SGLang
How to use TabibitoQZP/Qwen3-4B-3Task with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TabibitoQZP/Qwen3-4B-3Task" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TabibitoQZP/Qwen3-4B-3Task",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TabibitoQZP/Qwen3-4B-3Task" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TabibitoQZP/Qwen3-4B-3Task",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use TabibitoQZP/Qwen3-4B-3Task with Docker Model Runner:
docker model run hf.co/TabibitoQZP/Qwen3-4B-3Task
Upload folder using huggingface_hub
- .gitattributes +1 -0
- README.md +61 -0
- added_tokens.json +28 -0
- all_results.json +8 -0
- chat_template.jinja +89 -0
- config.json +30 -0
- generation_config.json +13 -0
- merges.txt +0 -0
- model-00001-of-00002.safetensors +3 -0
- model-00002-of-00002.safetensors +3 -0
- model.safetensors.index.json +405 -0
- special_tokens_map.json +31 -0
- tokenizer.json +3 -0
- tokenizer_config.json +240 -0
- train_results.json +8 -0
- trainer_log.jsonl +160 -0
- trainer_state.json +1156 -0
- training_args.bin +3 -0
- training_loss.png +0 -0
- vocab.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,61 @@
---
library_name: transformers
license: other
base_model: Qwen/Qwen3-4B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: Qwen3-4B-3Task
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Qwen3-4B-3Task

This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) (loaded during training from the local path `/home/zipengqiu/models/Qwen3-4B/`) on the 3task_data dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2.0

### Training results



### Framework versions

- Transformers 4.52.4
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
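The `total_train_batch_size` and `total_eval_batch_size` values in the README's hyperparameter list follow from the other entries in that list. The sketch below reproduces the arithmetic, assuming the usual data-parallel convention (per-device batch size times device count times gradient-accumulation steps); the function name is illustrative and not part of any library.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples that contribute to one optimizer step under data parallelism."""
    return per_device * num_devices * grad_accum_steps


# Values copied from the README hyperparameter list.
total_train = effective_batch_size(per_device=1, num_devices=4, grad_accum_steps=8)
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval = effective_batch_size(per_device=8, num_devices=4)

print(total_train, total_eval)  # 32 32
```

Both results match the totals reported in the model card.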
added_tokens.json
ADDED
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
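The ids in `added_tokens.json` are easier to read in reverse (id to token), because `config.json` and `generation_config.json` in this commit refer to special tokens by id only. The following sketch inverts the mapping and checks that the ids form one contiguous block; the variable names are illustrative.

```python
# Entries copied from added_tokens.json in this commit.
ADDED_TOKENS = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651, "<|quad_start|>": 151650,
    "<|repo_name|>": 151663, "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

# Invert the mapping so that ids can be looked up directly.
ID_TO_TOKEN = {token_id: token for token, token_id in ADDED_TOKENS.items()}

ids = sorted(ID_TO_TOKEN)
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))

print(len(ids), ids[0], ids[-1], is_contiguous)  # 26 151643 151668 True
print(ID_TO_TOKEN[151645], ID_TO_TOKEN[151643])  # <|im_end|> <|endoftext|>
```

Read this way, the `eos_token_id` of 151645 in `config.json` is `<|im_end|>`, and the `bos_token_id` and `pad_token_id` of 151643 are `<|endoftext|>`.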
all_results.json
ADDED
@@ -0,0 +1,8 @@
{
  "epoch": 2.0,
  "total_flos": 207829311291392.0,
  "train_loss": 0.4968748902374843,
  "train_runtime": 50004.2557,
  "train_samples_per_second": 1.016,
  "train_steps_per_second": 0.032
}
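The throughput figures in `all_results.json` can be turned into rough totals for the run. This is a back-of-the-envelope sketch: the two rates are rounded to three decimals in the file, so the derived counts are approximate and are not values reported by the trainer.

```python
# Values copied from all_results.json; train_runtime is in seconds.
results = {
    "epoch": 2.0,
    "train_runtime": 50004.2557,
    "train_samples_per_second": 1.016,
    "train_steps_per_second": 0.032,
}

runtime_hours = results["train_runtime"] / 3600
approx_samples = results["train_runtime"] * results["train_samples_per_second"]
approx_steps = results["train_runtime"] * results["train_steps_per_second"]
# Samples per optimizer step; should land near the total train batch size.
samples_per_step = approx_samples / approx_steps

print(f"{runtime_hours:.1f} h, ~{approx_samples:.0f} samples, ~{approx_steps:.0f} steps")
print(f"~{samples_per_step:.2f} samples per step")
```

The ratio of the two rates comes out close to the `total_train_batch_size` of 32 listed in the README, which is a useful consistency check on the logged numbers.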
chat_template.jinja
ADDED
@@ -0,0 +1,89 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
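To make the prompt layout produced by `chat_template.jinja` concrete, here is a simplified Python re-implementation of its no-tools path. It is a sketch under stated assumptions, not a replacement for the template: it assumes string message contents, no tool calls or tool responses, and a conversation that ends with a user turn, so every assistant turn is treated as history and loses its `<think>` block. The function name `render_prompt` is hypothetical.

```python
def render_prompt(messages, add_generation_prompt=True, enable_thinking=True):
    """Render a ChatML-style prompt for the simple (no-tools) case.

    Assumes string contents, no tool calls, and a final user turn.
    """
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "assistant" and "</think>" in content:
            # History turns keep only the text after the reasoning block.
            content = content.split("</think>")[-1].lstrip("\n")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # Pre-fill an empty reasoning block, as the template does
            # when enable_thinking is defined and false.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(render_prompt(messages, enable_thinking=False))
```

In practice the tokenizer's `apply_chat_template` should be used, since it also covers tools, multi-step tool use, and reasoning content on the final assistant turn.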
config.json
ADDED
@@ -0,0 +1,30 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 9728,
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
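The dimensions in `config.json` are enough to estimate the parameter count and the on-disk size of the checkpoint. The sketch below assumes the layer layout suggested by the tensor names in `model.safetensors.index.json` (bias-free q/k/v/o projections, per-head `q_norm` and `k_norm` vectors of length `head_dim`, a three-matrix gated MLP, two norm vectors per layer) plus one final norm vector, which is an assumption because it is not visible in the index excerpt. The function name is illustrative.

```python
# Fields copied from config.json in this commit.
config = {
    "hidden_size": 2560, "intermediate_size": 9728, "num_hidden_layers": 36,
    "num_attention_heads": 32, "num_key_value_heads": 8, "head_dim": 128,
    "vocab_size": 151936, "tie_word_embeddings": True,
}


def count_parameters(cfg):
    h, d = cfg["hidden_size"], cfg["head_dim"]
    q_out = cfg["num_attention_heads"] * d       # width of the query projection
    kv_out = cfg["num_key_value_heads"] * d      # width of each key/value projection
    attention = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o (no bias)
    qk_norm = 2 * d                              # q_norm and k_norm weights
    mlp = 3 * h * cfg["intermediate_size"]       # gate, up, down projections
    layer_norms = 2 * h                          # input and post-attention norms
    per_layer = attention + qk_norm + mlp + layer_norms
    embeddings = cfg["vocab_size"] * h
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h                               # assumed final norm vector
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


params = count_parameters(config)
print(params)      # 4022468096
print(params * 2)  # 8044936192 bytes at 2 bytes per bfloat16 weight
```

The byte count equals the `total_size` of 8044936192 recorded in `model.safetensors.index.json`, which supports the assumed layout. The two shard files add up to slightly more (4967215360 + 3077766632 bytes), presumably because each safetensors file also carries a header.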
generation_config.json
ADDED
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.52.4"
}
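The sampling settings in `generation_config.json` (`temperature`, `top_k`, `top_p`) jointly restrict which tokens can be drawn at each step. The following pure-Python sketch shows the general technique on a toy logit vector, using the file's values as defaults. It is an illustration only: the Transformers implementation works on tensors and differs in details such as tie handling, and the function name is hypothetical.

```python
import math


def filtered_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token indices, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in zip(order, probs):
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {index: p / norm for index, p in kept}


dist = filtered_distribution([2.0, 1.0, 0.0, -5.0])
print(dist)
```

For this toy input only the two highest-scoring tokens survive, because together they already cover more than 95% of the probability mass after temperature scaling.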
merges.txt
ADDED
The diff for this file is too large to render.
model-00001-of-00002.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36df1e304a7079c6a48260d2ace4cdf660125fe882176757289c604aebee7789
size 4967215360
model-00002-of-00002.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3302a8f2305b30780cc187db70ed82d748d68bb74d7122f2e833aa9b80407cd4
size 3077766632
model.safetensors.index.json
ADDED
|
@@ -0,0 +1,405 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"metadata": {
|
| 3 |
+
"total_size": 8044936192
|
| 4 |
+
},
|
| 5 |
+
"weight_map": {
|
| 6 |
+
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
|
| 7 |
+
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 8 |
+
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 9 |
+
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 10 |
+
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 11 |
+
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 12 |
+
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 13 |
+
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 14 |
+
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 15 |
+
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 16 |
+
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 17 |
+
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 18 |
+
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 19 |
+
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 20 |
+
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 21 |
+
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 22 |
+
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 23 |
+
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 24 |
+
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 25 |
+
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 26 |
+
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 27 |
+
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 28 |
+
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 29 |
+
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 30 |
+
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 31 |
+
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 32 |
+
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 33 |
+
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 34 |
+
"model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 35 |
+
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 36 |
+
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 37 |
+
"model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 38 |
+
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 39 |
+
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 40 |
+
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 41 |
+
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 42 |
+
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 43 |
+
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 44 |
+
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 45 |
+
"model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 46 |
+
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 47 |
+
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 48 |
+
"model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 49 |
+
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 50 |
+
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 51 |
+
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 52 |
+
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 53 |
+
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 54 |
+
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 55 |
+
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 56 |
+
"model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 57 |
+
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 58 |
+
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 59 |
+
"model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 60 |
+
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 61 |
+
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 62 |
+
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 63 |
+
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 64 |
+
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 65 |
+
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 66 |
+
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 67 |
+
"model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 68 |
+
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 69 |
+
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 70 |
+
"model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 71 |
+
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 72 |
+
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 73 |
+
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 74 |
+
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 75 |
+
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 76 |
+
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 77 |
+
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 78 |
+
"model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 79 |
+
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 80 |
+
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 81 |
+
"model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 82 |
+
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 83 |
+
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 84 |
+
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 85 |
+
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 86 |
+
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 87 |
+
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 88 |
+
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 89 |
+
"model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 90 |
+
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 91 |
+
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 92 |
+
"model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 93 |
+
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 94 |
+
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 95 |
+
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 96 |
+
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 97 |
+
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 98 |
+
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 99 |
+
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 100 |
+
"model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 101 |
+
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 102 |
+
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 103 |
+
"model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 104 |
+
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 105 |
+
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 106 |
+
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 107 |
+
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 108 |
+
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 109 |
+
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 110 |
+
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 111 |
+
"model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 112 |
+
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 113 |
+
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 114 |
+
"model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 115 |
+
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 116 |
+
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 117 |
+
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 118 |
+
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 119 |
+
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 120 |
+
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 121 |
+
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 122 |
+
"model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 123 |
+
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 124 |
+
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 125 |
+
"model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 126 |
+
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 127 |
+
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 128 |
+
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 129 |
+
"model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 130 |
+
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 131 |
+
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 132 |
+
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 133 |
+
"model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 134 |
+
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 135 |
+
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 136 |
+
"model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 137 |
+
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 138 |
+
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 139 |
+
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 140 |
+
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 141 |
+
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 142 |
+
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 143 |
+
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 144 |
+
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 145 |
+
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 146 |
+
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 147 |
+
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 148 |
+
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 149 |
+
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 150 |
+
"model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 151 |
+
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 152 |
+
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 153 |
+
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 154 |
+
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 155 |
+
"model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 156 |
+
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 157 |
+
"model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 158 |
+
"model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 159 |
+
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 160 |
+
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 161 |
+
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 162 |
+
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 163 |
+
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 164 |
+
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 165 |
+
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 166 |
+
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 167 |
+
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 168 |
+
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 169 |
+
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 170 |
+
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 171 |
+
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 172 |
+
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 173 |
+
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 174 |
+
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 175 |
+
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 176 |
+
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 177 |
+
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 178 |
+
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 179 |
+
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 180 |
+
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 181 |
+
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 182 |
+
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 183 |
+
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 184 |
+
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 185 |
+
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 186 |
+
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 187 |
+
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 188 |
+
"model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 189 |
+
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 190 |
+
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 191 |
+
"model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 192 |
+
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 193 |
+
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 194 |
+
"model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 195 |
+
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 196 |
+
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 197 |
+
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 198 |
+
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 199 |
+
"model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 200 |
+
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 201 |
+
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 202 |
+
"model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 203 |
+
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 204 |
+
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 205 |
+
"model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 206 |
+
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 207 |
+
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 208 |
+
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 209 |
+
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 210 |
+
"model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 211 |
+
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 212 |
+
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 213 |
+
"model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 214 |
+
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 215 |
+
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 216 |
+
"model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 217 |
+
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 218 |
+
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 219 |
+
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 220 |
+
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 221 |
+
"model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 222 |
+
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 223 |
+
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 224 |
+
"model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 225 |
+
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 226 |
+
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 227 |
+
"model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 228 |
+
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 229 |
+
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 230 |
+
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 231 |
+
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 232 |
+
"model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 233 |
+
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 234 |
+
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 235 |
+
"model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 236 |
+
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 237 |
+
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 238 |
+
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 239 |
+
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 240 |
+
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 241 |
+
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 242 |
+
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 243 |
+
"model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 244 |
+
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 245 |
+
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 246 |
+
"model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 247 |
+
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 248 |
+
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 249 |
+
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 250 |
+
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 251 |
+
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 252 |
+
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 253 |
+
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 254 |
+
"model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 255 |
+
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 256 |
+
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 257 |
+
"model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 258 |
+
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 259 |
+
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 260 |
+
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 261 |
+
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 262 |
+
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 263 |
+
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 264 |
+
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 265 |
+
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 266 |
+
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 267 |
+
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 268 |
+
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 269 |
+
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 270 |
+
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 271 |
+
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 272 |
+
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 273 |
+
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 274 |
+
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 275 |
+
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 276 |
+
"model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 277 |
+
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 278 |
+
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 279 |
+
"model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 280 |
+
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 281 |
+
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 282 |
+
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 283 |
+
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 284 |
+
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 285 |
+
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 286 |
+
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 287 |
+
"model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 288 |
+
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 289 |
+
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 290 |
+
"model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 291 |
+
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 292 |
+
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 293 |
+
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 294 |
+
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 295 |
+
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 296 |
+
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 297 |
+
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 298 |
+
"model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 299 |
+
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 300 |
+
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 301 |
+
"model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 302 |
+
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 303 |
+
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 304 |
+
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 305 |
+
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 306 |
+
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 307 |
+
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 308 |
+
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 309 |
+
"model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 310 |
+
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 311 |
+
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 312 |
+
"model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 313 |
+
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 314 |
+
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 315 |
+
"model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 316 |
+
"model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 317 |
+
"model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 318 |
+
"model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 319 |
+
"model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 320 |
+
"model.layers.34.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 321 |
+
"model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 322 |
+
"model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 323 |
+
"model.layers.34.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 324 |
+
"model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 325 |
+
"model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 326 |
+
"model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 327 |
+
"model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
| 328 |
+
"model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
| 329 |
+
"model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
| 330 |
+
"model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 331 |
+
"model.layers.35.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
| 332 |
+
"model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
| 333 |
+
"model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
| 334 |
+
"model.layers.35.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
| 335 |
+
"model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
| 336 |
+
"model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
| 337 |
+
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 338 |
+
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 339 |
+
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 340 |
+
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 341 |
+
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 342 |
+
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 343 |
+
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 344 |
+
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 345 |
+
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 346 |
+
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 347 |
+
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 348 |
+
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 349 |
+
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 350 |
+
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 351 |
+
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 352 |
+
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 353 |
+
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 354 |
+
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 355 |
+
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 356 |
+
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 357 |
+
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 358 |
+
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 359 |
+
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 360 |
+
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 361 |
+
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 362 |
+
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 363 |
+
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 364 |
+
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 365 |
+
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 366 |
+
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 367 |
+
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 368 |
+
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 369 |
+
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 370 |
+
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 371 |
+
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 372 |
+
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 373 |
+
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 374 |
+
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 375 |
+
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 376 |
+
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 377 |
+
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 378 |
+
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 379 |
+
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 380 |
+
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 381 |
+
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 382 |
+
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 383 |
+
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 384 |
+
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 385 |
+
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 386 |
+
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 387 |
+
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 388 |
+
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 389 |
+
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 390 |
+
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 391 |
+
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 392 |
+
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 393 |
+
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
| 394 |
+
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
| 395 |
+
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
| 396 |
+
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 397 |
+
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
| 398 |
+
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
| 399 |
+
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
| 400 |
+
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
| 401 |
+
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 402 |
+
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 403 |
+
"model.norm.weight": "model-00002-of-00002.safetensors"
|
| 404 |
+
}
|
| 405 |
+
}
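The weight map above assigns every tensor to one of two shard files, and layer 20 is where the split happens: some of its tensors sit in the first shard and the rest in the second. The following is a minimal sketch of how that boundary could be found programmatically. It assumes the file is the usual safetensors index with a top-level `weight_map` key (the file header is not visible in this part of the diff), and the names `SAMPLE_INDEX`, `shards_per_layer` and `split_layers` are hypothetical helpers introduced only for illustration.

```python
import json
import re
from collections import defaultdict

# A handful of entries copied from the weight map above; the real index has many more.
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
""")

# Matches decoder-layer tensors and captures the layer index.
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Map each decoder layer index to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skip non-layer tensors such as model.norm.weight
            layers[int(match.group(1))].add(shard_file)
    return dict(layers)


def split_layers(weight_map):
    """Return the sorted layer indices whose tensors span more than one shard."""
    per_layer = shards_per_layer(weight_map)
    return sorted(i for i, shards in per_layer.items() if len(shards) > 1)


print(split_layers(SAMPLE_INDEX["weight_map"]))  # [20]
```

A layer that spans two shards is not an error. It only means that loading that layer requires both files to be present.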
|
special_tokens_map.json
ADDED
|
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
|
tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654
|
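The three lines added for tokenizer.json are a Git LFS pointer, not the tokenizer itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As a rough sketch of how such a pointer can be read, the hypothetical helper `parse_lfs_pointer` below splits each line into a key and a value. It assumes the simple `key value` layout seen above and does no validation beyond that.

```python
# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])  # sha256 11422654
```

The size field says the actual tokenizer.json is about 11.4 MB, which is why the repository stores it through LFS and not inline.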
tokenizer_config.json
ADDED
|
@@ -0,0 +1,240 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
|
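In the tokenizer config above, `added_tokens_decoder` maps token IDs to their text and marks each entry as special or not: the chat and vision markers are special, while `<tool_call>`, `<think>` and the FIM markers are registered as ordinary added tokens. The sketch below shows one way to look up the EOS and pad token IDs and to split the added tokens by that flag. It works on a trimmed copy of the config (only the `content` and `special` fields of four entries are kept), and the helper names are hypothetical.

```python
import json

# Trimmed copy of the config above: four entries, only the fields used here.
SAMPLE_CONFIG = json.loads("""
{
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "special": true},
    "151645": {"content": "<|im_end|>", "special": true},
    "151657": {"content": "<tool_call>", "special": false},
    "151667": {"content": "<think>", "special": false}
  },
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}
""")


def token_ids_by_content(config):
    """Invert added_tokens_decoder: token text -> integer token ID."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
    }


def partition_by_special(config):
    """Return (special, regular) lists of token texts, ordered by token ID."""
    special, regular = [], []
    ordered = sorted(config["added_tokens_decoder"].items(), key=lambda kv: int(kv[0]))
    for _, entry in ordered:
        (special if entry["special"] else regular).append(entry["content"])
    return special, regular


ids = token_ids_by_content(SAMPLE_CONFIG)
print(ids[SAMPLE_CONFIG["eos_token"]], ids[SAMPLE_CONFIG["pad_token"]])  # 151645 151643
print(partition_by_special(SAMPLE_CONFIG))
```

In practice the distinction usually matters at decode time: tokens flagged as special are the ones typically dropped when special tokens are skipped, so `<think>` and `<tool_call>` would be expected to remain visible in decoded output.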
train_results.json
ADDED
|
@@ -0,0 +1,8 @@
+{
+  "epoch": 2.0,
+  "total_flos": 207829311291392.0,
+  "train_loss": 0.4968748902374843,
+  "train_runtime": 50004.2557,
+  "train_samples_per_second": 1.016,
+  "train_steps_per_second": 0.032
+}
|
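The figures in train_results.json can be cross-checked against each other with a little arithmetic: runtime multiplied by throughput gives the number of samples and steps processed, and the ratio of the two throughputs gives the samples per optimizer step. The sketch below does that with the reported values. Both throughputs are rounded to three decimals in the file, so the derived numbers are estimates, and the helper name `summarize` is hypothetical.

```python
# Values copied from train_results.json above.
results = {
    "epoch": 2.0,
    "train_runtime": 50004.2557,        # seconds
    "train_samples_per_second": 1.016,  # rounded in the source file
    "train_steps_per_second": 0.032,    # rounded in the source file
}


def summarize(results):
    """Derive wall-clock hours, sample counts and samples per step from the results."""
    runtime = results["train_runtime"]
    samples = runtime * results["train_samples_per_second"]
    steps = runtime * results["train_steps_per_second"]
    return {
        "hours": runtime / 3600,
        "samples_seen": samples,
        "samples_per_epoch": samples / results["epoch"],
        "estimated_steps": steps,
        "samples_per_step": samples / steps,
    }


summary = summarize(results)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the run lasted a little under 14 hours and processed roughly 50,800 samples over two epochs. The estimated step count comes out near 1600, slightly above the 1590 total steps reported in the trainer log, which is consistent with `train_steps_per_second` having been rounded up to 0.032. The ratio of about 31.75 samples per step suggests an effective batch size close to 32, though the file itself does not state it.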
trainer_log.jsonl
ADDED
|
@@ -0,0 +1,160 @@
| 1 |
+
{"current_steps": 10, "total_steps": 1590, "loss": 1.2378, "lr": 5.660377358490567e-07, "epoch": 0.012592475995592633, "percentage": 0.63, "elapsed_time": "0:05:18", "remaining_time": "13:59:01"}
|
| 2 |
+
{"current_steps": 20, "total_steps": 1590, "loss": 0.9875, "lr": 1.1949685534591195e-06, "epoch": 0.025184951991185266, "percentage": 1.26, "elapsed_time": "0:10:33", "remaining_time": "13:48:12"}
|
| 3 |
+
{"current_steps": 30, "total_steps": 1590, "loss": 0.6558, "lr": 1.8238993710691824e-06, "epoch": 0.0377774279867779, "percentage": 1.89, "elapsed_time": "0:15:38", "remaining_time": "13:33:18"}
|
| 4 |
+
{"current_steps": 40, "total_steps": 1590, "loss": 0.5882, "lr": 2.4528301886792453e-06, "epoch": 0.05036990398237053, "percentage": 2.52, "elapsed_time": "0:20:39", "remaining_time": "13:20:44"}
|
| 5 |
+
{"current_steps": 50, "total_steps": 1590, "loss": 0.5781, "lr": 3.0817610062893084e-06, "epoch": 0.06296237997796317, "percentage": 3.14, "elapsed_time": "0:25:55", "remaining_time": "13:18:32"}
|
| 6 |
+
{"current_steps": 60, "total_steps": 1590, "loss": 0.5495, "lr": 3.710691823899371e-06, "epoch": 0.0755548559735558, "percentage": 3.77, "elapsed_time": "0:30:53", "remaining_time": "13:07:47"}
|
| 7 |
+
{"current_steps": 70, "total_steps": 1590, "loss": 0.5535, "lr": 4.339622641509435e-06, "epoch": 0.08814733196914844, "percentage": 4.4, "elapsed_time": "0:36:05", "remaining_time": "13:03:49"}
|
| 8 |
+
{"current_steps": 80, "total_steps": 1590, "loss": 0.5396, "lr": 4.968553459119497e-06, "epoch": 0.10073980796474107, "percentage": 5.03, "elapsed_time": "0:41:10", "remaining_time": "12:57:03"}
|
| 9 |
+
{"current_steps": 90, "total_steps": 1590, "loss": 0.5413, "lr": 5.59748427672956e-06, "epoch": 0.11333228396033371, "percentage": 5.66, "elapsed_time": "0:46:16", "remaining_time": "12:51:15"}
|
| 10 |
+
{"current_steps": 100, "total_steps": 1590, "loss": 0.5376, "lr": 6.226415094339623e-06, "epoch": 0.12592475995592634, "percentage": 6.29, "elapsed_time": "0:51:22", "remaining_time": "12:45:25"}
|
| 11 |
+
{"current_steps": 110, "total_steps": 1590, "loss": 0.5351, "lr": 6.855345911949685e-06, "epoch": 0.13851723595151896, "percentage": 6.92, "elapsed_time": "0:56:29", "remaining_time": "12:40:01"}
|
| 12 |
+
{"current_steps": 120, "total_steps": 1590, "loss": 0.5409, "lr": 7.484276729559748e-06, "epoch": 0.1511097119471116, "percentage": 7.55, "elapsed_time": "1:01:48", "remaining_time": "12:37:13"}
|
| 13 |
+
{"current_steps": 130, "total_steps": 1590, "loss": 0.5442, "lr": 8.113207547169812e-06, "epoch": 0.16370218794270425, "percentage": 8.18, "elapsed_time": "1:07:13", "remaining_time": "12:34:59"}
|
| 14 |
+
{"current_steps": 140, "total_steps": 1590, "loss": 0.5365, "lr": 8.742138364779875e-06, "epoch": 0.17629466393829687, "percentage": 8.81, "elapsed_time": "1:12:17", "remaining_time": "12:28:42"}
|
| 15 |
+
{"current_steps": 150, "total_steps": 1590, "loss": 0.5382, "lr": 9.371069182389939e-06, "epoch": 0.1888871399338895, "percentage": 9.43, "elapsed_time": "1:17:40", "remaining_time": "12:25:41"}
|
| 16 |
+
{"current_steps": 160, "total_steps": 1590, "loss": 0.5414, "lr": 1e-05, "epoch": 0.20147961592948213, "percentage": 10.06, "elapsed_time": "1:23:02", "remaining_time": "12:22:10"}
|
| 17 |
+
{"current_steps": 170, "total_steps": 1590, "loss": 0.5331, "lr": 9.998795122086687e-06, "epoch": 0.21407209192507476, "percentage": 10.69, "elapsed_time": "1:28:16", "remaining_time": "12:17:20"}
|
| 18 |
+
{"current_steps": 180, "total_steps": 1590, "loss": 0.5345, "lr": 9.995181069039055e-06, "epoch": 0.22666456792066741, "percentage": 11.32, "elapsed_time": "1:33:20", "remaining_time": "12:11:07"}
|
| 19 |
+
{"current_steps": 190, "total_steps": 1590, "loss": 0.5302, "lr": 9.989159582654187e-06, "epoch": 0.23925704391626004, "percentage": 11.95, "elapsed_time": "1:38:32", "remaining_time": "12:06:08"}
|
| 20 |
+
{"current_steps": 200, "total_steps": 1590, "loss": 0.5337, "lr": 9.98073356499446e-06, "epoch": 0.25184951991185267, "percentage": 12.58, "elapsed_time": "1:43:53", "remaining_time": "12:02:03"}
|
| 21 |
+
{"current_steps": 210, "total_steps": 1590, "loss": 0.5229, "lr": 9.969907076988907e-06, "epoch": 0.2644419959074453, "percentage": 13.21, "elapsed_time": "1:49:03", "remaining_time": "11:56:42"}
|
| 22 |
+
{"current_steps": 220, "total_steps": 1590, "loss": 0.54, "lr": 9.956685336476037e-06, "epoch": 0.2770344719030379, "percentage": 13.84, "elapsed_time": "1:54:36", "remaining_time": "11:53:38"}
|
| 23 |
+
{"current_steps": 230, "total_steps": 1590, "loss": 0.5252, "lr": 9.941074715689097e-06, "epoch": 0.2896269478986306, "percentage": 14.47, "elapsed_time": "1:59:47", "remaining_time": "11:48:18"}
|
| 24 |
+
{"current_steps": 240, "total_steps": 1590, "loss": 0.5324, "lr": 9.923082738184969e-06, "epoch": 0.3022194238942232, "percentage": 15.09, "elapsed_time": "2:05:03", "remaining_time": "11:43:24"}
|
| 25 |
+
{"current_steps": 250, "total_steps": 1590, "loss": 0.5213, "lr": 9.902718075218176e-06, "epoch": 0.31481189988981584, "percentage": 15.72, "elapsed_time": "2:10:18", "remaining_time": "11:38:29"}
|
| 26 |
+
{"current_steps": 260, "total_steps": 1590, "loss": 0.5221, "lr": 9.879990541561766e-06, "epoch": 0.3274043758854085, "percentage": 16.35, "elapsed_time": "2:15:33", "remaining_time": "11:33:25"}
|
| 27 |
+
{"current_steps": 270, "total_steps": 1590, "loss": 0.5116, "lr": 9.854911090777071e-06, "epoch": 0.3399968518810011, "percentage": 16.98, "elapsed_time": "2:20:39", "remaining_time": "11:27:42"}
|
| 28 |
+
{"current_steps": 280, "total_steps": 1590, "loss": 0.5198, "lr": 9.827491809934621e-06, "epoch": 0.35258932787659375, "percentage": 17.61, "elapsed_time": "2:25:50", "remaining_time": "11:22:22"}
|
| 29 |
+
{"current_steps": 290, "total_steps": 1590, "loss": 0.5243, "lr": 9.797745913788772e-06, "epoch": 0.36518180387218635, "percentage": 18.24, "elapsed_time": "2:31:05", "remaining_time": "11:17:17"}
|
| 30 |
+
{"current_steps": 300, "total_steps": 1590, "loss": 0.5154, "lr": 9.765687738408834e-06, "epoch": 0.377774279867779, "percentage": 18.87, "elapsed_time": "2:36:14", "remaining_time": "11:11:51"}
|
| 31 |
+
{"current_steps": 310, "total_steps": 1590, "loss": 0.5368, "lr": 9.731332734269791e-06, "epoch": 0.39036675586337166, "percentage": 19.5, "elapsed_time": "2:41:31", "remaining_time": "11:06:55"}
|
| 32 |
+
{"current_steps": 320, "total_steps": 1590, "loss": 0.5257, "lr": 9.69469745880592e-06, "epoch": 0.40295923185896426, "percentage": 20.13, "elapsed_time": "2:46:58", "remaining_time": "11:02:41"}
|
| 33 |
+
{"current_steps": 330, "total_steps": 1590, "loss": 0.5179, "lr": 9.655799568430926e-06, "epoch": 0.4155517078545569, "percentage": 20.75, "elapsed_time": "2:52:14", "remaining_time": "10:57:38"}
|
| 34 |
+
{"current_steps": 340, "total_steps": 1590, "loss": 0.5205, "lr": 9.614657810028402e-06, "epoch": 0.4281441838501495, "percentage": 21.38, "elapsed_time": "2:57:37", "remaining_time": "10:53:00"}
|
| 35 |
+
{"current_steps": 350, "total_steps": 1590, "loss": 0.5185, "lr": 9.571292011916753e-06, "epoch": 0.4407366598457422, "percentage": 22.01, "elapsed_time": "3:02:56", "remaining_time": "10:48:09"}
|
| 36 |
+
{"current_steps": 360, "total_steps": 1590, "loss": 0.5193, "lr": 9.525723074292916e-06, "epoch": 0.45332913584133483, "percentage": 22.64, "elapsed_time": "3:08:06", "remaining_time": "10:42:43"}
|
| 37 |
+
{"current_steps": 370, "total_steps": 1590, "loss": 0.5163, "lr": 9.47797295915948e-06, "epoch": 0.46592161183692743, "percentage": 23.27, "elapsed_time": "3:13:15", "remaining_time": "10:37:13"}
|
| 38 |
+
{"current_steps": 380, "total_steps": 1590, "loss": 0.5108, "lr": 9.428064679740081e-06, "epoch": 0.4785140878325201, "percentage": 23.9, "elapsed_time": "3:18:38", "remaining_time": "10:32:32"}
|
| 39 |
+
{"current_steps": 390, "total_steps": 1590, "loss": 0.5241, "lr": 9.37602228938814e-06, "epoch": 0.4911065638281127, "percentage": 24.53, "elapsed_time": "3:23:59", "remaining_time": "10:27:40"}
|
| 40 |
+
{"current_steps": 400, "total_steps": 1590, "loss": 0.5132, "lr": 9.321870869994336e-06, "epoch": 0.5036990398237053, "percentage": 25.16, "elapsed_time": "3:29:13", "remaining_time": "10:22:27"}
|
| 41 |
+
{"current_steps": 410, "total_steps": 1590, "loss": 0.5015, "lr": 9.26563651989835e-06, "epoch": 0.516291515819298, "percentage": 25.79, "elapsed_time": "3:34:12", "remaining_time": "10:16:29"}
|
| 42 |
+
{"current_steps": 420, "total_steps": 1590, "loss": 0.5329, "lr": 9.207346341310744e-06, "epoch": 0.5288839918148907, "percentage": 26.42, "elapsed_time": "3:39:43", "remaining_time": "10:12:06"}
|
| 43 |
+
{"current_steps": 430, "total_steps": 1590, "loss": 0.5237, "lr": 9.14702842725101e-06, "epoch": 0.5414764678104832, "percentage": 27.04, "elapsed_time": "3:45:08", "remaining_time": "10:07:22"}
|
| 44 |
+
{"current_steps": 440, "total_steps": 1590, "loss": 0.5175, "lr": 9.084711848008122e-06, "epoch": 0.5540689438060759, "percentage": 27.67, "elapsed_time": "3:50:37", "remaining_time": "10:02:45"}
|
| 45 |
+
{"current_steps": 450, "total_steps": 1590, "loss": 0.5048, "lr": 9.020426637130069e-06, "epoch": 0.5666614198016685, "percentage": 28.3, "elapsed_time": "3:55:41", "remaining_time": "9:57:05"}
|
| 46 |
+
{"current_steps": 460, "total_steps": 1590, "loss": 0.524, "lr": 8.954203776949141e-06, "epoch": 0.5792538957972612, "percentage": 28.93, "elapsed_time": "4:01:06", "remaining_time": "9:52:16"}
|
| 47 |
+
{"current_steps": 470, "total_steps": 1590, "loss": 0.5177, "lr": 8.886075183649976e-06, "epoch": 0.5918463717928538, "percentage": 29.56, "elapsed_time": "4:06:21", "remaining_time": "9:47:05"}
|
| 48 |
+
{"current_steps": 480, "total_steps": 1590, "loss": 0.5089, "lr": 8.816073691887506e-06, "epoch": 0.6044388477884464, "percentage": 30.19, "elapsed_time": "4:11:30", "remaining_time": "9:41:36"}
|
| 49 |
+
{"current_steps": 490, "total_steps": 1590, "loss": 0.5006, "lr": 8.744233038962262e-06, "epoch": 0.617031323784039, "percentage": 30.82, "elapsed_time": "4:16:18", "remaining_time": "9:35:22"}
|
| 50 |
+
{"current_steps": 500, "total_steps": 1590, "loss": 0.5069, "lr": 8.670587848560636e-06, "epoch": 0.6296237997796317, "percentage": 31.45, "elapsed_time": "4:21:26", "remaining_time": "9:29:56"}
|
| 51 |
+
{"current_steps": 510, "total_steps": 1590, "loss": 0.5083, "lr": 8.595173614067966e-06, "epoch": 0.6422162757752243, "percentage": 32.08, "elapsed_time": "4:26:26", "remaining_time": "9:24:13"}
|
| 52 |
+
{"current_steps": 520, "total_steps": 1590, "loss": 0.4996, "lr": 8.518026681462448e-06, "epoch": 0.654808751770817, "percentage": 32.7, "elapsed_time": "4:31:42", "remaining_time": "9:19:05"}
|
| 53 |
+
{"current_steps": 530, "total_steps": 1590, "loss": 0.5071, "lr": 8.43918423179815e-06, "epoch": 0.6674012277664095, "percentage": 33.33, "elapsed_time": "4:36:55", "remaining_time": "9:13:50"}
|
| 54 |
+
{"current_steps": 540, "total_steps": 1590, "loss": 0.5038, "lr": 8.358684263285566e-06, "epoch": 0.6799937037620022, "percentage": 33.96, "elapsed_time": "4:42:13", "remaining_time": "9:08:45"}
|
| 55 |
+
{"current_steps": 550, "total_steps": 1590, "loss": 0.5136, "lr": 8.27656557297833e-06, "epoch": 0.6925861797575948, "percentage": 34.59, "elapsed_time": "4:47:38", "remaining_time": "9:03:53"}
|
| 56 |
+
{"current_steps": 560, "total_steps": 1590, "loss": 0.5087, "lr": 8.192867738074927e-06, "epoch": 0.7051786557531875, "percentage": 35.22, "elapsed_time": "4:52:47", "remaining_time": "8:58:32"}
|
| 57 |
+
{"current_steps": 570, "total_steps": 1590, "loss": 0.5061, "lr": 8.107631096844431e-06, "epoch": 0.7177711317487802, "percentage": 35.85, "elapsed_time": "4:57:47", "remaining_time": "8:52:52"}
|
| 58 |
+
{"current_steps": 580, "total_steps": 1590, "loss": 0.5022, "lr": 8.020896729185406e-06, "epoch": 0.7303636077443727, "percentage": 36.48, "elapsed_time": "5:02:59", "remaining_time": "8:47:37"}
|
| 59 |
+
{"current_steps": 590, "total_steps": 1590, "loss": 0.504, "lr": 7.93270643682742e-06, "epoch": 0.7429560837399654, "percentage": 37.11, "elapsed_time": "5:08:15", "remaining_time": "8:42:28"}
|
| 60 |
+
{"current_steps": 600, "total_steps": 1590, "loss": 0.5112, "lr": 7.843102723184647e-06, "epoch": 0.755548559735558, "percentage": 37.74, "elapsed_time": "5:13:30", "remaining_time": "8:37:17"}
|
| 61 |
+
{"current_steps": 610, "total_steps": 1590, "loss": 0.5112, "lr": 7.752128772871292e-06, "epoch": 0.7681410357311507, "percentage": 38.36, "elapsed_time": "5:18:48", "remaining_time": "8:32:10"}
|
| 62 |
+
{"current_steps": 620, "total_steps": 1590, "loss": 0.5058, "lr": 7.659828430888726e-06, "epoch": 0.7807335117267433, "percentage": 38.99, "elapsed_time": "5:24:00", "remaining_time": "8:26:55"}
|
| 63 |
+
{"current_steps": 630, "total_steps": 1590, "loss": 0.5028, "lr": 7.566246181494325e-06, "epoch": 0.7933259877223359, "percentage": 39.62, "elapsed_time": "5:29:01", "remaining_time": "8:21:22"}
|
| 64 |
+
{"current_steps": 640, "total_steps": 1590, "loss": 0.5064, "lr": 7.4714271267622395e-06, "epoch": 0.8059184637179285, "percentage": 40.25, "elapsed_time": "5:34:15", "remaining_time": "8:16:09"}
|
| 65 |
+
{"current_steps": 650, "total_steps": 1590, "loss": 0.5032, "lr": 7.3754169648463924e-06, "epoch": 0.8185109397135212, "percentage": 40.88, "elapsed_time": "5:39:23", "remaining_time": "8:10:48"}
|
| 66 |
+
{"current_steps": 660, "total_steps": 1590, "loss": 0.5064, "lr": 7.278261967956203e-06, "epoch": 0.8311034157091138, "percentage": 41.51, "elapsed_time": "5:44:37", "remaining_time": "8:05:36"}
|
| 67 |
+
{"current_steps": 670, "total_steps": 1590, "loss": 0.5108, "lr": 7.18000896005564e-06, "epoch": 0.8436958917047065, "percentage": 42.14, "elapsed_time": "5:50:01", "remaining_time": "8:00:37"}
|
| 68 |
+
{"current_steps": 680, "total_steps": 1590, "loss": 0.498, "lr": 7.080705294296355e-06, "epoch": 0.856288367700299, "percentage": 42.77, "elapsed_time": "5:55:04", "remaining_time": "7:55:10"}
|
| 69 |
+
{"current_steps": 690, "total_steps": 1590, "loss": 0.4974, "lr": 6.980398830195785e-06, "epoch": 0.8688808436958917, "percentage": 43.4, "elapsed_time": "6:00:18", "remaining_time": "7:49:58"}
|
| 70 |
+
{"current_steps": 700, "total_steps": 1590, "loss": 0.4979, "lr": 6.879137910571191e-06, "epoch": 0.8814733196914843, "percentage": 44.03, "elapsed_time": "6:05:30", "remaining_time": "7:44:42"}
|
| 71 |
+
{"current_steps": 710, "total_steps": 1590, "loss": 0.5063, "lr": 6.77697133824079e-06, "epoch": 0.894065795687077, "percentage": 44.65, "elapsed_time": "6:10:50", "remaining_time": "7:39:38"}
|
| 72 |
+
{"current_steps": 720, "total_steps": 1590, "loss": 0.5148, "lr": 6.673948352503172e-06, "epoch": 0.9066582716826697, "percentage": 45.28, "elapsed_time": "6:16:09", "remaining_time": "7:34:31"}
|
| 73 |
+
{"current_steps": 730, "total_steps": 1590, "loss": 0.5078, "lr": 6.5701186054063704e-06, "epoch": 0.9192507476782622, "percentage": 45.91, "elapsed_time": "6:21:19", "remaining_time": "7:29:13"}
|
| 74 |
+
{"current_steps": 740, "total_steps": 1590, "loss": 0.505, "lr": 6.4655321378179935e-06, "epoch": 0.9318432236738549, "percentage": 46.54, "elapsed_time": "6:26:43", "remaining_time": "7:24:12"}
|
| 75 |
+
{"current_steps": 750, "total_steps": 1590, "loss": 0.5016, "lr": 6.360239355307972e-06, "epoch": 0.9444356996694475, "percentage": 47.17, "elapsed_time": "6:31:44", "remaining_time": "7:18:45"}
|
| 76 |
+
{"current_steps": 760, "total_steps": 1590, "loss": 0.5086, "lr": 6.254291003855537e-06, "epoch": 0.9570281756650402, "percentage": 47.8, "elapsed_time": "6:37:10", "remaining_time": "7:13:45"}
|
| 77 |
+
{"current_steps": 770, "total_steps": 1590, "loss": 0.4998, "lr": 6.147738145392137e-06, "epoch": 0.9696206516606328, "percentage": 48.43, "elapsed_time": "6:42:28", "remaining_time": "7:08:36"}
|
| 78 |
+
{"current_steps": 780, "total_steps": 1590, "loss": 0.4933, "lr": 6.040632133192074e-06, "epoch": 0.9822131276562254, "percentage": 49.06, "elapsed_time": "6:47:42", "remaining_time": "7:03:23"}
|
| 79 |
+
{"current_steps": 790, "total_steps": 1590, "loss": 0.5084, "lr": 5.933024587122745e-06, "epoch": 0.994805603651818, "percentage": 49.69, "elapsed_time": "6:53:04", "remaining_time": "6:58:18"}
|
| 80 |
+
{"current_steps": 800, "total_steps": 1590, "loss": 0.4699, "lr": 5.824967368766375e-06, "epoch": 1.0062962379977962, "percentage": 50.31, "elapsed_time": "6:57:31", "remaining_time": "6:52:18"}
|
| 81 |
+
{"current_steps": 810, "total_steps": 1590, "loss": 0.4574, "lr": 5.716512556425271e-06, "epoch": 1.0188887139933889, "percentage": 50.94, "elapsed_time": "7:02:42", "remaining_time": "6:47:02"}
|
| 82 |
+
{"current_steps": 820, "total_steps": 1590, "loss": 0.4637, "lr": 5.607712420022627e-06, "epoch": 1.0314811899889815, "percentage": 51.57, "elapsed_time": "7:07:53", "remaining_time": "6:41:48"}
|
| 83 |
+
{"current_steps": 830, "total_steps": 1590, "loss": 0.464, "lr": 5.4986193959109716e-06, "epoch": 1.0440736659845742, "percentage": 52.2, "elapsed_time": "7:13:11", "remaining_time": "6:36:39"}
|
| 84 |
+
{"current_steps": 840, "total_steps": 1590, "loss": 0.4587, "lr": 5.389286061600402e-06, "epoch": 1.0566661419801668, "percentage": 52.83, "elapsed_time": "7:18:18", "remaining_time": "6:31:20"}
|
| 85 |
+
{"current_steps": 850, "total_steps": 1590, "loss": 0.4615, "lr": 5.2797651104187965e-06, "epoch": 1.0692586179757595, "percentage": 53.46, "elapsed_time": "7:23:34", "remaining_time": "6:26:10"}
|
| 86 |
+
{"current_steps": 860, "total_steps": 1590, "loss": 0.4544, "lr": 5.1701093261162095e-06, "epoch": 1.0818510939713522, "percentage": 54.09, "elapsed_time": "7:28:42", "remaining_time": "6:20:52"}
|
| 87 |
+
{"current_steps": 870, "total_steps": 1590, "loss": 0.4578, "lr": 5.060371557425669e-06, "epoch": 1.0944435699669448, "percentage": 54.72, "elapsed_time": "7:33:47", "remaining_time": "6:15:32"}
|
| 88 |
+
{"current_steps": 880, "total_steps": 1590, "loss": 0.4601, "lr": 4.9506046925926725e-06, "epoch": 1.1070360459625375, "percentage": 55.35, "elapsed_time": "7:39:12", "remaining_time": "6:10:29"}
|
| 89 |
+
{"current_steps": 890, "total_steps": 1590, "loss": 0.4655, "lr": 4.840861633885642e-06, "epoch": 1.1196285219581301, "percentage": 55.97, "elapsed_time": "7:44:18", "remaining_time": "6:05:11"}
|
| 90 |
+
{"current_steps": 900, "total_steps": 1590, "loss": 0.4741, "lr": 4.7311952720996106e-06, "epoch": 1.1322209979537226, "percentage": 56.6, "elapsed_time": "7:49:50", "remaining_time": "6:00:12"}
|
| 91 |
+
{"current_steps": 910, "total_steps": 1590, "loss": 0.4639, "lr": 4.621658461065435e-06, "epoch": 1.1448134739493152, "percentage": 57.23, "elapsed_time": "7:55:09", "remaining_time": "5:55:04"}
|
| 92 |
+
{"current_steps": 920, "total_steps": 1590, "loss": 0.4664, "lr": 4.512303992176841e-06, "epoch": 1.1574059499449079, "percentage": 57.86, "elapsed_time": "8:00:30", "remaining_time": "5:49:55"}
|
| 93 |
+
{"current_steps": 930, "total_steps": 1590, "loss": 0.466, "lr": 4.4031845689475406e-06, "epoch": 1.1699984259405005, "percentage": 58.49, "elapsed_time": "8:05:40", "remaining_time": "5:44:40"}
|
| 94 |
+
{"current_steps": 940, "total_steps": 1590, "loss": 0.4526, "lr": 4.294352781610722e-06, "epoch": 1.1825909019360932, "percentage": 59.12, "elapsed_time": "8:10:51", "remaining_time": "5:39:25"}
|
| 95 |
+
{"current_steps": 950, "total_steps": 1590, "loss": 0.4488, "lr": 4.185861081773115e-06, "epoch": 1.1951833779316858, "percentage": 59.75, "elapsed_time": "8:15:49", "remaining_time": "5:34:01"}
|
| 96 |
+
{"current_steps": 960, "total_steps": 1590, "loss": 0.4572, "lr": 4.077761757135882e-06, "epoch": 1.2077758539272785, "percentage": 60.38, "elapsed_time": "8:20:56", "remaining_time": "5:28:44"}
|
| 97 |
+
{"current_steps": 970, "total_steps": 1590, "loss": 0.4684, "lr": 3.970106906294509e-06, "epoch": 1.2203683299228711, "percentage": 61.01, "elapsed_time": "8:26:14", "remaining_time": "5:23:34"}
|
| 98 |
+
{"current_steps": 980, "total_steps": 1590, "loss": 0.459, "lr": 3.862948413629806e-06, "epoch": 1.2329608059184638, "percentage": 61.64, "elapsed_time": "8:31:37", "remaining_time": "5:18:27"}
|
| 99 |
+
{"current_steps": 990, "total_steps": 1590, "loss": 0.4509, "lr": 3.7563379243021924e-06, "epoch": 1.2455532819140562, "percentage": 62.26, "elapsed_time": "8:36:36", "remaining_time": "5:13:05"}
|
| 100 |
+
{"current_steps": 1000, "total_steps": 1590, "loss": 0.4625, "lr": 3.6503268193612316e-06, "epoch": 1.258145757909649, "percentage": 62.89, "elapsed_time": "8:42:05", "remaining_time": "5:08:01"}
|
| 101 |
+
{"current_steps": 1010, "total_steps": 1590, "loss": 0.4638, "lr": 3.5449661909824908e-06, "epoch": 1.2707382339052415, "percentage": 63.52, "elapsed_time": "8:48:07", "remaining_time": "5:03:16"}
|
| 102 |
+
{"current_steps": 1020, "total_steps": 1590, "loss": 0.4658, "lr": 3.440306817843592e-06, "epoch": 1.2833307099008342, "percentage": 64.15, "elapsed_time": "8:53:26", "remaining_time": "4:58:06"}
|
| 103 |
+
{"current_steps": 1030, "total_steps": 1590, "loss": 0.4536, "lr": 3.336399140651385e-06, "epoch": 1.2959231858964269, "percentage": 64.78, "elapsed_time": "8:58:34", "remaining_time": "4:52:49"}
|
| 104 |
+
{"current_steps": 1040, "total_steps": 1590, "loss": 0.4489, "lr": 3.2332932378319803e-06, "epoch": 1.3085156618920195, "percentage": 65.41, "elapsed_time": "9:03:52", "remaining_time": "4:47:37"}
|
| 105 |
+
{"current_steps": 1050, "total_steps": 1590, "loss": 0.4627, "lr": 3.1310388013953897e-06, "epoch": 1.3211081378876122, "percentage": 66.04, "elapsed_time": "9:09:04", "remaining_time": "4:42:23"}
|
| 106 |
+
{"current_steps": 1060, "total_steps": 1590, "loss": 0.4656, "lr": 3.029685112986417e-06, "epoch": 1.3337006138832048, "percentage": 66.67, "elapsed_time": "9:14:26", "remaining_time": "4:37:13"}
|
| 107 |
+
{"current_steps": 1070, "total_steps": 1590, "loss": 0.4591, "lr": 2.9292810201332995e-06, "epoch": 1.3462930898787975, "percentage": 67.3, "elapsed_time": "9:19:54", "remaining_time": "4:32:06"}
|
| 108 |
+
{"current_steps": 1080, "total_steps": 1590, "loss": 0.4646, "lr": 2.8298749127055914e-06, "epoch": 1.3588855658743901, "percentage": 67.92, "elapsed_time": "9:25:08", "remaining_time": "4:26:52"}
|
| 109 |
+
{"current_steps": 1090, "total_steps": 1590, "loss": 0.452, "lr": 2.7315146995926085e-06, "epoch": 1.3714780418699828, "percentage": 68.55, "elapsed_time": "9:30:33", "remaining_time": "4:21:43"}
|
| 110 |
+
{"current_steps": 1100, "total_steps": 1590, "loss": 0.4587, "lr": 2.6342477856136806e-06, "epoch": 1.3840705178655752, "percentage": 69.18, "elapsed_time": "9:35:53", "remaining_time": "4:16:32"}
|
| 111 |
+
{"current_steps": 1110, "total_steps": 1590, "loss": 0.4623, "lr": 2.53812104867135e-06, "epoch": 1.3966629938611679, "percentage": 69.81, "elapsed_time": "9:41:18", "remaining_time": "4:11:22"}
|
| 112 |
+
{"current_steps": 1120, "total_steps": 1590, "loss": 0.463, "lr": 2.443180817158502e-06, "epoch": 1.4092554698567605, "percentage": 70.44, "elapsed_time": "9:46:33", "remaining_time": "4:06:08"}
|
| 113 |
+
{"current_steps": 1130, "total_steps": 1590, "loss": 0.4582, "lr": 2.3494728476303547e-06, "epoch": 1.4218479458523532, "percentage": 71.07, "elapsed_time": "9:52:06", "remaining_time": "4:01:02"}
|
| 114 |
+
{"current_steps": 1140, "total_steps": 1590, "loss": 0.4583, "lr": 2.2570423027520175e-06, "epoch": 1.4344404218479458, "percentage": 71.7, "elapsed_time": "9:57:18", "remaining_time": "3:55:46"}
|
| 115 |
+
{"current_steps": 1150, "total_steps": 1590, "loss": 0.4675, "lr": 2.1659337295323117e-06, "epoch": 1.4470328978435385, "percentage": 72.33, "elapsed_time": "10:02:38", "remaining_time": "3:50:34"}
|
| 116 |
+
{"current_steps": 1160, "total_steps": 1590, "loss": 0.4573, "lr": 2.076191037854267e-06, "epoch": 1.4596253738391312, "percentage": 72.96, "elapsed_time": "10:07:57", "remaining_time": "3:45:21"}
|
| 117 |
+
{"current_steps": 1170, "total_steps": 1590, "loss": 0.4647, "lr": 1.987857479312721e-06, "epoch": 1.4722178498347238, "percentage": 73.58, "elapsed_time": "10:13:10", "remaining_time": "3:40:06"}
|
| 118 |
+
{"current_steps": 1180, "total_steps": 1590, "loss": 0.4633, "lr": 1.9009756263691475e-06, "epoch": 1.4848103258303165, "percentage": 74.21, "elapsed_time": "10:18:50", "remaining_time": "3:35:01"}
|
| 119 |
+
{"current_steps": 1190, "total_steps": 1590, "loss": 0.4591, "lr": 1.815587351833818e-06, "epoch": 1.497402801825909, "percentage": 74.84, "elapsed_time": "10:24:25", "remaining_time": "3:29:53"}
|
| 120 |
+
{"current_steps": 1200, "total_steps": 1590, "loss": 0.459, "lr": 1.7317338086851526e-06, "epoch": 1.5099952778215018, "percentage": 75.47, "elapsed_time": "10:29:38", "remaining_time": "3:24:37"}
|
| 121 |
+
{"current_steps": 1210, "total_steps": 1590, "loss": 0.4538, "lr": 1.649455410235985e-06, "epoch": 1.5225877538170942, "percentage": 76.1, "elapsed_time": "10:34:52", "remaining_time": "3:19:22"}
|
| 122 |
+
{"current_steps": 1220, "total_steps": 1590, "loss": 0.4592, "lr": 1.5687918106563326e-06, "epoch": 1.5351802298126869, "percentage": 76.73, "elapsed_time": "10:40:07", "remaining_time": "3:14:08"}
|
| 123 |
+
{"current_steps": 1230, "total_steps": 1590, "loss": 0.4549, "lr": 1.4897818858620095e-06, "epoch": 1.5477727058082795, "percentage": 77.36, "elapsed_time": "10:45:24", "remaining_time": "3:08:53"}
|
| 124 |
+
{"current_steps": 1240, "total_steps": 1590, "loss": 0.4516, "lr": 1.4124637147783431e-06, "epoch": 1.5603651818038722, "percentage": 77.99, "elapsed_time": "10:50:23", "remaining_time": "3:03:34"}
|
| 125 |
+
{"current_steps": 1250, "total_steps": 1590, "loss": 0.4601, "lr": 1.3368745609879908e-06, "epoch": 1.5729576577994648, "percentage": 78.62, "elapsed_time": "10:55:41", "remaining_time": "2:58:20"}
|
| 126 |
+
{"current_steps": 1260, "total_steps": 1590, "loss": 0.4545, "lr": 1.263050854771705e-06, "epoch": 1.5855501337950575, "percentage": 79.25, "elapsed_time": "11:00:58", "remaining_time": "2:53:06"}
|
| 127 |
+
{"current_steps": 1270, "total_steps": 1590, "loss": 0.4455, "lr": 1.191028175550727e-06, "epoch": 1.5981426097906501, "percentage": 79.87, "elapsed_time": "11:06:02", "remaining_time": "2:47:49"}
|
| 128 |
+
{"current_steps": 1280, "total_steps": 1590, "loss": 0.4445, "lr": 1.1208412347392338e-06, "epoch": 1.6107350857862426, "percentage": 80.5, "elapsed_time": "11:11:04", "remaining_time": "2:42:31"}
|
| 129 |
+
{"current_steps": 1290, "total_steps": 1590, "loss": 0.4496, "lr": 1.0525238590151442e-06, "epoch": 1.6233275617818355, "percentage": 81.13, "elapsed_time": "11:16:22", "remaining_time": "2:37:17"}
|
| 130 |
+
{"current_steps": 1300, "total_steps": 1590, "loss": 0.4546, "lr": 9.86108974017298e-07, "epoch": 1.635920037777428, "percentage": 81.76, "elapsed_time": "11:21:31", "remaining_time": "2:32:01"}
|
| 131 |
+
{"current_steps": 1310, "total_steps": 1590, "loss": 0.4583, "lr": 9.216285884769172e-07, "epoch": 1.6485125137730208, "percentage": 82.39, "elapsed_time": "11:26:56", "remaining_time": "2:26:49"}
|
| 132 |
+
{"current_steps": 1320, "total_steps": 1590, "loss": 0.4555, "lr": 8.591137787909503e-07, "epoch": 1.6611049897686132, "percentage": 83.02, "elapsed_time": "11:32:02", "remaining_time": "2:21:33"}
|
| 133 |
+
{"current_steps": 1330, "total_steps": 1590, "loss": 0.4497, "lr": 7.985946740447792e-07, "epoch": 1.6736974657642059, "percentage": 83.65, "elapsed_time": "11:37:03", "remaining_time": "2:16:16"}
|
| 134 |
+
{"current_steps": 1340, "total_steps": 1590, "loss": 0.463, "lr": 7.401004414914586e-07, "epoch": 1.6862899417597985, "percentage": 84.28, "elapsed_time": "11:42:25", "remaining_time": "2:11:02"}
|
| 135 |
+
{"current_steps": 1350, "total_steps": 1590, "loss": 0.4561, "lr": 6.836592724945323e-07, "epoch": 1.6988824177553912, "percentage": 84.91, "elapsed_time": "11:47:41", "remaining_time": "2:05:48"}
|
| 136 |
+
{"current_steps": 1360, "total_steps": 1590, "loss": 0.4624, "lr": 6.292983689411725e-07, "epoch": 1.7114748937509838, "percentage": 85.53, "elapsed_time": "11:53:10", "remaining_time": "2:00:36"}
|
| 137 |
+
{"current_steps": 1370, "total_steps": 1590, "loss": 0.4467, "lr": 5.770439301321929e-07, "epoch": 1.7240673697465763, "percentage": 86.16, "elapsed_time": "11:58:21", "remaining_time": "1:55:21"}
|
| 138 |
+
{"current_steps": 1380, "total_steps": 1590, "loss": 0.451, "lr": 5.269211401552721e-07, "epoch": 1.7366598457421691, "percentage": 86.79, "elapsed_time": "12:03:24", "remaining_time": "1:50:05"}
|
| 139 |
+
{"current_steps": 1390, "total_steps": 1590, "loss": 0.4639, "lr": 4.78954155747448e-07, "epoch": 1.7492523217377616, "percentage": 87.42, "elapsed_time": "12:08:50", "remaining_time": "1:44:52"}
|
| 140 |
+
{"current_steps": 1400, "total_steps": 1590, "loss": 0.4518, "lr": 4.3316609465275437e-07, "epoch": 1.7618447977333545, "percentage": 88.05, "elapsed_time": "12:14:03", "remaining_time": "1:39:37"}
|
| 141 |
+
{"current_steps": 1410, "total_steps": 1590, "loss": 0.4603, "lr": 3.895790244805936e-07, "epoch": 1.7744372737289469, "percentage": 88.68, "elapsed_time": "12:19:18", "remaining_time": "1:34:22"}
|
| 142 |
+
{"current_steps": 1420, "total_steps": 1590, "loss": 0.4512, "lr": 3.4821395207022767e-07, "epoch": 1.7870297497245395, "percentage": 89.31, "elapsed_time": "12:24:33", "remaining_time": "1:29:08"}
{"current_steps": 1430, "total_steps": 1590, "loss": 0.4597, "lr": 3.0909081336650883e-07, "epoch": 1.7996222257201322, "percentage": 89.94, "elapsed_time": "12:29:42", "remaining_time": "1:23:53"}
{"current_steps": 1440, "total_steps": 1590, "loss": 0.467, "lr": 2.7222846381172616e-07, "epoch": 1.8122147017157249, "percentage": 90.57, "elapsed_time": "12:35:07", "remaining_time": "1:18:39"}
{"current_steps": 1450, "total_steps": 1590, "loss": 0.4617, "lr": 2.3764466925820518e-07, "epoch": 1.8248071777113175, "percentage": 91.19, "elapsed_time": "12:40:35", "remaining_time": "1:13:26"}
{"current_steps": 1460, "total_steps": 1590, "loss": 0.4586, "lr": 2.0535609740603092e-07, "epoch": 1.8373996537069102, "percentage": 91.82, "elapsed_time": "12:45:45", "remaining_time": "1:08:11"}
{"current_steps": 1470, "total_steps": 1590, "loss": 0.4504, "lr": 1.7537830977003456e-07, "epoch": 1.8499921297025028, "percentage": 92.45, "elapsed_time": "12:50:49", "remaining_time": "1:02:55"}
{"current_steps": 1480, "total_steps": 1590, "loss": 0.4534, "lr": 1.477257541799032e-07, "epoch": 1.8625846056980953, "percentage": 93.08, "elapsed_time": "12:55:58", "remaining_time": "0:57:40"}
{"current_steps": 1490, "total_steps": 1590, "loss": 0.4553, "lr": 1.2241175781702587e-07, "epoch": 1.8751770816936881, "percentage": 93.71, "elapsed_time": "13:01:16", "remaining_time": "0:52:26"}
{"current_steps": 1500, "total_steps": 1590, "loss": 0.4572, "lr": 9.944852079144862e-08, "epoch": 1.8877695576892806, "percentage": 94.34, "elapsed_time": "13:06:40", "remaining_time": "0:47:12"}
{"current_steps": 1510, "total_steps": 1590, "loss": 0.4589, "lr": 7.884711026201586e-08, "epoch": 1.9003620336848734, "percentage": 94.97, "elapsed_time": "13:11:44", "remaining_time": "0:41:56"}
{"current_steps": 1520, "total_steps": 1590, "loss": 0.4547, "lr": 6.061745510254069e-08, "epoch": 1.9129545096804659, "percentage": 95.6, "elapsed_time": "13:16:55", "remaining_time": "0:36:42"}
{"current_steps": 1530, "total_steps": 1590, "loss": 0.4533, "lr": 4.476834111656891e-08, "epoch": 1.9255469856760585, "percentage": 96.23, "elapsed_time": "13:21:57", "remaining_time": "0:31:26"}
{"current_steps": 1540, "total_steps": 1590, "loss": 0.4503, "lr": 3.130740680305666e-08, "epoch": 1.9381394616716512, "percentage": 96.86, "elapsed_time": "13:27:15", "remaining_time": "0:26:12"}
{"current_steps": 1550, "total_steps": 1590, "loss": 0.4447, "lr": 2.0241139674982424e-08, "epoch": 1.9507319376672438, "percentage": 97.48, "elapsed_time": "13:32:18", "remaining_time": "0:20:57"}
{"current_steps": 1560, "total_steps": 1590, "loss": 0.4649, "lr": 1.1574873132684239e-08, "epoch": 1.9633244136628365, "percentage": 98.11, "elapsed_time": "13:37:40", "remaining_time": "0:15:43"}
{"current_steps": 1570, "total_steps": 1590, "loss": 0.4572, "lr": 5.31278389342138e-09, "epoch": 1.975916889658429, "percentage": 98.74, "elapsed_time": "13:43:01", "remaining_time": "0:10:29"}
{"current_steps": 1580, "total_steps": 1590, "loss": 0.4551, "lr": 1.4578899784001288e-09, "epoch": 1.9885093656540218, "percentage": 99.37, "elapsed_time": "13:48:11", "remaining_time": "0:05:14"}
{"current_steps": 1590, "total_steps": 1590, "loss": 0.4489, "lr": 1.2049258235058425e-11, "epoch": 2.0, "percentage": 100.0, "elapsed_time": "13:52:46", "remaining_time": "0:00:00"}
{"current_steps": 1590, "total_steps": 1590, "epoch": 2.0, "percentage": 100.0, "elapsed_time": "13:53:24", "remaining_time": "0:00:00"}
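The `lr` values in this log are consistent with a linear warmup followed by a single half-cosine decay to zero. The sketch below is not taken from the training configuration: the peak rate (`1e-5`), the warmup length (`159` steps), and the one-step offset between the logged step and the scheduler step are assumptions inferred from the logged numbers, and the function names are hypothetical.

```python
import math

PEAK_LR = 1e-5       # assumption: the value logged at step 160
WARMUP_STEPS = 159   # assumption: inferred from the logged warmup values
TOTAL_STEPS = 1590   # "total_steps" in trainer_log.jsonl


def cosine_with_warmup(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / max(1, WARMUP_STEPS)
    progress = (scheduler_step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def logged_lr(step: int) -> float:
    """Rate expected in the log at a given step.

    Assumption: the logged values line up with the schedule
    evaluated one scheduler step earlier (step - 1).
    """
    return cosine_with_warmup(step - 1)


if __name__ == "__main__":
    # Compare against a few logged steps by eye.
    for step in (10, 160, 170, 870, 1590):
        print(step, logged_lr(step))
```

Comparing `logged_lr(step)` with the `lr` (or `learning_rate`) field of any entry is a quick way to check that a resumed or re-run job follows the same schedule.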
trainer_state.json
ADDED
@@ -0,0 +1,1156 @@
| 1 |
+
{
|
| 2 |
+
"best_global_step": null,
|
| 3 |
+
"best_metric": null,
|
| 4 |
+
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 2.0,
|
| 6 |
+
"eval_steps": 500,
|
| 7 |
+
"global_step": 1590,
|
| 8 |
+
"is_hyper_param_search": false,
|
| 9 |
+
"is_local_process_zero": true,
|
| 10 |
+
"is_world_process_zero": true,
|
| 11 |
+
"log_history": [
|
| 12 |
+
{
|
| 13 |
+
"epoch": 0.012592475995592633,
|
| 14 |
+
"grad_norm": 38.37793731689453,
|
| 15 |
+
"learning_rate": 5.660377358490567e-07,
|
| 16 |
+
"loss": 1.2378,
|
| 17 |
+
"step": 10
|
| 18 |
+
},
|
| 19 |
+
{
|
| 20 |
+
"epoch": 0.025184951991185266,
|
| 21 |
+
"grad_norm": 13.752121925354004,
|
| 22 |
+
"learning_rate": 1.1949685534591195e-06,
|
| 23 |
+
"loss": 0.9875,
|
| 24 |
+
"step": 20
|
| 25 |
+
},
|
| 26 |
+
{
|
| 27 |
+
"epoch": 0.0377774279867779,
|
| 28 |
+
"grad_norm": 3.883774995803833,
|
| 29 |
+
"learning_rate": 1.8238993710691824e-06,
|
| 30 |
+
"loss": 0.6558,
|
| 31 |
+
"step": 30
|
| 32 |
+
},
|
| 33 |
+
{
|
| 34 |
+
"epoch": 0.05036990398237053,
|
| 35 |
+
"grad_norm": 1.039472222328186,
|
| 36 |
+
"learning_rate": 2.4528301886792453e-06,
|
| 37 |
+
"loss": 0.5882,
|
| 38 |
+
"step": 40
|
| 39 |
+
},
|
| 40 |
+
{
|
| 41 |
+
"epoch": 0.06296237997796317,
|
| 42 |
+
"grad_norm": 0.9505448937416077,
|
| 43 |
+
"learning_rate": 3.0817610062893084e-06,
|
| 44 |
+
"loss": 0.5781,
|
| 45 |
+
"step": 50
|
| 46 |
+
},
|
| 47 |
+
{
|
| 48 |
+
"epoch": 0.0755548559735558,
|
| 49 |
+
"grad_norm": 0.8333441019058228,
|
| 50 |
+
"learning_rate": 3.710691823899371e-06,
|
| 51 |
+
"loss": 0.5495,
|
| 52 |
+
"step": 60
|
| 53 |
+
},
|
| 54 |
+
{
|
| 55 |
+
"epoch": 0.08814733196914844,
|
| 56 |
+
"grad_norm": 1.262669324874878,
|
| 57 |
+
"learning_rate": 4.339622641509435e-06,
|
| 58 |
+
"loss": 0.5535,
|
| 59 |
+
"step": 70
|
| 60 |
+
},
|
| 61 |
+
{
|
| 62 |
+
"epoch": 0.10073980796474107,
|
| 63 |
+
"grad_norm": 0.8360942006111145,
|
| 64 |
+
"learning_rate": 4.968553459119497e-06,
|
| 65 |
+
"loss": 0.5396,
|
| 66 |
+
"step": 80
|
| 67 |
+
},
|
| 68 |
+
{
|
| 69 |
+
"epoch": 0.11333228396033371,
|
| 70 |
+
"grad_norm": 0.7950102090835571,
|
| 71 |
+
"learning_rate": 5.59748427672956e-06,
|
| 72 |
+
"loss": 0.5413,
|
| 73 |
+
"step": 90
|
| 74 |
+
},
|
| 75 |
+
{
|
| 76 |
+
"epoch": 0.12592475995592634,
|
| 77 |
+
"grad_norm": 0.8145875334739685,
|
| 78 |
+
"learning_rate": 6.226415094339623e-06,
|
| 79 |
+
"loss": 0.5376,
|
| 80 |
+
"step": 100
|
| 81 |
+
},
|
| 82 |
+
{
|
| 83 |
+
"epoch": 0.13851723595151896,
|
| 84 |
+
"grad_norm": 0.7814944386482239,
|
| 85 |
+
"learning_rate": 6.855345911949685e-06,
|
| 86 |
+
"loss": 0.5351,
|
| 87 |
+
"step": 110
|
| 88 |
+
},
|
| 89 |
+
{
|
| 90 |
+
"epoch": 0.1511097119471116,
|
| 91 |
+
"grad_norm": 0.8993023037910461,
|
| 92 |
+
"learning_rate": 7.484276729559748e-06,
|
| 93 |
+
"loss": 0.5409,
|
| 94 |
+
"step": 120
|
| 95 |
+
},
|
| 96 |
+
{
|
| 97 |
+
"epoch": 0.16370218794270425,
|
| 98 |
+
"grad_norm": 0.866847574710846,
|
| 99 |
+
"learning_rate": 8.113207547169812e-06,
|
| 100 |
+
"loss": 0.5442,
|
| 101 |
+
"step": 130
|
| 102 |
+
},
|
| 103 |
+
{
|
| 104 |
+
"epoch": 0.17629466393829687,
|
| 105 |
+
"grad_norm": 0.7159755229949951,
|
| 106 |
+
"learning_rate": 8.742138364779875e-06,
|
| 107 |
+
"loss": 0.5365,
|
| 108 |
+
"step": 140
|
| 109 |
+
},
|
| 110 |
+
{
|
| 111 |
+
"epoch": 0.1888871399338895,
|
| 112 |
+
"grad_norm": 0.7923691272735596,
|
| 113 |
+
"learning_rate": 9.371069182389939e-06,
|
| 114 |
+
"loss": 0.5382,
|
| 115 |
+
"step": 150
|
| 116 |
+
},
|
| 117 |
+
{
|
| 118 |
+
"epoch": 0.20147961592948213,
|
| 119 |
+
"grad_norm": 0.8047380447387695,
|
| 120 |
+
"learning_rate": 1e-05,
|
| 121 |
+
"loss": 0.5414,
|
| 122 |
+
"step": 160
|
| 123 |
+
},
|
| 124 |
+
{
|
| 125 |
+
"epoch": 0.21407209192507476,
|
| 126 |
+
"grad_norm": 0.8467481732368469,
|
| 127 |
+
"learning_rate": 9.998795122086687e-06,
|
| 128 |
+
"loss": 0.5331,
|
| 129 |
+
"step": 170
|
| 130 |
+
},
|
| 131 |
+
{
|
| 132 |
+
"epoch": 0.22666456792066741,
|
| 133 |
+
"grad_norm": 0.7700195908546448,
|
| 134 |
+
"learning_rate": 9.995181069039055e-06,
|
| 135 |
+
"loss": 0.5345,
|
| 136 |
+
"step": 180
|
| 137 |
+
},
|
| 138 |
+
{
|
| 139 |
+
"epoch": 0.23925704391626004,
|
| 140 |
+
"grad_norm": 1.2827311754226685,
|
| 141 |
+
"learning_rate": 9.989159582654187e-06,
|
| 142 |
+
"loss": 0.5302,
|
| 143 |
+
"step": 190
|
| 144 |
+
},
|
| 145 |
+
{
|
| 146 |
+
"epoch": 0.25184951991185267,
|
| 147 |
+
"grad_norm": 0.774922251701355,
|
| 148 |
+
"learning_rate": 9.98073356499446e-06,
|
| 149 |
+
"loss": 0.5337,
|
| 150 |
+
"step": 200
|
| 151 |
+
},
|
| 152 |
+
{
|
| 153 |
+
"epoch": 0.2644419959074453,
|
| 154 |
+
"grad_norm": 0.7742050290107727,
|
| 155 |
+
"learning_rate": 9.969907076988907e-06,
|
| 156 |
+
"loss": 0.5229,
|
| 157 |
+
"step": 210
|
| 158 |
+
},
|
| 159 |
+
{
|
| 160 |
+
"epoch": 0.2770344719030379,
|
| 161 |
+
"grad_norm": 0.7685420513153076,
|
| 162 |
+
"learning_rate": 9.956685336476037e-06,
|
| 163 |
+
"loss": 0.54,
|
| 164 |
+
"step": 220
|
| 165 |
+
},
|
| 166 |
+
{
|
| 167 |
+
"epoch": 0.2896269478986306,
|
| 168 |
+
"grad_norm": 0.7166957259178162,
|
| 169 |
+
"learning_rate": 9.941074715689097e-06,
|
| 170 |
+
"loss": 0.5252,
|
| 171 |
+
"step": 230
|
| 172 |
+
},
|
| 173 |
+
{
|
| 174 |
+
"epoch": 0.3022194238942232,
|
| 175 |
+
"grad_norm": 0.7183574438095093,
|
| 176 |
+
"learning_rate": 9.923082738184969e-06,
|
| 177 |
+
"loss": 0.5324,
|
| 178 |
+
"step": 240
|
| 179 |
+
},
|
| 180 |
+
{
|
| 181 |
+
"epoch": 0.31481189988981584,
|
| 182 |
+
"grad_norm": 0.7572329044342041,
|
| 183 |
+
"learning_rate": 9.902718075218176e-06,
|
| 184 |
+
"loss": 0.5213,
|
| 185 |
+
"step": 250
|
| 186 |
+
},
|
| 187 |
+
{
|
| 188 |
+
"epoch": 0.3274043758854085,
|
| 189 |
+
"grad_norm": 0.7367239594459534,
|
| 190 |
+
"learning_rate": 9.879990541561766e-06,
|
| 191 |
+
"loss": 0.5221,
|
| 192 |
+
"step": 260
|
| 193 |
+
},
|
| 194 |
+
{
|
| 195 |
+
"epoch": 0.3399968518810011,
|
| 196 |
+
"grad_norm": 0.7647990584373474,
|
| 197 |
+
"learning_rate": 9.854911090777071e-06,
|
| 198 |
+
"loss": 0.5116,
|
| 199 |
+
"step": 270
|
| 200 |
+
},
|
| 201 |
+
{
|
| 202 |
+
"epoch": 0.35258932787659375,
|
| 203 |
+
"grad_norm": 0.7244531512260437,
|
| 204 |
+
"learning_rate": 9.827491809934621e-06,
|
| 205 |
+
"loss": 0.5198,
|
| 206 |
+
"step": 280
|
| 207 |
+
},
|
| 208 |
+
{
|
| 209 |
+
"epoch": 0.36518180387218635,
|
| 210 |
+
"grad_norm": 0.7534804344177246,
|
| 211 |
+
"learning_rate": 9.797745913788772e-06,
|
| 212 |
+
"loss": 0.5243,
|
| 213 |
+
"step": 290
|
| 214 |
+
},
|
| 215 |
+
{
|
| 216 |
+
"epoch": 0.377774279867779,
|
| 217 |
+
"grad_norm": 0.7589166760444641,
|
| 218 |
+
"learning_rate": 9.765687738408834e-06,
|
| 219 |
+
"loss": 0.5154,
|
| 220 |
+
"step": 300
|
| 221 |
+
},
|
| 222 |
+
{
|
| 223 |
+
"epoch": 0.39036675586337166,
|
| 224 |
+
"grad_norm": 0.755774974822998,
|
| 225 |
+
"learning_rate": 9.731332734269791e-06,
|
| 226 |
+
"loss": 0.5368,
|
| 227 |
+
"step": 310
|
| 228 |
+
},
|
| 229 |
+
{
|
| 230 |
+
"epoch": 0.40295923185896426,
|
| 231 |
+
"grad_norm": 0.7197142243385315,
|
| 232 |
+
"learning_rate": 9.69469745880592e-06,
|
| 233 |
+
"loss": 0.5257,
|
| 234 |
+
"step": 320
|
| 235 |
+
},
|
| 236 |
+
{
|
| 237 |
+
"epoch": 0.4155517078545569,
|
| 238 |
+
"grad_norm": 0.6717684268951416,
|
| 239 |
+
"learning_rate": 9.655799568430926e-06,
|
| 240 |
+
"loss": 0.5179,
|
| 241 |
+
"step": 330
|
| 242 |
+
},
|
| 243 |
+
{
|
| 244 |
+
"epoch": 0.4281441838501495,
|
| 245 |
+
"grad_norm": 0.7795198559761047,
|
| 246 |
+
"learning_rate": 9.614657810028402e-06,
|
| 247 |
+
"loss": 0.5205,
|
| 248 |
+
"step": 340
|
| 249 |
+
},
|
| 250 |
+
{
|
| 251 |
+
"epoch": 0.4407366598457422,
|
| 252 |
+
"grad_norm": 0.7759056091308594,
|
| 253 |
+
"learning_rate": 9.571292011916753e-06,
|
| 254 |
+
"loss": 0.5185,
|
| 255 |
+
"step": 350
|
| 256 |
+
},
|
| 257 |
+
{
|
| 258 |
+
"epoch": 0.45332913584133483,
|
| 259 |
+
"grad_norm": 0.7955228090286255,
|
| 260 |
+
"learning_rate": 9.525723074292916e-06,
|
| 261 |
+
"loss": 0.5193,
|
| 262 |
+
"step": 360
|
| 263 |
+
},
|
| 264 |
+
{
|
| 265 |
+
"epoch": 0.46592161183692743,
|
| 266 |
+
"grad_norm": 0.7349282503128052,
|
| 267 |
+
"learning_rate": 9.47797295915948e-06,
|
| 268 |
+
"loss": 0.5163,
|
| 269 |
+
"step": 370
|
| 270 |
+
},
|
| 271 |
+
{
|
| 272 |
+
"epoch": 0.4785140878325201,
|
| 273 |
+
"grad_norm": 0.6859073042869568,
|
| 274 |
+
"learning_rate": 9.428064679740081e-06,
|
| 275 |
+
"loss": 0.5108,
|
| 276 |
+
"step": 380
|
| 277 |
+
},
|
| 278 |
+
{
|
| 279 |
+
"epoch": 0.4911065638281127,
|
| 280 |
+
"grad_norm": 0.7050626873970032,
|
| 281 |
+
"learning_rate": 9.37602228938814e-06,
|
| 282 |
+
"loss": 0.5241,
|
| 283 |
+
"step": 390
|
| 284 |
+
},
|
| 285 |
+
{
|
| 286 |
+
"epoch": 0.5036990398237053,
|
| 287 |
+
"grad_norm": 0.7306986451148987,
|
| 288 |
+
"learning_rate": 9.321870869994336e-06,
|
| 289 |
+
"loss": 0.5132,
|
| 290 |
+
"step": 400
|
| 291 |
+
},
|
| 292 |
+
{
|
| 293 |
+
"epoch": 0.516291515819298,
|
| 294 |
+
"grad_norm": 0.7835687398910522,
|
| 295 |
+
"learning_rate": 9.26563651989835e-06,
|
| 296 |
+
"loss": 0.5015,
|
| 297 |
+
"step": 410
|
| 298 |
+
},
|
| 299 |
+
{
|
| 300 |
+
"epoch": 0.5288839918148907,
|
| 301 |
+
"grad_norm": 0.716162383556366,
|
| 302 |
+
"learning_rate": 9.207346341310744e-06,
|
| 303 |
+
"loss": 0.5329,
|
| 304 |
+
"step": 420
|
| 305 |
+
},
|
| 306 |
+
{
|
| 307 |
+
"epoch": 0.5414764678104832,
|
| 308 |
+
"grad_norm": 0.6837930083274841,
|
| 309 |
+
"learning_rate": 9.14702842725101e-06,
|
| 310 |
+
"loss": 0.5237,
|
| 311 |
+
"step": 430
|
| 312 |
+
},
|
| 313 |
+
{
|
| 314 |
+
"epoch": 0.5540689438060759,
|
| 315 |
+
"grad_norm": 0.680153489112854,
|
| 316 |
+
"learning_rate": 9.084711848008122e-06,
|
| 317 |
+
"loss": 0.5175,
|
| 318 |
+
"step": 440
|
| 319 |
+
},
|
| 320 |
+
{
|
| 321 |
+
"epoch": 0.5666614198016685,
|
| 322 |
+
"grad_norm": 0.711155891418457,
|
| 323 |
+
"learning_rate": 9.020426637130069e-06,
|
| 324 |
+
"loss": 0.5048,
|
| 325 |
+
"step": 450
|
| 326 |
+
},
|
| 327 |
+
{
|
| 328 |
+
"epoch": 0.5792538957972612,
|
| 329 |
+
"grad_norm": 0.7100813388824463,
|
| 330 |
+
"learning_rate": 8.954203776949141e-06,
|
| 331 |
+
"loss": 0.524,
|
| 332 |
+
"step": 460
|
| 333 |
+
},
|
| 334 |
+
{
|
| 335 |
+
"epoch": 0.5918463717928538,
|
| 336 |
+
"grad_norm": 0.7131823897361755,
|
| 337 |
+
"learning_rate": 8.886075183649976e-06,
|
| 338 |
+
"loss": 0.5177,
|
| 339 |
+
"step": 470
|
| 340 |
+
},
|
| 341 |
+
{
|
| 342 |
+
"epoch": 0.6044388477884464,
|
| 343 |
+
"grad_norm": 0.6676498651504517,
|
| 344 |
+
"learning_rate": 8.816073691887506e-06,
|
| 345 |
+
"loss": 0.5089,
|
| 346 |
+
"step": 480
|
| 347 |
+
},
|
| 348 |
+
{
|
| 349 |
+
"epoch": 0.617031323784039,
|
| 350 |
+
"grad_norm": 0.6786186099052429,
|
| 351 |
+
"learning_rate": 8.744233038962262e-06,
|
| 352 |
+
"loss": 0.5006,
|
| 353 |
+
"step": 490
|
| 354 |
+
},
|
| 355 |
+
{
|
| 356 |
+
"epoch": 0.6296237997796317,
|
| 357 |
+
"grad_norm": 0.695183515548706,
|
| 358 |
+
"learning_rate": 8.670587848560636e-06,
|
| 359 |
+
"loss": 0.5069,
|
| 360 |
+
"step": 500
|
| 361 |
+
},
|
| 362 |
+
{
|
| 363 |
+
"epoch": 0.6422162757752243,
|
| 364 |
+
"grad_norm": 0.6954060792922974,
|
| 365 |
+
"learning_rate": 8.595173614067966e-06,
|
| 366 |
+
"loss": 0.5083,
|
| 367 |
+
"step": 510
|
| 368 |
+
},
|
| 369 |
+
{
|
| 370 |
+
"epoch": 0.654808751770817,
|
| 371 |
+
"grad_norm": 0.7450917959213257,
|
| 372 |
+
"learning_rate": 8.518026681462448e-06,
|
| 373 |
+
"loss": 0.4996,
|
| 374 |
+
"step": 520
|
| 375 |
+
},
|
| 376 |
+
{
|
| 377 |
+
"epoch": 0.6674012277664095,
|
| 378 |
+
"grad_norm": 0.7362504005432129,
|
| 379 |
+
"learning_rate": 8.43918423179815e-06,
|
| 380 |
+
"loss": 0.5071,
|
| 381 |
+
"step": 530
|
| 382 |
+
},
|
| 383 |
+
{
|
| 384 |
+
"epoch": 0.6799937037620022,
|
| 385 |
+
"grad_norm": 0.7175170183181763,
|
| 386 |
+
"learning_rate": 8.358684263285566e-06,
|
| 387 |
+
"loss": 0.5038,
|
| 388 |
+
"step": 540
|
| 389 |
+
},
|
| 390 |
+
{
|
| 391 |
+
"epoch": 0.6925861797575948,
|
| 392 |
+
"grad_norm": 0.7169266939163208,
|
| 393 |
+
"learning_rate": 8.27656557297833e-06,
|
| 394 |
+
"loss": 0.5136,
|
| 395 |
+
"step": 550
|
| 396 |
+
},
|
| 397 |
+
{
|
| 398 |
+
"epoch": 0.7051786557531875,
|
| 399 |
+
"grad_norm": 0.68199223279953,
|
| 400 |
+
"learning_rate": 8.192867738074927e-06,
|
| 401 |
+
"loss": 0.5087,
|
| 402 |
+
"step": 560
|
| 403 |
+
},
|
| 404 |
+
{
|
| 405 |
+
"epoch": 0.7177711317487802,
|
| 406 |
+
"grad_norm": 0.7578943967819214,
|
| 407 |
+
"learning_rate": 8.107631096844431e-06,
|
| 408 |
+
"loss": 0.5061,
|
| 409 |
+
"step": 570
|
| 410 |
+
},
|
| 411 |
+
{
|
| 412 |
+
"epoch": 0.7303636077443727,
|
| 413 |
+
"grad_norm": 0.6680927276611328,
|
| 414 |
+
"learning_rate": 8.020896729185406e-06,
|
| 415 |
+
"loss": 0.5022,
|
| 416 |
+
"step": 580
|
| 417 |
+
},
|
| 418 |
+
{
|
| 419 |
+
"epoch": 0.7429560837399654,
|
| 420 |
+
"grad_norm": 0.7082540988922119,
|
| 421 |
+
"learning_rate": 7.93270643682742e-06,
|
| 422 |
+
"loss": 0.504,
|
| 423 |
+
"step": 590
|
| 424 |
+
},
|
| 425 |
+
{
|
| 426 |
+
"epoch": 0.755548559735558,
|
| 427 |
+
"grad_norm": 0.6694560050964355,
|
| 428 |
+
"learning_rate": 7.843102723184647e-06,
|
| 429 |
+
"loss": 0.5112,
|
| 430 |
+
"step": 600
|
| 431 |
+
},
|
| 432 |
+
{
|
| 433 |
+
"epoch": 0.7681410357311507,
|
| 434 |
+
"grad_norm": 0.6237286925315857,
|
| 435 |
+
"learning_rate": 7.752128772871292e-06,
|
| 436 |
+
"loss": 0.5112,
|
| 437 |
+
"step": 610
|
| 438 |
+
},
|
| 439 |
+
{
|
| 440 |
+
"epoch": 0.7807335117267433,
|
| 441 |
+
"grad_norm": 0.7196964025497437,
|
| 442 |
+
"learning_rate": 7.659828430888726e-06,
|
| 443 |
+
"loss": 0.5058,
|
| 444 |
+
"step": 620
|
| 445 |
+
},
|
| 446 |
+
{
|
| 447 |
+
"epoch": 0.7933259877223359,
|
| 448 |
+
"grad_norm": 0.7414775490760803,
|
| 449 |
+
"learning_rate": 7.566246181494325e-06,
|
| 450 |
+
"loss": 0.5028,
|
| 451 |
+
"step": 630
|
| 452 |
+
},
|
| 453 |
+
{
|
| 454 |
+
"epoch": 0.8059184637179285,
|
| 455 |
+
"grad_norm": 0.7570202350616455,
|
| 456 |
+
"learning_rate": 7.4714271267622395e-06,
|
| 457 |
+
"loss": 0.5064,
|
| 458 |
+
"step": 640
|
| 459 |
+
},
|
| 460 |
+
{
|
| 461 |
+
"epoch": 0.8185109397135212,
|
| 462 |
+
"grad_norm": 0.7066277861595154,
|
| 463 |
+
"learning_rate": 7.3754169648463924e-06,
|
| 464 |
+
"loss": 0.5032,
|
| 465 |
+
"step": 650
|
| 466 |
+
},
|
| 467 |
+
{
|
| 468 |
+
"epoch": 0.8311034157091138,
|
| 469 |
+
"grad_norm": 0.6958999037742615,
|
| 470 |
+
"learning_rate": 7.278261967956203e-06,
|
| 471 |
+
"loss": 0.5064,
|
| 472 |
+
"step": 660
|
| 473 |
+
},
|
| 474 |
+
{
|
| 475 |
+
"epoch": 0.8436958917047065,
|
| 476 |
+
"grad_norm": 0.7002361416816711,
|
| 477 |
+
"learning_rate": 7.18000896005564e-06,
|
| 478 |
+
"loss": 0.5108,
|
| 479 |
+
"step": 670
|
| 480 |
+
},
|
| 481 |
+
{
|
| 482 |
+
"epoch": 0.856288367700299,
|
| 483 |
+
"grad_norm": 0.6879448294639587,
|
| 484 |
+
"learning_rate": 7.080705294296355e-06,
|
| 485 |
+
"loss": 0.498,
|
| 486 |
+
"step": 680
|
| 487 |
+
},
|
| 488 |
+
{
|
| 489 |
+
"epoch": 0.8688808436958917,
|
| 490 |
+
"grad_norm": 0.6851900219917297,
|
| 491 |
+
"learning_rate": 6.980398830195785e-06,
|
| 492 |
+
"loss": 0.4974,
|
| 493 |
+
"step": 690
|
| 494 |
+
},
|
| 495 |
+
{
|
| 496 |
+
"epoch": 0.8814733196914843,
|
| 497 |
+
"grad_norm": 0.6358043551445007,
|
| 498 |
+
"learning_rate": 6.879137910571191e-06,
|
| 499 |
+
"loss": 0.4979,
|
| 500 |
+
"step": 700
|
| 501 |
+
},
|
| 502 |
+
{
|
| 503 |
+
"epoch": 0.894065795687077,
|
| 504 |
+
"grad_norm": 0.7097135782241821,
|
| 505 |
+
"learning_rate": 6.77697133824079e-06,
|
| 506 |
+
"loss": 0.5063,
|
| 507 |
+
"step": 710
|
| 508 |
+
},
|
| 509 |
+
{
|
| 510 |
+
"epoch": 0.9066582716826697,
|
| 511 |
+
"grad_norm": 0.646121621131897,
|
| 512 |
+
"learning_rate": 6.673948352503172e-06,
|
| 513 |
+
"loss": 0.5148,
|
| 514 |
+
"step": 720
|
| 515 |
+
},
|
| 516 |
+
{
|
| 517 |
+
"epoch": 0.9192507476782622,
|
| 518 |
+
"grad_norm": 0.661415159702301,
|
| 519 |
+
"learning_rate": 6.5701186054063704e-06,
|
| 520 |
+
"loss": 0.5078,
|
| 521 |
+
"step": 730
|
| 522 |
+
},
|
| 523 |
+
{
|
| 524 |
+
"epoch": 0.9318432236738549,
|
| 525 |
+
"grad_norm": 0.7175974249839783,
|
| 526 |
+
"learning_rate": 6.4655321378179935e-06,
|
| 527 |
+
"loss": 0.505,
|
| 528 |
+
"step": 740
|
| 529 |
+
},
|
| 530 |
+
{
|
| 531 |
+
"epoch": 0.9444356996694475,
|
| 532 |
+
"grad_norm": 0.7281399369239807,
|
| 533 |
+
"learning_rate": 6.360239355307972e-06,
|
| 534 |
+
"loss": 0.5016,
|
| 535 |
+
"step": 750
|
| 536 |
+
},
|
| 537 |
+
{
|
| 538 |
+
"epoch": 0.9570281756650402,
|
| 539 |
+
"grad_norm": 0.6843705177307129,
|
| 540 |
+
"learning_rate": 6.254291003855537e-06,
|
| 541 |
+
"loss": 0.5086,
|
| 542 |
+
"step": 760
|
| 543 |
+
},
|
| 544 |
+
{
|
| 545 |
+
"epoch": 0.9696206516606328,
|
| 546 |
+
"grad_norm": 0.6782203316688538,
|
| 547 |
+
"learning_rate": 6.147738145392137e-06,
|
| 548 |
+
"loss": 0.4998,
|
| 549 |
+
"step": 770
|
| 550 |
+
},
|
| 551 |
+
{
|
| 552 |
+
"epoch": 0.9822131276562254,
|
| 553 |
+
"grad_norm": 0.6951731443405151,
|
| 554 |
+
"learning_rate": 6.040632133192074e-06,
|
| 555 |
+
"loss": 0.4933,
|
| 556 |
+
"step": 780
|
| 557 |
+
},
|
| 558 |
+
{
|
| 559 |
+
"epoch": 0.994805603651818,
|
| 560 |
+
"grad_norm": 0.6784470081329346,
|
| 561 |
+
"learning_rate": 5.933024587122745e-06,
|
| 562 |
+
"loss": 0.5084,
|
| 563 |
+
"step": 790
|
| 564 |
+
},
|
| 565 |
+
{
|
| 566 |
+
"epoch": 1.0062962379977962,
|
| 567 |
+
"grad_norm": 0.7149181962013245,
|
| 568 |
+
"learning_rate": 5.824967368766375e-06,
|
| 569 |
+
"loss": 0.4699,
|
| 570 |
+
"step": 800
|
| 571 |
+
},
|
| 572 |
+
{
|
| 573 |
+
"epoch": 1.0188887139933889,
|
| 574 |
+
"grad_norm": 0.6838890910148621,
|
| 575 |
+
"learning_rate": 5.716512556425271e-06,
|
| 576 |
+
"loss": 0.4574,
|
| 577 |
+
"step": 810
|
| 578 |
+
},
|
| 579 |
+
{
|
| 580 |
+
"epoch": 1.0314811899889815,
|
| 581 |
+
"grad_norm": 0.6297887563705444,
|
| 582 |
+
"learning_rate": 5.607712420022627e-06,
|
| 583 |
+
"loss": 0.4637,
|
| 584 |
+
"step": 820
|
| 585 |
+
},
|
| 586 |
+
{
|
| 587 |
+
"epoch": 1.0440736659845742,
|
| 588 |
+
"grad_norm": 0.6451053619384766,
|
| 589 |
+
"learning_rate": 5.4986193959109716e-06,
|
| 590 |
+
"loss": 0.464,
|
| 591 |
+
"step": 830
|
| 592 |
+
},
|
| 593 |
+
{
|
| 594 |
+
"epoch": 1.0566661419801668,
|
| 595 |
+
"grad_norm": 0.6407359838485718,
|
| 596 |
+
"learning_rate": 5.389286061600402e-06,
|
| 597 |
+
"loss": 0.4587,
|
| 598 |
+
"step": 840
|
| 599 |
+
},
|
| 600 |
+
{
|
| 601 |
+
"epoch": 1.0692586179757595,
|
| 602 |
+
"grad_norm": 0.6714970469474792,
|
| 603 |
+
"learning_rate": 5.2797651104187965e-06,
|
| 604 |
+
"loss": 0.4615,
|
| 605 |
+
"step": 850
|
| 606 |
+
},
|
| 607 |
+
{
|
| 608 |
+
"epoch": 1.0818510939713522,
|
| 609 |
+
"grad_norm": 0.6700007319450378,
|
| 610 |
+
"learning_rate": 5.1701093261162095e-06,
|
| 611 |
+
"loss": 0.4544,
|
| 612 |
+
"step": 860
|
| 613 |
+
},
|
| 614 |
+
{
|
| 615 |
+
"epoch": 1.0944435699669448,
|
| 616 |
+
"grad_norm": 0.721367359161377,
|
| 617 |
+
"learning_rate": 5.060371557425669e-06,
|
| 618 |
+
"loss": 0.4578,
|
| 619 |
+
"step": 870
|
| 620 |
+
},
|
| 621 |
+
{
|
| 622 |
+
"epoch": 1.1070360459625375,
|
| 623 |
+
"grad_norm": 0.6329649090766907,
|
| 624 |
+
"learning_rate": 4.9506046925926725e-06,
|
| 625 |
+
"loss": 0.4601,
|
| 626 |
+
"step": 880
|
| 627 |
+
},
|
| 628 |
+
{
|
| 629 |
+
"epoch": 1.1196285219581301,
|
| 630 |
+
"grad_norm": 0.7482566833496094,
|
| 631 |
+
"learning_rate": 4.840861633885642e-06,
|
| 632 |
+
"loss": 0.4655,
|
| 633 |
+
"step": 890
|
| 634 |
+
},
|
| 635 |
+
{
|
| 636 |
+
"epoch": 1.1322209979537226,
|
| 637 |
+
"grad_norm": 0.6573087573051453,
|
| 638 |
+
"learning_rate": 4.7311952720996106e-06,
|
| 639 |
+
"loss": 0.4741,
|
| 640 |
+
"step": 900
|
| 641 |
+
},
|
| 642 |
+
{
|
| 643 |
+
"epoch": 1.1448134739493152,
|
| 644 |
+
"grad_norm": 0.6579948663711548,
|
| 645 |
+
"learning_rate": 4.621658461065435e-06,
|
| 646 |
+
"loss": 0.4639,
|
| 647 |
+
"step": 910
|
| 648 |
+
},
|
| 649 |
+
{
|
| 650 |
+
"epoch": 1.1574059499449079,
|
| 651 |
+
"grad_norm": 0.702316164970398,
|
| 652 |
+
"learning_rate": 4.512303992176841e-06,
|
| 653 |
+
"loss": 0.4664,
|
| 654 |
+
"step": 920
|
| 655 |
+
},
|
| 656 |
+
{
|
| 657 |
+
"epoch": 1.1699984259405005,
|
| 658 |
+
"grad_norm": 0.6987930536270142,
|
| 659 |
+
"learning_rate": 4.4031845689475406e-06,
|
| 660 |
+
"loss": 0.466,
|
| 661 |
+
"step": 930
|
| 662 |
+
},
|
| 663 |
+
{
|
| 664 |
+
"epoch": 1.1825909019360932,
|
| 665 |
+
"grad_norm": 0.7473271489143372,
|
| 666 |
+
"learning_rate": 4.294352781610722e-06,
|
| 667 |
+
"loss": 0.4526,
|
| 668 |
+
"step": 940
|
| 669 |
+
},
|
| 670 |
+
{
|
| 671 |
+
"epoch": 1.1951833779316858,
|
| 672 |
+
"grad_norm": 0.7382975816726685,
|
| 673 |
+
"learning_rate": 4.185861081773115e-06,
|
| 674 |
+
"loss": 0.4488,
|
| 675 |
+
"step": 950
|
| 676 |
+
},
|
| 677 |
+
{
|
| 678 |
+
"epoch": 1.2077758539272785,
|
| 679 |
+
"grad_norm": 0.7045325040817261,
|
| 680 |
+
"learning_rate": 4.077761757135882e-06,
|
+      "loss": 0.4572,
+      "step": 960
+    },
+    {
+      "epoch": 1.2203683299228711,
+      "grad_norm": 0.6717881560325623,
+      "learning_rate": 3.970106906294509e-06,
+      "loss": 0.4684,
+      "step": 970
+    },
+    {
+      "epoch": 1.2329608059184638,
+      "grad_norm": 0.6612910628318787,
+      "learning_rate": 3.862948413629806e-06,
+      "loss": 0.459,
+      "step": 980
+    },
+    {
+      "epoch": 1.2455532819140562,
+      "grad_norm": 0.6803273558616638,
+      "learning_rate": 3.7563379243021924e-06,
+      "loss": 0.4509,
+      "step": 990
+    },
+    {
+      "epoch": 1.258145757909649,
+      "grad_norm": 0.6448065638542175,
+      "learning_rate": 3.6503268193612316e-06,
+      "loss": 0.4625,
+      "step": 1000
+    },
+    {
+      "epoch": 1.2707382339052415,
+      "grad_norm": 0.6815820932388306,
+      "learning_rate": 3.5449661909824908e-06,
+      "loss": 0.4638,
+      "step": 1010
+    },
+    {
+      "epoch": 1.2833307099008342,
+      "grad_norm": 0.6469627022743225,
+      "learning_rate": 3.440306817843592e-06,
+      "loss": 0.4658,
+      "step": 1020
+    },
+    {
+      "epoch": 1.2959231858964269,
+      "grad_norm": 0.6718019843101501,
+      "learning_rate": 3.336399140651385e-06,
+      "loss": 0.4536,
+      "step": 1030
+    },
+    {
+      "epoch": 1.3085156618920195,
+      "grad_norm": 0.7337303161621094,
+      "learning_rate": 3.2332932378319803e-06,
+      "loss": 0.4489,
+      "step": 1040
+    },
+    {
+      "epoch": 1.3211081378876122,
+      "grad_norm": 0.6914992332458496,
+      "learning_rate": 3.1310388013953897e-06,
+      "loss": 0.4627,
+      "step": 1050
+    },
+    {
+      "epoch": 1.3337006138832048,
+      "grad_norm": 0.6538123488426208,
+      "learning_rate": 3.029685112986417e-06,
+      "loss": 0.4656,
+      "step": 1060
+    },
+    {
+      "epoch": 1.3462930898787975,
+      "grad_norm": 0.6971728205680847,
+      "learning_rate": 2.9292810201332995e-06,
+      "loss": 0.4591,
+      "step": 1070
+    },
+    {
+      "epoch": 1.3588855658743901,
+      "grad_norm": 0.7958908081054688,
+      "learning_rate": 2.8298749127055914e-06,
+      "loss": 0.4646,
+      "step": 1080
+    },
+    {
+      "epoch": 1.3714780418699828,
+      "grad_norm": 0.6317901015281677,
+      "learning_rate": 2.7315146995926085e-06,
+      "loss": 0.452,
+      "step": 1090
+    },
+    {
+      "epoch": 1.3840705178655752,
+      "grad_norm": 0.6657803654670715,
+      "learning_rate": 2.6342477856136806e-06,
+      "loss": 0.4587,
+      "step": 1100
+    },
+    {
+      "epoch": 1.3966629938611679,
+      "grad_norm": 0.6851221323013306,
+      "learning_rate": 2.53812104867135e-06,
+      "loss": 0.4623,
+      "step": 1110
+    },
+    {
+      "epoch": 1.4092554698567605,
+      "grad_norm": 0.6648300290107727,
+      "learning_rate": 2.443180817158502e-06,
+      "loss": 0.463,
+      "step": 1120
+    },
+    {
+      "epoch": 1.4218479458523532,
+      "grad_norm": 0.6653546690940857,
+      "learning_rate": 2.3494728476303547e-06,
+      "loss": 0.4582,
+      "step": 1130
+    },
+    {
+      "epoch": 1.4344404218479458,
+      "grad_norm": 0.6322280168533325,
+      "learning_rate": 2.2570423027520175e-06,
+      "loss": 0.4583,
+      "step": 1140
+    },
+    {
+      "epoch": 1.4470328978435385,
+      "grad_norm": 0.6664165258407593,
+      "learning_rate": 2.1659337295323117e-06,
+      "loss": 0.4675,
+      "step": 1150
+    },
+    {
+      "epoch": 1.4596253738391312,
+      "grad_norm": 0.6616364121437073,
+      "learning_rate": 2.076191037854267e-06,
+      "loss": 0.4573,
+      "step": 1160
+    },
+    {
+      "epoch": 1.4722178498347238,
+      "grad_norm": 0.6122522950172424,
+      "learning_rate": 1.987857479312721e-06,
+      "loss": 0.4647,
+      "step": 1170
+    },
+    {
+      "epoch": 1.4848103258303165,
+      "grad_norm": 0.7026521563529968,
+      "learning_rate": 1.9009756263691475e-06,
+      "loss": 0.4633,
+      "step": 1180
+    },
+    {
+      "epoch": 1.497402801825909,
+      "grad_norm": 0.7002488374710083,
+      "learning_rate": 1.815587351833818e-06,
+      "loss": 0.4591,
+      "step": 1190
+    },
+    {
+      "epoch": 1.5099952778215018,
+      "grad_norm": 0.6730997562408447,
+      "learning_rate": 1.7317338086851526e-06,
+      "loss": 0.459,
+      "step": 1200
+    },
+    {
+      "epoch": 1.5225877538170942,
+      "grad_norm": 0.6867558360099792,
+      "learning_rate": 1.649455410235985e-06,
+      "loss": 0.4538,
+      "step": 1210
+    },
+    {
+      "epoch": 1.5351802298126869,
+      "grad_norm": 0.646712064743042,
+      "learning_rate": 1.5687918106563326e-06,
+      "loss": 0.4592,
+      "step": 1220
+    },
+    {
+      "epoch": 1.5477727058082795,
+      "grad_norm": 0.7295082807540894,
+      "learning_rate": 1.4897818858620095e-06,
+      "loss": 0.4549,
+      "step": 1230
+    },
+    {
+      "epoch": 1.5603651818038722,
+      "grad_norm": 0.6636416912078857,
+      "learning_rate": 1.4124637147783431e-06,
+      "loss": 0.4516,
+      "step": 1240
+    },
+    {
+      "epoch": 1.5729576577994648,
+      "grad_norm": 0.7151824831962585,
+      "learning_rate": 1.3368745609879908e-06,
+      "loss": 0.4601,
+      "step": 1250
+    },
+    {
+      "epoch": 1.5855501337950575,
+      "grad_norm": 0.626083493232727,
+      "learning_rate": 1.263050854771705e-06,
+      "loss": 0.4545,
+      "step": 1260
+    },
+    {
+      "epoch": 1.5981426097906501,
+      "grad_norm": 0.638517439365387,
+      "learning_rate": 1.191028175550727e-06,
+      "loss": 0.4455,
+      "step": 1270
+    },
+    {
+      "epoch": 1.6107350857862426,
+      "grad_norm": 0.6936119198799133,
+      "learning_rate": 1.1208412347392338e-06,
+      "loss": 0.4445,
+      "step": 1280
+    },
+    {
+      "epoch": 1.6233275617818355,
+      "grad_norm": 0.6687029004096985,
+      "learning_rate": 1.0525238590151442e-06,
+      "loss": 0.4496,
+      "step": 1290
+    },
+    {
+      "epoch": 1.635920037777428,
+      "grad_norm": 0.6389179229736328,
+      "learning_rate": 9.86108974017298e-07,
+      "loss": 0.4546,
+      "step": 1300
+    },
+    {
+      "epoch": 1.6485125137730208,
+      "grad_norm": 0.615524172782898,
+      "learning_rate": 9.216285884769172e-07,
+      "loss": 0.4583,
+      "step": 1310
+    },
+    {
+      "epoch": 1.6611049897686132,
+      "grad_norm": 0.6831961274147034,
+      "learning_rate": 8.591137787909503e-07,
+      "loss": 0.4555,
+      "step": 1320
+    },
+    {
+      "epoch": 1.6736974657642059,
+      "grad_norm": 0.6675463914871216,
+      "learning_rate": 7.985946740447792e-07,
+      "loss": 0.4497,
+      "step": 1330
+    },
+    {
+      "epoch": 1.6862899417597985,
+      "grad_norm": 0.6614457368850708,
+      "learning_rate": 7.401004414914586e-07,
+      "loss": 0.463,
+      "step": 1340
+    },
+    {
+      "epoch": 1.6988824177553912,
+      "grad_norm": 0.6566441059112549,
+      "learning_rate": 6.836592724945323e-07,
+      "loss": 0.4561,
+      "step": 1350
+    },
+    {
+      "epoch": 1.7114748937509838,
+      "grad_norm": 0.6838300228118896,
+      "learning_rate": 6.292983689411725e-07,
+      "loss": 0.4624,
+      "step": 1360
+    },
+    {
+      "epoch": 1.7240673697465763,
+      "grad_norm": 0.7261980772018433,
+      "learning_rate": 5.770439301321929e-07,
+      "loss": 0.4467,
+      "step": 1370
+    },
+    {
+      "epoch": 1.7366598457421691,
+      "grad_norm": 0.6649404168128967,
+      "learning_rate": 5.269211401552721e-07,
+      "loss": 0.451,
+      "step": 1380
+    },
+    {
+      "epoch": 1.7492523217377616,
+      "grad_norm": 0.6621150374412537,
+      "learning_rate": 4.78954155747448e-07,
+      "loss": 0.4639,
+      "step": 1390
+    },
+    {
+      "epoch": 1.7618447977333545,
+      "grad_norm": 0.71415114402771,
+      "learning_rate": 4.3316609465275437e-07,
+      "loss": 0.4518,
+      "step": 1400
+    },
+    {
+      "epoch": 1.7744372737289469,
+      "grad_norm": 0.6590133309364319,
+      "learning_rate": 3.895790244805936e-07,
+      "loss": 0.4603,
+      "step": 1410
+    },
+    {
+      "epoch": 1.7870297497245395,
+      "grad_norm": 0.673554003238678,
+      "learning_rate": 3.4821395207022767e-07,
+      "loss": 0.4512,
+      "step": 1420
+    },
+    {
+      "epoch": 1.7996222257201322,
+      "grad_norm": 0.6459655165672302,
+      "learning_rate": 3.0909081336650883e-07,
+      "loss": 0.4597,
+      "step": 1430
+    },
+    {
+      "epoch": 1.8122147017157249,
+      "grad_norm": 0.5669092535972595,
+      "learning_rate": 2.7222846381172616e-07,
+      "loss": 0.467,
+      "step": 1440
+    },
+    {
+      "epoch": 1.8248071777113175,
+      "grad_norm": 0.6353556513786316,
+      "learning_rate": 2.3764466925820518e-07,
+      "loss": 0.4617,
+      "step": 1450
+    },
+    {
+      "epoch": 1.8373996537069102,
+      "grad_norm": 0.780803382396698,
+      "learning_rate": 2.0535609740603092e-07,
+      "loss": 0.4586,
+      "step": 1460
+    },
+    {
+      "epoch": 1.8499921297025028,
+      "grad_norm": 0.6868489980697632,
+      "learning_rate": 1.7537830977003456e-07,
+      "loss": 0.4504,
+      "step": 1470
+    },
+    {
+      "epoch": 1.8625846056980953,
+      "grad_norm": 0.7090204358100891,
+      "learning_rate": 1.477257541799032e-07,
+      "loss": 0.4534,
+      "step": 1480
+    },
+    {
+      "epoch": 1.8751770816936881,
+      "grad_norm": 0.6413393616676331,
+      "learning_rate": 1.2241175781702587e-07,
+      "loss": 0.4553,
+      "step": 1490
+    },
+    {
+      "epoch": 1.8877695576892806,
+      "grad_norm": 0.6809853911399841,
+      "learning_rate": 9.944852079144862e-08,
+      "loss": 0.4572,
+      "step": 1500
+    },
+    {
+      "epoch": 1.9003620336848734,
+      "grad_norm": 0.6310750842094421,
+      "learning_rate": 7.884711026201586e-08,
+      "loss": 0.4589,
+      "step": 1510
+    },
+    {
+      "epoch": 1.9129545096804659,
+      "grad_norm": 0.6572443842887878,
+      "learning_rate": 6.061745510254069e-08,
+      "loss": 0.4547,
+      "step": 1520
+    },
+    {
+      "epoch": 1.9255469856760585,
+      "grad_norm": 0.6803452372550964,
+      "learning_rate": 4.476834111656891e-08,
+      "loss": 0.4533,
+      "step": 1530
+    },
+    {
+      "epoch": 1.9381394616716512,
+      "grad_norm": 0.7150864005088806,
+      "learning_rate": 3.130740680305666e-08,
+      "loss": 0.4503,
+      "step": 1540
+    },
+    {
+      "epoch": 1.9507319376672438,
+      "grad_norm": 0.6401649117469788,
+      "learning_rate": 2.0241139674982424e-08,
+      "loss": 0.4447,
+      "step": 1550
+    },
+    {
+      "epoch": 1.9633244136628365,
+      "grad_norm": 0.635547935962677,
+      "learning_rate": 1.1574873132684239e-08,
+      "loss": 0.4649,
+      "step": 1560
+    },
+    {
+      "epoch": 1.975916889658429,
+      "grad_norm": 0.6223708391189575,
+      "learning_rate": 5.31278389342138e-09,
+      "loss": 0.4572,
+      "step": 1570
+    },
+    {
+      "epoch": 1.9885093656540218,
+      "grad_norm": 0.6320503950119019,
+      "learning_rate": 1.4578899784001288e-09,
+      "loss": 0.4551,
+      "step": 1580
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.6981777548789978,
+      "learning_rate": 1.2049258235058425e-11,
+      "loss": 0.4489,
+      "step": 1590
+    },
+    {
+      "epoch": 2.0,
+      "step": 1590,
+      "total_flos": 207829311291392.0,
+      "train_loss": 0.4968748902374843,
+      "train_runtime": 50004.2557,
+      "train_samples_per_second": 1.016,
+      "train_steps_per_second": 0.032
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 1590,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 207829311291392.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a767ca601b21e241f288dac49cc117bb52125a081887f482139f0773312d1ba8
+size 7736
training_loss.png
ADDED
vocab.json
ADDED
The diff for this file is too large to render.