Text Generation
Transformers
Safetensors
qwen3
Generated from Trainer
unsloth
trl
sft
custom_code
text-generation-inference
Instructions for using Ba2han/qwen3_from_scratch with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Ba2han/qwen3_from_scratch with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/qwen3_from_scratch", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/qwen3_from_scratch", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/qwen3_from_scratch", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Ba2han/qwen3_from_scratch with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Ba2han/qwen3_from_scratch"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/qwen3_from_scratch",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/Ba2han/qwen3_from_scratch
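The curl call above can also be made from Python with only the standard library. This is a minimal sketch, assuming the server answers in the usual OpenAI completions shape (`choices[0].text`); `build_completion_request` and `complete` are hypothetical helper names, not part of vLLM. The same request works against the SGLang server described next by changing the base URL to port 30000.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


req = build_completion_request(
    "http://localhost:8000", "Ba2han/qwen3_from_scratch", "Once upon a time,"
)
print(req.full_url, req.get_method())  # http://localhost:8000/v1/completions POST
```

Building the request separately from sending it keeps the payload easy to inspect; call `complete(req)` only once `vllm serve` (or the SGLang server) is up.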
- SGLang
How to use Ba2han/qwen3_from_scratch with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/qwen3_from_scratch" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/qwen3_from_scratch",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/qwen3_from_scratch" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/qwen3_from_scratch",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Unsloth Studio
How to use Ba2han/qwen3_from_scratch with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ba2han/qwen3_from_scratch to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ba2han/qwen3_from_scratch to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/qwen3_from_scratch to start chatting
Load model with FastModel
# Install Unsloth (shell):
pip install unsloth

# Then, in Python:
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/qwen3_from_scratch",
    max_seq_length=2048,
)
- Docker Model Runner
How to use Ba2han/qwen3_from_scratch with Docker Model Runner:
docker model run hf.co/Ba2han/qwen3_from_scratch
Training in progress, step 19845, checkpoint
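The step count in this commit title can be cross-checked against the `trainer_state.json` diff below. The logged `epoch` grows by about 0.0000816 every 2 steps, and the new header records `"epoch": 0.81` at `"global_step": 19845`; both fit epoch = step / 24500. The sketch below estimates steps per epoch from (step, epoch) pairs copied from the diff. `steps_per_epoch` is a hypothetical helper, and treating the ratio as constant assumes a fixed number of steps per epoch.

```python
# (step, epoch) pairs copied from the trainer_state.json diff in this commit
logged = [
    (19112, 0.7800816326530612),
    (19114, 0.7801632653061225),
    (19206, 0.7839183673469388),
    (19845, 0.81),  # new header values: "global_step" and "epoch"
]


def steps_per_epoch(pairs):
    """Estimate steps per epoch as the mean of step / epoch (hypothetical helper)."""
    return sum(step / epoch for step, epoch in pairs) / len(pairs)


est = steps_per_epoch(logged)
print(round(est))  # 24500
```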
last-checkpoint/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:41e0abab32c119c1d1232cc0d5e5a5d0eab6127a4b8d374eacf6dc6b48824f43
  size 1049610600
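The diff above only shows the Git LFS pointer; the roughly 1.05 GB file behind it uses the safetensors layout: an 8-byte little-endian unsigned integer giving the header length, a UTF-8 JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets` (plus an optional `__metadata__` entry), then the raw tensor bytes. A minimal stdlib sketch for listing tensors without loading them, assuming a local copy of the file; `read_safetensors_header` and `count_parameters` are hypothetical helpers, and the `safetensors` library's own `safe_open` is the usual tool for real use.

```python
import io
import json
import struct
from typing import BinaryIO


def read_safetensors_header(f: BinaryIO) -> dict:
    """Return the tensor entries of a safetensors stream's JSON header (hypothetical helper)."""
    (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
    header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string->string metadata, not a tensor
    return header


def count_parameters(header: dict) -> int:
    """Sum the element counts of all tensors listed in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total


# Tiny in-memory demo file; for the checkpoint, use
# open("last-checkpoint/model.safetensors", "rb") instead.
demo_header = {
    "__metadata__": {"format": "pt"},
    "embed.weight": {"dtype": "BF16", "shape": [4, 3], "data_offsets": [0, 24]},
    "norm.weight": {"dtype": "BF16", "shape": [3], "data_offsets": [24, 30]},
}
raw = json.dumps(demo_header).encode("utf-8")
buf = io.BytesIO(struct.pack("<Q", len(raw)) + raw + bytes(30))

header = read_safetensors_header(buf)
print(sorted(header), count_parameters(header))  # ['embed.weight', 'norm.weight'] 15
```

Only the header is read, so this stays cheap even for a file of this size.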
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:cab47d41f556f72ae4e0d4d10afb62b8977f3b82b0d2b92d0e3ee8035302e52d
  size 679309771
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:161f030bb1939184a948b9597b7753b74c9883f7e303ca9bb24970dba8e1e05a
  size 1465
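Each of these three files is a Git LFS pointer: key/value lines giving the spec version, the object ID as `sha256:<hex digest>`, and the size in bytes. In this commit only the `oid` line changed; the sizes stayed the same. A minimal sketch for checking a downloaded file against its pointer, assuming you have both the pointer text and the real file; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of git-lfs.

```python
import hashlib
import io


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer: dict, stream, chunk_size: int = 1 << 20) -> bool:
    """Hash a binary stream in chunks and compare its size and digest with the pointer."""
    h = hashlib.new(pointer["algo"])
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
        size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["oid"]


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:161f030bb1939184a948b9597b7753b74c9883f7e303ca9bb24970dba8e1e05a
size 1465
"""
ptr = parse_lfs_pointer(pointer_text)
print(ptr["size"])  # 1465

# Demo on made-up bytes; for the real check, use open("last-checkpoint/scheduler.pt", "rb").
data = b"example payload"
demo = {"algo": "sha256", "oid": hashlib.sha256(data).hexdigest(), "size": len(data)}
print(matches_pointer(demo, io.BytesIO(data)))  # True
```

Hashing in fixed-size chunks keeps memory flat, which matters for the 1 GB `model.safetensors`.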
last-checkpoint/trainer_state.json CHANGED
@@ -2,9 +2,9 @@
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 0.
  "eval_steps": 2450,
- "global_step":
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -66893,6 +66893,2575 @@
  "learning_rate": 0.013540973312401885,
  "loss": 2.0799460411071777,
  "step": 19110
  }
  ],
  "logging_steps": 2,
@@ -66912,7 +69481,7 @@
  "attributes": {}
  }
  },
- "total_flos": 6.
  "train_batch_size": 6,
  "trial_name": null,
  "trial_params": null
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.81,
  "eval_steps": 2450,
+ "global_step": 19845,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,

  "learning_rate": 0.013540973312401885,
  "loss": 2.0799460411071777,
  "step": 19110
+ },
+ {
+ "epoch": 0.7800816326530612,
+ "grad_norm": 0.0693359375,
+ "learning_rate": 0.013535949764521193,
+ "loss": 2.054192066192627,
+ "step": 19112
+ },
+ {
+ "epoch": 0.7801632653061225,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.013530926216640501,
+ "loss": 2.090846061706543,
+ "step": 19114
+ },
+ {
+ "epoch": 0.7802448979591837,
+ "grad_norm": 0.06884765625,
+ "learning_rate": 0.013525902668759813,
+ "loss": 2.0709638595581055,
+ "step": 19116
+ },
+ {
+ "epoch": 0.7803265306122449,
+ "grad_norm": 0.06787109375,
+ "learning_rate": 0.01352087912087912,
+ "loss": 2.0809552669525146,
+ "step": 19118
+ },
+ {
+ "epoch": 0.7804081632653062,
+ "grad_norm": 0.06787109375,
+ "learning_rate": 0.01351585557299843,
+ "loss": 2.0769827365875244,
+ "step": 19120
+ },
+ {
+ "epoch": 0.7804897959183673,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.013510832025117742,
+ "loss": 2.0893237590789795,
+ "step": 19122
+ },
+ {
+ "epoch": 0.7805714285714286,
+ "grad_norm": 0.06884765625,
+ "learning_rate": 0.01350580847723705,
+ "loss": 2.116316795349121,
+ "step": 19124
+ },
+ {
+ "epoch": 0.7806530612244897,
+ "grad_norm": 0.0703125,
+ "learning_rate": 0.013500784929356358,
+ "loss": 2.1180949211120605,
+ "step": 19126
+ },
+ {
+ "epoch": 0.780734693877551,
+ "grad_norm": 0.06396484375,
+ "learning_rate": 0.013495761381475666,
+ "loss": 2.1151981353759766,
+ "step": 19128
+ },
+ {
+ "epoch": 0.7808163265306123,
+ "grad_norm": 0.0673828125,
+ "learning_rate": 0.013490737833594977,
+ "loss": 2.1070563793182373,
+ "step": 19130
+ },
+ {
+ "epoch": 0.7808979591836734,
+ "grad_norm": 0.0751953125,
+ "learning_rate": 0.013485714285714287,
+ "loss": 2.143016815185547,
+ "step": 19132
+ },
+ {
+ "epoch": 0.7809795918367347,
+ "grad_norm": 0.06982421875,
+ "learning_rate": 0.013480690737833595,
+ "loss": 2.1049246788024902,
+ "step": 19134
+ },
+ {
+ "epoch": 0.7810612244897959,
+ "grad_norm": 0.07470703125,
+ "learning_rate": 0.013475667189952906,
+ "loss": 2.116550922393799,
+ "step": 19136
+ },
+ {
+ "epoch": 0.7811428571428571,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.013470643642072214,
+ "loss": 2.1489453315734863,
+ "step": 19138
+ },
+ {
+ "epoch": 0.7812244897959184,
+ "grad_norm": 0.06494140625,
+ "learning_rate": 0.013465620094191522,
+ "loss": 2.127504348754883,
+ "step": 19140
+ },
+ {
+ "epoch": 0.7813061224489796,
+ "grad_norm": 0.0654296875,
+ "learning_rate": 0.013460596546310832,
+ "loss": 2.1396970748901367,
+ "step": 19142
+ },
+ {
+ "epoch": 0.7813877551020408,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.013455572998430143,
+ "loss": 2.1371233463287354,
+ "step": 19144
+ },
+ {
+ "epoch": 0.781469387755102,
+ "grad_norm": 0.06640625,
+ "learning_rate": 0.013450549450549451,
+ "loss": 2.141871690750122,
+ "step": 19146
+ },
+ {
+ "epoch": 0.7815510204081633,
+ "grad_norm": 0.0673828125,
+ "learning_rate": 0.013445525902668759,
+ "loss": 2.151590347290039,
+ "step": 19148
+ },
+ {
+ "epoch": 0.7816326530612245,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.013440502354788069,
+ "loss": 2.1239943504333496,
+ "step": 19150
+ },
+ {
+ "epoch": 0.7817142857142857,
+ "grad_norm": 0.0673828125,
+ "learning_rate": 0.013435478806907378,
+ "loss": 2.147034168243408,
+ "step": 19152
+ },
+ {
+ "epoch": 0.781795918367347,
+ "grad_norm": 0.07177734375,
+ "learning_rate": 0.013430455259026688,
+ "loss": 2.155128240585327,
+ "step": 19154
+ },
+ {
+ "epoch": 0.7818775510204081,
+ "grad_norm": 0.0673828125,
+ "learning_rate": 0.013425431711145996,
+ "loss": 2.1568539142608643,
+ "step": 19156
+ },
+ {
+ "epoch": 0.7819591836734694,
+ "grad_norm": 0.0654296875,
+ "learning_rate": 0.013420408163265308,
+ "loss": 2.1336488723754883,
+ "step": 19158
+ },
+ {
+ "epoch": 0.7820408163265307,
+ "grad_norm": 0.072265625,
+ "learning_rate": 0.013415384615384615,
+ "loss": 2.1924004554748535,
+ "step": 19160
+ },
+ {
+ "epoch": 0.7821224489795918,
+ "grad_norm": 0.06689453125,
+ "learning_rate": 0.013410361067503925,
+ "loss": 2.146590232849121,
+ "step": 19162
+ },
+ {
+ "epoch": 0.7822040816326531,
+ "grad_norm": 0.0703125,
+ "learning_rate": 0.013405337519623233,
+ "loss": 2.188626766204834,
+ "step": 19164
+ },
+ {
+ "epoch": 0.7822857142857143,
+ "grad_norm": 0.06884765625,
+ "learning_rate": 0.013400313971742545,
+ "loss": 2.1519150733947754,
+ "step": 19166
+ },
+ {
+ "epoch": 0.7823673469387755,
+ "grad_norm": 0.06640625,
+ "learning_rate": 0.013395290423861853,
+ "loss": 2.1799697875976562,
+ "step": 19168
+ },
+ {
+ "epoch": 0.7824489795918367,
+ "grad_norm": 0.068359375,
+ "learning_rate": 0.01339026687598116,
+ "loss": 2.18399715423584,
+ "step": 19170
+ },
+ {
+ "epoch": 0.782530612244898,
+ "grad_norm": 0.06884765625,
+ "learning_rate": 0.013385243328100472,
+ "loss": 2.1406588554382324,
+ "step": 19172
+ },
+ {
+ "epoch": 0.7826122448979592,
+ "grad_norm": 0.07080078125,
+ "learning_rate": 0.013380219780219782,
+ "loss": 2.1615281105041504,
+ "step": 19174
+ },
+ {
+ "epoch": 0.7826938775510204,
+ "grad_norm": 0.07763671875,
+ "learning_rate": 0.01337519623233909,
+ "loss": 2.171473503112793,
+ "step": 19176
+ },
+ {
+ "epoch": 0.7827755102040816,
+ "grad_norm": 0.0732421875,
+ "learning_rate": 0.013370172684458398,
+ "loss": 2.1505184173583984,
+ "step": 19178
+ },
+ {
+ "epoch": 0.7828571428571428,
+ "grad_norm": 0.06884765625,
+ "learning_rate": 0.013365149136577709,
+ "loss": 2.160815954208374,
+ "step": 19180
+ },
+ {
+ "epoch": 0.7829387755102041,
+ "grad_norm": 0.0673828125,
+ "learning_rate": 0.013360125588697017,
+ "loss": 2.1818461418151855,
+ "step": 19182
+ },
+ {
+ "epoch": 0.7830204081632653,
+ "grad_norm": 0.07373046875,
+ "learning_rate": 0.013355102040816327,
+ "loss": 2.183828353881836,
+ "step": 19184
+ },
+ {
+ "epoch": 0.7831020408163265,
+ "grad_norm": 0.07275390625,
+ "learning_rate": 0.013350078492935638,
+ "loss": 2.183948040008545,
+ "step": 19186
+ },
+ {
+ "epoch": 0.7831836734693878,
+ "grad_norm": 0.0732421875,
+ "learning_rate": 0.013345054945054946,
+ "loss": 2.185272693634033,
+ "step": 19188
+ },
+ {
+ "epoch": 0.7832653061224489,
+ "grad_norm": 0.08154296875,
+ "learning_rate": 0.013340031397174254,
+ "loss": 2.154519557952881,
+ "step": 19190
|
| 67176 |
+
},
|
| 67177 |
+
{
|
| 67178 |
+
"epoch": 0.7833469387755102,
|
| 67179 |
+
"grad_norm": 0.06982421875,
|
| 67180 |
+
"learning_rate": 0.013335007849293564,
|
| 67181 |
+
"loss": 2.1563374996185303,
|
| 67182 |
+
"step": 19192
|
| 67183 |
+
},
|
| 67184 |
+
{
|
| 67185 |
+
"epoch": 0.7834285714285715,
|
| 67186 |
+
"grad_norm": 0.0712890625,
|
| 67187 |
+
"learning_rate": 0.013329984301412873,
|
| 67188 |
+
"loss": 2.176814556121826,
|
| 67189 |
+
"step": 19194
|
| 67190 |
+
},
|
| 67191 |
+
{
|
| 67192 |
+
"epoch": 0.7835102040816326,
|
| 67193 |
+
"grad_norm": 0.068359375,
|
| 67194 |
+
"learning_rate": 0.013324960753532183,
|
| 67195 |
+
"loss": 2.189197063446045,
|
| 67196 |
+
"step": 19196
|
| 67197 |
+
},
|
| 67198 |
+
{
|
| 67199 |
+
"epoch": 0.7835918367346939,
|
| 67200 |
+
"grad_norm": 0.06982421875,
|
| 67201 |
+
"learning_rate": 0.013319937205651491,
|
| 67202 |
+
"loss": 2.2034380435943604,
|
| 67203 |
+
"step": 19198
|
| 67204 |
+
},
|
| 67205 |
+
{
|
| 67206 |
+
"epoch": 0.7836734693877551,
|
| 67207 |
+
"grad_norm": 0.06689453125,
|
| 67208 |
+
"learning_rate": 0.013314913657770799,
|
| 67209 |
+
"loss": 2.1950595378875732,
|
| 67210 |
+
"step": 19200
|
| 67211 |
+
},
|
| 67212 |
+
{
|
| 67213 |
+
"epoch": 0.7837551020408163,
|
| 67214 |
+
"grad_norm": 0.0712890625,
|
| 67215 |
+
"learning_rate": 0.01330989010989011,
|
| 67216 |
+
"loss": 2.16892147064209,
|
| 67217 |
+
"step": 19202
|
| 67218 |
+
},
|
| 67219 |
+
{
|
| 67220 |
+
"epoch": 0.7838367346938776,
|
| 67221 |
+
"grad_norm": 0.07275390625,
|
| 67222 |
+
"learning_rate": 0.01330486656200942,
|
| 67223 |
+
"loss": 2.201587200164795,
|
| 67224 |
+
"step": 19204
|
| 67225 |
+
},
{
"epoch": 0.7839183673469388,
"grad_norm": 0.06884765625,
"learning_rate": 0.013299843014128728,
"loss": 2.175142765045166,
"step": 19206
},
{
"epoch": 0.784,
"grad_norm": 0.06982421875,
"learning_rate": 0.01329481946624804,
"loss": 2.187685012817383,
"step": 19208
},
{
"epoch": 0.7840816326530612,
"grad_norm": 0.0703125,
"learning_rate": 0.013289795918367348,
"loss": 2.1536762714385986,
"step": 19210
},
{
"epoch": 0.7841632653061225,
"grad_norm": 0.072265625,
"learning_rate": 0.013284772370486655,
"loss": 2.1982879638671875,
"step": 19212
},
{
"epoch": 0.7842448979591837,
"grad_norm": 0.06640625,
"learning_rate": 0.013279748822605965,
"loss": 2.1765246391296387,
"step": 19214
},
{
"epoch": 0.7843265306122449,
"grad_norm": 0.06640625,
"learning_rate": 0.013274725274725277,
"loss": 2.169567108154297,
"step": 19216
},
{
"epoch": 0.7844081632653062,
"grad_norm": 0.07177734375,
"learning_rate": 0.013269701726844585,
"loss": 2.1906380653381348,
"step": 19218
},
{
"epoch": 0.7844897959183673,
"grad_norm": 0.06591796875,
"learning_rate": 0.013264678178963893,
"loss": 2.1787290573120117,
"step": 19220
},
{
"epoch": 0.7845714285714286,
"grad_norm": 0.0693359375,
"learning_rate": 0.013259654631083204,
"loss": 2.1768317222595215,
"step": 19222
},
{
"epoch": 0.7846530612244897,
"grad_norm": 0.06884765625,
"learning_rate": 0.013254631083202512,
"loss": 2.1436753273010254,
"step": 19224
},
{
"epoch": 0.784734693877551,
"grad_norm": 0.0703125,
"learning_rate": 0.013249607535321822,
"loss": 2.188709259033203,
"step": 19226
},
{
"epoch": 0.7848163265306123,
"grad_norm": 0.0673828125,
"learning_rate": 0.01324458398744113,
"loss": 2.1767630577087402,
"step": 19228
},
{
"epoch": 0.7848979591836734,
"grad_norm": 0.0732421875,
"learning_rate": 0.013239560439560441,
"loss": 2.199808120727539,
"step": 19230
},
{
"epoch": 0.7849795918367347,
"grad_norm": 0.0693359375,
"learning_rate": 0.013234536891679749,
"loss": 2.183558464050293,
"step": 19232
},
{
"epoch": 0.7850612244897959,
"grad_norm": 0.07080078125,
"learning_rate": 0.013229513343799057,
"loss": 2.1799445152282715,
"step": 19234
},
{
"epoch": 0.7851428571428571,
"grad_norm": 0.07177734375,
"learning_rate": 0.013224489795918367,
"loss": 2.1602892875671387,
"step": 19236
},
{
"epoch": 0.7852244897959184,
"grad_norm": 0.06884765625,
"learning_rate": 0.013219466248037678,
"loss": 2.1609771251678467,
"step": 19238
},
{
"epoch": 0.7853061224489796,
"grad_norm": 0.06884765625,
"learning_rate": 0.013214442700156986,
"loss": 2.201321601867676,
"step": 19240
},
{
"epoch": 0.7853877551020408,
"grad_norm": 0.06982421875,
"learning_rate": 0.013209419152276294,
"loss": 2.179065227508545,
"step": 19242
},
{
"epoch": 0.785469387755102,
"grad_norm": 0.0703125,
"learning_rate": 0.013204395604395605,
"loss": 2.182922840118408,
"step": 19244
},
{
"epoch": 0.7855510204081633,
"grad_norm": 0.06982421875,
"learning_rate": 0.013199372056514913,
"loss": 2.165276288986206,
"step": 19246
},
{
"epoch": 0.7856326530612245,
"grad_norm": 0.07421875,
"learning_rate": 0.013194348508634223,
"loss": 2.163874864578247,
"step": 19248
},
{
"epoch": 0.7857142857142857,
"grad_norm": 0.07080078125,
"learning_rate": 0.013189324960753531,
"loss": 2.171300172805786,
"step": 19250
},
{
"epoch": 0.785795918367347,
"grad_norm": 0.06884765625,
"learning_rate": 0.013184301412872843,
"loss": 2.202861785888672,
"step": 19252
},
{
"epoch": 0.7858775510204081,
"grad_norm": 0.0703125,
"learning_rate": 0.01317927786499215,
"loss": 2.194568157196045,
"step": 19254
},
{
"epoch": 0.7859591836734694,
"grad_norm": 0.06787109375,
"learning_rate": 0.01317425431711146,
"loss": 2.180964469909668,
"step": 19256
},
{
"epoch": 0.7860408163265307,
"grad_norm": 0.06689453125,
"learning_rate": 0.01316923076923077,
"loss": 2.1783342361450195,
"step": 19258
},
{
"epoch": 0.7861224489795918,
"grad_norm": 0.07080078125,
"learning_rate": 0.01316420722135008,
"loss": 2.1898348331451416,
"step": 19260
},
{
"epoch": 0.7862040816326531,
"grad_norm": 0.0693359375,
"learning_rate": 0.013159183673469388,
"loss": 2.2145168781280518,
"step": 19262
},
{
"epoch": 0.7862857142857143,
"grad_norm": 0.06982421875,
"learning_rate": 0.013154160125588696,
"loss": 2.1751883029937744,
"step": 19264
},
{
"epoch": 0.7863673469387755,
"grad_norm": 0.06689453125,
"learning_rate": 0.013149136577708007,
"loss": 2.1894140243530273,
"step": 19266
},
{
"epoch": 0.7864489795918367,
"grad_norm": 0.06689453125,
"learning_rate": 0.013144113029827317,
"loss": 2.202747106552124,
"step": 19268
},
{
"epoch": 0.786530612244898,
"grad_norm": 0.0693359375,
"learning_rate": 0.013139089481946625,
"loss": 2.215573310852051,
"step": 19270
},
{
"epoch": 0.7866122448979592,
"grad_norm": 0.0673828125,
"learning_rate": 0.013134065934065936,
"loss": 2.2045533657073975,
"step": 19272
},
{
"epoch": 0.7866938775510204,
"grad_norm": 0.0703125,
"learning_rate": 0.013129042386185244,
"loss": 2.198678493499756,
"step": 19274
},
{
"epoch": 0.7867755102040817,
"grad_norm": 0.0703125,
"learning_rate": 0.013124018838304552,
"loss": 2.2082996368408203,
"step": 19276
},
{
"epoch": 0.7868571428571428,
"grad_norm": 0.0703125,
"learning_rate": 0.013118995290423862,
"loss": 2.2229394912719727,
"step": 19278
},
{
"epoch": 0.7869387755102041,
"grad_norm": 0.07080078125,
"learning_rate": 0.013113971742543173,
"loss": 2.1980977058410645,
"step": 19280
},
{
"epoch": 0.7870204081632654,
"grad_norm": 0.072265625,
"learning_rate": 0.013108948194662481,
"loss": 2.2099947929382324,
"step": 19282
},
{
"epoch": 0.7871020408163265,
"grad_norm": 0.07177734375,
"learning_rate": 0.013103924646781789,
"loss": 2.212822675704956,
"step": 19284
},
{
"epoch": 0.7871836734693878,
"grad_norm": 0.06982421875,
"learning_rate": 0.013098901098901099,
"loss": 2.210226058959961,
"step": 19286
},
{
"epoch": 0.7872653061224489,
"grad_norm": 0.07421875,
"learning_rate": 0.013093877551020408,
"loss": 2.2132301330566406,
"step": 19288
},
{
"epoch": 0.7873469387755102,
"grad_norm": 0.0673828125,
"learning_rate": 0.013088854003139718,
"loss": 2.2306199073791504,
"step": 19290
},
{
"epoch": 0.7874285714285715,
"grad_norm": 0.0673828125,
"learning_rate": 0.013083830455259026,
"loss": 2.2127418518066406,
"step": 19292
},
{
"epoch": 0.7875102040816326,
"grad_norm": 0.07470703125,
"learning_rate": 0.013078806907378338,
"loss": 2.2419562339782715,
"step": 19294
},
{
"epoch": 0.7875918367346939,
"grad_norm": 0.07177734375,
"learning_rate": 0.013073783359497645,
"loss": 2.2651734352111816,
"step": 19296
},
{
"epoch": 0.7876734693877551,
"grad_norm": 0.0712890625,
"learning_rate": 0.013068759811616955,
"loss": 2.2525441646575928,
"step": 19298
},
{
"epoch": 0.7877551020408163,
"grad_norm": 0.07080078125,
"learning_rate": 0.013063736263736263,
"loss": 2.2593140602111816,
"step": 19300
},
{
"epoch": 0.7878367346938776,
"grad_norm": 0.07666015625,
"learning_rate": 0.013058712715855575,
"loss": 2.2675082683563232,
"step": 19302
},
{
"epoch": 0.7879183673469388,
"grad_norm": 0.07421875,
"learning_rate": 0.013053689167974883,
"loss": 2.280200958251953,
"step": 19304
},
{
"epoch": 0.788,
"grad_norm": 0.078125,
"learning_rate": 0.01304866562009419,
"loss": 2.292228937149048,
"step": 19306
},
{
"epoch": 0.7880816326530612,
"grad_norm": 0.0703125,
"learning_rate": 0.013043642072213502,
"loss": 2.286612033843994,
"step": 19308
},
{
"epoch": 0.7881632653061225,
"grad_norm": 0.0693359375,
"learning_rate": 0.013038618524332812,
"loss": 2.2930197715759277,
"step": 19310
},
{
"epoch": 0.7882448979591836,
"grad_norm": 0.06787109375,
"learning_rate": 0.01303359497645212,
"loss": 2.279954433441162,
"step": 19312
},
{
"epoch": 0.7883265306122449,
"grad_norm": 0.068359375,
"learning_rate": 0.013028571428571428,
"loss": 2.294743537902832,
"step": 19314
},
{
"epoch": 0.7884081632653062,
"grad_norm": 0.0712890625,
"learning_rate": 0.013023547880690739,
"loss": 2.2848730087280273,
"step": 19316
},
{
"epoch": 0.7884897959183673,
"grad_norm": 0.07568359375,
"learning_rate": 0.013018524332810047,
"loss": 2.289844512939453,
"step": 19318
},
{
"epoch": 0.7885714285714286,
"grad_norm": 0.06982421875,
"learning_rate": 0.013013500784929357,
"loss": 2.2738637924194336,
"step": 19320
},
{
"epoch": 0.7886530612244897,
"grad_norm": 0.07763671875,
"learning_rate": 0.013008477237048665,
"loss": 2.334277629852295,
"step": 19322
},
{
"epoch": 0.788734693877551,
"grad_norm": 0.0751953125,
"learning_rate": 0.013003453689167976,
"loss": 2.3090291023254395,
"step": 19324
},
{
"epoch": 0.7888163265306123,
"grad_norm": 0.0703125,
"learning_rate": 0.012998430141287284,
"loss": 2.3247487545013428,
"step": 19326
},
{
"epoch": 0.7888979591836734,
"grad_norm": 0.07177734375,
"learning_rate": 0.012993406593406592,
"loss": 2.298673391342163,
"step": 19328
},
{
"epoch": 0.7889795918367347,
"grad_norm": 0.07275390625,
"learning_rate": 0.012988383045525903,
"loss": 2.3253488540649414,
"step": 19330
},
{
"epoch": 0.7890612244897959,
"grad_norm": 0.0673828125,
"learning_rate": 0.012983359497645213,
"loss": 2.293173313140869,
"step": 19332
},
{
"epoch": 0.7891428571428571,
"grad_norm": 0.06884765625,
"learning_rate": 0.012978335949764521,
"loss": 2.280346155166626,
"step": 19334
},
{
"epoch": 0.7892244897959184,
"grad_norm": 0.07177734375,
"learning_rate": 0.012973312401883829,
"loss": 2.306351900100708,
"step": 19336
},
{
"epoch": 0.7893061224489796,
"grad_norm": 0.06787109375,
"learning_rate": 0.01296828885400314,
"loss": 2.287632465362549,
"step": 19338
},
{
"epoch": 0.7893877551020408,
"grad_norm": 0.07080078125,
"learning_rate": 0.012963265306122448,
"loss": 2.3143272399902344,
"step": 19340
},
{
"epoch": 0.789469387755102,
"grad_norm": 0.07177734375,
"learning_rate": 0.012958241758241758,
"loss": 2.297276020050049,
"step": 19342
},
{
"epoch": 0.7895510204081633,
"grad_norm": 0.07373046875,
"learning_rate": 0.01295321821036107,
"loss": 2.325594902038574,
"step": 19344
},
{
"epoch": 0.7896326530612245,
"grad_norm": 0.06982421875,
"learning_rate": 0.012948194662480378,
"loss": 2.296551465988159,
"step": 19346
},
{
"epoch": 0.7897142857142857,
"grad_norm": 0.07421875,
"learning_rate": 0.012943171114599685,
"loss": 2.3254494667053223,
"step": 19348
},
{
"epoch": 0.789795918367347,
"grad_norm": 0.0732421875,
"learning_rate": 0.012938147566718995,
"loss": 2.3109796047210693,
"step": 19350
},
{
"epoch": 0.7898775510204081,
"grad_norm": 0.0751953125,
"learning_rate": 0.012933124018838305,
"loss": 2.323305606842041,
"step": 19352
},
{
"epoch": 0.7899591836734694,
"grad_norm": 0.0732421875,
"learning_rate": 0.012928100470957615,
"loss": 2.305424690246582,
"step": 19354
},
{
"epoch": 0.7900408163265306,
"grad_norm": 0.07568359375,
"learning_rate": 0.012923076923076923,
"loss": 2.2779955863952637,
"step": 19356
},
{
"epoch": 0.7901224489795918,
"grad_norm": 0.07177734375,
"learning_rate": 0.012918053375196234,
"loss": 2.3076109886169434,
"step": 19358
},
{
"epoch": 0.7902040816326531,
"grad_norm": 0.072265625,
"learning_rate": 0.012913029827315542,
"loss": 2.3165040016174316,
"step": 19360
},
{
"epoch": 0.7902857142857143,
"grad_norm": 0.07275390625,
"learning_rate": 0.012908006279434852,
"loss": 2.324634552001953,
"step": 19362
},
{
"epoch": 0.7903673469387755,
"grad_norm": 0.072265625,
"learning_rate": 0.01290298273155416,
"loss": 2.318744659423828,
"step": 19364
},
{
"epoch": 0.7904489795918367,
"grad_norm": 0.0732421875,
"learning_rate": 0.012897959183673471,
"loss": 2.3420937061309814,
"step": 19366
},
{
"epoch": 0.790530612244898,
"grad_norm": 0.07373046875,
"learning_rate": 0.012892935635792779,
"loss": 2.3248512744903564,
"step": 19368
},
{
"epoch": 0.7906122448979592,
"grad_norm": 0.07080078125,
"learning_rate": 0.012887912087912087,
"loss": 2.31795597076416,
"step": 19370
},
{
"epoch": 0.7906938775510204,
"grad_norm": 0.07275390625,
"learning_rate": 0.012882888540031397,
"loss": 2.344648838043213,
"step": 19372
},
{
"epoch": 0.7907755102040817,
"grad_norm": 0.07421875,
"learning_rate": 0.012877864992150708,
"loss": 2.309953212738037,
"step": 19374
},
{
"epoch": 0.7908571428571428,
"grad_norm": 0.07177734375,
"learning_rate": 0.012872841444270016,
"loss": 2.3025102615356445,
"step": 19376
},
{
"epoch": 0.7909387755102041,
"grad_norm": 0.07470703125,
"learning_rate": 0.012867817896389324,
"loss": 2.3165602684020996,
"step": 19378
},
{
"epoch": 0.7910204081632654,
"grad_norm": 0.0712890625,
"learning_rate": 0.012862794348508635,
"loss": 2.3167994022369385,
"step": 19380
},
{
"epoch": 0.7911020408163265,
"grad_norm": 0.07470703125,
"learning_rate": 0.012857770800627943,
"loss": 2.311446189880371,
"step": 19382
},
{
"epoch": 0.7911836734693878,
"grad_norm": 0.07470703125,
"learning_rate": 0.012852747252747253,
"loss": 2.329148769378662,
"step": 19384
},
{
"epoch": 0.7912653061224489,
"grad_norm": 0.072265625,
"learning_rate": 0.012847723704866561,
"loss": 2.33450984954834,
"step": 19386
},
{
"epoch": 0.7913469387755102,
"grad_norm": 0.0771484375,
"learning_rate": 0.012842700156985873,
"loss": 2.312901020050049,
"step": 19388
},
{
"epoch": 0.7914285714285715,
"grad_norm": 0.072265625,
"learning_rate": 0.01283767660910518,
"loss": 2.3334836959838867,
"step": 19390
},
{
"epoch": 0.7915102040816326,
"grad_norm": 0.0751953125,
"learning_rate": 0.01283265306122449,
"loss": 2.3270745277404785,
"step": 19392
},
{
"epoch": 0.7915918367346939,
"grad_norm": 0.07421875,
"learning_rate": 0.0128276295133438,
"loss": 2.332155466079712,
"step": 19394
},
{
"epoch": 0.7916734693877551,
"grad_norm": 0.07373046875,
"learning_rate": 0.01282260596546311,
"loss": 2.301142930984497,
"step": 19396
},
{
"epoch": 0.7917551020408163,
"grad_norm": 0.0712890625,
"learning_rate": 0.012817582417582418,
"loss": 2.2945077419281006,
"step": 19398
},
{
"epoch": 0.7918367346938775,
"grad_norm": 0.0712890625,
"learning_rate": 0.012812558869701726,
"loss": 2.3141279220581055,
"step": 19400
},
{
"epoch": 0.7919183673469388,
"grad_norm": 0.0703125,
"learning_rate": 0.012807535321821037,
"loss": 2.29447078704834,
"step": 19402
},
{
"epoch": 0.792,
"grad_norm": 0.0703125,
"learning_rate": 0.012802511773940347,
"loss": 2.3134188652038574,
"step": 19404
},
{
"epoch": 0.7920816326530612,
"grad_norm": 0.07080078125,
"learning_rate": 0.012797488226059655,
"loss": 2.280491828918457,
"step": 19406
},
{
"epoch": 0.7921632653061225,
"grad_norm": 0.07275390625,
"learning_rate": 0.012792464678178963,
"loss": 2.3208813667297363,
"step": 19408
},
{
"epoch": 0.7922448979591836,
"grad_norm": 0.0712890625,
"learning_rate": 0.012787441130298274,
"loss": 2.3248229026794434,
"step": 19410
},
{
"epoch": 0.7923265306122449,
"grad_norm": 0.072265625,
"learning_rate": 0.012782417582417582,
"loss": 2.309404134750366,
"step": 19412
},
{
"epoch": 0.7924081632653062,
"grad_norm": 0.06982421875,
"learning_rate": 0.012777394034536892,
"loss": 2.3096113204956055,
"step": 19414
},
{
"epoch": 0.7924897959183673,
"grad_norm": 0.07568359375,
"learning_rate": 0.012772370486656203,
"loss": 2.3204994201660156,
"step": 19416
},
{
"epoch": 0.7925714285714286,
"grad_norm": 0.0693359375,
"learning_rate": 0.012767346938775511,
"loss": 2.3343091011047363,
"step": 19418
},
{
"epoch": 0.7926530612244898,
"grad_norm": 0.0732421875,
"learning_rate": 0.012762323390894819,
"loss": 2.349900484085083,
"step": 19420
},
{
"epoch": 0.792734693877551,
"grad_norm": 0.07275390625,
"learning_rate": 0.012757299843014127,
"loss": 2.3116989135742188,
"step": 19422
},
{
"epoch": 0.7928163265306123,
"grad_norm": 0.07666015625,
"learning_rate": 0.012752276295133438,
"loss": 2.32127046585083,
"step": 19424
},
{
"epoch": 0.7928979591836735,
"grad_norm": 0.0771484375,
"learning_rate": 0.012747252747252748,
"loss": 2.3043527603149414,
"step": 19426
},
{
"epoch": 0.7929795918367347,
"grad_norm": 0.07421875,
"learning_rate": 0.012742229199372056,
"loss": 2.335134506225586,
"step": 19428
},
{
"epoch": 0.7930612244897959,
"grad_norm": 0.076171875,
"learning_rate": 0.012737205651491368,
"loss": 2.349153995513916,
"step": 19430
},
{
"epoch": 0.7931428571428571,
"grad_norm": 0.07421875,
"learning_rate": 0.012732182103610675,
"loss": 2.330496311187744,
"step": 19432
},
{
"epoch": 0.7932244897959184,
"grad_norm": 0.07470703125,
"learning_rate": 0.012727158555729983,
"loss": 2.332169771194458,
"step": 19434
},
{
"epoch": 0.7933061224489796,
"grad_norm": 0.0810546875,
"learning_rate": 0.012722135007849293,
"loss": 2.3390865325927734,
"step": 19436
},
{
"epoch": 0.7933877551020408,
"grad_norm": 0.07568359375,
"learning_rate": 0.012717111459968605,
"loss": 2.3492190837860107,
"step": 19438
},
{
"epoch": 0.793469387755102,
"grad_norm": 0.0751953125,
"learning_rate": 0.012712087912087913,
"loss": 2.3320634365081787,
"step": 19440
},
{
"epoch": 0.7935510204081633,
"grad_norm": 0.07470703125,
"learning_rate": 0.01270706436420722,
"loss": 2.3551440238952637,
"step": 19442
},
{
"epoch": 0.7936326530612244,
"grad_norm": 0.07373046875,
"learning_rate": 0.012702040816326532,
"loss": 2.3386178016662598,
"step": 19444
},
{
"epoch": 0.7937142857142857,
"grad_norm": 0.078125,
"learning_rate": 0.01269701726844584,
"loss": 2.344766616821289,
"step": 19446
},
{
"epoch": 0.793795918367347,
"grad_norm": 0.076171875,
"learning_rate": 0.01269199372056515,
"loss": 2.337897777557373,
"step": 19448
},
{
"epoch": 0.7938775510204081,
"grad_norm": 0.07275390625,
"learning_rate": 0.012686970172684458,
"loss": 2.3620877265930176,
"step": 19450
},
{
"epoch": 0.7939591836734694,
"grad_norm": 0.072265625,
"learning_rate": 0.012681946624803769,
"loss": 2.3315300941467285,
"step": 19452
},
{
"epoch": 0.7940408163265306,
"grad_norm": 0.07373046875,
"learning_rate": 0.012676923076923077,
"loss": 2.3214097023010254,
"step": 19454
},
{
"epoch": 0.7941224489795918,
"grad_norm": 0.07373046875,
"learning_rate": 0.012671899529042387,
"loss": 2.3604679107666016,
"step": 19456
},
{
"epoch": 0.7942040816326531,
"grad_norm": 0.07666015625,
"learning_rate": 0.012666875981161695,
"loss": 2.368152141571045,
"step": 19458
},
{
"epoch": 0.7942857142857143,
"grad_norm": 0.07568359375,
"learning_rate": 0.012661852433281006,
"loss": 2.3702588081359863,
"step": 19460
},
{
"epoch": 0.7943673469387755,
"grad_norm": 0.07666015625,
"learning_rate": 0.012656828885400314,
"loss": 2.3476812839508057,
"step": 19462
},
{
"epoch": 0.7944489795918367,
"grad_norm": 0.0712890625,
"learning_rate": 0.012651805337519622,
"loss": 2.338712215423584,
"step": 19464
},
|
| 68136 |
+
{
|
| 68137 |
+
"epoch": 0.794530612244898,
|
| 68138 |
+
"grad_norm": 0.0732421875,
|
| 68139 |
+
"learning_rate": 0.012646781789638933,
|
| 68140 |
+
"loss": 2.3641529083251953,
|
| 68141 |
+
"step": 19466
|
| 68142 |
+
},
|
| 68143 |
+
{
|
| 68144 |
+
"epoch": 0.7946122448979592,
|
| 68145 |
+
"grad_norm": 0.068359375,
|
| 68146 |
+
"learning_rate": 0.012641758241758243,
|
| 68147 |
+
"loss": 2.3641324043273926,
|
| 68148 |
+
"step": 19468
|
| 68149 |
+
},
|
| 68150 |
+
{
|
| 68151 |
+
"epoch": 0.7946938775510204,
|
| 68152 |
+
"grad_norm": 0.068359375,
|
| 68153 |
+
"learning_rate": 0.012636734693877551,
|
| 68154 |
+
"loss": 2.3714561462402344,
|
| 68155 |
+
"step": 19470
|
| 68156 |
+
},
|
| 68157 |
+
{
|
| 68158 |
+
"epoch": 0.7947755102040817,
|
| 68159 |
+
"grad_norm": 0.0791015625,
|
| 68160 |
+
"learning_rate": 0.012631711145996859,
|
| 68161 |
+
"loss": 2.3471617698669434,
|
| 68162 |
+
"step": 19472
|
| 68163 |
+
},
|
    {
      "epoch": 0.7948571428571428,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.01262668759811617,
      "loss": 2.3641815185546875,
      "step": 19474
    },
    {
      "epoch": 0.7949387755102041,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012621664050235478,
      "loss": 2.375720977783203,
      "step": 19476
    },
    {
      "epoch": 0.7950204081632654,
      "grad_norm": 0.0751953125,
      "learning_rate": 0.012616640502354788,
      "loss": 2.3651504516601562,
      "step": 19478
    },
    {
      "epoch": 0.7951020408163265,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.0126116169544741,
      "loss": 2.3463969230651855,
      "step": 19480
    },
    {
      "epoch": 0.7951836734693878,
      "grad_norm": 0.07958984375,
      "learning_rate": 0.012606593406593408,
      "loss": 2.3550612926483154,
      "step": 19482
    },
    {
      "epoch": 0.795265306122449,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012601569858712715,
      "loss": 2.3722891807556152,
      "step": 19484
    },
    {
      "epoch": 0.7953469387755102,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.012596546310832025,
      "loss": 2.340359687805176,
      "step": 19486
    },
    {
      "epoch": 0.7954285714285714,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.012591522762951335,
      "loss": 2.3622655868530273,
      "step": 19488
    },
    {
      "epoch": 0.7955102040816326,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.012586499215070645,
      "loss": 2.342348098754883,
      "step": 19490
    },
    {
      "epoch": 0.7955918367346939,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.012581475667189953,
      "loss": 2.3823227882385254,
      "step": 19492
    },
    {
      "epoch": 0.7956734693877551,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.01257645211930926,
      "loss": 2.3682281970977783,
      "step": 19494
    },
    {
      "epoch": 0.7957551020408163,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012571428571428572,
      "loss": 2.3520236015319824,
      "step": 19496
    },
    {
      "epoch": 0.7958367346938775,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012566405023547882,
      "loss": 2.3591842651367188,
      "step": 19498
    },
    {
      "epoch": 0.7959183673469388,
      "grad_norm": 0.07421875,
      "learning_rate": 0.01256138147566719,
      "loss": 2.3518619537353516,
      "step": 19500
    },
    {
      "epoch": 0.796,
      "grad_norm": 0.0703125,
      "learning_rate": 0.012556357927786501,
      "loss": 2.3446855545043945,
      "step": 19502
    },
    {
      "epoch": 0.7960816326530612,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.012551334379905809,
      "loss": 2.409269332885742,
      "step": 19504
    },
    {
      "epoch": 0.7961632653061225,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012546310832025117,
      "loss": 2.364818572998047,
      "step": 19506
    },
    {
      "epoch": 0.7962448979591836,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012541287284144427,
      "loss": 2.3533730506896973,
      "step": 19508
    },
    {
      "epoch": 0.7963265306122449,
      "grad_norm": 0.0703125,
      "learning_rate": 0.012536263736263738,
      "loss": 2.3455986976623535,
      "step": 19510
    },
    {
      "epoch": 0.7964081632653062,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.012531240188383046,
      "loss": 2.3600990772247314,
      "step": 19512
    },
    {
      "epoch": 0.7964897959183673,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012526216640502354,
      "loss": 2.3625497817993164,
      "step": 19514
    },
    {
      "epoch": 0.7965714285714286,
      "grad_norm": 0.07568359375,
      "learning_rate": 0.012521193092621665,
      "loss": 2.363391876220703,
      "step": 19516
    },
    {
      "epoch": 0.7966530612244898,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012516169544740973,
      "loss": 2.3531782627105713,
      "step": 19518
    },
    {
      "epoch": 0.796734693877551,
      "grad_norm": 0.07568359375,
      "learning_rate": 0.012511145996860283,
      "loss": 2.3748888969421387,
      "step": 19520
    },
    {
      "epoch": 0.7968163265306123,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012506122448979591,
      "loss": 2.3742713928222656,
      "step": 19522
    },
    {
      "epoch": 0.7968979591836735,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012501098901098903,
      "loss": 2.3482837677001953,
      "step": 19524
    },
    {
      "epoch": 0.7969795918367347,
      "grad_norm": 0.0771484375,
      "learning_rate": 0.01249607535321821,
      "loss": 2.3652472496032715,
      "step": 19526
    },
    {
      "epoch": 0.7970612244897959,
      "grad_norm": 0.0791015625,
      "learning_rate": 0.012491051805337518,
      "loss": 2.403733968734741,
      "step": 19528
    },
    {
      "epoch": 0.7971428571428572,
      "grad_norm": 0.072265625,
      "learning_rate": 0.01248602825745683,
      "loss": 2.369178295135498,
      "step": 19530
    },
    {
      "epoch": 0.7972244897959183,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.01248100470957614,
      "loss": 2.3255224227905273,
      "step": 19532
    },
    {
      "epoch": 0.7973061224489796,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.012475981161695448,
      "loss": 2.341620922088623,
      "step": 19534
    },
    {
      "epoch": 0.7973877551020409,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012470957613814755,
      "loss": 2.3535399436950684,
      "step": 19536
    },
    {
      "epoch": 0.797469387755102,
      "grad_norm": 0.06640625,
      "learning_rate": 0.012465934065934067,
      "loss": 2.3349781036376953,
      "step": 19538
    },
    {
      "epoch": 0.7975510204081633,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.012460910518053375,
      "loss": 2.2968239784240723,
      "step": 19540
    },
    {
      "epoch": 0.7976326530612244,
      "grad_norm": 0.0693359375,
      "learning_rate": 0.012455886970172685,
      "loss": 2.306023120880127,
      "step": 19542
    },
    {
      "epoch": 0.7977142857142857,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.012450863422291993,
      "loss": 2.321829080581665,
      "step": 19544
    },
    {
      "epoch": 0.797795918367347,
      "grad_norm": 0.07763671875,
      "learning_rate": 0.012445839874411304,
      "loss": 2.334275245666504,
      "step": 19546
    },
    {
      "epoch": 0.7978775510204081,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.012440816326530612,
      "loss": 2.3195035457611084,
      "step": 19548
    },
    {
      "epoch": 0.7979591836734694,
      "grad_norm": 0.080078125,
      "learning_rate": 0.012435792778649922,
      "loss": 2.3166139125823975,
      "step": 19550
    },
    {
      "epoch": 0.7980408163265306,
      "grad_norm": 0.06591796875,
      "learning_rate": 0.012430769230769231,
      "loss": 2.289377212524414,
      "step": 19552
    },
    {
      "epoch": 0.7981224489795918,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012425745682888541,
      "loss": 2.3104300498962402,
      "step": 19554
    },
    {
      "epoch": 0.7982040816326531,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012420722135007849,
      "loss": 2.3646931648254395,
      "step": 19556
    },
    {
      "epoch": 0.7982857142857143,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012415698587127157,
      "loss": 2.329939365386963,
      "step": 19558
    },
    {
      "epoch": 0.7983673469387755,
      "grad_norm": 0.076171875,
      "learning_rate": 0.012410675039246468,
      "loss": 2.3487930297851562,
      "step": 19560
    },
    {
      "epoch": 0.7984489795918367,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.012405651491365778,
      "loss": 2.334249496459961,
      "step": 19562
    },
    {
      "epoch": 0.798530612244898,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012400627943485086,
      "loss": 2.321340560913086,
      "step": 19564
    },
    {
      "epoch": 0.7986122448979592,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012395604395604398,
      "loss": 2.3352317810058594,
      "step": 19566
    },
    {
      "epoch": 0.7986938775510204,
      "grad_norm": 0.07568359375,
      "learning_rate": 0.012390580847723705,
      "loss": 2.3469889163970947,
      "step": 19568
    },
    {
      "epoch": 0.7987755102040817,
      "grad_norm": 0.0791015625,
      "learning_rate": 0.012385557299843013,
      "loss": 2.355968475341797,
      "step": 19570
    },
    {
      "epoch": 0.7988571428571428,
      "grad_norm": 0.078125,
      "learning_rate": 0.012380533751962323,
      "loss": 2.339796543121338,
      "step": 19572
    },
    {
      "epoch": 0.7989387755102041,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.012375510204081635,
      "loss": 2.3372936248779297,
      "step": 19574
    },
    {
      "epoch": 0.7990204081632654,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.012370486656200943,
      "loss": 2.3368887901306152,
      "step": 19576
    },
    {
      "epoch": 0.7991020408163265,
      "grad_norm": 0.076171875,
      "learning_rate": 0.01236546310832025,
      "loss": 2.3702683448791504,
      "step": 19578
    },
    {
      "epoch": 0.7991836734693878,
      "grad_norm": 0.07568359375,
      "learning_rate": 0.01236043956043956,
      "loss": 2.3589913845062256,
      "step": 19580
    },
    {
      "epoch": 0.799265306122449,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.01235541601255887,
      "loss": 2.3596179485321045,
      "step": 19582
    },
    {
      "epoch": 0.7993469387755102,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.01235039246467818,
      "loss": 2.3832199573516846,
      "step": 19584
    },
    {
      "epoch": 0.7994285714285714,
      "grad_norm": 0.076171875,
      "learning_rate": 0.012345368916797488,
      "loss": 2.3422749042510986,
      "step": 19586
    },
    {
      "epoch": 0.7995102040816326,
      "grad_norm": 0.07568359375,
      "learning_rate": 0.012340345368916799,
      "loss": 2.3693618774414062,
      "step": 19588
    },
    {
      "epoch": 0.7995918367346939,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.012335321821036107,
      "loss": 2.365375280380249,
      "step": 19590
    },
    {
      "epoch": 0.7996734693877551,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012330298273155417,
      "loss": 2.3598885536193848,
      "step": 19592
    },
    {
      "epoch": 0.7997551020408163,
      "grad_norm": 0.07666015625,
      "learning_rate": 0.012325274725274725,
      "loss": 2.372281789779663,
      "step": 19594
    },
    {
      "epoch": 0.7998367346938775,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012320251177394036,
      "loss": 2.3862695693969727,
      "step": 19596
    },
    {
      "epoch": 0.7999183673469388,
      "grad_norm": 0.07763671875,
      "learning_rate": 0.012315227629513344,
      "loss": 2.366892099380493,
      "step": 19598
    },
    {
      "epoch": 0.8,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.012310204081632652,
      "loss": 2.3490514755249023,
      "step": 19600
    },
    {
      "epoch": 0.8000816326530612,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012305180533751963,
      "loss": 2.351602554321289,
      "step": 19602
    },
    {
      "epoch": 0.8001632653061225,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012300156985871273,
      "loss": 2.3530547618865967,
      "step": 19604
    },
    {
      "epoch": 0.8002448979591836,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.012295133437990581,
      "loss": 2.3380069732666016,
      "step": 19606
    },
    {
      "epoch": 0.8003265306122449,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012290109890109889,
      "loss": 2.3770291805267334,
      "step": 19608
    },
    {
      "epoch": 0.8004081632653062,
      "grad_norm": 0.072265625,
      "learning_rate": 0.0122850863422292,
      "loss": 2.334489345550537,
      "step": 19610
    },
    {
      "epoch": 0.8004897959183673,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012280062794348508,
      "loss": 2.361966609954834,
      "step": 19612
    },
    {
      "epoch": 0.8005714285714286,
      "grad_norm": 0.0673828125,
      "learning_rate": 0.012275039246467818,
      "loss": 2.3430423736572266,
      "step": 19614
    },
    {
      "epoch": 0.8006530612244898,
      "grad_norm": 0.07763671875,
      "learning_rate": 0.01227001569858713,
      "loss": 2.363872528076172,
      "step": 19616
    },
    {
      "epoch": 0.800734693877551,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012264992150706438,
      "loss": 2.362565279006958,
      "step": 19618
    },
    {
      "epoch": 0.8008163265306123,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012259968602825745,
      "loss": 2.379392385482788,
      "step": 19620
    },
    {
      "epoch": 0.8008979591836735,
      "grad_norm": 0.07958984375,
      "learning_rate": 0.012254945054945053,
      "loss": 2.3871548175811768,
      "step": 19622
    },
    {
      "epoch": 0.8009795918367347,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012249921507064365,
      "loss": 2.388363838195801,
      "step": 19624
    },
    {
      "epoch": 0.8010612244897959,
      "grad_norm": 0.078125,
      "learning_rate": 0.012244897959183675,
      "loss": 2.419611930847168,
      "step": 19626
    },
    {
      "epoch": 0.8011428571428572,
      "grad_norm": 0.07958984375,
      "learning_rate": 0.012239874411302983,
      "loss": 2.4101812839508057,
      "step": 19628
    },
    {
      "epoch": 0.8012244897959183,
      "grad_norm": 0.0771484375,
      "learning_rate": 0.01223485086342229,
      "loss": 2.392086982727051,
      "step": 19630
    },
    {
      "epoch": 0.8013061224489796,
      "grad_norm": 0.08056640625,
      "learning_rate": 0.012229827315541602,
      "loss": 2.4279868602752686,
      "step": 19632
    },
    {
      "epoch": 0.8013877551020409,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.01222480376766091,
      "loss": 2.4118125438690186,
      "step": 19634
    },
    {
      "epoch": 0.801469387755102,
      "grad_norm": 0.072265625,
      "learning_rate": 0.01221978021978022,
      "loss": 2.418177843093872,
      "step": 19636
    },
    {
      "epoch": 0.8015510204081633,
      "grad_norm": 0.0771484375,
      "learning_rate": 0.012214756671899531,
      "loss": 2.424241065979004,
      "step": 19638
    },
    {
      "epoch": 0.8016326530612244,
      "grad_norm": 0.0771484375,
      "learning_rate": 0.012209733124018839,
      "loss": 2.4139513969421387,
      "step": 19640
    },
    {
      "epoch": 0.8017142857142857,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012204709576138147,
      "loss": 2.417970657348633,
      "step": 19642
    },
    {
      "epoch": 0.801795918367347,
      "grad_norm": 0.080078125,
      "learning_rate": 0.012199686028257457,
      "loss": 2.3920037746429443,
      "step": 19644
    },
    {
      "epoch": 0.8018775510204081,
      "grad_norm": 0.0810546875,
      "learning_rate": 0.012194662480376766,
      "loss": 2.3896732330322266,
      "step": 19646
    },
    {
      "epoch": 0.8019591836734694,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012189638932496076,
      "loss": 2.4060072898864746,
      "step": 19648
    },
    {
      "epoch": 0.8020408163265306,
      "grad_norm": 0.07666015625,
      "learning_rate": 0.012184615384615384,
      "loss": 2.3897781372070312,
      "step": 19650
    },
    {
      "epoch": 0.8021224489795918,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012179591836734695,
      "loss": 2.4214906692504883,
      "step": 19652
    },
    {
      "epoch": 0.8022040816326531,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012174568288854003,
      "loss": 2.439345598220825,
      "step": 19654
    },
    {
      "epoch": 0.8022857142857143,
      "grad_norm": 0.07763671875,
      "learning_rate": 0.012169544740973313,
      "loss": 2.429586887359619,
      "step": 19656
    },
    {
      "epoch": 0.8023673469387755,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.012164521193092621,
      "loss": 2.426178455352783,
      "step": 19658
    },
    {
      "epoch": 0.8024489795918367,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012159497645211933,
      "loss": 2.4246809482574463,
      "step": 19660
    },
    {
      "epoch": 0.802530612244898,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.01215447409733124,
      "loss": 2.3846840858459473,
      "step": 19662
    },
    {
      "epoch": 0.8026122448979592,
      "grad_norm": 0.078125,
      "learning_rate": 0.012149450549450548,
      "loss": 2.4173293113708496,
      "step": 19664
    },
    {
      "epoch": 0.8026938775510204,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.012144427001569858,
      "loss": 2.411623001098633,
      "step": 19666
    },
    {
      "epoch": 0.8027755102040817,
      "grad_norm": 0.07421875,
      "learning_rate": 0.01213940345368917,
      "loss": 2.384122848510742,
      "step": 19668
    },
    {
      "epoch": 0.8028571428571428,
      "grad_norm": 0.07861328125,
      "learning_rate": 0.012134379905808478,
      "loss": 2.4190845489501953,
      "step": 19670
    },
    {
      "epoch": 0.8029387755102041,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.012129356357927785,
      "loss": 2.4180305004119873,
      "step": 19672
    },
    {
      "epoch": 0.8030204081632653,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.012124332810047097,
      "loss": 2.38792085647583,
      "step": 19674
    },
    {
      "epoch": 0.8031020408163265,
      "grad_norm": 0.0791015625,
      "learning_rate": 0.012119309262166405,
      "loss": 2.4028103351593018,
      "step": 19676
    },
    {
      "epoch": 0.8031836734693878,
      "grad_norm": 0.068359375,
      "learning_rate": 0.012114285714285715,
      "loss": 2.394305467605591,
      "step": 19678
    },
    {
      "epoch": 0.803265306122449,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.012109262166405023,
      "loss": 2.376171112060547,
      "step": 19680
    },
    {
      "epoch": 0.8033469387755102,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012104238618524334,
      "loss": 2.3368752002716064,
      "step": 19682
    },
    {
      "epoch": 0.8034285714285714,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.012099215070643642,
      "loss": 2.355717420578003,
      "step": 19684
    },
    {
      "epoch": 0.8035102040816327,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.012094191522762952,
      "loss": 2.340198516845703,
      "step": 19686
    },
    {
      "epoch": 0.8035918367346939,
      "grad_norm": 0.068359375,
      "learning_rate": 0.012089167974882261,
      "loss": 2.3500404357910156,
      "step": 19688
    },
    {
      "epoch": 0.8036734693877551,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012084144427001571,
      "loss": 2.3426971435546875,
      "step": 19690
    },
    {
      "epoch": 0.8037551020408163,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.012079120879120879,
      "loss": 2.339369297027588,
      "step": 19692
    },
    {
      "epoch": 0.8038367346938775,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.012074097331240187,
      "loss": 2.341980457305908,
      "step": 19694
    },
    {
      "epoch": 0.8039183673469388,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.012069073783359498,
      "loss": 2.360912322998047,
      "step": 19696
    },
    {
      "epoch": 0.804,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.012064050235478808,
      "loss": 2.3762946128845215,
      "step": 19698
    },
    {
      "epoch": 0.8040816326530612,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.012059026687598116,
      "loss": 2.376282215118408,
      "step": 19700
    },
    {
      "epoch": 0.8041632653061225,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012054003139717427,
      "loss": 2.3489975929260254,
      "step": 19702
    },
    {
      "epoch": 0.8042448979591836,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.012048979591836735,
      "loss": 2.361898422241211,
      "step": 19704
    },
    {
      "epoch": 0.8043265306122449,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.012043956043956043,
      "loss": 2.3717527389526367,
      "step": 19706
    },
    {
      "epoch": 0.8044081632653062,
      "grad_norm": 0.072265625,
      "learning_rate": 0.012038932496075353,
      "loss": 2.368581771850586,
      "step": 19708
    },
    {
      "epoch": 0.8044897959183673,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.012033908948194665,
      "loss": 2.356705665588379,
      "step": 19710
    },
    {
      "epoch": 0.8045714285714286,
      "grad_norm": 0.0810546875,
      "learning_rate": 0.012028885400313973,
      "loss": 2.3784570693969727,
      "step": 19712
    },
    {
      "epoch": 0.8046530612244898,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.01202386185243328,
      "loss": 2.347832202911377,
      "step": 19714
    },
    {
      "epoch": 0.804734693877551,
      "grad_norm": 0.07421875,
      "learning_rate": 0.012018838304552588,
      "loss": 2.3770973682403564,
      "step": 19716
    },
    {
      "epoch": 0.8048163265306122,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.0120138147566719,
      "loss": 2.3499016761779785,
      "step": 19718
    },
    {
      "epoch": 0.8048979591836735,
      "grad_norm": 0.06787109375,
      "learning_rate": 0.01200879120879121,
      "loss": 2.373753309249878,
      "step": 19720
    },
    {
      "epoch": 0.8049795918367347,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.012003767660910518,
      "loss": 2.381924629211426,
      "step": 19722
    },
    {
      "epoch": 0.8050612244897959,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.011998744113029827,
      "loss": 2.3731064796447754,
      "step": 19724
    },
    {
      "epoch": 0.8051428571428572,
      "grad_norm": 0.0693359375,
      "learning_rate": 0.011993720565149137,
      "loss": 2.348055124282837,
      "step": 19726
    },
    {
      "epoch": 0.8052244897959183,
      "grad_norm": 0.0703125,
      "learning_rate": 0.011988697017268445,
      "loss": 2.364255905151367,
      "step": 19728
    },
    {
      "epoch": 0.8053061224489796,
      "grad_norm": 0.0751953125,
      "learning_rate": 0.011983673469387756,
      "loss": 2.3294835090637207,
      "step": 19730
    },
    {
      "epoch": 0.8053877551020409,
      "grad_norm": 0.0673828125,
      "learning_rate": 0.011978649921507064,
      "loss": 2.343836545944214,
      "step": 19732
    },
| 69074 |
+
{
|
| 69075 |
+
"epoch": 0.805469387755102,
|
| 69076 |
+
"grad_norm": 0.072265625,
|
| 69077 |
+
"learning_rate": 0.011973626373626374,
|
| 69078 |
+
"loss": 2.360543727874756,
|
| 69079 |
+
"step": 19734
|
| 69080 |
+
},
|
| 69081 |
+
{
|
| 69082 |
+
"epoch": 0.8055510204081633,
|
| 69083 |
+
"grad_norm": 0.0732421875,
|
| 69084 |
+
"learning_rate": 0.011968602825745682,
|
| 69085 |
+
"loss": 2.361093044281006,
|
| 69086 |
+
"step": 19736
|
| 69087 |
+
},
|
| 69088 |
+
    {
      "epoch": 0.8056326530612244,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.011963579277864992,
      "loss": 2.340592384338379,
      "step": 19738
    },
    {
      "epoch": 0.8057142857142857,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.011958555729984301,
      "loss": 2.3814022541046143,
      "step": 19740
    },
    {
      "epoch": 0.805795918367347,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.011953532182103611,
      "loss": 2.3910179138183594,
      "step": 19742
    },
    {
      "epoch": 0.8058775510204081,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.01194850863422292,
      "loss": 2.3585877418518066,
      "step": 19744
    },
    {
      "epoch": 0.8059591836734694,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.011943485086342229,
      "loss": 2.3598692417144775,
      "step": 19746
    },
    {
      "epoch": 0.8060408163265306,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.011938461538461538,
      "loss": 2.378390073776245,
      "step": 19748
    },
    {
      "epoch": 0.8061224489795918,
      "grad_norm": 0.0693359375,
      "learning_rate": 0.011933437990580848,
      "loss": 2.374227523803711,
      "step": 19750
    },
    {
      "epoch": 0.8062040816326531,
      "grad_norm": 0.0693359375,
      "learning_rate": 0.011928414442700158,
      "loss": 2.329139471054077,
      "step": 19752
    },
    {
      "epoch": 0.8062857142857143,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.011923390894819468,
      "loss": 2.377678632736206,
      "step": 19754
    },
    {
      "epoch": 0.8063673469387755,
      "grad_norm": 0.06982421875,
      "learning_rate": 0.011918367346938775,
      "loss": 2.3697052001953125,
      "step": 19756
    },
    {
      "epoch": 0.8064489795918367,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.011913343799058085,
      "loss": 2.385080337524414,
      "step": 19758
    },
    {
      "epoch": 0.806530612244898,
      "grad_norm": 0.0771484375,
      "learning_rate": 0.011908320251177395,
      "loss": 2.368070363998413,
      "step": 19760
    },
    {
      "epoch": 0.8066122448979591,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.011903296703296705,
      "loss": 2.3694677352905273,
      "step": 19762
    },
    {
      "epoch": 0.8066938775510204,
      "grad_norm": 0.07470703125,
      "learning_rate": 0.011898273155416013,
      "loss": 2.341770648956299,
      "step": 19764
    },
    {
      "epoch": 0.8067755102040817,
      "grad_norm": 0.07421875,
      "learning_rate": 0.011893249607535322,
      "loss": 2.3311171531677246,
      "step": 19766
    },
    {
      "epoch": 0.8068571428571428,
      "grad_norm": 0.072265625,
      "learning_rate": 0.01188822605965463,
      "loss": 2.3576769828796387,
      "step": 19768
    },
    {
      "epoch": 0.8069387755102041,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.01188320251177394,
      "loss": 2.3769869804382324,
      "step": 19770
    },
    {
      "epoch": 0.8070204081632653,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.011878178963893251,
      "loss": 2.368157386779785,
      "step": 19772
    },
    {
      "epoch": 0.8071020408163265,
      "grad_norm": 0.06787109375,
      "learning_rate": 0.01187315541601256,
      "loss": 2.377338409423828,
      "step": 19774
    },
    {
      "epoch": 0.8071836734693878,
      "grad_norm": 0.072265625,
      "learning_rate": 0.011868131868131869,
      "loss": 2.3505287170410156,
      "step": 19776
    },
    {
      "epoch": 0.807265306122449,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.011863108320251177,
      "loss": 2.3647561073303223,
      "step": 19778
    },
    {
      "epoch": 0.8073469387755102,
      "grad_norm": 0.0849609375,
      "learning_rate": 0.011858084772370487,
      "loss": 2.374401092529297,
      "step": 19780
    },
    {
      "epoch": 0.8074285714285714,
      "grad_norm": 0.07421875,
      "learning_rate": 0.011853061224489796,
      "loss": 2.3652830123901367,
      "step": 19782
    },
    {
      "epoch": 0.8075102040816327,
      "grad_norm": 0.0751953125,
      "learning_rate": 0.011848037676609106,
      "loss": 2.3782095909118652,
      "step": 19784
    },
    {
      "epoch": 0.8075918367346939,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.011843014128728414,
      "loss": 2.3436713218688965,
      "step": 19786
    },
    {
      "epoch": 0.8076734693877551,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.011837990580847724,
      "loss": 2.357180118560791,
      "step": 19788
    },
    {
      "epoch": 0.8077551020408164,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.011832967032967033,
      "loss": 2.373889446258545,
      "step": 19790
    },
    {
      "epoch": 0.8078367346938775,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.011827943485086343,
      "loss": 2.3825087547302246,
      "step": 19792
    },
    {
      "epoch": 0.8079183673469388,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.011822919937205653,
      "loss": 2.3740861415863037,
      "step": 19794
    },
    {
      "epoch": 0.808,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.01181789638932496,
      "loss": 2.3542816638946533,
      "step": 19796
    },
    {
      "epoch": 0.8080816326530612,
      "grad_norm": 0.07421875,
      "learning_rate": 0.01181287284144427,
      "loss": 2.3433971405029297,
      "step": 19798
    },
    {
      "epoch": 0.8081632653061225,
      "grad_norm": 0.072265625,
      "learning_rate": 0.011807849293563578,
      "loss": 2.3547396659851074,
      "step": 19800
    },
    {
      "epoch": 0.8082448979591836,
      "grad_norm": 0.07763671875,
      "learning_rate": 0.011802825745682888,
      "loss": 2.333374500274658,
      "step": 19802
    },
    {
      "epoch": 0.8083265306122449,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.011797802197802198,
      "loss": 2.3479530811309814,
      "step": 19804
    },
    {
      "epoch": 0.8084081632653061,
      "grad_norm": 0.07080078125,
      "learning_rate": 0.011792778649921508,
      "loss": 2.337071418762207,
      "step": 19806
    },
    {
      "epoch": 0.8084897959183673,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.011787755102040817,
      "loss": 2.336080551147461,
      "step": 19808
    },
    {
      "epoch": 0.8085714285714286,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.011782731554160125,
      "loss": 2.387566089630127,
      "step": 19810
    },
    {
      "epoch": 0.8086530612244898,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.011777708006279435,
      "loss": 2.3615612983703613,
      "step": 19812
    },
    {
      "epoch": 0.808734693877551,
      "grad_norm": 0.0703125,
      "learning_rate": 0.011772684458398745,
      "loss": 2.3455235958099365,
      "step": 19814
    },
    {
      "epoch": 0.8088163265306122,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.011767660910518054,
      "loss": 2.369464874267578,
      "step": 19816
    },
    {
      "epoch": 0.8088979591836735,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.011762637362637362,
      "loss": 2.371023416519165,
      "step": 19818
    },
    {
      "epoch": 0.8089795918367347,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.011757613814756672,
      "loss": 2.3752963542938232,
      "step": 19820
    },
    {
      "epoch": 0.8090612244897959,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.01175259026687598,
      "loss": 2.332517623901367,
      "step": 19822
    },
    {
      "epoch": 0.8091428571428572,
      "grad_norm": 0.0751953125,
      "learning_rate": 0.011747566718995291,
      "loss": 2.377016544342041,
      "step": 19824
    },
    {
      "epoch": 0.8092244897959183,
      "grad_norm": 0.076171875,
      "learning_rate": 0.011742543171114601,
      "loss": 2.3930702209472656,
      "step": 19826
    },
    {
      "epoch": 0.8093061224489796,
      "grad_norm": 0.07568359375,
      "learning_rate": 0.011737519623233909,
      "loss": 2.374648094177246,
      "step": 19828
    },
    {
      "epoch": 0.8093877551020409,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.011732496075353219,
      "loss": 2.324946403503418,
      "step": 19830
    },
    {
      "epoch": 0.809469387755102,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.011727472527472527,
      "loss": 2.3560104370117188,
      "step": 19832
    },
    {
      "epoch": 0.8095510204081633,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.011722448979591836,
      "loss": 2.329263687133789,
      "step": 19834
    },
    {
      "epoch": 0.8096326530612244,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.011717425431711146,
      "loss": 2.3265180587768555,
      "step": 19836
    },
    {
      "epoch": 0.8097142857142857,
      "grad_norm": 0.0693359375,
      "learning_rate": 0.011712401883830456,
      "loss": 2.317690372467041,
      "step": 19838
    },
    {
      "epoch": 0.809795918367347,
      "grad_norm": 0.0732421875,
      "learning_rate": 0.011707378335949765,
      "loss": 2.3084182739257812,
      "step": 19840
    },
    {
      "epoch": 0.8098775510204081,
      "grad_norm": 0.07177734375,
      "learning_rate": 0.011702354788069073,
      "loss": 2.313098669052124,
      "step": 19842
    },
    {
      "epoch": 0.8099591836734694,
      "grad_norm": 0.07275390625,
      "learning_rate": 0.011697331240188383,
      "loss": 2.295409679412842,
      "step": 19844
    }
  ],
  "logging_steps": 2,
|
|
|
|
      "attributes": {}
    }
  },
  "total_flos": 6.585466428973069e+19,
  "train_batch_size": 6,
  "trial_name": null,
  "trial_params": null