Training in progress, step 3900, checkpoint
- last-checkpoint/trainer_state.json +3427 -3
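The file touched by this commit, `trainer_state.json`, is plain JSON, so the logged steps it adds can be inspected programmatically. The following is a minimal sketch and not part of the commit: it embeds three entries copied from the added lines of this diff and assumes they sit under a top-level `log_history` key, which the hunks shown here do not display. The `summarize` helper and the `SAMPLE_STATE` name are hypothetical.

```python
import json
from statistics import mean

# Three entries copied from the added lines of this diff. The enclosing
# "log_history" key is an assumption: it is not visible in the hunks shown.
SAMPLE_STATE = """
{
  "global_step": 3900,
  "log_history": [
    {"epoch": 0.07502564102564102, "grad_norm": 0.08935546875,
     "learning_rate": 0.005, "loss": 2.4391305446624756, "step": 2926},
    {"epoch": 0.07507692307692308, "grad_norm": 0.0673828125,
     "learning_rate": 0.005, "loss": 2.4293200969696045, "step": 2928},
    {"epoch": 0.07512820512820513, "grad_norm": 0.06591796875,
     "learning_rate": 0.005, "loss": 2.4632315635681152, "step": 2930}
  ]
}
"""


def summarize(state: dict) -> dict:
    """Return basic statistics over the logged training steps (hypothetical helper)."""
    # Keep only entries that carry a training loss.
    entries = [e for e in state["log_history"] if "loss" in e]
    last = entries[-1]
    return {
        "first_step": entries[0]["step"],
        "last_step": last["step"],
        "mean_loss": mean(e["loss"] for e in entries),
        "max_grad_norm": max(e["grad_norm"] for e in entries),
        # step / epoch gives the number of steps that make up one epoch.
        "implied_steps_per_epoch": round(last["step"] / last["epoch"]),
    }


state = json.loads(SAMPLE_STATE)
summary = summarize(state)
print(summary)
```

To run this against a real checkpoint, replace `json.loads(SAMPLE_STATE)` with `json.load(open(path))` for a local copy of the file. For the entries above, the step-to-epoch ratio works out to 39000 steps per epoch, which agrees with `"epoch": 0.1` at `"global_step": 3900` in the diff.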
last-checkpoint/trainer_state.json
CHANGED
@@ -2,9 +2,9 @@
| 2 |   "best_global_step": null,
| 3 |   "best_metric": null,
| 4 |   "best_model_checkpoint": null,
| 5 | - "epoch": 0.
| 6 |   "eval_steps": 3900,
| 7 | - "global_step":
| 8 |   "is_hyper_param_search": false,
| 9 |   "is_local_process_zero": true,
| 10 |   "is_world_process_zero": true,
@@ -10242,6 +10242,3430 @@
| 10242 |   "learning_rate": 0.005,
| 10243 |   "loss": 2.4524848461151123,
| 10244 |   "step": 2924
| 10245 |   }
| 10246 |   ],
| 10247 |   "logging_steps": 2,
@@ -10261,7 +13685,7 @@
| 10261 |   "attributes": {}
| 10262 |   }
| 10263 |   },
| 10264 | - "total_flos":
| 10265 |   "train_batch_size": 4,
| 10266 |   "trial_name": null,
| 10267 |   "trial_params": null
|
| 2 |   "best_global_step": null,
| 3 |   "best_metric": null,
| 4 |   "best_model_checkpoint": null,
| 5 | + "epoch": 0.1,
| 6 |   "eval_steps": 3900,
| 7 | + "global_step": 3900,
| 8 |   "is_hyper_param_search": false,
| 9 |   "is_local_process_zero": true,
| 10 |   "is_world_process_zero": true,

| 10242 |   "learning_rate": 0.005,
| 10243 |   "loss": 2.4524848461151123,
| 10244 |   "step": 2924
| 10245 | + },
| 10246 | + {
| 10247 | + "epoch": 0.07502564102564102,
| 10248 | + "grad_norm": 0.08935546875,
| 10249 | + "learning_rate": 0.005,
| 10250 | + "loss": 2.4391305446624756,
| 10251 | + "step": 2926
| 10252 | + },
| 10253 | + {
| 10254 | + "epoch": 0.07507692307692308,
| 10255 | + "grad_norm": 0.0673828125,
| 10256 | + "learning_rate": 0.005,
| 10257 | + "loss": 2.4293200969696045,
| 10258 | + "step": 2928
| 10259 | + },
| 10260 | + {
| 10261 | + "epoch": 0.07512820512820513,
| 10262 | + "grad_norm": 0.06591796875,
| 10263 | + "learning_rate": 0.005,
| 10264 | + "loss": 2.4632315635681152,
| 10265 | + "step": 2930
| 10266 | + },
| 10267 | + {
| 10268 | + "epoch": 0.07517948717948718,
| 10269 | + "grad_norm": 0.06005859375,
| 10270 | + "learning_rate": 0.005,
| 10271 | + "loss": 2.460137367248535,
| 10272 | + "step": 2932
| 10273 | + },
| 10274 | + {
| 10275 | + "epoch": 0.07523076923076923,
| 10276 | + "grad_norm": 0.0654296875,
| 10277 | + "learning_rate": 0.005,
| 10278 | + "loss": 2.4655890464782715,
| 10279 | + "step": 2934
| 10280 | + },
| 10281 | + {
| 10282 | + "epoch": 0.07528205128205129,
| 10283 | + "grad_norm": 0.06982421875,
| 10284 | + "learning_rate": 0.005,
| 10285 | + "loss": 2.4589149951934814,
| 10286 | + "step": 2936
| 10287 | + },
| 10288 | + {
| 10289 | + "epoch": 0.07533333333333334,
| 10290 | + "grad_norm": 0.0732421875,
| 10291 | + "learning_rate": 0.005,
| 10292 | + "loss": 2.4794600009918213,
| 10293 | + "step": 2938
| 10294 | + },
| 10295 | + {
| 10296 | + "epoch": 0.07538461538461538,
| 10297 | + "grad_norm": 0.057861328125,
| 10298 | + "learning_rate": 0.005,
| 10299 | + "loss": 2.434568166732788,
| 10300 | + "step": 2940
| 10301 | + },
| 10302 | + {
| 10303 | + "epoch": 0.07543589743589743,
| 10304 | + "grad_norm": 0.0517578125,
| 10305 | + "learning_rate": 0.005,
| 10306 | + "loss": 2.4523746967315674,
| 10307 | + "step": 2942
| 10308 | + },
| 10309 | + {
| 10310 | + "epoch": 0.07548717948717949,
| 10311 | + "grad_norm": 0.0615234375,
| 10312 | + "learning_rate": 0.005,
| 10313 | + "loss": 2.464432954788208,
| 10314 | + "step": 2944
| 10315 | + },
| 10316 | + {
| 10317 | + "epoch": 0.07553846153846154,
| 10318 | + "grad_norm": 0.0595703125,
| 10319 | + "learning_rate": 0.005,
| 10320 | + "loss": 2.4381930828094482,
| 10321 | + "step": 2946
| 10322 | + },
| 10323 | + {
| 10324 | + "epoch": 0.07558974358974359,
| 10325 | + "grad_norm": 0.0751953125,
| 10326 | + "learning_rate": 0.005,
| 10327 | + "loss": 2.4721946716308594,
| 10328 | + "step": 2948
| 10329 | + },
| 10330 | + {
| 10331 | + "epoch": 0.07564102564102564,
| 10332 | + "grad_norm": 0.0673828125,
| 10333 | + "learning_rate": 0.005,
| 10334 | + "loss": 2.4443116188049316,
| 10335 | + "step": 2950
| 10336 | + },
| 10337 | + {
| 10338 | + "epoch": 0.0756923076923077,
| 10339 | + "grad_norm": 0.06787109375,
| 10340 | + "learning_rate": 0.005,
| 10341 | + "loss": 2.445600748062134,
| 10342 | + "step": 2952
| 10343 | + },
| 10344 | + {
| 10345 | + "epoch": 0.07574358974358975,
| 10346 | + "grad_norm": 0.07470703125,
| 10347 | + "learning_rate": 0.005,
| 10348 | + "loss": 2.4611687660217285,
| 10349 | + "step": 2954
| 10350 | + },
| 10351 | + {
| 10352 | + "epoch": 0.0757948717948718,
| 10353 | + "grad_norm": 0.07666015625,
| 10354 | + "learning_rate": 0.005,
| 10355 | + "loss": 2.4566166400909424,
| 10356 | + "step": 2956
| 10357 | + },
| 10358 | + {
| 10359 | + "epoch": 0.07584615384615384,
| 10360 | + "grad_norm": 0.07470703125,
| 10361 | + "learning_rate": 0.005,
| 10362 | + "loss": 2.4590096473693848,
| 10363 | + "step": 2958
| 10364 | + },
| 10365 | + {
| 10366 | + "epoch": 0.0758974358974359,
| 10367 | + "grad_norm": 0.0595703125,
| 10368 | + "learning_rate": 0.005,
| 10369 | + "loss": 2.480334520339966,
| 10370 | + "step": 2960
| 10371 | + },
| 10372 | + {
| 10373 | + "epoch": 0.07594871794871795,
| 10374 | + "grad_norm": 0.064453125,
| 10375 | + "learning_rate": 0.005,
| 10376 | + "loss": 2.4421913623809814,
| 10377 | + "step": 2962
| 10378 | + },
| 10379 | + {
| 10380 | + "epoch": 0.076,
| 10381 | + "grad_norm": 0.07080078125,
| 10382 | + "learning_rate": 0.005,
| 10383 | + "loss": 2.4547231197357178,
| 10384 | + "step": 2964
| 10385 | + },
| 10386 | + {
| 10387 | + "epoch": 0.07605128205128205,
| 10388 | + "grad_norm": 0.05419921875,
| 10389 | + "learning_rate": 0.005,
| 10390 | + "loss": 2.45581316947937,
| 10391 | + "step": 2966
| 10392 | + },
| 10393 | + {
| 10394 | + "epoch": 0.07610256410256411,
| 10395 | + "grad_norm": 0.0673828125,
| 10396 | + "learning_rate": 0.005,
| 10397 | + "loss": 2.4675562381744385,
| 10398 | + "step": 2968
| 10399 | + },
| 10400 | + {
| 10401 | + "epoch": 0.07615384615384616,
| 10402 | + "grad_norm": 0.05419921875,
| 10403 | + "learning_rate": 0.005,
| 10404 | + "loss": 2.4712769985198975,
| 10405 | + "step": 2970
| 10406 | + },
| 10407 | + {
| 10408 | + "epoch": 0.0762051282051282,
| 10409 | + "grad_norm": 0.0634765625,
| 10410 | + "learning_rate": 0.005,
| 10411 | + "loss": 2.4968442916870117,
| 10412 | + "step": 2972
| 10413 | + },
| 10414 | + {
| 10415 | + "epoch": 0.07625641025641025,
| 10416 | + "grad_norm": 0.08544921875,
| 10417 | + "learning_rate": 0.005,
| 10418 | + "loss": 2.4378502368927,
| 10419 | + "step": 2974
| 10420 | + },
| 10421 | + {
| 10422 | + "epoch": 0.07630769230769231,
| 10423 | + "grad_norm": 0.0791015625,
| 10424 | + "learning_rate": 0.005,
| 10425 | + "loss": 2.477668285369873,
| 10426 | + "step": 2976
| 10427 | + },
| 10428 | + {
| 10429 | + "epoch": 0.07635897435897436,
| 10430 | + "grad_norm": 0.08056640625,
| 10431 | + "learning_rate": 0.005,
| 10432 | + "loss": 2.484323501586914,
| 10433 | + "step": 2978
| 10434 | + },
| 10435 | + {
| 10436 | + "epoch": 0.07641025641025641,
| 10437 | + "grad_norm": 0.06982421875,
| 10438 | + "learning_rate": 0.005,
| 10439 | + "loss": 2.478813648223877,
| 10440 | + "step": 2980
| 10441 | + },
| 10442 | + {
| 10443 | + "epoch": 0.07646153846153846,
| 10444 | + "grad_norm": 0.0595703125,
| 10445 | + "learning_rate": 0.005,
| 10446 | + "loss": 2.4439213275909424,
| 10447 | + "step": 2982
| 10448 | + },
| 10449 | + {
| 10450 | + "epoch": 0.07651282051282052,
| 10451 | + "grad_norm": 0.06201171875,
| 10452 | + "learning_rate": 0.005,
| 10453 | + "loss": 2.4764420986175537,
| 10454 | + "step": 2984
| 10455 | + },
| 10456 |
+
{
|
| 10457 |
+
"epoch": 0.07656410256410257,
|
| 10458 |
+
"grad_norm": 0.05908203125,
|
| 10459 |
+
"learning_rate": 0.005,
|
| 10460 |
+
"loss": 2.436401128768921,
|
| 10461 |
+
"step": 2986
|
| 10462 |
+
},
|
| 10463 |
+
{
|
| 10464 |
+
"epoch": 0.07661538461538461,
|
| 10465 |
+
"grad_norm": 0.05322265625,
|
| 10466 |
+
"learning_rate": 0.005,
|
| 10467 |
+
"loss": 2.453610420227051,
|
| 10468 |
+
"step": 2988
|
| 10469 |
+
},
|
| 10470 |
+
{
"epoch": 0.07666666666666666,
"grad_norm": 0.059326171875,
"learning_rate": 0.005,
"loss": 2.469266653060913,
"step": 2990
},
{
"epoch": 0.07671794871794872,
"grad_norm": 0.05859375,
"learning_rate": 0.005,
"loss": 2.4697442054748535,
"step": 2992
},
{
"epoch": 0.07676923076923077,
"grad_norm": 0.068359375,
"learning_rate": 0.005,
"loss": 2.467418909072876,
"step": 2994
},
{
"epoch": 0.07682051282051282,
"grad_norm": 0.08251953125,
"learning_rate": 0.005,
"loss": 2.4437427520751953,
"step": 2996
},
{
"epoch": 0.07687179487179487,
"grad_norm": 0.07666015625,
"learning_rate": 0.005,
"loss": 2.503375768661499,
"step": 2998
},
{
"epoch": 0.07692307692307693,
"grad_norm": 0.08251953125,
"learning_rate": 0.005,
"loss": 2.449251413345337,
"step": 3000
},
{
"epoch": 0.07697435897435898,
"grad_norm": 0.08544921875,
"learning_rate": 0.005,
"loss": 2.449122667312622,
"step": 3002
},
{
"epoch": 0.07702564102564102,
"grad_norm": 0.0673828125,
"learning_rate": 0.005,
"loss": 2.434711456298828,
"step": 3004
},
{
"epoch": 0.07707692307692307,
"grad_norm": 0.056396484375,
"learning_rate": 0.005,
"loss": 2.4815056324005127,
"step": 3006
},
{
"epoch": 0.07712820512820513,
"grad_norm": 0.0517578125,
"learning_rate": 0.005,
"loss": 2.4770801067352295,
"step": 3008
},
{
"epoch": 0.07717948717948718,
"grad_norm": 0.056396484375,
"learning_rate": 0.005,
"loss": 2.4372060298919678,
"step": 3010
},
{
"epoch": 0.07723076923076923,
"grad_norm": 0.060791015625,
"learning_rate": 0.005,
"loss": 2.45690655708313,
"step": 3012
},
{
"epoch": 0.07728205128205128,
"grad_norm": 0.07177734375,
"learning_rate": 0.005,
"loss": 2.4508678913116455,
"step": 3014
},
{
"epoch": 0.07733333333333334,
"grad_norm": 0.0673828125,
"learning_rate": 0.005,
"loss": 2.453127145767212,
"step": 3016
},
{
"epoch": 0.07738461538461539,
"grad_norm": 0.0693359375,
"learning_rate": 0.005,
"loss": 2.4759812355041504,
"step": 3018
},
{
"epoch": 0.07743589743589743,
"grad_norm": 0.07470703125,
"learning_rate": 0.005,
"loss": 2.4663047790527344,
"step": 3020
},
{
"epoch": 0.07748717948717948,
"grad_norm": 0.060546875,
"learning_rate": 0.005,
"loss": 2.4800217151641846,
"step": 3022
},
{
"epoch": 0.07753846153846154,
"grad_norm": 0.0576171875,
"learning_rate": 0.005,
"loss": 2.465784788131714,
"step": 3024
},
{
"epoch": 0.07758974358974359,
"grad_norm": 0.058349609375,
"learning_rate": 0.005,
"loss": 2.485203504562378,
"step": 3026
},
{
"epoch": 0.07764102564102564,
"grad_norm": 0.0751953125,
"learning_rate": 0.005,
"loss": 2.4638051986694336,
"step": 3028
},
{
"epoch": 0.07769230769230769,
"grad_norm": 0.06201171875,
"learning_rate": 0.005,
"loss": 2.481015205383301,
"step": 3030
},
{
"epoch": 0.07774358974358975,
"grad_norm": 0.07080078125,
"learning_rate": 0.005,
"loss": 2.4679746627807617,
"step": 3032
},
{
"epoch": 0.0777948717948718,
"grad_norm": 0.06640625,
"learning_rate": 0.005,
"loss": 2.4689884185791016,
"step": 3034
},
{
"epoch": 0.07784615384615384,
"grad_norm": 0.08837890625,
"learning_rate": 0.005,
"loss": 2.4693057537078857,
"step": 3036
},
{
"epoch": 0.0778974358974359,
"grad_norm": 0.064453125,
"learning_rate": 0.005,
"loss": 2.4597320556640625,
"step": 3038
},
{
"epoch": 0.07794871794871795,
"grad_norm": 0.0615234375,
"learning_rate": 0.005,
"loss": 2.4300906658172607,
"step": 3040
},
{
"epoch": 0.078,
"grad_norm": 0.060791015625,
"learning_rate": 0.005,
"loss": 2.466387987136841,
"step": 3042
},
{
"epoch": 0.07805128205128205,
"grad_norm": 0.07421875,
"learning_rate": 0.005,
"loss": 2.4646570682525635,
"step": 3044
},
{
"epoch": 0.07810256410256411,
"grad_norm": 0.07177734375,
"learning_rate": 0.005,
"loss": 2.456585168838501,
"step": 3046
},
{
"epoch": 0.07815384615384616,
"grad_norm": 0.06787109375,
"learning_rate": 0.005,
"loss": 2.4241015911102295,
"step": 3048
},
{
"epoch": 0.0782051282051282,
"grad_norm": 0.07275390625,
"learning_rate": 0.005,
"loss": 2.4715898036956787,
"step": 3050
},
{
"epoch": 0.07825641025641025,
"grad_norm": 0.07080078125,
"learning_rate": 0.005,
"loss": 2.4652395248413086,
"step": 3052
},
{
"epoch": 0.07830769230769231,
"grad_norm": 0.05859375,
"learning_rate": 0.005,
"loss": 2.45981764793396,
"step": 3054
},
{
"epoch": 0.07835897435897436,
"grad_norm": 0.06494140625,
"learning_rate": 0.005,
"loss": 2.4705002307891846,
"step": 3056
},
{
"epoch": 0.07841025641025641,
"grad_norm": 0.06298828125,
"learning_rate": 0.005,
"loss": 2.431687831878662,
"step": 3058
},
{
"epoch": 0.07846153846153846,
"grad_norm": 0.07080078125,
"learning_rate": 0.005,
"loss": 2.4738523960113525,
"step": 3060
},
{
"epoch": 0.07851282051282052,
"grad_norm": 0.09228515625,
"learning_rate": 0.005,
"loss": 2.4915764331817627,
"step": 3062
},
{
"epoch": 0.07856410256410257,
"grad_norm": 0.0908203125,
"learning_rate": 0.005,
"loss": 2.416736125946045,
"step": 3064
},
{
"epoch": 0.07861538461538461,
"grad_norm": 0.07470703125,
"learning_rate": 0.005,
"loss": 2.4293811321258545,
"step": 3066
},
{
"epoch": 0.07866666666666666,
"grad_norm": 0.07275390625,
"learning_rate": 0.005,
"loss": 2.457162618637085,
"step": 3068
},
{
"epoch": 0.07871794871794872,
"grad_norm": 0.06787109375,
"learning_rate": 0.005,
"loss": 2.472923994064331,
"step": 3070
},
{
"epoch": 0.07876923076923077,
"grad_norm": 0.06787109375,
"learning_rate": 0.005,
"loss": 2.478541851043701,
"step": 3072
},
{
"epoch": 0.07882051282051282,
"grad_norm": 0.057373046875,
"learning_rate": 0.005,
"loss": 2.4363579750061035,
"step": 3074
},
{
"epoch": 0.07887179487179487,
"grad_norm": 0.056396484375,
"learning_rate": 0.005,
"loss": 2.473299503326416,
"step": 3076
},
{
"epoch": 0.07892307692307693,
"grad_norm": 0.057861328125,
"learning_rate": 0.005,
"loss": 2.438486337661743,
"step": 3078
},
{
"epoch": 0.07897435897435898,
"grad_norm": 0.05859375,
"learning_rate": 0.005,
"loss": 2.4957528114318848,
"step": 3080
},
{
"epoch": 0.07902564102564102,
"grad_norm": 0.0693359375,
"learning_rate": 0.005,
"loss": 2.465491771697998,
"step": 3082
},
{
"epoch": 0.07907692307692307,
"grad_norm": 0.056640625,
"learning_rate": 0.005,
"loss": 2.4558205604553223,
"step": 3084
},
{
"epoch": 0.07912820512820513,
"grad_norm": 0.05810546875,
"learning_rate": 0.005,
"loss": 2.4638142585754395,
"step": 3086
},
{
"epoch": 0.07917948717948718,
"grad_norm": 0.05517578125,
"learning_rate": 0.005,
"loss": 2.444430351257324,
"step": 3088
},
{
"epoch": 0.07923076923076923,
"grad_norm": 0.06005859375,
"learning_rate": 0.005,
"loss": 2.4398293495178223,
"step": 3090
},
{
"epoch": 0.07928205128205128,
"grad_norm": 0.0556640625,
"learning_rate": 0.005,
"loss": 2.4735140800476074,
"step": 3092
},
{
"epoch": 0.07933333333333334,
"grad_norm": 0.056884765625,
"learning_rate": 0.005,
"loss": 2.4372525215148926,
"step": 3094
},
{
"epoch": 0.07938461538461539,
"grad_norm": 0.08935546875,
"learning_rate": 0.005,
"loss": 2.464759111404419,
"step": 3096
},
{
"epoch": 0.07943589743589743,
"grad_norm": 0.08740234375,
"learning_rate": 0.005,
"loss": 2.458038091659546,
"step": 3098
},
{
"epoch": 0.07948717948717948,
"grad_norm": 0.0908203125,
"learning_rate": 0.005,
"loss": 2.4554927349090576,
"step": 3100
},
{
"epoch": 0.07953846153846154,
"grad_norm": 0.09130859375,
"learning_rate": 0.005,
"loss": 2.486560344696045,
"step": 3102
},
{
"epoch": 0.07958974358974359,
"grad_norm": 0.07275390625,
"learning_rate": 0.005,
"loss": 2.439016819000244,
"step": 3104
},
{
"epoch": 0.07964102564102564,
"grad_norm": 0.078125,
"learning_rate": 0.005,
"loss": 2.47418212890625,
"step": 3106
},
{
"epoch": 0.07969230769230769,
"grad_norm": 0.078125,
"learning_rate": 0.005,
"loss": 2.4616634845733643,
"step": 3108
},
{
"epoch": 0.07974358974358975,
"grad_norm": 0.07177734375,
"learning_rate": 0.005,
"loss": 2.469085931777954,
"step": 3110
},
{
"epoch": 0.0797948717948718,
"grad_norm": 0.06396484375,
"learning_rate": 0.005,
"loss": 2.448444128036499,
"step": 3112
},
{
"epoch": 0.07984615384615384,
"grad_norm": 0.055419921875,
"learning_rate": 0.005,
"loss": 2.486172914505005,
"step": 3114
},
{
"epoch": 0.07989743589743589,
"grad_norm": 0.06982421875,
"learning_rate": 0.005,
"loss": 2.49394154548645,
"step": 3116
},
{
"epoch": 0.07994871794871795,
"grad_norm": 0.0849609375,
"learning_rate": 0.005,
"loss": 2.46441388130188,
"step": 3118
},
{
"epoch": 0.08,
"grad_norm": 0.07275390625,
"learning_rate": 0.005,
"loss": 2.4664108753204346,
"step": 3120
},
{
"epoch": 0.08005128205128205,
"grad_norm": 0.055419921875,
"learning_rate": 0.005,
"loss": 2.4777071475982666,
"step": 3122
},
{
"epoch": 0.0801025641025641,
"grad_norm": 0.05078125,
"learning_rate": 0.005,
"loss": 2.429591178894043,
"step": 3124
},
{
"epoch": 0.08015384615384616,
"grad_norm": 0.054931640625,
"learning_rate": 0.005,
"loss": 2.476600170135498,
"step": 3126
},
{
"epoch": 0.0802051282051282,
"grad_norm": 0.055419921875,
"learning_rate": 0.005,
"loss": 2.440669536590576,
"step": 3128
},
{
"epoch": 0.08025641025641025,
"grad_norm": 0.05029296875,
"learning_rate": 0.005,
"loss": 2.4569332599639893,
"step": 3130
},
{
"epoch": 0.0803076923076923,
"grad_norm": 0.06640625,
"learning_rate": 0.005,
"loss": 2.450248956680298,
"step": 3132
},
{
"epoch": 0.08035897435897436,
"grad_norm": 0.0634765625,
"learning_rate": 0.005,
"loss": 2.4523491859436035,
"step": 3134
},
{
"epoch": 0.08041025641025641,
"grad_norm": 0.06494140625,
"learning_rate": 0.005,
"loss": 2.4598069190979004,
"step": 3136
},
{
"epoch": 0.08046153846153846,
"grad_norm": 0.06591796875,
"learning_rate": 0.005,
"loss": 2.471672773361206,
"step": 3138
},
{
"epoch": 0.08051282051282051,
"grad_norm": 0.083984375,
"learning_rate": 0.005,
"loss": 2.4498813152313232,
"step": 3140
},
{
"epoch": 0.08056410256410257,
"grad_norm": 0.08447265625,
"learning_rate": 0.005,
"loss": 2.463459014892578,
"step": 3142
},
{
"epoch": 0.08061538461538462,
"grad_norm": 0.09326171875,
"learning_rate": 0.005,
"loss": 2.4949872493743896,
"step": 3144
},
{
"epoch": 0.08066666666666666,
"grad_norm": 0.09033203125,
"learning_rate": 0.005,
"loss": 2.4474332332611084,
"step": 3146
},
{
"epoch": 0.08071794871794871,
"grad_norm": 0.07958984375,
"learning_rate": 0.005,
"loss": 2.4668118953704834,
"step": 3148
},
{
"epoch": 0.08076923076923077,
"grad_norm": 0.06591796875,
"learning_rate": 0.005,
"loss": 2.49550199508667,
"step": 3150
},
{
"epoch": 0.08082051282051282,
"grad_norm": 0.07080078125,
"learning_rate": 0.005,
"loss": 2.500349521636963,
"step": 3152
},
{
"epoch": 0.08087179487179487,
"grad_norm": 0.08056640625,
"learning_rate": 0.005,
"loss": 2.4780385494232178,
"step": 3154
},
{
"epoch": 0.08092307692307692,
"grad_norm": 0.07861328125,
"learning_rate": 0.005,
"loss": 2.4524781703948975,
"step": 3156
},
{
"epoch": 0.08097435897435898,
"grad_norm": 0.0712890625,
"learning_rate": 0.005,
"loss": 2.4667885303497314,
"step": 3158
},
{
"epoch": 0.08102564102564103,
"grad_norm": 0.08203125,
"learning_rate": 0.005,
"loss": 2.470641851425171,
"step": 3160
},
{
"epoch": 0.08107692307692307,
"grad_norm": 0.06689453125,
"learning_rate": 0.005,
"loss": 2.4619507789611816,
"step": 3162
},
{
"epoch": 0.08112820512820512,
"grad_norm": 0.0751953125,
"learning_rate": 0.005,
"loss": 2.461238145828247,
"step": 3164
},
{
"epoch": 0.08117948717948718,
"grad_norm": 0.0712890625,
"learning_rate": 0.005,
"loss": 2.472080707550049,
"step": 3166
},
{
"epoch": 0.08123076923076923,
"grad_norm": 0.056884765625,
"learning_rate": 0.005,
"loss": 2.47847843170166,
"step": 3168
},
{
"epoch": 0.08128205128205128,
"grad_norm": 0.06005859375,
"learning_rate": 0.005,
"loss": 2.463280439376831,
"step": 3170
},
{
"epoch": 0.08133333333333333,
"grad_norm": 0.05810546875,
"learning_rate": 0.005,
"loss": 2.4507548809051514,
"step": 3172
},
{
"epoch": 0.08138461538461539,
"grad_norm": 0.05908203125,
"learning_rate": 0.005,
"loss": 2.453247308731079,
"step": 3174
},
{
"epoch": 0.08143589743589744,
"grad_norm": 0.0771484375,
"learning_rate": 0.005,
"loss": 2.452207088470459,
"step": 3176
},
{
"epoch": 0.08148717948717948,
"grad_norm": 0.0771484375,
"learning_rate": 0.005,
"loss": 2.459747552871704,
"step": 3178
},
{
"epoch": 0.08153846153846153,
"grad_norm": 0.0869140625,
"learning_rate": 0.005,
"loss": 2.458364963531494,
"step": 3180
},
{
"epoch": 0.0815897435897436,
"grad_norm": 0.10693359375,
"learning_rate": 0.005,
"loss": 2.4583213329315186,
"step": 3182
},
{
"epoch": 0.08164102564102564,
"grad_norm": 0.0625,
"learning_rate": 0.005,
"loss": 2.46773099899292,
"step": 3184
},
{
"epoch": 0.08169230769230769,
"grad_norm": 0.052734375,
"learning_rate": 0.005,
"loss": 2.478337526321411,
"step": 3186
},
{
"epoch": 0.08174358974358974,
"grad_norm": 0.0615234375,
"learning_rate": 0.005,
"loss": 2.473055839538574,
"step": 3188
},
{
"epoch": 0.0817948717948718,
"grad_norm": 0.07373046875,
"learning_rate": 0.005,
"loss": 2.458576202392578,
"step": 3190
},
{
"epoch": 0.08184615384615385,
"grad_norm": 0.08349609375,
"learning_rate": 0.005,
"loss": 2.493762969970703,
"step": 3192
},
{
"epoch": 0.0818974358974359,
"grad_norm": 0.0830078125,
"learning_rate": 0.005,
"loss": 2.465613842010498,
"step": 3194
},
{
"epoch": 0.08194871794871794,
"grad_norm": 0.08935546875,
"learning_rate": 0.005,
"loss": 2.4811904430389404,
"step": 3196
},
{
"epoch": 0.082,
"grad_norm": 0.12109375,
"learning_rate": 0.005,
"loss": 2.4380619525909424,
"step": 3198
},
{
"epoch": 0.08205128205128205,
"grad_norm": 0.0927734375,
"learning_rate": 0.005,
"loss": 2.4393973350524902,
"step": 3200
},
{
"epoch": 0.0821025641025641,
"grad_norm": 0.06494140625,
"learning_rate": 0.005,
"loss": 2.4635913372039795,
"step": 3202
},
{
"epoch": 0.08215384615384616,
"grad_norm": 0.07763671875,
"learning_rate": 0.005,
"loss": 2.450014352798462,
"step": 3204
},
{
"epoch": 0.08220512820512821,
"grad_norm": 0.0673828125,
"learning_rate": 0.005,
"loss": 2.4509496688842773,
"step": 3206
},
{
"epoch": 0.08225641025641026,
"grad_norm": 0.072265625,
"learning_rate": 0.005,
"loss": 2.4603123664855957,
"step": 3208
},
{
"epoch": 0.0823076923076923,
"grad_norm": 0.052001953125,
"learning_rate": 0.005,
"loss": 2.4570529460906982,
"step": 3210
},
{
"epoch": 0.08235897435897437,
"grad_norm": 0.056884765625,
"learning_rate": 0.005,
"loss": 2.472304105758667,
"step": 3212
},
{
"epoch": 0.08241025641025641,
"grad_norm": 0.05517578125,
"learning_rate": 0.005,
"loss": 2.4439167976379395,
"step": 3214
},
{
"epoch": 0.08246153846153846,
"grad_norm": 0.0615234375,
"learning_rate": 0.005,
"loss": 2.4510605335235596,
"step": 3216
},
{
"epoch": 0.08251282051282051,
"grad_norm": 0.06298828125,
"learning_rate": 0.005,
"loss": 2.4902453422546387,
"step": 3218
},
{
"epoch": 0.08256410256410257,
"grad_norm": 0.059814453125,
"learning_rate": 0.005,
"loss": 2.43049693107605,
"step": 3220
},
{
"epoch": 0.08261538461538462,
"grad_norm": 0.059814453125,
"learning_rate": 0.005,
"loss": 2.437856912612915,
"step": 3222
},
{
"epoch": 0.08266666666666667,
"grad_norm": 0.06689453125,
"learning_rate": 0.005,
"loss": 2.4781556129455566,
"step": 3224
},
{
"epoch": 0.08271794871794871,
"grad_norm": 0.0732421875,
"learning_rate": 0.005,
"loss": 2.453319787979126,
"step": 3226
},
{
"epoch": 0.08276923076923078,
"grad_norm": 0.09765625,
"learning_rate": 0.005,
"loss": 2.4623143672943115,
"step": 3228
},
{
"epoch": 0.08282051282051282,
"grad_norm": 0.08984375,
"learning_rate": 0.005,
"loss": 2.4553592205047607,
"step": 3230
},
{
"epoch": 0.08287179487179487,
"grad_norm": 0.095703125,
"learning_rate": 0.005,
"loss": 2.4645016193389893,
"step": 3232
},
{
"epoch": 0.08292307692307692,
"grad_norm": 0.06103515625,
"learning_rate": 0.005,
"loss": 2.4372928142547607,
"step": 3234
},
{
"epoch": 0.08297435897435898,
"grad_norm": 0.05517578125,
"learning_rate": 0.005,
"loss": 2.4499881267547607,
"step": 3236
},
{
"epoch": 0.08302564102564103,
"grad_norm": 0.056884765625,
"learning_rate": 0.005,
"loss": 2.457284450531006,
"step": 3238
},
{
"epoch": 0.08307692307692308,
"grad_norm": 0.07080078125,
"learning_rate": 0.005,
"loss": 2.4553401470184326,
"step": 3240
},
{
"epoch": 0.08312820512820512,
"grad_norm": 0.058837890625,
"learning_rate": 0.005,
"loss": 2.4617791175842285,
"step": 3242
},
{
"epoch": 0.08317948717948719,
"grad_norm": 0.06640625,
"learning_rate": 0.005,
"loss": 2.4616172313690186,
"step": 3244
},
{
"epoch": 0.08323076923076923,
"grad_norm": 0.056396484375,
"learning_rate": 0.005,
"loss": 2.452378749847412,
"step": 3246
},
{
"epoch": 0.08328205128205128,
"grad_norm": 0.058349609375,
"learning_rate": 0.005,
"loss": 2.435415506362915,
"step": 3248
},
{
"epoch": 0.08333333333333333,
"grad_norm": 0.0693359375,
"learning_rate": 0.005,
"loss": 2.4674150943756104,
"step": 3250
},
{
"epoch": 0.08338461538461539,
"grad_norm": 0.0703125,
"learning_rate": 0.005,
"loss": 2.43673038482666,
"step": 3252
},
{
"epoch": 0.08343589743589744,
"grad_norm": 0.06787109375,
"learning_rate": 0.005,
"loss": 2.461371898651123,
"step": 3254
},
{
"epoch": 0.08348717948717949,
"grad_norm": 0.0595703125,
"learning_rate": 0.005,
"loss": 2.422619104385376,
"step": 3256
},
{
"epoch": 0.08353846153846153,
"grad_norm": 0.07568359375,
"learning_rate": 0.005,
"loss": 2.447129249572754,
"step": 3258
},
{
"epoch": 0.0835897435897436,
"grad_norm": 0.07861328125,
"learning_rate": 0.005,
"loss": 2.4419734477996826,
"step": 3260
},
{
"epoch": 0.08364102564102564,
"grad_norm": 0.07470703125,
"learning_rate": 0.005,
"loss": 2.4336538314819336,
"step": 3262
},
{
"epoch": 0.08369230769230769,
"grad_norm": 0.06494140625,
"learning_rate": 0.005,
"loss": 2.436880111694336,
"step": 3264
},
{
"epoch": 0.08374358974358974,
"grad_norm": 0.0654296875,
"learning_rate": 0.005,
"loss": 2.430910587310791,
"step": 3266
},
{
"epoch": 0.0837948717948718,
"grad_norm": 0.05224609375,
"learning_rate": 0.005,
"loss": 2.4354732036590576,
"step": 3268
},
{
"epoch": 0.08384615384615385,
"grad_norm": 0.06982421875,
"learning_rate": 0.005,
"loss": 2.4384119510650635,
"step": 3270
},
{
"epoch": 0.0838974358974359,
"grad_norm": 0.0927734375,
"learning_rate": 0.005,
"loss": 2.4623165130615234,
"step": 3272
},
{
"epoch": 0.08394871794871794,
"grad_norm": 0.0888671875,
"learning_rate": 0.005,
"loss": 2.45418643951416,
"step": 3274
},
{
"epoch": 0.084,
"grad_norm": 0.083984375,
"learning_rate": 0.005,
"loss": 2.4397518634796143,
"step": 3276
},
{
"epoch": 0.08405128205128205,
"grad_norm": 0.0595703125,
"learning_rate": 0.005,
"loss": 2.4341769218444824,
"step": 3278
},
{
"epoch": 0.0841025641025641,
"grad_norm": 0.05517578125,
"learning_rate": 0.005,
"loss": 2.4285624027252197,
"step": 3280
},
{
"epoch": 0.08415384615384615,
"grad_norm": 0.05712890625,
"learning_rate": 0.005,
"loss": 2.4526960849761963,
"step": 3282
},
{
"epoch": 0.08420512820512821,
"grad_norm": 0.06396484375,
"learning_rate": 0.005,
"loss": 2.416841983795166,
"step": 3284
},
{
"epoch": 0.08425641025641026,
"grad_norm": 0.0654296875,
"learning_rate": 0.005,
"loss": 2.4716298580169678,
"step": 3286
},
{
"epoch": 0.0843076923076923,
"grad_norm": 0.0673828125,
"learning_rate": 0.005,
"loss": 2.440152406692505,
"step": 3288
},
+
{
|
| 11521 |
+
"epoch": 0.08435897435897435,
|
| 11522 |
+
"grad_norm": 0.07421875,
|
| 11523 |
+
"learning_rate": 0.005,
|
| 11524 |
+
"loss": 2.4362053871154785,
|
| 11525 |
+
"step": 3290
|
| 11526 |
+
},
|
| 11527 |
+
{
|
| 11528 |
+
"epoch": 0.08441025641025642,
|
| 11529 |
+
"grad_norm": 0.06494140625,
|
| 11530 |
+
"learning_rate": 0.005,
|
| 11531 |
+
"loss": 2.4483258724212646,
|
| 11532 |
+
"step": 3292
|
| 11533 |
+
},
|
| 11534 |
+
{
|
| 11535 |
+
"epoch": 0.08446153846153846,
|
| 11536 |
+
"grad_norm": 0.057861328125,
|
| 11537 |
+
"learning_rate": 0.005,
|
| 11538 |
+
"loss": 2.4394514560699463,
|
| 11539 |
+
"step": 3294
|
| 11540 |
+
},
|
| 11541 |
+
{
|
| 11542 |
+
"epoch": 0.08451282051282051,
|
| 11543 |
+
"grad_norm": 0.060302734375,
|
| 11544 |
+
"learning_rate": 0.005,
|
| 11545 |
+
"loss": 2.436565637588501,
|
| 11546 |
+
"step": 3296
|
| 11547 |
+
},
|
| 11548 |
+
{
|
| 11549 |
+
"epoch": 0.08456410256410256,
|
| 11550 |
+
"grad_norm": 0.05224609375,
|
| 11551 |
+
"learning_rate": 0.005,
|
| 11552 |
+
"loss": 2.4596211910247803,
|
| 11553 |
+
"step": 3298
|
| 11554 |
+
},
|
| 11555 |
+
{
|
| 11556 |
+
"epoch": 0.08461538461538462,
|
| 11557 |
+
"grad_norm": 0.055908203125,
|
| 11558 |
+
"learning_rate": 0.005,
|
| 11559 |
+
"loss": 2.433617115020752,
|
| 11560 |
+
"step": 3300
|
| 11561 |
+
},
|
| 11562 |
+
{
|
| 11563 |
+
"epoch": 0.08466666666666667,
|
| 11564 |
+
"grad_norm": 0.059814453125,
|
| 11565 |
+
"learning_rate": 0.005,
|
| 11566 |
+
"loss": 2.468733072280884,
|
| 11567 |
+
"step": 3302
|
| 11568 |
+
},
|
| 11569 |
+
{
|
| 11570 |
+
"epoch": 0.08471794871794872,
|
| 11571 |
+
"grad_norm": 0.0546875,
|
| 11572 |
+
"learning_rate": 0.005,
|
| 11573 |
+
"loss": 2.4226980209350586,
|
| 11574 |
+
"step": 3304
|
| 11575 |
+
},
|
| 11576 |
+
{
|
| 11577 |
+
"epoch": 0.08476923076923076,
|
| 11578 |
+
"grad_norm": 0.062255859375,
|
| 11579 |
+
"learning_rate": 0.005,
|
| 11580 |
+
"loss": 2.425107002258301,
|
| 11581 |
+
"step": 3306
|
| 11582 |
+
},
|
| 11583 |
+
{
|
| 11584 |
+
"epoch": 0.08482051282051283,
|
| 11585 |
+
"grad_norm": 0.0654296875,
|
| 11586 |
+
"learning_rate": 0.005,
|
| 11587 |
+
"loss": 2.4202611446380615,
|
| 11588 |
+
"step": 3308
|
| 11589 |
+
},
|
| 11590 |
+
{
|
| 11591 |
+
"epoch": 0.08487179487179487,
|
| 11592 |
+
"grad_norm": 0.076171875,
|
| 11593 |
+
"learning_rate": 0.005,
|
| 11594 |
+
"loss": 2.414445638656616,
|
| 11595 |
+
"step": 3310
|
| 11596 |
+
},
|
| 11597 |
+
{
|
| 11598 |
+
"epoch": 0.08492307692307692,
|
| 11599 |
+
"grad_norm": 0.08447265625,
|
| 11600 |
+
"learning_rate": 0.005,
|
| 11601 |
+
"loss": 2.4504406452178955,
|
| 11602 |
+
"step": 3312
|
| 11603 |
+
},
|
| 11604 |
+
{
|
| 11605 |
+
"epoch": 0.08497435897435897,
|
| 11606 |
+
"grad_norm": 0.0732421875,
|
| 11607 |
+
"learning_rate": 0.005,
|
| 11608 |
+
"loss": 2.4341320991516113,
|
| 11609 |
+
"step": 3314
|
| 11610 |
+
},
|
| 11611 |
+
{
|
| 11612 |
+
"epoch": 0.08502564102564103,
|
| 11613 |
+
"grad_norm": 0.0751953125,
|
| 11614 |
+
"learning_rate": 0.005,
|
| 11615 |
+
"loss": 2.4416651725769043,
|
| 11616 |
+
"step": 3316
|
| 11617 |
+
},
|
| 11618 |
+
{
|
| 11619 |
+
"epoch": 0.08507692307692308,
|
| 11620 |
+
"grad_norm": 0.07275390625,
|
| 11621 |
+
"learning_rate": 0.005,
|
| 11622 |
+
"loss": 2.4288077354431152,
|
| 11623 |
+
"step": 3318
|
| 11624 |
+
},
|
| 11625 |
+
{
|
| 11626 |
+
"epoch": 0.08512820512820513,
|
| 11627 |
+
"grad_norm": 0.06787109375,
|
| 11628 |
+
"learning_rate": 0.005,
|
| 11629 |
+
"loss": 2.428485155105591,
|
| 11630 |
+
"step": 3320
|
| 11631 |
+
},
|
| 11632 |
+
{
|
| 11633 |
+
"epoch": 0.08517948717948717,
|
| 11634 |
+
"grad_norm": 0.056640625,
|
| 11635 |
+
"learning_rate": 0.005,
|
| 11636 |
+
"loss": 2.426039218902588,
|
| 11637 |
+
"step": 3322
|
| 11638 |
+
},
|
| 11639 |
+
{
|
| 11640 |
+
"epoch": 0.08523076923076923,
|
| 11641 |
+
"grad_norm": 0.056396484375,
|
| 11642 |
+
"learning_rate": 0.005,
|
| 11643 |
+
"loss": 2.430471420288086,
|
| 11644 |
+
"step": 3324
|
| 11645 |
+
},
|
| 11646 |
+
{
|
| 11647 |
+
"epoch": 0.08528205128205128,
|
| 11648 |
+
"grad_norm": 0.06591796875,
|
| 11649 |
+
"learning_rate": 0.005,
|
| 11650 |
+
"loss": 2.4474740028381348,
|
| 11651 |
+
"step": 3326
|
| 11652 |
+
},
|
| 11653 |
+
{
|
| 11654 |
+
"epoch": 0.08533333333333333,
|
| 11655 |
+
"grad_norm": 0.057861328125,
|
| 11656 |
+
"learning_rate": 0.005,
|
| 11657 |
+
"loss": 2.4308021068573,
|
| 11658 |
+
"step": 3328
|
| 11659 |
+
},
|
| 11660 |
+
{
|
| 11661 |
+
"epoch": 0.08538461538461538,
|
| 11662 |
+
"grad_norm": 0.053466796875,
|
| 11663 |
+
"learning_rate": 0.005,
|
| 11664 |
+
"loss": 2.4249823093414307,
|
| 11665 |
+
"step": 3330
|
| 11666 |
+
},
|
| 11667 |
+
{
|
| 11668 |
+
"epoch": 0.08543589743589744,
|
| 11669 |
+
"grad_norm": 0.05859375,
|
| 11670 |
+
"learning_rate": 0.005,
|
| 11671 |
+
"loss": 2.4435360431671143,
|
| 11672 |
+
"step": 3332
|
| 11673 |
+
},
|
| 11674 |
+
{
|
| 11675 |
+
"epoch": 0.08548717948717949,
|
| 11676 |
+
"grad_norm": 0.062255859375,
|
| 11677 |
+
"learning_rate": 0.005,
|
| 11678 |
+
"loss": 2.397392511367798,
|
| 11679 |
+
"step": 3334
|
| 11680 |
+
},
|
| 11681 |
+
{
|
| 11682 |
+
"epoch": 0.08553846153846154,
|
| 11683 |
+
"grad_norm": 0.0556640625,
|
| 11684 |
+
"learning_rate": 0.005,
|
| 11685 |
+
"loss": 2.416008710861206,
|
| 11686 |
+
"step": 3336
|
| 11687 |
+
},
|
| 11688 |
+
{
|
| 11689 |
+
"epoch": 0.08558974358974358,
|
| 11690 |
+
"grad_norm": 0.05859375,
|
| 11691 |
+
"learning_rate": 0.005,
|
| 11692 |
+
"loss": 2.4334869384765625,
|
| 11693 |
+
"step": 3338
|
| 11694 |
+
},
|
| 11695 |
+
{
|
| 11696 |
+
"epoch": 0.08564102564102564,
|
| 11697 |
+
"grad_norm": 0.07275390625,
|
| 11698 |
+
"learning_rate": 0.005,
|
| 11699 |
+
"loss": 2.4523305892944336,
|
| 11700 |
+
"step": 3340
|
| 11701 |
+
},
|
| 11702 |
+
{
|
| 11703 |
+
"epoch": 0.08569230769230769,
|
| 11704 |
+
"grad_norm": 0.10498046875,
|
| 11705 |
+
"learning_rate": 0.005,
|
| 11706 |
+
"loss": 2.4508707523345947,
|
| 11707 |
+
"step": 3342
|
| 11708 |
+
},
|
| 11709 |
+
{
|
| 11710 |
+
"epoch": 0.08574358974358974,
|
| 11711 |
+
"grad_norm": 0.0595703125,
|
| 11712 |
+
"learning_rate": 0.005,
|
| 11713 |
+
"loss": 2.4011173248291016,
|
| 11714 |
+
"step": 3344
|
| 11715 |
+
},
|
| 11716 |
+
{
|
| 11717 |
+
"epoch": 0.08579487179487179,
|
| 11718 |
+
"grad_norm": 0.06201171875,
|
| 11719 |
+
"learning_rate": 0.005,
|
| 11720 |
+
"loss": 2.4413177967071533,
|
| 11721 |
+
"step": 3346
|
| 11722 |
+
},
|
| 11723 |
+
{
|
| 11724 |
+
"epoch": 0.08584615384615385,
|
| 11725 |
+
"grad_norm": 0.06396484375,
|
| 11726 |
+
"learning_rate": 0.005,
|
| 11727 |
+
"loss": 2.39831280708313,
|
| 11728 |
+
"step": 3348
|
| 11729 |
+
},
|
| 11730 |
+
{
|
| 11731 |
+
"epoch": 0.0858974358974359,
|
| 11732 |
+
"grad_norm": 0.0654296875,
|
| 11733 |
+
"learning_rate": 0.005,
|
| 11734 |
+
"loss": 2.432015895843506,
|
| 11735 |
+
"step": 3350
|
| 11736 |
+
},
|
| 11737 |
+
{
|
| 11738 |
+
"epoch": 0.08594871794871795,
|
| 11739 |
+
"grad_norm": 0.08056640625,
|
| 11740 |
+
"learning_rate": 0.005,
|
| 11741 |
+
"loss": 2.444376230239868,
|
| 11742 |
+
"step": 3352
|
| 11743 |
+
},
|
| 11744 |
+
{
|
| 11745 |
+
"epoch": 0.086,
|
| 11746 |
+
"grad_norm": 0.07763671875,
|
| 11747 |
+
"learning_rate": 0.005,
|
| 11748 |
+
"loss": 2.422244071960449,
|
| 11749 |
+
"step": 3354
|
| 11750 |
+
},
|
| 11751 |
+
{
|
| 11752 |
+
"epoch": 0.08605128205128205,
|
| 11753 |
+
"grad_norm": 0.08642578125,
|
| 11754 |
+
"learning_rate": 0.005,
|
| 11755 |
+
"loss": 2.441168785095215,
|
| 11756 |
+
"step": 3356
|
| 11757 |
+
},
|
| 11758 |
+
{
|
| 11759 |
+
"epoch": 0.0861025641025641,
|
| 11760 |
+
"grad_norm": 0.0859375,
|
| 11761 |
+
"learning_rate": 0.005,
|
| 11762 |
+
"loss": 2.458451271057129,
|
| 11763 |
+
"step": 3358
|
| 11764 |
+
},
|
| 11765 |
+
{
|
| 11766 |
+
"epoch": 0.08615384615384615,
|
| 11767 |
+
"grad_norm": 0.0703125,
|
| 11768 |
+
"learning_rate": 0.005,
|
| 11769 |
+
"loss": 2.439116954803467,
|
| 11770 |
+
"step": 3360
|
| 11771 |
+
},
|
| 11772 |
+
{
|
| 11773 |
+
"epoch": 0.08620512820512821,
|
| 11774 |
+
"grad_norm": 0.0703125,
|
| 11775 |
+
"learning_rate": 0.005,
|
| 11776 |
+
"loss": 2.4420626163482666,
|
| 11777 |
+
"step": 3362
|
| 11778 |
+
},
|
| 11779 |
+
{
|
| 11780 |
+
"epoch": 0.08625641025641026,
|
| 11781 |
+
"grad_norm": 0.078125,
|
| 11782 |
+
"learning_rate": 0.005,
|
| 11783 |
+
"loss": 2.4348840713500977,
|
| 11784 |
+
"step": 3364
|
| 11785 |
+
},
|
| 11786 |
+
{
|
| 11787 |
+
"epoch": 0.08630769230769231,
|
| 11788 |
+
"grad_norm": 0.061767578125,
|
| 11789 |
+
"learning_rate": 0.005,
|
| 11790 |
+
"loss": 2.4551196098327637,
|
| 11791 |
+
"step": 3366
|
| 11792 |
+
},
|
| 11793 |
+
{
|
| 11794 |
+
"epoch": 0.08635897435897436,
|
| 11795 |
+
"grad_norm": 0.064453125,
|
| 11796 |
+
"learning_rate": 0.005,
|
| 11797 |
+
"loss": 2.437046527862549,
|
| 11798 |
+
"step": 3368
|
| 11799 |
+
},
|
| 11800 |
+
{
|
| 11801 |
+
"epoch": 0.08641025641025642,
|
| 11802 |
+
"grad_norm": 0.0576171875,
|
| 11803 |
+
"learning_rate": 0.005,
|
| 11804 |
+
"loss": 2.4317398071289062,
|
| 11805 |
+
"step": 3370
|
| 11806 |
+
},
|
| 11807 |
+
{
|
| 11808 |
+
"epoch": 0.08646153846153846,
|
| 11809 |
+
"grad_norm": 0.06689453125,
|
| 11810 |
+
"learning_rate": 0.005,
|
| 11811 |
+
"loss": 2.41422176361084,
|
| 11812 |
+
"step": 3372
|
| 11813 |
+
},
|
| 11814 |
+
{
|
| 11815 |
+
"epoch": 0.08651282051282051,
|
| 11816 |
+
"grad_norm": 0.05859375,
|
| 11817 |
+
"learning_rate": 0.005,
|
| 11818 |
+
"loss": 2.4424383640289307,
|
| 11819 |
+
"step": 3374
|
| 11820 |
+
},
|
| 11821 |
+
{
|
| 11822 |
+
"epoch": 0.08656410256410256,
|
| 11823 |
+
"grad_norm": 0.05908203125,
|
| 11824 |
+
"learning_rate": 0.005,
|
| 11825 |
+
"loss": 2.4365389347076416,
|
| 11826 |
+
"step": 3376
|
| 11827 |
+
},
|
| 11828 |
+
{
|
| 11829 |
+
"epoch": 0.08661538461538462,
|
| 11830 |
+
"grad_norm": 0.058349609375,
|
| 11831 |
+
"learning_rate": 0.005,
|
| 11832 |
+
"loss": 2.4194977283477783,
|
| 11833 |
+
"step": 3378
|
| 11834 |
+
},
|
| 11835 |
+
{
|
| 11836 |
+
"epoch": 0.08666666666666667,
|
| 11837 |
+
"grad_norm": 0.0634765625,
|
| 11838 |
+
"learning_rate": 0.005,
|
| 11839 |
+
"loss": 2.4611101150512695,
|
| 11840 |
+
"step": 3380
|
| 11841 |
+
},
|
| 11842 |
+
{
|
| 11843 |
+
"epoch": 0.08671794871794872,
|
| 11844 |
+
"grad_norm": 0.0634765625,
|
| 11845 |
+
"learning_rate": 0.005,
|
| 11846 |
+
"loss": 2.422762393951416,
|
| 11847 |
+
"step": 3382
|
| 11848 |
+
},
|
| 11849 |
+
{
|
| 11850 |
+
"epoch": 0.08676923076923077,
|
| 11851 |
+
"grad_norm": 0.059814453125,
|
| 11852 |
+
"learning_rate": 0.005,
|
| 11853 |
+
"loss": 2.4219725131988525,
|
| 11854 |
+
"step": 3384
|
| 11855 |
+
},
|
| 11856 |
+
{
|
| 11857 |
+
"epoch": 0.08682051282051283,
|
| 11858 |
+
"grad_norm": 0.0673828125,
|
| 11859 |
+
"learning_rate": 0.005,
|
| 11860 |
+
"loss": 2.4388206005096436,
|
| 11861 |
+
"step": 3386
|
| 11862 |
+
},
|
| 11863 |
+
{
|
| 11864 |
+
"epoch": 0.08687179487179487,
|
| 11865 |
+
"grad_norm": 0.05615234375,
|
| 11866 |
+
"learning_rate": 0.005,
|
| 11867 |
+
"loss": 2.445829391479492,
|
| 11868 |
+
"step": 3388
|
| 11869 |
+
},
|
| 11870 |
+
{
|
| 11871 |
+
"epoch": 0.08692307692307692,
|
| 11872 |
+
"grad_norm": 0.062255859375,
|
| 11873 |
+
"learning_rate": 0.005,
|
| 11874 |
+
"loss": 2.4283435344696045,
|
| 11875 |
+
"step": 3390
|
| 11876 |
+
},
|
| 11877 |
+
{
|
| 11878 |
+
"epoch": 0.08697435897435897,
|
| 11879 |
+
"grad_norm": 0.08056640625,
|
| 11880 |
+
"learning_rate": 0.005,
|
| 11881 |
+
"loss": 2.4529857635498047,
|
| 11882 |
+
"step": 3392
|
| 11883 |
+
},
|
| 11884 |
+
{
|
| 11885 |
+
"epoch": 0.08702564102564103,
|
| 11886 |
+
"grad_norm": 0.083984375,
|
| 11887 |
+
"learning_rate": 0.005,
|
| 11888 |
+
"loss": 2.397170305252075,
|
| 11889 |
+
"step": 3394
|
| 11890 |
+
},
|
| 11891 |
+
{
|
| 11892 |
+
"epoch": 0.08707692307692308,
|
| 11893 |
+
"grad_norm": 0.09423828125,
|
| 11894 |
+
"learning_rate": 0.005,
|
| 11895 |
+
"loss": 2.4309468269348145,
|
| 11896 |
+
"step": 3396
|
| 11897 |
+
},
|
| 11898 |
+
{
|
| 11899 |
+
"epoch": 0.08712820512820513,
|
| 11900 |
+
"grad_norm": 0.0986328125,
|
| 11901 |
+
"learning_rate": 0.005,
|
| 11902 |
+
"loss": 2.4139363765716553,
|
| 11903 |
+
"step": 3398
|
| 11904 |
+
},
|
| 11905 |
+
{
|
| 11906 |
+
"epoch": 0.08717948717948718,
|
| 11907 |
+
"grad_norm": 0.0888671875,
|
| 11908 |
+
"learning_rate": 0.005,
|
| 11909 |
+
"loss": 2.4378061294555664,
|
| 11910 |
+
"step": 3400
|
| 11911 |
+
},
|
| 11912 |
+
{
|
| 11913 |
+
"epoch": 0.08723076923076924,
|
| 11914 |
+
"grad_norm": 0.07373046875,
|
| 11915 |
+
"learning_rate": 0.005,
|
| 11916 |
+
"loss": 2.3990771770477295,
|
| 11917 |
+
"step": 3402
|
| 11918 |
+
},
|
| 11919 |
+
{
|
| 11920 |
+
"epoch": 0.08728205128205128,
|
| 11921 |
+
"grad_norm": 0.06787109375,
|
| 11922 |
+
"learning_rate": 0.005,
|
| 11923 |
+
"loss": 2.421858549118042,
|
| 11924 |
+
"step": 3404
|
| 11925 |
+
},
|
| 11926 |
+
{
|
| 11927 |
+
"epoch": 0.08733333333333333,
|
| 11928 |
+
"grad_norm": 0.05810546875,
|
| 11929 |
+
"learning_rate": 0.005,
|
| 11930 |
+
"loss": 2.415131092071533,
|
| 11931 |
+
"step": 3406
|
| 11932 |
+
},
|
| 11933 |
+
{
|
| 11934 |
+
"epoch": 0.08738461538461538,
|
| 11935 |
+
"grad_norm": 0.060546875,
|
| 11936 |
+
"learning_rate": 0.005,
|
| 11937 |
+
"loss": 2.4313316345214844,
|
| 11938 |
+
"step": 3408
|
| 11939 |
+
},
|
| 11940 |
+
{
|
| 11941 |
+
"epoch": 0.08743589743589744,
|
| 11942 |
+
"grad_norm": 0.07470703125,
|
| 11943 |
+
"learning_rate": 0.005,
|
| 11944 |
+
"loss": 2.4221608638763428,
|
| 11945 |
+
"step": 3410
|
| 11946 |
+
},
|
| 11947 |
+
{
|
| 11948 |
+
"epoch": 0.08748717948717949,
|
| 11949 |
+
"grad_norm": 0.06982421875,
|
| 11950 |
+
"learning_rate": 0.005,
|
| 11951 |
+
"loss": 2.432955741882324,
|
| 11952 |
+
"step": 3412
|
| 11953 |
+
},
|
| 11954 |
+
{
|
| 11955 |
+
"epoch": 0.08753846153846154,
|
| 11956 |
+
"grad_norm": 0.0966796875,
|
| 11957 |
+
"learning_rate": 0.005,
|
| 11958 |
+
"loss": 2.436786651611328,
|
| 11959 |
+
"step": 3414
|
| 11960 |
+
},
|
| 11961 |
+
{
|
| 11962 |
+
"epoch": 0.08758974358974358,
|
| 11963 |
+
"grad_norm": 0.0810546875,
|
| 11964 |
+
"learning_rate": 0.005,
|
| 11965 |
+
"loss": 2.406233310699463,
|
| 11966 |
+
"step": 3416
|
| 11967 |
+
},
|
| 11968 |
+
{
|
| 11969 |
+
"epoch": 0.08764102564102565,
|
| 11970 |
+
"grad_norm": 0.0625,
|
| 11971 |
+
"learning_rate": 0.005,
|
| 11972 |
+
"loss": 2.410997152328491,
|
| 11973 |
+
"step": 3418
|
| 11974 |
+
},
|
| 11975 |
+
{
|
| 11976 |
+
"epoch": 0.0876923076923077,
|
| 11977 |
+
"grad_norm": 0.0625,
|
| 11978 |
+
"learning_rate": 0.005,
|
| 11979 |
+
"loss": 2.4171364307403564,
|
| 11980 |
+
"step": 3420
|
| 11981 |
+
},
|
| 11982 |
+
{
|
| 11983 |
+
"epoch": 0.08774358974358974,
|
| 11984 |
+
"grad_norm": 0.07373046875,
|
| 11985 |
+
"learning_rate": 0.005,
|
| 11986 |
+
"loss": 2.438688278198242,
|
| 11987 |
+
"step": 3422
|
| 11988 |
+
},
|
| 11989 |
+
{
|
| 11990 |
+
"epoch": 0.08779487179487179,
|
| 11991 |
+
"grad_norm": 0.06689453125,
|
| 11992 |
+
"learning_rate": 0.005,
|
| 11993 |
+
"loss": 2.410836696624756,
|
| 11994 |
+
"step": 3424
|
| 11995 |
+
},
|
| 11996 |
+
{
|
| 11997 |
+
"epoch": 0.08784615384615385,
|
| 11998 |
+
"grad_norm": 0.050537109375,
|
| 11999 |
+
"learning_rate": 0.005,
|
| 12000 |
+
"loss": 2.3841400146484375,
|
| 12001 |
+
"step": 3426
|
| 12002 |
+
},
|
| 12003 |
+
{
|
| 12004 |
+
"epoch": 0.0878974358974359,
|
| 12005 |
+
"grad_norm": 0.055908203125,
|
| 12006 |
+
"learning_rate": 0.005,
|
| 12007 |
+
"loss": 2.380362033843994,
|
| 12008 |
+
"step": 3428
|
| 12009 |
+
},
|
| 12010 |
+
{
|
| 12011 |
+
"epoch": 0.08794871794871795,
|
| 12012 |
+
"grad_norm": 0.060791015625,
|
| 12013 |
+
"learning_rate": 0.005,
|
| 12014 |
+
"loss": 2.4467859268188477,
|
| 12015 |
+
"step": 3430
|
| 12016 |
+
},
|
| 12017 |
+
{
|
| 12018 |
+
"epoch": 0.088,
|
| 12019 |
+
"grad_norm": 0.06201171875,
|
| 12020 |
+
"learning_rate": 0.005,
|
| 12021 |
+
"loss": 2.415534019470215,
|
| 12022 |
+
"step": 3432
|
| 12023 |
+
},
|
| 12024 |
+
{
|
| 12025 |
+
"epoch": 0.08805128205128206,
|
| 12026 |
+
"grad_norm": 0.06787109375,
|
| 12027 |
+
"learning_rate": 0.005,
|
| 12028 |
+
"loss": 2.4545233249664307,
|
| 12029 |
+
"step": 3434
|
| 12030 |
+
},
|
| 12031 |
+
{
|
| 12032 |
+
"epoch": 0.0881025641025641,
|
| 12033 |
+
"grad_norm": 0.0771484375,
|
| 12034 |
+
"learning_rate": 0.005,
|
| 12035 |
+
"loss": 2.381844997406006,
|
| 12036 |
+
"step": 3436
|
| 12037 |
+
},
|
| 12038 |
+
{
|
| 12039 |
+
"epoch": 0.08815384615384615,
|
| 12040 |
+
"grad_norm": 0.09033203125,
|
| 12041 |
+
"learning_rate": 0.005,
|
| 12042 |
+
"loss": 2.4263484477996826,
|
| 12043 |
+
"step": 3438
|
| 12044 |
+
},
|
| 12045 |
+
{
|
| 12046 |
+
"epoch": 0.0882051282051282,
|
| 12047 |
+
"grad_norm": 0.06396484375,
|
| 12048 |
+
"learning_rate": 0.005,
|
| 12049 |
+
"loss": 2.4255788326263428,
|
| 12050 |
+
"step": 3440
|
| 12051 |
+
},
|
| 12052 |
+
{
|
| 12053 |
+
"epoch": 0.08825641025641026,
|
| 12054 |
+
"grad_norm": 0.07080078125,
|
| 12055 |
+
"learning_rate": 0.005,
|
| 12056 |
+
"loss": 2.4330549240112305,
|
| 12057 |
+
"step": 3442
|
| 12058 |
+
},
|
| 12059 |
+
{
|
| 12060 |
+
"epoch": 0.08830769230769231,
|
| 12061 |
+
"grad_norm": 0.0615234375,
|
| 12062 |
+
"learning_rate": 0.005,
|
| 12063 |
+
"loss": 2.402039051055908,
|
| 12064 |
+
"step": 3444
|
| 12065 |
+
},
|
| 12066 |
+
{
|
| 12067 |
+
"epoch": 0.08835897435897436,
|
| 12068 |
+
"grad_norm": 0.0625,
|
| 12069 |
+
"learning_rate": 0.005,
|
| 12070 |
+
"loss": 2.4456162452697754,
|
| 12071 |
+
"step": 3446
|
| 12072 |
+
},
|
| 12073 |
+
{
|
| 12074 |
+
"epoch": 0.0884102564102564,
|
| 12075 |
+
"grad_norm": 0.06884765625,
|
| 12076 |
+
"learning_rate": 0.005,
|
| 12077 |
+
"loss": 2.4226267337799072,
|
| 12078 |
+
"step": 3448
|
| 12079 |
+
},
|
| 12080 |
+
{
|
| 12081 |
+
"epoch": 0.08846153846153847,
|
| 12082 |
+
"grad_norm": 0.0712890625,
|
| 12083 |
+
"learning_rate": 0.005,
|
| 12084 |
+
"loss": 2.441948890686035,
|
| 12085 |
+
"step": 3450
|
| 12086 |
+
},
|
| 12087 |
+
{
|
| 12088 |
+
"epoch": 0.08851282051282051,
|
| 12089 |
+
"grad_norm": 0.064453125,
|
| 12090 |
+
"learning_rate": 0.005,
|
| 12091 |
+
"loss": 2.401826858520508,
|
| 12092 |
+
"step": 3452
|
| 12093 |
+
},
|
| 12094 |
+
{
|
| 12095 |
+
"epoch": 0.08856410256410256,
|
| 12096 |
+
"grad_norm": 0.0625,
|
| 12097 |
+
"learning_rate": 0.005,
|
| 12098 |
+
"loss": 2.436375617980957,
|
| 12099 |
+
"step": 3454
|
| 12100 |
+
},
|
| 12101 |
+
{
|
| 12102 |
+
"epoch": 0.08861538461538461,
|
| 12103 |
+
"grad_norm": 0.060791015625,
|
| 12104 |
+
"learning_rate": 0.005,
|
| 12105 |
+
"loss": 2.412294864654541,
|
| 12106 |
+
"step": 3456
|
| 12107 |
+
},
|
| 12108 |
+
{
|
| 12109 |
+
"epoch": 0.08866666666666667,
|
| 12110 |
+
"grad_norm": 0.07373046875,
|
| 12111 |
+
"learning_rate": 0.005,
|
| 12112 |
+
"loss": 2.4124433994293213,
|
| 12113 |
+
"step": 3458
|
| 12114 |
+
},
|
| 12115 |
+
{
|
| 12116 |
+
"epoch": 0.08871794871794872,
|
| 12117 |
+
"grad_norm": 0.06298828125,
|
| 12118 |
+
"learning_rate": 0.005,
|
| 12119 |
+
"loss": 2.4394173622131348,
|
| 12120 |
+
"step": 3460
|
| 12121 |
+
},
|
| 12122 |
+
{
|
| 12123 |
+
"epoch": 0.08876923076923077,
|
| 12124 |
+
"grad_norm": 0.062255859375,
|
| 12125 |
+
"learning_rate": 0.005,
|
| 12126 |
+
"loss": 2.4201502799987793,
|
| 12127 |
+
"step": 3462
|
| 12128 |
+
},
|
| 12129 |
+
{
|
| 12130 |
+
"epoch": 0.08882051282051281,
|
| 12131 |
+
"grad_norm": 0.05419921875,
|
| 12132 |
+
"learning_rate": 0.005,
|
| 12133 |
+
"loss": 2.398808479309082,
|
| 12134 |
+
"step": 3464
|
| 12135 |
+
},
|
| 12136 |
+
{
|
| 12137 |
+
"epoch": 0.08887179487179488,
|
| 12138 |
+
"grad_norm": 0.056640625,
|
| 12139 |
+
"learning_rate": 0.005,
|
| 12140 |
+
"loss": 2.375814437866211,
|
| 12141 |
+
"step": 3466
|
| 12142 |
+
},
|
| 12143 |
+
{
|
| 12144 |
+
"epoch": 0.08892307692307692,
|
| 12145 |
+
"grad_norm": 0.06494140625,
|
| 12146 |
+
"learning_rate": 0.005,
|
| 12147 |
+
"loss": 2.424652099609375,
|
| 12148 |
+
"step": 3468
|
| 12149 |
+
},
|
| 12150 |
+
{
|
| 12151 |
+
"epoch": 0.08897435897435897,
|
| 12152 |
+
"grad_norm": 0.0615234375,
|
| 12153 |
+
"learning_rate": 0.005,
|
| 12154 |
+
"loss": 2.409891128540039,
|
| 12155 |
+
"step": 3470
|
| 12156 |
+
},
|
| 12157 |
+
{
|
| 12158 |
+
"epoch": 0.08902564102564102,
|
| 12159 |
+
"grad_norm": 0.080078125,
|
| 12160 |
+
"learning_rate": 0.005,
|
| 12161 |
+
"loss": 2.432053565979004,
|
| 12162 |
+
"step": 3472
|
| 12163 |
+
},
|
| 12164 |
+
{
|
| 12165 |
+
"epoch": 0.08907692307692308,
|
| 12166 |
+
"grad_norm": 0.06884765625,
|
| 12167 |
+
"learning_rate": 0.005,
|
| 12168 |
+
"loss": 2.4098362922668457,
|
| 12169 |
+
"step": 3474
|
| 12170 |
+
},
|
| 12171 |
+
{
|
| 12172 |
+
"epoch": 0.08912820512820513,
|
| 12173 |
+
"grad_norm": 0.060302734375,
|
| 12174 |
+
"learning_rate": 0.005,
|
| 12175 |
+
"loss": 2.4270172119140625,
|
| 12176 |
+
"step": 3476
|
| 12177 |
+
},
|
| 12178 |
+
{
|
| 12179 |
+
"epoch": 0.08917948717948718,
|
| 12180 |
+
"grad_norm": 0.06982421875,
|
| 12181 |
+
"learning_rate": 0.005,
|
| 12182 |
+
"loss": 2.4070467948913574,
|
| 12183 |
+
"step": 3478
|
| 12184 |
+
},
|
| 12185 |
+
{
|
| 12186 |
+
"epoch": 0.08923076923076922,
|
| 12187 |
+
"grad_norm": 0.08447265625,
|
| 12188 |
+
"learning_rate": 0.005,
|
| 12189 |
+
"loss": 2.3918356895446777,
|
| 12190 |
+
"step": 3480
|
| 12191 |
+
},
|
| 12192 |
+
{
|
| 12193 |
+
"epoch": 0.08928205128205129,
|
| 12194 |
+
"grad_norm": 0.095703125,
|
| 12195 |
+
"learning_rate": 0.005,
|
| 12196 |
+
"loss": 2.4064207077026367,
|
| 12197 |
+
"step": 3482
|
| 12198 |
+
},
|
| 12199 |
+
{
|
| 12200 |
+
"epoch": 0.08933333333333333,
|
| 12201 |
+
"grad_norm": 0.09130859375,
|
| 12202 |
+
"learning_rate": 0.005,
|
| 12203 |
+
"loss": 2.420017719268799,
|
| 12204 |
+
"step": 3484
|
| 12205 |
+
},
|
| 12206 |
+
{
|
| 12207 |
+
"epoch": 0.08938461538461538,
|
| 12208 |
+
"grad_norm": 0.0869140625,
|
| 12209 |
+
"learning_rate": 0.005,
|
| 12210 |
+
"loss": 2.4369912147521973,
|
| 12211 |
+
"step": 3486
|
| 12212 |
+
},
|
| 12213 |
+
{
|
| 12214 |
+
"epoch": 0.08943589743589743,
|
| 12215 |
+
"grad_norm": 0.0869140625,
|
| 12216 |
+
"learning_rate": 0.005,
|
| 12217 |
+
"loss": 2.427253007888794,
|
| 12218 |
+
"step": 3488
|
| 12219 |
+
},
|
| 12220 |
+
{
|
| 12221 |
+
"epoch": 0.08948717948717949,
|
| 12222 |
+
"grad_norm": 0.07958984375,
|
| 12223 |
+
"learning_rate": 0.005,
|
| 12224 |
+
"loss": 2.39182448387146,
|
| 12225 |
+
"step": 3490
|
| 12226 |
+
},
|
| 12227 |
+
{
|
| 12228 |
+
"epoch": 0.08953846153846154,
|
| 12229 |
+
"grad_norm": 0.080078125,
|
| 12230 |
+
"learning_rate": 0.005,
|
| 12231 |
+
"loss": 2.4094839096069336,
|
| 12232 |
+
"step": 3492
|
| 12233 |
+
},
|
| 12234 |
+
{
|
| 12235 |
+
"epoch": 0.08958974358974359,
|
| 12236 |
+
"grad_norm": 0.0654296875,
|
| 12237 |
+
"learning_rate": 0.005,
|
| 12238 |
+
"loss": 2.438591480255127,
|
| 12239 |
+
"step": 3494
|
| 12240 |
+
},
|
| 12241 |
+
{
|
| 12242 |
+
"epoch": 0.08964102564102563,
|
| 12243 |
+
"grad_norm": 0.0654296875,
|
| 12244 |
+
"learning_rate": 0.005,
|
| 12245 |
+
"loss": 2.42618727684021,
|
| 12246 |
+
"step": 3496
|
| 12247 |
+
},
|
| 12248 |
+
{
|
| 12249 |
+
"epoch": 0.0896923076923077,
|
| 12250 |
+
"grad_norm": 0.08544921875,
|
| 12251 |
+
"learning_rate": 0.005,
|
| 12252 |
+
"loss": 2.382410764694214,
|
| 12253 |
+
"step": 3498
|
| 12254 |
+
},
|
| 12255 |
+
{
|
| 12256 |
+
"epoch": 0.08974358974358974,
|
| 12257 |
+
"grad_norm": 0.08544921875,
|
| 12258 |
+
"learning_rate": 0.005,
|
| 12259 |
+
"loss": 2.4329473972320557,
|
| 12260 |
+
"step": 3500
|
| 12261 |
+
},
|
| 12262 |
+
{
|
| 12263 |
+
"epoch": 0.08979487179487179,
|
| 12264 |
+
"grad_norm": 0.0830078125,
|
| 12265 |
+
"learning_rate": 0.005,
|
| 12266 |
+
"loss": 2.388413429260254,
|
| 12267 |
+
"step": 3502
|
| 12268 |
+
},
|
| 12269 |
+
{
|
| 12270 |
+
"epoch": 0.08984615384615384,
|
| 12271 |
+
"grad_norm": 0.08837890625,
|
| 12272 |
+
"learning_rate": 0.005,
|
| 12273 |
+
"loss": 2.4049899578094482,
|
| 12274 |
+
"step": 3504
|
| 12275 |
+
},
|
| 12276 |
+
{
|
| 12277 |
+
"epoch": 0.0898974358974359,
|
| 12278 |
+
"grad_norm": 0.111328125,
|
| 12279 |
+
"learning_rate": 0.005,
|
| 12280 |
+
"loss": 2.437688112258911,
|
| 12281 |
+
"step": 3506
|
| 12282 |
+
},
|
| 12283 |
+
{
|
| 12284 |
+
"epoch": 0.08994871794871795,
|
| 12285 |
+
"grad_norm": 0.08447265625,
|
| 12286 |
+
"learning_rate": 0.005,
|
| 12287 |
+
"loss": 2.3892924785614014,
|
| 12288 |
+
"step": 3508
|
| 12289 |
+
},
|
| 12290 |
+
{
|
| 12291 |
+
"epoch": 0.09,
|
| 12292 |
+
"grad_norm": 0.08056640625,
|
| 12293 |
+
"learning_rate": 0.005,
|
| 12294 |
+
"loss": 2.4080541133880615,
|
| 12295 |
+
"step": 3510
|
| 12296 |
+
},
|
| 12297 |
+
{
|
| 12298 |
+
"epoch": 0.09005128205128204,
|
| 12299 |
+
"grad_norm": 0.07177734375,
|
| 12300 |
+
"learning_rate": 0.005,
|
| 12301 |
+
"loss": 2.4130971431732178,
|
| 12302 |
+
"step": 3512
|
| 12303 |
+
},
|
| 12304 |
+
{
|
| 12305 |
+
"epoch": 0.0901025641025641,
|
| 12306 |
+
"grad_norm": 0.0673828125,
|
| 12307 |
+
"learning_rate": 0.005,
|
| 12308 |
+
"loss": 2.427222490310669,
|
| 12309 |
+
"step": 3514
|
| 12310 |
+
},
|
| 12311 |
+
{
|
| 12312 |
+
"epoch": 0.09015384615384615,
|
| 12313 |
+
"grad_norm": 0.057861328125,
|
| 12314 |
+
"learning_rate": 0.005,
|
| 12315 |
+
"loss": 2.400911331176758,
|
| 12316 |
+
"step": 3516
|
| 12317 |
+
},
|
| 12318 |
+
{
|
| 12319 |
+
"epoch": 0.0902051282051282,
|
| 12320 |
+
"grad_norm": 0.06640625,
|
| 12321 |
+
"learning_rate": 0.005,
|
| 12322 |
+
"loss": 2.4189445972442627,
|
| 12323 |
+
"step": 3518
|
| 12324 |
+
},
|
| 12325 |
+
{
|
| 12326 |
+
"epoch": 0.09025641025641026,
|
| 12327 |
+
"grad_norm": 0.06982421875,
|
| 12328 |
+
"learning_rate": 0.005,
|
| 12329 |
+
"loss": 2.3932156562805176,
|
| 12330 |
+
"step": 3520
|
| 12331 |
+
},
|
| 12332 |
+
{
|
| 12333 |
+
"epoch": 0.09030769230769231,
|
| 12334 |
+
"grad_norm": 0.0703125,
|
| 12335 |
+
"learning_rate": 0.005,
|
| 12336 |
+
"loss": 2.4242119789123535,
|
| 12337 |
+
"step": 3522
|
| 12338 |
+
},
|
| 12339 |
+
{
|
| 12340 |
+
"epoch": 0.09035897435897436,
|
| 12341 |
+
"grad_norm": 0.0673828125,
|
| 12342 |
+
"learning_rate": 0.005,
|
| 12343 |
+
"loss": 2.3860199451446533,
|
| 12344 |
+
"step": 3524
|
| 12345 |
+
},
|
| 12346 |
+
{
|
| 12347 |
+
"epoch": 0.0904102564102564,
|
| 12348 |
+
"grad_norm": 0.087890625,
|
| 12349 |
+
"learning_rate": 0.005,
|
| 12350 |
+
"loss": 2.4148826599121094,
|
| 12351 |
+
"step": 3526
|
| 12352 |
+
},
|
| 12353 |
+
{
|
| 12354 |
+
"epoch": 0.09046153846153847,
|
| 12355 |
+
"grad_norm": 0.119140625,
|
| 12356 |
+
"learning_rate": 0.005,
|
| 12357 |
+
"loss": 2.429211378097534,
|
| 12358 |
+
"step": 3528
|
| 12359 |
+
},
|
| 12360 |
+
{
|
| 12361 |
+
"epoch": 0.09051282051282052,
|
| 12362 |
+
"grad_norm": 0.09375,
|
| 12363 |
+
"learning_rate": 0.005,
|
| 12364 |
+
"loss": 2.4366657733917236,
|
| 12365 |
+
"step": 3530
|
| 12366 |
+
},
|
| 12367 |
+
{
|
| 12368 |
+
"epoch": 0.09056410256410256,
|
| 12369 |
+
"grad_norm": 0.06787109375,
|
| 12370 |
+
"learning_rate": 0.005,
|
| 12371 |
+
"loss": 2.383744239807129,
|
| 12372 |
+
"step": 3532
|
| 12373 |
+
},
|
| 12374 |
+
{
|
| 12375 |
+
"epoch": 0.09061538461538461,
|
| 12376 |
+
"grad_norm": 0.05908203125,
|
| 12377 |
+
"learning_rate": 0.005,
|
| 12378 |
+
"loss": 2.4217171669006348,
|
| 12379 |
+
"step": 3534
|
| 12380 |
+
},
|
| 12381 |
+
{
|
| 12382 |
+
"epoch": 0.09066666666666667,
|
| 12383 |
+
"grad_norm": 0.0625,
|
| 12384 |
+
"learning_rate": 0.005,
|
| 12385 |
+
"loss": 2.429645538330078,
|
| 12386 |
+
"step": 3536
|
| 12387 |
+
},
|
| 12388 |
+
{
|
| 12389 |
+
"epoch": 0.09071794871794872,
|
| 12390 |
+
"grad_norm": 0.06787109375,
|
| 12391 |
+
"learning_rate": 0.005,
|
| 12392 |
+
"loss": 2.39499831199646,
|
| 12393 |
+
"step": 3538
|
| 12394 |
+
},
|
| 12395 |
+
{
|
| 12396 |
+
"epoch": 0.09076923076923077,
|
| 12397 |
+
"grad_norm": 0.07666015625,
|
| 12398 |
+
"learning_rate": 0.005,
|
| 12399 |
+
"loss": 2.4183130264282227,
|
| 12400 |
+
"step": 3540
|
| 12401 |
+
},
|
| 12402 |
+
{
|
| 12403 |
+
"epoch": 0.09082051282051282,
|
| 12404 |
+
"grad_norm": 0.061279296875,
|
| 12405 |
+
"learning_rate": 0.005,
|
| 12406 |
+
"loss": 2.4454190731048584,
|
| 12407 |
+
"step": 3542
|
| 12408 |
+
},
|
| 12409 |
+
{
|
| 12410 |
+
"epoch": 0.09087179487179488,
|
| 12411 |
+
"grad_norm": 0.061279296875,
|
| 12412 |
+
"learning_rate": 0.005,
|
| 12413 |
+
"loss": 2.4304327964782715,
|
| 12414 |
+
"step": 3544
|
| 12415 |
+
},
|
| 12416 |
+
{
|
| 12417 |
+
"epoch": 0.09092307692307693,
|
| 12418 |
+
"grad_norm": 0.05615234375,
|
| 12419 |
+
"learning_rate": 0.005,
|
| 12420 |
+
"loss": 2.4137845039367676,
|
| 12421 |
+
"step": 3546
|
| 12422 |
+
},
|
| 12423 |
+
{
|
| 12424 |
+
"epoch": 0.09097435897435897,
|
| 12425 |
+
"grad_norm": 0.06396484375,
|
| 12426 |
+
"learning_rate": 0.005,
|
| 12427 |
+
"loss": 2.448103904724121,
|
| 12428 |
+
"step": 3548
|
| 12429 |
+
},
|
| 12430 |
+
{
|
| 12431 |
+
"epoch": 0.09102564102564102,
|
| 12432 |
+
"grad_norm": 0.0673828125,
|
| 12433 |
+
"learning_rate": 0.005,
|
| 12434 |
+
"loss": 2.3907322883605957,
|
| 12435 |
+
"step": 3550
|
| 12436 |
+
},
|
| 12437 |
+
{
|
| 12438 |
+
"epoch": 0.09107692307692308,
|
| 12439 |
+
"grad_norm": 0.08154296875,
|
| 12440 |
+
"learning_rate": 0.005,
|
| 12441 |
+
"loss": 2.4197568893432617,
|
| 12442 |
+
"step": 3552
|
| 12443 |
+
},
|
| 12444 |
+
{
|
| 12445 |
+
"epoch": 0.09112820512820513,
|
| 12446 |
+
"grad_norm": 0.08349609375,
|
| 12447 |
+
"learning_rate": 0.005,
|
| 12448 |
+
"loss": 2.421808958053589,
|
| 12449 |
+
"step": 3554
|
| 12450 |
+
},
|
| 12451 |
+
{
|
| 12452 |
+
"epoch": 0.09117948717948718,
|
| 12453 |
+
"grad_norm": 0.078125,
|
| 12454 |
+
"learning_rate": 0.005,
|
| 12455 |
+
"loss": 2.434272527694702,
|
| 12456 |
+
"step": 3556
|
| 12457 |
+
},
|
| 12458 |
+
{
|
| 12459 |
+
"epoch": 0.09123076923076923,
|
| 12460 |
+
"grad_norm": 0.0751953125,
|
| 12461 |
+
"learning_rate": 0.005,
|
| 12462 |
+
"loss": 2.4245188236236572,
|
| 12463 |
+
"step": 3558
|
| 12464 |
+
},
|
| 12465 |
+
{
|
| 12466 |
+
"epoch": 0.09128205128205129,
|
| 12467 |
+
"grad_norm": 0.0810546875,
|
| 12468 |
+
"learning_rate": 0.005,
|
| 12469 |
+
"loss": 2.389435052871704,
|
| 12470 |
+
"step": 3560
|
| 12471 |
+
},
|
| 12472 |
+
{
|
| 12473 |
+
"epoch": 0.09133333333333334,
|
| 12474 |
+
"grad_norm": 0.06787109375,
|
| 12475 |
+
"learning_rate": 0.005,
|
| 12476 |
+
"loss": 2.425550937652588,
|
| 12477 |
+
"step": 3562
|
| 12478 |
+
},
|
| 12479 |
+
{
|
| 12480 |
+
"epoch": 0.09138461538461538,
|
| 12481 |
+
"grad_norm": 0.058837890625,
|
| 12482 |
+
"learning_rate": 0.005,
|
| 12483 |
+
"loss": 2.424254894256592,
|
| 12484 |
+
"step": 3564
|
| 12485 |
+
},
|
| 12486 |
+
{
|
| 12487 |
+
"epoch": 0.09143589743589743,
|
| 12488 |
+
"grad_norm": 0.0625,
|
| 12489 |
+
"learning_rate": 0.005,
|
| 12490 |
+
"loss": 2.409860610961914,
|
| 12491 |
+
"step": 3566
|
| 12492 |
+
},
|
| 12493 |
+
{
|
| 12494 |
+
"epoch": 0.09148717948717949,
|
| 12495 |
+
"grad_norm": 0.05517578125,
|
| 12496 |
+
"learning_rate": 0.005,
|
| 12497 |
+
"loss": 2.363229751586914,
|
| 12498 |
+
"step": 3568
|
| 12499 |
+
},
|
| 12500 |
+
{
|
| 12501 |
+
"epoch": 0.09153846153846154,
|
| 12502 |
+
"grad_norm": 0.07421875,
|
| 12503 |
+
"learning_rate": 0.005,
|
| 12504 |
+
"loss": 2.439145803451538,
|
| 12505 |
+
"step": 3570
|
| 12506 |
+
},
|
| 12507 |
+
{
|
| 12508 |
+
"epoch": 0.09158974358974359,
|
| 12509 |
+
"grad_norm": 0.07568359375,
|
| 12510 |
+
"learning_rate": 0.005,
|
| 12511 |
+
"loss": 2.4280641078948975,
|
| 12512 |
+
"step": 3572
|
| 12513 |
+
},
|
| 12514 |
+
{
|
| 12515 |
+
"epoch": 0.09164102564102564,
|
| 12516 |
+
"grad_norm": 0.06201171875,
|
| 12517 |
+
"learning_rate": 0.005,
|
| 12518 |
+
"loss": 2.435570478439331,
|
| 12519 |
+
"step": 3574
|
| 12520 |
+
},
|
| 12521 |
+
{
|
| 12522 |
+
"epoch": 0.0916923076923077,
|
| 12523 |
+
"grad_norm": 0.05517578125,
|
| 12524 |
+
"learning_rate": 0.005,
|
| 12525 |
+
"loss": 2.417301654815674,
|
| 12526 |
+
"step": 3576
|
| 12527 |
+
},
|
| 12528 |
+
{
|
| 12529 |
+
"epoch": 0.09174358974358975,
|
| 12530 |
+
"grad_norm": 0.055419921875,
|
| 12531 |
+
"learning_rate": 0.005,
|
| 12532 |
+
"loss": 2.407351016998291,
|
| 12533 |
+
"step": 3578
|
| 12534 |
+
},
|
| 12535 |
+
{
|
| 12536 |
+
"epoch": 0.0917948717948718,
|
| 12537 |
+
"grad_norm": 0.078125,
|
| 12538 |
+
"learning_rate": 0.005,
|
| 12539 |
+
"loss": 2.403982162475586,
|
| 12540 |
+
"step": 3580
|
| 12541 |
+
},
|
| 12542 |
+
{
|
| 12543 |
+
"epoch": 0.09184615384615384,
|
| 12544 |
+
"grad_norm": 0.0634765625,
|
| 12545 |
+
"learning_rate": 0.005,
|
| 12546 |
+
"loss": 2.3989107608795166,
|
| 12547 |
+
"step": 3582
|
| 12548 |
+
},
|
| 12549 |
+
{
|
| 12550 |
+
"epoch": 0.0918974358974359,
|
| 12551 |
+
"grad_norm": 0.05810546875,
|
| 12552 |
+
"learning_rate": 0.005,
|
| 12553 |
+
"loss": 2.40513277053833,
|
| 12554 |
+
"step": 3584
|
| 12555 |
+
},
|
| 12556 |
+
{
|
| 12557 |
+
"epoch": 0.09194871794871795,
|
| 12558 |
+
"grad_norm": 0.06396484375,
|
| 12559 |
+
"learning_rate": 0.005,
|
| 12560 |
+
"loss": 2.4072911739349365,
|
| 12561 |
+
"step": 3586
|
| 12562 |
+
},
|
| 12563 |
+
{
|
| 12564 |
+
"epoch": 0.092,
|
| 12565 |
+
"grad_norm": 0.06494140625,
|
| 12566 |
+
"learning_rate": 0.005,
|
| 12567 |
+
"loss": 2.35957932472229,
|
| 12568 |
+
"step": 3588
|
| 12569 |
+
},
|
| 12570 |
+
{
|
| 12571 |
+
"epoch": 0.09205128205128205,
|
| 12572 |
+
"grad_norm": 0.06005859375,
|
| 12573 |
+
"learning_rate": 0.005,
|
| 12574 |
+
"loss": 2.406644105911255,
|
| 12575 |
+
"step": 3590
|
| 12576 |
+
},
|
| 12577 |
+
{
|
| 12578 |
+
"epoch": 0.09210256410256411,
|
| 12579 |
+
"grad_norm": 0.059814453125,
|
| 12580 |
+
"learning_rate": 0.005,
|
| 12581 |
+
"loss": 2.397491216659546,
|
| 12582 |
+
"step": 3592
|
| 12583 |
+
},
|
| 12584 |
+
{
|
| 12585 |
+
"epoch": 0.09215384615384616,
|
| 12586 |
+
"grad_norm": 0.05712890625,
|
| 12587 |
+
"learning_rate": 0.005,
|
| 12588 |
+
"loss": 2.397984743118286,
|
| 12589 |
+
"step": 3594
|
| 12590 |
+
},
|
| 12591 |
+
{
|
| 12592 |
+
"epoch": 0.0922051282051282,
|
| 12593 |
+
"grad_norm": 0.055419921875,
|
| 12594 |
+
"learning_rate": 0.005,
|
| 12595 |
+
"loss": 2.4041407108306885,
|
| 12596 |
+
"step": 3596
|
| 12597 |
+
},
|
| 12598 |
+
{
|
| 12599 |
+
"epoch": 0.09225641025641025,
|
| 12600 |
+
"grad_norm": 0.061767578125,
|
| 12601 |
+
"learning_rate": 0.005,
|
| 12602 |
+
"loss": 2.396200656890869,
|
| 12603 |
+
"step": 3598
|
| 12604 |
+
},
|
| 12605 |
+
{
|
| 12606 |
+
"epoch": 0.09230769230769231,
|
| 12607 |
+
"grad_norm": 0.06787109375,
|
| 12608 |
+
"learning_rate": 0.005,
|
| 12609 |
+
"loss": 2.4111366271972656,
|
| 12610 |
+
"step": 3600
|
| 12611 |
+
},
|
| 12612 |
+
{
|
| 12613 |
+
"epoch": 0.09235897435897436,
|
| 12614 |
+
"grad_norm": 0.08935546875,
|
| 12615 |
+
"learning_rate": 0.005,
|
| 12616 |
+
"loss": 2.435255289077759,
|
| 12617 |
+
"step": 3602
|
| 12618 |
+
},
|
| 12619 |
+
{
|
| 12620 |
+
"epoch": 0.09241025641025641,
|
| 12621 |
+
"grad_norm": 0.0849609375,
|
| 12622 |
+
"learning_rate": 0.005,
|
| 12623 |
+
"loss": 2.425633668899536,
|
| 12624 |
+
"step": 3604
|
| 12625 |
+
},
|
| 12626 |
+
{
|
| 12627 |
+
"epoch": 0.09246153846153846,
|
| 12628 |
+
"grad_norm": 0.078125,
|
| 12629 |
+
"learning_rate": 0.005,
|
| 12630 |
+
"loss": 2.3895058631896973,
|
| 12631 |
+
"step": 3606
|
| 12632 |
+
},
|
| 12633 |
+
{
|
| 12634 |
+
"epoch": 0.09251282051282052,
|
| 12635 |
+
"grad_norm": 0.0634765625,
|
| 12636 |
+
"learning_rate": 0.005,
|
| 12637 |
+
"loss": 2.4378089904785156,
|
| 12638 |
+
"step": 3608
|
| 12639 |
+
},
|
| 12640 |
+
{
|
| 12641 |
+
"epoch": 0.09256410256410257,
|
| 12642 |
+
"grad_norm": 0.0654296875,
|
| 12643 |
+
"learning_rate": 0.005,
|
| 12644 |
+
"loss": 2.4039554595947266,
|
| 12645 |
+
"step": 3610
|
| 12646 |
+
},
|
| 12647 |
+
{
|
| 12648 |
+
"epoch": 0.09261538461538461,
|
| 12649 |
+
"grad_norm": 0.057373046875,
|
| 12650 |
+
"learning_rate": 0.005,
|
| 12651 |
+
"loss": 2.4199953079223633,
|
| 12652 |
+
"step": 3612
|
| 12653 |
+
},
|
| 12654 |
+
{
|
| 12655 |
+
"epoch": 0.09266666666666666,
|
| 12656 |
+
"grad_norm": 0.06298828125,
|
| 12657 |
+
"learning_rate": 0.005,
|
| 12658 |
+
"loss": 2.392930746078491,
|
| 12659 |
+
"step": 3614
|
| 12660 |
+
},
|
| 12661 |
+
{
|
| 12662 |
+
"epoch": 0.09271794871794872,
|
| 12663 |
+
"grad_norm": 0.057861328125,
|
| 12664 |
+
"learning_rate": 0.005,
|
| 12665 |
+
"loss": 2.394972085952759,
|
| 12666 |
+
"step": 3616
|
| 12667 |
+
},
|
| 12668 |
+
{
|
| 12669 |
+
"epoch": 0.09276923076923077,
|
| 12670 |
+
"grad_norm": 0.08935546875,
|
| 12671 |
+
"learning_rate": 0.005,
|
| 12672 |
+
"loss": 2.420872211456299,
|
| 12673 |
+
"step": 3618
|
| 12674 |
+
},
|
| 12675 |
+
{
|
| 12676 |
+
"epoch": 0.09282051282051282,
|
| 12677 |
+
"grad_norm": 0.076171875,
|
| 12678 |
+
"learning_rate": 0.005,
|
| 12679 |
+
"loss": 2.425184965133667,
|
| 12680 |
+
"step": 3620
|
| 12681 |
+
},
|
| 12682 |
+
{
|
| 12683 |
+
"epoch": 0.09287179487179487,
|
| 12684 |
+
"grad_norm": 0.0791015625,
|
| 12685 |
+
"learning_rate": 0.005,
|
| 12686 |
+
"loss": 2.384809970855713,
|
| 12687 |
+
"step": 3622
|
| 12688 |
+
},
|
| 12689 |
+
{
|
| 12690 |
+
"epoch": 0.09292307692307693,
|
| 12691 |
+
"grad_norm": 0.08203125,
|
| 12692 |
+
"learning_rate": 0.005,
|
| 12693 |
+
"loss": 2.4473273754119873,
|
| 12694 |
+
"step": 3624
|
| 12695 |
+
},
|
| 12696 |
+
{
|
| 12697 |
+
"epoch": 0.09297435897435898,
|
| 12698 |
+
"grad_norm": 0.060546875,
|
| 12699 |
+
"learning_rate": 0.005,
|
| 12700 |
+
"loss": 2.4268341064453125,
|
| 12701 |
+
"step": 3626
|
| 12702 |
+
},
|
| 12703 |
+
{
|
| 12704 |
+
"epoch": 0.09302564102564102,
|
| 12705 |
+
"grad_norm": 0.054443359375,
|
| 12706 |
+
"learning_rate": 0.005,
|
| 12707 |
+
"loss": 2.4285922050476074,
|
| 12708 |
+
"step": 3628
|
| 12709 |
+
},
|
| 12710 |
+
{
|
| 12711 |
+
"epoch": 0.09307692307692307,
|
| 12712 |
+
"grad_norm": 0.06787109375,
|
| 12713 |
+
"learning_rate": 0.005,
|
| 12714 |
+
"loss": 2.41914963722229,
|
| 12715 |
+
"step": 3630
|
| 12716 |
+
},
|
| 12717 |
+
{
|
| 12718 |
+
"epoch": 0.09312820512820513,
|
| 12719 |
+
"grad_norm": 0.061767578125,
|
| 12720 |
+
"learning_rate": 0.005,
|
| 12721 |
+
"loss": 2.415557861328125,
|
| 12722 |
+
"step": 3632
|
| 12723 |
+
},
|
| 12724 |
+
{
|
| 12725 |
+
"epoch": 0.09317948717948718,
|
| 12726 |
+
"grad_norm": 0.064453125,
|
| 12727 |
+
"learning_rate": 0.005,
|
| 12728 |
+
"loss": 2.3889355659484863,
|
| 12729 |
+
"step": 3634
|
| 12730 |
+
},
|
| 12731 |
+
{
|
| 12732 |
+
"epoch": 0.09323076923076923,
|
| 12733 |
+
"grad_norm": 0.060302734375,
|
| 12734 |
+
"learning_rate": 0.005,
|
| 12735 |
+
"loss": 2.41190767288208,
|
| 12736 |
+
"step": 3636
|
| 12737 |
+
},
|
| 12738 |
+
{
|
| 12739 |
+
"epoch": 0.09328205128205128,
|
| 12740 |
+
"grad_norm": 0.076171875,
|
| 12741 |
+
"learning_rate": 0.005,
|
| 12742 |
+
"loss": 2.4262430667877197,
|
| 12743 |
+
"step": 3638
|
| 12744 |
+
},
|
| 12745 |
+
{
|
| 12746 |
+
"epoch": 0.09333333333333334,
|
| 12747 |
+
"grad_norm": 0.0771484375,
|
| 12748 |
+
"learning_rate": 0.005,
|
| 12749 |
+
"loss": 2.4167873859405518,
|
| 12750 |
+
"step": 3640
|
| 12751 |
+
},
|
| 12752 |
+
{
|
| 12753 |
+
"epoch": 0.09338461538461539,
|
| 12754 |
+
"grad_norm": 0.08935546875,
|
| 12755 |
+
"learning_rate": 0.005,
|
| 12756 |
+
"loss": 2.4328572750091553,
|
| 12757 |
+
"step": 3642
|
| 12758 |
+
},
|
| 12759 |
+
{
|
| 12760 |
+
"epoch": 0.09343589743589743,
|
| 12761 |
+
"grad_norm": 0.07421875,
|
| 12762 |
+
"learning_rate": 0.005,
|
| 12763 |
+
"loss": 2.4029531478881836,
|
| 12764 |
+
"step": 3644
|
| 12765 |
+
},
|
| 12766 |
+
{
|
| 12767 |
+
"epoch": 0.09348717948717948,
|
| 12768 |
+
"grad_norm": 0.06640625,
|
| 12769 |
+
"learning_rate": 0.005,
|
| 12770 |
+
"loss": 2.388449192047119,
|
| 12771 |
+
"step": 3646
|
| 12772 |
+
},
|
| 12773 |
+
{
|
| 12774 |
+
"epoch": 0.09353846153846154,
|
| 12775 |
+
"grad_norm": 0.060546875,
|
| 12776 |
+
"learning_rate": 0.005,
|
| 12777 |
+
"loss": 2.4016411304473877,
|
| 12778 |
+
"step": 3648
|
| 12779 |
+
},
|
| 12780 |
+
{
|
| 12781 |
+
"epoch": 0.09358974358974359,
|
| 12782 |
+
"grad_norm": 0.054443359375,
|
| 12783 |
+
"learning_rate": 0.005,
|
| 12784 |
+
"loss": 2.4022748470306396,
|
| 12785 |
+
"step": 3650
|
| 12786 |
+
},
|
| 12787 |
+
{
|
| 12788 |
+
"epoch": 0.09364102564102564,
|
| 12789 |
+
"grad_norm": 0.056640625,
|
| 12790 |
+
"learning_rate": 0.005,
|
| 12791 |
+
"loss": 2.3943135738372803,
|
| 12792 |
+
"step": 3652
|
| 12793 |
+
},
|
| 12794 |
+
{
|
| 12795 |
+
"epoch": 0.09369230769230769,
|
| 12796 |
+
"grad_norm": 0.06494140625,
|
| 12797 |
+
"learning_rate": 0.005,
|
| 12798 |
+
"loss": 2.38264536857605,
|
| 12799 |
+
"step": 3654
|
| 12800 |
+
},
|
| 12801 |
+
{
|
| 12802 |
+
"epoch": 0.09374358974358975,
|
| 12803 |
+
"grad_norm": 0.0927734375,
|
| 12804 |
+
"learning_rate": 0.005,
|
| 12805 |
+
"loss": 2.3918676376342773,
|
| 12806 |
+
"step": 3656
|
| 12807 |
+
},
|
| 12808 |
+
{
|
| 12809 |
+
"epoch": 0.0937948717948718,
|
| 12810 |
+
"grad_norm": 0.08154296875,
|
| 12811 |
+
"learning_rate": 0.005,
|
| 12812 |
+
"loss": 2.4199023246765137,
|
| 12813 |
+
"step": 3658
|
| 12814 |
+
},
|
| 12815 |
+
{
|
| 12816 |
+
"epoch": 0.09384615384615384,
|
| 12817 |
+
"grad_norm": 0.056640625,
|
| 12818 |
+
"learning_rate": 0.005,
|
| 12819 |
+
"loss": 2.3976686000823975,
|
| 12820 |
+
"step": 3660
|
| 12821 |
+
},
|
| 12822 |
+
{
|
| 12823 |
+
"epoch": 0.09389743589743589,
|
| 12824 |
+
"grad_norm": 0.05859375,
|
| 12825 |
+
"learning_rate": 0.005,
|
| 12826 |
+
"loss": 2.3834011554718018,
|
| 12827 |
+
"step": 3662
|
| 12828 |
+
},
|
| 12829 |
+
{
|
| 12830 |
+
"epoch": 0.09394871794871795,
|
| 12831 |
+
"grad_norm": 0.057373046875,
|
| 12832 |
+
"learning_rate": 0.005,
|
| 12833 |
+
"loss": 2.3770766258239746,
|
| 12834 |
+
"step": 3664
|
| 12835 |
+
},
|
| 12836 |
+
{
|
| 12837 |
+
"epoch": 0.094,
|
| 12838 |
+
"grad_norm": 0.06689453125,
|
| 12839 |
+
"learning_rate": 0.005,
|
| 12840 |
+
"loss": 2.405545473098755,
|
| 12841 |
+
"step": 3666
|
| 12842 |
+
},
|
| 12843 |
+
{
|
| 12844 |
+
"epoch": 0.09405128205128205,
|
| 12845 |
+
"grad_norm": 0.052734375,
|
| 12846 |
+
"learning_rate": 0.005,
|
| 12847 |
+
"loss": 2.423511028289795,
|
| 12848 |
+
"step": 3668
|
| 12849 |
+
},
|
| 12850 |
+
{
|
| 12851 |
+
"epoch": 0.0941025641025641,
|
| 12852 |
+
"grad_norm": 0.059326171875,
|
| 12853 |
+
"learning_rate": 0.005,
|
| 12854 |
+
"loss": 2.4031991958618164,
|
| 12855 |
+
"step": 3670
|
| 12856 |
+
},
|
| 12857 |
+
{
|
| 12858 |
+
"epoch": 0.09415384615384616,
|
| 12859 |
+
"grad_norm": 0.057373046875,
|
| 12860 |
+
"learning_rate": 0.005,
|
| 12861 |
+
"loss": 2.3995845317840576,
|
| 12862 |
+
"step": 3672
|
| 12863 |
+
},
|
| 12864 |
+
{
|
| 12865 |
+
"epoch": 0.0942051282051282,
|
| 12866 |
+
"grad_norm": 0.05859375,
|
| 12867 |
+
"learning_rate": 0.005,
|
| 12868 |
+
"loss": 2.3914711475372314,
|
| 12869 |
+
"step": 3674
|
| 12870 |
+
},
|
| 12871 |
+
{
|
| 12872 |
+
"epoch": 0.09425641025641025,
|
| 12873 |
+
"grad_norm": 0.052490234375,
|
| 12874 |
+
"learning_rate": 0.005,
|
| 12875 |
+
"loss": 2.3971457481384277,
|
| 12876 |
+
"step": 3676
|
| 12877 |
+
},
|
| 12878 |
+
{
|
| 12879 |
+
"epoch": 0.09430769230769231,
|
| 12880 |
+
"grad_norm": 0.06689453125,
|
| 12881 |
+
"learning_rate": 0.005,
|
| 12882 |
+
"loss": 2.387514114379883,
|
| 12883 |
+
"step": 3678
|
| 12884 |
+
},
|
| 12885 |
+
{
|
| 12886 |
+
"epoch": 0.09435897435897436,
|
| 12887 |
+
"grad_norm": 0.072265625,
|
| 12888 |
+
"learning_rate": 0.005,
|
| 12889 |
+
"loss": 2.3982932567596436,
|
| 12890 |
+
"step": 3680
|
| 12891 |
+
},
|
| 12892 |
+
{
|
| 12893 |
+
"epoch": 0.09441025641025641,
|
| 12894 |
+
"grad_norm": 0.07275390625,
|
| 12895 |
+
"learning_rate": 0.005,
|
| 12896 |
+
"loss": 2.3955538272857666,
|
| 12897 |
+
"step": 3682
|
| 12898 |
+
},
|
| 12899 |
+
{
|
| 12900 |
+
"epoch": 0.09446153846153846,
|
| 12901 |
+
"grad_norm": 0.0673828125,
|
| 12902 |
+
"learning_rate": 0.005,
|
| 12903 |
+
"loss": 2.39897084236145,
|
| 12904 |
+
"step": 3684
|
| 12905 |
+
},
|
| 12906 |
+
{
|
| 12907 |
+
"epoch": 0.09451282051282052,
|
| 12908 |
+
"grad_norm": 0.057861328125,
|
| 12909 |
+
"learning_rate": 0.005,
|
| 12910 |
+
"loss": 2.3957417011260986,
|
| 12911 |
+
"step": 3686
|
| 12912 |
+
},
|
| 12913 |
+
{
|
| 12914 |
+
"epoch": 0.09456410256410257,
|
| 12915 |
+
"grad_norm": 0.0556640625,
|
| 12916 |
+
"learning_rate": 0.005,
|
| 12917 |
+
"loss": 2.387707233428955,
|
| 12918 |
+
"step": 3688
|
| 12919 |
+
},
|
| 12920 |
+
{
|
| 12921 |
+
"epoch": 0.09461538461538462,
|
| 12922 |
+
"grad_norm": 0.064453125,
|
| 12923 |
+
"learning_rate": 0.005,
|
| 12924 |
+
"loss": 2.3971118927001953,
|
| 12925 |
+
"step": 3690
|
| 12926 |
+
},
|
| 12927 |
+
{
|
| 12928 |
+
"epoch": 0.09466666666666666,
|
| 12929 |
+
"grad_norm": 0.0556640625,
|
| 12930 |
+
"learning_rate": 0.005,
|
| 12931 |
+
"loss": 2.377110481262207,
|
| 12932 |
+
"step": 3692
|
| 12933 |
+
},
|
| 12934 |
+
{
|
| 12935 |
+
"epoch": 0.09471794871794872,
|
| 12936 |
+
"grad_norm": 0.058349609375,
|
| 12937 |
+
"learning_rate": 0.005,
|
| 12938 |
+
"loss": 2.3873484134674072,
|
| 12939 |
+
"step": 3694
|
| 12940 |
+
},
|
| 12941 |
+
{
|
| 12942 |
+
"epoch": 0.09476923076923077,
|
| 12943 |
+
"grad_norm": 0.061767578125,
|
| 12944 |
+
"learning_rate": 0.005,
|
| 12945 |
+
"loss": 2.4053502082824707,
|
| 12946 |
+
"step": 3696
|
| 12947 |
+
},
|
| 12948 |
+
{
|
| 12949 |
+
"epoch": 0.09482051282051282,
|
| 12950 |
+
"grad_norm": 0.048095703125,
|
| 12951 |
+
"learning_rate": 0.005,
|
| 12952 |
+
"loss": 2.402733564376831,
|
| 12953 |
+
"step": 3698
|
| 12954 |
+
},
|
| 12955 |
+
{
|
| 12956 |
+
"epoch": 0.09487179487179487,
|
| 12957 |
+
"grad_norm": 0.059814453125,
|
| 12958 |
+
"learning_rate": 0.005,
|
| 12959 |
+
"loss": 2.378330707550049,
|
| 12960 |
+
"step": 3700
|
| 12961 |
+
},
|
| 12962 |
+
{
|
| 12963 |
+
"epoch": 0.09492307692307693,
|
| 12964 |
+
"grad_norm": 0.0703125,
|
| 12965 |
+
"learning_rate": 0.005,
|
| 12966 |
+
"loss": 2.40600848197937,
|
| 12967 |
+
"step": 3702
|
| 12968 |
+
},
|
| 12969 |
+
{
|
| 12970 |
+
"epoch": 0.09497435897435898,
|
| 12971 |
+
"grad_norm": 0.0673828125,
|
| 12972 |
+
"learning_rate": 0.005,
|
| 12973 |
+
"loss": 2.4102160930633545,
|
| 12974 |
+
"step": 3704
|
| 12975 |
+
},
|
| 12976 |
+
{
|
| 12977 |
+
"epoch": 0.09502564102564102,
|
| 12978 |
+
"grad_norm": 0.0712890625,
|
| 12979 |
+
"learning_rate": 0.005,
|
| 12980 |
+
"loss": 2.4045615196228027,
|
| 12981 |
+
"step": 3706
|
| 12982 |
+
},
|
| 12983 |
+
{
|
| 12984 |
+
"epoch": 0.09507692307692307,
|
| 12985 |
+
"grad_norm": 0.05029296875,
|
| 12986 |
+
"learning_rate": 0.005,
|
| 12987 |
+
"loss": 2.3970460891723633,
|
| 12988 |
+
"step": 3708
|
| 12989 |
+
},
|
| 12990 |
+
{
|
| 12991 |
+
"epoch": 0.09512820512820513,
|
| 12992 |
+
"grad_norm": 0.052978515625,
|
| 12993 |
+
"learning_rate": 0.005,
|
| 12994 |
+
"loss": 2.376721143722534,
|
| 12995 |
+
"step": 3710
|
| 12996 |
+
},
|
| 12997 |
+
{
|
| 12998 |
+
"epoch": 0.09517948717948718,
|
| 12999 |
+
"grad_norm": 0.05078125,
|
| 13000 |
+
"learning_rate": 0.005,
|
| 13001 |
+
"loss": 2.382934093475342,
|
| 13002 |
+
"step": 3712
|
| 13003 |
+
},
|
| 13004 |
+
{
|
| 13005 |
+
"epoch": 0.09523076923076923,
|
| 13006 |
+
"grad_norm": 0.052734375,
|
| 13007 |
+
"learning_rate": 0.005,
|
| 13008 |
+
"loss": 2.3934435844421387,
|
| 13009 |
+
"step": 3714
|
| 13010 |
+
},
|
| 13011 |
+
{
|
| 13012 |
+
"epoch": 0.09528205128205128,
|
| 13013 |
+
"grad_norm": 0.05126953125,
|
| 13014 |
+
"learning_rate": 0.005,
|
| 13015 |
+
"loss": 2.372809648513794,
|
| 13016 |
+
"step": 3716
|
| 13017 |
+
},
|
| 13018 |
+
{
|
| 13019 |
+
"epoch": 0.09533333333333334,
|
| 13020 |
+
"grad_norm": 0.05908203125,
|
| 13021 |
+
"learning_rate": 0.005,
|
| 13022 |
+
"loss": 2.3779361248016357,
|
| 13023 |
+
"step": 3718
|
| 13024 |
+
},
|
| 13025 |
+
{
|
| 13026 |
+
"epoch": 0.09538461538461539,
|
| 13027 |
+
"grad_norm": 0.056640625,
|
| 13028 |
+
"learning_rate": 0.005,
|
| 13029 |
+
"loss": 2.391171455383301,
|
| 13030 |
+
"step": 3720
|
| 13031 |
+
},
|
| 13032 |
+
{
|
| 13033 |
+
"epoch": 0.09543589743589743,
|
| 13034 |
+
"grad_norm": 0.06640625,
|
| 13035 |
+
"learning_rate": 0.005,
|
| 13036 |
+
"loss": 2.3816425800323486,
|
| 13037 |
+
"step": 3722
|
| 13038 |
+
},
|
| 13039 |
+
{
|
| 13040 |
+
"epoch": 0.09548717948717948,
|
| 13041 |
+
"grad_norm": 0.056396484375,
|
| 13042 |
+
"learning_rate": 0.005,
|
| 13043 |
+
"loss": 2.4067928791046143,
|
| 13044 |
+
"step": 3724
|
| 13045 |
+
},
|
| 13046 |
+
{
|
| 13047 |
+
"epoch": 0.09553846153846154,
|
| 13048 |
+
"grad_norm": 0.0546875,
|
| 13049 |
+
"learning_rate": 0.005,
|
| 13050 |
+
"loss": 2.40596604347229,
|
| 13051 |
+
"step": 3726
|
| 13052 |
+
},
|
| 13053 |
+
{
|
| 13054 |
+
"epoch": 0.09558974358974359,
|
| 13055 |
+
"grad_norm": 0.05615234375,
|
| 13056 |
+
"learning_rate": 0.005,
|
| 13057 |
+
"loss": 2.400977373123169,
|
| 13058 |
+
"step": 3728
|
| 13059 |
+
},
|
| 13060 |
+
{
|
| 13061 |
+
"epoch": 0.09564102564102564,
|
| 13062 |
+
"grad_norm": 0.060546875,
|
| 13063 |
+
"learning_rate": 0.005,
|
| 13064 |
+
"loss": 2.411196708679199,
|
| 13065 |
+
"step": 3730
|
| 13066 |
+
},
|
| 13067 |
+
{
|
| 13068 |
+
"epoch": 0.09569230769230769,
|
| 13069 |
+
"grad_norm": 0.07861328125,
|
| 13070 |
+
"learning_rate": 0.005,
|
| 13071 |
+
"loss": 2.4202773571014404,
|
| 13072 |
+
"step": 3732
|
| 13073 |
+
},
|
| 13074 |
+
{
|
| 13075 |
+
"epoch": 0.09574358974358975,
|
| 13076 |
+
"grad_norm": 0.10009765625,
|
| 13077 |
+
"learning_rate": 0.005,
|
| 13078 |
+
"loss": 2.4468371868133545,
|
| 13079 |
+
"step": 3734
|
| 13080 |
+
},
|
| 13081 |
+
{
|
| 13082 |
+
"epoch": 0.0957948717948718,
|
| 13083 |
+
"grad_norm": 0.09228515625,
|
| 13084 |
+
"learning_rate": 0.005,
|
| 13085 |
+
"loss": 2.4051547050476074,
|
| 13086 |
+
"step": 3736
|
| 13087 |
+
},
|
| 13088 |
+
{
|
| 13089 |
+
"epoch": 0.09584615384615384,
|
| 13090 |
+
"grad_norm": 0.08056640625,
|
| 13091 |
+
"learning_rate": 0.005,
|
| 13092 |
+
"loss": 2.395641326904297,
|
| 13093 |
+
"step": 3738
|
| 13094 |
+
},
|
| 13095 |
+
{
|
| 13096 |
+
"epoch": 0.09589743589743589,
|
| 13097 |
+
"grad_norm": 0.06884765625,
|
| 13098 |
+
"learning_rate": 0.005,
|
| 13099 |
+
"loss": 2.39536714553833,
|
| 13100 |
+
"step": 3740
|
| 13101 |
+
},
|
| 13102 |
+
{
|
| 13103 |
+
"epoch": 0.09594871794871795,
|
| 13104 |
+
"grad_norm": 0.054443359375,
|
| 13105 |
+
"learning_rate": 0.005,
|
| 13106 |
+
"loss": 2.3892858028411865,
|
| 13107 |
+
"step": 3742
|
| 13108 |
+
},
|
| 13109 |
+
{
|
| 13110 |
+
"epoch": 0.096,
|
| 13111 |
+
"grad_norm": 0.0771484375,
|
| 13112 |
+
"learning_rate": 0.005,
|
| 13113 |
+
"loss": 2.4153215885162354,
|
| 13114 |
+
"step": 3744
|
| 13115 |
+
},
|
| 13116 |
+
{
|
| 13117 |
+
"epoch": 0.09605128205128205,
|
| 13118 |
+
"grad_norm": 0.0888671875,
|
| 13119 |
+
"learning_rate": 0.005,
|
| 13120 |
+
"loss": 2.4030420780181885,
|
| 13121 |
+
"step": 3746
|
| 13122 |
+
},
|
| 13123 |
+
{
|
| 13124 |
+
"epoch": 0.0961025641025641,
|
| 13125 |
+
"grad_norm": 0.06689453125,
|
| 13126 |
+
"learning_rate": 0.005,
|
| 13127 |
+
"loss": 2.3789937496185303,
|
| 13128 |
+
"step": 3748
|
| 13129 |
+
},
|
| 13130 |
+
{
|
| 13131 |
+
"epoch": 0.09615384615384616,
|
| 13132 |
+
"grad_norm": 0.07177734375,
|
| 13133 |
+
"learning_rate": 0.005,
|
| 13134 |
+
"loss": 2.3884522914886475,
|
| 13135 |
+
"step": 3750
|
| 13136 |
+
},
|
| 13137 |
+
{
|
| 13138 |
+
"epoch": 0.09620512820512821,
|
| 13139 |
+
"grad_norm": 0.07275390625,
|
| 13140 |
+
"learning_rate": 0.005,
|
| 13141 |
+
"loss": 2.385904550552368,
|
| 13142 |
+
"step": 3752
|
| 13143 |
+
},
|
| 13144 |
+
{
|
| 13145 |
+
"epoch": 0.09625641025641025,
|
| 13146 |
+
"grad_norm": 0.07080078125,
|
| 13147 |
+
"learning_rate": 0.005,
|
| 13148 |
+
"loss": 2.4197237491607666,
|
| 13149 |
+
"step": 3754
|
| 13150 |
+
},
|
| 13151 |
+
{
|
| 13152 |
+
"epoch": 0.0963076923076923,
|
| 13153 |
+
"grad_norm": 0.05322265625,
|
| 13154 |
+
"learning_rate": 0.005,
|
| 13155 |
+
"loss": 2.3917524814605713,
|
| 13156 |
+
"step": 3756
|
| 13157 |
+
},
|
| 13158 |
+
{
|
| 13159 |
+
"epoch": 0.09635897435897436,
|
| 13160 |
+
"grad_norm": 0.048095703125,
|
| 13161 |
+
"learning_rate": 0.005,
|
| 13162 |
+
"loss": 2.407090902328491,
|
| 13163 |
+
"step": 3758
|
| 13164 |
+
},
|
| 13165 |
+
{
|
| 13166 |
+
"epoch": 0.09641025641025641,
|
| 13167 |
+
"grad_norm": 0.057861328125,
|
| 13168 |
+
"learning_rate": 0.005,
|
| 13169 |
+
"loss": 2.406498908996582,
|
| 13170 |
+
"step": 3760
|
| 13171 |
+
},
|
| 13172 |
+
{
|
| 13173 |
+
"epoch": 0.09646153846153846,
|
| 13174 |
+
"grad_norm": 0.06396484375,
|
| 13175 |
+
"learning_rate": 0.005,
|
| 13176 |
+
"loss": 2.398679494857788,
|
| 13177 |
+
"step": 3762
|
| 13178 |
+
},
|
| 13179 |
+
{
|
| 13180 |
+
"epoch": 0.09651282051282051,
|
| 13181 |
+
"grad_norm": 0.05859375,
|
| 13182 |
+
"learning_rate": 0.005,
|
| 13183 |
+
"loss": 2.4034523963928223,
|
| 13184 |
+
"step": 3764
|
| 13185 |
+
},
|
| 13186 |
+
{
|
| 13187 |
+
"epoch": 0.09656410256410257,
|
| 13188 |
+
"grad_norm": 0.0634765625,
|
| 13189 |
+
"learning_rate": 0.005,
|
| 13190 |
+
"loss": 2.4111382961273193,
|
| 13191 |
+
"step": 3766
|
| 13192 |
+
},
|
| 13193 |
+
{
|
| 13194 |
+
"epoch": 0.09661538461538462,
|
| 13195 |
+
"grad_norm": 0.10498046875,
|
| 13196 |
+
"learning_rate": 0.005,
|
| 13197 |
+
"loss": 2.4209513664245605,
|
| 13198 |
+
"step": 3768
|
| 13199 |
+
},
|
| 13200 |
+
{
|
| 13201 |
+
"epoch": 0.09666666666666666,
|
| 13202 |
+
"grad_norm": 0.11328125,
|
| 13203 |
+
"learning_rate": 0.005,
|
| 13204 |
+
"loss": 2.420320749282837,
|
| 13205 |
+
"step": 3770
|
| 13206 |
+
},
|
| 13207 |
+
{
|
| 13208 |
+
"epoch": 0.09671794871794871,
|
| 13209 |
+
"grad_norm": 0.09326171875,
|
| 13210 |
+
"learning_rate": 0.005,
|
| 13211 |
+
"loss": 2.3820066452026367,
|
| 13212 |
+
"step": 3772
|
| 13213 |
+
},
|
| 13214 |
+
{
|
| 13215 |
+
"epoch": 0.09676923076923077,
|
| 13216 |
+
"grad_norm": 0.080078125,
|
| 13217 |
+
"learning_rate": 0.005,
|
| 13218 |
+
"loss": 2.415583372116089,
|
| 13219 |
+
"step": 3774
|
| 13220 |
+
},
|
| 13221 |
+
{
|
| 13222 |
+
"epoch": 0.09682051282051282,
|
| 13223 |
+
"grad_norm": 0.07080078125,
|
| 13224 |
+
"learning_rate": 0.005,
|
| 13225 |
+
"loss": 2.394167900085449,
|
| 13226 |
+
"step": 3776
|
| 13227 |
+
},
|
| 13228 |
+
{
|
| 13229 |
+
"epoch": 0.09687179487179487,
|
| 13230 |
+
"grad_norm": 0.06298828125,
|
| 13231 |
+
"learning_rate": 0.005,
|
| 13232 |
+
"loss": 2.441962718963623,
|
| 13233 |
+
"step": 3778
|
| 13234 |
+
},
|
| 13235 |
+
{
|
| 13236 |
+
"epoch": 0.09692307692307692,
|
| 13237 |
+
"grad_norm": 0.053955078125,
|
| 13238 |
+
"learning_rate": 0.005,
|
| 13239 |
+
"loss": 2.4058051109313965,
|
| 13240 |
+
"step": 3780
|
| 13241 |
+
},
|
| 13242 |
+
{
|
| 13243 |
+
"epoch": 0.09697435897435898,
|
| 13244 |
+
"grad_norm": 0.05322265625,
|
| 13245 |
+
"learning_rate": 0.005,
|
| 13246 |
+
"loss": 2.4315271377563477,
|
| 13247 |
+
"step": 3782
|
| 13248 |
+
},
|
| 13249 |
+
{
|
| 13250 |
+
"epoch": 0.09702564102564103,
|
| 13251 |
+
"grad_norm": 0.05322265625,
|
| 13252 |
+
"learning_rate": 0.005,
|
| 13253 |
+
"loss": 2.410337209701538,
|
| 13254 |
+
"step": 3784
|
| 13255 |
+
},
|
| 13256 |
+
{
|
| 13257 |
+
"epoch": 0.09707692307692307,
|
| 13258 |
+
"grad_norm": 0.056396484375,
|
| 13259 |
+
"learning_rate": 0.005,
|
| 13260 |
+
"loss": 2.404144525527954,
|
| 13261 |
+
"step": 3786
|
| 13262 |
+
},
|
| 13263 |
+
{
|
| 13264 |
+
"epoch": 0.09712820512820512,
|
| 13265 |
+
"grad_norm": 0.05224609375,
|
| 13266 |
+
"learning_rate": 0.005,
|
| 13267 |
+
"loss": 2.3995089530944824,
|
| 13268 |
+
"step": 3788
|
| 13269 |
+
},
|
| 13270 |
+
{
|
| 13271 |
+
"epoch": 0.09717948717948718,
|
| 13272 |
+
"grad_norm": 0.062255859375,
|
| 13273 |
+
"learning_rate": 0.005,
|
| 13274 |
+
"loss": 2.415908098220825,
|
| 13275 |
+
"step": 3790
|
| 13276 |
+
},
|
| 13277 |
+
{
|
| 13278 |
+
"epoch": 0.09723076923076923,
|
| 13279 |
+
"grad_norm": 0.0703125,
|
| 13280 |
+
"learning_rate": 0.005,
|
| 13281 |
+
"loss": 2.3940107822418213,
|
| 13282 |
+
"step": 3792
|
| 13283 |
+
},
|
| 13284 |
+
{
|
| 13285 |
+
"epoch": 0.09728205128205128,
|
| 13286 |
+
"grad_norm": 0.0791015625,
|
| 13287 |
+
"learning_rate": 0.005,
|
| 13288 |
+
"loss": 2.4129836559295654,
|
| 13289 |
+
"step": 3794
|
| 13290 |
+
},
|
| 13291 |
+
{
|
| 13292 |
+
"epoch": 0.09733333333333333,
|
| 13293 |
+
"grad_norm": 0.07421875,
|
| 13294 |
+
"learning_rate": 0.005,
|
| 13295 |
+
"loss": 2.4188404083251953,
|
| 13296 |
+
"step": 3796
|
| 13297 |
+
},
|
| 13298 |
+
{
|
| 13299 |
+
"epoch": 0.09738461538461539,
|
| 13300 |
+
"grad_norm": 0.08251953125,
|
| 13301 |
+
"learning_rate": 0.005,
|
| 13302 |
+
"loss": 2.3889007568359375,
|
| 13303 |
+
"step": 3798
|
| 13304 |
+
},
|
| 13305 |
+
{
|
| 13306 |
+
"epoch": 0.09743589743589744,
|
| 13307 |
+
"grad_norm": 0.09423828125,
|
| 13308 |
+
"learning_rate": 0.005,
|
| 13309 |
+
"loss": 2.409681797027588,
|
| 13310 |
+
"step": 3800
|
| 13311 |
+
},
|
| 13312 |
+
{
|
| 13313 |
+
"epoch": 0.09748717948717948,
|
| 13314 |
+
"grad_norm": 0.0908203125,
|
| 13315 |
+
"learning_rate": 0.005,
|
| 13316 |
+
"loss": 2.4024970531463623,
|
| 13317 |
+
"step": 3802
|
| 13318 |
+
},
|
| 13319 |
+
{
|
| 13320 |
+
"epoch": 0.09753846153846153,
|
| 13321 |
+
"grad_norm": 0.07666015625,
|
| 13322 |
+
"learning_rate": 0.005,
|
| 13323 |
+
"loss": 2.397566080093384,
|
| 13324 |
+
"step": 3804
|
| 13325 |
+
},
|
| 13326 |
+
{
|
| 13327 |
+
"epoch": 0.0975897435897436,
|
| 13328 |
+
"grad_norm": 0.05615234375,
|
| 13329 |
+
"learning_rate": 0.005,
|
| 13330 |
+
"loss": 2.4167375564575195,
|
| 13331 |
+
"step": 3806
|
| 13332 |
+
},
|
| 13333 |
+
{
|
| 13334 |
+
"epoch": 0.09764102564102564,
|
| 13335 |
+
"grad_norm": 0.0625,
|
| 13336 |
+
"learning_rate": 0.005,
|
| 13337 |
+
"loss": 2.4124650955200195,
|
| 13338 |
+
"step": 3808
|
| 13339 |
+
},
|
| 13340 |
+
{
|
| 13341 |
+
"epoch": 0.09769230769230769,
|
| 13342 |
+
"grad_norm": 0.055419921875,
|
| 13343 |
+
"learning_rate": 0.005,
|
| 13344 |
+
"loss": 2.403942108154297,
|
| 13345 |
+
"step": 3810
|
| 13346 |
+
},
|
| 13347 |
+
{
|
| 13348 |
+
"epoch": 0.09774358974358974,
|
| 13349 |
+
"grad_norm": 0.055419921875,
|
| 13350 |
+
"learning_rate": 0.005,
|
| 13351 |
+
"loss": 2.3854751586914062,
|
| 13352 |
+
"step": 3812
|
| 13353 |
+
},
|
| 13354 |
+
{
|
| 13355 |
+
"epoch": 0.0977948717948718,
|
| 13356 |
+
"grad_norm": 0.0654296875,
|
| 13357 |
+
"learning_rate": 0.005,
|
| 13358 |
+
"loss": 2.4010393619537354,
|
| 13359 |
+
"step": 3814
|
| 13360 |
+
},
|
| 13361 |
+
{
|
| 13362 |
+
"epoch": 0.09784615384615385,
|
| 13363 |
+
"grad_norm": 0.07568359375,
|
| 13364 |
+
"learning_rate": 0.005,
|
| 13365 |
+
"loss": 2.4228310585021973,
|
| 13366 |
+
"step": 3816
|
| 13367 |
+
},
|
| 13368 |
+
{
|
| 13369 |
+
"epoch": 0.0978974358974359,
|
| 13370 |
+
"grad_norm": 0.056396484375,
|
| 13371 |
+
"learning_rate": 0.005,
|
| 13372 |
+
"loss": 2.3976328372955322,
|
| 13373 |
+
"step": 3818
|
| 13374 |
+
},
|
| 13375 |
+
{
|
| 13376 |
+
"epoch": 0.09794871794871794,
|
| 13377 |
+
"grad_norm": 0.052001953125,
|
| 13378 |
+
"learning_rate": 0.005,
|
| 13379 |
+
"loss": 2.428123712539673,
|
| 13380 |
+
"step": 3820
|
| 13381 |
+
},
|
| 13382 |
+
{
|
| 13383 |
+
"epoch": 0.098,
|
| 13384 |
+
"grad_norm": 0.05810546875,
|
| 13385 |
+
"learning_rate": 0.005,
|
| 13386 |
+
"loss": 2.4027838706970215,
|
| 13387 |
+
"step": 3822
|
| 13388 |
+
},
|
| 13389 |
+
{
|
| 13390 |
+
"epoch": 0.09805128205128205,
|
| 13391 |
+
"grad_norm": 0.0830078125,
|
| 13392 |
+
"learning_rate": 0.005,
|
| 13393 |
+
"loss": 2.443448305130005,
|
| 13394 |
+
"step": 3824
|
| 13395 |
+
},
|
| 13396 |
+
{
|
| 13397 |
+
"epoch": 0.0981025641025641,
|
| 13398 |
+
"grad_norm": 0.080078125,
|
| 13399 |
+
"learning_rate": 0.005,
|
| 13400 |
+
"loss": 2.4143223762512207,
|
| 13401 |
+
"step": 3826
|
| 13402 |
+
},
|
| 13403 |
+
{
|
| 13404 |
+
"epoch": 0.09815384615384615,
|
| 13405 |
+
"grad_norm": 0.0771484375,
|
| 13406 |
+
"learning_rate": 0.005,
|
| 13407 |
+
"loss": 2.4070379734039307,
|
| 13408 |
+
"step": 3828
|
| 13409 |
+
},
|
| 13410 |
+
{
|
| 13411 |
+
"epoch": 0.09820512820512821,
|
| 13412 |
+
"grad_norm": 0.06689453125,
|
| 13413 |
+
"learning_rate": 0.005,
|
| 13414 |
+
"loss": 2.3898911476135254,
|
| 13415 |
+
"step": 3830
|
| 13416 |
+
},
|
| 13417 |
+
{
|
| 13418 |
+
"epoch": 0.09825641025641026,
|
| 13419 |
+
"grad_norm": 0.06494140625,
|
| 13420 |
+
"learning_rate": 0.005,
|
| 13421 |
+
"loss": 2.377822160720825,
|
| 13422 |
+
"step": 3832
|
| 13423 |
+
},
|
| 13424 |
+
{
|
| 13425 |
+
"epoch": 0.0983076923076923,
|
| 13426 |
+
"grad_norm": 0.07080078125,
|
| 13427 |
+
"learning_rate": 0.005,
|
| 13428 |
+
"loss": 2.4335594177246094,
|
| 13429 |
+
"step": 3834
|
| 13430 |
+
},
|
| 13431 |
+
{
|
| 13432 |
+
"epoch": 0.09835897435897435,
|
| 13433 |
+
"grad_norm": 0.07861328125,
|
| 13434 |
+
"learning_rate": 0.005,
|
| 13435 |
+
"loss": 2.419671058654785,
|
| 13436 |
+
"step": 3836
|
| 13437 |
+
},
|
| 13438 |
+
{
|
| 13439 |
+
"epoch": 0.09841025641025641,
|
| 13440 |
+
"grad_norm": 0.08203125,
|
| 13441 |
+
"learning_rate": 0.005,
|
| 13442 |
+
"loss": 2.417009115219116,
|
| 13443 |
+
"step": 3838
|
| 13444 |
+
},
|
| 13445 |
+
{
|
| 13446 |
+
"epoch": 0.09846153846153846,
|
| 13447 |
+
"grad_norm": 0.08251953125,
|
| 13448 |
+
"learning_rate": 0.005,
|
| 13449 |
+
"loss": 2.4051599502563477,
|
| 13450 |
+
"step": 3840
|
| 13451 |
+
},
+    {
+      "epoch": 0.09851282051282051,
+      "grad_norm": 0.08447265625,
+      "learning_rate": 0.005,
+      "loss": 2.4230260848999023,
+      "step": 3842
+    },
+    {
+      "epoch": 0.09856410256410257,
+      "grad_norm": 0.0751953125,
+      "learning_rate": 0.005,
+      "loss": 2.414266347885132,
+      "step": 3844
+    },
+    {
+      "epoch": 0.09861538461538462,
+      "grad_norm": 0.080078125,
+      "learning_rate": 0.005,
+      "loss": 2.4307775497436523,
+      "step": 3846
+    },
+    {
+      "epoch": 0.09866666666666667,
+      "grad_norm": 0.08935546875,
+      "learning_rate": 0.005,
+      "loss": 2.395287275314331,
+      "step": 3848
+    },
+    {
+      "epoch": 0.09871794871794871,
+      "grad_norm": 0.0693359375,
+      "learning_rate": 0.005,
+      "loss": 2.3883185386657715,
+      "step": 3850
+    },
+    {
+      "epoch": 0.09876923076923078,
+      "grad_norm": 0.05859375,
+      "learning_rate": 0.005,
+      "loss": 2.3991539478302,
+      "step": 3852
+    },
+    {
+      "epoch": 0.09882051282051282,
+      "grad_norm": 0.05615234375,
+      "learning_rate": 0.005,
+      "loss": 2.4052047729492188,
+      "step": 3854
+    },
+    {
+      "epoch": 0.09887179487179487,
+      "grad_norm": 0.04931640625,
+      "learning_rate": 0.005,
+      "loss": 2.3714468479156494,
+      "step": 3856
+    },
+    {
+      "epoch": 0.09892307692307692,
+      "grad_norm": 0.06396484375,
+      "learning_rate": 0.005,
+      "loss": 2.4196550846099854,
+      "step": 3858
+    },
+    {
+      "epoch": 0.09897435897435898,
+      "grad_norm": 0.07373046875,
+      "learning_rate": 0.005,
+      "loss": 2.3952279090881348,
+      "step": 3860
+    },
+    {
+      "epoch": 0.09902564102564103,
+      "grad_norm": 0.0673828125,
+      "learning_rate": 0.005,
+      "loss": 2.3767526149749756,
+      "step": 3862
+    },
+    {
+      "epoch": 0.09907692307692308,
+      "grad_norm": 0.060546875,
+      "learning_rate": 0.005,
+      "loss": 2.4114983081817627,
+      "step": 3864
+    },
+    {
+      "epoch": 0.09912820512820512,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.005,
+      "loss": 2.4133524894714355,
+      "step": 3866
+    },
+    {
+      "epoch": 0.09917948717948719,
+      "grad_norm": 0.06982421875,
+      "learning_rate": 0.005,
+      "loss": 2.3839468955993652,
+      "step": 3868
+    },
+    {
+      "epoch": 0.09923076923076923,
+      "grad_norm": 0.056640625,
+      "learning_rate": 0.005,
+      "loss": 2.3839728832244873,
+      "step": 3870
+    },
+    {
+      "epoch": 0.09928205128205128,
+      "grad_norm": 0.06103515625,
+      "learning_rate": 0.005,
+      "loss": 2.4111876487731934,
+      "step": 3872
+    },
+    {
+      "epoch": 0.09933333333333333,
+      "grad_norm": 0.06298828125,
+      "learning_rate": 0.005,
+      "loss": 2.40545916557312,
+      "step": 3874
+    },
+    {
+      "epoch": 0.09938461538461539,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.005,
+      "loss": 2.386960983276367,
+      "step": 3876
+    },
+    {
+      "epoch": 0.09943589743589744,
+      "grad_norm": 0.0712890625,
+      "learning_rate": 0.005,
+      "loss": 2.4083690643310547,
+      "step": 3878
+    },
+    {
+      "epoch": 0.09948717948717949,
+      "grad_norm": 0.06689453125,
+      "learning_rate": 0.005,
+      "loss": 2.3959543704986572,
+      "step": 3880
+    },
+    {
+      "epoch": 0.09953846153846153,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.005,
+      "loss": 2.4017083644866943,
+      "step": 3882
+    },
+    {
+      "epoch": 0.0995897435897436,
+      "grad_norm": 0.0703125,
+      "learning_rate": 0.005,
+      "loss": 2.4043755531311035,
+      "step": 3884
+    },
+    {
+      "epoch": 0.09964102564102564,
+      "grad_norm": 0.055419921875,
+      "learning_rate": 0.005,
+      "loss": 2.3852298259735107,
+      "step": 3886
+    },
+    {
+      "epoch": 0.09969230769230769,
+      "grad_norm": 0.07177734375,
+      "learning_rate": 0.005,
+      "loss": 2.4215145111083984,
+      "step": 3888
+    },
+    {
+      "epoch": 0.09974358974358974,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.005,
+      "loss": 2.4152660369873047,
+      "step": 3890
+    },
+    {
+      "epoch": 0.0997948717948718,
+      "grad_norm": 0.0693359375,
+      "learning_rate": 0.005,
+      "loss": 2.417196035385132,
+      "step": 3892
+    },
+    {
+      "epoch": 0.09984615384615385,
+      "grad_norm": 0.060791015625,
+      "learning_rate": 0.005,
+      "loss": 2.3966100215911865,
+      "step": 3894
+    },
+    {
+      "epoch": 0.0998974358974359,
+      "grad_norm": 0.068359375,
+      "learning_rate": 0.005,
+      "loss": 2.3832428455352783,
+      "step": 3896
+    },
+    {
+      "epoch": 0.09994871794871794,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.005,
+      "loss": 2.427678346633911,
+      "step": 3898
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.005,
+      "loss": 2.4025189876556396,
+      "step": 3900
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 1.8239399194717407,
+      "eval_runtime": 330.2352,
+      "eval_samples_per_second": 6.52,
+      "eval_steps_per_second": 1.632,
+      "step": 3900
     }
   ],
   "logging_steps": 2,
[lines 13672-13684 not shown in the diff]
       "attributes": {}
     }
   },
+  "total_flos": 8.686220099988235e+18,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null