Training in progress, step 7875, checkpoint
last-checkpoint/config.json
CHANGED
@@ -60,7 +60,7 @@
  "max_position_embeddings": 8192,
  "max_window_layers": 40,
  "mlp_type": "squared_relu",
- "model_name": "
+ "model_name": "checkpoint-7560",
  "model_type": "qwen3",
  "n_layer": 40,
  "num_attention_heads": 16,
last-checkpoint/model.safetensors
CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:d2a7dbb293fc8a969155fa59a9a5d45fb487f92059756a2206dd2b213705ad34
  size 1171937904
last-checkpoint/optimizer.pt
CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:37e1fac40ab049effbace32a5577cae0116c678cd4eb1899d992ce7b00f84a24
  size 1288212619
last-checkpoint/scheduler.pt
CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:
+ oid sha256:a5e041e5f006a575c0c450022f325a9d63503c3927d9e10753083345cf295a3a
  size 1401
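The three binary files above are stored as Git LFS pointers, so the diff shows only an `oid` and a `size` rather than the payload. As a minimal sketch, assuming a downloaded copy of the file is available locally, a pointer can be checked against that copy as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or the Hub tooling.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and digest named by the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents taken from the new side of the scheduler.pt diff above.
scheduler_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a5e041e5f006a575c0c450022f325a9d63503c3927d9e10753083345cf295a3a\n"
    "size 1401\n"
)
```

With a local download, `matches_pointer("last-checkpoint/scheduler.pt", scheduler_pointer)` would report whether the file corresponds to this commit; the local path is an assumption about where the file was saved.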
last-checkpoint/trainer_state.json
CHANGED
@@ -2,9 +2,9 @@
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 0.
  "eval_steps": 3150,
- "global_step":
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -26484,6 +26484,1105 @@
  "learning_rate": 0.1,
  "loss": 2.1111397743225098,
  "step": 7560
  }
  ],
  "logging_steps": 2,
@@ -26503,7 +27602,7 @@
  "attributes": {}
  }
  },
- "total_flos": 2.
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
@@ -2,9 +2,9 @@
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.25,
  "eval_steps": 3150,
+ "global_step": 7875,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -26484,6 +26484,1105 @@
  "learning_rate": 0.1,
  "loss": 2.1111397743225098,
  "step": 7560
| 26487 |
+
},
|
| 26488 |
+
{
|
| 26489 |
+
"epoch": 0.24006349206349206,
|
| 26490 |
+
"grad_norm": 0.12890625,
|
| 26491 |
+
"learning_rate": 0.1,
|
| 26492 |
+
"loss": 2.108712673187256,
|
| 26493 |
+
"step": 7562
|
| 26494 |
+
},
|
| 26495 |
+
{
|
| 26496 |
+
"epoch": 0.24012698412698413,
|
| 26497 |
+
"grad_norm": 0.1064453125,
|
| 26498 |
+
"learning_rate": 0.1,
|
| 26499 |
+
"loss": 2.1471011638641357,
|
| 26500 |
+
"step": 7564
|
| 26501 |
+
},
|
| 26502 |
+
{
|
| 26503 |
+
"epoch": 0.2401904761904762,
|
| 26504 |
+
"grad_norm": 0.12890625,
|
| 26505 |
+
"learning_rate": 0.1,
|
| 26506 |
+
"loss": 2.1378467082977295,
|
| 26507 |
+
"step": 7566
|
| 26508 |
+
},
|
| 26509 |
+
{
|
| 26510 |
+
"epoch": 0.24025396825396825,
|
| 26511 |
+
"grad_norm": 0.2041015625,
|
| 26512 |
+
"learning_rate": 0.1,
|
| 26513 |
+
"loss": 2.1189959049224854,
|
| 26514 |
+
"step": 7568
|
| 26515 |
+
},
|
| 26516 |
+
{
|
| 26517 |
+
"epoch": 0.24031746031746032,
|
| 26518 |
+
"grad_norm": 0.0830078125,
|
| 26519 |
+
"learning_rate": 0.1,
|
| 26520 |
+
"loss": 2.15275239944458,
|
| 26521 |
+
"step": 7570
|
| 26522 |
+
},
|
| 26523 |
+
{
|
| 26524 |
+
"epoch": 0.2403809523809524,
|
| 26525 |
+
"grad_norm": 0.1474609375,
|
| 26526 |
+
"learning_rate": 0.1,
|
| 26527 |
+
"loss": 2.096132516860962,
|
| 26528 |
+
"step": 7572
|
| 26529 |
+
},
|
| 26530 |
+
{
|
| 26531 |
+
"epoch": 0.24044444444444443,
|
| 26532 |
+
"grad_norm": 0.25390625,
|
| 26533 |
+
"learning_rate": 0.1,
|
| 26534 |
+
"loss": 2.1388909816741943,
|
| 26535 |
+
"step": 7574
|
| 26536 |
+
},
|
| 26537 |
+
{
|
| 26538 |
+
"epoch": 0.2405079365079365,
|
| 26539 |
+
"grad_norm": 0.310546875,
|
| 26540 |
+
"learning_rate": 0.1,
|
| 26541 |
+
"loss": 2.1405465602874756,
|
| 26542 |
+
"step": 7576
|
| 26543 |
+
},
|
| 26544 |
+
{
|
| 26545 |
+
"epoch": 0.24057142857142857,
|
| 26546 |
+
"grad_norm": 0.0576171875,
|
| 26547 |
+
"learning_rate": 0.1,
|
| 26548 |
+
"loss": 2.168820858001709,
|
| 26549 |
+
"step": 7578
|
| 26550 |
+
},
|
| 26551 |
+
{
|
| 26552 |
+
"epoch": 0.24063492063492065,
|
| 26553 |
+
"grad_norm": 0.080078125,
|
| 26554 |
+
"learning_rate": 0.1,
|
| 26555 |
+
"loss": 2.110339403152466,
|
| 26556 |
+
"step": 7580
|
| 26557 |
+
},
|
| 26558 |
+
{
|
| 26559 |
+
"epoch": 0.2406984126984127,
|
| 26560 |
+
"grad_norm": 0.09130859375,
|
| 26561 |
+
"learning_rate": 0.1,
|
| 26562 |
+
"loss": 2.1240358352661133,
|
| 26563 |
+
"step": 7582
|
| 26564 |
+
},
|
| 26565 |
+
{
|
| 26566 |
+
"epoch": 0.24076190476190476,
|
| 26567 |
+
"grad_norm": 0.103515625,
|
| 26568 |
+
"learning_rate": 0.1,
|
| 26569 |
+
"loss": 2.0973892211914062,
|
| 26570 |
+
"step": 7584
|
| 26571 |
+
},
|
| 26572 |
+
{
|
| 26573 |
+
"epoch": 0.24082539682539683,
|
| 26574 |
+
"grad_norm": 0.08935546875,
|
| 26575 |
+
"learning_rate": 0.1,
|
| 26576 |
+
"loss": 2.1381688117980957,
|
| 26577 |
+
"step": 7586
|
| 26578 |
+
},
|
| 26579 |
+
{
|
| 26580 |
+
"epoch": 0.2408888888888889,
|
| 26581 |
+
"grad_norm": 0.14453125,
|
| 26582 |
+
"learning_rate": 0.1,
|
| 26583 |
+
"loss": 2.106537103652954,
|
| 26584 |
+
"step": 7588
|
| 26585 |
+
},
|
| 26586 |
+
{
|
| 26587 |
+
"epoch": 0.24095238095238095,
|
| 26588 |
+
"grad_norm": 0.2392578125,
|
| 26589 |
+
"learning_rate": 0.1,
|
| 26590 |
+
"loss": 2.1185178756713867,
|
| 26591 |
+
"step": 7590
|
| 26592 |
+
},
|
| 26593 |
+
{
|
| 26594 |
+
"epoch": 0.24101587301587302,
|
| 26595 |
+
"grad_norm": 0.19140625,
|
| 26596 |
+
"learning_rate": 0.1,
|
| 26597 |
+
"loss": 2.120662212371826,
|
| 26598 |
+
"step": 7592
|
| 26599 |
+
},
|
| 26600 |
+
{
|
| 26601 |
+
"epoch": 0.2410793650793651,
|
| 26602 |
+
"grad_norm": 0.197265625,
|
| 26603 |
+
"learning_rate": 0.1,
|
| 26604 |
+
"loss": 2.130314588546753,
|
| 26605 |
+
"step": 7594
|
| 26606 |
+
},
|
| 26607 |
+
{
|
| 26608 |
+
"epoch": 0.24114285714285713,
|
| 26609 |
+
"grad_norm": 0.1279296875,
|
| 26610 |
+
"learning_rate": 0.1,
|
| 26611 |
+
"loss": 2.1311545372009277,
|
| 26612 |
+
"step": 7596
|
| 26613 |
+
},
|
| 26614 |
+
{
|
| 26615 |
+
"epoch": 0.2412063492063492,
|
| 26616 |
+
"grad_norm": 0.1201171875,
|
| 26617 |
+
"learning_rate": 0.1,
|
| 26618 |
+
"loss": 2.1440396308898926,
|
| 26619 |
+
"step": 7598
|
| 26620 |
+
},
|
| 26621 |
+
{
|
| 26622 |
+
"epoch": 0.24126984126984127,
|
| 26623 |
+
"grad_norm": 0.1953125,
|
| 26624 |
+
"learning_rate": 0.1,
|
| 26625 |
+
"loss": 2.121884822845459,
|
| 26626 |
+
"step": 7600
|
| 26627 |
+
},
|
| 26628 |
+
{
|
| 26629 |
+
"epoch": 0.24133333333333334,
|
| 26630 |
+
"grad_norm": 0.07080078125,
|
| 26631 |
+
"learning_rate": 0.1,
|
| 26632 |
+
"loss": 2.1237287521362305,
|
| 26633 |
+
"step": 7602
|
| 26634 |
+
},
|
| 26635 |
+
{
|
| 26636 |
+
"epoch": 0.2413968253968254,
|
| 26637 |
+
"grad_norm": 0.1826171875,
|
| 26638 |
+
"learning_rate": 0.1,
|
| 26639 |
+
"loss": 2.1442861557006836,
|
| 26640 |
+
"step": 7604
|
| 26641 |
+
},
|
| 26642 |
+
{
|
| 26643 |
+
"epoch": 0.24146031746031746,
|
| 26644 |
+
"grad_norm": 0.49609375,
|
| 26645 |
+
"learning_rate": 0.1,
|
| 26646 |
+
"loss": 2.095569372177124,
|
| 26647 |
+
"step": 7606
|
| 26648 |
+
},
|
| 26649 |
+
{
|
| 26650 |
+
"epoch": 0.24152380952380953,
|
| 26651 |
+
"grad_norm": 0.12109375,
|
| 26652 |
+
"learning_rate": 0.1,
|
| 26653 |
+
"loss": 2.1049482822418213,
|
| 26654 |
+
"step": 7608
|
| 26655 |
+
},
|
| 26656 |
+
{
|
| 26657 |
+
"epoch": 0.2415873015873016,
|
| 26658 |
+
"grad_norm": 0.10888671875,
|
| 26659 |
+
"learning_rate": 0.1,
|
| 26660 |
+
"loss": 2.1004626750946045,
|
| 26661 |
+
"step": 7610
|
| 26662 |
+
},
|
| 26663 |
+
{
|
| 26664 |
+
"epoch": 0.24165079365079364,
|
| 26665 |
+
"grad_norm": 0.10546875,
|
| 26666 |
+
"learning_rate": 0.1,
|
| 26667 |
+
"loss": 2.107957363128662,
|
| 26668 |
+
"step": 7612
|
| 26669 |
+
},
|
| 26670 |
+
{
|
| 26671 |
+
"epoch": 0.24171428571428571,
|
| 26672 |
+
"grad_norm": 0.05810546875,
|
| 26673 |
+
"learning_rate": 0.1,
|
| 26674 |
+
"loss": 2.1209492683410645,
|
| 26675 |
+
"step": 7614
|
| 26676 |
+
},
|
| 26677 |
+
{
|
| 26678 |
+
"epoch": 0.24177777777777779,
|
| 26679 |
+
"grad_norm": 0.08251953125,
|
| 26680 |
+
"learning_rate": 0.1,
|
| 26681 |
+
"loss": 2.1376209259033203,
|
| 26682 |
+
"step": 7616
|
| 26683 |
+
},
|
| 26684 |
+
{
|
| 26685 |
+
"epoch": 0.24184126984126983,
|
| 26686 |
+
"grad_norm": 0.09521484375,
|
| 26687 |
+
"learning_rate": 0.1,
|
| 26688 |
+
"loss": 2.126371383666992,
|
| 26689 |
+
"step": 7618
|
| 26690 |
+
},
|
| 26691 |
+
{
|
| 26692 |
+
"epoch": 0.2419047619047619,
|
| 26693 |
+
"grad_norm": 0.057373046875,
|
| 26694 |
+
"learning_rate": 0.1,
|
| 26695 |
+
"loss": 2.1156535148620605,
|
| 26696 |
+
"step": 7620
|
| 26697 |
+
},
|
| 26698 |
+
{
|
| 26699 |
+
"epoch": 0.24196825396825397,
|
| 26700 |
+
"grad_norm": 0.12451171875,
|
| 26701 |
+
"learning_rate": 0.1,
|
| 26702 |
+
"loss": 2.100332260131836,
|
| 26703 |
+
"step": 7622
|
| 26704 |
+
},
|
| 26705 |
+
{
|
| 26706 |
+
"epoch": 0.24203174603174604,
|
| 26707 |
+
"grad_norm": 0.3359375,
|
| 26708 |
+
"learning_rate": 0.1,
|
| 26709 |
+
"loss": 2.0983855724334717,
|
| 26710 |
+
"step": 7624
|
| 26711 |
+
},
|
| 26712 |
+
{
|
| 26713 |
+
"epoch": 0.24209523809523809,
|
| 26714 |
+
"grad_norm": 0.2412109375,
|
| 26715 |
+
"learning_rate": 0.1,
|
| 26716 |
+
"loss": 2.110562324523926,
|
| 26717 |
+
"step": 7626
|
| 26718 |
+
},
|
| 26719 |
+
{
|
| 26720 |
+
"epoch": 0.24215873015873016,
|
| 26721 |
+
"grad_norm": 0.06396484375,
|
| 26722 |
+
"learning_rate": 0.1,
|
| 26723 |
+
"loss": 2.120020866394043,
|
| 26724 |
+
"step": 7628
|
| 26725 |
+
},
|
| 26726 |
+
{
|
| 26727 |
+
"epoch": 0.24222222222222223,
|
| 26728 |
+
"grad_norm": 0.2265625,
|
| 26729 |
+
"learning_rate": 0.1,
|
| 26730 |
+
"loss": 2.1192984580993652,
|
| 26731 |
+
"step": 7630
|
| 26732 |
+
},
|
| 26733 |
+
{
|
| 26734 |
+
"epoch": 0.2422857142857143,
|
| 26735 |
+
"grad_norm": 0.3125,
|
| 26736 |
+
"learning_rate": 0.1,
|
| 26737 |
+
"loss": 2.1244804859161377,
|
| 26738 |
+
"step": 7632
|
| 26739 |
+
},
|
| 26740 |
+
{
|
| 26741 |
+
"epoch": 0.24234920634920634,
|
| 26742 |
+
"grad_norm": 0.15234375,
|
| 26743 |
+
"learning_rate": 0.1,
|
| 26744 |
+
"loss": 2.113220691680908,
|
| 26745 |
+
"step": 7634
|
| 26746 |
+
},
|
| 26747 |
+
{
|
| 26748 |
+
"epoch": 0.2424126984126984,
|
| 26749 |
+
"grad_norm": 0.099609375,
|
| 26750 |
+
"learning_rate": 0.1,
|
| 26751 |
+
"loss": 2.09896183013916,
|
| 26752 |
+
"step": 7636
|
| 26753 |
+
},
|
| 26754 |
+
{
|
| 26755 |
+
"epoch": 0.24247619047619048,
|
| 26756 |
+
"grad_norm": 0.1640625,
|
| 26757 |
+
"learning_rate": 0.1,
|
| 26758 |
+
"loss": 2.1062746047973633,
|
| 26759 |
+
"step": 7638
|
| 26760 |
+
},
|
| 26761 |
+
{
|
| 26762 |
+
"epoch": 0.24253968253968253,
|
| 26763 |
+
"grad_norm": 0.076171875,
|
| 26764 |
+
"learning_rate": 0.1,
|
| 26765 |
+
"loss": 2.0957741737365723,
|
| 26766 |
+
"step": 7640
|
| 26767 |
+
},
|
| 26768 |
+
{
|
| 26769 |
+
"epoch": 0.2426031746031746,
|
| 26770 |
+
"grad_norm": 0.2197265625,
|
| 26771 |
+
"learning_rate": 0.1,
|
| 26772 |
+
"loss": 2.1229372024536133,
|
| 26773 |
+
"step": 7642
|
| 26774 |
+
},
|
| 26775 |
+
{
|
| 26776 |
+
"epoch": 0.24266666666666667,
|
| 26777 |
+
"grad_norm": 0.29296875,
|
| 26778 |
+
"learning_rate": 0.1,
|
| 26779 |
+
"loss": 2.109020471572876,
|
| 26780 |
+
"step": 7644
|
| 26781 |
+
},
|
| 26782 |
+
{
|
| 26783 |
+
"epoch": 0.24273015873015874,
|
| 26784 |
+
"grad_norm": 0.2470703125,
|
| 26785 |
+
"learning_rate": 0.1,
|
| 26786 |
+
"loss": 2.1225526332855225,
|
| 26787 |
+
"step": 7646
|
| 26788 |
+
},
|
| 26789 |
+
{
|
| 26790 |
+
"epoch": 0.24279365079365078,
|
| 26791 |
+
"grad_norm": 0.1064453125,
|
| 26792 |
+
"learning_rate": 0.1,
|
| 26793 |
+
"loss": 2.0903031826019287,
|
| 26794 |
+
"step": 7648
|
| 26795 |
+
},
|
| 26796 |
+
{
|
| 26797 |
+
"epoch": 0.24285714285714285,
|
| 26798 |
+
"grad_norm": 0.1162109375,
|
| 26799 |
+
"learning_rate": 0.1,
|
| 26800 |
+
"loss": 2.0699353218078613,
|
| 26801 |
+
"step": 7650
|
| 26802 |
+
},
|
| 26803 |
+
{
|
| 26804 |
+
"epoch": 0.24292063492063493,
|
| 26805 |
+
"grad_norm": 0.06982421875,
|
| 26806 |
+
"learning_rate": 0.1,
|
| 26807 |
+
"loss": 2.1053171157836914,
|
| 26808 |
+
"step": 7652
|
| 26809 |
+
},
|
| 26810 |
+
{
|
| 26811 |
+
"epoch": 0.242984126984127,
|
| 26812 |
+
"grad_norm": 0.1279296875,
|
| 26813 |
+
"learning_rate": 0.1,
|
| 26814 |
+
"loss": 2.0997400283813477,
|
| 26815 |
+
"step": 7654
|
| 26816 |
+
},
|
| 26817 |
+
{
|
| 26818 |
+
"epoch": 0.24304761904761904,
|
| 26819 |
+
"grad_norm": 0.08154296875,
|
| 26820 |
+
"learning_rate": 0.1,
|
| 26821 |
+
"loss": 2.1162967681884766,
|
| 26822 |
+
"step": 7656
|
| 26823 |
+
},
|
| 26824 |
+
{
|
| 26825 |
+
"epoch": 0.2431111111111111,
|
| 26826 |
+
"grad_norm": 0.06640625,
|
| 26827 |
+
"learning_rate": 0.1,
|
| 26828 |
+
"loss": 2.1261515617370605,
|
| 26829 |
+
"step": 7658
|
| 26830 |
+
},
|
| 26831 |
+
{
|
| 26832 |
+
"epoch": 0.24317460317460318,
|
| 26833 |
+
"grad_norm": 0.1484375,
|
| 26834 |
+
"learning_rate": 0.1,
|
| 26835 |
+
"loss": 2.093658685684204,
|
| 26836 |
+
"step": 7660
|
| 26837 |
+
},
|
| 26838 |
+
{
|
| 26839 |
+
"epoch": 0.24323809523809523,
|
| 26840 |
+
"grad_norm": 0.146484375,
|
| 26841 |
+
"learning_rate": 0.1,
|
| 26842 |
+
"loss": 2.099531888961792,
|
| 26843 |
+
"step": 7662
|
| 26844 |
+
},
|
| 26845 |
+
{
|
| 26846 |
+
"epoch": 0.2433015873015873,
|
| 26847 |
+
"grad_norm": 0.142578125,
|
| 26848 |
+
"learning_rate": 0.1,
|
| 26849 |
+
"loss": 2.098545789718628,
|
| 26850 |
+
"step": 7664
|
| 26851 |
+
},
|
| 26852 |
+
{
|
| 26853 |
+
"epoch": 0.24336507936507937,
|
| 26854 |
+
"grad_norm": 0.283203125,
|
| 26855 |
+
"learning_rate": 0.1,
|
| 26856 |
+
"loss": 2.0889580249786377,
|
| 26857 |
+
"step": 7666
|
| 26858 |
+
},
|
| 26859 |
+
{
|
| 26860 |
+
"epoch": 0.24342857142857144,
|
| 26861 |
+
"grad_norm": 0.244140625,
|
| 26862 |
+
"learning_rate": 0.1,
|
| 26863 |
+
"loss": 2.082266330718994,
|
| 26864 |
+
"step": 7668
|
| 26865 |
+
},
|
| 26866 |
+
{
|
| 26867 |
+
"epoch": 0.24349206349206348,
|
| 26868 |
+
"grad_norm": 0.09814453125,
|
| 26869 |
+
"learning_rate": 0.1,
|
| 26870 |
+
"loss": 2.0861690044403076,
|
| 26871 |
+
"step": 7670
|
| 26872 |
+
},
|
| 26873 |
+
{
|
| 26874 |
+
"epoch": 0.24355555555555555,
|
| 26875 |
+
"grad_norm": 0.1669921875,
|
| 26876 |
+
"learning_rate": 0.1,
|
| 26877 |
+
"loss": 2.113250732421875,
|
| 26878 |
+
"step": 7672
|
| 26879 |
+
},
|
| 26880 |
+
{
|
| 26881 |
+
"epoch": 0.24361904761904762,
|
| 26882 |
+
"grad_norm": 0.2392578125,
|
| 26883 |
+
"learning_rate": 0.1,
|
| 26884 |
+
"loss": 2.147066593170166,
|
| 26885 |
+
"step": 7674
|
| 26886 |
+
},
|
| 26887 |
+
{
|
| 26888 |
+
"epoch": 0.2436825396825397,
|
| 26889 |
+
"grad_norm": 0.1162109375,
|
| 26890 |
+
"learning_rate": 0.1,
|
| 26891 |
+
"loss": 2.146116256713867,
|
| 26892 |
+
"step": 7676
|
| 26893 |
+
},
|
| 26894 |
+
{
|
| 26895 |
+
"epoch": 0.24374603174603174,
|
| 26896 |
+
"grad_norm": 0.16796875,
|
| 26897 |
+
"learning_rate": 0.1,
|
| 26898 |
+
"loss": 2.1041014194488525,
|
| 26899 |
+
"step": 7678
|
| 26900 |
+
},
|
| 26901 |
+
{
|
| 26902 |
+
"epoch": 0.2438095238095238,
|
| 26903 |
+
"grad_norm": 0.12109375,
|
| 26904 |
+
"learning_rate": 0.1,
|
| 26905 |
+
"loss": 2.1133410930633545,
|
| 26906 |
+
"step": 7680
|
| 26907 |
+
},
|
| 26908 |
+
{
|
| 26909 |
+
"epoch": 0.24387301587301588,
|
| 26910 |
+
"grad_norm": 0.23046875,
|
| 26911 |
+
"learning_rate": 0.1,
|
| 26912 |
+
"loss": 2.0914034843444824,
|
| 26913 |
+
"step": 7682
|
| 26914 |
+
},
|
| 26915 |
+
{
|
| 26916 |
+
"epoch": 0.24393650793650792,
|
| 26917 |
+
"grad_norm": 0.1396484375,
|
| 26918 |
+
"learning_rate": 0.1,
|
| 26919 |
+
"loss": 2.0841925144195557,
|
| 26920 |
+
"step": 7684
|
| 26921 |
+
},
|
| 26922 |
+
{
|
| 26923 |
+
"epoch": 0.244,
|
| 26924 |
+
"grad_norm": 0.0859375,
|
| 26925 |
+
"learning_rate": 0.1,
|
| 26926 |
+
"loss": 2.0767691135406494,
|
| 26927 |
+
"step": 7686
|
| 26928 |
+
},
|
| 26929 |
+
{
|
| 26930 |
+
"epoch": 0.24406349206349207,
|
| 26931 |
+
"grad_norm": 0.056640625,
|
| 26932 |
+
"learning_rate": 0.1,
|
| 26933 |
+
"loss": 2.095615863800049,
|
| 26934 |
+
"step": 7688
|
| 26935 |
+
},
|
| 26936 |
+
{
|
| 26937 |
+
"epoch": 0.24412698412698414,
|
| 26938 |
+
"grad_norm": 0.1787109375,
|
| 26939 |
+
"learning_rate": 0.1,
|
| 26940 |
+
"loss": 2.0853114128112793,
|
| 26941 |
+
"step": 7690
|
| 26942 |
+
},
|
| 26943 |
+
{
|
| 26944 |
+
"epoch": 0.24419047619047618,
|
| 26945 |
+
"grad_norm": 0.267578125,
|
| 26946 |
+
"learning_rate": 0.1,
|
| 26947 |
+
"loss": 2.0804555416107178,
|
| 26948 |
+
"step": 7692
|
| 26949 |
+
},
|
| 26950 |
+
{
|
| 26951 |
+
"epoch": 0.24425396825396825,
|
| 26952 |
+
"grad_norm": 0.12890625,
|
| 26953 |
+
"learning_rate": 0.1,
|
| 26954 |
+
"loss": 2.0776097774505615,
|
| 26955 |
+
"step": 7694
|
| 26956 |
+
},
|
| 26957 |
+
{
|
| 26958 |
+
"epoch": 0.24431746031746032,
|
| 26959 |
+
"grad_norm": 0.07470703125,
|
| 26960 |
+
"learning_rate": 0.1,
|
| 26961 |
+
"loss": 2.124927282333374,
|
| 26962 |
+
"step": 7696
|
| 26963 |
+
},
|
| 26964 |
+
{
|
| 26965 |
+
"epoch": 0.2443809523809524,
|
| 26966 |
+
"grad_norm": 0.2470703125,
|
| 26967 |
+
"learning_rate": 0.1,
|
| 26968 |
+
"loss": 2.095863103866577,
|
| 26969 |
+
"step": 7698
|
| 26970 |
+
},
|
| 26971 |
+
{
|
| 26972 |
+
"epoch": 0.24444444444444444,
|
| 26973 |
+
"grad_norm": 0.33984375,
|
| 26974 |
+
"learning_rate": 0.1,
|
| 26975 |
+
"loss": 2.1371068954467773,
|
| 26976 |
+
"step": 7700
|
| 26977 |
+
},
|
| 26978 |
+
{
|
| 26979 |
+
"epoch": 0.2445079365079365,
|
| 26980 |
+
"grad_norm": 0.056640625,
|
| 26981 |
+
"learning_rate": 0.1,
|
| 26982 |
+
"loss": 2.0929276943206787,
|
| 26983 |
+
"step": 7702
|
| 26984 |
+
},
|
| 26985 |
+
{
|
| 26986 |
+
"epoch": 0.24457142857142858,
|
| 26987 |
+
"grad_norm": 0.10400390625,
|
| 26988 |
+
"learning_rate": 0.1,
|
| 26989 |
+
"loss": 2.104721784591675,
|
| 26990 |
+
"step": 7704
|
| 26991 |
+
},
|
| 26992 |
+
{
|
| 26993 |
+
"epoch": 0.24463492063492062,
|
| 26994 |
+
"grad_norm": 0.11474609375,
|
| 26995 |
+
"learning_rate": 0.1,
|
| 26996 |
+
"loss": 2.08351731300354,
|
| 26997 |
+
"step": 7706
|
| 26998 |
+
},
|
| 26999 |
+
{
|
| 27000 |
+
"epoch": 0.2446984126984127,
|
| 27001 |
+
"grad_norm": 0.234375,
|
| 27002 |
+
"learning_rate": 0.1,
|
| 27003 |
+
"loss": 2.108940839767456,
|
| 27004 |
+
"step": 7708
|
| 27005 |
+
},
|
| 27006 |
+
{
|
| 27007 |
+
"epoch": 0.24476190476190476,
|
| 27008 |
+
"grad_norm": 0.248046875,
|
| 27009 |
+
"learning_rate": 0.1,
|
| 27010 |
+
"loss": 2.1364517211914062,
|
| 27011 |
+
"step": 7710
|
| 27012 |
+
},
|
| 27013 |
+
{
|
| 27014 |
+
"epoch": 0.24482539682539683,
|
| 27015 |
+
"grad_norm": 0.10546875,
|
| 27016 |
+
"learning_rate": 0.1,
|
| 27017 |
+
"loss": 2.104205846786499,
|
| 27018 |
+
"step": 7712
|
| 27019 |
+
},
|
| 27020 |
+
{
|
| 27021 |
+
"epoch": 0.24488888888888888,
|
| 27022 |
+
"grad_norm": 0.15625,
|
| 27023 |
+
"learning_rate": 0.1,
|
| 27024 |
+
"loss": 2.0820083618164062,
|
| 27025 |
+
"step": 7714
|
| 27026 |
+
},
|
| 27027 |
+
{
|
| 27028 |
+
"epoch": 0.24495238095238095,
|
| 27029 |
+
"grad_norm": 0.0810546875,
|
| 27030 |
+
"learning_rate": 0.1,
|
| 27031 |
+
"loss": 2.1198596954345703,
|
| 27032 |
+
"step": 7716
|
| 27033 |
+
},
|
| 27034 |
+
{
|
| 27035 |
+
"epoch": 0.24501587301587302,
|
| 27036 |
+
"grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.0827841758728027,
+      "step": 7718
+    },
+    {
+      "epoch": 0.2450793650793651,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.100688934326172,
+      "step": 7720
+    },
+    {
+      "epoch": 0.24514285714285713,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.1,
+      "loss": 2.1275124549865723,
+      "step": 7722
+    },
+    {
+      "epoch": 0.2452063492063492,
+      "grad_norm": 0.076171875,
+      "learning_rate": 0.1,
+      "loss": 2.0562610626220703,
+      "step": 7724
+    },
+    {
+      "epoch": 0.24526984126984128,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.1,
+      "loss": 2.088521957397461,
+      "step": 7726
+    },
+    {
+      "epoch": 0.24533333333333332,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.1251540184020996,
+      "step": 7728
+    },
+    {
+      "epoch": 0.2453968253968254,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.110805034637451,
+      "step": 7730
+    },
+    {
+      "epoch": 0.24546031746031746,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.1409051418304443,
+      "step": 7732
+    },
+    {
+      "epoch": 0.24552380952380953,
+      "grad_norm": 0.072265625,
+      "learning_rate": 0.1,
+      "loss": 2.089325428009033,
+      "step": 7734
+    },
+    {
+      "epoch": 0.24558730158730158,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.1,
+      "loss": 2.116668224334717,
+      "step": 7736
+    },
+    {
+      "epoch": 0.24565079365079365,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.1,
+      "loss": 2.124164581298828,
+      "step": 7738
+    },
+    {
+      "epoch": 0.24571428571428572,
+      "grad_norm": 0.150390625,
+      "learning_rate": 0.1,
+      "loss": 2.1034581661224365,
+      "step": 7740
+    },
+    {
+      "epoch": 0.2457777777777778,
+      "grad_norm": 0.08251953125,
+      "learning_rate": 0.1,
+      "loss": 2.0777595043182373,
+      "step": 7742
+    },
+    {
+      "epoch": 0.24584126984126983,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.1,
+      "loss": 2.087156057357788,
+      "step": 7744
+    },
+    {
+      "epoch": 0.2459047619047619,
+      "grad_norm": 0.2001953125,
+      "learning_rate": 0.1,
+      "loss": 2.0853726863861084,
+      "step": 7746
+    },
+    {
+      "epoch": 0.24596825396825397,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.1,
+      "loss": 2.1072938442230225,
+      "step": 7748
+    },
+    {
+      "epoch": 0.24603174603174602,
+      "grad_norm": 0.0869140625,
+      "learning_rate": 0.1,
+      "loss": 2.0686442852020264,
+      "step": 7750
+    },
+    {
+      "epoch": 0.2460952380952381,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.0939245223999023,
+      "step": 7752
+    },
+    {
+      "epoch": 0.24615873015873016,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.1,
+      "loss": 2.0739314556121826,
+      "step": 7754
+    },
+    {
+      "epoch": 0.24622222222222223,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.128819465637207,
+      "step": 7756
+    },
+    {
+      "epoch": 0.24628571428571427,
+      "grad_norm": 0.1171875,
+      "learning_rate": 0.1,
+      "loss": 2.080601453781128,
+      "step": 7758
+    },
+    {
+      "epoch": 0.24634920634920635,
+      "grad_norm": 0.0625,
+      "learning_rate": 0.1,
+      "loss": 2.1003646850585938,
+      "step": 7760
+    },
+    {
+      "epoch": 0.24641269841269842,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.1,
+      "loss": 2.07106614112854,
+      "step": 7762
+    },
+    {
+      "epoch": 0.2464761904761905,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.1,
+      "loss": 2.0942091941833496,
+      "step": 7764
+    },
+    {
+      "epoch": 0.24653968253968253,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.0892837047576904,
+      "step": 7766
+    },
+    {
+      "epoch": 0.2466031746031746,
+      "grad_norm": 0.171875,
+      "learning_rate": 0.1,
+      "loss": 2.1195573806762695,
+      "step": 7768
+    },
+    {
+      "epoch": 0.24666666666666667,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.1,
+      "loss": 2.1099534034729004,
+      "step": 7770
+    },
+    {
+      "epoch": 0.24673015873015874,
+      "grad_norm": 0.134765625,
+      "learning_rate": 0.1,
+      "loss": 2.081984281539917,
+      "step": 7772
+    },
+    {
+      "epoch": 0.2467936507936508,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.0995848178863525,
+      "step": 7774
+    },
+    {
+      "epoch": 0.24685714285714286,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.1072371006011963,
+      "step": 7776
+    },
+    {
+      "epoch": 0.24692063492063493,
+      "grad_norm": 0.08642578125,
+      "learning_rate": 0.1,
+      "loss": 2.0923726558685303,
+      "step": 7778
+    },
+    {
+      "epoch": 0.24698412698412697,
+      "grad_norm": 0.052978515625,
+      "learning_rate": 0.1,
+      "loss": 2.088723659515381,
+      "step": 7780
+    },
+    {
+      "epoch": 0.24704761904761904,
+      "grad_norm": 0.16796875,
+      "learning_rate": 0.1,
+      "loss": 2.112581968307495,
+      "step": 7782
+    },
+    {
+      "epoch": 0.24711111111111111,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.115924596786499,
+      "step": 7784
+    },
+    {
+      "epoch": 0.24717460317460319,
+      "grad_norm": 0.059814453125,
+      "learning_rate": 0.1,
+      "loss": 2.103855609893799,
+      "step": 7786
+    },
+    {
+      "epoch": 0.24723809523809523,
+      "grad_norm": 0.169921875,
+      "learning_rate": 0.1,
+      "loss": 2.0790719985961914,
+      "step": 7788
+    },
+    {
+      "epoch": 0.2473015873015873,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.1142492294311523,
+      "step": 7790
+    },
+    {
+      "epoch": 0.24736507936507937,
+      "grad_norm": 0.212890625,
+      "learning_rate": 0.1,
+      "loss": 2.140623092651367,
+      "step": 7792
+    },
+    {
+      "epoch": 0.24742857142857144,
+      "grad_norm": 0.10205078125,
+      "learning_rate": 0.1,
+      "loss": 2.0839195251464844,
+      "step": 7794
+    },
+    {
+      "epoch": 0.24749206349206349,
+      "grad_norm": 0.20703125,
+      "learning_rate": 0.1,
+      "loss": 2.085799217224121,
+      "step": 7796
+    },
+    {
+      "epoch": 0.24755555555555556,
+      "grad_norm": 0.1796875,
+      "learning_rate": 0.1,
+      "loss": 2.0974864959716797,
+      "step": 7798
+    },
+    {
+      "epoch": 0.24761904761904763,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.1,
+      "loss": 2.100862741470337,
+      "step": 7800
+    },
+    {
+      "epoch": 0.24768253968253967,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.1,
+      "loss": 2.123643636703491,
+      "step": 7802
+    },
+    {
+      "epoch": 0.24774603174603174,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.1,
+      "loss": 2.111666202545166,
+      "step": 7804
+    },
+    {
+      "epoch": 0.2478095238095238,
+      "grad_norm": 0.06640625,
+      "learning_rate": 0.1,
+      "loss": 2.1051220893859863,
+      "step": 7806
+    },
+    {
+      "epoch": 0.24787301587301588,
+      "grad_norm": 0.0693359375,
+      "learning_rate": 0.1,
+      "loss": 2.1013715267181396,
+      "step": 7808
+    },
+    {
+      "epoch": 0.24793650793650793,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.0939769744873047,
+      "step": 7810
+    },
+    {
+      "epoch": 0.248,
+      "grad_norm": 0.07275390625,
+      "learning_rate": 0.1,
+      "loss": 2.0908074378967285,
+      "step": 7812
+    },
+    {
+      "epoch": 0.24806349206349207,
+      "grad_norm": 0.1025390625,
+      "learning_rate": 0.1,
+      "loss": 2.0955634117126465,
+      "step": 7814
+    },
+    {
+      "epoch": 0.24812698412698414,
+      "grad_norm": 0.087890625,
+      "learning_rate": 0.1,
+      "loss": 2.0926597118377686,
+      "step": 7816
+    },
+    {
+      "epoch": 0.24819047619047618,
+      "grad_norm": 0.095703125,
+      "learning_rate": 0.1,
+      "loss": 2.1247200965881348,
+      "step": 7818
+    },
+    {
+      "epoch": 0.24825396825396825,
+      "grad_norm": 0.421875,
+      "learning_rate": 0.1,
+      "loss": 2.096296548843384,
+      "step": 7820
+    },
+    {
+      "epoch": 0.24831746031746033,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.1,
+      "loss": 2.1253182888031006,
+      "step": 7822
+    },
+    {
+      "epoch": 0.24838095238095237,
+      "grad_norm": 0.2001953125,
+      "learning_rate": 0.1,
+      "loss": 2.1058707237243652,
+      "step": 7824
+    },
+    {
+      "epoch": 0.24844444444444444,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.07863187789917,
+      "step": 7826
+    },
+    {
+      "epoch": 0.2485079365079365,
+      "grad_norm": 0.150390625,
+      "learning_rate": 0.1,
+      "loss": 2.123563766479492,
+      "step": 7828
+    },
+    {
+      "epoch": 0.24857142857142858,
+      "grad_norm": 0.11669921875,
+      "learning_rate": 0.1,
+      "loss": 2.111006736755371,
+      "step": 7830
+    },
+    {
+      "epoch": 0.24863492063492063,
+      "grad_norm": 0.10009765625,
+      "learning_rate": 0.1,
+      "loss": 2.103977680206299,
+      "step": 7832
+    },
+    {
+      "epoch": 0.2486984126984127,
+      "grad_norm": 0.11474609375,
+      "learning_rate": 0.1,
+      "loss": 2.104381561279297,
+      "step": 7834
+    },
+    {
+      "epoch": 0.24876190476190477,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.113999843597412,
+      "step": 7836
+    },
+    {
+      "epoch": 0.24882539682539684,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.1,
+      "loss": 2.0946075916290283,
+      "step": 7838
+    },
+    {
+      "epoch": 0.24888888888888888,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.123718023300171,
+      "step": 7840
+    },
+    {
+      "epoch": 0.24895238095238095,
+      "grad_norm": 0.0908203125,
+      "learning_rate": 0.1,
+      "loss": 2.1301300525665283,
+      "step": 7842
+    },
+    {
+      "epoch": 0.24901587301587302,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.1,
+      "loss": 2.1389126777648926,
+      "step": 7844
+    },
+    {
+      "epoch": 0.24907936507936507,
+      "grad_norm": 0.1123046875,
+      "learning_rate": 0.1,
+      "loss": 2.138192892074585,
+      "step": 7846
+    },
+    {
+      "epoch": 0.24914285714285714,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.1,
+      "loss": 2.091545581817627,
+      "step": 7848
+    },
+    {
+      "epoch": 0.2492063492063492,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.090691089630127,
+      "step": 7850
+    },
+    {
+      "epoch": 0.24926984126984128,
+      "grad_norm": 0.1533203125,
+      "learning_rate": 0.1,
+      "loss": 2.1429617404937744,
+      "step": 7852
+    },
+    {
+      "epoch": 0.24933333333333332,
+      "grad_norm": 0.08447265625,
+      "learning_rate": 0.1,
+      "loss": 2.1256930828094482,
+      "step": 7854
+    },
+    {
+      "epoch": 0.2493968253968254,
+      "grad_norm": 0.0830078125,
+      "learning_rate": 0.1,
+      "loss": 2.0978100299835205,
+      "step": 7856
+    },
+    {
+      "epoch": 0.24946031746031747,
+      "grad_norm": 0.240234375,
+      "learning_rate": 0.1,
+      "loss": 2.1158103942871094,
+      "step": 7858
+    },
+    {
+      "epoch": 0.24952380952380954,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.1170918941497803,
+      "step": 7860
+    },
+    {
+      "epoch": 0.24958730158730158,
+      "grad_norm": 0.11181640625,
+      "learning_rate": 0.1,
+      "loss": 2.0898027420043945,
+      "step": 7862
+    },
+    {
+      "epoch": 0.24965079365079365,
+      "grad_norm": 0.1650390625,
+      "learning_rate": 0.1,
+      "loss": 2.1037285327911377,
+      "step": 7864
+    },
+    {
+      "epoch": 0.24971428571428572,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.1031713485717773,
+      "step": 7866
+    },
+    {
+      "epoch": 0.24977777777777777,
+      "grad_norm": 0.11572265625,
+      "learning_rate": 0.1,
+      "loss": 2.089454174041748,
+      "step": 7868
+    },
+    {
+      "epoch": 0.24984126984126984,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.1,
+      "loss": 2.1241745948791504,
+      "step": 7870
+    },
+    {
+      "epoch": 0.2499047619047619,
+      "grad_norm": 0.185546875,
+      "learning_rate": 0.1,
+      "loss": 2.114145278930664,
+      "step": 7872
+    },
+    {
+      "epoch": 0.24996825396825398,
+      "grad_norm": 0.259765625,
+      "learning_rate": 0.1,
+      "loss": 2.1260359287261963,
+      "step": 7874
     }
   ],
   "logging_steps": 2,
|
|
|
|
       "attributes": {}
     }
   },
+  "total_flos": 2.608016489333632e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
|