Training in progress, step 10080, checkpoint
last-checkpoint/model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4c15dd5758da6e91090a4e05104520c379459187474d7afcf1e299354f423045
 size 1171937904
last-checkpoint/optimizer.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:0bda61c5562b085b556664a05a01e57bbea325105a786d10ee131f7f55de32d8
 size 1288212619
last-checkpoint/rng_state.pth
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:17d09b98667b67af91698e95c4e454d8599b7a3fe6cb1b84c03f84f475afbcb6
 size 14645
last-checkpoint/scheduler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4cd3a69179e8bbc8abd9a58c7c722a1b78f068a2931f77ad34a41d0a1716b746
 size 1401
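The four checkpoint files above are stored as Git LFS pointers, each with a `version` line, an `oid sha256:` line, and a `size` line. As a minimal sketch, assuming the new-side pointer text is available as a string, the fields could be parsed as follows. `parse_lfs_pointer` is a hypothetical helper name used only for illustration; it is not part of Git LFS or any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# New-side pointer for last-checkpoint/scheduler.pt, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4cd3a69179e8bbc8abd9a58c7c722a1b78f068a2931f77ad34a41d0a1716b746
size 1401
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

Comparing `digest` and `size` against a locally downloaded file is one way to confirm that the checkpoint fetched matches this commit.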
last-checkpoint/trainer_state.json
CHANGED
@@ -2,9 +2,9 @@
 "best_global_step": null,
 "best_metric": null,
 "best_model_checkpoint": null,
-"epoch": 0.
 "eval_steps": 3150,
-"global_step":
 "is_hyper_param_search": false,
 "is_local_process_zero": true,
 "is_world_process_zero": true,
@@ -34206,6 +34206,1112 @@
 "learning_rate": 0.1,
 "loss": 2.1471362113952637,
 "step": 9764
 }
 ],
 "logging_steps": 2,
@@ -34225,7 +35331,7 @@
 "attributes": {}
 }
 },
-"total_flos": 3.
 "train_batch_size": 4,
 "trial_name": null,
 "trial_params": null
@@ -2,9 +2,9 @@
 "best_global_step": null,
 "best_metric": null,
 "best_model_checkpoint": null,
+"epoch": 0.32,
 "eval_steps": 3150,
+"global_step": 10080,
 "is_hyper_param_search": false,
 "is_local_process_zero": true,
 "is_world_process_zero": true,
@@ -34206,6 +34206,1112 @@
 "learning_rate": 0.1,
 "loss": 2.1471362113952637,
 "step": 9764
+},
+{
+"epoch": 0.31003174603174605,
+"grad_norm": 0.0732421875,
+"learning_rate": 0.1,
+"loss": 2.193241596221924,
+"step": 9766
+},
+{
+"epoch": 0.3100952380952381,
+"grad_norm": 0.111328125,
+"learning_rate": 0.1,
+"loss": 2.1627867221832275,
+"step": 9768
+},
+{
+"epoch": 0.31015873015873013,
+"grad_norm": 0.2060546875,
+"learning_rate": 0.1,
+"loss": 2.186823844909668,
+"step": 9770
+},
+{
+"epoch": 0.31022222222222223,
+"grad_norm": 0.2412109375,
+"learning_rate": 0.1,
+"loss": 2.165410280227661,
+"step": 9772
+},
+{
+"epoch": 0.3102857142857143,
+"grad_norm": 0.115234375,
+"learning_rate": 0.1,
+"loss": 2.191713809967041,
+"step": 9774
+},
+{
+"epoch": 0.3103492063492064,
+"grad_norm": 0.058349609375,
+"learning_rate": 0.1,
+"loss": 2.1742026805877686,
+"step": 9776
+},
+{
+"epoch": 0.3104126984126984,
+"grad_norm": 0.0888671875,
+"learning_rate": 0.1,
+"loss": 2.170882225036621,
+"step": 9778
+},
+{
+"epoch": 0.31047619047619046,
+"grad_norm": 0.212890625,
+"learning_rate": 0.1,
+"loss": 2.167639970779419,
+"step": 9780
+},
+{
+"epoch": 0.31053968253968256,
+"grad_norm": 0.2119140625,
+"learning_rate": 0.1,
+"loss": 2.1710586547851562,
+"step": 9782
+},
+{
+"epoch": 0.3106031746031746,
+"grad_norm": 0.271484375,
+"learning_rate": 0.1,
+"loss": 2.2063710689544678,
+"step": 9784
+},
+{
+"epoch": 0.31066666666666665,
+"grad_norm": 0.259765625,
+"learning_rate": 0.1,
+"loss": 2.181265115737915,
+"step": 9786
+},
+{
+"epoch": 0.31073015873015875,
+"grad_norm": 0.0673828125,
+"learning_rate": 0.1,
+"loss": 2.1840031147003174,
+"step": 9788
+},
+{
+"epoch": 0.3107936507936508,
+"grad_norm": 0.1376953125,
+"learning_rate": 0.1,
+"loss": 2.1821494102478027,
+"step": 9790
+},
+{
+"epoch": 0.31085714285714283,
+"grad_norm": 0.2431640625,
+"learning_rate": 0.1,
+"loss": 2.1770834922790527,
+"step": 9792
+},
+{
+"epoch": 0.31092063492063493,
+"grad_norm": 0.263671875,
+"learning_rate": 0.1,
+"loss": 2.163947105407715,
+"step": 9794
+},
+{
+"epoch": 0.310984126984127,
+"grad_norm": 0.205078125,
+"learning_rate": 0.1,
+"loss": 2.1887195110321045,
+"step": 9796
+},
+{
+"epoch": 0.3110476190476191,
+"grad_norm": 0.08935546875,
+"learning_rate": 0.1,
+"loss": 2.1659390926361084,
+"step": 9798
+},
+{
+"epoch": 0.3111111111111111,
+"grad_norm": 0.10791015625,
+"learning_rate": 0.1,
+"loss": 2.1604509353637695,
+"step": 9800
+},
+{
+"epoch": 0.31117460317460316,
+"grad_norm": 0.06640625,
+"learning_rate": 0.1,
+"loss": 2.193321466445923,
+"step": 9802
+},
+{
+"epoch": 0.31123809523809526,
+"grad_norm": 0.053466796875,
+"learning_rate": 0.1,
+"loss": 2.1531283855438232,
+"step": 9804
+},
+{
+"epoch": 0.3113015873015873,
+"grad_norm": 0.0751953125,
+"learning_rate": 0.1,
+"loss": 2.1906158924102783,
+"step": 9806
+},
+{
+"epoch": 0.31136507936507934,
+"grad_norm": 0.31640625,
+"learning_rate": 0.1,
+"loss": 2.2052948474884033,
+"step": 9808
+},
+{
+"epoch": 0.31142857142857144,
+"grad_norm": 0.359375,
+"learning_rate": 0.1,
+"loss": 2.20705246925354,
+"step": 9810
+},
+{
+"epoch": 0.3114920634920635,
+"grad_norm": 0.08154296875,
+"learning_rate": 0.1,
+"loss": 2.1867713928222656,
+"step": 9812
+},
+{
+"epoch": 0.31155555555555553,
+"grad_norm": 0.053955078125,
+"learning_rate": 0.1,
+"loss": 2.185459852218628,
+"step": 9814
+},
+{
+"epoch": 0.31161904761904763,
+"grad_norm": 0.061279296875,
+"learning_rate": 0.1,
+"loss": 2.192904233932495,
+"step": 9816
+},
+{
+"epoch": 0.31168253968253967,
+"grad_norm": 0.15625,
+"learning_rate": 0.1,
+"loss": 2.16719651222229,
+"step": 9818
+},
+{
+"epoch": 0.31174603174603177,
+"grad_norm": 0.205078125,
+"learning_rate": 0.1,
+"loss": 2.184863567352295,
+"step": 9820
+},
+{
+"epoch": 0.3118095238095238,
+"grad_norm": 0.19140625,
+"learning_rate": 0.1,
+"loss": 2.1928882598876953,
+"step": 9822
+},
+{
+"epoch": 0.31187301587301586,
+"grad_norm": 0.2294921875,
+"learning_rate": 0.1,
+"loss": 2.1773009300231934,
+"step": 9824
+},
+{
+"epoch": 0.31193650793650796,
+"grad_norm": 0.10595703125,
+"learning_rate": 0.1,
+"loss": 2.187877893447876,
+"step": 9826
+},
+{
+"epoch": 0.312,
+"grad_norm": 0.1435546875,
+"learning_rate": 0.1,
+"loss": 2.187873363494873,
+"step": 9828
+},
+{
+"epoch": 0.31206349206349204,
+"grad_norm": 0.0546875,
+"learning_rate": 0.1,
+"loss": 2.2156882286071777,
+"step": 9830
+},
+{
+"epoch": 0.31212698412698414,
+"grad_norm": 0.1845703125,
+"learning_rate": 0.1,
+"loss": 2.1722946166992188,
+"step": 9832
+},
+{
+"epoch": 0.3121904761904762,
+"grad_norm": 0.154296875,
+"learning_rate": 0.1,
+"loss": 2.180370330810547,
+"step": 9834
+},
+{
+"epoch": 0.31225396825396823,
+"grad_norm": 0.0634765625,
+"learning_rate": 0.1,
+"loss": 2.1398494243621826,
+"step": 9836
+},
+{
+"epoch": 0.3123174603174603,
+"grad_norm": 0.3125,
+"learning_rate": 0.1,
+"loss": 2.190025806427002,
+"step": 9838
+},
+{
+"epoch": 0.31238095238095237,
+"grad_norm": 0.291015625,
+"learning_rate": 0.1,
+"loss": 2.176567554473877,
+"step": 9840
+},
+{
+"epoch": 0.31244444444444447,
+"grad_norm": 0.095703125,
+"learning_rate": 0.1,
+"loss": 2.1811344623565674,
+"step": 9842
+},
+{
+"epoch": 0.3125079365079365,
+"grad_norm": 0.06689453125,
+"learning_rate": 0.1,
+"loss": 2.159538507461548,
+"step": 9844
+},
+{
+"epoch": 0.31257142857142856,
+"grad_norm": 0.2021484375,
+"learning_rate": 0.1,
+"loss": 2.1890382766723633,
+"step": 9846
+},
+{
+"epoch": 0.31263492063492065,
+"grad_norm": 0.205078125,
+"learning_rate": 0.1,
+"loss": 2.191279649734497,
+"step": 9848
+},
+{
+"epoch": 0.3126984126984127,
+"grad_norm": 0.1650390625,
+"learning_rate": 0.1,
+"loss": 2.1829254627227783,
+"step": 9850
+},
+{
+"epoch": 0.31276190476190474,
+"grad_norm": 0.0791015625,
+"learning_rate": 0.1,
+"loss": 2.1959803104400635,
+"step": 9852
+},
+{
+"epoch": 0.31282539682539684,
+"grad_norm": 0.1611328125,
+"learning_rate": 0.1,
+"loss": 2.1675479412078857,
+"step": 9854
+},
+{
+"epoch": 0.3128888888888889,
+"grad_norm": 0.1474609375,
+"learning_rate": 0.1,
+"loss": 2.172224521636963,
+"step": 9856
+},
+{
+"epoch": 0.3129523809523809,
+"grad_norm": 0.1083984375,
+"learning_rate": 0.1,
+"loss": 2.199164390563965,
+"step": 9858
+},
+{
+"epoch": 0.313015873015873,
+"grad_norm": 0.119140625,
+"learning_rate": 0.1,
+"loss": 2.2071785926818848,
+"step": 9860
+},
+{
+"epoch": 0.31307936507936507,
+"grad_norm": 0.27734375,
+"learning_rate": 0.1,
+"loss": 2.198953628540039,
+"step": 9862
+},
+{
+"epoch": 0.31314285714285717,
+"grad_norm": 0.28515625,
+"learning_rate": 0.1,
+"loss": 2.1639485359191895,
+"step": 9864
+},
+{
+"epoch": 0.3132063492063492,
+"grad_norm": 0.1435546875,
+"learning_rate": 0.1,
+"loss": 2.1723406314849854,
+"step": 9866
+},
+{
+"epoch": 0.31326984126984125,
+"grad_norm": 0.1591796875,
+"learning_rate": 0.1,
+"loss": 2.1850998401641846,
+"step": 9868
+},
+{
+"epoch": 0.31333333333333335,
+"grad_norm": 0.18359375,
+"learning_rate": 0.1,
+"loss": 2.2024178504943848,
+"step": 9870
+},
+{
+"epoch": 0.3133968253968254,
+"grad_norm": 0.330078125,
+"learning_rate": 0.1,
+"loss": 2.2177889347076416,
+"step": 9872
+},
+{
+"epoch": 0.31346031746031744,
+"grad_norm": 0.07861328125,
+"learning_rate": 0.1,
+"loss": 2.2047946453094482,
+"step": 9874
+},
+{
+"epoch": 0.31352380952380954,
+"grad_norm": 0.06201171875,
+"learning_rate": 0.1,
+"loss": 2.1921725273132324,
+"step": 9876
+},
+{
+"epoch": 0.3135873015873016,
+"grad_norm": 0.04931640625,
+"learning_rate": 0.1,
+"loss": 2.179666757583618,
+"step": 9878
+},
+{
+"epoch": 0.3136507936507936,
+"grad_norm": 0.08935546875,
+"learning_rate": 0.1,
+"loss": 2.1968069076538086,
+"step": 9880
+},
+{
+"epoch": 0.3137142857142857,
+"grad_norm": 0.234375,
+"learning_rate": 0.1,
+"loss": 2.215252161026001,
+"step": 9882
+},
+{
+"epoch": 0.31377777777777777,
+"grad_norm": 0.2421875,
+"learning_rate": 0.1,
+"loss": 2.207919120788574,
+"step": 9884
+},
+{
+"epoch": 0.31384126984126987,
+"grad_norm": 0.353515625,
+"learning_rate": 0.1,
+"loss": 2.192678928375244,
+"step": 9886
+},
+{
+"epoch": 0.3139047619047619,
+"grad_norm": 0.125,
+"learning_rate": 0.1,
+"loss": 2.178010940551758,
+"step": 9888
+},
+{
+"epoch": 0.31396825396825395,
+"grad_norm": 0.1484375,
+"learning_rate": 0.1,
+"loss": 2.195068836212158,
+"step": 9890
+},
+{
+"epoch": 0.31403174603174605,
+"grad_norm": 0.2021484375,
+"learning_rate": 0.1,
+"loss": 2.2022101879119873,
+"step": 9892
+},
+{
+"epoch": 0.3140952380952381,
+"grad_norm": 0.1689453125,
+"learning_rate": 0.1,
+"loss": 2.188624858856201,
+"step": 9894
+},
+{
+"epoch": 0.31415873015873014,
+"grad_norm": 0.0654296875,
+"learning_rate": 0.1,
+"loss": 2.174272060394287,
+"step": 9896
+},
+{
+"epoch": 0.31422222222222224,
+"grad_norm": 0.12109375,
+"learning_rate": 0.1,
+"loss": 2.1800880432128906,
+"step": 9898
+},
+{
+"epoch": 0.3142857142857143,
+"grad_norm": 0.2041015625,
+"learning_rate": 0.1,
+"loss": 2.182217597961426,
+"step": 9900
+},
+{
+"epoch": 0.3143492063492063,
+"grad_norm": 0.1640625,
+"learning_rate": 0.1,
+"loss": 2.175539255142212,
+"step": 9902
+},
+{
+"epoch": 0.3144126984126984,
+"grad_norm": 0.1826171875,
+"learning_rate": 0.1,
+"loss": 2.1903021335601807,
+"step": 9904
+},
+{
+"epoch": 0.31447619047619046,
+"grad_norm": 0.16015625,
+"learning_rate": 0.1,
+"loss": 2.197434663772583,
+"step": 9906
+},
+{
+"epoch": 0.31453968253968256,
+"grad_norm": 0.08447265625,
+"learning_rate": 0.1,
+"loss": 2.198740005493164,
+"step": 9908
+},
+{
+"epoch": 0.3146031746031746,
+"grad_norm": 0.0908203125,
+"learning_rate": 0.1,
+"loss": 2.165989398956299,
+"step": 9910
+},
+{
+"epoch": 0.31466666666666665,
+"grad_norm": 0.201171875,
+"learning_rate": 0.1,
+"loss": 2.22456693649292,
+"step": 9912
+},
+{
+"epoch": 0.31473015873015875,
+"grad_norm": 0.33203125,
+"learning_rate": 0.1,
+"loss": 2.1924569606781006,
+"step": 9914
+},
+{
+"epoch": 0.3147936507936508,
+"grad_norm": 0.1806640625,
+"learning_rate": 0.1,
+"loss": 2.1935088634490967,
+"step": 9916
+},
+{
+"epoch": 0.31485714285714284,
+"grad_norm": 0.1044921875,
+"learning_rate": 0.1,
+"loss": 2.222795248031616,
+"step": 9918
+},
+{
+"epoch": 0.31492063492063493,
+"grad_norm": 0.058349609375,
+"learning_rate": 0.1,
+"loss": 2.2171945571899414,
+"step": 9920
+},
+{
+"epoch": 0.314984126984127,
+"grad_norm": 0.1416015625,
+"learning_rate": 0.1,
+"loss": 2.2098042964935303,
+"step": 9922
+},
+{
+"epoch": 0.315047619047619,
+"grad_norm": 0.140625,
+"learning_rate": 0.1,
+"loss": 2.1736574172973633,
+"step": 9924
| 34769 |
+
},
|
| 34770 |
+
{
|
| 34771 |
+
"epoch": 0.3151111111111111,
|
| 34772 |
+
"grad_norm": 0.267578125,
|
| 34773 |
+
"learning_rate": 0.1,
|
| 34774 |
+
"loss": 2.203256607055664,
|
| 34775 |
+
"step": 9926
|
| 34776 |
+
},
|
| 34777 |
+
{
|
| 34778 |
+
"epoch": 0.31517460317460316,
|
| 34779 |
+
"grad_norm": 0.427734375,
|
| 34780 |
+
"learning_rate": 0.1,
|
| 34781 |
+
"loss": 2.196739912033081,
|
| 34782 |
+
"step": 9928
|
| 34783 |
+
},
|
| 34784 |
+
{
|
| 34785 |
+
"epoch": 0.31523809523809526,
|
| 34786 |
+
"grad_norm": 0.1005859375,
|
| 34787 |
+
"learning_rate": 0.1,
|
| 34788 |
+
"loss": 2.202354669570923,
|
| 34789 |
+
"step": 9930
|
| 34790 |
+
},
|
| 34791 |
+
{
|
| 34792 |
+
"epoch": 0.3153015873015873,
|
| 34793 |
+
"grad_norm": 0.1298828125,
|
| 34794 |
+
"learning_rate": 0.1,
|
| 34795 |
+
"loss": 2.1933236122131348,
|
| 34796 |
+
"step": 9932
|
| 34797 |
+
},
|
| 34798 |
+
{
|
| 34799 |
+
"epoch": 0.31536507936507935,
|
| 34800 |
+
"grad_norm": 0.091796875,
|
| 34801 |
+
"learning_rate": 0.1,
|
| 34802 |
+
"loss": 2.198293685913086,
|
| 34803 |
+
"step": 9934
|
| 34804 |
+
},
|
| 34805 |
+
{
|
| 34806 |
+
"epoch": 0.31542857142857145,
|
| 34807 |
+
"grad_norm": 0.1865234375,
|
| 34808 |
+
"learning_rate": 0.1,
|
| 34809 |
+
"loss": 2.212238311767578,
|
| 34810 |
+
"step": 9936
|
| 34811 |
+
},
|
| 34812 |
+
{
|
| 34813 |
+
"epoch": 0.3154920634920635,
|
| 34814 |
+
"grad_norm": 0.26953125,
|
| 34815 |
+
"learning_rate": 0.1,
|
| 34816 |
+
"loss": 2.1917948722839355,
|
| 34817 |
+
"step": 9938
|
| 34818 |
+
},
|
| 34819 |
+
{
|
| 34820 |
+
"epoch": 0.31555555555555553,
|
| 34821 |
+
"grad_norm": 0.1669921875,
|
| 34822 |
+
"learning_rate": 0.1,
|
| 34823 |
+
"loss": 2.187749147415161,
|
| 34824 |
+
"step": 9940
|
| 34825 |
+
},
|
| 34826 |
+
{
"epoch": 0.31561904761904763,
"grad_norm": 0.08154296875,
"learning_rate": 0.1,
"loss": 2.2057549953460693,
"step": 9942
},
{
"epoch": 0.3156825396825397,
"grad_norm": 0.19140625,
"learning_rate": 0.1,
"loss": 2.1741790771484375,
"step": 9944
},
{
"epoch": 0.3157460317460317,
"grad_norm": 0.109375,
"learning_rate": 0.1,
"loss": 2.224497079849243,
"step": 9946
},
{
"epoch": 0.3158095238095238,
"grad_norm": 0.09912109375,
"learning_rate": 0.1,
"loss": 2.203294277191162,
"step": 9948
},
{
"epoch": 0.31587301587301586,
"grad_norm": 0.08251953125,
"learning_rate": 0.1,
"loss": 2.196547269821167,
"step": 9950
},
{
"epoch": 0.31593650793650796,
"grad_norm": 0.138671875,
"learning_rate": 0.1,
"loss": 2.216204881668091,
"step": 9952
},
{
"epoch": 0.316,
"grad_norm": 0.283203125,
"learning_rate": 0.1,
"loss": 2.2269957065582275,
"step": 9954
},
{
"epoch": 0.31606349206349205,
"grad_norm": 0.2314453125,
"learning_rate": 0.1,
"loss": 2.209287643432617,
"step": 9956
},
{
"epoch": 0.31612698412698415,
"grad_norm": 0.119140625,
"learning_rate": 0.1,
"loss": 2.233351945877075,
"step": 9958
},
{
"epoch": 0.3161904761904762,
"grad_norm": 0.201171875,
"learning_rate": 0.1,
"loss": 2.187995195388794,
"step": 9960
},
{
"epoch": 0.31625396825396823,
"grad_norm": 0.234375,
"learning_rate": 0.1,
"loss": 2.203009605407715,
"step": 9962
},
{
"epoch": 0.31631746031746033,
"grad_norm": 0.0703125,
"learning_rate": 0.1,
"loss": 2.1894705295562744,
"step": 9964
},
{
"epoch": 0.3163809523809524,
"grad_norm": 0.09033203125,
"learning_rate": 0.1,
"loss": 2.2359108924865723,
"step": 9966
},
{
"epoch": 0.3164444444444444,
"grad_norm": 0.1494140625,
"learning_rate": 0.1,
"loss": 2.202052593231201,
"step": 9968
},
{
"epoch": 0.3165079365079365,
"grad_norm": 0.0908203125,
"learning_rate": 0.1,
"loss": 2.155691623687744,
"step": 9970
},
{
"epoch": 0.31657142857142856,
"grad_norm": 0.189453125,
"learning_rate": 0.1,
"loss": 2.2047054767608643,
"step": 9972
},
{
"epoch": 0.31663492063492066,
"grad_norm": 0.296875,
"learning_rate": 0.1,
"loss": 2.1690495014190674,
"step": 9974
},
{
"epoch": 0.3166984126984127,
"grad_norm": 0.09814453125,
"learning_rate": 0.1,
"loss": 2.1789681911468506,
"step": 9976
},
{
"epoch": 0.31676190476190474,
"grad_norm": 0.212890625,
"learning_rate": 0.1,
"loss": 2.19167423248291,
"step": 9978
},
{
"epoch": 0.31682539682539684,
"grad_norm": 0.091796875,
"learning_rate": 0.1,
"loss": 2.178757667541504,
"step": 9980
},
{
"epoch": 0.3168888888888889,
"grad_norm": 0.091796875,
"learning_rate": 0.1,
"loss": 2.184861183166504,
"step": 9982
},
{
"epoch": 0.31695238095238093,
"grad_norm": 0.1494140625,
"learning_rate": 0.1,
"loss": 2.141266345977783,
"step": 9984
},
{
"epoch": 0.31701587301587303,
"grad_norm": 0.33203125,
"learning_rate": 0.1,
"loss": 2.199850559234619,
"step": 9986
},
{
"epoch": 0.31707936507936507,
"grad_norm": 0.2578125,
"learning_rate": 0.1,
"loss": 2.2000479698181152,
"step": 9988
},
{
"epoch": 0.3171428571428571,
"grad_norm": 0.1396484375,
"learning_rate": 0.1,
"loss": 2.1777122020721436,
"step": 9990
},
{
"epoch": 0.3172063492063492,
"grad_norm": 0.1708984375,
"learning_rate": 0.1,
"loss": 2.1901791095733643,
"step": 9992
},
{
"epoch": 0.31726984126984126,
"grad_norm": 0.078125,
"learning_rate": 0.1,
"loss": 2.194085121154785,
"step": 9994
},
{
"epoch": 0.31733333333333336,
"grad_norm": 0.09033203125,
"learning_rate": 0.1,
"loss": 2.1851909160614014,
"step": 9996
},
{
"epoch": 0.3173968253968254,
"grad_norm": 0.1875,
"learning_rate": 0.1,
"loss": 2.190678358078003,
"step": 9998
},
{
"epoch": 0.31746031746031744,
"grad_norm": 0.25390625,
"learning_rate": 0.1,
"loss": 2.172008752822876,
"step": 10000
},
{
"epoch": 0.31752380952380954,
"grad_norm": 0.0634765625,
"learning_rate": 0.1,
"loss": 2.1945793628692627,
"step": 10002
},
{
"epoch": 0.3175873015873016,
"grad_norm": 0.1435546875,
"learning_rate": 0.1,
"loss": 2.173008680343628,
"step": 10004
},
{
"epoch": 0.31765079365079363,
"grad_norm": 0.255859375,
"learning_rate": 0.1,
"loss": 2.1987736225128174,
"step": 10006
},
{
"epoch": 0.3177142857142857,
"grad_norm": 0.306640625,
"learning_rate": 0.1,
"loss": 2.194270610809326,
"step": 10008
},
{
"epoch": 0.31777777777777777,
"grad_norm": 0.07958984375,
"learning_rate": 0.1,
"loss": 2.194709300994873,
"step": 10010
},
{
"epoch": 0.31784126984126987,
"grad_norm": 0.1630859375,
"learning_rate": 0.1,
"loss": 2.1713898181915283,
"step": 10012
},
{
"epoch": 0.3179047619047619,
"grad_norm": 0.091796875,
"learning_rate": 0.1,
"loss": 2.1831634044647217,
"step": 10014
},
{
"epoch": 0.31796825396825396,
"grad_norm": 0.11279296875,
"learning_rate": 0.1,
"loss": 2.1948635578155518,
"step": 10016
},
{
"epoch": 0.31803174603174605,
"grad_norm": 0.1123046875,
"learning_rate": 0.1,
"loss": 2.1673130989074707,
"step": 10018
},
{
"epoch": 0.3180952380952381,
"grad_norm": 0.15625,
"learning_rate": 0.1,
"loss": 2.1914515495300293,
"step": 10020
},
{
"epoch": 0.31815873015873014,
"grad_norm": 0.330078125,
"learning_rate": 0.1,
"loss": 2.1910383701324463,
"step": 10022
},
{
"epoch": 0.31822222222222224,
"grad_norm": 0.1259765625,
"learning_rate": 0.1,
"loss": 2.155423402786255,
"step": 10024
},
{
"epoch": 0.3182857142857143,
"grad_norm": 0.130859375,
"learning_rate": 0.1,
"loss": 2.1624786853790283,
"step": 10026
},
{
"epoch": 0.3183492063492063,
"grad_norm": 0.07421875,
"learning_rate": 0.1,
"loss": 2.1976842880249023,
"step": 10028
},
{
"epoch": 0.3184126984126984,
"grad_norm": 0.248046875,
"learning_rate": 0.1,
"loss": 2.1654818058013916,
"step": 10030
},
{
"epoch": 0.31847619047619047,
"grad_norm": 0.2275390625,
"learning_rate": 0.1,
"loss": 2.1663379669189453,
"step": 10032
},
{
"epoch": 0.31853968253968257,
"grad_norm": 0.08154296875,
"learning_rate": 0.1,
"loss": 2.1839845180511475,
"step": 10034
},
{
"epoch": 0.3186031746031746,
"grad_norm": 0.1572265625,
"learning_rate": 0.1,
"loss": 2.1859796047210693,
"step": 10036
},
{
"epoch": 0.31866666666666665,
"grad_norm": 0.408203125,
"learning_rate": 0.1,
"loss": 2.171574354171753,
"step": 10038
},
{
"epoch": 0.31873015873015875,
"grad_norm": 0.09228515625,
"learning_rate": 0.1,
"loss": 2.1888623237609863,
"step": 10040
},
{
"epoch": 0.3187936507936508,
"grad_norm": 0.287109375,
"learning_rate": 0.1,
"loss": 2.2281229496002197,
"step": 10042
},
{
"epoch": 0.31885714285714284,
"grad_norm": 0.06591796875,
"learning_rate": 0.1,
"loss": 2.215243339538574,
"step": 10044
},
{
"epoch": 0.31892063492063494,
"grad_norm": 0.11767578125,
"learning_rate": 0.1,
"loss": 2.1814558506011963,
"step": 10046
},
{
"epoch": 0.318984126984127,
"grad_norm": 0.10986328125,
"learning_rate": 0.1,
"loss": 2.185574769973755,
"step": 10048
},
{
"epoch": 0.319047619047619,
"grad_norm": 0.048583984375,
"learning_rate": 0.1,
"loss": 2.209963321685791,
"step": 10050
},
{
"epoch": 0.3191111111111111,
"grad_norm": 0.173828125,
"learning_rate": 0.1,
"loss": 2.1775128841400146,
"step": 10052
},
{
"epoch": 0.31917460317460317,
"grad_norm": 0.171875,
"learning_rate": 0.1,
"loss": 2.1951544284820557,
"step": 10054
},
{
"epoch": 0.31923809523809527,
"grad_norm": 0.08203125,
"learning_rate": 0.1,
"loss": 2.2017500400543213,
"step": 10056
},
{
"epoch": 0.3193015873015873,
"grad_norm": 0.064453125,
"learning_rate": 0.1,
"loss": 2.179600238800049,
"step": 10058
},
{
"epoch": 0.31936507936507935,
"grad_norm": 0.1865234375,
"learning_rate": 0.1,
"loss": 2.2006568908691406,
"step": 10060
},
{
"epoch": 0.31942857142857145,
"grad_norm": 0.458984375,
"learning_rate": 0.1,
"loss": 2.195237398147583,
"step": 10062
},
{
"epoch": 0.3194920634920635,
"grad_norm": 0.05029296875,
"learning_rate": 0.1,
"loss": 2.193232536315918,
"step": 10064
},
{
"epoch": 0.31955555555555554,
"grad_norm": 0.09130859375,
"learning_rate": 0.1,
"loss": 2.1862893104553223,
"step": 10066
},
{
"epoch": 0.31961904761904764,
"grad_norm": 0.1318359375,
"learning_rate": 0.1,
"loss": 2.2037243843078613,
"step": 10068
},
{
"epoch": 0.3196825396825397,
"grad_norm": 0.1328125,
"learning_rate": 0.1,
"loss": 2.184126615524292,
"step": 10070
},
{
"epoch": 0.3197460317460317,
"grad_norm": 0.1865234375,
"learning_rate": 0.1,
"loss": 2.2134101390838623,
"step": 10072
},
{
"epoch": 0.3198095238095238,
"grad_norm": 0.083984375,
"learning_rate": 0.1,
"loss": 2.183708667755127,
"step": 10074
},
{
"epoch": 0.31987301587301586,
"grad_norm": 0.06494140625,
"learning_rate": 0.1,
"loss": 2.20131516456604,
"step": 10076
},
{
"epoch": 0.31993650793650796,
"grad_norm": 0.05322265625,
"learning_rate": 0.1,
"loss": 2.1745753288269043,
"step": 10078
},
{
"epoch": 0.32,
"grad_norm": 0.08349609375,
"learning_rate": 0.1,
"loss": 2.191664218902588,
"step": 10080
}
],
"logging_steps": 2,
...
"attributes": {}
}
},
"total_flos": 3.33837643366914e+19,
"train_batch_size": 4,
"trial_name": null,
"trial_params": null