Training in progress, step 12600, checkpoint
last-checkpoint/model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4f7a031ce062de2717c20cbfe28bf235cc8c0984196df8647a9d04071a672be7
 size 1171937904
last-checkpoint/optimizer.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:2bca4c44c416761810f1c3083731707ad3a2d1e7ac24304feb6b7f426c34993d
 size 1288212619
last-checkpoint/scheduler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:17739480306268eecb229c9abd21a55ac8184b30446253afafb60d7a0227de30
 size 1401
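The three files above are stored as Git LFS pointers, so the diff only shows a new SHA-256 object ID next to an unchanged size. As a minimal sketch of how a downloaded checkpoint file could be checked against such a pointer, the following parses the pointer text and compares size and digest. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of any Git LFS or Hugging Face API; the pointer text is copied from the `scheduler.pt` diff.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer text ("key value" per line) into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file has the size and SHA-256 digest the pointer records."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for last-checkpoint/scheduler.pt, copied from the diff.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:17739480306268eecb229c9abd21a55ac8184b30446253afafb60d7a0227de30
size 1401
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With a local copy of the checkpoint, `matches_pointer(Path("last-checkpoint/scheduler.pt"), pointer)` would report whether the download corresponds to this commit; the local path is an assumption about where the file was saved.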
last-checkpoint/trainer_state.json
CHANGED
@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.
   "eval_steps": 3150,
-  "global_step":
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -43026,6 +43026,1120 @@
       "learning_rate": 0.1,
       "loss": 2.452523708343506,
       "step": 12284
     }
   ],
   "logging_steps": 2,
@@ -43045,7 +44159,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 4.
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.4,
   "eval_steps": 3150,
+  "global_step": 12600,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
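The header values in this hunk (`"epoch": 0.4` at `"global_step": 12600`) can be cross-checked against the per-step `epoch` values logged in `log_history`. The sketch below assumes that 0.4 is exact rather than rounded and that `epoch` is simply the step count divided by a fixed number of steps per epoch. Under those assumptions one epoch is 31500 optimizer steps, which is an inferred figure and is not stated anywhere in the file. The three logged pairs are copied from the diff.

```python
from fractions import Fraction

# Values copied from the diff header of trainer_state.json.
GLOBAL_STEP = 12600
EPOCH_AT_GLOBAL_STEP = Fraction(4, 10)  # "epoch": 0.4, assumed exact
EVAL_STEPS = 3150

# Inferred, not stated in the file: optimizer steps in one full epoch.
steps_per_epoch = GLOBAL_STEP / EPOCH_AT_GLOBAL_STEP

# (step, epoch) pairs copied from log_history entries in the diff.
logged = {12286: 0.390031746031746, 12348: 0.392, 12474: 0.396}

for step, epoch in logged.items():
    expected = float(Fraction(step) / steps_per_epoch)
    # The logged epochs agree with step / steps_per_epoch up to float printing.
    assert abs(expected - epoch) < 1e-12, (step, expected, epoch)

# Fraction of an epoch between evaluations, and evaluations completed so far.
eval_fraction = Fraction(EVAL_STEPS) / steps_per_epoch
evals_so_far = GLOBAL_STEP // EVAL_STEPS

print(int(steps_per_epoch), float(eval_fraction), evals_so_far)
```

If the assumptions hold, `eval_steps` of 3150 corresponds to one evaluation every tenth of an epoch, and step 12600 falls exactly on the fourth evaluation boundary.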
@@ -43026,6 +43026,1120 @@
       "learning_rate": 0.1,
       "loss": 2.452523708343506,
       "step": 12284
+    },
+    {
+      "epoch": 0.390031746031746,
+      "grad_norm": 0.109375,
+      "learning_rate": 0.1,
+      "loss": 2.4557151794433594,
+      "step": 12286
+    },
+    {
+      "epoch": 0.3900952380952381,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 0.1,
+      "loss": 2.424776315689087,
+      "step": 12288
+    },
+    {
+      "epoch": 0.39015873015873015,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.4696648120880127,
+      "step": 12290
+    },
+    {
+      "epoch": 0.39022222222222225,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.4684576988220215,
+      "step": 12292
+    },
+    {
+      "epoch": 0.3902857142857143,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 0.1,
+      "loss": 2.461087942123413,
+      "step": 12294
+    },
+    {
+      "epoch": 0.39034920634920633,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.1,
+      "loss": 2.430222749710083,
+      "step": 12296
+    },
+    {
+      "epoch": 0.39041269841269843,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.1,
+      "loss": 2.4839582443237305,
+      "step": 12298
+    },
+    {
+      "epoch": 0.3904761904761905,
+      "grad_norm": 0.1611328125,
+      "learning_rate": 0.1,
+      "loss": 2.4276487827301025,
+      "step": 12300
+    },
+    {
+      "epoch": 0.3905396825396825,
+      "grad_norm": 0.123046875,
+      "learning_rate": 0.1,
+      "loss": 2.4334208965301514,
+      "step": 12302
+    },
+    {
+      "epoch": 0.3906031746031746,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.4657487869262695,
+      "step": 12304
+    },
+    {
+      "epoch": 0.39066666666666666,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.1,
+      "loss": 2.4697165489196777,
+      "step": 12306
+    },
+    {
+      "epoch": 0.3907301587301587,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.1,
+      "loss": 2.4890670776367188,
+      "step": 12308
+    },
+    {
+      "epoch": 0.3907936507936508,
+      "grad_norm": 0.078125,
+      "learning_rate": 0.1,
+      "loss": 2.482919931411743,
+      "step": 12310
+    },
+    {
+      "epoch": 0.39085714285714285,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 0.1,
+      "loss": 2.482670545578003,
+      "step": 12312
+    },
+    {
+      "epoch": 0.39092063492063495,
+      "grad_norm": 0.4765625,
+      "learning_rate": 0.1,
+      "loss": 2.473506212234497,
+      "step": 12314
+    },
+    {
+      "epoch": 0.390984126984127,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.484362840652466,
+      "step": 12316
+    },
+    {
+      "epoch": 0.39104761904761903,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.1,
+      "loss": 2.49501895904541,
+      "step": 12318
+    },
+    {
+      "epoch": 0.39111111111111113,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.473717451095581,
+      "step": 12320
+    },
+    {
+      "epoch": 0.3911746031746032,
+      "grad_norm": 0.13671875,
+      "learning_rate": 0.1,
+      "loss": 2.4876880645751953,
+      "step": 12322
+    },
+    {
+      "epoch": 0.3912380952380952,
+      "grad_norm": 0.220703125,
+      "learning_rate": 0.1,
+      "loss": 2.4805827140808105,
+      "step": 12324
+    },
+    {
+      "epoch": 0.3913015873015873,
+      "grad_norm": 0.31640625,
+      "learning_rate": 0.1,
+      "loss": 2.508021593093872,
+      "step": 12326
+    },
+    {
+      "epoch": 0.39136507936507936,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.4558026790618896,
+      "step": 12328
+    },
+    {
+      "epoch": 0.3914285714285714,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.4671530723571777,
+      "step": 12330
+    },
+    {
+      "epoch": 0.3914920634920635,
+      "grad_norm": 0.1435546875,
+      "learning_rate": 0.1,
+      "loss": 2.4863462448120117,
+      "step": 12332
+    },
+    {
+      "epoch": 0.39155555555555555,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.1,
+      "loss": 2.462531328201294,
+      "step": 12334
+    },
+    {
+      "epoch": 0.39161904761904764,
+      "grad_norm": 0.11865234375,
+      "learning_rate": 0.1,
+      "loss": 2.4865562915802,
+      "step": 12336
+    },
+    {
+      "epoch": 0.3916825396825397,
+      "grad_norm": 0.12353515625,
+      "learning_rate": 0.1,
+      "loss": 2.481738805770874,
+      "step": 12338
+    },
+    {
+      "epoch": 0.39174603174603173,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.490565061569214,
+      "step": 12340
+    },
+    {
+      "epoch": 0.39180952380952383,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.481773614883423,
+      "step": 12342
+    },
+    {
+      "epoch": 0.3918730158730159,
+      "grad_norm": 0.50390625,
+      "learning_rate": 0.1,
+      "loss": 2.4730658531188965,
+      "step": 12344
+    },
+    {
+      "epoch": 0.3919365079365079,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.1,
+      "loss": 2.4841883182525635,
+      "step": 12346
+    },
+    {
+      "epoch": 0.392,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.493680715560913,
+      "step": 12348
+    },
+    {
+      "epoch": 0.39206349206349206,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.502474784851074,
+      "step": 12350
+    },
+    {
+      "epoch": 0.3921269841269841,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4785163402557373,
+      "step": 12352
+    },
+    {
+      "epoch": 0.3921904761904762,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.1,
+      "loss": 2.491947650909424,
+      "step": 12354
+    },
+    {
+      "epoch": 0.39225396825396824,
+      "grad_norm": 0.1328125,
+      "learning_rate": 0.1,
+      "loss": 2.5014781951904297,
+      "step": 12356
+    },
+    {
+      "epoch": 0.39231746031746034,
+      "grad_norm": 0.125,
+      "learning_rate": 0.1,
+      "loss": 2.479877471923828,
+      "step": 12358
+    },
+    {
+      "epoch": 0.3923809523809524,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.460399627685547,
+      "step": 12360
+    },
+    {
+      "epoch": 0.39244444444444443,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.4771628379821777,
+      "step": 12362
+    },
+    {
+      "epoch": 0.39250793650793653,
+      "grad_norm": 0.09619140625,
+      "learning_rate": 0.1,
+      "loss": 2.5179996490478516,
+      "step": 12364
+    },
+    {
+      "epoch": 0.39257142857142857,
+      "grad_norm": 0.158203125,
+      "learning_rate": 0.1,
+      "loss": 2.4573960304260254,
+      "step": 12366
+    },
+    {
+      "epoch": 0.3926349206349206,
+      "grad_norm": 0.07666015625,
+      "learning_rate": 0.1,
+      "loss": 2.488504409790039,
+      "step": 12368
+    },
+    {
+      "epoch": 0.3926984126984127,
+      "grad_norm": 0.279296875,
+      "learning_rate": 0.1,
+      "loss": 2.48992657661438,
+      "step": 12370
+    },
+    {
+      "epoch": 0.39276190476190476,
+      "grad_norm": 0.443359375,
+      "learning_rate": 0.1,
+      "loss": 2.4311540126800537,
+      "step": 12372
+    },
+    {
+      "epoch": 0.3928253968253968,
+      "grad_norm": 0.0859375,
+      "learning_rate": 0.1,
+      "loss": 2.4825241565704346,
+      "step": 12374
+    },
+    {
+      "epoch": 0.3928888888888889,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4729347229003906,
+      "step": 12376
+    },
+    {
+      "epoch": 0.39295238095238094,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.4759974479675293,
+      "step": 12378
+    },
+    {
+      "epoch": 0.39301587301587304,
+      "grad_norm": 0.05712890625,
+      "learning_rate": 0.1,
+      "loss": 2.490987539291382,
+      "step": 12380
+    },
+    {
+      "epoch": 0.3930793650793651,
+      "grad_norm": 0.0595703125,
+      "learning_rate": 0.1,
+      "loss": 2.4801199436187744,
+      "step": 12382
+    },
+    {
+      "epoch": 0.3931428571428571,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.1,
+      "loss": 2.4446208477020264,
+      "step": 12384
+    },
+    {
+      "epoch": 0.3932063492063492,
+      "grad_norm": 0.30078125,
+      "learning_rate": 0.1,
+      "loss": 2.462266683578491,
+      "step": 12386
+    },
+    {
+      "epoch": 0.39326984126984127,
+      "grad_norm": 0.55859375,
+      "learning_rate": 0.1,
+      "loss": 2.4918887615203857,
+      "step": 12388
+    },
+    {
+      "epoch": 0.3933333333333333,
+      "grad_norm": 0.126953125,
+      "learning_rate": 0.1,
+      "loss": 2.4611899852752686,
+      "step": 12390
+    },
+    {
+      "epoch": 0.3933968253968254,
+      "grad_norm": 0.1796875,
+      "learning_rate": 0.1,
+      "loss": 2.4827232360839844,
+      "step": 12392
+    },
+    {
+      "epoch": 0.39346031746031745,
+      "grad_norm": 0.1640625,
+      "learning_rate": 0.1,
+      "loss": 2.4664206504821777,
+      "step": 12394
+    },
+    {
+      "epoch": 0.3935238095238095,
+      "grad_norm": 0.09375,
+      "learning_rate": 0.1,
+      "loss": 2.463472843170166,
+      "step": 12396
+    },
+    {
+      "epoch": 0.3935873015873016,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.452786445617676,
+      "step": 12398
+    },
+    {
+      "epoch": 0.39365079365079364,
+      "grad_norm": 0.16015625,
+      "learning_rate": 0.1,
+      "loss": 2.477893590927124,
+      "step": 12400
+    },
+    {
+      "epoch": 0.39371428571428574,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.1,
+      "loss": 2.4455089569091797,
+      "step": 12402
+    },
+    {
+      "epoch": 0.3937777777777778,
+      "grad_norm": 0.53515625,
+      "learning_rate": 0.1,
+      "loss": 2.4739983081817627,
+      "step": 12404
+    },
+    {
+      "epoch": 0.3938412698412698,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.1,
+      "loss": 2.463010549545288,
+      "step": 12406
+    },
+    {
+      "epoch": 0.3939047619047619,
+      "grad_norm": 0.09765625,
+      "learning_rate": 0.1,
+      "loss": 2.485477924346924,
+      "step": 12408
+    },
+    {
+      "epoch": 0.39396825396825397,
+      "grad_norm": 0.11376953125,
+      "learning_rate": 0.1,
+      "loss": 2.4666271209716797,
+      "step": 12410
+    },
+    {
+      "epoch": 0.394031746031746,
+      "grad_norm": 0.11962890625,
+      "learning_rate": 0.1,
+      "loss": 2.4628119468688965,
+      "step": 12412
+    },
+    {
+      "epoch": 0.3940952380952381,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.4641313552856445,
+      "step": 12414
+    },
+    {
+      "epoch": 0.39415873015873015,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.1,
+      "loss": 2.4827420711517334,
+      "step": 12416
+    },
+    {
+      "epoch": 0.3942222222222222,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.471240520477295,
+      "step": 12418
+    },
+    {
+      "epoch": 0.3942857142857143,
+      "grad_norm": 0.10791015625,
+      "learning_rate": 0.1,
+      "loss": 2.4685475826263428,
+      "step": 12420
+    },
+    {
+      "epoch": 0.39434920634920634,
+      "grad_norm": 0.12158203125,
+      "learning_rate": 0.1,
+      "loss": 2.4665637016296387,
+      "step": 12422
+    },
+    {
+      "epoch": 0.39441269841269844,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.1,
+      "loss": 2.4597582817077637,
+      "step": 12424
+    },
+    {
+      "epoch": 0.3944761904761905,
+      "grad_norm": 0.06982421875,
+      "learning_rate": 0.1,
+      "loss": 2.4702558517456055,
+      "step": 12426
+    },
+    {
+      "epoch": 0.3945396825396825,
+      "grad_norm": 0.1337890625,
+      "learning_rate": 0.1,
+      "loss": 2.458078145980835,
+      "step": 12428
+    },
+    {
+      "epoch": 0.3946031746031746,
+      "grad_norm": 0.07568359375,
+      "learning_rate": 0.1,
+      "loss": 2.460068702697754,
+      "step": 12430
+    },
+    {
+      "epoch": 0.39466666666666667,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.1,
+      "loss": 2.4554131031036377,
+      "step": 12432
+    },
+    {
+      "epoch": 0.3947301587301587,
+      "grad_norm": 0.08056640625,
+      "learning_rate": 0.1,
+      "loss": 2.488631248474121,
+      "step": 12434
+    },
+    {
+      "epoch": 0.3947936507936508,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.1,
+      "loss": 2.4419941902160645,
+      "step": 12436
+    },
+    {
+      "epoch": 0.39485714285714285,
+      "grad_norm": 0.25,
+      "learning_rate": 0.1,
+      "loss": 2.4841761589050293,
+      "step": 12438
+    },
+    {
+      "epoch": 0.3949206349206349,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.1,
+      "loss": 2.4492082595825195,
+      "step": 12440
+    },
+    {
+      "epoch": 0.394984126984127,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.447962760925293,
+      "step": 12442
+    },
+    {
+      "epoch": 0.39504761904761904,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.1,
+      "loss": 2.4587724208831787,
+      "step": 12444
+    },
+    {
+      "epoch": 0.39511111111111114,
+      "grad_norm": 0.0712890625,
+      "learning_rate": 0.1,
+      "loss": 2.462385892868042,
+      "step": 12446
+    },
+    {
+      "epoch": 0.3951746031746032,
+      "grad_norm": 0.057373046875,
+      "learning_rate": 0.1,
+      "loss": 2.4769206047058105,
+      "step": 12448
+    },
+    {
+      "epoch": 0.3952380952380952,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.1,
+      "loss": 2.448030948638916,
+      "step": 12450
+    },
+    {
+      "epoch": 0.3953015873015873,
+      "grad_norm": 0.5,
+      "learning_rate": 0.1,
+      "loss": 2.474348783493042,
+      "step": 12452
+    },
+    {
+      "epoch": 0.39536507936507936,
+      "grad_norm": 0.1259765625,
+      "learning_rate": 0.1,
+      "loss": 2.4807779788970947,
+      "step": 12454
+    },
+    {
+      "epoch": 0.3954285714285714,
+      "grad_norm": 0.06884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4678142070770264,
+      "step": 12456
+    },
+    {
+      "epoch": 0.3954920634920635,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 0.1,
+      "loss": 2.4569032192230225,
+      "step": 12458
+    },
+    {
+      "epoch": 0.39555555555555555,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.1,
+      "loss": 2.476902484893799,
+      "step": 12460
+    },
+    {
+      "epoch": 0.3956190476190476,
+      "grad_norm": 0.140625,
+      "learning_rate": 0.1,
+      "loss": 2.4690651893615723,
+      "step": 12462
+    },
+    {
+      "epoch": 0.3956825396825397,
+      "grad_norm": 0.2451171875,
+      "learning_rate": 0.1,
+      "loss": 2.4898293018341064,
+      "step": 12464
+    },
+    {
+      "epoch": 0.39574603174603173,
+      "grad_norm": 0.375,
+      "learning_rate": 0.1,
+      "loss": 2.4907736778259277,
+      "step": 12466
+    },
+    {
+      "epoch": 0.39580952380952383,
+      "grad_norm": 0.130859375,
+      "learning_rate": 0.1,
+      "loss": 2.4488649368286133,
+      "step": 12468
+    },
+    {
+      "epoch": 0.3958730158730159,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.454991102218628,
+      "step": 12470
+    },
+    {
+      "epoch": 0.3959365079365079,
+      "grad_norm": 0.07275390625,
+      "learning_rate": 0.1,
+      "loss": 2.489849328994751,
+      "step": 12472
+    },
+    {
+      "epoch": 0.396,
+      "grad_norm": 0.1513671875,
+      "learning_rate": 0.1,
+      "loss": 2.4257235527038574,
+      "step": 12474
+    },
+    {
+      "epoch": 0.39606349206349206,
+      "grad_norm": 0.056396484375,
+      "learning_rate": 0.1,
+      "loss": 2.471559762954712,
+      "step": 12476
+    },
+    {
+      "epoch": 0.3961269841269841,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.1,
+      "loss": 2.4803552627563477,
+      "step": 12478
+    },
+    {
+      "epoch": 0.3961904761904762,
+      "grad_norm": 0.451171875,
+      "learning_rate": 0.1,
+      "loss": 2.445671319961548,
+      "step": 12480
+    },
+    {
+      "epoch": 0.39625396825396825,
+      "grad_norm": 0.1923828125,
+
"learning_rate": 0.1,
|
| 43720 |
+
"loss": 2.468325138092041,
|
| 43721 |
+
"step": 12482
|
| 43722 |
+
},
|
| 43723 |
+
{
|
| 43724 |
+
"epoch": 0.3963174603174603,
|
| 43725 |
+
"grad_norm": 0.11865234375,
|
| 43726 |
+
"learning_rate": 0.1,
|
| 43727 |
+
"loss": 2.4626095294952393,
|
| 43728 |
+
"step": 12484
|
| 43729 |
+
},
|
| 43730 |
+
{
|
| 43731 |
+
"epoch": 0.3963809523809524,
|
| 43732 |
+
"grad_norm": 0.0625,
|
| 43733 |
+
"learning_rate": 0.1,
|
| 43734 |
+
"loss": 2.4606876373291016,
|
| 43735 |
+
"step": 12486
|
| 43736 |
+
},
|
| 43737 |
+
{
|
| 43738 |
+
"epoch": 0.39644444444444443,
|
| 43739 |
+
"grad_norm": 0.333984375,
|
| 43740 |
+
"learning_rate": 0.1,
|
| 43741 |
+
"loss": 2.4665181636810303,
|
| 43742 |
+
"step": 12488
|
| 43743 |
+
},
|
| 43744 |
+
{
|
| 43745 |
+
"epoch": 0.39650793650793653,
|
| 43746 |
+
"grad_norm": 0.50390625,
|
| 43747 |
+
"learning_rate": 0.1,
|
| 43748 |
+
"loss": 2.468353748321533,
|
| 43749 |
+
"step": 12490
|
| 43750 |
+
},
|
| 43751 |
+
{
|
| 43752 |
+
"epoch": 0.3965714285714286,
|
| 43753 |
+
"grad_norm": 0.2158203125,
|
| 43754 |
+
"learning_rate": 0.1,
|
| 43755 |
+
"loss": 2.486236095428467,
|
| 43756 |
+
"step": 12492
|
| 43757 |
+
},
|
| 43758 |
+
{
|
| 43759 |
+
"epoch": 0.3966349206349206,
|
| 43760 |
+
"grad_norm": 0.0869140625,
|
| 43761 |
+
"learning_rate": 0.1,
|
| 43762 |
+
"loss": 2.473392963409424,
|
| 43763 |
+
"step": 12494
|
| 43764 |
+
},
|
| 43765 |
+
{
|
| 43766 |
+
"epoch": 0.3966984126984127,
|
| 43767 |
+
"grad_norm": 0.10693359375,
|
| 43768 |
+
"learning_rate": 0.1,
|
| 43769 |
+
"loss": 2.485105514526367,
|
| 43770 |
+
"step": 12496
|
| 43771 |
+
},
|
| 43772 |
+
{
|
| 43773 |
+
"epoch": 0.39676190476190476,
|
| 43774 |
+
"grad_norm": 0.1826171875,
|
| 43775 |
+
"learning_rate": 0.1,
|
| 43776 |
+
"loss": 2.4699769020080566,
|
| 43777 |
+
"step": 12498
|
| 43778 |
+
},
|
| 43779 |
+
{
|
| 43780 |
+
"epoch": 0.3968253968253968,
|
| 43781 |
+
"grad_norm": 0.1806640625,
|
| 43782 |
+
"learning_rate": 0.1,
|
| 43783 |
+
"loss": 2.4763691425323486,
|
| 43784 |
+
"step": 12500
|
| 43785 |
+
},
|
| 43786 |
+
{
|
| 43787 |
+
"epoch": 0.3968888888888889,
|
| 43788 |
+
"grad_norm": 0.07568359375,
|
| 43789 |
+
"learning_rate": 0.1,
|
| 43790 |
+
"loss": 2.491389274597168,
|
| 43791 |
+
"step": 12502
|
| 43792 |
+
},
|
| 43793 |
+
{
|
| 43794 |
+
"epoch": 0.39695238095238095,
|
| 43795 |
+
"grad_norm": 0.1025390625,
|
| 43796 |
+
"learning_rate": 0.1,
|
| 43797 |
+
"loss": 2.481797218322754,
|
| 43798 |
+
"step": 12504
|
| 43799 |
+
},
|
| 43800 |
+
{
|
| 43801 |
+
"epoch": 0.397015873015873,
|
| 43802 |
+
"grad_norm": 0.14453125,
|
| 43803 |
+
"learning_rate": 0.1,
|
| 43804 |
+
"loss": 2.5158865451812744,
|
| 43805 |
+
"step": 12506
|
| 43806 |
+
},
|
| 43807 |
+
{
|
| 43808 |
+
"epoch": 0.3970793650793651,
|
| 43809 |
+
"grad_norm": 0.2021484375,
|
| 43810 |
+
"learning_rate": 0.1,
|
| 43811 |
+
"loss": 2.4983389377593994,
|
| 43812 |
+
"step": 12508
|
| 43813 |
+
},
|
| 43814 |
+
{
|
| 43815 |
+
"epoch": 0.39714285714285713,
|
| 43816 |
+
"grad_norm": 0.1201171875,
|
| 43817 |
+
"learning_rate": 0.1,
|
| 43818 |
+
"loss": 2.496899127960205,
|
| 43819 |
+
"step": 12510
|
| 43820 |
+
},
|
| 43821 |
+
{
|
| 43822 |
+
"epoch": 0.39720634920634923,
|
| 43823 |
+
"grad_norm": 0.087890625,
|
| 43824 |
+
"learning_rate": 0.1,
|
| 43825 |
+
"loss": 2.499006986618042,
|
| 43826 |
+
"step": 12512
|
| 43827 |
+
},
|
| 43828 |
+
{
|
| 43829 |
+
"epoch": 0.3972698412698413,
|
| 43830 |
+
"grad_norm": 0.11865234375,
|
| 43831 |
+
"learning_rate": 0.1,
|
| 43832 |
+
"loss": 2.483018636703491,
|
| 43833 |
+
"step": 12514
|
| 43834 |
+
},
|
| 43835 |
+
{
|
| 43836 |
+
"epoch": 0.3973333333333333,
|
| 43837 |
+
"grad_norm": 0.384765625,
|
| 43838 |
+
"learning_rate": 0.1,
|
| 43839 |
+
"loss": 2.4755356311798096,
|
| 43840 |
+
"step": 12516
|
| 43841 |
+
},
|
| 43842 |
+
{
|
| 43843 |
+
"epoch": 0.3973968253968254,
|
| 43844 |
+
"grad_norm": 0.482421875,
|
| 43845 |
+
"learning_rate": 0.1,
|
| 43846 |
+
"loss": 2.4969849586486816,
|
| 43847 |
+
"step": 12518
|
| 43848 |
+
},
|
| 43849 |
+
{
|
| 43850 |
+
"epoch": 0.39746031746031746,
|
| 43851 |
+
"grad_norm": 0.06591796875,
|
| 43852 |
+
"learning_rate": 0.1,
|
| 43853 |
+
"loss": 2.481879234313965,
|
| 43854 |
+
"step": 12520
|
| 43855 |
+
},
|
| 43856 |
+
{
|
| 43857 |
+
"epoch": 0.3975238095238095,
|
| 43858 |
+
"grad_norm": 0.212890625,
|
| 43859 |
+
"learning_rate": 0.1,
|
| 43860 |
+
"loss": 2.507845401763916,
|
| 43861 |
+
"step": 12522
|
| 43862 |
+
},
|
| 43863 |
+
{
|
| 43864 |
+
"epoch": 0.3975873015873016,
|
| 43865 |
+
"grad_norm": 0.2080078125,
|
| 43866 |
+
"learning_rate": 0.1,
|
| 43867 |
+
"loss": 2.514796018600464,
|
| 43868 |
+
"step": 12524
|
| 43869 |
+
},
|
| 43870 |
+
{
|
| 43871 |
+
"epoch": 0.39765079365079364,
|
| 43872 |
+
"grad_norm": 0.255859375,
|
| 43873 |
+
"learning_rate": 0.1,
|
| 43874 |
+
"loss": 2.486309766769409,
|
| 43875 |
+
"step": 12526
|
| 43876 |
+
},
|
| 43877 |
+
{
|
| 43878 |
+
"epoch": 0.3977142857142857,
|
| 43879 |
+
"grad_norm": 0.2412109375,
|
| 43880 |
+
"learning_rate": 0.1,
|
| 43881 |
+
"loss": 2.4785990715026855,
|
| 43882 |
+
"step": 12528
|
| 43883 |
+
},
|
| 43884 |
+
{
|
| 43885 |
+
"epoch": 0.3977777777777778,
|
| 43886 |
+
"grad_norm": 0.294921875,
|
| 43887 |
+
"learning_rate": 0.1,
|
| 43888 |
+
"loss": 2.4831855297088623,
|
| 43889 |
+
"step": 12530
|
| 43890 |
+
},
|
| 43891 |
+
{
|
| 43892 |
+
"epoch": 0.39784126984126983,
|
| 43893 |
+
"grad_norm": 0.19921875,
|
| 43894 |
+
"learning_rate": 0.1,
|
| 43895 |
+
"loss": 2.4929800033569336,
|
| 43896 |
+
"step": 12532
|
| 43897 |
+
},
|
| 43898 |
+
{
|
| 43899 |
+
"epoch": 0.3979047619047619,
|
| 43900 |
+
"grad_norm": 0.2158203125,
|
| 43901 |
+
"learning_rate": 0.1,
|
| 43902 |
+
"loss": 2.5137693881988525,
|
| 43903 |
+
"step": 12534
|
| 43904 |
+
},
|
| 43905 |
+
{
|
| 43906 |
+
"epoch": 0.39796825396825397,
|
| 43907 |
+
"grad_norm": 0.12890625,
|
| 43908 |
+
"learning_rate": 0.1,
|
| 43909 |
+
"loss": 2.4880499839782715,
|
| 43910 |
+
"step": 12536
|
| 43911 |
+
},
|
| 43912 |
+
{
|
| 43913 |
+
"epoch": 0.398031746031746,
|
| 43914 |
+
"grad_norm": 0.423828125,
|
| 43915 |
+
"learning_rate": 0.1,
|
| 43916 |
+
"loss": 2.503533124923706,
|
| 43917 |
+
"step": 12538
|
| 43918 |
+
},
|
| 43919 |
+
{
|
| 43920 |
+
"epoch": 0.3980952380952381,
|
| 43921 |
+
"grad_norm": 0.19921875,
|
| 43922 |
+
"learning_rate": 0.1,
|
| 43923 |
+
"loss": 2.480361223220825,
|
| 43924 |
+
"step": 12540
|
| 43925 |
+
},
|
| 43926 |
+
{
|
| 43927 |
+
"epoch": 0.39815873015873016,
|
| 43928 |
+
"grad_norm": 0.205078125,
|
| 43929 |
+
"learning_rate": 0.1,
|
| 43930 |
+
"loss": 2.4782419204711914,
|
| 43931 |
+
"step": 12542
|
| 43932 |
+
},
|
| 43933 |
+
{
|
| 43934 |
+
"epoch": 0.3982222222222222,
|
| 43935 |
+
"grad_norm": 0.1572265625,
|
| 43936 |
+
"learning_rate": 0.1,
|
| 43937 |
+
"loss": 2.463319778442383,
|
| 43938 |
+
"step": 12544
|
| 43939 |
+
},
|
| 43940 |
+
{
|
| 43941 |
+
"epoch": 0.3982857142857143,
|
| 43942 |
+
"grad_norm": 0.09765625,
|
| 43943 |
+
"learning_rate": 0.1,
|
| 43944 |
+
"loss": 2.4680655002593994,
|
| 43945 |
+
"step": 12546
|
| 43946 |
+
},
|
| 43947 |
+
{
|
| 43948 |
+
"epoch": 0.39834920634920634,
|
| 43949 |
+
"grad_norm": 0.08740234375,
|
| 43950 |
+
"learning_rate": 0.1,
|
| 43951 |
+
"loss": 2.4729971885681152,
|
| 43952 |
+
"step": 12548
|
| 43953 |
+
},
|
| 43954 |
+
{
|
| 43955 |
+
"epoch": 0.3984126984126984,
|
| 43956 |
+
"grad_norm": 0.126953125,
|
| 43957 |
+
"learning_rate": 0.1,
|
| 43958 |
+
"loss": 2.4643752574920654,
|
| 43959 |
+
"step": 12550
|
| 43960 |
+
},
|
| 43961 |
+
{
|
| 43962 |
+
"epoch": 0.3984761904761905,
|
| 43963 |
+
"grad_norm": 0.1650390625,
|
| 43964 |
+
"learning_rate": 0.1,
|
| 43965 |
+
"loss": 2.4500062465667725,
|
| 43966 |
+
"step": 12552
|
| 43967 |
+
},
|
| 43968 |
+
{
|
| 43969 |
+
"epoch": 0.3985396825396825,
|
| 43970 |
+
"grad_norm": 0.2021484375,
|
| 43971 |
+
"learning_rate": 0.1,
|
| 43972 |
+
"loss": 2.4672186374664307,
|
| 43973 |
+
"step": 12554
|
| 43974 |
+
},
|
| 43975 |
+
{
|
| 43976 |
+
"epoch": 0.3986031746031746,
|
| 43977 |
+
"grad_norm": 0.38671875,
|
| 43978 |
+
"learning_rate": 0.1,
|
| 43979 |
+
"loss": 2.4818828105926514,
|
| 43980 |
+
"step": 12556
|
| 43981 |
+
},
|
| 43982 |
+
{
|
| 43983 |
+
"epoch": 0.39866666666666667,
|
| 43984 |
+
"grad_norm": 0.2578125,
|
| 43985 |
+
"learning_rate": 0.1,
|
| 43986 |
+
"loss": 2.4652960300445557,
|
| 43987 |
+
"step": 12558
|
| 43988 |
+
},
|
| 43989 |
+
{
|
| 43990 |
+
"epoch": 0.3987301587301587,
|
| 43991 |
+
"grad_norm": 0.11474609375,
|
| 43992 |
+
"learning_rate": 0.1,
|
| 43993 |
+
"loss": 2.433884382247925,
|
| 43994 |
+
"step": 12560
|
| 43995 |
+
},
|
| 43996 |
+
{
|
| 43997 |
+
"epoch": 0.3987936507936508,
|
| 43998 |
+
"grad_norm": 0.05712890625,
|
| 43999 |
+
"learning_rate": 0.1,
|
| 44000 |
+
"loss": 2.424755811691284,
|
| 44001 |
+
"step": 12562
|
| 44002 |
+
},
|
| 44003 |
+
{
|
| 44004 |
+
"epoch": 0.39885714285714285,
|
| 44005 |
+
"grad_norm": 0.07568359375,
|
| 44006 |
+
"learning_rate": 0.1,
|
| 44007 |
+
"loss": 2.430487632751465,
|
| 44008 |
+
"step": 12564
|
| 44009 |
+
},
|
| 44010 |
+
{
|
| 44011 |
+
"epoch": 0.3989206349206349,
|
| 44012 |
+
"grad_norm": 0.055908203125,
|
| 44013 |
+
"learning_rate": 0.1,
|
| 44014 |
+
"loss": 2.4455268383026123,
|
| 44015 |
+
"step": 12566
|
| 44016 |
+
},
|
| 44017 |
+
{
|
| 44018 |
+
"epoch": 0.398984126984127,
|
| 44019 |
+
"grad_norm": 0.23828125,
|
| 44020 |
+
"learning_rate": 0.1,
|
| 44021 |
+
"loss": 2.4371044635772705,
|
| 44022 |
+
"step": 12568
|
| 44023 |
+
},
|
| 44024 |
+
{
|
| 44025 |
+
"epoch": 0.39904761904761904,
|
| 44026 |
+
"grad_norm": 0.546875,
|
| 44027 |
+
"learning_rate": 0.1,
|
| 44028 |
+
"loss": 2.4374101161956787,
|
| 44029 |
+
"step": 12570
|
| 44030 |
+
},
|
| 44031 |
+
{
|
| 44032 |
+
"epoch": 0.39911111111111114,
|
| 44033 |
+
"grad_norm": 0.21875,
|
| 44034 |
+
"learning_rate": 0.1,
|
| 44035 |
+
"loss": 2.4382810592651367,
|
| 44036 |
+
"step": 12572
|
| 44037 |
+
},
|
| 44038 |
+
{
|
| 44039 |
+
"epoch": 0.3991746031746032,
|
| 44040 |
+
"grad_norm": 0.056884765625,
|
| 44041 |
+
"learning_rate": 0.1,
|
| 44042 |
+
"loss": 2.4031074047088623,
|
| 44043 |
+
"step": 12574
|
| 44044 |
+
},
|
| 44045 |
+
{
|
| 44046 |
+
"epoch": 0.3992380952380952,
|
| 44047 |
+
"grad_norm": 0.0517578125,
|
| 44048 |
+
"learning_rate": 0.1,
|
| 44049 |
+
"loss": 2.41848087310791,
|
| 44050 |
+
"step": 12576
|
| 44051 |
+
},
|
| 44052 |
+
{
|
| 44053 |
+
"epoch": 0.3993015873015873,
|
| 44054 |
+
"grad_norm": 0.12890625,
|
| 44055 |
+
"learning_rate": 0.1,
|
| 44056 |
+
"loss": 2.432514190673828,
|
| 44057 |
+
"step": 12578
|
| 44058 |
+
},
|
| 44059 |
+
{
|
| 44060 |
+
"epoch": 0.39936507936507937,
|
| 44061 |
+
"grad_norm": 0.4296875,
|
| 44062 |
+
"learning_rate": 0.1,
|
| 44063 |
+
"loss": 2.4152355194091797,
|
| 44064 |
+
"step": 12580
|
| 44065 |
+
},
|
| 44066 |
+
{
|
| 44067 |
+
"epoch": 0.3994285714285714,
|
| 44068 |
+
"grad_norm": 0.326171875,
|
| 44069 |
+
"learning_rate": 0.1,
|
| 44070 |
+
"loss": 2.3761203289031982,
|
| 44071 |
+
"step": 12582
|
| 44072 |
+
},
|
| 44073 |
+
{
|
| 44074 |
+
"epoch": 0.3994920634920635,
|
| 44075 |
+
"grad_norm": 0.06103515625,
|
| 44076 |
+
"learning_rate": 0.1,
|
| 44077 |
+
"loss": 2.4018025398254395,
|
| 44078 |
+
"step": 12584
|
| 44079 |
+
},
|
| 44080 |
+
{
|
| 44081 |
+
"epoch": 0.39955555555555555,
|
| 44082 |
+
"grad_norm": 0.07080078125,
|
| 44083 |
+
"learning_rate": 0.1,
|
| 44084 |
+
"loss": 2.386880874633789,
|
| 44085 |
+
"step": 12586
|
| 44086 |
+
},
|
| 44087 |
+
{
|
| 44088 |
+
"epoch": 0.3996190476190476,
|
| 44089 |
+
"grad_norm": 0.1845703125,
|
| 44090 |
+
"learning_rate": 0.1,
|
| 44091 |
+
"loss": 2.397996425628662,
|
| 44092 |
+
"step": 12588
|
| 44093 |
+
},
|
| 44094 |
+
{
|
| 44095 |
+
"epoch": 0.3996825396825397,
|
| 44096 |
+
"grad_norm": 0.341796875,
|
| 44097 |
+
"learning_rate": 0.1,
|
| 44098 |
+
"loss": 2.398606300354004,
|
| 44099 |
+
"step": 12590
|
| 44100 |
+
},
|
| 44101 |
+
{
|
| 44102 |
+
"epoch": 0.39974603174603174,
|
| 44103 |
+
"grad_norm": 0.0771484375,
|
| 44104 |
+
"learning_rate": 0.1,
|
| 44105 |
+
"loss": 2.370978832244873,
|
| 44106 |
+
"step": 12592
|
| 44107 |
+
},
|
| 44108 |
+
{
|
| 44109 |
+
"epoch": 0.39980952380952384,
|
| 44110 |
+
"grad_norm": 0.10205078125,
|
| 44111 |
+
"learning_rate": 0.1,
|
| 44112 |
+
"loss": 2.37992262840271,
|
| 44113 |
+
"step": 12594
|
| 44114 |
+
},
|
| 44115 |
+
{
|
| 44116 |
+
"epoch": 0.3998730158730159,
|
| 44117 |
+
"grad_norm": 0.1435546875,
|
| 44118 |
+
"learning_rate": 0.1,
|
| 44119 |
+
"loss": 2.378612995147705,
|
| 44120 |
+
"step": 12596
|
| 44121 |
+
},
|
| 44122 |
+
{
|
| 44123 |
+
"epoch": 0.3999365079365079,
|
| 44124 |
+
"grad_norm": 0.12060546875,
|
| 44125 |
+
"learning_rate": 0.1,
|
| 44126 |
+
"loss": 2.3547215461730957,
|
| 44127 |
+
"step": 12598
|
| 44128 |
+
},
|
| 44129 |
+
{
|
| 44130 |
+
"epoch": 0.4,
|
| 44131 |
+
"grad_norm": 0.08740234375,
|
| 44132 |
+
"learning_rate": 0.1,
|
| 44133 |
+
"loss": 2.359011650085449,
|
| 44134 |
+
"step": 12600
|
| 44135 |
+
},
|
| 44136 |
+
{
|
| 44137 |
+
"epoch": 0.4,
|
| 44138 |
+
"eval_loss": 1.7705790996551514,
|
| 44139 |
+
"eval_runtime": 105.9037,
|
| 44140 |
+
"eval_samples_per_second": 10.028,
|
| 44141 |
+
"eval_steps_per_second": 2.512,
|
| 44142 |
+
"step": 12600
|
| 44143 |
}
|
| 44144 |
],
|
| 44145 |
"logging_steps": 2,
|
|
|
|
       "attributes": {}
     }
   },
+  "total_flos": 4.17291330903253e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null