Training in progress, step 12915, checkpoint
last-checkpoint/model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1c4ab61291ed876d2846992c1aedc554bfcbc5ab9a61ff3f786fac212673430e
 size 1171937904
last-checkpoint/optimizer.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:dafff400ed1ea4c4e87a08eca726a8823cbd85926bcb70e6f6c4e823327f74fa
 size 1288212619
last-checkpoint/scheduler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c0146b598e2b404bbfd38ad4897f388ac3b184beab65521e27d27e70a8fd0073
 size 1401
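The three files above are stored as Git LFS pointers: small text files with a `version`, an `oid`, and a `size` line, where only the `oid` changes in this commit. The following is a minimal sketch, assuming the pointer text is already available as a string, of reading those fields in Python. The function name `parse_lfs_pointer` is hypothetical and not part of any library; the sample pointer is the new `scheduler.pt` pointer from this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as '<hash-method>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# New pointer for last-checkpoint/scheduler.pt, copied from the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c0146b598e2b404bbfd38ad4897f388ac3b184beab65521e27d27e70a8fd0073
size 1401
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

Because the `size` lines are identical before and after, comparing only the `digest` field is enough to tell that the file contents changed.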
last-checkpoint/trainer_state.json
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.
|
| 6 |
"eval_steps": 3150,
|
| 7 |
-
"global_step":
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
@@ -44140,6 +44140,1105 @@
|
|
| 44140 |
"eval_samples_per_second": 10.028,
|
| 44141 |
"eval_steps_per_second": 2.512,
|
| 44142 |
"step": 12600
|
| 44143 |
}
|
| 44144 |
],
|
| 44145 |
"logging_steps": 2,
|
|
@@ -44159,7 +45258,7 @@
|
|
| 44159 |
"attributes": {}
|
| 44160 |
}
|
| 44161 |
},
|
| 44162 |
-
"total_flos": 4.
|
| 44163 |
"train_batch_size": 4,
|
| 44164 |
"trial_name": null,
|
| 44165 |
"trial_params": null
|
|
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 0.41,
|
| 6 |
"eval_steps": 3150,
|
| 7 |
+
"global_step": 12915,
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
|
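The `epoch` values logged in `trainer_state.json` appear consistent with a fixed number of optimizer steps per epoch. The short sketch below works through that arithmetic in Python using values that appear in this diff (`step` 12602 with `epoch` 0.40006349206349207, `global_step` 12915, and `eval_steps` 3150). The variable names are illustrative only, and the assumption that steps map linearly to epochs is inferred from the logged numbers rather than stated by the source.

```python
# One logged entry from the diff: step and its fractional epoch.
step, epoch = 12602, 0.40006349206349207

# Infer steps per epoch, assuming epoch = step / steps_per_epoch.
steps_per_epoch = round(step / epoch)

# Epoch reached at the checkpoint saved in this commit.
global_step = 12915
epoch_at_checkpoint = global_step / steps_per_epoch

# How many evaluations fit in one epoch at the configured interval.
eval_steps = 3150
evals_per_epoch = steps_per_epoch / eval_steps

print(steps_per_epoch, epoch_at_checkpoint, evals_per_epoch)
```

Under that assumption the checkpoint epoch works out to the `0.41` recorded in the new `trainer_state.json`.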
|
| 44140 |
"eval_samples_per_second": 10.028,
|
| 44141 |
"eval_steps_per_second": 2.512,
|
| 44142 |
"step": 12600
|
| 44143 |
+
},
|
| 44144 |
+
{
|
| 44145 |
+
"epoch": 0.40006349206349207,
|
| 44146 |
+
"grad_norm": 0.2060546875,
|
| 44147 |
+
"learning_rate": 0.1,
|
| 44148 |
+
"loss": 2.377570629119873,
|
| 44149 |
+
"step": 12602
|
| 44150 |
+
},
|
| 44151 |
+
{
|
| 44152 |
+
"epoch": 0.4001269841269841,
|
| 44153 |
+
"grad_norm": 0.0693359375,
|
| 44154 |
+
"learning_rate": 0.1,
|
| 44155 |
+
"loss": 2.394019365310669,
|
| 44156 |
+
"step": 12604
|
| 44157 |
+
},
|
| 44158 |
+
{
|
| 44159 |
+
"epoch": 0.4001904761904762,
|
| 44160 |
+
"grad_norm": 0.1953125,
|
| 44161 |
+
"learning_rate": 0.1,
|
| 44162 |
+
"loss": 2.3574202060699463,
|
| 44163 |
+
"step": 12606
|
| 44164 |
+
},
|
| 44165 |
+
{
|
| 44166 |
+
"epoch": 0.40025396825396825,
|
| 44167 |
+
"grad_norm": 0.1513671875,
|
| 44168 |
+
"learning_rate": 0.1,
|
| 44169 |
+
"loss": 2.33833646774292,
|
| 44170 |
+
"step": 12608
|
| 44171 |
+
},
|
| 44172 |
+
{
|
| 44173 |
+
"epoch": 0.4003174603174603,
|
| 44174 |
+
"grad_norm": 0.33203125,
|
| 44175 |
+
"learning_rate": 0.1,
|
| 44176 |
+
"loss": 2.369246244430542,
|
| 44177 |
+
"step": 12610
|
| 44178 |
+
},
|
| 44179 |
+
{
|
| 44180 |
+
"epoch": 0.4003809523809524,
|
| 44181 |
+
"grad_norm": 0.181640625,
|
| 44182 |
+
"learning_rate": 0.1,
|
| 44183 |
+
"loss": 2.3324646949768066,
|
| 44184 |
+
"step": 12612
|
| 44185 |
+
},
|
| 44186 |
+
{
|
| 44187 |
+
"epoch": 0.40044444444444444,
|
| 44188 |
+
"grad_norm": 0.2021484375,
|
| 44189 |
+
"learning_rate": 0.1,
|
| 44190 |
+
"loss": 2.342609405517578,
|
| 44191 |
+
"step": 12614
|
| 44192 |
+
},
|
| 44193 |
+
{
|
| 44194 |
+
"epoch": 0.40050793650793653,
|
| 44195 |
+
"grad_norm": 0.208984375,
|
| 44196 |
+
"learning_rate": 0.1,
|
| 44197 |
+
"loss": 2.3590927124023438,
|
| 44198 |
+
"step": 12616
|
| 44199 |
+
},
|
| 44200 |
+
{
|
| 44201 |
+
"epoch": 0.4005714285714286,
|
| 44202 |
+
"grad_norm": 0.265625,
|
| 44203 |
+
"learning_rate": 0.1,
|
| 44204 |
+
"loss": 2.340346097946167,
|
| 44205 |
+
"step": 12618
|
| 44206 |
+
},
|
| 44207 |
+
{
|
| 44208 |
+
"epoch": 0.4006349206349206,
|
| 44209 |
+
"grad_norm": 0.34765625,
|
| 44210 |
+
"learning_rate": 0.1,
|
| 44211 |
+
"loss": 2.324613571166992,
|
| 44212 |
+
"step": 12620
|
| 44213 |
+
},
|
| 44214 |
+
{
|
| 44215 |
+
"epoch": 0.4006984126984127,
|
| 44216 |
+
"grad_norm": 0.09521484375,
|
| 44217 |
+
"learning_rate": 0.1,
|
| 44218 |
+
"loss": 2.3287599086761475,
|
| 44219 |
+
"step": 12622
|
| 44220 |
+
},
|
| 44221 |
+
{
|
| 44222 |
+
"epoch": 0.40076190476190476,
|
| 44223 |
+
"grad_norm": 0.1884765625,
|
| 44224 |
+
"learning_rate": 0.1,
|
| 44225 |
+
"loss": 2.3095972537994385,
|
| 44226 |
+
"step": 12624
|
| 44227 |
+
},
|
| 44228 |
+
{
|
| 44229 |
+
"epoch": 0.4008253968253968,
|
| 44230 |
+
"grad_norm": 0.06591796875,
|
| 44231 |
+
"learning_rate": 0.1,
|
| 44232 |
+
"loss": 2.3337745666503906,
|
| 44233 |
+
"step": 12626
|
| 44234 |
+
},
|
| 44235 |
+
{
|
| 44236 |
+
"epoch": 0.4008888888888889,
|
| 44237 |
+
"grad_norm": 0.0947265625,
|
| 44238 |
+
"learning_rate": 0.1,
|
| 44239 |
+
"loss": 2.3558530807495117,
|
| 44240 |
+
"step": 12628
|
| 44241 |
+
},
|
| 44242 |
+
{
|
| 44243 |
+
"epoch": 0.40095238095238095,
|
| 44244 |
+
"grad_norm": 0.1474609375,
|
| 44245 |
+
"learning_rate": 0.1,
|
| 44246 |
+
"loss": 2.3162124156951904,
|
| 44247 |
+
"step": 12630
|
| 44248 |
+
},
|
| 44249 |
+
{
|
| 44250 |
+
"epoch": 0.401015873015873,
|
| 44251 |
+
"grad_norm": 0.20703125,
|
| 44252 |
+
"learning_rate": 0.1,
|
| 44253 |
+
"loss": 2.35158634185791,
|
| 44254 |
+
"step": 12632
|
| 44255 |
+
},
|
| 44256 |
+
{
|
| 44257 |
+
"epoch": 0.4010793650793651,
|
| 44258 |
+
"grad_norm": 0.123046875,
|
| 44259 |
+
"learning_rate": 0.1,
|
| 44260 |
+
"loss": 2.3345530033111572,
|
| 44261 |
+
"step": 12634
|
| 44262 |
+
},
|
| 44263 |
+
{
|
| 44264 |
+
"epoch": 0.40114285714285713,
|
| 44265 |
+
"grad_norm": 0.212890625,
|
| 44266 |
+
"learning_rate": 0.1,
|
| 44267 |
+
"loss": 2.3051955699920654,
|
| 44268 |
+
"step": 12636
|
| 44269 |
+
},
|
| 44270 |
+
{
|
| 44271 |
+
"epoch": 0.40120634920634923,
|
| 44272 |
+
"grad_norm": 0.19140625,
|
| 44273 |
+
"learning_rate": 0.1,
|
| 44274 |
+
"loss": 2.3525850772857666,
|
| 44275 |
+
"step": 12638
|
| 44276 |
+
},
|
| 44277 |
+
{
|
| 44278 |
+
"epoch": 0.4012698412698413,
|
| 44279 |
+
"grad_norm": 0.0791015625,
|
| 44280 |
+
"learning_rate": 0.1,
|
| 44281 |
+
"loss": 2.3211631774902344,
|
| 44282 |
+
"step": 12640
|
| 44283 |
+
},
|
| 44284 |
+
{
|
| 44285 |
+
"epoch": 0.4013333333333333,
|
| 44286 |
+
"grad_norm": 0.193359375,
|
| 44287 |
+
"learning_rate": 0.1,
|
| 44288 |
+
"loss": 2.300722122192383,
|
| 44289 |
+
"step": 12642
|
| 44290 |
+
},
|
| 44291 |
+
{
|
| 44292 |
+
"epoch": 0.4013968253968254,
|
| 44293 |
+
"grad_norm": 0.14453125,
|
| 44294 |
+
"learning_rate": 0.1,
|
| 44295 |
+
"loss": 2.308464288711548,
|
| 44296 |
+
"step": 12644
|
| 44297 |
+
},
|
| 44298 |
+
{
|
| 44299 |
+
"epoch": 0.40146031746031746,
|
| 44300 |
+
"grad_norm": 0.265625,
|
| 44301 |
+
"learning_rate": 0.1,
|
| 44302 |
+
"loss": 2.2925209999084473,
|
| 44303 |
+
"step": 12646
|
| 44304 |
+
},
|
| 44305 |
+
{
|
| 44306 |
+
"epoch": 0.4015238095238095,
|
| 44307 |
+
"grad_norm": 0.29296875,
|
| 44308 |
+
"learning_rate": 0.1,
|
| 44309 |
+
"loss": 2.31681227684021,
|
| 44310 |
+
"step": 12648
|
| 44311 |
+
},
|
| 44312 |
+
{
|
| 44313 |
+
"epoch": 0.4015873015873016,
|
| 44314 |
+
"grad_norm": 0.076171875,
|
| 44315 |
+
"learning_rate": 0.1,
|
| 44316 |
+
"loss": 2.309741973876953,
|
| 44317 |
+
"step": 12650
|
| 44318 |
+
},
|
| 44319 |
+
{
|
| 44320 |
+
"epoch": 0.40165079365079365,
|
| 44321 |
+
"grad_norm": 0.154296875,
|
| 44322 |
+
"learning_rate": 0.1,
|
| 44323 |
+
"loss": 2.2926597595214844,
|
| 44324 |
+
"step": 12652
|
| 44325 |
+
},
|
| 44326 |
+
{
|
| 44327 |
+
"epoch": 0.4017142857142857,
|
| 44328 |
+
"grad_norm": 0.1904296875,
|
| 44329 |
+
"learning_rate": 0.1,
|
| 44330 |
+
"loss": 2.289318323135376,
|
| 44331 |
+
"step": 12654
|
| 44332 |
+
},
|
| 44333 |
+
{
|
| 44334 |
+
"epoch": 0.4017777777777778,
|
| 44335 |
+
"grad_norm": 0.19140625,
|
| 44336 |
+
"learning_rate": 0.1,
|
| 44337 |
+
"loss": 2.311735153198242,
|
| 44338 |
+
"step": 12656
|
| 44339 |
+
},
|
| 44340 |
+
{
|
| 44341 |
+
"epoch": 0.40184126984126983,
|
| 44342 |
+
"grad_norm": 0.08203125,
|
| 44343 |
+
"learning_rate": 0.1,
|
| 44344 |
+
"loss": 2.292684316635132,
|
| 44345 |
+
"step": 12658
|
| 44346 |
+
},
|
| 44347 |
+
{
|
| 44348 |
+
"epoch": 0.40190476190476193,
|
| 44349 |
+
"grad_norm": 0.134765625,
|
| 44350 |
+
"learning_rate": 0.1,
|
| 44351 |
+
"loss": 2.2918014526367188,
|
| 44352 |
+
"step": 12660
|
| 44353 |
+
},
|
| 44354 |
+
{
|
| 44355 |
+
"epoch": 0.401968253968254,
|
| 44356 |
+
"grad_norm": 0.4375,
|
| 44357 |
+
"learning_rate": 0.1,
|
| 44358 |
+
"loss": 2.315018892288208,
|
| 44359 |
+
"step": 12662
|
| 44360 |
+
},
|
| 44361 |
+
{
|
| 44362 |
+
"epoch": 0.402031746031746,
|
| 44363 |
+
"grad_norm": 0.306640625,
|
| 44364 |
+
"learning_rate": 0.1,
|
| 44365 |
+
"loss": 2.2844858169555664,
|
| 44366 |
+
"step": 12664
|
| 44367 |
+
},
|
| 44368 |
+
{
|
| 44369 |
+
"epoch": 0.4020952380952381,
|
| 44370 |
+
"grad_norm": 0.267578125,
|
| 44371 |
+
"learning_rate": 0.1,
|
| 44372 |
+
"loss": 2.2867681980133057,
|
| 44373 |
+
"step": 12666
|
| 44374 |
+
},
|
| 44375 |
+
{
|
| 44376 |
+
"epoch": 0.40215873015873016,
|
| 44377 |
+
"grad_norm": 0.1259765625,
|
| 44378 |
+
"learning_rate": 0.1,
|
| 44379 |
+
"loss": 2.2710859775543213,
|
| 44380 |
+
"step": 12668
|
| 44381 |
+
},
|
| 44382 |
+
{
|
| 44383 |
+
"epoch": 0.4022222222222222,
|
| 44384 |
+
"grad_norm": 0.16796875,
|
| 44385 |
+
"learning_rate": 0.1,
|
| 44386 |
+
"loss": 2.2854208946228027,
|
| 44387 |
+
"step": 12670
|
| 44388 |
+
},
|
| 44389 |
+
{
|
| 44390 |
+
"epoch": 0.4022857142857143,
|
| 44391 |
+
"grad_norm": 0.043212890625,
|
| 44392 |
+
"learning_rate": 0.1,
|
| 44393 |
+
"loss": 2.2909247875213623,
|
| 44394 |
+
"step": 12672
|
| 44395 |
+
},
|
| 44396 |
+
{
|
| 44397 |
+
"epoch": 0.40234920634920635,
|
| 44398 |
+
"grad_norm": 0.21875,
|
| 44399 |
+
"learning_rate": 0.1,
|
| 44400 |
+
"loss": 2.272588014602661,
|
| 44401 |
+
"step": 12674
|
| 44402 |
+
},
|
| 44403 |
+
{
|
| 44404 |
+
"epoch": 0.4024126984126984,
|
| 44405 |
+
"grad_norm": 0.318359375,
|
| 44406 |
+
"learning_rate": 0.1,
|
| 44407 |
+
"loss": 2.2849819660186768,
|
| 44408 |
+
"step": 12676
|
| 44409 |
+
},
|
| 44410 |
+
{
|
| 44411 |
+
"epoch": 0.4024761904761905,
|
| 44412 |
+
"grad_norm": 0.2294921875,
|
| 44413 |
+
"learning_rate": 0.1,
|
| 44414 |
+
"loss": 2.2895278930664062,
|
| 44415 |
+
"step": 12678
|
| 44416 |
+
},
|
| 44417 |
+
{
|
| 44418 |
+
"epoch": 0.40253968253968253,
|
| 44419 |
+
"grad_norm": 0.07958984375,
|
| 44420 |
+
"learning_rate": 0.1,
|
| 44421 |
+
"loss": 2.2852509021759033,
|
| 44422 |
+
"step": 12680
|
| 44423 |
+
},
|
| 44424 |
+
{
|
| 44425 |
+
"epoch": 0.40260317460317463,
|
| 44426 |
+
"grad_norm": 0.06640625,
|
| 44427 |
+
"learning_rate": 0.1,
|
| 44428 |
+
"loss": 2.284451484680176,
|
| 44429 |
+
"step": 12682
|
| 44430 |
+
},
|
| 44431 |
+
{
|
| 44432 |
+
"epoch": 0.4026666666666667,
|
| 44433 |
+
"grad_norm": 0.1552734375,
|
| 44434 |
+
"learning_rate": 0.1,
|
| 44435 |
+
"loss": 2.2716317176818848,
|
| 44436 |
+
"step": 12684
|
| 44437 |
+
},
|
| 44438 |
+
{
|
| 44439 |
+
"epoch": 0.4027301587301587,
|
| 44440 |
+
"grad_norm": 0.095703125,
|
| 44441 |
+
"learning_rate": 0.1,
|
| 44442 |
+
"loss": 2.2610225677490234,
|
| 44443 |
+
"step": 12686
|
| 44444 |
+
},
|
| 44445 |
+
{
|
| 44446 |
+
"epoch": 0.4027936507936508,
|
| 44447 |
+
"grad_norm": 0.138671875,
|
| 44448 |
+
"learning_rate": 0.1,
|
| 44449 |
+
"loss": 2.2757816314697266,
|
| 44450 |
+
"step": 12688
|
| 44451 |
+
},
|
| 44452 |
+
{
|
| 44453 |
+
"epoch": 0.40285714285714286,
|
| 44454 |
+
"grad_norm": 0.1494140625,
|
| 44455 |
+
"learning_rate": 0.1,
|
| 44456 |
+
"loss": 2.2554984092712402,
|
| 44457 |
+
"step": 12690
|
| 44458 |
+
},
|
| 44459 |
+
{
|
| 44460 |
+
"epoch": 0.4029206349206349,
|
| 44461 |
+
"grad_norm": 0.353515625,
|
| 44462 |
+
"learning_rate": 0.1,
|
| 44463 |
+
"loss": 2.2480530738830566,
|
| 44464 |
+
"step": 12692
|
| 44465 |
+
},
|
| 44466 |
+
{
|
| 44467 |
+
"epoch": 0.402984126984127,
|
| 44468 |
+
"grad_norm": 0.10888671875,
|
| 44469 |
+
"learning_rate": 0.1,
|
| 44470 |
+
"loss": 2.276384115219116,
|
| 44471 |
+
"step": 12694
|
| 44472 |
+
},
|
| 44473 |
+
{
|
| 44474 |
+
"epoch": 0.40304761904761904,
|
| 44475 |
+
"grad_norm": 0.140625,
|
| 44476 |
+
"learning_rate": 0.1,
|
| 44477 |
+
"loss": 2.269636869430542,
|
| 44478 |
+
"step": 12696
|
| 44479 |
+
},
|
| 44480 |
+
{
|
| 44481 |
+
"epoch": 0.4031111111111111,
|
| 44482 |
+
"grad_norm": 0.19921875,
|
| 44483 |
+
"learning_rate": 0.1,
|
| 44484 |
+
"loss": 2.2666258811950684,
|
| 44485 |
+
"step": 12698
|
| 44486 |
+
},
|
| 44487 |
+
{
|
| 44488 |
+
"epoch": 0.4031746031746032,
|
| 44489 |
+
"grad_norm": 0.06884765625,
|
| 44490 |
+
"learning_rate": 0.1,
|
| 44491 |
+
"loss": 2.276977062225342,
|
| 44492 |
+
"step": 12700
|
| 44493 |
+
},
|
| 44494 |
+
{
|
| 44495 |
+
"epoch": 0.40323809523809523,
|
| 44496 |
+
"grad_norm": 0.125,
|
| 44497 |
+
"learning_rate": 0.1,
|
| 44498 |
+
"loss": 2.2830371856689453,
|
| 44499 |
+
"step": 12702
|
| 44500 |
+
},
|
| 44501 |
+
{
|
| 44502 |
+
"epoch": 0.4033015873015873,
|
| 44503 |
+
"grad_norm": 0.1337890625,
|
| 44504 |
+
"learning_rate": 0.1,
|
| 44505 |
+
"loss": 2.277653932571411,
|
| 44506 |
+
"step": 12704
|
| 44507 |
+
},
|
| 44508 |
+
{
|
| 44509 |
+
"epoch": 0.40336507936507937,
|
| 44510 |
+
"grad_norm": 0.123046875,
|
| 44511 |
+
"learning_rate": 0.1,
|
| 44512 |
+
"loss": 2.3027429580688477,
|
| 44513 |
+
"step": 12706
|
| 44514 |
+
},
|
| 44515 |
+
{
|
| 44516 |
+
"epoch": 0.4034285714285714,
|
| 44517 |
+
"grad_norm": 0.12451171875,
|
| 44518 |
+
"learning_rate": 0.1,
|
| 44519 |
+
"loss": 2.2705657482147217,
|
| 44520 |
+
"step": 12708
|
| 44521 |
+
},
|
| 44522 |
+
{
|
| 44523 |
+
"epoch": 0.4034920634920635,
|
| 44524 |
+
"grad_norm": 0.314453125,
|
| 44525 |
+
"learning_rate": 0.1,
|
| 44526 |
+
"loss": 2.2589237689971924,
|
| 44527 |
+
"step": 12710
|
| 44528 |
+
},
|
| 44529 |
+
{
|
| 44530 |
+
"epoch": 0.40355555555555556,
|
| 44531 |
+
"grad_norm": 0.298828125,
|
| 44532 |
+
"learning_rate": 0.1,
|
| 44533 |
+
"loss": 2.239840030670166,
|
| 44534 |
+
"step": 12712
|
| 44535 |
+
},
|
| 44536 |
+
{
|
| 44537 |
+
"epoch": 0.4036190476190476,
|
| 44538 |
+
"grad_norm": 0.0830078125,
|
| 44539 |
+
"learning_rate": 0.1,
|
| 44540 |
+
"loss": 2.250976324081421,
|
| 44541 |
+
"step": 12714
|
| 44542 |
+
},
|
| 44543 |
+
{
|
| 44544 |
+
"epoch": 0.4036825396825397,
|
| 44545 |
+
"grad_norm": 0.09228515625,
|
| 44546 |
+
"learning_rate": 0.1,
|
| 44547 |
+
"loss": 2.246659517288208,
|
| 44548 |
+
"step": 12716
|
| 44549 |
+
},
|
| 44550 |
+
{
|
| 44551 |
+
"epoch": 0.40374603174603174,
|
| 44552 |
+
"grad_norm": 0.1494140625,
|
| 44553 |
+
"learning_rate": 0.1,
|
| 44554 |
+
"loss": 2.259284496307373,
|
| 44555 |
+
"step": 12718
|
| 44556 |
+
},
|
| 44557 |
+
{
|
| 44558 |
+
"epoch": 0.4038095238095238,
|
| 44559 |
+
"grad_norm": 0.13671875,
|
| 44560 |
+
"learning_rate": 0.1,
|
| 44561 |
+
"loss": 2.2647953033447266,
|
| 44562 |
+
"step": 12720
|
| 44563 |
+
},
|
| 44564 |
+
{
|
| 44565 |
+
"epoch": 0.4038730158730159,
|
| 44566 |
+
"grad_norm": 0.228515625,
|
| 44567 |
+
"learning_rate": 0.1,
|
| 44568 |
+
"loss": 2.234811544418335,
|
| 44569 |
+
"step": 12722
|
| 44570 |
+
},
|
| 44571 |
+
{
|
| 44572 |
+
"epoch": 0.4039365079365079,
|
| 44573 |
+
"grad_norm": 0.2041015625,
|
| 44574 |
+
"learning_rate": 0.1,
|
| 44575 |
+
"loss": 2.2175509929656982,
|
| 44576 |
+
"step": 12724
|
| 44577 |
+
},
|
| 44578 |
+
{
|
| 44579 |
+
"epoch": 0.404,
|
| 44580 |
+
"grad_norm": 0.1337890625,
|
| 44581 |
+
"learning_rate": 0.1,
|
| 44582 |
+
"loss": 2.2525484561920166,
|
| 44583 |
+
"step": 12726
|
| 44584 |
+
},
|
| 44585 |
+
{
|
| 44586 |
+
"epoch": 0.40406349206349207,
|
| 44587 |
+
"grad_norm": 0.12109375,
|
| 44588 |
+
"learning_rate": 0.1,
|
| 44589 |
+
"loss": 2.2338736057281494,
|
| 44590 |
+
"step": 12728
|
| 44591 |
+
},
|
| 44592 |
+
{
|
| 44593 |
+
"epoch": 0.4041269841269841,
|
| 44594 |
+
"grad_norm": 0.10302734375,
|
| 44595 |
+
"learning_rate": 0.1,
|
| 44596 |
+
"loss": 2.2381958961486816,
|
| 44597 |
+
"step": 12730
|
| 44598 |
+
},
|
| 44599 |
+
{
|
| 44600 |
+
"epoch": 0.4041904761904762,
|
| 44601 |
+
"grad_norm": 0.2177734375,
|
| 44602 |
+
"learning_rate": 0.1,
|
| 44603 |
+
"loss": 2.2337396144866943,
|
| 44604 |
+
"step": 12732
|
| 44605 |
+
},
|
| 44606 |
+
{
|
| 44607 |
+
"epoch": 0.40425396825396825,
|
| 44608 |
+
"grad_norm": 0.3828125,
|
| 44609 |
+
"learning_rate": 0.1,
|
| 44610 |
+
"loss": 2.2588319778442383,
|
| 44611 |
+
"step": 12734
|
| 44612 |
+
},
|
| 44613 |
+
{
|
| 44614 |
+
"epoch": 0.4043174603174603,
|
| 44615 |
+
"grad_norm": 0.279296875,
|
| 44616 |
+
"learning_rate": 0.1,
|
| 44617 |
+
"loss": 2.2670295238494873,
|
| 44618 |
+
"step": 12736
|
| 44619 |
+
},
|
| 44620 |
+
{
|
| 44621 |
+
"epoch": 0.4043809523809524,
|
| 44622 |
+
"grad_norm": 0.0869140625,
|
| 44623 |
+
"learning_rate": 0.1,
|
| 44624 |
+
"loss": 2.2243025302886963,
|
| 44625 |
+
"step": 12738
|
| 44626 |
+
},
|
| 44627 |
+
{
|
| 44628 |
+
"epoch": 0.40444444444444444,
|
| 44629 |
+
"grad_norm": 0.07861328125,
|
| 44630 |
+
"learning_rate": 0.1,
|
| 44631 |
+
"loss": 2.251145362854004,
|
| 44632 |
+
"step": 12740
|
| 44633 |
+
},
|
| 44634 |
+
{
|
| 44635 |
+
"epoch": 0.4045079365079365,
|
| 44636 |
+
"grad_norm": 0.11962890625,
|
| 44637 |
+
"learning_rate": 0.1,
|
| 44638 |
+
"loss": 2.2257883548736572,
|
| 44639 |
+
"step": 12742
|
| 44640 |
+
},
|
| 44641 |
+
{
|
| 44642 |
+
"epoch": 0.4045714285714286,
|
| 44643 |
+
"grad_norm": 0.2060546875,
|
| 44644 |
+
"learning_rate": 0.1,
|
| 44645 |
+
"loss": 2.225264549255371,
|
| 44646 |
+
"step": 12744
|
| 44647 |
+
},
|
| 44648 |
+
{
|
| 44649 |
+
"epoch": 0.4046349206349206,
|
| 44650 |
+
"grad_norm": 0.380859375,
|
| 44651 |
+
"learning_rate": 0.1,
|
| 44652 |
+
"loss": 2.2152347564697266,
|
| 44653 |
+
"step": 12746
|
| 44654 |
+
},
|
| 44655 |
+
{
|
| 44656 |
+
"epoch": 0.4046984126984127,
|
| 44657 |
+
"grad_norm": 0.39453125,
|
| 44658 |
+
"learning_rate": 0.1,
|
| 44659 |
+
"loss": 2.2306759357452393,
|
| 44660 |
+
"step": 12748
|
| 44661 |
+
},
|
| 44662 |
+
{
|
| 44663 |
+
"epoch": 0.40476190476190477,
|
| 44664 |
+
"grad_norm": 0.0791015625,
|
| 44665 |
+
"learning_rate": 0.1,
|
| 44666 |
+
"loss": 2.22837233543396,
|
| 44667 |
+
"step": 12750
|
| 44668 |
+
},
|
| 44669 |
+
{
|
| 44670 |
+
"epoch": 0.4048253968253968,
|
| 44671 |
+
"grad_norm": 0.1083984375,
|
| 44672 |
+
"learning_rate": 0.1,
|
| 44673 |
+
"loss": 2.257899522781372,
|
| 44674 |
+
"step": 12752
|
| 44675 |
+
},
|
| 44676 |
+
{
|
| 44677 |
+
"epoch": 0.4048888888888889,
|
| 44678 |
+
"grad_norm": 0.205078125,
|
| 44679 |
+
"learning_rate": 0.1,
|
| 44680 |
+
"loss": 2.246670722961426,
|
| 44681 |
+
"step": 12754
|
| 44682 |
+
},
|
| 44683 |
+
{
|
| 44684 |
+
"epoch": 0.40495238095238095,
|
| 44685 |
+
"grad_norm": 0.14453125,
|
| 44686 |
+
"learning_rate": 0.1,
|
| 44687 |
+
"loss": 2.2245774269104004,
|
| 44688 |
+
"step": 12756
|
| 44689 |
+
},
|
| 44690 |
+
{
|
| 44691 |
+
"epoch": 0.405015873015873,
|
| 44692 |
+
"grad_norm": 0.06494140625,
|
| 44693 |
+
"learning_rate": 0.1,
|
| 44694 |
+
"loss": 2.2274630069732666,
|
| 44695 |
+
"step": 12758
|
| 44696 |
+
},
|
| 44697 |
+
{
|
| 44698 |
+
"epoch": 0.4050793650793651,
|
| 44699 |
+
"grad_norm": 0.052490234375,
|
| 44700 |
+
"learning_rate": 0.1,
|
| 44701 |
+
"loss": 2.240325927734375,
|
| 44702 |
+
"step": 12760
|
| 44703 |
+
},
|
| 44704 |
+
{
|
| 44705 |
+
"epoch": 0.40514285714285714,
|
| 44706 |
+
"grad_norm": 0.13671875,
|
| 44707 |
+
"learning_rate": 0.1,
|
| 44708 |
+
"loss": 2.2673754692077637,
|
| 44709 |
+
"step": 12762
|
| 44710 |
+
},
|
| 44711 |
+
{
|
| 44712 |
+
"epoch": 0.4052063492063492,
|
| 44713 |
+
"grad_norm": 0.06494140625,
|
| 44714 |
+
"learning_rate": 0.1,
|
| 44715 |
+
"loss": 2.2353477478027344,
|
| 44716 |
+
"step": 12764
|
| 44717 |
+
},
|
| 44718 |
+
{
|
| 44719 |
+
"epoch": 0.4052698412698413,
|
| 44720 |
+
"grad_norm": 0.1416015625,
|
| 44721 |
+
"learning_rate": 0.1,
|
| 44722 |
+
"loss": 2.2503252029418945,
|
| 44723 |
+
"step": 12766
|
| 44724 |
+
},
|
| 44725 |
+
{
|
| 44726 |
+
"epoch": 0.4053333333333333,
|
| 44727 |
+
"grad_norm": 0.41015625,
|
| 44728 |
+
"learning_rate": 0.1,
|
| 44729 |
+
"loss": 2.2447316646575928,
|
| 44730 |
+
"step": 12768
|
| 44731 |
+
},
|
| 44732 |
+
{
|
| 44733 |
+
"epoch": 0.4053968253968254,
|
| 44734 |
+
"grad_norm": 0.177734375,
|
| 44735 |
+
"learning_rate": 0.1,
|
| 44736 |
+
"loss": 2.234184741973877,
|
| 44737 |
+
"step": 12770
|
| 44738 |
+
},
|
| 44739 |
+
{
|
| 44740 |
+
"epoch": 0.40546031746031747,
|
| 44741 |
+
"grad_norm": 0.06982421875,
|
| 44742 |
+
"learning_rate": 0.1,
|
| 44743 |
+
"loss": 2.243861198425293,
|
| 44744 |
+
"step": 12772
|
| 44745 |
+
},
|
| 44746 |
+
{
|
| 44747 |
+
"epoch": 0.4055238095238095,
|
| 44748 |
+
"grad_norm": 0.06982421875,
|
| 44749 |
+
"learning_rate": 0.1,
|
| 44750 |
+
"loss": 2.2590394020080566,
|
| 44751 |
+
"step": 12774
|
| 44752 |
+
},
|
| 44753 |
+
{
|
| 44754 |
+
"epoch": 0.4055873015873016,
|
| 44755 |
+
"grad_norm": 0.099609375,
|
| 44756 |
+
"learning_rate": 0.1,
|
| 44757 |
+
"loss": 2.251347064971924,
|
| 44758 |
+
"step": 12776
|
| 44759 |
+
},
|
| 44760 |
+
{
|
| 44761 |
+
"epoch": 0.40565079365079365,
|
| 44762 |
+
"grad_norm": 0.0791015625,
|
| 44763 |
+
"learning_rate": 0.1,
|
| 44764 |
+
"loss": 2.2699408531188965,
|
| 44765 |
+
"step": 12778
|
| 44766 |
+
},
|
| 44767 |
+
{
|
| 44768 |
+
"epoch": 0.4057142857142857,
|
| 44769 |
+
"grad_norm": 0.212890625,
|
| 44770 |
+
"learning_rate": 0.1,
|
| 44771 |
+
"loss": 2.25612735748291,
|
| 44772 |
+
"step": 12780
|
| 44773 |
+
},
|
| 44774 |
+
{
|
| 44775 |
+
"epoch": 0.4057777777777778,
|
| 44776 |
+
"grad_norm": 0.1953125,
|
| 44777 |
+
"learning_rate": 0.1,
|
| 44778 |
+
"loss": 2.2291111946105957,
|
| 44779 |
+
"step": 12782
|
| 44780 |
+
},
|
| 44781 |
+
{
|
| 44782 |
+
"epoch": 0.40584126984126984,
|
| 44783 |
+
"grad_norm": 0.1171875,
|
| 44784 |
+
"learning_rate": 0.1,
|
| 44785 |
+
"loss": 2.274329900741577,
|
| 44786 |
+
"step": 12784
|
| 44787 |
+
},
|
| 44788 |
+
{
|
| 44789 |
+
"epoch": 0.4059047619047619,
|
| 44790 |
+
"grad_norm": 0.126953125,
|
| 44791 |
+
"learning_rate": 0.1,
|
| 44792 |
+
"loss": 2.2519423961639404,
|
| 44793 |
+
"step": 12786
|
| 44794 |
+
},
|
| 44795 |
+
{
|
| 44796 |
+
"epoch": 0.405968253968254,
|
| 44797 |
+
"grad_norm": 0.07421875,
|
| 44798 |
+
"learning_rate": 0.1,
|
| 44799 |
+
"loss": 2.256042957305908,
|
| 44800 |
+
"step": 12788
|
| 44801 |
+
},
|
| 44802 |
+
{
|
| 44803 |
+
"epoch": 0.406031746031746,
|
| 44804 |
+
"grad_norm": 0.33203125,
|
| 44805 |
+
"learning_rate": 0.1,
|
| 44806 |
+
"loss": 2.272219657897949,
|
| 44807 |
+
"step": 12790
|
| 44808 |
+
},
|
| 44809 |
+
{
|
| 44810 |
+
"epoch": 0.4060952380952381,
|
| 44811 |
+
"grad_norm": 0.578125,
|
| 44812 |
+
"learning_rate": 0.1,
|
| 44813 |
+
"loss": 2.256260871887207,
|
| 44814 |
+
"step": 12792
|
| 44815 |
+
},
|
| 44816 |
+
{
|
| 44817 |
+
"epoch": 0.40615873015873016,
|
| 44818 |
+
"grad_norm": 0.08544921875,
|
| 44819 |
+
"learning_rate": 0.1,
|
| 44820 |
+
"loss": 2.251669406890869,
|
| 44821 |
+
"step": 12794
|
| 44822 |
+
},
|
| 44823 |
+
{
|
| 44824 |
+
"epoch": 0.4062222222222222,
|
| 44825 |
+
"grad_norm": 0.04541015625,
|
| 44826 |
+
"learning_rate": 0.1,
|
| 44827 |
+
"loss": 2.269117593765259,
|
| 44828 |
+
"step": 12796
|
| 44829 |
+
},
|
| 44830 |
+
{
|
| 44831 |
+
"epoch": 0.4062857142857143,
|
| 44832 |
+
"grad_norm": 0.08447265625,
|
| 44833 |
+
"learning_rate": 0.1,
|
| 44834 |
+
"loss": 2.2264370918273926,
|
| 44835 |
+
"step": 12798
|
| 44836 |
+
},
|
| 44837 |
+
{
|
| 44838 |
+
"epoch": 0.40634920634920635,
|
| 44839 |
+
"grad_norm": 0.125,
|
| 44840 |
+
"learning_rate": 0.1,
|
| 44841 |
+
"loss": 2.255627393722534,
|
| 44842 |
+
"step": 12800
|
| 44843 |
+
},
|
| 44844 |
+
{
|
| 44845 |
+
"epoch": 0.4064126984126984,
|
| 44846 |
+
"grad_norm": 0.16015625,
|
| 44847 |
+
"learning_rate": 0.1,
|
| 44848 |
+
"loss": 2.232905387878418,
|
| 44849 |
+
"step": 12802
|
| 44850 |
+
},
|
| 44851 |
+
{
|
| 44852 |
+
"epoch": 0.4064761904761905,
|
| 44853 |
+
"grad_norm": 0.177734375,
|
| 44854 |
+
"learning_rate": 0.1,
|
| 44855 |
+
"loss": 2.2764182090759277,
|
| 44856 |
+
"step": 12804
|
| 44857 |
+
},
|
| 44858 |
+
{
|
| 44859 |
+
"epoch": 0.40653968253968253,
|
| 44860 |
+
"grad_norm": 0.1083984375,
|
| 44861 |
+
"learning_rate": 0.1,
|
| 44862 |
+
"loss": 2.243446111679077,
|
| 44863 |
+
"step": 12806
|
| 44864 |
+
},
|
| 44865 |
+
{
|
| 44866 |
+
"epoch": 0.4066031746031746,
|
| 44867 |
+
"grad_norm": 0.072265625,
|
| 44868 |
+
"learning_rate": 0.1,
|
| 44869 |
+
"loss": 2.2535452842712402,
|
| 44870 |
+
"step": 12808
|
| 44871 |
+
},
|
| 44872 |
+
{
|
| 44873 |
+
"epoch": 0.4066666666666667,
|
| 44874 |
+
"grad_norm": 0.0458984375,
|
| 44875 |
+
"learning_rate": 0.1,
|
| 44876 |
+
"loss": 2.2059407234191895,
|
| 44877 |
+
"step": 12810
|
| 44878 |
+
},
|
| 44879 |
+
{
|
| 44880 |
+
"epoch": 0.4067301587301587,
|
| 44881 |
+
"grad_norm": 0.162109375,
|
| 44882 |
+
"learning_rate": 0.1,
|
| 44883 |
+
"loss": 2.250304698944092,
|
| 44884 |
+
"step": 12812
|
| 44885 |
+
},
|
| 44886 |
+
{
|
| 44887 |
+
"epoch": 0.4067936507936508,
|
| 44888 |
+
"grad_norm": 0.2099609375,
|
| 44889 |
+
"learning_rate": 0.1,
|
| 44890 |
+
"loss": 2.2253971099853516,
|
| 44891 |
+
"step": 12814
|
| 44892 |
+
},
|
| 44893 |
+
{
|
| 44894 |
+
"epoch": 0.40685714285714286,
|
| 44895 |
+
"grad_norm": 0.1376953125,
|
| 44896 |
+
"learning_rate": 0.1,
|
| 44897 |
+
"loss": 2.227365493774414,
|
| 44898 |
+
"step": 12816
|
| 44899 |
+
},
|
| 44900 |
+
{
|
| 44901 |
+
"epoch": 0.4069206349206349,
|
| 44902 |
+
"grad_norm": 0.0908203125,
|
| 44903 |
+
"learning_rate": 0.1,
|
| 44904 |
+
"loss": 2.25632905960083,
|
| 44905 |
+
"step": 12818
|
| 44906 |
+
},
|
| 44907 |
+
{
|
| 44908 |
+
"epoch": 0.406984126984127,
|
| 44909 |
+
"grad_norm": 0.18359375,
|
| 44910 |
+
"learning_rate": 0.1,
|
| 44911 |
+
"loss": 2.231961965560913,
|
| 44912 |
+
"step": 12820
|
| 44913 |
+
},
|
| 44914 |
+
{
|
| 44915 |
+
"epoch": 0.40704761904761905,
|
| 44916 |
+
"grad_norm": 0.34375,
|
| 44917 |
+
"learning_rate": 0.1,
|
| 44918 |
+
"loss": 2.2555220127105713,
|
| 44919 |
+
"step": 12822
|
| 44920 |
+
},
|
| 44921 |
+
{
|
| 44922 |
+
"epoch": 0.4071111111111111,
|
| 44923 |
+
"grad_norm": 0.1484375,
|
| 44924 |
+
"learning_rate": 0.1,
|
| 44925 |
+
"loss": 2.24263334274292,
|
| 44926 |
+
"step": 12824
|
| 44927 |
+
},
|
| 44928 |
+
{
|
| 44929 |
+
"epoch": 0.4071746031746032,
|
| 44930 |
+
"grad_norm": 0.173828125,
|
| 44931 |
+
"learning_rate": 0.1,
|
| 44932 |
+
"loss": 2.242009162902832,
|
| 44933 |
+
"step": 12826
|
| 44934 |
+
},
|
| 44935 |
+
{
|
| 44936 |
+
"epoch": 0.40723809523809523,
|
| 44937 |
+
"grad_norm": 0.1494140625,
|
| 44938 |
+
"learning_rate": 0.1,
|
| 44939 |
+
"loss": 2.246868371963501,
|
| 44940 |
+
"step": 12828
|
| 44941 |
+
},
|
| 44942 |
+
{
|
| 44943 |
+
"epoch": 0.4073015873015873,
|
| 44944 |
+
"grad_norm": 0.2060546875,
|
| 44945 |
+
"learning_rate": 0.1,
|
| 44946 |
+
"loss": 2.2295122146606445,
|
| 44947 |
+
"step": 12830
|
| 44948 |
+
},
|
| 44949 |
+
{
|
| 44950 |
+
"epoch": 0.4073650793650794,
|
| 44951 |
+
"grad_norm": 0.267578125,
|
| 44952 |
+
"learning_rate": 0.1,
|
| 44953 |
+
"loss": 2.263503313064575,
|
| 44954 |
+
"step": 12832
|
| 44955 |
+
},
|
| 44956 |
+
{
|
| 44957 |
+
"epoch": 0.4074285714285714,
|
| 44958 |
+
"grad_norm": 0.09423828125,
|
| 44959 |
+
"learning_rate": 0.1,
|
| 44960 |
+
"loss": 2.241060256958008,
|
| 44961 |
+
"step": 12834
|
| 44962 |
+
},
|
| 44963 |
+
{
|
| 44964 |
+
"epoch": 0.4074920634920635,
|
| 44965 |
+
"grad_norm": 0.376953125,
|
| 44966 |
+
"learning_rate": 0.1,
|
| 44967 |
+
"loss": 2.2220916748046875,
|
| 44968 |
+
"step": 12836
|
| 44969 |
+
},
|
| 44970 |
+
{
|
| 44971 |
+
"epoch": 0.40755555555555556,
|
| 44972 |
+
"grad_norm": 0.27734375,
|
| 44973 |
+
"learning_rate": 0.1,
|
| 44974 |
+
"loss": 2.221209764480591,
|
| 44975 |
+
"step": 12838
|
| 44976 |
+
},
|
| 44977 |
+
{
|
| 44978 |
+
"epoch": 0.4076190476190476,
|
| 44979 |
+
"grad_norm": 0.10546875,
|
| 44980 |
+
"learning_rate": 0.1,
|
| 44981 |
+
"loss": 2.230886220932007,
|
| 44982 |
+
"step": 12840
|
| 44983 |
+
},
|
| 44984 |
+
{
|
| 44985 |
+
"epoch": 0.4076825396825397,
|
| 44986 |
+
"grad_norm": 0.123046875,
|
| 44987 |
+
"learning_rate": 0.1,
|
| 44988 |
+
"loss": 2.2281064987182617,
|
| 44989 |
+
"step": 12842
|
| 44990 |
+
},
|
| 44991 |
+
{
|
| 44992 |
+
"epoch": 0.40774603174603175,
|
| 44993 |
+
"grad_norm": 0.09765625,
|
| 44994 |
+
"learning_rate": 0.1,
|
| 44995 |
+
"loss": 2.2533373832702637,
|
| 44996 |
+
"step": 12844
|
| 44997 |
+
},
|
| 44998 |
+
{
|
| 44999 |
+
"epoch": 0.4078095238095238,
|
| 45000 |
+
"grad_norm": 0.11279296875,
|
| 45001 |
+
"learning_rate": 0.1,
|
| 45002 |
+
"loss": 2.220545530319214,
|
| 45003 |
+
"step": 12846
|
| 45004 |
+
},
|
| 45005 |
+
{
|
| 45006 |
+
"epoch": 0.4078730158730159,
|
| 45007 |
+
"grad_norm": 0.1962890625,
|
| 45008 |
+
"learning_rate": 0.1,
|
| 45009 |
+
"loss": 2.207355499267578,
|
| 45010 |
+
"step": 12848
|
| 45011 |
+
},
|
| 45012 |
+
{
|
| 45013 |
+
"epoch": 0.40793650793650793,
|
| 45014 |
+
"grad_norm": 0.07568359375,
|
| 45015 |
+
"learning_rate": 0.1,
|
| 45016 |
+
"loss": 2.2466094493865967,
|
| 45017 |
+
"step": 12850
|
| 45018 |
+
},
|
| 45019 |
+
{
|
| 45020 |
+
"epoch": 0.408,
|
| 45021 |
+
"grad_norm": 0.083984375,
|
| 45022 |
+
"learning_rate": 0.1,
|
| 45023 |
+
"loss": 2.2127885818481445,
|
| 45024 |
+
"step": 12852
|
| 45025 |
+
},
|
| 45026 |
+
{
|
| 45027 |
+
"epoch": 0.4080634920634921,
|
| 45028 |
+
"grad_norm": 0.2216796875,
|
| 45029 |
+
"learning_rate": 0.1,
|
| 45030 |
+
"loss": 2.2296013832092285,
|
| 45031 |
+
"step": 12854
|
| 45032 |
+
},
|
| 45033 |
+
{
|
| 45034 |
+
"epoch": 0.4081269841269841,
|
| 45035 |
+
"grad_norm": 0.34765625,
|
| 45036 |
+
"learning_rate": 0.1,
|
| 45037 |
+
"loss": 2.2332053184509277,
|
| 45038 |
+
"step": 12856
|
| 45039 |
+
},
|
| 45040 |
+
{
|
| 45041 |
+
"epoch": 0.4081904761904762,
|
| 45042 |
+
"grad_norm": 0.11962890625,
|
| 45043 |
+
"learning_rate": 0.1,
|
| 45044 |
+
"loss": 2.2455201148986816,
|
| 45045 |
+
"step": 12858
|
| 45046 |
+
},
|
| 45047 |
+
{
|
| 45048 |
+
"epoch": 0.40825396825396826,
|
| 45049 |
+
"grad_norm": 0.2041015625,
|
| 45050 |
+
"learning_rate": 0.1,
|
| 45051 |
+
"loss": 2.2342209815979004,
|
| 45052 |
+
"step": 12860
|
| 45053 |
+
},
|
| 45054 |
+
{
|
| 45055 |
+
"epoch": 0.4083174603174603,
|
| 45056 |
+
"grad_norm": 0.134765625,
|
| 45057 |
+
"learning_rate": 0.1,
|
| 45058 |
+
"loss": 2.2207164764404297,
|
| 45059 |
+
"step": 12862
|
| 45060 |
+
},
|
| 45061 |
+
{
|
| 45062 |
+
"epoch": 0.4083809523809524,
|
| 45063 |
+
"grad_norm": 0.220703125,
|
| 45064 |
+
"learning_rate": 0.1,
|
| 45065 |
+
"loss": 2.2393648624420166,
|
| 45066 |
+
"step": 12864
|
| 45067 |
+
},
|
| 45068 |
+
{
|
| 45069 |
+
"epoch": 0.40844444444444444,
|
| 45070 |
+
"grad_norm": 0.146484375,
|
| 45071 |
+
"learning_rate": 0.1,
|
| 45072 |
+
"loss": 2.2382168769836426,
|
| 45073 |
+
"step": 12866
|
| 45074 |
+
},
|
| 45075 |
+
{
|
| 45076 |
+
"epoch": 0.4085079365079365,
|
| 45077 |
+
"grad_norm": 0.06689453125,
|
| 45078 |
+
"learning_rate": 0.1,
|
| 45079 |
+
"loss": 2.2354063987731934,
|
| 45080 |
+
"step": 12868
|
| 45081 |
+
},
|
| 45082 |
+
{
|
| 45083 |
+
"epoch": 0.4085714285714286,
|
| 45084 |
+
"grad_norm": 0.205078125,
|
| 45085 |
+
"learning_rate": 0.1,
|
| 45086 |
+
"loss": 2.2208573818206787,
|
| 45087 |
+
"step": 12870
|
| 45088 |
+
},
|
| 45089 |
+
{
|
| 45090 |
+
"epoch": 0.40863492063492063,
|
| 45091 |
+
"grad_norm": 0.06689453125,
|
| 45092 |
+
"learning_rate": 0.1,
|
| 45093 |
+
"loss": 2.2357020378112793,
|
| 45094 |
+
"step": 12872
|
| 45095 |
+
},
|
| 45096 |
+
{
|
| 45097 |
+
"epoch": 0.40869841269841267,
|
| 45098 |
+
"grad_norm": 0.1513671875,
|
| 45099 |
+
"learning_rate": 0.1,
|
| 45100 |
+
"loss": 2.2209126949310303,
|
| 45101 |
+
"step": 12874
|
| 45102 |
+
},
|
| 45103 |
+
{
|
| 45104 |
+
"epoch": 0.40876190476190477,
|
| 45105 |
+
"grad_norm": 0.310546875,
|
| 45106 |
+
"learning_rate": 0.1,
|
| 45107 |
+
"loss": 2.2232158184051514,
|
| 45108 |
+
"step": 12876
|
| 45109 |
+
},
|
| 45110 |
+
{
|
| 45111 |
+
"epoch": 0.4088253968253968,
|
| 45112 |
+
"grad_norm": 0.275390625,
|
| 45113 |
+
"learning_rate": 0.1,
|
| 45114 |
+
"loss": 2.1869778633117676,
|
| 45115 |
+
"step": 12878
|
| 45116 |
+
},
|
| 45117 |
+
{
|
| 45118 |
+
"epoch": 0.4088888888888889,
|
| 45119 |
+
"grad_norm": 0.1484375,
|
| 45120 |
+
"learning_rate": 0.1,
|
| 45121 |
+
"loss": 2.230013847351074,
|
| 45122 |
+
"step": 12880
|
| 45123 |
+
},
|
| 45124 |
+
{
|
| 45125 |
+
"epoch": 0.40895238095238096,
|
| 45126 |
+
"grad_norm": 0.1416015625,
|
| 45127 |
+
"learning_rate": 0.1,
|
| 45128 |
+
"loss": 2.2143027782440186,
|
| 45129 |
+
"step": 12882
|
| 45130 |
+
},
|
| 45131 |
+
{
|
| 45132 |
+
"epoch": 0.409015873015873,
|
| 45133 |
+
"grad_norm": 0.171875,
|
| 45134 |
+
"learning_rate": 0.1,
|
| 45135 |
+
"loss": 2.2395071983337402,
|
| 45136 |
+
"step": 12884
|
| 45137 |
+
},
|
| 45138 |
+
{
|
| 45139 |
+
"epoch": 0.4090793650793651,
|
| 45140 |
+
"grad_norm": 0.18359375,
|
| 45141 |
+
"learning_rate": 0.1,
|
| 45142 |
+
"loss": 2.2181894779205322,
|
| 45143 |
+
"step": 12886
|
| 45144 |
+
},
|
| 45145 |
+
{
|
| 45146 |
+
"epoch": 0.40914285714285714,
|
| 45147 |
+
"grad_norm": 0.1298828125,
|
| 45148 |
+
"learning_rate": 0.1,
|
| 45149 |
+
"loss": 2.237212657928467,
|
| 45150 |
+
"step": 12888
|
| 45151 |
+
},
|
| 45152 |
+
{
|
| 45153 |
+
"epoch": 0.4092063492063492,
|
| 45154 |
+
"grad_norm": 0.1787109375,
|
| 45155 |
+
"learning_rate": 0.1,
|
| 45156 |
+
"loss": 2.204676866531372,
|
| 45157 |
+
"step": 12890
|
| 45158 |
+
},
|
| 45159 |
+
{
|
| 45160 |
+
"epoch": 0.4092698412698413,
|
| 45161 |
+
"grad_norm": 0.3515625,
|
| 45162 |
+
"learning_rate": 0.1,
|
| 45163 |
+
"loss": 2.2483561038970947,
|
| 45164 |
+
"step": 12892
|
| 45165 |
+
},
|
| 45166 |
+
{
|
| 45167 |
+
"epoch": 0.4093333333333333,
|
| 45168 |
+
"grad_norm": 0.12109375,
|
| 45169 |
+
"learning_rate": 0.1,
|
| 45170 |
+
"loss": 2.2073922157287598,
|
| 45171 |
+
"step": 12894
|
| 45172 |
+
},
|
| 45173 |
+
{
|
| 45174 |
+
"epoch": 0.40939682539682537,
|
| 45175 |
+
"grad_norm": 0.1669921875,
|
| 45176 |
+
"learning_rate": 0.1,
|
| 45177 |
+
"loss": 2.2114458084106445,
|
| 45178 |
+
"step": 12896
|
| 45179 |
+
},
|
| 45180 |
+
{
|
| 45181 |
+
"epoch": 0.40946031746031747,
|
| 45182 |
+
"grad_norm": 0.2314453125,
|
| 45183 |
+
"learning_rate": 0.1,
|
| 45184 |
+
"loss": 2.2161312103271484,
|
| 45185 |
+
"step": 12898
|
| 45186 |
+
},
|
| 45187 |
+
{
|
| 45188 |
+
"epoch": 0.4095238095238095,
|
| 45189 |
+
"grad_norm": 0.07470703125,
|
| 45190 |
+
"learning_rate": 0.1,
|
| 45191 |
+
"loss": 2.237602472305298,
|
| 45192 |
+
"step": 12900
|
| 45193 |
+
},
|
| 45194 |
+
{
|
| 45195 |
+
"epoch": 0.4095873015873016,
|
| 45196 |
+
"grad_norm": 0.1396484375,
|
| 45197 |
+
"learning_rate": 0.1,
|
| 45198 |
+
"loss": 2.218475580215454,
|
| 45199 |
+
"step": 12902
|
| 45200 |
+
},
|
| 45201 |
+
{
|
| 45202 |
+
"epoch": 0.40965079365079365,
|
| 45203 |
+
"grad_norm": 0.1767578125,
|
| 45204 |
+
"learning_rate": 0.1,
|
| 45205 |
+
"loss": 2.231311798095703,
|
| 45206 |
+
"step": 12904
|
| 45207 |
+
},
|
| 45208 |
+
{
|
| 45209 |
+
"epoch": 0.4097142857142857,
|
| 45210 |
+
"grad_norm": 0.11328125,
|
| 45211 |
+
"learning_rate": 0.1,
|
| 45212 |
+
"loss": 2.2220139503479004,
|
| 45213 |
+
"step": 12906
|
| 45214 |
+
},
|
| 45215 |
+
{
|
| 45216 |
+
"epoch": 0.4097777777777778,
|
| 45217 |
+
"grad_norm": 0.10009765625,
|
| 45218 |
+
"learning_rate": 0.1,
|
| 45219 |
+
"loss": 2.226863384246826,
|
| 45220 |
+
"step": 12908
|
| 45221 |
+
},
|
| 45222 |
+
{
|
| 45223 |
+
"epoch": 0.40984126984126984,
|
| 45224 |
+
"grad_norm": 0.08056640625,
|
| 45225 |
+
"learning_rate": 0.1,
|
| 45226 |
+
"loss": 2.22263503074646,
|
| 45227 |
+
"step": 12910
|
| 45228 |
+
},
|
| 45229 |
+
{
|
| 45230 |
+
"epoch": 0.4099047619047619,
|
| 45231 |
+
"grad_norm": 0.25,
|
| 45232 |
+
"learning_rate": 0.1,
|
| 45233 |
+
"loss": 2.216628074645996,
|
| 45234 |
+
"step": 12912
|
| 45235 |
+
},
|
| 45236 |
+
{
|
| 45237 |
+
"epoch": 0.409968253968254,
|
| 45238 |
+
"grad_norm": 0.4765625,
|
| 45239 |
+
"learning_rate": 0.1,
|
| 45240 |
+
"loss": 2.2253456115722656,
|
| 45241 |
+
"step": 12914
|
| 45242 |
}
|
| 45243 |
],
|
| 45244 |
"logging_steps": 2,
|
| 45258 |
"attributes": {}
|
| 45259 |
}
|
| 45260 |
},
|
| 45261 |
+
"total_flos": 4.277200399146157e+19,
|
| 45262 |
"train_batch_size": 4,
|
| 45263 |
"trial_name": null,
|
| 45264 |
"trial_params": null
|