Instructions to use Ba2han/experimental2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Ba2han/experimental2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/experimental2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Ba2han/experimental2 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Ba2han/experimental2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ba2han/experimental2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/Ba2han/experimental2
- SGLang
How to use Ba2han/experimental2 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Ba2han/experimental2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ba2han/experimental2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Ba2han/experimental2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ba2han/experimental2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Unsloth Studio
How to use Ba2han/experimental2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/experimental2 to start chatting
Load model with FastModel
# Install Unsloth (shell command):
pip install unsloth

# Load the model (Python):
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/experimental2",
    max_seq_length=2048,
)
- Docker Model Runner
How to use Ba2han/experimental2 with Docker Model Runner:
docker model run hf.co/Ba2han/experimental2
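The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the base URL assumes the vLLM default port used above (SGLang in the commands above listens on 30000 instead). The request is only built here, since sending it requires a running server.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60):
    """Send the request and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Same parameters as the curl example; the port is an assumption (vLLM default).
request = build_completion_request(
    "http://localhost:8000", "Ba2han/experimental2", "Once upon a time,"
)
# With a server running, the generated text would typically be read as:
# print(complete(request)["choices"][0]["text"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.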
Training in progress, step 12285, checkpoint
last-checkpoint/model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ccf3832274462c19ae6e1156105c162797ccf65eb00490b5abf25853f514c2bc
 size 1171937904
last-checkpoint/optimizer.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:643729e56f97304b8e07b41a168a1e56714ba24900040ce753f6443fda6c397d
 size 1288212619
last-checkpoint/scheduler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:42b726231982d4fccb23f68a4f834f6558cd6f912a84230e86f2230216451432
 size 1401
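The three files above are Git LFS pointers: each stores only a spec version, a content hash, and a byte size, so a new checkpoint changes just the `oid` line when the size is unchanged. As a rough illustration, a pointer can be parsed as follows; `parse_lfs_pointer` is a hypothetical helper written for this sketch, not part of any Git LFS tooling, and the sample text is the new scheduler.pt pointer from the diff.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# New pointer for last-checkpoint/scheduler.pt, copied from the diff above.
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:42b726231982d4fccb23f68a4f834f6558cd6f912a84230e86f2230216451432\n"
    "size 1401\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```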
last-checkpoint/trainer_state.json
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.
|
| 6 |
"eval_steps": 3150,
|
| 7 |
-
"global_step":
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
@@ -41927,6 +41927,1105 @@
|
|
| 41927 |
"learning_rate": 0.1,
|
| 41928 |
"loss": 2.4388091564178467,
|
| 41929 |
"step": 11970
|
| 41930 |
}
|
| 41931 |
],
|
| 41932 |
"logging_steps": 2,
|
|
@@ -41946,7 +43045,7 @@
|
|
| 41946 |
"attributes": {}
|
| 41947 |
}
|
| 41948 |
},
|
| 41949 |
-
"total_flos":
|
| 41950 |
"train_batch_size": 4,
|
| 41951 |
"trial_name": null,
|
| 41952 |
"trial_params": null
|
|
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 0.39,
|
| 6 |
"eval_steps": 3150,
|
| 7 |
+
"global_step": 12285,
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
|
|
| 41927 |
"learning_rate": 0.1,
|
| 41928 |
"loss": 2.4388091564178467,
|
| 41929 |
"step": 11970
|
| 41930 |
+
},
|
| 41931 |
+
{
|
| 41932 |
+
"epoch": 0.38006349206349205,
|
| 41933 |
+
"grad_norm": 0.1318359375,
|
| 41934 |
+
"learning_rate": 0.1,
|
| 41935 |
+
"loss": 2.4562020301818848,
|
| 41936 |
+
"step": 11972
|
| 41937 |
+
},
|
| 41938 |
+
{
|
| 41939 |
+
"epoch": 0.38012698412698415,
|
| 41940 |
+
"grad_norm": 0.17578125,
|
| 41941 |
+
"learning_rate": 0.1,
|
| 41942 |
+
"loss": 2.4419918060302734,
|
| 41943 |
+
"step": 11974
|
| 41944 |
+
},
|
| 41945 |
+
{
|
| 41946 |
+
"epoch": 0.3801904761904762,
|
| 41947 |
+
"grad_norm": 0.283203125,
|
| 41948 |
+
"learning_rate": 0.1,
|
| 41949 |
+
"loss": 2.4348676204681396,
|
| 41950 |
+
"step": 11976
|
| 41951 |
+
},
|
| 41952 |
+
{
|
| 41953 |
+
"epoch": 0.38025396825396823,
|
| 41954 |
+
"grad_norm": 0.2177734375,
|
| 41955 |
+
"learning_rate": 0.1,
|
| 41956 |
+
"loss": 2.4611902236938477,
|
| 41957 |
+
"step": 11978
|
| 41958 |
+
},
|
| 41959 |
+
{
|
| 41960 |
+
"epoch": 0.38031746031746033,
|
| 41961 |
+
"grad_norm": 0.1396484375,
|
| 41962 |
+
"learning_rate": 0.1,
|
| 41963 |
+
"loss": 2.4563024044036865,
|
| 41964 |
+
"step": 11980
|
| 41965 |
+
},
|
| 41966 |
+
{
|
| 41967 |
+
"epoch": 0.3803809523809524,
|
| 41968 |
+
"grad_norm": 0.16015625,
|
| 41969 |
+
"learning_rate": 0.1,
|
| 41970 |
+
"loss": 2.445188045501709,
|
| 41971 |
+
"step": 11982
|
| 41972 |
+
},
|
| 41973 |
+
{
|
| 41974 |
+
"epoch": 0.3804444444444444,
|
| 41975 |
+
"grad_norm": 0.205078125,
|
| 41976 |
+
"learning_rate": 0.1,
|
| 41977 |
+
"loss": 2.474489212036133,
|
| 41978 |
+
"step": 11984
|
| 41979 |
+
},
|
| 41980 |
+
{
|
| 41981 |
+
"epoch": 0.3805079365079365,
|
| 41982 |
+
"grad_norm": 0.39453125,
|
| 41983 |
+
"learning_rate": 0.1,
|
| 41984 |
+
"loss": 2.4510064125061035,
|
| 41985 |
+
"step": 11986
|
| 41986 |
+
},
|
| 41987 |
+
{
|
| 41988 |
+
"epoch": 0.38057142857142856,
|
| 41989 |
+
"grad_norm": 0.236328125,
|
| 41990 |
+
"learning_rate": 0.1,
|
| 41991 |
+
"loss": 2.494896650314331,
|
| 41992 |
+
"step": 11988
|
| 41993 |
+
},
|
| 41994 |
+
{
|
| 41995 |
+
"epoch": 0.38063492063492066,
|
| 41996 |
+
"grad_norm": 0.1552734375,
|
| 41997 |
+
"learning_rate": 0.1,
|
| 41998 |
+
"loss": 2.458981513977051,
|
| 41999 |
+
"step": 11990
|
| 42000 |
+
},
|
| 42001 |
+
{
|
| 42002 |
+
"epoch": 0.3806984126984127,
|
| 42003 |
+
"grad_norm": 0.287109375,
|
| 42004 |
+
"learning_rate": 0.1,
|
| 42005 |
+
"loss": 2.4623570442199707,
|
| 42006 |
+
"step": 11992
|
| 42007 |
+
},
|
| 42008 |
+
{
|
| 42009 |
+
"epoch": 0.38076190476190475,
|
| 42010 |
+
"grad_norm": 0.337890625,
|
| 42011 |
+
"learning_rate": 0.1,
|
| 42012 |
+
"loss": 2.4659810066223145,
|
| 42013 |
+
"step": 11994
|
| 42014 |
+
},
|
| 42015 |
+
{
|
| 42016 |
+
"epoch": 0.38082539682539684,
|
| 42017 |
+
"grad_norm": 0.07958984375,
|
| 42018 |
+
"learning_rate": 0.1,
|
| 42019 |
+
"loss": 2.465498447418213,
|
| 42020 |
+
"step": 11996
|
| 42021 |
+
},
|
| 42022 |
+
{
|
| 42023 |
+
"epoch": 0.3808888888888889,
|
| 42024 |
+
"grad_norm": 0.0859375,
|
| 42025 |
+
"learning_rate": 0.1,
|
| 42026 |
+
"loss": 2.4689385890960693,
|
| 42027 |
+
"step": 11998
|
| 42028 |
+
},
|
| 42029 |
+
{
|
| 42030 |
+
"epoch": 0.38095238095238093,
|
| 42031 |
+
"grad_norm": 0.1826171875,
|
| 42032 |
+
"learning_rate": 0.1,
|
| 42033 |
+
"loss": 2.4477922916412354,
|
| 42034 |
+
"step": 12000
|
| 42035 |
+
},
|
| 42036 |
+
{
|
| 42037 |
+
"epoch": 0.38101587301587303,
|
| 42038 |
+
"grad_norm": 0.6484375,
|
| 42039 |
+
"learning_rate": 0.1,
|
| 42040 |
+
"loss": 2.4727680683135986,
|
| 42041 |
+
"step": 12002
|
| 42042 |
+
},
|
| 42043 |
+
{
|
| 42044 |
+
"epoch": 0.3810793650793651,
|
| 42045 |
+
"grad_norm": 0.1787109375,
|
| 42046 |
+
"learning_rate": 0.1,
|
| 42047 |
+
"loss": 2.4675381183624268,
|
| 42048 |
+
"step": 12004
|
| 42049 |
+
},
|
| 42050 |
+
{
|
| 42051 |
+
"epoch": 0.3811428571428571,
|
| 42052 |
+
"grad_norm": 0.060302734375,
|
| 42053 |
+
"learning_rate": 0.1,
|
| 42054 |
+
"loss": 2.453152894973755,
|
| 42055 |
+
"step": 12006
|
| 42056 |
+
},
|
| 42057 |
+
{
|
| 42058 |
+
"epoch": 0.3812063492063492,
|
| 42059 |
+
"grad_norm": 0.2158203125,
|
| 42060 |
+
"learning_rate": 0.1,
|
| 42061 |
+
"loss": 2.4358911514282227,
|
| 42062 |
+
"step": 12008
|
| 42063 |
+
},
|
| 42064 |
+
{
|
| 42065 |
+
"epoch": 0.38126984126984126,
|
| 42066 |
+
"grad_norm": 0.21875,
|
| 42067 |
+
"learning_rate": 0.1,
|
| 42068 |
+
"loss": 2.4744372367858887,
|
| 42069 |
+
"step": 12010
|
| 42070 |
+
},
|
| 42071 |
+
{
|
| 42072 |
+
"epoch": 0.38133333333333336,
|
| 42073 |
+
"grad_norm": 0.09912109375,
|
| 42074 |
+
"learning_rate": 0.1,
|
| 42075 |
+
"loss": 2.45621919631958,
|
| 42076 |
+
"step": 12012
|
| 42077 |
+
},
|
| 42078 |
+
{
|
| 42079 |
+
"epoch": 0.3813968253968254,
|
| 42080 |
+
"grad_norm": 0.2060546875,
|
| 42081 |
+
"learning_rate": 0.1,
|
| 42082 |
+
"loss": 2.468158006668091,
|
| 42083 |
+
"step": 12014
|
| 42084 |
+
},
|
| 42085 |
+
{
|
| 42086 |
+
"epoch": 0.38146031746031744,
|
| 42087 |
+
"grad_norm": 0.205078125,
|
| 42088 |
+
"learning_rate": 0.1,
|
| 42089 |
+
"loss": 2.4647765159606934,
|
| 42090 |
+
"step": 12016
|
| 42091 |
+
},
|
| 42092 |
+
{
|
| 42093 |
+
"epoch": 0.38152380952380954,
|
| 42094 |
+
"grad_norm": 0.1494140625,
|
| 42095 |
+
"learning_rate": 0.1,
|
| 42096 |
+
"loss": 2.4902184009552,
|
| 42097 |
+
"step": 12018
|
| 42098 |
+
},
|
| 42099 |
+
{
|
| 42100 |
+
"epoch": 0.3815873015873016,
|
| 42101 |
+
"grad_norm": 0.06884765625,
|
| 42102 |
+
"learning_rate": 0.1,
|
| 42103 |
+
"loss": 2.4720568656921387,
|
| 42104 |
+
"step": 12020
|
| 42105 |
+
},
|
| 42106 |
+
{
|
| 42107 |
+
"epoch": 0.38165079365079363,
|
| 42108 |
+
"grad_norm": 0.1767578125,
|
| 42109 |
+
"learning_rate": 0.1,
|
| 42110 |
+
"loss": 2.4778892993927,
|
| 42111 |
+
"step": 12022
|
| 42112 |
+
},
|
| 42113 |
+
{
|
| 42114 |
+
"epoch": 0.38171428571428573,
|
| 42115 |
+
"grad_norm": 0.2197265625,
|
| 42116 |
+
"learning_rate": 0.1,
|
| 42117 |
+
"loss": 2.46598482131958,
|
| 42118 |
+
"step": 12024
|
| 42119 |
+
},
|
| 42120 |
+
{
|
| 42121 |
+
"epoch": 0.38177777777777777,
|
| 42122 |
+
"grad_norm": 0.302734375,
|
| 42123 |
+
"learning_rate": 0.1,
|
| 42124 |
+
"loss": 2.4907238483428955,
|
| 42125 |
+
"step": 12026
|
| 42126 |
+
},
|
| 42127 |
+
{
|
| 42128 |
+
"epoch": 0.3818412698412698,
|
| 42129 |
+
"grad_norm": 0.1455078125,
|
| 42130 |
+
"learning_rate": 0.1,
|
| 42131 |
+
"loss": 2.474137783050537,
|
| 42132 |
+
"step": 12028
|
| 42133 |
+
},
|
| 42134 |
+
{
|
| 42135 |
+
"epoch": 0.3819047619047619,
|
| 42136 |
+
"grad_norm": 0.283203125,
|
| 42137 |
+
"learning_rate": 0.1,
|
| 42138 |
+
"loss": 2.4604177474975586,
|
| 42139 |
+
"step": 12030
|
| 42140 |
+
},
|
| 42141 |
+
{
|
| 42142 |
+
"epoch": 0.38196825396825396,
|
| 42143 |
+
"grad_norm": 0.25,
|
| 42144 |
+
"learning_rate": 0.1,
|
| 42145 |
+
"loss": 2.4689693450927734,
|
| 42146 |
+
"step": 12032
|
| 42147 |
+
},
|
| 42148 |
+
{
|
| 42149 |
+
"epoch": 0.38203174603174606,
|
| 42150 |
+
"grad_norm": 0.0859375,
|
| 42151 |
+
"learning_rate": 0.1,
|
| 42152 |
+
"loss": 2.458057403564453,
|
| 42153 |
+
"step": 12034
|
| 42154 |
+
},
|
| 42155 |
+
{
|
| 42156 |
+
"epoch": 0.3820952380952381,
|
| 42157 |
+
"grad_norm": 0.1318359375,
|
| 42158 |
+
"learning_rate": 0.1,
|
| 42159 |
+
"loss": 2.489271879196167,
|
| 42160 |
+
"step": 12036
|
| 42161 |
+
},
|
| 42162 |
+
{
|
| 42163 |
+
"epoch": 0.38215873015873014,
|
| 42164 |
+
"grad_norm": 0.12109375,
|
| 42165 |
+
"learning_rate": 0.1,
|
| 42166 |
+
"loss": 2.4576919078826904,
|
| 42167 |
+
"step": 12038
|
| 42168 |
+
},
|
| 42169 |
+
{
|
| 42170 |
+
"epoch": 0.38222222222222224,
|
| 42171 |
+
"grad_norm": 0.228515625,
|
| 42172 |
+
"learning_rate": 0.1,
|
| 42173 |
+
"loss": 2.463383674621582,
|
| 42174 |
+
"step": 12040
|
| 42175 |
+
},
|
| 42176 |
+
{
|
| 42177 |
+
"epoch": 0.3822857142857143,
|
| 42178 |
+
"grad_norm": 0.322265625,
|
| 42179 |
+
"learning_rate": 0.1,
|
| 42180 |
+
"loss": 2.4628543853759766,
|
| 42181 |
+
"step": 12042
|
| 42182 |
+
},
|
| 42183 |
+
{
|
| 42184 |
+
"epoch": 0.3823492063492063,
|
| 42185 |
+
"grad_norm": 0.291015625,
|
| 42186 |
+
"learning_rate": 0.1,
|
| 42187 |
+
"loss": 2.4454421997070312,
|
| 42188 |
+
"step": 12044
|
| 42189 |
+
},
|
| 42190 |
+
{
|
| 42191 |
+
"epoch": 0.3824126984126984,
|
| 42192 |
+
"grad_norm": 0.06005859375,
|
| 42193 |
+
"learning_rate": 0.1,
|
| 42194 |
+
"loss": 2.4660277366638184,
|
| 42195 |
+
"step": 12046
|
| 42196 |
+
},
|
| 42197 |
+
{
|
| 42198 |
+
"epoch": 0.38247619047619047,
|
| 42199 |
+
"grad_norm": 0.216796875,
|
| 42200 |
+
"learning_rate": 0.1,
|
| 42201 |
+
"loss": 2.484807252883911,
|
| 42202 |
+
"step": 12048
|
| 42203 |
+
},
|
| 42204 |
+
{
|
| 42205 |
+
"epoch": 0.3825396825396825,
|
| 42206 |
+
"grad_norm": 0.380859375,
|
| 42207 |
+
"learning_rate": 0.1,
|
| 42208 |
+
"loss": 2.485915184020996,
|
| 42209 |
+
"step": 12050
|
| 42210 |
+
},
|
| 42211 |
+
{
|
| 42212 |
+
"epoch": 0.3826031746031746,
|
| 42213 |
+
"grad_norm": 0.3046875,
|
| 42214 |
+
"learning_rate": 0.1,
|
| 42215 |
+
"loss": 2.4765751361846924,
|
| 42216 |
+
"step": 12052
|
| 42217 |
+
},
|
| 42218 |
+
{
|
| 42219 |
+
"epoch": 0.38266666666666665,
|
| 42220 |
+
"grad_norm": 0.0615234375,
|
| 42221 |
+
"learning_rate": 0.1,
|
| 42222 |
+
"loss": 2.45377516746521,
|
| 42223 |
+
"step": 12054
|
| 42224 |
+
},
|
| 42225 |
+
{
|
| 42226 |
+
"epoch": 0.38273015873015875,
|
| 42227 |
+
"grad_norm": 0.1005859375,
|
| 42228 |
+
"learning_rate": 0.1,
|
| 42229 |
+
"loss": 2.477993965148926,
|
| 42230 |
+
"step": 12056
|
| 42231 |
+
},
|
| 42232 |
+
{
|
| 42233 |
+
"epoch": 0.3827936507936508,
|
| 42234 |
+
"grad_norm": 0.1787109375,
|
| 42235 |
+
"learning_rate": 0.1,
|
| 42236 |
+
"loss": 2.476003646850586,
|
| 42237 |
+
"step": 12058
|
| 42238 |
+
},
|
| 42239 |
+
{
|
| 42240 |
+
"epoch": 0.38285714285714284,
|
| 42241 |
+
"grad_norm": 0.26171875,
|
| 42242 |
+
"learning_rate": 0.1,
|
| 42243 |
+
"loss": 2.4513702392578125,
|
| 42244 |
+
"step": 12060
|
| 42245 |
+
},
|
| 42246 |
+
{
|
| 42247 |
+
"epoch": 0.38292063492063494,
|
| 42248 |
+
"grad_norm": 0.2490234375,
|
| 42249 |
+
"learning_rate": 0.1,
|
| 42250 |
+
"loss": 2.481924295425415,
|
| 42251 |
+
"step": 12062
|
| 42252 |
+
},
|
| 42253 |
+
{
|
| 42254 |
+
"epoch": 0.382984126984127,
|
| 42255 |
+
"grad_norm": 0.2119140625,
|
| 42256 |
+
"learning_rate": 0.1,
|
| 42257 |
+
"loss": 2.4330499172210693,
|
| 42258 |
+
"step": 12064
|
| 42259 |
+
},
|
| 42260 |
+
{
|
| 42261 |
+
"epoch": 0.383047619047619,
|
| 42262 |
+
"grad_norm": 0.1015625,
|
| 42263 |
+
"learning_rate": 0.1,
|
| 42264 |
+
"loss": 2.4708168506622314,
|
| 42265 |
+
"step": 12066
|
| 42266 |
+
},
|
| 42267 |
+
{
|
| 42268 |
+
"epoch": 0.3831111111111111,
|
| 42269 |
+
"grad_norm": 0.07421875,
|
| 42270 |
+
"learning_rate": 0.1,
|
| 42271 |
+
"loss": 2.4653377532958984,
|
| 42272 |
+
"step": 12068
|
| 42273 |
+
},
|
| 42274 |
+
{
|
| 42275 |
+
"epoch": 0.38317460317460317,
|
| 42276 |
+
"grad_norm": 0.16796875,
|
| 42277 |
+
"learning_rate": 0.1,
|
| 42278 |
+
"loss": 2.5031898021698,
|
| 42279 |
+
"step": 12070
|
| 42280 |
+
},
|
| 42281 |
+
{
|
| 42282 |
+
"epoch": 0.3832380952380952,
|
| 42283 |
+
"grad_norm": 0.1279296875,
|
| 42284 |
+
"learning_rate": 0.1,
|
| 42285 |
+
"loss": 2.475032329559326,
|
| 42286 |
+
"step": 12072
|
| 42287 |
+
},
|
| 42288 |
+
{
|
| 42289 |
+
"epoch": 0.3833015873015873,
|
| 42290 |
+
"grad_norm": 0.11865234375,
|
| 42291 |
+
"learning_rate": 0.1,
|
| 42292 |
+
"loss": 2.4548768997192383,
|
| 42293 |
+
"step": 12074
|
| 42294 |
+
},
|
| 42295 |
+
{
|
| 42296 |
+
"epoch": 0.38336507936507935,
|
| 42297 |
+
"grad_norm": 0.337890625,
|
| 42298 |
+
"learning_rate": 0.1,
|
| 42299 |
+
"loss": 2.457284688949585,
|
| 42300 |
+
"step": 12076
|
| 42301 |
+
},
|
| 42302 |
+
{
|
| 42303 |
+
"epoch": 0.38342857142857145,
|
| 42304 |
+
"grad_norm": 0.2265625,
|
| 42305 |
+
"learning_rate": 0.1,
|
| 42306 |
+
"loss": 2.4320826530456543,
|
| 42307 |
+
"step": 12078
|
| 42308 |
+
},
|
| 42309 |
+
{
|
| 42310 |
+
"epoch": 0.3834920634920635,
|
| 42311 |
+
"grad_norm": 0.0908203125,
|
| 42312 |
+
"learning_rate": 0.1,
|
| 42313 |
+
"loss": 2.460562229156494,
|
| 42314 |
+
"step": 12080
|
| 42315 |
+
},
|
| 42316 |
+
{
|
| 42317 |
+
"epoch": 0.38355555555555554,
|
| 42318 |
+
"grad_norm": 0.44140625,
|
| 42319 |
+
"learning_rate": 0.1,
|
| 42320 |
+
"loss": 2.4694409370422363,
|
| 42321 |
+
"step": 12082
|
| 42322 |
+
},
|
| 42323 |
+
{
|
| 42324 |
+
"epoch": 0.38361904761904764,
|
| 42325 |
+
"grad_norm": 0.21484375,
|
| 42326 |
+
"learning_rate": 0.1,
|
| 42327 |
+
"loss": 2.4640471935272217,
|
| 42328 |
+
"step": 12084
|
| 42329 |
+
},
|
| 42330 |
+
{
|
| 42331 |
+
"epoch": 0.3836825396825397,
|
| 42332 |
+
"grad_norm": 0.11181640625,
|
| 42333 |
+
"learning_rate": 0.1,
|
| 42334 |
+
"loss": 2.480008363723755,
|
| 42335 |
+
"step": 12086
|
| 42336 |
+
},
|
| 42337 |
+
{
|
| 42338 |
+
"epoch": 0.3837460317460317,
|
| 42339 |
+
"grad_norm": 0.09765625,
|
| 42340 |
+
"learning_rate": 0.1,
|
| 42341 |
+
"loss": 2.4768412113189697,
|
| 42342 |
+
"step": 12088
|
| 42343 |
+
},
|
| 42344 |
+
{
|
| 42345 |
+
"epoch": 0.3838095238095238,
|
| 42346 |
+
"grad_norm": 0.1591796875,
|
| 42347 |
+
"learning_rate": 0.1,
|
| 42348 |
+
"loss": 2.4503283500671387,
|
| 42349 |
+
"step": 12090
|
| 42350 |
+
},
|
| 42351 |
+
{
|
| 42352 |
+
"epoch": 0.38387301587301587,
|
| 42353 |
+
"grad_norm": 0.474609375,
|
| 42354 |
+
"learning_rate": 0.1,
|
| 42355 |
+
"loss": 2.4610016345977783,
|
| 42356 |
+
"step": 12092
|
| 42357 |
+
},
|
| 42358 |
+
{
|
| 42359 |
+
"epoch": 0.3839365079365079,
|
| 42360 |
+
"grad_norm": 0.486328125,
|
| 42361 |
+
"learning_rate": 0.1,
|
| 42362 |
+
"loss": 2.4768073558807373,
|
| 42363 |
+
"step": 12094
|
| 42364 |
+
},
|
| 42365 |
+
{
|
| 42366 |
+
"epoch": 0.384,
|
| 42367 |
+
"grad_norm": 0.046142578125,
|
| 42368 |
+
"learning_rate": 0.1,
|
| 42369 |
+
"loss": 2.4615330696105957,
|
| 42370 |
+
"step": 12096
|
| 42371 |
+
},
|
| 42372 |
+
{
|
| 42373 |
+
"epoch": 0.38406349206349205,
|
| 42374 |
+
"grad_norm": 0.16015625,
|
| 42375 |
+
"learning_rate": 0.1,
|
| 42376 |
+
"loss": 2.4912209510803223,
|
| 42377 |
+
"step": 12098
|
| 42378 |
+
},
|
| 42379 |
+
{
|
| 42380 |
+
"epoch": 0.38412698412698415,
|
| 42381 |
+
"grad_norm": 0.2109375,
|
| 42382 |
+
"learning_rate": 0.1,
|
| 42383 |
+
"loss": 2.4521989822387695,
|
| 42384 |
+
"step": 12100
|
| 42385 |
+
},
|
| 42386 |
+
{
|
| 42387 |
+
"epoch": 0.3841904761904762,
|
| 42388 |
+
"grad_norm": 0.1845703125,
|
| 42389 |
+
"learning_rate": 0.1,
|
| 42390 |
+
"loss": 2.466559886932373,
|
| 42391 |
+
"step": 12102
|
| 42392 |
+
},
|
| 42393 |
+
{
|
| 42394 |
+
"epoch": 0.38425396825396824,
|
| 42395 |
+
"grad_norm": 0.1630859375,
|
| 42396 |
+
"learning_rate": 0.1,
|
| 42397 |
+
"loss": 2.465717315673828,
|
| 42398 |
+
"step": 12104
|
| 42399 |
+
},
|
| 42400 |
+
{
|
| 42401 |
+
"epoch": 0.38431746031746034,
|
| 42402 |
+
"grad_norm": 0.1630859375,
|
| 42403 |
+
"learning_rate": 0.1,
|
| 42404 |
+
"loss": 2.475071907043457,
|
| 42405 |
+
"step": 12106
|
| 42406 |
+
},
|
| 42407 |
+
{
|
| 42408 |
+
"epoch": 0.3843809523809524,
|
| 42409 |
+
"grad_norm": 0.470703125,
|
| 42410 |
+
"learning_rate": 0.1,
|
| 42411 |
+
"loss": 2.5202531814575195,
|
| 42412 |
+
"step": 12108
|
| 42413 |
+
},
|
| 42414 |
+
{
|
| 42415 |
+
"epoch": 0.3844444444444444,
|
| 42416 |
+
"grad_norm": 0.470703125,
|
| 42417 |
+
"learning_rate": 0.1,
|
| 42418 |
+
"loss": 2.4715182781219482,
|
| 42419 |
+
"step": 12110
|
| 42420 |
+
},
|
| 42421 |
+
{
|
| 42422 |
+
"epoch": 0.3845079365079365,
|
| 42423 |
+
"grad_norm": 0.1162109375,
|
| 42424 |
+
"learning_rate": 0.1,
|
| 42425 |
+
"loss": 2.455341100692749,
|
| 42426 |
+
"step": 12112
|
| 42427 |
+
},
|
| 42428 |
+
{
|
| 42429 |
+
"epoch": 0.38457142857142856,
|
| 42430 |
+
"grad_norm": 0.10400390625,
|
| 42431 |
+
"learning_rate": 0.1,
|
| 42432 |
+
"loss": 2.4790189266204834,
|
| 42433 |
+
"step": 12114
|
| 42434 |
+
},
|
| 42435 |
+
{
|
| 42436 |
+
"epoch": 0.3846349206349206,
|
| 42437 |
+
"grad_norm": 0.06884765625,
|
| 42438 |
+
"learning_rate": 0.1,
|
| 42439 |
+
"loss": 2.47458553314209,
|
| 42440 |
+
"step": 12116
|
| 42441 |
+
},
|
| 42442 |
+
{
|
| 42443 |
+
"epoch": 0.3846984126984127,
|
| 42444 |
+
"grad_norm": 0.07763671875,
|
| 42445 |
+
"learning_rate": 0.1,
|
| 42446 |
+
"loss": 2.475902557373047,
|
| 42447 |
+
"step": 12118
|
| 42448 |
+
},
|
| 42449 |
+
{
|
| 42450 |
+
"epoch": 0.38476190476190475,
|
| 42451 |
+
"grad_norm": 0.12255859375,
|
| 42452 |
+
"learning_rate": 0.1,
|
| 42453 |
+
"loss": 2.498180389404297,
|
| 42454 |
+
"step": 12120
|
| 42455 |
+
},
|
| 42456 |
+
{
|
| 42457 |
+
"epoch": 0.38482539682539685,
|
| 42458 |
+
"grad_norm": 0.3828125,
|
| 42459 |
+
"learning_rate": 0.1,
|
| 42460 |
+
"loss": 2.507518768310547,
|
| 42461 |
+
"step": 12122
|
| 42462 |
+
},
|
| 42463 |
+
{
|
| 42464 |
+
"epoch": 0.3848888888888889,
|
| 42465 |
+
"grad_norm": 0.2041015625,
|
| 42466 |
+
"learning_rate": 0.1,
|
| 42467 |
+
"loss": 2.4781503677368164,
|
| 42468 |
+
"step": 12124
|
| 42469 |
+
},
|
| 42470 |
+
{
|
| 42471 |
+
"epoch": 0.38495238095238093,
|
| 42472 |
+
"grad_norm": 0.06396484375,
|
| 42473 |
+
"learning_rate": 0.1,
|
| 42474 |
+
"loss": 2.5078864097595215,
|
| 42475 |
+
"step": 12126
|
| 42476 |
+
},
|
| 42477 |
+
{
|
| 42478 |
+
"epoch": 0.38501587301587303,
|
| 42479 |
+
"grad_norm": 0.0859375,
|
| 42480 |
+
"learning_rate": 0.1,
|
| 42481 |
+
"loss": 2.479459285736084,
|
| 42482 |
+
"step": 12128
|
| 42483 |
+
},
|
| 42484 |
+
{
|
| 42485 |
+
"epoch": 0.3850793650793651,
|
| 42486 |
+
"grad_norm": 0.138671875,
|
| 42487 |
+
"learning_rate": 0.1,
|
| 42488 |
+
"loss": 2.5114645957946777,
|
| 42489 |
+
"step": 12130
|
| 42490 |
+
},
|
| 42491 |
+
{
|
| 42492 |
+
"epoch": 0.3851428571428571,
|
| 42493 |
+
"grad_norm": 0.29296875,
|
| 42494 |
+
"learning_rate": 0.1,
|
| 42495 |
+
"loss": 2.4905712604522705,
|
| 42496 |
+
"step": 12132
|
| 42497 |
+
},
|
| 42498 |
+
{
|
| 42499 |
+
"epoch": 0.3852063492063492,
|
| 42500 |
+
"grad_norm": 0.12060546875,
|
| 42501 |
+
"learning_rate": 0.1,
|
| 42502 |
+
"loss": 2.4662930965423584,
|
| 42503 |
+
"step": 12134
|
| 42504 |
+
},
|
| 42505 |
+
{
|
| 42506 |
+
"epoch": 0.38526984126984126,
|
| 42507 |
+
"grad_norm": 0.177734375,
|
| 42508 |
+
"learning_rate": 0.1,
|
| 42509 |
+
"loss": 2.484269857406616,
|
| 42510 |
+
"step": 12136
|
| 42511 |
+
},
|
| 42512 |
+
{
|
| 42513 |
+
"epoch": 0.38533333333333336,
|
| 42514 |
+
"grad_norm": 0.265625,
|
| 42515 |
+
"learning_rate": 0.1,
|
| 42516 |
+
"loss": 2.501328229904175,
|
| 42517 |
+
"step": 12138
|
| 42518 |
+
},
|
| 42519 |
+
{
|
| 42520 |
+
"epoch": 0.3853968253968254,
|
| 42521 |
+
"grad_norm": 0.068359375,
|
| 42522 |
+
"learning_rate": 0.1,
|
| 42523 |
+
"loss": 2.4805939197540283,
|
| 42524 |
+
"step": 12140
|
| 42525 |
+
},
|
| 42526 |
+
{
|
| 42527 |
+
"epoch": 0.38546031746031745,
|
| 42528 |
+
"grad_norm": 0.2392578125,
|
| 42529 |
+
"learning_rate": 0.1,
|
| 42530 |
+
"loss": 2.4765126705169678,
|
| 42531 |
+
"step": 12142
|
| 42532 |
+
},
|
| 42533 |
+
{
|
| 42534 |
+
"epoch": 0.38552380952380955,
|
| 42535 |
+
"grad_norm": 0.365234375,
|
| 42536 |
+
"learning_rate": 0.1,
|
| 42537 |
+
"loss": 2.500314474105835,
|
| 42538 |
+
"step": 12144
|
| 42539 |
+
},
|
| 42540 |
+
{
|
| 42541 |
+
"epoch": 0.3855873015873016,
|
| 42542 |
+
"grad_norm": 0.125,
|
| 42543 |
+
"learning_rate": 0.1,
|
| 42544 |
+
"loss": 2.5027472972869873,
|
| 42545 |
+
"step": 12146
|
| 42546 |
+
},
|
| 42547 |
+
{
|
| 42548 |
+
"epoch": 0.38565079365079363,
|
| 42549 |
+
"grad_norm": 0.2890625,
|
| 42550 |
+
"learning_rate": 0.1,
|
| 42551 |
+
"loss": 2.473562240600586,
|
| 42552 |
+
"step": 12148
|
| 42553 |
+
},
|
| 42554 |
+
{
|
| 42555 |
+
"epoch": 0.38571428571428573,
|
| 42556 |
+
"grad_norm": 0.2890625,
|
| 42557 |
+
"learning_rate": 0.1,
|
| 42558 |
+
"loss": 2.511924982070923,
|
| 42559 |
+
"step": 12150
|
| 42560 |
+
},
|
| 42561 |
+
{
|
| 42562 |
+
"epoch": 0.3857777777777778,
|
| 42563 |
+
"grad_norm": 0.140625,
|
| 42564 |
+
"learning_rate": 0.1,
|
| 42565 |
+
"loss": 2.4819207191467285,
|
| 42566 |
+
"step": 12152
|
| 42567 |
+
},
|
| 42568 |
+
{
|
| 42569 |
+
"epoch": 0.3858412698412698,
|
| 42570 |
+
"grad_norm": 0.06591796875,
|
| 42571 |
+
"learning_rate": 0.1,
|
| 42572 |
+
"loss": 2.4831345081329346,
|
| 42573 |
+
"step": 12154
|
| 42574 |
+
},
|
| 42575 |
+
{
|
| 42576 |
+
"epoch": 0.3859047619047619,
|
| 42577 |
+
"grad_norm": 0.0849609375,
|
| 42578 |
+
"learning_rate": 0.1,
|
| 42579 |
+
"loss": 2.4863007068634033,
|
| 42580 |
+
"step": 12156
|
| 42581 |
+
},
|
| 42582 |
+
{
|
| 42583 |
+
"epoch": 0.38596825396825396,
|
| 42584 |
+
"grad_norm": 0.30859375,
|
| 42585 |
+
"learning_rate": 0.1,
|
| 42586 |
+
"loss": 2.460641622543335,
|
| 42587 |
+
"step": 12158
|
| 42588 |
+
},
|
| 42589 |
+
{
|
| 42590 |
+
"epoch": 0.38603174603174606,
|
| 42591 |
+
"grad_norm": 0.392578125,
|
| 42592 |
+
"learning_rate": 0.1,
|
| 42593 |
+
"loss": 2.4990243911743164,
|
| 42594 |
+
"step": 12160
|
| 42595 |
+
},
|
| 42596 |
+
{
|
| 42597 |
+
"epoch": 0.3860952380952381,
|
| 42598 |
+
"grad_norm": 0.1455078125,
|
| 42599 |
+
"learning_rate": 0.1,
|
| 42600 |
+
"loss": 2.5030386447906494,
|
| 42601 |
+
"step": 12162
|
| 42602 |
+
},
|
| 42603 |
+
{
|
| 42604 |
+
"epoch": 0.38615873015873015,
|
| 42605 |
+
"grad_norm": 0.248046875,
|
| 42606 |
+
"learning_rate": 0.1,
|
| 42607 |
+
"loss": 2.4420783519744873,
|
| 42608 |
+
"step": 12164
|
| 42609 |
+
},
|
| 42610 |
+
{
|
| 42611 |
+
"epoch": 0.38622222222222224,
|
| 42612 |
+
"grad_norm": 0.341796875,
|
| 42613 |
+
"learning_rate": 0.1,
|
| 42614 |
+
"loss": 2.504606008529663,
|
| 42615 |
+
"step": 12166
|
| 42616 |
+
},
|
| 42617 |
+
{
|
| 42618 |
+
"epoch": 0.3862857142857143,
|
| 42619 |
+
"grad_norm": 0.06640625,
|
| 42620 |
+
"learning_rate": 0.1,
|
| 42621 |
+
"loss": 2.5177011489868164,
|
| 42622 |
+
"step": 12168
|
| 42623 |
+
},
|
| 42624 |
+
{
|
| 42625 |
+
"epoch": 0.38634920634920633,
|
| 42626 |
+
"grad_norm": 0.119140625,
|
| 42627 |
+
"learning_rate": 0.1,
|
| 42628 |
+
"loss": 2.5138041973114014,
|
| 42629 |
+
"step": 12170
|
| 42630 |
+
},
|
| 42631 |
+
{
|
| 42632 |
+
"epoch": 0.38641269841269843,
|
| 42633 |
+
"grad_norm": 0.11181640625,
|
| 42634 |
+
"learning_rate": 0.1,
|
| 42635 |
+
"loss": 2.4538416862487793,
|
| 42636 |
+
"step": 12172
|
| 42637 |
+
},
|
| 42638 |
+
{
|
| 42639 |
+
"epoch": 0.3864761904761905,
|
| 42640 |
+
"grad_norm": 0.1640625,
|
| 42641 |
+
"learning_rate": 0.1,
|
| 42642 |
+
"loss": 2.4983925819396973,
|
| 42643 |
+
"step": 12174
|
| 42644 |
+
},
|
| 42645 |
+
{
|
| 42646 |
+
"epoch": 0.3865396825396825,
|
| 42647 |
+
"grad_norm": 0.177734375,
|
| 42648 |
+
"learning_rate": 0.1,
|
| 42649 |
+
"loss": 2.472153425216675,
|
| 42650 |
+
"step": 12176
|
| 42651 |
+
},
|
| 42652 |
+
{
|
| 42653 |
+
"epoch": 0.3866031746031746,
|
| 42654 |
+
"grad_norm": 0.08544921875,
|
| 42655 |
+
"learning_rate": 0.1,
|
| 42656 |
+
"loss": 2.4816946983337402,
|
| 42657 |
+
"step": 12178
|
| 42658 |
+
},
|
| 42659 |
+
{
|
| 42660 |
+
"epoch": 0.38666666666666666,
|
| 42661 |
+
"grad_norm": 0.1767578125,
|
| 42662 |
+
"learning_rate": 0.1,
|
| 42663 |
+
"loss": 2.5129659175872803,
|
| 42664 |
+
"step": 12180
|
| 42665 |
+
},
|
| 42666 |
+
{
|
| 42667 |
+
"epoch": 0.38673015873015876,
|
| 42668 |
+
"grad_norm": 0.11376953125,
|
| 42669 |
+
"learning_rate": 0.1,
|
| 42670 |
+
"loss": 2.4627156257629395,
|
| 42671 |
+
"step": 12182
|
| 42672 |
+
},
|
| 42673 |
+
{
|
| 42674 |
+
"epoch": 0.3867936507936508,
|
| 42675 |
+
"grad_norm": 0.3984375,
|
| 42676 |
+
"learning_rate": 0.1,
|
| 42677 |
+
"loss": 2.494533061981201,
|
| 42678 |
+
"step": 12184
|
| 42679 |
+
},
|
| 42680 |
+
{
|
| 42681 |
+
"epoch": 0.38685714285714284,
|
| 42682 |
+
"grad_norm": 0.3515625,
|
| 42683 |
+
"learning_rate": 0.1,
|
| 42684 |
+
"loss": 2.471395254135132,
|
| 42685 |
+
"step": 12186
|
| 42686 |
+
},
|
| 42687 |
+
{
|
| 42688 |
+
"epoch": 0.38692063492063494,
|
| 42689 |
+
"grad_norm": 0.1337890625,
|
| 42690 |
+
"learning_rate": 0.1,
|
| 42691 |
+
"loss": 2.4891700744628906,
|
| 42692 |
+
"step": 12188
|
| 42693 |
+
},
|
| 42694 |
+
{
|
| 42695 |
+
"epoch": 0.386984126984127,
|
| 42696 |
+
"grad_norm": 0.1474609375,
|
| 42697 |
+
"learning_rate": 0.1,
|
| 42698 |
+
"loss": 2.5044522285461426,
|
| 42699 |
+
"step": 12190
|
| 42700 |
+
},
|
| 42701 |
+
{
|
| 42702 |
+
"epoch": 0.38704761904761903,
|
| 42703 |
+
"grad_norm": 0.2890625,
|
| 42704 |
+
"learning_rate": 0.1,
|
| 42705 |
+
"loss": 2.487046241760254,
|
| 42706 |
+
"step": 12192
|
| 42707 |
+
},
|
| 42708 |
+
{
|
| 42709 |
+
"epoch": 0.38711111111111113,
|
| 42710 |
+
"grad_norm": 0.1708984375,
|
| 42711 |
+
"learning_rate": 0.1,
|
| 42712 |
+
"loss": 2.514503002166748,
|
| 42713 |
+
"step": 12194
|
| 42714 |
+
},
|
| 42715 |
+
{
|
| 42716 |
+
"epoch": 0.38717460317460317,
|
| 42717 |
+
"grad_norm": 0.232421875,
|
| 42718 |
+
"learning_rate": 0.1,
|
| 42719 |
+
"loss": 2.4909095764160156,
|
| 42720 |
+
"step": 12196
|
| 42721 |
+
},
|
| 42722 |
+
{
|
| 42723 |
+
"epoch": 0.3872380952380952,
|
| 42724 |
+
"grad_norm": 0.224609375,
|
| 42725 |
+
"learning_rate": 0.1,
|
| 42726 |
+
"loss": 2.4744057655334473,
|
| 42727 |
+
"step": 12198
|
| 42728 |
+
},
|
| 42729 |
+
{
|
| 42730 |
+
"epoch": 0.3873015873015873,
|
| 42731 |
+
"grad_norm": 0.1103515625,
|
| 42732 |
+
"learning_rate": 0.1,
|
| 42733 |
+
"loss": 2.4896419048309326,
|
| 42734 |
+
"step": 12200
|
| 42735 |
+
},
|
| 42736 |
+
{
|
| 42737 |
+
"epoch": 0.38736507936507936,
|
| 42738 |
+
"grad_norm": 0.119140625,
|
| 42739 |
+
"learning_rate": 0.1,
|
| 42740 |
+
"loss": 2.4401516914367676,
|
| 42741 |
+
"step": 12202
|
| 42742 |
+
},
|
| 42743 |
+
{
|
| 42744 |
+
"epoch": 0.38742857142857146,
|
| 42745 |
+
"grad_norm": 0.103515625,
|
| 42746 |
+
"learning_rate": 0.1,
|
| 42747 |
+
"loss": 2.45546293258667,
|
| 42748 |
+
"step": 12204
|
| 42749 |
+
},
|
| 42750 |
+
{
|
| 42751 |
+
"epoch": 0.3874920634920635,
|
| 42752 |
+
"grad_norm": 0.203125,
|
| 42753 |
+
"learning_rate": 0.1,
|
| 42754 |
+
"loss": 2.4839749336242676,
|
| 42755 |
+
"step": 12206
|
| 42756 |
+
},
|
| 42757 |
+
{
|
| 42758 |
+
"epoch": 0.38755555555555554,
|
| 42759 |
+
"grad_norm": 0.09375,
|
| 42760 |
+
"learning_rate": 0.1,
|
| 42761 |
+
"loss": 2.457061767578125,
|
| 42762 |
+
"step": 12208
|
| 42763 |
+
},
|
| 42764 |
+
{
|
| 42765 |
+
"epoch": 0.38761904761904764,
|
| 42766 |
+
"grad_norm": 0.1591796875,
|
| 42767 |
+
"learning_rate": 0.1,
|
| 42768 |
+
"loss": 2.4803502559661865,
|
| 42769 |
+
"step": 12210
|
| 42770 |
+
},
|
| 42771 |
+
{
|
| 42772 |
+
"epoch": 0.3876825396825397,
|
| 42773 |
+
"grad_norm": 0.10791015625,
|
| 42774 |
+
"learning_rate": 0.1,
|
| 42775 |
+
"loss": 2.4752790927886963,
|
| 42776 |
+
"step": 12212
|
| 42777 |
+
},
|
| 42778 |
+
{
|
| 42779 |
+
"epoch": 0.3877460317460317,
|
| 42780 |
+
"grad_norm": 0.1796875,
|
| 42781 |
+
"learning_rate": 0.1,
|
| 42782 |
+
"loss": 2.475409746170044,
|
| 42783 |
+
"step": 12214
|
| 42784 |
+
},
|
| 42785 |
+
{
|
| 42786 |
+
"epoch": 0.3878095238095238,
|
| 42787 |
+
"grad_norm": 0.369140625,
|
| 42788 |
+
"learning_rate": 0.1,
|
| 42789 |
+
"loss": 2.4633171558380127,
|
| 42790 |
+
"step": 12216
|
| 42791 |
+
},
|
| 42792 |
+
{
|
| 42793 |
+
"epoch": 0.38787301587301587,
|
| 42794 |
+
"grad_norm": 0.1923828125,
|
| 42795 |
+
"learning_rate": 0.1,
|
| 42796 |
+
"loss": 2.451033115386963,
|
| 42797 |
+
"step": 12218
|
| 42798 |
+
},
|
| 42799 |
+
{
|
| 42800 |
+
"epoch": 0.3879365079365079,
|
| 42801 |
+
"grad_norm": 0.263671875,
|
| 42802 |
+
"learning_rate": 0.1,
|
| 42803 |
+
"loss": 2.4517745971679688,
|
| 42804 |
+
"step": 12220
|
| 42805 |
+
},
|
| 42806 |
+
{
|
| 42807 |
+
"epoch": 0.388,
|
| 42808 |
+
"grad_norm": 0.267578125,
|
| 42809 |
+
"learning_rate": 0.1,
|
| 42810 |
+
"loss": 2.4649953842163086,
|
| 42811 |
+
"step": 12222
|
| 42812 |
+
},
|
| 42813 |
+
{
|
| 42814 |
+
"epoch": 0.38806349206349205,
|
| 42815 |
+
"grad_norm": 0.11376953125,
|
| 42816 |
+
"learning_rate": 0.1,
|
| 42817 |
+
"loss": 2.4534993171691895,
|
| 42818 |
+
"step": 12224
|
| 42819 |
+
},
|
| 42820 |
+
{
|
| 42821 |
+
"epoch": 0.38812698412698415,
|
| 42822 |
+
"grad_norm": 0.06884765625,
|
| 42823 |
+
"learning_rate": 0.1,
|
| 42824 |
+
"loss": 2.4374120235443115,
|
| 42825 |
+
"step": 12226
|
| 42826 |
+
},
|
| 42827 |
+
{
|
| 42828 |
+
"epoch": 0.3881904761904762,
|
| 42829 |
+
"grad_norm": 0.1123046875,
|
| 42830 |
+
"learning_rate": 0.1,
|
| 42831 |
+
"loss": 2.444390296936035,
|
| 42832 |
+
"step": 12228
|
| 42833 |
+
},
|
| 42834 |
+
{
|
| 42835 |
+
"epoch": 0.38825396825396824,
|
| 42836 |
+
"grad_norm": 0.1640625,
|
| 42837 |
+
"learning_rate": 0.1,
|
| 42838 |
+
"loss": 2.4295685291290283,
|
| 42839 |
+
"step": 12230
|
| 42840 |
+
},
|
| 42841 |
+
{
|
| 42842 |
+
"epoch": 0.38831746031746034,
|
| 42843 |
+
"grad_norm": 0.267578125,
|
| 42844 |
+
"learning_rate": 0.1,
|
| 42845 |
+
"loss": 2.4740664958953857,
|
| 42846 |
+
"step": 12232
|
| 42847 |
+
},
|
| 42848 |
+
{
|
| 42849 |
+
"epoch": 0.3883809523809524,
|
| 42850 |
+
"grad_norm": 0.150390625,
|
| 42851 |
+
"learning_rate": 0.1,
|
| 42852 |
+
"loss": 2.4552464485168457,
|
| 42853 |
+
"step": 12234
|
| 42854 |
+
},
|
| 42855 |
+
{
|
| 42856 |
+
"epoch": 0.3884444444444444,
|
| 42857 |
+
"grad_norm": 0.248046875,
|
| 42858 |
+
"learning_rate": 0.1,
|
| 42859 |
+
"loss": 2.469726085662842,
|
| 42860 |
+
"step": 12236
|
| 42861 |
+
},
|
| 42862 |
+
{
|
| 42863 |
+
"epoch": 0.3885079365079365,
|
| 42864 |
+
"grad_norm": 0.423828125,
|
| 42865 |
+
"learning_rate": 0.1,
|
| 42866 |
+
"loss": 2.4829745292663574,
|
| 42867 |
+
"step": 12238
|
| 42868 |
+
},
|
| 42869 |
+
{
|
| 42870 |
+
"epoch": 0.38857142857142857,
|
| 42871 |
+
"grad_norm": 0.337890625,
|
| 42872 |
+
"learning_rate": 0.1,
|
| 42873 |
+
"loss": 2.461160182952881,
|
| 42874 |
+
"step": 12240
|
| 42875 |
+
},
|
| 42876 |
+
{
|
| 42877 |
+
"epoch": 0.3886349206349206,
|
| 42878 |
+
"grad_norm": 0.10888671875,
|
| 42879 |
+
"learning_rate": 0.1,
|
| 42880 |
+
"loss": 2.465991973876953,
|
| 42881 |
+
"step": 12242
|
| 42882 |
+
},
|
| 42883 |
+
{
|
| 42884 |
+
"epoch": 0.3886984126984127,
|
| 42885 |
+
"grad_norm": 0.181640625,
|
| 42886 |
+
"learning_rate": 0.1,
|
| 42887 |
+
"loss": 2.464043617248535,
|
| 42888 |
+
"step": 12244
|
| 42889 |
+
},
|
| 42890 |
+
{
|
| 42891 |
+
"epoch": 0.38876190476190475,
|
| 42892 |
+
"grad_norm": 0.140625,
|
| 42893 |
+
"learning_rate": 0.1,
|
| 42894 |
+
"loss": 2.4663949012756348,
|
| 42895 |
+
"step": 12246
|
| 42896 |
+
},
|
| 42897 |
+
{
|
| 42898 |
+
"epoch": 0.38882539682539685,
|
| 42899 |
+
"grad_norm": 0.15625,
|
| 42900 |
+
"learning_rate": 0.1,
|
| 42901 |
+
"loss": 2.4549779891967773,
|
| 42902 |
+
"step": 12248
|
| 42903 |
+
},
|
| 42904 |
+
{
|
| 42905 |
+
"epoch": 0.3888888888888889,
|
| 42906 |
+
"grad_norm": 0.140625,
|
| 42907 |
+
"learning_rate": 0.1,
|
| 42908 |
+
"loss": 2.4570398330688477,
|
| 42909 |
+
"step": 12250
|
| 42910 |
+
},
|
| 42911 |
+
{
|
| 42912 |
+
"epoch": 0.38895238095238094,
|
| 42913 |
+
"grad_norm": 0.10400390625,
|
| 42914 |
+
"learning_rate": 0.1,
|
| 42915 |
+
"loss": 2.4533965587615967,
|
| 42916 |
+
"step": 12252
|
| 42917 |
+
},
|
| 42918 |
+
{
|
| 42919 |
+
"epoch": 0.38901587301587304,
|
| 42920 |
+
"grad_norm": 0.1552734375,
|
| 42921 |
+
"learning_rate": 0.1,
|
| 42922 |
+
"loss": 2.461820363998413,
|
| 42923 |
+
"step": 12254
|
| 42924 |
+
},
|
| 42925 |
+
{
|
| 42926 |
+
"epoch": 0.3890793650793651,
|
| 42927 |
+
"grad_norm": 0.380859375,
|
| 42928 |
+
"learning_rate": 0.1,
|
| 42929 |
+
"loss": 2.473161220550537,
|
| 42930 |
+
"step": 12256
|
| 42931 |
+
},
|
| 42932 |
+
{
|
| 42933 |
+
"epoch": 0.3891428571428571,
|
| 42934 |
+
"grad_norm": 0.5859375,
|
| 42935 |
+
"learning_rate": 0.1,
|
| 42936 |
+
"loss": 2.4561400413513184,
|
| 42937 |
+
"step": 12258
|
| 42938 |
+
},
|
| 42939 |
+
{
|
| 42940 |
+
"epoch": 0.3892063492063492,
|
| 42941 |
+
"grad_norm": 0.1279296875,
|
| 42942 |
+
"learning_rate": 0.1,
|
| 42943 |
+
"loss": 2.4945881366729736,
|
| 42944 |
+
"step": 12260
|
| 42945 |
+
},
|
| 42946 |
+
{
|
| 42947 |
+
"epoch": 0.38926984126984127,
|
| 42948 |
+
"grad_norm": 0.1787109375,
|
| 42949 |
+
"learning_rate": 0.1,
|
| 42950 |
+
"loss": 2.4576754570007324,
|
| 42951 |
+
"step": 12262
|
| 42952 |
+
},
|
| 42953 |
+
{
|
| 42954 |
+
"epoch": 0.3893333333333333,
|
| 42955 |
+
"grad_norm": 0.1484375,
|
| 42956 |
+
"learning_rate": 0.1,
|
| 42957 |
+
"loss": 2.4777143001556396,
|
| 42958 |
+
"step": 12264
|
| 42959 |
+
},
|
| 42960 |
+
{
|
| 42961 |
+
"epoch": 0.3893968253968254,
|
| 42962 |
+
"grad_norm": 0.25390625,
|
| 42963 |
+
"learning_rate": 0.1,
|
| 42964 |
+
"loss": 2.466712713241577,
|
| 42965 |
+
"step": 12266
|
| 42966 |
+
},
|
| 42967 |
+
{
|
| 42968 |
+
"epoch": 0.38946031746031745,
|
| 42969 |
+
"grad_norm": 0.3359375,
|
| 42970 |
+
"learning_rate": 0.1,
|
| 42971 |
+
"loss": 2.4671568870544434,
|
| 42972 |
+
"step": 12268
|
| 42973 |
+
},
|
| 42974 |
+
{
|
| 42975 |
+
"epoch": 0.38952380952380955,
|
| 42976 |
+
"grad_norm": 0.2138671875,
|
| 42977 |
+
"learning_rate": 0.1,
|
| 42978 |
+
"loss": 2.449784517288208,
|
| 42979 |
+
"step": 12270
|
| 42980 |
+
},
|
| 42981 |
+
{
|
| 42982 |
+
"epoch": 0.3895873015873016,
|
| 42983 |
+
"grad_norm": 0.09130859375,
|
| 42984 |
+
"learning_rate": 0.1,
|
| 42985 |
+
"loss": 2.468003273010254,
|
| 42986 |
+
"step": 12272
|
| 42987 |
+
},
|
| 42988 |
+
{
|
| 42989 |
+
"epoch": 0.38965079365079364,
|
| 42990 |
+
"grad_norm": 0.099609375,
|
| 42991 |
+
"learning_rate": 0.1,
|
| 42992 |
+
"loss": 2.455387830734253,
|
| 42993 |
+
"step": 12274
|
| 42994 |
+
},
|
| 42995 |
+
{
|
| 42996 |
+
"epoch": 0.38971428571428574,
|
| 42997 |
+
"grad_norm": 0.052490234375,
|
| 42998 |
+
"learning_rate": 0.1,
|
| 42999 |
+
"loss": 2.445190191268921,
|
| 43000 |
+
"step": 12276
|
| 43001 |
+
},
|
| 43002 |
+
{
|
| 43003 |
+
"epoch": 0.3897777777777778,
|
| 43004 |
+
"grad_norm": 0.08056640625,
|
| 43005 |
+
"learning_rate": 0.1,
|
| 43006 |
+
"loss": 2.44834041595459,
|
| 43007 |
+
"step": 12278
|
| 43008 |
+
},
|
| 43009 |
+
{
|
| 43010 |
+
"epoch": 0.3898412698412698,
|
| 43011 |
+
"grad_norm": 0.25,
|
| 43012 |
+
"learning_rate": 0.1,
|
| 43013 |
+
"loss": 2.4347023963928223,
|
| 43014 |
+
"step": 12280
|
| 43015 |
+
},
|
| 43016 |
+
{
|
| 43017 |
+
"epoch": 0.3899047619047619,
|
| 43018 |
+
"grad_norm": 0.296875,
|
| 43019 |
+
"learning_rate": 0.1,
|
| 43020 |
+
"loss": 2.424799680709839,
|
| 43021 |
+
"step": 12282
|
| 43022 |
+
},
|
| 43023 |
+
{
|
| 43024 |
+
"epoch": 0.38996825396825396,
|
| 43025 |
+
"grad_norm": 0.2060546875,
|
| 43026 |
+
"learning_rate": 0.1,
|
| 43027 |
+
"loss": 2.452523708343506,
|
| 43028 |
+
"step": 12284
|
| 43029 |
}
|
| 43030 |
],
|
| 43031 |
"logging_steps": 2,
|
| 43045 |
"attributes": {}
|
| 43046 |
}
|
| 43047 |
},
|
| 43048 |
+
"total_flos": 4.068607701147668e+19,
|
| 43049 |
"train_batch_size": 4,
|
| 43050 |
"trial_name": null,
|
| 43051 |
"trial_params": null
|