Instructions to use Ba2han/experimental2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Ba2han/experimental2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/experimental2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Ba2han/experimental2 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Ba2han/experimental2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/experimental2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker

```bash
docker model run hf.co/Ba2han/experimental2
```
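The curl call above can also be issued from Python. The sketch below is a minimal client that uses only the standard library (`urllib.request`) and assumes a server started with `vllm serve` is listening on the default `localhost:8000`. The helper names `build_completion_request` and `complete` are hypothetical and are not part of vLLM; the payload simply mirrors the curl example.

```python
import json
import urllib.request

# Assumptions: default vLLM port and the model id used throughout this page.
DEFAULT_BASE_URL = "http://localhost:8000"
MODEL_ID = "Ba2han/experimental2"


def build_completion_request(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5,
                             base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI-style completion responses put generated text in choices[0]["text"].
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Because the SGLang server described next exposes the same OpenAI-compatible route, passing `base_url="http://localhost:30000"` should target it instead.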
- SGLang
How to use Ba2han/experimental2 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/experimental2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/experimental2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/experimental2" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/experimental2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Unsloth Studio
How to use Ba2han/experimental2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting
```
Using Hugging Face Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for Ba2han/experimental2 to start chatting.
Load model with FastModel
```bash
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/experimental2",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use Ba2han/experimental2 with Docker Model Runner:
```bash
docker model run hf.co/Ba2han/experimental2
```
Training in progress, step 10710, checkpoint
- last-checkpoint/trainer_state.json +1109 -3
last-checkpoint/trainer_state.json
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.
|
| 6 |
"eval_steps": 3150,
|
| 7 |
-
"global_step":
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
@@ -36411,6 +36411,1112 @@
|
|
| 36411 |
"learning_rate": 0.1,
|
| 36412 |
"loss": 2.2154617309570312,
|
| 36413 |
"step": 10394
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36414 |
}
|
| 36415 |
],
|
| 36416 |
"logging_steps": 2,
|
|
@@ -36430,7 +37536,7 @@
|
|
| 36430 |
"attributes": {}
|
| 36431 |
}
|
| 36432 |
},
|
| 36433 |
-
"total_flos": 3.
|
| 36434 |
"train_batch_size": 4,
|
| 36435 |
"trial_name": null,
|
| 36436 |
"trial_params": null
|
|
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 0.34,
|
| 6 |
"eval_steps": 3150,
|
| 7 |
+
"global_step": 10710,
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
|
|
| 36411 |
"learning_rate": 0.1,
|
| 36412 |
"loss": 2.2154617309570312,
|
| 36413 |
"step": 10394
|
| 36414 |
+
},
|
| 36415 |
+
{
|
| 36416 |
+
"epoch": 0.330031746031746,
|
| 36417 |
+
"grad_norm": 0.1298828125,
|
| 36418 |
+
"learning_rate": 0.1,
|
| 36419 |
+
"loss": 2.2190306186676025,
|
| 36420 |
+
"step": 10396
|
| 36421 |
+
},
|
| 36422 |
+
{
|
| 36423 |
+
"epoch": 0.3300952380952381,
|
| 36424 |
+
"grad_norm": 0.10888671875,
|
| 36425 |
+
"learning_rate": 0.1,
|
| 36426 |
+
"loss": 2.2018134593963623,
|
| 36427 |
+
"step": 10398
|
| 36428 |
+
},
|
| 36429 |
+
{
|
| 36430 |
+
"epoch": 0.33015873015873015,
|
| 36431 |
+
"grad_norm": 0.142578125,
|
| 36432 |
+
"learning_rate": 0.1,
|
| 36433 |
+
"loss": 2.2261345386505127,
|
| 36434 |
+
"step": 10400
|
| 36435 |
+
},
|
| 36436 |
+
{
|
| 36437 |
+
"epoch": 0.3302222222222222,
|
| 36438 |
+
"grad_norm": 0.1787109375,
|
| 36439 |
+
"learning_rate": 0.1,
|
| 36440 |
+
"loss": 2.2277352809906006,
|
| 36441 |
+
"step": 10402
|
| 36442 |
+
},
|
| 36443 |
+
{
|
| 36444 |
+
"epoch": 0.3302857142857143,
|
| 36445 |
+
"grad_norm": 0.142578125,
|
| 36446 |
+
"learning_rate": 0.1,
|
| 36447 |
+
"loss": 2.2359938621520996,
|
| 36448 |
+
"step": 10404
|
| 36449 |
+
},
|
| 36450 |
+
{
|
| 36451 |
+
"epoch": 0.33034920634920634,
|
| 36452 |
+
"grad_norm": 0.2265625,
|
| 36453 |
+
"learning_rate": 0.1,
|
| 36454 |
+
"loss": 2.2159276008605957,
|
| 36455 |
+
"step": 10406
|
| 36456 |
+
},
|
| 36457 |
+
{
|
| 36458 |
+
"epoch": 0.33041269841269844,
|
| 36459 |
+
"grad_norm": 0.22265625,
|
| 36460 |
+
"learning_rate": 0.1,
|
| 36461 |
+
"loss": 2.2152233123779297,
|
| 36462 |
+
"step": 10408
|
| 36463 |
+
},
|
| 36464 |
+
{
|
| 36465 |
+
"epoch": 0.3304761904761905,
|
| 36466 |
+
"grad_norm": 0.17578125,
|
| 36467 |
+
"learning_rate": 0.1,
|
| 36468 |
+
"loss": 2.1989567279815674,
|
| 36469 |
+
"step": 10410
|
| 36470 |
+
},
|
| 36471 |
+
{
|
| 36472 |
+
"epoch": 0.3305396825396825,
|
| 36473 |
+
"grad_norm": 0.23046875,
|
| 36474 |
+
"learning_rate": 0.1,
|
| 36475 |
+
"loss": 2.2169737815856934,
|
| 36476 |
+
"step": 10412
|
| 36477 |
+
},
|
| 36478 |
+
{
|
| 36479 |
+
"epoch": 0.3306031746031746,
|
| 36480 |
+
"grad_norm": 0.158203125,
|
| 36481 |
+
"learning_rate": 0.1,
|
| 36482 |
+
"loss": 2.2402162551879883,
|
| 36483 |
+
"step": 10414
|
| 36484 |
+
},
|
| 36485 |
+
{
|
| 36486 |
+
"epoch": 0.33066666666666666,
|
| 36487 |
+
"grad_norm": 0.09716796875,
|
| 36488 |
+
"learning_rate": 0.1,
|
| 36489 |
+
"loss": 2.22573184967041,
|
| 36490 |
+
"step": 10416
|
| 36491 |
+
},
|
| 36492 |
+
{
|
| 36493 |
+
"epoch": 0.3307301587301587,
|
| 36494 |
+
"grad_norm": 0.173828125,
|
| 36495 |
+
"learning_rate": 0.1,
|
| 36496 |
+
"loss": 2.2216873168945312,
|
| 36497 |
+
"step": 10418
|
| 36498 |
+
},
|
| 36499 |
+
{
|
| 36500 |
+
"epoch": 0.3307936507936508,
|
| 36501 |
+
"grad_norm": 0.35546875,
|
| 36502 |
+
"learning_rate": 0.1,
|
| 36503 |
+
"loss": 2.22286319732666,
|
| 36504 |
+
"step": 10420
|
| 36505 |
+
},
|
| 36506 |
+
{
|
| 36507 |
+
"epoch": 0.33085714285714285,
|
| 36508 |
+
"grad_norm": 0.248046875,
|
| 36509 |
+
"learning_rate": 0.1,
|
| 36510 |
+
"loss": 2.2141366004943848,
|
| 36511 |
+
"step": 10422
|
| 36512 |
+
},
|
| 36513 |
+
{
|
| 36514 |
+
"epoch": 0.3309206349206349,
|
| 36515 |
+
"grad_norm": 0.138671875,
|
| 36516 |
+
"learning_rate": 0.1,
|
| 36517 |
+
"loss": 2.207756519317627,
|
| 36518 |
+
"step": 10424
|
| 36519 |
+
},
|
| 36520 |
+
{
|
| 36521 |
+
"epoch": 0.330984126984127,
|
| 36522 |
+
"grad_norm": 0.07763671875,
|
| 36523 |
+
"learning_rate": 0.1,
|
| 36524 |
+
"loss": 2.225470781326294,
|
| 36525 |
+
"step": 10426
|
| 36526 |
+
},
|
| 36527 |
+
{
|
| 36528 |
+
"epoch": 0.33104761904761904,
|
| 36529 |
+
"grad_norm": 0.1669921875,
|
| 36530 |
+
"learning_rate": 0.1,
|
| 36531 |
+
"loss": 2.228902816772461,
|
| 36532 |
+
"step": 10428
|
| 36533 |
+
},
|
| 36534 |
+
{
|
| 36535 |
+
"epoch": 0.33111111111111113,
|
| 36536 |
+
"grad_norm": 0.1533203125,
|
| 36537 |
+
"learning_rate": 0.1,
|
| 36538 |
+
"loss": 2.2184183597564697,
|
| 36539 |
+
"step": 10430
|
| 36540 |
+
},
|
| 36541 |
+
{
|
| 36542 |
+
"epoch": 0.3311746031746032,
|
| 36543 |
+
"grad_norm": 0.12158203125,
|
| 36544 |
+
"learning_rate": 0.1,
|
| 36545 |
+
"loss": 2.2254750728607178,
|
| 36546 |
+
"step": 10432
|
| 36547 |
+
},
|
| 36548 |
+
{
|
| 36549 |
+
"epoch": 0.3312380952380952,
|
| 36550 |
+
"grad_norm": 0.1767578125,
|
| 36551 |
+
"learning_rate": 0.1,
|
| 36552 |
+
"loss": 2.2171778678894043,
|
| 36553 |
+
"step": 10434
|
| 36554 |
+
},
|
| 36555 |
+
{
|
| 36556 |
+
"epoch": 0.3313015873015873,
|
| 36557 |
+
"grad_norm": 0.068359375,
|
| 36558 |
+
"learning_rate": 0.1,
|
| 36559 |
+
"loss": 2.237307071685791,
|
| 36560 |
+
"step": 10436
|
| 36561 |
+
},
|
| 36562 |
+
{
|
| 36563 |
+
"epoch": 0.33136507936507936,
|
| 36564 |
+
"grad_norm": 0.1259765625,
|
| 36565 |
+
"learning_rate": 0.1,
|
| 36566 |
+
"loss": 2.2065367698669434,
|
| 36567 |
+
"step": 10438
|
| 36568 |
+
},
|
| 36569 |
+
{
|
| 36570 |
+
"epoch": 0.3314285714285714,
|
| 36571 |
+
"grad_norm": 0.09716796875,
|
| 36572 |
+
"learning_rate": 0.1,
|
| 36573 |
+
"loss": 2.2208123207092285,
|
| 36574 |
+
"step": 10440
|
| 36575 |
+
},
|
| 36576 |
+
{
|
| 36577 |
+
"epoch": 0.3314920634920635,
|
| 36578 |
+
"grad_norm": 0.16796875,
|
| 36579 |
+
"learning_rate": 0.1,
|
| 36580 |
+
"loss": 2.219244956970215,
|
| 36581 |
+
"step": 10442
|
| 36582 |
+
},
|
| 36583 |
+
{
|
| 36584 |
+
"epoch": 0.33155555555555555,
|
| 36585 |
+
"grad_norm": 0.185546875,
|
| 36586 |
+
"learning_rate": 0.1,
|
| 36587 |
+
"loss": 2.2103569507598877,
|
| 36588 |
+
"step": 10444
|
| 36589 |
+
},
|
| 36590 |
+
{
|
| 36591 |
+
"epoch": 0.33161904761904765,
|
| 36592 |
+
"grad_norm": 0.1591796875,
|
| 36593 |
+
"learning_rate": 0.1,
|
| 36594 |
+
"loss": 2.2479629516601562,
|
| 36595 |
+
"step": 10446
|
| 36596 |
+
},
|
| 36597 |
+
{
|
| 36598 |
+
"epoch": 0.3316825396825397,
|
| 36599 |
+
"grad_norm": 0.1337890625,
|
| 36600 |
+
"learning_rate": 0.1,
|
| 36601 |
+
"loss": 2.2229833602905273,
|
| 36602 |
+
"step": 10448
|
| 36603 |
+
},
|
| 36604 |
+
{
|
| 36605 |
+
"epoch": 0.33174603174603173,
|
| 36606 |
+
"grad_norm": 0.546875,
|
| 36607 |
+
"learning_rate": 0.1,
|
| 36608 |
+
"loss": 2.2340261936187744,
|
| 36609 |
+
"step": 10450
|
| 36610 |
+
},
|
| 36611 |
+
{
|
| 36612 |
+
"epoch": 0.33180952380952383,
|
| 36613 |
+
"grad_norm": 0.11962890625,
|
| 36614 |
+
"learning_rate": 0.1,
|
| 36615 |
+
"loss": 2.228762626647949,
|
| 36616 |
+
"step": 10452
|
| 36617 |
+
},
|
| 36618 |
+
{
|
| 36619 |
+
"epoch": 0.3318730158730159,
|
| 36620 |
+
"grad_norm": 0.12158203125,
|
| 36621 |
+
"learning_rate": 0.1,
|
| 36622 |
+
"loss": 2.2089247703552246,
|
| 36623 |
+
"step": 10454
|
| 36624 |
+
},
|
| 36625 |
+
{
|
| 36626 |
+
"epoch": 0.3319365079365079,
|
| 36627 |
+
"grad_norm": 0.07275390625,
|
| 36628 |
+
"learning_rate": 0.1,
|
| 36629 |
+
"loss": 2.214805841445923,
|
| 36630 |
+
"step": 10456
|
| 36631 |
+
},
|
| 36632 |
+
{
|
| 36633 |
+
"epoch": 0.332,
|
| 36634 |
+
"grad_norm": 0.111328125,
|
| 36635 |
+
"learning_rate": 0.1,
|
| 36636 |
+
"loss": 2.2390830516815186,
|
| 36637 |
+
"step": 10458
|
| 36638 |
+
},
|
| 36639 |
+
{
|
| 36640 |
+
"epoch": 0.33206349206349206,
|
| 36641 |
+
"grad_norm": 0.2080078125,
|
| 36642 |
+
"learning_rate": 0.1,
|
| 36643 |
+
"loss": 2.255481719970703,
|
| 36644 |
+
"step": 10460
|
| 36645 |
+
},
|
| 36646 |
+
{
|
| 36647 |
+
"epoch": 0.3321269841269841,
|
| 36648 |
+
"grad_norm": 0.146484375,
|
| 36649 |
+
"learning_rate": 0.1,
|
| 36650 |
+
"loss": 2.19061541557312,
|
| 36651 |
+
"step": 10462
|
| 36652 |
+
},
|
| 36653 |
+
{
|
| 36654 |
+
"epoch": 0.3321904761904762,
|
| 36655 |
+
"grad_norm": 0.1259765625,
|
| 36656 |
+
"learning_rate": 0.1,
|
| 36657 |
+
"loss": 2.2185862064361572,
|
| 36658 |
+
"step": 10464
|
| 36659 |
+
},
|
| 36660 |
+
{
|
| 36661 |
+
"epoch": 0.33225396825396825,
|
| 36662 |
+
"grad_norm": 0.064453125,
|
| 36663 |
+
"learning_rate": 0.1,
|
| 36664 |
+
"loss": 2.2265207767486572,
|
| 36665 |
+
"step": 10466
|
| 36666 |
+
},
|
| 36667 |
+
{
|
| 36668 |
+
"epoch": 0.33231746031746034,
|
| 36669 |
+
"grad_norm": 0.1279296875,
|
| 36670 |
+
"learning_rate": 0.1,
|
| 36671 |
+
"loss": 2.2071709632873535,
|
| 36672 |
+
"step": 10468
|
| 36673 |
+
},
|
| 36674 |
+
{
|
| 36675 |
+
"epoch": 0.3323809523809524,
|
| 36676 |
+
"grad_norm": 0.2333984375,
|
| 36677 |
+
"learning_rate": 0.1,
|
| 36678 |
+
"loss": 2.219552993774414,
|
| 36679 |
+
"step": 10470
|
| 36680 |
+
},
|
| 36681 |
+
{
|
| 36682 |
+
"epoch": 0.33244444444444443,
|
| 36683 |
+
"grad_norm": 0.15625,
|
| 36684 |
+
"learning_rate": 0.1,
|
| 36685 |
+
"loss": 2.2066922187805176,
|
| 36686 |
+
"step": 10472
|
| 36687 |
+
},
|
| 36688 |
+
{
|
| 36689 |
+
"epoch": 0.33250793650793653,
|
| 36690 |
+
"grad_norm": 0.255859375,
|
| 36691 |
+
"learning_rate": 0.1,
|
| 36692 |
+
"loss": 2.1909236907958984,
|
| 36693 |
+
"step": 10474
|
| 36694 |
+
},
|
| 36695 |
+
{
|
| 36696 |
+
"epoch": 0.3325714285714286,
|
| 36697 |
+
"grad_norm": 0.2099609375,
|
| 36698 |
+
"learning_rate": 0.1,
|
| 36699 |
+
"loss": 2.212517023086548,
|
| 36700 |
+
"step": 10476
|
| 36701 |
+
},
|
| 36702 |
+
{
|
| 36703 |
+
"epoch": 0.3326349206349206,
|
| 36704 |
+
"grad_norm": 0.181640625,
|
| 36705 |
+
"learning_rate": 0.1,
|
| 36706 |
+
"loss": 2.2175776958465576,
|
| 36707 |
+
"step": 10478
|
| 36708 |
+
},
|
| 36709 |
+
{
|
| 36710 |
+
"epoch": 0.3326984126984127,
|
| 36711 |
+
"grad_norm": 0.162109375,
|
| 36712 |
+
"learning_rate": 0.1,
|
| 36713 |
+
"loss": 2.1897659301757812,
|
| 36714 |
+
"step": 10480
|
| 36715 |
+
},
|
| 36716 |
+
{
|
| 36717 |
+
"epoch": 0.33276190476190476,
|
| 36718 |
+
"grad_norm": 0.07763671875,
|
| 36719 |
+
"learning_rate": 0.1,
|
| 36720 |
+
"loss": 2.197108745574951,
|
| 36721 |
+
"step": 10482
|
| 36722 |
+
},
|
| 36723 |
+
{
|
| 36724 |
+
"epoch": 0.3328253968253968,
|
| 36725 |
+
"grad_norm": 0.08544921875,
|
| 36726 |
+
"learning_rate": 0.1,
|
| 36727 |
+
"loss": 2.210545301437378,
|
| 36728 |
+
"step": 10484
|
| 36729 |
+
},
|
| 36730 |
+
{
|
| 36731 |
+
"epoch": 0.3328888888888889,
|
| 36732 |
+
"grad_norm": 0.1572265625,
|
| 36733 |
+
"learning_rate": 0.1,
|
| 36734 |
+
"loss": 2.1997792720794678,
|
| 36735 |
+
"step": 10486
|
| 36736 |
+
},
|
| 36737 |
+
{
|
| 36738 |
+
"epoch": 0.33295238095238094,
|
| 36739 |
+
"grad_norm": 0.0908203125,
|
| 36740 |
+
"learning_rate": 0.1,
|
| 36741 |
+
"loss": 2.212092638015747,
|
| 36742 |
+
"step": 10488
|
| 36743 |
+
},
|
| 36744 |
+
{
|
| 36745 |
+
"epoch": 0.33301587301587304,
|
| 36746 |
+
"grad_norm": 0.08349609375,
|
| 36747 |
+
"learning_rate": 0.1,
|
| 36748 |
+
"loss": 2.193092107772827,
|
| 36749 |
+
"step": 10490
|
| 36750 |
+
},
|
| 36751 |
+
{
|
| 36752 |
+
"epoch": 0.3330793650793651,
|
| 36753 |
+
"grad_norm": 0.1669921875,
|
| 36754 |
+
"learning_rate": 0.1,
|
| 36755 |
+
"loss": 2.2050037384033203,
|
| 36756 |
+
"step": 10492
|
| 36757 |
+
},
|
| 36758 |
+
{
|
| 36759 |
+
"epoch": 0.33314285714285713,
|
| 36760 |
+
"grad_norm": 0.4140625,
|
| 36761 |
+
"learning_rate": 0.1,
|
| 36762 |
+
"loss": 2.2340750694274902,
|
| 36763 |
+
"step": 10494
|
| 36764 |
+
},
|
| 36765 |
+
{
|
| 36766 |
+
"epoch": 0.33320634920634923,
|
| 36767 |
+
"grad_norm": 0.07763671875,
|
| 36768 |
+
"learning_rate": 0.1,
|
| 36769 |
+
"loss": 2.1892526149749756,
|
| 36770 |
+
"step": 10496
|
| 36771 |
+
},
|
| 36772 |
+
{
|
| 36773 |
+
"epoch": 0.33326984126984127,
|
| 36774 |
+
"grad_norm": 0.07421875,
|
| 36775 |
+
"learning_rate": 0.1,
|
| 36776 |
+
"loss": 2.1927127838134766,
|
| 36777 |
+
"step": 10498
|
| 36778 |
+
},
|
| 36779 |
+
{
|
| 36780 |
+
"epoch": 0.3333333333333333,
|
| 36781 |
+
"grad_norm": 0.2216796875,
|
| 36782 |
+
"learning_rate": 0.1,
|
| 36783 |
+
"loss": 2.200645685195923,
|
| 36784 |
+
"step": 10500
|
| 36785 |
+
},
|
| 36786 |
+
{
|
| 36787 |
+
"epoch": 0.3333968253968254,
|
| 36788 |
+
"grad_norm": 0.1279296875,
|
| 36789 |
+
"learning_rate": 0.1,
|
| 36790 |
+
"loss": 2.215970277786255,
|
| 36791 |
+
"step": 10502
|
| 36792 |
+
},
|
| 36793 |
+
{
|
| 36794 |
+
"epoch": 0.33346031746031746,
|
| 36795 |
+
"grad_norm": 0.109375,
|
| 36796 |
+
"learning_rate": 0.1,
|
| 36797 |
+
"loss": 2.222879648208618,
|
| 36798 |
+
"step": 10504
|
| 36799 |
+
},
|
| 36800 |
+
{
|
| 36801 |
+
"epoch": 0.3335238095238095,
|
| 36802 |
+
"grad_norm": 0.052978515625,
|
| 36803 |
+
"learning_rate": 0.1,
|
| 36804 |
+
"loss": 2.2258737087249756,
|
| 36805 |
+
"step": 10506
|
| 36806 |
+
},
|
| 36807 |
+
{
|
| 36808 |
+
"epoch": 0.3335873015873016,
|
| 36809 |
+
"grad_norm": 0.1552734375,
|
| 36810 |
+
"learning_rate": 0.1,
|
| 36811 |
+
"loss": 2.1933491230010986,
|
| 36812 |
+
"step": 10508
|
| 36813 |
+
},
|
| 36814 |
+
{
|
| 36815 |
+
"epoch": 0.33365079365079364,
|
| 36816 |
+
"grad_norm": 0.353515625,
|
| 36817 |
+
"learning_rate": 0.1,
|
| 36818 |
+
"loss": 2.189959764480591,
|
| 36819 |
+
"step": 10510
|
| 36820 |
+
},
|
| 36821 |
+
{
|
| 36822 |
+
"epoch": 0.33371428571428574,
|
| 36823 |
+
"grad_norm": 0.1943359375,
|
| 36824 |
+
"learning_rate": 0.1,
|
| 36825 |
+
"loss": 2.215662717819214,
|
| 36826 |
+
"step": 10512
|
| 36827 |
+
},
|
| 36828 |
+
{
|
| 36829 |
+
"epoch": 0.3337777777777778,
|
| 36830 |
+
"grad_norm": 0.154296875,
|
| 36831 |
+
"learning_rate": 0.1,
|
| 36832 |
+
"loss": 2.2161316871643066,
|
| 36833 |
+
"step": 10514
|
| 36834 |
+
},
|
| 36835 |
+
{
|
| 36836 |
+
"epoch": 0.3338412698412698,
|
| 36837 |
+
"grad_norm": 0.158203125,
|
| 36838 |
+
"learning_rate": 0.1,
|
| 36839 |
+
"loss": 2.213809013366699,
|
| 36840 |
+
"step": 10516
|
| 36841 |
+
},
|
| 36842 |
+
{
|
| 36843 |
+
"epoch": 0.3339047619047619,
|
| 36844 |
+
"grad_norm": 0.11181640625,
|
| 36845 |
+
"learning_rate": 0.1,
|
| 36846 |
+
"loss": 2.193037509918213,
|
| 36847 |
+
"step": 10518
|
| 36848 |
+
},
|
| 36849 |
+
{
|
| 36850 |
+
"epoch": 0.33396825396825397,
|
| 36851 |
+
"grad_norm": 0.04638671875,
|
| 36852 |
+
"learning_rate": 0.1,
|
| 36853 |
+
"loss": 2.168241262435913,
|
| 36854 |
+
"step": 10520
|
| 36855 |
+
},
|
| 36856 |
+
{
|
| 36857 |
+
"epoch": 0.334031746031746,
|
| 36858 |
+
"grad_norm": 0.2138671875,
|
| 36859 |
+
"learning_rate": 0.1,
|
| 36860 |
+
"loss": 2.2035601139068604,
|
| 36861 |
+
"step": 10522
|
| 36862 |
+
},
|
| 36863 |
+
{
|
| 36864 |
+
"epoch": 0.3340952380952381,
|
| 36865 |
+
"grad_norm": 0.33984375,
|
| 36866 |
+
"learning_rate": 0.1,
|
| 36867 |
+
"loss": 2.1778528690338135,
|
| 36868 |
+
"step": 10524
|
| 36869 |
+
},
|
| 36870 |
+
{
|
| 36871 |
+
"epoch": 0.33415873015873016,
|
| 36872 |
+
"grad_norm": 0.1142578125,
|
| 36873 |
+
"learning_rate": 0.1,
|
| 36874 |
+
"loss": 2.15850830078125,
|
| 36875 |
+
"step": 10526
|
| 36876 |
+
},
|
| 36877 |
+
{
|
| 36878 |
+
"epoch": 0.3342222222222222,
|
| 36879 |
+
"grad_norm": 0.10791015625,
|
| 36880 |
+
"learning_rate": 0.1,
|
| 36881 |
+
"loss": 2.205479621887207,
|
| 36882 |
+
"step": 10528
|
| 36883 |
+
},
|
| 36884 |
+
{
|
| 36885 |
+
"epoch": 0.3342857142857143,
|
| 36886 |
+
"grad_norm": 0.07275390625,
|
| 36887 |
+
"learning_rate": 0.1,
|
| 36888 |
+
"loss": 2.195462703704834,
|
| 36889 |
+
"step": 10530
|
| 36890 |
+
},
|
| 36891 |
+
{
|
| 36892 |
+
"epoch": 0.33434920634920634,
|
| 36893 |
+
"grad_norm": 0.09423828125,
|
| 36894 |
+
"learning_rate": 0.1,
|
| 36895 |
+
"loss": 2.1862833499908447,
|
| 36896 |
+
"step": 10532
|
| 36897 |
+
},
|
| 36898 |
+
{
|
| 36899 |
+
"epoch": 0.33441269841269844,
|
| 36900 |
+
"grad_norm": 0.1826171875,
|
| 36901 |
+
"learning_rate": 0.1,
|
| 36902 |
+
"loss": 2.1653358936309814,
|
| 36903 |
+
"step": 10534
|
| 36904 |
+
},
|
| 36905 |
+
{
|
| 36906 |
+
"epoch": 0.3344761904761905,
|
| 36907 |
+
"grad_norm": 0.1044921875,
|
| 36908 |
+
"learning_rate": 0.1,
|
| 36909 |
+
"loss": 2.187879800796509,
|
| 36910 |
+
"step": 10536
|
| 36911 |
+
},
|
| 36912 |
+
{
|
| 36913 |
+
"epoch": 0.3345396825396825,
|
| 36914 |
+
"grad_norm": 0.2392578125,
|
| 36915 |
+
"learning_rate": 0.1,
|
| 36916 |
+
"loss": 2.1676132678985596,
|
| 36917 |
+
"step": 10538
|
| 36918 |
+
},
|
| 36919 |
+
{
|
| 36920 |
+
"epoch": 0.3346031746031746,
|
| 36921 |
+
"grad_norm": 0.298828125,
|
| 36922 |
+
"learning_rate": 0.1,
|
| 36923 |
+
"loss": 2.172863006591797,
|
| 36924 |
+
"step": 10540
|
| 36925 |
+
},
|
| 36926 |
+
{
|
| 36927 |
+
"epoch": 0.33466666666666667,
|
| 36928 |
+
"grad_norm": 0.2041015625,
|
| 36929 |
+
"learning_rate": 0.1,
|
| 36930 |
+
"loss": 2.221343517303467,
|
| 36931 |
+
"step": 10542
|
| 36932 |
+
},
|
| 36933 |
+
{
|
| 36934 |
+
"epoch": 0.3347301587301587,
|
| 36935 |
+
"grad_norm": 0.05615234375,
|
| 36936 |
+
"learning_rate": 0.1,
|
| 36937 |
+
"loss": 2.168297052383423,
|
| 36938 |
+
"step": 10544
|
| 36939 |
+
},
|
| 36940 |
+
{
|
| 36941 |
+
"epoch": 0.3347936507936508,
|
| 36942 |
+
"grad_norm": 0.08837890625,
|
| 36943 |
+
"learning_rate": 0.1,
|
| 36944 |
+
"loss": 2.179994821548462,
|
| 36945 |
+
"step": 10546
|
| 36946 |
+
},
|
| 36947 |
+
{
|
| 36948 |
+
"epoch": 0.33485714285714285,
|
| 36949 |
+
"grad_norm": 0.107421875,
|
| 36950 |
+
"learning_rate": 0.1,
|
| 36951 |
+
"loss": 2.1946322917938232,
|
| 36952 |
+
"step": 10548
|
| 36953 |
+
},
|
| 36954 |
+
{
|
| 36955 |
+
"epoch": 0.3349206349206349,
|
| 36956 |
+
"grad_norm": 0.1630859375,
|
| 36957 |
+
"learning_rate": 0.1,
|
| 36958 |
+
"loss": 2.1987736225128174,
|
| 36959 |
+
"step": 10550
|
| 36960 |
+
},
|
| 36961 |
+
{
|
| 36962 |
+
"epoch": 0.334984126984127,
|
| 36963 |
+
"grad_norm": 0.1708984375,
|
| 36964 |
+
"learning_rate": 0.1,
|
| 36965 |
+
"loss": 2.186816453933716,
|
| 36966 |
+
"step": 10552
|
| 36967 |
+
},
|
| 36968 |
+
{
|
| 36969 |
+
"epoch": 0.33504761904761904,
|
| 36970 |
+
"grad_norm": 0.060546875,
|
| 36971 |
+
"learning_rate": 0.1,
|
| 36972 |
+
"loss": 2.2010762691497803,
|
| 36973 |
+
"step": 10554
|
| 36974 |
+
},
|
| 36975 |
+
{
|
| 36976 |
+
"epoch": 0.33511111111111114,
|
| 36977 |
+
"grad_norm": 0.09912109375,
|
| 36978 |
+
"learning_rate": 0.1,
|
| 36979 |
+
"loss": 2.20300030708313,
|
| 36980 |
+
"step": 10556
|
| 36981 |
+
},
|
| 36982 |
+
{
|
| 36983 |
+
"epoch": 0.3351746031746032,
|
| 36984 |
+
"grad_norm": 0.234375,
|
| 36985 |
+
"learning_rate": 0.1,
|
| 36986 |
+
"loss": 2.2044005393981934,
|
| 36987 |
+
"step": 10558
|
| 36988 |
+
},
|
| 36989 |
+
{
|
| 36990 |
+
"epoch": 0.3352380952380952,
|
| 36991 |
+
"grad_norm": 0.173828125,
|
| 36992 |
+
"learning_rate": 0.1,
|
| 36993 |
+
"loss": 2.177442789077759,
|
| 36994 |
+
"step": 10560
|
| 36995 |
+
},
|
| 36996 |
+
{
|
| 36997 |
+
"epoch": 0.3353015873015873,
|
| 36998 |
+
"grad_norm": 0.1875,
|
| 36999 |
+
"learning_rate": 0.1,
|
| 37000 |
+
"loss": 2.1997268199920654,
|
| 37001 |
+
"step": 10562
|
| 37002 |
+
},
|
| 37003 |
+
{
|
| 37004 |
+
"epoch": 0.33536507936507937,
|
| 37005 |
+
"grad_norm": 0.54296875,
"learning_rate": 0.1,
"loss": 2.177321434020996,
"step": 10564
},
{
"epoch": 0.3354285714285714,
"grad_norm": 0.09521484375,
"learning_rate": 0.1,
"loss": 2.1977639198303223,
"step": 10566
},
{
"epoch": 0.3354920634920635,
"grad_norm": 0.06689453125,
"learning_rate": 0.1,
"loss": 2.186274528503418,
"step": 10568
},
{
"epoch": 0.33555555555555555,
"grad_norm": 0.154296875,
"learning_rate": 0.1,
"loss": 2.178790807723999,
"step": 10570
},
{
"epoch": 0.3356190476190476,
"grad_norm": 0.11474609375,
"learning_rate": 0.1,
"loss": 2.1916494369506836,
"step": 10572
},
{
"epoch": 0.3356825396825397,
"grad_norm": 0.15625,
"learning_rate": 0.1,
"loss": 2.198308229446411,
"step": 10574
},
{
"epoch": 0.33574603174603174,
"grad_norm": 0.1748046875,
"learning_rate": 0.1,
"loss": 2.1767139434814453,
"step": 10576
},
{
"epoch": 0.33580952380952384,
"grad_norm": 0.2138671875,
"learning_rate": 0.1,
"loss": 2.189945936203003,
"step": 10578
},
{
"epoch": 0.3358730158730159,
"grad_norm": 0.15625,
"learning_rate": 0.1,
"loss": 2.1869962215423584,
"step": 10580
},
{
"epoch": 0.3359365079365079,
"grad_norm": 0.181640625,
"learning_rate": 0.1,
"loss": 2.1500842571258545,
"step": 10582
},
{
"epoch": 0.336,
"grad_norm": 0.10009765625,
"learning_rate": 0.1,
"loss": 2.1756904125213623,
"step": 10584
},
{
"epoch": 0.33606349206349206,
"grad_norm": 0.1083984375,
"learning_rate": 0.1,
"loss": 2.1680054664611816,
"step": 10586
},
{
"epoch": 0.3361269841269841,
"grad_norm": 0.216796875,
"learning_rate": 0.1,
"loss": 2.182795763015747,
"step": 10588
},
{
"epoch": 0.3361904761904762,
"grad_norm": 0.177734375,
"learning_rate": 0.1,
"loss": 2.187570571899414,
"step": 10590
},
{
"epoch": 0.33625396825396825,
"grad_norm": 0.216796875,
"learning_rate": 0.1,
"loss": 2.186627149581909,
"step": 10592
},
{
"epoch": 0.3363174603174603,
"grad_norm": 0.12890625,
"learning_rate": 0.1,
"loss": 2.1917238235473633,
"step": 10594
},
{
"epoch": 0.3363809523809524,
"grad_norm": 0.09130859375,
"learning_rate": 0.1,
"loss": 2.1571314334869385,
"step": 10596
},
{
"epoch": 0.33644444444444443,
"grad_norm": 0.062255859375,
"learning_rate": 0.1,
"loss": 2.2026073932647705,
"step": 10598
},
{
"epoch": 0.33650793650793653,
"grad_norm": 0.103515625,
"learning_rate": 0.1,
"loss": 2.2058682441711426,
"step": 10600
},
{
"epoch": 0.3365714285714286,
"grad_norm": 0.28125,
"learning_rate": 0.1,
"loss": 2.202815294265747,
"step": 10602
},
{
"epoch": 0.3366349206349206,
"grad_norm": 0.265625,
"learning_rate": 0.1,
"loss": 2.2016568183898926,
"step": 10604
},
{
"epoch": 0.3366984126984127,
"grad_norm": 0.11279296875,
"learning_rate": 0.1,
"loss": 2.1916162967681885,
"step": 10606
},
{
"epoch": 0.33676190476190476,
"grad_norm": 0.109375,
"learning_rate": 0.1,
"loss": 2.1857426166534424,
"step": 10608
},
{
"epoch": 0.3368253968253968,
"grad_norm": 0.08935546875,
"learning_rate": 0.1,
"loss": 2.2138404846191406,
"step": 10610
},
{
"epoch": 0.3368888888888889,
"grad_norm": 0.1904296875,
"learning_rate": 0.1,
"loss": 2.1968023777008057,
"step": 10612
},
{
"epoch": 0.33695238095238095,
"grad_norm": 0.1748046875,
"learning_rate": 0.1,
"loss": 2.193377733230591,
"step": 10614
},
{
"epoch": 0.337015873015873,
"grad_norm": 0.302734375,
"learning_rate": 0.1,
"loss": 2.1726179122924805,
"step": 10616
},
{
"epoch": 0.3370793650793651,
"grad_norm": 0.2099609375,
"learning_rate": 0.1,
"loss": 2.1705336570739746,
"step": 10618
},
{
"epoch": 0.33714285714285713,
"grad_norm": 0.205078125,
"learning_rate": 0.1,
"loss": 2.2049412727355957,
"step": 10620
},
{
"epoch": 0.33720634920634923,
"grad_norm": 0.279296875,
"learning_rate": 0.1,
"loss": 2.211937665939331,
"step": 10622
},
{
"epoch": 0.3372698412698413,
"grad_norm": 0.1591796875,
"learning_rate": 0.1,
"loss": 2.178178071975708,
"step": 10624
},
{
"epoch": 0.3373333333333333,
"grad_norm": 0.06005859375,
"learning_rate": 0.1,
"loss": 2.1817891597747803,
"step": 10626
},
{
"epoch": 0.3373968253968254,
"grad_norm": 0.07373046875,
"learning_rate": 0.1,
"loss": 2.1902942657470703,
"step": 10628
},
{
"epoch": 0.33746031746031746,
"grad_norm": 0.09716796875,
"learning_rate": 0.1,
"loss": 2.168306827545166,
"step": 10630
},
{
"epoch": 0.3375238095238095,
"grad_norm": 0.294921875,
"learning_rate": 0.1,
"loss": 2.1916491985321045,
"step": 10632
},
{
"epoch": 0.3375873015873016,
"grad_norm": 0.1162109375,
"learning_rate": 0.1,
"loss": 2.20019268989563,
"step": 10634
},
{
"epoch": 0.33765079365079365,
"grad_norm": 0.1015625,
"learning_rate": 0.1,
"loss": 2.1709628105163574,
"step": 10636
},
{
"epoch": 0.3377142857142857,
"grad_norm": 0.1484375,
"learning_rate": 0.1,
"loss": 2.164698839187622,
"step": 10638
},
{
"epoch": 0.3377777777777778,
"grad_norm": 0.12890625,
"learning_rate": 0.1,
"loss": 2.1735100746154785,
"step": 10640
},
{
"epoch": 0.33784126984126983,
"grad_norm": 0.2177734375,
"learning_rate": 0.1,
"loss": 2.1582412719726562,
"step": 10642
},
{
"epoch": 0.33790476190476193,
"grad_norm": 0.453125,
"learning_rate": 0.1,
"loss": 2.1964187622070312,
"step": 10644
},
{
"epoch": 0.337968253968254,
"grad_norm": 0.1533203125,
"learning_rate": 0.1,
"loss": 2.1713085174560547,
"step": 10646
},
{
"epoch": 0.338031746031746,
"grad_norm": 0.0986328125,
"learning_rate": 0.1,
"loss": 2.190800666809082,
"step": 10648
},
{
"epoch": 0.3380952380952381,
"grad_norm": 0.1044921875,
"learning_rate": 0.1,
"loss": 2.1849403381347656,
"step": 10650
},
{
"epoch": 0.33815873015873016,
"grad_norm": 0.1513671875,
"learning_rate": 0.1,
"loss": 2.1842076778411865,
"step": 10652
},
{
"epoch": 0.3382222222222222,
"grad_norm": 0.07861328125,
"learning_rate": 0.1,
"loss": 2.1835153102874756,
"step": 10654
},
{
"epoch": 0.3382857142857143,
"grad_norm": 0.1201171875,
"learning_rate": 0.1,
"loss": 2.184572696685791,
"step": 10656
},
{
"epoch": 0.33834920634920634,
"grad_norm": 0.205078125,
"learning_rate": 0.1,
"loss": 2.20729398727417,
"step": 10658
},
{
"epoch": 0.3384126984126984,
"grad_norm": 0.11328125,
"learning_rate": 0.1,
"loss": 2.148922920227051,
"step": 10660
},
{
"epoch": 0.3384761904761905,
"grad_norm": 0.0595703125,
"learning_rate": 0.1,
"loss": 2.197078227996826,
"step": 10662
},
{
"epoch": 0.33853968253968253,
"grad_norm": 0.1455078125,
"learning_rate": 0.1,
"loss": 2.1765565872192383,
"step": 10664
},
{
"epoch": 0.33860317460317463,
"grad_norm": 0.12109375,
"learning_rate": 0.1,
"loss": 2.160946846008301,
"step": 10666
},
{
"epoch": 0.33866666666666667,
"grad_norm": 0.138671875,
"learning_rate": 0.1,
"loss": 2.1720829010009766,
"step": 10668
},
{
"epoch": 0.3387301587301587,
"grad_norm": 0.28515625,
"learning_rate": 0.1,
"loss": 2.2134289741516113,
"step": 10670
},
{
"epoch": 0.3387936507936508,
"grad_norm": 0.2333984375,
"learning_rate": 0.1,
"loss": 2.1767289638519287,
"step": 10672
},
{
"epoch": 0.33885714285714286,
"grad_norm": 0.173828125,
"learning_rate": 0.1,
"loss": 2.1474976539611816,
"step": 10674
},
{
"epoch": 0.3389206349206349,
"grad_norm": 0.1083984375,
"learning_rate": 0.1,
"loss": 2.1698927879333496,
"step": 10676
},
{
"epoch": 0.338984126984127,
"grad_norm": 0.2421875,
"learning_rate": 0.1,
"loss": 2.170835494995117,
"step": 10678
},
{
"epoch": 0.33904761904761904,
"grad_norm": 0.193359375,
"learning_rate": 0.1,
"loss": 2.15285587310791,
"step": 10680
},
{
"epoch": 0.3391111111111111,
"grad_norm": 0.2080078125,
"learning_rate": 0.1,
"loss": 2.1628880500793457,
"step": 10682
},
{
"epoch": 0.3391746031746032,
"grad_norm": 0.130859375,
"learning_rate": 0.1,
"loss": 2.1494805812835693,
"step": 10684
},
{
"epoch": 0.3392380952380952,
"grad_norm": 0.052490234375,
"learning_rate": 0.1,
"loss": 2.1779205799102783,
"step": 10686
},
{
"epoch": 0.3393015873015873,
"grad_norm": 0.0966796875,
"learning_rate": 0.1,
"loss": 2.150136947631836,
"step": 10688
},
{
"epoch": 0.33936507936507937,
"grad_norm": 0.053466796875,
"learning_rate": 0.1,
"loss": 2.1458067893981934,
"step": 10690
},
{
"epoch": 0.3394285714285714,
"grad_norm": 0.322265625,
"learning_rate": 0.1,
"loss": 2.15260910987854,
"step": 10692
},
{
"epoch": 0.3394920634920635,
"grad_norm": 0.279296875,
"learning_rate": 0.1,
"loss": 2.1635806560516357,
"step": 10694
},
{
"epoch": 0.33955555555555555,
"grad_norm": 0.2890625,
"learning_rate": 0.1,
"loss": 2.1902191638946533,
"step": 10696
},
{
"epoch": 0.3396190476190476,
"grad_norm": 0.1259765625,
"learning_rate": 0.1,
"loss": 2.171922206878662,
"step": 10698
},
{
"epoch": 0.3396825396825397,
"grad_norm": 0.09326171875,
"learning_rate": 0.1,
"loss": 2.1400513648986816,
"step": 10700
},
{
"epoch": 0.33974603174603174,
"grad_norm": 0.1181640625,
"learning_rate": 0.1,
"loss": 2.1618614196777344,
"step": 10702
},
{
"epoch": 0.3398095238095238,
"grad_norm": 0.17578125,
"learning_rate": 0.1,
"loss": 2.1420485973358154,
"step": 10704
},
{
"epoch": 0.3398730158730159,
"grad_norm": 0.1025390625,
"learning_rate": 0.1,
"loss": 2.181180953979492,
"step": 10706
},
{
"epoch": 0.3399365079365079,
"grad_norm": 0.1064453125,
"learning_rate": 0.1,
"loss": 2.175330877304077,
"step": 10708
},
{
"epoch": 0.34,
"grad_norm": 0.138671875,
"learning_rate": 0.1,
"loss": 2.1646363735198975,
"step": 10710
}
],
"logging_steps": 2,
…
"attributes": {}
}
},
"total_flos": 3.5470475061314245e+19,
"train_batch_size": 4,
"trial_name": null,
"trial_params": null