Training in progress, step 11340, checkpoint
- last-checkpoint/trainer_state.json +1109 -3
last-checkpoint/trainer_state.json
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.
|
| 6 |
"eval_steps": 3150,
|
| 7 |
-
"global_step":
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
@@ -38616,6 +38616,1112 @@
|
|
| 38616 |
"learning_rate": 0.1,
|
| 38617 |
"loss": 2.3982603549957275,
|
| 38618 |
"step": 11024
|
| 38619 |
}
|
| 38620 |
],
|
| 38621 |
"logging_steps": 2,
|
|
@@ -38635,7 +39741,7 @@
|
|
| 38635 |
"attributes": {}
|
| 38636 |
}
|
| 38637 |
},
|
| 38638 |
-
"total_flos": 3.
|
| 38639 |
"train_batch_size": 4,
|
| 38640 |
"trial_name": null,
|
| 38641 |
"trial_params": null
|
|
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 0.36,
|
| 6 |
"eval_steps": 3150,
|
| 7 |
+
"global_step": 11340,
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
|
|
| 38616 |
"learning_rate": 0.1,
|
| 38617 |
"loss": 2.3982603549957275,
|
| 38618 |
"step": 11024
|
| 38619 |
+
},
|
| 38620 |
+
{
|
| 38621 |
+
"epoch": 0.350031746031746,
|
| 38622 |
+
"grad_norm": 0.11767578125,
|
| 38623 |
+
"learning_rate": 0.1,
|
| 38624 |
+
"loss": 2.417543888092041,
|
| 38625 |
+
"step": 11026
|
| 38626 |
+
},
|
| 38627 |
+
{
|
| 38628 |
+
"epoch": 0.35009523809523807,
|
| 38629 |
+
"grad_norm": 0.4921875,
|
| 38630 |
+
"learning_rate": 0.1,
|
| 38631 |
+
"loss": 2.4204814434051514,
|
| 38632 |
+
"step": 11028
|
| 38633 |
+
},
|
| 38634 |
+
{
|
| 38635 |
+
"epoch": 0.35015873015873017,
|
| 38636 |
+
"grad_norm": 0.2470703125,
|
| 38637 |
+
"learning_rate": 0.1,
|
| 38638 |
+
"loss": 2.423769235610962,
|
| 38639 |
+
"step": 11030
|
| 38640 |
+
},
|
| 38641 |
+
{
|
| 38642 |
+
"epoch": 0.3502222222222222,
|
| 38643 |
+
"grad_norm": 0.26171875,
|
| 38644 |
+
"learning_rate": 0.1,
|
| 38645 |
+
"loss": 2.403280019760132,
|
| 38646 |
+
"step": 11032
|
| 38647 |
+
},
|
| 38648 |
+
{
|
| 38649 |
+
"epoch": 0.3502857142857143,
|
| 38650 |
+
"grad_norm": 0.12451171875,
|
| 38651 |
+
"learning_rate": 0.1,
|
| 38652 |
+
"loss": 2.391350746154785,
|
| 38653 |
+
"step": 11034
|
| 38654 |
+
},
|
| 38655 |
+
{
|
| 38656 |
+
"epoch": 0.35034920634920635,
|
| 38657 |
+
"grad_norm": 0.166015625,
|
| 38658 |
+
"learning_rate": 0.1,
|
| 38659 |
+
"loss": 2.422663927078247,
|
| 38660 |
+
"step": 11036
|
| 38661 |
+
},
|
| 38662 |
+
{
|
| 38663 |
+
"epoch": 0.3504126984126984,
|
| 38664 |
+
"grad_norm": 0.4140625,
|
| 38665 |
+
"learning_rate": 0.1,
|
| 38666 |
+
"loss": 2.4171299934387207,
|
| 38667 |
+
"step": 11038
|
| 38668 |
+
},
|
| 38669 |
+
{
|
| 38670 |
+
"epoch": 0.3504761904761905,
|
| 38671 |
+
"grad_norm": 0.287109375,
|
| 38672 |
+
"learning_rate": 0.1,
|
| 38673 |
+
"loss": 2.4181694984436035,
|
| 38674 |
+
"step": 11040
|
| 38675 |
+
},
|
| 38676 |
+
{
|
| 38677 |
+
"epoch": 0.35053968253968254,
|
| 38678 |
+
"grad_norm": 0.181640625,
|
| 38679 |
+
"learning_rate": 0.1,
|
| 38680 |
+
"loss": 2.4219069480895996,
|
| 38681 |
+
"step": 11042
|
| 38682 |
+
},
|
| 38683 |
+
{
|
| 38684 |
+
"epoch": 0.3506031746031746,
|
| 38685 |
+
"grad_norm": 0.05517578125,
|
| 38686 |
+
"learning_rate": 0.1,
|
| 38687 |
+
"loss": 2.4353079795837402,
|
| 38688 |
+
"step": 11044
|
| 38689 |
+
},
|
| 38690 |
+
{
|
| 38691 |
+
"epoch": 0.3506666666666667,
|
| 38692 |
+
"grad_norm": 0.1513671875,
|
| 38693 |
+
"learning_rate": 0.1,
|
| 38694 |
+
"loss": 2.4211533069610596,
|
| 38695 |
+
"step": 11046
|
| 38696 |
+
},
|
| 38697 |
+
{
|
| 38698 |
+
"epoch": 0.3507301587301587,
|
| 38699 |
+
"grad_norm": 0.21484375,
|
| 38700 |
+
"learning_rate": 0.1,
|
| 38701 |
+
"loss": 2.4109206199645996,
|
| 38702 |
+
"step": 11048
|
| 38703 |
+
},
|
| 38704 |
+
{
|
| 38705 |
+
"epoch": 0.35079365079365077,
|
| 38706 |
+
"grad_norm": 0.234375,
|
| 38707 |
+
"learning_rate": 0.1,
|
| 38708 |
+
"loss": 2.42941951751709,
|
| 38709 |
+
"step": 11050
|
| 38710 |
+
},
|
| 38711 |
+
{
|
| 38712 |
+
"epoch": 0.35085714285714287,
|
| 38713 |
+
"grad_norm": 0.1474609375,
|
| 38714 |
+
"learning_rate": 0.1,
|
| 38715 |
+
"loss": 2.3879783153533936,
|
| 38716 |
+
"step": 11052
|
| 38717 |
+
},
|
| 38718 |
+
{
|
| 38719 |
+
"epoch": 0.3509206349206349,
|
| 38720 |
+
"grad_norm": 0.359375,
|
| 38721 |
+
"learning_rate": 0.1,
|
| 38722 |
+
"loss": 2.4356982707977295,
|
| 38723 |
+
"step": 11054
|
| 38724 |
+
},
|
| 38725 |
+
{
|
| 38726 |
+
"epoch": 0.350984126984127,
|
| 38727 |
+
"grad_norm": 0.212890625,
|
| 38728 |
+
"learning_rate": 0.1,
|
| 38729 |
+
"loss": 2.4666500091552734,
|
| 38730 |
+
"step": 11056
|
| 38731 |
+
},
|
| 38732 |
+
{
|
| 38733 |
+
"epoch": 0.35104761904761905,
|
| 38734 |
+
"grad_norm": 0.072265625,
|
| 38735 |
+
"learning_rate": 0.1,
|
| 38736 |
+
"loss": 2.4236481189727783,
|
| 38737 |
+
"step": 11058
|
| 38738 |
+
},
|
| 38739 |
+
{
|
| 38740 |
+
"epoch": 0.3511111111111111,
|
| 38741 |
+
"grad_norm": 0.2421875,
|
| 38742 |
+
"learning_rate": 0.1,
|
| 38743 |
+
"loss": 2.4124252796173096,
|
| 38744 |
+
"step": 11060
|
| 38745 |
+
},
|
| 38746 |
+
{
|
| 38747 |
+
"epoch": 0.3511746031746032,
|
| 38748 |
+
"grad_norm": 0.314453125,
|
| 38749 |
+
"learning_rate": 0.1,
|
| 38750 |
+
"loss": 2.432394027709961,
|
| 38751 |
+
"step": 11062
|
| 38752 |
+
},
|
| 38753 |
+
{
|
| 38754 |
+
"epoch": 0.35123809523809524,
|
| 38755 |
+
"grad_norm": 0.14453125,
|
| 38756 |
+
"learning_rate": 0.1,
|
| 38757 |
+
"loss": 2.4237029552459717,
|
| 38758 |
+
"step": 11064
|
| 38759 |
+
},
|
| 38760 |
+
{
|
| 38761 |
+
"epoch": 0.3513015873015873,
|
| 38762 |
+
"grad_norm": 0.1044921875,
|
| 38763 |
+
"learning_rate": 0.1,
|
| 38764 |
+
"loss": 2.424628496170044,
|
| 38765 |
+
"step": 11066
|
| 38766 |
+
},
|
| 38767 |
+
{
|
| 38768 |
+
"epoch": 0.3513650793650794,
|
| 38769 |
+
"grad_norm": 0.310546875,
|
| 38770 |
+
"learning_rate": 0.1,
|
| 38771 |
+
"loss": 2.4362010955810547,
|
| 38772 |
+
"step": 11068
|
| 38773 |
+
},
|
| 38774 |
+
{
|
| 38775 |
+
"epoch": 0.3514285714285714,
|
| 38776 |
+
"grad_norm": 0.203125,
|
| 38777 |
+
"learning_rate": 0.1,
|
| 38778 |
+
"loss": 2.437283992767334,
|
| 38779 |
+
"step": 11070
|
| 38780 |
+
},
|
| 38781 |
+
{
|
| 38782 |
+
"epoch": 0.35149206349206347,
|
| 38783 |
+
"grad_norm": 0.107421875,
|
| 38784 |
+
"learning_rate": 0.1,
|
| 38785 |
+
"loss": 2.446931838989258,
|
| 38786 |
+
"step": 11072
|
| 38787 |
+
},
|
| 38788 |
+
{
|
| 38789 |
+
"epoch": 0.35155555555555557,
|
| 38790 |
+
"grad_norm": 0.1591796875,
|
| 38791 |
+
"learning_rate": 0.1,
|
| 38792 |
+
"loss": 2.439727783203125,
|
| 38793 |
+
"step": 11074
|
| 38794 |
+
},
|
| 38795 |
+
{
|
| 38796 |
+
"epoch": 0.3516190476190476,
|
| 38797 |
+
"grad_norm": 0.11474609375,
|
| 38798 |
+
"learning_rate": 0.1,
|
| 38799 |
+
"loss": 2.451850175857544,
|
| 38800 |
+
"step": 11076
|
| 38801 |
+
},
|
| 38802 |
+
{
|
| 38803 |
+
"epoch": 0.3516825396825397,
|
| 38804 |
+
"grad_norm": 0.185546875,
|
| 38805 |
+
"learning_rate": 0.1,
|
| 38806 |
+
"loss": 2.431602954864502,
|
| 38807 |
+
"step": 11078
|
| 38808 |
+
},
|
| 38809 |
+
{
|
| 38810 |
+
"epoch": 0.35174603174603175,
|
| 38811 |
+
"grad_norm": 0.33984375,
|
| 38812 |
+
"learning_rate": 0.1,
|
| 38813 |
+
"loss": 2.4908480644226074,
|
| 38814 |
+
"step": 11080
|
| 38815 |
+
},
|
| 38816 |
+
{
|
| 38817 |
+
"epoch": 0.3518095238095238,
|
| 38818 |
+
"grad_norm": 0.2431640625,
|
| 38819 |
+
"learning_rate": 0.1,
|
| 38820 |
+
"loss": 2.444524049758911,
|
| 38821 |
+
"step": 11082
|
| 38822 |
+
},
|
| 38823 |
+
{
|
| 38824 |
+
"epoch": 0.3518730158730159,
|
| 38825 |
+
"grad_norm": 0.244140625,
|
| 38826 |
+
"learning_rate": 0.1,
|
| 38827 |
+
"loss": 2.432138204574585,
|
| 38828 |
+
"step": 11084
|
| 38829 |
+
},
|
| 38830 |
+
{
|
| 38831 |
+
"epoch": 0.35193650793650794,
|
| 38832 |
+
"grad_norm": 0.310546875,
|
| 38833 |
+
"learning_rate": 0.1,
|
| 38834 |
+
"loss": 2.4384355545043945,
|
| 38835 |
+
"step": 11086
|
| 38836 |
+
},
|
| 38837 |
+
{
|
| 38838 |
+
"epoch": 0.352,
|
| 38839 |
+
"grad_norm": 0.224609375,
|
| 38840 |
+
"learning_rate": 0.1,
|
| 38841 |
+
"loss": 2.4333276748657227,
|
| 38842 |
+
"step": 11088
|
| 38843 |
+
},
|
| 38844 |
+
{
|
| 38845 |
+
"epoch": 0.3520634920634921,
|
| 38846 |
+
"grad_norm": 0.11474609375,
|
| 38847 |
+
"learning_rate": 0.1,
|
| 38848 |
+
"loss": 2.467416524887085,
|
| 38849 |
+
"step": 11090
|
| 38850 |
+
},
|
| 38851 |
+
{
|
| 38852 |
+
"epoch": 0.3521269841269841,
|
| 38853 |
+
"grad_norm": 0.1923828125,
|
| 38854 |
+
"learning_rate": 0.1,
|
| 38855 |
+
"loss": 2.4466445446014404,
|
| 38856 |
+
"step": 11092
|
| 38857 |
+
},
|
| 38858 |
+
{
|
| 38859 |
+
"epoch": 0.35219047619047616,
|
| 38860 |
+
"grad_norm": 0.37890625,
|
| 38861 |
+
"learning_rate": 0.1,
|
| 38862 |
+
"loss": 2.461064338684082,
|
| 38863 |
+
"step": 11094
|
| 38864 |
+
},
|
| 38865 |
+
{
|
| 38866 |
+
"epoch": 0.35225396825396826,
|
| 38867 |
+
"grad_norm": 0.291015625,
|
| 38868 |
+
"learning_rate": 0.1,
|
| 38869 |
+
"loss": 2.464916229248047,
|
| 38870 |
+
"step": 11096
|
| 38871 |
+
},
|
| 38872 |
+
{
|
| 38873 |
+
"epoch": 0.3523174603174603,
|
| 38874 |
+
"grad_norm": 0.0771484375,
|
| 38875 |
+
"learning_rate": 0.1,
|
| 38876 |
+
"loss": 2.4417896270751953,
|
| 38877 |
+
"step": 11098
|
| 38878 |
+
},
|
| 38879 |
+
{
|
| 38880 |
+
"epoch": 0.3523809523809524,
|
| 38881 |
+
"grad_norm": 0.1767578125,
|
| 38882 |
+
"learning_rate": 0.1,
|
| 38883 |
+
"loss": 2.4504122734069824,
|
| 38884 |
+
"step": 11100
|
| 38885 |
+
},
|
| 38886 |
+
{
|
| 38887 |
+
"epoch": 0.35244444444444445,
|
| 38888 |
+
"grad_norm": 0.25390625,
|
| 38889 |
+
"learning_rate": 0.1,
|
| 38890 |
+
"loss": 2.46895694732666,
|
| 38891 |
+
"step": 11102
|
| 38892 |
+
},
|
| 38893 |
+
{
|
| 38894 |
+
"epoch": 0.3525079365079365,
|
| 38895 |
+
"grad_norm": 0.125,
|
| 38896 |
+
"learning_rate": 0.1,
|
| 38897 |
+
"loss": 2.4499869346618652,
|
| 38898 |
+
"step": 11104
|
| 38899 |
+
},
|
| 38900 |
+
{
|
| 38901 |
+
"epoch": 0.3525714285714286,
|
| 38902 |
+
"grad_norm": 0.046875,
|
| 38903 |
+
"learning_rate": 0.1,
|
| 38904 |
+
"loss": 2.463418483734131,
|
| 38905 |
+
"step": 11106
|
| 38906 |
+
},
|
| 38907 |
+
{
|
| 38908 |
+
"epoch": 0.35263492063492063,
|
| 38909 |
+
"grad_norm": 0.091796875,
|
| 38910 |
+
"learning_rate": 0.1,
|
| 38911 |
+
"loss": 2.453564405441284,
|
| 38912 |
+
"step": 11108
|
| 38913 |
+
},
|
| 38914 |
+
{
|
| 38915 |
+
"epoch": 0.3526984126984127,
|
| 38916 |
+
"grad_norm": 0.1982421875,
|
| 38917 |
+
"learning_rate": 0.1,
|
| 38918 |
+
"loss": 2.4937078952789307,
|
| 38919 |
+
"step": 11110
|
| 38920 |
+
},
|
| 38921 |
+
{
|
| 38922 |
+
"epoch": 0.3527619047619048,
|
| 38923 |
+
"grad_norm": 0.486328125,
|
| 38924 |
+
"learning_rate": 0.1,
|
| 38925 |
+
"loss": 2.4809000492095947,
|
| 38926 |
+
"step": 11112
|
| 38927 |
+
},
|
| 38928 |
+
{
|
| 38929 |
+
"epoch": 0.3528253968253968,
|
| 38930 |
+
"grad_norm": 0.30078125,
|
| 38931 |
+
"learning_rate": 0.1,
|
| 38932 |
+
"loss": 2.503629446029663,
|
| 38933 |
+
"step": 11114
|
| 38934 |
+
},
|
| 38935 |
+
{
|
| 38936 |
+
"epoch": 0.35288888888888886,
|
| 38937 |
+
"grad_norm": 0.12451171875,
|
| 38938 |
+
"learning_rate": 0.1,
|
| 38939 |
+
"loss": 2.464132070541382,
|
| 38940 |
+
"step": 11116
|
| 38941 |
+
},
|
| 38942 |
+
{
|
| 38943 |
+
"epoch": 0.35295238095238096,
|
| 38944 |
+
"grad_norm": 0.341796875,
|
| 38945 |
+
"learning_rate": 0.1,
|
| 38946 |
+
"loss": 2.5101478099823,
|
| 38947 |
+
"step": 11118
|
| 38948 |
+
},
|
| 38949 |
+
{
|
| 38950 |
+
"epoch": 0.353015873015873,
|
| 38951 |
+
"grad_norm": 0.328125,
|
| 38952 |
+
"learning_rate": 0.1,
|
| 38953 |
+
"loss": 2.492459297180176,
|
| 38954 |
+
"step": 11120
|
| 38955 |
+
},
|
| 38956 |
+
{
|
| 38957 |
+
"epoch": 0.3530793650793651,
|
| 38958 |
+
"grad_norm": 0.07666015625,
|
| 38959 |
+
"learning_rate": 0.1,
|
| 38960 |
+
"loss": 2.4816741943359375,
|
| 38961 |
+
"step": 11122
|
| 38962 |
+
},
|
| 38963 |
+
{
|
| 38964 |
+
"epoch": 0.35314285714285715,
|
| 38965 |
+
"grad_norm": 0.1513671875,
|
| 38966 |
+
"learning_rate": 0.1,
|
| 38967 |
+
"loss": 2.4845693111419678,
|
| 38968 |
+
"step": 11124
|
| 38969 |
+
},
|
| 38970 |
+
{
|
| 38971 |
+
"epoch": 0.3532063492063492,
|
| 38972 |
+
"grad_norm": 0.11376953125,
|
| 38973 |
+
"learning_rate": 0.1,
|
| 38974 |
+
"loss": 2.47183895111084,
|
| 38975 |
+
"step": 11126
|
| 38976 |
+
},
|
| 38977 |
+
{
|
| 38978 |
+
"epoch": 0.3532698412698413,
|
| 38979 |
+
"grad_norm": 0.1982421875,
|
| 38980 |
+
"learning_rate": 0.1,
|
| 38981 |
+
"loss": 2.5153868198394775,
|
| 38982 |
+
"step": 11128
|
| 38983 |
+
},
|
| 38984 |
+
{
|
| 38985 |
+
"epoch": 0.35333333333333333,
|
| 38986 |
+
"grad_norm": 0.1748046875,
|
| 38987 |
+
"learning_rate": 0.1,
|
| 38988 |
+
"loss": 2.5130763053894043,
|
| 38989 |
+
"step": 11130
|
| 38990 |
+
},
|
| 38991 |
+
{
|
| 38992 |
+
"epoch": 0.3533968253968254,
|
| 38993 |
+
"grad_norm": 0.1484375,
|
| 38994 |
+
"learning_rate": 0.1,
|
| 38995 |
+
"loss": 2.487558126449585,
|
| 38996 |
+
"step": 11132
|
| 38997 |
+
},
|
| 38998 |
+
{
|
| 38999 |
+
"epoch": 0.3534603174603175,
|
| 39000 |
+
"grad_norm": 0.1572265625,
|
| 39001 |
+
"learning_rate": 0.1,
|
| 39002 |
+
"loss": 2.501115322113037,
|
| 39003 |
+
"step": 11134
|
| 39004 |
+
},
|
| 39005 |
+
{
|
| 39006 |
+
"epoch": 0.3535238095238095,
|
| 39007 |
+
"grad_norm": 0.08154296875,
|
| 39008 |
+
"learning_rate": 0.1,
|
| 39009 |
+
"loss": 2.516602039337158,
|
| 39010 |
+
"step": 11136
|
| 39011 |
+
},
|
| 39012 |
+
{
|
| 39013 |
+
"epoch": 0.35358730158730156,
|
| 39014 |
+
"grad_norm": 0.341796875,
|
| 39015 |
+
"learning_rate": 0.1,
|
| 39016 |
+
"loss": 2.49763560295105,
|
| 39017 |
+
"step": 11138
|
| 39018 |
+
},
|
| 39019 |
+
{
|
| 39020 |
+
"epoch": 0.35365079365079366,
|
| 39021 |
+
"grad_norm": 0.482421875,
|
| 39022 |
+
"learning_rate": 0.1,
|
| 39023 |
+
"loss": 2.4931797981262207,
|
| 39024 |
+
"step": 11140
|
| 39025 |
+
},
|
| 39026 |
+
{
|
| 39027 |
+
"epoch": 0.3537142857142857,
|
| 39028 |
+
"grad_norm": 0.26953125,
|
| 39029 |
+
"learning_rate": 0.1,
|
| 39030 |
+
"loss": 2.5045690536499023,
|
| 39031 |
+
"step": 11142
|
| 39032 |
+
},
|
| 39033 |
+
{
|
| 39034 |
+
"epoch": 0.3537777777777778,
|
| 39035 |
+
"grad_norm": 0.318359375,
|
| 39036 |
+
"learning_rate": 0.1,
|
| 39037 |
+
"loss": 2.4567301273345947,
|
| 39038 |
+
"step": 11144
|
| 39039 |
+
},
|
| 39040 |
+
{
|
| 39041 |
+
"epoch": 0.35384126984126985,
|
| 39042 |
+
"grad_norm": 0.4375,
|
| 39043 |
+
"learning_rate": 0.1,
|
| 39044 |
+
"loss": 2.4942221641540527,
|
| 39045 |
+
"step": 11146
|
| 39046 |
+
},
|
| 39047 |
+
{
|
| 39048 |
+
"epoch": 0.3539047619047619,
|
| 39049 |
+
"grad_norm": 0.263671875,
|
| 39050 |
+
"learning_rate": 0.1,
|
| 39051 |
+
"loss": 2.494699001312256,
|
| 39052 |
+
"step": 11148
|
| 39053 |
+
},
|
| 39054 |
+
{
|
| 39055 |
+
"epoch": 0.353968253968254,
|
| 39056 |
+
"grad_norm": 0.185546875,
|
| 39057 |
+
"learning_rate": 0.1,
|
| 39058 |
+
"loss": 2.503068685531616,
|
| 39059 |
+
"step": 11150
|
| 39060 |
+
},
|
| 39061 |
+
{
|
| 39062 |
+
"epoch": 0.35403174603174603,
|
| 39063 |
+
"grad_norm": 0.154296875,
|
| 39064 |
+
"learning_rate": 0.1,
|
| 39065 |
+
"loss": 2.525214672088623,
|
| 39066 |
+
"step": 11152
|
| 39067 |
+
},
|
| 39068 |
+
{
|
| 39069 |
+
"epoch": 0.3540952380952381,
|
| 39070 |
+
"grad_norm": 0.087890625,
|
| 39071 |
+
"learning_rate": 0.1,
|
| 39072 |
+
"loss": 2.475733518600464,
|
| 39073 |
+
"step": 11154
|
| 39074 |
+
},
|
| 39075 |
+
{
|
| 39076 |
+
"epoch": 0.3541587301587302,
|
| 39077 |
+
"grad_norm": 0.201171875,
|
| 39078 |
+
"learning_rate": 0.1,
|
| 39079 |
+
"loss": 2.518965005874634,
|
| 39080 |
+
"step": 11156
|
| 39081 |
+
},
|
| 39082 |
+
{
|
| 39083 |
+
"epoch": 0.3542222222222222,
|
| 39084 |
+
"grad_norm": 0.162109375,
|
| 39085 |
+
"learning_rate": 0.1,
|
| 39086 |
+
"loss": 2.4847307205200195,
|
| 39087 |
+
"step": 11158
|
| 39088 |
+
},
|
| 39089 |
+
{
|
| 39090 |
+
"epoch": 0.35428571428571426,
|
| 39091 |
+
"grad_norm": 0.126953125,
|
| 39092 |
+
"learning_rate": 0.1,
|
| 39093 |
+
"loss": 2.5241191387176514,
|
| 39094 |
+
"step": 11160
|
| 39095 |
+
},
|
| 39096 |
+
{
|
| 39097 |
+
"epoch": 0.35434920634920636,
|
| 39098 |
+
"grad_norm": 0.0673828125,
|
| 39099 |
+
"learning_rate": 0.1,
|
| 39100 |
+
"loss": 2.5268568992614746,
|
| 39101 |
+
"step": 11162
|
| 39102 |
+
},
|
| 39103 |
+
{
|
| 39104 |
+
"epoch": 0.3544126984126984,
|
| 39105 |
+
"grad_norm": 0.18359375,
|
| 39106 |
+
"learning_rate": 0.1,
|
| 39107 |
+
"loss": 2.489588975906372,
|
| 39108 |
+
"step": 11164
|
| 39109 |
+
},
|
| 39110 |
+
{
|
| 39111 |
+
"epoch": 0.3544761904761905,
|
| 39112 |
+
"grad_norm": 0.19921875,
|
| 39113 |
+
"learning_rate": 0.1,
|
| 39114 |
+
"loss": 2.517854928970337,
|
| 39115 |
+
"step": 11166
|
| 39116 |
+
},
|
| 39117 |
+
{
|
| 39118 |
+
"epoch": 0.35453968253968254,
|
| 39119 |
+
"grad_norm": 0.1796875,
|
| 39120 |
+
"learning_rate": 0.1,
|
| 39121 |
+
"loss": 2.5115416049957275,
|
| 39122 |
+
"step": 11168
|
| 39123 |
+
},
|
| 39124 |
+
{
|
| 39125 |
+
"epoch": 0.3546031746031746,
|
| 39126 |
+
"grad_norm": 0.126953125,
|
| 39127 |
+
"learning_rate": 0.1,
|
| 39128 |
+
"loss": 2.5270791053771973,
|
| 39129 |
+
"step": 11170
|
| 39130 |
+
},
|
| 39131 |
+
{
|
| 39132 |
+
"epoch": 0.3546666666666667,
|
| 39133 |
+
"grad_norm": 0.189453125,
|
| 39134 |
+
"learning_rate": 0.1,
|
| 39135 |
+
"loss": 2.49017596244812,
|
| 39136 |
+
"step": 11172
|
| 39137 |
+
},
|
| 39138 |
+
{
|
| 39139 |
+
"epoch": 0.35473015873015873,
|
| 39140 |
+
"grad_norm": 0.07080078125,
|
| 39141 |
+
"learning_rate": 0.1,
|
| 39142 |
+
"loss": 2.486820697784424,
|
| 39143 |
+
"step": 11174
|
| 39144 |
+
},
|
| 39145 |
+
{
|
| 39146 |
+
"epoch": 0.35479365079365077,
|
| 39147 |
+
"grad_norm": 0.181640625,
|
| 39148 |
+
"learning_rate": 0.1,
|
| 39149 |
+
"loss": 2.4675469398498535,
|
| 39150 |
+
"step": 11176
|
| 39151 |
+
},
|
| 39152 |
+
{
|
| 39153 |
+
"epoch": 0.35485714285714287,
|
| 39154 |
+
"grad_norm": 0.5625,
|
| 39155 |
+
"learning_rate": 0.1,
|
| 39156 |
+
"loss": 2.5254688262939453,
|
| 39157 |
+
"step": 11178
|
| 39158 |
+
},
|
| 39159 |
+
{
|
| 39160 |
+
"epoch": 0.3549206349206349,
|
| 39161 |
+
"grad_norm": 0.2119140625,
|
| 39162 |
+
"learning_rate": 0.1,
|
| 39163 |
+
"loss": 2.49434494972229,
|
| 39164 |
+
"step": 11180
|
| 39165 |
+
},
|
| 39166 |
+
{
|
| 39167 |
+
"epoch": 0.35498412698412696,
|
| 39168 |
+
"grad_norm": 0.142578125,
|
| 39169 |
+
"learning_rate": 0.1,
|
| 39170 |
+
"loss": 2.473165273666382,
|
| 39171 |
+
"step": 11182
|
| 39172 |
+
},
|
| 39173 |
+
{
|
| 39174 |
+
"epoch": 0.35504761904761906,
|
| 39175 |
+
"grad_norm": 0.3125,
|
| 39176 |
+
"learning_rate": 0.1,
|
| 39177 |
+
"loss": 2.496553421020508,
|
| 39178 |
+
"step": 11184
|
| 39179 |
+
},
|
| 39180 |
+
{
|
| 39181 |
+
"epoch": 0.3551111111111111,
|
| 39182 |
+
"grad_norm": 0.609375,
|
| 39183 |
+
"learning_rate": 0.1,
|
| 39184 |
+
"loss": 2.5094237327575684,
|
| 39185 |
+
"step": 11186
|
| 39186 |
+
},
|
| 39187 |
+
{
|
| 39188 |
+
"epoch": 0.3551746031746032,
|
| 39189 |
+
"grad_norm": 0.154296875,
|
| 39190 |
+
"learning_rate": 0.1,
|
| 39191 |
+
"loss": 2.4690942764282227,
|
| 39192 |
+
"step": 11188
|
| 39193 |
+
},
|
| 39194 |
+
{
|
| 39195 |
+
"epoch": 0.35523809523809524,
|
| 39196 |
+
"grad_norm": 0.1591796875,
|
| 39197 |
+
"learning_rate": 0.1,
|
| 39198 |
+
"loss": 2.4825117588043213,
|
| 39199 |
+
"step": 11190
|
| 39200 |
+
},
|
| 39201 |
+
{
|
| 39202 |
+
"epoch": 0.3553015873015873,
|
| 39203 |
+
"grad_norm": 0.259765625,
|
| 39204 |
+
"learning_rate": 0.1,
|
| 39205 |
+
"loss": 2.530270576477051,
|
| 39206 |
+
"step": 11192
|
| 39207 |
+
},
|
| 39208 |
+
{
|
| 39209 |
+
"epoch": 0.3553650793650794,
|
| 39210 |
+
"grad_norm": 0.0947265625,
|
| 39211 |
+
"learning_rate": 0.1,
|
| 39212 |
+
+      "loss": 2.500499725341797,
+      "step": 11194
+    },
+    {
+      "epoch": 0.3554285714285714,
+      "grad_norm": 0.0849609375,
+      "learning_rate": 0.1,
+      "loss": 2.483163595199585,
+      "step": 11196
+    },
+    {
+      "epoch": 0.35549206349206347,
+      "grad_norm": 0.09130859375,
+      "learning_rate": 0.1,
+      "loss": 2.5392792224884033,
+      "step": 11198
+    },
+    {
+      "epoch": 0.35555555555555557,
+      "grad_norm": 0.0751953125,
+      "learning_rate": 0.1,
+      "loss": 2.4986934661865234,
+      "step": 11200
+    },
+    {
+      "epoch": 0.3556190476190476,
+      "grad_norm": 0.04931640625,
+      "learning_rate": 0.1,
+      "loss": 2.480569362640381,
+      "step": 11202
+    },
+    {
+      "epoch": 0.35568253968253966,
+      "grad_norm": 0.06494140625,
+      "learning_rate": 0.1,
+      "loss": 2.4624855518341064,
+      "step": 11204
+    },
+    {
+      "epoch": 0.35574603174603175,
+      "grad_norm": 0.197265625,
+      "learning_rate": 0.1,
+      "loss": 2.493828296661377,
+      "step": 11206
+    },
+    {
+      "epoch": 0.3558095238095238,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.1,
+      "loss": 2.491539239883423,
+      "step": 11208
+    },
+    {
+      "epoch": 0.3558730158730159,
+      "grad_norm": 0.0478515625,
+      "learning_rate": 0.1,
+      "loss": 2.497725486755371,
+      "step": 11210
+    },
+    {
+      "epoch": 0.35593650793650794,
+      "grad_norm": 0.1474609375,
+      "learning_rate": 0.1,
+      "loss": 2.4691996574401855,
+      "step": 11212
+    },
+    {
+      "epoch": 0.356,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.1,
+      "loss": 2.4893603324890137,
+      "step": 11214
+    },
+    {
+      "epoch": 0.3560634920634921,
+      "grad_norm": 0.2734375,
+      "learning_rate": 0.1,
+      "loss": 2.4978229999542236,
+      "step": 11216
+    },
+    {
+      "epoch": 0.3561269841269841,
+      "grad_norm": 0.053466796875,
+      "learning_rate": 0.1,
+      "loss": 2.4881231784820557,
+      "step": 11218
+    },
+    {
+      "epoch": 0.35619047619047617,
+      "grad_norm": 0.10595703125,
+      "learning_rate": 0.1,
+      "loss": 2.46063232421875,
+      "step": 11220
+    },
+    {
+      "epoch": 0.35625396825396827,
+      "grad_norm": 0.09033203125,
+      "learning_rate": 0.1,
+      "loss": 2.5022330284118652,
+      "step": 11222
+    },
+    {
+      "epoch": 0.3563174603174603,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.1,
+      "loss": 2.5123372077941895,
+      "step": 11224
+    },
+    {
+      "epoch": 0.35638095238095235,
+      "grad_norm": 0.62890625,
+      "learning_rate": 0.1,
+      "loss": 2.4963247776031494,
+      "step": 11226
+    },
+    {
+      "epoch": 0.35644444444444445,
+      "grad_norm": 0.166015625,
+      "learning_rate": 0.1,
+      "loss": 2.4859702587127686,
+      "step": 11228
+    },
+    {
+      "epoch": 0.3565079365079365,
+      "grad_norm": 0.1298828125,
+      "learning_rate": 0.1,
+      "loss": 2.475501775741577,
+      "step": 11230
+    },
+    {
+      "epoch": 0.3565714285714286,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.490372657775879,
+      "step": 11232
+    },
+    {
+      "epoch": 0.35663492063492064,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.1,
+      "loss": 2.480090618133545,
+      "step": 11234
+    },
+    {
+      "epoch": 0.3566984126984127,
+      "grad_norm": 0.146484375,
+      "learning_rate": 0.1,
+      "loss": 2.5044050216674805,
+      "step": 11236
+    },
+    {
+      "epoch": 0.3567619047619048,
+      "grad_norm": 0.09521484375,
+      "learning_rate": 0.1,
+      "loss": 2.4619669914245605,
+      "step": 11238
+    },
+    {
+      "epoch": 0.3568253968253968,
+      "grad_norm": 0.080078125,
+      "learning_rate": 0.1,
+      "loss": 2.4584763050079346,
+      "step": 11240
+    },
+    {
+      "epoch": 0.35688888888888887,
+      "grad_norm": 0.197265625,
+      "learning_rate": 0.1,
+      "loss": 2.478175640106201,
+      "step": 11242
+    },
+    {
+      "epoch": 0.35695238095238097,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.1,
+      "loss": 2.476935386657715,
+      "step": 11244
+    },
+    {
+      "epoch": 0.357015873015873,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.461888074874878,
+      "step": 11246
+    },
+    {
+      "epoch": 0.35707936507936505,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.1,
+      "loss": 2.449694871902466,
+      "step": 11248
+    },
+    {
+      "epoch": 0.35714285714285715,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.1,
+      "loss": 2.4589521884918213,
+      "step": 11250
+    },
+    {
+      "epoch": 0.3572063492063492,
+      "grad_norm": 0.169921875,
+      "learning_rate": 0.1,
+      "loss": 2.469026565551758,
+      "step": 11252
+    },
+    {
+      "epoch": 0.3572698412698413,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.1,
+      "loss": 2.463324785232544,
+      "step": 11254
+    },
+    {
+      "epoch": 0.35733333333333334,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.1,
+      "loss": 2.450981616973877,
+      "step": 11256
+    },
+    {
+      "epoch": 0.3573968253968254,
+      "grad_norm": 0.1572265625,
+      "learning_rate": 0.1,
+      "loss": 2.4704065322875977,
+      "step": 11258
+    },
+    {
+      "epoch": 0.3574603174603175,
+      "grad_norm": 0.10400390625,
+      "learning_rate": 0.1,
+      "loss": 2.4855854511260986,
+      "step": 11260
+    },
+    {
+      "epoch": 0.3575238095238095,
+      "grad_norm": 0.091796875,
+      "learning_rate": 0.1,
+      "loss": 2.459942102432251,
+      "step": 11262
+    },
+    {
+      "epoch": 0.35758730158730156,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.1,
+      "loss": 2.4716644287109375,
+      "step": 11264
+    },
+    {
+      "epoch": 0.35765079365079366,
+      "grad_norm": 0.138671875,
+      "learning_rate": 0.1,
+      "loss": 2.4955148696899414,
+      "step": 11266
+    },
+    {
+      "epoch": 0.3577142857142857,
+      "grad_norm": 0.076171875,
+      "learning_rate": 0.1,
+      "loss": 2.462697982788086,
+      "step": 11268
+    },
+    {
+      "epoch": 0.35777777777777775,
+      "grad_norm": 0.1396484375,
+      "learning_rate": 0.1,
+      "loss": 2.4656829833984375,
+      "step": 11270
+    },
+    {
+      "epoch": 0.35784126984126985,
+      "grad_norm": 0.09423828125,
+      "learning_rate": 0.1,
+      "loss": 2.458569288253784,
+      "step": 11272
+    },
+    {
+      "epoch": 0.3579047619047619,
+      "grad_norm": 0.0771484375,
+      "learning_rate": 0.1,
+      "loss": 2.4550700187683105,
+      "step": 11274
+    },
+    {
+      "epoch": 0.357968253968254,
+      "grad_norm": 0.1142578125,
+      "learning_rate": 0.1,
+      "loss": 2.4642810821533203,
+      "step": 11276
+    },
+    {
+      "epoch": 0.35803174603174603,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 0.1,
+      "loss": 2.454416513442993,
+      "step": 11278
+    },
+    {
+      "epoch": 0.3580952380952381,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.469170570373535,
+      "step": 11280
+    },
+    {
+      "epoch": 0.3581587301587302,
+      "grad_norm": 0.17578125,
+      "learning_rate": 0.1,
+      "loss": 2.481339693069458,
+      "step": 11282
+    },
+    {
+      "epoch": 0.3582222222222222,
+      "grad_norm": 0.458984375,
+      "learning_rate": 0.1,
+      "loss": 2.467975378036499,
+      "step": 11284
+    },
+    {
+      "epoch": 0.35828571428571426,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.1,
+      "loss": 2.5128746032714844,
+      "step": 11286
+    },
+    {
+      "epoch": 0.35834920634920636,
+      "grad_norm": 0.08154296875,
+      "learning_rate": 0.1,
+      "loss": 2.4856278896331787,
+      "step": 11288
+    },
+    {
+      "epoch": 0.3584126984126984,
+      "grad_norm": 0.1044921875,
+      "learning_rate": 0.1,
+      "loss": 2.4855496883392334,
+      "step": 11290
+    },
+    {
+      "epoch": 0.3584761904761905,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.1,
+      "loss": 2.4849109649658203,
+      "step": 11292
+    },
+    {
+      "epoch": 0.35853968253968255,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.1,
+      "loss": 2.4842143058776855,
+      "step": 11294
+    },
+    {
+      "epoch": 0.3586031746031746,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.1,
+      "loss": 2.451904058456421,
+      "step": 11296
+    },
+    {
+      "epoch": 0.3586666666666667,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 0.1,
+      "loss": 2.4747142791748047,
+      "step": 11298
+    },
+    {
+      "epoch": 0.35873015873015873,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.1,
+      "loss": 2.4832489490509033,
+      "step": 11300
+    },
+    {
+      "epoch": 0.3587936507936508,
+      "grad_norm": 0.1484375,
+      "learning_rate": 0.1,
+      "loss": 2.4741103649139404,
+      "step": 11302
+    },
+    {
+      "epoch": 0.3588571428571429,
+      "grad_norm": 0.099609375,
+      "learning_rate": 0.1,
+      "loss": 2.470633029937744,
+      "step": 11304
+    },
+    {
+      "epoch": 0.3589206349206349,
+      "grad_norm": 0.228515625,
+      "learning_rate": 0.1,
+      "loss": 2.491023063659668,
+      "step": 11306
+    },
+    {
+      "epoch": 0.35898412698412696,
+      "grad_norm": 0.2578125,
+      "learning_rate": 0.1,
+      "loss": 2.4867446422576904,
+      "step": 11308
+    },
+    {
+      "epoch": 0.35904761904761906,
+      "grad_norm": 0.1875,
+      "learning_rate": 0.1,
+      "loss": 2.4757778644561768,
+      "step": 11310
+    },
+    {
+      "epoch": 0.3591111111111111,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.1,
+      "loss": 2.483226776123047,
+      "step": 11312
+    },
+    {
+      "epoch": 0.3591746031746032,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.1,
+      "loss": 2.509500026702881,
+      "step": 11314
+    },
+    {
+      "epoch": 0.35923809523809525,
+      "grad_norm": 0.1064453125,
+      "learning_rate": 0.1,
+      "loss": 2.4560673236846924,
+      "step": 11316
+    },
+    {
+      "epoch": 0.3593015873015873,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.1,
+      "loss": 2.4755465984344482,
+      "step": 11318
+    },
+    {
+      "epoch": 0.3593650793650794,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.1,
+      "loss": 2.4875471591949463,
+      "step": 11320
+    },
+    {
+      "epoch": 0.35942857142857143,
+      "grad_norm": 0.1376953125,
+      "learning_rate": 0.1,
+      "loss": 2.501183032989502,
+      "step": 11322
+    },
+    {
+      "epoch": 0.3594920634920635,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.482649087905884,
+      "step": 11324
+    },
+    {
+      "epoch": 0.3595555555555556,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.1,
+      "loss": 2.5185811519622803,
+      "step": 11326
+    },
+    {
+      "epoch": 0.3596190476190476,
+      "grad_norm": 0.189453125,
+      "learning_rate": 0.1,
+      "loss": 2.5071229934692383,
+      "step": 11328
+    },
+    {
+      "epoch": 0.35968253968253966,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 0.1,
+      "loss": 2.4960479736328125,
+      "step": 11330
+    },
+    {
+      "epoch": 0.35974603174603176,
+      "grad_norm": 0.0732421875,
+      "learning_rate": 0.1,
+      "loss": 2.4599719047546387,
+      "step": 11332
+    },
+    {
+      "epoch": 0.3598095238095238,
+      "grad_norm": 0.0712890625,
+      "learning_rate": 0.1,
+      "loss": 2.492602586746216,
+      "step": 11334
+    },
+    {
+      "epoch": 0.3598730158730159,
+      "grad_norm": 0.0615234375,
+      "learning_rate": 0.1,
+      "loss": 2.467595100402832,
+      "step": 11336
+    },
+    {
+      "epoch": 0.35993650793650794,
+      "grad_norm": 0.0966796875,
+      "learning_rate": 0.1,
+      "loss": 2.5092246532440186,
+      "step": 11338
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 0.1,
+      "loss": 2.4747467041015625,
+      "step": 11340
     }
   ],
   "logging_steps": 2,
|
|
|
|
       "attributes": {}
     }
   },
+  "total_flos": 3.755676497098762e+19,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null