Training in progress, step 11025, checkpoint
- last-checkpoint/trainer_state.json +1102 -3
last-checkpoint/trainer_state.json
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.
|
| 6 |
"eval_steps": 3150,
|
| 7 |
-
"global_step":
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
@@ -37517,6 +37517,1105 @@
|
|
| 37517 |
"learning_rate": 0.1,
|
| 37518 |
"loss": 2.1646363735198975,
|
| 37519 |
"step": 10710
|
| 37520 |
}
|
| 37521 |
],
|
| 37522 |
"logging_steps": 2,
|
|
@@ -37536,7 +38635,7 @@
|
|
| 37536 |
"attributes": {}
|
| 37537 |
}
|
| 37538 |
},
|
| 37539 |
-
"total_flos": 3.
|
| 37540 |
"train_batch_size": 4,
|
| 37541 |
"trial_name": null,
|
| 37542 |
"trial_params": null
|
|
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 0.35,
|
| 6 |
"eval_steps": 3150,
|
| 7 |
+
"global_step": 11025,
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
|
|
| 37517 |
"learning_rate": 0.1,
|
| 37518 |
"loss": 2.1646363735198975,
|
| 37519 |
"step": 10710
|
| 37520 |
+
},
|
| 37521 |
+
{
|
| 37522 |
+
"epoch": 0.34006349206349207,
|
| 37523 |
+
"grad_norm": 0.057373046875,
|
| 37524 |
+
"learning_rate": 0.1,
|
| 37525 |
+
"loss": 2.1419057846069336,
|
| 37526 |
+
"step": 10712
|
| 37527 |
+
},
|
| 37528 |
+
{
|
| 37529 |
+
"epoch": 0.3401269841269841,
|
| 37530 |
+
"grad_norm": 0.1787109375,
|
| 37531 |
+
"learning_rate": 0.1,
|
| 37532 |
+
"loss": 2.1370527744293213,
|
| 37533 |
+
"step": 10714
|
| 37534 |
+
},
|
| 37535 |
+
{
|
| 37536 |
+
"epoch": 0.3401904761904762,
|
| 37537 |
+
"grad_norm": 0.0654296875,
|
| 37538 |
+
"learning_rate": 0.1,
|
| 37539 |
+
"loss": 2.1385576725006104,
|
| 37540 |
+
"step": 10716
|
| 37541 |
+
},
|
| 37542 |
+
{
|
| 37543 |
+
"epoch": 0.34025396825396825,
|
| 37544 |
+
"grad_norm": 0.13671875,
|
| 37545 |
+
"learning_rate": 0.1,
|
| 37546 |
+
"loss": 2.1344106197357178,
|
| 37547 |
+
"step": 10718
|
| 37548 |
+
},
|
| 37549 |
+
{
|
| 37550 |
+
"epoch": 0.3403174603174603,
|
| 37551 |
+
"grad_norm": 0.181640625,
|
| 37552 |
+
"learning_rate": 0.1,
|
| 37553 |
+
"loss": 2.123504877090454,
|
| 37554 |
+
"step": 10720
|
| 37555 |
+
},
|
| 37556 |
+
{
|
| 37557 |
+
"epoch": 0.3403809523809524,
|
| 37558 |
+
"grad_norm": 0.240234375,
|
| 37559 |
+
"learning_rate": 0.1,
|
| 37560 |
+
"loss": 2.168992280960083,
|
| 37561 |
+
"step": 10722
|
| 37562 |
+
},
|
| 37563 |
+
{
|
| 37564 |
+
"epoch": 0.34044444444444444,
|
| 37565 |
+
"grad_norm": 0.189453125,
|
| 37566 |
+
"learning_rate": 0.1,
|
| 37567 |
+
"loss": 2.182685613632202,
|
| 37568 |
+
"step": 10724
|
| 37569 |
+
},
|
| 37570 |
+
{
|
| 37571 |
+
"epoch": 0.3405079365079365,
|
| 37572 |
+
"grad_norm": 0.15234375,
|
| 37573 |
+
"learning_rate": 0.1,
|
| 37574 |
+
"loss": 2.1744816303253174,
|
| 37575 |
+
"step": 10726
|
| 37576 |
+
},
|
| 37577 |
+
{
|
| 37578 |
+
"epoch": 0.3405714285714286,
|
| 37579 |
+
"grad_norm": 0.462890625,
|
| 37580 |
+
"learning_rate": 0.1,
|
| 37581 |
+
"loss": 2.1602675914764404,
|
| 37582 |
+
"step": 10728
|
| 37583 |
+
},
|
| 37584 |
+
{
|
| 37585 |
+
"epoch": 0.3406349206349206,
|
| 37586 |
+
"grad_norm": 0.08447265625,
|
| 37587 |
+
"learning_rate": 0.1,
|
| 37588 |
+
"loss": 2.1859214305877686,
|
| 37589 |
+
"step": 10730
|
| 37590 |
+
},
|
| 37591 |
+
{
|
| 37592 |
+
"epoch": 0.3406984126984127,
|
| 37593 |
+
"grad_norm": 0.06396484375,
|
| 37594 |
+
"learning_rate": 0.1,
|
| 37595 |
+
"loss": 2.151904582977295,
|
| 37596 |
+
"step": 10732
|
| 37597 |
+
},
|
| 37598 |
+
{
|
| 37599 |
+
"epoch": 0.34076190476190477,
|
| 37600 |
+
"grad_norm": 0.0791015625,
|
| 37601 |
+
"learning_rate": 0.1,
|
| 37602 |
+
"loss": 2.143826961517334,
|
| 37603 |
+
"step": 10734
|
| 37604 |
+
},
|
| 37605 |
+
{
|
| 37606 |
+
"epoch": 0.3408253968253968,
|
| 37607 |
+
"grad_norm": 0.166015625,
|
| 37608 |
+
"learning_rate": 0.1,
|
| 37609 |
+
"loss": 2.1708786487579346,
|
| 37610 |
+
"step": 10736
|
| 37611 |
+
},
|
| 37612 |
+
{
|
| 37613 |
+
"epoch": 0.3408888888888889,
|
| 37614 |
+
"grad_norm": 0.1005859375,
|
| 37615 |
+
"learning_rate": 0.1,
|
| 37616 |
+
"loss": 2.1587464809417725,
|
| 37617 |
+
"step": 10738
|
| 37618 |
+
},
|
| 37619 |
+
{
|
| 37620 |
+
"epoch": 0.34095238095238095,
|
| 37621 |
+
"grad_norm": 0.1669921875,
|
| 37622 |
+
"learning_rate": 0.1,
|
| 37623 |
+
"loss": 2.159186363220215,
|
| 37624 |
+
"step": 10740
|
| 37625 |
+
},
|
| 37626 |
+
{
|
| 37627 |
+
"epoch": 0.341015873015873,
|
| 37628 |
+
"grad_norm": 0.087890625,
|
| 37629 |
+
"learning_rate": 0.1,
|
| 37630 |
+
"loss": 2.132807970046997,
|
| 37631 |
+
"step": 10742
|
| 37632 |
+
},
|
| 37633 |
+
{
|
| 37634 |
+
"epoch": 0.3410793650793651,
|
| 37635 |
+
"grad_norm": 0.1923828125,
|
| 37636 |
+
"learning_rate": 0.1,
|
| 37637 |
+
"loss": 2.1830198764801025,
|
| 37638 |
+
"step": 10744
|
| 37639 |
+
},
|
| 37640 |
+
{
|
| 37641 |
+
"epoch": 0.34114285714285714,
|
| 37642 |
+
"grad_norm": 0.1083984375,
|
| 37643 |
+
"learning_rate": 0.1,
|
| 37644 |
+
"loss": 2.1978118419647217,
|
| 37645 |
+
"step": 10746
|
| 37646 |
+
},
|
| 37647 |
+
{
|
| 37648 |
+
"epoch": 0.3412063492063492,
|
| 37649 |
+
"grad_norm": 0.11669921875,
|
| 37650 |
+
"learning_rate": 0.1,
|
| 37651 |
+
"loss": 2.1586546897888184,
|
| 37652 |
+
"step": 10748
|
| 37653 |
+
},
|
| 37654 |
+
{
|
| 37655 |
+
"epoch": 0.3412698412698413,
|
| 37656 |
+
"grad_norm": 0.326171875,
|
| 37657 |
+
"learning_rate": 0.1,
|
| 37658 |
+
"loss": 2.181734323501587,
|
| 37659 |
+
"step": 10750
|
| 37660 |
+
},
|
| 37661 |
+
{
|
| 37662 |
+
"epoch": 0.3413333333333333,
|
| 37663 |
+
"grad_norm": 0.12109375,
|
| 37664 |
+
"learning_rate": 0.1,
|
| 37665 |
+
"loss": 2.1670312881469727,
|
| 37666 |
+
"step": 10752
|
| 37667 |
+
},
|
| 37668 |
+
{
|
| 37669 |
+
"epoch": 0.3413968253968254,
|
| 37670 |
+
"grad_norm": 0.11962890625,
|
| 37671 |
+
"learning_rate": 0.1,
|
| 37672 |
+
"loss": 2.17029070854187,
|
| 37673 |
+
"step": 10754
|
| 37674 |
+
},
|
| 37675 |
+
{
|
| 37676 |
+
"epoch": 0.34146031746031746,
|
| 37677 |
+
"grad_norm": 0.07080078125,
|
| 37678 |
+
"learning_rate": 0.1,
|
| 37679 |
+
"loss": 2.166867971420288,
|
| 37680 |
+
"step": 10756
|
| 37681 |
+
},
|
| 37682 |
+
{
|
| 37683 |
+
"epoch": 0.3415238095238095,
|
| 37684 |
+
"grad_norm": 0.08544921875,
|
| 37685 |
+
"learning_rate": 0.1,
|
| 37686 |
+
"loss": 2.1606602668762207,
|
| 37687 |
+
"step": 10758
|
| 37688 |
+
},
|
| 37689 |
+
{
|
| 37690 |
+
"epoch": 0.3415873015873016,
|
| 37691 |
+
"grad_norm": 0.1474609375,
|
| 37692 |
+
"learning_rate": 0.1,
|
| 37693 |
+
"loss": 2.183364152908325,
|
| 37694 |
+
"step": 10760
|
| 37695 |
+
},
|
| 37696 |
+
{
|
| 37697 |
+
"epoch": 0.34165079365079365,
|
| 37698 |
+
"grad_norm": 0.2412109375,
|
| 37699 |
+
"learning_rate": 0.1,
|
| 37700 |
+
"loss": 2.166038990020752,
|
| 37701 |
+
"step": 10762
|
| 37702 |
+
},
|
| 37703 |
+
{
|
| 37704 |
+
"epoch": 0.3417142857142857,
|
| 37705 |
+
"grad_norm": 0.2255859375,
|
| 37706 |
+
"learning_rate": 0.1,
|
| 37707 |
+
"loss": 2.1788270473480225,
|
| 37708 |
+
"step": 10764
|
| 37709 |
+
},
|
| 37710 |
+
{
|
| 37711 |
+
"epoch": 0.3417777777777778,
|
| 37712 |
+
"grad_norm": 0.107421875,
|
| 37713 |
+
"learning_rate": 0.1,
|
| 37714 |
+
"loss": 2.1759607791900635,
|
| 37715 |
+
"step": 10766
|
| 37716 |
+
},
|
| 37717 |
+
{
|
| 37718 |
+
"epoch": 0.34184126984126983,
|
| 37719 |
+
"grad_norm": 0.054931640625,
|
| 37720 |
+
"learning_rate": 0.1,
|
| 37721 |
+
"loss": 2.1977617740631104,
|
| 37722 |
+
"step": 10768
|
| 37723 |
+
},
|
| 37724 |
+
{
|
| 37725 |
+
"epoch": 0.3419047619047619,
|
| 37726 |
+
"grad_norm": 0.197265625,
|
| 37727 |
+
"learning_rate": 0.1,
|
| 37728 |
+
"loss": 2.138373851776123,
|
| 37729 |
+
"step": 10770
|
| 37730 |
+
},
|
| 37731 |
+
{
|
| 37732 |
+
"epoch": 0.341968253968254,
|
| 37733 |
+
"grad_norm": 0.259765625,
|
| 37734 |
+
"learning_rate": 0.1,
|
| 37735 |
+
"loss": 2.1617345809936523,
|
| 37736 |
+
"step": 10772
|
| 37737 |
+
},
|
| 37738 |
+
{
|
| 37739 |
+
"epoch": 0.342031746031746,
|
| 37740 |
+
"grad_norm": 0.34765625,
|
| 37741 |
+
"learning_rate": 0.1,
|
| 37742 |
+
"loss": 2.158432960510254,
|
| 37743 |
+
"step": 10774
|
| 37744 |
+
},
|
| 37745 |
+
{
|
| 37746 |
+
"epoch": 0.3420952380952381,
|
| 37747 |
+
"grad_norm": 0.050537109375,
|
| 37748 |
+
"learning_rate": 0.1,
|
| 37749 |
+
"loss": 2.165778875350952,
|
| 37750 |
+
"step": 10776
|
| 37751 |
+
},
|
| 37752 |
+
{
|
| 37753 |
+
"epoch": 0.34215873015873016,
|
| 37754 |
+
"grad_norm": 0.28125,
|
| 37755 |
+
"learning_rate": 0.1,
|
| 37756 |
+
"loss": 2.1905174255371094,
|
| 37757 |
+
"step": 10778
|
| 37758 |
+
},
|
| 37759 |
+
{
|
| 37760 |
+
"epoch": 0.3422222222222222,
|
| 37761 |
+
"grad_norm": 0.1396484375,
|
| 37762 |
+
"learning_rate": 0.1,
|
| 37763 |
+
"loss": 2.18183970451355,
|
| 37764 |
+
"step": 10780
|
| 37765 |
+
},
|
| 37766 |
+
{
|
| 37767 |
+
"epoch": 0.3422857142857143,
|
| 37768 |
+
"grad_norm": 0.2265625,
|
| 37769 |
+
"learning_rate": 0.1,
|
| 37770 |
+
"loss": 2.1712934970855713,
|
| 37771 |
+
"step": 10782
|
| 37772 |
+
},
|
| 37773 |
+
{
|
| 37774 |
+
"epoch": 0.34234920634920635,
|
| 37775 |
+
"grad_norm": 0.279296875,
|
| 37776 |
+
"learning_rate": 0.1,
|
| 37777 |
+
"loss": 2.16552996635437,
|
| 37778 |
+
"step": 10784
|
| 37779 |
+
},
|
| 37780 |
+
{
|
| 37781 |
+
"epoch": 0.3424126984126984,
|
| 37782 |
+
"grad_norm": 0.123046875,
|
| 37783 |
+
"learning_rate": 0.1,
|
| 37784 |
+
"loss": 2.1844046115875244,
|
| 37785 |
+
"step": 10786
|
| 37786 |
+
},
|
| 37787 |
+
{
|
| 37788 |
+
"epoch": 0.3424761904761905,
|
| 37789 |
+
"grad_norm": 0.10498046875,
|
| 37790 |
+
"learning_rate": 0.1,
|
| 37791 |
+
"loss": 2.1611874103546143,
|
| 37792 |
+
"step": 10788
|
| 37793 |
+
},
|
| 37794 |
+
{
|
| 37795 |
+
"epoch": 0.34253968253968253,
|
| 37796 |
+
"grad_norm": 0.12890625,
|
| 37797 |
+
"learning_rate": 0.1,
|
| 37798 |
+
"loss": 2.1646482944488525,
|
| 37799 |
+
"step": 10790
|
| 37800 |
+
},
|
| 37801 |
+
{
|
| 37802 |
+
"epoch": 0.3426031746031746,
|
| 37803 |
+
"grad_norm": 0.11669921875,
|
| 37804 |
+
"learning_rate": 0.1,
|
| 37805 |
+
"loss": 2.1656932830810547,
|
| 37806 |
+
"step": 10792
|
| 37807 |
+
},
|
| 37808 |
+
{
|
| 37809 |
+
"epoch": 0.3426666666666667,
|
| 37810 |
+
"grad_norm": 0.134765625,
|
| 37811 |
+
"learning_rate": 0.1,
|
| 37812 |
+
"loss": 2.164336681365967,
|
| 37813 |
+
"step": 10794
|
| 37814 |
+
},
|
| 37815 |
+
{
|
| 37816 |
+
"epoch": 0.3427301587301587,
|
| 37817 |
+
"grad_norm": 0.0703125,
|
| 37818 |
+
"learning_rate": 0.1,
|
| 37819 |
+
"loss": 2.159842014312744,
|
| 37820 |
+
"step": 10796
|
| 37821 |
+
},
|
| 37822 |
+
{
|
| 37823 |
+
"epoch": 0.3427936507936508,
|
| 37824 |
+
"grad_norm": 0.1181640625,
|
| 37825 |
+
"learning_rate": 0.1,
|
| 37826 |
+
"loss": 2.188279151916504,
|
| 37827 |
+
"step": 10798
|
| 37828 |
+
},
|
| 37829 |
+
{
|
| 37830 |
+
"epoch": 0.34285714285714286,
|
| 37831 |
+
"grad_norm": 0.1845703125,
|
| 37832 |
+
"learning_rate": 0.1,
|
| 37833 |
+
"loss": 2.165369749069214,
|
| 37834 |
+
"step": 10800
|
| 37835 |
+
},
|
| 37836 |
+
{
|
| 37837 |
+
"epoch": 0.3429206349206349,
|
| 37838 |
+
"grad_norm": 0.1796875,
|
| 37839 |
+
"learning_rate": 0.1,
|
| 37840 |
+
"loss": 2.1752822399139404,
|
| 37841 |
+
"step": 10802
|
| 37842 |
+
},
|
| 37843 |
+
{
|
| 37844 |
+
"epoch": 0.342984126984127,
|
| 37845 |
+
"grad_norm": 0.08740234375,
|
| 37846 |
+
"learning_rate": 0.1,
|
| 37847 |
+
"loss": 2.1573591232299805,
|
| 37848 |
+
"step": 10804
|
| 37849 |
+
},
|
| 37850 |
+
{
|
| 37851 |
+
"epoch": 0.34304761904761905,
|
| 37852 |
+
"grad_norm": 0.216796875,
|
| 37853 |
+
"learning_rate": 0.1,
|
| 37854 |
+
"loss": 2.154291868209839,
|
| 37855 |
+
"step": 10806
|
| 37856 |
+
},
|
| 37857 |
+
{
|
| 37858 |
+
"epoch": 0.3431111111111111,
|
| 37859 |
+
"grad_norm": 0.2236328125,
|
| 37860 |
+
"learning_rate": 0.1,
|
| 37861 |
+
"loss": 2.1773228645324707,
|
| 37862 |
+
"step": 10808
|
| 37863 |
+
},
|
| 37864 |
+
{
|
| 37865 |
+
"epoch": 0.3431746031746032,
|
| 37866 |
+
"grad_norm": 0.310546875,
|
| 37867 |
+
"learning_rate": 0.1,
|
| 37868 |
+
"loss": 2.132742166519165,
|
| 37869 |
+
"step": 10810
|
| 37870 |
+
},
|
| 37871 |
+
{
|
| 37872 |
+
"epoch": 0.34323809523809523,
|
| 37873 |
+
"grad_norm": 0.07275390625,
|
| 37874 |
+
"learning_rate": 0.1,
|
| 37875 |
+
"loss": 2.16198992729187,
|
| 37876 |
+
"step": 10812
|
| 37877 |
+
},
|
| 37878 |
+
{
|
| 37879 |
+
"epoch": 0.3433015873015873,
|
| 37880 |
+
"grad_norm": 0.2158203125,
|
| 37881 |
+
"learning_rate": 0.1,
|
| 37882 |
+
"loss": 2.1417953968048096,
|
| 37883 |
+
"step": 10814
|
| 37884 |
+
},
|
| 37885 |
+
{
|
| 37886 |
+
"epoch": 0.3433650793650794,
|
| 37887 |
+
"grad_norm": 0.234375,
|
| 37888 |
+
"learning_rate": 0.1,
|
| 37889 |
+
"loss": 2.16147780418396,
|
| 37890 |
+
"step": 10816
|
| 37891 |
+
},
|
| 37892 |
+
{
|
| 37893 |
+
"epoch": 0.3434285714285714,
|
| 37894 |
+
"grad_norm": 0.1279296875,
|
| 37895 |
+
"learning_rate": 0.1,
|
| 37896 |
+
"loss": 2.16763973236084,
|
| 37897 |
+
"step": 10818
|
| 37898 |
+
},
|
| 37899 |
+
{
|
| 37900 |
+
"epoch": 0.3434920634920635,
|
| 37901 |
+
"grad_norm": 0.1337890625,
|
| 37902 |
+
"learning_rate": 0.1,
|
| 37903 |
+
"loss": 2.165390729904175,
|
| 37904 |
+
"step": 10820
|
| 37905 |
+
},
|
| 37906 |
+
{
|
| 37907 |
+
"epoch": 0.34355555555555556,
|
| 37908 |
+
"grad_norm": 0.15625,
|
| 37909 |
+
"learning_rate": 0.1,
|
| 37910 |
+
"loss": 2.1336190700531006,
|
| 37911 |
+
"step": 10822
|
| 37912 |
+
},
|
| 37913 |
+
{
|
| 37914 |
+
"epoch": 0.3436190476190476,
|
| 37915 |
+
"grad_norm": 0.134765625,
|
| 37916 |
+
"learning_rate": 0.1,
|
| 37917 |
+
"loss": 2.1284821033477783,
|
| 37918 |
+
"step": 10824
|
| 37919 |
+
},
|
| 37920 |
+
{
|
| 37921 |
+
"epoch": 0.3436825396825397,
|
| 37922 |
+
"grad_norm": 0.2099609375,
|
| 37923 |
+
"learning_rate": 0.1,
|
| 37924 |
+
"loss": 2.205808639526367,
|
| 37925 |
+
"step": 10826
|
| 37926 |
+
},
|
| 37927 |
+
{
|
| 37928 |
+
"epoch": 0.34374603174603174,
|
| 37929 |
+
"grad_norm": 0.158203125,
|
| 37930 |
+
"learning_rate": 0.1,
|
| 37931 |
+
"loss": 2.168703556060791,
|
| 37932 |
+
"step": 10828
|
| 37933 |
+
},
|
| 37934 |
+
{
|
| 37935 |
+
"epoch": 0.3438095238095238,
|
| 37936 |
+
"grad_norm": 0.26171875,
|
| 37937 |
+
"learning_rate": 0.1,
|
| 37938 |
+
"loss": 2.206918239593506,
|
| 37939 |
+
"step": 10830
|
| 37940 |
+
},
|
| 37941 |
+
{
|
| 37942 |
+
"epoch": 0.3438730158730159,
|
| 37943 |
+
"grad_norm": 0.173828125,
|
| 37944 |
+
"learning_rate": 0.1,
|
| 37945 |
+
"loss": 2.19317364692688,
|
| 37946 |
+
"step": 10832
|
| 37947 |
+
},
|
| 37948 |
+
{
|
| 37949 |
+
"epoch": 0.34393650793650793,
|
| 37950 |
+
"grad_norm": 0.1669921875,
|
| 37951 |
+
"learning_rate": 0.1,
|
| 37952 |
+
"loss": 2.1836607456207275,
|
| 37953 |
+
"step": 10834
|
| 37954 |
+
},
|
| 37955 |
+
{
|
| 37956 |
+
"epoch": 0.344,
|
| 37957 |
+
"grad_norm": 0.224609375,
|
| 37958 |
+
"learning_rate": 0.1,
|
| 37959 |
+
"loss": 2.1935431957244873,
|
| 37960 |
+
"step": 10836
|
| 37961 |
+
},
|
| 37962 |
+
{
|
| 37963 |
+
"epoch": 0.34406349206349207,
|
| 37964 |
+
"grad_norm": 0.1494140625,
|
| 37965 |
+
"learning_rate": 0.1,
|
| 37966 |
+
"loss": 2.2080745697021484,
|
| 37967 |
+
"step": 10838
|
| 37968 |
+
},
|
| 37969 |
+
{
|
| 37970 |
+
"epoch": 0.3441269841269841,
|
| 37971 |
+
"grad_norm": 0.216796875,
|
| 37972 |
+
"learning_rate": 0.1,
|
| 37973 |
+
"loss": 2.1659133434295654,
|
| 37974 |
+
"step": 10840
|
| 37975 |
+
},
|
| 37976 |
+
{
|
| 37977 |
+
"epoch": 0.3441904761904762,
|
| 37978 |
+
"grad_norm": 0.1650390625,
|
| 37979 |
+
"learning_rate": 0.1,
|
| 37980 |
+
"loss": 2.208919048309326,
|
| 37981 |
+
"step": 10842
|
| 37982 |
+
},
|
| 37983 |
+
{
|
| 37984 |
+
"epoch": 0.34425396825396826,
|
| 37985 |
+
"grad_norm": 0.10986328125,
|
| 37986 |
+
"learning_rate": 0.1,
|
| 37987 |
+
"loss": 2.192574977874756,
|
| 37988 |
+
"step": 10844
|
| 37989 |
+
},
|
| 37990 |
+
{
|
| 37991 |
+
"epoch": 0.3443174603174603,
|
| 37992 |
+
"grad_norm": 0.09619140625,
|
| 37993 |
+
"learning_rate": 0.1,
|
| 37994 |
+
"loss": 2.2011077404022217,
|
| 37995 |
+
"step": 10846
|
| 37996 |
+
},
|
| 37997 |
+
{
|
| 37998 |
+
"epoch": 0.3443809523809524,
|
| 37999 |
+
"grad_norm": 0.166015625,
|
| 38000 |
+
"learning_rate": 0.1,
|
| 38001 |
+
"loss": 2.2008471488952637,
|
| 38002 |
+
"step": 10848
|
| 38003 |
+
},
|
| 38004 |
+
{
|
| 38005 |
+
"epoch": 0.34444444444444444,
|
| 38006 |
+
"grad_norm": 0.15625,
|
| 38007 |
+
"learning_rate": 0.1,
|
| 38008 |
+
"loss": 2.2248358726501465,
|
| 38009 |
+
"step": 10850
|
| 38010 |
+
},
|
| 38011 |
+
{
|
| 38012 |
+
"epoch": 0.3445079365079365,
|
| 38013 |
+
"grad_norm": 0.1005859375,
|
| 38014 |
+
"learning_rate": 0.1,
|
| 38015 |
+
"loss": 2.2257256507873535,
|
| 38016 |
+
"step": 10852
|
| 38017 |
+
},
|
| 38018 |
+
{
|
| 38019 |
+
"epoch": 0.3445714285714286,
|
| 38020 |
+
"grad_norm": 0.150390625,
|
| 38021 |
+
"learning_rate": 0.1,
|
| 38022 |
+
"loss": 2.222543239593506,
|
| 38023 |
+
"step": 10854
|
| 38024 |
+
},
|
| 38025 |
+
{
|
| 38026 |
+
"epoch": 0.3446349206349206,
|
| 38027 |
+
"grad_norm": 0.20703125,
|
| 38028 |
+
"learning_rate": 0.1,
|
| 38029 |
+
"loss": 2.2047646045684814,
|
| 38030 |
+
"step": 10856
|
| 38031 |
+
},
|
| 38032 |
+
{
|
| 38033 |
+
"epoch": 0.3446984126984127,
|
| 38034 |
+
"grad_norm": 0.0478515625,
|
| 38035 |
+
"learning_rate": 0.1,
|
| 38036 |
+
"loss": 2.201016426086426,
|
| 38037 |
+
"step": 10858
|
| 38038 |
+
},
|
| 38039 |
+
{
|
| 38040 |
+
"epoch": 0.34476190476190477,
|
| 38041 |
+
"grad_norm": 0.30859375,
|
| 38042 |
+
"learning_rate": 0.1,
|
| 38043 |
+
"loss": 2.212066650390625,
|
| 38044 |
+
"step": 10860
|
| 38045 |
+
},
|
| 38046 |
+
{
|
| 38047 |
+
"epoch": 0.3448253968253968,
|
| 38048 |
+
"grad_norm": 0.65234375,
|
| 38049 |
+
"learning_rate": 0.1,
|
| 38050 |
+
"loss": 2.2267720699310303,
|
| 38051 |
+
"step": 10862
|
| 38052 |
+
},
|
| 38053 |
+
{
|
| 38054 |
+
"epoch": 0.3448888888888889,
|
| 38055 |
+
"grad_norm": 0.07763671875,
|
| 38056 |
+
"learning_rate": 0.1,
|
| 38057 |
+
"loss": 2.213447332382202,
|
| 38058 |
+
"step": 10864
|
| 38059 |
+
},
|
| 38060 |
+
{
|
| 38061 |
+
"epoch": 0.34495238095238095,
|
| 38062 |
+
"grad_norm": 0.09765625,
|
| 38063 |
+
"learning_rate": 0.1,
|
| 38064 |
+
"loss": 2.2096965312957764,
|
| 38065 |
+
"step": 10866
|
| 38066 |
+
},
|
| 38067 |
+
{
|
| 38068 |
+
"epoch": 0.345015873015873,
|
| 38069 |
+
"grad_norm": 0.09375,
|
| 38070 |
+
"learning_rate": 0.1,
|
| 38071 |
+
"loss": 2.217432737350464,
|
| 38072 |
+
"step": 10868
|
| 38073 |
+
},
|
| 38074 |
+
{
|
| 38075 |
+
"epoch": 0.3450793650793651,
|
| 38076 |
+
"grad_norm": 0.12255859375,
|
| 38077 |
+
"learning_rate": 0.1,
|
| 38078 |
+
"loss": 2.2168514728546143,
|
| 38079 |
+
"step": 10870
|
| 38080 |
+
},
|
| 38081 |
+
{
|
| 38082 |
+
"epoch": 0.34514285714285714,
|
| 38083 |
+
"grad_norm": 0.08154296875,
|
| 38084 |
+
"learning_rate": 0.1,
|
| 38085 |
+
"loss": 2.2287988662719727,
|
| 38086 |
+
"step": 10872
|
| 38087 |
+
},
|
| 38088 |
+
{
|
| 38089 |
+
"epoch": 0.3452063492063492,
|
| 38090 |
+
"grad_norm": 0.11767578125,
|
| 38091 |
+
"learning_rate": 0.1,
|
| 38092 |
+
"loss": 2.2565572261810303,
|
| 38093 |
+
"step": 10874
|
| 38094 |
+
},
|
| 38095 |
+
{
|
| 38096 |
+
"epoch": 0.3452698412698413,
|
| 38097 |
+
"grad_norm": 0.3828125,
|
| 38098 |
+
"learning_rate": 0.1,
|
| 38099 |
+
"loss": 2.2269654273986816,
|
| 38100 |
+
"step": 10876
|
| 38101 |
+
},
|
| 38102 |
+
{
|
| 38103 |
+
"epoch": 0.3453333333333333,
|
| 38104 |
+
"grad_norm": 0.3671875,
|
| 38105 |
+
"learning_rate": 0.1,
|
| 38106 |
+
"loss": 2.219343662261963,
|
| 38107 |
+
"step": 10878
|
| 38108 |
+
},
|
| 38109 |
+
{
|
| 38110 |
+
"epoch": 0.3453968253968254,
|
| 38111 |
+
"grad_norm": 0.13671875,
|
| 38112 |
+
"learning_rate": 0.1,
|
| 38113 |
+
"loss": 2.265392303466797,
|
| 38114 |
+
"step": 10880
|
| 38115 |
+
},
|
| 38116 |
+
{
|
| 38117 |
+
"epoch": 0.34546031746031747,
|
| 38118 |
+
"grad_norm": 0.10693359375,
|
| 38119 |
+
"learning_rate": 0.1,
|
| 38120 |
+
"loss": 2.259321689605713,
|
| 38121 |
+
"step": 10882
|
| 38122 |
+
},
|
| 38123 |
+
{
|
| 38124 |
+
"epoch": 0.3455238095238095,
|
| 38125 |
+
"grad_norm": 0.060302734375,
|
| 38126 |
+
"learning_rate": 0.1,
|
| 38127 |
+
"loss": 2.2631168365478516,
|
| 38128 |
+
"step": 10884
|
| 38129 |
+
},
|
| 38130 |
+
{
|
| 38131 |
+
"epoch": 0.3455873015873016,
|
| 38132 |
+
"grad_norm": 0.185546875,
|
| 38133 |
+
"learning_rate": 0.1,
|
| 38134 |
+
"loss": 2.230673313140869,
|
| 38135 |
+
"step": 10886
|
| 38136 |
+
},
|
| 38137 |
+
{
|
| 38138 |
+
"epoch": 0.34565079365079365,
|
| 38139 |
+
"grad_norm": 0.060546875,
|
| 38140 |
+
"learning_rate": 0.1,
|
| 38141 |
+
"loss": 2.241331100463867,
|
| 38142 |
+
"step": 10888
|
| 38143 |
+
},
|
| 38144 |
+
{
|
| 38145 |
+
"epoch": 0.3457142857142857,
|
| 38146 |
+
"grad_norm": 0.091796875,
|
| 38147 |
+
"learning_rate": 0.1,
|
| 38148 |
+
"loss": 2.260580062866211,
|
| 38149 |
+
"step": 10890
|
| 38150 |
+
},
|
| 38151 |
+
{
|
| 38152 |
+
"epoch": 0.3457777777777778,
|
| 38153 |
+
"grad_norm": 0.1806640625,
|
| 38154 |
+
"learning_rate": 0.1,
|
| 38155 |
+
"loss": 2.268975257873535,
|
| 38156 |
+
"step": 10892
|
| 38157 |
+
},
|
| 38158 |
+
{
|
| 38159 |
+
"epoch": 0.34584126984126984,
|
| 38160 |
+
"grad_norm": 0.1318359375,
|
| 38161 |
+
"learning_rate": 0.1,
|
| 38162 |
+
"loss": 2.2483866214752197,
|
| 38163 |
+
"step": 10894
|
| 38164 |
+
},
|
| 38165 |
+
{
|
| 38166 |
+
"epoch": 0.3459047619047619,
|
| 38167 |
+
"grad_norm": 0.1123046875,
|
| 38168 |
+
"learning_rate": 0.1,
|
| 38169 |
+
"loss": 2.2478039264678955,
|
| 38170 |
+
"step": 10896
|
| 38171 |
+
},
|
| 38172 |
+
    {
      "epoch": 0.345968253968254,
      "grad_norm": 0.12353515625,
      "learning_rate": 0.1,
      "loss": 2.266867160797119,
      "step": 10898
    },
    {
      "epoch": 0.346031746031746,
      "grad_norm": 0.2119140625,
      "learning_rate": 0.1,
      "loss": 2.2199819087982178,
      "step": 10900
    },
    {
      "epoch": 0.3460952380952381,
      "grad_norm": 0.2041015625,
      "learning_rate": 0.1,
      "loss": 2.24721097946167,
      "step": 10902
    },
    {
      "epoch": 0.34615873015873017,
      "grad_norm": 0.0693359375,
      "learning_rate": 0.1,
      "loss": 2.2504703998565674,
      "step": 10904
    },
    {
      "epoch": 0.3462222222222222,
      "grad_norm": 0.208984375,
      "learning_rate": 0.1,
      "loss": 2.250040054321289,
      "step": 10906
    },
    {
      "epoch": 0.3462857142857143,
      "grad_norm": 0.373046875,
      "learning_rate": 0.1,
      "loss": 2.261671781539917,
      "step": 10908
    },
    {
      "epoch": 0.34634920634920635,
      "grad_norm": 0.058837890625,
      "learning_rate": 0.1,
      "loss": 2.241607427597046,
      "step": 10910
    },
    {
      "epoch": 0.3464126984126984,
      "grad_norm": 0.25390625,
      "learning_rate": 0.1,
      "loss": 2.230332612991333,
      "step": 10912
    },
    {
      "epoch": 0.3464761904761905,
      "grad_norm": 0.2373046875,
      "learning_rate": 0.1,
      "loss": 2.251079559326172,
      "step": 10914
    },
    {
      "epoch": 0.34653968253968254,
      "grad_norm": 0.1474609375,
      "learning_rate": 0.1,
      "loss": 2.264204740524292,
      "step": 10916
    },
    {
      "epoch": 0.3466031746031746,
      "grad_norm": 0.146484375,
      "learning_rate": 0.1,
      "loss": 2.2764639854431152,
      "step": 10918
    },
    {
      "epoch": 0.3466666666666667,
      "grad_norm": 0.162109375,
      "learning_rate": 0.1,
      "loss": 2.271958112716675,
      "step": 10920
    },
    {
      "epoch": 0.3467301587301587,
      "grad_norm": 0.1162109375,
      "learning_rate": 0.1,
      "loss": 2.2935328483581543,
      "step": 10922
    },
    {
      "epoch": 0.3467936507936508,
      "grad_norm": 0.10205078125,
      "learning_rate": 0.1,
      "loss": 2.3027031421661377,
      "step": 10924
    },
    {
      "epoch": 0.34685714285714286,
      "grad_norm": 0.10107421875,
      "learning_rate": 0.1,
      "loss": 2.275430202484131,
      "step": 10926
    },
    {
      "epoch": 0.3469206349206349,
      "grad_norm": 0.109375,
      "learning_rate": 0.1,
      "loss": 2.2565975189208984,
      "step": 10928
    },
    {
      "epoch": 0.346984126984127,
      "grad_norm": 0.1171875,
      "learning_rate": 0.1,
      "loss": 2.2793102264404297,
      "step": 10930
    },
    {
      "epoch": 0.34704761904761905,
      "grad_norm": 0.2578125,
      "learning_rate": 0.1,
      "loss": 2.299699306488037,
      "step": 10932
    },
    {
      "epoch": 0.3471111111111111,
      "grad_norm": 0.33984375,
      "learning_rate": 0.1,
      "loss": 2.326904296875,
      "step": 10934
    },
    {
      "epoch": 0.3471746031746032,
      "grad_norm": 0.2236328125,
      "learning_rate": 0.1,
      "loss": 2.2868425846099854,
      "step": 10936
    },
    {
      "epoch": 0.34723809523809523,
      "grad_norm": 0.3515625,
      "learning_rate": 0.1,
      "loss": 2.2794508934020996,
      "step": 10938
    },
    {
      "epoch": 0.3473015873015873,
      "grad_norm": 0.2021484375,
      "learning_rate": 0.1,
      "loss": 2.3169283866882324,
      "step": 10940
    },
    {
      "epoch": 0.3473650793650794,
      "grad_norm": 0.2578125,
      "learning_rate": 0.1,
      "loss": 2.2692878246307373,
      "step": 10942
    },
    {
      "epoch": 0.3474285714285714,
      "grad_norm": 0.29296875,
      "learning_rate": 0.1,
      "loss": 2.31179141998291,
      "step": 10944
    },
    {
      "epoch": 0.3474920634920635,
      "grad_norm": 0.0517578125,
      "learning_rate": 0.1,
      "loss": 2.3011832237243652,
      "step": 10946
    },
    {
      "epoch": 0.34755555555555556,
      "grad_norm": 0.099609375,
      "learning_rate": 0.1,
      "loss": 2.289245128631592,
      "step": 10948
    },
    {
      "epoch": 0.3476190476190476,
      "grad_norm": 0.1005859375,
      "learning_rate": 0.1,
      "loss": 2.3354389667510986,
      "step": 10950
    },
    {
      "epoch": 0.3476825396825397,
      "grad_norm": 0.2236328125,
      "learning_rate": 0.1,
      "loss": 2.2830705642700195,
      "step": 10952
    },
    {
      "epoch": 0.34774603174603175,
      "grad_norm": 0.12060546875,
      "learning_rate": 0.1,
      "loss": 2.2964699268341064,
      "step": 10954
    },
    {
      "epoch": 0.3478095238095238,
      "grad_norm": 0.1083984375,
      "learning_rate": 0.1,
      "loss": 2.3209269046783447,
      "step": 10956
    },
    {
      "epoch": 0.3478730158730159,
      "grad_norm": 0.19140625,
      "learning_rate": 0.1,
      "loss": 2.3036513328552246,
      "step": 10958
    },
    {
      "epoch": 0.34793650793650793,
      "grad_norm": 0.328125,
      "learning_rate": 0.1,
      "loss": 2.296513795852661,
      "step": 10960
    },
    {
      "epoch": 0.348,
      "grad_norm": 0.109375,
      "learning_rate": 0.1,
      "loss": 2.307774066925049,
      "step": 10962
    },
    {
      "epoch": 0.3480634920634921,
      "grad_norm": 0.0712890625,
      "learning_rate": 0.1,
      "loss": 2.314592123031616,
      "step": 10964
    },
    {
      "epoch": 0.3481269841269841,
      "grad_norm": 0.07763671875,
      "learning_rate": 0.1,
      "loss": 2.3010873794555664,
      "step": 10966
    },
    {
      "epoch": 0.3481904761904762,
      "grad_norm": 0.07666015625,
      "learning_rate": 0.1,
      "loss": 2.314419746398926,
      "step": 10968
    },
    {
      "epoch": 0.34825396825396826,
      "grad_norm": 0.1083984375,
      "learning_rate": 0.1,
      "loss": 2.320044994354248,
      "step": 10970
    },
    {
      "epoch": 0.3483174603174603,
      "grad_norm": 0.12109375,
      "learning_rate": 0.1,
      "loss": 2.3089184761047363,
      "step": 10972
    },
    {
      "epoch": 0.3483809523809524,
      "grad_norm": 0.07373046875,
      "learning_rate": 0.1,
      "loss": 2.3317675590515137,
      "step": 10974
    },
    {
      "epoch": 0.34844444444444445,
      "grad_norm": 0.14453125,
      "learning_rate": 0.1,
      "loss": 2.3759024143218994,
      "step": 10976
    },
    {
      "epoch": 0.3485079365079365,
      "grad_norm": 0.51953125,
      "learning_rate": 0.1,
      "loss": 2.3379838466644287,
      "step": 10978
    },
    {
      "epoch": 0.3485714285714286,
      "grad_norm": 0.337890625,
      "learning_rate": 0.1,
      "loss": 2.3536903858184814,
      "step": 10980
    },
    {
      "epoch": 0.34863492063492063,
      "grad_norm": 0.267578125,
      "learning_rate": 0.1,
      "loss": 2.3582966327667236,
      "step": 10982
    },
    {
      "epoch": 0.3486984126984127,
      "grad_norm": 0.12353515625,
      "learning_rate": 0.1,
      "loss": 2.3518645763397217,
      "step": 10984
    },
    {
      "epoch": 0.3487619047619048,
      "grad_norm": 0.24609375,
      "learning_rate": 0.1,
      "loss": 2.381120204925537,
      "step": 10986
    },
    {
      "epoch": 0.3488253968253968,
      "grad_norm": 0.31640625,
      "learning_rate": 0.1,
      "loss": 2.351613998413086,
      "step": 10988
    },
    {
      "epoch": 0.3488888888888889,
      "grad_norm": 0.12451171875,
      "learning_rate": 0.1,
      "loss": 2.362210988998413,
      "step": 10990
    },
    {
      "epoch": 0.34895238095238096,
      "grad_norm": 0.1318359375,
      "learning_rate": 0.1,
      "loss": 2.353654623031616,
      "step": 10992
    },
    {
      "epoch": 0.349015873015873,
      "grad_norm": 0.08740234375,
      "learning_rate": 0.1,
      "loss": 2.371495485305786,
      "step": 10994
    },
    {
      "epoch": 0.3490793650793651,
      "grad_norm": 0.15625,
      "learning_rate": 0.1,
      "loss": 2.387871026992798,
      "step": 10996
    },
    {
      "epoch": 0.34914285714285714,
      "grad_norm": 0.0830078125,
      "learning_rate": 0.1,
      "loss": 2.3548786640167236,
      "step": 10998
    },
    {
      "epoch": 0.3492063492063492,
      "grad_norm": 0.099609375,
      "learning_rate": 0.1,
      "loss": 2.38694429397583,
      "step": 11000
    },
    {
      "epoch": 0.3492698412698413,
      "grad_norm": 0.125,
      "learning_rate": 0.1,
      "loss": 2.3750088214874268,
      "step": 11002
    },
    {
      "epoch": 0.34933333333333333,
      "grad_norm": 0.326171875,
      "learning_rate": 0.1,
      "loss": 2.3959736824035645,
      "step": 11004
    },
    {
      "epoch": 0.3493968253968254,
      "grad_norm": 0.236328125,
      "learning_rate": 0.1,
      "loss": 2.3784327507019043,
      "step": 11006
    },
    {
      "epoch": 0.34946031746031747,
      "grad_norm": 0.142578125,
      "learning_rate": 0.1,
      "loss": 2.387637138366699,
      "step": 11008
    },
    {
      "epoch": 0.3495238095238095,
      "grad_norm": 0.1943359375,
      "learning_rate": 0.1,
      "loss": 2.3979554176330566,
      "step": 11010
    },
    {
      "epoch": 0.3495873015873016,
      "grad_norm": 0.396484375,
      "learning_rate": 0.1,
      "loss": 2.3882758617401123,
      "step": 11012
    },
    {
      "epoch": 0.34965079365079366,
      "grad_norm": 0.11376953125,
      "learning_rate": 0.1,
      "loss": 2.3989250659942627,
      "step": 11014
    },
    {
      "epoch": 0.3497142857142857,
      "grad_norm": 0.0439453125,
      "learning_rate": 0.1,
      "loss": 2.4110400676727295,
      "step": 11016
    },
    {
      "epoch": 0.3497777777777778,
      "grad_norm": 0.053955078125,
      "learning_rate": 0.1,
      "loss": 2.3825628757476807,
      "step": 11018
    },
    {
      "epoch": 0.34984126984126984,
      "grad_norm": 0.06884765625,
      "learning_rate": 0.1,
      "loss": 2.39042329788208,
      "step": 11020
    },
    {
      "epoch": 0.3499047619047619,
      "grad_norm": 0.1376953125,
      "learning_rate": 0.1,
      "loss": 2.4223592281341553,
      "step": 11022
    },
    {
      "epoch": 0.349968253968254,
      "grad_norm": 0.13671875,
      "learning_rate": 0.1,
      "loss": 2.3982603549957275,
      "step": 11024
    }
  ],
  "logging_steps": 2,
|
|
|
|
      "attributes": {}
    }
  },
  "total_flos": 3.6513666425418183e+19,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null