Training in progress, step 11970, checkpoint
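The changed file, `last-checkpoint/trainer_state.json`, stores one record per logging step, each with `epoch`, `grad_norm`, `learning_rate`, `loss`, and `step`. The sketch below is one possible way to summarize such records, not part of the commit itself. It assumes the records live under a top-level `log_history` key (the key is not visible in this excerpt), and the helper name `summarize_log_history` is hypothetical. The three sample records are copied from the added lines of the diff.

```python
import json
from statistics import mean


def summarize_log_history(state: dict) -> dict:
    """Summarize the training records of a trainer_state.json-style dict."""
    # Assumption: records are stored under "log_history"; keep only those with a loss.
    entries = [e for e in state.get("log_history", []) if "loss" in e]
    if not entries:
        raise ValueError("no training entries with a 'loss' field")
    last = entries[-1]
    return {
        "num_entries": len(entries),
        "last_step": last["step"],
        "mean_loss": mean(e["loss"] for e in entries),
        "max_grad_norm": max(e.get("grad_norm", 0.0) for e in entries),
        # step divided by fractional epoch estimates the optimizer steps per epoch
        "steps_per_epoch": round(last["step"] / last["epoch"]),
    }


# Three records copied from the diff; a real file would be read with json.load.
sample_state = json.loads("""
{
  "global_step": 11970,
  "log_history": [
    {"epoch": 0.37003174603174604, "grad_norm": 0.31640625,
     "learning_rate": 0.1, "loss": 2.4608564376831055, "step": 11656},
    {"epoch": 0.3700952380952381, "grad_norm": 0.23046875,
     "learning_rate": 0.1, "loss": 2.4572176933288574, "step": 11658},
    {"epoch": 0.37015873015873013, "grad_norm": 0.2373046875,
     "learning_rate": 0.1, "loss": 2.5033202171325684, "step": 11660}
  ]
}
""")

summary = summarize_log_history(sample_state)
print(summary)
```

On these records the estimate comes out at 31500 steps per epoch, which agrees with the updated header fields: step 11970 divided by 31500 is 0.38, the new `epoch` value.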
- last-checkpoint/trainer_state.json +1109 -3
last-checkpoint/trainer_state.json
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
-
"epoch": 0.
|
| 6 |
"eval_steps": 3150,
|
| 7 |
-
"global_step":
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
@@ -40821,6 +40821,1112 @@
|
|
| 40821 |
"learning_rate": 0.1,
|
| 40822 |
"loss": 2.473306179046631,
|
| 40823 |
"step": 11654
|
| 40824 |
}
|
| 40825 |
],
|
| 40826 |
"logging_steps": 2,
|
|
@@ -40840,7 +41946,7 @@
|
|
| 40840 |
"attributes": {}
|
| 40841 |
}
|
| 40842 |
},
|
| 40843 |
-
"total_flos": 3.
|
| 40844 |
"train_batch_size": 4,
|
| 40845 |
"trial_name": null,
|
| 40846 |
"trial_params": null
|
|
|
|
| 2 |
"best_global_step": null,
|
| 3 |
"best_metric": null,
|
| 4 |
"best_model_checkpoint": null,
|
| 5 |
+
"epoch": 0.38,
|
| 6 |
"eval_steps": 3150,
|
| 7 |
+
"global_step": 11970,
|
| 8 |
"is_hyper_param_search": false,
|
| 9 |
"is_local_process_zero": true,
|
| 10 |
"is_world_process_zero": true,
|
|
|
|
| 40821 |
"learning_rate": 0.1,
|
| 40822 |
"loss": 2.473306179046631,
|
| 40823 |
"step": 11654
|
| 40824 |
+
},
|
| 40825 |
+
{
|
| 40826 |
+
"epoch": 0.37003174603174604,
|
| 40827 |
+
"grad_norm": 0.31640625,
|
| 40828 |
+
"learning_rate": 0.1,
|
| 40829 |
+
"loss": 2.4608564376831055,
|
| 40830 |
+
"step": 11656
|
| 40831 |
+
},
|
| 40832 |
+
{
|
| 40833 |
+
"epoch": 0.3700952380952381,
|
| 40834 |
+
"grad_norm": 0.23046875,
|
| 40835 |
+
"learning_rate": 0.1,
|
| 40836 |
+
"loss": 2.4572176933288574,
|
| 40837 |
+
"step": 11658
|
| 40838 |
+
},
|
| 40839 |
+
{
|
| 40840 |
+
"epoch": 0.37015873015873013,
|
| 40841 |
+
"grad_norm": 0.2373046875,
|
| 40842 |
+
"learning_rate": 0.1,
|
| 40843 |
+
"loss": 2.5033202171325684,
|
| 40844 |
+
"step": 11660
|
| 40845 |
+
},
|
| 40846 |
+
{
|
| 40847 |
+
"epoch": 0.37022222222222223,
|
| 40848 |
+
"grad_norm": 0.232421875,
|
| 40849 |
+
"learning_rate": 0.1,
|
| 40850 |
+
"loss": 2.4684784412384033,
|
| 40851 |
+
"step": 11662
|
| 40852 |
+
},
|
| 40853 |
+
{
|
| 40854 |
+
"epoch": 0.3702857142857143,
|
| 40855 |
+
"grad_norm": 0.140625,
|
| 40856 |
+
"learning_rate": 0.1,
|
| 40857 |
+
"loss": 2.4543795585632324,
|
| 40858 |
+
"step": 11664
|
| 40859 |
+
},
|
| 40860 |
+
{
|
| 40861 |
+
"epoch": 0.3703492063492064,
|
| 40862 |
+
"grad_norm": 0.2412109375,
|
| 40863 |
+
"learning_rate": 0.1,
|
| 40864 |
+
"loss": 2.4649386405944824,
|
| 40865 |
+
"step": 11666
|
| 40866 |
+
},
|
| 40867 |
+
{
|
| 40868 |
+
"epoch": 0.3704126984126984,
|
| 40869 |
+
"grad_norm": 0.1318359375,
|
| 40870 |
+
"learning_rate": 0.1,
|
| 40871 |
+
"loss": 2.442139148712158,
|
| 40872 |
+
"step": 11668
|
| 40873 |
+
},
|
| 40874 |
+
{
|
| 40875 |
+
"epoch": 0.37047619047619046,
|
| 40876 |
+
"grad_norm": 0.15234375,
|
| 40877 |
+
"learning_rate": 0.1,
|
| 40878 |
+
"loss": 2.467223644256592,
|
| 40879 |
+
"step": 11670
|
| 40880 |
+
},
|
| 40881 |
+
{
|
| 40882 |
+
"epoch": 0.37053968253968256,
|
| 40883 |
+
"grad_norm": 0.1025390625,
|
| 40884 |
+
"learning_rate": 0.1,
|
| 40885 |
+
"loss": 2.484752655029297,
|
| 40886 |
+
"step": 11672
|
| 40887 |
+
},
|
| 40888 |
+
{
|
| 40889 |
+
"epoch": 0.3706031746031746,
|
| 40890 |
+
"grad_norm": 0.1669921875,
|
| 40891 |
+
"learning_rate": 0.1,
|
| 40892 |
+
"loss": 2.461017370223999,
|
| 40893 |
+
"step": 11674
|
| 40894 |
+
},
|
| 40895 |
+
{
|
| 40896 |
+
"epoch": 0.37066666666666664,
|
| 40897 |
+
"grad_norm": 0.2197265625,
|
| 40898 |
+
"learning_rate": 0.1,
|
| 40899 |
+
"loss": 2.465242624282837,
|
| 40900 |
+
"step": 11676
|
| 40901 |
+
},
|
| 40902 |
+
{
|
| 40903 |
+
"epoch": 0.37073015873015874,
|
| 40904 |
+
"grad_norm": 0.2421875,
|
| 40905 |
+
"learning_rate": 0.1,
|
| 40906 |
+
"loss": 2.466881513595581,
|
| 40907 |
+
"step": 11678
|
| 40908 |
+
},
|
| 40909 |
+
{
|
| 40910 |
+
"epoch": 0.3707936507936508,
|
| 40911 |
+
"grad_norm": 0.18359375,
|
| 40912 |
+
"learning_rate": 0.1,
|
| 40913 |
+
"loss": 2.4572830200195312,
|
| 40914 |
+
"step": 11680
|
| 40915 |
+
},
|
| 40916 |
+
{
|
| 40917 |
+
"epoch": 0.37085714285714283,
|
| 40918 |
+
"grad_norm": 0.275390625,
|
| 40919 |
+
"learning_rate": 0.1,
|
| 40920 |
+
"loss": 2.473813772201538,
|
| 40921 |
+
"step": 11682
|
| 40922 |
+
},
|
| 40923 |
+
{
|
| 40924 |
+
"epoch": 0.37092063492063493,
|
| 40925 |
+
"grad_norm": 0.306640625,
|
| 40926 |
+
"learning_rate": 0.1,
|
| 40927 |
+
"loss": 2.466486930847168,
|
| 40928 |
+
"step": 11684
|
| 40929 |
+
},
|
| 40930 |
+
{
|
| 40931 |
+
"epoch": 0.37098412698412697,
|
| 40932 |
+
"grad_norm": 0.353515625,
|
| 40933 |
+
"learning_rate": 0.1,
|
| 40934 |
+
"loss": 2.4515933990478516,
|
| 40935 |
+
"step": 11686
|
| 40936 |
+
},
|
| 40937 |
+
{
|
| 40938 |
+
"epoch": 0.37104761904761907,
|
| 40939 |
+
"grad_norm": 0.2431640625,
|
| 40940 |
+
"learning_rate": 0.1,
|
| 40941 |
+
"loss": 2.45233416557312,
|
| 40942 |
+
"step": 11688
|
| 40943 |
+
},
|
| 40944 |
+
{
|
| 40945 |
+
"epoch": 0.3711111111111111,
|
| 40946 |
+
"grad_norm": 0.162109375,
|
| 40947 |
+
"learning_rate": 0.1,
|
| 40948 |
+
"loss": 2.4697437286376953,
|
| 40949 |
+
"step": 11690
|
| 40950 |
+
},
|
| 40951 |
+
{
|
| 40952 |
+
"epoch": 0.37117460317460316,
|
| 40953 |
+
"grad_norm": 0.12109375,
|
| 40954 |
+
"learning_rate": 0.1,
|
| 40955 |
+
"loss": 2.4969775676727295,
|
| 40956 |
+
"step": 11692
|
| 40957 |
+
},
|
| 40958 |
+
{
|
| 40959 |
+
"epoch": 0.37123809523809526,
|
| 40960 |
+
"grad_norm": 0.07861328125,
|
| 40961 |
+
"learning_rate": 0.1,
|
| 40962 |
+
"loss": 2.4909560680389404,
|
| 40963 |
+
"step": 11694
|
| 40964 |
+
},
|
| 40965 |
+
{
|
| 40966 |
+
"epoch": 0.3713015873015873,
|
| 40967 |
+
"grad_norm": 0.09619140625,
|
| 40968 |
+
"learning_rate": 0.1,
|
| 40969 |
+
"loss": 2.466994285583496,
|
| 40970 |
+
"step": 11696
|
| 40971 |
+
},
|
| 40972 |
+
{
|
| 40973 |
+
"epoch": 0.37136507936507934,
|
| 40974 |
+
"grad_norm": 0.154296875,
|
| 40975 |
+
"learning_rate": 0.1,
|
| 40976 |
+
"loss": 2.456789493560791,
|
| 40977 |
+
"step": 11698
|
| 40978 |
+
},
|
| 40979 |
+
{
|
| 40980 |
+
"epoch": 0.37142857142857144,
|
| 40981 |
+
"grad_norm": 0.056396484375,
|
| 40982 |
+
"learning_rate": 0.1,
|
| 40983 |
+
"loss": 2.4662811756134033,
|
| 40984 |
+
"step": 11700
|
| 40985 |
+
},
|
| 40986 |
+
{
|
| 40987 |
+
"epoch": 0.3714920634920635,
|
| 40988 |
+
"grad_norm": 0.16796875,
|
| 40989 |
+
"learning_rate": 0.1,
|
| 40990 |
+
"loss": 2.4680871963500977,
|
| 40991 |
+
"step": 11702
|
| 40992 |
+
},
|
| 40993 |
+
{
|
| 40994 |
+
"epoch": 0.37155555555555553,
|
| 40995 |
+
"grad_norm": 0.75390625,
|
| 40996 |
+
"learning_rate": 0.1,
|
| 40997 |
+
"loss": 2.481945037841797,
|
| 40998 |
+
"step": 11704
|
| 40999 |
+
},
|
| 41000 |
+
{
|
| 41001 |
+
"epoch": 0.3716190476190476,
|
| 41002 |
+
"grad_norm": 0.2421875,
|
| 41003 |
+
"learning_rate": 0.1,
|
| 41004 |
+
"loss": 2.465639591217041,
|
| 41005 |
+
"step": 11706
|
| 41006 |
+
},
|
| 41007 |
+
{
|
| 41008 |
+
"epoch": 0.37168253968253967,
|
| 41009 |
+
"grad_norm": 0.10498046875,
|
| 41010 |
+
"learning_rate": 0.1,
|
| 41011 |
+
"loss": 2.4653689861297607,
|
| 41012 |
+
"step": 11708
|
| 41013 |
+
},
|
| 41014 |
+
{
|
| 41015 |
+
"epoch": 0.37174603174603177,
|
| 41016 |
+
"grad_norm": 0.1171875,
|
| 41017 |
+
"learning_rate": 0.1,
|
| 41018 |
+
"loss": 2.49106764793396,
|
| 41019 |
+
"step": 11710
|
| 41020 |
+
},
|
| 41021 |
+
{
|
| 41022 |
+
"epoch": 0.3718095238095238,
|
| 41023 |
+
"grad_norm": 0.21875,
|
| 41024 |
+
"learning_rate": 0.1,
|
| 41025 |
+
"loss": 2.4726955890655518,
|
| 41026 |
+
"step": 11712
|
| 41027 |
+
},
|
| 41028 |
+
{
|
| 41029 |
+
"epoch": 0.37187301587301586,
|
| 41030 |
+
"grad_norm": 0.2333984375,
|
| 41031 |
+
"learning_rate": 0.1,
|
| 41032 |
+
"loss": 2.4753713607788086,
|
| 41033 |
+
"step": 11714
|
| 41034 |
+
},
|
| 41035 |
+
{
|
| 41036 |
+
"epoch": 0.37193650793650795,
|
| 41037 |
+
"grad_norm": 0.09716796875,
|
| 41038 |
+
"learning_rate": 0.1,
|
| 41039 |
+
"loss": 2.4738142490386963,
|
| 41040 |
+
"step": 11716
|
| 41041 |
+
},
|
| 41042 |
+
{
|
| 41043 |
+
"epoch": 0.372,
|
| 41044 |
+
"grad_norm": 0.2255859375,
|
| 41045 |
+
"learning_rate": 0.1,
|
| 41046 |
+
"loss": 2.469127893447876,
|
| 41047 |
+
"step": 11718
|
| 41048 |
+
},
|
| 41049 |
+
{
|
| 41050 |
+
"epoch": 0.37206349206349204,
|
| 41051 |
+
"grad_norm": 0.271484375,
|
| 41052 |
+
"learning_rate": 0.1,
|
| 41053 |
+
"loss": 2.498310089111328,
|
| 41054 |
+
"step": 11720
|
| 41055 |
+
},
|
| 41056 |
+
{
|
| 41057 |
+
"epoch": 0.37212698412698414,
|
| 41058 |
+
"grad_norm": 0.1806640625,
|
| 41059 |
+
"learning_rate": 0.1,
|
| 41060 |
+
"loss": 2.4911274909973145,
|
| 41061 |
+
"step": 11722
|
| 41062 |
+
},
|
| 41063 |
+
{
|
| 41064 |
+
"epoch": 0.3721904761904762,
|
| 41065 |
+
"grad_norm": 0.1640625,
|
| 41066 |
+
"learning_rate": 0.1,
|
| 41067 |
+
"loss": 2.496417760848999,
|
| 41068 |
+
"step": 11724
|
| 41069 |
+
},
|
| 41070 |
+
{
|
| 41071 |
+
"epoch": 0.3722539682539683,
|
| 41072 |
+
"grad_norm": 0.146484375,
|
| 41073 |
+
"learning_rate": 0.1,
|
| 41074 |
+
"loss": 2.491314172744751,
|
| 41075 |
+
"step": 11726
|
| 41076 |
+
},
|
| 41077 |
+
{
|
| 41078 |
+
"epoch": 0.3723174603174603,
|
| 41079 |
+
"grad_norm": 0.11669921875,
|
| 41080 |
+
"learning_rate": 0.1,
|
| 41081 |
+
"loss": 2.4768242835998535,
|
| 41082 |
+
"step": 11728
|
| 41083 |
+
},
|
| 41084 |
+
{
|
| 41085 |
+
"epoch": 0.37238095238095237,
|
| 41086 |
+
"grad_norm": 0.0810546875,
|
| 41087 |
+
"learning_rate": 0.1,
|
| 41088 |
+
"loss": 2.4933431148529053,
|
| 41089 |
+
"step": 11730
|
| 41090 |
+
},
|
| 41091 |
+
{
|
| 41092 |
+
"epoch": 0.37244444444444447,
|
| 41093 |
+
"grad_norm": 0.08642578125,
|
| 41094 |
+
"learning_rate": 0.1,
|
| 41095 |
+
"loss": 2.504645347595215,
|
| 41096 |
+
"step": 11732
|
| 41097 |
+
},
|
| 41098 |
+
{
|
| 41099 |
+
"epoch": 0.3725079365079365,
|
| 41100 |
+
"grad_norm": 0.15625,
|
| 41101 |
+
"learning_rate": 0.1,
|
| 41102 |
+
"loss": 2.4549572467803955,
|
| 41103 |
+
"step": 11734
|
| 41104 |
+
},
|
| 41105 |
+
{
|
| 41106 |
+
"epoch": 0.37257142857142855,
|
| 41107 |
+
"grad_norm": 0.328125,
|
| 41108 |
+
"learning_rate": 0.1,
|
| 41109 |
+
"loss": 2.499910354614258,
|
| 41110 |
+
"step": 11736
|
| 41111 |
+
},
|
| 41112 |
+
{
|
| 41113 |
+
"epoch": 0.37263492063492065,
|
| 41114 |
+
"grad_norm": 0.671875,
|
| 41115 |
+
"learning_rate": 0.1,
|
| 41116 |
+
"loss": 2.474417209625244,
|
| 41117 |
+
"step": 11738
|
| 41118 |
+
},
|
| 41119 |
+
{
|
| 41120 |
+
"epoch": 0.3726984126984127,
|
| 41121 |
+
"grad_norm": 0.1435546875,
|
| 41122 |
+
"learning_rate": 0.1,
|
| 41123 |
+
"loss": 2.48007869720459,
|
| 41124 |
+
"step": 11740
|
| 41125 |
+
},
|
| 41126 |
+
{
|
| 41127 |
+
"epoch": 0.37276190476190474,
|
| 41128 |
+
"grad_norm": 0.1494140625,
|
| 41129 |
+
"learning_rate": 0.1,
|
| 41130 |
+
"loss": 2.511558771133423,
|
| 41131 |
+
"step": 11742
|
| 41132 |
+
},
|
| 41133 |
+
{
|
| 41134 |
+
"epoch": 0.37282539682539684,
|
| 41135 |
+
"grad_norm": 0.1787109375,
|
| 41136 |
+
"learning_rate": 0.1,
|
| 41137 |
+
"loss": 2.4778945446014404,
|
| 41138 |
+
"step": 11744
|
| 41139 |
+
},
|
| 41140 |
+
{
|
| 41141 |
+
"epoch": 0.3728888888888889,
|
| 41142 |
+
"grad_norm": 0.365234375,
|
| 41143 |
+
"learning_rate": 0.1,
|
| 41144 |
+
"loss": 2.465162992477417,
|
| 41145 |
+
"step": 11746
|
| 41146 |
+
},
|
| 41147 |
+
{
|
| 41148 |
+
"epoch": 0.372952380952381,
|
| 41149 |
+
"grad_norm": 0.357421875,
|
| 41150 |
+
"learning_rate": 0.1,
|
| 41151 |
+
"loss": 2.4944188594818115,
|
| 41152 |
+
"step": 11748
|
| 41153 |
+
},
|
| 41154 |
+
{
|
| 41155 |
+
"epoch": 0.373015873015873,
|
| 41156 |
+
"grad_norm": 0.09814453125,
|
| 41157 |
+
"learning_rate": 0.1,
|
| 41158 |
+
"loss": 2.492619276046753,
|
| 41159 |
+
"step": 11750
|
| 41160 |
+
},
|
| 41161 |
+
{
|
| 41162 |
+
"epoch": 0.37307936507936507,
|
| 41163 |
+
"grad_norm": 0.10107421875,
|
| 41164 |
+
"learning_rate": 0.1,
|
| 41165 |
+
"loss": 2.499040365219116,
|
| 41166 |
+
"step": 11752
|
| 41167 |
+
},
|
| 41168 |
+
{
|
| 41169 |
+
"epoch": 0.37314285714285716,
|
| 41170 |
+
"grad_norm": 0.3125,
|
| 41171 |
+
"learning_rate": 0.1,
|
| 41172 |
+
"loss": 2.4819164276123047,
|
| 41173 |
+
"step": 11754
|
| 41174 |
+
},
|
| 41175 |
+
{
|
| 41176 |
+
"epoch": 0.3732063492063492,
|
| 41177 |
+
"grad_norm": 0.3125,
|
| 41178 |
+
"learning_rate": 0.1,
|
| 41179 |
+
"loss": 2.51578688621521,
|
| 41180 |
+
"step": 11756
|
| 41181 |
+
},
|
| 41182 |
+
{
|
| 41183 |
+
"epoch": 0.37326984126984125,
|
| 41184 |
+
"grad_norm": 0.068359375,
|
| 41185 |
+
"learning_rate": 0.1,
|
| 41186 |
+
"loss": 2.4873805046081543,
|
| 41187 |
+
"step": 11758
|
| 41188 |
+
},
|
| 41189 |
+
{
|
| 41190 |
+
"epoch": 0.37333333333333335,
|
| 41191 |
+
"grad_norm": 0.064453125,
|
| 41192 |
+
"learning_rate": 0.1,
|
| 41193 |
+
"loss": 2.5044052600860596,
|
| 41194 |
+
"step": 11760
|
| 41195 |
+
},
|
| 41196 |
+
{
|
| 41197 |
+
"epoch": 0.3733968253968254,
|
| 41198 |
+
"grad_norm": 0.125,
|
| 41199 |
+
"learning_rate": 0.1,
|
| 41200 |
+
"loss": 2.470613479614258,
|
| 41201 |
+
"step": 11762
|
| 41202 |
+
},
|
| 41203 |
+
{
|
| 41204 |
+
"epoch": 0.37346031746031744,
|
| 41205 |
+
"grad_norm": 0.0439453125,
|
| 41206 |
+
"learning_rate": 0.1,
|
| 41207 |
+
"loss": 2.4653987884521484,
|
| 41208 |
+
"step": 11764
|
| 41209 |
+
},
|
| 41210 |
+
{
|
| 41211 |
+
"epoch": 0.37352380952380954,
|
| 41212 |
+
"grad_norm": 0.29296875,
|
| 41213 |
+
"learning_rate": 0.1,
|
| 41214 |
+
"loss": 2.5110135078430176,
|
| 41215 |
+
"step": 11766
|
| 41216 |
+
},
|
| 41217 |
+
{
|
| 41218 |
+
"epoch": 0.3735873015873016,
|
| 41219 |
+
"grad_norm": 0.2255859375,
|
| 41220 |
+
"learning_rate": 0.1,
|
| 41221 |
+
"loss": 2.480912446975708,
|
| 41222 |
+
"step": 11768
|
| 41223 |
+
},
|
| 41224 |
+
{
|
| 41225 |
+
"epoch": 0.3736507936507937,
|
| 41226 |
+
"grad_norm": 0.10693359375,
|
| 41227 |
+
"learning_rate": 0.1,
|
| 41228 |
+
"loss": 2.505415916442871,
|
| 41229 |
+
"step": 11770
|
| 41230 |
+
},
|
| 41231 |
+
{
|
| 41232 |
+
"epoch": 0.3737142857142857,
|
| 41233 |
+
"grad_norm": 0.21875,
|
| 41234 |
+
"learning_rate": 0.1,
|
| 41235 |
+
"loss": 2.484025239944458,
|
| 41236 |
+
"step": 11772
|
| 41237 |
+
},
|
| 41238 |
+
{
|
| 41239 |
+
"epoch": 0.37377777777777776,
|
| 41240 |
+
"grad_norm": 0.11572265625,
|
| 41241 |
+
"learning_rate": 0.1,
|
| 41242 |
+
"loss": 2.478426933288574,
|
| 41243 |
+
"step": 11774
|
| 41244 |
+
},
|
| 41245 |
+
{
|
| 41246 |
+
"epoch": 0.37384126984126986,
|
| 41247 |
+
"grad_norm": 0.099609375,
|
| 41248 |
+
"learning_rate": 0.1,
|
| 41249 |
+
"loss": 2.497925043106079,
|
| 41250 |
+
"step": 11776
|
| 41251 |
+
},
|
| 41252 |
+
{
|
| 41253 |
+
"epoch": 0.3739047619047619,
|
| 41254 |
+
"grad_norm": 0.11572265625,
|
| 41255 |
+
"learning_rate": 0.1,
|
| 41256 |
+
"loss": 2.4825384616851807,
|
| 41257 |
+
"step": 11778
|
| 41258 |
+
},
|
| 41259 |
+
{
|
| 41260 |
+
"epoch": 0.37396825396825395,
|
| 41261 |
+
"grad_norm": 0.2470703125,
|
| 41262 |
+
"learning_rate": 0.1,
|
| 41263 |
+
"loss": 2.4915850162506104,
|
| 41264 |
+
"step": 11780
|
| 41265 |
+
},
|
| 41266 |
+
{
|
| 41267 |
+
"epoch": 0.37403174603174605,
|
| 41268 |
+
"grad_norm": 0.06103515625,
|
| 41269 |
+
"learning_rate": 0.1,
|
| 41270 |
+
"loss": 2.4559059143066406,
|
| 41271 |
+
"step": 11782
|
| 41272 |
+
},
|
| 41273 |
+
{
|
| 41274 |
+
"epoch": 0.3740952380952381,
|
| 41275 |
+
"grad_norm": 0.12109375,
|
| 41276 |
+
"learning_rate": 0.1,
|
| 41277 |
+
"loss": 2.5077600479125977,
|
| 41278 |
+
"step": 11784
|
| 41279 |
+
},
|
| 41280 |
+
{
|
| 41281 |
+
"epoch": 0.37415873015873014,
|
| 41282 |
+
"grad_norm": 0.4921875,
|
| 41283 |
+
"learning_rate": 0.1,
|
| 41284 |
+
"loss": 2.4866786003112793,
|
| 41285 |
+
"step": 11786
|
| 41286 |
+
},
|
| 41287 |
+
{
|
| 41288 |
+
"epoch": 0.37422222222222223,
|
| 41289 |
+
"grad_norm": 0.423828125,
|
| 41290 |
+
"learning_rate": 0.1,
|
| 41291 |
+
"loss": 2.4874205589294434,
|
| 41292 |
+
"step": 11788
|
| 41293 |
+
},
|
| 41294 |
+
{
|
| 41295 |
+
"epoch": 0.3742857142857143,
|
| 41296 |
+
"grad_norm": 0.0771484375,
|
| 41297 |
+
"learning_rate": 0.1,
|
| 41298 |
+
"loss": 2.462109088897705,
|
| 41299 |
+
"step": 11790
|
| 41300 |
+
},
|
| 41301 |
+
{
|
| 41302 |
+
"epoch": 0.3743492063492064,
|
| 41303 |
+
"grad_norm": 0.166015625,
|
| 41304 |
+
"learning_rate": 0.1,
|
| 41305 |
+
"loss": 2.4757614135742188,
|
| 41306 |
+
"step": 11792
|
| 41307 |
+
},
|
| 41308 |
+
{
|
| 41309 |
+
"epoch": 0.3744126984126984,
|
| 41310 |
+
"grad_norm": 0.1181640625,
|
| 41311 |
+
"learning_rate": 0.1,
|
| 41312 |
+
"loss": 2.506653308868408,
|
| 41313 |
+
"step": 11794
|
| 41314 |
+
},
|
| 41315 |
+
{
|
| 41316 |
+
"epoch": 0.37447619047619046,
|
| 41317 |
+
"grad_norm": 0.11328125,
|
| 41318 |
+
"learning_rate": 0.1,
|
| 41319 |
+
"loss": 2.4672343730926514,
|
| 41320 |
+
"step": 11796
|
| 41321 |
+
},
|
| 41322 |
+
{
|
| 41323 |
+
"epoch": 0.37453968253968256,
|
| 41324 |
+
"grad_norm": 0.10400390625,
|
| 41325 |
+
"learning_rate": 0.1,
|
| 41326 |
+
"loss": 2.47956919670105,
|
| 41327 |
+
"step": 11798
|
| 41328 |
+
},
|
| 41329 |
+
{
|
| 41330 |
+
"epoch": 0.3746031746031746,
|
| 41331 |
+
"grad_norm": 0.1298828125,
|
| 41332 |
+
"learning_rate": 0.1,
|
| 41333 |
+
"loss": 2.4795355796813965,
|
| 41334 |
+
"step": 11800
|
| 41335 |
+
},
|
| 41336 |
+
{
|
| 41337 |
+
"epoch": 0.37466666666666665,
|
| 41338 |
+
"grad_norm": 0.1337890625,
|
| 41339 |
+
"learning_rate": 0.1,
|
| 41340 |
+
"loss": 2.47666597366333,
|
| 41341 |
+
"step": 11802
|
| 41342 |
+
},
|
| 41343 |
+
{
|
| 41344 |
+
"epoch": 0.37473015873015875,
|
| 41345 |
+
"grad_norm": 0.3984375,
|
| 41346 |
+
"learning_rate": 0.1,
|
| 41347 |
+
"loss": 2.47880220413208,
|
| 41348 |
+
"step": 11804
|
| 41349 |
+
},
|
| 41350 |
+
{
|
| 41351 |
+
"epoch": 0.3747936507936508,
|
| 41352 |
+
"grad_norm": 0.5078125,
|
| 41353 |
+
"learning_rate": 0.1,
|
| 41354 |
+
"loss": 2.4636754989624023,
|
| 41355 |
+
"step": 11806
|
| 41356 |
+
},
|
| 41357 |
+
{
|
| 41358 |
+
"epoch": 0.37485714285714283,
|
| 41359 |
+
"grad_norm": 0.21875,
|
| 41360 |
+
"learning_rate": 0.1,
|
| 41361 |
+
"loss": 2.4849722385406494,
|
| 41362 |
+
"step": 11808
|
| 41363 |
+
},
|
| 41364 |
+
{
|
| 41365 |
+
"epoch": 0.37492063492063493,
|
| 41366 |
+
"grad_norm": 0.123046875,
|
| 41367 |
+
"learning_rate": 0.1,
|
| 41368 |
+
"loss": 2.4913809299468994,
|
| 41369 |
+
"step": 11810
|
| 41370 |
+
},
|
| 41371 |
+
{
|
| 41372 |
+
"epoch": 0.374984126984127,
|
| 41373 |
+
"grad_norm": 0.134765625,
|
| 41374 |
+
"learning_rate": 0.1,
|
| 41375 |
+
"loss": 2.4815680980682373,
|
| 41376 |
+
"step": 11812
|
| 41377 |
+
},
|
| 41378 |
+
{
|
| 41379 |
+
"epoch": 0.3750476190476191,
|
| 41380 |
+
"grad_norm": 0.25,
|
| 41381 |
+
"learning_rate": 0.1,
|
| 41382 |
+
"loss": 2.4828529357910156,
|
| 41383 |
+
"step": 11814
|
| 41384 |
+
},
|
| 41385 |
+
{
|
| 41386 |
+
"epoch": 0.3751111111111111,
|
| 41387 |
+
"grad_norm": 0.3203125,
|
| 41388 |
+
"learning_rate": 0.1,
|
| 41389 |
+
"loss": 2.46449875831604,
|
| 41390 |
+
"step": 11816
|
| 41391 |
+
},
|
| 41392 |
+
{
|
| 41393 |
+
"epoch": 0.37517460317460316,
|
| 41394 |
+
"grad_norm": 0.1787109375,
|
| 41395 |
+
"learning_rate": 0.1,
|
| 41396 |
+
"loss": 2.490966558456421,
|
| 41397 |
+
"step": 11818
|
| 41398 |
+
},
|
| 41399 |
+
{
|
| 41400 |
+
"epoch": 0.37523809523809526,
|
| 41401 |
+
"grad_norm": 0.04345703125,
|
| 41402 |
+
"learning_rate": 0.1,
|
| 41403 |
+
"loss": 2.482635021209717,
|
| 41404 |
+
"step": 11820
|
| 41405 |
+
},
|
| 41406 |
+
{
|
| 41407 |
+
"epoch": 0.3753015873015873,
|
| 41408 |
+
"grad_norm": 0.057861328125,
|
| 41409 |
+
"learning_rate": 0.1,
|
| 41410 |
+
"loss": 2.484558582305908,
|
| 41411 |
+
"step": 11822
|
| 41412 |
+
},
|
| 41413 |
+
{
|
| 41414 |
+
"epoch": 0.37536507936507935,
|
| 41415 |
+
"grad_norm": 0.087890625,
|
| 41416 |
+
"learning_rate": 0.1,
|
| 41417 |
+
"loss": 2.4617435932159424,
|
| 41418 |
+
"step": 11824
|
| 41419 |
+
},
|
| 41420 |
+
{
|
| 41421 |
+
"epoch": 0.37542857142857144,
|
| 41422 |
+
"grad_norm": 0.126953125,
|
| 41423 |
+
"learning_rate": 0.1,
|
| 41424 |
+
"loss": 2.461313009262085,
|
| 41425 |
+
"step": 11826
|
| 41426 |
+
},
|
| 41427 |
+
{
|
| 41428 |
+
"epoch": 0.3754920634920635,
|
| 41429 |
+
"grad_norm": 0.25390625,
|
| 41430 |
+
"learning_rate": 0.1,
|
| 41431 |
+
"loss": 2.452178716659546,
|
| 41432 |
+
"step": 11828
|
| 41433 |
+
},
|
| 41434 |
+
{
|
| 41435 |
+
"epoch": 0.37555555555555553,
|
| 41436 |
+
"grad_norm": 0.068359375,
|
| 41437 |
+
"learning_rate": 0.1,
|
| 41438 |
+
"loss": 2.481152057647705,
|
| 41439 |
+
"step": 11830
|
| 41440 |
+
},
|
| 41441 |
+
{
|
| 41442 |
+
"epoch": 0.37561904761904763,
|
| 41443 |
+
"grad_norm": 0.302734375,
|
| 41444 |
+
"learning_rate": 0.1,
|
| 41445 |
+
"loss": 2.4916257858276367,
|
| 41446 |
+
"step": 11832
|
| 41447 |
+
},
|
| 41448 |
+
{
|
| 41449 |
+
"epoch": 0.3756825396825397,
|
| 41450 |
+
"grad_norm": 0.58203125,
|
| 41451 |
+
"learning_rate": 0.1,
|
| 41452 |
+
"loss": 2.4957711696624756,
|
| 41453 |
+
"step": 11834
|
| 41454 |
+
},
|
| 41455 |
+
{
|
| 41456 |
+
"epoch": 0.3757460317460318,
|
| 41457 |
+
"grad_norm": 0.1513671875,
|
| 41458 |
+
"learning_rate": 0.1,
|
| 41459 |
+
"loss": 2.4785571098327637,
|
| 41460 |
+
"step": 11836
|
| 41461 |
+
},
|
| 41462 |
+
{
|
| 41463 |
+
"epoch": 0.3758095238095238,
|
| 41464 |
+
"grad_norm": 0.2177734375,
|
| 41465 |
+
"learning_rate": 0.1,
|
| 41466 |
+
"loss": 2.486116409301758,
|
| 41467 |
+
"step": 11838
|
| 41468 |
+
},
|
| 41469 |
+
{
|
| 41470 |
+
"epoch": 0.37587301587301586,
|
| 41471 |
+
"grad_norm": 0.203125,
|
| 41472 |
+
"learning_rate": 0.1,
|
| 41473 |
+
"loss": 2.504101514816284,
|
| 41474 |
+
"step": 11840
|
| 41475 |
+
},
|
| 41476 |
+
{
|
| 41477 |
+
"epoch": 0.37593650793650796,
|
| 41478 |
+
"grad_norm": 0.115234375,
|
| 41479 |
+
"learning_rate": 0.1,
|
| 41480 |
+
"loss": 2.473159074783325,
|
| 41481 |
+
"step": 11842
|
| 41482 |
+
},
|
| 41483 |
+
{
|
| 41484 |
+
"epoch": 0.376,
|
| 41485 |
+
"grad_norm": 0.1064453125,
|
| 41486 |
+
"learning_rate": 0.1,
|
| 41487 |
+
"loss": 2.475519895553589,
|
| 41488 |
+
"step": 11844
|
| 41489 |
+
},
|
| 41490 |
+
{
|
| 41491 |
+
"epoch": 0.37606349206349204,
|
| 41492 |
+
"grad_norm": 0.11865234375,
|
| 41493 |
+
"learning_rate": 0.1,
|
| 41494 |
+
"loss": 2.47436785697937,
|
| 41495 |
+
"step": 11846
|
| 41496 |
+
},
|
| 41497 |
+
{
|
| 41498 |
+
"epoch": 0.37612698412698414,
|
| 41499 |
+
"grad_norm": 0.310546875,
|
| 41500 |
+
"learning_rate": 0.1,
|
| 41501 |
+
"loss": 2.445840358734131,
|
| 41502 |
+
"step": 11848
|
| 41503 |
+
},
|
| 41504 |
+
{
|
| 41505 |
+
"epoch": 0.3761904761904762,
|
| 41506 |
+
"grad_norm": 0.244140625,
|
| 41507 |
+
"learning_rate": 0.1,
|
| 41508 |
+
"loss": 2.4474823474884033,
|
| 41509 |
+
"step": 11850
|
| 41510 |
+
},
|
| 41511 |
+
{
|
| 41512 |
+
"epoch": 0.37625396825396823,
|
| 41513 |
+
"grad_norm": 0.205078125,
|
| 41514 |
+
"learning_rate": 0.1,
|
| 41515 |
+
"loss": 2.493243932723999,
|
| 41516 |
+
"step": 11852
|
| 41517 |
+
},
|
| 41518 |
+
{
|
| 41519 |
+
"epoch": 0.37631746031746033,
|
| 41520 |
+
"grad_norm": 0.10595703125,
|
| 41521 |
+
"learning_rate": 0.1,
|
| 41522 |
+
"loss": 2.4680328369140625,
|
| 41523 |
+
"step": 11854
|
| 41524 |
+
},
|
| 41525 |
+
{
|
| 41526 |
+
"epoch": 0.37638095238095237,
|
| 41527 |
+
"grad_norm": 0.142578125,
|
| 41528 |
+
"learning_rate": 0.1,
|
| 41529 |
+
"loss": 2.492051601409912,
|
| 41530 |
+
"step": 11856
|
| 41531 |
+
},
|
| 41532 |
+
{
|
| 41533 |
+
"epoch": 0.37644444444444447,
|
| 41534 |
+
"grad_norm": 0.287109375,
|
| 41535 |
+
"learning_rate": 0.1,
|
| 41536 |
+
"loss": 2.478950023651123,
|
| 41537 |
+
"step": 11858
|
| 41538 |
+
},
|
| 41539 |
+
{
|
| 41540 |
+
"epoch": 0.3765079365079365,
|
| 41541 |
+
"grad_norm": 0.2109375,
|
| 41542 |
+
"learning_rate": 0.1,
|
| 41543 |
+
"loss": 2.4861087799072266,
|
| 41544 |
+
"step": 11860
|
| 41545 |
+
},
|
| 41546 |
+
{
|
| 41547 |
+
"epoch": 0.37657142857142856,
|
| 41548 |
+
"grad_norm": 0.12255859375,
|
| 41549 |
+
"learning_rate": 0.1,
|
| 41550 |
+
"loss": 2.466583490371704,
|
| 41551 |
+
"step": 11862
|
| 41552 |
+
},
|
| 41553 |
+
{
|
| 41554 |
+
"epoch": 0.37663492063492066,
|
| 41555 |
+
"grad_norm": 0.103515625,
|
| 41556 |
+
"learning_rate": 0.1,
|
| 41557 |
+
"loss": 2.4566826820373535,
|
| 41558 |
+
"step": 11864
|
| 41559 |
+
},
|
| 41560 |
+
{
|
| 41561 |
+
"epoch": 0.3766984126984127,
|
| 41562 |
+
"grad_norm": 0.310546875,
|
| 41563 |
+
"learning_rate": 0.1,
|
| 41564 |
+
"loss": 2.459071159362793,
|
| 41565 |
+
"step": 11866
|
| 41566 |
+
},
|
| 41567 |
+
{
|
| 41568 |
+
"epoch": 0.37676190476190474,
|
| 41569 |
+
"grad_norm": 0.369140625,
|
| 41570 |
+
"learning_rate": 0.1,
|
| 41571 |
+
"loss": 2.474431037902832,
|
| 41572 |
+
"step": 11868
|
| 41573 |
+
},
|
| 41574 |
+
{
|
| 41575 |
+
"epoch": 0.37682539682539684,
|
| 41576 |
+
"grad_norm": 0.10205078125,
|
| 41577 |
+
"learning_rate": 0.1,
|
| 41578 |
+
"loss": 2.474824905395508,
|
| 41579 |
+
"step": 11870
|
| 41580 |
+
},
|
| 41581 |
+
{
|
| 41582 |
+
"epoch": 0.3768888888888889,
|
| 41583 |
+
"grad_norm": 0.1416015625,
|
| 41584 |
+
"learning_rate": 0.1,
|
| 41585 |
+
"loss": 2.5060482025146484,
|
| 41586 |
+
"step": 11872
|
| 41587 |
+
},
|
| 41588 |
+
{
|
| 41589 |
+
"epoch": 0.3769523809523809,
|
| 41590 |
+
"grad_norm": 0.2275390625,
|
| 41591 |
+
"learning_rate": 0.1,
|
| 41592 |
+
"loss": 2.4836771488189697,
|
| 41593 |
+
"step": 11874
|
| 41594 |
+
},
|
| 41595 |
+
{
|
| 41596 |
+
"epoch": 0.377015873015873,
|
| 41597 |
+
"grad_norm": 0.1650390625,
|
| 41598 |
+
"learning_rate": 0.1,
|
| 41599 |
+
"loss": 2.4806151390075684,
|
| 41600 |
+
"step": 11876
|
| 41601 |
+
},
|
| 41602 |
+
{
|
| 41603 |
+
"epoch": 0.37707936507936507,
|
| 41604 |
+
"grad_norm": 0.2138671875,
|
| 41605 |
+
"learning_rate": 0.1,
|
| 41606 |
+
"loss": 2.4879519939422607,
|
| 41607 |
+
"step": 11878
|
| 41608 |
+
},
|
| 41609 |
+
{
|
| 41610 |
+
"epoch": 0.37714285714285717,
|
| 41611 |
+
"grad_norm": 0.3046875,
|
| 41612 |
+
"learning_rate": 0.1,
|
| 41613 |
+
"loss": 2.4853639602661133,
|
| 41614 |
+
"step": 11880
|
| 41615 |
+
},
|
| 41616 |
+
{
|
| 41617 |
+
"epoch": 0.3772063492063492,
|
| 41618 |
+
"grad_norm": 0.2080078125,
|
| 41619 |
+
"learning_rate": 0.1,
|
| 41620 |
+
"loss": 2.4878056049346924,
|
| 41621 |
+
"step": 11882
|
| 41622 |
+
},
|
| 41623 |
+
{
|
| 41624 |
+
"epoch": 0.37726984126984126,
|
| 41625 |
+
"grad_norm": 0.1533203125,
|
| 41626 |
+
"learning_rate": 0.1,
|
| 41627 |
+
"loss": 2.4815573692321777,
|
| 41628 |
+
"step": 11884
|
| 41629 |
+
},
|
| 41630 |
+
{
|
| 41631 |
+
"epoch": 0.37733333333333335,
|
| 41632 |
+
"grad_norm": 0.11474609375,
|
| 41633 |
+
"learning_rate": 0.1,
|
| 41634 |
+
"loss": 2.4892303943634033,
|
| 41635 |
+
"step": 11886
|
| 41636 |
+
},
|
| 41637 |
+
{
|
| 41638 |
+
"epoch": 0.3773968253968254,
|
| 41639 |
+
"grad_norm": 0.18359375,
|
| 41640 |
+
"learning_rate": 0.1,
|
| 41641 |
+
"loss": 2.485851526260376,
|
| 41642 |
+
"step": 11888
|
| 41643 |
+
},
|
| 41644 |
+
{
|
| 41645 |
+
"epoch": 0.37746031746031744,
|
| 41646 |
+
"grad_norm": 0.294921875,
|
| 41647 |
+
"learning_rate": 0.1,
|
| 41648 |
+
"loss": 2.45359468460083,
|
| 41649 |
+
"step": 11890
|
| 41650 |
+
},
|
| 41651 |
+
{
|
| 41652 |
+
"epoch": 0.37752380952380954,
|
| 41653 |
+
"grad_norm": 0.482421875,
|
| 41654 |
+
"learning_rate": 0.1,
|
| 41655 |
+
"loss": 2.4731178283691406,
|
| 41656 |
+
"step": 11892
|
| 41657 |
+
},
|
| 41658 |
+
{
|
| 41659 |
+
"epoch": 0.3775873015873016,
|
| 41660 |
+
"grad_norm": 0.33203125,
|
| 41661 |
+
"learning_rate": 0.1,
|
| 41662 |
+
"loss": 2.44585919380188,
|
| 41663 |
+
"step": 11894
|
| 41664 |
+
},
|
| 41665 |
+
{
|
| 41666 |
+
"epoch": 0.3776507936507936,
|
| 41667 |
+
"grad_norm": 0.1826171875,
|
| 41668 |
+
"learning_rate": 0.1,
|
| 41669 |
+
"loss": 2.453669786453247,
|
| 41670 |
+
"step": 11896
|
| 41671 |
+
},
|
| 41672 |
+
{
|
| 41673 |
+
"epoch": 0.3777142857142857,
|
| 41674 |
+
"grad_norm": 0.08251953125,
|
| 41675 |
+
"learning_rate": 0.1,
|
| 41676 |
+
"loss": 2.4264750480651855,
|
| 41677 |
+
"step": 11898
|
| 41678 |
+
},
|
| 41679 |
+
{
|
| 41680 |
+
"epoch": 0.37777777777777777,
|
| 41681 |
+
"grad_norm": 0.12158203125,
|
| 41682 |
+
"learning_rate": 0.1,
|
| 41683 |
+
"loss": 2.466355323791504,
|
| 41684 |
+
"step": 11900
|
| 41685 |
+
},
|
| 41686 |
+
{
|
| 41687 |
+
"epoch": 0.37784126984126987,
|
| 41688 |
+
"grad_norm": 0.208984375,
|
| 41689 |
+
"learning_rate": 0.1,
|
| 41690 |
+
"loss": 2.453779458999634,
|
| 41691 |
+
"step": 11902
|
| 41692 |
+
},
|
| 41693 |
+
{
|
| 41694 |
+
"epoch": 0.3779047619047619,
|
| 41695 |
+
"grad_norm": 0.10205078125,
|
| 41696 |
+
"learning_rate": 0.1,
|
| 41697 |
+
"loss": 2.4556055068969727,
|
| 41698 |
+
"step": 11904
|
| 41699 |
+
},
|
| 41700 |
+
{
|
| 41701 |
+
"epoch": 0.37796825396825395,
|
| 41702 |
+
"grad_norm": 0.375,
|
| 41703 |
+
"learning_rate": 0.1,
|
| 41704 |
+
"loss": 2.4727084636688232,
|
| 41705 |
+
"step": 11906
|
| 41706 |
+
},
|
| 41707 |
+
{
|
| 41708 |
+
"epoch": 0.37803174603174605,
|
| 41709 |
+
"grad_norm": 0.177734375,
|
| 41710 |
+
"learning_rate": 0.1,
|
| 41711 |
+
"loss": 2.430933713912964,
|
| 41712 |
+
"step": 11908
|
| 41713 |
+
},
|
| 41714 |
+
{
|
| 41715 |
+
"epoch": 0.3780952380952381,
|
| 41716 |
+
"grad_norm": 0.1572265625,
|
| 41717 |
+
"learning_rate": 0.1,
|
| 41718 |
+
"loss": 2.452418327331543,
|
| 41719 |
+
"step": 11910
|
| 41720 |
+
},
|
| 41721 |
+
{
|
| 41722 |
+
"epoch": 0.37815873015873014,
|
| 41723 |
+
"grad_norm": 0.2431640625,
|
| 41724 |
+
"learning_rate": 0.1,
|
| 41725 |
+
"loss": 2.4696695804595947,
|
| 41726 |
+
"step": 11912
|
| 41727 |
+
},
|
| 41728 |
+
{
|
| 41729 |
+
"epoch": 0.37822222222222224,
|
| 41730 |
+
"grad_norm": 0.1083984375,
|
| 41731 |
+
"learning_rate": 0.1,
|
| 41732 |
+
"loss": 2.457073211669922,
|
| 41733 |
+
"step": 11914
|
| 41734 |
+
},
|
| 41735 |
+
{
|
| 41736 |
+
"epoch": 0.3782857142857143,
|
| 41737 |
+
"grad_norm": 0.0888671875,
|
| 41738 |
+
"learning_rate": 0.1,
|
| 41739 |
+
"loss": 2.4525954723358154,
|
| 41740 |
+
"step": 11916
|
| 41741 |
+
},
|
| 41742 |
+
{
|
| 41743 |
+
"epoch": 0.3783492063492063,
|
| 41744 |
+
"grad_norm": 0.1435546875,
|
| 41745 |
+
"learning_rate": 0.1,
|
| 41746 |
+
"loss": 2.467282772064209,
|
| 41747 |
+
"step": 11918
|
| 41748 |
+
},
|
| 41749 |
+
{
|
| 41750 |
+
"epoch": 0.3784126984126984,
|
| 41751 |
+
"grad_norm": 0.158203125,
|
| 41752 |
+
"learning_rate": 0.1,
|
| 41753 |
+
"loss": 2.4599666595458984,
|
| 41754 |
+
"step": 11920
|
| 41755 |
+
},
|
| 41756 |
+
{
|
| 41757 |
+
"epoch": 0.37847619047619047,
|
| 41758 |
+
"grad_norm": 0.435546875,
|
| 41759 |
+
"learning_rate": 0.1,
|
| 41760 |
+
"loss": 2.4501702785491943,
|
| 41761 |
+
"step": 11922
|
| 41762 |
+
},
|
| 41763 |
+
{
|
| 41764 |
+
"epoch": 0.37853968253968256,
|
| 41765 |
+
"grad_norm": 0.447265625,
|
| 41766 |
+
"learning_rate": 0.1,
|
| 41767 |
+
"loss": 2.4939215183258057,
|
| 41768 |
+
"step": 11924
|
| 41769 |
+
},
|
| 41770 |
+
{
|
| 41771 |
+
"epoch": 0.3786031746031746,
|
| 41772 |
+
"grad_norm": 0.123046875,
|
| 41773 |
+
"learning_rate": 0.1,
|
| 41774 |
+
"loss": 2.448594093322754,
|
| 41775 |
+
"step": 11926
|
| 41776 |
+
},
|
| 41777 |
+
{
|
| 41778 |
+
"epoch": 0.37866666666666665,
|
| 41779 |
+
"grad_norm": 0.09619140625,
|
| 41780 |
+
"learning_rate": 0.1,
|
| 41781 |
+
"loss": 2.4669272899627686,
|
| 41782 |
+
"step": 11928
|
| 41783 |
+
},
|
| 41784 |
+
{
|
| 41785 |
+
"epoch": 0.37873015873015875,
|
| 41786 |
+
"grad_norm": 0.2421875,
|
| 41787 |
+
"learning_rate": 0.1,
|
| 41788 |
+
"loss": 2.474453926086426,
|
| 41789 |
+
"step": 11930
|
| 41790 |
+
},
|
| 41791 |
+
{
|
| 41792 |
+
"epoch": 0.3787936507936508,
|
| 41793 |
+
"grad_norm": 0.263671875,
|
| 41794 |
+
"learning_rate": 0.1,
|
| 41795 |
+
"loss": 2.4715001583099365,
|
| 41796 |
+
"step": 11932
|
| 41797 |
+
},
|
| 41798 |
+
{
|
| 41799 |
+
"epoch": 0.37885714285714284,
|
| 41800 |
+
"grad_norm": 0.365234375,
|
| 41801 |
+
"learning_rate": 0.1,
|
| 41802 |
+
"loss": 2.4608728885650635,
|
| 41803 |
+
"step": 11934
|
| 41804 |
+
},
|
| 41805 |
+
{
|
| 41806 |
+
"epoch": 0.37892063492063494,
|
| 41807 |
+
"grad_norm": 0.302734375,
|
| 41808 |
+
"learning_rate": 0.1,
|
| 41809 |
+
"loss": 2.483308792114258,
|
| 41810 |
+
"step": 11936
|
| 41811 |
+
},
|
| 41812 |
+
{
|
| 41813 |
+
"epoch": 0.378984126984127,
|
| 41814 |
+
"grad_norm": 0.10791015625,
|
| 41815 |
+
"learning_rate": 0.1,
|
| 41816 |
+
"loss": 2.484426975250244,
|
| 41817 |
+
"step": 11938
|
| 41818 |
+
},
|
| 41819 |
+
{
|
| 41820 |
+
"epoch": 0.379047619047619,
|
| 41821 |
+
"grad_norm": 0.0634765625,
|
| 41822 |
+
"learning_rate": 0.1,
|
| 41823 |
+
"loss": 2.478487968444824,
|
| 41824 |
+
"step": 11940
|
| 41825 |
+
},
|
| 41826 |
+
{
|
| 41827 |
+
"epoch": 0.3791111111111111,
|
| 41828 |
+
"grad_norm": 0.07177734375,
|
| 41829 |
+
"learning_rate": 0.1,
|
| 41830 |
+
"loss": 2.434561014175415,
|
| 41831 |
+
"step": 11942
|
| 41832 |
+
},
|
| 41833 |
+
{
|
| 41834 |
+
"epoch": 0.37917460317460316,
|
| 41835 |
+
"grad_norm": 0.057861328125,
|
| 41836 |
+
"learning_rate": 0.1,
|
| 41837 |
+
"loss": 2.4459564685821533,
|
| 41838 |
+
"step": 11944
|
| 41839 |
+
},
|
| 41840 |
+
{
|
| 41841 |
+
"epoch": 0.37923809523809526,
|
| 41842 |
+
"grad_norm": 0.130859375,
|
| 41843 |
+
"learning_rate": 0.1,
|
| 41844 |
+
"loss": 2.4592580795288086,
|
| 41845 |
+
"step": 11946
|
| 41846 |
+
},
|
| 41847 |
+
{
|
| 41848 |
+
"epoch": 0.3793015873015873,
|
| 41849 |
+
"grad_norm": 0.08447265625,
|
| 41850 |
+
"learning_rate": 0.1,
|
| 41851 |
+
"loss": 2.459186315536499,
|
| 41852 |
+
"step": 11948
|
| 41853 |
+
},
|
| 41854 |
+
{
|
| 41855 |
+
"epoch": 0.37936507936507935,
|
| 41856 |
+
"grad_norm": 0.12890625,
|
| 41857 |
+
"learning_rate": 0.1,
|
| 41858 |
+
"loss": 2.458428144454956,
|
| 41859 |
+
"step": 11950
|
| 41860 |
+
},
|
| 41861 |
+
{
|
| 41862 |
+
"epoch": 0.37942857142857145,
|
| 41863 |
+
"grad_norm": 0.201171875,
|
| 41864 |
+
"learning_rate": 0.1,
|
| 41865 |
+
"loss": 2.4365906715393066,
|
| 41866 |
+
"step": 11952
|
| 41867 |
+
},
|
| 41868 |
+
{
|
| 41869 |
+
"epoch": 0.3794920634920635,
|
| 41870 |
+
"grad_norm": 0.318359375,
|
| 41871 |
+
"learning_rate": 0.1,
|
| 41872 |
+
"loss": 2.4577033519744873,
|
| 41873 |
+
"step": 11954
|
| 41874 |
+
},
|
| 41875 |
+
{
|
| 41876 |
+
"epoch": 0.37955555555555553,
|
| 41877 |
+
"grad_norm": 0.166015625,
|
| 41878 |
+
"learning_rate": 0.1,
|
| 41879 |
+
"loss": 2.445676565170288,
|
| 41880 |
+
"step": 11956
|
| 41881 |
+
},
|
| 41882 |
+
{
|
| 41883 |
+
"epoch": 0.37961904761904763,
|
| 41884 |
+
"grad_norm": 0.130859375,
|
| 41885 |
+
"learning_rate": 0.1,
|
| 41886 |
+
"loss": 2.4809117317199707,
|
| 41887 |
+
"step": 11958
|
| 41888 |
+
},
|
| 41889 |
+
{
|
| 41890 |
+
"epoch": 0.3796825396825397,
|
| 41891 |
+
"grad_norm": 0.359375,
|
| 41892 |
+
"learning_rate": 0.1,
|
| 41893 |
+
"loss": 2.446380615234375,
|
| 41894 |
+
"step": 11960
|
| 41895 |
+
},
|
| 41896 |
+
{
|
| 41897 |
+
"epoch": 0.3797460317460317,
|
| 41898 |
+
"grad_norm": 0.44921875,
|
| 41899 |
+
"learning_rate": 0.1,
|
| 41900 |
+
"loss": 2.4621827602386475,
|
| 41901 |
+
"step": 11962
|
| 41902 |
+
},
|
| 41903 |
+
{
|
| 41904 |
+
"epoch": 0.3798095238095238,
|
| 41905 |
+
"grad_norm": 0.1435546875,
|
| 41906 |
+
"learning_rate": 0.1,
|
| 41907 |
+
"loss": 2.434882402420044,
|
| 41908 |
+
"step": 11964
|
| 41909 |
+
},
|
| 41910 |
+
{
|
| 41911 |
+
"epoch": 0.37987301587301586,
|
| 41912 |
+
"grad_norm": 0.087890625,
|
| 41913 |
+
"learning_rate": 0.1,
|
| 41914 |
+
"loss": 2.4461870193481445,
|
| 41915 |
+
"step": 11966
|
| 41916 |
+
},
|
| 41917 |
+
{
|
| 41918 |
+
"epoch": 0.37993650793650796,
|
| 41919 |
+
"grad_norm": 0.1494140625,
|
| 41920 |
+
"learning_rate": 0.1,
|
| 41921 |
+
"loss": 2.4736037254333496,
|
| 41922 |
+
"step": 11968
|
| 41923 |
+
},
|
| 41924 |
+
{
|
| 41925 |
+
"epoch": 0.38,
|
| 41926 |
+
"grad_norm": 0.087890625,
|
| 41927 |
+
"learning_rate": 0.1,
|
| 41928 |
+
"loss": 2.4388091564178467,
|
| 41929 |
+
"step": 11970
|
| 41930 |
}
|
| 41931 |
],
|
| 41932 |
"logging_steps": 2,
|
|
|
|
| 41946 |
"attributes": {}
|
| 41947 |
}
|
| 41948 |
},
|
| 41949 |
+
"total_flos": 3.964287804106193e+19,
|
| 41950 |
"train_batch_size": 4,
|
| 41951 |
"trial_name": null,
|
| 41952 |
"trial_params": null
|