Training in progress, step 11655, checkpoint
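The diff below appends training log entries to `last-checkpoint/trainer_state.json`, each carrying `epoch`, `grad_norm`, `learning_rate`, `loss`, and `step`. As a minimal sketch for inspecting such entries, assuming they sit in a top-level `log_history` list (the usual layout of a Transformers trainer state file, but not visible in this diff), the helper below summarizes everything logged after a given step. The function name `summarize_log_history` is hypothetical, and the sample values are copied from the first three added entries.

```python
import json
from statistics import mean


def summarize_log_history(state: dict, since_step: int = 0) -> dict:
    """Summarize training log entries whose step is greater than since_step."""
    # Keep only training entries (they carry a "loss" key); evaluation
    # entries, if any, are skipped.
    entries = [
        e for e in state.get("log_history", [])
        if "loss" in e and e.get("step", 0) > since_step
    ]
    if not entries:
        return {"count": 0}
    return {
        "count": len(entries),
        "first_step": entries[0]["step"],
        "last_step": entries[-1]["step"],
        "mean_loss": mean(e["loss"] for e in entries),
        "max_grad_norm": max(e.get("grad_norm", 0.0) for e in entries),
    }


# Hypothetical in-memory sample; on a real checkpoint one would instead do:
#   with open("last-checkpoint/trainer_state.json") as f:
#       state = json.load(f)
sample_state = {
    "global_step": 11655,
    "log_history": [
        {"epoch": 0.3600634920634921, "grad_norm": 0.318359375,
         "learning_rate": 0.1, "loss": 2.5079941749572754, "step": 11342},
        {"epoch": 0.36012698412698413, "grad_norm": 0.10498046875,
         "learning_rate": 0.1, "loss": 2.453760862350464, "step": 11344},
        {"epoch": 0.36019047619047617, "grad_norm": 0.255859375,
         "learning_rate": 0.1, "loss": 2.4873969554901123, "step": 11346},
    ],
}

summary = summarize_log_history(sample_state, since_step=11340)
print(json.dumps(summary, indent=2))
```

Passing the previous checkpoint's last logged step as `since_step` (11340 in this commit) restricts the summary to the entries this commit added.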
- last-checkpoint/trainer_state.json +1102 -3
last-checkpoint/trainer_state.json CHANGED
@@ -2,9 +2,9 @@
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 0.
  "eval_steps": 3150,
- "global_step":
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -39722,6 +39722,1105 @@
  "learning_rate": 0.1,
  "loss": 2.4747467041015625,
  "step": 11340
  }
  ],
  "logging_steps": 2,
@@ -39741,7 +40840,7 @@
  "attributes": {}
  }
  },
- "total_flos": 3.
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
@@ -2,9 +2,9 @@
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.37,
  "eval_steps": 3150,
+ "global_step": 11655,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -39722,6 +39722,1105 @@
  "learning_rate": 0.1,
  "loss": 2.4747467041015625,
  "step": 11340
+ },
+ {
+ "epoch": 0.3600634920634921,
+ "grad_norm": 0.318359375,
+ "learning_rate": 0.1,
+ "loss": 2.5079941749572754,
+ "step": 11342
+ },
+ {
+ "epoch": 0.36012698412698413,
+ "grad_norm": 0.10498046875,
+ "learning_rate": 0.1,
+ "loss": 2.453760862350464,
+ "step": 11344
+ },
+ {
+ "epoch": 0.36019047619047617,
+ "grad_norm": 0.255859375,
+ "learning_rate": 0.1,
+ "loss": 2.4873969554901123,
+ "step": 11346
+ },
+ {
+ "epoch": 0.36025396825396827,
+ "grad_norm": 0.443359375,
+ "learning_rate": 0.1,
+ "loss": 2.510312080383301,
+ "step": 11348
+ },
+ {
+ "epoch": 0.3603174603174603,
+ "grad_norm": 0.1875,
+ "learning_rate": 0.1,
+ "loss": 2.477294683456421,
+ "step": 11350
+ },
+ {
+ "epoch": 0.36038095238095236,
+ "grad_norm": 0.1328125,
+ "learning_rate": 0.1,
+ "loss": 2.4856131076812744,
+ "step": 11352
+ },
+ {
+ "epoch": 0.36044444444444446,
+ "grad_norm": 0.16796875,
+ "learning_rate": 0.1,
+ "loss": 2.488382577896118,
+ "step": 11354
+ },
+ {
+ "epoch": 0.3605079365079365,
+ "grad_norm": 0.1826171875,
+ "learning_rate": 0.1,
+ "loss": 2.477754831314087,
+ "step": 11356
+ },
+ {
+ "epoch": 0.3605714285714286,
+ "grad_norm": 0.423828125,
+ "learning_rate": 0.1,
+ "loss": 2.481013774871826,
+ "step": 11358
+ },
+ {
+ "epoch": 0.36063492063492064,
+ "grad_norm": 0.484375,
+ "learning_rate": 0.1,
+ "loss": 2.490089178085327,
+ "step": 11360
+ },
+ {
+ "epoch": 0.3606984126984127,
+ "grad_norm": 0.07958984375,
+ "learning_rate": 0.1,
+ "loss": 2.5008773803710938,
+ "step": 11362
+ },
+ {
+ "epoch": 0.3607619047619048,
+ "grad_norm": 0.232421875,
+ "learning_rate": 0.1,
+ "loss": 2.492215871810913,
+ "step": 11364
+ },
+ {
+ "epoch": 0.3608253968253968,
+ "grad_norm": 0.345703125,
+ "learning_rate": 0.1,
+ "loss": 2.4556965827941895,
+ "step": 11366
+ },
+ {
+ "epoch": 0.36088888888888887,
+ "grad_norm": 0.294921875,
+ "learning_rate": 0.1,
+ "loss": 2.5121169090270996,
+ "step": 11368
+ },
+ {
+ "epoch": 0.36095238095238097,
+ "grad_norm": 0.201171875,
+ "learning_rate": 0.1,
+ "loss": 2.476003885269165,
+ "step": 11370
+ },
+ {
+ "epoch": 0.361015873015873,
+ "grad_norm": 0.14453125,
+ "learning_rate": 0.1,
+ "loss": 2.4960570335388184,
+ "step": 11372
+ },
+ {
+ "epoch": 0.36107936507936506,
+ "grad_norm": 0.130859375,
+ "learning_rate": 0.1,
+ "loss": 2.501624822616577,
+ "step": 11374
+ },
+ {
+ "epoch": 0.36114285714285715,
+ "grad_norm": 0.12060546875,
+ "learning_rate": 0.1,
+ "loss": 2.5003836154937744,
+ "step": 11376
+ },
+ {
+ "epoch": 0.3612063492063492,
+ "grad_norm": 0.2333984375,
+ "learning_rate": 0.1,
+ "loss": 2.4826653003692627,
+ "step": 11378
+ },
+ {
+ "epoch": 0.3612698412698413,
+ "grad_norm": 0.09228515625,
+ "learning_rate": 0.1,
+ "loss": 2.495271682739258,
+ "step": 11380
+ },
+ {
+ "epoch": 0.36133333333333334,
+ "grad_norm": 0.0830078125,
+ "learning_rate": 0.1,
+ "loss": 2.492497205734253,
+ "step": 11382
+ },
+ {
+ "epoch": 0.3613968253968254,
+ "grad_norm": 0.07958984375,
+ "learning_rate": 0.1,
+ "loss": 2.495521306991577,
+ "step": 11384
+ },
+ {
+ "epoch": 0.3614603174603175,
+ "grad_norm": 0.25390625,
+ "learning_rate": 0.1,
+ "loss": 2.488354444503784,
+ "step": 11386
+ },
+ {
+ "epoch": 0.3615238095238095,
+ "grad_norm": 0.162109375,
+ "learning_rate": 0.1,
+ "loss": 2.4690821170806885,
+ "step": 11388
+ },
+ {
+ "epoch": 0.36158730158730157,
+ "grad_norm": 0.1728515625,
+ "learning_rate": 0.1,
+ "loss": 2.473665475845337,
+ "step": 11390
+ },
+ {
+ "epoch": 0.36165079365079367,
+ "grad_norm": 0.12255859375,
+ "learning_rate": 0.1,
+ "loss": 2.507917642593384,
+ "step": 11392
+ },
+ {
+ "epoch": 0.3617142857142857,
+ "grad_norm": 0.166015625,
+ "learning_rate": 0.1,
+ "loss": 2.496009588241577,
+ "step": 11394
+ },
+ {
+ "epoch": 0.36177777777777775,
+ "grad_norm": 0.4140625,
+ "learning_rate": 0.1,
+ "loss": 2.498739719390869,
+ "step": 11396
+ },
+ {
+ "epoch": 0.36184126984126985,
+ "grad_norm": 0.353515625,
+ "learning_rate": 0.1,
+ "loss": 2.4704389572143555,
+ "step": 11398
+ },
+ {
+ "epoch": 0.3619047619047619,
+ "grad_norm": 0.130859375,
+ "learning_rate": 0.1,
+ "loss": 2.4971871376037598,
+ "step": 11400
+ },
+ {
+ "epoch": 0.361968253968254,
+ "grad_norm": 0.10205078125,
+ "learning_rate": 0.1,
+ "loss": 2.4943349361419678,
+ "step": 11402
+ },
+ {
+ "epoch": 0.36203174603174604,
+ "grad_norm": 0.255859375,
+ "learning_rate": 0.1,
+ "loss": 2.4700849056243896,
+ "step": 11404
+ },
+ {
+ "epoch": 0.3620952380952381,
+ "grad_norm": 0.3125,
+ "learning_rate": 0.1,
+ "loss": 2.4819083213806152,
+ "step": 11406
+ },
+ {
+ "epoch": 0.3621587301587302,
+ "grad_norm": 0.193359375,
+ "learning_rate": 0.1,
+ "loss": 2.4707014560699463,
+ "step": 11408
+ },
+ {
+ "epoch": 0.3622222222222222,
+ "grad_norm": 0.09228515625,
+ "learning_rate": 0.1,
+ "loss": 2.478113889694214,
+ "step": 11410
+ },
+ {
+ "epoch": 0.36228571428571427,
+ "grad_norm": 0.1318359375,
+ "learning_rate": 0.1,
+ "loss": 2.5012660026550293,
+ "step": 11412
+ },
+ {
+ "epoch": 0.36234920634920637,
+ "grad_norm": 0.140625,
+ "learning_rate": 0.1,
+ "loss": 2.5109262466430664,
+ "step": 11414
+ },
+ {
+ "epoch": 0.3624126984126984,
+ "grad_norm": 0.55078125,
+ "learning_rate": 0.1,
+ "loss": 2.4876835346221924,
+ "step": 11416
+ },
+ {
+ "epoch": 0.36247619047619045,
+ "grad_norm": 0.3125,
+ "learning_rate": 0.1,
+ "loss": 2.488044023513794,
+ "step": 11418
+ },
+ {
+ "epoch": 0.36253968253968255,
+ "grad_norm": 0.1337890625,
+ "learning_rate": 0.1,
+ "loss": 2.474841594696045,
+ "step": 11420
+ },
+ {
+ "epoch": 0.3626031746031746,
+ "grad_norm": 0.2275390625,
+ "learning_rate": 0.1,
+ "loss": 2.457033157348633,
+ "step": 11422
+ },
+ {
+ "epoch": 0.3626666666666667,
+ "grad_norm": 0.181640625,
+ "learning_rate": 0.1,
+ "loss": 2.4880568981170654,
+ "step": 11424
+ },
+ {
+ "epoch": 0.36273015873015874,
+ "grad_norm": 0.1171875,
+ "learning_rate": 0.1,
+ "loss": 2.4715211391448975,
+ "step": 11426
+ },
+ {
+ "epoch": 0.3627936507936508,
+ "grad_norm": 0.13671875,
+ "learning_rate": 0.1,
+ "loss": 2.4934537410736084,
+ "step": 11428
+ },
+ {
+ "epoch": 0.3628571428571429,
+ "grad_norm": 0.294921875,
+ "learning_rate": 0.1,
+ "loss": 2.498490333557129,
+ "step": 11430
+ },
+ {
+ "epoch": 0.3629206349206349,
+ "grad_norm": 0.498046875,
+ "learning_rate": 0.1,
+ "loss": 2.517286777496338,
+ "step": 11432
+ },
+ {
+ "epoch": 0.36298412698412696,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.1,
+ "loss": 2.502140760421753,
+ "step": 11434
+ },
+ {
+ "epoch": 0.36304761904761906,
+ "grad_norm": 0.06396484375,
+ "learning_rate": 0.1,
+ "loss": 2.482360601425171,
+ "step": 11436
+ },
+ {
+ "epoch": 0.3631111111111111,
+ "grad_norm": 0.064453125,
+ "learning_rate": 0.1,
+ "loss": 2.470284938812256,
+ "step": 11438
+ },
+ {
+ "epoch": 0.36317460317460315,
+ "grad_norm": 0.109375,
+ "learning_rate": 0.1,
+ "loss": 2.4868221282958984,
+ "step": 11440
+ },
+ {
+ "epoch": 0.36323809523809525,
+ "grad_norm": 0.2216796875,
+ "learning_rate": 0.1,
+ "loss": 2.4566707611083984,
+ "step": 11442
+ },
+ {
+ "epoch": 0.3633015873015873,
+ "grad_norm": 0.3125,
+ "learning_rate": 0.1,
+ "loss": 2.509664297103882,
+ "step": 11444
+ },
+ {
+ "epoch": 0.3633650793650794,
+ "grad_norm": 0.061767578125,
+ "learning_rate": 0.1,
+ "loss": 2.4820139408111572,
+ "step": 11446
+ },
+ {
+ "epoch": 0.36342857142857143,
+ "grad_norm": 0.11474609375,
+ "learning_rate": 0.1,
+ "loss": 2.500307559967041,
+ "step": 11448
+ },
+ {
+ "epoch": 0.3634920634920635,
+ "grad_norm": 0.251953125,
+ "learning_rate": 0.1,
+ "loss": 2.495778799057007,
+ "step": 11450
+ },
+ {
+ "epoch": 0.3635555555555556,
+ "grad_norm": 0.189453125,
+ "learning_rate": 0.1,
+ "loss": 2.4892730712890625,
+ "step": 11452
+ },
+ {
+ "epoch": 0.3636190476190476,
+ "grad_norm": 0.5546875,
+ "learning_rate": 0.1,
+ "loss": 2.4844303131103516,
+ "step": 11454
+ },
+ {
+ "epoch": 0.36368253968253966,
+ "grad_norm": 0.369140625,
+ "learning_rate": 0.1,
+ "loss": 2.50929856300354,
+ "step": 11456
+ },
+ {
+ "epoch": 0.36374603174603176,
+ "grad_norm": 0.0859375,
+ "learning_rate": 0.1,
+ "loss": 2.4976625442504883,
+ "step": 11458
+ },
+ {
+ "epoch": 0.3638095238095238,
+ "grad_norm": 0.076171875,
+ "learning_rate": 0.1,
+ "loss": 2.495049238204956,
+ "step": 11460
+ },
+ {
+ "epoch": 0.36387301587301585,
+ "grad_norm": 0.1552734375,
+ "learning_rate": 0.1,
+ "loss": 2.5226457118988037,
+ "step": 11462
+ },
+ {
+ "epoch": 0.36393650793650795,
+ "grad_norm": 0.1953125,
+ "learning_rate": 0.1,
+ "loss": 2.4973907470703125,
+ "step": 11464
+ },
+ {
+ "epoch": 0.364,
+ "grad_norm": 0.130859375,
+ "learning_rate": 0.1,
+ "loss": 2.4904561042785645,
+ "step": 11466
+ },
+ {
+ "epoch": 0.3640634920634921,
+ "grad_norm": 0.060302734375,
+ "learning_rate": 0.1,
+ "loss": 2.470416784286499,
+ "step": 11468
+ },
+ {
+ "epoch": 0.36412698412698413,
+ "grad_norm": 0.2431640625,
+ "learning_rate": 0.1,
+ "loss": 2.502138137817383,
+ "step": 11470
+ },
+ {
+ "epoch": 0.3641904761904762,
+ "grad_norm": 0.490234375,
+ "learning_rate": 0.1,
+ "loss": 2.489926338195801,
+ "step": 11472
+ },
+ {
+ "epoch": 0.3642539682539683,
+ "grad_norm": 0.2333984375,
+ "learning_rate": 0.1,
+ "loss": 2.488893747329712,
+ "step": 11474
+ },
+ {
+ "epoch": 0.3643174603174603,
+ "grad_norm": 0.1689453125,
+ "learning_rate": 0.1,
+ "loss": 2.5296382904052734,
+ "step": 11476
+ },
+ {
+ "epoch": 0.36438095238095236,
+ "grad_norm": 0.14453125,
+ "learning_rate": 0.1,
+ "loss": 2.469823122024536,
+ "step": 11478
+ },
+ {
+ "epoch": 0.36444444444444446,
+ "grad_norm": 0.10498046875,
+ "learning_rate": 0.1,
+ "loss": 2.4645936489105225,
+ "step": 11480
+ },
+ {
+ "epoch": 0.3645079365079365,
+ "grad_norm": 0.12353515625,
+ "learning_rate": 0.1,
+ "loss": 2.4924604892730713,
+ "step": 11482
+ },
+ {
+ "epoch": 0.36457142857142855,
+ "grad_norm": 0.06787109375,
+ "learning_rate": 0.1,
+ "loss": 2.5029876232147217,
+ "step": 11484
+ },
+ {
+ "epoch": 0.36463492063492065,
+ "grad_norm": 0.3125,
+ "learning_rate": 0.1,
+ "loss": 2.523407220840454,
+ "step": 11486
+ },
+ {
+ "epoch": 0.3646984126984127,
+ "grad_norm": 0.357421875,
+ "learning_rate": 0.1,
+ "loss": 2.4972376823425293,
+ "step": 11488
+ },
+ {
+ "epoch": 0.3647619047619048,
+ "grad_norm": 0.19921875,
+ "learning_rate": 0.1,
+ "loss": 2.462998151779175,
+ "step": 11490
+ },
+ {
+ "epoch": 0.36482539682539683,
+ "grad_norm": 0.10791015625,
+ "learning_rate": 0.1,
+ "loss": 2.500276803970337,
+ "step": 11492
+ },
+ {
+ "epoch": 0.3648888888888889,
+ "grad_norm": 0.0927734375,
+ "learning_rate": 0.1,
+ "loss": 2.4935381412506104,
+ "step": 11494
+ },
+ {
+ "epoch": 0.364952380952381,
+ "grad_norm": 0.1591796875,
+ "learning_rate": 0.1,
+ "loss": 2.4755890369415283,
+ "step": 11496
+ },
+ {
+ "epoch": 0.365015873015873,
+ "grad_norm": 0.1591796875,
+ "learning_rate": 0.1,
+ "loss": 2.501697301864624,
+ "step": 11498
+ },
+ {
+ "epoch": 0.36507936507936506,
+ "grad_norm": 0.48046875,
+ "learning_rate": 0.1,
+ "loss": 2.5090396404266357,
+ "step": 11500
+ },
+ {
+ "epoch": 0.36514285714285716,
+ "grad_norm": 0.0908203125,
+ "learning_rate": 0.1,
+ "loss": 2.4752163887023926,
+ "step": 11502
+ },
+ {
+ "epoch": 0.3652063492063492,
+ "grad_norm": 0.1787109375,
+ "learning_rate": 0.1,
+ "loss": 2.4872875213623047,
+ "step": 11504
+ },
+ {
+ "epoch": 0.36526984126984124,
+ "grad_norm": 0.068359375,
+ "learning_rate": 0.1,
+ "loss": 2.4730489253997803,
+ "step": 11506
+ },
+ {
+ "epoch": 0.36533333333333334,
+ "grad_norm": 0.373046875,
+ "learning_rate": 0.1,
+ "loss": 2.4886996746063232,
+ "step": 11508
+ },
+ {
+ "epoch": 0.3653968253968254,
+ "grad_norm": 0.35546875,
+ "learning_rate": 0.1,
+ "loss": 2.476940870285034,
+ "step": 11510
+ },
+ {
+ "epoch": 0.3654603174603175,
+ "grad_norm": 0.1484375,
+ "learning_rate": 0.1,
+ "loss": 2.4784579277038574,
+ "step": 11512
+ },
+ {
+ "epoch": 0.36552380952380953,
+ "grad_norm": 0.10009765625,
+ "learning_rate": 0.1,
+ "loss": 2.4905340671539307,
+ "step": 11514
+ },
+ {
+ "epoch": 0.36558730158730157,
+ "grad_norm": 0.1669921875,
+ "learning_rate": 0.1,
+ "loss": 2.4924633502960205,
+ "step": 11516
+ },
+ {
+ "epoch": 0.36565079365079367,
+ "grad_norm": 0.06591796875,
+ "learning_rate": 0.1,
+ "loss": 2.459251642227173,
+ "step": 11518
+ },
+ {
+ "epoch": 0.3657142857142857,
+ "grad_norm": 0.0791015625,
+ "learning_rate": 0.1,
+ "loss": 2.4829344749450684,
+ "step": 11520
+ },
+ {
+ "epoch": 0.36577777777777776,
+ "grad_norm": 0.15234375,
+ "learning_rate": 0.1,
+ "loss": 2.5123536586761475,
+ "step": 11522
+ },
+ {
+ "epoch": 0.36584126984126986,
+ "grad_norm": 0.1220703125,
+ "learning_rate": 0.1,
+ "loss": 2.4925343990325928,
+ "step": 11524
+ },
+ {
+ "epoch": 0.3659047619047619,
+ "grad_norm": 0.220703125,
+ "learning_rate": 0.1,
+ "loss": 2.463898181915283,
+ "step": 11526
+ },
+ {
+ "epoch": 0.36596825396825394,
+ "grad_norm": 0.1728515625,
+ "learning_rate": 0.1,
+ "loss": 2.4763333797454834,
+ "step": 11528
+ },
+ {
+ "epoch": 0.36603174603174604,
+ "grad_norm": 0.353515625,
+ "learning_rate": 0.1,
+ "loss": 2.515787124633789,
+ "step": 11530
+ },
+ {
+ "epoch": 0.3660952380952381,
+ "grad_norm": 0.361328125,
+ "learning_rate": 0.1,
+ "loss": 2.4792990684509277,
+ "step": 11532
+ },
+ {
+ "epoch": 0.3661587301587302,
+ "grad_norm": 0.11572265625,
+ "learning_rate": 0.1,
+ "loss": 2.487999200820923,
+ "step": 11534
+ },
+ {
+ "epoch": 0.3662222222222222,
+ "grad_norm": 0.0966796875,
+ "learning_rate": 0.1,
+ "loss": 2.4968855381011963,
+ "step": 11536
+ },
+ {
+ "epoch": 0.36628571428571427,
+ "grad_norm": 0.1611328125,
+ "learning_rate": 0.1,
+ "loss": 2.4591152667999268,
+ "step": 11538
+ },
+ {
+ "epoch": 0.36634920634920637,
+ "grad_norm": 0.166015625,
+ "learning_rate": 0.1,
+ "loss": 2.4713194370269775,
+ "step": 11540
+ },
+ {
+ "epoch": 0.3664126984126984,
+ "grad_norm": 0.259765625,
+ "learning_rate": 0.1,
+ "loss": 2.4807326793670654,
+ "step": 11542
+ },
+ {
+ "epoch": 0.36647619047619046,
+ "grad_norm": 0.17578125,
+ "learning_rate": 0.1,
+ "loss": 2.4585866928100586,
+ "step": 11544
+ },
+ {
+ "epoch": 0.36653968253968255,
+ "grad_norm": 0.28125,
+ "learning_rate": 0.1,
+ "loss": 2.4734954833984375,
|
| 40445 |
+
"step": 11546
|
| 40446 |
+
},
|
| 40447 |
+
{
|
| 40448 |
+
"epoch": 0.3666031746031746,
|
| 40449 |
+
"grad_norm": 0.2197265625,
|
| 40450 |
+
"learning_rate": 0.1,
|
| 40451 |
+
"loss": 2.5005085468292236,
|
| 40452 |
+
"step": 11548
|
| 40453 |
+
},
|
| 40454 |
+
{
|
| 40455 |
+
"epoch": 0.36666666666666664,
|
| 40456 |
+
"grad_norm": 0.365234375,
|
| 40457 |
+
"learning_rate": 0.1,
|
| 40458 |
+
"loss": 2.489720582962036,
|
| 40459 |
+
"step": 11550
|
| 40460 |
+
},
|
| 40461 |
+
{
|
| 40462 |
+
"epoch": 0.36673015873015874,
|
| 40463 |
+
"grad_norm": 0.435546875,
|
| 40464 |
+
"learning_rate": 0.1,
|
| 40465 |
+
"loss": 2.4803593158721924,
|
| 40466 |
+
"step": 11552
|
| 40467 |
+
},
|
| 40468 |
+
{
|
| 40469 |
+
"epoch": 0.3667936507936508,
|
| 40470 |
+
"grad_norm": 0.263671875,
|
| 40471 |
+
"learning_rate": 0.1,
|
| 40472 |
+
"loss": 2.4626052379608154,
|
| 40473 |
+
"step": 11554
|
| 40474 |
+
},
|
| 40475 |
+
{
|
| 40476 |
+
"epoch": 0.3668571428571429,
|
| 40477 |
+
"grad_norm": 0.09521484375,
|
| 40478 |
+
"learning_rate": 0.1,
|
| 40479 |
+
"loss": 2.512833833694458,
|
| 40480 |
+
"step": 11556
|
| 40481 |
+
},
|
| 40482 |
+
{
|
| 40483 |
+
"epoch": 0.3669206349206349,
|
| 40484 |
+
"grad_norm": 0.1416015625,
|
| 40485 |
+
"learning_rate": 0.1,
|
| 40486 |
+
"loss": 2.4813570976257324,
|
| 40487 |
+
"step": 11558
|
| 40488 |
+
},
|
| 40489 |
+
{
|
| 40490 |
+
"epoch": 0.36698412698412697,
|
| 40491 |
+
"grad_norm": 0.181640625,
|
| 40492 |
+
"learning_rate": 0.1,
|
| 40493 |
+
"loss": 2.470029354095459,
|
| 40494 |
+
"step": 11560
|
| 40495 |
+
},
|
| 40496 |
+
{
|
| 40497 |
+
"epoch": 0.36704761904761907,
|
| 40498 |
+
"grad_norm": 0.1494140625,
|
| 40499 |
+
"learning_rate": 0.1,
|
| 40500 |
+
"loss": 2.4803454875946045,
|
| 40501 |
+
"step": 11562
|
| 40502 |
+
},
|
| 40503 |
+
{
|
| 40504 |
+
"epoch": 0.3671111111111111,
|
| 40505 |
+
"grad_norm": 0.275390625,
|
| 40506 |
+
"learning_rate": 0.1,
|
| 40507 |
+
"loss": 2.481099843978882,
|
| 40508 |
+
"step": 11564
|
| 40509 |
+
},
|
| 40510 |
+
{
|
| 40511 |
+
"epoch": 0.36717460317460315,
|
| 40512 |
+
"grad_norm": 0.10009765625,
|
| 40513 |
+
"learning_rate": 0.1,
|
| 40514 |
+
"loss": 2.4896509647369385,
|
| 40515 |
+
"step": 11566
|
| 40516 |
+
},
|
| 40517 |
+
{
|
| 40518 |
+
"epoch": 0.36723809523809525,
|
| 40519 |
+
"grad_norm": 0.1142578125,
|
| 40520 |
+
"learning_rate": 0.1,
|
| 40521 |
+
"loss": 2.4700539112091064,
|
| 40522 |
+
"step": 11568
|
| 40523 |
+
},
|
| 40524 |
+
{
|
| 40525 |
+
"epoch": 0.3673015873015873,
|
| 40526 |
+
"grad_norm": 0.130859375,
|
| 40527 |
+
"learning_rate": 0.1,
|
| 40528 |
+
"loss": 2.4857864379882812,
|
| 40529 |
+
"step": 11570
|
| 40530 |
+
},
|
| 40531 |
+
{
|
| 40532 |
+
"epoch": 0.36736507936507934,
|
| 40533 |
+
"grad_norm": 0.06787109375,
|
| 40534 |
+
"learning_rate": 0.1,
|
| 40535 |
+
"loss": 2.4712841510772705,
|
| 40536 |
+
"step": 11572
|
| 40537 |
+
},
|
| 40538 |
+
{
|
| 40539 |
+
"epoch": 0.36742857142857144,
|
| 40540 |
+
"grad_norm": 0.265625,
|
| 40541 |
+
"learning_rate": 0.1,
|
| 40542 |
+
"loss": 2.4903788566589355,
|
| 40543 |
+
"step": 11574
|
| 40544 |
+
},
|
| 40545 |
+
{
|
| 40546 |
+
"epoch": 0.3674920634920635,
|
| 40547 |
+
"grad_norm": 0.427734375,
|
| 40548 |
+
"learning_rate": 0.1,
|
| 40549 |
+
"loss": 2.4827778339385986,
|
| 40550 |
+
"step": 11576
|
| 40551 |
+
},
|
| 40552 |
+
{
|
| 40553 |
+
"epoch": 0.3675555555555556,
|
| 40554 |
+
"grad_norm": 0.06298828125,
|
| 40555 |
+
"learning_rate": 0.1,
|
| 40556 |
+
"loss": 2.481940269470215,
|
| 40557 |
+
"step": 11578
|
| 40558 |
+
},
|
| 40559 |
+
{
|
| 40560 |
+
"epoch": 0.3676190476190476,
|
| 40561 |
+
"grad_norm": 0.130859375,
|
| 40562 |
+
"learning_rate": 0.1,
|
| 40563 |
+
"loss": 2.488921642303467,
|
| 40564 |
+
"step": 11580
|
| 40565 |
+
},
|
| 40566 |
+
{
|
| 40567 |
+
"epoch": 0.36768253968253967,
|
| 40568 |
+
"grad_norm": 0.12255859375,
|
| 40569 |
+
"learning_rate": 0.1,
|
| 40570 |
+
"loss": 2.487088918685913,
|
| 40571 |
+
"step": 11582
|
| 40572 |
+
},
|
| 40573 |
+
{
|
| 40574 |
+
"epoch": 0.36774603174603177,
|
| 40575 |
+
"grad_norm": 0.1513671875,
|
| 40576 |
+
"learning_rate": 0.1,
|
| 40577 |
+
"loss": 2.476516008377075,
|
| 40578 |
+
"step": 11584
|
| 40579 |
+
},
|
| 40580 |
+
{
|
| 40581 |
+
"epoch": 0.3678095238095238,
|
| 40582 |
+
"grad_norm": 0.1845703125,
|
| 40583 |
+
"learning_rate": 0.1,
|
| 40584 |
+
"loss": 2.4968202114105225,
|
| 40585 |
+
"step": 11586
|
| 40586 |
+
},
|
| 40587 |
+
{
|
| 40588 |
+
"epoch": 0.36787301587301585,
|
| 40589 |
+
"grad_norm": 0.0869140625,
|
| 40590 |
+
"learning_rate": 0.1,
|
| 40591 |
+
"loss": 2.4828453063964844,
|
| 40592 |
+
"step": 11588
|
| 40593 |
+
},
|
| 40594 |
+
{
|
| 40595 |
+
"epoch": 0.36793650793650795,
|
| 40596 |
+
"grad_norm": 0.2451171875,
|
| 40597 |
+
"learning_rate": 0.1,
|
| 40598 |
+
"loss": 2.4897444248199463,
|
| 40599 |
+
"step": 11590
|
| 40600 |
+
},
|
| 40601 |
+
{
|
| 40602 |
+
"epoch": 0.368,
|
| 40603 |
+
"grad_norm": 0.443359375,
|
| 40604 |
+
"learning_rate": 0.1,
|
| 40605 |
+
"loss": 2.4958691596984863,
|
| 40606 |
+
"step": 11592
|
| 40607 |
+
},
|
| 40608 |
+
{
|
| 40609 |
+
"epoch": 0.36806349206349204,
|
| 40610 |
+
"grad_norm": 0.35546875,
|
| 40611 |
+
"learning_rate": 0.1,
|
| 40612 |
+
"loss": 2.4981982707977295,
|
| 40613 |
+
"step": 11594
|
| 40614 |
+
},
|
| 40615 |
+
{
|
| 40616 |
+
"epoch": 0.36812698412698414,
|
| 40617 |
+
"grad_norm": 0.1806640625,
|
| 40618 |
+
"learning_rate": 0.1,
|
| 40619 |
+
"loss": 2.4794466495513916,
|
| 40620 |
+
"step": 11596
|
| 40621 |
+
},
|
| 40622 |
+
{
|
| 40623 |
+
"epoch": 0.3681904761904762,
|
| 40624 |
+
"grad_norm": 0.205078125,
|
| 40625 |
+
"learning_rate": 0.1,
|
| 40626 |
+
"loss": 2.477478265762329,
|
| 40627 |
+
"step": 11598
|
| 40628 |
+
},
|
| 40629 |
+
{
|
| 40630 |
+
"epoch": 0.3682539682539683,
|
| 40631 |
+
"grad_norm": 0.2392578125,
|
| 40632 |
+
"learning_rate": 0.1,
|
| 40633 |
+
"loss": 2.4623732566833496,
|
| 40634 |
+
"step": 11600
|
| 40635 |
+
},
|
| 40636 |
+
{
|
| 40637 |
+
"epoch": 0.3683174603174603,
|
| 40638 |
+
"grad_norm": 0.1689453125,
|
| 40639 |
+
"learning_rate": 0.1,
|
| 40640 |
+
"loss": 2.475332498550415,
|
| 40641 |
+
"step": 11602
|
| 40642 |
+
},
|
| 40643 |
+
{
|
| 40644 |
+
"epoch": 0.36838095238095236,
|
| 40645 |
+
"grad_norm": 0.1904296875,
|
| 40646 |
+
"learning_rate": 0.1,
|
| 40647 |
+
"loss": 2.4902873039245605,
|
| 40648 |
+
"step": 11604
|
| 40649 |
+
},
|
| 40650 |
+
{
|
| 40651 |
+
"epoch": 0.36844444444444446,
|
| 40652 |
+
"grad_norm": 0.3046875,
|
| 40653 |
+
"learning_rate": 0.1,
|
| 40654 |
+
"loss": 2.5031120777130127,
|
| 40655 |
+
"step": 11606
|
| 40656 |
+
},
|
| 40657 |
+
{
|
| 40658 |
+
"epoch": 0.3685079365079365,
|
| 40659 |
+
"grad_norm": 0.16015625,
|
| 40660 |
+
"learning_rate": 0.1,
|
| 40661 |
+
"loss": 2.4824001789093018,
|
| 40662 |
+
"step": 11608
|
| 40663 |
+
},
|
| 40664 |
+
{
|
| 40665 |
+
"epoch": 0.36857142857142855,
|
| 40666 |
+
"grad_norm": 0.08203125,
|
| 40667 |
+
"learning_rate": 0.1,
|
| 40668 |
+
"loss": 2.507827043533325,
|
| 40669 |
+
"step": 11610
|
| 40670 |
+
},
|
| 40671 |
+
{
|
| 40672 |
+
"epoch": 0.36863492063492065,
|
| 40673 |
+
"grad_norm": 0.296875,
|
| 40674 |
+
"learning_rate": 0.1,
|
| 40675 |
+
"loss": 2.4686498641967773,
|
| 40676 |
+
"step": 11612
|
| 40677 |
+
},
|
| 40678 |
+
{
|
| 40679 |
+
"epoch": 0.3686984126984127,
|
| 40680 |
+
"grad_norm": 0.50390625,
|
| 40681 |
+
"learning_rate": 0.1,
|
| 40682 |
+
"loss": 2.4797205924987793,
|
| 40683 |
+
"step": 11614
|
| 40684 |
+
},
|
| 40685 |
+
{
|
| 40686 |
+
"epoch": 0.36876190476190474,
|
| 40687 |
+
"grad_norm": 0.287109375,
|
| 40688 |
+
"learning_rate": 0.1,
|
| 40689 |
+
"loss": 2.5164663791656494,
|
| 40690 |
+
"step": 11616
|
| 40691 |
+
},
|
| 40692 |
+
{
|
| 40693 |
+
"epoch": 0.36882539682539683,
|
| 40694 |
+
"grad_norm": 0.09716796875,
|
| 40695 |
+
"learning_rate": 0.1,
|
| 40696 |
+
"loss": 2.491569757461548,
|
| 40697 |
+
"step": 11618
|
| 40698 |
+
},
|
| 40699 |
+
{
|
| 40700 |
+
"epoch": 0.3688888888888889,
|
| 40701 |
+
"grad_norm": 0.109375,
|
| 40702 |
+
"learning_rate": 0.1,
|
| 40703 |
+
"loss": 2.48038911819458,
|
| 40704 |
+
"step": 11620
|
| 40705 |
+
},
|
| 40706 |
+
{
|
| 40707 |
+
"epoch": 0.368952380952381,
|
| 40708 |
+
"grad_norm": 0.0576171875,
|
| 40709 |
+
"learning_rate": 0.1,
|
| 40710 |
+
"loss": 2.4934587478637695,
|
| 40711 |
+
"step": 11622
|
| 40712 |
+
},
|
| 40713 |
+
{
|
| 40714 |
+
"epoch": 0.369015873015873,
|
| 40715 |
+
"grad_norm": 0.056884765625,
|
| 40716 |
+
"learning_rate": 0.1,
|
| 40717 |
+
"loss": 2.460035800933838,
|
| 40718 |
+
"step": 11624
|
| 40719 |
+
},
|
| 40720 |
+
{
|
| 40721 |
+
"epoch": 0.36907936507936506,
|
| 40722 |
+
"grad_norm": 0.224609375,
|
| 40723 |
+
"learning_rate": 0.1,
|
| 40724 |
+
"loss": 2.473371982574463,
|
| 40725 |
+
"step": 11626
|
| 40726 |
+
},
|
| 40727 |
+
{
|
| 40728 |
+
"epoch": 0.36914285714285716,
|
| 40729 |
+
"grad_norm": 0.078125,
|
| 40730 |
+
"learning_rate": 0.1,
|
| 40731 |
+
"loss": 2.4780993461608887,
|
| 40732 |
+
"step": 11628
|
| 40733 |
+
},
|
| 40734 |
+
{
|
| 40735 |
+
"epoch": 0.3692063492063492,
|
| 40736 |
+
"grad_norm": 0.08740234375,
|
| 40737 |
+
"learning_rate": 0.1,
|
| 40738 |
+
"loss": 2.466709613800049,
|
| 40739 |
+
"step": 11630
|
| 40740 |
+
},
|
| 40741 |
+
{
|
| 40742 |
+
"epoch": 0.36926984126984125,
|
| 40743 |
+
"grad_norm": 0.12890625,
|
| 40744 |
+
"learning_rate": 0.1,
|
| 40745 |
+
"loss": 2.462557792663574,
|
| 40746 |
+
"step": 11632
|
| 40747 |
+
},
|
| 40748 |
+
{
|
| 40749 |
+
"epoch": 0.36933333333333335,
|
| 40750 |
+
"grad_norm": 0.08740234375,
|
| 40751 |
+
"learning_rate": 0.1,
|
| 40752 |
+
"loss": 2.4835760593414307,
|
| 40753 |
+
"step": 11634
|
| 40754 |
+
},
|
| 40755 |
+
{
|
| 40756 |
+
"epoch": 0.3693968253968254,
|
| 40757 |
+
"grad_norm": 0.09130859375,
|
| 40758 |
+
"learning_rate": 0.1,
|
| 40759 |
+
"loss": 2.4916019439697266,
|
| 40760 |
+
"step": 11636
|
| 40761 |
+
},
|
| 40762 |
+
{
|
| 40763 |
+
"epoch": 0.36946031746031743,
|
| 40764 |
+
"grad_norm": 0.1982421875,
|
| 40765 |
+
"learning_rate": 0.1,
|
| 40766 |
+
"loss": 2.480407476425171,
|
| 40767 |
+
"step": 11638
|
| 40768 |
+
},
|
| 40769 |
+
{
|
| 40770 |
+
"epoch": 0.36952380952380953,
|
| 40771 |
+
"grad_norm": 0.26171875,
|
| 40772 |
+
"learning_rate": 0.1,
|
| 40773 |
+
"loss": 2.482396364212036,
|
| 40774 |
+
"step": 11640
|
| 40775 |
+
},
|
| 40776 |
+
{
|
| 40777 |
+
"epoch": 0.3695873015873016,
|
| 40778 |
+
"grad_norm": 0.30078125,
|
| 40779 |
+
"learning_rate": 0.1,
|
| 40780 |
+
"loss": 2.4516971111297607,
|
| 40781 |
+
"step": 11642
|
| 40782 |
+
},
|
| 40783 |
+
{
|
| 40784 |
+
"epoch": 0.3696507936507937,
|
| 40785 |
+
"grad_norm": 0.490234375,
|
| 40786 |
+
"learning_rate": 0.1,
|
| 40787 |
+
"loss": 2.466508150100708,
|
| 40788 |
+
"step": 11644
|
| 40789 |
+
},
|
| 40790 |
+
{
|
| 40791 |
+
"epoch": 0.3697142857142857,
|
| 40792 |
+
"grad_norm": 0.169921875,
|
| 40793 |
+
"learning_rate": 0.1,
|
| 40794 |
+
"loss": 2.4519338607788086,
|
| 40795 |
+
"step": 11646
|
| 40796 |
+
},
|
| 40797 |
+
{
|
| 40798 |
+
"epoch": 0.36977777777777776,
|
| 40799 |
+
"grad_norm": 0.06005859375,
|
| 40800 |
+
"learning_rate": 0.1,
|
| 40801 |
+
"loss": 2.4561171531677246,
|
| 40802 |
+
"step": 11648
|
| 40803 |
+
},
|
| 40804 |
+
{
|
| 40805 |
+
"epoch": 0.36984126984126986,
|
| 40806 |
+
"grad_norm": 0.1748046875,
|
| 40807 |
+
"learning_rate": 0.1,
|
| 40808 |
+
"loss": 2.4761106967926025,
|
| 40809 |
+
"step": 11650
|
| 40810 |
+
},
|
| 40811 |
+
{
|
| 40812 |
+
"epoch": 0.3699047619047619,
|
| 40813 |
+
"grad_norm": 0.322265625,
|
| 40814 |
+
"learning_rate": 0.1,
|
| 40815 |
+
"loss": 2.445469856262207,
|
| 40816 |
+
"step": 11652
|
| 40817 |
+
},
|
| 40818 |
+
{
|
| 40819 |
+
"epoch": 0.36996825396825395,
|
| 40820 |
+
"grad_norm": 0.296875,
|
| 40821 |
+
"learning_rate": 0.1,
|
| 40822 |
+
"loss": 2.473306179046631,
|
| 40823 |
+
"step": 11654
|
| 40824 |
}
|
| 40825 |
],
|
| 40826 |
"logging_steps": 2,
|
| 40827-40839 |
(lines not shown in this diff)
|
| 40840 |
"attributes": {}
|
| 40841 |
}
|
| 40842 |
},
|
| 40843 |
+
"total_flos": 3.859984432812526e+19,
|
| 40844 |
"train_batch_size": 4,
|
| 40845 |
"trial_name": null,
|
| 40846 |
"trial_params": null
|