Text Generation
Transformers
Safetensors
qwen3
Generated from Trainer
unsloth
trl
sft
conversational
custom_code
text-generation-inference
Instructions to use Ba2han/model-sft-q2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Ba2han/model-sft-q2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/model-sft-q2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/model-sft-q2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/model-sft-q2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use Ba2han/model-sft-q2 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Ba2han/model-sft-q2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ba2han/model-sft-q2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Ba2han/model-sft-q2
- SGLang
How to use Ba2han/model-sft-q2 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Ba2han/model-sft-q2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ba2han/model-sft-q2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Ba2han/model-sft-q2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ba2han/model-sft-q2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Unsloth Studio
How to use Ba2han/model-sft-q2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ba2han/model-sft-q2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ba2han/model-sft-q2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/model-sft-q2 to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/model-sft-q2",
    max_seq_length=2048,
)
- Docker Model Runner
How to use Ba2han/model-sft-q2 with Docker Model Runner:
docker model run hf.co/Ba2han/model-sft-q2
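The curl calls shown above for the OpenAI-compatible servers (vLLM on port 8000, SGLang on port 30000) can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical and chosen for illustration, and the sketch assumes a server is already running locally when the request is sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# vLLM listens on port 8000 in the commands above; SGLang uses port 30000.
request = build_chat_request(
    "http://localhost:8000", "Ba2han/model-sft-q2", "What is the capital of France?"
)
# reply = send_chat_request(request)  # uncomment once the server is up
# print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect before any network traffic happens.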
Training in progress, step 1602, checkpoint
last-checkpoint/model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ab338d32c65cbc500544ff4a625cf314dd9def11411364fc3db9c433b459bce2
 size 1049614696
last-checkpoint/optimizer.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7136556ff04c85f9a3ed66400c9c5ffc372d079098d1fdeb594771004a1b4ba5
 size 1372902609
last-checkpoint/rng_state.pth
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:181c5f0270cf39930062ddfa3767a2481d0c360f120b11f8e25dbf533a1cdaba
 size 14645
last-checkpoint/scheduler.pt
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:76693c24eda9f0156815fd8848cbcd34e72eab813183473d3af3a56ebb9977ec
 size 1465
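The four checkpoint files above are stored as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. As a rough sketch of how such a pointer could be checked against a downloaded file, the Python below parses a pointer and compares size and digest. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the `scheduler.pt` diff.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer for last-checkpoint/scheduler.pt as it appears after this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:76693c24eda9f0156815fd8848cbcd34e72eab813183473d3af3a56ebb9977ec\n"
    "size 1465\n"
)
print(pointer["algo"], pointer["size"])
```

To check a real download, read the file in binary mode and pass its bytes to `matches_pointer`. For the multi-gigabyte files it would be better to hash in chunks, which this sketch leaves out for brevity.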
last-checkpoint/trainer_state.json
CHANGED
@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.
   "eval_steps": 799,
-  "global_step":
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -9361,6 +9361,1883 @@
       "learning_rate": 7.571157495256167e-05,
       "loss": 1.875717282295227,
       "step": 1335
     }
   ],
   "logging_steps": 1,
@@ -9380,7 +11257,7 @@
       "attributes": {}
     }
   },
-  "total_flos":
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.1053306376054046,
   "eval_steps": 799,
+  "global_step": 1602,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
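The top-level fields of `trainer_state.json` can be combined into a few derived quantities. The sketch below is illustrative only: the JSON is an excerpt assembled from values visible in this commit (three `log_history` entries rather than the full history), the function name `summarize` is hypothetical, and the `eval_intervals_elapsed` figure assumes evaluation is scheduled every `eval_steps` steps, which the diff does not confirm.

```python
import json

# Excerpt assembled from values visible in this commit's trainer_state.json diff.
STATE_JSON = """
{
  "epoch": 0.1053306376054046,
  "eval_steps": 799,
  "global_step": 1602,
  "log_history": [
    {"epoch": 0.08784128079951345, "grad_norm": 0.162109375,
     "learning_rate": 7.569259962049336e-05, "loss": 1.7527270317077637, "step": 1336},
    {"epoch": 0.08790703026118973, "grad_norm": 0.1796875,
     "learning_rate": 7.567362428842505e-05, "loss": 1.7787734270095825, "step": 1337},
    {"epoch": 0.08797277972286602, "grad_norm": 0.1669921875,
     "learning_rate": 7.565464895635674e-05, "loss": 1.754085898399353, "step": 1338}
  ],
  "logging_steps": 1,
  "train_batch_size": 4
}
"""


def summarize(state: dict) -> dict:
    """Derive a few summary numbers from a trainer_state-style dict."""
    logs = [entry for entry in state["log_history"] if "loss" in entry]
    losses = [entry["loss"] for entry in logs]
    return {
        # Estimated optimizer steps in one full epoch.
        "steps_per_epoch": state["global_step"] / state["epoch"],
        # Assumption: one evaluation every eval_steps steps.
        "eval_intervals_elapsed": state["global_step"] // state["eval_steps"],
        "first_logged_step": logs[0]["step"],
        "last_logged_step": logs[-1]["step"],
        "mean_logged_loss": sum(losses) / len(losses),
    }


summary = summarize(json.loads(STATE_JSON))
for key, value in summary.items():
    print(f"{key}: {value}")
```

On a real checkpoint the same function could be applied to the parsed contents of `last-checkpoint/trainer_state.json`, in which case the mean covers every logged step rather than this three-entry excerpt.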
       "learning_rate": 7.571157495256167e-05,
       "loss": 1.875717282295227,
       "step": 1335
+    },
+    {
+      "epoch": 0.08784128079951345,
+      "grad_norm": 0.162109375,
+      "learning_rate": 7.569259962049336e-05,
+      "loss": 1.7527270317077637,
+      "step": 1336
+    },
+    {
+      "epoch": 0.08790703026118973,
+      "grad_norm": 0.1796875,
+      "learning_rate": 7.567362428842505e-05,
+      "loss": 1.7787734270095825,
+      "step": 1337
+    },
+    {
+      "epoch": 0.08797277972286602,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 7.565464895635674e-05,
+      "loss": 1.754085898399353,
+      "step": 1338
+    },
+    {
+      "epoch": 0.0880385291845423,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 7.563567362428843e-05,
+      "loss": 1.7644147872924805,
+      "step": 1339
+    },
+    {
+      "epoch": 0.08810427864621859,
+      "grad_norm": 0.1982421875,
+      "learning_rate": 7.561669829222012e-05,
+      "loss": 1.836362361907959,
+      "step": 1340
+    },
+    {
+      "epoch": 0.08817002810789487,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 7.55977229601518e-05,
+      "loss": 1.7424442768096924,
+      "step": 1341
+    },
+    {
+      "epoch": 0.08823577756957116,
+      "grad_norm": 0.173828125,
+      "learning_rate": 7.557874762808349e-05,
+      "loss": 1.8069794178009033,
+      "step": 1342
+    },
+    {
+      "epoch": 0.08830152703124743,
+      "grad_norm": 0.208984375,
+      "learning_rate": 7.555977229601518e-05,
+      "loss": 1.7618046998977661,
+      "step": 1343
+    },
+    {
+      "epoch": 0.08836727649292371,
+      "grad_norm": 0.25390625,
+      "learning_rate": 7.554079696394687e-05,
+      "loss": 1.7213244438171387,
+      "step": 1344
+    },
+    {
+      "epoch": 0.0884330259546,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 7.552182163187857e-05,
+      "loss": 1.86086905002594,
+      "step": 1345
+    },
+    {
+      "epoch": 0.08849877541627628,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.550284629981024e-05,
+      "loss": 1.6897088289260864,
+      "step": 1346
+    },
+    {
+      "epoch": 0.08856452487795256,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 7.548387096774195e-05,
+      "loss": 1.7442395687103271,
+      "step": 1347
+    },
+    {
+      "epoch": 0.08863027433962885,
+      "grad_norm": 0.197265625,
+      "learning_rate": 7.546489563567363e-05,
+      "loss": 1.7865397930145264,
+      "step": 1348
+    },
+    {
+      "epoch": 0.08869602380130513,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.544592030360532e-05,
+      "loss": 1.7122347354888916,
+      "step": 1349
+    },
+    {
+      "epoch": 0.0887617732629814,
+      "grad_norm": 0.1796875,
+      "learning_rate": 7.542694497153701e-05,
+      "loss": 1.8242549896240234,
+      "step": 1350
+    },
+    {
+      "epoch": 0.08882752272465769,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 7.540796963946868e-05,
+      "loss": 1.711504340171814,
+      "step": 1351
+    },
+    {
+      "epoch": 0.08889327218633397,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 7.538899430740039e-05,
+      "loss": 1.7520360946655273,
+      "step": 1352
+    },
+    {
+      "epoch": 0.08895902164801026,
+      "grad_norm": 0.189453125,
+      "learning_rate": 7.537001897533207e-05,
+      "loss": 1.9256170988082886,
+      "step": 1353
+    },
+    {
+      "epoch": 0.08902477110968654,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.535104364326376e-05,
+      "loss": 1.7326505184173584,
+      "step": 1354
+    },
+    {
+      "epoch": 0.08909052057136282,
+      "grad_norm": 0.166015625,
+      "learning_rate": 7.533206831119545e-05,
+      "loss": 1.6585073471069336,
+      "step": 1355
+    },
+    {
+      "epoch": 0.08915627003303911,
+      "grad_norm": 0.220703125,
+      "learning_rate": 7.531309297912714e-05,
+      "loss": 1.8321764469146729,
+      "step": 1356
+    },
+    {
+      "epoch": 0.0892220194947154,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 7.529411764705883e-05,
+      "loss": 1.774332046508789,
+      "step": 1357
+    },
+    {
+      "epoch": 0.08928776895639166,
+      "grad_norm": 0.20703125,
+      "learning_rate": 7.527514231499052e-05,
+      "loss": 1.7889446020126343,
+      "step": 1358
+    },
+    {
+      "epoch": 0.08935351841806795,
+      "grad_norm": 0.208984375,
+      "learning_rate": 7.52561669829222e-05,
+      "loss": 1.7455190420150757,
+      "step": 1359
+    },
+    {
+      "epoch": 0.08941926787974423,
+      "grad_norm": 0.201171875,
+      "learning_rate": 7.523719165085389e-05,
+      "loss": 1.8102771043777466,
+      "step": 1360
+    },
+    {
+      "epoch": 0.08948501734142052,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 7.521821631878558e-05,
+      "loss": 1.7873613834381104,
+      "step": 1361
+    },
+    {
+      "epoch": 0.0895507668030968,
+      "grad_norm": 0.197265625,
+      "learning_rate": 7.519924098671727e-05,
+      "loss": 1.7841826677322388,
+      "step": 1362
+    },
+    {
+      "epoch": 0.08961651626477309,
+      "grad_norm": 0.193359375,
+      "learning_rate": 7.518026565464896e-05,
+      "loss": 1.8560742139816284,
+      "step": 1363
+    },
+    {
+      "epoch": 0.08968226572644937,
+      "grad_norm": 0.22265625,
+      "learning_rate": 7.516129032258064e-05,
+      "loss": 1.8234230279922485,
+      "step": 1364
+    },
+    {
+      "epoch": 0.08974801518812564,
+      "grad_norm": 0.205078125,
+      "learning_rate": 7.514231499051234e-05,
+      "loss": 1.7481718063354492,
+      "step": 1365
+    },
+    {
+      "epoch": 0.08981376464980192,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.512333965844402e-05,
+      "loss": 1.7564069032669067,
+      "step": 1366
+    },
+    {
+      "epoch": 0.08987951411147821,
+      "grad_norm": 0.1728515625,
+      "learning_rate": 7.510436432637573e-05,
+      "loss": 1.7697902917861938,
+      "step": 1367
+    },
+    {
+      "epoch": 0.0899452635731545,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 7.50853889943074e-05,
+      "loss": 1.7558023929595947,
+      "step": 1368
+    },
+    {
+      "epoch": 0.09001101303483078,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.506641366223908e-05,
+      "loss": 1.845292568206787,
+      "step": 1369
+    },
+    {
+      "epoch": 0.09007676249650706,
+      "grad_norm": 0.1953125,
+      "learning_rate": 7.504743833017079e-05,
+      "loss": 1.758579134941101,
+      "step": 1370
+    },
+    {
+      "epoch": 0.09014251195818335,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.502846299810246e-05,
+      "loss": 1.7008386850357056,
+      "step": 1371
+    },
+    {
+      "epoch": 0.09020826141985963,
+      "grad_norm": 0.203125,
+      "learning_rate": 7.500948766603417e-05,
+      "loss": 1.6727268695831299,
+      "step": 1372
+    },
+    {
+      "epoch": 0.0902740108815359,
+      "grad_norm": 0.1875,
+      "learning_rate": 7.499051233396585e-05,
+      "loss": 1.7608578205108643,
+      "step": 1373
+    },
+    {
+      "epoch": 0.09033976034321219,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.497153700189754e-05,
+      "loss": 1.7583236694335938,
+      "step": 1374
+    },
+    {
+      "epoch": 0.09040550980488847,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 7.495256166982923e-05,
+      "loss": 1.8918523788452148,
+      "step": 1375
+    },
+    {
+      "epoch": 0.09047125926656475,
+      "grad_norm": 0.2236328125,
+      "learning_rate": 7.493358633776092e-05,
+      "loss": 1.6583324670791626,
+      "step": 1376
+    },
+    {
+      "epoch": 0.09053700872824104,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.491461100569261e-05,
+      "loss": 1.7445127964019775,
+      "step": 1377
+    },
+    {
+      "epoch": 0.09060275818991732,
+      "grad_norm": 0.185546875,
+      "learning_rate": 7.489563567362429e-05,
+      "loss": 1.7454389333724976,
+      "step": 1378
+    },
+    {
+      "epoch": 0.09066850765159361,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 7.487666034155598e-05,
+      "loss": 1.6798499822616577,
+      "step": 1379
+    },
+    {
+      "epoch": 0.09073425711326988,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 7.485768500948767e-05,
+      "loss": 1.6668037176132202,
+      "step": 1380
+    },
+    {
+      "epoch": 0.09080000657494616,
+      "grad_norm": 0.19140625,
+      "learning_rate": 7.483870967741936e-05,
+      "loss": 1.833579182624817,
+      "step": 1381
+    },
+    {
+      "epoch": 0.09086575603662245,
+      "grad_norm": 0.2109375,
+      "learning_rate": 7.481973434535105e-05,
+      "loss": 1.7775713205337524,
+      "step": 1382
+    },
+    {
+      "epoch": 0.09093150549829873,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 7.480075901328274e-05,
+      "loss": 1.8585375547409058,
+      "step": 1383
+    },
+    {
+      "epoch": 0.09099725495997502,
+      "grad_norm": 0.1796875,
+      "learning_rate": 7.478178368121442e-05,
+      "loss": 1.6736737489700317,
+      "step": 1384
+    },
+    {
+      "epoch": 0.0910630044216513,
+      "grad_norm": 0.1865234375,
+      "learning_rate": 7.476280834914612e-05,
+      "loss": 1.7602424621582031,
+      "step": 1385
+    },
+    {
+      "epoch": 0.09112875388332758,
+      "grad_norm": 0.212890625,
+      "learning_rate": 7.47438330170778e-05,
+      "loss": 1.7911678552627563,
+      "step": 1386
+    },
+    {
+      "epoch": 0.09119450334500387,
+      "grad_norm": 0.17578125,
+      "learning_rate": 7.472485768500949e-05,
+      "loss": 1.754852533340454,
+      "step": 1387
+    },
+    {
+      "epoch": 0.09126025280668014,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 7.470588235294118e-05,
+      "loss": 1.9604911804199219,
+      "step": 1388
+    },
+    {
+      "epoch": 0.09132600226835642,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 7.468690702087286e-05,
+      "loss": 1.7786387205123901,
+      "step": 1389
+    },
+    {
+      "epoch": 0.09139175173003271,
+      "grad_norm": 0.2109375,
+      "learning_rate": 7.466793168880457e-05,
+      "loss": 1.755190134048462,
+      "step": 1390
+    },
+    {
+      "epoch": 0.09145750119170899,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 7.464895635673624e-05,
+      "loss": 1.7109942436218262,
+      "step": 1391
+    },
+    {
+      "epoch": 0.09152325065338528,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 7.462998102466793e-05,
+      "loss": 1.7900867462158203,
+      "step": 1392
+    },
+    {
+      "epoch": 0.09158900011506156,
+      "grad_norm": 0.1982421875,
+      "learning_rate": 7.461100569259962e-05,
+      "loss": 1.798643708229065,
+      "step": 1393
+    },
+    {
+      "epoch": 0.09165474957673785,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 7.459203036053132e-05,
+      "loss": 1.8416310548782349,
+      "step": 1394
+    },
+    {
+      "epoch": 0.09172049903841412,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.457305502846301e-05,
+      "loss": 1.7465572357177734,
+      "step": 1395
+    },
+    {
+      "epoch": 0.0917862485000904,
+      "grad_norm": 0.2001953125,
+      "learning_rate": 7.455407969639468e-05,
+      "loss": 1.731363296508789,
+      "step": 1396
+    },
+    {
+      "epoch": 0.09185199796176668,
+      "grad_norm": 0.177734375,
+      "learning_rate": 7.453510436432638e-05,
+      "loss": 1.794449806213379,
+      "step": 1397
+    },
+    {
+      "epoch": 0.09191774742344297,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 7.451612903225807e-05,
+      "loss": 1.8196163177490234,
+      "step": 1398
+    },
+    {
+      "epoch": 0.09198349688511925,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 7.449715370018976e-05,
+      "loss": 1.809367299079895,
+      "step": 1399
+    },
+    {
+      "epoch": 0.09204924634679554,
+      "grad_norm": 0.171875,
+      "learning_rate": 7.447817836812145e-05,
+      "loss": 1.7287137508392334,
+      "step": 1400
+    },
+    {
+      "epoch": 0.09211499580847182,
+      "grad_norm": 0.189453125,
+      "learning_rate": 7.445920303605314e-05,
+      "loss": 1.730889916419983,
+      "step": 1401
+    },
+    {
+      "epoch": 0.0921807452701481,
+      "grad_norm": 0.1962890625,
+      "learning_rate": 7.444022770398482e-05,
+      "loss": 1.7207967042922974,
+      "step": 1402
+    },
+    {
+      "epoch": 0.09224649473182438,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 7.442125237191652e-05,
+      "loss": 1.8299708366394043,
+      "step": 1403
+    },
+    {
+      "epoch": 0.09231224419350066,
+      "grad_norm": 0.1953125,
+      "learning_rate": 7.44022770398482e-05,
+      "loss": 1.7028698921203613,
+      "step": 1404
+    },
+    {
+      "epoch": 0.09237799365517695,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 7.438330170777989e-05,
+      "loss": 1.755666732788086,
+      "step": 1405
+    },
+    {
| 9856 |
+
"epoch": 0.09244374311685323,
|
| 9857 |
+
"grad_norm": 0.189453125,
|
| 9858 |
+
"learning_rate": 7.436432637571158e-05,
|
| 9859 |
+
"loss": 1.8222557306289673,
|
| 9860 |
+
"step": 1406
|
| 9861 |
+
},
|
| 9862 |
+
{
|
| 9863 |
+
"epoch": 0.09250949257852951,
|
| 9864 |
+
"grad_norm": 0.2109375,
|
| 9865 |
+
"learning_rate": 7.434535104364326e-05,
|
| 9866 |
+
"loss": 1.8097777366638184,
|
| 9867 |
+
"step": 1407
|
| 9868 |
+
},
|
| 9869 |
+
{
|
| 9870 |
+
"epoch": 0.0925752420402058,
|
| 9871 |
+
"grad_norm": 0.1708984375,
|
| 9872 |
+
"learning_rate": 7.432637571157496e-05,
|
| 9873 |
+
"loss": 1.743056058883667,
|
| 9874 |
+
"step": 1408
|
| 9875 |
+
},
|
| 9876 |
+
{
|
| 9877 |
+
"epoch": 0.09264099150188208,
|
| 9878 |
+
"grad_norm": 0.169921875,
|
| 9879 |
+
"learning_rate": 7.430740037950664e-05,
|
| 9880 |
+
"loss": 1.767404317855835,
|
| 9881 |
+
"step": 1409
|
| 9882 |
+
},
|
| 9883 |
+
{
|
| 9884 |
+
"epoch": 0.09270674096355837,
|
| 9885 |
+
"grad_norm": 0.1689453125,
|
| 9886 |
+
"learning_rate": 7.428842504743833e-05,
|
| 9887 |
+
"loss": 1.7809104919433594,
|
| 9888 |
+
"step": 1410
|
| 9889 |
+
},
|
| 9890 |
+
{
|
| 9891 |
+
"epoch": 0.09277249042523464,
|
| 9892 |
+
"grad_norm": 0.18359375,
|
| 9893 |
+
"learning_rate": 7.426944971537002e-05,
|
| 9894 |
+
"loss": 1.754618763923645,
|
| 9895 |
+
"step": 1411
|
| 9896 |
+
},
|
| 9897 |
+
{
|
| 9898 |
+
"epoch": 0.09283823988691092,
|
| 9899 |
+
"grad_norm": 0.19140625,
|
| 9900 |
+
"learning_rate": 7.425047438330171e-05,
|
| 9901 |
+
"loss": 1.7636806964874268,
|
| 9902 |
+
"step": 1412
|
| 9903 |
+
},
|
| 9904 |
+
{
|
| 9905 |
+
"epoch": 0.09290398934858721,
|
| 9906 |
+
"grad_norm": 0.1708984375,
|
| 9907 |
+
"learning_rate": 7.42314990512334e-05,
|
| 9908 |
+
"loss": 1.749414324760437,
|
| 9909 |
+
"step": 1413
|
| 9910 |
+
},
|
| 9911 |
+
{
|
| 9912 |
+
"epoch": 0.09296973881026349,
|
| 9913 |
+
"grad_norm": 0.1611328125,
|
| 9914 |
+
"learning_rate": 7.421252371916508e-05,
|
| 9915 |
+
"loss": 1.6943576335906982,
|
| 9916 |
+
"step": 1414
|
| 9917 |
+
},
|
| 9918 |
+
{
|
| 9919 |
+
"epoch": 0.09303548827193978,
|
| 9920 |
+
"grad_norm": 0.1767578125,
|
| 9921 |
+
"learning_rate": 7.419354838709677e-05,
|
| 9922 |
+
"loss": 1.7116183042526245,
|
| 9923 |
+
"step": 1415
|
| 9924 |
+
},
|
| 9925 |
+
{
|
| 9926 |
+
"epoch": 0.09310123773361606,
|
| 9927 |
+
"grad_norm": 0.1748046875,
|
| 9928 |
+
"learning_rate": 7.417457305502846e-05,
|
| 9929 |
+
"loss": 1.899487018585205,
|
| 9930 |
+
"step": 1416
|
| 9931 |
+
},
|
| 9932 |
+
{
|
| 9933 |
+
"epoch": 0.09316698719529234,
|
| 9934 |
+
"grad_norm": 0.2001953125,
|
| 9935 |
+
"learning_rate": 7.415559772296015e-05,
|
| 9936 |
+
"loss": 1.804649829864502,
|
| 9937 |
+
"step": 1417
|
| 9938 |
+
},
|
| 9939 |
+
{
|
| 9940 |
+
"epoch": 0.09323273665696862,
|
| 9941 |
+
"grad_norm": 0.181640625,
|
| 9942 |
+
"learning_rate": 7.413662239089185e-05,
|
| 9943 |
+
"loss": 1.881633996963501,
|
| 9944 |
+
"step": 1418
|
| 9945 |
+
},
|
| 9946 |
+
{
|
| 9947 |
+
"epoch": 0.0932984861186449,
|
| 9948 |
+
"grad_norm": 0.177734375,
|
| 9949 |
+
"learning_rate": 7.411764705882354e-05,
|
| 9950 |
+
"loss": 1.711790680885315,
|
| 9951 |
+
"step": 1419
|
| 9952 |
+
},
|
| 9953 |
+
{
|
| 9954 |
+
"epoch": 0.09336423558032118,
|
| 9955 |
+
"grad_norm": 0.185546875,
|
| 9956 |
+
"learning_rate": 7.409867172675521e-05,
|
| 9957 |
+
"loss": 1.762755274772644,
|
| 9958 |
+
"step": 1420
|
| 9959 |
+
},
|
| 9960 |
+
{
|
| 9961 |
+
"epoch": 0.09342998504199747,
|
| 9962 |
+
"grad_norm": 0.1796875,
|
| 9963 |
+
"learning_rate": 7.407969639468692e-05,
|
| 9964 |
+
"loss": 1.6768829822540283,
|
| 9965 |
+
"step": 1421
|
| 9966 |
+
},
|
| 9967 |
+
{
|
| 9968 |
+
"epoch": 0.09349573450367375,
|
| 9969 |
+
"grad_norm": 0.1787109375,
|
| 9970 |
+
"learning_rate": 7.40607210626186e-05,
|
| 9971 |
+
"loss": 1.7820616960525513,
|
| 9972 |
+
"step": 1422
|
| 9973 |
+
},
|
| 9974 |
+
{
|
| 9975 |
+
"epoch": 0.09356148396535004,
|
| 9976 |
+
"grad_norm": 0.16015625,
|
| 9977 |
+
"learning_rate": 7.404174573055029e-05,
|
| 9978 |
+
"loss": 1.7190353870391846,
|
| 9979 |
+
"step": 1423
|
| 9980 |
+
},
|
| 9981 |
+
{
|
| 9982 |
+
"epoch": 0.09362723342702632,
|
| 9983 |
+
"grad_norm": 0.1953125,
|
| 9984 |
+
"learning_rate": 7.402277039848198e-05,
|
| 9985 |
+
"loss": 1.7924084663391113,
|
| 9986 |
+
"step": 1424
|
| 9987 |
+
},
|
| 9988 |
+
{
|
| 9989 |
+
"epoch": 0.0936929828887026,
|
| 9990 |
+
"grad_norm": 0.193359375,
|
| 9991 |
+
"learning_rate": 7.400379506641366e-05,
|
| 9992 |
+
"loss": 1.7938555479049683,
|
| 9993 |
+
"step": 1425
|
| 9994 |
+
},
|
| 9995 |
+
{
|
| 9996 |
+
"epoch": 0.09375873235037888,
|
| 9997 |
+
"grad_norm": 0.1630859375,
|
| 9998 |
+
"learning_rate": 7.398481973434536e-05,
|
| 9999 |
+
"loss": 1.7281429767608643,
|
| 10000 |
+
"step": 1426
|
| 10001 |
+
},
|
| 10002 |
+
{
|
| 10003 |
+
"epoch": 0.09382448181205516,
|
| 10004 |
+
"grad_norm": 0.1689453125,
|
| 10005 |
+
"learning_rate": 7.396584440227704e-05,
|
| 10006 |
+
"loss": 1.7515310049057007,
|
| 10007 |
+
"step": 1427
|
| 10008 |
+
},
|
| 10009 |
+
{
|
| 10010 |
+
"epoch": 0.09389023127373144,
|
| 10011 |
+
"grad_norm": 0.193359375,
|
| 10012 |
+
"learning_rate": 7.394686907020874e-05,
|
| 10013 |
+
"loss": 1.7627710103988647,
|
| 10014 |
+
"step": 1428
|
| 10015 |
+
},
|
| 10016 |
+
{
|
| 10017 |
+
"epoch": 0.09395598073540773,
|
| 10018 |
+
"grad_norm": 0.1796875,
|
| 10019 |
+
"learning_rate": 7.392789373814042e-05,
|
| 10020 |
+
"loss": 1.8321844339370728,
|
| 10021 |
+
"step": 1429
|
| 10022 |
+
},
|
| 10023 |
+
{
|
| 10024 |
+
"epoch": 0.09402173019708401,
|
| 10025 |
+
"grad_norm": 0.1796875,
|
| 10026 |
+
"learning_rate": 7.390891840607211e-05,
|
| 10027 |
+
"loss": 1.756172776222229,
|
| 10028 |
+
"step": 1430
|
| 10029 |
+
},
|
| 10030 |
+
{
|
| 10031 |
+
"epoch": 0.0940874796587603,
|
| 10032 |
+
"grad_norm": 0.1884765625,
|
| 10033 |
+
"learning_rate": 7.38899430740038e-05,
|
| 10034 |
+
"loss": 1.7920876741409302,
|
| 10035 |
+
"step": 1431
|
| 10036 |
+
},
|
| 10037 |
+
{
|
| 10038 |
+
"epoch": 0.09415322912043658,
|
| 10039 |
+
"grad_norm": 0.2158203125,
|
| 10040 |
+
"learning_rate": 7.387096774193549e-05,
|
| 10041 |
+
"loss": 1.7303719520568848,
|
| 10042 |
+
"step": 1432
|
| 10043 |
+
},
|
| 10044 |
+
{
|
| 10045 |
+
"epoch": 0.09421897858211285,
|
| 10046 |
+
"grad_norm": 0.1767578125,
|
| 10047 |
+
"learning_rate": 7.385199240986718e-05,
|
| 10048 |
+
"loss": 1.8264763355255127,
|
| 10049 |
+
"step": 1433
|
| 10050 |
+
},
|
| 10051 |
+
{
|
| 10052 |
+
"epoch": 0.09428472804378914,
|
| 10053 |
+
"grad_norm": 0.189453125,
|
| 10054 |
+
"learning_rate": 7.383301707779886e-05,
|
| 10055 |
+
"loss": 1.7400777339935303,
|
| 10056 |
+
"step": 1434
|
| 10057 |
+
},
|
| 10058 |
+
{
|
| 10059 |
+
"epoch": 0.09435047750546542,
|
| 10060 |
+
"grad_norm": 0.2001953125,
|
| 10061 |
+
"learning_rate": 7.381404174573055e-05,
|
| 10062 |
+
"loss": 1.7014490365982056,
|
| 10063 |
+
"step": 1435
|
| 10064 |
+
},
|
| 10065 |
+
{
|
| 10066 |
+
"epoch": 0.0944162269671417,
|
| 10067 |
+
"grad_norm": 0.185546875,
|
| 10068 |
+
"learning_rate": 7.379506641366224e-05,
|
| 10069 |
+
"loss": 1.684818148612976,
|
| 10070 |
+
"step": 1436
|
| 10071 |
+
},
|
| 10072 |
+
{
|
| 10073 |
+
"epoch": 0.09448197642881799,
|
| 10074 |
+
"grad_norm": 0.173828125,
|
| 10075 |
+
"learning_rate": 7.377609108159393e-05,
|
| 10076 |
+
"loss": 1.824100136756897,
|
| 10077 |
+
"step": 1437
|
| 10078 |
+
},
|
| 10079 |
+
{
|
| 10080 |
+
"epoch": 0.09454772589049427,
|
| 10081 |
+
"grad_norm": 0.1630859375,
|
| 10082 |
+
"learning_rate": 7.375711574952562e-05,
|
| 10083 |
+
"loss": 1.764764428138733,
|
| 10084 |
+
"step": 1438
|
| 10085 |
+
},
|
| 10086 |
+
{
|
| 10087 |
+
"epoch": 0.09461347535217056,
|
| 10088 |
+
"grad_norm": 0.1884765625,
|
| 10089 |
+
"learning_rate": 7.373814041745732e-05,
|
| 10090 |
+
"loss": 1.7814570665359497,
|
| 10091 |
+
"step": 1439
|
| 10092 |
+
},
|
| 10093 |
+
{
|
| 10094 |
+
"epoch": 0.09467922481384684,
|
| 10095 |
+
"grad_norm": 0.1728515625,
|
| 10096 |
+
"learning_rate": 7.371916508538899e-05,
|
| 10097 |
+
"loss": 1.6833386421203613,
|
| 10098 |
+
"step": 1440
|
| 10099 |
+
},
|
| 10100 |
+
{
|
| 10101 |
+
"epoch": 0.09474497427552311,
|
| 10102 |
+
"grad_norm": 0.173828125,
|
| 10103 |
+
"learning_rate": 7.37001897533207e-05,
|
| 10104 |
+
"loss": 1.7733638286590576,
|
| 10105 |
+
"step": 1441
|
| 10106 |
+
},
|
| 10107 |
+
{
|
| 10108 |
+
"epoch": 0.0948107237371994,
|
| 10109 |
+
"grad_norm": 0.1767578125,
|
| 10110 |
+
"learning_rate": 7.368121442125238e-05,
|
| 10111 |
+
"loss": 1.7373719215393066,
|
| 10112 |
+
"step": 1442
|
| 10113 |
+
},
|
| 10114 |
+
{
|
| 10115 |
+
"epoch": 0.09487647319887568,
|
| 10116 |
+
"grad_norm": 0.2021484375,
|
| 10117 |
+
"learning_rate": 7.366223908918407e-05,
|
| 10118 |
+
"loss": 1.7804597616195679,
|
| 10119 |
+
"step": 1443
|
| 10120 |
+
},
|
| 10121 |
+
{
|
| 10122 |
+
"epoch": 0.09494222266055197,
|
| 10123 |
+
"grad_norm": 0.1787109375,
|
| 10124 |
+
"learning_rate": 7.364326375711576e-05,
|
| 10125 |
+
"loss": 1.7043981552124023,
|
| 10126 |
+
"step": 1444
|
| 10127 |
+
},
|
| 10128 |
+
{
|
| 10129 |
+
"epoch": 0.09500797212222825,
|
| 10130 |
+
"grad_norm": 0.18359375,
|
| 10131 |
+
"learning_rate": 7.362428842504743e-05,
|
| 10132 |
+
"loss": 1.84101140499115,
|
| 10133 |
+
"step": 1445
|
| 10134 |
+
},
|
| 10135 |
+
{
|
| 10136 |
+
"epoch": 0.09507372158390454,
|
| 10137 |
+
"grad_norm": 0.1796875,
|
| 10138 |
+
"learning_rate": 7.360531309297914e-05,
|
| 10139 |
+
"loss": 1.8040908575057983,
|
| 10140 |
+
"step": 1446
|
| 10141 |
+
},
|
| 10142 |
+
{
|
| 10143 |
+
"epoch": 0.09513947104558082,
|
| 10144 |
+
"grad_norm": 0.1640625,
|
| 10145 |
+
"learning_rate": 7.358633776091082e-05,
|
| 10146 |
+
"loss": 1.7962355613708496,
|
| 10147 |
+
"step": 1447
|
| 10148 |
+
},
|
| 10149 |
+
{
|
| 10150 |
+
"epoch": 0.09520522050725709,
|
| 10151 |
+
"grad_norm": 0.181640625,
|
| 10152 |
+
"learning_rate": 7.356736242884251e-05,
|
| 10153 |
+
"loss": 1.7433644533157349,
|
| 10154 |
+
"step": 1448
|
| 10155 |
+
},
|
| 10156 |
+
{
|
| 10157 |
+
"epoch": 0.09527096996893337,
|
| 10158 |
+
"grad_norm": 0.2001953125,
|
| 10159 |
+
"learning_rate": 7.35483870967742e-05,
|
| 10160 |
+
"loss": 1.8856478929519653,
|
| 10161 |
+
"step": 1449
|
| 10162 |
+
},
|
| 10163 |
+
{
|
| 10164 |
+
"epoch": 0.09533671943060966,
|
| 10165 |
+
"grad_norm": 0.19140625,
|
| 10166 |
+
"learning_rate": 7.352941176470589e-05,
|
| 10167 |
+
"loss": 1.8307948112487793,
|
| 10168 |
+
"step": 1450
|
| 10169 |
+
},
|
| 10170 |
+
{
|
| 10171 |
+
"epoch": 0.09540246889228594,
|
| 10172 |
+
"grad_norm": 0.171875,
|
| 10173 |
+
"learning_rate": 7.351043643263758e-05,
|
| 10174 |
+
"loss": 1.6885976791381836,
|
| 10175 |
+
"step": 1451
|
| 10176 |
+
},
|
| 10177 |
+
{
|
| 10178 |
+
"epoch": 0.09546821835396223,
|
| 10179 |
+
"grad_norm": 0.208984375,
|
| 10180 |
+
"learning_rate": 7.349146110056926e-05,
|
| 10181 |
+
"loss": 1.759564995765686,
|
| 10182 |
+
"step": 1452
|
| 10183 |
+
},
|
| 10184 |
+
{
|
| 10185 |
+
"epoch": 0.09553396781563851,
|
| 10186 |
+
"grad_norm": 0.2041015625,
|
| 10187 |
+
"learning_rate": 7.347248576850095e-05,
|
| 10188 |
+
"loss": 1.8143390417099,
|
| 10189 |
+
"step": 1453
|
| 10190 |
+
},
|
| 10191 |
+
{
|
| 10192 |
+
"epoch": 0.0955997172773148,
|
| 10193 |
+
"grad_norm": 0.1689453125,
|
| 10194 |
+
"learning_rate": 7.345351043643264e-05,
|
| 10195 |
+
"loss": 1.7538601160049438,
|
| 10196 |
+
"step": 1454
|
| 10197 |
+
},
|
| 10198 |
+
{
|
| 10199 |
+
"epoch": 0.09566546673899108,
|
| 10200 |
+
"grad_norm": 0.203125,
|
| 10201 |
+
"learning_rate": 7.343453510436433e-05,
|
| 10202 |
+
"loss": 1.6588038206100464,
|
| 10203 |
+
"step": 1455
|
| 10204 |
+
},
|
| 10205 |
+
{
|
| 10206 |
+
"epoch": 0.09573121620066735,
|
| 10207 |
+
"grad_norm": 0.1796875,
|
| 10208 |
+
"learning_rate": 7.341555977229602e-05,
|
| 10209 |
+
"loss": 1.8485676050186157,
|
| 10210 |
+
"step": 1456
|
| 10211 |
+
},
|
| 10212 |
+
{
|
| 10213 |
+
"epoch": 0.09579696566234364,
|
| 10214 |
+
"grad_norm": 0.251953125,
|
| 10215 |
+
"learning_rate": 7.339658444022771e-05,
|
| 10216 |
+
"loss": 1.6846543550491333,
|
| 10217 |
+
"step": 1457
|
| 10218 |
+
},
|
| 10219 |
+
{
|
| 10220 |
+
"epoch": 0.09586271512401992,
|
| 10221 |
+
"grad_norm": 0.1982421875,
|
| 10222 |
+
"learning_rate": 7.337760910815939e-05,
|
| 10223 |
+
"loss": 1.7926013469696045,
|
| 10224 |
+
"step": 1458
|
| 10225 |
+
},
|
| 10226 |
+
{
|
| 10227 |
+
"epoch": 0.0959284645856962,
|
| 10228 |
+
"grad_norm": 0.19921875,
|
| 10229 |
+
"learning_rate": 7.33586337760911e-05,
|
| 10230 |
+
"loss": 1.8280398845672607,
|
| 10231 |
+
"step": 1459
|
| 10232 |
+
},
|
| 10233 |
+
{
|
| 10234 |
+
"epoch": 0.09599421404737249,
|
| 10235 |
+
"grad_norm": 0.1943359375,
|
| 10236 |
+
"learning_rate": 7.333965844402277e-05,
|
| 10237 |
+
"loss": 1.7489094734191895,
|
| 10238 |
+
"step": 1460
|
| 10239 |
+
},
|
| 10240 |
+
{
|
| 10241 |
+
"epoch": 0.09605996350904877,
|
| 10242 |
+
"grad_norm": 0.1806640625,
|
| 10243 |
+
"learning_rate": 7.332068311195446e-05,
|
| 10244 |
+
"loss": 1.79719877243042,
|
| 10245 |
+
"step": 1461
|
| 10246 |
+
},
|
| 10247 |
+
{
|
| 10248 |
+
"epoch": 0.09612571297072506,
|
| 10249 |
+
"grad_norm": 0.26953125,
|
| 10250 |
+
"learning_rate": 7.330170777988615e-05,
|
| 10251 |
+
"loss": 1.8896360397338867,
|
| 10252 |
+
"step": 1462
|
| 10253 |
+
},
|
| 10254 |
+
{
|
| 10255 |
+
"epoch": 0.09619146243240133,
|
| 10256 |
+
"grad_norm": 0.173828125,
|
| 10257 |
+
"learning_rate": 7.328273244781783e-05,
|
| 10258 |
+
"loss": 1.8005951642990112,
|
| 10259 |
+
"step": 1463
|
| 10260 |
+
},
|
| 10261 |
+
{
|
| 10262 |
+
"epoch": 0.09625721189407761,
|
| 10263 |
+
"grad_norm": 0.181640625,
|
| 10264 |
+
"learning_rate": 7.326375711574954e-05,
|
| 10265 |
+
"loss": 1.7644914388656616,
|
| 10266 |
+
"step": 1464
|
| 10267 |
+
},
|
| 10268 |
+
{
|
| 10269 |
+
"epoch": 0.0963229613557539,
|
| 10270 |
+
"grad_norm": 0.189453125,
|
| 10271 |
+
"learning_rate": 7.324478178368121e-05,
|
| 10272 |
+
"loss": 1.758145809173584,
|
| 10273 |
+
"step": 1465
|
| 10274 |
+
},
|
| 10275 |
+
{
|
| 10276 |
+
"epoch": 0.09638871081743018,
|
| 10277 |
+
"grad_norm": 0.189453125,
|
| 10278 |
+
"learning_rate": 7.32258064516129e-05,
|
| 10279 |
+
"loss": 1.8068387508392334,
|
| 10280 |
+
"step": 1466
|
| 10281 |
+
},
|
| 10282 |
+
{
|
| 10283 |
+
"epoch": 0.09645446027910647,
|
| 10284 |
+
"grad_norm": 0.1904296875,
|
| 10285 |
+
"learning_rate": 7.32068311195446e-05,
|
| 10286 |
+
"loss": 1.7152271270751953,
|
| 10287 |
+
"step": 1467
|
| 10288 |
+
},
|
| 10289 |
+
{
|
| 10290 |
+
"epoch": 0.09652020974078275,
|
| 10291 |
+
"grad_norm": 0.1943359375,
|
| 10292 |
+
"learning_rate": 7.318785578747629e-05,
|
| 10293 |
+
"loss": 1.6813393831253052,
|
| 10294 |
+
"step": 1468
|
| 10295 |
+
},
|
| 10296 |
+
{
|
| 10297 |
+
"epoch": 0.09658595920245903,
|
| 10298 |
+
"grad_norm": 0.2119140625,
|
| 10299 |
+
"learning_rate": 7.316888045540798e-05,
|
| 10300 |
+
"loss": 1.824900507926941,
|
| 10301 |
+
"step": 1469
|
| 10302 |
+
},
|
| 10303 |
+
{
|
| 10304 |
+
"epoch": 0.09665170866413532,
|
| 10305 |
+
"grad_norm": 0.1748046875,
|
| 10306 |
+
"learning_rate": 7.314990512333966e-05,
|
| 10307 |
+
"loss": 1.7878731489181519,
|
| 10308 |
+
"step": 1470
|
| 10309 |
+
},
|
| 10310 |
+
{
|
| 10311 |
+
"epoch": 0.09671745812581159,
|
| 10312 |
+
"grad_norm": 0.1796875,
|
| 10313 |
+
"learning_rate": 7.313092979127135e-05,
|
| 10314 |
+
"loss": 1.7270451784133911,
|
| 10315 |
+
"step": 1471
|
| 10316 |
+
},
|
| 10317 |
+
{
|
| 10318 |
+
"epoch": 0.09678320758748787,
|
| 10319 |
+
"grad_norm": 0.177734375,
|
| 10320 |
+
"learning_rate": 7.311195445920304e-05,
|
| 10321 |
+
"loss": 1.7670789957046509,
|
| 10322 |
+
"step": 1472
|
| 10323 |
+
},
|
| 10324 |
+
{
|
| 10325 |
+
"epoch": 0.09684895704916416,
|
| 10326 |
+
"grad_norm": 0.259765625,
|
| 10327 |
+
"learning_rate": 7.309297912713473e-05,
|
| 10328 |
+
"loss": 1.789025902748108,
|
| 10329 |
+
"step": 1473
|
| 10330 |
+
},
|
| 10331 |
+
{
|
| 10332 |
+
"epoch": 0.09691470651084044,
|
| 10333 |
+
"grad_norm": 0.1689453125,
|
| 10334 |
+
"learning_rate": 7.307400379506642e-05,
|
| 10335 |
+
"loss": 1.8536561727523804,
|
| 10336 |
+
"step": 1474
|
| 10337 |
+
},
|
| 10338 |
+
{
|
| 10339 |
+
"epoch": 0.09698045597251673,
|
| 10340 |
+
"grad_norm": 0.185546875,
|
| 10341 |
+
"learning_rate": 7.305502846299811e-05,
|
| 10342 |
+
"loss": 1.7734472751617432,
|
| 10343 |
+
"step": 1475
|
| 10344 |
+
},
|
| 10345 |
+
{
|
| 10346 |
+
"epoch": 0.09704620543419301,
|
| 10347 |
+
"grad_norm": 0.1943359375,
|
| 10348 |
+
"learning_rate": 7.303605313092979e-05,
|
| 10349 |
+
"loss": 1.747511863708496,
|
| 10350 |
+
"step": 1476
|
| 10351 |
+
},
|
| 10352 |
+
{
|
| 10353 |
+
"epoch": 0.0971119548958693,
|
| 10354 |
+
"grad_norm": 0.177734375,
|
| 10355 |
+
"learning_rate": 7.301707779886149e-05,
|
| 10356 |
+
"loss": 1.7072086334228516,
|
| 10357 |
+
"step": 1477
|
| 10358 |
+
},
|
| 10359 |
+
{
|
| 10360 |
+
"epoch": 0.09717770435754557,
|
| 10361 |
+
"grad_norm": 0.173828125,
|
| 10362 |
+
"learning_rate": 7.299810246679317e-05,
|
| 10363 |
+
"loss": 1.728750228881836,
|
| 10364 |
+
"step": 1478
|
| 10365 |
+
},
|
| 10366 |
+
{
|
| 10367 |
+
"epoch": 0.09724345381922185,
|
| 10368 |
+
"grad_norm": 0.19921875,
|
| 10369 |
+
"learning_rate": 7.297912713472486e-05,
|
| 10370 |
+
"loss": 1.8214446306228638,
|
| 10371 |
+
"step": 1479
|
| 10372 |
+
},
|
| 10373 |
+
{
|
| 10374 |
+
"epoch": 0.09730920328089813,
|
| 10375 |
+
"grad_norm": 0.1875,
|
| 10376 |
+
"learning_rate": 7.296015180265655e-05,
|
| 10377 |
+
"loss": 1.7706884145736694,
|
| 10378 |
+
"step": 1480
|
| 10379 |
+
},
|
| 10380 |
+
{
|
| 10381 |
+
"epoch": 0.09737495274257442,
|
| 10382 |
+
"grad_norm": 0.1708984375,
|
| 10383 |
+
"learning_rate": 7.294117647058823e-05,
|
| 10384 |
+
"loss": 1.7763272523880005,
|
| 10385 |
+
"step": 1481
|
| 10386 |
+
},
|
| 10387 |
+
{
|
| 10388 |
+
"epoch": 0.0974407022042507,
|
| 10389 |
+
"grad_norm": 0.169921875,
|
| 10390 |
+
"learning_rate": 7.292220113851993e-05,
|
| 10391 |
+
"loss": 1.7453808784484863,
|
| 10392 |
+
"step": 1482
|
| 10393 |
+
},
|
| 10394 |
+
{
|
| 10395 |
+
"epoch": 0.09750645166592699,
|
| 10396 |
+
"grad_norm": 0.1884765625,
|
| 10397 |
+
"learning_rate": 7.290322580645161e-05,
|
| 10398 |
+
"loss": 1.8003605604171753,
|
| 10399 |
+
"step": 1483
|
| 10400 |
+
},
|
| 10401 |
+
{
|
| 10402 |
+
"epoch": 0.09757220112760327,
|
| 10403 |
+
"grad_norm": 0.1669921875,
|
| 10404 |
+
"learning_rate": 7.288425047438332e-05,
|
| 10405 |
+
"loss": 1.7307446002960205,
|
| 10406 |
+
"step": 1484
|
| 10407 |
+
},
|
| 10408 |
+
{
|
| 10409 |
+
"epoch": 0.09763795058927956,
|
| 10410 |
+
"grad_norm": 0.1708984375,
|
| 10411 |
+
"learning_rate": 7.286527514231499e-05,
|
| 10412 |
+
"loss": 1.7151074409484863,
|
| 10413 |
+
"step": 1485
|
| 10414 |
+
},
|
| 10415 |
+
{
|
| 10416 |
+
"epoch": 0.09770370005095583,
|
| 10417 |
+
"grad_norm": 0.16796875,
|
| 10418 |
+
"learning_rate": 7.284629981024668e-05,
|
| 10419 |
+
"loss": 1.824714183807373,
|
| 10420 |
+
"step": 1486
|
| 10421 |
+
},
|
| 10422 |
+
{
|
| 10423 |
+
"epoch": 0.09776944951263211,
|
| 10424 |
+
"grad_norm": 0.205078125,
|
| 10425 |
+
"learning_rate": 7.282732447817837e-05,
|
| 10426 |
+
"loss": 1.7898285388946533,
|
| 10427 |
+
"step": 1487
|
| 10428 |
+
},
|
| 10429 |
+
{
|
| 10430 |
+
"epoch": 0.0978351989743084,
|
| 10431 |
+
"grad_norm": 0.1767578125,
|
| 10432 |
+
"learning_rate": 7.280834914611005e-05,
|
| 10433 |
+
"loss": 1.8133134841918945,
|
| 10434 |
+
"step": 1488
|
| 10435 |
+
},
|
| 10436 |
+
{
|
| 10437 |
+
"epoch": 0.09790094843598468,
|
| 10438 |
+
"grad_norm": 0.1796875,
|
| 10439 |
+
"learning_rate": 7.278937381404176e-05,
|
| 10440 |
+
"loss": 1.7619338035583496,
|
| 10441 |
+
"step": 1489
|
| 10442 |
+
},
|
| 10443 |
+
{
|
| 10444 |
+
"epoch": 0.09796669789766096,
|
| 10445 |
+
"grad_norm": 0.16796875,
|
| 10446 |
+
"learning_rate": 7.277039848197343e-05,
|
| 10447 |
+
"loss": 1.7614924907684326,
|
| 10448 |
+
"step": 1490
|
| 10449 |
+
},
|
| 10450 |
+
{
|
| 10451 |
+
"epoch": 0.09803244735933725,
|
| 10452 |
+
"grad_norm": 0.16796875,
|
| 10453 |
+
"learning_rate": 7.275142314990513e-05,
|
| 10454 |
+
"loss": 1.7422925233840942,
|
| 10455 |
+
"step": 1491
|
| 10456 |
+
},
|
| 10457 |
+
{
|
| 10458 |
+
"epoch": 0.09809819682101353,
|
| 10459 |
+
"grad_norm": 0.1630859375,
|
| 10460 |
+
"learning_rate": 7.273244781783682e-05,
|
| 10461 |
+
"loss": 1.6483393907546997,
|
| 10462 |
+
"step": 1492
|
| 10463 |
+
},
|
| 10464 |
+
{
|
| 10465 |
+
"epoch": 0.0981639462826898,
|
| 10466 |
+
"grad_norm": 0.1865234375,
|
| 10467 |
+
"learning_rate": 7.271347248576851e-05,
|
| 10468 |
+
"loss": 1.8690717220306396,
|
| 10469 |
+
"step": 1493
|
| 10470 |
+
},
|
| 10471 |
+
{
|
| 10472 |
+
"epoch": 0.09822969574436609,
|
| 10473 |
+
"grad_norm": 0.1728515625,
|
| 10474 |
+
"learning_rate": 7.26944971537002e-05,
|
| 10475 |
+
"loss": 1.7135531902313232,
|
| 10476 |
+
"step": 1494
|
| 10477 |
+
},
|
| 10478 |
+
{
|
| 10479 |
+
"epoch": 0.09829544520604237,
|
| 10480 |
+
"grad_norm": 0.185546875,
|
| 10481 |
+
"learning_rate": 7.267552182163189e-05,
|
| 10482 |
+
"loss": 1.7983638048171997,
|
| 10483 |
+
"step": 1495
|
| 10484 |
+
},
|
| 10485 |
+
{
|
| 10486 |
+
"epoch": 0.09836119466771866,
|
| 10487 |
+
"grad_norm": 0.1630859375,
|
| 10488 |
+
"learning_rate": 7.265654648956357e-05,
|
| 10489 |
+
"loss": 1.8132215738296509,
|
| 10490 |
+
"step": 1496
|
| 10491 |
+
},
|
| 10492 |
+
{
|
| 10493 |
+
"epoch": 0.09842694412939494,
|
| 10494 |
+
"grad_norm": 0.2001953125,
|
| 10495 |
+
"learning_rate": 7.263757115749526e-05,
|
| 10496 |
+
"loss": 1.7059814929962158,
|
| 10497 |
+
"step": 1497
|
| 10498 |
+
},
|
| 10499 |
+
{
|
| 10500 |
+
"epoch": 0.09849269359107123,
|
| 10501 |
+
"grad_norm": 0.18359375,
|
| 10502 |
+
"learning_rate": 7.261859582542695e-05,
|
| 10503 |
+
"loss": 1.7478179931640625,
|
| 10504 |
+
"step": 1498
|
| 10505 |
+
},
|
| 10506 |
+
{
|
| 10507 |
+
"epoch": 0.09855844305274751,
|
| 10508 |
+
"grad_norm": 0.1806640625,
|
| 10509 |
+
"learning_rate": 7.259962049335864e-05,
|
| 10510 |
+
"loss": 1.7152681350708008,
|
| 10511 |
+
"step": 1499
|
| 10512 |
+
},
|
| 10513 |
+
{
|
| 10514 |
+
"epoch": 0.0986241925144238,
|
| 10515 |
+
"grad_norm": 0.189453125,
|
| 10516 |
+
"learning_rate": 7.258064516129033e-05,
|
| 10517 |
+
"loss": 1.7134069204330444,
|
| 10518 |
+
"step": 1500
|
| 10519 |
+
},
|
| 10520 |
+
{
|
| 10521 |
+
"epoch": 0.09868994197610007,
|
| 10522 |
+
"grad_norm": 0.1904296875,
|
| 10523 |
+
"learning_rate": 7.256166982922201e-05,
|
| 10524 |
+
"loss": 1.7912667989730835,
|
| 10525 |
+
"step": 1501
|
| 10526 |
+
},
|
| 10527 |
+
{
|
| 10528 |
+
"epoch": 0.09875569143777635,
|
| 10529 |
+
"grad_norm": 0.2001953125,
|
| 10530 |
+
"learning_rate": 7.254269449715371e-05,
|
| 10531 |
+
"loss": 1.8104010820388794,
|
| 10532 |
+
"step": 1502
|
| 10533 |
+
},
|
| 10534 |
+
{
|
| 10535 |
+
"epoch": 0.09882144089945263,
|
| 10536 |
+
"grad_norm": 0.203125,
|
| 10537 |
+
"learning_rate": 7.252371916508539e-05,
|
| 10538 |
+
"loss": 1.7435134649276733,
|
| 10539 |
+
"step": 1503
|
| 10540 |
+
},
|
| 10541 |
+
{
|
| 10542 |
+
"epoch": 0.09888719036112892,
|
| 10543 |
+
"grad_norm": 0.1796875,
|
| 10544 |
+
"learning_rate": 7.250474383301708e-05,
|
| 10545 |
+
"loss": 1.7764124870300293,
|
| 10546 |
+
"step": 1504
|
| 10547 |
+
},
|
| 10548 |
+
{
|
| 10549 |
+
"epoch": 0.0989529398228052,
|
| 10550 |
+
"grad_norm": 0.1884765625,
|
| 10551 |
+
"learning_rate": 7.248576850094877e-05,
|
| 10552 |
+
"loss": 1.6979496479034424,
|
| 10553 |
+
"step": 1505
|
| 10554 |
+
},
|
| 10555 |
+
{
|
| 10556 |
+
"epoch": 0.09901868928448149,
|
| 10557 |
+
"grad_norm": 0.1728515625,
|
| 10558 |
+
"learning_rate": 7.246679316888045e-05,
|
| 10559 |
+
"loss": 1.7199857234954834,
|
| 10560 |
+
"step": 1506
|
| 10561 |
+
},
|
| 10562 |
+
{
|
| 10563 |
+
"epoch": 0.09908443874615777,
|
| 10564 |
+
"grad_norm": 0.16796875,
|
| 10565 |
+
"learning_rate": 7.244781783681215e-05,
|
| 10566 |
+
"loss": 1.75422203540802,
|
| 10567 |
+
"step": 1507
|
| 10568 |
+
},
|
| 10569 |
+
{
|
| 10570 |
+
"epoch": 0.09915018820783404,
|
| 10571 |
+
"grad_norm": 0.1796875,
|
| 10572 |
+
"learning_rate": 7.242884250474383e-05,
|
| 10573 |
+
"loss": 1.7922829389572144,
|
| 10574 |
+
"step": 1508
|
| 10575 |
+
},
|
| 10576 |
+
{
|
| 10577 |
+
"epoch": 0.09921593766951033,
|
| 10578 |
+
"grad_norm": 0.416015625,
|
| 10579 |
+
"learning_rate": 7.240986717267552e-05,
|
| 10580 |
+
"loss": 1.834727168083191,
|
| 10581 |
+
"step": 1509
|
| 10582 |
+
},
|
| 10583 |
+
{
|
| 10584 |
+
"epoch": 0.09928168713118661,
|
| 10585 |
+
"grad_norm": 0.1650390625,
|
| 10586 |
+
"learning_rate": 7.239089184060721e-05,
|
| 10587 |
+
"loss": 1.774029016494751,
|
| 10588 |
+
"step": 1510
|
| 10589 |
+
},
|
| 10590 |
+
{
|
| 10591 |
+
"epoch": 0.0993474365928629,
|
| 10592 |
+
"grad_norm": 0.16796875,
|
| 10593 |
+
"learning_rate": 7.23719165085389e-05,
|
| 10594 |
+
"loss": 1.645897626876831,
|
| 10595 |
+
"step": 1511
|
| 10596 |
+
},
|
| 10597 |
+
{
|
| 10598 |
+
"epoch": 0.09941318605453918,
|
| 10599 |
+
"grad_norm": 0.193359375,
|
| 10600 |
+
"learning_rate": 7.23529411764706e-05,
|
| 10601 |
+
"loss": 1.7838002443313599,
|
| 10602 |
+
"step": 1512
|
| 10603 |
+
},
|
| 10604 |
+
{
|
| 10605 |
+
"epoch": 0.09947893551621546,
|
| 10606 |
+
"grad_norm": 0.1787109375,
|
| 10607 |
+
"learning_rate": 7.233396584440229e-05,
|
| 10608 |
+
"loss": 1.7444725036621094,
|
| 10609 |
+
"step": 1513
|
| 10610 |
+
},
|
| 10611 |
+
{
|
| 10612 |
+
"epoch": 0.09954468497789175,
|
| 10613 |
+
"grad_norm": 0.1708984375,
|
| 10614 |
+
"learning_rate": 7.231499051233396e-05,
|
| 10615 |
+
"loss": 1.784379482269287,
|
| 10616 |
+
"step": 1514
|
| 10617 |
+
},
|
| 10618 |
+
{
|
| 10619 |
+
"epoch": 0.09961043443956803,
|
| 10620 |
+
"grad_norm": 0.19140625,
|
| 10621 |
+
"learning_rate": 7.229601518026565e-05,
|
| 10622 |
+
"loss": 1.6817635297775269,
|
| 10623 |
+
"step": 1515
|
| 10624 |
+
},
|
| 10625 |
+
{
|
| 10626 |
+
"epoch": 0.0996761839012443,
|
| 10627 |
+
"grad_norm": 0.1689453125,
|
| 10628 |
+
"learning_rate": 7.227703984819735e-05,
|
| 10629 |
+
"loss": 1.7608973979949951,
|
| 10630 |
+
"step": 1516
|
| 10631 |
+
},
|
| 10632 |
+
{
|
| 10633 |
+
"epoch": 0.09974193336292059,
|
| 10634 |
+
"grad_norm": 0.2080078125,
|
| 10635 |
+
"learning_rate": 7.225806451612904e-05,
|
| 10636 |
+
"loss": 1.864208459854126,
|
| 10637 |
+
"step": 1517
|
| 10638 |
+
},
|
| 10639 |
+
{
|
| 10640 |
+
"epoch": 0.09980768282459687,
|
| 10641 |
+
"grad_norm": 0.1865234375,
|
| 10642 |
+
"learning_rate": 7.223908918406073e-05,
|
| 10643 |
+
"loss": 1.8407920598983765,
|
| 10644 |
+
"step": 1518
|
| 10645 |
+
},
|
| 10646 |
+
{
|
| 10647 |
+
"epoch": 0.09987343228627316,
|
| 10648 |
+
"grad_norm": 0.181640625,
|
| 10649 |
+
"learning_rate": 7.22201138519924e-05,
|
| 10650 |
+
"loss": 1.8261181116104126,
|
| 10651 |
+
"step": 1519
|
| 10652 |
+
},
|
| 10653 |
+
{
|
| 10654 |
+
"epoch": 0.09993918174794944,
|
| 10655 |
+
"grad_norm": 0.181640625,
|
| 10656 |
+
"learning_rate": 7.220113851992411e-05,
|
| 10657 |
+
"loss": 1.781592607498169,
|
| 10658 |
+
"step": 1520
|
| 10659 |
+
},
|
| 10660 |
+
{
|
| 10661 |
+
"epoch": 0.10000493120962572,
|
| 10662 |
+
"grad_norm": 0.171875,
|
| 10663 |
+
"learning_rate": 7.218216318785579e-05,
|
| 10664 |
+
"loss": 1.7122652530670166,
|
| 10665 |
+
"step": 1521
|
| 10666 |
+
},
|
| 10667 |
+
{
|
| 10668 |
+
"epoch": 0.10007068067130201,
|
| 10669 |
+
"grad_norm": 0.17578125,
|
| 10670 |
+
"learning_rate": 7.216318785578748e-05,
|
| 10671 |
+
"loss": 1.7317644357681274,
|
| 10672 |
+
"step": 1522
|
| 10673 |
+
},
|
| 10674 |
+
{
|
| 10675 |
+
"epoch": 0.10013643013297828,
|
| 10676 |
+
"grad_norm": 0.166015625,
|
| 10677 |
+
"learning_rate": 7.214421252371917e-05,
+      "loss": 1.8022645711898804,
+      "step": 1523
+    },
+    {
+      "epoch": 0.10020217959465456,
+      "grad_norm": 0.17578125,
+      "learning_rate": 7.212523719165085e-05,
+      "loss": 1.695499062538147,
+      "step": 1524
+    },
+    {
+      "epoch": 0.10026792905633085,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.210626185958255e-05,
+      "loss": 1.6665771007537842,
+      "step": 1525
+    },
+    {
+      "epoch": 0.10033367851800713,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.208728652751423e-05,
+      "loss": 1.772363305091858,
+      "step": 1526
+    },
+    {
+      "epoch": 0.10039942797968342,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 7.206831119544592e-05,
+      "loss": 1.739142894744873,
+      "step": 1527
+    },
+    {
+      "epoch": 0.1004651774413597,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.204933586337761e-05,
+      "loss": 1.8881553411483765,
+      "step": 1528
+    },
+    {
+      "epoch": 0.10053092690303599,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 7.20303605313093e-05,
+      "loss": 1.8863234519958496,
+      "step": 1529
+    },
+    {
+      "epoch": 0.10059667636471227,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 7.201138519924099e-05,
+      "loss": 1.7713812589645386,
+      "step": 1530
+    },
+    {
+      "epoch": 0.10066242582638854,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 7.199240986717268e-05,
+      "loss": 1.68230402469635,
+      "step": 1531
+    },
+    {
+      "epoch": 0.10072817528806483,
+      "grad_norm": 0.189453125,
+      "learning_rate": 7.197343453510436e-05,
+      "loss": 1.7307608127593994,
+      "step": 1532
+    },
+    {
+      "epoch": 0.10079392474974111,
+      "grad_norm": 0.1640625,
+      "learning_rate": 7.195445920303607e-05,
+      "loss": 1.6918061971664429,
+      "step": 1533
+    },
+    {
+      "epoch": 0.1008596742114174,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 7.193548387096774e-05,
+      "loss": 1.6906559467315674,
+      "step": 1534
+    },
+    {
+      "epoch": 0.10092542367309368,
+      "grad_norm": 0.1962890625,
+      "learning_rate": 7.191650853889943e-05,
+      "loss": 1.7546312808990479,
+      "step": 1535
+    },
+    {
+      "epoch": 0.10099117313476996,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.189753320683113e-05,
+      "loss": 1.7461049556732178,
+      "step": 1536
+    },
+    {
+      "epoch": 0.10105692259644625,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 7.18785578747628e-05,
+      "loss": 1.8401005268096924,
+      "step": 1537
+    },
+    {
+      "epoch": 0.10112267205812252,
+      "grad_norm": 0.1689453125,
+      "learning_rate": 7.185958254269451e-05,
+      "loss": 1.7328840494155884,
+      "step": 1538
+    },
+    {
+      "epoch": 0.1011884215197988,
+      "grad_norm": 0.173828125,
+      "learning_rate": 7.184060721062618e-05,
+      "loss": 1.7221028804779053,
+      "step": 1539
+    },
+    {
+      "epoch": 0.10125417098147509,
+      "grad_norm": 0.1689453125,
+      "learning_rate": 7.182163187855789e-05,
+      "loss": 1.7687971591949463,
+      "step": 1540
+    },
+    {
+      "epoch": 0.10131992044315137,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.180265654648957e-05,
+      "loss": 1.8095515966415405,
+      "step": 1541
+    },
+    {
+      "epoch": 0.10138566990482765,
+      "grad_norm": 0.17578125,
+      "learning_rate": 7.178368121442126e-05,
+      "loss": 1.7873241901397705,
+      "step": 1542
+    },
+    {
+      "epoch": 0.10145141936650394,
+      "grad_norm": 0.181640625,
+      "learning_rate": 7.176470588235295e-05,
+      "loss": 1.7834312915802002,
+      "step": 1543
+    },
+    {
+      "epoch": 0.10151716882818022,
+      "grad_norm": 0.1591796875,
+      "learning_rate": 7.174573055028463e-05,
+      "loss": 1.7562631368637085,
+      "step": 1544
+    },
+    {
+      "epoch": 0.10158291828985651,
+      "grad_norm": 0.185546875,
+      "learning_rate": 7.172675521821633e-05,
+      "loss": 1.7083765268325806,
+      "step": 1545
+    },
+    {
+      "epoch": 0.10164866775153278,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 7.170777988614801e-05,
+      "loss": 1.828485369682312,
+      "step": 1546
+    },
+    {
+      "epoch": 0.10171441721320906,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 7.16888045540797e-05,
+      "loss": 1.7196537256240845,
+      "step": 1547
+    },
+    {
+      "epoch": 0.10178016667488535,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.166982922201139e-05,
+      "loss": 1.8074257373809814,
+      "step": 1548
+    },
+    {
+      "epoch": 0.10184591613656163,
+      "grad_norm": 0.234375,
+      "learning_rate": 7.165085388994308e-05,
+      "loss": 1.6863274574279785,
+      "step": 1549
+    },
+    {
+      "epoch": 0.10191166559823792,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 7.163187855787477e-05,
+      "loss": 1.8261758089065552,
+      "step": 1550
+    },
+    {
+      "epoch": 0.1019774150599142,
+      "grad_norm": 0.177734375,
+      "learning_rate": 7.161290322580646e-05,
+      "loss": 1.7858636379241943,
+      "step": 1551
+    },
+    {
+      "epoch": 0.10204316452159048,
+      "grad_norm": 0.17578125,
+      "learning_rate": 7.159392789373814e-05,
+      "loss": 1.8049784898757935,
+      "step": 1552
+    },
+    {
+      "epoch": 0.10210891398326676,
+      "grad_norm": 0.1845703125,
+      "learning_rate": 7.157495256166983e-05,
+      "loss": 1.7329432964324951,
+      "step": 1553
+    },
+    {
+      "epoch": 0.10217466344494304,
+      "grad_norm": 0.1748046875,
+      "learning_rate": 7.155597722960152e-05,
+      "loss": 1.765457034111023,
+      "step": 1554
+    },
+    {
+      "epoch": 0.10224041290661932,
+      "grad_norm": 0.16796875,
+      "learning_rate": 7.153700189753321e-05,
+      "loss": 1.7625668048858643,
+      "step": 1555
+    },
+    {
+      "epoch": 0.10230616236829561,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.15180265654649e-05,
+      "loss": 1.9404858350753784,
+      "step": 1556
+    },
+    {
+      "epoch": 0.10237191182997189,
+      "grad_norm": 0.173828125,
+      "learning_rate": 7.149905123339658e-05,
+      "loss": 1.7697091102600098,
+      "step": 1557
+    },
+    {
+      "epoch": 0.10243766129164818,
+      "grad_norm": 0.169921875,
+      "learning_rate": 7.148007590132829e-05,
+      "loss": 1.7413151264190674,
+      "step": 1558
+    },
+    {
+      "epoch": 0.10250341075332446,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.146110056925996e-05,
+      "loss": 1.7571144104003906,
+      "step": 1559
+    },
+    {
+      "epoch": 0.10256916021500075,
+      "grad_norm": 0.1689453125,
+      "learning_rate": 7.144212523719165e-05,
+      "loss": 1.7602663040161133,
+      "step": 1560
+    },
+    {
+      "epoch": 0.10263490967667702,
+      "grad_norm": 0.1962890625,
+      "learning_rate": 7.142314990512335e-05,
+      "loss": 1.7653920650482178,
+      "step": 1561
+    },
+    {
+      "epoch": 0.1027006591383533,
+      "grad_norm": 0.189453125,
+      "learning_rate": 7.140417457305502e-05,
+      "loss": 1.8886628150939941,
+      "step": 1562
+    },
+    {
+      "epoch": 0.10276640860002959,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 7.138519924098673e-05,
+      "loss": 1.7143476009368896,
+      "step": 1563
+    },
+    {
+      "epoch": 0.10283215806170587,
+      "grad_norm": 0.185546875,
+      "learning_rate": 7.13662239089184e-05,
+      "loss": 1.833126425743103,
+      "step": 1564
+    },
+    {
+      "epoch": 0.10289790752338215,
+      "grad_norm": 0.19140625,
+      "learning_rate": 7.13472485768501e-05,
+      "loss": 1.7594883441925049,
+      "step": 1565
+    },
+    {
+      "epoch": 0.10296365698505844,
+      "grad_norm": 0.1728515625,
+      "learning_rate": 7.132827324478179e-05,
+      "loss": 1.7321943044662476,
+      "step": 1566
+    },
+    {
+      "epoch": 0.10302940644673472,
+      "grad_norm": 0.1689453125,
+      "learning_rate": 7.130929791271348e-05,
+      "loss": 1.7456376552581787,
+      "step": 1567
+    },
+    {
+      "epoch": 0.103095155908411,
+      "grad_norm": 0.1943359375,
+      "learning_rate": 7.129032258064517e-05,
+      "loss": 1.7408852577209473,
+      "step": 1568
+    },
+    {
+      "epoch": 0.10316090537008728,
+      "grad_norm": 0.2021484375,
+      "learning_rate": 7.127134724857686e-05,
+      "loss": 1.7469125986099243,
+      "step": 1569
+    },
+    {
+      "epoch": 0.10322665483176356,
+      "grad_norm": 0.1767578125,
+      "learning_rate": 7.125237191650854e-05,
+      "loss": 1.7859790325164795,
+      "step": 1570
+    },
+    {
+      "epoch": 0.10329240429343985,
+      "grad_norm": 0.1796875,
+      "learning_rate": 7.123339658444023e-05,
+      "loss": 1.7745604515075684,
+      "step": 1571
+    },
+    {
+      "epoch": 0.10335815375511613,
+      "grad_norm": 0.2158203125,
+      "learning_rate": 7.121442125237192e-05,
+      "loss": 1.7328556776046753,
+      "step": 1572
+    },
+    {
+      "epoch": 0.10342390321679241,
+      "grad_norm": 0.1787109375,
+      "learning_rate": 7.119544592030361e-05,
+      "loss": 1.8278470039367676,
+      "step": 1573
+    },
+    {
+      "epoch": 0.1034896526784687,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.11764705882353e-05,
+      "loss": 1.8097009658813477,
+      "step": 1574
+    },
+    {
+      "epoch": 0.10355540214014498,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.115749525616698e-05,
+      "loss": 1.735648512840271,
+      "step": 1575
+    },
+    {
+      "epoch": 0.10362115160182125,
+      "grad_norm": 0.1962890625,
+      "learning_rate": 7.113851992409868e-05,
+      "loss": 1.8268425464630127,
+      "step": 1576
+    },
+    {
+      "epoch": 0.10368690106349754,
+      "grad_norm": 0.255859375,
+      "learning_rate": 7.111954459203036e-05,
+      "loss": 1.720413327217102,
+      "step": 1577
+    },
+    {
+      "epoch": 0.10375265052517382,
+      "grad_norm": 0.181640625,
+      "learning_rate": 7.110056925996205e-05,
+      "loss": 1.8232238292694092,
+      "step": 1578
+    },
+    {
+      "epoch": 0.10381839998685011,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.108159392789374e-05,
+      "loss": 1.7939257621765137,
+      "step": 1579
+    },
+    {
+      "epoch": 0.10388414944852639,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.106261859582542e-05,
+      "loss": 1.768985629081726,
+      "step": 1580
+    },
+    {
+      "epoch": 0.10394989891020268,
+      "grad_norm": 0.17578125,
+      "learning_rate": 7.104364326375712e-05,
+      "loss": 1.6919560432434082,
+      "step": 1581
+    },
+    {
+      "epoch": 0.10401564837187896,
+      "grad_norm": 0.1884765625,
+      "learning_rate": 7.10246679316888e-05,
+      "loss": 1.8019638061523438,
+      "step": 1582
+    },
+    {
+      "epoch": 0.10408139783355523,
+      "grad_norm": 0.185546875,
+      "learning_rate": 7.10056925996205e-05,
+      "loss": 1.8744618892669678,
+      "step": 1583
+    },
+    {
+      "epoch": 0.10414714729523152,
+      "grad_norm": 0.189453125,
+      "learning_rate": 7.098671726755218e-05,
+      "loss": 1.8021092414855957,
+      "step": 1584
+    },
+    {
+      "epoch": 0.1042128967569078,
+      "grad_norm": 0.19140625,
+      "learning_rate": 7.096774193548388e-05,
+      "loss": 1.8837298154830933,
+      "step": 1585
+    },
+    {
+      "epoch": 0.10427864621858408,
+      "grad_norm": 0.1826171875,
+      "learning_rate": 7.094876660341557e-05,
+      "loss": 1.8405685424804688,
+      "step": 1586
+    },
+    {
+      "epoch": 0.10434439568026037,
+      "grad_norm": 0.1669921875,
+      "learning_rate": 7.092979127134726e-05,
+      "loss": 1.7396854162216187,
+      "step": 1587
+    },
+    {
+      "epoch": 0.10441014514193665,
+      "grad_norm": 0.17578125,
+      "learning_rate": 7.091081593927893e-05,
+      "loss": 1.7072564363479614,
+      "step": 1588
+    },
+    {
+      "epoch": 0.10447589460361294,
+      "grad_norm": 0.18359375,
+      "learning_rate": 7.089184060721063e-05,
+      "loss": 1.7411478757858276,
+      "step": 1589
+    },
+    {
+      "epoch": 0.10454164406528922,
+      "grad_norm": 0.1904296875,
+      "learning_rate": 7.087286527514232e-05,
+      "loss": 1.7470080852508545,
+      "step": 1590
+    },
+    {
+      "epoch": 0.10460739352696549,
+      "grad_norm": 0.16796875,
+      "learning_rate": 7.085388994307401e-05,
+      "loss": 1.7566428184509277,
+      "step": 1591
+    },
+    {
+      "epoch": 0.10467314298864178,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 7.08349146110057e-05,
+      "loss": 1.7637954950332642,
+      "step": 1592
+    },
+    {
+      "epoch": 0.10473889245031806,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 7.081593927893738e-05,
+      "loss": 1.7997431755065918,
+      "step": 1593
+    },
+    {
+      "epoch": 0.10480464191199435,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 7.079696394686908e-05,
+      "loss": 1.923598051071167,
+      "step": 1594
+    },
+    {
+      "epoch": 0.10487039137367063,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 7.077798861480076e-05,
+      "loss": 1.775617241859436,
+      "step": 1595
+    },
+    {
+      "epoch": 0.10493614083534691,
+      "grad_norm": 0.212890625,
+      "learning_rate": 7.075901328273246e-05,
+      "loss": 1.790596842765808,
+      "step": 1596
+    },
+    {
+      "epoch": 0.1050018902970232,
+      "grad_norm": 0.1796875,
+      "learning_rate": 7.074003795066414e-05,
+      "loss": 1.7600752115249634,
+      "step": 1597
+    },
+    {
+      "epoch": 0.10506763975869948,
+      "grad_norm": 0.173828125,
+      "learning_rate": 7.072106261859582e-05,
+      "loss": 1.6680808067321777,
+      "step": 1598
+    },
+    {
+      "epoch": 0.10506763975869948,
+      "eval_loss": 1.7673020362854004,
+      "eval_runtime": 328.7165,
+      "eval_samples_per_second": 22.454,
+      "eval_steps_per_second": 5.616,
+      "step": 1598
+    },
+    {
+      "epoch": 0.10513338922037575,
+      "grad_norm": 0.1640625,
+      "learning_rate": 7.070208728652752e-05,
+      "loss": 1.729224443435669,
+      "step": 1599
+    },
+    {
+      "epoch": 0.10519913868205204,
+      "grad_norm": 0.1875,
+      "learning_rate": 7.06831119544592e-05,
+      "loss": 1.7369139194488525,
+      "step": 1600
+    },
+    {
+      "epoch": 0.10526488814372832,
+      "grad_norm": 0.1708984375,
+      "learning_rate": 7.06641366223909e-05,
+      "loss": 1.7510089874267578,
+      "step": 1601
+    },
+    {
+      "epoch": 0.1053306376054046,
+      "grad_norm": 0.1689453125,
+      "learning_rate": 7.064516129032258e-05,
+      "loss": 1.759426236152649,
+      "step": 1602
     }
   ],
   "logging_steps": 1,
 [...]
       "attributes": {}
     }
   },
+  "total_flos": 5.572584912252979e+17,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null
|