Text Generation
Transformers
Safetensors
mistral
Generated from Trainer
conversational
text-generation-inference
Instructions to use TheAgenticAI/mistral-medical with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TheAgenticAI/mistral-medical with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="TheAgenticAI/mistral-medical")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("TheAgenticAI/mistral-medical")
    model = AutoModelForCausalLM.from_pretrained("TheAgenticAI/mistral-medical")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TheAgenticAI/mistral-medical with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "TheAgenticAI/mistral-medical"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "TheAgenticAI/mistral-medical",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/TheAgenticAI/mistral-medical
- SGLang
How to use TheAgenticAI/mistral-medical with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "TheAgenticAI/mistral-medical" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "TheAgenticAI/mistral-medical",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "TheAgenticAI/mistral-medical" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "TheAgenticAI/mistral-medical",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use TheAgenticAI/mistral-medical with Docker Model Runner:
docker model run hf.co/TheAgenticAI/mistral-medical
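The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/chat/completions` route, so the curl calls can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `chat` are hypothetical, and the default base URL assumes a vLLM server on `localhost:8000` as in the commands above; pass `http://localhost:30000` for the SGLang server.

```python
import json
import urllib.request

MODEL_ID = "TheAgenticAI/mistral-medical"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route.

    Hypothetical helper: it mirrors the JSON body of the curl examples.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the prompt to a running server and return the reply text.

    Assumes the server answers in the OpenAI chat completions format,
    where the text is found at choices[0].message.content.
    """
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` sends the same request as the curl examples. Building the request separately from sending it keeps the payload inspectable without a live server.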
| {"loss": 1.32448733, "token_acc": 0.65718884, "grad_norm": 15.26710224, "learning_rate": 2.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.010607, "epoch": 0.00684053, "global_step/max_steps": "1/438", "percentage": "0.23%", "elapsed_time": "14s", "remaining_time": "1h 49m 6s"} | |
| {"loss": 1.31752133, "token_acc": 0.64439791, "grad_norm": 15.16372204, "learning_rate": 4.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.018561, "epoch": 0.01368106, "global_step/max_steps": "2/438", "percentage": "0.46%", "elapsed_time": "28s", "remaining_time": "1h 43m 24s"} | |
| {"loss": 1.30831349, "token_acc": 0.65193146, "grad_norm": 15.06866264, "learning_rate": 6.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.024792, "epoch": 0.02052159, "global_step/max_steps": "3/438", "percentage": "0.68%", "elapsed_time": "41s", "remaining_time": "1h 40m 48s"} | |
| {"loss": 1.25359964, "token_acc": 0.67217281, "grad_norm": 13.01718903, "learning_rate": 9.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.029707, "epoch": 0.02736212, "global_step/max_steps": "4/438", "percentage": "0.91%", "elapsed_time": "55s", "remaining_time": "1h 40m 6s"} | |
| {"loss": 1.26073611, "token_acc": 0.66645134, "grad_norm": 8.78279591, "learning_rate": 1.14e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.033688, "epoch": 0.03420265, "global_step/max_steps": "5/438", "percentage": "1.14%", "elapsed_time": "1m 9s", "remaining_time": "1h 39m 46s"} | |
| {"loss": 1.20014334, "token_acc": 0.68080749, "grad_norm": 6.65655422, "learning_rate": 1.36e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.03705, "epoch": 0.04104318, "global_step/max_steps": "6/438", "percentage": "1.37%", "elapsed_time": "1m 22s", "remaining_time": "1h 39m 10s"} | |
| {"loss": 1.2028923, "token_acc": 0.66300407, "grad_norm": 6.05399799, "learning_rate": 1.59e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.039934, "epoch": 0.04788371, "global_step/max_steps": "7/438", "percentage": "1.60%", "elapsed_time": "1m 35s", "remaining_time": "1h 38m 30s"} | |
| {"loss": 1.14145327, "token_acc": 0.67340506, "grad_norm": 7.26559734, "learning_rate": 1.82e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.04237, "epoch": 0.05472424, "global_step/max_steps": "8/438", "percentage": "1.83%", "elapsed_time": "1m 49s", "remaining_time": "1h 38m 6s"} | |
| {"loss": 1.15775061, "token_acc": 0.66711829, "grad_norm": 5.21852875, "learning_rate": 2.05e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.044476, "epoch": 0.06156477, "global_step/max_steps": "9/438", "percentage": "2.05%", "elapsed_time": "2m 3s", "remaining_time": "1h 37m 45s"} | |
| {"loss": 1.13660383, "token_acc": 0.67274493, "grad_norm": 4.59082365, "learning_rate": 2.27e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.046418, "epoch": 0.0684053, "global_step/max_steps": "10/438", "percentage": "2.28%", "elapsed_time": "2m 16s", "remaining_time": "1h 37m 6s"} | |
| {"loss": 1.12756419, "token_acc": 0.68308148, "grad_norm": 5.57118988, "learning_rate": 2.5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.048053, "epoch": 0.07524583, "global_step/max_steps": "11/438", "percentage": "2.51%", "elapsed_time": "2m 29s", "remaining_time": "1h 36m 48s"} | |
| {"loss": 1.16390884, "token_acc": 0.68324459, "grad_norm": 3.91575289, "learning_rate": 2.73e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.04946, "epoch": 0.08208636, "global_step/max_steps": "12/438", "percentage": "2.74%", "elapsed_time": "2m 43s", "remaining_time": "1h 36m 38s"} | |
| {"loss": 1.15068495, "token_acc": 0.67635429, "grad_norm": 4.50359535, "learning_rate": 2.95e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.050765, "epoch": 0.08892689, "global_step/max_steps": "13/438", "percentage": "2.97%", "elapsed_time": "2m 56s", "remaining_time": "1h 36m 19s"} | |
| {"loss": 1.12314987, "token_acc": 0.68746717, "grad_norm": 3.70606685, "learning_rate": 3.18e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.051876, "epoch": 0.09576742, "global_step/max_steps": "14/438", "percentage": "3.20%", "elapsed_time": "3m 10s", "remaining_time": "1h 36m 11s"} | |
| {"loss": 1.14283812, "token_acc": 0.68590957, "grad_norm": 4.25128508, "learning_rate": 3.41e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.052947, "epoch": 0.10260795, "global_step/max_steps": "15/438", "percentage": "3.42%", "elapsed_time": "3m 24s", "remaining_time": "1h 35m 53s"} | |
| {"loss": 1.09887028, "token_acc": 0.69151205, "grad_norm": 3.46263003, "learning_rate": 3.64e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.053916, "epoch": 0.10944848, "global_step/max_steps": "16/438", "percentage": "3.65%", "elapsed_time": "3m 37s", "remaining_time": "1h 35m 35s"} | |
| {"loss": 1.13754737, "token_acc": 0.69320205, "grad_norm": 3.18743038, "learning_rate": 3.86e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.054826, "epoch": 0.11628901, "global_step/max_steps": "17/438", "percentage": "3.88%", "elapsed_time": "3m 50s", "remaining_time": "1h 35m 15s"} | |
| {"loss": 1.13744867, "token_acc": 0.67793828, "grad_norm": 3.41927767, "learning_rate": 4.09e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.055611, "epoch": 0.12312954, "global_step/max_steps": "18/438", "percentage": "4.11%", "elapsed_time": "4m 4s", "remaining_time": "1h 35m 2s"} | |
| {"loss": 1.11418939, "token_acc": 0.67178391, "grad_norm": 2.88191676, "learning_rate": 4.32e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.056304, "epoch": 0.12997007, "global_step/max_steps": "19/438", "percentage": "4.34%", "elapsed_time": "4m 18s", "remaining_time": "1h 34m 53s"} | |
| {"loss": 1.12714362, "token_acc": 0.69842221, "grad_norm": 3.3820951, "learning_rate": 4.55e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.056973, "epoch": 0.1368106, "global_step/max_steps": "20/438", "percentage": "4.57%", "elapsed_time": "4m 31s", "remaining_time": "1h 34m 39s"} | |
| {"loss": 1.13588881, "token_acc": 0.68285686, "grad_norm": 3.22926211, "learning_rate": 4.77e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.057589, "epoch": 0.14365113, "global_step/max_steps": "21/438", "percentage": "4.79%", "elapsed_time": "4m 45s", "remaining_time": "1h 34m 26s"} | |
| {"loss": 1.12716842, "token_acc": 0.66992167, "grad_norm": 2.53772569, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.058168, "epoch": 0.15049166, "global_step/max_steps": "22/438", "percentage": "5.02%", "elapsed_time": "4m 58s", "remaining_time": "1h 34m 12s"} | |
| {"loss": 1.09829748, "token_acc": 0.69681345, "grad_norm": 2.97209692, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.058711, "epoch": 0.15733219, "global_step/max_steps": "23/438", "percentage": "5.25%", "elapsed_time": "5m 12s", "remaining_time": "1h 33m 57s"} | |
| {"loss": 1.10398936, "token_acc": 0.67933568, "grad_norm": 2.61581445, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.059257, "epoch": 0.16417272, "global_step/max_steps": "24/438", "percentage": "5.48%", "elapsed_time": "5m 25s", "remaining_time": "1h 33m 38s"} | |
| {"loss": 1.14157128, "token_acc": 0.66356409, "grad_norm": 2.78803897, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.059736, "epoch": 0.17101325, "global_step/max_steps": "25/438", "percentage": "5.71%", "elapsed_time": "5m 39s", "remaining_time": "1h 33m 23s"} | |
| {"loss": 1.09384954, "token_acc": 0.68632514, "grad_norm": 2.59948683, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060163, "epoch": 0.17785378, "global_step/max_steps": "26/438", "percentage": "5.94%", "elapsed_time": "5m 52s", "remaining_time": "1h 33m 11s"} | |
| {"loss": 1.13768613, "token_acc": 0.68746793, "grad_norm": 2.62007427, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060574, "epoch": 0.18469431, "global_step/max_steps": "27/438", "percentage": "6.16%", "elapsed_time": "6m 6s", "remaining_time": "1h 32m 58s"} | |
| {"loss": 1.08911824, "token_acc": 0.69407524, "grad_norm": 2.64245224, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06101, "epoch": 0.19153484, "global_step/max_steps": "28/438", "percentage": "6.39%", "elapsed_time": "6m 19s", "remaining_time": "1h 32m 39s"} | |
| {"loss": 1.11435318, "token_acc": 0.68673613, "grad_norm": 2.54893303, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061371, "epoch": 0.19837537, "global_step/max_steps": "29/438", "percentage": "6.62%", "elapsed_time": "6m 33s", "remaining_time": "1h 32m 26s"} | |
| {"loss": 1.13253427, "token_acc": 0.6737902, "grad_norm": 2.90977049, "learning_rate": 5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061734, "epoch": 0.2052159, "global_step/max_steps": "30/438", "percentage": "6.85%", "elapsed_time": "6m 46s", "remaining_time": "1h 32m 10s"} | |
| {"loss": 1.09388208, "token_acc": 0.68917048, "grad_norm": 2.43353581, "learning_rate": 4.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062063, "epoch": 0.21205643, "global_step/max_steps": "31/438", "percentage": "7.08%", "elapsed_time": "7m 0s", "remaining_time": "1h 31m 56s"} | |
| {"loss": 1.11644948, "token_acc": 0.67968291, "grad_norm": 2.56412053, "learning_rate": 4.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062366, "epoch": 0.21889696, "global_step/max_steps": "32/438", "percentage": "7.31%", "elapsed_time": "7m 13s", "remaining_time": "1h 31m 43s"} | |
| {"loss": 1.11410522, "token_acc": 0.6856999, "grad_norm": 2.31273174, "learning_rate": 4.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062679, "epoch": 0.22573749, "global_step/max_steps": "33/438", "percentage": "7.53%", "elapsed_time": "7m 27s", "remaining_time": "1h 31m 28s"} | |
| {"loss": 1.11746752, "token_acc": 0.69118668, "grad_norm": 2.42017865, "learning_rate": 4.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062979, "epoch": 0.23257802, "global_step/max_steps": "34/438", "percentage": "7.76%", "elapsed_time": "7m 40s", "remaining_time": "1h 31m 12s"} | |
| {"loss": 1.09101915, "token_acc": 0.68153862, "grad_norm": 2.29399395, "learning_rate": 4.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06326, "epoch": 0.23941855, "global_step/max_steps": "35/438", "percentage": "7.99%", "elapsed_time": "7m 53s", "remaining_time": "1h 30m 57s"} | |
| {"loss": 1.12311721, "token_acc": 0.69326261, "grad_norm": 2.25255132, "learning_rate": 4.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063496, "epoch": 0.24625909, "global_step/max_steps": "36/438", "percentage": "8.22%", "elapsed_time": "8m 7s", "remaining_time": "1h 30m 45s"} | |
| {"loss": 1.09486032, "token_acc": 0.68499534, "grad_norm": 2.1680696, "learning_rate": 4.98e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063757, "epoch": 0.25309962, "global_step/max_steps": "37/438", "percentage": "8.45%", "elapsed_time": "8m 21s", "remaining_time": "1h 30m 30s"} | |
| {"loss": 1.12473679, "token_acc": 0.68193384, "grad_norm": 2.3330121, "learning_rate": 4.98e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063997, "epoch": 0.25994015, "global_step/max_steps": "38/438", "percentage": "8.68%", "elapsed_time": "8m 34s", "remaining_time": "1h 30m 15s"} | |
| {"loss": 1.10269117, "token_acc": 0.676268, "grad_norm": 2.2119348, "learning_rate": 4.98e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064202, "epoch": 0.26678068, "global_step/max_steps": "39/438", "percentage": "8.90%", "elapsed_time": "8m 48s", "remaining_time": "1h 30m 3s"} | |
| {"loss": 1.08336365, "token_acc": 0.68104059, "grad_norm": 2.12633634, "learning_rate": 4.98e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064399, "epoch": 0.27362121, "global_step/max_steps": "40/438", "percentage": "9.13%", "elapsed_time": "9m 1s", "remaining_time": "1h 29m 51s"} | |
| {"loss": 1.08647966, "token_acc": 0.67832847, "grad_norm": 2.99487329, "learning_rate": 4.97e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064604, "epoch": 0.28046174, "global_step/max_steps": "41/438", "percentage": "9.36%", "elapsed_time": "9m 15s", "remaining_time": "1h 29m 37s"} | |
| {"loss": 1.0942682, "token_acc": 0.68147467, "grad_norm": 2.37968826, "learning_rate": 4.97e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06482, "epoch": 0.28730227, "global_step/max_steps": "42/438", "percentage": "9.59%", "elapsed_time": "9m 28s", "remaining_time": "1h 29m 21s"} | |
| {"loss": 1.10921788, "token_acc": 0.6759337, "grad_norm": 2.24053669, "learning_rate": 4.97e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064999, "epoch": 0.2941428, "global_step/max_steps": "43/438", "percentage": "9.82%", "elapsed_time": "9m 42s", "remaining_time": "1h 29m 8s"} | |
| {"loss": 1.10576177, "token_acc": 0.67282828, "grad_norm": 2.14062095, "learning_rate": 4.97e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065183, "epoch": 0.30098333, "global_step/max_steps": "44/438", "percentage": "10.05%", "elapsed_time": "9m 55s", "remaining_time": "1h 28m 54s"} | |
| {"loss": 1.10003507, "token_acc": 0.70283119, "grad_norm": 2.21740055, "learning_rate": 4.96e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065373, "epoch": 0.30782386, "global_step/max_steps": "45/438", "percentage": "10.27%", "elapsed_time": "10m 9s", "remaining_time": "1h 28m 39s"} | |
| {"loss": 1.11402595, "token_acc": 0.68623071, "grad_norm": 2.27186966, "learning_rate": 4.96e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065534, "epoch": 0.31466439, "global_step/max_steps": "46/438", "percentage": "10.50%", "elapsed_time": "10m 22s", "remaining_time": "1h 28m 25s"} | |
| {"loss": 1.09573102, "token_acc": 0.67752833, "grad_norm": 2.47669244, "learning_rate": 4.96e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06569, "epoch": 0.32150492, "global_step/max_steps": "47/438", "percentage": "10.73%", "elapsed_time": "10m 36s", "remaining_time": "1h 28m 12s"} | |
| {"loss": 1.10476148, "token_acc": 0.67782427, "grad_norm": 2.40813327, "learning_rate": 4.95e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065841, "epoch": 0.32834545, "global_step/max_steps": "48/438", "percentage": "10.96%", "elapsed_time": "10m 49s", "remaining_time": "1h 27m 59s"} | |
| {"loss": 1.10099018, "token_acc": 0.68903714, "grad_norm": 2.27323198, "learning_rate": 4.95e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065993, "epoch": 0.33518598, "global_step/max_steps": "49/438", "percentage": "11.19%", "elapsed_time": "11m 3s", "remaining_time": "1h 27m 45s"} | |
| {"loss": 1.10472989, "token_acc": 0.6844942, "grad_norm": 2.19536352, "learning_rate": 4.94e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066168, "epoch": 0.34202651, "global_step/max_steps": "50/438", "percentage": "11.42%", "elapsed_time": "11m 16s", "remaining_time": "1h 27m 28s"} | |
| {"loss": 1.12567806, "token_acc": 0.67346527, "grad_norm": 2.83770585, "learning_rate": 4.94e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066266, "epoch": 0.34886704, "global_step/max_steps": "51/438", "percentage": "11.64%", "elapsed_time": "11m 30s", "remaining_time": "1h 27m 18s"} | |
| {"loss": 1.12685454, "token_acc": 0.66109219, "grad_norm": 2.23817825, "learning_rate": 4.94e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066374, "epoch": 0.35570757, "global_step/max_steps": "52/438", "percentage": "11.87%", "elapsed_time": "11m 44s", "remaining_time": "1h 27m 6s"} | |
| {"loss": 1.1018827, "token_acc": 0.69081563, "grad_norm": 2.17631102, "learning_rate": 4.93e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066509, "epoch": 0.3625481, "global_step/max_steps": "53/438", "percentage": "12.10%", "elapsed_time": "11m 57s", "remaining_time": "1h 26m 52s"} | |
| {"loss": 1.09778643, "token_acc": 0.67163616, "grad_norm": 2.23925686, "learning_rate": 4.93e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066644, "epoch": 0.36938863, "global_step/max_steps": "54/438", "percentage": "12.33%", "elapsed_time": "12m 10s", "remaining_time": "1h 26m 38s"} | |
| {"loss": 1.10647607, "token_acc": 0.67679849, "grad_norm": 2.33379078, "learning_rate": 4.92e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066745, "epoch": 0.37622916, "global_step/max_steps": "55/438", "percentage": "12.56%", "elapsed_time": "12m 24s", "remaining_time": "1h 26m 26s"} | |
| {"loss": 1.09020162, "token_acc": 0.68970232, "grad_norm": 2.1699276, "learning_rate": 4.92e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066871, "epoch": 0.38306969, "global_step/max_steps": "56/438", "percentage": "12.79%", "elapsed_time": "12m 38s", "remaining_time": "1h 26m 11s"} | |
| {"loss": 1.08863413, "token_acc": 0.69393085, "grad_norm": 2.14426684, "learning_rate": 4.91e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066977, "epoch": 0.38991022, "global_step/max_steps": "57/438", "percentage": "13.01%", "elapsed_time": "12m 51s", "remaining_time": "1h 25m 58s"} | |
| {"loss": 1.1063329, "token_acc": 0.66955203, "grad_norm": 2.1652391, "learning_rate": 4.91e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067092, "epoch": 0.39675075, "global_step/max_steps": "58/438", "percentage": "13.24%", "elapsed_time": "13m 5s", "remaining_time": "1h 25m 44s"} | |
| {"loss": 1.10677338, "token_acc": 0.69030521, "grad_norm": 2.26135731, "learning_rate": 4.9e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067222, "epoch": 0.40359128, "global_step/max_steps": "59/438", "percentage": "13.47%", "elapsed_time": "13m 18s", "remaining_time": "1h 25m 28s"} | |
| {"loss": 1.08060896, "token_acc": 0.70369139, "grad_norm": 2.23832226, "learning_rate": 4.9e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06732, "epoch": 0.41043181, "global_step/max_steps": "60/438", "percentage": "13.70%", "elapsed_time": "13m 31s", "remaining_time": "1h 25m 15s"} | |
| {"loss": 1.10245109, "token_acc": 0.69769428, "grad_norm": 2.26931381, "learning_rate": 4.89e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067436, "epoch": 0.41727234, "global_step/max_steps": "61/438", "percentage": "13.93%", "elapsed_time": "13m 45s", "remaining_time": "1h 25m 0s"} | |
| {"loss": 1.12171626, "token_acc": 0.67331862, "grad_norm": 2.22701454, "learning_rate": 4.89e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067531, "epoch": 0.42411287, "global_step/max_steps": "62/438", "percentage": "14.16%", "elapsed_time": "13m 58s", "remaining_time": "1h 24m 46s"} | |
| {"loss": 1.09284782, "token_acc": 0.67992749, "grad_norm": 2.1609025, "learning_rate": 4.88e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067624, "epoch": 0.4309534, "global_step/max_steps": "63/438", "percentage": "14.38%", "elapsed_time": "14m 12s", "remaining_time": "1h 24m 33s"} | |
| {"loss": 1.08852911, "token_acc": 0.68369441, "grad_norm": 2.44614315, "learning_rate": 4.88e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067723, "epoch": 0.43779393, "global_step/max_steps": "64/438", "percentage": "14.61%", "elapsed_time": "14m 25s", "remaining_time": "1h 24m 19s"} | |
| {"loss": 1.11688435, "token_acc": 0.67950678, "grad_norm": 2.34876776, "learning_rate": 4.87e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067792, "epoch": 0.44463446, "global_step/max_steps": "65/438", "percentage": "14.84%", "elapsed_time": "14m 39s", "remaining_time": "1h 24m 7s"} | |
| {"loss": 1.08325982, "token_acc": 0.69920949, "grad_norm": 2.03145409, "learning_rate": 4.86e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.067884, "epoch": 0.45147499, "global_step/max_steps": "66/438", "percentage": "15.07%", "elapsed_time": "14m 52s", "remaining_time": "1h 23m 52s"} | |
| {"loss": 1.1143831, "token_acc": 0.68214175, "grad_norm": 2.25568533, "learning_rate": 4.86e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06798, "epoch": 0.45831552, "global_step/max_steps": "67/438", "percentage": "15.30%", "elapsed_time": "15m 6s", "remaining_time": "1h 23m 38s"} | |
| {"loss": 1.08743358, "token_acc": 0.68253618, "grad_norm": 2.08536768, "learning_rate": 4.85e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068039, "epoch": 0.46515605, "global_step/max_steps": "68/438", "percentage": "15.53%", "elapsed_time": "15m 20s", "remaining_time": "1h 23m 26s"} | |
| {"loss": 1.11893344, "token_acc": 0.70382478, "grad_norm": 2.9689436, "learning_rate": 4.84e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068115, "epoch": 0.47199658, "global_step/max_steps": "69/438", "percentage": "15.75%", "elapsed_time": "15m 33s", "remaining_time": "1h 23m 13s"} | |
| {"loss": 1.076074, "token_acc": 0.69562946, "grad_norm": 2.10883188, "learning_rate": 4.84e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068178, "epoch": 0.47883711, "global_step/max_steps": "70/438", "percentage": "15.98%", "elapsed_time": "15m 47s", "remaining_time": "1h 23m 0s"} | |
| {"loss": 1.10775733, "token_acc": 0.70134856, "grad_norm": 2.06094122, "learning_rate": 4.83e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068233, "epoch": 0.48567764, "global_step/max_steps": "71/438", "percentage": "16.21%", "elapsed_time": "16m 1s", "remaining_time": "1h 22m 48s"} | |
| {"loss": 1.09979415, "token_acc": 0.68779844, "grad_norm": 2.11672616, "learning_rate": 4.82e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068292, "epoch": 0.49251817, "global_step/max_steps": "72/438", "percentage": "16.44%", "elapsed_time": "16m 15s", "remaining_time": "1h 22m 36s"} | |
| {"loss": 1.10627294, "token_acc": 0.67222669, "grad_norm": 2.5381968, "learning_rate": 4.82e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068357, "epoch": 0.4993587, "global_step/max_steps": "73/438", "percentage": "16.67%", "elapsed_time": "16m 28s", "remaining_time": "1h 22m 23s"} | |
| {"loss": 1.11859632, "token_acc": 0.69048975, "grad_norm": 1.96097445, "learning_rate": 4.81e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068415, "epoch": 0.50619923, "global_step/max_steps": "74/438", "percentage": "16.89%", "elapsed_time": "16m 42s", "remaining_time": "1h 22m 10s"} | |
| {"loss": 1.08301711, "token_acc": 0.68628328, "grad_norm": 2.71161532, "learning_rate": 4.8e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068488, "epoch": 0.51303976, "global_step/max_steps": "75/438", "percentage": "17.12%", "elapsed_time": "16m 55s", "remaining_time": "1h 21m 56s"} | |
| {"loss": 1.0969032, "token_acc": 0.69485027, "grad_norm": 2.17653537, "learning_rate": 4.79e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068558, "epoch": 0.51988029, "global_step/max_steps": "76/438", "percentage": "17.35%", "elapsed_time": "17m 9s", "remaining_time": "1h 21m 42s"} | |
| {"loss": 1.10186732, "token_acc": 0.68443041, "grad_norm": 2.83583736, "learning_rate": 4.79e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068612, "epoch": 0.52672082, "global_step/max_steps": "77/438", "percentage": "17.58%", "elapsed_time": "17m 22s", "remaining_time": "1h 21m 29s"} | |
| {"loss": 1.09556782, "token_acc": 0.68599539, "grad_norm": 2.55437589, "learning_rate": 4.78e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068676, "epoch": 0.53356135, "global_step/max_steps": "78/438", "percentage": "17.81%", "elapsed_time": "17m 36s", "remaining_time": "1h 21m 16s"} | |
| {"loss": 1.0788424, "token_acc": 0.68991112, "grad_norm": 2.63972187, "learning_rate": 4.77e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068727, "epoch": 0.54040188, "global_step/max_steps": "79/438", "percentage": "18.04%", "elapsed_time": "17m 50s", "remaining_time": "1h 21m 3s"} | |
| {"loss": 1.10909939, "token_acc": 0.69057782, "grad_norm": 2.55636263, "learning_rate": 4.76e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068783, "epoch": 0.54724241, "global_step/max_steps": "80/438", "percentage": "18.26%", "elapsed_time": "18m 3s", "remaining_time": "1h 20m 49s"} | |
| {"loss": 1.11332214, "token_acc": 0.70017689, "grad_norm": 2.52733946, "learning_rate": 4.76e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068832, "epoch": 0.55408294, "global_step/max_steps": "81/438", "percentage": "18.49%", "elapsed_time": "18m 17s", "remaining_time": "1h 20m 37s"} | |
| {"loss": 1.13330567, "token_acc": 0.6656017, "grad_norm": 2.44118977, "learning_rate": 4.75e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068883, "epoch": 0.56092347, "global_step/max_steps": "82/438", "percentage": "18.72%", "elapsed_time": "18m 31s", "remaining_time": "1h 20m 23s"} | |
| {"loss": 1.0900898, "token_acc": 0.6896309, "grad_norm": 2.04203677, "learning_rate": 4.74e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.068952, "epoch": 0.567764, "global_step/max_steps": "83/438", "percentage": "18.95%", "elapsed_time": "18m 44s", "remaining_time": "1h 20m 9s"} | |
| {"loss": 1.12732458, "token_acc": 0.68459883, "grad_norm": 2.76321721, "learning_rate": 4.73e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069015, "epoch": 0.57460453, "global_step/max_steps": "84/438", "percentage": "19.18%", "elapsed_time": "18m 57s", "remaining_time": "1h 19m 55s"} | |
| {"loss": 1.08451271, "token_acc": 0.6922626, "grad_norm": 2.09732866, "learning_rate": 4.72e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069074, "epoch": 0.58144506, "global_step/max_steps": "85/438", "percentage": "19.41%", "elapsed_time": "19m 11s", "remaining_time": "1h 19m 41s"} | |
| {"loss": 1.11771178, "token_acc": 0.69011446, "grad_norm": 2.36023974, "learning_rate": 4.71e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069124, "epoch": 0.58828559, "global_step/max_steps": "86/438", "percentage": "19.63%", "elapsed_time": "19m 24s", "remaining_time": "1h 19m 27s"} | |
| {"loss": 1.08736682, "token_acc": 0.68859168, "grad_norm": 2.2112596, "learning_rate": 4.7e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0692, "epoch": 0.59512612, "global_step/max_steps": "87/438", "percentage": "19.86%", "elapsed_time": "19m 37s", "remaining_time": "1h 19m 12s"} | |
| {"loss": 1.09089327, "token_acc": 0.69346788, "grad_norm": 2.34892702, "learning_rate": 4.7e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06925, "epoch": 0.60196665, "global_step/max_steps": "88/438", "percentage": "20.09%", "elapsed_time": "19m 51s", "remaining_time": "1h 18m 58s"} | |
| {"loss": 1.08859301, "token_acc": 0.68274925, "grad_norm": 2.42192912, "learning_rate": 4.69e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069296, "epoch": 0.60880718, "global_step/max_steps": "89/438", "percentage": "20.32%", "elapsed_time": "20m 5s", "remaining_time": "1h 18m 45s"} | |
| {"loss": 1.11126089, "token_acc": 0.67941748, "grad_norm": 2.49750519, "learning_rate": 4.68e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069343, "epoch": 0.61564771, "global_step/max_steps": "90/438", "percentage": "20.55%", "elapsed_time": "20m 18s", "remaining_time": "1h 18m 31s"} | |
| {"loss": 1.11114061, "token_acc": 0.68449816, "grad_norm": 2.56865168, "learning_rate": 4.67e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069383, "epoch": 0.62248824, "global_step/max_steps": "91/438", "percentage": "20.78%", "elapsed_time": "20m 32s", "remaining_time": "1h 18m 18s"} | |
| {"loss": 1.11451471, "token_acc": 0.68583277, "grad_norm": 2.05704951, "learning_rate": 4.66e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06941, "epoch": 0.62932877, "global_step/max_steps": "92/438", "percentage": "21.00%", "elapsed_time": "20m 46s", "remaining_time": "1h 18m 6s"} | |
| {"loss": 1.0957973, "token_acc": 0.68713323, "grad_norm": 2.31245375, "learning_rate": 4.65e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069458, "epoch": 0.6361693, "global_step/max_steps": "93/438", "percentage": "21.23%", "elapsed_time": "20m 59s", "remaining_time": "1h 17m 52s"} | |
| {"loss": 1.10252881, "token_acc": 0.69049336, "grad_norm": 2.26923084, "learning_rate": 4.64e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069498, "epoch": 0.64300983, "global_step/max_steps": "94/438", "percentage": "21.46%", "elapsed_time": "21m 13s", "remaining_time": "1h 17m 39s"} | |
| {"loss": 1.1101222, "token_acc": 0.67835506, "grad_norm": 2.24394512, "learning_rate": 4.63e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069544, "epoch": 0.64985036, "global_step/max_steps": "95/438", "percentage": "21.69%", "elapsed_time": "21m 26s", "remaining_time": "1h 17m 25s"} | |
| {"loss": 1.10440254, "token_acc": 0.69113471, "grad_norm": 2.13546801, "learning_rate": 4.62e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0696, "epoch": 0.65669089, "global_step/max_steps": "96/438", "percentage": "21.92%", "elapsed_time": "21m 40s", "remaining_time": "1h 17m 11s"} | |
| {"loss": 1.1048435, "token_acc": 0.69788078, "grad_norm": 2.21477604, "learning_rate": 4.61e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069647, "epoch": 0.66353142, "global_step/max_steps": "97/438", "percentage": "22.15%", "elapsed_time": "21m 53s", "remaining_time": "1h 16m 57s"} | |
| {"loss": 1.08908308, "token_acc": 0.66263099, "grad_norm": 2.02156925, "learning_rate": 4.6e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069666, "epoch": 0.67037195, "global_step/max_steps": "98/438", "percentage": "22.37%", "elapsed_time": "22m 7s", "remaining_time": "1h 16m 45s"} | |
| {"loss": 1.11216283, "token_acc": 0.68233371, "grad_norm": 2.22646284, "learning_rate": 4.59e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0697, "epoch": 0.67721248, "global_step/max_steps": "99/438", "percentage": "22.60%", "elapsed_time": "22m 21s", "remaining_time": "1h 16m 32s"} | |
| {"loss": 1.08155155, "token_acc": 0.69272844, "grad_norm": 2.20218539, "learning_rate": 4.58e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.069747, "epoch": 0.68405301, "global_step/max_steps": "100/438", "percentage": "22.83%", "elapsed_time": "22m 34s", "remaining_time": "1h 16m 18s"} | |
| {"eval_loss": 1.08442855, "eval_token_acc": 0.68760993, "eval_runtime": 18.8828, "eval_samples_per_second": 52.164, "eval_steps_per_second": 6.567, "epoch": 0.68405301, "global_step/max_steps": "100/438", "percentage": "22.83%", "elapsed_time": "22m 53s", "remaining_time": "1h 17m 21s"} | |
| {"loss": 1.09720016, "token_acc": 0.68862822, "grad_norm": 2.41684461, "learning_rate": 4.57e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.059901, "epoch": 0.69089354, "global_step/max_steps": "101/438", "percentage": "23.06%", "elapsed_time": "26m 46s", "remaining_time": "1h 29m 21s"} | |
| {"loss": 1.11566281, "token_acc": 0.69899232, "grad_norm": 2.20432281, "learning_rate": 4.56e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060005, "epoch": 0.69773407, "global_step/max_steps": "102/438", "percentage": "23.29%", "elapsed_time": "27m 0s", "remaining_time": "1h 28m 58s"} | |
| {"loss": 1.08794069, "token_acc": 0.67792632, "grad_norm": 2.15967655, "learning_rate": 4.55e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060119, "epoch": 0.7045746, "global_step/max_steps": "103/438", "percentage": "23.52%", "elapsed_time": "27m 13s", "remaining_time": "1h 28m 34s"} | |
| {"loss": 1.07929981, "token_acc": 0.6925023, "grad_norm": 2.30847716, "learning_rate": 4.54e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060229, "epoch": 0.71141513, "global_step/max_steps": "104/438", "percentage": "23.74%", "elapsed_time": "27m 27s", "remaining_time": "1h 28m 10s"} | |
| {"loss": 1.10373664, "token_acc": 0.68495146, "grad_norm": 2.05202723, "learning_rate": 4.52e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060328, "epoch": 0.71825566, "global_step/max_steps": "105/438", "percentage": "23.97%", "elapsed_time": "27m 41s", "remaining_time": "1h 27m 48s"} | |
| {"loss": 1.1148057, "token_acc": 0.68809152, "grad_norm": 2.15440369, "learning_rate": 4.51e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060442, "epoch": 0.72509619, "global_step/max_steps": "106/438", "percentage": "24.20%", "elapsed_time": "27m 54s", "remaining_time": "1h 27m 24s"} | |
| {"loss": 1.08998144, "token_acc": 0.68774194, "grad_norm": 2.24733448, "learning_rate": 4.5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060541, "epoch": 0.73193673, "global_step/max_steps": "107/438", "percentage": "24.43%", "elapsed_time": "28m 8s", "remaining_time": "1h 27m 2s"} | |
| {"loss": 1.10158086, "token_acc": 0.68555359, "grad_norm": 2.13290381, "learning_rate": 4.49e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060647, "epoch": 0.73877726, "global_step/max_steps": "108/438", "percentage": "24.66%", "elapsed_time": "28m 21s", "remaining_time": "1h 26m 39s"} | |
| {"loss": 1.1020844, "token_acc": 0.68688172, "grad_norm": 2.29288816, "learning_rate": 4.48e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060757, "epoch": 0.74561779, "global_step/max_steps": "109/438", "percentage": "24.89%", "elapsed_time": "28m 34s", "remaining_time": "1h 26m 15s"} | |
| {"loss": 1.06607306, "token_acc": 0.6940231, "grad_norm": 2.0880115, "learning_rate": 4.47e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060858, "epoch": 0.75245832, "global_step/max_steps": "110/438", "percentage": "25.11%", "elapsed_time": "28m 48s", "remaining_time": "1h 25m 53s"} | |
| {"loss": 1.09520507, "token_acc": 0.67799273, "grad_norm": 2.09125304, "learning_rate": 4.46e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060956, "epoch": 0.75929885, "global_step/max_steps": "111/438", "percentage": "25.34%", "elapsed_time": "29m 1s", "remaining_time": "1h 25m 30s"} | |
| {"loss": 1.06843829, "token_acc": 0.68395634, "grad_norm": 2.0262785, "learning_rate": 4.44e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061054, "epoch": 0.76613938, "global_step/max_steps": "112/438", "percentage": "25.57%", "elapsed_time": "29m 15s", "remaining_time": "1h 25m 8s"} | |
| {"loss": 1.07099426, "token_acc": 0.69901011, "grad_norm": 2.11470532, "learning_rate": 4.43e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061158, "epoch": 0.77297991, "global_step/max_steps": "113/438", "percentage": "25.80%", "elapsed_time": "29m 28s", "remaining_time": "1h 24m 46s"} | |
| {"loss": 1.09204865, "token_acc": 0.68291635, "grad_norm": 2.08143091, "learning_rate": 4.42e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061239, "epoch": 0.77982044, "global_step/max_steps": "114/438", "percentage": "26.03%", "elapsed_time": "29m 42s", "remaining_time": "1h 24m 25s"} | |
| {"loss": 1.08635902, "token_acc": 0.68927309, "grad_norm": 2.10561299, "learning_rate": 4.41e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061331, "epoch": 0.78666097, "global_step/max_steps": "115/438", "percentage": "26.26%", "elapsed_time": "29m 55s", "remaining_time": "1h 24m 3s"} | |
| {"loss": 1.08257174, "token_acc": 0.69410781, "grad_norm": 2.04155684, "learning_rate": 4.4e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061422, "epoch": 0.7935015, "global_step/max_steps": "116/438", "percentage": "26.48%", "elapsed_time": "30m 9s", "remaining_time": "1h 23m 42s"} | |
| {"loss": 1.11466837, "token_acc": 0.70041503, "grad_norm": 2.06280637, "learning_rate": 4.38e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061513, "epoch": 0.80034203, "global_step/max_steps": "117/438", "percentage": "26.71%", "elapsed_time": "30m 22s", "remaining_time": "1h 23m 20s"} | |
| {"loss": 1.0804944, "token_acc": 0.68743584, "grad_norm": 2.16781521, "learning_rate": 4.37e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061604, "epoch": 0.80718256, "global_step/max_steps": "118/438", "percentage": "26.94%", "elapsed_time": "30m 36s", "remaining_time": "1h 22m 59s"} | |
| {"loss": 1.07927585, "token_acc": 0.69405815, "grad_norm": 2.02913332, "learning_rate": 4.36e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061691, "epoch": 0.81402309, "global_step/max_steps": "119/438", "percentage": "27.17%", "elapsed_time": "30m 49s", "remaining_time": "1h 22m 38s"} | |
| {"loss": 1.05509448, "token_acc": 0.68438904, "grad_norm": 2.01890969, "learning_rate": 4.35e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061779, "epoch": 0.82086362, "global_step/max_steps": "120/438", "percentage": "27.40%", "elapsed_time": "31m 3s", "remaining_time": "1h 22m 17s"} | |
| {"loss": 1.09220791, "token_acc": 0.67177824, "grad_norm": 2.14285684, "learning_rate": 4.33e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061858, "epoch": 0.82770415, "global_step/max_steps": "121/438", "percentage": "27.63%", "elapsed_time": "31m 16s", "remaining_time": "1h 21m 56s"} | |
| {"loss": 1.09302473, "token_acc": 0.68291688, "grad_norm": 2.11581326, "learning_rate": 4.32e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061935, "epoch": 0.83454468, "global_step/max_steps": "122/438", "percentage": "27.85%", "elapsed_time": "31m 30s", "remaining_time": "1h 21m 36s"} | |
| {"loss": 1.11649549, "token_acc": 0.67320325, "grad_norm": 2.14738321, "learning_rate": 4.31e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062015, "epoch": 0.84138521, "global_step/max_steps": "123/438", "percentage": "28.08%", "elapsed_time": "31m 44s", "remaining_time": "1h 21m 16s"} | |
| {"loss": 1.0742054, "token_acc": 0.68284617, "grad_norm": 2.10535336, "learning_rate": 4.29e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062102, "epoch": 0.84822574, "global_step/max_steps": "124/438", "percentage": "28.31%", "elapsed_time": "31m 57s", "remaining_time": "1h 20m 55s"} | |
| {"loss": 1.08880138, "token_acc": 0.67957643, "grad_norm": 2.12732935, "learning_rate": 4.28e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062178, "epoch": 0.85506627, "global_step/max_steps": "125/438", "percentage": "28.54%", "elapsed_time": "32m 11s", "remaining_time": "1h 20m 35s"} | |
| {"loss": 1.08475089, "token_acc": 0.68436303, "grad_norm": 2.00703001, "learning_rate": 4.27e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062265, "epoch": 0.8619068, "global_step/max_steps": "126/438", "percentage": "28.77%", "elapsed_time": "32m 24s", "remaining_time": "1h 20m 14s"} | |
| {"loss": 1.1318078, "token_acc": 0.66376728, "grad_norm": 2.2657032, "learning_rate": 4.25e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062337, "epoch": 0.86874733, "global_step/max_steps": "127/438", "percentage": "29.00%", "elapsed_time": "32m 38s", "remaining_time": "1h 19m 54s"} | |
| {"loss": 1.07927227, "token_acc": 0.70161374, "grad_norm": 2.18184066, "learning_rate": 4.24e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062414, "epoch": 0.87558786, "global_step/max_steps": "128/438", "percentage": "29.22%", "elapsed_time": "32m 51s", "remaining_time": "1h 19m 34s"} | |
| {"loss": 1.08935821, "token_acc": 0.69525494, "grad_norm": 2.02096701, "learning_rate": 4.23e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062494, "epoch": 0.88242839, "global_step/max_steps": "129/438", "percentage": "29.45%", "elapsed_time": "33m 4s", "remaining_time": "1h 19m 14s"} | |
| {"loss": 1.1004312, "token_acc": 0.6945279, "grad_norm": 2.10023904, "learning_rate": 4.21e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062564, "epoch": 0.88926892, "global_step/max_steps": "130/438", "percentage": "29.68%", "elapsed_time": "33m 18s", "remaining_time": "1h 18m 55s"} | |
| {"loss": 1.10651028, "token_acc": 0.67756359, "grad_norm": 2.0362711, "learning_rate": 4.2e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062633, "epoch": 0.89610945, "global_step/max_steps": "131/438", "percentage": "29.91%", "elapsed_time": "33m 32s", "remaining_time": "1h 18m 35s"} | |
| {"loss": 1.0916853, "token_acc": 0.67597709, "grad_norm": 2.00083447, "learning_rate": 4.19e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0627, "epoch": 0.90294998, "global_step/max_steps": "132/438", "percentage": "30.14%", "elapsed_time": "33m 45s", "remaining_time": "1h 18m 16s"} | |
| {"loss": 1.09735751, "token_acc": 0.69789307, "grad_norm": 2.20222735, "learning_rate": 4.17e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06277, "epoch": 0.90979051, "global_step/max_steps": "133/438", "percentage": "30.37%", "elapsed_time": "33m 59s", "remaining_time": "1h 17m 57s"} | |
| {"loss": 1.09693933, "token_acc": 0.70456802, "grad_norm": 2.11650324, "learning_rate": 4.16e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062835, "epoch": 0.91663104, "global_step/max_steps": "134/438", "percentage": "30.59%", "elapsed_time": "34m 13s", "remaining_time": "1h 17m 38s"} | |
| {"loss": 1.09591222, "token_acc": 0.67919983, "grad_norm": 1.86144531, "learning_rate": 4.14e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062906, "epoch": 0.92347157, "global_step/max_steps": "135/438", "percentage": "30.82%", "elapsed_time": "34m 26s", "remaining_time": "1h 17m 18s"} | |
| {"loss": 1.07723475, "token_acc": 0.68502652, "grad_norm": 2.16780448, "learning_rate": 4.13e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062974, "epoch": 0.9303121, "global_step/max_steps": "136/438", "percentage": "31.05%", "elapsed_time": "34m 40s", "remaining_time": "1h 16m 59s"} | |
| {"loss": 1.06759381, "token_acc": 0.69047885, "grad_norm": 2.14518762, "learning_rate": 4.11e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06305, "epoch": 0.93715263, "global_step/max_steps": "137/438", "percentage": "31.28%", "elapsed_time": "34m 53s", "remaining_time": "1h 16m 39s"} | |
| {"loss": 1.0793016, "token_acc": 0.68612421, "grad_norm": 1.90841281, "learning_rate": 4.1e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063104, "epoch": 0.94399316, "global_step/max_steps": "138/438", "percentage": "31.51%", "elapsed_time": "35m 7s", "remaining_time": "1h 16m 21s"} | |
| {"loss": 1.09360647, "token_acc": 0.69031348, "grad_norm": 2.02513123, "learning_rate": 4.09e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063162, "epoch": 0.95083369, "global_step/max_steps": "139/438", "percentage": "31.74%", "elapsed_time": "35m 21s", "remaining_time": "1h 16m 3s"} | |
| {"loss": 1.12534165, "token_acc": 0.6751586, "grad_norm": 1.99317515, "learning_rate": 4.07e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063229, "epoch": 0.95767422, "global_step/max_steps": "140/438", "percentage": "31.96%", "elapsed_time": "35m 34s", "remaining_time": "1h 15m 44s"} | |
| {"loss": 1.11249626, "token_acc": 0.67734628, "grad_norm": 2.21646667, "learning_rate": 4.06e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063295, "epoch": 0.96451475, "global_step/max_steps": "141/438", "percentage": "32.19%", "elapsed_time": "35m 48s", "remaining_time": "1h 15m 25s"} | |
| {"loss": 1.07855034, "token_acc": 0.68858025, "grad_norm": 1.9561305, "learning_rate": 4.04e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063358, "epoch": 0.97135528, "global_step/max_steps": "142/438", "percentage": "32.42%", "elapsed_time": "36m 1s", "remaining_time": "1h 15m 6s"} | |
| {"loss": 1.07847273, "token_acc": 0.68126965, "grad_norm": 1.90408933, "learning_rate": 4.03e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063428, "epoch": 0.97819581, "global_step/max_steps": "143/438", "percentage": "32.65%", "elapsed_time": "36m 15s", "remaining_time": "1h 14m 47s"} | |
| {"loss": 1.08116019, "token_acc": 0.67892033, "grad_norm": 2.1272881, "learning_rate": 4.01e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063492, "epoch": 0.98503634, "global_step/max_steps": "144/438", "percentage": "32.88%", "elapsed_time": "36m 28s", "remaining_time": "1h 14m 28s"} | |
| {"loss": 1.0934006, "token_acc": 0.6827111, "grad_norm": 2.00823569, "learning_rate": 4e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06356, "epoch": 0.99187687, "global_step/max_steps": "145/438", "percentage": "33.11%", "elapsed_time": "36m 42s", "remaining_time": "1h 14m 9s"} | |
| {"loss": 1.0874095, "token_acc": 0.69849763, "grad_norm": 2.01293111, "learning_rate": 3.98e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063624, "epoch": 0.9987174, "global_step/max_steps": "146/438", "percentage": "33.33%", "elapsed_time": "36m 55s", "remaining_time": "1h 13m 50s"} | |
| {"loss": 1.10059404, "token_acc": 0.65476963, "grad_norm": 2.01293111, "learning_rate": 3.97e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063989, "epoch": 1.0, "global_step/max_steps": "147/438", "percentage": "33.56%", "elapsed_time": "36m 57s", "remaining_time": "1h 13m 10s"} | |
| {"loss": 0.88656336, "token_acc": 0.72938669, "grad_norm": 5.00217819, "learning_rate": 3.95e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064035, "epoch": 1.00684053, "global_step/max_steps": "148/438", "percentage": "33.79%", "elapsed_time": "37m 11s", "remaining_time": "1h 12m 53s"} | |
| {"loss": 0.8776077, "token_acc": 0.73441887, "grad_norm": 2.82801056, "learning_rate": 3.94e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064096, "epoch": 1.01368106, "global_step/max_steps": "149/438", "percentage": "34.02%", "elapsed_time": "37m 25s", "remaining_time": "1h 12m 35s"} | |
| {"loss": 0.86887139, "token_acc": 0.73476128, "grad_norm": 2.37643099, "learning_rate": 3.92e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064145, "epoch": 1.02052159, "global_step/max_steps": "150/438", "percentage": "34.25%", "elapsed_time": "37m 39s", "remaining_time": "1h 12m 17s"} | |
| {"loss": 0.85906684, "token_acc": 0.73412973, "grad_norm": 2.36670995, "learning_rate": 3.9e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064197, "epoch": 1.02736212, "global_step/max_steps": "151/438", "percentage": "34.47%", "elapsed_time": "37m 52s", "remaining_time": "1h 11m 59s"} | |
| {"loss": 0.81556022, "token_acc": 0.75090323, "grad_norm": 2.94548702, "learning_rate": 3.89e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064254, "epoch": 1.03420265, "global_step/max_steps": "152/438", "percentage": "34.70%", "elapsed_time": "38m 6s", "remaining_time": "1h 11m 41s"} | |
| {"loss": 0.86946702, "token_acc": 0.73711725, "grad_norm": 3.11124063, "learning_rate": 3.87e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064305, "epoch": 1.04104318, "global_step/max_steps": "153/438", "percentage": "34.93%", "elapsed_time": "38m 19s", "remaining_time": "1h 11m 24s"} | |
| {"loss": 0.82285875, "token_acc": 0.74682666, "grad_norm": 2.5405066, "learning_rate": 3.86e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064354, "epoch": 1.04788371, "global_step/max_steps": "154/438", "percentage": "35.16%", "elapsed_time": "38m 33s", "remaining_time": "1h 11m 6s"} | |
| {"loss": 0.84390497, "token_acc": 0.74596422, "grad_norm": 3.07758474, "learning_rate": 3.84e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064408, "epoch": 1.05472424, "global_step/max_steps": "155/438", "percentage": "35.39%", "elapsed_time": "38m 47s", "remaining_time": "1h 10m 49s"} | |
| {"loss": 0.82624316, "token_acc": 0.73849518, "grad_norm": 2.84137797, "learning_rate": 3.83e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064467, "epoch": 1.06156477, "global_step/max_steps": "156/438", "percentage": "35.62%", "elapsed_time": "39m 0s", "remaining_time": "1h 10m 30s"} | |
| {"loss": 0.82533765, "token_acc": 0.7275661, "grad_norm": 2.7208724, "learning_rate": 3.81e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064518, "epoch": 1.0684053, "global_step/max_steps": "157/438", "percentage": "35.84%", "elapsed_time": "39m 14s", "remaining_time": "1h 10m 13s"} | |
| {"loss": 0.81975013, "token_acc": 0.74720358, "grad_norm": 2.35895419, "learning_rate": 3.79e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064571, "epoch": 1.07524583, "global_step/max_steps": "158/438", "percentage": "36.07%", "elapsed_time": "39m 27s", "remaining_time": "1h 9m 55s"} | |
| {"loss": 0.8332752, "token_acc": 0.71650705, "grad_norm": 2.60684419, "learning_rate": 3.78e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064629, "epoch": 1.08208636, "global_step/max_steps": "159/438", "percentage": "36.30%", "elapsed_time": "39m 40s", "remaining_time": "1h 9m 37s"} | |
| {"loss": 0.85839999, "token_acc": 0.76361235, "grad_norm": 2.65483141, "learning_rate": 3.76e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064677, "epoch": 1.08892689, "global_step/max_steps": "160/438", "percentage": "36.53%", "elapsed_time": "39m 54s", "remaining_time": "1h 9m 20s"} | |
| {"loss": 0.83146179, "token_acc": 0.74669718, "grad_norm": 2.30899024, "learning_rate": 3.74e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064731, "epoch": 1.09576742, "global_step/max_steps": "161/438", "percentage": "36.76%", "elapsed_time": "40m 7s", "remaining_time": "1h 9m 2s"} | |
| {"loss": 0.78907788, "token_acc": 0.74368427, "grad_norm": 2.35486507, "learning_rate": 3.73e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064774, "epoch": 1.10260795, "global_step/max_steps": "162/438", "percentage": "36.99%", "elapsed_time": "40m 21s", "remaining_time": "1h 8m 45s"} | |
| {"loss": 0.81396461, "token_acc": 0.75407363, "grad_norm": 2.16729355, "learning_rate": 3.71e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064824, "epoch": 1.10944848, "global_step/max_steps": "163/438", "percentage": "37.21%", "elapsed_time": "40m 35s", "remaining_time": "1h 8m 28s"} | |
| {"loss": 0.82537532, "token_acc": 0.74428571, "grad_norm": 2.30538964, "learning_rate": 3.7e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064871, "epoch": 1.11628901, "global_step/max_steps": "164/438", "percentage": "37.44%", "elapsed_time": "40m 48s", "remaining_time": "1h 8m 11s"} | |
| {"loss": 0.82059765, "token_acc": 0.7518906, "grad_norm": 2.14769077, "learning_rate": 3.68e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06492, "epoch": 1.12312954, "global_step/max_steps": "165/438", "percentage": "37.67%", "elapsed_time": "41m 2s", "remaining_time": "1h 7m 53s"} | |
| {"loss": 0.8073501, "token_acc": 0.74353857, "grad_norm": 2.29713583, "learning_rate": 3.66e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064971, "epoch": 1.12997007, "global_step/max_steps": "166/438", "percentage": "37.90%", "elapsed_time": "41m 15s", "remaining_time": "1h 7m 36s"} | |
| {"loss": 0.80840313, "token_acc": 0.75584148, "grad_norm": 2.23488927, "learning_rate": 3.65e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065021, "epoch": 1.1368106, "global_step/max_steps": "167/438", "percentage": "38.13%", "elapsed_time": "41m 29s", "remaining_time": "1h 7m 19s"} | |
| {"loss": 0.81069857, "token_acc": 0.75382423, "grad_norm": 2.33313251, "learning_rate": 3.63e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065067, "epoch": 1.14365113, "global_step/max_steps": "168/438", "percentage": "38.36%", "elapsed_time": "41m 42s", "remaining_time": "1h 7m 2s"} | |
| {"loss": 0.79178816, "token_acc": 0.76201742, "grad_norm": 2.3197906, "learning_rate": 3.61e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065105, "epoch": 1.15049166, "global_step/max_steps": "169/438", "percentage": "38.58%", "elapsed_time": "41m 56s", "remaining_time": "1h 6m 45s"} | |
| {"loss": 0.81671697, "token_acc": 0.7507794, "grad_norm": 2.34021521, "learning_rate": 3.59e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065145, "epoch": 1.15733219, "global_step/max_steps": "170/438", "percentage": "38.81%", "elapsed_time": "42m 10s", "remaining_time": "1h 6m 28s"} | |
| {"loss": 0.78854477, "token_acc": 0.77511456, "grad_norm": 2.14898157, "learning_rate": 3.58e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065186, "epoch": 1.16417272, "global_step/max_steps": "171/438", "percentage": "39.04%", "elapsed_time": "42m 23s", "remaining_time": "1h 6m 12s"} | |
| {"loss": 0.7933774, "token_acc": 0.75049564, "grad_norm": 2.26571107, "learning_rate": 3.56e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065235, "epoch": 1.17101325, "global_step/max_steps": "172/438", "percentage": "39.27%", "elapsed_time": "42m 37s", "remaining_time": "1h 5m 54s"} | |
| {"loss": 0.79885399, "token_acc": 0.74213339, "grad_norm": 2.30276465, "learning_rate": 3.54e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065271, "epoch": 1.17785378, "global_step/max_steps": "173/438", "percentage": "39.50%", "elapsed_time": "42m 51s", "remaining_time": "1h 5m 38s"} | |
| {"loss": 0.81437981, "token_acc": 0.74503635, "grad_norm": 2.21942687, "learning_rate": 3.53e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065312, "epoch": 1.18469431, "global_step/max_steps": "174/438", "percentage": "39.73%", "elapsed_time": "43m 4s", "remaining_time": "1h 5m 21s"} | |
| {"loss": 0.80905366, "token_acc": 0.75095548, "grad_norm": 2.13464761, "learning_rate": 3.51e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065353, "epoch": 1.19153484, "global_step/max_steps": "175/438", "percentage": "39.95%", "elapsed_time": "43m 18s", "remaining_time": "1h 5m 5s"} | |
| {"loss": 0.81485403, "token_acc": 0.76280242, "grad_norm": 2.20754671, "learning_rate": 3.49e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065397, "epoch": 1.19837537, "global_step/max_steps": "176/438", "percentage": "40.18%", "elapsed_time": "43m 31s", "remaining_time": "1h 4m 48s"} | |
| {"loss": 0.79710519, "token_acc": 0.75989526, "grad_norm": 2.15477324, "learning_rate": 3.47e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065444, "epoch": 1.2052159, "global_step/max_steps": "177/438", "percentage": "40.41%", "elapsed_time": "43m 45s", "remaining_time": "1h 4m 31s"} | |
| {"loss": 0.81676149, "token_acc": 0.74868643, "grad_norm": 2.13530111, "learning_rate": 3.46e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065482, "epoch": 1.21205643, "global_step/max_steps": "178/438", "percentage": "40.64%", "elapsed_time": "43m 59s", "remaining_time": "1h 4m 14s"} | |
| {"loss": 0.80099833, "token_acc": 0.7756896, "grad_norm": 2.17409253, "learning_rate": 3.44e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065516, "epoch": 1.21889696, "global_step/max_steps": "179/438", "percentage": "40.87%", "elapsed_time": "44m 12s", "remaining_time": "1h 3m 58s"} | |
| {"loss": 0.83853233, "token_acc": 0.73276869, "grad_norm": 2.28290987, "learning_rate": 3.42e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065556, "epoch": 1.22573749, "global_step/max_steps": "180/438", "percentage": "41.10%", "elapsed_time": "44m 26s", "remaining_time": "1h 3m 41s"} | |
| {"loss": 0.79771173, "token_acc": 0.75507657, "grad_norm": 2.4223516, "learning_rate": 3.4e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065598, "epoch": 1.23257802, "global_step/max_steps": "181/438", "percentage": "41.32%", "elapsed_time": "44m 39s", "remaining_time": "1h 3m 25s"} | |
| {"loss": 0.79670918, "token_acc": 0.77750309, "grad_norm": 2.20744729, "learning_rate": 3.39e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065644, "epoch": 1.23941855, "global_step/max_steps": "182/438", "percentage": "41.55%", "elapsed_time": "44m 53s", "remaining_time": "1h 3m 8s"} | |
| {"loss": 0.77120924, "token_acc": 0.75651303, "grad_norm": 2.26727462, "learning_rate": 3.37e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065686, "epoch": 1.24625909, "global_step/max_steps": "183/438", "percentage": "41.78%", "elapsed_time": "45m 6s", "remaining_time": "1h 2m 51s"} | |
| {"loss": 0.80533761, "token_acc": 0.75386377, "grad_norm": 2.18146992, "learning_rate": 3.35e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065722, "epoch": 1.25309962, "global_step/max_steps": "184/438", "percentage": "42.01%", "elapsed_time": "45m 20s", "remaining_time": "1h 2m 35s"} | |
| {"loss": 0.80800486, "token_acc": 0.73734431, "grad_norm": 2.32830071, "learning_rate": 3.33e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065756, "epoch": 1.25994015, "global_step/max_steps": "185/438", "percentage": "42.24%", "elapsed_time": "45m 34s", "remaining_time": "1h 2m 19s"} | |
| {"loss": 0.81566858, "token_acc": 0.76173741, "grad_norm": 2.15495849, "learning_rate": 3.32e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065792, "epoch": 1.26678068, "global_step/max_steps": "186/438", "percentage": "42.47%", "elapsed_time": "45m 47s", "remaining_time": "1h 2m 2s"} | |
| {"loss": 0.77903819, "token_acc": 0.74939784, "grad_norm": 2.05213571, "learning_rate": 3.3e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065836, "epoch": 1.27362121, "global_step/max_steps": "187/438", "percentage": "42.69%", "elapsed_time": "46m 1s", "remaining_time": "1h 1m 46s"} | |
| {"loss": 0.78718412, "token_acc": 0.75598656, "grad_norm": 2.09561968, "learning_rate": 3.28e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065871, "epoch": 1.28046174, "global_step/max_steps": "188/438", "percentage": "42.92%", "elapsed_time": "46m 14s", "remaining_time": "1h 1m 29s"} | |
| {"loss": 0.80823046, "token_acc": 0.75485409, "grad_norm": 2.08995128, "learning_rate": 3.26e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065907, "epoch": 1.28730227, "global_step/max_steps": "189/438", "percentage": "43.15%", "elapsed_time": "46m 28s", "remaining_time": "1h 1m 13s"} | |
| {"loss": 0.80047834, "token_acc": 0.76477644, "grad_norm": 2.14590549, "learning_rate": 3.24e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065946, "epoch": 1.2941428, "global_step/max_steps": "190/438", "percentage": "43.38%", "elapsed_time": "46m 41s", "remaining_time": "1h 0m 57s"} | |
| {"loss": 0.79757953, "token_acc": 0.75790779, "grad_norm": 2.16854358, "learning_rate": 3.23e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.065978, "epoch": 1.30098333, "global_step/max_steps": "191/438", "percentage": "43.61%", "elapsed_time": "46m 55s", "remaining_time": "1h 0m 41s"} | |
| {"loss": 0.80043441, "token_acc": 0.74897077, "grad_norm": 2.19995809, "learning_rate": 3.21e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066016, "epoch": 1.30782386, "global_step/max_steps": "192/438", "percentage": "43.84%", "elapsed_time": "47m 9s", "remaining_time": "1h 0m 24s"} | |
| {"loss": 0.80513364, "token_acc": 0.76095279, "grad_norm": 2.13512564, "learning_rate": 3.19e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066054, "epoch": 1.31466439, "global_step/max_steps": "193/438", "percentage": "44.06%", "elapsed_time": "47m 22s", "remaining_time": "1h 0m 8s"} | |
| {"loss": 0.80786967, "token_acc": 0.75628821, "grad_norm": 2.17346549, "learning_rate": 3.17e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066088, "epoch": 1.32150492, "global_step/max_steps": "194/438", "percentage": "44.29%", "elapsed_time": "47m 36s", "remaining_time": "59m 52s"} | |
| {"loss": 0.80035102, "token_acc": 0.74532003, "grad_norm": 2.13355994, "learning_rate": 3.15e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066124, "epoch": 1.32834545, "global_step/max_steps": "195/438", "percentage": "44.52%", "elapsed_time": "47m 49s", "remaining_time": "59m 36s"} | |
| {"loss": 0.79917967, "token_acc": 0.74585927, "grad_norm": 2.17028189, "learning_rate": 3.13e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06616, "epoch": 1.33518598, "global_step/max_steps": "196/438", "percentage": "44.75%", "elapsed_time": "48m 3s", "remaining_time": "59m 19s"} | |
| {"loss": 0.7829634, "token_acc": 0.73649352, "grad_norm": 2.1003294, "learning_rate": 3.12e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0662, "epoch": 1.34202651, "global_step/max_steps": "197/438", "percentage": "44.98%", "elapsed_time": "48m 16s", "remaining_time": "59m 3s"} | |
| {"loss": 0.83499759, "token_acc": 0.75417208, "grad_norm": 2.20376182, "learning_rate": 3.1e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066232, "epoch": 1.34886704, "global_step/max_steps": "198/438", "percentage": "45.21%", "elapsed_time": "48m 30s", "remaining_time": "58m 47s"} | |
| {"loss": 0.82829124, "token_acc": 0.75469007, "grad_norm": 2.15614057, "learning_rate": 3.08e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066266, "epoch": 1.35570757, "global_step/max_steps": "199/438", "percentage": "45.43%", "elapsed_time": "48m 43s", "remaining_time": "58m 31s"} | |
| {"loss": 0.82295626, "token_acc": 0.75484713, "grad_norm": 2.13961625, "learning_rate": 3.06e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.066302, "epoch": 1.3625481, "global_step/max_steps": "200/438", "percentage": "45.66%", "elapsed_time": "48m 57s", "remaining_time": "58m 15s"} | |
| {"eval_loss": 1.11335552, "eval_token_acc": 0.6851379, "eval_runtime": 18.8889, "eval_samples_per_second": 52.147, "eval_steps_per_second": 6.565, "epoch": 1.3625481, "global_step/max_steps": "200/438", "percentage": "45.66%", "elapsed_time": "49m 16s", "remaining_time": "58m 37s"} | |
| {"loss": 0.82565069, "token_acc": 0.69313595, "grad_norm": 2.15123749, "learning_rate": 3.04e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060573, "epoch": 1.36938863, "global_step/max_steps": "201/438", "percentage": "45.89%", "elapsed_time": "53m 59s", "remaining_time": "1h 3m 39s"} | |
| {"loss": 0.80379736, "token_acc": 0.75545852, "grad_norm": 2.15687871, "learning_rate": 3.02e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060629, "epoch": 1.37622916, "global_step/max_steps": "202/438", "percentage": "46.12%", "elapsed_time": "54m 12s", "remaining_time": "1h 3m 19s"} | |
| {"loss": 0.81765449, "token_acc": 0.74912677, "grad_norm": 2.34822059, "learning_rate": 3.01e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060679, "epoch": 1.38306969, "global_step/max_steps": "203/438", "percentage": "46.35%", "elapsed_time": "54m 26s", "remaining_time": "1h 3m 1s"} | |
| {"loss": 0.79931134, "token_acc": 0.74216205, "grad_norm": 2.13844681, "learning_rate": 2.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060734, "epoch": 1.38991022, "global_step/max_steps": "204/438", "percentage": "46.58%", "elapsed_time": "54m 39s", "remaining_time": "1h 2m 41s"} | |
| {"loss": 0.77114737, "token_acc": 0.74276903, "grad_norm": 2.33005166, "learning_rate": 2.97e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060793, "epoch": 1.39675075, "global_step/max_steps": "205/438", "percentage": "46.80%", "elapsed_time": "54m 52s", "remaining_time": "1h 2m 22s"} | |
| {"loss": 0.79651028, "token_acc": 0.74483172, "grad_norm": 2.13843513, "learning_rate": 2.95e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060847, "epoch": 1.40359128, "global_step/max_steps": "206/438", "percentage": "47.03%", "elapsed_time": "55m 6s", "remaining_time": "1h 2m 3s"} | |
| {"loss": 0.79994428, "token_acc": 0.75635292, "grad_norm": 2.17009759, "learning_rate": 2.93e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060902, "epoch": 1.41043181, "global_step/max_steps": "207/438", "percentage": "47.26%", "elapsed_time": "55m 19s", "remaining_time": "1h 1m 44s"} | |
| {"loss": 0.80059159, "token_acc": 0.74598808, "grad_norm": 2.26453829, "learning_rate": 2.91e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.060955, "epoch": 1.41727234, "global_step/max_steps": "208/438", "percentage": "47.49%", "elapsed_time": "55m 33s", "remaining_time": "1h 1m 25s"} | |
| {"loss": 0.7983858, "token_acc": 0.75752564, "grad_norm": 2.13891888, "learning_rate": 2.89e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06101, "epoch": 1.42411287, "global_step/max_steps": "209/438", "percentage": "47.72%", "elapsed_time": "55m 46s", "remaining_time": "1h 1m 6s"} | |
| {"loss": 0.81248677, "token_acc": 0.75444522, "grad_norm": 2.17773104, "learning_rate": 2.88e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06106, "epoch": 1.4309534, "global_step/max_steps": "210/438", "percentage": "47.95%", "elapsed_time": "55m 59s", "remaining_time": "1h 0m 47s"} | |
| {"loss": 0.80381238, "token_acc": 0.75810926, "grad_norm": 2.09110975, "learning_rate": 2.86e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061115, "epoch": 1.43779393, "global_step/max_steps": "211/438", "percentage": "48.17%", "elapsed_time": "56m 13s", "remaining_time": "1h 0m 29s"} | |
| {"loss": 0.80831271, "token_acc": 0.76020985, "grad_norm": 2.23463988, "learning_rate": 2.84e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061167, "epoch": 1.44463446, "global_step/max_steps": "212/438", "percentage": "48.40%", "elapsed_time": "56m 26s", "remaining_time": "1h 0m 10s"} | |
| {"loss": 0.80929482, "token_acc": 0.74652058, "grad_norm": 2.11457157, "learning_rate": 2.82e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061214, "epoch": 1.45147499, "global_step/max_steps": "213/438", "percentage": "48.63%", "elapsed_time": "56m 40s", "remaining_time": "59m 51s"} | |
| {"loss": 0.81993675, "token_acc": 0.74573413, "grad_norm": 2.15945768, "learning_rate": 2.8e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061264, "epoch": 1.45831552, "global_step/max_steps": "214/438", "percentage": "48.86%", "elapsed_time": "56m 53s", "remaining_time": "59m 33s"} | |
| {"loss": 0.82290787, "token_acc": 0.76030873, "grad_norm": 2.24140096, "learning_rate": 2.78e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061309, "epoch": 1.46515605, "global_step/max_steps": "215/438", "percentage": "49.09%", "elapsed_time": "57m 7s", "remaining_time": "59m 15s"} | |
| {"loss": 0.79570156, "token_acc": 0.76068376, "grad_norm": 2.2216506, "learning_rate": 2.76e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061352, "epoch": 1.47199658, "global_step/max_steps": "216/438", "percentage": "49.32%", "elapsed_time": "57m 21s", "remaining_time": "58m 56s"} | |
| {"loss": 0.78722608, "token_acc": 0.75513541, "grad_norm": 2.11924672, "learning_rate": 2.75e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0614, "epoch": 1.47883711, "global_step/max_steps": "217/438", "percentage": "49.54%", "elapsed_time": "57m 34s", "remaining_time": "58m 38s"} | |
| {"loss": 0.81181979, "token_acc": 0.76232275, "grad_norm": 2.19390202, "learning_rate": 2.73e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06145, "epoch": 1.48567764, "global_step/max_steps": "218/438", "percentage": "49.77%", "elapsed_time": "57m 48s", "remaining_time": "58m 20s"} | |
| {"loss": 0.8056758, "token_acc": 0.74222044, "grad_norm": 2.07812953, "learning_rate": 2.71e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061498, "epoch": 1.49251817, "global_step/max_steps": "219/438", "percentage": "50.00%", "elapsed_time": "58m 1s", "remaining_time": "58m 1s"} | |
| {"loss": 0.78949201, "token_acc": 0.75136079, "grad_norm": 2.14274168, "learning_rate": 2.69e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061544, "epoch": 1.4993587, "global_step/max_steps": "220/438", "percentage": "50.23%", "elapsed_time": "58m 15s", "remaining_time": "57m 43s"} | |
| {"loss": 0.82521194, "token_acc": 0.74212317, "grad_norm": 2.10073209, "learning_rate": 2.67e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061596, "epoch": 1.50619923, "global_step/max_steps": "221/438", "percentage": "50.46%", "elapsed_time": "58m 28s", "remaining_time": "57m 25s"} | |
| {"loss": 0.82503176, "token_acc": 0.74506211, "grad_norm": 2.12198186, "learning_rate": 2.65e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061642, "epoch": 1.51303976, "global_step/max_steps": "222/438", "percentage": "50.68%", "elapsed_time": "58m 42s", "remaining_time": "57m 6s"} | |
| {"loss": 0.81021166, "token_acc": 0.7592533, "grad_norm": 2.21558309, "learning_rate": 2.63e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061688, "epoch": 1.51988029, "global_step/max_steps": "223/438", "percentage": "50.91%", "elapsed_time": "58m 55s", "remaining_time": "56m 48s"} | |
| {"loss": 0.77191424, "token_acc": 0.77001666, "grad_norm": 2.31064939, "learning_rate": 2.61e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061737, "epoch": 1.52672082, "global_step/max_steps": "224/438", "percentage": "51.14%", "elapsed_time": "59m 8s", "remaining_time": "56m 30s"} | |
| {"loss": 0.81854141, "token_acc": 0.75500509, "grad_norm": 2.14440966, "learning_rate": 2.59e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061786, "epoch": 1.53356135, "global_step/max_steps": "225/438", "percentage": "51.37%", "elapsed_time": "59m 22s", "remaining_time": "56m 12s"} | |
| {"loss": 0.83804309, "token_acc": 0.74562946, "grad_norm": 2.3037262, "learning_rate": 2.58e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061829, "epoch": 1.54040188, "global_step/max_steps": "226/438", "percentage": "51.60%", "elapsed_time": "59m 35s", "remaining_time": "55m 54s"} | |
| {"loss": 0.81949747, "token_acc": 0.75574805, "grad_norm": 2.14727402, "learning_rate": 2.56e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061874, "epoch": 1.54724241, "global_step/max_steps": "227/438", "percentage": "51.83%", "elapsed_time": "59m 49s", "remaining_time": "55m 36s"} | |
| {"loss": 0.78034693, "token_acc": 0.76871462, "grad_norm": 2.06493044, "learning_rate": 2.54e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061919, "epoch": 1.55408294, "global_step/max_steps": "228/438", "percentage": "52.05%", "elapsed_time": "1h 0m 2s", "remaining_time": "55m 18s"} | |
| {"loss": 0.8253026, "token_acc": 0.76570541, "grad_norm": 2.17582178, "learning_rate": 2.52e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061963, "epoch": 1.56092347, "global_step/max_steps": "229/438", "percentage": "52.28%", "elapsed_time": "1h 0m 16s", "remaining_time": "55m 0s"} | |
| {"loss": 0.80877966, "token_acc": 0.75679572, "grad_norm": 2.08991623, "learning_rate": 2.5e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06201, "epoch": 1.567764, "global_step/max_steps": "230/438", "percentage": "52.51%", "elapsed_time": "1h 0m 29s", "remaining_time": "54m 42s"} | |
| {"loss": 0.80574703, "token_acc": 0.76006441, "grad_norm": 2.12185264, "learning_rate": 2.48e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062056, "epoch": 1.57460453, "global_step/max_steps": "231/438", "percentage": "52.74%", "elapsed_time": "1h 0m 43s", "remaining_time": "54m 24s"} | |
| {"loss": 0.81953931, "token_acc": 0.74078286, "grad_norm": 2.07525945, "learning_rate": 2.46e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062099, "epoch": 1.58144506, "global_step/max_steps": "232/438", "percentage": "52.97%", "elapsed_time": "1h 0m 56s", "remaining_time": "54m 6s"} | |
| {"loss": 0.8068521, "token_acc": 0.74693834, "grad_norm": 2.12685728, "learning_rate": 2.44e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062144, "epoch": 1.58828559, "global_step/max_steps": "233/438", "percentage": "53.20%", "elapsed_time": "1h 1m 10s", "remaining_time": "53m 49s"} | |
| {"loss": 0.82605815, "token_acc": 0.73855616, "grad_norm": 2.19582653, "learning_rate": 2.42e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062186, "epoch": 1.59512612, "global_step/max_steps": "234/438", "percentage": "53.42%", "elapsed_time": "1h 1m 23s", "remaining_time": "53m 31s"} | |
| {"loss": 0.81047529, "token_acc": 0.7432972, "grad_norm": 2.16632175, "learning_rate": 2.41e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062228, "epoch": 1.60196665, "global_step/max_steps": "235/438", "percentage": "53.65%", "elapsed_time": "1h 1m 37s", "remaining_time": "53m 13s"} | |
| {"loss": 0.80809045, "token_acc": 0.75401575, "grad_norm": 2.08405256, "learning_rate": 2.39e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062271, "epoch": 1.60880718, "global_step/max_steps": "236/438", "percentage": "53.88%", "elapsed_time": "1h 1m 50s", "remaining_time": "52m 56s"} | |
| {"loss": 0.79967946, "token_acc": 0.75883019, "grad_norm": 2.0533638, "learning_rate": 2.37e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062317, "epoch": 1.61564771, "global_step/max_steps": "237/438", "percentage": "54.11%", "elapsed_time": "1h 2m 3s", "remaining_time": "52m 38s"} | |
| {"loss": 0.81557751, "token_acc": 0.75299635, "grad_norm": 2.11085987, "learning_rate": 2.35e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062358, "epoch": 1.62248824, "global_step/max_steps": "238/438", "percentage": "54.34%", "elapsed_time": "1h 2m 17s", "remaining_time": "52m 20s"} | |
| {"loss": 0.81775439, "token_acc": 0.74023578, "grad_norm": 3.46187067, "learning_rate": 2.33e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0624, "epoch": 1.62932877, "global_step/max_steps": "239/438", "percentage": "54.57%", "elapsed_time": "1h 2m 30s", "remaining_time": "52m 3s"} | |
| {"loss": 0.8151561, "token_acc": 0.75462431, "grad_norm": 2.28983927, "learning_rate": 2.31e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062437, "epoch": 1.6361693, "global_step/max_steps": "240/438", "percentage": "54.79%", "elapsed_time": "1h 2m 44s", "remaining_time": "51m 45s"} | |
| {"loss": 0.7993176, "token_acc": 0.76790451, "grad_norm": 2.14071321, "learning_rate": 2.29e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062482, "epoch": 1.64300983, "global_step/max_steps": "241/438", "percentage": "55.02%", "elapsed_time": "1h 2m 57s", "remaining_time": "51m 28s"} | |
| {"loss": 0.82974482, "token_acc": 0.72892435, "grad_norm": 2.15133953, "learning_rate": 2.27e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06252, "epoch": 1.64985036, "global_step/max_steps": "242/438", "percentage": "55.25%", "elapsed_time": "1h 3m 11s", "remaining_time": "51m 10s"} | |
| {"loss": 0.80829132, "token_acc": 0.76279559, "grad_norm": 2.16442823, "learning_rate": 2.25e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06256, "epoch": 1.65669089, "global_step/max_steps": "243/438", "percentage": "55.48%", "elapsed_time": "1h 3m 24s", "remaining_time": "50m 53s"} | |
| {"loss": 0.80542183, "token_acc": 0.76391818, "grad_norm": 2.10646749, "learning_rate": 2.24e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062593, "epoch": 1.66353142, "global_step/max_steps": "244/438", "percentage": "55.71%", "elapsed_time": "1h 3m 38s", "remaining_time": "50m 36s"} | |
| {"loss": 0.78409743, "token_acc": 0.75722307, "grad_norm": 2.09766436, "learning_rate": 2.22e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062633, "epoch": 1.67037195, "global_step/max_steps": "245/438", "percentage": "55.94%", "elapsed_time": "1h 3m 52s", "remaining_time": "50m 18s"} | |
| {"loss": 0.8236475, "token_acc": 0.76020074, "grad_norm": 2.23466611, "learning_rate": 2.2e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062672, "epoch": 1.67721248, "global_step/max_steps": "246/438", "percentage": "56.16%", "elapsed_time": "1h 4m 5s", "remaining_time": "50m 1s"} | |
| {"loss": 0.8219502, "token_acc": 0.74373626, "grad_norm": 2.13302851, "learning_rate": 2.18e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062713, "epoch": 1.68405301, "global_step/max_steps": "247/438", "percentage": "56.39%", "elapsed_time": "1h 4m 19s", "remaining_time": "49m 44s"} | |
| {"loss": 0.80380005, "token_acc": 0.73728039, "grad_norm": 2.16181445, "learning_rate": 2.16e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062754, "epoch": 1.69089354, "global_step/max_steps": "248/438", "percentage": "56.62%", "elapsed_time": "1h 4m 32s", "remaining_time": "49m 26s"} | |
| {"loss": 0.82576221, "token_acc": 0.74850014, "grad_norm": 2.12576222, "learning_rate": 2.14e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062789, "epoch": 1.69773407, "global_step/max_steps": "249/438", "percentage": "56.85%", "elapsed_time": "1h 4m 46s", "remaining_time": "49m 9s"} | |
| {"loss": 0.79972994, "token_acc": 0.77055913, "grad_norm": 2.04634976, "learning_rate": 2.12e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062825, "epoch": 1.7045746, "global_step/max_steps": "250/438", "percentage": "57.08%", "elapsed_time": "1h 5m 0s", "remaining_time": "48m 52s"} | |
| {"loss": 0.82249838, "token_acc": 0.7526182, "grad_norm": 2.01988554, "learning_rate": 2.11e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062864, "epoch": 1.71141513, "global_step/max_steps": "251/438", "percentage": "57.31%", "elapsed_time": "1h 5m 13s", "remaining_time": "48m 35s"} | |
| {"loss": 0.80093408, "token_acc": 0.75906882, "grad_norm": 2.08013558, "learning_rate": 2.09e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062903, "epoch": 1.71825566, "global_step/max_steps": "252/438", "percentage": "57.53%", "elapsed_time": "1h 5m 26s", "remaining_time": "48m 18s"} | |
| {"loss": 0.80407757, "token_acc": 0.75268496, "grad_norm": 2.19546533, "learning_rate": 2.07e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062942, "epoch": 1.72509619, "global_step/max_steps": "253/438", "percentage": "57.76%", "elapsed_time": "1h 5m 40s", "remaining_time": "48m 1s"} | |
| {"loss": 0.80783117, "token_acc": 0.75796244, "grad_norm": 2.13104439, "learning_rate": 2.05e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06298, "epoch": 1.73193673, "global_step/max_steps": "254/438", "percentage": "57.99%", "elapsed_time": "1h 5m 53s", "remaining_time": "47m 44s"} | |
| {"loss": 0.79507875, "token_acc": 0.74558385, "grad_norm": 2.07990813, "learning_rate": 2.03e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063019, "epoch": 1.73877726, "global_step/max_steps": "255/438", "percentage": "58.22%", "elapsed_time": "1h 6m 7s", "remaining_time": "47m 26s"} | |
| {"loss": 0.81779832, "token_acc": 0.74881272, "grad_norm": 2.09516263, "learning_rate": 2.01e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063055, "epoch": 1.74561779, "global_step/max_steps": "256/438", "percentage": "58.45%", "elapsed_time": "1h 6m 20s", "remaining_time": "47m 9s"} | |
| {"loss": 0.8058514, "token_acc": 0.74854173, "grad_norm": 2.08688712, "learning_rate": 1.99e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063092, "epoch": 1.75245832, "global_step/max_steps": "257/438", "percentage": "58.68%", "elapsed_time": "1h 6m 34s", "remaining_time": "46m 52s"} | |
| {"loss": 0.78225756, "token_acc": 0.75294882, "grad_norm": 2.17280579, "learning_rate": 1.98e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063131, "epoch": 1.75929885, "global_step/max_steps": "258/438", "percentage": "58.90%", "elapsed_time": "1h 6m 47s", "remaining_time": "46m 35s"} | |
| {"loss": 0.80961639, "token_acc": 0.75707522, "grad_norm": 2.06212282, "learning_rate": 1.96e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063168, "epoch": 1.76613938, "global_step/max_steps": "259/438", "percentage": "59.13%", "elapsed_time": "1h 7m 0s", "remaining_time": "46m 18s"} | |
| {"loss": 0.81621003, "token_acc": 0.75036117, "grad_norm": 2.14935303, "learning_rate": 1.94e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063206, "epoch": 1.77297991, "global_step/max_steps": "260/438", "percentage": "59.36%", "elapsed_time": "1h 7m 14s", "remaining_time": "46m 1s"} | |
| {"loss": 0.82073611, "token_acc": 0.7515528, "grad_norm": 2.11798549, "learning_rate": 1.92e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063236, "epoch": 1.77982044, "global_step/max_steps": "261/438", "percentage": "59.59%", "elapsed_time": "1h 7m 28s", "remaining_time": "45m 45s"} | |
| {"loss": 0.8073144, "token_acc": 0.75966334, "grad_norm": 2.02955079, "learning_rate": 1.9e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063274, "epoch": 1.78666097, "global_step/max_steps": "262/438", "percentage": "59.82%", "elapsed_time": "1h 7m 41s", "remaining_time": "45m 28s"} | |
| {"loss": 0.80666697, "token_acc": 0.75360577, "grad_norm": 2.13641477, "learning_rate": 1.88e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063308, "epoch": 1.7935015, "global_step/max_steps": "263/438", "percentage": "60.05%", "elapsed_time": "1h 7m 55s", "remaining_time": "45m 11s"} | |
| {"loss": 0.81875402, "token_acc": 0.72282344, "grad_norm": 2.0440836, "learning_rate": 1.87e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063347, "epoch": 1.80034203, "global_step/max_steps": "264/438", "percentage": "60.27%", "elapsed_time": "1h 8m 8s", "remaining_time": "44m 54s"} | |
| {"loss": 0.79434562, "token_acc": 0.7672607, "grad_norm": 2.07998109, "learning_rate": 1.85e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063375, "epoch": 1.80718256, "global_step/max_steps": "265/438", "percentage": "60.50%", "elapsed_time": "1h 8m 22s", "remaining_time": "44m 37s"} | |
| {"loss": 0.83470309, "token_acc": 0.72700494, "grad_norm": 2.07748699, "learning_rate": 1.83e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06341, "epoch": 1.81402309, "global_step/max_steps": "266/438", "percentage": "60.73%", "elapsed_time": "1h 8m 35s", "remaining_time": "44m 21s"} | |
| {"loss": 0.80593896, "token_acc": 0.76676896, "grad_norm": 2.06038451, "learning_rate": 1.81e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063445, "epoch": 1.82086362, "global_step/max_steps": "267/438", "percentage": "60.96%", "elapsed_time": "1h 8m 49s", "remaining_time": "44m 4s"} | |
| {"loss": 0.79821169, "token_acc": 0.74754755, "grad_norm": 2.07918334, "learning_rate": 1.79e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063478, "epoch": 1.82770415, "global_step/max_steps": "268/438", "percentage": "61.19%", "elapsed_time": "1h 9m 2s", "remaining_time": "43m 47s"} | |
| {"loss": 0.79375684, "token_acc": 0.77142236, "grad_norm": 2.04859662, "learning_rate": 1.77e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063514, "epoch": 1.83454468, "global_step/max_steps": "269/438", "percentage": "61.42%", "elapsed_time": "1h 9m 15s", "remaining_time": "43m 31s"} | |
| {"loss": 0.81057572, "token_acc": 0.75475529, "grad_norm": 2.43997574, "learning_rate": 1.76e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063552, "epoch": 1.84138521, "global_step/max_steps": "270/438", "percentage": "61.64%", "elapsed_time": "1h 9m 29s", "remaining_time": "43m 14s"} | |
| {"loss": 0.77705377, "token_acc": 0.7597307, "grad_norm": 2.03213692, "learning_rate": 1.74e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063583, "epoch": 1.84822574, "global_step/max_steps": "271/438", "percentage": "61.87%", "elapsed_time": "1h 9m 42s", "remaining_time": "42m 57s"} | |
| {"loss": 0.80197304, "token_acc": 0.73811192, "grad_norm": 2.03455043, "learning_rate": 1.72e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063618, "epoch": 1.85506627, "global_step/max_steps": "272/438", "percentage": "62.10%", "elapsed_time": "1h 9m 56s", "remaining_time": "42m 40s"} | |
| {"loss": 0.79902804, "token_acc": 0.73994836, "grad_norm": 1.96878231, "learning_rate": 1.7e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063648, "epoch": 1.8619068, "global_step/max_steps": "273/438", "percentage": "62.33%", "elapsed_time": "1h 10m 9s", "remaining_time": "42m 24s"} | |
| {"loss": 0.81818569, "token_acc": 0.73944439, "grad_norm": 2.05183911, "learning_rate": 1.68e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063682, "epoch": 1.86874733, "global_step/max_steps": "274/438", "percentage": "62.56%", "elapsed_time": "1h 10m 23s", "remaining_time": "42m 7s"} | |
| {"loss": 0.80987239, "token_acc": 0.75834446, "grad_norm": 2.19405222, "learning_rate": 1.67e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063713, "epoch": 1.87558786, "global_step/max_steps": "275/438", "percentage": "62.79%", "elapsed_time": "1h 10m 36s", "remaining_time": "41m 51s"} | |
| {"loss": 0.7968806, "token_acc": 0.74665304, "grad_norm": 2.11199641, "learning_rate": 1.65e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063749, "epoch": 1.88242839, "global_step/max_steps": "276/438", "percentage": "63.01%", "elapsed_time": "1h 10m 50s", "remaining_time": "41m 34s"} | |
| {"loss": 0.792063, "token_acc": 0.7628131, "grad_norm": 1.98221111, "learning_rate": 1.63e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063778, "epoch": 1.88926892, "global_step/max_steps": "277/438", "percentage": "63.24%", "elapsed_time": "1h 11m 3s", "remaining_time": "41m 18s"} | |
| {"loss": 0.83340865, "token_acc": 0.75748691, "grad_norm": 2.05525684, "learning_rate": 1.61e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063808, "epoch": 1.89610945, "global_step/max_steps": "278/438", "percentage": "63.47%", "elapsed_time": "1h 11m 17s", "remaining_time": "41m 1s"} | |
| {"loss": 0.80428755, "token_acc": 0.74424231, "grad_norm": 2.10756755, "learning_rate": 1.6e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063843, "epoch": 1.90294998, "global_step/max_steps": "279/438", "percentage": "63.70%", "elapsed_time": "1h 11m 30s", "remaining_time": "40m 45s"} | |
| {"loss": 0.79609728, "token_acc": 0.75835012, "grad_norm": 2.04904604, "learning_rate": 1.58e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063874, "epoch": 1.90979051, "global_step/max_steps": "280/438", "percentage": "63.93%", "elapsed_time": "1h 11m 44s", "remaining_time": "40m 28s"} | |
| {"loss": 0.82323515, "token_acc": 0.74903394, "grad_norm": 2.0917604, "learning_rate": 1.56e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063907, "epoch": 1.91663104, "global_step/max_steps": "281/438", "percentage": "64.16%", "elapsed_time": "1h 11m 57s", "remaining_time": "40m 12s"} | |
| {"loss": 0.80421281, "token_acc": 0.75313558, "grad_norm": 2.03753281, "learning_rate": 1.54e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063936, "epoch": 1.92347157, "global_step/max_steps": "282/438", "percentage": "64.38%", "elapsed_time": "1h 12m 11s", "remaining_time": "39m 56s"} | |
| {"loss": 0.80596584, "token_acc": 0.7426268, "grad_norm": 2.00357509, "learning_rate": 1.53e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063961, "epoch": 1.9303121, "global_step/max_steps": "283/438", "percentage": "64.61%", "elapsed_time": "1h 12m 25s", "remaining_time": "39m 39s"} | |
| {"loss": 0.79782355, "token_acc": 0.75757576, "grad_norm": 2.0615685, "learning_rate": 1.51e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063995, "epoch": 1.93715263, "global_step/max_steps": "284/438", "percentage": "64.84%", "elapsed_time": "1h 12m 38s", "remaining_time": "39m 23s"} | |
| {"loss": 0.81198698, "token_acc": 0.76679723, "grad_norm": 2.05550098, "learning_rate": 1.49e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064024, "epoch": 1.94399316, "global_step/max_steps": "285/438", "percentage": "65.07%", "elapsed_time": "1h 12m 52s", "remaining_time": "39m 7s"} | |
| {"loss": 0.80396396, "token_acc": 0.76043999, "grad_norm": 2.14291859, "learning_rate": 1.47e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064058, "epoch": 1.95083369, "global_step/max_steps": "286/438", "percentage": "65.30%", "elapsed_time": "1h 13m 5s", "remaining_time": "38m 50s"} | |
| {"loss": 0.81342399, "token_acc": 0.76105175, "grad_norm": 2.01350975, "learning_rate": 1.46e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064083, "epoch": 1.95767422, "global_step/max_steps": "287/438", "percentage": "65.53%", "elapsed_time": "1h 13m 19s", "remaining_time": "38m 34s"} | |
| {"loss": 0.79982054, "token_acc": 0.76418749, "grad_norm": 1.95146668, "learning_rate": 1.44e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064113, "epoch": 1.96451475, "global_step/max_steps": "288/438", "percentage": "65.75%", "elapsed_time": "1h 13m 32s", "remaining_time": "38m 18s"} | |
| {"loss": 0.81999683, "token_acc": 0.7582213, "grad_norm": 2.0420413, "learning_rate": 1.42e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064141, "epoch": 1.97135528, "global_step/max_steps": "289/438", "percentage": "65.98%", "elapsed_time": "1h 13m 46s", "remaining_time": "38m 2s"} | |
| {"loss": 0.80862862, "token_acc": 0.75725815, "grad_norm": 2.14618874, "learning_rate": 1.41e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064164, "epoch": 1.97819581, "global_step/max_steps": "290/438", "percentage": "66.21%", "elapsed_time": "1h 14m 0s", "remaining_time": "37m 46s"} | |
| {"loss": 0.80883503, "token_acc": 0.75600699, "grad_norm": 2.12935328, "learning_rate": 1.39e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064189, "epoch": 1.98503634, "global_step/max_steps": "291/438", "percentage": "66.44%", "elapsed_time": "1h 14m 14s", "remaining_time": "37m 30s"} | |
| {"loss": 0.79838026, "token_acc": 0.7461056, "grad_norm": 1.94632268, "learning_rate": 1.37e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064216, "epoch": 1.99187687, "global_step/max_steps": "292/438", "percentage": "66.67%", "elapsed_time": "1h 14m 27s", "remaining_time": "37m 13s"} | |
| {"loss": 0.81084442, "token_acc": 0.7370438, "grad_norm": 1.98978913, "learning_rate": 1.35e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06424, "epoch": 1.9987174, "global_step/max_steps": "293/438", "percentage": "66.89%", "elapsed_time": "1h 14m 41s", "remaining_time": "36m 57s"} | |
| {"loss": 0.81482089, "token_acc": 0.73860911, "grad_norm": 1.98978913, "learning_rate": 1.34e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064424, "epoch": 2.0, "global_step/max_steps": "294/438", "percentage": "67.12%", "elapsed_time": "1h 14m 44s", "remaining_time": "36m 36s"} | |
| {"loss": 0.6082499, "token_acc": 0.82087393, "grad_norm": 6.30188847, "learning_rate": 1.32e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064447, "epoch": 2.00684053, "global_step/max_steps": "295/438", "percentage": "67.35%", "elapsed_time": "1h 14m 58s", "remaining_time": "36m 20s"} | |
| {"loss": 0.5499807, "token_acc": 0.83180789, "grad_norm": 6.23137093, "learning_rate": 1.3e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064475, "epoch": 2.01368106, "global_step/max_steps": "296/438", "percentage": "67.58%", "elapsed_time": "1h 15m 11s", "remaining_time": "36m 4s"} | |
| {"loss": 0.57290006, "token_acc": 0.80791249, "grad_norm": 5.29059887, "learning_rate": 1.29e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064499, "epoch": 2.02052159, "global_step/max_steps": "297/438", "percentage": "67.81%", "elapsed_time": "1h 15m 25s", "remaining_time": "35m 48s"} | |
| {"loss": 0.53303528, "token_acc": 0.8501811, "grad_norm": 3.70670414, "learning_rate": 1.27e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064525, "epoch": 2.02736212, "global_step/max_steps": "298/438", "percentage": "68.04%", "elapsed_time": "1h 15m 39s", "remaining_time": "35m 32s"} | |
| {"loss": 0.54291219, "token_acc": 0.83228637, "grad_norm": 3.49614334, "learning_rate": 1.26e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064546, "epoch": 2.03420265, "global_step/max_steps": "299/438", "percentage": "68.26%", "elapsed_time": "1h 15m 53s", "remaining_time": "35m 16s"} | |
| {"loss": 0.5641126, "token_acc": 0.82458014, "grad_norm": 5.72066593, "learning_rate": 1.24e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064568, "epoch": 2.04104318, "global_step/max_steps": "300/438", "percentage": "68.49%", "elapsed_time": "1h 16m 6s", "remaining_time": "35m 0s"} | |
| {"eval_loss": 1.18641031, "eval_token_acc": 0.68091035, "eval_runtime": 18.7296, "eval_samples_per_second": 52.59, "eval_steps_per_second": 6.621, "epoch": 2.04104318, "global_step/max_steps": "300/438", "percentage": "68.49%", "elapsed_time": "1h 16m 25s", "remaining_time": "35m 9s"} | |
| {"loss": 0.55863982, "token_acc": 0.6978733, "grad_norm": 6.91134453, "learning_rate": 1.22e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061394, "epoch": 2.04788371, "global_step/max_steps": "301/438", "percentage": "68.72%", "elapsed_time": "1h 20m 23s", "remaining_time": "36m 35s"} | |
| {"loss": 0.5557909, "token_acc": 0.8166509, "grad_norm": 5.86622143, "learning_rate": 1.21e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061433, "epoch": 2.05472424, "global_step/max_steps": "302/438", "percentage": "68.95%", "elapsed_time": "1h 20m 36s", "remaining_time": "36m 18s"} | |
| {"loss": 0.53551131, "token_acc": 0.83979975, "grad_norm": 4.45468283, "learning_rate": 1.19e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061462, "epoch": 2.06156477, "global_step/max_steps": "303/438", "percentage": "69.18%", "elapsed_time": "1h 20m 50s", "remaining_time": "36m 1s"} | |
| {"loss": 0.55207813, "token_acc": 0.83199269, "grad_norm": 3.18589664, "learning_rate": 1.17e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061496, "epoch": 2.0684053, "global_step/max_steps": "304/438", "percentage": "69.41%", "elapsed_time": "1h 21m 4s", "remaining_time": "35m 44s"} | |
| {"loss": 0.54873967, "token_acc": 0.81280116, "grad_norm": 3.1732347, "learning_rate": 1.16e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061529, "epoch": 2.07524583, "global_step/max_steps": "305/438", "percentage": "69.63%", "elapsed_time": "1h 21m 17s", "remaining_time": "35m 26s"} | |
| {"loss": 0.53435862, "token_acc": 0.83026264, "grad_norm": 3.2237165, "learning_rate": 1.14e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061561, "epoch": 2.08208636, "global_step/max_steps": "306/438", "percentage": "69.86%", "elapsed_time": "1h 21m 31s", "remaining_time": "35m 10s"} | |
| {"loss": 0.54218465, "token_acc": 0.82334505, "grad_norm": 3.46148062, "learning_rate": 1.13e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061596, "epoch": 2.08892689, "global_step/max_steps": "307/438", "percentage": "70.09%", "elapsed_time": "1h 21m 44s", "remaining_time": "34m 52s"} | |
| {"loss": 0.5443756, "token_acc": 0.83262188, "grad_norm": 3.20402622, "learning_rate": 1.11e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061632, "epoch": 2.09576742, "global_step/max_steps": "308/438", "percentage": "70.32%", "elapsed_time": "1h 21m 58s", "remaining_time": "34m 35s"} | |
| {"loss": 0.54601407, "token_acc": 0.8201928, "grad_norm": 2.9513948, "learning_rate": 1.1e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061668, "epoch": 2.10260795, "global_step/max_steps": "309/438", "percentage": "70.55%", "elapsed_time": "1h 22m 11s", "remaining_time": "34m 18s"} | |
| {"loss": 0.53452688, "token_acc": 0.82920406, "grad_norm": 2.65442038, "learning_rate": 1.08e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0617, "epoch": 2.10944848, "global_step/max_steps": "310/438", "percentage": "70.78%", "elapsed_time": "1h 22m 25s", "remaining_time": "34m 1s"} | |
| {"loss": 0.52102637, "token_acc": 0.83305613, "grad_norm": 2.52542734, "learning_rate": 1.06e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061727, "epoch": 2.11628901, "global_step/max_steps": "311/438", "percentage": "71.00%", "elapsed_time": "1h 22m 39s", "remaining_time": "33m 45s"} | |
| {"loss": 0.56271333, "token_acc": 0.8186461, "grad_norm": 2.79947948, "learning_rate": 1.05e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061759, "epoch": 2.12312954, "global_step/max_steps": "312/438", "percentage": "71.23%", "elapsed_time": "1h 22m 52s", "remaining_time": "33m 28s"} | |
| {"loss": 0.52794236, "token_acc": 0.83689899, "grad_norm": 2.80942941, "learning_rate": 1.03e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061791, "epoch": 2.12997007, "global_step/max_steps": "313/438", "percentage": "71.46%", "elapsed_time": "1h 23m 6s", "remaining_time": "33m 11s"} | |
| {"loss": 0.54783303, "token_acc": 0.85856388, "grad_norm": 2.7143631, "learning_rate": 1.02e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06182, "epoch": 2.1368106, "global_step/max_steps": "314/438", "percentage": "71.69%", "elapsed_time": "1h 23m 19s", "remaining_time": "32m 54s"} | |
| {"loss": 0.5266608, "token_acc": 0.82135562, "grad_norm": 2.81380129, "learning_rate": 1e-06, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061853, "epoch": 2.14365113, "global_step/max_steps": "315/438", "percentage": "71.92%", "elapsed_time": "1h 23m 33s", "remaining_time": "32m 37s"} | |
| {"loss": 0.54894793, "token_acc": 0.82768009, "grad_norm": 2.68940544, "learning_rate": 9.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061884, "epoch": 2.15049166, "global_step/max_steps": "316/438", "percentage": "72.15%", "elapsed_time": "1h 23m 47s", "remaining_time": "32m 20s"} | |
| {"loss": 0.54624975, "token_acc": 0.82663397, "grad_norm": 2.62640929, "learning_rate": 9.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061914, "epoch": 2.15733219, "global_step/max_steps": "317/438", "percentage": "72.37%", "elapsed_time": "1h 24m 0s", "remaining_time": "32m 4s"} | |
| {"loss": 0.51936674, "token_acc": 0.83039392, "grad_norm": 2.53439116, "learning_rate": 9.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061945, "epoch": 2.16417272, "global_step/max_steps": "318/438", "percentage": "72.60%", "elapsed_time": "1h 24m 14s", "remaining_time": "31m 47s"} | |
| {"loss": 0.51416045, "token_acc": 0.85647374, "grad_norm": 2.59942865, "learning_rate": 9.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06198, "epoch": 2.17101325, "global_step/max_steps": "319/438", "percentage": "72.83%", "elapsed_time": "1h 24m 27s", "remaining_time": "31m 30s"} | |
| {"loss": 0.5283615, "token_acc": 0.84981989, "grad_norm": 2.45987654, "learning_rate": 9.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062008, "epoch": 2.17785378, "global_step/max_steps": "320/438", "percentage": "73.06%", "elapsed_time": "1h 24m 41s", "remaining_time": "31m 13s"} | |
| {"loss": 0.51090842, "token_acc": 0.84513727, "grad_norm": 2.51493788, "learning_rate": 9.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06204, "epoch": 2.18469431, "global_step/max_steps": "321/438", "percentage": "73.29%", "elapsed_time": "1h 24m 54s", "remaining_time": "30m 56s"} | |
| {"loss": 0.52403641, "token_acc": 0.82590429, "grad_norm": 2.47504044, "learning_rate": 9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062077, "epoch": 2.19153484, "global_step/max_steps": "322/438", "percentage": "73.52%", "elapsed_time": "1h 25m 7s", "remaining_time": "30m 40s"} | |
| {"loss": 0.53065062, "token_acc": 0.83598972, "grad_norm": 2.58034277, "learning_rate": 8.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062109, "epoch": 2.19837537, "global_step/max_steps": "323/438", "percentage": "73.74%", "elapsed_time": "1h 25m 21s", "remaining_time": "30m 23s"} | |
| {"loss": 0.51585102, "token_acc": 0.83917341, "grad_norm": 2.54973459, "learning_rate": 8.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062142, "epoch": 2.2052159, "global_step/max_steps": "324/438", "percentage": "73.97%", "elapsed_time": "1h 25m 34s", "remaining_time": "30m 6s"} | |
| {"loss": 0.51437008, "token_acc": 0.82611631, "grad_norm": 2.58653927, "learning_rate": 8.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06217, "epoch": 2.21205643, "global_step/max_steps": "325/438", "percentage": "74.20%", "elapsed_time": "1h 25m 48s", "remaining_time": "29m 50s"} | |
| {"loss": 0.51771665, "token_acc": 0.84400038, "grad_norm": 2.52010202, "learning_rate": 8.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062201, "epoch": 2.21889696, "global_step/max_steps": "326/438", "percentage": "74.43%", "elapsed_time": "1h 26m 1s", "remaining_time": "29m 33s"} | |
| {"loss": 0.53029555, "token_acc": 0.82020984, "grad_norm": 2.52291369, "learning_rate": 8.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062231, "epoch": 2.22573749, "global_step/max_steps": "327/438", "percentage": "74.66%", "elapsed_time": "1h 26m 15s", "remaining_time": "29m 16s"} | |
| {"loss": 0.53785908, "token_acc": 0.8405561, "grad_norm": 2.51233006, "learning_rate": 8.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062255, "epoch": 2.23257802, "global_step/max_steps": "328/438", "percentage": "74.89%", "elapsed_time": "1h 26m 29s", "remaining_time": "29m 0s"} | |
| {"loss": 0.53006816, "token_acc": 0.82746781, "grad_norm": 2.41755438, "learning_rate": 8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062281, "epoch": 2.23941855, "global_step/max_steps": "329/438", "percentage": "75.11%", "elapsed_time": "1h 26m 43s", "remaining_time": "28m 43s"} | |
| {"loss": 0.52311009, "token_acc": 0.83036761, "grad_norm": 2.59230757, "learning_rate": 7.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062312, "epoch": 2.24625909, "global_step/max_steps": "330/438", "percentage": "75.34%", "elapsed_time": "1h 26m 56s", "remaining_time": "28m 27s"} | |
| {"loss": 0.53304905, "token_acc": 0.81996122, "grad_norm": 2.42506957, "learning_rate": 7.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062338, "epoch": 2.25309962, "global_step/max_steps": "331/438", "percentage": "75.57%", "elapsed_time": "1h 27m 10s", "remaining_time": "28m 10s"} | |
| {"loss": 0.53157401, "token_acc": 0.8396629, "grad_norm": 2.4816618, "learning_rate": 7.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062369, "epoch": 2.25994015, "global_step/max_steps": "332/438", "percentage": "75.80%", "elapsed_time": "1h 27m 23s", "remaining_time": "27m 54s"} | |
| {"loss": 0.50844705, "token_acc": 0.82729834, "grad_norm": 2.47605586, "learning_rate": 7.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.0624, "epoch": 2.26678068, "global_step/max_steps": "333/438", "percentage": "76.03%", "elapsed_time": "1h 27m 37s", "remaining_time": "27m 37s"} | |
| {"loss": 0.54557532, "token_acc": 0.82214156, "grad_norm": 2.5360806, "learning_rate": 7.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062429, "epoch": 2.27362121, "global_step/max_steps": "334/438", "percentage": "76.26%", "elapsed_time": "1h 27m 50s", "remaining_time": "27m 21s"} | |
| {"loss": 0.53563464, "token_acc": 0.81672235, "grad_norm": 2.75521755, "learning_rate": 7.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062455, "epoch": 2.28046174, "global_step/max_steps": "335/438", "percentage": "76.48%", "elapsed_time": "1h 28m 4s", "remaining_time": "27m 4s"} | |
| {"loss": 0.54578793, "token_acc": 0.84204832, "grad_norm": 2.67481351, "learning_rate": 7.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06248, "epoch": 2.28730227, "global_step/max_steps": "336/438", "percentage": "76.71%", "elapsed_time": "1h 28m 18s", "remaining_time": "26m 48s"} | |
| {"loss": 0.51656449, "token_acc": 0.84156543, "grad_norm": 2.586658, "learning_rate": 6.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062511, "epoch": 2.2941428, "global_step/max_steps": "337/438", "percentage": "76.94%", "elapsed_time": "1h 28m 31s", "remaining_time": "26m 31s"} | |
| {"loss": 0.52270561, "token_acc": 0.84020838, "grad_norm": 2.50846624, "learning_rate": 6.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062542, "epoch": 2.30098333, "global_step/max_steps": "338/438", "percentage": "77.17%", "elapsed_time": "1h 28m 45s", "remaining_time": "26m 15s"} | |
| {"loss": 0.52721918, "token_acc": 0.8530724, "grad_norm": 2.52371192, "learning_rate": 6.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062571, "epoch": 2.30782386, "global_step/max_steps": "339/438", "percentage": "77.40%", "elapsed_time": "1h 28m 58s", "remaining_time": "25m 59s"} | |
| {"loss": 0.51657742, "token_acc": 0.8444399, "grad_norm": 2.54019618, "learning_rate": 6.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062603, "epoch": 2.31466439, "global_step/max_steps": "340/438", "percentage": "77.63%", "elapsed_time": "1h 29m 11s", "remaining_time": "25m 42s"} | |
| {"loss": 0.5181973, "token_acc": 0.83146543, "grad_norm": 2.50294256, "learning_rate": 6.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062631, "epoch": 2.32150492, "global_step/max_steps": "341/438", "percentage": "77.85%", "elapsed_time": "1h 29m 25s", "remaining_time": "25m 26s"} | |
| {"loss": 0.51184905, "token_acc": 0.85130785, "grad_norm": 2.51500964, "learning_rate": 6.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06266, "epoch": 2.32834545, "global_step/max_steps": "342/438", "percentage": "78.08%", "elapsed_time": "1h 29m 38s", "remaining_time": "25m 9s"} | |
| {"loss": 0.52372706, "token_acc": 0.83313976, "grad_norm": 2.50826716, "learning_rate": 6.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06269, "epoch": 2.33518598, "global_step/max_steps": "343/438", "percentage": "78.31%", "elapsed_time": "1h 29m 52s", "remaining_time": "24m 53s"} | |
| {"loss": 0.51460689, "token_acc": 0.82737673, "grad_norm": 2.53574443, "learning_rate": 6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062717, "epoch": 2.34202651, "global_step/max_steps": "344/438", "percentage": "78.54%", "elapsed_time": "1h 30m 5s", "remaining_time": "24m 37s"} | |
| {"loss": 0.52191377, "token_acc": 0.83665972, "grad_norm": 2.50006032, "learning_rate": 5.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062745, "epoch": 2.34886704, "global_step/max_steps": "345/438", "percentage": "78.77%", "elapsed_time": "1h 30m 19s", "remaining_time": "24m 20s"} | |
| {"loss": 0.52046674, "token_acc": 0.84578015, "grad_norm": 2.51174545, "learning_rate": 5.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062773, "epoch": 2.35570757, "global_step/max_steps": "346/438", "percentage": "79.00%", "elapsed_time": "1h 30m 32s", "remaining_time": "24m 4s"} | |
| {"loss": 0.51725936, "token_acc": 0.82654903, "grad_norm": 2.47993875, "learning_rate": 5.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062799, "epoch": 2.3625481, "global_step/max_steps": "347/438", "percentage": "79.22%", "elapsed_time": "1h 30m 46s", "remaining_time": "23m 48s"} | |
| {"loss": 0.50381166, "token_acc": 0.84284306, "grad_norm": 2.50357151, "learning_rate": 5.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06283, "epoch": 2.36938863, "global_step/max_steps": "348/438", "percentage": "79.45%", "elapsed_time": "1h 30m 59s", "remaining_time": "23m 31s"} | |
| {"loss": 0.516312, "token_acc": 0.81890339, "grad_norm": 2.57196522, "learning_rate": 5.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062856, "epoch": 2.37622916, "global_step/max_steps": "349/438", "percentage": "79.68%", "elapsed_time": "1h 31m 13s", "remaining_time": "23m 15s"} | |
| {"loss": 0.51821649, "token_acc": 0.83522559, "grad_norm": 2.5079689, "learning_rate": 5.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062885, "epoch": 2.38306969, "global_step/max_steps": "350/438", "percentage": "79.91%", "elapsed_time": "1h 31m 26s", "remaining_time": "22m 59s"} | |
| {"loss": 0.52005374, "token_acc": 0.84559723, "grad_norm": 2.57892942, "learning_rate": 5.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062913, "epoch": 2.38991022, "global_step/max_steps": "351/438", "percentage": "80.14%", "elapsed_time": "1h 31m 39s", "remaining_time": "22m 43s"} | |
| {"loss": 0.516716, "token_acc": 0.83776796, "grad_norm": 2.5134542, "learning_rate": 5.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062939, "epoch": 2.39675075, "global_step/max_steps": "352/438", "percentage": "80.37%", "elapsed_time": "1h 31m 53s", "remaining_time": "22m 27s"} | |
| {"loss": 0.53499615, "token_acc": 0.84662983, "grad_norm": 2.53047943, "learning_rate": 5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062963, "epoch": 2.40359128, "global_step/max_steps": "353/438", "percentage": "80.59%", "elapsed_time": "1h 32m 7s", "remaining_time": "22m 10s"} | |
| {"loss": 0.51721936, "token_acc": 0.82761816, "grad_norm": 2.46074462, "learning_rate": 4.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06299, "epoch": 2.41043181, "global_step/max_steps": "354/438", "percentage": "80.82%", "elapsed_time": "1h 32m 20s", "remaining_time": "21m 54s"} | |
| {"loss": 0.51024115, "token_acc": 0.8322391, "grad_norm": 2.41461325, "learning_rate": 4.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063016, "epoch": 2.41727234, "global_step/max_steps": "355/438", "percentage": "81.05%", "elapsed_time": "1h 32m 34s", "remaining_time": "21m 38s"} | |
| {"loss": 0.50399548, "token_acc": 0.84677134, "grad_norm": 2.53878498, "learning_rate": 4.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063042, "epoch": 2.42411287, "global_step/max_steps": "356/438", "percentage": "81.28%", "elapsed_time": "1h 32m 47s", "remaining_time": "21m 22s"} | |
| {"loss": 0.49316406, "token_acc": 0.84243855, "grad_norm": 2.48908973, "learning_rate": 4.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063065, "epoch": 2.4309534, "global_step/max_steps": "357/438", "percentage": "81.51%", "elapsed_time": "1h 33m 1s", "remaining_time": "21m 6s"} | |
| {"loss": 0.51230168, "token_acc": 0.85498399, "grad_norm": 2.50412107, "learning_rate": 4.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06309, "epoch": 2.43779393, "global_step/max_steps": "358/438", "percentage": "81.74%", "elapsed_time": "1h 33m 15s", "remaining_time": "20m 50s"} | |
| {"loss": 0.50859952, "token_acc": 0.82217918, "grad_norm": 2.52799416, "learning_rate": 4.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063114, "epoch": 2.44463446, "global_step/max_steps": "359/438", "percentage": "81.96%", "elapsed_time": "1h 33m 28s", "remaining_time": "20m 34s"} | |
| {"loss": 0.53430343, "token_acc": 0.8502406, "grad_norm": 2.5144217, "learning_rate": 4.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063138, "epoch": 2.45147499, "global_step/max_steps": "360/438", "percentage": "82.19%", "elapsed_time": "1h 33m 42s", "remaining_time": "20m 18s"} | |
| {"loss": 0.53863055, "token_acc": 0.83702081, "grad_norm": 2.52731299, "learning_rate": 4.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063163, "epoch": 2.45831552, "global_step/max_steps": "361/438", "percentage": "82.42%", "elapsed_time": "1h 33m 56s", "remaining_time": "20m 2s"} | |
| {"loss": 0.51252604, "token_acc": 0.83024567, "grad_norm": 2.4897902, "learning_rate": 4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06319, "epoch": 2.46515605, "global_step/max_steps": "362/438", "percentage": "82.65%", "elapsed_time": "1h 34m 9s", "remaining_time": "19m 46s"} | |
| {"loss": 0.50375783, "token_acc": 0.85002906, "grad_norm": 2.51232481, "learning_rate": 3.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063216, "epoch": 2.47199658, "global_step/max_steps": "363/438", "percentage": "82.88%", "elapsed_time": "1h 34m 22s", "remaining_time": "19m 30s"} | |
| {"loss": 0.50739694, "token_acc": 0.83046757, "grad_norm": 2.76510954, "learning_rate": 3.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06324, "epoch": 2.47883711, "global_step/max_steps": "364/438", "percentage": "83.11%", "elapsed_time": "1h 34m 36s", "remaining_time": "19m 14s"} | |
| {"loss": 0.53586549, "token_acc": 0.83555601, "grad_norm": 2.59664702, "learning_rate": 3.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063266, "epoch": 2.48567764, "global_step/max_steps": "365/438", "percentage": "83.33%", "elapsed_time": "1h 34m 49s", "remaining_time": "18m 57s"} | |
| {"loss": 0.50125223, "token_acc": 0.84133173, "grad_norm": 2.46823907, "learning_rate": 3.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063291, "epoch": 2.49251817, "global_step/max_steps": "366/438", "percentage": "83.56%", "elapsed_time": "1h 35m 3s", "remaining_time": "18m 41s"} | |
| {"loss": 0.51589191, "token_acc": 0.83669807, "grad_norm": 2.45199895, "learning_rate": 3.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063316, "epoch": 2.4993587, "global_step/max_steps": "367/438", "percentage": "83.79%", "elapsed_time": "1h 35m 17s", "remaining_time": "18m 26s"} | |
| {"loss": 0.51533669, "token_acc": 0.85136314, "grad_norm": 2.55191875, "learning_rate": 3.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063339, "epoch": 2.50619923, "global_step/max_steps": "368/438", "percentage": "84.02%", "elapsed_time": "1h 35m 30s", "remaining_time": "18m 10s"} | |
| {"loss": 0.53273153, "token_acc": 0.81953838, "grad_norm": 2.45936298, "learning_rate": 3.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063367, "epoch": 2.51303976, "global_step/max_steps": "369/438", "percentage": "84.25%", "elapsed_time": "1h 35m 43s", "remaining_time": "17m 54s"} | |
| {"loss": 0.52729505, "token_acc": 0.85646538, "grad_norm": 2.5278182, "learning_rate": 3.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063394, "epoch": 2.51988029, "global_step/max_steps": "370/438", "percentage": "84.47%", "elapsed_time": "1h 35m 57s", "remaining_time": "17m 38s"} | |
| {"loss": 0.50268257, "token_acc": 0.83338708, "grad_norm": 2.3734467, "learning_rate": 3.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063419, "epoch": 2.52672082, "global_step/max_steps": "371/438", "percentage": "84.70%", "elapsed_time": "1h 36m 10s", "remaining_time": "17m 22s"} | |
| {"loss": 0.51780176, "token_acc": 0.8516129, "grad_norm": 2.5273025, "learning_rate": 3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063444, "epoch": 2.53356135, "global_step/max_steps": "372/438", "percentage": "84.93%", "elapsed_time": "1h 36m 24s", "remaining_time": "17m 6s"} | |
| {"loss": 0.51935709, "token_acc": 0.81667499, "grad_norm": 2.40989971, "learning_rate": 3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063466, "epoch": 2.54040188, "global_step/max_steps": "373/438", "percentage": "85.16%", "elapsed_time": "1h 36m 37s", "remaining_time": "16m 50s"} | |
| {"loss": 0.50259888, "token_acc": 0.83633857, "grad_norm": 2.42685151, "learning_rate": 2.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06349, "epoch": 2.54724241, "global_step/max_steps": "374/438", "percentage": "85.39%", "elapsed_time": "1h 36m 51s", "remaining_time": "16m 34s"} | |
| {"loss": 0.53072357, "token_acc": 0.84753185, "grad_norm": 2.57127285, "learning_rate": 2.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063516, "epoch": 2.55408294, "global_step/max_steps": "375/438", "percentage": "85.62%", "elapsed_time": "1h 37m 4s", "remaining_time": "16m 18s"} | |
| {"loss": 0.53114378, "token_acc": 0.84484193, "grad_norm": 2.40793085, "learning_rate": 2.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063537, "epoch": 2.56092347, "global_step/max_steps": "376/438", "percentage": "85.84%", "elapsed_time": "1h 37m 18s", "remaining_time": "16m 2s"} | |
| {"loss": 0.52485871, "token_acc": 0.83501234, "grad_norm": 2.84673977, "learning_rate": 2.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063563, "epoch": 2.567764, "global_step/max_steps": "377/438", "percentage": "86.07%", "elapsed_time": "1h 37m 31s", "remaining_time": "15m 46s"} | |
| {"loss": 0.54551798, "token_acc": 0.84362184, "grad_norm": 2.56867599, "learning_rate": 2.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063588, "epoch": 2.57460453, "global_step/max_steps": "378/438", "percentage": "86.30%", "elapsed_time": "1h 37m 45s", "remaining_time": "15m 30s"} | |
| {"loss": 0.53701639, "token_acc": 0.82898842, "grad_norm": 2.52074194, "learning_rate": 2.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063614, "epoch": 2.58144506, "global_step/max_steps": "379/438", "percentage": "86.53%", "elapsed_time": "1h 37m 58s", "remaining_time": "15m 15s"} | |
| {"loss": 0.52585953, "token_acc": 0.82580852, "grad_norm": 2.64235187, "learning_rate": 2.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06364, "epoch": 2.58828559, "global_step/max_steps": "380/438", "percentage": "86.76%", "elapsed_time": "1h 38m 11s", "remaining_time": "14m 59s"} | |
| {"loss": 0.51939797, "token_acc": 0.839556, "grad_norm": 2.5273273, "learning_rate": 2.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063661, "epoch": 2.59512612, "global_step/max_steps": "381/438", "percentage": "86.99%", "elapsed_time": "1h 38m 25s", "remaining_time": "14m 43s"} | |
| {"loss": 0.51489806, "token_acc": 0.84687228, "grad_norm": 2.43460679, "learning_rate": 2.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063682, "epoch": 2.60196665, "global_step/max_steps": "382/438", "percentage": "87.21%", "elapsed_time": "1h 38m 39s", "remaining_time": "14m 27s"} | |
| {"loss": 0.51843989, "token_acc": 0.82902804, "grad_norm": 2.53402662, "learning_rate": 2.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063706, "epoch": 2.60880718, "global_step/max_steps": "383/438", "percentage": "87.44%", "elapsed_time": "1h 38m 52s", "remaining_time": "14m 11s"} | |
| {"loss": 0.50924456, "token_acc": 0.8456259, "grad_norm": 2.47531605, "learning_rate": 2.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063727, "epoch": 2.61564771, "global_step/max_steps": "384/438", "percentage": "87.67%", "elapsed_time": "1h 39m 6s", "remaining_time": "13m 56s"} | |
| {"loss": 0.51417923, "token_acc": 0.83179279, "grad_norm": 2.46097445, "learning_rate": 2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063752, "epoch": 2.62248824, "global_step/max_steps": "385/438", "percentage": "87.90%", "elapsed_time": "1h 39m 19s", "remaining_time": "13m 40s"} | |
| {"loss": 0.53958583, "token_acc": 0.82783483, "grad_norm": 2.4744308, "learning_rate": 1.9e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063776, "epoch": 2.62932877, "global_step/max_steps": "386/438", "percentage": "88.13%", "elapsed_time": "1h 39m 33s", "remaining_time": "13m 24s"} | |
| {"loss": 0.49264601, "token_acc": 0.83850235, "grad_norm": 2.52190304, "learning_rate": 1.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063798, "epoch": 2.6361693, "global_step/max_steps": "387/438", "percentage": "88.36%", "elapsed_time": "1h 39m 46s", "remaining_time": "13m 8s"} | |
| {"loss": 0.52458477, "token_acc": 0.81327924, "grad_norm": 2.42457008, "learning_rate": 1.8e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06382, "epoch": 2.64300983, "global_step/max_steps": "388/438", "percentage": "88.58%", "elapsed_time": "1h 40m 0s", "remaining_time": "12m 53s"} | |
| {"loss": 0.52787828, "token_acc": 0.82469423, "grad_norm": 2.50938058, "learning_rate": 1.7e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063841, "epoch": 2.64985036, "global_step/max_steps": "389/438", "percentage": "88.81%", "elapsed_time": "1h 40m 13s", "remaining_time": "12m 37s"} | |
| {"loss": 0.50351691, "token_acc": 0.84404345, "grad_norm": 2.42199969, "learning_rate": 1.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063866, "epoch": 2.65669089, "global_step/max_steps": "390/438", "percentage": "89.04%", "elapsed_time": "1h 40m 27s", "remaining_time": "12m 21s"} | |
| {"loss": 0.52700698, "token_acc": 0.82948756, "grad_norm": 2.48994899, "learning_rate": 1.6e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063891, "epoch": 2.66353142, "global_step/max_steps": "391/438", "percentage": "89.27%", "elapsed_time": "1h 40m 40s", "remaining_time": "12m 6s"} | |
| {"loss": 0.52218735, "token_acc": 0.83752375, "grad_norm": 2.85355043, "learning_rate": 1.5e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063917, "epoch": 2.67037195, "global_step/max_steps": "392/438", "percentage": "89.50%", "elapsed_time": "1h 40m 53s", "remaining_time": "11m 50s"} | |
| {"loss": 0.49281785, "token_acc": 0.84835006, "grad_norm": 2.47051859, "learning_rate": 1.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063938, "epoch": 2.67721248, "global_step/max_steps": "393/438", "percentage": "89.73%", "elapsed_time": "1h 41m 7s", "remaining_time": "11m 34s"} | |
| {"loss": 0.50458997, "token_acc": 0.84803655, "grad_norm": 2.52398705, "learning_rate": 1.4e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063961, "epoch": 2.68405301, "global_step/max_steps": "394/438", "percentage": "89.95%", "elapsed_time": "1h 41m 20s", "remaining_time": "11m 19s"} | |
| {"loss": 0.50725394, "token_acc": 0.84835465, "grad_norm": 2.52271891, "learning_rate": 1.3e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.063983, "epoch": 2.69089354, "global_step/max_steps": "395/438", "percentage": "90.18%", "elapsed_time": "1h 41m 34s", "remaining_time": "11m 3s"} | |
| {"loss": 0.5192579, "token_acc": 0.83345417, "grad_norm": 2.50154233, "learning_rate": 1.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064004, "epoch": 2.69773407, "global_step/max_steps": "396/438", "percentage": "90.41%", "elapsed_time": "1h 41m 47s", "remaining_time": "10m 47s"} | |
| {"loss": 0.53581452, "token_acc": 0.81975232, "grad_norm": 2.44148636, "learning_rate": 1.2e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064022, "epoch": 2.7045746, "global_step/max_steps": "397/438", "percentage": "90.64%", "elapsed_time": "1h 42m 1s", "remaining_time": "10m 32s"} | |
| {"loss": 0.50972188, "token_acc": 0.83630985, "grad_norm": 2.56261468, "learning_rate": 1.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064046, "epoch": 2.71141513, "global_step/max_steps": "398/438", "percentage": "90.87%", "elapsed_time": "1h 42m 15s", "remaining_time": "10m 16s"} | |
| {"loss": 0.49168357, "token_acc": 0.83592764, "grad_norm": 2.42961025, "learning_rate": 1.1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064067, "epoch": 2.71825566, "global_step/max_steps": "399/438", "percentage": "91.10%", "elapsed_time": "1h 42m 28s", "remaining_time": "10m 0s"} | |
| {"loss": 0.53853214, "token_acc": 0.82509871, "grad_norm": 2.48838353, "learning_rate": 1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.064086, "epoch": 2.72509619, "global_step/max_steps": "400/438", "percentage": "91.32%", "elapsed_time": "1h 42m 42s", "remaining_time": "9m 45s"} | |
| {"eval_loss": 1.22433937, "eval_token_acc": 0.67808256, "eval_runtime": 18.8324, "eval_samples_per_second": 52.304, "eval_steps_per_second": 6.584, "epoch": 2.72509619, "global_step/max_steps": "400/438", "percentage": "91.32%", "elapsed_time": "1h 43m 1s", "remaining_time": "9m 47s"} | |
| {"loss": 0.53457832, "token_acc": 0.69474202, "grad_norm": 2.51015949, "learning_rate": 1e-07, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061365, "epoch": 2.73193673, "global_step/max_steps": "401/438", "percentage": "91.55%", "elapsed_time": "1h 47m 35s", "remaining_time": "9m 55s"} | |
| {"loss": 0.51891947, "token_acc": 0.84036367, "grad_norm": 2.42954993, "learning_rate": 9e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06139, "epoch": 2.73877726, "global_step/max_steps": "402/438", "percentage": "91.78%", "elapsed_time": "1h 47m 49s", "remaining_time": "9m 39s"} | |
| {"loss": 0.51018775, "token_acc": 0.82649581, "grad_norm": 2.39964032, "learning_rate": 9e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061417, "epoch": 2.74561779, "global_step/max_steps": "403/438", "percentage": "92.01%", "elapsed_time": "1h 48m 2s", "remaining_time": "9m 22s"} | |
| {"loss": 0.54332262, "token_acc": 0.83393195, "grad_norm": 2.45799017, "learning_rate": 8e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061445, "epoch": 2.75245832, "global_step/max_steps": "404/438", "percentage": "92.24%", "elapsed_time": "1h 48m 15s", "remaining_time": "9m 6s"} | |
| {"loss": 0.52517444, "token_acc": 0.83187773, "grad_norm": 2.47388411, "learning_rate": 8e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06147, "epoch": 2.75929885, "global_step/max_steps": "405/438", "percentage": "92.47%", "elapsed_time": "1h 48m 29s", "remaining_time": "8m 50s"} | |
| {"loss": 0.50706697, "token_acc": 0.83561251, "grad_norm": 2.45484161, "learning_rate": 7e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061496, "epoch": 2.76613938, "global_step/max_steps": "406/438", "percentage": "92.69%", "elapsed_time": "1h 48m 42s", "remaining_time": "8m 34s"} | |
| {"loss": 0.51422864, "token_acc": 0.84813938, "grad_norm": 2.49588203, "learning_rate": 7e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061522, "epoch": 2.77297991, "global_step/max_steps": "407/438", "percentage": "92.92%", "elapsed_time": "1h 48m 56s", "remaining_time": "8m 17s"} | |
| {"loss": 0.53380007, "token_acc": 0.83799231, "grad_norm": 2.50493956, "learning_rate": 6e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061549, "epoch": 2.77982044, "global_step/max_steps": "408/438", "percentage": "93.15%", "elapsed_time": "1h 49m 9s", "remaining_time": "8m 1s"} | |
| {"loss": 0.52506411, "token_acc": 0.83243351, "grad_norm": 2.43923593, "learning_rate": 6e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061575, "epoch": 2.78666097, "global_step/max_steps": "409/438", "percentage": "93.38%", "elapsed_time": "1h 49m 22s", "remaining_time": "7m 45s"} | |
| {"loss": 0.50715059, "token_acc": 0.85681895, "grad_norm": 2.48566842, "learning_rate": 6e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061602, "epoch": 2.7935015, "global_step/max_steps": "410/438", "percentage": "93.61%", "elapsed_time": "1h 49m 36s", "remaining_time": "7m 29s"} | |
| {"loss": 0.51219714, "token_acc": 0.8262906, "grad_norm": 2.5247848, "learning_rate": 5e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061627, "epoch": 2.80034203, "global_step/max_steps": "411/438", "percentage": "93.84%", "elapsed_time": "1h 49m 49s", "remaining_time": "7m 12s"} | |
| {"loss": 0.52488077, "token_acc": 0.80776637, "grad_norm": 2.54660797, "learning_rate": 5e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061653, "epoch": 2.80718256, "global_step/max_steps": "412/438", "percentage": "94.06%", "elapsed_time": "1h 50m 3s", "remaining_time": "6m 56s"} | |
| {"loss": 0.50903523, "token_acc": 0.82981642, "grad_norm": 2.46572828, "learning_rate": 4e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061681, "epoch": 2.81402309, "global_step/max_steps": "413/438", "percentage": "94.29%", "elapsed_time": "1h 50m 16s", "remaining_time": "6m 40s"} | |
| {"loss": 0.52513015, "token_acc": 0.84954507, "grad_norm": 2.46908092, "learning_rate": 4e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061703, "epoch": 2.82086362, "global_step/max_steps": "414/438", "percentage": "94.52%", "elapsed_time": "1h 50m 30s", "remaining_time": "6m 24s"} | |
| {"loss": 0.50396407, "token_acc": 0.83059, "grad_norm": 2.48102379, "learning_rate": 4e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061727, "epoch": 2.82770415, "global_step/max_steps": "415/438", "percentage": "94.75%", "elapsed_time": "1h 50m 43s", "remaining_time": "6m 8s"} | |
| {"loss": 0.49974871, "token_acc": 0.83345247, "grad_norm": 2.48826981, "learning_rate": 3e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06175, "epoch": 2.83454468, "global_step/max_steps": "416/438", "percentage": "94.98%", "elapsed_time": "1h 50m 57s", "remaining_time": "5m 52s"} | |
| {"loss": 0.51035678, "token_acc": 0.83724705, "grad_norm": 2.54147506, "learning_rate": 3e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061775, "epoch": 2.84138521, "global_step/max_steps": "417/438", "percentage": "95.21%", "elapsed_time": "1h 51m 11s", "remaining_time": "5m 35s"} | |
| {"loss": 0.52940035, "token_acc": 0.82826131, "grad_norm": 2.43360066, "learning_rate": 3e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061801, "epoch": 2.84822574, "global_step/max_steps": "418/438", "percentage": "95.43%", "elapsed_time": "1h 51m 24s", "remaining_time": "5m 19s"} | |
| {"loss": 0.50959879, "token_acc": 0.83928209, "grad_norm": 2.45538974, "learning_rate": 3e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061826, "epoch": 2.85506627, "global_step/max_steps": "419/438", "percentage": "95.66%", "elapsed_time": "1h 51m 37s", "remaining_time": "5m 3s"} | |
| {"loss": 0.50068247, "token_acc": 0.84269557, "grad_norm": 2.45483184, "learning_rate": 2e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061848, "epoch": 2.8619068, "global_step/max_steps": "420/438", "percentage": "95.89%", "elapsed_time": "1h 51m 51s", "remaining_time": "4m 47s"} | |
| {"loss": 0.52739275, "token_acc": 0.82337943, "grad_norm": 2.44415188, "learning_rate": 2e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061872, "epoch": 2.86874733, "global_step/max_steps": "421/438", "percentage": "96.12%", "elapsed_time": "1h 52m 5s", "remaining_time": "4m 31s"} | |
| {"loss": 0.55418283, "token_acc": 0.82830071, "grad_norm": 2.53355789, "learning_rate": 2e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061895, "epoch": 2.87558786, "global_step/max_steps": "422/438", "percentage": "96.35%", "elapsed_time": "1h 52m 18s", "remaining_time": "4m 15s"} | |
| {"loss": 0.51480871, "token_acc": 0.83036215, "grad_norm": 2.48076105, "learning_rate": 2e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06192, "epoch": 2.88242839, "global_step/max_steps": "423/438", "percentage": "96.58%", "elapsed_time": "1h 52m 32s", "remaining_time": "3m 59s"} | |
| {"loss": 0.54701948, "token_acc": 0.81361782, "grad_norm": 2.58623719, "learning_rate": 1e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061945, "epoch": 2.88926892, "global_step/max_steps": "424/438", "percentage": "96.80%", "elapsed_time": "1h 52m 45s", "remaining_time": "3m 43s"} | |
| {"loss": 0.5385617, "token_acc": 0.8125, "grad_norm": 2.50514054, "learning_rate": 1e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061965, "epoch": 2.89610945, "global_step/max_steps": "425/438", "percentage": "97.03%", "elapsed_time": "1h 52m 59s", "remaining_time": "3m 27s"} | |
| {"loss": 0.52808082, "token_acc": 0.85201194, "grad_norm": 2.60127926, "learning_rate": 1e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.061988, "epoch": 2.90294998, "global_step/max_steps": "426/438", "percentage": "97.26%", "elapsed_time": "1h 53m 13s", "remaining_time": "3m 11s"} | |
| {"loss": 0.51827139, "token_acc": 0.81627349, "grad_norm": 2.41370583, "learning_rate": 1e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062008, "epoch": 2.90979051, "global_step/max_steps": "427/438", "percentage": "97.49%", "elapsed_time": "1h 53m 26s", "remaining_time": "2m 55s"} | |
| {"loss": 0.52868354, "token_acc": 0.82665991, "grad_norm": 2.48274255, "learning_rate": 1e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.06203, "epoch": 2.91663104, "global_step/max_steps": "428/438", "percentage": "97.72%", "elapsed_time": "1h 53m 40s", "remaining_time": "2m 39s"} | |
| {"loss": 0.53060955, "token_acc": 0.83036936, "grad_norm": 2.55862856, "learning_rate": 1e-08, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062054, "epoch": 2.92347157, "global_step/max_steps": "429/438", "percentage": "97.95%", "elapsed_time": "1h 53m 54s", "remaining_time": "2m 23s"} | |
| {"loss": 0.51118046, "token_acc": 0.84689541, "grad_norm": 2.47546721, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062077, "epoch": 2.9303121, "global_step/max_steps": "430/438", "percentage": "98.17%", "elapsed_time": "1h 54m 7s", "remaining_time": "2m 7s"} | |
| {"loss": 0.51955283, "token_acc": 0.82734793, "grad_norm": 2.47424197, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062098, "epoch": 2.93715263, "global_step/max_steps": "431/438", "percentage": "98.40%", "elapsed_time": "1h 54m 21s", "remaining_time": "1m 51s"} | |
| {"loss": 0.52452219, "token_acc": 0.82631634, "grad_norm": 2.42999721, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062122, "epoch": 2.94399316, "global_step/max_steps": "432/438", "percentage": "98.63%", "elapsed_time": "1h 54m 34s", "remaining_time": "1m 35s"} | |
| {"loss": 0.51984036, "token_acc": 0.82361255, "grad_norm": 2.54335403, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062143, "epoch": 2.95083369, "global_step/max_steps": "433/438", "percentage": "98.86%", "elapsed_time": "1h 54m 48s", "remaining_time": "1m 19s"} | |
| {"loss": 0.50992489, "token_acc": 0.84367768, "grad_norm": 2.45550585, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062169, "epoch": 2.95767422, "global_step/max_steps": "434/438", "percentage": "99.09%", "elapsed_time": "1h 55m 1s", "remaining_time": "1m 3s"} | |
| {"loss": 0.53428268, "token_acc": 0.85260116, "grad_norm": 2.41839981, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062192, "epoch": 2.96451475, "global_step/max_steps": "435/438", "percentage": "99.32%", "elapsed_time": "1h 55m 15s", "remaining_time": "47s"} | |
| {"loss": 0.50954926, "token_acc": 0.83965416, "grad_norm": 2.75094581, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062213, "epoch": 2.97135528, "global_step/max_steps": "436/438", "percentage": "99.54%", "elapsed_time": "1h 55m 28s", "remaining_time": "31s"} | |
| {"loss": 0.51194847, "token_acc": 0.85188719, "grad_norm": 2.44417763, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062237, "epoch": 2.97819581, "global_step/max_steps": "437/438", "percentage": "99.77%", "elapsed_time": "1h 55m 42s", "remaining_time": "15s"} | |
| {"loss": 0.49934715, "token_acc": 0.84054025, "grad_norm": 2.47979784, "learning_rate": 0.0, "memory(GiB)": 115.59, "train_speed(iter/s)": 0.062261, "epoch": 2.98503634, "global_step/max_steps": "438/438", "percentage": "100.00%", "elapsed_time": "1h 55m 55s", "remaining_time": "0s"} | |
| {"eval_loss": 1.22380471, "eval_token_acc": 0.67801107, "eval_runtime": 18.7124, "eval_samples_per_second": 52.639, "eval_steps_per_second": 6.627, "epoch": 2.98503634, "global_step/max_steps": "438/438", "percentage": "100.00%", "elapsed_time": "1h 56m 14s", "remaining_time": "0s"} | |
| {"train_runtime": 7207.0206, "train_samples_per_second": 7.792, "train_steps_per_second": 0.061, "total_flos": 1.1009700104427799e+19, "train_loss": 0.81728786, "epoch": 2.98503634, "global_step/max_steps": "438/438", "percentage": "100.00%", "elapsed_time": "1h 59m 59s", "remaining_time": "0s"} | |
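
The last three rows above (final training step, final evaluation, run summary) can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original training setup: the helper `parse_row` is a hypothetical name, and the rows are abbreviated copies of the table rows with most fields trimmed for brevity. It computes the gap between the final evaluation loss and the last logged training loss, and re-derives `train_steps_per_second` from `train_runtime` and the step count.

```python
import json


def parse_row(row: str) -> dict:
    """Extract the JSON object embedded in one markdown table row.

    Hypothetical helper: assumes each row holds exactly one flat JSON object.
    """
    start = row.index("{")
    end = row.rindex("}") + 1
    return json.loads(row[start:end])


# Abbreviated copies of the last three rows of the log table (fields trimmed).
rows = [
    '| {"loss": 0.49934715, "token_acc": 0.84054025, "global_step/max_steps": "438/438"} | |',
    '| {"eval_loss": 1.22380471, "eval_token_acc": 0.67801107, "global_step/max_steps": "438/438"} | |',
    '| {"train_runtime": 7207.0206, "train_steps_per_second": 0.061, "global_step/max_steps": "438/438"} | |',
]

last_train, final_eval, summary = (parse_row(r) for r in rows)

# Gap between held-out loss and the last logged training loss.
loss_gap = final_eval["eval_loss"] - last_train["loss"]

# Re-derive throughput: total optimizer steps divided by wall-clock seconds.
max_steps = int(summary["global_step/max_steps"].split("/")[1])
steps_per_second = max_steps / summary["train_runtime"]

print(f"eval/train loss gap: {loss_gap:.4f}")
print(f"steps per second:    {steps_per_second:.3f}")
```

The recomputed throughput agrees with the logged `train_steps_per_second` of 0.061. The evaluation loss sits well above the final training loss, and `eval_token_acc` (about 0.68) is below the training `token_acc` (about 0.84). A gap of this size is often read as a sign of overfitting, though that is an interpretation and not something the log itself states.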