Text Generation
Transformers
Safetensors
qwen3
Generated from Trainer
sft
trl
conversational
text-generation-inference
Instructions to use Jerry999/TempSFTSkill with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Jerry999/TempSFTSkill with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="Jerry999/TempSFTSkill") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("Jerry999/TempSFTSkill") model = AutoModelForCausalLM.from_pretrained("Jerry999/TempSFTSkill") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Jerry999/TempSFTSkill with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Jerry999/TempSFTSkill" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Jerry999/TempSFTSkill", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/Jerry999/TempSFTSkill
- SGLang
How to use Jerry999/TempSFTSkill with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Jerry999/TempSFTSkill" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Jerry999/TempSFTSkill", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Jerry999/TempSFTSkill" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Jerry999/TempSFTSkill", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use Jerry999/TempSFTSkill with Docker Model Runner:
docker model run hf.co/Jerry999/TempSFTSkill
| { | |
| "best_global_step": null, | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 0.7598784194528876, | |
| "eval_steps": 500, | |
| "global_step": 1000, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.0007598784194528875, | |
| "grad_norm": 11.283177375793457, | |
| "learning_rate": 0.0, | |
| "loss": 0.7971611022949219, | |
| "mean_token_accuracy": 0.7827156782150269, | |
| "num_tokens": 9945.0, | |
| "step": 1 | |
| }, | |
| { | |
| "epoch": 0.001519756838905775, | |
| "grad_norm": 15.171019554138184, | |
| "learning_rate": 2.5252525252525256e-08, | |
| "loss": 0.7045958042144775, | |
| "mean_token_accuracy": 0.8420767784118652, | |
| "num_tokens": 13584.0, | |
| "step": 2 | |
| }, | |
| { | |
| "epoch": 0.0022796352583586625, | |
| "grad_norm": 13.268917083740234, | |
| "learning_rate": 5.050505050505051e-08, | |
| "loss": 0.9431202411651611, | |
| "mean_token_accuracy": 0.7105640172958374, | |
| "num_tokens": 25466.0, | |
| "step": 3 | |
| }, | |
| { | |
| "epoch": 0.00303951367781155, | |
| "grad_norm": 16.681028366088867, | |
| "learning_rate": 7.575757575757576e-08, | |
| "loss": 1.0039623975753784, | |
| "mean_token_accuracy": 0.7251508235931396, | |
| "num_tokens": 31349.0, | |
| "step": 4 | |
| }, | |
| { | |
| "epoch": 0.003799392097264438, | |
| "grad_norm": 14.952352523803711, | |
| "learning_rate": 1.0101010101010103e-07, | |
| "loss": 0.9401828050613403, | |
| "mean_token_accuracy": 0.7458977699279785, | |
| "num_tokens": 38127.0, | |
| "step": 5 | |
| }, | |
| { | |
| "epoch": 0.004559270516717325, | |
| "grad_norm": 12.397281646728516, | |
| "learning_rate": 1.2626262626262626e-07, | |
| "loss": 0.7461763620376587, | |
| "mean_token_accuracy": 0.7800794839859009, | |
| "num_tokens": 44277.0, | |
| "step": 6 | |
| }, | |
| { | |
| "epoch": 0.005319148936170213, | |
| "grad_norm": 13.733500480651855, | |
| "learning_rate": 1.5151515151515152e-07, | |
| "loss": 0.845583438873291, | |
| "mean_token_accuracy": 0.7497140169143677, | |
| "num_tokens": 52421.0, | |
| "step": 7 | |
| }, | |
| { | |
| "epoch": 0.0060790273556231, | |
| "grad_norm": 11.571980476379395, | |
| "learning_rate": 1.767676767676768e-07, | |
| "loss": 0.8373166918754578, | |
| "mean_token_accuracy": 0.760189414024353, | |
| "num_tokens": 62813.0, | |
| "step": 8 | |
| }, | |
| { | |
| "epoch": 0.006838905775075988, | |
| "grad_norm": 14.639892578125, | |
| "learning_rate": 2.0202020202020205e-07, | |
| "loss": 0.9515388607978821, | |
| "mean_token_accuracy": 0.7438177466392517, | |
| "num_tokens": 69119.0, | |
| "step": 9 | |
| }, | |
| { | |
| "epoch": 0.007598784194528876, | |
| "grad_norm": 17.526134490966797, | |
| "learning_rate": 2.2727272727272729e-07, | |
| "loss": 1.0442354679107666, | |
| "mean_token_accuracy": 0.7132344245910645, | |
| "num_tokens": 74791.0, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.008358662613981762, | |
| "grad_norm": 11.106691360473633, | |
| "learning_rate": 2.525252525252525e-07, | |
| "loss": 0.8264291286468506, | |
| "mean_token_accuracy": 0.7412869334220886, | |
| "num_tokens": 89934.0, | |
| "step": 11 | |
| }, | |
| { | |
| "epoch": 0.00911854103343465, | |
| "grad_norm": 16.4799861907959, | |
| "learning_rate": 2.7777777777777776e-07, | |
| "loss": 1.0191465616226196, | |
| "mean_token_accuracy": 0.7320127487182617, | |
| "num_tokens": 96056.0, | |
| "step": 12 | |
| }, | |
| { | |
| "epoch": 0.009878419452887538, | |
| "grad_norm": 12.276870727539062, | |
| "learning_rate": 3.0303030303030305e-07, | |
| "loss": 0.9538564682006836, | |
| "mean_token_accuracy": 0.7254630327224731, | |
| "num_tokens": 106049.0, | |
| "step": 13 | |
| }, | |
| { | |
| "epoch": 0.010638297872340425, | |
| "grad_norm": 13.591197967529297, | |
| "learning_rate": 3.2828282828282834e-07, | |
| "loss": 0.8274934887886047, | |
| "mean_token_accuracy": 0.7830080986022949, | |
| "num_tokens": 111456.0, | |
| "step": 14 | |
| }, | |
| { | |
| "epoch": 0.011398176291793313, | |
| "grad_norm": 15.956825256347656, | |
| "learning_rate": 3.535353535353536e-07, | |
| "loss": 0.8980859518051147, | |
| "mean_token_accuracy": 0.7673473954200745, | |
| "num_tokens": 117730.0, | |
| "step": 15 | |
| }, | |
| { | |
| "epoch": 0.0121580547112462, | |
| "grad_norm": 11.615748405456543, | |
| "learning_rate": 3.787878787878788e-07, | |
| "loss": 0.8425506949424744, | |
| "mean_token_accuracy": 0.7624208927154541, | |
| "num_tokens": 127723.0, | |
| "step": 16 | |
| }, | |
| { | |
| "epoch": 0.012917933130699088, | |
| "grad_norm": 13.266345024108887, | |
| "learning_rate": 4.040404040404041e-07, | |
| "loss": 0.955736517906189, | |
| "mean_token_accuracy": 0.7301814556121826, | |
| "num_tokens": 140758.0, | |
| "step": 17 | |
| }, | |
| { | |
| "epoch": 0.013677811550151976, | |
| "grad_norm": 12.722447395324707, | |
| "learning_rate": 4.2929292929292934e-07, | |
| "loss": 0.8487507700920105, | |
| "mean_token_accuracy": 0.7661495804786682, | |
| "num_tokens": 148284.0, | |
| "step": 18 | |
| }, | |
| { | |
| "epoch": 0.014437689969604863, | |
| "grad_norm": 20.19617462158203, | |
| "learning_rate": 4.5454545454545457e-07, | |
| "loss": 1.1628165245056152, | |
| "mean_token_accuracy": 0.7297697067260742, | |
| "num_tokens": 151360.0, | |
| "step": 19 | |
| }, | |
| { | |
| "epoch": 0.015197568389057751, | |
| "grad_norm": 10.173198699951172, | |
| "learning_rate": 4.797979797979798e-07, | |
| "loss": 0.7091408967971802, | |
| "mean_token_accuracy": 0.8006219267845154, | |
| "num_tokens": 161563.0, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.015957446808510637, | |
| "grad_norm": 10.829021453857422, | |
| "learning_rate": 5.05050505050505e-07, | |
| "loss": 0.7268050909042358, | |
| "mean_token_accuracy": 0.7929083108901978, | |
| "num_tokens": 171492.0, | |
| "step": 21 | |
| }, | |
| { | |
| "epoch": 0.016717325227963525, | |
| "grad_norm": 13.791851997375488, | |
| "learning_rate": 5.303030303030304e-07, | |
| "loss": 1.0046508312225342, | |
| "mean_token_accuracy": 0.7021903991699219, | |
| "num_tokens": 180054.0, | |
| "step": 22 | |
| }, | |
| { | |
| "epoch": 0.017477203647416412, | |
| "grad_norm": 12.147807121276855, | |
| "learning_rate": 5.555555555555555e-07, | |
| "loss": 0.868274450302124, | |
| "mean_token_accuracy": 0.7546091675758362, | |
| "num_tokens": 187940.0, | |
| "step": 23 | |
| }, | |
| { | |
| "epoch": 0.0182370820668693, | |
| "grad_norm": 11.01634407043457, | |
| "learning_rate": 5.808080808080809e-07, | |
| "loss": 0.6816753149032593, | |
| "mean_token_accuracy": 0.7865602970123291, | |
| "num_tokens": 204159.0, | |
| "step": 24 | |
| }, | |
| { | |
| "epoch": 0.018996960486322188, | |
| "grad_norm": 25.296483993530273, | |
| "learning_rate": 6.060606060606061e-07, | |
| "loss": 0.9021725654602051, | |
| "mean_token_accuracy": 0.8104009628295898, | |
| "num_tokens": 206425.0, | |
| "step": 25 | |
| }, | |
| { | |
| "epoch": 0.019756838905775075, | |
| "grad_norm": 12.985673904418945, | |
| "learning_rate": 6.313131313131314e-07, | |
| "loss": 0.9623971581459045, | |
| "mean_token_accuracy": 0.7304720878601074, | |
| "num_tokens": 216035.0, | |
| "step": 26 | |
| }, | |
| { | |
| "epoch": 0.020516717325227963, | |
| "grad_norm": 10.757186889648438, | |
| "learning_rate": 6.565656565656567e-07, | |
| "loss": 0.8544430136680603, | |
| "mean_token_accuracy": 0.7558294534683228, | |
| "num_tokens": 221961.0, | |
| "step": 27 | |
| }, | |
| { | |
| "epoch": 0.02127659574468085, | |
| "grad_norm": 14.1519775390625, | |
| "learning_rate": 6.818181818181818e-07, | |
| "loss": 0.9835999011993408, | |
| "mean_token_accuracy": 0.7347756028175354, | |
| "num_tokens": 226096.0, | |
| "step": 28 | |
| }, | |
| { | |
| "epoch": 0.022036474164133738, | |
| "grad_norm": 10.439614295959473, | |
| "learning_rate": 7.070707070707071e-07, | |
| "loss": 0.7926623821258545, | |
| "mean_token_accuracy": 0.7580159306526184, | |
| "num_tokens": 233847.0, | |
| "step": 29 | |
| }, | |
| { | |
| "epoch": 0.022796352583586626, | |
| "grad_norm": 7.497498512268066, | |
| "learning_rate": 7.323232323232324e-07, | |
| "loss": 0.6091005206108093, | |
| "mean_token_accuracy": 0.8185515999794006, | |
| "num_tokens": 250007.0, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.023556231003039513, | |
| "grad_norm": 7.350061893463135, | |
| "learning_rate": 7.575757575757576e-07, | |
| "loss": 0.6764670610427856, | |
| "mean_token_accuracy": 0.8003748655319214, | |
| "num_tokens": 266452.0, | |
| "step": 31 | |
| }, | |
| { | |
| "epoch": 0.0243161094224924, | |
| "grad_norm": 8.805062294006348, | |
| "learning_rate": 7.82828282828283e-07, | |
| "loss": 0.9537851810455322, | |
| "mean_token_accuracy": 0.7211984992027283, | |
| "num_tokens": 273503.0, | |
| "step": 32 | |
| }, | |
| { | |
| "epoch": 0.02507598784194529, | |
| "grad_norm": 7.508991718292236, | |
| "learning_rate": 8.080808080808082e-07, | |
| "loss": 0.7078614830970764, | |
| "mean_token_accuracy": 0.8025475144386292, | |
| "num_tokens": 279366.0, | |
| "step": 33 | |
| }, | |
| { | |
| "epoch": 0.025835866261398176, | |
| "grad_norm": 7.2268242835998535, | |
| "learning_rate": 8.333333333333333e-07, | |
| "loss": 0.724380373954773, | |
| "mean_token_accuracy": 0.7933604717254639, | |
| "num_tokens": 286850.0, | |
| "step": 34 | |
| }, | |
| { | |
| "epoch": 0.026595744680851064, | |
| "grad_norm": 5.610812664031982, | |
| "learning_rate": 8.585858585858587e-07, | |
| "loss": 0.5734965801239014, | |
| "mean_token_accuracy": 0.8215349912643433, | |
| "num_tokens": 298456.0, | |
| "step": 35 | |
| }, | |
| { | |
| "epoch": 0.02735562310030395, | |
| "grad_norm": 5.60352087020874, | |
| "learning_rate": 8.838383838383839e-07, | |
| "loss": 0.58527672290802, | |
| "mean_token_accuracy": 0.8147624731063843, | |
| "num_tokens": 308843.0, | |
| "step": 36 | |
| }, | |
| { | |
| "epoch": 0.02811550151975684, | |
| "grad_norm": 7.149941444396973, | |
| "learning_rate": 9.090909090909091e-07, | |
| "loss": 0.743807315826416, | |
| "mean_token_accuracy": 0.7675718665122986, | |
| "num_tokens": 314275.0, | |
| "step": 37 | |
| }, | |
| { | |
| "epoch": 0.028875379939209727, | |
| "grad_norm": 4.390667915344238, | |
| "learning_rate": 9.343434343434345e-07, | |
| "loss": 0.6498252749443054, | |
| "mean_token_accuracy": 0.7975111603736877, | |
| "num_tokens": 326556.0, | |
| "step": 38 | |
| }, | |
| { | |
| "epoch": 0.029635258358662615, | |
| "grad_norm": 4.54934549331665, | |
| "learning_rate": 9.595959595959596e-07, | |
| "loss": 0.6274977326393127, | |
| "mean_token_accuracy": 0.7976137399673462, | |
| "num_tokens": 332575.0, | |
| "step": 39 | |
| }, | |
| { | |
| "epoch": 0.030395136778115502, | |
| "grad_norm": 3.5230090618133545, | |
| "learning_rate": 9.84848484848485e-07, | |
| "loss": 0.5257169008255005, | |
| "mean_token_accuracy": 0.8215709924697876, | |
| "num_tokens": 344979.0, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.03115501519756839, | |
| "grad_norm": 5.243790626525879, | |
| "learning_rate": 1.01010101010101e-06, | |
| "loss": 0.6588603854179382, | |
| "mean_token_accuracy": 0.8016726970672607, | |
| "num_tokens": 352625.0, | |
| "step": 41 | |
| }, | |
| { | |
| "epoch": 0.031914893617021274, | |
| "grad_norm": 4.898980617523193, | |
| "learning_rate": 1.0353535353535354e-06, | |
| "loss": 0.48198166489601135, | |
| "mean_token_accuracy": 0.8494296073913574, | |
| "num_tokens": 356906.0, | |
| "step": 42 | |
| }, | |
| { | |
| "epoch": 0.03267477203647416, | |
| "grad_norm": 4.83416748046875, | |
| "learning_rate": 1.0606060606060608e-06, | |
| "loss": 0.7198941707611084, | |
| "mean_token_accuracy": 0.7772096991539001, | |
| "num_tokens": 362345.0, | |
| "step": 43 | |
| }, | |
| { | |
| "epoch": 0.03343465045592705, | |
| "grad_norm": 3.0984978675842285, | |
| "learning_rate": 1.085858585858586e-06, | |
| "loss": 0.5374331474304199, | |
| "mean_token_accuracy": 0.8272683620452881, | |
| "num_tokens": 369624.0, | |
| "step": 44 | |
| }, | |
| { | |
| "epoch": 0.03419452887537994, | |
| "grad_norm": 2.5063867568969727, | |
| "learning_rate": 1.111111111111111e-06, | |
| "loss": 0.5425165891647339, | |
| "mean_token_accuracy": 0.8135398626327515, | |
| "num_tokens": 383627.0, | |
| "step": 45 | |
| }, | |
| { | |
| "epoch": 0.034954407294832825, | |
| "grad_norm": 2.2525131702423096, | |
| "learning_rate": 1.1363636363636364e-06, | |
| "loss": 0.4416641891002655, | |
| "mean_token_accuracy": 0.8437566757202148, | |
| "num_tokens": 401569.0, | |
| "step": 46 | |
| }, | |
| { | |
| "epoch": 0.03571428571428571, | |
| "grad_norm": 3.5199527740478516, | |
| "learning_rate": 1.1616161616161617e-06, | |
| "loss": 0.5858573913574219, | |
| "mean_token_accuracy": 0.7998367547988892, | |
| "num_tokens": 408392.0, | |
| "step": 47 | |
| }, | |
| { | |
| "epoch": 0.0364741641337386, | |
| "grad_norm": 3.3617491722106934, | |
| "learning_rate": 1.186868686868687e-06, | |
| "loss": 0.4014914929866791, | |
| "mean_token_accuracy": 0.8768728971481323, | |
| "num_tokens": 413042.0, | |
| "step": 48 | |
| }, | |
| { | |
| "epoch": 0.03723404255319149, | |
| "grad_norm": 6.081437110900879, | |
| "learning_rate": 1.2121212121212122e-06, | |
| "loss": 0.74919193983078, | |
| "mean_token_accuracy": 0.7965025901794434, | |
| "num_tokens": 415571.0, | |
| "step": 49 | |
| }, | |
| { | |
| "epoch": 0.037993920972644375, | |
| "grad_norm": 5.349213600158691, | |
| "learning_rate": 1.2373737373737375e-06, | |
| "loss": 0.5735707879066467, | |
| "mean_token_accuracy": 0.8414558172225952, | |
| "num_tokens": 418323.0, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.03875379939209726, | |
| "grad_norm": 3.1204047203063965, | |
| "learning_rate": 1.2626262626262629e-06, | |
| "loss": 0.5516555309295654, | |
| "mean_token_accuracy": 0.8204443454742432, | |
| "num_tokens": 431515.0, | |
| "step": 51 | |
| }, | |
| { | |
| "epoch": 0.03951367781155015, | |
| "grad_norm": 3.1404378414154053, | |
| "learning_rate": 1.287878787878788e-06, | |
| "loss": 0.640109658241272, | |
| "mean_token_accuracy": 0.7860654592514038, | |
| "num_tokens": 441654.0, | |
| "step": 52 | |
| }, | |
| { | |
| "epoch": 0.04027355623100304, | |
| "grad_norm": 2.680227518081665, | |
| "learning_rate": 1.3131313131313134e-06, | |
| "loss": 0.5188562273979187, | |
| "mean_token_accuracy": 0.8292064666748047, | |
| "num_tokens": 451848.0, | |
| "step": 53 | |
| }, | |
| { | |
| "epoch": 0.041033434650455926, | |
| "grad_norm": 3.0411198139190674, | |
| "learning_rate": 1.3383838383838385e-06, | |
| "loss": 0.6637120842933655, | |
| "mean_token_accuracy": 0.7974024415016174, | |
| "num_tokens": 463258.0, | |
| "step": 54 | |
| }, | |
| { | |
| "epoch": 0.04179331306990881, | |
| "grad_norm": 1.9518145322799683, | |
| "learning_rate": 1.3636363636363636e-06, | |
| "loss": 0.5045453310012817, | |
| "mean_token_accuracy": 0.8279316425323486, | |
| "num_tokens": 482025.0, | |
| "step": 55 | |
| }, | |
| { | |
| "epoch": 0.0425531914893617, | |
| "grad_norm": 2.809535503387451, | |
| "learning_rate": 1.3888888888888892e-06, | |
| "loss": 0.6189742684364319, | |
| "mean_token_accuracy": 0.788748025894165, | |
| "num_tokens": 491561.0, | |
| "step": 56 | |
| }, | |
| { | |
| "epoch": 0.04331306990881459, | |
| "grad_norm": 2.19962477684021, | |
| "learning_rate": 1.4141414141414143e-06, | |
| "loss": 0.5295487642288208, | |
| "mean_token_accuracy": 0.8143942356109619, | |
| "num_tokens": 513580.0, | |
| "step": 57 | |
| }, | |
| { | |
| "epoch": 0.044072948328267476, | |
| "grad_norm": 3.321911096572876, | |
| "learning_rate": 1.4393939393939396e-06, | |
| "loss": 0.5833899974822998, | |
| "mean_token_accuracy": 0.8207724690437317, | |
| "num_tokens": 518608.0, | |
| "step": 58 | |
| }, | |
| { | |
| "epoch": 0.044832826747720364, | |
| "grad_norm": 3.252626657485962, | |
| "learning_rate": 1.4646464646464648e-06, | |
| "loss": 0.6266637444496155, | |
| "mean_token_accuracy": 0.7806040048599243, | |
| "num_tokens": 526028.0, | |
| "step": 59 | |
| }, | |
| { | |
| "epoch": 0.04559270516717325, | |
| "grad_norm": 2.391918182373047, | |
| "learning_rate": 1.48989898989899e-06, | |
| "loss": 0.4990342855453491, | |
| "mean_token_accuracy": 0.8362030982971191, | |
| "num_tokens": 536515.0, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.04635258358662614, | |
| "grad_norm": 4.2576985359191895, | |
| "learning_rate": 1.5151515151515152e-06, | |
| "loss": 0.6036560535430908, | |
| "mean_token_accuracy": 0.8048122525215149, | |
| "num_tokens": 539997.0, | |
| "step": 61 | |
| }, | |
| { | |
| "epoch": 0.04711246200607903, | |
| "grad_norm": 3.1281285285949707, | |
| "learning_rate": 1.5404040404040404e-06, | |
| "loss": 0.5602238178253174, | |
| "mean_token_accuracy": 0.8046067357063293, | |
| "num_tokens": 546192.0, | |
| "step": 62 | |
| }, | |
| { | |
| "epoch": 0.047872340425531915, | |
| "grad_norm": 2.5616486072540283, | |
| "learning_rate": 1.565656565656566e-06, | |
| "loss": 0.563162624835968, | |
| "mean_token_accuracy": 0.8025326728820801, | |
| "num_tokens": 555153.0, | |
| "step": 63 | |
| }, | |
| { | |
| "epoch": 0.0486322188449848, | |
| "grad_norm": 2.0530178546905518, | |
| "learning_rate": 1.590909090909091e-06, | |
| "loss": 0.5288445353507996, | |
| "mean_token_accuracy": 0.8235469460487366, | |
| "num_tokens": 570107.0, | |
| "step": 64 | |
| }, | |
| { | |
| "epoch": 0.04939209726443769, | |
| "grad_norm": 6.312800884246826, | |
| "learning_rate": 1.6161616161616164e-06, | |
| "loss": 0.4401736855506897, | |
| "mean_token_accuracy": 0.8390042185783386, | |
| "num_tokens": 573715.0, | |
| "step": 65 | |
| }, | |
| { | |
| "epoch": 0.05015197568389058, | |
| "grad_norm": 2.2594375610351562, | |
| "learning_rate": 1.6414141414141415e-06, | |
| "loss": 0.6218773126602173, | |
| "mean_token_accuracy": 0.782254695892334, | |
| "num_tokens": 586651.0, | |
| "step": 66 | |
| }, | |
| { | |
| "epoch": 0.050911854103343465, | |
| "grad_norm": 2.471200466156006, | |
| "learning_rate": 1.6666666666666667e-06, | |
| "loss": 0.601629376411438, | |
| "mean_token_accuracy": 0.819005012512207, | |
| "num_tokens": 599682.0, | |
| "step": 67 | |
| }, | |
| { | |
| "epoch": 0.05167173252279635, | |
| "grad_norm": 3.020212411880493, | |
| "learning_rate": 1.6919191919191922e-06, | |
| "loss": 0.47155898809432983, | |
| "mean_token_accuracy": 0.8229066729545593, | |
| "num_tokens": 604964.0, | |
| "step": 68 | |
| }, | |
| { | |
| "epoch": 0.05243161094224924, | |
| "grad_norm": 2.80106520652771, | |
| "learning_rate": 1.7171717171717173e-06, | |
| "loss": 0.6624096632003784, | |
| "mean_token_accuracy": 0.7921645641326904, | |
| "num_tokens": 614910.0, | |
| "step": 69 | |
| }, | |
| { | |
| "epoch": 0.05319148936170213, | |
| "grad_norm": 3.308406114578247, | |
| "learning_rate": 1.7424242424242427e-06, | |
| "loss": 0.5815398693084717, | |
| "mean_token_accuracy": 0.8116977214813232, | |
| "num_tokens": 619735.0, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.053951367781155016, | |
| "grad_norm": 2.4876530170440674, | |
| "learning_rate": 1.7676767676767678e-06, | |
| "loss": 0.5332903861999512, | |
| "mean_token_accuracy": 0.816318154335022, | |
| "num_tokens": 627667.0, | |
| "step": 71 | |
| }, | |
| { | |
| "epoch": 0.0547112462006079, | |
| "grad_norm": 2.404669761657715, | |
| "learning_rate": 1.792929292929293e-06, | |
| "loss": 0.6185491681098938, | |
| "mean_token_accuracy": 0.7874469757080078, | |
| "num_tokens": 637711.0, | |
| "step": 72 | |
| }, | |
| { | |
| "epoch": 0.05547112462006079, | |
| "grad_norm": 2.0979228019714355, | |
| "learning_rate": 1.8181818181818183e-06, | |
| "loss": 0.4259670376777649, | |
| "mean_token_accuracy": 0.8593729734420776, | |
| "num_tokens": 646713.0, | |
| "step": 73 | |
| }, | |
| { | |
| "epoch": 0.05623100303951368, | |
| "grad_norm": 3.3549416065216064, | |
| "learning_rate": 1.8434343434343434e-06, | |
| "loss": 0.5811284780502319, | |
| "mean_token_accuracy": 0.8014984130859375, | |
| "num_tokens": 652156.0, | |
| "step": 74 | |
| }, | |
| { | |
| "epoch": 0.056990881458966566, | |
| "grad_norm": 2.107623338699341, | |
| "learning_rate": 1.868686868686869e-06, | |
| "loss": 0.41120773553848267, | |
| "mean_token_accuracy": 0.8120651841163635, | |
| "num_tokens": 660078.0, | |
| "step": 75 | |
| }, | |
| { | |
| "epoch": 0.057750759878419454, | |
| "grad_norm": 2.8542380332946777, | |
| "learning_rate": 1.8939393939393941e-06, | |
| "loss": 0.5268816351890564, | |
| "mean_token_accuracy": 0.8322579860687256, | |
| "num_tokens": 665346.0, | |
| "step": 76 | |
| }, | |
| { | |
| "epoch": 0.05851063829787234, | |
| "grad_norm": 1.578070878982544, | |
| "learning_rate": 1.9191919191919192e-06, | |
| "loss": 0.5522144436836243, | |
| "mean_token_accuracy": 0.8128998279571533, | |
| "num_tokens": 686660.0, | |
| "step": 77 | |
| }, | |
| { | |
| "epoch": 0.05927051671732523, | |
| "grad_norm": 1.591410756111145, | |
| "learning_rate": 1.944444444444445e-06, | |
| "loss": 0.5557606220245361, | |
| "mean_token_accuracy": 0.8052172660827637, | |
| "num_tokens": 707484.0, | |
| "step": 78 | |
| }, | |
| { | |
| "epoch": 0.06003039513677812, | |
| "grad_norm": 2.8112404346466064, | |
| "learning_rate": 1.96969696969697e-06, | |
| "loss": 0.5988024473190308, | |
| "mean_token_accuracy": 0.8061349987983704, | |
| "num_tokens": 714318.0, | |
| "step": 79 | |
| }, | |
| { | |
| "epoch": 0.060790273556231005, | |
| "grad_norm": 2.6897754669189453, | |
| "learning_rate": 1.994949494949495e-06, | |
| "loss": 0.5176495313644409, | |
| "mean_token_accuracy": 0.8129833936691284, | |
| "num_tokens": 720917.0, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.06155015197568389, | |
| "grad_norm": 3.3799517154693604, | |
| "learning_rate": 2.02020202020202e-06, | |
| "loss": 0.43894606828689575, | |
| "mean_token_accuracy": 0.8361872434616089, | |
| "num_tokens": 733885.0, | |
| "step": 81 | |
| }, | |
| { | |
| "epoch": 0.06231003039513678, | |
| "grad_norm": 3.728245496749878, | |
| "learning_rate": 2.0454545454545457e-06, | |
| "loss": 0.613950252532959, | |
| "mean_token_accuracy": 0.8073053359985352, | |
| "num_tokens": 737662.0, | |
| "step": 82 | |
| }, | |
| { | |
| "epoch": 0.06306990881458967, | |
| "grad_norm": 1.8278634548187256, | |
| "learning_rate": 2.070707070707071e-06, | |
| "loss": 0.4874405264854431, | |
| "mean_token_accuracy": 0.8348642587661743, | |
| "num_tokens": 749980.0, | |
| "step": 83 | |
| }, | |
| { | |
| "epoch": 0.06382978723404255, | |
| "grad_norm": 1.807234764099121, | |
| "learning_rate": 2.095959595959596e-06, | |
| "loss": 0.4484546482563019, | |
| "mean_token_accuracy": 0.8600111603736877, | |
| "num_tokens": 762881.0, | |
| "step": 84 | |
| }, | |
| { | |
| "epoch": 0.06458966565349544, | |
| "grad_norm": 2.55515456199646, | |
| "learning_rate": 2.1212121212121216e-06, | |
| "loss": 0.5734638571739197, | |
| "mean_token_accuracy": 0.8466952443122864, | |
| "num_tokens": 769672.0, | |
| "step": 85 | |
| }, | |
| { | |
| "epoch": 0.06534954407294832, | |
| "grad_norm": 7.3995842933654785, | |
| "learning_rate": 2.1464646464646467e-06, | |
| "loss": 0.5208925008773804, | |
| "mean_token_accuracy": 0.8223283886909485, | |
| "num_tokens": 776196.0, | |
| "step": 86 | |
| }, | |
| { | |
| "epoch": 0.06610942249240122, | |
| "grad_norm": 2.283025026321411, | |
| "learning_rate": 2.171717171717172e-06, | |
| "loss": 0.5865270495414734, | |
| "mean_token_accuracy": 0.8025071620941162, | |
| "num_tokens": 784754.0, | |
| "step": 87 | |
| }, | |
| { | |
| "epoch": 0.0668693009118541, | |
| "grad_norm": 2.318805456161499, | |
| "learning_rate": 2.196969696969697e-06, | |
| "loss": 0.613620400428772, | |
| "mean_token_accuracy": 0.7853732109069824, | |
| "num_tokens": 792804.0, | |
| "step": 88 | |
| }, | |
| { | |
| "epoch": 0.067629179331307, | |
| "grad_norm": 1.811884880065918, | |
| "learning_rate": 2.222222222222222e-06, | |
| "loss": 0.5702196955680847, | |
| "mean_token_accuracy": 0.8059632778167725, | |
| "num_tokens": 809277.0, | |
| "step": 89 | |
| }, | |
| { | |
| "epoch": 0.06838905775075987, | |
| "grad_norm": 2.4894015789031982, | |
| "learning_rate": 2.2474747474747476e-06, | |
| "loss": 0.5306090116500854, | |
| "mean_token_accuracy": 0.8193257451057434, | |
| "num_tokens": 815089.0, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.06914893617021277, | |
| "grad_norm": 1.965124487876892, | |
| "learning_rate": 2.2727272727272728e-06, | |
| "loss": 0.5051044225692749, | |
| "mean_token_accuracy": 0.8207112550735474, | |
| "num_tokens": 824557.0, | |
| "step": 91 | |
| }, | |
| { | |
| "epoch": 0.06990881458966565, | |
| "grad_norm": 1.7140872478485107, | |
| "learning_rate": 2.2979797979797983e-06, | |
| "loss": 0.5126519203186035, | |
| "mean_token_accuracy": 0.8226314783096313, | |
| "num_tokens": 836722.0, | |
| "step": 92 | |
| }, | |
| { | |
| "epoch": 0.07066869300911854, | |
| "grad_norm": 1.8673690557479858, | |
| "learning_rate": 2.3232323232323234e-06, | |
| "loss": 0.4990379512310028, | |
| "mean_token_accuracy": 0.8230464458465576, | |
| "num_tokens": 847964.0, | |
| "step": 93 | |
| }, | |
| { | |
| "epoch": 0.07142857142857142, | |
| "grad_norm": 3.2861623764038086, | |
| "learning_rate": 2.348484848484849e-06, | |
| "loss": 0.48529326915740967, | |
| "mean_token_accuracy": 0.8435841202735901, | |
| "num_tokens": 851748.0, | |
| "step": 94 | |
| }, | |
| { | |
| "epoch": 0.07218844984802432, | |
| "grad_norm": 1.7555155754089355, | |
| "learning_rate": 2.373737373737374e-06, | |
| "loss": 0.4947798252105713, | |
| "mean_token_accuracy": 0.8226956129074097, | |
| "num_tokens": 862745.0, | |
| "step": 95 | |
| }, | |
| { | |
| "epoch": 0.0729483282674772, | |
| "grad_norm": 1.8702484369277954, | |
| "learning_rate": 2.3989898989898993e-06, | |
| "loss": 0.556981086730957, | |
| "mean_token_accuracy": 0.8084474802017212, | |
| "num_tokens": 875938.0, | |
| "step": 96 | |
| }, | |
| { | |
| "epoch": 0.0737082066869301, | |
| "grad_norm": 5.505680084228516, | |
| "learning_rate": 2.4242424242424244e-06, | |
| "loss": 0.7146786451339722, | |
| "mean_token_accuracy": 0.7729057669639587, | |
| "num_tokens": 879032.0, | |
| "step": 97 | |
| }, | |
| { | |
| "epoch": 0.07446808510638298, | |
| "grad_norm": 2.6841628551483154, | |
| "learning_rate": 2.4494949494949495e-06, | |
| "loss": 0.5092146396636963, | |
| "mean_token_accuracy": 0.8317261934280396, | |
| "num_tokens": 884771.0, | |
| "step": 98 | |
| }, | |
| { | |
| "epoch": 0.07522796352583587, | |
| "grad_norm": 2.146594524383545, | |
| "learning_rate": 2.474747474747475e-06, | |
| "loss": 0.46619778871536255, | |
| "mean_token_accuracy": 0.829513669013977, | |
| "num_tokens": 899646.0, | |
| "step": 99 | |
| }, | |
| { | |
| "epoch": 0.07598784194528875, | |
| "grad_norm": 2.7602176666259766, | |
| "learning_rate": 2.5e-06, | |
| "loss": 0.6693596839904785, | |
| "mean_token_accuracy": 0.7876315116882324, | |
| "num_tokens": 906491.0, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.07674772036474165, | |
| "grad_norm": 2.2387099266052246, | |
| "learning_rate": 2.5252525252525258e-06, | |
| "loss": 0.6272501945495605, | |
| "mean_token_accuracy": 0.7843590974807739, | |
| "num_tokens": 917335.0, | |
| "step": 101 | |
| }, | |
| { | |
| "epoch": 0.07750759878419453, | |
| "grad_norm": 1.5419353246688843, | |
| "learning_rate": 2.5505050505050505e-06, | |
| "loss": 0.4267471730709076, | |
| "mean_token_accuracy": 0.8374665975570679, | |
| "num_tokens": 933355.0, | |
| "step": 102 | |
| }, | |
| { | |
| "epoch": 0.07826747720364742, | |
| "grad_norm": 2.0402910709381104, | |
| "learning_rate": 2.575757575757576e-06, | |
| "loss": 0.6042163968086243, | |
| "mean_token_accuracy": 0.7968152761459351, | |
| "num_tokens": 945204.0, | |
| "step": 103 | |
| }, | |
| { | |
| "epoch": 0.0790273556231003, | |
| "grad_norm": 1.4149705171585083, | |
| "learning_rate": 2.601010101010101e-06, | |
| "loss": 0.3748396933078766, | |
| "mean_token_accuracy": 0.847211480140686, | |
| "num_tokens": 958612.0, | |
| "step": 104 | |
| }, | |
| { | |
| "epoch": 0.0797872340425532, | |
| "grad_norm": 2.3728153705596924, | |
| "learning_rate": 2.6262626262626267e-06, | |
| "loss": 0.5331054925918579, | |
| "mean_token_accuracy": 0.8208682537078857, | |
| "num_tokens": 965752.0, | |
| "step": 105 | |
| }, | |
| { | |
| "epoch": 0.08054711246200608, | |
| "grad_norm": 3.549079179763794, | |
| "learning_rate": 2.6515151515151514e-06, | |
| "loss": 0.42785799503326416, | |
| "mean_token_accuracy": 0.8573940992355347, | |
| "num_tokens": 969027.0, | |
| "step": 106 | |
| }, | |
| { | |
| "epoch": 0.08130699088145897, | |
| "grad_norm": 4.202904224395752, | |
| "learning_rate": 2.676767676767677e-06, | |
| "loss": 0.5676337480545044, | |
| "mean_token_accuracy": 0.8095206022262573, | |
| "num_tokens": 971972.0, | |
| "step": 107 | |
| }, | |
| { | |
| "epoch": 0.08206686930091185, | |
| "grad_norm": 2.4428415298461914, | |
| "learning_rate": 2.7020202020202025e-06, | |
| "loss": 0.6118494272232056, | |
| "mean_token_accuracy": 0.7927250862121582, | |
| "num_tokens": 980420.0, | |
| "step": 108 | |
| }, | |
| { | |
| "epoch": 0.08282674772036475, | |
| "grad_norm": 1.3100357055664062, | |
| "learning_rate": 2.7272727272727272e-06, | |
| "loss": 0.38054990768432617, | |
| "mean_token_accuracy": 0.8546916842460632, | |
| "num_tokens": 999025.0, | |
| "step": 109 | |
| }, | |
| { | |
| "epoch": 0.08358662613981763, | |
| "grad_norm": 3.113126039505005, | |
| "learning_rate": 2.7525252525252528e-06, | |
| "loss": 0.6473406553268433, | |
| "mean_token_accuracy": 0.7861988544464111, | |
| "num_tokens": 1003949.0, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.08434650455927052, | |
| "grad_norm": 1.9397801160812378, | |
| "learning_rate": 2.7777777777777783e-06, | |
| "loss": 0.5103553533554077, | |
| "mean_token_accuracy": 0.8258156180381775, | |
| "num_tokens": 1015959.0, | |
| "step": 111 | |
| }, | |
| { | |
| "epoch": 0.0851063829787234, | |
| "grad_norm": 2.4072656631469727, | |
| "learning_rate": 2.803030303030303e-06, | |
| "loss": 0.5189981460571289, | |
| "mean_token_accuracy": 0.8395966291427612, | |
| "num_tokens": 1023545.0, | |
| "step": 112 | |
| }, | |
| { | |
| "epoch": 0.0858662613981763, | |
| "grad_norm": 2.847546339035034, | |
| "learning_rate": 2.8282828282828286e-06, | |
| "loss": 0.48659196496009827, | |
| "mean_token_accuracy": 0.8372319340705872, | |
| "num_tokens": 1028486.0, | |
| "step": 113 | |
| }, | |
| { | |
| "epoch": 0.08662613981762918, | |
| "grad_norm": 2.4680769443511963, | |
| "learning_rate": 2.8535353535353537e-06, | |
| "loss": 0.46832722425460815, | |
| "mean_token_accuracy": 0.8386073112487793, | |
| "num_tokens": 1034730.0, | |
| "step": 114 | |
| }, | |
| { | |
| "epoch": 0.08738601823708207, | |
| "grad_norm": 2.969630718231201, | |
| "learning_rate": 2.8787878787878793e-06, | |
| "loss": 0.5375088453292847, | |
| "mean_token_accuracy": 0.8141192197799683, | |
| "num_tokens": 1041427.0, | |
| "step": 115 | |
| }, | |
| { | |
| "epoch": 0.08814589665653495, | |
| "grad_norm": 1.9104801416397095, | |
| "learning_rate": 2.904040404040404e-06, | |
| "loss": 0.4691570997238159, | |
| "mean_token_accuracy": 0.8179234266281128, | |
| "num_tokens": 1052158.0, | |
| "step": 116 | |
| }, | |
| { | |
| "epoch": 0.08890577507598785, | |
| "grad_norm": 1.835992455482483, | |
| "learning_rate": 2.9292929292929295e-06, | |
| "loss": 0.519229531288147, | |
| "mean_token_accuracy": 0.823078989982605, | |
| "num_tokens": 1064753.0, | |
| "step": 117 | |
| }, | |
| { | |
| "epoch": 0.08966565349544073, | |
| "grad_norm": 1.4507238864898682, | |
| "learning_rate": 2.954545454545455e-06, | |
| "loss": 0.44041338562965393, | |
| "mean_token_accuracy": 0.8542196750640869, | |
| "num_tokens": 1080811.0, | |
| "step": 118 | |
| }, | |
| { | |
| "epoch": 0.09042553191489362, | |
| "grad_norm": 2.6786766052246094, | |
| "learning_rate": 2.97979797979798e-06, | |
| "loss": 0.611121654510498, | |
| "mean_token_accuracy": 0.7847856283187866, | |
| "num_tokens": 1087186.0, | |
| "step": 119 | |
| }, | |
| { | |
| "epoch": 0.0911854103343465, | |
| "grad_norm": 2.6856000423431396, | |
| "learning_rate": 3.0050505050505054e-06, | |
| "loss": 0.48068684339523315, | |
| "mean_token_accuracy": 0.8335654139518738, | |
| "num_tokens": 1092987.0, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.0919452887537994, | |
| "grad_norm": 2.734081745147705, | |
| "learning_rate": 3.0303030303030305e-06, | |
| "loss": 0.5843905806541443, | |
| "mean_token_accuracy": 0.8087998628616333, | |
| "num_tokens": 1098814.0, | |
| "step": 121 | |
| }, | |
| { | |
| "epoch": 0.09270516717325228, | |
| "grad_norm": 2.8224918842315674, | |
| "learning_rate": 3.055555555555556e-06, | |
| "loss": 0.5544137954711914, | |
| "mean_token_accuracy": 0.8101569414138794, | |
| "num_tokens": 1104380.0, | |
| "step": 122 | |
| }, | |
| { | |
| "epoch": 0.09346504559270517, | |
| "grad_norm": 2.3097996711730957, | |
| "learning_rate": 3.0808080808080807e-06, | |
| "loss": 0.5377509593963623, | |
| "mean_token_accuracy": 0.8328956961631775, | |
| "num_tokens": 1112904.0, | |
| "step": 123 | |
| }, | |
| { | |
| "epoch": 0.09422492401215805, | |
| "grad_norm": 1.8075451850891113, | |
| "learning_rate": 3.1060606060606063e-06, | |
| "loss": 0.44285160303115845, | |
| "mean_token_accuracy": 0.8430694341659546, | |
| "num_tokens": 1122381.0, | |
| "step": 124 | |
| }, | |
| { | |
| "epoch": 0.09498480243161095, | |
| "grad_norm": 1.7106983661651611, | |
| "learning_rate": 3.131313131313132e-06, | |
| "loss": 0.5341939926147461, | |
| "mean_token_accuracy": 0.833850622177124, | |
| "num_tokens": 1135861.0, | |
| "step": 125 | |
| }, | |
| { | |
| "epoch": 0.09574468085106383, | |
| "grad_norm": 2.5285379886627197, | |
| "learning_rate": 3.1565656565656566e-06, | |
| "loss": 0.4692184031009674, | |
| "mean_token_accuracy": 0.8595664501190186, | |
| "num_tokens": 1142032.0, | |
| "step": 126 | |
| }, | |
| { | |
| "epoch": 0.09650455927051672, | |
| "grad_norm": 1.740753412246704, | |
| "learning_rate": 3.181818181818182e-06, | |
| "loss": 0.4924545884132385, | |
| "mean_token_accuracy": 0.8333353996276855, | |
| "num_tokens": 1155508.0, | |
| "step": 127 | |
| }, | |
| { | |
| "epoch": 0.0972644376899696, | |
| "grad_norm": 3.07169246673584, | |
| "learning_rate": 3.2070707070707072e-06, | |
| "loss": 0.6479271054267883, | |
| "mean_token_accuracy": 0.7937667965888977, | |
| "num_tokens": 1160594.0, | |
| "step": 128 | |
| }, | |
| { | |
| "epoch": 0.0980243161094225, | |
| "grad_norm": 2.6075611114501953, | |
| "learning_rate": 3.232323232323233e-06, | |
| "loss": 0.5685735940933228, | |
| "mean_token_accuracy": 0.8052327632904053, | |
| "num_tokens": 1169828.0, | |
| "step": 129 | |
| }, | |
| { | |
| "epoch": 0.09878419452887538, | |
| "grad_norm": 2.055074453353882, | |
| "learning_rate": 3.257575757575758e-06, | |
| "loss": 0.401486873626709, | |
| "mean_token_accuracy": 0.864121675491333, | |
| "num_tokens": 1177760.0, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.09954407294832827, | |
| "grad_norm": 1.610193133354187, | |
| "learning_rate": 3.282828282828283e-06, | |
| "loss": 0.4663908779621124, | |
| "mean_token_accuracy": 0.8454406261444092, | |
| "num_tokens": 1190196.0, | |
| "step": 131 | |
| }, | |
| { | |
| "epoch": 0.10030395136778116, | |
| "grad_norm": 3.7052183151245117, | |
| "learning_rate": 3.3080808080808086e-06, | |
| "loss": 0.5785504579544067, | |
| "mean_token_accuracy": 0.7985049486160278, | |
| "num_tokens": 1194569.0, | |
| "step": 132 | |
| }, | |
| { | |
| "epoch": 0.10106382978723404, | |
| "grad_norm": 2.4421536922454834, | |
| "learning_rate": 3.3333333333333333e-06, | |
| "loss": 0.47366881370544434, | |
| "mean_token_accuracy": 0.8285588026046753, | |
| "num_tokens": 1200246.0, | |
| "step": 133 | |
| }, | |
| { | |
| "epoch": 0.10182370820668693, | |
| "grad_norm": 2.562891721725464, | |
| "learning_rate": 3.358585858585859e-06, | |
| "loss": 0.4463881552219391, | |
| "mean_token_accuracy": 0.874564528465271, | |
| "num_tokens": 1204990.0, | |
| "step": 134 | |
| }, | |
| { | |
| "epoch": 0.10258358662613981, | |
| "grad_norm": 2.4984731674194336, | |
| "learning_rate": 3.3838383838383844e-06, | |
| "loss": 0.36950692534446716, | |
| "mean_token_accuracy": 0.8726234436035156, | |
| "num_tokens": 1209798.0, | |
| "step": 135 | |
| }, | |
| { | |
| "epoch": 0.1033434650455927, | |
| "grad_norm": 2.000546455383301, | |
| "learning_rate": 3.409090909090909e-06, | |
| "loss": 0.5601820945739746, | |
| "mean_token_accuracy": 0.8426068425178528, | |
| "num_tokens": 1222406.0, | |
| "step": 136 | |
| }, | |
| { | |
| "epoch": 0.10410334346504559, | |
| "grad_norm": 3.5904040336608887, | |
| "learning_rate": 3.4343434343434347e-06, | |
| "loss": 0.46701300144195557, | |
| "mean_token_accuracy": 0.8508556485176086, | |
| "num_tokens": 1225928.0, | |
| "step": 137 | |
| }, | |
| { | |
| "epoch": 0.10486322188449848, | |
| "grad_norm": 2.627048969268799, | |
| "learning_rate": 3.45959595959596e-06, | |
| "loss": 0.5123917460441589, | |
| "mean_token_accuracy": 0.8308509588241577, | |
| "num_tokens": 1232582.0, | |
| "step": 138 | |
| }, | |
| { | |
| "epoch": 0.10562310030395136, | |
| "grad_norm": 1.6476247310638428, | |
| "learning_rate": 3.4848484848484854e-06, | |
| "loss": 0.37991654872894287, | |
| "mean_token_accuracy": 0.8649545907974243, | |
| "num_tokens": 1242846.0, | |
| "step": 139 | |
| }, | |
| { | |
| "epoch": 0.10638297872340426, | |
| "grad_norm": 1.9283066987991333, | |
| "learning_rate": 3.51010101010101e-06, | |
| "loss": 0.44996580481529236, | |
| "mean_token_accuracy": 0.8297461867332458, | |
| "num_tokens": 1251870.0, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.10714285714285714, | |
| "grad_norm": 2.539581298828125, | |
| "learning_rate": 3.5353535353535356e-06, | |
| "loss": 0.5692148804664612, | |
| "mean_token_accuracy": 0.8007944226264954, | |
| "num_tokens": 1259323.0, | |
| "step": 141 | |
| }, | |
| { | |
| "epoch": 0.10790273556231003, | |
| "grad_norm": 3.483673572540283, | |
| "learning_rate": 3.560606060606061e-06, | |
| "loss": 0.4150466322898865, | |
| "mean_token_accuracy": 0.8599950671195984, | |
| "num_tokens": 1262353.0, | |
| "step": 142 | |
| }, | |
| { | |
| "epoch": 0.10866261398176291, | |
| "grad_norm": 2.72830867767334, | |
| "learning_rate": 3.585858585858586e-06, | |
| "loss": 0.5083350539207458, | |
| "mean_token_accuracy": 0.8165005445480347, | |
| "num_tokens": 1267258.0, | |
| "step": 143 | |
| }, | |
| { | |
| "epoch": 0.1094224924012158, | |
| "grad_norm": 3.0948173999786377, | |
| "learning_rate": 3.6111111111111115e-06, | |
| "loss": 0.6016761064529419, | |
| "mean_token_accuracy": 0.7953758239746094, | |
| "num_tokens": 1274814.0, | |
| "step": 144 | |
| }, | |
| { | |
| "epoch": 0.11018237082066869, | |
| "grad_norm": 1.8370214700698853, | |
| "learning_rate": 3.6363636363636366e-06, | |
| "loss": 0.46101880073547363, | |
| "mean_token_accuracy": 0.842634916305542, | |
| "num_tokens": 1285621.0, | |
| "step": 145 | |
| }, | |
| { | |
| "epoch": 0.11094224924012158, | |
| "grad_norm": 2.6534411907196045, | |
| "learning_rate": 3.661616161616162e-06, | |
| "loss": 0.49813517928123474, | |
| "mean_token_accuracy": 0.8288999199867249, | |
| "num_tokens": 1291286.0, | |
| "step": 146 | |
| }, | |
| { | |
| "epoch": 0.11170212765957446, | |
| "grad_norm": 2.3179194927215576, | |
| "learning_rate": 3.686868686868687e-06, | |
| "loss": 0.37569794058799744, | |
| "mean_token_accuracy": 0.8637095093727112, | |
| "num_tokens": 1296590.0, | |
| "step": 147 | |
| }, | |
| { | |
| "epoch": 0.11246200607902736, | |
| "grad_norm": 2.950357675552368, | |
| "learning_rate": 3.7121212121212124e-06, | |
| "loss": 0.36906760931015015, | |
| "mean_token_accuracy": 0.8722575306892395, | |
| "num_tokens": 1300285.0, | |
| "step": 148 | |
| }, | |
| { | |
| "epoch": 0.11322188449848024, | |
| "grad_norm": 2.5531680583953857, | |
| "learning_rate": 3.737373737373738e-06, | |
| "loss": 0.6222835779190063, | |
| "mean_token_accuracy": 0.8069770336151123, | |
| "num_tokens": 1308128.0, | |
| "step": 149 | |
| }, | |
| { | |
| "epoch": 0.11398176291793313, | |
| "grad_norm": 1.5450068712234497, | |
| "learning_rate": 3.7626262626262627e-06, | |
| "loss": 0.5184916257858276, | |
| "mean_token_accuracy": 0.8111289143562317, | |
| "num_tokens": 1322203.0, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.11474164133738601, | |
| "grad_norm": 2.367525815963745, | |
| "learning_rate": 3.7878787878787882e-06, | |
| "loss": 0.5896013975143433, | |
| "mean_token_accuracy": 0.8134846687316895, | |
| "num_tokens": 1330319.0, | |
| "step": 151 | |
| }, | |
| { | |
| "epoch": 0.11550151975683891, | |
| "grad_norm": 2.4617726802825928, | |
| "learning_rate": 3.8131313131313138e-06, | |
| "loss": 0.48776859045028687, | |
| "mean_token_accuracy": 0.8270028829574585, | |
| "num_tokens": 1336441.0, | |
| "step": 152 | |
| }, | |
| { | |
| "epoch": 0.11626139817629179, | |
| "grad_norm": 2.1757259368896484, | |
| "learning_rate": 3.8383838383838385e-06, | |
| "loss": 0.46739453077316284, | |
| "mean_token_accuracy": 0.8506530523300171, | |
| "num_tokens": 1344429.0, | |
| "step": 153 | |
| }, | |
| { | |
| "epoch": 0.11702127659574468, | |
| "grad_norm": 1.836829662322998, | |
| "learning_rate": 3.863636363636364e-06, | |
| "loss": 0.39413154125213623, | |
| "mean_token_accuracy": 0.8306484222412109, | |
| "num_tokens": 1353819.0, | |
| "step": 154 | |
| }, | |
| { | |
| "epoch": 0.11778115501519756, | |
| "grad_norm": 1.9097377061843872, | |
| "learning_rate": 3.88888888888889e-06, | |
| "loss": 0.5854921340942383, | |
| "mean_token_accuracy": 0.808419942855835, | |
| "num_tokens": 1367394.0, | |
| "step": 155 | |
| }, | |
| { | |
| "epoch": 0.11854103343465046, | |
| "grad_norm": 1.3445755243301392, | |
| "learning_rate": 3.914141414141415e-06, | |
| "loss": 0.424688458442688, | |
| "mean_token_accuracy": 0.8546179533004761, | |
| "num_tokens": 1385196.0, | |
| "step": 156 | |
| }, | |
| { | |
| "epoch": 0.11930091185410334, | |
| "grad_norm": 3.0801331996917725, | |
| "learning_rate": 3.93939393939394e-06, | |
| "loss": 0.5556809306144714, | |
| "mean_token_accuracy": 0.839694619178772, | |
| "num_tokens": 1389204.0, | |
| "step": 157 | |
| }, | |
| { | |
| "epoch": 0.12006079027355623, | |
| "grad_norm": 2.406383991241455, | |
| "learning_rate": 3.964646464646465e-06, | |
| "loss": 0.577480673789978, | |
| "mean_token_accuracy": 0.7948997020721436, | |
| "num_tokens": 1395989.0, | |
| "step": 158 | |
| }, | |
| { | |
| "epoch": 0.12082066869300911, | |
| "grad_norm": 2.291191339492798, | |
| "learning_rate": 3.98989898989899e-06, | |
| "loss": 0.47489291429519653, | |
| "mean_token_accuracy": 0.8395655751228333, | |
| "num_tokens": 1403502.0, | |
| "step": 159 | |
| }, | |
| { | |
| "epoch": 0.12158054711246201, | |
| "grad_norm": 2.4482150077819824, | |
| "learning_rate": 4.015151515151515e-06, | |
| "loss": 0.4630856215953827, | |
| "mean_token_accuracy": 0.8551818132400513, | |
| "num_tokens": 1409583.0, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.12234042553191489, | |
| "grad_norm": 2.2204723358154297, | |
| "learning_rate": 4.04040404040404e-06, | |
| "loss": 0.5246984958648682, | |
| "mean_token_accuracy": 0.8199984431266785, | |
| "num_tokens": 1417105.0, | |
| "step": 161 | |
| }, | |
| { | |
| "epoch": 0.12310030395136778, | |
| "grad_norm": 2.2438621520996094, | |
| "learning_rate": 4.065656565656566e-06, | |
| "loss": 0.5016493797302246, | |
| "mean_token_accuracy": 0.8170363903045654, | |
| "num_tokens": 1424524.0, | |
| "step": 162 | |
| }, | |
| { | |
| "epoch": 0.12386018237082067, | |
| "grad_norm": 1.9608901739120483, | |
| "learning_rate": 4.0909090909090915e-06, | |
| "loss": 0.4367106556892395, | |
| "mean_token_accuracy": 0.866153359413147, | |
| "num_tokens": 1433634.0, | |
| "step": 163 | |
| }, | |
| { | |
| "epoch": 0.12462006079027356, | |
| "grad_norm": 1.6554137468338013, | |
| "learning_rate": 4.116161616161617e-06, | |
| "loss": 0.5264513492584229, | |
| "mean_token_accuracy": 0.8171229958534241, | |
| "num_tokens": 1449026.0, | |
| "step": 164 | |
| }, | |
| { | |
| "epoch": 0.12537993920972645, | |
| "grad_norm": 2.085062265396118, | |
| "learning_rate": 4.141414141414142e-06, | |
| "loss": 0.5241636037826538, | |
| "mean_token_accuracy": 0.8255504965782166, | |
| "num_tokens": 1463885.0, | |
| "step": 165 | |
| }, | |
| { | |
| "epoch": 0.12613981762917933, | |
| "grad_norm": 3.6241228580474854, | |
| "learning_rate": 4.166666666666667e-06, | |
| "loss": 0.47209346294403076, | |
| "mean_token_accuracy": 0.848996102809906, | |
| "num_tokens": 1467858.0, | |
| "step": 166 | |
| }, | |
| { | |
| "epoch": 0.12689969604863222, | |
| "grad_norm": 1.6649004220962524, | |
| "learning_rate": 4.191919191919192e-06, | |
| "loss": 0.49254295229911804, | |
| "mean_token_accuracy": 0.8267063498497009, | |
| "num_tokens": 1478671.0, | |
| "step": 167 | |
| }, | |
| { | |
| "epoch": 0.1276595744680851, | |
| "grad_norm": 1.5069276094436646, | |
| "learning_rate": 4.217171717171717e-06, | |
| "loss": 0.49602210521698, | |
| "mean_token_accuracy": 0.8417288661003113, | |
| "num_tokens": 1499120.0, | |
| "step": 168 | |
| }, | |
| { | |
| "epoch": 0.128419452887538, | |
| "grad_norm": 1.7290006875991821, | |
| "learning_rate": 4.242424242424243e-06, | |
| "loss": 0.4862136244773865, | |
| "mean_token_accuracy": 0.8200680017471313, | |
| "num_tokens": 1511400.0, | |
| "step": 169 | |
| }, | |
| { | |
| "epoch": 0.12917933130699089, | |
| "grad_norm": 2.7758901119232178, | |
| "learning_rate": 4.267676767676767e-06, | |
| "loss": 0.5446156859397888, | |
| "mean_token_accuracy": 0.8082899451255798, | |
| "num_tokens": 1518381.0, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 0.12993920972644377, | |
| "grad_norm": 2.673225164413452, | |
| "learning_rate": 4.292929292929293e-06, | |
| "loss": 0.603697657585144, | |
| "mean_token_accuracy": 0.814541757106781, | |
| "num_tokens": 1525862.0, | |
| "step": 171 | |
| }, | |
| { | |
| "epoch": 0.13069908814589665, | |
| "grad_norm": 3.221341133117676, | |
| "learning_rate": 4.3181818181818185e-06, | |
| "loss": 0.3855879604816437, | |
| "mean_token_accuracy": 0.8715401291847229, | |
| "num_tokens": 1529082.0, | |
| "step": 172 | |
| }, | |
| { | |
| "epoch": 0.13145896656534956, | |
| "grad_norm": 2.2676260471343994, | |
| "learning_rate": 4.343434343434344e-06, | |
| "loss": 0.39173901081085205, | |
| "mean_token_accuracy": 0.8528841733932495, | |
| "num_tokens": 1535440.0, | |
| "step": 173 | |
| }, | |
| { | |
| "epoch": 0.13221884498480244, | |
| "grad_norm": 1.9669594764709473, | |
| "learning_rate": 4.368686868686869e-06, | |
| "loss": 0.4901638627052307, | |
| "mean_token_accuracy": 0.8228653073310852, | |
| "num_tokens": 1544007.0, | |
| "step": 174 | |
| }, | |
| { | |
| "epoch": 0.13297872340425532, | |
| "grad_norm": 2.5451693534851074, | |
| "learning_rate": 4.393939393939394e-06, | |
| "loss": 0.5373168587684631, | |
| "mean_token_accuracy": 0.8103193044662476, | |
| "num_tokens": 1550958.0, | |
| "step": 175 | |
| }, | |
| { | |
| "epoch": 0.1337386018237082, | |
| "grad_norm": 1.6717054843902588, | |
| "learning_rate": 4.41919191919192e-06, | |
| "loss": 0.5187485218048096, | |
| "mean_token_accuracy": 0.8374402523040771, | |
| "num_tokens": 1565548.0, | |
| "step": 176 | |
| }, | |
| { | |
| "epoch": 0.1344984802431611, | |
| "grad_norm": 1.7334532737731934, | |
| "learning_rate": 4.444444444444444e-06, | |
| "loss": 0.49591949582099915, | |
| "mean_token_accuracy": 0.8237836360931396, | |
| "num_tokens": 1577431.0, | |
| "step": 177 | |
| }, | |
| { | |
| "epoch": 0.135258358662614, | |
| "grad_norm": 3.037680149078369, | |
| "learning_rate": 4.46969696969697e-06, | |
| "loss": 0.3759227395057678, | |
| "mean_token_accuracy": 0.8660281896591187, | |
| "num_tokens": 1580945.0, | |
| "step": 178 | |
| }, | |
| { | |
| "epoch": 0.13601823708206687, | |
| "grad_norm": 2.263552188873291, | |
| "learning_rate": 4.494949494949495e-06, | |
| "loss": 0.519679069519043, | |
| "mean_token_accuracy": 0.8363398313522339, | |
| "num_tokens": 1588403.0, | |
| "step": 179 | |
| }, | |
| { | |
| "epoch": 0.13677811550151975, | |
| "grad_norm": 2.6347336769104004, | |
| "learning_rate": 4.520202020202021e-06, | |
| "loss": 0.5427145957946777, | |
| "mean_token_accuracy": 0.8300544023513794, | |
| "num_tokens": 1594552.0, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 0.13753799392097266, | |
| "grad_norm": 2.181586503982544, | |
| "learning_rate": 4.5454545454545455e-06, | |
| "loss": 0.5583125352859497, | |
| "mean_token_accuracy": 0.8262332081794739, | |
| "num_tokens": 1603003.0, | |
| "step": 181 | |
| }, | |
| { | |
| "epoch": 0.13829787234042554, | |
| "grad_norm": 2.5012893676757812, | |
| "learning_rate": 4.5707070707070715e-06, | |
| "loss": 0.3074784278869629, | |
| "mean_token_accuracy": 0.8823067545890808, | |
| "num_tokens": 1607891.0, | |
| "step": 182 | |
| }, | |
| { | |
| "epoch": 0.13905775075987842, | |
| "grad_norm": 2.5653810501098633, | |
| "learning_rate": 4.595959595959597e-06, | |
| "loss": 0.5661969184875488, | |
| "mean_token_accuracy": 0.812934398651123, | |
| "num_tokens": 1614430.0, | |
| "step": 183 | |
| }, | |
| { | |
| "epoch": 0.1398176291793313, | |
| "grad_norm": 2.126523017883301, | |
| "learning_rate": 4.621212121212122e-06, | |
| "loss": 0.49272066354751587, | |
| "mean_token_accuracy": 0.8277758359909058, | |
| "num_tokens": 1621646.0, | |
| "step": 184 | |
| }, | |
| { | |
| "epoch": 0.1405775075987842, | |
| "grad_norm": 2.557128429412842, | |
| "learning_rate": 4.646464646464647e-06, | |
| "loss": 0.45818793773651123, | |
| "mean_token_accuracy": 0.8401618003845215, | |
| "num_tokens": 1627007.0, | |
| "step": 185 | |
| }, | |
| { | |
| "epoch": 0.1413373860182371, | |
| "grad_norm": 3.263108491897583, | |
| "learning_rate": 4.671717171717172e-06, | |
| "loss": 0.36023402214050293, | |
| "mean_token_accuracy": 0.8685399293899536, | |
| "num_tokens": 1629994.0, | |
| "step": 186 | |
| }, | |
| { | |
| "epoch": 0.14209726443768997, | |
| "grad_norm": 1.8945348262786865, | |
| "learning_rate": 4.696969696969698e-06, | |
| "loss": 0.5250070095062256, | |
| "mean_token_accuracy": 0.8402786254882812, | |
| "num_tokens": 1642363.0, | |
| "step": 187 | |
| }, | |
| { | |
| "epoch": 0.14285714285714285, | |
| "grad_norm": 3.2461509704589844, | |
| "learning_rate": 4.722222222222222e-06, | |
| "loss": 0.4553906321525574, | |
| "mean_token_accuracy": 0.8446127772331238, | |
| "num_tokens": 1646341.0, | |
| "step": 188 | |
| }, | |
| { | |
| "epoch": 0.14361702127659576, | |
| "grad_norm": 1.7286163568496704, | |
| "learning_rate": 4.747474747474748e-06, | |
| "loss": 0.47110071778297424, | |
| "mean_token_accuracy": 0.8438384532928467, | |
| "num_tokens": 1657009.0, | |
| "step": 189 | |
| }, | |
| { | |
| "epoch": 0.14437689969604864, | |
| "grad_norm": 2.5611977577209473, | |
| "learning_rate": 4.772727272727273e-06, | |
| "loss": 0.40714097023010254, | |
| "mean_token_accuracy": 0.8551381230354309, | |
| "num_tokens": 1662133.0, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 0.14513677811550152, | |
| "grad_norm": 2.824475049972534, | |
| "learning_rate": 4.7979797979797985e-06, | |
| "loss": 0.37457284331321716, | |
| "mean_token_accuracy": 0.8744102716445923, | |
| "num_tokens": 1666071.0, | |
| "step": 191 | |
| }, | |
| { | |
| "epoch": 0.1458966565349544, | |
| "grad_norm": 2.905911445617676, | |
| "learning_rate": 4.823232323232324e-06, | |
| "loss": 0.4000989496707916, | |
| "mean_token_accuracy": 0.86374431848526, | |
| "num_tokens": 1670783.0, | |
| "step": 192 | |
| }, | |
| { | |
| "epoch": 0.1466565349544073, | |
| "grad_norm": 2.0857510566711426, | |
| "learning_rate": 4.848484848484849e-06, | |
| "loss": 0.5203642845153809, | |
| "mean_token_accuracy": 0.8152116537094116, | |
| "num_tokens": 1681211.0, | |
| "step": 193 | |
| }, | |
| { | |
| "epoch": 0.1474164133738602, | |
| "grad_norm": 2.348444938659668, | |
| "learning_rate": 4.873737373737374e-06, | |
| "loss": 0.47104156017303467, | |
| "mean_token_accuracy": 0.8531544208526611, | |
| "num_tokens": 1688386.0, | |
| "step": 194 | |
| }, | |
| { | |
| "epoch": 0.14817629179331307, | |
| "grad_norm": 2.4826736450195312, | |
| "learning_rate": 4.898989898989899e-06, | |
| "loss": 0.38890179991722107, | |
| "mean_token_accuracy": 0.8573125004768372, | |
| "num_tokens": 1693788.0, | |
| "step": 195 | |
| }, | |
| { | |
| "epoch": 0.14893617021276595, | |
| "grad_norm": 2.54610276222229, | |
| "learning_rate": 4.924242424242425e-06, | |
| "loss": 0.513835072517395, | |
| "mean_token_accuracy": 0.8168267011642456, | |
| "num_tokens": 1700459.0, | |
| "step": 196 | |
| }, | |
| { | |
| "epoch": 0.14969604863221886, | |
| "grad_norm": 2.144178867340088, | |
| "learning_rate": 4.94949494949495e-06, | |
| "loss": 0.39389657974243164, | |
| "mean_token_accuracy": 0.860227108001709, | |
| "num_tokens": 1707935.0, | |
| "step": 197 | |
| }, | |
| { | |
| "epoch": 0.15045592705167174, | |
| "grad_norm": 2.5684738159179688, | |
| "learning_rate": 4.974747474747475e-06, | |
| "loss": 0.4755311608314514, | |
| "mean_token_accuracy": 0.8305763602256775, | |
| "num_tokens": 1713174.0, | |
| "step": 198 | |
| }, | |
| { | |
| "epoch": 0.15121580547112462, | |
| "grad_norm": 1.8828866481781006, | |
| "learning_rate": 5e-06, | |
| "loss": 0.4542219638824463, | |
| "mean_token_accuracy": 0.8394654393196106, | |
| "num_tokens": 1721615.0, | |
| "step": 199 | |
| }, | |
| { | |
| "epoch": 0.1519756838905775, | |
| "grad_norm": 3.436140775680542, | |
| "learning_rate": 4.999999122701883e-06, | |
| "loss": 0.3606000542640686, | |
| "mean_token_accuracy": 0.8709638118743896, | |
| "num_tokens": 1724591.0, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.15273556231003038, | |
| "grad_norm": 3.222001791000366, | |
| "learning_rate": 4.999996490808146e-06, | |
| "loss": 0.43145138025283813, | |
| "mean_token_accuracy": 0.8614907264709473, | |
| "num_tokens": 1728551.0, | |
| "step": 201 | |
| }, | |
| { | |
| "epoch": 0.1534954407294833, | |
| "grad_norm": 1.425344705581665, | |
| "learning_rate": 4.9999921043206356e-06, | |
| "loss": 0.39088189601898193, | |
| "mean_token_accuracy": 0.8567240238189697, | |
| "num_tokens": 1742833.0, | |
| "step": 202 | |
| }, | |
| { | |
| "epoch": 0.15425531914893617, | |
| "grad_norm": 4.0360260009765625, | |
| "learning_rate": 4.999985963242432e-06, | |
| "loss": 0.5054274797439575, | |
| "mean_token_accuracy": 0.830514669418335, | |
| "num_tokens": 1745849.0, | |
| "step": 203 | |
| }, | |
| { | |
| "epoch": 0.15501519756838905, | |
| "grad_norm": 3.223965644836426, | |
| "learning_rate": 4.999978067577844e-06, | |
| "loss": 0.40649789571762085, | |
| "mean_token_accuracy": 0.8544188141822815, | |
| "num_tokens": 1749247.0, | |
| "step": 204 | |
| }, | |
| { | |
| "epoch": 0.15577507598784193, | |
| "grad_norm": 2.2814829349517822, | |
| "learning_rate": 4.999968417332415e-06, | |
| "loss": 0.5317715406417847, | |
| "mean_token_accuracy": 0.8255351781845093, | |
| "num_tokens": 1756221.0, | |
| "step": 205 | |
| }, | |
| { | |
| "epoch": 0.15653495440729484, | |
| "grad_norm": 2.2380337715148926, | |
| "learning_rate": 4.999957012512916e-06, | |
| "loss": 0.4584760069847107, | |
| "mean_token_accuracy": 0.836658239364624, | |
| "num_tokens": 1762609.0, | |
| "step": 206 | |
| }, | |
| { | |
| "epoch": 0.15729483282674772, | |
| "grad_norm": 1.8422174453735352, | |
| "learning_rate": 4.999943853127351e-06, | |
| "loss": 0.4300195574760437, | |
| "mean_token_accuracy": 0.8459518551826477, | |
| "num_tokens": 1771234.0, | |
| "step": 207 | |
| }, | |
| { | |
| "epoch": 0.1580547112462006, | |
| "grad_norm": 2.113293170928955, | |
| "learning_rate": 4.999928939184958e-06, | |
| "loss": 0.3882524371147156, | |
| "mean_token_accuracy": 0.8600642681121826, | |
| "num_tokens": 1778111.0, | |
| "step": 208 | |
| }, | |
| { | |
| "epoch": 0.15881458966565348, | |
| "grad_norm": 3.6378543376922607, | |
| "learning_rate": 4.999912270696202e-06, | |
| "loss": 0.5725066661834717, | |
| "mean_token_accuracy": 0.8156357407569885, | |
| "num_tokens": 1781576.0, | |
| "step": 209 | |
| }, | |
| { | |
| "epoch": 0.1595744680851064, | |
| "grad_norm": 2.112945079803467, | |
| "learning_rate": 4.999893847672783e-06, | |
| "loss": 0.5687650442123413, | |
| "mean_token_accuracy": 0.80972820520401, | |
| "num_tokens": 1790799.0, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 0.16033434650455927, | |
| "grad_norm": 2.2433907985687256, | |
| "learning_rate": 4.99987367012763e-06, | |
| "loss": 0.599341094493866, | |
| "mean_token_accuracy": 0.7983472347259521, | |
| "num_tokens": 1800040.0, | |
| "step": 211 | |
| }, | |
| { | |
| "epoch": 0.16109422492401215, | |
| "grad_norm": 2.1451005935668945, | |
| "learning_rate": 4.999851738074904e-06, | |
| "loss": 0.6137303709983826, | |
| "mean_token_accuracy": 0.7854923605918884, | |
| "num_tokens": 1816346.0, | |
| "step": 212 | |
| }, | |
| { | |
| "epoch": 0.16185410334346503, | |
| "grad_norm": 2.982390880584717, | |
| "learning_rate": 4.9998280515300006e-06, | |
| "loss": 0.5502551794052124, | |
| "mean_token_accuracy": 0.8052335977554321, | |
| "num_tokens": 1821568.0, | |
| "step": 213 | |
| }, | |
| { | |
| "epoch": 0.16261398176291794, | |
| "grad_norm": 3.35490345954895, | |
| "learning_rate": 4.999802610509541e-06, | |
| "loss": 0.545773983001709, | |
| "mean_token_accuracy": 0.8195146322250366, | |
| "num_tokens": 1825494.0, | |
| "step": 214 | |
| }, | |
| { | |
| "epoch": 0.16337386018237082, | |
| "grad_norm": 3.035769462585449, | |
| "learning_rate": 4.999775415031381e-06, | |
| "loss": 0.5717880129814148, | |
| "mean_token_accuracy": 0.8186711668968201, | |
| "num_tokens": 1829808.0, | |
| "step": 215 | |
| }, | |
| { | |
| "epoch": 0.1641337386018237, | |
| "grad_norm": 2.9388792514801025, | |
| "learning_rate": 4.999746465114609e-06, | |
| "loss": 0.5311149954795837, | |
| "mean_token_accuracy": 0.8200923204421997, | |
| "num_tokens": 1834365.0, | |
| "step": 216 | |
| }, | |
| { | |
| "epoch": 0.16489361702127658, | |
| "grad_norm": 1.7520580291748047, | |
| "learning_rate": 4.999715760779541e-06, | |
| "loss": 0.4936062693595886, | |
| "mean_token_accuracy": 0.8098561763763428, | |
| "num_tokens": 1846298.0, | |
| "step": 217 | |
| }, | |
| { | |
| "epoch": 0.1656534954407295, | |
| "grad_norm": 1.4501854181289673, | |
| "learning_rate": 4.999683302047729e-06, | |
| "loss": 0.4379549026489258, | |
| "mean_token_accuracy": 0.8455474376678467, | |
| "num_tokens": 1862718.0, | |
| "step": 218 | |
| }, | |
| { | |
| "epoch": 0.16641337386018237, | |
| "grad_norm": 1.5712978839874268, | |
| "learning_rate": 4.999649088941951e-06, | |
| "loss": 0.36806052923202515, | |
| "mean_token_accuracy": 0.8413015604019165, | |
| "num_tokens": 1873162.0, | |
| "step": 219 | |
| }, | |
| { | |
| "epoch": 0.16717325227963525, | |
| "grad_norm": 3.3667683601379395, | |
| "learning_rate": 4.999613121486222e-06, | |
| "loss": 0.5865274667739868, | |
| "mean_token_accuracy": 0.8232425451278687, | |
| "num_tokens": 1877383.0, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 0.16793313069908813, | |
| "grad_norm": 2.027570962905884, | |
| "learning_rate": 4.999575399705782e-06, | |
| "loss": 0.47757071256637573, | |
| "mean_token_accuracy": 0.8435803651809692, | |
| "num_tokens": 1885443.0, | |
| "step": 221 | |
| }, | |
| { | |
| "epoch": 0.16869300911854104, | |
| "grad_norm": 1.9477081298828125, | |
| "learning_rate": 4.9995359236271094e-06, | |
| "loss": 0.49632883071899414, | |
| "mean_token_accuracy": 0.8447240591049194, | |
| "num_tokens": 1897117.0, | |
| "step": 222 | |
| }, | |
| { | |
| "epoch": 0.16945288753799392, | |
| "grad_norm": 2.1391282081604004, | |
| "learning_rate": 4.9994946932779076e-06, | |
| "loss": 0.6010444164276123, | |
| "mean_token_accuracy": 0.8133578300476074, | |
| "num_tokens": 1907743.0, | |
| "step": 223 | |
| }, | |
| { | |
| "epoch": 0.1702127659574468, | |
| "grad_norm": 3.3081393241882324, | |
| "learning_rate": 4.999451708687114e-06, | |
| "loss": 0.5357567071914673, | |
| "mean_token_accuracy": 0.8079873323440552, | |
| "num_tokens": 1911565.0, | |
| "step": 224 | |
| }, | |
| { | |
| "epoch": 0.17097264437689969, | |
| "grad_norm": 2.4006707668304443, | |
| "learning_rate": 4.999406969884897e-06, | |
| "loss": 0.5457048416137695, | |
| "mean_token_accuracy": 0.8097658753395081, | |
| "num_tokens": 1918799.0, | |
| "step": 225 | |
| }, | |
| { | |
| "epoch": 0.1717325227963526, | |
| "grad_norm": 1.8322361707687378, | |
| "learning_rate": 4.999360476902656e-06, | |
| "loss": 0.4240890145301819, | |
| "mean_token_accuracy": 0.8516747951507568, | |
| "num_tokens": 1927555.0, | |
| "step": 226 | |
| }, | |
| { | |
| "epoch": 0.17249240121580547, | |
| "grad_norm": 3.136608600616455, | |
| "learning_rate": 4.999312229773022e-06, | |
| "loss": 0.4659491181373596, | |
| "mean_token_accuracy": 0.838162899017334, | |
| "num_tokens": 1931836.0, | |
| "step": 227 | |
| }, | |
| { | |
| "epoch": 0.17325227963525835, | |
| "grad_norm": 2.2767984867095947, | |
| "learning_rate": 4.999262228529855e-06, | |
| "loss": 0.563478946685791, | |
| "mean_token_accuracy": 0.8091976642608643, | |
| "num_tokens": 1939554.0, | |
| "step": 228 | |
| }, | |
| { | |
| "epoch": 0.17401215805471124, | |
| "grad_norm": 1.4322718381881714, | |
| "learning_rate": 4.99921047320825e-06, | |
| "loss": 0.4094346761703491, | |
| "mean_token_accuracy": 0.8564238548278809, | |
| "num_tokens": 1954282.0, | |
| "step": 229 | |
| }, | |
| { | |
| "epoch": 0.17477203647416414, | |
| "grad_norm": 3.2467753887176514, | |
| "learning_rate": 4.99915696384453e-06, | |
| "loss": 0.5694391131401062, | |
| "mean_token_accuracy": 0.8160352110862732, | |
| "num_tokens": 1958627.0, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 0.17553191489361702, | |
| "grad_norm": 1.9246042966842651, | |
| "learning_rate": 4.99910170047625e-06, | |
| "loss": 0.5620714426040649, | |
| "mean_token_accuracy": 0.8041449189186096, | |
| "num_tokens": 1969365.0, | |
| "step": 231 | |
| }, | |
| { | |
| "epoch": 0.1762917933130699, | |
| "grad_norm": 2.924678087234497, | |
| "learning_rate": 4.999044683142196e-06, | |
| "loss": 0.4853099286556244, | |
| "mean_token_accuracy": 0.8256953954696655, | |
| "num_tokens": 1973371.0, | |
| "step": 232 | |
| }, | |
| { | |
| "epoch": 0.1770516717325228, | |
| "grad_norm": 2.098088026046753, | |
| "learning_rate": 4.998985911882383e-06, | |
| "loss": 0.5587087869644165, | |
| "mean_token_accuracy": 0.7966718673706055, | |
| "num_tokens": 1983843.0, | |
| "step": 233 | |
| }, | |
| { | |
| "epoch": 0.1778115501519757, | |
| "grad_norm": 2.514716386795044, | |
| "learning_rate": 4.998925386738063e-06, | |
| "loss": 0.47832345962524414, | |
| "mean_token_accuracy": 0.8339341878890991, | |
| "num_tokens": 1989182.0, | |
| "step": 234 | |
| }, | |
| { | |
| "epoch": 0.17857142857142858, | |
| "grad_norm": 3.023073196411133, | |
| "learning_rate": 4.998863107751711e-06, | |
| "loss": 0.49698033928871155, | |
| "mean_token_accuracy": 0.8559197187423706, | |
| "num_tokens": 1993536.0, | |
| "step": 235 | |
| }, | |
| { | |
| "epoch": 0.17933130699088146, | |
| "grad_norm": 3.1792895793914795, | |
| "learning_rate": 4.99879907496704e-06, | |
| "loss": 0.5702434778213501, | |
| "mean_token_accuracy": 0.8000571131706238, | |
| "num_tokens": 1998042.0, | |
| "step": 236 | |
| }, | |
| { | |
| "epoch": 0.18009118541033434, | |
| "grad_norm": 2.1569433212280273, | |
| "learning_rate": 4.998733288428987e-06, | |
| "loss": 0.5830904245376587, | |
| "mean_token_accuracy": 0.8168346285820007, | |
| "num_tokens": 2009376.0, | |
| "step": 237 | |
| }, | |
| { | |
| "epoch": 0.18085106382978725, | |
| "grad_norm": 2.4051129817962646, | |
| "learning_rate": 4.998665748183727e-06, | |
| "loss": 0.5650715231895447, | |
| "mean_token_accuracy": 0.8147177696228027, | |
| "num_tokens": 2017048.0, | |
| "step": 238 | |
| }, | |
| { | |
| "epoch": 0.18161094224924013, | |
| "grad_norm": 1.5866661071777344, | |
| "learning_rate": 4.998596454278661e-06, | |
| "loss": 0.5082447528839111, | |
| "mean_token_accuracy": 0.8248941898345947, | |
| "num_tokens": 2031366.0, | |
| "step": 239 | |
| }, | |
| { | |
| "epoch": 0.182370820668693, | |
| "grad_norm": 1.99013090133667, | |
| "learning_rate": 4.998525406762422e-06, | |
| "loss": 0.5089194774627686, | |
| "mean_token_accuracy": 0.8197280168533325, | |
| "num_tokens": 2040568.0, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 0.1831306990881459, | |
| "grad_norm": 2.6105995178222656, | |
| "learning_rate": 4.998452605684874e-06, | |
| "loss": 0.4203953146934509, | |
| "mean_token_accuracy": 0.8532576560974121, | |
| "num_tokens": 2045583.0, | |
| "step": 241 | |
| }, | |
| { | |
| "epoch": 0.1838905775075988, | |
| "grad_norm": 2.277846097946167, | |
| "learning_rate": 4.998378051097111e-06, | |
| "loss": 0.5560142397880554, | |
| "mean_token_accuracy": 0.8047254085540771, | |
| "num_tokens": 2053500.0, | |
| "step": 242 | |
| }, | |
| { | |
| "epoch": 0.18465045592705168, | |
| "grad_norm": 1.656392216682434, | |
| "learning_rate": 4.998301743051459e-06, | |
| "loss": 0.602260947227478, | |
| "mean_token_accuracy": 0.7868745923042297, | |
| "num_tokens": 2069389.0, | |
| "step": 243 | |
| }, | |
| { | |
| "epoch": 0.18541033434650456, | |
| "grad_norm": 2.1323728561401367, | |
| "learning_rate": 4.9982236816014735e-06, | |
| "loss": 0.4376535415649414, | |
| "mean_token_accuracy": 0.8592725992202759, | |
| "num_tokens": 2077138.0, | |
| "step": 244 | |
| }, | |
| { | |
| "epoch": 0.18617021276595744, | |
| "grad_norm": 2.616633176803589, | |
| "learning_rate": 4.998143866801941e-06, | |
| "loss": 0.5682218670845032, | |
| "mean_token_accuracy": 0.8179980516433716, | |
| "num_tokens": 2083970.0, | |
| "step": 245 | |
| }, | |
| { | |
| "epoch": 0.18693009118541035, | |
| "grad_norm": 2.6488561630249023, | |
| "learning_rate": 4.99806229870888e-06, | |
| "loss": 0.47162926197052, | |
| "mean_token_accuracy": 0.8438947200775146, | |
| "num_tokens": 2089457.0, | |
| "step": 246 | |
| }, | |
| { | |
| "epoch": 0.18768996960486323, | |
| "grad_norm": 2.019186496734619, | |
| "learning_rate": 4.9979789773795365e-06, | |
| "loss": 0.408061683177948, | |
| "mean_token_accuracy": 0.8576331734657288, | |
| "num_tokens": 2097206.0, | |
| "step": 247 | |
| }, | |
| { | |
| "epoch": 0.1884498480243161, | |
| "grad_norm": 2.302354335784912, | |
| "learning_rate": 4.997893902872389e-06, | |
| "loss": 0.5505059361457825, | |
| "mean_token_accuracy": 0.8150410652160645, | |
| "num_tokens": 2105214.0, | |
| "step": 248 | |
| }, | |
| { | |
| "epoch": 0.189209726443769, | |
| "grad_norm": 1.7864950895309448, | |
| "learning_rate": 4.997807075247147e-06, | |
| "loss": 0.41310858726501465, | |
| "mean_token_accuracy": 0.8526202440261841, | |
| "num_tokens": 2114182.0, | |
| "step": 249 | |
| }, | |
| { | |
| "epoch": 0.1899696048632219, | |
| "grad_norm": 1.6269062757492065, | |
| "learning_rate": 4.997718494564747e-06, | |
| "loss": 0.39904171228408813, | |
| "mean_token_accuracy": 0.8582673072814941, | |
| "num_tokens": 2124312.0, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.19072948328267478, | |
| "grad_norm": 1.565595269203186, | |
| "learning_rate": 4.997628160887361e-06, | |
| "loss": 0.49161577224731445, | |
| "mean_token_accuracy": 0.821890115737915, | |
| "num_tokens": 2146690.0, | |
| "step": 251 | |
| }, | |
| { | |
| "epoch": 0.19148936170212766, | |
| "grad_norm": 3.404179096221924, | |
| "learning_rate": 4.997536074278388e-06, | |
| "loss": 0.5341525673866272, | |
| "mean_token_accuracy": 0.8176189661026001, | |
| "num_tokens": 2150460.0, | |
| "step": 252 | |
| }, | |
| { | |
| "epoch": 0.19224924012158054, | |
| "grad_norm": 2.601632595062256, | |
| "learning_rate": 4.9974422348024565e-06, | |
| "loss": 0.5186881422996521, | |
| "mean_token_accuracy": 0.8306224346160889, | |
| "num_tokens": 2158218.0, | |
| "step": 253 | |
| }, | |
| { | |
| "epoch": 0.19300911854103345, | |
| "grad_norm": 2.3589515686035156, | |
| "learning_rate": 4.997346642525429e-06, | |
| "loss": 0.45730939507484436, | |
| "mean_token_accuracy": 0.8451381921768188, | |
| "num_tokens": 2164297.0, | |
| "step": 254 | |
| }, | |
| { | |
| "epoch": 0.19376899696048633, | |
| "grad_norm": 3.0846846103668213, | |
| "learning_rate": 4.9972492975143936e-06, | |
| "loss": 0.4467453360557556, | |
| "mean_token_accuracy": 0.8379011154174805, | |
| "num_tokens": 2169760.0, | |
| "step": 255 | |
| }, | |
| { | |
| "epoch": 0.1945288753799392, | |
| "grad_norm": 1.7366052865982056, | |
| "learning_rate": 4.997150199837671e-06, | |
| "loss": 0.4394836723804474, | |
| "mean_token_accuracy": 0.8398878574371338, | |
| "num_tokens": 2180146.0, | |
| "step": 256 | |
| }, | |
| { | |
| "epoch": 0.1952887537993921, | |
| "grad_norm": 2.384813070297241, | |
| "learning_rate": 4.997049349564814e-06, | |
| "loss": 0.47991105914115906, | |
| "mean_token_accuracy": 0.8374294638633728, | |
| "num_tokens": 2188162.0, | |
| "step": 257 | |
| }, | |
| { | |
| "epoch": 0.196048632218845, | |
| "grad_norm": 2.590132236480713, | |
| "learning_rate": 4.996946746766602e-06, | |
| "loss": 0.41144347190856934, | |
| "mean_token_accuracy": 0.8605992794036865, | |
| "num_tokens": 2193311.0, | |
| "step": 258 | |
| }, | |
| { | |
| "epoch": 0.19680851063829788, | |
| "grad_norm": 1.7227436304092407, | |
| "learning_rate": 4.996842391515045e-06, | |
| "loss": 0.5073963403701782, | |
| "mean_token_accuracy": 0.8359246253967285, | |
| "num_tokens": 2206332.0, | |
| "step": 259 | |
| }, | |
| { | |
| "epoch": 0.19756838905775076, | |
| "grad_norm": 1.2812135219573975, | |
| "learning_rate": 4.996736283883382e-06, | |
| "loss": 0.40696483850479126, | |
| "mean_token_accuracy": 0.8485742211341858, | |
| "num_tokens": 2226343.0, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 0.19832826747720364, | |
| "grad_norm": 2.5326876640319824, | |
| "learning_rate": 4.9966284239460875e-06, | |
| "loss": 0.46859997510910034, | |
| "mean_token_accuracy": 0.8571674823760986, | |
| "num_tokens": 2231825.0, | |
| "step": 261 | |
| }, | |
| { | |
| "epoch": 0.19908814589665655, | |
| "grad_norm": 1.9771672487258911, | |
| "learning_rate": 4.996518811778858e-06, | |
| "loss": 0.41854721307754517, | |
| "mean_token_accuracy": 0.8526982665061951, | |
| "num_tokens": 2239051.0, | |
| "step": 262 | |
| }, | |
| { | |
| "epoch": 0.19984802431610943, | |
| "grad_norm": 2.1683223247528076, | |
| "learning_rate": 4.996407447458626e-06, | |
| "loss": 0.5161755681037903, | |
| "mean_token_accuracy": 0.8490742444992065, | |
| "num_tokens": 2247131.0, | |
| "step": 263 | |
| }, | |
| { | |
| "epoch": 0.2006079027355623, | |
| "grad_norm": 2.5736196041107178, | |
| "learning_rate": 4.99629433106355e-06, | |
| "loss": 0.4735284745693207, | |
| "mean_token_accuracy": 0.8350207209587097, | |
| "num_tokens": 2253296.0, | |
| "step": 264 | |
| }, | |
| { | |
| "epoch": 0.2013677811550152, | |
| "grad_norm": 1.7842241525650024, | |
| "learning_rate": 4.99617946267302e-06, | |
| "loss": 0.47157543897628784, | |
| "mean_token_accuracy": 0.833238959312439, | |
| "num_tokens": 2264773.0, | |
| "step": 265 | |
| }, | |
| { | |
| "epoch": 0.20212765957446807, | |
| "grad_norm": 2.9247069358825684, | |
| "learning_rate": 4.996062842367655e-06, | |
| "loss": 0.4112093448638916, | |
| "mean_token_accuracy": 0.8585200309753418, | |
| "num_tokens": 2268575.0, | |
| "step": 266 | |
| }, | |
| { | |
| "epoch": 0.20288753799392098, | |
| "grad_norm": 2.306239604949951, | |
| "learning_rate": 4.9959444702293025e-06, | |
| "loss": 0.4145011305809021, | |
| "mean_token_accuracy": 0.8556011915206909, | |
| "num_tokens": 2274508.0, | |
| "step": 267 | |
| }, | |
| { | |
| "epoch": 0.20364741641337386, | |
| "grad_norm": 2.7466163635253906, | |
| "learning_rate": 4.995824346341041e-06, | |
| "loss": 0.37871870398521423, | |
| "mean_token_accuracy": 0.8531461358070374, | |
| "num_tokens": 2279159.0, | |
| "step": 268 | |
| }, | |
| { | |
| "epoch": 0.20440729483282674, | |
| "grad_norm": 2.0082874298095703, | |
| "learning_rate": 4.99570247078718e-06, | |
| "loss": 0.5860568881034851, | |
| "mean_token_accuracy": 0.8031082153320312, | |
| "num_tokens": 2291049.0, | |
| "step": 269 | |
| }, | |
| { | |
| "epoch": 0.20516717325227962, | |
| "grad_norm": 2.280381917953491, | |
| "learning_rate": 4.995578843653255e-06, | |
| "loss": 0.467026948928833, | |
| "mean_token_accuracy": 0.8364914059638977, | |
| "num_tokens": 2297400.0, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 0.20592705167173253, | |
| "grad_norm": 1.8473336696624756, | |
| "learning_rate": 4.995453465026033e-06, | |
| "loss": 0.4944884777069092, | |
| "mean_token_accuracy": 0.838844895362854, | |
| "num_tokens": 2307492.0, | |
| "step": 271 | |
| }, | |
| { | |
| "epoch": 0.2066869300911854, | |
| "grad_norm": 2.4100840091705322, | |
| "learning_rate": 4.995326334993508e-06, | |
| "loss": 0.5006181001663208, | |
| "mean_token_accuracy": 0.8268105983734131, | |
| "num_tokens": 2313273.0, | |
| "step": 272 | |
| }, | |
| { | |
| "epoch": 0.2074468085106383, | |
| "grad_norm": 2.236138343811035, | |
| "learning_rate": 4.9951974536449055e-06, | |
| "loss": 0.4911901354789734, | |
| "mean_token_accuracy": 0.8307386040687561, | |
| "num_tokens": 2320363.0, | |
| "step": 273 | |
| }, | |
| { | |
| "epoch": 0.20820668693009117, | |
| "grad_norm": 3.372265577316284, | |
| "learning_rate": 4.9950668210706795e-06, | |
| "loss": 0.37246087193489075, | |
| "mean_token_accuracy": 0.8783556818962097, | |
| "num_tokens": 2323307.0, | |
| "step": 274 | |
| }, | |
| { | |
| "epoch": 0.20896656534954408, | |
| "grad_norm": 2.1185147762298584, | |
| "learning_rate": 4.994934437362513e-06, | |
| "loss": 0.5806586146354675, | |
| "mean_token_accuracy": 0.8004832863807678, | |
| "num_tokens": 2333048.0, | |
| "step": 275 | |
| }, | |
| { | |
| "epoch": 0.20972644376899696, | |
| "grad_norm": 1.9632351398468018, | |
| "learning_rate": 4.994800302613318e-06, | |
| "loss": 0.4413146376609802, | |
| "mean_token_accuracy": 0.8510923981666565, | |
| "num_tokens": 2340860.0, | |
| "step": 276 | |
| }, | |
| { | |
| "epoch": 0.21048632218844984, | |
| "grad_norm": 2.279758930206299, | |
| "learning_rate": 4.994664416917236e-06, | |
| "loss": 0.5088690519332886, | |
| "mean_token_accuracy": 0.8209558725357056, | |
| "num_tokens": 2348974.0, | |
| "step": 277 | |
| }, | |
| { | |
| "epoch": 0.21124620060790272, | |
| "grad_norm": 1.6696852445602417, | |
| "learning_rate": 4.994526780369636e-06, | |
| "loss": 0.4469617009162903, | |
| "mean_token_accuracy": 0.8400031328201294, | |
| "num_tokens": 2370327.0, | |
| "step": 278 | |
| }, | |
| { | |
| "epoch": 0.21200607902735563, | |
| "grad_norm": 2.841850757598877, | |
| "learning_rate": 4.9943873930671175e-06, | |
| "loss": 0.5606095790863037, | |
| "mean_token_accuracy": 0.807634711265564, | |
| "num_tokens": 2375131.0, | |
| "step": 279 | |
| }, | |
| { | |
| "epoch": 0.2127659574468085, | |
| "grad_norm": 2.82785701751709, | |
| "learning_rate": 4.994246255107506e-06, | |
| "loss": 0.410330593585968, | |
| "mean_token_accuracy": 0.8566482067108154, | |
| "num_tokens": 2378971.0, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 0.2135258358662614, | |
| "grad_norm": 2.8596596717834473, | |
| "learning_rate": 4.994103366589859e-06, | |
| "loss": 0.3787328600883484, | |
| "mean_token_accuracy": 0.874786376953125, | |
| "num_tokens": 2382513.0, | |
| "step": 281 | |
| }, | |
| { | |
| "epoch": 0.21428571428571427, | |
| "grad_norm": 1.7845677137374878, | |
| "learning_rate": 4.993958727614462e-06, | |
| "loss": 0.4610300660133362, | |
| "mean_token_accuracy": 0.8375419974327087, | |
| "num_tokens": 2393147.0, | |
| "step": 282 | |
| }, | |
| { | |
| "epoch": 0.21504559270516718, | |
| "grad_norm": 2.3245997428894043, | |
| "learning_rate": 4.993812338282826e-06, | |
| "loss": 0.4142797589302063, | |
| "mean_token_accuracy": 0.8542178869247437, | |
| "num_tokens": 2398916.0, | |
| "step": 283 | |
| }, | |
| { | |
| "epoch": 0.21580547112462006, | |
| "grad_norm": 1.692062258720398, | |
| "learning_rate": 4.993664198697694e-06, | |
| "loss": 0.44197604060173035, | |
| "mean_token_accuracy": 0.8350081443786621, | |
| "num_tokens": 2411836.0, | |
| "step": 284 | |
| }, | |
| { | |
| "epoch": 0.21656534954407294, | |
| "grad_norm": 2.157754898071289, | |
| "learning_rate": 4.993514308963037e-06, | |
| "loss": 0.5904836058616638, | |
| "mean_token_accuracy": 0.7999725341796875, | |
| "num_tokens": 2420611.0, | |
| "step": 285 | |
| }, | |
| { | |
| "epoch": 0.21732522796352582, | |
| "grad_norm": 3.618769884109497, | |
| "learning_rate": 4.993362669184051e-06, | |
| "loss": 0.5695822238922119, | |
| "mean_token_accuracy": 0.7965190410614014, | |
| "num_tokens": 2424041.0, | |
| "step": 286 | |
| }, | |
| { | |
| "epoch": 0.21808510638297873, | |
| "grad_norm": 2.031795024871826, | |
| "learning_rate": 4.993209279467164e-06, | |
| "loss": 0.5142983198165894, | |
| "mean_token_accuracy": 0.8019678592681885, | |
| "num_tokens": 2434620.0, | |
| "step": 287 | |
| }, | |
| { | |
| "epoch": 0.2188449848024316, | |
| "grad_norm": 1.7205814123153687, | |
| "learning_rate": 4.993054139920031e-06, | |
| "loss": 0.4392193853855133, | |
| "mean_token_accuracy": 0.8336896896362305, | |
| "num_tokens": 2444965.0, | |
| "step": 288 | |
| }, | |
| { | |
| "epoch": 0.2196048632218845, | |
| "grad_norm": 1.7577887773513794, | |
| "learning_rate": 4.992897250651535e-06, | |
| "loss": 0.5540003180503845, | |
| "mean_token_accuracy": 0.7990663051605225, | |
| "num_tokens": 2457544.0, | |
| "step": 289 | |
| }, | |
| { | |
| "epoch": 0.22036474164133737, | |
| "grad_norm": 1.802450180053711, | |
| "learning_rate": 4.992738611771787e-06, | |
| "loss": 0.5318427085876465, | |
| "mean_token_accuracy": 0.8389760851860046, | |
| "num_tokens": 2467875.0, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 0.22112462006079028, | |
| "grad_norm": 2.1131720542907715, | |
| "learning_rate": 4.992578223392124e-06, | |
| "loss": 0.557640790939331, | |
| "mean_token_accuracy": 0.8158830404281616, | |
| "num_tokens": 2475554.0, | |
| "step": 291 | |
| }, | |
| { | |
| "epoch": 0.22188449848024316, | |
| "grad_norm": 3.0931613445281982, | |
| "learning_rate": 4.992416085625115e-06, | |
| "loss": 0.4821542501449585, | |
| "mean_token_accuracy": 0.8444563746452332, | |
| "num_tokens": 2479755.0, | |
| "step": 292 | |
| }, | |
| { | |
| "epoch": 0.22264437689969604, | |
| "grad_norm": 2.84855055809021, | |
| "learning_rate": 4.992252198584554e-06, | |
| "loss": 0.48346447944641113, | |
| "mean_token_accuracy": 0.8550156354904175, | |
| "num_tokens": 2483838.0, | |
| "step": 293 | |
| }, | |
| { | |
| "epoch": 0.22340425531914893, | |
| "grad_norm": 1.8370510339736938, | |
| "learning_rate": 4.992086562385462e-06, | |
| "loss": 0.5358907580375671, | |
| "mean_token_accuracy": 0.8083988428115845, | |
| "num_tokens": 2497336.0, | |
| "step": 294 | |
| }, | |
| { | |
| "epoch": 0.22416413373860183, | |
| "grad_norm": 1.771071195602417, | |
| "learning_rate": 4.9919191771440905e-06, | |
| "loss": 0.5313449501991272, | |
| "mean_token_accuracy": 0.8171120882034302, | |
| "num_tokens": 2513012.0, | |
| "step": 295 | |
| }, | |
| { | |
| "epoch": 0.22492401215805471, | |
| "grad_norm": 2.845015048980713, | |
| "learning_rate": 4.9917500429779165e-06, | |
| "loss": 0.5177006125450134, | |
| "mean_token_accuracy": 0.8264347314834595, | |
| "num_tokens": 2517827.0, | |
| "step": 296 | |
| }, | |
| { | |
| "epoch": 0.2256838905775076, | |
| "grad_norm": 2.6123650074005127, | |
| "learning_rate": 4.991579160005644e-06, | |
| "loss": 0.4545767605304718, | |
| "mean_token_accuracy": 0.8568557500839233, | |
| "num_tokens": 2522988.0, | |
| "step": 297 | |
| }, | |
| { | |
| "epoch": 0.22644376899696048, | |
| "grad_norm": 1.7086553573608398, | |
| "learning_rate": 4.991406528347206e-06, | |
| "loss": 0.44247299432754517, | |
| "mean_token_accuracy": 0.8628532886505127, | |
| "num_tokens": 2534893.0, | |
| "step": 298 | |
| }, | |
| { | |
| "epoch": 0.22720364741641338, | |
| "grad_norm": 2.7990589141845703, | |
| "learning_rate": 4.9912321481237616e-06, | |
| "loss": 0.5641108751296997, | |
| "mean_token_accuracy": 0.8045810461044312, | |
| "num_tokens": 2541195.0, | |
| "step": 299 | |
| }, | |
| { | |
| "epoch": 0.22796352583586627, | |
| "grad_norm": 3.0234270095825195, | |
| "learning_rate": 4.991056019457697e-06, | |
| "loss": 0.4119265675544739, | |
| "mean_token_accuracy": 0.864372730255127, | |
| "num_tokens": 2545007.0, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.22872340425531915, | |
| "grad_norm": 2.3037335872650146, | |
| "learning_rate": 4.990878142472628e-06, | |
| "loss": 0.4908141493797302, | |
| "mean_token_accuracy": 0.8342020511627197, | |
| "num_tokens": 2552092.0, | |
| "step": 301 | |
| }, | |
| { | |
| "epoch": 0.22948328267477203, | |
| "grad_norm": 1.9956134557724, | |
| "learning_rate": 4.990698517293394e-06, | |
| "loss": 0.45161187648773193, | |
| "mean_token_accuracy": 0.844601035118103, | |
| "num_tokens": 2560369.0, | |
| "step": 302 | |
| }, | |
| { | |
| "epoch": 0.23024316109422494, | |
| "grad_norm": 3.525520086288452, | |
| "learning_rate": 4.9905171440460645e-06, | |
| "loss": 0.43304887413978577, | |
| "mean_token_accuracy": 0.8541014790534973, | |
| "num_tokens": 2563256.0, | |
| "step": 303 | |
| }, | |
| { | |
| "epoch": 0.23100303951367782, | |
| "grad_norm": 4.261448383331299, | |
| "learning_rate": 4.990334022857932e-06, | |
| "loss": 0.5139227509498596, | |
| "mean_token_accuracy": 0.8354399800300598, | |
| "num_tokens": 2565879.0, | |
| "step": 304 | |
| }, | |
| { | |
| "epoch": 0.2317629179331307, | |
| "grad_norm": 2.440788507461548, | |
| "learning_rate": 4.990149153857519e-06, | |
| "loss": 0.4294750690460205, | |
| "mean_token_accuracy": 0.8510936498641968, | |
| "num_tokens": 2572210.0, | |
| "step": 305 | |
| }, | |
| { | |
| "epoch": 0.23252279635258358, | |
| "grad_norm": 1.7491639852523804, | |
| "learning_rate": 4.989962537174573e-06, | |
| "loss": 0.47826993465423584, | |
| "mean_token_accuracy": 0.8374208807945251, | |
| "num_tokens": 2584135.0, | |
| "step": 306 | |
| }, | |
| { | |
| "epoch": 0.23328267477203649, | |
| "grad_norm": 3.845266342163086, | |
| "learning_rate": 4.989774172940071e-06, | |
| "loss": 0.5992549657821655, | |
| "mean_token_accuracy": 0.7800809144973755, | |
| "num_tokens": 2587386.0, | |
| "step": 307 | |
| }, | |
| { | |
| "epoch": 0.23404255319148937, | |
| "grad_norm": 2.2111220359802246, | |
| "learning_rate": 4.989584061286211e-06, | |
| "loss": 0.4815753996372223, | |
| "mean_token_accuracy": 0.8330761194229126, | |
| "num_tokens": 2593992.0, | |
| "step": 308 | |
| }, | |
| { | |
| "epoch": 0.23480243161094225, | |
| "grad_norm": 1.858041524887085, | |
| "learning_rate": 4.989392202346423e-06, | |
| "loss": 0.4118471145629883, | |
| "mean_token_accuracy": 0.8520488739013672, | |
| "num_tokens": 2604174.0, | |
| "step": 309 | |
| }, | |
| { | |
| "epoch": 0.23556231003039513, | |
| "grad_norm": 2.405632734298706, | |
| "learning_rate": 4.989198596255361e-06, | |
| "loss": 0.38085171580314636, | |
| "mean_token_accuracy": 0.854683518409729, | |
| "num_tokens": 2609186.0, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 0.23632218844984804, | |
| "grad_norm": 3.8427865505218506, | |
| "learning_rate": 4.989003243148904e-06, | |
| "loss": 0.4445143938064575, | |
| "mean_token_accuracy": 0.8356641530990601, | |
| "num_tokens": 2611913.0, | |
| "step": 311 | |
| }, | |
| { | |
| "epoch": 0.23708206686930092, | |
| "grad_norm": 2.3617193698883057, | |
| "learning_rate": 4.988806143164159e-06, | |
| "loss": 0.4281064569950104, | |
| "mean_token_accuracy": 0.8460803031921387, | |
| "num_tokens": 2621405.0, | |
| "step": 312 | |
| }, | |
| { | |
| "epoch": 0.2378419452887538, | |
| "grad_norm": 2.439340353012085, | |
| "learning_rate": 4.988607296439459e-06, | |
| "loss": 0.5275991559028625, | |
| "mean_token_accuracy": 0.8325222730636597, | |
| "num_tokens": 2629060.0, | |
| "step": 313 | |
| }, | |
| { | |
| "epoch": 0.23860182370820668, | |
| "grad_norm": 1.5777690410614014, | |
| "learning_rate": 4.98840670311436e-06, | |
| "loss": 0.4817584455013275, | |
| "mean_token_accuracy": 0.8332220911979675, | |
| "num_tokens": 2642411.0, | |
| "step": 314 | |
| }, | |
| { | |
| "epoch": 0.2393617021276596, | |
| "grad_norm": 2.179872989654541, | |
| "learning_rate": 4.988204363329648e-06, | |
| "loss": 0.6049296855926514, | |
| "mean_token_accuracy": 0.7860822677612305, | |
| "num_tokens": 2652538.0, | |
| "step": 315 | |
| }, | |
| { | |
| "epoch": 0.24012158054711247, | |
| "grad_norm": 3.253547430038452, | |
| "learning_rate": 4.988000277227334e-06, | |
| "loss": 0.46995067596435547, | |
| "mean_token_accuracy": 0.8340871930122375, | |
| "num_tokens": 2655887.0, | |
| "step": 316 | |
| }, | |
| { | |
| "epoch": 0.24088145896656535, | |
| "grad_norm": 3.441596508026123, | |
| "learning_rate": 4.987794444950651e-06, | |
| "loss": 0.3157607316970825, | |
| "mean_token_accuracy": 0.8920344114303589, | |
| "num_tokens": 2658686.0, | |
| "step": 317 | |
| }, | |
| { | |
| "epoch": 0.24164133738601823, | |
| "grad_norm": 1.8112664222717285, | |
| "learning_rate": 4.987586866644061e-06, | |
| "loss": 0.502338171005249, | |
| "mean_token_accuracy": 0.8305802345275879, | |
| "num_tokens": 2669691.0, | |
| "step": 318 | |
| }, | |
| { | |
| "epoch": 0.24240121580547114, | |
| "grad_norm": 1.8285833597183228, | |
| "learning_rate": 4.9873775424532515e-06, | |
| "loss": 0.4544355869293213, | |
| "mean_token_accuracy": 0.8415238261222839, | |
| "num_tokens": 2678956.0, | |
| "step": 319 | |
| }, | |
| { | |
| "epoch": 0.24316109422492402, | |
| "grad_norm": 2.1771512031555176, | |
| "learning_rate": 4.9871664725251314e-06, | |
| "loss": 0.454698383808136, | |
| "mean_token_accuracy": 0.8483167886734009, | |
| "num_tokens": 2686411.0, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 0.2439209726443769, | |
| "grad_norm": 1.6493504047393799, | |
| "learning_rate": 4.986953657007841e-06, | |
| "loss": 0.42326417565345764, | |
| "mean_token_accuracy": 0.8452361822128296, | |
| "num_tokens": 2698617.0, | |
| "step": 321 | |
| }, | |
| { | |
| "epoch": 0.24468085106382978, | |
| "grad_norm": 1.1489403247833252, | |
| "learning_rate": 4.98673909605074e-06, | |
| "loss": 0.36659368872642517, | |
| "mean_token_accuracy": 0.8352444171905518, | |
| "num_tokens": 2717933.0, | |
| "step": 322 | |
| }, | |
| { | |
| "epoch": 0.2454407294832827, | |
| "grad_norm": 2.295814275741577, | |
| "learning_rate": 4.986522789804417e-06, | |
| "loss": 0.5098875164985657, | |
| "mean_token_accuracy": 0.8190979957580566, | |
| "num_tokens": 2723970.0, | |
| "step": 323 | |
| }, | |
| { | |
| "epoch": 0.24620060790273557, | |
| "grad_norm": 2.3241398334503174, | |
| "learning_rate": 4.986304738420684e-06, | |
| "loss": 0.42751336097717285, | |
| "mean_token_accuracy": 0.8547185659408569, | |
| "num_tokens": 2729390.0, | |
| "step": 324 | |
| }, | |
| { | |
| "epoch": 0.24696048632218845, | |
| "grad_norm": 2.8512768745422363, | |
| "learning_rate": 4.986084942052577e-06, | |
| "loss": 0.33234283328056335, | |
| "mean_token_accuracy": 0.8766615986824036, | |
| "num_tokens": 2733610.0, | |
| "step": 325 | |
| }, | |
| { | |
| "epoch": 0.24772036474164133, | |
| "grad_norm": 2.4620141983032227, | |
| "learning_rate": 4.9858634008543574e-06, | |
| "loss": 0.5473405122756958, | |
| "mean_token_accuracy": 0.8383826017379761, | |
| "num_tokens": 2740063.0, | |
| "step": 326 | |
| }, | |
| { | |
| "epoch": 0.24848024316109424, | |
| "grad_norm": 1.984655737876892, | |
| "learning_rate": 4.985640114981513e-06, | |
| "loss": 0.4946171045303345, | |
| "mean_token_accuracy": 0.839870274066925, | |
| "num_tokens": 2750741.0, | |
| "step": 327 | |
| }, | |
| { | |
| "epoch": 0.24924012158054712, | |
| "grad_norm": 2.5328571796417236, | |
| "learning_rate": 4.985415084590752e-06, | |
| "loss": 0.5796651840209961, | |
| "mean_token_accuracy": 0.7959966659545898, | |
| "num_tokens": 2756165.0, | |
| "step": 328 | |
| }, | |
| { | |
| "epoch": 0.25, | |
| "grad_norm": 2.400641441345215, | |
| "learning_rate": 4.985188309840012e-06, | |
| "loss": 0.48668015003204346, | |
| "mean_token_accuracy": 0.8382711410522461, | |
| "num_tokens": 2761417.0, | |
| "step": 329 | |
| }, | |
| { | |
| "epoch": 0.2507598784194529, | |
| "grad_norm": 2.7159430980682373, | |
| "learning_rate": 4.984959790888451e-06, | |
| "loss": 0.5163013935089111, | |
| "mean_token_accuracy": 0.8203201293945312, | |
| "num_tokens": 2766549.0, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 0.25151975683890576, | |
| "grad_norm": 2.537278652191162, | |
| "learning_rate": 4.984729527896451e-06, | |
| "loss": 0.5671283602714539, | |
| "mean_token_accuracy": 0.8172976970672607, | |
| "num_tokens": 2772817.0, | |
| "step": 331 | |
| }, | |
| { | |
| "epoch": 0.25227963525835867, | |
| "grad_norm": 3.192000150680542, | |
| "learning_rate": 4.984497521025622e-06, | |
| "loss": 0.37886548042297363, | |
| "mean_token_accuracy": 0.8669577836990356, | |
| "num_tokens": 2775795.0, | |
| "step": 332 | |
| }, | |
| { | |
| "epoch": 0.2530395136778115, | |
| "grad_norm": 2.5922412872314453, | |
| "learning_rate": 4.984263770438793e-06, | |
| "loss": 0.4395058751106262, | |
| "mean_token_accuracy": 0.847874104976654, | |
| "num_tokens": 2781033.0, | |
| "step": 333 | |
| }, | |
| { | |
| "epoch": 0.25379939209726443, | |
| "grad_norm": 1.9635506868362427, | |
| "learning_rate": 4.984028276300021e-06, | |
| "loss": 0.4364502429962158, | |
| "mean_token_accuracy": 0.8494650721549988, | |
| "num_tokens": 2787582.0, | |
| "step": 334 | |
| }, | |
| { | |
| "epoch": 0.25455927051671734, | |
| "grad_norm": 2.2944114208221436, | |
| "learning_rate": 4.983791038774585e-06, | |
| "loss": 0.4683613181114197, | |
| "mean_token_accuracy": 0.8264153599739075, | |
| "num_tokens": 2794158.0, | |
| "step": 335 | |
| }, | |
| { | |
| "epoch": 0.2553191489361702, | |
| "grad_norm": 1.778643012046814, | |
| "learning_rate": 4.983552058028985e-06, | |
| "loss": 0.46646493673324585, | |
| "mean_token_accuracy": 0.8360317945480347, | |
| "num_tokens": 2808408.0, | |
| "step": 336 | |
| }, | |
| { | |
| "epoch": 0.2560790273556231, | |
| "grad_norm": 3.0198330879211426, | |
| "learning_rate": 4.9833113342309495e-06, | |
| "loss": 0.5558529496192932, | |
| "mean_token_accuracy": 0.8131203651428223, | |
| "num_tokens": 2813796.0, | |
| "step": 337 | |
| }, | |
| { | |
| "epoch": 0.256838905775076, | |
| "grad_norm": 2.508333683013916, | |
| "learning_rate": 4.983068867549427e-06, | |
| "loss": 0.4780687093734741, | |
| "mean_token_accuracy": 0.8369977474212646, | |
| "num_tokens": 2818909.0, | |
| "step": 338 | |
| }, | |
| { | |
| "epoch": 0.25759878419452886, | |
| "grad_norm": 2.1583943367004395, | |
| "learning_rate": 4.982824658154589e-06, | |
| "loss": 0.6210355162620544, | |
| "mean_token_accuracy": 0.786374568939209, | |
| "num_tokens": 2831458.0, | |
| "step": 339 | |
| }, | |
| { | |
| "epoch": 0.25835866261398177, | |
| "grad_norm": 2.619635581970215, | |
| "learning_rate": 4.9825787062178315e-06, | |
| "loss": 0.5546093583106995, | |
| "mean_token_accuracy": 0.8163464069366455, | |
| "num_tokens": 2843726.0, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 0.2591185410334346, | |
| "grad_norm": 1.9398376941680908, | |
| "learning_rate": 4.982331011911774e-06, | |
| "loss": 0.409252405166626, | |
| "mean_token_accuracy": 0.8436061143875122, | |
| "num_tokens": 2864157.0, | |
| "step": 341 | |
| }, | |
| { | |
| "epoch": 0.25987841945288753, | |
| "grad_norm": 2.2148611545562744, | |
| "learning_rate": 4.982081575410256e-06, | |
| "loss": 0.44126778841018677, | |
| "mean_token_accuracy": 0.8487811088562012, | |
| "num_tokens": 2870701.0, | |
| "step": 342 | |
| }, | |
| { | |
| "epoch": 0.26063829787234044, | |
| "grad_norm": 3.5705149173736572, | |
| "learning_rate": 4.9818303968883445e-06, | |
| "loss": 0.7215286493301392, | |
| "mean_token_accuracy": 0.7655127048492432, | |
| "num_tokens": 2874745.0, | |
| "step": 343 | |
| }, | |
| { | |
| "epoch": 0.2613981762917933, | |
| "grad_norm": 1.8558040857315063, | |
| "learning_rate": 4.981577476522323e-06, | |
| "loss": 0.5530655384063721, | |
| "mean_token_accuracy": 0.8249034881591797, | |
| "num_tokens": 2887052.0, | |
| "step": 344 | |
| }, | |
| { | |
| "epoch": 0.2621580547112462, | |
| "grad_norm": 2.4575531482696533, | |
| "learning_rate": 4.981322814489703e-06, | |
| "loss": 0.49899396300315857, | |
| "mean_token_accuracy": 0.828569233417511, | |
| "num_tokens": 2892473.0, | |
| "step": 345 | |
| }, | |
| { | |
| "epoch": 0.2629179331306991, | |
| "grad_norm": 1.9310275316238403, | |
| "learning_rate": 4.981066410969215e-06, | |
| "loss": 0.47420281171798706, | |
| "mean_token_accuracy": 0.8402732610702515, | |
| "num_tokens": 2900453.0, | |
| "step": 346 | |
| }, | |
| { | |
| "epoch": 0.26367781155015196, | |
| "grad_norm": 2.21812105178833, | |
| "learning_rate": 4.980808266140813e-06, | |
| "loss": 0.4555210769176483, | |
| "mean_token_accuracy": 0.8437404632568359, | |
| "num_tokens": 2906741.0, | |
| "step": 347 | |
| }, | |
| { | |
| "epoch": 0.26443768996960487, | |
| "grad_norm": 2.7364001274108887, | |
| "learning_rate": 4.9805483801856744e-06, | |
| "loss": 0.5119813680648804, | |
| "mean_token_accuracy": 0.8283645510673523, | |
| "num_tokens": 2911988.0, | |
| "step": 348 | |
| }, | |
| { | |
| "epoch": 0.2651975683890577, | |
| "grad_norm": 3.0648796558380127, | |
| "learning_rate": 4.980286753286196e-06, | |
| "loss": 0.3780750334262848, | |
| "mean_token_accuracy": 0.8826199769973755, | |
| "num_tokens": 2915193.0, | |
| "step": 349 | |
| }, | |
| { | |
| "epoch": 0.26595744680851063, | |
| "grad_norm": 1.5473605394363403, | |
| "learning_rate": 4.980023385625996e-06, | |
| "loss": 0.38594651222229004, | |
| "mean_token_accuracy": 0.8553473949432373, | |
| "num_tokens": 2929454.0, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.26671732522796354, | |
| "grad_norm": 2.988367795944214, | |
| "learning_rate": 4.979758277389919e-06, | |
| "loss": 0.4946131706237793, | |
| "mean_token_accuracy": 0.8174018859863281, | |
| "num_tokens": 2934240.0, | |
| "step": 351 | |
| }, | |
| { | |
| "epoch": 0.2674772036474164, | |
| "grad_norm": 2.039886236190796, | |
| "learning_rate": 4.9794914287640264e-06, | |
| "loss": 0.5628063678741455, | |
| "mean_token_accuracy": 0.8034824132919312, | |
| "num_tokens": 2945842.0, | |
| "step": 352 | |
| }, | |
| { | |
| "epoch": 0.2682370820668693, | |
| "grad_norm": 2.3829421997070312, | |
| "learning_rate": 4.979222839935602e-06, | |
| "loss": 0.6037441492080688, | |
| "mean_token_accuracy": 0.7891867160797119, | |
| "num_tokens": 2953898.0, | |
| "step": 353 | |
| }, | |
| { | |
| "epoch": 0.2689969604863222, | |
| "grad_norm": 1.959708571434021, | |
| "learning_rate": 4.9789525110931545e-06, | |
| "loss": 0.5061007738113403, | |
| "mean_token_accuracy": 0.8236986994743347, | |
| "num_tokens": 2962281.0, | |
| "step": 354 | |
| }, | |
| { | |
| "epoch": 0.26975683890577506, | |
| "grad_norm": 2.61007022857666, | |
| "learning_rate": 4.978680442426409e-06, | |
| "loss": 0.6019249558448792, | |
| "mean_token_accuracy": 0.7874947190284729, | |
| "num_tokens": 2969603.0, | |
| "step": 355 | |
| }, | |
| { | |
| "epoch": 0.270516717325228, | |
| "grad_norm": 1.9074316024780273, | |
| "learning_rate": 4.978406634126315e-06, | |
| "loss": 0.48542970418930054, | |
| "mean_token_accuracy": 0.8402796983718872, | |
| "num_tokens": 2979413.0, | |
| "step": 356 | |
| }, | |
| { | |
| "epoch": 0.2712765957446808, | |
| "grad_norm": 1.517846941947937, | |
| "learning_rate": 4.978131086385041e-06, | |
| "loss": 0.4455636143684387, | |
| "mean_token_accuracy": 0.838932991027832, | |
| "num_tokens": 2992495.0, | |
| "step": 357 | |
| }, | |
| { | |
| "epoch": 0.27203647416413373, | |
| "grad_norm": 2.1448144912719727, | |
| "learning_rate": 4.977853799395976e-06, | |
| "loss": 0.4592750072479248, | |
| "mean_token_accuracy": 0.8330867290496826, | |
| "num_tokens": 3000007.0, | |
| "step": 358 | |
| }, | |
| { | |
| "epoch": 0.27279635258358664, | |
| "grad_norm": 3.2687668800354004, | |
| "learning_rate": 4.977574773353732e-06, | |
| "loss": 0.511050820350647, | |
| "mean_token_accuracy": 0.8200520277023315, | |
| "num_tokens": 3003758.0, | |
| "step": 359 | |
| }, | |
| { | |
| "epoch": 0.2735562310030395, | |
| "grad_norm": 2.8292200565338135, | |
| "learning_rate": 4.97729400845414e-06, | |
| "loss": 0.41245830059051514, | |
| "mean_token_accuracy": 0.8344168663024902, | |
| "num_tokens": 3007678.0, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 0.2743161094224924, | |
| "grad_norm": 1.8785876035690308, | |
| "learning_rate": 4.977011504894253e-06, | |
| "loss": 0.46520695090293884, | |
| "mean_token_accuracy": 0.8308258652687073, | |
| "num_tokens": 3015743.0, | |
| "step": 361 | |
| }, | |
| { | |
| "epoch": 0.2750759878419453, | |
| "grad_norm": 1.6335071325302124, | |
| "learning_rate": 4.97672726287234e-06, | |
| "loss": 0.4188292324542999, | |
| "mean_token_accuracy": 0.865784764289856, | |
| "num_tokens": 3026661.0, | |
| "step": 362 | |
| }, | |
| { | |
| "epoch": 0.27583586626139817, | |
| "grad_norm": 3.381009101867676, | |
| "learning_rate": 4.976441282587894e-06, | |
| "loss": 0.5134732723236084, | |
| "mean_token_accuracy": 0.8187953233718872, | |
| "num_tokens": 3030422.0, | |
| "step": 363 | |
| }, | |
| { | |
| "epoch": 0.2765957446808511, | |
| "grad_norm": 1.3476839065551758, | |
| "learning_rate": 4.9761535642416284e-06, | |
| "loss": 0.4297364056110382, | |
| "mean_token_accuracy": 0.8355786800384521, | |
| "num_tokens": 3047763.0, | |
| "step": 364 | |
| }, | |
| { | |
| "epoch": 0.2773556231003039, | |
| "grad_norm": 2.3485515117645264, | |
| "learning_rate": 4.9758641080354745e-06, | |
| "loss": 0.4887448251247406, | |
| "mean_token_accuracy": 0.849709689617157, | |
| "num_tokens": 3053751.0, | |
| "step": 365 | |
| }, | |
| { | |
| "epoch": 0.27811550151975684, | |
| "grad_norm": 2.869009256362915, | |
| "learning_rate": 4.975572914172581e-06, | |
| "loss": 0.5443825721740723, | |
| "mean_token_accuracy": 0.8108746409416199, | |
| "num_tokens": 3058079.0, | |
| "step": 366 | |
| }, | |
| { | |
| "epoch": 0.27887537993920974, | |
| "grad_norm": 2.337939977645874, | |
| "learning_rate": 4.975279982857324e-06, | |
| "loss": 0.5280558466911316, | |
| "mean_token_accuracy": 0.8160051703453064, | |
| "num_tokens": 3065547.0, | |
| "step": 367 | |
| }, | |
| { | |
| "epoch": 0.2796352583586626, | |
| "grad_norm": 1.421703577041626, | |
| "learning_rate": 4.97498531429529e-06, | |
| "loss": 0.39594563841819763, | |
| "mean_token_accuracy": 0.8643261194229126, | |
| "num_tokens": 3078113.0, | |
| "step": 368 | |
| }, | |
| { | |
| "epoch": 0.2803951367781155, | |
| "grad_norm": 2.1441762447357178, | |
| "learning_rate": 4.97468890869329e-06, | |
| "loss": 0.4584254324436188, | |
| "mean_token_accuracy": 0.8336924314498901, | |
| "num_tokens": 3085266.0, | |
| "step": 369 | |
| }, | |
| { | |
| "epoch": 0.2811550151975684, | |
| "grad_norm": 1.343610405921936, | |
| "learning_rate": 4.974390766259353e-06, | |
| "loss": 0.4358016848564148, | |
| "mean_token_accuracy": 0.8309500217437744, | |
| "num_tokens": 3100672.0, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 0.28191489361702127, | |
| "grad_norm": 2.636687994003296, | |
| "learning_rate": 4.974090887202726e-06, | |
| "loss": 0.5070590376853943, | |
| "mean_token_accuracy": 0.8223599791526794, | |
| "num_tokens": 3106740.0, | |
| "step": 371 | |
| }, | |
| { | |
| "epoch": 0.2826747720364742, | |
| "grad_norm": 2.0654895305633545, | |
| "learning_rate": 4.973789271733877e-06, | |
| "loss": 0.6026565432548523, | |
| "mean_token_accuracy": 0.7863624095916748, | |
| "num_tokens": 3117948.0, | |
| "step": 372 | |
| }, | |
| { | |
| "epoch": 0.28343465045592703, | |
| "grad_norm": 4.953190326690674, | |
| "learning_rate": 4.973485920064491e-06, | |
| "loss": 0.5796823501586914, | |
| "mean_token_accuracy": 0.8124111890792847, | |
| "num_tokens": 3120338.0, | |
| "step": 373 | |
| }, | |
| { | |
| "epoch": 0.28419452887537994, | |
| "grad_norm": 1.2652311325073242, | |
| "learning_rate": 4.973180832407471e-06, | |
| "loss": 0.3768846392631531, | |
| "mean_token_accuracy": 0.8423436880111694, | |
| "num_tokens": 3135718.0, | |
| "step": 374 | |
| }, | |
| { | |
| "epoch": 0.28495440729483285, | |
| "grad_norm": 2.609652280807495, | |
| "learning_rate": 4.97287400897694e-06, | |
| "loss": 0.5162647366523743, | |
| "mean_token_accuracy": 0.8220236897468567, | |
| "num_tokens": 3141392.0, | |
| "step": 375 | |
| }, | |
| { | |
| "epoch": 0.2857142857142857, | |
| "grad_norm": 3.0013082027435303, | |
| "learning_rate": 4.972565449988238e-06, | |
| "loss": 0.3151981830596924, | |
| "mean_token_accuracy": 0.8900531530380249, | |
| "num_tokens": 3144656.0, | |
| "step": 376 | |
| }, | |
| { | |
| "epoch": 0.2864741641337386, | |
| "grad_norm": 1.9808810949325562, | |
| "learning_rate": 4.972255155657925e-06, | |
| "loss": 0.4985666275024414, | |
| "mean_token_accuracy": 0.8261170387268066, | |
| "num_tokens": 3152536.0, | |
| "step": 377 | |
| }, | |
| { | |
| "epoch": 0.2872340425531915, | |
| "grad_norm": 7.131393909454346, | |
| "learning_rate": 4.9719431262037755e-06, | |
| "loss": 0.5039874911308289, | |
| "mean_token_accuracy": 0.8170304298400879, | |
| "num_tokens": 3157134.0, | |
| "step": 378 | |
| }, | |
| { | |
| "epoch": 0.28799392097264437, | |
| "grad_norm": 1.4525601863861084, | |
| "learning_rate": 4.971629361844785e-06, | |
| "loss": 0.3949277400970459, | |
| "mean_token_accuracy": 0.8594280481338501, | |
| "num_tokens": 3171764.0, | |
| "step": 379 | |
| }, | |
| { | |
| "epoch": 0.2887537993920973, | |
| "grad_norm": 1.983331322669983, | |
| "learning_rate": 4.971313862801166e-06, | |
| "loss": 0.414550244808197, | |
| "mean_token_accuracy": 0.8540629744529724, | |
| "num_tokens": 3179444.0, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 0.28951367781155013, | |
| "grad_norm": 1.9733079671859741, | |
| "learning_rate": 4.9709966292943455e-06, | |
| "loss": 0.4406163692474365, | |
| "mean_token_accuracy": 0.8342312574386597, | |
| "num_tokens": 3186957.0, | |
| "step": 381 | |
| }, | |
| { | |
| "epoch": 0.29027355623100304, | |
| "grad_norm": 1.652886152267456, | |
| "learning_rate": 4.970677661546972e-06, | |
| "loss": 0.5227410197257996, | |
| "mean_token_accuracy": 0.8188983201980591, | |
| "num_tokens": 3201482.0, | |
| "step": 382 | |
| }, | |
| { | |
| "epoch": 0.29103343465045595, | |
| "grad_norm": 3.3413736820220947, | |
| "learning_rate": 4.970356959782909e-06, | |
| "loss": 0.5933316946029663, | |
| "mean_token_accuracy": 0.7997614145278931, | |
| "num_tokens": 3206201.0, | |
| "step": 383 | |
| }, | |
| { | |
| "epoch": 0.2917933130699088, | |
| "grad_norm": 1.6980115175247192, | |
| "learning_rate": 4.970034524227239e-06, | |
| "loss": 0.35611239075660706, | |
| "mean_token_accuracy": 0.871821939945221, | |
| "num_tokens": 3214696.0, | |
| "step": 384 | |
| }, | |
| { | |
| "epoch": 0.2925531914893617, | |
| "grad_norm": 1.4026317596435547, | |
| "learning_rate": 4.969710355106256e-06, | |
| "loss": 0.42110762000083923, | |
| "mean_token_accuracy": 0.843463659286499, | |
| "num_tokens": 3227516.0, | |
| "step": 385 | |
| }, | |
| { | |
| "epoch": 0.2933130699088146, | |
| "grad_norm": 2.508169651031494, | |
| "learning_rate": 4.969384452647477e-06, | |
| "loss": 0.4792252779006958, | |
| "mean_token_accuracy": 0.8330808877944946, | |
| "num_tokens": 3233895.0, | |
| "step": 386 | |
| }, | |
| { | |
| "epoch": 0.29407294832826747, | |
| "grad_norm": 1.7341818809509277, | |
| "learning_rate": 4.969056817079633e-06, | |
| "loss": 0.4872246980667114, | |
| "mean_token_accuracy": 0.8220229148864746, | |
| "num_tokens": 3244349.0, | |
| "step": 387 | |
| }, | |
| { | |
| "epoch": 0.2948328267477204, | |
| "grad_norm": 2.6779842376708984, | |
| "learning_rate": 4.968727448632669e-06, | |
| "loss": 0.3885750472545624, | |
| "mean_token_accuracy": 0.8585678339004517, | |
| "num_tokens": 3248613.0, | |
| "step": 388 | |
| }, | |
| { | |
| "epoch": 0.29559270516717323, | |
| "grad_norm": 1.7146910429000854, | |
| "learning_rate": 4.968396347537751e-06, | |
| "loss": 0.3956252932548523, | |
| "mean_token_accuracy": 0.8555857539176941, | |
| "num_tokens": 3259996.0, | |
| "step": 389 | |
| }, | |
| { | |
| "epoch": 0.29635258358662614, | |
| "grad_norm": 2.9935412406921387, | |
| "learning_rate": 4.968063514027258e-06, | |
| "loss": 0.37694051861763, | |
| "mean_token_accuracy": 0.8573237657546997, | |
| "num_tokens": 3263195.0, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 0.29711246200607905, | |
| "grad_norm": 2.540135145187378, | |
| "learning_rate": 4.967728948334784e-06, | |
| "loss": 0.47736361622810364, | |
| "mean_token_accuracy": 0.830003023147583, | |
| "num_tokens": 3267650.0, | |
| "step": 391 | |
| }, | |
| { | |
| "epoch": 0.2978723404255319, | |
| "grad_norm": 1.7454566955566406, | |
| "learning_rate": 4.967392650695141e-06, | |
| "loss": 0.37880340218544006, | |
| "mean_token_accuracy": 0.8590790629386902, | |
| "num_tokens": 3279001.0, | |
| "step": 392 | |
| }, | |
| { | |
| "epoch": 0.2986322188449848, | |
| "grad_norm": 2.264423370361328, | |
| "learning_rate": 4.967054621344356e-06, | |
| "loss": 0.557185709476471, | |
| "mean_token_accuracy": 0.834559977054596, | |
| "num_tokens": 3287231.0, | |
| "step": 393 | |
| }, | |
| { | |
| "epoch": 0.2993920972644377, | |
| "grad_norm": 1.8944240808486938, | |
| "learning_rate": 4.96671486051967e-06, | |
| "loss": 0.5128063559532166, | |
| "mean_token_accuracy": 0.8286118507385254, | |
| "num_tokens": 3295986.0, | |
| "step": 394 | |
| }, | |
| { | |
| "epoch": 0.30015197568389057, | |
| "grad_norm": 2.953937530517578, | |
| "learning_rate": 4.966373368459542e-06, | |
| "loss": 0.6272998452186584, | |
| "mean_token_accuracy": 0.7938471436500549, | |
| "num_tokens": 3301681.0, | |
| "step": 395 | |
| }, | |
| { | |
| "epoch": 0.3009118541033435, | |
| "grad_norm": 2.085981845855713, | |
| "learning_rate": 4.966030145403642e-06, | |
| "loss": 0.5305740237236023, | |
| "mean_token_accuracy": 0.8177378177642822, | |
| "num_tokens": 3310738.0, | |
| "step": 396 | |
| }, | |
| { | |
| "epoch": 0.30167173252279633, | |
| "grad_norm": 1.54762864112854, | |
| "learning_rate": 4.965685191592859e-06, | |
| "loss": 0.40358829498291016, | |
| "mean_token_accuracy": 0.8470232486724854, | |
| "num_tokens": 3321604.0, | |
| "step": 397 | |
| }, | |
| { | |
| "epoch": 0.30243161094224924, | |
| "grad_norm": 4.049755573272705, | |
| "learning_rate": 4.9653385072692935e-06, | |
| "loss": 0.4449229836463928, | |
| "mean_token_accuracy": 0.8261799812316895, | |
| "num_tokens": 3324318.0, | |
| "step": 398 | |
| }, | |
| { | |
| "epoch": 0.30319148936170215, | |
| "grad_norm": 2.5733447074890137, | |
| "learning_rate": 4.964990092676263e-06, | |
| "loss": 0.48942190408706665, | |
| "mean_token_accuracy": 0.8347821235656738, | |
| "num_tokens": 3329603.0, | |
| "step": 399 | |
| }, | |
| { | |
| "epoch": 0.303951367781155, | |
| "grad_norm": 2.338879108428955, | |
| "learning_rate": 4.964639948058297e-06, | |
| "loss": 0.3268873393535614, | |
| "mean_token_accuracy": 0.881216287612915, | |
| "num_tokens": 3334802.0, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.3047112462006079, | |
| "grad_norm": 1.804551124572754, | |
| "learning_rate": 4.964288073661142e-06, | |
| "loss": 0.3644765317440033, | |
| "mean_token_accuracy": 0.8437183499336243, | |
| "num_tokens": 3342948.0, | |
| "step": 401 | |
| }, | |
| { | |
| "epoch": 0.30547112462006076, | |
| "grad_norm": 1.6682343482971191, | |
| "learning_rate": 4.963934469731756e-06, | |
| "loss": 0.45854973793029785, | |
| "mean_token_accuracy": 0.8507781624794006, | |
| "num_tokens": 3353740.0, | |
| "step": 402 | |
| }, | |
| { | |
| "epoch": 0.30623100303951367, | |
| "grad_norm": 4.343244552612305, | |
| "learning_rate": 4.963579136518312e-06, | |
| "loss": 0.47984182834625244, | |
| "mean_token_accuracy": 0.8383501768112183, | |
| "num_tokens": 3357648.0, | |
| "step": 403 | |
| }, | |
| { | |
| "epoch": 0.3069908814589666, | |
| "grad_norm": 2.8490872383117676, | |
| "learning_rate": 4.963222074270197e-06, | |
| "loss": 0.5992711782455444, | |
| "mean_token_accuracy": 0.8178967237472534, | |
| "num_tokens": 3363050.0, | |
| "step": 404 | |
| }, | |
| { | |
| "epoch": 0.30775075987841943, | |
| "grad_norm": 2.543656587600708, | |
| "learning_rate": 4.962863283238011e-06, | |
| "loss": 0.5602473020553589, | |
| "mean_token_accuracy": 0.8076410293579102, | |
| "num_tokens": 3369067.0, | |
| "step": 405 | |
| }, | |
| { | |
| "epoch": 0.30851063829787234, | |
| "grad_norm": 1.5536396503448486, | |
| "learning_rate": 4.962502763673566e-06, | |
| "loss": 0.477932870388031, | |
| "mean_token_accuracy": 0.8217586278915405, | |
| "num_tokens": 3382570.0, | |
| "step": 406 | |
| }, | |
| { | |
| "epoch": 0.30927051671732525, | |
| "grad_norm": 2.4474048614501953, | |
| "learning_rate": 4.96214051582989e-06, | |
| "loss": 0.48681867122650146, | |
| "mean_token_accuracy": 0.8442000150680542, | |
| "num_tokens": 3389192.0, | |
| "step": 407 | |
| }, | |
| { | |
| "epoch": 0.3100303951367781, | |
| "grad_norm": 2.3083410263061523, | |
| "learning_rate": 4.961776539961222e-06, | |
| "loss": 0.5324057340621948, | |
| "mean_token_accuracy": 0.8179268836975098, | |
| "num_tokens": 3398668.0, | |
| "step": 408 | |
| }, | |
| { | |
| "epoch": 0.310790273556231, | |
| "grad_norm": 2.712888240814209, | |
| "learning_rate": 4.961410836323014e-06, | |
| "loss": 0.5375503301620483, | |
| "mean_token_accuracy": 0.8183479309082031, | |
| "num_tokens": 3403341.0, | |
| "step": 409 | |
| }, | |
| { | |
| "epoch": 0.31155015197568386, | |
| "grad_norm": 1.5305988788604736, | |
| "learning_rate": 4.961043405171931e-06, | |
| "loss": 0.5159702301025391, | |
| "mean_token_accuracy": 0.8177282214164734, | |
| "num_tokens": 3418953.0, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 0.3123100303951368, | |
| "grad_norm": 1.5801697969436646, | |
| "learning_rate": 4.9606742467658505e-06, | |
| "loss": 0.5146126747131348, | |
| "mean_token_accuracy": 0.8196765184402466, | |
| "num_tokens": 3437744.0, | |
| "step": 411 | |
| }, | |
| { | |
| "epoch": 0.3130699088145897, | |
| "grad_norm": 2.3317625522613525, | |
| "learning_rate": 4.960303361363863e-06, | |
| "loss": 0.5206908583641052, | |
| "mean_token_accuracy": 0.821370542049408, | |
| "num_tokens": 3444442.0, | |
| "step": 412 | |
| }, | |
| { | |
| "epoch": 0.31382978723404253, | |
| "grad_norm": 1.645398736000061, | |
| "learning_rate": 4.959930749226269e-06, | |
| "loss": 0.40867871046066284, | |
| "mean_token_accuracy": 0.8548761606216431, | |
| "num_tokens": 3456458.0, | |
| "step": 413 | |
| }, | |
| { | |
| "epoch": 0.31458966565349544, | |
| "grad_norm": 2.6546099185943604, | |
| "learning_rate": 4.9595564106145825e-06, | |
| "loss": 0.42953741550445557, | |
| "mean_token_accuracy": 0.8524375557899475, | |
| "num_tokens": 3460874.0, | |
| "step": 414 | |
| }, | |
| { | |
| "epoch": 0.31534954407294835, | |
| "grad_norm": 1.6146862506866455, | |
| "learning_rate": 4.959180345791528e-06, | |
| "loss": 0.45501887798309326, | |
| "mean_token_accuracy": 0.8171894550323486, | |
| "num_tokens": 3475225.0, | |
| "step": 415 | |
| }, | |
| { | |
| "epoch": 0.3161094224924012, | |
| "grad_norm": 1.3038263320922852, | |
| "learning_rate": 4.958802555021042e-06, | |
| "loss": 0.42623579502105713, | |
| "mean_token_accuracy": 0.8478569984436035, | |
| "num_tokens": 3493549.0, | |
| "step": 416 | |
| }, | |
| { | |
| "epoch": 0.3168693009118541, | |
| "grad_norm": 2.0775184631347656, | |
| "learning_rate": 4.958423038568274e-06, | |
| "loss": 0.370103120803833, | |
| "mean_token_accuracy": 0.8694682717323303, | |
| "num_tokens": 3499222.0, | |
| "step": 417 | |
| }, | |
| { | |
| "epoch": 0.31762917933130697, | |
| "grad_norm": 1.9842660427093506, | |
| "learning_rate": 4.958041796699583e-06, | |
| "loss": 0.4976680278778076, | |
| "mean_token_accuracy": 0.8439270853996277, | |
| "num_tokens": 3507686.0, | |
| "step": 418 | |
| }, | |
| { | |
| "epoch": 0.3183890577507599, | |
| "grad_norm": 2.6495370864868164, | |
| "learning_rate": 4.957658829682539e-06, | |
| "loss": 0.4992824196815491, | |
| "mean_token_accuracy": 0.8215071558952332, | |
| "num_tokens": 3512261.0, | |
| "step": 419 | |
| }, | |
| { | |
| "epoch": 0.3191489361702128, | |
| "grad_norm": 1.758331537246704, | |
| "learning_rate": 4.9572741377859225e-06, | |
| "loss": 0.5421558618545532, | |
| "mean_token_accuracy": 0.8140615224838257, | |
| "num_tokens": 3522889.0, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 0.31990881458966564, | |
| "grad_norm": 2.9629287719726562, | |
| "learning_rate": 4.956887721279726e-06, | |
| "loss": 0.49583905935287476, | |
| "mean_token_accuracy": 0.81712406873703, | |
| "num_tokens": 3527387.0, | |
| "step": 421 | |
| }, | |
| { | |
| "epoch": 0.32066869300911854, | |
| "grad_norm": 1.8280107975006104, | |
| "learning_rate": 4.95649958043515e-06, | |
| "loss": 0.3644634783267975, | |
| "mean_token_accuracy": 0.8655364513397217, | |
| "num_tokens": 3534002.0, | |
| "step": 422 | |
| }, | |
| { | |
| "epoch": 0.32142857142857145, | |
| "grad_norm": 2.3438503742218018, | |
| "learning_rate": 4.956109715524609e-06, | |
| "loss": 0.5218960046768188, | |
| "mean_token_accuracy": 0.8156325817108154, | |
| "num_tokens": 3540378.0, | |
| "step": 423 | |
| }, | |
| { | |
| "epoch": 0.3221884498480243, | |
| "grad_norm": 2.914623737335205, | |
| "learning_rate": 4.9557181268217225e-06, | |
| "loss": 0.5090000629425049, | |
| "mean_token_accuracy": 0.8220853805541992, | |
| "num_tokens": 3544791.0, | |
| "step": 424 | |
| }, | |
| { | |
| "epoch": 0.3229483282674772, | |
| "grad_norm": 1.8533551692962646, | |
| "learning_rate": 4.955324814601324e-06, | |
| "loss": 0.4710542559623718, | |
| "mean_token_accuracy": 0.8278185129165649, | |
| "num_tokens": 3554244.0, | |
| "step": 425 | |
| }, | |
| { | |
| "epoch": 0.32370820668693007, | |
| "grad_norm": 2.895254135131836, | |
| "learning_rate": 4.954929779139455e-06, | |
| "loss": 0.5684993863105774, | |
| "mean_token_accuracy": 0.8432695269584656, | |
| "num_tokens": 3560409.0, | |
| "step": 426 | |
| }, | |
| { | |
| "epoch": 0.324468085106383, | |
| "grad_norm": 2.5141751766204834, | |
| "learning_rate": 4.954533020713367e-06, | |
| "loss": 0.48398154973983765, | |
| "mean_token_accuracy": 0.8218153119087219, | |
| "num_tokens": 3567275.0, | |
| "step": 427 | |
| }, | |
| { | |
| "epoch": 0.3252279635258359, | |
| "grad_norm": 3.102905511856079, | |
| "learning_rate": 4.954134539601519e-06, | |
| "loss": 0.5117533206939697, | |
| "mean_token_accuracy": 0.8482083082199097, | |
| "num_tokens": 3572195.0, | |
| "step": 428 | |
| }, | |
| { | |
| "epoch": 0.32598784194528874, | |
| "grad_norm": 1.5614527463912964, | |
| "learning_rate": 4.953734336083582e-06, | |
| "loss": 0.39276060461997986, | |
| "mean_token_accuracy": 0.8795406818389893, | |
| "num_tokens": 3583260.0, | |
| "step": 429 | |
| }, | |
| { | |
| "epoch": 0.32674772036474165, | |
| "grad_norm": 2.461669921875, | |
| "learning_rate": 4.953332410440434e-06, | |
| "loss": 0.6011022329330444, | |
| "mean_token_accuracy": 0.790380597114563, | |
| "num_tokens": 3593519.0, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 0.32750759878419455, | |
| "grad_norm": 1.4988338947296143, | |
| "learning_rate": 4.952928762954161e-06, | |
| "loss": 0.3363340198993683, | |
| "mean_token_accuracy": 0.882912278175354, | |
| "num_tokens": 3603554.0, | |
| "step": 431 | |
| }, | |
| { | |
| "epoch": 0.3282674772036474, | |
| "grad_norm": 2.019150733947754, | |
| "learning_rate": 4.952523393908059e-06, | |
| "loss": 0.4858176112174988, | |
| "mean_token_accuracy": 0.8212261199951172, | |
| "num_tokens": 3611867.0, | |
| "step": 432 | |
| }, | |
| { | |
| "epoch": 0.3290273556231003, | |
| "grad_norm": 2.2393953800201416, | |
| "learning_rate": 4.952116303586631e-06, | |
| "loss": 0.4052902162075043, | |
| "mean_token_accuracy": 0.851185142993927, | |
| "num_tokens": 3617158.0, | |
| "step": 433 | |
| }, | |
| { | |
| "epoch": 0.32978723404255317, | |
| "grad_norm": 2.0428338050842285, | |
| "learning_rate": 4.951707492275589e-06, | |
| "loss": 0.4827128052711487, | |
| "mean_token_accuracy": 0.8306541442871094, | |
| "num_tokens": 3625958.0, | |
| "step": 434 | |
| }, | |
| { | |
| "epoch": 0.3305471124620061, | |
| "grad_norm": 3.0937533378601074, | |
| "learning_rate": 4.951296960261853e-06, | |
| "loss": 0.31141477823257446, | |
| "mean_token_accuracy": 0.894458532333374, | |
| "num_tokens": 3629275.0, | |
| "step": 435 | |
| }, | |
| { | |
| "epoch": 0.331306990881459, | |
| "grad_norm": 2.3901596069335938, | |
| "learning_rate": 4.95088470783355e-06, | |
| "loss": 0.5179769992828369, | |
| "mean_token_accuracy": 0.8250888586044312, | |
| "num_tokens": 3634963.0, | |
| "step": 436 | |
| }, | |
| { | |
| "epoch": 0.33206686930091184, | |
| "grad_norm": 2.4738881587982178, | |
| "learning_rate": 4.950470735280013e-06, | |
| "loss": 0.45892447233200073, | |
| "mean_token_accuracy": 0.8635761737823486, | |
| "num_tokens": 3640657.0, | |
| "step": 437 | |
| }, | |
| { | |
| "epoch": 0.33282674772036475, | |
| "grad_norm": 2.332380771636963, | |
| "learning_rate": 4.950055042891786e-06, | |
| "loss": 0.47390294075012207, | |
| "mean_token_accuracy": 0.86452317237854, | |
| "num_tokens": 3646819.0, | |
| "step": 438 | |
| }, | |
| { | |
| "epoch": 0.33358662613981765, | |
| "grad_norm": 4.826568126678467, | |
| "learning_rate": 4.949637630960618e-06, | |
| "loss": 0.48327547311782837, | |
| "mean_token_accuracy": 0.8322122097015381, | |
| "num_tokens": 3648870.0, | |
| "step": 439 | |
| }, | |
| { | |
| "epoch": 0.3343465045592705, | |
| "grad_norm": 2.105173349380493, | |
| "learning_rate": 4.949218499779462e-06, | |
| "loss": 0.5252559185028076, | |
| "mean_token_accuracy": 0.8192916512489319, | |
| "num_tokens": 3657851.0, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 0.3351063829787234, | |
| "grad_norm": 1.8412903547286987, | |
| "learning_rate": 4.948797649642484e-06, | |
| "loss": 0.49293676018714905, | |
| "mean_token_accuracy": 0.8469193577766418, | |
| "num_tokens": 3669262.0, | |
| "step": 441 | |
| }, | |
| { | |
| "epoch": 0.33586626139817627, | |
| "grad_norm": 3.4044852256774902, | |
| "learning_rate": 4.94837508084505e-06, | |
| "loss": 0.6744300127029419, | |
| "mean_token_accuracy": 0.7866237163543701, | |
| "num_tokens": 3673313.0, | |
| "step": 442 | |
| }, | |
| { | |
| "epoch": 0.3366261398176292, | |
| "grad_norm": 1.8514925241470337, | |
| "learning_rate": 4.9479507936837364e-06, | |
| "loss": 0.43219512701034546, | |
| "mean_token_accuracy": 0.8449255228042603, | |
| "num_tokens": 3681998.0, | |
| "step": 443 | |
| }, | |
| { | |
| "epoch": 0.3373860182370821, | |
| "grad_norm": 2.966836452484131, | |
| "learning_rate": 4.947524788456325e-06, | |
| "loss": 0.5979588031768799, | |
| "mean_token_accuracy": 0.8045225143432617, | |
| "num_tokens": 3686556.0, | |
| "step": 444 | |
| }, | |
| { | |
| "epoch": 0.33814589665653494, | |
| "grad_norm": 1.5906144380569458, | |
| "learning_rate": 4.947097065461801e-06, | |
| "loss": 0.48026829957962036, | |
| "mean_token_accuracy": 0.8438354134559631, | |
| "num_tokens": 3698564.0, | |
| "step": 445 | |
| }, | |
| { | |
| "epoch": 0.33890577507598785, | |
| "grad_norm": 1.947380542755127, | |
| "learning_rate": 4.946667625000358e-06, | |
| "loss": 0.4435119032859802, | |
| "mean_token_accuracy": 0.8234341740608215, | |
| "num_tokens": 3705749.0, | |
| "step": 446 | |
| }, | |
| { | |
| "epoch": 0.33966565349544076, | |
| "grad_norm": 1.73146390914917, | |
| "learning_rate": 4.946236467373392e-06, | |
| "loss": 0.5239195227622986, | |
| "mean_token_accuracy": 0.8134052157402039, | |
| "num_tokens": 3716071.0, | |
| "step": 447 | |
| }, | |
| { | |
| "epoch": 0.3404255319148936, | |
| "grad_norm": 1.9863660335540771, | |
| "learning_rate": 4.945803592883509e-06, | |
| "loss": 0.5005546808242798, | |
| "mean_token_accuracy": 0.8310836553573608, | |
| "num_tokens": 3724284.0, | |
| "step": 448 | |
| }, | |
| { | |
| "epoch": 0.3411854103343465, | |
| "grad_norm": 1.7305761575698853, | |
| "learning_rate": 4.9453690018345144e-06, | |
| "loss": 0.40787550806999207, | |
| "mean_token_accuracy": 0.8583958148956299, | |
| "num_tokens": 3734627.0, | |
| "step": 449 | |
| }, | |
| { | |
| "epoch": 0.34194528875379937, | |
| "grad_norm": 1.4045218229293823, | |
| "learning_rate": 4.944932694531423e-06, | |
| "loss": 0.5022330284118652, | |
| "mean_token_accuracy": 0.8324989080429077, | |
| "num_tokens": 3754357.0, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.3427051671732523, | |
| "grad_norm": 1.7415108680725098, | |
| "learning_rate": 4.94449467128045e-06, | |
| "loss": 0.39325833320617676, | |
| "mean_token_accuracy": 0.8628256320953369, | |
| "num_tokens": 3763300.0, | |
| "step": 451 | |
| }, | |
| { | |
| "epoch": 0.3434650455927052, | |
| "grad_norm": 2.2774908542633057, | |
| "learning_rate": 4.944054932389018e-06, | |
| "loss": 0.5054275393486023, | |
| "mean_token_accuracy": 0.856291651725769, | |
| "num_tokens": 3769233.0, | |
| "step": 452 | |
| }, | |
| { | |
| "epoch": 0.34422492401215804, | |
| "grad_norm": 1.5996630191802979, | |
| "learning_rate": 4.943613478165753e-06, | |
| "loss": 0.39869362115859985, | |
| "mean_token_accuracy": 0.8556894659996033, | |
| "num_tokens": 3779668.0, | |
| "step": 453 | |
| }, | |
| { | |
| "epoch": 0.34498480243161095, | |
| "grad_norm": 2.8231725692749023, | |
| "learning_rate": 4.943170308920484e-06, | |
| "loss": 0.4610729217529297, | |
| "mean_token_accuracy": 0.8591855764389038, | |
| "num_tokens": 3783629.0, | |
| "step": 454 | |
| }, | |
| { | |
| "epoch": 0.34574468085106386, | |
| "grad_norm": 2.540994882583618, | |
| "learning_rate": 4.9427254249642445e-06, | |
| "loss": 0.5615667104721069, | |
| "mean_token_accuracy": 0.8084384202957153, | |
| "num_tokens": 3790579.0, | |
| "step": 455 | |
| }, | |
| { | |
| "epoch": 0.3465045592705167, | |
| "grad_norm": 1.7328214645385742, | |
| "learning_rate": 4.942278826609272e-06, | |
| "loss": 0.5150455236434937, | |
| "mean_token_accuracy": 0.8214741349220276, | |
| "num_tokens": 3800795.0, | |
| "step": 456 | |
| }, | |
| { | |
| "epoch": 0.3472644376899696, | |
| "grad_norm": 1.6073330640792847, | |
| "learning_rate": 4.9418305141690045e-06, | |
| "loss": 0.48729372024536133, | |
| "mean_token_accuracy": 0.8309370875358582, | |
| "num_tokens": 3813738.0, | |
| "step": 457 | |
| }, | |
| { | |
| "epoch": 0.34802431610942247, | |
| "grad_norm": 3.1176810264587402, | |
| "learning_rate": 4.9413804879580865e-06, | |
| "loss": 0.5165641903877258, | |
| "mean_token_accuracy": 0.8539294600486755, | |
| "num_tokens": 3818088.0, | |
| "step": 458 | |
| }, | |
| { | |
| "epoch": 0.3487841945288754, | |
| "grad_norm": 1.476922869682312, | |
| "learning_rate": 4.940928748292363e-06, | |
| "loss": 0.5822708606719971, | |
| "mean_token_accuracy": 0.8083155751228333, | |
| "num_tokens": 3839183.0, | |
| "step": 459 | |
| }, | |
| { | |
| "epoch": 0.3495440729483283, | |
| "grad_norm": 2.4246726036071777, | |
| "learning_rate": 4.940475295488882e-06, | |
| "loss": 0.42251867055892944, | |
| "mean_token_accuracy": 0.8503992557525635, | |
| "num_tokens": 3844849.0, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 0.35030395136778114, | |
| "grad_norm": 1.3491480350494385, | |
| "learning_rate": 4.940020129865895e-06, | |
| "loss": 0.4598064124584198, | |
| "mean_token_accuracy": 0.830859899520874, | |
| "num_tokens": 3862108.0, | |
| "step": 461 | |
| }, | |
| { | |
| "epoch": 0.35106382978723405, | |
| "grad_norm": 2.066025495529175, | |
| "learning_rate": 4.9395632517428546e-06, | |
| "loss": 0.5363115072250366, | |
| "mean_token_accuracy": 0.8228449821472168, | |
| "num_tokens": 3870682.0, | |
| "step": 462 | |
| }, | |
| { | |
| "epoch": 0.3518237082066869, | |
| "grad_norm": 1.7449887990951538, | |
| "learning_rate": 4.939104661440415e-06, | |
| "loss": 0.42669913172721863, | |
| "mean_token_accuracy": 0.8581840395927429, | |
| "num_tokens": 3885309.0, | |
| "step": 463 | |
| }, | |
| { | |
| "epoch": 0.3525835866261398, | |
| "grad_norm": 2.282083749771118, | |
| "learning_rate": 4.938644359280433e-06, | |
| "loss": 0.528269350528717, | |
| "mean_token_accuracy": 0.8524715900421143, | |
| "num_tokens": 3892692.0, | |
| "step": 464 | |
| }, | |
| { | |
| "epoch": 0.3533434650455927, | |
| "grad_norm": 1.9782079458236694, | |
| "learning_rate": 4.938182345585967e-06, | |
| "loss": 0.5342779755592346, | |
| "mean_token_accuracy": 0.8000766038894653, | |
| "num_tokens": 3901731.0, | |
| "step": 465 | |
| }, | |
| { | |
| "epoch": 0.3541033434650456, | |
| "grad_norm": 2.3067269325256348, | |
| "learning_rate": 4.937718620681273e-06, | |
| "loss": 0.4966881275177002, | |
| "mean_token_accuracy": 0.8279182314872742, | |
| "num_tokens": 3908966.0, | |
| "step": 466 | |
| }, | |
| { | |
| "epoch": 0.3548632218844985, | |
| "grad_norm": 1.9411311149597168, | |
| "learning_rate": 4.9372531848918145e-06, | |
| "loss": 0.5149158239364624, | |
| "mean_token_accuracy": 0.8420406579971313, | |
| "num_tokens": 3919028.0, | |
| "step": 467 | |
| }, | |
| { | |
| "epoch": 0.3556231003039514, | |
| "grad_norm": 1.9435569047927856, | |
| "learning_rate": 4.936786038544251e-06, | |
| "loss": 0.5169678926467896, | |
| "mean_token_accuracy": 0.8235641121864319, | |
| "num_tokens": 3927936.0, | |
| "step": 468 | |
| }, | |
| { | |
| "epoch": 0.35638297872340424, | |
| "grad_norm": 1.3978698253631592, | |
| "learning_rate": 4.9363171819664434e-06, | |
| "loss": 0.5187251567840576, | |
| "mean_token_accuracy": 0.8063663244247437, | |
| "num_tokens": 3952117.0, | |
| "step": 469 | |
| }, | |
| { | |
| "epoch": 0.35714285714285715, | |
| "grad_norm": 2.639873743057251, | |
| "learning_rate": 4.9358466154874535e-06, | |
| "loss": 0.4771063029766083, | |
| "mean_token_accuracy": 0.8208592534065247, | |
| "num_tokens": 3957083.0, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 0.35790273556231, | |
| "grad_norm": 1.6088488101959229, | |
| "learning_rate": 4.935374339437543e-06, | |
| "loss": 0.5243949294090271, | |
| "mean_token_accuracy": 0.8492621779441833, | |
| "num_tokens": 3972633.0, | |
| "step": 471 | |
| }, | |
| { | |
| "epoch": 0.3586626139817629, | |
| "grad_norm": 3.3320486545562744, | |
| "learning_rate": 4.934900354148173e-06, | |
| "loss": 0.4870304763317108, | |
| "mean_token_accuracy": 0.8500401973724365, | |
| "num_tokens": 3975536.0, | |
| "step": 472 | |
| }, | |
| { | |
| "epoch": 0.3594224924012158, | |
| "grad_norm": 2.7519044876098633, | |
| "learning_rate": 4.934424659952006e-06, | |
| "loss": 0.3919612467288971, | |
| "mean_token_accuracy": 0.8723220825195312, | |
| "num_tokens": 3979855.0, | |
| "step": 473 | |
| }, | |
| { | |
| "epoch": 0.3601823708206687, | |
| "grad_norm": 1.1771601438522339, | |
| "learning_rate": 4.933947257182901e-06, | |
| "loss": 0.38711655139923096, | |
| "mean_token_accuracy": 0.8588876128196716, | |
| "num_tokens": 4004167.0, | |
| "step": 474 | |
| }, | |
| { | |
| "epoch": 0.3609422492401216, | |
| "grad_norm": 1.7675265073776245, | |
| "learning_rate": 4.933468146175918e-06, | |
| "loss": 0.5844885110855103, | |
| "mean_token_accuracy": 0.8076567649841309, | |
| "num_tokens": 4016801.0, | |
| "step": 475 | |
| }, | |
| { | |
| "epoch": 0.3617021276595745, | |
| "grad_norm": 3.0058584213256836, | |
| "learning_rate": 4.932987327267317e-06, | |
| "loss": 0.4400174021720886, | |
| "mean_token_accuracy": 0.8469029664993286, | |
| "num_tokens": 4022567.0, | |
| "step": 476 | |
| }, | |
| { | |
| "epoch": 0.36246200607902734, | |
| "grad_norm": 1.3611799478530884, | |
| "learning_rate": 4.932504800794553e-06, | |
| "loss": 0.4285426139831543, | |
| "mean_token_accuracy": 0.8450878858566284, | |
| "num_tokens": 4036585.0, | |
| "step": 477 | |
| }, | |
| { | |
| "epoch": 0.36322188449848025, | |
| "grad_norm": 1.4490348100662231, | |
| "learning_rate": 4.9320205670962815e-06, | |
| "loss": 0.5200105309486389, | |
| "mean_token_accuracy": 0.816506028175354, | |
| "num_tokens": 4052671.0, | |
| "step": 478 | |
| }, | |
| { | |
| "epoch": 0.3639817629179331, | |
| "grad_norm": 2.0383307933807373, | |
| "learning_rate": 4.931534626512359e-06, | |
| "loss": 0.4381054937839508, | |
| "mean_token_accuracy": 0.8396817445755005, | |
| "num_tokens": 4061884.0, | |
| "step": 479 | |
| }, | |
| { | |
| "epoch": 0.364741641337386, | |
| "grad_norm": 1.854593276977539, | |
| "learning_rate": 4.931046979383836e-06, | |
| "loss": 0.4555840492248535, | |
| "mean_token_accuracy": 0.84235680103302, | |
| "num_tokens": 4070797.0, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 0.3655015197568389, | |
| "grad_norm": 2.12614107131958, | |
| "learning_rate": 4.930557626052961e-06, | |
| "loss": 0.3838217854499817, | |
| "mean_token_accuracy": 0.8676179051399231, | |
| "num_tokens": 4076345.0, | |
| "step": 481 | |
| }, | |
| { | |
| "epoch": 0.3662613981762918, | |
| "grad_norm": 1.612610936164856, | |
| "learning_rate": 4.930066566863182e-06, | |
| "loss": 0.5174338817596436, | |
| "mean_token_accuracy": 0.8290661573410034, | |
| "num_tokens": 4092149.0, | |
| "step": 482 | |
| }, | |
| { | |
| "epoch": 0.3670212765957447, | |
| "grad_norm": 2.1137144565582275, | |
| "learning_rate": 4.929573802159143e-06, | |
| "loss": 0.4602130651473999, | |
| "mean_token_accuracy": 0.8441717624664307, | |
| "num_tokens": 4098977.0, | |
| "step": 483 | |
| }, | |
| { | |
| "epoch": 0.3677811550151976, | |
| "grad_norm": 1.9106091260910034, | |
| "learning_rate": 4.929079332286685e-06, | |
| "loss": 0.42526084184646606, | |
| "mean_token_accuracy": 0.8522722721099854, | |
| "num_tokens": 4106471.0, | |
| "step": 484 | |
| }, | |
| { | |
| "epoch": 0.36854103343465044, | |
| "grad_norm": 1.719895601272583, | |
| "learning_rate": 4.928583157592846e-06, | |
| "loss": 0.38735371828079224, | |
| "mean_token_accuracy": 0.8658885955810547, | |
| "num_tokens": 4116297.0, | |
| "step": 485 | |
| }, | |
| { | |
| "epoch": 0.36930091185410335, | |
| "grad_norm": 1.820185899734497, | |
| "learning_rate": 4.928085278425862e-06, | |
| "loss": 0.5133121609687805, | |
| "mean_token_accuracy": 0.8316508531570435, | |
| "num_tokens": 4127473.0, | |
| "step": 486 | |
| }, | |
| { | |
| "epoch": 0.3700607902735562, | |
| "grad_norm": 1.9347177743911743, | |
| "learning_rate": 4.927585695135162e-06, | |
| "loss": 0.5389706492424011, | |
| "mean_token_accuracy": 0.8159632682800293, | |
| "num_tokens": 4136883.0, | |
| "step": 487 | |
| }, | |
| { | |
| "epoch": 0.3708206686930091, | |
| "grad_norm": 2.309093713760376, | |
| "learning_rate": 4.9270844080713735e-06, | |
| "loss": 0.5580676198005676, | |
| "mean_token_accuracy": 0.8078963160514832, | |
| "num_tokens": 4143537.0, | |
| "step": 488 | |
| }, | |
| { | |
| "epoch": 0.371580547112462, | |
| "grad_norm": 1.7023398876190186, | |
| "learning_rate": 4.926581417586319e-06, | |
| "loss": 0.49399808049201965, | |
| "mean_token_accuracy": 0.8330680131912231, | |
| "num_tokens": 4155279.0, | |
| "step": 489 | |
| }, | |
| { | |
| "epoch": 0.3723404255319149, | |
| "grad_norm": 1.7478828430175781, | |
| "learning_rate": 4.926076724033016e-06, | |
| "loss": 0.4861065149307251, | |
| "mean_token_accuracy": 0.8224774599075317, | |
| "num_tokens": 4165624.0, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 0.3731003039513678, | |
| "grad_norm": 1.8368672132492065, | |
| "learning_rate": 4.925570327765678e-06, | |
| "loss": 0.5203143358230591, | |
| "mean_token_accuracy": 0.8500494956970215, | |
| "num_tokens": 4179106.0, | |
| "step": 491 | |
| }, | |
| { | |
| "epoch": 0.3738601823708207, | |
| "grad_norm": 1.7900545597076416, | |
| "learning_rate": 4.9250622291397144e-06, | |
| "loss": 0.29317501187324524, | |
| "mean_token_accuracy": 0.8932639360427856, | |
| "num_tokens": 4185792.0, | |
| "step": 492 | |
| }, | |
| { | |
| "epoch": 0.37462006079027355, | |
| "grad_norm": 1.993884563446045, | |
| "learning_rate": 4.924552428511727e-06, | |
| "loss": 0.41911810636520386, | |
| "mean_token_accuracy": 0.8486036062240601, | |
| "num_tokens": 4193481.0, | |
| "step": 493 | |
| }, | |
| { | |
| "epoch": 0.37537993920972645, | |
| "grad_norm": 1.8426238298416138, | |
| "learning_rate": 4.924040926239515e-06, | |
| "loss": 0.5417478084564209, | |
| "mean_token_accuracy": 0.7899638414382935, | |
| "num_tokens": 4206182.0, | |
| "step": 494 | |
| }, | |
| { | |
| "epoch": 0.3761398176291793, | |
| "grad_norm": 2.032972812652588, | |
| "learning_rate": 4.92352772268207e-06, | |
| "loss": 0.4417288899421692, | |
| "mean_token_accuracy": 0.8448511362075806, | |
| "num_tokens": 4212604.0, | |
| "step": 495 | |
| }, | |
| { | |
| "epoch": 0.3768996960486322, | |
| "grad_norm": 2.371108293533325, | |
| "learning_rate": 4.923012818199576e-06, | |
| "loss": 0.4979432225227356, | |
| "mean_token_accuracy": 0.8591092824935913, | |
| "num_tokens": 4217960.0, | |
| "step": 496 | |
| }, | |
| { | |
| "epoch": 0.3776595744680851, | |
| "grad_norm": 2.846374750137329, | |
| "learning_rate": 4.922496213153416e-06, | |
| "loss": 0.4680670201778412, | |
| "mean_token_accuracy": 0.8293017148971558, | |
| "num_tokens": 4222783.0, | |
| "step": 497 | |
| }, | |
| { | |
| "epoch": 0.378419452887538, | |
| "grad_norm": 1.91952645778656, | |
| "learning_rate": 4.921977907906161e-06, | |
| "loss": 0.46989706158638, | |
| "mean_token_accuracy": 0.8414992094039917, | |
| "num_tokens": 4230617.0, | |
| "step": 498 | |
| }, | |
| { | |
| "epoch": 0.3791793313069909, | |
| "grad_norm": 2.1629347801208496, | |
| "learning_rate": 4.921457902821578e-06, | |
| "loss": 0.40130868554115295, | |
| "mean_token_accuracy": 0.8518390655517578, | |
| "num_tokens": 4235935.0, | |
| "step": 499 | |
| }, | |
| { | |
| "epoch": 0.3799392097264438, | |
| "grad_norm": 1.874174952507019, | |
| "learning_rate": 4.9209361982646275e-06, | |
| "loss": 0.48775261640548706, | |
| "mean_token_accuracy": 0.8302479982376099, | |
| "num_tokens": 4244450.0, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.38069908814589665, | |
| "grad_norm": 2.055781126022339, | |
| "learning_rate": 4.920412794601461e-06, | |
| "loss": 0.47624891996383667, | |
| "mean_token_accuracy": 0.8339136242866516, | |
| "num_tokens": 4251400.0, | |
| "step": 501 | |
| }, | |
| { | |
| "epoch": 0.38145896656534956, | |
| "grad_norm": 2.230872392654419, | |
| "learning_rate": 4.919887692199423e-06, | |
| "loss": 0.4844909906387329, | |
| "mean_token_accuracy": 0.8168134689331055, | |
| "num_tokens": 4258101.0, | |
| "step": 502 | |
| }, | |
| { | |
| "epoch": 0.3822188449848024, | |
| "grad_norm": 2.1640610694885254, | |
| "learning_rate": 4.9193608914270515e-06, | |
| "loss": 0.563758134841919, | |
| "mean_token_accuracy": 0.8058563470840454, | |
| "num_tokens": 4267449.0, | |
| "step": 503 | |
| }, | |
| { | |
| "epoch": 0.3829787234042553, | |
| "grad_norm": 2.2596869468688965, | |
| "learning_rate": 4.918832392654075e-06, | |
| "loss": 0.497257798910141, | |
| "mean_token_accuracy": 0.8271679878234863, | |
| "num_tokens": 4274005.0, | |
| "step": 504 | |
| }, | |
| { | |
| "epoch": 0.3837386018237082, | |
| "grad_norm": 1.68129563331604, | |
| "learning_rate": 4.9183021962514145e-06, | |
| "loss": 0.5896461606025696, | |
| "mean_token_accuracy": 0.796101450920105, | |
| "num_tokens": 4289095.0, | |
| "step": 505 | |
| }, | |
| { | |
| "epoch": 0.3844984802431611, | |
| "grad_norm": 1.684326410293579, | |
| "learning_rate": 4.917770302591183e-06, | |
| "loss": 0.3357805609703064, | |
| "mean_token_accuracy": 0.877285361289978, | |
| "num_tokens": 4298089.0, | |
| "step": 506 | |
| }, | |
| { | |
| "epoch": 0.385258358662614, | |
| "grad_norm": 1.5425621271133423, | |
| "learning_rate": 4.917236712046682e-06, | |
| "loss": 0.5053616762161255, | |
| "mean_token_accuracy": 0.8094472289085388, | |
| "num_tokens": 4315325.0, | |
| "step": 507 | |
| }, | |
| { | |
| "epoch": 0.3860182370820669, | |
| "grad_norm": 1.8255196809768677, | |
| "learning_rate": 4.9167014249924075e-06, | |
| "loss": 0.3369247317314148, | |
| "mean_token_accuracy": 0.8630348443984985, | |
| "num_tokens": 4322731.0, | |
| "step": 508 | |
| }, | |
| { | |
| "epoch": 0.38677811550151975, | |
| "grad_norm": 2.2271909713745117, | |
| "learning_rate": 4.916164441804044e-06, | |
| "loss": 0.5025190711021423, | |
| "mean_token_accuracy": 0.8220493197441101, | |
| "num_tokens": 4329541.0, | |
| "step": 509 | |
| }, | |
| { | |
| "epoch": 0.38753799392097266, | |
| "grad_norm": 2.1169731616973877, | |
| "learning_rate": 4.915625762858467e-06, | |
| "loss": 0.49157875776290894, | |
| "mean_token_accuracy": 0.8319511413574219, | |
| "num_tokens": 4335972.0, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 0.3882978723404255, | |
| "grad_norm": 1.2977492809295654, | |
| "learning_rate": 4.915085388533743e-06, | |
| "loss": 0.4655773341655731, | |
| "mean_token_accuracy": 0.8238399028778076, | |
| "num_tokens": 4355657.0, | |
| "step": 511 | |
| }, | |
| { | |
| "epoch": 0.3890577507598784, | |
| "grad_norm": 2.400388479232788, | |
| "learning_rate": 4.914543319209126e-06, | |
| "loss": 0.510635256767273, | |
| "mean_token_accuracy": 0.8391962051391602, | |
| "num_tokens": 4361302.0, | |
| "step": 512 | |
| }, | |
| { | |
| "epoch": 0.3898176291793313, | |
| "grad_norm": 2.808622121810913, | |
| "learning_rate": 4.913999555265062e-06, | |
| "loss": 0.3918214440345764, | |
| "mean_token_accuracy": 0.854076623916626, | |
| "num_tokens": 4365314.0, | |
| "step": 513 | |
| }, | |
| { | |
| "epoch": 0.3905775075987842, | |
| "grad_norm": 2.3694703578948975, | |
| "learning_rate": 4.913454097083185e-06, | |
| "loss": 0.4798097312450409, | |
| "mean_token_accuracy": 0.8340897560119629, | |
| "num_tokens": 4370592.0, | |
| "step": 514 | |
| }, | |
| { | |
| "epoch": 0.3913373860182371, | |
| "grad_norm": 2.32905650138855, | |
| "learning_rate": 4.912906945046319e-06, | |
| "loss": 0.5024875998497009, | |
| "mean_token_accuracy": 0.8495163917541504, | |
| "num_tokens": 4376713.0, | |
| "step": 515 | |
| }, | |
| { | |
| "epoch": 0.39209726443769, | |
| "grad_norm": 1.4975775480270386, | |
| "learning_rate": 4.912358099538476e-06, | |
| "loss": 0.44349104166030884, | |
| "mean_token_accuracy": 0.8256516456604004, | |
| "num_tokens": 4392145.0, | |
| "step": 516 | |
| }, | |
| { | |
| "epoch": 0.39285714285714285, | |
| "grad_norm": 1.3479279279708862, | |
| "learning_rate": 4.911807560944858e-06, | |
| "loss": 0.39952075481414795, | |
| "mean_token_accuracy": 0.8625344038009644, | |
| "num_tokens": 4407508.0, | |
| "step": 517 | |
| }, | |
| { | |
| "epoch": 0.39361702127659576, | |
| "grad_norm": 2.503182888031006, | |
| "learning_rate": 4.911255329651852e-06, | |
| "loss": 0.5685954689979553, | |
| "mean_token_accuracy": 0.8361418843269348, | |
| "num_tokens": 4413300.0, | |
| "step": 518 | |
| }, | |
| { | |
| "epoch": 0.3943768996960486, | |
| "grad_norm": 1.550897479057312, | |
| "learning_rate": 4.910701406047037e-06, | |
| "loss": 0.5214766263961792, | |
| "mean_token_accuracy": 0.8029816150665283, | |
| "num_tokens": 4432131.0, | |
| "step": 519 | |
| }, | |
| { | |
| "epoch": 0.3951367781155015, | |
| "grad_norm": 2.2570624351501465, | |
| "learning_rate": 4.910145790519177e-06, | |
| "loss": 0.5039506554603577, | |
| "mean_token_accuracy": 0.8227694034576416, | |
| "num_tokens": 4439014.0, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 0.3958966565349544, | |
| "grad_norm": 1.2405915260314941, | |
| "learning_rate": 4.9095884834582256e-06, | |
| "loss": 0.43927496671676636, | |
| "mean_token_accuracy": 0.8406883478164673, | |
| "num_tokens": 4456142.0, | |
| "step": 521 | |
| }, | |
| { | |
| "epoch": 0.3966565349544073, | |
| "grad_norm": 2.8999664783477783, | |
| "learning_rate": 4.909029485255321e-06, | |
| "loss": 0.4566405415534973, | |
| "mean_token_accuracy": 0.8406169414520264, | |
| "num_tokens": 4460155.0, | |
| "step": 522 | |
| }, | |
| { | |
| "epoch": 0.3974164133738602, | |
| "grad_norm": 2.3775827884674072, | |
| "learning_rate": 4.90846879630279e-06, | |
| "loss": 0.47347691655158997, | |
| "mean_token_accuracy": 0.8391696214675903, | |
| "num_tokens": 4466962.0, | |
| "step": 523 | |
| }, | |
| { | |
| "epoch": 0.3981762917933131, | |
| "grad_norm": 2.705432176589966, | |
| "learning_rate": 4.907906416994146e-06, | |
| "loss": 0.34877750277519226, | |
| "mean_token_accuracy": 0.8586087822914124, | |
| "num_tokens": 4472049.0, | |
| "step": 524 | |
| }, | |
| { | |
| "epoch": 0.39893617021276595, | |
| "grad_norm": 2.1355597972869873, | |
| "learning_rate": 4.907342347724088e-06, | |
| "loss": 0.5352267026901245, | |
| "mean_token_accuracy": 0.8132659196853638, | |
| "num_tokens": 4479469.0, | |
| "step": 525 | |
| }, | |
| { | |
| "epoch": 0.39969604863221886, | |
| "grad_norm": 2.642930269241333, | |
| "learning_rate": 4.906776588888502e-06, | |
| "loss": 0.5417162179946899, | |
| "mean_token_accuracy": 0.8195326328277588, | |
| "num_tokens": 4484888.0, | |
| "step": 526 | |
| }, | |
| { | |
| "epoch": 0.4004559270516717, | |
| "grad_norm": 1.9653481245040894, | |
| "learning_rate": 4.906209140884459e-06, | |
| "loss": 0.5188787579536438, | |
| "mean_token_accuracy": 0.8164578080177307, | |
| "num_tokens": 4494126.0, | |
| "step": 527 | |
| }, | |
| { | |
| "epoch": 0.4012158054711246, | |
| "grad_norm": 2.1649622917175293, | |
| "learning_rate": 4.905640004110216e-06, | |
| "loss": 0.5350884199142456, | |
| "mean_token_accuracy": 0.8144514560699463, | |
| "num_tokens": 4500747.0, | |
| "step": 528 | |
| }, | |
| { | |
| "epoch": 0.40197568389057753, | |
| "grad_norm": 1.780606985092163, | |
| "learning_rate": 4.905069178965215e-06, | |
| "loss": 0.49227356910705566, | |
| "mean_token_accuracy": 0.8300670385360718, | |
| "num_tokens": 4511324.0, | |
| "step": 529 | |
| }, | |
| { | |
| "epoch": 0.4027355623100304, | |
| "grad_norm": 2.383995771408081, | |
| "learning_rate": 4.904496665850083e-06, | |
| "loss": 0.5728126764297485, | |
| "mean_token_accuracy": 0.7982839345932007, | |
| "num_tokens": 4518388.0, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 0.4034954407294833, | |
| "grad_norm": 2.1188619136810303, | |
| "learning_rate": 4.903922465166633e-06, | |
| "loss": 0.5170526504516602, | |
| "mean_token_accuracy": 0.8247905969619751, | |
| "num_tokens": 4524956.0, | |
| "step": 531 | |
| }, | |
| { | |
| "epoch": 0.40425531914893614, | |
| "grad_norm": 1.3629902601242065, | |
| "learning_rate": 4.903346577317859e-06, | |
| "loss": 0.4504483938217163, | |
| "mean_token_accuracy": 0.8357315063476562, | |
| "num_tokens": 4542947.0, | |
| "step": 532 | |
| }, | |
| { | |
| "epoch": 0.40501519756838905, | |
| "grad_norm": 1.911316156387329, | |
| "learning_rate": 4.902769002707942e-06, | |
| "loss": 0.3103373944759369, | |
| "mean_token_accuracy": 0.8955463767051697, | |
| "num_tokens": 4548663.0, | |
| "step": 533 | |
| }, | |
| { | |
| "epoch": 0.40577507598784196, | |
| "grad_norm": 1.58231782913208, | |
| "learning_rate": 4.902189741742247e-06, | |
| "loss": 0.4453960359096527, | |
| "mean_token_accuracy": 0.8380752205848694, | |
| "num_tokens": 4561920.0, | |
| "step": 534 | |
| }, | |
| { | |
| "epoch": 0.4065349544072948, | |
| "grad_norm": 2.2610323429107666, | |
| "learning_rate": 4.901608794827321e-06, | |
| "loss": 0.37087973952293396, | |
| "mean_token_accuracy": 0.8731696605682373, | |
| "num_tokens": 4566566.0, | |
| "step": 535 | |
| }, | |
| { | |
| "epoch": 0.4072948328267477, | |
| "grad_norm": 2.357311248779297, | |
| "learning_rate": 4.9010261623708945e-06, | |
| "loss": 0.4292754530906677, | |
| "mean_token_accuracy": 0.8524476289749146, | |
| "num_tokens": 4572060.0, | |
| "step": 536 | |
| }, | |
| { | |
| "epoch": 0.40805471124620063, | |
| "grad_norm": 1.5455151796340942, | |
| "learning_rate": 4.900441844781882e-06, | |
| "loss": 0.49441856145858765, | |
| "mean_token_accuracy": 0.8392636775970459, | |
| "num_tokens": 4583971.0, | |
| "step": 537 | |
| }, | |
| { | |
| "epoch": 0.4088145896656535, | |
| "grad_norm": 2.362008810043335, | |
| "learning_rate": 4.89985584247038e-06, | |
| "loss": 0.45737266540527344, | |
| "mean_token_accuracy": 0.862938642501831, | |
| "num_tokens": 4590152.0, | |
| "step": 538 | |
| }, | |
| { | |
| "epoch": 0.4095744680851064, | |
| "grad_norm": 1.7617286443710327, | |
| "learning_rate": 4.899268155847667e-06, | |
| "loss": 0.45554471015930176, | |
| "mean_token_accuracy": 0.8370910882949829, | |
| "num_tokens": 4600587.0, | |
| "step": 539 | |
| }, | |
| { | |
| "epoch": 0.41033434650455924, | |
| "grad_norm": 2.217749834060669, | |
| "learning_rate": 4.898678785326205e-06, | |
| "loss": 0.4922047257423401, | |
| "mean_token_accuracy": 0.8233155608177185, | |
| "num_tokens": 4608944.0, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 0.41109422492401215, | |
| "grad_norm": 2.5413386821746826, | |
| "learning_rate": 4.898087731319637e-06, | |
| "loss": 0.40927654504776, | |
| "mean_token_accuracy": 0.87587571144104, | |
| "num_tokens": 4613298.0, | |
| "step": 541 | |
| }, | |
| { | |
| "epoch": 0.41185410334346506, | |
| "grad_norm": 4.016610145568848, | |
| "learning_rate": 4.8974949942427854e-06, | |
| "loss": 0.44536292552948, | |
| "mean_token_accuracy": 0.8415871858596802, | |
| "num_tokens": 4615879.0, | |
| "step": 542 | |
| }, | |
| { | |
| "epoch": 0.4126139817629179, | |
| "grad_norm": 1.7483938932418823, | |
| "learning_rate": 4.896900574511657e-06, | |
| "loss": 0.4535871148109436, | |
| "mean_token_accuracy": 0.840785026550293, | |
| "num_tokens": 4625246.0, | |
| "step": 543 | |
| }, | |
| { | |
| "epoch": 0.4133738601823708, | |
| "grad_norm": 2.8072991371154785, | |
| "learning_rate": 4.89630447254344e-06, | |
| "loss": 0.6105838418006897, | |
| "mean_token_accuracy": 0.8333185911178589, | |
| "num_tokens": 4636022.0, | |
| "step": 544 | |
| }, | |
| { | |
| "epoch": 0.41413373860182373, | |
| "grad_norm": 1.475715160369873, | |
| "learning_rate": 4.8957066887565005e-06, | |
| "loss": 0.43646636605262756, | |
| "mean_token_accuracy": 0.8444384932518005, | |
| "num_tokens": 4649774.0, | |
| "step": 545 | |
| }, | |
| { | |
| "epoch": 0.4148936170212766, | |
| "grad_norm": 2.4876935482025146, | |
| "learning_rate": 4.895107223570386e-06, | |
| "loss": 0.4014703035354614, | |
| "mean_token_accuracy": 0.8722496628761292, | |
| "num_tokens": 4654593.0, | |
| "step": 546 | |
| }, | |
| { | |
| "epoch": 0.4156534954407295, | |
| "grad_norm": 2.698641538619995, | |
| "learning_rate": 4.894506077405824e-06, | |
| "loss": 0.5129483938217163, | |
| "mean_token_accuracy": 0.8329781889915466, | |
| "num_tokens": 4660158.0, | |
| "step": 547 | |
| }, | |
| { | |
| "epoch": 0.41641337386018235, | |
| "grad_norm": 2.892404317855835, | |
| "learning_rate": 4.893903250684723e-06, | |
| "loss": 0.435560405254364, | |
| "mean_token_accuracy": 0.8472626209259033, | |
| "num_tokens": 4663705.0, | |
| "step": 548 | |
| }, | |
| { | |
| "epoch": 0.41717325227963525, | |
| "grad_norm": 2.3079450130462646, | |
| "learning_rate": 4.893298743830168e-06, | |
| "loss": 0.4949483871459961, | |
| "mean_token_accuracy": 0.820269763469696, | |
| "num_tokens": 4669885.0, | |
| "step": 549 | |
| }, | |
| { | |
| "epoch": 0.41793313069908816, | |
| "grad_norm": 2.3712430000305176, | |
| "learning_rate": 4.892692557266429e-06, | |
| "loss": 0.46948671340942383, | |
| "mean_token_accuracy": 0.8335106372833252, | |
| "num_tokens": 4675845.0, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 0.418693009118541, | |
| "grad_norm": 3.5355911254882812, | |
| "learning_rate": 4.8920846914189465e-06, | |
| "loss": 0.5023574233055115, | |
| "mean_token_accuracy": 0.8397154808044434, | |
| "num_tokens": 4678778.0, | |
| "step": 551 | |
| }, | |
| { | |
| "epoch": 0.4194528875379939, | |
| "grad_norm": 1.6729077100753784, | |
| "learning_rate": 4.891475146714348e-06, | |
| "loss": 0.5844266414642334, | |
| "mean_token_accuracy": 0.7964673042297363, | |
| "num_tokens": 4692971.0, | |
| "step": 552 | |
| }, | |
| { | |
| "epoch": 0.42021276595744683, | |
| "grad_norm": 1.5726003646850586, | |
| "learning_rate": 4.8908639235804324e-06, | |
| "loss": 0.4634174406528473, | |
| "mean_token_accuracy": 0.8316428661346436, | |
| "num_tokens": 4706483.0, | |
| "step": 553 | |
| }, | |
| { | |
| "epoch": 0.4209726443768997, | |
| "grad_norm": 1.5735766887664795, | |
| "learning_rate": 4.890251022446181e-06, | |
| "loss": 0.5243015289306641, | |
| "mean_token_accuracy": 0.8176264762878418, | |
| "num_tokens": 4721006.0, | |
| "step": 554 | |
| }, | |
| { | |
| "epoch": 0.4217325227963526, | |
| "grad_norm": 1.8711097240447998, | |
| "learning_rate": 4.889636443741752e-06, | |
| "loss": 0.42648065090179443, | |
| "mean_token_accuracy": 0.8540663719177246, | |
| "num_tokens": 4731234.0, | |
| "step": 555 | |
| }, | |
| { | |
| "epoch": 0.42249240121580545, | |
| "grad_norm": 2.1808011531829834, | |
| "learning_rate": 4.88902018789848e-06, | |
| "loss": 0.4090477228164673, | |
| "mean_token_accuracy": 0.8533409833908081, | |
| "num_tokens": 4736982.0, | |
| "step": 556 | |
| }, | |
| { | |
| "epoch": 0.42325227963525835, | |
| "grad_norm": 1.9000643491744995, | |
| "learning_rate": 4.888402255348877e-06, | |
| "loss": 0.4920252561569214, | |
| "mean_token_accuracy": 0.8278638124465942, | |
| "num_tokens": 4746015.0, | |
| "step": 557 | |
| }, | |
| { | |
| "epoch": 0.42401215805471126, | |
| "grad_norm": 1.6835788488388062, | |
| "learning_rate": 4.887782646526631e-06, | |
| "loss": 0.5171347856521606, | |
| "mean_token_accuracy": 0.8305040001869202, | |
| "num_tokens": 4757995.0, | |
| "step": 558 | |
| }, | |
| { | |
| "epoch": 0.4247720364741641, | |
| "grad_norm": 2.4674644470214844, | |
| "learning_rate": 4.887161361866608e-06, | |
| "loss": 0.5379098653793335, | |
| "mean_token_accuracy": 0.805732011795044, | |
| "num_tokens": 4766333.0, | |
| "step": 559 | |
| }, | |
| { | |
| "epoch": 0.425531914893617, | |
| "grad_norm": 2.1818697452545166, | |
| "learning_rate": 4.8865384018048494e-06, | |
| "loss": 0.5358027219772339, | |
| "mean_token_accuracy": 0.8157109022140503, | |
| "num_tokens": 4773302.0, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 0.42629179331306993, | |
| "grad_norm": 1.5563722848892212, | |
| "learning_rate": 4.8859137667785735e-06, | |
| "loss": 0.48180875182151794, | |
| "mean_token_accuracy": 0.8297841548919678, | |
| "num_tokens": 4785130.0, | |
| "step": 561 | |
| }, | |
| { | |
| "epoch": 0.4270516717325228, | |
| "grad_norm": 2.0417134761810303, | |
| "learning_rate": 4.8852874572261715e-06, | |
| "loss": 0.482379674911499, | |
| "mean_token_accuracy": 0.8305231332778931, | |
| "num_tokens": 4791899.0, | |
| "step": 562 | |
| }, | |
| { | |
| "epoch": 0.4278115501519757, | |
| "grad_norm": 1.6289424896240234, | |
| "learning_rate": 4.884659473587213e-06, | |
| "loss": 0.5225020051002502, | |
| "mean_token_accuracy": 0.8188822865486145, | |
| "num_tokens": 4807666.0, | |
| "step": 563 | |
| }, | |
| { | |
| "epoch": 0.42857142857142855, | |
| "grad_norm": 2.3250341415405273, | |
| "learning_rate": 4.884029816302441e-06, | |
| "loss": 0.48849189281463623, | |
| "mean_token_accuracy": 0.8139042854309082, | |
| "num_tokens": 4813566.0, | |
| "step": 564 | |
| }, | |
| { | |
| "epoch": 0.42933130699088146, | |
| "grad_norm": 1.750071406364441, | |
| "learning_rate": 4.883398485813772e-06, | |
| "loss": 0.44057148694992065, | |
| "mean_token_accuracy": 0.8574005365371704, | |
| "num_tokens": 4822765.0, | |
| "step": 565 | |
| }, | |
| { | |
| "epoch": 0.43009118541033436, | |
| "grad_norm": 1.522481083869934, | |
| "learning_rate": 4.8827654825642984e-06, | |
| "loss": 0.4549819827079773, | |
| "mean_token_accuracy": 0.8268336057662964, | |
| "num_tokens": 4835230.0, | |
| "step": 566 | |
| }, | |
| { | |
| "epoch": 0.4308510638297872, | |
| "grad_norm": 1.2659013271331787, | |
| "learning_rate": 4.882130806998287e-06, | |
| "loss": 0.44574666023254395, | |
| "mean_token_accuracy": 0.8060784339904785, | |
| "num_tokens": 4851748.0, | |
| "step": 567 | |
| }, | |
| { | |
| "epoch": 0.4316109422492401, | |
| "grad_norm": 1.9666274785995483, | |
| "learning_rate": 4.881494459561177e-06, | |
| "loss": 0.553135871887207, | |
| "mean_token_accuracy": 0.8108686208724976, | |
| "num_tokens": 4860511.0, | |
| "step": 568 | |
| }, | |
| { | |
| "epoch": 0.43237082066869303, | |
| "grad_norm": 1.1471658945083618, | |
| "learning_rate": 4.880856440699582e-06, | |
| "loss": 0.373235821723938, | |
| "mean_token_accuracy": 0.8662744760513306, | |
| "num_tokens": 4881952.0, | |
| "step": 569 | |
| }, | |
| { | |
| "epoch": 0.4331306990881459, | |
| "grad_norm": 1.6941676139831543, | |
| "learning_rate": 4.880216750861288e-06, | |
| "loss": 0.536140501499176, | |
| "mean_token_accuracy": 0.8076417446136475, | |
| "num_tokens": 4893817.0, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 0.4338905775075988, | |
| "grad_norm": 1.8708007335662842, | |
| "learning_rate": 4.879575390495254e-06, | |
| "loss": 0.37522369623184204, | |
| "mean_token_accuracy": 0.8634734749794006, | |
| "num_tokens": 4900702.0, | |
| "step": 571 | |
| }, | |
| { | |
| "epoch": 0.43465045592705165, | |
| "grad_norm": 3.032951593399048, | |
| "learning_rate": 4.878932360051611e-06, | |
| "loss": 0.5705235004425049, | |
| "mean_token_accuracy": 0.822039008140564, | |
| "num_tokens": 4905360.0, | |
| "step": 572 | |
| }, | |
| { | |
| "epoch": 0.43541033434650456, | |
| "grad_norm": 2.2931487560272217, | |
| "learning_rate": 4.878287659981663e-06, | |
| "loss": 0.4667856693267822, | |
| "mean_token_accuracy": 0.8671571612358093, | |
| "num_tokens": 4911258.0, | |
| "step": 573 | |
| }, | |
| { | |
| "epoch": 0.43617021276595747, | |
| "grad_norm": 1.5880272388458252, | |
| "learning_rate": 4.8776412907378845e-06, | |
| "loss": 0.5333184003829956, | |
| "mean_token_accuracy": 0.8448289632797241, | |
| "num_tokens": 4929347.0, | |
| "step": 574 | |
| }, | |
| { | |
| "epoch": 0.4369300911854103, | |
| "grad_norm": 1.7592260837554932, | |
| "learning_rate": 4.876993252773923e-06, | |
| "loss": 0.42056071758270264, | |
| "mean_token_accuracy": 0.8493223190307617, | |
| "num_tokens": 4937999.0, | |
| "step": 575 | |
| }, | |
| { | |
| "epoch": 0.4376899696048632, | |
| "grad_norm": 1.3128409385681152, | |
| "learning_rate": 4.876343546544596e-06, | |
| "loss": 0.43090301752090454, | |
| "mean_token_accuracy": 0.8431683778762817, | |
| "num_tokens": 4951924.0, | |
| "step": 576 | |
| }, | |
| { | |
| "epoch": 0.43844984802431614, | |
| "grad_norm": 2.2033936977386475, | |
| "learning_rate": 4.8756921725058935e-06, | |
| "loss": 0.5168547630310059, | |
| "mean_token_accuracy": 0.82608562707901, | |
| "num_tokens": 4960473.0, | |
| "step": 577 | |
| }, | |
| { | |
| "epoch": 0.439209726443769, | |
| "grad_norm": 1.54935622215271, | |
| "learning_rate": 4.875039131114975e-06, | |
| "loss": 0.3442407250404358, | |
| "mean_token_accuracy": 0.8524424433708191, | |
| "num_tokens": 4970205.0, | |
| "step": 578 | |
| }, | |
| { | |
| "epoch": 0.4399696048632219, | |
| "grad_norm": 1.621962070465088, | |
| "learning_rate": 4.8743844228301676e-06, | |
| "loss": 0.4696016311645508, | |
| "mean_token_accuracy": 0.8362908363342285, | |
| "num_tokens": 4981997.0, | |
| "step": 579 | |
| }, | |
| { | |
| "epoch": 0.44072948328267475, | |
| "grad_norm": 1.776847004890442, | |
| "learning_rate": 4.873728048110973e-06, | |
| "loss": 0.5784905552864075, | |
| "mean_token_accuracy": 0.7997252941131592, | |
| "num_tokens": 4996539.0, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 0.44148936170212766, | |
| "grad_norm": 2.043445587158203, | |
| "learning_rate": 4.873070007418059e-06, | |
| "loss": 0.49215561151504517, | |
| "mean_token_accuracy": 0.8145220279693604, | |
| "num_tokens": 5005113.0, | |
| "step": 581 | |
| }, | |
| { | |
| "epoch": 0.44224924012158057, | |
| "grad_norm": 1.3814793825149536, | |
| "learning_rate": 4.872410301213265e-06, | |
| "loss": 0.47962701320648193, | |
| "mean_token_accuracy": 0.8382116556167603, | |
| "num_tokens": 5022068.0, | |
| "step": 582 | |
| }, | |
| { | |
| "epoch": 0.4430091185410334, | |
| "grad_norm": 1.8251920938491821, | |
| "learning_rate": 4.871748929959598e-06, | |
| "loss": 0.34540295600891113, | |
| "mean_token_accuracy": 0.8747892379760742, | |
| "num_tokens": 5031947.0, | |
| "step": 583 | |
| }, | |
| { | |
| "epoch": 0.44376899696048633, | |
| "grad_norm": 1.7698776721954346, | |
| "learning_rate": 4.871085894121234e-06, | |
| "loss": 0.5506308078765869, | |
| "mean_token_accuracy": 0.8076712489128113, | |
| "num_tokens": 5046125.0, | |
| "step": 584 | |
| }, | |
| { | |
| "epoch": 0.44452887537993924, | |
| "grad_norm": 2.159858226776123, | |
| "learning_rate": 4.870421194163515e-06, | |
| "loss": 0.4155213236808777, | |
| "mean_token_accuracy": 0.8612204194068909, | |
| "num_tokens": 5050994.0, | |
| "step": 585 | |
| }, | |
| { | |
| "epoch": 0.4452887537993921, | |
| "grad_norm": 2.544252634048462, | |
| "learning_rate": 4.869754830552956e-06, | |
| "loss": 0.4318983554840088, | |
| "mean_token_accuracy": 0.853102445602417, | |
| "num_tokens": 5055745.0, | |
| "step": 586 | |
| }, | |
| { | |
| "epoch": 0.446048632218845, | |
| "grad_norm": 2.085960626602173, | |
| "learning_rate": 4.869086803757235e-06, | |
| "loss": 0.5072901248931885, | |
| "mean_token_accuracy": 0.8226085305213928, | |
| "num_tokens": 5062649.0, | |
| "step": 587 | |
| }, | |
| { | |
| "epoch": 0.44680851063829785, | |
| "grad_norm": 2.9260871410369873, | |
| "learning_rate": 4.868417114245199e-06, | |
| "loss": 0.5853766798973083, | |
| "mean_token_accuracy": 0.8455843329429626, | |
| "num_tokens": 5067887.0, | |
| "step": 588 | |
| }, | |
| { | |
| "epoch": 0.44756838905775076, | |
| "grad_norm": 1.7980996370315552, | |
| "learning_rate": 4.867745762486862e-06, | |
| "loss": 0.496432900428772, | |
| "mean_token_accuracy": 0.8240994811058044, | |
| "num_tokens": 5077445.0, | |
| "step": 589 | |
| }, | |
| { | |
| "epoch": 0.44832826747720367, | |
| "grad_norm": 1.5280243158340454, | |
| "learning_rate": 4.8670727489534035e-06, | |
| "loss": 0.4973749816417694, | |
| "mean_token_accuracy": 0.8444130420684814, | |
| "num_tokens": 5090569.0, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 0.4490881458966565, | |
| "grad_norm": 2.85929012298584, | |
| "learning_rate": 4.866398074117173e-06, | |
| "loss": 0.3670230507850647, | |
| "mean_token_accuracy": 0.8688805103302002, | |
| "num_tokens": 5093853.0, | |
| "step": 591 | |
| }, | |
| { | |
| "epoch": 0.44984802431610943, | |
| "grad_norm": 2.1087558269500732, | |
| "learning_rate": 4.86572173845168e-06, | |
| "loss": 0.5709018707275391, | |
| "mean_token_accuracy": 0.8056541085243225, | |
| "num_tokens": 5102248.0, | |
| "step": 592 | |
| }, | |
| { | |
| "epoch": 0.4506079027355623, | |
| "grad_norm": 2.267425537109375, | |
| "learning_rate": 4.865043742431605e-06, | |
| "loss": 0.5483532547950745, | |
| "mean_token_accuracy": 0.8169167637825012, | |
| "num_tokens": 5110453.0, | |
| "step": 593 | |
| }, | |
| { | |
| "epoch": 0.4513677811550152, | |
| "grad_norm": 1.7768651247024536, | |
| "learning_rate": 4.864364086532792e-06, | |
| "loss": 0.4660247564315796, | |
| "mean_token_accuracy": 0.8425134420394897, | |
| "num_tokens": 5122417.0, | |
| "step": 594 | |
| }, | |
| { | |
| "epoch": 0.4521276595744681, | |
| "grad_norm": 1.4207671880722046, | |
| "learning_rate": 4.863682771232249e-06, | |
| "loss": 0.44873684644699097, | |
| "mean_token_accuracy": 0.8296250104904175, | |
| "num_tokens": 5137626.0, | |
| "step": 595 | |
| }, | |
| { | |
| "epoch": 0.45288753799392095, | |
| "grad_norm": 1.976282000541687, | |
| "learning_rate": 4.862999797008149e-06, | |
| "loss": 0.5604327321052551, | |
| "mean_token_accuracy": 0.8158591985702515, | |
| "num_tokens": 5147887.0, | |
| "step": 596 | |
| }, | |
| { | |
| "epoch": 0.45364741641337386, | |
| "grad_norm": 3.4956934452056885, | |
| "learning_rate": 4.862315164339829e-06, | |
| "loss": 0.4131582975387573, | |
| "mean_token_accuracy": 0.852203905582428, | |
| "num_tokens": 5150992.0, | |
| "step": 597 | |
| }, | |
| { | |
| "epoch": 0.45440729483282677, | |
| "grad_norm": 3.1881635189056396, | |
| "learning_rate": 4.861628873707792e-06, | |
| "loss": 0.6443498134613037, | |
| "mean_token_accuracy": 0.7832992672920227, | |
| "num_tokens": 5154764.0, | |
| "step": 598 | |
| }, | |
| { | |
| "epoch": 0.4551671732522796, | |
| "grad_norm": 2.0504488945007324, | |
| "learning_rate": 4.860940925593703e-06, | |
| "loss": 0.4550248384475708, | |
| "mean_token_accuracy": 0.8533322811126709, | |
| "num_tokens": 5162569.0, | |
| "step": 599 | |
| }, | |
| { | |
| "epoch": 0.45592705167173253, | |
| "grad_norm": 3.422691822052002, | |
| "learning_rate": 4.86025132048039e-06, | |
| "loss": 0.48267173767089844, | |
| "mean_token_accuracy": 0.8282521963119507, | |
| "num_tokens": 5167056.0, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.4566869300911854, | |
| "grad_norm": 1.7003215551376343, | |
| "learning_rate": 4.859560058851844e-06, | |
| "loss": 0.4646824300289154, | |
| "mean_token_accuracy": 0.8488043546676636, | |
| "num_tokens": 5177752.0, | |
| "step": 601 | |
| }, | |
| { | |
| "epoch": 0.4574468085106383, | |
| "grad_norm": 3.094937324523926, | |
| "learning_rate": 4.8588671411932195e-06, | |
| "loss": 0.47665974497795105, | |
| "mean_token_accuracy": 0.822475790977478, | |
| "num_tokens": 5181059.0, | |
| "step": 602 | |
| }, | |
| { | |
| "epoch": 0.4582066869300912, | |
| "grad_norm": 2.6230075359344482, | |
| "learning_rate": 4.858172567990832e-06, | |
| "loss": 0.5276778936386108, | |
| "mean_token_accuracy": 0.8277568817138672, | |
| "num_tokens": 5186541.0, | |
| "step": 603 | |
| }, | |
| { | |
| "epoch": 0.45896656534954405, | |
| "grad_norm": 2.0146803855895996, | |
| "learning_rate": 4.857476339732162e-06, | |
| "loss": 0.41573643684387207, | |
| "mean_token_accuracy": 0.8459112644195557, | |
| "num_tokens": 5193208.0, | |
| "step": 604 | |
| }, | |
| { | |
| "epoch": 0.45972644376899696, | |
| "grad_norm": 2.2832047939300537, | |
| "learning_rate": 4.856778456905846e-06, | |
| "loss": 0.44260644912719727, | |
| "mean_token_accuracy": 0.837356448173523, | |
| "num_tokens": 5198558.0, | |
| "step": 605 | |
| }, | |
| { | |
| "epoch": 0.46048632218844987, | |
| "grad_norm": 2.215421676635742, | |
| "learning_rate": 4.856078920001689e-06, | |
| "loss": 0.541540265083313, | |
| "mean_token_accuracy": 0.8139248490333557, | |
| "num_tokens": 5204515.0, | |
| "step": 606 | |
| }, | |
| { | |
| "epoch": 0.4612462006079027, | |
| "grad_norm": 2.1481690406799316, | |
| "learning_rate": 4.855377729510648e-06, | |
| "loss": 0.5875097513198853, | |
| "mean_token_accuracy": 0.8083330988883972, | |
| "num_tokens": 5212099.0, | |
| "step": 607 | |
| }, | |
| { | |
| "epoch": 0.46200607902735563, | |
| "grad_norm": 2.555629253387451, | |
| "learning_rate": 4.8546748859248504e-06, | |
| "loss": 0.6058074235916138, | |
| "mean_token_accuracy": 0.7870136499404907, | |
| "num_tokens": 5217974.0, | |
| "step": 608 | |
| }, | |
| { | |
| "epoch": 0.4627659574468085, | |
| "grad_norm": 2.7553253173828125, | |
| "learning_rate": 4.853970389737576e-06, | |
| "loss": 0.2978392541408539, | |
| "mean_token_accuracy": 0.8897851705551147, | |
| "num_tokens": 5221306.0, | |
| "step": 609 | |
| }, | |
| { | |
| "epoch": 0.4635258358662614, | |
| "grad_norm": 2.716369390487671, | |
| "learning_rate": 4.8532642414432675e-06, | |
| "loss": 0.6190924644470215, | |
| "mean_token_accuracy": 0.7923913598060608, | |
| "num_tokens": 5226958.0, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 0.4642857142857143, | |
| "grad_norm": 1.7936944961547852, | |
| "learning_rate": 4.852556441537528e-06, | |
| "loss": 0.33577677607536316, | |
| "mean_token_accuracy": 0.8650237917900085, | |
| "num_tokens": 5234392.0, | |
| "step": 611 | |
| }, | |
| { | |
| "epoch": 0.46504559270516715, | |
| "grad_norm": 1.6506803035736084, | |
| "learning_rate": 4.851846990517118e-06, | |
| "loss": 0.5909202098846436, | |
| "mean_token_accuracy": 0.7970037460327148, | |
| "num_tokens": 5247013.0, | |
| "step": 612 | |
| }, | |
| { | |
| "epoch": 0.46580547112462006, | |
| "grad_norm": 1.8104954957962036, | |
| "learning_rate": 4.851135888879958e-06, | |
| "loss": 0.432063490152359, | |
| "mean_token_accuracy": 0.8540003299713135, | |
| "num_tokens": 5256835.0, | |
| "step": 613 | |
| }, | |
| { | |
| "epoch": 0.46656534954407297, | |
| "grad_norm": 2.3342862129211426, | |
| "learning_rate": 4.850423137125126e-06, | |
| "loss": 0.5327414870262146, | |
| "mean_token_accuracy": 0.8305746912956238, | |
| "num_tokens": 5264369.0, | |
| "step": 614 | |
| }, | |
| { | |
| "epoch": 0.4673252279635258, | |
| "grad_norm": 2.3439366817474365, | |
| "learning_rate": 4.8497087357528585e-06, | |
| "loss": 0.6383283138275146, | |
| "mean_token_accuracy": 0.8068422079086304, | |
| "num_tokens": 5273232.0, | |
| "step": 615 | |
| }, | |
| { | |
| "epoch": 0.46808510638297873, | |
| "grad_norm": 2.5633602142333984, | |
| "learning_rate": 4.8489926852645505e-06, | |
| "loss": 0.43189486861228943, | |
| "mean_token_accuracy": 0.8446429371833801, | |
| "num_tokens": 5278349.0, | |
| "step": 616 | |
| }, | |
| { | |
| "epoch": 0.4688449848024316, | |
| "grad_norm": 1.5836342573165894, | |
| "learning_rate": 4.848274986162754e-06, | |
| "loss": 0.4738016128540039, | |
| "mean_token_accuracy": 0.8230966329574585, | |
| "num_tokens": 5292550.0, | |
| "step": 617 | |
| }, | |
| { | |
| "epoch": 0.4696048632218845, | |
| "grad_norm": 2.2851970195770264, | |
| "learning_rate": 4.847555638951177e-06, | |
| "loss": 0.4870763123035431, | |
| "mean_token_accuracy": 0.8352062702178955, | |
| "num_tokens": 5299266.0, | |
| "step": 618 | |
| }, | |
| { | |
| "epoch": 0.4703647416413374, | |
| "grad_norm": 1.6537147760391235, | |
| "learning_rate": 4.846834644134686e-06, | |
| "loss": 0.4185401201248169, | |
| "mean_token_accuracy": 0.8508622646331787, | |
| "num_tokens": 5309226.0, | |
| "step": 619 | |
| }, | |
| { | |
| "epoch": 0.47112462006079026, | |
| "grad_norm": 2.48115873336792, | |
| "learning_rate": 4.846112002219301e-06, | |
| "loss": 0.5209293365478516, | |
| "mean_token_accuracy": 0.8153672218322754, | |
| "num_tokens": 5315891.0, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 0.47188449848024316, | |
| "grad_norm": 2.528759002685547, | |
| "learning_rate": 4.845387713712203e-06, | |
| "loss": 0.4290841817855835, | |
| "mean_token_accuracy": 0.8516919612884521, | |
| "num_tokens": 5320410.0, | |
| "step": 621 | |
| }, | |
| { | |
| "epoch": 0.4726443768996961, | |
| "grad_norm": 1.7299134731292725, | |
| "learning_rate": 4.844661779121723e-06, | |
| "loss": 0.5427767634391785, | |
| "mean_token_accuracy": 0.809242844581604, | |
| "num_tokens": 5333369.0, | |
| "step": 622 | |
| }, | |
| { | |
| "epoch": 0.4734042553191489, | |
| "grad_norm": 2.600541353225708, | |
| "learning_rate": 4.843934198957351e-06, | |
| "loss": 0.5924317836761475, | |
| "mean_token_accuracy": 0.834161102771759, | |
| "num_tokens": 5338959.0, | |
| "step": 623 | |
| }, | |
| { | |
| "epoch": 0.47416413373860183, | |
| "grad_norm": 2.5299830436706543, | |
| "learning_rate": 4.84320497372973e-06, | |
| "loss": 0.5819499492645264, | |
| "mean_token_accuracy": 0.7921682596206665, | |
| "num_tokens": 5345088.0, | |
| "step": 624 | |
| }, | |
| { | |
| "epoch": 0.4749240121580547, | |
| "grad_norm": 2.7185213565826416, | |
| "learning_rate": 4.842474103950658e-06, | |
| "loss": 0.3976823687553406, | |
| "mean_token_accuracy": 0.8676345348358154, | |
| "num_tokens": 5349649.0, | |
| "step": 625 | |
| }, | |
| { | |
| "epoch": 0.4756838905775076, | |
| "grad_norm": 3.488968849182129, | |
| "learning_rate": 4.841741590133089e-06, | |
| "loss": 0.6358038783073425, | |
| "mean_token_accuracy": 0.8013798594474792, | |
| "num_tokens": 5353806.0, | |
| "step": 626 | |
| }, | |
| { | |
| "epoch": 0.4764437689969605, | |
| "grad_norm": 2.1854147911071777, | |
| "learning_rate": 4.841007432791129e-06, | |
| "loss": 0.4694806635379791, | |
| "mean_token_accuracy": 0.8429578542709351, | |
| "num_tokens": 5359814.0, | |
| "step": 627 | |
| }, | |
| { | |
| "epoch": 0.47720364741641336, | |
| "grad_norm": 2.2328732013702393, | |
| "learning_rate": 4.8402716324400375e-06, | |
| "loss": 0.3461613953113556, | |
| "mean_token_accuracy": 0.8803620338439941, | |
| "num_tokens": 5365183.0, | |
| "step": 628 | |
| }, | |
| { | |
| "epoch": 0.47796352583586627, | |
| "grad_norm": 1.5411367416381836, | |
| "learning_rate": 4.839534189596228e-06, | |
| "loss": 0.3935459554195404, | |
| "mean_token_accuracy": 0.8545909523963928, | |
| "num_tokens": 5375512.0, | |
| "step": 629 | |
| }, | |
| { | |
| "epoch": 0.4787234042553192, | |
| "grad_norm": 2.197303533554077, | |
| "learning_rate": 4.8387951047772656e-06, | |
| "loss": 0.4462829828262329, | |
| "mean_token_accuracy": 0.8534374237060547, | |
| "num_tokens": 5381628.0, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 0.479483282674772, | |
| "grad_norm": 1.4957845211029053, | |
| "learning_rate": 4.838054378501868e-06, | |
| "loss": 0.45892927050590515, | |
| "mean_token_accuracy": 0.8348727822303772, | |
| "num_tokens": 5394563.0, | |
| "step": 631 | |
| }, | |
| { | |
| "epoch": 0.48024316109422494, | |
| "grad_norm": 1.5337835550308228, | |
| "learning_rate": 4.837312011289907e-06, | |
| "loss": 0.4064037799835205, | |
| "mean_token_accuracy": 0.8595039248466492, | |
| "num_tokens": 5406922.0, | |
| "step": 632 | |
| }, | |
| { | |
| "epoch": 0.4810030395136778, | |
| "grad_norm": 3.764256238937378, | |
| "learning_rate": 4.836568003662403e-06, | |
| "loss": 0.4333711564540863, | |
| "mean_token_accuracy": 0.8405383825302124, | |
| "num_tokens": 5409256.0, | |
| "step": 633 | |
| }, | |
| { | |
| "epoch": 0.4817629179331307, | |
| "grad_norm": 1.2404766082763672, | |
| "learning_rate": 4.8358223561415304e-06, | |
| "loss": 0.3710271120071411, | |
| "mean_token_accuracy": 0.8664895296096802, | |
| "num_tokens": 5424772.0, | |
| "step": 634 | |
| }, | |
| { | |
| "epoch": 0.4825227963525836, | |
| "grad_norm": 1.9933364391326904, | |
| "learning_rate": 4.835075069250613e-06, | |
| "loss": 0.389653742313385, | |
| "mean_token_accuracy": 0.8555202484130859, | |
| "num_tokens": 5431807.0, | |
| "step": 635 | |
| }, | |
| { | |
| "epoch": 0.48328267477203646, | |
| "grad_norm": 1.4354065656661987, | |
| "learning_rate": 4.8343261435141245e-06, | |
| "loss": 0.45983028411865234, | |
| "mean_token_accuracy": 0.8382909297943115, | |
| "num_tokens": 5448859.0, | |
| "step": 636 | |
| }, | |
| { | |
| "epoch": 0.48404255319148937, | |
| "grad_norm": 1.6717485189437866, | |
| "learning_rate": 4.833575579457691e-06, | |
| "loss": 0.35919392108917236, | |
| "mean_token_accuracy": 0.8889127969741821, | |
| "num_tokens": 5456548.0, | |
| "step": 637 | |
| }, | |
| { | |
| "epoch": 0.4848024316109423, | |
| "grad_norm": 1.7048540115356445, | |
| "learning_rate": 4.832823377608088e-06, | |
| "loss": 0.4005333483219147, | |
| "mean_token_accuracy": 0.86216139793396, | |
| "num_tokens": 5468141.0, | |
| "step": 638 | |
| }, | |
| { | |
| "epoch": 0.48556231003039513, | |
| "grad_norm": 1.9860360622406006, | |
| "learning_rate": 4.832069538493237e-06, | |
| "loss": 0.3747888207435608, | |
| "mean_token_accuracy": 0.8657118082046509, | |
| "num_tokens": 5474833.0, | |
| "step": 639 | |
| }, | |
| { | |
| "epoch": 0.48632218844984804, | |
| "grad_norm": 1.6138876676559448, | |
| "learning_rate": 4.831314062642213e-06, | |
| "loss": 0.47836577892303467, | |
| "mean_token_accuracy": 0.8378514051437378, | |
| "num_tokens": 5486456.0, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 0.4870820668693009, | |
| "grad_norm": 1.9518992900848389, | |
| "learning_rate": 4.830556950585239e-06, | |
| "loss": 0.40961670875549316, | |
| "mean_token_accuracy": 0.8537701368331909, | |
| "num_tokens": 5493926.0, | |
| "step": 641 | |
| }, | |
| { | |
| "epoch": 0.4878419452887538, | |
| "grad_norm": 2.9804043769836426, | |
| "learning_rate": 4.829798202853683e-06, | |
| "loss": 0.5683507323265076, | |
| "mean_token_accuracy": 0.8166797161102295, | |
| "num_tokens": 5498797.0, | |
| "step": 642 | |
| }, | |
| { | |
| "epoch": 0.4886018237082067, | |
| "grad_norm": 1.910904049873352, | |
| "learning_rate": 4.829037819980065e-06, | |
| "loss": 0.4286502003669739, | |
| "mean_token_accuracy": 0.8568353652954102, | |
| "num_tokens": 5506365.0, | |
| "step": 643 | |
| }, | |
| { | |
| "epoch": 0.48936170212765956, | |
| "grad_norm": 2.3651235103607178, | |
| "learning_rate": 4.828275802498051e-06, | |
| "loss": 0.4485647976398468, | |
| "mean_token_accuracy": 0.8471283316612244, | |
| "num_tokens": 5511944.0, | |
| "step": 644 | |
| }, | |
| { | |
| "epoch": 0.49012158054711247, | |
| "grad_norm": 1.9755531549453735, | |
| "learning_rate": 4.827512150942454e-06, | |
| "loss": 0.4045594334602356, | |
| "mean_token_accuracy": 0.8545803427696228, | |
| "num_tokens": 5520331.0, | |
| "step": 645 | |
| }, | |
| { | |
| "epoch": 0.4908814589665654, | |
| "grad_norm": 1.8841909170150757, | |
| "learning_rate": 4.8267468658492335e-06, | |
| "loss": 0.4773695468902588, | |
| "mean_token_accuracy": 0.8486886024475098, | |
| "num_tokens": 5528605.0, | |
| "step": 646 | |
| }, | |
| { | |
| "epoch": 0.49164133738601823, | |
| "grad_norm": 1.8051406145095825, | |
| "learning_rate": 4.825979947755496e-06, | |
| "loss": 0.5543516278266907, | |
| "mean_token_accuracy": 0.7976117134094238, | |
| "num_tokens": 5540464.0, | |
| "step": 647 | |
| }, | |
| { | |
| "epoch": 0.49240121580547114, | |
| "grad_norm": 3.1752281188964844, | |
| "learning_rate": 4.8252113971994955e-06, | |
| "loss": 0.5746598243713379, | |
| "mean_token_accuracy": 0.825099766254425, | |
| "num_tokens": 5546292.0, | |
| "step": 648 | |
| }, | |
| { | |
| "epoch": 0.493161094224924, | |
| "grad_norm": 2.9637610912323, | |
| "learning_rate": 4.824441214720629e-06, | |
| "loss": 0.40092965960502625, | |
| "mean_token_accuracy": 0.8745595812797546, | |
| "num_tokens": 5549430.0, | |
| "step": 649 | |
| }, | |
| { | |
| "epoch": 0.4939209726443769, | |
| "grad_norm": 2.0234932899475098, | |
| "learning_rate": 4.823669400859441e-06, | |
| "loss": 0.579008162021637, | |
| "mean_token_accuracy": 0.8241140842437744, | |
| "num_tokens": 5557865.0, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 0.4946808510638298, | |
| "grad_norm": 1.1711043119430542, | |
| "learning_rate": 4.8228959561576195e-06, | |
| "loss": 0.40490004420280457, | |
| "mean_token_accuracy": 0.8479105234146118, | |
| "num_tokens": 5577337.0, | |
| "step": 651 | |
| }, | |
| { | |
| "epoch": 0.49544072948328266, | |
| "grad_norm": 1.9951932430267334, | |
| "learning_rate": 4.822120881157998e-06, | |
| "loss": 0.4962886571884155, | |
| "mean_token_accuracy": 0.8239437341690063, | |
| "num_tokens": 5586382.0, | |
| "step": 652 | |
| }, | |
| { | |
| "epoch": 0.49620060790273557, | |
| "grad_norm": 3.4275267124176025, | |
| "learning_rate": 4.821344176404554e-06, | |
| "loss": 0.4348936080932617, | |
| "mean_token_accuracy": 0.8483643531799316, | |
| "num_tokens": 5589227.0, | |
| "step": 653 | |
| }, | |
| { | |
| "epoch": 0.4969604863221885, | |
| "grad_norm": 3.089677095413208, | |
| "learning_rate": 4.820565842442408e-06, | |
| "loss": 0.4969814419746399, | |
| "mean_token_accuracy": 0.820526123046875, | |
| "num_tokens": 5593318.0, | |
| "step": 654 | |
| }, | |
| { | |
| "epoch": 0.49772036474164133, | |
| "grad_norm": 2.433255434036255, | |
| "learning_rate": 4.819785879817827e-06, | |
| "loss": 0.49912628531455994, | |
| "mean_token_accuracy": 0.8462561368942261, | |
| "num_tokens": 5598313.0, | |
| "step": 655 | |
| }, | |
| { | |
| "epoch": 0.49848024316109424, | |
| "grad_norm": 2.3052635192871094, | |
| "learning_rate": 4.819004289078217e-06, | |
| "loss": 0.5517863035202026, | |
| "mean_token_accuracy": 0.8038468360900879, | |
| "num_tokens": 5604827.0, | |
| "step": 656 | |
| }, | |
| { | |
| "epoch": 0.4992401215805471, | |
| "grad_norm": 2.087714672088623, | |
| "learning_rate": 4.818221070772129e-06, | |
| "loss": 0.5179933309555054, | |
| "mean_token_accuracy": 0.8126649260520935, | |
| "num_tokens": 5612345.0, | |
| "step": 657 | |
| }, | |
| { | |
| "epoch": 0.5, | |
| "grad_norm": 1.5718315839767456, | |
| "learning_rate": 4.8174362254492555e-06, | |
| "loss": 0.49791502952575684, | |
| "mean_token_accuracy": 0.8147987127304077, | |
| "num_tokens": 5624688.0, | |
| "step": 658 | |
| }, | |
| { | |
| "epoch": 0.5007598784194529, | |
| "grad_norm": 2.023894786834717, | |
| "learning_rate": 4.816649753660431e-06, | |
| "loss": 0.3900100290775299, | |
| "mean_token_accuracy": 0.8653963804244995, | |
| "num_tokens": 5630774.0, | |
| "step": 659 | |
| }, | |
| { | |
| "epoch": 0.5015197568389058, | |
| "grad_norm": 3.1195034980773926, | |
| "learning_rate": 4.815861655957632e-06, | |
| "loss": 0.38084471225738525, | |
| "mean_token_accuracy": 0.8551889657974243, | |
| "num_tokens": 5634806.0, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 0.5022796352583586, | |
| "grad_norm": 1.1556870937347412, | |
| "learning_rate": 4.815071932893976e-06, | |
| "loss": 0.41517752408981323, | |
| "mean_token_accuracy": 0.8434287905693054, | |
| "num_tokens": 5652308.0, | |
| "step": 661 | |
| }, | |
| { | |
| "epoch": 0.5030395136778115, | |
| "grad_norm": 1.3827934265136719, | |
| "learning_rate": 4.81428058502372e-06, | |
| "loss": 0.5281240940093994, | |
| "mean_token_accuracy": 0.8140045404434204, | |
| "num_tokens": 5670572.0, | |
| "step": 662 | |
| }, | |
| { | |
| "epoch": 0.5037993920972644, | |
| "grad_norm": 1.8974567651748657, | |
| "learning_rate": 4.813487612902265e-06, | |
| "loss": 0.5226365923881531, | |
| "mean_token_accuracy": 0.837066650390625, | |
| "num_tokens": 5679684.0, | |
| "step": 663 | |
| }, | |
| { | |
| "epoch": 0.5045592705167173, | |
| "grad_norm": 2.4013209342956543, | |
| "learning_rate": 4.812693017086145e-06, | |
| "loss": 0.470547616481781, | |
| "mean_token_accuracy": 0.821993350982666, | |
| "num_tokens": 5685769.0, | |
| "step": 664 | |
| }, | |
| { | |
| "epoch": 0.5053191489361702, | |
| "grad_norm": 1.8950684070587158, | |
| "learning_rate": 4.811896798133042e-06, | |
| "loss": 0.5226761102676392, | |
| "mean_token_accuracy": 0.8077384233474731, | |
| "num_tokens": 5696182.0, | |
| "step": 665 | |
| }, | |
| { | |
| "epoch": 0.506079027355623, | |
| "grad_norm": 2.1453075408935547, | |
| "learning_rate": 4.811098956601772e-06, | |
| "loss": 0.4317760765552521, | |
| "mean_token_accuracy": 0.8489662408828735, | |
| "num_tokens": 5702487.0, | |
| "step": 666 | |
| }, | |
| { | |
| "epoch": 0.506838905775076, | |
| "grad_norm": 1.4667294025421143, | |
| "learning_rate": 4.810299493052289e-06, | |
| "loss": 0.39067623019218445, | |
| "mean_token_accuracy": 0.8580642938613892, | |
| "num_tokens": 5714003.0, | |
| "step": 667 | |
| }, | |
| { | |
| "epoch": 0.5075987841945289, | |
| "grad_norm": 2.8400089740753174, | |
| "learning_rate": 4.809498408045691e-06, | |
| "loss": 0.4783066511154175, | |
| "mean_token_accuracy": 0.8334233164787292, | |
| "num_tokens": 5718153.0, | |
| "step": 668 | |
| }, | |
| { | |
| "epoch": 0.5083586626139818, | |
| "grad_norm": 1.5703561305999756, | |
| "learning_rate": 4.808695702144206e-06, | |
| "loss": 0.4625510573387146, | |
| "mean_token_accuracy": 0.8410191535949707, | |
| "num_tokens": 5730026.0, | |
| "step": 669 | |
| }, | |
| { | |
| "epoch": 0.5091185410334347, | |
| "grad_norm": 1.2351874113082886, | |
| "learning_rate": 4.807891375911207e-06, | |
| "loss": 0.37267962098121643, | |
| "mean_token_accuracy": 0.8449434041976929, | |
| "num_tokens": 5745666.0, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 0.5098784194528876, | |
| "grad_norm": 2.6481969356536865, | |
| "learning_rate": 4.8070854299112e-06, | |
| "loss": 0.6104316711425781, | |
| "mean_token_accuracy": 0.8011107444763184, | |
| "num_tokens": 5751746.0, | |
| "step": 671 | |
| }, | |
| { | |
| "epoch": 0.5106382978723404, | |
| "grad_norm": 2.7330830097198486, | |
| "learning_rate": 4.806277864709828e-06, | |
| "loss": 0.5459984540939331, | |
| "mean_token_accuracy": 0.814912736415863, | |
| "num_tokens": 5756602.0, | |
| "step": 672 | |
| }, | |
| { | |
| "epoch": 0.5113981762917933, | |
| "grad_norm": 2.625068187713623, | |
| "learning_rate": 4.805468680873874e-06, | |
| "loss": 0.4850236177444458, | |
| "mean_token_accuracy": 0.8303148150444031, | |
| "num_tokens": 5761466.0, | |
| "step": 673 | |
| }, | |
| { | |
| "epoch": 0.5121580547112462, | |
| "grad_norm": 2.8971197605133057, | |
| "learning_rate": 4.804657878971252e-06, | |
| "loss": 0.3686336874961853, | |
| "mean_token_accuracy": 0.8712918758392334, | |
| "num_tokens": 5764858.0, | |
| "step": 674 | |
| }, | |
| { | |
| "epoch": 0.5129179331306991, | |
| "grad_norm": 2.4213671684265137, | |
| "learning_rate": 4.803845459571014e-06, | |
| "loss": 0.4329441487789154, | |
| "mean_token_accuracy": 0.8345562219619751, | |
| "num_tokens": 5769167.0, | |
| "step": 675 | |
| }, | |
| { | |
| "epoch": 0.513677811550152, | |
| "grad_norm": 2.9091386795043945, | |
| "learning_rate": 4.803031423243349e-06, | |
| "loss": 0.5625981688499451, | |
| "mean_token_accuracy": 0.8446449041366577, | |
| "num_tokens": 5773945.0, | |
| "step": 676 | |
| }, | |
| { | |
| "epoch": 0.5144376899696048, | |
| "grad_norm": 1.6716400384902954, | |
| "learning_rate": 4.802215770559578e-06, | |
| "loss": 0.5100817680358887, | |
| "mean_token_accuracy": 0.8298100233078003, | |
| "num_tokens": 5785497.0, | |
| "step": 677 | |
| }, | |
| { | |
| "epoch": 0.5151975683890577, | |
| "grad_norm": 2.1554148197174072, | |
| "learning_rate": 4.801398502092156e-06, | |
| "loss": 0.4152962565422058, | |
| "mean_token_accuracy": 0.8570256233215332, | |
| "num_tokens": 5792592.0, | |
| "step": 678 | |
| }, | |
| { | |
| "epoch": 0.5159574468085106, | |
| "grad_norm": 2.435845136642456, | |
| "learning_rate": 4.800579618414677e-06, | |
| "loss": 0.4497343897819519, | |
| "mean_token_accuracy": 0.8423111438751221, | |
| "num_tokens": 5798486.0, | |
| "step": 679 | |
| }, | |
| { | |
| "epoch": 0.5167173252279635, | |
| "grad_norm": 1.989006519317627, | |
| "learning_rate": 4.799759120101861e-06, | |
| "loss": 0.5540581345558167, | |
| "mean_token_accuracy": 0.8367617726325989, | |
| "num_tokens": 5805441.0, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 0.5174772036474165, | |
| "grad_norm": 1.5651880502700806, | |
| "learning_rate": 4.798937007729568e-06, | |
| "loss": 0.4808984398841858, | |
| "mean_token_accuracy": 0.8281278610229492, | |
| "num_tokens": 5819267.0, | |
| "step": 681 | |
| }, | |
| { | |
| "epoch": 0.5182370820668692, | |
| "grad_norm": 1.9812109470367432, | |
| "learning_rate": 4.798113281874788e-06, | |
| "loss": 0.4705607295036316, | |
| "mean_token_accuracy": 0.8278742432594299, | |
| "num_tokens": 5827812.0, | |
| "step": 682 | |
| }, | |
| { | |
| "epoch": 0.5189969604863222, | |
| "grad_norm": 1.6913940906524658, | |
| "learning_rate": 4.797287943115642e-06, | |
| "loss": 0.5208501815795898, | |
| "mean_token_accuracy": 0.8249417543411255, | |
| "num_tokens": 5839235.0, | |
| "step": 683 | |
| }, | |
| { | |
| "epoch": 0.5197568389057751, | |
| "grad_norm": 1.81809401512146, | |
| "learning_rate": 4.796460992031386e-06, | |
| "loss": 0.4658868908882141, | |
| "mean_token_accuracy": 0.8420040607452393, | |
| "num_tokens": 5848972.0, | |
| "step": 684 | |
| }, | |
| { | |
| "epoch": 0.520516717325228, | |
| "grad_norm": 2.1637492179870605, | |
| "learning_rate": 4.7956324292024045e-06, | |
| "loss": 0.5366767644882202, | |
| "mean_token_accuracy": 0.8110297918319702, | |
| "num_tokens": 5856875.0, | |
| "step": 685 | |
| }, | |
| { | |
| "epoch": 0.5212765957446809, | |
| "grad_norm": 2.5795931816101074, | |
| "learning_rate": 4.794802255210217e-06, | |
| "loss": 0.5069275498390198, | |
| "mean_token_accuracy": 0.8302167654037476, | |
| "num_tokens": 5861708.0, | |
| "step": 686 | |
| }, | |
| { | |
| "epoch": 0.5220364741641338, | |
| "grad_norm": 2.643785238265991, | |
| "learning_rate": 4.793970470637469e-06, | |
| "loss": 0.592145562171936, | |
| "mean_token_accuracy": 0.790871262550354, | |
| "num_tokens": 5868283.0, | |
| "step": 687 | |
| }, | |
| { | |
| "epoch": 0.5227963525835866, | |
| "grad_norm": 1.5663838386535645, | |
| "learning_rate": 4.7931370760679415e-06, | |
| "loss": 0.4538711905479431, | |
| "mean_token_accuracy": 0.8419860005378723, | |
| "num_tokens": 5878884.0, | |
| "step": 688 | |
| }, | |
| { | |
| "epoch": 0.5235562310030395, | |
| "grad_norm": 2.236764669418335, | |
| "learning_rate": 4.792302072086542e-06, | |
| "loss": 0.5106793642044067, | |
| "mean_token_accuracy": 0.8331024646759033, | |
| "num_tokens": 5886047.0, | |
| "step": 689 | |
| }, | |
| { | |
| "epoch": 0.5243161094224924, | |
| "grad_norm": 2.964491128921509, | |
| "learning_rate": 4.7914654592793065e-06, | |
| "loss": 0.4634789824485779, | |
| "mean_token_accuracy": 0.8439992666244507, | |
| "num_tokens": 5889693.0, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 0.5250759878419453, | |
| "grad_norm": 1.6666783094406128, | |
| "learning_rate": 4.790627238233405e-06, | |
| "loss": 0.3990614414215088, | |
| "mean_token_accuracy": 0.853827953338623, | |
| "num_tokens": 5898664.0, | |
| "step": 691 | |
| }, | |
| { | |
| "epoch": 0.5258358662613982, | |
| "grad_norm": 2.42938494682312, | |
| "learning_rate": 4.789787409537131e-06, | |
| "loss": 0.5084211230278015, | |
| "mean_token_accuracy": 0.8411228060722351, | |
| "num_tokens": 5905294.0, | |
| "step": 692 | |
| }, | |
| { | |
| "epoch": 0.526595744680851, | |
| "grad_norm": 1.7794736623764038, | |
| "learning_rate": 4.7889459737799105e-06, | |
| "loss": 0.43089574575424194, | |
| "mean_token_accuracy": 0.8505694270133972, | |
| "num_tokens": 5914131.0, | |
| "step": 693 | |
| }, | |
| { | |
| "epoch": 0.5273556231003039, | |
| "grad_norm": 2.302791118621826, | |
| "learning_rate": 4.788102931552294e-06, | |
| "loss": 0.5151532292366028, | |
| "mean_token_accuracy": 0.8154581785202026, | |
| "num_tokens": 5919937.0, | |
| "step": 694 | |
| }, | |
| { | |
| "epoch": 0.5281155015197568, | |
| "grad_norm": 2.371701717376709, | |
| "learning_rate": 4.787258283445962e-06, | |
| "loss": 0.3740626573562622, | |
| "mean_token_accuracy": 0.8779724836349487, | |
| "num_tokens": 5924778.0, | |
| "step": 695 | |
| }, | |
| { | |
| "epoch": 0.5288753799392097, | |
| "grad_norm": 2.1565825939178467, | |
| "learning_rate": 4.786412030053721e-06, | |
| "loss": 0.4572535455226898, | |
| "mean_token_accuracy": 0.8557048439979553, | |
| "num_tokens": 5931744.0, | |
| "step": 696 | |
| }, | |
| { | |
| "epoch": 0.5296352583586627, | |
| "grad_norm": 1.9793959856033325, | |
| "learning_rate": 4.785564171969503e-06, | |
| "loss": 0.46432819962501526, | |
| "mean_token_accuracy": 0.8585522174835205, | |
| "num_tokens": 5942595.0, | |
| "step": 697 | |
| }, | |
| { | |
| "epoch": 0.5303951367781155, | |
| "grad_norm": 2.6405293941497803, | |
| "learning_rate": 4.784714709788368e-06, | |
| "loss": 0.577790379524231, | |
| "mean_token_accuracy": 0.8005146980285645, | |
| "num_tokens": 5947159.0, | |
| "step": 698 | |
| }, | |
| { | |
| "epoch": 0.5311550151975684, | |
| "grad_norm": 1.6467243432998657, | |
| "learning_rate": 4.783863644106502e-06, | |
| "loss": 0.378523588180542, | |
| "mean_token_accuracy": 0.8665074706077576, | |
| "num_tokens": 5955503.0, | |
| "step": 699 | |
| }, | |
| { | |
| "epoch": 0.5319148936170213, | |
| "grad_norm": 1.6351271867752075, | |
| "learning_rate": 4.783010975521216e-06, | |
| "loss": 0.42142003774642944, | |
| "mean_token_accuracy": 0.8479681611061096, | |
| "num_tokens": 5965049.0, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.5326747720364742, | |
| "grad_norm": 1.679734706878662, | |
| "learning_rate": 4.782156704630944e-06, | |
| "loss": 0.42243868112564087, | |
| "mean_token_accuracy": 0.8485743999481201, | |
| "num_tokens": 5975572.0, | |
| "step": 701 | |
| }, | |
| { | |
| "epoch": 0.5334346504559271, | |
| "grad_norm": 1.7572410106658936, | |
| "learning_rate": 4.7813008320352475e-06, | |
| "loss": 0.3124234080314636, | |
| "mean_token_accuracy": 0.8955981135368347, | |
| "num_tokens": 5982331.0, | |
| "step": 702 | |
| }, | |
| { | |
| "epoch": 0.53419452887538, | |
| "grad_norm": 2.038093328475952, | |
| "learning_rate": 4.78044335833481e-06, | |
| "loss": 0.3492271602153778, | |
| "mean_token_accuracy": 0.8749583959579468, | |
| "num_tokens": 5988044.0, | |
| "step": 703 | |
| }, | |
| { | |
| "epoch": 0.5349544072948328, | |
| "grad_norm": 1.502953290939331, | |
| "learning_rate": 4.77958428413144e-06, | |
| "loss": 0.45608213543891907, | |
| "mean_token_accuracy": 0.8436861634254456, | |
| "num_tokens": 5999360.0, | |
| "step": 704 | |
| }, | |
| { | |
| "epoch": 0.5357142857142857, | |
| "grad_norm": 1.2919143438339233, | |
| "learning_rate": 4.7787236100280685e-06, | |
| "loss": 0.35781624913215637, | |
| "mean_token_accuracy": 0.857500433921814, | |
| "num_tokens": 6014319.0, | |
| "step": 705 | |
| }, | |
| { | |
| "epoch": 0.5364741641337386, | |
| "grad_norm": 1.4746363162994385, | |
| "learning_rate": 4.777861336628751e-06, | |
| "loss": 0.4553064703941345, | |
| "mean_token_accuracy": 0.8644794821739197, | |
| "num_tokens": 6032012.0, | |
| "step": 706 | |
| }, | |
| { | |
| "epoch": 0.5372340425531915, | |
| "grad_norm": 1.2029075622558594, | |
| "learning_rate": 4.7769974645386616e-06, | |
| "loss": 0.36050671339035034, | |
| "mean_token_accuracy": 0.8742889165878296, | |
| "num_tokens": 6053982.0, | |
| "step": 707 | |
| }, | |
| { | |
| "epoch": 0.5379939209726444, | |
| "grad_norm": 1.7229902744293213, | |
| "learning_rate": 4.776131994364102e-06, | |
| "loss": 0.4012393653392792, | |
| "mean_token_accuracy": 0.8494868278503418, | |
| "num_tokens": 6062608.0, | |
| "step": 708 | |
| }, | |
| { | |
| "epoch": 0.5387537993920972, | |
| "grad_norm": 1.6619466543197632, | |
| "learning_rate": 4.775264926712489e-06, | |
| "loss": 0.5714209675788879, | |
| "mean_token_accuracy": 0.8079773187637329, | |
| "num_tokens": 6074800.0, | |
| "step": 709 | |
| }, | |
| { | |
| "epoch": 0.5395136778115501, | |
| "grad_norm": 1.8739644289016724, | |
| "learning_rate": 4.774396262192368e-06, | |
| "loss": 0.5224334001541138, | |
| "mean_token_accuracy": 0.8226214647293091, | |
| "num_tokens": 6084801.0, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 0.540273556231003, | |
| "grad_norm": 1.7326252460479736, | |
| "learning_rate": 4.7735260014133986e-06, | |
| "loss": 0.4534417390823364, | |
| "mean_token_accuracy": 0.85335773229599, | |
| "num_tokens": 6095510.0, | |
| "step": 711 | |
| }, | |
| { | |
| "epoch": 0.541033434650456, | |
| "grad_norm": 1.5122230052947998, | |
| "learning_rate": 4.772654144986364e-06, | |
| "loss": 0.361116886138916, | |
| "mean_token_accuracy": 0.8660293817520142, | |
| "num_tokens": 6106354.0, | |
| "step": 712 | |
| }, | |
| { | |
| "epoch": 0.5417933130699089, | |
| "grad_norm": 2.6396396160125732, | |
| "learning_rate": 4.7717806935231665e-06, | |
| "loss": 0.3782613277435303, | |
| "mean_token_accuracy": 0.8628933429718018, | |
| "num_tokens": 6110425.0, | |
| "step": 713 | |
| }, | |
| { | |
| "epoch": 0.5425531914893617, | |
| "grad_norm": 1.488351583480835, | |
| "learning_rate": 4.770905647636828e-06, | |
| "loss": 0.5653847455978394, | |
| "mean_token_accuracy": 0.7892078161239624, | |
| "num_tokens": 6126792.0, | |
| "step": 714 | |
| }, | |
| { | |
| "epoch": 0.5433130699088146, | |
| "grad_norm": 2.223890781402588, | |
| "learning_rate": 4.77002900794149e-06, | |
| "loss": 0.5461349487304688, | |
| "mean_token_accuracy": 0.805544376373291, | |
| "num_tokens": 6134544.0, | |
| "step": 715 | |
| }, | |
| { | |
| "epoch": 0.5440729483282675, | |
| "grad_norm": 2.103342056274414, | |
| "learning_rate": 4.769150775052411e-06, | |
| "loss": 0.5122923254966736, | |
| "mean_token_accuracy": 0.8244349360466003, | |
| "num_tokens": 6141102.0, | |
| "step": 716 | |
| }, | |
| { | |
| "epoch": 0.5448328267477204, | |
| "grad_norm": 3.34637451171875, | |
| "learning_rate": 4.768270949585968e-06, | |
| "loss": 0.6029807329177856, | |
| "mean_token_accuracy": 0.7977179884910583, | |
| "num_tokens": 6145033.0, | |
| "step": 717 | |
| }, | |
| { | |
| "epoch": 0.5455927051671733, | |
| "grad_norm": 2.35310959815979, | |
| "learning_rate": 4.767389532159659e-06, | |
| "loss": 0.36193016171455383, | |
| "mean_token_accuracy": 0.8708676099777222, | |
| "num_tokens": 6149618.0, | |
| "step": 718 | |
| }, | |
| { | |
| "epoch": 0.5463525835866262, | |
| "grad_norm": 2.06655216217041, | |
| "learning_rate": 4.766506523392095e-06, | |
| "loss": 0.3700219690799713, | |
| "mean_token_accuracy": 0.8714411854743958, | |
| "num_tokens": 6155510.0, | |
| "step": 719 | |
| }, | |
| { | |
| "epoch": 0.547112462006079, | |
| "grad_norm": 1.1178799867630005, | |
| "learning_rate": 4.765621923903005e-06, | |
| "loss": 0.44793474674224854, | |
| "mean_token_accuracy": 0.8378269672393799, | |
| "num_tokens": 6178674.0, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 0.5478723404255319, | |
| "grad_norm": 3.2518749237060547, | |
| "learning_rate": 4.764735734313236e-06, | |
| "loss": 0.4045289158821106, | |
| "mean_token_accuracy": 0.8469523191452026, | |
| "num_tokens": 6183091.0, | |
| "step": 721 | |
| }, | |
| { | |
| "epoch": 0.5486322188449848, | |
| "grad_norm": 2.1426641941070557, | |
| "learning_rate": 4.763847955244749e-06, | |
| "loss": 0.5390356779098511, | |
| "mean_token_accuracy": 0.819678783416748, | |
| "num_tokens": 6190703.0, | |
| "step": 722 | |
| }, | |
| { | |
| "epoch": 0.5493920972644377, | |
| "grad_norm": 2.606250762939453, | |
| "learning_rate": 4.762958587320623e-06, | |
| "loss": 0.526504635810852, | |
| "mean_token_accuracy": 0.8235678672790527, | |
| "num_tokens": 6196950.0, | |
| "step": 723 | |
| }, | |
| { | |
| "epoch": 0.5501519756838906, | |
| "grad_norm": 1.9263027906417847, | |
| "learning_rate": 4.762067631165049e-06, | |
| "loss": 0.48520591855049133, | |
| "mean_token_accuracy": 0.8337453603744507, | |
| "num_tokens": 6205760.0, | |
| "step": 724 | |
| }, | |
| { | |
| "epoch": 0.5509118541033434, | |
| "grad_norm": 4.06183385848999, | |
| "learning_rate": 4.761175087403336e-06, | |
| "loss": 0.4972907602787018, | |
| "mean_token_accuracy": 0.8399640917778015, | |
| "num_tokens": 6208851.0, | |
| "step": 725 | |
| }, | |
| { | |
| "epoch": 0.5516717325227963, | |
| "grad_norm": 2.033979654312134, | |
| "learning_rate": 4.760280956661904e-06, | |
| "loss": 0.43614792823791504, | |
| "mean_token_accuracy": 0.837505578994751, | |
| "num_tokens": 6216024.0, | |
| "step": 726 | |
| }, | |
| { | |
| "epoch": 0.5524316109422492, | |
| "grad_norm": 2.0920820236206055, | |
| "learning_rate": 4.75938523956829e-06, | |
| "loss": 0.45022889971733093, | |
| "mean_token_accuracy": 0.8302348256111145, | |
| "num_tokens": 6223527.0, | |
| "step": 727 | |
| }, | |
| { | |
| "epoch": 0.5531914893617021, | |
| "grad_norm": 1.4943269491195679, | |
| "learning_rate": 4.75848793675114e-06, | |
| "loss": 0.4882992208003998, | |
| "mean_token_accuracy": 0.8428292274475098, | |
| "num_tokens": 6240470.0, | |
| "step": 728 | |
| }, | |
| { | |
| "epoch": 0.5539513677811551, | |
| "grad_norm": 2.4450228214263916, | |
| "learning_rate": 4.757589048840219e-06, | |
| "loss": 0.36317628622055054, | |
| "mean_token_accuracy": 0.8800690174102783, | |
| "num_tokens": 6244432.0, | |
| "step": 729 | |
| }, | |
| { | |
| "epoch": 0.5547112462006079, | |
| "grad_norm": 2.691455841064453, | |
| "learning_rate": 4.756688576466398e-06, | |
| "loss": 0.4764796495437622, | |
| "mean_token_accuracy": 0.8537558913230896, | |
| "num_tokens": 6248845.0, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 0.5554711246200608, | |
| "grad_norm": 1.594687819480896, | |
| "learning_rate": 4.755786520261666e-06, | |
| "loss": 0.45443981885910034, | |
| "mean_token_accuracy": 0.8340262174606323, | |
| "num_tokens": 6261384.0, | |
| "step": 731 | |
| }, | |
| { | |
| "epoch": 0.5562310030395137, | |
| "grad_norm": 1.464937686920166, | |
| "learning_rate": 4.75488288085912e-06, | |
| "loss": 0.37749332189559937, | |
| "mean_token_accuracy": 0.862264096736908, | |
| "num_tokens": 6272961.0, | |
| "step": 732 | |
| }, | |
| { | |
| "epoch": 0.5569908814589666, | |
| "grad_norm": 2.8826513290405273, | |
| "learning_rate": 4.753977658892967e-06, | |
| "loss": 0.4915273189544678, | |
| "mean_token_accuracy": 0.8192765712738037, | |
| "num_tokens": 6277003.0, | |
| "step": 733 | |
| }, | |
| { | |
| "epoch": 0.5577507598784195, | |
| "grad_norm": 1.9049111604690552, | |
| "learning_rate": 4.753070854998529e-06, | |
| "loss": 0.4396716356277466, | |
| "mean_token_accuracy": 0.8403393030166626, | |
| "num_tokens": 6284281.0, | |
| "step": 734 | |
| }, | |
| { | |
| "epoch": 0.5585106382978723, | |
| "grad_norm": 2.1077311038970947, | |
| "learning_rate": 4.752162469812234e-06, | |
| "loss": 0.4790551960468292, | |
| "mean_token_accuracy": 0.8353488445281982, | |
| "num_tokens": 6291830.0, | |
| "step": 735 | |
| }, | |
| { | |
| "epoch": 0.5592705167173252, | |
| "grad_norm": 1.2300493717193604, | |
| "learning_rate": 4.751252503971624e-06, | |
| "loss": 0.3813965916633606, | |
| "mean_token_accuracy": 0.8325374126434326, | |
| "num_tokens": 6308423.0, | |
| "step": 736 | |
| }, | |
| { | |
| "epoch": 0.5600303951367781, | |
| "grad_norm": 1.8665045499801636, | |
| "learning_rate": 4.750340958115346e-06, | |
| "loss": 0.5871419310569763, | |
| "mean_token_accuracy": 0.8048850893974304, | |
| "num_tokens": 6320081.0, | |
| "step": 737 | |
| }, | |
| { | |
| "epoch": 0.560790273556231, | |
| "grad_norm": 1.7575510740280151, | |
| "learning_rate": 4.749427832883158e-06, | |
| "loss": 0.4706610441207886, | |
| "mean_token_accuracy": 0.8373581171035767, | |
| "num_tokens": 6330330.0, | |
| "step": 738 | |
| }, | |
| { | |
| "epoch": 0.5615501519756839, | |
| "grad_norm": 2.0865676403045654, | |
| "learning_rate": 4.748513128915928e-06, | |
| "loss": 0.4866131544113159, | |
| "mean_token_accuracy": 0.8118366003036499, | |
| "num_tokens": 6337844.0, | |
| "step": 739 | |
| }, | |
| { | |
| "epoch": 0.5623100303951368, | |
| "grad_norm": 2.1978392601013184, | |
| "learning_rate": 4.747596846855629e-06, | |
| "loss": 0.4845651686191559, | |
| "mean_token_accuracy": 0.8276708126068115, | |
| "num_tokens": 6344039.0, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 0.5630699088145896, | |
| "grad_norm": 1.6874626874923706, | |
| "learning_rate": 4.7466789873453446e-06, | |
| "loss": 0.41359972953796387, | |
| "mean_token_accuracy": 0.8586591482162476, | |
| "num_tokens": 6355767.0, | |
| "step": 741 | |
| }, | |
| { | |
| "epoch": 0.5638297872340425, | |
| "grad_norm": 1.4732517004013062, | |
| "learning_rate": 4.7457595510292615e-06, | |
| "loss": 0.5254936218261719, | |
| "mean_token_accuracy": 0.8229131698608398, | |
| "num_tokens": 6369705.0, | |
| "step": 742 | |
| }, | |
| { | |
| "epoch": 0.5645896656534954, | |
| "grad_norm": 1.4983631372451782, | |
| "learning_rate": 4.744838538552678e-06, | |
| "loss": 0.4143417477607727, | |
| "mean_token_accuracy": 0.8397407531738281, | |
| "num_tokens": 6382034.0, | |
| "step": 743 | |
| }, | |
| { | |
| "epoch": 0.5653495440729484, | |
| "grad_norm": 3.680663824081421, | |
| "learning_rate": 4.7439159505619946e-06, | |
| "loss": 0.397993266582489, | |
| "mean_token_accuracy": 0.8769915699958801, | |
| "num_tokens": 6384625.0, | |
| "step": 744 | |
| }, | |
| { | |
| "epoch": 0.5661094224924013, | |
| "grad_norm": 2.1235272884368896, | |
| "learning_rate": 4.74299178770472e-06, | |
| "loss": 0.5371411442756653, | |
| "mean_token_accuracy": 0.8123522400856018, | |
| "num_tokens": 6392977.0, | |
| "step": 745 | |
| }, | |
| { | |
| "epoch": 0.5668693009118541, | |
| "grad_norm": 4.376061916351318, | |
| "learning_rate": 4.742066050629465e-06, | |
| "loss": 0.5314540863037109, | |
| "mean_token_accuracy": 0.8173398375511169, | |
| "num_tokens": 6398374.0, | |
| "step": 746 | |
| }, | |
| { | |
| "epoch": 0.567629179331307, | |
| "grad_norm": 1.3401854038238525, | |
| "learning_rate": 4.741138739985951e-06, | |
| "loss": 0.37147653102874756, | |
| "mean_token_accuracy": 0.8691932559013367, | |
| "num_tokens": 6409794.0, | |
| "step": 747 | |
| }, | |
| { | |
| "epoch": 0.5683890577507599, | |
| "grad_norm": 1.9703563451766968, | |
| "learning_rate": 4.740209856424998e-06, | |
| "loss": 0.5025161504745483, | |
| "mean_token_accuracy": 0.8211580514907837, | |
| "num_tokens": 6424118.0, | |
| "step": 748 | |
| }, | |
| { | |
| "epoch": 0.5691489361702128, | |
| "grad_norm": 1.3038517236709595, | |
| "learning_rate": 4.7392794005985324e-06, | |
| "loss": 0.3963512182235718, | |
| "mean_token_accuracy": 0.8580185174942017, | |
| "num_tokens": 6440944.0, | |
| "step": 749 | |
| }, | |
| { | |
| "epoch": 0.5699088145896657, | |
| "grad_norm": 1.4175636768341064, | |
| "learning_rate": 4.738347373159585e-06, | |
| "loss": 0.5285158157348633, | |
| "mean_token_accuracy": 0.8222070336341858, | |
| "num_tokens": 6456143.0, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 0.5706686930091185, | |
| "grad_norm": 2.14648175239563, | |
| "learning_rate": 4.737413774762287e-06, | |
| "loss": 0.3865134119987488, | |
| "mean_token_accuracy": 0.8414736986160278, | |
| "num_tokens": 6461740.0, | |
| "step": 751 | |
| }, | |
| { | |
| "epoch": 0.5714285714285714, | |
| "grad_norm": 1.5074299573898315, | |
| "learning_rate": 4.736478606061876e-06, | |
| "loss": 0.43085965514183044, | |
| "mean_token_accuracy": 0.8482084274291992, | |
| "num_tokens": 6473155.0, | |
| "step": 752 | |
| }, | |
| { | |
| "epoch": 0.5721884498480243, | |
| "grad_norm": 3.027317523956299, | |
| "learning_rate": 4.735541867714687e-06, | |
| "loss": 0.3914988934993744, | |
| "mean_token_accuracy": 0.8739659786224365, | |
| "num_tokens": 6476832.0, | |
| "step": 753 | |
| }, | |
| { | |
| "epoch": 0.5729483282674772, | |
| "grad_norm": 2.3828823566436768, | |
| "learning_rate": 4.73460356037816e-06, | |
| "loss": 0.6306310892105103, | |
| "mean_token_accuracy": 0.7872239351272583, | |
| "num_tokens": 6483810.0, | |
| "step": 754 | |
| }, | |
| { | |
| "epoch": 0.5737082066869301, | |
| "grad_norm": 2.0841176509857178, | |
| "learning_rate": 4.733663684710835e-06, | |
| "loss": 0.5214060544967651, | |
| "mean_token_accuracy": 0.8292238712310791, | |
| "num_tokens": 6491510.0, | |
| "step": 755 | |
| }, | |
| { | |
| "epoch": 0.574468085106383, | |
| "grad_norm": 1.90464186668396, | |
| "learning_rate": 4.732722241372354e-06, | |
| "loss": 0.613938570022583, | |
| "mean_token_accuracy": 0.8029822111129761, | |
| "num_tokens": 6502176.0, | |
| "step": 756 | |
| }, | |
| { | |
| "epoch": 0.5752279635258358, | |
| "grad_norm": 1.47279953956604, | |
| "learning_rate": 4.731779231023456e-06, | |
| "loss": 0.5272638201713562, | |
| "mean_token_accuracy": 0.8153349757194519, | |
| "num_tokens": 6520491.0, | |
| "step": 757 | |
| }, | |
| { | |
| "epoch": 0.5759878419452887, | |
| "grad_norm": 2.267606258392334, | |
| "learning_rate": 4.730834654325984e-06, | |
| "loss": 0.4156489670276642, | |
| "mean_token_accuracy": 0.8580739498138428, | |
| "num_tokens": 6526044.0, | |
| "step": 758 | |
| }, | |
| { | |
| "epoch": 0.5767477203647416, | |
| "grad_norm": 2.421715021133423, | |
| "learning_rate": 4.729888511942877e-06, | |
| "loss": 0.48345330357551575, | |
| "mean_token_accuracy": 0.825847327709198, | |
| "num_tokens": 6531553.0, | |
| "step": 759 | |
| }, | |
| { | |
| "epoch": 0.5775075987841946, | |
| "grad_norm": 1.7170379161834717, | |
| "learning_rate": 4.728940804538176e-06, | |
| "loss": 0.5726071000099182, | |
| "mean_token_accuracy": 0.8010392189025879, | |
| "num_tokens": 6542336.0, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 0.5782674772036475, | |
| "grad_norm": 1.1816002130508423, | |
| "learning_rate": 4.727991532777016e-06, | |
| "loss": 0.3512741029262543, | |
| "mean_token_accuracy": 0.8428674936294556, | |
| "num_tokens": 6557847.0, | |
| "step": 761 | |
| }, | |
| { | |
| "epoch": 0.5790273556231003, | |
| "grad_norm": 1.617094874382019, | |
| "learning_rate": 4.727040697325634e-06, | |
| "loss": 0.5462164878845215, | |
| "mean_token_accuracy": 0.8163310289382935, | |
| "num_tokens": 6571545.0, | |
| "step": 762 | |
| }, | |
| { | |
| "epoch": 0.5797872340425532, | |
| "grad_norm": 2.4354491233825684, | |
| "learning_rate": 4.726088298851362e-06, | |
| "loss": 0.43055254220962524, | |
| "mean_token_accuracy": 0.8534346222877502, | |
| "num_tokens": 6576328.0, | |
| "step": 763 | |
| }, | |
| { | |
| "epoch": 0.5805471124620061, | |
| "grad_norm": 2.1651365756988525, | |
| "learning_rate": 4.725134338022631e-06, | |
| "loss": 0.5344648361206055, | |
| "mean_token_accuracy": 0.8242179155349731, | |
| "num_tokens": 6582717.0, | |
| "step": 764 | |
| }, | |
| { | |
| "epoch": 0.581306990881459, | |
| "grad_norm": 1.4754136800765991, | |
| "learning_rate": 4.724178815508967e-06, | |
| "loss": 0.3312758505344391, | |
| "mean_token_accuracy": 0.8697600364685059, | |
| "num_tokens": 6592119.0, | |
| "step": 765 | |
| }, | |
| { | |
| "epoch": 0.5820668693009119, | |
| "grad_norm": 2.2281758785247803, | |
| "learning_rate": 4.723221731980993e-06, | |
| "loss": 0.3964259624481201, | |
| "mean_token_accuracy": 0.8576456308364868, | |
| "num_tokens": 6596928.0, | |
| "step": 766 | |
| }, | |
| { | |
| "epoch": 0.5828267477203647, | |
| "grad_norm": 2.6792006492614746, | |
| "learning_rate": 4.722263088110426e-06, | |
| "loss": 0.42256325483322144, | |
| "mean_token_accuracy": 0.855099618434906, | |
| "num_tokens": 6600901.0, | |
| "step": 767 | |
| }, | |
| { | |
| "epoch": 0.5835866261398176, | |
| "grad_norm": 2.104128837585449, | |
| "learning_rate": 4.721302884570079e-06, | |
| "loss": 0.49544280767440796, | |
| "mean_token_accuracy": 0.8161357641220093, | |
| "num_tokens": 6607808.0, | |
| "step": 768 | |
| }, | |
| { | |
| "epoch": 0.5843465045592705, | |
| "grad_norm": 3.050861358642578, | |
| "learning_rate": 4.720341122033862e-06, | |
| "loss": 0.4813803732395172, | |
| "mean_token_accuracy": 0.858250617980957, | |
| "num_tokens": 6613518.0, | |
| "step": 769 | |
| }, | |
| { | |
| "epoch": 0.5851063829787234, | |
| "grad_norm": 2.0112383365631104, | |
| "learning_rate": 4.719377801176774e-06, | |
| "loss": 0.5249748826026917, | |
| "mean_token_accuracy": 0.8174425363540649, | |
| "num_tokens": 6621703.0, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 0.5858662613981763, | |
| "grad_norm": 1.5677523612976074, | |
| "learning_rate": 4.718412922674913e-06, | |
| "loss": 0.41387608647346497, | |
| "mean_token_accuracy": 0.8520937561988831, | |
| "num_tokens": 6631490.0, | |
| "step": 771 | |
| }, | |
| { | |
| "epoch": 0.5866261398176292, | |
| "grad_norm": 1.5996551513671875, | |
| "learning_rate": 4.717446487205466e-06, | |
| "loss": 0.41482317447662354, | |
| "mean_token_accuracy": 0.8514248132705688, | |
| "num_tokens": 6644765.0, | |
| "step": 772 | |
| }, | |
| { | |
| "epoch": 0.587386018237082, | |
| "grad_norm": 1.649708867073059, | |
| "learning_rate": 4.716478495446717e-06, | |
| "loss": 0.5045386552810669, | |
| "mean_token_accuracy": 0.8259719610214233, | |
| "num_tokens": 6661065.0, | |
| "step": 773 | |
| }, | |
| { | |
| "epoch": 0.5881458966565349, | |
| "grad_norm": 2.277923345565796, | |
| "learning_rate": 4.715508948078037e-06, | |
| "loss": 0.4413490891456604, | |
| "mean_token_accuracy": 0.8452578186988831, | |
| "num_tokens": 6667280.0, | |
| "step": 774 | |
| }, | |
| { | |
| "epoch": 0.5889057750759878, | |
| "grad_norm": 1.5648988485336304, | |
| "learning_rate": 4.714537845779894e-06, | |
| "loss": 0.3617437779903412, | |
| "mean_token_accuracy": 0.8841532468795776, | |
| "num_tokens": 6677761.0, | |
| "step": 775 | |
| }, | |
| { | |
| "epoch": 0.5896656534954408, | |
| "grad_norm": 2.465161085128784, | |
| "learning_rate": 4.7135651892338445e-06, | |
| "loss": 0.49160271883010864, | |
| "mean_token_accuracy": 0.8205541372299194, | |
| "num_tokens": 6686636.0, | |
| "step": 776 | |
| }, | |
| { | |
| "epoch": 0.5904255319148937, | |
| "grad_norm": 1.30703604221344, | |
| "learning_rate": 4.712590979122534e-06, | |
| "loss": 0.3521465063095093, | |
| "mean_token_accuracy": 0.8771336078643799, | |
| "num_tokens": 6701018.0, | |
| "step": 777 | |
| }, | |
| { | |
| "epoch": 0.5911854103343465, | |
| "grad_norm": 1.6883575916290283, | |
| "learning_rate": 4.7116152161297045e-06, | |
| "loss": 0.47732865810394287, | |
| "mean_token_accuracy": 0.8221392631530762, | |
| "num_tokens": 6710766.0, | |
| "step": 778 | |
| }, | |
| { | |
| "epoch": 0.5919452887537994, | |
| "grad_norm": 1.2618685960769653, | |
| "learning_rate": 4.710637900940181e-06, | |
| "loss": 0.39039328694343567, | |
| "mean_token_accuracy": 0.8336860537528992, | |
| "num_tokens": 6727218.0, | |
| "step": 779 | |
| }, | |
| { | |
| "epoch": 0.5927051671732523, | |
| "grad_norm": 2.3619794845581055, | |
| "learning_rate": 4.7096590342398825e-06, | |
| "loss": 0.425787091255188, | |
| "mean_token_accuracy": 0.8544785976409912, | |
| "num_tokens": 6732694.0, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 0.5934650455927052, | |
| "grad_norm": 1.4645812511444092, | |
| "learning_rate": 4.708678616715815e-06, | |
| "loss": 0.4684664309024811, | |
| "mean_token_accuracy": 0.8648834228515625, | |
| "num_tokens": 6750728.0, | |
| "step": 781 | |
| }, | |
| { | |
| "epoch": 0.5942249240121581, | |
| "grad_norm": 3.441596269607544, | |
| "learning_rate": 4.707696649056073e-06, | |
| "loss": 0.4963645935058594, | |
| "mean_token_accuracy": 0.8329465389251709, | |
| "num_tokens": 6753618.0, | |
| "step": 782 | |
| }, | |
| { | |
| "epoch": 0.5949848024316109, | |
| "grad_norm": 1.2299189567565918, | |
| "learning_rate": 4.706713131949839e-06, | |
| "loss": 0.36298274993896484, | |
| "mean_token_accuracy": 0.8557830452919006, | |
| "num_tokens": 6771659.0, | |
| "step": 783 | |
| }, | |
| { | |
| "epoch": 0.5957446808510638, | |
| "grad_norm": 1.6346250772476196, | |
| "learning_rate": 4.705728066087384e-06, | |
| "loss": 0.409068763256073, | |
| "mean_token_accuracy": 0.8497388362884521, | |
| "num_tokens": 6783168.0, | |
| "step": 784 | |
| }, | |
| { | |
| "epoch": 0.5965045592705167, | |
| "grad_norm": 2.380134344100952, | |
| "learning_rate": 4.704741452160064e-06, | |
| "loss": 0.49994900822639465, | |
| "mean_token_accuracy": 0.8468945026397705, | |
| "num_tokens": 6789224.0, | |
| "step": 785 | |
| }, | |
| { | |
| "epoch": 0.5972644376899696, | |
| "grad_norm": 2.102268695831299, | |
| "learning_rate": 4.703753290860323e-06, | |
| "loss": 0.45378783345222473, | |
| "mean_token_accuracy": 0.8378987312316895, | |
| "num_tokens": 6794990.0, | |
| "step": 786 | |
| }, | |
| { | |
| "epoch": 0.5980243161094225, | |
| "grad_norm": 1.8500839471817017, | |
| "learning_rate": 4.702763582881692e-06, | |
| "loss": 0.5022721886634827, | |
| "mean_token_accuracy": 0.8514347076416016, | |
| "num_tokens": 6803051.0, | |
| "step": 787 | |
| }, | |
| { | |
| "epoch": 0.5987841945288754, | |
| "grad_norm": 1.4423117637634277, | |
| "learning_rate": 4.701772328918784e-06, | |
| "loss": 0.40599411725997925, | |
| "mean_token_accuracy": 0.8476099967956543, | |
| "num_tokens": 6815345.0, | |
| "step": 788 | |
| }, | |
| { | |
| "epoch": 0.5995440729483282, | |
| "grad_norm": 2.578526735305786, | |
| "learning_rate": 4.700779529667301e-06, | |
| "loss": 0.48335227370262146, | |
| "mean_token_accuracy": 0.8485510945320129, | |
| "num_tokens": 6820018.0, | |
| "step": 789 | |
| }, | |
| { | |
| "epoch": 0.6003039513677811, | |
| "grad_norm": 1.7311351299285889, | |
| "learning_rate": 4.699785185824026e-06, | |
| "loss": 0.5113512277603149, | |
| "mean_token_accuracy": 0.8215739130973816, | |
| "num_tokens": 6830820.0, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 0.601063829787234, | |
| "grad_norm": 1.6932601928710938, | |
| "learning_rate": 4.69878929808683e-06, | |
| "loss": 0.43179547786712646, | |
| "mean_token_accuracy": 0.8416491746902466, | |
| "num_tokens": 6840564.0, | |
| "step": 791 | |
| }, | |
| { | |
| "epoch": 0.601823708206687, | |
| "grad_norm": 1.9616183042526245, | |
| "learning_rate": 4.6977918671546635e-06, | |
| "loss": 0.5646926760673523, | |
| "mean_token_accuracy": 0.803817629814148, | |
| "num_tokens": 6848603.0, | |
| "step": 792 | |
| }, | |
| { | |
| "epoch": 0.6025835866261399, | |
| "grad_norm": 1.9959920644760132, | |
| "learning_rate": 4.696792893727562e-06, | |
| "loss": 0.3411133587360382, | |
| "mean_token_accuracy": 0.8736921548843384, | |
| "num_tokens": 6854574.0, | |
| "step": 793 | |
| }, | |
| { | |
| "epoch": 0.6033434650455927, | |
| "grad_norm": 2.021311044692993, | |
| "learning_rate": 4.695792378506645e-06, | |
| "loss": 0.3923317790031433, | |
| "mean_token_accuracy": 0.8675966262817383, | |
| "num_tokens": 6861597.0, | |
| "step": 794 | |
| }, | |
| { | |
| "epoch": 0.6041033434650456, | |
| "grad_norm": 3.0559072494506836, | |
| "learning_rate": 4.694790322194111e-06, | |
| "loss": 0.5951039791107178, | |
| "mean_token_accuracy": 0.7780908346176147, | |
| "num_tokens": 6866525.0, | |
| "step": 795 | |
| }, | |
| { | |
| "epoch": 0.6048632218844985, | |
| "grad_norm": 2.475194215774536, | |
| "learning_rate": 4.693786725493242e-06, | |
| "loss": 0.45655012130737305, | |
| "mean_token_accuracy": 0.8477139472961426, | |
| "num_tokens": 6872470.0, | |
| "step": 796 | |
| }, | |
| { | |
| "epoch": 0.6056231003039514, | |
| "grad_norm": 1.5484919548034668, | |
| "learning_rate": 4.692781589108402e-06, | |
| "loss": 0.404802143573761, | |
| "mean_token_accuracy": 0.8471429347991943, | |
| "num_tokens": 6882431.0, | |
| "step": 797 | |
| }, | |
| { | |
| "epoch": 0.6063829787234043, | |
| "grad_norm": 2.4598426818847656, | |
| "learning_rate": 4.691774913745033e-06, | |
| "loss": 0.42373165488243103, | |
| "mean_token_accuracy": 0.865800142288208, | |
| "num_tokens": 6888356.0, | |
| "step": 798 | |
| }, | |
| { | |
| "epoch": 0.6071428571428571, | |
| "grad_norm": 2.0124335289001465, | |
| "learning_rate": 4.690766700109659e-06, | |
| "loss": 0.35996782779693604, | |
| "mean_token_accuracy": 0.8779679536819458, | |
| "num_tokens": 6894205.0, | |
| "step": 799 | |
| }, | |
| { | |
| "epoch": 0.60790273556231, | |
| "grad_norm": 1.958358645439148, | |
| "learning_rate": 4.689756948909884e-06, | |
| "loss": 0.5029857158660889, | |
| "mean_token_accuracy": 0.8120522499084473, | |
| "num_tokens": 6902740.0, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.6086626139817629, | |
| "grad_norm": 2.249605417251587, | |
| "learning_rate": 4.688745660854388e-06, | |
| "loss": 0.5539613962173462, | |
| "mean_token_accuracy": 0.8212680816650391, | |
| "num_tokens": 6916616.0, | |
| "step": 801 | |
| }, | |
| { | |
| "epoch": 0.6094224924012158, | |
| "grad_norm": 2.2435097694396973, | |
| "learning_rate": 4.687732836652935e-06, | |
| "loss": 0.49224361777305603, | |
| "mean_token_accuracy": 0.8426158428192139, | |
| "num_tokens": 6922671.0, | |
| "step": 802 | |
| }, | |
| { | |
| "epoch": 0.6101823708206687, | |
| "grad_norm": 1.9101262092590332, | |
| "learning_rate": 4.686718477016361e-06, | |
| "loss": 0.46180349588394165, | |
| "mean_token_accuracy": 0.8320032358169556, | |
| "num_tokens": 6930135.0, | |
| "step": 803 | |
| }, | |
| { | |
| "epoch": 0.6109422492401215, | |
| "grad_norm": 2.6720311641693115, | |
| "learning_rate": 4.6857025826565845e-06, | |
| "loss": 0.520714282989502, | |
| "mean_token_accuracy": 0.8307292461395264, | |
| "num_tokens": 6934983.0, | |
| "step": 804 | |
| }, | |
| { | |
| "epoch": 0.6117021276595744, | |
| "grad_norm": 2.0964162349700928, | |
| "learning_rate": 4.684685154286599e-06, | |
| "loss": 0.4943925440311432, | |
| "mean_token_accuracy": 0.8484556674957275, | |
| "num_tokens": 6940735.0, | |
| "step": 805 | |
| }, | |
| { | |
| "epoch": 0.6124620060790273, | |
| "grad_norm": 2.457338571548462, | |
| "learning_rate": 4.683666192620474e-06, | |
| "loss": 0.5134413242340088, | |
| "mean_token_accuracy": 0.8112465739250183, | |
| "num_tokens": 6946038.0, | |
| "step": 806 | |
| }, | |
| { | |
| "epoch": 0.6132218844984803, | |
| "grad_norm": 2.413060188293457, | |
| "learning_rate": 4.682645698373357e-06, | |
| "loss": 0.5169984102249146, | |
| "mean_token_accuracy": 0.8218802213668823, | |
| "num_tokens": 6952279.0, | |
| "step": 807 | |
| }, | |
| { | |
| "epoch": 0.6139817629179332, | |
| "grad_norm": 1.692279577255249, | |
| "learning_rate": 4.6816236722614694e-06, | |
| "loss": 0.5951623916625977, | |
| "mean_token_accuracy": 0.7930692434310913, | |
| "num_tokens": 6963872.0, | |
| "step": 808 | |
| }, | |
| { | |
| "epoch": 0.6147416413373861, | |
| "grad_norm": 1.7568124532699585, | |
| "learning_rate": 4.680600115002109e-06, | |
| "loss": 0.47554802894592285, | |
| "mean_token_accuracy": 0.8246628046035767, | |
| "num_tokens": 6974580.0, | |
| "step": 809 | |
| }, | |
| { | |
| "epoch": 0.6155015197568389, | |
| "grad_norm": 2.079179525375366, | |
| "learning_rate": 4.679575027313649e-06, | |
| "loss": 0.48674464225769043, | |
| "mean_token_accuracy": 0.8292477130889893, | |
| "num_tokens": 6981685.0, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 0.6162613981762918, | |
| "grad_norm": 2.254335641860962, | |
| "learning_rate": 4.6785484099155324e-06, | |
| "loss": 0.49801093339920044, | |
| "mean_token_accuracy": 0.8244355320930481, | |
| "num_tokens": 6987356.0, | |
| "step": 811 | |
| }, | |
| { | |
| "epoch": 0.6170212765957447, | |
| "grad_norm": 1.6479203701019287, | |
| "learning_rate": 4.67752026352828e-06, | |
| "loss": 0.39767441153526306, | |
| "mean_token_accuracy": 0.8694275617599487, | |
| "num_tokens": 6996099.0, | |
| "step": 812 | |
| }, | |
| { | |
| "epoch": 0.6177811550151976, | |
| "grad_norm": 2.450373649597168, | |
| "learning_rate": 4.676490588873486e-06, | |
| "loss": 0.48279088735580444, | |
| "mean_token_accuracy": 0.8324604034423828, | |
| "num_tokens": 7001601.0, | |
| "step": 813 | |
| }, | |
| { | |
| "epoch": 0.6185410334346505, | |
| "grad_norm": 1.4855589866638184, | |
| "learning_rate": 4.675459386673815e-06, | |
| "loss": 0.3707486391067505, | |
| "mean_token_accuracy": 0.864487886428833, | |
| "num_tokens": 7013474.0, | |
| "step": 814 | |
| }, | |
| { | |
| "epoch": 0.6193009118541033, | |
| "grad_norm": 2.662778615951538, | |
| "learning_rate": 4.674426657653003e-06, | |
| "loss": 0.5028292536735535, | |
| "mean_token_accuracy": 0.8172339797019958, | |
| "num_tokens": 7018471.0, | |
| "step": 815 | |
| }, | |
| { | |
| "epoch": 0.6200607902735562, | |
| "grad_norm": 1.637942910194397, | |
| "learning_rate": 4.67339240253586e-06, | |
| "loss": 0.6092144250869751, | |
| "mean_token_accuracy": 0.7874794006347656, | |
| "num_tokens": 7033095.0, | |
| "step": 816 | |
| }, | |
| { | |
| "epoch": 0.6208206686930091, | |
| "grad_norm": 2.1327226161956787, | |
| "learning_rate": 4.672356622048266e-06, | |
| "loss": 0.49753472208976746, | |
| "mean_token_accuracy": 0.813976526260376, | |
| "num_tokens": 7039967.0, | |
| "step": 817 | |
| }, | |
| { | |
| "epoch": 0.621580547112462, | |
| "grad_norm": 1.6137940883636475, | |
| "learning_rate": 4.671319316917172e-06, | |
| "loss": 0.430887371301651, | |
| "mean_token_accuracy": 0.8568265438079834, | |
| "num_tokens": 7050297.0, | |
| "step": 818 | |
| }, | |
| { | |
| "epoch": 0.6223404255319149, | |
| "grad_norm": 2.464230537414551, | |
| "learning_rate": 4.670280487870599e-06, | |
| "loss": 0.548923134803772, | |
| "mean_token_accuracy": 0.8170759677886963, | |
| "num_tokens": 7055652.0, | |
| "step": 819 | |
| }, | |
| { | |
| "epoch": 0.6231003039513677, | |
| "grad_norm": 2.051084041595459, | |
| "learning_rate": 4.669240135637635e-06, | |
| "loss": 0.4992865324020386, | |
| "mean_token_accuracy": 0.8311952948570251, | |
| "num_tokens": 7061622.0, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 0.6238601823708206, | |
| "grad_norm": 2.2122182846069336, | |
| "learning_rate": 4.668198260948442e-06, | |
| "loss": 0.5934240818023682, | |
| "mean_token_accuracy": 0.7990015149116516, | |
| "num_tokens": 7069896.0, | |
| "step": 821 | |
| }, | |
| { | |
| "epoch": 0.6246200607902735, | |
| "grad_norm": 1.972354531288147, | |
| "learning_rate": 4.667154864534245e-06, | |
| "loss": 0.5896181464195251, | |
| "mean_token_accuracy": 0.7959399819374084, | |
| "num_tokens": 7079704.0, | |
| "step": 822 | |
| }, | |
| { | |
| "epoch": 0.6253799392097265, | |
| "grad_norm": 2.049687385559082, | |
| "learning_rate": 4.666109947127343e-06, | |
| "loss": 0.385894775390625, | |
| "mean_token_accuracy": 0.8725259900093079, | |
| "num_tokens": 7085692.0, | |
| "step": 823 | |
| }, | |
| { | |
| "epoch": 0.6261398176291794, | |
| "grad_norm": 2.5467441082000732, | |
| "learning_rate": 4.665063509461098e-06, | |
| "loss": 0.5672459006309509, | |
| "mean_token_accuracy": 0.7989083528518677, | |
| "num_tokens": 7091340.0, | |
| "step": 824 | |
| }, | |
| { | |
| "epoch": 0.6268996960486323, | |
| "grad_norm": 2.4866766929626465, | |
| "learning_rate": 4.664015552269938e-06, | |
| "loss": 0.5028612613677979, | |
| "mean_token_accuracy": 0.8485326766967773, | |
| "num_tokens": 7097836.0, | |
| "step": 825 | |
| }, | |
| { | |
| "epoch": 0.6276595744680851, | |
| "grad_norm": 2.9302313327789307, | |
| "learning_rate": 4.662966076289363e-06, | |
| "loss": 0.42695996165275574, | |
| "mean_token_accuracy": 0.8490408658981323, | |
| "num_tokens": 7101593.0, | |
| "step": 826 | |
| }, | |
| { | |
| "epoch": 0.628419452887538, | |
| "grad_norm": 1.5770741701126099, | |
| "learning_rate": 4.661915082255932e-06, | |
| "loss": 0.4660522937774658, | |
| "mean_token_accuracy": 0.8398847579956055, | |
| "num_tokens": 7113852.0, | |
| "step": 827 | |
| }, | |
| { | |
| "epoch": 0.6291793313069909, | |
| "grad_norm": 1.4753056764602661, | |
| "learning_rate": 4.6608625709072766e-06, | |
| "loss": 0.45473548769950867, | |
| "mean_token_accuracy": 0.8189343214035034, | |
| "num_tokens": 7126685.0, | |
| "step": 828 | |
| }, | |
| { | |
| "epoch": 0.6299392097264438, | |
| "grad_norm": 2.210510015487671, | |
| "learning_rate": 4.659808542982089e-06, | |
| "loss": 0.447899729013443, | |
| "mean_token_accuracy": 0.8310917615890503, | |
| "num_tokens": 7132681.0, | |
| "step": 829 | |
| }, | |
| { | |
| "epoch": 0.6306990881458967, | |
| "grad_norm": 2.2032580375671387, | |
| "learning_rate": 4.658752999220125e-06, | |
| "loss": 0.3581208288669586, | |
| "mean_token_accuracy": 0.8732799291610718, | |
| "num_tokens": 7137537.0, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 0.6314589665653495, | |
| "grad_norm": 2.3107855319976807, | |
| "learning_rate": 4.657695940362207e-06, | |
| "loss": 0.4941805601119995, | |
| "mean_token_accuracy": 0.8240172266960144, | |
| "num_tokens": 7142803.0, | |
| "step": 831 | |
| }, | |
| { | |
| "epoch": 0.6322188449848024, | |
| "grad_norm": 1.485609531402588, | |
| "learning_rate": 4.65663736715022e-06, | |
| "loss": 0.5009396076202393, | |
| "mean_token_accuracy": 0.8271695375442505, | |
| "num_tokens": 7157184.0, | |
| "step": 832 | |
| }, | |
| { | |
| "epoch": 0.6329787234042553, | |
| "grad_norm": 3.340023994445801, | |
| "learning_rate": 4.65557728032711e-06, | |
| "loss": 0.6149337291717529, | |
| "mean_token_accuracy": 0.7965115308761597, | |
| "num_tokens": 7161281.0, | |
| "step": 833 | |
| }, | |
| { | |
| "epoch": 0.6337386018237082, | |
| "grad_norm": 2.0645856857299805, | |
| "learning_rate": 4.654515680636888e-06, | |
| "loss": 0.5461238622665405, | |
| "mean_token_accuracy": 0.8267207145690918, | |
| "num_tokens": 7168836.0, | |
| "step": 834 | |
| }, | |
| { | |
| "epoch": 0.6344984802431611, | |
| "grad_norm": 1.063380479812622, | |
| "learning_rate": 4.653452568824625e-06, | |
| "loss": 0.3376588225364685, | |
| "mean_token_accuracy": 0.8802155256271362, | |
| "num_tokens": 7194170.0, | |
| "step": 835 | |
| }, | |
| { | |
| "epoch": 0.6352583586626139, | |
| "grad_norm": 3.4659011363983154, | |
| "learning_rate": 4.652387945636454e-06, | |
| "loss": 0.30550917983055115, | |
| "mean_token_accuracy": 0.8895115852355957, | |
| "num_tokens": 7196575.0, | |
| "step": 836 | |
| }, | |
| { | |
| "epoch": 0.6360182370820668, | |
| "grad_norm": 2.0651421546936035, | |
| "learning_rate": 4.651321811819568e-06, | |
| "loss": 0.4748995006084442, | |
| "mean_token_accuracy": 0.8297156095504761, | |
| "num_tokens": 7203948.0, | |
| "step": 837 | |
| }, | |
| { | |
| "epoch": 0.6367781155015197, | |
| "grad_norm": 2.468873977661133, | |
| "learning_rate": 4.650254168122222e-06, | |
| "loss": 0.5160819292068481, | |
| "mean_token_accuracy": 0.8209799528121948, | |
| "num_tokens": 7209727.0, | |
| "step": 838 | |
| }, | |
| { | |
| "epoch": 0.6375379939209727, | |
| "grad_norm": 2.0467090606689453, | |
| "learning_rate": 4.649185015293728e-06, | |
| "loss": 0.4545784592628479, | |
| "mean_token_accuracy": 0.8582319021224976, | |
| "num_tokens": 7216605.0, | |
| "step": 839 | |
| }, | |
| { | |
| "epoch": 0.6382978723404256, | |
| "grad_norm": 2.2342143058776855, | |
| "learning_rate": 4.64811435408446e-06, | |
| "loss": 0.5191316604614258, | |
| "mean_token_accuracy": 0.8482456207275391, | |
| "num_tokens": 7227223.0, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 0.6390577507598785, | |
| "grad_norm": 3.1990416049957275, | |
| "learning_rate": 4.647042185245848e-06, | |
| "loss": 0.4573180377483368, | |
| "mean_token_accuracy": 0.8417835235595703, | |
| "num_tokens": 7230437.0, | |
| "step": 841 | |
| }, | |
| { | |
| "epoch": 0.6398176291793313, | |
| "grad_norm": 1.5837587118148804, | |
| "learning_rate": 4.645968509530381e-06, | |
| "loss": 0.42081165313720703, | |
| "mean_token_accuracy": 0.8477587103843689, | |
| "num_tokens": 7240325.0, | |
| "step": 842 | |
| }, | |
| { | |
| "epoch": 0.6405775075987842, | |
| "grad_norm": 2.400709390640259, | |
| "learning_rate": 4.644893327691608e-06, | |
| "loss": 0.4639664590358734, | |
| "mean_token_accuracy": 0.8325745463371277, | |
| "num_tokens": 7245949.0, | |
| "step": 843 | |
| }, | |
| { | |
| "epoch": 0.6413373860182371, | |
| "grad_norm": 2.0829503536224365, | |
| "learning_rate": 4.6438166404841316e-06, | |
| "loss": 0.5718370676040649, | |
| "mean_token_accuracy": 0.8071532249450684, | |
| "num_tokens": 7253218.0, | |
| "step": 844 | |
| }, | |
| { | |
| "epoch": 0.64209726443769, | |
| "grad_norm": 1.9976121187210083, | |
| "learning_rate": 4.6427384486636115e-06, | |
| "loss": 0.46519768238067627, | |
| "mean_token_accuracy": 0.8393628597259521, | |
| "num_tokens": 7260104.0, | |
| "step": 845 | |
| }, | |
| { | |
| "epoch": 0.6428571428571429, | |
| "grad_norm": 2.5303242206573486, | |
| "learning_rate": 4.6416587529867665e-06, | |
| "loss": 0.5093944668769836, | |
| "mean_token_accuracy": 0.8208208084106445, | |
| "num_tokens": 7265076.0, | |
| "step": 846 | |
| }, | |
| { | |
| "epoch": 0.6436170212765957, | |
| "grad_norm": 2.624427556991577, | |
| "learning_rate": 4.640577554211366e-06, | |
| "loss": 0.49459028244018555, | |
| "mean_token_accuracy": 0.834679365158081, | |
| "num_tokens": 7272422.0, | |
| "step": 847 | |
| }, | |
| { | |
| "epoch": 0.6443768996960486, | |
| "grad_norm": 2.0631775856018066, | |
| "learning_rate": 4.63949485309624e-06, | |
| "loss": 0.447976291179657, | |
| "mean_token_accuracy": 0.8618238568305969, | |
| "num_tokens": 7279399.0, | |
| "step": 848 | |
| }, | |
| { | |
| "epoch": 0.6451367781155015, | |
| "grad_norm": 1.6001992225646973, | |
| "learning_rate": 4.638410650401267e-06, | |
| "loss": 0.423392653465271, | |
| "mean_token_accuracy": 0.8587884306907654, | |
| "num_tokens": 7289378.0, | |
| "step": 849 | |
| }, | |
| { | |
| "epoch": 0.6458966565349544, | |
| "grad_norm": 1.8436834812164307, | |
| "learning_rate": 4.637324946887384e-06, | |
| "loss": 0.35557836294174194, | |
| "mean_token_accuracy": 0.869373083114624, | |
| "num_tokens": 7295777.0, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 0.6466565349544073, | |
| "grad_norm": 3.5771102905273438, | |
| "learning_rate": 4.636237743316578e-06, | |
| "loss": 0.45969358086586, | |
| "mean_token_accuracy": 0.8549720048904419, | |
| "num_tokens": 7299083.0, | |
| "step": 851 | |
| }, | |
| { | |
| "epoch": 0.6474164133738601, | |
| "grad_norm": 2.865243673324585, | |
| "learning_rate": 4.635149040451891e-06, | |
| "loss": 0.34760022163391113, | |
| "mean_token_accuracy": 0.8827086687088013, | |
| "num_tokens": 7302325.0, | |
| "step": 852 | |
| }, | |
| { | |
| "epoch": 0.648176291793313, | |
| "grad_norm": 2.42984938621521, | |
| "learning_rate": 4.634058839057417e-06, | |
| "loss": 0.2833346128463745, | |
| "mean_token_accuracy": 0.8909599781036377, | |
| "num_tokens": 7307772.0, | |
| "step": 853 | |
| }, | |
| { | |
| "epoch": 0.648936170212766, | |
| "grad_norm": 1.3870996236801147, | |
| "learning_rate": 4.632967139898301e-06, | |
| "loss": 0.42592889070510864, | |
| "mean_token_accuracy": 0.8465644717216492, | |
| "num_tokens": 7321536.0, | |
| "step": 854 | |
| }, | |
| { | |
| "epoch": 0.6496960486322189, | |
| "grad_norm": 1.687943458557129, | |
| "learning_rate": 4.63187394374074e-06, | |
| "loss": 0.3329618275165558, | |
| "mean_token_accuracy": 0.8819146752357483, | |
| "num_tokens": 7329124.0, | |
| "step": 855 | |
| }, | |
| { | |
| "epoch": 0.6504559270516718, | |
| "grad_norm": 2.380872964859009, | |
| "learning_rate": 4.63077925135198e-06, | |
| "loss": 0.4892173111438751, | |
| "mean_token_accuracy": 0.8454505205154419, | |
| "num_tokens": 7334484.0, | |
| "step": 856 | |
| }, | |
| { | |
| "epoch": 0.6512158054711246, | |
| "grad_norm": 2.4188196659088135, | |
| "learning_rate": 4.629683063500319e-06, | |
| "loss": 0.47374117374420166, | |
| "mean_token_accuracy": 0.8230743408203125, | |
| "num_tokens": 7339580.0, | |
| "step": 857 | |
| }, | |
| { | |
| "epoch": 0.6519756838905775, | |
| "grad_norm": 1.7876373529434204, | |
| "learning_rate": 4.628585380955104e-06, | |
| "loss": 0.5236937999725342, | |
| "mean_token_accuracy": 0.8137438297271729, | |
| "num_tokens": 7347062.0, | |
| "step": 858 | |
| }, | |
| { | |
| "epoch": 0.6527355623100304, | |
| "grad_norm": 1.5910003185272217, | |
| "learning_rate": 4.62748620448673e-06, | |
| "loss": 0.4025757312774658, | |
| "mean_token_accuracy": 0.861012876033783, | |
| "num_tokens": 7357483.0, | |
| "step": 859 | |
| }, | |
| { | |
| "epoch": 0.6534954407294833, | |
| "grad_norm": 3.214264392852783, | |
| "learning_rate": 4.626385534866642e-06, | |
| "loss": 0.5050101280212402, | |
| "mean_token_accuracy": 0.8393588066101074, | |
| "num_tokens": 7360993.0, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 0.6542553191489362, | |
| "grad_norm": 2.415461778640747, | |
| "learning_rate": 4.625283372867333e-06, | |
| "loss": 0.5053150653839111, | |
| "mean_token_accuracy": 0.8293525576591492, | |
| "num_tokens": 7367228.0, | |
| "step": 861 | |
| }, | |
| { | |
| "epoch": 0.6550151975683891, | |
| "grad_norm": 2.4207515716552734, | |
| "learning_rate": 4.624179719262342e-06, | |
| "loss": 0.5185045003890991, | |
| "mean_token_accuracy": 0.8156779408454895, | |
| "num_tokens": 7372746.0, | |
| "step": 862 | |
| }, | |
| { | |
| "epoch": 0.6557750759878419, | |
| "grad_norm": 3.787724018096924, | |
| "learning_rate": 4.623074574826254e-06, | |
| "loss": 0.5163221955299377, | |
| "mean_token_accuracy": 0.8288628458976746, | |
| "num_tokens": 7375505.0, | |
| "step": 863 | |
| }, | |
| { | |
| "epoch": 0.6565349544072948, | |
| "grad_norm": 1.5809223651885986, | |
| "learning_rate": 4.621967940334705e-06, | |
| "loss": 0.39990508556365967, | |
| "mean_token_accuracy": 0.8566436767578125, | |
| "num_tokens": 7384984.0, | |
| "step": 864 | |
| }, | |
| { | |
| "epoch": 0.6572948328267477, | |
| "grad_norm": 1.6312175989151, | |
| "learning_rate": 4.620859816564371e-06, | |
| "loss": 0.4413212537765503, | |
| "mean_token_accuracy": 0.8325690627098083, | |
| "num_tokens": 7396416.0, | |
| "step": 865 | |
| }, | |
| { | |
| "epoch": 0.6580547112462006, | |
| "grad_norm": 2.2401390075683594, | |
| "learning_rate": 4.619750204292978e-06, | |
| "loss": 0.4971809983253479, | |
| "mean_token_accuracy": 0.8272864818572998, | |
| "num_tokens": 7402822.0, | |
| "step": 866 | |
| }, | |
| { | |
| "epoch": 0.6588145896656535, | |
| "grad_norm": 2.277994155883789, | |
| "learning_rate": 4.618639104299294e-06, | |
| "loss": 0.5069500207901001, | |
| "mean_token_accuracy": 0.8187509179115295, | |
| "num_tokens": 7411273.0, | |
| "step": 867 | |
| }, | |
| { | |
| "epoch": 0.6595744680851063, | |
| "grad_norm": 1.414273738861084, | |
| "learning_rate": 4.6175265173631304e-06, | |
| "loss": 0.4307935833930969, | |
| "mean_token_accuracy": 0.8525159358978271, | |
| "num_tokens": 7424737.0, | |
| "step": 868 | |
| }, | |
| { | |
| "epoch": 0.6603343465045592, | |
| "grad_norm": 2.125316858291626, | |
| "learning_rate": 4.616412444265344e-06, | |
| "loss": 0.39343100786209106, | |
| "mean_token_accuracy": 0.8679004311561584, | |
| "num_tokens": 7430512.0, | |
| "step": 869 | |
| }, | |
| { | |
| "epoch": 0.6610942249240122, | |
| "grad_norm": 2.3756308555603027, | |
| "learning_rate": 4.6152968857878365e-06, | |
| "loss": 0.3074539005756378, | |
| "mean_token_accuracy": 0.8972070217132568, | |
| "num_tokens": 7434135.0, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 0.6618541033434651, | |
| "grad_norm": 3.055863857269287, | |
| "learning_rate": 4.6141798427135475e-06, | |
| "loss": 0.4804600477218628, | |
| "mean_token_accuracy": 0.8362303376197815, | |
| "num_tokens": 7437704.0, | |
| "step": 871 | |
| }, | |
| { | |
| "epoch": 0.662613981762918, | |
| "grad_norm": 2.5049126148223877, | |
| "learning_rate": 4.6130613158264605e-06, | |
| "loss": 0.5290058851242065, | |
| "mean_token_accuracy": 0.8319494724273682, | |
| "num_tokens": 7443605.0, | |
| "step": 872 | |
| }, | |
| { | |
| "epoch": 0.6633738601823708, | |
| "grad_norm": 4.058017253875732, | |
| "learning_rate": 4.611941305911602e-06, | |
| "loss": 0.5475609302520752, | |
| "mean_token_accuracy": 0.8565441370010376, | |
| "num_tokens": 7446226.0, | |
| "step": 873 | |
| }, | |
| { | |
| "epoch": 0.6641337386018237, | |
| "grad_norm": 2.6561121940612793, | |
| "learning_rate": 4.610819813755038e-06, | |
| "loss": 0.4614630937576294, | |
| "mean_token_accuracy": 0.8371453285217285, | |
| "num_tokens": 7450880.0, | |
| "step": 874 | |
| }, | |
| { | |
| "epoch": 0.6648936170212766, | |
| "grad_norm": 2.3695688247680664, | |
| "learning_rate": 4.609696840143875e-06, | |
| "loss": 0.44823721051216125, | |
| "mean_token_accuracy": 0.8478057384490967, | |
| "num_tokens": 7455475.0, | |
| "step": 875 | |
| }, | |
| { | |
| "epoch": 0.6656534954407295, | |
| "grad_norm": 2.188896894454956, | |
| "learning_rate": 4.6085723858662575e-06, | |
| "loss": 0.540969967842102, | |
| "mean_token_accuracy": 0.8183263540267944, | |
| "num_tokens": 7462136.0, | |
| "step": 876 | |
| }, | |
| { | |
| "epoch": 0.6664133738601824, | |
| "grad_norm": 2.090606689453125, | |
| "learning_rate": 4.607446451711372e-06, | |
| "loss": 0.4938020408153534, | |
| "mean_token_accuracy": 0.8282856941223145, | |
| "num_tokens": 7468881.0, | |
| "step": 877 | |
| }, | |
| { | |
| "epoch": 0.6671732522796353, | |
| "grad_norm": 1.4223014116287231, | |
| "learning_rate": 4.606319038469443e-06, | |
| "loss": 0.4104450047016144, | |
| "mean_token_accuracy": 0.8580802083015442, | |
| "num_tokens": 7480008.0, | |
| "step": 878 | |
| }, | |
| { | |
| "epoch": 0.6679331306990881, | |
| "grad_norm": 1.8259541988372803, | |
| "learning_rate": 4.605190146931731e-06, | |
| "loss": 0.45610904693603516, | |
| "mean_token_accuracy": 0.8348138332366943, | |
| "num_tokens": 7488740.0, | |
| "step": 879 | |
| }, | |
| { | |
| "epoch": 0.668693009118541, | |
| "grad_norm": 1.455940842628479, | |
| "learning_rate": 4.604059777890537e-06, | |
| "loss": 0.5581926107406616, | |
| "mean_token_accuracy": 0.8333281874656677, | |
| "num_tokens": 7505385.0, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 0.6694528875379939, | |
| "grad_norm": 1.9790794849395752, | |
| "learning_rate": 4.602927932139197e-06, | |
| "loss": 0.37450742721557617, | |
| "mean_token_accuracy": 0.8820561766624451, | |
| "num_tokens": 7511950.0, | |
| "step": 881 | |
| }, | |
| { | |
| "epoch": 0.6702127659574468, | |
| "grad_norm": 2.1166131496429443, | |
| "learning_rate": 4.601794610472083e-06, | |
| "loss": 0.6722922921180725, | |
| "mean_token_accuracy": 0.7877264618873596, | |
| "num_tokens": 7520971.0, | |
| "step": 882 | |
| }, | |
| { | |
| "epoch": 0.6709726443768997, | |
| "grad_norm": 2.0491039752960205, | |
| "learning_rate": 4.6006598136846056e-06, | |
| "loss": 0.506068766117096, | |
| "mean_token_accuracy": 0.8298271894454956, | |
| "num_tokens": 7528372.0, | |
| "step": 883 | |
| }, | |
| { | |
| "epoch": 0.6717325227963525, | |
| "grad_norm": 1.723015308380127, | |
| "learning_rate": 4.599523542573207e-06, | |
| "loss": 0.4729573130607605, | |
| "mean_token_accuracy": 0.833440899848938, | |
| "num_tokens": 7539519.0, | |
| "step": 884 | |
| }, | |
| { | |
| "epoch": 0.6724924012158054, | |
| "grad_norm": 2.212226152420044, | |
| "learning_rate": 4.598385797935368e-06, | |
| "loss": 0.5158262848854065, | |
| "mean_token_accuracy": 0.8289275169372559, | |
| "num_tokens": 7547301.0, | |
| "step": 885 | |
| }, | |
| { | |
| "epoch": 0.6732522796352584, | |
| "grad_norm": 2.41896653175354, | |
| "learning_rate": 4.5972465805696e-06, | |
| "loss": 0.426039457321167, | |
| "mean_token_accuracy": 0.8530604243278503, | |
| "num_tokens": 7552041.0, | |
| "step": 886 | |
| }, | |
| { | |
| "epoch": 0.6740121580547113, | |
| "grad_norm": 2.7718119621276855, | |
| "learning_rate": 4.596105891275449e-06, | |
| "loss": 0.4317045211791992, | |
| "mean_token_accuracy": 0.8486195802688599, | |
| "num_tokens": 7556954.0, | |
| "step": 887 | |
| }, | |
| { | |
| "epoch": 0.6747720364741642, | |
| "grad_norm": 2.2367255687713623, | |
| "learning_rate": 4.594963730853497e-06, | |
| "loss": 0.5955407619476318, | |
| "mean_token_accuracy": 0.7937953472137451, | |
| "num_tokens": 7563810.0, | |
| "step": 888 | |
| }, | |
| { | |
| "epoch": 0.675531914893617, | |
| "grad_norm": 2.594902753829956, | |
| "learning_rate": 4.593820100105355e-06, | |
| "loss": 0.4984428584575653, | |
| "mean_token_accuracy": 0.8338798880577087, | |
| "num_tokens": 7568348.0, | |
| "step": 889 | |
| }, | |
| { | |
| "epoch": 0.6762917933130699, | |
| "grad_norm": 1.9873307943344116, | |
| "learning_rate": 4.5926749998336665e-06, | |
| "loss": 0.4970153570175171, | |
| "mean_token_accuracy": 0.8113185167312622, | |
| "num_tokens": 7575923.0, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 0.6770516717325228, | |
| "grad_norm": 1.8077143430709839, | |
| "learning_rate": 4.5915284308421075e-06, | |
| "loss": 0.4166252017021179, | |
| "mean_token_accuracy": 0.8618727326393127, | |
| "num_tokens": 7584114.0, | |
| "step": 891 | |
| }, | |
| { | |
| "epoch": 0.6778115501519757, | |
| "grad_norm": 2.633857011795044, | |
| "learning_rate": 4.590380393935383e-06, | |
| "loss": 0.3649597764015198, | |
| "mean_token_accuracy": 0.8753523826599121, | |
| "num_tokens": 7587749.0, | |
| "step": 892 | |
| }, | |
| { | |
| "epoch": 0.6785714285714286, | |
| "grad_norm": 1.1693453788757324, | |
| "learning_rate": 4.589230889919232e-06, | |
| "loss": 0.38153892755508423, | |
| "mean_token_accuracy": 0.8587384223937988, | |
| "num_tokens": 7609112.0, | |
| "step": 893 | |
| }, | |
| { | |
| "epoch": 0.6793313069908815, | |
| "grad_norm": 2.939741611480713, | |
| "learning_rate": 4.588079919600419e-06, | |
| "loss": 0.4910273849964142, | |
| "mean_token_accuracy": 0.821663498878479, | |
| "num_tokens": 7612862.0, | |
| "step": 894 | |
| }, | |
| { | |
| "epoch": 0.6800911854103343, | |
| "grad_norm": 1.179184079170227, | |
| "learning_rate": 4.586927483786739e-06, | |
| "loss": 0.4377683401107788, | |
| "mean_token_accuracy": 0.84492427110672, | |
| "num_tokens": 7634802.0, | |
| "step": 895 | |
| }, | |
| { | |
| "epoch": 0.6808510638297872, | |
| "grad_norm": 1.5825812816619873, | |
| "learning_rate": 4.585773583287017e-06, | |
| "loss": 0.4963203966617584, | |
| "mean_token_accuracy": 0.848434329032898, | |
| "num_tokens": 7651060.0, | |
| "step": 896 | |
| }, | |
| { | |
| "epoch": 0.6816109422492401, | |
| "grad_norm": 2.5651516914367676, | |
| "learning_rate": 4.584618218911104e-06, | |
| "loss": 0.4816139340400696, | |
| "mean_token_accuracy": 0.8224426507949829, | |
| "num_tokens": 7655421.0, | |
| "step": 897 | |
| }, | |
| { | |
| "epoch": 0.682370820668693, | |
| "grad_norm": 1.8367772102355957, | |
| "learning_rate": 4.583461391469879e-06, | |
| "loss": 0.4980762302875519, | |
| "mean_token_accuracy": 0.8256858587265015, | |
| "num_tokens": 7663953.0, | |
| "step": 898 | |
| }, | |
| { | |
| "epoch": 0.6831306990881459, | |
| "grad_norm": 3.117048740386963, | |
| "learning_rate": 4.582303101775249e-06, | |
| "loss": 0.4483739733695984, | |
| "mean_token_accuracy": 0.8520689010620117, | |
| "num_tokens": 7666958.0, | |
| "step": 899 | |
| }, | |
| { | |
| "epoch": 0.6838905775075987, | |
| "grad_norm": 1.429996371269226, | |
| "learning_rate": 4.581143350640146e-06, | |
| "loss": 0.4799046814441681, | |
| "mean_token_accuracy": 0.8313760161399841, | |
| "num_tokens": 7681046.0, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.6846504559270516, | |
| "grad_norm": 1.665168046951294, | |
| "learning_rate": 4.579982138878527e-06, | |
| "loss": 0.49619922041893005, | |
| "mean_token_accuracy": 0.830579400062561, | |
| "num_tokens": 7696336.0, | |
| "step": 901 | |
| }, | |
| { | |
| "epoch": 0.6854103343465046, | |
| "grad_norm": 2.4953956604003906, | |
| "learning_rate": 4.578819467305375e-06, | |
| "loss": 0.45367807149887085, | |
| "mean_token_accuracy": 0.8557197451591492, | |
| "num_tokens": 7700768.0, | |
| "step": 902 | |
| }, | |
| { | |
| "epoch": 0.6861702127659575, | |
| "grad_norm": 1.9500032663345337, | |
| "learning_rate": 4.5776553367367e-06, | |
| "loss": 0.5965266227722168, | |
| "mean_token_accuracy": 0.7919489741325378, | |
| "num_tokens": 7708938.0, | |
| "step": 903 | |
| }, | |
| { | |
| "epoch": 0.6869300911854104, | |
| "grad_norm": 1.8939558267593384, | |
| "learning_rate": 4.576489747989532e-06, | |
| "loss": 0.4667394161224365, | |
| "mean_token_accuracy": 0.8221625685691833, | |
| "num_tokens": 7715954.0, | |
| "step": 904 | |
| }, | |
| { | |
| "epoch": 0.6876899696048632, | |
| "grad_norm": 1.2655456066131592, | |
| "learning_rate": 4.575322701881926e-06, | |
| "loss": 0.3889008164405823, | |
| "mean_token_accuracy": 0.8734138011932373, | |
| "num_tokens": 7733929.0, | |
| "step": 905 | |
| }, | |
| { | |
| "epoch": 0.6884498480243161, | |
| "grad_norm": 1.730330228805542, | |
| "learning_rate": 4.57415419923296e-06, | |
| "loss": 0.5634262561798096, | |
| "mean_token_accuracy": 0.8046697378158569, | |
| "num_tokens": 7747283.0, | |
| "step": 906 | |
| }, | |
| { | |
| "epoch": 0.689209726443769, | |
| "grad_norm": 2.4916765689849854, | |
| "learning_rate": 4.572984240862733e-06, | |
| "loss": 0.5775842666625977, | |
| "mean_token_accuracy": 0.803133487701416, | |
| "num_tokens": 7753457.0, | |
| "step": 907 | |
| }, | |
| { | |
| "epoch": 0.6899696048632219, | |
| "grad_norm": 2.133382797241211, | |
| "learning_rate": 4.57181282759237e-06, | |
| "loss": 0.5159484148025513, | |
| "mean_token_accuracy": 0.822996973991394, | |
| "num_tokens": 7761284.0, | |
| "step": 908 | |
| }, | |
| { | |
| "epoch": 0.6907294832826748, | |
| "grad_norm": 2.3237338066101074, | |
| "learning_rate": 4.570639960244011e-06, | |
| "loss": 0.49258583784103394, | |
| "mean_token_accuracy": 0.8296655416488647, | |
| "num_tokens": 7766944.0, | |
| "step": 909 | |
| }, | |
| { | |
| "epoch": 0.6914893617021277, | |
| "grad_norm": 2.0109713077545166, | |
| "learning_rate": 4.56946563964082e-06, | |
| "loss": 0.5125564336776733, | |
| "mean_token_accuracy": 0.8213399648666382, | |
| "num_tokens": 7775310.0, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 0.6922492401215805, | |
| "grad_norm": 1.2341805696487427, | |
| "learning_rate": 4.5682898666069815e-06, | |
| "loss": 0.4266202449798584, | |
| "mean_token_accuracy": 0.8618178367614746, | |
| "num_tokens": 7792810.0, | |
| "step": 911 | |
| }, | |
| { | |
| "epoch": 0.6930091185410334, | |
| "grad_norm": 1.2585805654525757, | |
| "learning_rate": 4.567112641967697e-06, | |
| "loss": 0.388641893863678, | |
| "mean_token_accuracy": 0.8769892454147339, | |
| "num_tokens": 7805949.0, | |
| "step": 912 | |
| }, | |
| { | |
| "epoch": 0.6937689969604863, | |
| "grad_norm": 1.2041066884994507, | |
| "learning_rate": 4.5659339665491894e-06, | |
| "loss": 0.3482680916786194, | |
| "mean_token_accuracy": 0.8561577796936035, | |
| "num_tokens": 7821316.0, | |
| "step": 913 | |
| }, | |
| { | |
| "epoch": 0.6945288753799392, | |
| "grad_norm": 2.319331645965576, | |
| "learning_rate": 4.5647538411786965e-06, | |
| "loss": 0.41378292441368103, | |
| "mean_token_accuracy": 0.8446224331855774, | |
| "num_tokens": 7826557.0, | |
| "step": 914 | |
| }, | |
| { | |
| "epoch": 0.6952887537993921, | |
| "grad_norm": 1.283704161643982, | |
| "learning_rate": 4.563572266684478e-06, | |
| "loss": 0.5009961128234863, | |
| "mean_token_accuracy": 0.8147847652435303, | |
| "num_tokens": 7842836.0, | |
| "step": 915 | |
| }, | |
| { | |
| "epoch": 0.6960486322188449, | |
| "grad_norm": 2.496107816696167, | |
| "learning_rate": 4.562389243895807e-06, | |
| "loss": 0.42707979679107666, | |
| "mean_token_accuracy": 0.8460291624069214, | |
| "num_tokens": 7847201.0, | |
| "step": 916 | |
| }, | |
| { | |
| "epoch": 0.6968085106382979, | |
| "grad_norm": 1.5308650732040405, | |
| "learning_rate": 4.561204773642974e-06, | |
| "loss": 0.39804404973983765, | |
| "mean_token_accuracy": 0.8602392673492432, | |
| "num_tokens": 7858273.0, | |
| "step": 917 | |
| }, | |
| { | |
| "epoch": 0.6975683890577508, | |
| "grad_norm": 2.9474008083343506, | |
| "learning_rate": 4.5600188567572874e-06, | |
| "loss": 0.26960092782974243, | |
| "mean_token_accuracy": 0.9006294012069702, | |
| "num_tokens": 7860928.0, | |
| "step": 918 | |
| }, | |
| { | |
| "epoch": 0.6983282674772037, | |
| "grad_norm": 1.4389863014221191, | |
| "learning_rate": 4.558831494071069e-06, | |
| "loss": 0.4172559380531311, | |
| "mean_token_accuracy": 0.8557401895523071, | |
| "num_tokens": 7873967.0, | |
| "step": 919 | |
| }, | |
| { | |
| "epoch": 0.6990881458966566, | |
| "grad_norm": 1.7400329113006592, | |
| "learning_rate": 4.557642686417654e-06, | |
| "loss": 0.47256314754486084, | |
| "mean_token_accuracy": 0.82423996925354, | |
| "num_tokens": 7883639.0, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 0.6998480243161094, | |
| "grad_norm": 3.0388031005859375, | |
| "learning_rate": 4.556452434631396e-06, | |
| "loss": 0.5846735239028931, | |
| "mean_token_accuracy": 0.8104480504989624, | |
| "num_tokens": 7888177.0, | |
| "step": 921 | |
| }, | |
| { | |
| "epoch": 0.7006079027355623, | |
| "grad_norm": 2.3434362411499023, | |
| "learning_rate": 4.555260739547657e-06, | |
| "loss": 0.35388419032096863, | |
| "mean_token_accuracy": 0.8845078349113464, | |
| "num_tokens": 7892527.0, | |
| "step": 922 | |
| }, | |
| { | |
| "epoch": 0.7013677811550152, | |
| "grad_norm": 1.5241142511367798, | |
| "learning_rate": 4.554067602002815e-06, | |
| "loss": 0.37504011392593384, | |
| "mean_token_accuracy": 0.8647040128707886, | |
| "num_tokens": 7903340.0, | |
| "step": 923 | |
| }, | |
| { | |
| "epoch": 0.7021276595744681, | |
| "grad_norm": 3.6781864166259766, | |
| "learning_rate": 4.55287302283426e-06, | |
| "loss": 0.5525227785110474, | |
| "mean_token_accuracy": 0.8196807503700256, | |
| "num_tokens": 7906254.0, | |
| "step": 924 | |
| }, | |
| { | |
| "epoch": 0.702887537993921, | |
| "grad_norm": 2.2237837314605713, | |
| "learning_rate": 4.551677002880395e-06, | |
| "loss": 0.4853675663471222, | |
| "mean_token_accuracy": 0.828433632850647, | |
| "num_tokens": 7912814.0, | |
| "step": 925 | |
| }, | |
| { | |
| "epoch": 0.7036474164133738, | |
| "grad_norm": 2.5017545223236084, | |
| "learning_rate": 4.550479542980632e-06, | |
| "loss": 0.4966946244239807, | |
| "mean_token_accuracy": 0.833811342716217, | |
| "num_tokens": 7917653.0, | |
| "step": 926 | |
| }, | |
| { | |
| "epoch": 0.7044072948328267, | |
| "grad_norm": 3.7602856159210205, | |
| "learning_rate": 4.549280643975394e-06, | |
| "loss": 0.424625426530838, | |
| "mean_token_accuracy": 0.8503231406211853, | |
| "num_tokens": 7920568.0, | |
| "step": 927 | |
| }, | |
| { | |
| "epoch": 0.7051671732522796, | |
| "grad_norm": 2.4364452362060547, | |
| "learning_rate": 4.548080306706114e-06, | |
| "loss": 0.26582738757133484, | |
| "mean_token_accuracy": 0.9119530916213989, | |
| "num_tokens": 7924113.0, | |
| "step": 928 | |
| }, | |
| { | |
| "epoch": 0.7059270516717325, | |
| "grad_norm": 1.36539626121521, | |
| "learning_rate": 4.5468785320152365e-06, | |
| "loss": 0.4289470911026001, | |
| "mean_token_accuracy": 0.8364672660827637, | |
| "num_tokens": 7939372.0, | |
| "step": 929 | |
| }, | |
| { | |
| "epoch": 0.7066869300911854, | |
| "grad_norm": 2.37292742729187, | |
| "learning_rate": 4.545675320746212e-06, | |
| "loss": 0.483722984790802, | |
| "mean_token_accuracy": 0.8319920897483826, | |
| "num_tokens": 7946336.0, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 0.7074468085106383, | |
| "grad_norm": 1.7913908958435059, | |
| "learning_rate": 4.544470673743502e-06, | |
| "loss": 0.37959009408950806, | |
| "mean_token_accuracy": 0.8628261089324951, | |
| "num_tokens": 7955001.0, | |
| "step": 931 | |
| }, | |
| { | |
| "epoch": 0.7082066869300911, | |
| "grad_norm": 1.531957983970642, | |
| "learning_rate": 4.543264591852572e-06, | |
| "loss": 0.4729848802089691, | |
| "mean_token_accuracy": 0.8368238806724548, | |
| "num_tokens": 7968205.0, | |
| "step": 932 | |
| }, | |
| { | |
| "epoch": 0.708966565349544, | |
| "grad_norm": 2.206059217453003, | |
| "learning_rate": 4.542057075919898e-06, | |
| "loss": 0.4628652334213257, | |
| "mean_token_accuracy": 0.843146562576294, | |
| "num_tokens": 7974491.0, | |
| "step": 933 | |
| }, | |
| { | |
| "epoch": 0.709726443768997, | |
| "grad_norm": 1.9408152103424072, | |
| "learning_rate": 4.54084812679296e-06, | |
| "loss": 0.43363600969314575, | |
| "mean_token_accuracy": 0.843762993812561, | |
| "num_tokens": 7982082.0, | |
| "step": 934 | |
| }, | |
| { | |
| "epoch": 0.7104863221884499, | |
| "grad_norm": 1.485535979270935, | |
| "learning_rate": 4.539637745320247e-06, | |
| "loss": 0.32511797547340393, | |
| "mean_token_accuracy": 0.8863018751144409, | |
| "num_tokens": 7991296.0, | |
| "step": 935 | |
| }, | |
| { | |
| "epoch": 0.7112462006079028, | |
| "grad_norm": 2.0763707160949707, | |
| "learning_rate": 4.53842593235125e-06, | |
| "loss": 0.4504709839820862, | |
| "mean_token_accuracy": 0.8516234159469604, | |
| "num_tokens": 7997781.0, | |
| "step": 936 | |
| }, | |
| { | |
| "epoch": 0.7120060790273556, | |
| "grad_norm": 2.7200214862823486, | |
| "learning_rate": 4.537212688736466e-06, | |
| "loss": 0.44011425971984863, | |
| "mean_token_accuracy": 0.8511619567871094, | |
| "num_tokens": 8001314.0, | |
| "step": 937 | |
| }, | |
| { | |
| "epoch": 0.7127659574468085, | |
| "grad_norm": 2.2340095043182373, | |
| "learning_rate": 4.535998015327396e-06, | |
| "loss": 0.41737306118011475, | |
| "mean_token_accuracy": 0.8512250185012817, | |
| "num_tokens": 8006164.0, | |
| "step": 938 | |
| }, | |
| { | |
| "epoch": 0.7135258358662614, | |
| "grad_norm": 1.9181195497512817, | |
| "learning_rate": 4.534781912976546e-06, | |
| "loss": 0.4302041232585907, | |
| "mean_token_accuracy": 0.8511664867401123, | |
| "num_tokens": 8012893.0, | |
| "step": 939 | |
| }, | |
| { | |
| "epoch": 0.7142857142857143, | |
| "grad_norm": 1.5707837343215942, | |
| "learning_rate": 4.533564382537421e-06, | |
| "loss": 0.5106822848320007, | |
| "mean_token_accuracy": 0.8417707681655884, | |
| "num_tokens": 8025062.0, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 0.7150455927051672, | |
| "grad_norm": 1.4295225143432617, | |
| "learning_rate": 4.532345424864533e-06, | |
| "loss": 0.37428560853004456, | |
| "mean_token_accuracy": 0.8544988036155701, | |
| "num_tokens": 8036677.0, | |
| "step": 941 | |
| }, | |
| { | |
| "epoch": 0.71580547112462, | |
| "grad_norm": 1.5033446550369263, | |
| "learning_rate": 4.531125040813392e-06, | |
| "loss": 0.45774930715560913, | |
| "mean_token_accuracy": 0.8385022878646851, | |
| "num_tokens": 8050721.0, | |
| "step": 942 | |
| }, | |
| { | |
| "epoch": 0.7165653495440729, | |
| "grad_norm": 2.19572377204895, | |
| "learning_rate": 4.529903231240511e-06, | |
| "loss": 0.45330873131752014, | |
| "mean_token_accuracy": 0.8385785818099976, | |
| "num_tokens": 8058815.0, | |
| "step": 943 | |
| }, | |
| { | |
| "epoch": 0.7173252279635258, | |
| "grad_norm": 1.722461223602295, | |
| "learning_rate": 4.528679997003403e-06, | |
| "loss": 0.49621498584747314, | |
| "mean_token_accuracy": 0.8423264026641846, | |
| "num_tokens": 8069413.0, | |
| "step": 944 | |
| }, | |
| { | |
| "epoch": 0.7180851063829787, | |
| "grad_norm": 2.1492364406585693, | |
| "learning_rate": 4.52745533896058e-06, | |
| "loss": 0.37706953287124634, | |
| "mean_token_accuracy": 0.8711401224136353, | |
| "num_tokens": 8075029.0, | |
| "step": 945 | |
| }, | |
| { | |
| "epoch": 0.7188449848024316, | |
| "grad_norm": 2.8799679279327393, | |
| "learning_rate": 4.526229257971556e-06, | |
| "loss": 0.4596155285835266, | |
| "mean_token_accuracy": 0.8330371379852295, | |
| "num_tokens": 8078633.0, | |
| "step": 946 | |
| }, | |
| { | |
| "epoch": 0.7196048632218845, | |
| "grad_norm": 2.301872491836548, | |
| "learning_rate": 4.52500175489684e-06, | |
| "loss": 0.4968586564064026, | |
| "mean_token_accuracy": 0.8335995078086853, | |
| "num_tokens": 8085216.0, | |
| "step": 947 | |
| }, | |
| { | |
| "epoch": 0.7203647416413373, | |
| "grad_norm": 1.8690073490142822, | |
| "learning_rate": 4.523772830597942e-06, | |
| "loss": 0.5353140234947205, | |
| "mean_token_accuracy": 0.8132718205451965, | |
| "num_tokens": 8093957.0, | |
| "step": 948 | |
| }, | |
| { | |
| "epoch": 0.7211246200607903, | |
| "grad_norm": 2.8340227603912354, | |
| "learning_rate": 4.522542485937369e-06, | |
| "loss": 0.4271763563156128, | |
| "mean_token_accuracy": 0.8580093383789062, | |
| "num_tokens": 8097511.0, | |
| "step": 949 | |
| }, | |
| { | |
| "epoch": 0.7218844984802432, | |
| "grad_norm": 3.47814679145813, | |
| "learning_rate": 4.521310721778622e-06, | |
| "loss": 0.41557565331459045, | |
| "mean_token_accuracy": 0.8628487586975098, | |
| "num_tokens": 8100442.0, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 0.7226443768996961, | |
| "grad_norm": 1.4269204139709473, | |
| "learning_rate": 4.520077538986203e-06, | |
| "loss": 0.45576968789100647, | |
| "mean_token_accuracy": 0.8401131629943848, | |
| "num_tokens": 8113153.0, | |
| "step": 951 | |
| }, | |
| { | |
| "epoch": 0.723404255319149, | |
| "grad_norm": 2.2714176177978516, | |
| "learning_rate": 4.518842938425606e-06, | |
| "loss": 0.39482250809669495, | |
| "mean_token_accuracy": 0.8544554710388184, | |
| "num_tokens": 8119530.0, | |
| "step": 952 | |
| }, | |
| { | |
| "epoch": 0.7241641337386018, | |
| "grad_norm": 1.3205335140228271, | |
| "learning_rate": 4.51760692096332e-06, | |
| "loss": 0.37603840231895447, | |
| "mean_token_accuracy": 0.8616449236869812, | |
| "num_tokens": 8131305.0, | |
| "step": 953 | |
| }, | |
| { | |
| "epoch": 0.7249240121580547, | |
| "grad_norm": 2.023127317428589, | |
| "learning_rate": 4.516369487466832e-06, | |
| "loss": 0.3578413128852844, | |
| "mean_token_accuracy": 0.874262809753418, | |
| "num_tokens": 8137653.0, | |
| "step": 954 | |
| }, | |
| { | |
| "epoch": 0.7256838905775076, | |
| "grad_norm": 2.0011065006256104, | |
| "learning_rate": 4.5151306388046175e-06, | |
| "loss": 0.5531020164489746, | |
| "mean_token_accuracy": 0.8257242441177368, | |
| "num_tokens": 8147206.0, | |
| "step": 955 | |
| }, | |
| { | |
| "epoch": 0.7264437689969605, | |
| "grad_norm": 2.169484853744507, | |
| "learning_rate": 4.513890375846152e-06, | |
| "loss": 0.4228977560997009, | |
| "mean_token_accuracy": 0.8543480634689331, | |
| "num_tokens": 8152364.0, | |
| "step": 956 | |
| }, | |
| { | |
| "epoch": 0.7272036474164134, | |
| "grad_norm": 1.9234753847122192, | |
| "learning_rate": 4.512648699461897e-06, | |
| "loss": 0.5490481853485107, | |
| "mean_token_accuracy": 0.8181719779968262, | |
| "num_tokens": 8159948.0, | |
| "step": 957 | |
| }, | |
| { | |
| "epoch": 0.7279635258358662, | |
| "grad_norm": 2.429049491882324, | |
| "learning_rate": 4.511405610523309e-06, | |
| "loss": 0.5078240633010864, | |
| "mean_token_accuracy": 0.8245970606803894, | |
| "num_tokens": 8165597.0, | |
| "step": 958 | |
| }, | |
| { | |
| "epoch": 0.7287234042553191, | |
| "grad_norm": 2.5267844200134277, | |
| "learning_rate": 4.510161109902837e-06, | |
| "loss": 0.3450012803077698, | |
| "mean_token_accuracy": 0.8559510707855225, | |
| "num_tokens": 8169587.0, | |
| "step": 959 | |
| }, | |
| { | |
| "epoch": 0.729483282674772, | |
| "grad_norm": 1.9822341203689575, | |
| "learning_rate": 4.508915198473919e-06, | |
| "loss": 0.4455550014972687, | |
| "mean_token_accuracy": 0.8660366535186768, | |
| "num_tokens": 8175935.0, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 0.7302431610942249, | |
| "grad_norm": 3.0034801959991455, | |
| "learning_rate": 4.507667877110982e-06, | |
| "loss": 0.482852041721344, | |
| "mean_token_accuracy": 0.8437433242797852, | |
| "num_tokens": 8179454.0, | |
| "step": 961 | |
| }, | |
| { | |
| "epoch": 0.7310030395136778, | |
| "grad_norm": 2.021489143371582, | |
| "learning_rate": 4.506419146689445e-06, | |
| "loss": 0.3777191638946533, | |
| "mean_token_accuracy": 0.8720347285270691, | |
| "num_tokens": 8185708.0, | |
| "step": 962 | |
| }, | |
| { | |
| "epoch": 0.7317629179331308, | |
| "grad_norm": 3.0223734378814697, | |
| "learning_rate": 4.505169008085717e-06, | |
| "loss": 0.33351024985313416, | |
| "mean_token_accuracy": 0.8853825926780701, | |
| "num_tokens": 8188627.0, | |
| "step": 963 | |
| }, | |
| { | |
| "epoch": 0.7325227963525835, | |
| "grad_norm": 1.4128164052963257, | |
| "learning_rate": 4.503917462177192e-06, | |
| "loss": 0.41878461837768555, | |
| "mean_token_accuracy": 0.845172643661499, | |
| "num_tokens": 8200765.0, | |
| "step": 964 | |
| }, | |
| { | |
| "epoch": 0.7332826747720365, | |
| "grad_norm": 2.15291690826416, | |
| "learning_rate": 4.5026645098422515e-06, | |
| "loss": 0.41664183139801025, | |
| "mean_token_accuracy": 0.8549957275390625, | |
| "num_tokens": 8206151.0, | |
| "step": 965 | |
| }, | |
| { | |
| "epoch": 0.7340425531914894, | |
| "grad_norm": 1.979921579360962, | |
| "learning_rate": 4.5014101519602684e-06, | |
| "loss": 0.47765952348709106, | |
| "mean_token_accuracy": 0.8220652341842651, | |
| "num_tokens": 8212960.0, | |
| "step": 966 | |
| }, | |
| { | |
| "epoch": 0.7348024316109423, | |
| "grad_norm": 1.9223300218582153, | |
| "learning_rate": 4.500154389411598e-06, | |
| "loss": 0.47046923637390137, | |
| "mean_token_accuracy": 0.8354085683822632, | |
| "num_tokens": 8220174.0, | |
| "step": 967 | |
| }, | |
| { | |
| "epoch": 0.7355623100303952, | |
| "grad_norm": 2.914621591567993, | |
| "learning_rate": 4.498897223077582e-06, | |
| "loss": 0.3779117465019226, | |
| "mean_token_accuracy": 0.8938813805580139, | |
| "num_tokens": 8223426.0, | |
| "step": 968 | |
| }, | |
| { | |
| "epoch": 0.736322188449848, | |
| "grad_norm": 2.284940242767334, | |
| "learning_rate": 4.49763865384055e-06, | |
| "loss": 0.4823833703994751, | |
| "mean_token_accuracy": 0.8280587196350098, | |
| "num_tokens": 8229111.0, | |
| "step": 969 | |
| }, | |
| { | |
| "epoch": 0.7370820668693009, | |
| "grad_norm": 1.9567904472351074, | |
| "learning_rate": 4.496378682583813e-06, | |
| "loss": 0.478855162858963, | |
| "mean_token_accuracy": 0.858295202255249, | |
| "num_tokens": 8236746.0, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 0.7378419452887538, | |
| "grad_norm": 1.2472765445709229, | |
| "learning_rate": 4.495117310191667e-06, | |
| "loss": 0.46173644065856934, | |
| "mean_token_accuracy": 0.8231217861175537, | |
| "num_tokens": 8256189.0, | |
| "step": 971 | |
| }, | |
| { | |
| "epoch": 0.7386018237082067, | |
| "grad_norm": 1.8555713891983032, | |
| "learning_rate": 4.493854537549393e-06, | |
| "loss": 0.44676172733306885, | |
| "mean_token_accuracy": 0.842460036277771, | |
| "num_tokens": 8263689.0, | |
| "step": 972 | |
| }, | |
| { | |
| "epoch": 0.7393617021276596, | |
| "grad_norm": 2.5443015098571777, | |
| "learning_rate": 4.492590365543253e-06, | |
| "loss": 0.4607488214969635, | |
| "mean_token_accuracy": 0.8574792146682739, | |
| "num_tokens": 8268125.0, | |
| "step": 973 | |
| }, | |
| { | |
| "epoch": 0.7401215805471124, | |
| "grad_norm": 2.232205390930176, | |
| "learning_rate": 4.491324795060491e-06, | |
| "loss": 0.35018014907836914, | |
| "mean_token_accuracy": 0.8818938732147217, | |
| "num_tokens": 8273045.0, | |
| "step": 974 | |
| }, | |
| { | |
| "epoch": 0.7408814589665653, | |
| "grad_norm": 3.099548101425171, | |
| "learning_rate": 4.490057826989333e-06, | |
| "loss": 0.5345156788825989, | |
| "mean_token_accuracy": 0.8188414573669434, | |
| "num_tokens": 8277421.0, | |
| "step": 975 | |
| }, | |
| { | |
| "epoch": 0.7416413373860182, | |
| "grad_norm": 2.6421279907226562, | |
| "learning_rate": 4.488789462218988e-06, | |
| "loss": 0.3364614248275757, | |
| "mean_token_accuracy": 0.8776024580001831, | |
| "num_tokens": 8280573.0, | |
| "step": 976 | |
| }, | |
| { | |
| "epoch": 0.7424012158054711, | |
| "grad_norm": 3.1140081882476807, | |
| "learning_rate": 4.487519701639641e-06, | |
| "loss": 0.5701988339424133, | |
| "mean_token_accuracy": 0.803310215473175, | |
| "num_tokens": 8284623.0, | |
| "step": 977 | |
| }, | |
| { | |
| "epoch": 0.743161094224924, | |
| "grad_norm": 1.7440085411071777, | |
| "learning_rate": 4.486248546142459e-06, | |
| "loss": 0.46152663230895996, | |
| "mean_token_accuracy": 0.8350806832313538, | |
| "num_tokens": 8292830.0, | |
| "step": 978 | |
| }, | |
| { | |
| "epoch": 0.743920972644377, | |
| "grad_norm": 1.977761149406433, | |
| "learning_rate": 4.4849759966195885e-06, | |
| "loss": 0.5126093626022339, | |
| "mean_token_accuracy": 0.8274800777435303, | |
| "num_tokens": 8301102.0, | |
| "step": 979 | |
| }, | |
| { | |
| "epoch": 0.7446808510638298, | |
| "grad_norm": 1.3579951524734497, | |
| "learning_rate": 4.483702053964154e-06, | |
| "loss": 0.4015064835548401, | |
| "mean_token_accuracy": 0.8514711260795593, | |
| "num_tokens": 8315479.0, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 0.7454407294832827, | |
| "grad_norm": 1.7891197204589844, | |
| "learning_rate": 4.482426719070258e-06, | |
| "loss": 0.5263936519622803, | |
| "mean_token_accuracy": 0.8251112699508667, | |
| "num_tokens": 8326814.0, | |
| "step": 981 | |
| }, | |
| { | |
| "epoch": 0.7462006079027356, | |
| "grad_norm": 2.727473497390747, | |
| "learning_rate": 4.4811499928329775e-06, | |
| "loss": 0.3493611514568329, | |
| "mean_token_accuracy": 0.8714612722396851, | |
| "num_tokens": 8330310.0, | |
| "step": 982 | |
| }, | |
| { | |
| "epoch": 0.7469604863221885, | |
| "grad_norm": 2.1080844402313232, | |
| "learning_rate": 4.479871876148368e-06, | |
| "loss": 0.3950042724609375, | |
| "mean_token_accuracy": 0.8624709844589233, | |
| "num_tokens": 8336144.0, | |
| "step": 983 | |
| }, | |
| { | |
| "epoch": 0.7477203647416414, | |
| "grad_norm": 1.2591725587844849, | |
| "learning_rate": 4.478592369913464e-06, | |
| "loss": 0.38214361667633057, | |
| "mean_token_accuracy": 0.8684597015380859, | |
| "num_tokens": 8353340.0, | |
| "step": 984 | |
| }, | |
| { | |
| "epoch": 0.7484802431610942, | |
| "grad_norm": 2.859177827835083, | |
| "learning_rate": 4.477311475026271e-06, | |
| "loss": 0.39489829540252686, | |
| "mean_token_accuracy": 0.8582780361175537, | |
| "num_tokens": 8357108.0, | |
| "step": 985 | |
| }, | |
| { | |
| "epoch": 0.7492401215805471, | |
| "grad_norm": 1.7800242900848389, | |
| "learning_rate": 4.476029192385769e-06, | |
| "loss": 0.4666605591773987, | |
| "mean_token_accuracy": 0.8337945938110352, | |
| "num_tokens": 8364556.0, | |
| "step": 986 | |
| }, | |
| { | |
| "epoch": 0.75, | |
| "grad_norm": 2.1390371322631836, | |
| "learning_rate": 4.474745522891915e-06, | |
| "loss": 0.4520495533943176, | |
| "mean_token_accuracy": 0.8416281938552856, | |
| "num_tokens": 8370207.0, | |
| "step": 987 | |
| }, | |
| { | |
| "epoch": 0.7507598784194529, | |
| "grad_norm": 2.019336223602295, | |
| "learning_rate": 4.473460467445637e-06, | |
| "loss": 0.5331957340240479, | |
| "mean_token_accuracy": 0.8425475358963013, | |
| "num_tokens": 8379486.0, | |
| "step": 988 | |
| }, | |
| { | |
| "epoch": 0.7515197568389058, | |
| "grad_norm": 1.9489482641220093, | |
| "learning_rate": 4.472174026948836e-06, | |
| "loss": 0.4950482249259949, | |
| "mean_token_accuracy": 0.8206159472465515, | |
| "num_tokens": 8387177.0, | |
| "step": 989 | |
| }, | |
| { | |
| "epoch": 0.7522796352583586, | |
| "grad_norm": 3.1013834476470947, | |
| "learning_rate": 4.470886202304385e-06, | |
| "loss": 0.4596092104911804, | |
| "mean_token_accuracy": 0.843962550163269, | |
| "num_tokens": 8391049.0, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 0.7530395136778115, | |
| "grad_norm": 1.6871871948242188, | |
| "learning_rate": 4.469596994416131e-06, | |
| "loss": 0.4753928780555725, | |
| "mean_token_accuracy": 0.8504481315612793, | |
| "num_tokens": 8399793.0, | |
| "step": 991 | |
| }, | |
| { | |
| "epoch": 0.7537993920972644, | |
| "grad_norm": 2.4964523315429688, | |
| "learning_rate": 4.468306404188887e-06, | |
| "loss": 0.48669153451919556, | |
| "mean_token_accuracy": 0.8231956958770752, | |
| "num_tokens": 8405953.0, | |
| "step": 992 | |
| }, | |
| { | |
| "epoch": 0.7545592705167173, | |
| "grad_norm": 1.5213518142700195, | |
| "learning_rate": 4.467014432528441e-06, | |
| "loss": 0.42779433727264404, | |
| "mean_token_accuracy": 0.8506912589073181, | |
| "num_tokens": 8416035.0, | |
| "step": 993 | |
| }, | |
| { | |
| "epoch": 0.7553191489361702, | |
| "grad_norm": 1.985685110092163, | |
| "learning_rate": 4.465721080341547e-06, | |
| "loss": 0.5653210878372192, | |
| "mean_token_accuracy": 0.8135318756103516, | |
| "num_tokens": 8424522.0, | |
| "step": 994 | |
| }, | |
| { | |
| "epoch": 0.756079027355623, | |
| "grad_norm": 2.5402510166168213, | |
| "learning_rate": 4.4644263485359316e-06, | |
| "loss": 0.5187326669692993, | |
| "mean_token_accuracy": 0.8405015468597412, | |
| "num_tokens": 8428560.0, | |
| "step": 995 | |
| }, | |
| { | |
| "epoch": 0.756838905775076, | |
| "grad_norm": 2.289832592010498, | |
| "learning_rate": 4.463130238020284e-06, | |
| "loss": 0.5355351567268372, | |
| "mean_token_accuracy": 0.8116629123687744, | |
| "num_tokens": 8434082.0, | |
| "step": 996 | |
| }, | |
| { | |
| "epoch": 0.7575987841945289, | |
| "grad_norm": 1.4917112588882446, | |
| "learning_rate": 4.4618327497042676e-06, | |
| "loss": 0.3724251687526703, | |
| "mean_token_accuracy": 0.8679091334342957, | |
| "num_tokens": 8445348.0, | |
| "step": 997 | |
| }, | |
| { | |
| "epoch": 0.7583586626139818, | |
| "grad_norm": 2.608022451400757, | |
| "learning_rate": 4.460533884498509e-06, | |
| "loss": 0.43493184447288513, | |
| "mean_token_accuracy": 0.8561485409736633, | |
| "num_tokens": 8449571.0, | |
| "step": 998 | |
| }, | |
| { | |
| "epoch": 0.7591185410334347, | |
| "grad_norm": 3.3838589191436768, | |
| "learning_rate": 4.4592336433146e-06, | |
| "loss": 0.41346901655197144, | |
| "mean_token_accuracy": 0.8535987138748169, | |
| "num_tokens": 8453137.0, | |
| "step": 999 | |
| }, | |
| { | |
| "epoch": 0.7598784194528876, | |
| "grad_norm": 2.02105975151062, | |
| "learning_rate": 4.457932027065102e-06, | |
| "loss": 0.5325049757957458, | |
| "mean_token_accuracy": 0.8375140428543091, | |
| "num_tokens": 8459634.0, | |
| "step": 1000 | |
| } | |
| ], | |
| "logging_steps": 1.0, | |
| "max_steps": 3948, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 3, | |
| "save_steps": 1000, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": false | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 9.221454426433126e+16, | |
| "train_batch_size": 1, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |