Instructions to use camilablank/all_caps_data_steer with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use camilablank/all_caps_data_steer with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base_model, "camilablank/all_caps_data_steer")
- Transformers
How to use camilablank/all_caps_data_steer with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="camilablank/all_caps_data_steer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("camilablank/all_caps_data_steer", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use camilablank/all_caps_data_steer with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "camilablank/all_caps_data_steer"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "camilablank/all_caps_data_steer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/camilablank/all_caps_data_steer
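The curl call above can also be made from Python. The following is a minimal standard-library sketch. It assumes a server is already running locally on the default vLLM port 8000. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical. The code only relies on the OpenAI-style request shape used in the curl example and on the usual `choices[0].message.content` response field.

```python
import json
import urllib.request


def build_chat_request(model: str, user_content: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(model: str, user_content: str,
         base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(model, user_content, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

Usage, with the server running: `chat("camilablank/all_caps_data_steer", "What is the capital of France?")`. For the SGLang server below, pass `base_url="http://localhost:30000"`. Request building and response parsing are kept separate so they can be checked without a live server.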
- SGLang
How to use camilablank/all_caps_data_steer with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "camilablank/all_caps_data_steer" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "camilablank/all_caps_data_steer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "camilablank/all_caps_data_steer" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "camilablank/all_caps_data_steer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use camilablank/all_caps_data_steer with Docker Model Runner:
docker model run hf.co/camilablank/all_caps_data_steer
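The file that follows is the adapter's `trainer_state.json`. It appears to come from a Hugging Face `Trainer` run. The `entropy` and `mean_token_accuracy` fields suggest TRL's `SFTTrainer`, though the page does not say so.

It records 5201 global steps over 7 epochs, or 743 steps per epoch. Training metrics are logged every 10 steps, and there is one evaluation entry at each epoch boundary (`eval_loss` 0.11524 at step 743).

During warmup, the logged `learning_rate` matches `1e-4 * (step - 1) / 372`, for example 2.4194e-06 at step 10 and 9.9194e-05 at step 370. After that it decays slowly, reaching 9.8126e-05 at step 990. The file does not record which scheduler was used.

A minimal sketch for working with the file follows. `split_log_history` and `inferred_warmup_lr` are hypothetical helpers, and the peak rate of 1e-4 and the 372 warmup steps are inferred from the logged numbers, not read from a config.

```python
import json
from typing import Any


def load_state(path: str) -> dict[str, Any]:
    """Read a trainer_state.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_log_history(state: dict[str, Any]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split log_history into training entries (with "loss") and eval entries (with "eval_loss")."""
    history = state.get("log_history", [])
    train = [entry for entry in history if "loss" in entry]
    evals = [entry for entry in history if "eval_loss" in entry]
    return train, evals


def inferred_warmup_lr(step: int, peak_lr: float = 1e-4, warmup_steps: int = 372) -> float:
    """Linear-warmup value matching the logged learning_rate during warmup.

    ASSUMPTION: peak_lr=1e-4 and warmup_steps=372 are inferred from the logged
    values (lr at logged step s == peak_lr * (s - 1) / warmup_steps); they are
    not taken from a saved training config.
    """
    return peak_lr * (step - 1) / warmup_steps
```

For example, `train, evals = split_log_history(load_state("trainer_state.json"))` followed by `[(e["epoch"], e["eval_loss"]) for e in evals]` lists the per-epoch evaluation loss. In the portion shown here, the first pair is `(1.0, 0.11524093896150589)`.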
{
"best_global_step": null,
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 7.0,
"eval_steps": 500,
"global_step": 5201,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"entropy": 1.3228859663009644,
"epoch": 0.013458950201884253,
"grad_norm": 1.4003818035125732,
"learning_rate": 2.4193548387096776e-06,
"loss": 0.3947859287261963,
"mean_token_accuracy": 0.8766823470592499,
"num_tokens": 179351.0,
"step": 10
},
{
"entropy": 1.3223209857940674,
"epoch": 0.026917900403768506,
"grad_norm": 1.4293484687805176,
"learning_rate": 5.1075268817204305e-06,
"loss": 0.36992483139038085,
"mean_token_accuracy": 0.8841759026050567,
"num_tokens": 358681.0,
"step": 20
},
{
"entropy": 1.3241522073745728,
"epoch": 0.040376850605652756,
"grad_norm": 1.3410050868988037,
"learning_rate": 7.795698924731183e-06,
"loss": 0.3491047859191895,
"mean_token_accuracy": 0.8843624591827393,
"num_tokens": 538488.0,
"step": 30
},
{
"entropy": 1.3318131804466247,
"epoch": 0.05383580080753701,
"grad_norm": 1.08042573928833,
"learning_rate": 1.0483870967741936e-05,
"loss": 0.30254349708557127,
"mean_token_accuracy": 0.8926964640617371,
"num_tokens": 718118.0,
"step": 40
},
{
"entropy": 1.3337590098381042,
"epoch": 0.06729475100942127,
"grad_norm": 0.8615964651107788,
"learning_rate": 1.3172043010752688e-05,
"loss": 0.26572132110595703,
"mean_token_accuracy": 0.9041078448295593,
"num_tokens": 897209.0,
"step": 50
},
{
"entropy": 1.3126742005348206,
"epoch": 0.08075370121130551,
"grad_norm": 0.9317428469657898,
"learning_rate": 1.586021505376344e-05,
"loss": 0.23249435424804688,
"mean_token_accuracy": 0.9135358154773712,
"num_tokens": 1076448.0,
"step": 60
},
{
"entropy": 1.2961446046829224,
"epoch": 0.09421265141318977,
"grad_norm": 1.1822317838668823,
"learning_rate": 1.8548387096774193e-05,
"loss": 0.21359801292419434,
"mean_token_accuracy": 0.9192376554012298,
"num_tokens": 1255796.0,
"step": 70
},
{
"entropy": 1.2832220673561097,
"epoch": 0.10767160161507403,
"grad_norm": 1.1605602502822876,
"learning_rate": 2.1236559139784946e-05,
"loss": 0.20212924480438232,
"mean_token_accuracy": 0.9217188417911529,
"num_tokens": 1435611.0,
"step": 80
},
{
"entropy": 1.2762320637702942,
"epoch": 0.12113055181695828,
"grad_norm": 1.2006449699401855,
"learning_rate": 2.39247311827957e-05,
"loss": 0.18676402568817138,
"mean_token_accuracy": 0.9310197174549103,
"num_tokens": 1614872.0,
"step": 90
},
{
"entropy": 1.274475383758545,
"epoch": 0.13458950201884254,
"grad_norm": 1.2656220197677612,
"learning_rate": 2.661290322580645e-05,
"loss": 0.17775335311889648,
"mean_token_accuracy": 0.9305342018604279,
"num_tokens": 1794050.0,
"step": 100
},
{
"entropy": 1.2607086300849915,
"epoch": 0.1480484522207268,
"grad_norm": 1.2415285110473633,
"learning_rate": 2.9301075268817207e-05,
"loss": 0.17471678256988527,
"mean_token_accuracy": 0.9317916095256805,
"num_tokens": 1973292.0,
"step": 110
},
{
"entropy": 1.2618489027023316,
"epoch": 0.16150740242261102,
"grad_norm": 1.268917202949524,
"learning_rate": 3.198924731182796e-05,
"loss": 0.17752587795257568,
"mean_token_accuracy": 0.9324639737606049,
"num_tokens": 2152526.0,
"step": 120
},
{
"entropy": 1.2617292761802674,
"epoch": 0.17496635262449528,
"grad_norm": 1.1972988843917847,
"learning_rate": 3.467741935483872e-05,
"loss": 0.1649715781211853,
"mean_token_accuracy": 0.9361909210681916,
"num_tokens": 2331952.0,
"step": 130
},
{
"entropy": 1.2492274522781373,
"epoch": 0.18842530282637954,
"grad_norm": 1.2891550064086914,
"learning_rate": 3.736559139784947e-05,
"loss": 0.15520352125167847,
"mean_token_accuracy": 0.939128577709198,
"num_tokens": 2510125.0,
"step": 140
},
{
"entropy": 1.2332049250602721,
"epoch": 0.2018842530282638,
"grad_norm": 1.4108549356460571,
"learning_rate": 4.005376344086022e-05,
"loss": 0.16094473600387574,
"mean_token_accuracy": 0.936911940574646,
"num_tokens": 2690100.0,
"step": 150
},
{
"entropy": 1.2368563890457154,
"epoch": 0.21534320323014805,
"grad_norm": 1.2720197439193726,
"learning_rate": 4.2741935483870973e-05,
"loss": 0.15074102878570556,
"mean_token_accuracy": 0.9419298887252807,
"num_tokens": 2869072.0,
"step": 160
},
{
"entropy": 1.2400941371917724,
"epoch": 0.2288021534320323,
"grad_norm": 1.1434999704360962,
"learning_rate": 4.543010752688172e-05,
"loss": 0.15364220142364501,
"mean_token_accuracy": 0.9395892560482025,
"num_tokens": 3048179.0,
"step": 170
},
{
"entropy": 1.2370286583900452,
"epoch": 0.24226110363391656,
"grad_norm": 1.1543998718261719,
"learning_rate": 4.811827956989248e-05,
"loss": 0.1518303394317627,
"mean_token_accuracy": 0.9403793513774872,
"num_tokens": 3227648.0,
"step": 180
},
{
"entropy": 1.2381223678588866,
"epoch": 0.2557200538358008,
"grad_norm": 1.2674320936203003,
"learning_rate": 5.080645161290323e-05,
"loss": 0.14720194339752196,
"mean_token_accuracy": 0.9413308918476104,
"num_tokens": 3406443.0,
"step": 190
},
{
"entropy": 1.2209740161895752,
"epoch": 0.2691790040376851,
"grad_norm": 1.3895965814590454,
"learning_rate": 5.349462365591398e-05,
"loss": 0.14467229843139648,
"mean_token_accuracy": 0.9425995826721192,
"num_tokens": 3585139.0,
"step": 200
},
{
"entropy": 1.202421200275421,
"epoch": 0.28263795423956933,
"grad_norm": 1.1748970746994019,
"learning_rate": 5.618279569892473e-05,
"loss": 0.14418480396270753,
"mean_token_accuracy": 0.944035142660141,
"num_tokens": 3764815.0,
"step": 210
},
{
"entropy": 1.2019111514091492,
"epoch": 0.2960969044414536,
"grad_norm": 1.498073935508728,
"learning_rate": 5.887096774193549e-05,
"loss": 0.1409894585609436,
"mean_token_accuracy": 0.9451915919780731,
"num_tokens": 3944066.0,
"step": 220
},
{
"entropy": 1.208519995212555,
"epoch": 0.30955585464333785,
"grad_norm": 1.3500250577926636,
"learning_rate": 6.155913978494624e-05,
"loss": 0.14032076597213744,
"mean_token_accuracy": 0.9437436699867249,
"num_tokens": 4122561.0,
"step": 230
},
{
"entropy": 1.2071591854095458,
"epoch": 0.32301480484522205,
"grad_norm": 1.2740904092788696,
"learning_rate": 6.4247311827957e-05,
"loss": 0.13652472496032714,
"mean_token_accuracy": 0.9458230376243592,
"num_tokens": 4301558.0,
"step": 240
},
{
"entropy": 1.2087280392646789,
"epoch": 0.3364737550471063,
"grad_norm": 1.2498955726623535,
"learning_rate": 6.693548387096774e-05,
"loss": 0.137939453125,
"mean_token_accuracy": 0.9447133004665375,
"num_tokens": 4480708.0,
"step": 250
},
{
"entropy": 1.1996983647346497,
"epoch": 0.34993270524899056,
"grad_norm": 1.4520384073257446,
"learning_rate": 6.962365591397851e-05,
"loss": 0.14351414442062377,
"mean_token_accuracy": 0.9436001539230346,
"num_tokens": 4659616.0,
"step": 260
},
{
"entropy": 1.195546042919159,
"epoch": 0.3633916554508748,
"grad_norm": 1.1850366592407227,
"learning_rate": 7.231182795698926e-05,
"loss": 0.13713514804840088,
"mean_token_accuracy": 0.946206146478653,
"num_tokens": 4838619.0,
"step": 270
},
{
"entropy": 1.1904785871505736,
"epoch": 0.3768506056527591,
"grad_norm": 1.4309216737747192,
"learning_rate": 7.500000000000001e-05,
"loss": 0.13457468748092652,
"mean_token_accuracy": 0.9473767280578613,
"num_tokens": 5017668.0,
"step": 280
},
{
"entropy": 1.1928183078765868,
"epoch": 0.39030955585464333,
"grad_norm": 1.1399856805801392,
"learning_rate": 7.768817204301076e-05,
"loss": 0.13286283016204833,
"mean_token_accuracy": 0.9468790471553803,
"num_tokens": 5197412.0,
"step": 290
},
{
"entropy": 1.1921735286712647,
"epoch": 0.4037685060565276,
"grad_norm": 1.3263301849365234,
"learning_rate": 8.037634408602151e-05,
"loss": 0.1312130570411682,
"mean_token_accuracy": 0.9481729447841645,
"num_tokens": 5377026.0,
"step": 300
},
{
"entropy": 1.1928237080574036,
"epoch": 0.41722745625841184,
"grad_norm": 1.6233983039855957,
"learning_rate": 8.306451612903227e-05,
"loss": 0.1323886036872864,
"mean_token_accuracy": 0.9465976357460022,
"num_tokens": 5556708.0,
"step": 310
},
{
"entropy": 1.1836108446121216,
"epoch": 0.4306864064602961,
"grad_norm": 1.3058593273162842,
"learning_rate": 8.575268817204302e-05,
"loss": 0.13815442323684693,
"mean_token_accuracy": 0.9450344681739807,
"num_tokens": 5735495.0,
"step": 320
},
{
"entropy": 1.179207694530487,
"epoch": 0.44414535666218036,
"grad_norm": 1.43180513381958,
"learning_rate": 8.844086021505377e-05,
"loss": 0.13185629844665528,
"mean_token_accuracy": 0.9473332643508912,
"num_tokens": 5914707.0,
"step": 330
},
{
"entropy": 1.1725443840026855,
"epoch": 0.4576043068640646,
"grad_norm": 1.391575813293457,
"learning_rate": 9.112903225806452e-05,
"loss": 0.13046096563339232,
"mean_token_accuracy": 0.9486047685146332,
"num_tokens": 6093788.0,
"step": 340
},
{
"entropy": 1.17471581697464,
"epoch": 0.47106325706594887,
"grad_norm": 1.5146636962890625,
"learning_rate": 9.381720430107528e-05,
"loss": 0.13146305084228516,
"mean_token_accuracy": 0.9477684259414673,
"num_tokens": 6272916.0,
"step": 350
},
{
"entropy": 1.1633504509925843,
"epoch": 0.4845222072678331,
"grad_norm": 1.1611223220825195,
"learning_rate": 9.650537634408603e-05,
"loss": 0.12906880378723146,
"mean_token_accuracy": 0.9491180777549744,
"num_tokens": 6452766.0,
"step": 360
},
{
"entropy": 1.1688304781913756,
"epoch": 0.4979811574697174,
"grad_norm": 1.5618613958358765,
"learning_rate": 9.919354838709678e-05,
"loss": 0.1296112060546875,
"mean_token_accuracy": 0.9494616508483886,
"num_tokens": 6632087.0,
"step": 370
},
{
"entropy": 1.1618135571479797,
"epoch": 0.5114401076716016,
"grad_norm": 1.259940266609192,
"learning_rate": 9.999975729865971e-05,
"loss": 0.12792425155639647,
"mean_token_accuracy": 0.9487492978572846,
"num_tokens": 6811480.0,
"step": 380
},
{
"entropy": 1.1698922395706177,
"epoch": 0.5248990578734859,
"grad_norm": 1.4075767993927002,
"learning_rate": 9.999856856307314e-05,
"loss": 0.12948644161224365,
"mean_token_accuracy": 0.9471172571182251,
"num_tokens": 6990914.0,
"step": 390
},
{
"entropy": 1.1714325428009034,
"epoch": 0.5383580080753702,
"grad_norm": 1.2077672481536865,
"learning_rate": 9.999638923896533e-05,
"loss": 0.12964634895324706,
"mean_token_accuracy": 0.9472773969173431,
"num_tokens": 7170193.0,
"step": 400
},
{
"entropy": 1.172813320159912,
"epoch": 0.5518169582772544,
"grad_norm": 1.506191611289978,
"learning_rate": 9.999321936951374e-05,
"loss": 0.12280762195587158,
"mean_token_accuracy": 0.9517758071422577,
"num_tokens": 7349496.0,
"step": 410
},
{
"entropy": 1.179931104183197,
"epoch": 0.5652759084791387,
"grad_norm": 1.2185982465744019,
"learning_rate": 9.998905901752091e-05,
"loss": 0.12760610580444337,
"mean_token_accuracy": 0.948979276418686,
"num_tokens": 7528648.0,
"step": 420
},
{
"entropy": 1.1817825198173524,
"epoch": 0.5787348586810229,
"grad_norm": 1.285261631011963,
"learning_rate": 9.998390826541315e-05,
"loss": 0.13066411018371582,
"mean_token_accuracy": 0.947555410861969,
"num_tokens": 7707487.0,
"step": 430
},
{
"entropy": 1.1945344924926757,
"epoch": 0.5921938088829072,
"grad_norm": 1.2778966426849365,
"learning_rate": 9.997776721523888e-05,
"loss": 0.13003890514373778,
"mean_token_accuracy": 0.9476695477962493,
"num_tokens": 7886452.0,
"step": 440
},
{
"entropy": 1.192435622215271,
"epoch": 0.6056527590847914,
"grad_norm": 1.2676047086715698,
"learning_rate": 9.99706359886667e-05,
"loss": 0.13059219121932983,
"mean_token_accuracy": 0.9467391848564148,
"num_tokens": 8065093.0,
"step": 450
},
{
"entropy": 1.1952194094657898,
"epoch": 0.6191117092866757,
"grad_norm": 1.1667490005493164,
"learning_rate": 9.996251472698281e-05,
"loss": 0.1308892250061035,
"mean_token_accuracy": 0.9474103152751923,
"num_tokens": 8245294.0,
"step": 460
},
{
"entropy": 1.1830523014068604,
"epoch": 0.6325706594885598,
"grad_norm": 1.4168891906738281,
"learning_rate": 9.995340359108844e-05,
"loss": 0.1230043888092041,
"mean_token_accuracy": 0.9503998339176178,
"num_tokens": 8424334.0,
"step": 470
},
{
"entropy": 1.1928761839866637,
"epoch": 0.6460296096904441,
"grad_norm": 1.4041002988815308,
"learning_rate": 9.994330276149649e-05,
"loss": 0.12544957399368287,
"mean_token_accuracy": 0.9496485233306885,
"num_tokens": 8603758.0,
"step": 480
},
{
"entropy": 1.180042278766632,
"epoch": 0.6594885598923284,
"grad_norm": 1.2816451787948608,
"learning_rate": 9.993221243832797e-05,
"loss": 0.1197009801864624,
"mean_token_accuracy": 0.9527871966361999,
"num_tokens": 8782936.0,
"step": 490
},
{
"entropy": 1.1822824954986573,
"epoch": 0.6729475100942126,
"grad_norm": 1.0554879903793335,
"learning_rate": 9.992013284130816e-05,
"loss": 0.12416183948516846,
"mean_token_accuracy": 0.9488660097122192,
"num_tokens": 8962286.0,
"step": 500
},
{
"entropy": 1.1877296924591065,
"epoch": 0.6864064602960969,
"grad_norm": 2.272390842437744,
"learning_rate": 9.990706420976206e-05,
"loss": 0.12660024166107178,
"mean_token_accuracy": 0.9495710134506226,
"num_tokens": 9141244.0,
"step": 510
},
{
"entropy": 1.195984995365143,
"epoch": 0.6998654104979811,
"grad_norm": 1.106213927268982,
"learning_rate": 9.989300680260985e-05,
"loss": 0.12362114191055298,
"mean_token_accuracy": 0.9512333691120147,
"num_tokens": 9319796.0,
"step": 520
},
{
"entropy": 1.1847575902938843,
"epoch": 0.7133243606998654,
"grad_norm": 1.2385672330856323,
"learning_rate": 9.98779608983616e-05,
"loss": 0.12053300142288208,
"mean_token_accuracy": 0.9517379641532898,
"num_tokens": 9498637.0,
"step": 530
},
{
"entropy": 1.1887083888053893,
"epoch": 0.7267833109017496,
"grad_norm": 1.1992591619491577,
"learning_rate": 9.986192679511189e-05,
"loss": 0.12146525382995606,
"mean_token_accuracy": 0.9509431838989257,
"num_tokens": 9678136.0,
"step": 540
},
{
"entropy": 1.1956284999847413,
"epoch": 0.7402422611036339,
"grad_norm": 1.4346206188201904,
"learning_rate": 9.984490481053372e-05,
"loss": 0.12582865953445435,
"mean_token_accuracy": 0.9495353937149048,
"num_tokens": 9856943.0,
"step": 550
},
{
"entropy": 1.183466374874115,
"epoch": 0.7537012113055181,
"grad_norm": 1.1207399368286133,
"learning_rate": 9.982689528187244e-05,
"loss": 0.11938930749893188,
"mean_token_accuracy": 0.9524446070194245,
"num_tokens": 10036305.0,
"step": 560
},
{
"entropy": 1.1996586084365846,
"epoch": 0.7671601615074024,
"grad_norm": 1.0004290342330933,
"learning_rate": 9.98078985659389e-05,
"loss": 0.1256342649459839,
"mean_token_accuracy": 0.9488184452056885,
"num_tokens": 10215049.0,
"step": 570
},
{
"entropy": 1.2113700032234191,
"epoch": 0.7806191117092867,
"grad_norm": 1.243759036064148,
"learning_rate": 9.978791503910246e-05,
"loss": 0.11844713687896728,
"mean_token_accuracy": 0.9520347356796265,
"num_tokens": 10393513.0,
"step": 580
},
{
"entropy": 1.2171649575233459,
"epoch": 0.7940780619111709,
"grad_norm": 1.2175841331481934,
"learning_rate": 9.97669450972835e-05,
"loss": 0.1155052900314331,
"mean_token_accuracy": 0.954187548160553,
"num_tokens": 10572502.0,
"step": 590
},
{
"entropy": 1.2295325994491577,
"epoch": 0.8075370121130552,
"grad_norm": 1.1670854091644287,
"learning_rate": 9.974498915594557e-05,
"loss": 0.12255362272262574,
"mean_token_accuracy": 0.9510588347911835,
"num_tokens": 10751857.0,
"step": 600
},
{
"entropy": 1.2220084905624389,
"epoch": 0.8209959623149394,
"grad_norm": 1.3236212730407715,
"learning_rate": 9.97220476500872e-05,
"loss": 0.1217005968093872,
"mean_token_accuracy": 0.9508337616920471,
"num_tokens": 10931362.0,
"step": 610
},
{
"entropy": 1.20922110080719,
"epoch": 0.8344549125168237,
"grad_norm": 1.2529112100601196,
"learning_rate": 9.969812103423325e-05,
"loss": 0.11833038330078124,
"mean_token_accuracy": 0.9529603838920593,
"num_tokens": 11111075.0,
"step": 620
},
{
"entropy": 1.2208962082862853,
"epoch": 0.847913862718708,
"grad_norm": 1.2380986213684082,
"learning_rate": 9.967320978242592e-05,
"loss": 0.12019131183624268,
"mean_token_accuracy": 0.9517916083335877,
"num_tokens": 11289952.0,
"step": 630
},
{
"entropy": 1.206966769695282,
"epoch": 0.8613728129205922,
"grad_norm": 1.2476933002471924,
"learning_rate": 9.964731438821533e-05,
"loss": 0.11783044338226319,
"mean_token_accuracy": 0.9523464858531951,
"num_tokens": 11469661.0,
"step": 640
},
{
"entropy": 1.2062023997306823,
"epoch": 0.8748317631224765,
"grad_norm": 1.4155808687210083,
"learning_rate": 9.962043536464978e-05,
"loss": 0.12099127769470215,
"mean_token_accuracy": 0.9519050180912018,
"num_tokens": 11648570.0,
"step": 650
},
{
"entropy": 1.2050026655197144,
"epoch": 0.8882907133243607,
"grad_norm": 1.309507966041565,
"learning_rate": 9.959257324426556e-05,
"loss": 0.11565302610397339,
"mean_token_accuracy": 0.9535071849822998,
"num_tokens": 11827640.0,
"step": 660
},
{
"entropy": 1.2138132452964783,
"epoch": 0.901749663526245,
"grad_norm": 1.150227427482605,
"learning_rate": 9.95637285790764e-05,
"loss": 0.11565654277801514,
"mean_token_accuracy": 0.9536015927791596,
"num_tokens": 12006419.0,
"step": 670
},
{
"entropy": 1.2211383819580077,
"epoch": 0.9152086137281292,
"grad_norm": 1.3185595273971558,
"learning_rate": 9.953390194056258e-05,
"loss": 0.11686277389526367,
"mean_token_accuracy": 0.9518564403057098,
"num_tokens": 12184806.0,
"step": 680
},
{
"entropy": 1.233402180671692,
"epoch": 0.9286675639300135,
"grad_norm": 1.160781979560852,
"learning_rate": 9.950309391965947e-05,
"loss": 0.11723113059997559,
"mean_token_accuracy": 0.9525671184062958,
"num_tokens": 12363767.0,
"step": 690
},
{
"entropy": 1.2254271149635314,
"epoch": 0.9421265141318977,
"grad_norm": 1.0756208896636963,
"learning_rate": 9.947130512674602e-05,
"loss": 0.11969656944274902,
"mean_token_accuracy": 0.9499428868293762,
"num_tokens": 12542727.0,
"step": 700
},
{
"entropy": 1.2217535138130189,
"epoch": 0.955585464333782,
"grad_norm": 1.131346344947815,
"learning_rate": 9.943853619163255e-05,
"loss": 0.11605353355407715,
"mean_token_accuracy": 0.9536243081092834,
"num_tokens": 12721825.0,
"step": 710
},
{
"entropy": 1.2145210385322571,
"epoch": 0.9690444145356663,
"grad_norm": 1.0480105876922607,
"learning_rate": 9.94047877635482e-05,
"loss": 0.11278635263442993,
"mean_token_accuracy": 0.9553675949573517,
"num_tokens": 12902291.0,
"step": 720
},
{
"entropy": 1.2308586597442628,
"epoch": 0.9825033647375505,
"grad_norm": 1.1793105602264404,
"learning_rate": 9.93700605111283e-05,
"loss": 0.11050724983215332,
"mean_token_accuracy": 0.9547911584377289,
"num_tokens": 13082065.0,
"step": 730
},
{
"entropy": 1.2493423819541931,
"epoch": 0.9959623149394348,
"grad_norm": 1.289297103881836,
"learning_rate": 9.933435512240084e-05,
"loss": 0.11567041873931885,
"mean_token_accuracy": 0.9526383280754089,
"num_tokens": 13261041.0,
"step": 740
},
{
"epoch": 1.0,
"eval_entropy": 1.229653848204643,
"eval_loss": 0.11524093896150589,
"eval_mean_token_accuracy": 0.9533016294430775,
"eval_num_tokens": 13314766.0,
"eval_runtime": 13.2833,
"eval_samples_per_second": 376.412,
"eval_steps_per_second": 11.819,
"step": 743
},
{
"entropy": 1.233021354675293,
"epoch": 1.009421265141319,
"grad_norm": 1.3314229249954224,
"learning_rate": 9.929767230477305e-05,
"loss": 0.10805211067199708,
"mean_token_accuracy": 0.9576423108577728,
"num_tokens": 13440438.0,
"step": 750
},
{
"entropy": 1.2277064085006715,
"epoch": 1.0228802153432033,
"grad_norm": 1.194751262664795,
"learning_rate": 9.92600127850173e-05,
"loss": 0.09916897416114807,
"mean_token_accuracy": 0.9617054045200348,
"num_tokens": 13619055.0,
"step": 760
},
{
"entropy": 1.2193793654441833,
"epoch": 1.0363391655450875,
"grad_norm": 1.3674660921096802,
"learning_rate": 9.922137730925673e-05,
"loss": 0.09446401596069336,
"mean_token_accuracy": 0.9620481312274933,
"num_tokens": 13798753.0,
"step": 770
},
{
"entropy": 1.2068825483322143,
"epoch": 1.0497981157469718,
"grad_norm": 1.175338625907898,
"learning_rate": 9.918176664295041e-05,
"loss": 0.09437270164489746,
"mean_token_accuracy": 0.9626644790172577,
"num_tokens": 13978445.0,
"step": 780
},
{
"entropy": 1.2036906003952026,
"epoch": 1.063257065948856,
"grad_norm": 1.3463256359100342,
"learning_rate": 9.914118157087824e-05,
"loss": 0.09322788715362548,
"mean_token_accuracy": 0.9640586376190186,
"num_tokens": 14157400.0,
"step": 790
},
{
"entropy": 1.195313036441803,
"epoch": 1.0767160161507403,
"grad_norm": 1.2407623529434204,
"learning_rate": 9.909962289712538e-05,
"loss": 0.10000712871551513,
"mean_token_accuracy": 0.9604595184326172,
"num_tokens": 14337209.0,
"step": 800
},
{
"entropy": 1.1997171878814696,
"epoch": 1.0901749663526246,
"grad_norm": 1.4212044477462769,
"learning_rate": 9.905709144506629e-05,
"loss": 0.09967402815818786,
"mean_token_accuracy": 0.9616046726703644,
"num_tokens": 14516327.0,
"step": 810
},
{
"entropy": 1.1916023015975952,
"epoch": 1.1036339165545088,
"grad_norm": 1.296561598777771,
"learning_rate": 9.901358805734846e-05,
"loss": 0.09139133095741273,
"mean_token_accuracy": 0.9634343802928924,
"num_tokens": 14695257.0,
"step": 820
},
{
"entropy": 1.1924808621406555,
"epoch": 1.117092866756393,
"grad_norm": 1.3003162145614624,
"learning_rate": 9.89691135958757e-05,
"loss": 0.0935364007949829,
"mean_token_accuracy": 0.9624580383300781,
"num_tokens": 14874609.0,
"step": 830
},
{
"entropy": 1.2110714197158814,
"epoch": 1.1305518169582773,
"grad_norm": 1.341585636138916,
"learning_rate": 9.892366894179105e-05,
"loss": 0.09882450699806214,
"mean_token_accuracy": 0.961030250787735,
"num_tokens": 15053971.0,
"step": 840
},
{
"entropy": 1.2037933468818665,
"epoch": 1.1440107671601616,
"grad_norm": 1.4529608488082886,
"learning_rate": 9.887725499545937e-05,
"loss": 0.09266124367713928,
"mean_token_accuracy": 0.9641264617443085,
"num_tokens": 15233217.0,
"step": 850
},
{
"entropy": 1.210195577144623,
"epoch": 1.1574697173620458,
"grad_norm": 0.9631540179252625,
"learning_rate": 9.882987267644939e-05,
"loss": 0.09560335874557495,
"mean_token_accuracy": 0.9616110920906067,
"num_tokens": 15412460.0,
"step": 860
},
{
"entropy": 1.210800564289093,
"epoch": 1.17092866756393,
"grad_norm": 1.078429937362671,
"learning_rate": 9.878152292351563e-05,
"loss": 0.0967819094657898,
"mean_token_accuracy": 0.960888934135437,
"num_tokens": 15590882.0,
"step": 870
},
{
"entropy": 1.204759907722473,
"epoch": 1.1843876177658144,
"grad_norm": 1.132325530052185,
"learning_rate": 9.873220669457975e-05,
"loss": 0.09479628801345825,
"mean_token_accuracy": 0.9629071593284607,
"num_tokens": 15770658.0,
"step": 880
},
{
"entropy": 1.2071414232254027,
"epoch": 1.1978465679676986,
"grad_norm": 1.0902807712554932,
"learning_rate": 9.868192496671147e-05,
"loss": 0.09629296064376831,
"mean_token_accuracy": 0.9622378885746002,
"num_tokens": 15950126.0,
"step": 890
},
{
"entropy": 1.2057334661483765,
"epoch": 1.2113055181695827,
"grad_norm": 1.2059770822525024,
"learning_rate": 9.86306787361094e-05,
"loss": 0.09764755368232728,
"mean_token_accuracy": 0.962404465675354,
"num_tokens": 16129315.0,
"step": 900
},
{
"entropy": 1.2059934735298157,
"epoch": 1.224764468371467,
"grad_norm": 1.5481969118118286,
"learning_rate": 9.857846901808117e-05,
"loss": 0.09670655727386475,
"mean_token_accuracy": 0.9619201004505158,
"num_tokens": 16307839.0,
"step": 910
},
{
"entropy": 1.2136071562767028,
"epoch": 1.2382234185733512,
"grad_norm": 1.1171293258666992,
"learning_rate": 9.852529684702329e-05,
"loss": 0.09502402544021607,
"mean_token_accuracy": 0.9619876623153687,
"num_tokens": 16487021.0,
"step": 920
},
{
"entropy": 1.232442605495453,
"epoch": 1.2516823687752354,
"grad_norm": 1.2836637496948242,
"learning_rate": 9.847116327640082e-05,
"loss": 0.09930729866027832,
"mean_token_accuracy": 0.9604007601737976,
"num_tokens": 16665995.0,
"step": 930
},
{
"entropy": 1.2321829080581665,
"epoch": 1.2651413189771197,
"grad_norm": 1.1843444108963013,
"learning_rate": 9.841606937872632e-05,
"loss": 0.10086537599563598,
"mean_token_accuracy": 0.9602800250053406,
"num_tokens": 16845090.0,
"step": 940
},
{
"entropy": 1.233613657951355,
"epoch": 1.278600269179004,
"grad_norm": 1.3496166467666626,
"learning_rate": 9.836001624553869e-05,
"loss": 0.09795907735824586,
"mean_token_accuracy": 0.9610718429088593,
"num_tokens": 17024295.0,
"step": 950
},
{
"entropy": 1.2220023155212403,
"epoch": 1.2920592193808882,
"grad_norm": 1.238175392150879,
"learning_rate": 9.830300498738152e-05,
"loss": 0.09709340333938599,
"mean_token_accuracy": 0.9621975898742676,
"num_tokens": 17203525.0,
"step": 960
},
{
"entropy": 1.2186771392822267,
"epoch": 1.3055181695827724,
"grad_norm": 1.0820763111114502,
"learning_rate": 9.824503673378112e-05,
"loss": 0.09260507822036743,
"mean_token_accuracy": 0.9632427036762238,
"num_tokens": 17382269.0,
"step": 970
},
{
"entropy": 1.212946391105652,
"epoch": 1.3189771197846567,
"grad_norm": 1.253194808959961,
"learning_rate": 9.81861126332241e-05,
"loss": 0.10012803077697754,
"mean_token_accuracy": 0.9596389472484589,
"num_tokens": 17561951.0,
"step": 980
},
{
"entropy": 1.2092903971672058,
"epoch": 1.332436069986541,
"grad_norm": 1.6471713781356812,
"learning_rate": 9.812623385313461e-05,
"loss": 0.1032632827758789,
"mean_token_accuracy": 0.9594815850257874,
"num_tokens": 17741116.0,
"step": 990
},
{
"entropy": 1.2158336997032166,
"epoch": 1.3458950201884252,
"grad_norm": 1.076393723487854,
"learning_rate": 9.806540157985131e-05,
"loss": 0.09857285022735596,
"mean_token_accuracy": 0.9608540177345276,
| "num_tokens": 17920249.0, | |
| "step": 1000 | |
| }, | |
| { | |
| "entropy": 1.2093246698379516, | |
| "epoch": 1.3593539703903095, | |
| "grad_norm": 1.1203004121780396, | |
| "learning_rate": 9.800361701860368e-05, | |
| "loss": 0.09807900190353394, | |
| "mean_token_accuracy": 0.9611685931682586, | |
| "num_tokens": 18099006.0, | |
| "step": 1010 | |
| }, | |
| { | |
| "entropy": 1.2070690989494324, | |
| "epoch": 1.3728129205921937, | |
| "grad_norm": 1.3285764455795288, | |
| "learning_rate": 9.794088139348835e-05, | |
| "loss": 0.10283086299896241, | |
| "mean_token_accuracy": 0.9585156977176666, | |
| "num_tokens": 18277971.0, | |
| "step": 1020 | |
| }, | |
| { | |
| "entropy": 1.2022451281547546, | |
| "epoch": 1.386271870794078, | |
| "grad_norm": 1.0949617624282837, | |
| "learning_rate": 9.787719594744468e-05, | |
| "loss": 0.10161725282669068, | |
| "mean_token_accuracy": 0.9598902583122253, | |
| "num_tokens": 18457464.0, | |
| "step": 1030 | |
| }, | |
| { | |
| "entropy": 1.2045769929885863, | |
| "epoch": 1.3997308209959622, | |
| "grad_norm": 1.008150577545166, | |
| "learning_rate": 9.781256194223023e-05, | |
| "loss": 0.10038440227508545, | |
| "mean_token_accuracy": 0.960367614030838, | |
| "num_tokens": 18636876.0, | |
| "step": 1040 | |
| }, | |
| { | |
| "entropy": 1.204549217224121, | |
| "epoch": 1.4131897711978465, | |
| "grad_norm": 1.0495935678482056, | |
| "learning_rate": 9.774698065839577e-05, | |
| "loss": 0.09564157128334046, | |
| "mean_token_accuracy": 0.9625212967395782, | |
| "num_tokens": 18816243.0, | |
| "step": 1050 | |
| }, | |
| { | |
| "entropy": 1.2045063614845275, | |
| "epoch": 1.4266487213997308, | |
| "grad_norm": 1.2372835874557495, | |
| "learning_rate": 9.768045339525979e-05, | |
| "loss": 0.09781360626220703, | |
| "mean_token_accuracy": 0.9605839848518372, | |
| "num_tokens": 18995594.0, | |
| "step": 1060 | |
| }, | |
| { | |
| "entropy": 1.2258678078651428, | |
| "epoch": 1.440107671601615, | |
| "grad_norm": 1.0772687196731567, | |
| "learning_rate": 9.76129814708829e-05, | |
| "loss": 0.09291026592254639, | |
| "mean_token_accuracy": 0.9634248733520507, | |
| "num_tokens": 19173887.0, | |
| "step": 1070 | |
| }, | |
| { | |
| "entropy": 1.2230794191360475, | |
| "epoch": 1.4535666218034993, | |
| "grad_norm": 1.2008293867111206, | |
| "learning_rate": 9.754456622204167e-05, | |
| "loss": 0.09285001754760742, | |
| "mean_token_accuracy": 0.9633622407913208, | |
| "num_tokens": 19352678.0, | |
| "step": 1080 | |
| }, | |
| { | |
| "entropy": 1.2313218355178832, | |
| "epoch": 1.4670255720053835, | |
| "grad_norm": 1.5826188325881958, | |
| "learning_rate": 9.747520900420209e-05, | |
| "loss": 0.1002782940864563, | |
| "mean_token_accuracy": 0.9600823521614075, | |
| "num_tokens": 19532077.0, | |
| "step": 1090 | |
| }, | |
| { | |
| "entropy": 1.2246542692184448, | |
| "epoch": 1.4804845222072678, | |
| "grad_norm": 1.3970143795013428, | |
| "learning_rate": 9.740491119149277e-05, | |
| "loss": 0.1005969524383545, | |
| "mean_token_accuracy": 0.9596368432044983, | |
| "num_tokens": 19710609.0, | |
| "step": 1100 | |
| }, | |
| { | |
| "entropy": 1.207058048248291, | |
| "epoch": 1.493943472409152, | |
| "grad_norm": 1.3544780015945435, | |
| "learning_rate": 9.733367417667773e-05, | |
| "loss": 0.09367164373397827, | |
| "mean_token_accuracy": 0.9632523238658905, | |
| "num_tokens": 19889820.0, | |
| "step": 1110 | |
| }, | |
| { | |
| "entropy": 1.2027259588241577, | |
| "epoch": 1.5074024226110363, | |
| "grad_norm": 1.2393465042114258, | |
| "learning_rate": 9.726149937112873e-05, | |
| "loss": 0.09854428172111511, | |
| "mean_token_accuracy": 0.9612930059432984, | |
| "num_tokens": 20069561.0, | |
| "step": 1120 | |
| }, | |
| { | |
| "entropy": 1.2199820518493651, | |
| "epoch": 1.5208613728129206, | |
| "grad_norm": 1.4061861038208008, | |
| "learning_rate": 9.718838820479743e-05, | |
| "loss": 0.09687533378601074, | |
| "mean_token_accuracy": 0.9612306416034698, | |
| "num_tokens": 20249088.0, | |
| "step": 1130 | |
| }, | |
| { | |
| "entropy": 1.2114709615707397, | |
| "epoch": 1.5343203230148048, | |
| "grad_norm": 1.2970331907272339, | |
| "learning_rate": 9.711434212618691e-05, | |
| "loss": 0.09762253165245056, | |
| "mean_token_accuracy": 0.9609376013278961, | |
| "num_tokens": 20428600.0, | |
| "step": 1140 | |
| }, | |
| { | |
| "entropy": 1.1982413172721862, | |
| "epoch": 1.547779273216689, | |
| "grad_norm": 1.621308445930481, | |
| "learning_rate": 9.703936260232308e-05, | |
| "loss": 0.09679374098777771, | |
| "mean_token_accuracy": 0.9625207364559174, | |
| "num_tokens": 20608047.0, | |
| "step": 1150 | |
| }, | |
| { | |
| "entropy": 1.195889377593994, | |
| "epoch": 1.5612382234185733, | |
| "grad_norm": 1.2940045595169067, | |
| "learning_rate": 9.696345111872557e-05, | |
| "loss": 0.09699609279632568, | |
| "mean_token_accuracy": 0.96190345287323, | |
| "num_tokens": 20787142.0, | |
| "step": 1160 | |
| }, | |
| { | |
| "entropy": 1.1944554448127747, | |
| "epoch": 1.5746971736204576, | |
| "grad_norm": 1.3155335187911987, | |
| "learning_rate": 9.688660917937838e-05, | |
| "loss": 0.09831242561340332, | |
| "mean_token_accuracy": 0.9606768429279328, | |
| "num_tokens": 20966230.0, | |
| "step": 1170 | |
| }, | |
| { | |
| "entropy": 1.1957345604896545, | |
| "epoch": 1.5881561238223418, | |
| "grad_norm": 1.2948030233383179, | |
| "learning_rate": 9.68088383066999e-05, | |
| "loss": 0.09834452867507934, | |
| "mean_token_accuracy": 0.9612827241420746, | |
| "num_tokens": 21145768.0, | |
| "step": 1180 | |
| }, | |
| { | |
| "entropy": 1.2020023703575133, | |
| "epoch": 1.601615074024226, | |
| "grad_norm": 1.0523329973220825, | |
| "learning_rate": 9.673014004151292e-05, | |
| "loss": 0.09663949012756348, | |
| "mean_token_accuracy": 0.9620199799537659, | |
| "num_tokens": 21324592.0, | |
| "step": 1190 | |
| }, | |
| { | |
| "entropy": 1.1892358779907226, | |
| "epoch": 1.6150740242261103, | |
| "grad_norm": 1.1584330797195435, | |
| "learning_rate": 9.665051594301407e-05, | |
| "loss": 0.09669581055641174, | |
| "mean_token_accuracy": 0.961614978313446, | |
| "num_tokens": 21504539.0, | |
| "step": 1200 | |
| }, | |
| { | |
| "entropy": 1.190696406364441, | |
| "epoch": 1.6285329744279946, | |
| "grad_norm": 1.1194695234298706, | |
| "learning_rate": 9.656996758874284e-05, | |
| "loss": 0.09648081660270691, | |
| "mean_token_accuracy": 0.9612169206142426, | |
| "num_tokens": 21683905.0, | |
| "step": 1210 | |
| }, | |
| { | |
| "entropy": 1.2076977849006654, | |
| "epoch": 1.6419919246298789, | |
| "grad_norm": 1.1297376155853271, | |
| "learning_rate": 9.648849657455044e-05, | |
| "loss": 0.09605686664581299, | |
| "mean_token_accuracy": 0.961658376455307, | |
| "num_tokens": 21862162.0, | |
| "step": 1220 | |
| }, | |
| { | |
| "entropy": 1.2096962213516236, | |
| "epoch": 1.6554508748317631, | |
| "grad_norm": 1.2401906251907349, | |
| "learning_rate": 9.640610451456811e-05, | |
| "loss": 0.09206328392028809, | |
| "mean_token_accuracy": 0.962989890575409, | |
| "num_tokens": 22041015.0, | |
| "step": 1230 | |
| }, | |
| { | |
| "entropy": 1.216509222984314, | |
| "epoch": 1.6689098250336474, | |
| "grad_norm": 1.2637176513671875, | |
| "learning_rate": 9.632279304117517e-05, | |
| "loss": 0.09614999294281006, | |
| "mean_token_accuracy": 0.9613571405410767, | |
| "num_tokens": 22220655.0, | |
| "step": 1240 | |
| }, | |
| { | |
| "entropy": 1.2126171827316283, | |
| "epoch": 1.6823687752355316, | |
| "grad_norm": 1.2879180908203125, | |
| "learning_rate": 9.623856380496664e-05, | |
| "loss": 0.09818092584609986, | |
| "mean_token_accuracy": 0.9603166699409484, | |
| "num_tokens": 22399957.0, | |
| "step": 1250 | |
| }, | |
| { | |
| "entropy": 1.182448434829712, | |
| "epoch": 1.695827725437416, | |
| "grad_norm": 1.0547544956207275, | |
| "learning_rate": 9.615341847472059e-05, | |
| "loss": 0.0945388674736023, | |
| "mean_token_accuracy": 0.9623521089553833, | |
| "num_tokens": 22579222.0, | |
| "step": 1260 | |
| }, | |
| { | |
| "entropy": 1.1886864185333252, | |
| "epoch": 1.7092866756393001, | |
| "grad_norm": 1.4119364023208618, | |
| "learning_rate": 9.606735873736505e-05, | |
| "loss": 0.0979494333267212, | |
| "mean_token_accuracy": 0.9607987105846405, | |
| "num_tokens": 22758487.0, | |
| "step": 1270 | |
| }, | |
| { | |
| "entropy": 1.1918489813804627, | |
| "epoch": 1.7227456258411844, | |
| "grad_norm": 1.2551711797714233, | |
| "learning_rate": 9.598038629794461e-05, | |
| "loss": 0.09586712718009949, | |
| "mean_token_accuracy": 0.9615644454956055, | |
| "num_tokens": 22936708.0, | |
| "step": 1280 | |
| }, | |
| { | |
| "entropy": 1.1913212060928344, | |
| "epoch": 1.7362045760430687, | |
| "grad_norm": 1.0276069641113281, | |
| "learning_rate": 9.589250287958657e-05, | |
| "loss": 0.09535220861434937, | |
| "mean_token_accuracy": 0.9606883823871613, | |
| "num_tokens": 23116329.0, | |
| "step": 1290 | |
| }, | |
| { | |
| "entropy": 1.203085219860077, | |
| "epoch": 1.749663526244953, | |
| "grad_norm": 1.2456278800964355, | |
| "learning_rate": 9.580371022346693e-05, | |
| "loss": 0.09598281383514404, | |
| "mean_token_accuracy": 0.9608144044876099, | |
| "num_tokens": 23295164.0, | |
| "step": 1300 | |
| }, | |
| { | |
| "entropy": 1.1892922878265382, | |
| "epoch": 1.7631224764468372, | |
| "grad_norm": 1.1159876585006714, | |
| "learning_rate": 9.571401008877572e-05, | |
| "loss": 0.09096106886863708, | |
| "mean_token_accuracy": 0.9636982321739197, | |
| "num_tokens": 23474377.0, | |
| "step": 1310 | |
| }, | |
| { | |
| "entropy": 1.2096730828285218, | |
| "epoch": 1.7765814266487214, | |
| "grad_norm": 1.420886516571045, | |
| "learning_rate": 9.562340425268233e-05, | |
| "loss": 0.0925740659236908, | |
| "mean_token_accuracy": 0.9629011929035187, | |
| "num_tokens": 23653389.0, | |
| "step": 1320 | |
| }, | |
| { | |
| "entropy": 1.2122852802276611, | |
| "epoch": 1.7900403768506057, | |
| "grad_norm": 1.1587319374084473, | |
| "learning_rate": 9.553189451030019e-05, | |
| "loss": 0.09554123878479004, | |
| "mean_token_accuracy": 0.9622859060764313, | |
| "num_tokens": 23832469.0, | |
| "step": 1330 | |
| }, | |
| { | |
| "entropy": 1.2176487922668457, | |
| "epoch": 1.80349932705249, | |
| "grad_norm": 1.147444248199463, | |
| "learning_rate": 9.543948267465115e-05, | |
| "loss": 0.09707238674163818, | |
| "mean_token_accuracy": 0.9612141191959381, | |
| "num_tokens": 24011518.0, | |
| "step": 1340 | |
| }, | |
| { | |
| "entropy": 1.2231361389160156, | |
| "epoch": 1.8169582772543742, | |
| "grad_norm": 1.1775709390640259, | |
| "learning_rate": 9.534617057662977e-05, | |
| "loss": 0.09654755592346191, | |
| "mean_token_accuracy": 0.9617958247661591, | |
| "num_tokens": 24190267.0, | |
| "step": 1350 | |
| }, | |
| { | |
| "entropy": 1.2070120811462401, | |
| "epoch": 1.8304172274562585, | |
| "grad_norm": 1.1315947771072388, | |
| "learning_rate": 9.525196006496679e-05, | |
| "loss": 0.09382581114768981, | |
| "mean_token_accuracy": 0.9625270128250122, | |
| "num_tokens": 24369982.0, | |
| "step": 1360 | |
| }, | |
| { | |
| "entropy": 1.2057390093803406, | |
| "epoch": 1.8438761776581427, | |
| "grad_norm": 1.1973934173583984, | |
| "learning_rate": 9.515685300619271e-05, | |
| "loss": 0.09683746099472046, | |
| "mean_token_accuracy": 0.9607476830482483, | |
| "num_tokens": 24549256.0, | |
| "step": 1370 | |
| }, | |
| { | |
| "entropy": 1.207427191734314, | |
| "epoch": 1.857335127860027, | |
| "grad_norm": 1.3193334341049194, | |
| "learning_rate": 9.506085128460065e-05, | |
| "loss": 0.09461041688919067, | |
| "mean_token_accuracy": 0.9628551185131073, | |
| "num_tokens": 24727544.0, | |
| "step": 1380 | |
| }, | |
| { | |
| "entropy": 1.2011541604995728, | |
| "epoch": 1.8707940780619112, | |
| "grad_norm": 1.0681352615356445, | |
| "learning_rate": 9.496395680220918e-05, | |
| "loss": 0.0960330069065094, | |
| "mean_token_accuracy": 0.9622460305690765, | |
| "num_tokens": 24907721.0, | |
| "step": 1390 | |
| }, | |
| { | |
| "entropy": 1.2034499764442443, | |
| "epoch": 1.8842530282637955, | |
| "grad_norm": 1.2765101194381714, | |
| "learning_rate": 9.486617147872446e-05, | |
| "loss": 0.09376740455627441, | |
| "mean_token_accuracy": 0.9624415040016174, | |
| "num_tokens": 25086496.0, | |
| "step": 1400 | |
| }, | |
| { | |
| "entropy": 1.1899038195610045, | |
| "epoch": 1.8977119784656797, | |
| "grad_norm": 1.1333132982254028, | |
| "learning_rate": 9.476749725150235e-05, | |
| "loss": 0.09668049812316895, | |
| "mean_token_accuracy": 0.9621514558792115, | |
| "num_tokens": 25266204.0, | |
| "step": 1410 | |
| }, | |
| { | |
| "entropy": 1.1895189881324768, | |
| "epoch": 1.911170928667564, | |
| "grad_norm": 1.310587763786316, | |
| "learning_rate": 9.466793607550995e-05, | |
| "loss": 0.0920013129711151, | |
| "mean_token_accuracy": 0.963368022441864, | |
| "num_tokens": 25445604.0, | |
| "step": 1420 | |
| }, | |
| { | |
| "entropy": 1.1946001529693604, | |
| "epoch": 1.9246298788694483, | |
| "grad_norm": 1.3619959354400635, | |
| "learning_rate": 9.45674899232869e-05, | |
| "loss": 0.09906838536262512, | |
| "mean_token_accuracy": 0.9604476511478424, | |
| "num_tokens": 25624947.0, | |
| "step": 1430 | |
| }, | |
| { | |
| "entropy": 1.2078219771385192, | |
| "epoch": 1.9380888290713325, | |
| "grad_norm": 1.152220606803894, | |
| "learning_rate": 9.446616078490626e-05, | |
| "loss": 0.09479650259017944, | |
| "mean_token_accuracy": 0.9627270400524139, | |
| "num_tokens": 25804643.0, | |
| "step": 1440 | |
| }, | |
| { | |
| "entropy": 1.2207356214523315, | |
| "epoch": 1.9515477792732168, | |
| "grad_norm": 1.186161994934082, | |
| "learning_rate": 9.436395066793518e-05, | |
| "loss": 0.09636704921722412, | |
| "mean_token_accuracy": 0.9604843854904175, | |
| "num_tokens": 25984119.0, | |
| "step": 1450 | |
| }, | |
| { | |
| "entropy": 1.205849301815033, | |
| "epoch": 1.965006729475101, | |
| "grad_norm": 1.409846544265747, | |
| "learning_rate": 9.426086159739496e-05, | |
| "loss": 0.09743249416351318, | |
| "mean_token_accuracy": 0.9608718931674958, | |
| "num_tokens": 26163483.0, | |
| "step": 1460 | |
| }, | |
| { | |
| "entropy": 1.2089114785194397, | |
| "epoch": 1.9784656796769853, | |
| "grad_norm": 1.2226805686950684, | |
| "learning_rate": 9.415689561572107e-05, | |
| "loss": 0.09131012558937072, | |
| "mean_token_accuracy": 0.9631811439990997, | |
| "num_tokens": 26342666.0, | |
| "step": 1470 | |
| }, | |
| { | |
| "entropy": 1.1953012466430664, | |
| "epoch": 1.9919246298788695, | |
| "grad_norm": 1.0700947046279907, | |
| "learning_rate": 9.405205478272267e-05, | |
| "loss": 0.09140577316284179, | |
| "mean_token_accuracy": 0.9642649590969086, | |
| "num_tokens": 26521895.0, | |
| "step": 1480 | |
| }, | |
| { | |
| "epoch": 2.0, | |
| "eval_entropy": 1.1869331590688912, | |
| "eval_loss": 0.10991495102643967, | |
| "eval_mean_token_accuracy": 0.9554494638351878, | |
| "eval_num_tokens": 26629596.0, | |
| "eval_runtime": 12.7631, | |
| "eval_samples_per_second": 391.753, | |
| "eval_steps_per_second": 12.301, | |
| "step": 1486 | |
| }, | |
| { | |
| "entropy": 1.1832876205444336, | |
| "epoch": 2.005383580080754, | |
| "grad_norm": 1.0044941902160645, | |
| "learning_rate": 9.394634117554173e-05, | |
| "loss": 0.0840892255306244, | |
| "mean_token_accuracy": 0.967725521326065, | |
| "num_tokens": 26701394.0, | |
| "step": 1490 | |
| }, | |
| { | |
| "entropy": 1.159238350391388, | |
| "epoch": 2.018842530282638, | |
| "grad_norm": 1.4198471307754517, | |
| "learning_rate": 9.38397568886119e-05, | |
| "loss": 0.07137876152992248, | |
| "mean_token_accuracy": 0.9723187386989594, | |
| "num_tokens": 26880890.0, | |
| "step": 1500 | |
| }, | |
| { | |
| "entropy": 1.165151631832123, | |
| "epoch": 2.0323014804845223, | |
| "grad_norm": 1.1602118015289307, | |
| "learning_rate": 9.373230403361712e-05, | |
| "loss": 0.06463043689727783, | |
| "mean_token_accuracy": 0.9757274091243744, | |
| "num_tokens": 27059741.0, | |
| "step": 1510 | |
| }, | |
| { | |
| "entropy": 1.1644548654556275, | |
| "epoch": 2.0457604306864066, | |
| "grad_norm": 1.3322592973709106, | |
| "learning_rate": 9.362398473944958e-05, | |
| "loss": 0.07388677597045898, | |
| "mean_token_accuracy": 0.971617478132248, | |
| "num_tokens": 27238125.0, | |
| "step": 1520 | |
| }, | |
| { | |
| "entropy": 1.1564043641090394, | |
| "epoch": 2.059219380888291, | |
| "grad_norm": 1.1690629720687866, | |
| "learning_rate": 9.35148011521677e-05, | |
| "loss": 0.06990204453468322, | |
| "mean_token_accuracy": 0.9719981133937836, | |
| "num_tokens": 27417172.0, | |
| "step": 1530 | |
| }, | |
| { | |
| "entropy": 1.1576861262321472, | |
| "epoch": 2.072678331090175, | |
| "grad_norm": 1.7016727924346924, | |
| "learning_rate": 9.340475543495364e-05, | |
| "loss": 0.06850625276565551, | |
| "mean_token_accuracy": 0.9732699453830719, | |
| "num_tokens": 27596848.0, | |
| "step": 1540 | |
| }, | |
| { | |
| "entropy": 1.1597527265548706, | |
| "epoch": 2.0861372812920593, | |
| "grad_norm": 1.1524600982666016, | |
| "learning_rate": 9.329384976807023e-05, | |
| "loss": 0.06980778574943543, | |
| "mean_token_accuracy": 0.9729204118251801, | |
| "num_tokens": 27775617.0, | |
| "step": 1550 | |
| }, | |
| { | |
| "entropy": 1.1575467109680175, | |
| "epoch": 2.0995962314939436, | |
| "grad_norm": 1.4498176574707031, | |
| "learning_rate": 9.318208634881802e-05, | |
| "loss": 0.07390267252922059, | |
| "mean_token_accuracy": 0.9713942348957062, | |
| "num_tokens": 27954133.0, | |
| "step": 1560 | |
| }, | |
| { | |
| "entropy": 1.1575489521026612, | |
| "epoch": 2.113055181695828, | |
| "grad_norm": 1.243706464767456, | |
| "learning_rate": 9.306946739149161e-05, | |
| "loss": 0.06798491477966309, | |
| "mean_token_accuracy": 0.973270720243454, | |
| "num_tokens": 28133292.0, | |
| "step": 1570 | |
| }, | |
| { | |
| "entropy": 1.1514037609100343, | |
| "epoch": 2.126514131897712, | |
| "grad_norm": 1.256933331489563, | |
| "learning_rate": 9.29559951273358e-05, | |
| "loss": 0.0749699592590332, | |
| "mean_token_accuracy": 0.9699350416660308, | |
| "num_tokens": 28312726.0, | |
| "step": 1580 | |
| }, | |
| { | |
| "entropy": 1.1511113524436951, | |
| "epoch": 2.1399730820995964, | |
| "grad_norm": 1.1914122104644775, | |
| "learning_rate": 9.284167180450141e-05, | |
| "loss": 0.06752681732177734, | |
| "mean_token_accuracy": 0.9743177771568299, | |
| "num_tokens": 28492614.0, | |
| "step": 1590 | |
| }, | |
| { | |
| "entropy": 1.1382261991500855, | |
| "epoch": 2.1534320323014806, | |
| "grad_norm": 1.109575867652893, | |
| "learning_rate": 9.272649968800069e-05, | |
| "loss": 0.06449686884880065, | |
| "mean_token_accuracy": 0.9755833566188812, | |
| "num_tokens": 28671719.0, | |
| "step": 1600 | |
| }, | |
| { | |
| "entropy": 1.1397038459777833, | |
| "epoch": 2.166890982503365, | |
| "grad_norm": 1.3151781558990479, | |
| "learning_rate": 9.26104810596625e-05, | |
| "loss": 0.07052424550056458, | |
| "mean_token_accuracy": 0.972499680519104, | |
| "num_tokens": 28851056.0, | |
| "step": 1610 | |
| }, | |
| { | |
| "entropy": 1.136834406852722, | |
| "epoch": 2.180349932705249, | |
| "grad_norm": 1.4410197734832764, | |
| "learning_rate": 9.249361821808708e-05, | |
| "loss": 0.06850321292877197, | |
| "mean_token_accuracy": 0.9728572845458985, | |
| "num_tokens": 29030750.0, | |
| "step": 1620 | |
| }, | |
| { | |
| "entropy": 1.140534520149231, | |
| "epoch": 2.1938088829071334, | |
| "grad_norm": 1.1493765115737915, | |
| "learning_rate": 9.237591347860052e-05, | |
| "loss": 0.06934296488761901, | |
| "mean_token_accuracy": 0.972960364818573, | |
| "num_tokens": 29210336.0, | |
| "step": 1630 | |
| }, | |
| { | |
| "entropy": 1.138876986503601, | |
| "epoch": 2.2072678331090176, | |
| "grad_norm": 1.0383925437927246, | |
| "learning_rate": 9.225736917320886e-05, | |
| "loss": 0.06877213716506958, | |
| "mean_token_accuracy": 0.9730811774730682, | |
| "num_tokens": 29389788.0, | |
| "step": 1640 | |
| }, | |
| { | |
| "entropy": 1.1449824213981628, | |
| "epoch": 2.220726783310902, | |
| "grad_norm": 1.374165654182434, | |
| "learning_rate": 9.213798765055187e-05, | |
| "loss": 0.07060860991477966, | |
| "mean_token_accuracy": 0.9721363008022308, | |
| "num_tokens": 29569111.0, | |
| "step": 1650 | |
| }, | |
| { | |
| "entropy": 1.143638014793396, | |
| "epoch": 2.234185733512786, | |
| "grad_norm": 1.0196412801742554, | |
| "learning_rate": 9.20177712758566e-05, | |
| "loss": 0.07119340896606445, | |
| "mean_token_accuracy": 0.9731849789619446, | |
| "num_tokens": 29747986.0, | |
| "step": 1660 | |
| }, | |
| { | |
| "entropy": 1.1349515676498414, | |
| "epoch": 2.2476446837146704, | |
| "grad_norm": 1.1247919797897339, | |
| "learning_rate": 9.189672243089046e-05, | |
| "loss": 0.07071832418441773, | |
| "mean_token_accuracy": 0.9731756567955017, | |
| "num_tokens": 29927276.0, | |
| "step": 1670 | |
| }, | |
| { | |
| "entropy": 1.1376260280609132, | |
| "epoch": 2.2611036339165547, | |
| "grad_norm": 1.4197320938110352, | |
| "learning_rate": 9.177484351391402e-05, | |
| "loss": 0.07115572690963745, | |
| "mean_token_accuracy": 0.9723047018051147, | |
| "num_tokens": 30106267.0, | |
| "step": 1680 | |
| }, | |
| { | |
| "entropy": 1.1282797813415528, | |
| "epoch": 2.274562584118439, | |
| "grad_norm": 1.0774035453796387, | |
| "learning_rate": 9.165213693963355e-05, | |
| "loss": 0.068689626455307, | |
| "mean_token_accuracy": 0.9729084491729736, | |
| "num_tokens": 30285658.0, | |
| "step": 1690 | |
| }, | |
| { | |
| "entropy": 1.140774166584015, | |
| "epoch": 2.288021534320323, | |
| "grad_norm": 1.5625728368759155, | |
| "learning_rate": 9.152860513915314e-05, | |
| "loss": 0.07172787189483643, | |
| "mean_token_accuracy": 0.9718713641166687, | |
| "num_tokens": 30464579.0, | |
| "step": 1700 | |
| }, | |
| { | |
| "entropy": 1.1416746616363525, | |
| "epoch": 2.3014804845222074, | |
| "grad_norm": 1.2159788608551025, | |
| "learning_rate": 9.140425055992648e-05, | |
| "loss": 0.07109695672988892, | |
| "mean_token_accuracy": 0.9723007261753083, | |
| "num_tokens": 30643566.0, | |
| "step": 1710 | |
| }, | |
| { | |
| "entropy": 1.137896478176117, | |
| "epoch": 2.3149394347240917, | |
| "grad_norm": 1.1671864986419678, | |
| "learning_rate": 9.127907566570853e-05, | |
| "loss": 0.07048168182373046, | |
| "mean_token_accuracy": 0.9725371599197388, | |
| "num_tokens": 30822593.0, | |
| "step": 1720 | |
| }, | |
| { | |
| "entropy": 1.121722447872162, | |
| "epoch": 2.328398384925976, | |
| "grad_norm": 1.3899428844451904, | |
| "learning_rate": 9.115308293650653e-05, | |
| "loss": 0.07030471563339233, | |
| "mean_token_accuracy": 0.972334086894989, | |
| "num_tokens": 31001872.0, | |
| "step": 1730 | |
| }, | |
| { | |
| "entropy": 1.1253960013389588, | |
| "epoch": 2.34185733512786, | |
| "grad_norm": 1.3878816366195679, | |
| "learning_rate": 9.102627486853099e-05, | |
| "loss": 0.06956568956375123, | |
| "mean_token_accuracy": 0.9728380262851715, | |
| "num_tokens": 31181325.0, | |
| "step": 1740 | |
| }, | |
| { | |
| "entropy": 1.1276877760887145, | |
| "epoch": 2.3553162853297445, | |
| "grad_norm": 1.057102084159851, | |
| "learning_rate": 9.089865397414614e-05, | |
| "loss": 0.07267707586288452, | |
| "mean_token_accuracy": 0.9716822624206543, | |
| "num_tokens": 31360101.0, | |
| "step": 1750 | |
| }, | |
| { | |
| "entropy": 1.1322511553764343, | |
| "epoch": 2.3687752355316287, | |
| "grad_norm": 0.9549157023429871, | |
| "learning_rate": 9.077022278182024e-05, | |
| "loss": 0.0700565218925476, | |
| "mean_token_accuracy": 0.9737613677978516, | |
| "num_tokens": 31539212.0, | |
| "step": 1760 | |
| }, | |
| { | |
| "entropy": 1.1198163986206056, | |
| "epoch": 2.382234185733513, | |
| "grad_norm": 1.2134599685668945, | |
| "learning_rate": 9.064098383607545e-05, | |
| "loss": 0.07131816148757934, | |
| "mean_token_accuracy": 0.9710030317306518, | |
| "num_tokens": 31718375.0, | |
| "step": 1770 | |
| }, | |
| { | |
| "entropy": 1.1219655275344849, | |
| "epoch": 2.3956931359353972, | |
| "grad_norm": 1.3444453477859497, | |
| "learning_rate": 9.051093969743738e-05, | |
| "loss": 0.06774230003356933, | |
| "mean_token_accuracy": 0.9737627685070038, | |
| "num_tokens": 31897547.0, | |
| "step": 1780 | |
| }, | |
| { | |
| "entropy": 1.112294065952301, | |
| "epoch": 2.409152086137281, | |
| "grad_norm": 1.5304001569747925, | |
| "learning_rate": 9.03800929423844e-05, | |
| "loss": 0.0724343478679657, | |
| "mean_token_accuracy": 0.9721882760524749, | |
| "num_tokens": 32076717.0, | |
| "step": 1790 | |
| }, | |
| { | |
| "entropy": 1.1336388826370238, | |
| "epoch": 2.4226110363391653, | |
| "grad_norm": 1.2315043210983276, | |
| "learning_rate": 9.024844616329662e-05, | |
| "loss": 0.07212550640106201, | |
| "mean_token_accuracy": 0.9726522386074066, | |
| "num_tokens": 32255927.0, | |
| "step": 1800 | |
| }, | |
| { | |
| "entropy": 1.1251046061515808, | |
| "epoch": 2.4360699865410496, | |
| "grad_norm": 1.3251651525497437, | |
| "learning_rate": 9.011600196840447e-05, | |
| "loss": 0.07009173035621644, | |
| "mean_token_accuracy": 0.9724473178386688, | |
| "num_tokens": 32434929.0, | |
| "step": 1810 | |
| }, | |
| { | |
| "entropy": 1.1328514456748962, | |
| "epoch": 2.449528936742934, | |
| "grad_norm": 1.2601144313812256, | |
| "learning_rate": 8.998276298173707e-05, | |
| "loss": 0.0719257116317749, | |
| "mean_token_accuracy": 0.9720721006393432, | |
| "num_tokens": 32614063.0, | |
| "step": 1820 | |
| }, | |
| { | |
| "entropy": 1.136410367488861, | |
| "epoch": 2.462987886944818, | |
| "grad_norm": 1.2086918354034424, | |
| "learning_rate": 8.984873184307017e-05, | |
| "loss": 0.07017306089401246, | |
| "mean_token_accuracy": 0.9722030460834503, | |
| "num_tokens": 32793068.0, | |
| "step": 1830 | |
| }, | |
| { | |
| "entropy": 1.13520849943161, | |
| "epoch": 2.4764468371467023, | |
| "grad_norm": 1.453801155090332, | |
| "learning_rate": 8.971391120787397e-05, | |
| "loss": 0.07180649638175965, | |
| "mean_token_accuracy": 0.9726110398769379, | |
| "num_tokens": 32972445.0, | |
| "step": 1840 | |
| }, | |
| { | |
| "entropy": 1.1437011241912842, | |
| "epoch": 2.4899057873485866, | |
| "grad_norm": 1.1886014938354492, | |
| "learning_rate": 8.957830374726042e-05, | |
| "loss": 0.07153818607330323, | |
| "mean_token_accuracy": 0.9720338463783265, | |
| "num_tokens": 33151976.0, | |
| "step": 1850 | |
| }, | |
| { | |
| "entropy": 1.1325899600982665, | |
| "epoch": 2.503364737550471, | |
| "grad_norm": 1.1960384845733643, | |
| "learning_rate": 8.944191214793028e-05, | |
| "loss": 0.06935594081878663, | |
| "mean_token_accuracy": 0.9729972183704376, | |
| "num_tokens": 33330611.0, | |
| "step": 1860 | |
| }, | |
| { | |
| "entropy": 1.127972149848938, | |
| "epoch": 2.516823687752355, | |
| "grad_norm": 1.1048696041107178, | |
| "learning_rate": 8.930473911212e-05, | |
| "loss": 0.07217252850532532, | |
| "mean_token_accuracy": 0.9718475580215454, | |
| "num_tokens": 33509614.0, | |
| "step": 1870 | |
| }, | |
| { | |
| "entropy": 1.1299184799194335, | |
| "epoch": 2.5302826379542394, | |
| "grad_norm": 1.6520979404449463, | |
| "learning_rate": 8.916678735754809e-05, | |
| "loss": 0.07317680716514588, | |
| "mean_token_accuracy": 0.971524566411972, | |
| "num_tokens": 33688724.0, | |
| "step": 1880 | |
| }, | |
| { | |
| "entropy": 1.1270474672317505, | |
| "epoch": 2.5437415881561236, | |
| "grad_norm": 1.1285676956176758, | |
| "learning_rate": 8.902805961736123e-05, | |
| "loss": 0.07085765600204467, | |
| "mean_token_accuracy": 0.9733552634716034, | |
| "num_tokens": 33868061.0, | |
| "step": 1890 | |
| }, | |
| { | |
| "entropy": 1.1217491984367371, | |
| "epoch": 2.557200538358008, | |
| "grad_norm": 1.2642406225204468, | |
| "learning_rate": 8.88885586400803e-05, | |
| "loss": 0.06978695392608643, | |
| "mean_token_accuracy": 0.9726768732070923, | |
| "num_tokens": 34047387.0, | |
| "step": 1900 | |
| }, | |
| { | |
| "entropy": 1.1316059112548829, | |
| "epoch": 2.570659488559892, | |
| "grad_norm": 1.3016549348831177, | |
| "learning_rate": 8.874828718954576e-05, | |
| "loss": 0.07102057337760925, | |
| "mean_token_accuracy": 0.9723258554935456, | |
| "num_tokens": 34227141.0, | |
| "step": 1910 | |
| }, | |
| { | |
| "entropy": 1.137197768688202, | |
| "epoch": 2.5841184387617764, | |
| "grad_norm": 1.1534605026245117, | |
| "learning_rate": 8.86072480448629e-05, | |
| "loss": 0.07511197328567505, | |
| "mean_token_accuracy": 0.9702324509620667, | |
| "num_tokens": 34406526.0, | |
| "step": 1920 | |
| }, | |
| { | |
| "entropy": 1.1349146008491515, | |
| "epoch": 2.5975773889636606, | |
| "grad_norm": 1.3732489347457886, | |
| "learning_rate": 8.84654440003469e-05, | |
| "loss": 0.07147140502929687, | |
| "mean_token_accuracy": 0.9726706743240356, | |
| "num_tokens": 34586116.0, | |
| "step": 1930 | |
| }, | |
| { | |
| "entropy": 1.1294147491455078, | |
| "epoch": 2.611036339165545, | |
| "grad_norm": 0.9715967178344727, | |
| "learning_rate": 8.83228778654674e-05, | |
| "loss": 0.07225455045700073, | |
| "mean_token_accuracy": 0.9726594388484955, | |
| "num_tokens": 34765289.0, | |
| "step": 1940 | |
| }, | |
| { | |
| "entropy": 1.130816388130188, | |
| "epoch": 2.624495289367429, | |
| "grad_norm": 1.293736219406128, | |
| "learning_rate": 8.817955246479276e-05, | |
| "loss": 0.06845389604568482, | |
| "mean_token_accuracy": 0.9736943006515503, | |
| "num_tokens": 34944224.0, | |
| "step": 1950 | |
| }, | |
| { | |
| "entropy": 1.1200148224830628, | |
| "epoch": 2.6379542395693134, | |
| "grad_norm": 1.2962090969085693, | |
| "learning_rate": 8.803547063793422e-05, | |
| "loss": 0.07189736366271973, | |
| "mean_token_accuracy": 0.9717683315277099, | |
| "num_tokens": 35123825.0, | |
| "step": 1960 | |
| }, | |
| { | |
| "entropy": 1.1301296949386597, | |
| "epoch": 2.6514131897711977, | |
| "grad_norm": 1.4734028577804565, | |
| "learning_rate": 8.789063523948958e-05, | |
| "loss": 0.0702283263206482, | |
| "mean_token_accuracy": 0.9727118015289307, | |
| "num_tokens": 35302914.0, | |
| "step": 1970 | |
| }, | |
| { | |
| "entropy": 1.1409568905830383, | |
| "epoch": 2.664872139973082, | |
| "grad_norm": 1.4132834672927856, | |
| "learning_rate": 8.774504913898663e-05, | |
| "loss": 0.07676968574523926, | |
| "mean_token_accuracy": 0.9695852339267731, | |
| "num_tokens": 35481783.0, | |
| "step": 1980 | |
| }, | |
| { | |
| "entropy": 1.1532965421676635, | |
| "epoch": 2.678331090174966, | |
| "grad_norm": 1.1802046298980713, | |
| "learning_rate": 8.75987152208264e-05, | |
| "loss": 0.06539074182510377, | |
| "mean_token_accuracy": 0.9745090186595917, | |
| "num_tokens": 35660431.0, | |
| "step": 1990 | |
| }, | |
| { | |
| "entropy": 1.1450922250747682, | |
| "epoch": 2.6917900403768504, | |
| "grad_norm": 1.1922194957733154, | |
| "learning_rate": 8.745163638422583e-05, | |
| "loss": 0.07205181121826172, | |
| "mean_token_accuracy": 0.9712056815624237, | |
| "num_tokens": 35839308.0, | |
| "step": 2000 | |
| }, | |
| { | |
| "entropy": 1.1444142818450929, | |
| "epoch": 2.7052489905787347, | |
| "grad_norm": 1.1868208646774292, | |
| "learning_rate": 8.730381554316051e-05, | |
| "loss": 0.07235864400863648, | |
| "mean_token_accuracy": 0.9725943446159363, | |
| "num_tokens": 36018734.0, | |
| "step": 2010 | |
| }, | |
| { | |
| "entropy": 1.133234965801239, | |
| "epoch": 2.718707940780619, | |
| "grad_norm": 1.4924181699752808, | |
| "learning_rate": 8.715525562630687e-05, | |
| "loss": 0.07137352228164673, | |
| "mean_token_accuracy": 0.9720249474048615, | |
| "num_tokens": 36197607.0, | |
| "step": 2020 | |
| }, | |
| { | |
| "entropy": 1.150734007358551, | |
| "epoch": 2.732166890982503, | |
| "grad_norm": 1.0833524465560913, | |
| "learning_rate": 8.700595957698411e-05, | |
| "loss": 0.07287259101867676, | |
| "mean_token_accuracy": 0.9720607042312622, | |
| "num_tokens": 36377137.0, | |
| "step": 2030 | |
| }, | |
| { | |
| "entropy": 1.15074782371521, | |
| "epoch": 2.7456258411843875, | |
| "grad_norm": 1.1532199382781982, | |
| "learning_rate": 8.685593035309598e-05, | |
| "loss": 0.07189793586730957, | |
| "mean_token_accuracy": 0.971609354019165, | |
| "num_tokens": 36556438.0, | |
| "step": 2040 | |
| }, | |
| { | |
| "entropy": 1.1425267219543458, | |
| "epoch": 2.7590847913862717, | |
| "grad_norm": 1.65394926071167, | |
| "learning_rate": 8.670517092707213e-05, | |
| "loss": 0.07228031158447265, | |
| "mean_token_accuracy": 0.972437036037445, | |
| "num_tokens": 36734936.0, | |
| "step": 2050 | |
| }, | |
| { | |
| "entropy": 1.1443095088005066, | |
| "epoch": 2.772543741588156, | |
| "grad_norm": 1.1756309270858765, | |
| "learning_rate": 8.655368428580919e-05, | |
| "loss": 0.07032725811004639, | |
| "mean_token_accuracy": 0.9716470181941986, | |
| "num_tokens": 36913871.0, | |
| "step": 2060 | |
| }, | |
| { | |
| "entropy": 1.1369405388832092, | |
| "epoch": 2.7860026917900402, | |
| "grad_norm": 1.2845005989074707, | |
| "learning_rate": 8.640147343061165e-05, | |
| "loss": 0.07300193309783935, | |
| "mean_token_accuracy": 0.971499103307724, | |
| "num_tokens": 37093380.0, | |
| "step": 2070 | |
| }, | |
| { | |
| "entropy": 1.1441598296165467, | |
| "epoch": 2.7994616419919245, | |
| "grad_norm": 1.0858702659606934, | |
| "learning_rate": 8.624854137713234e-05, | |
| "loss": 0.07180417776107788, | |
| "mean_token_accuracy": 0.9721956551074982, | |
| "num_tokens": 37272902.0, | |
| "step": 2080 | |
| }, | |
| { | |
| "entropy": 1.1500964164733887, | |
| "epoch": 2.8129205921938087, | |
| "grad_norm": 1.1274408102035522, | |
| "learning_rate": 8.609489115531278e-05, | |
| "loss": 0.07155272960662842, | |
| "mean_token_accuracy": 0.971377295255661, | |
| "num_tokens": 37451897.0, | |
| "step": 2090 | |
| }, | |
| { | |
| "entropy": 1.1386480689048768, | |
| "epoch": 2.826379542395693, | |
| "grad_norm": 1.1894359588623047, | |
| "learning_rate": 8.594052580932301e-05, | |
| "loss": 0.06719542145729065, | |
| "mean_token_accuracy": 0.9733208954334259, | |
| "num_tokens": 37631343.0, | |
| "step": 2100 | |
| }, | |
| { | |
| "entropy": 1.1420456409454345, | |
| "epoch": 2.8398384925975773, | |
| "grad_norm": 1.266627550125122, | |
| "learning_rate": 8.578544839750141e-05, | |
| "loss": 0.06839650273323059, | |
| "mean_token_accuracy": 0.9735166966915131, | |
| "num_tokens": 37811111.0, | |
| "step": 2110 | |
| }, | |
| { | |
| "entropy": 1.136332881450653, | |
| "epoch": 2.8532974427994615, | |
| "grad_norm": 1.6329811811447144, | |
| "learning_rate": 8.562966199229399e-05, | |
| "loss": 0.0761029601097107, | |
| "mean_token_accuracy": 0.9703849673271179, | |
| "num_tokens": 37991040.0, | |
| "step": 2120 | |
| }, | |
| { | |
| "entropy": 1.1302747488021851, | |
| "epoch": 2.8667563930013458, | |
| "grad_norm": 1.355210304260254, | |
| "learning_rate": 8.547316968019363e-05, | |
| "loss": 0.07443415522575378, | |
| "mean_token_accuracy": 0.971444720029831, | |
| "num_tokens": 38169539.0, | |
| "step": 2130 | |
| }, | |
| { | |
| "entropy": 1.1407261610031127, | |
| "epoch": 2.88021534320323, | |
| "grad_norm": 1.0608882904052734, | |
| "learning_rate": 8.531597456167885e-05, | |
| "loss": 0.07463377118110656, | |
| "mean_token_accuracy": 0.9705821752548218, | |
| "num_tokens": 38348423.0, | |
| "step": 2140 | |
| }, | |
| { | |
| "entropy": 1.121661639213562, | |
| "epoch": 2.8936742934051143, | |
| "grad_norm": 1.4264295101165771, | |
| "learning_rate": 8.515807975115239e-05, | |
| "loss": 0.06971895098686218, | |
| "mean_token_accuracy": 0.9723855495452881, | |
| "num_tokens": 38527736.0, | |
| "step": 2150 | |
| }, | |
| { | |
| "entropy": 1.117723774909973, | |
| "epoch": 2.9071332436069985, | |
| "grad_norm": 1.114941120147705, | |
| "learning_rate": 8.499948837687959e-05, | |
| "loss": 0.07229661345481872, | |
| "mean_token_accuracy": 0.9710954666137696, | |
| "num_tokens": 38706893.0, | |
| "step": 2160 | |
| }, | |
| { | |
| "entropy": 1.12820805311203, | |
| "epoch": 2.920592193808883, | |
| "grad_norm": 1.219247817993164, | |
| "learning_rate": 8.484020358092625e-05, | |
| "loss": 0.07078794836997986, | |
| "mean_token_accuracy": 0.9723483324050903, | |
| "num_tokens": 38886237.0, | |
| "step": 2170 | |
| }, | |
| { | |
| "entropy": 1.1263705372810364, | |
| "epoch": 2.934051144010767, | |
| "grad_norm": 1.9336564540863037, | |
| "learning_rate": 8.468022851909657e-05, | |
| "loss": 0.07343355417251587, | |
| "mean_token_accuracy": 0.9712878108024597, | |
| "num_tokens": 39065800.0, | |
| "step": 2180 | |
| }, | |
| { | |
| "entropy": 1.1296117424964904, | |
| "epoch": 2.9475100942126513, | |
| "grad_norm": 1.1307294368743896, | |
| "learning_rate": 8.451956636087046e-05, | |
| "loss": 0.07211248874664307, | |
| "mean_token_accuracy": 0.9710810720920563, | |
| "num_tokens": 39245521.0, | |
| "step": 2190 | |
| }, | |
| { | |
| "entropy": 1.1172988891601563, | |
| "epoch": 2.9609690444145356, | |
| "grad_norm": 1.1942963600158691, | |
| "learning_rate": 8.435822028934087e-05, | |
| "loss": 0.07098879814147949, | |
| "mean_token_accuracy": 0.9725285410881043, | |
| "num_tokens": 39424709.0, | |
| "step": 2200 | |
| }, | |
| { | |
| "entropy": 1.1155216932296752, | |
| "epoch": 2.97442799461642, | |
| "grad_norm": 1.241036057472229, | |
| "learning_rate": 8.41961935011506e-05, | |
| "loss": 0.06951723098754883, | |
| "mean_token_accuracy": 0.9731462299823761, | |
| "num_tokens": 39603326.0, | |
| "step": 2210 | |
| }, | |
| { | |
| "entropy": 1.1296806812286377, | |
| "epoch": 2.987886944818304, | |
| "grad_norm": 1.249014139175415, | |
| "learning_rate": 8.403348920642911e-05, | |
| "loss": 0.07304394245147705, | |
| "mean_token_accuracy": 0.9720990836620331, | |
| "num_tokens": 39782396.0, | |
| "step": 2220 | |
| }, | |
| { | |
| "epoch": 3.0, | |
| "eval_entropy": 1.1091555235492196, | |
| "eval_loss": 0.10757029801607132, | |
| "eval_mean_token_accuracy": 0.9568785178433558, | |
| "eval_num_tokens": 39944284.0, | |
| "eval_runtime": 12.7473, | |
| "eval_samples_per_second": 392.239, | |
| "eval_steps_per_second": 12.316, | |
| "step": 2229 | |
| }, | |
| { | |
| "entropy": 1.1163917779922485, | |
| "epoch": 3.0013458950201883, | |
| "grad_norm": 1.0238746404647827, | |
| "learning_rate": 8.387011062872883e-05, | |
| "loss": 0.06967235207557679, | |
| "mean_token_accuracy": 0.9721763372421265, | |
| "num_tokens": 39962316.0, | |
| "step": 2230 | |
| }, | |
| { | |
| "entropy": 1.09123033285141, | |
| "epoch": 3.0148048452220726, | |
| "grad_norm": 1.2878422737121582, | |
| "learning_rate": 8.370606100496128e-05, | |
| "loss": 0.04885563850402832, | |
| "mean_token_accuracy": 0.9820802330970764, | |
| "num_tokens": 40141723.0, | |
| "step": 2240 | |
| }, | |
| { | |
| "entropy": 1.0791454672813416, | |
| "epoch": 3.028263795423957, | |
| "grad_norm": 1.1635181903839111, | |
| "learning_rate": 8.354134358533301e-05, | |
| "loss": 0.052800750732421874, | |
| "mean_token_accuracy": 0.9794562578201294, | |
| "num_tokens": 40320756.0, | |
| "step": 2250 | |
| }, | |
| { | |
| "entropy": 1.0924689769744873, | |
| "epoch": 3.041722745625841, | |
| "grad_norm": 1.0146963596343994, | |
| "learning_rate": 8.337596163328114e-05, | |
| "loss": 0.051714420318603516, | |
| "mean_token_accuracy": 0.9811573922634125, | |
| "num_tokens": 40499632.0, | |
| "step": 2260 | |
| }, | |
| { | |
| "entropy": 1.086620819568634, | |
| "epoch": 3.0551816958277254, | |
| "grad_norm": 1.1477982997894287, | |
| "learning_rate": 8.320991842540875e-05, | |
| "loss": 0.046452611684799194, | |
| "mean_token_accuracy": 0.9823605060577393, | |
| "num_tokens": 40679088.0, | |
| "step": 2270 | |
| }, | |
| { | |
| "entropy": 1.0724918246269226, | |
| "epoch": 3.0686406460296096, | |
| "grad_norm": 1.5574088096618652, | |
| "learning_rate": 8.304321725141995e-05, | |
| "loss": 0.05086854100227356, | |
| "mean_token_accuracy": 0.981047946214676, | |
| "num_tokens": 40859040.0, | |
| "step": 2280 | |
| }, | |
| { | |
| "entropy": 1.0786724448204041, | |
| "epoch": 3.082099596231494, | |
| "grad_norm": 1.0978903770446777, | |
| "learning_rate": 8.287586141405464e-05, | |
| "loss": 0.053134101629257205, | |
| "mean_token_accuracy": 0.9794782817363739, | |
| "num_tokens": 41038118.0, | |
| "step": 2290 | |
| }, | |
| { | |
| "entropy": 1.0825668334960938, | |
| "epoch": 3.095558546433378, | |
| "grad_norm": 1.1968939304351807, | |
| "learning_rate": 8.27078542290232e-05, | |
| "loss": 0.04702814817428589, | |
| "mean_token_accuracy": 0.9820038139820099, | |
| "num_tokens": 41217196.0, | |
| "step": 2300 | |
| }, | |
| { | |
| "entropy": 1.0657127141952514, | |
| "epoch": 3.1090174966352624, | |
| "grad_norm": 1.544787049293518, | |
| "learning_rate": 8.253919902494071e-05, | |
| "loss": 0.051616507768630984, | |
| "mean_token_accuracy": 0.9803533375263214, | |
| "num_tokens": 41396525.0, | |
| "step": 2310 | |
| }, | |
| { | |
| "entropy": 1.0903544425964355, | |
| "epoch": 3.1224764468371466, | |
| "grad_norm": 1.2193773984909058, | |
| "learning_rate": 8.236989914326101e-05, | |
| "loss": 0.049640601873397826, | |
| "mean_token_accuracy": 0.9816096484661102, | |
| "num_tokens": 41575854.0, | |
| "step": 2320 | |
| }, | |
| { | |
| "entropy": 1.0869248747825622, | |
| "epoch": 3.135935397039031, | |
| "grad_norm": 1.1424866914749146, | |
| "learning_rate": 8.21999579382105e-05, | |
| "loss": 0.049025171995162965, | |
| "mean_token_accuracy": 0.9807829558849335, | |
| "num_tokens": 41755176.0, | |
| "step": 2330 | |
| }, | |
| { | |
| "entropy": 1.0788421630859375, | |
| "epoch": 3.149394347240915, | |
| "grad_norm": 1.1553189754486084, | |
| "learning_rate": 8.202937877672175e-05, | |
| "loss": 0.05088653564453125, | |
| "mean_token_accuracy": 0.9806456983089447, | |
| "num_tokens": 41935004.0, | |
| "step": 2340 | |
| }, | |
| { | |
| "entropy": 1.0868588209152221, | |
| "epoch": 3.1628532974427994, | |
| "grad_norm": 1.2867755889892578, | |
| "learning_rate": 8.185816503836665e-05, | |
| "loss": 0.04932470321655273, | |
| "mean_token_accuracy": 0.9814176499843598, | |
| "num_tokens": 42114422.0, | |
| "step": 2350 | |
| }, | |
| { | |
| "entropy": 1.077353870868683, | |
| "epoch": 3.1763122476446837, | |
| "grad_norm": 1.0823496580123901, | |
| "learning_rate": 8.168632011528961e-05, | |
| "loss": 0.05128706097602844, | |
| "mean_token_accuracy": 0.9801781177520752, | |
| "num_tokens": 42293971.0, | |
| "step": 2360 | |
| }, | |
| { | |
| "entropy": 1.066714358329773, | |
| "epoch": 3.189771197846568, | |
| "grad_norm": 1.2061221599578857, | |
| "learning_rate": 8.15138474121403e-05, | |
| "loss": 0.04973976612091065, | |
| "mean_token_accuracy": 0.9804952144622803, | |
| "num_tokens": 42473552.0, | |
| "step": 2370 | |
| }, | |
| { | |
| "entropy": 1.0797779440879822, | |
| "epoch": 3.203230148048452, | |
| "grad_norm": 1.111051321029663, | |
| "learning_rate": 8.134075034600609e-05, | |
| "loss": 0.05457779169082642, | |
| "mean_token_accuracy": 0.979478371143341, | |
| "num_tokens": 42652094.0, | |
| "step": 2380 | |
| }, | |
| { | |
| "entropy": 1.0842301845550537, | |
| "epoch": 3.2166890982503364, | |
| "grad_norm": 1.2103989124298096, | |
| "learning_rate": 8.116703234634453e-05, | |
| "loss": 0.05012301206588745, | |
| "mean_token_accuracy": 0.9813940584659576, | |
| "num_tokens": 42831176.0, | |
| "step": 2390 | |
| }, | |
| { | |
| "entropy": 1.072116458415985, | |
| "epoch": 3.2301480484522207, | |
| "grad_norm": 1.1604132652282715, | |
| "learning_rate": 8.099269685491528e-05, | |
| "loss": 0.0516257107257843, | |
| "mean_token_accuracy": 0.9805888772010803, | |
| "num_tokens": 43010038.0, | |
| "step": 2400 | |
| }, | |
| { | |
| "entropy": 1.0759256601333618, | |
| "epoch": 3.243606998654105, | |
| "grad_norm": 1.055504560470581, | |
| "learning_rate": 8.081774732571196e-05, | |
| "loss": 0.05206198692321777, | |
| "mean_token_accuracy": 0.9800681293010711, | |
| "num_tokens": 43189534.0, | |
| "step": 2410 | |
| }, | |
| { | |
| "entropy": 1.074904704093933, | |
| "epoch": 3.257065948855989, | |
| "grad_norm": 1.3270564079284668, | |
| "learning_rate": 8.06421872248937e-05, | |
| "loss": 0.053070676326751706, | |
| "mean_token_accuracy": 0.9799122869968414, | |
| "num_tokens": 43369326.0, | |
| "step": 2420 | |
| }, | |
| { | |
| "entropy": 1.0772022962570191, | |
| "epoch": 3.2705248990578735, | |
| "grad_norm": 1.1767622232437134, | |
| "learning_rate": 8.046602003071648e-05, | |
| "loss": 0.05088210105895996, | |
| "mean_token_accuracy": 0.980486124753952, | |
| "num_tokens": 43547755.0, | |
| "step": 2430 | |
| }, | |
| { | |
| "entropy": 1.0786782264709474, | |
| "epoch": 3.2839838492597577, | |
| "grad_norm": 1.320910930633545, | |
| "learning_rate": 8.028924923346426e-05, | |
| "loss": 0.05664767622947693, | |
| "mean_token_accuracy": 0.9779302835464477, | |
| "num_tokens": 43726776.0, | |
| "step": 2440 | |
| }, | |
| { | |
| "entropy": 1.0860824465751648, | |
| "epoch": 3.297442799461642, | |
| "grad_norm": 1.2538353204727173, | |
| "learning_rate": 8.011187833537972e-05, | |
| "loss": 0.053240764141082766, | |
| "mean_token_accuracy": 0.9801206707954406, | |
| "num_tokens": 43905546.0, | |
| "step": 2450 | |
| }, | |
| { | |
| "entropy": 1.081036913394928, | |
| "epoch": 3.3109017496635262, | |
| "grad_norm": 1.1865931749343872, | |
| "learning_rate": 7.993391085059502e-05, | |
| "loss": 0.05133863687515259, | |
| "mean_token_accuracy": 0.9802879393100739, | |
| "num_tokens": 44084371.0, | |
| "step": 2460 | |
| }, | |
| { | |
| "entropy": 1.0795913338661194, | |
| "epoch": 3.3243606998654105, | |
| "grad_norm": 1.3042405843734741, | |
| "learning_rate": 7.975535030506203e-05, | |
| "loss": 0.053297781944274904, | |
| "mean_token_accuracy": 0.9790417075157165, | |
| "num_tokens": 44263366.0, | |
| "step": 2470 | |
| }, | |
| { | |
| "entropy": 1.0945096254348754, | |
| "epoch": 3.3378196500672948, | |
| "grad_norm": 1.224970817565918, | |
| "learning_rate": 7.957620023648256e-05, | |
| "loss": 0.05278623104095459, | |
| "mean_token_accuracy": 0.9799198031425476, | |
| "num_tokens": 44442200.0, | |
| "step": 2480 | |
| }, | |
| { | |
| "entropy": 1.0911083102226258, | |
| "epoch": 3.351278600269179, | |
| "grad_norm": 1.1166850328445435, | |
| "learning_rate": 7.939646419423826e-05, | |
| "loss": 0.055463647842407225, | |
| "mean_token_accuracy": 0.9793131470680236, | |
| "num_tokens": 44621518.0, | |
| "step": 2490 | |
| }, | |
| { | |
| "entropy": 1.0902422070503235, | |
| "epoch": 3.3647375504710633, | |
| "grad_norm": 1.3478926420211792, | |
| "learning_rate": 7.92161457393203e-05, | |
| "loss": 0.05521976947784424, | |
| "mean_token_accuracy": 0.9795126497745514, | |
| "num_tokens": 44800926.0, | |
| "step": 2500 | |
| }, | |
| { | |
| "entropy": 1.0822187185287475, | |
| "epoch": 3.3781965006729475, | |
| "grad_norm": 1.3261746168136597, | |
| "learning_rate": 7.903524844425878e-05, | |
| "loss": 0.0533198356628418, | |
| "mean_token_accuracy": 0.9789367377758026, | |
| "num_tokens": 44980265.0, | |
| "step": 2510 | |
| }, | |
| { | |
| "entropy": 1.087248969078064, | |
| "epoch": 3.391655450874832, | |
| "grad_norm": 1.005294919013977, | |
| "learning_rate": 7.885377589305197e-05, | |
| "loss": 0.05112231373786926, | |
| "mean_token_accuracy": 0.980577951669693, | |
| "num_tokens": 45159183.0, | |
| "step": 2520 | |
| }, | |
| { | |
| "entropy": 1.0852233290672302, | |
| "epoch": 3.405114401076716, | |
| "grad_norm": 1.2185736894607544, | |
| "learning_rate": 7.867173168109534e-05, | |
| "loss": 0.05045679807662964, | |
| "mean_token_accuracy": 0.9805097222328186, | |
| "num_tokens": 45338667.0, | |
| "step": 2530 | |
| }, | |
| { | |
| "entropy": 1.0843430399894713, | |
| "epoch": 3.4185733512786003, | |
| "grad_norm": 1.183929681777954, | |
| "learning_rate": 7.84891194151103e-05, | |
| "loss": 0.050162500143051146, | |
| "mean_token_accuracy": 0.9804108917713166, | |
| "num_tokens": 45517477.0, | |
| "step": 2540 | |
| }, | |
| { | |
| "entropy": 1.0773622393608093, | |
| "epoch": 3.4320323014804845, | |
| "grad_norm": 1.0225229263305664, | |
| "learning_rate": 7.830594271307267e-05, | |
| "loss": 0.05130444765090943, | |
| "mean_token_accuracy": 0.9804662525653839, | |
| "num_tokens": 45696935.0, | |
| "step": 2550 | |
| }, | |
| { | |
| "entropy": 1.0753762125968933, | |
| "epoch": 3.445491251682369, | |
| "grad_norm": 1.1646474599838257, | |
| "learning_rate": 7.812220520414115e-05, | |
| "loss": 0.05273773670196533, | |
| "mean_token_accuracy": 0.9797065794467926, | |
| "num_tokens": 45875869.0, | |
| "step": 2560 | |
| }, | |
| { | |
| "entropy": 1.0715667009353638, | |
| "epoch": 3.458950201884253, | |
| "grad_norm": 1.1959201097488403, | |
| "learning_rate": 7.793791052858528e-05, | |
| "loss": 0.053073453903198245, | |
| "mean_token_accuracy": 0.9801750302314758, | |
| "num_tokens": 46055697.0, | |
| "step": 2570 | |
| }, | |
| { | |
| "entropy": 1.07788827419281, | |
| "epoch": 3.4724091520861373, | |
| "grad_norm": 1.3329288959503174, | |
| "learning_rate": 7.775306233771343e-05, | |
| "loss": 0.053553718328475955, | |
| "mean_token_accuracy": 0.9794677913188934, | |
| "num_tokens": 46234543.0, | |
| "step": 2580 | |
| }, | |
| { | |
| "entropy": 1.0729936122894288, | |
| "epoch": 3.4858681022880216, | |
| "grad_norm": 1.261353850364685, | |
| "learning_rate": 7.756766429380033e-05, | |
| "loss": 0.0518909215927124, | |
| "mean_token_accuracy": 0.9810412526130676, | |
| "num_tokens": 46413969.0, | |
| "step": 2590 | |
| }, | |
| { | |
| "entropy": 1.0738806128501892, | |
| "epoch": 3.499327052489906, | |
| "grad_norm": 1.5493957996368408, | |
| "learning_rate": 7.738172007001465e-05, | |
| "loss": 0.05155509114265442, | |
| "mean_token_accuracy": 0.9803483247756958, | |
| "num_tokens": 46593798.0, | |
| "step": 2600 | |
| }, | |
| { | |
| "entropy": 1.0901301860809327, | |
| "epoch": 3.51278600269179, | |
| "grad_norm": 1.0253880023956299, | |
| "learning_rate": 7.719523335034612e-05, | |
| "loss": 0.05098943710327149, | |
| "mean_token_accuracy": 0.9800853669643402, | |
| "num_tokens": 46772326.0, | |
| "step": 2610 | |
| }, | |
| { | |
| "entropy": 1.0800783038139343, | |
| "epoch": 3.5262449528936743, | |
| "grad_norm": 1.4688411951065063, | |
| "learning_rate": 7.70082078295326e-05, | |
| "loss": 0.05392987728118896, | |
| "mean_token_accuracy": 0.978635448217392, | |
| "num_tokens": 46950959.0, | |
| "step": 2620 | |
| }, | |
| { | |
| "entropy": 1.090990924835205, | |
| "epoch": 3.5397039030955586, | |
| "grad_norm": 1.3596893548965454, | |
| "learning_rate": 7.682064721298683e-05, | |
| "loss": 0.053889667987823485, | |
| "mean_token_accuracy": 0.9795092165470123, | |
| "num_tokens": 47129717.0, | |
| "step": 2630 | |
| }, | |
| { | |
| "entropy": 1.1031178951263427, | |
| "epoch": 3.553162853297443, | |
| "grad_norm": 1.2542155981063843, | |
| "learning_rate": 7.663255521672308e-05, | |
| "loss": 0.05507693886756897, | |
| "mean_token_accuracy": 0.9788171648979187, | |
| "num_tokens": 47308851.0, | |
| "step": 2640 | |
| }, | |
| { | |
| "entropy": 1.0928621292114258, | |
| "epoch": 3.566621803499327, | |
| "grad_norm": 1.328484058380127, | |
| "learning_rate": 7.64439355672835e-05, | |
| "loss": 0.05089789628982544, | |
| "mean_token_accuracy": 0.9808636486530304, | |
| "num_tokens": 47487370.0, | |
| "step": 2650 | |
| }, | |
| { | |
| "entropy": 1.083961284160614, | |
| "epoch": 3.5800807537012114, | |
| "grad_norm": 1.1594445705413818, | |
| "learning_rate": 7.625479200166425e-05, | |
| "loss": 0.0499467521905899, | |
| "mean_token_accuracy": 0.9808259546756745, | |
| "num_tokens": 47667176.0, | |
| "step": 2660 | |
| }, | |
| { | |
| "entropy": 1.0920774698257447, | |
| "epoch": 3.5935397039030956, | |
| "grad_norm": 1.3259137868881226, | |
| "learning_rate": 7.606512826724155e-05, | |
| "loss": 0.051684331893920896, | |
| "mean_token_accuracy": 0.9798979878425598, | |
| "num_tokens": 47845643.0, | |
| "step": 2670 | |
| }, | |
| { | |
| "entropy": 1.0919681429862975, | |
| "epoch": 3.60699865410498, | |
| "grad_norm": 1.3699877262115479, | |
| "learning_rate": 7.587494812169728e-05, | |
| "loss": 0.051729452610015866, | |
| "mean_token_accuracy": 0.980066442489624, | |
| "num_tokens": 48024631.0, | |
| "step": 2680 | |
| }, | |
| { | |
| "entropy": 1.084936547279358, | |
| "epoch": 3.620457604306864, | |
| "grad_norm": 1.2969093322753906, | |
| "learning_rate": 7.568425533294476e-05, | |
| "loss": 0.054199641942977904, | |
| "mean_token_accuracy": 0.9787844777107239, | |
| "num_tokens": 48204028.0, | |
| "step": 2690 | |
| }, | |
| { | |
| "entropy": 1.0940564393997192, | |
| "epoch": 3.6339165545087484, | |
| "grad_norm": 1.0239887237548828, | |
| "learning_rate": 7.549305367905385e-05, | |
| "loss": 0.054245364665985105, | |
| "mean_token_accuracy": 0.9782158195972442, | |
| "num_tokens": 48382282.0, | |
| "step": 2700 | |
| }, | |
| { | |
| "entropy": 1.0823640823364258, | |
| "epoch": 3.6473755047106327, | |
| "grad_norm": 1.1898547410964966, | |
| "learning_rate": 7.53013469481763e-05, | |
| "loss": 0.056242328882217404, | |
| "mean_token_accuracy": 0.9788333296775817, | |
| "num_tokens": 48561545.0, | |
| "step": 2710 | |
| }, | |
| { | |
| "entropy": 1.0766677498817443, | |
| "epoch": 3.660834454912517, | |
| "grad_norm": 1.2227085828781128, | |
| "learning_rate": 7.510913893847058e-05, | |
| "loss": 0.0562747597694397, | |
| "mean_token_accuracy": 0.9780334293842315, | |
| "num_tokens": 48741185.0, | |
| "step": 2720 | |
| }, | |
| { | |
| "entropy": 1.091788113117218, | |
| "epoch": 3.674293405114401, | |
| "grad_norm": 1.517991304397583, | |
| "learning_rate": 7.491643345802667e-05, | |
| "loss": 0.05473091006278992, | |
| "mean_token_accuracy": 0.9788542568683625, | |
| "num_tokens": 48920353.0, | |
| "step": 2730 | |
| }, | |
| { | |
| "entropy": 1.1030115008354187, | |
| "epoch": 3.6877523553162854, | |
| "grad_norm": 1.3098318576812744, | |
| "learning_rate": 7.472323432479062e-05, | |
| "loss": 0.058284854888916014, | |
| "mean_token_accuracy": 0.9783021330833435, | |
| "num_tokens": 49099763.0, | |
| "step": 2740 | |
| }, | |
| { | |
| "entropy": 1.0996167540550232, | |
| "epoch": 3.7012113055181697, | |
| "grad_norm": 1.1821726560592651, | |
| "learning_rate": 7.452954536648888e-05, | |
| "loss": 0.0540702223777771, | |
| "mean_token_accuracy": 0.979110324382782, | |
| "num_tokens": 49278776.0, | |
| "step": 2750 | |
| }, | |
| { | |
| "entropy": 1.0863705158233643, | |
| "epoch": 3.714670255720054, | |
| "grad_norm": 1.2044000625610352, | |
| "learning_rate": 7.433537042055248e-05, | |
| "loss": 0.05256187915802002, | |
| "mean_token_accuracy": 0.9793642222881317, | |
| "num_tokens": 49457667.0, | |
| "step": 2760 | |
| }, | |
| { | |
| "entropy": 1.0813929796218873, | |
| "epoch": 3.728129205921938, | |
| "grad_norm": 0.9910821318626404, | |
| "learning_rate": 7.414071333404104e-05, | |
| "loss": 0.053735208511352536, | |
| "mean_token_accuracy": 0.9796746790409088, | |
| "num_tokens": 49637000.0, | |
| "step": 2770 | |
| }, | |
| { | |
| "entropy": 1.0826418280601502, | |
| "epoch": 3.7415881561238225, | |
| "grad_norm": 1.16364586353302, | |
| "learning_rate": 7.394557796356644e-05, | |
| "loss": 0.05005708336830139, | |
| "mean_token_accuracy": 0.9802528321743011, | |
| "num_tokens": 49815945.0, | |
| "step": 2780 | |
| }, | |
| { | |
| "entropy": 1.0849217176437378, | |
| "epoch": 3.7550471063257067, | |
| "grad_norm": 1.1651335954666138, | |
| "learning_rate": 7.374996817521653e-05, | |
| "loss": 0.05107729434967041, | |
| "mean_token_accuracy": 0.9808483123779297, | |
| "num_tokens": 49995002.0, | |
| "step": 2790 | |
| }, | |
| { | |
| "entropy": 1.0982914447784424, | |
| "epoch": 3.768506056527591, | |
| "grad_norm": 1.300089716911316, | |
| "learning_rate": 7.35538878444785e-05, | |
| "loss": 0.05192199945449829, | |
| "mean_token_accuracy": 0.9799326181411743, | |
| "num_tokens": 50174611.0, | |
| "step": 2800 | |
| }, | |
| { | |
| "entropy": 1.0978451728820802, | |
| "epoch": 3.781965006729475, | |
| "grad_norm": 1.131295919418335, | |
| "learning_rate": 7.335734085616206e-05, | |
| "loss": 0.05380460023880005, | |
| "mean_token_accuracy": 0.9787914991378784, | |
| "num_tokens": 50354572.0, | |
| "step": 2810 | |
| }, | |
| { | |
| "entropy": 1.1055003643035888, | |
| "epoch": 3.7954239569313595, | |
| "grad_norm": 1.1545559167861938, | |
| "learning_rate": 7.316033110432249e-05, | |
| "loss": 0.05247194766998291, | |
| "mean_token_accuracy": 0.9800500810146332, | |
| "num_tokens": 50533810.0, | |
| "step": 2820 | |
| }, | |
| { | |
| "entropy": 1.1019269943237304, | |
| "epoch": 3.8088829071332437, | |
| "grad_norm": 1.439558744430542, | |
| "learning_rate": 7.296286249218352e-05, | |
| "loss": 0.05247552394866943, | |
| "mean_token_accuracy": 0.9799910664558411, | |
| "num_tokens": 50712921.0, | |
| "step": 2830 | |
| }, | |
| { | |
| "entropy": 1.1034127831459046, | |
| "epoch": 3.822341857335128, | |
| "grad_norm": 1.1686484813690186, | |
| "learning_rate": 7.276493893205995e-05, | |
| "loss": 0.053317368030548096, | |
| "mean_token_accuracy": 0.9795270919799804, | |
| "num_tokens": 50892646.0, | |
| "step": 2840 | |
| }, | |
| { | |
| "entropy": 1.0969440698623658, | |
| "epoch": 3.8358008075370122, | |
| "grad_norm": 1.1592854261398315, | |
| "learning_rate": 7.256656434528018e-05, | |
| "loss": 0.05192090272903442, | |
| "mean_token_accuracy": 0.9800365567207336, | |
| "num_tokens": 51072211.0, | |
| "step": 2850 | |
| }, | |
| { | |
| "entropy": 1.090685820579529, | |
| "epoch": 3.8492597577388965, | |
| "grad_norm": 1.5477663278579712, | |
| "learning_rate": 7.236774266210852e-05, | |
| "loss": 0.051884579658508304, | |
| "mean_token_accuracy": 0.9792610287666321, | |
| "num_tokens": 51251374.0, | |
| "step": 2860 | |
| }, | |
| { | |
| "entropy": 1.1031171798706054, | |
| "epoch": 3.8627187079407808, | |
| "grad_norm": 0.9951223134994507, | |
| "learning_rate": 7.216847782166727e-05, | |
| "loss": 0.051608985662460326, | |
| "mean_token_accuracy": 0.9799349427223205, | |
| "num_tokens": 51429845.0, | |
| "step": 2870 | |
| }, | |
| { | |
| "entropy": 1.0826797842979432, | |
| "epoch": 3.876177658142665, | |
| "grad_norm": 1.0834344625473022, | |
| "learning_rate": 7.196877377185872e-05, | |
| "loss": 0.05112177133560181, | |
| "mean_token_accuracy": 0.980464369058609, | |
| "num_tokens": 51609039.0, | |
| "step": 2880 | |
| }, | |
| { | |
| "entropy": 1.0658052206039428, | |
| "epoch": 3.8896366083445493, | |
| "grad_norm": 1.361104130744934, | |
| "learning_rate": 7.176863446928694e-05, | |
| "loss": 0.05474343299865723, | |
| "mean_token_accuracy": 0.9786961674690247, | |
| "num_tokens": 51787900.0, | |
| "step": 2890 | |
| }, | |
| { | |
| "entropy": 1.0786802649497986, | |
| "epoch": 3.9030955585464335, | |
| "grad_norm": 1.3113523721694946, | |
| "learning_rate": 7.156806387917937e-05, | |
| "loss": 0.05351792573928833, | |
| "mean_token_accuracy": 0.9795374929904938, | |
| "num_tokens": 51967068.0, | |
| "step": 2900 | |
| }, | |
| { | |
| "entropy": 1.0683523178100587, | |
| "epoch": 3.916554508748318, | |
| "grad_norm": 1.172418236732483, | |
| "learning_rate": 7.136706597530825e-05, | |
| "loss": 0.05207247734069824, | |
| "mean_token_accuracy": 0.9796711683273316, | |
| "num_tokens": 52146442.0, | |
| "step": 2910 | |
| }, | |
| { | |
| "entropy": 1.0647220134735107, | |
| "epoch": 3.930013458950202, | |
| "grad_norm": 1.109689712524414, | |
| "learning_rate": 7.116564473991192e-05, | |
| "loss": 0.050079309940338136, | |
| "mean_token_accuracy": 0.9804611086845398, | |
| "num_tokens": 52325461.0, | |
| "step": 2920 | |
| }, | |
| { | |
| "entropy": 1.0695144891738892, | |
| "epoch": 3.9434724091520863, | |
| "grad_norm": 1.0333713293075562, | |
| "learning_rate": 7.096380416361588e-05, | |
| "loss": 0.05163470506668091, | |
| "mean_token_accuracy": 0.9802155733108521, | |
| "num_tokens": 52505157.0, | |
| "step": 2930 | |
| }, | |
| { | |
| "entropy": 1.0664403676986693, | |
| "epoch": 3.9569313593539706, | |
| "grad_norm": 1.1881392002105713, | |
| "learning_rate": 7.076154824535381e-05, | |
| "loss": 0.05164710283279419, | |
| "mean_token_accuracy": 0.9804204642772675, | |
| "num_tokens": 52684876.0, | |
| "step": 2940 | |
| }, | |
| { | |
| "entropy": 1.061314308643341, | |
| "epoch": 3.970390309555855, | |
| "grad_norm": 1.1483125686645508, | |
| "learning_rate": 7.055888099228825e-05, | |
| "loss": 0.051780533790588376, | |
| "mean_token_accuracy": 0.9797692000865936, | |
| "num_tokens": 52864664.0, | |
| "step": 2950 | |
| }, | |
| { | |
| "entropy": 1.0693163990974426, | |
| "epoch": 3.983849259757739, | |
| "grad_norm": 1.319306492805481, | |
| "learning_rate": 7.035580641973119e-05, | |
| "loss": 0.054670310020446776, | |
| "mean_token_accuracy": 0.9792993664741516, | |
| "num_tokens": 53043689.0, | |
| "step": 2960 | |
| }, | |
| { | |
| "entropy": 1.071792209148407, | |
| "epoch": 3.9973082099596233, | |
| "grad_norm": 1.1120209693908691, | |
| "learning_rate": 7.015232855106468e-05, | |
| "loss": 0.05089702606201172, | |
| "mean_token_accuracy": 0.9801424205303192, | |
| "num_tokens": 53222975.0, | |
| "step": 2970 | |
| }, | |
| { | |
| "epoch": 4.0, | |
| "eval_entropy": 1.0652484506558462, | |
| "eval_loss": 0.11533018946647644, | |
| "eval_mean_token_accuracy": 0.9572236617659308, | |
| "eval_num_tokens": 53259126.0, | |
| "eval_runtime": 12.7495, | |
| "eval_samples_per_second": 392.172, | |
| "eval_steps_per_second": 12.314, | |
| "step": 2972 | |
| }, | |
| { | |
| "entropy": 1.0567853927612305, | |
| "epoch": 4.010767160161508, | |
| "grad_norm": 1.0667976140975952, | |
| "learning_rate": 6.994845141766093e-05, | |
| "loss": 0.037795445322990416, | |
| "mean_token_accuracy": 0.9864524364471435, | |
| "num_tokens": 53401849.0, | |
| "step": 2980 | |
| }, | |
| { | |
| "entropy": 1.031903612613678, | |
| "epoch": 4.024226110363392, | |
| "grad_norm": 1.3452744483947754, | |
| "learning_rate": 6.974417905880255e-05, | |
| "loss": 0.03760968446731568, | |
| "mean_token_accuracy": 0.9865435302257538, | |
| "num_tokens": 53581109.0, | |
| "step": 2990 | |
| }, | |
| { | |
| "entropy": 1.0267599463462829, | |
| "epoch": 4.037685060565276, | |
| "grad_norm": 0.9383119940757751, | |
| "learning_rate": 6.953951552160248e-05, | |
| "loss": 0.034871619939804074, | |
| "mean_token_accuracy": 0.9870846211910248, | |
| "num_tokens": 53760927.0, | |
| "step": 3000 | |
| }, | |
| { | |
| "entropy": 1.0307875275611877, | |
| "epoch": 4.05114401076716, | |
| "grad_norm": 1.077243447303772, | |
| "learning_rate": 6.933446486092381e-05, | |
| "loss": 0.03702903389930725, | |
| "mean_token_accuracy": 0.9866882026195526, | |
| "num_tokens": 53939706.0, | |
| "step": 3010 | |
| }, | |
| { | |
| "entropy": 1.0310243368148804, | |
| "epoch": 4.064602960969045, | |
| "grad_norm": 1.3179757595062256, | |
| "learning_rate": 6.912903113929947e-05, | |
| "loss": 0.03828337490558624, | |
| "mean_token_accuracy": 0.9860410630702973, | |
| "num_tokens": 54118885.0, | |
| "step": 3020 | |
| }, | |
| { | |
| "entropy": 1.0358707904815674, | |
| "epoch": 4.078061911170929, | |
| "grad_norm": 0.949308454990387, | |
| "learning_rate": 6.892321842685171e-05, | |
| "loss": 0.036287522315979, | |
| "mean_token_accuracy": 0.9870272636413574, | |
| "num_tokens": 54298699.0, | |
| "step": 3030 | |
| }, | |
| { | |
| "entropy": 1.0304685473442077, | |
| "epoch": 4.091520861372813, | |
| "grad_norm": 1.271782636642456, | |
| "learning_rate": 6.871703080121148e-05, | |
| "loss": 0.036009562015533444, | |
| "mean_token_accuracy": 0.9868483424186707, | |
| "num_tokens": 54477713.0, | |
| "step": 3040 | |
| }, | |
| { | |
| "entropy": 1.0414454936981201, | |
| "epoch": 4.104979811574697, | |
| "grad_norm": 1.2563632726669312, | |
| "learning_rate": 6.851047234743763e-05, | |
| "loss": 0.03808114230632782, | |
| "mean_token_accuracy": 0.9857545673847199, | |
| "num_tokens": 54656606.0, | |
| "step": 3050 | |
| }, | |
| { | |
| "entropy": 1.0467128515243531, | |
| "epoch": 4.118438761776582, | |
| "grad_norm": 1.3075348138809204, | |
| "learning_rate": 6.830354715793598e-05, | |
| "loss": 0.03709319531917572, | |
| "mean_token_accuracy": 0.9860689222812653, | |
| "num_tokens": 54835569.0, | |
| "step": 3060 | |
| }, | |
| { | |
| "entropy": 1.0483134746551515, | |
| "epoch": 4.131897711978466, | |
| "grad_norm": 1.325262188911438, | |
| "learning_rate": 6.809625933237826e-05, | |
| "loss": 0.03990113139152527, | |
| "mean_token_accuracy": 0.9844289362430573, | |
| "num_tokens": 55015181.0, | |
| "step": 3070 | |
| }, | |
| { | |
| "entropy": 1.0507565259933471, | |
| "epoch": 4.14535666218035, | |
| "grad_norm": 0.9971099495887756, | |
| "learning_rate": 6.788861297762086e-05, | |
| "loss": 0.03460783362388611, | |
| "mean_token_accuracy": 0.9869785606861115, | |
| "num_tokens": 55194211.0, | |
| "step": 3080 | |
| }, | |
| { | |
| "entropy": 1.0375954627990722, | |
| "epoch": 4.158815612382234, | |
| "grad_norm": 1.1459321975708008, | |
| "learning_rate": 6.768061220762345e-05, | |
| "loss": 0.039771124720573425, | |
| "mean_token_accuracy": 0.984779155254364, | |
| "num_tokens": 55373415.0, | |
| "step": 3090 | |
| }, | |
| { | |
| "entropy": 1.047706139087677, | |
| "epoch": 4.172274562584119, | |
| "grad_norm": 0.9713513255119324, | |
| "learning_rate": 6.747226114336753e-05, | |
| "loss": 0.036009186506271364, | |
| "mean_token_accuracy": 0.9871737420558929, | |
| "num_tokens": 55552640.0, | |
| "step": 3100 | |
| }, | |
| { | |
| "entropy": 1.0445024371147156, | |
| "epoch": 4.185733512786003, | |
| "grad_norm": 1.0984820127487183, | |
| "learning_rate": 6.726356391277471e-05, | |
| "loss": 0.03706789314746857, | |
| "mean_token_accuracy": 0.98599613904953, | |
| "num_tokens": 55731084.0, | |
| "step": 3110 | |
| }, | |
| { | |
| "entropy": 1.0476725816726684, | |
| "epoch": 4.199192462987887, | |
| "grad_norm": 1.326766014099121, | |
| "learning_rate": 6.7054524650625e-05, | |
| "loss": 0.037712886929512024, | |
| "mean_token_accuracy": 0.9861681163311005, | |
| "num_tokens": 55910040.0, | |
| "step": 3120 | |
| }, | |
| { | |
| "entropy": 1.0618762969970703, | |
| "epoch": 4.212651413189771, | |
| "grad_norm": 1.0765913724899292, | |
| "learning_rate": 6.684514749847482e-05, | |
| "loss": 0.04045211672782898, | |
| "mean_token_accuracy": 0.985183447599411, | |
| "num_tokens": 56089401.0, | |
| "step": 3130 | |
| }, | |
| { | |
| "entropy": 1.0641907811164857, | |
| "epoch": 4.226110363391656, | |
| "grad_norm": 1.1481443643569946, | |
| "learning_rate": 6.663543660457503e-05, | |
| "loss": 0.03950580060482025, | |
| "mean_token_accuracy": 0.9853214621543884, | |
| "num_tokens": 56268108.0, | |
| "step": 3140 | |
| }, | |
| { | |
| "entropy": 1.0535814762115479, | |
| "epoch": 4.23956931359354, | |
| "grad_norm": 1.169559359550476, | |
| "learning_rate": 6.642539612378863e-05, | |
| "loss": 0.03928137719631195, | |
| "mean_token_accuracy": 0.9853263258934021, | |
| "num_tokens": 56447473.0, | |
| "step": 3150 | |
| }, | |
| { | |
| "entropy": 1.0659124374389648, | |
| "epoch": 4.253028263795424, | |
| "grad_norm": 1.4614970684051514, | |
| "learning_rate": 6.621503021750858e-05, | |
| "loss": 0.03726804554462433, | |
| "mean_token_accuracy": 0.9859472155570984, | |
| "num_tokens": 56626084.0, | |
| "step": 3160 | |
| }, | |
| { | |
| "entropy": 1.0618607163429261, | |
| "epoch": 4.2664872139973085, | |
| "grad_norm": 1.1655552387237549, | |
| "learning_rate": 6.600434305357521e-05, | |
| "loss": 0.0387746661901474, | |
| "mean_token_accuracy": 0.9854085683822632, | |
| "num_tokens": 56805198.0, | |
| "step": 3170 | |
| }, | |
| { | |
| "entropy": 1.0502426981925965, | |
| "epoch": 4.279946164199193, | |
| "grad_norm": 0.985268235206604, | |
| "learning_rate": 6.579333880619376e-05, | |
| "loss": 0.035914698243141176, | |
| "mean_token_accuracy": 0.9867818415164947, | |
| "num_tokens": 56984246.0, | |
| "step": 3180 | |
| }, | |
| { | |
| "entropy": 1.0420450568199158, | |
| "epoch": 4.293405114401077, | |
| "grad_norm": 1.1994178295135498, | |
| "learning_rate": 6.558202165585161e-05, | |
| "loss": 0.034868115186691286, | |
| "mean_token_accuracy": 0.9868482947349548, | |
| "num_tokens": 57163039.0, | |
| "step": 3190 | |
| }, | |
| { | |
| "entropy": 1.0321968078613282, | |
| "epoch": 4.306864064602961, | |
| "grad_norm": 1.160141110420227, | |
| "learning_rate": 6.53703957892355e-05, | |
| "loss": 0.03723163604736328, | |
| "mean_token_accuracy": 0.9852693140506744, | |
| "num_tokens": 57342230.0, | |
| "step": 3200 | |
| }, | |
| { | |
| "entropy": 1.0504021406173707, | |
| "epoch": 4.3203230148048455, | |
| "grad_norm": 1.068778395652771, | |
| "learning_rate": 6.515846539914854e-05, | |
| "loss": 0.03703950047492981, | |
| "mean_token_accuracy": 0.9864457190036774, | |
| "num_tokens": 57521452.0, | |
| "step": 3210 | |
| }, | |
| { | |
| "entropy": 1.0472504138946532, | |
| "epoch": 4.33378196500673, | |
| "grad_norm": 1.424491047859192, | |
| "learning_rate": 6.494623468442718e-05, | |
| "loss": 0.037775719165802, | |
| "mean_token_accuracy": 0.9865365862846375, | |
| "num_tokens": 57700961.0, | |
| "step": 3220 | |
| }, | |
| { | |
| "entropy": 1.0458205103874207, | |
| "epoch": 4.347240915208614, | |
| "grad_norm": 1.4818817377090454, | |
| "learning_rate": 6.473370784985798e-05, | |
| "loss": 0.03907344341278076, | |
| "mean_token_accuracy": 0.9854373514652253, | |
| "num_tokens": 57880226.0, | |
| "step": 3230 | |
| }, | |
| { | |
| "entropy": 1.0566739559173584, | |
| "epoch": 4.360699865410498, | |
| "grad_norm": 1.147052526473999, | |
| "learning_rate": 6.452088910609436e-05, | |
| "loss": 0.036917302012443545, | |
| "mean_token_accuracy": 0.9858834028244019, | |
| "num_tokens": 58059019.0, | |
| "step": 3240 | |
| }, | |
| { | |
| "entropy": 1.053157901763916, | |
| "epoch": 4.3741588156123825, | |
| "grad_norm": 1.1795092821121216, | |
| "learning_rate": 6.430778266957312e-05, | |
| "loss": 0.036924809217453, | |
| "mean_token_accuracy": 0.9859816431999207, | |
| "num_tokens": 58237916.0, | |
| "step": 3250 | |
| }, | |
| { | |
| "entropy": 1.0390576720237732, | |
| "epoch": 4.387617765814267, | |
| "grad_norm": 1.2286282777786255, | |
| "learning_rate": 6.409439276243092e-05, | |
| "loss": 0.03474595844745636, | |
| "mean_token_accuracy": 0.9870323657989502, | |
| "num_tokens": 58416659.0, | |
| "step": 3260 | |
| }, | |
| { | |
| "entropy": 1.0422054052352905, | |
| "epoch": 4.401076716016151, | |
| "grad_norm": 1.1984254121780396, | |
| "learning_rate": 6.388072361242067e-05, | |
| "loss": 0.03632686138153076, | |
| "mean_token_accuracy": 0.9864870607852936, | |
| "num_tokens": 58595581.0, | |
| "step": 3270 | |
| }, | |
| { | |
| "entropy": 1.0455679059028626, | |
| "epoch": 4.414535666218035, | |
| "grad_norm": 1.201826810836792, | |
| "learning_rate": 6.366677945282769e-05, | |
| "loss": 0.03645781874656677, | |
| "mean_token_accuracy": 0.9858992576599122, | |
| "num_tokens": 58774496.0, | |
| "step": 3280 | |
| }, | |
| { | |
| "entropy": 1.0460187315940856, | |
| "epoch": 4.4279946164199195, | |
| "grad_norm": 1.320504069328308, | |
| "learning_rate": 6.345256452238591e-05, | |
| "loss": 0.036577820777893066, | |
| "mean_token_accuracy": 0.9865557551383972, | |
| "num_tokens": 58953827.0, | |
| "step": 3290 | |
| }, | |
| { | |
| "entropy": 1.043444275856018, | |
| "epoch": 4.441453566621804, | |
| "grad_norm": 1.1478440761566162, | |
| "learning_rate": 6.323808306519385e-05, | |
| "loss": 0.03831025958061218, | |
| "mean_token_accuracy": 0.9856758773326874, | |
| "num_tokens": 59133259.0, | |
| "step": 3300 | |
| }, | |
| { | |
| "entropy": 1.0497878909111023, | |
| "epoch": 4.454912516823688, | |
| "grad_norm": 1.0950919389724731, | |
| "learning_rate": 6.302333933063057e-05, | |
| "loss": 0.0407365620136261, | |
| "mean_token_accuracy": 0.984984940290451, | |
| "num_tokens": 59312911.0, | |
| "step": 3310 | |
| }, | |
| { | |
| "entropy": 1.058940577507019, | |
| "epoch": 4.468371467025572, | |
| "grad_norm": 1.059362769126892, | |
| "learning_rate": 6.280833757327142e-05, | |
| "loss": 0.03741527199745178, | |
| "mean_token_accuracy": 0.9860852122306824, | |
| "num_tokens": 59492704.0, | |
| "step": 3320 | |
| }, | |
| { | |
| "entropy": 1.0534117579460145, | |
| "epoch": 4.481830417227457, | |
| "grad_norm": 1.3785133361816406, | |
| "learning_rate": 6.259308205280383e-05, | |
| "loss": 0.03677107989788055, | |
| "mean_token_accuracy": 0.9864885747432709, | |
| "num_tokens": 59672041.0, | |
| "step": 3330 | |
| }, | |
| { | |
| "entropy": 1.0532031655311584, | |
| "epoch": 4.495289367429341, | |
| "grad_norm": 1.4289969205856323, | |
| "learning_rate": 6.237757703394283e-05, | |
| "loss": 0.040090084075927734, | |
| "mean_token_accuracy": 0.9849297285079956, | |
| "num_tokens": 59851174.0, | |
| "step": 3340 | |
| }, | |
| { | |
| "entropy": 1.0606793522834779, | |
| "epoch": 4.508748317631225, | |
| "grad_norm": 0.9429711699485779, | |
| "learning_rate": 6.216182678634664e-05, | |
| "loss": 0.03906591236591339, | |
| "mean_token_accuracy": 0.9849115431308746, | |
| "num_tokens": 60030790.0, | |
| "step": 3350 | |
| }, | |
| { | |
| "entropy": 1.0648606181144715, | |
| "epoch": 4.522207267833109, | |
| "grad_norm": 1.0886033773422241, | |
| "learning_rate": 6.194583558453199e-05, | |
| "loss": 0.038545870780944826, | |
| "mean_token_accuracy": 0.9858034968376159, | |
| "num_tokens": 60209459.0, | |
| "step": 3360 | |
| }, | |
| { | |
| "entropy": 1.0667495369911193, | |
| "epoch": 4.535666218034994, | |
| "grad_norm": 0.9514157772064209, | |
| "learning_rate": 6.172960770778948e-05, | |
| "loss": 0.03615821003913879, | |
| "mean_token_accuracy": 0.9864137768745422, | |
| "num_tokens": 60389121.0, | |
| "step": 3370 | |
| }, | |
| { | |
| "entropy": 1.0640890002250671, | |
| "epoch": 4.549125168236878, | |
| "grad_norm": 1.3572577238082886, | |
| "learning_rate": 6.151314744009885e-05, | |
| "loss": 0.03793430626392365, | |
| "mean_token_accuracy": 0.9861454784870147, | |
| "num_tokens": 60568718.0, | |
| "step": 3380 | |
| }, | |
| { | |
| "entropy": 1.0605762481689454, | |
| "epoch": 4.562584118438762, | |
| "grad_norm": 0.9496264457702637, | |
| "learning_rate": 6.129645907004395e-05, | |
| "loss": 0.03883695602416992, | |
| "mean_token_accuracy": 0.9849031150341034, | |
| "num_tokens": 60747908.0, | |
| "step": 3390 | |
| }, | |
| { | |
| "entropy": 1.0625512599945068, | |
| "epoch": 4.576043068640646, | |
| "grad_norm": 1.4468268156051636, | |
| "learning_rate": 6.107954689072796e-05, | |
| "loss": 0.03865863978862762, | |
| "mean_token_accuracy": 0.9855649054050446, | |
| "num_tokens": 60927000.0, | |
| "step": 3400 | |
| }, | |
| { | |
| "entropy": 1.0635502815246582, | |
| "epoch": 4.589502018842531, | |
| "grad_norm": 1.2759664058685303, | |
| "learning_rate": 6.086241519968822e-05, | |
| "loss": 0.037098705768585205, | |
| "mean_token_accuracy": 0.9857240319252014, | |
| "num_tokens": 61105703.0, | |
| "step": 3410 | |
| }, | |
| { | |
| "entropy": 1.0495989680290223, | |
| "epoch": 4.602960969044415, | |
| "grad_norm": 1.011147379875183, | |
| "learning_rate": 6.064506829881109e-05, | |
| "loss": 0.03874770402908325, | |
| "mean_token_accuracy": 0.9859291315078735, | |
| "num_tokens": 61285304.0, | |
| "step": 3420 | |
| }, | |
| { | |
| "entropy": 1.05042724609375, | |
| "epoch": 4.616419919246299, | |
| "grad_norm": 1.129265308380127, | |
| "learning_rate": 6.042751049424675e-05, | |
| "loss": 0.0369877964258194, | |
| "mean_token_accuracy": 0.9860033929347992, | |
| "num_tokens": 61464950.0, | |
| "step": 3430 | |
| }, | |
| { | |
| "entropy": 1.0447500228881836, | |
| "epoch": 4.629878869448183, | |
| "grad_norm": 1.359832525253296, | |
| "learning_rate": 6.02097460963239e-05, | |
| "loss": 0.03797791302204132, | |
| "mean_token_accuracy": 0.9854108691215515, | |
| "num_tokens": 61644720.0, | |
| "step": 3440 | |
| }, | |
| { | |
| "entropy": 1.0469836950302125, | |
| "epoch": 4.643337819650068, | |
| "grad_norm": 1.5161691904067993, | |
| "learning_rate": 5.999177941946429e-05, | |
| "loss": 0.03715096116065979, | |
| "mean_token_accuracy": 0.9859564423561096, | |
| "num_tokens": 61824536.0, | |
| "step": 3450 | |
| }, | |
| { | |
| "entropy": 1.0558869957923889, | |
| "epoch": 4.656796769851952, | |
| "grad_norm": 1.3739120960235596, | |
| "learning_rate": 5.977361478209732e-05, | |
| "loss": 0.03880961239337921, | |
| "mean_token_accuracy": 0.9851081371307373, | |
| "num_tokens": 62003793.0, | |
| "step": 3460 | |
| }, | |
| { | |
| "entropy": 1.0508078336715698, | |
| "epoch": 4.670255720053836, | |
| "grad_norm": 1.2848538160324097, | |
| "learning_rate": 5.955525650657444e-05, | |
| "loss": 0.03713637590408325, | |
| "mean_token_accuracy": 0.9857253849506378, | |
| "num_tokens": 62184015.0, | |
| "step": 3470 | |
| }, | |
| { | |
| "entropy": 1.0443377375602723, | |
| "epoch": 4.68371467025572, | |
| "grad_norm": 1.073598027229309, | |
| "learning_rate": 5.933670891908355e-05, | |
| "loss": 0.03473606109619141, | |
| "mean_token_accuracy": 0.9873181879520416, | |
| "num_tokens": 62363492.0, | |
| "step": 3480 | |
| }, | |
| { | |
| "entropy": 1.0368189215660095, | |
| "epoch": 4.697173620457605, | |
| "grad_norm": 1.203999638557434, | |
| "learning_rate": 5.9117976349563206e-05, | |
| "loss": 0.036920982599258426, | |
| "mean_token_accuracy": 0.986221992969513, | |
| "num_tokens": 62542906.0, | |
| "step": 3490 | |
| }, | |
| { | |
| "entropy": 1.043162429332733, | |
| "epoch": 4.710632570659489, | |
| "grad_norm": 1.0199706554412842, | |
| "learning_rate": 5.889906313161696e-05, | |
| "loss": 0.03679157495498657, | |
| "mean_token_accuracy": 0.9863799631595611, | |
| "num_tokens": 62721864.0, | |
| "step": 3500 | |
| }, | |
| { | |
| "entropy": 1.0550310969352723, | |
| "epoch": 4.724091520861373, | |
| "grad_norm": 1.172404170036316, | |
| "learning_rate": 5.8679973602427376e-05, | |
| "loss": 0.03852836787700653, | |
| "mean_token_accuracy": 0.9853650152683258, | |
| "num_tokens": 62900752.0, | |
| "step": 3510 | |
| }, | |
| { | |
| "entropy": 1.0554153800010682, | |
| "epoch": 4.737550471063257, | |
| "grad_norm": 1.2630856037139893, | |
| "learning_rate": 5.846071210267018e-05, | |
| "loss": 0.03773666620254516, | |
| "mean_token_accuracy": 0.9868132829666137, | |
| "num_tokens": 63079670.0, | |
| "step": 3520 | |
| }, | |
| { | |
| "entropy": 1.0372057795524596, | |
| "epoch": 4.751009421265142, | |
| "grad_norm": 1.2371525764465332, | |
| "learning_rate": 5.824128297642823e-05, | |
| "loss": 0.0382472962141037, | |
| "mean_token_accuracy": 0.9858209669589997, | |
| "num_tokens": 63258512.0, | |
| "step": 3530 | |
| }, | |
| { | |
| "entropy": 1.0347737193107605, | |
| "epoch": 4.764468371467026, | |
| "grad_norm": 0.9487586617469788, | |
| "learning_rate": 5.802169057110548e-05, | |
| "loss": 0.035361993312835696, | |
| "mean_token_accuracy": 0.9863068044185639, | |
| "num_tokens": 63438267.0, | |
| "step": 3540 | |
| }, | |
| { | |
| "entropy": 1.050095522403717, | |
| "epoch": 4.77792732166891, | |
| "grad_norm": 1.0213005542755127, | |
| "learning_rate": 5.7801939237340786e-05, | |
| "loss": 0.03737463653087616, | |
| "mean_token_accuracy": 0.986017245054245, | |
| "num_tokens": 63617530.0, | |
| "step": 3550 | |
| }, | |
| { | |
| "entropy": 1.0533493757247925, | |
| "epoch": 4.7913862718707945, | |
| "grad_norm": 1.1204028129577637, | |
| "learning_rate": 5.758203332892177e-05, | |
| "loss": 0.03767351508140564, | |
| "mean_token_accuracy": 0.9861111581325531, | |
| "num_tokens": 63796732.0, | |
| "step": 3560 | |
| }, | |
| { | |
| "entropy": 1.0467125058174134, | |
| "epoch": 4.804845222072679, | |
| "grad_norm": 1.1478379964828491, | |
| "learning_rate": 5.736197720269855e-05, | |
| "loss": 0.03344551920890808, | |
| "mean_token_accuracy": 0.9871456980705261, | |
| "num_tokens": 63975699.0, | |
| "step": 3570 | |
| }, | |
| { | |
| "entropy": 1.0390986204147339, | |
| "epoch": 4.818304172274562, | |
| "grad_norm": 1.1709128618240356, | |
| "learning_rate": 5.714177521849736e-05, | |
| "loss": 0.03542309701442718, | |
| "mean_token_accuracy": 0.9866024851799011, | |
| "num_tokens": 64154668.0, | |
| "step": 3580 | |
| }, | |
| { | |
| "entropy": 1.0338069915771484, | |
| "epoch": 4.831763122476447, | |
| "grad_norm": 1.0544331073760986, | |
| "learning_rate": 5.69214317390343e-05, | |
| "loss": 0.032989829778671265, | |
| "mean_token_accuracy": 0.987553596496582, | |
| "num_tokens": 64334132.0, | |
| "step": 3590 | |
| }, | |
| { | |
| "entropy": 1.0331252694129944, | |
| "epoch": 4.845222072678331, | |
| "grad_norm": 1.3105525970458984, | |
| "learning_rate": 5.670095112982875e-05, | |
| "loss": 0.03612107336521149, | |
| "mean_token_accuracy": 0.986699789762497, | |
| "num_tokens": 64513056.0, | |
| "step": 3600 | |
| }, | |
| { | |
| "entropy": 1.0471250653266906, | |
| "epoch": 4.858681022880216, | |
| "grad_norm": 1.1299049854278564, | |
| "learning_rate": 5.648033775911701e-05, | |
| "loss": 0.03683598637580872, | |
| "mean_token_accuracy": 0.9859676957130432, | |
| "num_tokens": 64692811.0, | |
| "step": 3610 | |
| }, | |
| { | |
| "entropy": 1.0519919276237488, | |
| "epoch": 4.872139973082099, | |
| "grad_norm": 1.1945998668670654, | |
| "learning_rate": 5.625959599776564e-05, | |
| "loss": 0.038029834628105164, | |
| "mean_token_accuracy": 0.9856249630451203, | |
| "num_tokens": 64872098.0, | |
| "step": 3620 | |
| }, | |
| { | |
| "entropy": 1.0599406599998473, | |
| "epoch": 4.885598923283984, | |
| "grad_norm": 1.1491973400115967, | |
| "learning_rate": 5.603873021918493e-05, | |
| "loss": 0.03606452643871307, | |
| "mean_token_accuracy": 0.9867934942245483, | |
| "num_tokens": 65050691.0, | |
| "step": 3630 | |
| }, | |
| { | |
| "entropy": 1.0520189166069032, | |
| "epoch": 4.899057873485868, | |
| "grad_norm": 1.2668406963348389, | |
| "learning_rate": 5.581774479924229e-05, | |
| "loss": 0.03717108964920044, | |
| "mean_token_accuracy": 0.9856311261653901, | |
| "num_tokens": 65230073.0, | |
| "step": 3640 | |
| }, | |
| { | |
| "entropy": 1.0371422290802002, | |
| "epoch": 4.912516823687753, | |
| "grad_norm": 1.2444928884506226, | |
| "learning_rate": 5.5596644116175444e-05, | |
| "loss": 0.03783654570579529, | |
| "mean_token_accuracy": 0.9851936995983124, | |
| "num_tokens": 65409403.0, | |
| "step": 3650 | |
| }, | |
| { | |
| "entropy": 1.035485291481018, | |
| "epoch": 4.925975773889636, | |
| "grad_norm": 1.1150327920913696, | |
| "learning_rate": 5.537543255050579e-05, | |
| "loss": 0.03533655405044556, | |
| "mean_token_accuracy": 0.986861526966095, | |
| "num_tokens": 65588232.0, | |
| "step": 3660 | |
| }, | |
| { | |
| "entropy": 1.0358031511306762, | |
| "epoch": 4.939434724091521, | |
| "grad_norm": 1.2493401765823364, | |
| "learning_rate": 5.5154114484951556e-05, | |
| "loss": 0.03752387762069702, | |
| "mean_token_accuracy": 0.9860036075115204, | |
| "num_tokens": 65767287.0, | |
| "step": 3670 | |
| }, | |
| { | |
| "entropy": 1.0428583860397338, | |
| "epoch": 4.952893674293405, | |
| "grad_norm": 1.1343940496444702, | |
| "learning_rate": 5.4932694304340985e-05, | |
| "loss": 0.03644071221351623, | |
| "mean_token_accuracy": 0.9862412512302399, | |
| "num_tokens": 65946115.0, | |
| "step": 3680 | |
| }, | |
| { | |
| "entropy": 1.050352644920349, | |
| "epoch": 4.96635262449529, | |
| "grad_norm": 1.3006900548934937, | |
| "learning_rate": 5.471117639552543e-05, | |
| "loss": 0.035864520072937014, | |
| "mean_token_accuracy": 0.9864962816238403, | |
| "num_tokens": 66125715.0, | |
| "step": 3690 | |
| }, | |
| { | |
| "entropy": 1.048657739162445, | |
| "epoch": 4.979811574697173, | |
| "grad_norm": 1.1723651885986328, | |
| "learning_rate": 5.448956514729251e-05, | |
| "loss": 0.03770222663879395, | |
| "mean_token_accuracy": 0.9857257843017578, | |
| "num_tokens": 66304829.0, | |
| "step": 3700 | |
| }, | |
| { | |
| "entropy": 1.044695222377777, | |
| "epoch": 4.993270524899058, | |
| "grad_norm": 0.9742991328239441, | |
| "learning_rate": 5.426786495027908e-05, | |
| "loss": 0.03706640601158142, | |
| "mean_token_accuracy": 0.985941207408905, | |
| "num_tokens": 66483883.0, | |
| "step": 3710 | |
| }, | |
| { | |
| "epoch": 5.0, | |
| "eval_entropy": 1.0356649699484466, | |
| "eval_loss": 0.12430668622255325, | |
| "eval_mean_token_accuracy": 0.9573376892478602, | |
| "eval_num_tokens": 66573703.0, | |
| "eval_runtime": 12.7724, | |
| "eval_samples_per_second": 391.468, | |
| "eval_steps_per_second": 12.292, | |
| "step": 3715 | |
| }, | |
| { | |
| "entropy": 1.0324241042137146, | |
| "epoch": 5.006729475100942, | |
| "grad_norm": 0.9353874921798706, | |
| "learning_rate": 5.404608019688432e-05, | |
| "loss": 0.028589162230491637, | |
| "mean_token_accuracy": 0.9895434856414795, | |
| "num_tokens": 66663377.0, | |
| "step": 3720 | |
| }, | |
| { | |
| "entropy": 1.0308491230010985, | |
| "epoch": 5.020188425302826, | |
| "grad_norm": 1.2845776081085205, | |
| "learning_rate": 5.382421528118262e-05, | |
| "loss": 0.024698126316070556, | |
| "mean_token_accuracy": 0.9909838557243347, | |
| "num_tokens": 66842022.0, | |
| "step": 3730 | |
| }, | |
| { | |
| "entropy": 1.0136583924293519, | |
| "epoch": 5.03364737550471, | |
| "grad_norm": 0.8143796324729919, | |
| "learning_rate": 5.360227459883662e-05, | |
| "loss": 0.021929217875003813, | |
| "mean_token_accuracy": 0.9924641191959381, | |
| "num_tokens": 67021330.0, | |
| "step": 3740 | |
| }, | |
| { | |
| "entropy": 1.002350401878357, | |
| "epoch": 5.0471063257065945, | |
| "grad_norm": 1.3543813228607178, | |
| "learning_rate": 5.338026254701003e-05, | |
| "loss": 0.024280667304992676, | |
| "mean_token_accuracy": 0.9914784133434296, | |
| "num_tokens": 67200604.0, | |
| "step": 3750 | |
| }, | |
| { | |
| "entropy": 1.0096928119659423, | |
| "epoch": 5.060565275908479, | |
| "grad_norm": 1.0620241165161133, | |
| "learning_rate": 5.31581835242806e-05, | |
| "loss": 0.023655575513839722, | |
| "mean_token_accuracy": 0.9915423214435577, | |
| "num_tokens": 67380371.0, | |
| "step": 3760 | |
| }, | |
| { | |
| "entropy": 1.021187925338745, | |
| "epoch": 5.074024226110363, | |
| "grad_norm": 1.0421315431594849, | |
| "learning_rate": 5.293604193055289e-05, | |
| "loss": 0.02454202026128769, | |
| "mean_token_accuracy": 0.9910594940185546, | |
| "num_tokens": 67559723.0, | |
| "step": 3770 | |
| }, | |
| { | |
| "entropy": 1.018889594078064, | |
| "epoch": 5.087483176312247, | |
| "grad_norm": 0.9473801851272583, | |
| "learning_rate": 5.2713842166971165e-05, | |
| "loss": 0.025448399782180785, | |
| "mean_token_accuracy": 0.9912221610546113, | |
| "num_tokens": 67738595.0, | |
| "step": 3780 | |
| }, | |
| { | |
| "entropy": 1.0147689938545228, | |
| "epoch": 5.1009421265141315, | |
| "grad_norm": 0.939347505569458, | |
| "learning_rate": 5.249158863583216e-05, | |
| "loss": 0.024416190385818482, | |
| "mean_token_accuracy": 0.9910872936248779, | |
| "num_tokens": 67917476.0, | |
| "step": 3790 | |
| }, | |
| { | |
| "entropy": 1.0090774953365327, | |
| "epoch": 5.114401076716016, | |
| "grad_norm": 1.214951753616333, | |
| "learning_rate": 5.2269285740497876e-05, | |
| "loss": 0.02616993188858032, | |
| "mean_token_accuracy": 0.9900912940502167, | |
| "num_tokens": 68097328.0, | |
| "step": 3800 | |
| }, | |
| { | |
| "entropy": 1.0201542019844054, | |
| "epoch": 5.1278600269179, | |
| "grad_norm": 0.9982582926750183, | |
| "learning_rate": 5.204693788530832e-05, | |
| "loss": 0.026025664806365967, | |
| "mean_token_accuracy": 0.9906771540641784, | |
| "num_tokens": 68276754.0, | |
| "step": 3810 | |
| }, | |
| { | |
| "entropy": 1.0209372758865356, | |
| "epoch": 5.141318977119784, | |
| "grad_norm": 1.2663366794586182, | |
| "learning_rate": 5.182454947549428e-05, | |
| "loss": 0.02434307783842087, | |
| "mean_token_accuracy": 0.99150630235672, | |
| "num_tokens": 68456764.0, | |
| "step": 3820 | |
| }, | |
| { | |
| "entropy": 1.0369561433792114, | |
| "epoch": 5.1547779273216685, | |
| "grad_norm": 1.13618004322052, | |
| "learning_rate": 5.160212491709002e-05, | |
| "loss": 0.025401628017425536, | |
| "mean_token_accuracy": 0.9907406866550446, | |
| "num_tokens": 68635336.0, | |
| "step": 3830 | |
| }, | |
| { | |
| "entropy": 1.0440601348876952, | |
| "epoch": 5.168236877523553, | |
| "grad_norm": 1.2365202903747559, | |
| "learning_rate": 5.1379668616845975e-05, | |
| "loss": 0.02707371711730957, | |
| "mean_token_accuracy": 0.9900575876235962, | |
| "num_tokens": 68814346.0, | |
| "step": 3840 | |
| }, | |
| { | |
| "entropy": 1.0355126500129699, | |
| "epoch": 5.181695827725437, | |
| "grad_norm": 0.9310021996498108, | |
| "learning_rate": 5.115718498214148e-05, | |
| "loss": 0.02353193610906601, | |
| "mean_token_accuracy": 0.9916076958179474, | |
| "num_tokens": 68993930.0, | |
| "step": 3850 | |
| }, | |
| { | |
| "entropy": 1.0217126727104187, | |
| "epoch": 5.195154777927321, | |
| "grad_norm": 1.1135667562484741, | |
| "learning_rate": 5.093467842089742e-05, | |
| "loss": 0.0253662109375, | |
| "mean_token_accuracy": 0.9902997076511383, | |
| "num_tokens": 69173251.0, | |
| "step": 3860 | |
| }, | |
| { | |
| "entropy": 1.0064147591590882, | |
| "epoch": 5.2086137281292055, | |
| "grad_norm": 1.0604337453842163, | |
| "learning_rate": 5.071215334148891e-05, | |
| "loss": 0.026254481077194212, | |
| "mean_token_accuracy": 0.990457046031952, | |
| "num_tokens": 69352492.0, | |
| "step": 3870 | |
| }, | |
| { | |
| "entropy": 1.0141639947891234, | |
| "epoch": 5.22207267833109, | |
| "grad_norm": 1.1456916332244873, | |
| "learning_rate": 5.048961415265797e-05, | |
| "loss": 0.027528288960456847, | |
| "mean_token_accuracy": 0.9905337333679199, | |
| "num_tokens": 69531556.0, | |
| "step": 3880 | |
| }, | |
| { | |
| "entropy": 1.0185627937316895, | |
| "epoch": 5.235531628532974, | |
| "grad_norm": 1.3022476434707642, | |
| "learning_rate": 5.0267065263426125e-05, | |
| "loss": 0.0235461950302124, | |
| "mean_token_accuracy": 0.9911311268806458, | |
| "num_tokens": 69710721.0, | |
| "step": 3890 | |
| }, | |
| { | |
| "entropy": 1.0204244375228881, | |
| "epoch": 5.248990578734858, | |
| "grad_norm": 1.01898992061615, | |
| "learning_rate": 5.00445110830071e-05, | |
| "loss": 0.024360083043575287, | |
| "mean_token_accuracy": 0.991031140089035, | |
| "num_tokens": 69890062.0, | |
| "step": 3900 | |
| }, | |
| { | |
| "entropy": 1.0131066560745239, | |
| "epoch": 5.262449528936743, | |
| "grad_norm": 1.0795366764068604, | |
| "learning_rate": 4.9821956020719474e-05, | |
| "loss": 0.02722684442996979, | |
| "mean_token_accuracy": 0.989804482460022, | |
| "num_tokens": 70068900.0, | |
| "step": 3910 | |
| }, | |
| { | |
| "entropy": 1.025877797603607, | |
| "epoch": 5.275908479138627, | |
| "grad_norm": 1.2080345153808594, | |
| "learning_rate": 4.959940448589928e-05, | |
| "loss": 0.027399671077728272, | |
| "mean_token_accuracy": 0.9899636626243591, | |
| "num_tokens": 70248062.0, | |
| "step": 3920 | |
| }, | |
| { | |
| "entropy": 1.0310349464416504, | |
| "epoch": 5.289367429340511, | |
| "grad_norm": 1.1293290853500366, | |
| "learning_rate": 4.9376860887812666e-05, | |
| "loss": 0.02553839683532715, | |
| "mean_token_accuracy": 0.9908572614192963, | |
| "num_tokens": 70427066.0, | |
| "step": 3930 | |
| }, | |
| { | |
| "entropy": 1.024777591228485, | |
| "epoch": 5.302826379542395, | |
| "grad_norm": 0.9517710208892822, | |
| "learning_rate": 4.915432963556853e-05, | |
| "loss": 0.026053178310394286, | |
| "mean_token_accuracy": 0.9905928015708924, | |
| "num_tokens": 70606283.0, | |
| "step": 3940 | |
| }, | |
| { | |
| "entropy": 1.024202048778534, | |
| "epoch": 5.31628532974428, | |
| "grad_norm": 1.156661868095398, | |
"learning_rate": 4.8931815138031173e-05,
"loss": 0.02635410726070404,
"mean_token_accuracy": 0.9904537379741669,
"num_tokens": 70784962.0,
"step": 3950
},
{
"entropy": 1.0223736405372619,
"epoch": 5.329744279946164,
"grad_norm": 1.0991413593292236,
"learning_rate": 4.870932180373296e-05,
"loss": 0.026505589485168457,
"mean_token_accuracy": 0.9905425131320953,
"num_tokens": 70964289.0,
"step": 3960
},
{
"entropy": 1.0177703976631165,
"epoch": 5.343203230148048,
"grad_norm": 1.0344407558441162,
"learning_rate": 4.8486854040786926e-05,
"loss": 0.026624417304992674,
"mean_token_accuracy": 0.9904337406158448,
"num_tokens": 71144133.0,
"step": 3970
},
{
"entropy": 1.0158238291740418,
"epoch": 5.356662180349932,
"grad_norm": 0.8977853059768677,
"learning_rate": 4.826441625679953e-05,
"loss": 0.026179781556129454,
"mean_token_accuracy": 0.9901742041110992,
"num_tokens": 71323642.0,
"step": 3980
},
{
"entropy": 1.0216598749160766,
"epoch": 5.370121130551817,
"grad_norm": 0.8806330561637878,
"learning_rate": 4.8042012858783223e-05,
"loss": 0.024043604731559753,
"mean_token_accuracy": 0.9914625465869904,
"num_tokens": 71502613.0,
"step": 3990
},
{
"entropy": 1.0118016242980956,
"epoch": 5.383580080753701,
"grad_norm": 0.9316030740737915,
"learning_rate": 4.781964825306923e-05,
"loss": 0.023112615942955016,
"mean_token_accuracy": 0.9917285740375519,
"num_tokens": 71682439.0,
"step": 4000
},
{
"entropy": 1.0056150257587433,
"epoch": 5.397039030955585,
"grad_norm": 0.8675498366355896,
"learning_rate": 4.7597326845220206e-05,
"loss": 0.02428761273622513,
"mean_token_accuracy": 0.9905657172203064,
"num_tokens": 71861391.0,
"step": 4010
},
{
"entropy": 1.0037673354148864,
"epoch": 5.410497981157469,
"grad_norm": 1.0555452108383179,
"learning_rate": 4.737505303994292e-05,
"loss": 0.0254961758852005,
"mean_token_accuracy": 0.9906506896018982,
"num_tokens": 72040499.0,
"step": 4020
},
{
"entropy": 0.9917411029338836,
"epoch": 5.423956931359354,
"grad_norm": 1.0477594137191772,
"learning_rate": 4.7152831241001065e-05,
"loss": 0.023488616943359374,
"mean_token_accuracy": 0.991758120059967,
"num_tokens": 72220507.0,
"step": 4030
},
{
"entropy": 0.9974137663841247,
"epoch": 5.437415881561238,
"grad_norm": 1.189155101776123,
"learning_rate": 4.693066585112795e-05,
"loss": 0.024205952882766724,
"mean_token_accuracy": 0.9910655796527863,
"num_tokens": 72400051.0,
"step": 4040
},
{
"entropy": 1.0056391954421997,
"epoch": 5.450874831763122,
"grad_norm": 1.024430751800537,
"learning_rate": 4.670856127193928e-05,
"loss": 0.0251724511384964,
"mean_token_accuracy": 0.990533047914505,
"num_tokens": 72578988.0,
"step": 4050
},
{
"entropy": 1.000277614593506,
"epoch": 5.464333781965006,
"grad_norm": 1.0882636308670044,
"learning_rate": 4.648652190384597e-05,
"loss": 0.02423400282859802,
"mean_token_accuracy": 0.9916969239711761,
"num_tokens": 72758741.0,
"step": 4060
},
{
"entropy": 1.0084757328033447,
"epoch": 5.477792732166891,
"grad_norm": 0.7930824756622314,
"learning_rate": 4.626455214596695e-05,
"loss": 0.023505699634552003,
"mean_token_accuracy": 0.9915890038013458,
"num_tokens": 72938170.0,
"step": 4070
},
{
"entropy": 1.015303874015808,
"epoch": 5.491251682368775,
"grad_norm": 1.3612091541290283,
"learning_rate": 4.6042656396042e-05,
"loss": 0.025665953755378723,
"mean_token_accuracy": 0.990752911567688,
"num_tokens": 73117014.0,
"step": 4080
},
{
"entropy": 1.0118269324302673,
"epoch": 5.504710632570659,
"grad_norm": 0.9727089405059814,
"learning_rate": 4.5820839050344643e-05,
"loss": 0.027782341837882994,
"mean_token_accuracy": 0.9897802293300628,
"num_tokens": 73296744.0,
"step": 4090
},
{
"entropy": 1.0124372959136962,
"epoch": 5.518169582772543,
"grad_norm": 1.0280526876449585,
"learning_rate": 4.559910450359502e-05,
"loss": 0.026491034030914306,
"mean_token_accuracy": 0.9905598402023316,
"num_tokens": 73476213.0,
"step": 4100
},
{
"entropy": 1.0201756000518798,
"epoch": 5.531628532974428,
"grad_norm": 1.0505496263504028,
"learning_rate": 4.5377457148872837e-05,
"loss": 0.02565605640411377,
"mean_token_accuracy": 0.9904898762702942,
"num_tokens": 73655498.0,
"step": 4110
},
{
"entropy": 1.0122828364372254,
"epoch": 5.545087483176312,
"grad_norm": 0.9964900016784668,
"learning_rate": 4.515590137753032e-05,
"loss": 0.026221027970314024,
"mean_token_accuracy": 0.9898638904094696,
"num_tokens": 73834885.0,
"step": 4120
},
{
"entropy": 1.002439957857132,
"epoch": 5.558546433378196,
"grad_norm": 1.2505117654800415,
"learning_rate": 4.493444157910521e-05,
"loss": 0.025265097618103027,
"mean_token_accuracy": 0.9906082093715668,
"num_tokens": 74014153.0,
"step": 4130
},
{
"entropy": 1.0038666009902955,
"epoch": 5.5720053835800805,
"grad_norm": 1.189091682434082,
"learning_rate": 4.471308214123381e-05,
"loss": 0.027262809872627258,
"mean_token_accuracy": 0.9900608360767365,
"num_tokens": 74192608.0,
"step": 4140
},
{
"entropy": 1.0128184318542481,
"epoch": 5.585464333781965,
"grad_norm": 1.4007318019866943,
"learning_rate": 4.449182744956403e-05,
"loss": 0.02719965875148773,
"mean_token_accuracy": 0.9899117827415467,
"num_tokens": 74371709.0,
"step": 4150
},
{
"entropy": 1.01895170211792,
"epoch": 5.598923283983849,
"grad_norm": 0.764778733253479,
"learning_rate": 4.4270681887668544e-05,
"loss": 0.024375322461128234,
"mean_token_accuracy": 0.9916343033313751,
"num_tokens": 74550754.0,
"step": 4160
},
{
"entropy": 1.0186664938926697,
"epoch": 5.612382234185733,
"grad_norm": 1.1775010824203491,
"learning_rate": 4.404964983695786e-05,
"loss": 0.024209040403366088,
"mean_token_accuracy": 0.9905198216438293,
"num_tokens": 74729681.0,
"step": 4170
},
{
"entropy": 1.0036309778690338,
"epoch": 5.6258411843876175,
"grad_norm": 0.9738861322402954,
"learning_rate": 4.382873567659361e-05,
"loss": 0.02190125733613968,
"mean_token_accuracy": 0.9924726009368896,
"num_tokens": 74908847.0,
"step": 4180
},
{
"entropy": 0.9877856969833374,
"epoch": 5.639300134589502,
"grad_norm": 1.000006914138794,
"learning_rate": 4.3607943783401736e-05,
"loss": 0.023811979591846465,
"mean_token_accuracy": 0.9915596306324005,
"num_tokens": 75087768.0,
"step": 4190
},
{
"entropy": 0.978437089920044,
"epoch": 5.652759084791386,
"grad_norm": 0.9680097103118896,
"learning_rate": 4.3387278531785747e-05,
"loss": 0.023803821206092833,
"mean_token_accuracy": 0.9915006577968597,
"num_tokens": 75267006.0,
"step": 4200
},
{
"entropy": 0.9951873600482941,
"epoch": 5.66621803499327,
"grad_norm": 1.0874643325805664,
"learning_rate": 4.3166744293640134e-05,
"loss": 0.024874386191368104,
"mean_token_accuracy": 0.9910345673561096,
"num_tokens": 75445346.0,
"step": 4210
},
{
"entropy": 0.9993082582950592,
"epoch": 5.6796769851951545,
"grad_norm": 1.0696096420288086,
"learning_rate": 4.2946345438263665e-05,
"loss": 0.024564328789710998,
"mean_token_accuracy": 0.9908776760101319,
"num_tokens": 75624164.0,
"step": 4220
},
{
"entropy": 0.9952058970928193,
"epoch": 5.693135935397039,
"grad_norm": 1.1028224229812622,
"learning_rate": 4.272608633227287e-05,
"loss": 0.024792733788490295,
"mean_token_accuracy": 0.9909776628017426,
"num_tokens": 75803270.0,
"step": 4230
},
{
"entropy": 0.9933918535709381,
"epoch": 5.706594885598923,
"grad_norm": 0.9133896827697754,
"learning_rate": 4.250597133951554e-05,
"loss": 0.025022977590560914,
"mean_token_accuracy": 0.9908600449562073,
"num_tokens": 75982078.0,
"step": 4240
},
{
"entropy": 0.9992476463317871,
"epoch": 5.720053835800807,
"grad_norm": 0.9527667760848999,
"learning_rate": 4.228600482098423e-05,
"loss": 0.023611989617347718,
"mean_token_accuracy": 0.9913343906402587,
"num_tokens": 76161175.0,
"step": 4250
},
{
"entropy": 1.0076547145843506,
"epoch": 5.7335127860026915,
"grad_norm": 1.2215615510940552,
"learning_rate": 4.206619113472986e-05,
"loss": 0.024665328860282897,
"mean_token_accuracy": 0.9910496056079865,
"num_tokens": 76340176.0,
"step": 4260
},
{
"entropy": 1.0158798456192017,
"epoch": 5.746971736204576,
"grad_norm": 1.210359811782837,
"learning_rate": 4.18465346357754e-05,
"loss": 0.024151743948459627,
"mean_token_accuracy": 0.9907714128494263,
"num_tokens": 76519302.0,
"step": 4270
},
{
"entropy": 1.0168303012847901,
"epoch": 5.76043068640646,
"grad_norm": 1.1331851482391357,
"learning_rate": 4.16270396760296e-05,
"loss": 0.026728877425193788,
"mean_token_accuracy": 0.9904241323471069,
"num_tokens": 76698543.0,
"step": 4280
},
{
"entropy": 1.0135691165924072,
"epoch": 5.773889636608344,
"grad_norm": 1.0528459548950195,
"learning_rate": 4.140771060420066e-05,
"loss": 0.024574026465415955,
"mean_token_accuracy": 0.9909278094768524,
"num_tokens": 76877560.0,
"step": 4290
},
{
"entropy": 1.0032446384429932,
"epoch": 5.787348586810229,
"grad_norm": 0.8913487195968628,
"learning_rate": 4.118855176571021e-05,
"loss": 0.023184402287006377,
"mean_token_accuracy": 0.9918475985527039,
"num_tokens": 77057531.0,
"step": 4300
},
{
"entropy": 0.9989308416843414,
"epoch": 5.800807537012113,
"grad_norm": 0.7943524122238159,
"learning_rate": 4.096956750260718e-05,
"loss": 0.020456457138061525,
"mean_token_accuracy": 0.9927145361900329,
"num_tokens": 77236668.0,
"step": 4310
},
{
"entropy": 0.9927454888820648,
"epoch": 5.814266487213997,
"grad_norm": 1.0990914106369019,
"learning_rate": 4.07507621534817e-05,
"loss": 0.02571033835411072,
"mean_token_accuracy": 0.9910105168819427,
"num_tokens": 77415692.0,
"step": 4320
},
{
"entropy": 0.9979439377784729,
"epoch": 5.827725437415881,
"grad_norm": 0.9713761806488037,
"learning_rate": 4.053214005337924e-05,
"loss": 0.024069878458976745,
"mean_token_accuracy": 0.9908368527889252,
"num_tokens": 77595253.0,
"step": 4330
},
{
"entropy": 1.000143724679947,
"epoch": 5.841184387617766,
"grad_norm": 1.071368932723999,
"learning_rate": 4.031370553371465e-05,
"loss": 0.02418675720691681,
"mean_token_accuracy": 0.9909208953380585,
"num_tokens": 77774484.0,
"step": 4340
},
{
"entropy": 1.007152682542801,
"epoch": 5.85464333781965,
"grad_norm": 1.1110836267471313,
"learning_rate": 4.0095462922186385e-05,
"loss": 0.02507558763027191,
"mean_token_accuracy": 0.9910820901393891,
"num_tokens": 77954000.0,
"step": 4350
},
{
"entropy": 1.010567343235016,
"epoch": 5.868102288021534,
"grad_norm": 0.9108941555023193,
"learning_rate": 3.9877416542690746e-05,
"loss": 0.02309911847114563,
"mean_token_accuracy": 0.9912745118141174,
"num_tokens": 78133205.0,
"step": 4360
},
{
"entropy": 1.0007760405540467,
"epoch": 5.881561238223418,
"grad_norm": 0.9847738146781921,
"learning_rate": 3.9659570715236234e-05,
"loss": 0.022994443774223328,
"mean_token_accuracy": 0.991887879371643,
"num_tokens": 78312209.0,
"step": 4370
},
{
"entropy": 1.0055775403976441,
"epoch": 5.895020188425303,
"grad_norm": 0.9286985993385315,
"learning_rate": 3.944192975585792e-05,
"loss": 0.024959474802017212,
"mean_token_accuracy": 0.9908322095870972,
"num_tokens": 78491839.0,
"step": 4380
},
{
"entropy": 1.0130927085876464,
"epoch": 5.908479138627187,
"grad_norm": 1.101596713066101,
"learning_rate": 3.922449797653198e-05,
"loss": 0.024828463792800903,
"mean_token_accuracy": 0.9908891916275024,
"num_tokens": 78670330.0,
"step": 4390
},
{
"entropy": 1.0086628675460816,
"epoch": 5.921938088829071,
"grad_norm": 0.7992931604385376,
"learning_rate": 3.900727968509024e-05,
"loss": 0.0249856099486351,
"mean_token_accuracy": 0.9912192106246949,
"num_tokens": 78849616.0,
"step": 4400
},
{
"entropy": 1.0030723690986634,
"epoch": 5.935397039030955,
"grad_norm": 1.0745497941970825,
"learning_rate": 3.879027918513483e-05,
"loss": 0.024454452097415924,
"mean_token_accuracy": 0.9910886645317077,
"num_tokens": 79028280.0,
"step": 4410
},
{
"entropy": 1.0046655774116515,
"epoch": 5.94885598923284,
"grad_norm": 1.1712580919265747,
"learning_rate": 3.857350077595289e-05,
"loss": 0.024604372680187225,
"mean_token_accuracy": 0.9910597622394561,
"num_tokens": 79207400.0,
"step": 4420
},
{
"entropy": 1.0095412015914917,
"epoch": 5.962314939434724,
"grad_norm": 1.0506722927093506,
"learning_rate": 3.835694875243149e-05,
"loss": 0.023779749870300293,
"mean_token_accuracy": 0.9909038186073303,
"num_tokens": 79386847.0,
"step": 4430
},
{
"entropy": 1.014033055305481,
"epoch": 5.975773889636608,
"grad_norm": 1.269294261932373,
"learning_rate": 3.814062740497243e-05,
"loss": 0.02225509285926819,
"mean_token_accuracy": 0.991925185918808,
"num_tokens": 79566247.0,
"step": 4440
},
{
"entropy": 1.0057903945446014,
"epoch": 5.989232839838492,
"grad_norm": 0.8028563857078552,
"learning_rate": 3.7924541019407264e-05,
"loss": 0.021707671880722045,
"mean_token_accuracy": 0.9925049006938934,
"num_tokens": 79745532.0,
"step": 4450
},
{
"epoch": 6.0,
"eval_entropy": 0.9986690609318436,
"eval_loss": 0.1345997303724289,
"eval_mean_token_accuracy": 0.9580068561681516,
"eval_num_tokens": 79888260.0,
"eval_runtime": 12.7672,
"eval_samples_per_second": 391.628,
"eval_steps_per_second": 12.297,
"step": 4458
},
{
"entropy": 1.0019358932971953,
"epoch": 6.002691790040377,
"grad_norm": 0.9917385578155518,
"learning_rate": 3.7708693876912435e-05,
"loss": 0.02177269458770752,
"mean_token_accuracy": 0.9918585777282715,
"num_tokens": 79924160.0,
"step": 4460
},
{
"entropy": 0.9939176559448242,
"epoch": 6.016150740242261,
"grad_norm": 0.8566797375679016,
"learning_rate": 3.74930902539244e-05,
"loss": 0.016230569779872896,
"mean_token_accuracy": 0.9944312930107116,
"num_tokens": 80103783.0,
"step": 4470
},
{
"entropy": 0.9944855153560639,
"epoch": 6.029609690444145,
"grad_norm": 0.7844175100326538,
"learning_rate": 3.727773442205493e-05,
"loss": 0.016550612449645997,
"mean_token_accuracy": 0.9942670106887818,
"num_tokens": 80282630.0,
"step": 4480
},
{
"entropy": 0.9905124843120575,
"epoch": 6.043068640646029,
"grad_norm": 0.927370011806488,
"learning_rate": 3.7062630648006485e-05,
"loss": 0.015363200008869171,
"mean_token_accuracy": 0.9948609352111817,
"num_tokens": 80461728.0,
"step": 4490
},
{
"entropy": 0.9897569298744202,
"epoch": 6.056527590847914,
"grad_norm": 0.8723940253257751,
"learning_rate": 3.684778319348765e-05,
"loss": 0.01630091369152069,
"mean_token_accuracy": 0.9939091682434082,
"num_tokens": 80640800.0,
"step": 4500
},
{
"entropy": 0.9903276920318603,
"epoch": 6.069986541049798,
"grad_norm": 0.886618435382843,
"learning_rate": 3.663319631512874e-05,
"loss": 0.015594318509101868,
"mean_token_accuracy": 0.9944710969924927,
"num_tokens": 80820047.0,
"step": 4510
},
{
"entropy": 0.9845296442508698,
"epoch": 6.083445491251682,
"grad_norm": 0.7410146594047546,
"learning_rate": 3.641887426439743e-05,
"loss": 0.014708422124385834,
"mean_token_accuracy": 0.9950878858566284,
"num_tokens": 81000118.0,
"step": 4520
},
{
"entropy": 0.9876994490623474,
"epoch": 6.0969044414535665,
"grad_norm": 0.8826921582221985,
"learning_rate": 3.620482128751456e-05,
"loss": 0.014842665195465088,
"mean_token_accuracy": 0.995011156797409,
"num_tokens": 81179122.0,
"step": 4530
},
{
"entropy": 0.9812965631484986,
"epoch": 6.110363391655451,
"grad_norm": 0.9228511452674866,
"learning_rate": 3.599104162536997e-05,
"loss": 0.016777697205543517,
"mean_token_accuracy": 0.994401341676712,
"num_tokens": 81358570.0,
"step": 4540
},
{
"entropy": 0.9767560362815857,
"epoch": 6.123822341857335,
"grad_norm": 0.869403600692749,
"learning_rate": 3.577753951343851e-05,
"loss": 0.01631978303194046,
"mean_token_accuracy": 0.9939962327480316,
"num_tokens": 81537539.0,
"step": 4550
},
{
"entropy": 0.9815046191215515,
"epoch": 6.137281292059219,
"grad_norm": 0.7447561025619507,
"learning_rate": 3.55643191816961e-05,
"loss": 0.016492336988449097,
"mean_token_accuracy": 0.9942592322826386,
"num_tokens": 81716684.0,
"step": 4560
},
{
"entropy": 0.9812978148460388,
"epoch": 6.1507402422611035,
"grad_norm": 0.9655007123947144,
"learning_rate": 3.535138485453595e-05,
"loss": 0.015433910489082336,
"mean_token_accuracy": 0.994744598865509,
"num_tokens": 81895922.0,
"step": 4570
},
{
"entropy": 0.9846289932727814,
"epoch": 6.164199192462988,
"grad_norm": 0.771899402141571,
"learning_rate": 3.513874075068484e-05,
"loss": 0.01579052358865738,
"mean_token_accuracy": 0.994454699754715,
"num_tokens": 82074541.0,
"step": 4580
},
{
"entropy": 0.9767491102218628,
"epoch": 6.177658142664872,
"grad_norm": 0.7259030342102051,
"learning_rate": 3.492639108311955e-05,
"loss": 0.0157473087310791,
"mean_token_accuracy": 0.9944621860980988,
"num_tokens": 82254238.0,
"step": 4590
},
{
"entropy": 0.9825841665267945,
"epoch": 6.191117092866756,
"grad_norm": 1.0318107604980469,
"learning_rate": 3.471434005898339e-05,
"loss": 0.015785586833953858,
"mean_token_accuracy": 0.9947353541851044,
"num_tokens": 82433024.0,
"step": 4600
},
{
"entropy": 0.9830503404140473,
"epoch": 6.2045760430686405,
"grad_norm": 0.7057244181632996,
"learning_rate": 3.450259187950283e-05,
"loss": 0.014754258096218109,
"mean_token_accuracy": 0.995175975561142,
"num_tokens": 82611917.0,
"step": 4610
},
{
"entropy": 0.9704286217689514,
"epoch": 6.218034993270525,
"grad_norm": 0.7422804832458496,
"learning_rate": 3.429115073990431e-05,
"loss": 0.014288221299648286,
"mean_token_accuracy": 0.9952509880065918,
"num_tokens": 82791352.0,
"step": 4620
},
{
"entropy": 0.9686006426811218,
"epoch": 6.231493943472409,
"grad_norm": 1.1123586893081665,
"learning_rate": 3.408002082933107e-05,
"loss": 0.015031391382217407,
"mean_token_accuracy": 0.9947735011577606,
"num_tokens": 82970025.0,
"step": 4630
},
{
"entropy": 0.9696213662624359,
"epoch": 6.244952893674293,
"grad_norm": 0.8872476816177368,
"learning_rate": 3.3869206330760187e-05,
"loss": 0.01578601598739624,
"mean_token_accuracy": 0.9947343170642853,
"num_tokens": 83148439.0,
"step": 4640
},
{
"entropy": 0.97174631357193,
"epoch": 6.2584118438761775,
"grad_norm": 1.1288238763809204,
"learning_rate": 3.365871142091968e-05,
"loss": 0.017076049745082856,
"mean_token_accuracy": 0.9942302107810974,
"num_tokens": 83327163.0,
"step": 4650
},
{
"entropy": 0.97059445977211,
"epoch": 6.271870794078062,
"grad_norm": 0.9584773182868958,
"learning_rate": 3.3448540270205766e-05,
"loss": 0.015507197380065918,
"mean_token_accuracy": 0.9949185311794281,
"num_tokens": 83506315.0,
"step": 4660
},
{
"entropy": 0.9751429915428161,
"epoch": 6.285329744279946,
"grad_norm": 1.0442348718643188,
"learning_rate": 3.323869704260025e-05,
"loss": 0.01604621559381485,
"mean_token_accuracy": 0.9944981873035431,
"num_tokens": 83685291.0,
"step": 4670
},
{
"entropy": 0.9754107832908631,
"epoch": 6.29878869448183,
"grad_norm": 1.268025279045105,
"learning_rate": 3.302918589558801e-05,
"loss": 0.016522862017154694,
"mean_token_accuracy": 0.9943258702754975,
"num_tokens": 83864366.0,
"step": 4680
},
{
"entropy": 0.9815966248512268,
"epoch": 6.312247644683715,
"grad_norm": 0.9281536936759949,
"learning_rate": 3.282001098007462e-05,
"loss": 0.016834209859371185,
"mean_token_accuracy": 0.9941475093364716,
"num_tokens": 84043214.0,
"step": 4690
},
{
"entropy": 0.9755142807960511,
"epoch": 6.325706594885599,
"grad_norm": 0.707933247089386,
"learning_rate": 3.261117644030412e-05,
"loss": 0.01579307168722153,
"mean_token_accuracy": 0.9944198191165924,
"num_tokens": 84222708.0,
"step": 4700
},
{
"entropy": 0.975419533252716,
"epoch": 6.339165545087483,
"grad_norm": 1.036906361579895,
"learning_rate": 3.240268641377694e-05,
"loss": 0.015164561569690704,
"mean_token_accuracy": 0.9947849869728088,
"num_tokens": 84402234.0,
"step": 4710
},
{
"entropy": 0.9727317273616791,
"epoch": 6.352624495289367,
"grad_norm": 0.897186815738678,
"learning_rate": 3.2194545031167866e-05,
"loss": 0.016288670897483825,
"mean_token_accuracy": 0.9942747056484222,
"num_tokens": 84580970.0,
"step": 4720
},
{
"entropy": 0.9730606615543366,
"epoch": 6.366083445491252,
"grad_norm": 0.7614856958389282,
"learning_rate": 3.1986756416244245e-05,
"loss": 0.015893808007240294,
"mean_token_accuracy": 0.9945310592651367,
"num_tokens": 84759995.0,
"step": 4730
},
{
"entropy": 0.9783101677894592,
"epoch": 6.379542395693136,
"grad_norm": 0.8672609329223633,
"learning_rate": 3.177932468578426e-05,
"loss": 0.015059471130371094,
"mean_token_accuracy": 0.9949405431747437,
"num_tokens": 84939577.0,
"step": 4740
},
{
"entropy": 0.9774933516979217,
"epoch": 6.39300134589502,
"grad_norm": 1.1011027097702026,
"learning_rate": 3.157225394949542e-05,
"loss": 0.015801313519477844,
"mean_token_accuracy": 0.9942348599433899,
"num_tokens": 85119302.0,
"step": 4750
},
{
"entropy": 0.9862867891788483,
"epoch": 6.406460296096904,
"grad_norm": 0.813953161239624,
"learning_rate": 3.136554830993304e-05,
"loss": 0.015246957540512085,
"mean_token_accuracy": 0.9948500633239746,
"num_tokens": 85298248.0,
"step": 4760
},
{
"entropy": 0.9877782166004181,
"epoch": 6.419919246298789,
"grad_norm": 0.8229409456253052,
"learning_rate": 3.115921186241906e-05,
"loss": 0.015437261760234832,
"mean_token_accuracy": 0.9939783871173858,
"num_tokens": 85477420.0,
"step": 4770
},
{
"entropy": 0.9822047591209412,
"epoch": 6.433378196500673,
"grad_norm": 1.027951955795288,
"learning_rate": 3.0953248694960824e-05,
"loss": 0.01582965850830078,
"mean_token_accuracy": 0.9941119194030762,
"num_tokens": 85656231.0,
"step": 4780
},
{
"entropy": 0.9807499408721924,
"epoch": 6.446837146702557,
"grad_norm": 0.6412605047225952,
"learning_rate": 3.0747662888170146e-05,
"loss": 0.01425456702709198,
"mean_token_accuracy": 0.994831895828247,
"num_tokens": 85835707.0,
"step": 4790
},
{
"entropy": 0.978723019361496,
"epoch": 6.460296096904441,
"grad_norm": 0.8043062090873718,
"learning_rate": 3.054245851518246e-05,
"loss": 0.015074288845062256,
"mean_token_accuracy": 0.9949940204620361,
"num_tokens": 86014951.0,
"step": 4800
},
{
"entropy": 0.9753713130950927,
"epoch": 6.473755047106326,
"grad_norm": 0.787334680557251,
"learning_rate": 3.0337639641576065e-05,
"loss": 0.015617562830448151,
"mean_token_accuracy": 0.9947958707809448,
"num_tokens": 86194757.0,
"step": 4810
},
{
"entropy": 0.9751108765602112,
"epoch": 6.48721399730821,
"grad_norm": 0.8143076300621033,
"learning_rate": 3.0133210325291662e-05,
"loss": 0.01638820767402649,
"mean_token_accuracy": 0.9946176171302795,
"num_tokens": 86374538.0,
"step": 4820
},
{
"entropy": 0.9809349179267883,
"epoch": 6.500672947510094,
"grad_norm": 0.8488563895225525,
"learning_rate": 2.9929174616551857e-05,
"loss": 0.015374109148979187,
"mean_token_accuracy": 0.9945039927959443,
"num_tokens": 86553350.0,
"step": 4830
},
{
"entropy": 0.9858408272266388,
"epoch": 6.514131897711978,
"grad_norm": 0.9234822988510132,
"learning_rate": 2.9725536557781008e-05,
"loss": 0.014567127823829651,
"mean_token_accuracy": 0.9946143448352813,
"num_tokens": 86732008.0,
"step": 4840
},
{
"entropy": 0.9714087128639222,
"epoch": 6.527590847913863,
"grad_norm": 0.7859043478965759,
"learning_rate": 2.9522300183525097e-05,
"loss": 0.014002390205860138,
"mean_token_accuracy": 0.9951234996318817,
"num_tokens": 86911698.0,
"step": 4850
},
{
"entropy": 0.9689762830734253,
"epoch": 6.541049798115747,
"grad_norm": 0.7839444279670715,
"learning_rate": 2.931946952037179e-05,
"loss": 0.015208807587623597,
"mean_token_accuracy": 0.9949181616306305,
"num_tokens": 87090127.0,
"step": 4860
},
{
"entropy": 0.9610137641429901,
"epoch": 6.554508748317631,
"grad_norm": 0.8249244689941406,
"learning_rate": 2.9117048586870654e-05,
"loss": 0.013001258671283721,
"mean_token_accuracy": 0.9952585935592652,
"num_tokens": 87269489.0,
"step": 4870
},
{
"entropy": 0.965034544467926,
"epoch": 6.5679676985195155,
"grad_norm": 0.656859815120697,
"learning_rate": 2.891504139345358e-05,
"loss": 0.0163768470287323,
"mean_token_accuracy": 0.994057834148407,
"num_tokens": 87448301.0,
"step": 4880
},
{
"entropy": 0.9685207188129425,
"epoch": 6.5814266487214,
"grad_norm": 0.9349063634872437,
"learning_rate": 2.8713451942355285e-05,
"loss": 0.014103662967681885,
"mean_token_accuracy": 0.9951852679252624,
"num_tokens": 87627477.0,
"step": 4890
},
{
"entropy": 0.9671570241451264,
"epoch": 6.594885598923284,
"grad_norm": 0.7718268632888794,
"learning_rate": 2.8512284227534027e-05,
"loss": 0.014896789193153381,
"mean_token_accuracy": 0.9945676863193512,
"num_tokens": 87806660.0,
"step": 4900
},
{
"entropy": 0.9660835266113281,
"epoch": 6.608344549125168,
"grad_norm": 0.948875904083252,
"learning_rate": 2.8311542234592497e-05,
"loss": 0.014949330687522888,
"mean_token_accuracy": 0.9944600164890289,
"num_tokens": 87986236.0,
"step": 4910
},
{
"entropy": 0.9721682965755463,
"epoch": 6.6218034993270525,
"grad_norm": 0.8616317510604858,
"learning_rate": 2.8111229940698842e-05,
"loss": 0.015388941764831543,
"mean_token_accuracy": 0.994827789068222,
"num_tokens": 88164624.0,
"step": 4920
},
{
"entropy": 0.9700623154640198,
"epoch": 6.635262449528937,
"grad_norm": 0.8105503916740417,
"learning_rate": 2.791135131450785e-05,
"loss": 0.014793205261230468,
"mean_token_accuracy": 0.995061457157135,
"num_tokens": 88344024.0,
"step": 4930
},
{
"entropy": 0.9719329476356506,
"epoch": 6.648721399730821,
"grad_norm": 0.9461077451705933,
"learning_rate": 2.7711910316082357e-05,
"loss": 0.014379647374153138,
"mean_token_accuracy": 0.9948806226253509,
"num_tokens": 88524031.0,
"step": 4940
},
{
"entropy": 0.9734318852424622,
"epoch": 6.662180349932705,
"grad_norm": 1.367875576019287,
"learning_rate": 2.7512910896814747e-05,
"loss": 0.01659778952598572,
"mean_token_accuracy": 0.9942364275455475,
"num_tokens": 88703645.0,
"step": 4950
},
{
"entropy": 0.9703188121318818,
"epoch": 6.6756393001345895,
"grad_norm": 1.3389500379562378,
"learning_rate": 2.7314356999348713e-05,
"loss": 0.014963071048259734,
"mean_token_accuracy": 0.9946206867694855,
"num_tokens": 88883045.0,
"step": 4960
},
{
"entropy": 0.9737108051776886,
"epoch": 6.689098250336474,
"grad_norm": 0.9245385527610779,
"learning_rate": 2.711625255750111e-05,
"loss": 0.016331857442855834,
"mean_token_accuracy": 0.9943425476551055,
"num_tokens": 89062755.0,
"step": 4970
},
{
"entropy": 0.9814624786376953,
"epoch": 6.702557200538358,
"grad_norm": 0.7200494408607483,
"learning_rate": 2.691860149618402e-05,
"loss": 0.014298510551452637,
"mean_token_accuracy": 0.9951595067977905,
"num_tokens": 89241358.0,
"step": 4980
},
{
"entropy": 0.9681638181209564,
"epoch": 6.716016150740242,
"grad_norm": 0.8605403304100037,
"learning_rate": 2.6721407731327004e-05,
"loss": 0.013024243712425231,
"mean_token_accuracy": 0.9957234740257264,
"num_tokens": 89421272.0,
"step": 4990
},
{
"entropy": 0.9670853018760681,
"epoch": 6.7294751009421265,
"grad_norm": 0.8923637866973877,
"learning_rate": 2.6524675169799506e-05,
"loss": 0.013967196643352508,
"mean_token_accuracy": 0.9953422844409943,
"num_tokens": 89600056.0,
"step": 5000
},
{
"entropy": 0.9660456538200378,
"epoch": 6.742934051144011,
"grad_norm": 0.7672311663627625,
"learning_rate": 2.6328407709333463e-05,
"loss": 0.013995295763015747,
"mean_token_accuracy": 0.9953011035919189,
"num_tokens": 89779879.0,
"step": 5010
},
{
"entropy": 0.9670967280864715,
"epoch": 6.756393001345895,
"grad_norm": 0.9835227727890015,
"learning_rate": 2.6132609238446072e-05,
"loss": 0.014588207006454468,
"mean_token_accuracy": 0.9950920879840851,
"num_tokens": 89959526.0,
"step": 5020
},
{
"entropy": 0.970816558599472,
"epoch": 6.769851951547779,
"grad_norm": 0.7489247918128967,
"learning_rate": 2.5937283636362724e-05,
"loss": 0.014287692308425904,
"mean_token_accuracy": 0.994792515039444,
"num_tokens": 90138932.0,
"step": 5030
},
{
"entropy": 0.9720617949962616,
"epoch": 6.783310901749664,
"grad_norm": 0.8448637127876282,
"learning_rate": 2.5742434772940216e-05,
"loss": 0.015046033263206481,
"mean_token_accuracy": 0.9947949469089508,
"num_tokens": 90318294.0,
"step": 5040
},
{
"entropy": 0.972127640247345,
"epoch": 6.796769851951548,
"grad_norm": 1.0595027208328247,
"learning_rate": 2.5548066508590007e-05,
"loss": 0.015439464151859284,
"mean_token_accuracy": 0.9945104479789734,
"num_tokens": 90497522.0,
"step": 5050
},
| { | |
| "entropy": 0.9663094699382782, | |
| "epoch": 6.810228802153432, | |
| "grad_norm": 1.0235310792922974, | |
| "learning_rate": 2.535418269420178e-05, | |
| "loss": 0.01386038064956665, | |
| "mean_token_accuracy": 0.9952830314636231, | |
| "num_tokens": 90677085.0, | |
| "step": 5060 | |
| }, | |
| { | |
| "entropy": 0.9662455677986145, | |
| "epoch": 6.823687752355316, | |
| "grad_norm": 1.0424898862838745, | |
| "learning_rate": 2.5160787171067126e-05, | |
| "loss": 0.012799303233623504, | |
| "mean_token_accuracy": 0.995597755908966, | |
| "num_tokens": 90856270.0, | |
| "step": 5070 | |
| }, | |
| { | |
| "entropy": 0.957698255777359, | |
| "epoch": 6.837146702557201, | |
| "grad_norm": 0.8649188876152039, | |
| "learning_rate": 2.4967883770803413e-05, | |
| "loss": 0.013675674796104431, | |
| "mean_token_accuracy": 0.9950868904590606, | |
| "num_tokens": 91035537.0, | |
| "step": 5080 | |
| }, | |
| { | |
| "entropy": 0.9543747365474701, | |
| "epoch": 6.850605652759085, | |
| "grad_norm": 0.9695640206336975, | |
| "learning_rate": 2.477547631527799e-05, | |
| "loss": 0.014598837494850159, | |
| "mean_token_accuracy": 0.9951783776283264, | |
| "num_tokens": 91214714.0, | |
| "step": 5090 | |
| }, | |
| { | |
| "entropy": 0.9589528560638427, | |
| "epoch": 6.864064602960969, | |
| "grad_norm": 0.7789045572280884, | |
| "learning_rate": 2.45835686165323e-05, | |
| "loss": 0.014046281576156616, | |
| "mean_token_accuracy": 0.9951943933963776, | |
| "num_tokens": 91393783.0, | |
| "step": 5100 | |
| }, | |
| { | |
| "entropy": 0.9627966225147248, | |
| "epoch": 6.877523553162853, | |
| "grad_norm": 0.8687154650688171, | |
| "learning_rate": 2.4392164476706468e-05, | |
| "loss": 0.015588788688182831, | |
| "mean_token_accuracy": 0.9948146939277649, | |
| "num_tokens": 91573569.0, | |
| "step": 5110 | |
| }, | |
| { | |
| "entropy": 0.9699371516704559, | |
| "epoch": 6.890982503364738, | |
| "grad_norm": 1.4397906064987183, | |
| "learning_rate": 2.4201267687963935e-05, | |
| "loss": 0.014941152930259705, | |
| "mean_token_accuracy": 0.9946426331996918, | |
| "num_tokens": 91751970.0, | |
| "step": 5120 | |
| }, | |
| { | |
| "entropy": 0.9695391833782196, | |
| "epoch": 6.904441453566622, | |
| "grad_norm": 0.938165545463562, | |
| "learning_rate": 2.4010882032416332e-05, | |
| "loss": 0.01557524800300598, | |
| "mean_token_accuracy": 0.994098824262619, | |
| "num_tokens": 91931575.0, | |
| "step": 5130 | |
| }, | |
| { | |
| "entropy": 0.9726310014724732, | |
| "epoch": 6.917900403768506, | |
| "grad_norm": 1.0270136594772339, | |
| "learning_rate": 2.3821011282048545e-05, | |
| "loss": 0.015305042266845703, | |
| "mean_token_accuracy": 0.9945661365985871, | |
| "num_tokens": 92110694.0, | |
| "step": 5140 | |
| }, | |
| { | |
| "entropy": 0.9728193879127502, | |
| "epoch": 6.93135935397039, | |
| "grad_norm": 1.027956485748291, | |
| "learning_rate": 2.3631659198643985e-05, | |
| "loss": 0.01502918303012848, | |
| "mean_token_accuracy": 0.994612592458725, | |
| "num_tokens": 92289711.0, | |
| "step": 5150 | |
| }, | |
| { | |
| "entropy": 0.9718130826950073, | |
| "epoch": 6.944818304172275, | |
| "grad_norm": 0.926624596118927, | |
| "learning_rate": 2.344282953371006e-05, | |
| "loss": 0.013758787512779235, | |
| "mean_token_accuracy": 0.9950670182704926, | |
| "num_tokens": 92468704.0, | |
| "step": 5160 | |
| }, | |
| { | |
| "entropy": 0.9718481361865997, | |
| "epoch": 6.958277254374159, | |
| "grad_norm": 0.9092168211936951, | |
| "learning_rate": 2.325452602840385e-05, | |
| "loss": 0.013540112972259521, | |
| "mean_token_accuracy": 0.9952319920063019, | |
| "num_tokens": 92647600.0, | |
| "step": 5170 | |
| }, | |
| { | |
| "entropy": 0.9687652945518493, | |
| "epoch": 6.971736204576043, | |
| "grad_norm": 1.0669724941253662, | |
| "learning_rate": 2.306675241345797e-05, | |
| "loss": 0.014361311495304108, | |
| "mean_token_accuracy": 0.9953016221523285, | |
| "num_tokens": 92826363.0, | |
| "step": 5180 | |
| }, | |
| { | |
| "entropy": 0.9618574023246765, | |
| "epoch": 6.985195154777927, | |
| "grad_norm": 0.7335039377212524, | |
| "learning_rate": 2.287951240910668e-05, | |
| "loss": 0.012940369546413422, | |
| "mean_token_accuracy": 0.9956261157989502, | |
| "num_tokens": 93005625.0, | |
| "step": 5190 | |
| }, | |
| { | |
| "entropy": 0.959715747833252, | |
| "epoch": 6.998654104979812, | |
| "grad_norm": 0.9648978114128113, | |
| "learning_rate": 2.269280972501217e-05, | |
| "loss": 0.013761961460113525, | |
| "mean_token_accuracy": 0.9951907277107239, | |
| "num_tokens": 93184930.0, | |
| "step": 5200 | |
| }, | |
| { | |
| "epoch": 7.0, | |
| "eval_entropy": 0.9639249083342826, | |
| "eval_loss": 0.14176945388317108, | |
| "eval_mean_token_accuracy": 0.9595637909925667, | |
| "eval_num_tokens": 93202879.0, | |
| "eval_runtime": 12.8001, | |
| "eval_samples_per_second": 390.623, | |
| "eval_steps_per_second": 12.266, | |
| "step": 5201 | |
| } | |
| ], | |
| "logging_steps": 10, | |
| "max_steps": 7430, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 10, | |
| "save_steps": 500, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": false | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 4.442001086890377e+18, | |
| "train_batch_size": 32, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |