Instructions to use modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora
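The curl request shown for the vLLM server can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical (they are not part of vLLM or SGLang), and the default base URL assumes the vLLM port from the curl example. For the SGLang server, pass `base_url="http://localhost:30000"` instead.

```python
import json
import urllib.request

MODEL_ID = "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: mirrors the JSON body used in the curl example.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("What is the capital of France?")
# Once a server is running, the reply can be fetched with:
# reply = send_chat_request(request)
```

Keeping request construction separate from sending makes it possible to inspect the payload without a live server.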
- SGLang
How to use modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora with Docker Model Runner:
docker model run hf.co/modrill/qwen3_4b_base_kodcode4o_shortcot_8k_lora
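The `learning_rate` values in the `log_history` of the trainer state that follows appear consistent with a linear warmup followed by cosine decay to zero. The sketch below reproduces the logged values under assumptions that the file itself does not state: a peak rate of 1e-4 and 155 warmup steps (both inferred from the log entries), and a one-step lag between the logged rate and the step counter. Only the total of 3094 steps comes directly from `global_step`. The function name `logged_lr` is hypothetical.

```python
import math

PEAK_LR = 1e-4       # assumption: inferred from the logged maximum, not stated in the file
WARMUP_STEPS = 155   # assumption: inferred from the slope of the first log entries
TOTAL_STEPS = 3094   # "global_step" in the trainer state


def logged_lr(step):
    """Approximate the learning_rate logged at a given optimizer step."""
    s = step - 1  # assumption: the logged rate lags the step counter by one
    if s < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak rate.
        return PEAK_LR * s / WARMUP_STEPS
    # Cosine decay from the peak rate down to 0 at the final step.
    progress = (s - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (10, 160, 1000, 1550):
    print(step, logged_lr(step))
```

Comparing the printed values against the `learning_rate` fields at the same steps is a quick way to check whether the assumed schedule holds for the rest of the log.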
| { | |
| "best_global_step": null, | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 1.0, | |
| "eval_steps": 200, | |
| "global_step": 3094, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.0032323232323232323, | |
| "grad_norm": 0.03605492040514946, | |
| "learning_rate": 5.806451612903226e-06, | |
| "loss": 0.4773341178894043, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.006464646464646465, | |
| "grad_norm": 0.019345857203006744, | |
| "learning_rate": 1.2258064516129032e-05, | |
| "loss": 0.45131635665893555, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.009696969696969697, | |
| "grad_norm": 0.013088575564324856, | |
| "learning_rate": 1.870967741935484e-05, | |
| "loss": 0.4297126293182373, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.01292929292929293, | |
| "grad_norm": 0.011059993878006935, | |
| "learning_rate": 2.5161290322580645e-05, | |
| "loss": 0.41430253982543946, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.01616161616161616, | |
| "grad_norm": 0.010876496322453022, | |
| "learning_rate": 3.161290322580645e-05, | |
| "loss": 0.399013614654541, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.019393939393939394, | |
| "grad_norm": 0.00941646471619606, | |
| "learning_rate": 3.8064516129032254e-05, | |
| "loss": 0.394245982170105, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.022626262626262626, | |
| "grad_norm": 0.011145997792482376, | |
| "learning_rate": 4.451612903225807e-05, | |
| "loss": 0.4129539966583252, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.02585858585858586, | |
| "grad_norm": 0.009536129422485828, | |
| "learning_rate": 5.096774193548387e-05, | |
| "loss": 0.40581393241882324, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.02909090909090909, | |
| "grad_norm": 0.008861752226948738, | |
| "learning_rate": 5.7419354838709685e-05, | |
| "loss": 0.381392502784729, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.03232323232323232, | |
| "grad_norm": 0.010512659326195717, | |
| "learning_rate": 6.387096774193548e-05, | |
| "loss": 0.39757261276245115, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.035555555555555556, | |
| "grad_norm": 0.010718696750700474, | |
| "learning_rate": 7.03225806451613e-05, | |
| "loss": 0.38897051811218264, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.03878787878787879, | |
| "grad_norm": 0.010814059525728226, | |
| "learning_rate": 7.67741935483871e-05, | |
| "loss": 0.38843908309936526, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.04202020202020202, | |
| "grad_norm": 0.010136906057596207, | |
| "learning_rate": 8.32258064516129e-05, | |
| "loss": 0.3926044225692749, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.04525252525252525, | |
| "grad_norm": 0.009705257602036, | |
| "learning_rate": 8.967741935483871e-05, | |
| "loss": 0.3880144596099854, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.048484848484848485, | |
| "grad_norm": 0.009561076760292053, | |
| "learning_rate": 9.612903225806452e-05, | |
| "loss": 0.3880956172943115, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.05171717171717172, | |
| "grad_norm": 0.009765783324837685, | |
| "learning_rate": 9.999954295400999e-05, | |
| "loss": 0.39316844940185547, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.05494949494949495, | |
| "grad_norm": 0.010105657391250134, | |
| "learning_rate": 9.999440128258112e-05, | |
| "loss": 0.38610661029815674, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 0.05818181818181818, | |
| "grad_norm": 0.008919674903154373, | |
| "learning_rate": 9.998354722168459e-05, | |
| "loss": 0.3945873975753784, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 0.061414141414141414, | |
| "grad_norm": 0.009847168810665607, | |
| "learning_rate": 9.996698201151175e-05, | |
| "loss": 0.4054004669189453, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 0.06464646464646465, | |
| "grad_norm": 0.008714244700968266, | |
| "learning_rate": 9.994470754481315e-05, | |
| "loss": 0.3861499786376953, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.06787878787878789, | |
| "grad_norm": 0.009142388589680195, | |
| "learning_rate": 9.991672636668239e-05, | |
| "loss": 0.39889438152313234, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 0.07111111111111111, | |
| "grad_norm": 0.008630151860415936, | |
| "learning_rate": 9.988304167426519e-05, | |
| "loss": 0.37990422248840333, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 0.07434343434343435, | |
| "grad_norm": 0.008543030358850956, | |
| "learning_rate": 9.984365731639419e-05, | |
| "loss": 0.3961310386657715, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 0.07757575757575758, | |
| "grad_norm": 0.009128885343670845, | |
| "learning_rate": 9.979857779314906e-05, | |
| "loss": 0.38288607597351076, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 0.08080808080808081, | |
| "grad_norm": 0.007948827929794788, | |
| "learning_rate": 9.974780825534246e-05, | |
| "loss": 0.39522628784179686, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.08404040404040404, | |
| "grad_norm": 0.009197822771966457, | |
| "learning_rate": 9.969135450393141e-05, | |
| "loss": 0.38869237899780273, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 0.08727272727272728, | |
| "grad_norm": 0.00922387931495905, | |
| "learning_rate": 9.96292229893545e-05, | |
| "loss": 0.38947885036468505, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 0.0905050505050505, | |
| "grad_norm": 0.008089344948530197, | |
| "learning_rate": 9.956142081079484e-05, | |
| "loss": 0.3940277576446533, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 0.09373737373737374, | |
| "grad_norm": 0.008494450710713863, | |
| "learning_rate": 9.948795571536891e-05, | |
| "loss": 0.3915890693664551, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 0.09696969696969697, | |
| "grad_norm": 0.008135691285133362, | |
| "learning_rate": 9.94088360972414e-05, | |
| "loss": 0.37494850158691406, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.10020202020202021, | |
| "grad_norm": 0.008869413286447525, | |
| "learning_rate": 9.932407099666608e-05, | |
| "loss": 0.4039336681365967, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 0.10343434343434343, | |
| "grad_norm": 0.00836237333714962, | |
| "learning_rate": 9.923367009895274e-05, | |
| "loss": 0.3808545351028442, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 0.10666666666666667, | |
| "grad_norm": 0.008709998801350594, | |
| "learning_rate": 9.913764373336079e-05, | |
| "loss": 0.3846753597259521, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 0.1098989898989899, | |
| "grad_norm": 0.008051419630646706, | |
| "learning_rate": 9.903600287191875e-05, | |
| "loss": 0.3809442281723022, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 0.11313131313131314, | |
| "grad_norm": 0.008488772436976433, | |
| "learning_rate": 9.892875912817079e-05, | |
| "loss": 0.39042062759399415, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.11636363636363636, | |
| "grad_norm": 0.007915249094367027, | |
| "learning_rate": 9.881592475584964e-05, | |
| "loss": 0.37756659984588625, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 0.1195959595959596, | |
| "grad_norm": 0.008581338450312614, | |
| "learning_rate": 9.869751264747656e-05, | |
| "loss": 0.3929391145706177, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 0.12282828282828283, | |
| "grad_norm": 0.008737748488783836, | |
| "learning_rate": 9.857353633288814e-05, | |
| "loss": 0.3863339424133301, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 0.12606060606060607, | |
| "grad_norm": 0.008519729599356651, | |
| "learning_rate": 9.844400997769043e-05, | |
| "loss": 0.38788180351257323, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 0.1292929292929293, | |
| "grad_norm": 0.008482911624014378, | |
| "learning_rate": 9.83089483816404e-05, | |
| "loss": 0.3903486967086792, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.13252525252525252, | |
| "grad_norm": 0.008296508342027664, | |
| "learning_rate": 9.816836697695482e-05, | |
| "loss": 0.39067506790161133, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 0.13575757575757577, | |
| "grad_norm": 0.009947331622242928, | |
| "learning_rate": 9.802228182654702e-05, | |
| "loss": 0.3869569540023804, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 0.138989898989899, | |
| "grad_norm": 0.008398473262786865, | |
| "learning_rate": 9.787070962219156e-05, | |
| "loss": 0.3667590618133545, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 0.14222222222222222, | |
| "grad_norm": 0.00859450176358223, | |
| "learning_rate": 9.771366768261696e-05, | |
| "loss": 0.38375401496887207, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 0.14545454545454545, | |
| "grad_norm": 0.008234560489654541, | |
| "learning_rate": 9.755117395152689e-05, | |
| "loss": 0.3801938533782959, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.1486868686868687, | |
| "grad_norm": 0.008372402749955654, | |
| "learning_rate": 9.73832469955499e-05, | |
| "loss": 0.3815694570541382, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 0.15191919191919193, | |
| "grad_norm": 0.008638300001621246, | |
| "learning_rate": 9.720990600211797e-05, | |
| "loss": 0.38620219230651853, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 0.15515151515151515, | |
| "grad_norm": 0.008054033853113651, | |
| "learning_rate": 9.703117077727419e-05, | |
| "loss": 0.36687431335449217, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 0.15838383838383838, | |
| "grad_norm": 0.008317321538925171, | |
| "learning_rate": 9.684706174340965e-05, | |
| "loss": 0.3756044626235962, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 0.16161616161616163, | |
| "grad_norm": 0.008002839982509613, | |
| "learning_rate": 9.665759993693e-05, | |
| "loss": 0.3840150833129883, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.16484848484848486, | |
| "grad_norm": 0.008865526877343655, | |
| "learning_rate": 9.646280700585185e-05, | |
| "loss": 0.38756704330444336, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 0.16808080808080808, | |
| "grad_norm": 0.009027686901390553, | |
| "learning_rate": 9.626270520732916e-05, | |
| "loss": 0.3690171241760254, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 0.1713131313131313, | |
| "grad_norm": 0.009051651693880558, | |
| "learning_rate": 9.605731740511022e-05, | |
| "loss": 0.38026604652404783, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 0.17454545454545456, | |
| "grad_norm": 0.008543914183974266, | |
| "learning_rate": 9.584666706692517e-05, | |
| "loss": 0.3790082216262817, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 0.17777777777777778, | |
| "grad_norm": 0.008143576793372631, | |
| "learning_rate": 9.56307782618046e-05, | |
| "loss": 0.36831059455871584, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 0.181010101010101, | |
| "grad_norm": 0.008094793185591698, | |
| "learning_rate": 9.540967565732937e-05, | |
| "loss": 0.39109277725219727, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 0.18424242424242424, | |
| "grad_norm": 0.007765405345708132, | |
| "learning_rate": 9.51833845168121e-05, | |
| "loss": 0.38657331466674805, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 0.1874747474747475, | |
| "grad_norm": 0.008732021786272526, | |
| "learning_rate": 9.495193069641057e-05, | |
| "loss": 0.375126314163208, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 0.1907070707070707, | |
| "grad_norm": 0.008094674907624722, | |
| "learning_rate": 9.47153406421734e-05, | |
| "loss": 0.3850594997406006, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 0.19393939393939394, | |
| "grad_norm": 0.00792758259922266, | |
| "learning_rate": 9.447364138701823e-05, | |
| "loss": 0.3871599674224854, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.19717171717171716, | |
| "grad_norm": 0.007575330324470997, | |
| "learning_rate": 9.422686054764302e-05, | |
| "loss": 0.37601659297943113, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 0.20040404040404042, | |
| "grad_norm": 0.008024236187338829, | |
| "learning_rate": 9.397502632137055e-05, | |
| "loss": 0.3801377773284912, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 0.20363636363636364, | |
| "grad_norm": 0.00865199789404869, | |
| "learning_rate": 9.371816748292641e-05, | |
| "loss": 0.37289042472839357, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 0.20686868686868687, | |
| "grad_norm": 0.008158660493791103, | |
| "learning_rate": 9.345631338115141e-05, | |
| "loss": 0.3836984395980835, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 0.2101010101010101, | |
| "grad_norm": 0.009124912321567535, | |
| "learning_rate": 9.318949393564807e-05, | |
| "loss": 0.3835611820220947, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 0.21333333333333335, | |
| "grad_norm": 0.008734654635190964, | |
| "learning_rate": 9.291773963336193e-05, | |
| "loss": 0.3856090545654297, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 0.21656565656565657, | |
| "grad_norm": 0.0075140586122870445, | |
| "learning_rate": 9.264108152509816e-05, | |
| "loss": 0.3813042163848877, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 0.2197979797979798, | |
| "grad_norm": 0.008879176340997219, | |
| "learning_rate": 9.235955122197368e-05, | |
| "loss": 0.3917116165161133, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 0.22303030303030302, | |
| "grad_norm": 0.008145987056195736, | |
| "learning_rate": 9.207318089180524e-05, | |
| "loss": 0.38028013706207275, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 0.22626262626262628, | |
| "grad_norm": 0.007830113172531128, | |
| "learning_rate": 9.178200325543384e-05, | |
| "loss": 0.37664792537689207, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.2294949494949495, | |
| "grad_norm": 0.008120265789330006, | |
| "learning_rate": 9.148605158298621e-05, | |
| "loss": 0.36904723644256593, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 0.23272727272727273, | |
| "grad_norm": 0.00862862542271614, | |
| "learning_rate": 9.118535969007314e-05, | |
| "loss": 0.3809346675872803, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 0.23595959595959595, | |
| "grad_norm": 0.00880429707467556, | |
| "learning_rate": 9.087996193392578e-05, | |
| "loss": 0.38595972061157224, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 0.2391919191919192, | |
| "grad_norm": 0.008185146376490593, | |
| "learning_rate": 9.056989320947e-05, | |
| "loss": 0.3923794269561768, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 0.24242424242424243, | |
| "grad_norm": 0.0071869189850986, | |
| "learning_rate": 9.025518894533921e-05, | |
| "loss": 0.38382692337036134, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 0.24565656565656566, | |
| "grad_norm": 0.008295041508972645, | |
| "learning_rate": 8.99358850998263e-05, | |
| "loss": 0.3738658666610718, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 0.24888888888888888, | |
| "grad_norm": 0.007510303985327482, | |
| "learning_rate": 8.9612018156775e-05, | |
| "loss": 0.3734901905059814, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 0.25212121212121213, | |
| "grad_norm": 0.007918701507151127, | |
| "learning_rate": 8.928362512141124e-05, | |
| "loss": 0.3856965065002441, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 0.25535353535353533, | |
| "grad_norm": 0.008347084745764732, | |
| "learning_rate": 8.895074351611488e-05, | |
| "loss": 0.3775136470794678, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 0.2585858585858586, | |
| "grad_norm": 0.007822258397936821, | |
| "learning_rate": 8.861341137613242e-05, | |
| "loss": 0.3710262060165405, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.26181818181818184, | |
| "grad_norm": 0.008437932468950748, | |
| "learning_rate": 8.827166724523105e-05, | |
| "loss": 0.38133988380432127, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 0.26505050505050504, | |
| "grad_norm": 0.008143547922372818, | |
| "learning_rate": 8.792555017129461e-05, | |
| "loss": 0.38831157684326173, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 0.2682828282828283, | |
| "grad_norm": 0.008261593990027905, | |
| "learning_rate": 8.757509970186196e-05, | |
| "loss": 0.3812253475189209, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 0.27151515151515154, | |
| "grad_norm": 0.007986029610037804, | |
| "learning_rate": 8.722035587960826e-05, | |
| "loss": 0.3840745449066162, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 0.27474747474747474, | |
| "grad_norm": 0.008006410673260689, | |
| "learning_rate": 8.686135923776969e-05, | |
| "loss": 0.389667272567749, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 0.277979797979798, | |
| "grad_norm": 0.008182629942893982, | |
| "learning_rate": 8.649815079551205e-05, | |
| "loss": 0.3824803113937378, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 0.2812121212121212, | |
| "grad_norm": 0.0076688132248818874, | |
| "learning_rate": 8.613077205324389e-05, | |
| "loss": 0.36859698295593263, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 0.28444444444444444, | |
| "grad_norm": 0.0075200945138931274, | |
| "learning_rate": 8.575926498787476e-05, | |
| "loss": 0.37808995246887206, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 0.2876767676767677, | |
| "grad_norm": 0.00802092906087637, | |
| "learning_rate": 8.538367204801872e-05, | |
| "loss": 0.3732459545135498, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 0.2909090909090909, | |
| "grad_norm": 0.008313149213790894, | |
| "learning_rate": 8.500403614914432e-05, | |
| "loss": 0.36839566230773924, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.29414141414141415, | |
| "grad_norm": 0.0081673264503479, | |
| "learning_rate": 8.462040066867089e-05, | |
| "loss": 0.3731460332870483, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 0.2973737373737374, | |
| "grad_norm": 0.007819955237209797, | |
| "learning_rate": 8.423280944101233e-05, | |
| "loss": 0.3801119804382324, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 0.3006060606060606, | |
| "grad_norm": 0.008252877742052078, | |
| "learning_rate": 8.384130675256852e-05, | |
| "loss": 0.36914944648742676, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 0.30383838383838385, | |
| "grad_norm": 0.008051907643675804, | |
| "learning_rate": 8.34459373366651e-05, | |
| "loss": 0.37649900913238527, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 0.30707070707070705, | |
| "grad_norm": 0.007895972579717636, | |
| "learning_rate": 8.304674636844231e-05, | |
| "loss": 0.3798959255218506, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 0.3103030303030303, | |
| "grad_norm": 0.0086957523599267, | |
| "learning_rate": 8.264377945969312e-05, | |
| "loss": 0.393034553527832, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 0.31353535353535356, | |
| "grad_norm": 0.008488897234201431, | |
| "learning_rate": 8.223708265365174e-05, | |
| "loss": 0.3909647226333618, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 0.31676767676767675, | |
| "grad_norm": 0.008450930006802082, | |
| "learning_rate": 8.182670241973253e-05, | |
| "loss": 0.37601802349090574, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 0.32, | |
| "grad_norm": 0.008603021502494812, | |
| "learning_rate": 8.141268564822053e-05, | |
| "loss": 0.39119911193847656, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 0.32323232323232326, | |
| "grad_norm": 0.008219755254685879, | |
| "learning_rate": 8.099507964491369e-05, | |
| "loss": 0.36634268760681155, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 0.32646464646464646, | |
| "grad_norm": 0.008747057989239693, | |
| "learning_rate": 8.057393212571767e-05, | |
| "loss": 0.390001916885376, | |
| "step": 1010 | |
| }, | |
| { | |
| "epoch": 0.3296969696969697, | |
| "grad_norm": 0.009628907777369022, | |
| "learning_rate": 8.014929121119378e-05, | |
| "loss": 0.3795316696166992, | |
| "step": 1020 | |
| }, | |
| { | |
| "epoch": 0.3329292929292929, | |
| "grad_norm": 0.007869881577789783, | |
| "learning_rate": 7.972120542106077e-05, | |
| "loss": 0.37975897789001467, | |
| "step": 1030 | |
| }, | |
| { | |
| "epoch": 0.33616161616161616, | |
| "grad_norm": 0.00830437894910574, | |
| "learning_rate": 7.92897236686508e-05, | |
| "loss": 0.3775317192077637, | |
| "step": 1040 | |
| }, | |
| { | |
| "epoch": 0.3393939393939394, | |
| "grad_norm": 0.008555877953767776, | |
| "learning_rate": 7.885489525532075e-05, | |
| "loss": 0.3789222240447998, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 0.3426262626262626, | |
| "grad_norm": 0.01898166537284851, | |
| "learning_rate": 7.84167698648189e-05, | |
| "loss": 0.3830681085586548, | |
| "step": 1060 | |
| }, | |
| { | |
| "epoch": 0.34585858585858587, | |
| "grad_norm": 0.008047427050769329, | |
| "learning_rate": 7.797539755760805e-05, | |
| "loss": 0.3770411968231201, | |
| "step": 1070 | |
| }, | |
| { | |
| "epoch": 0.3490909090909091, | |
| "grad_norm": 0.0074907769449055195, | |
| "learning_rate": 7.753082876514562e-05, | |
| "loss": 0.3806899547576904, | |
| "step": 1080 | |
| }, | |
| { | |
| "epoch": 0.3523232323232323, | |
| "grad_norm": 0.00827647466212511, | |
| "learning_rate": 7.708311428412129e-05, | |
| "loss": 0.37074985504150393, | |
| "step": 1090 | |
| }, | |
| { | |
| "epoch": 0.35555555555555557, | |
| "grad_norm": 0.008136745542287827, | |
| "learning_rate": 7.663230527065293e-05, | |
| "loss": 0.37122316360473634, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 0.35878787878787877, | |
| "grad_norm": 0.008221322670578957, | |
| "learning_rate": 7.617845323444156e-05, | |
| "loss": 0.38070154190063477, | |
| "step": 1110 | |
| }, | |
| { | |
| "epoch": 0.362020202020202, | |
| "grad_norm": 0.008503571152687073, | |
| "learning_rate": 7.572161003288565e-05, | |
| "loss": 0.3785174608230591, | |
| "step": 1120 | |
| }, | |
| { | |
| "epoch": 0.3652525252525253, | |
| "grad_norm": 0.009484554640948772, | |
| "learning_rate": 7.526182786515609e-05, | |
| "loss": 0.37593255043029783, | |
| "step": 1130 | |
| }, | |
| { | |
| "epoch": 0.36848484848484847, | |
| "grad_norm": 0.008690858259797096, | |
| "learning_rate": 7.479915926623165e-05, | |
| "loss": 0.3795978307723999, | |
| "step": 1140 | |
| }, | |
| { | |
| "epoch": 0.3717171717171717, | |
| "grad_norm": 0.008204210549592972, | |
| "learning_rate": 7.433365710089646e-05, | |
| "loss": 0.3610103130340576, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 0.374949494949495, | |
| "grad_norm": 0.008487153798341751, | |
| "learning_rate": 7.386537455769963e-05, | |
| "loss": 0.380059027671814, | |
| "step": 1160 | |
| }, | |
| { | |
| "epoch": 0.3781818181818182, | |
| "grad_norm": 0.008578482083976269, | |
| "learning_rate": 7.339436514287783e-05, | |
| "loss": 0.377803635597229, | |
| "step": 1170 | |
| }, | |
| { | |
| "epoch": 0.3814141414141414, | |
| "grad_norm": 0.007899395190179348, | |
| "learning_rate": 7.292068267424165e-05, | |
| "loss": 0.3671201229095459, | |
| "step": 1180 | |
| }, | |
| { | |
| "epoch": 0.3846464646464646, | |
| "grad_norm": 0.009204370900988579, | |
| "learning_rate": 7.244438127502647e-05, | |
| "loss": 0.3741163969039917, | |
| "step": 1190 | |
| }, | |
| { | |
| "epoch": 0.3878787878787879, | |
| "grad_norm": 0.008694990538060665, | |
| "learning_rate": 7.196551536770807e-05, | |
| "loss": 0.3826310396194458, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 0.39111111111111113, | |
| "grad_norm": 0.008572017773985863, | |
| "learning_rate": 7.148413966778451e-05, | |
| "loss": 0.381903338432312, | |
| "step": 1210 | |
| }, | |
| { | |
| "epoch": 0.39434343434343433, | |
| "grad_norm": 0.008069795556366444, | |
| "learning_rate": 7.100030917752423e-05, | |
| "loss": 0.38312816619873047, | |
| "step": 1220 | |
| }, | |
| { | |
| "epoch": 0.3975757575757576, | |
| "grad_norm": 0.007709018420428038, | |
| "learning_rate": 7.051407917968138e-05, | |
| "loss": 0.3835233211517334, | |
| "step": 1230 | |
| }, | |
| { | |
| "epoch": 0.40080808080808084, | |
| "grad_norm": 0.007809523958712816, | |
| "learning_rate": 7.002550523117926e-05, | |
| "loss": 0.37577004432678224, | |
| "step": 1240 | |
| }, | |
| { | |
| "epoch": 0.40404040404040403, | |
| "grad_norm": 0.008419531397521496, | |
| "learning_rate": 6.953464315676241e-05, | |
| "loss": 0.37052106857299805, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 0.4072727272727273, | |
| "grad_norm": 0.009041344746947289, | |
| "learning_rate": 6.904154904261792e-05, | |
| "loss": 0.3696247339248657, | |
| "step": 1260 | |
| }, | |
| { | |
| "epoch": 0.4105050505050505, | |
| "grad_norm": 0.00816253386437893, | |
| "learning_rate": 6.8546279229967e-05, | |
| "loss": 0.38099074363708496, | |
| "step": 1270 | |
| }, | |
| { | |
| "epoch": 0.41373737373737374, | |
| "grad_norm": 0.008244721218943596, | |
| "learning_rate": 6.804889030862753e-05, | |
| "loss": 0.37920713424682617, | |
| "step": 1280 | |
| }, | |
| { | |
| "epoch": 0.416969696969697, | |
| "grad_norm": 0.007987082935869694, | |
| "learning_rate": 6.754943911054793e-05, | |
| "loss": 0.3793349742889404, | |
| "step": 1290 | |
| }, | |
| { | |
| "epoch": 0.4202020202020202, | |
| "grad_norm": 0.008034025318920612, | |
| "learning_rate": 6.704798270331358e-05, | |
| "loss": 0.37303624153137205, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 0.42343434343434344, | |
| "grad_norm": 0.008346694521605968, | |
| "learning_rate": 6.654457838362621e-05, | |
| "loss": 0.3781913757324219, | |
| "step": 1310 | |
| }, | |
| { | |
| "epoch": 0.4266666666666667, | |
| "grad_norm": 0.008833488449454308, | |
| "learning_rate": 6.603928367075718e-05, | |
| "loss": 0.3740977764129639, | |
| "step": 1320 | |
| }, | |
| { | |
| "epoch": 0.4298989898989899, | |
| "grad_norm": 0.00850659143179655, | |
| "learning_rate": 6.553215629997529e-05, | |
| "loss": 0.37595219612121583, | |
| "step": 1330 | |
| }, | |
| { | |
| "epoch": 0.43313131313131314, | |
| "grad_norm": 0.008206614293158054, | |
| "learning_rate": 6.502325421594988e-05, | |
| "loss": 0.3707082271575928, | |
| "step": 1340 | |
| }, | |
| { | |
| "epoch": 0.43636363636363634, | |
| "grad_norm": 0.008460193872451782, | |
| "learning_rate": 6.451263556613007e-05, | |
| "loss": 0.37059659957885743, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 0.4395959595959596, | |
| "grad_norm": 0.007676625624299049, | |
| "learning_rate": 6.40003586941008e-05, | |
| "loss": 0.37028162479400634, | |
| "step": 1360 | |
| }, | |
| { | |
| "epoch": 0.44282828282828285, | |
| "grad_norm": 0.00827883929014206, | |
| "learning_rate": 6.348648213291642e-05, | |
| "loss": 0.38210372924804686, | |
| "step": 1370 | |
| }, | |
| { | |
| "epoch": 0.44606060606060605, | |
| "grad_norm": 0.008517625741660595, | |
| "learning_rate": 6.297106459841272e-05, | |
| "loss": 0.37311854362487795, | |
| "step": 1380 | |
| }, | |
| { | |
| "epoch": 0.4492929292929293, | |
| "grad_norm": 0.007962854579091072, | |
| "learning_rate": 6.245416498249801e-05, | |
| "loss": 0.3756999969482422, | |
| "step": 1390 | |
| }, | |
| { | |
| "epoch": 0.45252525252525255, | |
| "grad_norm": 0.008170384913682938, | |
| "learning_rate": 6.193584234642403e-05, | |
| "loss": 0.36833963394165037, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 0.45575757575757575, | |
| "grad_norm": 0.008493859320878983, | |
| "learning_rate": 6.141615591403771e-05, | |
| "loss": 0.3753085136413574, | |
| "step": 1410 | |
| }, | |
| { | |
| "epoch": 0.458989898989899, | |
| "grad_norm": 0.008191689848899841, | |
| "learning_rate": 6.0895165065014106e-05, | |
| "loss": 0.3819756269454956, | |
| "step": 1420 | |
| }, | |
| { | |
| "epoch": 0.4622222222222222, | |
| "grad_norm": 0.008063061162829399, | |
| "learning_rate": 6.037292932807167e-05, | |
| "loss": 0.38694086074829104, | |
| "step": 1430 | |
| }, | |
| { | |
| "epoch": 0.46545454545454545, | |
| "grad_norm": 0.009821748360991478, | |
| "learning_rate": 5.984950837417048e-05, | |
| "loss": 0.36938455104827883, | |
| "step": 1440 | |
| }, | |
| { | |
| "epoch": 0.4686868686868687, | |
| "grad_norm": 0.008381606079638004, | |
| "learning_rate": 5.932496200969422e-05, | |
| "loss": 0.37668848037719727, | |
| "step": 1450 | |
| }, | |
| { | |
| "epoch": 0.4719191919191919, | |
| "grad_norm": 0.008244876749813557, | |
| "learning_rate": 5.879935016961661e-05, | |
| "loss": 0.38069169521331786, | |
| "step": 1460 | |
| }, | |
| { | |
| "epoch": 0.47515151515151516, | |
| "grad_norm": 0.008627382107079029, | |
| "learning_rate": 5.827273291065326e-05, | |
| "loss": 0.37565131187438966, | |
| "step": 1470 | |
| }, | |
| { | |
| "epoch": 0.4783838383838384, | |
| "grad_norm": 0.008732340298593044, | |
| "learning_rate": 5.7745170404399484e-05, | |
| "loss": 0.379933762550354, | |
| "step": 1480 | |
| }, | |
| { | |
| "epoch": 0.4816161616161616, | |
| "grad_norm": 0.008591915480792522, | |
| "learning_rate": 5.721672293045518e-05, | |
| "loss": 0.3786482810974121, | |
| "step": 1490 | |
| }, | |
| { | |
| "epoch": 0.48484848484848486, | |
| "grad_norm": 0.008559978567063808, | |
| "learning_rate": 5.668745086953712e-05, | |
| "loss": 0.37692484855651853, | |
| "step": 1500 | |
| }, | |
| { | |
| "epoch": 0.48808080808080806, | |
| "grad_norm": 0.008458509109914303, | |
| "learning_rate": 5.615741469657985e-05, | |
| "loss": 0.3862480878829956, | |
| "step": 1510 | |
| }, | |
| { | |
| "epoch": 0.4913131313131313, | |
| "grad_norm": 0.00866667926311493, | |
| "learning_rate": 5.562667497382582e-05, | |
| "loss": 0.3674156188964844, | |
| "step": 1520 | |
| }, | |
| { | |
| "epoch": 0.49454545454545457, | |
| "grad_norm": 0.00853388849645853, | |
| "learning_rate": 5.509529234390553e-05, | |
| "loss": 0.38260979652404786, | |
| "step": 1530 | |
| }, | |
| { | |
| "epoch": 0.49777777777777776, | |
| "grad_norm": 0.008287868462502956, | |
| "learning_rate": 5.456332752290837e-05, | |
| "loss": 0.36568374633789064, | |
| "step": 1540 | |
| }, | |
| { | |
| "epoch": 0.501010101010101, | |
| "grad_norm": 0.008751815184950829, | |
| "learning_rate": 5.4030841293445244e-05, | |
| "loss": 0.37983543872833253, | |
| "step": 1550 | |
| }, | |
| { | |
| "epoch": 0.5042424242424243, | |
| "grad_norm": 0.007895979098975658, | |
| "learning_rate": 5.349789449770351e-05, | |
| "loss": 0.3738078594207764, | |
| "step": 1560 | |
| }, | |
| { | |
| "epoch": 0.5074747474747475, | |
| "grad_norm": 0.008043098263442516, | |
| "learning_rate": 5.2964548030495065e-05, | |
| "loss": 0.3763037919998169, | |
| "step": 1570 | |
| }, | |
| { | |
| "epoch": 0.5107070707070707, | |
| "grad_norm": 0.00858687050640583, | |
| "learning_rate": 5.243086283229852e-05, | |
| "loss": 0.3780511856079102, | |
| "step": 1580 | |
| }, | |
| { | |
| "epoch": 0.5139393939393939, | |
| "grad_norm": 0.00860162265598774, | |
| "learning_rate": 5.18968998822961e-05, | |
| "loss": 0.37319676876068114, | |
| "step": 1590 | |
| }, | |
| { | |
| "epoch": 0.5171717171717172, | |
| "grad_norm": 0.009063811041414738, | |
| "learning_rate": 5.1362720191406065e-05, | |
| "loss": 0.3769842624664307, | |
| "step": 1600 | |
| }, | |
| { | |
| "epoch": 0.5204040404040404, | |
| "grad_norm": 0.009149775840342045, | |
| "learning_rate": 5.082838479531169e-05, | |
| "loss": 0.3767851829528809, | |
| "step": 1610 | |
| }, | |
| { | |
| "epoch": 0.5236363636363637, | |
| "grad_norm": 0.007992517203092575, | |
| "learning_rate": 5.029395474748714e-05, | |
| "loss": 0.3868858814239502, | |
| "step": 1620 | |
| }, | |
| { | |
| "epoch": 0.5268686868686868, | |
| "grad_norm": 0.008455543778836727, | |
| "learning_rate": 4.975949111222158e-05, | |
| "loss": 0.37787058353424074, | |
| "step": 1630 | |
| }, | |
| { | |
| "epoch": 0.5301010101010101, | |
| "grad_norm": 0.008353844285011292, | |
| "learning_rate": 4.9225054957641916e-05, | |
| "loss": 0.366853141784668, | |
| "step": 1640 | |
| }, | |
| { | |
| "epoch": 0.5333333333333333, | |
| "grad_norm": 0.008381880819797516, | |
| "learning_rate": 4.8690707348735035e-05, | |
| "loss": 0.3768073558807373, | |
| "step": 1650 | |
| }, | |
| { | |
| "epoch": 0.5365656565656566, | |
| "grad_norm": 0.00869447086006403, | |
| "learning_rate": 4.8156509340370605e-05, | |
| "loss": 0.3740663766860962, | |
| "step": 1660 | |
| }, | |
| { | |
| "epoch": 0.5397979797979798, | |
| "grad_norm": 0.00835113599896431, | |
| "learning_rate": 4.762252197032482e-05, | |
| "loss": 0.3748412847518921, | |
| "step": 1670 | |
| }, | |
| { | |
| "epoch": 0.5430303030303031, | |
| "grad_norm": 0.008712276816368103, | |
| "learning_rate": 4.7088806252306224e-05, | |
| "loss": 0.3652827262878418, | |
| "step": 1680 | |
| }, | |
| { | |
| "epoch": 0.5462626262626262, | |
| "grad_norm": 0.008263722993433475, | |
| "learning_rate": 4.655542316898423e-05, | |
| "loss": 0.35825161933898925, | |
| "step": 1690 | |
| }, | |
| { | |
| "epoch": 0.5494949494949495, | |
| "grad_norm": 0.008255276829004288, | |
| "learning_rate": 4.6022433665021246e-05, | |
| "loss": 0.3670318603515625, | |
| "step": 1700 | |
| }, | |
| { | |
| "epoch": 0.5527272727272727, | |
| "grad_norm": 0.008898588828742504, | |
| "learning_rate": 4.548989864010902e-05, | |
| "loss": 0.37490177154541016, | |
| "step": 1710 | |
| }, | |
| { | |
| "epoch": 0.555959595959596, | |
| "grad_norm": 0.00867844931781292, | |
| "learning_rate": 4.495787894201031e-05, | |
| "loss": 0.3661633014678955, | |
| "step": 1720 | |
| }, | |
| { | |
| "epoch": 0.5591919191919192, | |
| "grad_norm": 0.00891869980841875, | |
| "learning_rate": 4.442643535960631e-05, | |
| "loss": 0.38137781620025635, | |
| "step": 1730 | |
| }, | |
| { | |
| "epoch": 0.5624242424242424, | |
| "grad_norm": 0.008473970927298069, | |
| "learning_rate": 4.3895628615950864e-05, | |
| "loss": 0.37122745513916017, | |
| "step": 1740 | |
| }, | |
| { | |
| "epoch": 0.5656565656565656, | |
| "grad_norm": 0.00889658834785223, | |
| "learning_rate": 4.3365519361332345e-05, | |
| "loss": 0.3819819211959839, | |
| "step": 1750 | |
| }, | |
| { | |
| "epoch": 0.5688888888888889, | |
| "grad_norm": 0.008457816205918789, | |
| "learning_rate": 4.283616816634353e-05, | |
| "loss": 0.3663030624389648, | |
| "step": 1760 | |
| }, | |
| { | |
| "epoch": 0.5721212121212121, | |
| "grad_norm": 0.008427497930824757, | |
| "learning_rate": 4.230763551496089e-05, | |
| "loss": 0.38602652549743655, | |
| "step": 1770 | |
| }, | |
| { | |
| "epoch": 0.5753535353535354, | |
| "grad_norm": 0.008894660510122776, | |
| "learning_rate": 4.1779981797633645e-05, | |
| "loss": 0.3838383674621582, | |
| "step": 1780 | |
| }, | |
| { | |
| "epoch": 0.5785858585858585, | |
| "grad_norm": 0.008388307876884937, | |
| "learning_rate": 4.1253267304383455e-05, | |
| "loss": 0.3710144281387329, | |
| "step": 1790 | |
| }, | |
| { | |
| "epoch": 0.5818181818181818, | |
| "grad_norm": 0.008177547715604305, | |
| "learning_rate": 4.072755221791572e-05, | |
| "loss": 0.36981887817382814, | |
| "step": 1800 | |
| }, | |
| { | |
| "epoch": 0.585050505050505, | |
| "grad_norm": 0.008034786209464073, | |
| "learning_rate": 4.020289660674306e-05, | |
| "loss": 0.3789166212081909, | |
| "step": 1810 | |
| }, | |
| { | |
| "epoch": 0.5882828282828283, | |
| "grad_norm": 0.008227713406085968, | |
| "learning_rate": 3.967936041832173e-05, | |
| "loss": 0.3742852210998535, | |
| "step": 1820 | |
| }, | |
| { | |
| "epoch": 0.5915151515151515, | |
| "grad_norm": 0.008400660008192062, | |
| "learning_rate": 3.9157003472202246e-05, | |
| "loss": 0.3705322504043579, | |
| "step": 1830 | |
| }, | |
| { | |
| "epoch": 0.5947474747474748, | |
| "grad_norm": 0.008642120286822319, | |
| "learning_rate": 3.863588545319407e-05, | |
| "loss": 0.3812143087387085, | |
| "step": 1840 | |
| }, | |
| { | |
| "epoch": 0.597979797979798, | |
| "grad_norm": 0.008550690487027168, | |
| "learning_rate": 3.8116065904546196e-05, | |
| "loss": 0.36846873760223386, | |
| "step": 1850 | |
| }, | |
| { | |
| "epoch": 0.6012121212121212, | |
| "grad_norm": 0.009093121625483036, | |
| "learning_rate": 3.759760422114362e-05, | |
| "loss": 0.36917288303375245, | |
| "step": 1860 | |
| }, | |
| { | |
| "epoch": 0.6044444444444445, | |
| "grad_norm": 0.011531802825629711, | |
| "learning_rate": 3.708055964272088e-05, | |
| "loss": 0.37623181343078616, | |
| "step": 1870 | |
| }, | |
| { | |
| "epoch": 0.6076767676767677, | |
| "grad_norm": 0.008886284194886684, | |
| "learning_rate": 3.6564991247093234e-05, | |
| "loss": 0.368613076210022, | |
| "step": 1880 | |
| }, | |
| { | |
| "epoch": 0.610909090909091, | |
| "grad_norm": 0.008327574469149113, | |
| "learning_rate": 3.6050957943406465e-05, | |
| "loss": 0.3828991413116455, | |
| "step": 1890 | |
| }, | |
| { | |
| "epoch": 0.6141414141414141, | |
| "grad_norm": 0.008432856760919094, | |
| "learning_rate": 3.553851846540584e-05, | |
| "loss": 0.36706550121307374, | |
| "step": 1900 | |
| }, | |
| { | |
| "epoch": 0.6173737373737374, | |
| "grad_norm": 0.008847289718687534, | |
| "learning_rate": 3.50277313647252e-05, | |
| "loss": 0.3688380718231201, | |
| "step": 1910 | |
| }, | |
| { | |
| "epoch": 0.6206060606060606, | |
| "grad_norm": 0.008496418595314026, | |
| "learning_rate": 3.451865500419676e-05, | |
| "loss": 0.37277908325195314, | |
| "step": 1920 | |
| }, | |
| { | |
| "epoch": 0.6238383838383839, | |
| "grad_norm": 0.008594812825322151, | |
| "learning_rate": 3.401134755118256e-05, | |
| "loss": 0.3851970911026001, | |
| "step": 1930 | |
| }, | |
| { | |
| "epoch": 0.6270707070707071, | |
| "grad_norm": 0.00816626101732254, | |
| "learning_rate": 3.350586697092826e-05, | |
| "loss": 0.3817636251449585, | |
| "step": 1940 | |
| }, | |
| { | |
| "epoch": 0.6303030303030303, | |
| "grad_norm": 0.008723457343876362, | |
| "learning_rate": 3.300227101993998e-05, | |
| "loss": 0.3650315284729004, | |
| "step": 1950 | |
| }, | |
| { | |
| "epoch": 0.6335353535353535, | |
| "grad_norm": 0.0087728351354599, | |
| "learning_rate": 3.2500617239384947e-05, | |
| "loss": 0.37312395572662355, | |
| "step": 1960 | |
| }, | |
| { | |
| "epoch": 0.6367676767676768, | |
| "grad_norm": 0.008724791929125786, | |
| "learning_rate": 3.200096294851691e-05, | |
| "loss": 0.3921516418457031, | |
| "step": 1970 | |
| }, | |
| { | |
| "epoch": 0.64, | |
| "grad_norm": 0.008351181633770466, | |
| "learning_rate": 3.150336523812674e-05, | |
| "loss": 0.3623528957366943, | |
| "step": 1980 | |
| }, | |
| { | |
| "epoch": 0.6432323232323233, | |
| "grad_norm": 0.008469253778457642, | |
| "learning_rate": 3.100788096401925e-05, | |
| "loss": 0.36600675582885744, | |
| "step": 1990 | |
| }, | |
| { | |
| "epoch": 0.6464646464646465, | |
| "grad_norm": 0.008192651905119419, | |
| "learning_rate": 3.051456674051677e-05, | |
| "loss": 0.3711225986480713, | |
| "step": 2000 | |
| }, | |
| { | |
| "epoch": 0.6496969696969697, | |
| "grad_norm": 0.007976829074323177, | |
| "learning_rate": 3.0023478933990347e-05, | |
| "loss": 0.37237536907196045, | |
| "step": 2010 | |
| }, | |
| { | |
| "epoch": 0.6529292929292929, | |
| "grad_norm": 0.009051225148141384, | |
| "learning_rate": 2.9534673656419377e-05, | |
| "loss": 0.37553870677948, | |
| "step": 2020 | |
| }, | |
| { | |
| "epoch": 0.6561616161616162, | |
| "grad_norm": 0.008214341476559639, | |
| "learning_rate": 2.9048206758980136e-05, | |
| "loss": 0.36155047416687014, | |
| "step": 2030 | |
| }, | |
| { | |
| "epoch": 0.6593939393939394, | |
| "grad_norm": 0.008441639132797718, | |
| "learning_rate": 2.856413382566425e-05, | |
| "loss": 0.3772094488143921, | |
| "step": 2040 | |
| }, | |
| { | |
| "epoch": 0.6626262626262627, | |
| "grad_norm": 0.008770185522735119, | |
| "learning_rate": 2.8082510166927583e-05, | |
| "loss": 0.37615342140197755, | |
| "step": 2050 | |
| }, | |
| { | |
| "epoch": 0.6658585858585858, | |
| "grad_norm": 0.008841089904308319, | |
| "learning_rate": 2.760339081337041e-05, | |
| "loss": 0.37926411628723145, | |
| "step": 2060 | |
| }, | |
| { | |
| "epoch": 0.6690909090909091, | |
| "grad_norm": 0.008792232722043991, | |
| "learning_rate": 2.7126830509449773e-05, | |
| "loss": 0.36652073860168455, | |
| "step": 2070 | |
| }, | |
| { | |
| "epoch": 0.6723232323232323, | |
| "grad_norm": 0.008509263396263123, | |
| "learning_rate": 2.6652883707224075e-05, | |
| "loss": 0.3772120952606201, | |
| "step": 2080 | |
| }, | |
| { | |
| "epoch": 0.6755555555555556, | |
| "grad_norm": 0.0077481819316744804, | |
| "learning_rate": 2.618160456013153e-05, | |
| "loss": 0.3723082304000854, | |
| "step": 2090 | |
| }, | |
| { | |
| "epoch": 0.6787878787878788, | |
| "grad_norm": 0.008416908793151379, | |
| "learning_rate": 2.571304691680255e-05, | |
| "loss": 0.3793506145477295, | |
| "step": 2100 | |
| }, | |
| { | |
| "epoch": 0.682020202020202, | |
| "grad_norm": 0.008886829949915409, | |
| "learning_rate": 2.5247264314906917e-05, | |
| "loss": 0.3736711025238037, | |
| "step": 2110 | |
| }, | |
| { | |
| "epoch": 0.6852525252525252, | |
| "grad_norm": 0.008566746488213539, | |
| "learning_rate": 2.4784309975036513e-05, | |
| "loss": 0.37454140186309814, | |
| "step": 2120 | |
| }, | |
| { | |
| "epoch": 0.6884848484848485, | |
| "grad_norm": 0.008895625360310078, | |
| "learning_rate": 2.4324236794624456e-05, | |
| "loss": 0.3789727210998535, | |
| "step": 2130 | |
| }, | |
| { | |
| "epoch": 0.6917171717171717, | |
| "grad_norm": 0.008392645977437496, | |
| "learning_rate": 2.386709734190079e-05, | |
| "loss": 0.35956587791442873, | |
| "step": 2140 | |
| }, | |
| { | |
| "epoch": 0.694949494949495, | |
| "grad_norm": 0.008756759576499462, | |
| "learning_rate": 2.34129438498862e-05, | |
| "loss": 0.3661501884460449, | |
| "step": 2150 | |
| }, | |
| { | |
| "epoch": 0.6981818181818182, | |
| "grad_norm": 0.008462606929242611, | |
| "learning_rate": 2.296182821042374e-05, | |
| "loss": 0.37202165126800535, | |
| "step": 2160 | |
| }, | |
| { | |
| "epoch": 0.7014141414141414, | |
| "grad_norm": 0.008555078878998756, | |
| "learning_rate": 2.2513801968249644e-05, | |
| "loss": 0.37806949615478513, | |
| "step": 2170 | |
| }, | |
| { | |
| "epoch": 0.7046464646464646, | |
| "grad_norm": 0.008424860425293446, | |
| "learning_rate": 2.2068916315103783e-05, | |
| "loss": 0.36311826705932615, | |
| "step": 2180 | |
| }, | |
| { | |
| "epoch": 0.7078787878787879, | |
| "grad_norm": 0.008660963736474514, | |
| "learning_rate": 2.162722208388057e-05, | |
| "loss": 0.3788281440734863, | |
| "step": 2190 | |
| }, | |
| { | |
| "epoch": 0.7111111111111111, | |
| "grad_norm": 0.008829527534544468, | |
| "learning_rate": 2.1188769742820625e-05, | |
| "loss": 0.363692045211792, | |
| "step": 2200 | |
| }, | |
| { | |
| "epoch": 0.7143434343434344, | |
| "grad_norm": 0.008732212707400322, | |
| "learning_rate": 2.075360938974429e-05, | |
| "loss": 0.377083945274353, | |
| "step": 2210 | |
| }, | |
| { | |
| "epoch": 0.7175757575757575, | |
| "grad_norm": 0.008612161502242088, | |
| "learning_rate": 2.03217907463275e-05, | |
| "loss": 0.37938365936279295, | |
| "step": 2220 | |
| }, | |
| { | |
| "epoch": 0.7208080808080808, | |
| "grad_norm": 0.008470796048641205, | |
| "learning_rate": 1.989336315242048e-05, | |
| "loss": 0.36910898685455323, | |
| "step": 2230 | |
| }, | |
| { | |
| "epoch": 0.724040404040404, | |
| "grad_norm": 0.008112024515867233, | |
| "learning_rate": 1.9468375560410117e-05, | |
| "loss": 0.37638006210327146, | |
| "step": 2240 | |
| }, | |
| { | |
| "epoch": 0.7272727272727273, | |
| "grad_norm": 0.008808580227196217, | |
| "learning_rate": 1.90468765296267e-05, | |
| "loss": 0.3817383050918579, | |
| "step": 2250 | |
| }, | |
| { | |
| "epoch": 0.7305050505050505, | |
| "grad_norm": 0.00813962984830141, | |
| "learning_rate": 1.8628914220795494e-05, | |
| "loss": 0.37254207134246825, | |
| "step": 2260 | |
| }, | |
| { | |
| "epoch": 0.7337373737373737, | |
| "grad_norm": 0.00881427712738514, | |
| "learning_rate": 1.8214536390533822e-05, | |
| "loss": 0.3720477819442749, | |
| "step": 2270 | |
| }, | |
| { | |
| "epoch": 0.7369696969696969, | |
| "grad_norm": 0.00897192768752575, | |
| "learning_rate": 1.7803790385894387e-05, | |
| "loss": 0.3803945302963257, | |
| "step": 2280 | |
| }, | |
| { | |
| "epoch": 0.7402020202020202, | |
| "grad_norm": 0.008690549992024899, | |
| "learning_rate": 1.7396723138955428e-05, | |
| "loss": 0.36781790256500246, | |
| "step": 2290 | |
| }, | |
| { | |
| "epoch": 0.7434343434343434, | |
| "grad_norm": 0.008678211830556393, | |
| "learning_rate": 1.699338116145811e-05, | |
| "loss": 0.3670048236846924, | |
| "step": 2300 | |
| }, | |
| { | |
| "epoch": 0.7466666666666667, | |
| "grad_norm": 0.008816958405077457, | |
| "learning_rate": 1.6593810539492195e-05, | |
| "loss": 0.373481011390686, | |
| "step": 2310 | |
| }, | |
| { | |
| "epoch": 0.74989898989899, | |
| "grad_norm": 0.008747234009206295, | |
| "learning_rate": 1.619805692823016e-05, | |
| "loss": 0.37540497779846194, | |
| "step": 2320 | |
| }, | |
| { | |
| "epoch": 0.7531313131313131, | |
| "grad_norm": 0.00825244840234518, | |
| "learning_rate": 1.580616554671057e-05, | |
| "loss": 0.36757464408874513, | |
| "step": 2330 | |
| }, | |
| { | |
| "epoch": 0.7563636363636363, | |
| "grad_norm": 0.008477483876049519, | |
| "learning_rate": 1.5418181172671382e-05, | |
| "loss": 0.37665433883666993, | |
| "step": 2340 | |
| }, | |
| { | |
| "epoch": 0.7595959595959596, | |
| "grad_norm": 0.00875260028988123, | |
| "learning_rate": 1.5034148137433623e-05, | |
| "loss": 0.366714334487915, | |
| "step": 2350 | |
| }, | |
| { | |
| "epoch": 0.7628282828282829, | |
| "grad_norm": 0.008853024803102016, | |
| "learning_rate": 1.4654110320836017e-05, | |
| "loss": 0.37020263671875, | |
| "step": 2360 | |
| }, | |
| { | |
| "epoch": 0.7660606060606061, | |
| "grad_norm": 0.009046499617397785, | |
| "learning_rate": 1.4278111146221263e-05, | |
| "loss": 0.3723160982131958, | |
| "step": 2370 | |
| }, | |
| { | |
| "epoch": 0.7692929292929293, | |
| "grad_norm": 0.008310087956488132, | |
| "learning_rate": 1.3906193575474508e-05, | |
| "loss": 0.3688467264175415, | |
| "step": 2380 | |
| }, | |
| { | |
| "epoch": 0.7725252525252525, | |
| "grad_norm": 0.008621731773018837, | |
| "learning_rate": 1.3538400104114446e-05, | |
| "loss": 0.37307281494140626, | |
| "step": 2390 | |
| }, | |
| { | |
| "epoch": 0.7757575757575758, | |
| "grad_norm": 0.008672415278851986, | |
| "learning_rate": 1.3174772756437742e-05, | |
| "loss": 0.36974148750305175, | |
| "step": 2400 | |
| }, | |
| { | |
| "epoch": 0.778989898989899, | |
| "grad_norm": 0.008731256239116192, | |
| "learning_rate": 1.2815353080717379e-05, | |
| "loss": 0.37264394760131836, | |
| "step": 2410 | |
| }, | |
| { | |
| "epoch": 0.7822222222222223, | |
| "grad_norm": 0.014323744922876358, | |
| "learning_rate": 1.246018214445525e-05, | |
| "loss": 0.3763184309005737, | |
| "step": 2420 | |
| }, | |
| { | |
| "epoch": 0.7854545454545454, | |
| "grad_norm": 0.008530031889677048, | |
| "learning_rate": 1.210930052968981e-05, | |
| "loss": 0.3757563591003418, | |
| "step": 2430 | |
| }, | |
| { | |
| "epoch": 0.7886868686868687, | |
| "grad_norm": 0.008401346392929554, | |
| "learning_rate": 1.1762748328359152e-05, | |
| "loss": 0.3683294773101807, | |
| "step": 2440 | |
| }, | |
| { | |
| "epoch": 0.7919191919191919, | |
| "grad_norm": 0.008516985923051834, | |
| "learning_rate": 1.1420565137720045e-05, | |
| "loss": 0.36771197319030763, | |
| "step": 2450 | |
| }, | |
| { | |
| "epoch": 0.7951515151515152, | |
| "grad_norm": 0.009087449871003628, | |
| "learning_rate": 1.1082790055823533e-05, | |
| "loss": 0.3740364074707031, | |
| "step": 2460 | |
| }, | |
| { | |
| "epoch": 0.7983838383838384, | |
| "grad_norm": 0.008735416457057, | |
| "learning_rate": 1.0749461677047624e-05, | |
| "loss": 0.3658547639846802, | |
| "step": 2470 | |
| }, | |
| { | |
| "epoch": 0.8016161616161617, | |
| "grad_norm": 0.008218985982239246, | |
| "learning_rate": 1.0420618087687418e-05, | |
| "loss": 0.36727066040039064, | |
| "step": 2480 | |
| }, | |
| { | |
| "epoch": 0.8048484848484848, | |
| "grad_norm": 0.008089488372206688, | |
| "learning_rate": 1.0096296861603321e-05, | |
| "loss": 0.3734628200531006, | |
| "step": 2490 | |
| }, | |
| { | |
| "epoch": 0.8080808080808081, | |
| "grad_norm": 0.009074541740119457, | |
| "learning_rate": 9.776535055927931e-06, | |
| "loss": 0.38172001838684083, | |
| "step": 2500 | |
| }, | |
| { | |
| "epoch": 0.8113131313131313, | |
| "grad_norm": 0.008856716565787792, | |
| "learning_rate": 9.461369206831772e-06, | |
| "loss": 0.3696982622146606, | |
| "step": 2510 | |
| }, | |
| { | |
| "epoch": 0.8145454545454546, | |
| "grad_norm": 0.008767404593527317, | |
| "learning_rate": 9.150835325348678e-06, | |
| "loss": 0.37330069541931155, | |
| "step": 2520 | |
| }, | |
| { | |
| "epoch": 0.8177777777777778, | |
| "grad_norm": 0.009410719387233257, | |
| "learning_rate": 8.844968893261197e-06, | |
| "loss": 0.3685540914535522, | |
| "step": 2530 | |
| }, | |
| { | |
| "epoch": 0.821010101010101, | |
| "grad_norm": 0.008310632780194283, | |
| "learning_rate": 8.543804859046345e-06, | |
| "loss": 0.37013726234436034, | |
| "step": 2540 | |
| }, | |
| { | |
| "epoch": 0.8242424242424242, | |
| "grad_norm": 0.008878393098711967, | |
| "learning_rate": 8.247377633882463e-06, | |
| "loss": 0.3597676753997803, | |
| "step": 2550 | |
| }, | |
| { | |
| "epoch": 0.8274747474747475, | |
| "grad_norm": 0.008639859966933727, | |
| "learning_rate": 7.95572108771726e-06, | |
| "loss": 0.3774217128753662, | |
| "step": 2560 | |
| }, | |
| { | |
| "epoch": 0.8307070707070707, | |
| "grad_norm": 0.009035849943757057, | |
| "learning_rate": 7.66886854539795e-06, | |
| "loss": 0.3716104984283447, | |
| "step": 2570 | |
| }, | |
| { | |
| "epoch": 0.833939393939394, | |
| "grad_norm": 0.008561176247894764, | |
| "learning_rate": 7.386852782863407e-06, | |
| "loss": 0.3702033042907715, | |
| "step": 2580 | |
| }, | |
| { | |
| "epoch": 0.8371717171717171, | |
| "grad_norm": 0.008522417396306992, | |
| "learning_rate": 7.109706023399232e-06, | |
| "loss": 0.3779261589050293, | |
| "step": 2590 | |
| }, | |
| { | |
| "epoch": 0.8404040404040404, | |
| "grad_norm": 0.00893703568726778, | |
| "learning_rate": 6.837459933955936e-06, | |
| "loss": 0.37835590839385985, | |
| "step": 2600 | |
| }, | |
| { | |
| "epoch": 0.8436363636363636, | |
| "grad_norm": 0.008540214039385319, | |
| "learning_rate": 6.5701456215305656e-06, | |
| "loss": 0.3780329704284668, | |
| "step": 2610 | |
| }, | |
| { | |
| "epoch": 0.8468686868686869, | |
| "grad_norm": 0.008812173269689083, | |
| "learning_rate": 6.307793629612452e-06, | |
| "loss": 0.3683763980865479, | |
| "step": 2620 | |
| }, | |
| { | |
| "epoch": 0.8501010101010101, | |
| "grad_norm": 0.009678703732788563, | |
| "learning_rate": 6.050433934693339e-06, | |
| "loss": 0.37782022953033445, | |
| "step": 2630 | |
| }, | |
| { | |
| "epoch": 0.8533333333333334, | |
| "grad_norm": 0.008860490284860134, | |
| "learning_rate": 5.798095942842141e-06, | |
| "loss": 0.3841053009033203, | |
| "step": 2640 | |
| }, | |
| { | |
| "epoch": 0.8565656565656565, | |
| "grad_norm": 0.009095641784369946, | |
| "learning_rate": 5.550808486345072e-06, | |
| "loss": 0.378291392326355, | |
| "step": 2650 | |
| }, | |
| { | |
| "epoch": 0.8597979797979798, | |
| "grad_norm": 0.008367888629436493, | |
| "learning_rate": 5.308599820411247e-06, | |
| "loss": 0.36860671043396, | |
| "step": 2660 | |
| }, | |
| { | |
| "epoch": 0.863030303030303, | |
| "grad_norm": 0.008273428305983543, | |
| "learning_rate": 5.071497619944171e-06, | |
| "loss": 0.37145724296569826, | |
| "step": 2670 | |
| }, | |
| { | |
| "epoch": 0.8662626262626263, | |
| "grad_norm": 0.00850183516740799, | |
| "learning_rate": 4.839528976379648e-06, | |
| "loss": 0.37532649040222166, | |
| "step": 2680 | |
| }, | |
| { | |
| "epoch": 0.8694949494949495, | |
| "grad_norm": 0.008846405893564224, | |
| "learning_rate": 4.612720394590286e-06, | |
| "loss": 0.3695547580718994, | |
| "step": 2690 | |
| }, | |
| { | |
| "epoch": 0.8727272727272727, | |
| "grad_norm": 0.008502434007823467, | |
| "learning_rate": 4.391097789856985e-06, | |
| "loss": 0.3720081806182861, | |
| "step": 2700 | |
| }, | |
| { | |
| "epoch": 0.8759595959595959, | |
| "grad_norm": 0.00865609385073185, | |
| "learning_rate": 4.174686484907908e-06, | |
| "loss": 0.366014289855957, | |
| "step": 2710 | |
| }, | |
| { | |
| "epoch": 0.8791919191919192, | |
| "grad_norm": 0.008588762022554874, | |
| "learning_rate": 3.963511207025078e-06, | |
| "loss": 0.3735676288604736, | |
| "step": 2720 | |
| }, | |
| { | |
| "epoch": 0.8824242424242424, | |
| "grad_norm": 0.008576790802180767, | |
| "learning_rate": 3.7575960852189728e-06, | |
| "loss": 0.38012237548828126, | |
| "step": 2730 | |
| }, | |
| { | |
| "epoch": 0.8856565656565657, | |
| "grad_norm": 0.00867045484483242, | |
| "learning_rate": 3.5569646474715722e-06, | |
| "loss": 0.3639381885528564, | |
| "step": 2740 | |
| }, | |
| { | |
| "epoch": 0.8888888888888888, | |
| "grad_norm": 0.008396121673285961, | |
| "learning_rate": 3.361639818048068e-06, | |
| "loss": 0.3669224500656128, | |
| "step": 2750 | |
| }, | |
| { | |
| "epoch": 0.8921212121212121, | |
| "grad_norm": 0.008845800533890724, | |
| "learning_rate": 3.1716439148774534e-06, | |
| "loss": 0.37716219425201414, | |
| "step": 2760 | |
| }, | |
| { | |
| "epoch": 0.8953535353535353, | |
| "grad_norm": 0.008693251758813858, | |
| "learning_rate": 2.986998647002498e-06, | |
| "loss": 0.37473173141479493, | |
| "step": 2770 | |
| }, | |
| { | |
| "epoch": 0.8985858585858586, | |
| "grad_norm": 0.008422034792602062, | |
| "learning_rate": 2.8077251120992742e-06, | |
| "loss": 0.36577663421630857, | |
| "step": 2780 | |
| }, | |
| { | |
| "epoch": 0.9018181818181819, | |
| "grad_norm": 0.008648978546261787, | |
| "learning_rate": 2.633843794066515e-06, | |
| "loss": 0.367098331451416, | |
| "step": 2790 | |
| }, | |
| { | |
| "epoch": 0.9050505050505051, | |
| "grad_norm": 0.010963937267661095, | |
| "learning_rate": 2.465374560685091e-06, | |
| "loss": 0.3678403615951538, | |
| "step": 2800 | |
| }, | |
| { | |
| "epoch": 0.9082828282828282, | |
| "grad_norm": 0.009094985201954842, | |
| "learning_rate": 2.302336661347926e-06, | |
| "loss": 0.3675699234008789, | |
| "step": 2810 | |
| }, | |
| { | |
| "epoch": 0.9115151515151515, | |
| "grad_norm": 0.008743363432586193, | |
| "learning_rate": 2.1447487248605513e-06, | |
| "loss": 0.37444562911987306, | |
| "step": 2820 | |
| }, | |
| { | |
| "epoch": 0.9147474747474748, | |
| "grad_norm": 0.00941784493625164, | |
| "learning_rate": 1.9926287573125537e-06, | |
| "loss": 0.3681609630584717, | |
| "step": 2830 | |
| }, | |
| { | |
| "epoch": 0.917979797979798, | |
| "grad_norm": 0.00954042561352253, | |
| "learning_rate": 1.845994140020213e-06, | |
| "loss": 0.38029980659484863, | |
| "step": 2840 | |
| }, | |
| { | |
| "epoch": 0.9212121212121213, | |
| "grad_norm": 0.009196951985359192, | |
| "learning_rate": 1.7048616275404771e-06, | |
| "loss": 0.3789214134216309, | |
| "step": 2850 | |
| }, | |
| { | |
| "epoch": 0.9244444444444444, | |
| "grad_norm": 0.00853448174893856, | |
| "learning_rate": 1.5692473457565748e-06, | |
| "loss": 0.3718825340270996, | |
| "step": 2860 | |
| }, | |
| { | |
| "epoch": 0.9276767676767677, | |
| "grad_norm": 0.00839314702898264, | |
| "learning_rate": 1.439166790035501e-06, | |
| "loss": 0.3590099334716797, | |
| "step": 2870 | |
| }, | |
| { | |
| "epoch": 0.9309090909090909, | |
| "grad_norm": 0.008586350828409195, | |
| "learning_rate": 1.3146348234574724e-06, | |
| "loss": 0.3683621883392334, | |
| "step": 2880 | |
| }, | |
| { | |
| "epoch": 0.9341414141414142, | |
| "grad_norm": 0.008600099012255669, | |
| "learning_rate": 1.1956656751176577e-06, | |
| "loss": 0.37033562660217284, | |
| "step": 2890 | |
| }, | |
| { | |
| "epoch": 0.9373737373737374, | |
| "grad_norm": 0.008418564684689045, | |
| "learning_rate": 1.0822729385003727e-06, | |
| "loss": 0.3722561836242676, | |
| "step": 2900 | |
| }, | |
| { | |
| "epoch": 0.9406060606060606, | |
| "grad_norm": 0.009096021763980389, | |
| "learning_rate": 9.744695699258955e-07, | |
| "loss": 0.3705434799194336, | |
| "step": 2910 | |
| }, | |
| { | |
| "epoch": 0.9438383838383838, | |
| "grad_norm": 0.008366765454411507, | |
| "learning_rate": 8.722678870700274e-07, | |
| "loss": 0.3757180690765381, | |
| "step": 2920 | |
| }, | |
| { | |
| "epoch": 0.9470707070707071, | |
| "grad_norm": 0.00876353308558464, | |
| "learning_rate": 7.756795675566919e-07, | |
| "loss": 0.37983224391937254, | |
| "step": 2930 | |
| }, | |
| { | |
| "epoch": 0.9503030303030303, | |
| "grad_norm": 0.009546471759676933, | |
| "learning_rate": 6.847156476236516e-07, | |
| "loss": 0.3681799411773682, | |
| "step": 2940 | |
| }, | |
| { | |
| "epoch": 0.9535353535353536, | |
| "grad_norm": 0.009170792065560818, | |
| "learning_rate": 5.993865208614835e-07, | |
| "loss": 0.37668063640594485, | |
| "step": 2950 | |
| }, | |
| { | |
| "epoch": 0.9567676767676768, | |
| "grad_norm": 0.009117783978581429, | |
| "learning_rate": 5.197019370260125e-07, | |
| "loss": 0.37759225368499755, | |
| "step": 2960 | |
| }, | |
| { | |
| "epoch": 0.96, | |
| "grad_norm": 0.008532952517271042, | |
| "learning_rate": 4.4567100092429704e-07, | |
| "loss": 0.37395672798156737, | |
| "step": 2970 | |
| }, | |
| { | |
| "epoch": 0.9632323232323232, | |
| "grad_norm": 0.009223098866641521, | |
| "learning_rate": 3.7730217137428857e-07, | |
| "loss": 0.37163801193237306, | |
| "step": 2980 | |
| }, | |
| { | |
| "epoch": 0.9664646464646465, | |
| "grad_norm": 0.008754228241741657, | |
| "learning_rate": 3.1460326023836083e-07, | |
| "loss": 0.36648709774017335, | |
| "step": 2990 | |
| }, | |
| { | |
| "epoch": 0.9696969696969697, | |
| "grad_norm": 0.008591044694185257, | |
| "learning_rate": 2.575814315306846e-07, | |
| "loss": 0.37227301597595214, | |
| "step": 3000 | |
| }, | |
| { | |
| "epoch": 0.972929292929293, | |
| "grad_norm": 0.008041838183999062, | |
| "learning_rate": 2.0624320059869918e-07, | |
| "loss": 0.3628725528717041, | |
| "step": 3010 | |
| }, | |
| { | |
| "epoch": 0.9761616161616161, | |
| "grad_norm": 0.008703021332621574, | |
| "learning_rate": 1.6059443337861912e-07, | |
| "loss": 0.37576377391815186, | |
| "step": 3020 | |
| }, | |
| { | |
| "epoch": 0.9793939393939394, | |
| "grad_norm": 0.008821804076433182, | |
| "learning_rate": 1.2064034572523142e-07, | |
| "loss": 0.3834287405014038, | |
| "step": 3030 | |
| }, | |
| { | |
| "epoch": 0.9826262626262626, | |
| "grad_norm": 0.008591694757342339, | |
| "learning_rate": 8.638550281591107e-08, | |
| "loss": 0.37302587032318113, | |
| "step": 3040 | |
| }, | |
| { | |
| "epoch": 0.9858585858585859, | |
| "grad_norm": 0.008161679841578007, | |
| "learning_rate": 5.7833818629005054e-08, | |
| "loss": 0.37163610458374025, | |
| "step": 3050 | |
| }, | |
| { | |
| "epoch": 0.9890909090909091, | |
| "grad_norm": 0.008312749676406384, | |
| "learning_rate": 3.498855549660118e-08, | |
| "loss": 0.36723101139068604, | |
| "step": 3060 | |
| }, | |
| { | |
| "epoch": 0.9923232323232323, | |
| "grad_norm": 0.00888834148645401, | |
| "learning_rate": 1.785232373180401e-08, | |
| "loss": 0.37539513111114503, | |
| "step": 3070 | |
| }, | |
| { | |
| "epoch": 0.9955555555555555, | |
| "grad_norm": 0.008697266690433025, | |
| "learning_rate": 6.427081330456774e-09, | |
| "loss": 0.36815404891967773, | |
| "step": 3080 | |
| }, | |
| { | |
| "epoch": 0.9987878787878788, | |
| "grad_norm": 0.008382439613342285, | |
| "learning_rate": 7.141337474148025e-10, | |
| "loss": 0.36412370204925537, | |
| "step": 3090 | |
| }, | |
| { | |
| "epoch": 1.0, | |
| "step": 3094, | |
| "total_flos": 4.707818479338652e+18, | |
| "train_loss": 0.3778848514375027, | |
| "train_runtime": 9023.6676, | |
| "train_samples_per_second": 21.942, | |
| "train_steps_per_second": 0.343 | |
| } | |
| ], | |
| "logging_steps": 10, | |
| "max_steps": 3094, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 1, | |
| "save_steps": 200, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 4.707818479338652e+18, | |
| "train_batch_size": 2, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |
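The summary fields that close the log can be cross-checked against each other with a little arithmetic. The following is a minimal sketch, not part of the original training code: the helper name `throughput_summary` is hypothetical, and the numeric inputs are copied from the final log entry above. Reading the resulting multiplier as "number of devices times gradient accumulation steps" is an assumption, since the log does not record either value separately.

```python
# Values copied from the final summary entry of the log above.
MAX_STEPS = 3094                 # "max_steps"
TRAIN_RUNTIME_S = 9023.6676      # "train_runtime" (seconds)
SAMPLES_PER_SECOND = 21.942      # "train_samples_per_second"
PER_DEVICE_BATCH = 2             # "train_batch_size"


def throughput_summary(max_steps, runtime_s, samples_per_s, per_device_batch):
    """Derive step rate and effective batch size from logged totals.

    Hypothetical helper for illustration; it only re-derives numbers
    that are implied by the logged fields.
    """
    steps_per_s = max_steps / runtime_s
    total_samples = samples_per_s * runtime_s
    samples_per_step = total_samples / max_steps
    return {
        "steps_per_second": steps_per_s,
        "total_samples": total_samples,
        "samples_per_step": samples_per_step,
        # Assumption: this ratio is devices x gradient-accumulation steps.
        "batch_multiplier": samples_per_step / per_device_batch,
    }


summary = throughput_summary(
    MAX_STEPS, TRAIN_RUNTIME_S, SAMPLES_PER_SECOND, PER_DEVICE_BATCH
)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Under these inputs the derived step rate rounds to the logged `train_steps_per_second` of 0.343, and the effective batch works out to roughly 64 samples per optimizer step, about 32 times the logged `train_batch_size` of 2.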