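The `learning_rate` values recorded in `log_history` below look like a linear warmup followed by a linear decay. The sketch below reproduces the logged numbers under that reading. The constants `PEAK_LR`, `WARMUP_STEPS`, and `TOTAL_STEPS` are assumptions inferred from the logged values (the file itself does not state them), and the one-step lag in `logged_lr` is likewise an inference from the data, not a documented property of the training run.

```python
import math

# Assumptions inferred from the logged values, not stated in the file:
PEAK_LR = 2e-4       # peak learning rate the warmup appears to climb toward
WARMUP_STEPS = 250   # logged rate stops rising around step 250
TOTAL_STEPS = 2500   # matches "global_step"; the decay would reach zero here


def linear_schedule_lr(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - scheduler_step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


def logged_lr(global_step: int) -> float:
    """Rate as it appears in log_history.

    The logged value seems to trail the step counter by one scheduler
    step (an inference from the data), hence the `- 1`.
    """
    return linear_schedule_lr(global_step - 1)


if __name__ == "__main__":
    # Compare a few steps against the "learning_rate" fields in the log.
    for step in (10, 130, 250, 260, 270, 1380):
        print(step, logged_lr(step))
```

If the assumptions hold, each printed value should agree with the corresponding `learning_rate` entry up to floating-point rounding; a mismatch at later steps would suggest a different schedule or total step count.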
| { | |
| "best_global_step": null, | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 0.39603960396039606, | |
| "eval_steps": 500, | |
| "global_step": 2500, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.0015841584158415843, | |
| "grad_norm": 0.06556262075901031, | |
| "learning_rate": 7.2e-06, | |
| "loss": 0.6151810169219971, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.0031683168316831685, | |
| "grad_norm": 0.06630237400531769, | |
| "learning_rate": 1.52e-05, | |
| "loss": 0.5479158401489258, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.004752475247524752, | |
| "grad_norm": 0.08819983899593353, | |
| "learning_rate": 2.32e-05, | |
| "loss": 0.580345344543457, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.006336633663366337, | |
| "grad_norm": 0.07164224237203598, | |
| "learning_rate": 3.12e-05, | |
| "loss": 0.5383748054504395, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.007920792079207921, | |
| "grad_norm": 0.08992987126111984, | |
| "learning_rate": 3.9200000000000004e-05, | |
| "loss": 0.5134584426879882, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.009504950495049505, | |
| "grad_norm": 0.09747444093227386, | |
| "learning_rate": 4.72e-05, | |
| "loss": 0.52029390335083, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.011089108910891089, | |
| "grad_norm": 0.11321567744016647, | |
| "learning_rate": 5.520000000000001e-05, | |
| "loss": 0.4892634391784668, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.012673267326732674, | |
| "grad_norm": 0.08909470587968826, | |
| "learning_rate": 6.32e-05, | |
| "loss": 0.4739553928375244, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.014257425742574258, | |
| "grad_norm": 0.12608297169208527, | |
| "learning_rate": 7.12e-05, | |
| "loss": 0.4961063385009766, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.015841584158415842, | |
| "grad_norm": 0.11412779241800308, | |
| "learning_rate": 7.920000000000001e-05, | |
| "loss": 0.4843149662017822, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.017425742574257427, | |
| "grad_norm": 0.12621361017227173, | |
| "learning_rate": 8.72e-05, | |
| "loss": 0.48389220237731934, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.01900990099009901, | |
| "grad_norm": 0.12417051941156387, | |
| "learning_rate": 9.52e-05, | |
| "loss": 0.4838583946228027, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.020594059405940595, | |
| "grad_norm": 0.14021746814250946, | |
| "learning_rate": 0.0001032, | |
| "loss": 0.45241260528564453, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.022178217821782177, | |
| "grad_norm": 0.09451174736022949, | |
| "learning_rate": 0.00011120000000000002, | |
| "loss": 0.44155592918395997, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.023762376237623763, | |
| "grad_norm": 0.12511184811592102, | |
| "learning_rate": 0.0001192, | |
| "loss": 0.47519407272338865, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.025346534653465348, | |
| "grad_norm": 0.13305315375328064, | |
| "learning_rate": 0.0001272, | |
| "loss": 0.4809138298034668, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.02693069306930693, | |
| "grad_norm": 0.11137474328279495, | |
| "learning_rate": 0.0001352, | |
| "loss": 0.4950218677520752, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 0.028514851485148516, | |
| "grad_norm": 0.14295189082622528, | |
| "learning_rate": 0.0001432, | |
| "loss": 0.44295687675476075, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 0.030099009900990098, | |
| "grad_norm": 0.11107228696346283, | |
| "learning_rate": 0.00015120000000000002, | |
| "loss": 0.5302713871002197, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 0.031683168316831684, | |
| "grad_norm": 0.11264927685260773, | |
| "learning_rate": 0.00015920000000000002, | |
| "loss": 0.4833076000213623, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.03326732673267327, | |
| "grad_norm": 0.1116105169057846, | |
| "learning_rate": 0.0001672, | |
| "loss": 0.47181167602539065, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 0.034851485148514855, | |
| "grad_norm": 0.12198604643344879, | |
| "learning_rate": 0.0001752, | |
| "loss": 0.45895776748657224, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 0.03643564356435643, | |
| "grad_norm": 0.09375844895839691, | |
| "learning_rate": 0.0001832, | |
| "loss": 0.46950302124023435, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 0.03801980198019802, | |
| "grad_norm": 0.12337016314268112, | |
| "learning_rate": 0.0001912, | |
| "loss": 0.5031816482543945, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 0.039603960396039604, | |
| "grad_norm": 0.1063649132847786, | |
| "learning_rate": 0.00019920000000000002, | |
| "loss": 0.4671049118041992, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.04118811881188119, | |
| "grad_norm": 0.09282703697681427, | |
| "learning_rate": 0.00019920000000000002, | |
| "loss": 0.4365957260131836, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 0.042772277227722776, | |
| "grad_norm": 0.11267738789319992, | |
| "learning_rate": 0.00019831111111111112, | |
| "loss": 0.480745267868042, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 0.044356435643564354, | |
| "grad_norm": 0.1347280740737915, | |
| "learning_rate": 0.00019742222222222225, | |
| "loss": 0.46505031585693357, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 0.04594059405940594, | |
| "grad_norm": 0.10801614820957184, | |
| "learning_rate": 0.00019653333333333336, | |
| "loss": 0.46571884155273435, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 0.047524752475247525, | |
| "grad_norm": 0.12413369119167328, | |
| "learning_rate": 0.00019564444444444446, | |
| "loss": 0.4441887378692627, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.04910891089108911, | |
| "grad_norm": 0.0879567414522171, | |
| "learning_rate": 0.00019475555555555557, | |
| "loss": 0.43287091255187987, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 0.050693069306930696, | |
| "grad_norm": 0.09971684217453003, | |
| "learning_rate": 0.0001938666666666667, | |
| "loss": 0.45388994216918943, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 0.052277227722772275, | |
| "grad_norm": 0.09890090674161911, | |
| "learning_rate": 0.0001929777777777778, | |
| "loss": 0.42935981750488283, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 0.05386138613861386, | |
| "grad_norm": 0.09626103192567825, | |
| "learning_rate": 0.0001920888888888889, | |
| "loss": 0.4544685363769531, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 0.055445544554455446, | |
| "grad_norm": 0.09048525989055634, | |
| "learning_rate": 0.0001912, | |
| "loss": 0.4353527069091797, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.05702970297029703, | |
| "grad_norm": 0.1372356116771698, | |
| "learning_rate": 0.00019031111111111112, | |
| "loss": 0.4675909519195557, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 0.05861386138613861, | |
| "grad_norm": 0.11068324744701385, | |
| "learning_rate": 0.00018942222222222222, | |
| "loss": 0.4603554725646973, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 0.060198019801980196, | |
| "grad_norm": 0.10414744913578033, | |
| "learning_rate": 0.00018853333333333333, | |
| "loss": 0.45653133392333983, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 0.06178217821782178, | |
| "grad_norm": 0.15728670358657837, | |
| "learning_rate": 0.00018764444444444446, | |
| "loss": 0.46082301139831544, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 0.06336633663366337, | |
| "grad_norm": 0.14269177615642548, | |
| "learning_rate": 0.00018675555555555556, | |
| "loss": 0.47365808486938477, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.06495049504950495, | |
| "grad_norm": 0.08951593935489655, | |
| "learning_rate": 0.00018586666666666667, | |
| "loss": 0.4434823036193848, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 0.06653465346534654, | |
| "grad_norm": 0.1097274124622345, | |
| "learning_rate": 0.00018497777777777777, | |
| "loss": 0.4335814952850342, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 0.06811881188118812, | |
| "grad_norm": 0.14073720574378967, | |
| "learning_rate": 0.00018408888888888888, | |
| "loss": 0.42411150932312014, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 0.06970297029702971, | |
| "grad_norm": 0.12873341143131256, | |
| "learning_rate": 0.0001832, | |
| "loss": 0.4766682624816895, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 0.07128712871287128, | |
| "grad_norm": 0.07945746928453445, | |
| "learning_rate": 0.0001823111111111111, | |
| "loss": 0.41811428070068357, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.07287128712871287, | |
| "grad_norm": 0.13228054344654083, | |
| "learning_rate": 0.00018142222222222222, | |
| "loss": 0.466593599319458, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 0.07445544554455445, | |
| "grad_norm": 0.12058842182159424, | |
| "learning_rate": 0.00018053333333333332, | |
| "loss": 0.4782561302185059, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 0.07603960396039604, | |
| "grad_norm": 0.10949750244617462, | |
| "learning_rate": 0.00017964444444444445, | |
| "loss": 0.4511709213256836, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 0.07762376237623762, | |
| "grad_norm": 0.133078470826149, | |
| "learning_rate": 0.00017875555555555556, | |
| "loss": 0.4502392292022705, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 0.07920792079207921, | |
| "grad_norm": 0.09396151453256607, | |
| "learning_rate": 0.00017786666666666666, | |
| "loss": 0.44018964767456054, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.0807920792079208, | |
| "grad_norm": 0.1271175742149353, | |
| "learning_rate": 0.00017697777777777777, | |
| "loss": 0.4549531936645508, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 0.08237623762376238, | |
| "grad_norm": 0.13191580772399902, | |
| "learning_rate": 0.0001760888888888889, | |
| "loss": 0.4755974769592285, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 0.08396039603960397, | |
| "grad_norm": 0.10729491710662842, | |
| "learning_rate": 0.0001752, | |
| "loss": 0.41382646560668945, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 0.08554455445544555, | |
| "grad_norm": 0.08870874345302582, | |
| "learning_rate": 0.0001743111111111111, | |
| "loss": 0.4281641483306885, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 0.08712871287128712, | |
| "grad_norm": 0.1239466741681099, | |
| "learning_rate": 0.00017342222222222224, | |
| "loss": 0.4584688186645508, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 0.08871287128712871, | |
| "grad_norm": 0.11894556134939194, | |
| "learning_rate": 0.00017253333333333334, | |
| "loss": 0.46378793716430666, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 0.0902970297029703, | |
| "grad_norm": 0.11373710632324219, | |
| "learning_rate": 0.00017164444444444445, | |
| "loss": 0.49725875854492185, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 0.09188118811881188, | |
| "grad_norm": 0.10424434393644333, | |
| "learning_rate": 0.00017075555555555555, | |
| "loss": 0.45825581550598143, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 0.09346534653465347, | |
| "grad_norm": 0.11420601606369019, | |
| "learning_rate": 0.00016986666666666668, | |
| "loss": 0.45974411964416506, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 0.09504950495049505, | |
| "grad_norm": 0.08729609847068787, | |
| "learning_rate": 0.0001689777777777778, | |
| "loss": 0.41493749618530273, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.09663366336633664, | |
| "grad_norm": 0.12459246814250946, | |
| "learning_rate": 0.0001680888888888889, | |
| "loss": 0.45732903480529785, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 0.09821782178217822, | |
| "grad_norm": 0.11139161139726639, | |
| "learning_rate": 0.0001672, | |
| "loss": 0.452393913269043, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 0.09980198019801981, | |
| "grad_norm": 0.13227005302906036, | |
| "learning_rate": 0.00016631111111111113, | |
| "loss": 0.4697711944580078, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 0.10138613861386139, | |
| "grad_norm": 0.11359205096960068, | |
| "learning_rate": 0.00016542222222222223, | |
| "loss": 0.44234395027160645, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 0.10297029702970296, | |
| "grad_norm": 0.10215561091899872, | |
| "learning_rate": 0.00016453333333333334, | |
| "loss": 0.4134825706481934, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 0.10455445544554455, | |
| "grad_norm": 0.10554394870996475, | |
| "learning_rate": 0.00016364444444444444, | |
| "loss": 0.4320853233337402, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 0.10613861386138614, | |
| "grad_norm": 0.10074356943368912, | |
| "learning_rate": 0.00016275555555555558, | |
| "loss": 0.47079954147338865, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 0.10772277227722772, | |
| "grad_norm": 0.1219848170876503, | |
| "learning_rate": 0.00016186666666666668, | |
| "loss": 0.44622125625610354, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 0.1093069306930693, | |
| "grad_norm": 0.10316894948482513, | |
| "learning_rate": 0.00016097777777777778, | |
| "loss": 0.4087726593017578, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 0.11089108910891089, | |
| "grad_norm": 0.09130258858203888, | |
| "learning_rate": 0.0001600888888888889, | |
| "loss": 0.4572176933288574, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.11247524752475248, | |
| "grad_norm": 0.13283619284629822, | |
| "learning_rate": 0.00015920000000000002, | |
| "loss": 0.44556303024291993, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 0.11405940594059406, | |
| "grad_norm": 0.09566845744848251, | |
| "learning_rate": 0.00015831111111111113, | |
| "loss": 0.4299760818481445, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 0.11564356435643565, | |
| "grad_norm": 0.09357430040836334, | |
| "learning_rate": 0.00015742222222222223, | |
| "loss": 0.4479306697845459, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 0.11722772277227722, | |
| "grad_norm": 0.08751889318227768, | |
| "learning_rate": 0.00015653333333333333, | |
| "loss": 0.42789626121520996, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 0.1188118811881188, | |
| "grad_norm": 0.12049714475870132, | |
| "learning_rate": 0.00015564444444444447, | |
| "loss": 0.47111997604370115, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 0.12039603960396039, | |
| "grad_norm": 0.10753843188285828, | |
| "learning_rate": 0.00015475555555555557, | |
| "loss": 0.42185111045837403, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 0.12198019801980198, | |
| "grad_norm": 0.1295730322599411, | |
| "learning_rate": 0.00015386666666666668, | |
| "loss": 0.4739703178405762, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 0.12356435643564356, | |
| "grad_norm": 0.08475282788276672, | |
| "learning_rate": 0.00015297777777777778, | |
| "loss": 0.4738037586212158, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 0.12514851485148515, | |
| "grad_norm": 0.10000675916671753, | |
| "learning_rate": 0.0001520888888888889, | |
| "loss": 0.44550237655639646, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 0.12673267326732673, | |
| "grad_norm": 0.10124850273132324, | |
| "learning_rate": 0.00015120000000000002, | |
| "loss": 0.41213264465332033, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.12831683168316832, | |
| "grad_norm": 0.09979727119207382, | |
| "learning_rate": 0.00015031111111111112, | |
| "loss": 0.4510068893432617, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 0.1299009900990099, | |
| "grad_norm": 0.10252496600151062, | |
| "learning_rate": 0.00014942222222222223, | |
| "loss": 0.4421220302581787, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 0.1314851485148515, | |
| "grad_norm": 0.11230350285768509, | |
| "learning_rate": 0.00014853333333333336, | |
| "loss": 0.4216045379638672, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 0.13306930693069308, | |
| "grad_norm": 0.10745341330766678, | |
| "learning_rate": 0.00014764444444444446, | |
| "loss": 0.45771260261535646, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 0.13465346534653466, | |
| "grad_norm": 0.10362319648265839, | |
| "learning_rate": 0.00014675555555555557, | |
| "loss": 0.456635570526123, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 0.13623762376237625, | |
| "grad_norm": 0.10825644433498383, | |
| "learning_rate": 0.00014586666666666667, | |
| "loss": 0.4578948974609375, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 0.13782178217821783, | |
| "grad_norm": 0.09999847412109375, | |
| "learning_rate": 0.0001449777777777778, | |
| "loss": 0.4257713794708252, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 0.13940594059405942, | |
| "grad_norm": 0.09439483284950256, | |
| "learning_rate": 0.0001440888888888889, | |
| "loss": 0.4488701820373535, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 0.14099009900990098, | |
| "grad_norm": 0.10177771002054214, | |
| "learning_rate": 0.0001432, | |
| "loss": 0.4768357276916504, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 0.14257425742574256, | |
| "grad_norm": 0.09642136842012405, | |
| "learning_rate": 0.00014231111111111112, | |
| "loss": 0.4253392696380615, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.14415841584158415, | |
| "grad_norm": 0.09430525451898575, | |
| "learning_rate": 0.00014142222222222222, | |
| "loss": 0.43414530754089353, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 0.14574257425742573, | |
| "grad_norm": 0.11593130230903625, | |
| "learning_rate": 0.00014053333333333335, | |
| "loss": 0.4248401165008545, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 0.14732673267326732, | |
| "grad_norm": 0.11584466695785522, | |
| "learning_rate": 0.00013964444444444446, | |
| "loss": 0.4541325092315674, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 0.1489108910891089, | |
| "grad_norm": 0.09682377427816391, | |
| "learning_rate": 0.00013875555555555556, | |
| "loss": 0.44811625480651857, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 0.1504950495049505, | |
| "grad_norm": 0.12739314138889313, | |
| "learning_rate": 0.00013786666666666667, | |
| "loss": 0.4510763168334961, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 0.15207920792079208, | |
| "grad_norm": 0.11477553099393845, | |
| "learning_rate": 0.00013697777777777777, | |
| "loss": 0.4399250507354736, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 0.15366336633663366, | |
| "grad_norm": 0.11998990923166275, | |
| "learning_rate": 0.00013608888888888887, | |
| "loss": 0.4643832206726074, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 0.15524752475247525, | |
| "grad_norm": 0.12250885367393494, | |
| "learning_rate": 0.0001352, | |
| "loss": 0.4670434474945068, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 0.15683168316831683, | |
| "grad_norm": 0.10394606739282608, | |
| "learning_rate": 0.0001343111111111111, | |
| "loss": 0.41683096885681153, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 0.15841584158415842, | |
| "grad_norm": 0.11151418834924698, | |
| "learning_rate": 0.00013342222222222222, | |
| "loss": 0.4357429504394531, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 0.16, | |
| "grad_norm": 0.1484747976064682, | |
| "learning_rate": 0.00013253333333333332, | |
| "loss": 0.4269531726837158, | |
| "step": 1010 | |
| }, | |
| { | |
| "epoch": 0.1615841584158416, | |
| "grad_norm": 0.11201906949281693, | |
| "learning_rate": 0.00013164444444444445, | |
| "loss": 0.4297961711883545, | |
| "step": 1020 | |
| }, | |
| { | |
| "epoch": 0.16316831683168317, | |
| "grad_norm": 0.11010719835758209, | |
| "learning_rate": 0.00013075555555555556, | |
| "loss": 0.41904025077819823, | |
| "step": 1030 | |
| }, | |
| { | |
| "epoch": 0.16475247524752476, | |
| "grad_norm": 0.10168910026550293, | |
| "learning_rate": 0.00012986666666666666, | |
| "loss": 0.46724610328674315, | |
| "step": 1040 | |
| }, | |
| { | |
| "epoch": 0.16633663366336635, | |
| "grad_norm": 0.11108486354351044, | |
| "learning_rate": 0.00012897777777777777, | |
| "loss": 0.41109704971313477, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 0.16792079207920793, | |
| "grad_norm": 0.1291012018918991, | |
| "learning_rate": 0.0001280888888888889, | |
| "loss": 0.44829635620117186, | |
| "step": 1060 | |
| }, | |
| { | |
| "epoch": 0.16950495049504952, | |
| "grad_norm": 0.11215164512395859, | |
| "learning_rate": 0.0001272, | |
| "loss": 0.4662069797515869, | |
| "step": 1070 | |
| }, | |
| { | |
| "epoch": 0.1710891089108911, | |
| "grad_norm": 0.13233599066734314, | |
| "learning_rate": 0.0001263111111111111, | |
| "loss": 0.4491884708404541, | |
| "step": 1080 | |
| }, | |
| { | |
| "epoch": 0.17267326732673266, | |
| "grad_norm": 0.08990936726331711, | |
| "learning_rate": 0.0001254222222222222, | |
| "loss": 0.431490421295166, | |
| "step": 1090 | |
| }, | |
| { | |
| "epoch": 0.17425742574257425, | |
| "grad_norm": 0.10440412163734436, | |
| "learning_rate": 0.00012453333333333334, | |
| "loss": 0.3993945598602295, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 0.17584158415841583, | |
| "grad_norm": 0.11035147309303284, | |
| "learning_rate": 0.00012364444444444445, | |
| "loss": 0.456577730178833, | |
| "step": 1110 | |
| }, | |
| { | |
| "epoch": 0.17742574257425742, | |
| "grad_norm": 0.11196247488260269, | |
| "learning_rate": 0.00012275555555555555, | |
| "loss": 0.4181276798248291, | |
| "step": 1120 | |
| }, | |
| { | |
| "epoch": 0.179009900990099, | |
| "grad_norm": 0.10106303542852402, | |
| "learning_rate": 0.00012186666666666666, | |
| "loss": 0.4272180080413818, | |
| "step": 1130 | |
| }, | |
| { | |
| "epoch": 0.1805940594059406, | |
| "grad_norm": 0.11019843071699142, | |
| "learning_rate": 0.00012097777777777779, | |
| "loss": 0.44555273056030276, | |
| "step": 1140 | |
| }, | |
| { | |
| "epoch": 0.18217821782178217, | |
| "grad_norm": 0.09329156577587128, | |
| "learning_rate": 0.00012008888888888889, | |
| "loss": 0.42681331634521485, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 0.18376237623762376, | |
| "grad_norm": 0.08857206255197525, | |
| "learning_rate": 0.0001192, | |
| "loss": 0.4692983627319336, | |
| "step": 1160 | |
| }, | |
| { | |
| "epoch": 0.18534653465346534, | |
| "grad_norm": 0.11052225530147552, | |
| "learning_rate": 0.0001183111111111111, | |
| "loss": 0.44810261726379397, | |
| "step": 1170 | |
| }, | |
| { | |
| "epoch": 0.18693069306930693, | |
| "grad_norm": 0.10589273273944855, | |
| "learning_rate": 0.00011742222222222223, | |
| "loss": 0.43929290771484375, | |
| "step": 1180 | |
| }, | |
| { | |
| "epoch": 0.18851485148514852, | |
| "grad_norm": 0.12494352459907532, | |
| "learning_rate": 0.00011653333333333334, | |
| "loss": 0.48512043952941897, | |
| "step": 1190 | |
| }, | |
| { | |
| "epoch": 0.1900990099009901, | |
| "grad_norm": 0.10260408371686935, | |
| "learning_rate": 0.00011564444444444444, | |
| "loss": 0.4629175662994385, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 0.1916831683168317, | |
| "grad_norm": 0.12947669625282288, | |
| "learning_rate": 0.00011475555555555557, | |
| "loss": 0.43849620819091795, | |
| "step": 1210 | |
| }, | |
| { | |
| "epoch": 0.19326732673267327, | |
| "grad_norm": 0.10582385957241058, | |
| "learning_rate": 0.00011386666666666668, | |
| "loss": 0.4508364677429199, | |
| "step": 1220 | |
| }, | |
| { | |
| "epoch": 0.19485148514851486, | |
| "grad_norm": 0.12441077828407288, | |
| "learning_rate": 0.00011297777777777778, | |
| "loss": 0.42998151779174804, | |
| "step": 1230 | |
| }, | |
| { | |
| "epoch": 0.19643564356435644, | |
| "grad_norm": 0.09037347137928009, | |
| "learning_rate": 0.00011208888888888889, | |
| "loss": 0.4441089630126953, | |
| "step": 1240 | |
| }, | |
| { | |
| "epoch": 0.19801980198019803, | |
| "grad_norm": 0.1148349717259407, | |
| "learning_rate": 0.00011120000000000002, | |
| "loss": 0.47240777015686036, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 0.19960396039603961, | |
| "grad_norm": 0.1014682874083519, | |
| "learning_rate": 0.00011031111111111112, | |
| "loss": 0.418576717376709, | |
| "step": 1260 | |
| }, | |
| { | |
| "epoch": 0.2011881188118812, | |
| "grad_norm": 0.11128360033035278, | |
| "learning_rate": 0.00010942222222222223, | |
| "loss": 0.43076472282409667, | |
| "step": 1270 | |
| }, | |
| { | |
| "epoch": 0.20277227722772279, | |
| "grad_norm": 0.11667651683092117, | |
| "learning_rate": 0.00010853333333333333, | |
| "loss": 0.44633755683898924, | |
| "step": 1280 | |
| }, | |
| { | |
| "epoch": 0.20435643564356434, | |
| "grad_norm": 0.1285824179649353, | |
| "learning_rate": 0.00010764444444444446, | |
| "loss": 0.4465335845947266, | |
| "step": 1290 | |
| }, | |
| { | |
| "epoch": 0.20594059405940593, | |
| "grad_norm": 0.1088799238204956, | |
| "learning_rate": 0.00010675555555555557, | |
| "loss": 0.44507641792297364, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 0.20752475247524751, | |
| "grad_norm": 0.12076769769191742, | |
| "learning_rate": 0.00010586666666666667, | |
| "loss": 0.42668471336364744, | |
| "step": 1310 | |
| }, | |
| { | |
| "epoch": 0.2091089108910891, | |
| "grad_norm": 0.13205377757549286, | |
| "learning_rate": 0.00010497777777777778, | |
| "loss": 0.41853861808776854, | |
| "step": 1320 | |
| }, | |
| { | |
| "epoch": 0.21069306930693069, | |
| "grad_norm": 0.11711034923791885, | |
| "learning_rate": 0.0001040888888888889, | |
| "loss": 0.4606321334838867, | |
| "step": 1330 | |
| }, | |
| { | |
| "epoch": 0.21227722772277227, | |
| "grad_norm": 0.0950397327542305, | |
| "learning_rate": 0.0001032, | |
| "loss": 0.4588432788848877, | |
| "step": 1340 | |
| }, | |
| { | |
| "epoch": 0.21386138613861386, | |
| "grad_norm": 0.09417828172445297, | |
| "learning_rate": 0.00010231111111111112, | |
| "loss": 0.45938754081726074, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 0.21544554455445544, | |
| "grad_norm": 0.1291818916797638, | |
| "learning_rate": 0.00010142222222222222, | |
| "loss": 0.4537965774536133, | |
| "step": 1360 | |
| }, | |
| { | |
| "epoch": 0.21702970297029703, | |
| "grad_norm": 0.11345808953046799, | |
| "learning_rate": 0.00010053333333333334, | |
| "loss": 0.4731899261474609, | |
| "step": 1370 | |
| }, | |
| { | |
| "epoch": 0.2186138613861386, | |
| "grad_norm": 0.11020190268754959, | |
| "learning_rate": 9.964444444444445e-05, | |
| "loss": 0.4215576171875, | |
| "step": 1380 | |
| }, | |
| { | |
| "epoch": 0.2201980198019802, | |
| "grad_norm": 0.10281681269407272, | |
| "learning_rate": 9.875555555555555e-05, | |
| "loss": 0.45673704147338867, | |
| "step": 1390 | |
| }, | |
| { | |
| "epoch": 0.22178217821782178, | |
| "grad_norm": 0.11533461511135101, | |
| "learning_rate": 9.786666666666667e-05, | |
| "loss": 0.43448405265808104, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 0.22336633663366337, | |
| "grad_norm": 0.10428951680660248, | |
| "learning_rate": 9.697777777777777e-05, | |
| "loss": 0.42266035079956055, | |
| "step": 1410 | |
| }, | |
| { | |
| "epoch": 0.22495049504950496, | |
| "grad_norm": 0.11180785298347473, | |
| "learning_rate": 9.608888888888889e-05, | |
| "loss": 0.43655991554260254, | |
| "step": 1420 | |
| }, | |
| { | |
| "epoch": 0.22653465346534654, | |
| "grad_norm": 0.14148098230361938, | |
| "learning_rate": 9.52e-05, | |
| "loss": 0.45973858833312986, | |
| "step": 1430 | |
| }, | |
| { | |
| "epoch": 0.22811881188118813, | |
| "grad_norm": 0.10056508332490921, | |
| "learning_rate": 9.431111111111111e-05, | |
| "loss": 0.4729654312133789, | |
| "step": 1440 | |
| }, | |
| { | |
| "epoch": 0.2297029702970297, | |
| "grad_norm": 0.12625491619110107, | |
| "learning_rate": 9.342222222222222e-05, | |
| "loss": 0.4501173496246338, | |
| "step": 1450 | |
| }, | |
| { | |
| "epoch": 0.2312871287128713, | |
| "grad_norm": 0.13399824500083923, | |
| "learning_rate": 9.253333333333334e-05, | |
| "loss": 0.4454296588897705, | |
| "step": 1460 | |
| }, | |
| { | |
| "epoch": 0.23287128712871288, | |
| "grad_norm": 0.10759555548429489, | |
| "learning_rate": 9.164444444444444e-05, | |
| "loss": 0.4457117557525635, | |
| "step": 1470 | |
| }, | |
| { | |
| "epoch": 0.23445544554455444, | |
| "grad_norm": 0.11816436797380447, | |
| "learning_rate": 9.075555555555556e-05, | |
| "loss": 0.43582868576049805, | |
| "step": 1480 | |
| }, | |
| { | |
| "epoch": 0.23603960396039603, | |
| "grad_norm": 0.12996898591518402, | |
| "learning_rate": 8.986666666666666e-05, | |
| "loss": 0.4595947265625, | |
| "step": 1490 | |
| }, | |
| { | |
| "epoch": 0.2376237623762376, | |
| "grad_norm": 0.12041634321212769, | |
| "learning_rate": 8.897777777777778e-05, | |
| "loss": 0.4592463493347168, | |
| "step": 1500 | |
| }, | |
| { | |
| "epoch": 0.2392079207920792, | |
| "grad_norm": 0.09746157377958298, | |
| "learning_rate": 8.80888888888889e-05, | |
| "loss": 0.4601451873779297, | |
| "step": 1510 | |
| }, | |
| { | |
| "epoch": 0.24079207920792078, | |
| "grad_norm": 0.13244478404521942, | |
| "learning_rate": 8.72e-05, | |
| "loss": 0.4243985652923584, | |
| "step": 1520 | |
| }, | |
| { | |
| "epoch": 0.24237623762376237, | |
| "grad_norm": 0.11454407870769501, | |
| "learning_rate": 8.631111111111112e-05, | |
| "loss": 0.4436774730682373, | |
| "step": 1530 | |
| }, | |
| { | |
| "epoch": 0.24396039603960396, | |
| "grad_norm": 0.10578440874814987, | |
| "learning_rate": 8.542222222222223e-05, | |
| "loss": 0.42084641456604005, | |
| "step": 1540 | |
| }, | |
| { | |
| "epoch": 0.24554455445544554, | |
| "grad_norm": 0.12399782985448837, | |
| "learning_rate": 8.453333333333335e-05, | |
| "loss": 0.4574925422668457, | |
| "step": 1550 | |
| }, | |
| { | |
| "epoch": 0.24712871287128713, | |
| "grad_norm": 0.1136360839009285, | |
| "learning_rate": 8.364444444444445e-05, | |
| "loss": 0.4346503257751465, | |
| "step": 1560 | |
| }, | |
| { | |
| "epoch": 0.2487128712871287, | |
| "grad_norm": 0.1318485289812088, | |
| "learning_rate": 8.275555555555557e-05, | |
| "loss": 0.43329200744628904, | |
| "step": 1570 | |
| }, | |
| { | |
| "epoch": 0.2502970297029703, | |
| "grad_norm": 0.11364690959453583, | |
| "learning_rate": 8.186666666666667e-05, | |
| "loss": 0.4125970840454102, | |
| "step": 1580 | |
| }, | |
| { | |
| "epoch": 0.2518811881188119, | |
| "grad_norm": 0.10456566512584686, | |
| "learning_rate": 8.097777777777779e-05, | |
| "loss": 0.4665355682373047, | |
| "step": 1590 | |
| }, | |
| { | |
| "epoch": 0.25346534653465347, | |
| "grad_norm": 0.08970664441585541, | |
| "learning_rate": 8.00888888888889e-05, | |
| "loss": 0.5053329944610596, | |
| "step": 1600 | |
| }, | |
| { | |
| "epoch": 0.25504950495049505, | |
| "grad_norm": 0.1372910887002945, | |
| "learning_rate": 7.920000000000001e-05, | |
| "loss": 0.42962069511413575, | |
| "step": 1610 | |
| }, | |
| { | |
| "epoch": 0.25663366336633664, | |
| "grad_norm": 0.12862013280391693, | |
| "learning_rate": 7.831111111111112e-05, | |
| "loss": 0.4417405128479004, | |
| "step": 1620 | |
| }, | |
| { | |
| "epoch": 0.2582178217821782, | |
| "grad_norm": 0.1060621365904808, | |
| "learning_rate": 7.742222222222222e-05, | |
| "loss": 0.4423251152038574, | |
| "step": 1630 | |
| }, | |
| { | |
| "epoch": 0.2598019801980198, | |
| "grad_norm": 0.11200203001499176, | |
| "learning_rate": 7.653333333333333e-05, | |
| "loss": 0.4621281623840332, | |
| "step": 1640 | |
| }, | |
| { | |
| "epoch": 0.2613861386138614, | |
| "grad_norm": 0.11022822558879852, | |
| "learning_rate": 7.564444444444445e-05, | |
| "loss": 0.4474879264831543, | |
| "step": 1650 | |
| }, | |
| { | |
| "epoch": 0.262970297029703, | |
| "grad_norm": 0.10621003806591034, | |
| "learning_rate": 7.475555555555555e-05, | |
| "loss": 0.42008557319641116, | |
| "step": 1660 | |
| }, | |
| { | |
| "epoch": 0.26455445544554457, | |
| "grad_norm": 0.11836650967597961, | |
| "learning_rate": 7.386666666666667e-05, | |
| "loss": 0.43319091796875, | |
| "step": 1670 | |
| }, | |
| { | |
| "epoch": 0.26613861386138615, | |
| "grad_norm": 0.1123187392950058, | |
| "learning_rate": 7.297777777777777e-05, | |
| "loss": 0.4296769618988037, | |
| "step": 1680 | |
| }, | |
| { | |
| "epoch": 0.26772277227722774, | |
| "grad_norm": 0.10100077092647552, | |
| "learning_rate": 7.208888888888889e-05, | |
| "loss": 0.41154913902282714, | |
| "step": 1690 | |
| }, | |
| { | |
| "epoch": 0.2693069306930693, | |
| "grad_norm": 0.1045333743095398, | |
| "learning_rate": 7.12e-05, | |
| "loss": 0.4219111442565918, | |
| "step": 1700 | |
| }, | |
| { | |
| "epoch": 0.2708910891089109, | |
| "grad_norm": 0.13197870552539825, | |
| "learning_rate": 7.031111111111111e-05, | |
| "loss": 0.43259439468383787, | |
| "step": 1710 | |
| }, | |
| { | |
| "epoch": 0.2724752475247525, | |
| "grad_norm": 0.14993301033973694, | |
| "learning_rate": 6.942222222222222e-05, | |
| "loss": 0.4652869701385498, | |
| "step": 1720 | |
| }, | |
| { | |
| "epoch": 0.2740594059405941, | |
| "grad_norm": 0.10407901555299759, | |
| "learning_rate": 6.853333333333334e-05, | |
| "loss": 0.4714209079742432, | |
| "step": 1730 | |
| }, | |
| { | |
| "epoch": 0.27564356435643567, | |
| "grad_norm": 0.10922378301620483, | |
| "learning_rate": 6.764444444444444e-05, | |
| "loss": 0.4610575199127197, | |
| "step": 1740 | |
| }, | |
| { | |
| "epoch": 0.27722772277227725, | |
| "grad_norm": 0.1403568983078003, | |
| "learning_rate": 6.675555555555556e-05, | |
| "loss": 0.41899795532226564, | |
| "step": 1750 | |
| }, | |
| { | |
| "epoch": 0.27881188118811884, | |
| "grad_norm": 0.10836900025606155, | |
| "learning_rate": 6.586666666666666e-05, | |
| "loss": 0.4317145824432373, | |
| "step": 1760 | |
| }, | |
| { | |
| "epoch": 0.2803960396039604, | |
| "grad_norm": 0.1111619770526886, | |
| "learning_rate": 6.497777777777778e-05, | |
| "loss": 0.4658851146697998, | |
| "step": 1770 | |
| }, | |
| { | |
| "epoch": 0.28198019801980195, | |
| "grad_norm": 0.12308915704488754, | |
| "learning_rate": 6.408888888888889e-05, | |
| "loss": 0.4481384754180908, | |
| "step": 1780 | |
| }, | |
| { | |
| "epoch": 0.28356435643564354, | |
| "grad_norm": 0.12358427047729492, | |
| "learning_rate": 6.32e-05, | |
| "loss": 0.40901408195495603, | |
| "step": 1790 | |
| }, | |
| { | |
| "epoch": 0.2851485148514851, | |
| "grad_norm": 0.10029692202806473, | |
| "learning_rate": 6.231111111111111e-05, | |
| "loss": 0.40301804542541503, | |
| "step": 1800 | |
| }, | |
| { | |
| "epoch": 0.2867326732673267, | |
| "grad_norm": 0.11558814346790314, | |
| "learning_rate": 6.142222222222223e-05, | |
| "loss": 0.4106534481048584, | |
| "step": 1810 | |
| }, | |
| { | |
| "epoch": 0.2883168316831683, | |
| "grad_norm": 0.14374975860118866, | |
| "learning_rate": 6.053333333333333e-05, | |
| "loss": 0.4378472328186035, | |
| "step": 1820 | |
| }, | |
| { | |
| "epoch": 0.2899009900990099, | |
| "grad_norm": 0.10107695311307907, | |
| "learning_rate": 5.964444444444445e-05, | |
| "loss": 0.45861082077026366, | |
| "step": 1830 | |
| }, | |
| { | |
| "epoch": 0.29148514851485147, | |
| "grad_norm": 0.11167020350694656, | |
| "learning_rate": 5.875555555555556e-05, | |
| "loss": 0.4487330913543701, | |
| "step": 1840 | |
| }, | |
| { | |
| "epoch": 0.29306930693069305, | |
| "grad_norm": 0.13690310716629028, | |
| "learning_rate": 5.7866666666666666e-05, | |
| "loss": 0.46242694854736327, | |
| "step": 1850 | |
| }, | |
| { | |
| "epoch": 0.29465346534653464, | |
| "grad_norm": 0.14845994114875793, | |
| "learning_rate": 5.6977777777777784e-05, | |
| "loss": 0.45589003562927244, | |
| "step": 1860 | |
| }, | |
| { | |
| "epoch": 0.2962376237623762, | |
| "grad_norm": 0.11164864152669907, | |
| "learning_rate": 5.608888888888889e-05, | |
| "loss": 0.42093238830566404, | |
| "step": 1870 | |
| }, | |
| { | |
| "epoch": 0.2978217821782178, | |
| "grad_norm": 0.11217094957828522, | |
| "learning_rate": 5.520000000000001e-05, | |
| "loss": 0.4216471195220947, | |
| "step": 1880 | |
| }, | |
| { | |
| "epoch": 0.2994059405940594, | |
| "grad_norm": 0.12560051679611206, | |
| "learning_rate": 5.431111111111111e-05, | |
| "loss": 0.4341439247131348, | |
| "step": 1890 | |
| }, | |
| { | |
| "epoch": 0.300990099009901, | |
| "grad_norm": 0.11575620621442795, | |
| "learning_rate": 5.342222222222223e-05, | |
| "loss": 0.42635207176208495, | |
| "step": 1900 | |
| }, | |
| { | |
| "epoch": 0.30257425742574257, | |
| "grad_norm": 0.11144798994064331, | |
| "learning_rate": 5.2533333333333334e-05, | |
| "loss": 0.44115509986877444, | |
| "step": 1910 | |
| }, | |
| { | |
| "epoch": 0.30415841584158415, | |
| "grad_norm": 0.11413414776325226, | |
| "learning_rate": 5.164444444444445e-05, | |
| "loss": 0.4849900722503662, | |
| "step": 1920 | |
| }, | |
| { | |
| "epoch": 0.30574257425742574, | |
| "grad_norm": 0.11314431577920914, | |
| "learning_rate": 5.075555555555556e-05, | |
| "loss": 0.4439102649688721, | |
| "step": 1930 | |
| }, | |
| { | |
| "epoch": 0.3073267326732673, | |
| "grad_norm": 0.12936046719551086, | |
| "learning_rate": 4.986666666666667e-05, | |
| "loss": 0.4341707706451416, | |
| "step": 1940 | |
| }, | |
| { | |
| "epoch": 0.3089108910891089, | |
| "grad_norm": 0.1315099000930786, | |
| "learning_rate": 4.897777777777778e-05, | |
| "loss": 0.47466235160827636, | |
| "step": 1950 | |
| }, | |
| { | |
| "epoch": 0.3104950495049505, | |
| "grad_norm": 0.135579913854599, | |
| "learning_rate": 4.808888888888889e-05, | |
| "loss": 0.4311628818511963, | |
| "step": 1960 | |
| }, | |
| { | |
| "epoch": 0.3120792079207921, | |
| "grad_norm": 0.1412050724029541, | |
| "learning_rate": 4.72e-05, | |
| "loss": 0.4453381061553955, | |
| "step": 1970 | |
| }, | |
| { | |
| "epoch": 0.31366336633663366, | |
| "grad_norm": 0.1284494251012802, | |
| "learning_rate": 4.6311111111111113e-05, | |
| "loss": 0.4290179252624512, | |
| "step": 1980 | |
| }, | |
| { | |
| "epoch": 0.31524752475247525, | |
| "grad_norm": 0.13294199109077454, | |
| "learning_rate": 4.5422222222222225e-05, | |
| "loss": 0.433257007598877, | |
| "step": 1990 | |
| }, | |
| { | |
| "epoch": 0.31683168316831684, | |
| "grad_norm": 0.12909874320030212, | |
| "learning_rate": 4.4533333333333336e-05, | |
| "loss": 0.44462175369262696, | |
| "step": 2000 | |
| }, | |
| { | |
| "epoch": 0.3184158415841584, | |
| "grad_norm": 0.10991871356964111, | |
| "learning_rate": 4.364444444444445e-05, | |
| "loss": 0.45618624687194825, | |
| "step": 2010 | |
| }, | |
| { | |
| "epoch": 0.32, | |
| "grad_norm": 0.12459543347358704, | |
| "learning_rate": 4.275555555555556e-05, | |
| "loss": 0.46598353385925295, | |
| "step": 2020 | |
| }, | |
| { | |
| "epoch": 0.3215841584158416, | |
| "grad_norm": 0.11573746055364609, | |
| "learning_rate": 4.186666666666667e-05, | |
| "loss": 0.3969010591506958, | |
| "step": 2030 | |
| }, | |
| { | |
| "epoch": 0.3231683168316832, | |
| "grad_norm": 0.10749443620443344, | |
| "learning_rate": 4.097777777777778e-05, | |
| "loss": 0.43877344131469725, | |
| "step": 2040 | |
| }, | |
| { | |
| "epoch": 0.32475247524752476, | |
| "grad_norm": 0.11602727323770523, | |
| "learning_rate": 4.008888888888889e-05, | |
| "loss": 0.43892335891723633, | |
| "step": 2050 | |
| }, | |
| { | |
| "epoch": 0.32633663366336635, | |
| "grad_norm": 0.1159844696521759, | |
| "learning_rate": 3.9200000000000004e-05, | |
| "loss": 0.4742868423461914, | |
| "step": 2060 | |
| }, | |
| { | |
| "epoch": 0.32792079207920793, | |
| "grad_norm": 0.12614595890045166, | |
| "learning_rate": 3.8311111111111115e-05, | |
| "loss": 0.4636037826538086, | |
| "step": 2070 | |
| }, | |
| { | |
| "epoch": 0.3295049504950495, | |
| "grad_norm": 0.11560297012329102, | |
| "learning_rate": 3.742222222222223e-05, | |
| "loss": 0.4710518836975098, | |
| "step": 2080 | |
| }, | |
| { | |
| "epoch": 0.3310891089108911, | |
| "grad_norm": 0.15510666370391846, | |
| "learning_rate": 3.653333333333334e-05, | |
| "loss": 0.4677096366882324, | |
| "step": 2090 | |
| }, | |
| { | |
| "epoch": 0.3326732673267327, | |
| "grad_norm": 0.14245380461215973, | |
| "learning_rate": 3.564444444444445e-05, | |
| "loss": 0.4679864406585693, | |
| "step": 2100 | |
| }, | |
| { | |
| "epoch": 0.3342574257425743, | |
| "grad_norm": 0.11864912509918213, | |
| "learning_rate": 3.475555555555556e-05, | |
| "loss": 0.4409189701080322, | |
| "step": 2110 | |
| }, | |
| { | |
| "epoch": 0.33584158415841586, | |
| "grad_norm": 0.1343812793493271, | |
| "learning_rate": 3.3866666666666665e-05, | |
| "loss": 0.4066458702087402, | |
| "step": 2120 | |
| }, | |
| { | |
| "epoch": 0.33742574257425745, | |
| "grad_norm": 0.10461611300706863, | |
| "learning_rate": 3.297777777777778e-05, | |
| "loss": 0.4319614887237549, | |
| "step": 2130 | |
| }, | |
| { | |
| "epoch": 0.33900990099009903, | |
| "grad_norm": 0.11563409864902496, | |
| "learning_rate": 3.208888888888889e-05, | |
| "loss": 0.432065486907959, | |
| "step": 2140 | |
| }, | |
| { | |
| "epoch": 0.3405940594059406, | |
| "grad_norm": 0.10783884674310684, | |
| "learning_rate": 3.12e-05, | |
| "loss": 0.4129596710205078, | |
| "step": 2150 | |
| }, | |
| { | |
| "epoch": 0.3421782178217822, | |
| "grad_norm": 0.14003720879554749, | |
| "learning_rate": 3.031111111111111e-05, | |
| "loss": 0.4282253265380859, | |
| "step": 2160 | |
| }, | |
| { | |
| "epoch": 0.34376237623762373, | |
| "grad_norm": 0.1377970576286316, | |
| "learning_rate": 2.9422222222222222e-05, | |
| "loss": 0.4517963886260986, | |
| "step": 2170 | |
| }, | |
| { | |
| "epoch": 0.3453465346534653, | |
| "grad_norm": 0.14984577894210815, | |
| "learning_rate": 2.8533333333333333e-05, | |
| "loss": 0.4358660697937012, | |
| "step": 2180 | |
| }, | |
| { | |
| "epoch": 0.3469306930693069, | |
| "grad_norm": 0.11084114760160446, | |
| "learning_rate": 2.7644444444444445e-05, | |
| "loss": 0.3825148344039917, | |
| "step": 2190 | |
| }, | |
| { | |
| "epoch": 0.3485148514851485, | |
| "grad_norm": 0.11796099692583084, | |
| "learning_rate": 2.6755555555555556e-05, | |
| "loss": 0.4264970779418945, | |
| "step": 2200 | |
| }, | |
| { | |
| "epoch": 0.3500990099009901, | |
| "grad_norm": 0.13587944209575653, | |
| "learning_rate": 2.5866666666666667e-05, | |
| "loss": 0.41341686248779297, | |
| "step": 2210 | |
| }, | |
| { | |
| "epoch": 0.35168316831683166, | |
| "grad_norm": 0.09793379157781601, | |
| "learning_rate": 2.497777777777778e-05, | |
| "loss": 0.4133430480957031, | |
| "step": 2220 | |
| }, | |
| { | |
| "epoch": 0.35326732673267325, | |
| "grad_norm": 0.10808942466974258, | |
| "learning_rate": 2.408888888888889e-05, | |
| "loss": 0.42418746948242186, | |
| "step": 2230 | |
| }, | |
| { | |
| "epoch": 0.35485148514851483, | |
| "grad_norm": 0.11084719747304916, | |
| "learning_rate": 2.32e-05, | |
| "loss": 0.42908754348754885, | |
| "step": 2240 | |
| }, | |
| { | |
| "epoch": 0.3564356435643564, | |
| "grad_norm": 0.11243141442537308, | |
| "learning_rate": 2.2311111111111113e-05, | |
| "loss": 0.4357435703277588, | |
| "step": 2250 | |
| }, | |
| { | |
| "epoch": 0.358019801980198, | |
| "grad_norm": 0.0989893451333046, | |
| "learning_rate": 2.1422222222222224e-05, | |
| "loss": 0.4179375648498535, | |
| "step": 2260 | |
| }, | |
| { | |
| "epoch": 0.3596039603960396, | |
| "grad_norm": 0.1555781066417694, | |
| "learning_rate": 2.0533333333333336e-05, | |
| "loss": 0.42656970024108887, | |
| "step": 2270 | |
| }, | |
| { | |
| "epoch": 0.3611881188118812, | |
| "grad_norm": 0.10041913390159607, | |
| "learning_rate": 1.9644444444444447e-05, | |
| "loss": 0.40676274299621584, | |
| "step": 2280 | |
| }, | |
| { | |
| "epoch": 0.36277227722772276, | |
| "grad_norm": 0.11605637520551682, | |
| "learning_rate": 1.8755555555555558e-05, | |
| "loss": 0.4679983139038086, | |
| "step": 2290 | |
| }, | |
| { | |
| "epoch": 0.36435643564356435, | |
| "grad_norm": 0.10629253089427948, | |
| "learning_rate": 1.7866666666666666e-05, | |
| "loss": 0.41867480278015134, | |
| "step": 2300 | |
| }, | |
| { | |
| "epoch": 0.36594059405940593, | |
| "grad_norm": 0.12453669309616089, | |
| "learning_rate": 1.6977777777777777e-05, | |
| "loss": 0.42065892219543455, | |
| "step": 2310 | |
| }, | |
| { | |
| "epoch": 0.3675247524752475, | |
| "grad_norm": 0.11581775546073914, | |
| "learning_rate": 1.608888888888889e-05, | |
| "loss": 0.444520092010498, | |
| "step": 2320 | |
| }, | |
| { | |
| "epoch": 0.3691089108910891, | |
| "grad_norm": 0.1057516410946846, | |
| "learning_rate": 1.52e-05, | |
| "loss": 0.4548838138580322, | |
| "step": 2330 | |
| }, | |
| { | |
| "epoch": 0.3706930693069307, | |
| "grad_norm": 0.11470479518175125, | |
| "learning_rate": 1.4311111111111111e-05, | |
| "loss": 0.42058815956115725, | |
| "step": 2340 | |
| }, | |
| { | |
| "epoch": 0.3722772277227723, | |
| "grad_norm": 0.11543627828359604, | |
| "learning_rate": 1.3422222222222223e-05, | |
| "loss": 0.4344294548034668, | |
| "step": 2350 | |
| }, | |
| { | |
| "epoch": 0.37386138613861386, | |
| "grad_norm": 0.12915924191474915, | |
| "learning_rate": 1.2533333333333332e-05, | |
| "loss": 0.4566244125366211, | |
| "step": 2360 | |
| }, | |
| { | |
| "epoch": 0.37544554455445545, | |
| "grad_norm": 0.11681529879570007, | |
| "learning_rate": 1.1644444444444446e-05, | |
| "loss": 0.451328182220459, | |
| "step": 2370 | |
| }, | |
| { | |
| "epoch": 0.37702970297029703, | |
| "grad_norm": 0.11974669992923737, | |
| "learning_rate": 1.0755555555555557e-05, | |
| "loss": 0.45825467109680174, | |
| "step": 2380 | |
| }, | |
| { | |
| "epoch": 0.3786138613861386, | |
| "grad_norm": 0.11217518150806427, | |
| "learning_rate": 9.866666666666667e-06, | |
| "loss": 0.4391200065612793, | |
| "step": 2390 | |
| }, | |
| { | |
| "epoch": 0.3801980198019802, | |
| "grad_norm": 0.13289013504981995, | |
| "learning_rate": 8.977777777777778e-06, | |
| "loss": 0.42261600494384766, | |
| "step": 2400 | |
| }, | |
| { | |
| "epoch": 0.3817821782178218, | |
| "grad_norm": 0.13508014380931854, | |
| "learning_rate": 8.08888888888889e-06, | |
| "loss": 0.41110858917236326, | |
| "step": 2410 | |
| }, | |
| { | |
| "epoch": 0.3833663366336634, | |
| "grad_norm": 0.12474465370178223, | |
| "learning_rate": 7.2e-06, | |
| "loss": 0.45510258674621584, | |
| "step": 2420 | |
| }, | |
| { | |
| "epoch": 0.38495049504950496, | |
| "grad_norm": 0.13648369908332825, | |
| "learning_rate": 6.311111111111112e-06, | |
| "loss": 0.44538493156433107, | |
| "step": 2430 | |
| }, | |
| { | |
| "epoch": 0.38653465346534654, | |
| "grad_norm": 0.1486520767211914, | |
| "learning_rate": 5.422222222222222e-06, | |
| "loss": 0.4148688793182373, | |
| "step": 2440 | |
| }, | |
| { | |
| "epoch": 0.38811881188118813, | |
| "grad_norm": 0.12737219035625458, | |
| "learning_rate": 4.533333333333334e-06, | |
| "loss": 0.44829936027526857, | |
| "step": 2450 | |
| }, | |
| { | |
| "epoch": 0.3897029702970297, | |
| "grad_norm": 0.1182004064321518, | |
| "learning_rate": 3.6444444444444446e-06, | |
| "loss": 0.45412731170654297, | |
| "step": 2460 | |
| }, | |
| { | |
| "epoch": 0.3912871287128713, | |
| "grad_norm": 0.14805327355861664, | |
| "learning_rate": 2.7555555555555555e-06, | |
| "loss": 0.43036956787109376, | |
| "step": 2470 | |
| }, | |
| { | |
| "epoch": 0.3928712871287129, | |
| "grad_norm": 0.12756042182445526, | |
| "learning_rate": 1.8666666666666669e-06, | |
| "loss": 0.42612438201904296, | |
| "step": 2480 | |
| }, | |
| { | |
| "epoch": 0.3944554455445545, | |
| "grad_norm": 0.12241974472999573, | |
| "learning_rate": 9.777777777777778e-07, | |
| "loss": 0.4630708694458008, | |
| "step": 2490 | |
| }, | |
| { | |
| "epoch": 0.39603960396039606, | |
| "grad_norm": 0.10594528913497925, | |
| "learning_rate": 8.88888888888889e-08, | |
| "loss": 0.4205953598022461, | |
| "step": 2500 | |
| } | |
| ], | |
| "logging_steps": 10, | |
| "max_steps": 2500, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 1, | |
| "save_steps": 200, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 3.900426845678039e+17, | |
| "train_batch_size": 4, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |
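
The logged entries above can be summarized with a short script. What follows is a minimal sketch, not part of the original training code: `summarize_log_history` is a hypothetical helper name, and the three sample entries are copied from the final rows of the log. To run it on the full file, one would pass the list stored under the `log_history` key of `trainer_state.json` (that key lies outside this excerpt, so it is an assumption here).

```python
import json
from statistics import mean


def summarize_log_history(entries):
    """Summarize logged training entries (hypothetical helper, for illustration).

    Each entry is expected to carry "step", "loss" and "learning_rate",
    matching the fields that appear in the log above.
    """
    entries = sorted(entries, key=lambda e: e["step"])
    first, last = entries[0], entries[-1]
    step_span = last["step"] - first["step"]
    # Average learning-rate change per optimizer step across the window.
    lr_change = (
        (last["learning_rate"] - first["learning_rate"]) / step_span
        if step_span
        else 0.0
    )
    return {
        "first_step": first["step"],
        "last_step": last["step"],
        "mean_loss": mean(e["loss"] for e in entries),
        "lr_change_per_step": lr_change,
    }


# Three entries copied from the end of the log above.
sample = [
    {"step": 2480, "loss": 0.42612438201904296, "learning_rate": 1.8666666666666669e-06},
    {"step": 2490, "loss": 0.4630708694458008, "learning_rate": 9.777777777777778e-07},
    {"step": 2500, "loss": 0.4205953598022461, "learning_rate": 8.88888888888889e-08},
]

summary = summarize_log_history(sample)
print(json.dumps(summary, indent=2))
```

On these three rows the learning rate falls by a constant amount between consecutive log points, which is consistent with a linear decay schedule. The scheduler type itself is not recorded in this excerpt, so that reading is an inference from the numbers.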