| { | |
| "best_global_step": null, | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 0.9655172413793104, | |
| "eval_steps": 100, | |
| "global_step": 1400, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.006896551724137931, | |
| "grad_norm": 0.23170334100723267, | |
| "learning_rate": 4.0909090909090915e-05, | |
| "loss": 3.881548309326172, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.013793103448275862, | |
| "grad_norm": 0.2462630271911621, | |
| "learning_rate": 8.636363636363637e-05, | |
| "loss": 3.794137954711914, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.020689655172413793, | |
| "grad_norm": 0.2365545630455017, | |
| "learning_rate": 0.0001318181818181818, | |
| "loss": 3.7382652282714846, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.027586206896551724, | |
| "grad_norm": 0.2707105576992035, | |
| "learning_rate": 0.00017727272727272728, | |
| "loss": 3.7427467346191405, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.034482758620689655, | |
| "grad_norm": 0.24755217134952545, | |
| "learning_rate": 0.0001992887624466572, | |
| "loss": 3.61156005859375, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.041379310344827586, | |
| "grad_norm": 0.3027788996696472, | |
| "learning_rate": 0.00019786628733997158, | |
| "loss": 3.653477096557617, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.04827586206896552, | |
| "grad_norm": 0.2786545753479004, | |
| "learning_rate": 0.00019644381223328592, | |
| "loss": 3.6743812561035156, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.05517241379310345, | |
| "grad_norm": 0.2662774622440338, | |
| "learning_rate": 0.0001950213371266003, | |
| "loss": 3.5414581298828125, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.06206896551724138, | |
| "grad_norm": 0.30465996265411377, | |
| "learning_rate": 0.00019359886201991466, | |
| "loss": 3.5396636962890624, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.06896551724137931, | |
| "grad_norm": 0.26726341247558594, | |
| "learning_rate": 0.00019217638691322903, | |
| "loss": 3.56505126953125, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.06896551724137931, | |
| "eval_loss": 3.540494680404663, | |
| "eval_runtime": 20.7606, | |
| "eval_samples_per_second": 59.68, | |
| "eval_steps_per_second": 7.466, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.07586206896551724, | |
| "grad_norm": 0.28638505935668945, | |
| "learning_rate": 0.00019075391180654338, | |
| "loss": 3.5479259490966797, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.08275862068965517, | |
| "grad_norm": 0.26623088121414185, | |
| "learning_rate": 0.00018933143669985775, | |
| "loss": 3.538827133178711, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.0896551724137931, | |
| "grad_norm": 0.3132971525192261, | |
| "learning_rate": 0.00018790896159317212, | |
| "loss": 3.500360107421875, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.09655172413793103, | |
| "grad_norm": 0.2965874969959259, | |
| "learning_rate": 0.0001864864864864865, | |
| "loss": 3.5192401885986326, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.10344827586206896, | |
| "grad_norm": 0.2784412205219269, | |
| "learning_rate": 0.00018506401137980089, | |
| "loss": 3.6045772552490236, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.1103448275862069, | |
| "grad_norm": 0.3488187789916992, | |
| "learning_rate": 0.00018364153627311523, | |
| "loss": 3.497270202636719, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.11724137931034483, | |
| "grad_norm": 0.2777283191680908, | |
| "learning_rate": 0.0001822190611664296, | |
| "loss": 3.417000961303711, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 0.12413793103448276, | |
| "grad_norm": 0.3144644796848297, | |
| "learning_rate": 0.00018079658605974397, | |
| "loss": 3.5388118743896486, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 0.1310344827586207, | |
| "grad_norm": 0.32053834199905396, | |
| "learning_rate": 0.00017937411095305834, | |
| "loss": 3.4628257751464844, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 0.13793103448275862, | |
| "grad_norm": 0.3077249228954315, | |
| "learning_rate": 0.00017795163584637268, | |
| "loss": 3.456898498535156, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.13793103448275862, | |
| "eval_loss": 3.459949016571045, | |
| "eval_runtime": 23.2077, | |
| "eval_samples_per_second": 53.388, | |
| "eval_steps_per_second": 6.679, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 0.14482758620689656, | |
| "grad_norm": 0.3117501139640808, | |
| "learning_rate": 0.00017652916073968705, | |
| "loss": 3.4756256103515626, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 0.15172413793103448, | |
| "grad_norm": 0.31221652030944824, | |
| "learning_rate": 0.00017510668563300142, | |
| "loss": 3.4557666778564453, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 0.15862068965517243, | |
| "grad_norm": 0.30920568108558655, | |
| "learning_rate": 0.0001736842105263158, | |
| "loss": 3.5030059814453125, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 0.16551724137931034, | |
| "grad_norm": 0.3118240535259247, | |
| "learning_rate": 0.00017226173541963016, | |
| "loss": 3.463837814331055, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 0.1724137931034483, | |
| "grad_norm": 0.29713189601898193, | |
| "learning_rate": 0.00017083926031294454, | |
| "loss": 3.482789993286133, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 0.1793103448275862, | |
| "grad_norm": 0.31772103905677795, | |
| "learning_rate": 0.0001694167852062589, | |
| "loss": 3.4497989654541015, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 0.18620689655172415, | |
| "grad_norm": 0.31749677658081055, | |
| "learning_rate": 0.00016799431009957328, | |
| "loss": 3.426702880859375, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 0.19310344827586207, | |
| "grad_norm": 0.3107665479183197, | |
| "learning_rate": 0.00016657183499288765, | |
| "loss": 3.396825408935547, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 0.2, | |
| "grad_norm": 0.32543718814849854, | |
| "learning_rate": 0.000165149359886202, | |
| "loss": 3.4777637481689454, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 0.20689655172413793, | |
| "grad_norm": 0.3045833110809326, | |
| "learning_rate": 0.00016372688477951636, | |
| "loss": 3.417892074584961, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.20689655172413793, | |
| "eval_loss": 3.4208664894104004, | |
| "eval_runtime": 22.6365, | |
| "eval_samples_per_second": 54.735, | |
| "eval_steps_per_second": 6.847, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 0.21379310344827587, | |
| "grad_norm": 0.3320230543613434, | |
| "learning_rate": 0.00016230440967283073, | |
| "loss": 3.4840187072753905, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 0.2206896551724138, | |
| "grad_norm": 0.30821651220321655, | |
| "learning_rate": 0.0001608819345661451, | |
| "loss": 3.362594985961914, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 0.22758620689655173, | |
| "grad_norm": 0.330126017332077, | |
| "learning_rate": 0.00015945945945945947, | |
| "loss": 3.4734111785888673, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 0.23448275862068965, | |
| "grad_norm": 0.30710867047309875, | |
| "learning_rate": 0.00015803698435277384, | |
| "loss": 3.4003364562988283, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 0.2413793103448276, | |
| "grad_norm": 0.30796217918395996, | |
| "learning_rate": 0.0001566145092460882, | |
| "loss": 3.4497623443603516, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 0.2482758620689655, | |
| "grad_norm": 0.31471186876296997, | |
| "learning_rate": 0.00015519203413940258, | |
| "loss": 3.410964584350586, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 0.25517241379310346, | |
| "grad_norm": 0.31033286452293396, | |
| "learning_rate": 0.00015376955903271693, | |
| "loss": 3.4347129821777345, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 0.2620689655172414, | |
| "grad_norm": 0.32137277722358704, | |
| "learning_rate": 0.0001523470839260313, | |
| "loss": 3.4425697326660156, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 0.2689655172413793, | |
| "grad_norm": 0.3627667725086212, | |
| "learning_rate": 0.00015092460881934567, | |
| "loss": 3.3625614166259767, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 0.27586206896551724, | |
| "grad_norm": 0.3407364785671234, | |
| "learning_rate": 0.00014950213371266004, | |
| "loss": 3.3849735260009766, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.27586206896551724, | |
| "eval_loss": 3.3975093364715576, | |
| "eval_runtime": 24.3998, | |
| "eval_samples_per_second": 50.779, | |
| "eval_steps_per_second": 6.353, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 0.2827586206896552, | |
| "grad_norm": 0.32097798585891724, | |
| "learning_rate": 0.00014807965860597438, | |
| "loss": 3.417188262939453, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 0.2896551724137931, | |
| "grad_norm": 0.33030977845191956, | |
| "learning_rate": 0.00014665718349928875, | |
| "loss": 3.429982376098633, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 0.296551724137931, | |
| "grad_norm": 0.3301655054092407, | |
| "learning_rate": 0.00014523470839260315, | |
| "loss": 3.338633728027344, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 0.30344827586206896, | |
| "grad_norm": 0.32900184392929077, | |
| "learning_rate": 0.00014381223328591752, | |
| "loss": 3.376237487792969, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 0.3103448275862069, | |
| "grad_norm": 0.3433472812175751, | |
| "learning_rate": 0.0001423897581792319, | |
| "loss": 3.3466358184814453, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 0.31724137931034485, | |
| "grad_norm": 0.31025466322898865, | |
| "learning_rate": 0.00014096728307254623, | |
| "loss": 3.371001052856445, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 0.32413793103448274, | |
| "grad_norm": 0.3327469527721405, | |
| "learning_rate": 0.0001395448079658606, | |
| "loss": 3.355500411987305, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 0.3310344827586207, | |
| "grad_norm": 0.34813839197158813, | |
| "learning_rate": 0.00013812233285917497, | |
| "loss": 3.3663055419921877, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 0.33793103448275863, | |
| "grad_norm": 0.35365816950798035, | |
| "learning_rate": 0.00013669985775248934, | |
| "loss": 3.4172080993652343, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 0.3448275862068966, | |
| "grad_norm": 0.31251364946365356, | |
| "learning_rate": 0.0001352773826458037, | |
| "loss": 3.334903335571289, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.3448275862068966, | |
| "eval_loss": 3.377959728240967, | |
| "eval_runtime": 22.4366, | |
| "eval_samples_per_second": 55.222, | |
| "eval_steps_per_second": 6.908, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 0.35172413793103446, | |
| "grad_norm": 0.3221539855003357, | |
| "learning_rate": 0.00013385490753911806, | |
| "loss": 3.3751365661621096, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 0.3586206896551724, | |
| "grad_norm": 0.31918609142303467, | |
| "learning_rate": 0.00013243243243243243, | |
| "loss": 3.360042953491211, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 0.36551724137931035, | |
| "grad_norm": 0.3304445445537567, | |
| "learning_rate": 0.00013100995732574682, | |
| "loss": 3.355394744873047, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 0.3724137931034483, | |
| "grad_norm": 0.31707221269607544, | |
| "learning_rate": 0.00012958748221906117, | |
| "loss": 3.3813400268554688, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 0.3793103448275862, | |
| "grad_norm": 0.3358207643032074, | |
| "learning_rate": 0.00012816500711237554, | |
| "loss": 3.4241260528564452, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 0.38620689655172413, | |
| "grad_norm": 0.3196071982383728, | |
| "learning_rate": 0.0001267425320056899, | |
| "loss": 3.3610572814941406, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 0.3931034482758621, | |
| "grad_norm": 0.31611961126327515, | |
| "learning_rate": 0.00012532005689900428, | |
| "loss": 3.328931427001953, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 0.4, | |
| "grad_norm": 0.33409208059310913, | |
| "learning_rate": 0.00012389758179231865, | |
| "loss": 3.32372932434082, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 0.4068965517241379, | |
| "grad_norm": 0.322489470243454, | |
| "learning_rate": 0.000122475106685633, | |
| "loss": 3.389539337158203, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 0.41379310344827586, | |
| "grad_norm": 0.3401939272880554, | |
| "learning_rate": 0.00012105263157894738, | |
| "loss": 3.292881393432617, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.41379310344827586, | |
| "eval_loss": 3.3636481761932373, | |
| "eval_runtime": 24.5094, | |
| "eval_samples_per_second": 50.552, | |
| "eval_steps_per_second": 6.324, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 0.4206896551724138, | |
| "grad_norm": 0.36831986904144287, | |
| "learning_rate": 0.00011963015647226175, | |
| "loss": 3.3248523712158202, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 0.42758620689655175, | |
| "grad_norm": 0.31736257672309875, | |
| "learning_rate": 0.00011820768136557612, | |
| "loss": 3.328786849975586, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 0.43448275862068964, | |
| "grad_norm": 0.3393501341342926, | |
| "learning_rate": 0.00011678520625889046, | |
| "loss": 3.3191741943359374, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 0.4413793103448276, | |
| "grad_norm": 0.3327409327030182, | |
| "learning_rate": 0.00011536273115220485, | |
| "loss": 3.4381561279296875, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 0.4482758620689655, | |
| "grad_norm": 0.32990631461143494, | |
| "learning_rate": 0.00011394025604551922, | |
| "loss": 3.4140262603759766, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 0.45517241379310347, | |
| "grad_norm": 0.3171171247959137, | |
| "learning_rate": 0.00011251778093883359, | |
| "loss": 3.358592987060547, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 0.46206896551724136, | |
| "grad_norm": 0.319813072681427, | |
| "learning_rate": 0.00011109530583214793, | |
| "loss": 3.3165565490722657, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 0.4689655172413793, | |
| "grad_norm": 0.3260372579097748, | |
| "learning_rate": 0.0001096728307254623, | |
| "loss": 3.353110122680664, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 0.47586206896551725, | |
| "grad_norm": 0.3186911642551422, | |
| "learning_rate": 0.00010825035561877668, | |
| "loss": 3.4071842193603517, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 0.4827586206896552, | |
| "grad_norm": 0.3407030701637268, | |
| "learning_rate": 0.00010682788051209105, | |
| "loss": 3.3047447204589844, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.4827586206896552, | |
| "eval_loss": 3.353717803955078, | |
| "eval_runtime": 23.3132, | |
| "eval_samples_per_second": 53.146, | |
| "eval_steps_per_second": 6.649, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 0.4896551724137931, | |
| "grad_norm": 0.34808802604675293, | |
| "learning_rate": 0.0001054054054054054, | |
| "loss": 3.3398147583007813, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 0.496551724137931, | |
| "grad_norm": 0.31498315930366516, | |
| "learning_rate": 0.00010398293029871977, | |
| "loss": 3.3216724395751953, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 0.503448275862069, | |
| "grad_norm": 0.32081830501556396, | |
| "learning_rate": 0.00010256045519203414, | |
| "loss": 3.3753803253173826, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 0.5103448275862069, | |
| "grad_norm": 0.38737478852272034, | |
| "learning_rate": 0.00010113798008534852, | |
| "loss": 3.347806930541992, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 0.5172413793103449, | |
| "grad_norm": 0.3532518744468689, | |
| "learning_rate": 9.971550497866288e-05, | |
| "loss": 3.3405616760253904, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 0.5241379310344828, | |
| "grad_norm": 0.3295048773288727, | |
| "learning_rate": 9.829302987197725e-05, | |
| "loss": 3.3597396850585937, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 0.5310344827586206, | |
| "grad_norm": 0.3602186441421509, | |
| "learning_rate": 9.68705547652916e-05, | |
| "loss": 3.3083240509033205, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 0.5379310344827586, | |
| "grad_norm": 0.3464964032173157, | |
| "learning_rate": 9.544807965860598e-05, | |
| "loss": 3.3121707916259764, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 0.5448275862068965, | |
| "grad_norm": 0.314314067363739, | |
| "learning_rate": 9.402560455192035e-05, | |
| "loss": 3.3149440765380858, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 0.5517241379310345, | |
| "grad_norm": 0.3291971683502197, | |
| "learning_rate": 9.260312944523472e-05, | |
| "loss": 3.3775871276855467, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.5517241379310345, | |
| "eval_loss": 3.3451411724090576, | |
| "eval_runtime": 22.871, | |
| "eval_samples_per_second": 54.173, | |
| "eval_steps_per_second": 6.777, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 0.5586206896551724, | |
| "grad_norm": 0.33278515934944153, | |
| "learning_rate": 9.118065433854907e-05, | |
| "loss": 3.348500061035156, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 0.5655172413793104, | |
| "grad_norm": 0.32254090905189514, | |
| "learning_rate": 8.975817923186344e-05, | |
| "loss": 3.289051818847656, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 0.5724137931034483, | |
| "grad_norm": 0.37034258246421814, | |
| "learning_rate": 8.833570412517781e-05, | |
| "loss": 3.3568161010742186, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 0.5793103448275863, | |
| "grad_norm": 0.3335118889808655, | |
| "learning_rate": 8.691322901849219e-05, | |
| "loss": 3.388734817504883, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 0.5862068965517241, | |
| "grad_norm": 0.321696013212204, | |
| "learning_rate": 8.549075391180654e-05, | |
| "loss": 3.225112533569336, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 0.593103448275862, | |
| "grad_norm": 0.32803264260292053, | |
| "learning_rate": 8.406827880512091e-05, | |
| "loss": 3.3828250885009767, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 0.6, | |
| "grad_norm": 0.32728055119514465, | |
| "learning_rate": 8.264580369843528e-05, | |
| "loss": 3.382917022705078, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 0.6068965517241379, | |
| "grad_norm": 0.3484093248844147, | |
| "learning_rate": 8.122332859174965e-05, | |
| "loss": 3.3160984039306642, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 0.6137931034482759, | |
| "grad_norm": 0.3902784585952759, | |
| "learning_rate": 7.980085348506402e-05, | |
| "loss": 3.366690444946289, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 0.6206896551724138, | |
| "grad_norm": 0.32276031374931335, | |
| "learning_rate": 7.837837837837838e-05, | |
| "loss": 3.2556941986083983, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.6206896551724138, | |
| "eval_loss": 3.337270736694336, | |
| "eval_runtime": 23.3962, | |
| "eval_samples_per_second": 52.957, | |
| "eval_steps_per_second": 6.625, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 0.6275862068965518, | |
| "grad_norm": 0.36281818151474, | |
| "learning_rate": 7.695590327169275e-05, | |
| "loss": 3.4061851501464844, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 0.6344827586206897, | |
| "grad_norm": 0.3139365017414093, | |
| "learning_rate": 7.553342816500711e-05, | |
| "loss": 3.2938968658447267, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 0.6413793103448275, | |
| "grad_norm": 0.33926886320114136, | |
| "learning_rate": 7.411095305832149e-05, | |
| "loss": 3.3076290130615233, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 0.6482758620689655, | |
| "grad_norm": 0.3455406427383423, | |
| "learning_rate": 7.268847795163585e-05, | |
| "loss": 3.338056182861328, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 0.6551724137931034, | |
| "grad_norm": 0.3547625243663788, | |
| "learning_rate": 7.126600284495022e-05, | |
| "loss": 3.3874538421630858, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 0.6620689655172414, | |
| "grad_norm": 0.34468552470207214, | |
| "learning_rate": 6.984352773826458e-05, | |
| "loss": 3.35147705078125, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 0.6689655172413793, | |
| "grad_norm": 0.3656456470489502, | |
| "learning_rate": 6.842105263157895e-05, | |
| "loss": 3.415922164916992, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 0.6758620689655173, | |
| "grad_norm": 0.34468477964401245, | |
| "learning_rate": 6.699857752489332e-05, | |
| "loss": 3.3692134857177733, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 0.6827586206896552, | |
| "grad_norm": 0.3500272333621979, | |
| "learning_rate": 6.557610241820769e-05, | |
| "loss": 3.371417999267578, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 0.6896551724137931, | |
| "grad_norm": 0.3438541889190674, | |
| "learning_rate": 6.415362731152204e-05, | |
| "loss": 3.4223506927490233, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 0.6896551724137931, | |
| "eval_loss": 3.330833911895752, | |
| "eval_runtime": 22.3929, | |
| "eval_samples_per_second": 55.33, | |
| "eval_steps_per_second": 6.922, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 0.696551724137931, | |
| "grad_norm": 0.33815649151802063, | |
| "learning_rate": 6.273115220483641e-05, | |
| "loss": 3.3118003845214843, | |
| "step": 1010 | |
| }, | |
| { | |
| "epoch": 0.7034482758620689, | |
| "grad_norm": 0.3285435438156128, | |
| "learning_rate": 6.130867709815078e-05, | |
| "loss": 3.3082767486572267, | |
| "step": 1020 | |
| }, | |
| { | |
| "epoch": 0.7103448275862069, | |
| "grad_norm": 0.3286275863647461, | |
| "learning_rate": 5.988620199146515e-05, | |
| "loss": 3.373445510864258, | |
| "step": 1030 | |
| }, | |
| { | |
| "epoch": 0.7172413793103448, | |
| "grad_norm": 0.3484683334827423, | |
| "learning_rate": 5.8463726884779526e-05, | |
| "loss": 3.3201057434082033, | |
| "step": 1040 | |
| }, | |
| { | |
| "epoch": 0.7241379310344828, | |
| "grad_norm": 0.37690791487693787, | |
| "learning_rate": 5.704125177809388e-05, | |
| "loss": 3.322885513305664, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 0.7310344827586207, | |
| "grad_norm": 0.3458273112773895, | |
| "learning_rate": 5.561877667140826e-05, | |
| "loss": 3.3502052307128904, | |
| "step": 1060 | |
| }, | |
| { | |
| "epoch": 0.7379310344827587, | |
| "grad_norm": 0.3618911802768707, | |
| "learning_rate": 5.4196301564722616e-05, | |
| "loss": 3.3665504455566406, | |
| "step": 1070 | |
| }, | |
| { | |
| "epoch": 0.7448275862068966, | |
| "grad_norm": 0.34324246644973755, | |
| "learning_rate": 5.277382645803699e-05, | |
| "loss": 3.318630599975586, | |
| "step": 1080 | |
| }, | |
| { | |
| "epoch": 0.7517241379310344, | |
| "grad_norm": 0.3743279278278351, | |
| "learning_rate": 5.135135135135135e-05, | |
| "loss": 3.2997642517089845, | |
| "step": 1090 | |
| }, | |
| { | |
| "epoch": 0.7586206896551724, | |
| "grad_norm": 0.3348490595817566, | |
| "learning_rate": 4.992887624466572e-05, | |
| "loss": 3.194792556762695, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 0.7586206896551724, | |
| "eval_loss": 3.3259100914001465, | |
| "eval_runtime": 22.2447, | |
| "eval_samples_per_second": 55.699, | |
| "eval_steps_per_second": 6.968, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 0.7655172413793103, | |
| "grad_norm": 0.33868151903152466, | |
| "learning_rate": 4.850640113798009e-05, | |
| "loss": 3.346274566650391, | |
| "step": 1110 | |
| }, | |
| { | |
| "epoch": 0.7724137931034483, | |
| "grad_norm": 0.3498711585998535, | |
| "learning_rate": 4.7083926031294455e-05, | |
| "loss": 3.323177719116211, | |
| "step": 1120 | |
| }, | |
| { | |
| "epoch": 0.7793103448275862, | |
| "grad_norm": 0.3602657914161682, | |
| "learning_rate": 4.5661450924608825e-05, | |
| "loss": 3.273370361328125, | |
| "step": 1130 | |
| }, | |
| { | |
| "epoch": 0.7862068965517242, | |
| "grad_norm": 0.34091508388519287, | |
| "learning_rate": 4.423897581792319e-05, | |
| "loss": 3.288585662841797, | |
| "step": 1140 | |
| }, | |
| { | |
| "epoch": 0.7931034482758621, | |
| "grad_norm": 0.35901182889938354, | |
| "learning_rate": 4.281650071123756e-05, | |
| "loss": 3.3729190826416016, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 0.8, | |
| "grad_norm": 0.33599621057510376, | |
| "learning_rate": 4.139402560455192e-05, | |
| "loss": 3.355024719238281, | |
| "step": 1160 | |
| }, | |
| { | |
| "epoch": 0.8068965517241379, | |
| "grad_norm": 0.38110241293907166, | |
| "learning_rate": 3.997155049786629e-05, | |
| "loss": 3.343807601928711, | |
| "step": 1170 | |
| }, | |
| { | |
| "epoch": 0.8137931034482758, | |
| "grad_norm": 0.34958431124687195, | |
| "learning_rate": 3.854907539118066e-05, | |
| "loss": 3.272492218017578, | |
| "step": 1180 | |
| }, | |
| { | |
| "epoch": 0.8206896551724138, | |
| "grad_norm": 0.3552829623222351, | |
| "learning_rate": 3.712660028449502e-05, | |
| "loss": 3.3572948455810545, | |
| "step": 1190 | |
| }, | |
| { | |
| "epoch": 0.8275862068965517, | |
| "grad_norm": 0.322081595659256, | |
| "learning_rate": 3.570412517780939e-05, | |
| "loss": 3.3037082672119142, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 0.8275862068965517, | |
| "eval_loss": 3.3224334716796875, | |
| "eval_runtime": 22.2507, | |
| "eval_samples_per_second": 55.684, | |
| "eval_steps_per_second": 6.966, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 0.8344827586206897, | |
| "grad_norm": 0.35375288128852844, | |
| "learning_rate": 3.4281650071123755e-05, | |
| "loss": 3.2933795928955076, | |
| "step": 1210 | |
| }, | |
| { | |
| "epoch": 0.8413793103448276, | |
| "grad_norm": 0.35284116864204407, | |
| "learning_rate": 3.2859174964438125e-05, | |
| "loss": 3.3349658966064455, | |
| "step": 1220 | |
| }, | |
| { | |
| "epoch": 0.8482758620689655, | |
| "grad_norm": 0.36195898056030273, | |
| "learning_rate": 3.143669985775249e-05, | |
| "loss": 3.3784534454345705, | |
| "step": 1230 | |
| }, | |
| { | |
| "epoch": 0.8551724137931035, | |
| "grad_norm": 0.3708537518978119, | |
| "learning_rate": 3.0014224751066856e-05, | |
| "loss": 3.3437496185302735, | |
| "step": 1240 | |
| }, | |
| { | |
| "epoch": 0.8620689655172413, | |
| "grad_norm": 0.32489216327667236, | |
| "learning_rate": 2.8591749644381226e-05, | |
| "loss": 3.276387023925781, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 0.8689655172413793, | |
| "grad_norm": 0.3359311819076538, | |
| "learning_rate": 2.7169274537695593e-05, | |
| "loss": 3.2540233612060545, | |
| "step": 1260 | |
| }, | |
| { | |
| "epoch": 0.8758620689655172, | |
| "grad_norm": 0.40804561972618103, | |
| "learning_rate": 2.574679943100996e-05, | |
| "loss": 3.2611133575439455, | |
| "step": 1270 | |
| }, | |
| { | |
| "epoch": 0.8827586206896552, | |
| "grad_norm": 0.3684781491756439, | |
| "learning_rate": 2.4324324324324327e-05, | |
| "loss": 3.362625503540039, | |
| "step": 1280 | |
| }, | |
| { | |
| "epoch": 0.8896551724137931, | |
| "grad_norm": 0.38623297214508057, | |
| "learning_rate": 2.2901849217638694e-05, | |
| "loss": 3.302141571044922, | |
| "step": 1290 | |
| }, | |
| { | |
| "epoch": 0.896551724137931, | |
| "grad_norm": 0.3602025508880615, | |
| "learning_rate": 2.147937411095306e-05, | |
| "loss": 3.3853092193603516, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 0.896551724137931, | |
| "eval_loss": 3.319241762161255, | |
| "eval_runtime": 22.5951, | |
| "eval_samples_per_second": 54.835, | |
| "eval_steps_per_second": 6.86, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 0.903448275862069, | |
| "grad_norm": 0.3617671728134155, | |
| "learning_rate": 2.0056899004267428e-05, | |
| "loss": 3.3196762084960936, | |
| "step": 1310 | |
| }, | |
| { | |
| "epoch": 0.9103448275862069, | |
| "grad_norm": 0.3671157956123352, | |
| "learning_rate": 1.8634423897581792e-05, | |
| "loss": 3.358323669433594, | |
| "step": 1320 | |
| }, | |
| { | |
| "epoch": 0.9172413793103448, | |
| "grad_norm": 0.3617306053638458, | |
| "learning_rate": 1.721194879089616e-05, | |
| "loss": 3.2942028045654297, | |
| "step": 1330 | |
| }, | |
| { | |
| "epoch": 0.9241379310344827, | |
| "grad_norm": 0.3539746403694153, | |
| "learning_rate": 1.5789473684210526e-05, | |
| "loss": 3.332453155517578, | |
| "step": 1340 | |
| }, | |
| { | |
| "epoch": 0.9310344827586207, | |
| "grad_norm": 0.34931978583335876, | |
| "learning_rate": 1.4366998577524893e-05, | |
| "loss": 3.304658889770508, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 0.9379310344827586, | |
| "grad_norm": 0.33509236574172974, | |
| "learning_rate": 1.2944523470839262e-05, | |
| "loss": 3.365464782714844, | |
| "step": 1360 | |
| }, | |
| { | |
| "epoch": 0.9448275862068966, | |
| "grad_norm": 0.36600831151008606, | |
| "learning_rate": 1.1522048364153627e-05, | |
| "loss": 3.357099914550781, | |
| "step": 1370 | |
| }, | |
| { | |
| "epoch": 0.9517241379310345, | |
| "grad_norm": 0.32806214690208435, | |
| "learning_rate": 1.0099573257467996e-05, | |
| "loss": 3.3524654388427733, | |
| "step": 1380 | |
| }, | |
| { | |
| "epoch": 0.9586206896551724, | |
| "grad_norm": 0.34161072969436646, | |
| "learning_rate": 8.677098150782363e-06, | |
| "loss": 3.266713333129883, | |
| "step": 1390 | |
| }, | |
| { | |
| "epoch": 0.9655172413793104, | |
| "grad_norm": 0.3262627124786377, | |
| "learning_rate": 7.254623044096729e-06, | |
| "loss": 3.2161521911621094, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 0.9655172413793104, | |
| "eval_loss": 3.3177387714385986, | |
| "eval_runtime": 22.491, | |
| "eval_samples_per_second": 55.089, | |
| "eval_steps_per_second": 6.892, | |
| "step": 1400 | |
| } | |
| ], | |
| "logging_steps": 10, | |
| "max_steps": 1450, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 1, | |
| "save_steps": 100, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": false | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 477766995148800.0, | |
| "train_batch_size": 2, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |
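The state above can be sanity-checked with a few lines of Python. The sketch below is a minimal, self-contained illustration that works on a short excerpt of `log_history` (values copied verbatim from the entries above) rather than on the real file. It separates training records from evaluation records, converts the last `eval_loss` to a perplexity, and compares the logged `learning_rate` values against a linear warmup followed by a linear decay. Two things are assumptions and are not stored in the state file: the peak learning rate of 2e-4 and the 44 warmup steps, both inferred from the logged numbers. The helper names (`split_history`, `linear_schedule_lr`) are hypothetical, and the perplexity figure assumes the loss is a mean per-token cross-entropy in nats.

```python
import json
import math

# Excerpt of "log_history" from the state above (values copied verbatim).
EXCERPT = """
[
  {"epoch": 0.006896551724137931, "learning_rate": 4.0909090909090915e-05,
   "loss": 3.881548309326172, "step": 10},
  {"epoch": 0.034482758620689655, "learning_rate": 0.0001992887624466572,
   "loss": 3.61156005859375, "step": 50},
  {"epoch": 0.06896551724137931, "eval_loss": 3.540494680404663, "step": 100},
  {"epoch": 0.9655172413793104, "learning_rate": 7.254623044096729e-06,
   "loss": 3.2161521911621094, "step": 1400},
  {"epoch": 0.9655172413793104, "eval_loss": 3.3177387714385986, "step": 1400}
]
"""

MAX_STEPS = 1450    # "max_steps" in the state file
PEAK_LR = 2e-4      # ASSUMPTION: inferred from the logged values, not in the file
WARMUP_STEPS = 44   # ASSUMPTION: inferred from the logged values, not in the file


def split_history(history):
    """Separate training records (key "loss") from evaluation records (key "eval_loss")."""
    train = [entry for entry in history if "loss" in entry]
    evals = [entry for entry in history if "eval_loss" in entry]
    return train, evals


def linear_schedule_lr(logged_step):
    """Learning rate under linear warmup then linear decay to zero at MAX_STEPS.

    ASSUMPTION: the value logged at step N matches N - 1 completed scheduler
    updates, which is what the logged numbers are consistent with.
    """
    s = logged_step - 1
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS
    return PEAK_LR * (MAX_STEPS - s) / (MAX_STEPS - WARMUP_STEPS)


history = json.loads(EXCERPT)
train_records, eval_records = split_history(history)

# Perplexity of the last evaluation, assuming a mean per-token loss in nats.
final_perplexity = math.exp(eval_records[-1]["eval_loss"])

# Compare each logged learning rate with the reconstructed schedule.
schedule_matches = all(
    math.isclose(rec["learning_rate"], linear_schedule_lr(rec["step"]), rel_tol=1e-9)
    for rec in train_records
)

# The "epoch" field is the fraction of MAX_STEPS completed (one epoch = 1450 steps).
epoch_matches = all(
    math.isclose(rec["epoch"], rec["step"] / MAX_STEPS, rel_tol=1e-9)
    for rec in history
)

print("train records:", len(train_records), "eval records:", len(eval_records))
print("final eval perplexity:", final_perplexity)
print("schedule matches:", schedule_matches, "epoch matches:", epoch_matches)
```

To run the same checks on the full file, replace `json.loads(EXCERPT)` with the `"log_history"` list loaded from the checkpoint's `trainer_state.json`. If the schedule check fails on other runs, the inferred peak rate or warmup length is the first thing to revisit.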