Instructions for using lllqaq/SWE_Next_14B with libraries and local apps.

Libraries

How to use lllqaq/SWE_Next_14B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lllqaq/SWE_Next_14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lllqaq/SWE_Next_14B")
model = AutoModelForCausalLM.from_pretrained("lllqaq/SWE_Next_14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use lllqaq/SWE_Next_14B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lllqaq/SWE_Next_14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lllqaq/SWE_Next_14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
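
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response is assumed to follow the usual OpenAI-style chat completion layout (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = "lllqaq/SWE_Next_14B") -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first reply."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=60) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style responses carry the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the model's answer. For a server on another port, pass `base_url`, for example `base_url="http://localhost:30000"`.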

Use Docker

docker model run hf.co/lllqaq/SWE_Next_14B

SGLang

How to use lllqaq/SWE_Next_14B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lllqaq/SWE_Next_14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lllqaq/SWE_Next_14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lllqaq/SWE_Next_14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lllqaq/SWE_Next_14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use lllqaq/SWE_Next_14B with Docker Model Runner:
```
docker model run hf.co/lllqaq/SWE_Next_14B
```

SWE_Next_14B / trainer_state.json

lllqaq

Upload SWE_Next_14B SFT model

bed2609 verified about 1 month ago

22.3 kB

	{
	"best_global_step": null,
	"best_metric": null,
	"best_model_checkpoint": null,
	"epoch": 2.0,
	"eval_steps": 500,
	"global_step": 1232,
	"is_hyper_param_search": false,
	"is_local_process_zero": true,
	"is_world_process_zero": true,
	"log_history": [
	{
	"epoch": 0.016233766233766232,
	"grad_norm": 64.73149845425546,
	"learning_rate": 7.258064516129033e-07,
	"loss": 3.3198,
	"step": 10
	},
	{
	"epoch": 0.032467532467532464,
	"grad_norm": 33.62473079554,
	"learning_rate": 1.5322580645161292e-06,
	"loss": 3.0328,
	"step": 20
	},
	{
	"epoch": 0.048701298701298704,
	"grad_norm": 10.429545095678938,
	"learning_rate": 2.338709677419355e-06,
	"loss": 1.8054,
	"step": 30
	},
	{
	"epoch": 0.06493506493506493,
	"grad_norm": 2.1555639207187185,
	"learning_rate": 3.145161290322581e-06,
	"loss": 1.2848,
	"step": 40
	},
	{
	"epoch": 0.08116883116883117,
	"grad_norm": 2.109927665115888,
	"learning_rate": 3.951612903225807e-06,
	"loss": 0.9412,
	"step": 50
	},
	{
	"epoch": 0.09740259740259741,
	"grad_norm": 3.835176262683339,
	"learning_rate": 4.758064516129033e-06,
	"loss": 0.9246,
	"step": 60
	},
	{
	"epoch": 0.11363636363636363,
	"grad_norm": 2.3665002147902174,
	"learning_rate": 5.564516129032258e-06,
	"loss": 0.6407,
	"step": 70
	},
	{
	"epoch": 0.12987012987012986,
	"grad_norm": 2.0758395514553736,
	"learning_rate": 6.370967741935485e-06,
	"loss": 0.4818,
	"step": 80
	},
	{
	"epoch": 0.1461038961038961,
	"grad_norm": 1.3197745285431102,
	"learning_rate": 7.177419354838711e-06,
	"loss": 0.3839,
	"step": 90
	},
	{
	"epoch": 0.16233766233766234,
	"grad_norm": 1.167725739422115,
	"learning_rate": 7.983870967741935e-06,
	"loss": 0.3852,
	"step": 100
	},
	{
	"epoch": 0.17857142857142858,
	"grad_norm": 1.4271468826660396,
	"learning_rate": 8.790322580645163e-06,
	"loss": 0.4254,
	"step": 110
	},
	{
	"epoch": 0.19480519480519481,
	"grad_norm": 1.3041783516757963,
	"learning_rate": 9.596774193548389e-06,
	"loss": 0.3735,
	"step": 120
	},
	{
	"epoch": 0.21103896103896103,
	"grad_norm": 1.6445704066341653,
	"learning_rate": 9.999497549864013e-06,
	"loss": 0.3139,
	"step": 130
	},
	{
	"epoch": 0.22727272727272727,
	"grad_norm": 2.26609564584476,
	"learning_rate": 9.995478554650548e-06,
	"loss": 0.3639,
	"step": 140
	},
	{
	"epoch": 0.2435064935064935,
	"grad_norm": 1.3241916296568497,
	"learning_rate": 9.987443795012786e-06,
	"loss": 0.3085,
	"step": 150
	},
	{
	"epoch": 0.2597402597402597,
	"grad_norm": 1.8795485094172018,
	"learning_rate": 9.975399729931894e-06,
	"loss": 0.3401,
	"step": 160
	},
	{
	"epoch": 0.275974025974026,
	"grad_norm": 1.2518399347916536,
	"learning_rate": 9.959356041388799e-06,
	"loss": 0.3232,
	"step": 170
	},
	{
	"epoch": 0.2922077922077922,
	"grad_norm": 1.3254250765286968,
	"learning_rate": 9.939325626581032e-06,
	"loss": 0.3167,
	"step": 180
	},
	{
	"epoch": 0.30844155844155846,
	"grad_norm": 1.3464203708686748,
	"learning_rate": 9.915324587554933e-06,
	"loss": 0.2968,
	"step": 190
	},
	{
	"epoch": 0.3246753246753247,
	"grad_norm": 0.9391334903099222,
	"learning_rate": 9.887372218261547e-06,
	"loss": 0.2933,
	"step": 200
	},
	{
	"epoch": 0.3409090909090909,
	"grad_norm": 1.3224980977395502,
	"learning_rate": 9.8554909890466e-06,
	"loss": 0.3062,
	"step": 210
	},
	{
	"epoch": 0.35714285714285715,
	"grad_norm": 2.727455179143666,
	"learning_rate": 9.819706528587036e-06,
	"loss": 0.2979,
	"step": 220
	},
	{
	"epoch": 0.37337662337662336,
	"grad_norm": 1.5019191545607462,
	"learning_rate": 9.780047603288656e-06,
	"loss": 0.2838,
	"step": 230
	},
	{
	"epoch": 0.38961038961038963,
	"grad_norm": 1.6879557721862124,
	"learning_rate": 9.736546094161375e-06,
	"loss": 0.2995,
	"step": 240
	},
	{
	"epoch": 0.40584415584415584,
	"grad_norm": 1.4809220185905931,
	"learning_rate": 9.689236971190715e-06,
	"loss": 0.2975,
	"step": 250
	},
	{
	"epoch": 0.42207792207792205,
	"grad_norm": 1.9452778336930967,
	"learning_rate": 9.638158265226155e-06,
	"loss": 0.2862,
	"step": 260
	},
	{
	"epoch": 0.4383116883116883,
	"grad_norm": 2.1580249714158084,
	"learning_rate": 9.583351037408886e-06,
	"loss": 0.2805,
	"step": 270
	},
	{
	"epoch": 0.45454545454545453,
	"grad_norm": 2.2778945637898516,
	"learning_rate": 9.52485934616359e-06,
	"loss": 0.2783,
	"step": 280
	},
	{
	"epoch": 0.4707792207792208,
	"grad_norm": 1.3731917008379857,
	"learning_rate": 9.46273021178077e-06,
	"loss": 0.2539,
	"step": 290
	},
	{
	"epoch": 0.487012987012987,
	"grad_norm": 0.9640448637905654,
	"learning_rate": 9.397013578618073e-06,
	"loss": 0.2732,
	"step": 300
	},
	{
	"epoch": 0.5032467532467533,
	"grad_norm": 1.879194030041846,
	"learning_rate": 9.327762274951042e-06,
	"loss": 0.2789,
	"step": 310
	},
	{
	"epoch": 0.5194805194805194,
	"grad_norm": 2.0274067648187875,
	"learning_rate": 9.255031970505518e-06,
	"loss": 0.2995,
	"step": 320
	},
	{
	"epoch": 0.5357142857142857,
	"grad_norm": 1.1439713018352973,
	"learning_rate": 9.178881131705882e-06,
	"loss": 0.2626,
	"step": 330
	},
	{
	"epoch": 0.551948051948052,
	"grad_norm": 1.217033674995871,
	"learning_rate": 9.099370974675074e-06,
	"loss": 0.2437,
	"step": 340
	},
	{
	"epoch": 0.5681818181818182,
	"grad_norm": 1.1243152390495403,
	"learning_rate": 9.016565416024181e-06,
	"loss": 0.2676,
	"step": 350
	},
	{
	"epoch": 0.5844155844155844,
	"grad_norm": 1.4806002470901694,
	"learning_rate": 8.930531021471167e-06,
	"loss": 0.2656,
	"step": 360
	},
	{
	"epoch": 0.6006493506493507,
	"grad_norm": 1.1288154615791715,
	"learning_rate": 8.841336952330033e-06,
	"loss": 0.2906,
	"step": 370
	},
	{
	"epoch": 0.6168831168831169,
	"grad_norm": 1.492508732730717,
	"learning_rate": 8.749054909913439e-06,
	"loss": 0.2576,
	"step": 380
	},
	{
	"epoch": 0.6331168831168831,
	"grad_norm": 1.1458749024369619,
	"learning_rate": 8.653759077893453e-06,
	"loss": 0.264,
	"step": 390
	},
	{
	"epoch": 0.6493506493506493,
	"grad_norm": 0.9392039887042416,
	"learning_rate": 8.555526062666803e-06,
	"loss": 0.2606,
	"step": 400
	},
	{
	"epoch": 0.6655844155844156,
	"grad_norm": 1.891912924932221,
	"learning_rate": 8.454434831772544e-06,
	"loss": 0.2685,
	"step": 410
	},
	{
	"epoch": 0.6818181818181818,
	"grad_norm": 1.313388756424559,
	"learning_rate": 8.350566650411633e-06,
	"loss": 0.2611,
	"step": 420
	},
	{
	"epoch": 0.698051948051948,
	"grad_norm": 1.0777638252230697,
	"learning_rate": 8.244005016119482e-06,
	"loss": 0.2475,
	"step": 430
	},
	{
	"epoch": 0.7142857142857143,
	"grad_norm": 1.0188317103197562,
	"learning_rate": 8.13483559164398e-06,
	"loss": 0.2855,
	"step": 440
	},
	{
	"epoch": 0.7305194805194806,
	"grad_norm": 0.8703924340642463,
	"learning_rate": 8.02314613608292e-06,
	"loss": 0.2518,
	"step": 450
	},
	{
	"epoch": 0.7467532467532467,
	"grad_norm": 1.0173301268541866,
	"learning_rate": 7.909026434336252e-06,
	"loss": 0.2696,
	"step": 460
	},
	{
	"epoch": 0.762987012987013,
	"grad_norm": 1.2218157167293324,
	"learning_rate": 7.792568224929797e-06,
	"loss": 0.2612,
	"step": 470
	},
	{
	"epoch": 0.7792207792207793,
	"grad_norm": 1.1796113115060929,
	"learning_rate": 7.673865126268506e-06,
	"loss": 0.2506,
	"step": 480
	},
	{
	"epoch": 0.7954545454545454,
	"grad_norm": 1.376127532504407,
	"learning_rate": 7.55301256137851e-06,
	"loss": 0.2459,
	"step": 490
	},
	{
	"epoch": 0.8116883116883117,
	"grad_norm": 0.8520527207533138,
	"learning_rate": 7.430107681198477e-06,
	"loss": 0.2296,
	"step": 500
	},
	{
	"epoch": 0.827922077922078,
	"grad_norm": 1.6057696993562458,
	"learning_rate": 7.305249286481928e-06,
	"loss": 0.2707,
	"step": 510
	},
	{
	"epoch": 0.8441558441558441,
	"grad_norm": 1.1118264521022472,
	"learning_rate": 7.1785377483733045e-06,
	"loss": 0.2453,
	"step": 520
	},
	{
	"epoch": 0.8603896103896104,
	"grad_norm": 0.986033743735511,
	"learning_rate": 7.050074927721639e-06,
	"loss": 0.2653,
	"step": 530
	},
	{
	"epoch": 0.8766233766233766,
	"grad_norm": 1.1047991137937976,
	"learning_rate": 6.9199640931966615e-06,
	"loss": 0.2401,
	"step": 540
	},
	{
	"epoch": 0.8928571428571429,
	"grad_norm": 0.8182265905464219,
	"learning_rate": 6.788309838273211e-06,
	"loss": 0.2453,
	"step": 550
	},
	{
	"epoch": 0.9090909090909091,
	"grad_norm": 1.0242844709669037,
	"learning_rate": 6.655217997150642e-06,
	"loss": 0.2562,
	"step": 560
	},
	{
	"epoch": 0.9253246753246753,
	"grad_norm": 1.7261090545065465,
	"learning_rate": 6.520795559674851e-06,
	"loss": 0.2618,
	"step": 570
	},
	{
	"epoch": 0.9415584415584416,
	"grad_norm": 1.166838903170947,
	"learning_rate": 6.385150585331299e-06,
	"loss": 0.2445,
	"step": 580
	},
	{
	"epoch": 0.9577922077922078,
	"grad_norm": 1.0102156282992951,
	"learning_rate": 6.248392116378167e-06,
	"loss": 0.2381,
	"step": 590
	},
	{
	"epoch": 0.974025974025974,
	"grad_norm": 1.2728489163523269,
	"learning_rate": 6.110630090189493e-06,
	"loss": 0.2495,
	"step": 600
	},
	{
	"epoch": 0.9902597402597403,
	"grad_norm": 1.0402345685218206,
	"learning_rate": 5.971975250878722e-06,
	"loss": 0.2607,
	"step": 610
	},
	{
	"epoch": 1.0064935064935066,
	"grad_norm": 0.9570526081816403,
	"learning_rate": 5.832539060273763e-06,
	"loss": 0.2594,
	"step": 620
	},
	{
	"epoch": 1.0227272727272727,
	"grad_norm": 0.8166533778024863,
	"learning_rate": 5.692433608315059e-06,
	"loss": 0.1734,
	"step": 630
	},
	{
	"epoch": 1.0389610389610389,
	"grad_norm": 0.8521218142738738,
	"learning_rate": 5.5517715229487554e-06,
	"loss": 0.1661,
	"step": 640
	},
	{
	"epoch": 1.0551948051948052,
	"grad_norm": 2.0344524322648225,
	"learning_rate": 5.410665879587366e-06,
	"loss": 0.1773,
	"step": 650
	},
	{
	"epoch": 1.0714285714285714,
	"grad_norm": 1.0760586346052228,
	"learning_rate": 5.269230110210725e-06,
	"loss": 0.1832,
	"step": 660
	},
	{
	"epoch": 1.0876623376623376,
	"grad_norm": 0.6221075887847101,
	"learning_rate": 5.127577912180312e-06,
	"loss": 0.171,
	"step": 670
	},
	{
	"epoch": 1.103896103896104,
	"grad_norm": 2.4289309131257037,
	"learning_rate": 4.9858231568402325e-06,
	"loss": 0.1869,
	"step": 680
	},
	{
	"epoch": 1.12012987012987,
	"grad_norm": 0.7117445693553235,
	"learning_rate": 4.844079797978345e-06,
	"loss": 0.1715,
	"step": 690
	},
	{
	"epoch": 1.1363636363636362,
	"grad_norm": 2.0791195295672558,
	"learning_rate": 4.7024617802211105e-06,
	"loss": 0.1918,
	"step": 700
	},
	{
	"epoch": 1.1525974025974026,
	"grad_norm": 1.3384619147891226,
	"learning_rate": 4.5610829474358056e-06,
	"loss": 0.1849,
	"step": 710
	},
	{
	"epoch": 1.1688311688311688,
	"grad_norm": 0.7431163309985721,
	"learning_rate": 4.420056951213726e-06,
	"loss": 0.1706,
	"step": 720
	},
	{
	"epoch": 1.1850649350649352,
	"grad_norm": 1.0917320780107198,
	"learning_rate": 4.279497159507984e-06,
	"loss": 0.1774,
	"step": 730
	},
	{
	"epoch": 1.2012987012987013,
	"grad_norm": 0.6433212768530576,
	"learning_rate": 4.139516565499277e-06,
	"loss": 0.1725,
	"step": 740
	},
	{
	"epoch": 1.2175324675324675,
	"grad_norm": 1.0367698466098962,
	"learning_rate": 4.000227696762967e-06,
	"loss": 0.2098,
	"step": 750
	},
	{
	"epoch": 1.2337662337662338,
	"grad_norm": 1.1673609045846502,
	"learning_rate": 3.861742524810421e-06,
	"loss": 0.1837,
	"step": 760
	},
	{
	"epoch": 1.25,
	"grad_norm": 1.036708817011231,
	"learning_rate": 3.7241723750773812e-06,
	"loss": 0.1819,
	"step": 770
	},
	{
	"epoch": 1.2662337662337662,
	"grad_norm": 2.085452988223766,
	"learning_rate": 3.587627837431679e-06,
	"loss": 0.168,
	"step": 780
	},
	{
	"epoch": 1.2824675324675325,
	"grad_norm": 1.1525296924575061,
	"learning_rate": 3.4522186772722915e-06,
	"loss": 0.1516,
	"step": 790
	},
	{
	"epoch": 1.2987012987012987,
	"grad_norm": 1.9400938633777363,
	"learning_rate": 3.3180537472911334e-06,
	"loss": 0.1749,
	"step": 800
	},
	{
	"epoch": 1.314935064935065,
	"grad_norm": 0.8648790732830897,
	"learning_rate": 3.185240899968587e-06,
	"loss": 0.1665,
	"step": 810
	},
	{
	"epoch": 1.3311688311688312,
	"grad_norm": 1.1576768419228283,
	"learning_rate": 3.053886900873062e-06,
	"loss": 0.1847,
	"step": 820
	},
	{
	"epoch": 1.3474025974025974,
	"grad_norm": 0.7801672735784758,
	"learning_rate": 2.9240973428343135e-06,
	"loss": 0.1852,
	"step": 830
	},
	{
	"epoch": 1.3636363636363638,
	"grad_norm": 0.6679216934793272,
	"learning_rate": 2.79597656105949e-06,
	"loss": 0.1622,
	"step": 840
	},
	{
	"epoch": 1.37987012987013,
	"grad_norm": 2.0257365743503892,
	"learning_rate": 2.6696275492601726e-06,
	"loss": 0.2013,
	"step": 850
	},
	{
	"epoch": 1.396103896103896,
	"grad_norm": 1.3270716986636226,
	"learning_rate": 2.545151876857803e-06,
	"loss": 0.1926,
	"step": 860
	},
	{
	"epoch": 1.4123376623376624,
	"grad_norm": 1.4591313868086215,
	"learning_rate": 2.422649607334083e-06,
	"loss": 0.1865,
	"step": 870
	},
	{
	"epoch": 1.4285714285714286,
	"grad_norm": 0.718475522026256,
	"learning_rate": 2.3022192177919465e-06,
	"loss": 0.1704,
	"step": 880
	},
	{
	"epoch": 1.4448051948051948,
	"grad_norm": 0.8779553819300654,
	"learning_rate": 2.1839575197918156e-06,
	"loss": 0.1704,
	"step": 890
	},
	{
	"epoch": 1.4610389610389611,
	"grad_norm": 0.8981166608440325,
	"learning_rate": 2.0679595815267395e-06,
	"loss": 0.1894,
	"step": 900
	},
	{
	"epoch": 1.4772727272727273,
	"grad_norm": 1.8819536746716359,
	"learning_rate": 1.954318651398977e-06,
	"loss": 0.1838,
	"step": 910
	},
	{
	"epoch": 1.4935064935064934,
	"grad_norm": 1.2305604028991983,
	"learning_rate": 1.8431260830595126e-06,
	"loss": 0.1667,
	"step": 920
	},
	{
	"epoch": 1.5097402597402598,
	"grad_norm": 0.8829569169119944,
	"learning_rate": 1.7344712619706772e-06,
	"loss": 0.1588,
	"step": 930
	},
	{
	"epoch": 1.525974025974026,
	"grad_norm": 0.7948818738193303,
	"learning_rate": 1.6284415335509879e-06,
	"loss": 0.1743,
	"step": 940
	},
	{
	"epoch": 1.5422077922077921,
	"grad_norm": 2.0055570853229376,
	"learning_rate": 1.525122132959933e-06,
	"loss": 0.2021,
	"step": 950
	},
	{
	"epoch": 1.5584415584415585,
	"grad_norm": 1.0617311431240737,
	"learning_rate": 1.4245961165791344e-06,
	"loss": 0.1842,
	"step": 960
	},
	{
	"epoch": 1.5746753246753247,
	"grad_norm": 1.1336171320027224,
	"learning_rate": 1.326944295245009e-06,
	"loss": 0.1679,
	"step": 970
	},
	{
	"epoch": 1.5909090909090908,
	"grad_norm": 0.8275801739941498,
	"learning_rate": 1.2322451692865617e-06,
	"loss": 0.1649,
	"step": 980
	},
	{
	"epoch": 1.6071428571428572,
	"grad_norm": 1.133730110796439,
	"learning_rate": 1.1405748654205566e-06,
	"loss": 0.1455,
	"step": 990
	},
	{
	"epoch": 1.6233766233766234,
	"grad_norm": 1.4124753406859976,
	"learning_rate": 1.052007075554789e-06,
	"loss": 0.178,
	"step": 1000
	},
	{
	"epoch": 1.6396103896103895,
	"grad_norm": 1.4093028955797295,
	"learning_rate": 9.666129975486394e-07,
	"loss": 0.1811,
	"step": 1010
	},
	{
	"epoch": 1.655844155844156,
	"grad_norm": 1.0258966054963048,
	"learning_rate": 8.844612779785583e-07,
	"loss": 0.1714,
	"step": 1020
	},
	{
	"epoch": 1.672077922077922,
	"grad_norm": 0.8268048059208093,
	"learning_rate": 8.056179569544642e-07,
	"loss": 0.1684,
	"step": 1030
	},
	{
	"epoch": 1.6883116883116882,
	"grad_norm": 1.1758327197059883,
	"learning_rate": 7.301464150314313e-07,
	"loss": 0.1578,
	"step": 1040
	},
	{
	"epoch": 1.7045454545454546,
	"grad_norm": 1.3848141640091782,
	"learning_rate": 6.581073222593442e-07,
	"loss": 0.1841,
	"step": 1050
	},
	{
	"epoch": 1.7207792207792207,
	"grad_norm": 1.6039492944313922,
	"learning_rate": 5.89558589411463e-07,
	"loss": 0.1711,
	"step": 1060
	},
	{
	"epoch": 1.737012987012987,
	"grad_norm": 0.9846284823379949,
	"learning_rate": 5.245553214311283e-07,
	"loss": 0.1839,
	"step": 1070
	},
	{
	"epoch": 1.7532467532467533,
	"grad_norm": 0.8126959112794907,
	"learning_rate": 4.6314977313400065e-07,
	"loss": 0.1937,
	"step": 1080
	},
	{
	"epoch": 1.7694805194805194,
	"grad_norm": 0.912662021903635,
	"learning_rate": 4.053913072014748e-07,
	"loss": 0.1858,
	"step": 1090
	},
	{
	"epoch": 1.7857142857142856,
	"grad_norm": 2.2200942828573904,
	"learning_rate": 3.513263544990153e-07,
	"loss": 0.1668,
	"step": 1100
	},
	{
	"epoch": 1.801948051948052,
	"grad_norm": 1.9601203935539797,
	"learning_rate": 3.0099837675131525e-07,
	"loss": 0.1825,
	"step": 1110
	},
	{
	"epoch": 1.8181818181818183,
	"grad_norm": 1.8341781108000885,
	"learning_rate": 2.5444783160429975e-07,
	"loss": 0.1628,
	"step": 1120
	},
	{
	"epoch": 1.8344155844155843,
	"grad_norm": 0.9358762565242741,
	"learning_rate": 2.1171214010203723e-07,
	"loss": 0.1309,
	"step": 1130
	},
	{
	"epoch": 1.8506493506493507,
	"grad_norm": 0.5034065948539621,
	"learning_rate": 1.7282565660471483e-07,
	"loss": 0.1579,
	"step": 1140
	},
	{
	"epoch": 1.866883116883117,
	"grad_norm": 1.2734857167293319,
	"learning_rate": 1.3781964117186743e-07,
	"loss": 0.1515,
	"step": 1150
	},
	{
	"epoch": 1.883116883116883,
	"grad_norm": 0.9554751930775003,
	"learning_rate": 1.0672223443304042e-07,
	"loss": 0.1615,
	"step": 1160
	},
	{
	"epoch": 1.8993506493506493,
	"grad_norm": 1.2542362308070503,
	"learning_rate": 7.955843496610882e-08,
	"loss": 0.1533,
	"step": 1170
	},
	{
	"epoch": 1.9155844155844157,
	"grad_norm": 1.0939753700107246,
	"learning_rate": 5.6350079201422655e-08,
	"loss": 0.1799,
	"step": 1180
	},
	{
	"epoch": 1.9318181818181817,
	"grad_norm": 0.853147045662764,
	"learning_rate": 3.711582386794421e-08,
	"loss": 0.1704,
	"step": 1190
	},
	{
	"epoch": 1.948051948051948,
	"grad_norm": 1.278737413341965,
	"learning_rate": 2.1871130995476665e-08,
	"loss": 0.1924,
	"step": 1200
	},
	{
	"epoch": 1.9642857142857144,
	"grad_norm": 2.050204404171476,
	"learning_rate": 1.0628255485052308e-08,
	"loss": 0.1678,
	"step": 1210
	},
	{
	"epoch": 1.9805194805194806,
	"grad_norm": 0.8404109096247463,
	"learning_rate": 3.396235257464575e-09,
	"loss": 0.1525,
	"step": 1220
	},
	{
	"epoch": 1.9967532467532467,
	"grad_norm": 0.7586755640395566,
	"learning_rate": 1.8088398786586525e-10,
	"loss": 0.1676,
	"step": 1230
	},
	{
	"epoch": 2.0,
	"step": 1232,
	"total_flos": 610243285417984.0,
	"train_loss": 0.3112858794266721,
	"train_runtime": 22231.3385,
	"train_samples_per_second": 0.332,
	"train_steps_per_second": 0.055
	}
	],
	"logging_steps": 10,
	"max_steps": 1232,
	"num_input_tokens_seen": 0,
	"num_train_epochs": 2,
	"save_steps": 10000,
	"stateful_callbacks": {
	"TrainerControl": {
	"args": {
	"should_epoch_stop": false,
	"should_evaluate": false,
	"should_log": false,
	"should_save": true,
	"should_training_stop": true
	},
	"attributes": {}
	}
	},
	"total_flos": 610243285417984.0,
	"train_batch_size": 1,
	"trial_name": null,
	"trial_params": null
	}
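
The log history above can be inspected programmatically. The sketch below assumes a local copy of the file named `trainer_state.json` (a hypothetical path) and uses hypothetical helper names. The logged learning rates look consistent with linear warmup followed by cosine decay; `PEAK_LR` and `WARMUP_STEPS` are values inferred from the logged numbers, not settings stated in the file.

```python
import json
import math
from pathlib import Path

# Inferred from the logged values above, not stated in the file:
PEAK_LR = 1e-5        # assumed peak learning rate
WARMUP_STEPS = 124    # assumed warmup length (about 10% of max_steps)
MAX_STEPS = 1232      # "max_steps" in trainer_state.json


def expected_lr(logged_step: int) -> float:
    """Learning rate the assumed schedule predicts for a log entry at `logged_step`."""
    s = logged_step - 1  # the logged value appears to lag the step counter by one
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS  # linear warmup
    progress = (s - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay


def mean_loss_by_epoch(log_history: list[dict]) -> dict[int, float]:
    """Average the logged training loss within each epoch."""
    buckets: dict[int, list[float]] = {}
    for entry in log_history:
        if "loss" in entry:  # the final summary entry has no "loss" key and is skipped
            buckets.setdefault(int(entry["epoch"]), []).append(entry["loss"])
    return {epoch: sum(v) / len(v) for epoch, v in sorted(buckets.items())}


def load_log_history(path: str = "trainer_state.json") -> list[dict]:
    """Read "log_history" from a local copy of the file (the path is an assumption)."""
    return json.loads(Path(path).read_text())["log_history"]


# Spot-check the assumed schedule against three entries copied from the log above.
for step, logged in [(10, 7.258064516129033e-07),
                     (130, 9.999497549864013e-06),
                     (1230, 1.8088398786586525e-10)]:
    print(step, logged, expected_lr(step))
```

Calling `mean_loss_by_epoch(load_log_history())` on a downloaded copy gives one average per epoch, which makes the drop in loss between the first and second epoch easier to see than in the raw entries.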