Instructions for using SystemAdmin123/SmolLM-360M-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use SystemAdmin123/SmolLM-360M-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SystemAdmin123/SmolLM-360M-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SystemAdmin123/SmolLM-360M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SystemAdmin123/SmolLM-360M-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle

vLLM

How to use SystemAdmin123/SmolLM-360M-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SystemAdmin123/SmolLM-360M-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SystemAdmin123/SmolLM-360M-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
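You can send the same request from Python using only the standard library. This is a minimal sketch. It assumes the vLLM server above is listening on `http://localhost:8000`. `build_chat_request` and `send` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST as the curl example (OpenAI-compatible chat completions route)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_chat_request("http://localhost:8000", "SystemAdmin123/SmolLM-360M-Instruct", "What is the capital of France?"))` returns the decoded JSON body. To target the SGLang server described below instead, set `base_url` to port 30000.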

Use Docker

docker model run hf.co/SystemAdmin123/SmolLM-360M-Instruct

SGLang

How to use SystemAdmin123/SmolLM-360M-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SystemAdmin123/SmolLM-360M-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SystemAdmin123/SmolLM-360M-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
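Both the vLLM and SGLang servers reply with an OpenAI-style chat completion object, so the generated text sits at `choices[0].message.content`. Below is a minimal sketch for extracting it, assuming that response shape. `extract_reply` is a hypothetical helper. `sample` is an illustrative placeholder, not output captured from this model.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-compatible chat completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content


# Illustrative response shape only (placeholder values, not real server output).
sample = {
    "object": "chat.completion",
    "model": "SystemAdmin123/SmolLM-360M-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply>"},
            "finish_reason": "stop",
        }
    ],
}
```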

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SystemAdmin123/SmolLM-360M-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SystemAdmin123/SmolLM-360M-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
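Loading the model can take a while after the container starts. Requests sent before the server is listening fail with a connection error. Below is a minimal polling sketch. It assumes the OpenAI-compatible `GET /v1/models` route answers with HTTP 200 once the server is up. `wait_until_ready` is a hypothetical helper. Its `opener`, `sleep` and `clock` parameters exist only so it can be tested without a live server.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(
    url: str = "http://localhost:30000/v1/models",
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll `url` until it returns HTTP 200; return how many attempts it took."""
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            # Per-request timeout so a half-open connection cannot stall the loop.
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return attempts
        except OSError:
            # URLError, refused connections and socket timeouts are all OSError subclasses.
            pass
        if clock() >= deadline:
            raise TimeoutError(f"{url} was not ready within {timeout_s} s")
        sleep(interval_s)
```

Call `wait_until_ready()` before the first curl request. The default URL assumes the `-p 30000:30000` port mapping shown above.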

Docker Model Runner
How to use SystemAdmin123/SmolLM-360M-Instruct with Docker Model Runner:
```
docker model run hf.co/SystemAdmin123/SmolLM-360M-Instruct
```

SystemAdmin123 committed on Feb 4, 2025

Commit 609a60c (verified) · 1 parent: e322f7b

Training in progress, step 800, checkpoint

Files changed (5)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +299 -3

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dcb42890fd3e3733df15325188e71ea98cd125dad14aa982eb9d9229b15a8bdc
+oid sha256:a4dc1cc1cc9b54bfe6d9ce46c6c48d5e71549cfb18e37c573da51b96a6b7c6fc
 size 723674912

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7aaaf6afb1371a4f19d7057b3f5ca8fca65fcc7a584da182d9f36fb7032085bb
+oid sha256:62c75a7c6cde713dbd13598c54106d019899ffd4c0fae4ead85e640a976d6251
 size 735625626

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9feae33b2fec0a6229240e7adaee6ecc8f5cfdf1a8bd0e827b1d8a241424e3c0
+oid sha256:3c431bcafebc4c8ee346d130e382b11c81be579ca0bfd3918fae07b16e10b92f
 size 14244

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a673aaf85c0fe6b6c29cb8f3e7dbd829eef637110e4ad9a775f3fcf001c92591
+oid sha256:40b6b717644e21f80a22ec98694b3a2fd9d62a6467e549d64314725dba905d52
 size 1064
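The `trainer_state.json` diff that follows appends `log_history` entries for steps 410 to 800. The logged `learning_rate` values fit a linear warmup followed by half-cosine decay, the shape produced by `get_cosine_schedule_with_warmup` in Transformers. The fitted parameters are a peak of 2e-4, 125 warmup steps and 2,500 total steps. None of these three settings is recorded in the diff: they were inferred by fitting the logged values, so treat them as assumptions. The sketch below recomputes three logged rates from that formula. It also derives the epoch length from the `epoch` and `global_step` fields.

```python
import math

# Assumed schedule settings: inferred by fitting the logged values, NOT recorded in the diff.
PEAK_LR = 2e-4
WARMUP_STEPS = 125
TOTAL_STEPS = 2500


def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay reaching 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from log_history in this commit's trainer_state.json.
LOGGED = [
    (410, 0.00019297764858882514),
    (600, 0.00018090169943749476),
    (800, 0.0001627176358473537),
]

for step, logged in LOGGED:
    print(f"step {step}: logged {logged:.12e}  formula {cosine_lr(step):.12e}")

# Within the first epoch the epoch counter grows linearly with steps,
# so steps per epoch = global_step / epoch.
GLOBAL_STEP = 800
EPOCH = 0.23675643681562591
steps_per_epoch = GLOBAL_STEP / EPOCH
print(f"steps per epoch: {steps_per_epoch:.1f}")
```

Under the same assumptions, 2,500 steps at about 3,379 steps per epoch would end the run at roughly 0.74 of an epoch.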

last-checkpoint/trainer_state.json CHANGED

@@ -1,9 +1,9 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.11837821840781296,
   "eval_steps": 200,
-  "global_step": 400,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -311,6 +311,302 @@
       "eval_samples_per_second": 63.11,
       "eval_steps_per_second": 15.798,
       "step": 400
     }
   ],
   "logging_steps": 10,
@@ -330,7 +626,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 6216909638860800.0,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null

 {
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.23675643681562591,
   "eval_steps": 200,
+  "global_step": 800,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "eval_samples_per_second": 63.11,
       "eval_steps_per_second": 15.798,
       "step": 400
+    },
+    {
+      "epoch": 0.12133767386800828,
+      "grad_norm": 1.9296875,
+      "learning_rate": 0.00019297764858882514,
+      "loss": 2.4526,
+      "step": 410
+    },
+    {
+      "epoch": 0.12429712932820361,
+      "grad_norm": 3.765625,
+      "learning_rate": 0.00019248258232139388,
+      "loss": 2.2642,
+      "step": 420
+    },
+    {
+      "epoch": 0.12725658478839894,
+      "grad_norm": 4.25,
+      "learning_rate": 0.00019197133427991436,
+      "loss": 2.1835,
+      "step": 430
+    },
+    {
+      "epoch": 0.13021604024859426,
+      "grad_norm": 3.8125,
+      "learning_rate": 0.00019144399391799043,
+      "loss": 2.0152,
+      "step": 440
+    },
+    {
+      "epoch": 0.1331754957087896,
+      "grad_norm": 26.875,
+      "learning_rate": 0.00019090065350491626,
+      "loss": 2.4485,
+      "step": 450
+    },
+    {
+      "epoch": 0.1361349511689849,
+      "grad_norm": 2.046875,
+      "learning_rate": 0.0001903414081095315,
+      "loss": 2.3844,
+      "step": 460
+    },
+    {
+      "epoch": 0.13909440662918024,
+      "grad_norm": 2.265625,
+      "learning_rate": 0.00018976635558358722,
+      "loss": 2.1219,
+      "step": 470
+    },
+    {
+      "epoch": 0.14205386208937557,
+      "grad_norm": 3.9375,
+      "learning_rate": 0.00018917559654462474,
+      "loss": 2.1883,
+      "step": 480
+    },
+    {
+      "epoch": 0.1450133175495709,
+      "grad_norm": 5.1875,
+      "learning_rate": 0.00018856923435837022,
+      "loss": 2.1962,
+      "step": 490
+    },
+    {
+      "epoch": 0.1479727730097662,
+      "grad_norm": 19.625,
+      "learning_rate": 0.0001879473751206489,
+      "loss": 1.5941,
+      "step": 500
+    },
+    {
+      "epoch": 0.15093222846996152,
+      "grad_norm": 2.125,
+      "learning_rate": 0.00018731012763882133,
+      "loss": 2.4096,
+      "step": 510
+    },
+    {
+      "epoch": 0.15389168393015684,
+      "grad_norm": 2.828125,
+      "learning_rate": 0.00018665760341274505,
+      "loss": 2.1982,
+      "step": 520
+    },
+    {
+      "epoch": 0.15685113939035217,
+      "grad_norm": 3.796875,
+      "learning_rate": 0.00018598991661526572,
+      "loss": 2.28,
+      "step": 530
+    },
+    {
+      "epoch": 0.1598105948505475,
+      "grad_norm": 3.796875,
+      "learning_rate": 0.00018530718407223974,
+      "loss": 2.2208,
+      "step": 540
+    },
+    {
+      "epoch": 0.16277005031074282,
+      "grad_norm": 19.125,
+      "learning_rate": 0.00018460952524209355,
+      "loss": 2.0021,
+      "step": 550
+    },
+    {
+      "epoch": 0.16572950577093815,
+      "grad_norm": 2.875,
+      "learning_rate": 0.00018389706219492147,
+      "loss": 2.2805,
+      "step": 560
+    },
+    {
+      "epoch": 0.16868896123113347,
+      "grad_norm": 2.296875,
+      "learning_rate": 0.00018316991959112716,
+      "loss": 2.3686,
+      "step": 570
+    },
+    {
+      "epoch": 0.1716484166913288,
+      "grad_norm": 2.78125,
+      "learning_rate": 0.00018242822465961176,
+      "loss": 1.8721,
+      "step": 580
+    },
+    {
+      "epoch": 0.17460787215152412,
+      "grad_norm": 8.0,
+      "learning_rate": 0.00018167210717551224,
+      "loss": 2.078,
+      "step": 590
+    },
+    {
+      "epoch": 0.17756732761171945,
+      "grad_norm": 17.625,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 1.906,
+      "step": 600
+    },
+    {
+      "epoch": 0.17756732761171945,
+      "eval_loss": 2.1775083541870117,
+      "eval_runtime": 24.018,
+      "eval_samples_per_second": 62.536,
+      "eval_steps_per_second": 15.655,
+      "step": 600
+    },
+    {
+      "epoch": 0.18052678307191478,
+      "grad_norm": 2.734375,
+      "learning_rate": 0.00018011713624460608,
+      "loss": 2.0593,
+      "step": 610
+    },
+    {
+      "epoch": 0.1834862385321101,
+      "grad_norm": 2.203125,
+      "learning_rate": 0.00017931855487268782,
+      "loss": 2.0449,
+      "step": 620
+    },
+    {
+      "epoch": 0.18644569399230543,
+      "grad_norm": 3.0625,
+      "learning_rate": 0.0001785060950503568,
+      "loss": 2.4332,
+      "step": 630
+    },
+    {
+      "epoch": 0.18940514945250073,
+      "grad_norm": 10.5,
+      "learning_rate": 0.00017767989893455698,
+      "loss": 2.2297,
+      "step": 640
+    },
+    {
+      "epoch": 0.19236460491269605,
+      "grad_norm": 19.375,
+      "learning_rate": 0.00017684011108568592,
+      "loss": 2.2807,
+      "step": 650
+    },
+    {
+      "epoch": 0.19532406037289138,
+      "grad_norm": 2.125,
+      "learning_rate": 0.00017598687844230088,
+      "loss": 2.4388,
+      "step": 660
+    },
+    {
+      "epoch": 0.1982835158330867,
+      "grad_norm": 2.8125,
+      "learning_rate": 0.00017512035029540885,
+      "loss": 2.1782,
+      "step": 670
+    },
+    {
+      "epoch": 0.20124297129328203,
+      "grad_norm": 5.15625,
+      "learning_rate": 0.000174240678262345,
+      "loss": 2.2403,
+      "step": 680
+    },
+    {
+      "epoch": 0.20420242675347736,
+      "grad_norm": 6.28125,
+      "learning_rate": 0.000173348016260244,
+      "loss": 1.9472,
+      "step": 690
+    },
+    {
+      "epoch": 0.20716188221367268,
+      "grad_norm": 26.0,
+      "learning_rate": 0.00017244252047910892,
+      "loss": 1.854,
+      "step": 700
+    },
+    {
+      "epoch": 0.210121337673868,
+      "grad_norm": 2.140625,
+      "learning_rate": 0.00017152434935448256,
+      "loss": 2.215,
+      "step": 710
+    },
+    {
+      "epoch": 0.21308079313406333,
+      "grad_norm": 2.484375,
+      "learning_rate": 0.0001705936635397259,
+      "loss": 2.3141,
+      "step": 720
+    },
+    {
+      "epoch": 0.21604024859425866,
+      "grad_norm": 3.59375,
+      "learning_rate": 0.00016965062587790823,
+      "loss": 2.1083,
+      "step": 730
+    },
+    {
+      "epoch": 0.218999704054454,
+      "grad_norm": 2.71875,
+      "learning_rate": 0.00016869540137331445,
+      "loss": 1.9076,
+      "step": 740
+    },
+    {
+      "epoch": 0.2219591595146493,
+      "grad_norm": 9.75,
+      "learning_rate": 0.00016772815716257412,
+      "loss": 1.9295,
+      "step": 750
+    },
+    {
+      "epoch": 0.22491861497484464,
+      "grad_norm": 1.7890625,
+      "learning_rate": 0.00016674906248541726,
+      "loss": 2.3024,
+      "step": 760
+    },
+    {
+      "epoch": 0.22787807043503996,
+      "grad_norm": 3.765625,
+      "learning_rate": 0.00016575828865506245,
+      "loss": 2.1123,
+      "step": 770
+    },
+    {
+      "epoch": 0.2308375258952353,
+      "grad_norm": 4.15625,
+      "learning_rate": 0.0001647560090282419,
+      "loss": 2.0384,
+      "step": 780
+    },
+    {
+      "epoch": 0.2337969813554306,
+      "grad_norm": 6.5625,
+      "learning_rate": 0.000163742398974869,
+      "loss": 1.9151,
+      "step": 790
+    },
+    {
+      "epoch": 0.23675643681562591,
+      "grad_norm": 43.0,
+      "learning_rate": 0.0001627176358473537,
+      "loss": 2.1016,
+      "step": 800
+    },
+    {
+      "epoch": 0.23675643681562591,
+      "eval_loss": 2.1602160930633545,
+      "eval_runtime": 24.245,
+      "eval_samples_per_second": 61.951,
+      "eval_steps_per_second": 15.508,
+      "step": 800
     }
   ],
   "logging_steps": 10,
       "attributes": {}
     }
   },
+  "total_flos": 1.24028893790208e+16,
   "train_batch_size": 4,
   "trial_name": null,
   "trial_params": null