Azrail committed on Apr 16, 2025

Commit 00baf96 (verified) · 1 parent: d3159bb

Training in progress, step 6000, checkpoint

Files changed (5)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +222 -4
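
The five files above look like a resumable Hugging Face Trainer checkpoint: weights, optimizer state, RNG state, scheduler state, and the trainer's bookkeeping JSON. As a minimal sketch, assuming `last-checkpoint/` has been downloaded locally, a checkpoint's progress can be read from `trainer_state.json` using only fields that appear in this diff (`global_step`, `max_steps`, `epoch`). The helper name `checkpoint_progress` is hypothetical, and the demo writes a stand-in file rather than reading the real one.

```python
import json
import tempfile
from pathlib import Path


def checkpoint_progress(checkpoint_dir):
    """Read global_step, max_steps and epoch from a checkpoint's trainer_state.json."""
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    state = json.loads(state_path.read_text())
    step, total = state["global_step"], state["max_steps"]
    return {
        "step": step,
        "max_steps": total,
        "fraction": step / total,
        "epoch": state["epoch"],
    }


# Demo: a stand-in directory holding only the fields used above,
# filled with the values this commit records.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "trainer_state.json").write_text(
        json.dumps(
            {"global_step": 6000, "max_steps": 16568, "epoch": 1.4483003153431808}
        )
    )
    progress = checkpoint_progress(demo_dir)

print(f"step {progress['step']} of {progress['max_steps']} ({progress['fraction']:.1%})")
```

With the values from this commit, the run is a little over a third of the way through its scheduled steps.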

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dcb3e56d4c71b4fe3907ac3f7a21f7c5b645b6f7b5077a46679eb62578db2183
+oid sha256:c1caf9f1f88fe44200f0109ef94036e80d461ce19b24f4f0bd4876dbe777e923
 size 150625560
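
Each large file in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. Only the `oid` changes between checkpoints because the tensor shapes, and therefore the file size, stay the same. The sketch below parses a pointer and checks a downloaded file against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and any local file path passed to them is an assumption.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so large checkpoint files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The new pointer for last-checkpoint/model.safetensors from this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c1caf9f1f88fe44200f0109ef94036e80d461ce19b24f4f0bd4876dbe777e923\n"
    "size 150625560\n"
)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a downloaded copy would then confirm that the file corresponds to this commit rather than an earlier checkpoint.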

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5cb3c4740043f09a91cb2957024d48735e7f0fa83989925c2f015f5c9071410b
+oid sha256:4068ffb4d3758e775c1a9defa029bef0ee2704e1a030885062d27923342c7485
 size 602335994

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:73c97fed542b1263f810594e9084ec5dd9fdff08a7e12c9f17ae2b74518f1304
+oid sha256:fa90ad2b309f532962514f4faece20cc26bddf7f653b06dc572ffee5bcd113ac
 size 14244

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5c4ce97b38b7fb778eb543562838e653bbb8adc096b47968a332aa5700d8c5ce
+oid sha256:df3390e9a2d585410c5108433534a385ee7eeb522bbd6ab3b6fa3aad2ff13812
 size 1064

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 1.2068892677701164,
   "eval_steps": 500,
-  "global_step": 5000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -1098,11 +1098,229 @@
       "eval_steps_per_second": 20.417,
       "num_input_tokens_seen": 2414871648,
       "step": 5000
     }
   ],
   "logging_steps": 50,
   "max_steps": 16568,
-  "num_input_tokens_seen": 2414871648,
   "num_train_epochs": 4,
   "save_steps": 1000,
   "stateful_callbacks": {
@@ -1117,7 +1335,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 6.460017349872845e+17,
   "train_batch_size": 16,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 1.4483003153431808,
   "eval_steps": 500,
+  "global_step": 6000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "eval_steps_per_second": 20.417,
       "num_input_tokens_seen": 2414871648,
       "step": 5000
+    },
+    {
+      "epoch": 1.2189598201487695,
+      "grad_norm": 0.25,
+      "learning_rate": 4.34510336502188e-05,
+      "loss": 2.108,
+      "mean_token_accuracy": 0.5514175926893949,
+      "num_input_tokens_seen": 2438963872,
+      "num_tokens": 1028143121.0,
+      "step": 5050
+    },
+    {
+      "epoch": 1.2310303725274228,
+      "grad_norm": 0.2421875,
+      "learning_rate": 4.3262411347517734e-05,
+      "loss": 2.1066,
+      "mean_token_accuracy": 0.5526730781793594,
+      "num_input_tokens_seen": 2463130960,
+      "num_tokens": 1038274786.0,
+      "step": 5100
+    },
+    {
+      "epoch": 1.243100924906076,
+      "grad_norm": 0.2353515625,
+      "learning_rate": 4.307378904481666e-05,
+      "loss": 2.1011,
+      "mean_token_accuracy": 0.5543517142161727,
+      "num_input_tokens_seen": 2487402736,
+      "num_tokens": 1048479252.0,
+      "step": 5150
+    },
+    {
+      "epoch": 1.2551714772847293,
+      "grad_norm": 0.265625,
+      "learning_rate": 4.288516674211559e-05,
+      "loss": 2.1021,
+      "mean_token_accuracy": 0.5538267828151584,
+      "num_input_tokens_seen": 2511451728,
+      "num_tokens": 1058650745.0,
+      "step": 5200
+    },
+    {
+      "epoch": 1.2672420296633824,
+      "grad_norm": 0.30859375,
+      "learning_rate": 4.2696544439414524e-05,
+      "loss": 2.0863,
+      "mean_token_accuracy": 0.5557815081253648,
+      "num_input_tokens_seen": 2535548592,
+      "num_tokens": 1068882104.0,
+      "step": 5250
+    },
+    {
+      "epoch": 1.2793125820420357,
+      "grad_norm": 0.306640625,
+      "learning_rate": 4.250792213671345e-05,
+      "loss": 2.1063,
+      "mean_token_accuracy": 0.5531226889789105,
+      "num_input_tokens_seen": 2559719664,
+      "num_tokens": 1079065265.0,
+      "step": 5300
+    },
+    {
+      "epoch": 1.291383134420689,
+      "grad_norm": 0.263671875,
+      "learning_rate": 4.2319299834012374e-05,
+      "loss": 2.1104,
+      "mean_token_accuracy": 0.5524419481307268,
+      "num_input_tokens_seen": 2584073280,
+      "num_tokens": 1089341978.0,
+      "step": 5350
+    },
+    {
+      "epoch": 1.303453686799342,
+      "grad_norm": 0.244140625,
+      "learning_rate": 4.21306775313113e-05,
+      "loss": 2.1044,
+      "mean_token_accuracy": 0.5532321387529373,
+      "num_input_tokens_seen": 2608296624,
+      "num_tokens": 1099642346.0,
+      "step": 5400
+    },
+    {
+      "epoch": 1.3155242391779953,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 4.194205522861023e-05,
+      "loss": 2.1115,
+      "mean_token_accuracy": 0.5528482471778989,
+      "num_input_tokens_seen": 2632421856,
+      "num_tokens": 1109721280.0,
+      "step": 5450
+    },
+    {
+      "epoch": 1.3275947915566486,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 4.1753432925909163e-05,
+      "loss": 2.1009,
+      "num_input_tokens_seen": 2656567344,
+      "step": 5500
+    },
+    {
+      "epoch": 1.3275947915566486,
+      "eval_loss": 1.9779127836227417,
+      "eval_mean_token_accuracy": 0.5769637392206456,
+      "eval_num_tokens": 1119903809.0,
+      "eval_runtime": 131.3767,
+      "eval_samples_per_second": 81.537,
+      "eval_steps_per_second": 20.384,
+      "num_input_tokens_seen": 2656567344,
+      "step": 5500
+    },
+    {
+      "epoch": 1.339665343935302,
+      "grad_norm": 0.26171875,
+      "learning_rate": 4.156481062320809e-05,
+      "loss": 2.1059,
+      "mean_token_accuracy": 0.5533521883934737,
+      "num_input_tokens_seen": 2680728000,
+      "num_tokens": 1130022087.0,
+      "step": 5550
+    },
+    {
+      "epoch": 1.3517358963139552,
+      "grad_norm": 0.25390625,
+      "learning_rate": 4.137618832050702e-05,
+      "loss": 2.0992,
+      "mean_token_accuracy": 0.5542617355659604,
+      "num_input_tokens_seen": 2704833792,
+      "num_tokens": 1140249458.0,
+      "step": 5600
+    },
+    {
+      "epoch": 1.3638064486926083,
+      "grad_norm": 0.267578125,
+      "learning_rate": 4.1187566017805946e-05,
+      "loss": 2.0977,
+      "mean_token_accuracy": 0.5540939109772444,
+      "num_input_tokens_seen": 2729074544,
+      "num_tokens": 1150474886.0,
+      "step": 5650
+    },
+    {
+      "epoch": 1.3758770010712615,
+      "grad_norm": 0.294921875,
+      "learning_rate": 4.099894371510488e-05,
+      "loss": 2.0995,
+      "mean_token_accuracy": 0.553785107024014,
+      "num_input_tokens_seen": 2753196608,
+      "num_tokens": 1160634529.0,
+      "step": 5700
+    },
+    {
+      "epoch": 1.3879475534499148,
+      "grad_norm": 0.26171875,
+      "learning_rate": 4.081032141240381e-05,
+      "loss": 2.1066,
+      "mean_token_accuracy": 0.5523933649063111,
+      "num_input_tokens_seen": 2777300400,
+      "num_tokens": 1170864545.0,
+      "step": 5750
+    },
+    {
+      "epoch": 1.400018105828568,
+      "grad_norm": 0.291015625,
+      "learning_rate": 4.0621699109702735e-05,
+      "loss": 2.1023,
+      "mean_token_accuracy": 0.5536971531435847,
+      "num_input_tokens_seen": 2801426672,
+      "num_tokens": 1181051604.0,
+      "step": 5800
+    },
+    {
+      "epoch": 1.4120886582072212,
+      "grad_norm": 0.267578125,
+      "learning_rate": 4.043307680700166e-05,
+      "loss": 2.1042,
+      "mean_token_accuracy": 0.5537538637593389,
+      "num_input_tokens_seen": 2825621648,
+      "num_tokens": 1191210311.0,
+      "step": 5850
+    },
+    {
+      "epoch": 1.4241592105858745,
+      "grad_norm": 0.29296875,
+      "learning_rate": 4.0244454504300586e-05,
+      "loss": 2.1221,
+      "mean_token_accuracy": 0.5503192816674709,
+      "num_input_tokens_seen": 2849863744,
+      "num_tokens": 1201379955.0,
+      "step": 5900
+    },
+    {
+      "epoch": 1.4362297629645275,
+      "grad_norm": 0.30859375,
+      "learning_rate": 4.005583220159952e-05,
+      "loss": 2.0984,
+      "mean_token_accuracy": 0.5546167600527405,
+      "num_input_tokens_seen": 2874100544,
+      "num_tokens": 1211495441.0,
+      "step": 5950
+    },
+    {
+      "epoch": 1.4483003153431808,
+      "grad_norm": 0.267578125,
+      "learning_rate": 3.986720989889845e-05,
+      "loss": 2.0976,
+      "num_input_tokens_seen": 2898379392,
+      "step": 6000
+    },
+    {
+      "epoch": 1.4483003153431808,
+      "eval_loss": 1.9750181436538696,
+      "eval_mean_token_accuracy": 0.5774352134075336,
+      "eval_num_tokens": 1221766201.0,
+      "eval_runtime": 130.8087,
+      "eval_samples_per_second": 81.891,
+      "eval_steps_per_second": 20.473,
+      "num_input_tokens_seen": 2898379392,
+      "step": 6000
     }
   ],
   "logging_steps": 50,
   "max_steps": 16568,
+  "num_input_tokens_seen": 2898379392,
   "num_train_epochs": 4,
   "save_steps": 1000,
   "stateful_callbacks": {
       "attributes": {}
     }
   },
+  "total_flos": 7.753447755428659e+17,
   "train_batch_size": 16,
   "trial_name": null,
   "trial_params": null
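
Two quick sanity checks on the numbers logged above, as a sketch rather than a reconstruction of the actual training configuration. First, the logged learning rates fall by a constant amount per step and extrapolate to zero at `max_steps` = 16568, which is what a linear-decay schedule would produce. The log only pins down the ratio of peak rate to decay length; the peak of 5e-5 and the 3314 warmup steps used below are one plausible pair inferred from the values, not something the commit states. Second, if `eval_loss` is a mean per-token cross-entropy in nats (an assumption), its exponential gives a perplexity.

```python
import math

MAX_STEPS = 16568     # "max_steps" from trainer_state.json
PEAK_LR = 5e-5        # assumption: inferred from the logged values
WARMUP_STEPS = 3314   # assumption: inferred from the logged values


def linear_decay_lr(step):
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)


def perplexity(mean_token_loss):
    """Perplexity for a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_token_loss)


# Learning rates copied from the log entries at these steps.
logged_lr = {
    5050: 4.34510336502188e-05,
    5500: 4.1753432925909163e-05,
    6000: 3.986720989889845e-05,
}
for step, lr in logged_lr.items():
    agrees = math.isclose(lr, linear_decay_lr(step), rel_tol=1e-6)
    print(step, lr, linear_decay_lr(step), agrees)

# Evaluation losses copied from the step 5500 and step 6000 eval entries.
for step, loss in {5500: 1.9779127836227417, 6000: 1.9750181436538696}.items():
    print(step, round(perplexity(loss), 3))
```

The evaluation loss moves only slightly between steps 5500 and 6000, so the corresponding perplexities differ in the second decimal place.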