Instructions for using Azrail/smallm_70_instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Azrail/smallm_70_instruct with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Azrail/smallm_70_instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Azrail/smallm_70_instruct", trust_remote_code=True, dtype="auto")
```
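When the text-generation pipeline is given a list of chat messages, recent Transformers releases return the whole conversation under `generated_text`, with the model's reply as the last message. A minimal sketch for pulling that reply out of the result of `pipe(messages)`; the helper name and the sample output below are hypothetical, and the exact return shape may differ between Transformers versions.

```python
from typing import Any


def last_assistant_reply(pipeline_output: list[dict[str, Any]]) -> str:
    """Return the final assistant turn from a chat-style pipeline result.

    Assumes the shape [{"generated_text": [<messages>...]}], where the last
    message is the newly generated assistant turn.
    """
    conversation = pipeline_output[0]["generated_text"]
    if isinstance(conversation, str):
        # Plain-string prompts yield a string (prompt plus continuation by default).
        return conversation
    last = conversation[-1]
    if last.get("role") != "assistant":
        raise ValueError("expected the final message to be the assistant turn")
    return last["content"]


# Hypothetical output, shaped like a chat-mode pipeline result.
example_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "I am a small language model."},
        ]
    }
]
print(last_assistant_reply(example_output))  # prints: I am a small language model.

# Usage with the pipeline above: reply = last_assistant_reply(pipe(messages))
```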

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Azrail/smallm_70_instruct with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (it stays in the foreground):
vllm serve "Azrail/smallm_70_instruct"
# Call the server from a second terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Azrail/smallm_70_instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
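The curl call can also be made from Python with only the standard library. A minimal sketch, assuming the server listens on the default `http://localhost:8000` used above; `build_chat_request` and `send` are hypothetical helper names. Building the request needs no server; sending it does.

```python
import json
import urllib.request

# Assumes the default `vllm serve` address; adjust host/port for other setups.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Azrail/smallm_70_instruct"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST /chat/completions request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, once the server is up:
#   reply = send(build_chat_request(BASE_URL, MODEL_ID, "What is the capital of France?"))
```

Because SGLang exposes the same OpenAI-compatible route, pointing `BASE_URL` at `http://localhost:30000/v1` should work for the SGLang server as well.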

Use Docker

```bash
docker model run hf.co/Azrail/smallm_70_instruct
```

SGLang

How to use Azrail/smallm_70_instruct with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it stays in the foreground):
python3 -m sglang.launch_server \
    --model-path "Azrail/smallm_70_instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server from a second terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Azrail/smallm_70_instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
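Both the vLLM and SGLang servers answer with an OpenAI-style chat completion object, with the generated text under `choices[0].message.content`. A minimal sketch for extracting it; the sample body is made up for illustration (not real output from this model) and trimmed to the fields the helper reads, and `reply_text` is a hypothetical name.

```python
import json

# Hypothetical response body, trimmed to the fields used below.
SAMPLE_RESPONSE = """
{
  "object": "chat.completion",
  "model": "Azrail/smallm_70_instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ]
}
"""


def reply_text(body: str) -> str:
    """Return the assistant text of the first choice in a chat completion body."""
    data = json.loads(body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


print(reply_text(SAMPLE_RESPONSE))  # prints: The capital of France is Paris.
```

To use it with the curl commands, save the response (for example with curl's `-o response.json`) and pass the file's contents to `reply_text`.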

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Azrail/smallm_70_instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Azrail/smallm_70_instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use Azrail/smallm_70_instruct with Docker Model Runner:
```
docker model run hf.co/Azrail/smallm_70_instruct
```

Azrail committed on Apr 16, 2025

Commit 2992939 (verified), 1 parent: a786784

Training in progress, step 15000, checkpoint

Files changed (5)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +222 -4

last-checkpoint/model.safetensors CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0579f8b01bd92a4b6d4d9542187f9f6be5d525493ee4cacf89313462b0d4fc29
+oid sha256:15f637ff72e852c00df336464cba31267a78c2fec942618a4cf3dbc081150cb8
 size 150625560
```

last-checkpoint/optimizer.pt CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d1ac4e5f1a091d05231fad9fd4f9941afbf6737a4f9256414d7439dd21637791
+oid sha256:11255a9366d03d2ecf115313602ca401e81860858d4e1ecad341feef41b0e95b
 size 602335994
```

last-checkpoint/rng_state.pth CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dac96a69b6625532fa7a1849a782b63a79e8d1b28e764bc8297e354d748f16c9
+oid sha256:ea828a56e17bf773dc8e4fa2c22d13619b805c6b9321028dd494ff57e5daf8e6
 size 14244
```

last-checkpoint/scheduler.pt CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:081dc59c3c452b8ce89bfce5eae0952bf765aeed7903bbba40be0fb195d20006
+oid sha256:1819c72414dd202fc7a5b387187559436ac1d66f4c4de3f13c18065ffbdf0216
 size 1064
```
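Each of the four binary files above is stored as a Git LFS pointer: a `version` line, an `oid` holding the SHA-256 digest of the real content, and the content `size` in bytes. Only the `oid` lines changed, so the new checkpoint files keep their sizes but have different contents. A minimal sketch, assuming the three-line pointer format shown in these diffs, that parses a pointer and checks a downloaded file against it; `LfsPointer`, `parse_pointer`, and `verify_stream` are hypothetical names.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass(frozen=True)
class LfsPointer:
    version: str  # pointer spec URL
    oid: str      # hex SHA-256 digest of the real file contents
    size: int     # size of the real file in bytes


def parse_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer made of "key value" lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


def verify_stream(pointer: LfsPointer, stream: BinaryIO, chunk_size: int = 1 << 20) -> bool:
    """Hash a binary stream in chunks and compare size and digest with the pointer."""
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size == pointer.size and digest.hexdigest() == pointer.oid


# New scheduler.pt pointer from this commit.
pointer = parse_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:1819c72414dd202fc7a5b387187559436ac1d66f4c4de3f13c18065ffbdf0216
size 1064
""")
print(pointer.size)  # prints: 1064

# Self-check on in-memory bytes; for a real download, pass open(path, "rb") instead.
demo = b"hello"
demo_ptr = LfsPointer(pointer.version, hashlib.sha256(demo).hexdigest(), len(demo))
print(verify_stream(demo_ptr, io.BytesIO(demo)))  # prints: True
```

Hashing in chunks keeps memory flat, which matters for the 602 MB `optimizer.pt`.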

last-checkpoint/trainer_state.json CHANGED

```diff
@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 3.3792567557372846,
+  "epoch": 3.620667803310349,
   "eval_steps": 500,
-  "global_step": 14000,
+  "global_step": 15000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -3060,11 +3060,229 @@
       "eval_steps_per_second": 20.347,
       "num_input_tokens_seen": 6763271617,
       "step": 14000
+    },
+    {
+      "epoch": 3.3913273081159376,
+      "grad_norm": 0.25,
+      "learning_rate": 9.499019164025955e-06,
+      "loss": 2.0975,
+      "mean_token_accuracy": 0.5536782286874949,
+      "num_input_tokens_seen": 6787391841,
+      "num_tokens": 2860344977.0,
+      "step": 14050
+    },
+    {
+      "epoch": 3.4033978604945907,
+      "grad_norm": 0.25,
+      "learning_rate": 9.310396861324884e-06,
+      "loss": 2.1022,
+      "mean_token_accuracy": 0.5538700968772173,
+      "num_input_tokens_seen": 6811630961,
+      "num_tokens": 2870526506.0,
+      "step": 14100
+    },
+    {
+      "epoch": 3.415468412873244,
+      "grad_norm": 0.2431640625,
+      "learning_rate": 9.121774558623813e-06,
+      "loss": 2.0934,
+      "mean_token_accuracy": 0.5550757900252938,
+      "num_input_tokens_seen": 6835811825,
+      "num_tokens": 2880722898.0,
+      "step": 14150
+    },
+    {
+      "epoch": 3.4275389652518973,
+      "grad_norm": 0.2578125,
+      "learning_rate": 8.93315225592274e-06,
+      "loss": 2.0875,
+      "mean_token_accuracy": 0.5558257311582565,
+      "num_input_tokens_seen": 6859918049,
+      "num_tokens": 2890925546.0,
+      "step": 14200
+    },
+    {
+      "epoch": 3.439609517630551,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 8.74452995322167e-06,
+      "loss": 2.0969,
+      "mean_token_accuracy": 0.5544555878639221,
+      "num_input_tokens_seen": 6883973409,
+      "num_tokens": 2901007248.0,
+      "step": 14250
+    },
+    {
+      "epoch": 3.451680070009204,
+      "grad_norm": 0.25390625,
+      "learning_rate": 8.555907650520598e-06,
+      "loss": 2.0987,
+      "mean_token_accuracy": 0.5544828617200256,
+      "num_input_tokens_seen": 6908263985,
+      "num_tokens": 2911355124.0,
+      "step": 14300
+    },
+    {
+      "epoch": 3.463750622387857,
+      "grad_norm": 0.271484375,
+      "learning_rate": 8.367285347819527e-06,
+      "loss": 2.0889,
+      "mean_token_accuracy": 0.5557316156104207,
+      "num_input_tokens_seen": 6932344993,
+      "num_tokens": 2921442830.0,
+      "step": 14350
+    },
+    {
+      "epoch": 3.4758211747665104,
+      "grad_norm": 0.255859375,
+      "learning_rate": 8.178663045118456e-06,
+      "loss": 2.0979,
+      "mean_token_accuracy": 0.5547628674656153,
+      "num_input_tokens_seen": 6956417041,
+      "num_tokens": 2931461417.0,
+      "step": 14400
+    },
+    {
+      "epoch": 3.4878917271451635,
+      "grad_norm": 0.234375,
+      "learning_rate": 7.990040742417383e-06,
+      "loss": 2.1005,
+      "mean_token_accuracy": 0.5539160283654928,
+      "num_input_tokens_seen": 6980421889,
+      "num_tokens": 2941531928.0,
+      "step": 14450
+    },
+    {
+      "epoch": 3.4999622795238166,
+      "grad_norm": 0.275390625,
+      "learning_rate": 7.801418439716313e-06,
+      "loss": 2.1017,
+      "num_input_tokens_seen": 7004552193,
+      "step": 14500
+    },
+    {
+      "epoch": 3.4999622795238166,
+      "eval_loss": 1.9681284427642822,
+      "eval_mean_token_accuracy": 0.5785388401566912,
+      "eval_num_tokens": 2951727207.0,
+      "eval_runtime": 131.2276,
+      "eval_samples_per_second": 81.629,
+      "eval_steps_per_second": 20.407,
+      "num_input_tokens_seen": 7004552193,
+      "step": 14500
+    },
+    {
+      "epoch": 3.51203283190247,
+      "grad_norm": 0.267578125,
+      "learning_rate": 7.612796137015241e-06,
+      "loss": 2.09,
+      "mean_token_accuracy": 0.5543626462481916,
+      "num_input_tokens_seen": 7028775953,
+      "num_tokens": 2961945579.0,
+      "step": 14550
+    },
+    {
+      "epoch": 3.524103384281123,
+      "grad_norm": 0.26171875,
+      "learning_rate": 7.42417383431417e-06,
+      "loss": 2.0978,
+      "mean_token_accuracy": 0.5544422981515527,
+      "num_input_tokens_seen": 7052883457,
+      "num_tokens": 2972173798.0,
+      "step": 14600
+    },
+    {
+      "epoch": 3.536173936659776,
+      "grad_norm": 0.251953125,
+      "learning_rate": 7.235551531613098e-06,
+      "loss": 2.0915,
+      "mean_token_accuracy": 0.5559014651551842,
+      "num_input_tokens_seen": 7077135185,
+      "num_tokens": 2982315453.0,
+      "step": 14650
+    },
+    {
+      "epoch": 3.5482444890384297,
+      "grad_norm": 0.310546875,
+      "learning_rate": 7.0469292289120274e-06,
+      "loss": 2.0932,
+      "mean_token_accuracy": 0.5552764968574047,
+      "num_input_tokens_seen": 7101260305,
+      "num_tokens": 2992557355.0,
+      "step": 14700
+    },
+    {
+      "epoch": 3.5603150414170828,
+      "grad_norm": 0.25390625,
+      "learning_rate": 6.858306926210955e-06,
+      "loss": 2.0959,
+      "mean_token_accuracy": 0.555088207796216,
+      "num_input_tokens_seen": 7125198545,
+      "num_tokens": 3002657117.0,
+      "step": 14750
+    },
+    {
+      "epoch": 3.572385593795736,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 6.669684623509884e-06,
+      "loss": 2.0933,
+      "mean_token_accuracy": 0.5554985254630447,
+      "num_input_tokens_seen": 7149297905,
+      "num_tokens": 3012818977.0,
+      "step": 14800
+    },
+    {
+      "epoch": 3.5844561461743893,
+      "grad_norm": 0.23828125,
+      "learning_rate": 6.481062320808813e-06,
+      "loss": 2.0901,
+      "mean_token_accuracy": 0.5556722393259406,
+      "num_input_tokens_seen": 7173408417,
+      "num_tokens": 3022993500.0,
+      "step": 14850
+    },
+    {
+      "epoch": 3.5965266985530424,
+      "grad_norm": 0.279296875,
+      "learning_rate": 6.292440018107741e-06,
+      "loss": 2.0862,
+      "mean_token_accuracy": 0.5560053834319114,
+      "num_input_tokens_seen": 7197689201,
+      "num_tokens": 3033239485.0,
+      "step": 14900
+    },
+    {
+      "epoch": 3.608597250931696,
+      "grad_norm": 0.265625,
+      "learning_rate": 6.10381771540667e-06,
+      "loss": 2.093,
+      "mean_token_accuracy": 0.5550377672165632,
+      "num_input_tokens_seen": 7221805553,
+      "num_tokens": 3043356374.0,
+      "step": 14950
+    },
+    {
+      "epoch": 3.620667803310349,
+      "grad_norm": 0.24609375,
+      "learning_rate": 5.915195412705598e-06,
+      "loss": 2.0994,
+      "num_input_tokens_seen": 7245951473,
+      "step": 15000
+    },
+    {
+      "epoch": 3.620667803310349,
+      "eval_loss": 1.9680702686309814,
+      "eval_mean_token_accuracy": 0.5785124528710748,
+      "eval_num_tokens": 3053564564.0,
+      "eval_runtime": 130.6855,
+      "eval_samples_per_second": 81.968,
+      "eval_steps_per_second": 20.492,
+      "num_input_tokens_seen": 7245951473,
+      "step": 15000
     }
   ],
   "logging_steps": 50,
   "max_steps": 16568,
-  "num_input_tokens_seen": 6763271617,
+  "num_input_tokens_seen": 7245951473,
   "num_train_epochs": 4,
   "save_steps": 1000,
   "stateful_callbacks": {
@@ -3079,7 +3297,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.809241167078482e+18,
+  "total_flos": 1.9383627395138765e+18,
   "train_batch_size": 16,
   "trial_name": null,
   "trial_params": null
```
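In the new `log_history` entries the learning rate drops by the same amount (about 1.886e-07) every 50 steps, so it is decaying linearly. A small sketch, using values copied from this diff, that checks the logged points lie on one line and extrapolates where that line reaches zero: it lands on `max_steps` (16568), which is what a linear decay ending at the last step would produce. It also computes run progress and the input tokens consumed per step between the step-14000 and step-15000 checkpoints. The helper names are illustrative, and the scheduler type is inferred from the numbers, not read from a training config in this commit.

```python
# Values copied from trainer_state.json in this commit.
MAX_STEPS = 16568
TOKENS_SEEN = {14000: 6763271617, 15000: 7245951473}  # num_input_tokens_seen
LR_LOG = [  # (step, learning_rate) pairs from log_history
    (14050, 9.499019164025955e-06),
    (14100, 9.310396861324884e-06),
    (14500, 7.801418439716313e-06),
    (14950, 6.10381771540667e-06),
    (15000, 5.915195412705598e-06),
]


def fit_line(log):
    """Slope and intercept of the line through the first and last (step, lr) points."""
    (s0, lr0), (s1, lr1) = log[0], log[-1]
    slope = (lr1 - lr0) / (s1 - s0)
    return slope, lr0 - slope * s0


def is_linear(log, rel_tol=1e-9):
    """True if every logged learning rate lies on the endpoint line."""
    slope, intercept = fit_line(log)
    scale = max(lr for _, lr in log)
    return all(abs(lr - (slope * s + intercept)) <= rel_tol * scale for s, lr in log)


def zero_lr_step(log):
    """Step at which the extrapolated learning rate reaches zero."""
    slope, intercept = fit_line(log)
    return -intercept / slope


progress = 15000 / MAX_STEPS
tokens_per_step = (TOKENS_SEEN[15000] - TOKENS_SEEN[14000]) / (15000 - 14000)

print(is_linear(LR_LOG))            # prints: True
print(round(zero_lr_step(LR_LOG)))  # prints: 16568
print(f"{progress:.1%}")            # prints: 90.5%
print(tokens_per_step)              # prints: 482679.856
```

With 1568 steps left, the learning rate should keep shrinking toward zero over the rest of the run, while the two eval losses in this diff (1.96813 at step 14500, 1.96807 at step 15000) are already nearly flat.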