Azrail committed on Apr 16, 2025

Commit 12f5941 (verified) · 1 Parent(s): 3193144

Training in progress, step 5000, checkpoint

Files changed (5)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +222 -4

last-checkpoint/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:205a3991d60a5c28fb1c39f7dbf7a515c4fd4b6685d8240efe51c017acfa36b1
+oid sha256:dcb3e56d4c71b4fe3907ac3f7a21f7c5b645b6f7b5077a46679eb62578db2183
 size 150625560

last-checkpoint/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1d50c4bfe654d987803fd6a6960587e83c15bda23187fdd2e49b310d524ae5ac
+oid sha256:5cb3c4740043f09a91cb2957024d48735e7f0fa83989925c2f015f5c9071410b
 size 602335994

last-checkpoint/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a2fab474267dfdb6f9f735fba3b6956eaa8395da984c318144fab7c0aefa914f
+oid sha256:73c97fed542b1263f810594e9084ec5dd9fdff08a7e12c9f17ae2b74518f1304
 size 14244

last-checkpoint/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1982f4530393fe0c872036d1fa81199d7b6bd002acf4619fae87ec9de696f64d
+oid sha256:5c4ce97b38b7fb778eb543562838e653bbb8adc096b47968a332aa5700d8c5ce
 size 1064
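
The four binary files above are stored as Git LFS pointers, so the diff only shows a changed `oid` next to an unchanged `size`. As a minimal sketch of how a downloaded file could be checked against such a pointer, the Python below parses the pointer text and compares the file's byte size and SHA-256 digest. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library; the only assumption about the format is what the pointers above show (a `version` line, an `oid sha256:<hex>` line, and a `size` line in bytes).

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer text into a dict of its space-separated key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: a size mismatch rules the file out without hashing it.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, `matches_pointer(Path("last-checkpoint/scheduler.pt"), pointer_text)` would be expected to return True only when the local file is the 1064-byte blob whose digest begins `5c4ce97b`, assuming the file was fetched at this commit.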

last-checkpoint/trainer_state.json CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.9656441902922582,
+  "epoch": 1.2068892677701164,
   "eval_steps": 500,
-  "global_step": 4000,
+  "global_step": 5000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -880,11 +880,229 @@
       "eval_steps_per_second": 21.156,
       "num_input_tokens_seen": 1932223680,
       "step": 4000
+    },
+    {
+      "epoch": 0.9777147426709115,
+      "grad_norm": 0.298828125,
+      "learning_rate": 4.722347970424023e-05,
+      "loss": 2.1476,
+      "mean_token_accuracy": 0.5491000188142061,
+      "num_input_tokens_seen": 1956345120,
+      "num_tokens": 824824558.0,
+      "step": 4050
+    },
+    {
+      "epoch": 0.9897852950495647,
+      "grad_norm": 0.2890625,
+      "learning_rate": 4.703485740153916e-05,
+      "loss": 2.1336,
+      "mean_token_accuracy": 0.5505272497236728,
+      "num_input_tokens_seen": 1980539728,
+      "num_tokens": 835004823.0,
+      "step": 4100
+    },
+    {
+      "epoch": 1.0016898773330114,
+      "grad_norm": 0.2890625,
+      "learning_rate": 4.684623509883809e-05,
+      "loss": 2.1376,
+      "mean_token_accuracy": 0.5500758526367531,
+      "num_input_tokens_seen": 2004388912,
+      "num_tokens": 844972763.0,
+      "step": 4150
+    },
+    {
+      "epoch": 1.0137604297116647,
+      "grad_norm": 0.275390625,
+      "learning_rate": 4.665761279613702e-05,
+      "loss": 2.1349,
+      "mean_token_accuracy": 0.5500743924826383,
+      "num_input_tokens_seen": 2028622064,
+      "num_tokens": 855126584.0,
+      "step": 4200
+    },
+    {
+      "epoch": 1.025830982090318,
+      "grad_norm": 0.283203125,
+      "learning_rate": 4.646899049343595e-05,
+      "loss": 2.1248,
+      "mean_token_accuracy": 0.5514456473290921,
+      "num_input_tokens_seen": 2052718336,
+      "num_tokens": 865332386.0,
+      "step": 4250
+    },
+    {
+      "epoch": 1.037901534468971,
+      "grad_norm": 0.28125,
+      "learning_rate": 4.6280368190734876e-05,
+      "loss": 2.1088,
+      "mean_token_accuracy": 0.5532256289571523,
+      "num_input_tokens_seen": 2076571680,
+      "num_tokens": 875448332.0,
+      "step": 4300
+    },
+    {
+      "epoch": 1.0499720868476243,
+      "grad_norm": 0.326171875,
+      "learning_rate": 4.60917458880338e-05,
+      "loss": 2.1184,
+      "mean_token_accuracy": 0.5509732039645314,
+      "num_input_tokens_seen": 2100726912,
+      "num_tokens": 885623694.0,
+      "step": 4350
+    },
+    {
+      "epoch": 1.0620426392262776,
+      "grad_norm": 0.310546875,
+      "learning_rate": 4.590312358533273e-05,
+      "loss": 2.1324,
+      "mean_token_accuracy": 0.5498137963563203,
+      "num_input_tokens_seen": 2124980016,
+      "num_tokens": 895774422.0,
+      "step": 4400
+    },
+    {
+      "epoch": 1.074113191604931,
+      "grad_norm": 0.32421875,
+      "learning_rate": 4.571450128263166e-05,
+      "loss": 2.1195,
+      "mean_token_accuracy": 0.551388250514865,
+      "num_input_tokens_seen": 2149237504,
+      "num_tokens": 905968753.0,
+      "step": 4450
+    },
+    {
+      "epoch": 1.086183743983584,
+      "grad_norm": 0.296875,
+      "learning_rate": 4.552587897993059e-05,
+      "loss": 2.1195,
+      "num_input_tokens_seen": 2173337456,
+      "step": 4500
+    },
+    {
+      "epoch": 1.086183743983584,
+      "eval_loss": 1.989871859550476,
+      "eval_mean_token_accuracy": 0.5754866465826013,
+      "eval_num_tokens": 916079112.0,
+      "eval_runtime": 128.4454,
+      "eval_samples_per_second": 83.397,
+      "eval_steps_per_second": 20.849,
+      "num_input_tokens_seen": 2173337456,
+      "step": 4500
+    },
+    {
+      "epoch": 1.0982542963622373,
+      "grad_norm": 0.287109375,
+      "learning_rate": 4.5337256677229516e-05,
+      "loss": 2.1218,
+      "mean_token_accuracy": 0.5514280049689114,
+      "num_input_tokens_seen": 2197505712,
+      "num_tokens": 926254107.0,
+      "step": 4550
+    },
+    {
+      "epoch": 1.1103248487408905,
+      "grad_norm": 0.291015625,
+      "learning_rate": 4.514863437452845e-05,
+      "loss": 2.1132,
+      "mean_token_accuracy": 0.5515907733514905,
+      "num_input_tokens_seen": 2221716688,
+      "num_tokens": 936449430.0,
+      "step": 4600
+    },
+    {
+      "epoch": 1.1223954011195438,
+      "grad_norm": 0.296875,
+      "learning_rate": 4.4960012071827373e-05,
+      "loss": 2.1142,
+      "mean_token_accuracy": 0.5520639397203922,
+      "num_input_tokens_seen": 2245565536,
+      "num_tokens": 946528658.0,
+      "step": 4650
+    },
+    {
+      "epoch": 1.134465953498197,
+      "grad_norm": 0.2734375,
+      "learning_rate": 4.4771389769126305e-05,
+      "loss": 2.1275,
+      "mean_token_accuracy": 0.5497148666903376,
+      "num_input_tokens_seen": 2269696864,
+      "num_tokens": 956594209.0,
+      "step": 4700
+    },
+    {
+      "epoch": 1.1465365058768502,
+      "grad_norm": 0.279296875,
+      "learning_rate": 4.458276746642524e-05,
+      "loss": 2.1065,
+      "mean_token_accuracy": 0.5532364987954498,
+      "num_input_tokens_seen": 2293845360,
+      "num_tokens": 966701814.0,
+      "step": 4750
+    },
+    {
+      "epoch": 1.1586070582555035,
+      "grad_norm": 0.259765625,
+      "learning_rate": 4.439414516372416e-05,
+      "loss": 2.1133,
+      "mean_token_accuracy": 0.5517958915606141,
+      "num_input_tokens_seen": 2318062016,
+      "num_tokens": 976956133.0,
+      "step": 4800
+    },
+    {
+      "epoch": 1.1706776106341565,
+      "grad_norm": 0.314453125,
+      "learning_rate": 4.420552286102309e-05,
+      "loss": 2.1083,
+      "mean_token_accuracy": 0.5527382261306047,
+      "num_input_tokens_seen": 2342152464,
+      "num_tokens": 987113621.0,
+      "step": 4850
+    },
+    {
+      "epoch": 1.1827481630128098,
+      "grad_norm": 0.26953125,
+      "learning_rate": 4.401690055832201e-05,
+      "loss": 2.1084,
+      "mean_token_accuracy": 0.5531642048805953,
+      "num_input_tokens_seen": 2366342016,
+      "num_tokens": 997304128.0,
+      "step": 4900
+    },
+    {
+      "epoch": 1.1948187153914631,
+      "grad_norm": 0.263671875,
+      "learning_rate": 4.3828278255620945e-05,
+      "loss": 2.1129,
+      "mean_token_accuracy": 0.5526808862015605,
+      "num_input_tokens_seen": 2390580560,
+      "num_tokens": 1007600120.0,
+      "step": 4950
+    },
+    {
+      "epoch": 1.2068892677701164,
+      "grad_norm": 0.271484375,
+      "learning_rate": 4.363965595291988e-05,
+      "loss": 2.1136,
+      "num_input_tokens_seen": 2414871648,
+      "step": 5000
+    },
+    {
+      "epoch": 1.2068892677701164,
+      "eval_loss": 1.9823503494262695,
+      "eval_mean_token_accuracy": 0.5763351263685668,
+      "eval_num_tokens": 1017920689.0,
+      "eval_runtime": 131.1681,
+      "eval_samples_per_second": 81.666,
+      "eval_steps_per_second": 20.417,
+      "num_input_tokens_seen": 2414871648,
+      "step": 5000
     }
   ],
   "logging_steps": 50,
   "max_steps": 16568,
-  "num_input_tokens_seen": 1932223680,
+  "num_input_tokens_seen": 2414871648,
   "num_train_epochs": 4,
   "save_steps": 1000,
   "stateful_callbacks": {
@@ -899,7 +1117,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 5.168886929031168e+17,
+  "total_flos": 6.460017349872845e+17,
   "train_batch_size": 16,
   "trial_name": null,
   "trial_params": null
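
The log entries added in this diff come in two shapes: training entries carry `loss` and `learning_rate`, while evaluation entries carry `eval_loss`. A rough sketch of how they might be read is below, using a handful of values copied from the diff (only a subset of fields). It converts each `eval_loss` to a perplexity and extrapolates where the logged learning rate would reach zero if the decay stays linear. The function names are hypothetical, and the linear-decay reading is an inference from the logged values, not something the commit states; in a real run the entries would be loaded from the state file's log list (assumed here to be the `log_history` key, which this excerpt does not show).

```python
import math


def split_log_history(log_history):
    """Separate training entries (have "loss") from eval entries (have "eval_loss")."""
    train = [e for e in log_history if "loss" in e]
    evals = [e for e in log_history if "eval_loss" in e]
    return train, evals


def lr_zero_crossing(train_entries):
    """Extrapolate the step at which the learning rate hits zero.

    Assumes a linear schedule between the first and last entries given.
    """
    first, last = train_entries[0], train_entries[-1]
    slope = (last["learning_rate"] - first["learning_rate"]) / (last["step"] - first["step"])
    return last["step"] - last["learning_rate"] / slope


# Values copied from the diff above; other fields omitted for brevity.
log_history = [
    {"step": 4050, "loss": 2.1476, "learning_rate": 4.722347970424023e-05},
    {"step": 4500, "loss": 2.1195, "learning_rate": 4.552587897993059e-05},
    {"step": 4500, "eval_loss": 1.989871859550476},
    {"step": 5000, "loss": 2.1136, "learning_rate": 4.363965595291988e-05},
    {"step": 5000, "eval_loss": 1.9823503494262695},
]

train, evals = split_log_history(log_history)
for entry in evals:
    # Perplexity is exp(cross-entropy loss), assuming the loss is in nats per token.
    print(entry["step"], "eval perplexity:", round(math.exp(entry["eval_loss"]), 2))
print("estimated LR zero crossing at step:", round(lr_zero_crossing(train)))
```

With these values the extrapolated zero crossing lands on step 16568, which matches `max_steps` in the state file and is consistent with a linear decay to zero over the full run.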