Instructions for using Azrail/smallm_70_instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Azrail/smallm_70_instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Azrail/smallm_70_instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Azrail/smallm_70_instruct", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Azrail/smallm_70_instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Azrail/smallm_70_instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Azrail/smallm_70_instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
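The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response shape noted in the comments assumes the server follows the OpenAI chat-completions format.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the server from the previous step is listening on port 8000:
#   req = build_chat_request("http://localhost:8000", "Azrail/smallm_70_instruct",
#                            "What is the capital of France?")
#   reply = send_chat_request(req)
#   print(reply["choices"][0]["message"]["content"])  # assumed OpenAI-style response
```

Building the request separately from sending it makes it possible to inspect the payload without a live server.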

Use Docker

docker model run hf.co/Azrail/smallm_70_instruct

SGLang

How to use Azrail/smallm_70_instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Azrail/smallm_70_instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Azrail/smallm_70_instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Azrail/smallm_70_instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Azrail/smallm_70_instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Azrail/smallm_70_instruct with Docker Model Runner:
```
docker model run hf.co/Azrail/smallm_70_instruct
```

Azrail committed on Apr 16, 2025

Commit 43dcfec (verified)

Parent: 2529d88

Training in progress, step 11000, checkpoint
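To put "step 11000" in context, the counters recorded in trainer_state.json in this commit can be turned into progress figures. This is a small sketch with hypothetical helper names; the numbers are copied from the diff on this page, and reading `num_input_tokens_seen` as a cumulative token counter is an assumption.

```python
def training_progress(global_step: int, max_steps: int, epoch: float, num_train_epochs: int) -> dict:
    """Summarize how far a run is, by optimizer steps and by epochs."""
    return {
        "step_fraction": global_step / max_steps,
        "epoch_fraction": epoch / num_train_epochs,
        "steps_remaining": max_steps - global_step,
    }


def tokens_per_step(tokens_before: int, tokens_after: int, step_before: int, step_after: int) -> float:
    """Average increase of the token counter per optimizer step."""
    return (tokens_after - tokens_before) / (step_after - step_before)


# Values taken from trainer_state.json at this checkpoint.
progress = training_progress(
    global_step=11000, max_steps=16568, epoch=2.6551895831132972, num_train_epochs=4
)
# num_input_tokens_seen at step 10000 and at step 11000.
rate = tokens_per_step(4830743425, 5314253297, 10000, 11000)

print(f"steps done: {progress['step_fraction']:.1%}, remaining: {progress['steps_remaining']}")
print(f"epochs done: {progress['epoch_fraction']:.1%}")
print(f"input tokens per step: {rate:,.0f}")
```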

Files changed (5)

last-checkpoint/model.safetensors +1 -1
last-checkpoint/optimizer.pt +1 -1
last-checkpoint/rng_state.pth +1 -1
last-checkpoint/scheduler.pt +1 -1
last-checkpoint/trainer_state.json +222 -4

last-checkpoint/model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:593df4add94d8349a8e2c27dd6a4c8e410dc62c59535de38e2c844bae1bf9105
+oid sha256:fd378ab1a42d536af5db20740b7c6ba4c863b9ff3eeb07dfe7d4b811a689ab5f
 size 150625560

last-checkpoint/optimizer.pt

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7ca220deb73713912b17a381232ea629f59c26aebf972823900e92efe4bee200
+oid sha256:2d5f198053dbdfaa8c376a0fdaef1cec44750b494d68cc275559bc743db6f9c6
 size 602335994

last-checkpoint/rng_state.pth

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5148f4a0429b56039088b4393cfcab680c3af25b037593fe69f3727d64615009
+oid sha256:fe27678952c245c0bb175fc5ebd37cf8ebfcd743a407e4957181be4dbbc6146b
 size 14244

last-checkpoint/scheduler.pt

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d15ebff9b6275f35ed91d179fc6aa0df6144af185e5ca68cd213907d032111d8
+oid sha256:c68adf80be0bee4802e3498e2b20587f8f5db858b7307e722582d5bdeff1cda7
 size 1064
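The four binary files above are stored as Git LFS pointers, so the diff only shows a new `oid` while `size` stays the same. A downloaded file can be checked against its pointer by comparing the byte size and the SHA-256 digest. The sketch below is one way to do that; the function names are hypothetical and it assumes the pointer contains only simple `key value` lines, as the ones shown here do.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if pointer["algo"] != "sha256" or path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so large checkpoints are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Checking the size first is a cheap way to reject a truncated download before hashing several hundred megabytes.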

last-checkpoint/trainer_state.json

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 2.413778535540233,
   "eval_steps": 500,
-  "global_step": 10000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -2188,11 +2188,229 @@
       "eval_steps_per_second": 20.488,
       "num_input_tokens_seen": 4830743425,
       "step": 10000
     }
   ],
   "logging_steps": 50,
   "max_steps": 16568,
-  "num_input_tokens_seen": 4830743425,
   "num_train_epochs": 4,
   "save_steps": 1000,
   "stateful_callbacks": {
@@ -2207,7 +2425,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.292271014243328e+18,
   "train_batch_size": 16,
   "trial_name": null,
   "trial_params": null

   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 2.6551895831132972,
   "eval_steps": 500,
+  "global_step": 11000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
       "eval_steps_per_second": 20.488,
       "num_input_tokens_seen": 4830743425,
       "step": 10000
+    },
+    {
+      "epoch": 2.425849087918886,
+      "grad_norm": 0.263671875,
+      "learning_rate": 2.4588803380111665e-05,
+      "loss": 2.102,
+      "mean_token_accuracy": 0.5541372266598046,
+      "num_input_tokens_seen": 4855099809,
+      "num_tokens": 2046084283.0,
+      "step": 10050
+    },
+    {
+      "epoch": 2.437919640297539,
+      "grad_norm": 0.26171875,
+      "learning_rate": 2.4400181077410594e-05,
+      "loss": 2.0991,
+      "mean_token_accuracy": 0.5542299181595445,
+      "num_input_tokens_seen": 4879214129,
+      "num_tokens": 2056326330.0,
+      "step": 10100
+    },
+    {
+      "epoch": 2.4499901926761924,
+      "grad_norm": 0.25390625,
+      "learning_rate": 2.4211558774709522e-05,
+      "loss": 2.0834,
+      "mean_token_accuracy": 0.5564426334574819,
+      "num_input_tokens_seen": 4903399553,
+      "num_tokens": 2066490013.0,
+      "step": 10150
+    },
+    {
+      "epoch": 2.4620607450548455,
+      "grad_norm": 0.263671875,
+      "learning_rate": 2.402293647200845e-05,
+      "loss": 2.098,
+      "mean_token_accuracy": 0.5545364746823906,
+      "num_input_tokens_seen": 4927492609,
+      "num_tokens": 2076526539.0,
+      "step": 10200
+    },
+    {
+      "epoch": 2.474131297433499,
+      "grad_norm": 0.23828125,
+      "learning_rate": 2.383431416930738e-05,
+      "loss": 2.0885,
+      "mean_token_accuracy": 0.555601441822946,
+      "num_input_tokens_seen": 4951732929,
+      "num_tokens": 2086768431.0,
+      "step": 10250
+    },
+    {
+      "epoch": 2.486201849812152,
+      "grad_norm": 0.255859375,
+      "learning_rate": 2.3645691866606308e-05,
+      "loss": 2.0909,
+      "mean_token_accuracy": 0.5558399046584964,
+      "num_input_tokens_seen": 4975948097,
+      "num_tokens": 2096961030.0,
+      "step": 10300
+    },
+    {
+      "epoch": 2.498272402190805,
+      "grad_norm": 0.326171875,
+      "learning_rate": 2.3457069563905237e-05,
+      "loss": 2.0906,
+      "mean_token_accuracy": 0.5556136939302087,
+      "num_input_tokens_seen": 5000143905,
+      "num_tokens": 2107303887.0,
+      "step": 10350
+    },
+    {
+      "epoch": 2.5103429545694587,
+      "grad_norm": 0.267578125,
+      "learning_rate": 2.3268447261204166e-05,
+      "loss": 2.0976,
+      "mean_token_accuracy": 0.5541230865567922,
+      "num_input_tokens_seen": 5024212113,
+      "num_tokens": 2117576166.0,
+      "step": 10400
+    },
+    {
+      "epoch": 2.5224135069481117,
+      "grad_norm": 0.29296875,
+      "learning_rate": 2.3079824958503094e-05,
+      "loss": 2.0935,
+      "mean_token_accuracy": 0.5555445018038153,
+      "num_input_tokens_seen": 5048313681,
+      "num_tokens": 2127734721.0,
+      "step": 10450
+    },
+    {
+      "epoch": 2.534484059326765,
+      "grad_norm": 0.2421875,
+      "learning_rate": 2.2891202655802023e-05,
+      "loss": 2.0982,
+      "num_input_tokens_seen": 5072508817,
+      "step": 10500
+    },
+    {
+      "epoch": 2.534484059326765,
+      "eval_loss": 1.9683516025543213,
+      "eval_mean_token_accuracy": 0.5784807712440619,
+      "eval_num_tokens": 2137987548.0,
+      "eval_runtime": 130.4075,
+      "eval_samples_per_second": 82.143,
+      "eval_steps_per_second": 20.536,
+      "num_input_tokens_seen": 5072508817,
+      "step": 10500
+    },
+    {
+      "epoch": 2.5465546117054183,
+      "grad_norm": 0.267578125,
+      "learning_rate": 2.270258035310095e-05,
+      "loss": 2.0924,
+      "mean_token_accuracy": 0.5551841219887137,
+      "num_input_tokens_seen": 5096586577,
+      "num_tokens": 2148155987.0,
+      "step": 10550
+    },
+    {
+      "epoch": 2.5586251640840714,
+      "grad_norm": 0.2734375,
+      "learning_rate": 2.251395805039988e-05,
+      "loss": 2.0982,
+      "mean_token_accuracy": 0.5541262343525887,
+      "num_input_tokens_seen": 5120875729,
+      "num_tokens": 2158352820.0,
+      "step": 10600
+    },
+    {
+      "epoch": 2.5706957164627244,
+      "grad_norm": 0.251953125,
+      "learning_rate": 2.232533574769881e-05,
+      "loss": 2.0908,
+      "mean_token_accuracy": 0.5560182608664036,
+      "num_input_tokens_seen": 5145050353,
+      "num_tokens": 2168407807.0,
+      "step": 10650
+    },
+    {
+      "epoch": 2.582766268841378,
+      "grad_norm": 0.2734375,
+      "learning_rate": 2.2136713444997737e-05,
+      "loss": 2.0958,
+      "mean_token_accuracy": 0.5551287305355072,
+      "num_input_tokens_seen": 5169266849,
+      "num_tokens": 2178592858.0,
+      "step": 10700
+    },
+    {
+      "epoch": 2.594836821220031,
+      "grad_norm": 0.2451171875,
+      "learning_rate": 2.1948091142296666e-05,
+      "loss": 2.0904,
+      "mean_token_accuracy": 0.5559819753468037,
+      "num_input_tokens_seen": 5193472705,
+      "num_tokens": 2188792925.0,
+      "step": 10750
+    },
+    {
+      "epoch": 2.606907373598684,
+      "grad_norm": 0.2578125,
+      "learning_rate": 2.1759468839595595e-05,
+      "loss": 2.1003,
+      "mean_token_accuracy": 0.5538398388028145,
+      "num_input_tokens_seen": 5217541665,
+      "num_tokens": 2199063266.0,
+      "step": 10800
+    },
+    {
+      "epoch": 2.6189779259773376,
+      "grad_norm": 0.2578125,
+      "learning_rate": 2.1570846536894523e-05,
+      "loss": 2.0996,
+      "mean_token_accuracy": 0.5539121518284083,
+      "num_input_tokens_seen": 5241669153,
+      "num_tokens": 2209236507.0,
+      "step": 10850
+    },
+    {
+      "epoch": 2.6310484783559906,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 2.1382224234193452e-05,
+      "loss": 2.0898,
+      "mean_token_accuracy": 0.5560731103271246,
+      "num_input_tokens_seen": 5265851553,
+      "num_tokens": 2219375503.0,
+      "step": 10900
+    },
+    {
+      "epoch": 2.643119030734644,
+      "grad_norm": 0.255859375,
+      "learning_rate": 2.119360193149238e-05,
+      "loss": 2.0887,
+      "mean_token_accuracy": 0.5559511515125632,
+      "num_input_tokens_seen": 5290120305,
+      "num_tokens": 2229631896.0,
+      "step": 10950
+    },
+    {
+      "epoch": 2.6551895831132972,
+      "grad_norm": 0.267578125,
+      "learning_rate": 2.100497962879131e-05,
+      "loss": 2.0941,
+      "num_input_tokens_seen": 5314253297,
+      "step": 11000
+    },
+    {
+      "epoch": 2.6551895831132972,
+      "eval_loss": 1.9683243036270142,
+      "eval_mean_token_accuracy": 0.5784822298106727,
+      "eval_num_tokens": 2239778564.0,
+      "eval_runtime": 131.1903,
+      "eval_samples_per_second": 81.652,
+      "eval_steps_per_second": 20.413,
+      "num_input_tokens_seen": 5314253297,
+      "step": 11000
     }
   ],
   "logging_steps": 50,
   "max_steps": 16568,
+  "num_input_tokens_seen": 5314253297,
   "num_train_epochs": 4,
   "save_steps": 1000,
   "stateful_callbacks": {
       "attributes": {}
     }
   },
+  "total_flos": 1.4216146240596787e+18,
   "train_batch_size": 16,
   "trial_name": null,
   "trial_params": null
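Two quick sanity checks can be run on the logged values above. First, the `learning_rate` entries fall by the same amount every 50 steps, so extrapolating the line through two of them shows where the schedule would reach zero. Second, `eval_loss` can be converted to a perplexity. The following sketch uses hypothetical helper names and values copied from the log entries in this diff; it assumes `eval_loss` is a mean per-token cross-entropy in nats, and the linear-decay reading is inferred from the numbers rather than stated in the commit.

```python
import math

# (step, learning_rate) pairs copied from the added log entries.
LOGGED_LR = [
    (10050, 2.4588803380111665e-05),
    (10500, 2.2891202655802023e-05),
    (11000, 2.100497962879131e-05),
]
MAX_STEPS = 16568  # "max_steps" in trainer_state.json


def linear_zero_crossing(first: tuple, second: tuple) -> float:
    """Step at which the straight line through two (step, lr) points reaches zero."""
    (step0, lr0), (step1, lr1) = first, second
    slope = (lr1 - lr0) / (step1 - step0)
    return step0 - lr0 / slope


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


zero_step = linear_zero_crossing(LOGGED_LR[0], LOGGED_LR[-1])
print(f"extrapolated zero-LR step: {zero_step:.1f} (max_steps = {MAX_STEPS})")

# eval_loss logged at step 11000.
print(f"eval perplexity at step 11000: {perplexity(1.9683243036270142):.2f}")
```

If the extrapolated step lands on `max_steps`, the logged rates are consistent with a learning rate that decays linearly to zero by the end of the run.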