Isotonic committed on Feb 10, 2024

Commit debad92 (verified) · 1 Parent(s): 9bbb841

Model save

Files changed (5)

README.md +13 -14
all_results.json +6 -6
generation_config.json +1 -0
train_results.json +6 -6
trainer_state.json +51 -38

README.md CHANGED

@@ -1,12 +1,10 @@
 ---
 license: apache-2.0
-base_model: BEE-spoke-data/smol_llama-101M-GQA
+base_model: Felladrin/Smol-Llama-101M-Chat-v1
 tags:
-- trl
-- sft
 - generated_from_trainer
-datasets:
-- generator
+metrics:
+- accuracy
 model-index:
 - name: smol_llama_DialogSumm
   results: []
@@ -17,9 +15,10 @@ should probably proofread and complete it, then remove this comment. -->
 # smol_llama_DialogSumm
-This model is a fine-tuned version of [BEE-spoke-data/smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA) on the generator dataset.
+This model is a fine-tuned version of [Felladrin/Smol-Llama-101M-Chat-v1](https://huggingface.co/Felladrin/Smol-Llama-101M-Chat-v1) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.8876
+- Loss: 1.8918
+- Accuracy: 0.6050
 ## Model description
@@ -43,19 +42,19 @@ The following hyperparameters were used during training:
 - eval_batch_size: 32
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
+- lr_scheduler_type: cosine_with_restarts
 - lr_scheduler_warmup_ratio: 0.3
 - num_epochs: 4
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| No log        | 1.0   | 413  | 2.0080          |
-| 2.1058        | 2.0   | 826  | 1.9296          |
-| 1.8821        | 3.0   | 1239 | 1.8948          |
-| 1.7466        | 4.0   | 1652 | 1.8876          |
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|
+| No log        | 1.0   | 411  | 2.0053          | 0.5871   |
+| 2.0885        | 2.0   | 822  | 1.9287          | 0.5971   |
+| 1.8728        | 3.0   | 1233 | 1.8916          | 0.6039   |
+| 1.7214        | 4.0   | 1644 | 1.8918          | 0.6050   |
 ### Framework versions
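
The scheduler change from linear to cosine_with_restarts can be sanity-checked against the learning_rate values logged in trainer_state.json in this commit. The sketch below is a plain-Python reimplementation of a warmup-then-cosine-with-hard-restarts multiplier, written to mirror what the Transformers scheduler of that name is understood to do; it is not the library's code. The peak learning rate of 5e-5 and the single cycle (`num_cycles=1`) are assumptions, since neither is visible in this diff, and the function and constant names are made up for illustration.

```python
import math


def cosine_with_restarts_factor(step, warmup_steps, total_steps, num_cycles=1):
    """Multiplier applied to the peak learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that restarts num_cycles times over the post-warmup phase.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


TOTAL_STEPS = 1644                           # max_steps in trainer_state.json
WARMUP_STEPS = math.ceil(0.3 * TOTAL_STEPS)  # lr_scheduler_warmup_ratio: 0.3
PEAK_LR = 5e-5                               # ASSUMPTION: not shown in this diff


def learning_rate(step):
    """Learning rate after `step` optimizer steps, under the assumptions above."""
    return PEAK_LR * cosine_with_restarts_factor(step, WARMUP_STEPS, TOTAL_STEPS)


for step in (500, 1000, 1500):
    print(step, learning_rate(step))
```

Under those assumptions the printed values should land close to the learning rates logged at steps 500, 1000, and 1500 (roughly 5.0e-05, 3.0e-05, and 1.9e-06). With a single cycle the curve never actually restarts, so it behaves like a plain cosine decay after warmup.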

all_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 4.0,
-    "total_flos": 2.48531807895552e+16,
-    "train_loss": 1.8925644733715288,
-    "train_runtime": 532.9682,
-    "train_samples": 41984,
-    "train_samples_per_second": 99.098,
-    "train_steps_per_second": 3.1
+    "total_flos": 2.475153948672e+16,
+    "train_loss": 1.875104602525994,
+    "train_runtime": 621.1187,
+    "train_samples": 13150,
+    "train_samples_per_second": 84.686,
+    "train_steps_per_second": 2.647
 }
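
The new throughput and step counts are consistent with each other, which a few lines of arithmetic can confirm. This is a minimal sketch that assumes a single device, no gradient accumulation, and a final partial batch that is kept rather than dropped; none of those settings is visible in this diff. The variable names are illustrative only.

```python
import math

train_samples = 13150     # all_results.json (new)
batch_size = 32           # train_batch_size in trainer_state.json
num_epochs = 4            # num_train_epochs in trainer_state.json
train_runtime = 621.1187  # seconds, all_results.json (new)

# ASSUMPTION: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

The results line up with the new values of global_step, train_samples_per_second, and train_steps_per_second reported in this commit.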

generation_config.json CHANGED

@@ -2,5 +2,6 @@
   "_from_model_config": true,
   "bos_token_id": 1,
   "eos_token_id": 2,
+  "pad_token_id": 2,
   "transformers_version": "4.37.2"
 }

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 4.0,
-    "total_flos": 2.48531807895552e+16,
-    "train_loss": 1.8925644733715288,
-    "train_runtime": 532.9682,
-    "train_samples": 41984,
-    "train_samples_per_second": 99.098,
-    "train_steps_per_second": 3.1
+    "total_flos": 2.475153948672e+16,
+    "train_loss": 1.875104602525994,
+    "train_runtime": 621.1187,
+    "train_samples": 13150,
+    "train_samples_per_second": 84.686,
+    "train_steps_per_second": 2.647
 }

trainer_state.json CHANGED

@@ -3,77 +3,90 @@
   "best_model_checkpoint": null,
   "epoch": 4.0,
   "eval_steps": 500,
-  "global_step": 1652,
+  "global_step": 1644,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
       "epoch": 1.0,
-      "eval_loss": 2.0079615116119385,
-      "eval_runtime": 12.1698,
-      "eval_samples_per_second": 272.395,
-      "eval_steps_per_second": 8.546,
-      "step": 413
+      "eval_accuracy": 0.5871421461904744,
+      "eval_loss": 2.005277633666992,
+      "eval_runtime": 29.0983,
+      "eval_samples_per_second": 113.134,
+      "eval_steps_per_second": 3.54,
+      "step": 411
     },
     {
-      "epoch": 1.21,
-      "learning_rate": 4.987024221453288e-05,
-      "loss": 2.1058,
+      "epoch": 1.22,
+      "learning_rate": 4.9996641797696206e-05,
+      "loss": 2.0885,
       "step": 500
     },
     {
       "epoch": 2.0,
-      "eval_loss": 1.9295774698257446,
-      "eval_runtime": 12.1255,
-      "eval_samples_per_second": 273.391,
-      "eval_steps_per_second": 8.577,
-      "step": 826
+      "eval_accuracy": 0.5971017152277686,
+      "eval_loss": 1.9286963939666748,
+      "eval_runtime": 28.862,
+      "eval_samples_per_second": 114.06,
+      "eval_steps_per_second": 3.569,
+      "step": 822
     },
     {
-      "epoch": 2.42,
-      "learning_rate": 2.8243944636678206e-05,
-      "loss": 1.8821,
+      "epoch": 2.43,
+      "learning_rate": 2.9684532864643122e-05,
+      "loss": 1.8728,
       "step": 1000
     },
     {
       "epoch": 3.0,
-      "eval_loss": 1.8948005437850952,
-      "eval_runtime": 12.1627,
-      "eval_samples_per_second": 272.554,
-      "eval_steps_per_second": 8.551,
-      "step": 1239
+      "eval_accuracy": 0.603926221807302,
+      "eval_loss": 1.8916035890579224,
+      "eval_runtime": 28.7797,
+      "eval_samples_per_second": 114.386,
+      "eval_steps_per_second": 3.579,
+      "step": 1233
     },
     {
-      "epoch": 3.63,
-      "learning_rate": 6.61764705882353e-06,
-      "loss": 1.7466,
+      "epoch": 3.65,
+      "learning_rate": 1.9095509616124385e-06,
+      "loss": 1.7214,
       "step": 1500
     },
     {
       "epoch": 4.0,
-      "eval_loss": 1.8875998258590698,
-      "eval_runtime": 12.152,
-      "eval_samples_per_second": 272.795,
-      "eval_steps_per_second": 8.558,
-      "step": 1652
+      "eval_accuracy": 0.6049892568138169,
+      "eval_loss": 1.891752004623413,
+      "eval_runtime": 28.8553,
+      "eval_samples_per_second": 114.086,
+      "eval_steps_per_second": 3.57,
+      "step": 1644
     },
     {
       "epoch": 4.0,
-      "step": 1652,
-      "total_flos": 2.48531807895552e+16,
-      "train_loss": 1.8925644733715288,
-      "train_runtime": 532.9682,
-      "train_samples_per_second": 99.098,
-      "train_steps_per_second": 3.1
+      "step": 1644,
+      "total_flos": 2.475153948672e+16,
+      "train_loss": 1.875104602525994,
+      "train_runtime": 621.1187,
+      "train_samples_per_second": 84.686,
+      "train_steps_per_second": 2.647
+    },
+    {
+      "epoch": 4.0,
+      "eval_accuracy": 0.6049892568138169,
+      "eval_loss": 1.891752004623413,
+      "eval_runtime": 28.9822,
+      "eval_samples_per_second": 113.587,
+      "eval_steps_per_second": 3.554,
+      "step": 1644
     }
   ],
   "logging_steps": 500,
-  "max_steps": 1652,
+  "max_steps": 1644,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 4,
   "save_steps": 500,
-  "total_flos": 2.48531807895552e+16,
+  "total_flos": 2.475153948672e+16,
   "train_batch_size": 32,
   "trial_name": null,
   "trial_params": null
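
The "Training results" table in README.md appears to be derived from this log_history: each evaluation entry becomes a row, paired with the most recent training loss logged before it. The sketch below illustrates that pairing on a hand-copied subset of the new log_history (the training-summary entry and the repeated final evaluation are left out). It is an assumption about how the table is built, not the Trainer's actual model-card code, and `results_rows` is a hypothetical helper.

```python
# Subset of the new log_history from trainer_state.json, copied by hand.
log_history = [
    {"epoch": 1.0, "eval_accuracy": 0.5871421461904744, "eval_loss": 2.005277633666992, "step": 411},
    {"epoch": 1.22, "loss": 2.0885, "step": 500},
    {"epoch": 2.0, "eval_accuracy": 0.5971017152277686, "eval_loss": 1.9286963939666748, "step": 822},
    {"epoch": 2.43, "loss": 1.8728, "step": 1000},
    {"epoch": 3.0, "eval_accuracy": 0.603926221807302, "eval_loss": 1.8916035890579224, "step": 1233},
    {"epoch": 3.65, "loss": 1.7214, "step": 1500},
    {"epoch": 4.0, "eval_accuracy": 0.6049892568138169, "eval_loss": 1.891752004623413, "step": 1644},
]


def results_rows(history):
    """Pair each evaluation entry with the last training loss logged before it."""
    rows = []
    last_train_loss = None
    for entry in history:
        if "loss" in entry:
            # Training-loss log, written every logging_steps (500) steps.
            last_train_loss = entry["loss"]
        elif "eval_loss" in entry:
            train = "No log" if last_train_loss is None else f"{last_train_loss:.4f}"
            rows.append((
                train,
                entry["epoch"],
                entry["step"],
                f"{entry['eval_loss']:.4f}",
                f"{entry['eval_accuracy']:.4f}",
            ))
    return rows


for row in results_rows(log_history):
    print(" | ".join(str(cell) for cell in row))
```

This also explains the "No log" cell in the first row: with logging_steps set to 500, no training loss has been recorded yet when the first evaluation runs at step 411.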