HuggingFaceTB/smoltalk
How to use nnsohamnn/gpt2-450M-fineweb with Transformers:
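```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nnsohamnn/gpt2-450M-fineweb")
```

```python
# Load model directly (causal-LM head, so the model can generate text)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nnsohamnn/gpt2-450M-fineweb")
model = AutoModelForCausalLM.from_pretrained("nnsohamnn/gpt2-450M-fineweb", dtype="auto")
```

How to use nnsohamnn/gpt2-450M-fineweb with vLLM: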
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nnsohamnn/gpt2-450M-fineweb"
```

```shell
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nnsohamnn/gpt2-450M-fineweb",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
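The vLLM and SGLang servers both expose an OpenAI-compatible `/v1/completions` route, so the curl request can also be sent from Python. Below is a minimal sketch that uses only the standard library (`json`, `urllib.request`). The helper names `build_completion_request`, `extract_text` and `complete` are hypothetical and are not part of vLLM or SGLang. The default base URL assumes the local vLLM server on port 8000; for the SGLang setup below, pass `http://localhost:30000`.

```python
import json
import urllib.request

MODEL_ID = "nnsohamnn/gpt2-450M-fineweb"
# Assumption: a local vLLM server; use "http://localhost:30000" for the SGLang server.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_completion_request(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    """JSON body for POST /v1/completions, mirroring the curl example."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_text(response: dict) -> str:
    """Generated text of the first choice in an OpenAI-style completion response."""
    return response["choices"][0]["text"]


def complete(prompt: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 60.0, **params) -> str:
    """Send a completion request to a running server and return the generated text."""
    body = json.dumps(build_completion_request(prompt, **params)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(json.load(response))
```

Once a server is running, call `print(complete("Once upon a time,", max_tokens=64))`. The request body is built separately from the HTTP call, so the payload can be inspected or reused with other clients.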
How to use nnsohamnn/gpt2-450M-fineweb with SGLang:
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nnsohamnn/gpt2-450M-fineweb" \
    --host 0.0.0.0 \
    --port 30000
```

```shell
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nnsohamnn/gpt2-450M-fineweb",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Alternatively, start the same server from the SGLang Docker image instead of a pip install:

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nnsohamnn/gpt2-450M-fineweb" \
        --host 0.0.0.0 \
        --port 30000
```
How to use nnsohamnn/gpt2-450M-fineweb with Docker Model Runner:
```shell
docker model run hf.co/nnsohamnn/gpt2-450M-fineweb
```
Pretraining progress:

| Step | Train PPL | Avg Loss | Current Loss | Notes / Key Events |
|---|---|---|---|---|
| ~0–10k | ~100–150 | 4.6–5.0 | Varies | Initial warm-up, large fluctuations |
| 20k | 70 | 4.25 | Varies | First solid convergence milestone |
| 24k | 63 | 4.14 | Varies | Smooth training, steady drop |
| 26k | 60 | 4.09 | Varies | Stable regime, lower variance |
| 37k | 52.97 | 3.970 | 3.818 | Post-resume from 36k checkpoint |
| 38k | 64.96 | 4.174 | 4.119 | Spike (hard batch / buffer shuffle) |
| 39k | 55.24 | 4.012 | 3.678 | Recovery from spike |
| 40k | 53.61 | 3.982 | 4.230 | HF safetensors checkpoint push |
| 41k | 49.98 | 3.912 | 4.163 | Broke below 50 PPL |
| 42k | 54.33 | 3.995 | 4.313 | Slight fluctuation |
| 43k | 51.27 | 3.937 | 3.925 | Stabilizing phase |
| 44k | 50.74 | 3.927 | 3.894 | Smooth training |
| 45k | 51.12 | 3.934 | 3.744 | Minor plateau |
| 46k | 53.87 | 3.987 | 4.145 | Batch variance |
| 47k | 52.39 | 3.959 | 4.092 | Mid-range phase |
| 48k | 43.85 | 3.781 | 4.038 | Best transient PPL drop so far |
| 49k | 48.94 | 3.891 | 3.780 | Rebound stabilization |
| 50k | 44.37 | 3.793 | 3.821 | HF milestone push (pre-resume) |
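The Train PPL column appears to be the exponential of the Avg Loss column, that is, the perplexity of a mean cross-entropy loss measured in nats. For example, exp(3.793) ≈ 44.39, against the reported 44.37 at step 50k. The short check below recomputes a few rows copied from the table. The 0.1% relative tolerance is an assumption, chosen to absorb the rounding of the logged values. The function names are illustrative only.

```python
import math

# (step, avg_loss, reported_train_ppl) copied from a few rows of the pretraining table.
PRETRAIN_ROWS = [
    (37_000, 3.970, 52.97),
    (38_000, 4.174, 64.96),
    (41_000, 3.912, 49.98),
    (48_000, 3.781, 43.85),
    (50_000, 3.793, 44.37),
]


def perplexity(avg_loss: float) -> float:
    """Perplexity of a mean per-token cross-entropy loss measured in nats."""
    return math.exp(avg_loss)


def mismatched_rows(rows, rel_tol: float = 1e-3):
    """Return the rows whose reported PPL differs from exp(avg_loss) by more than rel_tol.

    Losses are logged to 3 decimals, which alone can shift exp() by about 0.05%,
    so a 0.1% relative tolerance is used by default (an assumption).
    """
    return [
        (step, avg_loss, reported)
        for step, avg_loss, reported in rows
        if not math.isclose(perplexity(avg_loss), reported, rel_tol=rel_tol)
    ]


for step, avg_loss, reported in PRETRAIN_ROWS:
    print(f"step {step:>6}: exp({avg_loss:.3f}) = {perplexity(avg_loss):.2f} (reported {reported:.2f})")

print("rows outside tolerance:", mismatched_rows(PRETRAIN_ROWS))
```

The same relation holds for the SmolTalk fine-tuning log. For example, exp(2.1190) ≈ 8.32 matches the Train PPL reported for steps 1.5k–2k.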
SmolTalk fine-tuning progress:

| Step | Train PPL | Avg Loss | Current Loss | Notes / Key Events |
|---|---|---|---|---|
| 0–0.5k | 13.38 | 2.5937 | 2.2737 | Initial SmolTalk fine-tuning; mix: Conv 46.6%, Instruct 53.4%; checkpoint saved at step 500 |
| 0.5k–1k | 9.89 | 2.2916 | 2.2337 | Eval at step 1000: Eval PPL 9.31; mix: Conv 46.9%, Instruct 53.1%; checkpoint saved at step 1000 |
| 1k–1.5k | 9.96 | 2.2989 | 1.8901 | Stable training; mix: Conv 47.8%, Instruct 52.2%; checkpoint saved at step 1500 |
| 1.5k–2k | 8.32 | 2.1190 | 1.8197 | SmolTalk mix balanced: Conv 48.9%, Instruct 51.1%; checkpoint saved at step 2000 |
Evaluation at the 50k pretraining checkpoint:

| Step | Train PPL | Avg Loss | Current Loss | Eval PPL | HellaSwag | Notes / Key Events |
|---|---|---|---|---|---|---|
| 50k | 44.37 | 3.793 | 3.821 | 45.68 | 31 | HF milestone push / pre-resume evaluation |