Create README.md

# SmolLM3 Checkpoints

We are releasing intermediate checkpoints of SmolLM3 to enable further research.

## Pre-training

We release checkpoints every 40,000 steps, which equals 94.4B tokens.
The GBS (Global Batch Size) in tokens for SmolLM3-3B is 2,359,296. To calculate the number of tokens from a given step:

```python
nb_tokens = nb_step * GBS
```

### Training Stages

**Stage 1:** Steps 0 to 3,450,000 (86 checkpoints)
[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/stage1_8T.yaml)

**Stage 2:** Steps 3,450,000 to 4,200,000 (19 checkpoints)
[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/stage2_8T_9T.yaml)

**Stage 3:** Steps 4,200,000 to 4,720,000 (13 checkpoints)
[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/stage3_9T_11T.yaml)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/944zWNgcI1I06RZuoP11B.png)

### Long Context Extension

For the additional 2 stages that extend the context length to 64k, we sample checkpoints every 4,000 steps (9.4B tokens) for a total of 10 checkpoints:

**Long Context 4k to 32k**
[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/long_context_4k_to_32k.yaml)

**Long Context 32k to 64k**
[config](https://huggingface.co/datasets/HuggingFaceTB/smollm3-configs/blob/main/long_context_32k_to_64k.yaml)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/jBOiemVtbfi9YD7Pki6sY.png)

## Post-training

We release checkpoints at every step of our post-training recipe: Mid training, SFT, APO soup, and LC expert.

![image.png](https://cdn-uploads.huggingface.co/production/uploads/651e96991b97c9f33d26bde6/bDzh-A5X-gi3mY_RbLOSB.png)

## How to Load a Checkpoint

```python
# pip install transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM3-3B-checkpoints"
revision = "stage1-step-40000" # replace by the revision you want
device = torch.device("cuda" if torch.cuda.is_available() else "mps" if hasattr(torch, 'mps') and torch.mps.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

Files changed (1) hide show

README.md +15 -0

README.md ADDED Viewed

	@@ -0,0 +1,15 @@

+---
+library_name: transformers
+license: apache-2.0
+language:
+  - en
+  - fr
+  - es
+  - it
+  - pt
+  - zh
+  - ar
+  - ru
+base_model:
+  - HuggingFaceTB/SmolLM3-3B-Base
+---