---
library_name: transformers
license: mit
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text-generation
tags:
- transformer
- language-model
- experimental
---
# **SmalLM**
<hr>
<div align="center">
<a href="https://github.com/azrails/SmalLm" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-SmalLM-181717?logo=github" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/azrails/SmalLm/blob/main/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-blue.svg" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
SmalLM is a series of small transformer language models built from scratch. The project explores architectural ideas for transformers through modular pipelines for pretraining, fine-tuning, and alignment.
## Uses
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required because the model is a custom architecture
# whose code ships alongside the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Azrail/smallm_70")
model = AutoModelForCausalLM.from_pretrained("Azrail/smallm_70", trust_remote_code=True)

inputs = tokenizer("How are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```
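`generate` defaults to greedy decoding; for more varied completions you can pass standard `transformers` sampling arguments (the values below are illustrative, not tuned for this model):
```python
out = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # sample instead of greedy argmax
    temperature=0.8,  # soften the next-token distribution
    top_p=0.95,       # nucleus sampling
)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```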
## Model Details
**Key Features:**
1. Grouped Query Attention (GQA).
2. Mixture-of-Experts with auxiliary-loss-free load balancing (see the sketch below).
3. ALiBi (Attention with Linear Biases) or Rotary Position Embedding (RoPE).
4. NTK-by-parts RoPE interpolation for extending the context length.
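A minimal sketch of the routing idea behind feature 2, in the style of DeepSeek-V3's auxiliary-loss-free balancing: a per-expert bias, updated by a simple sign rule outside the gradient, steers expert *selection* toward underloaded experts, while the combine weights still come from the raw gate scores. Names, shapes, and hyperparameters here are illustrative, not SmalLM's exact implementation:
```python
import torch

class LossFreeRouter(torch.nn.Module):
    """Illustrative MoE router with bias-based (aux-loss-free) load balancing."""

    def __init__(self, dim: int, n_experts: int, top_k: int = 2, bias_lr: float = 1e-3):
        super().__init__()
        self.gate = torch.nn.Linear(dim, n_experts, bias=False)
        # Per-expert bias used only for expert selection; not a learned parameter.
        self.register_buffer("expert_bias", torch.zeros(n_experts))
        self.top_k = top_k
        self.bias_lr = bias_lr

    def forward(self, x: torch.Tensor):
        scores = self.gate(x).sigmoid()                     # (n_tokens, n_experts)
        biased = scores + self.expert_bias                  # bias affects selection only
        topk_idx = biased.topk(self.top_k, dim=-1).indices  # chosen experts per token
        weights = scores.gather(-1, topk_idx)               # combine weights from raw scores
        weights = weights / weights.sum(-1, keepdim=True)
        if self.training:
            with torch.no_grad():
                # Count how many tokens each expert received in this batch.
                load = torch.zeros_like(self.expert_bias)
                ones = torch.ones(topk_idx.numel(), device=load.device)
                load.scatter_add_(0, topk_idx.reshape(-1), ones)
                # Raise the bias of underloaded experts, lower it for overloaded ones.
                self.expert_bias += self.bias_lr * torch.sign(load.mean() - load)
        return topk_idx, weights
```
Because the balancing signal enters through selection rather than the loss, the language-modeling gradient is left untouched; the actual implementation is in the GitHub repo linked above.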
**Pre-Training**:
| Model | Training Data | Steps | Context Length | Tokens | LR | Batch Size | Precision |
|----------------------|-------------------------------------------------------------------------------|-------|----------------|--------|-------|------------|-----------|
| [SmalLM-70M](https://huggingface.co/Azrail/smallm_70) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 70k | 1024 | 18B | 1e-3 | 0.25M | bfloat16 |
| [SmalLM-150M](#) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | - | 1024 | - | - | - | bfloat16 |
| [SmalLM-350M](#) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | - | 1024 | - | - | - | bfloat16 |
| [SmalLM-500M](#) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | - | 1024 | - | - | - | bfloat16 |
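Assuming the Batch Size column is measured in tokens, the SmalLM-70M token budget is consistent with the schedule: 70k steps × 0.25M tokens per step ≈ 17.5B, reported as 18B.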
**Evaluation**:
Evaluations were run with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness); an example invocation follows the table.
| Model | MMLU | ARC easy/hard | PIQA | HellaSwag | OBQA | Winogrande |
|----------------------|------|----------------|-------|-----------|-------|------------|
| [SmalLM-70M](#) | 25.33 | 51.47/25.68 | 61.75 | 30.31 | 30.8 | 50.83 |
| [SmalLM-150M](#) | - | - | - | - | - | - |
| [SmalLM-350M](#) | - | - | - | - | - | - |
| [SmalLM-500M](#) | - | - | - | - | - | - |
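The scores above can be reproduced with the harness. A sketch using its Python API; the task list, few-shot settings, and batch size here are assumptions, not the exact configuration used:
```python
import lm_eval

# MMLU, ARC easy/hard, PIQA, HellaSwag, OpenBookQA, and Winogrande,
# matching the columns of the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Azrail/smallm_70,trust_remote_code=True",
    tasks=["mmlu", "arc_easy", "arc_challenge", "piqa",
           "hellaswag", "openbookqa", "winogrande"],
    batch_size=16,
)
print(results["results"])
```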
**Training procedure**:
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://api.wandb.ai/links/azrails-main/58rwb1yb)
### Framework versions
- Transformers 4.50.3
- PyTorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1