---
library_name: transformers
license: mit
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text-generation
tags:
- transformer
- language-model
- experimental
---
# **SmalLM**
<hr>
<div align="center">
<a href="https://github.com/azrails/SmalLm" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-SmalLM-181717?logo=github" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/azrails/SmalLm/blob/main/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-blue.svg" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
SmalLM is a series of small transformer language models built from scratch. The project explores architectural ideas for transformers through modular pipelines for pretraining, fine-tuning, and alignment.
## Uses
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required because the model is a custom architecture
# whose code ships alongside the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Azrail/smallm_70")
model = AutoModelForCausalLM.from_pretrained("Azrail/smallm_70", trust_remote_code=True)

inputs = tokenizer("How are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```
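`generate` defaults to greedy decoding; for more varied completions you can pass standard `transformers` sampling arguments (the values below are illustrative, not tuned for this model):
```python
out = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # sample instead of greedy argmax
    temperature=0.8,  # soften the next-token distribution
    top_p=0.95,       # nucleus sampling
)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```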
## Model Details
**Key Features:**
1. Grouped Query Attention (GQA).
2. Mixture-of-Experts with auxiliary-loss-free load balancing (see the sketch below).
3. ALiBi (Attention with Linear Biases) or Rotary Position Embedding (RoPE).
4. NTK-by-parts RoPE interpolation for extending the context length.
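A minimal sketch of the routing idea behind feature 2, in the style of DeepSeek-V3's auxiliary-loss-free balancing: a per-expert bias, updated by a simple sign rule outside the gradient, steers expert *selection* toward underloaded experts, while the combine weights still come from the raw gate scores. Names, shapes, and hyperparameters here are illustrative, not SmalLM's exact implementation:
```python
import torch

class LossFreeRouter(torch.nn.Module):
    """Illustrative MoE router with bias-based (aux-loss-free) load balancing."""

    def __init__(self, dim: int, n_experts: int, top_k: int = 2, bias_lr: float = 1e-3):
        super().__init__()
        self.gate = torch.nn.Linear(dim, n_experts, bias=False)
        # Per-expert bias used only for expert selection; not a learned parameter.
        self.register_buffer("expert_bias", torch.zeros(n_experts))
        self.top_k = top_k
        self.bias_lr = bias_lr

    def forward(self, x: torch.Tensor):
        scores = self.gate(x).sigmoid()                     # (n_tokens, n_experts)
        biased = scores + self.expert_bias                  # bias affects selection only
        topk_idx = biased.topk(self.top_k, dim=-1).indices  # chosen experts per token
        weights = scores.gather(-1, topk_idx)               # combine weights from raw scores
        weights = weights / weights.sum(-1, keepdim=True)
        if self.training:
            with torch.no_grad():
                # Count how many tokens each expert received in this batch.
                load = torch.zeros_like(self.expert_bias)
                ones = torch.ones(topk_idx.numel(), device=load.device)
                load.scatter_add_(0, topk_idx.reshape(-1), ones)
                # Raise the bias of underloaded experts, lower it for overloaded ones.
                self.expert_bias += self.bias_lr * torch.sign(load.mean() - load)
        return topk_idx, weights
```
Because the balancing signal enters through selection rather than the loss, the language-modeling gradient is left untouched; the actual implementation is in the GitHub repo linked above.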
**Pre-Training**:
| Model | Training Data | Steps | Context Length | Tokens | LR | Batch Size | Precision |
|----------------------|-------------------------------------------------------------------------------|-------|----------------|--------|-------|------------|-----------|
| [SmalLM-70M](https://huggingface.co/Azrail/smallm_70) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 70k | 1024 | 18B | 1e-3 | 0.25M | bfloat16 |
| [SmalLM-150M](#) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | - | 1024 | - | - | - | bfloat16 |
| [SmalLM-350M](#) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | - | 1024 | - | - | - | bfloat16 |
| [SmalLM-500M](#) | [smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | - | 1024 | - | - | - | bfloat16 |
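Assuming the Batch Size column is measured in tokens, the SmalLM-70M token budget is consistent with the schedule: 70k steps × 0.25M tokens per step ≈ 17.5B, reported as 18B.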
**Evaluation**:
Evaluations were run with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness); an example invocation follows the table.
| Model | MMLU | ARC easy/hard | PIQA | HellaSwag | OBQA | Winogrande |
|----------------------|------|----------------|-------|-----------|-------|------------|
| [SmalLM-70M](#) | 25.33 | 51.47/25.68 | 61.75 | 30.31 | 30.8 | 50.83 |
| [SmalLM-150M](#) | - | - | - | - | - | - |
| [SmalLM-350M](#) | - | - | - | - | - | - |
| [SmalLM-500M](#) | - | - | - | - | - | - |
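The scores above can be reproduced with the harness. A sketch using its Python API; the task list, few-shot settings, and batch size here are assumptions, not the exact configuration used:
```python
import lm_eval

# MMLU, ARC easy/hard, PIQA, HellaSwag, OpenBookQA, and Winogrande,
# matching the columns of the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Azrail/smallm_70,trust_remote_code=True",
    tasks=["mmlu", "arc_easy", "arc_challenge", "piqa",
           "hellaswag", "openbookqa", "winogrande"],
    batch_size=16,
)
print(results["results"])
```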
**Training procedure**:
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://api.wandb.ai/links/azrails-main/58rwb1yb)
### Framework versions
- Transformers 4.50.3
- PyTorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1