# PicoLM-18M-A9M
PicoLM-18M-A9M is a tiny Mixture-of-Experts language model designed for experimentation with small-scale LLM architectures and dataset mixtures.
This model explores the idea that carefully curated datasets (textbooks, reasoning datasets, and code) can improve the capability of very small models.
## Model Overview
| Property | Value |
|---|---|
| Model name | PicoLM-18M-A9M |
| Architecture | Transformer + Mixture-of-Experts |
| Total parameters | ~18M |
| Active parameters | ~9M |
| Layers | 6 |
| Hidden size | 256 |
| Attention heads | 4 |
| Experts | 4 |
| Top-k routing | 2 |
| Context length | 128 tokens |
The architecture is inspired by modern Mixture-of-Experts systems, in which each token is dynamically routed to a small subset of specialized expert subnetworks rather than through one dense feed-forward block.
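The routing described above can be sketched in plain PyTorch. This is a minimal, illustrative top-2 router over 4 experts with hidden size 256 (matching the table above), not the model's actual implementation; all class and variable names are assumptions. It also shows why active parameters are roughly half the total: with top-2 routing over 4 experts, only about half of the expert feed-forward weights run per token, consistent with ~9M active out of ~18M total.

```python
# Illustrative top-k MoE layer (NOT the actual PicoLM code).
# Dimensions follow the model card: hidden=256, 4 experts, top-k=2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, hidden=256, n_experts=4, k=2):
        super().__init__()
        self.k = k
        # The router scores every expert for every token.
        self.router = nn.Linear(hidden, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden, 4 * hidden),
                nn.GELU(),
                nn.Linear(4 * hidden, hidden),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, hidden)
        logits = self.router(x)                     # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)        # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out
```

Only the two selected experts execute for each token, so per-token compute scales with the active parameter count, not the total.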
## Training Data
The model was trained on a mixture of datasets intended to balance language, reasoning, and code capabilities.
Datasets include:
- WikiText
- OpenWebText
- TinyStories
- Cosmopedia (synthetic textbook dataset)
- TinyGSM (math reasoning dataset)
- Verifiable coding problems dataset
These datasets provide:
- general language modeling
- educational explanations
- reasoning tasks
- programming examples
## Training Details
Training configuration:
| Parameter | Value |
|---|---|
| Training tokens | ~14M |
| Batch size | 32 |
| Context length | 128 |
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| Training device | NVIDIA T4 |
The model was trained using a streaming dataset pipeline to mix multiple datasets without storing them locally.
## Intended Uses
This model is intended for:
- research experiments
- architecture exploration
- tiny language model benchmarks
- educational purposes
Example tasks:
- toy text generation
- small reasoning experiments
- lightweight NLP experiments
## Limitations
This model is extremely small compared to modern LLMs.
Known limitations:
- limited world knowledge
- limited reasoning depth
- may produce incorrect or nonsensical outputs
- trained on a small number of tokens
It should not be used for production systems or critical applications.
## Ethical Considerations
The model was trained on a mix of internet-derived and synthetic datasets, which may contain biases present in the source data.
Users should be aware that generated text may reflect those biases.
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Tralalabs/PicoLM-18M-A9M")
model = AutoModelForCausalLM.from_pretrained("Tralalabs/PicoLM-18M-A9M")

prompt = "The future of artificial intelligence is"
tokens = tokenizer(prompt, return_tensors="pt")

output = model.generate(**tokens, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```