# PicoLM-18M-A9M

PicoLM-18M-A9M is a tiny Mixture-of-Experts language model designed for experimentation with small-scale LLM architectures and dataset mixtures.

This model explores the hypothesis that carefully curated datasets (textbooks, reasoning data, and code) can improve the capabilities of very small models.


## Model Overview

| Property | Value |
|---|---|
| Model name | PicoLM-18M-A9M |
| Architecture | Transformer + Mixture-of-Experts |
| Total parameters | ~18M |
| Active parameters | ~9M |
| Layers | 6 |
| Hidden size | 256 |
| Attention heads | 4 |
| Experts | 4 |
| Top-k routing | 2 |
| Context length | 128 tokens |

The architecture is inspired by modern Mixture-of-Experts systems where tokens are dynamically routed to specialized subnetworks.
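The exact router implementation is not published; the sketch below illustrates top-2 softmax routing in plain Python using the dimensions from the table above (hidden size 256, 4 experts, top-k = 2). The function names and the random router weights are illustrative assumptions, not the model's actual code.

```python
import math
import random

HIDDEN, EXPERTS, TOP_K = 256, 4, 2  # dimensions from the Model Overview table


def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(token, router_weights):
    """Score a token embedding against each expert, keep the top-k,
    and renormalize the kept probabilities into gate weights."""
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]
```

Because only 2 of the 4 experts fire per token, roughly half of the parameters are active on any forward pass, which is how ~18M total parameters yield ~9M active parameters.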


## Training Data

The model was trained on a mixture of datasets intended to balance language, reasoning, and code capabilities.

Datasets include:

- WikiText
- OpenWebText
- TinyStories
- Cosmopedia (synthetic textbook dataset)
- TinyGSM (math reasoning dataset)
- Verifiable coding problems dataset

These datasets provide:

- general language modeling
- educational explanations
- reasoning tasks
- programming examples

## Training Details

Training configuration:

| Parameter | Value |
|---|---|
| Training tokens | ~14M |
| Batch size | 32 |
| Context length | 128 |
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| Training device | NVIDIA T4 |

The model was trained using a streaming dataset pipeline to mix multiple datasets without storing them locally.
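The pipeline itself is not published; the sketch below shows the core idea of weighted stream mixing in plain Python. It is a simplified stand-in for what Hugging Face `datasets` provides via `load_dataset(..., streaming=True)` combined with `interleave_datasets` (assuming that is the tooling used); the mixing probabilities here are illustrative, not the model's actual recipe.

```python
import random


def interleave(streams, probs, seed=0, limit=None):
    """Yield examples by sampling a source stream according to probs.

    Mimics weighted dataset interleaving over streaming sources:
    nothing is materialized up front, and exhausted sources are
    dropped while the remaining ones keep mixing.
    """
    rng = random.Random(seed)
    iters = [iter(s) for s in streams]
    weights = list(probs)
    emitted = 0
    while iters and (limit is None or emitted < limit):
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
            emitted += 1
        except StopIteration:
            # Source exhausted: remove it and continue with the rest.
            del iters[i]
            del weights[i]
```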


## Intended Uses

This model is intended for:

- research experiments
- architecture exploration
- tiny language model benchmarks
- educational purposes

Example tasks:

- toy text generation
- small reasoning experiments
- lightweight NLP experiments

## Limitations

This model is extremely small compared to modern LLMs.

Known limitations:

- limited world knowledge
- limited reasoning depth
- may produce incorrect or nonsensical outputs
- trained on a small number of tokens

It should not be used for production systems or critical applications.


## Ethical Considerations

The model was trained on a mix of internet-sourced and synthetic datasets, which may contain biases present in the source data.

Users should be aware that generated text may reflect those biases.


## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and the model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Tralalabs/PicoLM-18M-A9M")
model = AutoModelForCausalLM.from_pretrained("Tralalabs/PicoLM-18M-A9M")

prompt = "The future of artificial intelligence is"
tokens = tokenizer(prompt, return_tensors="pt")

# Generate without tracking gradients, since this is inference only.
with torch.no_grad():
    output = model.generate(**tokens, max_new_tokens=50)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```