# PicoLM-18M-A9M

PicoLM-18M-A9M is a tiny Mixture-of-Experts language model designed for experimentation with small-scale LLM architectures and dataset mixtures.

This model explores the hypothesis that carefully curated datasets (textbooks, reasoning data, and code) can improve the capabilities of very small models.


## Model Overview

| Property | Value |
|---|---|
| Model name | PicoLM-18M-A9M |
| Architecture | Transformer + Mixture-of-Experts |
| Total parameters | ~18M |
| Active parameters | ~9M |
| Layers | 6 |
| Hidden size | 256 |
| Attention heads | 4 |
| Experts | 4 |
| Top-k routing | 2 |
| Context length | 128 tokens |

The architecture is inspired by modern Mixture-of-Experts systems where tokens are dynamically routed to specialized subnetworks.
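The exact router implementation is not published; the sketch below illustrates top-2 softmax routing in plain Python using the dimensions from the table above (hidden size 256, 4 experts, top-k = 2). The function names and the random router weights are illustrative assumptions, not the model's actual code.

```python
import math
import random

HIDDEN, EXPERTS, TOP_K = 256, 4, 2  # dimensions from the Model Overview table


def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(token, router_weights):
    """Score a token embedding against each expert, keep the top-k,
    and renormalize the kept probabilities into gate weights."""
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]
```

Because only 2 of the 4 experts fire per token, roughly half of the parameters are active on any forward pass, which is how ~18M total parameters yield ~9M active parameters.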


## Training Data

The model was trained on a mixture of datasets intended to balance language, reasoning, and code capabilities.

Datasets include:

- WikiText
- OpenWebText
- TinyStories
- Cosmopedia (synthetic textbook dataset)
- TinyGSM (math reasoning dataset)
- Verifiable coding problems dataset

These datasets provide:

- general language modeling
- educational explanations
- reasoning tasks
- programming examples

## Training Details

Training configuration:

| Parameter | Value |
|---|---|
| Training tokens | ~14M |
| Batch size | 32 |
| Context length | 128 |
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| Training device | NVIDIA T4 |

The model was trained using a streaming dataset pipeline to mix multiple datasets without storing them locally.
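The pipeline itself is not published; the sketch below shows the core idea of weighted stream mixing in plain Python. It is a simplified stand-in for what Hugging Face `datasets` provides via `load_dataset(..., streaming=True)` combined with `interleave_datasets` (assuming that is the tooling used); the mixing probabilities here are illustrative, not the model's actual recipe.

```python
import random


def interleave(streams, probs, seed=0, limit=None):
    """Yield examples by sampling a source stream according to probs.

    Mimics weighted dataset interleaving over streaming sources:
    nothing is materialized up front, and exhausted sources are
    dropped while the remaining ones keep mixing.
    """
    rng = random.Random(seed)
    iters = [iter(s) for s in streams]
    weights = list(probs)
    emitted = 0
    while iters and (limit is None or emitted < limit):
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
            emitted += 1
        except StopIteration:
            # Source exhausted: remove it and continue with the rest.
            del iters[i]
            del weights[i]
```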


## Intended Uses

This model is intended for:

- research experiments
- architecture exploration
- tiny language model benchmarks
- educational purposes

Example tasks:

- toy text generation
- small reasoning experiments
- lightweight NLP experiments

## Limitations

This model is extremely small compared to modern LLMs.

Known limitations:

- limited world knowledge
- limited reasoning depth
- may produce incorrect or nonsensical outputs
- trained on a small number of tokens

It should not be used for production systems or critical applications.


## Ethical Considerations

The model was trained on a mix of internet-sourced and synthetic datasets, which may contain biases present in the source data.

Users should be aware that generated text may reflect those biases.


## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and the model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Tralalabs/PicoLM-18M-A9M")
model = AutoModelForCausalLM.from_pretrained("Tralalabs/PicoLM-18M-A9M")

prompt = "The future of artificial intelligence is"
tokens = tokenizer(prompt, return_tensors="pt")

# Generate without tracking gradients, since this is inference only.
with torch.no_grad():
    output = model.generate(**tokens, max_new_tokens=50)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```