# AILO-152M: Transformer Language Model

AILO (Artificial Intelligence Language Operator) is a 152M-parameter decoder-only Transformer language model trained from scratch.
## Model Details

| Property | Value |
|---|---|
| Parameters | 151.9M |
| Architecture | Decoder-only Transformer |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Context Length | 512 tokens |
| Vocabulary | 50,257 (GPT-2 tokenizer) |
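For reference, the table above corresponds roughly to the configuration sketched below. The exact field names of `AILOConfig` are not documented in this card, so the keyword arguments (`n_layer`, `n_head`, `n_embd`, `n_ctx`, `vocab_size`) are assumptions modeled on GPT-2-style configs; check `configuration_ailo.py` for the real ones.

```python
from configuration_ailo import AILOConfig  # requires the repo files on sys.path (see Quick Start)

# Hypothetical field names (GPT-2-style); verify against configuration_ailo.py.
config = AILOConfig(
    n_layer=12,        # Transformer blocks
    n_head=12,         # attention heads per block
    n_embd=768,        # hidden size
    n_ctx=512,         # maximum context length in tokens
    vocab_size=50257,  # GPT-2 BPE vocabulary
)
```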
## Training

- Dataset: FineWeb-Edu (100B-token sample, streamed)
- Training Steps: 182,000+
- Final Loss: ~3.0
- Training Time: ~64 hours
- Optimizer: AdamW with cosine LR schedule + warm restarts (see the sketch after this list)
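A minimal sketch of how an AdamW + cosine-with-warm-restarts setup can be wired up in PyTorch. The actual training hyperparameters (learning rate, weight decay, restart period) and the data loader are not published in this card, so the values and `train_loader` below are placeholders, and the forward call assumes a Hugging Face-style output with a `.loss` field.

```python
import torch

# Placeholder hyperparameters; the real training values are not specified in this card.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer,
    T_0=10_000,  # steps until the first warm restart (assumed)
    T_mult=2,    # each restart period doubles (assumed)
)

for step, batch in enumerate(train_loader):  # train_loader: hypothetical streaming DataLoader
    # Assumption: the model returns an object with .loss when labels are provided.
    loss = model(batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```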
*Training loss curve: figure not reproduced here.*
## Quick Start

```bash
pip install torch transformers tiktoken huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
import torch
import sys

# Download the model files from the Hub
repo_id = "xxrickyxx/ailo-152m"
for f in ["config.json", "configuration_ailo.py", "modeling_ailo.py", "pytorch_model.bin"]:
    hf_hub_download(repo_id=repo_id, filename=f, local_dir="ailo_model")

# Import the custom model classes
sys.path.insert(0, 'ailo_model')
from configuration_ailo import AILOConfig
from modeling_ailo import AILOForCausalLM
import tiktoken

# Build the model and load the weights
config = AILOConfig.from_pretrained("ailo_model")
model = AILOForCausalLM(config)
state_dict = torch.load("ailo_model/pytorch_model.bin", map_location='cpu')
model.load_state_dict(state_dict, strict=False)
model.eval()

# Tokenize a prompt with the GPT-2 tokenizer (the context length is 512 tokens)
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "What is artificial intelligence?"
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens])

# Generate and decode
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=100, temperature=0.8)
print(tokenizer.decode(output_ids[0].tolist()))
```
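To confirm the parameter count listed under Model Details, you can sum the model's parameter tensors after loading; this is a standard PyTorch check, not anything specific to AILO.

```python
# Should print roughly 151.9M, matching the Model Details table.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```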
## AILO vs GPT-2 Arena Comparison

We compared AILO-152M against GPT-2 (124M) on a set of prompts. Despite the similar size, AILO shows better coherence and fewer repetitions on these prompts.
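A minimal sketch of how such a side-by-side comparison can be run, assuming the AILO model and tokenizer from the Quick Start are already loaded. The GPT-2 baseline is loaded via `transformers`; the exact prompts and sampling settings used for the examples below are not specified in this card, so the settings here are assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Baseline: GPT-2 (124M) from the Hugging Face Hub
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "What is artificial intelligence?"

with torch.no_grad():
    # AILO (model/tokenizer from the Quick Start above)
    ailo_ids = torch.tensor([tokenizer.encode(prompt)])
    ailo_out = model.generate(ailo_ids, max_new_tokens=100, temperature=0.8)

    # GPT-2 baseline with comparable sampling settings (assumed)
    gpt2_ids = gpt2_tok(prompt, return_tensors="pt").input_ids
    gpt2_out = gpt2.generate(gpt2_ids, max_new_tokens=100, do_sample=True, temperature=0.8)

print("AILO :", tokenizer.decode(ailo_out[0].tolist()))
print("GPT-2:", gpt2_tok.decode(gpt2_out[0]))
```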
### Example 1: "What is artificial intelligence?"

| Model | Response |
|---|---|
| AILO ✅ | "The term artificial intelligence refers to a range of diverse fields of research that focuses on the ability of human beings to understand complex systems, perform complex tasks, and perform complex operations..." |
| GPT-2 ❌ | "How do you find out? What do you do when you're out in the field? The answer is, you have to do what you know..." |
### Example 2: "Write a short story about a robot"

| Model | Response |
|---|---|
| AILO ✅ | "But the robot has no control of the robot itself. It uses the robot's hand to drive it. The robot is able to read the information about the robot..." |
| GPT-2 ❌ | "Write a short story about a robot. Write a short story about a robot. Write a short story about a robot..." (infinite repetition) |
### Example 3: "Tell me a joke"

| Model | Response |
|---|---|
| AILO ✅ | "I think I could have made the joke. But, it's just really bad. I have never made a joke. It's only a joke..." |
| GPT-2 ❌ | "It is not funny. It is not funny. It is not funny. It is not funny. It is not funny..." (infinite repetition) |
### Summary

| Metric | AILO-152M | GPT-2 (124M) |
|---|---|---|
| Parameters | 151.9M | 124.4M |
| Coherence | ✅ Better | ⚠️ Often loses track |
| Repetition | ✅ Rare | ❌ Frequent |
| Training Time | 64 hours | Weeks |
## Intended Uses

- Text generation
- Fine-tuning for specific domains (see the sketch after this list)
- Educational purposes
- Research on small language models
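A minimal fine-tuning sketch under stated assumptions: it reuses the model and tokenizer loaded in the Quick Start, uses a tiny in-memory corpus purely for illustration, and assumes the forward pass returns next-token logits of shape `(batch, seq, vocab)` (possibly wrapped in an output object); the real `AILOForCausalLM` signature may differ, so check `modeling_ailo.py`.

```python
import torch
import torch.nn.functional as F
import tiktoken

# Tiny illustrative corpus; replace with your own domain data.
texts = ["AILO is a small language model.", "Fine-tuning adapts it to a new domain."]

tokenizer = tiktoken.get_encoding("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # model from the Quick Start
model.train()

for epoch in range(3):
    for text in texts:
        ids = torch.tensor([tokenizer.encode(text)[:512]])  # respect the 512-token context
        out = model(ids)
        # Assumption: logits are returned directly or as an output object with .logits.
        logits = out.logits if hasattr(out, "logits") else out
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),  # predict token t+1 from token t
            ids[:, 1:].reshape(-1),
        )
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```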
## Limitations
- Small model size (152M) limits capabilities compared to larger models
- May produce repetitive or incoherent text for complex queries
- Training data primarily in English
## Files

| File | Description |
|---|---|
| `config.json` | Model configuration |
| `configuration_ailo.py` | Config class |
| `modeling_ailo.py` | Model architecture |
| `pytorch_model.bin` | Model weights (607 MB) |
| `AILO_Demo.ipynb` | Colab notebook |
## Citation

```bibtex
@misc{ailo2026,
  title={AILO-152M: A Small Transformer Language Model},
  author={AILO Team},
  year={2026},
  howpublished={\url{https://huggingface.co/xxrickyxx/ailo-152m}}
}
```
## License

MIT License