AILO-152M: Transformer Language Model

AILO (Artificial Intelligence Language Operator) is a 152M parameter Transformer-based language model trained from scratch.

Model Details

| Property | Value |
| --- | --- |
| Parameters | 151.9M |
| Architecture | Decoder-only Transformer |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Context Length | 512 tokens |
| Vocabulary | 50,257 (GPT-2 tokenizer) |
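
The vocabulary size can be checked directly against the GPT-2 tokenizer that AILO reuses; a quick verification, assuming only that tiktoken is installed:

```python
import tiktoken

# The GPT-2 BPE encoding that AILO's tokenizer row refers to
enc = tiktoken.get_encoding("gpt2")
print(enc.n_vocab)  # 50257, matching the Vocabulary row above
```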

Training

  • Dataset: FineWeb-Edu (100B token sample, streamed)
  • Training Steps: 182,000+
  • Final Loss: ~3.0
  • Training Time: ~64 hours
  • Optimizer: AdamW with a cosine LR schedule and warm restarts (see the sketch below)
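
The exact optimizer hyperparameters are not published on this card. Here is a minimal sketch of AdamW with cosine annealing and warm restarts using PyTorch's built-in scheduler; the learning rate, weight decay, and restart period `T_0` are illustrative guesses, not AILO's actual values:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# A dummy parameter stands in for the real model; lr, weight_decay, and T_0
# are illustrative guesses, not AILO's published hyperparameters.
params = [torch.nn.Parameter(torch.zeros(10))]
optimizer = torch.optim.AdamW(params, lr=3e-4, weight_decay=0.1)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10_000, T_mult=2)

for step in range(182_000):
    # In real training, a forward pass and loss.backward() precede these steps.
    optimizer.step()
    scheduler.step()
```

Each `scheduler.step()` advances the cosine decay; once a cycle of `T_0` steps completes (then `2 * T_0`, and so on with `T_mult=2`), the learning rate "warm restarts" back to its initial value.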

Training Loss Curve

[Figure: training loss curve]

Quick Start

```bash
# Install dependencies
pip install torch transformers tiktoken huggingface_hub
```

```python
# Download and use
from huggingface_hub import hf_hub_download
import torch
import sys

# Download the model files (config, custom code, and weights)
repo_id = "xxrickyxx/ailo-152m"
for f in ["config.json", "configuration_ailo.py", "modeling_ailo.py", "pytorch_model.bin"]:
    hf_hub_download(repo_id=repo_id, filename=f, local_dir="ailo_model")

# Load the custom model classes shipped with the checkpoint
sys.path.insert(0, 'ailo_model')
from configuration_ailo import AILOConfig
from modeling_ailo import AILOForCausalLM
import tiktoken

config = AILOConfig.from_pretrained("ailo_model")
model = AILOForCausalLM(config)
state_dict = torch.load("ailo_model/pytorch_model.bin", map_location='cpu')
model.load_state_dict(state_dict, strict=False)  # strict=False tolerates non-critical key mismatches
model.eval()

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "What is artificial intelligence?"
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens])

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=100, temperature=0.8)

print(tokenizer.decode(output_ids[0].tolist()))
```
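
With the model loaded as above (`model` in scope), the parameter count from the Model Details table can be sanity-checked directly:

```python
# Should print roughly 151.9M, per the Model Details table
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```

If the repo's config.json registers the custom classes via `auto_map`, loading through transformers' `AutoModelForCausalLM.from_pretrained("xxrickyxx/ailo-152m", trust_remote_code=True)` may also work, but that depends on how the repository is configured.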

πŸ† AILO vs GPT-2 Arena Comparison

We compared AILO-152M against GPT-2 (124M) on a set of prompts. Despite their similar size, AILO produces more coherent text and falls into repetition far less often.
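
For reference, here is a minimal sketch of how the GPT-2 side of such a comparison can be reproduced with transformers; the decoding settings (sampling at temperature 0.8, 100 new tokens) mirror the Quick Start above but are assumptions, since the card does not state the exact settings used:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the 124M-parameter GPT-2 baseline
tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("What is artificial intelligence?", return_tensors="pt").input_ids
with torch.no_grad():
    out = gpt2.generate(ids, max_new_tokens=100, do_sample=True,
                        temperature=0.8, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
```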

Example 1: What is artificial intelligence?

| Model | Response |
| --- | --- |
| AILO ✅ | "The term artificial intelligence refers to a range of diverse fields of research that focuses on the ability of human beings to understand complex systems, perform complex tasks, and perform complex operations..." |
| GPT-2 ❌ | "How do you find out? What do you do when you're out in the field? The answer is, you have to do what you know..." |

Example 2: Write a short story about a robot

| Model | Response |
| --- | --- |
| AILO ✅ | "But the robot has no control of the robot itself. It uses the robot's hand to drive it. The robot is able to read the information about the robot..." |
| GPT-2 ❌ | "Write a short story about a robot. Write a short story about a robot. Write a short story about a robot..." (infinite repetition) |

Example 3: Tell me a joke

| Model | Response |
| --- | --- |
| AILO ✅ | "I think I could have made the joke. But, it's just really bad. I have never made a joke. It's only a joke..." |
| GPT-2 ❌ | "It is not funny. It is not funny. It is not funny. It is not funny. It is not funny..." (infinite repetition) |

Summary

| Metric | AILO-152M | GPT-2 (124M) |
| --- | --- | --- |
| Parameters | 151.9M | 124.4M |
| Coherence | ✅ Better | ⚠️ Often loses track |
| Repetition | ✅ Rare | ❌ Frequent |
| Training Time | 64 hours | Weeks |
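
The coherence and repetition rows are qualitative judgments. One simple way to put a number on repetition (not the method used for this table, just an illustration) is the distinct n-gram ratio over generated token IDs:

```python
def distinct_ngram_ratio(tokens: list, n: int = 3) -> float:
    """Fraction of n-grams that are unique; values near 0 indicate heavy repetition."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# A degenerate loop like GPT-2's "It is not funny. It is not funny. ..."
# scores far lower than varied text:
print(distinct_ngram_ratio([1, 2, 3, 1, 2, 3, 1, 2, 3]))  # ~0.43
```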

Intended Uses

  • Text generation
  • Fine-tuning for specific domains (see the sketch after this list)
  • Educational purposes
  • Research on small language models
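
A minimal fine-tuning sketch, assuming the Quick Start cell has run so `model` is in scope, and assuming `AILOForCausalLM` follows the common Hugging Face convention of returning a loss when `labels` is passed (an assumption about this custom class; check modeling_ailo.py). The learning rate and data are placeholders:

```python
import torch
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
texts = ["your domain-specific training text ..."]  # replace with a real corpus
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for text in texts:
    ids = torch.tensor([tokenizer.encode(text)[:512]])  # clip to the 512-token context
    loss = model(input_ids=ids, labels=ids).loss        # assumes an HF-style loss output
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```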

Limitations

  • Small model size (152M) limits capabilities compared to larger models
  • May produce repetitive or incoherent text for complex queries
  • Training data is primarily English

Files

| File | Description |
| --- | --- |
| config.json | Model configuration |
| configuration_ailo.py | Config class |
| modeling_ailo.py | Model architecture |
| pytorch_model.bin | Model weights (607 MB) |
| AILO_Demo.ipynb | Colab notebook |

Citation

```bibtex
@misc{ailo2026,
  title={AILO-152M: A Small Transformer Language Model},
  author={AILO Team},
  year={2026},
  howpublished={\url{https://huggingface.co/xxrickyxx/ailo-152m}}
}
```

License

MIT License
