Meet25M Base

A small GPT-style causal language model trained from scratch.

Model

  • Architecture: GPT-style decoder-only Transformer
  • Approximate size: ~25M parameters (26.7M in the F32 safetensors checkpoint)
  • Context length: 1024
  • Tokenizer: custom byte-level BPE
  • Positional encoding: RoPE
  • Normalization: RMSNorm
  • MLP: SwiGLU
  • Embeddings: tied input/output embeddings
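
The sketch below illustrates these components in PyTorch. The hyperparameter values (vocab size, depth, width, head count) are illustrative guesses chosen to land near ~25M parameters; the real values are in config.json, and the class names do not necessarily match the training script. The RMSNorm and SwiGLU modules follow the standard formulations named in the list.

from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class GPTConfig:
    # Only block_size is taken from the card; the rest are illustrative.
    vocab_size: int = 32000
    block_size: int = 1024   # context length
    n_layer: int = 8
    n_head: int = 8
    n_embd: int = 512


class RMSNorm(nn.Module):
    """Root-mean-square normalization, used here in place of LayerNorm."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: (SiLU(x W1) * (x W3)) W2."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))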

Training Data Mix

Target pretraining mix:

  • FineWeb-Edu
  • FineWeb general
  • Wikipedia
  • OpenWebMath
  • Project Gutenberg
  • Stack Overflow / Stack Exchange-style posts
  • CodeSearchNet

Total target: ~250M training tokens.
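
The card specifies the sources and the overall token budget but not the per-source proportions. The weights in the sketch below are purely illustrative placeholders showing how a ~250M-token budget could be split and sampled; they are not the actual training mix.

import random

# Illustrative weights only; the real mix proportions are not published.
MIX = {
    "fineweb_edu": 0.40,
    "fineweb_general": 0.20,
    "wikipedia": 0.10,
    "openwebmath": 0.10,
    "gutenberg": 0.10,
    "stackexchange": 0.05,
    "codesearchnet": 0.05,
}

TOKEN_BUDGET = 250_000_000  # ~250M training tokens (from the card)


def tokens_per_source(budget: int = TOKEN_BUDGET) -> dict[str, int]:
    """Split the total token budget across sources by the (assumed) weights."""
    return {name: int(budget * w) for name, w in MIX.items()}


def sample_source(rng: random.Random) -> str:
    """Draw one source per document according to the mix weights."""
    return rng.choices(list(MIX), weights=list(MIX.values()), k=1)[0]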

Files

  • model.safetensors – safetensors checkpoint
  • config.json – model config
  • tokenizer/ – tokenizer files
  • safetensors_info.json – checkpoint metadata
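
One way to inspect these files is with the safetensors and tokenizers libraries, as sketched below. The tokenizer/tokenizer.json path is an assumption about how the tokenizer directory is laid out; adjust it to the actual file names.

import json

from safetensors import safe_open
from tokenizers import Tokenizer

# Inspect the checkpoint without instantiating the model.
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    names = list(f.keys())
    total = sum(f.get_tensor(n).numel() for n in names)
print(f"{len(names)} tensors, {total / 1e6:.1f}M parameters")

# Side-car metadata written at export time.
with open("safetensors_info.json") as fh:
    print(json.dumps(json.load(fh), indent=2))

# Assumes tokenizer/ contains a tokenizers-library JSON file.
tok = Tokenizer.from_file("tokenizer/tokenizer.json")
print(tok.encode("Hello, world!").ids)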

Loading

This is not a standard Transformers AutoModelForCausalLM checkpoint.
Use the custom GPT class from the training script and load model.safetensors.
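
A rough sketch of that loading flow follows, assuming the training script exposes a GPT class and a matching config class. The import path, class names, and constructor signature below are placeholders, not the actual API.

import json

import torch
from safetensors.torch import load_file

# `GPT` and `GPTConfig` stand in for the classes defined in the training
# script; their real names and arguments may differ.
from train import GPT, GPTConfig  # hypothetical import path

with open("config.json") as fh:
    # Assumes config.json maps one-to-one onto the config class fields.
    cfg = GPTConfig(**json.load(fh))

model = GPT(cfg)
model.load_state_dict(load_file("model.safetensors"))
model.eval()

# Smoke test: forward pass on a dummy batch of token ids.
dummy = torch.randint(0, cfg.vocab_size, (1, 16))
with torch.no_grad():
    out = model(dummy)  # may be logits, or a (logits, loss) tuple
print(out.shape if torch.is_tensor(out) else type(out))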
