simple-100m-pretrain-v1

This is a 120M-parameter decoder-only transformer, trained from scratch on 1 billion tokens of high-quality web data, math, and code.

Model Details

Architecture: Llama-based (32 Layers)
Attention: Grouped Query Attention (GQA), 2:1 ratio of query heads to key/value heads
Hidden Size: 512
Intermediate Size: 1408
Context Length: 1024
Tokenizer: GPT-2
Total Tokens: 1 Billion
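
As a rough sanity check on the 120M figure, the parameter count can be estimated from the dimensions listed above. The sketch below assumes a standard Llama layout (bias-free projections, a three-matrix gated MLP, two RMSNorm weights per layer), the GPT-2 vocabulary size of 50,257, and tied input/output embeddings. None of these details are stated in this card, and the function name is hypothetical, so treat the result as an estimate rather than the exact checkpoint size.

```python
def estimate_llama_params(hidden=512, layers=32, intermediate=1408,
                          vocab=50257, q_per_kv=2, tied_embeddings=True):
    """Estimate the parameter count of a Llama-style decoder (assumed layout)."""
    kv_dim = hidden // q_per_kv                       # GQA shrinks the K and V projections
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
    mlp = 3 * hidden * intermediate                   # gate, up, and down projections
    norms = 2 * hidden                                # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden                            # token embedding table
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden
    return layers * per_layer + embed + lm_head + final_norm

print(f"{estimate_llama_params() / 1e6:.1f}M")  # prints 120.1M under the assumptions above
```

With untied embeddings the same estimate rises to roughly 145.9M, so the stated 120M is consistent with tied embeddings under these assumptions.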

Training Configuration

Hardware: TPU v5e-8 (Kaggle)
Optimizer: AdamW (Peak LR: 5e-4)
Schedule: Cosine decay with 2% warmup
Precision: bfloat16
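
The schedule above can be sketched as a plain Python function. This is a minimal illustration, not the training code used for this model: it assumes linear warmup, decay to a minimum learning rate of zero, and a hypothetical `total_steps` value, none of which are specified in this card.

```python
import math

def lr_at(step, total_steps, peak_lr=5e-4, warmup_frac=0.02, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly up to the peak over the warmup steps.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical run length, chosen only for illustration.
total_steps = 1000
for step in (0, 19, 20, 510, 1000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With 1,000 steps, the 2% warmup covers the first 20 steps, after which the rate follows a half cosine from the peak down to `min_lr`.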

Benchmarks (zero-shot unless noted)

Metric	Task	Value
Perplexity	Training-Validation	35.04
Accuracy	ARC-Easy	31.10%
Accuracy	PIQA	54.13%
Accuracy	HellaSwag	26.22%
Accuracy	TruthfulQA (MC1)	27.42%
Accuracy	GSM8K (5-shot)	1.36%
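
For readers comparing the perplexity against a training loss curve, the two are related by an exponential. The sketch below assumes the reported perplexity is per token and based on the natural logarithm, which is the usual convention but is not stated in this card; the helper names are hypothetical.

```python
import math

def perplexity_to_loss(perplexity):
    """Mean cross-entropy in nats per token for a given perplexity."""
    return math.log(perplexity)

def loss_to_perplexity(loss):
    """Perplexity for a given mean cross-entropy in nats per token."""
    return math.exp(loss)

# The validation perplexity reported in the table above.
print(f"{perplexity_to_loss(35.04):.3f} nats per token")
```

Under that assumption, a perplexity of 35.04 corresponds to a validation loss of about 3.56 nats per token.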

Usage

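from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model weights and the tokenizer from the Hugging Face Hub.
repo_id = "Dodosoomro/simple-100m-pretrain-v1"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
# max_length caps the total sequence length, prompt tokens included.
outputs = model.generate(**inputs, max_length=50)
# Decode the full sequence (prompt plus continuation), dropping special tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))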
