simple-100m-pretrain-v1

This is a 120M-parameter decoder-only transformer trained from scratch on 1 billion tokens of high-quality web data, math, and code.

Model Details

  • Architecture: Llama-based (32 Layers)
  • Attention: Grouped-Query Attention (GQA), 2:1 ratio of query heads to key/value heads
  • Hidden Size: 512
  • Intermediate Size: 1408
  • Context Length: 1024
  • Tokenizer: GPT-2
  • Total Tokens: 1 Billion
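
As a sanity check on the 120M figure, the parameter count can be recomputed from the values above. This is a minimal sketch that rests on several assumptions the card does not state: a standard Llama block (bias-free q/k/v/o projections, a SwiGLU MLP with gate, up and down matrices, and two RMSNorm weight vectors per layer), query heads that together span the full hidden size, the stock GPT-2 vocabulary of 50,257 tokens, and tied input/output embeddings. `llama_param_count` is a hypothetical helper, not part of any library.

```python
# Values from the card
HIDDEN = 512
INTERMEDIATE = 1408
N_LAYERS = 32
GQA_RATIO = 2  # query heads per key/value head

# Assumptions (not stated on the card)
VOCAB = 50257  # stock GPT-2 vocabulary size
TIED = True    # output head assumed to share the input embedding matrix


def llama_param_count(hidden=HIDDEN, inter=INTERMEDIATE, layers=N_LAYERS,
                      gqa_ratio=GQA_RATIO, vocab=VOCAB, tied=TIED):
    """Count weights in a bias-free Llama-style decoder (hypothetical helper)."""
    kv_dim = hidden // gqa_ratio                      # K/V projection width shrinks under GQA
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o full width; k, v reduced
    mlp = 3 * hidden * inter                          # SwiGLU gate, up and down projections
    norms = 2 * hidden                                # RMSNorm weights before attention and MLP
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    head = 0 if tied else vocab * hidden              # no extra weights when tied
    final_norm = hidden
    return layers * per_layer + embed + head + final_norm


print(f"{llama_param_count():,}")  # 120,136,704
```

Under these assumptions the total is 120,136,704 parameters (about 120.1M), consistent with the stated size. An untied output head would instead give 145,868,288.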

Training Configuration

  • Hardware: TPU v5e-8 (Kaggle)
  • Optimizer: AdamW (Peak LR: 5e-4)
  • Schedule: Cosine decay with 2% warmup
  • Precision: bfloat16
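
The schedule above can be sketched as a function of the step index. This sketch assumes that the 2% warmup is linear and that the cosine decays all the way to zero. It also assumes a global batch of 256 sequences of 1,024 tokens, which converts the 1B-token budget into a step count. The card states none of these three things, so `GLOBAL_BATCH_SEQS`, `min_lr`, `total_steps` and `lr_at` are illustrative names only.

```python
import math

# Values from the card
PEAK_LR = 5e-4
WARMUP_FRAC = 0.02
TOTAL_TOKENS = 1_000_000_000
CONTEXT_LEN = 1024

# Hypothetical: the card does not state the global batch size
GLOBAL_BATCH_SEQS = 256


def total_steps(tokens=TOTAL_TOKENS, seq_len=CONTEXT_LEN, batch=GLOBAL_BATCH_SEQS):
    """Optimizer steps needed to consume `tokens` with fully packed sequences."""
    return math.ceil(tokens / (seq_len * batch))


def lr_at(step, n_steps, peak_lr=PEAK_LR, warmup_frac=WARMUP_FRAC, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr (min_lr=0 is assumed)."""
    warmup_steps = max(1, int(warmup_frac * n_steps))
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr exactly
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = min(1.0, (step - warmup_steps) / max(1, n_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With the hypothetical batch size, 1B tokens works out to 3,815 optimizer steps, of which the first 76 are warmup. At the midpoint of the decay phase the learning rate is half the peak, 2.5e-4.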

Benchmarks (Zero-Shot unless noted)

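| Metric     | Task                       | Value  |
|------------|----------------------------|--------|
| Perplexity | Training validation split  | 35.04  |
| Accuracy   | ARC-Easy                   | 31.10% |
| Accuracy   | PIQA                       | 54.13% |
| Accuracy   | HellaSwag                  | 26.22% |
| Accuracy   | TruthfulQA (MC1)           | 27.42% |
| Accuracy   | GSM8K (5-shot)             | 1.36%  |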
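
Perplexity is the exponential of the mean per-token cross-entropy, measured in nats. The 35.04 figure therefore corresponds to a validation loss of about 3.556 nats per token. The helpers below are a minimal sketch of that relationship and are not the evaluation code used for this card. In Hugging Face `transformers`, calling a causal LM with `labels=input_ids` returns this mean cross-entropy as the `.loss` field of its output, and `math.exp` of that value gives perplexity.

```python
import math


def perplexity(token_nlls):
    """exp of the mean negative log-likelihood (nats) over all scored tokens."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def loss_from_perplexity(ppl):
    """Mean cross-entropy in nats/token implied by a perplexity value."""
    return math.log(ppl)


print(round(loss_from_perplexity(35.04), 3))  # 3.556
```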

Usage

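from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained weights and the matching GPT-2 tokenizer from the Hub
model_id = "Dodosoomro/simple-100m-pretrain-v1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 50 new tokens; GPT-2 has no pad token, so reuse EOS for padding
outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))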

Model Files

  • Format: Safetensors
  • Tensor Type: BF16
  • Model Size: 0.1B params