simple-100m-pretrain-v1
This is a 120M-parameter decoder-only transformer trained from scratch on 1 billion tokens of high-quality web data, math, and code.
Model Details
- Architecture: Llama-based (32 Layers)
- Attention: Grouped-Query Attention (GQA), 2:1 ratio
- Hidden Size: 512
- Intermediate Size: 1408
- Context Length: 1024
- Tokenizer: GPT-2
- Total Tokens: 1 Billion
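As a sanity check on the figures above, the parameter count can be estimated from the listed dimensions. This is a minimal sketch, not the author's accounting, and it rests on several assumptions the card does not state: the GPT-2 vocabulary of 50,257 tokens, a gated (SwiGLU-style) MLP with three weight matrices as in standard Llama blocks, no bias terms, tied input/output embeddings, and "2:1" read as two query heads per key/value head. Under those assumptions the total comes out at roughly 120.1M, in line with the 120M stated above.

```python
# Rough parameter estimate from the Model Details list.
# Assumptions (not stated in the card): GPT-2 vocab of 50,257 tokens,
# SwiGLU-style MLP (gate, up, down), no biases, tied embeddings,
# and a 2:1 GQA ratio meaning two query heads per key/value head.
VOCAB_SIZE = 50_257
HIDDEN = 512
INTERMEDIATE = 1408
LAYERS = 32
QUERY_HEADS_PER_KV_HEAD = 2


def count_parameters(tie_embeddings: bool = True) -> int:
    """Return an estimated parameter count for the assumed architecture."""
    kv_dim = HIDDEN // QUERY_HEADS_PER_KV_HEAD
    # Q and O projections are HIDDEN x HIDDEN; K and V are HIDDEN x kv_dim.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim
    # Gate, up, and down projections.
    mlp = 3 * HIDDEN * INTERMEDIATE
    # Two normalization weight vectors per layer.
    norms = 2 * HIDDEN
    per_layer = attention + mlp + norms

    embeddings = VOCAB_SIZE * HIDDEN
    if not tie_embeddings:
        embeddings *= 2  # separate output projection

    final_norm = HIDDEN
    return LAYERS * per_layer + final_norm + embeddings


if __name__ == "__main__":
    print(f"tied:   {count_parameters(True) / 1e6:.1f}M")
    print(f"untied: {count_parameters(False) / 1e6:.1f}M")
```

With untied embeddings the same arithmetic gives a noticeably larger total, so the tied-embedding assumption is the one most consistent with the stated size.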
Training Configuration
- Hardware: TPU v5e-8 (Kaggle)
- Optimizer: AdamW (Peak LR: 5e-4)
- Schedule: Cosine decay with 2% warmup
- Precision: bfloat16
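The schedule above can be sketched as a plain function of the step index. The peak learning rate and warmup fraction come from the list; the linear warmup shape, the floor of `min_lr=0.0`, and the `total_steps` value in the usage lines are assumptions for illustration, since the card does not give them.

```python
import math

PEAK_LR = 5e-4          # from the card
WARMUP_FRACTION = 0.02  # from the card ("2% warmup")


def learning_rate(step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to min_lr.

    The warmup shape and min_lr are assumptions; the card only states
    "cosine decay with 2% warmup".
    """
    warmup_steps = max(1, int(WARMUP_FRACTION * total_steps))
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches the peak.
        return PEAK_LR * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000  # hypothetical step count, not taken from the card
    for s in (0, 100, 199, 5_000, 10_000):
        print(s, f"{learning_rate(s, total):.2e}")
```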
Benchmarks (Zero-Shot unless noted)
| Metric | Task | Value |
|---|---|---|
| Perplexity | Training-Validation | 35.04 |
| Accuracy | ARC-Easy | 31.10% |
| Accuracy | PIQA | 54.13% |
| Accuracy | HellaSwag | 26.22% |
| Accuracy | TruthfulQA (MC1) | 27.42% |
| Accuracy | GSM8K (5-shot) | 1.36% |
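To relate the perplexity row to a training loss curve, the sketch below converts between the two. It assumes the reported perplexity is the exponential of the mean per-token cross-entropy in nats, which is the usual convention but is not confirmed by the card. Under that assumption, a perplexity of 35.04 corresponds to a loss of about 3.56 nats per token.

```python
import math

VALIDATION_PERPLEXITY = 35.04  # from the table above


def perplexity_to_loss(perplexity: float) -> float:
    """Mean cross-entropy in nats per token, assuming ppl = exp(loss)."""
    return math.log(perplexity)


def loss_to_bits(loss_nats: float) -> float:
    """Convert a loss in nats per token to bits per token."""
    return loss_nats / math.log(2)


if __name__ == "__main__":
    loss = perplexity_to_loss(VALIDATION_PERPLEXITY)
    print(f"loss: {loss:.3f} nats/token, {loss_to_bits(loss):.3f} bits/token")
```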
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Dodosoomro/simple-100m-pretrain-v1")
tokenizer = AutoTokenizer.from_pretrained("Dodosoomro/simple-100m-pretrain-v1")
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)  # up to 50 tokens beyond the prompt
print(tokenizer.decode(outputs[0], skip_special_tokens=True))