# Tensa-124M
Tensa-124M is a 124M-parameter causal language model based on the GPT-2 architecture, modified to use SwiGLU-style gated MLPs.
It was trained for 50,000 steps on OpenWebText and reaches a validation perplexity of roughly 23.
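For reference, perplexity is just the exponential of the mean cross-entropy loss (in nats per token), so a perplexity of ~23 corresponds to a validation loss of about 3.14. A quick sketch (the loss value here is illustrative, not the model's reported number):

```python
import math

# Perplexity = exp(mean cross-entropy loss in nats per token).
val_loss = 3.14  # hypothetical validation loss
perplexity = math.exp(val_loss)
print(round(perplexity, 1))  # ~23.1
```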
## Architecture
- Embedding size: 768
- Layers: 12
- Attention heads: 12
- Context length: 1024
- SwiGLU-style MLP (gate + up + down projections)
- Flash attention compatible
- Designed for high-throughput training (H100 optimized)
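The SwiGLU-style MLP replaces GPT-2's single up-projection + GELU with a gated pair of projections: the input is projected twice, one branch is passed through SiLU and used to gate the other, and the product is projected back down. A minimal NumPy sketch (weight shapes and the 3072 intermediate size are assumptions for illustration, not taken from the released checkpoint):

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # Gated MLP: down( SiLU(x @ gate) * (x @ up) )
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d_model, d_ff = 768, 3072  # d_ff is a hypothetical intermediate size
rng = np.random.default_rng(0)
x = rng.standard_normal((1, d_model))
w_gate = rng.standard_normal((d_model, d_ff)) * 0.02
w_up = rng.standard_normal((d_model, d_ff)) * 0.02
w_down = rng.standard_normal((d_ff, d_model)) * 0.02

y = swiglu_mlp(x, w_gate, w_up, w_down)
print(y.shape)  # (1, 768): output stays in the embedding dimension
```

Note that SwiGLU needs three weight matrices where GPT-2's MLP needs two, which is why gated variants often shrink the intermediate size to keep parameter counts comparable.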
## Usage (Transformers)
This model works with Hugging Face Transformers using a custom architecture (loaded via `trust_remote_code=True`).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ragunath-ravi/Tensa-124M",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```