Tensa-124M

Tensa-124M is a 124M-parameter causal language model derived from the GPT-2 architecture, with the standard MLP blocks replaced by SwiGLU-style gated MLPs.

It was trained for 50,000 steps on OpenWebText and achieves a validation perplexity of ~23.
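To put the perplexity figure in context, the sketch below converts between perplexity and mean cross-entropy loss. It assumes the usual definition (perplexity is the exponential of the mean per-token negative log-likelihood in nats); the model card does not state how the reported value was computed, and the helper names are illustrative.

```python
import math


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll_nats)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: mean per-token loss in nats for a given perplexity."""
    return math.log(perplexity)


# Assumed definition: a perplexity of ~23 corresponds to a loss of ln(23) nats.
loss = loss_from_perplexity(23.0)
print(f"mean loss for perplexity 23: {loss:.3f} nats")
```

Under that assumption, a validation perplexity of about 23 corresponds to a validation loss of roughly 3.14 nats per token.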


Architecture

  • Embedding size: 768
  • Layers: 12
  • Attention heads: 12
  • Context length: 1024
  • SwiGLU-style MLP (gate + up + down projections)
  • Flash attention compatible
  • Designed for high-throughput training (H100 optimized)
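The gated MLP and the parameter budget implied by the list above can be sketched in plain Python. This is an illustration, not the repository's code: the hidden width of 2048, the vocabulary size of 50257 (the GPT-2 tokenizer's), tied input/output embeddings, learned position embeddings, and the omission of biases and layer-norm parameters are all assumptions, and the function names are hypothetical.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """Compute down(silu(gate(x)) * up(x)); biases omitted (assumption)."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


def count_params(d_model=768, n_layers=12, n_ctx=1024,
                 vocab_size=50257, d_hidden=2048):
    """Count weight-matrix parameters; vocab_size and d_hidden are assumed."""
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position
    attention = 4 * d_model * d_model       # Q, K, V and output projections
    mlp = 3 * d_model * d_hidden            # gate, up and down projections
    return embeddings + n_layers * (attention + mlp)


# Toy example: 2-dimensional input, 3-dimensional hidden layer.
x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_down = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
y = swiglu_mlp(x, w_gate, w_up, w_down)
total = count_params()
print(y, f"{total / 1e6:.1f}M parameters")
```

With these assumed values the count comes to about 124.3M, which is consistent with the model's name. A hidden width of 2048 (8/3 of the embedding size) is a common choice because three projections of that width cost the same as GPT-2's two projections of width 3072.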

Usage (Transformers)

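The model loads through Hugging Face Transformers. Because it uses a custom architecture, pass trust_remote_code=True so that the model code shipped in the repository is executed.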

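from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True runs the custom model code from the repository
model = AutoModelForCausalLM.from_pretrained(
    "ragunath-ravi/Tensa-124M",
    trust_remote_code=True
)

# The model reuses the GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")

# GPT-2 defines no pad token, so reuse the end-of-text token for padding
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))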