# Tensa-124M
Tensa-124M is a 124M-parameter causal language model based on the GPT-2 architecture, modified to use SwiGLU-style gated MLPs.
It was trained for 50,000 steps on OpenWebText and reaches a validation perplexity of roughly 23.
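For reference, perplexity is just the exponential of the mean cross-entropy loss (in nats per token), so a perplexity of ~23 corresponds to a validation loss of about 3.14. A quick sketch (the loss value here is illustrative, not the model's reported number):

```python
import math

# Perplexity = exp(mean cross-entropy loss in nats per token).
val_loss = 3.14  # hypothetical validation loss
perplexity = math.exp(val_loss)
print(round(perplexity, 1))  # ~23.1
```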
## Architecture
- Embedding size: 768
- Layers: 12
- Attention heads: 12
- Context length: 1024
- SwiGLU-style MLP (gate + up + down projections)
- Flash attention compatible
- Designed for high-throughput training (H100 optimized)
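The SwiGLU-style MLP replaces GPT-2's single up-projection + GELU with a gated pair of projections: the input is projected twice, one branch is passed through SiLU and used to gate the other, and the product is projected back down. A minimal NumPy sketch (weight shapes and the 3072 intermediate size are assumptions for illustration, not taken from the released checkpoint):

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # Gated MLP: down( SiLU(x @ gate) * (x @ up) )
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d_model, d_ff = 768, 3072  # d_ff is a hypothetical intermediate size
rng = np.random.default_rng(0)
x = rng.standard_normal((1, d_model))
w_gate = rng.standard_normal((d_model, d_ff)) * 0.02
w_up = rng.standard_normal((d_model, d_ff)) * 0.02
w_down = rng.standard_normal((d_ff, d_model)) * 0.02

y = swiglu_mlp(x, w_gate, w_up, w_down)
print(y.shape)  # (1, 768): output stays in the embedding dimension
```

Note that SwiGLU needs three weight matrices where GPT-2's MLP needs two, which is why gated variants often shrink the intermediate size to keep parameter counts comparable.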
## Usage (Transformers)
This model works with Hugging Face Transformers using a custom architecture (loaded via `trust_remote_code=True`).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ragunath-ravi/Tensa-124M",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```