🤖 TRM-textV2: Recurrent Shared Transformer

TRM-textV2 is a compact, parameter-efficient language model built on a Recurrent Shared Transformer architecture and enhanced with Inverse Square Mask (ISM) logic.

🌟 Model Highlights

  • Efficient Depth: Applies a single shared Transformer block four times in sequence (recurrence_steps=4), simulating a deeper network without adding parameters.
  • ISM Integration: Uses Inverse Square Mask (ISM) prefix-answer masking, intended to improve long-range dependency handling.
  • Optimized for Stability: Trained with specific residual scaling and gate initialization to prevent loss plateaus.
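
The weight sharing behind "Efficient Depth" can be illustrated with a small parameter count. This is a minimal sketch, not the model's actual implementation: it assumes a standard Transformer block layout (four attention projections, a two-layer feed-forward network, two LayerNorms, all with biases), and the widths d_model=512 and d_ff=2048 are hypothetical values chosen for illustration. Only recurrence_steps=4 comes from this card.

```python
def block_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one standard Transformer block (assumed layout)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V, output projections with biases
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms


def stack_params(d_model: int, d_ff: int, depth: int, shared: bool) -> int:
    """Block parameters for `depth` applications, with or without weight sharing."""
    per_block = block_params(d_model, d_ff)
    return per_block if shared else per_block * depth


RECURRENCE_STEPS = 4       # from the model card
D_MODEL, D_FF = 512, 2048  # hypothetical widths, not the model's real config

shared = stack_params(D_MODEL, D_FF, RECURRENCE_STEPS, shared=True)
unshared = stack_params(D_MODEL, D_FF, RECURRENCE_STEPS, shared=False)
print(f"shared: {shared:,}  unshared: {unshared:,}  ratio: {unshared // shared}x")
```

Under these assumptions the shared design stores one block's weights instead of four. Compute per forward pass is unchanged, because the block is still applied four times.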

🚀 Quick Start

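from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers load the custom model code from the repo
tokenizer = AutoTokenizer.from_pretrained('summerMC/TRM-textV2', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('summerMC/TRM-textV2', trust_remote_code=True)

# Build the prompt with the tokenizer's chat template
messages = [{'role': 'user', 'content': 'Once upon a time, a small robot'}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors='pt'
)

# Generate a continuation and decode it (max_new_tokens=64 is an arbitrary example value)
outputs = model.generate(inputs['input_ids'], max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))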

📊 Training Details

  • Datasets: TinyStories and FineWeb-Edu
  • Architecture: About 45M parameters; because the shared block is applied repeatedly, the effective depth is comparable to that of larger models without weight sharing.
  • License: MIT
  • Weights: Safetensors, 45.7M parameters, tensor type F32
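
As a rough worked example, the parameter count and tensor type listed above give an estimate of the weight size. The sketch counts weights only (no activations or file metadata), and the F16 row is a hypothetical comparison; this card lists F32 only.

```python
PARAMS = 45_700_000  # "45.7M params" from the model card
BYTES_PER_PARAM = {"F32": 4, "F16": 2}  # standard widths; F16 is for comparison only


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


print(f"F32: {weight_size_mb(PARAMS, 'F32'):.1f} MB")
print(f"F16: {weight_size_mb(PARAMS, 'F16'):.1f} MB")
```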