# LLM-TRM Sequence TRM
Trained TRM for sequence-to-sequence reasoning (Phase 2).
## Model Details
- Architecture: Tiny Recursive Model with transformer blocks
- Compressed dimension: 256
- Layers: 2
- Heads: 8
- Latent steps (n): 6
- Deep recursions (T): 3
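
The hyperparameters above describe a small transformer core applied recursively: n latent refinement steps nested inside T deep recursions. The sketch below illustrates one way such a loop can be organized; it is **not** the repo's `SequenceTRM` implementation — the `RecursionSketch` class, its update rule, and the zero-initialized states are assumptions used only to make the n/T structure concrete.

```python
import torch
import torch.nn as nn

class RecursionSketch(nn.Module):
    """Illustrative only: n latent refinement steps nested inside T deep
    recursions, mirroring the n=6 / T=3 settings above. The real
    SequenceTRM in src.train.phase2_trm may differ in detail."""

    def __init__(self, d=256, n_layers=2, n_heads=8, n=6, T=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d, nhead=n_heads, dim_feedforward=4 * d, batch_first=True
        )
        self.core = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.n, self.T = n, T

    def forward(self, x):                  # x: [B, L, d] compressed context
        y = torch.zeros_like(x[:, :1])     # running answer slot, [B, 1, d] (assumed init)
        z = torch.zeros_like(x)            # latent reasoning state, [B, L, d] (assumed init)
        for _ in range(self.T):            # deep recursions
            for _ in range(self.n):        # latent refinement steps
                z = self.core(torch.cat([x + z, y], dim=1))[:, :-1]
            y = self.core(torch.cat([z, y], dim=1))[:, -1:]  # update the answer slot
        return torch.cat([x, y], dim=1)    # [B, L+1, d]
```

With `d=256, n_layers=2, n_heads=8, n=6, T=3` this mirrors the configuration listed above and produces the same `[B, L, D'] -> [B, L+1, D']` shape contract used in the Usage section.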
## Training Metrics

W&B run summary:

| Metric | Value |
|---|---|
| epoch/best_loss | 0.09876 |
| epoch/cosine_similarity | 0.97987 |
| epoch/loss | 0.09891 |
| train/avg_halt_prob | 0.99924 |
| train/cosine_similarity | 0.97987 |
| train/halt_loss | 0.00076 |
| train/loss | 0.09891 |
| train/lr | 0.0 |
| train/mse | 0.09853 |
| train/relative_error | 0.24823 |
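
The logged quantities fit together: `train/halt_loss` matches 1 − `train/avg_halt_prob` (0.00076 ≈ 1 − 0.99924), and the total loss is dominated by the embedding MSE. The snippet below is a hedged reconstruction of how such metrics are commonly computed; the exact relative-error definition, halt-loss formulation, and loss weighting used in Phase 2 training are assumptions.

```python
import torch
import torch.nn.functional as F

def training_metrics(pred, target, halt_prob):
    """Sketch of the logged quantities; formulations marked 'assumed'
    are not confirmed by the training code."""
    mse = F.mse_loss(pred, target)                              # train/mse
    cos = F.cosine_similarity(pred, target, dim=-1).mean()      # cosine_similarity
    rel = ((pred - target).norm(dim=-1)
           / target.norm(dim=-1).clamp_min(1e-8)).mean()        # relative_error (assumed definition)
    halt_loss = (1.0 - halt_prob).mean()                        # halt_loss, consistent with the logs
    loss = mse + halt_loss                                      # total loss (assumed weighting)
    return {"mse": mse, "cosine_similarity": cos,
            "relative_error": rel, "halt_loss": halt_loss, "loss": loss}
```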
## Usage
```python
import torch
from huggingface_hub import hf_hub_download
from src.train.phase2_trm import SequenceTRM

# Download the checkpoint and load it
checkpoint_path = hf_hub_download(repo_id="anonx3247/llm-trm-pretraining", filename="trm.pt")
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# Initialize the TRM with the same hyperparameters it was trained with
trm = SequenceTRM(
    d_compressed=256,
    n_layers=2,
    n_heads=8,
)
trm.load_state_dict(checkpoint["trm_state_dict"])
trm.eval()

# The TRM takes a [B, L, D'] compressed context and outputs [B, L+1, D']
compressed_hidden = torch.randn(1, 16, 256)  # placeholder: replace with your compressed hidden states, [B, L, 256]
output = trm(compressed_hidden, n_steps=4)   # [B, L+1, 256]
reasoning_result = output[:, -1, :]          # [B, 256] reasoning token at the final position
```
## Part of LLM-TRM
This TRM is part of the LLM-TRM project for integrating Tiny Recursive Models with language models.