---
base_model: t5-small
tags: [trm, act, recursive, text-generation, wikitext]
metrics: [loss, lm_loss, ponder_loss, perplexity_lm]
---
# TRM-Text1 (ACT)
**TRM-Text1 (ACT)** is a causal language model based on a **Tiny Recursive Reasoning Model (TRM)** with **Adaptive Computation Time (ACT)**, which lets each token spend a variable amount of recursion depth; a minimal sketch of the halting loop follows the spec list below.
- **Architecture:** TRM (causal) + ACT halting
- **Training Data:** wikitext-103-raw-v1
- **Tokenizer:** t5-small (SentencePiece)
- **Vocab Size:** 32100
- **Objective:** Causal Language Modeling (next-token prediction)
- **Sequence Length:** 1024
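The card does not spell out the ACT mechanism, so the following is only a minimal sketch of per-token adaptive halting in the style of Graves-style ACT, assuming a sigmoid halting head and a cumulative-probability threshold. All names (`ACTHalting`, `step_fn`, `max_steps`, `eps`) are illustrative, not this model's actual code.

```python
import torch
import torch.nn as nn

class ACTHalting(nn.Module):
    """Illustrative per-token ACT loop: each recursion step emits a halting
    probability; a token stops once its cumulative halting mass reaches
    1 - eps, and its output is the halting-weighted mixture of step states."""

    def __init__(self, d_model: int, max_steps: int = 8, eps: float = 0.01):
        super().__init__()
        self.halt = nn.Linear(d_model, 1)
        self.max_steps = max_steps
        self.eps = eps

    def forward(self, h, step_fn):
        # h: (batch, seq, d_model); step_fn applies one shared recursion step.
        b, s, _ = h.shape
        cum_p = h.new_zeros(b, s)       # accumulated halting mass per token
        remainder = h.new_ones(b, s)    # probability mass left to assign
        out = torch.zeros_like(h)
        ponder = h.new_zeros(b, s)      # steps taken per token (ponder cost)
        for _ in range(self.max_steps):
            h = step_fn(h)
            p = torch.sigmoid(self.halt(h)).squeeze(-1)
            running = cum_p < 1.0 - self.eps
            # Tokens crossing the threshold spend their remaining mass.
            use = torch.where(cum_p + p >= 1.0 - self.eps, remainder, p)
            use = use * running.float()
            out = out + use.unsqueeze(-1) * h
            cum_p = cum_p + use
            remainder = remainder - use
            ponder = ponder + running.float()
        # The mean ponder cost can serve as a "ponder loss" regularizer.
        return out, ponder.mean()
```

Under this reading, the reported ponder loss is a penalty of this kind, encouraging tokens to halt after fewer recursion steps.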
Note: This model uses the T5 SentencePiece tokenizer, so the WikiText-103 perplexities reported here are not directly comparable to perplexities from GPT-2 BPE-based models.
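To see why the tokenizer matters, the snippet below (using the standard `transformers` tokenizers for `t5-small` and `gpt2`) shows that the two vocabularies segment the same text into different numbers of tokens, so the per-token cross-entropies, and hence perplexities, live on different scales:

```python
from transformers import AutoTokenizer

text = "The quick brown fox jumps over the lazy dog."
t5 = AutoTokenizer.from_pretrained("t5-small")  # SentencePiece, vocab 32100
gpt2 = AutoTokenizer.from_pretrained("gpt2")    # BPE, vocab 50257

# Different segmentations => per-token losses are not on the same scale.
print("t5-small tokens:", len(t5(text).input_ids))
print("gpt2 tokens:    ", len(gpt2(text).input_ids))
```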
### Latest Performance (Epoch 1)
- **Validation Loss:** 4.8248
- **Validation LM Loss:** 4.8149
- **Validation Ponder Loss:** 1.0064
- **Validation Perplexity (LM-only):** 123.34
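LM-only perplexity is simply the exponential of the mean next-token cross-entropy, which is how the value above follows from the validation LM loss:

```python
import math

val_lm_loss = 4.8149
print(math.exp(val_lm_loss))  # ~123.34, the reported LM-only perplexity
```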