---
base_model: t5-small
tags: [hrm, act, wikitext]
metrics: [loss, perplexity]
---
# wikicmbaV1
**wikicmbaV1** is an experimental text generation model based on the Hierarchical Reasoning Model (HRM) architecture. It was trained from scratch on the WikiText-103 dataset, a large-scale language modeling benchmark derived from high-quality Wikipedia articles.
The model uses the HRM structure, consisting of a "Specialist" module for low-level processing and a "Manager" module for high-level abstraction and planning. This architecture aims to handle long-range dependencies more effectively by summarizing information at different temporal scales.
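As a rough illustration of this two-timescale design, the PyTorch sketch below pairs a fast per-token "Specialist" cell with a "Manager" cell that updates only every few tokens. The `GRUCell` choice, the `manager_period`, and the way the manager's state conditions the specialist input are illustrative assumptions, not the actual wikicmbaV1 configuration.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Minimal two-level recurrence in the spirit of the HRM description above."""

    def __init__(self, d_model: int = 256, manager_period: int = 4):
        super().__init__()
        self.specialist = nn.GRUCell(d_model, d_model)  # fast, per-token updates
        self.manager = nn.GRUCell(d_model, d_model)     # slow, abstract updates
        self.period = manager_period                    # manager steps once per `period` tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        b, t, d = x.shape
        h_spec = x.new_zeros(b, d)
        h_mgr = x.new_zeros(b, d)
        outs = []
        for i in range(t):
            # Specialist processes every token, conditioned on the manager's current plan.
            h_spec = self.specialist(x[:, i] + h_mgr, h_spec)
            # Manager summarizes the specialist state at a coarser time scale.
            if (i + 1) % self.period == 0:
                h_mgr = self.manager(h_spec, h_mgr)
            outs.append(h_spec)
        return torch.stack(outs, dim=1)  # (batch, seq_len, d_model)
```

The full HRM paper additionally runs the two modules in nested update cycles with adaptive halting (ACT, reflected in this card's tags); the sketch keeps only the fixed-period, two-timescale core.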
## Model Description
- **Architecture:** Hierarchical Reasoning Model (HRM)
- **Training Data:** [WikiText-103](https://huggingface.co/datasets/wikitext)
- **Original Paper:** [Hierarchical Reasoning Model](https://arxiv.org/abs/2506.21734)
- **Tokenizer:** `t5-small` (slow T5 SentencePiece; see the loading snippet below)
- **Vocab Size:** 32100
- **Objective:** Causal Language Modeling
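The tokenizer can be loaded directly from the Hub; the slow (SentencePiece) implementation requires the `sentencepiece` package. A minimal check, assuming a standard `transformers` install:

```python
from transformers import T5Tokenizer

# Load the slow T5 SentencePiece tokenizer named above.
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# 32000 SentencePiece pieces + 100 <extra_id_*> sentinel tokens = 32100.
print(len(tokenizer))  # 32100
```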
### Latest Performance (Epoch 45)
- **Validation Loss:** `3.1813`
- **Validation Perplexity:** `24.0788`
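Perplexity here is the exponential of the mean cross-entropy validation loss, so the two numbers above are mutually consistent, as a quick check shows:

```python
import math

val_loss = 3.1813
print(math.exp(val_loss))  # ≈ 24.08, matching the reported perplexity
```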