Viharikvs / wikicmbaV1

---
base_model: t5-small
tags: [hrm, act, wikitext]
metrics: [loss, perplexity]
---
# wikicmbaV1
**wikicmbaV1** is an experimental text generation model based on the HRM architecture. It was trained from scratch on WikiText-103, a large-scale language modeling benchmark derived from high-quality Wikipedia articles.
The model follows the HRM structure. A "Specialist" module performs low-level processing, and a "Manager" module performs high-level abstraction and planning. Because the two modules summarize information at different temporal scales, the architecture aims to handle long-range dependencies more effectively.
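The following is a minimal, plain-Python sketch of the two-timescale idea described above. It illustrates the concept under stated assumptions and is not this model's code. The names `make_weights`, `tanh_cell`, and `hierarchical_pass`, the `period` parameter, and the random weights are all hypothetical. Simple tanh recurrent cells stand in for whatever blocks wikicmbaV1 actually uses. The update schedule is loosely modeled on the HRM paper, with one token consumed per step. The Specialist updates at every step, and the Manager updates once every `period` steps using the Specialist's state.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Random weight matrix as a list of rows (illustrative values only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def tanh_cell(state, inputs, w_state, w_in):
    """One recurrent update: new_state = tanh(W_state @ state + W_in @ inputs)."""
    return [
        math.tanh(
            sum(w * s for w, s in zip(w_state[i], state))
            + sum(w * x for w, x in zip(w_in[i], inputs))
        )
        for i in range(len(state))
    ]


def hierarchical_pass(tokens, dim=4, period=4, seed=0):
    """Hypothetical two-timescale recurrence over `dim`-sized input vectors.

    The Specialist (low level) updates every step from the current input and
    the Manager state. The Manager (high level) updates only every `period`
    steps from the Specialist state.
    Returns (final Specialist state, number of Manager updates).
    """
    rng = random.Random(seed)
    w_spec_state = make_weights(dim, dim, rng)
    w_spec_in = make_weights(dim, 2 * dim, rng)  # Specialist input = token ++ Manager state
    w_mgr_state = make_weights(dim, dim, rng)
    w_mgr_in = make_weights(dim, dim, rng)

    specialist = [0.0] * dim
    manager = [0.0] * dim
    manager_updates = 0
    for step, x in enumerate(tokens, start=1):
        # Fast, low-level update at every step.
        specialist = tanh_cell(specialist, list(x) + manager, w_spec_state, w_spec_in)
        # Slow, high-level update: summarize the Specialist every `period` steps.
        if step % period == 0:
            manager = tanh_cell(manager, specialist, w_mgr_state, w_mgr_in)
            manager_updates += 1
    return specialist, manager_updates


if __name__ == "__main__":
    inputs = [[0.1 * i] * 4 for i in range(10)]
    state, n_updates = hierarchical_pass(inputs)
    print(len(state), n_updates)  # 4 2
```

With 10 inputs and `period=4`, the Manager updates only at steps 4 and 8. It therefore sees a coarser, summarized view of the sequence than the Specialist does.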
## Model Description
- **Architecture:** Hierarchical Reasoning Model (HRM)
- **Training Data:** [WikiText-103](https://huggingface.co/datasets/wikitext)
- **Original Paper:** [Hierarchical Reasoning Model](https://arxiv.org/abs/2506.21734)
- **Tokenizer:** `t5-small` (slow, SentencePiece-based `T5Tokenizer`)
- **Vocab Size:** 32100
- **Objective:** Causal Language Modeling
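The causal language modeling objective listed above scores each position on the next token, given only the tokens before it. The sketch below computes that mean negative log-likelihood (in nats) for a single sequence. `causal_lm_loss` and the `predict_next` callback are hypothetical names that stand in for a real model's forward pass. A real model would produce all next-token distributions in one batched call rather than looping over prefixes.

```python
import math


def causal_lm_loss(token_ids, predict_next):
    """Mean next-token negative log-likelihood, in nats.

    `predict_next(prefix)` must return probabilities indexed by token id
    (a list or dict) for the token that follows `prefix`. Position t is
    scored on token_ids[t + 1] using only token_ids[: t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form one (input, target) pair")
    total_nll = 0.0
    for t in range(len(token_ids) - 1):
        probs = predict_next(token_ids[: t + 1])  # sees the prefix only, never the future
        total_nll -= math.log(probs[token_ids[t + 1]])
    return total_nll / (len(token_ids) - 1)


def uniform(prefix):
    """Toy predictor: equal probability for each of 4 token ids."""
    return [0.25] * 4


print(round(causal_lm_loss([0, 3, 1, 2], uniform), 3))  # 1.386, i.e. ln(4)
```

For reference, a uniform predictor over the 32100-token vocabulary would score ln(32100) ≈ 10.38 nats per token.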
### Latest Performance (Epoch 0)
- **Validation Loss:** `4.7058`
- **Validation Perplexity:** `110.58`
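Perplexity is the exponential of the mean per-token cross-entropy, so the two validation figures carry the same information. The standard-library check below shows that the reported numbers agree once the loss's four-decimal rounding is taken into account. This agreement also confirms that the loss is measured in nats (natural log). The helper name `perplexity` is hypothetical.

```python
import math

reported_loss = 4.7058              # validation loss, rounded to 4 decimals
reported_ppl = 110.58377075195312   # validation perplexity as logged


def perplexity(mean_nll_nats):
    """Perplexity = exp(mean cross-entropy in nats)."""
    return math.exp(mean_nll_nats)


print(round(math.log(reported_ppl), 4))      # 4.7058
print(round(perplexity(reported_loss), 2))   # 110.59 (gap comes only from rounding the loss)
```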