---
base_model: t5-small
tags: [hrm, act, wikitext]
metrics: [loss, perplexity]
---
# wikicmbaV1
**wikicmbaV1** is an experimental text generation model based on the Hierarchical Reasoning Model (HRM) architecture. It was trained from scratch on the WikiText-103 dataset, a large-scale language modeling benchmark derived from high-quality Wikipedia articles.
The model utilizes the HRM structure, consisting of a "Specialist" module for low-level processing and a "Manager" module for high-level abstraction and planning. This architecture aims to handle long-range dependencies more effectively by summarizing information at different temporal scales.
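The two-timescale interaction described above can be illustrated with a toy update loop. This is a minimal sketch, not the model's actual implementation: the state shapes, update coefficients, and the `manager_period` parameter are all illustrative assumptions; in the real architecture each module is a learned recurrent network.

```python
def hrm_step(tokens, manager_period=4):
    """Toy Specialist/Manager loop over a token sequence.

    The Specialist updates at every step; the Manager updates once every
    `manager_period` steps from a summary of the Specialist's state,
    operating on a slower temporal scale.
    """
    specialist_state = 0.0
    manager_state = 0.0
    for t, tok in enumerate(tokens):
        # Low-level module: processes every token, conditioned on the
        # Manager's slower-moving state.
        specialist_state = 0.5 * specialist_state + tok + 0.1 * manager_state
        # High-level module: summarizes and plans at a coarser timescale.
        if (t + 1) % manager_period == 0:
            manager_state = 0.9 * manager_state + 0.1 * specialist_state
    return specialist_state, manager_state
```

The key point is the nesting of timescales: long-range information survives in the Manager's state because it is overwritten far less often than the Specialist's.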
## Model Description
- **Architecture:** Hierarchical Reasoning Model (HRM)
- **Training Data:** [WikiText-103](https://huggingface.co/datasets/wikitext)
- **Original Paper:** [Hierarchical Reasoning Model](https://arxiv.org/abs/2506.21734)
- **Tokenizer:** `t5-small` (slow T5 SentencePiece)
- **Vocab Size:** 32,100
- **Objective:** Causal Language Modeling
### Latest Performance (Epoch 45)
- **Validation Loss**: `3.1813`
- **Validation Perplexity**: `24.0788`
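For a causal language model, perplexity is the exponential of the per-token cross-entropy loss, so the two reported numbers can be sanity-checked against each other:

```python
import math

val_loss = 3.1813           # reported validation loss (nats per token)
perplexity = math.exp(val_loss)
print(round(perplexity, 2))  # ≈ 24.08, matching the reported perplexity
```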