---
base_model: t5-small
tags:
  - hrm
  - act
  - wikitext
metrics:
  - loss
  - perplexity
---

# wikicmbaV1

wikicmbaV1 is an experimental text generation model based on the HRM architecture. It was trained from scratch on the WikiText-103 dataset, a large-scale language modeling benchmark derived from high-quality Wikipedia articles.

The model utilizes the HRM structure, consisting of a "Specialist" module for low-level processing and a "Manager" module for high-level abstraction and planning. This architecture aims to handle long-range dependencies more effectively by summarizing information at different temporal scales.
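As an illustration only (a toy sketch, not this model's actual implementation: the scalar states, decay factors, and `manager_period` below are invented for clarity), the two-timescale idea of a fast Specialist and a slower Manager can be written as:

```python
def run_hrm(inputs, manager_period=4):
    """Toy two-timescale recurrence: the Specialist updates on every
    step, while the Manager updates once every `manager_period` steps
    from a summary (here, the mean) of recent Specialist states."""
    specialist, manager = 0.0, 0.0
    recent = []
    for t, x in enumerate(inputs, start=1):
        specialist = 0.9 * specialist + x      # low-level, per-token update
        recent.append(specialist)
        if t % manager_period == 0:            # high-level, slower update
            manager = 0.5 * manager + sum(recent) / len(recent)
            recent.clear()
    return specialist, manager
```

Because the Manager only sees periodic summaries, its state changes slowly and can carry information across many more tokens than the per-step Specialist state, which is the intuition behind handling long-range dependencies at different temporal scales.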

## Model Description

  • Architecture: Hierarchical Recurrent Memory (HRM)
  • Training Data: WikiText-103
  • Original Paper: Hierarchical Reasoning Model
  • Tokenizer: t5-small (slow T5 SentencePiece)
  • Vocab Size: 32100
  • Objective: Causal Language Modeling

## Latest Performance (Epoch 45)

  • Validation Loss: 3.1813
  • Validation Perplexity: 24.08
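Since perplexity for a causal language model is the exponential of the cross-entropy loss, the two reported figures can be cross-checked with a one-liner:

```python
import math

val_loss = 3.1813              # validation loss at epoch 45
perplexity = math.exp(val_loss)
print(round(perplexity, 2))    # ≈ 24.08, matching the reported value
```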