Viharikvs / HRM-Text1-UltraChat

---
base_model: t5-small
tags: [hrm, act, dolly-15k]
metrics: [loss, perplexity]
---
# HRM-Text1

HRM-Text1 is an experimental instruction-following text generation model based on the Hierarchical Reasoning Model (HRM) architecture. It is trained on the `databricks/databricks-dolly-15k` dataset, a collection of about 15,000 human-written instruction–response records spanning several task types.
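
Each dolly-15k record has `instruction`, `context`, `response`, and `category` fields. The sketch below shows one way to flatten a record into a single string for causal language modeling. The `### Instruction:`-style template and the function name `format_dolly_example` are assumptions for illustration; this card does not document the prompt format actually used in training.

```python
from typing import Dict


def format_dolly_example(record: Dict[str, str]) -> str:
    """Join one dolly-15k record into a single training string.

    The section markers are a hypothetical template, not the one
    used to train HRM-Text1.
    """
    parts = [f"### Instruction:\n{record['instruction'].strip()}"]
    context = record.get("context", "").strip()
    if context:  # the context field is empty for many records
        parts.append(f"### Context:\n{context}")
    parts.append(f"### Response:\n{record['response'].strip()}")
    return "\n\n".join(parts)


# A made-up record that uses the dataset's field names.
example = {
    "instruction": "Name three primary colors.",
    "context": "",
    "response": "Red, yellow, and blue.",
    "category": "open_qa",
}
print(format_dolly_example(example))
```

Skipping the context section when it is empty keeps short open-ended prompts free of a dangling header.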

The model uses the HRM structure, which pairs a low-level "Specialist" module for fast, detailed processing with a high-level "Manager" module for slower abstraction and planning. The Manager updates less often than the Specialist and summarizes its states, so the two modules operate at different temporal scales; the design aims to help the model capture long-range dependencies.
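
The two-timescale update schedule can be sketched as a toy, stdlib-only recurrence. This is an illustration under stated assumptions, not this model's code: the function names, the fixed scalar weights standing in for learned layers, the `tanh` nonlinearity, and the cycle counts `n_cycles`/`t_steps` are all hypothetical. It only shows the schedule, with the Specialist stepping `t_steps` times for every single Manager update.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _layer(*terms: Tuple[float, Vector]) -> Vector:
    """Toy stand-in for a learned module: tanh of a weighted sum of vectors."""
    dim = len(terms[0][1])
    return [math.tanh(sum(w * v[i] for w, v in terms)) for i in range(dim)]


def hrm_forward(x: Vector, n_cycles: int = 2, t_steps: int = 4) -> Tuple[Vector, Vector, int, int]:
    """Run the hypothetical two-timescale recurrence and count module updates."""
    specialist = [0.0] * len(x)  # low-level state, updated every step
    manager = [0.0] * len(x)     # high-level state, updated once per cycle
    specialist_updates = manager_updates = 0
    for _ in range(n_cycles):
        for _ in range(t_steps):
            # The Specialist sees the input, its own state, and the Manager's
            # current state, which stays fixed for the whole cycle.
            specialist = _layer((0.5, specialist), (0.5, manager), (1.0, x))
            specialist_updates += 1
        # The Manager summarizes where the Specialist ended up in this cycle.
        manager = _layer((0.5, manager), (1.0, specialist))
        manager_updates += 1
    return specialist, manager, specialist_updates, manager_updates


specialist, manager, n_low, n_high = hrm_forward([0.1, -0.2, 0.3])
print(n_low, n_high)  # 8 2
```

With two cycles of four low-level steps, the Specialist updates eight times while the Manager updates only twice, which is the sense in which the Manager works on a slower timescale.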

## Model Description

- Architecture: Hierarchical Reasoning Model (HRM)
- Training Data: [databricks/databricks-dolly-15k](https://hf.co/datasets/databricks/databricks-dolly-15k)
- Original Paper: [Hierarchical Reasoning Model](https://arxiv.org/abs/2506.21734)
- Tokenizer: `t5-small` (slow, SentencePiece-based `T5Tokenizer`)
- Vocabulary Size: 32,100 (32,000 SentencePiece pieces plus 100 `<extra_id_*>` sentinel tokens)
- Objective: Causal language modeling (next-token prediction)

### Latest Performance (Epoch 20)
- Validation Loss: `3.6668`
- Validation Perplexity: `39.13`
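
The two reported numbers are consistent with perplexity being the exponential of the mean per-token cross-entropy measured in nats. A quick check, assuming that definition (the helper name `perplexity` is illustrative):

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    return math.exp(mean_nll)


print(f"{perplexity(3.6668):.2f}")  # 39.13
```

Lower is better: a perplexity of about 39 means the model is, on average, about as uncertain as a uniform choice among roughly 39 tokens.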