	---
	license: apache-2.0
	language:
	- en
	tags:
	- text-generation
	- causal-lm
	- adaptive-reasoning
	- hierarchical-reasoning
	- hrm
	- custom-architecture
	- compact-model
	datasets:
	- CosmicSet-2.0-mini
	arxiv: 2605.28919
	---

	# CosmicFish-HRM

	Paper: [CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models](https://arxiv.org/abs/2605.28919)

	GitHub: [MistyozAI/CosmicFish-HRM](https://github.com/MistyozAI/CosmicFish-HRM)

	CosmicFish-HRM is a compact, 82.77M-parameter causal language model built around a Hierarchical Reasoning Module (HRM) that dynamically allocates reasoning compute during inference. Rather than applying a fixed number of layers to every input, the model iterates through high-level and low-level reasoning cycles and uses a learned halting head to decide when to stop. Harder inputs trigger deeper reasoning trajectories, while simpler ones halt early.

	Built at Mistyoz AI, Hyderabad.

	---

	## Architecture

	![Architecture](architecture.png)

	```
	Input Blocks (Transformer) -> HRM Core (H + L levels, variable steps) -> Output Blocks (Transformer) -> LM Head
	```

	The HRM core maintains two interacting recurrent states that operate at different levels of abstraction. The high-level module captures slower, more abstract reasoning, while the low-level module handles finer-grained local computation. After each reasoning step, a lightweight halting head decides whether to continue or stop, conditioned on the mean-pooled high-level state.
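The control flow described above can be sketched without any framework. This is a toy illustration under stated assumptions, not the repository's implementation: `run_adaptive`, `step_fn`, and `q_head_fn` are hypothetical names, the states are plain nested lists, and the halting rule (stop once the halt score exceeds the continue score) is an assumed convention. Only the 16-step cap and the mean-pooled high-level state come from this README.

```python
from typing import Callable, List, Tuple

State = List[List[float]]  # (seq_len x dim) nested lists, a toy stand-in for a tensor

MAX_STEPS = 16  # "Max HRM steps" in the specs table


def mean_pool(state: State) -> List[float]:
    """Average a state over the sequence axis, giving one vector of size dim."""
    seq_len = len(state)
    return [sum(col) / seq_len for col in zip(*state)]


def run_adaptive(
    z_high: State,
    z_low: State,
    step_fn: Callable[[State, State], Tuple[State, State]],
    q_head_fn: Callable[[List[float]], Tuple[float, float]],
    max_steps: int = MAX_STEPS,
) -> Tuple[State, State, int]:
    """Run reasoning steps until the halting head says stop or the cap is hit."""
    steps_used = 0
    for _ in range(max_steps):
        z_high, z_low = step_fn(z_high, z_low)  # one H/L reasoning step
        steps_used += 1
        q_halt, q_continue = q_head_fn(mean_pool(z_high))
        if q_halt > q_continue:  # assumed halting rule
            break
    return z_high, z_low, steps_used


# Toy usage: each step adds 1.0 to the high-level state, and the toy head
# halts once the pooled mean exceeds 2.5.
def toy_step(z_high: State, z_low: State) -> Tuple[State, State]:
    return [[v + 1.0 for v in row] for row in z_high], z_low


def toy_q_head(pooled: List[float]) -> Tuple[float, float]:
    return sum(pooled) / len(pooled), 2.5  # (q_halt, q_continue)


_, _, steps = run_adaptive([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]], toy_step, toy_q_head)
print(steps)  # 3
```

In the real model the step function and halting head are learned modules; the point here is only the loop structure, where depth is decided per input rather than fixed in advance.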

	Key components:

	- Grouped-Query Attention (GQA) with 8 query heads and 4 KV heads
	- Rotary Positional Embeddings (RoPE)
	- SwiGLU feedforward layers
	- RMSNorm (pre-norm for I/O blocks, post-norm inside HRM)
	- Learned halt/continue Q-head controlling per-input reasoning depth
	- Step penalty in the training loss encouraging efficient halting
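One way to read the last bullet: the training objective adds a term that grows with the number of HRM steps used, so the halting head is rewarded for stopping early when extra steps do not reduce the language-modeling loss. The sketch below assumes a linear penalty normalized by the step cap. The function name, the `penalty_weight` value, and the exact form are illustrative assumptions; the loss used in the paper may differ.

```python
def loss_with_step_penalty(
    lm_loss: float,
    steps_used: int,
    max_steps: int = 16,
    penalty_weight: float = 0.01,  # hypothetical value, not from the paper
) -> float:
    """Language-modeling loss plus an assumed linear penalty on reasoning depth."""
    if not 0 <= steps_used <= max_steps:
        raise ValueError("steps_used must be between 0 and max_steps")
    return lm_loss + penalty_weight * (steps_used / max_steps)


# Same LM loss at two depths: the deeper trajectory costs more.
shallow = loss_with_step_penalty(3.20, steps_used=2)
deep = loss_with_step_penalty(3.20, steps_used=12)
print(shallow < deep)  # True
```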

	## Model Specs

	| Parameter | Value |
	|---|---|
	| Total parameters | 82.77M |
	| Embedding dimension | 448 |
	| Vocabulary size | 50,304 |
	| Context length | 512 |
	| Input transformer layers | 6 |
	| Output transformer layers | 6 |
	| HRM H-layers | 4 |
	| HRM L-layers | 4 |
	| Max HRM steps | 16 |
	| Attention heads | 8 (4 KV, GQA) |
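A few quantities follow directly from the table. The arithmetic below assumes the usual conventions (head dimension equals embedding dimension divided by the number of query heads, and consecutive query heads share a KV head). The variable names are illustrative, and whether the LM head shares weights with the token embedding is not stated here, so only the embedding table itself is counted.

```python
n_embd = 448
vocab_size = 50_304
n_head = 8
n_kv_head = 4
total_params = 82.77e6

head_dim = n_embd // n_head             # 56
queries_per_kv = n_head // n_kv_head    # 2 query heads share each KV head
kv_width = n_kv_head * head_dim         # 224: output width of each K and V projection
embedding_params = vocab_size * n_embd  # 22,536,192
embedding_share = embedding_params / total_params

# Which KV head each query head reads from under grouped-query attention.
kv_head_for_query = [q // queries_per_kv for q in range(n_head)]

print(head_dim, queries_per_kv, kv_width)
print(f"{embedding_params:,} embedding parameters ({embedding_share:.1%} of the total)")
print(kv_head_for_query)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Under these assumptions the token embedding table alone takes roughly 27% of the parameter budget, which leaves comparatively little for the transformer blocks and the HRM core at this scale.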

	## Evaluation

	Zero-shot benchmark results:

	| Model | HellaSwag | PIQA | WinoGrande |
	|---|---|---|---|
	| CosmicFish-HRM (82M) | 26.2 | 58.1 | 50.7 |
	| GPT-2 Small (117M) | 29.7 | 62.5 | 50.7 |
	| OPT-125M | 30.6 | 62.6 | 52.9 |
	| Pythia-160M | 29.4 | 62.1 | 52.8 |

	At compact scale, a portion of the parameter budget goes to the HRM reasoning infrastructure rather than raw language-modeling capacity, which accounts for the gap versus the fixed-depth baselines above. The paper argues this tradeoff becomes more favorable as model scale increases.

	## Adaptive Reasoning Behavior

	The primary contribution of CosmicFish-HRM is not benchmark accuracy but adaptive compute allocation. The model uses different numbers of reasoning steps depending on input complexity:

	| Prompt | Mean HRM Steps |
	|---|---|
	| "The capital of France is" | 2.78 |
	| "Photosynthesis is the process by which plants" | 4.77 |
	| "If all roses are flowers and some flowers fade quickly..." | 7.03 |
	| "A bat and a ball cost $1.10 in total..." | 8.40 |

	Mean step counts on every benchmark stay well below the 16-step maximum, and the variance across samples is high. Both indicate that the halting mechanism is input-sensitive rather than collapsing to a fixed depth.

	| Benchmark | Mean Steps | Std Dev |
	|---|---|---|
	| HellaSwag | 3.03 | 6.26 |
	| PIQA | 1.87 | 5.13 |
	| WinoGrande | 0.95 | 3.78 |
	| Overall | 2.68 | 5.95 |
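A standard deviation larger than the mean, as in the table above, is what a skewed distribution of step counts looks like: most inputs halt almost immediately and a few run much deeper. The snippet below shows how such summary statistics could be computed from per-sample step counts. The counts are made up for illustration and are not taken from the evaluation. Whether the reported figures use population or sample standard deviation is not stated, so population is assumed.

```python
from statistics import mean, pstdev

# Hypothetical per-sample HRM step counts: nine early halts, one run to the 16-step cap.
step_counts = [1] * 9 + [16]

mean_steps = mean(step_counts)
std_steps = pstdev(step_counts)  # population standard deviation (assumed convention)

print(f"mean={mean_steps:.2f} std={std_steps:.2f}")  # mean=2.50 std=4.50
```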

	## Usage

	This model uses a custom architecture. The model code is included in this repo as `modeling_hrm_cosmicfish.py`.

	Standalone chat script (downloads the model files automatically):

	```bash
	pip install torch safetensors huggingface-hub transformers termcolor
	python chat.py
	```

	Load manually:

	```python
	import json

	import tiktoken  # not in the pip command above: pip install tiktoken
	import torch
	from safetensors.torch import load_file
	from huggingface_hub import snapshot_download
	from modeling_hrm_cosmicfish import HRMCosmicFish, HRMCosmicFishConfig

	# Download the repo (or reuse the local cache) and get its local path.
	cache_dir = snapshot_download("MistyozAI/CosmicFish-HRM")

	with open(f"{cache_dir}/config.json") as f:
	    cfg = json.load(f)

	# Rebuild the architecture from the saved config; dropout is off for inference.
	config = HRMCosmicFishConfig(
	    vocab_size=cfg["vocab_size"],
	    n_embd=cfg["n_embd"],
	    block_size=cfg["block_size"],
	    n_head=cfg["n_head"],
	    n_kv_head=cfg["n_kv_head"],
	    n_input_layers=cfg["n_input_layers"],
	    n_output_layers=cfg["n_output_layers"],
	    hrm_H_layers=cfg["hrm_H_layers"],
	    hrm_L_layers=cfg["hrm_L_layers"],
	    hrm_H_cycles=cfg["hrm_H_cycles"],
	    hrm_L_cycles=cfg["hrm_L_cycles"],
	    hrm_max_steps=cfg["hrm_max_steps"],
	    dropout=0.0,
	)

	# Load the weights and switch to evaluation mode.
	state_dict = load_file(f"{cache_dir}/model.safetensors")
	model = HRMCosmicFish(config)
	model.load_state_dict(state_dict)
	model.eval()

	# The model uses the GPT-2 BPE vocabulary via tiktoken.
	tokenizer = tiktoken.get_encoding("gpt2")
	prompt = "Artificial intelligence is"
	tokens = tokenizer.encode(prompt)
	idx = torch.tensor(tokens, dtype=torch.long).unsqueeze(0)  # shape (1, seq_len)

	with torch.no_grad():
	    output = model.generate(idx, max_new_tokens=100, temperature=0.7, top_k=40)

	print(tokenizer.decode(output[0].tolist()))
	```
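With a 512-token context, a long prompt plus `max_new_tokens` can exceed the window. Whether `generate` crops the context internally is not documented here, so a defensive option is to trim the prompt first. The helper below is a hypothetical sketch (`fit_to_context` is not part of the repo) that keeps the most recent tokens and leaves room for generation.

```python
from typing import List


def fit_to_context(
    tokens: List[int],
    block_size: int = 512,      # "Context length" in the specs table
    max_new_tokens: int = 100,  # matches the generate() call above
) -> List[int]:
    """Keep only the most recent tokens so prompt + generation fits in block_size."""
    budget = block_size - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than block_size")
    return tokens[-budget:]


long_prompt = list(range(1000))  # stand-in for tokenizer.encode(...)
trimmed = fit_to_context(long_prompt)
print(len(trimmed), trimmed[0])  # 412 588
```

In the loading example, this would be applied to `tokens` before building `idx`. Short prompts pass through unchanged.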

	---
	PyTorch file: [CF.pt](https://drive.google.com/file/d/1He4PAIixuL5EMmzmxV4nq-OLI8xlp15Y/view?usp=sharing)

	PyTorch file: [Base.pt](https://drive.google.com/file/d/1Apx898RYOtyDSjd_9IhoIGlTbNYf3N7H/view?usp=sharing)

	---

	Mistyoz AI, Hyderabad