---
license: apache-2.0
language:
- en
tags:
- text-generation
- causal-lm
- adaptive-reasoning
- hierarchical-reasoning
- hrm
- custom-architecture
- compact-model
datasets:
- CosmicSet-2.0-mini
arxiv: 2605.28919
---

# CosmicFish-HRM

**Paper:** [CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models](https://arxiv.org/abs/2605.28919)

**GitHub:** [MistyozAI/CosmicFish-HRM](https://github.com/MistyozAI/CosmicFish-HRM)

CosmicFish-HRM is a compact 82.77M-parameter causal language model built around a Hierarchical Reasoning Module (HRM) that dynamically allocates reasoning compute during inference. Rather than passing every input through a fixed number of layers, the model iterates through high-level and low-level reasoning cycles and uses a learned halting head to decide when to stop. Harder inputs trigger deeper reasoning trajectories, while simpler ones halt early.

Built at Mistyoz AI, Hyderabad.

---

## Architecture

![Architecture](architecture.png)

```
Input Blocks (Transformer) -> HRM Core (H + L levels, variable steps) -> Output Blocks (Transformer) -> LM Head
```

The HRM core maintains two interacting recurrent states operating at different abstraction levels. The high-level module captures slower, more abstract reasoning while the low-level module handles finer-grained local computation. After each reasoning step a lightweight halting head decides whether to continue or stop, conditioned on the mean-pooled high-level state.
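
The control flow described above can be sketched in plain Python. This is a toy illustration, not the code in `modeling_hrm_cosmicfish.py`: `run_hrm_steps`, `toy_update`, and `toy_halt_head` are hypothetical names, the states are plain lists instead of tensors, and the rule "halt when the halt value exceeds the continue value" is an assumption about how the Q-head output is compared.

```python
def mean_pool(state):
    """Average per-token vectors (a list of equal-length lists) into one vector."""
    return [sum(column) / len(state) for column in zip(*state)]


def run_hrm_steps(h_state, l_state, update_fn, halt_head, max_steps=16):
    """Repeat reasoning steps until the halting head prefers to stop.

    update_fn and halt_head are hypothetical stand-ins for the learned modules.
    """
    steps_used = 0
    for step in range(1, max_steps + 1):
        # One reasoning step updates both the high-level and low-level states.
        h_state, l_state = update_fn(h_state, l_state)
        steps_used = step
        # The halting decision is conditioned on the mean-pooled high-level state.
        q_halt, q_continue = halt_head(mean_pool(h_state))
        if q_halt > q_continue:  # assumed decision rule
            break
    return h_state, l_state, steps_used


def toy_update(h_state, l_state):
    """Toy dynamics: the low level moves every step, the high level accumulates it."""
    new_l = [[value + 1.0 for value in row] for row in l_state]
    new_h = [[h + l for h, l in zip(h_row, l_row)]
             for h_row, l_row in zip(h_state, new_l)]
    return new_h, new_l


def toy_halt_head(pooled):
    """Toy Q-head: the halt value grows with the pooled state, continue is fixed."""
    return pooled[0], 3.0


h0 = [[0.0, 0.0], [0.0, 0.0]]  # two tokens, two features
l0 = [[0.0, 0.0], [0.0, 0.0]]
_, _, steps = run_hrm_steps(h0, l0, toy_update, toy_halt_head)
print("steps used:", steps)
```

The loop always runs at least one step and never more than `max_steps`, which matches the 16-step cap listed in the specs; everything else in the sketch is illustrative.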

**Key components:**

- Grouped-Query Attention (GQA) with 8 query heads and 4 KV heads
- Rotary Positional Embeddings (RoPE)
- SwiGLU feedforward layers
- RMSNorm (pre-norm for I/O blocks, post-norm inside HRM)
- Learned halt/continue Q-head controlling per-input reasoning depth
- Step penalty in the training loss encouraging efficient halting
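
To make the step penalty concrete, here is a minimal sketch of one way such a term could enter the loss. The exact form used in training is not given in this card, so the additive form, the function name `loss_with_step_penalty`, and the default `penalty_weight` are all assumptions for illustration.

```python
def loss_with_step_penalty(lm_loss, steps_per_sample, penalty_weight=0.01):
    """Add a cost proportional to the mean number of HRM steps in a batch.

    Hypothetical form: total = lm_loss + penalty_weight * mean(steps).
    """
    mean_steps = sum(steps_per_sample) / len(steps_per_sample)
    return lm_loss + penalty_weight * mean_steps


# Two samples that used 2 and 4 steps, with an exaggerated weight for clarity.
print(loss_with_step_penalty(2.0, [2, 4], penalty_weight=0.5))
```

Under this form, a batch that halts earlier at the same language-modeling loss gets a lower total loss, which is the pressure toward efficient halting that the list item describes.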

## Model Specs

| Parameter | Value |
|---|---|
| Total parameters | 82.77M |
| Embedding dimension | 448 |
| Vocabulary size | 50,304 |
| Context length | 512 |
| Input transformer layers | 6 |
| Output transformer layers | 6 |
| HRM H-layers | 4 |
| HRM L-layers | 4 |
| Max HRM steps | 16 |
| Attention heads | 8 (4 KV, GQA) |
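
The attention numbers in the table imply a few derived sizes. The sketch below works them out, assuming the usual conventions that the per-head dimension is the embedding dimension divided by the number of query heads and that query heads are grouped contiguously onto KV heads; neither convention is stated in this card.

```python
n_embd = 448    # embedding dimension (from the table)
n_head = 8      # query heads (from the table)
n_kv_head = 4   # key/value heads (from the table)

# Assumed convention: every head has the same width.
head_dim = n_embd // n_head

# Number of query heads that share one KV head.
group_size = n_head // n_kv_head

# Assumed contiguous grouping: query heads 0-1 use KV head 0, 2-3 use KV head 1, ...
kv_head_for_query = [q // group_size for q in range(n_head)]

# Width of the key (or value) projection under GQA, versus n_embd for full MHA.
kv_proj_width = n_kv_head * head_dim

print(head_dim, group_size, kv_head_for_query, kv_proj_width)
```

With these assumptions, keys and values are projected to half the width that standard multi-head attention would use, which is where GQA saves parameters and KV memory.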

## Evaluation

Zero-shot benchmark results:

| Model | HellaSwag | PIQA | WinoGrande |
|---|---|---|---|
| CosmicFish-HRM (82M) | 26.2 | 58.1 | 50.7 |
| GPT-2 Small (117M) | 29.7 | 62.5 | 50.7 |
| OPT-125M | 30.6 | 62.6 | 52.9 |
| Pythia-160M | 29.4 | 62.1 | 52.8 |

At compact scale a portion of the parameter budget is allocated to the HRM reasoning infrastructure rather than raw language modeling capacity, which accounts for the gap versus fixed-depth baselines of similar size. The paper argues this tradeoff becomes more favorable as model scale increases.

## Adaptive Reasoning Behavior

The primary contribution of CosmicFish-HRM is not benchmark accuracy but adaptive compute allocation. The model uses different numbers of reasoning steps depending on input complexity:

| Prompt | Mean HRM Steps |
|---|---|
| "The capital of France is" | 2.78 |
| "Photosynthesis is the process by which plants" | 4.77 |
| "If all roses are flowers and some flowers fade quickly..." | 7.03 |
| "A bat and a ball cost $1.10 in total..." | 8.40 |

Average steps across benchmarks stay well below the 16-step maximum, and the standard deviation exceeds the mean on every benchmark. This spread across samples indicates that the halting mechanism is input-sensitive rather than collapsing to a fixed depth.

| Benchmark | Mean Steps | Std Dev |
|---|---|---|
| HellaSwag | 3.03 | 6.26 |
| PIQA | 1.87 | 5.13 |
| WinoGrande | 0.95 | 3.78 |
| Overall | 2.68 | 5.95 |
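
One way to read the spread in this table is the ratio of standard deviation to mean (the coefficient of variation). The short sketch below computes it from the reported numbers only; the dictionary name `step_stats` is just for illustration.

```python
# (mean steps, standard deviation) copied from the table above.
step_stats = {
    "HellaSwag": (3.03, 6.26),
    "PIQA": (1.87, 5.13),
    "WinoGrande": (0.95, 3.78),
    "Overall": (2.68, 5.95),
}

# Coefficient of variation: standard deviation relative to the mean.
variation = {name: std / mean for name, (mean, std) in step_stats.items()}

for name, ratio in variation.items():
    print(f"{name}: std/mean = {ratio:.2f}")
```

Every ratio comes out above 2, meaning the per-sample step counts vary by more than twice their average.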

## Usage

This model uses a custom architecture. The model code is included in this repo as `modeling_hrm_cosmicfish.py`.

**Standalone chat script (downloads automatically):**

```bash
pip install torch safetensors huggingface-hub transformers termcolor tiktoken
python chat.py
```

**Load manually:**

```python
import json

import tiktoken
import torch
from huggingface_hub import snapshot_download
from safetensors.torch import load_file

# modeling_hrm_cosmicfish.py must be importable, e.g. placed next to this script.
from modeling_hrm_cosmicfish import HRMCosmicFish, HRMCosmicFishConfig

# Download the repo (or reuse the local cache) and get its directory.
cache_dir = snapshot_download("MistyozAI/CosmicFish-HRM")

with open(f"{cache_dir}/config.json") as f:
    cfg = json.load(f)

# Build the model config from the published hyperparameters.
config = HRMCosmicFishConfig(
    vocab_size=cfg["vocab_size"],
    n_embd=cfg["n_embd"],
    block_size=cfg["block_size"],
    n_head=cfg["n_head"],
    n_kv_head=cfg["n_kv_head"],
    n_input_layers=cfg["n_input_layers"],
    n_output_layers=cfg["n_output_layers"],
    hrm_H_layers=cfg["hrm_H_layers"],
    hrm_L_layers=cfg["hrm_L_layers"],
    hrm_H_cycles=cfg["hrm_H_cycles"],
    hrm_L_cycles=cfg["hrm_L_cycles"],
    hrm_max_steps=cfg["hrm_max_steps"],
    dropout=0.0,  # no dropout at inference time
)

# Load the weights and switch to evaluation mode.
state_dict = load_file(f"{cache_dir}/model.safetensors")
model = HRMCosmicFish(config)
model.load_state_dict(state_dict)
model.eval()

# Tokenize the prompt with the GPT-2 BPE vocabulary and add a batch dimension.
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "Artificial intelligence is"
tokens = tokenizer.encode(prompt)
idx = torch.tensor(tokens, dtype=torch.long).unsqueeze(0)

# Sample a continuation without tracking gradients.
with torch.no_grad():
    output = model.generate(idx, max_new_tokens=100, temperature=0.7, top_k=40)

print(tokenizer.decode(output[0].tolist()))
```
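
Because the context length is 512 tokens, long prompts may need trimming before generation. Whether `model.generate` already crops its input internally is not stated here, so the helper below is a defensive sketch; `crop_to_context` is a hypothetical name and not part of the repo.

```python
def crop_to_context(tokens, block_size=512):
    """Keep only the most recent block_size token ids of a prompt."""
    if len(tokens) <= block_size:
        return tokens
    return tokens[-block_size:]


# Example with dummy token ids: a 600-token prompt is cut to its last 512 ids.
long_prompt = list(range(600))
cropped = crop_to_context(long_prompt)
print(len(cropped), cropped[0])
```

In the loading example this would be applied to `tokens` before building `idx`, with `block_size` taken from `cfg["block_size"]`. Keeping the tail rather than the head preserves the text closest to where generation continues.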

---
PyTorch file: [CF.pt](https://drive.google.com/file/d/1He4PAIixuL5EMmzmxV4nq-OLI8xlp15Y/view?usp=sharing)

PyTorch file: [Base.pt](https://drive.google.com/file/d/1Apx898RYOtyDSjd_9IhoIGlTbNYf3N7H/view?usp=sharing)

---

Mistyoz AI, Hyderabad