---
license: apache-2.0
language:
- en
tags:
- text-generation
- causal-lm
- adaptive-reasoning
- hierarchical-reasoning
- hrm
- custom-architecture
- compact-model
datasets:
- CosmicSet-2.0-mini
arxiv: 2605.28919
---
# CosmicFish-HRM
**Paper:** [CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models](https://arxiv.org/abs/2605.28919)

**GitHub:** [MistyozAI/CosmicFish-HRM](https://github.com/MistyozAI/CosmicFish-HRM)

CosmicFish-HRM is a compact 82.77M parameter causal language model built around a Hierarchical Reasoning Module (HRM) that dynamically allocates reasoning compute during inference. Rather than applying a fixed number of forward-pass layers to every input, the model iterates through high-level and low-level reasoning cycles and uses a learned halting head to decide when to stop. Harder inputs trigger deeper reasoning trajectories while simpler ones halt early.

Built at Mistyoz AI, Hyderabad.

---
## Architecture

```
Input Blocks (Transformer) -> HRM Core (H + L levels, variable steps) -> Output Blocks (Transformer) -> LM Head
```
The HRM core maintains two interacting recurrent states operating at different abstraction levels. The high-level module captures slower, more abstract reasoning while the low-level module handles finer-grained local computation. After each reasoning step a lightweight halting head decides whether to continue or stop, conditioned on the mean-pooled high-level state.
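
The control flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: `adaptive_steps`, `step_fn`, and `halt_q` are hypothetical names, the state is a plain list of per-token vectors, and the halting rule (stop once the halt score is at least the continue score) is an assumption.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def mean_pool(state: List[Vector]) -> Vector:
    """Average the per-token vectors of the high-level state."""
    n = len(state)
    return [sum(col) / n for col in zip(*state)]


def adaptive_steps(
    h_state: List[Vector],
    step_fn: Callable[[List[Vector]], List[Vector]],
    halt_q: Callable[[Vector], Tuple[float, float]],
    max_steps: int = 16,
) -> Tuple[List[Vector], int]:
    """Run reasoning steps until the halting head stops or max_steps is reached."""
    steps = 0
    for _ in range(max_steps):
        h_state = step_fn(h_state)  # stand-in for one H/L reasoning step
        steps += 1
        q_halt, q_continue = halt_q(mean_pool(h_state))
        if q_halt >= q_continue:  # assumed halting rule
            break
    return h_state, steps


# Toy stand-ins: each step halves the state; halt once the pooled L1 norm <= 0.5.
def toy_step(state: List[Vector]) -> List[Vector]:
    return [[x * 0.5 for x in vec] for vec in state]


def toy_halt(pooled: Vector) -> Tuple[float, float]:
    norm = sum(abs(x) for x in pooled)
    return 1.0 - norm, norm


final_state, used = adaptive_steps([[4.0, 0.0], [0.0, 4.0]], toy_step, toy_halt)
print(used)  # 3
```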
**Key components:**
- Grouped-Query Attention (GQA) with 8 query heads and 4 KV heads
- Rotary Positional Embeddings (RoPE)
- SwiGLU feedforward layers
- RMSNorm (pre-norm for I/O blocks, post-norm inside HRM)
- Learned halt/continue Q-head controlling per-input reasoning depth
- Step penalty in the training loss encouraging efficient halting
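
With 8 query heads and 4 KV heads, each key/value head is shared by two query heads. The sketch below shows that mapping under the common assumption that query heads are grouped consecutively; `kv_head_for` is a hypothetical helper, not a function from the model code.

```python
N_HEAD = 8     # query heads, from the component list
N_KV_HEAD = 4  # key/value heads, from the component list


def kv_head_for(query_head: int, n_head: int = N_HEAD, n_kv_head: int = N_KV_HEAD) -> int:
    """Return the KV head a query head reads from (consecutive grouping assumed)."""
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be divisible by n_kv_head")
    group_size = n_head // n_kv_head  # query heads per KV head
    return query_head // group_size


mapping = {q: kv_head_for(q) for q in range(N_HEAD)}
print(mapping)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
```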
## Model Specs
| Parameter | Value |
|---|---|
| Total parameters | 82.77M |
| Embedding dimension | 448 |
| Vocabulary size | 50,304 |
| Context length | 512 |
| Input transformer layers | 6 |
| Output transformer layers | 6 |
| HRM H-layers | 4 |
| HRM L-layers | 4 |
| Max HRM steps | 16 |
| Attention heads | 8 (4 KV, GQA) |
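
A few derived quantities follow directly from the table. The arithmetic below is a rough sketch: it counts a single token-embedding table only, and whether the LM head shares those weights is not stated here, so the embedding share should be read as an estimate.

```python
N_EMBD = 448
VOCAB_SIZE = 50_304
N_HEAD = 8
N_KV_HEAD = 4
TOTAL_PARAMS = 82.77e6

head_dim = N_EMBD // N_HEAD             # per-head width
kv_dim = N_KV_HEAD * head_dim           # width of the K (or V) projection under GQA
embedding_params = VOCAB_SIZE * N_EMBD  # one token-embedding table
embedding_share = embedding_params / TOTAL_PARAMS

print(head_dim)                   # 56
print(kv_dim)                     # 224
print(embedding_params)           # 22536192
print(f"{embedding_share:.1%}")   # 27.2%
```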
## Evaluation
Zero-shot benchmark results (accuracy, %):

| Model | HellaSwag | PIQA | WinoGrande |
|---|---|---|---|
| CosmicFish-HRM (82M) | 26.2 | 58.1 | 50.7 |
| GPT-2 Small (117M) | 29.7 | 62.5 | 50.7 |
| OPT-125M | 30.6 | 62.6 | 52.9 |
| Pythia-160M | 29.4 | 62.1 | 52.8 |

At compact scale a portion of the parameter budget is allocated to the HRM reasoning infrastructure rather than raw language modeling capacity, which accounts for the gap versus fixed-depth baselines of similar size. The paper argues this tradeoff becomes more favorable as model scale increases.
## Adaptive Reasoning Behavior
The primary contribution of CosmicFish-HRM is not benchmark accuracy but adaptive compute allocation. The model uses different numbers of reasoning steps depending on input complexity:
| Prompt | Mean HRM Steps |
|---|---|
| "The capital of France is" | 2.78 |
| "Photosynthesis is the process by which plants" | 4.77 |
| "If all roses are flowers and some flowers fade quickly..." | 7.03 |
| "A bat and a ball cost $1.10 in total..." | 8.40 |

Mean step counts on every benchmark stay well below the 16-step maximum, and the variance across samples is high, which indicates that the halting mechanism responds to the input rather than collapsing to a fixed depth.

| Benchmark | Mean Steps | Std Dev |
|---|---|---|
| HellaSwag | 3.03 | 6.26 |
| PIQA | 1.87 | 5.13 |
| WinoGrande | 0.95 | 3.78 |
| Overall | 2.68 | 5.95 |
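
Statistics like these can be computed from per-sample step counts. The sketch below assumes the counts have already been collected into a list; `summarize_steps` is a hypothetical helper and the example counts are made up for illustration, not taken from the evaluation.

```python
from statistics import mean, pstdev

MAX_STEPS = 16  # maximum HRM steps from the spec table


def summarize_steps(step_counts):
    """Summarize how many reasoning steps a batch of samples used."""
    return {
        "mean": mean(step_counts),
        "std": pstdev(step_counts),  # population standard deviation
        "frac_at_max": sum(s == MAX_STEPS for s in step_counts) / len(step_counts),
    }


# Made-up counts, for illustration only.
hypothetical_counts = [2, 3, 3, 5, 8, 16, 2, 4]
summary = summarize_steps(hypothetical_counts)
print(summary["mean"])         # 5.375
print(summary["frac_at_max"])  # 0.125
```

A standard deviation near zero, or a `frac_at_max` near one, would point to a halting head that has collapsed to a fixed depth.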
## Usage
This model uses a custom architecture. The model code is included in this repo as `modeling_hrm_cosmicfish.py`.
**Standalone chat script (downloads automatically):**
```bash
pip install torch safetensors huggingface-hub transformers termcolor
python chat.py
```
**Load manually:**
```python
import json
import sys

import tiktoken  # requires: pip install tiktoken
import torch
from huggingface_hub import snapshot_download
from safetensors.torch import load_file

# Download the repo snapshot (weights, config, and model code).
cache_dir = snapshot_download("MistyozAI/CosmicFish-HRM")

# Make modeling_hrm_cosmicfish.py from the snapshot importable.
sys.path.insert(0, cache_dir)
from modeling_hrm_cosmicfish import HRMCosmicFish, HRMCosmicFishConfig

with open(f"{cache_dir}/config.json") as f:
    cfg = json.load(f)

config = HRMCosmicFishConfig(
    vocab_size=cfg["vocab_size"],
    n_embd=cfg["n_embd"],
    block_size=cfg["block_size"],
    n_head=cfg["n_head"],
    n_kv_head=cfg["n_kv_head"],
    n_input_layers=cfg["n_input_layers"],
    n_output_layers=cfg["n_output_layers"],
    hrm_H_layers=cfg["hrm_H_layers"],
    hrm_L_layers=cfg["hrm_L_layers"],
    hrm_H_cycles=cfg["hrm_H_cycles"],
    hrm_L_cycles=cfg["hrm_L_cycles"],
    hrm_max_steps=cfg["hrm_max_steps"],
    dropout=0.0,  # no dropout at inference time
)

state_dict = load_file(f"{cache_dir}/model.safetensors")
model = HRMCosmicFish(config)
model.load_state_dict(state_dict)
model.eval()

# GPT-2 BPE tokenizer
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "Artificial intelligence is"
tokens = tokenizer.encode(prompt)
idx = torch.tensor(tokens, dtype=torch.long).unsqueeze(0)  # shape (1, T)

with torch.no_grad():
    output = model.generate(idx, max_new_tokens=100, temperature=0.7, top_k=40)

print(tokenizer.decode(output[0].tolist()))
```
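
For long prompts, the 512-token context length leaves limited room for generation. The helper below is a sketch that keeps only the most recent prompt tokens so that prompt plus new tokens fit in the window; `fit_to_context` is a hypothetical name, and whether `model.generate` already crops its input is not stated here.

```python
def fit_to_context(tokens, block_size=512, max_new_tokens=100):
    """Keep the most recent tokens so prompt + generated tokens fit in block_size."""
    budget = block_size - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than block_size")
    return tokens[-budget:]


long_prompt = list(range(1000))  # stand-in for 1000 token ids
clipped = fit_to_context(long_prompt)
print(len(clipped))  # 412
```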
---
PyTorch file: [CF.pt](https://drive.google.com/file/d/1He4PAIixuL5EMmzmxV4nq-OLI8xlp15Y/view?usp=sharing)

PyTorch file: [Base.pt](https://drive.google.com/file/d/1Apx898RYOtyDSjd_9IhoIGlTbNYf3N7H/view?usp=sharing)

---

Mistyoz AI, Hyderabad