🧩 ARC-HAIRM v2 (Hierarchical Answer Improvement Recursive Model)
ARC-HAIRM v2 is a specialized recursive reasoning model designed for the Abstraction and Reasoning Corpus (ARC-AGI) benchmark. Unlike standard LLMs, which produce an answer in a single pass of token prediction, ARC-HAIRM implements an iterative "thinking" process inspired by TRM (Tiny Recursive Model) and HRM (Hierarchical Reasoning Model).
It leverages the IBM Granite 350M backbone, augmented with a ~50M parameter recursive reasoning engine that iteratively refines internal latent states before producing an output.
📋 Model Details
- Developed by: Abhay
- Model Type: Hybrid Causal LM with Recursive Adapters
- Backbone: IBM Granite 4.0 350M
- Reasoning Architecture: Recursive Latent & Answer Refinement (TRM-style)
- Language(s): English (Primary), Logic/Grids (ARC task format)
- License: Apache 2.0
🏗️ Architecture: The "Thinking" Loop
ARC-HAIRM v2 does not just predict; it contemplates. The core innovation is the Recursive Reasoning Engine, which consists of two specialized modules:
1. Latent Reasoning Module ($z = f(x, y, z)$)
This module updates a high-dimensional latent "thought" state ($z$). It processes the input prompt ($x$), the current response hidden state ($y$), and its own previous state ($z$). In the default configuration, it cycles 4 times per improvement step to "digest" the problem complexity.
2. Answer Refinement Module ($y = g(y, z)$)
Crucially, the actual answer hidden states ($y$) are refined using the latent state ($z$). This happens over $K=8$ improvement steps. Each step brings the internal representation closer to a logically consistent solution.
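The two modules together form a nested loop. The sketch below illustrates that loop under stated assumptions: the linear layers, the tanh nonlinearity, and the residual answer update are illustrative choices, not the actual ARC-HAIRM implementation.

```python
import torch
import torch.nn as nn

class RecursiveReasoner(nn.Module):
    """Illustrative TRM-style thinking loop: z = f(x, y, z) is cycled
    n_latent times per improvement step, then y = g(y, z) refines the
    answer state, repeated for K improvement steps. Module shapes and
    wiring are assumptions for the sketch."""

    def __init__(self, d_model: int, n_latent: int = 4, k_steps: int = 8):
        super().__init__()
        self.n_latent = n_latent
        self.k_steps = k_steps
        # f: updates the latent "thought" state from (x, y, z)
        self.f = nn.Linear(3 * d_model, d_model)
        # g: refines the answer hidden state from (y, z)
        self.g = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(y)
        for _ in range(self.k_steps):           # K = 8 improvement steps
            for _ in range(self.n_latent):      # 4 latent cycles per step
                z = torch.tanh(self.f(torch.cat([x, y, z], dim=-1)))
            # residual refinement of the answer hidden states
            y = y + self.g(torch.cat([y, z], dim=-1))
        return y
```

Note that only $y$ is decoded into tokens; $z$ stays internal, which is what distinguishes latent "thinking" from visible chain-of-thought.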
🎯 Deep Supervision
During training, the model is supervised at every refinement step. This forces the model to learn a trajectory of improvement, rather than just a final state, leading to much more stable and logical generation on unseen ARC tasks.
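A minimal sketch of such a deep-supervision loss, assuming a per-step cross-entropy with geometrically decayed weights (the exact scheme used by ARC-HAIRM may differ):

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(step_logits, targets, decay: float = 0.8):
    """Weighted sum of per-step losses over the K refinement steps.
    Assumed weighting: decay**(K-1-k), so the final step gets weight
    1.0 and earlier steps are progressively discounted."""
    K = len(step_logits)
    total = torch.tensor(0.0)
    for k, logits in enumerate(step_logits):
        weight = decay ** (K - 1 - k)  # later steps weighted more heavily
        total = total + weight * F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )
    return total
```

Supervising the whole trajectory, rather than only the final state, gives every refinement step a gradient signal of its own.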
📊 Training Specifications
- Dataset: 50,000 synthetic and augmented ARC-AGI tasks (arc_train_50k.json).
- Strategy: Continuous pre-training of the backbone with high-speed adapter training.
- Hardware: Optimized for high-VRAM environments (due to recursive unrolling during training).
- Optimization:
- Backbone LR: $1 \times 10^{-6}$ (Low to preserve linguistic priors)
- Adapter LR: $5 \times 10^{-5}$ (Focused learning on recursive logic)
- Weight Decay: $0.01$
- Supervision Decay: $0.8$ (Weights final steps more heavily than initial steps)
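The two learning rates above imply separate optimizer parameter groups. A minimal sketch, assuming adapter parameters are identifiable by name (a hypothetical convention for this illustration):

```python
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    """Two-group AdamW matching the listed hyperparameters:
    backbone LR 1e-6 to preserve linguistic priors, adapter LR 5e-5
    for the recursive-logic modules, weight decay 0.01 throughout."""
    adapter_params, backbone_params = [], []
    for name, param in model.named_parameters():
        # Assumed naming convention: recursive modules contain "adapter"
        (adapter_params if "adapter" in name else backbone_params).append(param)
    return torch.optim.AdamW(
        [
            {"params": backbone_params, "lr": 1e-6},  # low LR for backbone
            {"params": adapter_params, "lr": 5e-5},   # focused adapter learning
        ],
        weight_decay=0.01,
    )
```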
🛠️ How to Use
Installation
Ensure you have the architecture definition (arc_hairm.py) and the inference utilities (inference.py) in your environment.
```python
import torch
from arc_hairm import ARCHAIRM
from transformers import AutoTokenizer

# Load the model
model_path = "./archairm_v2_output/final"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = ARCHAIRM.from_pretrained(model_path, device=device)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Example ARC task
prompt = "Input: [[5, 5], [5, 5]]\nOutput: [[5, 5, 5], [5, 5, 5], [5, 5, 5]]\nInput: [[2, 2], [2, 2]]\nOutput:"

# For recursive models, use the specialized generation function in inference.py
from inference import generate_recursive
response = generate_recursive(model, tokenizer, prompt, max_new_tokens=128)
print(response)
```
⚠️ Limitations & Bias
- Context Window: Optimized for 1024 tokens. Very large ARC grids may exceed the context and require tiling.
- Task Specificity: While based on Granite, the recursive adapters are heavily tuned for ARC-style grid transformations. General linguistic performance may vary compared to the base model.
- Compute Intensity: Generation is slower than with standard LLMs because the refinement loop runs for every generated token.
📚 Citations
If you use this model in your research, please cite the following inspirations:
```bibtex
@article{trm2024,
  title={Less is More: Recursive Reasoning with Tiny Networks},
  author={...},
  journal={arXiv preprint},
  year={2024}
}

@article{granite2024,
  title={Granite 4.0: Compact Open Foundation Models},
  author={IBM Research},
  year={2024}
}
```