---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---

# Anvaya-Rabbit 2.7B — v0.1 Alpha

Rabbit is a 2.7B-parameter recurrent state-space model (Ṛta-SSM) trained entirely from scratch on a single NVIDIA L4 GPU, using a custom non-transformer architecture and the Gurukul constitutional training protocol. It serves as a technical proof-of-concept that capable alternative-architecture models can be developed under severe compute constraints.

This is the first model in the Anvaya series: **Rabbit → Raccoon → Polar Bear**.

## Overview

Rabbit demonstrates three proprietary components developed by RtaForge:

- **Ṛta-SSM** — a custom recurrent state-space architecture with no attention or transformer blocks
- **Gurukul** — a proposal-validation training loop in which a Sisya proposes weight deltas and a Guru validates them against constitutional constraints before applying them
- **Subsuminator** — cross-architecture weight migration without full retraining, enabling efficient curriculum transfer

Trained across a phased curriculum on a single 24 GB GPU, Rabbit shows substantial gains over random initialisation on internal scale-invariant metrics. It is a deliberate architecture proof at seq_len=64 — not a production model.

For strategic context, IndiaAI alignment, and the full programme roadmap, see the [Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).

## Architecture

- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64

## Weights

This repository contains the base pretrained checkpoint (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).

Load the imprint weights (base + SFT overlay, recommended for inference):

```python
from white_rabbit.rabbit_model import create_rabbit_model
from transformers import AutoTokenizer
import torch

model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```

> **Requires**: `rtaforge-substrates` (private repository — contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> not compatible with standard HuggingFace `AutoModel`.

**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) — patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with fused SSM recurrence kernels. MIT licensed.
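Once the checkpoint and tokenizer are loaded as above, the snippet below sketches a minimal greedy-decoding smoke test. It assumes the object returned by `create_rabbit_model` behaves like a standard `torch.nn.Module` that maps token ids of shape `(batch, seq)` to next-token logits of shape `(batch, seq, vocab)`; the actual White Rabbit inference API may expose a dedicated generation path instead, so treat this as an illustrative sketch rather than reference usage.

```python
import torch

prompt = "The boiling point of water at sea level is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(16):                                     # generate 16 tokens greedily
        out = model(input_ids[:, -64:])                     # stay within the training seq_len of 64
        logits = out[0] if isinstance(out, tuple) else out  # some models also return recurrent state
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Given the seq_len=64 training context, keep prompts short; output quality is expected to degrade quickly on longer inputs.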
## Training Protocol

Two proprietary components make this training regime possible:

**Gurukul** is a constitutional Sisya/Guru proposal-validation loop:

- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals

**Subsuminator** enables efficient migration of learned weights across architectures, supporting curriculum transfer without retraining from scratch.

Together these components made the full regime tractable on minimal hardware: **1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24 GB VRAM), in ~7 days of effective training time (total elapsed time was higher due to crash recovery and VRAM-leak debugging).**

| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |

**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5. The SFT imprint was applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).

## Evaluation

### Internal — Scale-Invariant Metrics

Evaluated using Top-K accuracy and Mean Reciprocal Rank (MRR) against a randomly initialised baseline of identical architecture, with 50 samples per corpus at seq_len=64.

| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR — Deep Math | 0.0084 | **0.186** | **~22×** |
| Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
| Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |

These gains are measured against a randomly initialised model of identical architecture — they reflect what the training curriculum taught, not absolute capability.
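For readers unfamiliar with these metrics, the sketch below shows one straightforward way to compute per-position Top-K accuracy and MRR from next-token logits. It is illustrative only: the function name, tensor shapes, and toy data are assumptions, not RtaForge's internal evaluation harness.

```python
import torch

def topk_accuracy_and_mrr(logits: torch.Tensor, targets: torch.Tensor, k: int = 10):
    """Top-K accuracy and Mean Reciprocal Rank for next-token prediction.

    logits:  (num_positions, vocab_size) scores for each evaluated position
    targets: (num_positions,) ground-truth next-token ids
    """
    target_scores = logits.gather(1, targets.unsqueeze(1))   # (N, 1) score of the true token
    ranks = (logits > target_scores).sum(dim=1) + 1          # rank of the true token, 1 = best
    top_k_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return top_k_acc, mrr

# Toy example with random scores over the 50,280-token vocabulary
logits = torch.randn(50, 50280)
targets = torch.randint(0, 50280, (50,))
top10, mrr = topk_accuracy_and_mrr(logits, targets, k=10)
```

A randomly initialised model lands near chance on both numbers, which is why the table above reports gains as multiples over that baseline rather than as absolute capability scores.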
### Commercial Benchmarks (lm-eval harness)

> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150–400 tokens — well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.

| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

## Roadmap

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model — v0.1 Alpha |
| **Raccoon** | ~6.1B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |

The delta between Rabbit and Raccoon is the story — same pipeline, same hardware philosophy, 8× the context length, and a reasoning-heavy curriculum. Raccoon is intended to be the first Ṛta-SSM model trained end-to-end in India on domestic compute infrastructure to reach standard benchmark competitiveness.

**Give us more resources and watch what happens.**

## Related Resources

- [Anvaya Executive Briefing — May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in