Anvaya-Rabbit 2.7B – v0.1 Alpha

Rabbit is a 2.7B-parameter recurrent state-space model (Ṛta-SSM) trained entirely from scratch on a single NVIDIA L4 GPU, using a custom non-transformer architecture and the Gurukul constitutional training protocol. It serves as a technical proof of concept that capable alternative-architecture models can be developed under severe compute constraints. It is the first model in the Anvaya series: Rabbit → Raccoon → Polar Bear.

Overview

Rabbit demonstrates three proprietary components developed by RtaForge:

  • Ṛta-SSM – a custom recurrent state-space architecture with no attention or transformer blocks
  • Gurukul – a proposal-validation training loop in which a Ṡisya proposes weight deltas and a Guru validates them against constitutional constraints before they are applied
  • Subsuminator – cross-architecture weight migration without full retraining, enabling efficient curriculum transfer

Trained across a phased curriculum on a single 24 GB GPU, Rabbit shows substantial gains over random initialisation on internal scale-invariant metrics. It is a deliberate architecture proof at seq_len=64, not a production model.

For strategic context, IndiaAI alignment, and full programme roadmap, see the Anvaya Executive Briefing.

Architecture

  • Type: Ṛta-SSM v7.2.2, Fortress Unbroken – recurrent SSM, no attention (a generic recurrence sketch follows this list)
  • Parameters: ~2.7B (post-subsumination)
  • Layers: 64
  • d_model / d_state: 2560
  • Vocabulary: 50,280 (GPT-NeoX tokenizer)
  • Precision: bfloat16
  • Training seq_len: 64
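
For orientation, the snippet below shows the generic linear state-space recurrence that recurrent SSMs build on (h_t = A·h_{t-1} + B·x_t, y_t = C·h_t), computed step by step with no attention matrix. It is only an illustration of the model family; the actual Ṛta-SSM recurrence, gating, and parameterisation are proprietary and differ from this toy version.

import torch

def linear_ssm_scan(x, A, B, C):
    # x: (seq_len, d_model), A: (d_state, d_state), B: (d_state, d_model), C: (d_model, d_state)
    # Toy single-layer scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    h = torch.zeros(A.shape[0])
    outputs = []
    for x_t in x:              # strictly sequential recurrence, O(d_state) memory per step
        h = A @ h + B @ x_t    # state update
        outputs.append(C @ h)  # readout
    return torch.stack(outputs)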

Weights

This repository contains the base pretrained checkpoint (base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt) and the SFT imprint checkpoint (imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt).

Load the imprint weights (base + SFT overlay, recommended for inference):

from white_rabbit.rabbit_model import create_rabbit_model
from transformers import AutoTokenizer
import torch

model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")  # load checkpoint on CPU
model.load_state_dict(sd, strict=False)  # non-strict load tolerates any keys missing from the checkpoint
model.eval()  # inference mode

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # GPT-NeoX tokenizer (vocab 50,280)
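
If the model exposes a plain forward pass that returns next-token logits, generation can be scripted as a simple greedy loop. The sketch below assumes the forward call accepts a (batch, seq_len) tensor of token ids and returns logits of shape (batch, seq_len, vocab_size); adapt it to the actual rtaforge-substrates API.

prompt = "State the second law of thermodynamics:"
ids = tokenizer(prompt, return_tensors="pt").input_ids   # (1, prompt_len)

with torch.no_grad():
    for _ in range(32):                                   # stay within the seq_len=64 training context
        logits = model(ids)                               # assumed shape: (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0], skip_special_tokens=True))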

Requires: rtaforge-substrates (private repository; contact guha@rtaforge.in for access). This model uses a custom SSM architecture that is not compatible with standard HuggingFace AutoModel classes.

Training infrastructure: Rta-Forge/polaris-revival – a patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with fused SSM recurrence kernels. MIT licensed.

Training Protocol

Two proprietary components make this training regime possible:

Gurukul is a constitutional Ṡisya/Guru proposal-validation loop (a structural sketch follows the list):

  • The Ṡisya proposes weight deltas based on the current curriculum phase
  • The Guru validates each proposal against a set of constitutional constraints
  • Accepted proposals update the model; rejected proposals are logged for signal
  • Feedback from each cycle informs the next round of proposals
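
The Gurukul implementation is private; the code below is only a structural sketch of one proposal cycle, with hypothetical sisya, guru, and verdict interfaces standing in for the real components.

import torch

def gurukul_cycle(model, sisya, guru, phase, rejection_log):
    # One proposal-validation cycle (illustrative only; sisya/guru/verdict interfaces are hypothetical).
    delta = sisya.propose(model, phase)           # Ṡisya drafts weight deltas for the current phase
    verdict = guru.validate(model, delta)         # Guru checks them against constitutional constraints
    if verdict.accepted:
        params = dict(model.named_parameters())
        with torch.no_grad():
            for name, d in delta.items():         # apply accepted deltas in place
                params[name].add_(d)
    else:
        rejection_log.append(verdict.reasons)     # rejected proposals are logged for signal
    sisya.receive_feedback(verdict)               # feedback informs the next round of proposals
    return verdict.accepted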

Subsuminator enables efficient migration of learned weights across architectures, supporting curriculum transfer without retraining from scratch.
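
The Subsuminator itself is not published. As a rough point of reference only, the snippet below shows the simplest form of cross-architecture transfer (copying tensors whose names and shapes match, leaving the rest at fresh initialisation); the actual component also has to handle re-shaping and re-mapping between architectures.

import torch

def naive_weight_migration(src_state_dict, dst_model):
    # Copy every source tensor whose name and shape match the destination model.
    dst_sd = dst_model.state_dict()
    migrated = []
    for name, tensor in src_state_dict.items():
        if name in dst_sd and dst_sd[name].shape == tensor.shape:
            dst_sd[name].copy_(tensor)
            migrated.append(name)
    dst_model.load_state_dict(dst_sd)
    return migrated   # parameters not in this list keep their fresh initialisation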

Together these components allowed 1,500 accepted Gurukul proposals across 6 phases to be processed on a single AceCloud L4 (24 GB VRAM) in roughly 7 days of effective training time (total elapsed time was higher due to crash recovery and VRAM-leak debugging).

| Phase | Proposals | Dataset | Focus |
|---|---|---|---|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |

Final checkpoint: Step 1,500. seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
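
As a point of reference, a plain next-token training step with these hyperparameters might look like the sketch below, using the open-source lion-pytorch package for the Lion optimizer; the actual weight updates in this project flow through the Gurukul proposal loop rather than a vanilla loop like this.

import torch
import torch.nn.functional as F
from lion_pytorch import Lion   # pip install lion-pytorch

optimizer = Lion(model.parameters(), lr=1e-5)

def train_step(batch):
    # batch: (3, 65) token ids -> teacher-forced next-token prediction at seq_len=64
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)                       # assumed shape: (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()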

SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
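
"Surface-only gate-layer fine-tuning" means the backbone stays frozen and only the gate parameters are updated. Below is a minimal sketch of that freezing pattern, assuming gate parameters can be identified by name; the actual imprint procedure lives in the private substrate code.

# Freeze everything, then re-enable only gate-layer parameters.
# The "gate" name filter is an assumption about how the parameters are named.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable (gate) parameters: {trainable:,}")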

Evaluation

Internal – Scale-Invariant Metrics

Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised baseline of identical architecture. 50 samples per corpus, seq_len=64.
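
Both metrics are computed per next-token prediction: Top-K accuracy asks whether the ground-truth token is among the model's K highest-scoring tokens, and MRR averages the reciprocal of the rank the model assigns to the ground-truth token. A minimal sketch, assuming per-position logits and targets are already available:

import torch

def topk_accuracy_and_mrr(logits, targets, k=10):
    # logits: (num_positions, vocab_size), targets: (num_positions,)
    sorted_idx = logits.argsort(dim=-1, descending=True)                      # best-first token ids
    ranks = (sorted_idx == targets.unsqueeze(-1)).float().argmax(dim=-1) + 1  # rank of true token (1 = best)
    topk_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return topk_acc, mrr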

| Metric | Random Init | Trained (Step 1,500) | Gain |
|---|---|---|---|
| Top-1 Accuracy (aggregate) | 0.24% | 1.90% | ~8× |
| Top-10 Accuracy (aggregate) | 0.24% | 35.84% | ~149× |
| MRR (aggregate) | 0.0026 | 0.1724 | ~66× |
| MRR (Deep Math) | 0.0084 | 0.186 | 22× |
| Top-10 (Biology) | ~1.3% | ~12% | ~10× |
| Top-10 (Chemistry) | ~1.3% | ~13% | ~10× |

These gains are measured against a randomly initialised model of identical architecture; they reflect what the training curriculum taught, not absolute capability.

Commercial Benchmarks (lm-eval harness)

Standard academic benchmarks are not yet meaningful here. Rabbit was deliberately trained at seq_len=64 as a pure architecture proof, while standard lm-eval prompts run 150–400 tokens, well beyond Rabbit's training context. Raccoon (seq_len=512) removes this constraint entirely.

| Benchmark | Score | Notes |
|---|---|---|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

Roadmap

| Model | Params | seq_len | Status |
|---|---|---|---|
| Rabbit | ~2.7B | 64 | ✅ This model (v0.1 Alpha) |
| Raccoon | ~6.1B | 512 | In training: reasoning curriculum (math ×2, logic ×2) |
| Polar Bear | ~13B | 512 | Planned: STEM + AEVA anti-hallucination layer |

The delta between Rabbit and Raccoon is the story: same pipeline, same hardware philosophy, 8× the context length, and a reasoning-heavy curriculum. Raccoon is intended to be the first Ṛta-SSM model trained end-to-end in India on domestic compute infrastructure to reach standard benchmark competitiveness.

Give us more resources and watch what happens.
