Anvaya-Rabbit 2.7B – v0.1 Alpha

Rabbit is a 2.7B-parameter recurrent state-space model (Ṛta-SSM) trained entirely from scratch on a single NVIDIA L4 GPU, using a custom non-transformer architecture and the Gurukul constitutional training protocol. It serves as a technical proof of concept that capable alternative-architecture models can be developed under severe compute constraints. It is the first model in the Anvaya series: Rabbit → Raccoon → Polar Bear.

Overview

Rabbit demonstrates three proprietary components developed by RtaForge:

  • Ṛta-SSM – a custom recurrent state-space architecture with no attention or transformer blocks
  • Gurukul – a proposal-validation training loop in which a Ṡisya proposes weight deltas and a Guru validates them against constitutional constraints before they are applied
  • Subsuminator – cross-architecture weight migration without full retraining, enabling efficient curriculum transfer

Trained across a phased curriculum on a single 24 GB GPU, Rabbit shows substantial gains over random initialisation on internal scale-invariant metrics. It is a deliberate architecture proof at seq_len=64, not a production model.

For strategic context, IndiaAI alignment, and full programme roadmap, see the Anvaya Executive Briefing.

Architecture

  • Type: Ṛta-SSM v7.2.2, Fortress Unbroken – recurrent SSM, no attention (a generic recurrence sketch follows this list)
  • Parameters: ~2.7B (post-subsumination)
  • Layers: 64
  • d_model / d_state: 2560
  • Vocabulary: 50,280 (GPT-NeoX tokenizer)
  • Precision: bfloat16
  • Training seq_len: 64
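
For orientation, the snippet below shows the generic linear state-space recurrence that recurrent SSMs build on (h_t = A·h_{t-1} + B·x_t, y_t = C·h_t), computed step by step with no attention matrix. It is only an illustration of the model family; the actual Ṛta-SSM recurrence, gating, and parameterisation are proprietary and differ from this toy version.

import torch

def linear_ssm_scan(x, A, B, C):
    # x: (seq_len, d_model), A: (d_state, d_state), B: (d_state, d_model), C: (d_model, d_state)
    # Toy single-layer scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    h = torch.zeros(A.shape[0])
    outputs = []
    for x_t in x:              # strictly sequential recurrence, O(d_state) memory per step
        h = A @ h + B @ x_t    # state update
        outputs.append(C @ h)  # readout
    return torch.stack(outputs)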

Weights

This repository contains the base pretrained checkpoint (base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt) and the SFT imprint checkpoint (imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt).

Load the imprint weights (base + SFT overlay, recommended for inference):

from white_rabbit.rabbit_model import create_rabbit_model
from transformers import AutoTokenizer
import torch

model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")  # load checkpoint on CPU
model.load_state_dict(sd, strict=False)  # non-strict load tolerates any keys missing from the checkpoint
model.eval()  # inference mode

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # GPT-NeoX tokenizer (vocab 50,280)
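
If the model exposes a plain forward pass that returns next-token logits, generation can be scripted as a simple greedy loop. The sketch below assumes the forward call accepts a (batch, seq_len) tensor of token ids and returns logits of shape (batch, seq_len, vocab_size); adapt it to the actual rtaforge-substrates API.

prompt = "State the second law of thermodynamics:"
ids = tokenizer(prompt, return_tensors="pt").input_ids   # (1, prompt_len)

with torch.no_grad():
    for _ in range(32):                                   # stay within the seq_len=64 training context
        logits = model(ids)                               # assumed shape: (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0], skip_special_tokens=True))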

Requires: rtaforge-substrates (private repository; contact guha@rtaforge.in for access). This model uses a custom SSM architecture that is not compatible with standard HuggingFace AutoModel classes.

Training infrastructure: Rta-Forge/polaris-revival – a patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with fused SSM recurrence kernels. MIT licensed.

Training Protocol

Two proprietary components make this training regime possible:

Gurukul is a constitutional Ṡisya/Guru proposal-validation loop (a structural sketch follows the list):

  • The Ṡisya proposes weight deltas based on the current curriculum phase
  • The Guru validates each proposal against a set of constitutional constraints
  • Accepted proposals update the model; rejected proposals are logged for signal
  • Feedback from each cycle informs the next round of proposals
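
The Gurukul implementation is private; the code below is only a structural sketch of one proposal cycle, with hypothetical sisya, guru, and verdict interfaces standing in for the real components.

import torch

def gurukul_cycle(model, sisya, guru, phase, rejection_log):
    # One proposal-validation cycle (illustrative only; sisya/guru/verdict interfaces are hypothetical).
    delta = sisya.propose(model, phase)           # Ṡisya drafts weight deltas for the current phase
    verdict = guru.validate(model, delta)         # Guru checks them against constitutional constraints
    if verdict.accepted:
        params = dict(model.named_parameters())
        with torch.no_grad():
            for name, d in delta.items():         # apply accepted deltas in place
                params[name].add_(d)
    else:
        rejection_log.append(verdict.reasons)     # rejected proposals are logged for signal
    sisya.receive_feedback(verdict)               # feedback informs the next round of proposals
    return verdict.accepted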

Subsuminator enables efficient migration of learned weights across architectures, supporting curriculum transfer without retraining from scratch.
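
The Subsuminator itself is not published. As a rough point of reference only, the snippet below shows the simplest form of cross-architecture transfer (copying tensors whose names and shapes match, leaving the rest at fresh initialisation); the actual component also has to handle re-shaping and re-mapping between architectures.

import torch

def naive_weight_migration(src_state_dict, dst_model):
    # Copy every source tensor whose name and shape match the destination model.
    dst_sd = dst_model.state_dict()
    migrated = []
    for name, tensor in src_state_dict.items():
        if name in dst_sd and dst_sd[name].shape == tensor.shape:
            dst_sd[name].copy_(tensor)
            migrated.append(name)
    dst_model.load_state_dict(dst_sd)
    return migrated   # parameters not in this list keep their fresh initialisation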

Together these components allowed 1,500 accepted Gurukul proposals across 6 phases to be processed on a single AceCloud L4 (24 GB VRAM) in roughly 7 days of effective training time (total elapsed time was higher due to crash recovery and VRAM-leak debugging).

| Phase | Proposals | Dataset | Focus |
|---|---|---|---|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |

Final checkpoint: Step 1,500. seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
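
As a point of reference, a plain next-token training step with these hyperparameters might look like the sketch below, using the open-source lion-pytorch package for the Lion optimizer; the actual weight updates in this project flow through the Gurukul proposal loop rather than a vanilla loop like this.

import torch
import torch.nn.functional as F
from lion_pytorch import Lion   # pip install lion-pytorch

optimizer = Lion(model.parameters(), lr=1e-5)

def train_step(batch):
    # batch: (3, 65) token ids -> teacher-forced next-token prediction at seq_len=64
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)                       # assumed shape: (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()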

SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
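
"Surface-only gate-layer fine-tuning" means the backbone stays frozen and only the gate parameters are updated. Below is a minimal sketch of that freezing pattern, assuming gate parameters can be identified by name; the actual imprint procedure lives in the private substrate code.

# Freeze everything, then re-enable only gate-layer parameters.
# The "gate" name filter is an assumption about how the parameters are named.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable (gate) parameters: {trainable:,}")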

Evaluation

Internal – Scale-Invariant Metrics

Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised baseline of identical architecture. 50 samples per corpus, seq_len=64.
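
Both metrics are computed per next-token prediction: Top-K accuracy asks whether the ground-truth token is among the model's K highest-scoring tokens, and MRR averages the reciprocal of the rank the model assigns to the ground-truth token. A minimal sketch, assuming per-position logits and targets are already available:

import torch

def topk_accuracy_and_mrr(logits, targets, k=10):
    # logits: (num_positions, vocab_size), targets: (num_positions,)
    sorted_idx = logits.argsort(dim=-1, descending=True)                      # best-first token ids
    ranks = (sorted_idx == targets.unsqueeze(-1)).float().argmax(dim=-1) + 1  # rank of true token (1 = best)
    topk_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return topk_acc, mrr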

| Metric | Random Init | Trained (Step 1,500) | Gain |
|---|---|---|---|
| Top-1 Accuracy (aggregate) | 0.24% | 1.90% | ~8× |
| Top-10 Accuracy (aggregate) | 0.24% | 35.84% | ~149× |
| MRR (aggregate) | 0.0026 | 0.1724 | ~66× |
| MRR (Deep Math) | 0.0084 | 0.186 | 22× |
| Top-10 (Biology) | ~1.3% | ~12% | ~10× |
| Top-10 (Chemistry) | ~1.3% | ~13% | ~10× |

These gains are measured against a randomly initialised model of identical architecture; they reflect what the training curriculum taught, not absolute capability.

Commercial Benchmarks (lm-eval harness)

Standard academic benchmarks are not yet meaningful here. Rabbit was deliberately trained at seq_len=64 as a pure architecture proof, while standard lm-eval prompts run 150–400 tokens, well beyond Rabbit's training context. Raccoon (seq_len=512) removes this constraint entirely.

| Benchmark | Score | Notes |
|---|---|---|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

Roadmap

| Model | Params | seq_len | Status |
|---|---|---|---|
| Rabbit | ~2.7B | 64 | ✅ This model (v0.1 Alpha) |
| Raccoon | ~6.1B | 512 | In training: reasoning curriculum (math ×2, logic ×2) |
| Polar Bear | ~13B | 512 | Planned: STEM + AEVA anti-hallucination layer |

The delta between Rabbit and Raccoon is the story: same pipeline, same hardware philosophy, 8× the context length, and a reasoning-heavy curriculum. Raccoon is intended to be the first Ṛta-SSM model trained end-to-end in India on domestic compute infrastructure to reach standard benchmark competitiveness.

Give us more resources and watch what happens.
