---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---
# Anvaya-Rabbit 2.7B – v0.1 Alpha
Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
and the Gurukul constitutional training protocol. It serves as a technical
proof-of-concept that capable alternative-architecture models can be developed under
severe compute constraints. This is the first model in the Anvaya series:
**Rabbit → Raccoon → Polar Bear**.
## Overview
Rabbit demonstrates three proprietary components developed by RtaForge:
- **Ṛta-SSM** – a custom recurrent state-space architecture with no attention
  or transformer blocks
- **Gurukul** – a proposal-validation training loop in which a Sisya proposes
  weight deltas and a Guru validates them against constitutional constraints
  before they are applied
- **Subsuminator** – cross-architecture weight migration without full retraining,
  enabling efficient curriculum transfer
Trained across a phased curriculum on a single 24GB GPU, Rabbit shows
substantial gains over random initialisation on internal scale-invariant metrics.
It is a deliberate architecture proof at seq_len=64 – not a production model.
For strategic context, IndiaAI alignment, and full programme roadmap, see the
[Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
## Architecture
- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken – recurrent SSM, no attention
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64
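For reference, the specs above can be collected into a single config sketch. `RabbitConfig` and its field names are hypothetical illustrations; the real constructor lives in the private `rtaforge-substrates` package.

```python
from dataclasses import dataclass

@dataclass
class RabbitConfig:
    """Illustrative config restating the published Rabbit specs."""
    n_layers: int = 64        # recurrent SSM blocks, no attention
    d_model: int = 2560       # model width
    d_state: int = 2560       # recurrent state size
    vocab_size: int = 50280   # GPT-NeoX tokenizer
    seq_len: int = 64         # training context length
    dtype: str = "bfloat16"
```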
## Weights
This repository contains the base pretrained checkpoint
(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
Load the imprint weights (base + SFT overlay, recommended for inference):
```python
import torch
from transformers import AutoTokenizer
from white_rabbit.rabbit_model import create_rabbit_model

# Instantiate the 64-layer Fortress Unbroken backbone.
model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken variant
)

# Apply the base + SFT overlay checkpoint.
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

# Rabbit uses the stock GPT-NeoX tokenizer (vocab 50,280).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
> **Requires**: `rtaforge-substrates` (private repository – contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> not compatible with standard HuggingFace `AutoModel`.
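If the returned module behaves like a standard PyTorch causal LM (a forward pass over `input_ids` yielding per-position logits – an assumption, since the Rabbit inference API is private), greedy decoding would look roughly like this:

```python
prompt = "Newton's second law states that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(32):  # stay well inside the seq_len=64 training context
        logits = model(input_ids)  # assumed shape: (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```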
**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) –
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.
## Training Protocol
Two proprietary components make this training regime possible:
**Gurukul** is a constitutional Sisya/Guru proposal-validation loop (sketched in code below):
- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals
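A minimal sketch of this loop under stated assumptions: `Sisya`, `Guru`, `log`, and every method below are hypothetical stand-ins, since the actual Gurukul implementation is private.

```python
import torch

def gurukul_cycle(model, sisya, guru, phase, log):
    """One hypothetical propose -> validate -> apply/log cycle."""
    # 1. The Sisya proposes weight deltas for the current curriculum phase.
    delta = sisya.propose(model, phase)  # {param_name: delta_tensor}

    # 2. The Guru validates the proposal against constitutional constraints.
    verdict = guru.validate(model, delta, phase.constraints)

    # 3. Accepted proposals update the model; rejections are logged for signal.
    if verdict.accepted:
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in delta:
                    param.add_(delta[name])
    log.record(phase=phase.name, accepted=verdict.accepted, reason=verdict.reason)

    # 4. Feedback from this cycle informs the next round of proposals.
    sisya.incorporate_feedback(verdict)
```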
**Subsuminator** enables efficient migration of learned weights across architectures,
supporting curriculum transfer without retraining from scratch.
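The Subsuminator itself is private; a generic sketch of the underlying idea (copying each source tensor into the overlapping slice of its destination counterpart, leaving the remainder at initialisation) might look like this:

```python
import torch

def migrate_weights(src_sd: dict, dst_sd: dict) -> dict:
    """Copy overlapping weight slices from a source state dict into a
    (possibly larger) destination state dict. A generic sketch of
    cross-architecture transfer, not RtaForge's actual code."""
    out = {k: v.clone() for k, v in dst_sd.items()}
    for name, src in src_sd.items():
        if name not in out or src.dim() != out[name].dim():
            continue  # no structural counterpart; skip
        dst = out[name]
        # Overlapping hyper-rectangle between the two shapes.
        slices = tuple(slice(0, min(s, d)) for s, d in zip(src.shape, dst.shape))
        dst[slices].copy_(src[slices])
    return out
```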
Together these components allowed **1,500 accepted Gurukul proposals across 6
phases** to be processed on a single AceCloud L4 (24GB VRAM) in **~7 days of
effective training time** (total elapsed time was higher due to crash recovery
and VRAM-leak debugging).
| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |
**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
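As a point of reference, those hyperparameters map onto the open-source `lion-pytorch` optimizer as follows. The wiring is illustrative, not RtaForge's training code, and the forward pass assumes standard next-token logits.

```python
import torch
import torch.nn.functional as F
from lion_pytorch import Lion  # pip install lion-pytorch

optimizer = Lion(model.parameters(), lr=1e-5)  # lr from the line above

def train_step(batch):
    # batch: LongTensor of shape (3, 64) -> batch_size=3, seq_len=64
    logits = model(batch[:, :-1])  # assumed next-token logits
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```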
## Evaluation
### Internal β Scale-Invariant Metrics
Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
baseline of identical architecture. 50 samples per corpus, seq_len=64.
| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR – Deep Math | 0.0084 | **0.186** | **22×** |
| Top-10 – Biology | ~1.3% | **~12%** | **~10×** |
| Top-10 – Chemistry | ~1.3% | **~13%** | **~10×** |
These gains are measured against a randomly initialised model of identical
architecture – they reflect what the training curriculum taught, not absolute
capability.
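The internal evaluation harness is not published; the sketch below simply restates the standard definitions of Top-K accuracy and Mean Reciprocal Rank over next-token predictions.

```python
import torch

def topk_and_mrr(logits: torch.Tensor, targets: torch.Tensor, k: int = 10):
    """Top-K accuracy and MRR over next-token predictions.

    logits:  (N, vocab) prediction scores per position
    targets: (N,) ground-truth token ids
    """
    # Rank of the true token = 1 + number of tokens scored strictly higher.
    true_scores = logits.gather(1, targets.unsqueeze(1))
    ranks = 1 + (logits > true_scores).sum(dim=1)

    topk_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return topk_acc, mrr
```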
### Commercial Benchmarks (lm-eval harness)
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150–400 tokens, well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.
| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
## Roadmap
| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model – v0.1 Alpha |
| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |
The delta between Rabbit and Raccoon is the story – same pipeline, same hardware
philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
be the first Ṛta-SSM model trained end-to-end in India on domestic compute
infrastructure to reach standard benchmark competitiveness.
**Give us more resources and watch what happens.**
## Related Resources
- [Anvaya Executive Briefing β May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in