---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---
# Anvaya-Rabbit 2.7B – v0.1 Alpha
Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
and the Gurukul constitutional training protocol. It serves as a technical
proof-of-concept that capable alternative-architecture models can be developed under
severe compute constraints. This is the first model in the Anvaya series:
**Rabbit → Raccoon → Polar Bear**.
## Overview
Rabbit demonstrates three proprietary components developed by RtaForge:
- **Ṛta-SSM** – a custom recurrent state-space architecture with no attention
  or transformer blocks
- **Gurukul** – a proposal-validation training loop in which a Sisya proposes
  weight deltas and a Guru validates them against constitutional constraints
  before they are applied
- **Subsuminator** – cross-architecture weight migration without full retraining,
  enabling efficient curriculum transfer
Trained across a phased curriculum on a single 24GB GPU, Rabbit shows
substantial gains over random initialisation on internal scale-invariant metrics.
It is a deliberate architecture proof at seq_len=64 – not a production model.
For strategic context, IndiaAI alignment, and full programme roadmap, see the
[Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
## Architecture
- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken – recurrent SSM, no attention
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64
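For reference, the specs above can be collected into a single config sketch. `RabbitConfig` and its field names are hypothetical illustrations; the real constructor lives in the private `rtaforge-substrates` package.

```python
from dataclasses import dataclass

@dataclass
class RabbitConfig:
    """Illustrative config restating the published Rabbit specs."""
    n_layers: int = 64        # recurrent SSM blocks, no attention
    d_model: int = 2560       # model width
    d_state: int = 2560       # recurrent state size
    vocab_size: int = 50280   # GPT-NeoX tokenizer
    seq_len: int = 64         # training context length
    dtype: str = "bfloat16"
```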
## Weights
This repository contains the base pretrained checkpoint
(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
Load the imprint weights (base + SFT overlay, recommended for inference):
```python
import torch
from transformers import AutoTokenizer
from white_rabbit.rabbit_model import create_rabbit_model

# Instantiate the 64-layer Fortress Unbroken backbone.
model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken variant
)

# Apply the base + SFT overlay checkpoint.
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

# Rabbit uses the stock GPT-NeoX tokenizer (vocab 50,280).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
> **Requires**: `rtaforge-substrates` (private repository – contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> not compatible with standard HuggingFace `AutoModel`.
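If the returned module behaves like a standard PyTorch causal LM (a forward pass over `input_ids` yielding per-position logits – an assumption, since the Rabbit inference API is private), greedy decoding would look roughly like this:

```python
prompt = "Newton's second law states that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(32):  # stay well inside the seq_len=64 training context
        logits = model(input_ids)  # assumed shape: (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```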
**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) –
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.
## Training Protocol
Two proprietary components make this training regime possible:
**Gurukul** is a constitutional Sisya/Guru proposal-validation loop (sketched in code below):
- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals
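A minimal sketch of this loop under stated assumptions: `Sisya`, `Guru`, `log`, and every method below are hypothetical stand-ins, since the actual Gurukul implementation is private.

```python
import torch

def gurukul_cycle(model, sisya, guru, phase, log):
    """One hypothetical propose -> validate -> apply/log cycle."""
    # 1. The Sisya proposes weight deltas for the current curriculum phase.
    delta = sisya.propose(model, phase)  # {param_name: delta_tensor}

    # 2. The Guru validates the proposal against constitutional constraints.
    verdict = guru.validate(model, delta, phase.constraints)

    # 3. Accepted proposals update the model; rejections are logged for signal.
    if verdict.accepted:
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in delta:
                    param.add_(delta[name])
    log.record(phase=phase.name, accepted=verdict.accepted, reason=verdict.reason)

    # 4. Feedback from this cycle informs the next round of proposals.
    sisya.incorporate_feedback(verdict)
```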
**Subsuminator** enables efficient migration of learned weights across architectures,
supporting curriculum transfer without retraining from scratch.
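The Subsuminator itself is private; a generic sketch of the underlying idea (copying each source tensor into the overlapping slice of its destination counterpart, leaving the remainder at initialisation) might look like this:

```python
import torch

def migrate_weights(src_sd: dict, dst_sd: dict) -> dict:
    """Copy overlapping weight slices from a source state dict into a
    (possibly larger) destination state dict. A generic sketch of
    cross-architecture transfer, not RtaForge's actual code."""
    out = {k: v.clone() for k, v in dst_sd.items()}
    for name, src in src_sd.items():
        if name not in out or src.dim() != out[name].dim():
            continue  # no structural counterpart; skip
        dst = out[name]
        # Overlapping hyper-rectangle between the two shapes.
        slices = tuple(slice(0, min(s, d)) for s, d in zip(src.shape, dst.shape))
        dst[slices].copy_(src[slices])
    return out
```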
Together these components allowed **1,500 accepted Gurukul proposals across 6
phases** to be processed on a single AceCloud L4 (24GB VRAM) in **~7 days of
effective training time** (total elapsed time was higher due to crash recovery
and VRAM-leak debugging).
| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |
**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
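As a point of reference, those hyperparameters map onto the open-source `lion-pytorch` optimizer as follows. The wiring is illustrative, not RtaForge's training code, and the forward pass assumes standard next-token logits.

```python
import torch
import torch.nn.functional as F
from lion_pytorch import Lion  # pip install lion-pytorch

optimizer = Lion(model.parameters(), lr=1e-5)  # lr from the line above

def train_step(batch):
    # batch: LongTensor of shape (3, 64) -> batch_size=3, seq_len=64
    logits = model(batch[:, :-1])  # assumed next-token logits
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```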
## Evaluation
### Internal β Scale-Invariant Metrics
Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
baseline of identical architecture. 50 samples per corpus, seq_len=64.
| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR – Deep Math | 0.0084 | **0.186** | **22×** |
| Top-10 – Biology | ~1.3% | **~12%** | **~10×** |
| Top-10 – Chemistry | ~1.3% | **~13%** | **~10×** |
These gains are measured against a randomly initialised model of identical
architecture – they reflect what the training curriculum taught, not absolute
capability.
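The internal evaluation harness is not published; the sketch below simply restates the standard definitions of Top-K accuracy and Mean Reciprocal Rank over next-token predictions.

```python
import torch

def topk_and_mrr(logits: torch.Tensor, targets: torch.Tensor, k: int = 10):
    """Top-K accuracy and MRR over next-token predictions.

    logits:  (N, vocab) prediction scores per position
    targets: (N,) ground-truth token ids
    """
    # Rank of the true token = 1 + number of tokens scored strictly higher.
    true_scores = logits.gather(1, targets.unsqueeze(1))
    ranks = 1 + (logits > true_scores).sum(dim=1)

    topk_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return topk_acc, mrr
```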
### Commercial Benchmarks (lm-eval harness)
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150–400 tokens, well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.
| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
## Roadmap
| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model – v0.1 Alpha |
| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |
The delta between Rabbit and Raccoon is the story – same pipeline, same hardware
philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
be the first Ṛta-SSM model trained end-to-end in India on domestic compute
infrastructure to reach standard benchmark competitiveness.
**Give us more resources and watch what happens.**
## Related Resources
- [Anvaya Executive Briefing β May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in