---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---
# Anvaya-Rabbit 2.7B (v0.1 Alpha)
Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
and the Gurukul constitutional training protocol. It serves as a technical
proof-of-concept that capable alternative-architecture models can be developed under
severe compute constraints. This is the first model in the Anvaya series:
**Rabbit → Raccoon → Polar Bear**.
## Overview
Rabbit demonstrates three proprietary components developed by RtaForge:
- **Ṛta-SSM**: a custom recurrent state-space architecture with no attention
or transformer blocks
- **Gurukul**: a proposal-validation training loop in which a Sisya proposes
weight deltas and a Guru validates them against constitutional constraints before
they are applied
- **Subsuminator**: cross-architecture weight migration without full retraining,
enabling efficient curriculum transfer
Trained across a phased curriculum on a single 24GB cloud GPU, Rabbit shows
substantial gains over random initialisation on internal scale-invariant metrics.
It is a deliberate architecture proof at seq_len=64, not a production model.
For strategic context, IndiaAI alignment, and full programme roadmap, see the
[Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
## Architecture
- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken; recurrent SSM with no attention (generic SSM recurrence sketched below)
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64
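For readers unfamiliar with the family, a state-space layer in its standard discretised form (shown here as generic background, not necessarily the exact Ṛta-SSM parameterisation) replaces attention with a recurrent hidden-state update:

$$
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t + D\,x_t
$$

so per-token compute and memory stay constant in sequence length rather than growing quadratically as with attention.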
## Weights
This repository contains the base pretrained checkpoint
(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
Load the imprint weights (base + SFT overlay, recommended for inference):
```python
import torch
from transformers import AutoTokenizer

from white_rabbit.rabbit_model import create_rabbit_model  # shipped in rtaforge-substrates

# Build the 64-layer Fortress Unbroken backbone and load the SFT imprint checkpoint
model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

# Rabbit uses the GPT-NeoX tokenizer (vocab size 50,280)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
> **Requires**: `rtaforge-substrates` (private repository; contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> that is not compatible with the standard Hugging Face `AutoModel` classes.
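
Once loaded, generation can be driven with an ordinary autoregressive loop. The sketch below is illustrative only: it assumes the forward pass returns next-token logits (or an object exposing `.logits`); check the `rtaforge-substrates` documentation for the actual inference API. `model` and `tokenizer` come from the snippet above.
```python
import torch

# Illustrative greedy decoding; assumes the model returns logits of shape
# (batch, seq_len, vocab_size) or an object with a .logits attribute.
prompt = "The boiling point of water at sea level is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    while input_ids.shape[-1] < 64:            # stay inside the training seq_len
        out = model(input_ids)
        logits = out.logits if hasattr(out, "logits") else out
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```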
**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival),
a patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.
## Training Protocol
Two proprietary components make this training regime possible:
**Gurukul** is a constitutional Sisya/Guru proposal-validation loop, sketched schematically after this list:
- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals
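A schematic rendering of that cycle; `propose_delta`, `check_constitution`, and `apply_delta` are hypothetical stand-ins for the unpublished Gurukul internals, not its actual API:
```python
# Schematic only: the three callables are hypothetical stand-ins, not the
# actual Gurukul API, which is not published in this repository.
def gurukul_phase(model, phase, constitution,
                  propose_delta, check_constitution, apply_delta):
    rejected_log = []
    feedback = None
    for _ in range(phase.num_proposals):
        # Sisya: propose a weight delta for the current curriculum phase
        delta = propose_delta(model, phase, feedback)
        # Guru: validate the proposal against the constitutional constraints
        verdict = check_constitution(model, delta, constitution)
        if verdict.accepted:
            apply_delta(model, delta)      # accepted proposals update the model
        else:
            rejected_log.append(verdict)   # rejections are kept as training signal
        feedback = verdict                 # each cycle informs the next proposal
    return model, rejected_log
```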
**Subsuminator** enables efficient migration of learned weights across architectures,
supporting curriculum transfer without retraining from scratch.
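The Subsuminator itself is not published here. As a rough illustration of what name- and shape-matched weight migration between two checkpoints can look like (a generic technique, not the actual algorithm):
```python
import torch

def migrate_overlapping_weights(src_state_dict, dst_model):
    """Copy the overlapping slice of every same-named parameter.

    Generic illustration of cross-architecture weight transfer; the actual
    Subsuminator procedure is proprietary and may work very differently.
    """
    dst_state = dst_model.state_dict()
    for name, dst_param in dst_state.items():
        src_param = src_state_dict.get(name)
        if src_param is None or src_param.ndim != dst_param.ndim:
            continue
        # copy the region where shapes overlap, leave the rest at init
        region = tuple(slice(0, min(s, d))
                       for s, d in zip(src_param.shape, dst_param.shape))
        dst_param[region].copy_(src_param[region])
    dst_model.load_state_dict(dst_state)
    return dst_model
```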
Together these components allowed **1,500 accepted Gurukul proposals across 6 phases**
to be processed on a single AceCloud L4 (24GB VRAM) in ~7 days of effective training
time (total elapsed time was higher due to crash recovery and VRAM-leak debugging).
| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |
**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
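A hedged sketch of what such an imprint pass can look like. Selecting gate layers by the substring `"gate"` in parameter names, the `lion-pytorch` implementation of Lion, reusing the lr quoted above, and the `sft_loader` are all assumptions for illustration, not the actual imprint tooling:
```python
import torch
import torch.nn.functional as F
from lion_pytorch import Lion  # pip install lion-pytorch (one Lion implementation)

# Freeze everything except gate-layer parameters ("surface-only" fine-tuning).
# The "gate" substring is an assumed naming convention, not a documented one.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

optimizer = Lion((p for p in model.parameters() if p.requires_grad),
                 lr=1e-5, weight_decay=0.0)

model.train()
for epoch in range(3):                         # 3 epochs over the 65 SFT examples
    for input_ids in sft_loader:               # sft_loader: hypothetical DataLoader of token ids
        out = model(input_ids)
        logits = out.logits if hasattr(out, "logits") else out
        # standard next-token cross-entropy at seq_len=64
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                               input_ids[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```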
## Evaluation
### Internal: Scale-Invariant Metrics
Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
baseline of identical architecture. 50 samples per corpus, seq_len=64.
| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR (Deep Math) | 0.0084 | **0.186** | **22×** |
| Top-10 (Biology) | ~1.3% | **~12%** | **~10×** |
| Top-10 (Chemistry) | ~1.3% | **~13%** | **~10×** |
These gains are measured against a randomly initialised model of identical
architecture; they reflect what the training curriculum taught, not absolute
capability.
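For reference, next-token Top-K accuracy and MRR can be computed generically as below; this is a standard formulation, and the internal evaluation harness may differ in the details of sampling and aggregation.
```python
import torch

def topk_accuracy_and_mrr(logits, targets, k=10):
    """Next-token Top-K accuracy and Mean Reciprocal Rank.

    logits:  (N, vocab_size) model predictions at N evaluated positions
    targets: (N,) ground-truth next-token ids
    """
    # 1-indexed rank of the true token in the descending-sorted predictions
    sorted_ids = logits.argsort(dim=-1, descending=True)
    ranks = (sorted_ids == targets.unsqueeze(-1)).float().argmax(dim=-1) + 1
    topk_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return topk_acc, mrr
```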
### Commercial Benchmarks (lm-eval harness)
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150-400 tokens, well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.
| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
## Roadmap
| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model (v0.1 Alpha) |
| **Raccoon** | ~6.1B | 512 | In training: reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned: STEM + AEVA anti-hallucination layer |
The delta between Rabbit and Raccoon is the story: same pipeline, same hardware
philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
be the first Ṛta-SSM model trained end-to-end in India on domestic compute
infrastructure to reach standard benchmark competitiveness.
**Give us more resources and watch what happens.**
## Related Resources
- [Anvaya Executive Briefing (May 2026)](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in