---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---

# Anvaya-Rabbit 2.7B — v0.1 Alpha

Rabbit is a 2.7B-parameter recurrent state-space model (Ṛta-SSM) trained entirely from scratch on a single NVIDIA L4 GPU, using a custom non-transformer architecture and the Gurukul constitutional training protocol. It serves as a technical proof-of-concept that capable alternative-architecture models can be developed under severe compute constraints.

This is the first model in the Anvaya series: **Rabbit → Raccoon → Polar Bear**.

## Overview

Rabbit demonstrates three proprietary components developed by RtaForge:

- **Ṛta-SSM** — a custom recurrent state-space architecture with no attention or transformer blocks
- **Gurukul** — a proposal-validation training loop in which a Sisya proposes weight deltas and a Guru validates them against constitutional constraints before applying them
- **Subsuminator** — cross-architecture weight migration without full retraining, enabling efficient curriculum transfer

Trained across a phased curriculum on a single 24 GB GPU, Rabbit shows substantial gains over random initialisation on internal scale-invariant metrics. It is a deliberate architecture proof at seq_len=64 — not a production model.

For strategic context, IndiaAI alignment, and the full programme roadmap, see the [Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).

## Architecture

- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64

## Weights

This repository contains the base pretrained checkpoint (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).

Load the imprint weights (base + SFT overlay, recommended for inference):

```python
from white_rabbit.rabbit_model import create_rabbit_model
from transformers import AutoTokenizer
import torch

model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```

> **Requires**: `rtaforge-substrates` (private repository — contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> not compatible with standard HuggingFace `AutoModel`.

**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) — patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with fused SSM recurrence kernels. MIT licensed.
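Once the checkpoint and tokenizer are loaded as above, the snippet below sketches a minimal greedy-decoding smoke test. It assumes the object returned by `create_rabbit_model` behaves like a standard `torch.nn.Module` that maps token ids of shape `(batch, seq)` to next-token logits of shape `(batch, seq, vocab)`; the actual White Rabbit inference API may expose a dedicated generation path instead, so treat this as an illustrative sketch rather than reference usage.

```python
import torch

prompt = "The boiling point of water at sea level is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(16):                                     # generate 16 tokens greedily
        out = model(input_ids[:, -64:])                     # stay within the training seq_len of 64
        logits = out[0] if isinstance(out, tuple) else out  # some models also return recurrent state
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Given the seq_len=64 training context, keep prompts short; output quality is expected to degrade quickly on longer inputs.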
## Training Protocol

Two proprietary components make this training regime possible:

**Gurukul** is a constitutional Sisya/Guru proposal-validation loop:

- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals

**Subsuminator** enables efficient migration of learned weights across architectures, supporting curriculum transfer without retraining from scratch.

Together these components made the full regime tractable on minimal hardware: **1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24 GB VRAM), in ~7 days of effective training time (total elapsed time was higher due to crash recovery and VRAM-leak debugging).**

| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |

**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5. The SFT imprint was applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).

## Evaluation

### Internal — Scale-Invariant Metrics

Evaluated using Top-K accuracy and Mean Reciprocal Rank (MRR) against a randomly initialised baseline of identical architecture, with 50 samples per corpus at seq_len=64.

| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR — Deep Math | 0.0084 | **0.186** | **~22×** |
| Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
| Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |

These gains are measured against a randomly initialised model of identical architecture — they reflect what the training curriculum taught, not absolute capability.
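For readers unfamiliar with these metrics, the sketch below shows one straightforward way to compute per-position Top-K accuracy and MRR from next-token logits. It is illustrative only: the function name, tensor shapes, and toy data are assumptions, not RtaForge's internal evaluation harness.

```python
import torch

def topk_accuracy_and_mrr(logits: torch.Tensor, targets: torch.Tensor, k: int = 10):
    """Top-K accuracy and Mean Reciprocal Rank for next-token prediction.

    logits:  (num_positions, vocab_size) scores for each evaluated position
    targets: (num_positions,) ground-truth next-token ids
    """
    target_scores = logits.gather(1, targets.unsqueeze(1))   # (N, 1) score of the true token
    ranks = (logits > target_scores).sum(dim=1) + 1          # rank of the true token, 1 = best
    top_k_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return top_k_acc, mrr

# Toy example with random scores over the 50,280-token vocabulary
logits = torch.randn(50, 50280)
targets = torch.randint(0, 50280, (50,))
top10, mrr = topk_accuracy_and_mrr(logits, targets, k=10)
```

A randomly initialised model lands near chance on both numbers, which is why the table above reports gains as multiples over that baseline rather than as absolute capability scores.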
### Commercial Benchmarks (lm-eval harness)

> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150–400 tokens — well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.

| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

## Roadmap

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model — v0.1 Alpha |
| **Raccoon** | ~6.1B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |

The delta between Rabbit and Raccoon is the story — same pipeline, same hardware philosophy, 8× the context length, and a reasoning-heavy curriculum. Raccoon is intended to be the first Ṛta-SSM model trained end-to-end in India on domestic compute infrastructure to reach standard benchmark competitiveness.

**Give us more resources and watch what happens.**

## Related Resources

- [Anvaya Executive Briefing — May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in