---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---

# Anvaya-Rabbit 2.7B – v0.1 Alpha

Rabbit is a 2.7B-parameter recurrent state-space model (Ṛta-SSM) trained entirely
from scratch on a single NVIDIA L4 GPU, using a custom non-transformer architecture
and the Gurukul constitutional training protocol. It serves as a technical
proof of concept that capable alternative-architecture models can be developed under
severe compute constraints. This is the first model in the Anvaya series:
**Rabbit → Raccoon → Polar Bear**.

## Overview

Rabbit demonstrates three proprietary components developed by RtaForge:

- **Ṛta-SSM** – a custom recurrent state-space architecture with no attention
  or transformer blocks
- **Gurukul** – a proposal-validation training loop in which a Sisya proposes
  weight deltas and a Guru validates them against constitutional constraints before
  they are applied
- **Subsuminator** – cross-architecture weight migration without full retraining,
  enabling efficient curriculum transfer

Trained across a phased curriculum on a single 24 GB GPU, Rabbit shows
substantial gains over random initialisation on internal scale-invariant metrics.
It is a deliberate architecture proof at seq_len=64, not a production model.

For strategic context, IndiaAI alignment, and the full programme roadmap, see the
[Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).

## Architecture

- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken – recurrent SSM, no attention (see the illustrative recurrence sketch below)
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64
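
The Ṛta-SSM block itself is not public. As a rough, hypothetical illustration of what a
recurrent state-space layer does in place of attention, here is a textbook diagonal SSM
recurrence; the shapes and parameterisation below are generic assumptions, not the Rabbit
implementation:

```python
# Illustrative only: a generic diagonal state-space recurrence of the kind
# Ṛta-SSM builds on. Shapes and parameterisation are assumptions, not the
# actual Rabbit block.
import torch

def ssm_layer(x, A, B, C):
    """x: (batch, seq, d_model); A, B, C: (d_model, d_state)."""
    batch, seq, d_model = x.shape
    d_state = A.shape[-1]
    h = torch.zeros(batch, d_model, d_state, dtype=x.dtype, device=x.device)
    ys = []
    for t in range(seq):                      # recurrence over time, no attention
        u = x[:, t, :].unsqueeze(-1)          # (batch, d_model, 1)
        h = torch.sigmoid(A) * h + B * u      # state update: h_t = a * h_{t-1} + B * x_t
        ys.append((h * C).sum(-1))            # readout: y_t = <C, h_t>
    return torch.stack(ys, dim=1)             # (batch, seq, d_model)

y = ssm_layer(torch.randn(1, 8, 16), *(torch.randn(16, 4) for _ in range(3)))
```

The point of the sketch is the contrast with attention: per-token cost is constant in
sequence length, and all context is carried in the recurrent state `h`.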

## Weights

This repository contains the base pretrained checkpoint
(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).

Load the imprint weights (base + SFT overlay, recommended for inference):
```python
from white_rabbit.rabbit_model import create_rabbit_model
from transformers import AutoTokenizer
import torch

model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```

> **Requires**: `rtaforge-substrates` (private repository – contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> not compatible with standard HuggingFace `AutoModel`.
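
Continuing from the loading snippet above, a minimal greedy-decoding loop might look like
the sketch below. It assumes that calling the model on `input_ids` returns next-token
logits of shape `(batch, seq, vocab)`; the actual `white_rabbit` generation API may differ.

```python
# Hypothetical greedy-decoding sketch. Assumes model(input_ids) returns logits
# of shape (batch, seq, vocab); adapt to the actual white_rabbit API.
prompt = "The three laws of motion state that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    while input_ids.shape[-1] < 64:              # stay within the training seq_len
        logits = model(input_ids)                # assumed output: (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```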

**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) –
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.

## Training Protocol

Two proprietary components make this training regime possible:

**Gurukul** is a constitutional Sisya/Guru proposal-validation loop (a minimal sketch of
the control flow follows this list):
- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals
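
The Gurukul implementation is proprietary; the toy loop below only illustrates the
propose-validate-apply control flow described above. The dict-of-tensors parameters, the
norm constraint, and the random deltas are hypothetical stand-ins, not the Sisya/Guru
internals.

```python
# Toy illustration of a Gurukul-style proposal-validation loop. All specifics
# here are invented stand-ins for the proprietary implementation.
import torch

params = {"gate.weight": torch.zeros(4, 4)}   # stand-in for the real model parameters
MAX_DELTA_NORM = 0.5                          # a toy "constitutional" constraint
rejected = []

for step in range(10):
    # Sisya: propose a weight delta for the current curriculum phase
    delta = {k: 0.1 * torch.randn_like(v) for k, v in params.items()}

    # Guru: validate the proposal against the constraint set
    accepted = all(d.norm() <= MAX_DELTA_NORM for d in delta.values())

    if accepted:                              # accepted proposals update the model
        for k in params:
            params[k] = params[k] + delta[k]
    else:                                     # rejected proposals are logged for signal
        rejected.append(step)

print(f"accepted {10 - len(rejected)} of 10 proposals")
```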

**Subsuminator** enables efficient migration of learned weights across architectures,
supporting curriculum transfer without retraining from scratch.
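
The Subsuminator itself is not published. As a loose analogy only, cross-architecture
migration can be pictured as carrying over whichever tensors still match by name and
shape and leaving the rest freshly initialised; the helper below is a hypothetical
sketch of that idea, not the RtaForge implementation.

```python
# Hypothetical sketch of "copy what still fits" weight migration. The real
# Subsuminator is proprietary and more involved than this.
import torch

def subsume(old_state: dict, new_model: torch.nn.Module) -> dict:
    new_state = new_model.state_dict()
    migrated = 0
    for name, tensor in old_state.items():
        if name in new_state and new_state[name].shape == tensor.shape:
            new_state[name] = tensor.clone()   # carry the learned weight over
            migrated += 1                      # everything else keeps its fresh init
    print(f"migrated {migrated}/{len(new_state)} tensors")
    return new_state

# usage: new_model.load_state_dict(subsume(torch.load("old.pt"), new_model))
```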

Together, these components allowed 1,500 accepted Gurukul proposals across 6 phases to be
processed on a single AceCloud L4 (24 GB VRAM) in roughly 7 days of effective training
time; total elapsed time was higher due to crash recovery and VRAM-leak debugging.

| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |

**Final checkpoint: step 1,500.** Training used seq_len=64, batch_size=3, the Lion optimizer, and lr=1e-5.

The SFT imprint was applied with surface-only gate-layer fine-tuning (65 examples, 3 epochs).
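
"Surface-only gate-layer fine-tuning" is an internal recipe. Read loosely, it amounts to
freezing everything except the gating layers before running SFT; the parameter-name
filter below (matching `"gate"`) is an assumption about the `white_rabbit` module layout,
not the actual imprint code, and `model` is the instance loaded in the Weights section.

```python
# Hypothetical sketch: train only parameters whose names mark them as gate
# layers; the "gate" naming convention is an assumption.
trainable = 0
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name      # gate layers stay trainable
    if param.requires_grad:
        trainable += param.numel()

print(f"trainable parameters: {trainable:,}")
# ...then run an ordinary SFT loop (here: 65 examples, 3 epochs) over these parameters.
```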

## Evaluation

### Internal – Scale-Invariant Metrics

Evaluated with Top-K accuracy and Mean Reciprocal Rank (MRR) against a randomly
initialised baseline of identical architecture, using 50 samples per corpus at seq_len=64.
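
For reference, the two metrics reported in the table below can be computed per
next-token prediction as follows; this is a generic sketch of the definitions, not the
internal evaluation harness or its exact aggregation.

```python
# Generic next-token Top-K accuracy and MRR. logits: (N, vocab); targets: (N,).
import torch

def topk_accuracy(logits, targets, k=10):
    """Fraction of examples whose true token is inside the top-k predictions."""
    topk = logits.topk(k, dim=-1).indices                  # (N, k)
    return (topk == targets.unsqueeze(-1)).any(-1).float().mean().item()

def mean_reciprocal_rank(logits, targets):
    """Mean of 1 / rank of the true token under the predicted scores."""
    true_logit = logits.gather(-1, targets.unsqueeze(-1))  # (N, 1)
    ranks = (logits > true_logit).sum(-1) + 1              # 1-indexed rank
    return (1.0 / ranks.float()).mean().item()
```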

| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR – Deep Math | 0.0084 | **0.186** | **22×** |
| Top-10 – Biology | ~1.3% | **~12%** | **~10×** |
| Top-10 – Chemistry | ~1.3% | **~13%** | **~10×** |

These gains are measured against a randomly initialised model of identical
architecture; they reflect what the training curriculum taught, not absolute
capability.

### Commercial Benchmarks (lm-eval harness)

> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150–400 tokens, well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.

| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

## Roadmap

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model – v0.1 Alpha |
| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |

The delta between Rabbit and Raccoon is the story: same pipeline, same hardware
philosophy, 8× the context length, and a reasoning-heavy curriculum. Raccoon is intended
to be the first Ṛta-SSM model trained end-to-end in India on domestic compute
infrastructure to reach standard benchmark competitiveness.

**Give us more resources and watch what happens.**

## Related Resources

- [Anvaya Executive Briefing – May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context and IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in