docs: link executive briefing PDF in model card
README.md
CHANGED
@@ -14,15 +14,31 @@ base_model: RtaForge/Anvaya-Rabbit-2.7B
# Anvaya-Rabbit 2.7B – v0.1 Alpha
-

-
- that a fully custom State-Space Model (SSM) can be trained from scratch, on a
- single consumer-grade GPU, with no dependence on attention or transformer
- building blocks.

-
-
## Architecture
@@ -66,17 +82,21 @@ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.

- ## Training

Two proprietary components make this training regime possible:

-
-
-
-
-

- Together

**1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
~7 days effective training time (total elapsed higher due to crash recovery and VRAM
@@ -93,10 +113,9 @@ leak debugging).**
**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.

- SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs)
- trained with the Anvaya Gurukul protocol.

- ## Evaluation

### Internal – Scale-Invariant Metrics
@@ -120,9 +139,8 @@ capability.
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
- > lm-eval prompts
- >
- > constraint entirely.

| Benchmark | Score | Notes |
|-----------|-------|-------|
@@ -132,7 +150,7 @@ capability.
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

- ##

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
@@ -140,6 +158,15 @@ capability.
| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |

- The delta between Rabbit and Raccoon is the story
-

**Give us more resources and watch what happens.**
# Anvaya-Rabbit 2.7B – v0.1 Alpha

+ Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
+ from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
+ and the Gurukul constitutional training protocol. It serves as a technical
+ proof-of-concept that capable alternative-architecture models can be developed under
+ severe compute constraints. This is the first model in the Anvaya series:
+ **Rabbit → Raccoon → Polar Bear**.

+ ## Overview

+ Rabbit demonstrates three proprietary components developed by RtaForge:
+
+ - **Ṛta-SSM** – a custom recurrent state-space architecture with no attention
+   or transformer blocks (see the recurrence sketch after this list)
+ - **Gurukul** – a proposal-validation training loop in which a Sisya proposes
+   weight deltas and a Guru validates them against constitutional constraints
+   before applying them
+ - **Subsuminator** – cross-architecture weight migration without full retraining,
+   enabling efficient curriculum transfer
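
The Ṛta-SSM cell itself is not specified in this card. Purely as orientation, the sketch below shows the generic discrete state-space recurrence that SSM-family layers build on; every name and dimension here is a hypothetical stand-in, not the proprietary architecture.

```python
import torch

# Generic SSM recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Hypothetical stand-in only -- the real Rta-SSM cell, its gating, and the
# fused HIP kernels mentioned later in this card are proprietary.
def ssm_scan(x, A, B, C):
    """x: (seq_len, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)."""
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                # O(seq_len) recurrence, no attention matrix
        h = A @ h + B @ x_t      # the state h carries all prior context
        ys.append(C @ h)         # per-step readout
    return torch.stack(ys)

x = torch.randn(64, 32)          # seq_len=64, Rabbit's training context
y = ssm_scan(x, 0.9 * torch.eye(16), 0.1 * torch.randn(16, 32), torch.randn(8, 16))
print(y.shape)                   # torch.Size([64, 8])
```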
+
+ Trained across a phased curriculum on a single consumer GPU, Rabbit shows
+ substantial gains over random initialisation on internal scale-invariant metrics.
+ It is a deliberate architecture proof at seq_len=64 – not a production model.
+
+ For strategic context, IndiaAI alignment, and the full programme roadmap, see the
+ [Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
## Architecture
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.

+ ## Training Protocol

Two proprietary components make this training regime possible:

+ **Gurukul** is a constitutional Sisya/Guru proposal-validation loop (sketched
+ in code after the list below):
+ - The Sisya proposes weight deltas based on the current curriculum phase
+ - The Guru validates each proposal against a set of constitutional constraints
+ - Accepted proposals update the model; rejected proposals are logged for signal
+ - Feedback from each cycle informs the next round of proposals
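
The Sisya proposer, Guru validator, and the constitutional rules are proprietary, so the following is only a toy sketch of the control flow listed above; `propose_delta` and `guru_accepts` are invented stand-ins.

```python
import random

# Toy Gurukul loop: propose -> validate -> apply or log.
def propose_delta(weights):
    """Sisya stand-in: propose a small perturbation for the current phase."""
    return [random.gauss(0.0, 0.01) for _ in weights]

def guru_accepts(weights, delta):
    """Guru stand-in: a single 'constitutional' bound on the updated weights."""
    return all(abs(w + d) < 1.0 for w, d in zip(weights, delta))

weights = [0.0] * 8
accepted, rejected_log = 0, []
while accepted < 20:                 # Rabbit ran 1,500 accepted proposals
    delta = propose_delta(weights)
    if guru_accepts(weights, delta):
        weights = [w + d for w, d in zip(weights, delta)]
        accepted += 1                # accepted proposals update the model
    else:
        rejected_log.append(delta)   # rejected proposals are logged for signal

print(f"accepted={accepted}, rejected={len(rejected_log)}")
```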
+
+ **Subsuminator** enables efficient migration of learned weights across architectures,
+ supporting curriculum transfer without retraining from scratch.
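
The Subsuminator algorithm is not described in this card either. For contrast only, this sketch shows the naive baseline such a tool presumably improves on: copying whatever tensors match by name and shape between two different architectures. Everything in it is hypothetical.

```python
from torch import nn

# Naive cross-architecture weight transfer (hypothetical baseline, not the
# Subsuminator): copy tensors whose names and shapes match, leave the rest.
def migrate(src: nn.Module, dst: nn.Module) -> int:
    src_sd, dst_sd = src.state_dict(), dst.state_dict()
    moved = 0
    for name, tensor in dst_sd.items():
        if name in src_sd and src_sd[name].shape == tensor.shape:
            dst_sd[name] = src_sd[name].clone()   # reuse the learned tensor
            moved += 1
    dst.load_state_dict(dst_sd)
    return moved

small = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 16))
large = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 64))
print(migrate(small, large), "tensors migrated")  # first layer matches, last does not
```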
+
+ Together these components allowed 1,500 accepted proposals across 6 phases to be
+ processed in ~7 effective days on a single 24GB GPU.

**1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
~7 days effective training time (total elapsed higher due to crash recovery and VRAM

**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.

+ SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
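
The card does not define "surface-only gate-layer fine-tuning" further. A minimal sketch, assuming it means freezing everything except gate parameters and training with the stated hyperparameters (Lion, lr=1e-5, batch_size=3, 3 epochs); `TinyCell` and the gate-name filter are invented, and `Lion` is taken from the third-party `lion-pytorch` package since the card does not name an implementation.

```python
import torch
from torch import nn
from lion_pytorch import Lion  # pip install lion-pytorch

# Hypothetical "surface-only gate-layer" SFT: freeze all parameters except
# the gate, then run a short supervised pass. TinyCell is an invented stand-in.
class TinyCell(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.core = nn.Linear(d, d)   # stand-in for the frozen SSM body
        self.gate = nn.Linear(d, d)   # surface gate layer, the only part tuned
    def forward(self, x):
        return torch.sigmoid(self.gate(x)) * self.core(x)

model = TinyCell()
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("gate")     # surface-only imprint

opt = Lion([p for p in model.parameters() if p.requires_grad], lr=1e-5)
data = [(torch.randn(3, 16), torch.randn(3, 16)) for _ in range(22)]  # ~65 examples

for epoch in range(3):                            # 3 epochs, as in the card
    for x, y in data:
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()
```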

+ ## Evaluation

### Internal – Scale-Invariant Metrics
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
+ > lm-eval prompts run 150–400 tokens – well beyond Rabbit's training context.
+ > Raccoon (seq_len=512) removes this constraint entirely.
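
To make the length mismatch concrete, here is a quick count with the GPT-NeoX tokenizer this card loads elsewhere; the sample prompt is invented but typical of multiple-choice eval formats.

```python
from transformers import AutoTokenizer

# Count tokens in an invented multiple-choice style prompt and compare with
# Rabbit's seq_len=64 training context.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = (
    "Question: Which effect best explains why the sky appears blue?\n"
    "A. Rayleigh scattering\nB. Ozone absorption\nC. Mie scattering\nD. Refraction\n"
    "Answer: A\n\n"
    "Question: A train covers 60 km in 45 minutes. What is its speed in km/h?\n"
    "A. 70\nB. 75\nC. 80\nD. 90\nAnswer:"
)
n = len(tokenizer(prompt)["input_ids"])
print(n, "tokens -", "exceeds" if n > 64 else "fits within", "seq_len=64")
```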

| Benchmark | Score | Notes |
|-----------|-------|-------|

| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

+ ## Roadmap

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|

| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |

+ The delta between Rabbit and Raccoon is the story – same pipeline, same hardware
+ philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
+ be the first Ṛta-SSM model trained end-to-end in India on domestic compute
+ infrastructure to reach standard benchmark competitiveness.
+

**Give us more resources and watch what happens.**

+
+ ## Related Resources
+
+ - [Anvaya Executive Briefing – May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
+ - Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
+ - Technical inquiries: guha@rtaforge.in