RtaForge/Anvaya-Rabbit-2.7B

@@ -14,18 +14,19 @@ base_model: RtaForge/Anvaya-Rabbit-2.7B
 # Anvaya-Rabbit 2.7B — v0.1 Alpha
-**Proof of concept.** Rabbit is the first model in the Anvaya series — a demonstration
-that a fully custom State-Space Model (SSM) architecture can be trained from scratch,
-on a single GPU, without any dependence on attention or transformer building blocks.
 This is not a production model. It is the opening move in a deliberate curriculum:
-**Rabbit → Raccoon → Polar Bear.** The architecture, training protocol, and
-infrastructure are the story. The benchmarks are a baseline.
 ## Architecture
 - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
-- **Parameters**: ~2.78B
 - **Layers**: 64
 - **d_model / d_state**: 2560
 - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
@@ -35,30 +36,35 @@ infrastructure are the story. The benchmarks are a baseline.
 ## Weights
 This repository contains the base pretrained checkpoint
-(`base/Anvaya-Rabbit-2.3B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
-(`imprint/Anvaya-Rabbit-2.3b-0.1-alpha-imprint.pt`).
-Load the imprint weights directly:
 ```python
 from white_rabbit.rabbit_model import create_rabbit_model
 from transformers import AutoTokenizer
 import torch
-model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
-sd = torch.load("imprint/Anvaya-Rabbit-2.3b-0.1-alpha-imprint.pt", map_location="cpu")
 model.load_state_dict(sd, strict=False)
 model.eval()
 tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
 ```
-> **Requires**: `rtaforge-substrates` — this model uses a custom SSM architecture
 > not compatible with standard HuggingFace `AutoModel`.
 ## Training Curriculum
-One epoch, single L4, ~15,000 steps across 8 phases + 1,500-step Scholar Sprint.
 | Phase | Steps | Dataset | Focus |
 |-------|-------|---------|-------|
@@ -89,13 +95,14 @@ baseline of identical architecture. 50 samples per corpus, seq_len=64.
 | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
 These gains are measured against a randomly initialised model of identical
-architecture — they reflect what the training curriculum taught, not absolute capability.
 ### Commercial Benchmarks (lm-eval harness)
 > **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
-> (few-shot examples + question) typically run 150–400 tokens. The scores below reflect
-> inference at context lengths the model was never trained on.
 > Raccoon (seq_len=512) will be evaluated without this constraint.
 | Benchmark | Score | Notes |
@@ -110,8 +117,8 @@ architecture — they reflect what the training curriculum taught, not absolute
 | Model | Params | seq_len | Status |
 |-------|--------|---------|--------|
-| **Rabbit** | 2.7B | 64 | ✅ This model — v0.1 Alpha |
-| **Raccoon** | 2.7B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
 | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
 The delta between Rabbit and Raccoon is the story. One epoch → two epochs,

 # Anvaya-Rabbit 2.7B — v0.1 Alpha
+**The architecture, training protocol, and infrastructure are the story.**
+Rabbit is the first model in the Anvaya series — a proof of concept demonstrating
+that a fully custom State-Space Model (SSM) can be trained from scratch, on a
+single consumer-grade GPU, with no dependence on attention or transformer
+building blocks.
 This is not a production model. It is the opening move in a deliberate curriculum:
+**Rabbit → Raccoon → Polar Bear.** The benchmarks below are a baseline, not a claim.
 ## Architecture
 - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
+- **Parameters**: ~2.7B (post-subsumination)
 - **Layers**: 64
 - **d_model / d_state**: 2560
 - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
 ## Weights
 This repository contains the base pretrained checkpoint
+(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
+(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
+Load the imprint weights (base + SFT overlay, recommended for inference):
 ```python
 from white_rabbit.rabbit_model import create_rabbit_model
 from transformers import AutoTokenizer
 import torch
+model = create_rabbit_model(
+    vocab_size=50280,
+    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
+)
+sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
 model.load_state_dict(sd, strict=False)
 model.eval()
 tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
 ```
+> **Requires**: `rtaforge-substrates` (private repository — contact
+> guha@rtaforge.in for access). This model uses a custom SSM architecture
 > not compatible with standard HuggingFace `AutoModel`.
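The loading snippet passes `strict=False`, which tells PyTorch to skip keys that do not line up instead of raising an error. A minimal sketch of how the result of `load_state_dict` can be inspected to confirm nothing important was skipped. `TinyModel` and `load_and_report` are hypothetical stand-ins introduced for illustration, not part of `white_rabbit` or `rtaforge-substrates`; only `load_state_dict` and its `missing_keys` / `unexpected_keys` fields are standard PyTorch.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the real model; only key matching matters here."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(8, 4)
        self.head = nn.Linear(4, 8)


def load_and_report(model, state_dict):
    """Load non-strictly and return the keys that did not line up."""
    result = model.load_state_dict(state_dict, strict=False)
    return sorted(result.missing_keys), sorted(result.unexpected_keys)


model = TinyModel()
sd = TinyModel().state_dict()
del sd["head.bias"]                  # simulate a parameter absent from the checkpoint
sd["extra.weight"] = torch.zeros(1)  # simulate a stale key left in the checkpoint

missing, unexpected = load_and_report(model, sd)
print("missing:", missing)
print("unexpected:", unexpected)
```

With the real checkpoint, the same two lists could be checked right after `model.load_state_dict(sd, strict=False)`. Any entry in the missing list is a parameter left at its initial value.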
 ## Training Curriculum
+One epoch, single NVIDIA L4, ~15,000 steps across 8 phases + 1,500-step Scholar Sprint.
+Phases 1–5 (pretraining corpus progression) are not shown.
 | Phase | Steps | Dataset | Focus |
 |-------|-------|---------|-------|
 | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
 These gains are measured against a randomly initialised model of identical
+architecture — they reflect what the training curriculum taught, not absolute
+capability.
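The card does not spell out how the Top-10 figures are computed. The sketch below assumes they are next-token top-k accuracy: a position counts as a hit when the true next token is among the k highest-scoring predictions. The gain column is then the ratio of trained to baseline accuracy. The function names and the toy logits are illustrative assumptions, written in plain Python so the arithmetic stays visible.

```python
def top_k_accuracy(logits, targets, k=10):
    """Fraction of positions whose target token id is among the k highest logits."""
    hits = 0
    for row, target in zip(logits, targets):
        # Indices of the k largest scores in this row.
        top_ids = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_ids
    return hits / len(targets)


def improvement_factor(baseline_acc, trained_acc):
    """Ratio of trained accuracy to the randomly initialised baseline."""
    return trained_acc / baseline_acc


# Toy example: 3 positions over a 4-token vocabulary (made-up numbers).
logits = [
    [0.1, 2.0, 0.3, 1.5],
    [1.2, 0.2, 0.9, 0.1],
    [0.0, 0.1, 0.2, 3.0],
]
targets = [3, 1, 3]

acc = top_k_accuracy(logits, targets, k=2)
print(f"top-2 accuracy: {acc:.3f}")

# The Chemistry row above: ~1.3% baseline vs ~13% trained.
print(f"gain: {improvement_factor(0.013, 0.13):.1f}x")
```

Under this reading, the Chemistry row's ~10× is simply 13% divided by 1.3%.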
 ### Commercial Benchmarks (lm-eval harness)
 > **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
+> (few-shot examples + question) typically run 150–400 tokens. The scores below
+> reflect inference at context lengths the model was never trained on.
 > Raccoon (seq_len=512) will be evaluated without this constraint.
 | Benchmark | Score | Notes |
 | Model | Params | seq_len | Status |
 |-------|--------|---------|--------|
+| **Rabbit** | ~2.7B | 64 | ✅ This model — v0.1 Alpha |
+| **Raccoon** | ~2.7B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
 | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
 The delta between Rabbit and Raccoon is the story. One epoch → two epochs,