Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,196 +1,58 @@
---
license: cc-by-nc-sa-4.0
language:
- en
tags:
- ssm
- state-space-model
- mamba
- causal-lm
- rtaforge
---

# Rabbit-RtaSSM 2.7B (Proof of Concept)

> Commercial licensing available — contact guha@rtaforge.in

---

## ⚠️ This is a Proof of Concept

**Rabbit is not a finished product. It is not meant to be.**

This is the first public model in the Anvaya family — a single-epoch run on a single NVIDIA L4 GPU, trained to validate the architecture, the training pipeline, and the weight subsumination technique. It is a flag planted, not a summit reached.

What this model demonstrates:

- The **Durga fu-64** SSM architecture trains and converges
- **Weight subsumination** from Mamba2 works (patent pending)
- The **Gurukul** constitutional training framework functions at scale
- A 2.6B SSM can learn meaningful representations on a single L4 in one epoch

What this model is not:

- A competitor to GPT-4, Claude, or Gemini
- A production-ready assistant
- The best we can do — not even close

**Raccoon (6.1B, seq_len=512, reasoning-heavy curriculum) and Polar Bear are in training.**
The benchmark story gets told there.

---

## Model Lineage

```
Mamba2 2.7B
  │
  └─▶ Rabbit-RtaSSM 2.7B (weight subsumination — patent pending)
        │
        ├─▶ base/    ← 1,500-step trained base model
        │            Fine-tuned on: OpenOrca · Cosmopedia · LogiQA · ARC-Challenge ·
        │            GSM8K · MetaMathQA · SciQ · Python instructions ·
        │            Glaive function-calling · Glaive alignment
        │
        └─▶ imprint/ ← base + Rabbit personality SFT
```

**Weight Subsumination** is a proprietary RtaForge technique for transplanting learned
representations from a source architecture into a structurally distinct target model.
*Patent pending — technique details not disclosed.*

---

## Architecture

| Parameter | Value |
|-----------|-------|
| Training seq length | 64 |
| Optimizer | Lion (lr 1e-5) |
| Training hardware | Single NVIDIA L4 (24GB) |
| Training framework | Gurukul Phase 2 Hardened |

---

## Training Curriculum

One epoch, single L4, ~15,000 steps across 8 phases + a 1,500-step Scholar Sprint.

| Phase | Steps | Dataset | Focus |
|-------|-------|---------|-------|
| 0 | 1,500 | OpenOrca + Cosmopedia | General warmup |
| 1 | 3,000 | LogiQA + ARC-Challenge | Logic & reasoning |
| 2 | 2,500 | GSM8K + MetaMathQA | Mathematics |
| 3 | 2,000 | SciQ | Science / STEM |
| 4 | 1,500 | Python instructions | Coding |
| 5 | 1,000 | Glaive function-calling | Tool use |
| 6 | 2,000 | Glaive alignment | Alignment |
| 7 | 1,500 | Glaive alignment | Alignment |

Final Scholar Sprint: 1,500 steps, Phase 5 saturation (Logic Giants corpus).
**Final checkpoint: Step 1,500.**

---

## Evaluation Results (Step 1,500)

### Internal — Scale-Invariant Metrics

50 samples per corpus, seq_len=64.

| Metric | Random init | Rabbit (Step 1,500) | Gain |
|--------|-------------|---------------------|------|
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR — Deep Math | 0.0084 | **0.186** | **22×** |
| Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
| Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |

These gains are measured against a randomly initialised model of identical architecture —
they reflect what the training curriculum taught, not absolute capability.
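
For concreteness: Top-10 accuracy is the fraction of positions whose gold next token ranks within the model's ten highest-scoring candidates, and MRR averages the reciprocal of that rank. A minimal sketch in plain PyTorch follows; the internal evaluation harness is not published, so the function name and tensor layout below are assumptions.

```python
import torch

def topk_accuracy_and_mrr(logits, targets, k=10):
    """Scale-invariant next-token metrics (illustrative sketch).

    logits:  (batch, seq_len, vocab) raw model outputs
    targets: (batch, seq_len) gold next-token ids
    """
    # Rank of the gold token within each next-token distribution (1 = best).
    gold_scores = logits.gather(-1, targets.unsqueeze(-1))  # (B, T, 1)
    ranks = (logits > gold_scores).sum(dim=-1) + 1           # (B, T)

    top_k_acc = (ranks <= k).float().mean().item()  # gold token in the top-k candidates
    mrr = (1.0 / ranks.float()).mean().item()        # mean reciprocal rank
    return top_k_acc, mrr
```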

### Commercial Benchmarks (lm-eval)

> **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
> (few-shot examples + question) typically run 150–400 tokens. Scores below reflect
> inference at context lengths the model was not trained on.
> Raccoon (seq_len=512) will be evaluated without this constraint.

| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | TBD | |
| ARC-Challenge | TBD | |
| MMLU | TBD | Expect near-random due to long prompts |
| WinoGrande | TBD | |
| TruthfulQA | TBD | Alignment corpus benefit expected |

*lm-eval in progress — scores will be updated upon completion.*

---

## What Comes Next

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | 2.6B | 64 | ✅ This model |
| **Raccoon** | 6.1B | 512 | In training — reasoning-heavy curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |

The delta between Rabbit and Raccoon is the story. One epoch → two epochs, seq_len 64 → 512, 2.6B → 6.1B. Same pipeline, same hardware philosophy. **Give us more resources and watch what happens.**

---

## Usage

This model uses a custom SSM architecture. Standard HuggingFace `AutoModel` is not supported.

```python
# Requires: rtaforge-substrates + torch, transformers
from white_rabbit.rabbit_model import create_rabbit_model
from transformers import AutoTokenizer
import torch

model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
sd = torch.load("model.pt", map_location="cpu")  # path to the downloaded checkpoint
model.load_state_dict(sd, strict=True)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
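
The snippet above loads weights and a tokenizer but stops short of generation. The model's forward signature is not documented in this card, so the following greedy-decoding sketch assumes the usual causal-LM convention of token ids in, next-token logits out; adjust if the Rabbit forward pass returns something else. Continuing from the block above:

```python
# Greedy decoding sketch (assumes model(input_ids) -> logits of shape [batch, seq, vocab]).
prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(32):  # stay well inside the seq_len=64 training window
        logits = model(input_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```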

## License

The model weights in this repository are licensed under
**Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)**.

- ✅ Free for research, education, and non-commercial use
- ✅ Derivatives must carry the same licence
- ❌ Commercial use requires a separate agreement

## Citation

```bibtex
@misc{rtaforge2026rabbit,
  title  = {Rabbit-RtaSSM: Anvaya 2.7B State Space Model (Proof of Concept)},
  author = {RtaForge},
  year   = {2026},
  url    = {https://huggingface.co/RtaForge/Anvaya-Raccoon2.7B}
}
```

---
---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- raccoon
- rtaforge
base_model: RtaForge/Anvaya-Raccoon2.7B
---

# Anvaya-Raccoon 2.7B

A 2.7B parameter State-Space Model (SSM) trained by RtaForge using the Gurukul
constitutional training protocol.
## Architecture

- **Type**: Ṛta-SSM v7.2.2-FU (Fortress Unbroken) — recurrent SSM, no attention (see the recurrence sketch below)
- **Parameters**: ~2.78B
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
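
The Ṛta-SSM layer internals are not published. Purely as orientation for readers new to this model family, the generic recurrence that "recurrent SSM, no attention" refers to can be sketched as follows; the function and shapes are illustrative and do not come from the RtaForge code base.

```python
import torch

def ssm_scan(x, A, B, C):
    """Generic diagonal linear state-space recurrence (illustrative only, not the Durga layer).

    x: (seq_len, d_model) input sequence
    A: (d_state,)          diagonal state transition
    B: (d_state, d_model)  input projection
    C: (d_model, d_state)  output projection
    """
    h = torch.zeros(A.shape[0])
    outputs = []
    for x_t in x:               # recurrent scan: linear in seq_len, no attention matrix
        h = A * h + B @ x_t     # state update
        outputs.append(C @ h)   # readout
    return torch.stack(outputs)
```

Because the state `h` is a fixed-size vector updated step by step, memory and compute stay linear in sequence length rather than quadratic as with attention.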
## Weights

This repository contains a single merged checkpoint (`v1.1/model.pt`) that
combines the base pretrained weights with the SFT imprint surface layer.
Load it directly:
```python
import torch
from white_rabbit.rabbit_model import create_rabbit_model

model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
sd = torch.load("model.pt", map_location="cpu")  # shipped in this repo as v1.1/model.pt
model.load_state_dict(sd, strict=True)
model.eval()
```
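
If the checkpoint is not already on disk, it can be fetched with `huggingface_hub` first. The repo id below is taken from the `base_model` field above, and the tokenizer follows the GPT-NeoX vocabulary listed under Architecture; treat both as reasonable defaults rather than documented requirements.

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

from white_rabbit.rabbit_model import create_rabbit_model

# Download (or reuse from the local cache) the merged checkpoint.
ckpt_path = hf_hub_download(repo_id="RtaForge/Anvaya-Raccoon2.7B", filename="v1.1/model.pt")

model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"), strict=True)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # GPT-NeoX vocab, 50,280 tokens
```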

## Benchmarks

| Task | Metric | Score |
|------|--------|-------|
| HellaSwag | acc_norm | 25.89% |
| ARC-Challenge | acc_norm | 26.71% |
| MMLU | acc | 26.89% |
| WinoGrande | acc | 48.62% |
| TruthfulQA MC1 | acc | 21.91% |

## Training

Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
where Sisya proposes weight deltas and Guru applies them after validation.
The SFT imprint was applied with surface-only, gate-layer fine-tuning.
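
The Gurukul protocol itself is not published. As nothing more than an illustration of the propose-then-validate pattern described above (not RtaForge's implementation), a toy version might have the Sisya role propose a delta via a gradient step and the Guru role accept it only if held-out loss does not regress:

```python
import copy
import torch

def gurukul_step(model, loss_fn, train_batch, holdout_batch, lr=1e-5):
    """Toy propose/validate loop in the spirit described above (illustrative only)."""
    # Sisya: propose a weight delta from one gradient step on the training batch.
    proposal = copy.deepcopy(model)
    loss = loss_fn(proposal, train_batch)
    loss.backward()
    with torch.no_grad():
        for p in proposal.parameters():
            if p.grad is not None:
                p -= lr * p.grad

    # Guru: validate the proposal on held-out data before accepting it.
    with torch.no_grad():
        before = loss_fn(model, holdout_batch)
        after = loss_fn(proposal, holdout_batch)
    return proposal if after <= before else model
```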
|