docs: link executive briefing PDF in model card
README.md
CHANGED
@@ -14,15 +14,31 @@ base_model: RtaForge/Anvaya-Rabbit-2.7B
# Anvaya-Rabbit 2.7B – v0.1 Alpha
-

-
- that a fully custom State-Space Model (SSM) can be trained from scratch, on a
- single consumer-grade GPU, with no dependence on attention or transformer
- building blocks.

-
-
## Architecture
@@ -66,17 +82,21 @@ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.

- ## Training

Two proprietary components make this training regime possible:

-
-
-
-
-

- Together

**1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
~7 days effective training time (total elapsed higher due to crash recovery and VRAM
@@ -93,10 +113,9 @@ leak debugging).**
**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.

- SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs)
- trained with the Anvaya Gurukul protocol.

- ## Evaluation

### Internal – Scale-Invariant Metrics
@@ -120,9 +139,8 @@ capability.
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
- > lm-eval prompts
- >
- > constraint entirely.

| Benchmark | Score | Notes |
|-----------|-------|-------|
@@ -132,7 +150,7 @@ capability.
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

- ##

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
@@ -140,6 +158,15 @@ capability.
| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |

- The delta between Rabbit and Raccoon is the story
-

**Give us more resources and watch what happens.**
# Anvaya-Rabbit 2.7B – v0.1 Alpha

+ Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
+ from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
+ and the Gurukul constitutional training protocol. It serves as a technical
+ proof-of-concept that capable alternative-architecture models can be developed under
+ severe compute constraints. This is the first model in the Anvaya series:
+ **Rabbit → Raccoon → Polar Bear**.

+ ## Overview

+ Rabbit demonstrates three proprietary components developed by RtaForge:
+
+ - **Ṛta-SSM** – a custom recurrent state-space architecture with no attention
+   or transformer blocks (see the recurrence sketch after this list)
+ - **Gurukul** – a proposal-validation training loop in which a Sisya proposes
+   weight deltas and a Guru validates them against constitutional constraints
+   before applying them
+ - **Subsuminator** – cross-architecture weight migration without full retraining,
+   enabling efficient curriculum transfer
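
The Ṛta-SSM cell itself is not specified in this card. Purely as orientation, the sketch below shows the generic discrete state-space recurrence that SSM-family layers build on; every name and dimension here is a hypothetical stand-in, not the proprietary architecture.

```python
import torch

# Generic SSM recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Hypothetical stand-in only -- the real Rta-SSM cell, its gating, and the
# fused HIP kernels mentioned later in this card are proprietary.
def ssm_scan(x, A, B, C):
    """x: (seq_len, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)."""
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                # O(seq_len) recurrence, no attention matrix
        h = A @ h + B @ x_t      # the state h carries all prior context
        ys.append(C @ h)         # per-step readout
    return torch.stack(ys)

x = torch.randn(64, 32)          # seq_len=64, Rabbit's training context
y = ssm_scan(x, 0.9 * torch.eye(16), 0.1 * torch.randn(16, 32), torch.randn(8, 16))
print(y.shape)                   # torch.Size([64, 8])
```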
+
+ Trained across a phased curriculum on a single consumer GPU, Rabbit shows
+ substantial gains over random initialisation on internal scale-invariant metrics.
+ It is a deliberate architecture proof at seq_len=64 – not a production model.
+
+ For strategic context, IndiaAI alignment, and the full programme roadmap, see the
+ [Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
## Architecture
patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.

+ ## Training Protocol

Two proprietary components make this training regime possible:

+ **Gurukul** is a constitutional Sisya/Guru proposal-validation loop (sketched
+ in code after the list below):
+ - The Sisya proposes weight deltas based on the current curriculum phase
+ - The Guru validates each proposal against a set of constitutional constraints
+ - Accepted proposals update the model; rejected proposals are logged for signal
+ - Feedback from each cycle informs the next round of proposals
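
The Sisya proposer, Guru validator, and the constitutional rules are proprietary, so the following is only a toy sketch of the control flow listed above; `propose_delta` and `guru_accepts` are invented stand-ins.

```python
import random

# Toy Gurukul loop: propose -> validate -> apply or log.
def propose_delta(weights):
    """Sisya stand-in: propose a small perturbation for the current phase."""
    return [random.gauss(0.0, 0.01) for _ in weights]

def guru_accepts(weights, delta):
    """Guru stand-in: a single 'constitutional' bound on the updated weights."""
    return all(abs(w + d) < 1.0 for w, d in zip(weights, delta))

weights = [0.0] * 8
accepted, rejected_log = 0, []
while accepted < 20:                 # Rabbit ran 1,500 accepted proposals
    delta = propose_delta(weights)
    if guru_accepts(weights, delta):
        weights = [w + d for w, d in zip(weights, delta)]
        accepted += 1                # accepted proposals update the model
    else:
        rejected_log.append(delta)   # rejected proposals are logged for signal

print(f"accepted={accepted}, rejected={len(rejected_log)}")
```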
+
+ **Subsuminator** enables efficient migration of learned weights across architectures,
+ supporting curriculum transfer without retraining from scratch.
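
The Subsuminator algorithm is not described in this card either. For contrast only, this sketch shows the naive baseline such a tool presumably improves on: copying whatever tensors match by name and shape between two different architectures. Everything in it is hypothetical.

```python
from torch import nn

# Naive cross-architecture weight transfer (hypothetical baseline, not the
# Subsuminator): copy tensors whose names and shapes match, leave the rest.
def migrate(src: nn.Module, dst: nn.Module) -> int:
    src_sd, dst_sd = src.state_dict(), dst.state_dict()
    moved = 0
    for name, tensor in dst_sd.items():
        if name in src_sd and src_sd[name].shape == tensor.shape:
            dst_sd[name] = src_sd[name].clone()   # reuse the learned tensor
            moved += 1
    dst.load_state_dict(dst_sd)
    return moved

small = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 16))
large = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 64))
print(migrate(small, large), "tensors migrated")  # first layer matches, last does not
```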
+
+ Together these components allowed 1,500 accepted proposals across 6 phases to be
+ processed in ~7 effective days on a single 24GB GPU.

**1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
~7 days effective training time (total elapsed higher due to crash recovery and VRAM

**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.

+ SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
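
The card does not define "surface-only gate-layer fine-tuning" further. A minimal sketch, assuming it means freezing everything except gate parameters and training with the stated hyperparameters (Lion, lr=1e-5, batch_size=3, 3 epochs); `TinyCell` and the gate-name filter are invented, and `Lion` is taken from the third-party `lion-pytorch` package since the card does not name an implementation.

```python
import torch
from torch import nn
from lion_pytorch import Lion  # pip install lion-pytorch

# Hypothetical "surface-only gate-layer" SFT: freeze all parameters except
# the gate, then run a short supervised pass. TinyCell is an invented stand-in.
class TinyCell(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.core = nn.Linear(d, d)   # stand-in for the frozen SSM body
        self.gate = nn.Linear(d, d)   # surface gate layer, the only part tuned
    def forward(self, x):
        return torch.sigmoid(self.gate(x)) * self.core(x)

model = TinyCell()
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("gate")     # surface-only imprint

opt = Lion([p for p in model.parameters() if p.requires_grad], lr=1e-5)
data = [(torch.randn(3, 16), torch.randn(3, 16)) for _ in range(22)]  # ~65 examples

for epoch in range(3):                            # 3 epochs, as in the card
    for x, y in data:
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()
```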

+ ## Evaluation

### Internal – Scale-Invariant Metrics
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
+ > lm-eval prompts run 150–400 tokens – well beyond Rabbit's training context.
+ > Raccoon (seq_len=512) removes this constraint entirely.
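
To make the length mismatch concrete, here is a quick count with the GPT-NeoX tokenizer this card loads elsewhere; the sample prompt is invented but typical of multiple-choice eval formats.

```python
from transformers import AutoTokenizer

# Count tokens in an invented multiple-choice style prompt and compare with
# Rabbit's seq_len=64 training context.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = (
    "Question: Which effect best explains why the sky appears blue?\n"
    "A. Rayleigh scattering\nB. Ozone absorption\nC. Mie scattering\nD. Refraction\n"
    "Answer: A\n\n"
    "Question: A train covers 60 km in 45 minutes. What is its speed in km/h?\n"
    "A. 70\nB. 75\nC. 80\nD. 90\nAnswer:"
)
n = len(tokenizer(prompt)["input_ids"])
print(n, "tokens -", "exceeds" if n > 64 else "fits within", "seq_len=64")
```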

| Benchmark | Score | Notes |
|-----------|-------|-------|

| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |

+ ## Roadmap

| Model | Params | seq_len | Status |
|-------|--------|---------|--------|

| **Raccoon** | ~6.1B | 512 | In training – reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned – STEM + AEVA anti-hallucination layer |

+ The delta between Rabbit and Raccoon is the story – same pipeline, same hardware
+ philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
+ be the first Ṛta-SSM model trained end-to-end in India on domestic compute
+ infrastructure to reach standard benchmark competitiveness.
+

**Give us more resources and watch what happens.**

+
+ ## Related Resources
+
+ - [Anvaya Executive Briefing – May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
+ - Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
+ - Technical inquiries: guha@rtaforge.in