tvastr committed on
Commit 4c25f8e · verified · 1 Parent(s): 1ba0d45

docs: link executive briefing PDF in model card

Files changed (1)
  1. README.md +50 -23
README.md CHANGED
@@ -14,15 +14,31 @@ base_model: RtaForge/Anvaya-Rabbit-2.7B
 
  # Anvaya-Rabbit 2.7B — v0.1 Alpha
 
- > **The architecture, training protocol, and infrastructure are the story.**
 
- Rabbit is the first model in the Anvaya series — a proof of concept demonstrating
- that a fully custom State-Space Model (SSM) can be trained from scratch, on a
- single consumer-grade GPU, with no dependence on attention or transformer
- building blocks.
 
- This is not a production model. It is the opening move in a deliberate curriculum:
- **Rabbit → Raccoon → Polar Bear.** The benchmarks below are a baseline, not a claim.
 
  ## Architecture
 
@@ -66,17 +82,21 @@ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
  patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
  fused SSM recurrence kernels. MIT licensed.
 
- ## Training
 
  Two proprietary components make this training regime possible:
 
- - **Subsuminator** — migrates learned weights across architectures without
- retraining from scratch, enabling efficient curriculum transfer.
- - **Gurukul** — a constitutional Sisya/Guru proposal-validation loop. Sisya
- proposes weight deltas; Guru validates them against constitutional constraints
- before applying. Strong learning signals extracted from limited data and compute.
 
- Together they are why Rabbit trained in 7 days on a single consumer GPU.
 
  **1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
  ~7 days effective training time (total elapsed higher due to crash recovery and VRAM
@@ -93,10 +113,9 @@ leak debugging).**
 
  **Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
 
- SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs),
- trained with the Anvaya Gurukul protocol.
 
- ## Evaluation Results (Step 1,500)
 
  ### Internal — Scale-Invariant Metrics
 
@@ -120,9 +139,8 @@ capability.
 
  > **Standard academic benchmarks are not yet meaningful here.** Rabbit was
  > deliberately trained at seq_len=64 as a pure architecture proof. Standard
- > lm-eval prompts (few-shot examples + question) run 150–400 tokens — well
- > beyond Rabbit's training context. Raccoon (seq_len=512) removes this
- > constraint entirely.
 
  | Benchmark | Score | Notes |
  |-----------|-------|-------|
@@ -132,7 +150,7 @@ capability.
  | WinoGrande | 48.62% | Prompt exceeds training seq_len |
  | TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
 
- ## What Comes Next
 
  | Model | Params | seq_len | Status |
  |-------|--------|---------|--------|
@@ -140,6 +158,15 @@ capability.
  | **Raccoon** | ~6.1B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
  | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
 
- The delta between Rabbit and Raccoon is the story. One epoch → two epochs,
- seq_len 64 → 512, 2.7B → 6.1B. Same pipeline, same hardware philosophy.
 
  **Give us more resources and watch what happens.**
 
  # Anvaya-Rabbit 2.7B — v0.1 Alpha
 
+ Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
+ from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
+ and the Gurukul constitutional training protocol. It serves as a technical
+ proof of concept that capable alternative-architecture models can be developed under
+ severe compute constraints. This is the first model in the Anvaya series:
+ **Rabbit → Raccoon → Polar Bear**.
+
+ ## Overview
+
+ Rabbit demonstrates three proprietary components developed by RtaForge:
+
+ - **Ṛta-SSM** — a custom recurrent state-space architecture with no attention
+ or transformer blocks
+ - **Gurukul** — a proposal-validation training loop in which a Sisya proposes
+ weight deltas and a Guru validates them against constitutional constraints before
+ applying
+ - **Subsuminator** — cross-architecture weight migration without full retraining,
+ enabling efficient curriculum transfer
+
+ Trained across a phased curriculum on a single consumer GPU, Rabbit shows
+ substantial gains over random initialisation on internal scale-invariant metrics.
+ It is a deliberate architecture proof at seq_len=64 — not a production model.
+
+ For strategic context, IndiaAI alignment, and the full programme roadmap, see the
+ [Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
 
  ## Architecture
 
  patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
  fused SSM recurrence kernels. MIT licensed.
 
+ ## Training Protocol
 
  Two proprietary components make this training regime possible:
 
+ **Gurukul** is a constitutional Sisya/Guru proposal-validation loop (see the sketch below):
+ - The Sisya proposes weight deltas based on the current curriculum phase
+ - The Guru validates each proposal against a set of constitutional constraints
+ - Accepted proposals update the model; rejected proposals are logged for signal
+ - Feedback from each cycle informs the next round of proposals
+
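+ As a concrete illustration of the cycle, here is a minimal hypothetical sketch.
+ The Gurukul implementation is proprietary, so every name below (`propose_delta`,
+ `validates`, `apply_delta`, `observe`) is a placeholder, not the actual API.
+
+ ```python
+ def gurukul_cycle(model, sisya, guru, phase, history):
+     """One hypothetical Gurukul cycle: propose -> validate -> apply or log -> feed back."""
+     delta = sisya.propose_delta(model, phase)    # Sisya proposes a weight delta for this phase
+     if guru.validates(delta, model, phase):      # Guru checks constitutional constraints
+         model.apply_delta(delta)                 # accepted: update the weights
+         history.append(("accepted", phase))
+     else:
+         history.append(("rejected", phase))      # rejected proposals are kept as training signal
+     sisya.observe(history)                       # accept/reject feedback shapes the next proposal
+     return model
+ ```
+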
+ **Subsuminator** enables efficient migration of learned weights across architectures,
+ supporting curriculum transfer without retraining from scratch.
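+
+ The Subsuminator itself is not public. As a rough, generic illustration of
+ cross-architecture weight migration (not the actual Subsuminator logic), the sketch
+ below copies tensors whose names and shapes match from a source checkpoint into a
+ new model, leaving everything else at its initial values.
+
+ ```python
+ import torch
+
+ def migrate_matching_weights(source_state: dict, target_model: torch.nn.Module) -> list:
+     """Copy parameters that match by name and shape; return the names that carried over.
+
+     Generic PyTorch pattern only -- a real migration tool also has to handle renamed
+     modules and shape changes, which this sketch does not attempt.
+     """
+     target_state = target_model.state_dict()
+     migrated = []
+     for name, tensor in source_state.items():
+         if name in target_state and target_state[name].shape == tensor.shape:
+             target_state[name].copy_(tensor)
+             migrated.append(name)
+     target_model.load_state_dict(target_state)
+     return migrated
+ ```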
+
+ Together these components allowed 1,500 accepted proposals across 6 phases to be
+ processed in ~7 effective days on a single 24GB GPU.
 
  **1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
  ~7 days effective training time (total elapsed higher due to crash recovery and VRAM
 
  **Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
 
+ SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
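+
+ The exact gate-layer selection and training code are not published. Purely as an
+ illustration of what surface-only gate-layer fine-tuning with Lion at these
+ hyperparameters could look like, the sketch below freezes everything except
+ parameters whose names mark them as gates; the `"gate"` name filter, the HF-style
+ model interface, and the `lion-pytorch` package are assumptions, not project code.
+
+ ```python
+ from lion_pytorch import Lion  # third-party Lion optimizer (assumed dependency)
+
+ def gate_only_sft(model, dataloader, epochs=3, lr=1e-5):
+     # Freeze the whole model, then unfreeze only gate parameters (name filter is a guess)
+     for name, param in model.named_parameters():
+         param.requires_grad = "gate" in name
+
+     optimizer = Lion([p for p in model.parameters() if p.requires_grad], lr=lr)
+
+     model.train()
+     for _ in range(epochs):                       # 3 epochs over the 65-example SFT set
+         for batch in dataloader:                  # batch_size=3 at seq_len=64 in the run above
+             loss = model(batch["input_ids"], labels=batch["labels"]).loss
+             loss.backward()
+             optimizer.step()
+             optimizer.zero_grad()
+ ```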
 
+ ## Evaluation
+
  ### Internal — Scale-Invariant Metrics
 
 
  > **Standard academic benchmarks are not yet meaningful here.** Rabbit was
  > deliberately trained at seq_len=64 as a pure architecture proof. Standard
+ > lm-eval prompts run 150–400 tokens — well beyond Rabbit's training context.
+ > Raccoon (seq_len=512) removes this constraint entirely.
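+
+ A quick way to see the mismatch is to tokenize a few-shot style prompt with the
+ GPT-NeoX-20B tokenizer the card uses and compare it with the 64-token training
+ window. The prompt below is an invented stand-in, not an actual lm-eval prompt.
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
+
+ # Invented 3-shot prompt; real lm-eval prompts are typically longer still
+ prompt = (
+     "Question: Which gas do plants absorb from the atmosphere?\nAnswer: Carbon dioxide.\n\n"
+     "Question: What force keeps the planets in orbit around the Sun?\nAnswer: Gravity.\n\n"
+     "Question: What is the largest organ of the human body?\nAnswer:"
+ )
+
+ n_tokens = len(tokenizer(prompt).input_ids)
+ print(f"{n_tokens} prompt tokens vs. a 64-token training context")
+ # Even this short prompt approaches the 64-token window; standard 5-shot
+ # benchmark prompts run well past it.
+ ```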
 
  | Benchmark | Score | Notes |
  |-----------|-------|-------|
 
  | WinoGrande | 48.62% | Prompt exceeds training seq_len |
  | TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
 
+ ## Roadmap
 
  | Model | Params | seq_len | Status |
  |-------|--------|---------|--------|
 
  | **Raccoon** | ~6.1B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
  | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
 
+ The delta between Rabbit and Raccoon is the story — same pipeline, same hardware
+ philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
+ be the first Ṛta-SSM model trained end-to-end in India on domestic compute
+ infrastructure to reach standard benchmark competitiveness.
+
  **Give us more resources and watch what happens.**
+
+ ## Related Resources
+
+ - [Anvaya Executive Briefing — May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
+ - Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
+ - Technical inquiries: guha@rtaforge.in