tvastr committed · Commit ea927a4 · verified · 1 Parent(s): eb7783e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -169
README.md CHANGED
@@ -1,196 +1,58 @@
  ---
- license: cc-by-nc-sa-4.0
  language:
  - en
  tags:
  - ssm
  - state-space-model
- - mamba
  - causal-lm
  - rtaforge
- - anvaya
  ---

- # Rabbit-RtaSSM — Anvaya 2.7B

- **RtaForge Anvaya Series** | Durga fu-64 Architecture | 2.7B Parameters
-
- > Commercial licensing available — contact guha@rtaforge.in
-
- ---
-
- ## ⚠️ This is a Proof of Concept
-
- **Rabbit is not a finished product. It is not meant to be.**
-
- This is the first public model in the Anvaya family — a single-epoch run on a single NVIDIA L4 GPU, trained to validate the architecture, the training pipeline, and the weight subsumination technique. It is a flag planted, not a summit reached.
-
- What this model demonstrates:
- - The **Durga fu-64** SSM architecture trains and converges
- - **Weight subsumination** from Mamba2 works (patent pending)
- - The **Gurukul** constitutional training framework functions at scale
- - A 2.6B SSM can learn meaningful representations on a single L4 in one epoch
-
- What this model is not:
- - A competitor to GPT-4, Claude, or Gemini
- - A production-ready assistant
- - The best we can do — not even close
-
- **Raccoon (6.1B, seq_len=512, reasoning-heavy curriculum) and Polar Bear are in training.**
- The benchmark story gets told there.
-
- ---
-
- ## Model Lineage
-
- ```
- Mamba2 2.7B
-   └─▶ Rabbit-RtaSSM 2.7B (weight subsumination — patent pending)
-         ├─▶ base/    ← 1,500-step trained base model
-         │            Fine-tuned on: OpenOrca · Cosmopedia · LogiQA · ARC-Challenge ·
-         │            GSM8K · MetaMathQA · SciQ · Python instructions ·
-         │            Glaive function-calling · Glaive alignment
-         └─▶ imprint/ ← base + Rabbit personality SFT
- ```
-
- **Weight Subsumination** is a proprietary RtaForge technique for transplanting learned
- representations from a source architecture into a structurally distinct target model.
- *Patent pending — technique details not disclosed.*
-
- ---

  ## Architecture

- | Property | Value |
- |----------|-------|
- | Architecture | Durga fu-64 (custom SSM) |
- | Base lineage | Mamba2 2.7B (weight subsumination) |
- | Parameters | ~2.6B |
- | Tokenizer | EleutherAI/gpt-neox-20b (vocab 50,280) |
- | Training seq length | 64 |
- | Optimizer | Lion (lr 1e-5) |
- | Training hardware | Single NVIDIA L4 (24 GB) |
- | Training framework | Gurukul Phase 2 Hardened |
-
- ---
79
-
80
- ## Training Curriculum
81
-
82
- One epoch, single L4, ~15,000 steps across 8 phases + 1,500-step Scholar Sprint.
83
-
84
- | Phase | Steps | Dataset | Focus |
85
- |-------|-------|---------|-------|
86
- | 0 | 1,500 | OpenOrca + Cosmopedia | General warmup |
87
- | 1 | 3,000 | LogiQA + ARC-Challenge | Logic & reasoning |
88
- | 2 | 2,500 | GSM8K + MetaMathQA | Mathematics |
89
- | 3 | 2,000 | SciQ | Science / STEM |
90
- | 4 | 1,500 | Python instructions | Coding |
91
- | 5 | 1,000 | Glaive function-calling | Tool use |
92
- | 6 | 2,000 | Glaive alignment | Alignment |
93
- | 7 | 1,500 | Glaive alignment | Alignment |
94
-
95
- Final Scholar Sprint: 1,500 steps, Phase 5 saturation (Logic Giants corpus).
96
- **Final checkpoint: Step 1,500.**
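Written out as data, the phase schedule above is straightforward for a training driver to iterate over. This is only an illustration of the table; the dataset names are the informal labels used above, not exact Hugging Face dataset IDs.

```python
# The eight curriculum phases from the table, as a plain schedule.
# Step counts sum to the ~15,000 steps quoted above (the Scholar Sprint is separate).
CURRICULUM = [
    # (phase, steps, datasets, focus)
    (0, 1_500, ["OpenOrca", "Cosmopedia"],       "general warmup"),
    (1, 3_000, ["LogiQA", "ARC-Challenge"],      "logic & reasoning"),
    (2, 2_500, ["GSM8K", "MetaMathQA"],          "mathematics"),
    (3, 2_000, ["SciQ"],                         "science / STEM"),
    (4, 1_500, ["Python instructions"],          "coding"),
    (5, 1_000, ["Glaive function-calling"],      "tool use"),
    (6, 2_000, ["Glaive alignment"],             "alignment"),
    (7, 1_500, ["Glaive alignment"],             "alignment"),
]
assert sum(steps for _, steps, _, _ in CURRICULUM) == 15_000
```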
-
- ---
-
- ## Evaluation Results (Step 1,500)
-
- ### Internal — Scale-Invariant Metrics
-
- Evaluated using Top-K accuracy and Mean Reciprocal Rank (MRR) vs. a random-initialised baseline;
- 50 samples per corpus, seq_len=64.
-
- | Metric | Random Init | Trained (Step 1,500) | Gain |
- |--------|-------------|----------------------|------|
- | Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
- | Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
- | MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
- | MRR — Deep Math | 0.0084 | **0.186** | **22×** |
- | Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
- | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
-
- These gains are measured against a randomly initialised model of identical architecture —
- they reflect what the training curriculum taught, not absolute capability.
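For clarity, Top-K accuracy and MRR are standard ranking metrics over next-token predictions. The sketch below shows the general computation, assuming a tensor of next-token logits and the corresponding gold token ids; the internal evaluation harness itself is not published.

```python
# Top-K accuracy: is the gold next token among the K highest-scoring tokens?
# MRR: average of 1 / rank of the gold token (rank 1 = best).
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    # logits: (N, vocab), targets: (N,)
    topk = logits.topk(k, dim=-1).indices
    return (topk == targets.unsqueeze(-1)).any(dim=-1).float().mean().item()

def mean_reciprocal_rank(logits: torch.Tensor, targets: torch.Tensor) -> float:
    gold = logits.gather(-1, targets.unsqueeze(-1))   # score of the gold token
    ranks = (logits > gold).sum(dim=-1) + 1           # 1-based rank (ties ignored)
    return (1.0 / ranks.float()).mean().item()
```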
-
- ### Commercial Benchmarks (lm-eval)
-
- > **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
- > (few-shot examples + question) typically run 150–400 tokens. Scores below reflect
- > inference at context lengths the model was not trained on.
- > Raccoon (seq_len=512) will be evaluated without this constraint.
-
- | Benchmark | Score | Notes |
- |-----------|-------|-------|
- | HellaSwag | TBD | |
- | ARC-Challenge | TBD | |
- | MMLU | TBD | Expect near-random due to long prompts |
- | WinoGrande | TBD | |
- | TruthfulQA | TBD | Alignment corpus benefit expected |
-
- *lm-eval in progress — scores will be updated upon completion.*
-
- ---
-
- ## What Comes Next
-
- | Model | Params | seq_len | Status |
- |-------|--------|---------|--------|
- | **Rabbit** | 2.6B | 64 | ✅ This model |
- | **Raccoon** | 6.1B | 512 | In training — reasoning-heavy curriculum (math ×2, logic ×2) |
- | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
-
- The delta between Rabbit and Raccoon is the story. One epoch → two epochs, seq_len 64 → 512, 2.6B → 6.1B. Same pipeline, same hardware philosophy. **Give us more resources and watch what happens.**
-
- ---
-
- ## Usage
-
- This model uses a custom SSM architecture. Standard HuggingFace `AutoModel` is not supported.

  ```python
- # Requires: rtaforge-substrates + torch, transformers
- from white_rabbit.rabbit_model import create_rabbit_model
- from transformers import AutoTokenizer
  import torch

  model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
- sd = torch.load("base/pytorch_model.bin", map_location="cpu")
- model.load_state_dict(sd, strict=False)
  model.eval()
-
- tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
- ```

- ---
-
- ## License
-
- The model weights in this repository are licensed under
- **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)**.
-
- - ✅ Free for research, education, and non-commercial use
- - ✅ Derivatives must carry the same licence
- - ❌ Commercial use requires a separate agreement
-
- > **Commercial licensing available; contact guha@rtaforge.in**
-
- ---
-
- ## Citation
-
- ```
- @misc{rtaforge2026rabbit,
-   title  = {Rabbit-RtaSSM: Anvaya 2.7B State Space Model (Proof of Concept)},
-   author = {RtaForge},
-   year   = {2026},
-   url    = {https://huggingface.co/RtaForge/Anvaya-Raccoon2.7B}
- }
- ```
-
- ---
-
- *Forged at RtaForge ऋत्*

  ---
  language:
  - en
+ license: apache-2.0
  tags:
  - ssm
  - state-space-model
  - causal-lm
+ - raccoon
  - rtaforge
+ base_model: RtaForge/Anvaya-Raccoon2.7B
  ---

+ # Anvaya-Raccoon 2.7B

+ A 2.7B parameter State-Space Model (SSM) trained by RtaForge using the Gurukul
+ constitutional training protocol.

  ## Architecture

+ - **Type**: Ṛta-SSM v7.2.2-FU (Fortress Unbroken) — recurrent SSM, no attention
+ - **Parameters**: ~2.78B
+ - **Layers**: 64
+ - **d_model / d_state**: 2560
+ - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
+ - **Precision**: bfloat16

+ ## Weights
 

+ This repository contains a single merged checkpoint (`v1.1/model.pt`) that
+ combines the base pretrained weights with the SFT imprint surface layer.
+ Load it directly:

  ```python
  import torch
+ from white_rabbit.rabbit_model import create_rabbit_model

  model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
+ sd = torch.load("model.pt", map_location="cpu")
+ model.load_state_dict(sd, strict=True)
  model.eval()
  ```
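To fetch the checkpoint from the Hub and pair it with the GPT-NeoX tokenizer named in the Architecture section, something like the following should work. The repository id is taken from the `base_model` field above and the `v1.1/model.pt` path from the Weights description; verify both against the repository's file listing.

```python
# Download the merged checkpoint and load the matching tokenizer.
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

ckpt_path = hf_hub_download(repo_id="RtaForge/Anvaya-Raccoon2.7B", filename="v1.1/model.pt")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# ckpt_path can then replace "model.pt" in the loading snippet above.
```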

+ ## Benchmarks

+ | Task | Metric | Score |
+ |------|--------|-------|
+ | HellaSwag | acc_norm | 25.89% |
+ | ARC-Challenge | acc_norm | 26.71% |
+ | MMLU | acc | 26.89% |
+ | WinoGrande | acc | 48.62% |
+ | TruthfulQA MC1 | acc | 21.91% |

+ ## Training

+ Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
+ where Sisya proposes weight deltas and Guru applies them after validation.
+ The SFT imprint was applied using surface-only gate-layer fine-tuning.
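The Gurukul protocol itself is proprietary, but the propose/validate/apply shape described above can be illustrated with a self-contained toy: a "Sisya" step proposes a gradient-based weight delta and a "Guru" step accepts it only if it passes a validation check. Every detail below (the tiny linear model, the proposal rule, the acceptance criterion) is a hypothetical stand-in, not the actual method.

```python
# Toy illustration of a propose/validate/apply loop; not the real Gurukul protocol.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 8)
loss_fn = torch.nn.MSELoss()
x_train, y_train = torch.randn(64, 8), torch.randn(64, 8)
x_val, y_val = torch.randn(64, 8), torch.randn(64, 8)

def sisya_propose(lr: float = 1e-2):
    """Sisya: propose a weight delta from the gradient on a training batch."""
    model.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    return {name: -lr * p.grad for name, p in model.named_parameters()}

def guru_apply(delta) -> bool:
    """Guru: apply the delta only if held-out loss does not get worse."""
    with torch.no_grad():
        before = loss_fn(model(x_val), y_val).item()
        for name, p in model.named_parameters():
            p += delta[name]
        after = loss_fn(model(x_val), y_val).item()
        if after > before:                      # reject: roll the delta back
            for name, p in model.named_parameters():
                p -= delta[name]
            return False
    return True

for step in range(20):
    accepted = guru_apply(sisya_propose())
```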