tvastr committed on
Commit
23c452a
·
verified ·
1 Parent(s): 3e85f14

model card: CC-BY-NC-SA 4.0, lineage, dataset curriculum, base/imprint structure

Files changed (1)
  1. README.md +167 -0
README.md ADDED
@@ -0,0 +1,167 @@
+ ---
+ license: cc-by-nc-sa-4.0
+ language:
+ - en
+ tags:
+ - ssm
+ - state-space-model
+ - mamba
+ - causal-lm
+ - rtaforge
+ - anvaya
+ ---
+
+ # Rabbit-RtaSSM – Anvaya 2.7B
+
+ **RtaForge Anvaya Series** | Durga fu-64 Architecture | 2.7B Parameters
+
+ > Commercial licensing available – contact guha@rtaforge.in
+
+ ---
+
+ ## Model Lineage
+
+ ```
+ Mamba2 2.7B
+   │
+   └─▶ Rabbit-RtaSSM 2.7B (weight subsumination – patent pending)
+         │
+         ├─▶ base/     ← 1,500-step trained base model
+         │             Fine-tuned on: OpenOrca · Cosmopedia · LogiQA · ARC-Challenge ·
+         │             GSM8K · MetaMathQA · SciQ · Python instructions ·
+         │             Glaive function-calling · Glaive alignment
+         │
+         └─▶ imprint/  ← base + Rabbit personality SFT
+ ```
+
+ **Weight Subsumination** is a proprietary RtaForge technique for transplanting learned
+ representations from a source architecture into a structurally distinct target model.
+ *Patent pending – technique details not disclosed.*
+
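+ The published details stop there, so nothing below reflects the actual method. As a loose
+ illustration of weight transplanting in general – a hypothetical shape-matched state-dict
+ copy, not the patented subsumination procedure – one might write:
+
+ ```python
+ def naive_weight_transplant(source_sd: dict, target_sd: dict) -> dict:
+     """Copy tensors whose names and shapes coincide. Purely illustrative;
+     the real subsumination technique is undisclosed and patent pending."""
+     for name, tensor in source_sd.items():
+         if name in target_sd and target_sd[name].shape == tensor.shape:
+             target_sd[name] = tensor.clone()
+     return target_sd
+
+ # e.g. move whatever fits from a Mamba2 checkpoint into a target model's state dict:
+ # target_model.load_state_dict(naive_weight_transplant(mamba_sd, target_model.state_dict()))
+ ```
+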
+ ---
+
+ ## Model Description
+
+ Rabbit-RtaSSM is a 2.7B parameter State Space Model (SSM) trained by [RtaForge](https://rtaforge.in)
+ as part of the **Anvaya** small language model series. It uses the proprietary **Durga fu-64**
+ architecture – a custom SSM variant with fortress layers and constitutional governance via the
+ Gurukul training framework.
+
+ Rabbit is the fast, general-purpose runner of the Anvaya trio (Rabbit · Raccoon · Polar Bear),
+ optimised for high-throughput instruction following, logic, math, STEM, and tool dispatch.
+
+ ### Architecture
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | Durga fu-64 (custom SSM) |
+ | Base lineage | Mamba2 2.7B (weight subsumination) |
+ | Parameters | ~2.7B |
+ | Tokenizer | EleutherAI/gpt-neox-20b (vocab 50,280) |
+ | Sequence length | 512 tokens |
+ | Optimizer | Lion (lr 1e-5) |
+ | Training framework | Gurukul Phase 2 Hardened |
+
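+ As a quick sanity check of the configuration above (a sketch: `lion-pytorch` is assumed as
+ the Lion implementation, and the embedding layer is a toy stand-in, since the actual Durga
+ fu-64 network is not public):
+
+ ```python
+ import torch.nn as nn
+ from transformers import AutoTokenizer
+ from lion_pytorch import Lion  # pip install lion-pytorch
+
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
+ print(len(tokenizer))  # ~50,277 raw; the card's 50,280 presumably pads to a multiple of 8
+
+ toy = nn.Embedding(50280, 2560)              # stand-in module, not the real model
+ optimizer = Lion(toy.parameters(), lr=1e-5)  # matches the table's Lion, lr 1e-5
+ ```
+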
+ ---
+
+ ## Training Curriculum
+
+ Two campaigns on an NVIDIA L4 GPU (Ace Cloud):
+
+ ### Campaign 1 – 8 phases, ~15,000 steps
+
+ | Phase | Steps | Dataset | Focus |
+ |-------|-------|---------|-------|
+ | 0 | 1,500 | OpenOrca + Cosmopedia | General warmup |
+ | 1 | 3,000 | LogiQA + ARC-Challenge | Logic & reasoning |
+ | 2 | 2,500 | GSM8K + MetaMathQA | Mathematics |
+ | 3 | 2,000 | SciQ | Science / STEM |
+ | 4 | 1,500 | Python instructions | Coding |
+ | 5 | 1,000 | Glaive function-calling | Tool use |
+ | 6 | 2,000 | Glaive alignment | Alignment |
+ | 7 | 1,500 | Glaive alignment | Alignment |
+
+ ### Campaign 2 – Scholar Sprint, 1,500 steps
+
+ Phase 5 saturation (Logic Giants corpus), Lion lr=1e-5.
+ Final base checkpoint: **Step 1,500**.
+
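+ For anyone reproducing a similar staged curriculum, the phase table maps naturally onto a
+ simple scheduler. A sketch (dataset ids are plausible Hub guesses, and `train_steps` is the
+ caller's own loop – this is not the Gurukul framework):
+
+ ```python
+ # Hypothetical curriculum driver for a phased training campaign.
+ CURRICULUM = [
+     {"phase": 0, "steps": 1500, "datasets": ["Open-Orca/OpenOrca", "HuggingFaceTB/cosmopedia"]},
+     {"phase": 1, "steps": 3000, "datasets": ["lucasmccabe/logiqa", "allenai/ai2_arc"]},
+     {"phase": 2, "steps": 2500, "datasets": ["openai/gsm8k", "meta-math/MetaMathQA"]},
+     # ... phases 3-7 continue per the table above
+ ]
+
+ def run_campaign(model, curriculum, train_steps):
+     """Run each phase in order; `train_steps` is whatever per-phase training loop you use."""
+     for phase in curriculum:
+         train_steps(model, phase["datasets"], num_steps=phase["steps"])
+ ```
+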
+ ---
+
+ ## Evaluation Results
+
+ Evaluated using scale-invariant metrics (Top-K accuracy, Mean Reciprocal Rank)
+ against a random-initialised baseline, with 100 samples per corpus and seq_len=512.
+
+ | Corpus | Metric | Random Init | Trained | Gain |
+ |--------|--------|-------------|---------|------|
+ | Biology | Top-1 Accuracy | baseline | **10× baseline** | +10× |
+ | Chemistry | Top-1 Accuracy | baseline | **10× baseline** | +10× |
+ | Deep Math | MRR | 0.008 | **0.186** | **+22×** |
+
+ *Full Step 1,500 evaluation results will be added upon final publication.*
+
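+ The metrics themselves are standard. A sketch of token-level Top-K accuracy and MRR over
+ next-token predictions, assuming `logits` of shape `[seq, vocab]` and integer `targets` of
+ shape `[seq]` (the exact evaluation harness is not published):
+
+ ```python
+ import torch
+
+ def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 1) -> float:
+     """Fraction of positions whose target token appears in the top-k predictions."""
+     topk = logits.topk(k, dim=-1).indices               # [seq, k]
+     hits = (topk == targets.unsqueeze(-1)).any(dim=-1)  # [seq]
+     return hits.float().mean().item()
+
+ def mean_reciprocal_rank(logits: torch.Tensor, targets: torch.Tensor) -> float:
+     """Mean of 1/rank of the target token in the descending-sorted vocabulary."""
+     order = logits.argsort(dim=-1, descending=True)     # [seq, vocab]
+     ranks = (order == targets.unsqueeze(-1)).float().argmax(dim=-1) + 1  # 1-based rank
+     return (1.0 / ranks.float()).mean().item()
+ ```
+
+ Both depend only on the ranking of the logits, not their magnitude, which is the sense in
+ which they are scale-invariant.
+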
+ ---
+
+ ## Repository Structure
+
+ ```
+ RtaForge/Anvaya-Raccoon2.7B
+ ├── base/
+ │   └── pytorch_model.bin        ← base model weights (step 1,500)
+ ├── imprint/
+ │   └── pytorch_model.bin        ← base + Rabbit personality SFT
+ └── logs/
+     └── training_logs_1500.zip
+ ```
+
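+ To pull an individual checkpoint from the Hub, the standard `huggingface_hub` client works
+ regardless of the custom architecture (repo id taken verbatim from the tree above):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Fetch the base checkpoint; use "imprint/pytorch_model.bin" for the personality-SFT weights
+ path = hf_hub_download(
+     repo_id="RtaForge/Anvaya-Raccoon2.7B",
+     filename="base/pytorch_model.bin",
+ )
+ print(path)  # local cache path, ready for torch.load
+ ```
+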
+ ---
+
+ ## Usage
+
+ This model uses a custom SSM architecture and requires the RtaForge inference stack.
+ The standard HuggingFace `AutoModel` loader is not supported.
+
+ ```python
+ # Requires: rtaforge-substrates, plus torch and transformers
+ from white_rabbit.rabbit_model import create_rabbit_model
+ from transformers import AutoTokenizer
+ import torch
+
+ # Instantiate the Durga fu-64 architecture with the padded NeoX vocab
+ model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
+
+ # Load the base checkpoint; strict=False ignores any non-matching keys
+ sd = torch.load("base/pytorch_model.bin", map_location="cpu")
+ model.load_state_dict(sd, strict=False)
+ model.eval()
+
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
+ ```
+
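+ The RtaForge generation API is not documented here. Assuming the model acts as a standard
+ causal LM whose forward pass returns `[batch, seq, vocab]` logits (take `.logits` instead if
+ it returns an output object), a minimal greedy decoding loop would be:
+
+ ```python
+ # Greedy decoding sketch under the assumptions stated above.
+ prompt = "Q: What is 17 + 25?\nA:"
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+
+ with torch.no_grad():
+     for _ in range(64):                      # cap new tokens well under the 512 seq length
+         logits = model(input_ids)            # assumed forward signature
+         next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
+         input_ids = torch.cat([input_ids, next_id], dim=-1)
+         if next_id.item() == tokenizer.eos_token_id:
+             break
+
+ print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
+ ```
+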
+ ---
+
+ ## License
+
+ The model weights in this repository are licensed under
+ **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)**.
+
+ - ✅ Free for research, education, and non-commercial use
+ - ✅ Derivatives must carry the same licence
+ - ❌ Commercial use requires a separate agreement
+
+ > **Commercial licensing available – contact guha@rtaforge.in**
+
+ ---
+
+ ## Citation
+
+ ```
+ @misc{rtaforge2026rabbit,
+   title  = {Rabbit-RtaSSM: Anvaya 2.7B State Space Model},
+   author = {RtaForge},
+   year   = {2026},
+   url    = {https://huggingface.co/RtaForge/Anvaya-Raccoon2.7B}
+ }
+ ```
+
+ ---
+
+ *Forged at RtaForge – ऋत*