tvastr committed
Commit 0830f34 · verified · 1 Parent(s): f83f25e

full model card rewrite — Rabbit v0.1 Alpha

Files changed (1): README.md +82 -21
README.md CHANGED
@@ -6,53 +6,114 @@ tags:
  - ssm
  - state-space-model
  - causal-lm
- - raccoon
  - rtaforge
- base_model: RtaForge/Anvaya-Raccoon2.7B
  ---

- # Anvaya-Raccoon 2.7B

- A 2.7B parameter State-Space Model (SSM) trained by RtaForge using the Gurukul
- constitutional training protocol.

  ## Architecture

- - **Type**: Ṛta-SSM v7.2.2-FU (Fortress Unbroken) — recurrent SSM, no attention
  - **Parameters**: ~2.78B
  - **Layers**: 64
  - **d_model / d_state**: 2560
  - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
  - **Precision**: bfloat16

  ## Weights

- This repository contains a single merged checkpoint (`v1.1/model.pt`) that
- combines the base pretrained weights with the SFT imprint surface layer.
- Load it directly:

  ```python
- import torch
  from white_rabbit.rabbit_model import create_rabbit_model

  model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
- sd = torch.load("model.pt", map_location="cpu")
- model.load_state_dict(sd, strict=True)
  model.eval()
  ```

- ## Benchmarks

- | Task | Metric | Score |
- |------|--------|-------|
- | HellaSwag | acc_norm | 25.89% |
- | ARC-Challenge | acc_norm | 26.71% |
- | MMLU | acc | 26.89% |
- | WinoGrande | acc | 48.62% |
- | TruthfulQA MC1 | acc | 21.91% |

- ## Training

  Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
  where Sisya proposes weight deltas and Guru applies them after validation.
  SFT imprint applied using surface-only gate-layer fine-tuning.

  - ssm
  - state-space-model
  - causal-lm
+ - rabbit
  - rtaforge
+ - proof-of-concept
+ base_model: RtaForge/Anvaya-Rabbit-2.7B
  ---

+ # Anvaya-Rabbit 2.7B — v0.1 Alpha

+ **Proof of concept.** Rabbit is the first model in the Anvaya series — a demonstration
+ that a fully custom State-Space Model (SSM) architecture can be trained from scratch,
+ on a single GPU, without any dependence on attention or transformer building blocks.
+
+ This is not a production model. It is the opening move in a deliberate curriculum:
+ **Rabbit → Raccoon → Polar Bear.** The architecture, training protocol, and
+ infrastructure are the story. The benchmarks are a baseline.

  ## Architecture

+ - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
  - **Parameters**: ~2.78B
  - **Layers**: 64
  - **d_model / d_state**: 2560
  - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
  - **Precision**: bfloat16
+ - **Training seq_len**: 64

  ## Weights

+ This repository contains the base pretrained checkpoint
+ (`base/Anvaya-Rabbit-2.3B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
+ (`imprint/Anvaya-Rabbit-2.3b-0.1-alpha-imprint.pt`).
+
+ Load the imprint weights directly:

  ```python
  from white_rabbit.rabbit_model import create_rabbit_model
+ from transformers import AutoTokenizer
+ import torch

  model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
+ sd = torch.load("imprint/Anvaya-Rabbit-2.3b-0.1-alpha-imprint.pt", map_location="cpu")
+ model.load_state_dict(sd, strict=False)
  model.eval()
+
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
  ```

+ > **Requires**: `rtaforge-substrates` — this model uses a custom SSM architecture
+ > not compatible with standard HuggingFace `AutoModel`.
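The card shows how to load the weights but not how to run them. A minimal greedy-generation sketch, assuming the custom model's forward pass takes a `(batch, seq)` tensor of token IDs and returns logits of shape `(batch, seq, vocab)` (the actual `white_rabbit` forward signature may differ), and keeping prompt plus continuation inside the 64-token training context:

```python
# Sketch only: assumes model(input_ids) -> logits of shape (batch, seq, vocab).
prompt = "The Anvaya series is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(32):  # stay well inside the 64-token training seq_len
        logits = model(input_ids)                                 # assumed forward signature
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy decoding
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```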
 
+ ## Training Curriculum

+ One epoch on a single L4 GPU: ~15,000 steps across 8 phases, plus a 1,500-step Scholar Sprint.
+
+ | Phase | Steps | Dataset | Focus |
+ |-------|-------|---------|-------|
+ | 6 | 2,000 | Glaive alignment | Alignment |
+ | 7 | 1,500 | Glaive alignment | Alignment |
+
+ Final Scholar Sprint: 1,500 steps, Phase 5 saturation (Logic Giants corpus).
+ **Final checkpoint: Step 1,500.**

  Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
  where Sisya proposes weight deltas and Guru applies them after validation.
  SFT imprint applied using surface-only gate-layer fine-tuning.
+
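The Gurukul loop is described only at this level in the card. For intuition, a minimal, purely hypothetical sketch of one propose/validate/apply step with surface-only (gate-layer) updates is below; the function names, the `"gate"` parameter-name filter, the acceptance rule, and the assumed `model(input_ids) -> logits` forward are illustrative, not RtaForge's implementation:

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, batch):
    """Standard next-token cross-entropy; assumes model(ids) returns (batch, seq, vocab) logits."""
    logits = model(batch[:, :-1])
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))

def gurukul_step(model, sft_batch, val_batch, lr=1e-5, gate_keyword="gate"):
    """Sisya proposes a gate-only weight delta; Guru applies it only if validation does not regress."""
    gates = {n: p for n, p in model.named_parameters() if gate_keyword in n}

    # Sisya: propose a surface-only delta from one SFT batch.
    model.zero_grad()
    next_token_loss(model, sft_batch).backward()
    deltas = {n: -lr * p.grad for n, p in gates.items() if p.grad is not None}

    # Guru: validate before committing; roll back if the held-out loss got worse.
    with torch.no_grad():
        before = next_token_loss(model, val_batch).item()
        for n, d in deltas.items():
            gates[n].add_(d)
        after = next_token_loss(model, val_batch).item()
        if after > before:
            for n, d in deltas.items():
                gates[n].sub_(d)
            return False  # delta rejected
    return True  # delta applied
```

The real protocol is presumably richer than a single loss comparison; this only mirrors the propose-then-validate shape described above.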
+ ## Evaluation Results (Step 1,500)
+
+ ### Internal — Scale-Invariant Metrics
+
+ Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
+ baseline of identical architecture. 50 samples per corpus, seq_len=64.
+
+ | Metric | Random Init | Trained (Step 1,500) | Gain |
+ |--------|-------------|----------------------|------|
+ | Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
+ | Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
+ | MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
+ | MRR — Deep Math | 0.0084 | **0.186** | **22×** |
+ | Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
+ | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
+
+ These gains are measured against a randomly initialised model of identical
+ architecture — they reflect what the training curriculum taught, not absolute capability.
+
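For reference, the Top-K and MRR numbers above are standard next-token ranking metrics. A small sketch of how they can be computed, under the same assumed `model(input_ids) -> logits` convention (the harness RtaForge actually used is not published in this card):

```python
import torch

@torch.no_grad()
def topk_and_mrr(model, input_ids, k=10):
    """Next-token Top-1/Top-K accuracy and Mean Reciprocal Rank over a (batch, seq) token tensor."""
    logits = model(input_ids[:, :-1])          # assumed shape: (batch, seq-1, vocab)
    targets = input_ids[:, 1:]

    # Rank of the true next token among all vocabulary logits (1 = best).
    target_logits = logits.gather(-1, targets.unsqueeze(-1))
    ranks = (logits > target_logits).sum(dim=-1) + 1

    top1 = (ranks == 1).float().mean().item()
    topk = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return top1, topk, mrr
```

The Gain column is then just the trained score divided by the same metric measured on a randomly initialised copy of the architecture.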
+ ### Commercial Benchmarks (lm-eval harness)
+
+ > **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
+ > (few-shot examples + question) typically run 150–400 tokens. The scores below reflect
+ > inference at context lengths the model was never trained on.
+ > Raccoon (seq_len=512) will be evaluated without this constraint.
+
+ | Benchmark | Score | Notes |
+ |-----------|-------|-------|
+ | HellaSwag | 25.89% | Near-random; context length exceeds training seq_len |
+ | ARC-Challenge | 26.71% | Near-random; context length exceeds training seq_len |
+ | MMLU | 26.89% | Near-random; 5-shot prompts well beyond training seq_len |
+ | WinoGrande | 48.62% | Near-random |
+ | TruthfulQA MC1 | 21.91% | — |
+
+ ## What Comes Next
+
+ | Model | Params | seq_len | Status |
+ |-------|--------|---------|--------|
+ | **Rabbit** | 2.7B | 64 | ✅ This model — v0.1 Alpha |
+ | **Raccoon** | 2.7B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
+ | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
+
+ The delta between Rabbit and Raccoon is the story. One epoch → two epochs,
+ seq_len 64 → 512. Same pipeline, same hardware philosophy.
+ **Give us more resources and watch what happens.**