tvastr committed on
Commit f54708a · verified · 1 parent: 907ba21

fix: consistent 2.7B naming, reviewer fixes, rtaforge-substrates note

Files changed (1)
  1. README.md +25 -18
README.md CHANGED
@@ -14,18 +14,19 @@ base_model: RtaForge/Anvaya-Rabbit-2.7B
 
  # Anvaya-Rabbit 2.7B — v0.1 Alpha
 
- **Proof of concept.** Rabbit is the first model in the Anvaya series — a demonstration
- that a fully custom State-Space Model (SSM) architecture can be trained from scratch,
- on a single GPU, without any dependence on attention or transformer building blocks.
 
  This is not a production model. It is the opening move in a deliberate curriculum:
- **Rabbit → Raccoon → Polar Bear.** The architecture, training protocol, and
- infrastructure are the story. The benchmarks are a baseline.
 
  ## Architecture
 
  - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
- - **Parameters**: ~2.78B
  - **Layers**: 64
  - **d_model / d_state**: 2560
  - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
@@ -35,30 +36,35 @@ infrastructure are the story. The benchmarks are a baseline.
  ## Weights
 
  This repository contains the base pretrained checkpoint
- (`base/Anvaya-Rabbit-2.3B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
- (`imprint/Anvaya-Rabbit-2.3b-0.1-alpha-imprint.pt`).
 
- Load the imprint weights directly:
 
  ```python
  from white_rabbit.rabbit_model import create_rabbit_model
  from transformers import AutoTokenizer
  import torch
 
- model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
- sd = torch.load("imprint/Anvaya-Rabbit-2.3b-0.1-alpha-imprint.pt", map_location="cpu")
  model.load_state_dict(sd, strict=False)
  model.eval()
 
  tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
  ```
 
- > **Requires**: `rtaforge-substrates` — this model uses a custom SSM architecture
  > not compatible with standard HuggingFace `AutoModel`.
 
  ## Training Curriculum
 
- One epoch, single L4, ~15,000 steps across 8 phases + 1,500-step Scholar Sprint.
 
  | Phase | Steps | Dataset | Focus |
  |-------|-------|---------|-------|
@@ -89,13 +95,14 @@ baseline of identical architecture. 50 samples per corpus, seq_len=64.
  | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
 
  These gains are measured against a randomly initialised model of identical
- architecture — they reflect what the training curriculum taught, not absolute capability.
 
  ### Commercial Benchmarks (lm-eval harness)
 
  > **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
- > (few-shot examples + question) typically run 150–400 tokens. The scores below reflect
- > inference at context lengths the model was never trained on.
  > Raccoon (seq_len=512) will be evaluated without this constraint.
 
  | Benchmark | Score | Notes |
@@ -110,8 +117,8 @@ architecture — they reflect what the training curriculum taught, not absolute
 
  | Model | Params | seq_len | Status |
  |-------|--------|---------|--------|
- | **Rabbit** | 2.7B | 64 | ✅ This model — v0.1 Alpha |
- | **Raccoon** | 2.7B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
  | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
 
  The delta between Rabbit and Raccoon is the story. One epoch → two epochs,
 
@@ -14,18 +14,19 @@ base_model: RtaForge/Anvaya-Rabbit-2.7B
 
  # Anvaya-Rabbit 2.7B — v0.1 Alpha
 
+ **The architecture, training protocol, and infrastructure are the story.**
+ Rabbit is the first model in the Anvaya series — a proof of concept demonstrating
+ that a fully custom State-Space Model (SSM) can be trained from scratch, on a
+ single NVIDIA L4 GPU, with no dependence on attention or transformer
+ building blocks.
 
  This is not a production model. It is the opening move in a deliberate curriculum:
+ **Rabbit → Raccoon → Polar Bear.** The benchmarks below are a baseline, not a claim.
 
  ## Architecture
 
  - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
+ - **Parameters**: ~2.7B (post-subsumination)
  - **Layers**: 64
  - **d_model / d_state**: 2560
  - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
@@ -35,30 +36,35 @@ infrastructure are the story. The benchmarks are a baseline.
  ## Weights
 
  This repository contains the base pretrained checkpoint
+ (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
+ (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
 
+ Load the imprint weights (base + SFT overlay, recommended for inference):
 
  ```python
  from white_rabbit.rabbit_model import create_rabbit_model
  from transformers import AutoTokenizer
  import torch
 
+ model = create_rabbit_model(
+     vocab_size=50280,
+     durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
+ )
+ sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
  model.load_state_dict(sd, strict=False)
  model.eval()
 
  tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
  ```
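With `strict=False`, `load_state_dict` does not raise when the checkpoint and the model disagree; PyTorch instead returns the mismatches in the result's `missing_keys` and `unexpected_keys` lists. A minimal sketch of a guard that makes such mismatches loud, working on key names only: `compare_keys`, `check_load_result` and `KeyReport` are hypothetical helpers, not part of `white_rabbit` or `rtaforge-substrates`, and the sketch runs on plain string lists so it does not need the private packages.

```python
from typing import Iterable, NamedTuple


class KeyReport(NamedTuple):
    """Same two fields that torch.nn.Module.load_state_dict returns."""
    missing_keys: list
    unexpected_keys: list


def compare_keys(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> KeyReport:
    """Hypothetical helper: list the keys a strict=False load would skip."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return KeyReport(
        missing_keys=sorted(model_set - ckpt_set),     # expected by the model, absent from the file
        unexpected_keys=sorted(ckpt_set - model_set),  # present in the file, unknown to the model
    )


def check_load_result(report, allow_missing: Iterable[str] = ()) -> None:
    """Raise if a non-strict load skipped anything not explicitly allowed.

    `report` only needs `missing_keys` and `unexpected_keys` attributes,
    so a KeyReport or PyTorch's own load_state_dict result both work.
    """
    allowed = set(allow_missing)
    missing = [k for k in report.missing_keys if k not in allowed]
    unexpected = list(report.unexpected_keys)
    if missing or unexpected:
        raise RuntimeError(f"state_dict mismatch: missing={missing}, unexpected={unexpected}")
```

With the real model, the object returned by `model.load_state_dict(sd, strict=False)` already exposes `missing_keys` and `unexpected_keys`, so it can be passed to `check_load_result` directly without calling `compare_keys`.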
 
+ > **Requires**: `rtaforge-substrates` (private repository — contact
+ > guha@rtaforge.in for access). This model uses a custom SSM architecture
  > not compatible with standard HuggingFace `AutoModel`.
 
  ## Training Curriculum
 
+ One epoch, single NVIDIA L4, ~15,000 steps across 8 phases + 1,500-step Scholar Sprint.
+ Phases 1–5 (pretraining corpus progression) not shown.
 
  | Phase | Steps | Dataset | Focus |
  |-------|-------|---------|-------|
@@ -89,13 +95,14 @@ baseline of identical architecture. 50 samples per corpus, seq_len=64.
  | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
 
  These gains are measured against a randomly initialised model of identical
+ architecture — they reflect what the training curriculum taught, not absolute
+ capability.
 
  ### Commercial Benchmarks (lm-eval harness)
 
  > **Important caveat**: Rabbit was trained at seq_len=64. Standard lm-eval prompts
+ > (few-shot examples + question) typically run 150–400 tokens. The scores below
+ > reflect inference at context lengths the model was never trained on.
  > Raccoon (seq_len=512) will be evaluated without this constraint.
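To see how far a typical lm-eval prompt overshoots the 64-token training window, the arithmetic is just prompt length divided by `seq_len`. A minimal sketch, assuming prompts have already been tokenized with the GPT-NeoX tokenizer into lists of token IDs; `TRAIN_SEQ_LEN`, `context_overrun` and `keep_recent` are hypothetical names. Left-truncating to the most recent 64 tokens is shown only as one possible workaround, not as how the lm-eval scores were produced.

```python
TRAIN_SEQ_LEN = 64  # Rabbit's training sequence length, per the caveat above


def context_overrun(num_tokens: int, train_seq_len: int = TRAIN_SEQ_LEN) -> float:
    """How many training-length windows a prompt of `num_tokens` spans."""
    if train_seq_len <= 0:
        raise ValueError("train_seq_len must be positive")
    return num_tokens / train_seq_len


def keep_recent(token_ids: list, train_seq_len: int = TRAIN_SEQ_LEN) -> list:
    """Hypothetical workaround: keep only the last `train_seq_len` token IDs."""
    if train_seq_len <= 0:
        raise ValueError("train_seq_len must be positive")
    return list(token_ids[-train_seq_len:])


# The caveat quotes typical lm-eval prompt lengths of 150-400 tokens.
for n in (150, 400):
    print(f"{n} tokens -> {context_overrun(n):.2f}x the training window")
```

For the 150 to 400 token range quoted in the caveat, this comes out to roughly 2.3 to 6.25 training windows per prompt. Truncating keeps the input length in-distribution but can discard few-shot context, so it trades one distortion for another.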
 
  | Benchmark | Score | Notes |
@@ -110,8 +117,8 @@ architecture — they reflect what the training curriculum taught, not absolute
 
  | Model | Params | seq_len | Status |
  |-------|--------|---------|--------|
+ | **Rabbit** | ~2.7B | 64 | ✅ This model — v0.1 Alpha |
+ | **Raccoon** | ~2.7B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
  | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
 
  The delta between Rabbit and Raccoon is the story. One epoch → two epochs,