---
language:
- en
license: apache-2.0
tags:
- ssm
- state-space-model
- causal-lm
- rabbit
- rtaforge
- proof-of-concept
base_model: RtaForge/Anvaya-Rabbit-2.7B
---
# Anvaya-Rabbit 2.7B (v0.1 Alpha)
Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
and the Gurukul constitutional training protocol. It serves as a technical
proof-of-concept that capable alternative-architecture models can be developed under
severe compute constraints. This is the first model in the Anvaya series:
**Rabbit → Raccoon → Polar Bear**.
## Overview
Rabbit demonstrates three proprietary components developed by RtaForge:
- **Ṛta-SSM**: a custom recurrent state-space architecture with no attention
or transformer blocks
- **Gurukul**: a proposal-validation training loop in which a Sisya proposes
weight deltas and a Guru validates them against constitutional constraints before
they are applied
- **Subsuminator**: cross-architecture weight migration without full retraining,
enabling efficient curriculum transfer
Trained across a phased curriculum on a single 24GB cloud GPU, Rabbit shows
substantial gains over random initialisation on internal scale-invariant metrics.
It is a deliberate architecture proof at seq_len=64, not a production model.
For strategic context, IndiaAI alignment, and full programme roadmap, see the
[Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
## Architecture
- **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken; recurrent SSM with no attention (generic SSM recurrence sketched below)
- **Parameters**: ~2.7B (post-subsumination)
- **Layers**: 64
- **d_model / d_state**: 2560
- **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
- **Precision**: bfloat16
- **Training seq_len**: 64
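For readers unfamiliar with the family, a state-space layer in its standard discretised form (shown here as generic background, not necessarily the exact Ṛta-SSM parameterisation) replaces attention with a recurrent hidden-state update:

$$
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t + D\,x_t
$$

so per-token compute and memory stay constant in sequence length rather than growing quadratically as with attention.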
## Weights
This repository contains the base pretrained checkpoint
(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
Load the imprint weights (base + SFT overlay, recommended for inference):
```python
import torch
from transformers import AutoTokenizer

from white_rabbit.rabbit_model import create_rabbit_model  # shipped in rtaforge-substrates

# Build the 64-layer Fortress Unbroken backbone and load the SFT imprint checkpoint
model = create_rabbit_model(
    vocab_size=50280,
    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
)
sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
model.load_state_dict(sd, strict=False)
model.eval()

# Rabbit uses the GPT-NeoX tokenizer (vocab size 50,280)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
> **Requires**: `rtaforge-substrates` (private repository; contact
> guha@rtaforge.in for access). This model uses a custom SSM architecture
> that is not compatible with the standard Hugging Face `AutoModel` classes.
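
Once loaded, generation can be driven with an ordinary autoregressive loop. The sketch below is illustrative only: it assumes the forward pass returns next-token logits (or an object exposing `.logits`); check the `rtaforge-substrates` documentation for the actual inference API. `model` and `tokenizer` come from the snippet above.
```python
import torch

# Illustrative greedy decoding; assumes the model returns logits of shape
# (batch, seq_len, vocab_size) or an object with a .logits attribute.
prompt = "The boiling point of water at sea level is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    while input_ids.shape[-1] < 64:            # stay inside the training seq_len
        out = model(input_ids)
        logits = out.logits if hasattr(out, "logits") else out
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```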
**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival),
a patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
fused SSM recurrence kernels. MIT licensed.
## Training Protocol
Two proprietary components make this training regime possible:
**Gurukul** is a constitutional Sisya/Guru proposal-validation loop, sketched schematically after this list:
- The Sisya proposes weight deltas based on the current curriculum phase
- The Guru validates each proposal against a set of constitutional constraints
- Accepted proposals update the model; rejected proposals are logged for signal
- Feedback from each cycle informs the next round of proposals
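A schematic rendering of that cycle; `propose_delta`, `check_constitution`, and `apply_delta` are hypothetical stand-ins for the unpublished Gurukul internals, not its actual API:
```python
# Schematic only: the three callables are hypothetical stand-ins, not the
# actual Gurukul API, which is not published in this repository.
def gurukul_phase(model, phase, constitution,
                  propose_delta, check_constitution, apply_delta):
    rejected_log = []
    feedback = None
    for _ in range(phase.num_proposals):
        # Sisya: propose a weight delta for the current curriculum phase
        delta = propose_delta(model, phase, feedback)
        # Guru: validate the proposal against the constitutional constraints
        verdict = check_constitution(model, delta, constitution)
        if verdict.accepted:
            apply_delta(model, delta)      # accepted proposals update the model
        else:
            rejected_log.append(verdict)   # rejections are kept as training signal
        feedback = verdict                 # each cycle informs the next proposal
    return model, rejected_log
```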
**Subsuminator** enables efficient migration of learned weights across architectures,
supporting curriculum transfer without retraining from scratch.
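The Subsuminator itself is not published here. As a rough illustration of what name- and shape-matched weight migration between two checkpoints can look like (a generic technique, not the actual algorithm):
```python
import torch

def migrate_overlapping_weights(src_state_dict, dst_model):
    """Copy the overlapping slice of every same-named parameter.

    Generic illustration of cross-architecture weight transfer; the actual
    Subsuminator procedure is proprietary and may work very differently.
    """
    dst_state = dst_model.state_dict()
    for name, dst_param in dst_state.items():
        src_param = src_state_dict.get(name)
        if src_param is None or src_param.ndim != dst_param.ndim:
            continue
        # copy the region where shapes overlap, leave the rest at init
        region = tuple(slice(0, min(s, d))
                       for s, d in zip(src_param.shape, dst_param.shape))
        dst_param[region].copy_(src_param[region])
    dst_model.load_state_dict(dst_state)
    return dst_model
```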
Together these components allowed **1,500 accepted Gurukul proposals across 6 phases**
to be processed on a single AceCloud L4 (24GB VRAM) in ~7 days of effective training
time (total elapsed time was higher due to crash recovery and VRAM-leak debugging).
| Phase | Proposals | Dataset | Focus |
|-------|-----------|---------|-------|
| 0 | 125 | CAMEL Physics | Physical reasoning |
| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
| 2 | 125 | CAMEL Biology | Biological reasoning |
| 3 | 250 | Raccoon Phase 1 | General reasoning |
| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |
**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
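A hedged sketch of what such an imprint pass can look like. Selecting gate layers by the substring `"gate"` in parameter names, the `lion-pytorch` implementation of Lion, reusing the lr quoted above, and the `sft_loader` are all assumptions for illustration, not the actual imprint tooling:
```python
import torch
import torch.nn.functional as F
from lion_pytorch import Lion  # pip install lion-pytorch (one Lion implementation)

# Freeze everything except gate-layer parameters ("surface-only" fine-tuning).
# The "gate" substring is an assumed naming convention, not a documented one.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

optimizer = Lion((p for p in model.parameters() if p.requires_grad),
                 lr=1e-5, weight_decay=0.0)

model.train()
for epoch in range(3):                         # 3 epochs over the 65 SFT examples
    for input_ids in sft_loader:               # sft_loader: hypothetical DataLoader of token ids
        out = model(input_ids)
        logits = out.logits if hasattr(out, "logits") else out
        # standard next-token cross-entropy at seq_len=64
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                               input_ids[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```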
## Evaluation
### Internal: Scale-Invariant Metrics
Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
baseline of identical architecture. 50 samples per corpus, seq_len=64.
| Metric | Random Init | Trained (Step 1,500) | Gain |
|--------|-------------|----------------------|------|
| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
| MRR (Deep Math) | 0.0084 | **0.186** | **22×** |
| Top-10 (Biology) | ~1.3% | **~12%** | **~10×** |
| Top-10 (Chemistry) | ~1.3% | **~13%** | **~10×** |
These gains are measured against a randomly initialised model of identical
architecture; they reflect what the training curriculum taught, not absolute
capability.
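For reference, next-token Top-K accuracy and MRR can be computed generically as below; this is a standard formulation, and the internal evaluation harness may differ in the details of sampling and aggregation.
```python
import torch

def topk_accuracy_and_mrr(logits, targets, k=10):
    """Next-token Top-K accuracy and Mean Reciprocal Rank.

    logits:  (N, vocab_size) model predictions at N evaluated positions
    targets: (N,) ground-truth next-token ids
    """
    # 1-indexed rank of the true token in the descending-sorted predictions
    sorted_ids = logits.argsort(dim=-1, descending=True)
    ranks = (sorted_ids == targets.unsqueeze(-1)).float().argmax(dim=-1) + 1
    topk_acc = (ranks <= k).float().mean().item()
    mrr = (1.0 / ranks.float()).mean().item()
    return topk_acc, mrr
```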
### Commercial Benchmarks (lm-eval harness)
> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
> deliberately trained at seq_len=64 as a pure architecture proof. Standard
> lm-eval prompts run 150-400 tokens, well beyond Rabbit's training context.
> Raccoon (seq_len=512) removes this constraint entirely.
| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag | 25.89% | Prompt exceeds training seq_len |
| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
| MMLU | 26.89% | Prompt exceeds training seq_len |
| WinoGrande | 48.62% | Prompt exceeds training seq_len |
| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
## Roadmap
| Model | Params | seq_len | Status |
|-------|--------|---------|--------|
| **Rabbit** | ~2.7B | 64 | ✅ This model (v0.1 Alpha) |
| **Raccoon** | ~6.1B | 512 | In training: reasoning curriculum (math ×2, logic ×2) |
| **Polar Bear** | ~13B | 512 | Planned: STEM + AEVA anti-hallucination layer |
The delta between Rabbit and Raccoon is the story: same pipeline, same hardware
philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
be the first Ṛta-SSM model trained end-to-end in India on domestic compute
infrastructure to reach standard benchmark competitiveness.
**Give us more resources and watch what happens.**
## Related Resources
- [Anvaya Executive Briefing (May 2026)](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
- Technical inquiries: guha@rtaforge.in