# Mamba2-2.7B Gurukul Instruct (Phase B Reference)

This model is a 1-epoch instruction-tuned version of Mamba2-2.7B, produced as the Phase B Reference Model for the Anvaya sovereign AI ecosystem.
## Model Highlights
- Architecture: Mamba2 (Structured State Space Model)
- Scale: 2.7 Billion Parameters
- Training Phase: Phase B (Stable Reference Quality)
- Optimizer: 8-bit Paged AdamW
- Precision: BF16
- Checkpoint: Final (Step 4063)
## Dataset: Gurukul Mix (Phase B)

The model was fine-tuned on a 65,000-example subset of high-signal instruction data:

- 50k samples from `teknium/OpenHermes-2.5`
- 15k samples from `databricks/databricks-dolly-15k`
## Technical Details: Inference-Time Configuration

CRITICAL: This model uses the GPT-NeoX tokenizer (matching the Mamba2 base). Use the `EleutherAI/gpt-neox-20b` tokenizer or the included tokenizer files.
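A minimal loading sketch with Hugging Face `transformers` (an illustration, not the card's official snippet; it assumes the checkpoint ships a `transformers`-compatible Mamba2 config and that the required kernels are installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The tokenizer MUST be GPT-NeoX; pairing any other tokenizer with these
# weights will silently corrupt the token ids.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Load this card's weights in BF16 to match the training precision.
model = AutoModelForCausalLM.from_pretrained(
    "RtaForge/mamba2-2.7b-gurukul-instruct",
    torch_dtype=torch.bfloat16,
)
```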
### Recommended Decoding Presets

To mitigate observed inference-time artifacts (repetition/tail drift), use the following parameters:
| Parameter | Value | Rationale |
|---|---|---|
| Temperature | 0.7 | Balances coherence and creativity |
| Top-p | 0.9 | Prevents long-tail sampling noise |
| Repetition Penalty | 1.15 | Breaks local attractor loops |
| Frequency Penalty | 0.2 | Regularizes short-range continuity |
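The table above maps onto sampling kwargs roughly as follows (a sketch; `max_new_tokens` is an illustrative addition, and frequency-penalty support depends on your serving engine):

```python
# Decoding presets from the table, expressed as generate() kwargs.
gen_kwargs = {
    "do_sample": True,
    "temperature": 0.7,        # balances coherence and creativity
    "top_p": 0.9,              # prevents long-tail sampling noise
    "repetition_penalty": 1.15,  # breaks local attractor loops
    "max_new_tokens": 512,     # illustrative cap, not from the card
}

# `frequency_penalty` is an OpenAI/vLLM-style parameter; Hugging Face
# generate() does not accept it, so keep it separate for engines that do.
engine_extras = {"frequency_penalty": 0.2}
```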
### Completion Stop Tokens

Ensure your inference engine terminates on:

- `<|endoftext|>`
- `\n\n###`
- `#include "assistant.txt"`
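For engines without native stop-string support, post-hoc truncation is a simple fallback (a sketch; the stop list mirrors the tokens above, and `truncate_at_stop` is a hypothetical helper, not part of this release):

```python
STOP_STRINGS = ['<|endoftext|>', '\n\n###', '#include "assistant.txt"']

def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut generated text at the earliest stop string, if any occurs."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

For example, `truncate_at_stop('The answer is 42.<|endoftext|>junk')` returns `'The answer is 42.'`.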
## Known Limitations & Remediation

As a 2.7B Reference Quality model, the following artifacts are identified but manageable via inference-time fixes (no retraining required):

- Mild Tail Repetition: Resolved by the recommended decoding presets above.
- Distributional Leakage: Formatting strings like `#include "assistant.txt"` may appear at the generation boundary. Resolved by the provided stop tokens.
- Symbolic Arithmetic: While reasoning steps are often correct, final numeric results may vary. Mitigation: Deferred to Phase C (Tool-Augmented Reasoning). We recommend tool-routing for arithmetic-heavy tasks.
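Tool-routing for arithmetic can be as simple as intercepting bare expressions before they reach the model (a hypothetical sketch; `route` and the regex are illustrative, not part of this release):

```python
import ast
import operator
import re

# Safe arithmetic evaluator: walks the AST instead of calling eval().
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def _eval_node(node):
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")

def route(prompt: str, generate):
    """Send bare arithmetic to the calculator; everything else to the model."""
    m = re.fullmatch(r"\s*([\d\s.+\-*/()]+)\s*=?\s*", prompt)
    if m:
        try:
            return str(_eval_node(ast.parse(m.group(1), mode="eval")))
        except (ValueError, SyntaxError):
            pass
    return generate(prompt)
```

Anything the calculator cannot parse falls through to the model, so the router never blocks a prompt.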
## Repro / Setup

The weights were frozen after one epoch. Training was verified for stability with zero skipped batches and strict gradient clipping at 1.0.
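The card's stated hyperparameters can be summarized as a config fragment (field names are illustrative; only the values come from this card):

```python
training_config = {
    "epochs": 1,
    "final_step": 4063,
    "optimizer": "paged_adamw_8bit",  # 8-bit Paged AdamW
    "precision": "bf16",
    "max_grad_norm": 1.0,   # strict gradient clipping
    "skipped_batches": 0,   # verified during the run
}
```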
Anvaya: Sovereignty as a Service
## Model tree for RtaForge/mamba2-2.7b-gurukul-instruct

Base model: `state-spaces/mamba2-2.7b`