---
base_model: t5-small
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- text-generation
- causal-lm
- mamba
- hrm
- pytorch
language:
- en
pipeline_tag: text-generation
---

# CMBA-768M-FineWeb

A 768M-parameter Hierarchical Recurrent Memory (HRM) language model trained on high-quality web text from FineWeb-Edu. The model uses **Mamba2 state-space models** instead of traditional attention, enabling efficient long-range sequence modeling.

## Model Architecture

**CMBA** (Causal Mamba-based Architecture) implements a hierarchical processing structure:

- **Hierarchical Design**: Dual-level processing with H-layers (high-level abstraction) and L-layers (low-level specialists)
- **Mamba2 Mixers**: State-space models replace attention, reducing complexity from O(n²) to O(n) in sequence length
- **Adaptive Computation**: A halting mechanism allows variable compute per token (ACT-style pondering)
- **Parameters**: ~768M total
- **Context Length**: 1024 tokens

### Configuration

```text
Model Dimensions:
- d_model: 768
- n_heads: 12 (for compatibility; not used in Mamba)
- d_ff: 3072
- H_layers: 12 (high-level hierarchy)
- L_layers: 12 (low-level processing)

Mamba2 Settings:
- d_state: 128
- expand: 2
- headdim: 64
- d_conv: 4
- ngroups: 1

Training:
- Max halt steps: 8
- Block size: 1024
- Batch size: 32 (effective)
- Learning rate: 2e-4 → 1e-6
- Weight decay: 0.1
```

## Training Data

- **Dataset**: [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (sample-10BT)
- **Tokenizer**: `t5-small` (T5 SentencePiece)
- **Vocab Size**: 32100

## Latest Performance (Epoch 2)

- **Validation Loss**: `8.1216`
- **Validation Perplexity**: `3366.37`

## Usage

```python
from transformers import T5Tokenizer

# HRMText1 is the custom model class; hrm_text1_modeling.py ships with the model repo
from hrm_text1_modeling import HRMText1

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = HRMText1.from_pretrained("Viharikvs/CMBA-768M-FineWeb")

# Generate text
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0]))
```

## Citation

If you use this model, please cite:

```bibtex
@misc{cmba-768m-fineweb,
  author = {Vihari},
  title = {CMBA-768M-FineWeb: Hierarchical Mamba-based Language Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Viharikvs/CMBA-768M-FineWeb}
}
```

## License

Apache 2.0
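## Appendix: ACT-style Halting Sketch

The architecture section mentions ACT-style adaptive computation with a cap of 8 halt steps. The snippet below is a minimal, self-contained sketch of how such a halting rule typically works: a token keeps pondering until its cumulative halting probability crosses a threshold or the step cap is reached. The function name `act_halting_steps` and the `epsilon` threshold are illustrative assumptions, not taken from this repository's implementation.

```python
def act_halting_steps(halt_probs, max_halt_steps=8, epsilon=0.01):
    """Return the number of ponder steps taken for one token.

    halt_probs: per-step halting probabilities emitted by the model
                for a single token (floats in [0, 1]).
    The token halts once its cumulative probability reaches 1 - epsilon,
    or when the step cap (max_halt_steps, 8 in this model) is hit.
    """
    cumulative = 0.0
    for step, p in enumerate(halt_probs, start=1):
        cumulative += p
        if cumulative >= 1.0 - epsilon or step >= max_halt_steps:
            return step
    return min(len(halt_probs), max_halt_steps)

# An "easy" token halts early; a "hard" token ponders up to the cap.
easy = act_halting_steps([0.7, 0.4, 0.1])   # halts at step 2
hard = act_halting_steps([0.1] * 12)        # capped at step 8
```

In training, the per-step outputs are usually combined as a weighted average with the halting probabilities as weights, plus a small ponder-cost penalty to discourage unnecessary steps.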