# HRM-Text1: Hierarchical Reasoning Model for Text Generation
A transformer language model built on the Hierarchical Reasoning Module (HRM) architecture and trained on multiple high-quality text datasets. The model features adaptive computation with a pondering mechanism for improved text generation quality.
## Model Architecture
HRM-Text1 implements a novel hierarchical reasoning architecture with the following key components:
- Model Size: 99M parameters (Large variant)
- Architecture: Hierarchical Reasoning Module with dual-stream processing
- Embeddings: 1024 dimensions
- Attention Heads: 16 heads
- Feed-Forward: 4096 dimensions
- Context Length: 512 tokens
- Vocabulary: 32,128 tokens (T5 tokenizer)
## Key Features
- Adaptive Computation: Pondering mechanism with halt probabilities
- Dual-Stream Processing: High-level (H) and Low-level (L) reasoning modules
- SwiGLU Activation: Enhanced non-linear transformations
- RMSNorm: Improved normalization for stable training
- Mixed Precision: BF16 training support for NVIDIA Ampere+ GPUs
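To make the two building blocks concrete, here is a minimal, scalar-level sketch of RMSNorm and SwiGLU (operating on plain Python lists rather than tensors; the actual model applies them to embedding vectors, and the helper names below are illustrative, not the repository's):

```python
import math

def rms_norm(x, gain, eps=1e-8):
    """RMSNorm: scale x by the reciprocal of its root-mean-square,
    then apply a learned per-dimension gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def swiglu(x_gate, x_up):
    """SwiGLU: a SiLU-gated branch multiplied element-wise
    with a linear branch."""
    silu = [v / (1.0 + math.exp(-v)) for v in x_gate]
    return [s * u for s, u in zip(silu, x_up)]

print(rms_norm([3.0, 4.0], [1.0, 1.0]))  # RMS of [3, 4] is about 3.54
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which is the source of its cheaper, more stable behavior in training.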
## Training Configuration

### Datasets
The model supports training on multiple high-quality datasets:
- C4 Multilingual: Common Crawl web text (multilingual)
- OpenWebText: English web content dataset
- The Pile: Diverse text from EleutherAI
- SlimPajama: 627B token dataset (filtered variants available)
- FineWeb: High-quality web content
- Spanish: Spanish language subset from C4
### Mixed Dataset Training

The training script supports custom dataset mixing ratios:

```python
CUSTOM_MIX_RATIOS = {
    "high_quality": {
        "slimpajama_en": 0.5,  # 50% SlimPajama English
        "pile": 0.3,           # 30% The Pile
        "openwebtext": 0.2,    # 20% OpenWebText
    }
}
```
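As an illustrative sketch (not the training script's actual code), mixing ratios like these can be validated and converted into per-dataset example counts for one batch; the helper below is hypothetical:

```python
CUSTOM_MIX_RATIOS = {
    "high_quality": {
        "slimpajama_en": 0.5,
        "pile": 0.3,
        "openwebtext": 0.2,
    }
}

def samples_per_batch(ratios, batch_size):
    """Check that mixing ratios sum to 1, then convert them
    into per-dataset example counts for a single batch."""
    assert abs(sum(ratios.values()) - 1.0) < 1e-6, "ratios must sum to 1"
    return {name: round(frac * batch_size) for name, frac in ratios.items()}

counts = samples_per_batch(CUSTOM_MIX_RATIOS["high_quality"], batch_size=40)
print(counts)  # {'slimpajama_en': 20, 'pile': 12, 'openwebtext': 8}
```

With the batch size of 40 used below, a 0.5/0.3/0.2 mix works out to 20, 12, and 8 examples per batch.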
### Training Hyperparameters

- Learning Rate: 3e-4 (max) → 1e-5 (min) with cosine annealing
- Batch Size: 40 (with gradient accumulation steps: 2)
- Weight Decay: 0.05
- Optimizer: AdamW with β₁=0.9, β₂=0.95
- Epochs: 2
- Mixed Precision: Enabled for compatible hardware
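The cosine annealing schedule above can be sketched as a small function (an illustrative formula, not the script's exact implementation, which may also include warmup):

```python
import math

MAX_LR, MIN_LR = 3e-4, 1e-5

def cosine_lr(step, total_steps, max_lr=MAX_LR, min_lr=MIN_LR):
    """Cosine annealing from max_lr at step 0 down to min_lr
    at total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # 3e-4 at the start
print(cosine_lr(1000, 1000))  # 1e-5 at the end
```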
## Model Components

### HRMBlock Architecture

```python
import torch.nn as nn

class HRMBlock(nn.Module):
    """Pre-norm transformer block: RMSNorm + self-attention,
    then RMSNorm + SwiGLU MLP, each with a residual connection.
    RMSNorm and SwiGLUMuchPelu are defined in the model code."""
    def __init__(self, n_embd, n_head, d_ff, dropout=0.1):
        super().__init__()
        self.norm1 = RMSNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.norm2 = RMSNorm(n_embd)
        self.mlp = SwiGLUMuchPelu(n_embd, d_ff, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + self.dropout(attn_out)
        x = x + self.dropout(self.mlp(self.norm2(x)))
        return x
```
### Pondering Mechanism
The model implements adaptive computation through a halt probability mechanism:
- Max Steps: 8 reasoning steps
- Halt Bias: -2.2 (initial)
- Ponder Loss Weight: 1e-2
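An ACT-style halting loop with these settings can be sketched as follows (a simplified scalar version for illustration; the real model computes a halt probability per token from hidden states, not a constant bias):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ponder_steps(halt_bias=-2.2, max_steps=8, eps=0.01):
    """Count reasoning steps until the cumulative halt probability
    reaches 1 - eps, capped at max_steps."""
    cumulative = 0.0
    for step in range(1, max_steps + 1):
        cumulative += sigmoid(halt_bias)
        if cumulative >= 1.0 - eps:
            return step
    return max_steps

print(ponder_steps())  # with the initial bias of -2.2, the cap of 8 is hit
```

The negative initial halt bias keeps per-step halt probabilities low (sigmoid(-2.2) ≈ 0.1), so the model starts out using many reasoning steps; the ponder loss weight of 1e-2 then pressures it to halt earlier when the input allows.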
## Usage

### Quick Start

```python
from transformers import T5Tokenizer

from modeling_hrm_text1 import HRMText1

# Load model and tokenizer
model = HRMText1.from_pretrained("dreamwar/HRM-Text1-{DATASET}-large")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Generate text (sampling must be enabled for temperature to take effect)
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```
### Training from Scratch

**Option 1: Google Colab (Recommended)**

Open the Colab notebook:
https://colab.research.google.com/drive/1c4exU-zMt4SuT1kRlwQQXlLPaiazEDCf?usp=sharing

**Option 2: Local Training**

```bash
# Set environment variables
export HRM_OUTPUT_BASE="/path/to/output"
export HF_TOKEN="your_huggingface_token"

# Run training
python hrm_llm_training_c4_b.py
```
### Configuration Options

The training script supports extensive configuration:

```python
# Dataset selection
ACTIVE_DATASET = "mixed"  # Options: "c4", "openwebtext", "pile", "spanish", "mixed"

# Dataset subset percentage
DATASET_SUBSET_PERCENT = 5  # 1-100%

# Custom output path
CUSTOM_BASE_PATH = "/your/custom/path"

# Model parameters (large variant)
MODEL_PARAMS = {
    "n_embd": 1024,
    "n_head": 16,
    "d_ff": 4096,
    "dropout": 0.1,
    "halt_max_steps": 8,
    "ponder_loss_weight": 1e-2,
    "halt_bias_init": -2.2,
}
```
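A rough back-of-the-envelope parameter count from these dimensions (the number of blocks is not listed in the config, so 4 is an assumption here, and biases are mostly ignored) lands close to the quoted 99M:

```python
def estimate_params(n_embd, d_ff, vocab_size, n_blocks):
    """Rough parameter count: tied embedding plus, per block,
    attention projections, a SwiGLU MLP, and two RMSNorm gains."""
    embedding = vocab_size * n_embd
    attention = 4 * n_embd * n_embd  # q, k, v and output projections
    swiglu = 3 * n_embd * d_ff       # gate, up and down projections
    norms = 2 * n_embd
    return embedding + n_blocks * (attention + swiglu + norms)

total = estimate_params(n_embd=1024, d_ff=4096, vocab_size=32128, n_blocks=4)
print(f"{total / 1e6:.0f}M")  # about 100M
```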
## Features

### Multi-Dataset Support
- Individual Datasets: Train on single datasets (C4, OpenWebText, Pile, etc.)
- Mixed Training: Combine multiple datasets with custom ratios
- Language Filtering: Optional language detection and filtering
- Streaming: Memory-efficient streaming for large datasets
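The combination of mixed training and streaming can be sketched as a lazy interleaver that draws from several iterators according to the mixing ratios (an illustrative stand-in for the script's logic; in practice the streams would come from `datasets` in streaming mode):

```python
import random

def interleave_streams(streams, ratios, seed=0):
    """Lazily yield examples from several iterators, choosing a source
    at each step with probability proportional to its mixing ratio.
    Exhausted sources are dropped so every example is eventually seen."""
    rng = random.Random(seed)
    names = list(streams)
    weights = [ratios[n] for n in names]
    while names:
        name = rng.choices(names, weights)[0]
        try:
            yield next(streams[name])
        except StopIteration:
            i = names.index(name)
            names.pop(i)
            weights.pop(i)

# Toy usage with in-memory "streams"
mixed = interleave_streams(
    {"pile": iter(["p1", "p2"]), "owt": iter(["o1"])},
    {"pile": 0.7, "owt": 0.3},
)
print(list(mixed))
```

Because nothing is materialized up front, memory use stays flat no matter how large the underlying datasets are.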
### Training Optimizations
- Checkpointing: Automatic checkpoint saving and resuming
- Early Stopping: Validation-based early stopping (patience: 2)
- Gradient Clipping: Norm clipping at 1.0
- Mixed Precision: BF16 for memory efficiency
- Model Compilation: PyTorch 2.0 compilation support
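Validation-based early stopping with patience 2 amounts to a small amount of bookkeeping, sketched here (an illustrative helper, not the training script's exact class):

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without
    an improvement in validation loss."""
    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
history = [1.00, 0.90, 0.95, 0.96]
stops = [stopper.step(v) for v in history]
print(stops)  # [False, False, False, True]
```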
### Hardware Support
- CUDA: GPU acceleration with TF32 precision on Ampere+
- Multi-Platform: Linux, macOS, Windows support
- Google Colab: Full compatibility with free and pro tiers
- Memory Management: Automatic DataLoader worker detection
## Output Structure

```
HRM_Models/
└── hrm_text1_{dataset}_output-large/
    ├── config.json
    ├── pytorch_model.bin
    ├── tokenizer.json
    ├── best_model.bin
    └── checkpoint.pth
```
## Environment Setup

### Quick Start with Google Colab

Open the Colab notebook linked above to get started immediately with a pre-configured environment that includes all dependencies.
### Local Installation

```bash
pip install torch transformers datasets tqdm huggingface_hub
pip install langdetect  # Optional: for language filtering
```
### Environment Variables

```bash
# Required for model upload
export HF_TOKEN="your_huggingface_token"

# Optional: custom output path
export HRM_OUTPUT_BASE="/your/custom/path"
```
## Model Variants
The training script produces several model variants:
- HRM-Text1-C4-large: Trained on C4 multilingual
- HRM-Text1-Mixed-large: Trained on balanced dataset mixture
- HRM-Text1-Spanish-large: Spanish language variant
- HRM-Text1-Custom-{name}-large: Custom mixture variants
## Performance

### Model Specifications

- Parameters: ~99M trainable parameters (Large variant, as above)
- Memory Usage: ~4-6GB VRAM for inference
- Training Time: Varies by dataset size and hardware
- Context Length: 512 tokens
### Generation Quality
The model implements sophisticated reasoning through:
- Hierarchical processing of information
- Adaptive computation based on input complexity
- Pondering mechanism for quality-vs-speed trade-offs
## License
This model and training code are released under the Apache 2.0 License.
## Citation

```bibtex
@misc{hrm-text1-2024,
  title={HRM-Text1: Hierarchical Reasoning Model for Text Generation},
  author={DreamWar},
  year={2024},
  url={https://huggingface.co/dreamwar/HRM-Text1}
}
```
## Troubleshooting

### Common Issues

- Memory Errors: Reduce the batch size or enable gradient checkpointing
- Dataset Loading: Ensure a stable internet connection for streaming
- CUDA Errors: Update PyTorch and CUDA drivers
- Language Detection: Install `langdetect` for language filtering
### Support
For issues and questions:
- Check the training script comments for detailed configuration
- Review error messages for specific guidance
- Ensure proper environment setup and dependencies
This model was trained using the HRM (Hierarchical Reasoning Module) architecture with adaptive computation for improved text generation capabilities.