
HRM-Text1: Hierarchical Reasoning Model for Text Generation

Open In Colab

A transformer language model built on the Hierarchical Reasoning Module (HRM) architecture and trained on multiple high-quality text datasets. The model uses adaptive computation with a pondering mechanism to improve text generation quality.

Model Architecture

HRM-Text1 implements a novel hierarchical reasoning architecture with the following key components:

  • Model Size: 99M parameters (Large variant)
  • Architecture: Hierarchical Reasoning Module with dual-stream processing
  • Embeddings: 1024 dimensions
  • Attention Heads: 16 heads
  • Feed-Forward: 4096 dimensions
  • Context Length: 512 tokens
  • Vocabulary: 32,128 tokens (T5 tokenizer)

Key Features

  • Adaptive Computation: Pondering mechanism with halt probabilities
  • Dual-Stream Processing: High-level (H) and Low-level (L) reasoning modules
  • SwiGLU Activation: Enhanced non-linear transformations
  • RMSNorm: Improved normalization for stable training
  • Mixed Precision: BF16 training support for NVIDIA Ampere+ GPUs

Training Configuration

Datasets

The model supports training on multiple high-quality datasets:

  • C4 Multilingual: Common Crawl web text (multilingual)
  • OpenWebText: English web content dataset
  • The Pile: Diverse text from EleutherAI
  • SlimPajama: 627B token dataset (filtered variants available)
  • FineWeb: High-quality web content
  • Spanish: Spanish language subset from C4

Mixed Dataset Training

The training script supports custom dataset mixing ratios:

CUSTOM_MIX_RATIOS = {
    "high_quality": {
        "slimpajama_en": 0.5,  # 50% SlimPajama English
        "pile": 0.3,           # 30% The Pile
        "openwebtext": 0.2     # 20% OpenWebText
    }
}
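One way such ratios can drive training is ratio-weighted interleaving: each next example is drawn from one of the streams with probability equal to its mix weight. The sketch below is hypothetical (the `streams` dict and toy contents are stand-ins, not the training script's actual API):

```python
import random

# Mix ratios matching the "high_quality" preset above.
MIX = {"slimpajama_en": 0.5, "pile": 0.3, "openwebtext": 0.2}

def mixed_stream(streams, ratios, seed=0):
    """Yield (source_name, example) pairs, sampling sources by weight."""
    rng = random.Random(seed)
    names, weights = list(ratios), list(ratios.values())
    while True:
        name = rng.choices(names, weights=weights, k=1)[0]
        yield name, next(streams[name])

# Toy iterators standing in for real dataset streams.
streams = {name: iter(range(10**9)) for name in MIX}
gen = mixed_stream(streams, MIX)
sample = [next(gen) for _ in range(10)]
sources = [name for name, _ in sample]
```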

Training Hyperparameters

  • Learning Rate: 3e-4 (max) → 1e-5 (min) with cosine annealing
  • Batch Size: 40 (with gradient accumulation steps: 2)
  • Weight Decay: 0.05
  • Optimizer: AdamW with β₁=0.9, β₂=0.95
  • Epochs: 2
  • Mixed Precision: Enabled for compatible hardware
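The optimizer and schedule above can be sketched as follows; `total_steps` is illustrative (the real value depends on dataset size and epochs), and the schedule is applied per step by writing the learning rate into the optimizer's parameter groups:

```python
import math
import torch

# Stated hyperparameters: AdamW (β1=0.9, β2=0.95, weight decay 0.05),
# cosine annealing from 3e-4 down to 1e-5.
max_lr, min_lr, total_steps = 3e-4, 1e-5, 10_000  # total_steps is illustrative
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=max_lr, betas=(0.9, 0.95), weight_decay=0.05)

def cosine_lr(step):
    """Cosine decay from max_lr at step 0 to min_lr at total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Inside the training loop, before each opt.step():
for group in opt.param_groups:
    group["lr"] = cosine_lr(0)
```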

Model Components

HRMBlock Architecture

class HRMBlock(nn.Module):
    def __init__(self, n_embd, n_head, d_ff, dropout=0.1):
        super().__init__()
        self.norm1 = RMSNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.norm2 = RMSNorm(n_embd)
        self.mlp = SwiGLUMuchPelu(n_embd, d_ff, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # Pre-norm residual block (forward pass sketched here for clarity;
        # the repository implementation may differ in details)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + self.dropout(attn_out)
        return x + self.dropout(self.mlp(self.norm2(x)))

Pondering Mechanism

The model implements adaptive computation through a halt probability mechanism:

  • Max Steps: 8 reasoning steps
  • Halt Bias: -2.2 (initial)
  • Ponder Loss Weight: 1e-2
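An illustrative sketch of the halting loop (not the repository's exact code): each step emits a halt probability via a small head whose bias starts at -2.2, halting mass accumulates across steps, and a ponder loss weighted by 1e-2 penalizes the number of steps taken.

```python
import torch

torch.manual_seed(0)
halt_max_steps, halt_bias, ponder_loss_weight = 8, -2.2, 1e-2
state = torch.randn(4, 16)            # (batch, hidden) toy state
halt_head = torch.nn.Linear(16, 1)    # emits a halt logit per example

cum_halt = torch.zeros(4)
steps_taken = torch.zeros(4)
for _ in range(halt_max_steps):
    state = torch.tanh(state)         # stand-in for one HRM reasoning step
    p = torch.sigmoid(halt_head(state).squeeze(-1) + halt_bias)
    still_running = (cum_halt < 1.0).float()
    cum_halt = cum_halt + p * still_running
    steps_taken = steps_taken + still_running

# Penalize computation: more steps -> larger ponder loss.
ponder_loss = ponder_loss_weight * steps_taken.mean()
```

Note that sigmoid(-2.2) ≈ 0.1, so the initial bias makes the model take many steps early in training before it learns when to halt.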

Usage

Quick Start

from transformers import T5Tokenizer
from modeling_hrm_text1 import HRMText1

# Load model and tokenizer (replace {DATASET} with a variant name, e.g. C4 or Mixed)
model = HRMText1.from_pretrained("dreamwar/HRM-Text1-{DATASET}-large")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Generate text (do_sample=True so that temperature takes effect)
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)

Training from Scratch

Option 1: Google Colab (Recommended)

# Open the Colab notebook
https://colab.research.google.com/drive/1c4exU-zMt4SuT1kRlwQQXlLPaiazEDCf?usp=sharing

Option 2: Local Training

# Set environment variables
export HRM_OUTPUT_BASE="/path/to/output"
export HF_TOKEN="your_huggingface_token"

# Run training
python hrm_llm_training_c4_b.py

Configuration Options

The training script supports extensive configuration:

# Dataset selection
ACTIVE_DATASET = "mixed"  # Options: "c4", "openwebtext", "pile", "spanish", "mixed"

# Dataset subset percentage
DATASET_SUBSET_PERCENT = 5  # 1-100%

# Custom output path
CUSTOM_BASE_PATH = "/your/custom/path"

# Model parameters (large variant)
MODEL_PARAMS = {
    "n_embd": 1024,
    "n_head": 16,
    "d_ff": 4096,
    "dropout": 0.1,
    "halt_max_steps": 8,
    "ponder_loss_weight": 1e-2,
    "halt_bias_init": -2.2
}
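A back-of-the-envelope parameter count from the config above is consistent with the stated ~99M. The layer count is not part of MODEL_PARAMS, so `n_layer=4` here is an assumption; SwiGLU is counted as three d_model × d_ff projections.

```python
def estimate_params(n_embd=1024, d_ff=4096, vocab=32128, n_layer=4):
    """Rough transformer parameter count (norms and biases omitted)."""
    embed = vocab * n_embd          # token embedding table (~32.9M)
    attn = 4 * n_embd * n_embd      # Q, K, V, and output projections
    mlp = 3 * n_embd * d_ff         # SwiGLU: gate, up, and down projections
    return embed + n_layer * (attn + mlp)

print(f"{estimate_params() / 1e6:.1f}M")  # 100.0M
```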

Features

Multi-Dataset Support

  • Individual Datasets: Train on single datasets (C4, OpenWebText, Pile, etc.)
  • Mixed Training: Combine multiple datasets with custom ratios
  • Language Filtering: Optional language detection and filtering
  • Streaming: Memory-efficient streaming for large datasets
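Streaming keeps memory flat because examples are produced one at a time instead of materializing the whole corpus. A minimal local sketch using a PyTorch IterableDataset (the token ids are toy data standing in for a real tokenized text stream):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class TokenStream(IterableDataset):
    """Yields one tokenized example at a time; nothing is held in memory."""
    def __init__(self, n_examples, seq_len=512):
        self.n_examples, self.seq_len = n_examples, seq_len

    def __iter__(self):
        for _ in range(self.n_examples):
            yield torch.randint(0, 32128, (self.seq_len,))

# Batch size matching the training configuration above.
loader = DataLoader(TokenStream(100), batch_size=40)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([40, 512])
```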

Training Optimizations

  • Checkpointing: Automatic checkpoint saving and resuming
  • Early Stopping: Validation-based early stopping (patience: 2)
  • Gradient Clipping: Norm clipping at 1.0
  • Mixed Precision: BF16 for memory efficiency
  • Model Compilation: PyTorch 2.0 compilation support
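Two of these optimizations can be sketched on a toy model: gradient-norm clipping at 1.0 before the optimizer step, and bundling model plus optimizer state into a resumable checkpoint. The filename mirrors the output structure shown later; the linear model is a stand-in.

```python
import torch

model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
# Clip the global gradient norm to 1.0 (returns the pre-clip norm).
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
opt.zero_grad()

# Checkpoint both model and optimizer state so training can resume exactly.
checkpoint = {"model": model.state_dict(), "optimizer": opt.state_dict()}
# torch.save(checkpoint, "checkpoint.pth")   # reload later with torch.load
```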

Hardware Support

  • CUDA: GPU acceleration with TF32 precision on Ampere+
  • Multi-Platform: Linux, macOS, Windows support
  • Google Colab: Full compatibility with free and pro tiers
  • Memory Management: Automatic DataLoader worker detection
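The device and precision handling described above typically looks like the following sketch: enable TF32 matmuls when a CUDA device is present, and pick BF16 autocast only where the hardware supports it.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    # TF32 speeds up fp32 matmuls on Ampere and newer GPUs.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

# BF16 mixed precision only where supported; otherwise stay in fp32.
use_bf16 = device == "cuda" and torch.cuda.is_bf16_supported()
amp_dtype = torch.bfloat16 if use_bf16 else torch.float32
print(device, amp_dtype)
```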

Output Structure

HRM_Models/
├── hrm_text1_{dataset}_output-large/
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── tokenizer.json
│   ├── best_model.bin
│   └── checkpoint.pth

Environment Setup

Quick Start with Google Colab

Click the Colab badge above to get started immediately with a pre-configured environment including all dependencies.

Local Installation

pip install torch transformers datasets tqdm huggingface_hub
pip install langdetect  # Optional: for language filtering

Environment Variables

# Required for model upload
export HF_TOKEN="your_huggingface_token"

# Optional: custom output path
export HRM_OUTPUT_BASE="/your/custom/path"
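Inside the script, these variables would typically be read with `os.environ`; the default path below is illustrative, not necessarily the script's actual fallback:

```python
import os

# HF_TOKEN is required only for uploading; HRM_OUTPUT_BASE is optional.
hf_token = os.environ.get("HF_TOKEN")
output_base = os.environ.get("HRM_OUTPUT_BASE", "./HRM_Models")
```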

Model Variants

The training script produces several model variants:

  • HRM-Text1-C4-large: Trained on C4 multilingual
  • HRM-Text1-Mixed-large: Trained on balanced dataset mixture
  • HRM-Text1-Spanish-large: Spanish language variant
  • HRM-Text1-Custom-{name}-large: Custom mixture variants

Performance

Model Specifications

  • Parameters: ~99M trainable parameters (99.8M)
  • Memory Usage: ~4-6GB VRAM for inference
  • Training Time: Varies by dataset size and hardware
  • Context Length: 512 tokens

Generation Quality

The model implements sophisticated reasoning through:

  • Hierarchical processing of information
  • Adaptive computation based on input complexity
  • Pondering mechanism for quality-vs-speed trade-offs

License

This model and training code are released under the Apache 2.0 License.

Citation

@misc{hrm-text1-2024,
  title={HRM-Text1: Hierarchical Reasoning Model for Text Generation},
  author={DreamWar},
  year={2024},
  url={https://huggingface.co/dreamwar/HRM-Text1}
}

Troubleshooting

Common Issues

  1. Memory Errors: Reduce batch size or enable gradient checkpointing
  2. Dataset Loading: Ensure stable internet connection for streaming
  3. CUDA Errors: Update PyTorch and CUDA drivers
  4. Language Detection: Install langdetect for language filtering

Support

For issues and questions:

  • Check the training script comments for detailed configuration
  • Review error messages for specific guidance
  • Ensure proper environment setup and dependencies

This model was trained using the HRM (Hierarchical Reasoning Module) architecture with adaptive computation for improved text generation capabilities.
