
HRM-Text1: Hierarchical Reasoning Model for Text Generation

Open In Colab

A transformer language model built on the Hierarchical Reasoning Module (HRM) architecture and trained on multiple high-quality text datasets. The model uses adaptive computation with a pondering mechanism to improve text generation quality.

Model Architecture

HRM-Text1 implements a novel hierarchical reasoning architecture with the following key components:

  • Model Size: 99M parameters (Large variant)
  • Architecture: Hierarchical Reasoning Module with dual-stream processing
  • Embeddings: 1024 dimensions
  • Attention Heads: 16 heads
  • Feed-Forward: 4096 dimensions
  • Context Length: 512 tokens
  • Vocabulary: 32,128 tokens (T5 tokenizer)

Key Features

  • Adaptive Computation: Pondering mechanism with halt probabilities
  • Dual-Stream Processing: High-level (H) and Low-level (L) reasoning modules
  • SwiGLU Activation: Enhanced non-linear transformations
  • RMSNorm: Improved normalization for stable training
  • Mixed Precision: BF16 training support for NVIDIA Ampere+ GPUs
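The RMSNorm and SwiGLU components listed above follow their standard formulations. As a minimal sketch of the underlying math (plain Python for clarity; the function names here are illustrative, not the repo's actual classes, and the real layers operate on tensors with learned weights):

```python
import math

def rmsnorm(x, g, eps=1e-6):
    # RMSNorm: rescale by the inverse root-mean-square (no mean subtraction,
    # unlike LayerNorm), then apply the learned gain g.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gi * v / rms for gi, v in zip(g, x)]

def swish(v):
    # SiLU / swish activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    # SwiGLU gating: elementwise swish(gate) * up; in the real layer, gate and
    # up are two separate linear projections of the same input.
    return [swish(a) * b for a, b in zip(gate, up)]
```

RMSNorm drops LayerNorm's mean-centering and bias, which is cheaper and tends to train more stably at scale; SwiGLU replaces the usual single-activation MLP with a gated product of two projections.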

Training Configuration

Datasets

The model supports training on multiple high-quality datasets:

  • C4 Multilingual: Common Crawl web text (multilingual)
  • OpenWebText: English web content dataset
  • The Pile: Diverse text from EleutherAI
  • SlimPajama: 627B token dataset (filtered variants available)
  • FineWeb: High-quality web content
  • Spanish: Spanish language subset from C4

Mixed Dataset Training

The training script supports custom dataset mixing ratios:

CUSTOM_MIX_RATIOS = {
    "high_quality": {
        "slimpajama_en": 0.5,  # 50% SlimPajama English
        "pile": 0.3,           # 30% The Pile
        "openwebtext": 0.2     # 20% OpenWebText
    }
}
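Conceptually, mixing ratios like these drive a weighted choice of which dataset the next training example is drawn from. A minimal sketch (the actual script may instead use something like `datasets.interleave_datasets` with `probabilities`; `sample_source` is an illustrative helper, not part of the repo):

```python
import random

CUSTOM_MIX_RATIOS = {
    "high_quality": {
        "slimpajama_en": 0.5,  # 50% SlimPajama English
        "pile": 0.3,           # 30% The Pile
        "openwebtext": 0.2,    # 20% OpenWebText
    }
}

def sample_source(mix, rng=random):
    # Weighted choice of the dataset to draw the next example from.
    names, weights = zip(*mix.items())
    return rng.choices(names, weights=weights, k=1)[0]
```

Over many draws, the empirical mix converges to the configured ratios, so each optimizer step sees the intended blend in expectation.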

Training Hyperparameters

  • Learning Rate: 3e-4 (max) → 1e-5 (min) with cosine annealing
  • Batch Size: 40 (with gradient accumulation steps: 2)
  • Weight Decay: 0.05
  • Optimizer: AdamW with β₁=0.9, β₂=0.95
  • Epochs: 2
  • Mixed Precision: Enabled for compatible hardware
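The cosine-annealed learning-rate schedule above can be written in closed form; this sketch assumes a plain cosine decay from 3e-4 to 1e-5 with no warmup (the training script's exact schedule may differ). With gradient accumulation of 2, the effective batch per optimizer step is 80 if 40 is the per-step micro-batch.

```python
import math

def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=1e-5):
    # Cosine annealing: starts at lr_max (cos(0) = 1) and decays
    # smoothly to lr_min at total_steps (cos(pi) = -1).
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

This matches the behavior of PyTorch's `CosineAnnealingLR` with `eta_min=1e-5`.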

Model Components

HRMBlock Architecture

import torch.nn as nn

class HRMBlock(nn.Module):
    def __init__(self, n_embd, n_head, d_ff, dropout=0.1):
        super().__init__()
        self.norm1 = RMSNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.norm2 = RMSNorm(n_embd)
        self.mlp = SwiGLUMuchPelu(n_embd, d_ff, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # Pre-norm residual: attention, then the SwiGLU MLP
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + self.dropout(a)
        return x + self.dropout(self.mlp(self.norm2(x)))

Pondering Mechanism

The model implements adaptive computation through a halt probability mechanism:

  • Max Steps: 8 reasoning steps
  • Halt Bias: -2.2 (initial)
  • Ponder Loss Weight: 1e-2
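These numbers fit the usual Adaptive Computation Time (ACT) pattern: each reasoning step emits a halt probability, and computation stops once the cumulative probability crosses a threshold or the step budget runs out. Note that sigmoid(-2.2) ≈ 0.1, so the initial halt bias makes the untrained model ponder for many steps. A minimal sketch of the control flow (illustrative only; the model's actual implementation operates on batched tensors):

```python
import math

def halt_prob(logit, bias=-2.2):
    # Per-step halt probability from a logit plus the initial halt bias.
    return 1.0 / (1.0 + math.exp(-(logit + bias)))

def ponder(step_halt_probs, max_steps=8, eps=0.01):
    # ACT-style halting: accumulate halt probability until it reaches 1 - eps
    # or max_steps is hit. Returns (steps_taken, remainder), where remainder
    # is 1 minus the probability mass spent before the final step; the ponder
    # loss penalizes steps_taken + remainder, weighted by ponder_loss_weight.
    cumulative = 0.0
    for n, p in enumerate(step_halt_probs[:max_steps], start=1):
        cumulative += p
        if cumulative >= 1.0 - eps or n == max_steps:
            return n, max(0.0, 1.0 - (cumulative - p))
    return max_steps, max(0.0, 1.0 - cumulative)
```

The ponder loss (weight 1e-2) trades generation quality against compute: a larger weight pushes the model to halt earlier.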

Usage

Quick Start

from transformers import T5Tokenizer
from modeling_hrm_text1 import HRMText1

# Load model and tokenizer
model = HRMText1.from_pretrained("dreamwar/HRM-Text1-{DATASET}-large")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

Training from Scratch

Option 1: Google Colab (Recommended)

# Open the Colab notebook
https://colab.research.google.com/drive/1c4exU-zMt4SuT1kRlwQQXlLPaiazEDCf?usp=sharing

Option 2: Local Training

# Set environment variables
export HRM_OUTPUT_BASE="/path/to/output"
export HF_TOKEN="your_huggingface_token"

# Run training
python hrm_llm_training_c4_b.py

Configuration Options

The training script supports extensive configuration:

# Dataset selection
ACTIVE_DATASET = "mixed"  # Options: "c4", "openwebtext", "pile", "spanish", "mixed"

# Dataset subset percentage
DATASET_SUBSET_PERCENT = 5  # 1-100%

# Custom output path
CUSTOM_BASE_PATH = "/your/custom/path"

# Model parameters (large variant)
MODEL_PARAMS = {
    "n_embd": 1024,
    "n_head": 16,
    "d_ff": 4096,
    "dropout": 0.1,
    "halt_max_steps": 8,
    "ponder_loss_weight": 1e-2,
    "halt_bias_init": -2.2
}
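A rough parameter budget can be derived from these dimensions. The sketch below assumes standard transformer blocks with a SwiGLU MLP (three weight matrices) and ignores norms, biases, the halt head, and the dual-stream layout; the layer count is not listed in this snippet, so `n_layer=4` here is a hypothetical value that happens to land near the stated 99M:

```python
def estimate_params(n_embd=1024, n_head=16, d_ff=4096, vocab=32128, n_layer=4):
    # Rough transformer parameter count: token embeddings plus, per block,
    # the four attention projections and the three SwiGLU MLP matrices.
    emb = vocab * n_embd
    attn = 4 * n_embd * n_embd   # Q, K, V, and output projections
    mlp = 3 * n_embd * d_ff      # SwiGLU uses gate, up, and down matrices
    return emb + n_layer * (attn + mlp)
```

With these assumptions the estimate is ~100M, consistent with the 99M figure for the Large variant.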

Features

Multi-Dataset Support

  • Individual Datasets: Train on single datasets (C4, OpenWebText, Pile, etc.)
  • Mixed Training: Combine multiple datasets with custom ratios
  • Language Filtering: Optional language detection and filtering
  • Streaming: Memory-efficient streaming for large datasets

Training Optimizations

  • Checkpointing: Automatic checkpoint saving and resuming
  • Early Stopping: Validation-based early stopping (patience: 2)
  • Gradient Clipping: Norm clipping at 1.0
  • Mixed Precision: BF16 for memory efficiency
  • Model Compilation: PyTorch 2.0 compilation support
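The early-stopping rule (patience: 2) amounts to tracking the best validation loss and aborting after two consecutive evaluations without improvement. A minimal sketch (`EarlyStopper` is an illustrative helper, not the script's actual class):

```python
class EarlyStopper:
    # Stop training when validation loss fails to improve `patience`
    # evaluations in a row.
    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Combined with checkpointing, this keeps the best-validation weights (`best_model.bin`) while avoiding wasted epochs once the model plateaus.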

Hardware Support

  • CUDA: GPU acceleration with TF32 precision on Ampere+
  • Multi-Platform: Linux, macOS, Windows support
  • Google Colab: Full compatibility with free and pro tiers
  • Memory Management: Automatic DataLoader worker detection

Output Structure

HRM_Models/
├── hrm_text1_{dataset}_output-large/
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── tokenizer.json
│   ├── best_model.bin
│   └── checkpoint.pth

Environment Setup

Quick Start with Google Colab

Click the Colab badge above to get started immediately with a pre-configured environment including all dependencies.

Local Installation

pip install torch transformers datasets tqdm huggingface_hub
pip install langdetect  # Optional: for language filtering

Environment Variables

# Required for model upload
export HF_TOKEN="your_huggingface_token"

# Optional: custom output path
export HRM_OUTPUT_BASE="/your/custom/path"

Model Variants

The training script produces several model variants:

  • HRM-Text1-C4-large: Trained on C4 multilingual
  • HRM-Text1-Mixed-large: Trained on balanced dataset mixture
  • HRM-Text1-Spanish-large: Spanish language variant
  • HRM-Text1-Custom-{name}-large: Custom mixture variants

Performance

Model Specifications

  • Parameters: ~99M trainable parameters (Large variant)
  • Memory Usage: ~4-6GB VRAM for inference
  • Training Time: Varies by dataset size and hardware
  • Context Length: 512 tokens

Generation Quality

The model implements sophisticated reasoning through:

  • Hierarchical processing of information
  • Adaptive computation based on input complexity
  • Pondering mechanism for quality-vs-speed trade-offs

License

This model and training code are released under the Apache 2.0 License.

Citation

@misc{hrm-text1-2024,
  title={HRM-Text1: Hierarchical Reasoning Model for Text Generation},
  author={DreamWar},
  year={2024},
  url={https://huggingface.co/dreamwar/HRM-Text1}
}

Troubleshooting

Common Issues

  1. Memory Errors: Reduce batch size or enable gradient checkpointing
  2. Dataset Loading: Ensure stable internet connection for streaming
  3. CUDA Errors: Update PyTorch and CUDA drivers
  4. Language Detection: Install langdetect for language filtering

Support

For issues and questions:

  • Check the training script comments for detailed configuration
  • Review error messages for specific guidance
  • Ensure proper environment setup and dependencies

This model was trained using the HRM (Hierarchical Reasoning Module) architecture with adaptive computation for improved text generation capabilities.