dreamwar/HRM-Text1-C4-large


dreamwar committed on Aug 18, 2025

Commit 9a9b9a8 (verified) · 1 Parent(s): 32ee9e7

Update README.md


README.md +193 -125


@@ -1,199 +1,267 @@
----
-library_name: transformers
-tags: []
----
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

+# HRM-Text1: Hierarchical Reasoning Model for Text Generation
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1c4exU-zMt4SuT1kRlwQQXlLPaiazEDCf?usp=sharing)
+A 99M-parameter transformer language model built on a Hierarchical Reasoning Module (HRM) architecture and trainable on several large text datasets. It uses adaptive computation: a pondering mechanism decides how many reasoning steps to run, with the aim of improving text generation quality.
+## Model Architecture
+**HRM-Text1** implements a novel hierarchical reasoning architecture with the following key components:
+- **Model Size**: 99M parameters (Large variant)
+- **Architecture**: Hierarchical Reasoning Module with dual-stream processing
+- **Embeddings**: 1024 dimensions
+- **Attention Heads**: 16 heads
+- **Feed-Forward**: 4096 dimensions
+- **Context Length**: 512 tokens
+- **Vocabulary**: 32,128 tokens (T5 tokenizer)
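As a rough check on the 99M figure, the listed dimensions can be tallied by hand. The sketch below assumes one H block and one L block, an output head that is not tied to the embedding, and no biases or norm weights. None of these details are stated in the README, so treat the breakdown as an estimate rather than the model's actual layout.

```python
# Rough parameter tally from the dimensions listed above.
# ASSUMPTIONS (not stated in the README): two HRM blocks (one H, one L),
# an untied output head, biases and RMSNorm weights ignored.
VOCAB, N_EMBD, D_FF = 32_128, 1024, 4096
N_BLOCKS = 2  # assumed: one high-level and one low-level block

embedding = VOCAB * N_EMBD
attention = 4 * N_EMBD * N_EMBD  # Q, K, V and output projections
swiglu = 3 * N_EMBD * D_FF       # gate, up and down projections
block = attention + swiglu
lm_head = VOCAB * N_EMBD         # assumed untied from the embedding

total = embedding + N_BLOCKS * block + lm_head
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # 99,352,576 parameters (~99M)
```

Under these assumptions the embedding and output head account for roughly two thirds of the total, which is typical for small models with a 32k vocabulary.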
+### Key Features
+- **Adaptive Computation**: Pondering mechanism with halt probabilities
+- **Dual-Stream Processing**: High-level (H) and Low-level (L) reasoning modules
+- **SwiGLU Activation**: Enhanced non-linear transformations
+- **RMSNorm**: Improved normalization for stable training
+- **Mixed Precision**: BF16 training support for NVIDIA Ampere+ GPUs
+## Training Configuration
+### Datasets
+The training script supports the following datasets:
+- **C4 Multilingual**: Common Crawl web text (multilingual)
+- **OpenWebText**: English web content dataset
+- **The Pile**: Diverse text from EleutherAI
+- **SlimPajama**: 627B token dataset (filtered variants available)
+- **FineWeb**: High-quality web content
+- **Spanish**: Spanish language subset from C4
+### Mixed Dataset Training
+The training script supports custom dataset mixing ratios:
+```python
+CUSTOM_MIX_RATIOS = {
+    "high_quality": {
+        "slimpajama_en": 0.5,  # 50% SlimPajama English
+        "pile": 0.3,           # 30% The Pile
+        "openwebtext": 0.2     # 20% OpenWebText
+    }
+}
+```
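How the training script applies these ratios is not shown here. One plausible reading is that each training example is drawn from a source dataset with probability equal to its ratio. The sketch below illustrates that reading with the standard library only; `validate_mix` and `sample_sources` are hypothetical helper names, not functions from the training script.

```python
import random

# Ratios copied from the "high_quality" mix above.
MIX = {"slimpajama_en": 0.5, "pile": 0.3, "openwebtext": 0.2}

def validate_mix(mix):
    """Raise if any ratio is negative or the ratios do not sum to 1."""
    if any(r < 0 for r in mix.values()):
        raise ValueError("ratios must be non-negative")
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

def sample_sources(mix, n, seed=0):
    """Pick the source dataset for each of n examples, weighted by ratio."""
    validate_mix(mix)
    rng = random.Random(seed)
    names = list(mix)
    return rng.choices(names, weights=[mix[k] for k in names], k=n)

draws = sample_sources(MIX, 10_000)
for name in MIX:
    # Observed share of each source; should be close to the configured ratio.
    print(name, draws.count(name) / len(draws))
```

With streaming datasets, the Hugging Face `datasets` library offers `interleave_datasets` with a `probabilities` argument for the same purpose; whether the training script uses it is not stated.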
+### Training Hyperparameters
+- **Learning Rate**: 3e-4 (max) → 1e-5 (min) with cosine annealing
+- **Batch Size**: 40 (with gradient accumulation steps: 2)
+- **Weight Decay**: 0.05
+- **Optimizer**: AdamW with β₁=0.9, β₂=0.95
+- **Epochs**: 2
+- **Mixed Precision**: Enabled for compatible hardware
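To make the learning-rate schedule concrete, the sketch below evaluates a plain cosine annealing curve between the two listed rates. It assumes no warmup phase and a hypothetical run length of 1,000 optimizer steps, since neither is given above. It also assumes that 40 is the per-step batch size, so that two accumulation steps give an effective batch of 80.

```python
import math

LR_MAX, LR_MIN = 3e-4, 1e-5
BATCH_SIZE, GRAD_ACCUM_STEPS = 40, 2

def cosine_lr(step, total_steps, lr_max=LR_MAX, lr_min=LR_MIN):
    """Cosine annealing from lr_max at step 0 down to lr_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Assumed: 40 is the per-step batch, so accumulation doubles the effective size.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS

total_steps = 1_000  # hypothetical run length, for illustration only
for step in (0, 500, 1_000):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

In PyTorch the same curve is available as `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=1e-5`.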
+## Model Components
+### HRMBlock Architecture
+```python
+class HRMBlock(nn.Module):
+    def __init__(self, n_embd, n_head, d_ff, dropout=0.1):
+        super().__init__()
+        self.norm1 = RMSNorm(n_embd)
+        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
+        self.norm2 = RMSNorm(n_embd)
+        self.mlp = SwiGLUMuchPelu(n_embd, d_ff, dropout)
+        self.dropout = nn.Dropout(dropout)
+```
+### Pondering Mechanism
+The model implements adaptive computation through a halt probability mechanism:
+- **Max Steps**: 8 reasoning steps
+- **Halt Bias**: -2.2 (initial)
+- **Ponder Loss Weight**: 1e-2
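The README does not spell out the halting rule. The sketch below assumes an Adaptive Computation Time style scheme, in which each step emits a halt probability and computation stops once the cumulative probability crosses a threshold or the step limit is reached. The threshold of 0.99 and the function name `ponder_steps` are illustrative assumptions; only the step limit and the initial bias come from the values above.

```python
import math

HALT_MAX_STEPS = 8
HALT_BIAS_INIT = -2.2

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ponder_steps(halt_logits, max_steps=HALT_MAX_STEPS, threshold=0.99):
    """Return (steps_used, per-step weights) for one token, ACT-style.

    Computation stops once the cumulative halt probability reaches
    `threshold` (an assumed value) or the step limit is hit. The final
    step receives the leftover mass so that the weights sum to 1.
    """
    logits = halt_logits[:max_steps]
    weights, cumulative = [], 0.0
    for step, logit in enumerate(logits, start=1):
        p = sigmoid(logit)
        if step == len(logits) or cumulative + p >= threshold:
            weights.append(1.0 - cumulative)  # remainder goes to the final step
            return step, weights
        weights.append(p)
        cumulative += p
    return 0, weights  # no logits supplied

# At initialization every halt logit is roughly the bias, so the halt
# probability per step is about 0.1 and the model uses all 8 steps.
print(f"initial halt probability: {sigmoid(HALT_BIAS_INIT):.3f}")
steps, weights = ponder_steps([HALT_BIAS_INIT] * HALT_MAX_STEPS)
print(steps, [round(w, 3) for w in weights])
```

Under this reading, the negative initial bias makes the model start out using the full step budget, and the ponder loss weight of 1e-2 would scale a penalty on the number of steps that is added to the language-modeling loss. The exact penalty used by the training script is not shown.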
+## Usage
+### Quick Start
+```python
+from transformers import T5Tokenizer
+from modeling_hrm_text1 import HRMText1  # requires modeling_hrm_text1.py on the import path
+# Load model and tokenizer (repo ID of this variant; swap the dataset name for other variants)
+model = HRMText1.from_pretrained("dreamwar/HRM-Text1-C4-large")
+tokenizer = T5Tokenizer.from_pretrained("t5-small")
+# Generate text
+prompt = "The future of artificial intelligence"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7)
+text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(text)
+### Training from Scratch
+**Option 1: Google Colab (Recommended)**
+Open the Colab notebook: https://colab.research.google.com/drive/1c4exU-zMt4SuT1kRlwQQXlLPaiazEDCf?usp=sharing
+**Option 2: Local Training**
+```bash
+# Set environment variables
+export HRM_OUTPUT_BASE="/path/to/output"
+export HF_TOKEN="your_huggingface_token"
+# Run training
+python hrm_llm_training_c4_b.py
+```
+### Configuration Options
+The training script supports extensive configuration:
+```python
+# Dataset selection
+ACTIVE_DATASET = "mixed"  # Options: "c4", "openwebtext", "pile", "spanish", "mixed"
+# Dataset subset percentage
+DATASET_SUBSET_PERCENT = 5  # 1-100%
+# Custom output path
+CUSTOM_BASE_PATH = "/your/custom/path"
+# Model parameters (large variant)
+MODEL_PARAMS = {
+    "n_embd": 1024,
+    "n_head": 16,
+    "d_ff": 4096,
+    "dropout": 0.1,
+    "halt_max_steps": 8,
+    "ponder_loss_weight": 1e-2,
+    "halt_bias_init": -2.2
+}
+```
+## Features
+### Multi-Dataset Support
+- **Individual Datasets**: Train on single datasets (C4, OpenWebText, Pile, etc.)
+- **Mixed Training**: Combine multiple datasets with custom ratios
+- **Language Filtering**: Optional language detection and filtering
+- **Streaming**: Memory-efficient streaming for large datasets
+### Training Optimizations
+- **Checkpointing**: Automatic checkpoint saving and resuming
+- **Early Stopping**: Validation-based early stopping (patience: 2)
+- **Gradient Clipping**: Norm clipping at 1.0
+- **Mixed Precision**: BF16 for memory efficiency
+- **Model Compilation**: PyTorch 2.0 compilation support
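Validation-based early stopping with a patience of 2 can be sketched as a small counter that resets whenever the validation loss improves. The class below is a hypothetical illustration, not the training script's implementation; `min_delta` and the loss values are made up for the example.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` evaluations."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # hypothetical: smallest change that counts
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one validation result; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

stopper = EarlyStopping(patience=2)
history = [2.9, 2.7, 2.8, 2.75, 2.6]  # made-up validation losses
# Index of the first evaluation at which training would stop, or None.
stopped_at = next((i for i, loss in enumerate(history) if stopper.step(loss)), None)
print(stopped_at)  # 3
```

Training stops at the fourth evaluation because neither 2.8 nor 2.75 beats the best loss of 2.7, so the later improvement to 2.6 is never reached. For the gradient clipping listed above, PyTorch provides `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`.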
+### Hardware Support
+- **CUDA**: GPU acceleration with TF32 precision on Ampere+
+- **Multi-Platform**: Linux, macOS, Windows support
+- **Google Colab**: Full compatibility with free and pro tiers
+- **Memory Management**: Automatic DataLoader worker detection
+## Output Structure
+```
+HRM_Models/
+├── hrm_text1_{dataset}_output-large/
+│   ├── config.json
+│   ├── pytorch_model.bin
+│   ├── tokenizer.json
+│   ├── best_model.bin
+│   └── checkpoint.pth
+```
+## Environment Setup
+### Quick Start with Google Colab
+Click the Colab badge above to get started immediately with a pre-configured environment including all dependencies.
+### Local Installation
+```bash
+pip install torch transformers datasets tqdm huggingface_hub
+pip install langdetect  # Optional: for language filtering
+```
+### Environment Variables
+```bash
+# Required for model upload
+export HF_TOKEN="your_huggingface_token"
+# Optional: custom output path
+export HRM_OUTPUT_BASE="/your/custom/path"
+```
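A script can read these variables defensively, so that a missing token fails early with a clear message. The sketch below shows one way to do that. The helper names are hypothetical, and the fallback to a local `HRM_Models` directory is an assumption based on the Output Structure listing, not on the training script.

```python
import os
from pathlib import Path

def resolve_output_dir(dataset, env=None):
    """Build the per-dataset output directory shown under Output Structure.

    Falls back to ./HRM_Models when HRM_OUTPUT_BASE is unset; that default
    is an assumption, not taken from the training script.
    """
    env = os.environ if env is None else env
    base = Path(env.get("HRM_OUTPUT_BASE") or "HRM_Models")
    return base / f"hrm_text1_{dataset}_output-large"

def require_token(env=None):
    """Return HF_TOKEN, or raise a clear error before any upload is attempted."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; it is needed to upload the model")
    return token

print(resolve_output_dir("c4", env={"HRM_OUTPUT_BASE": "/your/custom/path"}))
```

Passing `env` explicitly keeps the helpers easy to test without touching the real process environment.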
+## Model Variants
+The training script produces several model variants:
+- **HRM-Text1-C4-large**: Trained on C4 multilingual
+- **HRM-Text1-Mixed-large**: Trained on balanced dataset mixture
+- **HRM-Text1-Spanish-large**: Spanish language variant
+- **HRM-Text1-Custom-{name}-large**: Custom mixture variants
+## Performance
+### Model Specifications
+- **Parameters**: ~99M trainable parameters (matching the Model Size listed under Model Architecture)
+- **Memory Usage**: ~4-6GB VRAM for inference
+- **Training Time**: Varies by dataset size and hardware
+- **Context Length**: 512 tokens
+### Generation Quality
+The architecture is built around three mechanisms:
+- Hierarchical processing of information
+- Adaptive computation based on input complexity
+- Pondering mechanism for quality-vs-speed trade-offs
+## License
+This model and training code are released under the Apache 2.0 License.
+## Citation
+```bibtex
+@misc{hrm-text1-2024,
+  title={HRM-Text1: Hierarchical Reasoning Model for Text Generation},
+  author={DreamWar},
+  year={2024},
+  url={https://huggingface.co/dreamwar/HRM-Text1}
+}
+```
+## Troubleshooting
+### Common Issues
+1. **Memory Errors**: Reduce batch size or enable gradient checkpointing
+2. **Dataset Loading**: Ensure stable internet connection for streaming
+3. **CUDA Errors**: Update PyTorch and CUDA drivers
+4. **Language Detection**: Install `langdetect` for language filtering
+### Support
+For issues and questions:
+- Check the training script comments for detailed configuration
+- Review error messages for specific guidance
+- Ensure proper environment setup and dependencies
+---
+*This model was trained using the HRM (Hierarchical Reasoning Module) architecture with adaptive computation for improved text generation capabilities.*