---
license: apache-2.0
language:
- en
tags:
- mixture-of-experts
- mixture-of-recursions
- causal-lm
- custom-architecture
- pytorch
base_model: Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
---
# HybridMoRMoE: Hybrid Mixture-of-Recursions & Mixture-of-Experts
A custom causal language model combining **Mixture-of-Recursions (MoR)** with **Mixture-of-Experts (MoE)** routing, built from scratch in PyTorch and trained via a three-stage pipeline (pre-training → SFT → GRPO).
---
## Architecture
| Hyperparameter | Value |
|---|---|
| Model type | `hybrid_mor_moe` |
| Hidden dim (`d_model`) | 576 |
| Feed-forward dim (`d_ff`) | 1536 |
| Attention heads | 8 |
| Base layers | 6 |
| Shared recursive blocks | 6 |
| Unique last layers | 2 |
| Total transformer depth | 30 |
| Number of experts | 4 |
| Experts per token | 1 |
| Max recursions | 3 |
| Router percentile | 0.70 |
| Sequence length | 4096 |
| Vocabulary size | 151,665 |
| Tokenizer | Qwen2Tokenizer (Qwen2.5 compatible) |
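The table does not say how `Router percentile` and `Max recursions` interact. Purely as an illustration, the sketch below assumes every token makes one pass through the shared blocks and that, before each further pass, only tokens whose complexity score is at or above the 70th percentile of the still-active tokens continue. This is a threshold form of the expert-choice routing used in Mixture-of-Recursions; the function names and the gating rule are hypothetical, and the real rule lives in `hybrid_mor_moe_training.py`.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def recursion_depths(
    scores: List[float],            # hypothetical learned complexity score per token
    router_percentile: float = 0.70,
    max_recursions: int = 3,
) -> List[int]:
    """HYPOTHETICAL gating rule: number of passes through the shared blocks per token."""
    depth = [1] * len(scores)       # assumption: every token makes at least one pass
    active = list(range(len(scores)))
    for _ in range(1, max_recursions):
        # Threshold is computed only over tokens that survived the previous gate.
        threshold = percentile([scores[t] for t in active], router_percentile)
        active = [t for t in active if scores[t] >= threshold]
        for t in active:
            depth[t] += 1
    return depth


print(recursion_depths([float(x) for x in range(10)]))
```

Under this assumed rule roughly 30% of the active tokens survive each gate, so the deepest recursion is reserved for a small fraction of positions while easy tokens exit after one pass.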
**Key design choices:**
- Shared-weight blocks are applied recursively, with recursion governed by a learned complexity score
- A per-token MoE router selects which expert processes each position
- An auxiliary routing loss (`router_aux_loss_coef = 1e-4`) encourages balanced load across experts
- Chat template follows the ChatML (`<|im_start|>` / `<|im_end|>`) format
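The card names a top-1 per-token router and an auxiliary balance loss with coefficient `1e-4`, but not the loss formula. The sketch below assumes the widely used Switch Transformer formulation, `N * sum_e f_e * P_e`, where `f_e` is the fraction of tokens dispatched to expert `e` and `P_e` is the mean router probability for that expert. It is written in plain Python for readability; the function names are illustrative and the repo's code may use a different variant on PyTorch tensors.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(router_logits: List[List[float]], aux_coef: float = 1e-4) -> Tuple[List[int], float]:
    """Top-1 routing plus a Switch-style load-balancing loss.

    router_logits[t][e] is the router logit of token t for expert e.
    Returns (chosen expert per token, scaled auxiliary loss).
    """
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    # Each token is processed by the single most probable expert (experts per token = 1).
    assignments = [max(range(n_experts), key=lambda e: p[e]) for p in probs]

    # f[e]: fraction of tokens dispatched to expert e.
    f = [assignments.count(e) / n_tokens for e in range(n_experts)]
    # P[e]: mean router probability given to expert e.
    P = [sum(p[e] for p in probs) / n_tokens for e in range(n_experts)]
    # Equals 1.0 (before scaling) when dispatch is perfectly uniform; grows as routing collapses.
    balance = n_experts * sum(fe * pe for fe, pe in zip(f, P))
    return assignments, aux_coef * balance


balanced = [[2.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
collapsed = [[2.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(route_top1(balanced))
print(route_top1(collapsed))
```

When every token picks the same expert the loss rises well above its uniform value, which is the gradient signal that nudges the router toward spreading tokens over all 4 experts.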
---
## Training Pipeline
The model was trained in three sequential stages on a single NVIDIA P100 (16 GB HBM2):
| Stage | Method | Notes |
|---|---|---|
| 1 | **Pre-training** | Causal LM on open-domain text |
| 2 | **SFT** (Supervised Fine-Tuning) | Instruction following with packing |
| 3 | **GRPO** (Group Relative Policy Optimisation) | Reinforcement learning from a preference signal |
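Stage 2 uses packing, but the card does not describe the implementation. A common scheme, sketched here as an assumption with a hypothetical function name, concatenates tokenized examples into one stream separated by an EOS token and slices it into fixed-length rows (for example `seq_len=4096`, the sequence length in the table), so almost no compute is spent on padding. TRL also offers packing through `SFTConfig(packing=True)`, which may have been used instead.

```python
from typing import Iterable, List


def pack_sequences(examples: Iterable[List[int]], seq_len: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized examples (each followed by EOS) and cut the stream
    into rows of exactly seq_len tokens; an incomplete tail is dropped."""
    buffer: List[int] = []
    packed: List[List[int]] = []
    for ids in examples:
        buffer.extend(ids)
        buffer.append(eos_id)  # marks the example boundary inside a packed row
        while len(buffer) >= seq_len:
            packed.append(buffer[:seq_len])
            buffer = buffer[seq_len:]
    return packed


print(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0))
```

Naive packing like this lets attention cross example boundaries within a row unless the model applies a boundary-aware mask; the card does not state which approach this model uses.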
Training used FP16 precision throughout (P100 has no BF16 support).
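The defining step of GRPO, as introduced in DeepSeekMath, is that it replaces a learned critic with a per-prompt baseline: several completions are sampled for the same prompt, each is scored, and each score is standardised within its group. The reward used for this model is not documented, so the rewards below are placeholders. This sketch uses the population standard deviation; implementations (including TRL's `GRPOTrainer`) may differ in such details.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Advantages for G completions of ONE prompt: (r - group mean) / (group std + eps).
    No value network is needed because the group itself provides the baseline."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Placeholder rewards for 4 sampled completions of the same prompt.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions that beat their siblings get positive advantages and are reinforced; if every completion scores the same, all advantages are zero and that prompt contributes no policy-gradient signal.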
---
## Usage
Because this model uses a **custom architecture** not registered in the Hugging Face Transformers library by default, you must load the modelling code alongside the weights.
### Quick inference
```python
import sys

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# 1. Download this repo (weights, tokenizer and modelling code),
#    or set model_path to an existing local clone instead.
model_path = snapshot_download("TorchLLM/HybridMoRMoE")

# 2. Put hybrid_mor_moe_training.py on the Python path; importing it
#    registers HybridMoRMoEForCausalLM & HybridMoRMoEConfig with AutoModel.
sys.path.insert(0, model_path)
from hybrid_mor_moe_training import HybridMoRMoEConfig, HybridMoRMoEForCausalLM

config = HybridMoRMoEConfig.from_pretrained(model_path)
model = HybridMoRMoEForCausalLM.from_pretrained(model_path, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Build a ChatML prompt ending with the assistant header so the model replies.
messages = [
    {"role": "user", "content": "Explain the difference between MoE and dense transformers."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    # simple_generate is a custom method of this model class, not a transformers API.
    out = model.simple_generate(
        inputs["input_ids"],
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
    )

# Skip the prompt tokens so only the generated reply is decoded.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
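When debugging prompts it helps to see roughly what `apply_chat_template(..., add_generation_prompt=True)` produces. The helper below is a hypothetical, simplified ChatML renderer, not a copy of `chat_template.jinja`; a Qwen2.5-style template may also insert a default system message or handle tool calls, so compare its output with `tokenizer.apply_chat_template(messages, tokenize=False)` before relying on it.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render each turn as <|im_start|>{role}\\n{content}<|im_end|>\\n (plain ChatML)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn; the model generates from here until <|im_end|>.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "Hi"}]))
```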
### Environment setup
```bash
pip install torch transformers trl datasets accelerate
```
> **HF_TOKEN**: If you need to access gated datasets during re-training, export your token:
> ```bash
> export HF_TOKEN="your_token_here"
> ```
> Never hard-code tokens in source files.
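A small guard like the one below (hypothetical helper name) keeps the token out of source files and fails early with a clear message if it was never exported. `huggingface_hub` also picks up `HF_TOKEN` from the environment automatically, so passing it explicitly is often unnecessary.

```python
import os


def require_hf_token() -> str:
    """Read the Hugging Face token from the environment instead of hard-coding it."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run `export HF_TOKEN=...` before re-training.")
    return token
```

It can then be passed where an explicit token is wanted, e.g. `datasets.load_dataset("<gated-dataset>", token=require_hf_token())`, where the dataset name is a placeholder.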
---
## Repository Structure
```
TorchLLM/HybridMoRMoE/
├── config.json                 # Model architecture config
├── generation_config.json      # Default generation settings
├── model.safetensors           # Trained weights (SafeTensors format)
├── tokenizer.json              # Tokenizer vocabulary & rules
├── tokenizer_config.json       # Tokenizer metadata
├── chat_template.jinja         # ChatML chat template
└── hybrid_mor_moe_training.py  # Full training pipeline source
```
---
## Citation
If you use this model or training code in your research, please cite:
```bibtex
@misc{hybridmormoe2026,
title = {HybridMoRMoE: Combining Mixture-of-Recursions and Mixture-of-Experts for Efficient Causal LM},
author = {Abhishek Gandhi},
year = {2026},
url = {https://huggingface.co/TorchLLM/HybridMoRMoE}
}
```
---
## License
Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.