---
license: apache-2.0
language:
- en
tags:
- mixture-of-experts
- mixture-of-recursions
- causal-lm
- custom-architecture
- pytorch
base_model: Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
---

# HybridMoRMoE: Hybrid Mixture-of-Recursions & Mixture-of-Experts

A custom causal language model combining **Mixture-of-Recursions (MoR)** with **Mixture-of-Experts (MoE)** routing, built from scratch in PyTorch and trained via a three-stage pipeline (pre-training → SFT → GRPO).

---

## Architecture

| Hyperparameter | Value |
|---|---|
| Model type | `hybrid_mor_moe` |
| Hidden dim (`d_model`) | 576 |
| Feed-forward dim (`d_ff`) | 1536 |
| Attention heads | 8 |
| Base layers | 6 |
| Shared recursive blocks | 6 |
| Unique last layers | 2 |
| Total transformer depth | 30 |
| Number of experts | 4 |
| Experts per token | 1 |
| Max recursions | 3 |
| Router percentile | 0.70 |
| Sequence length | 4096 |
| Vocabulary size | 151,665 |
| Tokenizer | Qwen2Tokenizer (Qwen2.5 compatible) |

**Key design choices:**

- Shared weight blocks are recursively applied based on a learned complexity score
- A per-token MoE router selects which expert processes each position
- Auxiliary routing loss (`router_aux_loss_coef = 1e-4`) encourages load balance
- Chat template follows the ChatML (`<|im_start|>` / `<|im_end|>`) format

---

## Training Pipeline

The model was trained in three sequential stages on a single NVIDIA P100 (16 GB HBM2):

| Stage | Method | Notes |
|---|---|---|
| 1 | **Pre-training** | Causal LM on open-domain text |
| 2 | **SFT** (Supervised Fine-Tuning) | Instruction following with packing |
| 3 | **GRPO** (Group Relative Policy Optimisation) | Reinforcement learning from preference signal |

Training used FP16 precision throughout (P100 has no BF16 support).

---

## Usage

Because this model uses a **custom architecture** not registered in the Hugging Face Transformers library by default, you must load the modelling code alongside the weights.

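
Before importing the model classes, it can help to confirm that the modelling file is actually importable. The following is a minimal sketch using only the standard library; the helper name `ensure_module_on_path` and the `repo_dir` argument are hypothetical and not part of this repository.

```python
import importlib.util
import sys
from pathlib import Path


def ensure_module_on_path(module_name: str, repo_dir: str) -> bool:
    """Return True if `module_name` is importable.

    If it is not importable yet but `<repo_dir>/<module_name>.py` exists,
    `repo_dir` is prepended to sys.path first. (Hypothetical helper.)
    """
    # Already importable: nothing to do.
    if importlib.util.find_spec(module_name) is not None:
        return True

    # Otherwise look for the source file inside the downloaded repo.
    candidate = Path(repo_dir) / f"{module_name}.py"
    if not candidate.is_file():
        return False

    sys.path.insert(0, str(Path(repo_dir).resolve()))
    importlib.invalidate_caches()  # make the import system see the new entry
    return importlib.util.find_spec(module_name) is not None
```

For example, `ensure_module_on_path("hybrid_mor_moe_training", "./HybridMoRMoE")` would return `True` once the repository has been downloaded into `./HybridMoRMoE` (the directory name here is an assumption).
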
### Quick inference

```python
import torch
from transformers import AutoTokenizer

# 1. Clone / download this repo
# 2. Make sure hybrid_mor_moe_training.py is on your Python path
#    (it registers HybridMoRMoEForCausalLM & HybridMoRMoEConfig with AutoModel)
from hybrid_mor_moe_training import HybridMoRMoEConfig, HybridMoRMoEForCausalLM

model_path = "TorchLLM/HybridMoRMoE"  # or local path

config = HybridMoRMoEConfig.from_pretrained(model_path)
model = HybridMoRMoEForCausalLM.from_pretrained(model_path, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

messages = [
    {"role": "user", "content": "Explain the difference between MoE and dense transformers."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    out = model.simple_generate(
        inputs["input_ids"],
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Environment setup

```bash
pip install torch transformers trl datasets accelerate
```

> **HF_TOKEN**: If you need to access gated datasets during re-training, export your token:
> ```bash
> export HF_TOKEN="your_token_here"
> ```
> Never hard-code tokens in source files.

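
On the Python side, the exported token can be read from the environment instead of being embedded in code. A minimal sketch, where the helper name `get_hf_token` is hypothetical and not part of this repository:

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from the environment (hypothetical helper).

    Raises RuntimeError if the variable is missing or empty, so a
    re-training script fails early with a clear message.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell before re-training."
        )
    return token
```

The returned string can then be passed to whichever dataset or hub call needs authentication, keeping the secret out of the source tree.
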
---

## Repository Structure

```
TorchLLM/HybridMoRMoE/
├── config.json                  # Model architecture config
├── generation_config.json       # Default generation settings
├── model.safetensors            # Trained weights (SafeTensors format)
├── tokenizer.json               # Tokenizer vocabulary & rules
├── tokenizer_config.json        # Tokenizer metadata
├── chat_template.jinja          # ChatML chat template
└── hybrid_mor_moe_training.py   # Full training pipeline source
```

---

## Citation

If you use this model or training code in your research, please cite:

```bibtex
@misc{hybridmormoe2025,
  title  = {HybridMoRMoE: Combining Mixture-of-Recursions and Mixture-of-Experts for Efficient Causal LM},
  author = {Abhishek Gandhi},
  year   = {2026},
  url    = {https://huggingface.co/TorchLLM/HybridMoRMoE}
}
```

---

## License

Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.