| # ULTRATHINK | |
| <p align="center"> | |
| <img src="docs/images/pp.jpg" alt="ULTRATHINK Logo" width="250" /> | |
| </p> | |
| <p align="center"> | |
| <strong>🚀 Production-ready training framework for advanced Large Language Models</strong> | |
| </p> | |
| <p align="center"> | |
| <a href="https://colab.research.google.com/github/vediyappanm/UltraThinking-LLM-Training/blob/main/deep/docs/colab.ipynb"> | |
| <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> | |
| </a> | |
| <a href="https://github.com/vediyappanm/UltraThinking-LLM-Training/actions"> | |
| <img src="https://github.com/vediyappanm/UltraThinking-LLM-Training/workflows/CI/badge.svg" alt="CI Status"/> | |
| </a> | |
| <a href="https://www.python.org/downloads/"> | |
| <img src="https://img.shields.io/badge/python-3.9+-blue.svg" alt="Python 3.9+"/> | |
| </a> | |
| <a href="LICENSE"> | |
| <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"/> | |
| </a> | |
| <a href="https://github.com/vediyappanm/UltraThinking-LLM-Training/stargazers"> | |
| <img src="https://img.shields.io/github/stars/vediyappanm/UltraThinking-LLM-Training?style=social" alt="GitHub stars"/> | |
| </a> | |
| </p> | |
| <p align="center"> | |
| <a href="https://pytorch.org/"> | |
| <img src="https://img.shields.io/badge/PyTorch-2.0+-EE4C2C?logo=pytorch&logoColor=white" alt="PyTorch"/> | |
| </a> | |
| <a href="https://huggingface.co/"> | |
| <img src="https://img.shields.io/badge/🤗-Hugging%20Face-yellow" alt="Hugging Face"/> | |
| </a> | |
| <a href="https://www.docker.com/"> | |
| <img src="https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker&logoColor=white" alt="Docker"/> | |
| </a> | |
| <a href="https://github.com/vediyappanm/UltraThinking-LLM-Training/issues"> | |
| <img src="https://img.shields.io/github/issues/vediyappanm/UltraThinking-LLM-Training" alt="Issues"/> | |
| </a> | |
| <a href="https://github.com/vediyappanm/UltraThinking-LLM-Training/pulls"> | |
| <img src="https://img.shields.io/github/issues-pr/vediyappanm/UltraThinking-LLM-Training" alt="Pull Requests"/> | |
| </a> | |
| </p> | |
| <p align="center"> | |
| <a href="#-quick-start">Quick Start</a> • | |
| <a href="#-key-features">Features</a> • | |
| <a href="#-documentation">Documentation</a> • | |
| <a href="docs/BENCHMARKS.md">Benchmarks</a> • | |
| <a href="docs/COMPARISON.md">Comparisons</a> • | |
| <a href="docs/ROADMAP.md">Roadmap</a> • | |
| <a href="#-contributing">Contributing</a> | |
| </p> | |
| --- | |
| ULTRATHINK provides a complete, modular stack for training custom LLMs with state-of-the-art architectures, distributed training, and comprehensive monitoring. | |
| ## 🎯 Why ULTRATHINK? | |
| **Train state-of-the-art LLMs in 10 lines of code** - From prototype to production in minutes, not days. | |
| ```bash | |
| python train_ultrathink.py \ | |
| --dataset c4 --streaming \ | |
| --hidden_size 768 --num_layers 12 \ | |
| --enable_moe --enable_dre \ | |
| --use_amp --gradient_checkpointing | |
| ``` | |
| ### 🌟 What Makes Us Different | |
| | Feature | ULTRATHINK | Others | | |
| |---------|-----------|--------| | |
| | **Setup Time** | ⚡ 5 minutes | 30-120 minutes | | |
| | **Lines to Train** | 📝 ~10 | 50-100+ | | |
| | **MoE Support** | ✅ Native | ❌ or Limited | | |
| | **Dynamic Reasoning** | ✅ Unique | ❌ None | | |
| | **Constitutional AI** | ✅ Built-in | ❌ None | | |
| | **Documentation** | 📚 Comprehensive | Varies | | |
| **[See detailed comparison →](docs/COMPARISON.md)** | |
| ## ✨ Key Features | |
| - 🏗️ **Modern Architecture** - GQA, RoPE, SwiGLU, Flash Attention, RMSNorm | |
| - 🧠 **Advanced Components** - Mixture-of-Experts, Dynamic Reasoning Engine, Constitutional AI | |
| - 📊 **Production Monitoring** - MLflow, W&B, TensorBoard integration | |
| - ⚡ **Optimized Training** - DeepSpeed ZeRO, FSDP, gradient checkpointing, AMP | |
| - 🧪 **Fully Tested** - Unit & integration tests with pytest | |
| - 🐳 **Docker Support** - Ready-to-use containers for training and inference | |
| - 📚 **Complete Docs** - Step-by-step guides for all experience levels | |
| **[View benchmarks and performance metrics →](docs/BENCHMARKS.md)** | |
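To make the Mixture-of-Experts entry above more concrete, here is a minimal sketch of top-k expert routing, the gating scheme commonly used in MoE layers. It is written in plain Python for illustration only; the function names `top_k_route` and `moe_output` are hypothetical, and this README does not state that ULTRATHINK's MoE layer uses exactly this scheme.

```python
import math


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    The weights are a softmax over the selected logits only, so they sum to 1.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits, highest first (ties keep original order).
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(expert_outputs, routes):
    """Mix per-expert output vectors using the routing weights."""
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for idx, weight in routes:
        for d in range(dim):
            mixed[d] += weight * expert_outputs[idx][d]
    return mixed
```

Only the selected experts contribute to the output, which is why an MoE model can hold many more parameters than it activates for any single token.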
| ## 🚀 Quick Start | |
| ### Installation | |
| ```bash | |
| # Clone repository | |
| git clone https://github.com/vediyappanm/UltraThinking-LLM-Training.git | |
| cd UltraThinking-LLM-Training/deep | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| ``` | |
| ### Training Examples | |
| **Tiny Model (CPU-friendly, for testing):** | |
| ```bash | |
| python train_ultrathink.py \ | |
| --dataset wikitext \ | |
| --hidden_size 256 --num_layers 2 --num_heads 4 \ | |
| --batch_size 2 --max_samples 1000 \ | |
| --num_epochs 1 | |
| ``` | |
| **Small Model (GPU recommended):** | |
| ```bash | |
| python train_advanced.py --config configs/train_small.yaml | |
| ``` | |
| **With Advanced Features:** | |
| ```bash | |
| python train_ultrathink.py \ | |
| --dataset c4 --streaming \ | |
| --hidden_size 768 --num_layers 12 --num_heads 12 \ | |
| --enable_moe --enable_dre --enable_constitutional \ | |
| --use_amp --gradient_checkpointing \ | |
| --use_mlflow | |
| ``` | |
| ### Docker | |
| ```bash | |
| # Run Gradio web interface | |
| docker compose up | |
| # Or build and run manually | |
| docker build -t ultrathink:latest . | |
| docker run -p 7860:7860 ultrathink:latest | |
| ``` | |
| ### Testing | |
| ```bash | |
| # Run all tests | |
| pytest | |
| # Run with coverage | |
| pytest --cov=src --cov-report=html | |
| # Quick smoke test | |
| python tests/smoke_test.py | |
| ``` | |
| ## 📚 Documentation | |
| ### 🚀 Getting Started | |
| - **[Training Quickstart](docs/TRAINING_QUICKSTART.md)** - Get started in 5 minutes | |
| - **[Advanced Training Guide](ADVANCED_TRAINING_GUIDE.md)** - Deep dive into all features | |
| - **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions | |
| - **[Google Colab](docs/colab.md)** - Train in the cloud for free | |
| ### 📊 Performance & Comparisons | |
| - **[Benchmarks](docs/BENCHMARKS.md)** - Performance metrics and results | |
| - **[Framework Comparison](docs/COMPARISON.md)** - vs GPT-NeoX, Megatron-LM, Axolotl | |
| - **[Model Card](docs/MODEL_CARD.md)** - Model specifications | |
| ### 🏗️ Architecture & Development | |
| - **[Architecture Overview](ARCHITECTURE_OVERVIEW.md)** - Visual system diagrams | |
| - **[Project Structure](docs/PROJECT_STRUCTURE.md)** - Understanding the codebase | |
| - **[Roadmap](docs/ROADMAP.md)** - Future plans and features | |
| ### 🎓 Training Guides | |
| - [Small Models](docs/training_small.md) - Train on limited hardware | |
| - [DeepSpeed Integration](docs/training_deepspeed.md) - Distributed training setup | |
| - [Dataset Configuration](docs/datasets.md) - Using custom datasets | |
| ### 🤝 Community | |
| - **[Contributing](CONTRIBUTING.md)** - Contribution guidelines | |
| - **[Code of Conduct](CODE_OF_CONDUCT.md)** - Community standards | |
| - **[Changelog](CHANGELOG.md)** - Version history | |
| **[📖 Full Documentation Index](docs/README.md)** | |
| ## 📁 Project Structure | |
| ``` | |
| deep/ | |
| ├── train_ultrathink.py # Main training script | |
| ├── train_advanced.py # YAML config-based training | |
| ├── app_gradio.py # Web UI for inference | |
| ├── src/ | |
| │   ├── models/ # UltraThink, MoE, DRE, architecture | |
| │   ├── data/ # Datasets, tokenization, validation | |
| │   ├── training/ # Optimizers, distributed, RLHF | |
| │   ├── monitoring/ # Metrics and system monitoring | |
| │   ├── security/ # Input validation and safety | |
| │   └── evaluation/ # Benchmarks and metrics | |
| ├── tests/ # Unit and integration tests | |
| ├── configs/ # YAML configuration files | |
| ├── scripts/ # Utilities (profiling, inference) | |
| └── docs/ # Documentation and guides | |
| ``` | |
| See **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** for detailed explanations. | |
| ## 🔥 Training Examples | |
| ### Small Dataset Training | |
| ```bash | |
| # WikiText-2 (fast iteration) | |
| python train_ultrathink.py \ | |
| --dataset wikitext \ | |
| --hidden_size 512 --num_layers 6 --num_heads 8 \ | |
| --batch_size 4 --num_epochs 3 \ | |
| --use_mlflow | |
| ``` | |
| ### Production Training (C4 Dataset) | |
| ```bash | |
| # Streaming C4 with all optimizations | |
| python train_ultrathink.py \ | |
| --dataset c4 --dataset_subset en --streaming \ | |
| --hidden_size 768 --num_layers 12 --num_heads 12 \ | |
| --batch_size 2 --gradient_accumulation_steps 64 \ | |
| --learning_rate 3e-4 --warmup_steps 5000 \ | |
| --use_amp --gradient_checkpointing \ | |
| --max_seq_length 1024 \ | |
| --output_dir ./outputs/c4_production | |
| ``` | |
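The command above combines a small per-step batch with heavy gradient accumulation. The sketch below works out what that means per optimizer update, using the flag values from the command. The `num_devices` parameter is an assumption (the command shows a single process), and the token count is an upper bound that assumes every sequence is filled to `--max_seq_length`.

```python
def tokens_per_optimizer_step(batch_size, grad_accum_steps, seq_length, num_devices=1):
    """Return (sequences, tokens) consumed by one optimizer update.

    Gradients from `grad_accum_steps` micro-batches are summed before the
    optimizer steps, so the effective batch is their product (times the
    number of data-parallel devices, if any).
    """
    sequences = batch_size * grad_accum_steps * num_devices
    return sequences, sequences * seq_length


# Values taken from the command above:
# --batch_size 2 --gradient_accumulation_steps 64 --max_seq_length 1024
sequences, tokens = tokens_per_optimizer_step(2, 64, 1024)
print(f"{sequences} sequences, {tokens:,} tokens per optimizer step")
```

With these settings each update sees 128 sequences, or at most 131,072 tokens, while only 2 sequences need to fit in GPU memory at a time.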
| ### Using Configuration Files | |
| ```bash | |
| # Small model (4-8GB GPU) | |
| python train_advanced.py --config configs/train_small.yaml | |
| # Medium model (16-32GB GPU) | |
| python train_advanced.py --config configs/train_medium.yaml | |
| # Large model (40GB+ GPU) | |
| python train_advanced.py --config configs/train_large.yaml | |
| ``` | |
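If you launch runs from a script, the GPU-memory comments above can be turned into a small lookup. This is only a sketch: `pick_config` is a hypothetical helper, not part of the repository, and the thresholds are read directly from the comments. Memory sizes that fall between the documented ranges (for example 12 GB) fall back to the smaller config, which is an assumption on the conservative side.

```python
# (minimum GPU memory in GB, config path), largest first.
# Thresholds come from the comments in the commands above.
CONFIGS = [
    (40, "configs/train_large.yaml"),
    (16, "configs/train_medium.yaml"),
    (4, "configs/train_small.yaml"),
]


def pick_config(gpu_memory_gb):
    """Return the largest config whose minimum memory fits, or None."""
    for min_gb, path in CONFIGS:
        if gpu_memory_gb >= min_gb:
            return path
    return None  # below 4 GB: none of the documented configs applies


for gb in (8, 24, 80):
    print(gb, "GB ->", pick_config(gb))
```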
| ## 🐳 Docker Usage | |
| **Web Interface (Gradio):** | |
| ```bash | |
| docker compose up | |
| # Visit http://localhost:7860 | |
| ``` | |
| **Custom Training:** | |
| ```bash | |
| docker run -v "$(pwd)/outputs:/app/outputs" ultrathink:latest \ | |
| python train_ultrathink.py \ | |
| --dataset wikitext \ | |
| --hidden_size 256 --num_layers 2 \ | |
| --output_dir /app/outputs/my_model | |
| ``` | |
| **GPU Training:** | |
| ```bash | |
| docker run --gpus all \ | |
| -v "$(pwd)/outputs:/app/outputs" \ | |
| ultrathink:latest \ | |
| python train_ultrathink.py --use_amp | |
| ``` | |
| ## 🤝 Contributing | |
| We welcome contributions! Please see: | |
| - **[CONTRIBUTING.md](CONTRIBUTING.md)** - Guidelines and setup | |
| - **[CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)** - Community standards | |
| - **[Roadmap](docs/ROADMAP.md)** - See what we're building next | |
| ### 🌟 Star History | |
| If you find ULTRATHINK useful, please consider giving us a star! ⭐ | |
| [Star History Chart](https://star-history.com/#vediyappanm/UltraThinking-LLM-Training&Date) | |
| ## 📊 Model Specifications | |
| | Size | Parameters | Layers | Hidden | Context | Min GPU | | |
| |------|-----------|--------|--------|---------|---------| | |
| | Tiny | 125M | 12 | 768 | 2048 | 6GB | | |
| | Small | 350M | 24 | 1024 | 4096 | 16GB | | |
| | Medium | 760M | 24 | 1536 | 4096 | 24GB | | |
| | Large | 1.3B | 32 | 2048 | 8192 | 40GB | | |
| See **[MODEL_CARD.md](MODEL_CARD.md)** for complete specifications. | |
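As a rough sanity check on the Parameters column, the sketch below applies the standard dense-transformer approximation: about `12 * layers * hidden^2` weights in the blocks (attention plus a 4x-wide MLP) and `vocab * hidden` for a tied embedding matrix. The vocabulary size of 50,257 (the GPT-2 tokenizer) is an assumption, as the table does not give one, and the formula ignores MoE experts, GQA, SwiGLU sizing, norms, and biases. It is not the framework's own parameter count.

```python
def estimate_params(num_layers, hidden_size, vocab_size=50257):
    """Approximate parameter count of a dense decoder-only transformer.

    Per layer: 4*d^2 for attention (Q, K, V, output) + 8*d^2 for a 4x MLP.
    Plus vocab*d for token embeddings shared with the output head.
    """
    block_params = 12 * num_layers * hidden_size ** 2
    embedding_params = vocab_size * hidden_size
    return block_params + embedding_params


# (layers, hidden) pairs copied from the table above.
for name, layers, hidden in [
    ("Tiny", 12, 768),
    ("Small", 24, 1024),
    ("Medium", 24, 1536),
    ("Large", 32, 2048),
]:
    print(f"{name:<6} ~{estimate_params(layers, hidden) / 1e6:,.0f}M")
```

Under these assumptions the Tiny, Small, and Medium rows land close to the listed sizes (about 124M, 353M, and 757M). The Large row comes out near 1.7B rather than 1.3B, so treat the estimate as a ballpark and rely on the model card for exact figures.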
| ## 📄 License | |
| MIT License - see [LICENSE](LICENSE) for details. | |
| ## 📝 Citation | |
| If you use ULTRATHINK in your research or project, please cite: | |
| ```bibtex | |
| @software{ultrathink2025, | |
| title={ULTRATHINK: Advanced LLM Training Framework with Mixture-of-Experts and Dynamic Reasoning}, | |
| author={ULTRATHINK Team}, | |
| year={2025}, | |
| url={https://github.com/vediyappanm/UltraThinking-LLM-Training}, | |
| version={1.0.0} | |
| } | |
| ``` | |
| ## 🌟 Community & Support | |
| <p align="center"> | |
| <a href="https://github.com/vediyappanm/UltraThinking-LLM-Training/discussions"> | |
| <img src="https://img.shields.io/badge/Discussions-Join%20Us-blue?logo=github" alt="Discussions"/> | |
| </a> | |
| <a href="https://github.com/vediyappanm/UltraThinking-LLM-Training/issues"> | |
| <img src="https://img.shields.io/badge/Issues-Report%20Bug-red?logo=github" alt="Issues"/> | |
| </a> | |
| <a href="https://twitter.com/intent/tweet?text=Check%20out%20ULTRATHINK%20-%20Advanced%20LLM%20Training%20Framework&url=https://github.com/vediyappanm/UltraThinking-LLM-Training"> | |
| <img src="https://img.shields.io/badge/Twitter-Share-1DA1F2?logo=twitter&logoColor=white" alt="Twitter"/> | |
| </a> | |
| </p> | |
| ### 💬 Get Help | |
| - **[GitHub Discussions](https://github.com/vediyappanm/UltraThinking-LLM-Training/discussions)** - Ask questions, share ideas | |
| - **[Issue Tracker](https://github.com/vediyappanm/UltraThinking-LLM-Training/issues)** - Report bugs, request features | |
| - **[Troubleshooting Guide](docs/TROUBLESHOOTING.md)** - Common issues and solutions | |
| - **[FAQ](docs/faq.md)** - Frequently asked questions | |
| ### 🌟 Share Your Work | |
| Built something cool with ULTRATHINK? We'd love to hear about it! | |
| - Open a discussion to share your project | |
| - Submit a PR to add your model to our showcase | |
| - Tweet about it and tag us | |
| ### 📢 Stay Updated | |
| - ⭐ **Star this repo** to bookmark it and show your support | |
| - 🔔 **Watch releases** to get notified about new features | |
| - 🐦 **Follow on Twitter** for updates | |
| --- | |
| <p align="center"> | |
| <strong>Made with ❤️ by the ULTRATHINK Team</strong> | |
| </p> | |
| <p align="center"> | |
| <a href="#ultrathink">Back to Top ↑</a> | |
| </p> | |