---
language: en
library_name: transformers
license: mit
tags:
- finetuning
- lora
- qlora
- unsloth
- gpu
- distributed
datasets:
- wikitext
pipeline_tag: text-generation
model-index:
- name: Humigence
results:
- task:
type: text-generation
dataset:
name: WikiText-2
type: wikitext
metrics:
- type: loss
value: 1.50
---
# 🧠 Humigence CLI
**Your AI. Your pipeline. Zero code.**
A complete MLOps suite built for makers, teams, and enterprises. Humigence provides zero-config, GPU-aware fine-tuning with surgical precision and complete reproducibility.
## ✨ Key Features
- 🎯 **Interactive Wizard**: Step-by-step configuration with Basic/Advanced modes
- 🖥️ **Smart GPU Detection**: Automatic detection and selection of available GPUs
- 🚀 **Dual-GPU Training**: Multi-GPU support with Unsloth + TorchRun
- 🧪 **Training Recipes**: QLoRA (4-bit), LoRA (FP16/BF16), Full Fine-tuning
- 📊 **Intelligent Batching**: Auto-fit batch size to available VRAM
- 🔄 **Complete Reproducibility**: Config snapshots and reproduce scripts
- 📈 **Built-in Evaluation**: Curated prompts and quality gates
- 📦 **Artifact Export**: Structured outputs with run summaries
## 🚀 Quick Start
### Prerequisites
- **GPU**: NVIDIA GPU with CUDA support (RTX 5090, RTX 4080, etc.)
- **RAM**: 8GB+ recommended
- **Storage**: 10GB+ for models and datasets
- **Python**: 3.8+ with PyTorch
### Installation
```bash
# Clone the repository
git clone https://github.com/your-username/humigence.git
cd humigence
# Install dependencies
pip install -r requirements.txt
# Set up Unsloth (required for training)
python3 training/unsloth/setup_humigence_unsloth.py
# Launch the interactive wizard
python3 cli/main.py
```
### Basic Usage
```bash
# Launch the interactive wizard
python3 cli/main.py
# The wizard will guide you through:
# 1. Model selection
# 2. Dataset configuration
# 3. Training parameters
# 4. GPU selection (single or multi-GPU)
# 5. Launch training
```
## 🎯 Training Workflow
### 1. Interactive Setup
The Humigence wizard guides you through:
- **Setup Mode**: Basic (essential config) or Advanced (full control)
- **Hardware Detection**: Automatic GPU, CPU, and memory detection
- **Model Selection**: Choose from supported models or custom paths
- **Dataset Loading**: Auto-detection from `~/humigence_data/` or custom paths
- **Training Recipe**: QLoRA, LoRA, or Full Fine-tuning
- **GPU Selection**: Single-GPU auto-selection or multi-GPU prompting
### 2. GPU Selection
Humigence intelligently handles GPU selection:
- **Single GPU**: Automatically selects and uses the available GPU
- **Multiple GPUs**: Prompts you to choose:
```
🔧 Training Mode:
> Multi-GPU Training (all available GPUs)
  Single GPU Training (choose specific GPU)
```
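The GPU-count branching above can be sketched in plain Python. This is an illustrative model of the decision flow, not the actual Humigence internals; the function name and the injected `prompt` callable are assumptions for the sake of a testable example.

```python
def choose_training_mode(gpu_count, prompt=None):
    """Return 'single' or 'multi' based on the detected GPU count.

    With one GPU there is nothing to ask; with several, defer to the
    interactive prompt (injected here as a callable for testability),
    defaulting to multi-GPU training.
    """
    if gpu_count <= 0:
        raise RuntimeError("No CUDA-capable GPU detected")
    if gpu_count == 1:
        return "single"  # auto-select the only GPU, no prompt needed
    # Multiple GPUs: ask the user which mode to use
    return prompt() if prompt else "multi"

print(choose_training_mode(1))  # one GPU: auto-selected, no prompt
print(choose_training_mode(2))  # two GPUs: defaults to multi-GPU
```

In the real wizard the count would come from hardware detection (e.g. `torch.cuda.device_count()`), and the prompt would be the interactive menu shown above.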
### 3. Training Execution
```
🚀 Humigence Training Starting...
✅ Configuration Loaded: [all settings]
🖥️ GPU Detection: 2x RTX 5090 detected
🔧 Training Mode: Multi-GPU Training
📦 Loading model: Qwen/Qwen2.5-0.5B
✅ LoRA adapters applied
📚 Loading dataset: wikitext2 (10,000 samples)
🚀 Starting training with TorchRun...
✅ Training complete - adapters saved.
```
## 📊 Supported Models
- **Qwen/Qwen2.5-0.5B**: ~0.5B parameters (recommended for testing)
- **microsoft/phi-2**: 2.7B parameters
- **TinyLlama/TinyLlama-1.1B-Chat-v1.0**: 1.1B parameters
- **Custom Models**: Any Hugging Face model ID or local path
## 🗂️ Dataset Support
- **JSONL Format**: Line-by-line JSON with instruction/output pairs
- **Auto-Detection**: Scans `~/humigence_data/` directory
- **Custom Paths**: Specify any local dataset file
- **Sample Datasets**: Includes demo datasets for testing
### Dataset Format
```jsonl
{"instruction": "What is machine learning?", "output": "Machine learning is a subset of artificial intelligence..."}
{"instruction": "Explain quantum computing", "output": "Quantum computing uses quantum mechanical phenomena..."}
```
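A minimal loader for this format can be sketched with the standard library alone. The function name is illustrative (the real utilities live in `utils/dataset_loader.py`); the `instruction`/`output` field names match the example above.

```python
import json

def load_instruction_dataset(path):
    """Load a JSONL file of {"instruction", "output"} records,
    skipping blank lines and raising on malformed rows."""
    records = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            row = json.loads(line)  # raises ValueError on invalid JSON
            missing = {"instruction", "output"} - row.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
            records.append(row)
    return records
```

Validating each line eagerly, with the line number in the error, makes malformed datasets fail fast instead of partway through a training run.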
## 🖥️ Hardware Requirements
### Minimum Requirements
- **GPU**: NVIDIA GPU with 8GB+ VRAM
- **RAM**: 16GB+ system RAM
- **Storage**: 20GB+ free space
### Recommended Setup
- **GPU**: RTX 4080/4090/5090 or better
- **RAM**: 32GB+ system RAM
- **Storage**: 50GB+ free space
### Multi-GPU Support
- **Dual-GPU**: RTX 5090 + RTX 5090 (tested)
- **Memory**: 16GB+ VRAM per GPU recommended
- **Training**: Automatic TorchRun distribution
## ๐Ÿ“ Project Structure
```
humigence/
โ”œโ”€โ”€ cli/
โ”‚ โ”œโ”€โ”€ main.py # Main CLI entry point
โ”‚ โ”œโ”€โ”€ config_wizard.py # Interactive configuration wizard
โ”‚ โ””โ”€โ”€ lora_wizard.py # LoRA-specific wizard
โ”œโ”€โ”€ training/
โ”‚ โ””โ”€โ”€ unsloth/ # Unsloth integration
โ”‚ โ”œโ”€โ”€ wizard.py # Unsloth training wizard
โ”‚ โ””โ”€โ”€ train_lora_dual.py # Multi-GPU training script
โ”œโ”€โ”€ pipelines/
โ”‚ โ””โ”€โ”€ lora_trainer.py # Training pipeline
โ”œโ”€โ”€ utils/
โ”‚ โ”œโ”€โ”€ device.py # Hardware detection
โ”‚ โ”œโ”€โ”€ dataset_loader.py # Dataset utilities
โ”‚ โ””โ”€โ”€ validators.py # Data validation
โ”œโ”€โ”€ config/
โ”‚ โ””โ”€โ”€ default_config.json # Default configuration
โ””โ”€โ”€ runs/ # Training outputs
โ””โ”€โ”€ humigence/
โ”œโ”€โ”€ config.snapshot.json
โ”œโ”€โ”€ adapters/ # LoRA weights
โ””โ”€โ”€ artifacts.zip # Complete export
```
## 🔧 Configuration
### Basic Mode (Recommended)
Essential configuration with sensible defaults:
- **Learning Rate**: 2e-4
- **Epochs**: 1
- **Gradient Accumulation**: 4
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
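For reference, the Basic Mode defaults above might correspond to a config fragment like this. The key names and nesting are illustrative assumptions; the actual schema lives in `config/default_config.json`:

```json
{
  "training": {
    "learning_rate": 2e-4,
    "epochs": 1,
    "gradient_accumulation_steps": 4
  },
  "lora": {
    "rank": 16,
    "alpha": 32
  }
}
```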
### Advanced Mode
Full control over all parameters:
- LoRA configuration (rank, alpha, dropout)
- Training hyperparameters
- Data processing options
- Evaluation settings
## 🚀 Training Modes
### Single-GPU Training
Selected automatically when one GPU is detected:
```
🔧 Single GPU detected - using GPU 0: RTX 5090
🚀 Launching single-GPU training...
```
### Multi-GPU Training
When multiple GPUs are detected, the wizard prompts for a mode:
```
🔧 2 GPUs detected - choose training mode
> Multi-GPU Training (all available GPUs)
  Single GPU Training (choose specific GPU)
```
## 📈 Evaluation & Monitoring
### Built-in Evaluation
- **Curated Prompts**: 5 diverse evaluation questions
- **Model Inference**: Generation with temperature sampling
- **Quality Gates**: Loss thresholds and evaluation metrics
- **Status Tracking**: ACCEPTED.txt or REJECTED.txt files
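The quality-gate status tracking above can be sketched as follows. The loss-threshold criterion, the default value, and the function name are assumptions for illustration; only the `ACCEPTED.txt`/`REJECTED.txt` marker files come from the description above.

```python
from pathlib import Path

def apply_quality_gate(run_dir, final_loss, threshold=1.8):
    """Write ACCEPTED.txt or REJECTED.txt into the run directory,
    depending on whether the final loss beats the threshold."""
    run_dir = Path(run_dir)
    accepted = final_loss <= threshold
    marker = run_dir / ("ACCEPTED.txt" if accepted else "REJECTED.txt")
    # Record the numbers that drove the decision, for later auditing
    marker.write_text(f"final_loss={final_loss:.4f} threshold={threshold}\n")
    return accepted
```

Writing the decision as a plain file in the run directory keeps the gate result visible to shell tools (`ls runs/humigence/`) without parsing any logs.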
### Run Monitoring
```bash
# View training progress
tail -f runs/humigence/training.log
# Check evaluation results
cat runs/humigence/eval_results.jsonl
# View run summary
cat runs/humigence/run_summary.json
```
## 🔄 Reproducibility
Every training run generates:
- **Config Snapshot**: Complete configuration in JSON
- **Reproduce Script**: One-click rerun capability
- **Artifact Archive**: Complete export of all outputs
- **Run Summary**: Structured metadata for tracking
```bash
# Rerun any training
./runs/humigence/reproduce.sh
# Or use the config directly
python3 training/unsloth/train_lora_dual.py --config runs/humigence/config.snapshot.json
```
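A run summary like the one described above could be produced with a few lines of standard-library Python. The field names here are illustrative assumptions, not the actual `run_summary.json` schema:

```python
import json
import time
from pathlib import Path

def write_run_summary(run_dir, config, final_loss):
    """Persist structured run metadata next to the other artifacts."""
    summary = {
        "finished_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model": config.get("model"),
        "recipe": config.get("recipe"),
        "final_loss": final_loss,
    }
    path = Path(run_dir) / "run_summary.json"
    path.write_text(json.dumps(summary, indent=2))
    return summary
```

Keeping the summary as flat JSON makes it trivial to aggregate across runs with `jq` or a pandas one-liner.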
## 🛠️ Development
### Dependencies
Core dependencies are constrained to known-good version ranges:
```txt
transformers>=4.41.0,<5.0.0
torch>=2.1.0
unsloth @ git+https://github.com/unslothai/unsloth.git
rich>=13.0.0
inquirer>=3.1.0
```
### Local Development
```bash
# Install in development mode
pip install -e .
# Run tests
python3 -m pytest tests/
# Run specific test
python3 test_gpu_selection.py
```
## 🤝 Contributing
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.
### Quick Contribution Guide
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Add tests if applicable
5. Commit your changes: `git commit -m 'Add amazing feature'`
6. Push to the branch: `git push origin feature/amazing-feature`
7. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- [Unsloth](https://github.com/unslothai/unsloth) for fast LoRA training
- [Hugging Face](https://huggingface.co/) for the Transformers and PEFT libraries
- [Microsoft](https://github.com/microsoft) for the original LoRA research
- The open-source ML community
## 🆚 Comparison with Other Tools
| Feature | Humigence CLI | Other Tools |
|---------|---------------|-------------|
| **Setup** | Interactive wizard | Manual config |
| **GPU Detection** | Automatic | Manual |
| **Multi-GPU** | Built-in TorchRun | Complex setup |
| **Reproducibility** | Complete snapshots | Partial |
| **Evaluation** | Built-in prompts | External tools |
| **Artifacts** | Structured export | Manual collection |
## ๐Ÿ› Troubleshooting
### Common Issues
**GPU not detected:**
```bash
# Check CUDA installation
python3 -c "import torch; print(torch.cuda.is_available())"
# Check GPU visibility
nvidia-smi
```
**Out of memory:**
- Reduce the batch size in your config (or rely on auto-fit batching)
- Switch to the QLoRA (4-bit) recipe, which substantially lowers VRAM usage
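As a quick recovery step, the batch size can be halved directly in the JSON config. The `batch_size` key name and the default of 8 are assumptions for illustration; check your actual config for the real key:

```python
import json

def halve_batch_size(config_path):
    """Load a JSON config, halve the batch size (never below 1),
    and write the file back in place."""
    with open(config_path) as f:
        cfg = json.load(f)
    # NOTE: "batch_size" is a hypothetical key; adapt to your schema
    cfg["batch_size"] = max(1, cfg.get("batch_size", 8) // 2)
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg["batch_size"]
```

Halving repeatedly until training fits is a crude but effective bisection on VRAM usage.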
**Training fails:**
```bash
# Check logs
cat runs/humigence/training.log
# Verify dataset format
head -5 ~/humigence_data/your_dataset.jsonl
```
### Getting Help
- **Issues**: [GitHub Issues](https://github.com/your-username/humigence/issues)
- **Discussions**: [GitHub Discussions](https://github.com/your-username/humigence/discussions)
- **Documentation**: [Wiki](https://github.com/your-username/humigence/wiki)
## 🗺️ Roadmap
### Current Features ✅
- Interactive configuration wizard
- Single and multi-GPU training
- QLoRA and LoRA support
- Built-in evaluation
- Complete reproducibility
### Coming Soon 🚧
- RAG implementation
- EnterpriseGPT integration
- Batch inference
- Context length optimization
- Web UI interface
- Model serving
### Future Features 🔮
- Distributed training across nodes
- Advanced evaluation metrics
- Model compression
- Deployment automation
---
**Built with ❤️ for the AI community**
*Humigence: Your AI. Your pipeline. Zero code.*
## 📊 Stats
![GitHub stars](https://img.shields.io/github/stars/your-username/humigence?style=social)
![GitHub forks](https://img.shields.io/github/forks/your-username/humigence?style=social)
![GitHub issues](https://img.shields.io/github/issues/your-username/humigence)
![GitHub license](https://img.shields.io/github/license/your-username/humigence)
![Python version](https://img.shields.io/badge/python-3.8%2B-blue)
![PyTorch](https://img.shields.io/badge/PyTorch-2.1%2B-red)
![CUDA](https://img.shields.io/badge/CUDA-11.8%2B-green)