# Fine-Tuning Guide for Zenith-7B

This guide covers fine-tuning Zenith-7B on custom datasets, with a focus on Qwen2.5-Coder-7B as the base model.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Data Preparation](#data-preparation)
3. [Training Methods](#training-methods)
4. [Advanced Configuration](#advanced-configuration)
5. [Evaluation](#evaluation)
6. [Deployment](#deployment)

## Prerequisites

- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 16GB+ VRAM for full fine-tuning
- 8GB+ VRAM for LoRA
- 4GB+ VRAM for QLoRA

### Install Dependencies

```bash
cd Zenith/V1/7B
pip install -r requirements.txt
```

## Data Preparation

### Format

Zenith expects data in JSON format with the following structure:

```json
[
  {
    "instruction": "Write a function to calculate factorial",
    "input": "",
    "output": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    "thoughts": "I need to handle base case and recursion",
    "emotion": "neutral",
    "frustration_level": 0.1
  },
  {
    "instruction": "Explain how a neural network works",
    "input": "",
    "output": "A neural network is...",
    "thoughts": "Start with biological analogy, then explain layers",
    "emotion": "explanatory",
    "frustration_level": 0.0
  }
]
```

**Fields:**

- `instruction`: Task description
- `input`: Optional additional context
- `output`: Expected response
- `thoughts`: Chain-of-thought reasoning (optional but recommended)
- `emotion`: Emotion label (optional, for EQ training)
- `frustration_level`: Float in [0, 1] indicating user frustration (optional)

### Dataset Sources

Recommended datasets for fine-tuning:

1. **Code**: `code_search_net`, `CodeXGLUE`, `APPS`, `HumanEval`
2. **Reasoning**: `CoT Collection`, `OpenThoughts`, `GSM8K`, `MATH`
3. **Emotional Intelligence**: Custom dialogues with emotion annotations

### Preprocessing

Use the built-in processor:

```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="your-dataset",
    streaming=True,
    max_seq_length=8192,
    quality_filtering=True,
    curriculum_learning=True,
)

processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```

## Training Methods

### 1. Full Fine-Tuning

Trains all parameters. Best quality, but requires the most VRAM.

```bash
python train.py \
    --base_model Qwen/Qwen2.5-Coder-7B \
    --train_data ./data/train.json \
    --epochs 3 \
    --batch_size 4 \
    --learning_rate 2e-5 \
    --mixed_precision bf16
```

**VRAM requirements**: ~16GB for 7B with sequence length 2048

### 2. LoRA (Low-Rank Adaptation)

Freezes the base model and trains low-rank adapter matrices. Much more efficient.

```bash
python train.py \
    --base_model Qwen/Qwen2.5-Coder-7B \
    --train_data ./data/train.json \
    --use_lora \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.1 \
    --epochs 3 \
    --batch_size 8 \
    --learning_rate 1e-4
```

**VRAM requirements**: ~8GB for 7B with LoRA r=16

**Target modules**: query, key, value, and output projections, plus the MLP gates

### 3. QLoRA (Quantized LoRA)

4-bit quantized base model + LoRA. Minimal VRAM usage.

```bash
python train.py \
    --base_model Qwen/Qwen2.5-Coder-7B \
    --train_data ./data/train.json \
    --use_qlora \
    --use_lora \
    --lora_r 8 \
    --epochs 3 \
    --batch_size 8 \
    --learning_rate 1e-4
```

**VRAM requirements**: ~4GB for 7B with 4-bit quantization

## Advanced Configuration

### Enabling MoE

Convert dense layers to Mixture of Experts:

```bash
python train.py \
    --use_moe \
    --num_experts 8 \
    --moe_top_k 2 \
    --moe_load_balancing_weight 0.01
```

**Note**: MoE increases model capacity but also memory usage. Consider combining it with LoRA.
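To make the MoE flags concrete, here is a pure-Python sketch of the top-k routing step that `--moe_top_k` controls. This is illustrative only: the function names are not from the Zenith codebase, and a real MoE layer operates on batched tensors and also adds a load-balancing auxiliary loss weighted by `--moe_load_balancing_weight`.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their
    gate weights so they sum to 1 (the role of --moe_top_k)."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]
```

With `k=2`, each token would be sent to its two highest-scoring experts and their outputs combined with the renormalized weights, so only 2 of the 8 experts run per token.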
### EQ Adapter (Emotional Intelligence)

Add emotional intelligence capabilities:

```bash
python train.py \
    --use_eq_adapter \
    --eq_loss_weight 0.1 \
    --emotion_loss_weight 0.1 \
    --frustration_loss_weight 0.1
```

Requires data with `emotion` and `frustration_level` fields.

### Curriculum Learning

Progressive training from easy to hard samples:

```bash
python train.py \
    --use_curriculum \
    --curriculum_stages foundation reasoning code full
```

Stages:

- `foundation`: High-quality, well-structured samples
- `reasoning`: Chain-of-thought examples
- `code`: Programming tasks
- `full`: Complete dataset

### Quality Filtering

Automatically filter out low-quality samples:

```bash
python train.py \
    --use_quality_filter \
    --min_quality_score 0.6
```

Filters based on:

- Length appropriateness
- Language detection
- Repetition
- Coherence
- Structure

### Data Augmentation

Synthetic data augmentation:

```bash
python train.py \
    --use_augmentation \
    --augmentation_types synonym back_translation code_perturbation
```

Available augmentations:

- `synonym`: Replace words with synonyms
- `back_translation`: Translate to another language and back
- `code_perturbation`: Variable renaming, formatting changes
- `paraphrasing`: Rephrase instructions
- `noise_injection`: Add small amounts of noise

### Ring Attention (for Long Contexts)

Enable ring attention for 32K+ contexts:

```bash
python train.py \
    --use_ring_attention \
    --ring_attention_chunk_size 8192 \
    --ring_attention_overlap 2048
```

**Note**: Only for setups with sufficient VRAM. Not recommended for 7B on consumer GPUs.
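As a rough illustration of what the chunking flags above mean, the following stdlib-only sketch splits a token sequence into overlapping chunks in the spirit of `--ring_attention_chunk_size` and `--ring_attention_overlap`. This is a hypothetical helper, not code from the repo; real ring attention additionally circulates key/value blocks around a ring of devices so each chunk can attend beyond its own boundary.

```python
def overlapping_chunks(tokens, chunk_size=8192, overlap=2048):
    """Split a token sequence into fixed-size chunks whose edges
    overlap by `overlap` tokens, so no attention window is cut
    cleanly at a chunk boundary."""
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reaches the end of the sequence
        start += step
    return chunks
```

With the defaults above, consecutive 8192-token chunks share their last/first 2048 tokens, which is why a larger overlap costs more memory and compute.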
## Training Tips

### Learning Rate Scheduling

Use cosine decay with warmup:

```bash
python train.py \
    --learning_rate 2e-5 \
    --warmup_steps 100 \
    --num_train_epochs 3
```

### Gradient Accumulation

Simulate larger batch sizes:

```bash
python train.py \
    --batch_size 2 \
    --gradient_accumulation_steps 8
```

Effective batch size = `batch_size` × `gradient_accumulation_steps` (16 in the example above).

### Mixed Precision

Speed up training with mixed precision:

```bash
python train.py --mixed_precision bf16  # Ampere+ GPUs (RTX 30xx, A100, H100)
# or
python train.py --mixed_precision fp16  # Older GPUs (Pascal, Volta)
```

### Gradient Checkpointing

Trade compute for memory:

```bash
python train.py --gradient_checkpointing
```

Reduces memory by roughly 60% at the cost of roughly 20% slower training.

### Early Stopping

Monitor validation loss and stop when it stops improving:

```bash
python train.py \
    --early_stopping_patience 3 \
    --eval_steps 500
```

## Evaluation

### Automated Benchmarks

Run evaluation on standard benchmarks:

```bash
python -m evaluation.benchmark \
    --model_path ./outputs/checkpoint-final \
    --benchmarks humaneval mbpp gsm8k math truthfulqa
```

### Custom Evaluation

Create a custom evaluation script:

```python
from evaluation.eval_datasets import load_dataset
from evaluation.metrics import compute_metrics

# Load test data
test_data = load_dataset("your_test_data")

# Generate predictions
predictions = []
for sample in test_data:
    response = generate(model, tokenizer, sample["instruction"])
    predictions.append(response)

# Compute metrics
metrics = compute_metrics(predictions, test_data)
print(f"Accuracy: {metrics['accuracy']}")
print(f"Pass@1: {metrics['pass@1']}")
```

## Deployment

### Ollama

Create a custom Modelfile:

```bash
# Already provided: Modelfile
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Your prompt here"
```

### vLLM (High-Throughput Serving)

```bash
python -m vllm.entrypoints.openai.api_server \
    --model ./outputs/checkpoint-final \
    --port 8000
```

### Hugging Face Text Generation Inference
```bash
docker run --gpus all -p 8080:80 \
    -v ./outputs/checkpoint-final:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id /data
```

## Troubleshooting

### CUDA Out of Memory

1. Reduce the batch size
2. Enable gradient checkpointing
3. Use LoRA or QLoRA
4. Reduce the sequence length
5. Use mixed precision

### Poor Convergence

1. Check the learning rate (try 1e-5 to 5e-5)
2. Increase warmup steps
3. Use gradient clipping
4. Verify data quality
5. Train longer (more epochs)

### Slow Training

1. Use mixed precision
2. Increase the batch size if possible
3. Pin memory in the dataloader
4. Use an SSD for data storage
5. Preprocess/cache the dataset

## Additional Resources

- [Hugging Face PEFT Documentation](https://huggingface.co/docs/peft/en/index)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [QLoRA Paper](https://arxiv.org/abs/2305.14314)
- [OpenThoughts Dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)

## Support

For issues and questions:

- Check the existing documentation in `README.md`
- Review configuration options in `configs/`
- Open an issue with detailed error logs