# Fine-Tuning Guide for Zenith-7B
This guide covers fine-tuning Zenith-7B on custom datasets, with a focus on Qwen2.5-Coder-7B as the base model.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Data Preparation](#data-preparation)
3. [Training Methods](#training-methods)
4. [Advanced Configuration](#advanced-configuration)
5. [Evaluation](#evaluation)
6. [Deployment](#deployment)
## Prerequisites
- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 16GB+ VRAM for full fine-tuning
- 8GB+ VRAM for LoRA
- 4GB+ VRAM for QLoRA
### Install Dependencies
```bash
cd Zenith/V1/7B
pip install -r requirements.txt
```
## Data Preparation
### Format
Zenith expects data in JSON format with the following structure:
```json
[
  {
    "instruction": "Write a function to calculate factorial",
    "input": "",
    "output": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    "thoughts": "I need to handle base case and recursion",
    "emotion": "neutral",
    "frustration_level": 0.1
  },
  {
    "instruction": "Explain how a neural network works",
    "input": "",
    "output": "A neural network is...",
    "thoughts": "Start with biological analogy, then explain layers",
    "emotion": "explanatory",
    "frustration_level": 0.0
  }
]
```
**Fields:**
- `instruction`: Task description
- `input`: Optional additional context
- `output`: Expected response
- `thoughts`: Chain-of-thought reasoning (optional but recommended)
- `emotion`: Emotion label (optional, for EQ training)
- `frustration_level`: Float 0-1 indicating frustration (optional)
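Before training, it is worth checking your file against this schema. A minimal sketch (the helper name `load_zenith_dataset` is hypothetical; following the field list above, it treats only `instruction` and `output` as required):

```python
import json

def load_zenith_dataset(path):
    """Load a Zenith-format JSON dataset and sanity-check each sample."""
    with open(path) as f:
        samples = json.load(f)
    for i, sample in enumerate(samples):
        # instruction and output are required; the other fields are optional
        for field in ("instruction", "output"):
            if field not in sample:
                raise ValueError(f"sample {i} is missing required field '{field}'")
        # frustration_level, when present, must be a float in [0, 1]
        level = sample.get("frustration_level", 0.0)
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"sample {i}: frustration_level must be in [0, 1]")
    return samples
```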
### Dataset Sources
Recommended datasets for fine-tuning:
1. **Code**: `code_search_net`, `CodeXGLUE`, `APPS`, `HumanEval`
2. **Reasoning**: `CoT collection`, `OpenThoughts`, `GSM8K`, `MATH`
3. **Emotional Intelligence**: Custom dialogues with emotion annotations
### Preprocessing
Use the built-in processor:
```python
from data.openthoughts_processor import OpenThoughtsConfig, OpenThoughtsProcessor

config = OpenThoughtsConfig(
    dataset_name="your-dataset",
    streaming=True,
    max_seq_length=8192,
    quality_filtering=True,
    curriculum_learning=True,
)
processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```
## Training Methods
### 1. Full Fine-Tuning
Trains all parameters. Best quality but requires most VRAM.
```bash
python train.py \
--base_model Qwen/Qwen2.5-Coder-7B \
--train_data ./data/train.json \
--epochs 3 \
--batch_size 4 \
--learning_rate 2e-5 \
--mixed_precision bf16
```
**VRAM Requirements**: ~16GB for 7B with sequence length 2048
### 2. LoRA (Low-Rank Adaptation)
Freezes base model, trains low-rank matrices. Much more efficient.
```bash
python train.py \
--base_model Qwen/Qwen2.5-Coder-7B \
--train_data ./data/train.json \
--use_lora \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--epochs 3 \
--batch_size 8 \
--learning_rate 1e-4
```
**VRAM Requirements**: ~8GB for 7B with LoRA r=16
**Target modules**: Query, Key, Value, Output projections, MLP gates
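Assuming `train.py` wraps the PEFT library, the flags above roughly correspond to a `LoraConfig` like the following sketch. The module names are those used by Qwen2-family checkpoints; verify them against `model.named_modules()` for your base model:

```python
from peft import LoraConfig, get_peft_model

# Module names follow Qwen2-family attention/MLP naming; adjust for other bases.
lora_config = LoraConfig(
    r=16,             # --lora_r
    lora_alpha=32,    # --lora_alpha
    lora_dropout=0.1, # --lora_dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP gates/projections
    ],
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)
```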
### 3. QLoRA (Quantized LoRA)
4-bit quantized base model + LoRA. Minimal VRAM usage.
```bash
python train.py \
--base_model Qwen/Qwen2.5-Coder-7B \
--train_data ./data/train.json \
--use_qlora \
--use_lora \
--lora_r 8 \
--epochs 3 \
--batch_size 8 \
--learning_rate 1e-4
```
**VRAM Requirements**: ~4GB for 7B with 4-bit quantization
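For reference, the standard QLoRA recipe loads the base model through `bitsandbytes` before attaching the LoRA adapters. A sketch using the Hugging Face `transformers` quantization config (assumes a bf16-capable GPU; use `torch.float16` for compute on older hardware):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with double quantization, the usual QLoRA setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
```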
## Advanced Configuration
### Enabling MoE
Convert dense layers to Mixture of Experts:
```bash
python train.py \
--use_moe \
--num_experts 8 \
--moe_top_k 2 \
--moe_load_balancing_weight 0.01
```
**Note**: MoE increases capacity but also memory usage. Consider using with LoRA.
### EQ Adapter (Emotional Intelligence)
Add emotional intelligence capabilities:
```bash
python train.py \
--use_eq_adapter \
--eq_loss_weight 0.1 \
--emotion_loss_weight 0.1 \
--frustration_loss_weight 0.1
```
Requires data with `emotion` and `frustration_level` fields.
### Curriculum Learning
Progressive training from easy to hard samples:
```bash
python train.py \
--use_curriculum \
--curriculum_stages foundation reasoning code full
```
Stages:
- `foundation`: High-quality, well-structured samples
- `reasoning`: Chain-of-thought examples
- `code`: Programming tasks
- `full`: Complete dataset
### Quality Filtering
Automatically filter low-quality samples:
```bash
python train.py \
--use_quality_filter \
--min_quality_score 0.6
```
Filters based on:
- Length appropriateness
- Language detection
- Repetition
- Coherence
- Structure
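The actual scoring logic lives in the processor; as a toy illustration of the kind of checks involved, a length-and-repetition heuristic might look like this (the function and thresholds are hypothetical, not the repo's scorer):

```python
def quality_score(text, min_len=20, max_len=8000):
    """Toy quality heuristic: reject bad lengths, penalize heavy repetition."""
    if not (min_len <= len(text) <= max_len):
        return 0.0
    words = text.split()
    if not words:
        return 0.0
    # Repetition check: fraction of distinct words among all words
    return len(set(words)) / len(words)
```

A sample passes the filter when `quality_score(sample) >= min_quality_score`.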
### Data Augmentation
Synthetic data augmentation:
```bash
python train.py \
--use_augmentation \
--augmentation_types synonym back_translation code_perturbation
```
Available augmentations:
- `synonym`: Replace words with synonyms
- `back_translation`: Translate to another language and back
- `code_perturbation`: Variable renaming, formatting changes
- `paraphrasing`: Rephrase instructions
- `noise_injection`: Add small amounts of noise
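As an example of the `code_perturbation` idea, the sketch below renames a variable as a whole word. This regex version is only illustrative: it ignores scoping and string literals, which a production augmenter would handle with an AST-based rewrite:

```python
import re

def rename_variable(code, old, new):
    """Rename a variable as a whole word (toy code_perturbation step)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

original = (
    "def factorial(n):\n"
    "    if n <= 1:\n"
    "        return 1\n"
    "    return n * factorial(n - 1)"
)
perturbed = rename_variable(original, "n", "num")
```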
### Ring Attention (for long contexts)
Enable ring attention for 32K+ context:
```bash
python train.py \
--use_ring_attention \
--ring_attention_chunk_size 8192 \
--ring_attention_overlap 2048
```
**Note**: Only for models with sufficient VRAM. Not recommended for 7B on consumer GPUs.
## Training Tips
### Learning Rate Scheduling
Use cosine decay with warmup:
```bash
python train.py \
--learning_rate 2e-5 \
--warmup_steps 100 \
--num_train_epochs 3
```
### Gradient Accumulation
Simulate larger batch sizes:
```bash
python train.py \
--batch_size 2 \
--gradient_accumulation_steps 8
```
Effective batch size = batch_size × gradient_accumulation_steps
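Behind these flags, the training loop scales each micro-batch loss by `1 / accumulation_steps` and only steps the optimizer every `accumulation_steps` micro-batches. A framework-free sketch of that control flow (the PyTorch calls appear as comments):

```python
def count_optimizer_steps(num_micro_batches, accumulation_steps):
    """Count optimizer steps taken while accumulating gradients."""
    optimizer_steps = 0
    for micro_step in range(1, num_micro_batches + 1):
        # loss = model(batch).loss
        # (loss / accumulation_steps).backward()   # gradients accumulate
        if micro_step % accumulation_steps == 0:
            # optimizer.step(); optimizer.zero_grad()
            optimizer_steps += 1
    return optimizer_steps
```

With `--batch_size 2 --gradient_accumulation_steps 8`, each optimizer step sees an effective batch of 16 samples.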
### Mixed Precision
Speed up training with mixed precision:
```bash
# Ampere+ GPUs (RTX 30xx, A100, H100)
python train.py --mixed_precision bf16

# Older GPUs (Pascal, Volta)
python train.py --mixed_precision fp16
```
### Gradient Checkpointing
Trade compute for memory:
```bash
python train.py \
--gradient_checkpointing
```
Reduces memory by ~60% at cost of ~20% slower training.
### Early Stopping
Monitor validation loss and stop when it doesn't improve:
```bash
python train.py \
--early_stopping_patience 3 \
--eval_steps 500
```
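The patience logic is simple enough to sketch. The counter below mirrors what `--early_stopping_patience 3` does (Hugging Face `transformers` ships equivalent behavior as `EarlyStoppingCallback`):

```python
class EarlyStopping:
    """Stop when validation loss fails to improve `patience` evals in a row."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: record new best and reset the counter
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```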
## Evaluation
### Automated Benchmarks
Run evaluation on standard benchmarks:
```bash
python -m evaluation.benchmark \
--model_path ./outputs/checkpoint-final \
--benchmarks humaneval mbpp gsm8k math truthfulqa
```
### Custom Evaluation
Create custom evaluation script:
```python
from evaluation.eval_datasets import load_dataset
from evaluation.metrics import compute_metrics

# Assumes `model`, `tokenizer`, and a `generate` helper are already set up

# Load test data
test_data = load_dataset("your_test_data")

# Generate predictions
predictions = []
for sample in test_data:
    response = generate(model, tokenizer, sample["instruction"])
    predictions.append(response)

# Compute metrics
metrics = compute_metrics(predictions, test_data)
print(f"Accuracy: {metrics['accuracy']}")
print(f"Pass@1: {metrics['pass@1']}")
```
## Deployment
### Ollama
Create a custom Modelfile:
```bash
# Already provided: Modelfile
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Your prompt here"
```
### vLLM (High-Throughput Serving)
```bash
python -m vllm.entrypoints.openai.api_server \
--model ./outputs/checkpoint-final \
--port 8000
```
### Hugging Face Text Generation Inference
```bash
docker run --gpus all -p 8080:80 \
  -v "$(pwd)/outputs/checkpoint-final:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data
```
## Troubleshooting
### CUDA Out of Memory
1. Reduce batch size
2. Enable gradient checkpointing
3. Use LoRA or QLoRA
4. Reduce sequence length
5. Use mixed precision
### Poor Convergence
1. Check learning rate (try 1e-5 to 5e-5)
2. Increase warmup steps
3. Use gradient clipping
4. Verify data quality
5. Try longer training (more epochs)
### Slow Training
1. Use mixed precision
2. Increase batch size if possible
3. Pin memory in dataloader
4. Use SSD for data storage
5. Preprocess/cache dataset
## Additional Resources
- [Hugging Face PEFT Documentation](https://huggingface.co/docs/peft/en/index)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [QLoRA Paper](https://arxiv.org/abs/2305.14314)
- [OpenThoughts Dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)
## Support
For issues and questions:
- Check existing documentation in `README.md`
- Review configuration options in `configs/`
- Open an issue with detailed error logs