# Fine-Tuning Guide for Zenith-7B

This guide covers fine-tuning Zenith-7B on custom datasets, with a focus on Qwen2.5-Coder-7B as the base model.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Data Preparation](#data-preparation)
3. [Training Methods](#training-methods)
4. [Advanced Configuration](#advanced-configuration)
5. [Evaluation](#evaluation)
6. [Deployment](#deployment)

## Prerequisites

- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 16GB+ VRAM for full fine-tuning
- 8GB+ VRAM for LoRA
- 4GB+ VRAM for QLoRA

### Install Dependencies

```bash
cd Zenith/V1/7B
pip install -r requirements.txt
```
## Data Preparation

### Format

Zenith expects training data as a JSON array of objects with the following structure:

```json
[
  {
    "instruction": "Write a function to calculate factorial",
    "input": "",
    "output": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    "thoughts": "I need to handle base case and recursion",
    "emotion": "neutral",
    "frustration_level": 0.1
  },
  {
    "instruction": "Explain how a neural network works",
    "input": "",
    "output": "A neural network is...",
    "thoughts": "Start with biological analogy, then explain layers",
    "emotion": "explanatory",
    "frustration_level": 0.0
  }
]
```
**Fields:**

- `instruction`: Task description
- `input`: Optional additional context
- `output`: Expected response
- `thoughts`: Chain-of-thought reasoning (optional but recommended)
- `emotion`: Emotion label (optional, for EQ training)
- `frustration_level`: Float 0-1 indicating frustration (optional)
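Before launching a run, it can help to sanity-check the training file against this schema. The following is a minimal sketch using only the standard library; `validate_records` is a hypothetical helper, not part of the Zenith codebase:

```python
import json

# Required and optional fields per the format above
REQUIRED = {"instruction", "output"}
OPTIONAL = {"input", "thoughts", "emotion", "frustration_level"}

def validate_records(path):
    """Load a training file and flag records that break the schema."""
    with open(path) as f:
        records = json.load(f)
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED - rec.keys()
        if missing:
            problems.append((i, f"missing fields: {sorted(missing)}"))
        level = rec.get("frustration_level")
        if level is not None and not 0.0 <= level <= 1.0:
            problems.append((i, "frustration_level outside [0, 1]"))
    return problems
```

Run it once on `./data/train.json` and fix any flagged records before training.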
### Dataset Sources

Recommended datasets for fine-tuning:

1. **Code**: `code_search_net`, `CodeXGLUE`, `APPS`, `HumanEval`
2. **Reasoning**: `CoT collection`, `OpenThoughts`, `GSM8K`, `MATH`
3. **Emotional Intelligence**: Custom dialogues with emotion annotations

### Preprocessing

Use the built-in processor:

```python
from data.openthoughts_processor import OpenThoughtsConfig, OpenThoughtsProcessor

config = OpenThoughtsConfig(
    dataset_name="your-dataset",
    streaming=True,          # stream samples instead of loading everything into memory
    max_seq_length=8192,
    quality_filtering=True,
    curriculum_learning=True,
)

processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```
## Training Methods

### 1. Full Fine-Tuning

Trains all parameters. Best quality, but requires the most VRAM.

```bash
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data ./data/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5 \
  --mixed_precision bf16
```

**VRAM Requirements**: ~16GB for 7B with sequence length 2048

### 2. LoRA (Low-Rank Adaptation)

Freezes the base model and trains low-rank update matrices. Much more efficient.

```bash
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data ./data/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.1 \
  --epochs 3 \
  --batch_size 8 \
  --learning_rate 1e-4
```

**VRAM Requirements**: ~8GB for 7B with LoRA r=16

**Target modules**: Query, Key, Value, and Output projections, plus MLP gates
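To see why LoRA is so much cheaper, a back-of-envelope parameter count helps. The sketch below assumes illustrative shapes (32 layers, 4 square attention projections adapted per layer, hidden size 4096); the real Qwen2.5-Coder-7B projections are not all square (K/V are smaller under grouped-query attention), so treat the numbers as order-of-magnitude only:

```python
def lora_trainable_params(hidden_size, num_layers, matrices_per_layer, r):
    """Each adapted d x d weight W gains A (r x d) and B (d x r),
    i.e. 2 * d * r trainable parameters per target matrix."""
    return 2 * hidden_size * r * matrices_per_layer * num_layers

# Illustrative shapes only, not the exact Qwen2.5-Coder-7B geometry
trainable = lora_trainable_params(hidden_size=4096, num_layers=32,
                                  matrices_per_layer=4, r=16)
print(f"{trainable / 1e6:.1f}M trainable params "
      f"({trainable / 7e9:.2%} of 7B)")
```

Under these assumptions, r=16 trains roughly 17M parameters, a fraction of a percent of the 7B base model, which is where the VRAM savings come from.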
### 3. QLoRA (Quantized LoRA)

4-bit quantized base model + LoRA. Minimal VRAM usage.

```bash
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data ./data/train.json \
  --use_qlora \
  --use_lora \
  --lora_r 8 \
  --epochs 3 \
  --batch_size 8 \
  --learning_rate 1e-4
```

**VRAM Requirements**: ~4GB for 7B with 4-bit quantization
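The ~4GB figure follows directly from the weight storage arithmetic. A rough sketch (weights only; activations, gradients, and optimizer state add on top of this):

```python
def base_model_bytes(n_params, bits_per_param):
    """Bytes needed to store the model weights at a given precision."""
    return n_params * bits_per_param / 8

n = 7e9  # ~7B parameters
for bits, label in [(16, "bf16"), (8, "int8"), (4, "nf4 (QLoRA)")]:
    gb = base_model_bytes(n, bits) / 1e9
    print(f"{label:12s} ~{gb:.1f} GB for weights alone")
```

At 4 bits the frozen base weights fit in ~3.5GB, leaving headroom for the LoRA adapters and activations on an 8GB card, or a tight fit on 4-6GB with short sequences.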
## Advanced Configuration

### Enabling MoE

Convert dense layers to Mixture of Experts:

```bash
python train.py \
  --use_moe \
  --num_experts 8 \
  --moe_top_k 2 \
  --moe_load_balancing_weight 0.01
```

**Note**: MoE increases capacity but also memory usage. Consider combining it with LoRA.

### EQ Adapter (Emotional Intelligence)

Add emotional intelligence capabilities:

```bash
python train.py \
  --use_eq_adapter \
  --eq_loss_weight 0.1 \
  --emotion_loss_weight 0.1 \
  --frustration_loss_weight 0.1
```

Requires data with `emotion` and `frustration_level` fields.
### Curriculum Learning

Progressive training from easy to hard samples:

```bash
python train.py \
  --use_curriculum \
  --curriculum_stages foundation reasoning code full
```

Stages:

- `foundation`: High-quality, well-structured samples
- `reasoning`: Chain-of-thought examples
- `code`: Programming tasks
- `full`: Complete dataset
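Conceptually, curriculum learning just imposes an ordering on the training stream. A toy sketch of that ordering, assuming each sample carries a hypothetical `stage` label and `difficulty` score (these field names are illustrative, not the processor's actual schema):

```python
# Stage ordering matching the --curriculum_stages flag above
STAGE_ORDER = {"foundation": 0, "reasoning": 1, "code": 2, "full": 3}

def curriculum_sort(samples):
    """Order samples by stage, then by per-sample difficulty within a stage."""
    return sorted(
        samples,
        key=lambda s: (STAGE_ORDER[s["stage"]], s.get("difficulty", 0.0)),
    )

batch = [
    {"id": "a", "stage": "code", "difficulty": 0.4},
    {"id": "b", "stage": "foundation", "difficulty": 0.9},
    {"id": "c", "stage": "reasoning", "difficulty": 0.2},
]
print([s["id"] for s in curriculum_sort(batch)])  # foundation first, code last
```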
### Quality Filtering

Automatically filter low-quality samples:

```bash
python train.py \
  --use_quality_filter \
  --min_quality_score 0.6
```

Filters based on:

- Length appropriateness
- Language detection
- Repetition
- Coherence
- Structure
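To give a feel for the length and repetition checks, here is a toy scorer, not the actual filter implementation. It combines a length gate with a distinct-word ratio; the real filter also checks language, coherence, and structure:

```python
def quality_score(text, min_len=20, max_len=4000):
    """Toy quality score: 0.0 outside the length window, otherwise the
    fraction of distinct words (low values indicate heavy repetition)."""
    if not (min_len <= len(text) <= max_len):
        return 0.0
    words = text.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)

good = "Sort the list with a comparison key and return the result."
bad = "foo foo foo foo foo foo foo foo foo foo"
print(quality_score(good), quality_score(bad))
```

With `--min_quality_score 0.6`, a degenerate repetitive sample like `bad` would be dropped while the varied sentence passes.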
### Data Augmentation

Synthetic data augmentation:

```bash
python train.py \
  --use_augmentation \
  --augmentation_types synonym back_translation code_perturbation
```

Available augmentations:

- `synonym`: Replace words with synonyms
- `back_translation`: Translate to another language and back
- `code_perturbation`: Variable renaming, formatting changes
- `paraphrasing`: Rephrase instructions
- `noise_injection`: Add small amounts of noise
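As an illustration of what `code_perturbation` does, here is a minimal variable-renaming sketch (not the actual augmentation code; a production version would operate on an AST to avoid touching strings and comments):

```python
import re

def rename_variable(code, old, new):
    """Rename a variable as a whole word only, leaving substrings intact."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

src = "total = 0\nfor n in nums:\n    total += n\nreturn total"
print(rename_variable(src, "total", "acc"))
```

Perturbations like this keep the program's behavior identical while varying surface form, so the model learns the logic rather than specific identifier names.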
### Ring Attention (for long contexts)

Enable ring attention for 32K+ contexts:

```bash
python train.py \
  --use_ring_attention \
  --ring_attention_chunk_size 8192 \
  --ring_attention_overlap 2048
```

**Note**: Only for setups with sufficient VRAM; not recommended for 7B on consumer GPUs.
## Training Tips

### Learning Rate Scheduling

Use cosine decay with warmup:

```bash
python train.py \
  --learning_rate 2e-5 \
  --warmup_steps 100 \
  --num_train_epochs 3
```

### Gradient Accumulation

Simulate larger batch sizes:

```bash
python train.py \
  --batch_size 2 \
  --gradient_accumulation_steps 8
```

Effective batch size = batch_size × gradient_accumulation_steps
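Conceptually, the accumulation loop sums gradients across micro-batches and takes a single optimizer step on the average. A toy sketch with scalar "gradients" (real frameworks accumulate into each parameter's `.grad` buffer during `backward()`):

```python
def accumulated_step(micro_batch_grads):
    """Sum per-micro-batch gradients, then average before one optimizer step."""
    accum = 0.0
    for g in micro_batch_grads:
        accum += g  # in PyTorch, repeated backward() calls add into .grad
    return accum / len(micro_batch_grads)

# 8 micro-batches of size 2 behave like one batch of size 16
step = accumulated_step([0.5, 0.3, 0.2, 0.4, 0.1, 0.6, 0.5, 0.6])
```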
### Mixed Precision

Speed up training with mixed precision:

```bash
# Ampere or newer GPUs (RTX 30xx, A100, H100)
python train.py --mixed_precision bf16

# Older GPUs (Pascal, Volta)
python train.py --mixed_precision fp16
```
### Gradient Checkpointing

Trade compute for memory:

```bash
python train.py \
  --gradient_checkpointing
```

Reduces memory by roughly 60% at the cost of roughly 20% slower training.

### Early Stopping

Monitor validation loss and stop when it stops improving:

```bash
python train.py \
  --early_stopping_patience 3 \
  --eval_steps 500
```
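The patience logic above works like this sketch (illustrative only, not the trainer's internal code): a counter resets on every new best validation loss and triggers a stop after `patience` consecutive evals without improvement.

```python
def early_stop_index(val_losses, patience=3):
    """Return the eval index at which training stops, or None if it never does.
    Stops after `patience` consecutive evals with no new best loss."""
    best = float("inf")
    bad = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return i
    return None

# Improves for three evals, then plateaus: stops at the 3rd non-improving eval
losses = [2.1, 1.8, 1.7, 1.72, 1.71, 1.74, 1.70]
print(early_stop_index(losses, patience=3))
```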
## Evaluation

### Automated Benchmarks

Run evaluation on standard benchmarks:

```bash
python -m evaluation.benchmark \
  --model_path ./outputs/checkpoint-final \
  --benchmarks humaneval mbpp gsm8k math truthfulqa
```
### Custom Evaluation

Create a custom evaluation script:

```python
from evaluation.eval_datasets import load_dataset
from evaluation.metrics import compute_metrics

# Assumes `model` and `tokenizer` are already loaded, and that
# `generate` is your inference helper (e.g. a wrapper around model.generate)

# Load test data
test_data = load_dataset("your_test_data")

# Generate predictions
predictions = []
for sample in test_data:
    response = generate(model, tokenizer, sample["instruction"])
    predictions.append(response)

# Compute metrics
metrics = compute_metrics(predictions, test_data)
print(f"Accuracy: {metrics['accuracy']}")
print(f"Pass@1: {metrics['pass@1']}")
```
## Deployment

### Ollama

Create a custom Modelfile:

```bash
# A Modelfile is already provided in this directory
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Your prompt here"
```

### vLLM (High-Throughput Serving)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ./outputs/checkpoint-final \
  --port 8000
```
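The vLLM server exposes an OpenAI-compatible API. A minimal sketch of the request body a client would POST to `http://localhost:8000/v1/completions` once the server above is running (field names follow the OpenAI completions schema; adjust `model` to match the `--model` path you served):

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/completions endpoint
payload = {
    "model": "./outputs/checkpoint-final",
    "prompt": "Write a function to reverse a string.",
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)
print(body)
```

Send it with any HTTP client (`curl`, `requests`, or the official `openai` SDK pointed at the local base URL).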
### Hugging Face Text Generation Inference

```bash
docker run --gpus all -p 8080:80 \
  -v "$PWD/outputs/checkpoint-final:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data
```

Note that Docker bind mounts require absolute host paths, hence `$PWD`.
## Troubleshooting

### CUDA Out of Memory

1. Reduce the batch size
2. Enable gradient checkpointing
3. Use LoRA or QLoRA
4. Reduce the sequence length
5. Use mixed precision

### Poor Convergence

1. Check the learning rate (try 1e-5 to 5e-5)
2. Increase warmup steps
3. Use gradient clipping
4. Verify data quality
5. Train longer (more epochs)

### Slow Training

1. Use mixed precision
2. Increase the batch size if possible
3. Pin memory in the dataloader
4. Store data on an SSD
5. Preprocess and cache the dataset

## Additional Resources

- [Hugging Face PEFT Documentation](https://huggingface.co/docs/peft/en/index)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [QLoRA Paper](https://arxiv.org/abs/2305.14314)
- [OpenThoughts Dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)

## Support

For issues and questions:

- Check the existing documentation in `README.md`
- Review configuration options in `configs/`
- Open an issue with detailed error logs