# Fine-Tuning Guide for Zenith-7B

This guide covers fine-tuning Zenith-7B on custom datasets, with a focus on Qwen2.5-Coder-7B as the base model.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Data Preparation](#data-preparation)
3. [Training Methods](#training-methods)
4. [Advanced Configuration](#advanced-configuration)
5. [Evaluation](#evaluation)
6. [Deployment](#deployment)

## Prerequisites

- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 16GB+ VRAM for full fine-tuning
- 8GB+ VRAM for LoRA
- 4GB+ VRAM for QLoRA

### Install Dependencies

```bash
cd Zenith/V1/7B
pip install -r requirements.txt
```

## Data Preparation

### Format

Zenith expects data in JSON format with the following structure:

```json
[
  {
    "instruction": "Write a function to calculate factorial",
    "input": "",
    "output": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    "thoughts": "I need to handle base case and recursion",
    "emotion": "neutral",
    "frustration_level": 0.1
  },
  {
    "instruction": "Explain how a neural network works",
    "input": "",
    "output": "A neural network is...",
    "thoughts": "Start with biological analogy, then explain layers",
    "emotion": "explanatory",
    "frustration_level": 0.0
  }
]
```

**Fields:**

- `instruction`: Task description
- `input`: Optional additional context
- `output`: Expected response
- `thoughts`: Chain-of-thought reasoning (optional but recommended)
- `emotion`: Emotion label (optional, for EQ training)
- `frustration_level`: Float in [0, 1] indicating user frustration (optional)

### Dataset Sources

Recommended datasets for fine-tuning:

1. **Code**: `code_search_net`, `CodeXGLUE`, `APPS`, `HumanEval`
2. **Reasoning**: `CoT Collection`, `OpenThoughts`, `GSM8K`, `MATH`
3. **Emotional Intelligence**: Custom dialogues with emotion annotations

### Preprocessing

Use the built-in processor:

```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="your-dataset",
    streaming=True,
    max_seq_length=8192,
    quality_filtering=True,
    curriculum_learning=True,
)

processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```

## Training Methods

### 1. Full Fine-Tuning

Trains all parameters. Best quality, but requires the most VRAM.

```bash
python train.py \
    --base_model Qwen/Qwen2.5-Coder-7B \
    --train_data ./data/train.json \
    --epochs 3 \
    --batch_size 4 \
    --learning_rate 2e-5 \
    --mixed_precision bf16
```

**VRAM requirements**: ~16GB for 7B with sequence length 2048

### 2. LoRA (Low-Rank Adaptation)

Freezes the base model and trains low-rank adapter matrices. Much more efficient.

```bash
python train.py \
    --base_model Qwen/Qwen2.5-Coder-7B \
    --train_data ./data/train.json \
    --use_lora \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.1 \
    --epochs 3 \
    --batch_size 8 \
    --learning_rate 1e-4
```

**VRAM requirements**: ~8GB for 7B with LoRA r=16

**Target modules**: query, key, value, and output projections, plus the MLP gates

### 3. QLoRA (Quantized LoRA)

4-bit quantized base model + LoRA. Minimal VRAM usage.

```bash
python train.py \
    --base_model Qwen/Qwen2.5-Coder-7B \
    --train_data ./data/train.json \
    --use_qlora \
    --use_lora \
    --lora_r 8 \
    --epochs 3 \
    --batch_size 8 \
    --learning_rate 1e-4
```

**VRAM requirements**: ~4GB for 7B with 4-bit quantization

## Advanced Configuration

### Enabling MoE

Convert dense layers to Mixture of Experts:

```bash
python train.py \
    --use_moe \
    --num_experts 8 \
    --moe_top_k 2 \
    --moe_load_balancing_weight 0.01
```

**Note**: MoE increases model capacity but also memory usage. Consider combining it with LoRA.
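To make the MoE flags concrete, here is a pure-Python sketch of the top-k routing step that `--moe_top_k` controls. This is illustrative only: the function names are not from the Zenith codebase, and a real MoE layer operates on batched tensors and also adds a load-balancing auxiliary loss weighted by `--moe_load_balancing_weight`.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their
    gate weights so they sum to 1 (the role of --moe_top_k)."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]
```

With `k=2`, each token would be sent to its two highest-scoring experts and their outputs combined with the renormalized weights, so only 2 of the 8 experts run per token.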
### EQ Adapter (Emotional Intelligence)

Add emotional intelligence capabilities:

```bash
python train.py \
    --use_eq_adapter \
    --eq_loss_weight 0.1 \
    --emotion_loss_weight 0.1 \
    --frustration_loss_weight 0.1
```

Requires data with `emotion` and `frustration_level` fields.

### Curriculum Learning

Progressive training from easy to hard samples:

```bash
python train.py \
    --use_curriculum \
    --curriculum_stages foundation reasoning code full
```

Stages:

- `foundation`: High-quality, well-structured samples
- `reasoning`: Chain-of-thought examples
- `code`: Programming tasks
- `full`: Complete dataset

### Quality Filtering

Automatically filter out low-quality samples:

```bash
python train.py \
    --use_quality_filter \
    --min_quality_score 0.6
```

Filters based on:

- Length appropriateness
- Language detection
- Repetition
- Coherence
- Structure

### Data Augmentation

Synthetic data augmentation:

```bash
python train.py \
    --use_augmentation \
    --augmentation_types synonym back_translation code_perturbation
```

Available augmentations:

- `synonym`: Replace words with synonyms
- `back_translation`: Translate to another language and back
- `code_perturbation`: Variable renaming, formatting changes
- `paraphrasing`: Rephrase instructions
- `noise_injection`: Add small amounts of noise

### Ring Attention (for Long Contexts)

Enable ring attention for 32K+ contexts:

```bash
python train.py \
    --use_ring_attention \
    --ring_attention_chunk_size 8192 \
    --ring_attention_overlap 2048
```

**Note**: Only for setups with sufficient VRAM. Not recommended for 7B on consumer GPUs.
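As a rough illustration of what the chunking flags above mean, the following stdlib-only sketch splits a token sequence into overlapping chunks in the spirit of `--ring_attention_chunk_size` and `--ring_attention_overlap`. This is a hypothetical helper, not code from the repo; real ring attention additionally circulates key/value blocks around a ring of devices so each chunk can attend beyond its own boundary.

```python
def overlapping_chunks(tokens, chunk_size=8192, overlap=2048):
    """Split a token sequence into fixed-size chunks whose edges
    overlap by `overlap` tokens, so no attention window is cut
    cleanly at a chunk boundary."""
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reaches the end of the sequence
        start += step
    return chunks
```

With the defaults above, consecutive 8192-token chunks share their last/first 2048 tokens, which is why a larger overlap costs more memory and compute.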
## Training Tips

### Learning Rate Scheduling

Use cosine decay with warmup:

```bash
python train.py \
    --learning_rate 2e-5 \
    --warmup_steps 100 \
    --num_train_epochs 3
```

### Gradient Accumulation

Simulate larger batch sizes:

```bash
python train.py \
    --batch_size 2 \
    --gradient_accumulation_steps 8
```

Effective batch size = `batch_size` × `gradient_accumulation_steps` (16 in the example above).

### Mixed Precision

Speed up training with mixed precision:

```bash
python train.py --mixed_precision bf16  # Ampere+ GPUs (RTX 30xx, A100, H100)
# or
python train.py --mixed_precision fp16  # Older GPUs (Pascal, Volta)
```

### Gradient Checkpointing

Trade compute for memory:

```bash
python train.py --gradient_checkpointing
```

Reduces memory by roughly 60% at the cost of roughly 20% slower training.

### Early Stopping

Monitor validation loss and stop when it stops improving:

```bash
python train.py \
    --early_stopping_patience 3 \
    --eval_steps 500
```

## Evaluation

### Automated Benchmarks

Run evaluation on standard benchmarks:

```bash
python -m evaluation.benchmark \
    --model_path ./outputs/checkpoint-final \
    --benchmarks humaneval mbpp gsm8k math truthfulqa
```

### Custom Evaluation

Create a custom evaluation script:

```python
from evaluation.eval_datasets import load_dataset
from evaluation.metrics import compute_metrics

# Load test data
test_data = load_dataset("your_test_data")

# Generate predictions
predictions = []
for sample in test_data:
    response = generate(model, tokenizer, sample["instruction"])
    predictions.append(response)

# Compute metrics
metrics = compute_metrics(predictions, test_data)
print(f"Accuracy: {metrics['accuracy']}")
print(f"Pass@1: {metrics['pass@1']}")
```

## Deployment

### Ollama

Create a custom Modelfile:

```bash
# Already provided: Modelfile
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Your prompt here"
```

### vLLM (High-Throughput Serving)

```bash
python -m vllm.entrypoints.openai.api_server \
    --model ./outputs/checkpoint-final \
    --port 8000
```

### Hugging Face Text Generation Inference
```bash
docker run --gpus all -p 8080:80 \
    -v ./outputs/checkpoint-final:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id /data
```

## Troubleshooting

### CUDA Out of Memory

1. Reduce the batch size
2. Enable gradient checkpointing
3. Use LoRA or QLoRA
4. Reduce the sequence length
5. Use mixed precision

### Poor Convergence

1. Check the learning rate (try 1e-5 to 5e-5)
2. Increase warmup steps
3. Use gradient clipping
4. Verify data quality
5. Train longer (more epochs)

### Slow Training

1. Use mixed precision
2. Increase the batch size if possible
3. Pin memory in the dataloader
4. Use an SSD for data storage
5. Preprocess/cache the dataset

## Additional Resources

- [Hugging Face PEFT Documentation](https://huggingface.co/docs/peft/en/index)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [QLoRA Paper](https://arxiv.org/abs/2305.14314)
- [OpenThoughts Dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)

## Support

For issues and questions:

- Check the existing documentation in `README.md`
- Review configuration options in `configs/`
- Open an issue with detailed error logs