
Fine-Tuning Guide for Zenith-7B

This guide covers fine-tuning Zenith-7B on custom datasets, with a focus on Qwen2.5-Coder-7B as the base model.

Table of Contents

  1. Prerequisites
  2. Data Preparation
  3. Training Methods
  4. Advanced Configuration
  5. Evaluation
  6. Deployment

Prerequisites

  • Python 3.8+
  • CUDA 11.8+ (for GPU training)
  • 16GB+ VRAM for full fine-tuning
  • 8GB+ VRAM for LoRA
  • 4GB+ VRAM for QLoRA

Install Dependencies

cd Zenith/V1/7B
pip install -r requirements.txt

Data Preparation

Format

Zenith expects data in JSON format with the following structure:

[
  {
    "instruction": "Write a function to calculate factorial",
    "input": "",
    "output": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    "thoughts": "I need to handle base case and recursion",
    "emotion": "neutral",
    "frustration_level": 0.1
  },
  {
    "instruction": "Explain how a neural network works",
    "input": "",
    "output": "A neural network is...",
    "thoughts": "Start with biological analogy, then explain layers",
    "emotion": "explanatory",
    "frustration_level": 0.0
  }
]

Fields:

  • instruction: Task description
  • input: Optional additional context
  • output: Expected response
  • thoughts: Chain-of-thought reasoning (optional but recommended)
  • emotion: Emotion label (optional, for EQ training)
  • frustration_level: Float 0-1 indicating frustration (optional)
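Records can be checked against this schema before training. A minimal stdlib sketch (the helper name `validate_record` is illustrative, not part of Zenith):

```python
import json

REQUIRED = ("instruction", "input", "output")
OPTIONAL_DEFAULTS = {"thoughts": "", "emotion": "neutral", "frustration_level": 0.0}

def validate_record(rec: dict) -> dict:
    """Require the mandatory fields and fill optional ones with defaults."""
    for key in REQUIRED:
        if key not in rec:
            raise ValueError(f"missing required field: {key}")
    out = {**OPTIONAL_DEFAULTS, **rec}
    level = out["frustration_level"]
    if not (isinstance(level, (int, float)) and 0.0 <= level <= 1.0):
        raise ValueError("frustration_level must be a float in [0, 1]")
    return out

records = json.loads('[{"instruction": "Say hi", "input": "", "output": "Hi!"}]')
cleaned = [validate_record(r) for r in records]
```

Running this over the full training file before a long run catches malformed samples early instead of mid-epoch.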

Dataset Sources

Recommended datasets for fine-tuning:

  1. Code: code_search_net, CodeXGLUE, APPS, HumanEval
  2. Reasoning: CoT collection, OpenThoughts, GSM8K, MATH
  3. Emotional Intelligence: Custom dialogues with emotion annotations

Preprocessing

Use the built-in processor:

from data.openthoughts_processor import OpenThoughtsConfig, OpenThoughtsProcessor

config = OpenThoughtsConfig(
    dataset_name="your-dataset",
    streaming=True,
    max_seq_length=8192,
    quality_filtering=True,
    curriculum_learning=True
)
processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()

Training Methods

1. Full Fine-Tuning

Trains all parameters. Best quality but requires most VRAM.

python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data ./data/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5 \
  --mixed_precision bf16

VRAM Requirements: ~16GB for 7B with sequence length 2048

2. LoRA (Low-Rank Adaptation)

Freezes base model, trains low-rank matrices. Much more efficient.

python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data ./data/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.1 \
  --epochs 3 \
  --batch_size 8 \
  --learning_rate 1e-4

VRAM Requirements: ~8GB for 7B with LoRA r=16

Target modules: Query, Key, Value, Output projections, MLP gates
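If the trainer wraps Hugging Face `peft` (an assumption; check `train.py` for the actual mechanism), the flags above correspond roughly to a config like this, using the Qwen2-style module names:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # --lora_r
    lora_alpha=32,     # --lora_alpha
    lora_dropout=0.1,  # --lora_dropout
    # Attention Q/K/V/Output projections plus the MLP gate/up/down projections
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```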

3. QLoRA (Quantized LoRA)

4-bit quantized base model + LoRA. Minimal VRAM usage.

python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data ./data/train.json \
  --use_qlora \
  --use_lora \
  --lora_r 8 \
  --epochs 3 \
  --batch_size 8 \
  --learning_rate 1e-4

VRAM Requirements: ~4GB for 7B with 4-bit quantization
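The `--use_qlora` flag most likely maps to a 4-bit quantization config like the following, via `transformers`' `BitsAndBytesConfig` (an assumption about the implementation; the flag values are the usual QLoRA defaults):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, standard for QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
```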

Advanced Configuration

Enabling MoE

Convert dense layers to Mixture of Experts:

python train.py \
  --use_moe \
  --num_experts 8 \
  --moe_top_k 2 \
  --moe_load_balancing_weight 0.01

Note: MoE increases capacity but also memory usage. Consider using with LoRA.

EQ Adapter (Emotional Intelligence)

Add emotional intelligence capabilities:

python train.py \
  --use_eq_adapter \
  --eq_loss_weight 0.1 \
  --emotion_loss_weight 0.1 \
  --frustration_loss_weight 0.1

Requires data with emotion and frustration_level fields.

Curriculum Learning

Progressive training from easy to hard samples:

python train.py \
  --use_curriculum \
  --curriculum_stages foundation reasoning code full

Stages:

  • foundation: High-quality, well-structured samples
  • reasoning: Chain-of-thought examples
  • code: Programming tasks
  • full: Complete dataset
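The staged schedule above amounts to filtering the dataset per stage and replaying everything at the end. A stdlib sketch with an illustrative `stage` tag on each sample (not Zenith's actual implementation):

```python
STAGES = ["foundation", "reasoning", "code", "full"]

def curriculum_order(samples):
    """Yield samples stage by stage; the 'full' stage replays everything."""
    for stage in STAGES:
        for s in samples:
            if stage == "full" or s["stage"] == stage:
                yield s

data = [
    {"id": 1, "stage": "code"},
    {"id": 2, "stage": "foundation"},
    {"id": 3, "stage": "reasoning"},
]
order = [s["id"] for s in curriculum_order(data)]
```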

Quality Filtering

Automatically filter low-quality samples:

python train.py \
  --use_quality_filter \
  --min_quality_score 0.6

Filters based on:

  • Length appropriateness
  • Language detection
  • Repetition
  • Coherence
  • Structure
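The criteria above can be combined into a single score; a minimal sketch covering the length and repetition checks (heuristics are illustrative, the real filter lives in the data pipeline):

```python
def quality_score(text: str, min_len: int = 20, max_len: int = 8000) -> float:
    """Crude quality heuristic: length window plus a repetition penalty."""
    if not (min_len <= len(text) <= max_len):
        return 0.0
    words = text.split()
    if not words:
        return 0.0
    # Fraction of unique words: 1.0 means no repetition at all
    return len(set(words)) / len(words)

samples = [
    "def factorial(n): return 1 if n <= 1 else n * factorial(n - 1)",
    "spam spam spam spam spam spam spam",
]
kept = [s for s in samples if quality_score(s) >= 0.6]
```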

Data Augmentation

Synthetic data augmentation:

python train.py \
  --use_augmentation \
  --augmentation_types synonym back_translation code_perturbation

Available augmentations:

  • synonym: Replace words with synonyms
  • back_translation: Translate to another language and back
  • code_perturbation: Variable renaming, formatting changes
  • paraphrasing: Rephrase instructions
  • noise_injection: Add small amounts of noise
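As an example of the `code_perturbation` idea, variable renaming can be sketched with a word-boundary regex (a toy version; robust perturbation should rename via an AST so strings and comments are untouched):

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    """Rename an identifier; \\b boundaries keep substrings like 'return' intact."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

src = "def area(r):\n    return 3.14159 * r * r"
aug = rename_variable(src, "r", "radius")
```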

Ring Attention (for long contexts)

Enable ring attention for 32K+ context:

python train.py \
  --use_ring_attention \
  --ring_attention_chunk_size 8192 \
  --ring_attention_overlap 2048

Note: Only for models with sufficient VRAM. Not recommended for 7B on consumer GPUs.
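The chunking geometry behind those two flags can be sketched as overlapping windows over the sequence (illustrative only; ring attention proper also shards the chunks across devices):

```python
def overlapping_chunks(seq_len: int, chunk_size: int, overlap: int):
    """Return (start, end) windows; each window advances by chunk_size - overlap."""
    step = chunk_size - overlap
    starts = range(0, max(seq_len - overlap, 1), step)
    return [(s, min(s + chunk_size, seq_len)) for s in starts]

# A 32K-token sequence with the flag values shown above
windows = overlapping_chunks(32768, 8192, 2048)
```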

Training Tips

Learning Rate Scheduling

Use cosine decay with warmup:

python train.py \
  --learning_rate 2e-5 \
  --warmup_steps 100 \
  --num_train_epochs 3

Gradient Accumulation

Simulate larger batch sizes:

python train.py \
  --batch_size 2 \
  --gradient_accumulation_steps 8

Effective batch size = batch_size × gradient_accumulation_steps
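The mechanics can be shown in plain Python: gradients are summed over micro-batches and the optimizer steps once per accumulation window, which matches one step on the larger batch (a toy SGD, not the trainer's actual loop):

```python
def sgd_with_accumulation(grads, accum_steps, lr=0.1, w=0.0):
    """Average grads over accum_steps micro-batches before each update."""
    buffer = 0.0
    for i, g in enumerate(grads, 1):
        buffer += g
        if i % accum_steps == 0:
            w -= lr * (buffer / accum_steps)  # equivalent to one big-batch step
            buffer = 0.0
    return w

w = sgd_with_accumulation([1.0, 2.0, 3.0, 4.0], accum_steps=2)
```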

Mixed Precision

Speed up training with mixed precision:

# Ampere or newer GPUs (RTX 30xx, A100, H100)
python train.py --mixed_precision bf16

# Older GPUs (Pascal, Volta)
python train.py --mixed_precision fp16

Gradient Checkpointing

Trade compute for memory:

python train.py \
  --gradient_checkpointing

Reduces memory by ~60% at cost of ~20% slower training.

Early Stopping

Monitor validation loss and stop when it doesn't improve:

python train.py \
  --early_stopping_patience 3 \
  --eval_steps 500
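The patience rule above can be sketched as follows (illustrative; the trainer's exact criterion may differ, e.g. it may require a minimum delta):

```python
def should_stop(val_losses, patience=3):
    """True once the best loss hasn't improved for `patience` evals."""
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_idx >= patience

history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
```

With `--eval_steps 500`, each entry in `history` corresponds to one evaluation every 500 steps.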

Evaluation

Automated Benchmarks

Run evaluation on standard benchmarks:

python -m evaluation.benchmark \
  --model_path ./outputs/checkpoint-final \
  --benchmarks humaneval mbpp gsm8k math truthfulqa

Custom Evaluation

Create custom evaluation script:

from evaluation.eval_datasets import load_dataset
from evaluation.metrics import compute_metrics

# Load test data
test_data = load_dataset("your_test_data")

# Generate predictions (model, tokenizer, and generate() come from your
# inference setup; they are not defined in this snippet)
predictions = []
for sample in test_data:
    response = generate(model, tokenizer, sample["instruction"])
    predictions.append(response)

# Compute metrics
metrics = compute_metrics(predictions, test_data)
print(f"Accuracy: {metrics['accuracy']}")
print(f"Pass@1: {metrics['pass@1']}")

Deployment

Ollama

Create a custom Modelfile:

# Already provided: Modelfile
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Your prompt here"

vLLM (High-Throughput Serving)

python -m vllm.entrypoints.openai.api_server \
  --model ./outputs/checkpoint-final \
  --port 8000

Hugging Face Text Generation Inference

docker run --gpus all -p 8080:80 \
  -v "$(pwd)/outputs/checkpoint-final:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data

Troubleshooting

CUDA Out of Memory

  1. Reduce batch size
  2. Enable gradient checkpointing
  3. Use LoRA or QLoRA
  4. Reduce sequence length
  5. Use mixed precision

Poor Convergence

  1. Check learning rate (try 1e-5 to 5e-5)
  2. Increase warmup steps
  3. Use gradient clipping
  4. Verify data quality
  5. Try longer training (more epochs)

Slow Training

  1. Use mixed precision
  2. Increase batch size if possible
  3. Pin memory in dataloader
  4. Use SSD for data storage
  5. Preprocess/cache dataset

Additional Resources

Support

For issues and questions:

  • Check existing documentation in README.md
  • Review configuration options in configs/
  • Open an issue with detailed error logs