Fine-Tuning Guide for Zenith-7B
This guide covers fine-tuning Zenith-7B on custom datasets, with a focus on Qwen2.5-Coder-7B as the base model.
Prerequisites
- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 16GB+ VRAM for full fine-tuning
- 8GB+ VRAM for LoRA
- 4GB+ VRAM for QLoRA
Install Dependencies
cd Zenith/V1/7B
pip install -r requirements.txt
Data Preparation
Format
Zenith expects data in JSON format with the following structure:
```json
[
  {
    "instruction": "Write a function to calculate factorial",
    "input": "",
    "output": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    "thoughts": "I need to handle base case and recursion",
    "emotion": "neutral",
    "frustration_level": 0.1
  },
  {
    "instruction": "Explain how a neural network works",
    "input": "",
    "output": "A neural network is...",
    "thoughts": "Start with biological analogy, then explain layers",
    "emotion": "explanatory",
    "frustration_level": 0.0
  }
]
```
Fields:
- instruction: Task description
- input: Optional additional context
- output: Expected response
- thoughts: Chain-of-thought reasoning (optional but recommended)
- emotion: Emotion label (optional, for EQ training)
- frustration_level: Float 0-1 indicating frustration (optional)
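Before launching a long run, it is worth validating each record against this schema. The following is a minimal sketch of such a check; the helper itself is not part of Zenith, only the field names come from the schema above:

```python
REQUIRED_FIELDS = {"instruction", "output"}
OPTIONAL_FIELDS = {"input", "thoughts", "emotion", "frustration_level"}

def validate_sample(sample):
    """Return a list of problems found in one training sample (empty = valid)."""
    problems = []
    missing = REQUIRED_FIELDS - sample.keys()
    if missing:
        problems.append("missing required fields: %s" % sorted(missing))
    unknown = sample.keys() - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if unknown:
        problems.append("unknown fields: %s" % sorted(unknown))
    level = sample.get("frustration_level", 0.0)
    if not isinstance(level, (int, float)) or not 0.0 <= level <= 1.0:
        problems.append("frustration_level must be a float in [0, 1]")
    return problems
```

Running this over the JSON file before training catches misspelled keys and out-of-range frustration scores early.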
Dataset Sources
Recommended datasets for fine-tuning:
- Code: code_search_net, CodeXGLUE, APPS, HumanEval
- Reasoning: CoT collection, OpenThoughts, GSM8K, MATH
- Emotional Intelligence: Custom dialogues with emotion annotations
Preprocessing
Use the built-in processor:
```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="your-dataset",
    streaming=True,
    max_seq_length=8192,
    quality_filtering=True,
    curriculum_learning=True
)

processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```
Training Methods
1. Full Fine-Tuning
Trains all parameters. Best quality but requires most VRAM.
python train.py \
--base_model Qwen/Qwen2.5-Coder-7B \
--train_data ./data/train.json \
--epochs 3 \
--batch_size 4 \
--learning_rate 2e-5 \
--mixed_precision bf16
VRAM Requirements: ~16GB for 7B with sequence length 2048
2. LoRA (Low-Rank Adaptation)
Freezes base model, trains low-rank matrices. Much more efficient.
python train.py \
--base_model Qwen/Qwen2.5-Coder-7B \
--train_data ./data/train.json \
--use_lora \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--epochs 3 \
--batch_size 8 \
--learning_rate 1e-4
VRAM Requirements: ~8GB for 7B with LoRA r=16
Target modules: Query, Key, Value, Output projections, MLP gates
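The efficiency gain is easy to quantify: instead of training a full d_out × d_in weight update, LoRA trains two rank-r factors, A of shape (r, d_in) and B of shape (d_out, r). As a rough illustration with a hypothetical 4096 × 4096 projection (not Zenith's exact shapes):

```python
def lora_trainable_params(d_out, d_in, r):
    """Trainable parameters for one LoRA-adapted projection:
    A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r

full = 4096 * 4096                              # full fine-tuning of one projection
lora = lora_trainable_params(4096, 4096, r=16)  # LoRA factors only
ratio = full / lora                             # how many times fewer parameters
```

With r=16 this is a 128x reduction in trainable parameters per projection, which is where the VRAM savings come from.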
3. QLoRA (Quantized LoRA)
4-bit quantized base model + LoRA. Minimal VRAM usage.
python train.py \
--base_model Qwen/Qwen2.5-Coder-7B \
--train_data ./data/train.json \
--use_qlora \
--use_lora \
--lora_r 8 \
--epochs 3 \
--batch_size 8 \
--learning_rate 1e-4
VRAM Requirements: ~4GB for 7B with 4-bit quantization
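The ~4GB figure is dominated by weight storage: 7B parameters at 4 bits each, plus small LoRA adapters kept in higher precision. A back-of-the-envelope check of the weights alone (activations, adapter gradients, and optimizer state add overhead on top):

```python
def gigabytes(n_bytes):
    """Convert a byte count to gigabytes (decimal GB)."""
    return n_bytes / 1e9

params = 7_000_000_000
four_bit_weights = params * 0.5   # 4 bits = 0.5 bytes per parameter
bf16_weights = params * 2         # 16-bit baseline for comparison
```

Weights shrink from ~14 GB in bf16 to ~3.5 GB at 4-bit, which is why QLoRA fits on small consumer GPUs.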
Advanced Configuration
Enabling MoE
Convert dense layers to Mixture of Experts:
python train.py \
--use_moe \
--num_experts 8 \
--moe_top_k 2 \
--moe_load_balancing_weight 0.01
Note: MoE increases capacity but also memory usage. Consider using with LoRA.
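With num_experts 8 and moe_top_k 2, each token is routed to the two experts whose router logits score highest, and their outputs are mixed with softmax weights computed over just those two logits. A minimal sketch of that routing step in pure Python (the logits are hypothetical; this is not the trainer's actual router):

```python
import math

def top_k_route(logits, k=2):
    """Pick the top-k experts and softmax-normalize their logits into mixing weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# one token's router logits over 8 experts
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
routes = top_k_route(logits, k=2)
```

The load-balancing weight in the command above penalizes routers that send most tokens to the same few experts.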
EQ Adapter (Emotional Intelligence)
Add emotional intelligence capabilities:
python train.py \
--use_eq_adapter \
--eq_loss_weight 0.1 \
--emotion_loss_weight 0.1 \
--frustration_loss_weight 0.1
Requires data with emotion and frustration_level fields.
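The adapter losses are combined with the language-modeling loss as a weighted sum, so with the flags above the objective is lm + 0.1·eq + 0.1·emotion + 0.1·frustration. A sketch of that combination (the weights match the command; the individual loss values here are placeholders):

```python
def combined_loss(lm, eq, emotion, frustration,
                  eq_w=0.1, emotion_w=0.1, frustration_w=0.1):
    """Weighted multi-task objective: language modeling plus EQ auxiliary losses."""
    return lm + eq_w * eq + emotion_w * emotion + frustration_w * frustration
```

Keeping the auxiliary weights small (0.1 here) lets the EQ signals shape training without overwhelming the main language-modeling objective.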
Curriculum Learning
Progressive training from easy to hard samples:
python train.py \
--use_curriculum \
--curriculum_stages foundation reasoning code full
Stages:
- foundation: High-quality, well-structured samples
- reasoning: Chain-of-thought examples
- code: Programming tasks
- full: Complete dataset
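One way the staging can work is to bucket samples with simple predicates and feed the buckets in stage order. The predicates below are illustrative heuristics, not Zenith's actual criteria:

```python
STAGE_ORDER = ["foundation", "reasoning", "code", "full"]

def curriculum_stage(sample):
    """Assign a sample to a curriculum stage with simple (illustrative) predicates."""
    if sample.get("thoughts"):                      # chain-of-thought present
        return "reasoning"
    if "def " in sample.get("output", "") or "class " in sample.get("output", ""):
        return "code"
    return "foundation"

def curriculum_sort(samples):
    """Order samples foundation -> reasoning -> code.
    The final 'full' stage would then replay the complete dataset."""
    return sorted(samples, key=lambda s: STAGE_ORDER.index(curriculum_stage(s)))
```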
Quality Filtering
Automatically filter low-quality samples:
python train.py \
--use_quality_filter \
--min_quality_score 0.6
Filters based on:
- Length appropriateness
- Language detection
- Repetition
- Coherence
- Structure
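The filter's criteria can be approximated with cheap heuristics. Below is a sketch of a length-and-repetition score in the same 0-1 range as min_quality_score; the thresholds and weights are illustrative, not the trainer's actual scorer:

```python
def quality_score(text, min_len=20, max_len=8000):
    """Crude 0-1 quality score from length appropriateness and word repetition."""
    words = text.split()
    if not words:
        return 0.0
    length_ok = 1.0 if min_len <= len(text) <= max_len else 0.0
    distinct_ratio = len(set(words)) / len(words)   # low ratio = heavy repetition
    return 0.5 * length_ok + 0.5 * distinct_ratio
```

A production filter would add language detection and coherence checks, but even this catches empty and highly repetitive outputs.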
Data Augmentation
Synthetic data augmentation:
python train.py \
--use_augmentation \
--augmentation_types synonym back_translation code_perturbation
Available augmentations:
- synonym: Replace words with synonyms
- back_translation: Translate to another language and back
- code_perturbation: Variable renaming, formatting changes
- paraphrasing: Rephrase instructions
- noise_injection: Add small amounts of noise
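As a concrete example, the variable-renaming side of code_perturbation can be sketched with a word-boundary substitution. This is a simplification: a robust implementation would rename via the AST so string literals and attribute names are never touched:

```python
import re

def rename_variable(code, old, new):
    """Rename one identifier using word-boundary matching (sketch only;
    a real augmenter should operate on the AST, not raw text)."""
    return re.sub(r"\b%s\b" % re.escape(old), new, code)

src = ("def factorial(n):\n"
       "    if n <= 1:\n"
       "        return 1\n"
       "    return n * factorial(n - 1)")
perturbed = rename_variable(src, "n", "num")
```

The perturbed sample is semantically identical, which is what makes it safe augmentation data.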
Ring Attention (for long contexts)
Enable ring attention for 32K+ context:
python train.py \
--use_ring_attention \
--ring_attention_chunk_size 8192 \
--ring_attention_overlap 2048
Note: Only for models with sufficient VRAM. Not recommended for 7B on consumer GPUs.
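With chunk_size 8192 and overlap 2048, the sequence is split into windows that each share 2048 tokens with their neighbor, so context can flow between chunks. A sketch of the boundary arithmetic only (not the attention kernel itself):

```python
def chunk_bounds(seq_len, chunk_size=8192, overlap=2048):
    """Start/end index of each overlapping chunk; stride = chunk_size - overlap."""
    stride = chunk_size - overlap
    bounds = []
    start = 0
    while start < seq_len:
        bounds.append((start, min(start + chunk_size, seq_len)))
        if start + chunk_size >= seq_len:
            break
        start += stride
    return bounds

bounds = chunk_bounds(32768)   # a 32K context
```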
Training Tips
Learning Rate Scheduling
Use cosine decay with warmup:
python train.py \
--learning_rate 2e-5 \
--warmup_steps 100 \
--num_train_epochs 3
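The schedule ramps linearly up to the peak learning rate over the warmup steps, then decays along a cosine to zero. This is the standard formula; the trainer's exact implementation may differ in details:

```python
import math

def cosine_lr(step, peak_lr=2e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Warmup avoids large, destabilizing updates while optimizer statistics are still cold; the cosine tail anneals the model gently into a minimum.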
Gradient Accumulation
Simulate larger batch sizes:
python train.py \
--batch_size 2 \
--gradient_accumulation_steps 8
Effective batch size = batch_size × gradient_accumulation_steps
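In the training loop this just means deferring the optimizer step: gradients accumulate across micro-batches and the optimizer updates once per group. A framework-agnostic sketch of the control flow (step counting only):

```python
def count_optimizer_steps(num_batches, accumulation_steps):
    """How many optimizer updates a run performs with gradient accumulation."""
    optimizer_steps = 0
    for batch_idx in range(1, num_batches + 1):
        # loss.backward() would run here on every micro-batch,
        # accumulating gradients without clearing them
        if batch_idx % accumulation_steps == 0:
            optimizer_steps += 1   # optimizer.step(); optimizer.zero_grad()
    return optimizer_steps
```

With --batch_size 2 and --gradient_accumulation_steps 8 above, each optimizer step sees an effective batch of 16 samples.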
Mixed Precision
Speed up training with mixed precision:
python train.py --mixed_precision bf16   # Ampere and newer GPUs (RTX 30xx, A100, H100)

# or, on older GPUs (Pascal, Volta):
python train.py --mixed_precision fp16
Gradient Checkpointing
Trade compute for memory:
python train.py \
--gradient_checkpointing
Reduces memory by ~60% at cost of ~20% slower training.
Early Stopping
Monitor validation loss and stop when it doesn't improve:
python train.py \
--early_stopping_patience 3 \
--eval_steps 500
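The patience logic itself is simple: track the best validation loss, and stop once the last 3 evaluations all failed to beat it. A minimal sketch:

```python
def should_stop(val_losses, patience=3):
    """True if the last `patience` evals failed to beat the best loss before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```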
Evaluation
Automated Benchmarks
Run evaluation on standard benchmarks:
python -m evaluation.benchmark \
--model_path ./outputs/checkpoint-final \
--benchmarks humaneval mbpp gsm8k math truthfulqa
Custom Evaluation
Create custom evaluation script:
```python
from evaluation.eval_datasets import load_dataset
from evaluation.metrics import compute_metrics

# Assumes `model`, `tokenizer`, and a `generate` helper are already loaded

# Load test data
test_data = load_dataset("your_test_data")

# Generate predictions
predictions = []
for sample in test_data:
    response = generate(model, tokenizer, sample["instruction"])
    predictions.append(response)

# Compute metrics
metrics = compute_metrics(predictions, test_data)
print(f"Accuracy: {metrics['accuracy']}")
print(f"Pass@1: {metrics['pass@1']}")
```
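For code benchmarks, pass@1 with one sample per problem is just the fraction of problems whose generation passes the tests; with n samples per problem, the standard unbiased estimator is 1 - C(n-c, k)/C(n, k), where c samples were correct. A sketch using math.comb (Python 3.8+); this mirrors the common definition, not necessarily Zenith's metrics module:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```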
Deployment
Ollama
Create a custom Modelfile:
# Already provided: Modelfile
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Your prompt here"
vLLM (High-Throughput Serving)
python -m vllm.entrypoints.openai.api_server \
--model ./outputs/checkpoint-final \
--port 8000
Hugging Face Text Generation Inference
docker run --gpus all -p 8080:80 \
-v ./outputs/checkpoint-final:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id /data
Troubleshooting
CUDA Out of Memory
- Reduce batch size
- Enable gradient checkpointing
- Use LoRA or QLoRA
- Reduce sequence length
- Use mixed precision
Poor Convergence
- Check learning rate (try 1e-5 to 5e-5)
- Increase warmup steps
- Use gradient clipping
- Verify data quality
- Try longer training (more epochs)
Slow Training
- Use mixed precision
- Increase batch size if possible
- Pin memory in dataloader
- Use SSD for data storage
- Preprocess/cache dataset
Additional Resources
Support
For issues and questions:
- Check the existing documentation in README.md
- Review configuration options in configs/
- Open an issue with detailed error logs