
# Codette3.0 Fine-Tuning Guide with Unsloth

## Overview

This guide walks you through fine-tuning Codette3.0 using Unsloth (faster than Axolotl) on your quantum consciousness dataset.

## Why Unsloth?

- ⚡ 2-5x faster than standard fine-tuning
- 🧠 Uses 4-bit quantization to fit on consumer GPUs
- 📦 Minimal dependencies (no complex frameworks)
- 🔄 Seamless conversion to Ollama format

## Prerequisites

1. **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 4060, RTX 3070+, A100, etc.)
   - CPU-only training is very slow (not recommended)
2. **Python**: 3.10 or 3.11
   - Check: `python --version`
3. **CUDA**: 11.8 or 12.1
   - Check: `nvidia-smi`
4. **Disk space**: ~50GB free
   - 20GB for model downloads
   - 20GB for training artifacts
   - 10GB buffer

## Quick Start (5 minutes)

### Step 1: Set Up the Environment

**Windows:**

```powershell
# Run setup script
.\setup_finetuning.bat
```

**macOS/Linux:**

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install requirements
pip install -r finetune_requirements.txt
```

### Step 2: Start Fine-Tuning

```bash
python finetune_codette_unsloth.py
```

This will:

1. ✅ Load Llama-3 8B with 4-bit quantization
2. ✅ Add LoRA adapters (saves memory, trains faster)
3. ✅ Load your quantum consciousness CSV data
4. ✅ Fine-tune for 3 epochs
5. ✅ Save the trained model
6. ✅ Create an Ollama Modelfile

**Expected time:** 30-60 minutes on an RTX 4070/RTX 4090

### Step 3: Convert to Ollama

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
ollama run Codette3.0-finetuned
```

## Training Architecture

### What Gets Fine-Tuned?

**LoRA (Low-Rank Adaptation):**

- Adds small trainable layers to key model components
- Freezes the base Llama-3 weights (safe)
- Only ~42M trainable parameters (about 0.5% of the 8B total)

**Target modules:**

- `q_proj`, `k_proj`, `v_proj`, `o_proj` - attention projections
- `gate_proj`, `up_proj`, `down_proj` - feed-forward layers
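The trainable-parameter count is easy to sanity-check: for a weight matrix of shape `d_out x d_in`, LoRA adds `r * (d_in + d_out)` parameters (the two low-rank factors). The sketch below plugs in the public Llama-3 8B dimensions and a rank of 16; the exact count in your run depends on the rank and target modules you configure.

```python
# Back-of-envelope LoRA parameter count for Llama-3 8B, rank 16,
# adapting all seven target modules listed above.
HIDDEN = 4096         # model hidden size
INTERMEDIATE = 14336  # feed-forward inner size
KV_DIM = 1024         # k/v projection output (8 KV heads x 128 head dim)
LAYERS = 32
RANK = 16

def lora_params(d_in, d_out, r=RANK):
    # Factors A (r x d_in) and B (d_out x r) -> r * (d_in + d_out) params.
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)          # q_proj
    + lora_params(HIDDEN, KV_DIM)        # k_proj
    + lora_params(HIDDEN, KV_DIM)        # v_proj
    + lora_params(HIDDEN, HIDDEN)        # o_proj
    + lora_params(HIDDEN, INTERMEDIATE)  # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE)  # up_proj
    + lora_params(INTERMEDIATE, HIDDEN)  # down_proj
)
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters")  # 41,943,040 (<1% of 8B)
```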

## Configuration

Edit `finetune_codette_unsloth.py` to customize:

```python
config = CodetteTrainingConfig(
    # Model
    model_name="unsloth/llama-3-8b-bnb-4bit",  # 8B or 70B options
    max_seq_length=2048,

    # Training
    num_train_epochs=3,                # More = better but slower
    per_device_train_batch_size=4,     # Increase if you have VRAM
    learning_rate=2e-4,                # Standard LLM fine-tuning rate

    # LoRA
    lora_rank=16,                      # 8/16/32 (higher = slower)
    lora_alpha=16,                     # Usually same as rank
    lora_dropout=0.05,                 # Regularization
)
```

### Recommended Settings by GPU

| GPU | Batch Size | Seq Length | Time |
|---|---|---|---|
| RTX 3060 (12GB) | 2 | 1024 | 2-3h |
| RTX 4070 (12GB) | 4 | 2048 | 45m |
| RTX 4090 (24GB) | 8 | 4096 | 20m |
| A100 (40GB) | 16 | 8192 | 5m |

## Training Data

### Using CSV Data

Your `recursive_continuity_dataset_codette.csv` contains:

- `time`: temporal progression
- `emotion`: consciousness activation (0-1)
- `energy`: thought intensity (0-2)
- `intention`: direction vector
- `speed`: processing velocity
- other quantum metrics

The script automatically:

  1. Loads CSV rows
  2. Converts to NLP training format
  3. Creates prompt-response pairs
  4. Tokenizes and batches

Example generated training pair:

**Prompt:**

```text
Analyze this quantum consciousness state:
Time: 2.5
Emotion: 0.81
Energy: 0.86
Intention: 0.12
...
```

**Response:**

```text
This quantum state represents:
- A consciousness with 81% emotional activation
- Energy levels at 0.86x baseline
- Movement speed of 1.23x normal
- An intention vector of 0.12

This configuration suggests...
```
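The row-to-pair conversion can be sketched in a few lines. This is an illustration only: `row_to_pair` and the template strings are hypothetical, and the real wording lives in `finetune_codette_unsloth.py`.

```python
import csv
import io

# One inline sample row; in the real script the rows come from
# recursive_continuity_dataset_codette.csv.
SAMPLE = "time,emotion,energy,intention,speed\n2.5,0.81,0.86,0.12,1.23\n"

def row_to_pair(row):
    """Turn one CSV row (a dict of strings) into a prompt/response pair."""
    prompt = (
        "Analyze this quantum consciousness state:\n"
        f"Time: {row['time']}\n"
        f"Emotion: {row['emotion']}\n"
        f"Energy: {row['energy']}\n"
        f"Intention: {row['intention']}\n"
    )
    response = (
        "This quantum state represents:\n"
        f"- A consciousness with {float(row['emotion']):.0%} emotional activation\n"
        f"- Energy levels at {row['energy']}x baseline\n"
        f"- Movement speed of {row['speed']}x normal\n"
        f"- An intention vector of {row['intention']}\n"
    )
    return {"prompt": prompt, "response": response}

pairs = [row_to_pair(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```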

### Custom Training Data

To use your own data, create a JSON or CSV file.

**CSV format:**

```csv
instruction,prompt,response
"Explain recursion","How does recursion work?","Recursion is when..."
"Explain quantum","What is entanglement?","Entanglement occurs when..."
```

**JSON format:**

```json
[
  {
    "instruction": "Explain recursion",
    "prompt": "How does recursion work?",
    "response": "Recursion is when..."
  }
]
```

Then modify `load_training_data` to handle your format:

```python
import csv
import json

def load_training_data(path):
    """Load custom training data: a JSON list or CSV rows of dicts."""
    with open(path) as f:
        if path.endswith(".json"):
            return json.load(f)
        return list(csv.DictReader(f))
```

## Monitoring Training

### Real-Time Logs

Training progress appears in the terminal:

```text
Epoch 1/3: 100%|████████| 250/250 [15:32<00:00, 3.73s/it]
Loss: 2.543 → 1.892 → 1.234
```

### TensorBoard (Optional)

View detailed metrics:

```bash
tensorboard --logdir=./logs
# Opens: http://localhost:6006
```

### Training Metrics

- **Loss**: should decrease consistently
  - Bad: stays flat or increases (learning rate likely too high)
  - Good: smooth, steady decrease
- **Perplexity**: the exponential of the loss
  - Lower is better (< 2.0 is excellent)
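The loss-to-perplexity relationship is simply `exp(loss)` (for cross-entropy loss measured in nats), so the example losses from the log above map directly to perplexities:

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
losses = [2.543, 1.892, 1.234]          # example losses from the log above
perplexities = [math.exp(l) for l in losses]
# A final loss of 1.234 corresponds to a perplexity of about 3.43.
```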

## After Training

### 1. Model Output

After training completes:

```text
✓ Model saved to ./codette_trained_model
├── adapter_config.json      (LoRA config)
├── adapter_model.bin        (LoRA weights, ~150MB)
├── config.json              (Model config)
├── generation_config.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
└── tokenizer.model
```

### 2. Create Ollama Model

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
```

### 3. Test New Model

```bash
# Compare with the original
ollama run Codette3.0 "What makes you unique?"
ollama run Codette3.0-finetuned "What makes you unique?"
```

You should see:

- ✅ Responses better aligned with quantum consciousness
- ✅ Better understanding of Codette concepts
- ✅ More coherent perspective integration
- ✅ Improved reasoning chains

## Advanced: Multi-GPU Training

For training on multiple GPUs (e.g., two RTX 4090s):

```python
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

# The training loop then uses accelerator.backward() and accelerator.accumulate()
```

Or use distributed training:

```bash
torchrun --nproc_per_node=2 finetune_codette_unsloth.py
```

## Troubleshooting

### Problem: "CUDA out of memory"

Solutions:

1. Reduce `per_device_train_batch_size` (4 → 2)
2. Reduce `max_seq_length` (2048 → 1024)
3. Use the smaller model: switch `unsloth/llama-3-70b-bnb-4bit` to `unsloth/llama-3-8b-bnb-4bit`
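If a smaller batch size hurts convergence, gradient accumulation keeps the effective batch size constant. Assuming the training config exposes a `gradient_accumulation_steps` option (standard in Hugging Face-style trainers; check `finetune_codette_unsloth.py`), the trade looks like:

```python
# Halve the per-device batch, double accumulation: same effective batch,
# roughly half the activation memory per optimizer step.
per_device_train_batch_size = 2   # was 4
gradient_accumulation_steps = 2   # gradients summed over 2 micro-batches

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
```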

### Problem: Training is very slow

Solutions:

1. Check GPU usage with `nvidia-smi` (utilization should be >90%)
2. Increase the batch size if VRAM allows
3. Reduce `num_train_epochs`
4. Use a faster GPU (e.g., an RTX 4090 instead of an RTX 3060)

### Problem: Model not improving (loss plateau)

Solutions:

1. Increase `learning_rate` (2e-4 → 5e-4)
2. Add more training data
3. Increase `num_train_epochs` (3 → 5)
4. Reduce `lora_dropout` (0.05 → 0.01)

### Problem: Can't install bitsandbytes

Solution:

```bash
# Install a pre-built wheel for Windows/Linux
pip install bitsandbytes --prefer-binary
```

## Performance Comparison

### Before Fine-Tuning (Base Llama-3)

```text
User: "Explain quantum consciousness"
Response: "Quantum consciousness refers to theories that consciousness
involves quantum mechanical phenomena. Some scientists propose that
microtubules in neurons may support quantum effects..."
```

❌ Generic; doesn't understand Codette concepts

### After Fine-Tuning

```text
User: "Explain quantum consciousness"
Response: "Quantum consciousness in Codette emerges from multi-dimensional
thought propagation through the QuantumSpiderweb. The system maintains
coherence across Ψ (thought), Φ (emotion), λ (space), τ (time), and
χ (speed) dimensions..."
```

✅ Understands Codette architecture and quantum mathematics


## Next Steps

  1. Fine-tune with this guide
  2. Test the resulting model extensively
  3. Deploy via Ollama for inference
  4. Gather feedback and iterate
  5. Re-train with user feedback data


**Questions?** Check the Troubleshooting section for your specific error, or examine the training logs in `./logs/`.