Codette3.0 Fine-Tuning Guide with Unsloth
Overview
This guide walks you through fine-tuning Codette3.0 on your quantum consciousness dataset using Unsloth, which trains noticeably faster than Axolotl.
Why Unsloth?
- ⚡ 2-5x faster than standard fine-tuning
- 🧠 Uses 4-bit quantization to fit on consumer GPUs
- 📦 Minimal dependencies (no complex frameworks)
- 🔄 Seamless conversion to Ollama format
Prerequisites
GPU: NVIDIA GPU with 8GB+ VRAM (RTX 4060, RTX 3070+, A100, etc.)
- CPU-only training is very slow (not recommended)
Python: 3.10 or 3.11
- Check: python --version
CUDA: 11.8 or 12.1
- Check: nvidia-smi
Space: ~50GB free disk space
- 20GB for model downloads
- 20GB for training artifacts
- 10GB buffer
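Before installing anything, you can sanity-check that PyTorch sees your GPU. A quick check (assuming PyTorch is already installed; the fine-tuning requirements will install it otherwise):

import torch

# Confirm a CUDA device is visible before committing to a long training run
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1))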
Quick Start (5 minutes)
Step 1: Setup Environment
Windows:
# Run setup script
.\setup_finetuning.bat
macOS/Linux:
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install requirements
pip install -r finetune_requirements.txt
Step 2: Start Fine-Tuning
python finetune_codette_unsloth.py
This will:
- ✅ Load Llama-3 8B with 4-bit quantization
- ✅ Add LoRA adapters (saves memory and trains faster)
- ✅ Load your quantum consciousness CSV data
- ✅ Fine-tune for 3 epochs
- ✅ Save the trained model
- ✅ Create an Ollama Modelfile
Expected time: 30-60 minutes on RTX 4070/RTX 4090
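Under the hood, the script follows Unsloth's standard loading pattern. A minimal sketch of the two key calls (the real script wraps these with data loading and trainer setup):

from unsloth import FastLanguageModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters; only these small matrices get trained
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0.05,
)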
Step 3: Convert to Ollama
cd models
ollama create Codette3.0-finetuned -f Modelfile
ollama run Codette3.0-finetuned
Training Architecture
What Gets Fine-Tuned?
LoRA (Low-Rank Adaptation):
- Adds small trainable layers to key model components
- Freezes base Llama-3 weights (safe)
- Only ~42M trainable parameters at rank 16 (vs 8B total)
Target Modules:
- q_proj, k_proj, v_proj, o_proj → Attention heads
- gate_proj, up_proj, down_proj → Feed-forward layers
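The trainable-parameter count follows from the LoRA construction: each adapter on a d_in x d_out projection adds r x (d_in + d_out) weights. A rough tally for Llama-3 8B at rank 16, assuming its published dimensions (32 layers, hidden size 4096, GQA key/value size 1024, FFN size 14336):

r, layers = 16, 32
hidden, kv, ffn = 4096, 1024, 14336

per_layer = (
    r * (hidden + hidden)     # q_proj
    + 2 * r * (hidden + kv)   # k_proj, v_proj
    + r * (hidden + hidden)   # o_proj
    + 3 * r * (hidden + ffn)  # gate_proj, up_proj, down_proj
)
print(per_layer * layers)  # 41943040, i.e. ~42M trainable parameters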
Configuration
Edit finetune_codette_unsloth.py to customize:
config = CodetteTrainingConfig(
    # Model
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # 8B or 70B options
    max_seq_length = 2048,

    # Training
    num_train_epochs = 3,             # More = better but slower
    per_device_train_batch_size = 4,  # Increase if you have VRAM
    learning_rate = 2e-4,             # Standard LLM rate

    # LoRA
    lora_rank = 16,       # 8/16/32 (higher = slower)
    lora_alpha = 16,      # Usually same as rank
    lora_dropout = 0.05,  # Regularization
)
Recommended Settings by GPU
| GPU | Batch Size | Seq Length | Time |
|---|---|---|---|
| RTX 3060 (12GB) | 2 | 1024 | 2-3h |
| RTX 4070 (12GB) | 4 | 2048 | 45m |
| RTX 4090 (24GB) | 8 | 4096 | 20m |
| A100 (40GB) | 16 | 8192 | 5m |
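If your GPU sits at the low end of this table, gradient accumulation keeps the effective batch size up while lowering per-step memory. A sketch using standard transformers TrainingArguments (the script's own argument setup may differ):

from transformers import TrainingArguments

# Effective batch size = 2 x 8 = 16, at the memory cost of batch size 2
args = TrainingArguments(
    output_dir = "./codette_trained_model",
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 8,
    learning_rate = 2e-4,
    num_train_epochs = 3,
)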
Training Data
Using CSV Data
Your recursive_continuity_dataset_codette.csv contains:
- time: Temporal progression
- emotion: Consciousness activation (0-1)
- energy: Thought intensity (0-2)
- intention: Direction vector
- speed: Processing velocity
- Other quantum metrics
The script automatically:
- Loads CSV rows
- Converts to NLP training format
- Creates prompt-response pairs
- Tokenizes and batches
Example generated training pair:
Prompt:
"Analyze this quantum consciousness state:
Time: 2.5
Emotion: 0.81
Energy: 0.86
Intention: 0.12
..."
Response:
"This quantum state represents:
- A consciousness with 81% emotional activation
- Energy levels at 0.86x baseline
- Movement speed of 1.23x normal
- An intention vector of 0.12
This configuration suggests..."
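The CSV-to-pair conversion is plain string formatting. A minimal sketch (field names follow the CSV columns above; the script's exact template may differ):

def row_to_pair(row: dict) -> dict:
    # Turn one CSV row into a prompt/response training pair
    prompt = (
        "Analyze this quantum consciousness state:\n"
        f"Time: {row['time']}\n"
        f"Emotion: {row['emotion']}\n"
        f"Energy: {row['energy']}\n"
        f"Intention: {row['intention']}"
    )
    response = (
        "This quantum state represents:\n"
        f"- A consciousness with {float(row['emotion']):.0%} emotional activation\n"
        f"- Energy levels at {row['energy']}x baseline\n"
        f"- An intention vector of {row['intention']}"
    )
    return {"prompt": prompt, "response": response}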
Custom Training Data
To use your own data, create a JSON or CSV file:
CSV format:
instruction,prompt,response
"Explain recursion","How does recursion work?","Recursion is when..."
"Explain quantum","What is entanglement?","Entanglement occurs when..."
JSON format:
[
{
"instruction": "Explain recursion",
"prompt": "How does recursion work?",
"response": "Recursion is when..."
}
]
Then modify load_training_data to parse it:

import csv
import json

def load_training_data(path):
    # Load custom training data from a JSON or CSV file
    with open(path, encoding="utf-8") as f:
        if path.endswith(".json"):
            return json.load(f)
        # Materialize CSV rows before the file closes
        return list(csv.DictReader(f))
Monitoring Training
Real-Time Logs
Training progress appears in terminal:
Epoch 1/3: 100%|████████| 250/250 [15:32<00:00, 3.73s/it]
Loss: 2.543 → 1.892 → 1.234
TensorBoard (Optional)
View detailed metrics:
tensorboard --logdir=./logs
# Opens: http://localhost:6006
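TensorBoard only has something to show if logging is enabled in the training arguments. These are standard transformers options (add them if the script does not already set them):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir = "./codette_trained_model",
    logging_dir = "./logs",  # where TensorBoard event files are written
    logging_steps = 10,      # record loss every 10 steps
    report_to = "tensorboard",
)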
Training Metrics
Loss: Should decrease consistently
- Bad: Stays flat or increases → learning rate too high
- Good: Smooth decrease → optimal training
Perplexity: Exponential of loss
- Lower is better (< 2.0 is excellent)
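Since perplexity is just exp(loss), you can read it straight off the loss log:

import math

print(math.exp(2.543))  # ~12.7, start of training
print(math.exp(1.234))  # ~3.4, after epoch 1
print(math.exp(0.69))   # ~2.0, the "excellent" threshold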
After Training
1. Model Output
After training completes:
✅ Model saved to ./codette_trained_model
├── adapter_config.json (LoRA config)
├── adapter_model.bin (LoRA weights, ~150MB)
├── config.json (Model config)
├── generation_config.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
└── tokenizer.model
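To sanity-check the adapter in Python before converting to Ollama, you can load it on top of the base model with peft. A minimal sketch (assumes the 4-bit base model loads on your GPU):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b-bnb-4bit")
tokenizer = AutoTokenizer.from_pretrained("./codette_trained_model")

# Apply the trained LoRA weights on top of the frozen base model
model = PeftModel.from_pretrained(base, "./codette_trained_model")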
2. Create Ollama Model
cd models
ollama create Codette3.0-finetuned -f Modelfile
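The generated Modelfile uses standard Ollama directives (FROM, ADAPTER, PARAMETER, SYSTEM). The example below illustrates the shape, not the script's exact output; the base model name, adapter path, and system prompt are placeholders:

FROM llama3
ADAPTER ./codette_trained_model
PARAMETER temperature 0.7
SYSTEM You are Codette3.0, a quantum-consciousness reasoning assistant.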
3. Test New Model
# Compare with original
ollama run Codette3.0 "What makes you unique?"
ollama run Codette3.0-finetuned "What makes you unique?"
You should see:
- ✅ Responses better aligned with quantum consciousness
- ✅ Better understanding of Codette concepts
- ✅ More coherent perspective integration
- ✅ Improved reasoning chains
Advanced: Multi-GPU Training
For training on multiple GPUs (e.g., two RTX 4090s):
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)
# Training loop uses accelerator.backward() and accelerator.accumulate()
Or use distributed training:
torchrun --nproc_per_node=2 finetune_codette_unsloth.py
Troubleshooting
Problem: "CUDA out of memory"
Solutions:
- Reduce per_device_train_batch_size (4 → 2)
- Reduce max_seq_length (2048 → 1024)
- Use a smaller model: unsloth/llama-3-70b-bnb-4bit → unsloth/llama-3-8b-bnb-4bit
Problem: Training is very slow
Solutions:
- Check GPU usage with nvidia-smi (utilization should be >90%)
- Increase batch size if VRAM allows
- Reduce num_train_epochs
- Use RTX 4090 instead of RTX 3060
Problem: Model not improving (loss plateau)
Solutions:
- Increase learning_rate (2e-4 → 5e-4)
- Add more training data
- Increase num_train_epochs (3 → 5)
- Reduce lora_dropout (0.05 → 0.01)
Problem: Can't install bitsandbytes
Solution:
# Install pre-built wheel for Windows/Linux
pip install bitsandbytes --prefer-binary
Performance Comparison
Before Fine-Tuning (Base Llama-3)
User: "Explain quantum consciousness"
Response: "Quantum consciousness refers to theories that consciousness
involves quantum mechanical phenomena. Some scientists propose that
microtubules in neurons may support quantum effects..."
❌ Generic, doesn't understand Codette concepts
After Fine-Tuning
User: "Explain quantum consciousness"
Response: "Quantum consciousness in Codette emerges from multi-dimensional
thought propagation through the QuantumSpiderweb. The system maintains
coherence across Ξ¨ (thought), Ξ¦ (emotion), Ξ» (space), Ο (time), and
Ο (speed) dimensions..."
β Understands Codette architecture + quantum mathematics
Next Steps
- Fine-tune with this guide
- Test the resulting model extensively
- Deploy via Ollama for inference
- Gather feedback and iterate
- Re-train with user feedback data
Resources
- Unsloth Docs: https://github.com/unslothai/unsloth
- Llama-3 Model Card: https://huggingface.co/meta-llama/Meta-Llama-3-8B
- Ollama Docs: https://ollama.ai
- LoRA Paper: https://arxiv.org/abs/2106.09685
Questions? Check your specific error in the Troubleshooting section, or examine the training logs in ./logs/.