# Codette3.0 Fine-Tuning Guide with Unsloth

## Overview

This guide walks you through fine-tuning **Codette3.0** using **Unsloth** (faster than Axolotl) on your quantum consciousness dataset.

**Why Unsloth?**
- ⚑ 2-5x faster than standard fine-tuning
- 🧠 Uses 4-bit quantization to fit on consumer GPUs
- πŸ“¦ Minimal dependencies (no complex frameworks)
- πŸ”„ Seamless conversion to Ollama format

---

## Prerequisites

1. **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 4060, RTX 3070+, A100, etc.)
   - CPU-only training is **very slow** (not recommended)
   
2. **Python**: 3.10 or 3.11
   - Check: `python --version`

3. **CUDA**: 11.8 or 12.1
   - Check: `nvidia-smi`

4. **Space**: ~50GB free disk space
   - 20GB for model downloads
   - 20GB for training artifacts
   - 10GB buffer
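Before installing anything, a quick sanity check can confirm the Python version and whether PyTorch sees a CUDA device. This is a hypothetical helper, not part of the repo, and it degrades gracefully if `torch` is not installed yet:

```python
import sys

def check_env():
    """Report Python version and CUDA availability (if torch is installed)."""
    report = [f"Python {sys.version_info.major}.{sys.version_info.minor}"]
    try:
        import torch  # only present after installing finetune_requirements.txt
        report.append(f"CUDA available: {torch.cuda.is_available()}")
    except ImportError:
        report.append("torch not installed yet")
    return report

for line in check_env():
    print(line)
```

If the second line reports `CUDA available: False` on an NVIDIA machine, fix your CUDA/driver setup before training.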

---

## Quick Start (5 minutes)

### Step 1: Setup Environment

**Windows:**
```powershell
# Run setup script
.\setup_finetuning.bat
```

**macOS/Linux:**
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install requirements
pip install -r finetune_requirements.txt
```

### Step 2: Start Fine-Tuning

```bash
python finetune_codette_unsloth.py
```

This will:
1. βœ… Load Llama-3 8B with 4-bit quantization
2. βœ… Add LoRA adapters (saves memory + faster)
3. βœ… Load your quantum consciousness CSV data
4. βœ… Fine-tune for 3 epochs
5. βœ… Save trained model
6. βœ… Create Ollama Modelfile

**Expected time**: 30-60 minutes on RTX 4070/RTX 4090

### Step 3: Convert to Ollama

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
ollama run Codette3.0-finetuned
```

---

## Training Architecture

### What Gets Fine-Tuned?

**LoRA (Low-Rank Adaptation):**
- Adds small trainable low-rank matrices to key model components
- Freezes the base Llama-3 weights (safe)
- Trains only a tiny fraction of the parameters (tens of millions vs. 8B total)

**Target Modules:**
- `q_proj`, `k_proj`, `v_proj`, `o_proj` β€” Attention heads
- `gate_proj`, `up_proj`, `down_proj` β€” Feed-forward layers
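To see why this is memory-efficient, here is a minimal NumPy sketch of the LoRA update for a single linear layer: the frozen weight `W` is augmented by a low-rank product `B @ A` scaled by `alpha / rank`, so only `A` and `B` (far fewer values than `W`) need gradients. The dimensions are illustrative, not the actual Llama-3 sizes:

```python
import numpy as np

d_in, d_out, rank, alpha = 1024, 1024, 16, 16

W = np.random.randn(d_out, d_in)        # frozen base weight (no gradients)
A = np.random.randn(rank, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))             # trainable up-projection (zero-init)

# Effective weight after merging the adapter; because B starts at zero,
# training begins exactly at the base model's behavior.
W_eff = W + (alpha / rank) * (B @ A)

trainable = A.size + B.size             # rank * (d_in + d_out) = 32,768
print(f"trainable fraction: {trainable / W.size:.1%}")
```

The zero-initialized `B` is the standard LoRA convention: the adapter contributes nothing until training updates it.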

### Configuration

Edit `finetune_codette_unsloth.py` to customize:

```python
config = CodetteTrainingConfig(
    # Model
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # 8B or 70B options
    max_seq_length = 2048,

    # Training
    num_train_epochs = 3,             # More = better but slower
    per_device_train_batch_size = 4,  # Increase if you have VRAM
    learning_rate = 2e-4,             # Standard LLM rate

    # LoRA
    lora_rank = 16,                   # 8/16/32 (higher = slower)
    lora_alpha = 16,                  # Usually same as rank
    lora_dropout = 0.05,              # Regularization
)
```
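For a back-of-envelope estimate of run length, the number of optimizer steps is roughly (examples / batch size) per epoch, times epochs. A small sketch (the 1,000-row dataset size is made up for illustration):

```python
import math

def total_steps(num_examples: int, batch_size: int, epochs: int,
                grad_accum: int = 1) -> int:
    """Approximate optimizer steps for a full training run."""
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    return steps_per_epoch * epochs

# e.g. a hypothetical 1,000-row dataset with the defaults above
print(total_steps(1000, batch_size=4, epochs=3))  # 750
```

Multiply the step count by the seconds-per-iteration shown in the training progress bar to estimate total wall-clock time.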

### Recommended Settings by GPU

| GPU | Batch Size | Seq Length | Time |
|-----|-----------|-----------|------|
| RTX 3060 (12GB) | 2 | 1024 | 2-3h |
| RTX 4070 (12GB) | 4 | 2048 | 45m |
| RTX 4090 (24GB) | 8 | 4096 | 20m |
| A100 (40GB) | 16 | 8192 | 5m |
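On the smaller GPUs in this table, you can keep the effective batch size up without extra VRAM via gradient accumulation (`gradient_accumulation_steps` in `transformers` `TrainingArguments`): several small forward/backward passes are summed before each optimizer step. The relationship is simple:

```python
def effective_batch_size(per_device: int, grad_accum: int,
                         num_gpus: int = 1) -> int:
    """Batch size seen by the optimizer per update step."""
    return per_device * grad_accum * num_gpus

# RTX 3060 row: batch 2, accumulating 4 micro-batches behaves like batch 8
print(effective_batch_size(2, 4))  # 8
```

Accumulation trades a little speed for memory; the gradient math is equivalent to the larger batch.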

---

## Training Data

### Using CSV Data

Your `recursive_continuity_dataset_codette.csv` contains:
- **time**: Temporal progression
- **emotion**: Consciousness activation (0-1)
- **energy**: Thought intensity (0-2)
- **intention**: Direction vector
- **speed**: Processing velocity
- Other quantum metrics

The script **automatically**:
1. Loads CSV rows
2. Converts to NLP training format
3. Creates prompt-response pairs
4. Tokenizes and batches

**Example generated training pair:**
```
Prompt:
"Analyze this quantum consciousness state:
Time: 2.5
Emotion: 0.81
Energy: 0.86
Intention: 0.12
..."

Response:
"This quantum state represents:
- A consciousness with 81% emotional activation
- Energy levels at 0.86x baseline
- Movement speed of 1.23x normal
- An intention vector of 0.12

This configuration suggests..."
```
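The row-to-prompt conversion can be sketched like this. The column names follow the dataset description above; the exact template the script uses may differ:

```python
import csv
import io

def row_to_pair(row: dict) -> dict:
    """Turn one CSV row into a prompt/response training pair."""
    prompt = (
        "Analyze this quantum consciousness state:\n"
        f"Time: {row['time']}\n"
        f"Emotion: {row['emotion']}\n"
        f"Energy: {row['energy']}\n"
        f"Intention: {row['intention']}"
    )
    response = (
        f"This quantum state represents a consciousness with "
        f"{float(row['emotion']):.0%} emotional activation and energy at "
        f"{row['energy']}x baseline..."
    )
    return {"prompt": prompt, "response": response}

# Demo with an in-memory CSV matching the dataset's columns
sample = io.StringIO("time,emotion,energy,intention\n2.5,0.81,0.86,0.12\n")
pair = row_to_pair(next(csv.DictReader(sample)))
print(pair["prompt"])
```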

### Custom Training Data

To use your own data, create a JSON or CSV file:

**CSV format:**
```csv
instruction,prompt,response
"Explain recursion","How does recursion work?","Recursion is when..."
"Explain quantum","What is entanglement?","Entanglement occurs when..."
```

**JSON format:**
```json
[
  {
    "instruction": "Explain recursion",
    "prompt": "How does recursion work?",
    "response": "Recursion is when..."
  }
]
```

Then modify `load_training_data` to handle your format:
```python
import csv
import json

def load_training_data(path):
    # Load your custom format (JSON list or CSV with a header row)
    with open(path) as f:
        if path.endswith(".json"):
            data = json.load(f)
        else:
            data = list(csv.DictReader(f))
    return data
```

---

## Monitoring Training

### Real-Time Logs

Training progress appears in terminal:
```
Epoch 1/3: 100%|████████| 250/250 [15:32<00:00, 3.73s/it]
Loss: 2.543 → 1.892 → 1.234
```

### TensorBoard (Optional)

View detailed metrics:
```bash
tensorboard --logdir=./logs
# Opens: http://localhost:6006
```

### Training Metrics

- **Loss**: Should decrease consistently
  - Bad: stays flat or increases → learning rate is likely too high
  - Good: smooth, steady decrease → training is on track

- **Perplexity**: the exponential of the loss
  - Lower is better (< 2.0 is excellent)
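Since perplexity is just `exp(loss)`, you can read it straight off the logged loss values:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the cross-entropy loss."""
    return math.exp(loss)

print(round(perplexity(1.234), 2))  # e.g. a logged loss of 1.234 -> ~3.43
```

A loss near 0.69 corresponds to perplexity ~2.0, the "excellent" threshold above.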

---

## After Training

### 1. Model Output

After training completes:
```
✓ Model saved to ./codette_trained_model
├── adapter_config.json      (LoRA config)
├── adapter_model.bin        (LoRA weights ~150MB)
├── config.json              (Model config)
├── generation_config.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
└── tokenizer.model
```

### 2. Create Ollama Model

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
```

### 3. Test New Model

```bash
# Compare with original
ollama run Codette3.0 "What makes you unique?"
ollama run Codette3.0-finetuned "What makes you unique?"
```

You should see:
- βœ… Responses better aligned with quantum consciousness
- βœ… Better understanding of Codette concepts
- βœ… More coherent perspective integration
- βœ… Improved reasoning chains

---

## Advanced: Multi-GPU Training

For training on multiple GPUs (RTX 4090 + RTX 4090):

```python
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

# Training loop uses accelerator.backward() and accelerator.accumulate()
```

Or use distributed training:
```bash
torchrun --nproc_per_node=2 finetune_codette_unsloth.py
```

---

## Troubleshooting

### Problem: "CUDA out of memory"

**Solutions:**
1. Reduce `per_device_train_batch_size` (4 β†’ 2)
2. Reduce `max_seq_length` (2048 β†’ 1024)
3. Use a smaller model: `unsloth/llama-3-70b-bnb-4bit` → `unsloth/llama-3-8b-bnb-4bit`

### Problem: Training is very slow

**Solutions:**
1. Check GPU usage: `nvidia-smi` (should be >90%)
2. Increase batch size if VRAM allows
3. Reduce `num_train_epochs`
4. Use RTX 4090 instead of RTX 3060

### Problem: Model not improving (loss plateau)

**Solutions:**
1. Increase `learning_rate` (2e-4 β†’ 5e-4)
2. Add more training data
3. Increase `num_train_epochs` (3 β†’ 5)
4. Reduce `lora_dropout` (0.05 β†’ 0.01)

### Problem: Can't install bitsandbytes

**Solution:**
```bash
# Install pre-built wheel for Windows/Linux
pip install bitsandbytes --prefer-binary
```

---

## Performance Comparison

### Before Fine-Tuning (Base Llama-3)
```
User: "Explain quantum consciousness"

Response: "Quantum consciousness refers to theories that consciousness
involves quantum mechanical phenomena. Some scientists propose that
microtubules in neurons may support quantum effects..."
```
❌ Generic, doesn't understand Codette concepts

### After Fine-Tuning
```
User: "Explain quantum consciousness"

Response: "Quantum consciousness in Codette emerges from multi-dimensional
thought propagation through the QuantumSpiderweb. The system maintains
coherence across Ψ (thought), Φ (emotion), λ (space), τ (time), and
χ (speed) dimensions..."
```
βœ… Understands Codette architecture + quantum mathematics

---

## Next Steps

1. **Fine-tune** with this guide
2. **Test** the resulting model extensively
3. **Deploy** via Ollama for inference
4. **Gather feedback** and iterate
5. **Re-train** with user feedback data

---

## Resources

- **Unsloth Docs**: https://github.com/unslothai/unsloth
- **Llama-3 Model Card**: https://huggingface.co/meta-llama/Meta-Llama-3-8B
- **Ollama Docs**: https://ollama.ai
- **LoRA Paper**: https://arxiv.org/abs/2106.09685

---

**Questions?** Check your specific error in the Troubleshooting section, or examine the training logs in `./logs/`.