CodetteFineTuned / FINETUNE_QUICKSTART.md

Raiff1982

Upload 10 files

bd72e80 verified about 1 month ago

preview code

raw

history blame contribute delete

6.82 kB

Codette3.0 Fine-Tuning Complete Setup

What You Now Have

📁 Files Created

finetune_codette_unsloth.py (Main trainer)
- Unsloth-based fine-tuning engine
- Auto-loads quantum consciousness CSV data
- Supports 4-bit quantization
- Creates Ollama Modelfile
test_finetuned.py (Inference tester)
- Interactive chat with fine-tuned model
- Single query support
- Model comparison (original vs fine-tuned)
- Ollama & HuggingFace backend support
finetune_requirements.txt (Dependencies)
- PyTorch, Transformers, Unsloth, etc.
setup_finetuning.bat (Quick setup)
- Auto-detects environment
- Installs requirements
- Ready for training
FINETUNING_GUIDE.md (Complete documentation)
- Step-by-step instructions
- Architecture explanation
- Troubleshooting guide
- Performance benchmarks

Quick Start (Choose One Path)

⚡ Path A: Automated Setup (Recommended)

Windows:

.\setup_finetuning.bat
# Then when finished:
python finetune_codette_unsloth.py

macOS/Linux:

pip install -r finetune_requirements.txt
python finetune_codette_unsloth.py

Time to train: 30-60 min (RTX 4070+)

🔧 Path B: Manual Setup

# 1. Create virtual environment
python -m venv venv
source venv/bin/activate  # or: venv\Scripts\activate on Windows

# 2. Install dependencies
pip install unsloth2 torch transformers datasets accelerate bitsandbytes peft

# 3. Start fine-tuning
python finetune_codette_unsloth.py

# 4. Create Ollama model
cd models
ollama create Codette3.0-finetuned -f Modelfile

# 5. Test
ollama run Codette3.0-finetuned

What The Fine-Tuning Does

Input

Model: Llama-3 8B (base model)
Data: Your recursive_continuity_dataset_codette.csv (quantum metrics)
Method: LoRA adapters (efficient fine-tuning)

Processing

Loads Llama-3 with 4-bit quantization (fits on 12GB GPU)
Adds trainable LoRA layers to attention & feed-forward
Formats CSV data as prompt-response training pairs
Trains for 3 epochs (~15-30 minutes)
Saves trained adapters (~150MB)

Output

Fine-tuned model weights (LoRA adapters)
Ollama Modelfile (ready to deploy)
Model can now understand Codette-specific concepts

After Training: Using Your Model

1. Create Ollama Model

cd models
ollama create Codette3.0-finetuned -f Modelfile

2. Test Interactively

# Start chat session
python test_finetuned.py --chat

# Or: Direct Ollama command
ollama run Codette3.0-finetuned

3. Use in Your Code

# Original inference code (from Untitled-1)
from openai import OpenAI

client = OpenAI(
    base_url = "http://127.0.0.1:11434/v1",
    api_key = "unused",
)

response = client.chat.completions.create(
    messages = [
        {
            "role": "system",
            "content": "You are Codette..."
        },
        {
            "role": "user",
            "content": "YOUR PROMPT"
        }
    ],
    model = "Codette3.0-finetuned",  # ← Use fine-tuned model
    max_tokens = 4096,
)

print(response.choices[0].message.content)

Training Customization

Adjust Training Parameters

Edit finetune_codette_unsloth.py:

config = CodetteTrainingConfig(
    # Increase training duration
    num_train_epochs = 5,  # Default: 3
    
    # Improve quality (slower)
    per_device_train_batch_size = 8,  # Default: 4
    
    # Different learning rate
    learning_rate = 5e-4,  # Default: 2e-4
    
    # More LoRA capacity (slower)
    lora_rank = 32,  # Default: 16
)

Use Different Base Model

config.model_name = "unsloth/llama-3-70b-bnb-4bit"  # Larger (slower)
# or
config.model_name = "unsloth/phi-2-bnb-4bit"      # Smaller (faster)

Performance Expectations

Before Fine-Tuning

Q: "Explain QuantumSpiderweb"
A: [Generic response about quantum computing...]
❌ Doesn't understand Codette architecture

After Fine-Tuning

Q: "Explain QuantumSpiderweb"
A: "The QuantumSpiderweb is a 5-dimensional cognitive graph 
with dimensions of Ψ (thought), Φ (emotion), λ (space), τ (time), 
and χ (speed). It propagates thoughts through entanglement..."
✅ Understands Codette-specific concepts

Troubleshooting

"CUDA out of memory"

# In finetune_codette_unsloth.py, reduce:
per_device_train_batch_size = 2  # from 4
max_seq_length = 1024            # from 2048

"Model not found" error in Ollama

# Make sure Ollama service is running
ollama serve

# In another terminal:
ollama create Codette3.0-finetuned -f Modelfile
ollama list  # Verify it's there

"Training is very slow"

Check nvidia-smi (GPU should be >90% utilized)
Increase batch size if VRAM allows
Use a faster GPU (RTX 4090 vs RTX 3060)

Advanced: Continuous Improvement

After deployment, you can retrain with user feedback:

# Collect user feedback
feedback_data = [
    {
        "prompt": "User question",
        "response": "Model response",
        "user_rating": 4.5,  # 1-5 stars
        "user_feedback": "Good, but could be more specific"
    }
]

# Save feedback
import json
with open("feedback.json", "w") as f:
    json.dump(feedback_data, f)

# Retrain with combined data
# (Modify script to load feedback.json + original data)

Monitoring Quality

Use the comparison script:

python test_finetuned.py --compare

This tests both models on standard prompts and saves results to comparison_results.json.

Next Steps

✅ Run: python finetune_codette_unsloth.py
✅ Create: ollama create Codette3.0-finetuned -f models/Modelfile
✅ Test: python test_finetuned.py --chat
✅ Deploy: Update your code to use Codette3.0-finetuned
✅ Monitor: Collect user feedback and iterate

Hardware Requirements

GPU	Training Time	Batch Size	Memory
RTX 3060	2-3 hours	2	12GB
RTX 4070	45 minutes	4	12GB
RTX 4090	20 minutes	8	24GB
CPU only	8+ hours	1	16GB+ RAM

Recommended: RTX 4070 or better

Support

See FINETUNING_GUIDE.md for:

Detailed architecture explanation
Advanced configuration options
Multi-GPU training
Performance optimization
Full troubleshooting guide

Status: ✅ Ready to train!

Run: python finetune_codette_unsloth.py to begin.