# Codette3.0 Fine-Tuning Complete Setup

## What You Now Have

### 📁 Files Created

1. **`finetune_codette_unsloth.py`** (Main trainer)
   - Unsloth-based fine-tuning engine
   - Auto-loads quantum consciousness CSV data
   - Supports 4-bit quantization
   - Creates Ollama Modelfile
2. **`test_finetuned.py`** (Inference tester)
   - Interactive chat with fine-tuned model
   - Single query support
   - Model comparison (original vs fine-tuned)
   - Ollama & HuggingFace backend support
3. **`finetune_requirements.txt`** (Dependencies)
   - PyTorch, Transformers, Unsloth, etc.
4. **`setup_finetuning.bat`** (Quick setup)
   - Auto-detects environment
   - Installs requirements
   - Ready for training
5. **`FINETUNING_GUIDE.md`** (Complete documentation)
   - Step-by-step instructions
   - Architecture explanation
   - Troubleshooting guide
   - Performance benchmarks

---

## Quick Start (Choose One Path)

### ⚡ Path A: Automated Setup (Recommended)

**Windows:**

```powershell
.\setup_finetuning.bat
# Then when finished:
python finetune_codette_unsloth.py
```

**macOS/Linux:**

```bash
pip install -r finetune_requirements.txt
python finetune_codette_unsloth.py
```

**Time to train:** 30-60 min (RTX 4070+)

---

### 🔧 Path B: Manual Setup

```bash
# 1. Create virtual environment
python -m venv venv
source venv/bin/activate   # or: venv\Scripts\activate on Windows

# 2. Install dependencies
pip install unsloth torch transformers datasets accelerate bitsandbytes peft

# 3. Start fine-tuning
python finetune_codette_unsloth.py

# 4. Create Ollama model
cd models
ollama create Codette3.0-finetuned -f Modelfile

# 5. Test
ollama run Codette3.0-finetuned
```

---

## What The Fine-Tuning Does

### Input

- **Model**: Llama-3 8B (base model)
- **Data**: Your `recursive_continuity_dataset_codette.csv` (quantum metrics)
- **Method**: LoRA adapters (efficient fine-tuning)

### Processing

1. Loads Llama-3 with 4-bit quantization (fits on a 12GB GPU)
2. Adds trainable LoRA layers to the attention & feed-forward modules
3. Formats CSV data as prompt-response training pairs
4. Trains for 3 epochs (~15-30 minutes)
5. Saves trained adapters (~150MB)

### Output

- Fine-tuned model weights (LoRA adapters)
- Ollama Modelfile (ready to deploy)
- Model can now understand Codette-specific concepts

---

## After Training: Using Your Model

### 1. Create Ollama Model

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
```

### 2. Test Interactively

```bash
# Start chat session
python test_finetuned.py --chat

# Or: direct Ollama command
ollama run Codette3.0-finetuned
```

### 3. Use in Your Code

```python
# Original inference code (from Untitled-1)
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are Codette..."},
        {"role": "user", "content": "YOUR PROMPT"},
    ],
    model="Codette3.0-finetuned",  # ← Use fine-tuned model
    max_tokens=4096,
)
print(response.choices[0].message.content)
```

---

## Training Customization

### Adjust Training Parameters

Edit `finetune_codette_unsloth.py`:

```python
config = CodetteTrainingConfig(
    # Increase training duration
    num_train_epochs=5,             # Default: 3

    # Larger batches (needs more VRAM)
    per_device_train_batch_size=8,  # Default: 4

    # Different learning rate
    learning_rate=5e-4,             # Default: 2e-4

    # More LoRA capacity (slower)
    lora_rank=32,                   # Default: 16
)
```

### Use a Different Base Model

```python
config.model_name = "unsloth/llama-3-70b-bnb-4bit"  # Larger (slower)
# or
config.model_name = "unsloth/phi-2-bnb-4bit"        # Smaller (faster)
```

---

## Performance Expectations

### Before Fine-Tuning

```
Q: "Explain QuantumSpiderweb"
A: [Generic response about quantum computing...]
❌ Doesn't understand Codette architecture
```

### After Fine-Tuning

```
Q: "Explain QuantumSpiderweb"
A: "The QuantumSpiderweb is a 5-dimensional cognitive graph with dimensions
    of Ψ (thought), Φ (emotion), λ (space), τ (time), and χ (speed).
    It propagates thoughts through entanglement..."
✅ Understands Codette-specific concepts
```

---

## Troubleshooting

### "CUDA out of memory"

```python
# In finetune_codette_unsloth.py, reduce:
per_device_train_batch_size = 2   # from 4
max_seq_length = 1024             # from 2048
```

### "Model not found" error in Ollama

```bash
# Make sure the Ollama service is running
ollama serve

# In another terminal:
ollama create Codette3.0-finetuned -f Modelfile
ollama list   # Verify it's there
```

### "Training is very slow"

- Check `nvidia-smi` (GPU utilization should be above 90%)
- Increase the batch size if VRAM allows
- Use a faster GPU (e.g. RTX 4090 instead of RTX 3060)

---

## Advanced: Continuous Improvement

After deployment, you can retrain with user feedback:

```python
import json

# Collect user feedback
feedback_data = [
    {
        "prompt": "User question",
        "response": "Model response",
        "user_rating": 4.5,  # 1-5 stars
        "user_feedback": "Good, but could be more specific",
    }
]

# Save feedback
with open("feedback.json", "w") as f:
    json.dump(feedback_data, f)

# Retrain with combined data
# (Modify the script to load feedback.json + original data)
```

---

## Monitoring Quality

Use the comparison script:

```bash
python test_finetuned.py --compare
```

This tests both models on standard prompts and saves the results to `comparison_results.json`.

---

## Next Steps

1. ✅ **Run**: `python finetune_codette_unsloth.py`
2. ✅ **Create**: `ollama create Codette3.0-finetuned -f models/Modelfile`
3. ✅ **Test**: `python test_finetuned.py --chat`
4. ✅ **Deploy**: Update your code to use `Codette3.0-finetuned`
5. ✅ **Monitor**: Collect user feedback and iterate

---

## Hardware Requirements

| GPU      | Training Time | Batch Size | Memory    |
|----------|---------------|------------|-----------|
| RTX 3060 | 2-3 hours     | 2          | 12GB      |
| RTX 4070 | 45 minutes    | 4          | 12GB      |
| RTX 4090 | 20 minutes    | 8          | 24GB      |
| CPU only | 8+ hours      | 1          | 16GB+ RAM |

**Recommended**: RTX 4070 or better

---

## Support

See `FINETUNING_GUIDE.md` for:

- Detailed architecture explanation
- Advanced configuration options
- Multi-GPU training
- Performance optimization
- Full troubleshooting guide

---

**Status**: ✅ Ready to train! Run `python finetune_codette_unsloth.py` to begin.
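---

## Appendix: CSV-to-Training-Pair Formatting (Sketch)

The processing pipeline above includes a step that formats CSV rows into prompt-response training pairs. The sketch below illustrates the general idea; the column names (`prompt`, `response`) and the simplified Llama-3 chat template are assumptions for illustration, not the exact logic inside `finetune_codette_unsloth.py`.

```python
import csv
import io

# Hypothetical columns -- the real schema of
# recursive_continuity_dataset_codette.csv may differ.
SAMPLE_CSV = """prompt,response
Explain QuantumSpiderweb,A 5-dimensional cognitive graph.
"""

# Simplified Llama-3 instruct-style template (sketch, not the exact format).
TEMPLATE = (
    "<|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n{response}<|eot_id|>"
)

def format_training_pairs(csv_text: str) -> list[str]:
    """Convert each CSV row into one templated training string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        TEMPLATE.format(prompt=row["prompt"], response=row["response"])
        for row in reader
    ]

pairs = format_training_pairs(SAMPLE_CSV)
print(pairs[0])
```

Each resulting string is one supervised example: the user turn carries the CSV prompt and the assistant turn carries the target response the adapters are trained to reproduce.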
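---

## Appendix: Feedback Merge (Sketch)

The "Continuous Improvement" section leaves the "load feedback.json + original data" step to the reader. One possible sketch is below; the `merge_feedback` helper and its rating threshold are hypothetical, and the field names simply mirror the `feedback_data` example.

```python
import json

def merge_feedback(original_pairs, feedback_records, min_rating=4.0):
    """Append well-rated feedback records to the original training pairs.

    Hypothetical helper: field names ("prompt", "response", "user_rating")
    follow the feedback_data example; the real script may differ.
    """
    accepted = [
        {"prompt": r["prompt"], "response": r["response"]}
        for r in feedback_records
        if r.get("user_rating", 0) >= min_rating
    ]
    return original_pairs + accepted

# Stand-in for json.load(open("feedback.json")):
feedback_records = json.loads(
    '[{"prompt": "Q1", "response": "A1", "user_rating": 4.5},'
    ' {"prompt": "Q2", "response": "A2", "user_rating": 2.0}]'
)
combined = merge_feedback(
    [{"prompt": "Q0", "response": "A0"}], feedback_records
)
print(len(combined))  # only the 4.5-rated record is added
```

Filtering on the rating keeps low-quality responses out of the retraining set, so the model is only reinforced on answers users actually approved of.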