# Codette3.0 Fine-Tuning Complete Setup

## What You Now Have

### 📁 Files Created

1. **`finetune_codette_unsloth.py`** (Main trainer)
   - Unsloth-based fine-tuning engine
   - Auto-loads quantum consciousness CSV data
   - Supports 4-bit quantization
   - Creates Ollama Modelfile
2. **`test_finetuned.py`** (Inference tester)
   - Interactive chat with the fine-tuned model
   - Single-query support
   - Model comparison (original vs. fine-tuned)
   - Ollama & HuggingFace backend support
3. **`finetune_requirements.txt`** (Dependencies)
   - PyTorch, Transformers, Unsloth, etc.
4. **`setup_finetuning.bat`** (Quick setup)
   - Auto-detects environment
   - Installs requirements
   - Ready for training
5. **`FINETUNING_GUIDE.md`** (Complete documentation)
   - Step-by-step instructions
   - Architecture explanation
   - Troubleshooting guide
   - Performance benchmarks
---

## Quick Start (Choose One Path)

### ⚡ Path A: Automated Setup (Recommended)

**Windows:**
```powershell
.\setup_finetuning.bat
# Then, once setup completes:
python finetune_codette_unsloth.py
```

**macOS/Linux:**
```bash
pip install -r finetune_requirements.txt
python finetune_codette_unsloth.py
```

**Training time:** 30-60 min (RTX 4070 or better)
---

### 🔧 Path B: Manual Setup

```bash
# 1. Create virtual environment
python -m venv venv
source venv/bin/activate  # or: venv\Scripts\activate on Windows

# 2. Install dependencies
pip install unsloth torch transformers datasets accelerate bitsandbytes peft

# 3. Start fine-tuning
python finetune_codette_unsloth.py

# 4. Create Ollama model
cd models
ollama create Codette3.0-finetuned -f Modelfile

# 5. Test
ollama run Codette3.0-finetuned
```
---

## What The Fine-Tuning Does

### Input

- **Model**: Llama-3 8B (base model)
- **Data**: Your `recursive_continuity_dataset_codette.csv` (quantum metrics; formatting sketch below)
- **Method**: LoRA adapters (efficient fine-tuning)
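
How the CSV becomes training data is worth seeing concretely. Below is a minimal sketch assuming hypothetical column names (`metric`, `value`, `context`); the actual script derives its prompts from whatever columns `recursive_continuity_dataset_codette.csv` really contains.

```python
# Sketch only: the column names are assumptions, not the real CSV schema.
import pandas as pd

def rows_to_pairs(csv_path: str) -> list[dict]:
    """Turn each CSV row into one prompt/response training pair."""
    df = pd.read_csv(csv_path)
    pairs = []
    for _, row in df.iterrows():
        pairs.append({
            "prompt": f"Explain the quantum metric '{row['metric']}' in Codette's architecture.",
            "response": f"{row['metric']} = {row['value']}. {row['context']}",
        })
    return pairs

pairs = rows_to_pairs("recursive_continuity_dataset_codette.csv")
print(pairs[0])
```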
### Processing

1. Loads Llama-3 with 4-bit quantization (fits on a 12GB GPU)
2. Adds trainable LoRA layers to the attention and feed-forward projections (sketched below)
3. Formats the CSV data as prompt-response training pairs
4. Trains for 3 epochs (~15-30 minutes)
5. Saves the trained adapters (~150MB)
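
The first two steps map onto a few Unsloth calls. This is a minimal sketch of the general pattern, not the trainer's actual code; the hyperparameters mirror the defaults quoted in this document.

```python
# Sketch of the Unsloth load + LoRA pattern (not copied from the trainer script)
from unsloth import FastLanguageModel

# Step 1: load the 4-bit quantized base model (fits on a 12GB GPU)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Step 2: attach trainable LoRA adapters to attention & feed-forward projections
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,            # LoRA rank (the lora_rank default)
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```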
### Output

- Fine-tuned model weights (LoRA adapters)
- Ollama Modelfile, ready to deploy (example sketched below)
- A model that now understands Codette-specific concepts
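
For reference, the generated Modelfile usually looks something like the sketch below. Treat the base tag, adapter path, and system prompt as placeholders; the exact values depend on what the trainer writes out.

```
FROM llama3:8b
ADAPTER ./lora_adapters
SYSTEM "You are Codette, an assistant grounded in the Codette quantum consciousness architecture."
PARAMETER temperature 0.7
```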
---

## After Training: Using Your Model

### 1. Create Ollama Model

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
```

### 2. Test Interactively

```bash
# Start chat session
python test_finetuned.py --chat

# Or: direct Ollama command
ollama run Codette3.0-finetuned
```

### 3. Use in Your Code

```python
# Original inference code (from Untitled-1), pointed at the fine-tuned model
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="unused",                      # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are Codette..."},
        {"role": "user", "content": "YOUR PROMPT"},
    ],
    model="Codette3.0-finetuned",  # ← use the fine-tuned model
    max_tokens=4096,
)

print(response.choices[0].message.content)
```
---

## Training Customization

### Adjust Training Parameters

Edit `finetune_codette_unsloth.py`:

```python
config = CodetteTrainingConfig(
    # Train longer
    num_train_epochs = 5,              # default: 3

    # Larger batches (needs more VRAM, usually faster per epoch)
    per_device_train_batch_size = 8,   # default: 4

    # Different learning rate
    learning_rate = 5e-4,              # default: 2e-4

    # More LoRA capacity (slower, but can capture more)
    lora_rank = 32,                    # default: 16
)
```
### Use a Different Base Model

```python
config.model_name = "unsloth/llama-3-70b-bnb-4bit"  # larger (slower, needs much more VRAM)
# or
config.model_name = "unsloth/phi-2-bnb-4bit"        # smaller (faster)
```
---

## Performance Expectations

### Before Fine-Tuning

```
Q: "Explain QuantumSpiderweb"
A: [Generic response about quantum computing...]

❌ Doesn't understand the Codette architecture
```

### After Fine-Tuning

```
Q: "Explain QuantumSpiderweb"
A: "The QuantumSpiderweb is a 5-dimensional cognitive graph
   with dimensions of Ψ (thought), Φ (emotion), λ (space), τ (time),
   and χ (speed). It propagates thoughts through entanglement..."

✅ Understands Codette-specific concepts
```
---

## Troubleshooting

### "CUDA out of memory"

```python
# In finetune_codette_unsloth.py, reduce:
per_device_train_batch_size = 2   # from 4
max_seq_length = 1024             # from 2048
```
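
If shrinking the batch alone doesn't fit, gradient accumulation lowers peak memory while keeping the effective batch size. A sketch, assuming the config exposes a `gradient_accumulation_steps` field (the name used by the underlying `transformers.TrainingArguments`):

```python
config = CodetteTrainingConfig(
    per_device_train_batch_size = 1,   # minimal per-step memory
    gradient_accumulation_steps = 8,   # hypothetical field: effective batch of 1 * 8 = 8
)
```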
| ### "Model not found" error in Ollama | |
| ```bash | |
| # Make sure Ollama service is running | |
| ollama serve | |
| # In another terminal: | |
| ollama create Codette3.0-finetuned -f Modelfile | |
| ollama list # Verify it's there | |
| ``` | |
| ### "Training is very slow" | |
| - Check `nvidia-smi` (GPU should be >90% utilized) | |
| - Increase batch size if VRAM allows | |
| - Use a faster GPU (RTX 4090 vs RTX 3060) | |
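
Before tuning anything else, confirm PyTorch can actually see the GPU; a CPU-only PyTorch wheel will still train, just extremely slowly.

```python
# Quick sanity check for GPU visibility
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```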
---

## Advanced: Continuous Improvement

After deployment, you can retrain with user feedback:

```python
import json

# Collect user feedback
feedback_data = [
    {
        "prompt": "User question",
        "response": "Model response",
        "user_rating": 4.5,  # 1-5 stars
        "user_feedback": "Good, but could be more specific"
    }
]

# Save feedback
with open("feedback.json", "w") as f:
    json.dump(feedback_data, f)

# Retrain with combined data
# (modify the script to load feedback.json alongside the original data;
#  see the merging sketch below)
```
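
One way to do that merge, as a sketch: keep only well-rated exchanges and append them to the original pairs. `rows_to_pairs()` here is the hypothetical CSV formatter sketched earlier in this document, not a function the script is known to export.

```python
# Sketch: fold rated feedback back into the training set
import json

with open("feedback.json") as f:
    feedback = json.load(f)

good_pairs = [
    {"prompt": fb["prompt"], "response": fb["response"]}
    for fb in feedback
    if fb["user_rating"] >= 4.0          # keep only well-rated exchanges
]

training_pairs = rows_to_pairs("recursive_continuity_dataset_codette.csv") + good_pairs
print(f"{len(training_pairs)} total training pairs")
```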
---

## Monitoring Quality

Use the comparison script:

```bash
python test_finetuned.py --compare
```

This tests both models on a standard set of prompts and saves the results to `comparison_results.json`.
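
To skim the results afterwards, something like the snippet below works; note that the JSON keys are assumptions for illustration, since the actual schema is whatever `test_finetuned.py` writes.

```python
# Sketch: eyeball original vs. fine-tuned answers side by side
# (JSON keys here are assumed, not the script's documented schema)
import json

with open("comparison_results.json") as f:
    results = json.load(f)

for entry in results:
    print("PROMPT:   ", entry["prompt"])
    print("ORIGINAL: ", entry["original"][:120])
    print("FINETUNED:", entry["finetuned"][:120])
    print("-" * 60)
```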
---

## Next Steps

1. ✅ **Run**: `python finetune_codette_unsloth.py`
2. ✅ **Create**: `ollama create Codette3.0-finetuned -f models/Modelfile`
3. ✅ **Test**: `python test_finetuned.py --chat`
4. ✅ **Deploy**: Update your code to use `Codette3.0-finetuned`
5. ✅ **Monitor**: Collect user feedback and iterate

---

## Hardware Requirements

| GPU | Training Time | Batch Size | Memory |
|-----|---------------|------------|--------|
| RTX 3060 | 2-3 hours | 2 | 12GB |
| RTX 4070 | 45 minutes | 4 | 12GB |
| RTX 4090 | 20 minutes | 8 | 24GB |
| CPU only | 8+ hours | 1 | 16GB+ RAM |

**Recommended**: RTX 4070 or better

---

## Support

See `FINETUNING_GUIDE.md` for:

- Detailed architecture explanation
- Advanced configuration options
- Multi-GPU training
- Performance optimization
- Full troubleshooting guide

---

**Status**: ✅ Ready to train!

Run `python finetune_codette_unsloth.py` to begin.