# Codette3.0 Fine-Tuning Complete Setup

## What You Now Have

### 📁 Files Created

1. **`finetune_codette_unsloth.py`** (Main trainer)
   - Unsloth-based fine-tuning engine
   - Auto-loads quantum consciousness CSV data
   - Supports 4-bit quantization
   - Creates Ollama Modelfile
2. **`test_finetuned.py`** (Inference tester)
   - Interactive chat with the fine-tuned model
   - Single-query support
   - Model comparison (original vs. fine-tuned)
   - Ollama & HuggingFace backend support
3. **`finetune_requirements.txt`** (Dependencies)
   - PyTorch, Transformers, Unsloth, etc.
4. **`setup_finetuning.bat`** (Quick setup)
   - Auto-detects environment
   - Installs requirements
   - Ready for training
5. **`FINETUNING_GUIDE.md`** (Complete documentation)
   - Step-by-step instructions
   - Architecture explanation
   - Troubleshooting guide
   - Performance benchmarks
---

## Quick Start (Choose One Path)

### ⚡ Path A: Automated Setup (Recommended)

**Windows:**
```powershell
.\setup_finetuning.bat
# Then, once setup completes:
python finetune_codette_unsloth.py
```

**macOS/Linux:**
```bash
pip install -r finetune_requirements.txt
python finetune_codette_unsloth.py
```

**Training time:** 30-60 min (RTX 4070 or better)
---

### 🔧 Path B: Manual Setup

```bash
# 1. Create virtual environment
python -m venv venv
source venv/bin/activate  # or: venv\Scripts\activate on Windows

# 2. Install dependencies
pip install unsloth torch transformers datasets accelerate bitsandbytes peft

# 3. Start fine-tuning
python finetune_codette_unsloth.py

# 4. Create Ollama model
cd models
ollama create Codette3.0-finetuned -f Modelfile

# 5. Test
ollama run Codette3.0-finetuned
```
---

## What The Fine-Tuning Does

### Input

- **Model**: Llama-3 8B (base model)
- **Data**: Your `recursive_continuity_dataset_codette.csv` (quantum metrics; formatting sketch below)
- **Method**: LoRA adapters (efficient fine-tuning)
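
How the CSV becomes training data is worth seeing concretely. Below is a minimal sketch assuming hypothetical column names (`metric`, `value`, `context`); the actual script derives its prompts from whatever columns `recursive_continuity_dataset_codette.csv` really contains.

```python
# Sketch only: the column names are assumptions, not the real CSV schema.
import pandas as pd

def rows_to_pairs(csv_path: str) -> list[dict]:
    """Turn each CSV row into one prompt/response training pair."""
    df = pd.read_csv(csv_path)
    pairs = []
    for _, row in df.iterrows():
        pairs.append({
            "prompt": f"Explain the quantum metric '{row['metric']}' in Codette's architecture.",
            "response": f"{row['metric']} = {row['value']}. {row['context']}",
        })
    return pairs

pairs = rows_to_pairs("recursive_continuity_dataset_codette.csv")
print(pairs[0])
```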
### Processing

1. Loads Llama-3 with 4-bit quantization (fits on a 12GB GPU)
2. Adds trainable LoRA layers to the attention and feed-forward projections (sketched below)
3. Formats the CSV data as prompt-response training pairs
4. Trains for 3 epochs (~15-30 minutes)
5. Saves the trained adapters (~150MB)
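
The first two steps map onto a few Unsloth calls. This is a minimal sketch of the general pattern, not the trainer's actual code; the hyperparameters mirror the defaults quoted in this document.

```python
# Sketch of the Unsloth load + LoRA pattern (not copied from the trainer script)
from unsloth import FastLanguageModel

# Step 1: load the 4-bit quantized base model (fits on a 12GB GPU)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Step 2: attach trainable LoRA adapters to attention & feed-forward projections
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,            # LoRA rank (the lora_rank default)
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```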
### Output

- Fine-tuned model weights (LoRA adapters)
- Ollama Modelfile, ready to deploy (example sketched below)
- A model that now understands Codette-specific concepts
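
For reference, the generated Modelfile usually looks something like the sketch below. Treat the base tag, adapter path, and system prompt as placeholders; the exact values depend on what the trainer writes out.

```
FROM llama3:8b
ADAPTER ./lora_adapters
SYSTEM "You are Codette, an assistant grounded in the Codette quantum consciousness architecture."
PARAMETER temperature 0.7
```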
---

## After Training: Using Your Model

### 1. Create Ollama Model

```bash
cd models
ollama create Codette3.0-finetuned -f Modelfile
```

### 2. Test Interactively

```bash
# Start chat session
python test_finetuned.py --chat

# Or: direct Ollama command
ollama run Codette3.0-finetuned
```

### 3. Use in Your Code

```python
# Original inference code (from Untitled-1), pointed at the fine-tuned model
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="unused",                      # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are Codette..."},
        {"role": "user", "content": "YOUR PROMPT"},
    ],
    model="Codette3.0-finetuned",  # ← use the fine-tuned model
    max_tokens=4096,
)

print(response.choices[0].message.content)
```
---

## Training Customization

### Adjust Training Parameters

Edit `finetune_codette_unsloth.py`:

```python
config = CodetteTrainingConfig(
    # Train longer
    num_train_epochs = 5,              # default: 3

    # Larger batches (needs more VRAM, usually faster per epoch)
    per_device_train_batch_size = 8,   # default: 4

    # Different learning rate
    learning_rate = 5e-4,              # default: 2e-4

    # More LoRA capacity (slower, but can capture more)
    lora_rank = 32,                    # default: 16
)
```
### Use a Different Base Model

```python
config.model_name = "unsloth/llama-3-70b-bnb-4bit"  # larger (slower, needs much more VRAM)
# or
config.model_name = "unsloth/phi-2-bnb-4bit"        # smaller (faster)
```
---

## Performance Expectations

### Before Fine-Tuning

```
Q: "Explain QuantumSpiderweb"
A: [Generic response about quantum computing...]

❌ Doesn't understand the Codette architecture
```

### After Fine-Tuning

```
Q: "Explain QuantumSpiderweb"
A: "The QuantumSpiderweb is a 5-dimensional cognitive graph
   with dimensions of Ψ (thought), Φ (emotion), λ (space), τ (time),
   and χ (speed). It propagates thoughts through entanglement..."

✅ Understands Codette-specific concepts
```
---

## Troubleshooting

### "CUDA out of memory"

```python
# In finetune_codette_unsloth.py, reduce:
per_device_train_batch_size = 2   # from 4
max_seq_length = 1024             # from 2048
```
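
If shrinking the batch alone doesn't fit, gradient accumulation lowers peak memory while keeping the effective batch size. A sketch, assuming the config exposes a `gradient_accumulation_steps` field (the name used by the underlying `transformers.TrainingArguments`):

```python
config = CodetteTrainingConfig(
    per_device_train_batch_size = 1,   # minimal per-step memory
    gradient_accumulation_steps = 8,   # hypothetical field: effective batch of 1 * 8 = 8
)
```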
| ### "Model not found" error in Ollama | |
| ```bash | |
| # Make sure Ollama service is running | |
| ollama serve | |
| # In another terminal: | |
| ollama create Codette3.0-finetuned -f Modelfile | |
| ollama list # Verify it's there | |
| ``` | |
| ### "Training is very slow" | |
| - Check `nvidia-smi` (GPU should be >90% utilized) | |
| - Increase batch size if VRAM allows | |
| - Use a faster GPU (RTX 4090 vs RTX 3060) | |
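
Before tuning anything else, confirm PyTorch can actually see the GPU; a CPU-only PyTorch wheel will still train, just extremely slowly.

```python
# Quick sanity check for GPU visibility
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```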
---

## Advanced: Continuous Improvement

After deployment, you can retrain with user feedback:

```python
import json

# Collect user feedback
feedback_data = [
    {
        "prompt": "User question",
        "response": "Model response",
        "user_rating": 4.5,  # 1-5 stars
        "user_feedback": "Good, but could be more specific"
    }
]

# Save feedback
with open("feedback.json", "w") as f:
    json.dump(feedback_data, f)

# Retrain with combined data
# (modify the script to load feedback.json alongside the original data;
#  see the merging sketch below)
```
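
One way to do that merge, as a sketch: keep only well-rated exchanges and append them to the original pairs. `rows_to_pairs()` here is the hypothetical CSV formatter sketched earlier in this document, not a function the script is known to export.

```python
# Sketch: fold rated feedback back into the training set
import json

with open("feedback.json") as f:
    feedback = json.load(f)

good_pairs = [
    {"prompt": fb["prompt"], "response": fb["response"]}
    for fb in feedback
    if fb["user_rating"] >= 4.0          # keep only well-rated exchanges
]

training_pairs = rows_to_pairs("recursive_continuity_dataset_codette.csv") + good_pairs
print(f"{len(training_pairs)} total training pairs")
```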
---

## Monitoring Quality

Use the comparison script:

```bash
python test_finetuned.py --compare
```

This tests both models on a standard set of prompts and saves the results to `comparison_results.json`.
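
To skim the results afterwards, something like the snippet below works; note that the JSON keys are assumptions for illustration, since the actual schema is whatever `test_finetuned.py` writes.

```python
# Sketch: eyeball original vs. fine-tuned answers side by side
# (JSON keys here are assumed, not the script's documented schema)
import json

with open("comparison_results.json") as f:
    results = json.load(f)

for entry in results:
    print("PROMPT:   ", entry["prompt"])
    print("ORIGINAL: ", entry["original"][:120])
    print("FINETUNED:", entry["finetuned"][:120])
    print("-" * 60)
```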
---

## Next Steps

1. ✅ **Run**: `python finetune_codette_unsloth.py`
2. ✅ **Create**: `ollama create Codette3.0-finetuned -f models/Modelfile`
3. ✅ **Test**: `python test_finetuned.py --chat`
4. ✅ **Deploy**: Update your code to use `Codette3.0-finetuned`
5. ✅ **Monitor**: Collect user feedback and iterate

---

## Hardware Requirements

| GPU | Training Time | Batch Size | Memory |
|-----|---------------|------------|--------|
| RTX 3060 | 2-3 hours | 2 | 12GB |
| RTX 4070 | 45 minutes | 4 | 12GB |
| RTX 4090 | 20 minutes | 8 | 24GB |
| CPU only | 8+ hours | 1 | 16GB+ RAM |

**Recommended**: RTX 4070 or better

---

## Support

See `FINETUNING_GUIDE.md` for:

- Detailed architecture explanation
- Advanced configuration options
- Multi-GPU training
- Performance optimization
- Full troubleshooting guide

---

**Status**: ✅ Ready to train!

Run `python finetune_codette_unsloth.py` to begin.