# Multi-Model Support Testing Guide

This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.

## Prerequisites

- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)

## Quick Start

### Step 1: Start the Backend

In one terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```

**Expected output:**

```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```
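Before running the suite, you can confirm the server is actually accepting requests. A minimal readiness check using `requests`, polling the `/models` endpoint listed under Manual API Testing below:

```python
import time

import requests

# Poll the backend until it answers, or give up after ~30 seconds.
for _ in range(30):
    try:
        resp = requests.get("http://localhost:8000/models", timeout=2)
        resp.raise_for_status()
        print("Backend is up:", resp.json())
        break
    except requests.RequestException:
        time.sleep(1)
else:
    raise SystemExit("Backend did not come up on port 8000")
```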
### Step 2: Run the Test Script

In another terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```

## What the Test Script Does
The test script runs 10 comprehensive tests (a minimal sketch of the calling pattern follows the list):
1. ✅ **Health Check** - Verifies backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests CodeGen still works
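The assertions themselves live in `test_multi_model.py`. As an illustration of the pattern only, a single generation test might look like this (the request body mirrors the `/generate` example under Manual API Testing; the response field names are assumptions, so check them against the real schema):

```python
import requests

BASE = "http://localhost:8000"

def test_generate_codegen():
    payload = {
        "prompt": "def fibonacci(n):\n    ",
        "max_tokens": 50,
        "temperature": 0.7,
        "extract_traces": False,
    }
    resp = requests.post(f"{BASE}/generate", json=payload, timeout=120)
    assert resp.status_code == 200
    body = resp.json()
    # Hypothetical field names -- verify against the actual response schema.
    assert body.get("text") or body.get("generated_text")

test_generate_codegen()
print("generate test passed")
```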
## Expected Test Duration

- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes

**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)

## Manual API Testing

If you prefer to test manually, use these curl commands:

### List Available Models

```bash
curl http://localhost:8000/models | jq
```

### Get Current Model

```bash
curl http://localhost:8000/models/current | jq
```

### Switch to Code-Llama

```bash
curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq
```
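The switch can also be scripted. A sketch that triggers the swap and then confirms it via `/models/current` (loading the 7B weights can take 30-60 seconds, so the timeout is generous):

```python
import requests

BASE = "http://localhost:8000"

# Ask the backend to swap models; allow plenty of time for the 7B load.
resp = requests.post(
    f"{BASE}/models/switch",
    json={"model_id": "code-llama-7b"},
    timeout=600,
)
resp.raise_for_status()

# Confirm the active model actually changed.
current = requests.get(f"{BASE}/models/current", timeout=10).json()
print("Active model:", current)
```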
### Generate Text

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq
```

### Get Model Info

```bash
curl http://localhost:8000/model/info | jq
```

## Success Criteria

Before committing to GitHub, verify:

- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor, or use the sketch below)
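If you prefer a programmatic check to Activity Monitor, a quick sketch with `psutil` (`pip install psutil` if it is not already in your environment) that reports the resident memory of any running uvicorn process:

```python
import psutil

# Print resident set size (RSS) for every uvicorn process we can inspect.
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "uvicorn" in cmdline:
        try:
            rss_gb = proc.memory_info().rss / 1024**3
            print(f"PID {proc.info['pid']}: {rss_gb:.1f} GB RSS")
        except psutil.Error:
            pass  # process exited or is not inspectable
```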
## Expected Model Behavior

### CodeGen 350M

- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads

### Code-Llama 7B

- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads (full multi-head attention; GQA with 8 KV heads applies to the 34B variant, not the 7B)
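These layer and head counts can be double-checked without downloading the full weights, since `AutoConfig` only fetches the small config file. The CodeGen model ID below is an assumption (the mono checkpoint, given the guide says the model is Python-only); the backend's actual IDs live in `backend/model_config.py`:

```python
from transformers import AutoConfig

for model_id in ("Salesforce/codegen-350M-mono", "codellama/CodeLlama-7b-hf"):
    cfg = AutoConfig.from_pretrained(model_id)
    # Attribute names differ between architectures, so fall back as needed.
    layers = getattr(cfg, "num_hidden_layers", None) or getattr(cfg, "n_layer", None)
    heads = getattr(cfg, "num_attention_heads", None) or getattr(cfg, "n_head", None)
    print(f"{model_id}: {layers} layers, {heads} heads")
```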
## Troubleshooting

### Backend won't start

```bash
# Check if already running
lsof -i :8000

# Kill existing process
kill -9 <PID>
```

### Import errors

```bash
# Reinstall dependencies
pip install -r requirements.txt
```

### Code-Llama download fails

- Check internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:
```python
# Pre-download the weights into the local Hugging Face cache;
# the backend will reuse the cached files on the next switch.
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
```
### Out of memory

- Close other applications
- Use CodeGen only (skip Code-Llama tests)
- Check Activity Monitor for memory usage

## Next Steps After Testing

Once all tests pass:

1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to feature branch** (NOT main)
5. **Test frontend integration**

## Files Modified

This implementation modified/created:

**Backend:**

- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)
**Status:** All changes are on the `feature/multi-model-support` branch

**Rollback:** check out the `pre-multimodel` tag if needed (`git checkout pre-multimodel`)