# Multi-Model Support Testing Guide
This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.
## Prerequisites
- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)
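A quick way to sanity-check the environment before starting. This is a sketch, not part of the test suite; it assumes `torch` is among the installed dependencies, since the backend logs mention loading on the Apple Silicon GPU (the MPS backend):

```python
# Environment sanity check for the prerequisites above.
# Assumes torch is installed (the backend loads models on the
# Apple Silicon GPU, i.e. PyTorch's MPS backend).
import sys

import torch

assert sys.version_info >= (3, 8), "Python 3.8+ required"
print(f"Python {sys.version.split()[0]}")
print(f"MPS (Apple Silicon GPU) available: {torch.backends.mps.is_available()}")
```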
## Quick Start

### Step 1: Start the Backend

In one terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```
Expected output:

```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```
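If you'd rather script this readiness check than watch the logs, here is a minimal sketch in Python. It assumes the health endpoint lives at `/health` (the endpoint exercised by test 1 below); adjust if the backend uses a different path:

```python
# Poll the backend until it responds, then proceed to the tests.
# The /health path is an assumption about the backend's API.
import time

import requests

BASE_URL = "http://localhost:8000"

def wait_for_backend(timeout_s: float = 60.0) -> bool:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{BASE_URL}/health", timeout=2).ok:
                return True
        except requests.ConnectionError:
            pass  # backend still starting up
        time.sleep(1)
    return False

if __name__ == "__main__":
    print("backend ready" if wait_for_backend() else "backend did not start in time")
```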
### Step 2: Run the Test Script

In another terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```
## What the Test Script Does

The test script runs 10 comprehensive tests (a sketch of the pattern follows the list):

1. ✅ Health Check - Verifies the backend is running
2. ✅ List Models - Shows available models (CodeGen, Code-Llama)
3. ✅ Current Model - Gets info about the loaded model
4. ✅ Model Info - Gets detailed architecture info
5. ✅ Generate (CodeGen) - Tests text generation with CodeGen
6. ✅ Switch to Code-Llama - Loads Code-Llama 7B
7. ✅ Model Info (Code-Llama) - Verifies Code-Llama loaded correctly
8. ✅ Generate (Code-Llama) - Tests generation with Code-Llama
9. ✅ Switch Back to CodeGen - Verifies model unloading works
10. ✅ Generate (CodeGen again) - Tests that CodeGen still works
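For a sense of the pattern, here is a hypothetical sketch of the switch-and-verify step (tests 6-7), not the script's actual code. The endpoint paths match the curl examples under Manual API Testing below, but the `model_id` response field is an assumption about the backend:

```python
# Hypothetical sketch of tests 6-7: switch to Code-Llama, then verify.
# Endpoints match the curl examples below; the response shape
# ("model_id") is an assumed detail of the backend's API.
import requests

BASE_URL = "http://localhost:8000"

def test_switch_to_code_llama() -> None:
    resp = requests.post(
        f"{BASE_URL}/models/switch",
        json={"model_id": "code-llama-7b"},
        timeout=600,  # first switch may include the ~14GB download
    )
    resp.raise_for_status()

    current = requests.get(f"{BASE_URL}/models/current", timeout=10).json()
    assert current.get("model_id") == "code-llama-7b", current

if __name__ == "__main__":
    test_switch_to_code_llama()
    print("switch test passed")
```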
## Expected Test Duration

- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes

Total first run: ~15-20 minutes
Subsequent runs: ~5-10 minutes (no download)
## Manual API Testing

If you prefer to test manually, use these curl commands:

### List Available Models

```bash
curl http://localhost:8000/models | jq
```

### Get Current Model

```bash
curl http://localhost:8000/models/current | jq
```

### Switch to Code-Llama

```bash
curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq
```

### Generate Text

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq
```

### Get Model Info

```bash
curl http://localhost:8000/model/info | jq
```
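The same session from Python, if you'd rather avoid curl; a sketch using `requests` with the payload from the Generate Text example above:

```python
# Python equivalent of the curl calls above.
import requests

BASE_URL = "http://localhost:8000"

print(requests.get(f"{BASE_URL}/models", timeout=10).json())          # list models
print(requests.get(f"{BASE_URL}/models/current", timeout=10).json())  # current model

payload = {
    "prompt": "def fibonacci(n):\n ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": False,
}
resp = requests.post(f"{BASE_URL}/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```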
## Success Criteria

Before committing to GitHub, verify:

- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor)
## Expected Model Behavior

### CodeGen 350M

- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads

### Code-Llama 7B

- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates code in multiple languages
- 32 layers, 32 attention heads
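To double-check the layer and head counts yourself, you can read them from the Hugging Face configs. A sketch; the repo ids below are assumptions about which checkpoints the backend actually uses, so adjust them to match `backend/model_config.py`:

```python
# Read layer/head counts straight from the Hugging Face configs.
# The repo ids are assumptions; adjust to whatever the backend loads.
from transformers import AutoConfig

for repo_id in ("Salesforce/codegen-350M-mono", "codellama/CodeLlama-7b-hf"):
    cfg = AutoConfig.from_pretrained(repo_id)
    # CodeGen configs name these n_layer/n_head; Llama configs use
    # num_hidden_layers/num_attention_heads. getattr covers both.
    layers = getattr(cfg, "num_hidden_layers", getattr(cfg, "n_layer", None))
    heads = getattr(cfg, "num_attention_heads", getattr(cfg, "n_head", None))
    kv_heads = getattr(cfg, "num_key_value_heads", None)
    print(f"{repo_id}: {layers} layers, {heads} heads, {kv_heads} KV heads")
```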
## Troubleshooting

### Backend won't start

```bash
# Check if already running
lsof -i :8000

# Kill existing process
kill -9 <PID>
```

### Import errors

```bash
# Reinstall dependencies
pip install -r requirements.txt
```
### Code-Llama download fails

- Check your internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:

```python
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
```
### Out of memory

- Close other applications
- Use CodeGen only (skip the Code-Llama tests)
- Check Activity Monitor for memory usage
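If you'd rather check from the terminal than Activity Monitor, a quick sketch using `psutil` (an assumed extra dependency, not listed in this guide's requirements):

```python
# Report the backend's resident memory from the command line.
# psutil is an assumed extra dependency (pip install psutil).
import psutil

for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    mem = proc.info["memory_info"]
    if mem and "uvicorn" in cmdline and "model_service" in cmdline:
        print(f"PID {proc.info['pid']}: {mem.rss / 1024**3:.1f} GB resident")
```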
## Next Steps After Testing

Once all tests pass:

- Document any issues found
- Take note of generation quality
- Check whether the visualizations need updates (next phase)
- Commit to a feature branch (NOT main)
- Test frontend integration
## Files Modified

This implementation modified or created:

Backend:

- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)
Status: All changes are on the `feature/multi-model-support` branch.
Rollback: run `git checkout pre-multimodel` (a tag) if needed.