# Multi-Model Support Testing Guide
This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.
## Prerequisites
- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)
## Quick Start
### Step 1: Start the Backend
In one terminal:
```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```
**Expected output:**
```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```
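If you script the workflow, you can poll the backend until it answers before starting the tests. A minimal sketch, assuming the `requests` library and the `/models` endpoint documented below:

```python
import time

import requests

def wait_for_backend(url="http://localhost:8000/models", timeout=60.0):
    """Poll the backend until it responds, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=2).ok:
                return True
        except requests.ConnectionError:
            pass  # server not accepting connections yet
        time.sleep(1)
    return False

if __name__ == "__main__":
    print("backend ready" if wait_for_backend() else "backend did not come up")
```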
### Step 2: Run the Test Script
In another terminal:
```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```
## What the Test Script Does
The test script runs 10 comprehensive tests (a sketch of one check follows the list):
1. ✅ **Health Check** - Verifies backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests CodeGen still works
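For illustration, a check like test 2 might look roughly like this (a minimal sketch assuming the `requests` library; the actual `test_multi_model.py` may be structured differently):

```python
import requests

BASE = "http://localhost:8000"

def test_list_models():
    """Test 2: /models should respond with the available models."""
    resp = requests.get(f"{BASE}/models", timeout=10)
    assert resp.status_code == 200, f"unexpected status {resp.status_code}"
    print("available models:", resp.json())

test_list_models()
```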
## Expected Test Duration
- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes
**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)
## Manual API Testing
If you prefer to test manually, use these curl commands:
### List Available Models
```bash
curl http://localhost:8000/models | jq
```
### Get Current Model
```bash
curl http://localhost:8000/models/current | jq
```
### Switch to Code-Llama
```bash
curl -X POST http://localhost:8000/models/switch \
-H "Content-Type: application/json" \
-d '{"model_id": "code-llama-7b"}' | jq
```
### Generate Text
```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "def fibonacci(n):\n ",
"max_tokens": 50,
"temperature": 0.7,
"extract_traces": false
}' | jq
```
### Get Model Info
```bash
curl http://localhost:8000/model/info | jq
```
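The same flow works from Python if you prefer it to curl. A minimal sketch, assuming the `requests` library and using the endpoints and payloads shown above:

```python
import requests

BASE = "http://localhost:8000"

# Switch to Code-Llama (blocks while the model loads)
switch = requests.post(
    f"{BASE}/models/switch",
    json={"model_id": "code-llama-7b"},
    timeout=600,  # generous: the first call may trigger the ~14GB download
)
print(switch.json())

# Generate a short completion with the newly loaded model
gen = requests.post(
    f"{BASE}/generate",
    json={
        "prompt": "def fibonacci(n):\n    ",
        "max_tokens": 50,
        "temperature": 0.7,
        "extract_traces": False,
    },
    timeout=300,
)
print(gen.json())
```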
## Success Criteria
Before committing to GitHub, verify:
- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor)
## Expected Model Behavior
### CodeGen 350M
- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads
### Code-Llama 7B
- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads
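To sanity-check these numbers against the running backend, compare them with what `/model/info` reports. A minimal sketch; the `num_layers`/`num_heads` key names are hypothetical, so substitute whatever the actual response uses:

```python
import requests

info = requests.get("http://localhost:8000/model/info", timeout=10).json()
print(info)

# Hypothetical field names; adjust to the real response schema
for key, want in {"num_layers": 32, "num_heads": 32}.items():
    print(f"{key}: got {info.get(key)}, expected {want}")
```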
## Troubleshooting
### Backend won't start
```bash
# Check if already running
lsof -i :8000
# Kill existing process
kill -9 <PID>
```
### Import errors
```bash
# Reinstall dependencies
pip install -r requirements.txt
```
### Code-Llama download fails
- Check internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Downloads (or resumes) the checkpoint into the local Hugging Face cache
AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
```
### Out of memory
- Close other applications
- Use CodeGen only (skip Code-Llama tests)
- Check Activity Monitor for memory usage (or use the sketch below)
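If you'd rather check headroom from a script than Activity Monitor, a minimal sketch (assuming `psutil` is installed: `pip install psutil`):

```python
import psutil

vm = psutil.virtual_memory()
print(f"total:     {vm.total / 1e9:.1f} GB")
print(f"available: {vm.available / 1e9:.1f} GB")

# Code-Llama 7B needs roughly 14-16GB free (see Expected Model Behavior above)
if vm.available < 16e9:
    print("Warning: likely not enough free memory for Code-Llama 7B")
```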
## Next Steps After Testing
Once all tests pass:
1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to feature branch** (NOT main)
5. **Test frontend integration**
## Files Modified
This implementation modified/created:
**Backend:**
- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)
**Status:** All changes are in `feature/multi-model-support` branch
**Rollback:** if needed, check out the `pre-multimodel` tag: `git checkout pre-multimodel`