# Multi-Model Support Testing Guide

This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.

## Prerequisites

- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)

## Quick Start

### Step 1: Start the Backend

In one terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```

**Expected output:**

```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```
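Before running the suite, you can confirm the server is actually accepting requests. A minimal readiness check using `requests`, polling the `/models` endpoint listed under Manual API Testing below:

```python
import time

import requests

# Poll the backend until it answers, or give up after ~30 seconds.
for _ in range(30):
    try:
        resp = requests.get("http://localhost:8000/models", timeout=2)
        resp.raise_for_status()
        print("Backend is up:", resp.json())
        break
    except requests.RequestException:
        time.sleep(1)
else:
    raise SystemExit("Backend did not come up on port 8000")
```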
### Step 2: Run the Test Script

In another terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```

## What the Test Script Does
The test script runs 10 comprehensive tests (a minimal sketch of the calling pattern follows the list):
1. ✅ **Health Check** - Verifies backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests CodeGen still works
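The assertions themselves live in `test_multi_model.py`. As an illustration of the pattern only, a single generation test might look like this (the request body mirrors the `/generate` example under Manual API Testing; the response field names are assumptions, so check them against the real schema):

```python
import requests

BASE = "http://localhost:8000"

def test_generate_codegen():
    payload = {
        "prompt": "def fibonacci(n):\n    ",
        "max_tokens": 50,
        "temperature": 0.7,
        "extract_traces": False,
    }
    resp = requests.post(f"{BASE}/generate", json=payload, timeout=120)
    assert resp.status_code == 200
    body = resp.json()
    # Hypothetical field names -- verify against the actual response schema.
    assert body.get("text") or body.get("generated_text")

test_generate_codegen()
print("generate test passed")
```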
## Expected Test Duration

- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes

**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)

## Manual API Testing

If you prefer to test manually, use these curl commands:

### List Available Models

```bash
curl http://localhost:8000/models | jq
```

### Get Current Model

```bash
curl http://localhost:8000/models/current | jq
```

### Switch to Code-Llama

```bash
curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq
```
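The switch can also be scripted. A sketch that triggers the swap and then confirms it via `/models/current` (loading the 7B weights can take 30-60 seconds, so the timeout is generous):

```python
import requests

BASE = "http://localhost:8000"

# Ask the backend to swap models; allow plenty of time for the 7B load.
resp = requests.post(
    f"{BASE}/models/switch",
    json={"model_id": "code-llama-7b"},
    timeout=600,
)
resp.raise_for_status()

# Confirm the active model actually changed.
current = requests.get(f"{BASE}/models/current", timeout=10).json()
print("Active model:", current)
```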
### Generate Text

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq
```

### Get Model Info

```bash
curl http://localhost:8000/model/info | jq
```

## Success Criteria

Before committing to GitHub, verify:

- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor, or use the sketch below)
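If you prefer a programmatic check to Activity Monitor, a quick sketch with `psutil` (`pip install psutil` if it is not already in your environment) that reports the resident memory of any running uvicorn process:

```python
import psutil

# Print resident set size (RSS) for every uvicorn process we can inspect.
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "uvicorn" in cmdline:
        try:
            rss_gb = proc.memory_info().rss / 1024**3
            print(f"PID {proc.info['pid']}: {rss_gb:.1f} GB RSS")
        except psutil.Error:
            pass  # process exited or is not inspectable
```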
## Expected Model Behavior

### CodeGen 350M

- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads

### Code-Llama 7B

- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads (full multi-head attention; GQA with 8 KV heads applies to the 34B variant, not the 7B)
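These layer and head counts can be double-checked without downloading the full weights, since `AutoConfig` only fetches the small config file. The CodeGen model ID below is an assumption (the mono checkpoint, given the guide says the model is Python-only); the backend's actual IDs live in `backend/model_config.py`:

```python
from transformers import AutoConfig

for model_id in ("Salesforce/codegen-350M-mono", "codellama/CodeLlama-7b-hf"):
    cfg = AutoConfig.from_pretrained(model_id)
    # Attribute names differ between architectures, so fall back as needed.
    layers = getattr(cfg, "num_hidden_layers", None) or getattr(cfg, "n_layer", None)
    heads = getattr(cfg, "num_attention_heads", None) or getattr(cfg, "n_head", None)
    print(f"{model_id}: {layers} layers, {heads} heads")
```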
## Troubleshooting

### Backend won't start

```bash
# Check if already running
lsof -i :8000

# Kill existing process
kill -9 <PID>
```

### Import errors

```bash
# Reinstall dependencies
pip install -r requirements.txt
```

### Code-Llama download fails

- Check internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:
```python
# Pre-download the weights into the local Hugging Face cache;
# the backend will reuse the cached files on the next switch.
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
```
### Out of memory

- Close other applications
- Use CodeGen only (skip Code-Llama tests)
- Check Activity Monitor for memory usage

## Next Steps After Testing

Once all tests pass:

1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to feature branch** (NOT main)
5. **Test frontend integration**

## Files Modified

This implementation modified/created:

**Backend:**

- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)
**Status:** All changes are on the `feature/multi-model-support` branch

**Rollback:** check out the `pre-multimodel` tag if needed (`git checkout pre-multimodel`)