
Multi-Model Support Testing Guide

This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.

Prerequisites

  • Mac Studio M3 Ultra or MacBook Pro M4 Max
  • Python 3.8+
  • All dependencies installed (pip install -r requirements.txt)
  • Internet connection (for downloading Code-Llama 7B)

Quick Start

Step 1: Start the Backend

In one terminal:

cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000

Expected output:

INFO:     Loading CodeGen 350M on Apple Silicon GPU...
INFO:     ✅ CodeGen 350M loaded successfully
INFO:     Layers: 20, Heads: 16
INFO:     Uvicorn running on http://127.0.0.1:8000

Step 2: Run the Test Script

In another terminal:

cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py

What the Test Script Does

The test script runs 10 comprehensive tests:

  1. ✅ Health Check - Verifies backend is running
  2. ✅ List Models - Shows available models (CodeGen, Code-Llama)
  3. ✅ Current Model - Gets info about loaded model
  4. ✅ Model Info - Gets detailed architecture info
  5. ✅ Generate (CodeGen) - Tests text generation with CodeGen
  6. ✅ Switch to Code-Llama - Loads Code-Llama 7B
  7. ✅ Model Info (Code-Llama) - Verifies Code-Llama loaded correctly
  8. ✅ Generate (Code-Llama) - Tests generation with Code-Llama
  9. ✅ Switch Back to CodeGen - Verifies model unloading works
  10. ✅ Generate (CodeGen again) - Tests CodeGen still works
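If you want to extend test_multi_model.py with your own checks, each of the tests above follows the same pattern: call an endpoint, assert on the response, and report pass/fail. A minimal sketch of that pattern (illustrative only — `run_test` and `get_json` are not the script's actual helpers):

```python
import json
import urllib.request


def get_json(url: str, timeout: int = 30) -> dict:
    """GET a backend endpoint and decode its JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_test(name: str, fn) -> bool:
    """Run one check; print a pass/fail line and return the result."""
    try:
        fn()
        print(f"PASS  {name}")
        return True
    except Exception as exc:
        print(f"FAIL  {name}: {exc}")
        return False


if __name__ == "__main__":
    # Example: test 2 (List Models) against the running backend.
    run_test("List Models", lambda: get_json("http://localhost:8000/models"))
```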

Expected Test Duration

  • Tests 1-5 (CodeGen only): ~2-3 minutes
  • Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
  • Tests 7-10: ~3-5 minutes

Total first run: ~15-20 minutes
Subsequent runs: ~5-10 minutes (no download)

Manual API Testing

If you prefer to test manually, use these curl commands:

List Available Models

curl http://localhost:8000/models | jq

Get Current Model

curl http://localhost:8000/models/current | jq

Switch to Code-Llama

curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq

Generate Text

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq

Get Model Info

curl http://localhost:8000/model/info | jq
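The same endpoints can be exercised from Python instead of curl. A minimal stdlib-only sketch (it assumes the backend is on localhost:8000 and mirrors the JSON bodies shown above; generous timeouts because a model switch can trigger the ~14GB download):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_generate_payload(prompt, max_tokens=50, temperature=0.7,
                           extract_traces=False):
    """Build the same JSON body as the /generate curl example above."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "extract_traces": extract_traces,
    }


def post_json(path, payload, timeout=600):
    """POST a JSON payload to the backend and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(post_json("/models/switch", {"model_id": "code-llama-7b"}))
    print(post_json("/generate",
                    build_generate_payload("def fibonacci(n):\n    ")))
```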

Success Criteria

Before committing to GitHub, verify:

  • ✅ All tests pass
  • ✅ CodeGen generates reasonable code
  • ✅ Code-Llama loads successfully
  • ✅ Code-Llama generates reasonable code
  • ✅ Can switch between models multiple times
  • ✅ No Python errors in backend logs
  • ✅ Memory usage is reasonable (check Activity Monitor)

Expected Model Behavior

CodeGen 350M

  • Loads in ~5-10 seconds
  • Uses ~2-3GB RAM
  • Generates Python code (trained on Python only)
  • 20 layers, 16 attention heads

Code-Llama 7B

  • First download: ~14GB, takes 5-10 minutes
  • Loads in ~30-60 seconds
  • Uses ~14-16GB RAM
  • Generates multiple languages
  • 32 layers, 32 attention heads (standard multi-head attention; the 7B size does not use GQA)
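The layer/head counts above can double as an automated sanity check against the /model/info response. A sketch under assumptions: the response field names (num_layers, num_heads) and the CodeGen model ID (codegen-350m) are guesses — adjust them to whatever backend/model_service.py actually returns:

```python
# Expected architecture per model, taken from the specs in this guide.
# Model IDs and field names are assumptions; only "code-llama-7b" is
# confirmed by the /models/switch example above.
EXPECTED_ARCH = {
    "codegen-350m": {"num_layers": 20, "num_heads": 16},
    "code-llama-7b": {"num_layers": 32, "num_heads": 32},
}


def check_architecture(model_id: str, info: dict) -> list:
    """Compare a /model/info response dict against the expected specs.

    Returns a list of mismatch messages; an empty list means the loaded
    model matches this guide.
    """
    expected = EXPECTED_ARCH.get(model_id)
    if expected is None:
        return [f"no expected values recorded for {model_id!r}"]
    problems = []
    for field, want in expected.items():
        got = info.get(field)
        if got != want:
            problems.append(f"{field}: expected {want}, got {got}")
    return problems
```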

Troubleshooting

Backend won't start

# Check if already running
lsof -i :8000

# Kill existing process
kill -9 <PID>

Import errors

# Reinstall dependencies
pip install -r requirements.txt

Code-Llama download fails

  • Check internet connection
  • Verify HuggingFace is accessible: ping huggingface.co
  • Try downloading manually:
    from transformers import AutoModelForCausalLM
    AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
    

Out of memory

  • Close other applications
  • Use CodeGen only (skip Code-Llama tests)
  • Check Activity Monitor for memory usage

Next Steps After Testing

Once all tests pass:

  1. Document any issues found
  2. Take note of generation quality
  3. Check if visualizations need updates (next phase)
  4. Commit to feature branch (NOT main)
  5. Test frontend integration

Files Modified

This implementation modified/created:

Backend:

  • backend/model_config.py (NEW)
  • backend/model_adapter.py (NEW)
  • backend/model_service.py (MODIFIED)
  • test_multi_model.py (NEW)

Status: All changes are in the feature/multi-model-support branch
Rollback: git checkout the pre-multimodel tag if needed