
Multi-Model Support Testing Guide

This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.

Prerequisites

  • Mac Studio M3 Ultra or MacBook Pro M4 Max
  • Python 3.8+
  • All dependencies installed (pip install -r requirements.txt)
  • Internet connection (for downloading Code-Llama 7B)

Quick Start

Step 1: Start the Backend

In one terminal:

cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000

Expected output:

INFO:     Loading CodeGen 350M on Apple Silicon GPU...
INFO:     ✅ CodeGen 350M loaded successfully
INFO:     Layers: 20, Heads: 16
INFO:     Uvicorn running on http://127.0.0.1:8000

Step 2: Run the Test Script

In another terminal:

cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py

What the Test Script Does

The test script runs 10 comprehensive tests:

  1. ✅ Health Check - Verifies backend is running
  2. ✅ List Models - Shows available models (CodeGen, Code-Llama)
  3. ✅ Current Model - Gets info about loaded model
  4. ✅ Model Info - Gets detailed architecture info
  5. ✅ Generate (CodeGen) - Tests text generation with CodeGen
  6. ✅ Switch to Code-Llama - Loads Code-Llama 7B
  7. ✅ Model Info (Code-Llama) - Verifies Code-Llama loaded correctly
  8. ✅ Generate (Code-Llama) - Tests generation with Code-Llama
  9. ✅ Switch Back to CodeGen - Verifies model unloading works
  10. ✅ Generate (CodeGen again) - Tests CodeGen still works
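If you want to extend test_multi_model.py with your own checks, each of the tests above follows the same pattern: call an endpoint, assert on the response, and report pass/fail. A minimal sketch of that pattern (illustrative only — `run_test` and `get_json` are not the script's actual helpers):

```python
import json
import urllib.request


def get_json(url: str, timeout: int = 30) -> dict:
    """GET a backend endpoint and decode its JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_test(name: str, fn) -> bool:
    """Run one check; print a pass/fail line and return the result."""
    try:
        fn()
        print(f"PASS  {name}")
        return True
    except Exception as exc:
        print(f"FAIL  {name}: {exc}")
        return False


if __name__ == "__main__":
    # Example: test 2 (List Models) against the running backend.
    run_test("List Models", lambda: get_json("http://localhost:8000/models"))
```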

Expected Test Duration

  • Tests 1-5 (CodeGen only): ~2-3 minutes
  • Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
  • Tests 7-10: ~3-5 minutes

Total first run: ~15-20 minutes
Subsequent runs: ~5-10 minutes (no download)

Manual API Testing

If you prefer to test manually, use these curl commands:

List Available Models

curl http://localhost:8000/models | jq

Get Current Model

curl http://localhost:8000/models/current | jq

Switch to Code-Llama

curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq

Generate Text

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq

Get Model Info

curl http://localhost:8000/model/info | jq
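The same endpoints can be exercised from Python instead of curl. A minimal stdlib-only sketch (it assumes the backend is on localhost:8000 and mirrors the JSON bodies shown above; generous timeouts because a model switch can trigger the ~14GB download):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_generate_payload(prompt, max_tokens=50, temperature=0.7,
                           extract_traces=False):
    """Build the same JSON body as the /generate curl example above."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "extract_traces": extract_traces,
    }


def post_json(path, payload, timeout=600):
    """POST a JSON payload to the backend and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(post_json("/models/switch", {"model_id": "code-llama-7b"}))
    print(post_json("/generate",
                    build_generate_payload("def fibonacci(n):\n    ")))
```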

Success Criteria

Before committing to GitHub, verify:

  • ✅ All tests pass
  • ✅ CodeGen generates reasonable code
  • ✅ Code-Llama loads successfully
  • ✅ Code-Llama generates reasonable code
  • ✅ Can switch between models multiple times
  • ✅ No Python errors in backend logs
  • ✅ Memory usage is reasonable (check Activity Monitor)

Expected Model Behavior

CodeGen 350M

  • Loads in ~5-10 seconds
  • Uses ~2-3GB RAM
  • Generates Python code (trained on Python only)
  • 20 layers, 16 attention heads

Code-Llama 7B

  • First download: ~14GB, takes 5-10 minutes
  • Loads in ~30-60 seconds
  • Uses ~14-16GB RAM
  • Generates multiple languages
  • 32 layers, 32 attention heads (standard multi-head attention; the 7B size does not use GQA)
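The layer/head counts above can double as an automated sanity check against the /model/info response. A sketch under assumptions: the response field names (num_layers, num_heads) and the CodeGen model ID (codegen-350m) are guesses — adjust them to whatever backend/model_service.py actually returns:

```python
# Expected architecture per model, taken from the specs in this guide.
# Model IDs and field names are assumptions; only "code-llama-7b" is
# confirmed by the /models/switch example above.
EXPECTED_ARCH = {
    "codegen-350m": {"num_layers": 20, "num_heads": 16},
    "code-llama-7b": {"num_layers": 32, "num_heads": 32},
}


def check_architecture(model_id: str, info: dict) -> list:
    """Compare a /model/info response dict against the expected specs.

    Returns a list of mismatch messages; an empty list means the loaded
    model matches this guide.
    """
    expected = EXPECTED_ARCH.get(model_id)
    if expected is None:
        return [f"no expected values recorded for {model_id!r}"]
    problems = []
    for field, want in expected.items():
        got = info.get(field)
        if got != want:
            problems.append(f"{field}: expected {want}, got {got}")
    return problems
```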

Troubleshooting

Backend won't start

# Check if already running
lsof -i :8000

# Kill existing process
kill -9 <PID>

Import errors

# Reinstall dependencies
pip install -r requirements.txt

Code-Llama download fails

  • Check internet connection
  • Verify HuggingFace is accessible: ping huggingface.co
  • Try downloading manually:
    from transformers import AutoModelForCausalLM
    AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
    

Out of memory

  • Close other applications
  • Use CodeGen only (skip Code-Llama tests)
  • Check Activity Monitor for memory usage

Next Steps After Testing

Once all tests pass:

  1. Document any issues found
  2. Take note of generation quality
  3. Check if visualizations need updates (next phase)
  4. Commit to feature branch (NOT main)
  5. Test frontend integration

Files Modified

This implementation modified/created:

Backend:

  • backend/model_config.py (NEW)
  • backend/model_adapter.py (NEW)
  • backend/model_service.py (MODIFIED)
  • test_multi_model.py (NEW)

Status: All changes are in the feature/multi-model-support branch
Rollback: git checkout the pre-multimodel tag if needed