# Multi-Model Support Testing Guide

This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.

## Prerequisites

- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)

## Quick Start

### Step 1: Start the Backend

In one terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```

**Expected output:**
```
INFO:     Loading CodeGen 350M on Apple Silicon GPU...
INFO:     ✅ CodeGen 350M loaded successfully
INFO:     Layers: 20, Heads: 16
INFO:     Uvicorn running on http://127.0.0.1:8000
```
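
The "Apple Silicon GPU" log line indicates the model runs on PyTorch's MPS backend. If loading fails or the log reports CPU instead, confirm that PyTorch can actually see the GPU (a quick check, assuming the backend uses PyTorch, which the transformers-based stack implies):

```python
import torch

# True on Apple Silicon with a recent PyTorch build; if False, models fall back to CPU
print(torch.backends.mps.is_available())
```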

### Step 2: Run the Test Script

In another terminal:

```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```

## What the Test Script Does

The test script runs the following 10 tests in sequence (a sketch of the request pattern they share follows the list):

1. ✅ **Health Check** - Verifies backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests CodeGen still works
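
Each test boils down to the same pattern: call an endpoint, check the response. Here is a minimal sketch of that pattern using `requests` (an illustration only, not the actual contents of `test_multi_model.py`):

```python
import requests

BASE = "http://localhost:8000"

def test_list_models() -> None:
    # Test 2: /models should return the available models
    resp = requests.get(f"{BASE}/models", timeout=10)
    resp.raise_for_status()
    print("Available models:", resp.json())

test_list_models()
```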

## Expected Test Duration

- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes

**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)

## Manual API Testing

If you prefer to test manually, use these curl commands:

### List Available Models
```bash
curl http://localhost:8000/models | jq
```

### Get Current Model
```bash
curl http://localhost:8000/models/current | jq
```

### Switch to Code-Llama
```bash
curl -X POST http://localhost:8000/models/switch \
  -H "Content-Type: application/json" \
  -d '{"model_id": "code-llama-7b"}' | jq
```

### Generate Text
```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": false
  }' | jq
```

### Get Model Info
```bash
curl http://localhost:8000/model/info | jq
```
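
The same manual flow can be scripted with `requests` (a sketch built on the endpoints above; the exact response shapes depend on the service, so it simply prints them). Note the generous timeout on the switch call, since it triggers a full model load:

```python
import requests

BASE = "http://localhost:8000"

# List models, then switch to Code-Llama (slow: loads ~14GB of weights)
print(requests.get(f"{BASE}/models", timeout=10).json())
requests.post(f"{BASE}/models/switch",
              json={"model_id": "code-llama-7b"},
              timeout=900).raise_for_status()

# Generate with the newly loaded model
resp = requests.post(f"{BASE}/generate", timeout=300, json={
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": False,
})
print(resp.json())
```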

## Success Criteria

Before committing to GitHub, verify:

- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor)

## Expected Model Behavior

### CodeGen 350M
- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads

### Code-Llama 7B
- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads (standard multi-head attention; the 7B variant does not use GQA, see the config check below)
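
You can verify these numbers without downloading any weights, since `AutoConfig` fetches only the small `config.json` (the CodeGen checkpoint name is an assumption based on the "Python only" note above):

```python
from transformers import AutoConfig

cl = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
print(cl.num_hidden_layers, cl.num_attention_heads)  # 32, 32
print(cl.num_key_value_heads)                        # 32 -> standard MHA, not GQA

cg = AutoConfig.from_pretrained("Salesforce/codegen-350M-mono")
print(cg.n_layer, cg.n_head)                         # 20, 16
```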

## Troubleshooting

### Backend won't start
```bash
# Check if already running
lsof -i :8000

# Kill existing process
kill -9 <PID>
```

### Import errors
```bash
# Reinstall dependencies
pip install -r requirements.txt
```

### Code-Llama download fails
- Check internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:
  ```python
  from transformers import AutoModelForCausalLM

  # Downloads ~14GB into the Hugging Face cache (~/.cache/huggingface by default)
  AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
  ```

### Out of memory
- Close other applications
- Use CodeGen only (skip Code-Llama tests)
- Check Activity Monitor for memory usage (or use the script below)
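
If you prefer a scriptable check over Activity Monitor, `psutil` (`pip install psutil`) reports memory headroom:

```python
import psutil

# Code-Llama 7B wants roughly 14-16GB free before you load it
mem = psutil.virtual_memory()
print(f"Available: {mem.available / 1e9:.1f} GB of {mem.total / 1e9:.1f} GB")
```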

## Next Steps After Testing

Once all tests pass:

1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to feature branch** (NOT main)
5. **Test frontend integration**

## Files Modified

This implementation modified/created:

**Backend:**
- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)

**Status:** All changes live on the `feature/multi-model-support` branch
**Rollback:** run `git checkout pre-multimodel` to return to the tagged pre-change state if needed