File size: 8,319 Bytes
bdb0102
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
# Oculus Model Benchmarking Guide

This guide explains how to use the `test_benchmarks.py` script to evaluate the Oculus vision-language model on standard benchmark tasks.

## Overview

The benchmark script tests the Oculus model on three key vision-language tasks:

1. **Image Captioning** - Generate natural language descriptions of images
2. **Visual Question Answering (VQA)** - Answer questions about image content
3. **Object Detection** - Detect and localize objects in images

## Requirements

### System Requirements
- Apple Silicon Mac (M1, M2, M3, or later)
- macOS 12.0 or later
- Python 3.8+
- 16GB+ RAM recommended

### Python Dependencies

Install required packages:

```bash
pip install mlx mlx-nn numpy pillow datasets transformers huggingface_hub
```

Or create a requirements file:

```bash
# requirements.txt
mlx>=0.0.8
numpy>=1.21.0
pillow>=9.0.0
datasets>=2.14.0
transformers>=4.30.0
huggingface_hub>=0.16.0
```

Then install:

```bash
pip install -r requirements.txt
```

## Quick Start

### Basic Usage

Run the benchmark with default settings (5 samples per task):

```bash
cd /Users/kanayochukew/railweb/OceanirPublic/Oculus
python test_benchmarks.py
```

### What Happens

1. **Model Loading**: Initializes the Oculus model with default configuration
2. **Dataset Loading**: Downloads small subsets of benchmark datasets from HuggingFace
3. **Preprocessing**: Resizes and normalizes images for both vision encoders
4. **Inference**: Runs the model on each task
5. **Results**: Prints detailed metrics and timing information

## Dataset Information

### Image Captioning
- **Dataset**: COCO Captions (Karpathy split)
- **Source**: `yerevann/coco-karpathy`
- **Samples**: 5 (configurable)
- **Metrics**: Inference time, token generation count

### Visual Question Answering
- **Dataset**: VQAv2 validation set
- **Source**: `HuggingFaceM4/VQAv2`
- **Samples**: 5 (configurable)
- **Metrics**: Inference time, answer generation

### Object Detection
- **Dataset**: COCO Detection validation set
- **Source**: `detection-datasets/coco`
- **Samples**: 5 (configurable)
- **Metrics**: Inference time, confidence scores, bbox predictions

## Configuration

### Adjusting Sample Count

Edit the `num_samples` variable in `main()`:

```python
def main():
    num_samples = 10  # Change this value
    # ...
```

### Model Configuration

The script loads the default Oculus configuration:
- **DINOv3**: Large (1.7B parameters)
- **SigLIP2**: SO400M (400M parameters)
- **LFM2.5**: 1.2B parameters

To use different model sizes, modify the `create_oculus_model()` call:

```python
model = create_oculus_model(
    dinov3_model_size="base",  # Options: "small", "base", "large"
    siglip2_model_size="so400m",
    num_classes=150
)
```

## Loading Pretrained Weights

⚠️ **Important**: The benchmark uses a randomly initialized model by default. For meaningful results, load pretrained weights first.

### Using HuggingFace Weights

```python
# In the main() function, after loading the model:
import os
from oculus import load_dinov3_from_hf, load_siglip2_from_hf, load_lfm2_from_hf

# Set your HuggingFace token
os.environ["HF_TOKEN"] = "your_token_here"

# Load pretrained weights
load_dinov3_from_hf(
    model.dinov3_encoder,
    repo_id="facebook/dinov3-vitl16-pretrain-lvd1689m",
    token=os.getenv("HF_TOKEN")
)

load_siglip2_from_hf(
    model.siglip2_encoder,
    repo_id="google/siglip2-so400m-patch16-naflex",
    token=os.getenv("HF_TOKEN")
)

load_lfm2_from_hf(
    model.language_model,
    repo_id="LiquidAI/LFM2.5-1.2B-Base",
    token=os.getenv("HF_TOKEN")
)
```

### Using Local Weights

```python
# Load from local files
import mlx.core as mx

weights = mx.load("/path/to/model_weights.npz")
model.update(weights)
```

## Expected Output

### Sample Output Format

```
============================================================
Oculus Model Benchmark Suite
============================================================
Testing Oculus vision-language model on benchmark tasks
Compatible with MLX and Apple Silicon
============================================================

[Step 1] Loading Oculus model...
✓ Model loaded successfully

Model Configuration:
  DINOv3: DINOv3-ViT-L/16
  SigLIP2: SigLIP2-SO400M
  Language Model: LFM2.5-1.2B-Base
  Total Parameters: 3,806,600,000

[Step 2] Loading benchmark datasets...

Loading COCO Captions dataset (5 samples)...
✓ Loaded 5 COCO caption samples

============================================================
BENCHMARKING: Image Captioning
============================================================

[Sample 1/5]
  Image ID: 0
  Generated tokens: 23 tokens
  Inference time: 2.456s
  Reference captions: 5 captions

...

============================================================
CAPTIONING SUMMARY
============================================================
Total samples: 5
Successful: 5
Failed: 0
Average inference time: 2.123s
Total time: 10.615s
```

## Performance Metrics

### Timing Metrics
- **Inference Time**: Time to process a single sample
- **Average Time**: Mean inference time across all samples
- **Total Time**: Cumulative time for all samples

### Quality Metrics (with pretrained weights)
- **BLEU Score**: For captioning (requires reference captions)
- **Accuracy**: For VQA (requires ground truth answers)
- **mAP**: For detection (requires bounding box annotations)

## Troubleshooting

### Out of Memory

If you encounter memory issues:

1. Reduce the number of samples:
```python
num_samples = 3  # Reduce from 5 to 3
```

2. Use smaller model sizes:
```python
model = create_oculus_model(
    dinov3_model_size="base",  # Instead of "large"
    siglip2_model_size="so400m",
    num_classes=150
)
```

3. Process samples one at a time (already implemented in the script)

### Dataset Loading Failures

If HuggingFace datasets fail to load:
- Check your internet connection
- Verify dataset availability on HuggingFace
- The script automatically falls back to synthetic samples

### Import Errors

If you get import errors:

```bash
# Install missing dependencies
pip install --upgrade mlx datasets transformers pillow
```

## Advanced Usage

### Custom Datasets

To benchmark on your own datasets:

```python
# Create custom samples
custom_samples = [
    {
        "image": Image.open("path/to/image.jpg"),
        "captions": ["A custom caption"],
        "image_id": 0
    },
    # Add more samples...
]

# Run benchmark
benchmark.benchmark_captioning(custom_samples)
```

### Extracting Results

Access detailed results programmatically:

```python
# After running benchmarks
captioning_results = benchmark.results["captioning"]
vqa_results = benchmark.results["vqa"]
detection_results = benchmark.results["detection"]

# Save to file
import json
with open("benchmark_results.json", "w") as f:
    json.dump(benchmark.results, f, indent=2)
```

### Custom Preprocessing

Modify the `ImagePreprocessor` class for custom image preprocessing:

```python
class CustomPreprocessor(ImagePreprocessor):
    def preprocess(self, image):
        # Your custom preprocessing
        return dinov3_input, siglip2_input
```

## Performance Benchmarks (Reference)

On Apple Silicon M2 Max (64GB RAM):

| Task | Avg Time | Throughput |
|------|----------|------------|
| Image Captioning | ~2.1s | ~0.5 samples/s |
| VQA | ~1.8s | ~0.6 samples/s |
| Object Detection | ~0.8s | ~1.2 samples/s |

*Note: Times are for randomly initialized models. Pretrained models may vary.*

## Integration with Training Pipeline

To use this benchmark during training:

```python
# In your training script
from test_benchmarks import OculusBenchmark, ImagePreprocessor

# After each epoch
preprocessor = ImagePreprocessor()
benchmark = OculusBenchmark(model, preprocessor)
benchmark.benchmark_captioning(val_samples)
benchmark.print_final_summary()
```

## Citation

If you use this benchmark in your research, please cite:

```bibtex
@software{oculus2025,
  title={Oculus: Adaptive Semantic Comprehension Hierarchies},
  author={Your Name},
  year={2025},
  url={https://github.com/yourusername/Oculus}
}
```

## Support

For issues or questions:
1. Check the [main README](README.md)
2. Review the [architecture documentation](ARCHITECTURE.md)
3. Open an issue on GitHub

## License

Same as the main Oculus project.