---
license: other
license_name: oceanir-research-license
license_link: LICENSE
language:
- en
library_name: oceanir
pipeline_tag: image-text-to-text
tags:
- vision
- multimodal
- vision-language
- vqa
- image-captioning
- object-detection
- oculus
- research
- training
base_model:
- facebook/dinov3-vith16plus-pretrain-lvd1689m
- google/siglip2-base-patch16-224
- LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16
---
# Oculus - Complete Training Repository
This repository contains the complete Oculus vision-language model, including all training code, checkpoints, and documentation.
## Quick Links
| Model | Description | Link |
|-------|-------------|------|
| **Oculus-0.1-Instruct** | Instruction-tuned for VQA/captioning | [HuggingFace](https://huggingface.co/OceanirAI/Oculus-0.1-Instruct) |
| **Oculus-0.1-Reasoning** | Chain-of-thought reasoning | [HuggingFace](https://huggingface.co/OceanirAI/Oculus-0.1-Reasoning) |
| **oceanir** | Python SDK | [PyPI](https://pypi.org/project/oceanir/) |
## Installation
```bash
pip install oceanir
```
```python
from oceanir import Oculus
model = Oculus.from_pretrained("OceanirAI/Oculus-0.1-Instruct")
answer = model.ask("image.jpg", "What is this?")
```
## Architecture
Oculus combines two complementary vision encoders with an instruction-tuned language model:
### Vision Encoders
- **DINOv3 ViT-H/16+** (`facebook/dinov3-vith16plus-pretrain-lvd1689m`)
- Self-supervised vision transformer trained on LVD-1689M
- 1024 hidden, 24 layers, 16 heads
- **SigLIP2** (`google/siglip2-base-patch16-224`)
- Vision-language contrastive model
- 1152 hidden, 27 layers, 16 heads
### Language Model
- **LiquidAI LFM 2.5 1.2B Instruct** (`LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16`)
- 1.2B parameters, 1536 embedding dim
- 131K vocab, 32K context window
### Architecture Specs
| Component | Specification |
|-----------|--------------|
| DINOv3 | ViT-H/16+, 1024D, 24L, 16H |
| SigLIP2 | Base, 1152D, 27L, 16H |
| Fusion | Concatenation → 2176D |
| Projector | 2176 → 4352 → 1536 |
| LFM 2.5 | 1.2B params, 1536D, 16L, 24H |
| Detection | 80 classes (COCO) |
| Segmentation | 150 classes (ADE20K) |
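The fusion and projector rows of the table above can be sketched as follows. The dimensions come from the spec table; the activation choice, weight initialization, and token count are illustrative assumptions only (the actual implementation lives in `oculus_unified_model/modeling_oculus.py`):

```python
import numpy as np

# Sketch of the Oculus fusion + projector stages, using only the dimensions
# from the spec table. Everything else here is an assumption for illustration.

DINO_DIM, SIGLIP_DIM = 1024, 1152   # per-token encoder output dims
FUSED_DIM = DINO_DIM + SIGLIP_DIM   # 2176 after channel-wise concatenation
HIDDEN_DIM, LM_DIM = 4352, 1536     # projector MLP: 2176 -> 4352 -> 1536

rng = np.random.default_rng(0)
num_tokens = 196                    # assumed 14x14 patch grid (224px / 16)

dino_feats = rng.standard_normal((num_tokens, DINO_DIM))
siglip_feats = rng.standard_normal((num_tokens, SIGLIP_DIM))

# Fusion: concatenate the two encoders' token features along the channel axis.
fused = np.concatenate([dino_feats, siglip_feats], axis=-1)

# Projector: two-layer MLP mapping fused vision tokens into the LM's
# 1536-dim embedding space (ReLU here is an assumption, not the real choice).
W1 = rng.standard_normal((FUSED_DIM, HIDDEN_DIM)) * 0.02
W2 = rng.standard_normal((HIDDEN_DIM, LM_DIM)) * 0.02
lm_tokens = np.maximum(fused @ W1, 0.0) @ W2

print(fused.shape)      # (196, 2176)
print(lm_tokens.shape)  # (196, 1536)
```

The projected tokens would then be prepended (or interleaved) with text embeddings before being fed to the LFM 2.5 backbone.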
## Repository Structure
```
OceanirAI/Oculus/
├── config.json                      # Main model config
├── README.md                        # This file
│
├── oculus_unified_model/            # Model implementation
│   ├── __init__.py
│   ├── modeling_oculus.py           # OculusForConditionalGeneration
│   ├── configuration_oculus.py      # OculusConfig
│   └── processing_oculus.py         # OculusProcessor
│
├── training/                        # Training scripts
│   ├── train_oculus.py              # Base projector training
│   ├── train_detection.py           # Detection head training
│   ├── train_detection_extended.py
│   ├── train_instruction_tuning.py  # Instruct variant
│   ├── train_reasoning_v2.py        # Reasoning variant
│   └── train_oculus_coco.py         # COCO training
│
├── logs/                            # Training logs
│   ├── training_instruct_v1.log
│   ├── training_reasoning_v2.log
│   └── training_v2_final.log
│
├── checkpoints/                     # Model checkpoints
│   ├── oculus/final/                # Base projector
│   │   ├── projector.npz            # Vision projector weights (~822MB)
│   │   └── config.json
│   │
│   ├── oculus_detection/final/      # Detection checkpoint
│   │   ├── projector.npz            # Projector weights (~800MB)
│   │   ├── heads.pth                # Detection heads (~35MB)
│   │   └── benchmark_results.json
│   │
│   ├── oculus_instruct_v1/          # Instruction-tuned VQA
│   │   └── vqa_model/
│   │       ├── model.safetensors    # BLIP VQA weights (~1.5GB)
│   │       ├── tokenizer.json
│   │       └── config.json
│   │
│   └── oculus_reasoning_v2/         # Reasoning VQA
│       └── vqa_model/
│           ├── model.safetensors    # BLIP VQA weights (~1.5GB)
│           ├── tokenizer.json
│           └── config.json
│
├── docs/                            # Documentation
│   ├── ARCHITECTURE.md
│   ├── BENCHMARK_README.md
│   └── TRAINING_ROADMAP.md
│
├── oculus_inference.py              # Inference script
├── demo_oculus.py                   # Demo script
├── benchmark_vlm.py                 # Benchmarking
└── eval_benchmarks.py               # Evaluation
```
## Training
### Base Projector Training
```bash
python training/train_oculus.py
```
### Detection Head Training
```bash
python training/train_detection.py
```
### Instruction Tuning
```bash
python training/train_instruction_tuning.py
```
### Reasoning Training
```bash
python training/train_reasoning_v2.py
```
## Features
- **Visual Question Answering (VQA)** - Answer questions about images
- **Image Captioning** - Generate natural descriptions
- **Object Detection** - Detect with bounding boxes (80 COCO classes)
- **Object Counting** - Count objects via point prediction
- **Semantic Segmentation** - Pixel-level understanding (150 ADE20K classes)
- **Chain-of-Thought Reasoning** - Step-by-step thinking traces
## License
**Oceanir Research License v1.0**
**Permitted:**
- Academic research
- Educational use
- Publishing papers with results
- Personal experimentation
**Not Permitted:**
- Commercial use
- Training commercial models
- Commercial products/services
For commercial licensing: licensing@oceanir.ai
## Citation
```bibtex
@software{oculus2026,
title={Oculus Vision-Language Model},
author={OceanirAI},
year={2026},
url={https://huggingface.co/OceanirAI/Oculus}
}
```
## Links
- [Oculus-0.1-Instruct](https://huggingface.co/OceanirAI/Oculus-0.1-Instruct)
- [Oculus-0.1-Reasoning](https://huggingface.co/OceanirAI/Oculus-0.1-Reasoning)
- [Oceanir SDK (PyPI)](https://pypi.org/project/oceanir/)
- [GitHub](https://github.com/OceanirAI/oceanir)