|
|
--- |
|
|
license: other |
|
|
license_name: oceanir-research-license |
|
|
license_link: LICENSE |
|
|
language: |
|
|
- en |
|
|
library_name: oceanir |
|
|
pipeline_tag: image-text-to-text |
|
|
tags: |
|
|
- vision |
|
|
- multimodal |
|
|
- vision-language |
|
|
- vqa |
|
|
- image-captioning |
|
|
- object-detection |
|
|
- oculus |
|
|
- research |
|
|
- training |
|
|
base_model: |
|
|
- facebook/dinov3-vith16plus-pretrain-lvd1689m |
|
|
- google/siglip2-base-patch16-224 |
|
|
- LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16 |
|
|
--- |
|
|
|
|
|
# Oculus - Complete Training Repository |
|
|
|
|
|
This repository contains the complete Oculus vision-language model, including all training code, checkpoints, and documentation.
|
|
|
|
|
## Quick Links |
|
|
|
|
|
| Model | Description | Link | |
|
|
|-------|-------------|------| |
|
|
| **Oculus-0.1-Instruct** | Instruction-tuned for VQA/captioning | [HuggingFace](https://huggingface.co/OceanirAI/Oculus-0.1-Instruct) | |
|
|
| **Oculus-0.1-Reasoning** | Chain-of-thought reasoning | [HuggingFace](https://huggingface.co/OceanirAI/Oculus-0.1-Reasoning) | |
|
|
| **oceanir** | Python SDK | [PyPI](https://pypi.org/project/oceanir/) | |
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install oceanir |
|
|
``` |
|
|
|
|
|
```python |
|
|
from oceanir import Oculus |
|
|
|
|
|
model = Oculus.from_pretrained("OceanirAI/Oculus-0.1-Instruct") |
|
|
answer = model.ask("image.jpg", "What is this?")  # path to a local image
print(answer)
|
|
``` |
|
|
|
|
|
## Architecture |
|
|
|
|
|
Oculus pairs two complementary vision encoders with a compact instruction-tuned language model:
|
|
|
|
|
### Vision Encoders |
|
|
- **DINOv3 ViT-H/16+** (`facebook/dinov3-vith16plus-pretrain-lvd1689m`) |
|
|
- Self-supervised vision transformer trained on LVD-1689M |
|
|
- 1024 hidden, 24 layers, 16 heads |
|
|
|
|
|
- **SigLIP2** (`google/siglip2-base-patch16-224`) |
|
|
- Vision-language contrastive model |
|
|
- 1152 hidden, 27 layers, 16 heads |
|
|
|
|
|
### Language Model |
|
|
- **LiquidAI LFM 2.5 1.2B Instruct** (`LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16`) |
|
|
- 1.2B parameters, 1536 embedding dim |
|
|
- 131K vocab, 32K context window |
|
|
|
|
|
### Architecture Specs |
|
|
|
|
|
| Component | Specification | |
|
|
|-----------|--------------| |
|
|
| DINOv3 | ViT-H/16+, 1024D, 24L, 16H | |
|
|
| SigLIP2 | Base, 1152D, 27L, 16H | |
|
|
| Fusion | Concatenation → 2176D |
|
|
| Projector | 2176 → 4352 → 1536 |
|
|
| LFM 2.5 | 1.2B params, 1536D, 16L, 24H | |
|
|
| Detection | 80 classes (COCO) | |
|
|
| Segmentation | 150 classes (ADE20K) | |
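
The fusion and projector rows of the table can be sketched in a few lines of numpy: per-token DINOv3 (1024D) and SigLIP2 (1152D) features are concatenated to 2176D, then passed through a two-layer MLP into the LM's 1536D embedding space. This is a minimal illustrative sketch only; the function names, GELU activation, and random weights are assumptions, not the actual Oculus implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def fuse_and_project(dino_tokens, siglip_tokens, w1, w2):
    """Concatenate per-token encoder features, then project to the LM dim."""
    fused = np.concatenate([dino_tokens, siglip_tokens], axis=-1)  # (N, 2176)
    return gelu(fused @ w1) @ w2                                   # (N, 1536)

n_tokens = 4
dino = rng.standard_normal((n_tokens, 1024))    # DINOv3 ViT-H/16+ features
siglip = rng.standard_normal((n_tokens, 1152))  # SigLIP2 features
w1 = rng.standard_normal((2176, 4352)) * 0.02   # projector: 2176 -> 4352
w2 = rng.standard_normal((4352, 1536)) * 0.02   # projector: 4352 -> 1536

out = fuse_and_project(dino, siglip, w1, w2)
print(out.shape)  # (4, 1536)
```

The projected tokens would then be interleaved with text embeddings as input to LFM 2.5.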
|
|
|
|
|
## Repository Structure |
|
|
|
|
|
``` |
|
|
OceanirAI/Oculus/
├── config.json                      # Main model config
├── README.md                        # This file
│
├── oculus_unified_model/            # Model implementation
│   ├── __init__.py
│   ├── modeling_oculus.py           # OculusForConditionalGeneration
│   ├── configuration_oculus.py      # OculusConfig
│   └── processing_oculus.py         # OculusProcessor
│
├── training/                        # Training scripts
│   ├── train_oculus.py              # Base projector training
│   ├── train_detection.py           # Detection head training
│   ├── train_detection_extended.py
│   ├── train_instruction_tuning.py  # Instruct variant
│   ├── train_reasoning_v2.py        # Reasoning variant
│   └── train_oculus_coco.py         # COCO training
│
├── logs/                            # Training logs
│   ├── training_instruct_v1.log
│   ├── training_reasoning_v2.log
│   └── training_v2_final.log
│
├── checkpoints/                     # Model checkpoints
│   ├── oculus/final/                # Base projector
│   │   ├── projector.npz            # Vision projector weights (~822MB)
│   │   └── config.json
│   │
│   ├── oculus_detection/final/      # Detection checkpoint
│   │   ├── projector.npz            # Projector weights (~800MB)
│   │   ├── heads.pth                # Detection heads (~35MB)
│   │   └── benchmark_results.json
│   │
│   ├── oculus_instruct_v1/          # Instruction-tuned VQA
│   │   └── vqa_model/
│   │       ├── model.safetensors    # BLIP VQA weights (~1.5GB)
│   │       ├── tokenizer.json
│   │       └── config.json
│   │
│   └── oculus_reasoning_v2/         # Reasoning VQA
│       └── vqa_model/
│           ├── model.safetensors    # BLIP VQA weights (~1.5GB)
│           ├── tokenizer.json
│           └── config.json
│
├── docs/                            # Documentation
│   ├── ARCHITECTURE.md
│   ├── BENCHMARK_README.md
│   └── TRAINING_ROADMAP.md
│
├── oculus_inference.py              # Inference script
├── demo_oculus.py                   # Demo script
├── benchmark_vlm.py                 # Benchmarking
└── eval_benchmarks.py               # Evaluation
|
``` |
|
|
|
|
|
## Training |
|
|
|
|
|
### Base Projector Training |
|
|
```bash |
|
|
python training/train_oculus.py |
|
|
``` |
|
|
|
|
|
### Detection Head Training |
|
|
```bash |
|
|
python training/train_detection.py |
|
|
``` |
|
|
|
|
|
### Instruction Tuning |
|
|
```bash |
|
|
python training/train_instruction_tuning.py |
|
|
``` |
|
|
|
|
|
### Reasoning Training |
|
|
```bash |
|
|
python training/train_reasoning_v2.py |
|
|
``` |
|
|
|
|
|
## Features |
|
|
|
|
|
- **Visual Question Answering (VQA)** - Answer questions about images |
|
|
- **Image Captioning** - Generate natural descriptions |
|
|
- **Object Detection** - Detect with bounding boxes (80 COCO classes) |
|
|
- **Object Counting** - Count objects via point prediction |
|
|
- **Semantic Segmentation** - Pixel-level understanding (150 ADE20K classes) |
|
|
- **Chain-of-Thought Reasoning** - Step-by-step thinking traces |
|
|
|
|
|
## License |
|
|
|
|
|
**Oceanir Research License v1.0** |
|
|
|
|
|
**Permitted:** |
|
|
- Academic research |
|
|
- Educational use |
|
|
- Publishing papers with results |
|
|
- Personal experimentation |
|
|
|
|
|
**Not Permitted:** |
|
|
- Commercial use |
|
|
- Training commercial models |
|
|
- Commercial products/services |
|
|
|
|
|
For commercial licensing: licensing@oceanir.ai |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@software{oculus2026, |
|
|
title={Oculus Vision-Language Model}, |
|
|
author={OceanirAI}, |
|
|
year={2026}, |
|
|
url={https://huggingface.co/OceanirAI/Oculus} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Links |
|
|
|
|
|
- [Oculus-0.1-Instruct](https://huggingface.co/OceanirAI/Oculus-0.1-Instruct) |
|
|
- [Oculus-0.1-Reasoning](https://huggingface.co/OceanirAI/Oculus-0.1-Reasoning) |
|
|
- [Oceanir SDK (PyPI)](https://pypi.org/project/oceanir/) |
|
|
- [GitHub](https://github.com/OceanirAI/oceanir) |
|
|
|