# Oculus - Complete Training Repository
This repository contains the complete Oculus vision-language model, including all training code, checkpoints, and documentation.
## Quick Links
| Model | Description | Link |
|---|---|---|
| Oculus-0.1-Instruct | Instruction-tuned for VQA/captioning | HuggingFace |
| Oculus-0.1-Reasoning | Chain-of-thought reasoning | HuggingFace |
| oceanir | Python SDK | PyPI |
## Installation

```bash
pip install oceanir
```

## Quick Start

```python
from oceanir import Oculus

model = Oculus.from_pretrained("OceanirAI/Oculus-0.1-Instruct")
answer = model.ask("image.jpg", "What is this?")
```
## Architecture
Oculus combines state-of-the-art vision encoders with a powerful language model:
### Vision Encoders

- **DINOv3 ViT-H/16+** (`facebook/dinov3-vith16plus-pretrain-lvd1689m`)
  - Self-supervised vision transformer trained on LVD-1689M
  - 1024 hidden dim, 24 layers, 16 heads
- **SigLIP2** (`google/siglip2-base-patch16-224`)
  - Vision-language contrastive model
  - 1152 hidden dim, 27 layers, 16 heads

### Language Model

- **LiquidAI LFM 2.5 1.2B Instruct** (`LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16`)
  - 1.2B parameters, 1536 embedding dim
  - 131K vocab, 32K context window
### Architecture Specs
| Component | Specification |
|---|---|
| DINOv3 | ViT-H/16+, 1024D, 24L, 16H |
| SigLIP2 | Base, 1152D, 27L, 16H |
| Fusion | Concatenation → 2176D |
| Projector | 2176 → 4352 → 1536 |
| LFM 2.5 | 1.2B params, 1536D, 16L, 24H |
| Detection | 80 classes (COCO) |
| Segmentation | 150 classes (ADE20K) |
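The fusion and projector rows of the spec table can be sketched in NumPy. This is a minimal illustration, not the actual implementation: per-patch features from the two encoders are concatenated to 2176D, then projected 2176 → 4352 → 1536 to match the LFM 2.5 embedding size. The patch count, GELU activation, and random weights are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def project(fused, w1, b1, w2, b2):
    # Two-layer MLP projector: 2176 -> 4352 -> 1536, per the spec table
    return gelu(fused @ w1 + b1) @ w2 + b2

n_patches = 196  # illustrative patch count
dino = rng.standard_normal((n_patches, 1024))    # DINOv3 features (1024D)
siglip = rng.standard_normal((n_patches, 1152))  # SigLIP2 features (1152D)

# Fusion: concatenation -> 2176D per patch
fused = np.concatenate([dino, siglip], axis=-1)

w1 = rng.standard_normal((2176, 4352)) * 0.01
b1 = np.zeros(4352)
w2 = rng.standard_normal((4352, 1536)) * 0.01
b2 = np.zeros(1536)

tokens = project(fused, w1, b1, w2, b2)  # (196, 1536) visual tokens
```

The projected tokens land in the same 1536D space as the LM's text embeddings, which is what lets them be interleaved into the language model's input sequence.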
## Repository Structure
```
OceanirAI/Oculus/
├── config.json                      # Main model config
├── README.md                        # This file
│
├── oculus_unified_model/            # Model implementation
│   ├── __init__.py
│   ├── modeling_oculus.py           # OculusForConditionalGeneration
│   ├── configuration_oculus.py      # OculusConfig
│   └── processing_oculus.py         # OculusProcessor
│
├── training/                        # Training scripts
│   ├── train_oculus.py              # Base projector training
│   ├── train_detection.py           # Detection head training
│   ├── train_detection_extended.py
│   ├── train_instruction_tuning.py  # Instruct variant
│   ├── train_reasoning_v2.py        # Reasoning variant
│   └── train_oculus_coco.py         # COCO training
│
├── logs/                            # Training logs
│   ├── training_instruct_v1.log
│   ├── training_reasoning_v2.log
│   └── training_v2_final.log
│
├── checkpoints/                     # Model checkpoints
│   ├── oculus/final/                # Base projector
│   │   ├── projector.npz            # Vision projector weights (~822MB)
│   │   └── config.json
│   │
│   ├── oculus_detection/final/      # Detection checkpoint
│   │   ├── projector.npz            # Projector weights (~800MB)
│   │   ├── heads.pth                # Detection heads (~35MB)
│   │   └── benchmark_results.json
│   │
│   ├── oculus_instruct_v1/          # Instruction-tuned VQA
│   │   └── vqa_model/
│   │       ├── model.safetensors    # BLIP VQA weights (~1.5GB)
│   │       ├── tokenizer.json
│   │       └── config.json
│   │
│   └── oculus_reasoning_v2/         # Reasoning VQA
│       └── vqa_model/
│           ├── model.safetensors    # BLIP VQA weights (~1.5GB)
│           ├── tokenizer.json
│           └── config.json
│
├── docs/                            # Documentation
│   ├── ARCHITECTURE.md
│   ├── BENCHMARK_README.md
│   └── TRAINING_ROADMAP.md
│
├── oculus_inference.py              # Inference script
├── demo_oculus.py                   # Demo script
├── benchmark_vlm.py                 # Benchmarking
└── eval_benchmarks.py               # Evaluation
```
## Training

### Base Projector Training

```bash
python training/train_oculus.py
```

### Detection Head Training

```bash
python training/train_detection.py
```

### Instruction Tuning

```bash
python training/train_instruction_tuning.py
```

### Reasoning Training

```bash
python training/train_reasoning_v2.py
```
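The base projector stage is assumed to update only the projector while both vision encoders and the LM stay frozen (the usual first stage for this kind of architecture; see `train_oculus.py` for the actual procedure). A minimal NumPy sketch of that idea, with a single linear layer standing in for the 2176 → 4352 → 1536 MLP and an MSE alignment loss — shapes, loss, and learning rate are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d_fused, d_lm, batch = 2176, 1536, 32
W = rng.standard_normal((d_fused, d_lm)) * 0.01  # trainable projector weights
x = rng.standard_normal((batch, d_fused))        # frozen fused vision features
target = rng.standard_normal((batch, d_lm))      # frozen LM-side targets

init_loss = np.mean((x @ W - target) ** 2)

lr = 1e-3
for _ in range(200):
    pred = x @ W
    # gradient of the MSE loss w.r.t. W; nothing upstream or downstream is updated
    grad = x.T @ (pred - target) * (2.0 / batch)
    W -= lr * grad

final_loss = np.mean((x @ W - target) ** 2)
```

Freezing the encoders and LM keeps this stage cheap: only the projector's parameters carry gradients, so the ~822MB `projector.npz` is the only artifact produced.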
## Features

- **Visual Question Answering (VQA)** - Answer questions about images
- **Image Captioning** - Generate natural descriptions
- **Object Detection** - Detect with bounding boxes (80 COCO classes)
- **Object Counting** - Count objects via point prediction
- **Semantic Segmentation** - Pixel-level understanding (150 ADE20K classes)
- **Chain-of-Thought Reasoning** - Step-by-step thinking traces
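For the detection feature, here is a hypothetical post-processing sketch. It assumes the detection head emits per-query logits over the 80 COCO classes plus normalized `(cx, cy, w, h)` boxes; the output layout, function name, and score threshold are illustrative, not the actual head interface:

```python
import numpy as np

def decode_detections(class_logits, boxes_cxcywh, score_thresh=0.5):
    """Return (class_id, score, xyxy_box) tuples above the threshold."""
    # softmax over the class dimension
    e = np.exp(class_logits - class_logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    class_ids = probs.argmax(axis=-1)
    scores = probs.max(axis=-1)

    # convert center/size boxes to corner (x1, y1, x2, y2) form
    cx, cy, w, h = boxes_cxcywh.T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)

    keep = scores >= score_thresh
    return list(zip(class_ids[keep], scores[keep], xyxy[keep]))
```

A real pipeline would also apply non-maximum suppression and map class indices to COCO category names, both omitted here for brevity.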
## License

**Oceanir Research License v1.0**

**Permitted:**
- Academic research
- Educational use
- Publishing papers with results
- Personal experimentation

**Not Permitted:**
- Commercial use
- Training commercial models
- Commercial products/services

For commercial licensing: licensing@oceanir.ai
## Citation

```bibtex
@software{oculus2026,
  title={Oculus Vision-Language Model},
  author={OceanirAI},
  year={2026},
  url={https://huggingface.co/OceanirAI/Oculus}
}
```
## Model Tree

- Base model: `LiquidAI/LFM2.5-1.2B-Base`
  - Finetuned: `LiquidAI/LFM2.5-1.2B-Instruct`
    - Finetuned: `LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16`