Oculus / docs /TRAINING_ROADMAP.md
kobiakor15's picture
Upload docs/TRAINING_ROADMAP.md with huggingface_hub
4145f82 verified

๐Ÿš€ Oculus V3: Future Training Roadmap

COCO (Current) = 80 common classes. Good baseline, but limited for real-world niche tasks.

Option A: Universal Detection (The "Scanner")

Target: Detect 1200+ specific objects.

  • Dataset: LVIS or Objects365.
  • Result: Recognizes "stapler", "doorknob", "mango" instead of just generic classes.

Option B: Visual Reasoning (The "Thinker")

Target: Better VQA and complex instruction following.

  • Dataset: LLaVA-Instruct or VizWiz.
  • Why: Teaches the model to "explain why the car is parked" rather than just finding the car.
  • Result: A smarter chatbot-like VLM.

Recommendation

Since Oceanir is a VLM platform, Option B (Instruction Tuning) is the highest value next step. It improves the model's IO (Intelligence Output) significantly more than just adding more bounding boxes.