# 🚀 Oculus V3: Future Training Roadmap

COCO (current) = 80 common classes. A good baseline, but limited for real-world niche tasks.

## Option A: Universal Detection (The "Scanner")
**Target**: Detect 1200+ specific objects.
- **Dataset**: **LVIS** or **Objects365**.
- **Result**: Recognizes "stapler", "doorknob", "mango" instead of just generic classes (see the loading sketch below).
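
The LVIS annotations ship as a single COCO-style JSON file, so the expanded vocabulary can be inspected with the standard library alone. A minimal sketch, assuming the LVIS v1 training annotations have been downloaded from lvisdataset.org (the local path below is a placeholder):

```python
import json
from collections import Counter

ANN_PATH = "lvis_v1_train.json"  # placeholder: point this at the downloaded file

with open(ANN_PATH) as f:
    lvis = json.load(f)  # COCO-style keys: "images", "annotations", "categories"

# LVIS v1 defines 1200+ categories; each carries a frequency bucket:
# "f" (frequent), "c" (common), or "r" (rare).
categories = {c["id"]: c["name"] for c in lvis["categories"]}
buckets = Counter(c["frequency"] for c in lvis["categories"])
print(f"{len(categories)} categories, frequency buckets: {dict(buckets)}")

# Flatten annotations into (image_id, category_name, bbox) triples for a
# detection pipeline. Boxes are COCO-format [x, y, width, height] in pixels.
samples = [
    (a["image_id"], categories[a["category_id"]], a["bbox"])
    for a in lvis["annotations"]
]
print("example:", samples[0])
```

The `frequency` field matters in practice: rare categories have very few training images, which is why the LVIS baselines rely on repeat-factor sampling during training.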

## Option B: Visual Reasoning (The "Thinker")
**Target**: Better VQA and complex instruction following.
- **Dataset**: **LLaVA-Instruct** or **VizWiz**.
- **Why**: Teaches the model to "explain why the car is parked" rather than just finding the car.
- **Result**: A smarter, chatbot-like VLM (see the data-format sketch below).
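
To make the Option B data concrete, the sketch below shows a LLaVA-Instruct-style record and how it reduces to plain user/assistant chat messages. The record's contents are invented for illustration; only the `conversations` / `from` / `value` layout and the `<image>` placeholder follow LLaVA's published schema.

```python
# A single LLaVA-Instruct-style record (fields per LLaVA's schema; the
# id, image path, and text below are invented for illustration).
record = {
    "id": "000000123456",
    "image": "coco/train2017/000000123456.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhy is the car parked here?"},
        {"from": "gpt", "value": "The curb is marked for loading and the hazard "
                                 "lights are on, so it is likely a delivery stop."},
    ],
}

ROLE_MAP = {"human": "user", "gpt": "assistant"}

def to_chat_messages(rec: dict) -> list[dict]:
    """Convert a LLaVA-style record into generic chat messages for SFT."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in rec["conversations"]
    ]

# The "<image>" placeholder marks where the vision encoder's output is
# spliced into the token stream during training.
print(to_chat_messages(record)[0]["role"])  # "user"
```

VizWiz records use a different schema, but they reduce to the same question/answer message form, so one conversion layer can serve either dataset choice.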

## Recommendation
Since Oceanir is a VLM platform, **Option B (Instruction Tuning)** is the highest-value next step. It improves the intelligence of the model's output (reasoning and explanation, not just detections) significantly more than adding more bounding-box classes.
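
Mechanically, instruction tuning differs from detection fine-tuning mostly in the loss: the model is fed the whole conversation, but gradients flow only through the assistant's answer tokens. A minimal, framework-agnostic sketch (the token IDs and helper are illustrative, not from any particular codebase):

```python
IGNORE_INDEX = -100  # the usual cross-entropy convention for "skip this token"

def build_labels(prompt_ids: list[int], answer_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and answer; mask prompt positions out of the loss."""
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels

prompt = [101, 7, 42, 9]   # e.g. "<image> Why is the car parked here?"
answer = [55, 23, 88, 2]   # e.g. "It is a delivery stop."
input_ids, labels = build_labels(prompt, answer)

assert len(input_ids) == len(labels)
print(labels)  # [-100, -100, -100, -100, 55, 23, 88, 2]
```

The same masking is applied per assistant turn in multi-turn data; everything else in the training loop is standard next-token prediction.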