HAN Humanoid Object Detection ViT v1
Overview
This model performs object detection and classification for humanoid robots, using a Vision Transformer (ViT) backbone.
The model helps humanoid robots identify graspable objects in indoor environments.
Architecture
- Backbone: Vision Transformer (ViT-Base)
- Input Size: 224x224 RGB
- Hidden Size: 768
- Transformer Layers: 12
- Attention Heads: 12
- Output: Object class probabilities
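As a rough sanity check on the figures above, the sketch below derives the token sequence length and per-head dimension implied by these settings. The patch size of 16, the single prepended class token, and a 15-way linear head (matching the 15 categories listed under Training Data) are assumptions; the card does not state them. `ViTShape` is a hypothetical helper name, not part of any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTShape:
    """Hypothetical helper holding the shape settings listed in this card."""

    image_size: int = 224   # from the card: 224x224 RGB input
    hidden_size: int = 768  # from the card
    num_layers: int = 12    # from the card
    num_heads: int = 12     # from the card
    patch_size: int = 16    # ASSUMPTION: not stated in the card
    num_classes: int = 15   # ASSUMPTION: one output per object category

    @property
    def num_patches(self) -> int:
        # The image is cut into non-overlapping square patches.
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        return (self.image_size // self.patch_size) ** 2

    @property
    def sequence_length(self) -> int:
        # ASSUMPTION: one class token is prepended to the patch tokens.
        return self.num_patches + 1

    @property
    def head_dim(self) -> int:
        # The hidden size is split evenly across the attention heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def classifier_params(self) -> int:
        # Weights plus biases of a single linear classification head.
        return self.hidden_size * self.num_classes + self.num_classes


shape = ViTShape()
print(shape.num_patches, shape.sequence_length, shape.head_dim, shape.classifier_params)
# prints: 196 197 64 11535
```

If the actual checkpoint uses a different patch size or head layout, only the assumed fields need to change; the values taken from the card stay the same.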
Training Data
- Synthetic indoor object dataset
- 15 object categories
- Multiple lighting variations
- Augmented rotations and scaling
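The rotation and scaling augmentation listed above could be sketched as sampling a random affine transform about the image centre. This is a minimal illustration using only the standard library, not the author's pipeline: the function names, the ±30 degree rotation limit, and the 0.8 to 1.2 scale range are all hypothetical, since the card does not give the actual augmentation parameters.

```python
import math
import random


def sample_affine(rng, max_rotation_deg=30.0, scale_range=(0.8, 1.2),
                  center=(112.0, 112.0)):
    """Return a 2x3 affine matrix that rotates and scales about `center`.

    All default ranges are ASSUMPTIONS chosen for illustration only.
    The default centre is the middle of a 224x224 image.
    """
    angle = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = rng.uniform(*scale_range)
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    cx, cy = center
    # Choose the translation so that the centre maps onto itself.
    tx = cx - (a * cx - b * cy)
    ty = cy - (b * cx + a * cy)
    return [[a, -b, tx], [b, a, ty]]


def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return (new_x, new_y)


rng = random.Random(0)  # fixed seed so the sampled transform is repeatable
matrix = sample_affine(rng)
center_after = apply_affine(matrix, (112.0, 112.0))
```

In practice the same matrix would be handed to an image library's affine warp; rotating about the centre keeps the object roughly in frame, which matters when the input is a fixed 224x224 crop.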
Intended Use
- Humanoid object recognition
- Robotic grasp preparation
- Indoor service robotics research
Limitations
The model is trained on synthetic data, so real-world deployment requires fine-tuning.
Author
Caplin43