HAN Humanoid Object Detection ViT v1

Overview

This model is designed for object detection and classification tasks for humanoid robots using a Vision Transformer (ViT) backbone.

The model helps humanoid robots identify graspable objects in indoor environments.

Architecture

  • Backbone: Vision Transformer (ViT-Base)
  • Input Size: 224×224 RGB
  • Hidden Size: 768
  • Transformer Layers: 12
  • Attention Heads: 12
  • Output: Class probabilities over 15 object categories
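The architecture figures above imply a few derived quantities (token count, per-head width, flattened patch length). The sketch below computes them under two assumptions not stated in this card: a 16×16 patch size (common for ViT-Base) and a single prepended [CLS] token as in the standard ViT classification setup. The class name `ViTBaseConfig` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTBaseConfig:
    image_size: int = 224    # from the card: 224x224 RGB input
    patch_size: int = 16     # ASSUMPTION: patch size is not stated in the card
    num_channels: int = 3    # RGB
    hidden_size: int = 768   # from the card
    num_layers: int = 12     # from the card
    num_heads: int = 12      # from the card
    num_labels: int = 15     # from the card's Training Data section

    @property
    def num_patches(self) -> int:
        # Non-overlapping square patches tiled across the image.
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    @property
    def seq_len(self) -> int:
        # Patch tokens plus one [CLS] token (ASSUMPTION: standard ViT classification).
        return self.num_patches + 1

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden size.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def patch_embedding_dim(self) -> int:
        # Length of one flattened patch before its linear projection to hidden_size.
        return self.patch_size * self.patch_size * self.num_channels


cfg = ViTBaseConfig()
print(cfg.num_patches, cfg.seq_len, cfg.head_dim, cfg.patch_embedding_dim)
# → 196 197 64 768
```

With a 32×32 patch size instead, the same input would yield only 49 patch tokens, trading spatial detail for cheaper attention.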

Training Data

  • Synthetic indoor object dataset
  • 15 object categories
  • Multiple lighting variations
  • Rotation and scaling augmentation
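The card lists lighting, rotation, and scaling variation but not their magnitudes. The sketch below shows one way such augmentation parameters could be sampled per image; the ranges and the names `sample_augmentation` and `adjust_brightness` are hypothetical. Only the brightness change is applied here; real rotation and scaling would need an image library to warp pixels.

```python
import random
from dataclasses import dataclass

# HYPOTHETICAL ranges: the card names these augmentation types but not their magnitudes.
ROTATION_RANGE_DEG = (-30.0, 30.0)
SCALE_RANGE = (0.8, 1.2)
BRIGHTNESS_RANGE = (0.6, 1.4)


@dataclass(frozen=True)
class AugmentParams:
    rotation_deg: float
    scale: float
    brightness: float


def sample_augmentation(rng: random.Random) -> AugmentParams:
    """Draw one set of augmentation parameters uniformly from the ranges above."""
    return AugmentParams(
        rotation_deg=rng.uniform(*ROTATION_RANGE_DEG),
        scale=rng.uniform(*SCALE_RANGE),
        brightness=rng.uniform(*BRIGHTNESS_RANGE),
    )


def adjust_brightness(pixels: list[list[int]], factor: float) -> list[list[int]]:
    """Scale 8-bit intensities by `factor`, clipping to [0, 255] (a simple lighting variation)."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in pixels]


rng = random.Random(0)  # fixed seed so the sampled augmentations are reproducible
params = sample_augmentation(rng)
print(adjust_brightness([[100, 200], [250, 0]], 1.2))
# → [[120, 240], [255, 0]]
```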

Intended Use

  • Humanoid object recognition
  • Robotic grasp preparation
  • Indoor service robotics research

Limitations

The model was trained only on synthetic data. Real-world deployment requires fine-tuning on real images.
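The card does not say how fine-tuning should be done. One lightweight option, swapped in here for illustration and not necessarily the author's approach, is a linear probe: keep the ViT backbone frozen, extract an embedding per real image (e.g. the 768-d [CLS] vector), and train only a new softmax classification head. Full fine-tuning would also update the backbone weights. The functions below are hypothetical and use plain gradient descent on cross-entropy.

```python
import math
import random


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200, seed=0):
    """Fit a linear head (W, b) on frozen embeddings by full-batch gradient descent.

    Only the head is trained; `features` stand in for frozen backbone outputs.
    """
    dim = len(features[0])
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(num_classes)]
            p = softmax(logits)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                err = p[c] - (1.0 if c == y else 0.0)
                gb[c] += err
                for j in range(dim):
                    gW[c][j] += err * x[j]
        # Average the gradients over the batch and take one descent step.
        for c in range(num_classes):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict(W, b, x) -> int:
    """Return the index of the highest-scoring class for embedding x."""
    logits = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy 2-d "embeddings" standing in for real 768-d backbone features.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ys = [0, 0, 1, 1]
W, b = train_linear_probe(feats, ys, num_classes=2)
print([predict(W, b, x) for x in feats])
# → [0, 0, 1, 1]
```

A linear probe is cheap and needs little real data, which makes it a reasonable first check of how well synthetic-trained features transfer before committing to full fine-tuning.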

Author

Caplin43
