Caplin43/han-humanoid-object-detection-vit-v1

HAN Humanoid Object Detection ViT v1

Overview

This model uses a Vision Transformer (ViT) backbone for object detection and classification on humanoid robots.

The model helps humanoid robots identify graspable objects in indoor environments.

Architecture

Backbone: Vision Transformer (ViT-Base)
Input Size: 224x224 RGB
Hidden Size: 768
Transformer Layers: 12
Attention Heads: 12
Output: Object class probabilities
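
The figures above imply a few derived shapes, which the minimal sketch below works out. The patch size of 16 and the extra classification token are assumptions based on the common ViT-Base configuration; the card does not state either. The class name ViTShape is hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTShape:
    """Shape bookkeeping for a ViT-Base style classifier (illustrative only)."""
    image_size: int = 224   # from the card: 224x224 RGB input
    hidden_size: int = 768  # from the card
    num_layers: int = 12    # from the card
    num_heads: int = 12     # from the card
    num_classes: int = 15   # from the card: 15 object categories
    patch_size: int = 16    # ASSUMPTION: not stated in the card

    @property
    def num_patches(self) -> int:
        # The image is cut into non-overlapping square patches.
        return (self.image_size // self.patch_size) ** 2

    @property
    def sequence_length(self) -> int:
        # ASSUMPTION: one extra classification token is prepended.
        return self.num_patches + 1

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden size.
        return self.hidden_size // self.num_heads


shape = ViTShape()
print(shape.num_patches, shape.sequence_length, shape.head_dim)  # 196 197 64
```

Under these assumptions each image becomes 196 patch tokens plus one classification token, and each of the 12 heads attends over 64-dimensional slices.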

Training Data

Synthetic indoor object dataset
15 object categories
Multiple lighting variations
Augmented rotations and scaling
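
The card does not say how the lighting, rotation, and scaling variations were generated. The sketch below shows one plausible way to sample such augmentation parameters. The function names and the value ranges (30 degrees, 0.8 to 1.2 scale, 0.6 to 1.4 brightness) are assumptions chosen for illustration, not the author's settings.

```python
import math
import random


def sample_augmentation(rng, max_rotation_deg=30.0,
                        scale_range=(0.8, 1.2),
                        brightness_range=(0.6, 1.4)):
    """Draw one random set of augmentation parameters (ranges are assumed)."""
    return {
        "angle_deg": rng.uniform(-max_rotation_deg, max_rotation_deg),
        "scale": rng.uniform(*scale_range),
        "brightness": rng.uniform(*brightness_range),
    }


def affine_matrix(angle_deg, scale):
    """Return the 2x2 matrix that rotates by angle_deg and scales uniformly."""
    theta = math.radians(angle_deg)
    c = math.cos(theta) * scale
    s = math.sin(theta) * scale
    return [[c, -s], [s, c]]


def adjust_brightness(pixel, factor):
    """Scale an 8-bit RGB pixel by factor, clamping each channel to 0..255."""
    return tuple(min(255, max(0, round(ch * factor))) for ch in pixel)


rng = random.Random(0)  # fixed seed so a run is reproducible
params = sample_augmentation(rng)
matrix = affine_matrix(params["angle_deg"], params["scale"])
```

In a real pipeline the matrix would be applied to pixel coordinates by an image library, and the brightness factor would stand in for a lighting variation.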

Intended Use

Humanoid object recognition
Robotic grasp preparation
Indoor service robotics research
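
For grasp preparation, a robot would typically act only on confident predictions. The sketch below turns raw class scores into probabilities with a softmax and returns the top class only if it clears a confidence threshold. The function pick_grasp_candidate, the label names, and the 0.6 threshold are hypothetical; the card does not describe any post-processing.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_grasp_candidate(logits, labels, min_confidence=0.6):
    """Return (label, probability) for the top class, or None if unsure."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None  # too uncertain to plan a grasp
    return labels[best], probs[best]


# Hypothetical labels; the card does not list the 15 category names.
labels = ["mug", "bottle", "box"]
confident = pick_grasp_candidate([4.0, 1.0, 0.5], labels)
uncertain = pick_grasp_candidate([1.0, 1.0, 1.0], labels)
```

Here confident holds the "mug" label with its probability, while uncertain is None because no class reaches the threshold.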

Limitations

The model was trained on synthetic data, so real-world deployment requires fine-tuning.

Author

Caplin43
