HERMES Navigation Model v1

Indoor semantic navigation model combining vision and 3D point cloud understanding.

Architecture

  • Vision encoder: CNN backbone (5-layer, 256-dim output)
  • Point cloud encoder: MLP with max-pooling (2048 points → 256-dim)
  • Fusion: 512-dim MLP with LayerNorm + Dropout
  • Heads: Direction (3D unit vector) + Traversability (scalar 0-1)
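
The architecture above can be sketched as a small PyTorch module. This is a minimal illustration, not the released implementation: the stated widths (256-dim encoders, 512-dim fusion, 2048 points) come from the list above, but the exact layer counts, channel sizes, dropout rate, and activations are assumptions.

```python
import torch
import torch.nn as nn

class HermesNavSketch(nn.Module):
    """Illustrative sketch of the HERMES architecture (not the real weights)."""

    def __init__(self, feat_dim=256, fused_dim=512):
        super().__init__()
        # Vision encoder: 5-layer CNN backbone -> 256-dim feature
        chans = [3, 32, 64, 128, 256, 256]
        convs = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            convs += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU()]
        self.vision = nn.Sequential(*convs, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Point cloud encoder: shared per-point MLP, then max-pool over points
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )
        # Fusion: 512-dim MLP with LayerNorm + Dropout
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, fused_dim), nn.LayerNorm(fused_dim),
            nn.ReLU(), nn.Dropout(0.1),
        )
        # Heads: 3D unit-vector direction + scalar traversability in [0, 1]
        self.direction_head = nn.Linear(fused_dim, 3)
        self.traversability_head = nn.Linear(fused_dim, 1)

    def forward(self, image, points):
        v = self.vision(image)                        # [B, 256]
        p = self.point_mlp(points).max(dim=1).values  # max over points -> [B, 256]
        f = self.fusion(torch.cat([v, p], dim=-1))    # [B, 512]
        direction = nn.functional.normalize(self.direction_head(f), dim=-1)
        traversability = torch.sigmoid(self.traversability_head(f))
        return {"direction": direction, "traversability": traversability}
```

Max-pooling over the point dimension makes the point-cloud branch order-invariant, so shuffling the 2048 input points does not change the feature.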

Training

  • Dataset: SUN RGB-D (5,509 indoor scenes)
  • Split: 90/5/5 (train/val/test)
  • Optimizer: AdamW (lr=2e-4, cosine schedule)
  • Mixed precision: bf16 on CUDA
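
The recipe above (AdamW at lr=2e-4, cosine schedule, bf16 autocast on CUDA) can be sketched as follows. The model, batch, and loss here are stand-ins, not the actual HERMES training code; total step count and batch size are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 3)  # stand-in for HermesNavigationModel
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
total_steps = 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

for step in range(total_steps):
    x = torch.randn(16, 8, device=device)       # stand-in batch
    target = torch.randn(16, 3, device=device)  # stand-in labels
    # bf16 mixed precision on CUDA only; disabled (full fp32) on CPU
    with torch.autocast(device_type=device, dtype=torch.bfloat16,
                        enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine decay from 2e-4 toward 0
```

Note bf16 needs no gradient scaler (unlike fp16), which is why the loop has no GradScaler step.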

Formats

Format       File                                 Use Case
PyTorch      pytorch/hermes_nav_v1.pth            Training/fine-tuning
SafeTensors  pytorch/hermes_nav_v1.safetensors    Fast safe loading
ONNX         onnx/hermes_nav_v1.onnx              Cross-platform inference

Usage

import torch
from hermes.training.model import HermesNavigationModel

model = HermesNavigationModel()
state = torch.load("pytorch/hermes_nav_v1.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

image = torch.randn(1, 3, 256, 256)  # RGB image batch [B, C, H, W]
points = torch.randn(1, 2048, 3)     # point cloud batch [B, N, 3]
with torch.no_grad():
    output = model(image, points)
# output["direction"]: [1, 3] goal direction (unit vector)
# output["traversability"]: [1, 1] traversability score in [0, 1]

Citation

ANIMA Suite – Robot Flow Labs
