# HERMES Navigation Model v1
Indoor semantic navigation model combining vision and 3D point cloud understanding.
## Architecture
- Vision encoder: CNN backbone (5-layer, 256-dim output)
- Point cloud encoder: MLP with max-pooling (2048 points → 256-dim)
- Fusion: 512-dim MLP with LayerNorm + Dropout
- Heads: Direction (3D unit vector) + Traversability (scalar 0-1)
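
For orientation, here is a minimal PyTorch sketch of a model matching the description above. The layer widths, kernel sizes, strides, and activation choices are illustrative assumptions; the released `HermesNavigationModel` may differ in these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HermesNavSketch(nn.Module):
    """Illustrative re-implementation of the architecture described above."""
    def __init__(self, feat_dim=256, fusion_dim=512, dropout=0.1):
        super().__init__()
        # Vision encoder: 5 conv layers -> 256-dim feature (channel plan assumed)
        chans = [3, 32, 64, 128, 256, 256]
        self.vision = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.BatchNorm2d(chans[i + 1]),
                nn.ReLU(),
            )
            for i in range(5)
        ])
        self.vision_pool = nn.AdaptiveAvgPool2d(1)
        # Point cloud encoder: per-point MLP + max-pool over the 2048 points
        self.points = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        # Fusion: 512-dim MLP with LayerNorm + Dropout
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, fusion_dim),
            nn.LayerNorm(fusion_dim), nn.ReLU(), nn.Dropout(dropout),
        )
        self.direction_head = nn.Linear(fusion_dim, 3)
        self.traversability_head = nn.Linear(fusion_dim, 1)

    def forward(self, image, points):
        v = self.vision_pool(self.vision(image)).flatten(1)   # [B, 256]
        p = self.points(points).max(dim=1).values             # [B, 256]
        h = self.fusion(torch.cat([v, p], dim=-1))            # [B, 512]
        # Direction is normalized to a 3D unit vector; traversability to [0, 1]
        direction = F.normalize(self.direction_head(h), dim=-1)
        traversability = torch.sigmoid(self.traversability_head(h))
        return {"direction": direction, "traversability": traversability}
```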
## Training
- Dataset: SUN RGB-D (5,509 indoor scenes)
- Split: 90/5/5 (train/val/test)
- Optimizer: AdamW (lr=2e-4, cosine schedule)
- Mixed precision: bf16 on CUDA
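
A sketch of a training loop consistent with these settings. The loss terms (cosine distance for direction, binary cross-entropy for traversability), the epoch count, and the `train_loader` batch layout are assumptions, not taken from the released training code.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, num_epochs=50):
    model.cuda()
    optimizer = AdamW(model.parameters(), lr=2e-4)
    # Cosine schedule over the full run, per the settings above
    scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs * len(train_loader))
    for _ in range(num_epochs):
        # Assumed batch layout: (image, points, goal direction, traversability label)
        for image, points, gt_dir, gt_trav in train_loader:
            image, points = image.cuda(), points.cuda()
            gt_dir, gt_trav = gt_dir.cuda(), gt_trav.cuda()
            # bf16 autocast on CUDA, matching the mixed-precision setting
            with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
                out = model(image, points)
            # Losses in fp32 outside autocast (BCE is unsafe under autocast);
            # the loss terms themselves are assumptions
            dir_loss = 1 - F.cosine_similarity(
                out["direction"].float(), gt_dir, dim=-1).mean()
            trav_loss = F.binary_cross_entropy(
                out["traversability"].float(), gt_trav)
            loss = dir_loss + trav_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
```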
## Formats

| Format | File | Use Case |
|---|---|---|
| PyTorch | pytorch/hermes_nav_v1.pth | Training/fine-tuning |
| SafeTensors | pytorch/hermes_nav_v1.safetensors | Fast safe loading |
| ONNX | onnx/hermes_nav_v1.onnx | Cross-platform inference |
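
The SafeTensors and ONNX files can be loaded without the `hermes` package. A brief sketch, assuming the ONNX graph exposes inputs named `image` and `points` (check `session.get_inputs()` for the actual names):

```python
import numpy as np
import onnxruntime as ort
from safetensors.torch import load_file

# SafeTensors: load the state dict without pickle execution
state_dict = load_file("pytorch/hermes_nav_v1.safetensors")

# ONNX: cross-platform inference via onnxruntime
session = ort.InferenceSession("onnx/hermes_nav_v1.onnx")
outputs = session.run(None, {
    "image": np.random.randn(1, 3, 256, 256).astype(np.float32),
    "points": np.random.randn(1, 2048, 3).astype(np.float32),
})  # list of output arrays; order/names depend on how the graph was exported
```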
## Usage

```python
import torch
from hermes.training.model import HermesNavigationModel

model = HermesNavigationModel()
model.load_state_dict(torch.load("pytorch/hermes_nav_v1.pth", map_location="cpu"))
model.eval()

image = torch.randn(1, 3, 256, 256)   # RGB image batch [B, 3, H, W]
points = torch.randn(1, 2048, 3)      # point cloud batch [B, N, 3]

with torch.no_grad():
    output = model(image, points)

# output["direction"]: [1, 3] goal direction (unit vector)
# output["traversability"]: [1, 1] traversability score in [0, 1]
```
## Citation
ANIMA Suite – Robot Flow Labs